1
Wessel D, Pogrebnyakov N. Using Social Media as a Source of Real-World Data for Pharmaceutical Drug Development and Regulatory Decision Making. Drug Saf 2024; 47:495-511. [PMID: 38446405 PMCID: PMC11018692 DOI: 10.1007/s40264-024-01409-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/07/2024] [Indexed: 03/07/2024]
Abstract
INTRODUCTION: While pharmaceutical companies aim to leverage real-world data (RWD) to bridge the gap between clinical drug development and real-world patient outcomes, extant research has mainly focused on the use of social media in a post-approval safety-surveillance setting. Recent regulatory and technological developments indicate that social media may serve as a rich source to expand the evidence base to pre-approval and drug development activities. However, use cases related to drug development have been largely omitted, thereby missing some of the benefits of RWD. In addition, an applied end-to-end understanding of RWD rooted in both industry and regulations is lacking.
OBJECTIVE: We aimed to investigate how social media can be used as a source of RWD to support regulatory decision making and drug development in the pharmaceutical industry. Specifically, we aimed to explore the data pipeline and examine how social media-derived RWD can align with regulatory guidance from the US Food and Drug Administration and with industry needs.
METHODS: A machine learning pipeline was developed to extract patient insights related to anticoagulants from X (Twitter) data. These findings were then analysed from an industry perspective and complemented by interviews with professionals from a pharmaceutical company.
RESULTS: The analysis reveals several use cases where RWD derived from social media can be beneficial, particularly in generating hypotheses around patient and therapeutic area needs. We also note certain limitations of social media data, particularly around inferring causality.
CONCLUSIONS: Social media display considerable potential as a source of RWD for guiding efforts in pharmaceutical drug development and pre-approval settings. Although further regulatory guidance on the use of social media for RWD is needed to encourage its use, regulatory and technological developments suggest that at least exploratory uses in drug development are warranted.
Affiliation(s)
- Didrik Wessel
- Copenhagen Business School, Frederiksberg, Denmark.
- , Nørrebrogade 18A 3TH, 2200, Copenhagen N, Denmark.
2
Schmidt L, Mohamed S, Meader N, Bacardit J, Craig D. Automated data analysis of unstructured grey literature in health research: A mapping review. Res Synth Methods 2024; 15:178-197. [PMID: 38115736 DOI: 10.1002/jrsm.1692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 11/07/2023] [Accepted: 11/22/2023] [Indexed: 12/21/2023]
Abstract
The amount of grey literature and 'softer' intelligence from social media or websites is vast. Given the long lead times of producing high-quality peer-reviewed health information, this creates demand for new ways to provide prompt input for secondary research. To our knowledge, this is the first review of automated data extraction methods or tools for health-related grey literature and soft data, with a focus on (semi)automating horizon scans, health technology assessments (HTA), evidence maps, or other literature reviews. We searched six databases to cover both health- and computer-science literature. After deduplication, 10% of the search results were screened by two reviewers; the remainder was single-screened up to an estimated 95% sensitivity, and screening was stopped early after an additional 1000 results had been screened with no new includes. All full texts were retrieved, screened, and extracted by a single reviewer, and 10% were checked in duplicate. We included 84 papers covering automation for health-related social media, internet fora, news, patents, government agencies and charities, or trial registers. From each paper, we extracted data about important functionalities for users of the tool or method; information about the level of support and reliability; and information about practical challenges and research gaps. Poor availability of code, data, and usable tools leads to low transparency regarding performance and to duplication of work. Financial implications, scalability, integration into downstream workflows, and meaningful evaluations should be carefully planned before starting to develop a tool, given the vast amounts of data and the opportunities those tools offer to expedite research.
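The early-stopping rule mentioned in this abstract (stop once a run of consecutive screened results yields no new includes) can be sketched as below. This is only an illustration under stated assumptions: the function name `screen_until_stable`, the `patience` parameter, and the toy decisions are hypothetical, and the sketch does not model the estimated-sensitivity criterion the authors also used.

```python
from typing import Iterable, List


def screen_until_stable(decisions: Iterable[bool], patience: int = 1000) -> List[int]:
    """Return the indices of included records, stopping once `patience`
    consecutive records have been screened without a new include."""
    included: List[int] = []
    since_last_include = 0
    for index, is_include in enumerate(decisions):
        if is_include:
            included.append(index)
            since_last_include = 0  # a new include resets the counter
        else:
            since_last_include += 1
            if since_last_include >= patience:
                break  # stopping rule reached
    return included


# Toy run with a small patience value (hypothetical screening decisions).
toy_decisions = [True, False, True, False, False, False, True]
toy_included = screen_until_stable(toy_decisions, patience=3)
```

In the toy run, the final include is never reached because three consecutive excludes trigger the stop first, which is exactly the trade-off such a rule accepts.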
Affiliation(s)
- Lena Schmidt
- National Institute for Health and Care Research Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Saleh Mohamed
- National Institute for Health and Care Research Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Nick Meader
- National Institute for Health and Care Research Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Jaume Bacardit
- Interdisciplinary Computing and Complex BioSystems (ICOS) Research Group, School of Computing, Newcastle University, Newcastle upon Tyne, UK
- Dawn Craig
- National Institute for Health and Care Research Innovation Observatory, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
3
Chen MC, Huang TY, Chen TY, Boonyarat P, Chang YC. Clinical narrative-aware deep neural network for emergency department critical outcome prediction. J Biomed Inform 2023; 138:104284. [PMID: 36632861 DOI: 10.1016/j.jbi.2023.104284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 11/10/2022] [Accepted: 01/07/2023] [Indexed: 01/11/2023]
Abstract
Since early identification of potentially critical patients in the Emergency Department (ED) can lower mortality and morbidity, this study seeks to develop a machine learning model capable of predicting possible critical outcomes based on the history and vital signs routinely collected at triage. We also compare the predictive performance of the machine learning model with that of emergency physicians. Predictors, including chief complaints, present illness, past medical history, vital signs, and demographic data of adult patients (aged ≥ 18 years) visiting the ED at Shuang-Ho Hospital in New Taipei City, Taiwan, were extracted from the hospital's electronic health records. Critical outcomes were defined as in-hospital cardiac arrest (IHCA) or intensive care unit (ICU) admission. A clinical narrative-aware deep neural network was developed to handle the text-intensive data and standardized numerical data, and was compared against other machine learning models. Emergency physicians were then asked to predict possible clinical outcomes of thirty visits extracted randomly from our dataset, and their results were compared with those of our machine learning model. A total of 4,308 (2.5%) of the 171,275 adult ED visits included in this study resulted in critical outcomes. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) of our proposed prediction model are 0.874 and 0.207, respectively. The model not only outperforms the other machine learning models but also has better sensitivity (0.95 vs 0.41) and accuracy (0.90 vs 0.67) than the emergency physicians. This model is sensitive and accurate in predicting critical outcomes and highlights the potential of predictive analytics to support post-triage decision-making.
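To make the reported metrics concrete, the following minimal sketch computes sensitivity, accuracy, and AUROC from labels and scores. The labels, scores, and the 0.5 threshold are hypothetical toy values, not data from the study; AUROC is computed here through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one). For context, a chance-level classifier has an expected AUPRC roughly equal to the positive-class prevalence, which is about 0.025 in this cohort.

```python
from typing import Sequence, Tuple


def sensitivity_and_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); accuracy = (TP + TN) / N."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn), (tp + tn) / len(y_true)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = critical outcome, 0 = no critical outcome.
labels = [1, 0, 0, 1, 0, 0, 0, 1]
risk_scores = [0.9, 0.2, 0.4, 0.35, 0.1, 0.6, 0.3, 0.8]
predictions = [1 if s >= 0.5 else 0 for s in risk_scores]  # assumed cut-off

toy_sensitivity, toy_accuracy = sensitivity_and_accuracy(labels, predictions)
toy_auroc = auroc(labels, risk_scores)
```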
Affiliation(s)
- Min-Chen Chen
- Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
- Ting-Yun Huang
- Taipei Medical University Shuang-Ho Hospital Ministry of Health and Welfare, New Taipei City, Taiwan
- Tzu-Ying Chen
- Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
- Panchanit Boonyarat
- Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
- Yung-Chun Chang
- Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan; Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, Taiwan
4
Zhang A, Yu H, Zhou S, Huan Z, Yang X. Instance weighted SMOTE by indirectly exploring the data distribution. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108919] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
5
Identifying Adverse Drug Reaction-Related Text from Social Media: A Multi-View Active Learning Approach with Various Document Representations. Information 2022. [DOI: 10.3390/info13040189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023] [Open access]
Abstract
Adverse drug reactions (ADRs) are a major public health issue. Identifying text that mentions ADRs in a large volume of social media data is important. However, two challenges must be addressed for high-performing detection of ADR-related text: the data imbalance problem and the need to use data-driven and handcrafted information simultaneously. Therefore, we propose an approach named multi-view active learning using domain-specific and data-driven document representations (MVAL4D), which aims to enhance predictive capability and reduce the amount of labeled data required. Specifically, a new view-generation mechanism is proposed to generate multiple views by simultaneously exploiting document representations obtained from handcrafted feature engineering and from deep learning methods. Moreover, unlike previous active learning studies in which all instances are chosen using the same selection criterion, MVAL4D adopts different criteria (i.e., confidence and informativeness) to select potentially positive instances and potentially negative instances for manual annotation. The experimental results verify the effectiveness of MVAL4D. The proposed approach can be generalized to many other text classification tasks. Moreover, it can offer a solid foundation for the ADR mention extraction task and improve the feasibility of monitoring drug safety using social media data.
6
Huang JY, Lee WP, Lee KD. Predicting Adverse Drug Reactions from Social Media Posts: Data Balance, Feature Selection and Deep Learning. Healthcare (Basel) 2022; 10:618. [PMID: 35455795 PMCID: PMC9024774 DOI: 10.3390/healthcare10040618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Revised: 03/22/2022] [Accepted: 03/23/2022] [Indexed: 11/16/2022] [Open access]
Abstract
Social forums offer many new channels for collecting patients' opinions to construct predictive models of adverse drug reactions (ADRs) for post-marketing surveillance. However, due to the characteristics of social posts, many challenges remain when deriving such models, chiefly data sparseness, high-dimensional features, and term diversity. To tackle these issues in identifying ADRs from social posts, we perform data analytics from the perspectives of data balance, feature selection, and feature learning. We also design a comprehensive experimental analysis to investigate the performance of different data processing techniques and data modeling methods. Most importantly, we present a deep learning-based approach that adopts the BERT (Bidirectional Encoder Representations from Transformers) model with a new batch-wise adaptive strategy to enhance predictive performance. A series of experiments was conducted to evaluate the machine learning methods with both manual and automated feature engineering processes. The results show that both types of methods, each with its own advantages, are effective in ADR prediction. In contrast to the traditional machine learning methods, our feature learning approach performs the required task automatically, saving the manual effort otherwise needed for a large number of experiments.
Affiliation(s)
- Jhih-Yuan Huang
- Department of Information Management, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- Wei-Po Lee (corresponding author)
- Department of Information Management, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- King-Der Lee
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
7
Using Machine Learning for Pharmacovigilance: A Systematic Review. Pharmaceutics 2022; 14:266. [PMID: 35213998 PMCID: PMC8924891 DOI: 10.3390/pharmaceutics14020266] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/17/2021] [Revised: 01/13/2022] [Accepted: 01/21/2022] [Indexed: 02/04/2023] [Open access]
Abstract
Pharmacovigilance is a science that involves the ongoing monitoring of adverse drug reactions to existing medicines. Traditional approaches in this field can be expensive and time-consuming. The application of natural language processing (NLP) to analyze user-generated content is hypothesized to be an effective supplemental source of evidence. In this systematic review, a broad and multi-disciplinary literature search was conducted involving four databases. A total of 5318 publications were initially found. Studies were considered relevant if they reported on the application of NLP to understand user-generated text for pharmacovigilance. A total of 16 relevant publications were included in this systematic review. All studies were evaluated to have medium reliability and validity. For all types of drugs, 14 publications reported positive findings with respect to the identification of adverse drug reactions, providing consistent evidence that natural language processing can be used effectively and accurately on user-generated textual content published on the Internet to identify adverse drug reactions for the purpose of pharmacovigilance. The evidence presented in this review suggests that the analysis of textual data has the potential to complement the traditional system of pharmacovigilance.
8
Miscommunication in the age of communication: A crowdsourcing framework for symptom surveillance at the time of pandemics. Int J Med Inform 2021; 151:104486. [PMID: 33991885 PMCID: PMC8111883 DOI: 10.1016/j.ijmedinf.2021.104486] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/30/2020] [Revised: 04/22/2021] [Accepted: 05/07/2021] [Indexed: 11/20/2022]
Abstract
OBJECTIVE: There was a significant delay in compiling a complete list of the symptoms of COVID-19 during the 2020 outbreak of the disease. When little is known about the symptoms of a novel disease, interventions to contain its spread are suboptimal because people experiencing symptoms not yet known to be related to the disease may not limit their social activities. Our goal was to understand whether users' social media postings about the symptoms of novel diseases could be used to develop a complete list of the disease symptoms in a shorter time.
MATERIALS AND METHODS: We used the Twitter API to download tweets that contained 'coronavirus', 'COVID-19', and 'symptom'. After data cleaning, the resulting dataset consisted of over 95,000 unique English tweets posted between January 17, 2020 and March 15, 2020 that contained references to the symptoms of COVID-19. We analyzed these data using network and time series methods.
RESULTS: We found that a complete list of the symptoms of COVID-19 could have been compiled by mid-March 2020, before most states in the U.S. announced a lockdown and about 75 days earlier than the list was completed on the CDC's website.
DISCUSSION & CONCLUSION: We conclude that national and international health agencies should use the crowd-sourced intelligence obtained from social media to develop effective symptom surveillance systems in the early stages of pandemics. We propose a high-level framework that facilitates the collection, analysis, and dissemination of information about the symptoms of novel diseases that is posted in various languages and on different social media platforms.
9
Dai HJ, Wang FD, Chen CW, Su CH, Wu CS, Jonnagaddala J. Cohort selection for clinical trials using multiple instance learning. J Biomed Inform 2020; 107:103438. [PMID: 32360937 DOI: 10.1016/j.jbi.2020.103438] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/23/2019] [Revised: 02/29/2020] [Accepted: 04/27/2020] [Indexed: 10/24/2022]
Abstract
Identifying patients eligible for clinical trials using electronic health records (EHRs) is a challenging task that usually requires a comprehensive analysis of information stored in multiple EHRs of a patient. The goal of this study is to investigate different methods and their effectiveness in identifying patients who meet specific eligibility criteria based on their longitudinal records. We used the unstructured dataset released by the n2c2 cohort selection for clinical trials track, in which each patient has 2-5 records manually annotated against thirteen pre-defined selection criteria. Unlike other studies, we formulated the problem as a multiple instance learning (MIL) task and compared its performance with that of rule-based and single instance-based classifiers. Our official best run achieved an average micro-F score of 0.8765, which ranked among the top ten results in the track. Further experiments demonstrated that the MIL-based classifiers consistently perform better than their single-instance counterparts on criteria that require an overall comprehension of the information distributed among all of the patient's EHRs. Rule-based and single instance learning approaches performed better on criteria that do not require consideration of several factors across records. This study demonstrated that cohort selection using longitudinal patient records can be formulated as a MIL problem. Our results show that the MIL-based classifiers supplement the rule-based methods and provide better results than the single instance learning approaches.
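The multiple instance learning framing can be illustrated with a minimal sketch: each patient is a "bag" of records (instances), and under the standard MIL assumption a bag is positive if at least one instance is positive. The code below is not the authors' classifier; the keyword-based scorer, the example criterion, and the patient records are hypothetical placeholders chosen only to show bag-level aggregation.

```python
from typing import Callable, Dict, List


def bag_prediction(records: List[str],
                   score_record: Callable[[str], float],
                   threshold: float = 0.5) -> bool:
    """Standard MIL assumption: a bag (patient) is positive if its
    highest-scoring instance (record) reaches the threshold."""
    best_score = max((score_record(r) for r in records), default=0.0)
    return best_score >= threshold


def keyword_score(record: str) -> float:
    """Hypothetical instance scorer standing in for a trained model."""
    return 1.0 if "aspirin" in record.lower() else 0.0


# Hypothetical longitudinal records for two patients.
patients: Dict[str, List[str]] = {
    "p1": ["No medications listed.", "Started aspirin 81 mg daily."],
    "p2": ["Denies any medication use.", "Follow-up visit, no changes."],
}
bag_labels = {pid: bag_prediction(recs, keyword_score) for pid, recs in patients.items()}
```

A single-instance classifier would instead label each record separately, which is where criteria that depend on information spread across records become harder to handle.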
Affiliation(s)
- Hong-Jie Dai
- Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan, ROC; School of Post-Baccalaureate Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan, ROC; National Institute of Cancer Research, National Health Research Institutes, Taiwan, ROC
- Feng-Duo Wang
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, ROC
- Chih-Wei Chen
- Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan, ROC
- Chu-Hsien Su
- Department of Psychiatry, National Taiwan University Hospital, Taipei, Taiwan, ROC
- Chi-Shin Wu
- Department of Psychiatry, National Taiwan University Hospital, Taipei, Taiwan, ROC
10
Wu XW, Yang HB, Yuan R, Long EW, Tong RS. Predictive models of medication non-adherence risks of patients with T2D based on multiple machine learning algorithms. BMJ Open Diabetes Res Care 2020; 8:e001055. [PMID: 32156739 PMCID: PMC7064141 DOI: 10.1136/bmjdrc-2019-001055] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/18/2019] [Revised: 01/08/2020] [Accepted: 01/16/2020] [Indexed: 12/29/2022] [Open access]
Abstract
OBJECTIVE: Medication adherence plays a key role in type 2 diabetes (T2D) care. Identifying patients at high risk of non-adherence supports individualized management, especially in China, where medical resources are relatively insufficient. However, models with good predictive capabilities have not been studied. This study aims to assess multiple machine learning algorithms and select a model that can be used to predict patients' non-adherence risks.
METHODS: A real-world registration study was conducted at Sichuan Provincial People's Hospital from 1 April 2018 to 30 March 2019. Data on demographics, disease and treatment, diet and exercise, mental status, and treatment adherence of patients with T2D were obtained through face-to-face questionnaires. The medication possession ratio was used to evaluate patients' medication adherence status. Thirty machine learning algorithms were applied for modeling, including Bayesian network, Neural Net, and support vector machine, and balanced sampling, data imputation, binning, and feature selection methods were evaluated by the area under the receiver operating characteristic curve (AUC). We used two-way cross-validation to ensure the accuracy of model evaluation, and we performed a posteriori test on the sample size based on the trend of the AUC as the sample size increases.
RESULTS: A total of 401 patients out of 630 candidates were investigated, of whom 85 (21.20%) were classified as having poor adherence. A total of 16 variables were selected as potential variables for modeling, and 300 models were built based on 30 machine learning algorithms. Among these algorithms, the best-performing one achieved an AUC of 0.866±0.082. Imputation, oversampling, and a larger sample size help improve predictive ability.
CONCLUSIONS: An accurate and sensitive adherence prediction model based on real-world registration data was established after evaluating data imputation, balanced sampling, and related techniques, and may provide a technical tool for individualized diabetes care.
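The medication possession ratio (MPR) used as the adherence measure is commonly defined as the total days' supply dispensed during an observation period divided by the number of days in that period. The sketch below follows that common definition, which the abstract itself does not spell out; the fill records, the observation window, and the 0.8 cut-off for poor adherence are assumptions for illustration and may differ from the study's protocol.

```python
from datetime import date
from typing import List, Tuple


def medication_possession_ratio(fills: List[Tuple[date, int]],
                                period_start: date,
                                period_end: date) -> float:
    """Days' supply dispensed within the period divided by days in the period."""
    period_days = (period_end - period_start).days + 1  # inclusive window
    supplied = sum(days for fill_date, days in fills
                   if period_start <= fill_date <= period_end)
    return supplied / period_days


# Hypothetical refill history: (fill date, days' supply).
toy_fills = [(date(2018, 4, 1), 30), (date(2018, 5, 5), 30)]
toy_mpr = medication_possession_ratio(toy_fills, date(2018, 4, 1), date(2018, 6, 29))

ADHERENCE_CUTOFF = 0.8  # assumed threshold, not taken from the abstract
toy_poor_adherence = toy_mpr < ADHERENCE_CUTOFF
```

Here 60 days of supply over a 90-day window gives an MPR of about 0.67, which falls below the assumed cut-off.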
Affiliation(s)
- Xing-Wei Wu
- Personalized Drug Therapy Key Laboratory of Sichuan Province, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Department of Pharmacy, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu, China
- Heng-Bo Yang
- School of Pharmacy, Chengdu Medical College, Chengdu, China
- Rong Yuan
- Endocrine Department, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu, China
- En-Wu Long
- Personalized Drug Therapy Key Laboratory of Sichuan Province, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Department of Pharmacy, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu, China
- Rong-Sheng Tong
- Personalized Drug Therapy Key Laboratory of Sichuan Province, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Department of Pharmacy, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu, China
11
12
Dai HJ, Su CH, Lee YQ, Zhang YC, Wang CK, Kuo CJ, Wu CS. Deep Learning-Based Natural Language Processing for Screening Psychiatric Patients. Front Psychiatry 2020; 11:533949. [PMID: 33584354 PMCID: PMC7874001 DOI: 10.3389/fpsyt.2020.533949] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/10/2020] [Accepted: 12/10/2020] [Indexed: 12/02/2022] [Open access]
Abstract
The introduction of pre-trained language models in natural language processing (NLP) based on deep learning and the availability of electronic health records (EHRs) present a great opportunity to transfer the "knowledge" learned from data in the general domain to enable the analysis of unstructured textual data in clinical domains. This study explored the feasibility of applying NLP to a small EHR dataset to investigate the power of transfer learning to facilitate the process of patient screening in psychiatry. A total of 500 patients were randomly selected from a medical center database. Three annotators with clinical experience reviewed the notes to make diagnoses for major/minor depression, bipolar disorder, schizophrenia, and dementia, forming a small and highly imbalanced corpus. Several state-of-the-art NLP methods based on deep learning, along with pre-trained models based on shallow or deep transfer learning, were adapted to develop models to classify the aforementioned diseases. We hypothesized that the models relying on transferred knowledge would outperform the models learned from scratch. The experimental results demonstrated that the models with the pre-trained techniques outperformed the models without transferred knowledge by micro-avg. and macro-avg. F-scores of 0.11 and 0.28, respectively. Our results also suggested that using the feature dependency strategy to build multi-labeling models, instead of problem transformation, is superior considering its higher performance and simplicity in the training process.
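The difference between micro- and macro-averaged F-scores matters on an imbalanced corpus like the one described here, so a small worked sketch may help. Micro-averaging pools true positives, false positives, and false negatives across labels, so frequent labels dominate; macro-averaging takes the unweighted mean of per-label F-scores, so rare labels count equally. The per-label counts below are hypothetical and are not taken from the study.

```python
from typing import Dict, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float]:
    """counts maps label -> (tp, fp, fn). Returns (micro F1, macro F1)."""
    total_tp = sum(tp for tp, _, _ in counts.values())
    total_fp = sum(fp for _, fp, _ in counts.values())
    total_fn = sum(fn for _, _, fn in counts.values())
    micro = f1(total_tp, total_fp, total_fn)
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    return micro, macro


# Hypothetical counts: one frequent label and one rare, poorly predicted label.
toy_counts = {"depression": (40, 10, 10), "dementia": (1, 3, 4)}
toy_micro, toy_macro = micro_macro_f1(toy_counts)
```

In this toy case the macro score is far lower than the micro score because the rare label drags it down. That is one plausible reason an improvement on rare classes could show up more strongly in macro-averaged than in micro-averaged results, though the abstract does not say so explicitly.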
Affiliation(s)
- Hong-Jie Dai
- Intelligent System Laboratory, Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan; School of Post-Baccalaureate Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan; National Institute of Cancer Research, National Health Research Institutes, Tainan, Taiwan
- Chu-Hsien Su
- Department of Psychiatry, National Taiwan University Hospital, Taipei, Taiwan
- You-Qian Lee
- Intelligent System Laboratory, Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
- You-Chen Zhang
- Intelligent System Laboratory, Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
- Chen-Kai Wang
- Big Data Laboratory, Chunghwa Telecom Laboratories, Taoyuan, Taiwan
- Chian-Jue Kuo
- Taipei City Psychiatric Center, Taipei City Hospital, Taipei, Taiwan; Department of Psychiatry, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Chi-Shin Wu
- Department of Psychiatry, National Taiwan University Hospital, Taipei, Taiwan