Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Follett L, Geletta S, Laugerman M. Quantifying risk associated with clinical trial termination: A text mining approach. Inf Process Manag 2019. [DOI: 10.1016/j.ipm.2018.11.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

For:	Follett L, Geletta S, Laugerman M. Quantifying risk associated with clinical trial termination: A text mining approach. Inf Process Manag 2019. [DOI: 10.1016/j.ipm.2018.11.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Number

Cited by Other Article(s)

Ferdowsi S, Knafou J, Borissov N, Vicente Alvarez D, Mishra R, Amini P, Teodoro D. Deep learning-based risk prediction for interventional clinical trials based on protocol design: A retrospective study. PATTERNS (NEW YORK, N.Y.) 2023;4:100689. [PMID: 36960445 PMCID: PMC10028430 DOI: 10.1016/j.patter.2023.100689] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/29/2022] [Revised: 11/07/2022] [Accepted: 01/16/2023] [Indexed: 02/12/2023]

Improving clinical trial design using interpretable machine learning based prediction of early trial termination. Sci Rep 2023;13:121. [PMID: 36599880 PMCID: PMC9813129 DOI: 10.1038/s41598-023-27416-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2022] [Accepted: 01/02/2023] [Indexed: 01/06/2023] Open

Eysenbach G, Šuster S, Baldwin T, Verspoor K. Predicting Publication of Clinical Trials Using Structured and Unstructured Data: Model Development and Validation Study. J Med Internet Res 2022;24:e38859. [PMID: 36563029 PMCID: PMC9823568 DOI: 10.2196/38859] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Revised: 10/14/2022] [Accepted: 11/16/2022] [Indexed: 12/24/2022] Open

Abstract

BACKGROUND

Publication of registered clinical trials is a critical step in the timely dissemination of trial findings. However, a significant proportion of completed clinical trials are never published, motivating the need to analyze the factors behind success or failure to publish. This could inform study design, help regulatory decision-making, and improve resource allocation. It could also enhance our understanding of bias in the publication of trials and publication trends based on the research direction or strength of the findings. Although the publication of clinical trials has been addressed in several descriptive studies at an aggregate level, there is a lack of research on the predictive analysis of a trial's publishability given an individual (planned) clinical trial description.

OBJECTIVE

We aimed to conduct a study that combined structured and unstructured features relevant to publication status in a single predictive approach. Established natural language processing techniques as well as recent pretrained language models enabled us to incorporate information from the textual descriptions of clinical trials into a machine learning approach. We were particularly interested in whether and which textual features could improve the classification accuracy for publication outcomes.

METHODS

In this study, we used metadata from ClinicalTrials.gov (a registry of clinical trials) and MEDLINE (a database of academic journal articles) to build a data set of clinical trials (N=76,950) that contained the description of a registered trial and its publication outcome (27,702/76,950, 36% published and 49,248/76,950, 64% unpublished). This is the largest data set of its kind, which we released as part of this work. The publication outcome in the data set was identified from MEDLINE based on clinical trial identifiers. We carried out a descriptive analysis and predicted the publication outcome using 2 approaches: a neural network with a large domain-specific language model and a random forest classifier using a weighted bag-of-words representation of text.

RESULTS

First, our analysis of the newly created data set corroborates several findings from the existing literature regarding attributes associated with a higher publication rate. Second, a crucial observation from our predictive modeling was that the addition of textual features (eg, eligibility criteria) offers consistent improvements over using only structured data (F₁-score=0.62-0.64 vs F₁-score=0.61 without textual features). Both pretrained language models and more basic word-based representations provide high-utility text representations, with no significant empirical difference between the two.

CONCLUSIONS

Different factors affect the publication of a registered clinical trial. Our approach to predictive modeling combines heterogeneous features, both structured and unstructured. We show that methods from natural language processing can provide effective textual features to enable more accurate prediction of publication success, which has not been explored for this task previously.

Collapse

A Local Discrete Text Data Mining Method in High-Dimensional Data Space. INT J COMPUT INT SYS 2022. [DOI: 10.1007/s44196-022-00109-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023] Open

A Novel Text Classification Technique Using Improved Particle Swarm Optimization: A Case Study of Arabic Language. FUTURE INTERNET 2022. [DOI: 10.3390/fi14070194] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open

Alharbey R, Kim JI, Daud A, Song M, Alshdadi AA, Hayat MK. Indexing important drugs from medical literature. Scientometrics 2022. [DOI: 10.1007/s11192-022-04340-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]

On Graph Construction for Classification of Clinical Trials Protocols Using Graph Neural Networks. Artif Intell Med 2022. [DOI: 10.1007/978-3-031-09342-5_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]

Kim B, Jang YJ, Cho HR, Kim SY, Jeong JE, Shim MK, Kim MG. Predicting completion of clinical trials in pregnant women: Cox proportional hazard and neural network models. Clin Transl Sci 2021;15:691-699. [PMID: 34735737 PMCID: PMC8932703 DOI: 10.1111/cts.13187] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2021] [Revised: 09/25/2021] [Accepted: 10/21/2021] [Indexed: 12/01/2022] Open

Natural language processing in law: Prediction of outcomes in the higher courts of Turkey. Inf Process Manag 2021. [DOI: 10.1016/j.ipm.2021.102684] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]

Understanding and predicting COVID-19 clinical trial completion vs. cessation. PLoS One 2021;16:e0253789. [PMID: 34252108 PMCID: PMC8274906 DOI: 10.1371/journal.pone.0253789] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2021] [Accepted: 06/12/2021] [Indexed: 11/19/2022] Open

Amara A, Hadj Taieb MA, Ben Aouicha M. Multilingual topic modeling for tracking COVID-19 trends based on Facebook data analysis. APPL INTELL 2021;51:3052-3073. [PMID: 34764585 PMCID: PMC7881346 DOI: 10.1007/s10489-020-02033-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/21/2020] [Indexed: 11/04/2022]

Predictive modeling of clinical trial terminations using feature engineering and embedding learning. Sci Rep 2021;11:3446. [PMID: 33568706 PMCID: PMC7876037 DOI: 10.1038/s41598-021-82840-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2020] [Accepted: 01/25/2021] [Indexed: 11/16/2022] Open

Geletta S, Follett L, Laugerman M. Latent Dirichlet Allocation in predicting clinical trial terminations. BMC Med Inform Decis Mak 2019;19:242. [PMID: 31775737 PMCID: PMC6882341 DOI: 10.1186/s12911-019-0973-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2019] [Accepted: 11/08/2019] [Indexed: 11/10/2022] Open

Abstract

Background

This study used natural language processing (NLP) and machine learning (ML) techniques to identify reliable patterns from within research narrative documents to distinguish studies that complete successfully, from the ones that terminate. Recent research findings have reported that at least 10 % of all studies that are funded by major research funding agencies terminate without yielding useful results. Since it is well-known that scientific studies that receive funding from major funding agencies are carefully planned, and rigorously vetted through the peer-review process, it was somewhat daunting to us that study-terminations are this prevalent. Moreover, our review of the literature about study terminations suggested that the reasons for study terminations are not well understood. We therefore aimed to address that knowledge gap, by seeking to identify the factors that contribute to study failures.

Method

We used data from the clinicialTrials.gov repository, from which we extracted both structured data (study characteristics), and unstructured data (the narrative description of the studies). We applied natural language processing techniques to the unstructured data to quantify the risk of termination by identifying distinctive topics that are more frequently associated with trials that are terminated and trials that are completed. We used the Latent Dirichlet Allocation (LDA) technique to derive 25 “topics” with corresponding sets of probabilities, which we then used to predict study-termination by utilizing random forest modeling. We fit two distinct models – one using only structured data as predictors and another model with both structured data and the 25 text topics derived from the unstructured data.

Results

In this paper, we demonstrate the interpretive and predictive value of LDA as it relates to predicting clinical trial failure. The results also demonstrate that the combined modeling approach yields robust predictive probabilities in terms of both sensitivity and specificity, relative to a model that utilizes the structured data alone.

Conclusions

Our study demonstrated that the use of topic modeling using LDA significantly raises the utility of unstructured data in better predicating the completion vs. termination of studies. This study sets the direction for future research to evaluate the viability of the designs of health studies.

Collapse