1
Wright EC, Kapuria D, Ben-Yakov G, Sharma D, Basu D, Cho MH, Abijo T, Wilkins KJ. Time to Publication for Randomized Clinical Trials Presented as Abstracts at Three Gastroenterology and Hepatology Conferences in 2017. Gastro Hep Advances 2023; 2:370-379. [PMID: 36938381] [PMCID: PMC10022591] [DOI: 10.1016/j.gastha.2022.12.003]
Abstract
Background & Aims Results of randomized clinical trials are often first presented as conference abstracts, but these abstracts may be difficult to find, and trial results included in the abstract may not be followed by subsequent journal publications. In a review of abstracts submitted to eight major medical and surgical conferences in 2017, we identified 237 abstracts reporting primary results of randomized clinical trials accepted for presentation at three major gastroenterology and hepatology conferences. The aims of this new analysis were to determine the publication rate for these abstracts and the proportion of publications that included trial registration numbers in the publication abstract. Methods Clinical trial registries, PubMed, Europe PMC, and Google Scholar were searched through November 1, 2021, for publications reporting trial results for the selected abstracts. Publications were reviewed to determine if they included a trial registration number and if the registration number was in the abstract. Results Publications were found for 157 abstracts (66%) within four years of the conference. Publications were found more frequently for the 194 abstracts reporting results of registered trials (144, 74%) than for the 43 abstracts reporting unregistered trials (13, 30%), but only 67% of these 144 publications included the registration number in the publication abstract. Ten unpublished trials had summary results posted on ClinicalTrials.gov. Conclusions Clinical trial results could be more accessible if all trials were registered, authors included registration numbers in both conference and journal abstracts, and journal editors required the inclusion of registration numbers in publication abstracts for registered clinical trials.
Affiliation(s)
- Elizabeth C. Wright
- Office of the Director, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Devika Kapuria
- Department of Gastroenterology, Washington University in St. Louis, St. Louis, Missouri
- Gil Ben-Yakov
- The Center for Liver Diseases, Sheba Medical Center, Tel-Hashomer, Ramat Gan, Israel
- Disha Sharma
- Liver Diseases Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Dev Basu
- MedStar Good Samaritan Hospital, Baltimore, Maryland
- Min Ho Cho
- Department of Medicine, Baystate Medical Center, Springfield, Massachusetts
- Tomilowo Abijo
- Office of the Director, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Kenneth J. Wilkins
- Office of the Director, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
2
Šuster S, Baldwin T, Verspoor K. Predicting Publication of Clinical Trials Using Structured and Unstructured Data: Model Development and Validation Study. J Med Internet Res 2022; 24:e38859. [PMID: 36563029] [PMCID: PMC9823568] [DOI: 10.2196/38859]
Abstract
BACKGROUND Publication of registered clinical trials is a critical step in the timely dissemination of trial findings. However, a significant proportion of completed clinical trials are never published, motivating the need to analyze the factors behind success or failure to publish. This could inform study design, help regulatory decision-making, and improve resource allocation. It could also enhance our understanding of bias in the publication of trials and publication trends based on the research direction or strength of the findings. Although the publication of clinical trials has been addressed in several descriptive studies at an aggregate level, there is a lack of research on the predictive analysis of a trial's publishability given an individual (planned) clinical trial description. OBJECTIVE We aimed to conduct a study that combined structured and unstructured features relevant to publication status in a single predictive approach. Established natural language processing techniques as well as recent pretrained language models enabled us to incorporate information from the textual descriptions of clinical trials into a machine learning approach. We were particularly interested in whether and which textual features could improve the classification accuracy for publication outcomes. METHODS In this study, we used metadata from ClinicalTrials.gov (a registry of clinical trials) and MEDLINE (a database of academic journal articles) to build a data set of clinical trials (N=76,950) that contained the description of a registered trial and its publication outcome (27,702/76,950, 36% published and 49,248/76,950, 64% unpublished). This is the largest data set of its kind, which we released as part of this work. The publication outcome in the data set was identified from MEDLINE based on clinical trial identifiers. We carried out a descriptive analysis and predicted the publication outcome using 2 approaches: a neural network with a large domain-specific language model and a random forest classifier using a weighted bag-of-words representation of text. RESULTS First, our analysis of the newly created data set corroborates several findings from the existing literature regarding attributes associated with a higher publication rate. Second, a crucial observation from our predictive modeling was that the addition of textual features (eg, eligibility criteria) offers consistent improvements over using only structured data (F1-score=0.62-0.64 vs F1-score=0.61 without textual features). Both pretrained language models and more basic word-based representations provide high-utility text representations, with no significant empirical difference between the two. CONCLUSIONS Different factors affect the publication of a registered clinical trial. Our approach to predictive modeling combines heterogeneous features, both structured and unstructured. We show that methods from natural language processing can provide effective textual features to enable more accurate prediction of publication success, which has not been explored for this task previously.
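The modelling setup described above can be illustrated with a short, hedged sketch: a random forest over a weighted bag-of-words (TF-IDF) representation of trial text combined with structured registry fields. This is not the authors' released code; the column names and toy records below are assumptions made for illustration.

```python
# Illustrative only: scikit-learn pipeline combining TF-IDF text features with
# structured registry fields, classified with a random forest. Column names and
# the two toy records are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

trials = pd.DataFrame({
    "eligibility_criteria": [
        "Adults 18-65 with type 2 diabetes and HbA1c above 7%",
        "Children 6-12 with mild persistent asthma",
    ],
    "phase": ["Phase 3", "Phase 2"],
    "enrollment": [450, 80],
    "published": [1, 0],  # outcome: a linked MEDLINE publication was found (or not)
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "eligibility_criteria"),          # weighted bag of words
    ("phase", OneHotEncoder(handle_unknown="ignore"), ["phase"]),  # structured field
    ("numeric", "passthrough", ["enrollment"]),
])

model = Pipeline([
    ("features", features),
    ("classifier", RandomForestClassifier(n_estimators=200, random_state=0)),
])
model.fit(trials.drop(columns="published"), trials["published"])
print(model.predict_proba(trials.drop(columns="published"))[:, 1])
```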
Affiliation(s)
- Simon Šuster
- School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
- Timothy Baldwin
- School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
- Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Karin Verspoor
- School of Computing Technologies, RMIT University, Melbourne, Australia
3
Liu S, Bourgeois FT, Dunn AG. Identifying unreported links between ClinicalTrials.gov trial registrations and their published results. Res Synth Methods 2022; 13:342-352. [PMID: 34970844] [PMCID: PMC9090946] [DOI: 10.1002/jrsm.1545]
Abstract
A substantial proportion of trial registrations are not linked to corresponding published articles, limiting downstream analyses and the development of new tools. Our aim was to develop a method for finding articles reporting the results of trials that are registered on ClinicalTrials.gov when they do not include metadata links. We used a set of 27,280 trial registration and article pairs to train and evaluate methods for identifying missing links in both directions: from articles to registrations and from registrations to articles. We trained a classifier with six distance metrics as feature representations to rank the correct article or registration, using recall@K to evaluate performance and compare to baseline methods. When identifying links from registrations to published articles, the classifier ranked the correct article first (recall@1) among 378,048 articles in 80.8% of evaluation cases, compared with 34.9% for the baseline method. Recall@10 was 85.1% compared to 60.7% in the baseline. When predicting links from articles to registrations, recall@1 was 83.4% for the classifier and 39.8% in the baseline. Recall@10 was 89.5% compared to 65.8% in the baseline. The proposed method improves sufficiently on our baseline document similarity method to be feasible for identifying missing links in practice. Given a ClinicalTrials.gov registration, a user checking 10 ranked articles can expect to identify the matching article in at least 85% of cases, if the trial has been published. The proposed method can be used to improve the coupling of ClinicalTrials.gov and PubMed, with applications related to automating systematic review and evidence synthesis processes.
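The recall@K measure used above can be made concrete with a minimal sketch (not the study code): a ranker returns candidate article identifiers for each registration, and recall@K is the fraction of registrations whose true linked article appears in the top K. The identifiers in the toy example are hypothetical.

```python
# Minimal recall@K sketch; registration and article identifiers are invented.
def recall_at_k(ranked_candidates, true_links, k):
    """ranked_candidates: {registration_id: [article_id, ...]} best-first.
    true_links: {registration_id: article_id} gold-standard pairs."""
    hits = sum(
        1 for reg_id, true_article in true_links.items()
        if true_article in ranked_candidates.get(reg_id, [])[:k]
    )
    return hits / len(true_links)

ranked = {"NCT0000001": ["pmid9", "pmid2", "pmid5"], "NCT0000002": ["pmid7", "pmid3"]}
truth = {"NCT0000001": "pmid2", "NCT0000002": "pmid8"}
print(recall_at_k(ranked, truth, 1))   # 0.0 - neither true article ranked first
print(recall_at_k(ranked, truth, 10))  # 0.5 - one true article found in the top 10
```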
Affiliation(s)
- Shifeng Liu
- Faculty of Medicine and Health, The University of Sydney, Biomedical Informatics and Digital Health, School of Medical Sciences, Sydney, New South Wales, Australia
- Florence T Bourgeois
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Adam G Dunn
- Faculty of Medicine and Health, The University of Sydney, Biomedical Informatics and Digital Health, School of Medical Sciences, Sydney, New South Wales, Australia
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
4
Nicholls SG, McDonald S, McKenzie JE, Carroll K, Taljaard M. A review identified challenges distinguishing primary reports of randomized trials for meta-research: A proposal for improved reporting. J Clin Epidemiol 2022; 145:121-125. [PMID: 35081448] [PMCID: PMC9233092] [DOI: 10.1016/j.jclinepi.2022.01.013]
Abstract
Meta-research is the discipline of studying research itself. A core investigative tool in meta-research is the use of systematic or scoping reviews to study the characteristics, methods and reporting of primary research studies. In the context of identifying eligible publications for methodological reviews of randomized controlled trials (RCTs), a challenge is to efficiently distinguish the primary trial report - which reports results for the primary outcome - from other types of reports, including design papers and secondary or supplementary analyses, or what we collectively refer to as non-primary reports. This may not be a straightforward task and may contribute to inefficiencies in the review process. Here, we draw on our recent methodological review of over 13,000 records to identify primary reports of pragmatic RCTs. We offer recommendations to improve the reporting of RCTs to facilitate more efficient identification of primary trial reports. We suggest that future updates to existing CONSORT guidelines include consideration of multiple trial reports and recommendations to clarify the primary or non-primary nature of each report. Our recommendations, together with improved adherence to inclusion of the trial registration number in the abstract and citation of a protocol or previously published primary report, would facilitate the conduct of methodological reviews.
Affiliation(s)
- Stuart G Nicholls
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada
- Steve McDonald
- School of Public Health and Preventive Medicine, Monash University, 553 St Kilda Road, Melbourne, Victoria 3004, Australia
- Joanne E McKenzie
- School of Public Health and Preventive Medicine, Monash University, 553 St Kilda Road, Melbourne, Victoria 3004, Australia
- Kelly Carroll
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada
- Monica Taljaard
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, Canada
5
Smalheiser NR, Holt AW. A web-based tool for automatically linking clinical trials to their publications. J Am Med Inform Assoc 2022; 29:822-830. [PMID: 35020887] [PMCID: PMC9006700] [DOI: 10.1093/jamia/ocab290]
Abstract
OBJECTIVE Evidence synthesis teams, physicians, policy makers, and patients and their families all have an interest in following the outcomes of clinical trials and would benefit from being able to evaluate both the results posted in trial registries and those reported in the publications that arise from them. Manual searching for publications arising from a given trial is a laborious and uncertain process. We sought to create a statistical model to automatically identify PubMed articles likely to report clinical outcome results from each registered trial in ClinicalTrials.gov. MATERIALS AND METHODS A machine learning-based model was trained on known pairs of registered trials and the publications linked to them. Multiple features were constructed based on the degree of matching between the PubMed article metadata and specific fields of the trial registry, as well as matching with the set of publications already known to be linked to that trial. RESULTS Evaluation of the model using known linked articles as the gold standard showed that they tend to be top ranked (median best rank = 1.0), and 91% of them are ranked in the top 10. DISCUSSION Based on this model, we have created a free, public web-based tool that, given any registered trial in ClinicalTrials.gov, presents a ranked list of the PubMed articles in order of estimated probability that they report clinical outcome data from that trial. The tool should greatly facilitate studies of trial outcome results and their relation to the original trial designs.
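A minimal sketch, under stated assumptions rather than the authors' implementation, of the kind of matching features described above: each candidate registration-article pair is scored by how well specific registry fields agree with the PubMed metadata, and a trained classifier would turn such scores into the ranked list. All field names and records are hypothetical.

```python
# Hypothetical matching features for one candidate registration-article pair;
# field names and records are invented for illustration.
def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def pair_features(registration, article):
    return [
        jaccard(registration["brief_title"], article["title"]),
        jaccard(registration["condition"], article["abstract"]),
        float(registration["nct_id"] in article.get("databank_ids", [])),
        float(any(name in article["authors"] for name in registration["investigators"])),
    ]

reg = {"nct_id": "NCT00000000", "brief_title": "Aspirin for migraine prevention",
       "condition": "Migraine", "investigators": ["Smith J"]}
art = {"title": "Aspirin versus placebo for migraine prevention",
       "abstract": "We randomized adults with migraine to aspirin or placebo.",
       "authors": ["Smith J", "Lee K"], "databank_ids": []}
print(pair_features(reg, art))  # four matching scores for this candidate pair
```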
Affiliation(s)
- Neil R Smalheiser
- Department of Psychiatry, University of Illinois College of Medicine, 1601 W. Taylor Street, MC912, Chicago, IL 60612, USA (corresponding author)
- Arthur W Holt
- Department of Psychiatry, University of Illinois College of Medicine, Chicago, Illinois, USA
6
Surian D, Bourgeois FT, Dunn AG. The automation of relevant trial registration screening for systematic review updates: an evaluation study on a large dataset of ClinicalTrials.gov registrations. BMC Med Res Methodol 2021; 21:281. [PMID: 34922458] [PMCID: PMC8684229] [DOI: 10.1186/s12874-021-01485-6]
Abstract
BACKGROUND Clinical trial registries can be used as sources of clinical evidence for systematic review synthesis and updating. Our aim was to evaluate methods for identifying clinical trial registrations that should be screened for inclusion in updates of published systematic reviews. METHODS A set of 4644 clinical trial registrations (ClinicalTrials.gov) included in 1089 systematic reviews (PubMed) was used to evaluate two methods (document similarity and hierarchical clustering) and three representations (L2-normalised TF-IDF, Latent Dirichlet Allocation, and Doc2Vec) for ranking 163,501 completed clinical trials by relevance. Clinical trial registrations were ranked for each systematic review using seeding clinical trials, simulating how new relevant clinical trials could be automatically identified for an update. Performance was measured by the number of clinical trials that needed to be screened to identify all relevant clinical trials. RESULTS Using the document similarity method with TF-IDF feature representation and Euclidean distance metric, all relevant clinical trials for half of the systematic reviews were identified after screening 99 trials (IQR 19 to 491). The best-performing hierarchical clustering approach used Ward agglomerative clustering (with TF-IDF representation and Euclidean distance) and required screening 501 clinical trials (IQR 43 to 4363) to achieve the same result. CONCLUSION An evaluation using a large set of mined links between published systematic reviews and clinical trial registrations showed that document similarity outperformed hierarchical clustering for identifying relevant clinical trials to include in systematic review updates.
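The best-performing configuration reported above (document similarity with L2-normalised TF-IDF and Euclidean distance) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the study code; the registration texts are invented.

```python
# Illustrative ranking step: candidate registrations are ranked by Euclidean
# distance from the seed trials already included in a review, using an
# L2-normalised TF-IDF representation of the registration text.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import pairwise_distances

candidates = [
    "Randomized trial of statin therapy after acute stroke",
    "Phase 2 study of a vaccine adjuvant in healthy adults",
    "Secondary prevention with statins in ischaemic stroke patients",
]
seeds = ["Statins for secondary stroke prevention: a randomized trial"]  # trials already in the review

vectorizer = TfidfVectorizer(norm="l2")                 # L2-normalised TF-IDF
X = vectorizer.fit_transform(candidates + seeds)
cand_vecs, seed_vecs = X[: len(candidates)], X[len(candidates):]

# Rank candidate registrations by their minimum Euclidean distance to any seed trial.
distances = pairwise_distances(cand_vecs, seed_vecs, metric="euclidean").min(axis=1)
for idx in np.argsort(distances):
    print(f"{distances[idx]:.3f}  {candidates[idx]}")
```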
Affiliation(s)
- Didi Surian
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, NSW, Australia
- Florence T Bourgeois
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Adam G Dunn
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- The University of Sydney, Discipline of Biomedical Informatics and Digital Health, School of Medical Sciences, Faculty of Medicine and Health, Sydney, NSW, 2006, Australia
7
Smalheiser NR, Holt AW. New improved Aggregator: predicting which clinical trial articles derive from the same registered clinical trial. JAMIA Open 2020; 3:338-341. [PMID: 33215068] [PMCID: PMC7660960] [DOI: 10.1093/jamiaopen/ooaa042]
Abstract
Objectives To identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence. Materials and Methods We updated our previous model by creating larger, more recent, and more diverse positive and negative training sets consisting of article pairs that were (or not) linked to the same ClinicalTrials.gov trial registry number. Features were extracted from PubMed metadata; pairwise similarity scores were modeled using logistic regression and used to form clusters of articles that are likely to arise from the same registered clinical trial. Results Articles from the same trial were identified with high accuracy (F1 = 0.859), nominally better than the previous model (F1 = 0.843). Predicted clusters showed a low error rate of splitting of 8–11% (ie, when 2 articles belonged to the same trial but were assigned to different clusters). Performance was similar whether only randomized controlled trial articles or a more diverse set of clinical trial articles were processed. Discussion Metadata are surprisingly accurate in predicting when 2 articles derive from the same underlying clinical trial. Conclusion We have continued confidence in the Aggregator tool which can be accessed publicly at http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.
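A minimal sketch of the approach described above, assuming hypothetical metadata features and training pairs (this is not the Aggregator's code): pairwise similarity scores are modelled with logistic regression, and article pairs above a probability threshold are merged into clusters standing in for the same underlying trial.

```python
# Illustrative sketch: pairwise features, a logistic-regression similarity score,
# and a simple union-find merge into clusters. Features and records are invented.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression

# Training rows: [shared-author fraction, title word overlap, same journal]
X_train = np.array([[0.8, 0.6, 1], [0.5, 0.7, 0], [0.0, 0.1, 0], [0.1, 0.2, 1]])
y_train = np.array([1, 1, 0, 0])            # 1 = articles from the same trial
clf = LogisticRegression().fit(X_train, y_train)

articles = ["pmidA", "pmidB", "pmidC"]
pair_feats = {("pmidA", "pmidB"): [0.7, 0.8, 1],
              ("pmidA", "pmidC"): [0.0, 0.1, 0],
              ("pmidB", "pmidC"): [0.1, 0.2, 0]}

parent = {a: a for a in articles}           # union-find to form clusters
def find(a):
    while parent[a] != a:
        a = parent[a]
    return a

for a, b in combinations(articles, 2):
    same_trial_prob = clf.predict_proba([pair_feats[(a, b)]])[0, 1]
    if same_trial_prob > 0.5:
        parent[find(a)] = find(b)

clusters = {}
for a in articles:
    clusters.setdefault(find(a), []).append(a)
print(list(clusters.values()))              # e.g. [['pmidA', 'pmidB'], ['pmidC']]
```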
Affiliation(s)
- Neil R Smalheiser
- Department of Psychiatry, University of Illinois at Chicago, Chicago, Illinois, USA
- Arthur W Holt
- Department of Psychiatry, University of Illinois at Chicago, Chicago, Illinois, USA
8
Harrison E, Martin P, Surian D, Dunn AG. Recommending research articles to consumers of online vaccination information. Quantitative Science Studies 2020. [DOI: 10.1162/qss_a_00030]
Abstract
Online health communications often provide biased interpretations of evidence and have unreliable links to the source research. We tested the feasibility of a tool for matching web pages to their source evidence. From 207,538 eligible vaccination-related PubMed articles, we evaluated several approaches using 3,573 unique links to web pages from Altmetric. We evaluated methods for ranking the source articles for vaccine-related research described on web pages, comparing simple baseline feature representation and dimensionality reduction approaches to those augmented with canonical correlation analysis (CCA). Performance measures included the median rank of the correct source article; the percentage of web pages for which the source article was correctly ranked first (recall@1); and the percentage ranked within the top 50 candidate articles (recall@50). While augmenting baseline methods using CCA generally improved results, no CCA-based approach outperformed a baseline method, which ranked the correct source article first for over one quarter of web pages and in the top 50 for more than half. Tools to help people identify evidence-based sources for the content they access on vaccination-related web pages are potentially feasible and may support the prevention of bias and misrepresentation of research in news and social media.
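A minimal sketch, with invented toy texts and not the study pipeline, of the CCA augmentation described above: paired web-page and article texts are projected into a shared space learned by canonical correlation analysis, and candidate source articles are then ranked by similarity in that space.

```python
# Illustrative CCA-based matching using a handful of invented page-article pairs.
from sklearn.cross_decomposition import CCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pages = ["News story: flu vaccine cuts hospitalisations in older adults",
         "Blog post questions HPV vaccine safety after a trial report",
         "Article on a measles outbreak and local vaccination coverage",
         "Op-ed about COVID-19 booster uptake among healthcare workers"]
articles = ["Influenza vaccination and hospitalisation among elderly patients",
            "Safety outcomes in a randomised trial of HPV vaccination",
            "Measles incidence and vaccine coverage: a surveillance study",
            "Uptake of COVID-19 booster doses in hospital staff: a cohort study"]

vectorizer = TfidfVectorizer().fit(pages + articles)
Xp = vectorizer.transform(pages).toarray()
Xa = vectorizer.transform(articles).toarray()

cca = CCA(n_components=2).fit(Xp, Xa)       # learn the shared space from known pairs
Pp, Pa = cca.transform(Xp, Xa)

# Rank candidate source articles for the first web page by similarity in the shared space.
scores = cosine_similarity(Pp[:1], Pa)[0]
for i in scores.argsort()[::-1]:
    print(f"{scores[i]:.2f}  {articles[i]}")
```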
Affiliation(s)
- Eliza Harrison
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Paige Martin
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Didi Surian
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Adam G. Dunn
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Discipline of Biomedical Informatics and Digital Health, School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
9
Bashir R, Dunn AG. Software engineering principles address current problems in the systematic review ecosystem. J Clin Epidemiol 2019; 109:136-141. [PMID: 30582972] [DOI: 10.1016/j.jclinepi.2018.12.014]
Abstract
Systematic reviewers are simultaneously unable to produce systematic reviews fast enough to keep up with the availability of new trial evidence while overproducing systematic reviews that are unlikely to change practice because they are redundant or biased. Although the transparency and completeness of trial reporting have improved with changes in policy and new technologies, systematic reviews have not yet benefited from the same level of effort. We found that new methods and tools used to automate aspects of systematic review processes have focused on improving the efficiency of individual systematic reviews rather than the efficiency of the entire ecosystem of systematic review production. We use software engineering principles to review challenges and opportunities for improving the interoperability, integrity, efficiency, and maintainability of the systematic review ecosystem. We conclude by recommending ways to improve access to structured systematic review results. Major opportunities for improving systematic reviews will come from new tools and changes in policy focused on doing the right systematic reviews rather than just doing more of them faster.
Affiliation(s)
- Rabia Bashir
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Adam G Dunn
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
10
Martin P, Surian D, Bashir R, Bourgeois FT, Dunn AG. Trial2rev: Combining machine learning and crowd-sourcing to create a shared space for updating systematic reviews. JAMIA Open 2019; 2:15-22. [PMID: 31984340] [PMCID: PMC6951914] [DOI: 10.1093/jamiaopen/ooy062]
Abstract
Objectives Systematic reviews of clinical trials could be updated faster by automatically monitoring relevant trials as they are registered, completed, and reported. Our aim was to provide a public interface to a database of curated links between systematic reviews and trial registrations. Materials and Methods We developed the server-side system components in Python, connected them to a PostgreSQL database, and implemented the web-based user interface using JavaScript, HTML, and CSS. All code is available on GitHub under an open source MIT license, and registered users can access and download all available data. Results The trial2rev system is a web-based interface to a database that collates and augments information from multiple sources, including bibliographic databases, the ClinicalTrials.gov registry, and the actions of registered users. Users interact with the system by browsing, searching, or adding systematic reviews, verifying links to trials included in the review, and adding or voting on trials that they would expect to include in an update of the systematic review. The system can trigger the actions of software agents that add or vote on included and relevant trials, in response to user interactions or by scheduling updates from external resources. Discussion and Conclusion We designed a publicly accessible resource to help systematic reviewers make decisions about systematic review updates. Where previous approaches have sought to reactively filter published reports of trials for inclusion in systematic reviews, our approach is to proactively monitor for relevant trials as they are registered and completed.
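The core data relationship the system curates (links between systematic reviews and trial registrations, with user or software-agent votes on trials expected in an update) can be sketched minimally as below. This is not the trial2rev code; sqlite3 stands in for the PostgreSQL database mentioned above, and the table and column names are hypothetical.

```python
# Hypothetical review-trial link table plus a vote table; sqlite3 used so the
# sketch is self-contained and runnable without a database server.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE review_trial_link (
    review_doi TEXT,
    nct_id     TEXT,
    included   INTEGER,            -- 1 if already included in the published review
    PRIMARY KEY (review_doi, nct_id)
);
CREATE TABLE vote (
    review_doi TEXT,
    nct_id     TEXT,
    voter      TEXT                -- registered user or software agent
);
""")

def vote_for_inclusion(review_doi, nct_id, voter):
    """Record that a user or agent expects this trial in the review's next update."""
    con.execute("INSERT OR IGNORE INTO review_trial_link VALUES (?, ?, 0)",
                (review_doi, nct_id))
    con.execute("INSERT INTO vote VALUES (?, ?, ?)", (review_doi, nct_id, voter))

vote_for_inclusion("10.1000/example-review", "NCT00000000", "agent:registry-monitor")
vote_for_inclusion("10.1000/example-review", "NCT00000000", "user:reviewer1")
print(con.execute("SELECT nct_id, COUNT(*) FROM vote GROUP BY nct_id").fetchall())
```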
Affiliation(s)
- Paige Martin
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Didi Surian
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Rabia Bashir
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Florence T Bourgeois
- Computational Health Informatics Program, Children's Hospital Boston, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Adam G Dunn
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia