1. Lan M, Cheng M, Hoang L, Ter Riet G, Kilicoglu H. Automatic categorization of self-acknowledged limitations in randomized controlled trial publications. J Biomed Inform 2024; 152:104628. PMID: 38548008. DOI: 10.1016/j.jbi.2024.104628.
Abstract
OBJECTIVE Acknowledging study limitations in a scientific publication is a crucial element of scientific transparency and progress. However, limitation reporting is often inadequate. Natural language processing (NLP) methods could support automated reporting checks, improving research transparency. In this study, our objective was to develop a dataset and NLP methods to detect and categorize self-acknowledged limitations (e.g., sample size, blinding) reported in randomized controlled trial (RCT) publications. METHODS We created a data model of limitation types in RCT studies and annotated a corpus of 200 full-text RCT publications using this data model. We fine-tuned BERT-based sentence classification models to recognize limitation sentences and their types. To address the small size of the annotated corpus, we experimented with data augmentation approaches, including Easy Data Augmentation (EDA) and Prompt-Based Data Augmentation (PromDA). We applied the best-performing model to a set of about 12K RCT publications to characterize self-acknowledged limitations at a larger scale. RESULTS Our data model consists of 15 categories and 24 sub-categories (e.g., Population and its sub-category DiagnosticCriteria). We annotated 1090 instances of limitation types in 952 sentences (4.8 limitation sentences and 5.5 limitation types per article). A fine-tuned PubMedBERT model for limitation sentence classification improved upon our earlier model by about 1.5 absolute percentage points in F1 score (0.821 vs. 0.8), a statistically significant difference (p<.001). Our best-performing limitation type classification model, PubMedBERT fine-tuned with PromDA (Output View), achieved an F1 score of 0.7, improving upon the vanilla PubMedBERT model by 2.7 percentage points, also statistically significant (p<.001). CONCLUSION The model could support automated screening tools that journals can use to draw authors' attention to reporting issues. Automatic extraction of limitations from RCT publications could benefit peer review and evidence synthesis, and support advanced methods to search and aggregate evidence from the clinical trial literature.
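Of the two augmentation methods in this abstract, EDA is simple enough to sketch with the standard library alone. The sketch below implements two of EDA's four operations (random swap and random deletion); synonym replacement and random insertion would additionally require a synonym resource such as WordNet, and all parameter values here are illustrative rather than taken from the paper.

```python
import random

def random_swap(words, n_swaps, rng):
    """Randomly swap two word positions n_swaps times (one EDA operation)."""
    words = words[:]
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p, rng):
    """Delete each word independently with probability p (another EDA operation)."""
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]  # never return an empty sentence

def eda_augment(sentence, n_aug=4, alpha=0.1, seed=0):
    """Generate n_aug augmented variants of a limitation sentence."""
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))  # number of swaps scales with length
    variants = []
    for _ in range(n_aug):
        if rng.choice(["swap", "delete"]) == "swap":
            variants.append(" ".join(random_swap(words, n, rng)))
        else:
            variants.append(" ".join(random_deletion(words, alpha, rng)))
    return variants
```

Each variant reuses only words from the original sentence, so the augmented examples stay close to the source distribution while perturbing word order and length.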
Affiliation(s)
- Mengfei Lan: School of Information Sciences, University of Illinois Urbana-Champaign, 501 Daniel Street, Champaign, IL 61820, USA
- Mandy Cheng: Department of Biological Sciences, Binghamton University, 4400 Vestal Parkway East, Binghamton, NY 13902, USA
- Linh Hoang: School of Information Sciences, University of Illinois Urbana-Champaign, 501 Daniel Street, Champaign, IL 61820, USA
- Gerben Ter Riet: Faculty of Health, Amsterdam University of Applied Sciences, Tafelbergweg 51, 1105 BD Amsterdam, The Netherlands
- Halil Kilicoglu: School of Information Sciences, University of Illinois Urbana-Champaign, 501 Daniel Street, Champaign, IL 61820, USA
2. Liu H, Soroush A, Nestor JG, Park E, Idnay B, Fang Y, Pan J, Liao S, Bernard M, Peng Y, Weng C. Retrieval augmented scientific claim verification. JAMIA Open 2024; 7:ooae021. PMID: 38455840. PMCID: PMC10919922. DOI: 10.1093/jamiaopen/ooae021.
Abstract
Objective To automate scientific claim verification using PubMed abstracts. Materials and Methods We developed CliVER, an end-to-end scientific Claim VERification system that leverages retrieval-augmented techniques to automatically retrieve relevant clinical trial abstracts, extract pertinent sentences, and use the PICO framework to support or refute a scientific claim. We also created an ensemble of three state-of-the-art deep learning models to classify rationales as supporting, refuting, or neutral. We then constructed CoVERt, a new COVID VERification dataset comprising 15 PICO-encoded drug claims accompanied by 96 manually selected and labeled clinical trial abstracts that either support or refute each claim. We used CoVERt and SciFact (a public scientific claim verification dataset) to assess CliVER's performance in predicting labels. Finally, we compared CliVER to clinicians in the verification of 19 claims from 6 disease domains, using 189,648 PubMed abstracts extracted from January 2010 to October 2021. Results In the evaluation of label prediction accuracy on CoVERt, CliVER achieved a notable F1 score of 0.92, highlighting the efficacy of the retrieval-augmented models. The ensemble model outperformed each individual state-of-the-art model by 3 to 11 absolute percentage points in F1 score. Moreover, when compared with four clinicians, CliVER achieved a precision of 79.0% for abstract retrieval, 67.4% for sentence selection, and 63.2% for label prediction. Conclusion CliVER demonstrates its early potential to automate scientific claim verification using retrieval-augmented strategies to harness the wealth of clinical trial abstracts in PubMed. Future studies are warranted to further test its clinical utility.
Affiliation(s)
- Hao Liu: School of Computing, Montclair State University, Montclair, NJ 07043, United States
- Ali Soroush: Department of Medicine, Columbia University, New York, NY 10027, United States
- Jordan G Nestor: Department of Medicine, Columbia University, New York, NY 10027, United States
- Elizabeth Park: Department of Medicine, Columbia University, New York, NY 10027, United States
- Betina Idnay: Department of Biomedical Informatics, Columbia University, New York, NY 10027, United States
- Yilu Fang: Department of Biomedical Informatics, Columbia University, New York, NY 10027, United States
- Jane Pan: Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, United States
- Stan Liao: Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, United States
- Marguerite Bernard: Institute of Human Nutrition, Columbia University, New York, NY 10027, United States
- Yifan Peng: Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States
- Chunhua Weng: Department of Biomedical Informatics, Columbia University, New York, NY 10027, United States
3. Jiang L, Lan M, Menke JD, Vorland CJ, Kilicoglu H. CONSORT-TM: Text classification models for assessing the completeness of randomized controlled trial publications. medRxiv [Preprint] 2024:2024.03.31.24305138. PMID: 38633775. PMCID: PMC11023672. DOI: 10.1101/2024.03.31.24305138.
Abstract
Objective To develop text classification models for determining whether the checklist items in the CONSORT reporting guidelines are reported in randomized controlled trial publications. Materials and Methods Using a corpus annotated at the sentence level with 37 fine-grained CONSORT items, we trained several sentence classification models (PubMedBERT fine-tuning, BioGPT fine-tuning, and in-context learning with GPT-4) and compared their performance. To address the problem of the small training dataset, we used several data augmentation methods (EDA, UMLS-EDA, and text generation and rephrasing with GPT-4) and assessed their impact on the fine-tuned PubMedBERT model. We also fine-tuned PubMedBERT models limited to checklist items associated with specific sections (e.g., Methods) to evaluate whether such models could improve performance compared to the single full model. We performed 5-fold cross-validation and report precision, recall, F1 score, and area under the curve (AUC). Results A fine-tuned PubMedBERT model that takes as input the sentence and the surrounding sentence representations, and that uses section headers, yielded the best overall performance (0.71 micro-F1, 0.64 macro-F1). Data augmentation had a limited positive effect, with UMLS-EDA yielding slightly better results than data augmentation using GPT-4. BioGPT fine-tuning and GPT-4 in-context learning exhibited suboptimal results. The Methods-specific model yielded higher performance for methodology items; the other section-specific models did not have a significant impact. Conclusion Most CONSORT checklist items can be recognized reasonably well with the fine-tuned PubMedBERT model, but there is room for improvement. Improved models could underpin journal editorial workflows and CONSORT adherence checks, and help authors improve the reporting quality and completeness of their manuscripts.
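The best model's input construction can be approximated at the string level. The exact mechanism by which the model combines surrounding-sentence representations is internal to its architecture, so the separator token and simple concatenation scheme below are assumptions for illustration only.

```python
def build_classifier_input(sentences, i, section_header, sep=" [SEP] "):
    """Concatenate the section header, previous sentence, target sentence,
    and next sentence into one input string (a simplified approximation
    of context-aware sentence classification input)."""
    prev_sent = sentences[i - 1] if i > 0 else ""
    next_sent = sentences[i + 1] if i + 1 < len(sentences) else ""
    return sep.join([section_header, prev_sent, sentences[i], next_sent])
```

Feeding the section header alongside the sentence lets the classifier exploit the strong correlation between CONSORT items and manuscript sections (e.g., randomization details appearing under Methods).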
Affiliation(s)
- Lan Jiang: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Mengfei Lan: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Joe D. Menke: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Colby J Vorland: Indiana University School of Public Health, Bloomington, IN, USA
- Halil Kilicoglu: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
4. Kilicoglu H, Jiang L, Hoang L, Mayo-Wilson E, Vinkers CH, Otte WM. Methodology reporting improved over time in 176,469 randomized controlled trials. J Clin Epidemiol 2023; 162:19-28. PMID: 37562729. PMCID: PMC10829891. DOI: 10.1016/j.jclinepi.2023.08.004.
Abstract
OBJECTIVES To describe randomized controlled trial (RCT) methodology reporting over time. STUDY DESIGN AND SETTING We used a deep learning-based sentence classification model built on the Consolidated Standards of Reporting Trials (CONSORT) statement, which specifies minimum requirements for reporting RCTs. We included 176,469 RCT reports published between 1966 and 2018. We analyzed reporting trends over 5-year periods, grouping trials from 1966 to 1990 in a single stratum. We also explored the effect of journal impact factor (JIF) and medical discipline. RESULTS Population, Intervention, Comparator, Outcome (PICO) items were commonly reported during each period, and reporting increased over time (e.g., interventions: 79.1% during 1966-1990 to 87.5% during 2010-2018). Reporting of some methods information has increased, although there is room for improvement (e.g., sequence generation: 10.8-41.8%). Some items are reported infrequently (e.g., allocation concealment: 5.1-19.3%). The number of items reported and JIF are weakly correlated (Pearson's r(162,702) = 0.16, P < 0.001). The differences in the proportion of items reported between disciplines are small (<10%). CONCLUSION Our analysis provides large-scale quantitative support for the hypothesis that RCT methodology reporting has improved over time. Extending these models to all CONSORT items could facilitate compliance checking during manuscript authoring and peer review, and support metaresearch.
Affiliation(s)
- Halil Kilicoglu: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Lan Jiang: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Linh Hoang: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Evan Mayo-Wilson: Department of Epidemiology, University of North Carolina School of Global Public Health, Chapel Hill, NC, USA
- Christiaan H Vinkers: Department of Psychiatry and Anatomy & Neurosciences, Amsterdam University Medical Center Location Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands; Amsterdam Public Health, Mental Health Program and Amsterdam Neuroscience, Mood, Anxiety, Psychosis, Sleep & Stress Program, Amsterdam, The Netherlands; GGZ inGeest Mental Health Care, 1081 HJ Amsterdam, The Netherlands
- Willem M Otte: Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
5. García-Méndez S, de Arriba-Pérez F, Barros-Vila A, González-Castaño FJ, Costa-Montenegro E. Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation. Appl Intell 2023. DOI: 10.1007/s10489-023-04452-4.
Abstract
Financial news items are unstructured sources of information that can be mined to extract knowledge for market screening applications. They are typically written by market experts who describe stock market events within the context of social, economic and political change. Manual extraction of relevant information from the continuous stream of finance-related news is cumbersome and beyond the skills of many investors, who, at most, can follow a few sources and authors. Accordingly, we focus on the analysis of financial news to identify relevant text and, within that text, forecasts and predictions. We propose a novel Natural Language Processing (NLP) system to assist investors in the detection of relevant financial events in unstructured textual sources by considering both relevance and temporality at the discursive level. First, we segment the text to group closely related text together. Second, we apply co-reference resolution to discover internal dependencies within segments. Finally, we perform relevant topic modelling with Latent Dirichlet Allocation (LDA) to separate relevant from less relevant text, and then analyse the relevant text using a machine learning-oriented temporal approach to identify predictions and speculative statements. To evaluate our solution, we created an experimental data set of 2,158 financial news items manually labelled by NLP researchers. Our solution outperformed a rule-based baseline system. Inter-annotator alpha-reliability and accuracy values, and ROUGE-L results, endorse its potential as a valuable tool for busy investors. The ROUGE-L values for the identification of relevant text and predictions/forecasts were 0.662 and 0.982, respectively. To our knowledge, this is the first work to jointly consider relevance and temporality at the discursive level. It contributes to the transfer of human associative discourse capabilities to expert systems through the combination of multi-paragraph topic segmentation and co-reference resolution to separate author expression patterns, topic modelling with LDA to detect relevant text, and discursive temporality analysis to identify forecasts and predictions within this text. Our solution may have compelling applications in the financial field, including the possibility of extracting relevant statements on investment strategies to analyse authors' reputations.
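The relevance-filtering step can be illustrated with a minimal LDA sketch. The paper does not state which implementation it used, so scikit-learn is assumed here; the four toy segments and the "pick the finance-heavy topic" heuristic are purely illustrative.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy text segments (illustrative, not from the paper's corpus).
segments = [
    "Shares of the company rose after the earnings report beat expectations.",
    "Analysts forecast the stock will climb further next quarter.",
    "The weather in the region was sunny with mild temperatures all week.",
    "The town festival attracted many visitors over the long weekend.",
]

# Bag-of-words counts, then a 2-topic LDA fit.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(segments)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)  # each row is a topic mixture summing to 1

# Keep segments whose dominant topic is the "relevant" (finance-heavy) one;
# which topic that is must be inspected or labelled, so it is a parameter here.
def relevant_segments(doc_topic, relevant_topic):
    return [i for i, row in enumerate(doc_topic) if row.argmax() == relevant_topic]
```

Segments passing this filter would then go to the downstream temporal classifier that separates forecasts and predictions from factual statements.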
6. Schulz R, Langen G, Prill R, Cassel M, Weissgerber TL. Reporting and transparent research practices in sports medicine and orthopaedic clinical trials: a meta-research study. BMJ Open 2022; 12:e059347. PMID: 35940834. PMCID: PMC9364413. DOI: 10.1136/bmjopen-2021-059347.
Abstract
OBJECTIVES Transparent reporting of clinical trials is essential to assess the risk of bias and translate research findings into clinical practice. While existing studies have shown that deficiencies are common, detailed empirical and field-specific data are scarce. Therefore, this study aimed to examine current clinical trial reporting and transparent research practices in sports medicine and orthopaedics. SETTING Exploratory meta-research study on reporting quality and transparent research practices in orthopaedics and sports medicine clinical trials. PARTICIPANTS The sample included clinical trials published in the top 25% of sports medicine and orthopaedics journals over 9 months. PRIMARY AND SECONDARY OUTCOME MEASURES Two independent reviewers assessed pre-registration, open data, and criteria related to scientific rigour (randomisation, blinding and sample size calculations), as well as the study sample and data analysis. RESULTS The sample included 163 clinical trials from 27 journals. While the majority of trials mentioned rigour criteria, essential details were often missing. Sixty per cent (95% confidence interval (CI) 53% to 68%) of trials reported sample size calculations, but only 32% (95% CI 25% to 39%) justified the expected effect size. Few trials indicated the blinding status of all main stakeholders (4%; 95% CI 1% to 7%). Only 18% (95% CI 12% to 24%) included information on randomisation type, method and concealed allocation. Most trials reported participants' sex/gender (95%; 95% CI 92% to 98%) and information on inclusion and exclusion criteria (78%; 95% CI 72% to 84%). Only 20% (95% CI 14% to 26%) of trials were pre-registered. No trials deposited data in open repositories. CONCLUSIONS These results will aid the sports medicine and orthopaedics community in developing tailored interventions to improve reporting. While authors typically mention blinding, randomisation and other factors, essential details are often missing. Greater acceptance of open science practices, like pre-registration and open data, is needed. As these practices have been widely encouraged, we discuss systemic interventions that may improve clinical trial reporting.
Affiliation(s)
- Robert Schulz: BIH QUEST Center for Responsible Research, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany; Department of Sport and Health Sciences, University of Potsdam, Potsdam, Brandenburg, Germany
- Georg Langen: Department of Strength, Power and Tactical Sports, Institute for Applied Training Science, Leipzig, Germany
- Robert Prill: Center of Orthopaedics and Traumatology, Brandenburg Medical School Theodor Fontane, Neuruppin, Brandenburg, Germany
- Michael Cassel: Department of Sport and Health Sciences, University of Potsdam, Potsdam, Brandenburg, Germany
- Tracey L Weissgerber: BIH QUEST Center for Responsible Research, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
7. Hoang L, Jiang L, Kilicoglu H. Investigating the impact of weakly supervised data on text mining models of publication transparency: a case study on randomized controlled trials. AMIA Annu Symp Proc 2022; 2022:254-263. PMID: 35854729. PMCID: PMC9285178.
Abstract
Lack of large quantities of annotated data is a major barrier to developing effective text mining models of the biomedical literature. In this study, we explored weak supervision to improve the accuracy of text classification models for assessing the methodological transparency of randomized controlled trial (RCT) publications. Specifically, we used Snorkel, a framework for programmatically building training sets, and UMLS-EDA, a data augmentation method that leverages a small number of labeled examples to generate new training instances, and assessed their effect on a BioBERT-based text classification model proposed for this task in previous work. Performance improvements due to weak supervision were limited and were surpassed by gains from hyperparameter tuning. Our analysis suggests that refinements to the weak supervision strategies to better handle the multi-label case could be beneficial. Our code and data are available at https://github.com/kilicogluh/CONSORT-TM/tree/master/weakSupervision.
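The labeling-function idea behind Snorkel can be sketched in plain Python: a few keyword heuristics each vote on a sentence, and a majority vote aggregates them into a weak label (Snorkel proper learns a generative model over the votes rather than voting). The keyword patterns and label names below are hypothetical, not taken from the study.

```python
ABSTAIN, REPORTED, NOT_REPORTED = -1, 1, 0

# Hypothetical labeling functions for a "randomization reported?" classifier.
def lf_randomly_assigned(sentence):
    return REPORTED if "randomly assigned" in sentence.lower() else ABSTAIN

def lf_random_sequence(sentence):
    s = sentence.lower()
    return REPORTED if "random sequence" in s or "randomization" in s else ABSTAIN

def lf_not_randomized(sentence):
    return NOT_REPORTED if "not randomized" in sentence.lower() else ABSTAIN

LFS = [lf_randomly_assigned, lf_random_sequence, lf_not_randomized]

def majority_vote(sentence):
    """Aggregate labeling-function votes; abstain if no LF fires or votes tie."""
    votes = [v for v in (lf(sentence) for lf in LFS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = {lab: votes.count(lab) for lab in set(votes)}
    best = max(counts, key=counts.get)
    if list(counts.values()).count(counts[best]) > 1:  # tie between labels
        return ABSTAIN
    return best
```

Sentences labeled this way can be pooled with the gold-annotated data to enlarge the training set, which is exactly where the study found the gains to be limited in the multi-label setting.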
Affiliation(s)
- Linh Hoang: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Lan Jiang: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Halil Kilicoglu: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
8. Schmidt L, Finnerty Mutlu AN, Elmore R, Olorisade BK, Thomas J, Higgins JPT. Data extraction methods for systematic review (semi)automation: update of a living systematic review. F1000Res 2021; 10:401. PMID: 34408850. PMCID: PMC8361807. DOI: 10.12688/f1000research.51117.2.
Abstract
Background: The reliable and usable (semi)automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of the included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies. Methods: We systematically and continually search PubMed, ACL Anthology, arXiv, OpenAlex via EPPI-Reviewer, and the dblp computer science bibliography. Full-text screening and data extraction are conducted within an open-source living systematic review application created for the purpose of this review. This living review update includes publications up to December 2022 and OpenAlex content up to March 2023. Results: 76 publications are included in this review. Of these, 64 (84%) addressed extraction of data from abstracts, while 19 (25%) used full texts. A total of 71 (93%) publications developed classifiers for randomised controlled trials. Over 30 entities were extracted, with PICOs (population, intervention, comparator, outcome) being the most frequently extracted. Data are available from 25 (33%) publications, and code from 30 (39%). Six (8%) implemented publicly available tools. Conclusions: This living systematic review presents an overview of the (semi)automated data-extraction literature of interest to different types of literature review. We identified a broad evidence base of publications describing data extraction for interventional reviews and a small number of publications extracting epidemiological or diagnostic accuracy data. Between review updates, trends for sharing data and code increased strongly: in the base review, data and code were available for 13% and 19% of publications, respectively; among the 23 new publications, these figures increased to 78% and 87%. Compared with the base review, we also observed a research trend away from straightforward data extraction and towards additionally extracting relations between entities or automatic text summarisation. With this living review, we aim to review the literature continually.
Affiliation(s)
- Lena Schmidt: NIHR Innovation Observatory, Newcastle University, Newcastle upon Tyne, NE4 5TG, UK; Sciome LLC, Research Triangle Park, North Carolina, 27713, USA; Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Rebecca Elmore: Sciome LLC, Research Triangle Park, North Carolina, 27713, USA
- Babatunde K. Olorisade: Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK; Evaluate Ltd, London, SE1 2RE, UK; Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, CF5 2YB, UK
- James Thomas: UCL Social Research Institute, University College London, London, WC1H 0AL, UK