1. Lan M, Cheng M, Hoang L, Ter Riet G, Kilicoglu H. Automatic categorization of self-acknowledged limitations in randomized controlled trial publications. J Biomed Inform 2024; 152:104628. PMID: 38548008. DOI: 10.1016/j.jbi.2024.104628.
Abstract
OBJECTIVE Acknowledging study limitations in a scientific publication is a crucial element of scientific transparency and progress. However, limitation reporting is often inadequate. Natural language processing (NLP) methods could support automated reporting checks, improving research transparency. In this study, our objective was to develop a dataset and NLP methods to detect and categorize self-acknowledged limitations (e.g., sample size, blinding) reported in randomized controlled trial (RCT) publications. METHODS We created a data model of limitation types in RCT studies and annotated a corpus of 200 full-text RCT publications using this data model. We fine-tuned BERT-based sentence classification models to recognize limitation sentences and their types. To address the small size of the annotated corpus, we experimented with data augmentation approaches, including Easy Data Augmentation (EDA) and Prompt-Based Data Augmentation (PromDA). We applied the best-performing model to a set of about 12K RCT publications to characterize self-acknowledged limitations at a larger scale. RESULTS Our data model consists of 15 categories and 24 sub-categories (e.g., Population and its sub-category DiagnosticCriteria). We annotated 1090 instances of limitation types in 952 sentences (4.8 limitation sentences and 5.5 limitation types per article). A fine-tuned PubMedBERT model for limitation sentence classification improved upon our earlier model by about 1.5 absolute percentage points in F1 score (0.821 vs. 0.8), a statistically significant difference (p<.001). Our best-performing limitation type classification model, PubMedBERT fine-tuned with PromDA (Output View), achieved an F1 score of 0.7, improving upon the vanilla PubMedBERT model by 2.7 percentage points, also statistically significant (p<.001). CONCLUSION The model could support automated screening tools that journals can use to draw authors' attention to reporting issues. Automatic extraction of limitations from RCT publications could benefit peer review and evidence synthesis, and support advanced methods to search and aggregate evidence from the clinical trial literature.
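Of the two augmentation methods in this abstract, EDA is simple enough to sketch with the standard library alone. The sketch below implements two of EDA's four operations (random swap and random deletion); synonym replacement and random insertion would additionally require a synonym resource such as WordNet, and all parameter values here are illustrative rather than taken from the paper.

```python
import random

def random_swap(words, n_swaps, rng):
    """Randomly swap two word positions n_swaps times (one EDA operation)."""
    words = words[:]
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p, rng):
    """Delete each word independently with probability p (another EDA operation)."""
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]  # never return an empty sentence

def eda_augment(sentence, n_aug=4, alpha=0.1, seed=0):
    """Generate n_aug augmented variants of a limitation sentence."""
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))  # number of swaps scales with length
    variants = []
    for _ in range(n_aug):
        if rng.choice(["swap", "delete"]) == "swap":
            variants.append(" ".join(random_swap(words, n, rng)))
        else:
            variants.append(" ".join(random_deletion(words, alpha, rng)))
    return variants
```

Each variant reuses only words from the original sentence, so the augmented examples stay close to the source distribution while perturbing word order and length.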
Affiliation(s)
- Mengfei Lan: School of Information Sciences, University of Illinois Urbana-Champaign, 501 Daniel Street, Champaign, IL 61820, USA
- Mandy Cheng: Department of Biological Sciences, Binghamton University, 4400 Vestal Parkway East, Binghamton, NY 13902, USA
- Linh Hoang: School of Information Sciences, University of Illinois Urbana-Champaign, 501 Daniel Street, Champaign, IL 61820, USA
- Gerben Ter Riet: Faculty of Health, Amsterdam University of Applied Sciences, Tafelbergweg 51, 1105 BD Amsterdam, The Netherlands
- Halil Kilicoglu: School of Information Sciences, University of Illinois Urbana-Champaign, 501 Daniel Street, Champaign, IL 61820, USA
2. Liu H, Soroush A, Nestor JG, Park E, Idnay B, Fang Y, Pan J, Liao S, Bernard M, Peng Y, Weng C. Retrieval augmented scientific claim verification. JAMIA Open 2024; 7:ooae021. PMID: 38455840. PMCID: PMC10919922. DOI: 10.1093/jamiaopen/ooae021.
Abstract
Objective To automate scientific claim verification using PubMed abstracts. Materials and Methods We developed CliVER, an end-to-end scientific Claim VERification system that leverages retrieval-augmented techniques to automatically retrieve relevant clinical trial abstracts, extract pertinent sentences, and use the PICO framework to support or refute a scientific claim. We also created an ensemble of three state-of-the-art deep learning models to classify rationales as supporting, refuting, or neutral. We then constructed CoVERt, a new COVID VERification dataset comprising 15 PICO-encoded drug claims accompanied by 96 manually selected and labeled clinical trial abstracts that either support or refute each claim. We used CoVERt and SciFact (a public scientific claim verification dataset) to assess CliVER's performance in predicting labels. Finally, we compared CliVER to clinicians in the verification of 19 claims from 6 disease domains, using 189,648 PubMed abstracts extracted from January 2010 to October 2021. Results In the evaluation of label prediction accuracy on CoVERt, CliVER achieved a notable F1 score of 0.92, highlighting the efficacy of the retrieval-augmented models. The ensemble model outperformed each individual state-of-the-art model by 3 to 11 absolute percentage points in F1 score. Moreover, when compared with four clinicians, CliVER achieved a precision of 79.0% for abstract retrieval, 67.4% for sentence selection, and 63.2% for label prediction. Conclusion CliVER demonstrates its early potential to automate scientific claim verification using retrieval-augmented strategies to harness the wealth of clinical trial abstracts in PubMed. Future studies are warranted to further test its clinical utility.
Affiliation(s)
- Hao Liu: School of Computing, Montclair State University, Montclair, NJ 07043, United States
- Ali Soroush: Department of Medicine, Columbia University, New York, NY 10027, United States
- Jordan G Nestor: Department of Medicine, Columbia University, New York, NY 10027, United States
- Elizabeth Park: Department of Medicine, Columbia University, New York, NY 10027, United States
- Betina Idnay: Department of Biomedical Informatics, Columbia University, New York, NY 10027, United States
- Yilu Fang: Department of Biomedical Informatics, Columbia University, New York, NY 10027, United States
- Jane Pan: Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, United States
- Stan Liao: Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, United States
- Marguerite Bernard: Institute of Human Nutrition, Columbia University, New York, NY 10027, United States
- Yifan Peng: Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States
- Chunhua Weng: Department of Biomedical Informatics, Columbia University, New York, NY 10027, United States
3. Jiang L, Lan M, Menke JD, Vorland CJ, Kilicoglu H. CONSORT-TM: Text classification models for assessing the completeness of randomized controlled trial publications. medRxiv [Preprint] 2024:2024.03.31.24305138. PMID: 38633775. PMCID: PMC11023672. DOI: 10.1101/2024.03.31.24305138.
Abstract
Objective To develop text classification models for determining whether the checklist items in the CONSORT reporting guidelines are reported in randomized controlled trial publications. Materials and Methods Using a corpus annotated at the sentence level with 37 fine-grained CONSORT items, we trained several sentence classification models (PubMedBERT fine-tuning, BioGPT fine-tuning, and in-context learning with GPT-4) and compared their performance. To address the problem of the small training dataset, we used several data augmentation methods (EDA, UMLS-EDA, and text generation and rephrasing with GPT-4) and assessed their impact on the fine-tuned PubMedBERT model. We also fine-tuned PubMedBERT models limited to checklist items associated with specific sections (e.g., Methods) to evaluate whether such models could improve performance compared to the single full model. We performed 5-fold cross-validation and report precision, recall, F1 score, and area under the curve (AUC). Results A fine-tuned PubMedBERT model that takes as input the sentence and the surrounding sentence representations, and that uses section headers, yielded the best overall performance (0.71 micro-F1, 0.64 macro-F1). Data augmentation had a limited positive effect, with UMLS-EDA yielding slightly better results than data augmentation using GPT-4. BioGPT fine-tuning and GPT-4 in-context learning exhibited suboptimal results. The Methods-specific model yielded higher performance for methodology items; the other section-specific models did not have a significant impact. Conclusion Most CONSORT checklist items can be recognized reasonably well with the fine-tuned PubMedBERT model, but there is room for improvement. Improved models could underpin journal editorial workflows and CONSORT adherence checks, and help authors improve the reporting quality and completeness of their manuscripts.
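The best model's input construction can be approximated at the string level. The exact mechanism by which the model combines surrounding-sentence representations is internal to its architecture, so the separator token and simple concatenation scheme below are assumptions for illustration only.

```python
def build_classifier_input(sentences, i, section_header, sep=" [SEP] "):
    """Concatenate the section header, previous sentence, target sentence,
    and next sentence into one input string (a simplified approximation
    of context-aware sentence classification input)."""
    prev_sent = sentences[i - 1] if i > 0 else ""
    next_sent = sentences[i + 1] if i + 1 < len(sentences) else ""
    return sep.join([section_header, prev_sent, sentences[i], next_sent])
```

Feeding the section header alongside the sentence lets the classifier exploit the strong correlation between CONSORT items and manuscript sections (e.g., randomization details appearing under Methods).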
Affiliation(s)
- Lan Jiang: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Mengfei Lan: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Joe D. Menke: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Colby J Vorland: Indiana University School of Public Health, Bloomington, IN, USA
- Halil Kilicoglu: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
4. Kilicoglu H, Jiang L, Hoang L, Mayo-Wilson E, Vinkers CH, Otte WM. Methodology reporting improved over time in 176,469 randomized controlled trials. J Clin Epidemiol 2023; 162:19-28. PMID: 37562729. PMCID: PMC10829891. DOI: 10.1016/j.jclinepi.2023.08.004.
Abstract
OBJECTIVES To describe randomized controlled trial (RCT) methodology reporting over time. STUDY DESIGN AND SETTING We used a deep learning-based sentence classification model built on the Consolidated Standards of Reporting Trials (CONSORT) statement, which specifies minimum requirements for reporting RCTs. We included 176,469 RCT reports published between 1966 and 2018. We analyzed reporting trends over 5-year periods, grouping trials from 1966 to 1990 in a single stratum. We also explored the effect of journal impact factor (JIF) and medical discipline. RESULTS Population, Intervention, Comparator, Outcome (PICO) items were commonly reported during each period, and reporting increased over time (e.g., interventions: 79.1% during 1966-1990 to 87.5% during 2010-2018). Reporting of some methods information has increased, although there is room for improvement (e.g., sequence generation: 10.8-41.8%). Some items are reported infrequently (e.g., allocation concealment: 5.1-19.3%). The number of items reported and JIF are weakly correlated (Pearson's r(162,702) = 0.16, P < 0.001). The differences in the proportion of items reported between disciplines are small (<10%). CONCLUSION Our analysis provides large-scale quantitative support for the hypothesis that RCT methodology reporting has improved over time. Extending these models to all CONSORT items could facilitate compliance checking during manuscript authoring and peer review, and support metaresearch.
Affiliation(s)
- Halil Kilicoglu: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Lan Jiang: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Linh Hoang: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Evan Mayo-Wilson: Department of Epidemiology, University of North Carolina School of Global Public Health, Chapel Hill, NC, USA
- Christiaan H Vinkers: Department of Psychiatry and Anatomy & Neurosciences, Amsterdam University Medical Center Location Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands; Amsterdam Public Health, Mental Health Program and Amsterdam Neuroscience, Mood, Anxiety, Psychosis, Sleep & Stress Program, Amsterdam, The Netherlands; GGZ inGeest Mental Health Care, 1081 HJ Amsterdam, The Netherlands
- Willem M Otte: Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
5. García-Méndez S, de Arriba-Pérez F, Barros-Vila A, González-Castaño FJ, Costa-Montenegro E. Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation. Appl Intell 2023. DOI: 10.1007/s10489-023-04452-4.
Abstract
Financial news items are unstructured sources of information that can be mined to extract knowledge for market screening applications. They are typically written by market experts who describe stock market events within the context of social, economic and political change. Manual extraction of relevant information from the continuous stream of finance-related news is cumbersome and beyond the skills of many investors, who, at most, can follow a few sources and authors. Accordingly, we focus on the analysis of financial news to identify relevant text and, within that text, forecasts and predictions. We propose a novel Natural Language Processing (NLP) system to assist investors in the detection of relevant financial events in unstructured textual sources by considering both relevance and temporality at the discursive level. First, we segment the text to group closely related text together. Second, we apply co-reference resolution to discover internal dependencies within segments. Finally, we perform relevant topic modelling with Latent Dirichlet Allocation (LDA) to separate relevant from less relevant text, and then analyse the relevant text using a machine learning-oriented temporal approach to identify predictions and speculative statements. To evaluate our solution, we created an experimental data set of 2,158 financial news items manually labelled by NLP researchers. Our solution outperformed a rule-based baseline system. Inter-annotator alpha-reliability and accuracy values, and ROUGE-L results, endorse its potential as a valuable tool for busy investors. The ROUGE-L values for the identification of relevant text and predictions/forecasts were 0.662 and 0.982, respectively. To our knowledge, this is the first work to jointly consider relevance and temporality at the discursive level. It contributes to the transfer of human associative discourse capabilities to expert systems through the combination of multi-paragraph topic segmentation and co-reference resolution to separate author expression patterns, topic modelling with LDA to detect relevant text, and discursive temporality analysis to identify forecasts and predictions within this text. Our solution may have compelling applications in the financial field, including the possibility of extracting relevant statements on investment strategies to analyse authors' reputations.
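The relevance-filtering step can be illustrated with a minimal LDA sketch. The paper does not state which implementation it used, so scikit-learn is assumed here; the four toy segments and the "pick the finance-heavy topic" heuristic are purely illustrative.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy text segments (illustrative, not from the paper's corpus).
segments = [
    "Shares of the company rose after the earnings report beat expectations.",
    "Analysts forecast the stock will climb further next quarter.",
    "The weather in the region was sunny with mild temperatures all week.",
    "The town festival attracted many visitors over the long weekend.",
]

# Bag-of-words counts, then a 2-topic LDA fit.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(segments)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)  # each row is a topic mixture summing to 1

# Keep segments whose dominant topic is the "relevant" (finance-heavy) one;
# which topic that is must be inspected or labelled, so it is a parameter here.
def relevant_segments(doc_topic, relevant_topic):
    return [i for i, row in enumerate(doc_topic) if row.argmax() == relevant_topic]
```

Segments passing this filter would then go to the downstream temporal classifier that separates forecasts and predictions from factual statements.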
6. Schulz R, Langen G, Prill R, Cassel M, Weissgerber TL. Reporting and transparent research practices in sports medicine and orthopaedic clinical trials: a meta-research study. BMJ Open 2022; 12:e059347. PMID: 35940834. PMCID: PMC9364413. DOI: 10.1136/bmjopen-2021-059347.
Abstract
OBJECTIVES Transparent reporting of clinical trials is essential to assess the risk of bias and translate research findings into clinical practice. While existing studies have shown that deficiencies are common, detailed empirical and field-specific data are scarce. Therefore, this study aimed to examine current clinical trial reporting and transparent research practices in sports medicine and orthopaedics. SETTING Exploratory meta-research study on reporting quality and transparent research practices in orthopaedics and sports medicine clinical trials. PARTICIPANTS The sample included clinical trials published in the top 25% of sports medicine and orthopaedics journals over 9 months. PRIMARY AND SECONDARY OUTCOME MEASURES Two independent reviewers assessed pre-registration, open data, and criteria related to scientific rigour (randomisation, blinding and sample size calculations), as well as the study sample and data analysis. RESULTS The sample included 163 clinical trials from 27 journals. While the majority of trials mentioned rigour criteria, essential details were often missing. Sixty per cent (95% confidence interval (CI) 53% to 68%) of trials reported sample size calculations, but only 32% (95% CI 25% to 39%) justified the expected effect size. Few trials indicated the blinding status of all main stakeholders (4%; 95% CI 1% to 7%). Only 18% (95% CI 12% to 24%) included information on randomisation type, method and concealed allocation. Most trials reported participants' sex/gender (95%; 95% CI 92% to 98%) and information on inclusion and exclusion criteria (78%; 95% CI 72% to 84%). Only 20% (95% CI 14% to 26%) of trials were pre-registered. No trials deposited data in open repositories. CONCLUSIONS These results will aid the sports medicine and orthopaedics community in developing tailored interventions to improve reporting. While authors typically mention blinding, randomisation and other factors, essential details are often missing. Greater acceptance of open science practices, like pre-registration and open data, is needed. As these practices have been widely encouraged, we discuss systemic interventions that may improve clinical trial reporting.
Affiliation(s)
- Robert Schulz: BIH QUEST Center for Responsible Research, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany; Department of Sport and Health Sciences, University of Potsdam, Potsdam, Brandenburg, Germany
- Georg Langen: Department of Strength, Power and Tactical Sports, Institute for Applied Training Science, Leipzig, Germany
- Robert Prill: Center of Orthopaedics and Traumatology, Brandenburg Medical School Theodor Fontane, Neuruppin, Brandenburg, Germany
- Michael Cassel: Department of Sport and Health Sciences, University of Potsdam, Potsdam, Brandenburg, Germany
- Tracey L Weissgerber: BIH QUEST Center for Responsible Research, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
7. Hoang L, Jiang L, Kilicoglu H. Investigating the impact of weakly supervised data on text mining models of publication transparency: a case study on randomized controlled trials. AMIA Annu Symp Proc 2022; 2022:254-263. PMID: 35854729. PMCID: PMC9285178.
Abstract
Lack of large quantities of annotated data is a major barrier to developing effective text mining models of the biomedical literature. In this study, we explored weak supervision to improve the accuracy of text classification models for assessing the methodological transparency of randomized controlled trial (RCT) publications. Specifically, we used Snorkel, a framework for programmatically building training sets, and UMLS-EDA, a data augmentation method that leverages a small number of labeled examples to generate new training instances, and assessed their effect on a BioBERT-based text classification model proposed for this task in previous work. Performance improvements due to weak supervision were limited and were surpassed by gains from hyperparameter tuning. Our analysis suggests that refinements to the weak supervision strategies to better handle the multi-label case could be beneficial. Our code and data are available at https://github.com/kilicogluh/CONSORT-TM/tree/master/weakSupervision.
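The labeling-function idea behind Snorkel can be sketched in plain Python: a few keyword heuristics each vote on a sentence, and a majority vote aggregates them into a weak label (Snorkel proper learns a generative model over the votes rather than voting). The keyword patterns and label names below are hypothetical, not taken from the study.

```python
ABSTAIN, REPORTED, NOT_REPORTED = -1, 1, 0

# Hypothetical labeling functions for a "randomization reported?" classifier.
def lf_randomly_assigned(sentence):
    return REPORTED if "randomly assigned" in sentence.lower() else ABSTAIN

def lf_random_sequence(sentence):
    s = sentence.lower()
    return REPORTED if "random sequence" in s or "randomization" in s else ABSTAIN

def lf_not_randomized(sentence):
    return NOT_REPORTED if "not randomized" in sentence.lower() else ABSTAIN

LFS = [lf_randomly_assigned, lf_random_sequence, lf_not_randomized]

def majority_vote(sentence):
    """Aggregate labeling-function votes; abstain if no LF fires or votes tie."""
    votes = [v for v in (lf(sentence) for lf in LFS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = {lab: votes.count(lab) for lab in set(votes)}
    best = max(counts, key=counts.get)
    if list(counts.values()).count(counts[best]) > 1:  # tie between labels
        return ABSTAIN
    return best
```

Sentences labeled this way can be pooled with the gold-annotated data to enlarge the training set, which is exactly where the study found the gains to be limited in the multi-label setting.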
Affiliation(s)
- Linh Hoang: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Lan Jiang: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Halil Kilicoglu: School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA
8. Schmidt L, Finnerty Mutlu AN, Elmore R, Olorisade BK, Thomas J, Higgins JPT. Data extraction methods for systematic review (semi)automation: update of a living systematic review. F1000Res 2021; 10:401. PMID: 34408850. PMCID: PMC8361807. DOI: 10.12688/f1000research.51117.2.
Abstract
Background: The reliable and usable (semi)automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of the included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies. Methods: We systematically and continually search PubMed, ACL Anthology, arXiv, OpenAlex via EPPI-Reviewer, and the dblp computer science bibliography. Full-text screening and data extraction are conducted within an open-source living systematic review application created for the purpose of this review. This living review update includes publications up to December 2022 and OpenAlex content up to March 2023. Results: 76 publications are included in this review. Of these, 64 (84%) addressed extraction of data from abstracts, while 19 (25%) used full texts. A total of 71 (93%) publications developed classifiers for randomised controlled trials. Over 30 entities were extracted, with PICOs (population, intervention, comparator, outcome) being the most frequently extracted. Data are available from 25 (33%) publications, and code from 30 (39%). Six (8%) implemented publicly available tools. Conclusions: This living systematic review presents an overview of the (semi)automated data-extraction literature of interest to different types of literature review. We identified a broad evidence base of publications describing data extraction for interventional reviews and a small number of publications extracting epidemiological or diagnostic accuracy data. Between review updates, trends for sharing data and code increased strongly: in the base review, data and code were available for 13% and 19% of publications, respectively; among the 23 new publications, these figures increased to 78% and 87%. Compared with the base review, we also observed a research trend away from straightforward data extraction and towards additionally extracting relations between entities or automatic text summarisation. With this living review, we aim to review the literature continually.
Affiliation(s)
- Lena Schmidt: NIHR Innovation Observatory, Newcastle University, Newcastle upon Tyne, NE4 5TG, UK; Sciome LLC, Research Triangle Park, North Carolina, 27713, USA; Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Rebecca Elmore: Sciome LLC, Research Triangle Park, North Carolina, 27713, USA
- Babatunde K. Olorisade: Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK; Evaluate Ltd, London, SE1 2RE, UK; Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, CF5 2YB, UK
- James Thomas: UCL Social Research Institute, University College London, London, WC1H 0AL, UK