1
Dennstädt F, Zink J, Putora PM, Hastings J, Cihoric N. Title and abstract screening for literature reviews using large language models: an exploratory study in the biomedical domain. Syst Rev 2024; 13:158. [PMID: 38879534 PMCID: PMC11180407 DOI: 10.1186/s13643-024-02575-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 05/30/2024] [Indexed: 06/19/2024] Open
Abstract
BACKGROUND: Systematically screening published literature to determine the relevant publications to synthesize in a review is a time-consuming and difficult task. Large language models (LLMs) are an emerging technology with promising capabilities for the automation of language-related tasks that may be useful for such a purpose.
METHODS: LLMs were used as part of an automated system to evaluate the relevance of publications to a certain topic based on defined criteria and based on the title and abstract of each publication. A Python script was created to generate structured prompts consisting of text strings for instruction, title, abstract, and relevant criteria to be provided to an LLM. The relevance of a publication was evaluated by the LLM on a Likert scale (low relevance to high relevance). By specifying a threshold, different classifiers for inclusion/exclusion of publications could then be defined. The approach was used with four different openly available LLMs on ten published data sets of biomedical literature reviews and on a newly human-created data set for a hypothetical new systematic literature review.
RESULTS: The performance of the classifiers varied depending on the LLM being used and on the data set analyzed. Regarding sensitivity/specificity, the classifiers yielded 94.48%/31.78% for the FlanT5 model, 97.58%/19.12% for the OpenHermes-NeuralChat model, 81.93%/75.19% for the Mixtral model and 97.58%/38.34% for the Platypus 2 model on the ten published data sets. The same classifiers yielded 100% sensitivity at a specificity of 12.58%, 4.54%, 62.47%, and 24.74% on the newly created data set. Changing the standard settings of the approach (minor adaption of instruction prompt and/or changing the range of the Likert scale from 1-5 to 1-10) had a considerable impact on the performance.
CONCLUSIONS: LLMs can be used to evaluate the relevance of scientific publications to a certain review topic and classifiers based on such an approach show some promising results. To date, little is known about how well such systems would perform if used prospectively when conducting systematic literature reviews and what further implications this might have. However, it is likely that in the future researchers will increasingly use LLMs for evaluating and classifying scientific publications.
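The screening approach summarized in this abstract (a structured prompt, a Likert relevance score, and a threshold that turns the score into an include/exclude decision) can be illustrated with a minimal Python sketch. This is not the authors' script: the instruction wording, the function names `build_prompt`, `include` and `sensitivity_specificity`, and the example scores are all hypothetical, and no LLM is called; the scores are supplied by hand to show how the threshold changes sensitivity and specificity.

```python
def build_prompt(title, abstract, criteria, scale_max=5):
    """Assemble one structured prompt from instruction, title, abstract and criteria.

    The instruction text is an assumption; the abstract does not give the wording.
    """
    instruction = (
        "Rate the relevance of the publication to the criteria below on a scale "
        f"from 1 (low relevance) to {scale_max} (high relevance). "
        "Answer with a single integer."
    )
    criteria_text = "\n".join(f"- {c}" for c in criteria)
    return (
        f"{instruction}\n\nTitle: {title}\n\nAbstract: {abstract}\n\n"
        f"Criteria:\n{criteria_text}"
    )


def include(score, threshold):
    """Classifier: include a publication if its Likert score reaches the threshold."""
    return score >= threshold


def sensitivity_specificity(scores, relevant, threshold):
    """Compare thresholded scores against human labels (True = relevant)."""
    tp = fn = tn = fp = 0
    for score, is_relevant in zip(scores, relevant):
        predicted = include(score, threshold)
        if is_relevant and predicted:
            tp += 1
        elif is_relevant:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical Likert scores (1-5) and human relevance labels for six records.
    scores = [5, 4, 2, 1, 3, 2]
    relevant = [True, True, False, False, True, False]
    for threshold in (2, 3, 4):
        sens, spec = sensitivity_specificity(scores, relevant, threshold)
        print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold includes more publications, which tends to raise sensitivity at the cost of specificity; this is the trade-off behind the sensitivity/specificity pairs reported in the abstract.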
Affiliation(s)
- Fabio Dennstädt
  - Department of Radiation Oncology, Cantonal Hospital of St. Gallen, St. Gallen, Switzerland
  - Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland
- Johannes Zink
  - Institute for Computer Science, University of Würzburg, Würzburg, Germany
- Paul Martin Putora
  - Department of Radiation Oncology, Cantonal Hospital of St. Gallen, St. Gallen, Switzerland
  - Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland
- Janna Hastings
  - Institute for Implementation Science in Health Care, University of Zurich, Zurich, Switzerland
  - School of Medicine, University of St. Gallen, St. Gallen, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nikola Cihoric
  - Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern, Switzerland
2
Leenaars CHC, Stafleu FR, Häger C, Nieraad H, Bleich A. A systematic review of animal and human data comparing the nasal potential difference test between cystic fibrosis and control. Sci Rep 2024; 14:9664. [PMID: 38671057 PMCID: PMC11053161 DOI: 10.1038/s41598-024-60389-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 04/23/2024] [Indexed: 04/28/2024] Open
Abstract
The nasal potential difference test (nPD) is an electrophysiological measurement which is altered in patients and animal models with cystic fibrosis (CF). Because protocols and outcomes vary substantially between laboratories, there are concerns over its validity and precision. We performed a systematic literature review (SR) of the nPD to answer the following review questions: A. "Is the nasal potential difference similarly affected in CF patients and animal models?", and B. "Is the nPD in human patients and animal models of CF similarly affected by various changes in the experimental set-up?". The review protocol was preregistered on PROSPERO (CRD42021236047). We searched PubMed and Embase with comprehensive search strings. Two independent reviewers screened all references for inclusion and extracted all data. Included were studies about CF which described in vivo nPD measurements in separate CF and control groups. Risk of bias was assessed, and three meta-analyses were performed. We included 130 references describing nPD values for CF and control subjects, which confirmed substantial variation in the experimental design and nPD outcome between groups. The meta-analyses showed a clear difference in baseline nPD values between CF and control subjects, both in animals and in humans. However, baseline nPD values were, on average, lower in animal than in human studies. Reporting of experimental details was poor for both animal and human studies, and urgently needs to improve to ensure reproducibility of experiments within and between species.
Affiliation(s)
- Frans R Stafleu
  - Department of Animals in Science and Society-Human-Animal Relationship, Utrecht University, Utrecht, The Netherlands
- Christine Häger
  - Institute for Laboratory Animal Science, Hannover Medical School, Hannover, Germany
- Hendrik Nieraad
  - Institute for Laboratory Animal Science, Hannover Medical School, Hannover, Germany
- André Bleich
  - Institute for Laboratory Animal Science, Hannover Medical School, Hannover, Germany
3
Rix A, Girbig R, Porte C, Lederle W, Leenaars C, Kiessling F. Development of a Systematic Review Protocol and a Scoping Review of Ultrasound-Induced Immune Effects in Peripheral Tumors. Mol Imaging Biol 2022; 24:288-297. [PMID: 34845660 PMCID: PMC8983530 DOI: 10.1007/s11307-021-01686-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2021] [Revised: 11/02/2021] [Accepted: 11/11/2021] [Indexed: 12/09/2022]
Abstract
PURPOSE: Publication numbers reporting that ultrasound can stimulate immune reactions in tumors steadily increase. However, the presented data are partially conflicting, and mechanisms are difficult to identify from single publications. These shortcomings can be addressed by a systematic review and meta-analysis of current literature. As a first step, we here present the methodology and protocol for a systematic review to answer the following research question: Does ultrasound alter the immune reaction of peripheral solid tumors in humans and animals compared to control conditions without ultrasound?
PROCEDURES: We designed a protocol to perform a systematic review and meta-analysis. The suitability of the protocol to detect and sort relevant literature was tested using a subset of publications. We extracted study characteristics, ultrasound parameters, and study outcomes to pre-evaluate the differences between publications and present the data as a scoping review.
RESULTS: From 6532 publications detected by our preliminary literature search, 320 were selected for testing our systematic review protocol. Of the latter, 15 publications were eligible for data extraction. There, we found large differences between study characteristics (e.g., tumor type, age) and ultrasound settings (e.g., wavelength 0.5-9.5 MHz, acoustic pressure 0.0001-15,000 W/cm²). Finally, study outcomes included reports on cells of the innate (e.g., dendritic cells, macrophages) and adaptive immune system (e.g., CD8-/CD4-positive T cells).
CONCLUSION: We designed a protocol to identify relevant literature and perform a systematic review and meta-analysis. The differences between extracted features between publications show the necessity for a comprehensive search and selection strategy in the systematic review to get a complete overview of the literature. Meta-analyses of the extracted outcomes can then enable evidence-based conclusions.
Affiliation(s)
- Anne Rix
  - Institute for Experimental Molecular Imaging, Medical Faculty, RWTH Aachen International University, Aachen, Germany
- Renée Girbig
  - Institute for Experimental Molecular Imaging, Medical Faculty, RWTH Aachen International University, Aachen, Germany
- Céline Porte
  - Institute for Experimental Molecular Imaging, Medical Faculty, RWTH Aachen International University, Aachen, Germany
- Wiltrud Lederle
  - Institute for Experimental Molecular Imaging, Medical Faculty, RWTH Aachen International University, Aachen, Germany
- Cathalijn Leenaars
  - Department for Health Evidence, Radboud Institute for Health Sciences, Radboud University Medical Centre, 6525 GA, Nijmegen, The Netherlands
  - Department of Population Health Science, Unit Animals in Science and Society, Utrecht University, 3508 TD, Utrecht, The Netherlands
  - Institute for Laboratory Animal Science, Hannover Medical School, 30625, Hannover, Germany
- Fabian Kiessling
  - Institute for Experimental Molecular Imaging, Medical Faculty, RWTH Aachen International University, Aachen, Germany
4
van der Naald M, Chamuleau SAJ, Menon JML, de Leeuw W, de Haan J, Duncker DJ, Wever KE. Preregistration of animal research protocols: development and 3-year overview of preclinicaltrials.eu. BMJ OPEN SCIENCE 2022; 6:e100259. [PMID: 35372701 PMCID: PMC8928250 DOI: 10.1136/bmjos-2021-100259] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022] Open
Abstract
Open, prospective registration of a study protocol can improve research rigour in a number of ways. Through preregistration, key features of the study’s methodology are recorded and maintained as a permanent record, enabling comparison of the completed study with what was planned. By recording the study hypothesis and planned outcomes a priori, preregistration creates transparency and can reduce the risk of several common biases, such as hypothesising after results are known and outcome switching or selective outcome reporting. Second, preregistration raises awareness of measures to reduce bias, such as randomisation and blinding. Third, preregistration provides a comprehensive listing of planned studies, which can prevent unnecessary duplication and reduce publication bias. Although commonly acknowledged and applied in clinical research since 2000, preregistration of animal studies is not yet the norm. In 2018 we launched the first dedicated, open, online register for animal study protocols: www.preclinicaltrials.eu. Here, we provide insight in the development of preclinicaltrials.eu (PCT) and evaluate its use during the first 3 years after its launch. Furthermore, we elaborate on ongoing developments such as the rise of comparable registries, increasing support for preregistration in the Netherlands (which led to the funding of PCT by the Dutch government) and pilots of mandatory preregistration by several funding bodies. We show the international coverage of currently registered protocols but with the overall low number of (pre)registered protocols.
Affiliation(s)
- Mira van der Naald
  - Department of Cardiology, University Medical Center Utrecht, Utrecht, The Netherlands
- Steven A J Chamuleau
  - Department of Cardiology, Amsterdam UMC Locatie AMC, Amsterdam, North Holland, The Netherlands
  - Netherlands Heart Institute, Utrecht, The Netherlands
- Wim de Leeuw
  - Animal Welfare Body Utrecht, Utrecht, The Netherlands
- Judith de Haan
  - Open Science Programme, Utrecht University, Utrecht, The Netherlands
- Dirk J Duncker
  - Department of Cardiology, Thoraxcenter, Erasmus Medical Center, Rotterdam, Zuid-Holland, The Netherlands
- Kimberley Elaine Wever
  - Systematic Review Centre for Laboratory Animal Experimentation (SYRCLE), Department for Health Evidence, Radboud University Medical Center, Nijmegen, The Netherlands
  - Department of Anesthesiology, Radboud University Medical Center, Nijmegen, The Netherlands
5
Ye X, Gu Y, Bai Y, Xia S, Zhang Y, Lou Y, Zhu Y, Dai Y, Tsoi JKH, Wang S. Does Low-Magnitude High-Frequency Vibration (LMHFV) Worth for Clinical Trial on Dental Implant? A Systematic Review and Meta-Analysis on Animal Studies. Front Bioeng Biotechnol 2021; 9:626892. [PMID: 33987172 PMCID: PMC8111077 DOI: 10.3389/fbioe.2021.626892] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/23/2020] [Accepted: 03/29/2021] [Indexed: 01/19/2023] Open
Abstract
Being as a non-pharmacological medical intervention, low-magnitude high-frequency vibration (LMHFV) has shown a positive effect on bone induction and remodeling for various muscle diseases in animal studies, among which dental implants osteointegration were reported to be improved as well. However, whether LMHFV can be clinically used in dental implant is still unknown. In this study, efficacy, parameters and side effects of LMHFV were analyzed via data before 15th July 2020, collecting from MEDLINE/PubMed, Embase, Ovid and Cochrane Library databases. In the screened 1,742 abstracts and 45 articles, 15 animal studies involving 972 implants were included. SYRCLE's tool was performed to assess the possible risk of bias for each study. The GRADE approach was applied to evaluate the quality of evidence. Random effects meta-analysis detected statistically significant in total BIC (P < 0.0001) and BV/TV (P = 0.001) upon loading LMHFV on implants. To conclude, LMHFV played an active role on BIC and BV/TV data according to the GRADE analysis results (medium and low quality of evidence). This might illustrate LMHFV to be a worthy way in improving osseointegration clinically, especially for osteoporosis. Systematic Review Registration: https://www.crd.york.ac.uk/PROSPERO, identifier: NCT02612389
Affiliation(s)
- Xinjian Ye
  - School of Stomatology, Zhejiang Chinese Medical University, Hangzhou, China
- Ying Gu
  - Applied Oral Sciences and Community Dental Care, Faculty of Dentistry, The University of Hong Kong, Pokfulam, Hong Kong
- Yijing Bai
  - The First Clinical Medical College, Zhejiang Chinese Medical University, Hangzhou, China
- Siqi Xia
  - School of Stomatology, Zhejiang Chinese Medical University, Hangzhou, China
- Yujia Zhang
  - School of Stomatology, Zhejiang Chinese Medical University, Hangzhou, China
- Yuwei Lou
  - School of Stomatology, Zhejiang Chinese Medical University, Hangzhou, China
- Yuchi Zhu
  - School of Stomatology, Zhejiang Chinese Medical University, Hangzhou, China
- Yuwei Dai
  - School of Stomatology, Zhejiang Chinese Medical University, Hangzhou, China
- James Kit-Hon Tsoi
  - Applied Oral Sciences and Community Dental Care, Faculty of Dentistry, The University of Hong Kong, Pokfulam, Hong Kong
- Shuhua Wang
  - School of Stomatology, Zhejiang Chinese Medical University, Hangzhou, China
  - Hospital of Stomatology, Zhejiang Chinese Medical University, Hangzhou, China
6
Leenaars C, Tsaioun K, Stafleu F, Rooney K, Meijboom F, Ritskes-Hoitinga M, Bleich A. Reviewing the animal literature: how to describe and choose between different types of literature reviews. Lab Anim 2021; 55:129-141. [PMID: 33135562 PMCID: PMC8044607 DOI: 10.1177/0023677220968599] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/09/2020] [Accepted: 10/01/2020] [Indexed: 12/15/2022]
Abstract
Before starting any (animal) research project, review of the existing literature is good practice. From both the scientific and the ethical perspective, high-quality literature reviews are essential. Literature reviews have many potential advantages besides synthesising the evidence for a research question. First, they can show if a proposed study has already been performed, preventing redundant research. Second, when planning new experiments, reviews can inform the experimental design, thereby increasing the reliability, relevance and efficiency of the study. Third, reviews may even answer research questions using already available data. Multiple definitions of the term literature review co-exist. In this paper, we describe the different steps in the review process, and the risks and benefits of using various methodologies in each step. We then suggest common terminology for different review types: narrative reviews, mapping reviews, scoping reviews, rapid reviews, systematic reviews and umbrella reviews. We recommend which review to select, depending on the research question and available resources. We believe that improved understanding of review methods and terminology will prevent ambiguity and increase appropriate interpretation of the conclusions of reviews.
Affiliation(s)
- Cathalijn Leenaars
  - Institute for Laboratory Animal Science, Hannover Medical School, Germany
  - Department of Animals in Science and Society, Utrecht University, the Netherlands
- Katya Tsaioun
  - Evidence-based Toxicology Collaboration, Johns Hopkins Bloomberg School of Public Health (EBTC), USA
- Frans Stafleu
  - Department of Animals in Science and Society, Utrecht University, the Netherlands
- Kieron Rooney
  - Charles Perkins Centre, Faculty of Medicine and Health, University of Sydney, Australia
- Franck Meijboom
  - Department of Animals in Science and Society, Utrecht University, the Netherlands
- Merel Ritskes-Hoitinga
  - SYRCLE, Department for Health Evidence (section HTA), Radboud Institute for Health Sciences, The Netherlands
  - AUGUST, Department for Clinical Medicine, Aarhus University, Denmark
- André Bleich
  - Institute for Laboratory Animal Science, Hannover Medical School, Germany
7
Gosselin RD. Insufficient transparency of statistical reporting in preclinical research: a scoping review. Sci Rep 2021; 11:3335. [PMID: 33558615 PMCID: PMC7870941 DOI: 10.1038/s41598-021-83006-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/08/2020] [Accepted: 01/26/2021] [Indexed: 12/14/2022] Open
Abstract
Non-transparent statistical reporting contributes to the reproducibility crisis in life sciences, despite guidelines and educational articles regularly published. Envisioning more effective measures for ensuring transparency requires the detailed monitoring of incomplete reporting in the literature. In this study, a systematic approach was used to sample 16 periodicals from the ISI Journal Citation Report database and to collect 233 preclinical articles (including both in vitro and animal research) from online journal content published in 2019. Statistical items related to the use of location tests were quantified. Results revealed that a large proportion of articles insufficiently describe tests (median 44.8%, IQR [33.3–62.5%], k = 16 journals), software (31%, IQR [22.3–39.6%]) or sample sizes (44.2%, IQR [35.7–55.4%]). The results further point at contradictory information as a component of poor reporting (18.3%, IQR [6.79–26.7%]). No detectable correlation was found between journal impact factor and the quality of statistical reporting of any studied item. The under-representation of open-source software (4.50% of articles) suggests that the provision of code should remain restricted to articles that use such packages. Since mounting evidence indicates that transparency is key for reproducible science, this work highlights the need for a more rigorous enforcement of existing guidelines.
Affiliation(s)
- Romain-Daniel Gosselin
  - Precision Medicine Unit, Lausanne University Hospital, Chemin des Roches 1a/1b, 1010, Lausanne, Switzerland
8
Pound P. Are Animal Models Needed to Discover, Develop and Test Pharmaceutical Drugs for Humans in the 21st Century? Animals (Basel) 2020; 10:ani10122455. [PMID: 33371480 PMCID: PMC7767523 DOI: 10.3390/ani10122455] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/03/2020] [Accepted: 12/20/2020] [Indexed: 12/13/2022] Open
Abstract
Despite many decades of research, much of which has focused on studies in animals, we humans continue to suffer from multiple diseases for which there are no cures or treatments [...].
Affiliation(s)
- Pandora Pound
  - Safer Medicines Trust, P.O. Box 122, Kingsbridge TQ7 9AX, UK
9
A Systematic Review Comparing Experimental Design of Animal and Human Methotrexate Efficacy Studies for Rheumatoid Arthritis: Lessons for the Translational Value of Animal Studies. Animals (Basel) 2020; 10:ani10061047. [PMID: 32560528 PMCID: PMC7341304 DOI: 10.3390/ani10061047] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/12/2020] [Revised: 06/11/2020] [Accepted: 06/12/2020] [Indexed: 12/15/2022] Open
Abstract
Simple Summary: If we want to use animal studies to predict what will happen if we give a drug to humans, it makes sense to perform the animal studies as similarly to human studies as possible. For example, if animal tests of a drug only look at the effect of injecting the drug in young healthy animals, we cannot expect the results to be similar in human tests giving tablets to older patients who may have other diseases besides the one for which they receive the drug. We did an in-depth analysis of how 147 animal and 512 human studies of the drug methotrexate for rheumatoid arthritis were performed. Important differences were present, for example, animal studies used more males, while rheumatoid arthritis occurs more in females. We calculated the human-equivalent age of the animals, and they were on average younger than humans. Many studies did not fully report the way the experiments were performed. In spite of these differences, the drug methotrexate works well against rheumatoid arthritis in animal models and humans. Further (literature) research is still needed; we do not yet understand when we can reliably predict human effects from animal studies.
Abstract: Increased awareness and understanding of current practices in translational research is required for informed decision making in drug development. This paper describes a systematic review of methotrexate for rheumatoid arthritis, comparing trial design between 147 animal and 512 human studies. Animal studies generally included fewer subjects than human studies, and less frequently reported randomisation and blinding. In relation to life span, study duration was comparable for animals and humans, but included animals were younger than included humans. Animal studies often comprised males only (61%), human studies always included females (98% included both sexes). Power calculations were poorly reported in both samples. Analyses of human studies more frequently comprised Chi-square tests, those of animal studies more frequently reported analyses of variance. Administration route was more variable, and more frequently reported in animal than human studies. Erythrocyte sedimentation rate and c-reactive protein were analysed more frequently in human than in animal studies. To conclude, experimental designs for animal and human studies are not optimally aligned. However, methotrexate is effective in treating rheumatoid arthritis in animal models and humans. Further evaluation of the available evidence in other research fields is needed to increase the understanding of translational success before we can optimise translational strategies.