Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Gates A, Johnson C, Hartling L. Technology-assisted title and abstract screening for systematic reviews: a retrospective evaluation of the Abstrackr machine learning tool. Syst Rev 2018;7:45. [PMID: 29530097 PMCID: PMC5848519 DOI: 10.1186/s13643-018-0707-8] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Indexed: 11/17/2022] Open

Cited by Other Article(s)

Lokker C, Abdelkader W, Bagheri E, Parrish R, Cotoi C, Navarro T, Germini F, Linkins LA, Haynes RB, Chu L, Afzal M, Iorio A. Boosting efficiency in a clinical literature surveillance system with LightGBM. PLOS Digital Health 2024;3:e0000299. [PMID: 39312500 PMCID: PMC11419392 DOI: 10.1371/journal.pdig.0000299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Accepted: 08/14/2024] [Indexed: 09/25/2024]

Abstract

Given the suboptimal performance of Boolean searching to identify methodologically sound and clinically relevant studies in large bibliographic databases, exploring machine learning (ML) to efficiently classify studies is warranted. To boost the efficiency of a literature surveillance program, we used a large internationally recognized dataset of articles tagged for methodological rigor and applied an automated ML approach to train and test binary classification models to predict the probability of clinical research articles being of high methodologic quality. We trained over 12,000 models on a dataset of titles and abstracts of 97,805 articles indexed in PubMed from 2012-2018 which were manually appraised for rigor by highly trained research associates and rated for clinical relevancy by practicing clinicians. As the dataset is unbalanced, with more articles that do not meet the criteria for rigor, we used the unbalanced dataset and over- and under-sampled datasets. Models that maintained sensitivity for high rigor at 99% and maximized specificity were selected and tested in a retrospective set of 30,424 articles from 2020 and validated prospectively in a blinded study of 5253 articles. The final selected algorithm, combining a LightGBM (gradient boosting machine) model trained in each dataset, maintained high sensitivity and achieved 57% specificity in the retrospective validation test and 53% in the prospective study. The number of articles needed to read to find one that met appraisal criteria was 3.68 (95% CI 3.52 to 3.85) in the prospective study, compared with 4.63 (95% CI 4.50 to 4.77) when relying only on Boolean searching. Gradient-boosting ML models reduced the work required to classify high quality clinical research studies by 45%, improving the efficiency of literature surveillance and subsequent dissemination to clinicians and other evidence users.
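The selection rule described in this abstract (hold sensitivity at or above a floor, then maximize specificity) and the number-needed-to-read figure can be sketched as below. This is a minimal illustration, not the authors' pipeline: the function names `pick_cutoff` and `number_needed_to_read`, the toy scores, and the reading of number needed to read as flagged articles per true positive (1 / precision) are assumptions.

```python
def pick_cutoff(scores, labels, min_sensitivity=0.99):
    """Return (cutoff, sensitivity, specificity) for the cutoff that keeps
    sensitivity >= min_sensitivity and has the highest specificity.

    Assumes labels are 0/1 and that both classes are present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(scores)):
        # An article is flagged as "high rigor" when its score >= cutoff.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sensitivity = tp / positives
        specificity = tn / negatives
        if sensitivity >= min_sensitivity and (best is None or specificity > best[2]):
            best = (cutoff, sensitivity, specificity)
    return best


def number_needed_to_read(true_positives, false_positives):
    """Flagged articles read per article that meets criteria (1 / precision)."""
    return (true_positives + false_positives) / true_positives


# Toy data (illustrative only, not from the study).
scores = [0.10, 0.20, 0.30, 0.40, 0.35, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
cutoff, sens, spec = pick_cutoff(scores, labels, min_sensitivity=0.99)
nnr = number_needed_to_read(true_positives=4, false_positives=1)
```

On the toy data the chosen cutoff keeps all four positive articles and lets one negative through, so a reader would go through five flagged articles to find four that qualify.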

Affiliation(s)

Cynthia Lokker: Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
Wael Abdelkader: Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
Elham Bagheri: Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
Rick Parrish: Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
Chris Cotoi: Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
Tamara Navarro: Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
Federico Germini: Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada; Department of Medicine, McMaster University, Hamilton, Ontario, Canada
Lori-Ann Linkins: Department of Medicine, McMaster University, Hamilton, Ontario, Canada
R. Brian Haynes: Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada; Department of Medicine, McMaster University, Hamilton, Ontario, Canada
Lingyang Chu: Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada
Muhammad Afzal: School of Computing and Digital Technology, Birmingham City University, Birmingham, United Kingdom
Alfonso Iorio: Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada; Department of Medicine, McMaster University, Hamilton, Ontario, Canada

Ramírez SI, Partin M, Snyder AH, Ko E, Aruma J, Castaneda MC, Casas RS. A Scoping Review of Obstetrics and Gynecology Curricula in Primary Care Residency Programs. J Gen Intern Med 2024:10.1007/s11606-024-08987-1. [PMID: 39187722 DOI: 10.1007/s11606-024-08987-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 07/30/2024] [Indexed: 08/28/2024]

Abstract

BACKGROUND

While Women's Health (WH) is a priority for primary care (Family Medicine (FM), Internal Medicine (IM), Pediatrics (Peds), and combined Medicine/Pediatrics (Med/Peds)), residency curricula remain heterogeneous, with deficits in graduates' WH expertise and skills. The overall objective of this study was to assess the quality of WH curricula at primary care residency programs in the United States (US), with a focus on topics in obstetrics and gynecology (OBGYN).

METHODS

PubMed®, ERIC, The Cochrane Library, MedEdPORTAL, and professional organization websites were systematically searched in 2019 and updated in 2021. Included studies described OBGYN educational curricula in US primary care residency programs. Following abstract screening and full-text review, data from eligible studies were abstracted and quality was assessed using the Medical Education Research Study Quality Instrument (MERSQI).

RESULTS

A total of 109 studies met the inclusion criteria. Over a quarter of studies were interdepartmental or interdisciplinary. The most common single-department studies were IM (38%) and FM (26%). Twenty (25%) studies addressed comprehensive OBGYN curricula; the most common individual topics were cervical and breast cancer screening (31%) and contraception (16%). Most studies utilized multiple instructional modalities, most commonly didactics (54%), clinical experiences (41%), and/or simulation (21%). Most studies included self-reported outcomes by residents (70%), with few (11%) reporting higher-level assessments (i.e., patient or clinical outcomes). The most common design was single-group pre- and post-test (42%), with few randomized controlled trials (4%). The mean MERSQI score for studies with sufficient data (90%) was 9.8 (range 3 to 15.5).

DISCUSSION

OBGYN educational curricula for primary care trainees in the US were varied, with gaps in represented residents, content, assessments, and study quality.

Matsui K, Utsumi T, Aoki Y, Maruki T, Takeshima M, Takaesu Y. Human-Comparable Sensitivity of Large Language Models in Identifying Eligible Studies Through Title and Abstract Screening: 3-Layer Strategy Using GPT-3.5 and GPT-4 for Systematic Reviews. J Med Internet Res 2024;26:e52758. [PMID: 39151163 PMCID: PMC11364944 DOI: 10.2196/52758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 03/10/2024] [Accepted: 06/25/2024] [Indexed: 08/18/2024] Open

Abstract

BACKGROUND

The screening process for systematic reviews is resource-intensive. Although previous machine learning solutions have reported reductions in workload, they risked excluding relevant papers.

OBJECTIVE

We evaluated the performance of a 3-layer screening method using GPT-3.5 and GPT-4 to streamline the title and abstract-screening process for systematic reviews. Our goal is to develop a screening method that maximizes sensitivity for identifying relevant records.

METHODS

We conducted screenings on 2 of our previous systematic reviews related to the treatment of bipolar disorder, with 1381 records from the first review and 3146 from the second. Screenings were conducted using GPT-3.5 (gpt-3.5-turbo-0125) and GPT-4 (gpt-4-0125-preview) across three layers: (1) research design, (2) target patients, and (3) interventions and controls. The 3-layer screening was conducted using prompts tailored to each study. During this process, information extraction according to each study's inclusion criteria and optimization for screening were carried out using a GPT-4-based flow without manual adjustments. Records were evaluated at each layer, and those meeting the inclusion criteria at all layers were subsequently judged as included.
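The rule that a record is included only when it meets the criteria at every layer can be sketched as follows. The three judge functions are hypothetical keyword checks standing in for the GPT prompts, which the abstract does not reproduce; only the combination logic is taken from the text.

```python
from typing import Callable, List

# Each layer maps a title-and-abstract string to include (True) or exclude (False).
Layer = Callable[[str], bool]


def judge_design(text: str) -> bool:
    # Hypothetical stand-in for layer 1 (research design).
    return "randomized" in text.lower()


def judge_patients(text: str) -> bool:
    # Hypothetical stand-in for layer 2 (target patients).
    return "bipolar" in text.lower()


def judge_intervention(text: str) -> bool:
    # Hypothetical stand-in for layer 3 (interventions and controls).
    lowered = text.lower()
    return "lithium" in lowered or "placebo" in lowered


LAYERS: List[Layer] = [judge_design, judge_patients, judge_intervention]


def screen(text: str, layers: List[Layer] = LAYERS) -> bool:
    """Include a record only if every layer judges it as meeting the criteria."""
    return all(layer(text) for layer in layers)


# Toy records (illustrative only).
records = [
    "A randomized trial of lithium versus placebo in bipolar disorder",
    "A cohort study of sleep quality in bipolar disorder",
]
decisions = [screen(r) for r in records]
```

Note that `all` stops at the first failing layer; a workflow that needs every layer's judgment recorded would evaluate each layer explicitly before combining them.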

RESULTS

On each layer, both GPT-3.5 and GPT-4 were able to process about 110 records per minute, and the total time required for screening the first and second studies was approximately 1 hour and 2 hours, respectively. In the first study, the sensitivities/specificities of the GPT-3.5 and GPT-4 were 0.900/0.709 and 0.806/0.996, respectively. Both screenings by GPT-3.5 and GPT-4 judged all 6 records used for the meta-analysis as included. In the second study, the sensitivities/specificities of the GPT-3.5 and GPT-4 were 0.958/0.116 and 0.875/0.855, respectively. The sensitivities for the relevant records align with those of human evaluators: 0.867-1.000 for the first study and 0.776-0.979 for the second study. Both screenings by GPT-3.5 and GPT-4 judged all 9 records used for the meta-analysis as included. After accounting for justifiably excluded records by GPT-4, the sensitivities/specificities of the GPT-4 screening were 0.962/0.996 in the first study and 0.943/0.855 in the second study. Further investigation indicated that the cases incorrectly excluded by GPT-3.5 were due to a lack of domain knowledge, while the cases incorrectly excluded by GPT-4 were due to misinterpretations of the inclusion criteria.

CONCLUSIONS

Our 3-layer screening method with GPT-4 demonstrated an acceptable level of sensitivity and specificity that supports its practical application in systematic review screenings. Future research should aim to generalize this approach and explore its effectiveness in diverse settings, both medical and nonmedical, to fully establish its use and operational feasibility.

Oami T, Okada Y, Sakuraya M, Fukuda T, Shime N, Nakada TA. Efficiency and Workload Reduction of Semi-automated Citation Screening Software for Creating Clinical Practice Guidelines: A Prospective Observational Study. J Epidemiol 2024;34:380-386. [PMID: 38105001 PMCID: PMC11230876 DOI: 10.2188/jea.je20230227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 11/26/2023] [Indexed: 12/19/2023] Open

Alsanea S, Alkofide H, Almadi B, Almohammed O, Alwhaibi A, Alrabiah Z, Kalagi N. Liraglutide's Effect on Weight Management in Subjects With Pre-diabetes: A Systematic Review & Meta-Analysis. Endocr Pract 2024;30:737-745. [PMID: 38782201 DOI: 10.1016/j.eprac.2024.05.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 05/02/2024] [Accepted: 05/05/2024] [Indexed: 05/25/2024]

Abstract

BACKGROUND

Despite the growing literature, the effectiveness of liraglutide in weight management among individuals with prediabetes and in preventing the disease remains controversial. This study aims to critically evaluate the extent of liraglutide's impact on weight management in this population and assess the heterogeneity among extant studies.

METHODS

A systematic literature search was conducted across MEDLINE, Embase, ClinicalTrials.gov, and the reference lists of retrieved studies to identify eligible English language randomized controlled trials evaluating liraglutide's effect on weight in individuals with pre-diabetes. Non-randomized studies, studies not reporting relevant outcomes, and those conducted on patients with type 2 diabetes were excluded from this review. Outcomes included a change from baseline in absolute body weight in kg, body mass index (BMI), waist circumference, glycosylated hemoglobin (HbA1c), and low-density lipoprotein cholesterol levels. Additional safety outcomes were also reported. Data were analyzed using R statistical software version 4.3.1. A fixed-effect model was used when pooling crude numbers for study outcomes. Moreover, a sensitivity analysis using a random-effects model was performed, and heterogeneity was assessed using the I² statistic.
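The pooling approach named in these methods (a fixed-effect model with I² as the heterogeneity measure) can be illustrated with a short sketch. The review itself used R 4.3.1; the Python below only illustrates inverse-variance fixed-effect pooling, Cochran's Q, and the I² formula, on the assumption that the fixed-effect model was inverse-variance weighted. The function name and the input numbers are made up and are not the trial data.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2 (percent)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of each study from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, ci, q, i_squared


# Made-up mean differences (kg) and standard errors, for illustration only.
pooled, ci, q, i2 = fixed_effect_pool([-1.0, -3.0], [1.0, 1.0])
```

With two equally weighted toy studies the pooled estimate is their simple average, and half of the observed variation is attributed to heterogeneity.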

RESULTS

Five eligible studies were included, with a total of 1604 subjects in the liraglutide arm and 859 subjects in the control arm. Participants exposed to liraglutide showed a decrease in body weight (mean difference [MD] = -4.95 kg; 95% CI -5.16, -4.73; I² = 93%), BMI (MD = -2.06 kg/m²; 95% CI -2.22, -1.89; I² = 97%), waist circumference (MD = -4.61 cm; 95% CI -4.79, -4.43; I² = 82%), HbA1c (MD = -0.33%; 95% CI -0.34, -0.31; I² = 100%), and low-density lipoprotein cholesterol levels (MD = -0.36 mmol/L; 95% CI -0.39, -0.33; I² = 99%). The overall effect size remained similar when using a random-effects model for all outcomes. In addition, the rate of adverse events was higher with liraglutide when compared to the control; however, the dropout rates were relatively lower in the former arm.

CONCLUSION

While our meta-analysis suggests that liraglutide can reduce body weight, BMI, waist circumference, and HbA1c levels in individuals with pre-diabetes, the findings should be interpreted cautiously due to limitations such as the small number of trials and their short duration, and variability in dosages. Further randomized controlled trials examining long-term outcomes are essential to validate these findings and address the high heterogeneity among the studies included in this analysis.

Tóth B, Berek L, Gulácsi L, Péntek M, Zrubka Z. Automation of systematic reviews of biomedical literature: a scoping review of studies indexed in PubMed. Syst Rev 2024;13:174. [PMID: 38978132 PMCID: PMC11229257 DOI: 10.1186/s13643-024-02592-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 06/20/2024] [Indexed: 07/10/2024] Open

Abstract

BACKGROUND

The demand for high-quality systematic literature reviews (SRs) for evidence-based medical decision-making is growing. SRs are costly and require the scarce resource of highly skilled reviewers. Automation technology has been proposed to save workload and expedite the SR workflow. We aimed to provide a comprehensive overview of SR automation studies indexed in PubMed, focusing on the applicability of these technologies in real world practice.

METHODS

In November 2022, we extracted, combined, and ran an integrated PubMed search for SRs on SR automation. Full-text English peer-reviewed articles were included if they reported studies on SR automation methods (SSAM), or automated SRs (ASR). Bibliographic analyses and knowledge-discovery studies were excluded. Record screening was performed by single reviewers, and the selection of full text papers was performed in duplicate. We summarized the publication details, automated review stages, automation goals, applied tools, data sources, methods, results, and Google Scholar citations of SR automation studies.

RESULTS

From 5321 records screened by title and abstract, we included 123 full text articles, of which 108 were SSAM and 15 ASR. Automation was applied for search (19/123, 15.4%), record screening (89/123, 72.4%), full-text selection (6/123, 4.9%), data extraction (13/123, 10.6%), risk of bias assessment (9/123, 7.3%), evidence synthesis (2/123, 1.6%), assessment of evidence quality (2/123, 1.6%), and reporting (2/123, 1.6%). Multiple SR stages were automated by 11 (8.9%) studies. The performance of automated record screening varied largely across SR topics. In published ASR, we found examples of automated search, record screening, full-text selection, and data extraction. In some ASRs, automation fully complemented manual reviews to increase sensitivity rather than to save workload. Reporting of automation details was often incomplete in ASRs.

CONCLUSIONS

Automation techniques are being developed for all SR stages, but with limited real-world adoption. Most SR automation tools target single SR stages, with modest time savings for the entire SR process and varying sensitivity and specificity across studies. Therefore, the real-world benefits of SR automation remain uncertain. Standardizing the terminology, reporting, and metrics of study reports could enhance the adoption of SR automation techniques in real-world practice.

Oami T, Okada Y, Nakada TA. Performance of a Large Language Model in Screening Citations. JAMA Netw Open 2024;7:e2420496. [PMID: 38976267 PMCID: PMC11231796 DOI: 10.1001/jamanetworkopen.2024.20496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 05/06/2024] [Indexed: 07/09/2024] Open

Abstract

Importance

Large language models (LLMs) are promising as tools for citation screening in systematic reviews. However, their applicability has not yet been determined.

Objective

To evaluate the accuracy and efficiency of an LLM in title and abstract literature screening.

Design, Setting, and Participants

This prospective diagnostic study used the data from the title and abstract screening process for 5 clinical questions (CQs) in the development of the Japanese Clinical Practice Guidelines for Management of Sepsis and Septic Shock. The LLM decided to include or exclude citations based on the inclusion and exclusion criteria in terms of patient, population, problem; intervention; comparison; and study design of the selected CQ and was compared with the conventional method for title and abstract screening. This study was conducted from January 7 to 15, 2024.

Exposures

LLM (GPT-4 Turbo)-assisted citation screening or the conventional method.

Main Outcomes and Measures

The sensitivity and specificity of the LLM-assisted screening process were calculated, and the full-text screening result using the conventional method was set as the reference standard in the primary analysis. Pooled sensitivity and specificity were also estimated, and screening times of the 2 methods were compared.

Results

In the conventional citation screening process, 8 of 5634 publications in CQ 1, 4 of 3418 in CQ 2, 4 of 1038 in CQ 3, 17 of 4326 in CQ 4, and 8 of 2253 in CQ 5 were selected. In the primary analysis of 5 CQs, LLM-assisted citation screening demonstrated an integrated sensitivity of 0.75 (95% CI, 0.43 to 0.92) and specificity of 0.99 (95% CI, 0.99 to 0.99). Post hoc modifications to the command prompt improved the integrated sensitivity to 0.91 (95% CI, 0.77 to 0.97) without substantially compromising specificity (0.98 [95% CI, 0.96 to 0.99]). Additionally, LLM-assisted screening was associated with reduced time for processing 100 studies (1.3 minutes vs 17.2 minutes for conventional screening methods; mean difference, -15.25 minutes [95% CI, -17.70 to -12.79 minutes]).

Conclusions and Relevance

In this prospective diagnostic study investigating the performance of LLM-assisted citation screening, the model demonstrated acceptable sensitivity and reasonably high specificity with reduced processing time. This novel method could potentially enhance efficiency and reduce workload in systematic reviews.

Burns JK, Etherington C, Cheng-Boivin O, Boet S. Using an artificial intelligence tool can be as accurate as human assessors in level one screening for a systematic review. Health Info Libr J 2024;41:136-148. [PMID: 34792285 DOI: 10.1111/hir.12413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2020] [Revised: 08/19/2021] [Accepted: 10/23/2021] [Indexed: 11/29/2022]

Tran VT, Gartlehner G, Yaacoub S, Boutron I, Schwingshackl L, Stadelmaier J, Sommer I, Alebouyeh F, Afach S, Meerpohl J, Ravaud P. Sensitivity and Specificity of Using GPT-3.5 Turbo Models for Title and Abstract Screening in Systematic Reviews and Meta-analyses. Ann Intern Med 2024;177:791-799. [PMID: 38768452 DOI: 10.7326/m23-3389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024] Open

Abstract

BACKGROUND

Systematic reviews are performed manually despite the exponential growth of scientific literature.

OBJECTIVE

To investigate the sensitivity and specificity of GPT-3.5 Turbo, from OpenAI, as a single reviewer, for title and abstract screening in systematic reviews.

DESIGN

Diagnostic test accuracy study.

SETTING

Unannotated bibliographic databases from 5 systematic reviews representing 22 665 citations.

PARTICIPANTS

None.

MEASUREMENTS

A generic prompt framework to instruct GPT to perform title and abstract screening was designed. The output of the model was compared with decisions from authors under 2 rules. The first rule balanced sensitivity and specificity, for example, to act as a second reviewer. The second rule optimized sensitivity, for example, to reduce the number of citations to be manually screened.

RESULTS

Under the balanced rule, sensitivities ranged from 81.1% to 96.5% and specificities ranged from 25.8% to 80.4%. Across all reviews, GPT identified 7 of 708 citations (1%) missed by humans that should have been included after full-text screening at the cost of 10 279 of 22 665 false-positive recommendations (45.3%) that would require reconciliation during the screening process. Under the sensitive rule, sensitivities ranged from 94.6% to 99.8% and specificities ranged from 2.2% to 46.6%. Limiting manual screening to citations not ruled out by GPT could reduce the number of citations to screen from 127 of 6334 (2%) to 1851 of 4077 (45.4%), at the cost of missing from 0 to 1 of 26 citations (3.8%) at the full-text level.

LIMITATIONS

Time needed to fine-tune the prompt; the retrospective nature of the study; a convenience sample of 5 systematic reviews; and GPT performance that is sensitive to prompt development and time.

CONCLUSION

The GPT-3.5 Turbo model may be used as a second reviewer for title and abstract screening, at the cost of additional work to reconcile added false positives. It also showed potential to reduce the number of citations before screening by humans, at the cost of missing some citations at the full-text level.

PRIMARY FUNDING SOURCE

None.

Affiliation(s)

Viet-Thi Tran: Université Paris Cité and Université Sorbonne Paris Nord, Inserm, INRAe, Centre for Research in Epidemiology and Statistics (CRESS), Paris; and Centre d'Epidémiologie Clinique, Hôpital Hôtel-Dieu, AP-HP, Paris, France (V.-T.T.)
Gerald Gartlehner: Department for Evidence-based Medicine and Evaluation, University for Continuing Education Krems, Krems, Austria; and Center for Public Health Methods, RTI International, Research Triangle Park, North Carolina (G.G.)
Sally Yaacoub: Université Paris Cité and Université Sorbonne Paris Nord, Inserm, INRAe, Centre for Research in Epidemiology and Statistics (CRESS), Paris, France (S.Y., F.A.)
Isabelle Boutron: Université Paris Cité and Université Sorbonne Paris Nord, Inserm, INRAe, Centre for Research in Epidemiology and Statistics (CRESS), Paris, France; and Centre d'Epidémiologie Clinique, Hôpital Hôtel-Dieu, AP-HP, Paris, France (I.B.)
Lukas Schwingshackl: Institute for Evidence in Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany (L.S., J.S., J.M.)
Julia Stadelmaier: Institute for Evidence in Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany (L.S., J.S., J.M.)
Isolde Sommer: Department for Evidence-based Medicine and Evaluation, University for Continuing Education Krems, Krems, Austria (I.S.)
Farzaneh Alebouyeh: Université Paris Cité and Université Sorbonne Paris Nord, Inserm, INRAe, Centre for Research in Epidemiology and Statistics (CRESS), Paris, France (S.Y., F.A.)
Sivem Afach: Epidemiology in Dermatology and Evaluation of Therapeutics (EpiDermE)-EA 7379, University Paris Est Créteil Val de Marne, Créteil, France (S.A.)
Joerg Meerpohl: Institute for Evidence in Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany (L.S., J.S., J.M.)
Philippe Ravaud: Université Paris Cité and Université Sorbonne Paris Nord, Inserm, INRAe, Centre for Research in Epidemiology and Statistics (CRESS), Paris, France; Centre d'Epidémiologie Clinique, Hôpital Hôtel-Dieu, AP-HP, Paris, France; and Department of Epidemiology, Columbia University Mailman School of Public Health, New York, New York (P.R.)

Guo Q, Jiang G, Zhao Q, Long Y, Feng K, Gu X, Xu Y, Li Z, Huang J, Du L. Rapid review: A review of methods and recommendations based on current evidence. J Evid Based Med 2024;17:434-453. [PMID: 38512942 DOI: 10.1111/jebm.12594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Accepted: 02/28/2024] [Indexed: 03/23/2024]

Affiliation(s)

Qiong Guo: Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China; West China Medical Publishers, West China Hospital, Sichuan University, Chengdu, P. R. China
Guiyu Jiang: West China School of Public Health, Sichuan University, Chengdu, P. R. China
Qingwen Zhao: West China School of Public Health, Sichuan University, Chengdu, P. R. China
Youlin Long: Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China; Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China
Kun Feng: Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China; Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China
Xianlin Gu: Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China; Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China
Yihan Xu: Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China; Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China; Center for education of medical humanities, West China Hospital, Sichuan University, Chengdu, P. R. China
Zhengchi Li: Center for education of medical humanities, West China Hospital, Sichuan University, Chengdu, P. R. China
Jin Huang: Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
Liang Du: Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China; West China Medical Publishers, West China Hospital, Sichuan University, Chengdu, P. R. China; Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China

De Silva DTN, Moore BR, Strunk T, Petrovski M, Varis V, Chai K, Ng L, Batty K. Development of a pharmaceutical science systematic review process using a semi-automated machine learning tool: Intravenous drug compatibility in the neonatal intensive care setting. Pharmacol Res Perspect 2024;12:e1170. [PMID: 38204432 PMCID: PMC10782215 DOI: 10.1002/prp2.1170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 10/30/2023] [Accepted: 12/11/2023] [Indexed: 01/12/2024] Open

Guo E, Gupta M, Deng J, Park YJ, Paget M, Naugler C. Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study. J Med Internet Res 2024;26:e48996. [PMID: 38214966 PMCID: PMC10818236 DOI: 10.2196/48996] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/14/2023] [Revised: 08/30/2023] [Accepted: 09/28/2023] [Indexed: 01/13/2024] Open

Abstract

BACKGROUND

The systematic review of clinical research papers is a labor-intensive and time-consuming process that often involves the screening of thousands of titles and abstracts. The accuracy and efficiency of this process are critical for the quality of the review and subsequent health care decisions. Traditional methods rely heavily on human reviewers, often requiring a significant investment of time and resources.

OBJECTIVE

This study aims to assess the performance of the OpenAI generative pretrained transformer (GPT) and GPT-4 application programming interfaces (APIs) in accurately and efficiently identifying relevant titles and abstracts from real-world clinical review data sets and comparing their performance against ground truth labeling by 2 independent human reviewers.

METHODS

We introduce a novel workflow using the ChatGPT and GPT-4 APIs for screening titles and abstracts in clinical reviews. A Python script was created to make calls to the API with the screening criteria in natural language and a corpus of title and abstract data sets filtered by a minimum of 2 human reviewers. We compared the performance of our model against human-reviewed papers across 6 review papers, screening over 24,000 titles and abstracts.

RESULTS

Our results show an accuracy of 0.91, a macro F1-score of 0.60, a sensitivity of excluded papers of 0.91, and a sensitivity of included papers of 0.76. The interrater variability between 2 independent human screeners was κ=0.46, and the prevalence and bias-adjusted κ between our proposed methods and the consensus-based human decisions was κ=0.96. On a randomly selected subset of papers, the GPT models demonstrated the ability to provide reasoning for their decisions and corrected their initial decisions upon being asked to explain their reasoning for incorrect classifications.
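For readers less familiar with the agreement metrics reported above, the following sketch computes accuracy, macro F1, Cohen's κ, and the prevalence- and bias-adjusted κ (PABAK, which for two categories equals 2 * observed agreement - 1) from paired binary decisions. The function name and the toy labels are illustrative assumptions; this is not the study's script and it does not reproduce the study's numbers.

```python
def screening_metrics(y_true, y_pred):
    """Accuracy, macro F1, Cohen's kappa, and PABAK for include(1)/exclude(0) labels."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))

    def f1_for(cls):
        # F1 with `cls` treated as the positive class.
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    accuracy = sum(1 for t, p in pairs if t == p) / n
    macro_f1 = (f1_for(0) + f1_for(1)) / 2
    # Chance agreement from each rater's marginal include/exclude rates.
    p_true1 = sum(y_true) / n
    p_pred1 = sum(y_pred) / n
    p_chance = p_true1 * p_pred1 + (1 - p_true1) * (1 - p_pred1)
    # Kappa is undefined when chance agreement is 1; this sketch reports 1.0 then.
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 1.0
    pabak = 2 * accuracy - 1
    return accuracy, macro_f1, kappa, pabak


# Toy decisions (illustrative only): human consensus vs. model output.
acc, f1, kappa, pabak = screening_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

Because included papers are usually rare in screening data, accuracy and PABAK can look high while κ and macro F1 stay modest, which is one reason abstracts in this area tend to report several of these measures side by side.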

CONCLUSIONS

Large language models have the potential to streamline the clinical review process, save valuable time and effort for researchers, and contribute to the overall quality of clinical reviews. By prioritizing the workflow and acting as an aid rather than a replacement for researchers and reviewers, models such as GPT-4 can enhance efficiency and lead to more accurate and reliable conclusions in medical research.

Roth S, Wermer-Colan A. Machine Learning Methods for Systematic Reviews: A Rapid Scoping Review. Dela J Public Health 2023;9:40-47. [PMID: 38173960 PMCID: PMC10759980 DOI: 10.32481/djph.2023.11.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]

Abstract

Objective

At the forefront of machine learning research since its inception has been natural language processing, also known as text mining, referring to a wide range of statistical processes for analyzing textual data and retrieving information. In medical fields, text mining has made valuable contributions in unexpected ways, not least by synthesizing data from disparate biomedical studies. This rapid scoping review examines how machine learning methods for text mining can be implemented at the intersection of these disparate fields to improve the workflow and process of conducting systematic reviews in medical research and related academic disciplines.

Methods

The primary research question of this investigation was: "What impact does the use of machine learning have on the methods used by systematic review teams to carry out the systematic review process, such as the precision of search strategies, unbiased article selection or data abstraction and/or analysis for systematic reviews and other comprehensive review types of similar methodology?" A literature search was conducted by a medical librarian utilizing multiple databases, a grey literature search and handsearching of the literature. The search was completed on December 4, 2020. Handsearching was done on an ongoing basis with an end date of April 14, 2023.

Results

The search yielded 23,190 studies after duplicates were removed. As a result, 117 studies (1.70%) met eligibility criteria for inclusion in this rapid scoping review.

Conclusions

There are several techniques and/or types of machine learning methods in development or that have already been fully developed to assist with the systematic review stages. Combined with human intelligence, these machine learning methods and tools provide promise for making the systematic review process more efficient, saving valuable time for systematic review authors, and increasing the speed in which evidence can be created and placed in the hands of decision makers and the public.

Waffenschmidt S, Sieben W, Jakubeit T, Knelangen M, Overesch I, Bühn S, Pieper D, Skoetz N, Hausner E. Increasing the efficiency of study selection for systematic reviews using prioritization tools and a single-screening approach. Syst Rev 2023;12:161. [PMID: 37705060 PMCID: PMC10500815 DOI: 10.1186/s13643-023-02334-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/04/2022] [Accepted: 08/22/2023] [Indexed: 09/15/2023]

Abstract

BACKGROUND

Systematic literature screening is a key component in systematic reviews. However, this approach is resource intensive because, in general, two persons independently screen a vast number of search results (double screening). To develop approaches for increasing efficiency, we tested the use of text mining to prioritize search results as well as the involvement of only one person (single screening) in the study selection process.

METHOD

Our study is based on health technology assessments (HTAs) of drug and non-drug interventions. Using a sample size calculation, we consecutively included 11 searches resulting in 33 study selection processes. Of the three screeners for each search, two used screening tools with prioritization (Rayyan, EPPI Reviewer) and one a tool without prioritization. For each prioritization tool, we investigated the proportion of citations classified as relevant at three cut-offs or STOP criteria (after screening 25%, 50% and 75% of the citation set). For each STOP criterion, we measured sensitivity (number of correctly identified relevant studies divided by the total number of relevant studies in the study pool). In addition, we determined the number of relevant studies identified per single screening round and investigated whether missed studies were relevant to the HTA conclusion.
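The sensitivity measure at each STOP criterion described in this method can be illustrated with a short sketch. It is not the study's code: it assumes the prioritization tool's output is available as a list of 0/1 relevance labels in the order the tool presented the citations, and the function name, the rounding-up of the cut-off position, and the toy ranking are all hypothetical.

```python
import math


def sensitivity_at_cutoffs(ranked_labels, cutoffs=(0.25, 0.50, 0.75)):
    """Sensitivity after screening the first 25%, 50% and 75% of a ranked list.

    ranked_labels: 0/1 relevance labels in the order the tool presented them.
    Returns a dict mapping each cut-off to (relevant found) / (all relevant).
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("no relevant citations in the list")
    n = len(ranked_labels)
    result = {}
    for cutoff in cutoffs:
        screened = math.ceil(cutoff * n)   # assumption: round the cut-off up
        found = sum(ranked_labels[:screened])
        result[cutoff] = found / total_relevant
    return result


# Hypothetical ranking of 8 citations, 4 of them relevant.
toy_ranking = [1, 1, 0, 1, 0, 0, 0, 1]
toy_sensitivity = sensitivity_at_cutoffs(toy_ranking)
# -> {0.25: 0.5, 0.5: 0.75, 0.75: 0.75}
```

In the toy ranking, the last relevant citation sits at the very end of the list, so sensitivity stays at 0.75 even after three quarters of the set has been screened.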

RESULTS

Overall, EPPI Reviewer performed better than Rayyan and identified the vast majority (88%, Rayyan 66%) of relevant citations after screening half of the citation set. As long as additional information sources were screened, it was sufficient to apply a single-screening approach to identify all studies relevant to the HTA conclusion. Although many relevant publications (n = 63) and studies (n = 29) were incorrectly excluded, ultimately only 5 studies could not be identified at all in 2 of the 11 searches (1x 1 study, 1x 4 studies). However, their omission did not change the overall conclusion in any HTA.

CONCLUSIONS

EPPI Reviewer helped to identify relevant citations earlier in the screening process than Rayyan. Single screening would have been sufficient to identify all studies relevant to the HTA conclusion. However, this requires screening of further information sources. It also needs to be considered that the credibility of an HTA may be questioned if studies are missing, even if they are not relevant to the HTA conclusion.

Oude Wolcherink MJ, Pouwels XGLV, van Dijk SHB, Doggen CJM, Koffijberg H. Can artificial intelligence separate the wheat from the chaff in systematic reviews of health economic articles? Expert Rev Pharmacoecon Outcomes Res 2023;23:1049-1056. [PMID: 37573521 DOI: 10.1080/14737167.2023.2234639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 07/02/2023] [Indexed: 08/15/2023]

Ferdinands G, Schram R, de Bruin J, Bagheri A, Oberski DL, Tummers L, Teijema JJ, van de Schoot R. Performance of active learning models for screening prioritization in systematic reviews: a simulation study into the Average Time to Discover relevant records. Syst Rev 2023;12:100. [PMID: 37340494 PMCID: PMC10280866 DOI: 10.1186/s13643-023-02257-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/20/2020] [Accepted: 05/16/2023] [Indexed: 06/22/2023]

Abstract

BACKGROUND

Conducting a systematic review demands a significant amount of effort in screening titles and abstracts. To accelerate this process, various tools that utilize active learning have been proposed. These tools allow the reviewer to interact with machine learning software to identify relevant publications as early as possible. The goal of this study is to gain a comprehensive understanding of active learning models for reducing the workload in systematic reviews through a simulation study.

METHODS

The simulation study mimics the process of a human reviewer screening records while interacting with an active learning model. Different active learning models were compared based on four classification techniques (naive Bayes, logistic regression, support vector machines, and random forest) and two feature extraction strategies (TF-IDF and doc2vec). The performance of the models was compared for six systematic review datasets from different research areas. The evaluation of the models was based on the Work Saved over Sampling (WSS) and recall. Additionally, this study introduces two new statistics, Time to Discovery (TD) and Average Time to Discovery (ATD).

RESULTS

The models reduce the number of publications needed to screen by 63.9% to 91.7% while still finding 95% of all relevant records (WSS@95). Recall of the models was defined as the proportion of relevant records found after screening 10% of all records and ranges from 53.6% to 99.8%. The ATD values range from 1.4% to 11.7% and indicate the average proportion of labeling decisions the researcher needs to make to detect a relevant record. The ATD values display a similar ranking across the simulations as the recall and WSS values.
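The three metrics reported here can be sketched on a single simulated screening order. This is a simplified illustration, not the simulation code used in the study. It assumes the screening order is a list of 0/1 relevance labels and uses the common definition of Work Saved over Sampling, the fraction of records left unscreened when the target recall is reached minus (1 − recall). The ATD function is a simplified stand-in that averages the position at which each relevant record is found as a fraction of all records, and the published definition may differ in detail. All function names and the toy order are hypothetical.

```python
import math


def wss_at(ranked, recall=0.95):
    """Work Saved over Sampling at the given recall level."""
    total_relevant = sum(ranked)
    if total_relevant == 0:
        raise ValueError("no relevant records in the list")
    n = len(ranked)
    needed = math.ceil(recall * total_relevant)
    found = 0
    for position, label in enumerate(ranked, start=1):
        found += label
        if found >= needed:
            # Records left unscreened, minus what random screening would save.
            return (n - position) / n - (1 - recall)
    raise RuntimeError("unreachable for well-formed input")


def recall_at(ranked, fraction=0.10):
    """Proportion of relevant records found after screening `fraction` of all records."""
    screened = max(1, round(fraction * len(ranked)))
    return sum(ranked[:screened]) / sum(ranked)


def simplified_atd(ranked):
    """Mean fraction of records screened when each relevant record is found."""
    n = len(ranked)
    positions = [i for i, label in enumerate(ranked, start=1) if label]
    return sum(positions) / (len(positions) * n)


# Hypothetical screening order: 20 records, relevant ones at positions 1, 2, 4, 10.
toy_order = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1] + [0] * 10

toy_wss = wss_at(toy_order)             # approximately 0.45
toy_recall = recall_at(toy_order)       # 0.5 (2 of 4 found in the first 2 records)
toy_atd = simplified_atd(toy_order)     # approximately 0.2125
```

Unlike WSS@95 and recall@10%, the averaged discovery position does not depend on choosing a cut-off.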

CONCLUSIONS

Active learning models for screening prioritization demonstrate significant potential for reducing the workload in systematic reviews. The Naive Bayes + TF-IDF model yielded the best results overall. The Average Time to Discovery (ATD) measures performance of active learning models throughout the entire screening process without the need for an arbitrary cut-off point. This makes the ATD a promising metric for comparing the performance of different models across different datasets.

Oliveira Dos Santos Á, Sergio da Silva E, Machado Couto L, Valadares Labanca Reis G, Silva Belo V. The use of artificial intelligence for automating or semi-automating biomedical literature analyses: a scoping review. J Biomed Inform 2023;142:104389. [PMID: 37187321 DOI: 10.1016/j.jbi.2023.104389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 04/11/2023] [Accepted: 05/08/2023] [Indexed: 05/17/2023]

Abstract

OBJECTIVE

Evidence-based medicine (EBM) is a decision-making process based on the conscious and judicious use of the best available scientific evidence. However, the exponential increase in the amount of information currently available likely exceeds the capacity of human-only analysis. In this context, artificial intelligence (AI) and its branches such as machine learning (ML) can be used to facilitate human efforts in analyzing the literature to foster EBM. The present scoping review aimed to examine the use of AI in the automation of biomedical literature survey and analysis with a view to establishing the state-of-the-art and identifying knowledge gaps.

MATERIALS AND METHODS

Comprehensive searches of the main databases were performed for articles published up to June 2022 and studies were selected according to inclusion and exclusion criteria. Data were extracted from the included articles and the findings categorized.

RESULTS

The total number of records retrieved from the databases was 12,145, of which 273 were included in the review. Classification of the studies according to the use of AI in evaluating the biomedical literature revealed three main application groups, namely assembly of scientific evidence (n=127; 47%), mining the biomedical literature (n=112; 41%) and quality analysis (n=34; 12%). Most studies addressed the preparation of systematic reviews, while articles focusing on the development of guidelines and evidence synthesis were the least frequent. The biggest knowledge gap was identified within the quality analysis group, particularly regarding methods and tools that assess the strength of recommendation and consistency of evidence.

CONCLUSION

Our review shows that, despite significant progress in the automation of biomedical literature surveys and analyses in recent years, intense research is needed to fill knowledge gaps on more difficult aspects of ML, deep learning and natural language processing, and to consolidate the use of automation by end-users (biomedical researchers and healthcare professionals).

Dos Reis AHS, de Oliveira ALM, Fritsch C, Zouch J, Ferreira P, Polese JC. Usefulness of machine learning softwares to screen titles of systematic reviews: a methodological study. Syst Rev 2023;12:68. [PMID: 37061711 PMCID: PMC10105467 DOI: 10.1186/s13643-023-02231-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/14/2023] [Accepted: 04/05/2023] [Indexed: 04/17/2023]

Burgard T, Bittermann A. Reducing Literature Screening Workload With Machine Learning. ZEITSCHRIFT FUR PSYCHOLOGIE-JOURNAL OF PSYCHOLOGY 2023. [DOI: 10.1027/2151-2604/a000509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]

Tercero-Hidalgo JR, Fernández-Luna JM. [In response to «Systematic reviews in five steps»: available automation tools]. Semergen 2023;49:101828. [PMID: 36195015 DOI: 10.1016/j.semerg.2022.101828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 06/22/2022] [Indexed: 02/05/2023]

Cierco Jimenez R, Lee T, Rosillo N, Cordova R, Cree IA, Gonzalez A, Indave Ruiz BI. Machine learning computational tools to assist the performance of systematic reviews: A mapping review. BMC Med Res Methodol 2022;22:322. [PMID: 36522637 PMCID: PMC9756658 DOI: 10.1186/s12874-022-01805-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/26/2022] [Indexed: 12/23/2022]

Abstract

BACKGROUND

Within evidence-based practice (EBP), systematic reviews (SR) are considered the highest level of evidence in that they summarize the best available research and describe the progress in a determined field. Due to their methodology, SR require significant time and resources to be performed; they also require repetitive steps that may introduce biases and human errors. Machine learning (ML) algorithms therefore present a promising alternative and a potential game changer to speed up and automate the SR process. This review aims to map the current availability of computational tools that use ML techniques to assist in the performance of SR, and to support authors in the selection of the right software for the performance of evidence synthesis.

METHODS

The mapping review was based on comprehensive searches in electronic databases and software repositories to obtain relevant literature and records, followed by screening for eligibility based on titles, abstracts, and full text by two reviewers. The data extraction consisted of listing and extracting the name and basic characteristics of the included tools, for example a tool's applicability to the various SR stages, pricing options, open-source availability, and type of software. These tools were classified and graphically represented to facilitate the description of our findings.

RESULTS

A total of 9653 studies and 585 records were obtained from the structured searches performed on selected bibliometric databases and software repositories respectively. After screening, a total of 119 descriptions from publications and records allowed us to identify 63 tools that assist the SR process using ML techniques.

CONCLUSIONS

This review provides a high-quality map of currently available ML software to assist the performance of SR. ML algorithms are arguably one of the best techniques at present for the automation of SR. The most promising tools were easily accessible and included a high number of user-friendly features permitting the automation of SR and other kinds of evidence synthesis reviews.

Pradhan SK, Adnani H, Safadi R, Yerigeri K, Nayak S, Raina R, Sinha R. Cardiorenal syndrome in the pediatric population: A systematic review. Ann Pediatr Cardiol 2022;15:493-510. [PMID: 37152514 PMCID: PMC10158476 DOI: 10.4103/apc.apc_50_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Revised: 06/26/2022] [Accepted: 08/17/2022] [Indexed: 03/03/2023]

Grisales-Aguirre AM, Figueroa-Vallejo CJ. Modelado de tópicos aplicado al análisis del papel del aprendizaje automático en revisiones sistemáticas. REVISTA DE INVESTIGACIÓN, DESARROLLO E INNOVACIÓN 2022. [DOI: 10.19053/20278306.v12.n2.2022.15271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]

Tercero-Hidalgo JR, Khan KS, Bueno-Cavanillas A, Fernández-López R, Huete JF, Amezcua-Prieto C, Zamora J, Fernández-Luna JM. Artificial intelligence in COVID-19 evidence syntheses was underutilized, but impactful: a methodological study. J Clin Epidemiol 2022;148:124-134. [PMID: 35513213 PMCID: PMC9059390 DOI: 10.1016/j.jclinepi.2022.04.027] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/29/2021] [Revised: 03/09/2022] [Accepted: 04/28/2022] [Indexed: 11/24/2022]

Grbin L, Nichols P, Russell F, Fuller-Tyszkiewicz M, Olsson CA. The Development of a Living Knowledge System and Implications for Future Systematic Searching. JOURNAL OF THE AUSTRALIAN LIBRARY AND INFORMATION ASSOCIATION 2022. [DOI: 10.1080/24750158.2022.2087954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]

A text-mining tool generated title-abstract screening workload savings: performance evaluation versus single-human screening. J Clin Epidemiol 2022;149:53-59. [DOI: 10.1016/j.jclinepi.2022.05.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2021] [Revised: 04/13/2022] [Accepted: 05/24/2022] [Indexed: 11/17/2022]

Blaizot A, Veettil SK, Saidoung P, Moreno-Garcia CF, Wiratunga N, Aceves-Martins M, Lai NM, Chaiyakunapruk N. Using artificial intelligence methods for systematic review in health sciences: A systematic review. Res Synth Methods 2022;13:353-362. [PMID: 35174972 DOI: 10.1002/jrsm.1553] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 09/04/2021] [Revised: 01/31/2022] [Accepted: 02/07/2022] [Indexed: 11/07/2022]

Tuttle LJ, Donahue MJ. Effects of sediment exposure on corals: a systematic review of experimental studies. ENVIRONMENTAL EVIDENCE 2022;11:4. [PMID: 39294657 PMCID: PMC8818373 DOI: 10.1186/s13750-022-00256-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/16/2021] [Accepted: 01/10/2022] [Indexed: 06/01/2023]

Abstract

BACKGROUND

Management actions that address local-scale stressors on coral reefs can rapidly improve water quality and reef ecosystem condition. In response to reef managers who need actionable thresholds for coastal runoff and dredging, we conducted a systematic review and meta-analysis of experimental studies that explore the effects of sediment on corals. We identified exposure levels that 'adversely' affect corals while accounting for sediment bearing (deposited vs. suspended), coral life-history stage, and species, thus providing empirically based estimates of stressor thresholds on vulnerable coral reefs.

METHODS

We searched online databases and grey literature to obtain a list of potential studies, assess their eligibility, and critically appraise them for validity and risk of bias. Data were extracted from eligible studies and grouped by sediment bearing and coral response to identify thresholds in terms of the lowest exposure levels that induced an adverse physiological and/or lethal effect. Meta-regression estimated the dose-response relationship between exposure level and the magnitude of a coral's response, with random-effects structures to estimate the proportion of variance explained by factors such as study and coral species.

REVIEW FINDINGS

After critical appraisal of over 15,000 records, our systematic review of corals' responses to sediment identified 86 studies to be included in meta-analyses (45 studies for deposited sediment and 42 studies for suspended sediment). The lowest sediment exposure levels that caused adverse effects in corals were well below the levels previously described as 'normal' on reefs: for deposited sediment, adverse effects occurred as low as 1 mg/cm2/day for larvae (limited settlement rates) and 4.9 mg/cm2/day for adults (tissue mortality); for suspended sediment, adverse effects occurred as low as 10 mg/L for juveniles (reduced growth rates) and 3.2 mg/L for adults (bleaching and tissue mortality). Corals take at least 10 times longer to experience tissue mortality from exposure to suspended sediment than to comparable concentrations of deposited sediment, though physiological changes manifest 10 times faster in response to suspended sediment than to deposited sediment. Threshold estimates derived from continuous response variables (magnitude of adverse effect) largely matched the lowest-observed adverse-effect levels from a summary of studies, or otherwise helped us to identify research gaps that should be addressed to better quantify the dose-response relationship between sediment exposure and coral health.

CONCLUSIONS

We compiled a global dataset that spans three oceans, over 140 coral species, decades of research, and a range of field- and lab-based approaches. Our review and meta-analysis inform the no-observed and lowest-observed adverse-effect levels (NOAEL, LOAEL) that are used in management consultations by U.S. federal agencies. In the absence of more location- or species-specific data to inform decisions, our results provide the best available information to protect vulnerable reef-building corals from sediment stress. Based on gaps and limitations identified by our review, we make recommendations to improve future studies and recommend future synthesis to disentangle the potentially synergistic effects of multiple coral-reef stressors.

Artificial Intelligence in Evidence-Based Medicine. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_43] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

Monteiro S, Nejad YS, Aucoin M. Perinatal diet and offspring anxiety: A scoping review. Transl Neurosci 2022;13:275-290. [PMID: 36128579 PMCID: PMC9449687 DOI: 10.1515/tnsci-2022-0242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Revised: 08/04/2022] [Accepted: 08/12/2022] [Indexed: 11/15/2022]

Hamel C, Hersi M, Kelly SE, Tricco AC, Straus S, Wells G, Pham B, Hutton B. Guidance for using artificial intelligence for title and abstract screening while conducting knowledge syntheses. BMC Med Res Methodol 2021;21:285. [PMID: 34930132 PMCID: PMC8686081 DOI: 10.1186/s12874-021-01451-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 03/07/2021] [Accepted: 10/26/2021] [Indexed: 02/01/2023]

Abstract

BACKGROUND

Systematic reviews are the cornerstone of evidence-based medicine. However, systematic reviews are time consuming and there is growing demand to produce evidence more quickly, while maintaining robust methods. In recent years, artificial intelligence and active-machine learning (AML) have been implemented into several SR software applications. As some of the barriers to adoption of new technologies are the challenges in set-up and how best to use these technologies, we have provided different situations and considerations for knowledge synthesis teams to consider when using artificial intelligence and AML for title and abstract screening.

METHODS

We retrospectively evaluated the implementation and performance of AML across a set of ten historically completed systematic reviews. Based upon the findings from this work and in consideration of the barriers we have encountered and navigated during the past 24 months in using these tools prospectively in our research, we discussed and developed a series of practical recommendations for research teams to consider in seeking to implement AML tools for citation screening into their workflow.

RESULTS

We developed a seven-step framework and provide guidance for when and how to integrate artificial intelligence and AML into the title and abstract screening process. Steps include: (1) Consulting with Knowledge user/Expert Panel; (2) Developing the search strategy; (3) Preparing your review team; (4) Preparing your database; (5) Building the initial training set; (6) Ongoing screening; and (7) Truncating screening. During Step 6 and/or 7, you may also choose to optimize your team, by shifting some members to other review stages (e.g., full-text screening, data extraction).

CONCLUSION

Artificial intelligence and, more specifically, AML are well-developed tools for title and abstract screening and can be integrated into the screening process in several ways. Regardless of the method chosen, transparent reporting of these methods is critical for future studies evaluating artificial intelligence and AML.

Surian D, Bourgeois FT, Dunn AG. The automation of relevant trial registration screening for systematic review updates: an evaluation study on a large dataset of ClinicalTrials.gov registrations. BMC Med Res Methodol 2021;21:281. [PMID: 34922458 PMCID: PMC8684229 DOI: 10.1186/s12874-021-01485-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2021] [Accepted: 11/22/2021] [Indexed: 11/10/2022]

Aucoin M, LaChance L, Naidoo U, Remy D, Shekdar T, Sayar N, Cardozo V, Rawana T, Chan I, Cooley K. Diet and Anxiety: A Scoping Review. Nutrients 2021;13:nu13124418. [PMID: 34959972 PMCID: PMC8706568 DOI: 10.3390/nu13124418] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 09/22/2021] [Revised: 11/23/2021] [Accepted: 12/04/2021] [Indexed: 12/22/2022]

Abstract

Anxiety disorders are the most common group of mental disorders. There is mounting evidence demonstrating the importance of nutrition in the development and progression of mental disorders such as depression; however, less is known about the role of nutrition in anxiety disorders. This scoping review sought to systematically map the existing literature on anxiety disorders and nutrition in order to identify associations between dietary factors and anxiety symptoms or disorder prevalence as well as identify gaps and opportunities for further research. The review followed established methodological approaches for scoping reviews. Due to the large volume of results, an online program (Abstrackr) with artificial intelligence features was used. Studies reporting an association between a dietary constituent and anxiety symptoms or disorders were counted and presented in figures. A total of 55,914 unique results were identified. After a full-text review, 1541 articles met criteria for inclusion. Analysis revealed an association between less anxiety and more fruits and vegetables, omega-3 fatty acids, “healthy” dietary patterns, caloric restriction, breakfast consumption, ketogenic diet, broad-spectrum micronutrient supplementation, zinc, magnesium and selenium, probiotics, and a range of phytochemicals. Analysis revealed an association between higher levels of anxiety and high-fat diet, inadequate tryptophan and dietary protein, high intake of sugar and refined carbohydrates, and “unhealthy” dietary patterns. Results are limited by a large percentage of animal and observational studies. Only 10% of intervention studies involved participants with anxiety disorders, limiting the applicability of the findings. High quality intervention studies involving participants with anxiety disorders are warranted.

Khalil H, Ameen D, Zarnegar A. Tools to support the automation of systematic reviews: A scoping review. J Clin Epidemiol 2021;144:22-42. [PMID: 34896236 DOI: 10.1016/j.jclinepi.2021.12.005] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 08/31/2021] [Revised: 11/09/2021] [Accepted: 12/02/2021] [Indexed: 11/19/2022]

Abdelkader W, Navarro T, Parrish R, Cotoi C, Germini F, Iorio A, Haynes RB, Lokker C. Machine Learning Approaches to Retrieve High-Quality, Clinically Relevant Evidence From the Biomedical Literature: Systematic Review. JMIR Med Inform 2021;9:e30401. [PMID: 34499041 PMCID: PMC8461527 DOI: 10.2196/30401] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2021] [Revised: 07/15/2021] [Accepted: 07/25/2021] [Indexed: 11/20/2022]

O'Hearn K, MacDonald C, Tsampalieros A, Kadota L, Sandarage R, Jayawarden SK, Datko M, Reynolds JM, Bui T, Sultan S, Sampson M, Pratt M, Barrowman N, Nama N, Page M, McNally JD. Evaluating the relationship between citation set size, team size and screening methods used in systematic reviews: a cross-sectional study. BMC Med Res Methodol 2021;21:142. [PMID: 34238247 PMCID: PMC8264476 DOI: 10.1186/s12874-021-01335-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2021] [Accepted: 06/19/2021] [Indexed: 11/26/2022]

Abstract

Background

Standard practice for conducting systematic reviews (SRs) is time consuming and involves the study team screening hundreds or thousands of citations. As the volume of medical literature grows, the citation set sizes and corresponding screening efforts increase. While larger team size and alternate screening methods have the potential to reduce workload and decrease SR completion times, it is unknown whether investigators adapt team size or methods in response to citation set sizes. Using a cross-sectional design, we sought to understand how citation set size impacts (1) the total number of authors or individuals contributing to screening and (2) screening methods.

Methods

MEDLINE was searched in April 2019 for SRs on any health topic. A total of 1880 unique publications were identified and sorted into five citation set size categories (after deduplication): < 1,000, 1,001–2,500, 2,501–5,000, 5,001–10,000, and > 10,000. A random sample of 259 SRs was selected (~ 50 per category) for data extraction and analysis.

Results

With the exception of the pairwise t test comparing the under 1000 and over 10,000 categories (median 5 vs. 6, p = 0.049), no statistically significant relationship was evident between author number and citation set size. While visual inspection was suggestive, statistical testing did not consistently identify a relationship between citation set size and number of screeners (title-abstract, full text) or data extractors. However, logistic regression identified that investigators were significantly more likely to deviate from gold-standard screening methods (i.e. independent duplicate screening) with larger citation sets. For every doubling of citation set size, the odds of using gold-standard screening decreased by 15% and 20% at title-abstract and full text review, respectively. Finally, few SRs reported using crowdsourcing (n = 2) or computer-assisted screening (n = 1).
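To make the "per doubling" result concrete, the following minimal Python sketch converts a size ratio into a multiplicative change in odds. It assumes the reported decreases correspond to constant odds ratios of 0.85 (title-abstract) and 0.80 (full text) per doubling, as a log-linear logistic model would imply; the function name `odds_multiplier` is hypothetical and not taken from the study.

```python
import math


def odds_multiplier(size_ratio: float, odds_ratio_per_doubling: float) -> float:
    """Return the factor by which the odds change when the citation set
    grows by `size_ratio`, assuming a constant odds ratio per doubling."""
    if size_ratio <= 0:
        raise ValueError("size_ratio must be positive")
    doublings = math.log2(size_ratio)  # e.g. an 8-fold increase is 3 doublings
    return odds_ratio_per_doubling ** doublings


# Assumed odds ratios: 0.85 = 15% decrease, 0.80 = 20% decrease per doubling.
title_abstract = odds_multiplier(8, 0.85)
full_text = odds_multiplier(8, 0.80)
print(f"8-fold larger set, title-abstract odds factor: {title_abstract:.3f}")
print(f"8-fold larger set, full-text odds factor: {full_text:.3f}")
```

Under these assumptions, a citation set eight times larger would be associated with roughly 61% of the baseline odds of gold-standard screening at the title-abstract stage and about 51% at the full-text stage.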

Conclusions

Large citation set sizes present a challenge to SR teams, especially when faced with time-sensitive health policy questions. Our study suggests that with increasing citation set size, authors are less likely to adhere to gold-standard screening methods. It is possible that adjunct screening methods, such as crowdsourcing (large team) and computer-assisted technologies, may provide a viable solution for authors to complete their SRs in a timely manner.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12874-021-01335-5.

Pham B, Jovanovic J, Bagheri E, Antony J, Ashoor H, Nguyen TT, Rios P, Robson R, Thomas SM, Watt J, Straus SE, Tricco AC. Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow. Syst Rev 2021;10:156. [PMID: 34039433 PMCID: PMC8152711 DOI: 10.1186/s13643-021-01700-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/29/2020] [Accepted: 05/12/2021] [Indexed: 12/17/2022] Open

Abstract

BACKGROUND

Current text mining tools supporting abstract screening in systematic reviews are not widely used, in part because they lack sensitivity and precision. We set out to develop an accessible, semi-automated "workflow" to conduct abstract screening for systematic reviews and other knowledge synthesis methods.

METHODS

We adopt widely recommended text-mining and machine-learning methods to (1) process title-abstracts into numerical training data; and (2) train a classification model to predict eligible abstracts. The predicted abstracts are screened by human reviewers for ("true") eligibility, and the newly eligible abstracts are used to identify similar abstracts, using near-neighbor methods, which are also screened. These abstracts, as well as their eligibility results, are used to update the classification model, and the above steps are iterated until no new eligible abstracts are identified. The workflow was implemented in R and evaluated using a systematic review of insulin formulations for type-1 diabetes (14,314 abstracts) and a scoping review of knowledge-synthesis methods (17,200 abstracts). Workflow performance was evaluated against the recommended practice of screening abstracts by 2 reviewers, independently. Standard measures were examined: sensitivity (inclusion of all truly eligible abstracts), specificity (exclusion of all truly ineligible abstracts), precision (inclusion of all truly eligible abstracts among all abstracts screened as eligible), F1-score (harmonic average of sensitivity and precision), and accuracy (correctly predicted eligible or ineligible abstracts). Workload reduction was measured as the hours the workflow saved, given only a subset of abstracts needed human screening.
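The standard measures listed above can be computed from the four cells of a confusion matrix. The sketch below is illustrative only: the counts are hypothetical, the names `Confusion` and `screening_metrics` are assumptions, and the authors' own implementation was in R.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # eligible abstracts predicted eligible
    fp: int  # ineligible abstracts predicted eligible
    tn: int  # ineligible abstracts predicted ineligible
    fn: int  # eligible abstracts predicted ineligible


def screening_metrics(c: Confusion) -> dict:
    """Compute the standard screening measures from confusion counts."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)
    # F1 is the harmonic mean of sensitivity (recall) and precision.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for 1,000 screened abstracts.
metrics = screening_metrics(Confusion(tp=88, fp=36, tn=864, fn=12))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because eligible abstracts are usually rare, accuracy and specificity can look high even when precision is modest, which is why sensitivity and precision are reported alongside them.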

RESULTS

With respect to the systematic and scoping reviews respectively, the workflow attained 88%/89% sensitivity, 99%/99% specificity, 71%/72% precision, an F1-score of 79%/79%, 98%/97% accuracy, 63%/55% workload reduction, with 12%/11% fewer abstracts for full-text retrieval and screening, and 0%/1.5% missed studies in the completed reviews.

CONCLUSION

The workflow was a sensitive, precise, and efficient alternative to the recommended practice of screening abstracts with 2 reviewers. All eligible studies were identified in the first case, while 6 studies (1.5%) were missed in the second, an omission that would likely not affect the review's conclusions. We have described the workflow in language accessible to reviewers with limited exposure to natural language processing and machine learning, and have made the code available to reviewers.

Affiliation(s)

Ba’ Pham Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8 Canada
Jelena Jovanovic Department of Software Engineering, University of Belgrade, Jove Ilica 154, Belgrade, 11000 Serbia
Ebrahim Bagheri Department of Electrical and Computer Engineering, Ryerson University, 350 Victoria Street, Toronto, Ontario M5B 2K3 Canada
Jesmin Antony Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8 Canada
Huda Ashoor Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8 Canada
Tam T. Nguyen Department of Electrical and Computer Engineering, Ryerson University, 350 Victoria Street, Toronto, Ontario M5B 2K3 Canada
Patricia Rios Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8 Canada
Reid Robson Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8 Canada
Sonia M. Thomas Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8 Canada
Jennifer Watt Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8 Canada
Sharon E. Straus Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8 Canada
Andrea C. Tricco Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8 Canada Epidemiology Division and Institute for Health Policy, Management, and Evaluation, Dalla Lana School of Public Health, University of Toronto, 155 College St Room 500, Toronto, Ontario M5T 3M7 Canada Queen’s Collaboration for Health Care Quality Joanna Briggs Institute Centre of Excellence, School of Nursing, Queen’s University, 99 University Ave, Kingston, Ontario K7L 3N6 Canada

Kazda L, Bell K, Thomas R, McGeechan K, Sims R, Barratt A. Overdiagnosis of Attention-Deficit/Hyperactivity Disorder in Children and Adolescents: A Systematic Scoping Review. JAMA Netw Open 2021;4:e215335. [PMID: 33843998 PMCID: PMC8042533 DOI: 10.1001/jamanetworkopen.2021.5335] [Citation(s) in RCA: 68] [Impact Index Per Article: 22.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open

Abstract

IMPORTANCE

Reported increases in attention-deficit/hyperactivity disorder (ADHD) diagnoses are accompanied by growing debate about the underlying factors. Although overdiagnosis is often suggested, no comprehensive evaluation of evidence for or against overdiagnosis has ever been undertaken and is urgently needed to enable evidence-based, patient-centered diagnosis and treatment of ADHD in contemporary health services.

OBJECTIVE

To systematically identify, appraise, and synthesize the evidence on overdiagnosis of ADHD in children and adolescents using a published 5-question framework for detecting overdiagnosis in noncancer conditions.

EVIDENCE REVIEW

This systematic scoping review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Extension for Scoping Reviews and Joanna Briggs Methodology, including the PRISMA-ScR Checklist. MEDLINE, Embase, PsycINFO, and the Cochrane Library databases were searched for studies published in English between January 1, 1979, and August 21, 2020. Studies of children and adolescents (aged ≤18 years) with ADHD that focused on overdiagnosis plus studies that could be mapped to 1 or more framework question were included. Two researchers independently reviewed all abstracts and full-text articles, and all included studies were assessed for quality.

FINDINGS

Of the 12 267 potentially relevant studies retrieved, 334 (2.7%) were included. Of the 334 studies, 61 (18.3%) were secondary and 273 (81.7%) were primary research articles. Substantial evidence of a reservoir of ADHD was found in 104 studies, providing a potential for diagnoses to increase (question 1). Evidence that actual ADHD diagnosis had increased was found in 45 studies (question 2). Twenty-five studies showed that these additional cases may be on the milder end of the ADHD spectrum (question 3), and 83 studies showed that pharmacological treatment of ADHD was increasing (question 4). A total of 151 studies reported on outcomes of diagnosis and pharmacological treatment (question 5). However, only 5 studies evaluated the critical issue of benefits and harms among the additional, milder cases. These studies supported a hypothesis of diminishing returns in which the harms may outweigh the benefits for youths with milder symptoms.

CONCLUSIONS AND RELEVANCE

This review found evidence of ADHD overdiagnosis and overtreatment in children and adolescents. Evidence gaps remain and future research is needed, in particular research on the long-term benefits and harms of diagnosing and treating ADHD in youths with milder symptoms; therefore, practitioners should be mindful of these knowledge gaps, especially when identifying these individuals, to ensure safe and equitable practice and policy.

Chai KEK, Lines RLJ, Gucciardi DF, Ng L. Research Screener: a machine learning tool to semi-automate abstract screening for systematic reviews. Syst Rev 2021;10:93. [PMID: 33795003 PMCID: PMC8017894 DOI: 10.1186/s13643-021-01635-3] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/04/2020] [Accepted: 03/11/2021] [Indexed: 11/10/2022] Open

Natural language processing was effective in assisting rapid title and abstract screening when updating systematic reviews. J Clin Epidemiol 2021;133:121-129. [PMID: 33485929 DOI: 10.1016/j.jclinepi.2021.01.010] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2020] [Revised: 01/06/2021] [Accepted: 01/14/2021] [Indexed: 02/05/2023]

Mahri M, Shen N, Berrizbeitia F, Rodan R, Daer A, Faigan M, Taqi D, Wu KY, Ahmadi M, Ducret M, Emami E, Tamimi F. Osseointegration Pharmacology: A Systematic Mapping Using Artificial Intelligence. Acta Biomater 2021;119:284-302. [PMID: 33181361 DOI: 10.1016/j.actbio.2020.11.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2020] [Revised: 11/04/2020] [Accepted: 11/05/2020] [Indexed: 12/25/2022]

Artificial Intelligence in Evidence-Based Medicine. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_43-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]

Yamada T, Yoneoka D, Hiraike Y, Hino K, Toyoshiba H, Shishido A, Noma H, Shojima N, Yamauchi T. Deep Neural Network for Reducing the Screening Workload in Systematic Reviews for Clinical Guidelines: Algorithm Validation Study. J Med Internet Res 2020;22:e22422. [PMID: 33262102 PMCID: PMC7806440 DOI: 10.2196/22422] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2020] [Revised: 11/10/2020] [Accepted: 11/30/2020] [Indexed: 01/16/2023] Open

Abstract

Background

Performing systematic reviews is a time-consuming and resource-intensive process.

Objective

We investigated whether a machine learning system could perform systematic reviews more efficiently.

Methods

All systematic reviews and meta-analyses of interventional randomized controlled trials cited in recent clinical guidelines from the American Diabetes Association, American College of Cardiology, American Heart Association (2 guidelines), and American Stroke Association were assessed. After reproducing the primary screening data set according to the published search strategy of each, we extracted correct articles (those actually reviewed) and incorrect articles (those not reviewed) from the data set. These 2 sets of articles were used to train a neural network–based artificial intelligence engine (Concept Encoder, Fronteo Inc). The primary endpoint was work saved over sampling at 95% recall (WSS@95%).
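Work saved over sampling (WSS) is commonly defined as the fraction of records that need not be screened once a target recall is reached, minus the fraction that random sampling would save at the same recall (1 − recall). The sketch below assumes that common definition and a ranked list of binary relevance labels; the function name `wss_at_recall` and the example ranking are hypothetical and do not reproduce the Concept Encoder evaluation.

```python
import math


def wss_at_recall(ranked_labels: list, recall: float = 0.95) -> float:
    """Work saved over sampling for a ranking of 0/1 relevance labels.

    `ranked_labels` is ordered from most to least likely relevant, as
    scored by some classifier (an assumption of this sketch).
    """
    n = len(ranked_labels)
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("at least one relevant record is required")
    needed = math.ceil(recall * total_relevant)  # relevant records to find
    found = 0
    screened = 0
    for label in ranked_labels:
        screened += 1
        found += label
        if found >= needed:
            break
    # Records left unscreened, as a fraction, minus the random baseline.
    return (n - screened) / n - (1.0 - recall)


# Hypothetical ranking: 100 records, 10 relevant, all within the top 20.
ranking = [1] * 8 + [0] * 10 + [1] * 2 + [0] * 80
print(f"WSS@95% = {wss_at_recall(ranking):.2f}")
```

In this toy ranking, screening stops after 20 of 100 records, so 80% of the records are left unscreened and the WSS@95% is about 0.75.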

Results

Among 145 candidate reviews of randomized controlled trials, 8 reviews fulfilled the inclusion criteria. For these 8 reviews, the machine learning system significantly reduced the literature screening workload by at least 6-fold versus that of manual screening based on WSS@95%. When machine learning was initiated using 2 correct articles that were randomly selected by a researcher, a 10-fold reduction in workload was achieved versus that of manual screening based on the WSS@95% value, with high sensitivity for eligible studies. The area under the receiver operating characteristic curve increased dramatically every time the algorithm learned a correct article.

Conclusions

Concept Encoder achieved a 10-fold reduction of the screening workload for systematic review after learning from 2 randomly selected studies on the target topic. However, few meta-analyses of randomized controlled trials were included. Concept Encoder could facilitate the acquisition of evidence for clinical guidelines.

Gates A, Gates M, DaRosa D, Elliott SA, Pillay J, Rahman S, Vandermeer B, Hartling L. Decoding semi-automated title-abstract screening: findings from a convenience sample of reviews. Syst Rev 2020;9:272. [PMID: 33243276 PMCID: PMC7694314 DOI: 10.1186/s13643-020-01528-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Accepted: 11/11/2020] [Indexed: 01/08/2023] Open

Abstract

BACKGROUND

We evaluated the benefits and risks of using the Abstrackr machine learning (ML) tool to semi-automate title-abstract screening and explored whether Abstrackr's predictions varied by review or study-level characteristics.

METHODS

For a convenience sample of 16 reviews for which adequate data were available to address our objectives (11 systematic reviews and 5 rapid reviews), we screened a 200-record training set in Abstrackr and downloaded the relevance (relevant or irrelevant) of the remaining records, as predicted by the tool. We retrospectively simulated the liberal-accelerated screening approach. We estimated the time savings and proportion missed compared with dual independent screening. For reviews with pairwise meta-analyses, we evaluated changes to the pooled effects after removing the missed studies. We explored whether the tool's predictions varied by review and study-level characteristics.

RESULTS

Using the ML-assisted liberal-accelerated approach, we wrongly excluded 0 to 3 (0 to 14%) records that were included in the final reports, but saved a median (IQR) 26 (9, 42) h of screening time. One missed study was included in eight pairwise meta-analyses in one systematic review. The pooled effect for just one of those meta-analyses changed considerably (from MD (95% CI) -1.53 (-2.92, -0.15) to -1.17 (-2.70, 0.36)). Of 802 records in the final reports, 87% were correctly predicted as relevant. The correctness of the predictions did not differ by review (systematic or rapid, P = 0.37) or intervention type (simple or complex, P = 0.47). The predictions were more often correct in reviews with multiple (89%) vs. single (83%) research questions (P = 0.01), or that included only trials (95%) vs. multiple designs (86%) (P = 0.003). At the study level, trials (91%), mixed methods (100%), and qualitative (93%) studies were more often correctly predicted as relevant compared with observational studies (79%) or reviews (83%) (P = 0.0006). Studies at high or unclear (88%) vs. low risk of bias (80%) (P = 0.039), and those published more recently (mean (SD) 2008 (7) vs. 2006 (10), P = 0.02) were more often correctly predicted as relevant.

CONCLUSION

Our screening approach saved time and may be suitable in conditions where the limited risk of missing relevant records is acceptable. Several of our findings are paradoxical and require further study to fully understand the tasks to which ML-assisted screening is best suited. The findings should be interpreted in light of the fact that the protocol was prepared for the funder, but not published a priori. Because we used a convenience sample, the findings may be prone to selection bias. The results may not be generalizable to other samples of reviews, ML tools, or screening approaches. The small number of missed studies across reviews with pairwise meta-analyses hindered strong conclusions about the effect of missed studies on the results and conclusions of systematic reviews.

Aucoin M, LaChance L, Cooley K, Kidd S. Diet and Psychosis: A Scoping Review. Neuropsychobiology 2020;79:20-42. [PMID: 30359969 DOI: 10.1159/000493399] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/05/2018] [Accepted: 08/29/2018] [Indexed: 12/11/2022]

Abstract

INTRODUCTION

Schizophrenia spectrum disorders (SSD) represent a cluster of severe mental illnesses. Diet has been identified as a modifiable risk factor and opportunity for intervention in many physical illnesses and more recently in mental illnesses such as unipolar depression; however, no dietary guidelines exist for patients with SSD.

OBJECTIVE

This review sought to systematically scope the existing literature in order to identify nutritional interventions for the prevention or treatment of mental health symptoms in SSD as well as gaps and opportunities for further research.

METHODS

This review followed established methodological approaches for scoping reviews including an extensive a priori search strategy and duplicate screening. Because of the large volume of results, an online program (Abstrackr) was used for screening and tagging. Data were extracted based on the dietary constituents and analyzed.

RESULTS

Of 55,330 results identified by the search, 822 studies met the criteria for inclusion. Observational evidence shows a connection between the presence of psychotic disorders and poorer quality dietary patterns, higher intake of refined carbohydrates and total fat, and lower intake or levels of fibre, ω-3 and ω-6 fatty acids, vegetables, fruit, and certain vitamins and minerals (vitamin B12 and B6, folate, vitamin C, zinc, and selenium). Evidence illustrates a role of food allergy and sensitivity as well as microbiome composition and specific phytonutrients (such as L-theanine, sulforaphane, and resveratrol). Experimental studies have demonstrated benefit using healthy diet patterns and specific vitamins and minerals (vitamin B12 and B6, folate, and zinc) and amino acids (serine, lysine, glycine, and tryptophan).

DISCUSSION

Overall, these findings were consistent with many other bodies of knowledge about healthy dietary patterns. Many limitations exist related to the design of the individual studies and the ability to extrapolate the results of studies using dietary supplements to dietary interventions (food). Dietary recommendations are presented as well as recommendations for further research including more prospective observational studies and intervention studies that modify diet constituents or entire dietary patterns with statistical power to detect mental health outcomes.

Hinkelbein J, Kerkhoff S, Adler C, Ahlbäck A, Braunecker S, Burgard D, Cirillo F, De Robertis E, Glaser E, Haidl TK, Hodkinson P, Iovino IZ, Jansen S, Johnson KVL, Jünger S, Komorowski M, Leary M, Mackaill C, Nagrebetsky A, Neuhaus C, Rehnberg L, Romano GM, Russomano T, Schmitz J, Spelten O, Starck C, Thierry S, Velho R, Warnecke T. Cardiopulmonary resuscitation (CPR) during spaceflight - a guideline for CPR in microgravity from the German Society of Aerospace Medicine (DGLRM) and the European Society of Aerospace Medicine Space Medicine Group (ESAM-SMG). Scand J Trauma Resusc Emerg Med 2020;28:108. [PMID: 33138865 PMCID: PMC7607644 DOI: 10.1186/s13049-020-00793-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2020] [Accepted: 10/07/2020] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

With the "Artemis" mission, mankind will return to the Moon by 2024. Prolonged periods in space will not only present physical and psychological challenges to the astronauts, but also pose risks concerning the medical treatment capabilities of the crew. So far, no guideline exists for the treatment of severe medical emergencies in microgravity. We, as an international group of researchers in the fields of aerospace medicine and critical care, took on the challenge and developed an evidence-based guideline for arguably the most severe medical emergency: cardiac arrest.

METHODS

After the creation of said international group, PICO questions regarding the topic of cardiopulmonary resuscitation in microgravity were developed to guide the systematic literature search. Afterwards, a precise search strategy was compiled and applied to MEDLINE. A total of 4165 records were retrieved and subsequently screened by at least 2 reviewers. This led to 88 original publications that were obtained in full text and then critically appraised using the GRADE methodology. Those studies formed the basis for the guideline recommendations, which were drafted by at least 2 experts in the given field. Afterwards, those recommendations were subjected to a consensus-finding process according to the Delphi methodology.

RESULTS

We recommend a differentiated approach to CPR in microgravity with a division into basic life support (BLS) and advanced life support (ALS) similar to the Earth-based guidelines. In immediate BLS, the chest compression method of choice is the Evetts-Russomano method (ER), whereas in an ALS scenario, with the patient being restrained on the Crew Medical Restraint System, the handstand method (HS) should be applied. Airway management should only be performed if at least two rescuers are present and the patient has been restrained. A supraglottic airway device should be used for airway management where crew members untrained in tracheal intubation (TI) are involved.

DISCUSSION

CPR in microgravity is feasible and should be applied according to the Earth-based guidelines of the AHA/ERC in relation to fundamental statements, like urgent recognition and action, focus on high-quality chest compressions, compression depth and compression-ventilation ratio. However, the special circumstances presented by microgravity and spaceflight must be considered concerning central points such as rescuer position and methods for the performance of chest compressions, airway management and defibrillation.

Affiliation(s)

Jochen Hinkelbein German Society of Aviation and Space Medicine (DGLRM), Munich, Germany.,Department of Anaesthesiology and Intensive Care Medicine, University Hospital of Cologne, 50937, Cologne, Germany.,Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany
Steffen Kerkhoff German Society of Aviation and Space Medicine (DGLRM), Munich, Germany.,Department of Anaesthesiology and Intensive Care Medicine, University Hospital of Cologne, 50937, Cologne, Germany.,Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany
Christoph Adler Department of Internal Medicine III, Heart Centre of the University of Cologne, Cologne, Germany.,Fire Department City of Cologne, Institute for Security Science and Rescue Technology, Cologne, Germany
Anton Ahlbäck Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany.,Department of Anaesthesia and Intensive Care, Örebro University Hospital, Örebro, Sweden
Stefan Braunecker Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany.,Department of Anesthesiology, University of Florida College of Medicine, Jacksonville, FL, USA
Daniel Burgard Department of Cardiology and Angiology, Heart Center Duisburg, Evangelisches Klinikum Niederrhein, Duisburg, Germany
Fabrizio Cirillo Department of Anaesthesia and Intensive Care, Santa Maria delle Grazie Hospital, Pozzuoli, Naples, Italy
Edoardo De Robertis Division of Anaesthesia, Analgesia, and Intensive Care, Department of Surgical and Biomedical Sciences, University of Perugia, Perugia, Italy
Eckard Glaser German Society of Aviation and Space Medicine (DGLRM), Munich, Germany.,Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany.,Gerbrunn, Germany
Theresa K Haidl Department of Psychiatry and Psychotherapy, Faculty of Medicine and University Hospital Cologne, University of Cologne, 50937, Cologne, Germany
Pete Hodkinson Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany.,Aerospace Medicine, Centre of Human and Applied Physiological Sciences, King's College, London, UK
Ivan Zefiro Iovino Department of Anaesthesia and Intensive Care, Santa Maria delle Grazie Hospital, Pozzuoli, Naples, Italy
Stefanie Jansen Department of Otorhinolaryngology, Head and Neck Surgery, University of Cologne, 50937, Cologne, Germany
Kolaparambil Varghese Lydia Johnson Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany.,University of Perugia-Terni, Perugia-Terni, Italy
Saskia Jünger Cologne Center for Ethics, Rights, Economics, and Social Sciences of Health (CERES), University of Cologne and University Hospital of Cologne, Cologne, Germany
Matthieu Komorowski Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany.,Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, Exhibition road, London, SW7 2AZ, UK
Marion Leary School of Nursing, University of Pennsylvania, Philadelphia, PA, USA
Christina Mackaill Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany.,Accident and Emergency Department, Queen Elizabeth University Hospital, Glasgow, Scotland
Alexander Nagrebetsky Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, USA
Christopher Neuhaus German Society of Aviation and Space Medicine (DGLRM), Munich, Germany.,Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany.,Department of Anesthesiology, Heidelberg University Hospital, Heidelberg, Germany
Lucas Rehnberg University Hospital Southampton NHS Foundation Trust, Anaesthetic Department, Southampton, UK
Giovanni Marco Romano Anesthesia and Postoperative Intensive Care Unit, AORN Cardarelli, Naples, Italy
Thais Russomano Centre of Human and Applied Physiological Sciences, Kings College London, London, UK
Jan Schmitz German Society of Aviation and Space Medicine (DGLRM), Munich, Germany.,Department of Anaesthesiology and Intensive Care Medicine, University Hospital of Cologne, 50937, Cologne, Germany.,Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany
Oliver Spelten Department of Anaesthesiology and Intensive Care Medicine, Schön Klinik Düsseldorf, Am Heerdter Krankenhaus 2, 40549, Düsseldorf, Germany
Clément Starck Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany.,Anesthesiology Department, Brest University Hospital, Brest, France
Seamus Thierry Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany.,Anesthesiology Department, Bretagne Sud General Hospital, Lorient, France.,Medical and Maritime Simulation Center, Lorient, France.,Laboratory of Psychology, Cognition, Communication and Behavior, University of Bretagne Sud, Vannes, France
Rochelle Velho Academic Department of Anaesthesia, Critical Care, Pain and Resuscitation, University Hospitals Birmingham, Heart of England NHS Foundation Trust, Birmingham, UK
Tobias Warnecke University Department for Anesthesia, Intensive and Emergency Medicine and Pain Management, Hospital Oldenburg, Oldenburg, Germany

Reddy SM, Patel S, Weyrich M, Fenton J, Viswanathan M. Comparison of a traditional systematic review approach with review-of-reviews and semi-automation as strategies to update the evidence. Syst Rev 2020;9:243. [PMID: 33076975 PMCID: PMC7574591 DOI: 10.1186/s13643-020-01450-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/13/2020] [Accepted: 08/07/2020] [Indexed: 11/30/2022] Open

An evaluation of DistillerSR's machine learning-based prioritization tool for title/abstract screening - impact on reviewer-relevant outcomes. BMC Med Res Methodol 2020;20:256. [PMID: 33059590 PMCID: PMC7559198 DOI: 10.1186/s12874-020-01129-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2020] [Accepted: 09/22/2020] [Indexed: 01/14/2023] Open

Abstract

BACKGROUND

Systematic reviews often require substantial resources, partially due to the large number of records identified during searching. Although artificial intelligence may not be ready to fully replace human reviewers, it may accelerate and reduce the screening burden. Using DistillerSR (May 2020 release), we evaluated the performance of the prioritization simulation tool to determine the reduction in screening burden and time savings.

METHODS

Using a true recall @ 95%, response sets from 10 completed systematic reviews were used to evaluate: (i) the reduction of screening burden; (ii) the accuracy of the prioritization algorithm; and (iii) the hours saved when a modified screening approach was implemented. To account for variation in the simulations, and to introduce randomness (through shuffling the references), 10 simulations were run for each review. Means, standard deviations, medians and interquartile ranges (IQR) are presented.
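Summarising repeated simulations with means, standard deviations, medians, and interquartile ranges can be sketched with Python's standard library. The values below are hypothetical screening-burden reductions from 10 shuffled runs of a single review, not data from the study, and `summarize` is an assumed helper name. The quartiles use the default "exclusive" method of `statistics.quantiles`, which may differ from the method used by the authors' software.

```python
import statistics


def summarize(reductions: list) -> dict:
    """Summarise per-simulation screening-burden reductions (fractions)."""
    q1, _, q3 = statistics.quantiles(reductions, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(reductions),
        "sd": statistics.stdev(reductions),  # sample standard deviation
        "median": statistics.median(reductions),
        "iqr": (q1, q3),
    }


# Hypothetical reductions from 10 shuffled simulations of one review.
runs = [0.40, 0.42, 0.45, 0.47, 0.48, 0.50, 0.52, 0.55, 0.58, 0.60]
summary = summarize(runs)
print(f"median reduction: {summary['median']:.1%}")
print(f"IQR: {summary['iqr'][0]:.1%} to {summary['iqr'][1]:.1%}")
```

Reporting the median with its IQR, as the study does, keeps a single unusually favourable or unfavourable shuffle from dominating the summary.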

RESULTS

Among the 10 systematic reviews, using true recall @ 95% there was a median reduction in screening burden of 47.1% (IQR: 37.5 to 58.0%). A median of 41.2% (IQR: 33.4 to 46.9%) of the excluded records needed to be screened to achieve true recall @ 95%. The median title/abstract screening hours saved using a modified screening approach at a true recall @ 95% was 29.8 h (IQR: 28.1 to 74.7 h). This was increased to a median of 36 h (IQR: 32.2 to 79.7 h) when considering the time saved not retrieving and screening full texts of the remaining 5% of records not yet identified as included at title/abstract. Among the 100 simulations (10 simulations per review), none of these 5% of records were a final included study in the systematic review. The reduction in screening burden to achieve true recall @ 95% compared to @ 100% resulted in a reduced screening burden median of 40.6% (IQR: 38.3 to 54.2%).

CONCLUSIONS

The prioritization tool in DistillerSR can reduce screening burden. A modified or stop screening approach once a true recall @ 95% is achieved appears to be a valid method for rapid reviews, and perhaps systematic reviews. This needs to be further evaluated in prospective reviews using the estimated recall.
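The burden measure reported above can be illustrated with a minimal sketch. It assumes that "true recall @ 95%" means screening a prioritized list until 95% of all title/abstract includes have been found, and that the reduction in screening burden is the share of records left unscreened at that point. The function names and the toy ranking are hypothetical; this is not DistillerSR's algorithm or API.

```python
def records_to_reach_recall(ranked_labels, target_percent=95):
    """Return how many records must be screened, in ranked order,
    to find target_percent of all included records.

    ranked_labels: list of 1 (include) / 0 (exclude) in the order
    a prioritization tool would present them (assumed input format).
    """
    total_includes = sum(ranked_labels)
    if total_includes == 0:
        raise ValueError("ranked_labels contains no included records")
    # Integer ceiling of target_percent% of the includes (avoids float error).
    needed = (target_percent * total_includes + 99) // 100
    found = 0
    for position, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return position
    raise AssertionError("unreachable: needed never exceeds total_includes")


def screening_burden_reduction(ranked_labels, target_percent=95):
    """Fraction of records that would not need screening at the target recall."""
    screened = records_to_reach_recall(ranked_labels, target_percent)
    return 1 - screened / len(ranked_labels)


# Toy ranking: 100 records, 20 includes, most of them ranked early.
toy_ranking = [1] * 15 + [0] * 10 + [1] * 4 + [0] * 70 + [1]

at_95 = screening_burden_reduction(toy_ranking, 95)
at_100 = screening_burden_reduction(toy_ranking, 100)
print(f"reduction at 95% recall: {at_95:.0%}, at 100% recall: {at_100:.0%}")
```

In the toy ranking the last include sits at the very end of the list, so insisting on 100% recall removes all savings. This is the kind of gap between the 95% and 100% thresholds that the abstract compares.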


Bougioukas KI, Bouras EC, Avgerinos KI, Dardavessis T, Haidich A. How to keep up to date with medical information using web‐based resources: a systematised review and narrative synthesis. Health Info Libr J 2020;37:254-292. [DOI: 10.1111/hir.12318] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/26/2019] [Accepted: 05/20/2020] [Indexed: 12/30/2022]

Deng Z, Yin K, Bao Y, Armengol VD, Wang C, Tiwari A, Barzilay R, Parmigiani G, Braun D, Hughes KS. Validation of a Semiautomated Natural Language Processing-Based Procedure for Meta-Analysis of Cancer Susceptibility Gene Penetrance. JCO Clin Cancer Inform 2020;3:1-9. [PMID: 31419182 DOI: 10.1200/cci.19.00043] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 11/20/2022] Open

Abstract

PURPOSE

Quantifying the risk of cancer associated with pathogenic mutations in germline cancer susceptibility genes (that is, penetrance) enables the personalization of preventive management strategies. Conducting a meta-analysis is the best way to obtain robust risk estimates. We have previously developed a natural language processing (NLP)-based abstract classifier that classifies abstracts as relevant to penetrance, prevalence of mutations, both, or neither. In this work, we evaluate the performance of this NLP-based procedure.

MATERIALS AND METHODS

We compared the semiautomated NLP-based procedure, which involves automated abstract classification and text mining, followed by human review of identified studies, with the traditional procedure that requires human review of all studies. Ten high-quality gene-cancer penetrance meta-analyses spanning 16 gene-cancer associations were used as the gold standard by which to evaluate the performance of our procedure. For each meta-analysis, we evaluated the number of abstracts that required human review (workload) and the ability to identify the studies that were included by the authors in their quantitative analysis (coverage).

RESULTS

Compared with the traditional procedure, the semiautomated NLP-based procedure led to a lower workload across all 10 meta-analyses, with an overall 84% reduction (2,774 abstracts v 16,941 abstracts) in the amount of human review required. Overall coverage was 93% (132 of 142 studies identified) before reviewing references of identified studies. Reasons for the 10 missed studies included blank and poorly written abstracts. After reviewing references, nine of the previously missed studies were identified and coverage improved to 99% (141 of 142 studies).

CONCLUSION

We demonstrated that an NLP-based procedure can significantly reduce the review workload without compromising the ability to identify relevant studies. NLP algorithms have promising potential for reducing human efforts in the literature review process.
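The workload and coverage percentages reported in this abstract follow from the stated counts, as the short sketch below shows. The helper names are hypothetical and the definitions (workload as abstracts needing human review, coverage as the share of gold-standard studies identified) are taken from the abstract's own wording.

```python
def percent_reduction(baseline, reduced):
    """Percentage drop from a baseline count to a reduced count."""
    return 100 * (baseline - reduced) / baseline


def coverage_percent(identified, total):
    """Percentage of gold-standard studies that were identified."""
    return 100 * identified / total


# Counts quoted in the abstract.
workload_reduction = percent_reduction(baseline=16941, reduced=2774)
coverage_before_refs = coverage_percent(identified=132, total=142)
coverage_after_refs = coverage_percent(identified=141, total=142)

print(f"workload reduction: {workload_reduction:.1f}%")
print(f"coverage before reference review: {coverage_before_refs:.1f}%")
print(f"coverage after reference review: {coverage_after_refs:.1f}%")
```

Rounded to whole percentages, these values match the 84%, 93%, and 99% figures given in the results.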
