1
Lokker C, Abdelkader W, Bagheri E, Parrish R, Cotoi C, Navarro T, Germini F, Linkins LA, Haynes RB, Chu L, Afzal M, Iorio A. Boosting efficiency in a clinical literature surveillance system with LightGBM. PLOS DIGITAL HEALTH 2024; 3:e0000299. [PMID: 39312500 PMCID: PMC11419392 DOI: 10.1371/journal.pdig.0000299] [Received: 06/18/2023] [Accepted: 08/14/2024] [Indexed: 09/25/2024]
Abstract
Given the suboptimal performance of Boolean searching to identify methodologically sound and clinically relevant studies in large bibliographic databases, exploring machine learning (ML) to efficiently classify studies is warranted. To boost the efficiency of a literature surveillance program, we used a large internationally recognized dataset of articles tagged for methodological rigor and applied an automated ML approach to train and test binary classification models to predict the probability of clinical research articles being of high methodologic quality. We trained over 12,000 models on a dataset of titles and abstracts of 97,805 articles indexed in PubMed from 2012-2018 which were manually appraised for rigor by highly trained research associates and rated for clinical relevancy by practicing clinicians. As the dataset is unbalanced, with more articles that do not meet the criteria for rigor, we used the unbalanced dataset and over- and under-sampled datasets. Models that maintained sensitivity for high rigor at 99% and maximized specificity were selected and tested in a retrospective set of 30,424 articles from 2020 and validated prospectively in a blinded study of 5253 articles. The final selected algorithm, combining a LightGBM (gradient boosting machine) model trained in each dataset, maintained high sensitivity and achieved 57% specificity in the retrospective validation test and 53% in the prospective study. The number of articles needed to read to find one that met appraisal criteria was 3.68 (95% CI 3.52 to 3.85) in the prospective study, compared with 4.63 (95% CI 4.50 to 4.77) when relying only on Boolean searching. Gradient-boosting ML models reduced the work required to classify high quality clinical research studies by 45%, improving the efficiency of literature surveillance and subsequent dissemination to clinicians and other evidence users.
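The selection rule described above (hold sensitivity for high-rigor articles at 99% while maximizing specificity) and the number needed to read (NNR, articles read per article that meets criteria, i.e. 1/precision) can be sketched in a few lines. This is a minimal illustration, not the authors' LightGBM pipeline. It assumes each article already carries a model score in [0, 1], and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def pick_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    min_sensitivity: float = 0.99,
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) for the highest cut-off
    whose sensitivity is still >= min_sensitivity.

    An article is flagged as likely high rigor when its score is >= threshold.
    Lowering the threshold can only raise sensitivity and lower specificity,
    so the first qualifying threshold, scanning from the top, keeps the most
    specificity.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both positive and negative labels")
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sensitivity = tp / positives
        if sensitivity >= min_sensitivity:
            return t, sensitivity, tn / negatives
    # Only reached if min_sensitivity > 1: at the lowest score every article is flagged.
    raise ValueError("min_sensitivity cannot be met")


def number_needed_to_read(true_positives: int, false_positives: int) -> float:
    """Articles read per article that meets appraisal criteria (1 / precision)."""
    return (true_positives + false_positives) / true_positives


# Toy scores and labels for illustration only (1 = meets rigor criteria).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 1, 0, 0]
threshold, sens, spec = pick_threshold(scores, labels)
print(threshold, sens, round(spec, 3))  # 0.4 1.0 0.667
# At threshold 0.4, four articles are flagged: 3 true positives, 1 false positive.
print(round(number_needed_to_read(3, 1), 3))  # 1.333
```

Read against the prospective results, an NNR of 3.68 corresponds to a precision of about 1/3.68 ≈ 0.27. For Boolean searching alone, the figure is about 1/4.63 ≈ 0.22.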
Affiliation(s)
- Cynthia Lokker
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Wael Abdelkader
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Elham Bagheri
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Rick Parrish
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Chris Cotoi
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Tamara Navarro
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Federico Germini
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Lori-Ann Linkins
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- R. Brian Haynes
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Lingyang Chu
- Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada
- Muhammad Afzal
- School of Computing and Digital Technology, Birmingham City University, Birmingham, United Kingdom
- Alfonso Iorio
- Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
2
Ramírez SI, Partin M, Snyder AH, Ko E, Aruma J, Castaneda MC, Casas RS. A Scoping Review of Obstetrics and Gynecology Curricula in Primary Care Residency Programs. J Gen Intern Med 2024:10.1007/s11606-024-08987-1. [PMID: 39187722 DOI: 10.1007/s11606-024-08987-1] [Received: 02/05/2024] [Accepted: 07/30/2024] [Indexed: 08/28/2024]
Abstract
BACKGROUND While Women's Health (WH) is a priority for primary care (Family Medicine [FM], Internal Medicine [IM], Pediatrics [Peds], and combined Medicine/Pediatrics [Med/Peds]), residency curricula remain heterogeneous, and graduates show deficits in WH expertise and skills. The overall objective of this study was to assess the quality of WH curricula at primary care residency programs in the United States (US), with a focus on topics in obstetrics and gynecology (OBGYN). METHODS PubMed®, ERIC, The Cochrane Library, MedEdPORTAL, and professional organization websites were systematically searched in 2019, and the search was updated in 2021. Included studies described OBGYN educational curricula in US primary care residency programs. Following abstract screening and full-text review, data from eligible studies were abstracted, and study quality was assessed using the Medical Education Research Study Quality Instrument (MERSQI). RESULTS A total of 109 studies met the inclusion criteria. Over a quarter of studies were interdepartmental or interdisciplinary. The most common single-department studies were IM (38%) and FM (26%). Twenty (25%) studies addressed comprehensive OBGYN curricula; the most common individual topics were cervical and breast cancer screening (31%) and contraception (16%). Most studies used multiple instructional modalities, most commonly didactics (54%), clinical experiences (41%), and/or simulation (21%). Most studies included self-reported outcomes by residents (70%), and few (11%) reported higher-level assessments (i.e., patient or clinical outcomes). The most common design was a single-group pre- and post-test (42%), and few studies were randomized controlled trials (4%). The mean MERSQI score for studies with sufficient data (90%) was 9.8 (range 3 to 15.5). DISCUSSION OBGYN educational curricula for primary care trainees in the US were varied, with gaps in the residents represented, content, assessments, and study quality.
Affiliation(s)
- Sarah I Ramírez
- Department of Family and Community Medicine, Penn State College of Medicine, 500 University Drive; HP 11, Hershey, PA, 17033, USA.
- Michael Partin
- Department of Family and Community Medicine, Penn State College of Medicine, 500 University Drive; HP 11, Hershey, PA, 17033, USA
- Ashley H Snyder
- Internal Medicine, Penn State College of Medicine, Hershey, PA, USA
- Elizabeth Ko
- Internal Medicine, Penn State College of Medicine, Hershey, PA, USA
- Jane Aruma
- Anesthesiology, Northwestern University, Evanston, IL, USA
- Marie C Castaneda
- Harrell Health Sciences Library: Research and Learning Commons, Penn State College of Medicine, Hershey, PA, USA
- Rachel S Casas
- Internal Medicine, Penn State College of Medicine, Hershey, PA, USA
3
Matsui K, Utsumi T, Aoki Y, Maruki T, Takeshima M, Takaesu Y. Human-Comparable Sensitivity of Large Language Models in Identifying Eligible Studies Through Title and Abstract Screening: 3-Layer Strategy Using GPT-3.5 and GPT-4 for Systematic Reviews. J Med Internet Res 2024; 26:e52758. [PMID: 39151163 PMCID: PMC11364944 DOI: 10.2196/52758] [Received: 09/14/2023] [Revised: 03/10/2024] [Accepted: 06/25/2024] [Indexed: 08/18/2024]
Abstract
BACKGROUND The screening process for systematic reviews is resource-intensive. Although previous machine learning solutions have reported reductions in workload, they risked excluding relevant papers. OBJECTIVE We evaluated the performance of a 3-layer screening method using GPT-3.5 and GPT-4 to streamline the title and abstract screening process for systematic reviews. Our goal was to develop a screening method that maximizes sensitivity for identifying relevant records. METHODS We conducted screenings on 2 of our previous systematic reviews related to the treatment of bipolar disorder, with 1381 records from the first review and 3146 from the second. Screenings were conducted using GPT-3.5 (gpt-3.5-turbo-0125) and GPT-4 (gpt-4-0125-preview) across 3 layers: (1) research design, (2) target patients, and (3) interventions and controls. The 3-layer screening was conducted using prompts tailored to each study. During this process, information extraction according to each study's inclusion criteria and optimization for screening were carried out using a GPT-4-based flow without manual adjustments. Records were evaluated at each layer, and those meeting the inclusion criteria at all layers were subsequently judged as included. RESULTS On each layer, both GPT-3.5 and GPT-4 were able to process about 110 records per minute, and the total time required for screening the first and second studies was approximately 1 hour and 2 hours, respectively. In the first study, the sensitivities/specificities of GPT-3.5 and GPT-4 were 0.900/0.709 and 0.806/0.996, respectively. Both screenings by GPT-3.5 and GPT-4 judged all 6 records used for the meta-analysis as included. In the second study, the sensitivities/specificities of GPT-3.5 and GPT-4 were 0.958/0.116 and 0.875/0.855, respectively. These sensitivities for the relevant records aligned with those of human evaluators: 0.867-1.000 for the first study and 0.776-0.979 for the second study.
Both screenings by GPT-3.5 and GPT-4 judged all 9 records used for the meta-analysis as included. After accounting for records that GPT-4 justifiably excluded, the sensitivities/specificities of the GPT-4 screening were 0.962/0.996 in the first study and 0.943/0.855 in the second study. Further investigation indicated that the cases incorrectly excluded by GPT-3.5 were due to a lack of domain knowledge, while the cases incorrectly excluded by GPT-4 were due to misinterpretations of the inclusion criteria. CONCLUSIONS Our 3-layer screening method with GPT-4 demonstrated an acceptable level of sensitivity and specificity that supports its practical application in systematic review screenings. Future research should aim to generalize this approach and explore its effectiveness in diverse settings, both medical and nonmedical, to fully establish its use and operational feasibility.
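The layered inclusion rule (a record is included only if it meets the criteria at every layer) can be sketched as follows. This is a minimal illustration, assuming each layer is a yes/no judgment on a record. The keyword functions are hypothetical stand-ins for the study's GPT prompts and say nothing about how the prompts were written.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Layer = Callable[[Record], bool]


def layered_screen(
    records: List[Record], layers: List[Layer]
) -> List[Tuple[str, List[bool], bool]]:
    """Evaluate every record at every layer; include it only if all layers pass.

    Returns (record id, per-layer decisions, final inclusion) for each record.
    """
    results = []
    for rec in records:
        per_layer = [layer(rec) for layer in layers]  # one decision per layer
        results.append((rec["id"], per_layer, all(per_layer)))
    return results


# Hypothetical keyword stand-ins for the three LLM prompts; the study used GPT calls.
def design_ok(rec: Record) -> bool:  # layer 1: research design
    return "randomized" in rec["text"]


def patients_ok(rec: Record) -> bool:  # layer 2: target patients
    return "bipolar" in rec["text"]


def intervention_ok(rec: Record) -> bool:  # layer 3: interventions and controls
    return "lithium" in rec["text"]


records = [
    {"id": "a", "text": "randomized trial of lithium in bipolar disorder"},
    {"id": "b", "text": "cohort study of lithium in bipolar disorder"},
    {"id": "c", "text": "randomized trial of lithium in schizophrenia"},
]
out = layered_screen(records, [design_ok, patients_ok, intervention_ok])
print([included for _, _, included in out])  # [True, False, False]
```

This sketch keeps every per-layer decision, not just the final verdict. That makes it possible to see which layer excluded a record, which is the kind of error analysis the abstract reports for GPT-3.5 and GPT-4.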
Affiliation(s)
- Kentaro Matsui
- Department of Clinical Laboratory, National Center Hospital, National Center of Neurology and Psychiatry, Kodaira, Japan
- Department of Sleep-Wake Disorders, National Institute of Mental Health, National Center of Neurology and Psychiatry, Kodaira, Japan
- Tomohiro Utsumi
- Department of Sleep-Wake Disorders, National Institute of Mental Health, National Center of Neurology and Psychiatry, Kodaira, Japan
- Department of Psychiatry, The Jikei University School of Medicine, Tokyo, Japan
- Yumi Aoki
- Graduate School of Nursing Science, St. Luke's International University, Tokyo, Japan
- Taku Maruki
- Department of Neuropsychiatry, Kyorin University School of Medicine, Tokyo, Japan
- Masahiro Takeshima
- Department of Neuropsychiatry, Akita University Graduate School of Medicine, Akita, Japan
- Yoshikazu Takaesu
- Department of Neuropsychiatry, Graduate School of Medicine, University of the Ryukyus, Okinawa, Japan
4
Oami T, Okada Y, Sakuraya M, Fukuda T, Shime N, Nakada TA. Efficiency and Workload Reduction of Semi-automated Citation Screening Software for Creating Clinical Practice Guidelines: A Prospective Observational Study. J Epidemiol 2024; 34:380-386. [PMID: 38105001 PMCID: PMC11230876 DOI: 10.2188/jea.je20230227] [Received: 08/17/2023] [Accepted: 11/26/2023] [Indexed: 12/19/2023]
Abstract
BACKGROUND We evaluated the applicability of automated citation screening in developing clinical practice guidelines. METHODS We prospectively compared the efficiency of citation screening between the conventional (Rayyan) and semi-automated (ASReview software) methods. We searched the literature for five clinical questions (CQs) in the development of the Japanese Clinical Practice Guidelines for the Management of Sepsis and Septic Shock. The time required to complete citation screening was objectively measured and recorded. After the first screening round, the primary analysis calculated the sensitivity, specificity, positive predictive value, and overall screening time for both procedures, using the semi-automated tool as the index test and the results of the conventional method as the reference standard. In the secondary analysis, the same parameters were compared between the two procedures using the final list of included studies after the second screening session as the reference standard. RESULTS Among the five CQs after the first screening session, the lowest and highest sensitivity, specificity, and positive predictive values were 0.241 and 0.795; 0.991 and 1.000; and 0.482 and 0.929, respectively. In the secondary analysis, the highest sensitivity and specificity in the semi-automated citation screening were 1.000 and 0.997, respectively. The overall screening time per 100 studies was significantly shorter with semi-automated than with conventional citation screening. CONCLUSION The potential advantages of the semi-automated method (shorter screening time and higher discriminatory rate for the final list of studies) warrant further validation.
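The three accuracy measures above come from one 2×2 table that cross-tabulates the index test against a reference standard. The minimal sketch below assumes per-citation include/exclude flags for each method. The function name is hypothetical, and the flags are toy values, not study data.

```python
from typing import Dict, Sequence


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when a denominator is empty.
    return num / den if den else float("nan")


def diagnostic_metrics(index: Sequence[bool], reference: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity and PPV of an index screen against a reference standard."""
    pairs = list(zip(index, reference))
    tp = sum(1 for i, r in pairs if i and r)          # both include
    fp = sum(1 for i, r in pairs if i and not r)      # index includes, reference excludes
    fn = sum(1 for i, r in pairs if not i and r)      # index misses a reference include
    tn = sum(1 for i, r in pairs if not i and not r)  # both exclude
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
    }


index = [True, True, False, False, True]       # semi-automated tool (toy flags)
reference = [True, False, True, False, False]  # conventional screening (toy flags)
m = diagnostic_metrics(index, reference)
print({k: round(v, 3) for k, v in m.items()})  # {'sensitivity': 0.5, 'specificity': 0.333, 'ppv': 0.333}
```

Switching from the primary to the secondary analysis only changes the `reference` argument, from first-round conventional decisions to the final list of included studies.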
Affiliation(s)
- Takehiko Oami
- Department of Emergency and Critical Care Medicine, Chiba University Graduate School of Medicine, Chiba, Japan
- Yohei Okada
- Department of Preventive Services, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Health Services and Systems Research, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
- Masaaki Sakuraya
- Department of Emergency and Intensive Care Medicine, JA Hiroshima General Hospital, Hiroshima, Japan
- Tatsuma Fukuda
- Department of Emergency and Critical Care Medicine, Toranomon Hospital, Tokyo, Japan
- Nobuaki Shime
- Department of Emergency and Critical Care Medicine, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
- Taka-aki Nakada
- Department of Emergency and Critical Care Medicine, Chiba University Graduate School of Medicine, Chiba, Japan
5
Alsanea S, Alkofide H, Almadi B, Almohammed O, Alwhaibi A, Alrabiah Z, Kalagi N. Liraglutide's Effect on Weight Management in Subjects With Pre-diabetes: A Systematic Review & Meta-Analysis. Endocr Pract 2024; 30:737-745. [PMID: 38782201 DOI: 10.1016/j.eprac.2024.05.009] [Received: 02/13/2024] [Revised: 05/02/2024] [Accepted: 05/05/2024] [Indexed: 05/25/2024]
Abstract
BACKGROUND Despite the growing literature, the effectiveness of liraglutide in weight management among individuals with prediabetes and in preventing the disease remains controversial. This study aims to critically evaluate the extent of liraglutide's impact on weight management in this population and assess the heterogeneity among extant studies. METHODS A systematic literature search was conducted across MEDLINE, Embase, ClinicalTrials.gov, and the reference lists of retrieved studies to identify eligible English-language randomized controlled trials evaluating liraglutide's effect on weight in individuals with pre-diabetes. Non-randomized studies, studies not reporting relevant outcomes, and those conducted on patients with type 2 diabetes were excluded from this review. Outcomes included the change from baseline in absolute body weight in kg, body mass index (BMI), waist circumference, glycosylated hemoglobin (HbA1c), and low-density lipoprotein cholesterol levels. Additional safety outcomes were also reported. Data were analyzed using R statistical software version 4.3.1. A fixed-effect model was used when pooling crude numbers for study outcomes. A sensitivity analysis using a random-effects model was also performed, and heterogeneity was assessed using the I² statistic. RESULTS Five eligible studies were included, with a total of 1604 subjects in the liraglutide arm and 859 subjects in the control arm. Participants exposed to liraglutide showed a decrease in body weight (mean difference [MD] = -4.95 kg; 95% CI -5.16, -4.73; I² = 93%), BMI (MD = -2.06 kg/m²; 95% CI -2.22, -1.89; I² = 97%), waist circumference (MD = -4.61 cm; 95% CI -4.79, -4.43; I² = 82%), HbA1c (MD = -0.33%; 95% CI -0.34, -0.31; I² = 100%), and low-density lipoprotein cholesterol levels (MD = -0.36 mmol/L; 95% CI -0.39, -0.33; I² = 99%). The overall effect size remained similar when using a random-effects model for all outcomes.
In addition, the rate of adverse events was higher with liraglutide when compared to the control; however, the dropout rates were relatively lower in the former arm. CONCLUSION While our meta-analysis suggests that liraglutide can reduce body weight, BMI, waist circumference, and HbA1c levels in individuals with pre-diabetes, the findings should be interpreted cautiously due to limitations such as the small number of trials and their short duration, and variability in dosages. Further randomized controlled trials examining long-term outcomes are essential to validate these findings and address the high heterogeneity among the studies included in this analysis.
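The sketch below shows textbook inverse-variance fixed-effect pooling of mean differences, together with Cochran's Q and the I² statistic, to show what the reported I² values measure. The abstract states only that R 4.3.1 was used. The package and exact settings are not given, so this is an assumption about the general method, not a reproduction of the authors' analysis. A random-effects model (e.g. DerSimonian and Laird) is not shown, and the inputs are hypothetical, not data from the included trials.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_md(md: Sequence[float], se: Sequence[float]) -> Tuple[float, float, float, float]:
    """Inverse-variance fixed-effect pooled mean difference.

    Returns (pooled MD, lower 95% CI, upper 95% CI, I^2 as a fraction).
    """
    weights = [1.0 / s ** 2 for s in se]  # each study weighted by 1 / variance
    total = sum(weights)
    pooled = sum(w * m for w, m in zip(weights, md)) / total
    se_pooled = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (m - pooled) ** 2 for w, m in zip(weights, md))
    df = len(md) - 1
    # I^2: share of Q in excess of its expectation (df) under homogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    z = 1.96  # normal quantile for a two-sided 95% interval
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, i2


# Two hypothetical studies, for illustration only.
pooled, lo, hi, i2 = fixed_effect_md([-4.0, -6.0], [1.0, 1.0])
print(pooled, round(lo, 2), round(hi, 2), i2)  # -5.0 -6.39 -3.61 0.5
```

An I² near 93% to 100%, as reported here, means that nearly all of the between-trial variation exceeds what sampling error alone would produce. That is why the fixed-effect estimates were checked against a random-effects model.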
Affiliation(s)
- Sary Alsanea
- Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia.
- Hadeel Alkofide
- Department of Clinical Pharmacy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Bana Almadi
- Department of Clinical Pharmacy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Omar Almohammed
- Department of Clinical Pharmacy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Abdulrahman Alwhaibi
- Department of Clinical Pharmacy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Ziyad Alrabiah
- Department of Clinical Pharmacy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Nora Kalagi
- Department of Clinical Pharmacy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
6
Tóth B, Berek L, Gulácsi L, Péntek M, Zrubka Z. Automation of systematic reviews of biomedical literature: a scoping review of studies indexed in PubMed. Syst Rev 2024; 13:174. [PMID: 38978132 PMCID: PMC11229257 DOI: 10.1186/s13643-024-02592-3] [Received: 11/27/2023] [Accepted: 06/20/2024] [Indexed: 07/10/2024]
Abstract
BACKGROUND The demand for high-quality systematic literature reviews (SRs) for evidence-based medical decision-making is growing. SRs are costly and require the scarce resource of highly skilled reviewers. Automation technology has been proposed to save workload and expedite the SR workflow. We aimed to provide a comprehensive overview of SR automation studies indexed in PubMed, focusing on the applicability of these technologies in real-world practice. METHODS In November 2022, we extracted, combined, and ran an integrated PubMed search for SRs on SR automation. Full-text English peer-reviewed articles were included if they reported studies on SR automation methods (SSAM), or automated SRs (ASR). Bibliographic analyses and knowledge-discovery studies were excluded. Record screening was performed by single reviewers, and the selection of full-text papers was performed in duplicate. We summarized the publication details, automated review stages, automation goals, applied tools, data sources, methods, results, and Google Scholar citations of SR automation studies. RESULTS From 5321 records screened by title and abstract, we included 123 full-text articles, of which 108 were SSAM and 15 ASR. Automation was applied for search (19/123, 15.4%), record screening (89/123, 72.4%), full-text selection (6/123, 4.9%), data extraction (13/123, 10.6%), risk of bias assessment (9/123, 7.3%), evidence synthesis (2/123, 1.6%), assessment of evidence quality (2/123, 1.6%), and reporting (2/123, 1.6%). Multiple SR stages were automated by 11 (8.9%) studies. The performance of automated record screening varied widely across SR topics. In published ASR, we found examples of automated search, record screening, full-text selection, and data extraction. In some ASRs, automation fully complemented manual reviews to increase sensitivity rather than to save workload. Reporting of automation details was often incomplete in ASRs.
CONCLUSIONS Automation techniques are being developed for all SR stages, but with limited real-world adoption. Most SR automation tools target single SR stages, with modest time savings for the entire SR process and varying sensitivity and specificity across studies. Therefore, the real-world benefits of SR automation remain uncertain. Standardizing the terminology, reporting, and metrics of study reports could enhance the adoption of SR automation techniques in real-world practice.
Affiliation(s)
- Barbara Tóth
- Doctoral School of Innovation Management, Óbuda University, Bécsi út 96/B, Budapest, 1034, Hungary
- László Berek
- Doctoral School for Safety and Security, Óbuda University, Bécsi út 96/B, Budapest, 1034, Hungary
- University Library, Óbuda University, Bécsi út 96/B, Budapest, 1034, Hungary
- László Gulácsi
- HECON Health Economics Research Center, University Research, and Innovation Center, Óbuda University, Bécsi út 96/B, Budapest, 1034, Hungary
- Márta Péntek
- HECON Health Economics Research Center, University Research, and Innovation Center, Óbuda University, Bécsi út 96/B, Budapest, 1034, Hungary
- Zsombor Zrubka
- HECON Health Economics Research Center, University Research, and Innovation Center, Óbuda University, Bécsi út 96/B, Budapest, 1034, Hungary.
7
Oami T, Okada Y, Nakada TA. Performance of a Large Language Model in Screening Citations. JAMA Netw Open 2024; 7:e2420496. [PMID: 38976267 PMCID: PMC11231796 DOI: 10.1001/jamanetworkopen.2024.20496] [Received: 02/16/2024] [Accepted: 05/06/2024] [Indexed: 07/09/2024]
Abstract
Importance Large language models (LLMs) are promising as tools for citation screening in systematic reviews. However, their applicability has not yet been determined. Objective To evaluate the accuracy and efficiency of an LLM in title and abstract literature screening. Design, Setting, and Participants This prospective diagnostic study used the data from the title and abstract screening process for 5 clinical questions (CQs) in the development of the Japanese Clinical Practice Guidelines for Management of Sepsis and Septic Shock. The LLM decided to include or exclude citations based on the inclusion and exclusion criteria in terms of patient, population, problem; intervention; comparison; and study design of the selected CQ, and its decisions were compared with those of the conventional method for title and abstract screening. This study was conducted from January 7 to 15, 2024. Exposures LLM (GPT-4 Turbo)-assisted citation screening or the conventional method. Main Outcomes and Measures The sensitivity and specificity of the LLM-assisted screening process were calculated, and the full-text screening result using the conventional method was set as the reference standard in the primary analysis. Pooled sensitivity and specificity were also estimated, and screening times of the 2 methods were compared. Results In the conventional citation screening process, 8 of 5634 publications in CQ 1, 4 of 3418 in CQ 2, 4 of 1038 in CQ 3, 17 of 4326 in CQ 4, and 8 of 2253 in CQ 5 were selected. In the primary analysis of 5 CQs, LLM-assisted citation screening demonstrated an integrated sensitivity of 0.75 (95% CI, 0.43 to 0.92) and specificity of 0.99 (95% CI, 0.99 to 0.99). Post hoc modifications to the command prompt improved the integrated sensitivity to 0.91 (95% CI, 0.77 to 0.97) without substantially compromising specificity (0.98 [95% CI, 0.96 to 0.99]).
Additionally, LLM-assisted screening was associated with reduced time for processing 100 studies (1.3 minutes vs 17.2 minutes for conventional screening methods; mean difference, -15.25 minutes [95% CI, -17.70 to -12.79 minutes]). Conclusions and Relevance In this prospective diagnostic study investigating the performance of LLM-assisted citation screening, the model demonstrated acceptable sensitivity and reasonably high specificity with reduced processing time. This novel method could potentially enhance efficiency and reduce workload in systematic reviews.
Affiliation(s)
- Takehiko Oami
- Department of Emergency and Critical Care Medicine, Chiba University Graduate School of Medicine, Chiba, Japan
- Yohei Okada
- Department of Preventive Services, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Health Services and Systems Research, Duke-NUS Medical School, National University of Singapore, Singapore
- Taka-Aki Nakada
- Department of Emergency and Critical Care Medicine, Chiba University Graduate School of Medicine, Chiba, Japan
8
Burns JK, Etherington C, Cheng-Boivin O, Boet S. Using an artificial intelligence tool can be as accurate as human assessors in level one screening for a systematic review. Health Info Libr J 2024; 41:136-148. [PMID: 34792285 DOI: 10.1111/hir.12413] [Received: 11/30/2020] [Revised: 08/19/2021] [Accepted: 10/23/2021] [Indexed: 11/29/2022]
Abstract
BACKGROUND Artificial intelligence (AI) offers a promising solution to expedite various phases of the systematic review process, such as screening. OBJECTIVE We aimed to assess the accuracy of an AI tool in identifying eligible references for a systematic review compared to identification by human assessors. METHODS For the case study (a systematic review of knowledge translation interventions), we used a diagnostic accuracy design in which human raters and the AI system DistillerAI (Evidence Partners, Ottawa, Canada) independently assessed a set of 300 articles for eligibility. We analysed a series of 64 possible confidence levels for the AI's decisions and calculated several standard parameters of diagnostic accuracy for each. RESULTS When set to a lower AI confidence threshold of 0.1 or greater and an upper threshold of 0.9 or lower, DistillerAI made article selection decisions very similar to those of human assessors. Within this range, DistillerAI made a decision on the majority of articles (93-100%), with a sensitivity of 1.0 and specificity ranging from 0.9 to 1.0. CONCLUSION DistillerAI appears to be accurate in its assessment of articles in a case study of 300 articles. Further experimentation will be needed to establish DistillerAI's performance in other subject areas.
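A lower and upper confidence threshold partition articles into automatic exclusions, automatic inclusions, and a middle band left to humans. The share the tool decides on is the 93-100% reported above. The minimal sketch below assumes the tool emits a per-article inclusion score in [0, 1] and that scores between the thresholds go to a human assessor. The function names are hypothetical. This is not DistillerAI's API, and its actual confidence semantics may differ.

```python
from typing import List, Sequence


def triage(scores: Sequence[float], lower: float = 0.1, upper: float = 0.9) -> List[str]:
    """Map each inclusion score to 'exclude', 'include' or 'human'.

    Scores at or below `lower` are excluded, scores at or above `upper` are
    included, and anything in between is left for a human assessor.
    """
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("expected 0 <= lower < upper <= 1")
    decisions = []
    for s in scores:
        if s <= lower:
            decisions.append("exclude")
        elif s >= upper:
            decisions.append("include")
        else:
            decisions.append("human")
    return decisions


def coverage(decisions: Sequence[str]) -> float:
    """Share of articles the tool decided on without a human."""
    return sum(d != "human" for d in decisions) / len(decisions)


decisions = triage([0.05, 0.5, 0.95, 0.1, 0.9])
print(decisions)  # ['exclude', 'human', 'include', 'exclude', 'include']
print(coverage(decisions))  # 0.8
```

Narrowing the band between the thresholds raises coverage but routes more borderline articles to automatic decisions. The 64 confidence levels analysed in the study explore that trade-off.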
Affiliation(s)
- Joseph K Burns
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, ON, Canada
- Cole Etherington
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, ON, Canada
- Olivia Cheng-Boivin
- Department of Anesthesiology and Pain Medicine, The Ottawa Hospital, Ottawa, ON, Canada
- Sylvain Boet
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, ON, Canada
- Department of Anesthesiology and Pain Medicine, The Ottawa Hospital, Ottawa, ON, Canada
- Francophone Affairs & Department of Innovation in Medical Education, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Ottawa Hospital Research Institute, Ottawa, ON, Canada
9
Tran VT, Gartlehner G, Yaacoub S, Boutron I, Schwingshackl L, Stadelmaier J, Sommer I, Alebouyeh F, Afach S, Meerpohl J, Ravaud P. Sensitivity and Specificity of Using GPT-3.5 Turbo Models for Title and Abstract Screening in Systematic Reviews and Meta-analyses. Ann Intern Med 2024; 177:791-799. [PMID: 38768452 DOI: 10.7326/m23-3389] [Indexed: 05/22/2024]
Abstract
BACKGROUND Systematic reviews are performed manually despite the exponential growth of scientific literature. OBJECTIVE To investigate the sensitivity and specificity of GPT-3.5 Turbo, from OpenAI, as a single reviewer, for title and abstract screening in systematic reviews. DESIGN Diagnostic test accuracy study. SETTING Unannotated bibliographic databases from 5 systematic reviews representing 22 665 citations. PARTICIPANTS None. MEASUREMENTS A generic prompt framework to instruct GPT to perform title and abstract screening was designed. The output of the model was compared with decisions from authors under 2 rules. The first rule balanced sensitivity and specificity, for example, to act as a second reviewer. The second rule optimized sensitivity, for example, to reduce the number of citations to be manually screened. RESULTS Under the balanced rule, sensitivities ranged from 81.1% to 96.5% and specificities ranged from 25.8% to 80.4%. Across all reviews, GPT identified 7 of 708 citations (1%) missed by humans that should have been included after full-text screening, at the cost of 10 279 of 22 665 false-positive recommendations (45.3%) that would require reconciliation during the screening process. Under the sensitive rule, sensitivities ranged from 94.6% to 99.8% and specificities ranged from 2.2% to 46.6%. Limiting manual screening to citations not ruled out by GPT could reduce the number of citations to screen by between 127 of 6334 (2%) and 1851 of 4077 (45.4%), at the cost of missing 0 to 1 of 26 citations (up to 3.8%) at the full-text level. LIMITATIONS Time needed to fine-tune the prompt; retrospective nature of the study; convenience sample of 5 systematic reviews; and GPT performance sensitive to prompt development and time. CONCLUSION The GPT-3.5 Turbo model may be used as a second reviewer for title and abstract screening, at the cost of additional work to reconcile added false positives.
It also showed potential to reduce the number of citations before screening by humans, at the cost of missing some citations at the full-text level. PRIMARY FUNDING SOURCE None.
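Under the sensitive rule, two quantities matter: the share of citations humans no longer need to screen, and the share of full-text inclusions lost in the process. Both can be computed from two per-citation flags, as in the minimal sketch below. The function name and the toy flags are hypothetical, and the sketch does not reproduce how GPT's outputs were turned into ruled-out decisions in the study.

```python
from typing import Dict, Sequence


def prescreen_summary(ruled_out: Sequence[bool], included_full_text: Sequence[bool]) -> Dict[str, float]:
    """Workload saved and citations lost when humans skip model-ruled-out records.

    ruled_out[i]: the model ruled citation i out under the sensitive rule.
    included_full_text[i]: citation i was included after full-text screening.
    """
    n = len(ruled_out)
    saved = sum(ruled_out)  # citations humans no longer screen
    relevant = sum(included_full_text)
    missed = sum(1 for r, inc in zip(ruled_out, included_full_text) if r and inc)
    return {
        "to_screen": n - saved,
        "saved_fraction": saved / n,
        "missed": missed,
        "missed_fraction": missed / relevant if relevant else 0.0,
    }


# Toy flags for illustration only.
summary = prescreen_summary(
    [True, True, False, False, False, True, False, False],
    [False, True, True, False, False, False, False, False],
)
print(summary)  # {'to_screen': 5, 'saved_fraction': 0.375, 'missed': 1, 'missed_fraction': 0.5}
```

Applied per review, `saved_fraction` corresponds to the reported 127/6334 ≈ 2% and 1851/4077 ≈ 45.4%. `missed_fraction` corresponds to the 1/26 ≈ 3.8% figure.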
Affiliation(s)
- Viet-Thi Tran
- Université Paris Cité and Université Sorbonne Paris Nord, Inserm, INRAe, Centre for Research in Epidemiology and Statistics (CRESS), Paris; and Centre d'Epidémiologie Clinique, Hôpital Hôtel-Dieu, AP-HP, Paris, France (V.-T.T.)
- Gerald Gartlehner
- Department for Evidence-based Medicine and Evaluation, University for Continuing Education Krems, Krems, Austria; and Center for Public Health Methods, RTI International, Research Triangle Park, North Carolina (G.G.)
- Sally Yaacoub
- Université Paris Cité and Université Sorbonne Paris Nord, Inserm, INRAe, Centre for Research in Epidemiology and Statistics (CRESS), Paris, France (S.Y., F.A.)
- Isabelle Boutron
- Université Paris Cité and Université Sorbonne Paris Nord, Inserm, INRAe, Centre for Research in Epidemiology and Statistics (CRESS), Paris, France; and Centre d'Epidémiologie Clinique, Hôpital Hôtel-Dieu, AP-HP, Paris, France (I.B.)
- Lukas Schwingshackl
- Institute for Evidence in Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany (L.S., J.S., J.M.)
- Julia Stadelmaier
- Institute for Evidence in Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany (L.S., J.S., J.M.)
- Isolde Sommer
- Department for Evidence-based Medicine and Evaluation, University for Continuing Education Krems, Krems, Austria (I.S.)
- Farzaneh Alebouyeh
- Université Paris Cité and Université Sorbonne Paris Nord, Inserm, INRAe, Centre for Research in Epidemiology and Statistics (CRESS), Paris, France (S.Y., F.A.)
- Sivem Afach
- Epidemiology in Dermatology and Evaluation of Therapeutics (EpiDermE)-EA 7379, University Paris Est Créteil Val de Marne, Créteil, France (S.A.)
- Joerg Meerpohl
- Institute for Evidence in Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany (L.S., J.S., J.M.)
- Philippe Ravaud
- Université Paris Cité and Université Sorbonne Paris Nord, Inserm, INRAe, Centre for Research in Epidemiology and Statistics (CRESS), Paris, France; Centre d'Epidémiologie Clinique, Hôpital Hôtel-Dieu, AP-HP, Paris, France; and Department of Epidemiology, Columbia University Mailman School of Public Health, New York, New York (P.R.)
10
Guo Q, Jiang G, Zhao Q, Long Y, Feng K, Gu X, Xu Y, Li Z, Huang J, Du L. Rapid review: A review of methods and recommendations based on current evidence. J Evid Based Med 2024; 17:434-453. [PMID: 38512942 DOI: 10.1111/jebm.12594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Accepted: 02/28/2024] [Indexed: 03/23/2024]
Abstract
Rapid reviews (RRs) can accelerate the traditional systematic review (SR) process by simplifying or omitting steps using various shortcuts. With the increasing popularity of RRs, numerous shortcuts have emerged, but there is no consensus on how to choose the most appropriate ones. This study conducted a literature search in PubMed from inception to December 21, 2023, using terms such as "rapid review," "rapid assessment," "rapid systematic review," and "rapid evaluation." We also scanned the reference lists and performed citation tracking of included impact studies to identify additional studies. We conducted a narrative synthesis of all RR approaches, shortcuts, and studies assessing their effectiveness at each stage of RRs. Based on the current evidence, we provide recommendations on using certain shortcuts in RRs. Ultimately, we identified 185 studies that summarized RR approaches and shortcuts or evaluated their impact. There was relatively sufficient evidence to support the use of the following shortcuts in RRs: limiting studies to those published in English; conducting abbreviated database searches (e.g., searching only PubMed/MEDLINE, Embase, and CENTRAL); omitting retrieval of grey literature; restricting the search timeframe to the most recent 20 years for medical interventions and the most recent 15 years for reviews of diagnostic test accuracy; and conducting single screening by an experienced screener. To some extent, these shortcuts are also applicable to SRs. This study provides a reference for future RR researchers in selecting shortcuts and presents a potential research topic for methodologists.
Affiliation(s)
- Qiong Guo
- Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
- West China Medical Publishers, West China Hospital, Sichuan University, Chengdu, P. R. China
- Guiyu Jiang
- West China School of Public Health, Sichuan University, Chengdu, P. R. China
- Qingwen Zhao
- West China School of Public Health, Sichuan University, Chengdu, P. R. China
- Youlin Long
- Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
- Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China
- Kun Feng
- Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
- Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China
- Xianlin Gu
- Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
- Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China
- Yihan Xu
- Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
- Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China
- Center for Education of Medical Humanities, West China Hospital, Sichuan University, Chengdu, P. R. China
- Zhengchi Li
- Center for Education of Medical Humanities, West China Hospital, Sichuan University, Chengdu, P. R. China
- Jin Huang
- Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
- Liang Du
- Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
- West China Medical Publishers, West China Hospital, Sichuan University, Chengdu, P. R. China
- Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China
11
De Silva DTN, Moore BR, Strunk T, Petrovski M, Varis V, Chai K, Ng L, Batty K. Development of a pharmaceutical science systematic review process using a semi-automated machine learning tool: Intravenous drug compatibility in the neonatal intensive care setting. Pharmacol Res Perspect 2024; 12:e1170. [PMID: 38204432 PMCID: PMC10782215 DOI: 10.1002/prp2.1170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 10/30/2023] [Accepted: 12/11/2023] [Indexed: 01/12/2024]
Abstract
Our objective was to establish and test a machine learning-based screening process applicable to systematic reviews in pharmaceutical sciences. We used the SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type) model, a broad search strategy, and a machine learning tool (Research Screener) to identify relevant references on the Y-site compatibility of 95 intravenous drugs used in neonatal intensive care settings. Two independent reviewers conducted pilot studies, including manual screening and evaluation of Research Screener, and used the kappa coefficient to assess inter-reviewer reliability. After initial deduplication of the search results, 27 597 references were available for screening. Research Screener excluded 1735 references, including 451 duplicate titles and 1269 reports with no abstract/title, which were manually screened. The remainder (25 862) were subject to the machine learning screening process. All eligible articles for the systematic review were extracted from <10% of the references available for screening. Moderate inter-reviewer reliability was achieved, with a kappa coefficient ≥0.75. Overall, 324 references were subject to full-text reading and 118 were deemed relevant for the systematic review. Our study showed that the literature captured by a broad search strategy for systematic reviews can be efficiently screened by the semi-automated machine learning tool Research Screener.
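The abstract reports inter-reviewer reliability as a kappa coefficient but does not name the variant. Assuming it is Cohen's kappa for two reviewers, a minimal sketch of the calculation on hypothetical include/exclude decisions is:

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical include (1) / exclude (0) decisions from two reviewers.
reviewer_1 = [1, 1, 0, 0]
reviewer_2 = [1, 0, 0, 0]
kappa = cohens_kappa(reviewer_1, reviewer_2)  # (0.75 - 0.5) / (1 - 0.5) = 0.5
```

Kappa discounts the agreement two reviewers would reach by chance alone, which is why it is usually reported instead of raw percentage agreement.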
Affiliation(s)
- Brioni R. Moore
- Curtin Medical School, Curtin University, Perth, Western Australia, Australia
- Curtin Health Innovation Research Institute, Curtin University, Perth, Western Australia, Australia
- Medical School, The University of Western Australia, Crawley, Western Australia, Australia
- Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Nedlands, Western Australia, Australia
- Tobias Strunk
- Medical School, The University of Western Australia, Crawley, Western Australia, Australia
- Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Nedlands, Western Australia, Australia
- Neonatal Directorate, King Edward Memorial Hospital, Child and Adolescent Health Service, Subiaco, Western Australia, Australia
- Michael Petrovski
- Pharmacy Department, King Edward Memorial Hospital, Women and Newborn Health Service, Subiaco, Western Australia, Australia
- Vanessa Varis
- University Library, Curtin University, Perth, Western Australia, Australia
- Kevin Chai
- School of Population Health, Curtin University, Perth, Western Australia, Australia
- Leo Ng
- Curtin School of Allied Health, Curtin University, Perth, Western Australia, Australia
- School of Health Sciences, Swinburne University of Technology, Hawthorn, Victoria, Australia
- Kevin T. Batty
- Curtin Medical School, Curtin University, Perth, Western Australia, Australia
- Curtin Health Innovation Research Institute, Curtin University, Perth, Western Australia, Australia
12
Guo E, Gupta M, Deng J, Park YJ, Paget M, Naugler C. Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study. J Med Internet Res 2024; 26:e48996. [PMID: 38214966 PMCID: PMC10818236 DOI: 10.2196/48996] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/14/2023] [Revised: 08/30/2023] [Accepted: 09/28/2023] [Indexed: 01/13/2024]
Abstract
BACKGROUND The systematic review of clinical research papers is a labor-intensive and time-consuming process that often involves screening thousands of titles and abstracts. The accuracy and efficiency of this process are critical for the quality of the review and subsequent health care decisions. Traditional methods rely heavily on human reviewers, often requiring a significant investment of time and resources. OBJECTIVE This study aims to assess the performance of the OpenAI generative pretrained transformer (GPT) and GPT-4 application programming interfaces (APIs) in accurately and efficiently identifying relevant titles and abstracts from real-world clinical review data sets, and to compare their performance against ground-truth labeling by 2 independent human reviewers. METHODS We introduce a novel workflow using the ChatGPT and GPT-4 APIs for screening titles and abstracts in clinical reviews. A Python script was created to call the API with the screening criteria in natural language and a corpus of title and abstract data sets filtered by a minimum of 2 human reviewers. We compared the performance of our model against human-reviewed papers across 6 review papers, screening over 24,000 titles and abstracts. RESULTS Our results show an accuracy of 0.91, a macro F1-score of 0.60, a sensitivity of 0.91 for excluded papers, and a sensitivity of 0.76 for included papers. The interrater variability between 2 independent human screeners was κ=0.46, and the prevalence- and bias-adjusted κ between our proposed methods and the consensus-based human decisions was κ=0.96. On a randomly selected subset of papers, the GPT models demonstrated the ability to provide reasoning for their decisions and corrected their initial decisions upon being asked to explain their reasoning for incorrect classifications. 
CONCLUSIONS Large language models have the potential to streamline the clinical review process, save valuable time and effort for researchers, and contribute to the overall quality of clinical reviews. By prioritizing the workflow and acting as an aid rather than a replacement for researchers and reviewers, models such as GPT-4 can enhance efficiency and lead to more accurate and reliable conclusions in medical research.
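Two of the reported statistics are easy to misread. Macro F1 averages the per-class F1 scores without weighting by class size, so a rare "include" class can pull it well below overall accuracy; and for two categories the prevalence- and bias-adjusted kappa (PABAK) reduces to 2p_o - 1, where p_o is the observed agreement. A minimal sketch on hypothetical labels (not study data):

```python
from typing import Sequence


def pabak(rater_a: Sequence[str], rater_b: Sequence[str], n_categories: int = 2) -> float:
    """Prevalence- and bias-adjusted kappa: (k * p_o - 1) / (k - 1)."""
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
    return (n_categories * p_o - 1) / (n_categories - 1)


def macro_f1(truth: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores, so each class counts equally."""
    classes = sorted(set(truth) | set(pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(truth, pred))
        fp = sum(t != c and p == c for t, p in zip(truth, pred))
        fn = sum(t == c and p != c for t, p in zip(truth, pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: includes are rare, as in most screening data sets.
human = ["include", "exclude", "exclude", "exclude", "exclude"]
model = ["include", "include", "exclude", "exclude", "exclude"]

agreement = pabak(human, model)  # 2 * 0.8 - 1, about 0.6
f1 = macro_f1(human, model)      # mean of 2/3 ("include") and 6/7 ("exclude")
```

Because PABAK depends only on observed agreement, it can be high whenever most decisions match, even if the rare class is handled less well.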
Affiliation(s)
- Eddie Guo
- Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Mehul Gupta
- Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Jiawen Deng
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Ye-Jean Park
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Michael Paget
- Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
13
Roth S, Wermer-Colan A. Machine Learning Methods for Systematic Reviews: A Rapid Scoping Review. Dela J Public Health 2023; 9:40-47. [PMID: 38173960 PMCID: PMC10759980 DOI: 10.32481/djph.2023.11.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Objective At the forefront of machine learning research since its inception has been natural language processing, also known as text mining, which refers to a wide range of statistical processes for analyzing textual data and retrieving information. In medical fields, text mining has made valuable contributions in unexpected ways, not least by synthesizing data from disparate biomedical studies. This rapid scoping review examines how machine learning methods for text mining can be implemented at the intersection of these disparate fields to improve the workflow and process of conducting systematic reviews in medical research and related academic disciplines. Methods The primary research question of this investigation was: "What impact does the use of machine learning have on the methods used by systematic review teams to carry out the systematic review process, such as the precision of search strategies, unbiased article selection, or data abstraction and/or analysis for systematic reviews and other comprehensive review types of similar methodology?" A literature search was conducted by a medical librarian using multiple databases, a grey literature search, and handsearching of the literature. The search was completed on December 4, 2020. Handsearching was done on an ongoing basis with an end date of April 14, 2023. Results The search yielded 23,190 studies after duplicates were removed. Of these, 117 studies (1.70%) met the eligibility criteria for inclusion in this rapid scoping review. Conclusions Several machine learning techniques and methods are in development or have already been fully developed to assist with the stages of the systematic review. 
Combined with human intelligence, these machine learning methods and tools show promise for making the systematic review process more efficient, saving valuable time for systematic review authors, and increasing the speed with which evidence can be created and placed in the hands of decision makers and the public.
Affiliation(s)
- Stephanie Roth
- Medical Librarian, Lewis B. Flinn Medical Library, ChristianaCare
- Alex Wermer-Colan
- Academic Director, Loretta C. Duckworth Scholars Studio, Temple University Libraries
14
Waffenschmidt S, Sieben W, Jakubeit T, Knelangen M, Overesch I, Bühn S, Pieper D, Skoetz N, Hausner E. Increasing the efficiency of study selection for systematic reviews using prioritization tools and a single-screening approach. Syst Rev 2023; 12:161. [PMID: 37705060 PMCID: PMC10500815 DOI: 10.1186/s13643-023-02334-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/04/2022] [Accepted: 08/22/2023] [Indexed: 09/15/2023]
Abstract
BACKGROUND Systematic literature screening is a key component of systematic reviews. However, it is resource intensive, because two people generally screen a vast number of search results independently of each other (double screening). To develop approaches for increasing efficiency, we tested the use of text mining to prioritize search results, as well as the involvement of only one person (single screening) in the study selection process. METHOD Our study is based on health technology assessments (HTAs) of drug and non-drug interventions. Using a sample size calculation, we consecutively included 11 searches, resulting in 33 study selection processes. Of the three screeners for each search, two used screening tools with prioritization (Rayyan, EPPI Reviewer) and one used a tool without prioritization. For each prioritization tool, we investigated the proportion of citations classified as relevant at three cut-offs or STOP criteria (after screening 25%, 50% and 75% of the citation set). For each STOP criterion, we measured sensitivity (the number of correctly identified relevant studies divided by the total number of relevant studies in the study pool). In addition, we determined the number of relevant studies identified per single screening round and investigated whether missed studies were relevant to the HTA conclusion. RESULTS Overall, EPPI Reviewer performed better than Rayyan and identified the vast majority of relevant citations (88%, vs 66% for Rayyan) after screening half of the citation set. As long as additional information sources were screened, a single-screening approach was sufficient to identify all studies relevant to the HTA conclusion. Although many relevant publications (n = 63) and studies (n = 29) were incorrectly excluded, ultimately only 5 studies could not be identified at all, in 2 of the 11 searches (1 study in one search and 4 studies in the other). However, their omission did not change the overall conclusion of any HTA. 
CONCLUSIONS EPPI Reviewer helped to identify relevant citations earlier in the screening process than Rayyan. Single screening would have been sufficient to identify all studies relevant to the HTA conclusion. However, this requires screening of further information sources. It also needs to be considered that the credibility of an HTA may be questioned if studies are missing, even if they are not relevant to the HTA conclusion.
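The STOP criteria can be expressed as sensitivity (recall) computed at fixed cut-offs of a prioritized screening order. This sketch assumes a hypothetical ranked list of relevance labels and rounds each cut-off up to a whole record; it is an illustration of the metric, not EPPI Reviewer's or Rayyan's own code.

```python
import math
from typing import Dict, Sequence


def recall_at_stop(ranked_relevance: Sequence[bool], fractions: Sequence[float]) -> Dict[float, float]:
    """Sensitivity after screening the top fraction of a prioritized list.

    ranked_relevance[i] is True if the i-th record in screening order is relevant.
    """
    n = len(ranked_relevance)
    total_relevant = sum(ranked_relevance)
    result = {}
    for fraction in fractions:
        cutoff = math.ceil(fraction * n)  # records screened before the STOP criterion applies
        result[fraction] = sum(ranked_relevance[:cutoff]) / total_relevant
    return result


# Hypothetical prioritized order: 8 records, 3 of them relevant.
order = [True, False, True, False, False, True, False, False]
recall = recall_at_stop(order, [0.25, 0.50, 0.75])
```

A better prioritization tool pushes relevant records toward the front of the list, so its recall at the 25% and 50% cut-offs is higher for the same citation set.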
Affiliation(s)
- Siw Waffenschmidt
- Institute for Quality and Efficiency in Health Care, Cologne, Germany
- Wiebke Sieben
- Institute for Quality and Efficiency in Health Care, Cologne, Germany
- Thomas Jakubeit
- Institute for Quality and Efficiency in Health Care, Cologne, Germany
- Marco Knelangen
- Institute for Quality and Efficiency in Health Care, Cologne, Germany
- Inga Overesch
- Institute for Quality and Efficiency in Health Care, Cologne, Germany
- Department 2 (Infectious Disease Epidemiology), Public Health Agency of Lower Saxony, Hanover, Germany
- Stefanie Bühn
- Institute for Research in Operative Medicine, Herdecke University, Witten, Germany
- Dawid Pieper
- Institute for Research in Operative Medicine, Herdecke University, Witten, Germany
- Faculty of Health Sciences Brandenburg, Brandenburg Medical School, Institute for Health Services and Health System Research, Rüdersdorf, Germany
- Brandenburg Medical School, Center for Health Services Research Brandenburg, Rüdersdorf, Germany
- Nicole Skoetz
- Evidence-Based Medicine, Department I of Internal Medicine, Faculty of Medicine, University Hospital Cologne, University of Cologne, Cologne, Germany
- Elke Hausner
- Institute for Quality and Efficiency in Health Care, Cologne, Germany
15
Oude Wolcherink MJ, Pouwels XGLV, van Dijk SHB, Doggen CJM, Koffijberg H. Can artificial intelligence separate the wheat from the chaff in systematic reviews of health economic articles? Expert Rev Pharmacoecon Outcomes Res 2023; 23:1049-1056. [PMID: 37573521 DOI: 10.1080/14737167.2023.2234639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 07/02/2023] [Indexed: 08/15/2023]
Abstract
OBJECTIVES Artificial intelligence-powered tools, such as ASReview, could reduce the burden of title and abstract screening. This study aimed to assess the accuracy and efficiency of using ASReview in a health economic context. METHODS A sample of 4,994 articles from a previous systematic literature review was used. Previous manual screening resulted in 134 articles included for full-text screening (FT) and 50 for data extraction (DE). Here, accuracy and efficiency were evaluated by comparing the number of relevant articles identified with ASReview versus manual screening. Predefined stopping rules using sampling criteria and heuristic criteria were tested. The robustness of the AI tool's performance was determined using 1,000 simulations. RESULTS Across the stopping rules tested, median accuracy for FT articles remained below 85%, but reached 100% for DE articles. To identify all relevant articles, a median of 89.9% of FT articles needed to be screened, compared with 7.7% for DE articles. Potential time savings of 49 to 59 hours could be achieved, depending on the stopping rule. CONCLUSIONS In our case study, all DE articles were identified after screening 7.7% of the sample, allowing for substantial time savings. ASReview likely has the potential to substantially reduce screening time in systematic reviews of health economic articles.
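The abstract does not spell out its heuristic stopping rules. One commonly discussed heuristic, used here purely as an assumed example, is to stop once a fixed number of consecutive irrelevant records has been screened. The sketch applies that rule to a hypothetical ranked list and reports the share of records screened and the recall achieved; the names `patience` and `evaluate_rule` are illustrative, not part of ASReview.

```python
from typing import Sequence, Tuple


def stop_after_irrelevant_run(ranked_relevance: Sequence[bool], patience: int) -> int:
    """Records screened before stopping after `patience` consecutive irrelevant records."""
    run = 0
    for screened, relevant in enumerate(ranked_relevance, start=1):
        run = 0 if relevant else run + 1
        if run >= patience:
            return screened
    return len(ranked_relevance)  # rule never triggered: everything was screened


def evaluate_rule(ranked_relevance: Sequence[bool], patience: int) -> Tuple[float, float]:
    """(fraction of records screened, recall) when the heuristic rule is applied."""
    screened = stop_after_irrelevant_run(ranked_relevance, patience)
    found = sum(ranked_relevance[:screened])
    return screened / len(ranked_relevance), found / sum(ranked_relevance)


# Hypothetical ranked order: 10 records, 4 relevant, the last one ranked late.
order = [True, True, False, True, False, False, False, True, False, False]
workload, recall = evaluate_rule(order, patience=3)  # stops after 7 records, recall 0.75
```

In this toy list the rule saves 30% of the screening but misses the relevant record ranked eighth, illustrating why stopping rules can trade recall for time savings.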
Affiliation(s)
- M J Oude Wolcherink
- Department of Health Technology and Services Research, Technical Medical Centre, Faculty of Behavioral, Management and Social Sciences, University of Twente, Enschede, The Netherlands
- X G L V Pouwels
- Department of Health Technology and Services Research, Technical Medical Centre, Faculty of Behavioral, Management and Social Sciences, University of Twente, Enschede, The Netherlands
- S H B van Dijk
- Department of Health Technology and Services Research, Technical Medical Centre, Faculty of Behavioral, Management and Social Sciences, University of Twente, Enschede, The Netherlands
- C J M Doggen
- Department of Health Technology and Services Research, Technical Medical Centre, Faculty of Behavioral, Management and Social Sciences, University of Twente, Enschede, The Netherlands
- H Koffijberg
- Department of Health Technology and Services Research, Technical Medical Centre, Faculty of Behavioral, Management and Social Sciences, University of Twente, Enschede, The Netherlands
16
Ferdinands G, Schram R, de Bruin J, Bagheri A, Oberski DL, Tummers L, Teijema JJ, van de Schoot R. Performance of active learning models for screening prioritization in systematic reviews: a simulation study into the Average Time to Discover relevant records. Syst Rev 2023; 12:100. [PMID: 37340494 PMCID: PMC10280866 DOI: 10.1186/s13643-023-02257-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/20/2020] [Accepted: 05/16/2023] [Indexed: 06/22/2023]
Abstract
BACKGROUND Conducting a systematic review demands a significant amount of effort in screening titles and abstracts. To accelerate this process, various tools that use active learning have been proposed. These tools allow the reviewer to interact with machine learning software to identify relevant publications as early as possible. The goal of this study is to gain a comprehensive understanding of active learning models for reducing the workload in systematic reviews through a simulation study. METHODS The simulation study mimics the process of a human reviewer screening records while interacting with an active learning model. Different active learning models were compared based on four classification techniques (naive Bayes, logistic regression, support vector machines, and random forest) and two feature extraction strategies (TF-IDF and doc2vec). The performance of the models was compared for six systematic review datasets from different research areas. The models were evaluated using Work Saved over Sampling (WSS) and recall. Additionally, this study introduces two new statistics, Time to Discovery (TD) and Average Time to Discovery (ATD). RESULTS The models reduce the number of publications needed to screen by 63.9% to 91.7% while still finding 95% of all relevant records (WSS@95). Recall of the models was defined as the proportion of relevant records found after screening 10% of all records, and ranged from 53.6% to 99.8%. The ATD values range from 1.4% to 11.7%, indicating the average proportion of labeling decisions the researcher needs to make to detect a relevant record. The ATD values display a similar ranking across the simulations as the recall and WSS values. CONCLUSIONS Active learning models for screening prioritization demonstrate significant potential for reducing the workload in systematic reviews. The naive Bayes + TF-IDF model yielded the best results overall. 
The Average Time to Discovery (ATD) measures performance of active learning models throughout the entire screening process without the need for an arbitrary cut-off point. This makes the ATD a promising metric for comparing the performance of different models across different datasets.
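WSS and ATD are both computed from the order in which a simulated reviewer encounters records. A minimal sketch follows, using the usual WSS definition (share of records left unscreened once the target recall is reached, minus the share that random sampling would leave) and one plausible reading of ATD consistent with the description above (the mean, over relevant records, of the proportion of records screened when each is found). Details of the authors' procedure, such as handling prior-knowledge records, are omitted, so this is not their exact implementation.

```python
import math
from typing import Sequence


def wss_at(ranked_relevance: Sequence[bool], recall_level: float = 0.95) -> float:
    """Work Saved over Sampling at a recall level for a ranked screening order.

    WSS = (records left unscreened) / N - (1 - recall_level), measured at the point
    where the target share of relevant records has been found.
    """
    n = len(ranked_relevance)
    needed = math.ceil(recall_level * sum(ranked_relevance))
    found = 0
    for screened, relevant in enumerate(ranked_relevance, start=1):
        found += relevant
        if found >= needed:
            return (n - screened) / n - (1 - recall_level)
    raise ValueError("target recall not reached")


def average_time_to_discovery(ranked_relevance: Sequence[bool]) -> float:
    """Mean, over relevant records, of the fraction of records screened when each was found."""
    n = len(ranked_relevance)
    times = [pos / n for pos, relevant in enumerate(ranked_relevance, start=1) if relevant]
    return sum(times) / len(times)


# Hypothetical simulated screening order: 10 records, 3 relevant.
order = [True, True, False, True, False, False, False, False, False, False]
wss95 = wss_at(order)                   # all 3 found after 4 records: 0.6 - 0.05
atd = average_time_to_discovery(order)  # (0.1 + 0.2 + 0.4) / 3
```

WSS depends on a single cut-off (here 95% recall), whereas the ATD-style average uses the discovery position of every relevant record, which is the property the abstract highlights.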
Affiliation(s)
- Gerbrich Ferdinands
- Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, Netherlands
- Raoul Schram
- Department of Research and Data Management Services, Information Technology Services, Utrecht University, Utrecht, The Netherlands
- Jonathan de Bruin
- Department of Research and Data Management Services, Information Technology Services, Utrecht University, Utrecht, The Netherlands
- Ayoub Bagheri
- Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, Netherlands
- Daniel L Oberski
- Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, Netherlands
- Lars Tummers
- School of Governance, Faculty of Law, Economics and Governance, Utrecht University, Utrecht, The Netherlands
- Jelle Jasper Teijema
- Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, Netherlands
- Rens van de Schoot
- Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, Netherlands
17
Oliveira Dos Santos Á, Sergio da Silva E, Machado Couto L, Valadares Labanca Reis G, Silva Belo V. The use of artificial intelligence for automating or semi-automating biomedical literature analyses: a scoping review. J Biomed Inform 2023; 142:104389. [PMID: 37187321 DOI: 10.1016/j.jbi.2023.104389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 04/11/2023] [Accepted: 05/08/2023] [Indexed: 05/17/2023]
Abstract
OBJECTIVE Evidence-based medicine (EBM) is a decision-making process based on the conscious and judicious use of the best available scientific evidence. However, the exponential increase in the amount of information currently available likely exceeds the capacity of human-only analysis. In this context, artificial intelligence (AI) and its branches such as machine learning (ML) can be used to facilitate human efforts in analyzing the literature to foster EBM. The present scoping review aimed to examine the use of AI in the automation of biomedical literature survey and analysis with a view to establishing the state-of-the-art and identifying knowledge gaps. MATERIALS AND METHODS Comprehensive searches of the main databases were performed for articles published up to June 2022 and studies were selected according to inclusion and exclusion criteria. Data were extracted from the included articles and the findings categorized. RESULTS The total number of records retrieved from the databases was 12,145, of which 273 were included in the review. Classification of the studies according to the use of AI in evaluating the biomedical literature revealed three main application groups, namely assembly of scientific evidence (n=127; 47%), mining the biomedical literature (n=112; 41%) and quality analysis (n=34; 12%). Most studies addressed the preparation of systematic reviews, while articles focusing on the development of guidelines and evidence synthesis were the least frequent. The biggest knowledge gap was identified within the quality analysis group, particularly regarding methods and tools that assess the strength of recommendation and consistency of evidence. 
CONCLUSION Our review shows that, despite significant progress in the automation of biomedical literature surveys and analyses in recent years, intensive research is needed to fill knowledge gaps on more difficult aspects of ML, deep learning and natural language processing, and to consolidate the use of automation by end-users (biomedical researchers and healthcare professionals).
Affiliation(s)
- Eduardo Sergio da Silva
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
- Letícia Machado Couto
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
- Vinícius Silva Belo
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
18
Dos Reis AHS, de Oliveira ALM, Fritsch C, Zouch J, Ferreira P, Polese JC. Usefulness of machine learning softwares to screen titles of systematic reviews: a methodological study. Syst Rev 2023; 12:68. [PMID: 37061711 PMCID: PMC10105467 DOI: 10.1186/s13643-023-02231-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/14/2023] [Accepted: 04/05/2023] [Indexed: 04/17/2023]
Abstract
OBJECTIVE To investigate the usefulness and performance metrics of three freely available software tools (Rayyan®, Abstrackr® and Colandr®) for title screening in systematic reviews. STUDY DESIGN AND SETTING In this methodological study, the usefulness of software for screening titles in systematic reviews was investigated by comparing the number of titles identified by software-assisted screening with the number identified by manual screening, using a previously published systematic review. To test performance, sensitivity, specificity, false-negative rate, proportion missed, workload savings and time savings were calculated. A purpose-built survey was used to evaluate the raters' experiences with the software. RESULTS Rayyan® was the most sensitive software, and raters correctly identified 78% of the true positives. All three tools were specific, and raters correctly identified 99% of the true negatives. They also had similar values for precision, proportion missed, workload savings and time savings. Rayyan®, Abstrackr® and Colandr® had false-negative rates of 21%, 39% and 34%, respectively. Rayyan received the best performance rating from the raters (35/40). CONCLUSION Rayyan®, Abstrackr® and Colandr® are useful tools and achieved good performance metrics for systematic title screening. Rayyan® appears to be the best ranked in both the quantitative evaluation and the raters' evaluation. The most important finding of this study is that using software to screen titles does not remove any title that would meet the inclusion criteria for the final review, making these tools valuable resources to facilitate the screening process.
Affiliation(s)
- Ana Helena Salles Dos Reis
- Post-Graduate Program of Health Sciences, Faculdade Ciências Médicas de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Faculty of Health Sciences, The University of Sydney, Sydney, NSW, Australia
- Ana Luiza Miranda de Oliveira
- Post-Graduate Program of Health Sciences, Faculdade Ciências Médicas de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Carolina Fritsch
- Faculty of Medicine and Health, School of Health Sciences, Sydney Musculoskeletal Health, The Kolling Institute, The University of Sydney, Sydney, NSW, Australia
- James Zouch
- Faculty of Health Sciences, The University of Sydney, Sydney, NSW, Australia
- Paulo Ferreira
- Faculty of Health Sciences, The University of Sydney, Sydney, NSW, Australia
- Janaine Cunha Polese
- Post-Graduate Program of Health Sciences, Faculdade Ciências Médicas de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
19
Burgard T, Bittermann A. Reducing Literature Screening Workload With Machine Learning. ZEITSCHRIFT FUR PSYCHOLOGIE-JOURNAL OF PSYCHOLOGY 2023. [DOI: 10.1027/2151-2604/a000509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]
Abstract
In our era of accelerated accumulation of knowledge, the manual screening of literature for eligibility is increasingly becoming too labor-intensive for summarizing the current state of knowledge in a timely manner. Recent advances in machine learning and natural language processing promise to reduce the screening workload by automatically detecting unseen references with a high probability of inclusion. As a variety of tools have been developed, the current review provides an overview of their characteristics and performance. A systematic search in various databases yielded 488 eligible reports, revealing 15 tools for screening automation that differed in methodology, features, and accessibility. For the review on the performance of screening tools, 21 studies could be included. In comparison to sampling records randomly, active screening with prioritization approximately halves the screening workload. However, a comparison of tools under equal or at least similar conditions is needed to derive clear recommendations.
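Workload reductions from prioritized screening are often expressed as work saved over sampling (WSS): the fraction of records left unscreened once a target recall is reached, minus the fraction that random screening would leave unscreened at the same recall. The review does not state which metric underlies its "approximately halves" estimate, so this is a sketch under that assumption; the function name is hypothetical, and `ranked_labels` stands for true relevance labels listed in the order a tool presents the records.

```python
from __future__ import annotations

import math


def work_saved_over_sampling(ranked_labels: list[bool], target_recall: float = 0.95) -> float:
    """WSS at a target recall for records screened in ranked order.

    WSS = (records not yet screened) / N - (1 - achieved recall).
    Random ordering gives an expected WSS of about 0, so positive values
    measure the saving relative to random sampling.
    """
    n_total = len(ranked_labels)
    n_relevant = sum(ranked_labels)
    if n_relevant == 0:
        raise ValueError("at least one relevant record is required")
    # Relevant records needed; round() guards against float noise such as 0.95 * 20.
    needed = math.ceil(round(target_recall * n_relevant, 9))
    found = 0
    for screened, is_relevant in enumerate(ranked_labels, start=1):
        found += is_relevant
        if found >= needed:
            recall = found / n_relevant
            return (n_total - screened) / n_total - (1 - recall)
    raise ValueError("target recall could not be reached")
```

For example, a ranking that places all three relevant records among the first four of ten gives WSS = 6/10 - 0 = 0.6 at 100% recall.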
Affiliation(s)
- Tanja Burgard
- Research Synthesis Methods, Leibniz Institute for Psychology (ZPID), Trier, Germany
- André Bittermann
- Big Data, Leibniz Institute for Psychology (ZPID), Trier, Germany
20
Tercero-Hidalgo JR, Fernández-Luna JM. [In response to «Systematic reviews in five steps»: available automation tools]. Semergen 2023; 49:101828. [PMID: 36195015 DOI: 10.1016/j.semerg.2022.101828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 06/22/2022] [Indexed: 02/05/2023]
Affiliation(s)
- J R Tercero-Hidalgo
- Departamento de Medicina Preventiva y Salud Pública, Universidad de Granada, Granada, España.
- J M Fernández-Luna
- Departamento de Ciencias de la Computación e Inteligencia Artificial, Universidad de Granada, Granada, España
21
Cierco Jimenez R, Lee T, Rosillo N, Cordova R, Cree IA, Gonzalez A, Indave Ruiz BI. Machine learning computational tools to assist the performance of systematic reviews: A mapping review. BMC Med Res Methodol 2022; 22:322. [PMID: 36522637 PMCID: PMC9756658 DOI: 10.1186/s12874-022-01805-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/26/2022] [Indexed: 12/23/2022]
Abstract
BACKGROUND Within evidence-based practice (EBP), systematic reviews (SR) are considered the highest level of evidence in that they summarize the best available research and describe the progress in a given field. Because of their methodology, SR require significant time and resources to perform; they also involve repetitive steps that may introduce biases and human errors. Machine learning (ML) algorithms therefore present a promising alternative and a potential game changer to speed up and automate the SR process. This review aims to map the current availability of computational tools that use ML techniques to assist in the performance of SR, and to support authors in selecting the right software for evidence synthesis. METHODS The mapping review was based on comprehensive searches in electronic databases and software repositories to obtain relevant literature and records, followed by screening for eligibility based on titles, abstracts, and full text by two reviewers. Data extraction consisted of listing the name and basic characteristics of the included tools, for example a tool's applicability to the various SR stages, pricing options, open-source availability, and type of software. These tools were classified and graphically represented to facilitate the description of our findings. RESULTS A total of 9653 studies and 585 records were obtained from the structured searches performed on selected bibliometric databases and software repositories, respectively. After screening, a total of 119 descriptions from publications and records allowed us to identify 63 tools that assist the SR process using ML techniques. CONCLUSIONS This review provides a high-quality map of currently available ML software to assist the performance of SR. ML algorithms are arguably one of the best techniques at present for the automation of SR. 
The most promising tools were easily accessible and included a high number of user-friendly features permitting the automation of SR and other kinds of evidence synthesis reviews.
Affiliation(s)
- Ramon Cierco Jimenez
- International Agency for Research on Cancer (IARC/WHO), Evidence Synthesis and Classification Branch, Lyon, France.
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, Bellaterra, Spain.
- Teresa Lee
- International Agency for Research on Cancer (IARC/WHO), Services to Science and Research Branch, Lyon, France
- Nicolás Rosillo
- Servicio de Medicina Preventiva, Hospital Universitario 12 de Octubre, Madrid, Spain
- Reynalda Cordova
- International Agency for Research on Cancer (IARC/WHO), Nutrition and Metabolism Branch, Lyon, France
- Department of Nutritional Sciences, University of Vienna, Vienna, Austria
- Ian A Cree
- International Agency for Research on Cancer (IARC/WHO), Evidence Synthesis and Classification Branch, Lyon, France
- Angel Gonzalez
- Laboratori de Medicina Computacional, Unitat de Bioestadística, Facultat de Medicina, Universitat Autònoma de Barcelona, Bellaterra, Spain
- Blanca Iciar Indave Ruiz
- International Agency for Research on Cancer (IARC/WHO), Evidence Synthesis and Classification Branch, Lyon, France
22
Pradhan SK, Adnani H, Safadi R, Yerigeri K, Nayak S, Raina R, Sinha R. Cardiorenal syndrome in the pediatric population: A systematic review. Ann Pediatr Cardiol 2022; 15:493-510. [PMID: 37152514 PMCID: PMC10158476 DOI: 10.4103/apc.apc_50_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Revised: 06/26/2022] [Accepted: 08/17/2022] [Indexed: 03/03/2023]
Abstract
The concept of cardiorenal syndrome (CRS) is derived from the crosstalk between the heart and kidneys in pathological conditions. Despite the rising importance of CRS, there is a paucity of information on the understanding of its pathophysiology and management, increasing both morbidity and mortality for patients. This review summarizes the existing conceptual pathophysiology of different types of CRS and delves into the associated therapeutic modalities with a focus on pediatric cases. Prospective or retrospective observational studies, comparative studies, case reports, case-control, and cross-sectional studies that include pediatric patients with CRS were included in this review. Literature was searched using PubMed, EMBASE, and Google Scholar with keywords including "cardio-renal syndrome, type," "reno-cardio syndrome," "children," "acute kidney injury," and "acute decompensated heart failure" from January 2000 to January 2021. A total of 14 pediatric studies were ultimately included and analyzed, comprising a combined population of 3608 children of which 32% had CRS. Of the 14 studies, 57% were based on type 1 CRS, 14% on types 2 and 3 CRS, and 7% were on types 4 and 5 CRS. The majority of included studies were prospective cohort, although a wide spectrum was observed in terms of patient age, comorbidities, etiologies, and treatment strategies. Commonly observed comorbidities in CRS type 1 were hematologic, oncologic, cardiology-related side effects, muscular dystrophy, and pneumonia/bronchiolitis. CRS, particularly type 1, is prevalent in children and has a significant risk of mortality. The current treatment regimen primarily involves diuretics, extracorporeal fluid removal, and treatment of underlying etiologies and comorbidities.
Affiliation(s)
- Subal Kumar Pradhan
- Division of Pediatric Nephrology, Sardar Vallabhbhai Patel Post Graduate Institute of Pediatrics and SCB Medical College, Cuttack, Odisha, India
- Harsha Adnani
- Anne Arundel Medical Center, Luminis Health System, Annapolis, Maryland, USA
- Rama Safadi
- Akron Nephrology Associates/Cleveland Clinic Akron General Medical Center, Akron, Ohio, USA
- Keval Yerigeri
- Department of Nephrology, Akron, Ohio, USA, Children’s Hospital, Akron, Ohio, USA
- Snehamayee Nayak
- Department of Pediatrics, Sardar Vallabhbhai Patel Post Graduate Institute of Pediatrics and SCB Medical College, Cuttack, Odisha, India
- Rupesh Raina
- Akron Nephrology Associates/Cleveland Clinic Akron General Medical Center, Akron, Ohio, USA
- Department of Nephrology, Akron, Ohio, USA, Children’s Hospital, Akron, Ohio, USA
- Rajiv Sinha
- Division of Pediatric Nephrology, Institute of Child Health, Kolkata, West Bengal, India
- Department of Pediatrics, Apollo Gleneagles Hospital, Kolkata, West Bengal, India
23
Grisales-Aguirre AM, Figueroa-Vallejo CJ. [Topic modeling applied to the analysis of the role of machine learning in systematic reviews]. REVISTA DE INVESTIGACIÓN, DESARROLLO E INNOVACIÓN 2022. [DOI: 10.19053/20278306.v12.n2.2022.15271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Abstract
The objective of the research was to analyze the role of machine learning in systematic literature reviews. The natural language processing technique known as topic modeling was applied to a set of titles and abstracts collected from the Scopus database. Specifically, Latent Dirichlet Allocation (LDA) was used, which made it possible to discover and understand the underlying themes in the document collection. The results showed the usefulness of this technique for exploratory literature review, as it allowed the results to be grouped by theme. Likewise, it was possible to identify the specific areas and activities in which machine learning has been applied most in relation to literature reviews. It is concluded that LDA is an easy-to-use strategy whose results make it possible to address a large collection of documents in a systematic and coherent way, considerably reducing review time.
24
Tercero-Hidalgo JR, Khan KS, Bueno-Cavanillas A, Fernández-López R, Huete JF, Amezcua-Prieto C, Zamora J, Fernández-Luna JM. Artificial intelligence in COVID-19 evidence syntheses was underutilized, but impactful: a methodological study. J Clin Epidemiol 2022; 148:124-134. [PMID: 35513213 PMCID: PMC9059390 DOI: 10.1016/j.jclinepi.2022.04.027] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/29/2021] [Revised: 03/09/2022] [Accepted: 04/28/2022] [Indexed: 11/24/2022]
Abstract
OBJECTIVES A rapidly developing scenario like a pandemic requires the prompt production of high-quality systematic reviews, which can be automated using artificial intelligence (AI) techniques. We evaluated the application of AI tools in COVID-19 evidence syntheses. STUDY DESIGN After prospective registration of the review protocol, we automated the download of all open-access COVID-19 systematic reviews in the COVID-19 Living Overview of Evidence database, indexed them for AI-related keywords, and located those that used AI tools. We compared their journals' JCR Impact Factor, citations per month, screening workloads, completion times (from pre-registration to preprint or submission to a journal) and AMSTAR-2 methodology assessments (maximum score 13 points) with a set of publication-date-matched control reviews without AI. RESULTS Of the 3,999 COVID-19 reviews, 28 (0.7%, 95% CI 0.47-1.03%) made use of AI. On average, compared to controls (n = 64), AI reviews were published in journals with higher Impact Factors (median 8.9 vs. 3.5, P < 0.001), and screened more abstracts per author (302.2 vs. 140.3, P = 0.009) and per included study (189.0 vs. 365.8, P < 0.001) while inspecting fewer full texts per author (5.3 vs. 14.0, P = 0.005). No differences were found in citation counts (0.5 vs. 0.6, P = 0.600), inspected full texts per included study (3.8 vs. 3.4, P = 0.481), completion times (74.0 vs. 123.0, P = 0.205) or AMSTAR-2 (7.5 vs. 6.3, P = 0.119). CONCLUSION AI was an underutilized tool in COVID-19 systematic reviews. Its usage, compared to reviews without AI, was associated with more efficient screening of literature and higher publication impact. There is scope for the application of AI in automating systematic reviews.
Affiliation(s)
- Juan R Tercero-Hidalgo
- Department of Preventive Medicine and Public Health, University of Granada, Granada, Spain; CIBER Epidemiology and Public Health (CIBERESP), Madrid, Spain; Instituto Biosanitario Granada (IBS-Granada), Granada, Spain.
- Khalid S Khan
- Department of Preventive Medicine and Public Health, University of Granada, Granada, Spain; CIBER Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Aurora Bueno-Cavanillas
- Department of Preventive Medicine and Public Health, University of Granada, Granada, Spain; CIBER Epidemiology and Public Health (CIBERESP), Madrid, Spain; Instituto Biosanitario Granada (IBS-Granada), Granada, Spain
- Juan F Huete
- Department of Computer Science and Artificial Intelligence, School of Technology and Telecommunications Engineering, University of Granada, Granada, Spain
- Carmen Amezcua-Prieto
- Department of Preventive Medicine and Public Health, University of Granada, Granada, Spain; CIBER Epidemiology and Public Health (CIBERESP), Madrid, Spain; Instituto Biosanitario Granada (IBS-Granada), Granada, Spain
- Javier Zamora
- CIBER Epidemiology and Public Health (CIBERESP), Madrid, Spain; Clinical Biostatistics Unit, Hospital Ramon y Cajal (IRYCIS), Madrid, Spain; Institute for Metabolism and Systems Research, University of Birmingham, Birmingham, United Kingdom
- Juan M Fernández-Luna
- Department of Computer Science and Artificial Intelligence, School of Technology and Telecommunications Engineering, University of Granada, Granada, Spain
25
Grbin L, Nichols P, Russell F, Fuller-Tyszkiewicz M, Olsson CA. The Development of a Living Knowledge System and Implications for Future Systematic Searching. JOURNAL OF THE AUSTRALIAN LIBRARY AND INFORMATION ASSOCIATION 2022. [DOI: 10.1080/24750158.2022.2087954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Lisa Grbin
- Faculty of Health Library Services, Deakin University, Geelong, Australia
- Peter Nichols
- Library Research Services, Deakin University, Geelong, Australia
- Fiona Russell
- Faculty of Health Library Services, Deakin University, Geelong, Australia
- Matthew Fuller-Tyszkiewicz
- School of Psychology, Deakin University, Geelong, Australia
- Centre for Social and Early Emotional Development, Deakin University, Geelong, Australia
- Craig A. Olsson
- School of Psychology, Deakin University, Geelong, Australia
- Centre for Social and Early Emotional Development, Deakin University, Geelong, Australia
26
A text-mining tool generated title-abstract screening workload savings: performance evaluation versus single-human screening. J Clin Epidemiol 2022; 149:53-59. [DOI: 10.1016/j.jclinepi.2022.05.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2021] [Revised: 04/13/2022] [Accepted: 05/24/2022] [Indexed: 11/17/2022]
27
Blaizot A, Veettil SK, Saidoung P, Moreno-Garcia CF, Wiratunga N, Aceves-Martins M, Lai NM, Chaiyakunapruk N. Using artificial intelligence methods for systematic review in health sciences: A systematic review. Res Synth Methods 2022; 13:353-362. [PMID: 35174972 DOI: 10.1002/jrsm.1553] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 09/04/2021] [Revised: 01/31/2022] [Accepted: 02/07/2022] [Indexed: 11/07/2022]
Abstract
The exponential increase in published articles makes a thorough and expedient review of literature increasingly challenging. This review delineated automated tools and platforms that employ artificial intelligence (AI) approaches and evaluated the reported benefits and challenges in using such methods. A search was conducted in 4 databases (Medline, Embase, CDSR, and Epistemonikos) up to April 2021 for systematic reviews and other related reviews implementing AI methods. To be included, a review had to use some form of AI method, including machine learning, deep learning, neural networks, or any other application used to enable the full or semi-autonomous performance of one or more stages in the development of evidence synthesis. Twelve reviews were included, using nine different tools to implement 15 different AI methods. Eleven methods were used in the screening stages of the review (73%). The rest were divided: two in data extraction (13%) and two in risk of bias assessment (13%). The ambiguous benefits of data extraction, combined with the advantages reported by 10 reviews, indicate that AI platforms have taken hold with varying success in evidence synthesis. However, the results are qualified by their reliance on self-reporting by the review authors. Extensive human validation still appears to be required at this stage in implementing AI methods, though further evaluation is needed to define the overall contribution of such platforms to enhancing efficiency and quality in evidence synthesis.
Affiliation(s)
- Aymeric Blaizot
- Department of Pharmacotherapy, College of Pharmacy, University of Utah, Utah, USA
- Sajesh K Veettil
- Department of Pharmacotherapy, College of Pharmacy, University of Utah, Utah, USA
- Nai Ming Lai
- School of Medicine, Faculty of Health and Medical Sciences, Taylor's University, Selangor, Malaysia
- School of Pharmacy, Monash University Malaysia, Selangor, Malaysia
- Nathorn Chaiyakunapruk
- Department of Pharmacotherapy, College of Pharmacy, University of Utah, Utah, USA
- IDEAS Center, Veterans Affairs Salt Lake City Healthcare System, Salt Lake City, Utah, USA
28
Tuttle LJ, Donahue MJ. Effects of sediment exposure on corals: a systematic review of experimental studies. ENVIRONMENTAL EVIDENCE 2022; 11:4. [PMID: 39294657 PMCID: PMC8818373 DOI: 10.1186/s13750-022-00256-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/16/2021] [Accepted: 01/10/2022] [Indexed: 06/01/2023]
Abstract
BACKGROUND Management actions that address local-scale stressors on coral reefs can rapidly improve water quality and reef ecosystem condition. In response to reef managers who need actionable thresholds for coastal runoff and dredging, we conducted a systematic review and meta-analysis of experimental studies that explore the effects of sediment on corals. We identified exposure levels that 'adversely' affect corals while accounting for sediment bearing (deposited vs. suspended), coral life-history stage, and species, thus providing empirically based estimates of stressor thresholds on vulnerable coral reefs. METHODS We searched online databases and grey literature to obtain a list of potential studies, assess their eligibility, and critically appraise them for validity and risk of bias. Data were extracted from eligible studies and grouped by sediment bearing and coral response to identify thresholds in terms of the lowest exposure levels that induced an adverse physiological and/or lethal effect. Meta-regression estimated the dose-response relationship between exposure level and the magnitude of a coral's response, with random-effects structures to estimate the proportion of variance explained by factors such as study and coral species. REVIEW FINDINGS After critical appraisal of over 15,000 records, our systematic review of corals' responses to sediment identified 86 studies to be included in meta-analyses (45 studies for deposited sediment and 42 studies for suspended sediment). The lowest sediment exposure levels that caused adverse effects in corals were well below the levels previously described as 'normal' on reefs: for deposited sediment, adverse effects occurred as low as 1 mg/cm²/day for larvae (limited settlement rates) and 4.9 mg/cm²/day for adults (tissue mortality); for suspended sediment, adverse effects occurred as low as 10 mg/L for juveniles (reduced growth rates) and 3.2 mg/L for adults (bleaching and tissue mortality). 
Corals take at least 10 times longer to experience tissue mortality from exposure to suspended sediment than to comparable concentrations of deposited sediment, though physiological changes manifest 10 times faster in response to suspended sediment than to deposited sediment. Threshold estimates derived from continuous response variables (magnitude of adverse effect) largely matched the lowest-observed adverse-effect levels from a summary of studies, or otherwise helped us to identify research gaps that should be addressed to better quantify the dose-response relationship between sediment exposure and coral health. CONCLUSIONS We compiled a global dataset that spans three oceans, over 140 coral species, decades of research, and a range of field- and lab-based approaches. Our review and meta-analysis inform the no-observed and lowest-observed adverse-effect levels (NOAEL, LOAEL) that are used in management consultations by U.S. federal agencies. In the absence of more location- or species-specific data to inform decisions, our results provide the best available information to protect vulnerable reef-building corals from sediment stress. Based on gaps and limitations identified by our review, we make recommendations to improve future studies and recommend future synthesis to disentangle the potentially synergistic effects of multiple coral-reef stressors.
Affiliation(s)
- Lillian J. Tuttle
- Hawai‘i Institute of Marine Biology, University of Hawai‘i at Mānoa, Kāne‘ohe, HI 96744 USA
- NOAA NMFS Pacific Islands Regional Office, Honolulu, HI 96860 USA
- Megan J. Donahue
- Hawai‘i Institute of Marine Biology, University of Hawai‘i at Mānoa, Kāne‘ohe, HI 96744 USA
29
Artificial Intelligence in Evidence-Based Medicine. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_43] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
30
Monteiro S, Nejad YS, Aucoin M. Perinatal diet and offspring anxiety: A scoping review. Transl Neurosci 2022; 13:275-290. [PMID: 36128579 PMCID: PMC9449687 DOI: 10.1515/tnsci-2022-0242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Revised: 08/04/2022] [Accepted: 08/12/2022] [Indexed: 11/15/2022]
Abstract
Health behaviors during pregnancy have an impact on the developing offspring. Dietary factors play a role in the development of mental illness; however, less is known about the impact of dietary factors during preconception, gestation, and lactation on anxiety levels in offspring. This scoping review sought to systematically map the available research involving human and animal subjects to identify nutritional interventions that may have a harmful or protective effect, as well as to identify gaps. Studies investigating an association between any perinatal diet pattern or diet constituent and offspring anxiety were included. The number of studies reporting an association with increased or decreased levels of anxiety were counted and presented in figures. A total of 55,914 results were identified as part of a larger scoping review, and 120 articles met the criteria for inclusion. A greater intake of phytochemicals and vitamins was associated with decreased offspring anxiety, whereas maternal caloric restriction, protein restriction, reduced omega-3 consumption, and exposure to a high-fat diet were associated with higher levels of offspring anxiety. Results were limited by a very large proportion of animal studies. High-quality intervention studies involving human subjects are warranted to elucidate the precise dietary factors or constituents that modulate the risk of anxiety in offspring.
Affiliation(s)
- Sasha Monteiro
- Department of Research and Clinical Epidemiology, Canadian College of Naturopathic Medicine, 1255 Sheppard Ave E, Toronto, ON, M2K 1E2, Canada
- Yousef Sadat Nejad
- Department of Research and Clinical Epidemiology, Canadian College of Naturopathic Medicine, 1255 Sheppard Ave E, Toronto, ON, M2K 1E2, Canada
- Monique Aucoin
- Department of Research and Clinical Epidemiology, Canadian College of Naturopathic Medicine, 1255 Sheppard Ave E, Toronto, ON, M2K 1E2, Canada
31
Hamel C, Hersi M, Kelly SE, Tricco AC, Straus S, Wells G, Pham B, Hutton B. Guidance for using artificial intelligence for title and abstract screening while conducting knowledge syntheses. BMC Med Res Methodol 2021; 21:285. [PMID: 34930132 PMCID: PMC8686081 DOI: 10.1186/s12874-021-01451-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 03/07/2021] [Accepted: 10/26/2021] [Indexed: 02/01/2023]
Abstract
BACKGROUND Systematic reviews are the cornerstone of evidence-based medicine. However, systematic reviews are time consuming, and there is growing demand to produce evidence more quickly while maintaining robust methods. In recent years, artificial intelligence and active machine learning (AML) have been implemented in several systematic review (SR) software applications. Because some of the barriers to adopting new technologies are the challenges of set-up and of knowing how best to use them, we describe different situations and considerations for knowledge synthesis teams to weigh when using artificial intelligence and AML for title and abstract screening. METHODS We retrospectively evaluated the implementation and performance of AML across a set of ten historically completed systematic reviews. Based upon the findings from this work, and in consideration of the barriers we have encountered and navigated during the past 24 months in using these tools prospectively in our research, we discussed and developed a series of practical recommendations for research teams seeking to implement AML tools for citation screening in their workflow. RESULTS We developed a seven-step framework and provide guidance for when and how to integrate artificial intelligence and AML into the title and abstract screening process. Steps include: (1) Consulting with Knowledge user/Expert Panel; (2) Developing the search strategy; (3) Preparing your review team; (4) Preparing your database; (5) Building the initial training set; (6) Ongoing screening; and (7) Truncating screening. During Step 6 and/or 7, you may also choose to optimize your team by shifting some members to other review stages (e.g., full-text screening, data extraction). CONCLUSION Artificial intelligence and, more specifically, AML are well-developed tools for title and abstract screening and can be integrated into the screening process in several ways. 
Regardless of the method chosen, transparent reporting of these methods is critical for future studies evaluating artificial intelligence and AML.
Affiliation(s)
- Candyce Hamel
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario Canada
- Mona Hersi
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario Canada
- Shannon E. Kelly
- Cardiovascular Research Methods Centre, University of Ottawa Heart Institute, Ottawa, Ontario Canada
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario Canada
- Andrea C. Tricco
- Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Toronto, ON Canada
- Epidemiology Division and Institute for Health, Management, and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario Canada
- Sharon Straus
- Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Toronto, ON Canada
- Department of Medicine, University of Toronto, Toronto, ON Canada
- George Wells
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario Canada
- Cardiovascular Research Methods Centre, University of Ottawa Heart Institute, Ottawa, Ontario Canada
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario Canada
- Ba’ Pham
- Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Toronto, ON Canada
- Brian Hutton
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario Canada
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario Canada
32
Surian D, Bourgeois FT, Dunn AG. The automation of relevant trial registration screening for systematic review updates: an evaluation study on a large dataset of ClinicalTrials.gov registrations. BMC Med Res Methodol 2021; 21:281. [PMID: 34922458 PMCID: PMC8684229 DOI: 10.1186/s12874-021-01485-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2021] [Accepted: 11/22/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Clinical trial registries can be used as sources of clinical evidence for systematic review synthesis and updating. Our aim was to evaluate methods for identifying clinical trial registrations that should be screened for inclusion in updates of published systematic reviews. METHODS A set of 4644 clinical trial registrations (ClinicalTrials.gov) included in 1089 systematic reviews (PubMed) were used to evaluate two methods (document similarity and hierarchical clustering) and three representations (L2-normalised TF-IDF, Latent Dirichlet Allocation, and Doc2Vec) for ranking 163,501 completed clinical trials by relevance. Clinical trial registrations were ranked for each systematic review using seeding clinical trials, simulating how new relevant clinical trials could be automatically identified for an update. Performance was measured by the number of clinical trials that need to be screened to identify all relevant clinical trials. RESULTS Using the document similarity method with TF-IDF feature representation and Euclidean distance metric, all relevant clinical trials for half of the systematic reviews were identified after screening 99 trials (IQR 19 to 491). The best-performing hierarchical clustering approach, Ward agglomerative clustering (with TF-IDF representation and Euclidean distance), required screening 501 clinical trials (IQR 43 to 4363) to achieve the same result. CONCLUSION An evaluation using a large set of mined links between published systematic reviews and clinical trial registrations showed that document similarity outperformed hierarchical clustering for identifying relevant clinical trials to include in systematic review updates.
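The ranking idea in this abstract (score every candidate registration by its TF-IDF distance to seed trials already included in a review) can be sketched in standard-library Python. This is a minimal illustration, not the authors' code: the tokenizer, the smoothed IDF formula (the same form scikit-learn uses by default), and scoring each candidate by its distance to the nearest seed are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Illustrative tokenizer: lowercase, split on whitespace, strip simple punctuation."""
    return [t.strip(".,;:()").lower() for t in text.split() if t.strip(".,;:()")]


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumption; many IDF variants exist).
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def euclidean(u, v):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def rank_candidates(seed_texts, candidate_texts):
    """Return candidate indices ordered by distance to their nearest seed (closest first)."""
    vectors = tfidf_vectors(seed_texts + candidate_texts)
    seeds = vectors[: len(seed_texts)]
    candidates = vectors[len(seed_texts):]
    scored = [(min(euclidean(c, s) for s in seeds), i) for i, c in enumerate(candidates)]
    return [i for _, i in sorted(scored)]


order = rank_candidates(
    ["insulin glargine trial in type 1 diabetes"],
    ["exercise program for knee osteoarthritis", "insulin degludec trial in type 1 diabetes"],
)
print(order)  # [1, 0]: the insulin trial ranks ahead of the unrelated one
```

Because the vectors are L2-normalised, squared Euclidean distance equals 2 − 2 × cosine similarity, so this ranking is identical to ranking by cosine similarity.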
Affiliation(s)
- Didi Surian
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, NSW, Australia
- Florence T Bourgeois
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Adam G Dunn
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
  - The University of Sydney, Discipline of Biomedical Informatics and Digital Health, School of Medical Sciences, Faculty of Medicine and Health, Sydney, NSW 2006, Australia
33
Aucoin M, LaChance L, Naidoo U, Remy D, Shekdar T, Sayar N, Cardozo V, Rawana T, Chan I, Cooley K. Diet and Anxiety: A Scoping Review. Nutrients 2021; 13:4418. [PMID: 34959972 PMCID: PMC8706568 DOI: 10.3390/nu13124418] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 09/22/2021] [Revised: 11/23/2021] [Accepted: 12/04/2021] [Indexed: 12/22/2022]
Abstract
Anxiety disorders are the most common group of mental disorders. There is mounting evidence demonstrating the importance of nutrition in the development and progression of mental disorders such as depression; however, less is known about the role of nutrition in anxiety disorders. This scoping review sought to systematically map the existing literature on anxiety disorders and nutrition in order to identify associations between dietary factors and anxiety symptoms or disorder prevalence as well as identify gaps and opportunities for further research. The review followed established methodological approaches for scoping reviews. Due to the large volume of results, an online program (Abstrackr) with artificial intelligence features was used. Studies reporting an association between a dietary constituent and anxiety symptoms or disorders were counted and presented in figures. A total of 55,914 unique results were identified. After a full-text review, 1541 articles met criteria for inclusion. Analysis revealed an association between less anxiety and more fruits and vegetables, omega-3 fatty acids, “healthy” dietary patterns, caloric restriction, breakfast consumption, ketogenic diet, broad-spectrum micronutrient supplementation, zinc, magnesium and selenium, probiotics, and a range of phytochemicals. Analysis revealed an association between higher levels of anxiety and high-fat diet, inadequate tryptophan and dietary protein, high intake of sugar and refined carbohydrates, and “unhealthy” dietary patterns. Results are limited by a large percentage of animal and observational studies. Only 10% of intervention studies involved participants with anxiety disorders, limiting the applicability of the findings. High quality intervention studies involving participants with anxiety disorders are warranted.
Affiliation(s)
- Monique Aucoin
  - Canadian College of Naturopathic Medicine, Toronto, ON M2K 1E2, Canada
- Laura LaChance
  - Department of Psychiatry, McGill University, Montreal, QC H3A 0G4, Canada
  - St. Mary's Hospital Centre, Montreal, QC H3T 1M5, Canada
- Umadevi Naidoo
  - Massachusetts General Hospital, Boston, MA 02114, USA
  - Department of Psychiatry, Harvard Medical School, Boston, MA 02115, USA
- Daniella Remy
  - Canadian College of Naturopathic Medicine, Toronto, ON M2K 1E2, Canada
  - Anthrophi Technologies, Toronto, ON M6H 1W2, Canada
- Tanisha Shekdar
  - Canadian College of Naturopathic Medicine, Toronto, ON M2K 1E2, Canada
- Negin Sayar
  - Canadian College of Naturopathic Medicine, Toronto, ON M2K 1E2, Canada
- Valentina Cardozo
  - Canadian College of Naturopathic Medicine, Toronto, ON M2K 1E2, Canada
- Tara Rawana
  - Canadian College of Naturopathic Medicine, Toronto, ON M2K 1E2, Canada
- Irina Chan
  - Canadian College of Naturopathic Medicine, Toronto, ON M2K 1E2, Canada
- Kieran Cooley
  - Canadian College of Naturopathic Medicine, Toronto, ON M2K 1E2, Canada
  - School of Public Health, Australian Research Centre in Complementary and Integrative Medicine (ARCCIM), University of Technology Sydney, Ultimo 2007, Australia
  - Pacific College of Health Sciences, San Diego, CA 92108, USA
  - National Centre for Naturopathic Medicine, Southern Cross University, Lismore 2480, Australia
34
Khalil H, Ameen D, Zarnegar A. Tools to support the automation of systematic reviews: A scoping review. J Clin Epidemiol 2021; 144:22-42. [PMID: 34896236 DOI: 10.1016/j.jclinepi.2021.12.005] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 08/31/2021] [Revised: 11/09/2021] [Accepted: 12/02/2021] [Indexed: 11/19/2022]
Abstract
OBJECTIVE The objectives of this scoping review are to identify the reliability and validity of the available tools, their limitations and any recommendations to further improve the use of these tools. STUDY DESIGN A scoping review methodology was followed to map the literature published on the challenges and solutions of conducting evidence synthesis using the JBI scoping review methodology. RESULTS A total of 47 publications were included in the review. The current scoping review identified that LitSuggest, Rayyan, Abstrackr, BIBOT, R software, RobotAnalyst, DistillerSR, ExaCT and NetMetaXL have potential to be used for the automation of systematic reviews. However, they are not without limitations. The review also identified other studies that employed algorithms that have not yet been developed into user-friendly tools. Some of these algorithms showed high validity and reliability, but their use is conditional on user knowledge of computer science and algorithms. CONCLUSION Abstract screening has reached maturity; data extraction is still an active area. Developing methods to semi-automate different steps of evidence synthesis via machine learning remains an important research direction. It is also important to move from the research prototypes currently available to professionally maintained platforms.
Affiliation(s)
- Hanan Khalil
  - School of Psychology and Public Health, Department of Public Health, La Trobe University, Melbourne Campus, Victoria, Australia
- Daniel Ameen
  - Faculty of Medicine, Nursing and Health Sciences, Monash University, Wellington Road, Clayton VIC 3168, Australia
- Armita Zarnegar
  - School of Psychology and Public Health, Department of Public Health, La Trobe University, Melbourne Campus, Victoria, Australia
  - School of Science, Computing and Engineering Technologies, Swinburne University of Technology, Melbourne, Australia
35
Abdelkader W, Navarro T, Parrish R, Cotoi C, Germini F, Iorio A, Haynes RB, Lokker C. Machine Learning Approaches to Retrieve High-Quality, Clinically Relevant Evidence From the Biomedical Literature: Systematic Review. JMIR Med Inform 2021; 9:e30401. [PMID: 34499041 PMCID: PMC8461527 DOI: 10.2196/30401] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2021] [Revised: 07/15/2021] [Accepted: 07/25/2021] [Indexed: 11/20/2022]
Abstract
BACKGROUND The rapid growth of the biomedical literature makes identifying strong evidence a time-consuming task. Applying machine learning to the process could be a viable solution that limits effort while maintaining accuracy. OBJECTIVE The goal of the research was to summarize the nature and comparative performance of machine learning approaches that have been applied to retrieve high-quality evidence for clinical consideration from the biomedical literature. METHODS We conducted a systematic review of studies that applied machine learning techniques to identify high-quality clinical articles in the biomedical literature. Multiple databases were searched to July 2020. Extracted data focused on the applied machine learning model, steps in the development of the models, and model performance. RESULTS From 3918 retrieved studies, 10 met our inclusion criteria. All followed a supervised machine learning approach and applied, from a limited range of options, a high-quality standard for the training of their model. The results show that machine learning can achieve a sensitivity of 95% while maintaining a high precision of 86%. CONCLUSIONS Machine learning approaches perform well in retrieving high-quality clinical studies. Performance may improve by applying more sophisticated approaches such as active learning and unsupervised machine learning approaches.
Affiliation(s)
- Wael Abdelkader
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Tamara Navarro
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Rick Parrish
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Chris Cotoi
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
- Federico Germini
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
  - Department of Medicine, McMaster University, Hamilton, ON, Canada
- Alfonso Iorio
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
  - Department of Medicine, McMaster University, Hamilton, ON, Canada
- R Brian Haynes
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
  - Department of Medicine, McMaster University, Hamilton, ON, Canada
- Cynthia Lokker
  - Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
36
O'Hearn K, MacDonald C, Tsampalieros A, Kadota L, Sandarage R, Jayawarden SK, Datko M, Reynolds JM, Bui T, Sultan S, Sampson M, Pratt M, Barrowman N, Nama N, Page M, McNally JD. Evaluating the relationship between citation set size, team size and screening methods used in systematic reviews: a cross-sectional study. BMC Med Res Methodol 2021; 21:142. [PMID: 34238247 PMCID: PMC8264476 DOI: 10.1186/s12874-021-01335-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2021] [Accepted: 06/19/2021] [Indexed: 11/26/2022]
Abstract
Background Standard practice for conducting systematic reviews (SRs) is time consuming and involves the study team screening hundreds or thousands of citations. As the volume of medical literature grows, the citation set sizes and corresponding screening efforts increase. While larger team size and alternate screening methods have the potential to reduce workload and decrease SR completion times, it is unknown whether investigators adapt team size or methods in response to citation set sizes. Using a cross-sectional design, we sought to understand how citation set size impacts (1) the total number of authors or individuals contributing to screening and (2) screening methods. Methods MEDLINE was searched in April 2019 for SRs on any health topic. A total of 1880 unique publications were identified and sorted into five citation set size categories (after deduplication): < 1,000, 1,001–2,500, 2,501–5,000, 5,001–10,000, and > 10,000. A random sample of 259 SRs was selected (~ 50 per category) for data extraction and analysis. Results With the exception of the pairwise t test comparing the under 1000 and over 10,000 categories (median 5 vs. 6, p = 0.049), no statistically significant relationship was evident between author number and citation set size. While visual inspection was suggestive, statistical testing did not consistently identify a relationship between citation set size and number of screeners (title-abstract, full text) or data extractors. However, logistic regression showed that investigators were significantly more likely to deviate from gold-standard screening methods (i.e. independent duplicate screening) with larger citation sets. For every doubling of citation set size, the odds of using gold-standard screening decreased by 15 and 20% at title-abstract and full text review, respectively. Finally, few SRs reported using crowdsourcing (n = 2) or computer-assisted screening (n = 1).
Conclusions Large citation set sizes present a challenge to SR teams, especially when faced with time-sensitive health policy questions. Our study suggests that with increasing citation set size, authors are less likely to adhere to gold-standard screening methods. It is possible that adjunct screening methods, such as crowdsourcing (large team) and computer-assisted technologies, may provide a viable solution for authors to complete their SRs in a timely manner. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01335-5.
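The per-doubling result can be made concrete with a short worked calculation. The abstract does not give the exact model specification; the form below (a logistic regression on the base-2 logarithm of citation set size N) is an assumption that matches the "per doubling" wording, with odds ratios of 0.85 (title-abstract) and 0.80 (full text) implied by the reported 15% and 20% decreases.

```latex
\operatorname{logit} \Pr(\text{gold-standard screening}) = \beta_0 + \beta_1 \log_2 N,
\qquad e^{\beta_1} = 0.85 \ (\text{title-abstract}), \quad e^{\beta_1} = 0.80 \ (\text{full text})

\frac{\operatorname{odds}(N = 8000)}{\operatorname{odds}(N = 1000)}
  = e^{\beta_1 (\log_2 8000 - \log_2 1000)} = e^{3\beta_1}
  = \begin{cases}
      0.85^3 \approx 0.61 & \text{title-abstract} \\
      0.80^3 \approx 0.51 & \text{full text}
    \end{cases}
```

Under this reading, an eightfold larger citation set (three doublings) is associated with roughly 39% and 49% lower odds of gold-standard screening at the two stages; these are changes in odds, not in probabilities.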
Affiliation(s)
- Cameron MacDonald
  - School of Engineering and Applied Sciences, McMaster University, Hamilton, ON, Canada
- Leo Kadota
  - Department of Pediatrics, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Ryan Sandarage
  - Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Michele Datko
  - ECRI Information Center, ECRI, Plymouth Meeting, PA, USA
- John M Reynolds
  - Calder Memorial Library, University of Miami Miller School of Medicine, MLIS, Miami, FL, USA
- Thanh Bui
  - Faculty of Arts & Science, University of Toronto, Toronto, ON, Canada
- Shagufta Sultan
  - Therapeutic Products Directorate, Health Canada, Ottawa, ON, Canada
- Nassr Nama
  - Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Matthew Page
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- James Dayre McNally
  - CHEO Research Institute, Ottawa, ON, Canada
  - Department of Pediatrics, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
  - Department of Pediatrics, CHEO, 401 Smyth Road, Ottawa, ON K1H 8L1, Canada
37
Pham B, Jovanovic J, Bagheri E, Antony J, Ashoor H, Nguyen TT, Rios P, Robson R, Thomas SM, Watt J, Straus SE, Tricco AC. Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow. Syst Rev 2021; 10:156. [PMID: 34039433 PMCID: PMC8152711 DOI: 10.1186/s13643-021-01700-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 09/29/2020] [Accepted: 05/12/2021] [Indexed: 12/17/2022]
Abstract
BACKGROUND Current text mining tools supporting abstract screening in systematic reviews are not widely used, in part because they lack sensitivity and precision. We set out to develop an accessible, semi-automated "workflow" to conduct abstract screening for systematic reviews and other knowledge synthesis methods. METHODS We adopt widely recommended text-mining and machine-learning methods to (1) process title-abstracts into numerical training data; and (2) train a classification model to predict eligible abstracts. The predicted abstracts are screened by human reviewers for ("true") eligibility, and the newly eligible abstracts are used to identify similar abstracts, using near-neighbor methods, which are also screened. These abstracts, as well as their eligibility results, are used to update the classification model, and the above steps are iterated until no new eligible abstracts are identified. The workflow was implemented in R and evaluated using a systematic review of insulin formulations for type-1 diabetes (14,314 abstracts) and a scoping review of knowledge-synthesis methods (17,200 abstracts). Workflow performance was evaluated against the recommended practice of screening abstracts by 2 reviewers, independently. Standard measures were examined: sensitivity (proportion of truly eligible abstracts that were included), specificity (proportion of truly ineligible abstracts that were excluded), precision (proportion of abstracts screened as eligible that were truly eligible), F1-score (harmonic mean of sensitivity and precision), and accuracy (proportion of abstracts correctly predicted as eligible or ineligible). Workload reduction was measured as the hours the workflow saved, given that only a subset of abstracts needed human screening.
RESULTS With respect to the systematic and scoping reviews respectively, the workflow attained 88%/89% sensitivity, 99%/99% specificity, 71%/72% precision, an F1-score of 79%/79%, 98%/97% accuracy, 63%/55% workload reduction, with 12%/11% fewer abstracts for full-text retrieval and screening, and 0%/1.5% missed studies in the completed reviews. CONCLUSION The workflow was a sensitive, precise, and efficient alternative to the recommended practice of screening abstracts with 2 reviewers. All eligible studies were identified in the first case, while 6 studies (1.5%) were missed in the second that would likely not impact the review's conclusions. We have described the workflow in language accessible to reviewers with limited exposure to natural language processing and machine learning, and have made the code available to reviewers.
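The evaluation measures listed in this abstract all derive from the four cells of a confusion matrix (true/false positives and negatives). The helper below is an illustrative sketch, not the authors' R workflow; the function name and the counts in the usage line are hypothetical.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard screening metrics from confusion-matrix counts.

    tp: eligible abstracts predicted eligible     fp: ineligible predicted eligible
    tn: ineligible predicted ineligible           fn: eligible predicted ineligible
    """
    sensitivity = tp / (tp + fn)  # share of truly eligible abstracts that were included
    specificity = tn / (tn + fp)  # share of truly ineligible abstracts that were excluded
    precision = tp / (tp + fp)    # share of included abstracts that are truly eligible
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts, not taken from the study.
print(screening_metrics(tp=8, fp=2, tn=88, fn=2))
```

The F1-score penalises an imbalance between sensitivity and precision more strongly than an arithmetic mean would, which is why it is usually reported alongside both.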
Affiliation(s)
- Ba’ Pham
  - Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8, Canada
- Jelena Jovanovic
  - Department of Software Engineering, University of Belgrade, Jove Ilica 154, Belgrade 11000, Serbia
- Ebrahim Bagheri
  - Department of Electrical and Computer Engineering, Ryerson University, 350 Victoria Street, Toronto, Ontario M5B 2K3, Canada
- Jesmin Antony
  - Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8, Canada
- Huda Ashoor
  - Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8, Canada
- Tam T. Nguyen
  - Department of Electrical and Computer Engineering, Ryerson University, 350 Victoria Street, Toronto, Ontario M5B 2K3, Canada
- Patricia Rios
  - Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8, Canada
- Reid Robson
  - Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8, Canada
- Sonia M. Thomas
  - Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8, Canada
- Jennifer Watt
  - Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8, Canada
- Sharon E. Straus
  - Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8, Canada
- Andrea C. Tricco
  - Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario M5B 1T8, Canada
  - Epidemiology Division and Institute for Health Policy, Management, and Evaluation, Dalla Lana School of Public Health, University of Toronto, 155 College St Room 500, Toronto, Ontario M5T 3M7, Canada
  - Queen’s Collaboration for Health Care Quality Joanna Briggs Institute Centre of Excellence, School of Nursing, Queen’s University, 99 University Ave, Kingston, Ontario K7L 3N6, Canada
38
Kazda L, Bell K, Thomas R, McGeechan K, Sims R, Barratt A. Overdiagnosis of Attention-Deficit/Hyperactivity Disorder in Children and Adolescents: A Systematic Scoping Review. JAMA Netw Open 2021; 4:e215335. [PMID: 33843998 PMCID: PMC8042533 DOI: 10.1001/jamanetworkopen.2021.5335] [Citation(s) in RCA: 68] [Impact Index Per Article: 22.7] [Indexed: 12/25/2022]
Abstract
IMPORTANCE Reported increases in attention-deficit/hyperactivity disorder (ADHD) diagnoses are accompanied by growing debate about the underlying factors. Although overdiagnosis is often suggested, no comprehensive evaluation of evidence for or against overdiagnosis has ever been undertaken and is urgently needed to enable evidence-based, patient-centered diagnosis and treatment of ADHD in contemporary health services. OBJECTIVE To systematically identify, appraise, and synthesize the evidence on overdiagnosis of ADHD in children and adolescents using a published 5-question framework for detecting overdiagnosis in noncancer conditions. EVIDENCE REVIEW This systematic scoping review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Extension for Scoping Reviews and Joanna Briggs Institute methodology, including the PRISMA-ScR Checklist. MEDLINE, Embase, PsycINFO, and the Cochrane Library databases were searched for studies published in English between January 1, 1979, and August 21, 2020. Studies of children and adolescents (aged ≤18 years) with ADHD that focused on overdiagnosis plus studies that could be mapped to 1 or more framework questions were included. Two researchers independently reviewed all abstracts and full-text articles, and all included studies were assessed for quality. FINDINGS Of the 12 267 potentially relevant studies retrieved, 334 (2.7%) were included. Of the 334 studies, 61 (18.3%) were secondary and 273 (81.7%) were primary research articles. Substantial evidence of a reservoir of ADHD was found in 104 studies, providing a potential for diagnoses to increase (question 1). Evidence that actual ADHD diagnosis had increased was found in 45 studies (question 2). Twenty-five studies showed that these additional cases may be on the milder end of the ADHD spectrum (question 3), and 83 studies showed that pharmacological treatment of ADHD was increasing (question 4).
A total of 151 studies reported on outcomes of diagnosis and pharmacological treatment (question 5). However, only 5 studies evaluated the critical issue of benefits and harms among the additional, milder cases. These studies supported a hypothesis of diminishing returns in which the harms may outweigh the benefits for youths with milder symptoms. CONCLUSIONS AND RELEVANCE This review found evidence of ADHD overdiagnosis and overtreatment in children and adolescents. Evidence gaps remain and future research is needed, in particular research on the long-term benefits and harms of diagnosing and treating ADHD in youths with milder symptoms; therefore, practitioners should be mindful of these knowledge gaps, especially when identifying these individuals, to ensure safe and equitable practice and policy.
Affiliation(s)
- Luise Kazda
  - Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
- Katy Bell
  - Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
- Rae Thomas
  - Institute for Evidence-Based Healthcare, Bond University, Gold Coast, Queensland, Australia
- Kevin McGeechan
  - Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
- Rebecca Sims
  - Institute for Evidence-Based Healthcare, Bond University, Gold Coast, Queensland, Australia
- Alexandra Barratt
  - Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
39
Chai KEK, Lines RLJ, Gucciardi DF, Ng L. Research Screener: a machine learning tool to semi-automate abstract screening for systematic reviews. Syst Rev 2021; 10:93. [PMID: 33795003 PMCID: PMC8017894 DOI: 10.1186/s13643-021-01635-3] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 09/04/2020] [Accepted: 03/11/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Systematic reviews and meta-analyses provide the highest level of evidence to help inform policy and practice, yet their rigorous nature is associated with significant time and economic demands. The screening of titles and abstracts is the most time-consuming part of the review process, with analysts required to review thousands of articles manually, taking on average 33 days. New technologies aimed at streamlining the screening process have provided initial promising findings, yet there are limitations with current approaches and barriers to the widespread use of these tools. In this paper, we introduce and report initial evidence on the utility of Research Screener, a semi-automated machine learning tool to facilitate abstract screening. METHODS Three sets of analyses (simulation, interactive and sensitivity) were conducted to provide evidence of the utility of the tool through both simulated and real-world examples. RESULTS Research Screener delivered a workload saving of between 60 and 96% across nine systematic reviews and two scoping reviews. Findings from the real-world interactive analysis demonstrated a time saving of 12.53 days compared to the manual screening, which equates to a financial saving of USD 2444. Conservatively, our results suggest that analysts who scan 50% of the total pool of articles identified via a systematic search are highly likely to have identified 100% of eligible papers. CONCLUSIONS In light of these findings, Research Screener is able to reduce the burden for researchers wishing to conduct a comprehensive systematic review without reducing the scientific rigour they strive to achieve.
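Claims such as "scanning 50% of the pool finds 100% of eligible papers" are read off a ranked list: screen from the top until the target share of eligible records has been seen. A minimal sketch of that calculation follows; the function name and the toy ranking are hypothetical, and Research Screener's own ranking model is not reproduced here.

```python
import math


def fraction_screened_for_recall(ranked_labels, target_recall=1.0):
    """Return the share of a ranked list that must be screened to reach target_recall.

    ranked_labels: 1 (eligible) or 0 (ineligible), in the order the tool presents records.
    """
    total_eligible = sum(ranked_labels)
    if total_eligible == 0:
        raise ValueError("no eligible records in the list")
    # round() guards against floating-point noise in target_recall * total_eligible
    needed = math.ceil(round(target_recall * total_eligible, 9))
    found = 0
    for position, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return position / len(ranked_labels)
    raise ValueError("target recall cannot be reached with these labels")


# Toy ranking: 3 eligible records among 10, the last one at position 4.
print(fraction_screened_for_recall([1, 1, 0, 1, 0, 0, 0, 0, 0, 0]))  # 0.4
```

In practice the stopping point is unknown while screening, because the total number of eligible records is not yet known; this retrospective calculation is only possible once every record has a reference label.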
Affiliation(s)
- Kevin E K Chai
  - Curtin Institute for Computation, Curtin University, Perth, Australia
  - School of Population Health, Curtin University, Perth, Australia
- Robin L J Lines
  - School of Allied Health, Curtin University, Perth, Australia
- Leo Ng
  - School of Allied Health, Curtin University, Perth, Australia
40
Natural language processing was effective in assisting rapid title and abstract screening when updating systematic reviews. J Clin Epidemiol 2021; 133:121-129. [PMID: 33485929 DOI: 10.1016/j.jclinepi.2021.01.010] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/27/2020] [Revised: 01/06/2021] [Accepted: 01/14/2021] [Indexed: 02/05/2023]
Abstract
BACKGROUND AND OBJECTIVE To examine whether the use of natural language processing (NLP) technology is effective in assisting rapid title and abstract screening when updating a systematic review. STUDY DESIGN Using the searched literature from a published systematic review, we trained and tested an NLP model that enables rapid title and abstract screening when updating a systematic review. The model was a light gradient boosting machine (LightGBM), an ensemble learning classifier which integrates four pretrained Bidirectional Encoder Representations from Transformers (BERT) models. We divided the searched citations into two sets (ie, training and test sets). The model was trained using the training set and assessed for screening performance using the test set. The searched citations, whose eligibility was determined by two independent reviewers, were treated as the reference standard. RESULTS The test set included 947 citations; our model included 340 citations, excluded 607 citations, and achieved 96% sensitivity, and 78% specificity. If the classifier assessment in the case study was accepted, reviewers would lose 8 of 180 eligible citations (4%), none of which were ultimately included in the systematic review after full-text consideration, while decreasing the workload by 64.1%. CONCLUSION NLP technology using the ensemble learning method may effectively assist in rapid literature screening when updating systematic reviews.
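The figures in this abstract are internally consistent, and the full confusion matrix can be recovered from them. The arithmetic below uses only numbers stated above (947 test citations, 340 included, 607 excluded, 180 eligible, 8 eligible missed); it says nothing about how the LightGBM/BERT ensemble itself works, and the derived precision is not reported in the abstract.

```python
# Counts stated in the abstract above.
total = 947      # citations in the test set
included = 340   # citations the model included
excluded = 607   # citations the model excluded
eligible = 180   # eligible citations (reference standard: two independent reviewers)
missed = 8       # eligible citations the model excluded

# Reconstruct the confusion matrix.
tp = eligible - missed   # eligible and included: 172
fn = missed              # eligible but excluded: 8
fp = included - tp       # ineligible but included: 168
tn = excluded - fn       # ineligible and excluded: 599
assert tp + fn + fp + tn == total

sensitivity = tp / (tp + fn)           # 172 / 180
specificity = tn / (tn + fp)           # 599 / 767
precision = tp / (tp + fp)             # 172 / 340, derived here
workload_reduction = excluded / total  # share of citations reviewers no longer screen

print(f"sensitivity={sensitivity:.1%} specificity={specificity:.1%} "
      f"precision={precision:.1%} workload reduction={workload_reduction:.1%}")
# prints: sensitivity=95.6% specificity=78.1% precision=50.6% workload reduction=64.1%
```

The result reproduces the reported 96% sensitivity, 78% specificity and 64.1% workload reduction, and makes the trade-off explicit: at this operating point roughly half of the citations passed to reviewers are ineligible.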
41
Mahri M, Shen N, Berrizbeitia F, Rodan R, Daer A, Faigan M, Taqi D, Wu KY, Ahmadi M, Ducret M, Emami E, Tamimi F. Osseointegration Pharmacology: A Systematic Mapping Using Artificial Intelligence. Acta Biomater 2021; 119:284-302. [PMID: 33181361 DOI: 10.1016/j.actbio.2020.11.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/14/2020] [Revised: 11/04/2020] [Accepted: 11/05/2020] [Indexed: 12/25/2022]
Abstract
Clinical performance of osseointegrated implants could be compromised by the medications taken by patients. The effect of a specific medication on osseointegration can be easily investigated using traditional systematic reviews. However, assessment of all known medications requires the use of evidence mapping methods. These methods allow assessment of complex questions, but they are very resource intensive when done manually. The objective of this study was to develop a machine learning algorithm to automatically map the literature assessing the effect of medications on osseointegration. Datasets of articles classified manually were used to train a machine-learning algorithm based on Support Vector Machines. The algorithm was then validated and used to screen 599,604 articles identified with an extremely sensitive search strategy. The algorithm included 281 relevant articles that described the effect of 31 different drugs on osseointegration. This approach achieved an accuracy of 95%, and compared to manual screening, it reduced the workload by 93%. The systematic mapping revealed that the treatment outcomes of osseointegrated medical devices could be influenced by drugs affecting homeostasis, inflammation, cell proliferation and bone remodeling. The effect of all known medications on the performance of osseointegrated medical devices can be assessed using evidence mappings executed with highly accurate machine learning algorithms.
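The abstract does not specify the SVM's features, kernel or software. Purely to illustrate what a Support Vector Machine screener does, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method on bag-of-words features; the feature set, the hyperparameters (`lam`, `epochs`) and the toy keyword strings are assumptions, not the authors' setup.

```python
from collections import Counter


def features(text: str) -> Counter:
    """Bag-of-words counts plus a constant bias feature (illustrative feature choice)."""
    counts = Counter(text.lower().split())
    counts["__bias__"] = 1
    return counts


def train_linear_svm(texts, labels, lam=0.01, epochs=50):
    """Train a linear SVM with Pegasos-style sub-gradient steps on the hinge loss.

    labels must be +1 (relevant) or -1 (not relevant).
    """
    weights: dict = {}
    data = [(features(x), y) for x, y in zip(texts, labels)]
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(weights.get(f, 0.0) * v for f, v in x.items())
            # Regularisation step: shrink every weight toward zero.
            for f in weights:
                weights[f] *= 1.0 - eta * lam
            # Hinge-loss sub-gradient: only examples inside the margin move the weights.
            if margin < 1.0:
                for f, v in x.items():
                    weights[f] = weights.get(f, 0.0) + eta * y * v
    return weights


def predict(weights, text: str) -> int:
    """Return +1 (include) or -1 (exclude) from the sign of the linear score."""
    score = sum(weights.get(f, 0.0) * v for f, v in features(text).items())
    return 1 if score >= 0 else -1


# Hypothetical toy keyword strings standing in for titles and abstracts.
texts = ["implant osseointegration bisphosphonate rats", "bone implant osseointegration drug",
         "diet anxiety survey", "anxiety questionnaire adults"]
labels = [1, 1, -1, -1]
weights = train_linear_svm(texts, labels)
print(predict(weights, "implant osseointegration"))  # 1
```

A screener applied to hundreds of thousands of records would normally rely on an established SVM library and richer text features; the hand-rolled loop is only meant to expose the hinge-loss update.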
42
Artificial Intelligence in Evidence-Based Medicine. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_43-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
43
Yamada T, Yoneoka D, Hiraike Y, Hino K, Toyoshiba H, Shishido A, Noma H, Shojima N, Yamauchi T. Deep Neural Network for Reducing the Screening Workload in Systematic Reviews for Clinical Guidelines: Algorithm Validation Study. J Med Internet Res 2020; 22:e22422. [PMID: 33262102 PMCID: PMC7806440 DOI: 10.2196/22422] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/14/2020] [Revised: 11/10/2020] [Accepted: 11/30/2020] [Indexed: 01/16/2023]
Abstract
Background Performing systematic reviews is a time-consuming and resource-intensive process. Objective We investigated whether a machine learning system could perform systematic reviews more efficiently. Methods All systematic reviews and meta-analyses of interventional randomized controlled trials cited in recent clinical guidelines from the American Diabetes Association, American College of Cardiology, American Heart Association (2 guidelines), and American Stroke Association were assessed. After reproducing the primary screening data set according to the published search strategy of each, we extracted correct articles (those actually reviewed) and incorrect articles (those not reviewed) from the data set. These 2 sets of articles were used to train a neural network–based artificial intelligence engine (Concept Encoder, Fronteo Inc). The primary endpoint was work saved over sampling at 95% recall (WSS@95%). Results Among 145 candidate reviews of randomized controlled trials, 8 reviews fulfilled the inclusion criteria. For these 8 reviews, the machine learning system significantly reduced the literature screening workload by at least 6-fold versus that of manual screening based on WSS@95%. When machine learning was initiated using 2 correct articles that were randomly selected by a researcher, a 10-fold reduction in workload was achieved versus that of manual screening based on the WSS@95% value, with high sensitivity for eligible studies. The area under the receiver operating characteristic curve increased dramatically every time the algorithm learned a correct article. Conclusions Concept Encoder achieved a 10-fold reduction of the screening workload for systematic review after learning from 2 randomly selected studies on the target topic. However, few meta-analyses of randomized controlled trials were included. Concept Encoder could facilitate the acquisition of evidence for clinical guidelines.
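The primary endpoint above, work saved over sampling at 95% recall (WSS@95%), is commonly defined as WSS@R = (TN + FN)/N - (1 - R): the fraction of records a reviewer never has to read once the target recall is reached, minus the fraction random sampling would save at the same recall. The Python sketch below assumes that definition and a model-ranked list of 0/1 relevance labels; the function name `wss_at_recall` is hypothetical, and the study's own computation may differ in detail. WSS is a fraction of the whole record set, so it is not the same quantity as the "fold reduction" in workload reported above.

```python
import math


def wss_at_recall(ranked_labels, recall=0.95):
    """Work saved over sampling at a target recall (sketch).

    ranked_labels: 0/1 relevance labels ordered by the model's score,
    most-likely-relevant first. Assumes WSS@R = (TN + FN)/N - (1 - R).
    """
    n = len(ranked_labels)
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("need at least one relevant record")
    # Number of relevant records needed to reach the target recall;
    # the small epsilon guards against floating-point error in recall * total.
    target = math.ceil(recall * total_relevant - 1e-9)
    found = 0
    for k, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= target:
            # Screening stops after record k; the remaining n - k records
            # (the TN + FN) are never read by a human.
            return (n - k) / n - (1 - recall)
    raise RuntimeError("target recall not reached")  # unreachable for recall <= 1


# Hypothetical example: 100 records, 20 relevant, 19 of them ranked first.
ranking = [1] * 19 + [0] * 80 + [1]
print(round(wss_at_recall(ranking), 2))  # prints 0.76
```

A ranking that places nearly all relevant records at the bottom yields a negative WSS, which is a quick sanity check that the model is doing worse than random ordering.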
Affiliation(s)
- Tomohide Yamada: University Institute for Population Health, King's College London, London, United Kingdom; Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, University of Tokyo, Tokyo, Japan
- Daisuke Yoneoka: Graduate School of Public Health, St Luke's International University, Tokyo, Japan
- Yuta Hiraike: Department of Cell Biology, Harvard Medical School, Boston, MA, United States
- Hisashi Noma: Department of Data Science, The Institute of Statistical Mathematics, Tokyo, Japan
- Nobuhiro Shojima: Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, University of Tokyo, Tokyo, Japan
- Toshimasa Yamauchi: Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, University of Tokyo, Tokyo, Japan
44
Gates A, Gates M, DaRosa D, Elliott SA, Pillay J, Rahman S, Vandermeer B, Hartling L. Decoding semi-automated title-abstract screening: findings from a convenience sample of reviews. Syst Rev 2020; 9:272. [PMID: 33243276 PMCID: PMC7694314 DOI: 10.1186/s13643-020-01528-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/08/2020] [Accepted: 11/11/2020] [Indexed: 01/08/2023]
Abstract
BACKGROUND We evaluated the benefits and risks of using the Abstrackr machine learning (ML) tool to semi-automate title-abstract screening and explored whether Abstrackr's predictions varied by review or study-level characteristics. METHODS For a convenience sample of 16 reviews for which adequate data were available to address our objectives (11 systematic reviews and 5 rapid reviews), we screened a 200-record training set in Abstrackr and downloaded the relevance (relevant or irrelevant) of the remaining records, as predicted by the tool. We retrospectively simulated the liberal-accelerated screening approach. We estimated the time savings and proportion missed compared with dual independent screening. For reviews with pairwise meta-analyses, we evaluated changes to the pooled effects after removing the missed studies. We explored whether the tool's predictions varied by review and study-level characteristics. RESULTS Using the ML-assisted liberal-accelerated approach, we wrongly excluded 0 to 3 (0 to 14%) records that were included in the final reports, but saved a median (IQR) 26 (9, 42) h of screening time. One missed study was included in eight pairwise meta-analyses in one systematic review. The pooled effect for just one of those meta-analyses changed considerably (from MD (95% CI) -1.53 (-2.92, -0.15) to -1.17 (-2.70, 0.36)). Of 802 records in the final reports, 87% were correctly predicted as relevant. The correctness of the predictions did not differ by review (systematic or rapid, P = 0.37) or intervention type (simple or complex, P = 0.47). The predictions were more often correct in reviews with multiple (89%) vs. single (83%) research questions (P = 0.01), or that included only trials (95%) vs. multiple designs (86%) (P = 0.003). At the study level, trials (91%), mixed methods (100%), and qualitative (93%) studies were more often correctly predicted as relevant compared with observational studies (79%) or reviews (83%) (P = 0.0006).
Studies at high or unclear (88%) vs. low risk of bias (80%) (P = 0.039), and those published more recently (mean (SD) 2008 (7) vs. 2006 (10), P = 0.02) were more often correctly predicted as relevant. CONCLUSION Our screening approach saved time and may be suitable in conditions where the limited risk of missing relevant records is acceptable. Several of our findings are paradoxical and require further study to fully understand the tasks to which ML-assisted screening is best suited. The findings should be interpreted in light of the fact that the protocol was prepared for the funder, but not published a priori. Because we used a convenience sample, the findings may be prone to selection bias. The results may not be generalizable to other samples of reviews, ML tools, or screening approaches. The small number of missed studies across reviews with pairwise meta-analyses hindered strong conclusions about the effect of missed studies on the results and conclusions of systematic reviews.
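The liberal-accelerated approach simulated above can be expressed as a simple decision rule. The Python sketch below assumes one reading of that approach in which Abstrackr stands in for the second reviewer: a record moves forward if either the single human screener or the tool marks it relevant, so it is excluded only when both call it irrelevant. The `Record` fields and function names are hypothetical, and the sketch does not reproduce the authors' time-savings or meta-analysis calculations.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str
    reviewer_relevant: bool  # decision of the single human screener
    tool_relevant: bool      # ML tool's predicted relevance
    in_final_report: bool    # reference standard: included in the final review


def liberal_accelerated(records):
    """Split records into those advanced to full text and those excluded.

    Assumed rule: exclude only when both the reviewer and the tool say
    'irrelevant'; any 'relevant' vote moves the record forward.
    """
    advanced, excluded = [], []
    for r in records:
        if r.reviewer_relevant or r.tool_relevant:
            advanced.append(r)
        else:
            excluded.append(r)
    return advanced, excluded


def missed_proportion(records):
    """Share of finally included records that the rule wrongly excluded."""
    _, excluded = liberal_accelerated(records)
    included = [r for r in records if r.in_final_report]
    missed = [r for r in excluded if r.in_final_report]
    return len(missed) / len(included) if included else 0.0


# Hypothetical records
sample = [
    Record("a", True, False, True),
    Record("b", False, True, True),
    Record("c", False, False, True),   # wrongly excluded by both
    Record("d", False, False, False),
]
advanced, excluded = liberal_accelerated(sample)
print([r.record_id for r in advanced], missed_proportion(sample))
# prints ['a', 'b'] 0.3333333333333333
```

Because any single "relevant" vote advances a record, the rule trades extra full-text work for a lower risk of missing studies, which matches the small number of wrongly excluded records reported above.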
Affiliation(s)
- Allison Gates: Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Michelle Gates: Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Daniel DaRosa: Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Sarah A. Elliott: Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Jennifer Pillay: Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Sholeh Rahman: Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Ben Vandermeer: Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Lisa Hartling: Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
45
Aucoin M, LaChance L, Cooley K, Kidd S. Diet and Psychosis: A Scoping Review. Neuropsychobiology 2020; 79:20-42. [PMID: 30359969 DOI: 10.1159/000493399] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 06/05/2018] [Accepted: 08/29/2018] [Indexed: 12/11/2022]
Abstract
INTRODUCTION Schizophrenia spectrum disorders (SSD) represent a cluster of severe mental illnesses. Diet has been identified as a modifiable risk factor and opportunity for intervention in many physical illnesses and more recently in mental illnesses such as unipolar depression; however, no dietary guidelines exist for patients with SSD. OBJECTIVE This review sought to systematically scope the existing literature in order to identify nutritional interventions for the prevention or treatment of mental health symptoms in SSD as well as gaps and opportunities for further research. METHODS This review followed established methodological approaches for scoping reviews including an extensive a priori search strategy and duplicate screening. Because of the large volume of results, an online program (Abstrackr) was used for screening and tagging. Data were extracted based on the dietary constituents and analyzed. RESULTS Of 55,330 results identified by the search, 822 studies met the criteria for inclusion. Observational evidence shows a connection between the presence of psychotic disorders and poorer quality dietary patterns, higher intake of refined carbohydrates and total fat, and lower intake or levels of fibre, ω-3 and ω-6 fatty acids, vegetables, fruit, and certain vitamins and minerals (vitamin B12 and B6, folate, vitamin C, zinc, and selenium). Evidence illustrates a role of food allergy and sensitivity as well as microbiome composition and specific phytonutrients (such as L-theanine, sulforaphane, and resveratrol). Experimental studies have demonstrated benefit using healthy diet patterns and specific vitamins and minerals (vitamin B12 and B6, folate, and zinc) and amino acids (serine, lysine, glycine, and tryptophan). DISCUSSION Overall, these findings were consistent with many other bodies of knowledge about healthy dietary patterns. 
Many limitations exist related to the design of the individual studies and the ability to extrapolate the results of studies using dietary supplements to dietary interventions (food). Dietary recommendations are presented as well as recommendations for further research including more prospective observational studies and intervention studies that modify diet constituents or entire dietary patterns with statistical power to detect mental health outcomes.
Affiliation(s)
- Monique Aucoin: Canadian College of Naturopathic Medicine, Toronto, Ontario, Canada
- Laura LaChance: Centre for Addiction and Mental Health, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
- Kieran Cooley: Canadian College of Naturopathic Medicine, Toronto, Ontario, Canada; Australian Research Centre in Complementary and Integrative Medicine, University of Technology, Sydney, New South Wales, Australia
- Sean Kidd: Centre for Addiction and Mental Health, Toronto, Ontario, Canada
46
Hinkelbein J, Kerkhoff S, Adler C, Ahlbäck A, Braunecker S, Burgard D, Cirillo F, De Robertis E, Glaser E, Haidl TK, Hodkinson P, Iovino IZ, Jansen S, Johnson KVL, Jünger S, Komorowski M, Leary M, Mackaill C, Nagrebetsky A, Neuhaus C, Rehnberg L, Romano GM, Russomano T, Schmitz J, Spelten O, Starck C, Thierry S, Velho R, Warnecke T. Cardiopulmonary resuscitation (CPR) during spaceflight - a guideline for CPR in microgravity from the German Society of Aerospace Medicine (DGLRM) and the European Society of Aerospace Medicine Space Medicine Group (ESAM-SMG). Scand J Trauma Resusc Emerg Med 2020; 28:108. [PMID: 33138865 PMCID: PMC7607644 DOI: 10.1186/s13049-020-00793-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/22/2020] [Accepted: 10/07/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND With the "Artemis" mission, mankind will return to the Moon by 2024. Prolonged periods in space will not only present physical and psychological challenges to the astronauts, but also pose risks concerning the medical treatment capabilities of the crew. So far, no guideline exists for the treatment of severe medical emergencies in microgravity. We, as an international group of researchers in the fields of aerospace medicine and critical care, took on the challenge and developed an evidence-based guideline for the arguably most severe medical emergency: cardiac arrest. METHODS After the group was formed, PICO questions on cardiopulmonary resuscitation in microgravity were developed to guide the systematic literature search. A precise search strategy was then compiled and applied to MEDLINE. The 4165 records retrieved were screened by at least 2 reviewers. This yielded 88 original publications, which were obtained in full text and critically appraised using the GRADE methodology. These studies formed the basis for the guideline recommendations, which were drafted by at least 2 experts in the given field. The recommendations then underwent a consensus process using the Delphi method. RESULTS We recommend a differentiated approach to CPR in microgravity, divided into basic life support (BLS) and advanced life support (ALS) as in the Earth-based guidelines. In immediate BLS, the chest compression method of choice is the Evetts-Russomano method (ER), whereas in an ALS scenario, with the patient restrained on the Crew Medical Restraint System, the handstand method (HS) should be applied. Airway management should only be performed if at least two rescuers are present and the patient has been restrained. A supraglottic airway device should be used for airway management where crew members untrained in tracheal intubation (TI) are involved. DISCUSSION CPR in microgravity is feasible and should follow the Earth-based AHA/ERC guidelines with respect to fundamental principles such as urgent recognition and action, focus on high-quality chest compressions, compression depth, and compression-ventilation ratio. However, the special circumstances of microgravity and spaceflight must be considered for key aspects such as rescuer position, methods for performing chest compressions, airway management, and defibrillation.
Affiliation(s)
- Jochen Hinkelbein: German Society of Aviation and Space Medicine (DGLRM), Munich, Germany; Department of Anaesthesiology and Intensive Care Medicine, University Hospital of Cologne, 50937, Cologne, Germany; Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany
- Steffen Kerkhoff: German Society of Aviation and Space Medicine (DGLRM), Munich, Germany; Department of Anaesthesiology and Intensive Care Medicine, University Hospital of Cologne, 50937, Cologne, Germany; Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany
- Christoph Adler: Department of Internal Medicine III, Heart Centre of the University of Cologne, Cologne, Germany; Fire Department City of Cologne, Institute for Security Science and Rescue Technology, Cologne, Germany
- Anton Ahlbäck: Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany; Department of Anaesthesia and Intensive Care, Örebro University Hospital, Örebro, Sweden
- Stefan Braunecker: Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany; Department of Anesthesiology, University of Florida College of Medicine, Jacksonville, FL, USA
- Daniel Burgard: Department of Cardiology and Angiology, Heart Center Duisburg, Evangelisches Klinikum Niederrhein, Duisburg, Germany
- Fabrizio Cirillo: Department of Anaesthesia and Intensive Care, Santa Maria delle Grazie Hospital, Pozzuoli, Naples, Italy
- Edoardo De Robertis: Division of Anaesthesia, Analgesia, and Intensive Care, Department of Surgical and Biomedical Sciences, University of Perugia, Perugia, Italy
- Eckard Glaser: German Society of Aviation and Space Medicine (DGLRM), Munich, Germany; Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany; Gerbrunn, Germany
- Theresa K Haidl: Department of Psychiatry and Psychotherapy, Faculty of Medicine and University Hospital Cologne, University of Cologne, 50937, Cologne, Germany
- Pete Hodkinson: Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany; Aerospace Medicine, Centre of Human and Applied Physiological Sciences, King's College, London, UK
- Ivan Zefiro Iovino: Department of Anaesthesia and Intensive Care, Santa Maria delle Grazie Hospital, Pozzuoli, Naples, Italy
- Stefanie Jansen: Department of Otorhinolaryngology, Head and Neck Surgery, University of Cologne, 50937, Cologne, Germany
- Saskia Jünger: Cologne Center for Ethics, Rights, Economics, and Social Sciences of Health (CERES), University of Cologne and University Hospital of Cologne, Cologne, Germany
- Matthieu Komorowski: Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany; Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, Exhibition Road, London, SW7 2AZ, UK
- Marion Leary: School of Nursing, University of Pennsylvania, Philadelphia, PA, USA
- Christina Mackaill: Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany; Accident and Emergency Department, Queen Elizabeth University Hospital, Glasgow, Scotland
- Alexander Nagrebetsky: Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, USA
- Christopher Neuhaus: German Society of Aviation and Space Medicine (DGLRM), Munich, Germany; Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany; Department of Anesthesiology, Heidelberg University Hospital, Heidelberg, Germany
- Lucas Rehnberg: University Hospital Southampton NHS Foundation Trust, Anaesthetic Department, Southampton, UK
- Thais Russomano: Centre of Human and Applied Physiological Sciences, King's College London, London, UK
- Jan Schmitz: German Society of Aviation and Space Medicine (DGLRM), Munich, Germany; Department of Anaesthesiology and Intensive Care Medicine, University Hospital of Cologne, 50937, Cologne, Germany; Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany
- Oliver Spelten: Department of Anaesthesiology and Intensive Care Medicine, Schön Klinik Düsseldorf, Am Heerdter Krankenhaus 2, 40549, Düsseldorf, Germany
- Clément Starck: Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany; Anesthesiology Department, Brest University Hospital, Brest, France
- Seamus Thierry: Space Medicine Group, European Society of Aerospace Medicine (ESAM), Cologne, Germany; Anesthesiology Department, Bretagne Sud General Hospital, Lorient, France; Medical and Maritime Simulation Center, Lorient, France; Laboratory of Psychology, Cognition, Communication and Behavior, University of Bretagne Sud, Vannes, France
- Rochelle Velho: Academic Department of Anaesthesia, Critical Care, Pain and Resuscitation, University Hospitals Birmingham, Heart of England NHS Foundation Trust, Birmingham, UK
- Tobias Warnecke: University Department for Anesthesia, Intensive and Emergency Medicine and Pain Management, Hospital Oldenburg, Oldenburg, Germany
47
Reddy SM, Patel S, Weyrich M, Fenton J, Viswanathan M. Comparison of a traditional systematic review approach with review-of-reviews and semi-automation as strategies to update the evidence. Syst Rev 2020; 9:243. [PMID: 33076975 PMCID: PMC7574591 DOI: 10.1186/s13643-020-01450-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/13/2020] [Accepted: 08/07/2020] [Indexed: 11/30/2022]
Abstract
BACKGROUND The exponential growth of the biomedical literature necessitates investigating strategies to reduce systematic reviewer burden while maintaining the high standards of systematic review validity and comprehensiveness. METHODS We compared the traditional systematic review screening process with (1) a review-of-reviews (ROR) screening approach and (2) a semi-automation screening approach using two publicly available tools (RobotAnalyst and AbstrackR) and different types of training sets (randomly selected citations subjected to dual review at the title-abstract stage, highly curated citations dually reviewed at the full-text stage, and a combination of the two). We evaluated performance measures of sensitivity, specificity, missed citations, and workload burden. RESULTS The ROR approach for treatments of early-stage prostate cancer had a poor sensitivity (0.54), and studies missed by the ROR approach tended to be head-to-head comparisons of active treatments, observational studies, and studies of physical harms and quality-of-life outcomes. Title and abstract screening incorporating semi-automation reached a sensitivity of 100% only at high levels of reviewer burden (review of 99% of citations). A highly curated, smaller training set (n = 125) performed similarly to a larger training set of random citations (n = 938). CONCLUSION Two approaches to rapidly update SRs, review-of-reviews and semi-automation, failed to demonstrate reduced workload burden while maintaining an acceptable level of sensitivity. We suggest careful evaluation of the ROR approach through comparison of inclusion criteria and targeted searches to fill evidence gaps, as well as further research on semi-automation, including more study of highly curated training sets.
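The performance measures compared above can be computed from a 2x2 table of an approach's decisions against the traditional review's inclusions. The Python sketch below uses the standard definitions of sensitivity and specificity; treating "workload burden" as the share of all citations a human still has to screen is an assumption, and the counts in the example are hypothetical, chosen only so that sensitivity equals the 0.54 reported for the ROR approach.

```python
from typing import NamedTuple


class ScreeningMetrics(NamedTuple):
    sensitivity: float   # TP / (TP + FN): share of truly included studies found
    specificity: float   # TN / (TN + FP): share of irrelevant citations set aside
    missed: int          # FN: truly included studies the approach failed to find
    workload: float      # assumed: share of all citations needing human review


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> ScreeningMetrics:
    total = tp + fp + tn + fn
    return ScreeningMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        missed=fn,
        # Assumption: humans review every citation flagged as possibly relevant.
        workload=(tp + fp) / total,
    )


# Hypothetical counts for 1,000 citations, 50 of which are truly included.
m = screening_metrics(tp=27, fp=10, tn=940, fn=23)
print(m)
```

Keeping missed citations as a raw count next to the two rates makes the trade-off explicit: a low workload is of little use if `missed` is large, which is the pattern the abstract reports.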
Affiliation(s)
- Shivani M. Reddy: RTI International, 307 Waverly Oaks Road, Suite 101, Waltham, MA 02452, USA
- Sheila Patel: RTI International, 3040 East Cornwallis Road, Research Triangle Park, NC 27709, USA
- Meghan Weyrich: UC Davis, Center for Healthcare Policy and Research, 2103 Stockton Blvd., Sacramento, CA 95817, USA
- Joshua Fenton: UC Davis, Center for Healthcare Policy and Research, 2103 Stockton Blvd., Sacramento, CA 95817, USA
- Meera Viswanathan: RTI International, 3040 East Cornwallis Road, Research Triangle Park, NC 27709, USA
48
An evaluation of DistillerSR's machine learning-based prioritization tool for title/abstract screening - impact on reviewer-relevant outcomes. BMC Med Res Methodol 2020; 20:256. [PMID: 33059590 PMCID: PMC7559198 DOI: 10.1186/s12874-020-01129-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 06/11/2020] [Accepted: 09/22/2020] [Indexed: 01/14/2023]
Abstract
BACKGROUND Systematic reviews often require substantial resources, partially due to the large number of records identified during searching. Although artificial intelligence may not be ready to fully replace human reviewers, it may accelerate and reduce the screening burden. Using DistillerSR (May 2020 release), we evaluated the performance of the prioritization simulation tool to determine the reduction in screening burden and time savings. METHODS Using a true recall @ 95%, response sets from 10 completed systematic reviews were used to evaluate: (i) the reduction of screening burden; (ii) the accuracy of the prioritization algorithm; and (iii) the hours saved when a modified screening approach was implemented. To account for variation in the simulations, and to introduce randomness (through shuffling the references), 10 simulations were run for each review. Means, standard deviations, medians and interquartile ranges (IQR) are presented. RESULTS Among the 10 systematic reviews, using true recall @ 95% there was a median reduction in screening burden of 47.1% (IQR: 37.5 to 58.0%). A median of 41.2% (IQR: 33.4 to 46.9%) of the excluded records needed to be screened to achieve true recall @ 95%. The median title/abstract screening hours saved using a modified screening approach at a true recall @ 95% was 29.8 h (IQR: 28.1 to 74.7 h). This was increased to a median of 36 h (IQR: 32.2 to 79.7 h) when considering the time saved not retrieving and screening full texts of the remaining 5% of records not yet identified as included at title/abstract. Among the 100 simulations (10 simulations per review), none of these 5% of records were a final included study in the systematic review. The reduction in screening burden to achieve true recall @ 95% compared to @ 100% resulted in a reduced screening burden median of 40.6% (IQR: 38.3 to 54.2%). CONCLUSIONS The prioritization tool in DistillerSR can reduce screening burden. 
A modified or stop screening approach once a true recall @ 95% is achieved appears to be a valid method for rapid reviews, and perhaps systematic reviews. This needs to be further evaluated in prospective reviews using the estimated recall.
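The "true recall @ 95%" stopping rule described above can be sketched as follows: walk down the prioritized list, stop as soon as 95% of the known included records have been seen, and report the unscreened fraction as the reduction in screening burden; repeated simulation runs are then summarized as a median and interquartile range. This is a minimal Python sketch under those assumptions, not DistillerSR's implementation; the ranked lists are hypothetical stand-ins for the tool's output, and because quartile conventions differ between software packages, IQRs computed this way may not match published ones exactly.

```python
import math
import statistics


def burden_reduction(ranked_labels, recall=0.95):
    """Fraction of records left unscreened when screening stops at the first
    point where the known ('true') recall target is reached."""
    total_included = sum(ranked_labels)
    if total_included == 0:
        raise ValueError("need at least one included record")
    # Epsilon guards against floating-point error in recall * total.
    target = math.ceil(recall * total_included - 1e-9)
    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= target:
            return 1 - screened / len(ranked_labels)
    raise RuntimeError("target recall not reached")


def summarize(simulations, recall=0.95):
    """Median and interquartile range of burden reduction across runs."""
    values = [burden_reduction(run, recall) for run in simulations]
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return median, (q1, q3)


# Three hypothetical prioritized orderings of the same 10 records (2 included).
runs = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
]
print(summarize(runs))
```

The rule relies on knowing every included record in advance, which is only possible retrospectively; that is why the abstract calls for prospective evaluation using estimated recall.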
49
Bougioukas KI, Bouras EC, Avgerinos KI, Dardavessis T, Haidich A. How to keep up to date with medical information using web‐based resources: a systematised review and narrative synthesis. Health Info Libr J 2020; 37:254-292. [DOI: 10.1111/hir.12318] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/26/2019] [Accepted: 05/20/2020] [Indexed: 12/30/2022]
Affiliation(s)
- Konstantinos I. Bougioukas: Department of Hygiene, Social‐Preventive Medicine and Medical Statistics, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Emmanouil C. Bouras: Department of Hygiene, Social‐Preventive Medicine and Medical Statistics, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Theodore Dardavessis: Department of Hygiene, Social‐Preventive Medicine and Medical Statistics, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Anna‐Bettina Haidich: Department of Hygiene, Social‐Preventive Medicine and Medical Statistics, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
50
Deng Z, Yin K, Bao Y, Armengol VD, Wang C, Tiwari A, Barzilay R, Parmigiani G, Braun D, Hughes KS. Validation of a Semiautomated Natural Language Processing-Based Procedure for Meta-Analysis of Cancer Susceptibility Gene Penetrance. JCO Clin Cancer Inform 2020; 3:1-9. [PMID: 31419182 DOI: 10.1200/cci.19.00043] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 11/20/2022]
Abstract
PURPOSE Quantifying the risk of cancer associated with pathogenic mutations in germline cancer susceptibility genes (that is, penetrance) enables the personalization of preventive management strategies. Conducting a meta-analysis is the best way to obtain robust risk estimates. We have previously developed a natural language processing (NLP)-based abstract classifier, which classifies abstracts as relevant to penetrance, prevalence of mutations, both, or neither. In this work, we evaluate the performance of this NLP-based procedure. MATERIALS AND METHODS We compared the semiautomated NLP-based procedure, which involves automated abstract classification and text mining, followed by human review of identified studies, with the traditional procedure that requires human review of all studies. Ten high-quality gene-cancer penetrance meta-analyses spanning 16 gene-cancer associations were used as the gold standard by which to evaluate the performance of our procedure. For each meta-analysis, we evaluated the number of abstracts that required human review (workload) and the ability to identify the studies that were included by the authors in their quantitative analysis (coverage). RESULTS Compared with the traditional procedure, the semiautomated NLP-based procedure led to a lower workload across all 10 meta-analyses, with an overall 84% reduction (2,774 abstracts v 16,941 abstracts) in the amount of human review required. Overall coverage was 93% (we were able to identify 132 of 142 studies) before reviewing references of identified studies. Reasons for the 10 missed studies included blank and poorly written abstracts. After reviewing references, nine of the previously missed studies were identified, and coverage improved to 99% (141 of 142 studies). CONCLUSION We demonstrated that an NLP-based procedure can significantly reduce the review workload without compromising the ability to identify relevant studies. NLP algorithms have promising potential for reducing human efforts in the literature review process.
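The headline percentages in this abstract follow directly from the reported counts. The short Python check below recomputes them; its only inputs are numbers stated in the abstract, and the helper name `pct` is ours.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


# Counts reported in the abstract
traditional_abstracts = 16_941   # abstracts needing human review, traditional procedure
nlp_abstracts = 2_774            # abstracts needing human review, NLP-based procedure
gold_studies = 142               # studies included across the gold-standard meta-analyses
found_before_refs = 132
found_after_refs = found_before_refs + 9  # nine missed studies recovered from references

workload_reduction = 100 - pct(nlp_abstracts, traditional_abstracts)
coverage_before = pct(found_before_refs, gold_studies)
coverage_after = pct(found_after_refs, gold_studies)

print(f"workload reduction: {workload_reduction:.1f}%")       # about 84%
print(f"coverage before references: {coverage_before:.1f}%")  # about 93%
print(f"coverage after references: {coverage_after:.1f}%")    # about 99%
```

The 84% figure is a reduction in abstracts read, not in total review time, since the semiautomated procedure still includes reviewing references of identified studies.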
Affiliation(s)
- Kanhua Yin: Massachusetts General Hospital, Boston, MA
- Yujia Bao: Massachusetts Institute of Technology, Boston, MA
- Cathy Wang: Harvard TH Chan School of Public Health, Boston, MA; Dana-Farber Cancer Institute, Boston, MA
- Giovanni Parmigiani: Harvard TH Chan School of Public Health, Boston, MA; Dana-Farber Cancer Institute, Boston, MA
- Danielle Braun: Harvard TH Chan School of Public Health, Boston, MA; Dana-Farber Cancer Institute, Boston, MA
- Kevin S Hughes: Massachusetts General Hospital, Boston, MA; Harvard Medical School, Boston, MA