1
Webster J, Ghith J, Penner O, Lieu CH, Schijvenaars BJA. Using Artificial Intelligence to Support Informed Decision-Making on BRAF Mutation Testing. JCO Precis Oncol 2024;8:e2300685. PMID: 39475660; DOI: 10.1200/po.23.00685.
Abstract
PURPOSE Precision oncology relies on accurate and interpretable reporting of testing and mutation rates. Focusing on BRAF V600 mutations in advanced colorectal carcinoma, non-small-cell lung carcinoma, and cutaneous melanoma, we developed a platform displaying testing and mutation rates reported in the literature, which we annotated using an artificial intelligence (AI) and natural language processing (NLP) pipeline. METHODS Using AI, we identified publications that likely reported a testing or mutation rate, filtered publications for cancer type, and identified sentences that likely reported rates. Rates and covariates were subsequently manually curated by three experts. AI performance was evaluated using precision and recall metrics. We used an interactive platform to explore and present the annotated testing and mutation rates by certain study characteristics. RESULTS The interactive dashboard, accessible at the BRAF dimensions website, enables users to filter mutation and testing rates with relevant options (eg, country of study, study type, mutation type) and to visualize annotated rates. The AI pipeline demonstrated excellent filtering performance (>90% precision and recall for all target cancer types) and moderate performance for sentence classification (53%-99% precision; ≥75% recall). The manual annotation of testing and mutation rates revealed inter-rater disagreement (testing rate, 19%; mutation rate, 70%), indicating unclear or nonstandard reporting of rates in some publications. CONCLUSION Our AI-driven NLP pipeline demonstrated the potential for annotating biomarker testing and mutation rates. The difficulties we encountered highlight the need for more advanced AI-powered literature searching and data extraction, and more consistent reporting of testing rates. These improvements would reduce the risk of misinterpretation or misunderstanding of testing and mutation rates by AI-based technologies and the health care community, with beneficial impacts on clinical decision-making, research, and trial design.
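The filtering metrics quoted above reduce to simple counts of correctly kept, wrongly kept, and missed records. A minimal sketch of that computation; the counts below are hypothetical, not figures from the paper:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP/(TP+FP); recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts for one cancer type's publication filter:
# 450 correctly kept, 30 wrongly kept, 20 relevant records missed.
p, r = precision_recall(tp=450, fp=30, fn=20)
print(f"precision={p:.2%} recall={r:.2%}")  # precision=93.75% recall=95.74%
```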
2
Matsui K, Utsumi T, Aoki Y, Maruki T, Takeshima M, Takaesu Y. Human-Comparable Sensitivity of Large Language Models in Identifying Eligible Studies Through Title and Abstract Screening: 3-Layer Strategy Using GPT-3.5 and GPT-4 for Systematic Reviews. J Med Internet Res 2024;26:e52758. PMID: 39151163; PMCID: PMC11364944; DOI: 10.2196/52758.
Abstract
BACKGROUND The screening process for systematic reviews is resource-intensive. Although previous machine learning solutions have reported reductions in workload, they risked excluding relevant papers. OBJECTIVE We evaluated the performance of a 3-layer screening method using GPT-3.5 and GPT-4 to streamline the title- and abstract-screening process for systematic reviews. Our goal was to develop a screening method that maximizes sensitivity for identifying relevant records. METHODS We conducted screenings on 2 of our previous systematic reviews related to the treatment of bipolar disorder, with 1381 records from the first review and 3146 from the second. Screenings were conducted using GPT-3.5 (gpt-3.5-turbo-0125) and GPT-4 (gpt-4-0125-preview) across three layers: (1) research design, (2) target patients, and (3) interventions and controls. The 3-layer screening was conducted using prompts tailored to each study. During this process, information extraction according to each study's inclusion criteria and optimization for screening were carried out using a GPT-4-based flow without manual adjustments. Records were evaluated at each layer, and only those meeting the inclusion criteria at all layers were judged as included. RESULTS At each layer, both GPT-3.5 and GPT-4 processed about 110 records per minute, and the total time required for screening the first and second studies was approximately 1 hour and 2 hours, respectively. In the first study, the sensitivities/specificities of GPT-3.5 and GPT-4 were 0.900/0.709 and 0.806/0.996, respectively, and both models judged all 6 records used for the meta-analysis as included. In the second study, the sensitivities/specificities of GPT-3.5 and GPT-4 were 0.958/0.116 and 0.875/0.855, respectively, and both models judged all 9 records used for the meta-analysis as included. The sensitivities for the relevant records were comparable to those of human evaluators: 0.867-1.000 for the first study and 0.776-0.979 for the second. After accounting for records justifiably excluded by GPT-4, the sensitivities/specificities of the GPT-4 screening were 0.962/0.996 in the first study and 0.943/0.855 in the second. Further investigation indicated that the records incorrectly excluded by GPT-3.5 reflected a lack of domain knowledge, while those incorrectly excluded by GPT-4 reflected misinterpretations of the inclusion criteria. CONCLUSIONS Our 3-layer screening method with GPT-4 demonstrated an acceptable level of sensitivity and specificity that supports its practical application in systematic review screenings. Future research should aim to generalize this approach and explore its effectiveness in diverse settings, both medical and nonmedical, to fully establish its use and operational feasibility.
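The layered include/exclude logic described here (a record passes only if all three layers answer "yes") reduces to a short loop over per-layer prompts. A minimal sketch using the OpenAI Python client with the model names from the abstract; the layer questions and the yes/no parsing are illustrative assumptions, not the authors' actual prompts:

```python
from openai import OpenAI  # openai>=1.0; assumes OPENAI_API_KEY is set in the environment

client = OpenAI()

# Illustrative per-layer questions (the real criteria were tailored per review).
LAYERS = {
    "research design": "Is this study a randomized controlled trial?",
    "target patients": "Does the study enroll patients with bipolar disorder?",
    "interventions and controls": "Does it compare an active treatment with a placebo or another treatment?",
}

def screen(title: str, abstract: str, model: str = "gpt-4-0125-preview") -> bool:
    """A record is included only if every layer answers 'yes'."""
    for layer, question in LAYERS.items():
        resp = client.chat.completions.create(
            model=model,
            temperature=0,
            messages=[
                {"role": "system",
                 "content": f"You screen records for a systematic review ({layer} layer). Answer only 'yes' or 'no'."},
                {"role": "user",
                 "content": f"{question}\n\nTitle: {title}\nAbstract: {abstract}"},
            ],
        )
        if not resp.choices[0].message.content.strip().lower().startswith("yes"):
            return False  # fails this layer -> excluded without querying later layers
    return True
```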
Affiliation(s)
- Kentaro Matsui
  - Department of Clinical Laboratory, National Center Hospital, National Center of Neurology and Psychiatry, Kodaira, Japan
  - Department of Sleep-Wake Disorders, National Institute of Mental Health, National Center of Neurology and Psychiatry, Kodaira, Japan
- Tomohiro Utsumi
  - Department of Sleep-Wake Disorders, National Institute of Mental Health, National Center of Neurology and Psychiatry, Kodaira, Japan
  - Department of Psychiatry, The Jikei University School of Medicine, Tokyo, Japan
- Yumi Aoki
  - Graduate School of Nursing Science, St. Luke's International University, Tokyo, Japan
- Taku Maruki
  - Department of Neuropsychiatry, Kyorin University School of Medicine, Tokyo, Japan
- Masahiro Takeshima
  - Department of Neuropsychiatry, Akita University Graduate School of Medicine, Akita, Japan
- Yoshikazu Takaesu
  - Department of Neuropsychiatry, Graduate School of Medicine, University of the Ryukyus, Okinawa, Japan
3
Affengruber L, Nussbaumer-Streit B, Hamel C, Van der Maten M, Thomas J, Mavergames C, Spijker R, Gartlehner G. Rapid review methods series: Guidance on the use of supportive software. BMJ Evid Based Med 2024;29:264-271. PMID: 38242566; PMCID: PMC11287527; DOI: 10.1136/bmjebm-2023-112530.
Abstract
This paper is part of a series of methodological guidance from the Cochrane Rapid Reviews Methods Group. Rapid reviews (RRs) use modified systematic review methods to accelerate the review process while maintaining systematic, transparent, and reproducible methods. This paper provides guidance on how to use supportive software for RRs. We strongly encourage the use of supportive software throughout RR production. Specifically, we recommend (1) using collaborative online platforms that enable working in parallel, allow for real-time project management, and centralise review details; (2) using automation software to support, but not entirely replace, a human reviewer and human judgement; and (3) being transparent in reporting the methodology and the potential risk of bias due to the use of supportive software.
Affiliation(s)
- Lisa Affengruber
  - Department for Evidence-based Medicine and Evaluation, Cochrane Austria, University for Continuing Education Krems, Krems, Austria
  - Department of Family Medicine, Maastricht University, Maastricht, The Netherlands
- Barbara Nussbaumer-Streit
  - Department for Evidence-based Medicine and Evaluation, Cochrane Austria, University for Continuing Education Krems, Krems, Austria
- Candyce Hamel
  - Canadian Association of Radiologists, Ottawa, Ontario, Canada
  - School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Miriam Van der Maten
  - Knowledge Institute, Dutch Association of Medical Specialists, Utrecht, The Netherlands
- James Thomas
  - University College London, UCL Social Research Institute, London, UK
- Rene Spijker
  - Cochrane Netherlands, Julius Center for Health Sciences and Primary Care, UMC Utrecht, Utrecht University, Utrecht, The Netherlands
- Gerald Gartlehner
  - Department for Evidence-based Medicine and Evaluation, Cochrane Austria, University for Continuing Education Krems, Krems, Austria
  - Center for Public Health Methods, RTI International, Research Triangle Park, North Carolina, USA
4
Tóth B, Berek L, Gulácsi L, Péntek M, Zrubka Z. Automation of systematic reviews of biomedical literature: a scoping review of studies indexed in PubMed. Syst Rev 2024;13:174. PMID: 38978132; PMCID: PMC11229257; DOI: 10.1186/s13643-024-02592-3.
Abstract
BACKGROUND The demand for high-quality systematic literature reviews (SRs) for evidence-based medical decision-making is growing. SRs are costly and require the scarce resource of highly skilled reviewers. Automation technology has been proposed to save workload and expedite the SR workflow. We aimed to provide a comprehensive overview of SR automation studies indexed in PubMed, focusing on the applicability of these technologies in real-world practice. METHODS In November 2022, we extracted, combined, and ran an integrated PubMed search for SRs on SR automation. Full-text English peer-reviewed articles were included if they reported studies on SR automation methods (SSAM) or automated SRs (ASR). Bibliographic analyses and knowledge-discovery studies were excluded. Record screening was performed by single reviewers, and the selection of full-text papers was performed in duplicate. We summarized the publication details, automated review stages, automation goals, applied tools, data sources, methods, results, and Google Scholar citations of SR automation studies. RESULTS Of 5321 records screened by title and abstract, we included 123 full-text articles, of which 108 were SSAM and 15 ASR. Automation was applied to search (19/123, 15.4%), record screening (89/123, 72.4%), full-text selection (6/123, 4.9%), data extraction (13/123, 10.6%), risk-of-bias assessment (9/123, 7.3%), evidence synthesis (2/123, 1.6%), assessment of evidence quality (2/123, 1.6%), and reporting (2/123, 1.6%). Multiple SR stages were automated in 11 (8.9%) studies. The performance of automated record screening varied largely across SR topics. In published ASRs, we found examples of automated search, record screening, full-text selection, and data extraction. In some ASRs, automation fully complemented manual reviews to increase sensitivity rather than to save workload. Reporting of automation details was often incomplete in ASRs. CONCLUSIONS Automation techniques are being developed for all SR stages, but with limited real-world adoption. Most SR automation tools target single SR stages, with modest time savings for the entire SR process and varying sensitivity and specificity across studies. Therefore, the real-world benefits of SR automation remain uncertain. Standardizing the terminology, reporting, and metrics of study reports could enhance the adoption of SR automation techniques in real-world practice.
Affiliation(s)
- Barbara Tóth
  - Doctoral School of Innovation Management, Óbuda University, Bécsi út 96/B, Budapest, 1034, Hungary
- László Berek
  - Doctoral School for Safety and Security, Óbuda University, Bécsi út 96/B, Budapest, 1034, Hungary
  - University Library, Óbuda University, Bécsi út 96/B, Budapest, 1034, Hungary
- László Gulácsi
  - HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Bécsi út 96/B, Budapest, 1034, Hungary
- Márta Péntek
  - HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Bécsi út 96/B, Budapest, 1034, Hungary
- Zsombor Zrubka
  - HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Bécsi út 96/B, Budapest, 1034, Hungary
5
Guo Q, Jiang G, Zhao Q, Long Y, Feng K, Gu X, Xu Y, Li Z, Huang J, Du L. Rapid review: A review of methods and recommendations based on current evidence. J Evid Based Med 2024;17:434-453. PMID: 38512942; DOI: 10.1111/jebm.12594.
Abstract
Rapid reviews (RRs) can accelerate the traditional systematic review (SR) process by simplifying or omitting steps using various shortcuts. With the increasing popularity of RRs, numerous shortcuts have emerged, but there is no consensus on how to choose the most appropriate ones. This study conducted a literature search in PubMed from inception to December 21, 2023, using terms such as "rapid review", "rapid assessment", "rapid systematic review", and "rapid evaluation". We also scanned reference lists and performed citation tracking of the included impact studies to identify additional eligible studies. We conducted a narrative synthesis of all RR approaches, shortcuts, and studies assessing their effectiveness at each stage of an RR. Based on the current evidence, we provide recommendations on utilizing certain shortcuts in RRs. Ultimately, we identified 185 studies summarizing RR approaches and shortcuts or evaluating their impact. There was relatively sufficient evidence to support the use of the following shortcuts in RRs: limiting inclusion to studies published in English; conducting abbreviated database searches (e.g., searching only PubMed/MEDLINE, Embase, and CENTRAL); omitting retrieval of grey literature; restricting the search timeframe to the most recent 20 years for reviews of medical interventions and the most recent 15 years for reviews of diagnostic test accuracy; and conducting a single screening by an experienced screener. To some extent, these shortcuts are also applicable to SRs. This study provides a reference for future RR researchers in selecting shortcuts, and it also presents a potential research topic for methodologists.
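Several of the supported shortcuts (single database, English only, restricted timeframe) can be expressed directly as search parameters. A minimal sketch using Biopython's Entrez interface; the query terms and email address are illustrative placeholders, not the paper's search strategy:

```python
from Bio import Entrez  # Biopython

Entrez.email = "reviewer@example.org"  # NCBI requires a contact address (placeholder)

# Abbreviated search shortcut: PubMed only, English only, most recent 20 years.
query = '("rapid review"[tiab] OR "rapid systematic review"[tiab]) AND english[lang]'
handle = Entrez.esearch(db="pubmed", term=query, datetype="pdat",
                        mindate="2004", maxdate="2023", retmax=5000)
pmids = Entrez.read(handle)["IdList"]
print(f"{len(pmids)} records retrieved")
```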
Affiliation(s)
- Qiong Guo
  - Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
  - West China Medical Publishers, West China Hospital, Sichuan University, Chengdu, P. R. China
- Guiyu Jiang
  - West China School of Public Health, Sichuan University, Chengdu, P. R. China
- Qingwen Zhao
  - West China School of Public Health, Sichuan University, Chengdu, P. R. China
- Youlin Long
  - Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
  - Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China
- Kun Feng
  - Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
  - Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China
- Xianlin Gu
  - Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
  - Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China
- Yihan Xu
  - Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
  - Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China
  - Center for Education of Medical Humanities, West China Hospital, Sichuan University, Chengdu, P. R. China
- Zhengchi Li
  - Center for Education of Medical Humanities, West China Hospital, Sichuan University, Chengdu, P. R. China
- Jin Huang
  - Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
- Liang Du
  - Innovation Institute for Integration of Medicine and Engineering, West China Hospital, Sichuan University, Chengdu, P. R. China
  - West China Medical Publishers, West China Hospital, Sichuan University, Chengdu, P. R. China
  - Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, P. R. China
6
Oerbekke MS, Elbers RG, van der Laan MJ, Hooft L. Designing tailored maintenance strategies for systematic reviews and clinical practice guidelines using the Portfolio Maintenance by Test-Treatment (POMBYTT) framework. BMC Med Res Methodol 2024;24:29. PMID: 38308228; PMCID: PMC10835980; DOI: 10.1186/s12874-024-02155-z.
Abstract
BACKGROUND Organizations face diverse contexts and requirements when updating and maintaining the portfolio, or pool, of systematic reviews or clinical practice guidelines they manage. We aimed to develop a comprehensive, theoretical framework to enable the design and tailoring of maintenance strategies for portfolios containing systematic reviews and guidelines. METHODS We employed a conceptual approach combined with a literature review. Components of the diagnostic test-treatment pathway used in clinical healthcare were transferred to develop a framework specifically for systematic review and guideline portfolio maintenance strategies. RESULTS We developed the Portfolio Maintenance by Test-Treatment (POMBYTT) framework, comprising diagnosis, staging, management, and monitoring components. To illustrate the framework's components and their elements, we provide examples from both a clinical healthcare test-treatment pathway and a clinical practice guideline maintenance scenario. Our literature review additionally supplied possible examples for the elements in the framework, such as detection variables, detection tests, and detection thresholds. We furthermore provide three example strategies using the framework, one of which is based on living-recommendation strategies. CONCLUSIONS The framework may support the design of maintenance strategies that contain multiple management options besides updating (e.g., withdrawing and archiving), even in the absence of the target condition. By making different choices for variables, tests, test protocols, indications, management options, and monitoring, organizations can tailor a maintenance strategy to suit specific contexts and needs. The framework's elements may also aid the design process by making the operational aspects of maintenance strategies explicit, which could likewise be helpful for end-users and other stakeholders of systematic reviews and clinical practice guidelines.
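The framework is conceptual, but its elements (detection variable, detection test, detection threshold, management options) map naturally onto a small data structure. One possible encoding, entirely our illustration rather than anything prescribed by the paper:

```python
from dataclasses import dataclass
from typing import Callable

MANAGEMENT_OPTIONS = ("update", "withdraw", "archive", "no action")

@dataclass
class DetectionTest:
    """One element of a maintenance strategy: a variable measured per review,
    compared against a threshold to flag possible out-of-dateness."""
    variable: str                    # e.g. "new trials since last search"
    threshold: float
    measure: Callable[[str], float]  # review/guideline ID -> observed value

    def is_positive(self, review_id: str) -> bool:
        return self.measure(review_id) >= self.threshold

# Hypothetical monitoring rule: flag a guideline once >= 5 new trials accrue.
new_trial_counts = {"guideline-042": 7}
test = DetectionTest("new trials since last search", 5.0,
                     lambda rid: new_trial_counts.get(rid, 0))
if test.is_positive("guideline-042"):
    print("flagged for staging; choose among", MANAGEMENT_OPTIONS)
```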
Affiliation(s)
- Michiel S Oerbekke
  - Cochrane Netherlands, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
  - Knowledge Institute of the Dutch Association of Medical Specialists, Utrecht, The Netherlands
- Roy G Elbers
  - Department of General Practice, Intellectual Disability Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Lotty Hooft
  - Cochrane Netherlands, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
7
Yao X, Kumar MV, Su E, Flores Miranda A, Saha A, Sussman J. Evaluating the efficacy of artificial intelligence tools for the automation of systematic reviews in cancer research: A systematic review. Cancer Epidemiol 2024;88:102511. PMID: 38071872; DOI: 10.1016/j.canep.2023.102511.
Abstract
This review evaluated the performance accuracy and workload savings of artificial intelligence (AI)-based automation tools, compared with human reviewers, in screening the medical literature for systematic reviews (SRs) of primary studies in cancer research, to gain insights into improving the efficiency of producing SRs. Medline, Embase, the Cochrane Library, and PROSPERO databases were searched from inception to November 30, 2022. Forward and backward literature searches were then completed, and experts in the field, including the authors of the included articles, were contacted for a thorough grey literature search. This SR was registered on PROSPERO (CRD42023384772). Among the 3947 studies retrieved, five met the preplanned selection criteria. These five studies evaluated four AI tools: Abstrackr (four studies), RobotAnalyst (one), EPPI-Reviewer (one), and DistillerSR (one). Without missing any of the final included citations, Abstrackr eliminated 20%-88% of titles and abstracts (time savings of 7-86 hours) and 59% of full texts (62 hours) from human review across four different cancer-related SRs. In comparison, RobotAnalyst (1% of titles and abstracts, 1 hour), EPPI-Reviewer (38% of titles and abstracts, 58 hours; 59% of full texts, 62 hours), and DistillerSR (42% of titles and abstracts, 22 hours) provided similar or lower work savings for single cancer-related SRs. AI-based automation tools exhibited promising but varying levels of accuracy and efficiency in screening the medical literature for SRs in the cancer field. Until further progress is made and thorough evaluations are conducted, AI tools should be used as supplementary aids rather than complete substitutes for human reviewers.
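The workload-saving figures above reduce to the fraction of records removed from human review and the screening time that fraction represents. A minimal sketch; the per-record screening time and the record counts are assumptions, not figures from the review:

```python
def workload_savings(n_total: int, n_eliminated: int,
                     seconds_per_record: float = 30.0) -> tuple[float, float]:
    """Fraction of records removed from human review and the hours saved.
    seconds_per_record is an assumed average title-abstract screening time."""
    fraction = n_eliminated / n_total
    hours = n_eliminated * seconds_per_record / 3600.0
    return fraction, hours

# Hypothetical title-abstract set: 10,000 records, 4,200 auto-eliminated.
frac, hrs = workload_savings(10_000, 4_200)
print(f"{frac:.0%} of records eliminated, ~{hrs:.0f} h saved")  # 42%, ~35 h
```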
Affiliation(s)
- Xiaomei Yao
  - Department of Oncology, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
  - Center for Clinical Practice Guideline Conduction and Evaluation, Children's Hospital of Fudan University, Shanghai, China
- Mithilesh V Kumar
  - Faculty of Engineering, McMaster University, Hamilton, ON, Canada
  - Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Esther Su
  - Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Ashirbani Saha
  - Department of Oncology, McMaster University, Hamilton, Ontario, Canada
  - Escarpment Cancer Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada
- Jonathan Sussman
  - Department of Oncology, McMaster University, Hamilton, Ontario, Canada
8
Burgard T, Bittermann A. Reducing Literature Screening Workload With Machine Learning. Zeitschrift für Psychologie 2023. DOI: 10.1027/2151-2604/a000509.
Abstract
In our era of accelerated accumulation of knowledge, manually screening the literature for eligibility is increasingly too labor-intensive to summarize the current state of knowledge in a timely manner. Recent advances in machine learning and natural language processing promise to reduce the screening workload by automatically detecting unseen references with a high probability of inclusion. As a variety of tools have been developed, the current review provides an overview of their characteristics and performance. A systematic search in various databases yielded 488 eligible reports, revealing 15 tools for screening automation that differed in methodology, features, and accessibility. For the review of screening-tool performance, 21 studies could be included. Compared with sampling records randomly, active screening with prioritization approximately halves the screening workload. However, a comparison of tools under equal or at least similar conditions is needed to derive clear recommendations.
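"Active screening with prioritization" means repeatedly retraining a relevance model on the records screened so far and surfacing the most likely includes next. A minimal sketch assuming a TF-IDF plus logistic-regression ranker, one common choice in such tools rather than any specific tool's algorithm:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def next_batch(screened_texts, screened_labels, pool_texts, batch_size=100):
    """Rank the unscreened pool by predicted relevance and return the indices
    to screen next. screened_labels must contain both includes (1) and excludes (0)."""
    vec = TfidfVectorizer(max_features=20_000, stop_words="english")
    clf = LogisticRegression(max_iter=1000, class_weight="balanced")
    clf.fit(vec.fit_transform(screened_texts), screened_labels)
    scores = clf.predict_proba(vec.transform(pool_texts))[:, 1]
    return np.argsort(scores)[::-1][:batch_size]  # highest-scoring records first
```

In an active-screening loop, the human labels each returned batch, the labels are appended to the training set, and the function is called again until a stopping rule triggers.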
Affiliation(s)
- Tanja Burgard
  - Research Synthesis Methods, Leibniz Institute for Psychology (ZPID), Trier, Germany
- André Bittermann
  - Big Data, Leibniz Institute for Psychology (ZPID), Trier, Germany
9
Hyams TC, Luo L, Hair B, Lee K, Lu Z, Seminara D. Machine Learning Approach to Facilitate Knowledge Synthesis at the Intersection of Liver Cancer, Epidemiology, and Health Disparities Research. JCO Clin Cancer Inform 2022;6:e2100129. PMID: 35623021; DOI: 10.1200/CCI.21.00129.
Abstract
PURPOSE Liver cancer is a global challenge, and disparities exist across multiple domains and throughout the disease continuum. However, liver cancer's global epidemiology and etiology are shifting, and the literature is rapidly evolving, challenging the synthesis of knowledge needed to identify research gaps and to develop research agendas focused on disparities. Machine learning (ML) techniques can semiautomate the literature review process and improve efficiency. In this study, we detail our approach and provide practical benchmarks for developing an ML approach to classify literature and extract data at the intersection of three fields: liver cancer, health disparities, and epidemiology. METHODS We performed a six-phase process comprising training (I), validating (II), confirming (III), and error analysis (IV) for an ML classifier. We then developed an extraction model (V) and applied it (VI) to the liver cancer literature identified through PubMed. We present precision, recall, F1, and accuracy metrics for the classifier and extraction models as appropriate for each phase of the process, along with the results of applying the extraction model. RESULTS With limited training data, we achieved a high degree of accuracy for both our classifier and our extraction model on the literature of epidemiologic liver cancer disparities research. The disparities concept was the most challenging to classify accurately, and concepts that appeared infrequently in our data set were the most difficult to extract. CONCLUSION We provide a roadmap for using ML to classify and extract comprehensive information from multidisciplinary literature. Our technique can be adapted and modified for other cancers or diseases where disparities persist.
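A classifier phase like the one described (train, validate, report precision/recall/F1/accuracy) can be prototyped in a few lines with scikit-learn. A sketch under the assumption of a TF-IDF linear model; load_labeled_abstracts is a hypothetical loader standing in for the study's corpus, not part of the published method:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical loader returning (abstract texts, 0/1 relevance labels).
texts, labels = load_labeled_abstracts()

X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=0)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2), LinearSVC())
model.fit(X_tr, y_tr)

# Prints per-class precision, recall, F1, and overall accuracy,
# mirroring the metrics reported in the abstract.
print(classification_report(y_te, model.predict(X_te)))
```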
Affiliation(s)
- Travis C Hyams
  - Office of the Director, Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Ling Luo
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
- Brionna Hair
  - Office of the Director, Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Kyubum Lee
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL
- Zhiyong Lu
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
- Daniela Seminara
  - Office of the Director, Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Bethesda, MD
10
A text-mining tool generated title-abstract screening workload savings: performance evaluation versus single-human screening. J Clin Epidemiol 2022;149:53-59. DOI: 10.1016/j.jclinepi.2022.05.017.
11
Hamel C, Hersi M, Kelly SE, Tricco AC, Straus S, Wells G, Pham B, Hutton B. Guidance for using artificial intelligence for title and abstract screening while conducting knowledge syntheses. BMC Med Res Methodol 2021;21:285. PMID: 34930132; PMCID: PMC8686081; DOI: 10.1186/s12874-021-01451-2.
Abstract
BACKGROUND Systematic reviews (SRs) are the cornerstone of evidence-based medicine. However, SRs are time-consuming, and there is growing demand to produce evidence more quickly while maintaining robust methods. In recent years, artificial intelligence and active machine learning (AML) have been implemented in several SR software applications. Because challenges in set-up and uncertainty about how best to use these technologies are barriers to their adoption, we provide different situations and considerations for knowledge synthesis teams to weigh when using artificial intelligence and AML for title and abstract screening. METHODS We retrospectively evaluated the implementation and performance of AML across a set of ten historically completed systematic reviews. Based on the findings from this work, and in light of the barriers we encountered and navigated during the past 24 months of using these tools prospectively in our research, we developed a series of practical recommendations for research teams seeking to implement AML tools for citation screening in their workflow. RESULTS We developed a seven-step framework and provide guidance for when and how to integrate artificial intelligence and AML into the title and abstract screening process. The steps are: (1) consulting with the knowledge user/expert panel; (2) developing the search strategy; (3) preparing your review team; (4) preparing your database; (5) building the initial training set; (6) ongoing screening; and (7) truncating screening. During steps 6 and/or 7, you may also choose to optimize your team by shifting some members to other review stages (e.g., full-text screening, data extraction). CONCLUSION Artificial intelligence and, more specifically, AML are well-developed tools for title and abstract screening and can be integrated into the screening process in several ways. Regardless of the method chosen, transparent reporting of these methods is critical for future studies evaluating artificial intelligence and AML.
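Step 7, truncating screening, requires an explicit stopping rule. One common heuristic, sketched here with illustrative numbers; the framework itself does not mandate a specific rule or threshold:

```python
def should_truncate(recent_decisions, window=200, tolerated_includes=0):
    """Stop screening once the last `window` ML-prioritized records contain
    at most `tolerated_includes` relevant ones (1 = include, 0 = exclude).
    Window size and tolerance are illustrative assumptions."""
    tail = recent_decisions[-window:]
    return len(tail) == window and sum(tail) <= tolerated_includes

# Example: 300 screened records whose last 200 decisions were all excludes.
decisions = [1] * 20 + [0] * 280
print(should_truncate(decisions))  # True -> truncate and stop screening
```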
Affiliation(s)
- Candyce Hamel
  - Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
- Mona Hersi
  - Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
- Shannon E. Kelly
  - Cardiovascular Research Methods Centre, University of Ottawa Heart Institute, Ottawa, Ontario, Canada
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, Canada
- Andrea C. Tricco
  - Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, ON, Canada
  - Epidemiology Division and Institute for Health, Management, and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Sharon Straus
  - Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, ON, Canada
  - Department of Medicine, University of Toronto, Toronto, ON, Canada
- George Wells
  - Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
  - Cardiovascular Research Methods Centre, University of Ottawa Heart Institute, Ottawa, Ontario, Canada
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, Canada
- Ba' Pham
  - Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, ON, Canada
- Brian Hutton
  - Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, Canada
12
Kharawala S, Mahajan A, Gandhi P. Artificial intelligence in systematic literature reviews: a case for cautious optimism. J Clin Epidemiol 2021;138:243-244. PMID: 33753227; DOI: 10.1016/j.jclinepi.2021.03.012.
13
Gates A, Gates M, DaRosa D, Elliott SA, Pillay J, Rahman S, Vandermeer B, Hartling L. Decoding semi-automated title-abstract screening: findings from a convenience sample of reviews. Syst Rev 2020;9:272. PMID: 33243276; PMCID: PMC7694314; DOI: 10.1186/s13643-020-01528-x.
Abstract
BACKGROUND We evaluated the benefits and risks of using the Abstrackr machine learning (ML) tool to semi-automate title-abstract screening, and explored whether Abstrackr's predictions varied by review- or study-level characteristics. METHODS For a convenience sample of 16 reviews for which adequate data were available to address our objectives (11 systematic reviews and 5 rapid reviews), we screened a 200-record training set in Abstrackr and downloaded the predicted relevance (relevant or irrelevant) of the remaining records. We retrospectively simulated the liberal-accelerated screening approach. We estimated the time savings and the proportion of records missed compared with dual independent screening. For reviews with pairwise meta-analyses, we evaluated changes to the pooled effects after removing the missed studies. We explored whether the tool's predictions varied by review- and study-level characteristics. RESULTS Using the ML-assisted liberal-accelerated approach, we wrongly excluded 0 to 3 (0 to 14%) records that were included in the final reports, but saved a median (IQR) of 26 (9, 42) hours of screening time. One missed study was included in eight pairwise meta-analyses in one systematic review, and the pooled effect for just one of those meta-analyses changed considerably (from MD (95% CI) -1.53 (-2.92, -0.15) to -1.17 (-2.70, 0.36)). Of 802 records in the final reports, 87% were correctly predicted as relevant. The correctness of the predictions did not differ by review type (systematic or rapid, P = 0.37) or intervention type (simple or complex, P = 0.47). The predictions were more often correct in reviews with multiple (89%) vs. single (83%) research questions (P = 0.01), or that included only trials (95%) vs. multiple designs (86%) (P = 0.003). At the study level, trials (91%), mixed-methods (100%), and qualitative (93%) studies were more often correctly predicted as relevant than observational studies (79%) or reviews (83%) (P = 0.0006). Studies at high or unclear (88%) vs. low (80%) risk of bias (P = 0.039), and those published more recently (mean (SD) 2008 (7) vs. 2006 (10), P = 0.02), were more often correctly predicted as relevant. CONCLUSION Our screening approach saved time and may be suitable where a limited risk of missing relevant records is acceptable. Several of our findings are paradoxical and require further study to fully understand the tasks to which ML-assisted screening is best suited. The findings should be interpreted in light of the fact that the protocol was prepared for the funder but not published a priori. Because we used a convenience sample, the findings may be prone to selection bias, and the results may not be generalizable to other samples of reviews, ML tools, or screening approaches. The small number of missed studies across reviews with pairwise meta-analyses hindered strong conclusions about the effect of missed studies on the results and conclusions of systematic reviews.
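The simulated ML-assisted liberal-accelerated rule and the "proportion missed" metric can be stated compactly: a record advances to full text unless both the single human screener and the tool exclude it. A minimal sketch of that simulation logic, our paraphrase rather than the authors' code:

```python
def ml_liberal_accelerated(human_votes, ml_predictions):
    """Semi-automated liberal-accelerated rule: a record advances to full text
    unless BOTH the single human screener and the ML tool exclude it."""
    return [h or m for h, m in zip(human_votes, ml_predictions)]

def proportion_missed(advanced, gold_includes):
    """Share of truly relevant records wrongly excluded at title-abstract level,
    judged against dual independent screening as the gold standard."""
    missed = sum(1 for a, g in zip(advanced, gold_includes) if g and not a)
    relevant = sum(gold_includes)
    return missed / relevant if relevant else 0.0

# Tiny worked example with hypothetical votes for five records:
human = [True, False, False, True, False]
ml    = [True, True,  False, False, False]
gold  = [True, True,  True,  True,  False]
print(proportion_missed(ml_liberal_accelerated(human, ml), gold))  # 0.25
```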
Affiliation(s)
- Allison Gates
  - Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Michelle Gates
  - Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Daniel DaRosa
  - Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Sarah A. Elliott
  - Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Jennifer Pillay
  - Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Sholeh Rahman
  - Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Ben Vandermeer
  - Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
- Lisa Hartling
  - Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada