1
Shemilt I, Arno A, Thomas J, Lorenc T, Khouja C, Raine G, Sutcliffe K, D'Souza P, Kwan I, Wright K, Sowden A. Cost-effectiveness of Microsoft Academic Graph with machine learning for automated study identification in a living map of coronavirus disease 2019 (COVID-19) research. Wellcome Open Res 2024; 6:210. PMID: 38686019; PMCID: PMC11056680; DOI: 10.12688/wellcomeopenres.17141.2.
Abstract
Background Identifying new, eligible studies for integration into living systematic reviews and maps usually relies on conventional Boolean updating searches of multiple databases and manual processing of the updated results. Automated searches of a single, comprehensive, continuously updated source, with adjunctive machine learning, could enable more efficient searching, selection and prioritisation workflows for updating (living) reviews and maps, though research is needed to establish this. Microsoft Academic Graph (MAG) is a potentially comprehensive single source which also contains metadata that can be used in machine learning to help identify eligible studies efficiently. This study sought to establish whether: (a) MAG was a sufficiently sensitive single source to maintain our living map of COVID-19 research; and (b) eligible records could be identified with an acceptably high level of specificity. Methods We conducted an eight-arm cost-effectiveness analysis to assess the costs, recall and precision of semi-automated workflows, incorporating MAG with adjunctive machine learning, for continually updating our living map. Resource use data (time use) were collected from information specialists and other researchers involved in map production. Our systematic review software, EPPI-Reviewer, was adapted to incorporate MAG and associated machine learning workflows, and was also used to collect data on recall, precision and manual screening workload. Results The semi-automated MAG-enabled workflow dominated conventional workflows in both the base case and sensitivity analyses. At one month, our MAG-enabled workflow with machine learning, active learning and fixed screening targets identified 469 additional eligible articles for inclusion in our living map, and cost £3,179 less per week, compared with conventional methods relying on Boolean searches of Medline and Embase.
Conclusions We were able to increase recall and coverage of a large living map whilst reducing its production costs. This finding is likely to be transferable to OpenAlex, MAG's successor database platform.
Affiliation(s)
- Ian Shemilt
- EPPI-Centre, UCL Social Research Institute, University College London, London WC1H 0NR, UK
- Anneliese Arno
- EPPI-Centre, UCL Social Research Institute, University College London, London WC1H 0NR, UK
- James Thomas
- EPPI-Centre, UCL Social Research Institute, University College London, London WC1H 0NR, UK
- Theo Lorenc
- Centre for Reviews and Dissemination, University of York, York, UK
- Claire Khouja
- Centre for Reviews and Dissemination, University of York, York, UK
- Gary Raine
- Centre for Reviews and Dissemination, University of York, York, UK
- Katy Sutcliffe
- EPPI-Centre, UCL Social Research Institute, University College London, London WC1H 0NR, UK
- Preethy D'Souza
- EPPI-Centre, UCL Social Research Institute, University College London, London WC1H 0NR, UK
- Irene Kwan
- EPPI-Centre, UCL Social Research Institute, University College London, London WC1H 0NR, UK
- Kath Wright
- Centre for Reviews and Dissemination, University of York, York, UK
- Amanda Sowden
- Centre for Reviews and Dissemination, University of York, York, UK
2
Hanegraaf P, Wondimu A, Mosselman JJ, de Jong R, Abogunrin S, Queiros L, Lane M, Postma MJ, Boersma C, van der Schans J. Inter-reviewer reliability of human literature reviewing and implications for the introduction of machine-assisted systematic reviews: a mixed-methods review. BMJ Open 2024; 14:e076912. PMID: 38508610; PMCID: PMC10952858; DOI: 10.1136/bmjopen-2023-076912.
Abstract
OBJECTIVES Our main objective was to assess the inter-reviewer reliability (IRR) reported in published systematic literature reviews (SLRs). Our secondary objective was to determine the IRR expected by authors of SLRs for both human and machine-assisted reviews. METHODS We performed a review of SLRs of randomised controlled trials using the PubMed and Embase databases. Data were extracted on IRR, by means of Cohen's kappa score, for abstract/title screening, full-text screening and data extraction, together with review team size and items screened; review quality was assessed with A MeaSurement Tool to Assess systematic Reviews 2 (AMSTAR 2). In addition, we surveyed authors of SLRs on their expectations of machine learning automation and human-performed IRR in SLRs. RESULTS After removal of duplicates, 836 articles were screened by title and abstract, and 413 were screened in full text. In total, 45 eligible articles were included. The average Cohen's kappa score reported was 0.82 (SD=0.11, n=12) for abstract screening, 0.77 (SD=0.18, n=14) for full-text screening, 0.86 (SD=0.07, n=15) for the whole screening process and 0.88 (SD=0.08, n=16) for data extraction. No association was observed between the IRR reported and review team size, items screened or quality of the SLR. The survey (n=37) showed overlapping expected Cohen's kappa values, ranging from approximately 0.6 to 0.9, for either human or machine learning-assisted SLRs. No trend was observed between reviewer experience and expected IRR. Authors expect a higher-than-average IRR for machine learning-assisted SLRs compared with human-based SLRs, in both screening and data extraction. CONCLUSION It is currently uncommon to report IRR in the scientific literature for either human or machine learning-assisted SLRs. This mixed-methods review gives initial guidance on the human IRR benchmark, which could be used as a minimal threshold for IRR in machine learning-assisted SLRs.
PROSPERO REGISTRATION NUMBER CRD42023386706.
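Cohen's kappa, the IRR statistic summarised above, corrects raw inter-reviewer agreement for the agreement expected by chance. A minimal Python sketch (the screening decisions below are invented for illustration, not data from the review):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' paired categorical decisions."""
    n = len(rater_a)
    # Observed agreement: proportion of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Two reviewers screening ten abstracts (1 = include, 0 = exclude):
# they disagree on one record only.
a = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
b = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
kappa = cohens_kappa(a, b)  # 0.9 observed vs. 0.54 chance agreement
```

Raw agreement here is 90%, yet kappa is only about 0.78, below the 0.82 average reported for abstract screening: chance agreement is high when most records are excluded.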
Affiliation(s)
- Marie Lane
- F. Hoffmann-La Roche, Basel, Switzerland
- Maarten J Postma
- Health-Ecore, Zeist, The Netherlands
- Unit of Global Health, Department of Health Sciences, University Medical Center Groningen, Groningen, The Netherlands
- Department of Economics, Econometrics & Finance, University of Groningen, Groningen, The Netherlands
- Cornelis Boersma
- Health-Ecore, Zeist, The Netherlands
- Unit of Global Health, Department of Health Sciences, University Medical Center Groningen, Groningen, The Netherlands
- Department of Management Sciences, Open University, Heerlen, The Netherlands
- Jurjen van der Schans
- Health-Ecore, Zeist, The Netherlands
- Unit of Global Health, Department of Health Sciences, University Medical Center Groningen, Groningen, The Netherlands
- Department of Economics, Econometrics & Finance, University of Groningen, Groningen, The Netherlands
- Department of Management Sciences, Open University, Heerlen, The Netherlands
3
Hocking L, Parkinson S, Adams A, Molding Nielsen E, Ang C, de Carvalho Gomes H. Overcoming the challenges of using automated technologies for public health evidence synthesis. Euro Surveill 2023; 28:2300183. PMID: 37943502; PMCID: PMC10636742; DOI: 10.2807/1560-7917.es.2023.28.45.2300183.
Abstract
Many organisations struggle to keep pace with public health evidence due to the volume of published literature and the length of time it takes to conduct literature reviews. New technologies that automate parts of the evidence synthesis process can enable reviews to be conducted more quickly and efficiently, better providing up-to-date evidence for public health decision making. To date, automated approaches have seldom been used in public health due to significant barriers to their adoption. In this Perspective, we reflect on the findings of a study exploring experiences of adopting automated technologies to conduct evidence reviews within the public health sector. The study, funded by the European Centre for Disease Prevention and Control, consisted of a literature review and qualitative data collection from public health organisations and researchers in the field. We specifically focus on outlining the challenges associated with the adoption of automated approaches, together with potential solutions and actions that can be taken to mitigate them. We explore these in relation to actions that can be taken by tool developers (e.g. improving tool performance and transparency), public health organisations (e.g. developing staff skills, encouraging collaboration) and funding bodies/the wider research system (e.g. researchers, funding bodies, academic publishers and scholarly journals).
4
Sutton A, O'Keefe H, Johnson EE, Marshall C. A mapping exercise using automated techniques to develop a search strategy to identify systematic review tools. Res Synth Methods 2023; 14:874-881. PMID: 37669905; DOI: 10.1002/jrsm.1665.
Abstract
The Systematic Review Toolbox aims to provide a web-based catalogue of tools that support various tasks within the systematic review and wider evidence synthesis process. Identifying publications about specific systematic review tools is currently challenging, leading to a high screening burden for few eligible records. We aimed to develop a search strategy that could be run regularly and automatically to identify eligible records for the SR Toolbox, thus reducing time on task and burden for those involved. We undertook a mapping exercise to identify the PubMed IDs of papers indexed within the SR Toolbox. We then used the Yale MeSH Analyser and the Visualisation of Similarities viewer (VOSviewer) text-mining software to identify the most commonly used MeSH terms and text words within the eligible records. These MeSH terms and text words were combined using Boolean operators into a search strategy for Ovid MEDLINE. Prior to the mapping exercise and search strategy development, 81 software tools and 55 'Other' tools were included within the SR Toolbox. Since the search strategy and its corresponding auto-alert in MEDLINE were implemented, 146 tools have been added. Developing a search strategy based on a mapping exercise is an effective way of identifying new tools to support the systematic review process. Further research could help prioritise records for screening to reduce reviewer burden further, and adapt the strategy for disciplines beyond healthcare.
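The core idea — derive a Boolean search from terms that recur across known-eligible records — can be sketched as below. The helper names and sample titles are hypothetical; the actual study used MeSH-term and text-word frequencies from the Yale MeSH Analyser and VOSviewer:

```python
import re
from collections import Counter

def frequent_terms(titles, top_n=5, min_len=4):
    """Rank words appearing across known-eligible record titles;
    the most frequent become candidate search terms."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if len(w) >= min_len)
    return [term for term, _ in counts.most_common(top_n)]

def boolean_query(terms):
    """OR the candidate terms into a single search string."""
    return " OR ".join(f'"{t}"' for t in terms)

titles = [
    "A tool for systematic review screening",
    "Automated screening software for systematic reviews",
    "Machine learning for systematic review automation",
]
query = boolean_query(frequent_terms(titles, top_n=3))
```

A real strategy would also map the frequent terms to MeSH headings and field tags before setting up the MEDLINE auto-alert.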
Affiliation(s)
- Anthea Sutton
- Sheffield Centre for Health and Related Research, School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
- Hannah O'Keefe
- NIHR Innovation Observatory, Newcastle University, Newcastle, UK
- Eugenie Evelynne Johnson
- NIHR Innovation Observatory, Newcastle University, Newcastle, UK
- Population Health Sciences Institute, Newcastle University, Newcastle, UK
5
Muller AE, Berg RC, Meneses-Echavez JF, Ames HMR, Borge TC, Jardim PSJ, Cooper C, Rose CJ. The effect of machine learning tools for evidence synthesis on resource use and time-to-completion: protocol for a retrospective pilot study. Syst Rev 2023; 12:7. PMID: 36650579; PMCID: PMC9843684; DOI: 10.1186/s13643-023-02171-y.
Abstract
BACKGROUND Machine learning (ML) tools exist that can reduce or replace human activities in repetitive or complex tasks. Yet, ML is underutilized within evidence synthesis, despite the steadily growing rate of primary study publication and the need to periodically update reviews to reflect new evidence. Underutilization may be partially explained by a paucity of evidence on how ML tools can reduce resource use and time-to-completion of reviews. METHODS This protocol describes how we will answer two research questions using a retrospective study design: Is there a difference in resources used to produce reviews using recommended ML versus not using ML, and is there a difference in time-to-completion? We will also compare recommended ML use to non-recommended ML use that merely adds ML use to existing procedures. We will retrospectively include all reviews conducted at our institute from 1 August 2020, corresponding to the commission of the first review in our institute that used ML. CONCLUSION The results of this study will allow us to quantitatively estimate the effect of ML adoption on resource use and time-to-completion, providing our organization and others with better information to make high-level organizational decisions about ML.
Affiliation(s)
- Chris Cooper
- Bristol Medical School, University of Bristol, Bristol, UK
- Department of Clinical, Educational and Health Psychology, University College London, London, UK
6
Johnson EE, O'Keefe H, Sutton A, Marshall C. The Systematic Review Toolbox: keeping up to date with tools to support evidence synthesis. Syst Rev 2022; 11:258. PMID: 36457048; PMCID: PMC9713957; DOI: 10.1186/s13643-022-02122-z.
Abstract
BACKGROUND The Systematic Review (SR) Toolbox was developed in 2014 to collate tools that can be used to support the systematic review process. Since its inception, the breadth of evidence synthesis methodologies has expanded greatly. This work describes the process of updating the SR Toolbox in 2022 to reflect these changes in evidence synthesis methodology. We also briefly analysed the included tools and guidance to identify potential gaps in what is currently available to researchers. METHODS We manually extracted all guidance and software tools contained within the SR Toolbox in February 2022. A single reviewer, with a second checking a proportion, extracted and analysed information from records contained within the SR Toolbox using Microsoft Excel. Using this spreadsheet and Microsoft Access, the SR Toolbox was updated to reflect the expansion of evidence synthesis methodologies, and a brief analysis was conducted. RESULTS The updated version of the SR Toolbox was launched on 13 May 2022, with 235 software tools and 112 guidance documents included. Regarding review families, most software tools (N = 223) and guidance documents (N = 78) were applicable to systematic reviews. However, there were fewer tools and guidance documents applicable to reviews of reviews (N = 66 and N = 22, respectively), and qualitative reviews were less well served by guidance documents (N = 19). In terms of review production stages, most guidance documents concerned quality assessment (N = 70), while most software tools related to searching and synthesis (N = 84 and N = 82, respectively). There appears to be a paucity of tools and guidance relating to stakeholder engagement (N = 2 and N = 3, respectively). CONCLUSIONS The SR Toolbox provides a platform for those undertaking evidence syntheses to locate guidance and software tools supporting different aspects of the review process across multiple review types. However, this work has also identified potential gaps in guidance and software that could inform future research.
Affiliation(s)
- Eugenie Evelynne Johnson
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- NIHR Innovation Observatory, Newcastle University, Newcastle upon Tyne, UK
- Hannah O'Keefe
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- NIHR Innovation Observatory, Newcastle University, Newcastle upon Tyne, UK
- Anthea Sutton
- School of Health and Related Research (ScHARR), The University of Sheffield, Sheffield, UK
7
Grisales-Aguirre AM, Figueroa-Vallejo CJ. Modelado de tópicos aplicado al análisis del papel del aprendizaje automático en revisiones sistemáticas [Topic modelling applied to analysing the role of machine learning in systematic reviews]. Revista de Investigación, Desarrollo e Innovación 2022. DOI: 10.19053/20278306.v12.n2.2022.15271.
Abstract
The aim of this research was to analyse the role of machine learning in systematic literature reviews. Topic modelling, a Natural Language Processing technique, was applied to a set of titles and abstracts retrieved from the Scopus database. Specifically, Latent Dirichlet Allocation (LDA) was used to discover and understand the underlying themes in the collection of documents. The results showed the usefulness of this technique in exploratory literature review, as it allows results to be grouped by theme. It also made it possible to identify the specific areas and activities in which machine learning has been applied most, with respect to literature reviews. We conclude that LDA is an easy-to-use strategy whose results make it possible to approach a large collection of documents systematically and coherently, notably reducing review time.
8
Grbin L, Nichols P, Russell F, Fuller-Tyszkiewicz M, Olsson CA. The Development of a Living Knowledge System and Implications for Future Systematic Searching. Journal of the Australian Library and Information Association 2022. DOI: 10.1080/24750158.2022.2087954.
Affiliation(s)
- Lisa Grbin
- Faculty of Health Library Services, Deakin University, Geelong, Australia
- Peter Nichols
- Library Research Services, Deakin University, Geelong, Australia
- Fiona Russell
- Faculty of Health Library Services, Deakin University, Geelong, Australia
- Matthew Fuller-Tyszkiewicz
- School of Psychology, Deakin University, Geelong, Australia
- Centre for Social and Early Emotional Development, Deakin University, Geelong, Australia
- Craig A. Olsson
- School of Psychology, Deakin University, Geelong, Australia
- Centre for Social and Early Emotional Development, Deakin University, Geelong, Australia
9
Hartling L, Gates A. Friend or Foe? The Role of Robots in Systematic Reviews. Ann Intern Med 2022; 175:1045-1046. PMID: 35635849; DOI: 10.7326/m22-1439.
Affiliation(s)
- Lisa Hartling
- Alberta Research Centre for Health Evidence, Department of Pediatrics, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, Alberta, Canada
- Allison Gates
- Alberta Research Centre for Health Evidence, Department of Pediatrics, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, Alberta, Canada
10
Arno A, Thomas J, Wallace B, Marshall IJ, McKenzie JE, Elliott JH. Accuracy and Efficiency of Machine Learning-Assisted Risk-of-Bias Assessments in "Real-World" Systematic Reviews: A Noninferiority Randomized Controlled Trial. Ann Intern Med 2022; 175:1001-1009. PMID: 35635850; DOI: 10.7326/m22-0092.
Abstract
BACKGROUND Automation is a proposed solution to the increasing difficulty of maintaining up-to-date, high-quality health evidence. Evidence assessing the effectiveness of semiautomated data synthesis, such as risk-of-bias (RoB) assessment, is lacking. OBJECTIVE To determine whether RobotReviewer-assisted RoB assessments are noninferior in accuracy and efficiency to assessments conducted with human effort only. DESIGN Two-group, parallel, noninferiority, randomized trial (Monash Research Office Project 11256). SETTING Health-focused systematic reviews using Covidence. PARTICIPANTS Systematic reviewers, who had not previously used RobotReviewer, completing Cochrane RoB assessments between February 2018 and May 2020. INTERVENTION In the intervention group, reviewers received an RoB form prepopulated by RobotReviewer; in the comparison group, reviewers received a blank form. Studies were assigned in a 1:1 ratio via simple randomization to receive RobotReviewer assistance for either Reviewer 1 or Reviewer 2. Participants were blinded to study allocation before starting work on each RoB form. MEASUREMENTS Co-primary outcomes were the accuracy of individual reviewer RoB assessments and the person-time required to complete individual assessments. Domain-level RoB accuracy was a secondary outcome. RESULTS Of the 15 recruited review teams, 7 completed the trial (145 included studies). Integration of RobotReviewer resulted in noninferior overall RoB assessment accuracy (risk difference, -0.014 [95% CI, -0.093 to 0.065]; intervention group: 88.8% accurate assessments; control group: 90.2% accurate assessments). Data were inconclusive for the person-time outcome (RobotReviewer saved 1.40 minutes [CI, -5.20 to 2.41 minutes]). LIMITATION Variability in user behavior and a limited number of assessable reviews led to an imprecise estimate of the time outcome. CONCLUSION In health-related systematic reviews, RoB assessments conducted with RobotReviewer assistance are noninferior in accuracy to those conducted without RobotReviewer assistance. PRIMARY FUNDING SOURCE University College London and Monash University.
Affiliation(s)
- Anneliese Arno
- EPPI-Centre, University College London, London, United Kingdom, and School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
- James Thomas
- EPPI-Centre, University College London, London, United Kingdom
- Byron Wallace
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts
- Iain J Marshall
- School of Population Health and Environmental Sciences, King's College London, London, United Kingdom
- Joanne E McKenzie
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
- Julian H Elliott
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
11
Jardim PSJ, Rose CJ, Ames HM, Echavez JFM, Van de Velde S, Muller AE. Automating risk of bias assessment in systematic reviews: a real-time mixed methods comparison of human researchers to a machine learning system. BMC Med Res Methodol 2022; 22:167. PMID: 35676632; PMCID: PMC9174024; DOI: 10.1186/s12874-022-01649-y.
Abstract
Background Machine learning and automation are increasingly used to make the evidence synthesis process faster and more responsive to policymakers' needs. In systematic reviews of randomized controlled trials (RCTs), risk of bias assessment is a resource-intensive task that typically requires two trained reviewers. One function of RobotReviewer, an off-the-shelf machine learning system, is automated risk of bias assessment. Methods We assessed the feasibility of adopting RobotReviewer within a national public health institute using a randomized, real-time, user-centered study. The study included 26 RCTs and six reviewers from two projects examining health and social interventions. We randomized these studies to one of two RobotReviewer platforms. We operationalized feasibility as accuracy, time use, and reviewer acceptability. We measured accuracy by the number of corrections made by human reviewers (either to automated assessments or to another human reviewer's assessments). We explored acceptability through group discussions and individual email responses after presenting the quantitative results. Results Reviewers were equally likely to accept judgement by RobotReviewer as each other's judgement during the consensus process when measured dichotomously; risk ratio 1.02 (95% CI 0.92 to 1.13; p = 0.33). We were not able to compare time use. The acceptability of the program by researchers was mixed: less experienced reviewers were generally more positive, saw more benefits, and were able to use the tool more flexibly, yet reviewers overall positioned human input and human-to-human interaction as superior to even a semi-automation of this process. Conclusion Despite being presented with evidence of RobotReviewer's performance being equal to that of humans, participating reviewers were not interested in modifying standard procedures to include automation. If further studies confirm equal accuracy and reduced time compared to manual practices, the benefits of RobotReviewer may support its future implementation as one of two assessors, despite reviewer ambivalence. Future research should study barriers to adopting automated tools and how highly educated and experienced researchers can adapt to a job market increasingly challenged by new technologies. Supplementary information is available in the online version at 10.1186/s12874-022-01649-y.
Affiliation(s)
- Christopher James Rose
- Division for Health Services, Norwegian Institute of Public Health, Postboks 222 Skøyen, 0213 Oslo, Norway
- Heather Melanie Ames
- Division for Health Services, Norwegian Institute of Public Health, Postboks 222 Skøyen, 0213 Oslo, Norway
- Jose Francisco Meneses Echavez
- Division for Health Services, Norwegian Institute of Public Health, Postboks 222 Skøyen, 0213 Oslo, Norway
- Facultad de Cultura Física, Deporte y Recreación, Cra. 9 #51-11, Bogotá, Colombia
- Stijn Van de Velde
- Division for Health Services, Norwegian Institute of Public Health, Postboks 222 Skøyen, 0213 Oslo, Norway
- Ashley Elizabeth Muller
- Division for Health Services, Norwegian Institute of Public Health, Postboks 222 Skøyen, 0213 Oslo, Norway
12
Hamel C, Hersi M, Kelly SE, Tricco AC, Straus S, Wells G, Pham B, Hutton B. Guidance for using artificial intelligence for title and abstract screening while conducting knowledge syntheses. BMC Med Res Methodol 2021; 21:285. PMID: 34930132; PMCID: PMC8686081; DOI: 10.1186/s12874-021-01451-2.
Abstract
BACKGROUND Systematic reviews are the cornerstone of evidence-based medicine. However, systematic reviews are time consuming, and there is growing demand to produce evidence more quickly while maintaining robust methods. In recent years, artificial intelligence and active machine learning (AML) have been implemented in several systematic review (SR) software applications. As some of the barriers to adoption of new technologies are the challenges of set-up and of how best to use these technologies, we provide different situations and considerations for knowledge synthesis teams to consider when using artificial intelligence and AML for title and abstract screening. METHODS We retrospectively evaluated the implementation and performance of AML across a set of ten historically completed systematic reviews. Based upon the findings from this work, and in consideration of the barriers we have encountered and navigated during the past 24 months in using these tools prospectively in our research, we discussed and developed a series of practical recommendations for research teams seeking to implement AML tools for citation screening in their workflow. RESULTS We developed a seven-step framework and provide guidance for when and how to integrate artificial intelligence and AML into the title and abstract screening process. The steps are: (1) consulting with the knowledge user/expert panel; (2) developing the search strategy; (3) preparing your review team; (4) preparing your database; (5) building the initial training set; (6) ongoing screening; and (7) truncating screening. During steps 6 and/or 7, you may also choose to optimize your team by shifting some members to other review stages (e.g., full-text screening, data extraction). CONCLUSION Artificial intelligence and, more specifically, AML are well-developed tools for title and abstract screening and can be integrated into the screening process in several ways. Regardless of the method chosen, transparent reporting of these methods is critical for future studies evaluating artificial intelligence and AML.
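Steps 5-7 of the framework (build a seed training set, screen in priority order, truncate) can be sketched as below. A toy word-count scorer stands in for a real AML classifier; every name and record here is hypothetical:

```python
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def score(record, inc, exc):
    """Toy relevance score: net word evidence from screened includes
    minus excludes (a stand-in for a real classifier)."""
    return sum(inc[w] - exc[w] for w in tokens(record))

def active_screen(records, oracle, seed_includes, seed_excludes, stop_after=2):
    """Re-rank after every decision, always screen the highest-scoring
    record next, and truncate screening after `stop_after` consecutive
    excludes (a fixed stopping rule)."""
    inc = Counter(w for r in seed_includes for w in tokens(r))
    exc = Counter(w for r in seed_excludes for w in tokens(r))
    remaining, included, misses = list(records), [], 0
    while remaining and misses < stop_after:
        remaining.sort(key=lambda r: score(r, inc, exc), reverse=True)
        record = remaining.pop(0)
        if oracle(record):  # the human reviewer's include/exclude decision
            included.append(record)
            inc.update(tokens(record))
            misses = 0
        else:
            exc.update(tokens(record))
            misses += 1
    return included

records = [
    "screening abstracts with machine learning",
    "automated screening tools",
    "cardiac surgery outcomes",
    "surgery recovery outcomes",
    "hospital finance report",
]
included = active_screen(
    records,
    oracle=lambda r: "screening" in r,
    seed_includes=["screening trials"],
    seed_excludes=["surgery outcomes"],
)
```

Each human decision re-ranks the remaining records, so eligible records surface early and screening can stop after a fixed run of consecutive excludes, which is what makes the shifting of team members in steps 6-7 possible.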
Affiliation(s)
- Candyce Hamel
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
- Mona Hersi
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
- Shannon E. Kelly
- Cardiovascular Research Methods Centre, University of Ottawa Heart Institute, Ottawa, Ontario, Canada
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, Canada
- Andrea C. Tricco
- Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Toronto, Ontario, Canada
- Epidemiology Division and Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Sharon Straus
- Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Toronto, Ontario, Canada
- Department of Medicine, University of Toronto, Toronto, Ontario, Canada
- George Wells
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
- Cardiovascular Research Methods Centre, University of Ottawa Heart Institute, Ottawa, Ontario, Canada
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, Canada
- Ba’ Pham
- Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Toronto, Ontario, Canada
- Brian Hutton
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, Canada
13
Muller AE, Ames HMR, Jardim PSJ, Rose CJ. Machine learning in systematic reviews: Comparing automated text clustering with Lingo3G and human researcher categorization in a rapid review. Res Synth Methods 2021; 13:229-241. [PMID: 34919321 DOI: 10.1002/jrsm.1541] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2021] [Revised: 11/08/2021] [Accepted: 12/13/2021] [Indexed: 11/08/2022]
Abstract
Systematic reviews are resource-intensive. The machine learning tools being developed mostly focus on the study identification process, but tools to assist in analysis and categorization are also needed. One possibility is unsupervised automatic text clustering, in which each study is automatically assigned to one or more meaningful clusters. Our main aim was to assess the usefulness of an automated clustering method, Lingo3G, in categorizing studies in a simplified rapid review, and then to compare the performance (precision and recall) of this method with manual categorization. We randomly assigned all 128 studies in a review to be coded by a human researcher blinded to cluster assignment (mimicking two independent researchers) or by a human researcher non-blinded to cluster assignment (mimicking one researcher checking another's work). We compared the time use, precision and recall of manual categorization versus automated clustering. Both automated clustering and manual categorization organized studies by population and intervention/context. Automated clustering failed to identify two manually identified categories but identified one additional category not identified by the human researcher. We estimate that automated clustering has similar precision to both blinded and non-blinded researchers (e.g., 88% vs. 89%) but higher recall (e.g., 89% vs. 84%). Manual categorization required 49% more time than automated clustering. Using a specific clustering algorithm, automated clustering can help with categorizing studies, and with identifying patterns across them, in simpler systematic reviews. We found that the clustering was sensitive enough to group studies according to linguistic differences that often corresponded to the manual categories.
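The precision/recall comparison described above can be made concrete with a small sketch. The study-to-category assignments below are hypothetical, and the micro-averaged definition over (study, category) pairs is one reasonable choice, not necessarily the exact metric used in the paper.

```python
# Hedged sketch: precision and recall of automated cluster assignments
# against manual categories. All assignments below are invented.

def precision_recall(auto, manual):
    """Micro-averaged precision/recall over (study, category) pairs."""
    auto_pairs = {(s, c) for s, cats in auto.items() for c in cats}
    manual_pairs = {(s, c) for s, cats in manual.items() for c in cats}
    tp = len(auto_pairs & manual_pairs)      # pairs both methods agree on
    return tp / len(auto_pairs), tp / len(manual_pairs)

# Hypothetical data: study s2 is mis-clustered relative to the manual coding
manual = {"s1": {"adults"}, "s2": {"children"}, "s3": {"adults", "school"}}
auto   = {"s1": {"adults"}, "s2": {"adults"},   "s3": {"adults", "school"}}

p, r = precision_recall(auto, manual)
print(round(p, 2), round(r, 2))  # 3 of 4 automated pairs and 3 of 4 manual pairs match
```

Note that a study can carry several category labels, which is why the comparison is over (study, category) pairs rather than single labels per study.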
Affiliation(s)
- Heather Melanie R Ames
- Norwegian Institute of Public Health, Skøyen, Norway
- Cochrane Consumer and Communication Group, Centre for Health Communication and Participation, School of Psychology and Public Health, La Trobe University, Bundoora, Victoria, Australia
14
Khalil H, Ameen D, Zarnegar A. Tools to support the automation of systematic reviews: A scoping review. J Clin Epidemiol 2021; 144:22-42. [PMID: 34896236 DOI: 10.1016/j.jclinepi.2021.12.005] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2021] [Revised: 11/09/2021] [Accepted: 12/02/2021] [Indexed: 11/19/2022]
Abstract
OBJECTIVE The objectives of this scoping review are to assess the reliability and validity of the available tools, and to identify their limitations and any recommendations to further improve their use. STUDY DESIGN The JBI scoping review methodology was followed to map the literature published on the challenges of, and solutions for, conducting evidence synthesis. RESULTS A total of 47 publications were included in the review. The scoping review identified that LitSuggest, Rayyan, Abstrackr, BIBOT, R software, RobotAnalyst, DistillerSR, ExaCT and NetMetaXL have potential to be used for the automation of systematic reviews. However, they are not without limitations. The review also identified other studies that employed algorithms that have not yet been developed into user-friendly tools. Some of these algorithms showed high validity and reliability, but their use is conditional on user knowledge of computer science and algorithms. CONCLUSION Abstract screening has reached maturity; data extraction is still an active area. Developing methods to semi-automate different steps of evidence synthesis via machine learning remains an important research direction. It is also important to move from the research prototypes currently available to professionally maintained platforms.
Affiliation(s)
- Hanan Khalil
- School of Psychology and Public Health, Department of Public Health, La Trobe University, Melbourne Campus, Victoria, Australia
- Daniel Ameen
- Faculty of Medicine, Nursing and Health Sciences, Monash University, Wellington Road, Clayton, Victoria 3168, Australia
- Armita Zarnegar
- School of Psychology and Public Health, Department of Public Health, La Trobe University, Melbourne Campus, Victoria, Australia
- School of Science, Computing and Engineering Technologies, Swinburne University of Technology, Melbourne, Australia
15
Shemilt I, Arno A, Thomas J, Lorenc T, Khouja C, Raine G, Sutcliffe K, Preethy D, Kwan I, Wright K, Sowden A. Cost-effectiveness of Microsoft Academic Graph with machine learning for automated study identification in a living map of coronavirus disease 2019 (COVID-19) research. Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.17141.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
Background: Conventionally, searching for eligible articles to include in systematic reviews and maps of research has relied primarily on information specialists conducting Boolean searches of multiple databases and manually processing the results, including deduplication between these multiple sources. Searching one, comprehensive source, rather than multiple databases, could save time and resources. Microsoft Academic Graph (MAG) is potentially such a source, containing a network graph structure which provides metadata that can be exploited in machine learning processes. Research is needed to establish the relative advantage of using MAG as a single source, compared with conventional searches of multiple databases. This study sought to establish whether: (a) MAG is sufficiently comprehensive to maintain our living map of coronavirus disease 2019 (COVID-19) research; and (b) eligible records can be identified with an acceptably high level of specificity. Methods: We conducted a pragmatic, eight-arm cost-effectiveness analysis (simulation study) to assess the costs, recall and precision of our semi-automated MAG-enabled workflow versus conventional searches of MEDLINE and Embase (with and without machine learning classifiers, active learning and/or fixed screening targets) for maintaining a living map of COVID-19 research. Resource use data (time use) were collected from information specialists and other researchers involved in map production. Results: MAG-enabled workflows dominated MEDLINE-Embase workflows in both the base case and sensitivity analyses. At one month (base case analysis) our MAG-enabled workflow with machine learning, active learning and fixed screening targets identified n=469 more new, eligible articles for inclusion in our living map – and cost £3,179 GBP ($5,691 AUD) less – than conventional MEDLINE-Embase searches without any automation or fixed screening targets. 
Conclusions: MAG-enabled continuous surveillance workflows have potential to revolutionise study identification methods for living maps, specialised registers, databases of research studies and/or collections of systematic reviews, by increasing their recall and coverage, whilst reducing production costs.
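The "dominance" result reported above has a simple structure: one workflow dominates another when it identifies at least as many eligible articles while costing less. The sketch below encodes that check; the absolute figures are invented, with only the +469 articles and £3,179-per-week differences taken from the abstract.

```python
# Dominance check between two screening workflows. The baseline figures
# are hypothetical; only the deltas (+469 articles, -£3,179/week) come
# from the base case analysis reported above.

def dominates(a, b):
    """Workflow `a` dominates `b` if it finds at least as many eligible
    articles while costing strictly less per week."""
    return a["articles"] >= b["articles"] and a["weekly_cost"] < b["weekly_cost"]

conventional = {"articles": 100, "weekly_cost": 5000}          # assumed baseline
mag_ml = {"articles": 100 + 469, "weekly_cost": 5000 - 3179}   # reported deltas

print(dominates(mag_ml, conventional))  # True
```

When one option dominates, no willingness-to-pay threshold is needed: it is preferable on both the effect and the cost axis at once, which is why the sensitivity analyses above could reach the same conclusion without a cost-per-article ratio.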
16
Scott AM, Forbes C, Clark J, Carter M, Glasziou P, Munn Z. Systematic review automation tools improve efficiency but lack of knowledge impedes their adoption: a survey. J Clin Epidemiol 2021; 138:80-94. [PMID: 34242757 DOI: 10.1016/j.jclinepi.2021.06.030] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2021] [Revised: 06/23/2021] [Accepted: 06/30/2021] [Indexed: 01/12/2023]
Abstract
OBJECTIVE We investigated systematic review automation tool use by systematic reviewers, health technology assessors and clinical guideline developers. STUDY DESIGN AND SETTING An online, 16-question survey was distributed across several evidence synthesis, health technology assessment and guideline development organizations. We asked respondents which tools they use and have abandoned, how often and when they use the tools, their perceived time savings and accuracy, and which new tools they desire. Descriptive statistics were used to report the results. RESULTS A total of 253 respondents completed the survey; 89% had used systematic review automation tools, most frequently while screening (79%). Respondents' "top 3" tools included Covidence (45%), RevMan (35%), and Rayyan and GRADEPro (both 22%); the most commonly abandoned were Rayyan (19%), Covidence (15%), DistillerSR (14%) and RevMan (13%). Tools saved time (80%) and increased accuracy (54%). Respondents taught themselves how to use the tools (72%); lack of knowledge was the most frequent barrier to tool adoption (51%). New tool development was suggested for the searching and data extraction stages. CONCLUSION Automation tools will likely play an increasingly important role in producing high-quality and timely reviews. Further work is required in the training for and dissemination of automation tools, and in ensuring they provide the features desired by those conducting systematic reviews.
Affiliation(s)
- Anna Mae Scott
- Institute for Evidence-Based Healthcare, Bond University, Gold Coast, Australia
- Connor Forbes
- Institute for Evidence-Based Healthcare, Bond University, Gold Coast, Australia
- Justin Clark
- Institute for Evidence-Based Healthcare, Bond University, Gold Coast, Australia
- Matt Carter
- Institute for Evidence-Based Healthcare, Bond University, Gold Coast, Australia
- Paul Glasziou
- Institute for Evidence-Based Healthcare, Bond University, Gold Coast, Australia
- Zachary Munn
- JBI, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia