1
Hocking L, Parkinson S, Adams A, Molding Nielsen E, Ang C, de Carvalho Gomes H. Overcoming the challenges of using automated technologies for public health evidence synthesis. Euro Surveill 2023; 28:2300183. PMID: 37943502; PMCID: PMC10636742; DOI: 10.2807/1560-7917.es.2023.28.45.2300183.
Abstract
Many organisations struggle to keep pace with public health evidence due to the volume of published literature and the length of time it takes to conduct literature reviews. New technologies that help automate parts of the evidence synthesis process can help conduct reviews more quickly and efficiently to better provide up-to-date evidence for public health decision making. To date, automated approaches have seldom been used in public health due to significant barriers to their adoption. In this Perspective, we reflect on the findings of a study exploring experiences of adopting automated technologies to conduct evidence reviews within the public health sector. The study, funded by the European Centre for Disease Prevention and Control, consisted of a literature review and qualitative data collection from public health organisations and researchers in the field. We specifically focus on outlining the challenges associated with the adoption of automated approaches and potential solutions and actions that can be taken to mitigate these. We explore these in relation to actions that can be taken by tool developers (e.g. improving tool performance and transparency), public health organisations (e.g. developing staff skills, encouraging collaboration) and funding bodies/the wider research system (e.g. academic publishers and scholarly journals).
2
Mottin L, Goldman JP, Jäggli C, Achermann R, Gobeill J, Knafou J, Ehrsam J, Wicky A, Gérard CL, Schwenk T, Charrier M, Tsantoulis P, Lovis C, Leichtle A, Kiessling MK, Michielin O, Pradervand S, Foufi V, Ruch P. Multilingual RECIST classification of radiology reports using supervised learning. Front Digit Health 2023; 5:1195017. PMID: 37388252; PMCID: PMC10303934; DOI: 10.3389/fdgth.2023.1195017.
Abstract
Objectives: This study explores Artificial Intelligence and Natural Language Processing techniques to support the automatic assignment of the four Response Evaluation Criteria in Solid Tumors (RECIST) scales based on radiology reports. We also evaluate how the languages and institutional specificities of Swiss teaching hospitals affect classification quality in French and German. Methods: Seven machine learning methods were evaluated to establish a strong baseline. Robust models were then built, fine-tuned according to the language (French and German), and compared with expert annotation. Results: The best strategies yield average F1-scores of 90% and 86% for the 2-class (Progressive/Non-progressive) and 4-class (Progressive Disease, Stable Disease, Partial Response, Complete Response) RECIST classification tasks, respectively. Conclusions: These results are competitive with manual labeling as measured by the Matthews correlation coefficient and Cohen's kappa (79% and 76%). On this basis, we confirm the capacity of specific models to generalize to new, unseen data, and we assess the impact of using Pre-trained Language Models (PLMs) on classifier accuracy.
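The agreement statistics this abstract reports (Matthews correlation coefficient, Cohen's kappa) can be reproduced from a binary confusion matrix. A minimal sketch, not taken from the study itself; the counts below are invented for illustration:

```python
import math

def kappa_and_mcc(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Cohen's kappa and Matthews correlation coefficient for a binary task."""
    n = tp + fp + fn + tn
    po = (tp + tn) / n  # observed agreement
    # chance agreement, from the row/column marginals of the confusion matrix
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (po - pe) / (1 - pe)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return kappa, mcc

# Illustrative counts only (not from the paper)
kappa, mcc = kappa_and_mcc(tp=40, fp=5, fn=10, tn=45)
```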
Affiliation(s)
- Luc Mottin
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Jean-Philippe Goldman
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Christoph Jäggli
- Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
- Rita Achermann
- Department of Radiology, Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Julien Gobeill
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Julien Knafou
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Julien Ehrsam
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alexandre Wicky
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Camille L. Gérard
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Tanja Schwenk
- Department of Oncology, Kantonsspital Aarau, Aarau, Switzerland
- Mélinda Charrier
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Petros Tsantoulis
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Christian Lovis
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alexander Leichtle
- Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
- Michael K. Kiessling
- Department of Medical Oncology and Hematology, University Hospital Zurich, Zurich, Switzerland
- Olivier Michielin
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Sylvain Pradervand
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Vasiliki Foufi
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Patrick Ruch
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
3
Oliveira Dos Santos Á, Sergio da Silva E, Machado Couto L, Valadares Labanca Reis G, Silva Belo V. The use of artificial intelligence for automating or semi-automating biomedical literature analyses: a scoping review. J Biomed Inform 2023; 142:104389. PMID: 37187321; DOI: 10.1016/j.jbi.2023.104389.
Abstract
OBJECTIVE Evidence-based medicine (EBM) is a decision-making process based on the conscious and judicious use of the best available scientific evidence. However, the exponential increase in the amount of information currently available likely exceeds the capacity of human-only analysis. In this context, artificial intelligence (AI) and its branches such as machine learning (ML) can be used to facilitate human efforts in analyzing the literature to foster EBM. The present scoping review aimed to examine the use of AI in the automation of biomedical literature survey and analysis with a view to establishing the state-of-the-art and identifying knowledge gaps. MATERIALS AND METHODS Comprehensive searches of the main databases were performed for articles published up to June 2022 and studies were selected according to inclusion and exclusion criteria. Data were extracted from the included articles and the findings categorized. RESULTS The total number of records retrieved from the databases was 12,145, of which 273 were included in the review. Classification of the studies according to the use of AI in evaluating the biomedical literature revealed three main application groups, namely assembly of scientific evidence (n=127; 47%), mining the biomedical literature (n=112; 41%) and quality analysis (n=34; 12%). Most studies addressed the preparation of systematic reviews, while articles focusing on the development of guidelines and evidence synthesis were the least frequent. The biggest knowledge gap was identified within the quality analysis group, particularly regarding methods and tools that assess the strength of recommendation and consistency of evidence. 
CONCLUSION Our review shows that, despite significant progress in the automation of biomedical literature surveys and analyses in recent years, intense research is needed to fill knowledge gaps on more difficult aspects of ML, deep learning and natural language processing, and to consolidate the use of automation by end-users (biomedical researchers and healthcare professionals).
Affiliation(s)
- Eduardo Sergio da Silva
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
- Letícia Machado Couto
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
- Vinícius Silva Belo
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
4
Pethani F, Dunn AG. Natural language processing for clinical notes in dentistry: A systematic review. J Biomed Inform 2023; 138:104282. PMID: 36623780; DOI: 10.1016/j.jbi.2023.104282.
Abstract
OBJECTIVE To identify and synthesise research on applications of natural language processing (NLP) for information extraction and retrieval from clinical notes in dentistry. MATERIALS AND METHODS A predefined search strategy was applied in EMBASE, CINAHL and Medline. Studies eligible for inclusion were those that described, evaluated, or applied NLP to clinical notes containing either human or simulated patient information. Quality of the study design and reporting was independently assessed based on a set of questions derived from relevant tools including the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS). A narrative synthesis was conducted to present the results. RESULTS Of the 17 included studies, 10 developed and evaluated NLP methods and 7 described applications of NLP-based information retrieval methods in dental records. Studies were published between 2015 and 2021, most were missing key details needed for reproducibility, and there was no consistency in design or reporting. The 10 studies developing or evaluating NLP methods used document classification or entity extraction, and 4 compared NLP methods to non-NLP methods. The quality of reporting on NLP studies in dentistry has modestly improved over time. CONCLUSIONS Study design heterogeneity and incomplete reporting of studies currently limit our ability to synthesise NLP applications in dental records. Standardisation of reporting and improved connections between NLP methods and applied NLP in dentistry may improve how we can make use of clinical notes from dentistry in population health or decision support systems. PROTOCOL REGISTRATION PROSPERO CRD42021227823.
Affiliation(s)
- Farhana Pethani
- Biomedical Informatics and Digital Health, Faculty of Medicine and Health, the University of Sydney, Sydney, Australia
- Adam G Dunn
- Biomedical Informatics and Digital Health, Faculty of Medicine and Health, the University of Sydney, Sydney, Australia
5
Decreased sensitivity to antidepressant drugs in Wistar Hannover rats submitted to two animal models of depression. Acta Neuropsychiatr 2023; 35:35-49. PMID: 36101010; DOI: 10.1017/neu.2022.24.
Abstract
The Wistar Hannover rat (WHR) is a strain commonly used for toxicity studies but rarely used in studies investigating depression neurobiology. In this study, we aimed to characterise the behavioural responses of WHR to acute and repeated antidepressant treatments upon exposure to the forced swim test (FST) or learned helplessness (LH) test. WHR were subjected to forced swimming pre-test and test with antidepressant administration (imipramine, fluoxetine, or escitalopram) at 0, 5 h and 23 h after pre-test. WHR displayed high immobility in the test compared to unstressed controls (no pre-swim) and failed to respond to the antidepressants tested. The effect of acute and repeated treatment (imipramine, fluoxetine, escitalopram or s-ketamine) was then tested in animals not previously exposed to pre-test. Only imipramine (20 mg/kg, 7 days) and s-ketamine (acute) reduced the immobility time in the test. To further investigate the possibility that the WHR were less responsive to selective serotonin reuptake inhibitors, the effect of repeated treatment with fluoxetine (20 mg/kg, 7 days) was investigated in the LH model. The results demonstrated that fluoxetine failed to reduce the number of escape failures in two different protocols. These data suggest that the WHR do not respond to the conventional antidepressant treatment in the FST or the LH. Only s-ketamine and repeated imipramine were effective in WHR in a modified FST protocol. Altogether, these results indicate that WHR may be an interesting tool to investigate the mechanisms associated with the resistance to antidepressant drugs and identify more effective treatments.
6
Facchinetti T, Benetti G, Giuffrida D, Nocera A. slr-kit: A semi-supervised machine learning framework for systematic literature reviews. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2022.109266.
7
van Altena AJ, Spijker R, Leeflang MMG, Olabarriaga SD. Training sample selection: Impact on screening automation in diagnostic test accuracy reviews. Res Synth Methods 2021; 12:831-841. PMID: 34390193; PMCID: PMC9292892; DOI: 10.1002/jrsm.1518.
Abstract
When performing a systematic review, researchers screen the articles retrieved after a broad search strategy one by one, which is time‐consuming. Computerised support of this screening process has been applied with varying success. This is partly due to the dependency on large amounts of data to develop models that predict inclusion. In this paper, we present an approach to choose which data to use in model training and compare it with established approaches. We used a dataset of 50 Cochrane diagnostic test accuracy reviews, and each was used as a target review. From the remaining 49 reviews, we selected those that most closely resembled the target review's clinical topic using the cosine similarity metric. Included and excluded studies from these selected reviews were then used to develop our prediction models. The performance of models trained on the selected reviews was compared against models trained on studies from all available reviews. The prediction models performed best with a larger number of reviews in the training set and on target reviews that had a research subject similar to other reviews in the dataset. Our approach using cosine similarity may reduce computational costs for model training and the duration of the screening process.
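The review-selection step this abstract describes can be sketched with plain term counts and cosine similarity. A minimal illustration only, not the authors' pipeline (they worked on Cochrane review data; the review titles below are invented placeholders):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(target: str, candidates: dict, k: int = 2) -> list:
    """Rank candidate reviews by topical similarity to the target review."""
    tv = Counter(target.lower().split())
    ranked = sorted(
        candidates,
        key=lambda name: cosine(tv, Counter(candidates[name].lower().split())),
        reverse=True,
    )
    return ranked[:k]

# Invented review topics, for illustration only
target = "ultrasound for deep vein thrombosis diagnosis"
candidates = {
    "dvt-review": "d-dimer test for deep vein thrombosis",
    "stroke-review": "mri for stroke diagnosis",
    "statin-review": "statin therapy for cholesterol",
}
selected = most_similar(target, candidates, k=2)
```

Only the studies from the top-ranked reviews would then feed the training set, which is what reduces both computation and screening time.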
Affiliation(s)
- Allard J van Altena
- Department of Epidemiology and Data Science, Amsterdam Public Health, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- René Spijker
- Medical Library, Amsterdam Public Health, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Cochrane Netherlands, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Mariska M G Leeflang
- Department of Epidemiology and Data Science, Amsterdam Public Health, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Sílvia Delgado Olabarriaga
- Department of Epidemiology and Data Science, Amsterdam Public Health, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
8
Schmidt L, Finnerty Mutlu AN, Elmore R, Olorisade BK, Thomas J, Higgins JPT. Data extraction methods for systematic review (semi)automation: Update of a living systematic review. F1000Res 2021; 10:401. PMID: 34408850; PMCID: PMC8361807; DOI: 10.12688/f1000research.51117.2.
Abstract
Background: The reliable and usable (semi)automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of the included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies. Methods: We systematically and continually search PubMed, ACL Anthology, arXiv, OpenAlex via EPPI-Reviewer, and the dblp computer science bibliography. Full text screening and data extraction are conducted within an open-source living systematic review application created for the purpose of this review. This living review update includes publications up to December 2022 and OpenAlex content up to March 2023. Results: 76 publications are included in this review. Of these, 64 (84%) addressed extraction of data from abstracts, while 19 (25%) used full texts. A total of 71 (93%) publications developed classifiers for randomised controlled trials. Over 30 entities were extracted, with PICOs (population, intervention, comparator, outcome) being the most frequently extracted. Data are available from 25 (33%) publications, and code from 30 (39%). Six (8%) implemented publicly available tools. Conclusions: This living systematic review presents an overview of (semi)automated data-extraction literature of interest to different types of literature review. We identified a broad evidence base of publications describing data extraction for interventional reviews and a small number of publications extracting epidemiological or diagnostic accuracy data. Between review updates, trends for sharing data and code increased strongly: in the base review, data and code were available for 13% and 19% of publications, respectively; these numbers increased to 78% and 87% within the 23 new publications.
Compared with the base-review, we observed another research trend, away from straightforward data extraction and towards additionally extracting relations between entities or automatic text summarisation. With this living review we aim to review the literature continually.
Affiliation(s)
- Lena Schmidt
- NIHR Innovation Observatory, Newcastle University, Newcastle upon Tyne, NE4 5TG, UK
- Sciome LLC, Research Triangle Park, North Carolina, 27713, USA
- Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Rebecca Elmore
- Sciome LLC, Research Triangle Park, North Carolina, 27713, USA
- Babatunde K. Olorisade
- Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Evaluate Ltd, London, SE1 2RE, UK
- Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, CF5 2YB, UK
- James Thomas
- UCL Social Research Institute, University College London, London, WC1H 0AL, UK
9
Schmidt L, Finnerty Mutlu AN, Elmore R, Olorisade BK, Thomas J, Higgins JPT. Data extraction methods for systematic review (semi)automation: A living systematic review. F1000Res 2021; 10:401. PMID: 34408850; PMCID: PMC8361807; DOI: 10.12688/f1000research.51117.1.
Abstract
Background: The reliable and usable (semi)automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of the included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies. Methods: We systematically and continually search MEDLINE, Institute of Electrical and Electronics Engineers (IEEE), arXiv, and the dblp computer science bibliography databases. Full text screening and data extraction are conducted within an open-source living systematic review application created for the purpose of this review. This iteration of the living review includes publications up to a cut-off date of 22 April 2020. Results: In total, 53 publications are included in this version of our review. Of these, 41 (77%) of the publications addressed extraction of data from abstracts, while 14 (26%) used full texts. A total of 48 (90%) publications developed and evaluated classifiers that used randomised controlled trials as the main target texts. Over 30 entities were extracted, with PICOs (population, intervention, comparator, outcome) being the most frequently extracted. A description of their datasets was provided by 49 publications (94%), but only seven (13%) made the data publicly available. Code was made available by 10 (19%) publications, and five (9%) implemented publicly available tools. Conclusions: This living systematic review presents an overview of (semi)automated data-extraction literature of interest to different types of systematic review. We identified a broad evidence base of publications describing data extraction for interventional reviews and a small number of publications extracting epidemiological or diagnostic accuracy data. 
The lack of publicly available gold-standard data for evaluation, and lack of application thereof, makes it difficult to draw conclusions on which is the best-performing system for each data extraction target. With this living review we aim to review the literature continually.
Affiliation(s)
- Lena Schmidt
- NIHR Innovation Observatory, Newcastle University, Newcastle upon Tyne, NE4 5TG, UK
- Sciome LLC, Research Triangle Park, North Carolina, 27713, USA
- Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Rebecca Elmore
- Sciome LLC, Research Triangle Park, North Carolina, 27713, USA
- Babatunde K. Olorisade
- Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Evaluate Ltd, London, SE1 2RE, UK
- Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, CF5 2YB, UK
- James Thomas
- UCL Social Research Institute, University College London, London, WC1H 0AL, UK
10
Schmidt L, Olorisade BK, McGuinness LA, Thomas J, Higgins JPT. Data extraction methods for systematic review (semi)automation: A living review protocol. F1000Res 2020; 9:210. PMID: 32724560; PMCID: PMC7338918; DOI: 10.12688/f1000research.22781.2.
Abstract
Background: Researchers in evidence-based medicine cannot keep up with the amounts of both old and newly published primary research articles. Support for the early stages of the systematic review process - searching and screening studies for eligibility - is necessary because it is currently impossible to search for relevant research with precision. Better automated data extraction may not only facilitate the stage of review traditionally labelled 'data extraction', but also change earlier phases of the review process by making it possible to identify relevant research. Exponential improvements in computational processing speed and data storage are fostering the development of data mining models and algorithms. This, in combination with quicker pathways to publication, has led to a large landscape of tools and methods for data mining and extraction. Objective: To review published methods and tools for data extraction to (semi)automate the systematic reviewing process. Methods: We propose to conduct a living review. With this methodology we aim to perform constant evidence surveillance and bi-monthly search updates, as well as review updates every six months if new evidence permits. In a cross-sectional analysis we will extract methodological characteristics and assess the quality of reporting in our included papers. Conclusions: We aim to increase transparency in the reporting and assessment of automation technologies to the benefit of data scientists, systematic reviewers and funders of health research. This living review will help to reduce duplicate efforts by data scientists who develop data mining methods. It will also serve to inform systematic reviewers about possibilities to support their data extraction.
Affiliation(s)
- Lena Schmidt
- Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- James Thomas
- UCL Social Research Institute, University College London, London, WC1H 0AL, UK
11
Schmidt L, Olorisade BK, McGuinness LA, Thomas J, Higgins JPT. Data extraction methods for systematic review (semi)automation: A living review protocol. F1000Res 2020; 9:210. PMID: 32724560; PMCID: PMC7338918; DOI: 10.12688/f1000research.22781.1.
Abstract
Background: Researchers in evidence-based medicine cannot keep up with the amounts of both old and newly published primary research articles. Conducting and updating systematic reviews is time-consuming. In practice, data extraction is one of the most complex tasks in this process. Exponential improvements in computational processing speed and data storage are fostering the development of data extraction models and algorithms. This, in combination with quicker pathways to publication, has led to a large landscape of tools and methods for data extraction tasks. Objective: To review published methods and tools for data extraction to (semi)automate the systematic reviewing process. Methods: We propose to conduct a living review. With this methodology we aim to do monthly search updates, as well as bi-annual review updates if new evidence permits. In a cross-sectional analysis we will extract methodological characteristics and assess the quality of reporting in our included papers. Conclusions: We aim to increase transparency in the reporting and assessment of machine learning technologies to the benefit of data scientists, systematic reviewers and funders of health research. This living review will help to reduce duplicate efforts by data scientists who develop data extraction methods. It will also serve to inform systematic reviewers about possibilities to support their data extraction.
Affiliation(s)
- Lena Schmidt
- Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- James Thomas
- UCL Social Research Institute, University College London, London, WC1H 0AL, UK
12
Gates A, Guitard S, Pillay J, Elliott SA, Dyson MP, Newton AS, Hartling L. Performance and usability of machine learning for screening in systematic reviews: a comparative evaluation of three tools. Syst Rev 2019; 8:278. PMID: 31727150; PMCID: PMC6857345; DOI: 10.1186/s13643-019-1222-2.
Abstract
BACKGROUND We explored the performance of three machine learning tools designed to facilitate title and abstract screening in systematic reviews (SRs) when used to (a) eliminate irrelevant records (automated simulation) and (b) complement the work of a single reviewer (semi-automated simulation). We evaluated user experiences for each tool. METHODS We subjected three SRs to two retrospective screening simulations. In each tool (Abstrackr, DistillerSR, RobotAnalyst), we screened a 200-record training set and downloaded the predicted relevance of the remaining records. We calculated the proportion missed and workload and time savings compared to dual independent screening. To test user experiences, eight research staff tried each tool and completed a survey. RESULTS Using Abstrackr, DistillerSR, and RobotAnalyst, respectively, the median (range) proportion missed was 5 (0 to 28) percent, 97 (96 to 100) percent, and 70 (23 to 100) percent for the automated simulation and 1 (0 to 2) percent, 2 (0 to 7) percent, and 2 (0 to 4) percent for the semi-automated simulation. The median (range) workload savings was 90 (82 to 93) percent, 99 (98 to 99) percent, and 85 (85 to 88) percent for the automated simulation and 40 (32 to 43) percent, 49 (48 to 49) percent, and 35 (34 to 38) percent for the semi-automated simulation. The median (range) time savings was 154 (91 to 183), 185 (95 to 201), and 157 (86 to 172) hours for the automated simulation and 61 (42 to 82), 92 (46 to 100), and 64 (37 to 71) hours for the semi-automated simulation. Abstrackr identified 33-90% of records missed by a single reviewer. RobotAnalyst performed less well and DistillerSR provided no relative advantage. User experiences depended on user friendliness, qualities of the user interface, features and functions, trustworthiness, ease and speed of obtaining predictions, and practicality of the export file(s). 
CONCLUSIONS The workload savings afforded in the automated simulation came with increased risk of missing relevant records. Supplementing a single reviewer's decisions with relevance predictions (semi-automated simulation) sometimes reduced the proportion missed, but performance varied by tool and SR. Designing tools based on reviewers' self-identified preferences may improve their compatibility with present workflows. SYSTEMATIC REVIEW REGISTRATION Not applicable.
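The headline metrics in this evaluation (proportion missed, workload savings) reduce to two simple ratios. A minimal sketch with invented counts, not the study's data:

```python
def proportion_missed(n_relevant_total: int, n_relevant_found: int) -> float:
    """Share of truly relevant records the (semi-)automated workflow failed to retain."""
    return (n_relevant_total - n_relevant_found) / n_relevant_total

def workload_savings(n_records: int, n_screened_manually: int) -> float:
    """Share of records a human no longer screens, relative to screening everything."""
    return (n_records - n_screened_manually) / n_records

# Invented example: 2,000 retrieved records, 100 of them relevant;
# the tool leaves 200 records for manual screening and loses 5 relevant ones.
missed = proportion_missed(100, 95)
saved = workload_savings(2000, 200)
```

The trade-off the conclusions describe is visible here directly: pushing `n_screened_manually` down raises workload savings but tends to raise the proportion missed.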
Affiliation(s)
- Allison Gates
- Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada
| | - Samantha Guitard
- Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada
| | - Jennifer Pillay
- Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada
| | - Sarah A Elliott
- Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada
| | - Michele P Dyson
- Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada
| | - Amanda S Newton
- Department of Pediatrics, University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada
| | - Lisa Hartling
- Department of Pediatrics, Alberta Research Centre for Health Evidence and the University of Alberta Evidence-based Practice Center, University of Alberta, 11405 87 Ave NW, Edmonton, Alberta, T6G 1C9, Canada.
| |
Collapse
|
13
|
O’Connor AM, Tsafnat G, Thomas J, Glasziou P, Gilbert SB, Hutton B. A question of trust: can we build an evidence base to gain trust in systematic review automation technologies? Syst Rev 2019; 8:143. [PMID: 31215463 PMCID: PMC6582554 DOI: 10.1186/s13643-019-1062-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/23/2018] [Accepted: 06/05/2019] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Although many aspects of systematic reviews use computational tools, systematic reviewers have been reluctant to adopt machine learning tools. DISCUSSION The reasons for the slow adoption of machine learning tools in systematic reviews are likely multifactorial. We focus on the current absence of trust in automation and on set-up challenges as major barriers to adoption. It is important that reviews produced using automation tools are considered non-inferior or superior to current practice. However, this standard alone will likely not be sufficient to lead to widespread adoption. As with many technologies, it is important that reviewers see "others" in the review community using automation tools. Adoption will also be slow if the automation tools are not compatible with the workflows and tasks currently used to produce reviews. Many automation tools being developed for systematic reviews address classification problems. Therefore, the evidence that these tools are non-inferior or superior can be presented using methods similar to diagnostic test evaluations, i.e., precision and recall compared with a human reviewer. However, the assessment of automation tools does present unique challenges for investigators and systematic reviewers, including the need to clarify which metrics are of interest to the systematic review community and the unique documentation challenges of reproducible software experiments. CONCLUSION We discuss adoption barriers with the goal of guiding tool developers in designing and reporting such evaluations, and of helping end users assess their validity. Further, we discuss approaches to formatting and announcing publicly available datasets suitable for the assessment of automation technologies and tools. Making these resources available will increase trust that tools are non-inferior or superior to current practice.
Finally, we note that, even with evidence that automation tools are non-inferior or superior to current practice, substantial set-up challenges remain for mainstream integration of automation into the systematic review process.
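The diagnostic-test framing the abstract proposes treats the human reviewer's decisions as the reference standard. A minimal sketch with hypothetical counts (not drawn from the paper):

```python
# Hedged sketch: evaluating an automation tool's screening decisions as a
# diagnostic test against a human reviewer, as the abstract suggests.
# The counts below are hypothetical.

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall of the tool's 'include' decisions, treating the
    human reviewer's decisions as the reference standard."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Tool includes 90 records the human also included (TP), includes 30 the
# human excluded (FP), and misses 10 the human included (FN).
p, r = precision_recall(tp=90, fp=30, fn=10)
print(round(p, 3), round(r, 3))  # 0.75 0.9
```

For screening tasks, recall is usually the metric of greatest concern, since a false negative is a relevant study lost from the review.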
Collapse
Affiliation(s)
| | - Guy Tsafnat
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
| | | | | | | | - Brian Hutton
- Knowledge Synthesis Unit, Ottawa Hospital Research Institute, Ottawa, ON K1H 8L6 Canada
| |
Collapse
|
14
|
O'Connor AM, Tsafnat G, Gilbert SB, Thayer KA, Shemilt I, Thomas J, Glasziou P, Wolfe MS. Still moving toward automation of the systematic review process: a summary of discussions at the third meeting of the International Collaboration for Automation of Systematic Reviews (ICASR). Syst Rev 2019; 8:57. [PMID: 30786933 PMCID: PMC6381675 DOI: 10.1186/s13643-019-0975-y] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/30/2018] [Accepted: 02/12/2019] [Indexed: 12/28/2022] Open
Abstract
The third meeting of the International Collaboration for Automation of Systematic Reviews (ICASR) was held 17-18 October 2017 in London, England. ICASR is an interdisciplinary group whose goal is to maximize the use of technology for conducting rapid, accurate, and efficient systematic reviews of scientific evidence. The group seeks to facilitate the development and widespread acceptance of automated techniques for systematic reviews. The meeting's conclusion was that the most pressing needs at present are to develop approaches for validating currently available tools and to provide increased access to curated corpora that can be used for validation. To that end, ICASR's short-term goals in 2018-2019 are to propose and publish protocols for key tasks in systematic reviews and to develop an approach for sharing curated corpora for validating the automation of the key tasks.
Collapse
Affiliation(s)
- Annette M O'Connor
- College of Veterinary Medicine, Iowa State University, 1800 Christensen Drive, Ames, IA, 50011, USA.
| | | | | | - Kristina A Thayer
- US Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
| | - Ian Shemilt
- EPPI-Centre, University College London, London, WC1E 6BT, UK
| | - James Thomas
- EPPI-Centre, University College London, London, WC1E 6BT, UK
| | - Paul Glasziou
- Bond University, Robina, Queensland, 4226, Australia
| | - Mary S Wolfe
- National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA
| |
Collapse
|
15
|
Westgate MJ, Haddaway NR, Cheng SH, McIntosh EJ, Marshall C, Lindenmayer DB. Software support for environmental evidence synthesis. Nat Ecol Evol 2018; 2:588-590. [DOI: 10.1038/s41559-018-0502-x] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]
|
16
|
Surian D, Dunn AG, Orenstein L, Bashir R, Coiera E, Bourgeois FT. A shared latent space matrix factorisation method for recommending new trial evidence for systematic review updates. J Biomed Inform 2018; 79:32-40. [PMID: 29410356 DOI: 10.1016/j.jbi.2018.01.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2017] [Revised: 12/09/2017] [Accepted: 01/22/2018] [Indexed: 10/18/2022]
Abstract
BACKGROUND Clinical trial registries can be used to monitor the production of trial evidence and signal when systematic reviews become out of date. However, this use has been limited to date due to the extensive manual review required to search for and screen relevant trial registrations. Our aim was to evaluate a new method that could partially automate the identification of trial registrations that may be relevant for systematic review updates. MATERIALS AND METHODS We identified 179 systematic reviews of drug interventions for type 2 diabetes, which included 537 clinical trials that had registrations in ClinicalTrials.gov. Text from the trial registrations was used as features directly, or transformed using Latent Dirichlet Allocation (LDA) or Principal Component Analysis (PCA). We tested a novel matrix factorisation approach that uses a shared latent space to learn how to rank relevant trial registrations for each systematic review, comparing its performance against a document-similarity baseline for ranking relevant trial registrations. The two approaches were tested on a holdout set of the newest trials from the set of type 2 diabetes systematic reviews and an unseen set of 141 clinical trial registrations from 17 updated systematic reviews published in the Cochrane Database of Systematic Reviews. Performance was measured by the number of relevant registrations found after examining 100 candidates (recall@100) and the median rank of relevant registrations in the ranked candidate lists. RESULTS The matrix factorisation approach outperformed the document similarity approach with a median rank of 59 (of 128,392 candidate registrations in ClinicalTrials.gov) and recall@100 of 60.9% using the LDA feature representation, compared with a median rank of 138 and recall@100 of 42.8% for the document similarity baseline. In the second set of systematic reviews and their updates, the highest performing approach used document similarity and gave a median rank of 67 (recall@100 of 62.9%).
CONCLUSIONS A shared latent space matrix factorisation method was useful for ranking trial registrations to reduce the manual workload associated with finding relevant trials for systematic review updates. The results suggest that the approach could be used as part of a semi-automated pipeline for monitoring potentially new evidence for inclusion in a review update.
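The two ranking metrics reported above (recall@100 and median rank) can be computed from any ranked candidate list. A minimal sketch with a toy list of hypothetical registration IDs:

```python
# Hedged sketch of the ranking metrics used in the study above.
# The ranked list and relevant IDs are toy data, not the study's.
import statistics

def recall_at_k(ranked_ids: list[str], relevant_ids: list[str], k: int = 100) -> float:
    """Fraction of relevant registrations found among the top-k candidates."""
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

def median_rank(ranked_ids: list[str], relevant_ids: list[str]) -> float:
    """Median 1-based rank of the relevant registrations in the candidate list."""
    ranks = sorted(ranked_ids.index(r) + 1 for r in relevant_ids)
    return statistics.median(ranks)

# Toy example: 10 candidates, 2 of them relevant.
ranked = [f"NCT{i:08d}" for i in range(10)]
relevant = ["NCT00000001", "NCT00000005"]
print(recall_at_k(ranked, relevant, k=3))  # 0.5
print(median_rank(ranked, relevant))       # 4.0
```

A lower median rank and a higher recall@100 both mean a screener finds relevant registrations after examining fewer candidates, which is the workload reduction the conclusions describe.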
Collapse
Affiliation(s)
- Didi Surian
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia.
| | - Adam G Dunn
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
| | - Liat Orenstein
- Computational Health Informatics Program, Boston Children's Hospital, Boston, United States
| | - Rabia Bashir
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
| | - Enrico Coiera
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
| | - Florence T Bourgeois
- Computational Health Informatics Program, Boston Children's Hospital, Boston, United States; Department of Pediatrics, Harvard Medical School, Boston, United States
| |
Collapse
|