1
Hasan B, Saadi S, Rajjoub NS, Hegazi M, Al-Kordi M, Fleti F, Farah M, Riaz IB, Banerjee I, Wang Z, Murad MH. Integrating large language models in systematic reviews: a framework and case study using ROBINS-I for risk of bias assessment. BMJ Evid Based Med 2024:bmjebm-2023-112597. [PMID: 38383136 DOI: 10.1136/bmjebm-2023-112597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/12/2024] [Indexed: 02/23/2024]
Abstract
Large language models (LLMs) may facilitate and expedite systematic reviews, although the approach to integrate LLMs in the review process is unclear. This study evaluates GPT-4 agreement with human reviewers in assessing the risk of bias using the Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tool and proposes a framework for integrating LLMs into systematic reviews. The case study demonstrated that raw per cent agreement was the highest for the ROBINS-I domain of 'Classification of Intervention'. Kendall agreement coefficient was highest for the domains of 'Participant Selection', 'Missing Data' and 'Measurement of Outcomes', suggesting moderate agreement in these domains. Raw agreement about the overall risk of bias across domains was 61% (Kendall coefficient=0.35). The proposed framework for integrating LLMs into systematic reviews consists of four domains: rationale for LLM use, protocol (task definition, model selection, prompt engineering, data entry methods, human role and success metrics), execution (iterative revisions to the protocol) and reporting. We identify five basic task types relevant to systematic reviews: selection, extraction, judgement, analysis and narration. Considering the agreement level with a human reviewer in the case study, pairing artificial intelligence with an independent human reviewer remains required.
Affiliation(s)
- Bashar Hasan
  - Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
  - Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Samer Saadi
  - Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
  - Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Noora S Rajjoub
  - Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
- Moustafa Hegazi
  - Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
  - Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Mohammad Al-Kordi
  - Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
  - Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Farah Fleti
  - Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
  - Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Magdoleen Farah
  - Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
  - Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Irbaz B Riaz
  - Division of Hematology-Oncology Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Imon Banerjee
  - Department of Radiology, Mayo Clinic Arizona, Scottsdale, Arizona, USA
  - School of Computing and Augmented Intelligence, Arizona State University, Tempe, Arizona, USA
- Zhen Wang
  - Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
  - Health Care Policy and Research, Mayo Clinic Minnesota, Rochester, Minnesota, USA
- Mohammad Hassan Murad
  - Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
  - Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
2
Murad MH, Wang Z, Chu H, Lin L, El Mikati IK, Khabsa J, Akl EA, Nieuwlaat R, Schuenemann HJ, Riaz IB. Proposed triggers for retiring a living systematic review. BMJ Evid Based Med 2023; 28:348-352. [PMID: 36889900 PMCID: PMC10579491 DOI: 10.1136/bmjebm-2022-112100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/23/2023] [Indexed: 03/10/2023]
Abstract
Living systematic reviews (LSRs) are systematic reviews that are continually updated, incorporating relevant new evidence as it becomes available. LSRs are critical for decision-making in topics where the evidence continues to evolve. It is not feasible to continue to update LSRs indefinitely; however, guidance on when to retire LSRs from the living mode is not clear. We propose triggers for making such a decision. The first trigger is to retire LSRs when the evidence becomes conclusive for the outcomes that are required for decision-making. Conclusiveness of evidence is best determined based on the GRADE certainty of evidence construct, which is more comprehensive than solely relying on statistical considerations. The second trigger to retire LSRs is when the question becomes less pertinent for decision-making as determined by relevant stakeholders, including people affected by the problem, healthcare professionals, policymakers and researchers. LSRs can also be retired from a living mode when new studies are not anticipated to be published on the topic and when resources become unavailable to continue updating. We describe examples of retired LSRs and apply the proposed approach using one LSR about adjuvant tyrosine kinase inhibitors in high-risk renal cell carcinoma that we retired from a living mode and published its last update.
Affiliation(s)
- Mohammad Hassan Murad
  - Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
  - Kern Center for the Science of Healthcare Delivery Research, Mayo Clinic, Rochester, Minnesota, USA
- Zhen Wang
  - Kern Center for the Science of Healthcare Delivery Research, Mayo Clinic, Rochester, Minnesota, USA
- Haitao Chu
  - Department of Biostatistics, University of Minnesota Twin Cities, Minneapolis, Minnesota, USA
- Lifeng Lin
  - Department of Statistics, University of Arizona Medical Center-South Campus, Tucson, Arizona, USA
- Joanne Khabsa
  - Clinical Research Institute, American University of Beirut, Beirut, Lebanon
- Elie A Akl
  - Clinical Research Institute, American University of Beirut, Beirut, Lebanon
  - Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Robby Nieuwlaat
  - Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Holger J Schuenemann
  - Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
  - McMaster University, GRADE Center, Hamilton, Ontario, Canada
  - Institute for Evidence in Medicine, University of Freiburg, Freiburg, Germany
  - Department of Biomedical Sciences, Humanitas University, Milano, Italy
- Irbaz Bin Riaz
  - Mayo Clinic, Phoenix, Arizona, USA
  - Mass General Brigham Inc, Boston, Massachusetts, USA