Rimondi A, Gottlieb K, Despott EJ, Iacucci M, Murino A, Tontini GE. Can artificial intelligence replace endoscopists when assessing mucosal healing in ulcerative colitis? A systematic review and diagnostic test accuracy meta-analysis. Dig Liver Dis 2024; 56:1164-1172. [PMID: 38057218 DOI: 10.1016/j.dld.2023.11.005]
[Received: 09/04/2023] [Revised: 11/03/2023] [Accepted: 11/07/2023] [Indexed: 12/08/2023]
Abstract
BACKGROUND AND AIMS
Mucosal healing (MH) in inflammatory bowel diseases (IBD) is an important landmark for clinical decision making. Artificial intelligence (AI) systems that automatically grade endoscopic inflammation may address the problem of only moderate interobserver agreement and reduce the need for central reading in clinical trials.
METHODS
We performed a systematic review of the EMBASE and MEDLINE databases up to 01/12/2022, following PRISMA and Joanna Briggs Institute methodologies, to answer the following question: "Can AI replace endoscopists when assessing MH in IBD?" The review was restricted to ulcerative colitis (UC), and a diagnostic odds ratio (DOR) meta-analysis was performed. Risk of bias was evaluated with the QUADAS-2 tool.
RESULTS
Of 739 records, 21 were selected for full-text evaluation, and 12 were included in the meta-analysis. Deep learning algorithms based on convolutional neural network architectures achieved satisfactory performance in evaluating MH in UC. On fixed images (n = 8), sensitivity, specificity, DOR and SROC were 0.91 (95% CI: 0.86-0.95), 0.89 (95% CI: 0.84-0.93), 92.42 (95% CI: 54.22-157.53) and 0.957, respectively; on videos (n = 6), they were 0.86 (95% CI: 0.75-0.93), 0.91 (95% CI: 0.87-0.94), 70.86 (95% CI: 24.63-203.86) and 0.941. Moderate-to-high levels of heterogeneity were noted, limiting the quality of the evidence.
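For readers unfamiliar with the DOR, the following is a minimal Python sketch of how it relates to sensitivity and specificity for a single study. It is not the authors' meta-analytic model: the function names and the 2x2 counts are hypothetical, chosen only so that the table reproduces the fixed-image sensitivity (0.91) and specificity (0.89) reported above. The interval uses the standard Wald approximation on the log scale, and the 0.5 continuity correction for zero cells is a common convention assumed here.

```python
import math


def dor_from_rates(sensitivity, specificity):
    """DOR as the ratio of the odds of a positive test in diseased vs. non-diseased."""
    return (sensitivity / (1 - sensitivity)) * (specificity / (1 - specificity))


def diagnostic_odds_ratio(tp, fp, fn, tn, correction=0.5):
    """DOR = (TP * TN) / (FP * FN) with a Wald 95% CI computed on the log scale.

    If any cell is zero, `correction` is added to every cell (assumed convention).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    dor = (tp * tn) / (fp * fn)
    se_log = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(dor) - 1.96 * se_log)
    upper = math.exp(math.log(dor) + 1.96 * se_log)
    return dor, (lower, upper)


if __name__ == "__main__":
    # Hypothetical single-study table: 100 healed and 100 non-healed cases.
    dor, (lower, upper) = diagnostic_odds_ratio(tp=91, fp=11, fn=9, tn=89)
    print(f"DOR from counts: {dor:.1f} (95% CI: {lower:.1f}-{upper:.1f})")
    print(f"DOR from rates:  {dor_from_rates(0.91, 0.89):.1f}")  # about 81.8
```

The value obtained this way (about 81.8) differs from the pooled DOR of 92.42 reported for fixed images, which is expected: a meta-analysis pools the DOR across studies as its own quantity rather than deriving it from the pooled sensitivity and specificity.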
CONCLUSIONS
AI systems showed high potential for detecting MH in UC, with optimal diagnostic performance, although moderate-to-high heterogeneity of the data was noted. Standardised and shared AI training may reduce heterogeneity between systems.