1
Reale-Nosei G, Amador-Domínguez E, Serrano E. From vision to text: A comprehensive review of natural image captioning in medical diagnosis and radiology report generation. Med Image Anal 2024; 97:103264. [PMID: 39013207 DOI: 10.1016/j.media.2024.103264] [Received: 08/09/2023] [Revised: 04/25/2024] [Accepted: 07/01/2024] [Indexed: 07/18/2024]
Abstract
Natural Image Captioning (NIC) is an interdisciplinary research area at the intersection of Computer Vision (CV) and Natural Language Processing (NLP). Several works have been presented on the subject, ranging from early template-based approaches to more recent deep learning-based methods. This paper surveys the area of NIC, focusing especially on its applications to Medical Image Captioning (MIC) and Diagnostic Captioning (DC) in the field of radiology. A review of the state of the art is conducted, summarizing key research works in NIC and DC to provide a broad overview of the subject. These works include existing NIC and MIC models, datasets, evaluation metrics, and previous reviews in the specialized literature. The reviewed work is thoroughly analyzed and discussed, highlighting the limitations of existing approaches and their potential implications in real clinical practice. Similarly, potential future research lines are outlined on the basis of the detected limitations.
Affiliation(s)
- Gabriel Reale-Nosei
  - ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
- Elvira Amador-Domínguez
  - Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Departamento de Sistemas Informáticos, ETSI Sistemas Informáticos, Universidad Politécnica de Madrid, 28031 Madrid, Spain.
- Emilio Serrano
  - Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
2
Zhang H, He Y, Wu X, Huang P, Qin W, Wang F, Ye J, Huang X, Liao Y, Chen H, Guo L, Shi X, Luo L. PathNarratives: Data annotation for pathological human-AI collaborative diagnosis. Front Med (Lausanne) 2023; 9:1070072. [PMID: 36777158 PMCID: PMC9908590 DOI: 10.3389/fmed.2022.1070072] [Received: 10/14/2022] [Accepted: 12/22/2022] [Indexed: 01/27/2023]
Abstract
Pathology is the gold standard of clinical diagnosis. Artificial intelligence (AI) in pathology has become a new trend, but it is still not widely used because it lacks the explanations pathologists need to understand its rationale. Clinic-compliant explanations, in addition to the diagnostic decision for pathological images, are essential for training AI models that provide diagnostic suggestions to assist pathologists in practice. In this study, we propose a new annotation form, PathNarratives, that includes a hierarchical decision-to-reason data structure, a narrative annotation process, and a multimodal interactive annotation tool. Following PathNarratives, we recruited 8 pathologist annotators to build a colorectal pathological dataset, CR-PathNarratives, containing 174 whole-slide images (WSIs). We further experiment on the dataset with classification and captioning tasks to explore the clinical scenarios of human-AI collaborative pathological diagnosis. The classification tasks show that fine-grained prediction improves the overall classification accuracy from 79.56% to 85.26%. In the human-AI collaboration experience, the trust and confidence scores from 8 pathologists rose from 3.88 to 4.63 when more details were provided. The results show that the classification and captioning tasks perform better with reason labels and provide explainable clues that help doctors understand the output and make the final decision, and thus can support a better experience of human-AI collaboration in pathological diagnosis. In the future, we plan to optimize the tools for the annotation process and to expand the datasets with more WSIs covering more pathological domains.
Affiliation(s)
- Heyu Zhang
  - College of Engineering, Peking University, Beijing, China
- Yan He
  - Department of Pathology, Longgang Central Hospital of Shenzhen, Shenzhen, China
- Xiaomin Wu
  - College of Engineering, Peking University, Beijing, China
- Peixiang Huang
  - College of Engineering, Peking University, Beijing, China
- Wenkang Qin
  - College of Engineering, Peking University, Beijing, China
- Fan Wang
  - College of Engineering, Peking University, Beijing, China
- Juxiang Ye
  - Department of Pathology, School of Basic Medical Science, Peking University Health Science Center, Peking University Third Hospital, Beijing, China
- Xirui Huang
  - Department of Pathology, Longgang Central Hospital of Shenzhen, Shenzhen, China
- Yanfang Liao
  - Department of Pathology, Longgang Central Hospital of Shenzhen, Shenzhen, China
- Hang Chen
  - College of Engineering, Peking University, Beijing, China
- Limei Guo (corresponding author)
  - Department of Pathology, School of Basic Medical Science, Peking University Health Science Center, Peking University Third Hospital, Beijing, China
- Xueying Shi (corresponding author)
  - Department of Pathology, School of Basic Medical Science, Peking University Health Science Center, Peking University Third Hospital, Beijing, China
- Lin Luo (corresponding author)
  - College of Engineering, Peking University, Beijing, China