1. Sogancioglu E, van Ginneken B, Behrendt F, Bengs M, Schlaefer A, Radu M, Xu D, Sheng K, Scalzo F, Marcus E, Papa S, Teuwen J, Scholten ET, Schalekamp S, Hendrix N, Jacobs C, Hendrix W, Sanchez CI, Murphy K. Nodule Detection and Generation on Chest X-Rays: NODE21 Challenge. IEEE Transactions on Medical Imaging 2024;43:2839-2853. PMID: 38530714. DOI: 10.1109/tmi.2024.3382042.
Abstract
Pulmonary nodules may be an early manifestation of lung cancer, the leading cause of cancer-related deaths among both men and women. Numerous studies have established that deep learning methods can yield high performance in the detection of lung nodules in chest X-rays. However, the lack of gold-standard public datasets slows research progress and prevents benchmarking of methods for this task. To address this, we organized a public research challenge, NODE21, aimed at the detection and generation of lung nodules in chest X-rays. While the detection track assesses state-of-the-art nodule detection systems, the generation track determines the utility of nodule generation algorithms for augmenting training data and hence improving the performance of detection systems. This paper summarizes the results of the NODE21 challenge and performs extensive additional experiments to examine the impact of synthetically generated nodule training images on detection algorithm performance.
2. Dai T, Zhang R, Hong F, Yao J, Zhang Y, Wang Y. UniChest: Conquer-and-Divide Pre-Training for Multi-Source Chest X-Ray Classification. IEEE Transactions on Medical Imaging 2024;43:2901-2912. PMID: 38526891. DOI: 10.1109/tmi.2024.3381123.
Abstract
Vision-Language Pre-training (VLP), which uses multi-modal information to improve training efficiency and effectiveness, has achieved great success in visual recognition of natural domains and shown promise in medical imaging diagnosis for chest X-rays (CXRs). However, current work mainly explores single CXR datasets, which limits the potential of this powerful paradigm on larger hybrids of multi-source CXR datasets. We identify that although blending samples from diverse sources improves model generalization, it is still challenging to maintain consistent superiority on the task of each source due to the heterogeneity among sources. To handle this dilemma, we design a Conquer-and-Divide pre-training framework, termed UniChest, that aims to make full use of the collaborative benefit of multiple CXR sources while reducing the negative influence of source heterogeneity. Specifically, the "Conquer" stage in UniChest encourages the model to sufficiently capture multi-source common patterns, and the "Divide" stage squeezes personalized patterns into different small experts (query networks). We conduct thorough experiments on many benchmarks, e.g., ChestX-ray14, CheXpert, VinDr-CXR, Shenzhen, Open-I and SIIM-ACR Pneumothorax, verifying the effectiveness of UniChest over a range of baselines, and release our code and pre-trained models at https://github.com/Elfenreigen/UniChest.
3. Reale-Nosei G, Amador-Domínguez E, Serrano E. From vision to text: A comprehensive review of natural image captioning in medical diagnosis and radiology report generation. Med Image Anal 2024;97:103264. PMID: 39013207. DOI: 10.1016/j.media.2024.103264.
Abstract
Natural Image Captioning (NIC) is an interdisciplinary research area at the intersection of Computer Vision (CV) and Natural Language Processing (NLP). Several works have been presented on the subject, ranging from the early template-based approaches to the more recent deep learning-based methods. This paper surveys NIC, focusing especially on its applications to Medical Image Captioning (MIC) and Diagnostic Captioning (DC) in the field of radiology. A review of the state of the art is conducted, summarizing key research works in NIC and DC to provide a wide overview of the subject, including existing NIC and MIC models, datasets, evaluation metrics, and previous reviews in the specialized literature. The reviewed work is thoroughly analyzed and discussed, highlighting the limitations of existing approaches and their potential implications for real clinical practice. Future research lines are outlined on the basis of the detected limitations.
Affiliation(s)
- Gabriel Reale-Nosei
- ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
- Elvira Amador-Domínguez
- Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Departamento de Sistemas Informáticos, ETSI Sistemas Informáticos, Universidad Politécnica de Madrid, 28031 Madrid, Spain.
- Emilio Serrano
- Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
4. López-Úbeda P, Martín-Noguerol T, Díaz-Angulo C, Luna A. Evaluation of large language models performance against humans for summarizing MRI knee radiology reports: A feasibility study. Int J Med Inform 2024;187:105443. PMID: 38615509. DOI: 10.1016/j.ijmedinf.2024.105443.
Abstract
OBJECTIVES: This study addresses the critical need for accurate summarization in radiology by comparing various Large Language Model (LLM)-based approaches for automatic summary generation. With the increasing volume of patient information, conveying radiological findings accurately and concisely becomes crucial for effective clinical decision-making. Minor inaccuracies in summaries can have significant consequences, highlighting the need for reliable automated summarization tools.
METHODS: We employed two language models - Text-to-Text Transfer Transformer (T5) and Bidirectional and Auto-Regressive Transformers (BART) - in both fine-tuned and zero-shot learning scenarios and compared them with a Recurrent Neural Network (RNN). Additionally, we conducted a comparative analysis of 100 MRI report summaries, using expert human judgment on criteria such as coherence, relevance, fluency, and consistency, to evaluate the models against the original radiologist summaries. To facilitate this, we compiled a dataset of 15,508 retrospective knee Magnetic Resonance Imaging (MRI) reports from our Radiology Information System (RIS), using the findings section to predict the radiologist's summary.
RESULTS: The fine-tuned models outperformed the RNN and also outperformed their zero-shot variants. Specifically, the T5 model achieved a ROUGE-L score of 0.638. In the radiologist reader study, summaries produced by this model were judged very similar to those produced by a radiologist, with about 70% similarity in fluency and consistency between the T5-generated summaries and the originals.
CONCLUSIONS: Technological advances, especially in NLP and LLMs, hold great promise for improving and streamlining the summarization of radiological findings, providing valuable assistance to radiologists in their work.
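For context, ROUGE-L measures longest-common-subsequence overlap between a generated summary and the reference. A minimal sketch of zero-shot summarization plus ROUGE-L scoring with the Hugging Face transformers and rouge-score packages follows; the checkpoint and both texts are invented placeholders, not the study's fine-tuned models or data.

```python
# Illustrative only: zero-shot summarization of an invented findings section,
# scored with ROUGE-L. t5-small and the texts below are placeholders.
from transformers import pipeline
from rouge_score import rouge_scorer

findings = ("Increased signal in the posterior horn of the medial meniscus "
            "reaching the inferior articular surface. Moderate joint effusion.")
reference = "Tear of the posterior horn of the medial meniscus with moderate effusion."

summarizer = pipeline("summarization", model="t5-small")
candidate = summarizer(findings, max_length=30, min_length=5)[0]["summary_text"]

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
print(scorer.score(reference, candidate)["rougeL"].fmeasure)
```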
Affiliation(s)
- Antonio Luna
- MRI Unit, Radiology Department, Health Time, Jaén, Spain.
5. Liu A, Guo Y, Yong JH, Xu F. Multi-Grained Radiology Report Generation With Sentence-Level Image-Language Contrastive Learning. IEEE Transactions on Medical Imaging 2024;43:2657-2669. PMID: 38437149. DOI: 10.1109/tmi.2024.3372638.
Abstract
The automatic generation of accurate radiology reports is of great clinical importance and has drawn growing research interest. However, it remains a challenging task due to the imbalance between normal and abnormal descriptions and the multi-sentence, multi-topic nature of radiology reports. These features pose significant challenges for generating accurate descriptions of medical images, especially the important abnormal findings. Previous methods for tackling these problems rely heavily on extra manual annotations, which are expensive to acquire. We propose a multi-grained report generation framework incorporating sentence-level image-language contrastive learning, which does not require any extra labeling but effectively learns knowledge from image-report pairs. We first introduce contrastive learning as an auxiliary task for image feature learning. Different from previous contrastive methods, we exploit the multi-topic nature of imaging reports and perform fine-grained contrastive learning by extracting sentence topics and contents and contrasting sentence contents against refined image contents guided by sentence topics. This forces the model to learn distinct abnormal image features for each specific topic. During generation, we use two decoders to first generate coarse sentence topics and then the fine-grained text of each sentence. We directly supervise the intermediate topics using sentence topics learned by our contrastive objective. This strengthens the generation constraint and enables independent fine-tuning of the decoders using reinforcement learning, which further boosts model performance. Experiments on two large-scale datasets, MIMIC-CXR and IU X-ray, demonstrate that our approach outperforms existing state-of-the-art methods on both language generation metrics and clinical accuracy.
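The sentence-level contrastive objective described here is, in spirit, an InfoNCE-style loss between topic-guided image representations and sentence-content representations. A generic PyTorch sketch of such a symmetric loss follows; the temperature and batch pairing are illustrative, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each (B, D) tensor is a matched pair."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Matched pairs sit on the diagonal; off-diagonal entries act as negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = info_nce(torch.randn(8, 256), torch.randn(8, 256))
```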
6. Rückert J, Bloch L, Brüngel R, Idrissi-Yaghir A, Schäfer H, Schmidt CS, Koitka S, Pelka O, Ben Abacha A, García Seco de Herrera A, Müller H, Horn PA, Nensa F, Friedrich CM. ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset. Sci Data 2024;11:688. PMID: 38926396. PMCID: PMC11208523. DOI: 10.1038/s41597-024-03496-6.
Abstract
Automated medical image analysis systems often require large amounts of training data with high-quality labels, which are difficult and time-consuming to generate. This paper introduces Radiology Objects in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PMC Open Access subset. It is an updated version of the ROCO dataset published in 2018 and adds 35,705 new images that have appeared in PMC since 2018. It further provides manually curated concepts for imaging modalities, with additional anatomical and directional concepts for X-rays. The dataset consists of 79,789 images and has been used, with minor modifications, in the concept detection and caption prediction tasks of ImageCLEFmedical Caption 2023. The dataset is suitable for training image annotation models based on image-caption pairs, or for multi-label image classification using the Unified Medical Language System (UMLS) concepts provided with each image. In addition, it can serve for pre-training of medical-domain models and evaluation of deep learning models for multi-task learning.
Affiliation(s)
- Johannes Rückert
- Department of Computer Science, University of Applied Sciences and Arts Dortmund, Dortmund, Germany
- Louise Bloch
- Department of Computer Science, University of Applied Sciences and Arts Dortmund, Dortmund, Germany
- Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Essen, Germany
- Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany
- Raphael Brüngel
- Department of Computer Science, University of Applied Sciences and Arts Dortmund, Dortmund, Germany
- Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Essen, Germany
- Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany
- Ahmad Idrissi-Yaghir
- Department of Computer Science, University of Applied Sciences and Arts Dortmund, Dortmund, Germany
- Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Essen, Germany
- Henning Schäfer
- Department of Computer Science, University of Applied Sciences and Arts Dortmund, Dortmund, Germany
- Institute for Transfusion Medicine, University Hospital Essen, Essen, Germany
- Cynthia S Schmidt
- Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany
- Institute for Transfusion Medicine, University Hospital Essen, Essen, Germany
- Sven Koitka
- Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany
- Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, Germany
- Obioma Pelka
- Department of Computer Science, University of Applied Sciences and Arts Dortmund, Dortmund, Germany
- Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Essen, Germany
- Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany
- Henning Müller
- University of Applied Sciences Western Switzerland (HES-SO), Delémont, Switzerland
- Peter A Horn
- Institute for Transfusion Medicine, University Hospital Essen, Essen, Germany
- Felix Nensa
- Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany
- Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, Germany
- Christoph M Friedrich
- Department of Computer Science, University of Applied Sciences and Arts Dortmund, Dortmund, Germany.
- Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Essen, Germany.
7. Luo X, Deng Z, Yang B, Luo MY. Pre-trained language models in medicine: A survey. Artif Intell Med 2024;154:102904. PMID: 38917600. DOI: 10.1016/j.artmed.2024.102904.
Abstract
With the rapid progress in Natural Language Processing (NLP), Pre-trained Language Models (PLMs) such as BERT, BioBERT, and ChatGPT have shown great potential in various medical NLP tasks. This paper surveys the cutting-edge achievements in applying PLMs to these tasks. Specifically, we first introduce PLMs briefly and outline the research on PLMs in medicine. Next, we categorise and discuss the types of tasks in medical NLP, covering text summarisation, question answering, machine translation, sentiment analysis, named entity recognition, information extraction, medical education, relation extraction, and text mining. For each type of task, we provide an overview of the basic concepts, the main methodologies, the advantages of applying PLMs, the basic steps of applying PLMs, the datasets for training and testing, and the metrics for task evaluation. Subsequently, a summary of recent important research findings is presented, analysing their motivations, strengths and weaknesses, and similarities and differences, and discussing potential limitations. We also assess the quality and influence of the reviewed research by comparing the citation counts of the papers and the reputation and impact of the conferences and journals where they were published; through these indicators, we identify the research topics currently receiving the most attention. Finally, we look forward to future research directions, including enhancing models' reliability, explainability, and fairness to promote the application of PLMs in clinical practice. This survey also collects download links for model code and relevant datasets, which are valuable references for researchers applying NLP techniques in medicine and for medical professionals seeking to enhance their expertise and healthcare services through AI technology.
Affiliation(s)
- Xudong Luo
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
- Zhiqi Deng
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
- Binxia Yang
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
- Michael Y Luo
- Emmanuel College, Cambridge University, Cambridge, CB2 3AP, UK.
8. Shahzadi I, Madni TM, Janjua UI, Batool G, Naz B, Ali MQ. CSAMDT: Conditional Self Attention Memory-Driven Transformers for Radiology Report Generation from Chest X-Ray. J Imaging Inform Med 2024. PMID: 38831189. DOI: 10.1007/s10278-024-01126-6.
Abstract
A radiology report plays a crucial role in guiding patient treatment, but writing these reports is a time-consuming task that demands a radiologist's expertise. In response to this challenge, researchers in artificial intelligence for healthcare have explored techniques for automatically interpreting radiographic images and generating free-text reports; however, much of the research on medical report creation has focused on image captioning methods without adequately addressing particular aspects of the report. This study introduces a Conditional Self Attention Memory-Driven Transformer model for generating radiological reports. The model operates in two phases: first, a multi-label classification model with ResNet152 v2 as an encoder is employed for feature extraction and multiple-disease diagnosis; second, the Conditional Self Attention Memory-Driven Transformer serves as a decoder, using self-attention memory-driven transformers to generate the text report. Comprehensive experiments compared existing and proposed techniques on Bilingual Evaluation Understudy (BLEU) scores from BLEU-1 through BLEU-4. The model outperforms the other state-of-the-art techniques, achieving BLEU-1 (0.475), BLEU-2 (0.358), BLEU-3 (0.229), and BLEU-4 (0.165). These findings can alleviate radiologists' workloads and enhance clinical workflows by introducing an autonomous radiological report generation system.
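BLEU-1 through BLEU-4 weight n-gram precisions of increasing order. The following sketch shows how such scores are conventionally computed with NLTK; both token sequences are invented, not data from the paper.

```python
# Illustrative BLEU-1..4 computation with NLTK; both reports are made up.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["no", "acute", "cardiopulmonary", "abnormality", "is", "seen"]]
candidate = ["no", "acute", "cardiopulmonary", "disease", "is", "seen"]

smooth = SmoothingFunction().method1
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))   # uniform weights up to n-grams
    score = sentence_bleu(reference, candidate,
                          weights=weights, smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.3f}")
```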
Affiliation(s)
- Iqra Shahzadi
- Department of Computer Science, COMSATS University Islamabad, Islamabad, Pakistan
- Tahir Mustafa Madni
- Department of Computer Science, COMSATS University Islamabad, Islamabad, Pakistan.
- Uzair Iqbal Janjua
- Department of Computer Science, COMSATS University Islamabad, Islamabad, Pakistan
- Ghanwa Batool
- Department of Computer Science, COMSATS University Islamabad, Islamabad, Pakistan
- Bushra Naz
- Department of Computer Science, COMSATS University Islamabad, Islamabad, Pakistan
- Muhammad Qasim Ali
- Rehabilitation Department, Yusra Medical and Dental College, Rawalpindi, Pakistan
9. Ertürk ŞM, Toprak T, Cömert RG, Candemir C, Cingöz E, Akyol Sari ZN, Ercan CC, Düvek E, Ersoy B, Karapinar E, Tunaci A, Selver MA. Thorax computed tomography (CTX) guided ground truth annotation of chest radiographs (CXR) for improved classification and detection of COVID-19. Int J Numer Method Biomed Eng 2024;40:e3823. PMID: 38587026. DOI: 10.1002/cnm.3823.
Abstract
Several datasets have been collected and various artificial intelligence models have been developed for COVID-19 classification and detection from both chest radiography (CXR) and thorax computed tomography (CTX) images. However, the pitfalls and shortcomings of these systems significantly limit their clinical use. In this respect, improving the weaknesses of advanced models can be as effective as developing new ones. The inability of conventional CXR to show ground-glass opacities has limited this modality in the diagnostic work-up of COVID-19. In our study, we investigated whether diagnostic efficiency could be increased by collecting a novel CXR dataset containing pneumonic regions that are not visible to experts and can only be annotated under CTX guidance. We developed an ensemble methodology of well-established deep CXR models for this new dataset, together with a machine learning-based non-maximum suppression strategy, to boost performance on challenging CXR images. CTX and CXR images of 379 patients who presented to our hospital with suspected COVID-19 were evaluated with consensus by seven radiologists. Among these, the CXR images of 161 patients who also had a CTX examination on the same day or within one day before or after, and whose CTX findings were compatible with COVID-19 pneumonia, were selected for annotation. CTX images were reformatted with the maximum intensity projection (MIP) method in the coronal plane into anterior, middle, and posterior sections along the sagittal axis. Based on the analysis of these coronal MIP reconstructions, the regions corresponding to pneumonia foci were annotated manually on the CXR images. Posterior-anterior (PA) CXRs of 218 patients with negative thorax CTX imaging formed the COVID-19 pneumonia-negative group. Accordingly, we collected a new dataset of anonymized CXR (JPEG) and CT (DICOM) images in which the PA CXRs contain pneumonic regions that are hidden or not easily recognized and were annotated under CTX guidance. The reference finding was the presence of pneumonic infiltration consistent with COVID-19 on chest CTX examination. COVID-Net, a specially designed convolutional neural network, was used to detect cases of COVID-19 among the CXRs. Diagnostic performance was evaluated by ROC analysis, applying six COVID-Net variants (COVIDNet-CXR3-A, -B, -C / COVIDNet-CXR4-A, -B, -C) to the dataset and combining these models via ensemble strategies. Finally, a convex optimization strategy was carried out to find the best-performing weighted ensemble of the individual models. The mean age of the 161 patients with pneumonia was 49.31 ± 15.12 years (median 48); that of the 218 patients without signs of pneumonia on thorax CTX was 40.04 ± 14.46 years (median 38). Among the combinations of COVID-Net's six variants, the ensemble COVID-Net CXR 4A-4B-3C achieved an area under the curve (AUC) of 0.78 with 67% sensitivity and 95% specificity, and COVID-Net CXR 4A-3B-3C achieved an AUC of 0.79 with 69% sensitivity and 94% specificity. When diverse and complementary COVID-Net models are combined in an ensemble, the AUC values are close to those of other studies, while the specificity is significantly higher than in other studies in the literature.
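The final step, finding a well-performing weighted ensemble of per-model probabilities, can be pictured with a small search over the probability simplex. In the sketch below a random Dirichlet search stands in for the paper's convex optimization strategy, and the labels and model outputs are synthetic.

```python
# Sketch: search convex ensemble weights that maximize AUC on synthetic data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 300)                                # mock labels
probs = np.clip(y + rng.normal(0, 0.8, (3, 300)), 0, 1)    # 3 mock models

best_w, best_auc = None, 0.0
for _ in range(2000):                      # random search on the simplex
    w = rng.dirichlet(np.ones(probs.shape[0]))
    auc = roc_auc_score(y, w @ probs)      # AUC of the weighted ensemble
    if auc > best_auc:
        best_w, best_auc = w, auc
print("weights:", best_w, "AUC:", round(best_auc, 3))
```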
Affiliation(s)
- Şükrü Mehmet Ertürk
- Radiodiagnostics Department, Istanbul University, Istanbul Faculty of Medicine, Istanbul, Turkey
- Tuğçe Toprak
- Institute of Natural and Applied Sciences, Dokuz Eylul University, İzmir, Turkey
- Rana Günöz Cömert
- Radiodiagnostics Department, Istanbul University, Istanbul Faculty of Medicine, Istanbul, Turkey
- Cemre Candemir
- International Computer Institute, Ege University, Bornova, Turkey
- Eda Cingöz
- Radiodiagnostics Department, Istanbul University, Istanbul Faculty of Medicine, Istanbul, Turkey
- Zeynep Nur Akyol Sari
- Radiodiagnostics Department, Istanbul University, Istanbul Faculty of Medicine, Istanbul, Turkey
- Celal Caner Ercan
- Radiodiagnostics Department, Istanbul University, Istanbul Faculty of Medicine, Istanbul, Turkey
- Esin Düvek
- Radiodiagnostics Department, Istanbul University, Istanbul Faculty of Medicine, Istanbul, Turkey
- Berke Ersoy
- Radiodiagnostics Department, Istanbul University, Istanbul Faculty of Medicine, Istanbul, Turkey
- Edanur Karapinar
- Radiodiagnostics Department, Istanbul University, Istanbul Faculty of Medicine, Istanbul, Turkey
- Atadan Tunaci
- Radiodiagnostics Department, Istanbul University, Istanbul Faculty of Medicine, Istanbul, Turkey
- M Alper Selver
- Electrical and Electronics Engineering Department, Dokuz Eylul University, Faculty of Engineering, İzmir, Turkey
- Izmir Health Technologies Development and Accelerator (BioIzmir), Dokuz Eylul University, İzmir, Turkey
10. Li D, Huo H, Jiao S, Sun X, Chen S. Automated thorax disease diagnosis using multi-branch residual attention network. Sci Rep 2024;14:11865. PMID: 38789592. PMCID: PMC11126636. DOI: 10.1038/s41598-024-62813-6.
Abstract
Chest X-ray (CXR) is an extensively utilized radiological modality for supporting the diagnosis of chest diseases. However, existing research approaches have limited ability to integrate multi-scale CXR image features effectively and are further hindered by imbalanced datasets, so there is a pressing need for further advancement in computer-aided diagnosis (CAD) of thoracic diseases. To tackle these challenges, we propose a multi-branch residual attention network (MBRANet) for thoracic disease diagnosis. MBRANet comprises three components. First, to address the inadequate extraction of spatial and positional information by convolutional layers, a novel residual structure incorporating a coordinate attention (CA) module is proposed to extract features at multiple scales. Second, multi-scale features are fused following the concept of a Feature Pyramid Network (FPN). Third, we propose a novel Multi-Branch Feature Classifier (MFC) that leverages the class-specific residual attention (CSRA) module for classification instead of relying solely on a fully connected layer. In addition, the designed BCEWithLabelSmoothing loss function improves generalization and mitigates class imbalance by introducing a smoothing factor. We evaluated MBRANet on the ChestX-Ray14, CheXpert, MIMIC-CXR, and IU X-Ray datasets and achieved average AUCs of 0.841, 0.895, 0.805, and 0.745, respectively, outperforming state-of-the-art baselines on these benchmarks.
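The BCEWithLabelSmoothing loss can be read as binary cross-entropy applied to softened multi-label targets. A minimal PyTorch sketch under that reading follows; the smoothing value is illustrative and the paper's exact definition may differ.

```python
import torch
import torch.nn.functional as F

def bce_with_label_smoothing(logits, targets, smoothing=0.1):
    """Binary cross-entropy on smoothed multi-label targets.

    Hard 0/1 labels are pulled toward 0.5 by `smoothing`, discouraging
    over-confident predictions on imbalanced disease classes.
    """
    soft = targets * (1.0 - smoothing) + 0.5 * smoothing
    return F.binary_cross_entropy_with_logits(logits, soft)

logits = torch.randn(4, 14)                    # batch of 4, 14 disease labels
labels = torch.randint(0, 2, (4, 14)).float()
print(bce_with_label_smoothing(logits, labels))
```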
Affiliation(s)
- Dongfang Li
- School of Information Engineering, Henan University of Science and Technology, Luoyang, 471000, Henan, China
- Hua Huo
- School of Information Engineering, Henan University of Science and Technology, Luoyang, 471000, Henan, China.
- Shupei Jiao
- School of Information Engineering, Henan University of Science and Technology, Luoyang, 471000, Henan, China
- Xiaowei Sun
- School of Information Engineering, Henan University of Science and Technology, Luoyang, 471000, Henan, China
- Shuya Chen
- School of Information Engineering, Henan University of Science and Technology, Luoyang, 471000, Henan, China
11. Divya P, Sravani Y, Vishnu C, Mohan CK, Chen YW. Memory Guided Transformer With Spatio-Semantic Visual Extractor for Medical Report Generation. IEEE J Biomed Health Inform 2024;28:3079-3089. PMID: 38421843. DOI: 10.1109/jbhi.2024.3371894.
Abstract
Medical imaging-based report writing for effective diagnosis in radiology is time-consuming and can be error-prone for inexperienced radiologists. Automatic reporting helps radiologists avoid missed diagnoses and saves valuable time. Recently, transformer-based medical report generation has become prominent for capturing long-term dependencies in sequential data with its attention mechanism. Nevertheless, the input features obtained from the traditional visual extractor of conventional transformers do not capture the spatial and semantic information of an image, so the transformer cannot capture fine-grained details and may not produce detailed, descriptive reports of radiology images. We therefore propose a spatio-semantic visual extractor (SSVE) to capture multi-scale spatial and semantic information from radiology images. We incorporate two types of networks into a ResNet 101 backbone architecture: (i) a deformable network at the intermediate layer of ResNet 101, which utilizes deformable convolutions to obtain spatially invariant features, and (ii) a semantic network at the final layer of the backbone, which uses dilated convolutions to extract rich multi-scale semantic information. These network representations are then fused to encode fine-grained details of radiology images. Our proposed model outperforms existing works on two radiology report datasets, IU X-ray and MIMIC-CXR.
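As a rough picture of the two branches, torchvision ships a deformable convolution operator, and dilated convolutions are plain Conv2d options. The sketch below combines them; the channel sizes and the fusion-by-addition are illustrative, not the paper's SSVE configuration.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class TwoBranchSketch(nn.Module):
    """Toy spatial (deformable) + semantic (dilated) feature branches."""
    def __init__(self, c=64):
        super().__init__()
        # Deformable branch: a small conv predicts per-location offsets
        # (2 values per kernel tap: 2 * 3 * 3 = 18 channels).
        self.offset = nn.Conv2d(c, 18, kernel_size=3, padding=1)
        self.deform = DeformConv2d(c, c, kernel_size=3, padding=1)
        # Semantic branch: dilation enlarges the receptive field.
        self.dilated = nn.Conv2d(c, c, kernel_size=3, padding=2, dilation=2)

    def forward(self, x):
        spatial = self.deform(x, self.offset(x))
        semantic = self.dilated(x)
        return spatial + semantic              # naive fusion, for illustration

feats = TwoBranchSketch()(torch.randn(1, 64, 32, 32))
print(feats.shape)
```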
12. Veras Magalhães G, L. de S. Santos R, H. S. Vogado L, Cardoso de Paiva A, de Alcântara dos Santos Neto P. XRaySwinGen: Automatic medical reporting for X-ray exams with multimodal model. Heliyon 2024;10:e27516. PMID: 38560155. PMCID: PMC10979158. DOI: 10.1016/j.heliyon.2024.e27516.
Abstract
The importance of radiology in modern medicine is acknowledged for its non-invasive diagnostic capabilities, yet the manual formulation of unstructured medical reports poses time constraints and error risks. This study addresses a common limitation of artificial intelligence applications in medical image captioning, which typically focus on classification problems and lack detailed information about the patient's condition. Despite advancements in AI-generated medical reports, incorporating descriptive details from X-ray images, which are essential for comprehensive reports, remains a challenge. The proposed solution is a multimodal model utilizing computer vision for image representation and natural language processing for textual report generation. A notable contribution is the innovative use of the Swin Transformer as the image encoder, enabling hierarchical mapping and enhanced model perception without a surge in parameters or computational cost. The model incorporates GPT-2 as the textual decoder, integrating cross-attention layers and bilingual training with datasets in Brazilian Portuguese (PT-BR) and English. Promising results are reported on the proposed dataset (ROUGE-L 0.748, METEOR 0.741) and on the NIH Chest X-ray dataset (ROUGE-L 0.404, METEOR 0.393).
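Wiring a Swin image encoder to a GPT-2 decoder through cross-attention is directly expressible with the Hugging Face VisionEncoderDecoderModel. The sketch below uses common public checkpoints as stand-ins; they are not the authors' trained weights.

```python
# Sketch: Swin encoder + GPT-2 decoder joined by cross-attention layers.
from transformers import (VisionEncoderDecoderModel, AutoTokenizer,
                          AutoImageProcessor)

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/swin-base-patch4-window7-224", "gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
processor = AutoImageProcessor.from_pretrained(
    "microsoft/swin-base-patch4-window7-224")

model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.eos_token_id

# After fine-tuning on image-report pairs, generation would look like:
# pixel_values = processor(images=xray, return_tensors="pt").pixel_values
# report_ids = model.generate(pixel_values, max_length=128)
```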
Affiliation(s)
- Luis H. S. Vogado
- Departamento de Computação, Universidade Federal do Piauí, Teresina, Brazil
13. Sun S, Mei Z, Li X, Tang T, Su Z, Wu Y. A label information fused medical image report generation framework. Artif Intell Med 2024;150:102823. PMID: 38553163. DOI: 10.1016/j.artmed.2024.102823.
Abstract
Medical imaging is an important tool for clinical diagnosis, but preparing imaging diagnosis reports is very time-consuming and error-prone for physicians, so methods to generate medical imaging reports automatically are needed. Currently, the task is challenging in at least two respects: (1) medical images are very similar to one another, and the differences between normal and abnormal images, and among different abnormal images, are usually subtle; (2) unrelated or incorrect keywords describing abnormal findings in the generated reports lead to miscommunication. In this paper, we propose a medical image report generation framework composed of four modules: a Transformer encoder, a MIX-MLP multi-label classification network, a co-attention mechanism (CAM) for semantic and visual feature fusion, and a hierarchical LSTM decoder. The Transformer encoder learns long-range dependencies between images and labels, effectively extracts visual and semantic features of images, and establishes long-term dependencies between visual and semantic information to accurately extract abnormal features from images. The MIX-MLP multi-label classification network, the co-attention mechanism, and the hierarchical LSTM network can better identify abnormalities, achieving visual-textual alignment and fusion and multi-label diagnostic classification to better facilitate report generation. Results on two widely used radiology report datasets, IU X-RAY and MIMIC-CXR, show that our framework outperforms current report generation models on both natural language generation metrics and clinical efficacy metrics. The code of this work is available online at https://github.com/watersunhznu/LIFMRG.
Affiliation(s)
- Shuifa Sun
- School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, Zhejiang, China; Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China
- Zhoujunsen Mei
- Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China; College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, China
- Xiaolong Li
- Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China; College of Economics and Management, China Three Gorges University, Yichang, 443002, Hubei, China
- Tinglong Tang
- Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China; College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, China
- Zhanglin Su
- School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, Zhejiang, China
- Yirong Wu
- Institute of Advanced Studies in Humanities and Social Sciences, Beijing Normal University, Zhuhai, 519087, Guangdong, China.
14. Chen J, Pan R. Medical report generation based on multimodal federated learning. Comput Med Imaging Graph 2024;113:102342. PMID: 38309174. DOI: 10.1016/j.compmedimag.2024.102342.
Abstract
Medical image reports are integral to clinical decision-making and patient management. Despite their importance, the confidentiality and private nature of medical data pose significant obstacles to the sharing and analysis of medical image data. This paper addresses these concerns by introducing a multimodal federated learning-based methodology for medical image reporting, which harnesses distributed computing to co-train models across various medical institutions. Under the federated learning framework, every medical institution trains the model locally, and the updated model parameters are aggregated to curate a top-tier medical image report model. We first propose an architecture for multimodal federated learning, comprising model creation, parameter consolidation, and algorithm enhancement steps. In the model selection phase, we introduce a deep learning-based strategy that utilizes multimodal data for training to produce medical image reports. In the parameter aggregation phase, the federated averaging (FedAvg) algorithm is applied to amalgamate the model parameters trained by each institution, yielding a comprehensive global model; in addition, we introduce an evidence-based optimization algorithm built upon federated averaging. The efficacy of the proposed architecture and scheme is showcased through a series of experiments, whose results validate the proficiency of the proposed multimodal federated learning approach in generating medical image reports. Compared to conventional centralized learning methods, our proposal not only enhances the protection of patient confidentiality but also improves the accuracy and overall quality of medical image reports. Through this research, we offer a novel solution for the privacy issues linked with sharing and analyzing medical data. Expected to assume a crucial role in medical image report generation and other medical applications, the multimodal federated learning method is set to deliver more precise, efficient, and privacy-secured medical services for healthcare professionals and patients.
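The parameter-aggregation phase follows federated averaging: each institution trains locally and the server averages the resulting weights, typically weighted by local dataset size. A minimal PyTorch sketch of that step follows; it shows the standard FedAvg rule, not the paper's evidence-based variant.

```python
import copy
import torch

def fedavg(state_dicts, num_samples):
    """Average client state_dicts, weighted by each client's dataset size."""
    total = float(sum(num_samples))
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        # Cast to float so integer buffers also average cleanly in the sketch.
        avg[key] = sum(sd[key].float() * (n / total)
                       for sd, n in zip(state_dicts, num_samples))
    return avg

# Usage after one round of local training at each institution:
# global_model.load_state_dict(fedavg(client_states, client_sizes))
```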
Affiliation(s)
- Jieying Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
- Rong Pan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
15. Van Veen D, Van Uden C, Blankemeier L, Delbrouck JB, Aali A, Bluethgen C, Pareek A, Polacin M, Reis EP, Seehofnerová A, Rohatgi N, Hosamani P, Collins W, Ahuja N, Langlotz CP, Hom J, Gatidis S, Pauly J, Chaudhari AS. Adapted large language models can outperform medical experts in clinical text summarization. Nat Med 2024;30:1134-1142. PMID: 38413730. DOI: 10.1038/s41591-024-02855-5.
Abstract
Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP) tasks, their effectiveness on a diverse range of clinical summarization tasks remains unproven. Here we applied adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes and doctor-patient dialogue. Quantitative assessments with syntactic, semantic and conceptual NLP metrics reveal trade-offs between models and adaptation methods. A clinical reader study with 10 physicians evaluated summary completeness, correctness and conciseness; in most cases, summaries from our best-adapted LLMs were deemed either equivalent (45%) or superior (36%) compared with summaries from medical experts. The ensuing safety analysis highlights challenges faced by both LLMs and medical experts, as we connect errors to potential medical harm and categorize types of fabricated information. Our research provides evidence of LLMs outperforming medical experts in clinical text summarization across multiple tasks. This suggests that integrating LLMs into clinical workflows could alleviate documentation burden, allowing clinicians to focus more on patient care.
Affiliation(s)
- Dave Van Veen
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA.
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Palo Alto, CA, USA.
- Cara Van Uden
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Palo Alto, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Louis Blankemeier
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Palo Alto, CA, USA
- Jean-Benoit Delbrouck
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Palo Alto, CA, USA
- Asad Aali
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
- Christian Bluethgen
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Palo Alto, CA, USA
- Diagnostic and Interventional Radiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Anuj Pareek
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Palo Alto, CA, USA
- Copenhagen University Hospital, Copenhagen, Denmark
- Malgorzata Polacin
- Diagnostic and Interventional Radiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Eduardo Pontes Reis
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Palo Alto, CA, USA
- Albert Einstein Israelite Hospital, São Paulo, Brazil
- Anna Seehofnerová
- Department of Medicine, Stanford University, Stanford, CA, USA
- Department of Radiology, Stanford University, Stanford, CA, USA
- Nidhi Rohatgi
- Department of Medicine, Stanford University, Stanford, CA, USA
- Department of Neurosurgery, Stanford University, Stanford, CA, USA
- Poonam Hosamani
- Department of Medicine, Stanford University, Stanford, CA, USA
- William Collins
- Department of Medicine, Stanford University, Stanford, CA, USA
- Neera Ahuja
- Department of Medicine, Stanford University, Stanford, CA, USA
- Curtis P Langlotz
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Palo Alto, CA, USA
- Department of Medicine, Stanford University, Stanford, CA, USA
- Department of Radiology, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Jason Hom
- Department of Medicine, Stanford University, Stanford, CA, USA
- Sergios Gatidis
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Palo Alto, CA, USA
- Department of Radiology, Stanford University, Stanford, CA, USA
- John Pauly
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Akshay S Chaudhari
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Palo Alto, CA, USA
- Department of Radiology, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Stanford Cardiovascular Institute, Stanford, CA, USA
16. Thiam P, Kloth C, Blaich D, Liebold A, Beer M, Kestler HA. Segmentation-based cardiomegaly detection based on semi-supervised estimation of cardiothoracic ratio. Sci Rep 2024;14:5695. PMID: 38459104. PMCID: PMC10923822. DOI: 10.1038/s41598-024-56079-1.
Abstract
The successful integration of neural networks in a clinical setting is still uncommon despite major successes achieved by artificial intelligence in other domains. This is mainly due to the black box characteristic of most optimized models and the undetermined generalization ability of the trained architectures. The current work tackles both issues in the radiology domain by focusing on developing an effective and interpretable cardiomegaly detection architecture based on segmentation models. The architecture consists of two distinct neural networks performing the segmentation of both cardiac and thoracic areas of a radiograph. The respective segmentation outputs are subsequently used to estimate the cardiothoracic ratio, and the corresponding radiograph is classified as a case of cardiomegaly based on a given threshold. Due to the scarcity of pixel-level labeled chest radiographs, both segmentation models are optimized in a semi-supervised manner. This results in a significant reduction in the costs of manual annotation. The resulting segmentation outputs significantly improve the interpretability of the architecture's final classification results. The generalization ability of the architecture is assessed in a cross-domain setting. The assessment shows the effectiveness of the semi-supervised optimization of the segmentation models and the robustness of the ensuing classification architecture.
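Given the two masks, the cardiothoracic ratio reduces to the maximal horizontal extent of the heart mask divided by that of the thorax mask. A numpy sketch of the estimate and the thresholded decision follows; the 0.5 cutoff is the conventional one, not necessarily the paper's tuned threshold.

```python
import numpy as np

def cardiothoracic_ratio(heart_mask, thorax_mask):
    """CTR = widest horizontal extent of heart / widest extent of thorax."""
    def max_width(mask):
        cols = np.where(mask.any(axis=0))[0]   # columns containing the organ
        return cols.max() - cols.min() + 1 if cols.size else 0
    return max_width(heart_mask) / max(max_width(thorax_mask), 1)

# Toy masks: a 10-pixel-wide heart inside a 16-pixel-wide thorax.
thorax = np.zeros((32, 32), int)
thorax[4:28, 8:24] = 1
heart = np.zeros((32, 32), int)
heart[14:24, 11:21] = 1
ctr = cardiothoracic_ratio(heart, thorax)
print(round(ctr, 3), "cardiomegaly" if ctr > 0.5 else "normal")
```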
Affiliation(s)
- Patrick Thiam
- Institute of Medical Systems Biology, Albert-Einstein-Allee 11, 89081, Ulm, Germany
- Christopher Kloth
- Department of Diagnostic and Interventional Radiology, Ulm University Medical Center, Albert-Einstein-Allee 23, 89081, Ulm, Germany
- Daniel Blaich
- Department of Diagnostic and Interventional Radiology, Ulm University Medical Center, Albert-Einstein-Allee 23, 89081, Ulm, Germany
- Andreas Liebold
- Department of Cardiothoracic and Vascular Surgery, Ulm University Medical Center, Albert-Einstein-Allee 23, 89081, Ulm, Germany
- Meinrad Beer
- Department of Diagnostic and Interventional Radiology, Ulm University Medical Center, Albert-Einstein-Allee 23, 89081, Ulm, Germany
- Hans A Kestler
- Institute of Medical Systems Biology, Albert-Einstein-Allee 11, 89081, Ulm, Germany.
17. Pereira SC, Mendonça AM, Campilho A, Sousa P, Teixeira Lopes C. Automated image label extraction from radiology reports - A review. Artif Intell Med 2024;149:102814. PMID: 38462277. DOI: 10.1016/j.artmed.2024.102814.
Abstract
Machine learning models need large amounts of annotated data for training. In the field of medical imaging, labeled data are especially difficult to obtain because the annotations have to be performed by qualified physicians. Natural Language Processing (NLP) tools can be applied to radiology reports to extract labels for medical images automatically. Compared to manual labeling, this approach requires a smaller annotation effort and can therefore facilitate the creation of labeled medical image datasets. In this article, we summarize the literature on this topic from 2013 to 2023, starting with a meta-analysis of the included articles, followed by a qualitative and quantitative systematization of the results. Overall, we found four types of studies on the extraction of labels from radiology reports: those describing systems based on symbolic NLP, statistical NLP, or neural NLP, and those describing systems combining or comparing two or more of these approaches. Despite the large variety of existing approaches, there is still room for further improvement; this work can contribute to the development of new techniques or the improvement of existing ones.
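The symbolic (rule-based) family of systems surveyed here can be as simple as keyword patterns plus a crude negation window over the report text. A deliberately minimal regex sketch of the idea follows; real labelers handle far richer negation and uncertainty cues, and the patterns below are invented.

```python
import re

# Toy symbolic labeler: keyword match plus a same-sentence negation window.
LABELS = {
    "cardiomegaly": r"cardiomegaly|(heart|cardiac silhouette)[^.]*enlarged",
    "effusion": r"(pleural )?effusion",
    "pneumothorax": r"pneumothorax",
}
NEGATION = r"\b(no|without|negative for|resolved)\b[^.]*"

def extract_labels(report):
    report = report.lower()
    found = {}
    for label, pattern in LABELS.items():
        if re.search(NEGATION + f"({pattern})", report):
            found[label] = 0               # negated mention
        elif re.search(pattern, report):
            found[label] = 1               # positive mention
    return found

print(extract_labels("Heart is enlarged. No pleural effusion or pneumothorax."))
```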
Affiliation(s)
- Sofia C Pereira
- Institute for Systems and Computer Engineering, Technology and Science (INESC-TEC), Portugal; Faculty of Engineering of the University of Porto, Portugal.
- Ana Maria Mendonça
- Institute for Systems and Computer Engineering, Technology and Science (INESC-TEC), Portugal; Faculty of Engineering of the University of Porto, Portugal.
- Aurélio Campilho
- Institute for Systems and Computer Engineering, Technology and Science (INESC-TEC), Portugal; Faculty of Engineering of the University of Porto, Portugal.
- Pedro Sousa
- Hospital Center of Vila Nova de Gaia/Espinho, Portugal.
- Carla Teixeira Lopes
- Institute for Systems and Computer Engineering, Technology and Science (INESC-TEC), Portugal; Faculty of Engineering of the University of Porto, Portugal.
18. Kumari S, Singh P. Deep learning for unsupervised domain adaptation in medical imaging: Recent advancements and future perspectives. Comput Biol Med 2024;170:107912. PMID: 38219643. DOI: 10.1016/j.compbiomed.2023.107912.
Abstract
Deep learning has demonstrated remarkable performance across various tasks in medical imaging. However, these approaches primarily focus on supervised learning, assuming that the training and testing data are drawn from the same distribution. Unfortunately, this assumption may not always hold true in practice. To address these issues, unsupervised domain adaptation (UDA) techniques have been developed to transfer knowledge from a labeled domain to a related but unlabeled domain. In recent years, significant advancements have been made in UDA, resulting in a wide range of methodologies, including feature alignment, image translation, self-supervision, and disentangled representation methods, among others. In this paper, we provide a comprehensive literature review of recent deep UDA approaches in medical imaging from a technical perspective. Specifically, we categorize current UDA research in medical imaging into six groups and further divide them into finer subcategories based on the different tasks they perform. We also discuss the respective datasets used in the studies to assess the divergence between the different domains. Finally, we discuss emerging areas and provide insights and discussions on future research directions to conclude this survey.
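Of the families listed, feature alignment is the simplest to make concrete: one classic instantiation penalizes the maximum mean discrepancy (MMD) between source and target feature distributions. The toy RBF-kernel MMD below is just one of many alignment criteria the survey covers, shown only to fix ideas.

```python
import torch

def rbf_mmd(x, y, sigma=1.0):
    """Squared MMD between two feature batches under an RBF kernel."""
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

source_feats = torch.randn(32, 128)            # labeled-domain features
target_feats = torch.randn(32, 128) + 0.5      # shifted unlabeled domain
print(rbf_mmd(source_feats, target_feats))     # added to the task loss in UDA
```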
Affiliation(s)
- Suruchi Kumari
- Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, India.
- Pravendra Singh
- Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, India.
19. Xing S, Fang J, Ju Z, Guo Z, Wang Y. [Research on automatic generation of multimodal medical image reports based on memory-driven methods]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi (Journal of Biomedical Engineering) 2024;41:60-69. PMID: 38403605. PMCID: PMC10894734. DOI: 10.7507/1001-5515.202304001.
Abstract
The task of automatic generation of medical image reports faces various challenges, such as diverse disease types and a lack of professionalism and fluency in report descriptions. To address these issues, this paper proposes a memory-driven multimodal medical image report generation method (mMIRmd). First, a hierarchical vision transformer using shifted windows (Swin Transformer) extracts multi-perspective visual features from the patient's medical images, and bidirectional encoder representations from transformers (BERT) extracts semantic features from the textual medical history. The visual and semantic features are then integrated to enhance the model's ability to recognize different disease types. Furthermore, a word-vector dictionary pre-trained on medical text is employed to encode labels of the visual features, enhancing the professionalism of the generated reports. Finally, a memory-driven module is introduced in the decoder to address long-distance dependencies in medical image data. The method is validated on the chest X-ray dataset collected at Indiana University (IU X-Ray) and the Medical Information Mart for Intensive Care chest X-ray (MIMIC-CXR) released by the Massachusetts Institute of Technology and Massachusetts General Hospital. Experimental results indicate that the proposed method can better focus on affected areas, improve the accuracy and fluency of report generation, and assist radiologists in quickly completing medical image reports.
Affiliation(s)
- Suxia Xing
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, P. R. China
- Junze Fang
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, P. R. China
- Zihan Ju
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, P. R. China
- Zheng Guo
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, P. R. China
- Yu Wang
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, P. R. China
20. Ji J, Hou Y, Chen X, Pan Y, Xiang Y. Vision-Language Model for Generating Textual Descriptions From Clinical Images: Model Development and Validation Study. JMIR Form Res 2024;8:e32690. PMID: 38329788. PMCID: PMC10884898. DOI: 10.2196/32690.
Abstract
BACKGROUND: The automatic generation of radiology reports, which seeks to create a free-text description from a clinical radiograph, is emerging as a pivotal intersection between clinical medicine and artificial intelligence. Leveraging natural language processing technologies can accelerate report creation, enhancing health care quality and standardization. However, most existing studies have not yet fully tapped into the combined potential of advanced language and vision models.
OBJECTIVE: The purpose of this study was to explore the integration of pretrained vision-language models into radiology report generation, enabling the vision-language model to automatically convert clinical images into high-quality textual reports.
METHODS: We introduced a radiology report generation model named ClinicalBLIP, building upon the foundational InstructBLIP model and refining it using clinical image-to-text data sets. A multistage fine-tuning approach via low-rank adaptation was proposed to deepen the semantic comprehension of the visual encoder and the large language model for clinical imagery. Furthermore, prior knowledge was integrated through prompt learning to enhance the precision of the reports generated. Experiments were conducted on both the IU X-RAY and MIMIC-CXR data sets, with ClinicalBLIP compared to several leading methods.
RESULTS: ClinicalBLIP obtained superior scores of 0.570/0.365 and 0.534/0.313 on the IU X-RAY/MIMIC-CXR test sets for the Metric for Evaluation of Translation with Explicit Ordering (METEOR) and the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) evaluations, respectively, notably surpassing existing state-of-the-art methods. Further evaluations confirmed the effectiveness of the multistage fine-tuning and the integration of prior information, leading to substantial improvements.
CONCLUSIONS: The proposed ClinicalBLIP model demonstrated robustness and effectiveness in enhancing clinical radiology report generation, suggesting significant promise for real-world clinical applications.
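Low-rank adaptation, the fine-tuning device used here, trains small rank-decomposition matrices while freezing the base weights. A sketch of the setup with the Hugging Face peft library follows; the base checkpoint, target modules, and ranks are illustrative defaults, not ClinicalBLIP's actual multistage configuration.

```python
# Sketch of a LoRA fine-tuning setup with peft; all hyperparameters and the
# base checkpoint are placeholders, not the paper's configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in base LLM
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["c_attn"],        # GPT-2 attention proj
                    task_type="CAUSAL_LM")
model = get_peft_model(base, config)
model.print_trainable_parameters()   # only the low-rank adapters are trainable
```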
Affiliation(s)
- Jia Ji: Shenzhen Institute of Information Technology, Shenzhen, China
- Xinyu Chen: Harbin Institute of Technology, Shenzhen, China

21
Zheng F, Li M, Wang Y, Yu W, Wang R, Chen Z, Xiao N, Lu Y. Intensive vision-guided network for radiology report generation. Phys Med Biol 2024; 69:045008. [PMID: 38157546 DOI: 10.1088/1361-6560/ad1995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Accepted: 12/29/2023] [Indexed: 01/03/2024]
Abstract
Objective. Automatic radiology report generation is booming due to its huge application potential for the healthcare industry. However, existing computer vision and natural language processing approaches to tackle this problem are limited in two aspects. First, when extracting image features, most of them neglect multi-view reasoning in vision and model the single-view structure of medical images, such as space-view or channel-view. However, clinicians rely on multi-view imaging information for comprehensive judgment in daily clinical diagnosis. Second, when generating reports, they overlook context reasoning with multi-modal information and focus on pure textual optimization utilizing retrieval-based methods. We aim to address these two issues by proposing a model that better simulates clinicians' perspectives and generates more accurate reports. Approach. Given the above limitation in feature extraction, we propose a globally-intensive attention (GIA) module in the medical image encoder to simulate and integrate multi-view vision perception. GIA aims to learn three types of vision perception: depth view, space view, and pixel view. On the other hand, to address the above problem in report generation, we explore how to involve multi-modal signals to generate precisely matched reports, i.e. how to integrate previously predicted words with region-aware visual content in next word prediction. Specifically, we design a visual knowledge-guided decoder (VKGD), which can adaptively consider how much the model needs to rely on visual information and previously predicted text to assist next word prediction. Hence, our final intensive vision-guided network framework includes a GIA-guided visual encoder and the VKGD. Main results. Experiments on the two commonly-used datasets IU X-RAY and MIMIC-CXR demonstrate the superior ability of our method compared with other state-of-the-art approaches. Significance. Our model explores the potential of simulating clinicians' perspectives and automatically generates more accurate reports, which promotes the exploration of medical automation and intelligence.
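The adaptive reliance on visual versus textual context that the VKGD performs can be pictured as a learned gate; the following is a hypothetical reduction of that idea to a few lines, with dimensions and names invented for illustration.

```python
import torch
import torch.nn as nn

class VisualKnowledgeGate(nn.Module):
    """Illustrative gate: weigh attended image features against the decoder's
    language state before predicting the next word (not the authors' code)."""
    def __init__(self, d):
        super().__init__()
        self.gate = nn.Linear(2 * d, 1)

    def forward(self, text_state, visual_context):
        # text_state, visual_context: (B, d)
        g = torch.sigmoid(self.gate(torch.cat([text_state, visual_context], dim=-1)))
        return g * visual_context + (1.0 - g) * text_state  # fused next-word context
```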
Affiliation(s)
- Fudan Zheng: Sun Yat-Sen University, No. 132 Waihuandong Road, Guangzhou Higher Education Mega Center, Guangzhou, 510006, People's Republic of China
- Mengfei Li: Sun Yat-Sen University, No. 132 Waihuandong Road, Guangzhou Higher Education Mega Center, Guangzhou, 510006, People's Republic of China
- Ying Wang: National SuperComputer Center in Guangzhou, No. 132 Waihuandong Road, Guangzhou Higher Education Mega Center, Guangzhou, 510006, People's Republic of China
- Weijiang Yu: Huawei Technologies Co., Ltd, Huawei Industrial Park, Bantian, Longgang District, Shenzhen, 518129, People's Republic of China
- Ruixuan Wang: Sun Yat-Sen University, No. 132 Waihuandong Road, Guangzhou Higher Education Mega Center, Guangzhou, 510006, People's Republic of China
- Zhiguang Chen: Sun Yat-Sen University and National SuperComputer Center in Guangzhou, No. 132 Waihuandong Road, Guangzhou Higher Education Mega Center, Guangzhou, 510006, People's Republic of China
- Nong Xiao: Sun Yat-Sen University and National SuperComputer Center in Guangzhou, No. 132 Waihuandong Road, Guangzhou Higher Education Mega Center, Guangzhou, 510006, People's Republic of China
- Yutong Lu: Sun Yat-Sen University and National SuperComputer Center in Guangzhou, No. 132 Waihuandong Road, Guangzhou Higher Education Mega Center, Guangzhou, 510006, People's Republic of China

22
Zeng X, Liao T, Xu L, Wang Z. AERMNet: Attention-enhanced relational memory network for medical image report generation. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 244:107979. [PMID: 38113805 DOI: 10.1016/j.cmpb.2023.107979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Revised: 11/26/2023] [Accepted: 12/12/2023] [Indexed: 12/21/2023]
Abstract
BACKGROUND AND OBJECTIVES The automatic generation of medical image diagnostic reports can assist doctors in reducing their workload and improving the efficiency and accuracy of diagnosis. However, most existing report generation models suffer from weak correlation between generated words and a lack of contextual information in the report generation process. METHODS To address the above problems, we propose an Attention-Enhanced Relational Memory Network (AERMNet) model, in which the relational memory module is continuously updated by the words generated in the previous time step to strengthen the correlation between words in the generated medical image report, and the double LSTM with an interaction module reduces the loss of context information and makes full use of feature information. Thus, more accurate disease information can be generated by AERMNet for medical image reports. RESULTS Experimental results on four medical datasets, Fetal heart (FH), Ultrasound, IU X-Ray and MIMIC-CXR, show that our proposed method outperforms some of the previous models with respect to language generation metrics (CIDEr improving by 2.4% on FH, BLEU-1 improving by 2.4% on Ultrasound, CIDEr improving by 16.4% on IU X-Ray, BLEU-2 improving by 9.7% on MIMIC-CXR). CONCLUSIONS This work promotes the development of medical image report generation and expands the prospects of computer-aided diagnosis applications. Our code is released at https://github.com/llttxx/AERMNET.
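A relational memory refreshed by the previously generated word, as the abstract describes, might look like the following sketch; slot count, gating, and attention layout are assumptions made for illustration rather than AERMNet's published design.

```python
import torch
import torch.nn as nn

class RelationalMemory(nn.Module):
    """Sketch of a word-conditioned relational memory (illustrative, not AERMNet's code)."""
    def __init__(self, slots=3, d=512, heads=8):
        super().__init__()
        self.init_mem = nn.Parameter(torch.randn(slots, d))
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.gate = nn.Linear(2 * d, 2 * d)

    def initial(self, batch_size):
        return self.init_mem.unsqueeze(0).expand(batch_size, -1, -1)

    def step(self, memory, prev_word_emb):
        # Relate memory slots to the word emitted at the previous time step.
        kv = torch.cat([memory, prev_word_emb.unsqueeze(1)], dim=1)   # (B, slots+1, d)
        update, _ = self.attn(memory, kv, kv)                          # (B, slots, d)
        i, f = self.gate(torch.cat([update, memory], dim=-1)).chunk(2, dim=-1)
        return torch.sigmoid(f) * memory + torch.sigmoid(i) * torch.tanh(update)
```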
Affiliation(s)
- Xianhua Zeng: College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Tianxing Liao: College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Liming Xu: College of Computer Science, China West Normal University, Nanchong, Sichuan, 637000, China
- Zhiqiang Wang: College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

23
Shao L, Chen B, Zhang Z, Zhang Z, Chen X. Artificial intelligence generated content (AIGC) in medicine: A narrative review. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2024; 21:1672-1711. [PMID: 38303483 DOI: 10.3934/mbe.2024073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/03/2024]
Abstract
Recently, artificial intelligence generated content (AIGC) has been receiving increased attention and is growing exponentially. AIGC is produced by generative artificial intelligence (AI) models from the intentional information extracted from human-provided instructions, and it can generate large amounts of high-quality content quickly and automatically. Medicine currently faces a shortage of medical resources and the complexity of many medical procedures; owing to its characteristics, AIGC can help alleviate these problems. As a result, the application of AIGC in medicine has gained increased attention in recent years. Therefore, this paper provides a comprehensive review of the recent state of studies involving AIGC in medicine. First, we present an overview of AIGC. Furthermore, based on recent studies, the application of AIGC in medicine is reviewed from two aspects: medical image processing and medical text generation. The basic generative AI models, tasks, target organs, datasets and contributions of the studies are considered and summarized. Finally, we also discuss the limitations and challenges faced by AIGC and propose possible solutions with relevant studies. We hope this review can help readers understand the potential of AIGC in medicine and obtain innovative ideas in this field.
Affiliation(s)
- Liangjing Shao: Academy for Engineering & Technology, Fudan University, Shanghai 200433, China; Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, Shanghai 200032, China
- Benshuang Chen: Academy for Engineering & Technology, Fudan University, Shanghai 200433, China; Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, Shanghai 200032, China
- Ziqun Zhang: Information Office, Fudan University, Shanghai 200032, China
- Zhen Zhang: Baoshan Branch of Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200444, China
- Xinrong Chen: Academy for Engineering & Technology, Fudan University, Shanghai 200433, China; Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, Shanghai 200032, China

24
Ouis MY, A Akhloufi M. Deep learning for report generation on chest X-ray images. Comput Med Imaging Graph 2024; 111:102320. [PMID: 38134726 DOI: 10.1016/j.compmedimag.2023.102320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2023] [Revised: 11/13/2023] [Accepted: 11/29/2023] [Indexed: 12/24/2023]
Abstract
Medical imaging, specifically chest X-ray image analysis, is a crucial component of early disease detection and screening in healthcare. Deep learning techniques, such as convolutional neural networks (CNNs), have emerged as powerful tools for computer-aided diagnosis (CAD) in chest X-ray image analysis. These techniques have shown promising results in automating tasks such as classification, detection, and segmentation of abnormalities in chest X-ray images, with the potential to surpass human radiologists. In this review, we provide an overview of the importance of chest X-ray image analysis, historical developments, impact of deep learning techniques, and availability of labeled databases. We specifically focus on advancements and challenges in radiology report generation using deep learning, highlighting potential future advancements in this area. The use of deep learning for report generation has the potential to reduce the burden on radiologists, improve patient care, and enhance the accuracy and efficiency of chest X-ray image analysis in medical imaging.
Affiliation(s)
- Mohammed Yasser Ouis: Perception, Robotics and Intelligent Machines Lab (PRIME), Department of Computer Science, Université de Moncton, Moncton, NB E1C 3E9, Canada
- Moulay A Akhloufi: Perception, Robotics and Intelligent Machines Lab (PRIME), Department of Computer Science, Université de Moncton, Moncton, NB E1C 3E9, Canada

25
Gao D, Kong M, Zhao Y, Huang J, Huang Z, Kuang K, Wu F, Zhu Q. Simulating doctors' thinking logic for chest X-ray report generation via Transformer-based Semantic Query learning. Med Image Anal 2024; 91:102982. [PMID: 37837692 DOI: 10.1016/j.media.2023.102982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2023] [Revised: 08/20/2023] [Accepted: 09/26/2023] [Indexed: 10/16/2023]
Abstract
Medical report generation can be treated as a process of doctors' observing, understanding, and describing images from different perspectives. Following this process, this paper proposes a Transformer-based Semantic Query learning paradigm (TranSQ). Briefly, this paradigm learns an intention embedding set, makes a semantic query to the visual features, generates intent-compliant sentence candidates, and forms a coherent report. We apply a bipartite matching mechanism during training to realize the dynamic correspondence between the intention embeddings and the sentences, inducing medical concepts into the observation intentions. Experimental results on two major radiology reporting datasets (i.e., IU X-ray and MIMIC-CXR) demonstrate that our model outperforms state-of-the-art models regarding generation effectiveness and clinical efficacy. In addition, comprehensive ablation experiments fully validate the TranSQ model's innovation and interpretation. The code is available at https://github.com/zjukongming/TranSQ.
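The training-time bipartite matching between intention embeddings and report sentences can be realized with the Hungarian algorithm; the sketch below assumes sentence and query embeddings are already computed and uses negative cosine similarity as the assignment cost, which is one plausible choice rather than the paper's exact cost.

```python
# Illustrative bipartite matching of semantic queries to report sentences.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_queries_to_sentences(query_emb, sent_emb):
    """query_emb: (Q, d) candidate-sentence embeddings; sent_emb: (S, d) ground truth."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    s = sent_emb / np.linalg.norm(sent_emb, axis=1, keepdims=True)
    cost = -(q @ s.T)                           # higher similarity -> lower cost
    q_idx, s_idx = linear_sum_assignment(cost)  # optimal one-to-one assignment
    return list(zip(q_idx.tolist(), s_idx.tolist()))
```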
Affiliation(s)
- Danyang Gao: Computer School, Beijing Information Science and Technology University, Beijing 100005, China
- Ming Kong: College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
- Yongrui Zhao: Computer School, Beijing Information Science and Technology University, Beijing 100005, China
- Jing Huang: College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
- Zhengxing Huang: College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
- Kun Kuang: College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
- Fei Wu: College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
- Qiang Zhu: College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

26
Azad R, Kazerouni A, Heidari M, Aghdam EK, Molaei A, Jia Y, Jose A, Roy R, Merhof D. Advances in medical image analysis with vision Transformers: A comprehensive review. Med Image Anal 2024; 91:103000. [PMID: 37883822 DOI: 10.1016/j.media.2023.103000] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2023] [Revised: 09/30/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
The remarkable performance of the Transformer architecture in natural language processing has recently also triggered broad interest in Computer Vision. Among other merits, Transformers have been shown to be capable of learning long-range dependencies and spatial correlations, which is a clear advantage over convolutional neural networks (CNNs), which have been the de facto standard in Computer Vision problems so far. Thus, Transformers have become an integral part of modern medical image analysis. In this review, we provide an encyclopedic review of the applications of Transformers in medical imaging. Specifically, we present a systematic and thorough review of relevant recent Transformer literature for different medical image analysis tasks, including classification, segmentation, detection, registration, synthesis, and clinical report generation. For each of these applications, we investigate the novelty, strengths and weaknesses of the different proposed strategies and develop taxonomies highlighting key properties and contributions. Further, if applicable, we outline current benchmarks on different datasets. Finally, we summarize key challenges and discuss different future research directions. In addition, we have provided cited papers with their corresponding implementations at https://github.com/mindflow-institue/Awesome-Transformer.
Affiliation(s)
- Reza Azad: Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
- Amirhossein Kazerouni: School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
- Moein Heidari: School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
- Amirali Molaei: School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
- Yiwei Jia: Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
- Abin Jose: Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
- Rijo Roy: Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
- Dorit Merhof: Faculty of Informatics and Data Science, University of Regensburg, Regensburg, Germany; Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany

27
Guo B, Liu H, Niu L. Safe physical interaction with cobots: a multi-modal fusion approach for health monitoring. Front Neurorobot 2023; 17:1265936. [PMID: 38111712 PMCID: PMC10725971 DOI: 10.3389/fnbot.2023.1265936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2023] [Accepted: 11/06/2023] [Indexed: 12/20/2023] Open
Abstract
Health monitoring is a critical aspect of personalized healthcare, enabling early detection and intervention for various medical conditions. The emergence of cloud-based robot-assisted systems has opened new possibilities for efficient and remote health monitoring. In this paper, we present a Transformer-based multi-modal fusion approach for health monitoring, focusing on the effects of cognitive workload, assessment of cognitive workload in human-machine collaboration, and acceptability in human-machine interactions. Additionally, we investigate biomechanical strain measurement and evaluation, utilizing wearable devices to assess biomechanical risks in working environments. Furthermore, we study muscle fatigue assessment during collaborative tasks and propose methods for improving safe physical interaction with cobots. Our approach integrates multi-modal data, including visual, audio, and sensor-based inputs, enabling a holistic assessment of an individual's health status. The core of our method lies in leveraging the powerful Transformer model, known for its ability to capture complex relationships in sequential data. Through effective fusion and representation learning, our approach extracts meaningful features for accurate health monitoring. Experimental results on diverse datasets demonstrate the superiority of our Transformer-based multi-modal fusion approach, outperforming existing methods in capturing intricate patterns and predicting health conditions. The significance of our research lies in revolutionizing remote health monitoring, providing more accurate and personalized healthcare services.
Affiliation(s)
- Bo Guo: School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China; Department of Computing, Faculty of Communication, Visual Art and Computing, Universiti Selangor, Selangor, Malaysia
- Huaming Liu: School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
- Lei Niu: School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China

28
Zhao G, Zhao Z, Gong W, Li F. Radiology report generation with medical knowledge and multilevel image-report alignment: A new method and its verification. Artif Intell Med 2023; 146:102714. [PMID: 38042601 DOI: 10.1016/j.artmed.2023.102714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2022] [Revised: 11/01/2023] [Accepted: 11/01/2023] [Indexed: 12/04/2023]
Abstract
Medical report generation is an integral part of computer-aided diagnosis aimed at reducing the workload of radiologists and physicians and alerting them of misdiagnosis risks. In general, medical report generation is an image captioning task. Since medical reports have long sequences with data bias, the existing medical report generation models lack medical knowledge and ignore the interaction alignment between the two modalities of reports and images. The current paper attempts to mitigate these deficiencies by proposing an approach based on knowledge enhancement with multilevel alignment (MKMIA). To this end, it includes a knowledge enhancement (MKE) module and a multilevel alignment module (MIRA). Specifically, the MKE deals with general medical knowledge (MK) and historical knowledge (HK) obtained via data training. The general knowledge is embedded in the form of a dictionary with characteristic organs (referred to as Key) and organ aliases, disease symptoms, etc. (referred to as Value). It provides explicit exception candidates to mitigate data bias. Historical knowledge ensures the comparison of similar cases to provide a better diagnosis. MIRA furnishes coarse-to-fine multilevel alignment, reducing the gap between image and text features, improving the knowledge enhancement module's performance, and facilitating the generation of lengthy reports. Experimental results on two radiology report datasets (i.e., IU X-ray and MIMIC-CXR) proved the effectiveness of the proposed approach, achieving state-of-the-art performance.
Affiliation(s)
- Guosheng Zhao: School of Control Science and Engineering, Shandong University, Jinan, 250061, China
- Zijian Zhao: School of Control Science and Engineering, Shandong University, Jinan, 250061, China
- Wuxian Gong: Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, 250021, China
- Feng Li: Department of General Surgery, Qilu Hospital of Shandong University, Jinan, 250012, China

29
Zhang Z, Zhang X, Ichiji K, Bukovský I, Homma N. How intra-source imbalanced datasets impact the performance of deep learning for COVID-19 diagnosis using chest X-ray images. Sci Rep 2023; 13:19049. [PMID: 37923762 PMCID: PMC10624834 DOI: 10.1038/s41598-023-45368-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Accepted: 10/18/2023] [Indexed: 11/06/2023] Open
Abstract
Over the past decade, the use of deep learning has been widely increasing in the medical image diagnosis field. The performance of deep learning-based methods (DLMs) strongly relies on training data. Therefore, researchers often focus on collecting as much data as possible from different medical facilities or on developing approaches to avoid the impact of inter-category imbalance (ICI), i.e., a difference in data quantity among categories. However, medical data are often isolated and acquired under different settings among medical facilities, so an imbalance also arises within each facility; this is known as the intra-source imbalance (ISI) characteristic. This imbalance likewise impacts the performance of DLMs but has received negligible attention. In this study, we examine the impact of ISI on DLMs by comparing versions of a deep learning model trained separately on an intra-source imbalanced chest X-ray (CXR) dataset and an intra-source balanced CXR dataset for COVID-19 diagnosis. The finding is that using the intra-source imbalanced dataset causes serious training bias, even though the dataset has good inter-category balance. In contrast, the deep learning model performed reliable diagnosis when trained on the intra-source balanced dataset. Therefore, our study reports clear evidence that intra-source balance is vital for training data to minimize the risk of poor performance of DLMs.
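The notion of an intra-source balanced training set can be made concrete with a small resampling helper: for every (source facility, category) cell, draw the same number of images. This is a generic sketch under assumed tuple conventions, not the authors' curation protocol.

```python
import random
from collections import defaultdict

def intra_source_balance(samples, cap=500, seed=0):
    """samples: iterable of (image_path, category, source) tuples (illustrative)."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for path, category, source in samples:
        cells[(source, category)].append((path, category, source))
    n = min(cap, min(len(v) for v in cells.values()))  # equalize every cell
    balanced = [x for v in cells.values() for x in rng.sample(v, n)]
    rng.shuffle(balanced)
    return balanced
```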
Affiliation(s)
- Zhang Zhang: Graduate School of Biomedical Engineering, Tohoku University, Sendai, 980-8576, Japan
- Xiaoyong Zhang: Department of General Engineering, National Institute of Technology, Sendai College, Sendai, 989-3128, Japan; Institute of Development, Aging and Cancer, Tohoku University, Sendai, 980-8576, Japan
- Kei Ichiji: Tohoku University Graduate School of Medicine, Tohoku University, Sendai, 980-8576, Japan
- Ivo Bukovský: Department of Computer Science, Faculty of Science, University of South Bohemia in Ceske Budejovice, 370 05, Ceske Budejovice, Czech Republic
- Noriyasu Homma: Graduate School of Biomedical Engineering, Tohoku University, Sendai, 980-8576, Japan; Institute of Development, Aging and Cancer, Tohoku University, Sendai, 980-8576, Japan; Tohoku University Graduate School of Medicine, Tohoku University, Sendai, 980-8576, Japan

30
Zhang S, Zhou C, Chen L, Li Z, Gao Y, Chen Y. Visual prior-based cross-modal alignment network for radiology report generation. Comput Biol Med 2023; 166:107522. [PMID: 37820559 DOI: 10.1016/j.compbiomed.2023.107522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2023] [Revised: 09/08/2023] [Accepted: 09/19/2023] [Indexed: 10/13/2023]
Abstract
Automated radiology report generation is gaining popularity as a means to alleviate the workload of radiologists and prevent misdiagnosis and missed diagnoses. By imitating the working patterns of radiologists, previous report generation approaches have achieved remarkable performance. However, these approaches suffer from two significant problems: (1) lack of visual prior: medical observations in radiology images are interdependent and exhibit certain patterns, and lack of such visual prior can result in reduced accuracy in identifying abnormal regions; (2) lack of alignment between images and texts: the absence of annotations and alignments for regions of interest in the radiology images and reports can lead to inconsistent visual and textual features of the abnormal regions generated by the model. To address these issues, we propose a Visual Prior-based Cross-modal Alignment Network for radiology report generation. First, we propose a novel Contrastive Attention that compares input image with normal images to extract difference information, namely visual prior, which helps to identify abnormalities quickly. Then, to facilitate the alignment of images and texts, we propose a Cross-modal Alignment Network that leverages the cross-modal matrix initialized by the features generated by pre-trained models, to compute cross-modal responses for visual and textual features. Finally, a Visual Prior-guided Multi-Head Attention is proposed to incorporate the visual prior into the generation process. The extensive experimental results on two benchmark datasets, IU-Xray and MIMIC-CXR, illustrate that our proposed model outperforms the state-of-the-art models over almost all metrics, achieving BLEU-4 scores of 0.188 and 0.116 and CIDEr scores of 0.409 and 0.240, respectively.
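The contrastive-attention idea of extracting difference information against normal references could be sketched as below; pooling normal-image features and subtracting the attended "common" part is an assumption about one plausible realization, not the paper's exact module.

```python
import torch
import torch.nn as nn

class ContrastiveAttention(nn.Module):
    """Illustrative visual prior: contrast input features against normal CXR features."""
    def __init__(self, d, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, img_feats, normal_feats):
        # img_feats: (B, N, d); normal_feats: (B, M, d) pooled from reference normal images
        common, _ = self.attn(img_feats, normal_feats, normal_feats)
        return img_feats - common   # residual approximates abnormality (difference) cues
```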
Affiliation(s)
- Sheng Zhang: Key Laboratory of Digital Media Technology of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Chuan Zhou: Key Laboratory of Digital Media Technology of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Leiting Chen: Key Laboratory of Digital Media Technology of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Zhiheng Li: Key Laboratory of Digital Media Technology of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Yuan Gao: Key Laboratory of Digital Media Technology of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Yongqi Chen: Key Laboratory of Digital Media Technology of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 611731, China

31
Sun Z, Lin M, Zhu Q, Xie Q, Wang F, Lu Z, Peng Y. A scoping review on multimodal deep learning in biomedical images and texts. J Biomed Inform 2023; 146:104482. [PMID: 37652343 PMCID: PMC10591890 DOI: 10.1016/j.jbi.2023.104482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2023] [Revised: 07/18/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
OBJECTIVE Computer-assisted diagnostic and prognostic systems of the future should be capable of simultaneously processing multimodal data. Multimodal deep learning (MDL), which involves the integration of multiple sources of data, such as images and text, has the potential to revolutionize the analysis and interpretation of biomedical data. However, it only caught researchers' attention recently. To this end, there is a critical need to conduct a systematic review on this topic, identify the limitations of current work, and explore future directions. METHODS In this scoping review, we aim to provide a comprehensive overview of the current state of the field and identify key concepts, types of studies, and research gaps with a focus on biomedical images and texts joint learning, mainly because these two were the most commonly available data types in MDL research. RESULT This study reviewed the current uses of multimodal deep learning on five tasks: (1) Report generation, (2) Visual question answering, (3) Cross-modal retrieval, (4) Computer-aided diagnosis, and (5) Semantic segmentation. CONCLUSION Our results highlight the diverse applications and potential of MDL and suggest directions for future research in the field. We hope our review will facilitate the collaboration of natural language processing (NLP) and medical imaging communities and support the next generation of decision-making and computer-assisted diagnostic system development.
Affiliation(s)
- Zhaoyi Sun: Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA
- Mingquan Lin: Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA
- Qingqing Zhu: National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA
- Qianqian Xie: Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA
- Fei Wang: Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA
- Zhiyong Lu: National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA
- Yifan Peng: Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA

32
Nicolson A, Dowling J, Koopman B. Improving chest X-ray report generation by leveraging warm starting. Artif Intell Med 2023; 144:102633. [PMID: 37783533 DOI: 10.1016/j.artmed.2023.102633] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2022] [Revised: 07/11/2023] [Accepted: 08/11/2023] [Indexed: 10/04/2023]
Abstract
Automatically generating a report from a patient's Chest X-rays (CXRs) is a promising solution to reducing clinical workload and improving patient care. However, current CXR report generators, which are predominantly encoder-to-decoder models, lack the diagnostic accuracy to be deployed in a clinical setting. To improve CXR report generation, we investigate warm starting the encoder and decoder with recent open-source computer vision and natural language processing checkpoints, such as the Vision Transformer (ViT) and PubMedBERT. To this end, each checkpoint is evaluated on the MIMIC-CXR and IU X-ray datasets. Our experimental investigation demonstrates that the Convolutional vision Transformer (CvT) ImageNet-21K and the Distilled Generative Pre-trained Transformer 2 (DistilGPT2) checkpoints are best for warm starting the encoder and decoder, respectively. Compared to the state-of-the-art (M2 Transformer Progressive), CvT2DistilGPT2 attained an improvement of 8.3% for CE F-1, 1.8% for BLEU-4, 1.6% for ROUGE-L, and 1.0% for METEOR. The reports generated by CvT2DistilGPT2 have a higher similarity to radiologist reports than previous approaches. This indicates that leveraging warm starting improves CXR report generation. Code and checkpoints for CvT2DistilGPT2 are available at https://github.com/aehrc/cvt2distilgpt2.
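The warm-starting recipe maps directly onto the Hugging Face vision-encoder-decoder helper. The sketch below pairs a ViT encoder with DistilGPT2 for brevity (the paper's best pairing was CvT ImageNet-21K with DistilGPT2); the checkpoints shown are standard public ones, and the training loop is left out.

```python
from transformers import VisionEncoderDecoderModel, AutoTokenizer

# Warm start both halves from public checkpoints, then fine-tune on CXR-report pairs.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k",  # vision encoder checkpoint
    "distilgpt2",                         # language decoder checkpoint
)
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
# ... fine-tune on MIMIC-CXR / IU X-ray image-report pairs before evaluation.
```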
Affiliation(s)
- Aaron Nicolson: The Australian e-Health Research Centre, CSIRO Health and Biosecurity, Brisbane, Australia
- Jason Dowling: The Australian e-Health Research Centre, CSIRO Health and Biosecurity, Brisbane, Australia
- Bevan Koopman: The Australian e-Health Research Centre, CSIRO Health and Biosecurity, Brisbane, Australia

33
Hou X, Liu Z, Li X, Li X, Sang S, Zhang Y. MKCL: Medical Knowledge with Contrastive Learning model for radiology report generation. J Biomed Inform 2023; 146:104496. [PMID: 37704104 DOI: 10.1016/j.jbi.2023.104496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2022] [Revised: 08/30/2023] [Accepted: 09/07/2023] [Indexed: 09/15/2023]
Abstract
Automatic radiology report generation has the potential to alert inexperienced radiologists to misdiagnoses or missed diagnoses and to improve healthcare delivery efficiency by reducing the documentation workload of radiologists. Motivated by the continuous development of automatic image captioning, more and more deep learning methods have been proposed for automatic radiology report generation. However, the visual and textual data bias problem still poses many challenges in the medical domain. Additionally, many models do not integrate medical knowledge, ignore the mutual influences between medical findings, and leave abundant unlabeled medical images unused, all of which affect the accuracy of the generated reports. In this paper, we propose a Medical Knowledge with Contrastive Learning model (MKCL) to enhance radiology report generation. The proposed model MKCL uses the IU Medical Knowledge Graph (IU-MKG) to mine the relationships among medical findings and improve the accuracy of identifying positive disease findings from radiologic medical images. In particular, we design Knowledge Enhanced Attention (KEA), which integrates the IU-MKG and the extracted chest radiological visual features to alleviate textual data bias. Meanwhile, this paper leverages supervised contrastive learning to make use of radiographic medical images which have not been labeled and to identify abnormalities from images. Experimental results on the public dataset IU X-ray show that our proposed model MKCL outperforms other state-of-the-art report generation methods. Ablation studies also demonstrate that the IU medical knowledge graph module and the supervised contrastive learning module enhance the ability of the model to detect abnormal parts and accurately describe abnormal findings. The source code is available at: https://github.com/Eleanorhxd/MKCL.
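Supervised contrastive learning, which the abstract leverages, has a standard loss form; this is a generic sketch of that objective (pull same-label embeddings together, push others apart), not MKCL's exact implementation.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, tau=0.07):
    """features: (N, d) image embeddings; labels: (N,) integer class labels."""
    z = F.normalize(features, dim=1)
    sim = z @ z.T / tau
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    logits = sim.masked_fill(self_mask, -1e9)            # exclude self-pairs
    pos = (labels[:, None] == labels[None, :]).float()
    pos.masked_fill_(self_mask, 0.0)
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    denom = pos.sum(1).clamp(min=1.0)                    # avoid divide-by-zero
    return -((pos * log_prob).sum(1) / denom).mean()
```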
Affiliation(s)
- Xiaodi Hou: School of Information Science and Technology, Dalian Maritime University, Dalian 116026, Liaoning, China
- Zhi Liu: School of Information Science and Technology, Dalian Maritime University, Dalian 116026, Liaoning, China
- Xiaobo Li: School of Information Science and Technology, Dalian Maritime University, Dalian 116026, Liaoning, China
- Xingwang Li: School of Information Science and Technology, Dalian Maritime University, Dalian 116026, Liaoning, China
- Shengtian Sang: Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Yijia Zhang: School of Information Science and Technology, Dalian Maritime University, Dalian 116026, Liaoning, China

34
Zhang J, Shen X, Wan S, Goudos SK, Wu J, Cheng M, Zhang W. A Novel Deep Learning Model for Medical Report Generation by Inter-Intra Information Calibration. IEEE J Biomed Health Inform 2023; 27:5110-5121. [PMID: 37018727 DOI: 10.1109/jbhi.2023.3236661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
Automatic generation of medical reports can provide diagnostic assistance to doctors and reduce their workload. To improve the quality of generated medical reports, injecting auxiliary information into the model through knowledge graphs or templates is widely adopted in previous methods. However, these methods suffer from two problems: 1) the injected external information is limited in amount and can hardly meet the information needs of medical report generation in content; 2) the injected external information increases the complexity of the model and is hard to integrate reasonably into the generation process of medical reports. Therefore, we propose an Information Calibrated Transformer (ICT) to address the above issues. First, we design a Precursor-information Enhancement Module (PEM), which can effectively extract numerous inter-intra report features from the datasets as auxiliary information without external injection, and this auxiliary information can be dynamically updated during the training process. Second, a combination mode, which consists of the PEM and our proposed Information Calibration Attention Module (ICA), is designed and embedded into ICT. In this method, the auxiliary information extracted by the PEM is flexibly injected into ICT with only a small increase in model parameters. Comprehensive evaluations validate that ICT is not only superior to previous methods on the X-ray datasets IU-X-Ray and MIMIC-CXR, but can also be successfully extended to COV-CTR, a CT COVID-19 dataset.
35
Liu Z, Lv Q, Yang Z, Li Y, Lee CH, Shen L. Recent progress in transformer-based medical image analysis. Comput Biol Med 2023; 164:107268. [PMID: 37494821 DOI: 10.1016/j.compbiomed.2023.107268] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Revised: 05/30/2023] [Accepted: 07/16/2023] [Indexed: 07/28/2023]
Abstract
The transformer is primarily used in the field of natural language processing. Recently, it has been adopted and shows promise in the computer vision (CV) field. Medical image analysis (MIA), as a critical branch of CV, also greatly benefits from this state-of-the-art technique. In this review, we first recap the core component of the transformer, the attention mechanism, and the detailed structures of the transformer. After that, we depict the recent progress of the transformer in the field of MIA. We organize the applications in a sequence of different tasks, including classification, segmentation, captioning, registration, detection, enhancement, localization, and synthesis. The mainstream classification and segmentation tasks are further divided into eleven medical image modalities. A large number of experiments studied in this review illustrate that the transformer-based method outperforms existing methods through comparisons with multiple evaluation metrics. Finally, we discuss the open challenges and future opportunities in this field. This task-modality review with the latest contents, detailed information, and comprehensive comparison may greatly benefit the broad MIA community.
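Since the review recaps the attention mechanism as the transformer's core component, the canonical scaled dot-product attention is worth writing out; this is the textbook formulation rather than anything specific to the review.

```python
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(q k^T / sqrt(d_k)) v, the operation the review builds on."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```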
Affiliation(s)
- Zhaoshan Liu: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore
- Qiujie Lv: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore; School of Intelligent Systems Engineering, Sun Yat-sen University, No. 66, Gongchang Road, Guangming District, 518107, China
- Ziduo Yang: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore; School of Intelligent Systems Engineering, Sun Yat-sen University, No. 66, Gongchang Road, Guangming District, 518107, China
- Yifan Li: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore
- Chau Hung Lee: Department of Radiology, Tan Tock Seng Hospital, 11 Jalan Tan Tock Seng, Singapore, 308433, Singapore
- Lei Shen: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore

36
Gu Y, Li R, Wang X, Zhou Z. Automatic Medical Report Generation Based on Cross-View Attention and Visual-Semantic Long Short Term Memorys. Bioengineering (Basel) 2023; 10:966. [PMID: 37627851 PMCID: PMC10451690 DOI: 10.3390/bioengineering10080966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2023] [Revised: 08/06/2023] [Accepted: 08/07/2023] [Indexed: 08/27/2023] Open
Abstract
Automatic medical report generation based on deep learning can improve the efficiency of diagnosis and reduce costs. Although several automatic report generation algorithms have been proposed, there are still two main challenges in generating more detailed and accurate diagnostic reports: using multi-view images reasonably and integrating visual and semantic features of key lesions effectively. To overcome these challenges, we propose a novel automatic report generation approach. We first propose the Cross-View Attention Module to process and strengthen the multi-perspective features of medical images, using mean square error loss to unify the learning effect of fusing single-view and multi-view images. Then, we design the module Medical Visual-Semantic Long Short Term Memorys to integrate and record the visual and semantic temporal information of each diagnostic sentence, which enhances the multi-modal features to generate more accurate diagnostic sentences. Applied to the open-source Indiana University X-ray dataset, our model achieved an average improvement of 0.8% over the state-of-the-art (SOTA) model on six evaluation metrics. This demonstrates that our model is capable of generating more detailed and accurate diagnostic reports.
Affiliation(s)
- Yunchao Gu: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China; Hangzhou Innovation Institute, Beihang University, Hangzhou 310051, China; Research Unit of Virtual Body and Virtual Surgery Technologies, Chinese Academy of Medical Sciences, 2019RU004, Beijing 100191, China
- Renyu Li: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China
- Xinliang Wang: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China
- Zhong Zhou: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China

37
Shamshad F, Khan S, Zamir SW, Khan MH, Hayat M, Khan FS, Fu H. Transformers in medical imaging: A survey. Med Image Anal 2023; 88:102802. [PMID: 37315483 DOI: 10.1016/j.media.2023.102802] [Citation(s) in RCA: 69] [Impact Index Per Article: 69.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2022] [Revised: 03/11/2023] [Accepted: 03/23/2023] [Indexed: 06/16/2023]
Abstract
Following unprecedented success on the natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as de facto operators. Capitalizing on these advances in computer vision, the medical imaging field has also witnessed growing interest for Transformers that can capture global context compared to CNNs with local receptive fields. Inspired from this transition, in this survey, we attempt to provide a comprehensive review of the applications of Transformers in medical imaging covering various aspects, ranging from recently proposed architectural designs to unsolved issues. Specifically, we survey the use of Transformers in medical image segmentation, detection, classification, restoration, synthesis, registration, clinical report generation, and other tasks. In particular, for each of these applications, we develop taxonomy, identify application-specific challenges as well as provide insights to solve them, and highlight recent trends. Further, we provide a critical discussion of the field's current state as a whole, including the identification of key challenges, open problems, and outlining promising future directions. We hope this survey will ignite further interest in the community and provide researchers with an up-to-date reference regarding applications of Transformer models in medical imaging. Finally, to cope with the rapid development in this field, we intend to regularly update the relevant latest papers and their open-source implementations at https://github.com/fahadshamshad/awesome-transformers-in-medical-imaging.
Affiliation(s)
- Fahad Shamshad: MBZ University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Salman Khan: MBZ University of Artificial Intelligence, Abu Dhabi, United Arab Emirates; CECS, Australian National University, Canberra ACT 0200, Australia
- Syed Waqas Zamir: Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Munawar Hayat: Faculty of IT, Monash University, Clayton VIC 3800, Australia
- Fahad Shahbaz Khan: MBZ University of Artificial Intelligence, Abu Dhabi, United Arab Emirates; Computer Vision Laboratory, Linköping University, Sweden
- Huazhu Fu: Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore

38
Feyisa DW, Ayano YM, Debelee TG, Schwenker F. Weak Localization of Radiographic Manifestations in Pulmonary Tuberculosis from Chest X-ray: A Systematic Review. SENSORS (BASEL, SWITZERLAND) 2023; 23:6781. [PMID: 37571564 PMCID: PMC10422452 DOI: 10.3390/s23156781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/15/2023] [Revised: 07/03/2023] [Accepted: 07/14/2023] [Indexed: 08/13/2023]
Abstract
Pulmonary tuberculosis (PTB) is a bacterial infection that affects the lung. PTB remains one of the infectious diseases with the highest global mortalities. Chest radiography is a technique that is often employed in the diagnosis of PTB. Radiologists identify the severity and stage of PTB by inspecting radiographic features in the patient's chest X-ray (CXR). The most common radiographic features seen on CXRs include cavitation, consolidation, masses, pleural effusion, calcification, and nodules. Identifying these CXR features will help physicians in diagnosing a patient. However, identifying these radiographic features for intricate disorders is challenging, and the accuracy depends on the radiologist's experience and level of expertise. So, researchers have proposed deep learning (DL) techniques to detect and mark areas of tuberculosis infection in CXRs. DL models have been proposed in the literature because of their inherent capacity to detect diseases and segment the manifestation regions from medical images. However, fully supervised semantic segmentation requires several pixel-by-pixel labeled images. The annotation of such a large amount of data by trained physicians has some challenges. First, the annotation requires a significant amount of time. Second, hiring trained physicians is expensive. In addition, the subjectivity of medical data poses a difficulty in having standardized annotation. As a result, there is increasing interest in weak localization techniques. Therefore, in this review, we identify methods employed in the weakly supervised segmentation and localization of radiographic manifestations of pulmonary tuberculosis from chest X-rays. First, we identify the most commonly used public chest X-ray datasets for tuberculosis identification. Following that, we discuss the approaches for weakly localizing tuberculosis radiographic manifestations in chest X-rays. The weakly supervised localization of PTB can highlight the region of the chest X-ray image that contributed the most to the DL model's classification output and help pinpoint the diseased area. Finally, we discuss the limitations and challenges of weakly supervised techniques in localizing TB manifestation regions in chest X-ray images.
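A minimal instance of the weak-localization idea the review surveys is class activation mapping on a global-average-pooled classifier; the ResNet backbone below is a stand-in for a trained TB classifier, not a model from the review, and the shapes assume a 224x224 input.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V2").eval()  # placeholder for a trained TB classifier
feats = {}
model.layer4.register_forward_hook(lambda mod, inp, out: feats.update(fmap=out))

@torch.no_grad()
def class_activation_map(x, class_idx):
    model(x)                                       # x: (1, 3, 224, 224)
    w = model.fc.weight[class_idx]                 # (2048,) classifier weights for the class
    cam = F.relu((w[:, None, None] * feats["fmap"][0]).sum(0))  # (7, 7) coarse heatmap
    return cam / (cam.max() + 1e-8)                # upsample over the CXR to localize
```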
Affiliation(s)
- Degaga Wolde Feyisa: Ethiopian Artificial Intelligence Institute, Addis Ababa P.O. Box 40782, Ethiopia
- Yehualashet Megersa Ayano: Ethiopian Artificial Intelligence Institute, Addis Ababa P.O. Box 40782, Ethiopia
- Taye Girma Debelee: Ethiopian Artificial Intelligence Institute, Addis Ababa P.O. Box 40782, Ethiopia; Department of Electrical and Computer Engineering, Addis Ababa Science and Technology University, Addis Ababa P.O. Box 120611, Ethiopia
- Friedhelm Schwenker: Institute of Neural Information Processing, Ulm University, 89069 Ulm, Germany

39
Cai L, Li J, Lv H, Liu W, Niu H, Wang Z. Integrating domain knowledge for biomedical text analysis into deep learning: A survey. J Biomed Inform 2023; 143:104418. [PMID: 37290540 DOI: 10.1016/j.jbi.2023.104418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2022] [Revised: 04/24/2023] [Accepted: 05/31/2023] [Indexed: 06/10/2023]
Abstract
The past decade has witnessed an explosion of textual information in the biomedical field. Biomedical texts provide a basis for healthcare delivery, knowledge discovery, and decision-making. Over the same period, deep learning has achieved remarkable performance in biomedical natural language processing; however, its development has been limited by the scarcity of well-annotated datasets and by limited interpretability. To address this, researchers have considered combining domain knowledge (such as biomedical knowledge graphs) with biomedical data, which has become a promising means of introducing more information into biomedical datasets and of following evidence-based medicine. This paper comprehensively reviews more than 150 recent literature studies on incorporating domain knowledge into deep learning models to facilitate typical biomedical text analysis tasks, including information extraction, text classification, and text generation. We finally discuss various challenges and future directions.
Affiliation(s)
- Linkun Cai: School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
- Jia Li: Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
- Han Lv: Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
- Wenjuan Liu: Aerospace Center Hospital, 100049 Beijing, China
- Haijun Niu: School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
- Zhenchang Wang: School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China; Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China

40
Das S, Ayus I, Gupta D. A comprehensive review of COVID-19 detection with machine learning and deep learning techniques. HEALTH AND TECHNOLOGY 2023; 13:1-14. [PMID: 37363343 PMCID: PMC10244837 DOI: 10.1007/s12553-023-00757-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2022] [Accepted: 05/14/2023] [Indexed: 06/28/2023]
Abstract
Purpose The first transmission of coronavirus to humans started in Wuhan city of China and took the shape of a pandemic caused by Corona Virus Disease 2019 (COVID-19), posing a principal threat to the entire world. Researchers are trying to incorporate artificial intelligence (machine learning or deep learning models) for the efficient detection of COVID-19. This research surveys the existing machine learning (ML) and deep learning (DL) models used for COVID-19 detection, which may help researchers explore different directions. The main purpose of this review article is to present a compact overview of the application of artificial intelligence to the research experts, helping them explore future scopes of improvement. Methods Researchers have used various machine learning, deep learning, and combinations of machine and deep learning models for extracting significant features and classifying various health conditions in COVID-19 patients. For this purpose, they have utilized different image modalities such as CT scan, X-ray, etc. This study collected over 200 research papers from various repositories like Google Scholar, PubMed, Web of Science, etc. These research papers were passed through various levels of scrutiny and, finally, 50 research articles were selected. Results In the listed articles, the ML/DL models showed an accuracy of 99% and above while performing the classification of COVID-19. This study also presents various clinical applications of this research and specifies the importance of machine and deep learning models in the field of medical diagnosis and research. Conclusion In conclusion, it is evident that ML/DL models have made significant progress in recent years, but there are still limitations that need to be addressed. Overfitting is one such limitation, which can lead to incorrect predictions and overburdening of the models. The research community must continue to work towards finding ways to overcome these limitations and make machine and deep learning models even more effective and efficient. Through this ongoing research and development, we can expect even greater advances in the future.
Affiliation(s)
- Sreeparna Das: Department of Computer Science and Engineering, National Institute of Technology Arunachal Pradesh, Jote, Arunachal Pradesh 791113, India
- Ishan Ayus: Department of Computer Science and Engineering, ITER, Siksha 'O' Anusandhan Deemed to be University, Bhubaneswar, Odisha 751030, India
- Deepak Gupta: Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, UP 211004, India

41
Nasser AA, Akhloufi MA. Deep Learning Methods for Chest Disease Detection Using Radiography Images. SN COMPUTER SCIENCE 2023; 4:388. [PMID: 37200562 PMCID: PMC10173935 DOI: 10.1007/s42979-023-01818-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/21/2022] [Accepted: 04/04/2023] [Indexed: 05/20/2023]
Abstract
X-ray images are the most widely used medical imaging modality. They are affordable, safe, and accessible, and can be used to identify different diseases. Multiple computer-aided detection (CAD) systems using deep learning (DL) algorithms were recently proposed to support radiologists in identifying different diseases on medical images. In this paper, we propose a novel two-step approach for chest disease classification. The first is a multi-class classification step that classifies X-ray images by infected organ into three classes (normal, lung disease, and heart disease). The second step of our approach is a binary classification of seven specific lung and heart diseases. We use a consolidated dataset of 26,316 chest X-ray (CXR) images. Two deep learning methods are proposed in this paper. The first, called DC-ChestNet, is based on ensembling deep convolutional neural network (DCNN) models. The second, named VT-ChestNet, is based on a modified transformer model. VT-ChestNet achieved the best performance, surpassing DC-ChestNet and state-of-the-art models (DenseNet121, DenseNet201, EfficientNetB5, and Xception). VT-ChestNet obtained an area under the curve (AUC) of 95.13% for the first step. For the second step, it obtained an average AUC of 99.26% for heart diseases and an average AUC of 99.57% for lung diseases.
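The two-step routing the abstract describes can be expressed as a small inference pipeline; the class indices, head containers, and model handles below are placeholders invented for illustration.

```python
import torch

@torch.no_grad()
def two_step_predict(x, organ_net, lung_heads, heart_heads):
    """x: (1, C, H, W) CXR tensor; *_heads map disease name -> binary classifier."""
    organ = organ_net(x).argmax(dim=1).item()      # assumed: 0=normal, 1=lung, 2=heart
    if organ == 0:
        return {"organ": "normal"}
    heads = lung_heads if organ == 1 else heart_heads
    probs = {name: torch.sigmoid(head(x)).item() for name, head in heads.items()}
    return {"organ": "lung" if organ == 1 else "heart", "disease_probs": probs}
```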
Collapse
Affiliation(s)
- Adnane Ait Nasser, Perception, Robotics, and Intelligent Machines (PRIME), Université de Moncton, Moncton, NB E1C 3E9, Canada
- Moulay A. Akhloufi, Perception, Robotics, and Intelligent Machines (PRIME), Université de Moncton, Moncton, NB E1C 3E9, Canada
Collapse
|
42
|
Yang S, Wu X, Ge S, Zheng Z, Zhou SK, Xiao L. Radiology report generation with a learned knowledge base and multi-modal alignment. Med Image Anal 2023; 86:102798. [PMID: 36989850 DOI: 10.1016/j.media.2023.102798] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2022] [Revised: 02/10/2023] [Accepted: 03/10/2023] [Indexed: 03/28/2023]
Abstract
In clinics, a radiology report is crucial for guiding a patient's treatment. However, writing radiology reports is a heavy burden for radiologists. To address this, we present an automatic, multi-modal approach for report generation from a chest X-ray. Our approach, motivated by the observation that the descriptions in radiology reports are highly correlated with specific information in the X-ray images, features two distinct modules: (i) Learned knowledge base: to absorb the knowledge embedded in radiology reports, we build a knowledge base that can automatically distill and restore medical knowledge from textual embeddings without manual labor; (ii) Multi-modal alignment: to promote semantic alignment among reports, disease labels, and images, we explicitly use textual embeddings to guide the learning of the visual feature space. We evaluate the proposed model using both natural language generation and clinical efficacy metrics on the public IU-Xray and MIMIC-CXR datasets. Our ablation study shows that each module contributes to improving the quality of the generated reports. Furthermore, with the assistance of both modules, our approach outperforms state-of-the-art methods on almost all metrics. Code is available at https://github.com/LX-doctorAI1/M2KT.
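The multi-modal alignment module can be approximated by a standard contrastive objective: each image embedding is pulled toward its paired report embedding and pushed away from the other reports in the batch. The sketch below is a generic InfoNCE-style loss written under that assumption; the embedding size and temperature are illustrative, not values from the paper.

    import torch
    import torch.nn.functional as F

    def alignment_loss(img_emb, txt_emb, temperature=0.07):
        """img_emb, txt_emb: (B, D) paired embeddings from the two encoders."""
        img = F.normalize(img_emb, dim=-1)
        txt = F.normalize(txt_emb, dim=-1)
        logits = img @ txt.t() / temperature          # (B, B) similarity matrix
        targets = torch.arange(img.size(0), device=img.device)  # i-th image matches i-th report
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

    # toy usage with random 256-dimensional embeddings
    loss = alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
    print(loss.item())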
Collapse
|
43
|
Borys K, Schmitt YA, Nauta M, Seifert C, Krämer N, Friedrich CM, Nensa F. Explainable AI in medical imaging: An overview for clinical practitioners – Beyond saliency-based XAI approaches. Eur J Radiol 2023; 162:110786. [PMID: 36990051 DOI: 10.1016/j.ejrad.2023.110786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2023] [Revised: 03/03/2023] [Accepted: 03/14/2023] [Indexed: 03/30/2023]
Abstract
Driven by recent advances in Artificial Intelligence (AI) and Computer Vision (CV), the implementation of AI systems in the medical domain has increased correspondingly. This is especially true for medical imaging, in which AI aids several imaging-based tasks such as classification, segmentation, and registration. Moreover, AI reshapes medical research and contributes to the development of personalized clinical care. Consequently, alongside its extended implementation arises the need for an extensive understanding of AI systems and their inner workings, potentials, and limitations, which the field of eXplainable AI (XAI) aims to provide. Because medical imaging is mainly associated with visual tasks, most explainability approaches incorporate saliency-based XAI methods. In contrast, this article investigates the full potential of XAI methods in medical imaging by focusing specifically on XAI techniques that do not rely on saliency, and by providing diversified examples. We address a broad audience, particularly healthcare professionals. Moreover, this work aims to establish common ground for cross-disciplinary understanding and exchange between Deep Learning (DL) builders and healthcare professionals, which is why we aimed for a non-technical overview. The presented XAI methods are divided by their output representation into the following categories: case-based explanations, textual explanations, and auxiliary explanations.
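As one concrete instance of the case-based category, a prediction can be justified by retrieving the training cases whose feature embeddings lie closest to the query image. In the sketch below, the feature dimensions and the case bank are synthetic placeholders; the retrieval logic is the illustrative part.

    import numpy as np

    def explain_by_cases(query_feat, case_feats, case_labels, k=3):
        """Return the indices, labels, and distances of the k nearest training cases."""
        d = np.linalg.norm(case_feats - query_feat, axis=1)   # Euclidean distance
        idx = np.argsort(d)[:k]
        return [(int(i), case_labels[i], float(d[i])) for i in idx]

    bank = np.random.rand(100, 64)                 # embeddings of 100 training CXRs
    labels = np.random.choice(["normal", "pneumonia"], size=100)
    print(explain_by_cases(np.random.rand(64), bank, labels))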
Collapse
|
44
|
Shetty S, S. AV, Mahale A. Multimodal medical tensor fusion network-based DL framework for abnormality prediction from the radiology CXRs and clinical text reports. MULTIMEDIA TOOLS AND APPLICATIONS 2023:1-48. [PMID: 37362656 PMCID: PMC10119019 DOI: 10.1007/s11042-023-14940-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/24/2022] [Revised: 04/05/2022] [Accepted: 02/22/2023] [Indexed: 06/28/2023]
Abstract
Pulmonary diseases, including tuberculosis, pneumothorax, cardiomegaly, pulmonary atelectasis, and pneumonia, are common abnormalities worldwide, and their timely prognosis is essential. Increasing progress in Deep Learning (DL) techniques has significantly impacted the medical domain, specifically in leveraging medical imaging for analysis, prognosis, and therapeutic decisions by clinicians. Many contemporary DL strategies for radiology focus on a single modality of data, utilizing imaging features without considering the clinical context, which provides valuable complementary information for clinically consistent prognostic decisions. In addition, selecting the best data fusion strategy is crucial when performing Machine Learning (ML) or DL operations on multimodal heterogeneous data. We investigated multimodal medical fusion strategies leveraging DL techniques to predict pulmonary abnormality from heterogeneous radiology chest X-rays (CXRs) and clinical text reports. In this research, we propose two effective unimodal and multimodal subnetworks to predict pulmonary abnormality from the CXR and the clinical report. We conducted a comprehensive analysis and compared the performance of the unimodal and multimodal models. The proposed models were applied to standard augmented data and to synthetically generated data to check their ability to predict from new and unseen data, and they were thoroughly assessed against the publicly available Indiana University dataset and data collected from a private medical hospital. The proposed multimodal models yielded superior results compared to the unimodal models.
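One established multimodal fusion strategy of the kind investigated here is tensor fusion (Zadeh et al.), where each modality vector is padded with a constant 1 and the flattened outer product captures unimodal and bimodal interactions in a single representation. The sketch below is a generic implementation with illustrative dimensions, not the paper's exact subnetwork.

    import torch
    import torch.nn as nn

    class TensorFusion(nn.Module):
        def __init__(self, d_img, d_txt, n_classes):
            super().__init__()
            # the fused vector has (d_img + 1) * (d_txt + 1) entries
            self.clf = nn.Linear((d_img + 1) * (d_txt + 1), n_classes)

        def forward(self, img_feat, txt_feat):            # (B, d_img), (B, d_txt)
            one = img_feat.new_ones(img_feat.size(0), 1)
            zi = torch.cat([img_feat, one], dim=1)        # (B, d_img + 1)
            zt = torch.cat([txt_feat, one], dim=1)        # (B, d_txt + 1)
            fused = torch.bmm(zi.unsqueeze(2), zt.unsqueeze(1)).flatten(1)  # outer product
            return self.clf(fused)

    logits = TensorFusion(128, 64, 2)(torch.randn(4, 128), torch.randn(4, 64))
    print(logits.shape)   # torch.Size([4, 2])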
Collapse
Affiliation(s)
- Shashank Shetty, Department of Information Technology, National Institute of Technology Karnataka, Mangalore 575025, Karnataka, India; Department of Computer Science and Engineering, Nitte (Deemed to be University), NMAM Institute of Technology (NMAMIT), Udupi 574110, Karnataka, India
- Ananthanarayana V. S., Department of Information Technology, National Institute of Technology Karnataka, Mangalore 575025, Karnataka, India
- Ajit Mahale, Department of Radiology, Kasturba Medical College, Mangalore, Manipal Academy of Higher Education, Mangalore 575001, Karnataka, India
Collapse
|
45
|
Cui C, Yang H, Wang Y, Zhao S, Asad Z, Coburn LA, Wilson KT, Landman BA, Huo Y. Deep multimodal fusion of image and non-image data in disease diagnosis and prognosis: a review. PROGRESS IN BIOMEDICAL ENGINEERING (BRISTOL, ENGLAND) 2023; 5:10.1088/2516-1091/acc2fe. [PMID: 37360402 PMCID: PMC10288577 DOI: 10.1088/2516-1091/acc2fe] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/28/2023]
Abstract
The rapid development of diagnostic technologies in healthcare is leading to higher requirements for physicians to handle and integrate the heterogeneous, yet complementary, data that are produced during routine practice. For instance, personalized diagnosis and treatment planning for a single cancer patient relies on various images (e.g. radiology, pathology, and camera images) and non-image data (e.g. clinical and genomic data). However, such decision-making procedures can be subjective and qualitative, and can have large inter-subject variabilities. With the recent advances in multimodal deep learning technologies, an increasingly large number of efforts have been devoted to a key question: how do we extract and aggregate multimodal information to ultimately provide more objective, quantitative computer-aided clinical decision making? This paper reviews the recent studies addressing this question. Briefly, the review includes (a) an overview of current multimodal learning workflows, (b) a summary of multimodal fusion methods, (c) a discussion of performance, (d) applications in disease diagnosis and prognosis, and (e) challenges and future directions.
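The two fusion families that recur throughout the reviewed methods can be stated compactly: early fusion concatenates modality features before a shared classifier, while late fusion averages per-modality predictions. The sketch below contrasts the two; all dimensions are illustrative.

    import torch
    import torch.nn as nn

    d_img, d_clin, n_cls = 512, 16, 2

    # early fusion: one classifier over the concatenated features
    early = nn.Sequential(nn.Linear(d_img + d_clin, 64), nn.ReLU(), nn.Linear(64, n_cls))

    # late fusion: independent heads whose predictions are averaged
    img_head = nn.Linear(d_img, n_cls)
    clin_head = nn.Linear(d_clin, n_cls)

    img_f, clin_f = torch.randn(4, d_img), torch.randn(4, d_clin)
    early_logits = early(torch.cat([img_f, clin_f], dim=1))
    late_probs = (img_head(img_f).softmax(-1) + clin_head(clin_f).softmax(-1)) / 2
    print(early_logits.shape, late_probs.shape)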
Collapse
Affiliation(s)
- Can Cui, Department of Computer Science, Vanderbilt University, Nashville, TN 37235, United States of America
- Haichun Yang, Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37215, United States of America
- Yaohong Wang, Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37215, United States of America
- Shilin Zhao, Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37215, United States of America
- Zuhayr Asad, Department of Computer Science, Vanderbilt University, Nashville, TN 37235, United States of America
- Lori A Coburn, Division of Gastroenterology, Hepatology, and Nutrition, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States of America; Veterans Affairs Tennessee Valley Healthcare System, Nashville, TN 37212, United States of America
- Keith T Wilson, Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37215, United States of America; Division of Gastroenterology, Hepatology, and Nutrition, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States of America; Veterans Affairs Tennessee Valley Healthcare System, Nashville, TN 37212, United States of America
- Bennett A Landman, Department of Computer Science and Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN 37235, United States of America
- Yuankai Huo, Department of Computer Science and Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN 37235, United States of America
Collapse
|
46
|
Zeng X, Dong Q, Li Y. MG-CNFNet: A multiple grained channel normalized fusion networks for medical image deblurring. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
|
47
|
Clunie DA, Flanders A, Taylor A, Erickson B, Bialecki B, Brundage D, Gutman D, Prior F, Seibert JA, Perry J, Gichoya JW, Kirby J, Andriole K, Geneslaw L, Moore S, Fitzgerald TJ, Tellis W, Xiao Y, Farahani K, Luo J, Rosenthal A, Kandarpa K, Rosen R, Goetz K, Babcock D, Xu B, Hsiao J. Report of the Medical Image De-Identification (MIDI) Task Group - Best Practices and Recommendations. ARXIV 2023:arXiv:2303.10473v2. [PMID: 37033463 PMCID: PMC10081345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Grants] [Subscribe] [Scholar Register] [Indexed: 04/11/2023]
Affiliation(s)
- Fred Prior, University of Arkansas for Medical Sciences
- Justin Kirby, Frederick National Laboratory for Cancer Research
- Ying Xiao, University of Pennsylvania Health System
- James Luo, National Heart, Lung, and Blood Institute (NHLBI)
- Alex Rosenthal, National Institute of Allergy and Infectious Diseases (NIAID)
- Kris Kandarpa, National Institute of Biomedical Imaging and Bioengineering (NIBIB)
- Rebecca Rosen, Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD)
- Debra Babcock, National Institute of Neurological Disorders and Stroke (NINDS)
- Ben Xu, National Institute on Alcohol Abuse and Alcoholism (NIAAA)
Collapse
|
48
|
Rehman A, Khan A, Fatima G, Naz S, Razzak I. Review on chest pathogies detection systems using deep learning techniques. Artif Intell Rev 2023; 56:1-47. [PMID: 37362896 PMCID: PMC10027283 DOI: 10.1007/s10462-023-10457-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/29/2023]
Abstract
Chest radiography is the standard and most affordable way to diagnose, analyze, and examine different thoracic and chest diseases. Typically, the radiograph is examined by an expert radiologist or physician to decide whether a particular anomaly exists. Moreover, computer-aided methods are used to assist radiologists and make the analysis process more accurate, fast, and automated. A tremendous improvement in automatic chest pathology detection and analysis can be observed with the emergence of deep learning. This survey aims to review, technically evaluate, and synthesize the different computer-aided chest pathology detection systems. State-of-the-art single- and multi-pathology detection systems published in the last five years are thoroughly discussed. A taxonomy of image acquisition, dataset preprocessing, feature extraction, and deep learning models is presented, and the mathematical concepts underlying the feature extraction model architectures are discussed. Moreover, the different articles are compared based on their contributions, the datasets and methods used, and the results achieved. The article ends with the main findings, current trends, challenges, and future recommendations.
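As an illustration of the preprocessing stage in the survey's taxonomy (acquisition, preprocessing, feature extraction, model), the sketch below applies a common chest-radiograph pipeline of resizing, CLAHE contrast enhancement, and intensity normalization. The file path and parameter values are assumptions for demonstration, not recommendations from the survey.

    import cv2
    import numpy as np

    def preprocess_cxr(path, size=(224, 224)):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)      # hypothetical path
        img = cv2.resize(img, size)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        img = clahe.apply(img)                            # local contrast enhancement
        img = img.astype(np.float32) / 255.0              # scale to [0, 1]
        return (img - img.mean()) / (img.std() + 1e-8)    # zero-mean, unit-variance

    x = preprocess_cxr("cxr/sample.png")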
Collapse
Affiliation(s)
- Arshia Rehman, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, Pakistan
- Ahmad Khan, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, Pakistan
- Gohar Fatima, The Islamia University of Bahawalpur, Bahawal Nagar Campus, Bahawal Nagar, Pakistan
- Saeeda Naz, Govt Girls Post Graduate College No.1, Abbottabad, Pakistan
- Imran Razzak, School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Collapse
|
49
|
Medical image captioning via generative pretrained transformers. Sci Rep 2023; 13:4171. [PMID: 36914733 PMCID: PMC10010644 DOI: 10.1038/s41598-023-31223-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2022] [Accepted: 03/08/2023] [Indexed: 03/16/2023] Open
Abstract
The proposed model for automatic clinical image caption generation combines the analysis of radiological scans with structured patient information from textual records. It uses two models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records. The generated textual summary contains essential information about the pathologies found and their locations, along with 2D heatmaps that localize each pathology on the scans. The model has been tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO; the results, measured with natural language assessment metrics, demonstrate its applicability to chest X-ray image captioning.
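The encode-then-generate pattern behind such captioning models can be sketched as follows: a CNN encodes the scan into an embedding that seeds an autoregressive decoder, which emits the report token by token. The toy GRU decoder below stands in for the Show-Attend-Tell and GPT-3 components; the vocabulary size and dimensions are illustrative.

    import torch
    import torch.nn as nn
    from torchvision import models

    class ToyCaptioner(nn.Module):
        def __init__(self, vocab=1000, d=256):
            super().__init__()
            cnn = models.resnet18(weights=None)
            cnn.fc = nn.Linear(cnn.fc.in_features, d)     # image -> d-dim embedding
            self.enc, self.emb = cnn, nn.Embedding(vocab, d)
            self.rnn = nn.GRU(d, d, batch_first=True)
            self.out = nn.Linear(d, vocab)

        @torch.no_grad()
        def generate(self, img, bos=1, max_len=20):
            h = self.enc(img).unsqueeze(0)                # image embedding as initial state
            tok, out = torch.tensor([[bos]]), []
            for _ in range(max_len):
                y, h = self.rnn(self.emb(tok), h)
                tok = self.out(y[:, -1]).argmax(-1, keepdim=True)  # greedy decoding
                out.append(tok.item())
            return out

    print(ToyCaptioner().generate(torch.randn(1, 3, 224, 224)))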
Collapse
|
50
|
Mustafa Khan M, ul Islam MS, Siddiqui AA, Qadri MT. Dual deterministic model based on deep neural network for the classification of pneumonia. INTELLIGENT DECISION TECHNOLOGIES 2023. [DOI: 10.3233/idt-220192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2023]
Abstract
Pneumonia is a disease caused by viruses (e.g., influenza, respiratory syncytial virus) or bacteria. It can be fatal if not diagnosed and treated at an early stage. Chest X-rays have been widely utilized to diagnose such abnormalities with high exactitude and are primarily used to augment the real-world diagnosis process. The poor availability of authentic data and of benchmark-based approaches and studies complicates comparison and the identification of the most reliable recognition method. In this paper, a Dual Deterministic Model (DD-M) based on a deep neural network is proposed that identifies pneumonia from chest X-rays and, where present, distinguishes viral from bacterial infection with an efficiency equivalent to that of an active radiologist. To accomplish the automated task of the proposed algorithm, an automatic computer-aided system is necessary; the proposed algorithm therefore incorporates deep learning techniques to better understand radiographic imaging. When evaluated, the proposed algorithm distinguished chests infected with pneumonia from those of healthy individuals with approximately 97.45% accuracy, and distinguished viral from bacterial infection with an efficiency of 88.41%. The proposed algorithm, together with an improved image dataset, will help doctors in diagnosis.
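A dual two-stage design of this kind can be sketched as a cascade: one binary network screens for pneumonia, and a second network, consulted only on positive cases, separates viral from bacterial infection. Both networks below are untrained illustrative stand-ins, not the published DD-M architecture.

    import torch
    import torch.nn as nn
    from torchvision import models

    def binary_net():
        m = models.resnet18(weights=None)             # illustrative backbone
        m.fc = nn.Linear(m.fc.in_features, 1)
        return m.eval()

    detector, discriminator = binary_net(), binary_net()

    @torch.no_grad()
    def diagnose(img, thr=0.5):                       # img: (1, 3, 224, 224) tensor
        if torch.sigmoid(detector(img)).item() < thr:
            return "normal"                           # stage 1: pneumonia vs. normal
        p_viral = torch.sigmoid(discriminator(img)).item()
        return "viral pneumonia" if p_viral >= thr else "bacterial pneumonia"

    print(diagnose(torch.randn(1, 3, 224, 224)))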
Collapse
|