1. Panyarak W, Suttapak W, Mahasantipiya P, Charuakkra A, Boonsong N, Wantanajittikul K, Iamaroon A. CrossViT with ECAP: Enhanced deep learning for jaw lesion classification. Int J Med Inform 2025;193:105666. PMID: 39492085. DOI: 10.1016/j.ijmedinf.2024.105666.
Abstract
BACKGROUND Radiolucent jaw lesions such as ameloblastoma (AM), dentigerous cyst (DC), odontogenic keratocyst (OKC), and radicular cyst (RC) often share similar radiographic characteristics, making diagnosis challenging. In 2021, CrossViT, a deep learning approach using multi-scale vision transformers (ViT) with cross-attention, emerged for accurate image classification. Additionally, we introduced Extended Cropping and Padding (ECAP), a method that expands training data by iteratively cropping smaller images while preserving surrounding context. However, its application to dental radiographic classification remains unexplored. This study investigates the effectiveness of CrossViTs and ECAP against ResNets for classifying common radiolucent jaw lesions. METHODS We conducted a retrospective study involving 208 prevalent radiolucent jaw lesions (49 AMs, 59 DCs, 48 OKCs, and 54 RCs) observed in panoramic radiographs or orthopantomograms (OPGs) with confirmed histological diagnoses. Three experienced oral radiologists provided consensus annotations. We applied horizontal flipping and the ECAP technique with CrossViT-15, CrossViT-18, ResNet-50, ResNet-101, and ResNet-152, using four-fold cross-validation. Model performance was assessed with accuracy, specificity, precision, recall (sensitivity), F1-score, and area under the receiver operating characteristic curve (AUC). RESULTS Models using the ECAP technique generally achieved better results, with ResNet-152 showing a statistically significant increase in F1-score. CrossViT models consistently achieved higher accuracy, precision, recall, and F1-score than ResNet models, regardless of ECAP usage. CrossViT-18 achieved the best overall performance. While all models showed positive ability to differentiate lesions, DC had the highest AUCs (0.89-0.90) and OKC the lowest (0.72-0.81). Only CrossViT-15 achieved AUCs above 0.80 for all four lesion types. CONCLUSION ECAP, a targeted cropping-and-padding data augmentation technique, improves deep learning model performance for radiolucent jaw lesion classification. This context-preserving approach is beneficial for tasks requiring an understanding of a lesion's surroundings. Combined with CrossViT models, ECAP shows promise for accurate classification, particularly for rare lesions with limited data.
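The abstract outlines ECAP only at a high level (iteratively cropping smaller images while preserving context). A minimal sketch of what such a crop-and-pad augmentation might look like follows; the bounding-box input, margin schedule, and function names are assumptions, not the authors' implementation:

```python
import numpy as np

def ecap_augment(image, bbox, steps=3, pad_value=0):
    """ECAP-style augmentation sketch: iteratively crop tighter windows
    around a lesion bounding box, then pad each crop onto a fixed square
    canvas so some surrounding context is preserved.
    `bbox` is (x0, y0, x1, y1); `steps` controls how many crops are made."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = bbox
    crops = []
    for i in range(steps):
        # Shrink the context margin around the lesion at each step (assumed schedule).
        m = 0.5 * (1.0 - i / steps)
        cx0 = max(0, int(x0 - m * (x1 - x0)))
        cy0 = max(0, int(y0 - m * (y1 - y0)))
        cx1 = min(w, int(x1 + m * (x1 - x0)))
        cy1 = min(h, int(y1 + m * (y1 - y0)))
        crop = image[cy0:cy1, cx0:cx1]
        # Pad onto a square canvas instead of stretching, keeping aspect ratio.
        side = max(crop.shape[:2])
        canvas = np.full((side, side) + crop.shape[2:], pad_value, dtype=image.dtype)
        oy = (side - crop.shape[0]) // 2
        ox = (side - crop.shape[1]) // 2
        canvas[oy:oy + crop.shape[0], ox:ox + crop.shape[1]] = crop
        crops.append(canvas)  # resize to the network input size downstream
    return crops
```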
Affiliation(s)
- Wannakamon Panyarak
- Division of Oral and Maxillofacial Radiology, Department of Oral Biology and Diagnostic Sciences, Faculty of Dentistry, Chiang Mai University, Suthep Road, Suthep Sub-district, Mueang Chiang Mai District, Chiang Mai 50200, Thailand
- Wattanapong Suttapak
- Division of Computer Engineering, School of Information and Communication Technology, University of Phayao, Phahon Yothin Road, Mae Ka Sub-district, Mueang Phayao District, Phayao 56000, Thailand
- Phattaranant Mahasantipiya
- Division of Oral and Maxillofacial Radiology, Department of Oral Biology and Diagnostic Sciences, Faculty of Dentistry, Chiang Mai University, Suthep Road, Suthep Sub-district, Mueang Chiang Mai District, Chiang Mai 50200, Thailand
- Arnon Charuakkra
- Division of Oral and Maxillofacial Radiology, Department of Oral Biology and Diagnostic Sciences, Faculty of Dentistry, Chiang Mai University, Suthep Road, Suthep Sub-district, Mueang Chiang Mai District, Chiang Mai 50200, Thailand
- Nattanit Boonsong
- Department of Oral Biology and Diagnostic Sciences, Faculty of Dentistry, Chiang Mai University, Suthep Road, Suthep Sub-district, Mueang Chiang Mai District, Chiang Mai 50200, Thailand
- Kittichai Wantanajittikul
- Department of Radiologic Technology, Faculty of Associated Medical Sciences, Chiang Mai University, Suthep Road, Suthep Sub-district, Mueang Chiang Mai District, Chiang Mai 50200, Thailand
- Anak Iamaroon
- Department of Oral Biology and Diagnostic Sciences, Faculty of Dentistry, Chiang Mai University, Suthep Road, Suthep Sub-district, Mueang Chiang Mai District, Chiang Mai 50200, Thailand
2. Zhang J, Chen J. Research on grading detection methods for diabetic retinopathy based on deep learning. Pak J Med Sci 2025;41:225-229. PMID: 39867796. PMCID: PMC11755306. DOI: 10.12669/pjms.41.1.9171.
Abstract
Objective To design a deep learning-based model for early screening of diabetic retinopathy, predict the condition, and provide interpretable justifications. Methods The model is based on the Vision Transformer architecture; the project was initiated in March 2023, and the first version was completed in July 2023 at the Affiliated Hospital of Hangzhou Normal University. We used the publicly available EyePACS dataset to train the model. Using the trained model, we predict whether a given patient's fundus images indicate diabetic retinopathy and report the relevant affected areas as the basis for the judgement. Results The model was validated using two subsets of the IDRiD dataset. It not only achieved good detection accuracy, reaching around 0.88, but also performed comparably, when predicting affected regions, to similar models trained with affected-area annotations. Conclusion Using only image-level annotations, we implemented a deep learning method for detecting diabetic retinopathy that provides interpretable justifications to assist clinicians in diagnosis.
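The abstract does not state how the affected areas are obtained from an image-level-supervised ViT; attention rollout is one common technique for this kind of localization, sketched here under that assumption:

```python
import torch

def attention_rollout(attentions, head_fusion="mean"):
    """Attention-rollout sketch: combine per-layer ViT attention maps into a
    single [CLS]-to-patch relevance map, a common way to localize the image
    regions driving an image-level prediction.
    `attentions`: list of tensors shaped (batch, heads, tokens, tokens)."""
    result = torch.eye(attentions[0].size(-1))
    with torch.no_grad():
        for attn in attentions:
            fused = attn.mean(dim=1) if head_fusion == "mean" else attn.max(dim=1).values
            # Add the identity to model residual connections, then renormalize.
            fused = fused[0] + torch.eye(fused.size(-1))
            fused = fused / fused.sum(dim=-1, keepdim=True)
            result = fused @ result
    # Row 0 holds CLS-token attention; drop the CLS column to keep patches only.
    return result[0, 1:]

# Demo on random attention maps (12 layers, 12 heads, 14x14 patches + CLS).
attns = [torch.softmax(torch.randn(1, 12, 197, 197), -1) for _ in range(12)]
heatmap = attention_rollout(attns).reshape(14, 14)  # patch-level relevance
```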
Affiliation(s)
- Jing Zhang
- Department of Ophthalmology, Affiliated Hospital of Hangzhou Normal University, Hangzhou, Zhejiang, China
- Juan Chen
- Department of Ophthalmology, Affiliated Hospital of Hangzhou Normal University, Hangzhou, Zhejiang, China
3. Di Stefano V, D’Angelo M, Monaco F, Vignapiano A, Martiadis V, Barone E, Fornaro M, Steardo L, Solmi M, Manchia M, Steardo L. Decoding Schizophrenia: How AI-Enhanced fMRI Unlocks New Pathways for Precision Psychiatry. Brain Sci 2024;14:1196. PMID: 39766395. PMCID: PMC11674252. DOI: 10.3390/brainsci14121196.
Abstract
Schizophrenia, a highly complex psychiatric disorder, presents significant challenges in diagnosis and treatment due to its multifaceted neurobiological underpinnings. Recent advancements in functional magnetic resonance imaging (fMRI) and artificial intelligence (AI) have revolutionized the understanding and management of this condition. This manuscript explores how the integration of these technologies has unveiled key insights into schizophrenia's structural and functional neural anomalies. fMRI research highlights disruptions in crucial brain regions like the prefrontal cortex and hippocampus, alongside impaired connectivity within networks such as the default mode network (DMN). These alterations correlate with the cognitive deficits and emotional dysregulation characteristic of schizophrenia. AI techniques, including machine learning (ML) and deep learning (DL), have enhanced the detection and analysis of these complex patterns, surpassing traditional methods in precision. Algorithms such as support vector machines (SVMs) and Vision Transformers (ViTs) have proven particularly effective in identifying biomarkers and aiding early diagnosis. Despite these advancements, challenges such as variability in methodologies and the disorder's heterogeneity persist, necessitating large-scale, collaborative studies for clinical translation. Moreover, ethical considerations surrounding data integrity, algorithmic transparency, and patient individuality must guide AI's integration into psychiatry. Looking ahead, AI-augmented fMRI holds promise for tailoring personalized interventions, addressing unique neural dysfunctions, and improving therapeutic outcomes for individuals with schizophrenia. This convergence of neuroimaging and computational innovation heralds a transformative era in precision psychiatry.
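The review is narrative, but the ML pipelines it describes often reduce to classifying vectorized functional-connectivity features. The sketch below illustrates that pattern on synthetic data; the cohort size, atlas size, and labels are placeholders, not drawn from any study in the review:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_subjects, n_rois = 60, 90          # placeholder cohort and atlas size
# Use the upper triangle of each subject's ROI-by-ROI correlation matrix as features.
conn = rng.standard_normal((n_subjects, n_rois, n_rois))
iu = np.triu_indices(n_rois, k=1)
X = np.stack([c[iu] for c in conn])
y = rng.integers(0, 2, n_subjects)   # 0 = control, 1 = patient (synthetic)

clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
print(cross_val_score(clf, X, y, cv=5).mean())  # chance-level on pure noise
```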
Affiliation(s)
- Valeria Di Stefano
- Psychiatry Unit, Department of Health Sciences, University of Catanzaro Magna Graecia, 88100 Catanzaro, Italy
- Martina D’Angelo
- Psychiatry Unit, Department of Health Sciences, University of Catanzaro Magna Graecia, 88100 Catanzaro, Italy
- Francesco Monaco
- Department of Mental Health, Azienda Sanitaria Locale Salerno, 84125 Salerno, Italy
- European Biomedical Research Institute of Salerno (EBRIS), 84125 Salerno, Italy
- Annarita Vignapiano
- Department of Mental Health, Azienda Sanitaria Locale Salerno, 84125 Salerno, Italy
- European Biomedical Research Institute of Salerno (EBRIS), 84125 Salerno, Italy
- Vassilis Martiadis
- Department of Mental Health, Azienda Sanitaria Locale (ASL) Napoli 1 Centro, 80145 Naples, Italy
- Eugenia Barone
- Department of Psychiatry, University of Campania “Luigi Vanvitelli”, 80138 Naples, Italy
- Michele Fornaro
- Department of Neuroscience, Reproductive Science and Odontostomatology, University of Naples Federico II, 80138 Naples, Italy
- Luca Steardo
- Department of Clinical Psychology, University Giustino Fortunato, 82100 Benevento, Italy
- Department of Physiology and Pharmacology “Vittorio Erspamer”, SAPIENZA University of Rome, 00185 Rome, Italy
- Marco Solmi
- Department of Psychiatry, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- On Track: The Champlain First Episode Psychosis Program, Department of Mental Health, The Ottawa Hospital, Ottawa, ON K1H 8L6, Canada
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Department of Child and Adolescent Psychiatry, Charité-Universitätsmedizin, 10117 Berlin, Germany
- Mirko Manchia
- Section of Psychiatry, Department of Medical Sciences and Public Health, University of Cagliari, 09124 Cagliari, Italy
- Unit of Clinical Psychiatry, University Hospital Agency of Cagliari, 09123 Cagliari, Italy
- Department of Pharmacology, Dalhousie University, Halifax, NS B3H 4R2, Canada
- Luca Steardo
- Psychiatry Unit, Department of Health Sciences, University of Catanzaro Magna Graecia, 88100 Catanzaro, Italy
4. Wei J, Xu Y, Wang H, Niu T, Jiang Y, Shen Y, Su L, Dou T, Peng Y, Bi L, Xu X, Wang Y, Liu K. Metadata information and fundus image fusion neural network for hyperuricemia classification in diabetes. Comput Methods Programs Biomed 2024;256:108382. PMID: 39213898. DOI: 10.1016/j.cmpb.2024.108382.
Abstract
OBJECTIVE In patients with diabetes mellitus, hyperuricemia may lead to the development of diabetic complications, including macrovascular and microvascular dysfunction. However, blood uric acid levels in diabetic patients are obtained by sampling peripheral blood, an invasive procedure that is not conducive to routine monitoring. We therefore developed a deep learning algorithm to noninvasively detect hyperuricemia from retinal photographs and metadata of patients with diabetes, and evaluated its performance in multiethnic populations and different subgroups. MATERIALS AND METHODS Because blood uric acid metabolism is directly related to the estimated glomerular filtration rate (eGFR), we first performed a regression task for the eGFR value before the classification task for hyperuricemia and reintroduced the regressed eGFR values into the baseline information. We trained three deep learning models: (1) a metadata model adjusted for sex, age, body mass index, duration of diabetes, HbA1c, systolic blood pressure, and diastolic blood pressure; (2) an image model based on fundus photographs; and (3) a hybrid model combining the image and metadata models. Data from the Shanghai General Hospital Diabetes Management Center (ShDMC) were used to develop (6,091 participants with diabetes) and internally validate (5-fold cross-validation) the models. External testing was performed on an independent dataset (UK Biobank) consisting of 9,327 participants with diabetes. RESULTS For the eGFR regression task, in the ShDMC dataset the coefficient of determination (R2) was 0.684±0.07 (95% CI) for the image model, 0.501±0.04 for the metadata model, and 0.727±0.002 for the hybrid model. In the external UK Biobank dataset, R2 was 0.647±0.06 for the image model, 0.627±0.03 for the metadata model, and 0.697±0.07 for the hybrid model. Our method was demonstrably superior to previous methods. For hyperuricemia classification, in ShDMC validation the area under the curve (AUC) was 0.86±0.013 for the image model, 0.86±0.013 for the metadata model, and 0.92±0.026 for the hybrid model. Estimates with UK Biobank were 0.82±0.017, 0.79±0.024, and 0.89±0.032, respectively. CONCLUSION A deep learning algorithm using fundus photographs has potential as a noninvasive screening adjunct for hyperuricemia among individuals with diabetes, and combining patient metadata enables higher screening accuracy. Visualization showed that the network mainly focuses on the fundus optic disc region when identifying hyperuricemia.
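A minimal sketch of the two-stage fusion design described here, with an image branch, a metadata branch, an eGFR regression head, and the regressed eGFR fed back into the hyperuricemia classifier; the backbone, layer sizes, and feedback wiring are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FundusMetadataFusion(nn.Module):
    """Two-branch fusion sketch: image encoder for fundus photographs, MLP for
    clinical metadata, an eGFR regression head, and a hyperuricemia classifier
    that also sees the predicted eGFR."""
    def __init__(self, n_meta=7):  # 7 metadata fields listed in the abstract
        super().__init__()
        self.image_encoder = resnet18(weights=None)
        self.image_encoder.fc = nn.Identity()      # expose 512-d image features
        self.meta_encoder = nn.Sequential(
            nn.Linear(n_meta, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU())
        self.egfr_head = nn.Linear(512 + 32, 1)    # stage 1: regress eGFR
        self.cls_head = nn.Sequential(             # stage 2: classify, eGFR fed back
            nn.Linear(512 + 32 + 1, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, image, metadata):
        f_img = self.image_encoder(image)
        f_meta = self.meta_encoder(metadata)
        fused = torch.cat([f_img, f_meta], dim=1)
        egfr = self.egfr_head(fused)
        logit = self.cls_head(torch.cat([fused, egfr], dim=1))
        return egfr.squeeze(1), logit.squeeze(1)

model = FundusMetadataFusion()
egfr, logit = model(torch.randn(2, 3, 224, 224), torch.randn(2, 7))
```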
Affiliation(s)
- Jin Wei
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
- Yupeng Xu
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
- Hanying Wang
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
- Tian Niu
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
- Yan Jiang
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
- Yinchen Shen
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
- Li Su
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
- Tianyu Dou
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
- Yige Peng
- Institute of Translational Medicine, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai 20080, PR China
- Lei Bi
- Institute of Translational Medicine, National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai 20080, PR China
- Xun Xu
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
- Yufan Wang
- Department of Endocrinology and Metabolism, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, PR China
- Kun Liu
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, No. 100 Haining Road, Shanghai 20080, PR China
5. Bhati D, Neha F, Amiruzzaman M. A Survey on Explainable Artificial Intelligence (XAI) Techniques for Visualizing Deep Learning Models in Medical Imaging. J Imaging 2024;10:239. PMID: 39452402. PMCID: PMC11508748. DOI: 10.3390/jimaging10100239.
Abstract
The combination of medical imaging and deep learning has significantly improved diagnostic and prognostic capabilities in the healthcare domain. Nevertheless, the inherent complexity of deep learning models poses challenges in understanding their decision-making processes. Interpretability and visualization techniques have emerged as crucial tools to unravel the black-box nature of these models, providing insights into their inner workings and enhancing trust in their predictions. This survey paper comprehensively examines various interpretation and visualization techniques applied to deep learning models in medical imaging. The paper reviews methodologies, discusses their applications, and evaluates their effectiveness in enhancing the interpretability, reliability, and clinical relevance of deep learning models in medical image analysis.
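As a concrete example of the visualization techniques such surveys review, here is a minimal Grad-CAM sketch in PyTorch; the backbone (ResNet-18), target layer, and random input are illustrative choices, not drawn from the paper:

```python
import torch
from torchvision.models import resnet18

# Minimal Grad-CAM sketch: weight the last conv block's activations by the
# spatially averaged gradients of the top-class score, then ReLU and normalize.
model = resnet18(weights=None).eval()
feats, grads = {}, {}
layer = model.layer4  # last conv block of the backbone (illustrative choice)

layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224, requires_grad=True)
score = model(x)[0].max()                          # score of the top class
score.backward()

w = grads["a"].mean(dim=(2, 3), keepdim=True)      # per-channel weights
cam = torch.relu((w * feats["a"]).sum(dim=1))      # (1, 7, 7) relevance map
cam = cam / (cam.max() + 1e-8)                     # normalize to [0, 1]
```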
Affiliation(s)
- Deepshikha Bhati
- Department of Computer Science, Kent State University, Kent, OH 44242, USA
- Fnu Neha
- Department of Computer Science, Kent State University, Kent, OH 44242, USA
- Md Amiruzzaman
- Department of Computer Science, West Chester University, West Chester, PA 19383, USA
6. Kesavapillai AR, Aslam SM, Umapathy S, Almutairi F. RA-XTNet: A Novel CNN Model to Predict Rheumatoid Arthritis from Hand Radiographs and Thermal Images: A Comparison with CNN Transformer and Quantum Computing. Diagnostics (Basel) 2024;14:1911. PMID: 39272696. PMCID: PMC11394616. DOI: 10.3390/diagnostics14171911.
Abstract
The aim of this research is to develop an automated diagnosis system for the prediction of rheumatoid arthritis (RA) based on artificial intelligence (AI) and quantum computing, using hand radiographs and thermal images. The hand radiographs and thermal images were segmented using a UNet++ model and a color-based k-means clustering technique, respectively. Attributes from the segmented regions were generated using the Speeded-Up Robust Features (SURF) feature extractor, and classification was performed using k-star and Hoeffding classifiers. For the ground truth and the predicted test image, the UNet++ segmentation achieved a pixel-wise accuracy of 98.75%, an intersection over union (IoU) of 0.87, and a Dice coefficient of 0.86, indicating a high level of similarity. The custom RA-XTNet (RA X-ray and thermal imaging network) surpassed all the other models for the detection of RA, with classification accuracies of 90% and 93% for the X-ray and thermal imaging modalities, respectively. Furthermore, the study employed a quantum support vector machine (QSVM) as a quantum computing approach, which yielded accuracies of 93.75% and 87.5% for the detection of RA from hand X-ray and thermal images. In addition, a vision transformer (ViT) was employed to classify RA, obtaining an accuracy of 80% for hand X-rays and 90% for thermal images. Based on these performance measures, the RA-XTNet model can be used as an effective automated diagnostic method to detect RA accurately and rapidly in hand radiographs and thermal images.
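The color-based k-means step for thermal images might look roughly as follows; the number of clusters and the rule for picking the inflamed cluster are assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_color_segment(image_rgb, k=4):
    """Sketch of color-based k-means segmentation of a thermal image:
    cluster pixels in RGB space and return a label map plus a candidate
    region-of-interest mask."""
    h, w, c = image_rgb.shape
    pixels = image_rgb.reshape(-1, c).astype(np.float32)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    labels = km.labels_.reshape(h, w)
    # In a hot-metal colormap, inflamed regions tend to fall in the cluster
    # with the highest mean red intensity (assumed heuristic).
    hottest = int(np.argmax(km.cluster_centers_[:, 0]))
    return labels, labels == hottest

labels, roi_mask = kmeans_color_segment(np.random.rand(64, 64, 3), k=4)
```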
Affiliation(s)
- Ahalya R Kesavapillai
- Department of Biomedical Engineering, SRM Institute of Science and Technology, College of Engineering and Technology, Chennai 603203, India
- Department of Biomedical Engineering, Easwari Engineering College, Ramapuram, Chennai 600089, India
- Shabnam M Aslam
- Department of Information Technology, College of Computer and Information Sciences (CCIS), Majmaah University, Al Majmaah 11952, Saudi Arabia
- Snekhalatha Umapathy
- Department of Biomedical Engineering, SRM Institute of Science and Technology, College of Engineering and Technology, Chennai 603203, India
- Fadiyah Almutairi
- Department of Information System, College of Computer and Information Sciences (CCIS), Majmaah University, Al Majmaah 11952, Saudi Arabia
7. Chen Z, Hu B, Niu C, Chen T, Li Y, Shan H, Wang G. IQAGPT: computed tomography image quality assessment with vision-language and ChatGPT models. Vis Comput Ind Biomed Art 2024;7:20. PMID: 39101954. DOI: 10.1186/s42492-024-00171-w.
Abstract
Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in various tasks and attracted increasing interest as a natural language interface across many domains. Recently, large vision-language models (VLMs) that learn rich vision-language correlations from image-text pairs, such as BLIP-2 and GPT-4, have been intensively investigated. Despite these developments, however, the application of LLMs and VLMs to image quality assessment (IQA), particularly in medical imaging, remains unexplored. Such systems would be valuable for objective performance evaluation and could potentially supplement, or even replace, radiologists' opinions. To this end, this study introduces IQAGPT, a computed tomography (CT) IQA system that integrates an image-quality captioning VLM with ChatGPT to generate quality scores and textual reports. First, a CT-IQA dataset comprising 1,000 CT slices with diverse quality levels was professionally annotated and compiled for training and evaluation. To better leverage the capabilities of LLMs, the annotated quality scores were converted into semantically rich text descriptions using a prompt template. Second, the image-quality captioning VLM was fine-tuned on the CT-IQA dataset to generate quality descriptions, fusing image and text features through cross-modal attention. Third, based on the quality descriptions, users ask ChatGPT to rate image-quality scores or produce radiological quality reports. The results demonstrate the feasibility of assessing image quality using LLMs: the proposed IQAGPT outperformed GPT-4 and CLIP-IQA, as well as multitask classification and regression models that rely solely on images.
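The second step, converting annotated quality scores into semantically rich text via a prompt template, might look roughly like the following; the wording, the 5-point scale, and the region string are placeholders, not the authors' template:

```python
# Score-to-text sketch: map an annotated quality score onto a templated,
# semantically rich description that a captioning VLM can be trained against.
QUALITY_TERMS = {
    1: "severe artifacts and noise; diagnostic value is poor",
    2: "prominent noise with blurred structural boundaries",
    3: "moderate noise; most anatomical structures remain assessable",
    4: "mild noise with well-preserved structural detail",
    5: "minimal noise and sharp structural detail; excellent quality",
}

def score_to_description(score: int, region: str = "abdominal CT slice") -> str:
    return (f"This {region} shows {QUALITY_TERMS[score]}. "
            f"Overall image quality score: {score}/5.")

print(score_to_description(4))
```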
Affiliation(s)
- Zhihao Chen
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, 200433, China
- Bin Hu
- Department of Radiology, Huashan Hospital, Fudan University, Shanghai, 200040, China
- Chuang Niu
- Biomedical Imaging Center, Center for Biotechnology and Interdisciplinary Studies, Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, 12180, US
- Tao Chen
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, 200433, China
- Yuxin Li
- Department of Radiology, Huashan Hospital, Fudan University, Shanghai, 200040, China
- Hongming Shan
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, 200433, China
- MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200032, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Ministry of Education), Fudan University, Shanghai, 200433, China
- Ge Wang
- Biomedical Imaging Center, Center for Biotechnology and Interdisciplinary Studies, Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, 12180, US
8. AlMohimeed A, Shehata M, El-Rashidy N, Mostafa S, Samy Talaat A, Saleh H. ViT-PSO-SVM: Cervical Cancer Predication Based on Integrating Vision Transformer with Particle Swarm Optimization and Support Vector Machine. Bioengineering (Basel) 2024;11:729. PMID: 39061811. PMCID: PMC11273508. DOI: 10.3390/bioengineering11070729.
Abstract
Cervical cancer (CCa) is the fourth most prevalent cancer affecting women worldwide, with increasing incidence and mortality rates; early detection of CCa therefore plays a crucial role in improving outcomes. Non-invasive imaging procedures with good diagnostic performance are desirable and have the potential to lessen the degree of intervention associated with the gold standard, biopsy. Recently, artificial intelligence-based diagnostic models such as Vision Transformers (ViT) have shown promising performance in image classification tasks, rivaling or surpassing traditional convolutional neural networks (CNNs). This paper studies the effect of applying a ViT to predict CCa using different benchmark image datasets. A newly developed approach (ViT-PSO-SVM) is presented that boosts ViT results by integrating the ViT with particle swarm optimization (PSO) and a support vector machine (SVM). First, the proposed framework extracts features from the Vision Transformer. Then, PSO is used to reduce the complexity of the extracted features and optimize the feature representation. Finally, the softmax classification layer is replaced with an SVM classification model to precisely predict CCa. The models are evaluated on two benchmark cervical cell image datasets, SipakMed and Herlev, under two-, three-, and five-class scenarios. The proposed approach achieved 99.112% accuracy and 99.113% F1-score on SipakMed with two classes, and 97.778% accuracy and 97.805% F1-score on Herlev with two classes, outperforming other Vision Transformers, CNN models, and pre-trained models. Finally, GradCAM is used as an explainable artificial intelligence (XAI) tool to visualize and understand the regions of a given image that are important for the model's prediction. The experimental results demonstrate the feasibility and efficacy of the developed ViT-PSO-SVM approach and hold the promise of a robust, reliable, accurate, and non-invasive diagnostic tool that will lead to improved healthcare outcomes worldwide.
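A minimal sketch of the feature-extraction and classification stages described here, with ViT features fed to an SVM in place of the softmax layer; the backbone and data are placeholders, and the PSO step is reduced to a fixed random mask standing in for the swarm-selected feature subset:

```python
import numpy as np
import torch
from sklearn.svm import SVC
from torchvision.models import vit_b_16

# ViT-features -> SVM sketch. Dropping the classification head exposes the
# 768-d CLS embedding, which then feeds a conventional SVM.
vit = vit_b_16(weights=None).eval()
vit.heads = torch.nn.Identity()

with torch.no_grad():
    X = vit(torch.randn(8, 3, 224, 224)).numpy()   # 8 dummy cell images
y = np.array([0, 1] * 4)

rng = np.random.default_rng(0)
mask = rng.random(X.shape[1]) > 0.5        # stand-in for a PSO-chosen feature mask
clf = SVC(kernel="rbf").fit(X[:, mask], y) # SVM replaces the softmax layer
print(clf.predict(X[:, mask]))
```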
Affiliation(s)
- Abdulaziz AlMohimeed
- College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 13318, Saudi Arabia
- Mohamed Shehata
- Bioengineering Department, Speed School of Engineering, University of Louisville, Louisville, KY 40292, USA
- Nora El-Rashidy
- Machine Learning and Information Retrieval Department, Faculty of Artificial Intelligence, Kafrelsheiksh University, Kafrelsheiksh 13518, Egypt
- Sherif Mostafa
- Faculty of Computers and Artificial Intelligence, South Valley University, Hurghada 84511, Egypt
- Amira Samy Talaat
- Computers and Systems Department, Electronics Research Institute, Cairo 12622, Egypt
- Hager Saleh
- Faculty of Computers and Artificial Intelligence, South Valley University, Hurghada 84511, Egypt
- Insight SFI Research Centre for Data Analytics, Galway University, H91 TK33 Galway, Ireland
- Research Development, Atlantic Technological University, Letterkenny, H91 AH5K Donegal, Ireland
9. Yagis E, Aslani S, Jain Y, Zhou Y, Rahmani S, Brunet J, Bellier A, Werlein C, Ackermann M, Jonigk D, Tafforeau P, Lee PD, Walsh C. Deep Learning for 3D Vascular Segmentation in Phase Contrast Tomography. Res Sq [Preprint] 2024: rs.3.rs-4613439. PMID: 39070623. PMCID: PMC11276017. DOI: 10.21203/rs.3.rs-4613439/v1.
Abstract
Automated blood vessel segmentation is critical for biomedical image analysis, as vessel morphology changes are associated with numerous pathologies. Still, precise segmentation is difficult due to the complexity of vascular structures, anatomical variations across patients, the scarcity of annotated public datasets, and image quality. Our goal is to provide a foundation on the topic and identify a robust baseline model for vascular segmentation in a new imaging modality, Hierarchical Phase-Contrast Tomography (HiP-CT). We begin with an extensive review of current machine learning approaches for vascular segmentation across various organs. Our work introduces a meticulously curated training dataset, verified by double annotators, consisting of vascular data from three kidneys imaged with HiP-CT as part of the Human Organ Atlas Project. HiP-CT, pioneered at the European Synchrotron Radiation Facility in 2020, revolutionizes 3D organ imaging by offering resolution of around 20 μm/voxel and enabling highly detailed localized zooms of up to 1 μm/voxel without physical sectioning. We leverage the nnU-Net framework to evaluate model performance on this high-resolution dataset, using both known and novel samples and implementing metrics tailored to vascular structures. Our comprehensive review and empirical analysis on HiP-CT data set a new standard for evaluating machine learning models in high-resolution organ imaging. Our three experiments yielded Dice similarity coefficients (DSC) of 0.9523, 0.9410, and 0.8585, respectively. Nevertheless, DSC primarily assesses voxel-to-voxel concordance and overlooks several crucial characteristics of vessels, so it should not be the sole metric for judging vascular segmentation performance. Our results show that while segmentations yielded reasonably high scores, with centerline Dice values ranging from 0.82 to 0.88, certain errors persisted. Specifically, large vessels that collapsed due to the lack of hydrostatic pressure (HiP-CT is an ex vivo technique) were segmented poorly. Moreover, decreased connectivity in finer vessels and higher segmentation errors at vessel boundaries were observed. Such errors, particularly in significant vessels, obstruct understanding of the structures by interrupting vascular tree connectivity. Through our review and outputs, we aim to set a benchmark for subsequent model evaluations across modalities, especially within the HiP-CT imaging database.
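Since the abstract leans on both voxel-wise Dice and centerline Dice, a sketch of the two metrics may help; this 2D version assumes binary masks and uses scikit-image skeletonization:

```python
import numpy as np
from skimage.morphology import skeletonize

def dice(pred, gt):
    """Voxel-wise Dice similarity coefficient for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

def cl_dice(pred, gt):
    """Centerline Dice (clDice) sketch: a topology-aware score built from
    skeleton-in-mask precision and sensitivity, often reported alongside
    plain Dice for vessel segmentation. 2D version for brevity."""
    tprec = (skeletonize(pred) & gt).sum() / (skeletonize(pred).sum() + 1e-8)
    tsens = (skeletonize(gt) & pred).sum() / (skeletonize(gt).sum() + 1e-8)
    return 2.0 * tprec * tsens / (tprec + tsens + 1e-8)

pred = np.zeros((64, 64), bool); pred[30:34, 5:60] = True   # toy vessel
gt = np.zeros((64, 64), bool); gt[31:35, 5:60] = True
print(dice(pred, gt), cl_dice(pred, gt))
```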
Affiliation(s)
- Ekin Yagis
- Department of Mechanical Engineering, University College London, London, UK
- Shahab Aslani
- Department of Mechanical Engineering, University College London, London, UK
- Centre for Medical Image Computing, University College London, London, UK
- Yashvardhan Jain
- Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, USA
- Yang Zhou
- Department of Mechanical Engineering, University College London, London, UK
- Shahrokh Rahmani
- Department of Mechanical Engineering, University College London, London, UK
- Joseph Brunet
- Department of Mechanical Engineering, University College London, London, UK
- European Synchrotron Radiation Facility, Grenoble, France
- Christopher Werlein
- Institute of Pathology, Hannover Medical School, Carl-Neuberg-Straße 1, 30625, Hannover, Germany
- Danny Jonigk
- Member of the German Center for Lung Research (DZL), Biomedical Research in Endstage and Obstructive Lung Disease Hannover (BREATH), Hannover, Germany
- Paul Tafforeau
- European Synchrotron Radiation Facility, Grenoble, France
- Peter D. Lee
- Department of Mechanical Engineering, University College London, London, UK
- Claire Walsh
- Department of Mechanical Engineering, University College London, London, UK
10. Chukwujindu E, Faiz H, Ai-Douri S, Faiz K, De Sequeira A. Role of artificial intelligence in brain tumour imaging. Eur J Radiol 2024;176:111509. PMID: 38788610. DOI: 10.1016/j.ejrad.2024.111509.
Abstract
Artificial intelligence (AI) is a rapidly evolving field with many neuro-oncology applications. In this review, we discuss how AI can assist in brain tumour imaging, focusing on machine learning (ML) and deep learning (DL) techniques. We describe how AI can help in lesion detection, differential diagnosis, anatomic segmentation, molecular marker identification, prognostication, and pseudo-progression evaluation. We also cover AI applications in non-glioma brain tumours, such as brain metastasis, posterior fossa, and pituitary tumours. We highlight the challenges and limitations of AI implementation in radiology, such as data quality, standardization, and integration. Based on the findings in the aforementioned areas, we conclude that AI can potentially improve the diagnosis and treatment of brain tumours and provide a path towards personalized medicine and better patient outcomes.
Affiliation(s)
- Khunsa Faiz
- McMaster University, Department of Radiology, L8S 4L8, Canada
11. Tagnamas J, Ramadan H, Yahyaouy A, Tairi H. Multi-task approach based on combined CNN-transformer for efficient segmentation and classification of breast tumors in ultrasound images. Vis Comput Ind Biomed Art 2024;7:2. PMID: 38273164. PMCID: PMC10811315. DOI: 10.1186/s42492-024-00155-w.
Abstract
Accurate segmentation of breast ultrasound (BUS) images is crucial for early diagnosis and treatment of breast cancer, yet segmenting lesions in BUS images continues to pose significant challenges due to the limitations of convolutional neural networks (CNNs) in capturing long-range dependencies and global context. Existing methods relying solely on CNNs have struggled to address these issues. Recently, ConvNeXts have emerged as a promising CNN architecture, while transformers have demonstrated outstanding performance in diverse computer vision tasks, including medical image analysis. In this paper, we propose CS-Net, a novel breast lesion segmentation network that combines the strengths of ConvNeXt and Swin Transformer models to enhance the U-Net architecture. Our network operates on BUS images and performs segmentation end-to-end. To address the limitations of CNNs, we design a hybrid encoder that incorporates modified ConvNeXt convolutions and Swin Transformer blocks, and we incorporate a Coordinate Attention Module to better capture spatial and channel attention in feature maps. In addition, we design an Encoder-Decoder Feature Fusion Module that fuses low-level encoder features with high-level semantic decoder features during image reconstruction. Experimental results demonstrate the superiority of our network over state-of-the-art image segmentation methods for BUS lesion segmentation.
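Of the components named here, the Coordinate Attention Module is the most self-contained. The sketch below follows the published coordinate-attention idea (pooling along each spatial axis, joint encoding, direction-aware re-weighting); the reduction ratio and pooling choice are assumptions rather than CS-Net's exact configuration:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate-attention sketch: pool along height and width separately,
    encode the two pooled maps jointly, then re-weight the feature map with
    direction-aware attention factors."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(8, channels // reduction)
        self.encode = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU())
        self.attn_h = nn.Conv2d(mid, channels, 1)
        self.attn_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        pooled_h = x.mean(dim=3, keepdim=True)                      # (b, c, h, 1)
        pooled_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (b, c, w, 1)
        y = self.encode(torch.cat([pooled_h, pooled_w], dim=2))     # joint encoding
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(y_h))                       # (b, c, h, 1)
        a_w = torch.sigmoid(self.attn_w(y_w.permute(0, 1, 3, 2)))   # (b, c, 1, w)
        return x * a_h * a_w                                        # re-weighted map

out = CoordinateAttention(64)(torch.randn(2, 64, 32, 32))
```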
Affiliation(s)
- Jaouad Tagnamas
- Department of Informatics, Faculty of Sciences Dhar El Mahraz, University of Sidi Mohamed Ben Abdellah, 30000, Fez, Morocco
- Hiba Ramadan
- Department of Informatics, Faculty of Sciences Dhar El Mahraz, University of Sidi Mohamed Ben Abdellah, 30000, Fez, Morocco
- Ali Yahyaouy
- Department of Informatics, Faculty of Sciences Dhar El Mahraz, University of Sidi Mohamed Ben Abdellah, 30000, Fez, Morocco
- Hamid Tairi
- Department of Informatics, Faculty of Sciences Dhar El Mahraz, University of Sidi Mohamed Ben Abdellah, 30000, Fez, Morocco
12. Gujarati KR, Bathala L, Venkatesh V, Mathew RS, Yalavarthy PK. Transformer-Based Automated Segmentation of the Median Nerve in Ultrasound Videos of Wrist-to-Elbow Region. IEEE Trans Ultrason Ferroelectr Freq Control 2024;71:56-69. PMID: 37930930. DOI: 10.1109/tuffc.2023.3330539.
Abstract
Segmenting the median nerve is essential for identifying nerve entrapment syndromes, guiding surgical planning and interventions, and furthering understanding of nerve anatomy. This study aims to develop an automated tool that can assist clinicians in localizing and segmenting the median nerve from the wrist, mid-forearm, and elbow in ultrasound videos. It presents the first fully automated single deep learning model for accurate segmentation of the median nerve from the wrist to the elbow in ultrasound videos, along with computation of the cross-sectional area (CSA) of the nerve. The visual transformer architecture, originally proposed to detect and classify 41 classes in YouTube videos, was adapted to predict the median nerve in every frame of an ultrasound video by modifying its bounding-box sequence matching block. Because median nerve segmentation is a binary prediction, the entire bipartite matching sequence is eliminated, enabling a direct frame-by-frame comparison of predictions with expert annotations. Model training, validation, and testing were performed on a dataset of ultrasound videos collected from 100 subjects, partitioned into 80, 10, and 10 subjects, respectively. The proposed model was compared with U-Net, U-Net++, Siam U-Net, Attention U-Net, LSTM U-Net, and Trans U-Net. The transformer-based model effectively leveraged the temporal and spatial information in ultrasound video frames and efficiently segmented the median nerve with an average Dice similarity coefficient (DSC) of approximately 94% at the wrist and 84% over the entire forearm region.
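The CSA computation that accompanies the segmentation is straightforward once a binary mask is available; a sketch, with illustrative pixel spacing rather than values from the ultrasound system:

```python
import numpy as np

def cross_sectional_area_mm2(mask, pixel_spacing_mm):
    """Count nerve pixels in a binary mask and scale by the physical pixel
    area to get the cross-sectional area in mm^2."""
    dy, dx = pixel_spacing_mm
    return float(mask.sum()) * dy * dx

mask = np.zeros((480, 640), dtype=bool)
mask[200:240, 300:360] = True                      # dummy nerve region
print(cross_sectional_area_mm2(mask, (0.08, 0.08)), "mm^2")
```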
13. Pinto-Coelho L. How Artificial Intelligence Is Shaping Medical Imaging Technology: A Survey of Innovations and Applications. Bioengineering (Basel) 2023;10:1435. PMID: 38136026. PMCID: PMC10740686. DOI: 10.3390/bioengineering10121435.
Abstract
The integration of artificial intelligence (AI) into medical imaging has ushered in an era of transformation in healthcare. This literature review explores the latest innovations and applications of AI in the field, highlighting its profound impact on medical diagnosis and patient care. The innovations segment explores cutting-edge developments in AI, such as deep learning algorithms, convolutional neural networks, and generative adversarial networks, which have significantly improved the accuracy and efficiency of medical image analysis. These innovations have enabled rapid and accurate detection of abnormalities, from identifying tumors during radiological examinations to detecting early signs of eye disease in retinal images. The article also surveys applications of AI in medical imaging across radiology, pathology, cardiology, and other specialties. AI-based diagnostic tools not only speed up the interpretation of complex images but also improve early detection of disease, ultimately delivering better outcomes for patients. Additionally, AI-based image processing facilitates personalized treatment plans, thereby optimizing healthcare delivery. The review underscores the paradigm shift that AI has brought to medical imaging and its role in revolutionizing diagnosis and patient care. Given the cutting-edge techniques and practical applications surveyed here, it is clear that AI will continue shaping the future of healthcare in profound and positive ways.
Affiliation(s)
- Luís Pinto-Coelho
- ISEP—School of Engineering, Polytechnic Institute of Porto, 4200-465 Porto, Portugal
- INESCTEC, Campus of the Engineering Faculty of the University of Porto, 4200-465 Porto, Portugal