1. Cai L, Chen L, Huang J, Wang Y, Zhang Y. Know your orientation: A viewpoint-aware framework for polyp segmentation. Med Image Anal 2024; 97:103288. [PMID: 39096844 DOI: 10.1016/j.media.2024.103288]
Abstract
Automatic polyp segmentation in endoscopic images is critical for the early diagnosis of colorectal cancer. Despite the availability of powerful segmentation models, two challenges still impede the accuracy of polyp segmentation algorithms. Firstly, during a colonoscopy, physicians frequently adjust the orientation of the colonoscope tip to capture underlying lesions, resulting in viewpoint changes in the colonoscopy images. These variations increase the diversity of polyp visual appearance, posing a challenge for learning robust polyp features. Secondly, polyps often exhibit properties similar to the surrounding tissues, leading to indistinct polyp boundaries. To address these problems, we propose a viewpoint-aware framework named VANet for precise polyp segmentation. In VANet, polyps are emphasized as a discriminative feature and thus can be localized by class activation maps in a viewpoint classification process. With these polyp locations, we design a viewpoint-aware Transformer (VAFormer) to alleviate the erosion of attention by the surrounding tissues, thereby inducing better polyp representations. Additionally, to enhance the polyp boundary perception of the network, we develop a boundary-aware Transformer (BAFormer) to encourage self-attention towards uncertain regions. As a consequence, the combination of the two modules is capable of calibrating predictions and significantly improving polyp segmentation performance. Extensive experiments on seven public datasets across six metrics demonstrate the state-of-the-art results of our method, and VANet can handle colonoscopy images in real-world scenarios effectively. The source code is available at https://github.com/1024803482/Viewpoint-Aware-Network.
Affiliation(s)
- Linghan Cai
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China; Department of Electronic Information Engineering, Beihang University, Beijing, 100191, China
- Lijiang Chen
- Department of Electronic Information Engineering, Beihang University, Beijing, 100191, China
- Jianhao Huang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China
- Yifeng Wang
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China
- Yongbing Zhang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China
2. Joye AS, Firlie MG, Wittberg DM, Aragie S, Nash SD, Tadesse Z, Dagnew A, Hailu D, Admassu F, Wondimteka B, Getachew H, Kabtu E, Beyecha S, Shibiru M, Getnet B, Birhanu T, Abdu S, Tekew S, Lietman TM, Keenan JD, Redd TK. Computer Vision Identification of Trachomatous Inflammation-Follicular Using Deep Learning. Cornea 2024:00003226-990000000-00692. [PMID: 39312712 DOI: 10.1097/ico.0000000000003701]
Abstract
PURPOSE Trachoma surveys are used to estimate the prevalence of trachomatous inflammation-follicular (TF) to guide mass antibiotic distribution. These surveys currently rely on human graders, introducing a significant resource burden and potential for human error. This study describes the development and evaluation of machine learning models intended to reduce cost and improve reliability of these surveys. METHODS Fifty-six thousand seven hundred twenty-five everted eyelid photographs were obtained from 11,358 children of age 0 to 9 years in a single trachoma-endemic region of Ethiopia over a 3-year period. Expert graders reviewed all images from each examination to determine the estimated number of tarsal conjunctival follicles and the degree of trachomatous inflammation-intense. The median estimate of the 3 grader groups was used as the ground truth to train a MobileNetV3 large deep convolutional neural network to detect cases with TF. RESULTS The classification model predicted a TF prevalence of 32%, which was not significantly different from the human consensus estimate (30%; 95% confidence interval of difference, -2 to +4%). The model had an area under the receiver operating characteristic curve of 0.943, F1 score of 0.923, 88% accuracy, 83% sensitivity, and 91% specificity. The area under the receiver operating characteristic curve increased to 0.995 when interpreting nonborderline cases of TF. CONCLUSIONS Deep convolutional neural network models performed well at classifying TF and detecting the number of follicles evident in conjunctival photographs. Implementation of similar models may enable accurate, efficient, large-scale trachoma screening. Further validation in diverse populations with varying TF prevalence is needed before implementation at scale.
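For orientation, the sketch below shows how a MobileNetV3-Large backbone is typically fine-tuned for a binary label such as TF in PyTorch; the head replacement, loss, and hyperparameters are illustrative assumptions, not the authors' training code.

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical fine-tuning setup: MobileNetV3-Large with a single-logit head
# for TF vs. no-TF. Learning rate, loss, and optimizer are assumptions.
model = models.mobilenet_v3_large(weights=models.MobileNet_V3_Large_Weights.DEFAULT)
model.classifier[3] = nn.Linear(model.classifier[3].in_features, 1)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """One supervised step on a batch of everted-eyelid photographs."""
    model.train()
    optimizer.zero_grad()
    logits = model(images).squeeze(1)        # (batch,) raw TF logits
    loss = criterion(logits, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```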
Affiliation(s)
- Ashlin S Joye
- Casey Eye Institute, Oregon Health and Science University, Portland, OR
- Francis I Proctor Foundation, University of California San Francisco, San Francisco, CA
- Marissa G Firlie
- George Washington University, School of Medicine and Health Sciences, Washington, DC
- Dionna M Wittberg
- Francis I Proctor Foundation, University of California San Francisco, San Francisco, CA
- Adane Dagnew
- The Carter Center Ethiopia, Addis Ababa, Ethiopia
- Fisseha Admassu
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Bilen Wondimteka
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Habib Getachew
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Endale Kabtu
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Social Beyecha
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Meskerem Shibiru
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Banchalem Getnet
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Tibebe Birhanu
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Seid Abdu
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Solomon Tekew
- Department of Ophthalmology, University of Gondar, Gondar, Ethiopia
- Thomas M Lietman
- Francis I Proctor Foundation, University of California San Francisco, San Francisco, CA
- Jeremy D Keenan
- Francis I Proctor Foundation, University of California San Francisco, San Francisco, CA
- Travis K Redd
- Casey Eye Institute, Oregon Health and Science University, Portland, OR
- Francis I Proctor Foundation, University of California San Francisco, San Francisco, CA
3. El Hmimdi AE, Palpanas T, Kapoula Z. Efficient diagnostic classification of diverse pathologies through contextual eye movement data analysis with a novel hybrid architecture. Sci Rep 2024; 14:21461. [PMID: 39271749 PMCID: PMC11399410 DOI: 10.1038/s41598-024-68056-9]
Abstract
The analysis of eye movements has proven valuable for understanding brain function and the neuropathology of various disorders. This research aims to use eye movement data analysis as a screening tool for differentiating between eight groups of pathologies, including scholar, neurologic, and postural disorders. Leveraging a dataset from 20 clinical centers, all employing AIDEAL and REMOBI eye movement technologies, this study extends prior research by considering a multi-annotation setting, incorporating recordings from saccade and vergence eye movement tests, and using contextual information (e.g., target signals, the latency of the eye movement relative to the target, and the confidence level of the quality of the eye movement recording) to improve accuracy while reducing noise interference. Additionally, we introduce a novel hybrid architecture that combines the weight-sharing feature of convolution layers with the long-range capabilities of the transformer architecture, improving model efficiency and reducing the computation cost by a factor of 3.36 while remaining competitive in terms of macro F1 score. Evaluated on two diverse datasets, our method demonstrates promising results, the most powerful discrimination being the Attention & Neurologic disorder group, with a macro F1 score of up to 78.8%. The results indicate the effectiveness of our approach in accurately classifying eye movement data from different pathologies and different clinical centers, thus enabling the creation of an assistive tool in the future.
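The conv-plus-transformer pattern the abstract describes can be sketched generically as below; the layer sizes, channel counts, and pooling head are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ConvTransformerHybrid(nn.Module):
    """Generic conv + transformer hybrid for multichannel 1-D sequences
    (e.g., eye-movement recordings). All sizes are illustrative."""
    def __init__(self, in_channels=4, d_model=64, n_classes=8):
        super().__init__()
        # Weight-sharing conv front end downsamples and embeds the signal.
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, d_model, kernel_size=7, stride=2, padding=3),
            nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, stride=2, padding=2),
            nn.GELU(),
        )
        # Transformer encoder captures long-range temporal structure.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                 # x: (batch, channels, time)
        z = self.conv(x)                  # (batch, d_model, time/4)
        z = self.encoder(z.transpose(1, 2))
        return self.head(z.mean(dim=1))   # mean-pool over time, then classify
```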
Affiliation(s)
- Alae Eddine El Hmimdi
- Orasis Eye Analytics and Rehabilitation, Paris, France
- Laboratoire d'Informatique Paris Descartes (LIPADE), French University Institute (IUF), Université de Paris, 45 Rue Des Saints-Peres, 75006 Paris, France
- Themis Palpanas
- Laboratoire d'Informatique Paris Descartes (LIPADE), French University Institute (IUF), Université de Paris, 45 Rue Des Saints-Peres, 75006 Paris, France
- Zoi Kapoula
- Orasis Eye Analytics and Rehabilitation, Paris, France
- Laboratoire d'Informatique Paris Descartes (LIPADE), French University Institute (IUF), Université de Paris, 45 Rue Des Saints-Peres, 75006 Paris, France
4. Bankin M, Tyrykin Y, Duk M, Samsonova M, Kozlov K. Modeling Chickpea Productivity with Artificial Image Objects and Convolutional Neural Network. Plants (Basel) 2024; 13:2444. [PMID: 39273927 PMCID: PMC11397516 DOI: 10.3390/plants13172444]
Abstract
The chickpea plays a significant role in global agriculture and occupies an increasing share of the human diet. The main aim of this research was to develop a model for predicting two chickpea productivity traits in the available dataset. Genomic data for accessions were encoded as Artificial Image Objects, and a model for predicting thousand-seed weight (TSW) and number of seeds per plant (SNpP) was constructed using a Convolutional Neural Network, dictionary learning and sparse coding for feature extraction, and extreme gradient boosting for regression. The model was capable of predicting both traits with an acceptable accuracy of 84-85%. The factors most important to the model's predictions were identified using the dense regression attention maps method. The SNPs important for the SNpP and TSW traits were found in 34 and 49 genes, respectively. Genomic prediction with the constructed model can help breeding programs harness genotypic and phenotypic diversity to more effectively produce varieties with a desired phenotype.
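The final regression stage names extreme gradient boosting; a minimal, self-contained sketch with stand-in features (in the paper these come from the CNN and sparse-coding stages) might look like this.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Stand-in feature matrix: rows = accessions, columns = extracted features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=200)  # synthetic trait

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
reg = xgb.XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=4)
reg.fit(X_tr, y_tr)
print("held-out R^2:", reg.score(X_te, y_te))  # sklearn-style scoring
```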
Affiliation(s)
- Mikhail Bankin
- Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
- Yaroslav Tyrykin
- Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
- Maria Duk
- Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
- Maria Samsonova
- Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
- Konstantin Kozlov
- Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
5. Zhao C, Hsiao JH, Chan AB. Gradient-Based Instance-Specific Visual Explanations for Object Specification and Object Discrimination. IEEE Trans Pattern Anal Mach Intell 2024; 46:5967-5985. [PMID: 38517727 DOI: 10.1109/tpami.2024.3380604]
Abstract
We propose gradient-weighted Object Detector Activation Maps (ODAM), a visual explanation technique for interpreting the predictions of object detectors. Utilizing the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of regions on the detector's decision for each predicted attribute. Compared to previous work on class activation maps (CAM), ODAM generates instance-specific explanations rather than class-specific ones. We show that ODAM is applicable to one-stage, two-stage, and transformer-based detectors with different types of detector backbones and heads, and produces higher-quality visual explanations than the state of the art in terms of both effectiveness and efficiency. We discuss two explanation tasks for object detection: 1) object specification: what is the important region for the prediction? 2) object discrimination: which object is detected? Aiming at these two aspects, we present a detailed analysis of the visual explanations of detectors and carry out extensive experiments to validate the effectiveness of the proposed ODAM. Furthermore, we investigate user trust in the explanation maps, how well the visual explanations of object detectors agree with human explanations as measured through human eye gaze, and whether this agreement is related to user trust. Finally, we also propose two applications, ODAM-KD and ODAM-NMS, based on these two abilities of ODAM. ODAM-KD utilizes the object specification of ODAM to generate top-down attention for key predictions and to guide the knowledge distillation of object detection. ODAM-NMS considers the location of the model's explanation for each prediction to distinguish duplicate detected objects. A training scheme, ODAM-Train, is proposed to improve the quality of object discrimination and to help with ODAM-NMS.
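As background, the gradient-weighted activation-map family that ODAM extends can be sketched as follows; this is the generic Grad-CAM-style computation with pooled gradients, not ODAM's instance-specific weighting.

```python
import torch

def gradient_weighted_map(feature_maps, score):
    """Generic gradient-weighted activation map (the family ODAM extends).
    feature_maps: (C, H, W) intermediate activations kept in the graph;
    score: scalar output, e.g., one detection's class score."""
    grads, = torch.autograd.grad(score, feature_maps, retain_graph=True)
    weights = grads.mean(dim=(1, 2))                  # GAP over spatial dims
    cam = torch.relu((weights[:, None, None] * feature_maps).sum(dim=0))
    return cam / (cam.max() + 1e-8)                   # normalize to [0, 1]
```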
6. Won H, Lee HS, Youn D, Park D, Eo T, Kim W, Hwang D. Deep Learning-Based Joint Effusion Classification in Adult Knee Radiographs: A Multi-Center Prospective Study. Diagnostics (Basel) 2024; 14:1900. [PMID: 39272685 PMCID: PMC11394442 DOI: 10.3390/diagnostics14171900]
Abstract
Knee effusion, a common and important indicator of joint diseases such as osteoarthritis, is typically more discernible on magnetic resonance imaging (MRI) scans than on radiographs. However, radiographs remain promising for the early detection of knee effusion owing to their cost-effectiveness and accessibility. This multi-center prospective study collected a total of 1413 radiographs from four hospitals between February 2022 and March 2023, of which 1281 were analyzed after exclusions. To automatically detect knee effusion on radiographs, we utilized a state-of-the-art (SOTA) deep learning-based classification model with a novel preprocessing technique to optimize images for diagnosing knee effusion. The diagnostic performance of the proposed method was significantly higher than that of the baseline model, achieving an area under the receiver operating characteristic curve (AUC) of 0.892, accuracy of 0.803, sensitivity of 0.820, and specificity of 0.785. Moreover, the proposed method significantly outperformed two non-orthopedic physicians. Coupled with an explainable artificial intelligence method for visualization, this approach improved not only diagnostic performance but also interpretability, highlighting areas of effusion. These results demonstrate that the proposed method enables the early and accurate classification of knee effusions on radiographs, thereby reducing healthcare costs and improving patient outcomes through timely interventions.
Affiliation(s)
- Hyeyeon Won
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Probe Medical Inc., 61, Yonsei-ro 2na-gil, Seodaemun-gu, Seoul 03777, Republic of Korea
- Hye Sang Lee
- Independent Researcher, Seoul 06295, Republic of Korea
- Daemyung Youn
- School of Management of Technology, Yonsei University, Seoul 03722, Republic of Korea
- Doohyun Park
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Taejoon Eo
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Probe Medical Inc., 61, Yonsei-ro 2na-gil, Seodaemun-gu, Seoul 03777, Republic of Korea
- Wooju Kim
- Department of Industrial Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Dosik Hwang
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Probe Medical Inc., 61, Yonsei-ro 2na-gil, Seodaemun-gu, Seoul 03777, Republic of Korea
- Artificial Intelligence and Robotics Institute, Korea Institute of Science and Technology, 5, Hwarang-ro 14-gil, Seongbuk-gu, Seoul 02792, Republic of Korea
- Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul 03722, Republic of Korea
- Department of Radiology, Center for Clinical Imaging Data Science (CCIDS), Yonsei University College of Medicine, Seoul 03722, Republic of Korea
7. Guo W, Jin S, Li Y, Jiang Y. The dynamic-static dual-branch deep neural network for urban speeding hotspot identification using street view image data. Accid Anal Prev 2024; 203:107636. [PMID: 38776837 DOI: 10.1016/j.aap.2024.107636]
Abstract
The visual information in the road environment can influence drivers' perception and judgment, often resulting in frequent speeding incidents. Identifying speeding hotspots in cities can prevent potential speeding incidents, thereby improving traffic safety. We propose the Dual-Branch Contextual Dynamic-Static Feature Fusion Network, based on static panoramic images and dynamically changing sequence data, to capture global features of the macro scene of an area together with dynamically changing information in the micro view, enabling more accurate identification of urban speeding hotspot areas. For the static branch, we propose the Multi-scale Contextual Feature Aggregation Network for learning global spatial contextual association information. In the dynamic branch, we construct the Multi-view Dynamic Feature Fusion Network to capture the dynamically changing features of a scene from a continuous sequence of street view images. Additionally, we design the Dynamic-Static Feature Correlation Fusion Structure to correlate and fuse dynamic and static features. The experimental results show that the model performs well, with an overall recognition accuracy of 99.4%. Ablation experiments show that fusing dynamic and static features yields better recognition than either branch alone, and the proposed model also outperforms other deep learning models. In addition, we combine image processing methods with different Class Activation Mapping (CAM) methods to extract speeding-frequency visual features from the model's perception results. The results show that more accurate speeding-frequency features can be obtained by using LayerCAM for static global scenes and GradCAM-Plus for dynamic local sequences. In the static global scene, the speeding-frequency features are mainly concentrated on the buildings and green layout on both sides of the road, while in the dynamic scene they shift as the scene changes and are mainly concentrated on the dynamically changing transition areas of greenery, roads, and surrounding buildings. The code and model used in this study are available at: https://github.com/gwt-ZJU/DCDSFF-Net.
Affiliation(s)
- Wentong Guo
- Polytechnic Institute & Institute of Intelligent Transportation Systems, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
- Sheng Jin
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China; Zhongyuan Institute, Zhejiang University, Zhengzhou 450000, China
- Yiding Li
- Henan Institute of Advanced Technology, Zhengzhou University, Zhengzhou 450003, China
- Yang Jiang
- Polytechnic Institute & Institute of Intelligent Transportation Systems, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
8. Dai W, Wu T, Liu R, Wang M, Yin J, Liu J. Any region can be perceived equally and effectively on rotation pretext task using full rotation and weighted-region mixture. Neural Netw 2024; 176:106350. [PMID: 38723309 DOI: 10.1016/j.neunet.2024.106350]
Abstract
In recent years, self-supervised learning has emerged as a powerful approach to learning visual representations without requiring extensive manual annotation. One popular technique involves using rotation transformations of images, which provide a clear visual signal for learning semantic representation. However, in this work, we revisit the pretext task of predicting image rotation in self-supervised learning and discover that it tends to marginalise the perception of features located near the centre of an image. To address this limitation, we propose a new self-supervised learning method, namely FullRot, which spotlights underrated regions by resizing the randomly selected and cropped regions of images. Moreover, FullRot increases the complexity of the rotation pretext task by applying the degree-free rotation to the region cropped into a circle. To encourage models to learn from different general parts of an image, we introduce a new data mixture technique called WRMix, which merges two random intra-image patches. By combining these innovative crop and rotation methods with the data mixture scheme, our approach, FullRot + WRMix, surpasses the state-of-the-art self-supervision methods in classification, segmentation, and object detection tasks on ten benchmark datasets with an improvement of up to +13.98% accuracy on STL-10, +8.56% accuracy on CIFAR-10, +10.20% accuracy on Sports-100, +15.86% accuracy on Mammals-45, +15.15% accuracy on PAD-UFES-20, +32.44% mIoU on VOC 2012, +7.62% mIoU on ISIC 2018, +9.70% mIoU on FloodArea, +25.16% AP50 on VOC 2007, and +58.69% AP50 on UTDAC 2020. The code is available at https://github.com/anthonyweidai/FullRot_WRMix.
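For reference, the standard rotation pretext task that FullRot generalizes can be sketched as below; this simplified four-angle version (FullRot itself uses degree-free rotation of circular crops) is for illustration only, and `backbone`/`head` are assumed modules.

```python
import random
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

# The classic 4-way rotation pretext: rotate each image by a random multiple
# of 90 degrees and train the network to predict which rotation was applied.
ANGLES = [0, 90, 180, 270]

def rotation_batch(images):
    """Return rotated images and their pseudo-labels (index of the angle)."""
    labels = [random.randrange(4) for _ in images]
    rotated = torch.stack([TF.rotate(img, ANGLES[y]) for img, y in zip(images, labels)])
    return rotated, torch.tensor(labels)

def pretext_loss(backbone, head, images):
    """Self-supervised loss: classify the applied rotation (no human labels)."""
    rotated, labels = rotation_batch(images)
    return nn.functional.cross_entropy(head(backbone(rotated)), labels)
```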
Affiliation(s)
- Wei Dai
- Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China
- Tianyi Wu
- Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China
- Rui Liu
- Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China
- Min Wang
- Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China
- Jianqin Yin
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
- Jun Liu
- Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China
9. Yuan H, Hong C, Jiang PT, Zhao G, Tran NTA, Xu X, Yan YY, Liu N. Clinical domain knowledge-derived template improves post hoc AI explanations in pneumothorax classification. J Biomed Inform 2024; 156:104673. [PMID: 38862083 DOI: 10.1016/j.jbi.2024.104673]
Abstract
OBJECTIVE Pneumothorax is an acute thoracic disease caused by abnormal air collection between the lungs and chest wall. Recently, artificial intelligence (AI), especially deep learning (DL), has been increasingly employed for automating the diagnostic process of pneumothorax. To address the opaqueness often associated with DL models, explainable artificial intelligence (XAI) methods have been introduced to outline regions related to pneumothorax. However, these explanations sometimes diverge from actual lesion areas, highlighting the need for further improvement. METHOD We propose a template-guided approach to incorporate the clinical knowledge of pneumothorax into model explanations generated by XAI methods, thereby enhancing the quality of the explanations. Utilizing one lesion delineation created by radiologists, our approach first generates a template that represents potential areas of pneumothorax occurrence. This template is then superimposed on model explanations to filter out extraneous explanations that fall outside the template's boundaries. To validate its efficacy, we carried out a comparative analysis of three XAI methods (Saliency Map, Grad-CAM, and Integrated Gradients) with and without our template guidance when explaining two DL models (VGG-19 and ResNet-50) in two real-world datasets (SIIM-ACR and ChestX-Det). RESULTS The proposed approach consistently improved baseline XAI methods across twelve benchmark scenarios built on three XAI methods, two DL models, and two datasets. The average incremental percentages, calculated by the performance improvements over the baseline performance, were 97.8% in Intersection over Union (IoU) and 94.1% in Dice Similarity Coefficient (DSC) when comparing model explanations and ground-truth lesion areas. We further visualized baseline and template-guided model explanations on radiographs to showcase the performance of our approach. CONCLUSIONS In the context of pneumothorax diagnoses, we proposed a template-guided approach for improving model explanations. Our approach not only aligns model explanations more closely with clinical insights but also exhibits extensibility to other thoracic diseases. We anticipate that our template guidance will forge a novel approach to elucidating AI models by integrating clinical domain expertise.
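The core operation — masking an explanation map with a clinically derived template and scoring it against ground truth with IoU and DSC — can be sketched as follows; array shapes and binarization are assumptions, not the authors' pipeline.

```python
import numpy as np

def template_filter(explanation, template):
    """Zero out explanation mass falling outside a clinically derived
    template. Both arrays are (H, W); `template` is binary {0, 1}."""
    return explanation * template

def iou_and_dice(pred_mask, gt_mask):
    """IoU and Dice between a binarized explanation and the lesion mask."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / union if union else 0.0
    total = pred.sum() + gt.sum()
    dice = 2 * inter / total if total else 0.0
    return iou, dice
```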
Affiliation(s)
- Han Yuan
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Chuan Hong
- Department of Biostatistics and Bioinformatics, Duke University, USA
- Gangming Zhao
- Faculty of Engineering, The University of Hong Kong, China
- Xinxing Xu
- Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore
- Yet Yen Yan
- Department of Radiology, Changi General Hospital, Singapore
- Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Institute of Data Science, National University of Singapore, Singapore
10. Li C, Narayanan A, Ghobakhlou A. Overlapping Shoeprint Detection by Edge Detection and Deep Learning. J Imaging 2024; 10:186. [PMID: 39194975 DOI: 10.3390/jimaging10080186]
Abstract
In the field of 2-D image processing and computer vision, accurately detecting and segmenting objects that overlap or are obscured remains a challenge. The difficulty is exacerbated in the analysis of shoeprints used in forensic investigations, because prints are embedded in noisy environments such as the ground and can be indistinct. Traditional convolutional neural networks (CNNs), despite their success in various image analysis tasks, struggle to accurately delineate overlapping objects due to the complexity of segmenting intertwined textures and boundaries against a noisy background. This study introduces the YOLO (You Only Look Once) model enhanced by edge detection and image segmentation techniques to improve the detection of overlapping shoeprints. By focusing on the critical boundary information between shoeprint textures and the ground, our method demonstrates improvements in sensitivity and precision, achieving confidence levels above 85% for minimally overlapped images and above 70% for extensively overlapped instances. Heatmaps of convolution layers were generated to show how the network converges towards successful detection using these enhancements. This research may provide a methodology for addressing the broader challenge of detecting multiple overlapping objects against noisy backgrounds.
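A minimal sketch of the edge-detection preprocessing idea — overlaying Canny edges on the input before it reaches the detector — is shown below; the thresholds and blending weights are illustrative, and the paper's exact pipeline may differ.

```python
import cv2

def edge_enhanced(path):
    """Overlay Canny edges on a grayscale shoeprint image to emphasize the
    boundary between print texture and ground before detection."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 50, 150)                 # illustrative thresholds
    return cv2.addWeighted(gray, 0.7, edges, 0.3, 0) # blend edges into image
```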
Affiliation(s)
- Chengran Li
- School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1010, New Zealand
- Ajit Narayanan
- School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1010, New Zealand
- Akbar Ghobakhlou
- School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1010, New Zealand
11. Bhave S, Rodriguez V, Poterucha T, Mutasa S, Aberle D, Capaccione KM, Chen Y, Dsouza B, Dumeer S, Goldstein J, Hodes A, Leb J, Lungren M, Miller M, Monoky D, Navot B, Wattamwar K, Wattamwar A, Clerkin K, Ouyang D, Ashley E, Topkara VK, Maurer M, Einstein AJ, Uriel N, Homma S, Schwartz A, Jaramillo D, Perotte AJ, Elias P. Deep learning to detect left ventricular structural abnormalities in chest X-rays. Eur Heart J 2024; 45:2002-2012. [PMID: 38503537 PMCID: PMC11156488 DOI: 10.1093/eurheartj/ehad782]
Abstract
BACKGROUND AND AIMS Early identification of cardiac structural abnormalities indicative of heart failure is crucial to improving patient outcomes. Chest X-rays (CXRs) are routinely conducted on a broad population of patients, presenting an opportunity to build scalable screening tools for structural abnormalities indicative of Stage B or worse heart failure with deep learning methods. In this study, a model was developed to identify severe left ventricular hypertrophy (SLVH) and dilated left ventricle (DLV) using CXRs. METHODS A total of 71 589 unique CXRs from 24 689 different patients completed within 1 year of echocardiograms were identified. Labels for SLVH, DLV, and a composite label indicating the presence of either were extracted from echocardiograms. A deep learning model was developed and evaluated using area under the receiver operating characteristic curve (AUROC). Performance was additionally validated on 8003 CXRs from an external site and compared against visual assessment by 15 board-certified radiologists. RESULTS The model yielded an AUROC of 0.79 (0.76-0.81) for SLVH, 0.80 (0.77-0.84) for DLV, and 0.80 (0.78-0.83) for the composite label, with similar performance on an external data set. The model outperformed all 15 individual radiologists for predicting the composite label and achieved a sensitivity of 71% vs. 66% against the consensus vote across all radiologists at a fixed specificity of 73%. CONCLUSIONS Deep learning analysis of CXRs can accurately detect the presence of certain structural abnormalities and may be useful in early identification of patients with LV hypertrophy and dilation. As a resource to promote further innovation, 71 589 CXRs with adjoining echocardiographic labels have been made publicly available.
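For context, the study's headline metric can be computed generically as below: AUROC with a bootstrap 95% CI. This is a standard evaluation sketch, not the authors' code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auroc_with_ci(y_true, y_score, n_boot=1000, seed=0):
    """Point AUROC plus a percentile-bootstrap 95% confidence interval."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    point = roc_auc_score(y_true, y_score)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if y_true[idx].min() == y_true[idx].max():  # resample needs both classes
            continue
        boots.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return point, (lo, hi)
```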
Affiliation(s)
- Shreyas Bhave
- Division of Cardiology and Department of Biomedical Informatics, Columbia University Irving Medical Center, 622 West 168th Street, PH20, New York, NY 10032, USA
- Victor Rodriguez
- Division of Cardiology and Department of Biomedical Informatics, Columbia University Irving Medical Center, 622 West 168th Street, PH20, New York, NY 10032, USA
- Timothy Poterucha
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Simukayi Mutasa
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Dwight Aberle
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Kathleen M Capaccione
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Yibo Chen
- Inova Fairfax Hospital Imaging Center, Inova Fairfax Medical Campus, Falls Church, VA, USA
- Belinda Dsouza
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Shifali Dumeer
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Jonathan Goldstein
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Aaron Hodes
- Hackensack Radiology Group, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Jay Leb
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Matthew Lungren
- Department of Radiology, University of California, San Francisco, CA, USA
- Mitchell Miller
- Hackensack Radiology Group, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- David Monoky
- Hackensack Radiology Group, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Benjamin Navot
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Kapil Wattamwar
- Division of Vascular and Interventional Radiology, Department of Radiology, Montefiore Medical Center, Bronx, NY, USA
- Anoop Wattamwar
- Hackensack Radiology Group, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Kevin Clerkin
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- David Ouyang
- Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Euan Ashley
- Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Palo Alto, CA, USA
- Veli K Topkara
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Mathew Maurer
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Andrew J Einstein
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Nir Uriel
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Shunichi Homma
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Allan Schwartz
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
- Diego Jaramillo
- Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- Adler J Perotte
- Division of Cardiology and Department of Biomedical Informatics, Columbia University Irving Medical Center, 622 West 168th Street, PH20, New York, NY 10032, USA
- Pierre Elias
- Division of Cardiology and Department of Biomedical Informatics, Columbia University Irving Medical Center, 622 West 168th Street, PH20, New York, NY 10032, USA
- Seymour, Paul, and Gloria Milstein Division of Cardiology, Department of Medicine, Columbia University Irving Medical Center, NewYork-Presbyterian Hospital, 630 West 168th Street, New York, NY 10032, USA
12. You J, Ajlouni S, Kakaletri I, Charalampaki P, Giannarou S. XRelevanceCAM: towards explainable tissue characterization with improved localisation of pathological structures in probe-based confocal laser endomicroscopy. Int J Comput Assist Radiol Surg 2024; 19:1061-1073. [PMID: 38538880 PMCID: PMC11178611 DOI: 10.1007/s11548-024-03096-0]
Abstract
PURPOSE Probe-based confocal laser endomicroscopy (pCLE) enables intraoperative tissue characterization with improved resection rates of brain tumours. Although a plethora of deep learning models have been developed for automating tissue characterization, their lack of transparency is a concern. To tackle this issue, techniques like Class Activation Map (CAM) and its variations highlight image regions related to model decisions. However, they often fall short of providing human-interpretable visual explanations for surgical decision support, primarily due to the shattered gradient problem or insufficient theoretical underpinning. METHODS In this paper, we introduce XRelevanceCAM, an explanation method rooted in a better backpropagation approach, incorporating the sensitivity and conservation axioms. This enhanced method offers a stronger theoretical foundation and effectively mitigates the shattered gradient issue compared with other CAM variants. RESULTS Qualitative and quantitative evaluations are based on ex vivo pCLE data of brain tumours. XRelevanceCAM effectively highlights clinically relevant areas that characterize the tissue type. Specifically, it yields a remarkable 56% improvement over our closest baseline, RelevanceCAM, in the network's shallowest layer as measured by the mean Intersection over Union (mIoU) metric based on ground-truth annotations (from 18% to 28.07%). Furthermore, a 6% improvement in mIoU is observed when generating the final saliency map from all network layers. CONCLUSION We introduce a new CAM variation, XRelevanceCAM, for precise identification of clinically important structures in pCLE data. This can aid intraoperative decision support in brain tumour resection surgery, as validated in our performance study.
Affiliation(s)
- Jianzhong You
- Department of Computing, Imperial College London, Huxley Building, 180 Queen's Gate, South Kensington, London, UK
- Serine Ajlouni
- Medical Faculty, University Witten Herdecke, 58455 Witten, Germany
- Irini Kakaletri
- Medical Faculty, Rheinische Friedrich Wilhelms University of Bonn, 53127 Bonn, Germany
- Patra Charalampaki
- Department of Neurosurgery, University Witten Herdecke, 58455 Witten, Germany
- Stamatia Giannarou
- Department of Surgery and Cancer, Imperial College London, 413, 4th Floor, Bessemer Building, South Kensington Campus, London, UK
13. Wang S, Sun M, Sun J, Wang Q, Wang G, Wang X, Meng X, Wang Z, Yu H. Advancing musculoskeletal tumor diagnosis: Automated segmentation and predictive classification using deep learning and radiomics. Comput Biol Med 2024; 175:108502. [PMID: 38678943 DOI: 10.1016/j.compbiomed.2024.108502]
Abstract
OBJECTIVES Musculoskeletal (MSK) tumors, given their high mortality rate and heterogeneity, necessitate precise examination and diagnosis to guide clinical treatment effectively. Magnetic resonance imaging (MRI) is pivotal in detecting MSK tumors, as it offers exceptional image contrast between bone and soft tissue. This study aims to improve both the speed of detection and the diagnostic accuracy for MSK tumors through automated segmentation and grading utilizing MRI. MATERIALS AND METHODS The research included 170 patients (mean age, 58 years ±12 [standard deviation]; 84 men) with MSK lesions who underwent MRI scans from April 2021 to May 2023. We propose a deep learning (DL) segmentation model, MSAPN, based on multi-scale attention and pixel-level reconstruction, and compare it with existing algorithms. Radiomic features were then extracted from the MSAPN-segmented lesions for benign versus malignant classification of tumors. RESULTS Compared to the most advanced segmentation algorithms, MSAPN demonstrates better performance. The Dice similarity coefficients (DSC) are 0.871 and 0.815 on the testing set and the independent validation set, respectively. The radiomics model for classifying benign and malignant lesions achieves an accuracy of 0.890. Moreover, there is no statistically significant difference between the radiomics models based on manual segmentation and on MSAPN segmentation. CONCLUSION This research contributes to the advancement of MSK tumor diagnosis through automated segmentation and predictive classification. The integration of DL algorithms and radiomics shows promising results, and the visualization analysis of feature maps enhances clinical interpretability.
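A hedged sketch of the radiomics stage: extracting features from a segmented lesion and fitting a benign-versus-malignant classifier. It assumes the pyradiomics and scikit-learn packages; the paths, extractor settings, and choice of random forest are illustrative, not the authors' exact pipeline.

```python
# Assumes pyradiomics and scikit-learn; settings and classifier are illustrative.
from radiomics import featureextractor
from sklearn.ensemble import RandomForestClassifier

extractor = featureextractor.RadiomicsFeatureExtractor()

def lesion_features(image_path, mask_path):
    """Scalar radiomic features for one MRI volume + lesion mask (e.g., NIfTI)."""
    result = extractor.execute(image_path, mask_path)
    # Keep computed features; skip the "diagnostics_*" metadata entries.
    return [float(v) for k, v in result.items() if k.startswith("original_")]

# X: rows of lesion_features(...) over the cohort; y: 0 = benign, 1 = malignant.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
# clf.fit(X_train, y_train); accuracy = clf.score(X_test, y_test)
```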
Affiliation(s)
- Shuo Wang
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China; State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, 300072, China
- Man Sun
- Radiology Department, Tianjin University Tianjin Hospital, Tianjin, 300299, China
- Jinglai Sun
- The School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin, 300072, China
- Qingsong Wang
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China
- Guangpu Wang
- The School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin, 300072, China
- Xiaolin Wang
- The School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin, 300072, China
- Xianghong Meng
- Radiology Department, Tianjin University Tianjin Hospital, Tianjin, 300299, China
- Zhi Wang
- Radiology Department, Tianjin University Tianjin Hospital, Tianjin, 300299, China
- Hui Yu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China; State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, 300072, China; The School of Precision Instrument and Opto-Electronics Engineering, Tianjin University, Tianjin, 300072, China
14. Rao S, Böhle M, Schiele B. Better Understanding Differences in Attribution Methods via Systematic Evaluations. IEEE Trans Pattern Anal Mach Intell 2024; 46:4090-4101. [PMID: 38215324 DOI: 10.1109/tpami.2024.3353528]
Abstract
Deep neural networks are very successful on many vision tasks, but hard to interpret due to their black box nature. To overcome this, various post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions. Evaluating such methods is challenging since no ground truth attributions exist. We thus propose three novel evaluation schemes to more reliably measure the faithfulness of those methods, to make comparisons between them more fair, and to make visual inspection more systematic. To address faithfulness, we propose a novel evaluation setting (DiFull) in which we carefully control which parts of the input can influence the output in order to distinguish possible from impossible attributions. To address fairness, we note that different methods are applied at different layers, which skews any comparison, and so evaluate all methods on the same layers (ML-Att) and discuss how this impacts their performance on quantitative metrics. For more systematic visualizations, we propose a scheme (AggAtt) to qualitatively evaluate the methods on complete datasets. We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models. Finally, we propose a post-processing smoothing step that significantly improves the performance of some attribution methods, and discuss its applicability.
15. Odusami M, Maskeliūnas R, Damaševičius R, Misra S. Machine learning with multimodal neuroimaging data to classify stages of Alzheimer's disease: a systematic review and meta-analysis. Cogn Neurodyn 2024; 18:775-794. [PMID: 38826669 PMCID: PMC11143094 DOI: 10.1007/s11571-023-09993-5]
Abstract
In recent years, Alzheimer's disease (AD) has been a serious threat to human health. Researchers and clinicians alike encounter a significant obstacle when trying to accurately identify and classify AD stages. Several studies have shown that multimodal neuroimaging input can provide valuable insights into the structural and functional changes in the brain related to AD. Machine learning (ML) algorithms can accurately categorize AD phases by identifying patterns and linkages in multimodal neuroimaging data using powerful computational methods. This study aims to assess the contribution of ML methods to the accurate classification of the stages of AD using multimodal neuroimaging data. A systematic search was carried out in the IEEE Xplore, Science Direct/Elsevier, ACM Digital Library, and PubMed databases, with forward snowballing performed on Google Scholar. The quantitative analysis used 47 studies. The explainable analysis was performed on the classification algorithms and fusion methods used in the selected studies. Pooled sensitivity and specificity, including diagnostic efficiency, were evaluated by conducting a meta-analysis based on a bivariate model with the hierarchical summary receiver operating characteristic (HSROC) curve of multimodal neuroimaging data and ML methods in the classification of AD stages. Pooled sensitivity was 83.77% (95% CI 78.87-87.71%) for distinguishing mild cognitive impairment (MCI) from healthy controls (NC), 94.60% (90.76-96.89%) for AD versus NC, 80.41% (74.73-85.06%) for progressive MCI (pMCI) versus stable MCI (sMCI), and 86.63% (82.43-89.95%) for early MCI (EMCI) versus NC. Pooled specificity was 79.16% (70.97-87.71%) for MCI versus NC, 93.49% (91.60-94.90%) for AD versus NC, 81.44% (76.32-85.66%) for pMCI versus sMCI, and 85.68% (81.62-88.96%) for EMCI versus NC. The Wilcoxon signed-rank test, used to statistically compare the accuracy scores of the existing models, showed a low P-value across all classification tasks. Multimodal neuroimaging data with ML is a promising avenue for classifying the stages of AD, but more research is required to increase the validity of its application in clinical practice.
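For intuition, pooling per-study sensitivities can be sketched with a simple fixed-effect, inverse-variance model on the logit scale, as below; note the paper itself fits a bivariate hierarchical (HSROC) model, which this simplification does not reproduce.

```python
import numpy as np

def pooled_sensitivity(tp, fn):
    """Fixed-effect inverse-variance pooling of per-study sensitivities on
    the logit scale, with a 95% CI. A didactic simplification only."""
    tp, fn = np.asarray(tp, float), np.asarray(fn, float)
    logit = np.log(tp / fn)            # log-odds of sensitivity = log(tp/fn)
    var = 1.0 / tp + 1.0 / fn          # delta-method variance per study
    w = 1.0 / var
    pooled = (w * logit).sum() / w.sum()
    se = np.sqrt(1.0 / w.sum())
    expit = lambda x: 1.0 / (1.0 + np.exp(-x))
    return expit(pooled), expit(pooled - 1.96 * se), expit(pooled + 1.96 * se)

# Example: three hypothetical studies (true positives, false negatives).
print(pooled_sensitivity([45, 80, 120], [10, 15, 30]))
```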
Affiliation(s)
- Modupe Odusami
- Department of Multimedia Engineering, Kaunas University of Technology, Kaunas, Lithuania
- Rytis Maskeliūnas
- Department of Multimedia Engineering, Kaunas University of Technology, Kaunas, Lithuania
- Sanjay Misra
- Department of Applied Data Science, Institute for Energy Technology, Halden, Norway
16. Song B, Yoshida S. Explainability of three-dimensional convolutional neural networks for functional magnetic resonance imaging of Alzheimer's disease classification based on gradient-weighted class activation mapping. PLoS One 2024; 19:e0303278. [PMID: 38771733 PMCID: PMC11108152 DOI: 10.1371/journal.pone.0303278]
Abstract
Currently, numerous studies focus on employing fMRI-based deep neural networks to diagnose neurological disorders such as Alzheimer's disease (AD), yet only a handful have provided results regarding explainability. We address this gap by applying several prevalent explainability methods, such as gradient-weighted class activation mapping (Grad-CAM), to an fMRI-based 3D-VGG16 network for AD diagnosis to improve the model's explainability. The aim is to explore which specific regions of interest (ROIs) of the brain the model primarily focuses on when making predictions, and whether these ROIs differ between AD subjects and normal controls (NCs). First, we utilized multiple resting-state functional activity maps, including ALFF, fALFF, ReHo, and VMHC, to reduce the complexity of the fMRI data, in contrast to many studies that use raw fMRI data. Compared to methods based on raw fMRI data, this manual feature extraction approach may alleviate the model's burden. Subsequently, a 3D-VGG16 was employed for AD classification, with the final fully connected layers replaced by a Global Average Pooling (GAP) layer to mitigate overfitting while preserving spatial information in the feature maps. The model achieved a maximum of 96.4% accuracy on the test set. Finally, several 3D CAM methods were employed to interpret the models. In the explainability results of the models with relatively high accuracy, the highlighted ROIs were primarily located in the precuneus and the hippocampus for AD subjects, while the models focused on the entire brain for NCs. This supports current research on the ROIs involved in AD. We believe that explaining deep learning models will not only support existing research on brain disorders but also offer important reference recommendations for the study of currently unknown etiologies.
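The fully-connected-to-GAP replacement the abstract describes can be illustrated with a toy 3-D CNN, since no standard 3D-VGG16 ships with torchvision; this small network is not the authors' model.

```python
import torch
import torch.nn as nn

class Small3DCNNWithGAP(nn.Module):
    """Toy 3-D CNN illustrating a GAP head in place of fully connected
    layers: spatial feature maps are preserved (useful for CAM methods),
    then averaged to a single vector before one linear classifier."""
    def __init__(self, in_channels=1, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
        )
        self.gap = nn.AdaptiveAvgPool3d(1)   # global average pooling
        self.fc = nn.Linear(64, n_classes)   # single linear layer after GAP

    def forward(self, x):                    # x: (batch, 1, D, H, W)
        f = self.features(x)
        return self.fc(self.gap(f).flatten(1))
```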
Affiliation(s)
- Boyue Song
- Graduate School of Engineering, Kochi University of Technology, Kami City, Kochi Prefecture, Japan
- Shinichi Yoshida
- School of Information, Kochi University of Technology, Kami City, Kochi Prefecture, Japan
17. Fan Y, Li Q, Mao H, Jiang F. Magnetoencephalography Decoding Transfer Approach: From Deep Learning Models to Intrinsically Interpretable Models. IEEE J Biomed Health Inform 2024; 28:2818-2829. [PMID: 38349827 DOI: 10.1109/jbhi.2024.3365051]
Abstract
When decoding neuroelectrophysiological signals represented by Magnetoencephalography (MEG), deep learning models generally achieve high predictive performance but lack the ability to interpret their predicted results. This limitation prevents them from meeting the essential requirements of reliability and ethical-legal considerations in practical applications. In contrast, intrinsically interpretable models, such as decision trees, possess self-evident interpretability while typically sacrificing accuracy. To effectively combine the respective advantages of both deep learning and intrinsically interpretable models, an MEG transfer approach through feature attribution-based knowledge distillation is pioneered, which transforms deep models (teacher) into highly accurate intrinsically interpretable models (student). The resulting models provide not only intrinsic interpretability but also high predictive performance, besides serving as an excellent approximate proxy to understand the inner workings of deep models. In the proposed approach, post-hoc feature knowledge derived from post-hoc interpretable algorithms, specifically feature attribution maps, is introduced into knowledge distillation for the first time. By guiding intrinsically interpretable models to assimilate this knowledge, the transfer of MEG decoding information from deep models to intrinsically interpretable models is implemented. Experimental results demonstrate that the proposed approach outperforms the benchmark knowledge distillation algorithms. This approach successfully improves the prediction accuracy of Soft Decision Tree by a maximum of 8.28%, reaching almost equivalent or even superior performance to deep teacher models. Furthermore, the model-agnostic nature of this approach offers broad application potential.
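A hedged sketch of the general idea — a knowledge-distillation loss augmented with a feature-attribution matching term: the temperature, loss weights, and the MSE attribution term are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def attribution_distillation_loss(student_logits, teacher_logits,
                                  student_attr, teacher_attr,
                                  labels, T=2.0, alpha=0.5, beta=0.1):
    """Soft-target KD plus a term aligning the student's attribution map
    with the teacher's. Weights alpha/beta and MSE are illustrative."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T   # temperature scaling
    attr = F.mse_loss(student_attr, teacher_attr)    # attribution matching
    return (1 - alpha) * hard + alpha * soft + beta * attr
```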
18. Niu Y, Ding M, Ge M, Karlsson R, Zhang Y, Carballo A, Takeda K. R-Cut: Enhancing Explainability in Vision Transformers with Relationship Weighted Out and Cut. Sensors (Basel) 2024; 24:2695. [PMID: 38732800 PMCID: PMC11085337 DOI: 10.3390/s24092695]
Abstract
Transformer-based models have gained popularity in natural language processing (NLP) and are extensively utilized in computer vision tasks and multi-modal models such as GPT-4. This paper presents a novel method to enhance the explainability of transformer-based image classification models. Our method aims to improve trust in classification results and empower users to gain a deeper understanding of the model for downstream tasks by providing visualizations of class-specific maps. We introduce two modules: the "Relationship Weighted Out" module, which extracts class-specific information from intermediate layers, enabling us to highlight relevant features, and the "Cut" module, which performs fine-grained feature decomposition, taking into account factors such as position, texture, and color. By integrating these modules, we generate dense class-specific visual explainability maps. We validate our method with extensive qualitative and quantitative experiments on the ImageNet dataset. Furthermore, we conduct a large number of experiments on the LRN dataset, which is specifically designed for automatic-driving danger alerts, to evaluate the explainability of our method in scenarios with complex backgrounds. The results demonstrate a significant improvement over previous methods. Moreover, we conduct ablation experiments that confirm the respective contribution of each module, solidifying the overall effectiveness of our proposed approach.
Affiliation(s)
- Yingjie Niu: Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan
- Ming Ding: Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan
- Maoning Ge: Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan
- Robin Karlsson: Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan
- Yuxiao Zhang: Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan
- Alexander Carballo: Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan; Graduate School of Engineering, Gifu University, Gifu 501-1112, Japan
- Kazuya Takeda: Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan; Tier IV Inc., Tokyo 140-0001, Japan
19
Huang C, Jiang Y, Yang X, Wei C, Chen H, Xiong W, Lin H, Wang X, Tian T, Tan H. Enhancing Retinal Fundus Image Quality Assessment With Swin-Transformer-Based Learning Across Multiple Color-Spaces. Transl Vis Sci Technol 2024; 13:8. [PMID: 38568606] [PMCID: PMC10996994] [DOI: 10.1167/tvst.13.4.8]
Abstract
Purpose The assessment of retinal image (RI) quality holds significant importance in both clinical trials and large datasets, because suboptimal images can conceal early signs of disease and thereby lead to inaccurate medical diagnoses. This study aims to develop an automatic method for Retinal Image Quality Assessment (RIQA) that incorporates visual explanations, comprehensively evaluating the quality of retinal fundus images. Methods We developed an automatic RIQA system, named Swin-MCSFNet, utilizing 28,792 RIs from the EyeQ dataset, 2,000 images from the EyePACS dataset, and an additional 1,000 images from the OIA-ODIR dataset. After preprocessing, including cropping of black regions, data augmentation, and normalization, a Swin-MCSFNet classifier based on the Swin-Transformer with multiple color-space fusion was proposed to grade the quality of RIs. The generalizability of Swin-MCSFNet was validated across multiple data centers. Additionally, for enhanced interpretability, Score-CAM-generated heatmaps were applied to provide visual explanations. Results Experimental results reveal that the proposed Swin-MCSFNet achieves promising performance, yielding a micro-averaged area under the receiver operating characteristic curve (AUC) of 0.93 and per-class AUCs of 0.96, 0.81, and 0.96 for the "Good," "Usable," and "Reject" categories, respectively. These scores underscore the accuracy of Swin-MCSFNet in distinguishing among the three categories. Furthermore, heatmaps generated across different RIQA classification scores and various color spaces suggest that regions of the retinal images from multiple color spaces contribute significantly to the decision-making process of the Swin-MCSFNet classifier. Conclusions Our study demonstrates that the proposed Swin-MCSFNet outperforms other methods in experiments conducted on multiple datasets, as evidenced by the superior performance metrics and insightful Score-CAM heatmaps. Translational Relevance This study constructs a new retinal image quality evaluation system, which will support subsequent research on retinal images.
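Score-CAM, used here for the visual explanations, is gradient-free: each activation map of a chosen layer is normalized, used as a soft mask on the input, and weighted by the resulting increase in the target-class score. A minimal sketch under assumed shapes (single image, pre-extracted activations) follows; it illustrates the general technique, not the authors' code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_cam(model, activations, x, target_class):
    # activations: feature maps for input x from a chosen layer, shape (1, C, h, w).
    cam = torch.zeros(x.shape[-2:], device=x.device)
    base = F.softmax(model(x), dim=1)[0, target_class]
    for k in range(activations.shape[1]):
        a = activations[0, k]
        if a.max() == a.min():
            continue  # a flat channel carries no spatial information
        mask = (a - a.min()) / (a.max() - a.min())
        mask = F.interpolate(mask[None, None], size=x.shape[-2:],
                             mode="bilinear", align_corners=False)
        score = F.softmax(model(x * mask), dim=1)[0, target_class]
        cam += (score - base).clamp(min=0) * mask[0, 0]
    return cam / (cam.max() + 1e-8)  # normalized heatmap over the input grid
```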
Affiliation(s)
- Chengcheng Huang: Department of Preventive Medicine, Shantou University Medical College, Shantou, China
- Yukang Jiang: School of Mathematics, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Xiaochun Yang: The First People's Hospital of Yun Nan Province, Kunming, China
- Chiyu Wei: Department of Preventive Medicine, Shantou University Medical College, Shantou, China
- Hongyu Chen: Department of Optoelectronic Information Science and Engineering, Physical and Materials Science College, Guangzhou University, Guangzhou, China
- Weixue Xiong: Department of Preventive Medicine, Shantou University Medical College, Shantou, China
- Henghui Lin: Department of Preventive Medicine, Shantou University Medical College, Shantou, China
- Xueqin Wang: School of Management, University of Science and Technology of China, Hefei, Anhui, China
- Ting Tian: School of Mathematics, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Haizhu Tan: Department of Preventive Medicine, Shantou University Medical College, Shantou, China
20
Deng J, Heybati K, Shammas-Toma M. When vision meets reality: Exploring the clinical applicability of GPT-4 with vision. Clin Imaging 2024; 108:110101. [PMID: 38341880] [DOI: 10.1016/j.clinimag.2024.110101]
Affiliation(s)
- Jiawen Deng: Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, ON, Canada
- Kiyan Heybati: Mayo Clinic Alix School of Medicine, Mayo Clinic, Jacksonville, FL, USA
- Matthew Shammas-Toma: Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, ON, Canada
21
Hong SJ, Hou JU, Chung MJ, Kang SH, Shim BS, Lee SL, Park DH, Choi A, Oh JY, Lee KJ, Shin E, Cho E, Park SW. Convolutional neural network model for automatic recognition and classification of pancreatic cancer cell based on analysis of lipid droplet on unlabeled sample by 3D optical diffraction tomography. Comput Methods Programs Biomed 2024; 246:108041. [PMID: 38325025] [DOI: 10.1016/j.cmpb.2024.108041]
Abstract
INTRODUCTION Pancreatic cancer cells generally accumulate large numbers of lipid droplets (LDs), which regulate lipid storage. To promote rapid diagnosis, an automatic pancreatic cancer cell recognition system based on a deep convolutional neural network was proposed in this study, using quantitative images of LDs from stain-free cytologic samples obtained by optical diffraction tomography. METHODS We retrieved 3D refractive index tomograms and reconstructed 37 optical images per cell. From the four cell lines, the obtained fields were separated into training and test datasets with 10,397 and 3,478 images, respectively. Furthermore, we adopted several machine learning techniques based on a single image-based prediction model to improve the performance of the computer-aided diagnostic system. RESULTS Pancreatic cancer cells had a significantly lower total cell volume and dry mass than normal pancreatic cells and were accompanied by greater numbers of LDs. When evaluating multitask learning techniques utilizing the EfficientNet-b3 model through confusion matrices, the overall 2-category accuracy for cancer classification reached 96.7%, while the overall 4-category accuracy for individual cell line classification reached 96.2%. Furthermore, as we added the core techniques one by one, the overall performance of the proposed technique improved significantly, reaching an area under the curve (AUC) of 0.997 and an accuracy of 97.06%. Finally, the AUC reached 0.998 in the ablation study with the score fusion technique. DISCUSSION Our novel training strategy has significant potential for automating and accelerating the recognition of pancreatic cancer cells. In the near future, deep learning-embedded medical devices will replace laborious manual cytopathologic examination, offering sustainable economic potential.
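The multitask setup described in the results (a 2-category cancer decision plus a 4-category cell-line decision) is commonly implemented as one shared backbone with two classification heads trained jointly. A generic sketch, with the backbone, feature size, and equal loss weighting as assumptions rather than the paper's specification:

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHead(nn.Module):
    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone                    # shared feature extractor
        self.cancer_head = nn.Linear(feat_dim, 2)   # cancer vs. normal
        self.line_head = nn.Linear(feat_dim, 4)     # four cell lines

    def forward(self, x):
        z = self.backbone(x)
        return self.cancer_head(z), self.line_head(z)

def multitask_loss(cancer_logits, line_logits, y_cancer, y_line):
    # Equal task weighting is an assumption, not the paper's choice.
    return F.cross_entropy(cancer_logits, y_cancer) + \
           F.cross_entropy(line_logits, y_line)
```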
Affiliation(s)
- Seok Jin Hong: Department of Otolaryngology-Head and Neck Surgery, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jong-Uk Hou: School of Software, Hallym University, Chuncheon, Republic of Korea
- Moon Jae Chung: Division of Gastroenterology, Department of Internal Medicine, Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea
- Sung Hun Kang: Department of Otolaryngology-Head and Neck Surgery, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Bo-Seok Shim: School of Software, Hallym University, Chuncheon, Republic of Korea
- Seung-Lee Lee: School of Software, Hallym University, Chuncheon, Republic of Korea
- Da Hae Park: Division of Gastroenterology, Department of Internal Medicine, Hallym University Dongtan Sacred Heart Hospital, Hallym University College of Medicine, 7 Keunjaebong-gil, Hwaseong-si, Gyeonggi-do 18450, Republic of Korea
- Anna Choi: Division of Gastroenterology, Department of Internal Medicine, Hallym University Dongtan Sacred Heart Hospital, Hallym University College of Medicine, 7 Keunjaebong-gil, Hwaseong-si, Gyeonggi-do 18450, Republic of Korea
- Jae Yeon Oh: Hallym University College of Medicine, Chuncheon, Republic of Korea
- Kyong Joo Lee: Division of Gastroenterology, Department of Internal Medicine, Hallym University Dongtan Sacred Heart Hospital, Hallym University College of Medicine, 7 Keunjaebong-gil, Hwaseong-si, Gyeonggi-do 18450, Republic of Korea
- Eun Shin: Department of Pathology, Hallym University Dongtan Sacred Heart Hospital, Hallym University College of Medicine, Hwaseong, Republic of Korea
- Eunae Cho: Division of Gastroenterology, Department of Internal Medicine, Chonnam National University Hospital, Gwangju, Republic of Korea
- Se Woo Park: Division of Gastroenterology, Department of Internal Medicine, Hallym University Dongtan Sacred Heart Hospital, Hallym University College of Medicine, 7 Keunjaebong-gil, Hwaseong-si, Gyeonggi-do 18450, Republic of Korea
22
Famiglini L, Campagner A, Barandas M, La Maida GA, Gallazzi E, Cabitza F. Evidence-based XAI: An empirical approach to design more effective and explainable decision support systems. Comput Biol Med 2024; 170:108042. [PMID: 38308866] [DOI: 10.1016/j.compbiomed.2024.108042]
Abstract
This paper proposes a user study aimed at evaluating the impact of Class Activation Maps (CAMs) as an eXplainable AI (XAI) method in a radiological diagnostic task, the detection of thoracolumbar (TL) fractures on vertebral X-rays. In particular, we focus on two oft-neglected features of CAMs, granularity and coloring: which features the maps should highlight (lower-level vs. higher-level) and which coloring scheme they should adopt to best support the decision-making process, both in terms of diagnostic accuracy (effectiveness) and of user-centered dimensions such as perceived confidence and utility (satisfaction), depending on case complexity, AI accuracy, and user expertise. Our findings show that lower-level feature CAMs, which highlight more focused anatomical landmarks, are associated with higher diagnostic accuracy than higher-level feature CAMs, particularly among experienced physicians. Moreover, despite the intuitive appeal of semantic CAMs, traditionally colored CAMs consistently yielded higher diagnostic accuracy across all groups. Our results challenge some prevalent assumptions in the XAI field and emphasize the importance of adopting an evidence-based and human-centered approach to designing and evaluating AI- and XAI-assisted diagnostic tools. To this aim, the paper also proposes a hierarchy-of-evidence framework to help designers and practitioners choose the XAI solutions that optimize performance and satisfaction on the basis of the strongest evidence available, or to focus on the gaps in the literature that must be filled to move from opinionated, eminence-based research to research grounded in empirical evidence and end-user work and preferences.
Affiliation(s)
- Lorenzo Famiglini: Department of Computer Science, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Marilia Barandas: Associação Fraunhofer Portugal Research, Rua Alfredo Allen 455/461, Porto, Portugal
- Enrico Gallazzi: Istituto Ortopedico Gaetano Pini - ASST Pini-CTO, Milan, Italy
- Federico Cabitza: Department of Computer Science, Systems and Communication, University of Milano-Bicocca, Milan, Italy; IRCCS Istituto Ortopedico Galeazzi, Milan, Italy
23
Fuentes AM, Milligan K, Wiebe M, Narayan A, Lum JJ, Brolo AG, Andrews JL, Jirasek A. Stratification of tumour cell radiation response and metabolic signatures visualization with Raman spectroscopy and explainable convolutional neural network. Analyst 2024; 149:1645-1657. [PMID: 38312026] [DOI: 10.1039/d3an01797d]
Abstract
Reprogramming of cellular metabolism is a driving factor of tumour progression and radiation therapy resistance. Identifying biochemical signatures associated with tumour radioresistance may assist with the development of targeted treatment strategies to improve clinical outcomes. Raman spectroscopy (RS) can monitor post-irradiation biomolecular changes and signatures of radiation response in tumour cells in a label-free manner. Convolutional neural networks (CNNs) perform feature extraction directly from data in an end-to-end learning manner, with high classification performance, and recently developed CNN explainability techniques help visualize the critical discriminative features captured by the model. In this work, a CNN is developed to characterize tumour response to radiotherapy based on its degree of radioresistance. The model was trained to classify Raman spectra of three human tumour cell lines as radiosensitive (LNCaP) or radioresistant (MCF7, H460) over a range of treatment doses and data collection time points. Additionally, a method based on Gradient-Weighted Class Activation Mapping (Grad-CAM) was used to determine the response-specific salient Raman peaks influencing the CNN predictions. The CNN effectively classified the cell spectra, with accuracy, sensitivity, specificity, and F1 score exceeding 99.8%. Grad-CAM heatmaps of H460 and MCF7 cell spectra (radioresistant) exhibited high contributions from Raman bands tentatively assigned to glycogen, amino acids, and nucleic acids. Conversely, heatmaps of LNCaP cells (radiosensitive) revealed activations at lipid and phospholipid bands. Finally, Grad-CAM variable importance scores were derived for glycogen, asparagine, and phosphatidylcholine, and their trends over cell line, dose, and acquisition time agreed with previously established models. Thus, the CNN can accurately detect biomolecular differences in the Raman spectra of tumour cells of varying radiosensitivity without requiring manual feature extraction, and Grad-CAM may help identify metabolic signatures associated with the observed categories, offering the potential for automated clinical characterization of tumour radiation response.
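For a 1D spectral CNN like the one above, Grad-CAM reduces to channel-wise weights obtained by averaging gradients along the spectral axis. A self-contained sketch follows; the hook-based extraction and layer choice are generic illustrations, not the authors' exact pipeline.

```python
import torch
import torch.nn.functional as F

def grad_cam_1d(model, conv_layer, spectrum, target_class):
    # spectrum: (1, 1, L) input tensor; conv_layer: the last 1D conv layer.
    store = {}

    def fwd(module, inputs, output):
        store["a"] = output                       # activations, shape (1, C, l)

    def bwd(module, grad_in, grad_out):
        store["g"] = grad_out[0]                  # gradients w.r.t. activations

    h1 = conv_layer.register_forward_hook(fwd)
    h2 = conv_layer.register_full_backward_hook(bwd)
    logits = model(spectrum)
    model.zero_grad()
    logits[0, target_class].backward()
    h1.remove(); h2.remove()

    weights = store["g"].mean(dim=2, keepdim=True)                 # GAP over axis
    cam = F.relu((weights * store["a"]).sum(dim=1, keepdim=True))  # (1, 1, l)
    cam = F.interpolate(cam, size=spectrum.shape[-1], mode="linear",
                        align_corners=False).squeeze()
    return cam / (cam.max() + 1e-8)  # relevance per spectral position (Raman shift)
```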
Affiliation(s)
- Alejandra M Fuentes: Department of Physics, The University of British Columbia Okanagan Campus, Kelowna, Canada
- Kirsty Milligan: Department of Physics, The University of British Columbia Okanagan Campus, Kelowna, Canada
- Mitchell Wiebe: Department of Physics, The University of British Columbia Okanagan Campus, Kelowna, Canada
- Apurva Narayan: Department of Computer Science, Western University, London, Canada; Department of Computer Science, The University of British Columbia Okanagan Campus, Kelowna, Canada
- Julian J Lum: Department of Biochemistry and Microbiology, The University of Victoria, Victoria, Canada; Trev and Joyce Deeley Research Centre, BC Cancer, Victoria, Canada
- Alexandre G Brolo: Department of Chemistry, The University of Victoria, Victoria, Canada
- Jeffrey L Andrews: Department of Statistics, The University of British Columbia Okanagan Campus, Kelowna, Canada
- Andrew Jirasek: Department of Physics, The University of British Columbia Okanagan Campus, Kelowna, Canada
24
Yao Y, Yang J, Sun H, Kong H, Wang S, Xu K, Dai W, Jiang S, Bai Q, Xing S, Yuan J, Liu X, Lu F, Chen Z, Qu J, Su J. DeepGraFT: A novel semantic segmentation auxiliary ROI-based deep learning framework for effective fundus tessellation classification. Comput Biol Med 2024; 169:107881. [PMID: 38159401] [DOI: 10.1016/j.compbiomed.2023.107881]
Abstract
Fundus tessellation (FT) is a prevalent clinical feature associated with myopia and is implicated in the development of myopic maculopathy, which causes irreversible visual impairment. Accurate classification of FT in color fundus photographs can help predict disease progression and prognosis. However, the lack of precise detection and classification tools has created an unmet medical need, underscoring the importance of exploring the clinical utility of FT. To address this gap, we introduce an automatic FT grading system (called DeepGraFT) using classification-and-segmentation co-decision models by deep learning. ConvNeXt, utilizing transfer learning from pretrained ImageNet weights, was employed for the classification algorithm, aligned with a region of interest based on the ETDRS grading system to boost performance. A segmentation model was developed to detect FT regions, complementing the classification for improved grading accuracy. The training set of DeepGraFT was from our in-house cohort (MAGIC), and the validation sets consisted of the remaining part of the in-house cohort and an independent public cohort (UK Biobank). DeepGraFT demonstrated high performance in the training stage and achieved an impressive accuracy in the validation phase (in-house cohort: 86.85%; public cohort: 81.50%). Furthermore, our findings demonstrated that DeepGraFT surpasses machine learning-based classification models in FT classification, achieving a 5.57% increase in accuracy. Ablation analysis revealed that the introduced modules significantly enhanced classification effectiveness, elevating accuracy from 79.85% to 86.85%. Further analysis using the results provided by DeepGraFT unveiled a significant negative association between FT and spherical equivalent (SE) in the UK Biobank cohort. In conclusion, DeepGraFT accentuates the potential benefits of deep learning in automating the grading of FT and has potential utility as a clinical decision support tool for predicting the progression of pathological myopia.
Affiliation(s)
- Yinghao Yao: Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325011, Zhejiang, China; National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China
- Jiaying Yang: Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325011, Zhejiang, China; National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China
- Haojun Sun: Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325011, Zhejiang, China; National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China
- Hengte Kong: Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325011, Zhejiang, China; National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China
- Sheng Wang: Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325011, Zhejiang, China; National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China
- Ke Xu: National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China
- Wei Dai: National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China
- Siyi Jiang: Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325011, Zhejiang, China; National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China
- QingShi Bai: Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325011, Zhejiang, China; National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China
- Shilai Xing: Institute of PSI Genomics, Wenzhou Global Eye & Vision Innovation Center, Wenzhou, 325024, China
- Jian Yuan: National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China
- Xinting Liu: National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Fan Lu: Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325011, Zhejiang, China; National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Zhenhui Chen: National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Jia Qu: Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325011, Zhejiang, China; National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
- Jianzhong Su: Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Eye Hospital, Wenzhou Medical University, Wenzhou, 325011, Zhejiang, China; National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, China
25
Zhang J, Jia X, Zhou J, Zhang J, Hu J. Weakly Supervised Solar Panel Mapping via Uncertainty Adjusted Label Transition in Aerial Images. IEEE Trans Image Process 2024; 33:881-896. [PMID: 38064328] [DOI: 10.1109/tip.2023.3336170]
Abstract
This paper proposes a novel uncertainty-adjusted label transition (UALT) method for weakly supervised solar panel mapping (WS-SPM) in aerial images. In weakly supervised learning (WSL), the noisy nature of pseudo labels (PLs) often leads to poor model performance. To address this problem, we formulate the task as a label-noise learning problem and build a statistically consistent mapping model by estimating the instance-dependent transition matrix (IDTM). We propose to estimate the IDTM with a parameterized label transition network describing the relationship between the latent clean labels and the noisy PLs. A trace regularizer is employed to impose constraints on the form of the IDTM for stability. To further reduce the estimation difficulty of the IDTM, we incorporate uncertainty estimation to first improve the accuracy of noisy dataset distillation and then mitigate the negative impact of falsely distilled examples with an uncertainty-adjusted re-weighting strategy. Extensive experiments and ablation studies on two challenging aerial datasets support the validity of the proposed UALT.
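The core of such transition-matrix approaches is to train on noisy labels through the matrix: the network predicts latent clean-label posteriors, the IDTM maps them to noisy-label posteriors, and the likelihood is evaluated against the observed PLs. Below is a hedged sketch of that loss; the exact parameterization, and the sign and weight of the trace term in the paper, may differ from what is assumed here.

```python
import torch
import torch.nn.functional as F

def transition_loss(clean_logits, transition_logits, noisy_labels, lam=0.01):
    # clean_logits: (B, K) scores for the latent clean labels.
    # transition_logits: (B, K, K) per-instance matrix scores; row i models
    # P(noisy label = j | clean label = i, x) after a row-wise softmax.
    p_clean = F.softmax(clean_logits, dim=1)
    T = F.softmax(transition_logits, dim=2)                  # rows sum to 1
    p_noisy = torch.bmm(p_clean.unsqueeze(1), T).squeeze(1)  # (B, K)
    nll = F.nll_loss(torch.log(p_noisy + 1e-8), noisy_labels)
    trace = T.diagonal(dim1=1, dim2=2).sum(dim=1).mean()     # trace regularizer
    return nll + lam * trace
```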
26
Wang C, He N, Zhang Y, Li Y, Huang P, Liu Y, Jin Z, Cheng Z, Liu Y, Wang Y, Zhang C, Haacke EM, Chen S, Yan F, Yang G. Enhancing Nigrosome-1 Sign Identification via Interpretable AI using True Susceptibility Weighted Imaging. J Magn Reson Imaging 2024. [PMID: 38236577] [DOI: 10.1002/jmri.29245]
Abstract
BACKGROUND Nigrosome 1 (N1), the largest nigrosome region in the ventrolateral area of the substantia nigra pars compacta, is identifiable by the "N1 sign" in long echo time gradient echo MRI. The absence of the N1 sign is a vital diagnostic marker of Parkinson's disease (PD). However, the N1 sign is challenging to visualize and assess in clinical practice. PURPOSE To automatically detect the presence or absence of the N1 sign from true susceptibility weighted imaging (tSWI) by using a deep-learning method. STUDY TYPE Prospective. POPULATION/SUBJECTS 453 subjects (227 males, 226 females), including 225 PD patients, 120 healthy controls (HCs), and 108 patients with other movement disorders, were prospectively recruited and divided into training, validation, and test cohorts of 289, 73, and 91 cases, respectively. FIELD STRENGTH/SEQUENCE 3D gradient echo SWI sequence at 3T; 3D multiecho strategically acquired gradient echo imaging at 3T; NM-sensitive 3D gradient echo sequence with MTC pulse at 3T. ASSESSMENT A neuroradiologist with 5 years of experience manually delineated substantia nigra regions. Two raters with 2 and 36 years of experience assessed the N1 sign on tSWI, QSM with high-pass filter, and magnitude data combined with MTC data. We propose NINet, a neural model, for automatic N1 sign identification in tSWI images. STATISTICAL TESTS We compared the performance of NINet to the subjective reference standard using receiver operating characteristic (ROC) analyses, and a decision curve analysis assessed identification accuracy. RESULTS NINet achieved an area under the curve (AUC) of 0.87 (CI: 0.76-0.89) in N1 sign identification, surpassing other models and neuroradiologists. NINet localized the putative N1 sign within tSWI images with 67.3% accuracy. DATA CONCLUSION The capability of the proposed NINet model to determine the presence or absence of the N1 sign, along with its localization, holds promise for enhancing diagnostic accuracy when evaluating PD with MR images. LEVEL OF EVIDENCE 2. TECHNICAL EFFICACY Stage 1.
Affiliation(s)
- Chenglong Wang: Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, China
- Naying He: Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Youmin Zhang: Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yan Li: Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Pei Huang: Department of Neurology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yu Liu: Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zhijia Jin: Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zenghui Cheng: Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yun Liu: Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, China
- Yida Wang: Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, China
- Chengxiu Zhang: Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, China
- E Mark Haacke: Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Department of Biomedical Engineering, Wayne State University, Detroit, Michigan, USA
- Shengdi Chen: Department of Neurology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Fuhua Yan: Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Guang Yang: Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, China
27
Herr J, Stoyanova R, Mellon EA. Convolutional Neural Networks for Glioma Segmentation and Prognosis: A Systematic Review. Crit Rev Oncog 2024; 29:33-65. [PMID: 38683153] [DOI: 10.1615/critrevoncog.2023050852]
Abstract
Deep learning (DL) is poised to redefine the way medical images are processed and analyzed. Convolutional neural networks (CNNs), a specific type of DL architecture, are exceptional for high-throughput processing, allowing for the effective extraction of relevant diagnostic patterns from large volumes of complex visual data. This technology has garnered substantial interest in the field of neuro-oncology as a promising tool to enhance medical imaging throughput and analysis. A multitude of methods harnessing MRI-based CNNs have been proposed for brain tumor segmentation, classification, and prognosis prediction. They are often applied to gliomas, the most common primary brain cancer, to classify subtypes with the goal of guiding therapy decisions. Additionally, the difficulty of repeating brain biopsies to evaluate treatment response, in the setting of often confusing imaging findings, provides a unique niche in which CNNs can help characterize treatment response in gliomas. For example, glioblastoma, the most aggressive type of brain cancer, can grow due to poor treatment response, can appear to grow acutely due to treatment-related inflammation as the tumor dies (pseudo-progression), or can falsely appear to be regrowing after treatment as a result of brain damage from radiation (radiation necrosis). CNNs are being applied to resolve this diagnostic dilemma. This review provides a detailed synthesis of recent DL methods and applications for intratumor segmentation, glioma classification, and prognosis prediction. Furthermore, it discusses the future direction of MRI-based CNNs in neuro-oncology and challenges in model interpretability, data availability, and computational efficiency.
Affiliation(s)
- Radka Stoyanova: Department of Radiation Oncology, University of Miami Miller School of Medicine, Sylvester Comprehensive Cancer Center, Miami, FL 33136, USA
- Eric Albert Mellon: Department of Radiation Oncology, University of Miami Miller School of Medicine, Sylvester Comprehensive Cancer Center, Miami, FL 33136, USA
28
Zhang C, Dong K, Aihara K, Chen L, Zhang S. STAMarker: determining spatial domain-specific variable genes with saliency maps in deep learning. Nucleic Acids Res 2023; 51:e103. [PMID: 37811885] [PMCID: PMC10639070] [DOI: 10.1093/nar/gkad801]
Abstract
Spatial transcriptomics characterizes gene expression profiles while retaining the information of the spatial context, providing an unprecedented opportunity to understand cellular systems. One of the essential tasks in such data analysis is to determine spatially variable genes (SVGs), which demonstrate spatial expression patterns. Existing methods only consider genes individually and fail to model the inter-dependence of genes. To this end, we present an analytic tool, STAMarker, for robustly determining spatial domain-specific SVGs with saliency maps in deep learning. STAMarker is a three-stage ensemble framework consisting of graph-attention autoencoders, multilayer perceptron (MLP) classifiers, and saliency map computation by the backpropagated gradient. We illustrate the effectiveness of STAMarker and compare it with several commonly used competing methods on various spatial transcriptomic data generated by different platforms. STAMarker considers all genes at once and is more robust when the dataset is very sparse. STAMarker can identify spatial domain-specific SVGs for characterizing spatial domains and enable in-depth analysis of the region of interest in the tissue section.
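The saliency computation in the third stage is ordinary backpropagated input gradients: the gradient of a spatial domain's classification score with respect to each gene's expression ranks genes by influence. A minimal sketch, with the classifier interface and tensor layout as assumptions:

```python
import torch

def gene_saliency(classifier, expression, domain_idx):
    # expression: (n_spots, n_genes) tensor; classifier maps it to domain logits.
    expression = expression.clone().requires_grad_(True)
    score = classifier(expression)[:, domain_idx].sum()
    score.backward()
    # Per-gene importance for this spatial domain, averaged over spots.
    return expression.grad.abs().mean(dim=0)
```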
Affiliation(s)
- Chihao Zhang: NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Kangning Dong: NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Kazuyuki Aihara: International Research Center for Neurointelligence, The University of Tokyo Institutes for Advanced Study, The University of Tokyo, Tokyo 113-0033, Japan
- Luonan Chen: Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China; Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China; Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong 519031, China
- Shihua Zhang: NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
29
Sujatha Ravindran A, Contreras-Vidal J. An empirical comparison of deep learning explainability approaches for EEG using simulated ground truth. Sci Rep 2023; 13:17709. [PMID: 37853010] [PMCID: PMC10584975] [DOI: 10.1038/s41598-023-43871-8]
Abstract
Recent advancements in machine learning and deep learning (DL) based neural decoders have significantly improved decoding capabilities using scalp electroencephalography (EEG). However, the interpretability of DL models remains an under-explored area. In this study, we compared multiple model explanation methods to identify the most suitable method for EEG and understand when some of these approaches might fail. A simulation framework was developed to evaluate the robustness and sensitivity of twelve back-propagation-based visualization methods by comparing to ground truth features. Multiple methods tested here showed reliability issues after randomizing either model weights or labels: e.g., the saliency approach, which is the most used visualization technique in EEG, was not class or model-specific. We found that DeepLift was consistently accurate as well as robust to detect the three key attributes tested here (temporal, spatial, and spectral precision). Overall, this study provides a review of model explanation methods for DL-based neural decoders and recommendations to understand when some of these methods fail and what they can capture in EEG.
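The weight- and label-randomization tests mentioned above follow the familiar sanity-check recipe: re-initialize layers starting from the output and check whether the explanation decorrelates from the original; if it does not, the method is not model-specific. A generic sketch, where explain_fn is any assumed function mapping a model, input, and target to an attribution tensor:

```python
import copy
import torch

def cascading_randomization(model, explain_fn, x, target):
    baseline = explain_fn(model, x, target).flatten()
    randomized = copy.deepcopy(model)
    results = []
    # Walk modules in reverse registration order (roughly output to input),
    # destroying weights cumulatively as we go.
    for name, module in reversed(list(randomized.named_modules())):
        if hasattr(module, "reset_parameters"):
            module.reset_parameters()
            m = explain_fn(randomized, x, target).flatten()
            corr = torch.corrcoef(torch.stack([baseline, m]))[0, 1].item()
            results.append((name, corr))  # low corr -> model-specific explanation
    return results
```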
Affiliation(s)
- Akshay Sujatha Ravindran: Noninvasive Brain-Machine Interface System Laboratory, Department of Electrical and Computer Engineering, University of Houston, Houston, 77204, USA; IUCRC BRAIN, University of Houston, Houston, 77204, USA; Alto Neuroscience, Los Altos, CA, 94022, USA
- Jose Contreras-Vidal: Noninvasive Brain-Machine Interface System Laboratory, Department of Electrical and Computer Engineering, University of Houston, Houston, 77204, USA; IUCRC BRAIN, University of Houston, Houston, 77204, USA
30
Szczepankiewicz K, Popowicz A, Charkiewicz K, Nałęcz-Charkiewicz K, Szczepankiewicz M, Lasota S, Zawistowski P, Radlak K. Ground truth based comparison of saliency maps algorithms. Sci Rep 2023; 13:16887. [PMID: 37803108] [PMCID: PMC10558518] [DOI: 10.1038/s41598-023-42946-w]
Abstract
Deep neural networks (DNNs) have achieved outstanding results in domains such as image processing, computer vision, natural language processing and bioinformatics. In recent years, many methods have been proposed that can provide a visual explanation of decision made by such classifiers. Saliency maps are probably the most popular. However, it is still unclear how to properly interpret saliency maps for a given image and which techniques perform most accurately. This paper presents a methodology to practically evaluate the real effectiveness of saliency map generation methods. We used three state-of-the-art network architectures along with specially prepared benchmark datasets, and we proposed a novel metric to provide a quantitative comparison of the methods. The comparison identified the most reliable techniques and the solutions which usually failed in our tests.
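One simple ground-truth-based score of the kind such a benchmark needs is the fraction of saliency mass falling inside the known object mask. The paper's own metric is not reproduced here; this is an illustrative stand-in:

```python
import numpy as np

def saliency_precision(saliency, gt_mask):
    # saliency: (H, W) attribution map; gt_mask: (H, W) boolean object mask.
    s = np.clip(saliency, 0.0, None)   # keep only positive evidence
    return float(s[gt_mask.astype(bool)].sum() / (s.sum() + 1e-8))
```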
Affiliation(s)
- Adam Popowicz: Department of Electronics, Electrical Engineering and Microelectronics, Silesian University of Technology, Akademicka 16, Gliwice, Poland
- Sławomir Lasota: Department of Electronics, Electrical Engineering and Microelectronics, Silesian University of Technology, Akademicka 16, Gliwice, Poland
- Paweł Zawistowski: Institute of Computer Science, Warsaw University of Technology, Pl. Politechniki 1, Warsaw, Poland
- Krystian Radlak: Institute of Computer Science, Warsaw University of Technology, Pl. Politechniki 1, Warsaw, Poland
31
Zheng Y, Huang D, Hao X, Wei J, Lu H, Liu Y. UniVisNet: A Unified Visualization and Classification Network for accurate grading of gliomas from MRI. Comput Biol Med 2023; 165:107332. [PMID: 37598632] [DOI: 10.1016/j.compbiomed.2023.107332]
Abstract
Accurate grading of brain tumors plays a crucial role in the diagnosis and treatment of glioma. While convolutional neural networks (CNNs) have shown promising performance in this task, their clinical applicability is still constrained by the interpretability and robustness of the models. In the conventional framework, the classification model is trained first, and then visual explanations are generated. However, this approach often leads to models that prioritize classification performance or complexity, making it difficult to achieve a precise visual explanation. Motivated by these challenges, we propose the Unified Visualization and Classification Network (UniVisNet), a novel framework that aims to improve both the classification performance and the generation of high-resolution visual explanations. UniVisNet addresses attention misalignment by introducing a subregion-based attention mechanism, which replaces traditional down-sampling operations. Additionally, multiscale feature maps are fused to achieve higher resolution, enabling the generation of detailed visual explanations. To streamline the process, we introduce the Unified Visualization and Classification head (UniVisHead), which directly generates visual explanations without the need for additional separation steps. Through extensive experiments, our proposed UniVisNet consistently outperforms strong baseline classification models and prevalent visualization methods. Notably, UniVisNet achieves remarkable results on the glioma grading task, including an AUC of 94.7%, an accuracy of 89.3%, a sensitivity of 90.4%, and a specificity of 85.3%. Moreover, UniVisNet provides visually interpretable explanations that surpass existing approaches. In conclusion, UniVisNet innovatively generates visual explanations in brain tumor grading by simultaneously improving the classification performance and generating high-resolution visual explanations. This work contributes to the clinical application of deep learning, empowering clinicians with comprehensive insights into the spatial heterogeneity of glioma.
Affiliation(s)
- Yao Zheng: Air Force Medical University, No. 169 Changle West Road, Xi'an, 710032, Shaanxi, China
- Dong Huang: Air Force Medical University, No. 169 Changle West Road, Xi'an, 710032, Shaanxi, China; Shaanxi Provincial Key Laboratory of Bioelectromagnetic Detection and Intelligent Perception, No. 169 Changle West Road, Xi'an, 710032, Shaanxi, China
- Xiaoshuo Hao: Air Force Medical University, No. 169 Changle West Road, Xi'an, 710032, Shaanxi, China
- Jie Wei: Air Force Medical University, No. 169 Changle West Road, Xi'an, 710032, Shaanxi, China
- Hongbing Lu: Air Force Medical University, No. 169 Changle West Road, Xi'an, 710032, Shaanxi, China; Shaanxi Provincial Key Laboratory of Bioelectromagnetic Detection and Intelligent Perception, No. 169 Changle West Road, Xi'an, 710032, Shaanxi, China
- Yang Liu: Air Force Medical University, No. 169 Changle West Road, Xi'an, 710032, Shaanxi, China; Shaanxi Provincial Key Laboratory of Bioelectromagnetic Detection and Intelligent Perception, No. 169 Changle West Road, Xi'an, 710032, Shaanxi, China
32
Baraheem SS, Nguyen TV. AI vs. AI: Can AI Detect AI-Generated Images? J Imaging 2023; 9:199. [PMID: 37888306] [PMCID: PMC10607823] [DOI: 10.3390/jimaging9100199]
Abstract
The proliferation of Artificial Intelligence (AI) models such as Generative Adversarial Networks (GANs) has shown impressive success in image synthesis. GAN-synthesized images have spread widely over the Internet with the advancement in generating naturalistic and photo-realistic images. This can improve content and media; however, it also constitutes a threat to legitimacy, authenticity, and security. Moreover, an automated system able to detect and recognize GAN-generated images is significant for image synthesis models as an evaluation tool, regardless of the input modality. To this end, we propose a framework for reliably detecting AI-generated images from real ones with Convolutional Neural Networks (CNNs). First, GAN-generated images were collected from different tasks and different architectures to help with generalization. Then, transfer learning was applied. Finally, several Class Activation Maps (CAMs) were integrated to determine the discriminative regions that guided the classification model in its decision. Our approach achieved 100% accuracy on our dataset, i.e., Real or Synthetic Images (RSI), and superior accuracy on other datasets and configurations; hence, it can be used as an evaluation tool in image generation. Our best detector was a pre-trained EfficientNetB4 fine-tuned on our dataset with a batch size of 64 and an initial learning rate of 0.001 for 20 epochs. Adam was used as the optimizer, and learning rate reduction along with data augmentation were incorporated.
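The reported recipe (pre-trained EfficientNetB4, batch size 64, Adam at 0.001 for 20 epochs with learning-rate reduction) translates directly into a standard fine-tuning loop. A sketch with torchvision; the data loader, validation function, and augmentation pipeline are assumed names, not specified by the paper:

```python
import torch
from torch import nn, optim
from torchvision import models

model = models.efficientnet_b4(weights=models.EfficientNet_B4_Weights.IMAGENET1K_V1)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 2)  # real vs. synthetic

optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=2)
criterion = nn.CrossEntropyLoss()

for epoch in range(20):
    model.train()
    for images, labels in train_loader:   # train_loader: assumed DataLoader over RSI
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step(validate(model))       # validate(): assumed to return val loss
```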
Affiliation(s)
- Samah S. Baraheem: Department of Computer Science, Umm Al-Qura University, Prince Sultan Bin Abdulaziz Road, Mecca 21421, Makkah, Saudi Arabia; Department of Computer Science, University of Dayton, Dayton, OH 45469, USA
- Tam V. Nguyen: Department of Computer Science, University of Dayton, Dayton, OH 45469, USA
33
Lei Y, Wang T, Roper J, Tian S, Patel P, Bradley JD, Jani AB, Liu T, Yang X. Automatic segmentation of neurovascular bundle on MRI using deep learning based topological modulated network. Med Phys 2023; 50:5479-5488. [PMID: 36939189] [PMCID: PMC10509305] [DOI: 10.1002/mp.16378]
Abstract
PURPOSE Radiation damage to neurovascular bundles (NVBs) may be the cause of sexual dysfunction after radiotherapy for prostate cancer. However, it is challenging to delineate NVBs as organs-at-risk from planning CTs during radiotherapy. Recently, the integration of MR into radiotherapy has made NVB contour delineation possible. In this study, we aim to develop an MRI-based deep learning method for automatic NVB segmentation. METHODS The proposed method, named topological modulated network, consists of three subnetworks: a focal modulation, a hierarchical block, and a topological fully convolutional network (FCN). The focal modulation is used to derive the locations and bounds of the left and right NVBs, namely the candidate volumes-of-interest (VOIs). The hierarchical block aims to highlight NVB boundary information on the derived feature maps. The topological FCN then segments the NVBs inside the VOIs by considering the topological consistency inherent in vascular delineation. Based on the location information of the candidate VOIs, the NVB segmentations can then be brought back to the coordinate system of the input MRI. RESULTS A five-fold cross-validation study was performed on 60 patient cases to evaluate the performance of the proposed method. The segmented results were compared with manual contours. The Dice similarity coefficient (DSC) and 95th percentile Hausdorff distance (HD95) are 0.81 ± 0.10 and 1.49 ± 0.88 mm for the left NVB, and 0.80 ± 0.15 and 1.54 ± 1.22 mm for the right NVB, respectively. CONCLUSION We proposed a novel deep learning-based segmentation method for NVBs on pelvic MR images. The good agreement of our method with the manually drawn ground truth contours supports the feasibility of the proposed method, which could potentially be used to spare NVBs during proton and photon radiotherapy and thereby improve the quality of life of prostate cancer patients.
Affiliation(s)
- Yang Lei: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
- Tonghe Wang: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA; Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, USA
- Justin Roper: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
- Sibo Tian: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
- Pretesh Patel: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
- Jeffrey D Bradley: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
- Ashesh B Jani: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
- Tian Liu: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA; Department of Radiation Oncology, Icahn School of Medicine at Mount Sinai, New York, USA
- Xiaofeng Yang: Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA
34
Sun R, Wei C, Jiang Z, Huang G, Xie Y, Nie S. Weakly Supervised Breast Lesion Detection in Dynamic Contrast-Enhanced MRI. J Digit Imaging 2023; 36:1553-1564. [PMID: 37253896] [PMCID: PMC10406986] [DOI: 10.1007/s10278-023-00846-5]
Abstract
Currently, obtaining accurate medical annotations requires great labor and time effort, which largely limits the development of supervised learning-based tumor detection tasks. In this work, we investigated a weakly supervised learning model for detecting breast lesions in dynamic contrast-enhanced MRI (DCE-MRI) with only image-level labels. In total, 254 normal and 398 abnormal cases with pathologically confirmed lesions were retrospectively enrolled into the breast dataset, which was divided into training (80%), validation (10%), and testing (10%) sets at the patient level. First, the second image series S2 after the injection of a contrast agent was acquired from the 3.0-T, T1-weighted dynamic enhanced MR imaging sequences. Second, a feature pyramid network (FPN) with a convolutional block attention module (CBAM) was proposed to extract multi-scale feature maps from the modified classification network VGG16. Then, initial location information was obtained from the heatmaps generated using the layer class activation mapping algorithm (Layer-CAM). Finally, the breast lesion detection results were refined by a conditional random field (CRF). Accuracy, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC) were utilized to evaluate image-level classification, and average precision (AP) was estimated for breast lesion localization. DeLong's test was used to compare the AUCs of different models for significance. The proposed model was effective, with an accuracy of 95.2%, sensitivity of 91.6%, specificity of 99.2%, and AUC of 0.986. The AP for breast lesion detection was 84.1% using weakly supervised learning. Weakly supervised learning based on FPN combined with Layer-CAM facilitated automatic detection of breast lesions.
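Layer-CAM, used here to seed the lesion locations, differs from Grad-CAM in that each activation is weighted by its own positive gradient rather than by one channel-wide average, which preserves finer spatial detail at shallower layers. A minimal sketch given pre-extracted activations and gradients:

```python
import torch.nn.functional as F

def layer_cam(activations, gradients):
    # activations, gradients: (B, C, H, W) from the chosen layer.
    weights = F.relu(gradients)                        # element-wise positive grads
    cam = F.relu((weights * activations).sum(dim=1))   # (B, H, W)
    peak = cam.amax(dim=(1, 2), keepdim=True)
    return cam / (peak + 1e-8)                         # per-image normalized heatmap
```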
Affiliation(s)
- Rong Sun: School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 516 Jun-Gong Road, Shanghai, 200093, China
- Chuanling Wei: School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 516 Jun-Gong Road, Shanghai, 200093, China
- Zhuoyun Jiang: School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 516 Jun-Gong Road, Shanghai, 200093, China
- Gang Huang: Shanghai University of Medicine & Health Sciences, Shanghai, China
- Yuanzhong Xie: Medical Imaging Center, Tai'an Central Hospital, No. 29 Long-Tan Road, Shandong, 271099, China
- Shengdong Nie: School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 516 Jun-Gong Road, Shanghai, 200093, China
35
Zheng Y, Huang D, Feng Y, Hao X, He Y, Liu Y. CSF-Glioma: A Causal Segmentation Framework for Accurate Grading and Subregion Identification of Gliomas. Bioengineering (Basel) 2023; 10:887. [PMID: 37627772] [PMCID: PMC10451284] [DOI: 10.3390/bioengineering10080887]
Abstract
Deep networks have shown strong performance in glioma grading; however, interpreting their decisions remains challenging due to glioma heterogeneity. To address these challenges, we propose the Causal Segmentation Framework (CSF), which aims to accurately predict high- and low-grade gliomas while simultaneously highlighting key subregions. Our framework utilizes a shrinkage segmentation method to identify subregions containing essential decision information. Moreover, we introduce a glioma grading module that combines deep learning and traditional approaches for precise grading. Our proposed model achieves the best performance among all compared models, with an AUC of 96.14%, an F1 score of 93.74%, an accuracy of 91.04%, a sensitivity of 91.83%, and a specificity of 88.88%. Additionally, the model exhibits efficient resource utilization, completing predictions within 2.31 s and occupying only 0.12 GB of memory during the test phase. Furthermore, our approach provides clear and specific visualizations of key subregions, surpassing other methods in terms of interpretability. In conclusion, the CSF demonstrates its effectiveness at accurately predicting glioma grades and identifying key subregions. The inclusion of causality in the CSF model enhances the reliability and accuracy of preoperative decision-making for gliomas, and the interpretable results provided by the model can assist clinicians in assessment and treatment planning.
Affiliation(s)
- Yao Zheng: School of Biomedical Engineering, Air Force Medical University, No. 169 Changle West Road, Xi’an 710032, China
- Dong Huang: School of Biomedical Engineering, Air Force Medical University, No. 169 Changle West Road, Xi’an 710032, China; Shaanxi Provincial Key Laboratory of Bioelectromagnetic Detection and Intelligent Perception, No. 169 Changle West Road, Xi’an 710032, China
- Yuefei Feng: School of Biomedical Engineering, Air Force Medical University, No. 169 Changle West Road, Xi’an 710032, China
- Xiaoshuo Hao: School of Biomedical Engineering, Air Force Medical University, No. 169 Changle West Road, Xi’an 710032, China
- Yutao He: School of Biomedical Engineering, Air Force Medical University, No. 169 Changle West Road, Xi’an 710032, China
- Yang Liu: School of Biomedical Engineering, Air Force Medical University, No. 169 Changle West Road, Xi’an 710032, China; Shaanxi Provincial Key Laboratory of Bioelectromagnetic Detection and Intelligent Perception, No. 169 Changle West Road, Xi’an 710032, China
36
|
TCNN: A Transformer Convolutional Neural Network for artifact classification in whole slide images. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/14/2023]
|
37
|
Yuan J, Wu F, Li Y, Li J, Huang G, Huang Q. DPDH-CapNet: A Novel Lightweight Capsule Network with Non-routing for COVID-19 Diagnosis Using X-ray Images. J Digit Imaging 2023; 36:988-1000. [PMID: 36813978 PMCID: PMC9946284 DOI: 10.1007/s10278-023-00791-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2022] [Revised: 01/26/2023] [Accepted: 01/29/2023] [Indexed: 02/24/2023] Open
Abstract
COVID-19 has claimed millions of lives since its outbreak in December 2019, and the damage continues, so it is urgent to develop new technologies to aid its diagnosis. However, state-of-the-art deep learning methods often rely on large-scale labeled data, limiting their clinical application in COVID-19 identification. Recently, capsule networks have achieved highly competitive performance for COVID-19 detection, but they require expensive routing computation or traditional matrix multiplication to deal with capsule dimensional entanglement. To address these problems, we develop a more lightweight capsule network, DPDH-CapNet, which aims to advance automated diagnosis for COVID-19 chest X-ray images. It adopts depthwise convolution (D), pointwise convolution (P), and dilated convolution (D) to construct a new feature extractor, thus successfully capturing the local and global dependencies of COVID-19 pathological features. Simultaneously, it constructs the classification layer with homogeneous (H) vector capsules using an adaptive, non-iterative, non-routing mechanism. We conduct experiments on two publicly available combined datasets, including normal, pneumonia, and COVID-19 images. With a limited number of samples, the parameters of the proposed model are reduced by 9x compared to the state-of-the-art capsule network. Moreover, our model has faster convergence speed and better generalization, and its accuracy, precision, recall, and F-measure improve to 97.99%, 98.05%, 98.02%, and 98.03%, respectively. In addition, experimental results demonstrate that, in contrast to transfer learning methods, the proposed model does not require pre-training or a large number of training samples.
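As an illustration of the D-P-D idea, here is a small PyTorch block chaining depthwise, pointwise, and dilated convolutions; the channel sizes, dilation rate, and normalization choices are assumptions, not the published DPDH-CapNet configuration.

```python
# Illustrative depthwise -> pointwise -> dilated (D-P-D) feature block.
import torch.nn as nn

class DPDBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, dilation: int = 2):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)  # per-channel spatial filtering
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)                          # cheap cross-channel mixing
        self.dilated = nn.Conv2d(out_ch, out_ch, 3,
                                 padding=dilation, dilation=dilation)         # enlarged receptive field
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.pointwise(self.depthwise(x)))  # local detail at low parameter cost
        return self.act(self.bn(self.dilated(x)))        # wider context for global dependencies
```

The depthwise-pointwise pair keeps the parameter count low, which is consistent with the reported 9x reduction relative to a standard capsule network.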
Collapse
Affiliation(s)
- Jianjun Yuan
- College of Artificial Intelligence, Southwest University, Chongqing, 40075, China.
| | - Fujun Wu
- College of Artificial Intelligence, Southwest University, Chongqing, 40075, China
| | - Yuxi Li
- College of Artificial Intelligence, Southwest University, Chongqing, 40075, China
| | - Jinyi Li
- College of Artificial Intelligence, Southwest University, Chongqing, 40075, China
| | - Guojun Huang
- College of Artificial Intelligence, Southwest University, Chongqing, 40075, China
| | - Quanyong Huang
- College of Machinery and Automation, Wuhan University of Science and Technology, Heping Avenue No. 947, Wuhan, Hubei Province, 430091, China.
| |
Collapse
|
38
|
Zhu X, Sun J, Liu G, Shen C, Dai Z, Zhao L. Hybrid Domain Consistency Constraints-Based Deep Neural Network for Facial Expression Recognition. SENSORS (BASEL, SWITZERLAND) 2023; 23:s23115201. [PMID: 37299930 DOI: 10.3390/s23115201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Revised: 05/23/2023] [Accepted: 05/28/2023] [Indexed: 06/12/2023]
Abstract
Facial expression recognition (FER) has received increasing attention. However, multiple factors (e.g., uneven illumination, facial deflection, occlusion, and subjectivity of annotations in image datasets) can degrade the performance of traditional FER methods. Thus, we propose a novel Hybrid Domain Consistency Network (HDCNet) based on a feature constraint method that combines spatial-domain and channel-domain consistency. Specifically, the proposed HDCNet first mines potential attention-consistency features (unlike manual features, e.g., HOG and SIFT) as effective supervision information by comparing each original sample image with its augmented facial expression image. Second, HDCNet extracts facial expression-related features in the spatial and channel domains and then enforces consistent feature expression through a mixed-domain consistency loss function. Notably, the loss function based on the attention-consistency constraints requires no additional labels. Third, the network weights are learned to optimize the classification network through the loss function of the mixed-domain consistency constraints. Finally, experiments conducted on the public RAF-DB and AffectNet benchmark datasets verify that the proposed HDCNet improved classification accuracy by 0.3-3.84% compared to existing methods.
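The label-free consistency idea can be sketched as follows; the helper names and the MSE form of the penalty are assumptions, and geometric augmentations would additionally require aligning the spatial maps before comparison.

```python
# Sketch of a spatial- plus channel-domain attention-consistency loss.
import torch.nn.functional as F

def consistency_loss(spatial_attn, channel_attn, x, x_aug):
    """Penalize attention disagreement between an image and its augmented view.

    spatial_attn(x) -> (B, 1, H, W); channel_attn(x) -> (B, C).
    Assumes photometric augmentation, so the maps stay spatially aligned.
    """
    spatial_term = F.mse_loss(spatial_attn(x), spatial_attn(x_aug))
    channel_term = F.mse_loss(channel_attn(x), channel_attn(x_aug))
    return spatial_term + channel_term   # no expression labels required
```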
Collapse
Affiliation(s)
- Xiaoliang Zhu
- National Engineering Research Center of Educational Big Data, Central China Normal University, Wuhan 430079, China
| | - Junyi Sun
- National Engineering Research Center of Educational Big Data, Central China Normal University, Wuhan 430079, China
| | - Gendong Liu
- National Engineering Research Center of Educational Big Data, Central China Normal University, Wuhan 430079, China
| | - Chen Shen
- National Engineering Research Center of Educational Big Data, Central China Normal University, Wuhan 430079, China
| | - Zhicheng Dai
- National Engineering Research Center of Educational Big Data, Central China Normal University, Wuhan 430079, China
| | - Liang Zhao
- National Engineering Research Center of Educational Big Data, Central China Normal University, Wuhan 430079, China
| |
Collapse
|
39
|
Lin SY, Chiang PL, Chen MH, Lee MY, Lin WC, Chen YS. DGA3-Net: A parameter-efficient deep learning model for ASPECTS assessment for acute ischemic stroke using non-contrast computed tomography. Neuroimage Clin 2023; 38:103441. [PMID: 37224605 PMCID: PMC10225927 DOI: 10.1016/j.nicl.2023.103441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2023] [Revised: 05/15/2023] [Accepted: 05/16/2023] [Indexed: 05/26/2023]
Abstract
Detecting the early signs of stroke using non-contrast computerized tomography (NCCT) is essential for the diagnosis of acute ischemic stroke (AIS). However, the hypoattenuation in NCCT is difficult to precisely identify, and accurate assessments of the Alberta Stroke Program Early CT Score (ASPECTS) are usually time-consuming and require experienced neuroradiologists. To this end, this study proposes DGA3-Net, a convolutional neural network (CNN)-based model for ASPECTS assessment via detecting early ischemic changes in ASPECTS regions. DGA3-Net is based on a novel parameter-efficient dihedral group CNN encoder to exploit the rotation and reflection symmetry of convolution kernels. The bounding volume of each ASPECTS region is extracted from the encoded feature, and an attention-guided slice aggregation module is used to aggregate features from all slices. An asymmetry-aware classifier is then used to predict stroke presence via comparison between ASPECTS regions from the left and right hemispheres. Pre-treatment NCCTs of suspected AIS patients were collected retrospectively, consisting of a primary dataset (n = 170) and an external validation dataset (n = 90), with expert consensus ASPECTS readings as ground truth. DGA3-Net outperformed two expert neuroradiologists in regional stroke identification (F1 = 0.69) and ASPECTS evaluation (Cohen's weighted Kappa = 0.70). Our ablation study also validated the efficacy of the proposed model design. In addition, class-relevant areas highlighted by visualization techniques corresponded highly with various well-established qualitative imaging signs, further validating the learned representation. This study demonstrates the potential of deep learning techniques for timely and accurate AIS diagnosis from NCCT, which could substantially improve the quality of treatment for AIS patients.
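The parameter efficiency of a dihedral-group encoder comes from reusing one kernel across its rotated and reflected copies. The toy sketch below shares a single 3x3 kernel over all eight D4 transforms and pools the responses; the real encoder is more elaborate, so treat this only as an illustration of the weight-sharing idea.

```python
# Toy D4 weight sharing: one kernel, eight orientations, pooled response.
import torch
import torch.nn.functional as F

def d4_orbit(kernel):
    """All 8 rotated/reflected copies of a (C_out, C_in, 3, 3) kernel."""
    views = []
    for flip in (False, True):
        k = torch.flip(kernel, dims=[-1]) if flip else kernel
        for r in range(4):
            views.append(torch.rot90(k, r, dims=(-2, -1)))
    return views

def d4_conv(x, kernel, bias=None):
    # Max-pooling over the orbit makes the response insensitive to the
    # orientation of the pattern, at the cost of a single kernel's parameters.
    responses = torch.stack([F.conv2d(x, k, bias, padding=1) for k in d4_orbit(kernel)])
    return responses.max(dim=0).values
```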
Collapse
Affiliation(s)
- Shih-Yen Lin
- Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| | - Pi-Ling Chiang
- Department of Diagnostic Radiology, Kaohsiung Chang Gung Memorial Hospital, and Chang Gung University College of Medicine, Kaohsiung, Taiwan.
| | - Meng-Hsiang Chen
- Department of Diagnostic Radiology, Kaohsiung Chang Gung Memorial Hospital, and Chang Gung University College of Medicine, Kaohsiung, Taiwan.
| | - Meng-Yang Lee
- Institute of Biomedical Engineering, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Wei-Che Lin
- Department of Diagnostic Radiology, Kaohsiung Chang Gung Memorial Hospital, and Chang Gung University College of Medicine, Kaohsiung, Taiwan.
| | - Yong-Sheng Chen
- Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan.
| |
Collapse
|
40
|
Watanabe N, Miyoshi K, Jimura K, Shimane D, Keerativittayayut R, Nakahara K, Takeda M. Multimodal deep neural decoding reveals highly resolved spatiotemporal profile of visual object representation in humans. Neuroimage 2023; 275:120164. [PMID: 37169115 DOI: 10.1016/j.neuroimage.2023.120164] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2022] [Revised: 05/02/2023] [Accepted: 05/09/2023] [Indexed: 05/13/2023] Open
Abstract
Perception and categorization of objects in a visual scene are essential for grasping the surrounding situation. Recently, neural decoding schemes, such as machine learning on functional magnetic resonance imaging (fMRI) data, have been employed to elucidate the underlying neural mechanisms. However, it remains unclear how spatially distributed brain regions temporally represent visual object categories and sub-categories. One promising strategy to address this issue is neural decoding with concurrently obtained neural response data of high spatial and temporal resolution. In this study, we explored the spatial and temporal organization of visual object representations using concurrent fMRI and electroencephalography (EEG), combined with neural decoding using deep neural networks (DNNs). We hypothesized that neural decoding from multimodal neural data with DNNs would show high classification performance in visual object categorization (faces or non-face objects) and sub-categorization within faces and objects. Visualization of the fMRI DNN was more sensitive than the univariate approach and revealed that visual categorization occurred in brain-wide regions. Interestingly, the EEG DNN valued the earlier phase of neural responses for categorization and the later phase for sub-categorization. Combining the two DNNs improved classification performance for both categorization and sub-categorization compared with the fMRI DNN or EEG DNN alone. These deep learning-based results demonstrate a categorization principle in which visual objects are represented in a spatially organized and coarse-to-fine manner, and they provide strong evidence of the ability of multimodal deep learning to uncover spatiotemporal neural machinery in sensory processing.
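A minimal way to combine the two decoders is late fusion of their class scores, sketched below with placeholder networks; the study's actual DNN architectures and fusion rule are not reproduced here.

```python
# Late-fusion sketch for modality-specific fMRI and EEG classifiers.
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    def __init__(self, fmri_net: nn.Module, eeg_net: nn.Module, n_classes: int):
        super().__init__()
        self.fmri_net, self.eeg_net = fmri_net, eeg_net
        self.head = nn.Linear(2 * n_classes, n_classes)  # learns how to weight modalities

    def forward(self, fmri, eeg):
        # Each branch emits (B, n_classes) scores; the head fuses them.
        z = torch.cat([self.fmri_net(fmri), self.eeg_net(eeg)], dim=1)
        return self.head(z)
```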
Collapse
Affiliation(s)
- Noriya Watanabe
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan
| | - Kosuke Miyoshi
- Narrative Nights, Inc., Yokohama, Kanagawa, 236-0011, Japan
| | - Koji Jimura
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan; Department of Informatics, Gunma University, Maebashi, Gunma, 371-8510, Japan
| | - Daisuke Shimane
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan
| | - Ruedeerat Keerativittayayut
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan; Chulabhorn Royal Academy, Bangkok, 10210, Thailand
| | - Kiyoshi Nakahara
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan
| | - Masaki Takeda
- Research Center for Brain Communication, Kochi University of Technology, Kami, Kochi, 782-8502, Japan.
| |
Collapse
|
41
|
Yang S, Xing Z, Wang H, Gao X, Dong X, Yao Y, Zhang R, Zhang X, Li S, Zhao Y, Liu Z. Classification and localization of maize leaf spot disease based on weakly supervised learning. FRONTIERS IN PLANT SCIENCE 2023; 14:1128399. [PMID: 37223797 PMCID: PMC10201986 DOI: 10.3389/fpls.2023.1128399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/20/2022] [Accepted: 04/10/2023] [Indexed: 05/25/2023]
Abstract
Precisely discerning disease types and vulnerable areas is crucial for effective monitoring of crop production and forms the basis for generating targeted plant protection recommendations and automatic, precise application. In this study, we constructed a dataset comprising six types of field maize leaf images and developed a framework for classifying and localizing maize leaf diseases. Our approach integrates lightweight convolutional neural networks with interpretable AI algorithms, achieving high classification accuracy and fast detection speeds. To evaluate the framework, we measured the mean Intersection over Union (mIoU) between localized disease-spot coverage and actual disease-spot coverage when relying solely on image-level annotations. The results showed that our framework achieved an mIoU of up to 55.302%, indicating the feasibility of using weakly supervised semantic segmentation based on class activation mapping for identifying disease spots in crop disease detection. This approach, which combines deep learning models with visualization techniques, improves the interpretability of the deep learning models and localizes infected areas of maize leaves through weakly supervised learning. The framework enables smart monitoring of crop diseases and plant protection operations using mobile phones, smart farm machines, and other devices, and it offers a reference for deep learning research on crop diseases.
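The evaluation step reduces to thresholding a class activation map into a disease-spot mask and scoring it with IoU; a small NumPy sketch under those assumptions:

```python
# Sketch: CAM -> binary disease-spot mask -> IoU against a reference mask.
import numpy as np

def cam_to_mask(cam: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # rescale to [0, 1]
    return cam >= threshold

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(inter) / float(union + 1e-8)

# mIoU over a dataset of (cam, reference-mask) pairs:
# miou = np.mean([iou(cam_to_mask(c), m) for c, m in zip(cams, masks)])
```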
Collapse
Affiliation(s)
- Shuai Yang
- College of Land Science and Technology, China Agricultural University, Beijing, China
- Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Ziyao Xing
- College of Land Science and Technology, China Agricultural University, Beijing, China
- Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Hengbin Wang
- College of Land Science and Technology, China Agricultural University, Beijing, China
- Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Xiang Gao
- College of Land Science and Technology, China Agricultural University, Beijing, China
- Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Xinrui Dong
- College of Land Science and Technology, China Agricultural University, Beijing, China
- Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Yu Yao
- College of Land Science and Technology, China Agricultural University, Beijing, China
- Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Runda Zhang
- College of Land Science and Technology, China Agricultural University, Beijing, China
- Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Xiaodong Zhang
- College of Land Science and Technology, China Agricultural University, Beijing, China
- Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Shaoming Li
- College of Land Science and Technology, China Agricultural University, Beijing, China
- Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Yuanyuan Zhao
- College of Land Science and Technology, China Agricultural University, Beijing, China
- Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing, China
| | - Zhe Liu
- College of Land Science and Technology, China Agricultural University, Beijing, China
- Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing, China
| |
Collapse
|
42
|
Mao J, Qiu S, Wei W, He H. Cross-modal guiding and reweighting network for multi-modal RSVP-based target detection. Neural Netw 2023; 161:65-82. [PMID: 36736001 DOI: 10.1016/j.neunet.2023.01.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2022] [Revised: 10/31/2022] [Accepted: 01/11/2023] [Indexed: 01/17/2023]
Abstract
Rapid Serial Visual Presentation (RSVP)-based Brain-Computer Interfaces (BCIs) facilitate the high-throughput detection of rare target images by detecting evoked event-related potentials (ERPs). At present, the decoding accuracy of RSVP-based BCI systems limits their practical application. This study introduces eye movements (gaze and pupil information), referred to as the EYE modality, as another useful source of information to combine with EEG-based BCI, forming a novel system to detect target images in RSVP tasks. We performed an RSVP experiment, recorded EEG signals and eye movements simultaneously during a target detection task, and constructed a multi-modal dataset including 20 subjects. We also propose a cross-modal guiding and fusion network to fully utilize the EEG and EYE modalities and fuse them for better RSVP decoding performance. In this network, a two-branch backbone extracts features from the two modalities. A Cross-Modal Feature Guiding (CMFG) module guides EYE modality features to complement the EEG modality for better feature extraction. A Multi-scale Multi-modal Reweighting (MMR) module enhances the multi-modal features by exploring intra- and inter-modal interactions. Finally, a Dual Activation Fusion (DAF) module modulates the enhanced multi-modal features for effective fusion. Our proposed network achieved a balanced accuracy of 88.00% (±2.29) on the collected dataset, and the ablation studies and visualizations revealed the effectiveness of the proposed modules. This work demonstrates the value of introducing the EYE modality in RSVP tasks, and our proposed network is a promising method for improving the performance of RSVP-based target detection systems.
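One plausible reading of the guiding step is a learned gate in which EEG features modulate the EYE features; the sketch below is a hypothetical simplification, not the published CMFG module.

```python
# Hypothetical cross-modal guiding: EEG-derived gates reweight EYE features.
import torch.nn as nn

class CrossModalGuide(nn.Module):
    def __init__(self, eeg_dim: int, eye_dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(eeg_dim, eye_dim), nn.Sigmoid())

    def forward(self, eeg_feat, eye_feat):
        # eeg_feat: (B, eeg_dim); eye_feat: (B, eye_dim)
        g = self.gate(eeg_feat)            # per-dimension gate from EEG semantics
        return eye_feat + eye_feat * g     # guided features with a residual path
```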
Collapse
Affiliation(s)
- Jiayu Mao
- Laboratory of Brain Atlas and Brain-Inspired Intelligence, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
| | - Shuang Qiu
- Laboratory of Brain Atlas and Brain-Inspired Intelligence, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
| | - Wei Wei
- Laboratory of Brain Atlas and Brain-Inspired Intelligence, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
| | - Huiguang He
- Laboratory of Brain Atlas and Brain-Inspired Intelligence, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China.
| |
Collapse
|
43
|
Meng Z, Zhu Y, Pang W, Tian J, Nie F, Wang K. MSMFN: An Ultrasound Based Multi-Step Modality Fusion Network for Identifying the Histologic Subtypes of Metastatic Cervical Lymphadenopathy. IEEE TRANSACTIONS ON MEDICAL IMAGING 2023; 42:996-1008. [PMID: 36383594 DOI: 10.1109/tmi.2022.3222541] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Identifying squamous cell carcinoma and adenocarcinoma subtypes of metastatic cervical lymphadenopathy (CLA) is critical for localizing the primary lesion and initiating timely therapy. B-mode ultrasound (BUS), color Doppler flow imaging (CDFI), ultrasound elastography (UE), and dynamic contrast-enhanced ultrasound provide effective tools for identification, but synthesizing the modality information is a challenge for clinicians. Therefore, rationally fusing these modalities with clinical information via deep learning to personalize the classification of metastatic CLA requires new exploration. In this paper, we propose the Multi-step Modality Fusion Network (MSMFN) for multi-modal ultrasound fusion to identify histological subtypes of metastatic CLA. MSMFN can mine the unique features of each modality and fuse them in a hierarchical three-step process. Specifically, under the guidance of high-level BUS semantic feature maps, information in CDFI and UE is first extracted by modality interaction to obtain a static imaging feature vector. Then, a self-supervised feature orthogonalization loss is introduced to help learn modality-heterogeneity features while maintaining maximal task-consistent category distinguishability across modalities. Finally, six encoded clinical variables are utilized to avoid prediction bias and further improve prediction ability. Our three-fold cross-validation experiments demonstrate that our method surpasses clinicians and other multi-modal fusion methods, with an accuracy of 80.06%, a true-positive rate of 81.81%, and a true-negative rate of 80.00%. Our network provides a multi-modal ultrasound fusion framework that considers prior clinical knowledge and modality-specific characteristics. Our code will be available at: https://github.com/RichardSunnyMeng/MSMFN.
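A feature-orthogonalization loss of the kind described can be sketched as a penalty on pairwise similarity between per-modality embeddings; the squared-cosine form below is an assumption about the loss, not the paper's exact formulation.

```python
# Sketch: push BUS, CDFI, and UE feature vectors toward mutual orthogonality.
import torch.nn.functional as F

def orthogonality_loss(f_bus, f_cdfi, f_ue):
    """Each input is a (B, D) batch of modality-specific feature vectors."""
    feats = [F.normalize(f, dim=1) for f in (f_bus, f_cdfi, f_ue)]
    loss = 0.0
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            cos = (feats[i] * feats[j]).sum(dim=1)  # pairwise cosine similarity
            loss = loss + cos.pow(2).mean()         # zero when embeddings are orthogonal
    return loss
```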
Collapse
|
44
|
Mukhtorov D, Rakhmonova M, Muksimova S, Cho YI. Endoscopic Image Classification Based on Explainable Deep Learning. SENSORS (BASEL, SWITZERLAND) 2023; 23:3176. [PMID: 36991887 PMCID: PMC10058443 DOI: 10.3390/s23063176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/26/2023] [Revised: 03/09/2023] [Accepted: 03/10/2023] [Indexed: 06/19/2023]
Abstract
Deep learning has achieved remarkably positive results and impacts on medical diagnostics in recent years, and in several proposals it has reached sufficient accuracy for deployment; however, the resulting models are black boxes that are hard to understand, and model decisions are often made without reason or explanation. To reduce this gap, explainable artificial intelligence (XAI) offers a substantial opportunity to obtain informed decision support from deep learning models and to open the black box of the method. We conducted an explainable deep learning study based on ResNet152 combined with Grad-CAM for endoscopy image classification. We used the open-source KVASIR dataset, which consists of a total of 8000 wireless capsule images. With heat maps of the classification results and an efficient augmentation method, we achieved strong performance, with 98.28% training and 93.46% validation accuracy for medical image classification.
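Grad-CAM on a ResNet backbone follows a standard recipe, sketched below with torchvision; the trained endoscopy weights and the exact hook placement are assumptions.

```python
# Standard Grad-CAM sketch on ResNet152 (illustrative hook placement).
import torch
import torch.nn.functional as F
from torchvision.models import resnet152

model = resnet152(weights=None).eval()   # load the trained endoscopy weights in practice
acts, grads = [], []
model.layer4.register_forward_hook(lambda m, i, o: acts.append(o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))

def grad_cam(image, class_idx):
    acts.clear(); grads.clear()
    logits = model(image)                            # image: (1, 3, H, W)
    model.zero_grad()
    logits[0, class_idx].backward()
    w = grads[0].mean(dim=(2, 3), keepdim=True)      # GAP of gradients -> channel weights
    cam = F.relu((w * acts[0]).sum(dim=1))           # weighted sum of activations
    cam = cam / (cam.max() + 1e-8)
    return F.interpolate(cam[None], size=image.shape[-2:], mode="bilinear")[0]
```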
Collapse
|
45
|
Xiang T, Liu H, Guo S, Gan Y, He W, Liao X. Towards Query Efficient Black-Box Attacks: A Universal Dual Transferability-Based Framework. ACM T INTEL SYST TEC 2023. [DOI: 10.1145/3583777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/15/2023]
Abstract
Adversarial attacks have threatened the application of deep neural networks in security-sensitive scenarios. Most existing black-box attacks fool the target model by interacting with it many times and producing global perturbations. However, not all pixels are equally crucial to the target model, so treating all pixels indiscriminately inevitably increases the query overhead. Moreover, existing black-box attacks take clean samples as starting points, which also limits query efficiency. In this paper, we propose a novel black-box attack framework, built on a strategy of dual transferability (DT), to perturb the discriminative areas of clean examples within limited queries. The first kind of transferability is the transferability of model interpretations; based on this property, we identify the discriminative areas of clean samples for generating local perturbations. The second is the transferability of adversarial examples, which helps us produce local pre-perturbations to further improve query efficiency. We achieve both kinds of transferability through an independent auxiliary model and incur no extra query overhead. After identifying discriminative areas and generating pre-perturbations, we use the pre-perturbed samples as better starting points and further perturb them locally in a black-box way to search for the corresponding adversarial examples. The DT strategy is general, so the proposed framework can be applied to different types of black-box attacks. We conduct extensive experiments to show that, under various system settings, our framework can significantly improve the query efficiency and attack success rate of existing black-box attacks.
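The framework's flow can be caricatured in a few lines: a surrogate supplies both a saliency mask (interpretation transfer) and a transferred starting perturbation, after which only the masked region is searched against the black box. Everything below, including the single-step FGSM pre-perturbation and the plain random walk, is a toy stand-in for the actual attack components.

```python
# Toy dual-transferability attack sketch (illustrative names and logic).
import torch

def dt_attack(black_box, surrogate, x, y, eps=8/255, top_frac=0.1, queries=500):
    # 1) Discriminative area from surrogate gradients (interpretation transfer).
    x_req = x.clone().requires_grad_(True)
    surrogate(x_req)[0, y].backward()
    sal = x_req.grad.abs().sum(dim=1, keepdim=True)        # (1, 1, H, W)
    mask = (sal >= sal.flatten().quantile(1 - top_frac)).float()
    # 2) Local pre-perturbation via one FGSM step (example transfer).
    x_adv = (x + eps * x_req.grad.sign() * mask).clamp(0, 1).detach()
    # 3) Query-limited local random search on the black box.
    with torch.no_grad():
        for _ in range(queries):
            if black_box(x_adv).argmax(dim=1).item() != y:
                return x_adv                                # misclassified: success
            cand = x_adv + eps * torch.randn_like(x_adv).sign() * mask
            x_adv = (x + (cand - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv
```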
Collapse
Affiliation(s)
| | | | | | | | | | - Xiaofeng Liao
- College of Computer Science Chongqing University, China
| |
Collapse
|
46
|
Syed S, Anderssen KE, Stormo SK, Kranz M. Weakly supervised semantic segmentation for MRI: exploring the advantages and disadvantages of class activation maps for biological image segmentation with soft boundaries. Sci Rep 2023; 13:2574. [PMID: 36781947 PMCID: PMC9925800 DOI: 10.1038/s41598-023-29665-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2022] [Accepted: 02/08/2023] [Indexed: 02/15/2023] Open
Abstract
Fully supervised semantic segmentation models require pixel-level annotations that are costly to obtain. As a remedy, weakly supervised semantic segmentation has been proposed, where image-level labels and class activation maps (CAMs) can detect discriminative regions for specific class objects. In this paper, we evaluated several CAM methods applied to different convolutional neural networks (CNNs) to highlight tissue damage with soft boundaries in MRI of cod fillets. Our results show that different CAM methods produce very different CAM regions, even when applied to the same CNN model. CAM methods that claim to highlight more of the class object do not necessarily highlight more damaged regions or originate from the same highly discriminative regions, nor do these damaged regions show high agreement across the different CAM methods. Additionally, CAM methods produce damaged regions that do not align with external reference metrics, and even show correlations contrary to what would be expected.
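The cross-method agreement analysis can be approximated by binarizing each method's heatmap and computing pairwise IoU, as in this sketch (the quantile threshold is an assumption about the protocol):

```python
# Sketch: pairwise agreement (IoU) between CAM methods on one image.
import numpy as np
from itertools import combinations

def binarize(cam: np.ndarray, q: float = 0.8) -> np.ndarray:
    return cam >= np.quantile(cam, q)      # keep the top 20% most activated pixels

def pairwise_agreement(cams: dict) -> dict:
    masks = {name: binarize(c) for name, c in cams.items()}
    scores = {}
    for a, b in combinations(masks, 2):
        inter = np.logical_and(masks[a], masks[b]).sum()
        union = np.logical_or(masks[a], masks[b]).sum()
        scores[(a, b)] = inter / max(union, 1)
    return scores

# e.g. pairwise_agreement({"GradCAM": cam1, "LayerCAM": cam2, "ScoreCAM": cam3})
```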
Collapse
Affiliation(s)
- Shaheen Syed
- Department of Seafood Industry, Nofima AS, P.O. Box 6122, 9291, Tromsø, Norway.
- Department of Computer Science, UiT The Arctic University of Norway, Hansine Hansens veg 18, 9009, Tromsø, Norway.
| | - Kathryn E Anderssen
- Department of Seafood Industry, Nofima AS, P.O. Box 6122, 9291, Tromsø, Norway
| | | | - Mathias Kranz
- PET Imaging Center Tromsø, University Hospital North-Norway (UNN), Hansine Hansens veg 67, 9009, Tromsø, Norway
- Nuclear Medicine and Radiation Biology Research Group, UiT The Arctic University of Norway, Hansine Hansens veg 18, 9009, Tromsø, Norway
| |
Collapse
|
47
|
TSSK-Net: Weakly supervised biomarker localization and segmentation with image-level annotation in retinal OCT images. Comput Biol Med 2023; 153:106467. [PMID: 36584602 DOI: 10.1016/j.compbiomed.2022.106467] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2022] [Revised: 11/16/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]
Abstract
The localization and segmentation of biomarkers in OCT images are critical steps in diagnosing retina-related diseases. Although fully supervised deep learning models can segment pathological regions, their performance relies on labor-intensive pixel-level annotations. Compared with dense pixel-level annotation, image-level annotation reduces the burden of manual annotation. Existing methods for image-level annotation are usually based on class activation maps (CAMs). However, current methods still suffer from model collapse, training instability, and anatomical mismatch due to the considerable variation in retinal biomarkers' shape, texture, and size. This paper proposes a novel weakly supervised biomarker localization and segmentation method requiring only image-level annotations: a Teacher-Student network with joint Self-supervised contrastive learning and Knowledge distillation-based anomaly localization, namely TSSK-Net. Specifically, we treat retinal biomarker regions as abnormal regions distinct from normal regions. First, we propose a novel pre-training strategy based on supervised contrastive learning that encourages the model to learn the anatomical structure of normal OCT images. Second, we design a fine-tuning module and propose a novel hybrid network structure that includes a supervised contrastive loss for feature learning and a cross-entropy loss for classification learning; to further improve performance, we propose an efficient strategy to combine these two losses so as to preserve the anatomical structure and enhance the encoding representation of features. Finally, we design a knowledge distillation-based anomaly segmentation method that combines effectively with the previous model to alleviate the challenge of insufficient supervision. Experimental results on a local dataset and a public dataset demonstrate the effectiveness of our proposed method, which can effectively reduce the annotation burden of ophthalmologists working with OCT images.
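The distillation-based localization step can be read as a student trained to imitate a teacher on normal scans only, so that their feature discrepancy flags biomarkers at test time; the cosine-discrepancy sketch below is an assumption-level reading, not the released TSSK-Net code.

```python
# Student-teacher anomaly-map sketch for OCT biomarker localization.
import torch
import torch.nn.functional as F

def anomaly_map(teacher, student, image):
    """Both nets return (B, C, h, w) feature maps; the teacher is frozen."""
    with torch.no_grad():
        t = teacher(image)
    s = student(image)
    amap = 1 - F.cosine_similarity(t, s, dim=1)     # (B, h, w): high where imitation fails
    return F.interpolate(amap.unsqueeze(1), size=image.shape[-2:], mode="bilinear")

# Training on normal OCT only (so abnormal regions stay poorly imitated):
# loss = (1 - F.cosine_similarity(teacher(x), student(x), dim=1)).mean()
```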
Collapse
|
48
|
Deeply Explain CNN Via Hierarchical Decomposition. Int J Comput Vis 2023. [DOI: 10.1007/s11263-022-01746-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
|
49
|
Weakly-supervised localization and classification of biomarkers in OCT images with integrated reconstruction and attention. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
50
|
Wang C, Zhang Y, Xu S, Liu Y, Xie L, Wu C, Yang Q, Chu Y, Ye Q. Research on Assistant Diagnosis of Fundus Optic Neuropathy Based on Deep Learning. Curr Eye Res 2023; 48:51-59. [PMID: 36264060 DOI: 10.1080/02713683.2022.2138917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
PURPOSE The purpose of this study was to use neural networks to distinguish optic edema (ODE) and optic atrophy from normal fundus images, and to use visualization to explain the artificial intelligence methods. METHODS Three hundred and sixty-seven images of ODE, 206 images of optic atrophy, and 231 images of normal fundus, provided by two hospitals, were used. A set of image preprocessing and data enhancement methods was created, and a variety of neural network models, such as VGG16, VGG19, Inception V3, and the 50-layer deep residual network (ResNet50), were used. The accuracy, recall, F1-score, and ROC curve of the different networks were analyzed to evaluate model performance. In addition, class activation mapping (CAM) was utilized to find the focus of the neural network, together with feature-fusion visualization of the network. RESULTS Our image preprocessing and data enhancement method significantly improved model accuracy by about 10%. Among the networks, VGG16 performed best, with accuracies for ODE, optic atrophy, and normal fundus of 98%, 90%, and 95%, respectively. The macro-average and micro-average of VGG16 both reached 0.98. The CAM results clearly show that the focus area of the network is near the optic cup, and the feature-fusion images reveal the differences among the three types of fundus images. CONCLUSION Through image preprocessing, data enhancement, and neural network training, we applied artificial intelligence to identify ophthalmic diseases, located the focus area through CAM, and identified the differences among the three conditions through visualization of the network's intermediate layers. With such assisted diagnosis, ophthalmologists can evaluate cases more precisely and clearly.
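For reference, the classic CAM formulation used here requires a global-average-pooling head, sketched below on a GAP-modified VGG16; the three-class head and layer choices are assumptions about the modified architecture.

```python
# Classic CAM sketch (fc weights linearly reweight the last conv maps).
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

backbone = vgg16(weights=None).features        # conv stack -> (B, 512, h, w)
classifier = torch.nn.Linear(512, 3)           # hypothetical ODE / atrophy / normal head

def classify_with_cam(image, class_idx):
    fmap = backbone(image)                     # (1, 512, h, w)
    logits = classifier(fmap.mean(dim=(2, 3))) # global average pooling + fc
    w = classifier.weight[class_idx]           # (512,) class-specific weights
    cam = F.relu((w[None, :, None, None] * fmap).sum(dim=1))
    cam = cam / (cam.max() + 1e-8)
    return logits, F.interpolate(cam[None], size=image.shape[-2:], mode="bilinear")[0]
```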
Collapse
Affiliation(s)
- Chengjin Wang
- Key Laboratory of Weak-Light Nonlinear Photonics, School of Physics and TEDA Applied Physics, Ministry of Education, Nankai University, Tianjin, China
| | - Yuwei Zhang
- Key Laboratory of Weak-Light Nonlinear Photonics, School of Physics and TEDA Applied Physics, Ministry of Education, Nankai University, Tianjin, China
| | - Shuai Xu
- Key Laboratory of Weak-Light Nonlinear Photonics, School of Physics and TEDA Applied Physics, Ministry of Education, Nankai University, Tianjin, China
| | - Yuyan Liu
- Tianjin Key Lab of Ophthalmology and Visual Science, Tianjin Eye Hospital and Eye Institute, Nankai University Affiliated Eye Hospital, Clinical College of Ophthalmology Tianjin Medical University, Tianjin, China
| | - Lindan Xie
- Tianjin Key Lab of Ophthalmology and Visual Science, Tianjin Eye Hospital and Eye Institute, Nankai University Affiliated Eye Hospital, Clinical College of Ophthalmology Tianjin Medical University, Tianjin, China
| | - Changlong Wu
- Ophthalmology, Jinan Second People's Hospital, Jinan City, Shandong Province, China
| | - Qianhui Yang
- Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin, China
| | - Yanhua Chu
- Tianjin Key Lab of Ophthalmology and Visual Science, Tianjin Eye Hospital and Eye Institute, Nankai University Affiliated Eye Hospital, Clinical College of Ophthalmology Tianjin Medical University, Tianjin, China
| | - Qing Ye
- Key Laboratory of Weak-Light Nonlinear Photonics, School of Physics and TEDA Applied Physics, Ministry of Education, Nankai University, Tianjin, China; Nankai University Eye Institute, Nankai University Affiliated Eye Hospital, Nankai University, Tianjin, China
| |
Collapse
|