1
Hans R, Sharma SK, Aickelin U. Optimised deep k-nearest neighbour's based diabetic retinopathy diagnosis (ODeep-NN) using retinal images. Health Inf Sci Syst 2024; 12:23. [PMID: 38469456] [PMCID: PMC10924814] [DOI: 10.1007/s13755-024-00282-x] [Received: 07/31/2023] [Accepted: 02/18/2024] Open Access
Abstract
Diabetes mellitus is regarded as one of the foremost present-day health issues and can often lead to diabetic retinopathy, a complication of the disease that affects the eyes and causes loss of vision. To detect the condition precisely, clinicians must recognise the presence of lesions in colour fundus images, an arduous and time-consuming task. To deal with this problem, much work has been undertaken to develop deep learning-based computer-aided diagnosis systems that assist clinicians in making accurate diagnoses from medical images. However, the basic operations involved in deep learning models extract a bulky set of features, which in turn requires a long training period to predict the existence of the disease. For effective execution of these models, feature selection becomes an important task, selecting the most appropriate features with the aim of increasing classification accuracy. This research presents an optimised deep k-nearest neighbours-based pipeline model that amalgamates the feature extraction capability of deep learning models with nature-inspired metaheuristic algorithms, using the k-nearest neighbour algorithm for classification. The proposed model attains accuracies of 97.67% and 98.05% on the two datasets considered, outperforming the ResNet50 and AlexNet deep learning models. Additionally, the experimental results include an analysis of five nature-inspired metaheuristic algorithms considered for feature selection on the basis of various evaluation parameters.
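The pipeline this abstract describes (deep feature extraction, a metaheuristic-selected feature subset, then k-NN classification) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the feature vectors are random stand-ins for CNN embeddings, and the binary mask is fixed where the paper would use a nature-inspired search over masks.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=5):
    """Classify each test vector by majority vote among its k nearest
    training vectors (Euclidean distance)."""
    preds = []
    for x in test_X:
        d = np.linalg.norm(train_X - x, axis=1)
        nearest = train_y[np.argsort(d)[:k]]
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

rng = np.random.default_rng(0)
# Stand-in for CNN feature vectors: two well-separated classes.
X0 = rng.normal(0.0, 1.0, (50, 32))
X1 = rng.normal(3.0, 1.0, (50, 32))
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 50)

# A metaheuristic would search over binary masks like this one;
# here the mask is fixed for illustration.
mask = np.zeros(32, dtype=bool)
mask[:8] = True

preds = knn_predict(X[:, mask], y, X[:, mask], k=5)
acc = (preds == y).mean()
print(acc)
```

In the paper, the mask would be the output of one of the five metaheuristics, scored by exactly this kind of k-NN accuracy.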
Affiliation(s)
- Rahul Hans
- Department of Computer Science and Engineering, DAV University, Jalandhar, Punjab, India
- Sanjeev Kumar Sharma
- Department of Computer Science and Applications, DAV University, Jalandhar, Punjab, India
- Uwe Aickelin
- School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
2
Tan HQ, Cai J, Tay SH, Sim AY, Huang L, Chua ML, Tang Y. Cluster-based radiomics reveal spatial heterogeneity of bevacizumab response for treatment of radiotherapy-induced cerebral necrosis. Comput Struct Biotechnol J 2024; 23:43-51. [PMID: 38125298] [PMCID: PMC10730953] [DOI: 10.1016/j.csbj.2023.11.040] [Received: 08/02/2023] [Revised: 11/21/2023] [Accepted: 11/21/2023] Open Access
Abstract
Background: Bevacizumab is used in the treatment of radiation necrosis (RN), a debilitating toxicity following head and neck radiotherapy. However, there is no biomarker to predict whether a patient will respond to bevacizumab. Purpose: We aimed to develop a cluster-based radiomics approach to characterize the spatial heterogeneity of RN and map its response to bevacizumab. Methods: 118 consecutive nasopharyngeal carcinoma patients diagnosed with RN were enrolled. We divided 152 lesions from these patients into 101 for training and 51 for validation. We extracted voxel-level radiomics features from each lesion segmented on T1-weighted+contrast and T2-FLAIR sequences of pre- and post-bevacizumab magnetic resonance images, followed by a three-step analysis involving individual- and population-level clustering and delta-radiomics to derive five radiomics clusters within the lesions. We tested the association of each cluster with response to bevacizumab and developed a clinico-radiomics model using clinical predictors and cluster-specific features. Results: 71 (70.3%) and 34 (66.7%) lesions responded to bevacizumab in the training and validation datasets, respectively. Two radiomics clusters were spatially mapped to the edema region, and their volume changes were significantly associated with bevacizumab response (OR: 11.12 [95% CI: 2.54-73.47], P = 0.004; and 1.63 [1.07-2.78], P = 0.042). The combined clinico-radiomics model based on textural features extracted from the most significant cluster improved the prediction of bevacizumab response compared with a clinical-only model (AUC: 0.755 [0.645-0.865] to 0.852 [0.764-0.940], training; 0.708 [0.554-0.861] to 0.816 [0.699-0.933], validation). Conclusion: Our radiomics approach yielded intralesional resolution, enabling a more refined feature selection for predicting bevacizumab efficacy in the treatment of RN.
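The three-step analysis (individual-level clustering within each lesion, population-level clustering of the pooled centres, then delta-radiomics on the shared clusters) can be illustrated with a toy reconstruction. Everything below is a hypothetical stand-in for the paper's voxel-level radiomics features: the data are random, the per-lesion cluster count is arbitrary, and only the five shared population clusters mirror the paper.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means (stand-in for the clustering used in the paper)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(6)
# Hypothetical voxel-level feature vectors for three lesions.
lesions = [rng.normal(size=(200, 4)) for _ in range(3)]

# Step 1: individual-level clustering within each lesion.
per_lesion_centers = [kmeans(v, k=10, seed=i)[0] for i, v in enumerate(lesions)]

# Step 2: population-level clustering of the pooled centres yields
# five clusters shared across all lesions.
pop_centers, _ = kmeans(np.vstack(per_lesion_centers), k=5, seed=42)

def assign(V):
    # Map voxels to the nearest shared population cluster.
    return np.argmin(((V[:, None] - pop_centers[None]) ** 2).sum(-1), axis=1)

# Step 3: delta-radiomics -- compare per-cluster volume (voxel count)
# between pre- and post-treatment scans of a toy "shrinking" lesion.
pre, post = lesions[0], lesions[0][:150]
delta = np.bincount(assign(post), minlength=5) - np.bincount(assign(pre), minlength=5)
print(delta.sum())  # -50: the lesion lost 50 voxels overall
```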
Affiliation(s)
- Hong Qi Tan
- Division of Radiation Oncology, National Cancer Centre Singapore, Singapore
- Jinhua Cai
- Department of Neurology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China
- Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China
- Guangdong Provincial Key Laboratory of Brain Function and Disease, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, People's Republic of China
- Shi Hui Tay
- Division of Medical Sciences, National Cancer Centre Singapore, Singapore
- Adelene Y.L. Sim
- Division of Medical Sciences, National Cancer Centre Singapore, Singapore
- Luo Huang
- Department of Radiation Oncology, Chongqing University Cancer Hospital, People's Republic of China
- Melvin L.K. Chua
- Division of Radiation Oncology, National Cancer Centre Singapore, Singapore
- Division of Medical Sciences, National Cancer Centre Singapore, Singapore
- Oncology Academic Programme, Duke-NUS Medical School, Singapore
- Yamei Tang
- Department of Neurology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China
- Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China
- Guangdong Provincial Key Laboratory of Brain Function and Disease, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, People's Republic of China
3
Tang J, Du W, Shu Z, Cao Z. A generative benchmark for evaluating the performance of fluorescent cell image segmentation. Synth Syst Biotechnol 2024; 9:627-637. [PMID: 38798889] [PMCID: PMC11127598] [DOI: 10.1016/j.synbio.2024.05.005] [Received: 12/25/2023] [Revised: 04/13/2024] [Accepted: 05/08/2024] Open Access
Abstract
Fluorescent cell imaging technology is fundamental in life science research, offering a rich source of image data crucial for understanding cell spatial positioning, differentiation, and decision-making mechanisms. As the volume of this data expands, precise image analysis becomes increasingly critical. Cell segmentation, a key analysis step, significantly influences quantitative analysis outcomes. However, selecting the most effective segmentation method is challenging, hindered by existing evaluation methods' inaccuracies, lack of graded evaluation, and narrow assessment scope. Addressing this, we developed a novel framework with two modules: StyleGAN2-based contour generation and Pix2PixHD-based image rendering, producing diverse, graded-density cell images. Using this dataset, we evaluated three leading cell segmentation methods: DeepCell, CellProfiler, and CellPose. Our comprehensive comparison revealed CellProfiler's superior accuracy in segmenting cytoplasm and nuclei. Our framework diversifies cell image data generation and systematically addresses evaluation challenges in cell segmentation technologies, establishing a solid foundation for advancing research and applications in cell image analysis.
Affiliation(s)
- Jun Tang
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China
- MOE Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
- Wei Du
- MOE Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
- Zhanpeng Shu
- College of Electrical Engineering, Shanghai Dianji University, Shanghai, 201306, China
- Zhixing Cao
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China
4
Wang S, Shen Y, Zeng F, Wang M, Li B, Shen D, Tang X, Wang B. Exploiting biochemical data to improve osteosarcoma diagnosis with deep learning. Health Inf Sci Syst 2024; 12:31. [PMID: 38645838] [PMCID: PMC11026331] [DOI: 10.1007/s13755-024-00288-5] [Received: 12/08/2023] [Accepted: 03/05/2024] Open Access
Abstract
Early and accurate diagnosis of osteosarcoma (OS) is of great clinical significance, and machine learning (ML) based methods are increasingly adopted. However, current ML-based methods for osteosarcoma diagnosis consider only X-ray images, usually fail to generalize to new cases, and lack explainability. In this paper, we explore the capability of deep learning models to diagnose primary OS with higher accuracy, explainability, and generality. Concretely, we analyze the added value of integrating biochemical data, i.e., alkaline phosphatase (ALP) and lactate dehydrogenase (LDH), and design a model that incorporates the numerical features of ALP and LDH and the visual features of X-ray imaging through a late fusion approach in the feature space. We evaluate this model on real-world clinical data from 848 patients aged 4 to 81. The experimental results reveal the effectiveness of incorporating ALP and LDH simultaneously in a late fusion approach, with accuracy on the 2608 considered cases increasing to 97.17%, compared to 94.35% for the baseline. Grad-CAM visualizations consistent with orthopedic specialists further support the model's explainability.
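The late fusion step is simple to illustrate: each modality is embedded separately, and the embeddings are concatenated in feature space before the classification head. The shapes below (a 128-d image embedding, two standardised biochemical scalars) are assumptions for illustration, not the paper's actual dimensions.

```python
import numpy as np

def late_fuse(img_feat, biochem_feat):
    """Late fusion in feature space: embed each modality separately,
    then concatenate the embeddings before the classification head."""
    return np.concatenate([img_feat, biochem_feat], axis=-1)

# Hypothetical inputs: a 128-d X-ray embedding per patient plus the
# two biochemical markers (ALP, LDH) as z-scores.
img_feat = np.random.default_rng(1).normal(size=(4, 128))
biochem = np.array([[0.3, -1.2]] * 4)   # [ALP_z, LDH_z] per patient

fused = late_fuse(img_feat, biochem)
print(fused.shape)  # (4, 130)
```

The fused vectors would then feed a single classifier, which is what distinguishes late fusion in feature space from averaging two separate models' predictions.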
Affiliation(s)
- Shidong Wang
- Musculoskeletal Tumor Center, Peking University People’s Hospital, Beijing, China
- Yangyang Shen
- School of Computer Science and Technology, Southeast University, Nanjing, China
- Fanwei Zeng
- Musculoskeletal Tumor Center, Peking University People’s Hospital, Beijing, China
- Meng Wang
- College of Design and Innovation, Tongji University, Shanghai, China
- Bohan Li
- College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Ministry of Industry and Information Technology, Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, China
- National Engineering Laboratory for Integrated Aero-Space-Ground Ocean Big Data Application Technology, Xi’an, China
- Dian Shen
- School of Computer Science and Technology, Southeast University, Nanjing, China
- Xiaodong Tang
- Musculoskeletal Tumor Center, Peking University People’s Hospital, Beijing, China
- Beilun Wang
- School of Computer Science and Technology, Southeast University, Nanjing, China
5
Demirbaş AA, Üzen H, Fırat H. Spatial-attention ConvMixer architecture for classification and detection of gastrointestinal diseases using the Kvasir dataset. Health Inf Sci Syst 2024; 12:32. [PMID: 38685985] [PMCID: PMC11056348] [DOI: 10.1007/s13755-024-00290-x] [Received: 12/08/2023] [Accepted: 04/12/2024] Open Access
Abstract
Gastrointestinal (GI) disorders, encompassing conditions like cancer and Crohn's disease, pose a significant threat to public health. Endoscopic examinations have become crucial for diagnosing and treating these disorders efficiently. However, the subjective nature of manual evaluations by gastroenterologists can lead to errors in disease classification. In addition, the difficulty of identifying diseased tissue in the GI tract and the high similarity between classes make this a challenging problem. Automated classification systems that use artificial intelligence to solve these problems have gained traction, as automatic detection of diseases in medical images greatly aids diagnosis and reduces detection time. In this study, we propose a new architecture to enable research on computer-assisted diagnosis and automated disease detection in GI diseases. This architecture, called Spatial-Attention ConvMixer (SAC), extends the patch extraction technique at the core of the ConvMixer architecture with a spatial attention mechanism (SAM). The SAM enables the network to concentrate selectively on the most informative areas, assigning importance to each spatial location within the feature maps. We employ the Kvasir dataset to assess the accuracy of classifying GI illnesses with the SAC architecture, comparing its results with the Vanilla ViT, Swin Transformer, ConvMixer, MLPMixer, ResNet50, and SqueezeNet models. Our SAC method achieves 93.37% accuracy, while the other architectures achieve 79.52%, 74.52%, 92.48%, 63.04%, 87.44%, and 85.59%, respectively. The proposed spatial attention block thus improves the accuracy of the ConvMixer architecture on Kvasir, outperforming the state-of-the-art methods.
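A minimal sketch of a spatial attention mechanism in the common CBAM style, which the SAM described above resembles: pool the feature map across channels, turn the pooled maps into a per-pixel gate in (0, 1), and rescale the features. The fixed average below stands in for the learned convolution the paper would train, and the shapes are illustrative.

```python
import numpy as np

def spatial_attention(feat):
    """Simplified spatial attention: pool across channels, form a
    per-pixel weight in (0, 1), and rescale the feature map.
    A learned convolution over the pooled maps is replaced here by a
    fixed average for illustration."""
    avg = feat.mean(axis=0)                    # (H, W) channel-average pool
    mx = feat.max(axis=0)                      # (H, W) channel-max pool
    logits = (avg + mx) / 2.0                  # stand-in for the learned conv
    weights = 1.0 / (1.0 + np.exp(-logits))    # sigmoid gate
    return feat * weights[None, :, :]          # broadcast over channels

feat = np.random.default_rng(2).normal(size=(8, 16, 16))  # (C, H, W)
out = spatial_attention(feat)
print(out.shape)  # (8, 16, 16)
```

Because the gate is strictly between 0 and 1, the block can only attenuate features, emphasising informative spatial locations relative to the rest.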
Affiliation(s)
- Hüseyin Üzen
- Department of Computer Engineering, Faculty of Engineering, Bingol University, Bingol, Turkey
- Hüseyin Fırat
- Department of Computer Engineering, Faculty of Engineering, Dicle University, Diyarbakır, Turkey
6
Sirugue L, Langenfeld F, Lagarde N, Montes M. PLO3S: Protein LOcal Surficial Similarity Screening. Comput Struct Biotechnol J 2024; 26:1-10. [PMID: 38189058] [PMCID: PMC10770625] [DOI: 10.1016/j.csbj.2023.12.002] [Received: 11/14/2022] [Revised: 12/01/2023] [Accepted: 12/03/2023] Open Access
Abstract
The study of protein molecular surfaces enables better understanding and prediction of protein interactions. Various surface-comparison methods developed in computer vision can be applied to protein molecular surfaces. The present work proposes a method based on the Wave Kernel Signature: Protein LOcal Surficial Similarity Screening (PLO3S). The descriptor of the PLO3S method is a local surface shape descriptor projected on a unit sphere mapped onto a 2D plane, called Surface Wave Interpolated Maps (SWIM). PLO3S rapidly compares protein surface shapes through local comparisons, allowing large protein surface datasets to be filtered in protein structure virtual screening protocols.
Affiliation(s)
- Léa Sirugue
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Florent Langenfeld
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Nathalie Lagarde
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Matthieu Montes
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
7
Hosseini MS, Bejnordi BE, Trinh VQH, Chan L, Hasan D, Li X, Yang S, Kim T, Zhang H, Wu T, Chinniah K, Maghsoudlou S, Zhang R, Zhu J, Khaki S, Buin A, Chaji F, Salehi A, Nguyen BN, Samaras D, Plataniotis KN. Computational pathology: A survey review and the way forward. J Pathol Inform 2024; 15:100357. [PMID: 38420608] [PMCID: PMC10900832] [DOI: 10.1016/j.jpi.2023.100357] [Received: 10/15/2023] [Revised: 12/21/2023] [Accepted: 12/23/2023] Open Access
Abstract
Computational Pathology (CPath) is an interdisciplinary science that augments the development of computational approaches to analyze and model medical histopathology images. The main objective of CPath is to develop the infrastructure and workflows of digital diagnostics as an assistive CAD system for clinical pathology, facilitating transformational changes in the diagnosis and treatment of cancer. With ever-growing developments in deep learning and computer vision algorithms, and the ease of data flow from digital pathology, CPath is currently witnessing a paradigm shift. Despite the sheer volume of engineering and scientific work being introduced for cancer image analysis, there is still a considerable gap in adopting and integrating these algorithms into clinical practice. This raises a significant question regarding the direction and trends being undertaken in CPath. In this article we provide a comprehensive review of more than 800 papers to address the challenges faced from problem design all the way to application and implementation. We have catalogued each paper into a model card by examining the key works and challenges faced, to lay out the current landscape in CPath. We hope this helps the community locate relevant works and facilitates understanding of the field's future directions. In a nutshell, we view CPath developments as a cycle of stages that must be cohesively linked together to address the challenges associated with such a multidisciplinary science. We overview this cycle from the perspectives of data-centric, model-centric, and application-centric problems. We finally sketch the remaining challenges and provide directions for future technical developments and clinical integration of CPath. For updated information on this survey and access to the original model-card repository, please refer to GitHub; an updated version of this draft can also be found on arXiv.
Affiliation(s)
- Mahdi S Hosseini
- Department of Computer Science and Software Engineering (CSSE), Concordia University, Montreal, QC H3H 2R9, Canada
- Vincent Quoc-Huy Trinh
- Institute for Research in Immunology and Cancer of the University of Montreal, Montreal, QC H3T 1J4, Canada
- Lyndon Chan
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Danial Hasan
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Xingwen Li
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Stephen Yang
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Taehyo Kim
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Haochen Zhang
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Theodore Wu
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Kajanan Chinniah
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Sina Maghsoudlou
- Department of Computer Science and Software Engineering (CSSE), Concordia University, Montreal, QC H3H 2R9, Canada
- Ryan Zhang
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Jiadai Zhu
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Samir Khaki
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Andrei Buin
- Huron Digital Pathology, St. Jacobs, ON N0B 2N0, Canada
- Fatemeh Chaji
- Department of Computer Science and Software Engineering (CSSE), Concordia University, Montreal, QC H3H 2R9, Canada
- Ala Salehi
- Department of Electrical and Computer Engineering, University of New Brunswick, Fredericton, NB E3B 5A3, Canada
- Bich Ngoc Nguyen
- University of Montreal Hospital Center, Montreal, QC H2X 0C2, Canada
- Dimitris Samaras
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, United States
- Konstantinos N Plataniotis
- The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
8
Li G, Munawar A, Su Su Win N, Fan M, Zeeshan Nawaz M, Lin L. Multispectral breast image grayscale and quality enhancement by repeated pair image registration & accumulation method. Spectrochim Acta A Mol Biomol Spectrosc 2024; 320:124558. [PMID: 38870695] [DOI: 10.1016/j.saa.2024.124558] [Received: 09/21/2023] [Revised: 05/27/2024] [Accepted: 05/28/2024]
Abstract
Multispectral transmission imaging is now a focus for detecting breast cancer in its early stages, and frame accumulation is a promising technique for enhancing the grayscale level of multispectral transmission images. Still, during image acquisition, human respiration or camera jitter displaces frames in the sequence, which reduces the accuracy and image quality of the frame-accumulated image. In this article, we propose a new method, named "repeated pair image registration and accumulation", to resolve this issue. In this method, the first pair of images in the sequence is registered and accumulated, followed by the next pair. The two accumulated frames are then registered and accumulated again, and this process is repeated until all frames in the sequence have been processed and the final image is obtained. The method was tested on sequences of breast frames taken at 600 nm, 620 nm, 670 nm, and 760 nm wavelengths of light, and the enhancement of quality, accuracy, and grayscale was verified by various mathematical assessments. Furthermore, the processing time of the proposed method is very low because a gradient descent optimization algorithm is used for image registration. This optimization algorithm is faster than the alternatives, as verified by registering a single image at each wavelength with three different methods. This work lays a foundation for early detection of breast cancer using multispectral transmission imaging.
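The repeated pair registration and accumulation procedure described above is effectively a pairwise tree reduction over the frame sequence. The sketch below illustrates only the control flow: `register` is a placeholder where the paper applies gradient-descent-based registration, and the frames here are synthetic noisy copies of one ground-truth image.

```python
import numpy as np

def register(ref, mov):
    """Placeholder for intensity-based registration (the paper uses a
    gradient descent optimiser); here the frames are assumed aligned."""
    return mov

def pairwise_accumulate(frames):
    """Repeated pair registration & accumulation: register and accumulate
    adjacent pairs, then repeat on the results until one frame is left."""
    frames = list(frames)
    while len(frames) > 1:
        merged = []
        for i in range(0, len(frames) - 1, 2):
            aligned = register(frames[i], frames[i + 1])
            merged.append(frames[i] + aligned)
        if len(frames) % 2:           # odd frame carries over to next round
            merged.append(frames[-1])
        frames = merged
    return frames[0]

# Eight noisy copies of the same frame: accumulation raises the grayscale
# depth and averages out the sensor noise.
rng = np.random.default_rng(3)
truth = rng.uniform(0, 255, (4, 4))
stack = [truth + rng.normal(0, 5, truth.shape) for _ in range(8)]
accum = pairwise_accumulate(stack)
print(accum.shape)  # (4, 4)
```

Registering pairs of already-accumulated (hence less noisy) frames at each level is the design point: each registration sees a higher signal-to-noise input than registering raw frames one by one against a single reference.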
Affiliation(s)
- Gang Li
- Medical School of Tianjin University, Tianjin 300072, China; State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
- Adnan Munawar
- Medical School of Tianjin University, Tianjin 300072, China; State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
- Nan Su Su Win
- Medical School of Tianjin University, Tianjin 300072, China; State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
- Meiling Fan
- Medical School of Tianjin University, Tianjin 300072, China; State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
- Muhammad Zeeshan Nawaz
- Medical School of Tianjin University, Tianjin 300072, China; State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
- Ling Lin
- Medical School of Tianjin University, Tianjin 300072, China; State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
9
Yang S, Huang Q, Yu M. Advancements in remote sensing for active fire detection: A review of datasets and methods. Sci Total Environ 2024; 943:173273. [PMID: 38823698] [DOI: 10.1016/j.scitotenv.2024.173273] [Received: 01/10/2024] [Revised: 04/06/2024] [Accepted: 05/13/2024]
Abstract
This study comprehensively and critically reviews active fire detection advancements in remote sensing from 1975 to the present, focusing on two main perspectives: datasets and corresponding instruments, and detection algorithms. The study highlights the increasing role of machine learning, particularly deep learning techniques, in active fire detection. Looking forward, the review outlines current challenges and future research opportunities in remote sensing for active fire detection. These include exploring data quality management and multi-modal learning, developing spatiotemporally explicit models, investigating self-supervised learning models, improving explainable and interpretable models, integrating physical-process based models with machine learning, and building digital twins to replicate wildfire dynamics and perform what-if scenario analysis. The review aims to serve as a valuable resource for informing natural resource management and enhancing environmental protection efforts through the application of remote sensing technology.
Affiliation(s)
- Songxi Yang
- Spatial Computing and Data Mining Lab, Department of Geography, University of Wisconsin-Madison, Madison 53705, WI, USA
- Qunying Huang
- Spatial Computing and Data Mining Lab, Department of Geography, University of Wisconsin-Madison, Madison 53705, WI, USA
- Manzhu Yu
- Department of Geography, Pennsylvania State University, University Park, 16802, PA, USA
10
Zhou Y, He B, Cao X, Xiao Y, Feng Q, Yang F, Xiao F, Geng X, Du Y. Remotely sensed estimates of long-term biochemical oxygen demand over Hong Kong marine waters using machine learning enhanced by imbalanced label optimisation. Sci Total Environ 2024; 943:173748. [PMID: 38857793] [DOI: 10.1016/j.scitotenv.2024.173748] [Received: 12/19/2023] [Revised: 04/30/2024] [Accepted: 06/02/2024]
Abstract
In many coastal cities around the world, continuing water degradation threatens the living environment of humans and aquatic organisms. To assess and control water pollution, this study estimated the biochemical oxygen demand (BOD) concentration of Hong Kong's marine waters using remote sensing and an improved machine learning (ML) method. The scheme was derived from four ML algorithms (RBF, SVR, RF, XGB) and calibrated using a large amount (N > 1000) of in-situ BOD5 data. Based on labeled datasets with different preprocessing, i.e., the original BOD5, log10(BOD5), and label distribution smoothing (LDS), three types of models were trained and evaluated. The results highlight the superior potential of the LDS-based model to improve BOD5 estimates by dealing with an imbalanced training dataset. Additionally, XGB and RF outperformed RBF and SVR when the model was developed using log10(BOD5) or LDS(BOD5). Over two decades, the BOD5 concentration of Hong Kong marine waters in autumn (Sep. to Nov.) shows a downward trend, with significant decreases in Deep Bay, Western Buffer, Victoria Harbour, Eastern Buffer, Junk Bay, Port Shelter, and the Tolo Harbour and Channel. Principal component analysis revealed that nutrient levels were the predominant factor in Victoria Harbour and the interior of Deep Bay, while chlorophyll-related and physical parameters were dominant in Southern, Mirs Bay, Northwestern, and the outlet of Deep Bay. LDS provides a new perspective for improving ML-based water quality estimation by alleviating imbalance in the labeled dataset. Overall, the remotely sensed BOD5 can offer insight into the spatial-temporal distribution of organic matter in Hong Kong coastal waters and valuable guidance for pollution control.
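Label distribution smoothing, as used here to rebalance the BOD5 training labels, can be sketched as follows: bin the labels, convolve the empirical density with a Gaussian kernel, and weight each sample by the inverse smoothed density so rare label ranges count more in training. The bin count, kernel width, and the toy label distribution below are assumptions for illustration.

```python
import numpy as np

def lds_weights(labels, bins=20, sigma=2.0):
    """Label distribution smoothing: convolve the binned label density
    with a Gaussian kernel, then weight each sample by the inverse of
    the smoothed density at its label."""
    hist, edges = np.histogram(labels, bins=bins)
    k = np.arange(-3 * int(sigma), 3 * int(sigma) + 1)
    kernel = np.exp(-k**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    smooth = np.convolve(hist.astype(float), kernel, mode="same")
    idx = np.clip(np.digitize(labels, edges[1:-1]), 0, bins - 1)
    w = 1.0 / np.maximum(smooth[idx], 1e-6)
    return w / w.mean()               # normalise to mean weight 1

# Imbalanced BOD5-like labels: mostly low values, a few high ones.
rng = np.random.default_rng(4)
y = np.concatenate([rng.uniform(0.5, 2.0, 950), rng.uniform(5.0, 8.0, 50)])
w = lds_weights(y)
print(round(w.mean(), 6))  # 1.0
```

Samples in the sparse high-BOD5 range receive larger weights than those in the dense low range, which is the rebalancing effect exploited by the LDS-based model.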
Affiliation(s)
- Yadong Zhou
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
- Boayin He
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
- Xiaoyu Cao
- School of Geography and Ocean Science, Nanjing University, Nanjing 210023, China
- Yu Xiao
- Key Laboratory of Wetland Ecology and Environment, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Qi Feng
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
- Fan Yang
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Fei Xiao
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
- Xueer Geng
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yun Du
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
11
Liu G, Zhang J, Chan AB, Hsiao JH. Human attention guided explainable artificial intelligence for computer vision models. Neural Netw 2024; 177:106392. [PMID: 38788290] [DOI: 10.1016/j.neunet.2024.106392] [Received: 10/12/2023] [Revised: 05/11/2024] [Accepted: 05/13/2024]
Abstract
Explainable artificial intelligence (XAI) has been increasingly investigated to enhance the transparency of black-box artificial intelligence models, promoting better user understanding and trust. Developing an XAI that is faithful to models and plausible to users is both a necessity and a challenge. This work examines whether embedding human attention knowledge into saliency-based XAI methods for computer vision models could enhance their plausibility and faithfulness. Two novel XAI methods for object detection models, namely FullGrad-CAM and FullGrad-CAM++, were first developed to generate object-specific explanations by extending the current gradient-based XAI methods for image classification models. Using human attention as the objective plausibility measure, these methods achieve higher explanation plausibility. Interestingly, all current XAI methods when applied to object detection models generally produce saliency maps that are less faithful to the model than human attention maps from the same object detection task. Accordingly, human attention-guided XAI (HAG-XAI) was proposed to learn from human attention how to best combine explanatory information from the models to enhance explanation plausibility by using trainable activation functions and smoothing kernels to maximize the similarity between XAI saliency map and human attention map. The proposed XAI methods were evaluated on widely used BDD-100K, MS-COCO, and ImageNet datasets and compared with typical gradient-based and perturbation-based XAI methods. Results suggest that HAG-XAI enhanced explanation plausibility and user trust at the expense of faithfulness for image classification models, and it enhanced plausibility, faithfulness, and user trust simultaneously and outperformed existing state-of-the-art XAI methods for object detection models.
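The "objective plausibility measure" above compares a model saliency map against a human attention map. As an editorial illustration only (not the paper's implementation), a simple similarity proxy for such plausibility is the Pearson correlation between the two maps:

```python
import numpy as np

def plausibility_score(saliency: np.ndarray, human_attention: np.ndarray) -> float:
    """Pearson correlation between a model saliency map and a human
    attention map, used here as a simple plausibility proxy."""
    s = saliency.ravel().astype(float)
    h = human_attention.ravel().astype(float)
    s = (s - s.mean()) / (s.std() + 1e-12)
    h = (h - h.mean()) / (h.std() + 1e-12)
    return float(np.mean(s * h))

# A saliency map identical to the human attention map scores ~1.0.
rng = np.random.default_rng(0)
att = rng.random((8, 8))
print(round(plausibility_score(att, att), 6))  # -> 1.0
```

In practice, saliency-map studies often use several such agreement metrics (correlation, AUC, similarity); correlation is shown here only because it is the simplest to state.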
Affiliation(s)
- Guoyang Liu
- School of Integrated Circuits, Shandong University, Jinan, China; Department of Psychology, University of Hong Kong, Pokfulam Road, Hong Kong.
- Antoni B Chan
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong.
- Janet H Hsiao
- Division of Social Science, Hong Kong University of Science and Technology, Clearwater Bay, Hong Kong; Department of Psychology, University of Hong Kong, Pokfulam Road, Hong Kong.
12
Bugler H, Berto R, Souza R, Harris AD. Frequency and phase correction of GABA-edited magnetic resonance spectroscopy using complex-valued convolutional neural networks. Magn Reson Imaging 2024; 111:186-195. [PMID: 38744351 DOI: 10.1016/j.mri.2024.05.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2023] [Revised: 05/09/2024] [Accepted: 05/10/2024] [Indexed: 05/16/2024]
Abstract
PURPOSE To determine the significance of complex-valued inputs and complex-valued convolutions compared to real-valued inputs and real-valued convolutions in convolutional neural networks (CNNs) for frequency and phase correction (FPC) of GABA-edited magnetic resonance spectroscopy (MRS) data. METHODS An ablation study using simulated data was performed to determine the most effective input (real or complex) and convolution type (real or complex) to predict frequency and phase shifts in GABA-edited MEGA-PRESS data using CNNs. The best CNN model was subsequently compared using both simulated and in vivo data to two recently proposed deep learning (DL) methods for FPC of GABA-edited MRS. All methods were trained using the same experimental setup and evaluated using the signal-to-noise ratio (SNR) and linewidth of the GABA peak, choline artifact, and by visually assessing the reconstructed final difference spectrum. Statistical significance was assessed using the Wilcoxon signed rank test. RESULTS The ablation study showed that using complex values for the input represented by real and imaginary channels in our model input tensor, with complex convolutions was most effective for FPC. Overall, in the comparative study using simulated data, our CC-CNN model (that received complex-valued inputs with complex convolutions) outperformed the other models as evaluated by the mean absolute error. CONCLUSION Our results indicate that the optimal CNN configuration for GABA-edited MRS FPC uses a complex-valued input and complex convolutions. Overall, this model outperformed existing DL models.
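The design choice the study tests — complex-valued inputs with complex convolutions — rests on the identity (a+ib)(c+id) = (ac−bd) + i(ad+bc), which lets a complex convolution be built from four real ones. A minimal NumPy sketch of that construction (illustrative only, not the authors' CC-CNN):

```python
import numpy as np

def complex_conv1d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """'Valid' 1-D complex convolution built from four real convolutions:
    (a+ib)*(c+id) = (ac - bd) + i(ad + bc)."""
    a, b = x.real, x.imag
    c, d = w.real, w.imag
    real = np.convolve(a, c, mode="valid") - np.convolve(b, d, mode="valid")
    imag = np.convolve(a, d, mode="valid") + np.convolve(b, c, mode="valid")
    return real + 1j * imag

x = np.array([1 + 2j, 3 - 1j, 0 + 1j, 2 + 0j])  # toy complex signal
w = np.array([1 - 1j, 0 + 2j])                  # toy complex kernel
# Matches numpy's native complex convolution.
print(np.allclose(complex_conv1d(x, w), np.convolve(x, w, mode="valid")))  # True
```

Deep learning frameworks realize the same idea by storing real and imaginary parts as separate channels, exactly as the abstract's "real and imaginary channels in our model input tensor" describes.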
Affiliation(s)
- Hanna Bugler
- Department of Biomedical Engineering, University of Calgary, Canada; Department of Radiology, University of Calgary, Canada; Hotchkiss Brain Institute, University of Calgary, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Canada.
- Rodrigo Berto
- Department of Biomedical Engineering, University of Calgary, Canada; Department of Radiology, University of Calgary, Canada; Hotchkiss Brain Institute, University of Calgary, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Canada
- Roberto Souza
- Hotchkiss Brain Institute, University of Calgary, Canada; Department of Electrical and Software Engineering, University of Calgary, Canada
- Ashley D Harris
- Department of Radiology, University of Calgary, Canada; Hotchkiss Brain Institute, University of Calgary, Canada; Alberta Children's Hospital Research Institute, University of Calgary, Canada
13
Liu Z, Lv Q, Lee CH, Shen L. Segmenting medical images with limited data. Neural Netw 2024; 177:106367. [PMID: 38754215 DOI: 10.1016/j.neunet.2024.106367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2023] [Revised: 05/03/2024] [Accepted: 05/03/2024] [Indexed: 05/18/2024]
Abstract
While computer vision has proven valuable for medical image segmentation, its application faces challenges such as limited dataset sizes and the complexity of effectively leveraging unlabeled images. To address these challenges, we present a novel semi-supervised, consistency-based approach termed the data-efficient medical segmenter (DEMS). The DEMS features an encoder-decoder architecture and incorporates the developed online automatic augmenter (OAA) and residual robustness enhancement (RRE) blocks. The OAA augments input data with various image transformations, thereby diversifying the dataset to improve the generalization ability. The RRE enriches feature diversity and introduces perturbations to create varied inputs for different decoders, thereby providing enhanced variability. Moreover, we introduce a sensitive loss to further enhance consistency across different decoders and stabilize the training process. Extensive experimental results on both our own and three public datasets affirm the effectiveness of DEMS. Under extreme data shortage scenarios, our DEMS achieves 16.85% and 10.37% improvements in dice score over U-Net and the top-performing state-of-the-art method, respectively. Given its superior data efficiency, DEMS could represent a significant advance in medical segmentation under small data regimes. The project homepage can be accessed at https://github.com/NUS-Tim/DEMS.
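The dice score cited in these results is the standard overlap measure 2|A∩B| / (|A|+|B|) between predicted and ground-truth masks. A minimal sketch (not the authors' code):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
# |A| = 3, |B| = 3, |A∩B| = 2  ->  dice = 4/6
print(round(dice_score(a, b), 4))  # -> 0.6667
```

The epsilon keeps the score defined when both masks are empty, a common convention in segmentation evaluation.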
Affiliation(s)
- Zhaoshan Liu
- Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore.
- Qiujie Lv
- Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore; School of Intelligent Systems Engineering, Sun Yat-sen University, No. 66, Gongchang Road, Guangming District, 518107, China.
- Chau Hung Lee
- Department of Radiology, Tan Tock Seng Hospital, 11 Jalan Tan Tock Seng, Singapore, 308433, Singapore.
- Lei Shen
- Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore.
14
Liu L, Zhou B, Zhao Z, Liu Z. Active Dynamic Weighting for multi-domain adaptation. Neural Netw 2024; 177:106398. [PMID: 38805796 DOI: 10.1016/j.neunet.2024.106398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Revised: 03/11/2024] [Accepted: 05/19/2024] [Indexed: 05/30/2024]
Abstract
Multi-source unsupervised domain adaptation aims to transfer knowledge from multiple labeled source domains to an unlabeled target domain. Existing methods either seek a mixture of distributions across various domains or combine multiple single-source models for weighted fusion in the decision process, with little insight into the distributional discrepancy between different source domains and the target domain. Considering the discrepancies in global and local feature distributions between different domains and the complexity of obtaining category boundaries across domains, this paper proposes a novel Active Dynamic Weighting (ADW) for multi-source domain adaptation. Specifically, to effectively utilize the locally advantageous features in the source domains, ADW designs a multi-source dynamic adjustment mechanism during the training process to dynamically control the degree of feature alignment between each source and target domain in the training batch. In addition, to ensure the cross-domain categories can be distinguished, ADW devises a dynamic boundary loss to guide the model to focus on the hard samples near the decision boundary, which sharpens the decision boundary and improves the model's classification ability. Meanwhile, ADW applies active learning to multi-source unsupervised domain adaptation for the first time: guided by the dynamic boundary loss, it proposes an efficient importance sampling strategy to select hard target-domain samples for annotation within a minimal annotation budget, integrates this sampling into the training process, and further refines domain alignment at the category level. Experiments on various benchmark datasets consistently demonstrate the superiority of our method.
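The idea of selecting "hard samples near the decision boundary" under a small annotation budget can be illustrated with a toy margin-based selector. The top-2-probability margin criterion below is an assumption chosen for illustration, not the paper's exact importance sampling strategy:

```python
import numpy as np

def select_hard_samples(probs: np.ndarray, budget: int) -> np.ndarray:
    """Pick the `budget` samples whose top-2 class probabilities are
    closest, i.e. those nearest the decision boundary."""
    sorted_p = np.sort(probs, axis=1)
    margin = sorted_p[:, -1] - sorted_p[:, -2]  # top-1 minus top-2 probability
    return np.argsort(margin)[:budget]

probs = np.array([
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # near the boundary
    [0.50, 0.48, 0.02],  # nearest the boundary
])
print(select_hard_samples(probs, 2))  # -> [2 1]
```

Selected indices would then be sent to an annotator, and the newly labeled samples folded back into training.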
Affiliation(s)
- Long Liu
- Xi'an University of Technology, Xi'an, 710048, China.
- Bo Zhou
- Xi'an University of Technology, Xi'an, 710048, China.
- Zhipeng Zhao
- Xi'an University of Technology, Xi'an, 710048, China.
- Zening Liu
- Xi'an University of Technology, Xi'an, 710048, China.
15
Chen Y, Zheng S, Jin M, Chang Y, Wang N. DualFluidNet: An attention-based dual-pipeline network for fluid simulation. Neural Netw 2024; 177:106401. [PMID: 38805793 DOI: 10.1016/j.neunet.2024.106401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2024] [Revised: 04/14/2024] [Accepted: 05/19/2024] [Indexed: 05/30/2024]
Abstract
Fluid motion can be considered as a point cloud transformation when using the SPH method. Compared to traditional numerical analysis methods, using machine learning techniques to learn physics simulations can achieve near-accurate results, while significantly increasing efficiency. In this paper, we propose an innovative approach for 3D fluid simulations utilizing an Attention-based Dual-pipeline Network, which employs a dual-pipeline architecture, seamlessly integrated with an Attention-based Feature Fusion Module. Unlike previous methods, which often make difficult trade-offs between global fluid control and physical law constraints, we find a way to achieve a better balance between these two crucial aspects with a well-designed dual-pipeline approach. Additionally, we design a Type-aware Input Module to adaptively recognize particles of different types and perform feature fusion afterward, such that fluid-solid coupling issues can be better dealt with. Furthermore, we propose a new dataset, Tank3D, to further explore the network's ability to handle more complicated scenes. The experiments demonstrate that our approach not only attains a quantitative enhancement in various metrics, surpassing the state-of-the-art methods, but also signifies a qualitative leap in neural network-based simulation by faithfully adhering to the physical laws. Code and video demonstrations are available at https://github.com/chenyu-xjtu/DualFluidNet.
Affiliation(s)
- Yu Chen
- School of Software Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Shuai Zheng
- School of Software Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
- Menglong Jin
- School of Software Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Yan Chang
- School of Software Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Nianyi Wang
- School of Software Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
16
Kim Y, Li Y, Moitra A, Yin R, Panda P. Do we really need a large number of visual prompts? Neural Netw 2024; 177:106390. [PMID: 38805797 DOI: 10.1016/j.neunet.2024.106390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2023] [Revised: 05/03/2024] [Accepted: 05/12/2024] [Indexed: 05/30/2024]
Abstract
Due to increasing interest in adapting models on resource-constrained edges, parameter-efficient transfer learning has been widely explored. Among various methods, Visual Prompt Tuning (VPT), prepending learnable prompts to input space, shows competitive fine-tuning performance compared to training of full network parameters. However, VPT increases the number of input tokens, resulting in additional computational overhead. In this paper, we analyze the impact of the number of prompts on fine-tuning performance and self-attention operation in a vision transformer architecture. Through theoretical and empirical analysis we show that adding more prompts does not lead to linear performance improvement. Further, we propose a Prompt Condensation (PC) technique that aims to prevent performance degradation from using a small number of prompts. We validate our methods on FGVC and VTAB-1k tasks and show that our approach reduces the number of prompts by ∼70% while maintaining accuracy.
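The computational overhead motivating Prompt Condensation follows from self-attention's quadratic cost in sequence length: prepending P prompts to N tokens scales attention roughly as (N+P)². A back-of-the-envelope sketch (the FLOP formula is a deliberate simplification that counts only the two quadratic matrix products):

```python
def attention_flops(num_tokens: int, num_prompts: int, dim: int) -> int:
    """Rough multiply count for one self-attention layer: the QK^T and
    attention-weighted value products each cost n*n*dim for n tokens."""
    n = num_tokens + num_prompts
    return 2 * n * n * dim

base = attention_flops(196, 0, 768)    # e.g. ViT-B/16 patch tokens, no prompts
full = attention_flops(196, 100, 768)  # with 100 prompt tokens
cut = attention_flops(196, 30, 768)    # after a ~70% prompt reduction
print(round(full / base, 2), round(cut / base, 2))  # -> 2.28 1.33
```

Even in this crude model, condensing prompts from 100 to 30 removes most of the quadratic overhead, which is the regime the paper targets.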
Affiliation(s)
- Youngeun Kim
- Department of Electrical Engineering, Yale University, New Haven, CT, USA.
- Yuhang Li
- Department of Electrical Engineering, Yale University, New Haven, CT, USA
- Abhishek Moitra
- Department of Electrical Engineering, Yale University, New Haven, CT, USA
- Ruokai Yin
- Department of Electrical Engineering, Yale University, New Haven, CT, USA
17
Dong W, Liang Z, Wang L, Tian G, Long Q. Unsupervised domain adaptive segmentation algorithm based on two-level category alignment. Neural Netw 2024; 177:106399. [PMID: 38805794 DOI: 10.1016/j.neunet.2024.106399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2023] [Revised: 03/13/2024] [Accepted: 05/19/2024] [Indexed: 05/30/2024]
Abstract
To enhance the model's generalization ability in unsupervised domain adaptive segmentation tasks, most approaches have primarily focused on pixel-level local features but neglected the cues in category information. This limitation means the segmentation network learns only global inter-domain invariant features while ignoring category-specific inter-domain invariant features, which degrades segmentation performance. To address this issue, we present an Unsupervised Domain Adaptive algorithm based on two-level Category Alignment in two different spaces for semantic segmentation tasks, denoted as UDAca+. The first level is image-level category alignment based on class activation map (CAM), and the second is pixel-level category alignment based on pseudo labels. By utilizing category information, UDAca+ can effectively capture domain-invariant yet category-discriminative feature representations to improve segmentation accuracy. In addition, an adversarial learning-based strategy in the mixed domain is designed to train the proposed network. Moreover, a confidence calculation method is introduced to mitigate the misleading effects of negative transfer and over-alignment caused by noise in image-level pseudo labels. UDAca+ achieves state-of-the-art (SOTA) performance on two synthetic-to-real adaptation tasks, verifying its effectiveness for image segmentation.
Affiliation(s)
- Wenyong Dong
- School of Computer Science, Wuhan University, Wuhan, 430072, China; School of Information Network Security, Xinjiang University of Political Science and Law, Tumushuke, 843900, China.
- Zhixue Liang
- School of Computer Science, Wuhan University, Wuhan, 430072, China; School of Computer and Software, Nanyang Institute of Technology, Nanyang, 473000, China
- Liping Wang
- School of Computer Science, Wuhan University, Wuhan, 430072, China
- Gang Tian
- School of Computer Science, Wuhan University, Wuhan, 430072, China.
- Qianhui Long
- School of Computer Science, Wuhan University, Wuhan, 430072, China
18
Nejat F, Eghtedari S, Alimoradi F. Next-Generation Tear Meniscus Height Detecting and Measuring Smartphone-Based Deep Learning Algorithm Leads in Dry Eye Management. OPHTHALMOLOGY SCIENCE 2024; 4:100546. [PMID: 39051043 PMCID: PMC11268344 DOI: 10.1016/j.xops.2024.100546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/14/2024] [Revised: 04/16/2024] [Accepted: 04/29/2024] [Indexed: 07/27/2024]
Abstract
Purpose This study aims to develop and assess an infrastructure, built on Python-based deep learning code, for future diagnosis and management of dry eye disease (DED) using smartphone images. Design Cross-sectional study using data gathered at the Vision Health Research Clinic. Participants One thousand twenty-one eye images from 734 patients (70% female, 30% male) were included, with no sex or age restrictions. Methods One specialist captured eye images using Samsung A71 (601 images) and iPhone 11 (420 images) phones with the flashlight on and the patient gazing directly at the camera. Each image covers a single eye (left or right). Main Outcome Measures For the 80% of images used for training, the specialist first performed 3 separate segmentations of every eye image: eye, lower eyelid, and iris. For the 20% used for testing, after automated cropping of the lower eyelid margin and 8× upscaling, the appropriate tear meniscus height segmentation was selected and measured by a deep learning algorithm. Results The model was trained on 80% of the data, with the remaining 20% used for validation, drawing on both phones at their different resolutions. The dice coefficient of the trained model on the validation data is 98.68%, and the overall accuracy of the model is 95.39%. Conclusions This algorithm appears to hold the potential to transform the diagnosis and management of DED by home-care devices solely through smartphones. Financial Disclosures The author(s) have no proprietary or commercial interest in any materials discussed in this article.
Affiliation(s)
- Farhad Nejat
- Ophthalmic Department, Vision Health Research Center, Tehran, Iran
- Shima Eghtedari
- Ophthalmic Department, Vision Health Research Center, Tehran, Iran
- Fatemeh Alimoradi
- Electrical Department, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
19
Xu X, Chen Y, Yin H, Wang X, Zhang X. Nondestructive detection of SSC in multiple pear (Pyrus pyrifolia Nakai) cultivars using Vis-NIR spectroscopy coupled with the Grad-CAM method. Food Chem 2024; 450:139283. [PMID: 38615528 DOI: 10.1016/j.foodchem.2024.139283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2024] [Revised: 03/22/2024] [Accepted: 04/06/2024] [Indexed: 04/16/2024]
Abstract
Vis-NIR spectroscopy coupled with chemometric models is frequently used for pear soluble solid content (SSC) prediction. However, the model robustness is challenged by the variations in pear cultivars. This study explored the feasibility of developing universal models for predicting SSC of multiple pear varieties to improve the model's generalizability. The mature fruits of 6 pear cultivars with green skin (Pyrus pyrifolia Nakai cv. 'Cuiyu', 'Sucui No.1' and 'Cuiguan') and brown skin (Pyrus pyrifolia Nakai cv. 'Hosui','Syusui' and 'Wakahikari') were used to establish single-cultivar models and multi-cultivar universal models using convolutional neural network (CNN), partial least square (PLS), and support vector regression (SVR) approaches. Multi-cultivar universal models were built using full spectra and important variables extracted by gradient-weighted class activation mapping (Grad-CAM), respectively. The universal models based on important variables obtained satisfactory performances with RMSEPs of 0.76, 0.59, 0.80, 1.64, 0.98, and 1.03°Brix on 6 cultivars, respectively.
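Grad-CAM, used above to extract important spectral variables, weights each channel's activation map by its average gradient and sums over channels. A minimal 1-D sketch of that computation (illustrative only, not the study's pipeline; the toy activations and gradients are invented for the example):

```python
import numpy as np

def grad_cam_1d(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM for 1-D spectra: weight each channel's activation map by
    its average gradient, sum over channels, and clip negatives (ReLU)."""
    weights = gradients.mean(axis=1)              # one weight per channel
    cam = np.tensordot(weights, activations, 1)   # weighted sum -> (length,)
    cam = np.maximum(cam, 0.0)                    # ReLU
    return cam / (cam.max() + 1e-12)              # normalise to [0, 1]

acts = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0]])   # 2 channels x 3 bands
grads = np.array([[0.2, 0.2, 0.2], [0.0, 0.0, 0.0]])  # only channel 0 matters
cam = grad_cam_1d(acts, grads)
print(cam.argmax())  # -> 2: the most important spectral band
```

Thresholding such a map is one plausible way to pick the "important variables" the universal models were rebuilt on.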
Affiliation(s)
- Xin Xu
- College of Engineering, Nanjing Agricultural University, Nanjing 210031, China
- Yanyu Chen
- College of Engineering, Nanjing Agricultural University, Nanjing 210031, China
- Hao Yin
- College of Horticulture, Nanjing Agricultural University, Nanjing 210031, China
- Xiaochan Wang
- College of Engineering, Nanjing Agricultural University, Nanjing 210031, China
- Xiaolei Zhang
- College of Engineering, Nanjing Agricultural University, Nanjing 210031, China.
20
Nabi IR, Cardoen B, Khater IM, Gao G, Wong TH, Hamarneh G. AI analysis of super-resolution microscopy: Biological discovery in the absence of ground truth. J Cell Biol 2024; 223:e202311073. [PMID: 38865088 PMCID: PMC11169916 DOI: 10.1083/jcb.202311073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 04/02/2024] [Accepted: 05/21/2024] [Indexed: 06/13/2024] Open
Abstract
Super-resolution microscopy, or nanoscopy, enables the use of fluorescence-based molecular localization tools to study molecular structure at the nanoscale in the intact cell, bridging the mesoscale gap to classical structural biology methodologies. Analysis of super-resolution data by artificial intelligence (AI), such as machine learning, offers tremendous potential for the discovery of new biology that, by definition, is not known and lacks ground truth. Herein, we describe the application of weakly supervised paradigms to super-resolution microscopy and its potential to enable the accelerated exploration of the nanoscale architecture of subcellular macromolecules and organelles.
Affiliation(s)
- Ivan R. Nabi
- Department of Cellular and Physiological Sciences, Life Sciences Institute, University of British Columbia, Vancouver, Canada
- School of Biomedical Engineering, University of British Columbia, Vancouver, Canada
- Ben Cardoen
- School of Computing Science, Simon Fraser University, Burnaby, Canada
- Ismail M. Khater
- School of Computing Science, Simon Fraser University, Burnaby, Canada
- Department of Electrical and Computer Engineering, Faculty of Engineering and Technology, Birzeit University, Birzeit, Palestine
- Guang Gao
- Department of Cellular and Physiological Sciences, Life Sciences Institute, University of British Columbia, Vancouver, Canada
- Timothy H. Wong
- Department of Cellular and Physiological Sciences, Life Sciences Institute, University of British Columbia, Vancouver, Canada
- Ghassan Hamarneh
- School of Computing Science, Simon Fraser University, Burnaby, Canada
21
Chen Y, Liu Y, Wang C, Elliott M, Kwok CF, Peña-Solorzano C, Tian Y, Liu F, Frazer H, McCarthy DJ, Carneiro G. BRAIxDet: Learning to detect malignant breast lesion with incomplete annotations. Med Image Anal 2024; 96:103192. [PMID: 38810516 DOI: 10.1016/j.media.2024.103192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2022] [Revised: 12/30/2023] [Accepted: 05/02/2024] [Indexed: 05/31/2024]
Abstract
Methods to detect malignant lesions from screening mammograms are usually trained with fully annotated datasets, where images are labelled with the localisation and classification of cancerous lesions. However, real-world screening mammogram datasets commonly have a subset that is fully annotated and another subset that is weakly annotated with just the global classification (i.e., without lesion localisation). Given the large size of such datasets, researchers usually face a dilemma with the weakly annotated subset: to not use it or to fully annotate it. The first option will reduce detection accuracy because it does not use the whole dataset, and the second option is too expensive given that the annotation needs to be done by expert radiologists. In this paper, we propose a middle-ground solution for the dilemma, which is to formulate the training as a weakly- and semi-supervised learning problem that we refer to as malignant breast lesion detection with incomplete annotations. To address this problem, our new method comprises two stages, namely: (1) pre-training a multi-view mammogram classifier with weak supervision from the whole dataset, and (2) extending the trained classifier to become a multi-view detector that is trained with semi-supervised student-teacher learning, where the training set contains fully and weakly-annotated mammograms. We provide extensive detection results on two real-world screening mammogram datasets containing incomplete annotations and show that our proposed approach achieves state-of-the-art results in the detection of malignant breast lesions with incomplete annotations.
Affiliation(s)
- Yuanhong Chen
- Australian Institute for Machine Learning, The University of Adelaide, Adelaide, Australia.
- Yuyuan Liu
- Australian Institute for Machine Learning, The University of Adelaide, Adelaide, Australia
- Chong Wang
- Australian Institute for Machine Learning, The University of Adelaide, Adelaide, Australia.
- Michael Elliott
- Bioinformatics and Cellular Genomics, St Vincent's Institute of Medical Research, Melbourne, Australia
- Chun Fung Kwok
- Bioinformatics and Cellular Genomics, St Vincent's Institute of Medical Research, Melbourne, Australia
- Carlos Peña-Solorzano
- Bioinformatics and Cellular Genomics, St Vincent's Institute of Medical Research, Melbourne, Australia
- Yu Tian
- Australian Institute for Machine Learning, The University of Adelaide, Adelaide, Australia
- Fengbei Liu
- Australian Institute for Machine Learning, The University of Adelaide, Adelaide, Australia
- Helen Frazer
- St Vincent's Hospital Melbourne, Melbourne, Australia
- Davis J McCarthy
- Bioinformatics and Cellular Genomics, St Vincent's Institute of Medical Research, Melbourne, Australia; Melbourne Integrative Genomics, The University of Melbourne, Melbourne, Australia
- Gustavo Carneiro
- Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, United Kingdom
22
He W, Zhao B, Zhou Y, Wu R, Wu G, Li Y, Lu M, Zhu L, Gao Y. Freehand 3D Ultrasound Imaging Based on Probe-mounted Vision and IMU System. ULTRASOUND IN MEDICINE & BIOLOGY 2024; 50:1143-1154. [PMID: 38702284 DOI: 10.1016/j.ultrasmedbio.2024.03.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 03/24/2024] [Accepted: 03/31/2024] [Indexed: 05/06/2024]
Abstract
OBJECTIVES Freehand three-dimensional (3D) ultrasound (US) is of great significance for clinical diagnosis and treatment; it is often achieved with the aid of external devices (optical and/or electromagnetic, etc.) that monitor the location and orientation of the US probe. However, such external monitoring is often impacted by the imaging environment, for example by optical occlusions and/or electromagnetic (EM) interference. METHODS To address these issues, we integrated a binocular camera and an inertial measurement unit (IMU) on a US probe. We then built a tight coupling model using the unscented Kalman algorithm based on Lie groups (UKF-LG), combining visual and inertial information to infer the probe's movement, from which the position and orientation of each US image frame are calculated. Finally, the volume data were reconstructed with a voxel-based hole-filling method. RESULTS Experiments were conducted, including calibration, tracking performance evaluation, phantom scans, and real-scenario scans. The results show that the proposed system achieved an accumulated frame position error of 3.78 mm and an orientation error of 0.36°, and reconstructed high-quality 3D US images in both phantom and real scenarios. CONCLUSIONS The proposed method has been demonstrated to enhance the robustness and effectiveness of freehand 3D US. Follow-up research will focus on improving the accuracy and stability of multi-sensor fusion to make the system more practical in clinical environments.
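Filtering orientation "on Lie groups," as in UKF-LG, relies on the SO(3) exponential map to move between small rotation vectors and rotation matrices. An illustrative sketch of that map via Rodrigues' formula (a standard building block, not the authors' filter):

```python
import numpy as np

def so3_exp(omega: np.ndarray) -> np.ndarray:
    """Map a rotation vector to a rotation matrix via Rodrigues' formula,
    the SO(3) exponential used when filtering orientation on a Lie group."""
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)  # near-zero rotation
    k = omega / theta  # unit axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

R = so3_exp(np.array([0.0, 0.0, np.pi / 2]))  # 90 degrees about z
print(np.allclose(R @ R.T, np.eye(3)))  # True: result is a valid rotation
```

In a Lie-group UKF, sigma points for orientation are perturbed with such exponential-map increments rather than by adding Euler angles, which avoids singularities and keeps every estimate a valid rotation.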
Affiliation(s)
- Weizhen He
- School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
- Bingshuai Zhao
- School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
- Yongjin Zhou
- Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
- Ruodai Wu
- Department of Radiology, Shenzhen University General Hospital, Shenzhen University, Shenzhen, China
- Guangyao Wu
- Department of Radiology, Shenzhen University General Hospital, Shenzhen University, Shenzhen, China
- Ye Li
- Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Minhua Lu
- Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
- Yi Gao
- School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; Guangdong Key Laboratory of Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China; Shenzhen Key Laboratory of Precision Medicine for Hematological Malignancies, Shenzhen, China; Marshall Laboratory of Biomedical Engineering, Shenzhen, China.
23
Liu Z, Zhao Y, Zhan S, Liu Y, Chen R, He Y. PCDNF: Revisiting Learning-Based Point Cloud Denoising via Joint Normal Filtering. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:5419-5436. [PMID: 37405886 DOI: 10.1109/tvcg.2023.3292464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/07/2023]
Abstract
Point cloud denoising is a fundamental and challenging problem in geometry processing. Existing methods typically involve direct denoising of noisy input or filtering raw normals followed by point position updates. Recognizing the crucial relationship between point cloud denoising and normal filtering, we re-examine this problem from a multitask perspective and propose an end-to-end network called PCDNF for joint normal filtering-based point cloud denoising. We introduce an auxiliary normal filtering task to enhance the network's ability to remove noise while preserving geometric features more accurately. Our network incorporates two novel modules. First, we design a shape-aware selector to improve noise removal performance by constructing latent tangent space representations for specific points, taking into account learned point and normal features as well as geometric priors. Second, we develop a feature refinement module to fuse point and normal features, capitalizing on the strengths of point features in describing geometric details and normal features in representing geometric structures, such as sharp edges and corners. This combination overcomes the limitations of each feature type and better recovers geometric information. Extensive evaluations, comparisons, and ablation studies demonstrate that the proposed method outperforms state-of-the-art approaches in both point cloud denoising and normal filtering.
24
Zhou WY, Yuan L, Chen SY, Gao L, Hu SM. LC-NeRF: Local Controllable Face Generation in Neural Radiance Field. IEEE Transactions on Visualization and Computer Graphics 2024; 30:5437-5448. PMID: 37459257. DOI: 10.1109/tvcg.2023.3293653.
Abstract
3D face generation has achieved high visual quality and 3D consistency thanks to the development of neural radiance fields (NeRF). However, these methods model the whole face as a neural radiance field, which limits the controllability of the local regions. In other words, previous methods struggle to independently control local regions, such as the mouth, nose, and hair. To improve local controllability in NeRF-based face generation, we propose LC-NeRF, which is composed of a Local Region Generators Module (LRGM) and a Spatial-Aware Fusion Module (SAFM), allowing for geometry and texture control of local facial regions. The LRGM models different facial regions as independent neural radiance fields and the SAFM is responsible for merging multiple independent neural radiance fields into a complete representation. Finally, LC-NeRF enables the modification of the latent code associated with each individual generator, thereby allowing precise control over the corresponding local region. Qualitative and quantitative evaluations show that our method provides better local controllability than state-of-the-art 3D-aware face generation methods. A perception study reveals that our method outperforms existing state-of-the-art methods in terms of image quality, face consistency, and editing effects. Furthermore, our method exhibits favorable performance in downstream tasks, including real image editing and text-driven facial image editing.
25
Ma J, Wang P, Kong D, Wang Z, Liu J, Pei H, Zhao J. Robust Visual Question Answering: Datasets, Methods, and Future Challenges. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:5575-5594. PMID: 38358867. DOI: 10.1109/tpami.2024.3366154.
Abstract
Visual question answering (VQA) requires a system to provide an accurate natural language answer given an image and a natural language question. However, it is widely recognized that previous generic VQA methods often tend to memorize biases present in the training data rather than learning proper behaviors, such as grounding images before predicting answers. Therefore, these methods usually achieve high in-distribution but poor out-of-distribution performance. In recent years, various datasets and debiasing methods have been proposed to evaluate and enhance VQA robustness. This paper provides the first comprehensive survey focused on this emerging research area. Specifically, we first provide an overview of the development of datasets from in-distribution and out-of-distribution perspectives. Then, we examine the evaluation metrics employed by these datasets. Third, we propose a typology that presents the development process, similarities and differences, robustness comparison, and technical features of existing debiasing methods. Furthermore, we analyze and discuss the robustness of representative vision-and-language pre-training models on VQA. Finally, through a thorough review of the available literature and experimental analysis, we discuss key areas for future research from various viewpoints.
26
Ghislain F, Beaudelaire ST, Daniel T. An accurate unsupervised extraction of retinal vasculature using curvelet transform and classical morphological operators. Comput Biol Med 2024; 178:108801. PMID: 38917533. DOI: 10.1016/j.compbiomed.2024.108801.
Abstract
BACKGROUND Many ophthalmic disorders such as diabetic retinopathy and hypertension can be diagnosed early by analyzing changes related to the vascular structure of the retina. The accuracy and efficiency of retinal blood vessel segmentation are important parameters that can help the ophthalmologist better characterize the targeted anomalies. METHOD In this work, we propose a new method for accurate unsupervised automatic segmentation of retinal blood vessels based on a simple and effective combination of classical filters. Initially, the contrast of vessels in the retinal image is significantly improved by combining the Curvelet Transform with the commonly used Contrast-Limited Adaptive Histogram Equalization (CLAHE) technique. Afterwards, a Top-Hat morphological operator is applied to highlight the vascular network. Then, global Otsu thresholding, which minimizes intra-class variance, is applied for vessel detection. Finally, a cleanup operation based on a Matched Filter with First-Order Derivative of Gaussian (MF-FDOG) with fixed parameters is used to remove unwanted or isolated segments. We test the proposed method on images from the two publicly available databases, STARE and DRIVE. RESULTS In terms of sensitivity, specificity and accuracy, we achieve respective average performances of 0.7407, 0.9878 and 0.9667 on the DRIVE database, and 0.7028, 0.9755 and 0.9507 on the STARE database. CONCLUSIONS Compared to recent similar work, the obtained results are quite promising and can thus contribute to the optimization of automatic tools to aid in the diagnosis of eye disorders.
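The thresholding step in the pipeline above (Otsu's method, which picks the grey level minimizing intra-class variance, equivalently maximizing between-class variance) is standard enough to sketch. The snippet below is a generic pure-Python illustration, not the authors' implementation; the function name is our own.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximizes between-class
    variance, which is equivalent to minimizing intra-class variance."""
    n = len(pixels)
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_b = 0      # cumulative background weight
    sum_b = 0.0  # cumulative background intensity sum
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = n - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mu_b = sum_b / w_b                  # background mean
        mu_f = (total_sum - sum_b) / w_f    # foreground mean
        between = w_b * w_f * (mu_b - mu_f) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

On a strongly bimodal intensity distribution, such as dark background pixels and bright vessel pixels, the returned threshold cleanly separates the two modes.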
Affiliation(s)
- Feudjio Ghislain
- Unité de Recherche de Matière Condensée, d'Electronique et de Traitements du Signal (URMACETS), Department of Physics, Faculty of Science, University of Dschang, P.O.Box 67, Dschang, Cameroon; Unité de Recherche d'Automatique et d'Informatique Appliquée (URAIA), IUT-FV de Bandjoun, Université de Dschang-Cameroun, B.P. 134, Bandjoun, Cameroon.
- Saha Tchinda Beaudelaire
- Unité de Recherche d'Automatique et d'Informatique Appliquée (URAIA), IUT-FV de Bandjoun, Université de Dschang-Cameroun, B.P. 134, Bandjoun, Cameroon.
- Tchiotsop Daniel
- Unité de Recherche d'Automatique et d'Informatique Appliquée (URAIA), IUT-FV de Bandjoun, Université de Dschang-Cameroun, B.P. 134, Bandjoun, Cameroon.
27
Chen X, Liu Q, Deng HH, Kuang T, Lin HHY, Xiao D, Gateno J, Xia JJ, Yap PT. Improving Image Segmentation with Contextual and Structural Similarity. Pattern Recognition 2024; 152:110489. PMID: 38645435. PMCID: PMC11027435. DOI: 10.1016/j.patcog.2024.110489.
Abstract
Deep learning models for medical image segmentation are usually trained with voxel-wise losses, e.g., cross-entropy loss, focusing on unary supervision without considering inter-voxel relationships. This oversight potentially leads to semantically inconsistent predictions. Here, we propose a contextual similarity loss (CSL) and a structural similarity loss (SSL) to explicitly and efficiently incorporate inter-voxel relationships for improved performance. The CSL promotes consistency in predicted object categories for each image sub-region compared to ground truth. The SSL enforces compatibility between the predictions of voxel pairs by computing pair-wise distances between them, ensuring that voxels of the same class are close together whereas those from different classes are separated by a wide margin in the distribution space. The effectiveness of the CSL and SSL is evaluated using a clinical cone-beam computed tomography (CBCT) dataset of patients with various craniomaxillofacial (CMF) deformities and a public pancreas dataset. Experimental results show that the CSL and SSL outperform state-of-the-art regional loss functions in preserving segmentation semantics.
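The pair-wise idea behind the SSL (same-class voxels pulled together, different-class voxels pushed beyond a margin) can be illustrated with a toy contrastive pair loss. The function name, the hinge form, and the default margin below are our assumptions for illustration, not the paper's exact loss.

```python
import math

def structural_pair_loss(embeddings, labels, margin=2.0):
    """Toy pair-wise loss in the spirit of the SSL described above:
    same-class pairs are penalized by their squared distance;
    different-class pairs by a hinge on how far they fall short of
    the margin."""
    loss, pairs = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                loss += d ** 2                      # pull together
            else:
                loss += max(0.0, margin - d) ** 2   # push apart
            pairs += 1
    return loss / pairs
```

Co-located same-class points and well-separated different-class points incur zero loss; different-class points inside the margin are penalized.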
Affiliation(s)
- Xiaoyang Chen
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, 27599, NC, USA
- Qin Liu
- Department of Computer Science, University of North Carolina, Chapel Hill, 27599, NC, USA
- Hannah H. Deng
- Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, 77030, TX, USA
- Tianshu Kuang
- Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, 77030, TX, USA
- Henry Hung-Ying Lin
- Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, 77030, TX, USA
- Deqiang Xiao
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, 27599, NC, USA
- Jaime Gateno
- Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, 77030, TX, USA
- Department of Surgery (Oral and Maxillofacial Surgery), Weill Medical College, Cornell University, New York, 10065, NY, USA
- James J. Xia
- Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, 77030, TX, USA
- Department of Surgery (Oral and Maxillofacial Surgery), Weill Medical College, Cornell University, New York, 10065, NY, USA
- Pew-Thian Yap
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, 27599, NC, USA
28
Dong W, Zhu C, Xie D, Zhang Y, Tao S, Tian C. Image restoration for ring-array photoacoustic tomography system based on blind spatially rotational deconvolution. Photoacoustics 2024; 38:100607. PMID: 38665365. PMCID: PMC11044036. DOI: 10.1016/j.pacs.2024.100607.
Abstract
Ring-array photoacoustic tomography (PAT) systems have been widely used in noninvasive biomedical imaging. However, the reconstructed image usually suffers from spatially rotational blur and streak artifacts due to non-ideal imaging conditions. To improve reconstruction quality, we propose the concept of spatially rotational convolution to formulate the image blur process, build a regularized restoration model accordingly, and design an alternating minimization algorithm, called blind spatially rotational deconvolution, to obtain the restored image. We also present an image preprocessing method based on the proposed algorithm to remove the streak artifacts. Experiments on phantoms and in vivo biological tissues show that our approach can significantly enhance the resolution of images obtained from a ring-array PAT system and remove streak artifacts effectively.
Affiliation(s)
- Wende Dong
- College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China
- Key Laboratory of Space Photoelectric Detection and Perception (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing, Jiangsu 211106, China
- Chenlong Zhu
- College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China
- Key Laboratory of Space Photoelectric Detection and Perception (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing, Jiangsu 211106, China
- Dan Xie
- School of Engineering Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Yanli Zhang
- College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China
- Key Laboratory of Space Photoelectric Detection and Perception (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing, Jiangsu 211106, China
- Shuyin Tao
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094, China
- Chao Tian
- School of Engineering Science, University of Science and Technology of China, Hefei, Anhui 230026, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, Anhui 230088, China
- Anhui Province Key Laboratory of Biomedical Imaging and Intelligent Processing, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, Anhui 230088, China
29
Lamprou V, Kallipolitis A, Maglogiannis I. On the evaluation of deep learning interpretability methods for medical images under the scope of faithfulness. Computer Methods and Programs in Biomedicine 2024; 253:108238. PMID: 38823117. DOI: 10.1016/j.cmpb.2024.108238.
Abstract
BACKGROUND AND OBJECTIVE Evaluating the interpretability of deep learning models is crucial for building trust and gaining insights into their decision-making processes. In this work, we employ class activation map based attribution methods in a setting where only High-Resolution Class Activation Mapping (HiResCAM) is known to produce faithful explanations. The objective is to evaluate the quality of the attribution maps using quantitative metrics and investigate whether faithfulness aligns with the metrics results. METHODS We fine-tune pre-trained deep learning architectures over four medical image datasets in order to calculate attribution maps. The maps are evaluated against three well-established quantitative scores. RESULTS Our experimental findings suggest that the Area Over Perturbation Curve (AOPC) and Max-Sensitivity scores favor the HiResCAM maps. On the other hand, the Heatmap Assisted Accuracy Score (HAAS) does not provide insight into our comparison, as it evaluates almost all maps as inaccurate. For this purpose, we further compare our calculated values against values obtained from a diverse group of models trained on non-medical benchmark datasets, to obtain more responsive results. CONCLUSION This study develops a series of experiments to discuss the connection between faithfulness and quantitative metrics over medical attribution maps. HiResCAM preserves the gradient effect at the pixel level, ultimately producing high-resolution, informative and resilient mappings. In turn, this is reflected in the results of the AOPC and Max-Sensitivity metrics, which successfully identify the faithful algorithm. Regarding HAAS, our experiments indicate that it is sensitive to complex medical patterns, commonly characterized by strong color dependency and multiple attention areas.
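For readers unfamiliar with AOPC, the metric can be sketched as the average drop in model score as the inputs ranked most relevant by an attribution map are progressively perturbed. The version below (flat input, scalar score, our own function names) is a deliberately simplified illustration, not the exact formulation used in the paper.

```python
def aopc(model, x, relevance_order, num_steps, perturb_value=0.0):
    """Simplified Area Over the Perturbation Curve: average drop in
    the model's score after successively replacing the most relevant
    input elements (given by relevance_order) with perturb_value."""
    base = model(x)        # score on the unperturbed input
    x = list(x)            # work on a copy
    drops = 0.0
    for k in range(num_steps):
        x[relevance_order[k]] = perturb_value
        drops += base - model(x)
    return drops / num_steps
```

A faithful attribution map ranks truly influential inputs first, so perturbing them early produces large drops and a high AOPC.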
Affiliation(s)
- Vangelis Lamprou
- Department of Digital Systems, University of Piraeus, 80, M. Karaoli & A. Dimitriou St, Piraeus 18534, Greece
- Athanasios Kallipolitis
- Department of Digital Systems, University of Piraeus, 80, M. Karaoli & A. Dimitriou St, Piraeus 18534, Greece.
- Ilias Maglogiannis
- Department of Digital Systems, University of Piraeus, 80, M. Karaoli & A. Dimitriou St, Piraeus 18534, Greece
30
Guo W, Jin S, Li Y, Jiang Y. The dynamic-static dual-branch deep neural network for urban speeding hotspot identification using street view image data. Accident Analysis and Prevention 2024; 203:107636. PMID: 38776837. DOI: 10.1016/j.aap.2024.107636.
Abstract
Visual information from the road environment can influence drivers' perception and judgment, often resulting in frequent speeding incidents. Identifying speeding hotspots in cities can prevent potential speeding incidents, thereby improving traffic safety. We propose the Dual-Branch Contextual Dynamic-Static Feature Fusion Network, based on static panoramic images and dynamically changing sequence data, to capture global features of the macro scene of an area and dynamically changing information in the micro view, for more accurate identification of urban speeding hotspot areas. For the static branch, we propose the Multi-scale Contextual Feature Aggregation Network for learning global spatial contextual association information. In the dynamic branch, we construct the Multi-view Dynamic Feature Fusion Network to capture the dynamically changing features of a scene from a continuous sequence of street view images. Additionally, we design the Dynamic-Static Feature Correlation Fusion Structure to correlate and fuse dynamic and static features. The experimental results show that the model performs well, with an overall recognition accuracy of 99.4%. Ablation experiments show that recognition after the fusion of dynamic and static features is better than with the static or dynamic branch alone. The proposed model also shows better performance than other deep learning models. In addition, we combine image processing methods and different Class Activation Mapping (CAM) methods to extract speeding frequency visual features from the model's perception results. The results show that more accurate speeding frequency features can be obtained by using LayerCAM and GradCAM-Plus for static global scenes and dynamic local sequences, respectively. In the static global scene, the speeding frequency features are mainly concentrated on the buildings and green layout on both sides of the road, while in the dynamic scene, the speeding frequency features shift with scene changes and are mainly concentrated on the dynamically changing transition areas of greenery, roads, and surrounding buildings. The code and model used in this study are available at: https://github.com/gwt-ZJU/DCDSFF-Net.
Affiliation(s)
- Wentong Guo
- Polytechnic Institute & Institute of Intelligent Transportation Systems, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
- Sheng Jin
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China; Zhongyuan Institute, Zhejiang University, Zhengzhou 450000, China.
- Yiding Li
- Henan Institute of Advanced Technology, Zhengzhou University, Zhengzhou 450003, China
- Yang Jiang
- Polytechnic Institute & Institute of Intelligent Transportation Systems, Zhejiang University, Hangzhou 310058, China; Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
31
Bazargani R, Fazli L, Gleave M, Goldenberg L, Bashashati A, Salcudean S. Multi-scale relational graph convolutional network for multiple instance learning in histopathology images. Med Image Anal 2024; 96:103197. PMID: 38805765. DOI: 10.1016/j.media.2024.103197.
Abstract
Graph convolutional neural networks have shown significant potential in natural and histopathology images. However, their use has only been studied in a single magnification or multi-magnification with either homogeneous graphs or only different node types. In order to leverage the multi-magnification information and improve message passing with graph convolutional networks, we handle different embedding spaces at each magnification by introducing the Multi-Scale Relational Graph Convolutional Network (MS-RGCN) as a multiple instance learning method. We model histopathology image patches and their relation with neighboring patches and patches at other scales (i.e., magnifications) as a graph. We define separate message-passing neural networks based on node and edge types to pass the information between different magnification embedding spaces. We experiment on prostate cancer histopathology images to predict the grade groups based on the extracted features from patches. We also compare our MS-RGCN with multiple state-of-the-art methods with evaluations on several source and held-out datasets. Our method outperforms the state-of-the-art on all of the datasets and image types consisting of tissue microarrays, whole-mount slide regions, and whole-slide images. Through an ablation study, we test and show the value of the pertinent design features of the MS-RGCN.
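The core idea of relation-specific message passing, a separate transform per edge type, can be illustrated on scalar node features. Everything below (scalar weights, a dictionary of relation weights, the function name) is a deliberately minimal sketch of the general technique, not the MS-RGCN architecture itself.

```python
def relational_message_pass(h, edges, w_rel, w_self=1.0):
    """One step of relation-typed message passing on scalar node
    features: each relation r has its own weight w_rel[r], and the
    message along each directed edge (u, v, r) is accumulated into
    node v, alongside a self-loop term."""
    out = [w_self * x for x in h]        # self contribution
    for u, v, r in edges:
        out[v] += w_rel[r] * h[u]        # relation-specific message
    return out
```

In a multi-magnification graph, relations such as "same scale" versus "cross scale" would each get their own transform, which is the point of typed message passing.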
Affiliation(s)
- Roozbeh Bazargani
- Electrical and Computer Engineering, University of British Columbia, 2332 Main Mall, Vancouver, BC V6T 1Z4, Canada.
- Ladan Fazli
- The Vancouver Prostate Centre, 2660 Oak St, Vancouver, BC V6H 3Z6, Canada; Department of Urologic Sciences, University of British Columbia, 2775 Laurel Street, Vancouver, BC V5Z 1M9, Canada
- Martin Gleave
- The Vancouver Prostate Centre, 2660 Oak St, Vancouver, BC V6H 3Z6, Canada; Department of Urologic Sciences, University of British Columbia, 2775 Laurel Street, Vancouver, BC V5Z 1M9, Canada
- Larry Goldenberg
- The Vancouver Prostate Centre, 2660 Oak St, Vancouver, BC V6H 3Z6, Canada; Department of Urologic Sciences, University of British Columbia, 2775 Laurel Street, Vancouver, BC V5Z 1M9, Canada
- Ali Bashashati
- School of Biomedical Engineering, University of British Columbia, 2222 Health Sciences Mall, Vancouver, BC V6T 1Z3, Canada; Department of Pathology & Laboratory Medicine, University of British Columbia, 2211 Wesbrook Mall, Vancouver, BC V6T 1Z7, Canada.
- Septimiu Salcudean
- Electrical and Computer Engineering, University of British Columbia, 2332 Main Mall, Vancouver, BC V6T 1Z4, Canada; School of Biomedical Engineering, University of British Columbia, 2222 Health Sciences Mall, Vancouver, BC V6T 1Z3, Canada.
32
Su Q, He W, Wei X, Xu B, Li G. Multi-scale full spike pattern for semantic segmentation. Neural Netw 2024; 176:106330. PMID: 38688068. DOI: 10.1016/j.neunet.2024.106330.
Abstract
Spiking neural networks (SNNs), as brain-inspired neural networks, encode information in spatio-temporal dynamics. They have the potential to serve as low-power alternatives to artificial neural networks (ANNs) due to their sparse and event-driven nature. However, existing SNN-based models for pixel-level semantic segmentation tasks suffer from poor performance and high memory overhead, failing to fully exploit the computational effectiveness and efficiency of SNNs. To address these challenges, we propose the multi-scale and full spike segmentation network (MFS-Seg), which is based on directly trained deep SNNs and represents the first attempt to train a deep SNN with surrogate gradients for semantic segmentation. Specifically, we design an efficient fully-spike residual block (EFS-Res) to alleviate representation issues caused by spiking noise on different channels. EFS-Res utilizes depthwise separable convolution to improve the distributions of spiking feature maps. The visualization shows that our model can effectively extract the edge features of segmented objects. Furthermore, it can significantly reduce the memory overhead and energy consumption of the network. In addition, we theoretically analyze and prove that EFS-Res can avoid the degradation problem based on block dynamical isometry theory. Experimental results on the CamVid dataset, the DDD17 dataset, and the DSEC-Semantic dataset show that our model achieves comparable performance to the mainstream UNet network with up to 31× fewer parameters, while significantly reducing power consumption by over 13×. Overall, our MFS-Seg model demonstrates promising results in terms of performance, memory efficiency, and energy consumption, showcasing the potential of deep SNNs for semantic segmentation tasks. Our code is available at https://github.com/BICLab/MFS-Seg.
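The parameter savings of depthwise separable convolution, which EFS-Res relies on, are easy to quantify. The counting functions below are a generic back-of-the-envelope sketch (biases omitted, function names our own), not taken from the paper's code.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution (no bias)."""
    return c_in * k * k + c_in * c_out
```

For example, mapping 64 channels to 128 with 3×3 kernels costs 73,728 weights in the standard form but only 8,768 in the separable form, roughly an 8× reduction.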
Affiliation(s)
- Qiaoyi Su
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China.
- Weihua He
- Department of Precision Instrument, Tsinghua University, Beijing 100084, China.
- Xiaobao Wei
- Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Bo Xu
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Guoqi Li
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Institute of Automation, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Chinese Academy of Sciences, Beijing 100190, China.
33
Sun H, Wen Y, Feng H, Zheng Y, Mei Q, Ren D, Yu M. Unsupervised Bidirectional Contrastive Reconstruction and Adaptive Fine-Grained Channel Attention Networks for image dehazing. Neural Netw 2024; 176:106314. PMID: 38669785. DOI: 10.1016/j.neunet.2024.106314.
Abstract
Recently, unsupervised algorithms have achieved remarkable performance in image dehazing. However, the CycleGAN framework can lead to confusion in generator learning due to inconsistent data distributions, and the DisentGAN framework lacks effective constraints on generated images, resulting in the loss of image content details and color distortion. Moreover, Squeeze and Excitation channel attention employs only fully connected layers to capture global information, lacking interaction with local information, which results in inaccurate feature weight allocation for image dehazing. To solve the above problems, in this paper, we propose an Unsupervised Bidirectional Contrastive Reconstruction and Adaptive Fine-Grained Channel Attention Networks (UBRFC-Net). Specifically, an Unsupervised Bidirectional Contrastive Reconstruction Framework (BCRF) is proposed, aiming to establish bidirectional contrastive reconstruction constraints, not only to avoid the generator learning confusion in CycleGAN but also to enhance the constraint capability for clear images and the reconstruction ability of the unsupervised dehazing network. Furthermore, an Adaptive Fine-Grained Channel Attention (FCA) mechanism is developed that utilizes a correlation matrix to capture the correlation between global and local information at various granularities and promotes interaction between them, achieving more efficient feature weight assignment. Experimental results on challenging benchmark datasets demonstrate the superiority of our UBRFC-Net over state-of-the-art unsupervised image dehazing methods. This study successfully introduces an enhanced unsupervised image dehazing approach, addressing limitations of existing methods and achieving superior dehazing results. The source code is available at https://github.com/Lose-Code/UBRFC-Net.
Affiliation(s)
- Hang Sun
- Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang, 443002, China; College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, China
- Yang Wen
- Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang, 443002, China; College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, China
- Huijing Feng
- Department of Thoracic Oncology, Cancer Center, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Third Hospital of Shanxi Medical University, Taiyuan, 030002, China
- Yuelin Zheng
- Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang, 443002, China; College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, China.
- Qi Mei
- Department of Thoracic Oncology, Cancer Center, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Third Hospital of Shanxi Medical University, Taiyuan, 030002, China; Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China.
- Dong Ren
- Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang, 443002, China; College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, China
- Mei Yu
- Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang, 443002, China; College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, China
34
Xia Y, Liu Y, Li T, He S, Chang H, Wang Y, Zhang Y, Ge W. Assessing parameter efficient methods for pre-trained language model in annotating scRNA-seq data. Methods 2024; 228:12-21. PMID: 38759908. DOI: 10.1016/j.ymeth.2024.05.007.
Abstract
Annotating cell types of single-cell RNA sequencing (scRNA-seq) data is crucial for studying cellular heterogeneity in the tumor microenvironment. Recently, large-scale pre-trained language models (PLMs) have achieved significant progress in cell-type annotation of scRNA-seq data. This approach effectively addresses previous methods' shortcomings in performance and generalization. However, fine-tuning PLMs for different downstream tasks demands considerable computational resources, rendering it impractical. Hence, a new research branch introduces parameter-efficient fine-tuning (PEFT). This involves optimizing a few parameters while leaving the majority unchanged, leading to substantial reductions in computational expenses. Here, we utilize scBERT, a large-scale pre-trained model, to explore the capabilities of three PEFT methods in scRNA-seq cell type annotation. Extensive benchmark studies across several datasets demonstrate the superior applicability of PEFT methods. Furthermore, downstream analysis using models obtained through PEFT showcases their utility in novel cell type discovery and model interpretability for potential marker genes. Our findings underscore the considerable potential of PEFT in PLM-based cell type annotation, presenting novel perspectives for the analysis of scRNA-seq data.
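The parameter-efficient fine-tuning idea benchmarked here (freeze the pre-trained weights, train only a small added module) can be illustrated with a generic bottleneck-adapter sketch; the arrays and sizes below are stand-ins, not scBERT or the paper's actual PEFT methods:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pre-trained" layer (frozen) and a small bottleneck adapter (trainable).
W_frozen = rng.normal(size=(64, 64))   # pre-trained weights, never updated
A_down = np.zeros((64, 8))             # adapter down-projection (trainable)
A_up = np.zeros((8, 64))               # adapter up-projection; zero-init => identity at start

def layer(x):
    h = np.tanh(x @ W_frozen)          # frozen backbone computation
    return h + (h @ A_down) @ A_up     # residual adapter: only A_down/A_up would receive gradients

frozen_params = W_frozen.size
trainable_params = A_down.size + A_up.size
print(f"trainable fraction: {trainable_params / (frozen_params + trainable_params):.2f}")  # 0.20
```

Zero-initialising the up-projection makes the adapted layer start out identical to the frozen backbone, so fine-tuning begins from the pre-trained behaviour while optimising only a small fraction of the parameters.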
Affiliation(s)
- Yucheng Xia
  - Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu, 610209, China
- Yuhang Liu
  - School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Tianhao Li
  - School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Sihan He
  - School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Hong Chang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Yaqing Wang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Yongqing Zhang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Wenyi Ge
  - School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China.

35
Tan B, Qin H, Zhang X, Wang Y, Xiang T, Chen B. Using Multi-Level Consistency Learning for Partial-to-Partial Point Cloud Registration. IEEE Trans Vis Comput Graph 2024; 30:4881-4894. [PMID: 37235469 DOI: 10.1109/tvcg.2023.3280171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
Point cloud registration is a fundamental task in computer vision and computer graphics. Recently, deep learning-based end-to-end methods have made great progress in this field. One of the challenges for these methods is handling partial-to-partial registration tasks. In this work, we propose a novel end-to-end framework called MCLNet that makes full use of multi-level consistency for point cloud registration. First, point-level consistency is exploited to prune points located outside overlapping regions. Second, we propose a multi-scale attention module to perform consistency learning at the correspondence level for obtaining reliable correspondences. To further improve the accuracy of our method, we propose a novel scheme to estimate the transformation based on geometric consistency between correspondences. Experimental results show that our method performs well compared to baseline methods on smaller-scale data, especially with exact matches. The inference time and memory footprint of our method are relatively balanced, which is beneficial for practical applications.
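The final step of such registration pipelines, estimating a rigid transform from putative correspondences, has a classical closed-form solution via SVD (the Kabsch algorithm). A minimal sketch of that standard step, not MCLNet's geometric-consistency scheme itself:

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rotation R and translation t with R @ P_i + t ~ Q_i (Kabsch algorithm)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                  # 3x3 cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

# Sanity check: recover a known rotation/translation from exact correspondences.
rng = np.random.default_rng(1)
P = rng.normal(size=(50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.0, -2.0, 0.5])
Q = P @ R_true.T + t_true
R, t = rigid_transform(P, Q)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```

With noisy or partially wrong correspondences this closed-form solve is typically wrapped in an outlier-rejection loop, which is where learned consistency cues such as MCLNet's come in.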
36
Yuan W, Cheng J, Gong Y, He L, Zhang J. MACG-Net: Multi-axis cross gating network for deformable medical image registration. Comput Biol Med 2024; 178:108673. [PMID: 38905891 DOI: 10.1016/j.compbiomed.2024.108673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Revised: 04/18/2024] [Accepted: 05/26/2024] [Indexed: 06/23/2024]
Abstract
Deformable image registration is a fundamental yet vital task for preoperative planning, intraoperative information fusion, disease diagnosis and follow-ups. It solves for the non-rigid deformation field that aligns an image pair. Recent approaches such as VoxelMorph and TransMorph compute features from a simple concatenation of the moving and fixed images. However, this often leads to weak alignment. Moreover, convolutional neural network (CNN) and hybrid CNN-Transformer backbones are constrained by limited receptive fields and cannot capture long-range relations, while fully Transformer-based approaches are computationally expensive. In this paper, we propose a novel multi-axis cross gating network (MACG-Net) for deformable medical image registration, which combats these limitations. MACG-Net uses a dual-stream multi-axis feature fusion module to capture both long-range and local context relationships from the moving and fixed images. Cross gate blocks are integrated with the dual-stream backbone to consider both independent feature extraction in the moving-fixed image pair and the relationship between features from the image pair. We benchmark our method on several different datasets including 3D atlas-based brain MRI, inter-patient brain MRI and 2D cardiac MRI. The results demonstrate that the proposed method achieves state-of-the-art performance. The source code has been released at https://github.com/Valeyards/MACG.
Affiliation(s)
- Wei Yuan
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Jun Cheng
  - Institute for Infocomm Research, Agency for Science, Technology and Research, 138632, Singapore
- Yuhang Gong
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Ling He
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China.
- Jing Zhang
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China

37
Baldini C, Azam MA, Sampieri C, Ioppi A, Ruiz-Sevilla L, Vilaseca I, Alegre B, Tirrito A, Pennacchi A, Peretti G, Moccia S, Mattos LS. An automated approach for real-time informative frames classification in laryngeal endoscopy using deep learning. Eur Arch Otorhinolaryngol 2024; 281:4255-4264. [PMID: 38698163 PMCID: PMC11266252 DOI: 10.1007/s00405-024-08676-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2024] [Accepted: 04/08/2024] [Indexed: 05/05/2024]
Abstract
PURPOSE Informative image selection in laryngoscopy has the potential for improving automatic data extraction alone, for selective data storage and a faster review process, or in combination with other artificial intelligence (AI) detection or diagnosis models. This paper aims to demonstrate the feasibility of AI in providing automatic informative laryngoscopy frame selection that is also capable of working in real time, providing visual feedback to guide the otolaryngologist during the examination. METHODS Several deep learning models were trained and tested on an internal dataset (n = 5147 images) and then tested on an external test set (n = 646 images) composed of both white light and narrow band images. Four videos were used to assess the real-time performance of the best-performing model. RESULTS ResNet-50, pre-trained with the pretext strategy, reached a precision of 95% vs. 97%, a recall of 97% vs. 89%, and an F1-score of 96% vs. 93% on the internal and external test sets respectively (p = 0.062). The four testing videos are provided in the supplemental materials. CONCLUSION The deep learning model demonstrated excellent performance in identifying diagnostically relevant frames within laryngoscopic videos. With its solid accuracy and real-time capabilities, the system is promising for deployment in a clinical setting, either autonomously for objective quality control or in conjunction with other algorithms within a comprehensive AI toolset aimed at enhancing tumor detection and diagnosis.
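For reference, the F1-scores quoted above are the harmonic mean of precision and recall, which reproduces the reported figures:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.95, 0.97), 2))  # internal test set: 0.96
print(round(f1(0.97, 0.89), 2))  # external test set: 0.93
```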
Affiliation(s)
- Chiara Baldini
  - Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
  - Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Genoa, Italy
- Muhammad Adeel Azam
  - Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
  - Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Genoa, Italy
- Claudio Sampieri
  - Department of Experimental Medicine (DIMES), University of Genoa, Genoa, Italy.
  - Department of Otolaryngology, Hospital Clínic, C. de Villarroel, 170, 08029, Barcelona, Spain.
  - Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain.
- Laura Ruiz-Sevilla
  - Otorhinolaryngology Head-Neck Surgery Department, Hospital Universitari Joan XXIII de Tarragona, Tarragona, Spain
- Isabel Vilaseca
  - Department of Otolaryngology, Hospital Clínic, C. de Villarroel, 170, 08029, Barcelona, Spain
  - Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain
  - Translational Genomics and Target Therapies in Solid Tumors Group, Institut d'Investigacions Biomèdiques August Pi i Sunyer, IDIBAPS, Barcelona, Spain
  - Faculty of Medicine, University of Barcelona, Barcelona, Spain
- Berta Alegre
  - Department of Otolaryngology, Hospital Clínic, C. de Villarroel, 170, 08029, Barcelona, Spain
  - Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain
- Alessandro Tirrito
  - Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
  - Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Alessia Pennacchi
  - Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
  - Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Giorgio Peretti
  - Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
  - Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Sara Moccia
  - The BioRobotics Institute and Department of Excellence in Robotics and AI, Scuola Superiore Sant'Anna, Pisa, Italy
- Leonardo S Mattos
  - Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy

38
Dvoeglazova M, Sawada T. A role of rectangularity in perceiving a 3D shape of an object. Vision Res 2024; 221:108433. [PMID: 38772272 DOI: 10.1016/j.visres.2024.108433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 02/19/2024] [Accepted: 05/06/2024] [Indexed: 05/23/2024]
Abstract
Rectangularity and perpendicularity of contours are important properties of 3D shape for the visual system, and the visual system can use them as a priori constraints for perceiving shape veridically. The present article provides a comprehensive review of prior studies of the perception of rectangularity and perpendicularity and discusses their effects on 3D shape perception from both theoretical and empirical approaches. It has been shown that the visual system is biased to perceive a rectangular 3D shape from a 2D image. We thought that this bias might be attributable to the likelihood of a rectangular interpretation, but this hypothesis is not supported by the results of our psychophysical experiment. Note that the perception of a rectangular shape cannot be explained solely on the basis of geometry: a rectangular shape is perceived even from an image that is inconsistent with a rectangular interpretation. To address this issue, we developed a computational model that can recover a rectangular shape from an image of a parallelepiped. The model allows the recovered shape to be slightly inconsistent with the image so that the recovered shape satisfies the a priori constraints of maximum compactness and minimal surface area. This model captures some of the phenomena associated with the perception of rectangular shapes that were reported in prior studies. This finding suggests that rectangularity contributes to shape perception in combination with additional constraints.
Affiliation(s)
- Tadamasa Sawada
  - School of Psychology, HSE University, Moscow, Russia; Akian College of Science and Engineering, American University of Armenia, Yerevan, Armenia; Department of Psychology, Russian-Armenian (Slavonic) University, Yerevan, Armenia; European University of Armenia, Yerevan, Armenia

39
Zheng JW, Hsu JY, Li CC, Lin IC. Characteristic-Preserving Latent Space for Unpaired Cross-Domain Translation of 3D Point Clouds. IEEE Trans Vis Comput Graph 2024; 30:5212-5226. [PMID: 37339041 DOI: 10.1109/tvcg.2023.3287923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/22/2023]
Abstract
This article aims at unpaired shape-to-shape transformation for 3D point clouds, for instance, turning a chair to its table counterpart. Recent work for 3D shape transfer or deformation highly relies on paired inputs or specific correspondences. However, it is usually not feasible to assign precise correspondences or prepare paired data from two domains. A few methods start to study unpaired learning, but the characteristics of a source model may not be preserved after transformation. To overcome the difficulty of unpaired learning for transformation, we propose alternately training the autoencoder and translators to construct shape-aware latent space. This latent space based on novel loss functions enables our translators to transform 3D point clouds across domains and maintain the consistency of shape characteristics. We also crafted a test dataset to objectively evaluate the performance of point-cloud translation. The experiments demonstrate that our framework can construct high-quality models and retain more shape characteristics during cross-domain translation compared to the state-of-the-art methods. Moreover, we also present shape editing applications with our proposed latent space, including shape-style mixing and shape-type shifting, which do not require retraining a model.
40
Yang K, Li Q, Tian C, Zhang H, Shi A, Li J. DeforT: Deformable transformer for visual tracking. Neural Netw 2024; 176:106380. [PMID: 38754289 DOI: 10.1016/j.neunet.2024.106380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Revised: 04/19/2024] [Accepted: 05/06/2024] [Indexed: 05/18/2024]
Abstract
Most trackers formulate visual tracking as common classification and regression (i.e., bounding box regression) tasks. Correlation features that are computed through depth-wise convolution or channel-wise multiplication operations are input into both the classification and regression branches for inference. However, this matching computation with the linear correlation method tends to lose semantic features and obtain only a local optimum. Moreover, these trackers use an unreliable ranking based on the classification score and the intersection over union (IoU) loss for the regression training, thus degrading the tracking performance. In this paper, we introduce a deformable transformer model, which effectively computes the correlation features of the training and search sets. A new loss called the quality-aware focal loss (QAFL) is used to train the classification network; it efficiently alleviates the inconsistency between the classification and localization quality predictions. We use a new regression loss called α-GIoU to train the regression network, and it effectively improves localization accuracy. To further improve the tracker's robustness, the candidate object location is predicted by using a combination of online learning scores with a transformer-assisted framework and classification scores. An extensive experiment on six testing datasets demonstrates the effectiveness of our method. In particular, the proposed method attains a success score of 71.7% on the OTB-2015 dataset and an AUC score of 67.3% on the NFS30 dataset, respectively.
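The GIoU score underlying the α-GIoU regression loss mentioned above extends IoU with a penalty based on the smallest enclosing box, so disjoint boxes still produce a useful training signal. A minimal axis-aligned sketch (raising both terms to a power α follows the α-IoU formulation; this is an illustration, not the paper's exact loss):

```python
def giou(box_a, box_b, alpha=1.0):
    """GIoU score for (x1, y1, x2, y2) boxes; alpha > 1 gives a power-generalised variant.
    Returns a value in (-1, 1]; the corresponding loss is 1 - giou."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # smallest axis-aligned box enclosing both inputs
    c = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou**alpha - ((c - union) / c)**alpha

print(giou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes -> 1.0
print(giou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes -> negative
```

Plain IoU is zero for any pair of disjoint boxes, so its gradient vanishes; the enclosing-box penalty is what keeps regression informative in that regime.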
Affiliation(s)
- Kai Yang
  - School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, China; Hubei Luojia Laboratory, Wuhan 430200, China
- Qun Li
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
- Chunwei Tian
  - School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China; Yangtze River Delta Research Institute, Northwestern Polytechnical University, Taicang 215400, China
- Haijun Zhang
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
- Aiwu Shi
  - School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, China.
- Jinkai Li
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China.

41
Dai W, Wu T, Liu R, Wang M, Yin J, Liu J. Any region can be perceived equally and effectively on rotation pretext task using full rotation and weighted-region mixture. Neural Netw 2024; 176:106350. [PMID: 38723309 DOI: 10.1016/j.neunet.2024.106350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 01/15/2024] [Accepted: 04/28/2024] [Indexed: 06/17/2024]
Abstract
In recent years, self-supervised learning has emerged as a powerful approach to learning visual representations without requiring extensive manual annotation. One popular technique involves using rotation transformations of images, which provide a clear visual signal for learning semantic representation. However, in this work, we revisit the pretext task of predicting image rotation in self-supervised learning and discover that it tends to marginalise the perception of features located near the centre of an image. To address this limitation, we propose a new self-supervised learning method, namely FullRot, which spotlights underrated regions by resizing the randomly selected and cropped regions of images. Moreover, FullRot increases the complexity of the rotation pretext task by applying the degree-free rotation to the region cropped into a circle. To encourage models to learn from different general parts of an image, we introduce a new data mixture technique called WRMix, which merges two random intra-image patches. By combining these innovative crop and rotation methods with the data mixture scheme, our approach, FullRot + WRMix, surpasses the state-of-the-art self-supervision methods in classification, segmentation, and object detection tasks on ten benchmark datasets with an improvement of up to +13.98% accuracy on STL-10, +8.56% accuracy on CIFAR-10, +10.20% accuracy on Sports-100, +15.86% accuracy on Mammals-45, +15.15% accuracy on PAD-UFES-20, +32.44% mIoU on VOC 2012, +7.62% mIoU on ISIC 2018, +9.70% mIoU on FloodArea, +25.16% AP50 on VOC 2007, and +58.69% AP50 on UTDAC 2020. The code is available at https://github.com/anthonyweidai/FullRot_WRMix.
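The rotation pretext task that FullRot generalises is easiest to see in its classic discrete form: rotate each image by one of four angles and ask the network to predict which. A toy sketch of that baseline (FullRot instead applies degree-free rotation to circular crops, which this sketch does not capture):

```python
import numpy as np

def rotation_pretext_batch(images):
    """Build a self-supervised batch: each image appears rotated by 0/90/180/270
    degrees, labelled by the rotation index (the classic 4-way pretext task)."""
    xs, ys = [], []
    for img in images:
        for k in range(4):                 # k quarter-turns
            xs.append(np.rot90(img, k))
            ys.append(k)                   # label comes for free, no annotation needed
    return np.stack(xs), np.array(ys)

imgs = np.arange(2 * 8 * 8).reshape(2, 8, 8)
x, y = rotation_pretext_batch(imgs)
print(x.shape, y[:4])  # (8, 8, 8) [0 1 2 3]
```

Because the labels are generated from the transformation itself, the classifier must learn orientation-sensitive semantic features, which is the representation the downstream tasks then reuse.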
Affiliation(s)
- Wei Dai
  - Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China.
- Tianyi Wu
  - Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China.
- Rui Liu
  - Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China.
- Min Wang
  - Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China.
- Jianqin Yin
  - School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China.
- Jun Liu
  - Centre for Robotics and Automation, City University of Hong Kong, Hong Kong, China.

42
Motlagh SC, Joanisse M, Wang B, Mohsenzadeh Y. Unveiling the neural dynamics of conscious perception in rapid object recognition. Neuroimage 2024; 296:120668. [PMID: 38848982 DOI: 10.1016/j.neuroimage.2024.120668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 05/23/2024] [Accepted: 06/05/2024] [Indexed: 06/09/2024] Open
Abstract
Our brain excels at recognizing objects, even when they flash by in a rapid sequence. However, the neural processes determining whether a target image in a rapid sequence can be recognized or not remain elusive. We used electroencephalography (EEG) to investigate the temporal dynamics of brain processes that shape perceptual outcomes in these challenging viewing conditions. Using naturalistic images and advanced multivariate pattern analysis (MVPA) techniques, we probed the brain dynamics governing conscious object recognition. Our results show that although initially similar, the processes for when an object can or cannot be recognized diverge around 180 ms post-appearance, coinciding with feedback neural processes. Decoding analyses indicate that gist perception (partial conscious perception) can occur at ∼120 ms through feedforward mechanisms. In contrast, object identification (full conscious perception of the image) is resolved at ∼190 ms after target onset, suggesting involvement of recurrent processing. These findings underscore the importance of recurrent neural connections in object recognition and awareness in rapid visual presentations.
Affiliation(s)
- Saba Charmi Motlagh
  - Western Center for Brain and Mind, Western University, London, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Marc Joanisse
  - Western Center for Brain and Mind, Western University, London, Ontario, Canada; Department of Psychology, Western University, London, Ontario, Canada
- Boyu Wang
  - Western Center for Brain and Mind, Western University, London, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada; Department of Computer Science, Western University, London, Ontario, Canada
- Yalda Mohsenzadeh
  - Western Center for Brain and Mind, Western University, London, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada; Department of Computer Science, Western University, London, Ontario, Canada.

43
Ma B, Guo J, De Biase A, van Dijk LV, van Ooijen PMA, Langendijk JA, Both S, Sijtsema NM. PET/CT based transformer model for multi-outcome prediction in oropharyngeal cancer. Radiother Oncol 2024; 197:110368. [PMID: 38834153 DOI: 10.1016/j.radonc.2024.110368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 05/08/2024] [Accepted: 06/01/2024] [Indexed: 06/06/2024]
Abstract
BACKGROUND AND PURPOSE To optimize our previously proposed TransRP, a model integrating CNN (convolutional neural network) and ViT (Vision Transformer) designed for recurrence-free survival prediction in oropharyngeal cancer and to extend its application to the prediction of multiple clinical outcomes, including locoregional control (LRC), Distant metastasis-free survival (DMFS) and overall survival (OS). MATERIALS AND METHODS Data was collected from 400 patients (300 for training and 100 for testing) diagnosed with oropharyngeal squamous cell carcinoma (OPSCC) who underwent (chemo)radiotherapy at University Medical Center Groningen. Each patient's data comprised pre-treatment PET/CT scans, clinical parameters, and clinical outcome endpoints, namely LRC, DMFS and OS. The prediction performance of TransRP was compared with CNNs when inputting image data only. Additionally, three distinct methods (m1-3) of incorporating clinical predictors into TransRP training and one method (m4) that uses TransRP prediction as one parameter in a clinical Cox model were compared. RESULTS TransRP achieved higher test C-index values of 0.61, 0.84 and 0.70 than CNNs for LRC, DMFS and OS, respectively. Furthermore, when incorporating TransRP's prediction into a clinical Cox model (m4), a higher C-index of 0.77 for OS was obtained. Compared with a clinical routine risk stratification model of OS, our model, using clinical variables, radiomics and TransRP prediction as predictors, achieved larger separations of survival curves between low, intermediate and high risk groups. CONCLUSION TransRP outperformed CNN models for all endpoints. Combining clinical data and TransRP prediction in a Cox model achieved better OS prediction.
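The C-index values reported above measure rank agreement between predicted risk and observed time-to-event. A minimal, unoptimised sketch of Harrell's concordance index (an illustration, not the authors' evaluation code):

```python
def c_index(times, events, risks):
    """Harrell's concordance index: the fraction of comparable patient pairs in
    which the higher-risk patient experiences the event earlier.
    times: follow-up times; events: 1 = event observed, 0 = censored;
    risks: predicted risk scores (higher = worse prognosis)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable if i has an observed event before j's follow-up ends
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5      # ties count half
    return concordant / comparable

# perfectly concordant risks -> 1.0 (0.5 would be chance level)
print(c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))  # 1.0
```

So the reported test C-indices of 0.61 to 0.84 sit between chance-level ranking (0.5) and perfect ranking (1.0) of patients by outcome.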
Affiliation(s)
- Baoqiang Ma
  - Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands.
- Jiapan Guo
  - Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Machine Learning Lab, Data Science Center in Health (DASH), Groningen, the Netherlands; Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Groningen, the Netherlands
- Alessia De Biase
  - Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Machine Learning Lab, Data Science Center in Health (DASH), Groningen, the Netherlands
- Lisanne V van Dijk
  - Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Department of Radiation Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peter M A van Ooijen
  - Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Machine Learning Lab, Data Science Center in Health (DASH), Groningen, the Netherlands
- Johannes A Langendijk
  - Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Stefan Both
  - Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Nanna M Sijtsema
  - Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands

44
Lian S, Li Z. An end-to-end multi-task motor imagery EEG classification neural network based on dynamic fusion of spectral-temporal features. Comput Biol Med 2024; 178:108727. [PMID: 38897146 DOI: 10.1016/j.compbiomed.2024.108727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 05/18/2024] [Accepted: 06/07/2024] [Indexed: 06/21/2024]
Abstract
Electroencephalograph (EEG) brain-computer interfaces (BCI) have potential to provide new paradigms for controlling computers and devices. The accuracy of brain pattern classification in EEG BCI is directly affected by the quality of features extracted from EEG signals. Currently, feature extraction heavily relies on prior knowledge to engineer features (for example from specific frequency bands); therefore, better extraction of EEG features is an important research direction. In this work, we propose an end-to-end deep neural network that automatically finds and combines features for motor imagery (MI) based EEG BCI with 4 or more imagery classes (multi-task). First, spectral domain features of EEG signals are learned by compact convolutional neural network (CCNN) layers. Then, gated recurrent unit (GRU) neural network layers automatically learn temporal patterns. Lastly, an attention mechanism dynamically combines (across EEG channels) the extracted spectral-temporal features, reducing redundancy. We test our method using BCI Competition IV-2a and a data set we collected. The average classification accuracy on 4-class BCI Competition IV-2a was 85.1 % ± 6.19 %, comparable to recent work in the field and showing low variability among participants; average classification accuracy on our 6-class data was 64.4 % ± 8.35 %. Our dynamic fusion of spectral-temporal features is end-to-end and has relatively few network parameters, and the experimental results show its effectiveness and potential.
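The attention-based fusion step described above, weighting each channel's feature vector by a softmax score before summing, can be sketched as follows; the scoring vector here is random, standing in for learned parameters, and this is not the paper's network:

```python
import numpy as np

def attention_fuse(features):
    """Softmax attention over channels: features is (channels, dim); returns one
    fused vector in which redundant channels receive low weight."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=features.shape[1])        # stand-in for a learned scoring vector
    scores = features @ w                         # one relevance score per channel
    scores -= scores.max()                        # subtract max for numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()  # softmax over channels, sums to 1
    return attn @ features                        # convex combination, shape (dim,)

feats = np.random.default_rng(1).normal(size=(22, 16))  # e.g. 22 EEG channels, 16-d features
fused = attention_fuse(feats)
print(fused.shape)  # (16,)
```

Because the weights form a convex combination, the fused vector stays within the range of the per-channel features while down-weighting channels that contribute little.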
Affiliation(s)
- Shidong Lian
  - School of Systems Science, Beijing Normal University, Beijing, China; International Academic Center of Complex Systems, Beijing Normal University, Zhuhai, China
- Zheng Li
  - Center for Cognition and Neuroergonomics, State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Zhuhai, China; Department of Psychology, Faculty of Arts and Sciences, Beijing Normal University, Zhuhai, China.

45
Guo R, Wei J, Sun L, Yu B, Chang G, Liu D, Zhang S, Yao Z, Xu M, Bu L. A survey on advancements in image-text multimodal models: From general techniques to biomedical implementations. Comput Biol Med 2024; 178:108709. [PMID: 38878398 DOI: 10.1016/j.compbiomed.2024.108709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 06/01/2024] [Accepted: 06/03/2024] [Indexed: 07/24/2024]
Abstract
With the significant advancements of Large Language Models (LLMs) in the field of Natural Language Processing (NLP), the development of image-text multimodal models has garnered widespread attention. Current surveys on image-text multimodal models mainly focus on representative models or application domains, but lack a review on how general technical models influence the development of domain-specific models, which is crucial for domain researchers. Based on this, this paper first reviews the technological evolution of image-text multimodal models, from early explorations of feature space to visual language encoding structures, and then to the latest large model architectures. Next, from the perspective of technological evolution, we explain how the development of general image-text multimodal technologies promotes the progress of multimodal technologies in the biomedical field, as well as the importance and complexity of specific datasets in the biomedical domain. Then, centered on the tasks of image-text multimodal models, we analyze their common components and challenges. After that, we summarize the architecture, components, and data of general image-text multimodal models, and introduce the applications and improvements of image-text multimodal models in the biomedical field. Finally, we categorize the challenges faced in the development and application of general models into external factors and intrinsic factors, further refining them into 2 external factors and 5 intrinsic factors, and propose targeted solutions, providing guidance for future research directions. For more details and data, please visit our GitHub page: https://github.com/i2vec/A-survey-on-image-text-multimodal-models.
Affiliation(s)
- Ruifeng Guo, Jingxuan Wei, Linzhuang Sun, Bihui Yu, Guiyong Chang, Dawei Liu, Sibo Zhang, Zhengbing Yao, Mingjun Xu, Liping Bu
- Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang, 110168, China; University of Chinese Academy of Sciences, Beijing, 100049, China.
46
Quan Q, Yao Q, Zhu H, Wang Q, Zhou SK. Which images to label for few-shot medical image analysis? Med Image Anal 2024; 96:103200. [PMID: 38801797 DOI: 10.1016/j.media.2024.103200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Revised: 03/26/2024] [Accepted: 05/06/2024] [Indexed: 05/29/2024]
Abstract
The success of deep learning methodologies hinges upon the availability of meticulously labeled extensive datasets. However, when dealing with medical images, the annotation process for such abundant training data often necessitates the involvement of experienced radiologists, consuming their limited time. To alleviate this burden, few-shot learning approaches have been developed that achieve competitive performance with only several labeled images. Nevertheless, a crucial yet previously overlooked problem in few-shot learning is the selection of template images for annotation before learning, which affects the final performance. In this study, we propose a novel TEmplate Choosing Policy (TECP) that aims to identify and select "the most worthy" images for annotation, particularly within the context of multiple few-shot medical tasks, including landmark detection, anatomy detection, and anatomy segmentation. TECP is composed of four integral components: (1) self-supervised training, which entails training a pre-existing deep model to extract salient features from radiological images; (2) alternative proposals for localizing informative regions within the images; (3) representative score estimation, which involves the evaluation and identification of the most representative samples or templates; and (4) ranking, which ranks all candidates and selects the one with the highest representative score. The efficacy of the TECP approach is demonstrated through comprehensive experiments conducted on multiple public datasets. Across all three medical tasks, the utilization of TECP yields noticeable improvements in model performance.
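The representative-score-and-ranking idea in this abstract can be illustrated with a minimal sketch. The feature source, scoring rule, and function name below are hypothetical stand-ins, not the paper's implementation: pick the candidate whose feature vector is, on average, most similar to every other candidate's.

```python
import numpy as np

def select_template(features: np.ndarray) -> int:
    """Rank candidates by a simple representative score and return the winner.

    features: (N, D) array, one feature vector per unlabeled candidate image
    (e.g. from a self-supervised encoder). A candidate's score is its mean
    cosine similarity to all candidates; the highest-scoring one is chosen
    as the annotation template.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                 # (N, N) pairwise cosine similarities
    scores = sim.mean(axis=1)     # representative score per candidate
    return int(np.argmax(scores))
```

Under this toy scoring rule, a member of a tight cluster of similar images beats an outlier, so the chosen template sits near the "center" of the dataset's feature distribution.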
Affiliation(s)
- Quan Quan
- Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing, 100080, China; University of Chinese Academy of Sciences (UCAS), Beijing, 101408, China
- Qingsong Yao
- Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing, 100080, China; University of Chinese Academy of Sciences (UCAS), Beijing, 101408, China
- Heqin Zhu
- School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China (USTC), Hefei, 230026, China
- Qiyuan Wang
- School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China (USTC), Hefei, 230026, China
- S Kevin Zhou
- Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing, 100080, China; School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China (USTC), Hefei, 230026, China; Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advance Research, USTC, Suzhou, 215000, China; Key Laboratory of Precision and Intelligent Chemistry, USTC, Hefei, 230026, China.
47
Liu Z, Kainth K, Zhou A, Deyer TW, Fayad ZA, Greenspan H, Mei X. A review of self-supervised, generative, and few-shot deep learning methods for data-limited magnetic resonance imaging segmentation. NMR IN BIOMEDICINE 2024; 37:e5143. [PMID: 38523402 DOI: 10.1002/nbm.5143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Revised: 02/15/2024] [Accepted: 02/16/2024] [Indexed: 03/26/2024]
Abstract
Magnetic resonance imaging (MRI) is a ubiquitous medical imaging technology with applications in disease diagnostics, intervention, and treatment planning. Accurate MRI segmentation is critical for diagnosing abnormalities, monitoring diseases, and deciding on a course of treatment. With the advent of advanced deep learning frameworks, fully automated and accurate MRI segmentation is advancing. Traditional supervised deep learning techniques have advanced tremendously, reaching clinical-level accuracy in the field of segmentation. However, these algorithms still require a large amount of annotated data, which is oftentimes unavailable or impractical to obtain. One way to circumvent this issue is to utilize algorithms that exploit a limited amount of labeled data. This paper reviews such state-of-the-art algorithms that use a limited number of annotated samples. We explain the fundamental principles of self-supervised learning, generative models, few-shot learning, and semi-supervised learning and summarize their applications in cardiac, abdominal, and brain MRI segmentation. Throughout this review, we highlight algorithms that can be employed based on the quantity of annotated data available. We also present a comprehensive list of notable publicly available MRI segmentation datasets. To conclude, we discuss possible future directions of the field, including emerging algorithms such as contrastive language-image pretraining and potential combinations across the methods discussed, that can further increase the efficacy of image segmentation with limited labels.
Affiliation(s)
- Zelong Liu
- BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Komal Kainth
- BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Alexander Zhou
- BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Timothy W Deyer
- East River Medical Imaging, New York, New York, USA
- Department of Radiology, Cornell Medicine, New York, New York, USA
- Zahi A Fayad
- BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Diagnostic, Molecular, and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Hayit Greenspan
- BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Diagnostic, Molecular, and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Xueyan Mei
- BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Diagnostic, Molecular, and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
48
Wang Z, Wang P, Wang PS, Dong Q, Gao J, Chen S, Xin S, Tu C, Wang W. Neural-IMLS: Self-Supervised Implicit Moving Least-Squares Network for Surface Reconstruction. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:5018-5033. [PMID: 37289616 DOI: 10.1109/tvcg.2023.3284233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Surface reconstruction is a challenging task when input point clouds, especially real scans, are noisy and lack normals. Observing that the Multilayer Perceptron (MLP) and the implicit moving least-square function (IMLS) provide a dual representation of the underlying surface, we introduce Neural-IMLS, a novel approach that directly learns a noise-resistant signed distance function (SDF) from unoriented raw point clouds in a self-supervised manner. In particular, IMLS regularizes MLP by providing estimated SDFs near the surface and helps enhance its ability to represent geometric details and sharp features, while MLP regularizes IMLS by providing estimated normals. We prove that at convergence, our neural network produces a faithful SDF whose zero-level set approximates the underlying surface due to the mutual learning mechanism between the MLP and the IMLS. Extensive experiments on various benchmarks, including synthetic and real scans, show that Neural-IMLS can reconstruct faithful shapes even with noise and missing parts. The source code can be found at https://github.com/bearprin/Neural-IMLS.
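The IMLS side of the dual representation described above admits a compact closed form. The following sketch is a generic IMLS signed-distance estimate (not the paper's network; the function name and the sigma value are illustrative): oriented samples define an SDF as a Gaussian-weighted average of distances to each sample's tangent plane.

```python
import numpy as np

def imls_signed_distance(x, points, normals, sigma=0.2):
    """Implicit moving least-squares (IMLS) SDF estimate at a query point x.

    Each oriented sample (p_i, n_i) contributes the signed distance to its
    tangent plane, (x - p_i) . n_i, weighted by a Gaussian centred at x.
    points: (N, 3) surface samples; normals: (N, 3) unit normals.
    """
    d = x - points                                   # (N, 3) offsets to samples
    w = np.exp(-np.sum(d * d, axis=1) / sigma**2)    # Gaussian weights
    plane_dist = np.sum(d * normals, axis=1)         # per-sample plane distances
    return float(np.sum(w * plane_dist) / np.sum(w))
```

In the mutual-learning scheme the abstract describes, estimates of this kind near the surface supervise the MLP's SDF, while the MLP in turn supplies the normals that the IMLS term needs.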
49
Zhuang J, Zeng P, Zhuang W, Guo X, Liu P. Supervertex Sampling Network: A Geodesic Differential SLIC Approach for 3D Mesh. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:5553-5565. [PMID: 37440384 DOI: 10.1109/tvcg.2023.3294845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/15/2023]
Abstract
The analysis of 3D meshes with deep learning has become prevalent in computer graphics. As an essential structure, hierarchical representation is critical for mesh pooling in multiscale analysis. Existing clustering-based mesh hierarchy construction methods involve nonlinear discretization optimization operations, making them nondifferentiable and challenging to embed in other trainable networks for learning. Inspired by deep superpixel learning methods in image processing, we extend them from 2D images to 3D meshes by proposing a novel differentiable chart-based segmentation method named geodesic differential supervertex (GDSV). The key to the GDSV method is to ensure that the geodesic position updates are differentiable while satisfying the constraint that the renewed supervertices lie on the manifold surface. To this end, in addition to using the differential SLIC clustering algorithm to update the nonpositional features of the supervertices, a reparameterization trick, the Gumbel-Softmax trick, is employed to renew the geodesic positions of the supervertices. Therefore, the geodesic position update problem is converted into a linear matrix multiplication. The GDSV method can be an independent module for chart-based segmentation tasks. Meanwhile, it can be combined with the front-end feature learning network and the back-end task-specific network as a plug-in-plug-out module for training, and be applied to tasks such as shape classification, part segmentation, and 3D scene understanding. Experimental results show the excellent performance of our proposed algorithm on a range of datasets.
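The Gumbel-Softmax position renewal can be sketched as a relaxed one-hot selection over candidate vertex positions (the function name, temperature, and noise handling below are illustrative assumptions, not the paper's code): sampling soft weights keeps the update differentiable, while a low temperature concentrates the weight on a single mesh vertex so the renewed supervertex stays on the surface.

```python
import numpy as np

def gumbel_softmax_position(logits, vertex_pos, tau=0.1, seed=0):
    """Renew a supervertex position as a Gumbel-Softmax-weighted
    combination of candidate mesh vertex positions.

    logits: (N,) unnormalized scores over candidate vertices
    vertex_pos: (N, 3) candidate vertex coordinates
    Returns a (3,) position. As tau -> 0 the soft weights approach a
    one-hot selection, so the result snaps to a single mesh vertex.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(1e-9, 1.0, logits.shape)
    g = -np.log(-np.log(u))                   # Gumbel(0, 1) noise
    y = np.exp((logits + g) / tau)
    y /= y.sum()                              # relaxed one-hot weights
    return y @ vertex_pos                     # a linear matrix multiplication
```

The final line is the "linear matrix multiplication" the abstract refers to: once the weights are computed, the position update is just a weighted sum of vertex coordinates, which gradients flow through.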
50
Lin Y, Ma J, Sun DW, Cheng JH, Zhou C. Fast real-time monitoring of meat freshness based on fluorescent sensing array and deep learning: From development to deployment. Food Chem 2024; 448:139078. [PMID: 38527403 DOI: 10.1016/j.foodchem.2024.139078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2024] [Revised: 03/03/2024] [Accepted: 03/18/2024] [Indexed: 03/27/2024]
Abstract
A fluorescent sensor array (FSA) combined with deep learning (DL) techniques was developed for real-time monitoring of meat freshness, from development to deployment. The array was made up of copper metal nanoclusters (CuNCs) and fluorescent dyes and was capable of quantitative and qualitative detection of ammonia, dimethylamine, and trimethylamine gases with a low limit of detection (down to 131.56 ppb) over the range of 5-1000 ppm, as well as visual monitoring of the freshness of various meats stored at 4 °C. Moreover, SqueezeNet was applied to automatically identify the freshness level of meat from FSA images with high accuracy (98.17%) and was further deployed in various production environments, such as personal computers, mobile devices, and websites, using the Open Neural Network Exchange (ONNX) technique. The entire meat freshness recognition process takes only 5-7 s. Furthermore, gradient-weighted class activation mapping (Grad-CAM) and uniform manifold approximation and projection (UMAP) explanatory algorithms were used to improve the interpretability and transparency of SqueezeNet. Thus, this study presents a new approach for FSA assisted by DL in intelligent meat freshness monitoring, from development to deployment.
Affiliation(s)
- Yuandong Lin
- School of Food Science and Engineering, South China University of Technology, Guangzhou 510641, China; Academy of Contemporary Food Engineering, South China University of Technology, Guangzhou Higher Education Mega Centre, Guangzhou 510006, China; Engineering and Technological Research Centre of Guangdong Province on Intelligent Sensing and Process Control of Cold Chain Foods, & Guangdong Province Engineering Laboratory for Intelligent Cold Chain Logistics Equipment for Agricultural Products, Guangzhou Higher Education Mega Centre, Guangzhou 510006, China
- Ji Ma
- School of Food Science and Engineering, South China University of Technology, Guangzhou 510641, China; Academy of Contemporary Food Engineering, South China University of Technology, Guangzhou Higher Education Mega Centre, Guangzhou 510006, China; Engineering and Technological Research Centre of Guangdong Province on Intelligent Sensing and Process Control of Cold Chain Foods, & Guangdong Province Engineering Laboratory for Intelligent Cold Chain Logistics Equipment for Agricultural Products, Guangzhou Higher Education Mega Centre, Guangzhou 510006, China
- Da-Wen Sun
- School of Food Science and Engineering, South China University of Technology, Guangzhou 510641, China; Academy of Contemporary Food Engineering, South China University of Technology, Guangzhou Higher Education Mega Centre, Guangzhou 510006, China; Engineering and Technological Research Centre of Guangdong Province on Intelligent Sensing and Process Control of Cold Chain Foods, & Guangdong Province Engineering Laboratory for Intelligent Cold Chain Logistics Equipment for Agricultural Products, Guangzhou Higher Education Mega Centre, Guangzhou 510006, China; Food Refrigeration and Computerized Food Technology (FRCFT), Agriculture and Food Science Centre, University College Dublin, National University of Ireland, Belfield, Dublin 4, Ireland.
- Jun-Hu Cheng
- School of Food Science and Engineering, South China University of Technology, Guangzhou 510641, China; Academy of Contemporary Food Engineering, South China University of Technology, Guangzhou Higher Education Mega Centre, Guangzhou 510006, China; Engineering and Technological Research Centre of Guangdong Province on Intelligent Sensing and Process Control of Cold Chain Foods, & Guangdong Province Engineering Laboratory for Intelligent Cold Chain Logistics Equipment for Agricultural Products, Guangzhou Higher Education Mega Centre, Guangzhou 510006, China
- Chenyue Zhou
- School of Food Science and Engineering, South China University of Technology, Guangzhou 510641, China; Academy of Contemporary Food Engineering, South China University of Technology, Guangzhou Higher Education Mega Centre, Guangzhou 510006, China; Engineering and Technological Research Centre of Guangdong Province on Intelligent Sensing and Process Control of Cold Chain Foods, & Guangdong Province Engineering Laboratory for Intelligent Cold Chain Logistics Equipment for Agricultural Products, Guangzhou Higher Education Mega Centre, Guangzhou 510006, China