1
Zhang J, Qin Y, Tian R, Bai X, Liu J. Similarity measure method of near-infrared spectrum combined with multi-attribute information. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2024; 322:124783. [PMID: 38972098] [DOI: 10.1016/j.saa.2024.124783] [Received: 02/18/2024] [Revised: 07/01/2024] [Accepted: 07/03/2024]
Abstract
The high dimensionality, redundancy, and non-linearity of near-infrared (NIR) spectral data, together with sample attributes such as producing area and grade, all affect the similarity measure between samples. This paper proposes a t-distributed stochastic neighbor embedding algorithm based on the Sinkhorn distance (St-SNE) combined with multi-attribute data information. First, the Sinkhorn distance was introduced to address problems such as the asymmetry of KL divergence and sparse data distribution in high-dimensional space, thereby constructing probability distributions in the low-dimensional space that resemble those in the high-dimensional space. In addition, to account for the impact of multi-attribute sample features on the similarity measure, a multi-attribute distance matrix was constructed using information entropy and combined with the numerical matrix of spectral data to obtain a mixed data matrix. To validate the effectiveness of the St-SNE algorithm, dimensionality-reduction projections of NIR spectral data were compared with those of the PCA, LPP, and t-SNE algorithms. The results demonstrated that St-SNE effectively distinguishes samples with different attribute information and produces more distinct projection boundaries between sample categories in the low-dimensional space. We then tested the classification performance of St-SNE for different attributes on the tobacco and mango datasets, comparing it with the LPP, t-SNE, UMAP, and Fisher t-SNE algorithms; St-SNE achieved the highest classification accuracy for each attribute. Finally, when searching for the sample most similar to a target tobacco for cigarette formulas, St-SNE showed the highest consistency with expert recommendations among the compared algorithms. It can provide strong support for the maintenance and design of the product formula.
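The entropy-regularized optimal-transport distance underlying St-SNE can be computed with Sinkhorn's alternating scaling iterations over a Gibbs kernel. A minimal NumPy sketch (illustrative only; the cost matrix, regularization strength, and iteration count here are assumptions, not the paper's settings):

```python
import numpy as np

def sinkhorn_distance(C, a, b, eps=1.0, n_iter=500):
    """Entropy-regularized OT (Sinkhorn) distance between histograms a, b
    under cost matrix C. A generic sketch, not the paper's implementation."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                  # scale rows toward marginal a
        v = b / (K.T @ u)                # scale columns toward marginal b
    P = u[:, None] * K * v[None, :]      # regularized transport plan
    return np.sum(P * C), P
```

For a symmetric cost matrix the converged distance is symmetric in its arguments, and the plan's marginals match the input histograms, which is what the low-dimensional probability construction relies on.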
Affiliation(s)
- Jinfeng Zhang: College of Information Science and Technology, Qingdao University of Science and Technology, China
- Yuhua Qin: College of Information Science and Technology, Qingdao University of Science and Technology, China
- Rongkun Tian: College of Information Science and Technology, Qingdao University of Science and Technology, China
- Xiaoli Bai: R&D Center, China Tobacco Yunnan Industrial Co., Ltd, No. 367 Hongjin Road, Kunming 650231, China
- Jing Liu: R&D Center, China Tobacco Yunnan Industrial Co., Ltd, No. 367 Hongjin Road, Kunming 650231, China
2
Hans R, Sharma SK, Aickelin U. Optimised deep k-nearest neighbour's based diabetic retinopathy diagnosis (ODeep-NN) using retinal images. Health Inf Sci Syst 2024; 12:23. [PMID: 38469456] [PMCID: PMC10924814] [DOI: 10.1007/s13755-024-00282-x] [Received: 07/31/2023] [Accepted: 02/18/2024]
Abstract
Diabetes mellitus is regarded as one of the prime health issues of the present day; it can often lead to diabetic retinopathy, a complication of the disease that affects the eyes and causes loss of vision. To precisely detect the condition, clinicians must recognise the presence of lesions in colour fundus images, an arduous and time-consuming task. To deal with this problem, much work has been undertaken to develop deep learning-based computer-aided diagnosis systems that assist clinicians in accurately diagnosing diseases from medical images. However, the basic operations in deep learning models extract a bulky set of features, which in turn requires a long training period to predict the presence of the disease. For the effective execution of these models, feature selection becomes an important task, selecting the most appropriate features with the aim of increasing classification accuracy. This research presents an optimised deep k-nearest neighbours-based pipeline model that amalgamates the feature extraction capability of deep learning models with nature-inspired metaheuristic algorithms, using the k-nearest neighbour algorithm for classification. The proposed model attains accuracies of 97.67% and 98.05% on the two datasets considered, outperforming the ResNet50 and AlexNet deep learning models. Additionally, the experimental results include an analysis of five nature-inspired metaheuristic algorithms considered for feature selection, on the basis of various evaluation parameters.
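The final classification stage described above, k-NN over a selected feature subset, can be sketched as follows; the boolean `mask` is a hypothetical stand-in for the subset chosen by the nature-inspired metaheuristic search:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, mask, k=3):
    """k-NN classification restricted to the selected feature columns.
    `mask` mimics a metaheuristic's feature-selection output (illustrative)."""
    Xtr, Xte = X_train[:, mask], X_test[:, mask]
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]           # indices of k nearest neighbours
    votes = y_train[idx]
    return np.array([np.bincount(v).argmax() for v in votes])  # majority vote
```

In a full pipeline the columns of `X_train` would be deep features extracted by a pretrained network, with `mask` tuned by the metaheuristic to maximize validation accuracy.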
Affiliation(s)
- Rahul Hans: Department of Computer Science and Engineering, DAV University, Jalandhar, Punjab, India
- Sanjeev Kumar Sharma: Department of Computer Science and Applications, DAV University, Jalandhar, Punjab, India
- Uwe Aickelin: School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
3
Tan HQ, Cai J, Tay SH, Sim AY, Huang L, Chua ML, Tang Y. Cluster-based radiomics reveal spatial heterogeneity of bevacizumab response for treatment of radiotherapy-induced cerebral necrosis. Comput Struct Biotechnol J 2024; 23:43-51. [PMID: 38125298] [PMCID: PMC10730953] [DOI: 10.1016/j.csbj.2023.11.040] [Received: 08/02/2023] [Revised: 11/21/2023] [Accepted: 11/21/2023]
Abstract
Background: Bevacizumab is used in the treatment of radiation necrosis (RN), a debilitating toxicity following head and neck radiotherapy. However, there is no biomarker to predict whether a patient will respond to bevacizumab. Purpose: We aimed to develop a cluster-based radiomics approach to characterize the spatial heterogeneity of RN and map its response to bevacizumab. Methods: 118 consecutive nasopharyngeal carcinoma patients diagnosed with RN were enrolled. We divided the 152 lesions from these patients into 101 for training and 51 for validation. We extracted voxel-level radiomics features from each lesion segmented on T1-weighted+contrast and T2 FLAIR sequences of pre- and post-bevacizumab magnetic resonance images, followed by a three-step analysis involving individual- and population-level clustering and then delta-radiomics, to derive five radiomics clusters within the lesions. We tested the association of each cluster with response to bevacizumab and developed a clinico-radiomics model using clinical predictors and cluster-specific features. Results: 71 (70.3%) and 34 (66.7%) lesions responded to bevacizumab in the training and validation datasets, respectively. Two radiomics clusters were spatially mapped to the edema region, and their volume changes were significantly associated with bevacizumab response (OR: 11.12 [95% CI: 2.54-73.47], P = 0.004; and 1.63 [1.07-2.78], P = 0.042). The combined clinico-radiomics model based on textural features extracted from the most significant cluster improved the prediction of bevacizumab response compared with a clinical-only model (AUC: 0.755 [0.645-0.865] to 0.852 [0.764-0.940], training; 0.708 [0.554-0.861] to 0.816 [0.699-0.933], validation). Conclusion: Our radiomics approach yielded intralesional resolution, enabling more refined feature selection for predicting bevacizumab efficacy in the treatment of RN.
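The population-level clustering step that groups voxel-level radiomics features into intralesional clusters can be illustrated with plain k-means. This is a generic sketch under assumed feature dimensions, not the authors' pipeline:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means: a stand-in for the population-level clustering that
    partitions voxel-level radiomics feature vectors into k clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                  # assign each voxel to a cluster
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

Per-cluster summaries (e.g., volume change across pre-/post-treatment scans) would then be the candidate predictors tested against response.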
Affiliation(s)
- Hong Qi Tan: Division of Radiation Oncology, National Cancer Centre Singapore, Singapore
- Jinhua Cai: Department of Neurology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China; Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China; Guangdong Provincial Key Laboratory of Brain Function and Disease, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, People's Republic of China
- Shi Hui Tay: Division of Medical Sciences, National Cancer Centre Singapore, Singapore
- Adelene Y.L. Sim: Division of Medical Sciences, National Cancer Centre Singapore, Singapore
- Luo Huang: Department of Radiation Oncology, Chongqing University Cancer Hospital, People's Republic of China
- Melvin L.K. Chua: Division of Radiation Oncology, National Cancer Centre Singapore, Singapore; Division of Medical Sciences, National Cancer Centre Singapore, Singapore; Oncology Academic Programme, Duke-NUS Medical School, Singapore
- Yamei Tang: Department of Neurology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China; Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, People's Republic of China; Guangdong Provincial Key Laboratory of Brain Function and Disease, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, People's Republic of China
4
Tang J, Du W, Shu Z, Cao Z. A generative benchmark for evaluating the performance of fluorescent cell image segmentation. Synth Syst Biotechnol 2024; 9:627-637. [PMID: 38798889] [PMCID: PMC11127598] [DOI: 10.1016/j.synbio.2024.05.005] [Received: 12/25/2023] [Revised: 04/13/2024] [Accepted: 05/08/2024]
Abstract
Fluorescent cell imaging technology is fundamental in life science research, offering a rich source of image data crucial for understanding cell spatial positioning, differentiation, and decision-making mechanisms. As the volume of this data expands, precise image analysis becomes increasingly critical. Cell segmentation, a key analysis step, significantly influences quantitative analysis outcomes. However, selecting the most effective segmentation method is challenging, hindered by existing evaluation methods' inaccuracies, lack of graded evaluation, and narrow assessment scope. Addressing this, we developed a novel framework with two modules: StyleGAN2-based contour generation and Pix2PixHD-based image rendering, producing diverse, graded-density cell images. Using this dataset, we evaluated three leading cell segmentation methods: DeepCell, CellProfiler, and CellPose. Our comprehensive comparison revealed CellProfiler's superior accuracy in segmenting cytoplasm and nuclei. Our framework diversifies cell image data generation and systematically addresses evaluation challenges in cell segmentation technologies, establishing a solid foundation for advancing research and applications in cell image analysis.
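Benchmarks like this score segmentation methods against generated ground truth using overlap metrics. A minimal sketch of two standard ones, Dice and IoU (the paper's exact evaluation criteria may differ):

```python
import numpy as np

def dice(seg, gt):
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(seg, gt).sum()
    return 2.0 * inter / (seg.sum() + gt.sum())

def iou(seg, gt):
    """Intersection-over-union between two binary masks."""
    inter = np.logical_and(seg, gt).sum()
    union = np.logical_or(seg, gt).sum()
    return inter / union
```

With generated images of graded density, these per-mask scores can be aggregated per density level to expose where each tool (e.g., DeepCell, CellProfiler, CellPose) degrades.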
Affiliation(s)
- Jun Tang: State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China; MOE Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
- Wei Du: MOE Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
- Zhanpeng Shu: College of Electrical Engineering, Shanghai Dianji University, Shanghai, 201306, China
- Zhixing Cao: State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China
5
Wang S, Shen Y, Zeng F, Wang M, Li B, Shen D, Tang X, Wang B. Exploiting biochemical data to improve osteosarcoma diagnosis with deep learning. Health Inf Sci Syst 2024; 12:31. [PMID: 38645838] [PMCID: PMC11026331] [DOI: 10.1007/s13755-024-00288-5] [Received: 12/08/2023] [Accepted: 03/05/2024]
Abstract
Early and accurate diagnosis of osteosarcoma (OS) is of great clinical significance, and machine learning (ML)-based methods are increasingly adopted. However, current ML-based methods for osteosarcoma diagnosis consider only X-ray images, usually fail to generalize to new cases, and lack explainability. In this paper, we explore the capability of deep learning models to diagnose primary OS with higher accuracy, explainability, and generality. Concretely, we analyze the added value of integrating biochemical data, i.e., alkaline phosphatase (ALP) and lactate dehydrogenase (LDH), and design a model that incorporates the numerical ALP and LDH features and the visual features of X-ray imaging through a late fusion approach in the feature space. We evaluate this model on real-world clinical data from 848 patients aged 4 to 81. The experimental results reveal the effectiveness of incorporating ALP and LDH simultaneously in a late fusion approach: accuracy on the 2608 considered cases increased to 97.17%, compared with 94.35% for the baseline. Grad-CAM visualizations consistent with orthopedic specialists' assessments further support the model's explainability.
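Late fusion in the feature space, as described above, amounts to concatenating the deep image features with the ALP and LDH values before a shared classification head. A hedged sketch with a linear head; `w_img` and `w_bio` are illustrative weights, not parameters learned in the paper:

```python
import numpy as np

def late_fuse(img_feat, alp, ldh, w_img, w_bio, b=0.0):
    """Late fusion sketch: concatenate deep image features with the two
    biochemical markers, then apply a linear head with a sigmoid."""
    z = np.concatenate([img_feat, [alp, ldh]])   # fused feature vector
    w = np.concatenate([w_img, w_bio])
    logit = w @ z + b
    return 1.0 / (1.0 + np.exp(-logit))          # predicted probability
```

The design choice is that fusion happens after each modality's features are extracted, so the image backbone and the biochemical inputs can be developed and validated independently.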
Affiliation(s)
- Shidong Wang: Musculoskeletal Tumor Center, Peking University People's Hospital, Beijing, China
- Yangyang Shen: School of Computer Science and Technology, Southeast University, Nanjing, China
- Fanwei Zeng: Musculoskeletal Tumor Center, Peking University People's Hospital, Beijing, China
- Meng Wang: College of Design and Innovation, Tongji University, Shanghai, China
- Bohan Li: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; Ministry of Industry and Information Technology, Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, China; National Engineering Laboratory for Integrated Aero-Space-Ground Ocean Big Data Application Technology, Xi'an, China
- Dian Shen: School of Computer Science and Technology, Southeast University, Nanjing, China
- Xiaodong Tang: Musculoskeletal Tumor Center, Peking University People's Hospital, Beijing, China
- Beilun Wang: School of Computer Science and Technology, Southeast University, Nanjing, China
6
Demirbaş AA, Üzen H, Fırat H. Spatial-attention ConvMixer architecture for classification and detection of gastrointestinal diseases using the Kvasir dataset. Health Inf Sci Syst 2024; 12:32. [PMID: 38685985] [PMCID: PMC11056348] [DOI: 10.1007/s13755-024-00290-x] [Received: 12/08/2023] [Accepted: 04/12/2024]
Abstract
Gastrointestinal (GI) disorders, encompassing conditions such as cancer and Crohn's disease, pose a significant threat to public health. Endoscopic examinations have become crucial for diagnosing and treating these disorders efficiently. However, the subjective nature of manual evaluation by gastroenterologists can lead to errors in disease classification. In addition, the difficulty of identifying diseased tissue in GI images and the high similarity between classes make this a challenging domain. Automated classification systems that use artificial intelligence to solve these problems have gained traction: automatic detection of diseases in medical images greatly benefits diagnosis and reduces detection time. In this study, we propose a new architecture to support computer-assisted diagnosis and automated disease detection for GI diseases. This architecture, called Spatial-Attention ConvMixer (SAC), extends the patch extraction technique at the basis of the ConvMixer architecture with a spatial attention mechanism (SAM). The SAM enables the network to concentrate selectively on the most informative areas, assigning an importance to each spatial location within the feature maps. We employ the Kvasir dataset to assess the accuracy of GI disease classification with the SAC architecture, comparing our results against the vanilla ViT, Swin Transformer, ConvMixer, MLP-Mixer, ResNet50, and SqueezeNet models. Our SAC method achieves 93.37% accuracy, while the other architectures achieve 79.52%, 74.52%, 92.48%, 63.04%, 87.44%, and 85.59%, respectively. The proposed spatial attention block thus improves the accuracy of the ConvMixer architecture on Kvasir, outperforming the state-of-the-art methods.
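A spatial attention mechanism of the kind described above pools across channels and produces a per-location weight map that rescales the feature maps. A simplified NumPy sketch; the real SAM applies a learned convolution over the stacked pooled maps, so the plain sum used here is an assumption for illustration:

```python
import numpy as np

def spatial_attention(fmap):
    """Spatial-attention sketch for a (C, H, W) feature map: pool across
    channels, squash to (0, 1) weights, and reweight every location."""
    avg = fmap.mean(axis=0)                     # (H, W) channel-average pool
    mx = fmap.max(axis=0)                       # (H, W) channel-max pool
    att = 1.0 / (1.0 + np.exp(-(avg + mx)))     # stand-in for the learned conv
    return fmap * att[None, :, :]               # emphasize informative locations
```

Because the attention map is shared across channels, the block adds very few operations while letting the network down-weight uninformative regions of the endoscopic frame.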
Affiliation(s)
- Hüseyin Üzen: Department of Computer Engineering, Faculty of Engineering, Bingol University, Bingol, Turkey
- Hüseyin Fırat: Department of Computer Engineering, Faculty of Engineering, Dicle University, Diyarbakır, Turkey
7
Sirugue L, Langenfeld F, Lagarde N, Montes M. PLO3S: Protein LOcal Surficial Similarity Screening. Comput Struct Biotechnol J 2024; 26:1-10. [PMID: 38189058] [PMCID: PMC10770625] [DOI: 10.1016/j.csbj.2023.12.002] [Received: 11/14/2022] [Revised: 12/01/2023] [Accepted: 12/03/2023]
Abstract
The study of protein molecular surfaces enables better understanding and prediction of protein interactions. Various surface-comparison methods developed in computer vision can be applied to protein molecular surfaces. The present work proposes a method based on the Wave Kernel Signature: Protein LOcal Surficial Similarity Screening (PLO3S). The descriptor of the PLO3S method is a local surface shape descriptor projected on a unit sphere mapped onto a 2D plane, called Surface Wave Interpolated Maps (SWIM). PLO3S rapidly compares protein surface shapes through local comparisons, allowing large protein surface datasets to be filtered in protein structure virtual screening protocols.
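The Wave Kernel Signature on which PLO3S builds evaluates band-pass filters of the Laplace-Beltrami spectrum at each surface point. A sketch of the standard WKS formula, assuming precomputed eigenpairs `phi` (eigenfunctions) and `lam` (eigenvalues); this is a textbook form, not the PLO3S/SWIM code, and the energy grid and bandwidth are assumptions:

```python
import numpy as np

def wave_kernel_signature(phi, lam, energies, sigma=0.5):
    """WKS sketch: phi is (n_vertices, k) Laplace-Beltrami eigenfunctions,
    lam is (k,) positive eigenvalues. Each energy level e yields one
    normalized band-pass response per vertex."""
    loglam = np.log(lam)
    wks = np.zeros((phi.shape[0], len(energies)))
    for j, e in enumerate(energies):
        w = np.exp(-(e - loglam) ** 2 / (2.0 * sigma ** 2))  # band-pass weights
        wks[:, j] = (phi ** 2) @ w / w.sum()                  # normalized sum
    return wks
```

Comparing two surface patches then reduces to comparing their per-vertex signature vectors, which is what enables fast local filtering of large datasets.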
Affiliation(s)
- Léa Sirugue: Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Florent Langenfeld: Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Nathalie Lagarde: Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Matthieu Montes: Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
8
Hosseini MS, Bejnordi BE, Trinh VQH, Chan L, Hasan D, Li X, Yang S, Kim T, Zhang H, Wu T, Chinniah K, Maghsoudlou S, Zhang R, Zhu J, Khaki S, Buin A, Chaji F, Salehi A, Nguyen BN, Samaras D, Plataniotis KN. Computational pathology: A survey review and the way forward. J Pathol Inform 2024; 15:100357. [PMID: 38420608] [PMCID: PMC10900832] [DOI: 10.1016/j.jpi.2023.100357] [Received: 10/15/2023] [Revised: 12/21/2023] [Accepted: 12/23/2023]
Abstract
Computational Pathology (CPath) is an interdisciplinary science that develops computational approaches to analyze and model medical histopathology images. The main objective of CPath is to develop infrastructure and workflows for digital diagnostics as an assistive CAD system for clinical pathology, facilitating transformational changes in the diagnosis and treatment of cancer. With ever-growing developments in deep learning and computer vision algorithms, and the ease of data flow from digital pathology, CPath is currently witnessing a paradigm shift. Despite the sheer volume of engineering and scientific work being introduced for cancer image analysis, there is still a considerable gap in adopting and integrating these algorithms into clinical practice. This raises a significant question regarding the directions and trends undertaken in CPath. In this article we provide a comprehensive review of more than 800 papers, addressing the challenges faced from problem design all the way to application and implementation. We have catalogued each paper into a model card by examining the key works and challenges faced, to lay out the current landscape in CPath. We hope this helps the community locate relevant works and facilitates understanding of the field's future directions. In a nutshell, we view CPath development as a cycle of stages that must be cohesively linked to address the challenges of such a multidisciplinary science. We overview this cycle from the perspectives of data-centric, model-centric, and application-centric problems. We finally sketch the remaining challenges and provide directions for future technical development and clinical integration of CPath. Updated information on this survey and the original model-card repository are available on GitHub, and an updated version of this draft can be found on arXiv.
Affiliation(s)
- Mahdi S Hosseini: Department of Computer Science and Software Engineering (CSSE), Concordia University, Montreal, QC H3H 2R9, Canada
- Vincent Quoc-Huy Trinh: Institute for Research in Immunology and Cancer of the University of Montreal, Montreal, QC H3T 1J4, Canada
- Lyndon Chan: The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Danial Hasan: The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Xingwen Li: The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Stephen Yang: The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Taehyo Kim: The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Haochen Zhang: The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Theodore Wu: The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Kajanan Chinniah: The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Sina Maghsoudlou: Department of Computer Science and Software Engineering (CSSE), Concordia University, Montreal, QC H3H 2R9, Canada
- Ryan Zhang: The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Jiadai Zhu: The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Samir Khaki: The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Andrei Buin: Huron Digital Pathology, St. Jacobs, ON N0B 2N0, Canada
- Fatemeh Chaji: Department of Computer Science and Software Engineering (CSSE), Concordia University, Montreal, QC H3H 2R9, Canada
- Ala Salehi: Department of Electrical and Computer Engineering, University of New Brunswick, Fredericton, NB E3B 5A3, Canada
- Bich Ngoc Nguyen: University of Montreal Hospital Center, Montreal, QC H2X 0C2, Canada
- Dimitris Samaras: Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, United States
- Konstantinos N Plataniotis: The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
9
Li G, Munawar A, Su Su Win N, Fan M, Zeeshan Nawaz M, Lin L. Multispectral breast image grayscale and quality enhancement by repeated pair image registration & accumulation method. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2024; 320:124558. [PMID: 38870695] [DOI: 10.1016/j.saa.2024.124558] [Received: 09/21/2023] [Revised: 05/27/2024] [Accepted: 05/28/2024]
Abstract
Multispectral transmission imaging is a current focus for detecting breast cancer in its early stages. Frame accumulation is a promising technique to enhance the grayscale level of multispectral transmission images. However, during image acquisition, human respiration or camera jitter displaces the frame sequence, which reduces the accuracy and image quality of the frame-accumulated image. In this article, we propose a new method, named "repeated pair image registration and accumulation", to resolve this issue. In this method, the first pair of images in the sequence is registered and accumulated, followed by the next pair. The two accumulated frames are then registered and accumulated again, and this process is repeated until all frames in the sequence have been processed and the final image is obtained. The method was tested on sequences of breast frames taken at 600 nm, 620 nm, 670 nm, and 760 nm wavelengths of light, and the enhancement of quality, accuracy, and grayscale was verified by various mathematical assessments. Furthermore, the processing time of the proposed method is very low because a gradient descent optimization algorithm is used for image registration. The high speed of this optimization algorithm relative to other methods was verified by registering a single image at each wavelength with three different methods. This work lays a foundation for early detection of breast cancer using multispectral transmission imaging.
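The repeated pairwise register-and-accumulate scheme described above can be sketched as a hierarchical reduction over the frame sequence. The `register` step below is an identity placeholder standing in for the paper's gradient-descent image registration, so only the accumulation structure is shown:

```python
import numpy as np

def register(moving, fixed):
    """Placeholder alignment step; the paper uses gradient-descent image
    registration here. Identity keeps the sketch self-contained."""
    return moving

def repeated_pair_accumulate(frames):
    """Hierarchically register and accumulate a frame sequence pairwise:
    pairs are fused, then the fused results are paired and fused again."""
    level = list(frames)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            aligned = register(level[i + 1], level[i])
            nxt.append(level[i] + aligned)       # accumulate the aligned pair
        if len(level) % 2:                       # odd frame carries over
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

Because misalignment is corrected at every fusion step rather than once against a single reference, drift from respiration or jitter is kept local to each pair.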
Affiliation(s)
- Gang Li: Medical School of Tianjin University, Tianjin 300072, China; State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
- Adnan Munawar: Medical School of Tianjin University, Tianjin 300072, China; State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
- Nan Su Su Win: Medical School of Tianjin University, Tianjin 300072, China; State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
- Meiling Fan: Medical School of Tianjin University, Tianjin 300072, China; State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
- Muhammad Zeeshan Nawaz: Medical School of Tianjin University, Tianjin 300072, China; State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
- Ling Lin: Medical School of Tianjin University, Tianjin 300072, China; State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
10
Wang M, Chen M, Wang Z, Guo Y, Wu Y, Zhao W, Liu X. Estimating rainfall intensity based on surveillance audio and deep-learning. Environmental Science and Ecotechnology 2024; 22:100450. [PMID: 39161573] [PMCID: PMC11331698] [DOI: 10.1016/j.ese.2024.100450] [Received: 09/21/2023] [Revised: 07/04/2024] [Accepted: 07/05/2024]
Abstract
Rainfall data with high spatial and temporal resolution are essential for urban hydrological modeling. Ubiquitous surveillance cameras continuously record rainfall events through video and audio, so they have been recognized as potential rain gauges to supplement professional rainfall observation networks. Since video-based rainfall estimation can be affected by variable backgrounds and lighting conditions, audio-based approaches offer a complement that does not suffer from these conditions. However, most audio-based approaches focus on rainfall-level classification rather than rainfall intensity estimation. Here, we introduce the Surveillance Audio Rainfall Intensity Dataset (SARID) and a deep learning model for estimating rainfall intensity. First, we created the dataset from the audio of six real-world rainfall events; its recordings are segmented into 12,066 pieces and annotated with rainfall intensity and environmental information, such as underlying surfaces, temperature, humidity, and wind. Then, we developed a deep learning baseline using Mel-Frequency Cepstral Coefficients (MFCC) and a Transformer architecture to estimate rainfall intensity from surveillance audio. Validated against ground-truth data, our baseline achieves a root mean absolute error of 0.88 mm/h and a correlation coefficient of 0.765. Our findings demonstrate the potential of surveillance-audio-based models as practical and effective tools for rainfall observation systems, initiating a new chapter in rainfall intensity estimation. This offers a novel data source for high-resolution hydrological sensing and contributes to the broader landscape of urban sensing, emergency response, and resilience.
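Validating intensity estimates against ground truth, as reported above, relies on error and correlation statistics. A small sketch computing mean absolute error and the Pearson correlation coefficient; the paper's exact "root mean absolute error" definition is not restated here, so MAE is used as an assumed stand-in:

```python
import numpy as np

def rainfall_metrics(pred, obs):
    """Score intensity estimates (mm/h) against gauge observations:
    mean absolute error and Pearson correlation coefficient."""
    mae = np.mean(np.abs(pred - obs))
    r = np.corrcoef(pred, obs)[0, 1]
    return mae, r
```

In a full pipeline, `pred` would come from the MFCC-plus-Transformer regressor evaluated on held-out audio segments and `obs` from co-located rain gauges.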
Collapse
Affiliation(s)
- Meizhen Wang
- Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing, 210023, China
- State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province), Nanjing, 210023, China
- Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, 210023, China
- Mingzheng Chen
- Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing, 210023, China
- State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province), Nanjing, 210023, China
- Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, 210023, China
- Ziran Wang
- Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing, 210023, China
- State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province), Nanjing, 210023, China
- Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, 210023, China
- School of Information Engineering, Nanjing Normal University Taizhou College, Taizhou 225300, China
- Yuxuan Guo
- Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing, 210023, China
- State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province), Nanjing, 210023, China
- Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, 210023, China
- Yong Wu
- Institute of Geography, Fujian Normal University, Fuzhou, 350000, China
- Wei Zhao
- Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing, 210023, China
- State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province), Nanjing, 210023, China
- Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, 210023, China
- Xuejun Liu
- Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Ministry of Education, Nanjing, 210023, China
- State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province), Nanjing, 210023, China
- Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, 210023, China
|
11
|
Kim S, Yang H, Kim Y, Hong Y, Park E. Hydra: Multi-head low-rank adaptation for parameter efficient fine-tuning. Neural Netw 2024; 178:106414. [PMID: 38936110 DOI: 10.1016/j.neunet.2024.106414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2023] [Revised: 04/12/2024] [Accepted: 05/29/2024] [Indexed: 06/29/2024]
Abstract
The recent surge in large-scale foundation models has spurred the development of efficient methods for adapting these models to various downstream tasks. Low-rank adaptation methods, such as LoRA, have gained significant attention due to their outstanding parameter efficiency and lack of additional inference latency. This paper investigates a more general form of adapter module, based on the analysis that parallel and sequential adaptation branches learn novel and general features, respectively, during fine-tuning. The proposed method, named Hydra, combines parallel and sequential branches to integrate their capabilities; it is more expressive than existing single-branch methods and enables the exploration of a broader range of optimal points in the fine-tuning process. In addition, the proposed method explicitly leverages the pre-trained weights by performing a linear combination of the pre-trained features, which allows the learned features to generalize better across diverse downstream tasks. Furthermore, we perform a comprehensive analysis of the characteristics of each adaptation branch with empirical evidence. Through an extensive range of experiments, we substantiate the efficiency and demonstrate the superior performance of Hydra. This comprehensive evaluation underscores the potential impact and effectiveness of Hydra in a variety of applications. The source code of this work is publicly available at https://github.com/extremebird/Hydra.
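The combined-branch idea can be sketched in a few lines (a rough illustration, not the authors' implementation; sizes and names like `A_par`/`B_seq` are invented here): a LoRA-style parallel branch acts on the input, while a sequential branch linearly recombines the frozen layer's output.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, r = 16, 16, 2               # hypothetical layer sizes; r = low-rank dim

W0 = rng.normal(size=(d_out, d_in))      # frozen pre-trained weight
A_par = rng.normal(size=(r, d_in)); B_par = np.zeros((d_out, r))   # parallel branch
A_seq = rng.normal(size=(r, d_out)); B_seq = np.zeros((d_out, r))  # sequential branch

def hydra_forward(x):
    base = W0 @ x                         # frozen pre-trained features
    parallel = B_par @ (A_par @ x)        # learns novel features from the raw input
    sequential = B_seq @ (A_seq @ base)   # recombines pre-trained features (general)
    return base + parallel + sequential

x = rng.normal(size=d_in)
# Zero-initialized B matrices mean the adapted layer starts exactly at the
# pre-trained model, the usual LoRA-style initialization.
assert np.allclose(hydra_forward(x), W0 @ x)
```

Only the four small low-rank matrices would be trained; the frozen weight dominates the parameter count.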
Affiliation(s)
- Sanghyeon Kim
- Department of Electrical and Computer Engineering, Sungkyunkwan University, 2066, Seoubu-ro, Suwon, 16419, Republic of Korea.
- Hyunmo Yang
- Department of Artificial Intelligence, Sungkyunkwan University, 2066, Seoubu-ro, Suwon, 16419, Republic of Korea.
- Yunghyun Kim
- Department of Artificial Intelligence, Sungkyunkwan University, 2066, Seoubu-ro, Suwon, 16419, Republic of Korea.
- Youngjoon Hong
- Department of Mathematical Sciences, KAIST, 291, Daehak-ro, Daejeon, 34141, Republic of Korea.
- Eunbyung Park
- Department of Electrical and Computer Engineering, Sungkyunkwan University, 2066, Seoubu-ro, Suwon, 16419, Republic of Korea; Department of Artificial Intelligence, Sungkyunkwan University, 2066, Seoubu-ro, Suwon, 16419, Republic of Korea.
|
12
|
Chen J, Mei J, Li X, Lu Y, Yu Q, Wei Q, Luo X, Xie Y, Adeli E, Wang Y, Lungren MP, Zhang S, Xing L, Lu L, Yuille A, Zhou Y. TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Med Image Anal 2024; 97:103280. [PMID: 39096845 DOI: 10.1016/j.media.2024.103280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2024] [Revised: 06/16/2024] [Accepted: 07/15/2024] [Indexed: 08/05/2024]
Abstract
Medical image segmentation is crucial for healthcare, yet convolution-based methods like U-Net face limitations in modeling long-range dependencies. To address this, Transformers designed for sequence-to-sequence prediction have been integrated into medical image segmentation. However, a comprehensive understanding of Transformers' self-attention in U-Net components is lacking. TransUNet, first introduced in 2021, is widely recognized as one of the first models to integrate Transformers into medical image analysis. In this study, we present the versatile framework of TransUNet, which encapsulates Transformers' self-attention into two key modules: (1) a Transformer encoder tokenizing image patches from a convolutional neural network (CNN) feature map, facilitating global context extraction, and (2) a Transformer decoder refining candidate regions through cross-attention between proposals and U-Net features. These modules can be flexibly inserted into the U-Net backbone, resulting in three configurations: Encoder-only, Decoder-only, and Encoder+Decoder. TransUNet provides a library encompassing both 2D and 3D implementations, enabling users to easily tailor the chosen architecture. Our findings highlight the encoder's efficacy in modeling interactions among multiple abdominal organs and the decoder's strength in handling small targets like tumors. TransUNet excels in diverse medical applications, such as multi-organ segmentation, pancreatic tumor segmentation, and hepatic vessel segmentation. Notably, it achieves significant average Dice improvements of 1.06% and 4.30% for multi-organ segmentation and pancreatic tumor segmentation, respectively, compared to the highly competitive nnU-Net, and surpasses the top-1 solution in the BraTS2021 challenge. 2D and 3D code and models are available at https://github.com/Beckschen/TransUNet and https://github.com/Beckschen/TransUNet-3D, respectively.
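The encoder's first step, turning a CNN feature map into tokens for self-attention, can be sketched as follows (dimensions are made up for illustration; the actual library's API will differ):

```python
import numpy as np

rng = np.random.default_rng(2)

def tokenize_feature_map(fmap, W_embed):
    """Flatten a CNN feature map (C, H, W) into H*W tokens, project to d_model."""
    C, H, W = fmap.shape
    tokens = fmap.reshape(C, H * W).T   # (H*W, C): one token per spatial location
    return tokens @ W_embed             # (H*W, d_model), ready for the Transformer

C, H, W, d_model = 32, 8, 8, 64         # hypothetical sizes
fmap = rng.normal(size=(C, H, W))       # stand-in for a CNN feature map
W_embed = rng.normal(size=(C, d_model)) * 0.1
tok = tokenize_feature_map(fmap, W_embed)
assert tok.shape == (H * W, d_model)    # 64 spatial tokens of width d_model
```

In the full model, positional embeddings would be added and the token sequence fed to the Transformer encoder before being reshaped back for the U-Net decoder path.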
Affiliation(s)
- Jieneng Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Jieru Mei
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Xianhang Li
- Department of Computer Science and Engineering, University of California, Santa Cruz, CA 95064, USA
- Yongyi Lu
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Qihang Yu
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Qingyue Wei
- Department of Radiation Oncology, Stanford University, Stanford, CA 94305, USA
- Xiangde Luo
- Shanghai AI Lab, Xuhui District, Shanghai, 200000, China
- Yutong Xie
- The Australian Institute for Machine Learning, University of Adelaide, Australia
- Ehsan Adeli
- The School of Medicine, Stanford University, Stanford, CA 94305, USA
- Yan Wang
- The East China Normal University, Shanghai 200062, China
- Matthew P Lungren
- The School of Medicine, Stanford University, Stanford, CA 94305, USA
- Shaoting Zhang
- Shanghai AI Lab, Xuhui District, Shanghai, 200000, China
- Lei Xing
- Department of Radiation Oncology, Stanford University, Stanford, CA 94305, USA
- Le Lu
- DAMO Academy, Alibaba Group, New York, NY 10014, USA
- Alan Yuille
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Yuyin Zhou
- Department of Computer Science and Engineering, University of California, Santa Cruz, CA 95064, USA.
|
13
|
Song M, Wang J, Yu Z, Wang J, Yang L, Lu Y, Li B, Wang X, Wang X, Huang Q, Li Z, Kanellakis NI, Liu J, Wang J, Wang B, Yang J. PneumoLLM: Harnessing the power of large language model for pneumoconiosis diagnosis. Med Image Anal 2024; 97:103248. [PMID: 38941859 DOI: 10.1016/j.media.2024.103248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2023] [Revised: 06/17/2024] [Accepted: 06/17/2024] [Indexed: 06/30/2024]
Abstract
The conventional pretraining-and-finetuning paradigm, while effective for common diseases with ample data, faces challenges in diagnosing data-scarce occupational diseases like pneumoconiosis. Recently, large language models (LLMs) have exhibited unprecedented abilities when conducting multiple tasks in dialogue, bringing new opportunities for diagnosis. A common strategy might involve using adapter layers for vision-language alignment and conducting diagnosis in a dialogic manner. Yet, this approach often requires optimizing extensive learnable parameters in the text branch and the dialogue head, potentially diminishing the LLM's efficacy, especially with limited training data. In our work, we innovate by eliminating the text branch and substituting the dialogue head with a classification head, presenting a more effective method for harnessing LLMs in diagnosis with fewer learnable parameters. Furthermore, to balance the retention of detailed image information with progression towards an accurate diagnosis, we introduce the contextual multi-token engine, which adaptively generates diagnostic tokens. Additionally, we propose the information emitter module, which unidirectionally emits information from image tokens to diagnosis tokens. Comprehensive experiments validate the superiority of our methods.
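The central design decision, replacing a dialogue head with a small classification head on top of frozen features, can be sketched minimally (purely illustrative shapes; the contextual multi-token engine and information emitter are omitted, and the feature tensor here is a random stand-in for frozen LLM outputs):

```python
import numpy as np

rng = np.random.default_rng(3)
n_tokens, d, n_classes = 10, 32, 2             # hypothetical sizes

image_tokens = rng.normal(size=(n_tokens, d))  # stand-in for frozen-LLM image features
W_cls = rng.normal(size=(d, n_classes)) * 0.1  # the only learnable parameters

def classify(token_feats, W):
    """Pool token features and classify, instead of generating dialogue."""
    pooled = token_feats.mean(axis=0)
    logits = pooled @ W
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

probs = classify(image_tokens, W_cls)
```

With only `W_cls` trainable, the number of optimized parameters is `d * n_classes`, far below what a text branch plus dialogue head would require.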
Affiliation(s)
- Meiyue Song
- Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, 100005, China; State Key Laboratory of Respiratory Health and Multimorbidity, Beijing, 100005, China
- Jiarui Wang
- School of Automation, Northwestern Polytechnical University, Shaanxi, Xi'an 710072, China
- Zhihua Yu
- Jinneng Holding Coal Industry Group Co. Ltd Occupational Disease Precaution Clinic, Shanxi, 037001, China
- Jiaxin Wang
- School of Medicine, Tsinghua University, Beijing, 100084, China
- Le Yang
- School of Electronics and Control Engineering, Chang'an University, Shaanxi, Xi'an 710064, China
- Yuting Lu
- School of Automation, Northwestern Polytechnical University, Shaanxi, Xi'an 710072, China
- Baicun Li
- Center of Respiratory Medicine, China-Japan Friendship Hospital, National Center for Respiratory Medicine, Institute of Respiratory Medicine, Chinese Academy of Medical Sciences, National Clinical Research Center for Respiratory Diseases, Beijing, 100020, China
- Xue Wang
- Department of Respiratory, the Second Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, 150086, China; Internal Medicine, Harbin Medical University, Harbin, Heilongjiang, 150081, China
- Xiaoxu Wang
- School of Automation, Northwestern Polytechnical University, Shaanxi, Xi'an 710072, China
- Qinghua Huang
- School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, Xi'an 710072, China
- Zhijun Li
- Translational Research Center, Shanghai YangZhi Rehabilitation Hospital (Shanghai Sunshine Rehabilitation Center), Shanghai 201619, China; School of Mechanical Engineering, Tongji University, Shanghai 201804, China
- Nikolaos I Kanellakis
- Laboratory of Pleural and Lung Cancer Translational Research, CAMS Oxford Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK; Oxford Centre for Respiratory Medicine, Churchill Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, UK; National Institute for Health Research Oxford Biomedical Research Centre, University of Oxford, Oxford, UK
- Jiangfeng Liu
- Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, 100005, China; Plastic Surgery Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100144, China; State Key Laboratory of Common Mechanism Research for Major Diseases, Beijing, 100005, China.
- Jing Wang
- Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, 100005, China; State Key Laboratory of Respiratory Health and Multimorbidity, Beijing, 100005, China.
- Binglu Wang
- School of Automation, Northwestern Polytechnical University, Shaanxi, Xi'an 710072, China.
- Juntao Yang
- Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, 100005, China; Plastic Surgery Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100144, China; State Key Laboratory of Common Mechanism Research for Major Diseases, Beijing, 100005, China
|
14
|
Hao Q, Ren R, Niu S, Wang K, Wang M, Zhang J. UGEE-Net: Uncertainty-guided and edge-enhanced network for image splicing localization. Neural Netw 2024; 178:106430. [PMID: 38870563 DOI: 10.1016/j.neunet.2024.106430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2023] [Revised: 05/29/2024] [Accepted: 06/02/2024] [Indexed: 06/15/2024]
Abstract
Image splicing, a prevalent method of image tampering, has significantly undermined image authenticity. Existing methods for Image Splicing Localization (ISL) struggle with limited accuracy and subpar performance when dealing with imperceptible tampering and multiple tampered regions. We introduce an Uncertainty-Guided and Edge-Enhanced Network (UGEE-Net) for ISL to tackle these issues. UGEE-Net consists of two core tasks: uncertainty guidance and edge enhancement. We employ Bayesian learning to model uncertainty maps of tampered regions, directing the model's focus to challenging pixels. Simultaneously, we employ a frequency-domain-auxiliary edge enhancement strategy to imbue localization features with global contour information and fine-grained local details. These mechanisms work in parallel, synergistically boosting performance. Additionally, we introduce a cross-level fusion and propagation mechanism that effectively utilizes contextual information for cross-layer feature integration and leverages channel-level correlations for cross-layer feature propagation, gradually enhancing the localization features' details. Experimental results affirm UGEE-Net's superiority in terms of detection accuracy, robustness, and generalization. Furthermore, to meet the growing demand for high-quality datasets in image forensics, we present the HTSI12K dataset, which includes 12,000 spliced images with imperceptible tampering traces and diverse categories, rendering it suitable for real-world auxiliary model training.
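The paper models uncertainty with Bayesian learning; a common lightweight stand-in for that idea is Monte-Carlo sampling over stochastic forward passes, where per-pixel variance plays the role of the uncertainty map that flags hard pixels (a toy model, not UGEE-Net's actual mechanism):

```python
import numpy as np

rng = np.random.default_rng(4)

def stochastic_predict(img, drop_p=0.5):
    """Toy stochastic localizer: random dropout makes each forward pass differ."""
    mask = rng.random(img.shape) > drop_p
    return 1.0 / (1.0 + np.exp(-(img * mask)))   # per-pixel tamper probability

img = rng.normal(size=(8, 8))                    # stand-in for an input image
passes = np.stack([stochastic_predict(img) for _ in range(32)])
mean_map = passes.mean(axis=0)       # localization map (average prediction)
uncertainty = passes.var(axis=0)     # high variance = challenging pixels to focus on
```

A training loop could then up-weight the loss on pixels with high `uncertainty`, which is the guidance idea the abstract describes.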
Affiliation(s)
- Qixian Hao
- Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Ruyong Ren
- Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Shaozhang Niu
- Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China; Southeast Digital Economy Development Institute, Quzhou 324000, China.
- Kai Wang
- Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Maosen Wang
- Southeast Digital Economy Development Institute, Quzhou 324000, China
- Jiwei Zhang
- Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China
|
15
|
Yue H, Guo J, Yin X, Zhang Y, Zheng S. Salient object detection in low-light RGB-T scene via spatial-frequency cues mining. Neural Netw 2024; 178:106406. [PMID: 38838393 DOI: 10.1016/j.neunet.2024.106406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Revised: 01/26/2024] [Accepted: 05/21/2024] [Indexed: 06/07/2024]
Abstract
Low-light conditions pose significant challenges to vision tasks, such as salient object detection (SOD), due to insufficient photons. Light-insensitive RGB-T SOD models mitigate the above problems to some extent, but they are limited in performance as they only focus on spatial feature fusion while ignoring the frequency discrepancy. To this end, we propose an RGB-T SOD model by mining spatial-frequency cues, called SFMNet, for low-light scenes. Our SFMNet consists of spatial-frequency feature exploration (SFFE) modules and spatial-frequency feature interaction (SFFI) modules. To be specific, the SFFE module aims to separate spatial-frequency features and adaptively extract high and low-frequency features. Moreover, the SFFI module integrates cross-modality and cross-domain information to capture effective feature representations. By deploying both modules in a top-down pathway, our method generates high-quality saliency predictions. Furthermore, we construct the first low-light RGB-T SOD dataset as a benchmark for evaluating performance. Extensive experiments demonstrate that our SFMNet can achieve higher accuracy than the existing models for low-light scenes.
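The frequency separation that the SFFE module performs can be illustrated with a plain FFT-based low/high split; here a fixed cutoff radius stands in for the module's adaptive extraction, and the feature map is random:

```python
import numpy as np

def split_frequencies(feat, radius=2):
    """Split a 2-D feature map into low- and high-frequency components via FFT."""
    F = np.fft.fftshift(np.fft.fft2(feat))
    h, w = feat.shape
    yy, xx = np.ogrid[:h, :w]
    low_mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(F * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(F * ~low_mask)).real
    return low, high

rng = np.random.default_rng(5)
feat = rng.normal(size=(16, 16))
low, high = split_frequencies(feat)
# The two bands are an exact decomposition of the input.
assert np.allclose(low + high, feat)
```

An interaction module like SFFI could then fuse the low band of one modality (e.g., thermal) with the high band of the other (RGB), which is the cross-domain idea the abstract gestures at.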
Affiliation(s)
- Huihui Yue
- School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China.
- Jichang Guo
- School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China.
- Xiangjun Yin
- School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China.
- Yi Zhang
- School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China.
- Sida Zheng
- School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China.
|
16
|
Wang X, Wang S, Li J, Li M, Li J, Xu Y. Omnidirectional image super-resolution via position attention network. Neural Netw 2024; 178:106464. [PMID: 38968779 DOI: 10.1016/j.neunet.2024.106464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2023] [Revised: 03/11/2024] [Accepted: 06/12/2024] [Indexed: 07/07/2024]
Abstract
For convenient transmission, omnidirectional images (ODIs) usually follow the equirectangular projection (ERP) format and are low-resolution. To provide a better immersive experience, omnidirectional image super-resolution (ODISR) is essential. However, ERP ODIs suffer from serious geometric distortion and pixel stretching across latitudes, generating massive redundant information at high latitudes. This characteristic poses a huge challenge for traditional SR methods, which can obtain only suboptimal ODISR performance. To address this issue, we propose a novel position attention network (PAN) for ODISR. Specifically, a two-branch structure is introduced, in which the basic enhancement (BE) branch achieves coarse deep-feature enhancement of the extracted shallow features. Meanwhile, the position attention enhancement (PAE) branch builds a positional attention mechanism that dynamically adjusts the contribution of features at different latitudes in the ERP representation according to their positions and degrees of stretching, which enhances the differentiated information, suppresses the redundant information, and modulates the deep features affected by spatial distortion. Subsequently, the features of the two branches are fused to achieve further refinement and adapt to the distortion characteristics of ODIs. After that, we exploit a long-term memory (LM) module, promoting information interactions and fusion between the branches to enhance the perception of the distortion and aggregating the prior hierarchical features to keep a long-term memory and boost ODISR performance. Extensive results demonstrate the state-of-the-art performance and high efficiency of our PAN for ODISR.
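The latitude-dependent stretching that the PAE branch reacts to can be illustrated with a fixed cos(latitude) row weighting, the same intuition behind weighted-sphere metrics such as WS-PSNR; PAN learns this adjustment dynamically, whereas this sketch hard-codes it:

```python
import numpy as np

def latitude_weights(height):
    """Per-row weight for an ERP image: rows near the poles are stretched most,
    so their largely redundant features get down-weighted by cos(latitude)."""
    lat = (np.arange(height) + 0.5) / height * np.pi - np.pi / 2  # -pi/2 .. pi/2
    return np.cos(lat)

H, W = 8, 16
feat = np.ones((H, W))                 # stand-in feature map in ERP layout
w = latitude_weights(H)
weighted = feat * w[:, None]           # equator rows kept, polar rows suppressed
assert weighted[H // 2].mean() > weighted[0].mean()
```

In the actual network this per-position scaling would be produced by a learned attention module rather than a fixed cosine profile.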
Affiliation(s)
- Xin Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, China; Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Shiqi Wang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Jinxing Li
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, China
- Mu Li
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, China.
- Jinkai Li
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, China
- Yong Xu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, Shenzhen, 518055, China.
|
17
|
Qiu H, Ning M, Song Z, Fang W, Chen Y, Sun T, Ma Z, Yuan L, Tian Y. Self-architectural knowledge distillation for spiking neural networks. Neural Netw 2024; 178:106475. [PMID: 38941738 DOI: 10.1016/j.neunet.2024.106475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Revised: 05/16/2024] [Accepted: 06/17/2024] [Indexed: 06/30/2024]
Abstract
Spiking neural networks (SNNs) have attracted attention due to their biological plausibility and their potential for low-energy applications on neuromorphic hardware. Two mainstream approaches are commonly used to obtain SNNs, i.e., ANN-to-SNN conversion methods and directly-trained-SNN methods. However, the former achieve excellent performance at the cost of a large number of time steps (i.e., latency), while the latter exhibit lower latency but suffer from suboptimal performance. To tackle this performance-latency trade-off, we propose Self-Architectural Knowledge Distillation (SAKD), an intuitive and effective method for SNNs leveraging Knowledge Distillation (KD). We adopt a bilevel teacher-student training strategy in SAKD: level 1 directly transfers same-architecture pre-trained ANN weights to SNNs, and level 2 encourages the SNNs to mimic the ANN's behavior in terms of both final responses and intermediate features. Learning with informative supervision signals fostered by labels and ANNs, our SAKD achieves new state-of-the-art (SOTA) performance with a few time steps on widely used classification benchmark datasets. On ImageNet-1K, with only 4 time steps, our Spiking-ResNet34 model attains a Top-1 accuracy of 70.04%, outperforming previous same-architecture SOTA methods. Notably, our SEW-ResNet152 model reaches a Top-1 accuracy of 77.30% on ImageNet-1K, setting a new SOTA benchmark for SNNs. Furthermore, we apply SAKD to various dense-prediction downstream tasks, such as object detection and semantic segmentation, demonstrating strong generalization ability and superior performance. In conclusion, our proposed SAKD framework presents a promising approach for achieving both high performance and low latency in SNNs, potentially paving the way for future advancements in the field.
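The bilevel strategy can be caricatured in a few lines: level 1 copies the ANN weights into the SNN, so their outputs initially coincide, and level 2 adds response- and feature-mimicking terms to the task loss (MSE is used below as a simple stand-in for the paper's distillation losses, and all tensors are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(6)

def sakd_loss(snn_logits, ann_logits, snn_feat, ann_feat, label, alpha=0.5):
    """Task cross-entropy plus mimicry of the same-architecture ANN teacher."""
    p = np.exp(snn_logits - snn_logits.max()); p /= p.sum()
    ce = -np.log(p[label])                           # supervision from labels
    resp = np.mean((snn_logits - ann_logits) ** 2)   # final-response mimicking
    feat = np.mean((snn_feat - ann_feat) ** 2)       # intermediate-feature mimicking
    return ce + alpha * (resp + feat)

ann_logits = rng.normal(size=10); feat = rng.normal(size=64)
# Level 1: the SNN starts from the ANN's own weights, so its outputs match the
# teacher's and both mimic terms vanish at initialization.
loss_start = sakd_loss(ann_logits, ann_logits, feat, feat, label=3)
```

Real SNN logits would come from averaging spike counts over the few time steps; that detail is abstracted away here.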
Affiliation(s)
- Haonan Qiu
- Peking University, School of Electronic and Computer Engineering, Shenzhen Graduate School, China.
- Munan Ning
- Peking University, School of Electronic and Computer Engineering, Shenzhen Graduate School, China
- Zeyin Song
- Peking University, School of Electronic and Computer Engineering, Shenzhen Graduate School, China
- Wei Fang
- Peking University, School of Computer Science, China; PengCheng Laboratory, China
- Yanqi Chen
- Peking University, School of Computer Science, China; PengCheng Laboratory, China
- Tao Sun
- Peking University, School of Electronic and Computer Engineering, Shenzhen Graduate School, China
- Li Yuan
- Peking University, School of Electronic and Computer Engineering, Shenzhen Graduate School, China; PengCheng Laboratory, China.
- Yonghong Tian
- Peking University, School of Electronic and Computer Engineering, Shenzhen Graduate School, China; Peking University, School of Computer Science, China; PengCheng Laboratory, China.
|
18
|
Kratsios A, Hong R, Sáez de Ocáriz Borde H. Capacity bounds for hyperbolic neural network representations of latent tree structures. Neural Netw 2024; 178:106420. [PMID: 38901097 DOI: 10.1016/j.neunet.2024.106420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 05/14/2024] [Accepted: 05/29/2024] [Indexed: 06/22/2024]
Abstract
We study the representation capacity of deep hyperbolic neural networks (HNNs) with a ReLU activation function. We establish the first proof that HNNs can ɛ-isometrically embed any finite weighted tree into a hyperbolic space of dimension d at least 2 with prescribed sectional curvature κ<0, for any ɛ>1 (with ɛ=1 being optimal). We establish rigorous upper bounds on the complexity of an HNN implementing the embedding, and find that the network complexity of the HNN implementing the graph representation is independent of the representation fidelity/distortion. We contrast this result with our lower bounds on the distortion which any ReLU multi-layer perceptron (MLP) must incur when embedding a tree with L>2^d leaves into a d-dimensional Euclidean space, which we show to be at least Ω(L^{1/d}), independently of the depth, width, and (possibly discontinuous) activation function defining the MLP.
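The reason trees embed so much better in hyperbolic space is already visible in the Poincaré-ball distance formula, where the distance from the origin grows like 2·artanh(r), so points near the boundary are exponentially far apart (this is standard hyperbolic geometry, not the paper's construction):

```python
import numpy as np

def poincare_dist(x, y):
    """Hyperbolic distance between two points of the Poincare ball (curvature -1)."""
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + 2 * sq / denom)

origin = np.zeros(2)
x = np.array([0.9, 0.0])
# Radial distance from the origin equals 2*artanh(r): pushing a point toward the
# boundary (r -> 1) makes room for exponentially many far-apart tree leaves.
assert np.isclose(poincare_dist(origin, x), 2 * np.arctanh(0.9))
```

An ɛ-isometric tree embedding f would then satisfy d_T(u,v) ≤ d(f(u), f(v)) ≤ ɛ·d_T(u,v) for all tree nodes u, v.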
Affiliation(s)
- Anastasis Kratsios
- Department of Mathematics, McMaster University, Canada; Vector Institute, Canada.
- Ruiyang Hong
- Department of Mathematics, McMaster University, Canada; Vector Institute, Canada.
|
19
|
Zu W, Xie S, Zhao Q, Li G, Ma L. Embedded prompt tuning: Towards enhanced calibration of pretrained models for medical images. Med Image Anal 2024; 97:103258. [PMID: 38996667 DOI: 10.1016/j.media.2024.103258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2023] [Revised: 06/21/2024] [Accepted: 06/24/2024] [Indexed: 07/14/2024]
Abstract
Foundation models pre-trained on large-scale data have been widely witnessed to achieve success in various natural imaging downstream tasks. Parameter-efficient fine-tuning (PEFT) methods aim to adapt foundation models to new domains by updating only a small portion of parameters, reducing computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain few-shot scenarios such as medical image analysis, has not been fully explored. In this work, we study the performance of PEFT when adapting foundation models to medical image classification tasks. Furthermore, to alleviate the limitations of mainstream prompt tuning methods, namely how prompts are introduced and their approximation capabilities on Transformer architectures, we propose the Embedded Prompt Tuning (EPT) method, which embeds prompt tokens into the expanded channels. We also find that there are anomalies in the feature-space distribution of foundation models during the pre-training process, and that prompt tuning can help mitigate this negative impact. To explain this phenomenon, we introduce a novel perspective for understanding prompt tuning: prompt tuning is a distribution calibrator. We support this view by analysing the patch-wise scaling and feature-separation operations contained in EPT. Our experiments show that EPT outperforms several state-of-the-art fine-tuning methods by a significant margin on few-shot medical image classification tasks and completes the fine-tuning process in a highly competitive time, indicating that EPT is an effective PEFT method. The source code is available at github.com/zuwenqiang/EPT.
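The dimension bookkeeping that separates EPT from mainstream prompt tuning can be shown schematically (random stand-in tensors; the real method inserts learned prompts inside Transformer blocks, which this sketch does not attempt):

```python
import numpy as np

rng = np.random.default_rng(7)
n_tokens, d, n_prompt = 5, 16, 4               # hypothetical sizes

tokens = rng.normal(size=(n_tokens, d))        # patch tokens from the backbone
prompts = rng.normal(size=(n_prompt,))         # learnable prompt values

# Mainstream prompt tuning: prepend prompt *tokens*, so the sequence grows.
prompt_tokens = rng.normal(size=(2, d))
prepended = np.vstack([prompt_tokens, tokens])
assert prepended.shape == (n_tokens + 2, d)

# EPT-style: embed prompts into expanded *channels*; sequence length is fixed.
expanded = np.hstack([tokens, np.tile(prompts, (n_tokens, 1))])
assert expanded.shape == (n_tokens, d + n_prompt)
```

Keeping the sequence length fixed means the attention cost does not grow with the number of prompts, one plausible motivation for the channel-wise design.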
Affiliation(s)
- Wenqiang Zu
- School of Artificial Intelligence, University of Chinese Academy of Sciences, China; Institute of Automation, Chinese Academy of Sciences, China; Beijing Academy of Artificial Intelligence, China
- Shenghao Xie
- Academy for Advanced Interdisciplinary Studies, Peking University, China; School of Cyber Science and Engineering, Wuhan University, China; Beijing Academy of Artificial Intelligence, China
- Qing Zhao
- College of Future Technology, National Biomedical Imaging Center, Peking University, China
- Guoqi Li
- Institute of Automation, Chinese Academy of Sciences, China
- Lei Ma
- College of Future Technology, National Biomedical Imaging Center, Peking University, China; Beijing Academy of Artificial Intelligence, China.
|
20
|
Han P, Zhang F, Zhao B, Li X. Motion-Aware Video Frame Interpolation. Neural Netw 2024; 178:106433. [PMID: 38941737 DOI: 10.1016/j.neunet.2024.106433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2023] [Revised: 05/03/2024] [Accepted: 06/03/2024] [Indexed: 06/30/2024]
Abstract
Video frame interpolation methods aim to synthesize new frames between existing ones in order to increase a video's frame rate. However, current methods are prone to image blurring and spurious artifacts in challenging scenarios involving occlusions and discontinuous motion. Moreover, they typically rely on optical flow estimation, which adds modeling complexity and computational cost. To address these issues, we introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames by introducing a novel hierarchical pyramid module. It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, enabling the model to capture intricate motion patterns, but also effectively reduces the required computational cost and complexity. Subsequently, a cross-scale motion structure is presented to estimate and refine intermediate flow maps from the extracted features. This approach facilitates the interplay between input frame features and flow maps during the frame interpolation process and markedly heightens the precision of the intermediate flow maps. Finally, a loss centered on the intermediate flow is designed to guide its prediction, further refining the precision of the intermediate flow maps. Experiments illustrate that MA-VFI surpasses several representative VFI methods across various datasets and can enhance efficiency while maintaining commendable efficacy.
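Flow-based interpolation ultimately warps the source frames by the estimated intermediate flow; a nearest-neighbor backward warp shows the basic mechanics (MA-VFI's actual warping and refinement are far more elaborate, and this flow field is hand-made):

```python
import numpy as np

def backward_warp(frame, flow):
    """Sample the frame at positions displaced by the flow (nearest neighbor)."""
    h, w = frame.shape
    yy, xx = np.mgrid[:h, :w]
    src_y = np.clip(np.rint(yy + flow[..., 1]), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(xx + flow[..., 0]), 0, w - 1).astype(int)
    return frame[src_y, src_x]

frame = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0                    # every pixel samples 1 px to its right
warped = backward_warp(frame, flow)
assert warped[0, 0] == frame[0, 1]    # content shifted as expected
```

An intermediate frame is then typically a blend of the two warped neighbors, with occlusion masks deciding which source to trust per pixel.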
Affiliation(s)
- Pengfei Han
- School of Cybersecurity, Northwestern Polytechnical University, Xi'an, China; School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, China.
- Fuhua Zhang
- School of Electrical and Information Engineering, Hunan University, Changsha, China.
- Bin Zhao
- School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, China.
- Xuelong Li
- School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, China.
21
Yang Y, Wang P, He X, Zou D. GRAM: An interpretable approach for graph anomaly detection using gradient attention maps. Neural Netw 2024; 178:106463. [PMID: 38908167 DOI: 10.1016/j.neunet.2024.106463] [Received: 12/23/2023] [Revised: 04/13/2024] [Accepted: 06/12/2024] [Indexed: 06/24/2024]
Abstract
Detecting unusual patterns in graph data is a crucial task in data mining. However, existing methods face challenges in consistently achieving satisfactory performance and often lack interpretability, which hinders our understanding of anomaly detection decisions. In this paper, we propose a novel approach to graph anomaly detection that leverages the power of interpretability to enhance performance. Specifically, our method extracts an attention map derived from gradients of graph neural networks, which serves as a basis for scoring anomalies. Notably, our approach is flexible and can be used in various anomaly detection settings. In addition, we conduct theoretical analysis using synthetic data to validate our method and gain insights into its decision-making process. To demonstrate the effectiveness of our method, we extensively evaluate our approach against state-of-the-art graph anomaly detection techniques on real-world graph classification and wireless network datasets. The results consistently confirm the superior performance of our method compared to the baselines.
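The core idea of scoring anomalies from gradient magnitudes can be illustrated on a toy one-layer message-passing network. Numerical differentiation stands in for autograd here, and all names are illustrative rather than the paper's implementation:

```python
import numpy as np

def gnn_readout(X, A, W):
    # One-layer GCN-style aggregation followed by a mean graph readout.
    H = np.tanh(A @ X @ W)
    return H.mean()

def gradient_attention_scores(X, A, W, eps=1e-5):
    """Numerical gradient of the readout w.r.t. each node's features;
    the per-node gradient norm serves as the attention/anomaly score."""
    grads = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Xp, Xm = X.copy(), X.copy()
            Xp[i, j] += eps
            Xm[i, j] -= eps
            grads[i, j] = (gnn_readout(Xp, A, W) - gnn_readout(Xm, A, W)) / (2 * eps)
    return np.linalg.norm(grads, axis=1)  # one score per node
```

Nodes (or graphs) whose features most strongly influence the network output receive the largest scores, which is what makes the map usable both for detection and for explaining a decision.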
Affiliation(s)
- Yifei Yang
- Electronic Information School, Wuhan University, Hubei, China; Data Science Research Center, Duke Kunshan University, Jiangsu, China.
- Peng Wang
- Data Science Research Center, Duke Kunshan University, Jiangsu, China.
- Xiaofan He
- Electronic Information School, Wuhan University, Hubei, China.
- Dongmian Zou
- Data Science Research Center, Duke Kunshan University, Jiangsu, China.
22
Peng H, Lin S, King D, Su YH, Abuzeid WM, Bly RA, Moe KS, Hannaford B. Reducing annotating load: Active learning with synthetic images in surgical instrument segmentation. Med Image Anal 2024; 97:103246. [PMID: 38943835 DOI: 10.1016/j.media.2024.103246] [Received: 12/29/2022] [Revised: 05/28/2024] [Accepted: 06/17/2024] [Indexed: 07/01/2024]
Abstract
Accurate instrument segmentation in the endoscopic vision of minimally invasive surgery is challenging due to complex instruments and environments. Deep learning techniques have shown competitive performance in recent years. However, deep learning usually requires a large amount of labeled data to achieve accurate prediction, which poses a significant annotation workload. To alleviate this workload, we propose an active learning-based framework to generate synthetic images for efficient neural network training. In each active learning iteration, a small number of informative unlabeled images are first queried by active learning and manually labeled. Next, synthetic images are generated based on these selected images. The instruments and backgrounds are cropped out and randomly combined with blending and fusion near the boundary. The proposed method leverages the advantages of both active learning and synthetic images. The effectiveness of the proposed method is validated on two sinus surgery datasets and one intraabdominal surgery dataset. The results indicate a considerable performance improvement, especially when the size of the annotated dataset is small. All the code is open-sourced at: https://github.com/HaonanPeng/active_syn_generator.
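The crop-and-combine step can be sketched as follows; the crude box-filter feathering here is a stand-in for the paper's boundary blending and fusion, and the function name is ours:

```python
import numpy as np

def blend_composite(instrument, mask, background, feather=3):
    """Paste a masked instrument crop onto a background image, softening
    the binary mask near its boundary so the seam is blended rather than
    a hard cut."""
    soft = mask.astype(float)
    # Crude feathering: repeatedly average the mask with shifted copies.
    for _ in range(feather):
        soft = (soft
                + np.roll(soft, 1, 0) + np.roll(soft, -1, 0)
                + np.roll(soft, 1, 1) + np.roll(soft, -1, 1)) / 5.0
    soft = soft[..., None]                     # broadcast over channels
    return soft * instrument + (1.0 - soft) * background
```

Each synthetic training image pairs such a composite with the (freely available) mask used to build it, so no extra manual labeling is needed.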
Affiliation(s)
- Haonan Peng
- University of Washington, 185 E Stevens Way NE AE100R, Seattle, WA 98195, USA.
- Shan Lin
- University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
- Daniel King
- University of Washington, 185 E Stevens Way NE AE100R, Seattle, WA 98195, USA
- Yun-Hsuan Su
- Mount Holyoke College, 50 College St, South Hadley, MA 01075, USA
- Waleed M Abuzeid
- University of Washington, 185 E Stevens Way NE AE100R, Seattle, WA 98195, USA
- Randall A Bly
- University of Washington, 185 E Stevens Way NE AE100R, Seattle, WA 98195, USA
- Kris S Moe
- University of Washington, 185 E Stevens Way NE AE100R, Seattle, WA 98195, USA
- Blake Hannaford
- University of Washington, 185 E Stevens Way NE AE100R, Seattle, WA 98195, USA
23
Sun S, Han K, You C, Tang H, Kong D, Naushad J, Yan X, Ma H, Khosravi P, Duncan JS, Xie X. Medical image registration via neural fields. Med Image Anal 2024; 97:103249. [PMID: 38963972 DOI: 10.1016/j.media.2024.103249] [Received: 08/31/2022] [Revised: 05/24/2024] [Accepted: 06/21/2024] [Indexed: 07/06/2024]
Abstract
Image registration is an essential step in many medical image analysis tasks. Traditional methods for image registration are primarily optimization-driven, finding the optimal deformations that maximize the similarity between two images. Recent learning-based methods, trained to directly predict transformations between two images, run much faster, but suffer from performance deficiencies due to domain shift. Here we present a new neural network based image registration framework, called NIR (Neural Image Registration), which is based on optimization but utilizes deep neural networks to model deformations between image pairs. NIR represents the transformation between two images with a continuous function implemented via neural fields, receiving a 3D coordinate as input and outputting the corresponding deformation vector. NIR provides two ways of generating a deformation field: directly outputting a displacement vector field for general deformable registration, or outputting a velocity vector field and integrating it to derive the deformation field for diffeomorphic image registration. The optimal registration is discovered by updating the parameters of the neural field via stochastic mini-batch gradient descent. We describe several design choices that facilitate model optimization, including coordinate encoding, sinusoidal activation, coordinate sampling, and intensity sampling. NIR is evaluated on two 3D MR brain scan datasets, demonstrating highly competitive performance in terms of both registration accuracy and regularity. Compared to traditional optimization-based methods, our approach achieves better results in shorter computation times. In addition, our method exhibits robust performance on a cross-dataset registration task compared to pre-trained learning-based methods.
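A minimal forward pass of such a coordinate network, assuming a SIREN-style sinusoidal first layer and our own class name (the paper's actual encoder and layer sizes will differ), might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

class SirenField:
    """Tiny coordinate network with sinusoidal activations: maps a 3-D
    coordinate to a 3-D displacement vector (forward pass only)."""
    def __init__(self, hidden=32, w0=30.0):
        self.w0 = w0
        self.W1 = rng.normal(0.0, 1.0 / 3, (3, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 1.0 / hidden, (hidden, 3))
        self.b2 = np.zeros(3)

    def __call__(self, coords):            # coords: (N, 3), roughly in [-1, 1]
        h = np.sin(self.w0 * (coords @ self.W1 + self.b1))
        return h @ self.W2 + self.b2       # (N, 3) displacement vectors
```

In the full method these weights would be optimized per image pair by stochastic mini-batch gradient descent on an image-similarity loss, which is what makes the approach optimization-based rather than a pre-trained predictor.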
Affiliation(s)
- Shanlin Sun
- University of California, Irvine, Irvine, CA 92697, USA.
- Kun Han
- University of California, Irvine, Irvine, CA 92697, USA.
- Chenyu You
- Yale University, New Haven, CT 06520, USA.
- Hao Tang
- University of California, Irvine, Irvine, CA 92697, USA.
- Deying Kong
- University of California, Irvine, Irvine, CA 92697, USA.
- Xiangyi Yan
- University of California, Irvine, Irvine, CA 92697, USA.
- Haoyu Ma
- University of California, Irvine, Irvine, CA 92697, USA.
- Pooya Khosravi
- University of California, Irvine, Irvine, CA 92697, USA.
- Xiaohui Xie
- University of California, Irvine, Irvine, CA 92697, USA.
24
Molina-Moreno M, González-Díaz I, Mikut R, Díaz-de-María F. A self-supervised embedding of cell migration features for behavior discovery over cell populations. Comput Methods Programs Biomed 2024; 255:108337. [PMID: 39067139 DOI: 10.1016/j.cmpb.2024.108337] [Received: 11/20/2023] [Revised: 04/26/2024] [Accepted: 07/17/2024] [Indexed: 07/30/2024]
Abstract
BACKGROUND AND OBJECTIVE Recent studies point out that the dynamics and interaction of cell populations within their environment are related to several biological processes in immunology. Hence, single-cell analysis in immunology now relies on spatial omics. Moreover, recent literature suggests that immunology scenarios are hierarchically organized, including unknown cell behaviors appearing in different proportions across some observable control and therapy groups. These dynamic behaviors play a crucial role in identifying the causes of processes such as inflammation, aging, and fighting off pathogens or cancerous cells. In this work, we use a self-supervised learning approach to discover these behaviors associated with cell dynamics in an immunology scenario. MATERIALS AND METHODS Specifically, we study the different responses of control group and therapy groups in a scenario involving inflammation due to infarct, with a focus on neutrophil migration within blood vessels. Starting from a set of hand-crafted spatio-temporal features, we use a recurrent neural network to generate embeddings that properly describe the dynamics of the migration processes. The network is trained using a novel multi-task contrastive loss that, on the one hand, models the hierarchical structure of our scenario (groups-behaviors-samples) and, on the other, ensures temporal consistency within the embedding, enforcing that subsequent temporal samples obtained from a given cell stay close in the latent space. RESULTS Our experimental results demonstrate that the resulting embeddings improve the separability of cell behaviors and log-likelihood of the therapies, when compared to the hand-crafted feature extraction and recent methods from the state of the art, even with dimensionality reduction (16 vs. 21 hand-crafted features). CONCLUSIONS Our approach enables single-cell analyses at a population level, being able to automatically discover shared behaviors among different groups. 
This, in turn, enables the prediction of the therapy effectiveness based on their proportions within a study group.
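The temporal-consistency term can be illustrated with a generic InfoNCE-style loss on one positive pair (a cell's embeddings at consecutive time points) and a set of negatives from other cells. This is a sketch of the general mechanism, not the paper's exact multi-task loss:

```python
import numpy as np

def temporal_contrastive_loss(z_t, z_next, z_neg, tau=0.1):
    """InfoNCE-style loss: the embedding of a cell at time t should be
    close to its own next-time embedding and far from other cells'."""
    def sim(a, b):  # cosine similarity
        return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp(sim(z_t, z_next) / tau)
    neg = sum(np.exp(sim(z_t, z) / tau) for z in z_neg)
    return -np.log(pos / (pos + neg))
```

Minimizing this over many triplets is what enforces that subsequent temporal samples of a given cell stay close in the latent space while distinct behaviors separate.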
Affiliation(s)
- Miguel Molina-Moreno
- Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Avda. de la Universidad, 30, Leganés, 28911, Spain; Department of Immunobiology, Yale University, Amistad Street Building, 10 Amistad St, New Haven, 06520, USA.
- Iván González-Díaz
- Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Avda. de la Universidad, 30, Leganés, 28911, Spain.
- Ralf Mikut
- Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz, 1, Eggenstein-Leopoldshafen, 76344, Baden-Württemberg, Germany.
- Fernando Díaz-de-María
- Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Avda. de la Universidad, 30, Leganés, 28911, Spain.
25
Zhao C, Cai W, Hu C, Yuan Z. Cycle contrastive adversarial learning with structural consistency for unsupervised high-quality image deraining transformer. Neural Netw 2024; 178:106428. [PMID: 38901091 DOI: 10.1016/j.neunet.2024.106428] [Received: 01/19/2024] [Revised: 04/18/2024] [Accepted: 06/02/2024] [Indexed: 06/22/2024]
Abstract
Unsupervised single image deraining (SID) methods, which sidestep the difficulty of collecting paired real-world data, have recently achieved notably acceptable deraining performance. However, previous methods usually fail to produce a high-quality rain-free image because they pay insufficient attention to semantic representation and image content, and thus cannot completely separate the content from the rain layer. In this paper, we develop a novel cycle contrastive adversarial framework for unsupervised SID, which mainly consists of cycle contrastive learning (CCL) and location contrastive learning (LCL). Specifically, CCL achieves high-quality image reconstruction and rain-layer stripping by pulling similar features together while pushing dissimilar features apart in both the semantic and discriminant latent spaces. Meanwhile, LCL implicitly constrains the mutual information of the same location across different exemplars to preserve content information. In addition, inspired by the powerful Segment Anything Model (SAM), which can effectively extract widely applicable semantic structural details, we formulate a structural-consistency regularization to fine-tune our network using SAM. Apart from this, we introduce a vision transformer (ViT) into our network architecture to further improve performance. In our transformer-based GAN, to obtain a stronger representation, we propose a multi-layer channel compression attention module (MCCAM) to extract richer features. Equipped with the above techniques, our proposed unsupervised SID algorithm, called CCLformer, achieves advantageous image deraining performance. Extensive experiments demonstrate both the superiority of our method and the effectiveness of each module in CCLformer. The code is available at https://github.com/zhihefang/CCLGAN.
Affiliation(s)
- Chen Zhao
- School of Artificial Intelligence, Nanjing Normal University, Nanjing, 210023, China.
- Weiling Cai
- School of Artificial Intelligence, Nanjing Normal University, Nanjing, 210023, China.
- Chengwei Hu
- School of Artificial Intelligence, Nanjing Normal University, Nanjing, 210023, China.
- Zheng Yuan
- School of Artificial Intelligence, Nanjing Normal University, Nanjing, 210023, China.
26
Thomas CI, Ryan MA, McNabb MC, Kamasawa N, Scholl B. Astrocyte coverage of excitatory synapses correlates to measures of synapse structure and function in ferret primary visual cortex. Glia 2024; 72:1785-1800. [PMID: 38856149 PMCID: PMC11324397 DOI: 10.1002/glia.24582] [Received: 12/08/2023] [Revised: 05/25/2024] [Accepted: 06/02/2024] [Indexed: 06/11/2024]
Abstract
Most excitatory synapses in the mammalian brain are contacted or ensheathed by astrocyte processes, forming tripartite synapses. Astrocytes are thought to be critical regulators of the structural and functional dynamics of synapses. While the degree of synaptic coverage by astrocytes is known to vary across brain regions and animal species, the reason for and implications of this variability remain unknown. Further, how astrocyte coverage of synapses relates to in vivo functional properties of individual synapses has not been investigated. Here, we characterized astrocyte coverage of synapses of pyramidal neurons in the ferret visual cortex and, using correlative light and electron microscopy, examined their relationship to synaptic strength and sensory-evoked Ca2+ activity. Nearly all synapses were contacted by astrocytes, and most were contacted along the axon-spine interface. Structurally, we found that the degree of synaptic astrocyte coverage directly scaled with synapse size and postsynaptic density complexity. Functionally, we found that the amount of astrocyte coverage scaled with how selectively a synapse responds to a particular visual stimulus and, at least for the largest synapses, scaled with the reliability of visual stimuli to evoke postsynaptic Ca2+ events. Our study shows astrocyte coverage is highly correlated with structural metrics of synaptic strength of excitatory synapses in the visual cortex and demonstrates a previously unknown relationship between astrocyte coverage and reliable sensory activation.
Affiliation(s)
- Connon I Thomas
- Electron Microscopy Core Facility, Max Planck Florida Institute for Neuroscience, Jupiter, Florida, USA
- Melissa A Ryan
- Electron Microscopy Core Facility, Max Planck Florida Institute for Neuroscience, Jupiter, Florida, USA
- Micaiah C McNabb
- Electron Microscopy Core Facility, Max Planck Florida Institute for Neuroscience, Jupiter, Florida, USA
- Naomi Kamasawa
- Electron Microscopy Core Facility, Max Planck Florida Institute for Neuroscience, Jupiter, Florida, USA
- Benjamin Scholl
- Department of Physiology and Biophysics, University of Colorado Denver, Aurora, Colorado, USA
27
Ritter C, Lee JY, Pham MT, Pabba MK, Cardoso MC, Bartenschlager R, Rohr K. Multi-detector fusion and Bayesian smoothing for tracking viral and chromatin structures. Med Image Anal 2024; 97:103227. [PMID: 38897031 DOI: 10.1016/j.media.2024.103227] [Received: 07/08/2022] [Revised: 08/15/2023] [Accepted: 05/27/2024] [Indexed: 06/21/2024]
Abstract
Automatic tracking of viral and intracellular structures displayed as spots with varying sizes in fluorescence microscopy images is an important task to quantify cellular processes. We propose a novel probabilistic tracking approach for multiple particle tracking based on multi-detector and multi-scale data fusion as well as Bayesian smoothing. The approach integrates results from multiple detectors using a novel intensity-based covariance intersection method which takes into account information about the image intensities, positions, and uncertainties. The method ensures a consistent estimate of multiple fused particle detections and does not require an optimization step. Our probabilistic tracking approach performs data fusion of detections from classical and deep learning methods as well as exploits single-scale and multi-scale detections. In addition, we use Bayesian smoothing to fuse information of predictions from both past and future time points. We evaluated our approach using image data of the Particle Tracking Challenge and achieved state-of-the-art results or outperformed previous methods. Our method was also assessed on challenging live cell fluorescence microscopy image data of viral and cellular proteins expressed in hepatitis C virus-infected cells and chromatin structures in non-infected cells, acquired at different spatial-temporal resolutions. We found that the proposed approach outperforms existing methods.
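The fusion step builds on the classical covariance intersection rule, which consistently combines two estimates whose cross-correlation is unknown. The paper's novelty is an intensity-based weighting; the standard rule itself, shown here for two detector outputs, is:

```python
import numpy as np

def covariance_intersection(x1, P1, x2, P2, w=0.5):
    """Fuse two estimates (mean, covariance) with unknown
    cross-correlation; w in [0, 1] weights the first detector."""
    P1i, P2i = np.linalg.inv(P1), np.linalg.inv(P2)
    P = np.linalg.inv(w * P1i + (1 - w) * P2i)   # fused covariance
    x = P @ (w * P1i @ x1 + (1 - w) * P2i @ x2)  # fused mean
    return x, P
```

Because the fused covariance never understates uncertainty regardless of the unknown correlation, the rule yields a consistent estimate without an explicit optimization step, matching the property the abstract highlights.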
Affiliation(s)
- C Ritter
- Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Im Neuenheimer Feld 267, Heidelberg, Germany.
- J-Y Lee
- Department of Infectious Diseases, Molecular Virology, Heidelberg University, Im Neuenheimer Feld 344, Heidelberg, Germany; German Center for Infection Research (DZIF), Heidelberg Partner Site, Germany
- M-T Pham
- Department of Infectious Diseases, Molecular Virology, Heidelberg University, Im Neuenheimer Feld 344, Heidelberg, Germany; German Center for Infection Research (DZIF), Heidelberg Partner Site, Germany
- M K Pabba
- Department of Biology, Cell Biology and Epigenetics, Technical University of Darmstadt, Schnittspahnstraße 10, Darmstadt, Germany
- M C Cardoso
- Department of Biology, Cell Biology and Epigenetics, Technical University of Darmstadt, Schnittspahnstraße 10, Darmstadt, Germany
- R Bartenschlager
- Department of Infectious Diseases, Molecular Virology, Heidelberg University, Im Neuenheimer Feld 344, Heidelberg, Germany; German Center for Infection Research (DZIF), Heidelberg Partner Site, Germany
- K Rohr
- Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Im Neuenheimer Feld 267, Heidelberg, Germany.
28
Zhou H, Zhong P, Li D, Shen Z. Unsupervised domain adaptation with weak source domain labels via bidirectional subdomain alignment. Neural Netw 2024; 178:106418. [PMID: 38850639 DOI: 10.1016/j.neunet.2024.106418] [Received: 09/18/2023] [Revised: 03/22/2024] [Accepted: 05/29/2024] [Indexed: 06/10/2024]
Abstract
Unsupervised domain adaptation (UDA) enables knowledge transfer from a labeled source domain to an unlabeled target domain. However, UDA performance often relies heavily on the accuracy of source domain labels, which are frequently noisy or missing in real applications. To address unreliable source labels, we propose a novel framework for extracting robust, discriminative features via iterative pseudo-labeling, queue-based clustering, and bidirectional subdomain alignment (BSA). The proposed framework begins by generating pseudo-labels for unlabeled source data and constructing codebooks via iterative clustering to obtain label-independent class centroids. Then, the proposed framework performs two main tasks: rectifying features from both domains using BSA to match subdomain distributions and enhance features; and employing a two-stage adversarial process for global feature alignment. The feature rectification is done before feature enhancement, while the global alignment is done after feature enhancement. To optimize our framework, we formulate BSA and adversarial learning as maximizing a log-likelihood function, which is implemented via the Expectation-Maximization algorithm. The proposed framework shows significant improvements compared to state-of-the-art methods on Office-31, Office-Home, and VisDA-2017 datasets, achieving average accuracies of 91.5%, 76.6%, and 87.4%, respectively. Compared to existing methods, the proposed method shows consistent superiority in unsupervised domain adaptation tasks with both fully and weakly labeled source domains.
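The iterative pseudo-labeling around label-independent class centroids can be sketched generically as nearest-centroid assignment followed by a moving-average centroid update. This is our simplification of the queue-based clustering, with illustrative names and parameters:

```python
import numpy as np

def pseudo_label_and_update(features, centroids, momentum=0.9):
    """Assign each feature to its nearest class centroid (pseudo-label),
    then refresh the centroids with an exponential moving average."""
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    new_centroids = centroids.copy()
    for c in range(centroids.shape[0]):
        members = features[labels == c]
        if len(members):
            new_centroids[c] = momentum * centroids[c] + (1 - momentum) * members.mean(0)
    return labels, new_centroids
```

Iterating this assignment/update cycle is what lets the method build a usable codebook even when the source labels themselves are noisy or missing.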
Affiliation(s)
- Heng Zhou
- College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China; National Innovation Center for Digital Fishery, Beijing, China; Key Laboratory of Smart Farming Technologies for Aquatic Animal and Livestock, Ministry of Agriculture and Rural Affairs, Beijing, China; Beijing Engineering and Technology Research Center for Internet of Things in Agriculture, Beijing, China.
- Ping Zhong
- College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China; National Innovation Center for Digital Fishery, Beijing, China; Key Laboratory of Smart Farming Technologies for Aquatic Animal and Livestock, Ministry of Agriculture and Rural Affairs, Beijing, China; Beijing Engineering and Technology Research Center for Internet of Things in Agriculture, Beijing, China.
- Daoliang Li
- College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China; National Innovation Center for Digital Fishery, Beijing, China; Key Laboratory of Smart Farming Technologies for Aquatic Animal and Livestock, Ministry of Agriculture and Rural Affairs, Beijing, China; Beijing Engineering and Technology Research Center for Internet of Things in Agriculture, Beijing, China.
- Zhencai Shen
- National Innovation Center for Digital Fishery, Beijing, China; Key Laboratory of Smart Farming Technologies for Aquatic Animal and Livestock, Ministry of Agriculture and Rural Affairs, Beijing, China; Beijing Engineering and Technology Research Center for Internet of Things in Agriculture, Beijing, China; College of Science, China Agricultural University, Beijing, 100083, China.
29
Pham VT, Zniyed Y, Nguyen TP. Efficient tensor decomposition-based filter pruning. Neural Netw 2024; 178:106393. [PMID: 38830300 DOI: 10.1016/j.neunet.2024.106393] [Received: 10/12/2023] [Revised: 02/16/2024] [Accepted: 05/15/2024] [Indexed: 06/05/2024]
Abstract
In this paper, we present CORING, which is short for effiCient tensOr decomposition-based filteR prunING, a novel filter pruning methodology for neural networks. CORING is crafted to achieve efficient tensor decomposition-based pruning, a stark departure from conventional approaches that rely on vectorized or matricized filter representations. Our approach represents a significant leap forward in the field by introducing tensor decompositions, specifically the HOSVD, which preserves the multidimensional nature of filters while providing a low-rank approximation, thus substantially reducing complexity. Furthermore, we introduce a versatile method for calculating filter similarity by using the low-rank approximation offered by the HOSVD. This obviates the need for using full filters or reshaped versions and enhances the overall efficiency and effectiveness of our approach. Extensive experimentation across diverse architectures and datasets spanning various vision tasks, including image classification, object detection, instance segmentation, and keypoint detection, validates CORING's prowess. Remarkably, it outperforms state-of-the-art methods in reducing MACs and parameters, consistently enhancing validation accuracy. Furthermore, we supplement our quantitative results with a comprehensive ablation study, providing substantial evidence of the efficiency of our tensor-based approach. Beyond quantitative outcomes, qualitative results vividly illustrate CORING's ability to retain essential features within pruned neural networks. Our code is available for research purposes.
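A simplified reading of the HOSVD step follows: each filter is kept as a 3-D tensor, truncated HOSVD gives its low-rank approximation, and similarity is measured between approximations rather than reshaped weights. The ranks, the L2 distance, and the function names are illustrative, not the paper's exact choices:

```python
import numpy as np

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd_low_rank(T, ranks):
    """Truncated HOSVD: per-mode leading left singular vectors of the
    mode unfoldings give factor matrices; project T to a small core,
    then expand back to a low-rank approximation."""
    U = []
    for m, r in enumerate(ranks):
        unfold = np.moveaxis(T, m, 0).reshape(T.shape[m], -1)
        U.append(np.linalg.svd(unfold, full_matrices=False)[0][:, :r])
    core = T
    for m, Um in enumerate(U):
        core = mode_product(core, Um.T, m)   # project onto the factors
    approx = core
    for m, Um in enumerate(U):
        approx = mode_product(approx, Um, m)  # expand back
    return approx

def filter_distance(f1, f2, ranks=(2, 2, 2)):
    """Distance between two filters measured on their HOSVD low-rank
    approximations instead of the raw (vectorized) weights."""
    return np.linalg.norm(hosvd_low_rank(f1, ranks) - hosvd_low_rank(f2, ranks))
```

Working on the low-rank approximations is what keeps the multidimensional structure of each filter while shrinking the cost of every pairwise similarity evaluation.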
Affiliation(s)
- Van Tien Pham
- Université de Toulon, Aix Marseille University, CNRS, LIS UMR 7020, France.
- Yassine Zniyed
- Université de Toulon, Aix Marseille University, CNRS, LIS UMR 7020, France.
30
Reale-Nosei G, Amador-Domínguez E, Serrano E. From vision to text: A comprehensive review of natural image captioning in medical diagnosis and radiology report generation. Med Image Anal 2024; 97:103264. [PMID: 39013207 DOI: 10.1016/j.media.2024.103264] [Received: 08/09/2023] [Revised: 04/25/2024] [Accepted: 07/01/2024] [Indexed: 07/18/2024]
Abstract
Natural Image Captioning (NIC) is an interdisciplinary research area that lies within the intersection of Computer Vision (CV) and Natural Language Processing (NLP). Several works have been presented on the subject, ranging from the early template-based approaches to the more recent deep learning-based methods. This paper conducts a survey in the area of NIC, especially focusing on its applications for Medical Image Captioning (MIC) and Diagnostic Captioning (DC) in the field of radiology. A review of the state of the art is conducted, summarizing key research works in NIC and DC to provide a wide overview of the subject. These works include existing NIC and MIC models, datasets, evaluation metrics, and previous reviews in the specialized literature. The reviewed work is thoroughly analyzed and discussed, highlighting the limitations of existing approaches and their potential implications in real clinical practice. Similarly, potential future research lines are outlined on the basis of the detected limitations.
Affiliation(s)
- Gabriel Reale-Nosei
- ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
- Elvira Amador-Domínguez
- Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Departamento de Sistemas Informáticos, ETSI Sistemas Informáticos, Universidad Politécnica de Madrid, 28031 Madrid, Spain.
- Emilio Serrano
- Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
31
Di Lernia D, Finotti G, Tsakiris M, Riva G, Naber M. Remote photoplethysmography (rPPG) in the wild: Remote heart rate imaging via online webcams. Behav Res Methods 2024; 56:6904-6914. [PMID: 38632165 DOI: 10.3758/s13428-024-02398-0] [Accepted: 03/12/2024] [Indexed: 04/19/2024]
Abstract
Remote photoplethysmography (rPPG) is a low-cost technique to measure physiological parameters such as heart rate by analyzing videos of a person. There has been growing attention to this technique due to the increased possibilities and demand for running psychological experiments on online platforms. Technological advancements in commercially available cameras and video processing algorithms have led to significant progress in this field. However, despite these advancements, past research indicates that suboptimal video recording conditions can severely compromise the accuracy of rPPG. In this study, we aimed to develop an open-source rPPG methodology and test its performance on videos collected via an online platform, without control of the hardware of the participants and the contextual variables, such as illumination, distance, and motion. Across two experiments, we compared the results of the rPPG extraction methodology to a validated dataset used for rPPG testing. Furthermore, we then collected 231 online video recordings and compared the results of the rPPG extraction to finger pulse oximeter data acquired with a validated mobile heart rate application. Results indicated that the rPPG algorithm was highly accurate, showing a significant degree of convergence with both datasets thus providing an improved tool for recording and analyzing heart rate in online experiments.
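A minimal rPPG pipeline reduces each video frame to a mean skin-pixel green value and reads the heart rate off the spectral peak of that trace. The sketch below assumes such a per-frame trace has already been extracted from the face region (the published methodology involves considerably more preprocessing):

```python
import numpy as np

def heart_rate_from_trace(green_trace, fps, lo=0.7, hi=4.0):
    """Estimate heart rate (BPM) from a per-frame mean green-channel
    trace: remove the mean, take the FFT, and pick the spectral peak
    inside the physiological 42-240 BPM band."""
    x = np.asarray(green_trace, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    power = np.abs(np.fft.rfft(x)) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return 60.0 * freqs[band][np.argmax(power[band])]
```

Restricting the peak search to a physiological band is one simple defense against the illumination and motion noise that uncontrolled webcam recordings introduce.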
Affiliation(s)
- Daniele Di Lernia
- Humane Technology Lab, Università Cattolica del Sacro Cuore, Largo Gemelli, 1, 20100, Milan, Italy.
- Applied Technology for Neuro-Psychology Lab, IRCCS Istituto Auxologico Italiano, Via Magnasco, 2, 20149, Milan, Italy.
- Department of Psychology, Università Cattolica del Sacro Cuore, Largo Gemelli, 1, 20100, Milan, Italy.
| | - Gianluca Finotti
- Lab of Action and Body, Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX, UK
| | - Manos Tsakiris
- Lab of Action and Body, Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX, UK
- Centre for the Politics of Feelings, School of Advanced Study, University of London, London, UK
| | - Giuseppe Riva
- Humane Technology Lab, Università Cattolica del Sacro Cuore, Largo Gemelli, 1, 20100, Milan, Italy
- Applied Technology for Neuro-Psychology Lab, IRCCS Istituto Auxologico Italiano, Via Magnasco, 2, 20149, Milan, Italy
- Marnix Naber
- Experimental Psychology, Helmholtz Institute, Utrecht University, Heidelberglaan 1, 3584CS, Utrecht, The Netherlands
32
Kang Q, Lao Q, Gao J, Liu J, Yi H, Ma B, Zhang X, Li K. Deblurring masked image modeling for ultrasound image analysis. Med Image Anal 2024; 97:103256. [PMID: 39047605 DOI: 10.1016/j.media.2024.103256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Revised: 03/19/2024] [Accepted: 06/24/2024] [Indexed: 07/27/2024]
Abstract
Recently, large pretrained vision foundation models based on masked image modeling (MIM) have attracted unprecedented attention and achieved remarkable performance across various tasks. However, the study of MIM for ultrasound imaging remains relatively unexplored, and most importantly, current MIM approaches fail to account for the gap between natural images and ultrasound, as well as the intrinsic imaging characteristics of the ultrasound modality, such as the high noise-to-signal ratio. In this paper, motivated by the unique high noise-to-signal ratio property in ultrasound, we propose a deblurring MIM approach specialized to ultrasound, which incorporates a deblurring task into the pretraining proxy task. The incorporation of deblurring facilitates the pretraining to better recover the subtle details within ultrasound images that are vital for subsequent downstream analysis. Furthermore, we employ a multi-scale hierarchical encoder to extract both local and global contextual cues for improved performance, especially on pixel-wise tasks such as segmentation. We conduct extensive experiments involving 280,000 ultrasound images for the pretraining and evaluate the downstream transfer performance of the pretrained model on various disease diagnoses (nodule, Hashimoto's thyroiditis) and task types (classification, segmentation). The experimental results demonstrate the efficacy of the proposed deblurring MIM, achieving state-of-the-art performance across a wide range of downstream tasks and datasets. Overall, our work highlights the potential of deblurring MIM for ultrasound image analysis, presenting an ultrasound-specific vision foundation model.
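The proxy-task construction the abstract describes (mask patches of a blurred input, reconstruct the sharp original) can be sketched without any network. The helper names, the box blur, and the patch scheme below are invented for the example and only illustrate the data side of deblurring MIM.

```python
import random

def box_blur(img, k=1):
    """Mean filter; stands in for the blurring applied in the proxy task."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[ii][jj]
                    for ii in range(max(0, i - k), min(h, i + k + 1))
                    for jj in range(max(0, j - k), min(w, j + k + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out

def make_deblur_mim_pair(img, mask_ratio=0.5, patch=2, seed=0):
    """Build (input, target, mask): the input is the *blurred* image with
    random patches zeroed out; the target is the original sharp image. A
    model trained on this pair must jointly inpaint the masked patches and
    undo the blur, encouraging recovery of subtle details."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    masked = [row[:] for row in box_blur(img)]
    mask = [[False] * w for _ in range(h)]
    for pi in range(0, h, patch):
        for pj in range(0, w, patch):
            if rng.random() < mask_ratio:
                for i in range(pi, min(pi + patch, h)):
                    for j in range(pj, min(pj + patch, w)):
                        masked[i][j] = 0.0
                        mask[i][j] = True
    return masked, img, mask

def masked_mse(pred, target, mask):
    """Reconstruction loss evaluated on the masked positions only."""
    terms = [(pred[i][j] - target[i][j]) ** 2
             for i in range(len(pred)) for j in range(len(pred[0])) if mask[i][j]]
    return sum(terms) / max(1, len(terms))
```

The key difference from plain MIM is that the target is sharper than the visible input, so even a perfect copy of the visible context incurs a loss until the blur is undone.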
Affiliation(s)
- Qingbo Kang
- Department of Ultrasonography, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200030, China
- Qicheng Lao
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200030, China.
- Jun Gao
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; College of Computer Science, Sichuan University, Chengdu, Sichuan, 610041, China
- Jingyan Liu
- Department of Ultrasonography, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Huahui Yi
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Buyun Ma
- Department of Ultrasonography, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Xiaofan Zhang
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200030, China; Shanghai Jiao Tong University, Shanghai, 200240, China
- Kang Li
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200030, China.
33
Wang Z, Zou H, Guo Y, Guo S, Zhao X, Wang Y, Sun M. Retinal image registration method for myopia development. Med Image Anal 2024; 97:103242. [PMID: 38901099 DOI: 10.1016/j.media.2024.103242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2023] [Revised: 04/23/2024] [Accepted: 06/10/2024] [Indexed: 06/22/2024]
Abstract
OBJECTIVE The development of myopia is usually accompanied by changes in retinal vessels, optic disc, optic cup, fovea, and other retinal structures, as well as in the length of the ocular axis, and accurate registration of retinal images is very important for the extraction and analysis of retinal structural changes. However, the registration of retinal images with myopia development faces a series of challenges due to the unique curved surface of the retina, as well as the changes in fundus curvature caused by ocular axis elongation. Therefore, our goal is to improve the registration accuracy of retinal images with myopia development. METHOD In this study, we propose a 3D spatial model for the pair of retinal images with myopia development. In this model, we introduce a novel myopia development model that simulates the changes in the length of the ocular axis and in fundus curvature due to the development of myopia. We also consider the distortion model of the fundus camera during the imaging process. Based on the 3D spatial model, we further implement a registration framework, which utilizes corresponding points in the pair of retinal images to achieve registration by way of 3D pose estimation. RESULTS The proposed method is quantitatively evaluated on a publicly available dataset without myopia development and on our Fundus Image Myopia Development (FIMD) dataset. It is shown to perform more accurate and stable registration than state-of-the-art methods, especially for retinal images with myopia development. SIGNIFICANCE To the best of our knowledge, this is the first retinal image registration method for the study of myopia development. The method significantly improves the registration accuracy of retinal images with myopia development. The FIMD dataset we constructed has been made publicly available to promote research in related fields.
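For intuition, the registration-from-correspondences step can be illustrated with its simplest planar analogue, a closed-form least-squares similarity fit from corresponding points. The paper's actual method estimates a 3D pose on a curved retina model with an explicit myopia development model, which this sketch does not attempt; the function names are invented.

```python
def fit_similarity(src, dst):
    """Least-squares 2D similarity (scale + rotation + translation) from
    point correspondences, via complex arithmetic: find m, t minimising
    sum |m*src_i + t - dst_i|^2."""
    a = [complex(x, y) for x, y in src]
    b = [complex(x, y) for x, y in dst]
    ca, cb = sum(a) / len(a), sum(b) / len(b)
    a = [z - ca for z in a]          # centre both point sets
    b = [z - cb for z in b]
    m = sum(bi * ai.conjugate() for ai, bi in zip(a, b)) / sum(abs(ai) ** 2 for ai in a)
    t = cb - m * ca
    return m, t                      # the fitted map is z -> m*z + t

def apply_similarity(m, t, pt):
    z = m * complex(pt[0], pt[1]) + t
    return (z.real, z.imag)
```

With exact correspondences the fit recovers the transform exactly; with noisy keypoint matches it returns the least-squares optimum, which is the same role pose estimation plays in the paper's 3D setting.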
Affiliation(s)
- Zengshuo Wang
- Nankai University Eye Institute, Nankai University, Tianjin 300350, China; Institute of Robotics and Automatic Information System (IRAIS), the Tianjin Key Laboratory of Intelligent Robotic (tjKLIR), Nankai University, Tianjin 300350, China
- Haohan Zou
- Nankai University Eye Institute, Nankai University, Tianjin 300350, China; Tianjin Eye Hospital, Tianjin Eye Institute, Tianjin Key Laboratory of Ophthalmology and Visual Science, Tianjin Medical University, Tianjin 300350, China
- Yin Guo
- Department of Ophthalmology, Haidian Section of Peking University Third Hospital (Beijing Haidian Hospital), Beijing 100089, China
- Shan Guo
- Nankai University Eye Institute, Nankai University, Tianjin 300350, China; Institute of Robotics and Automatic Information System (IRAIS), the Tianjin Key Laboratory of Intelligent Robotic (tjKLIR), Nankai University, Tianjin 300350, China
- Xin Zhao
- Nankai University Eye Institute, Nankai University, Tianjin 300350, China; Institute of Robotics and Automatic Information System (IRAIS), the Tianjin Key Laboratory of Intelligent Robotic (tjKLIR), Nankai University, Tianjin 300350, China
- Yan Wang
- Nankai University Eye Institute, Nankai University, Tianjin 300350, China; Tianjin Eye Hospital, Tianjin Eye Institute, Tianjin Key Laboratory of Ophthalmology and Visual Science, Tianjin Medical University, Tianjin 300350, China.
- Mingzhu Sun
- Nankai University Eye Institute, Nankai University, Tianjin 300350, China; Institute of Robotics and Automatic Information System (IRAIS), the Tianjin Key Laboratory of Intelligent Robotic (tjKLIR), Nankai University, Tianjin 300350, China.
34
Yang S, Huang Q, Yu M. Advancements in remote sensing for active fire detection: A review of datasets and methods. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 943:173273. [PMID: 38823698 DOI: 10.1016/j.scitotenv.2024.173273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Revised: 04/06/2024] [Accepted: 05/13/2024] [Indexed: 06/03/2024]
Abstract
This study comprehensively and critically reviews active fire detection advancements in remote sensing from 1975 to the present, focusing on two main perspectives: datasets and corresponding instruments, and detection algorithms. The study highlights the increasing role of machine learning, particularly deep learning techniques, in active fire detection. Looking forward, the review outlines current challenges and future research opportunities in remote sensing for active fire detection. These include exploring data quality management and multi-modal learning, developing spatiotemporally explicit models, investigating self-supervised learning models, improving explainable and interpretable models, integrating physical-process based models with machine learning, and building digital twins to replicate wildfire dynamics and perform what-if scenario analysis. The review aims to serve as a valuable resource for informing natural resource management and enhancing environmental protection efforts through the application of remote sensing technology.
Affiliation(s)
- Songxi Yang
- Spatial Computing and Data Mining Lab, Department of Geography, University of Wisconsin-Madison, Madison 53705, WI, USA
- Qunying Huang
- Spatial Computing and Data Mining Lab, Department of Geography, University of Wisconsin-Madison, Madison 53705, WI, USA.
- Manzhu Yu
- Department of Geography, Pennsylvania State University, University Park, 16802, PA, USA
35
Zhou Y, He B, Cao X, Xiao Y, Feng Q, Yang F, Xiao F, Geng X, Du Y. Remotely sensed estimates of long-term biochemical oxygen demand over Hong Kong marine waters using machine learning enhanced by imbalanced label optimisation. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 943:173748. [PMID: 38857793 DOI: 10.1016/j.scitotenv.2024.173748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/19/2023] [Revised: 04/30/2024] [Accepted: 06/02/2024] [Indexed: 06/12/2024]
Abstract
In many coastal cities around the world, continuing water degradation threatens the living environment of humans and aquatic organisms. To assess and control the water pollution situation, this study estimated the Biochemical Oxygen Demand (BOD) concentration of Hong Kong's marine waters using remote sensing and an improved machine learning (ML) method. The scheme was derived from four ML algorithms (RBF, SVR, RF, XGB) and calibrated using a large amount (N > 1000) of in-situ BOD5 data. Based on labeled datasets with different preprocessing, i.e., the original BOD5, the log10(BOD5), and label distribution smoothing (LDS), three types of models were trained and evaluated. The results highlight the superior potential of the LDS-based model to improve BOD5 estimates by dealing with an imbalanced training dataset. Additionally, XGB and RF outperformed RBF and SVR when the model was developed using log10(BOD5) or LDS(BOD5). Over two decades, the BOD5 concentration of Hong Kong marine waters in the autumn (Sep. to Nov.) shows a downward trend, with significant decreases in Deep Bay, Western Buffer, Victoria Harbour, Eastern Buffer, Junk Bay, Port Shelter, and the Tolo Harbour and Channel. Principal component analysis revealed that nutrient levels emerged as the predominant factor in Victoria Harbour and the interior of Deep Bay, while chlorophyll-related and physical parameters were dominant in Southern, Mirs Bay, Northwestern, and the outlet of Deep Bay. LDS provides a new perspective to improve ML-based water quality estimation by alleviating the imbalance in the labeled dataset. Overall, the remotely sensed BOD5 can offer insight into the spatial-temporal distribution of organic matter in Hong Kong coastal waters and valuable guidance for pollution control.
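Label distribution smoothing, the preprocessing the abstract credits for the improvement, can be sketched in a few lines: smooth the empirical label histogram with a Gaussian kernel and reweight samples by the inverse of the smoothed ("effective") density, so rare label ranges count more during training. The bin width and kernel parameters below are arbitrary choices for the example, not the study's settings.

```python
import math

def lds_weights(labels, bin_width=0.5, sigma=1.0, radius=2):
    """Per-sample weights from a Gaussian-smoothed label histogram."""
    bins = [round(y / bin_width) for y in labels]
    lo, hi = min(bins), max(bins)
    hist = {b: 0 for b in range(lo, hi + 1)}
    for b in bins:
        hist[b] += 1
    # truncated Gaussian kernel, normalised to sum to 1
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    ksum = sum(kernel)
    kernel = [v / ksum for v in kernel]
    smoothed = {b: sum(kernel[k + radius] * hist.get(b + k, 0)
                       for k in range(-radius, radius + 1))
                for b in hist}
    w = [1.0 / max(smoothed[b], 1e-8) for b in bins]
    scale = len(w) / sum(w)            # normalise so the mean weight is 1
    return [wi * scale for wi in w]
```

On an imbalanced label set the few high-BOD5 samples receive much larger weights than the bulk of low-BOD5 samples, which is the effect that helps the regressor on the rare, high-pollution cases.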
Affiliation(s)
- Yadong Zhou
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
- Boayin He
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China.
- Xiaoyu Cao
- School of Geography and Ocean Science, Nanjing University, Nanjing 210023, China
- Yu Xiao
- Key Laboratory of Wetland Ecology and Environment, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Qi Feng
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
- Fan Yang
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Fei Xiao
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
- Xueer Geng
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yun Du
- Key Laboratory for Environment and Disaster Monitoring and Evaluation of Hubei, Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, China
36
Kuo JC, Chan W, Leon-Novelo L, Lairson DR, Brown A, Fujimoto K. Latent classification model for censored longitudinal binary outcome. Stat Med 2024; 43:3943-3957. [PMID: 38951953 DOI: 10.1002/sim.10156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2023] [Revised: 04/23/2024] [Accepted: 06/10/2024] [Indexed: 07/03/2024]
Abstract
Latent classification models are a class of statistical methods for identifying unobserved class membership among the study samples using some observed data. In this study, we proposed a latent classification model that takes a censored longitudinal binary outcome variable and uses its changing pattern over time to predict individuals' latent class membership. Assuming the time-dependent outcome variables follow a continuous-time Markov chain, the proposed method has two primary goals: (1) estimate the distribution of the latent classes and predict individuals' class membership, and (2) estimate the class-specific transition rates and rate ratios. To assess the model's performance, we conducted a simulation study and verified that our algorithm produces accurate model estimates (ie, small bias) with reasonable confidence intervals (ie, achieving approximately 95% coverage probability). Furthermore, we compared our model to four other existing latent class models and demonstrated that our approach yields higher prediction accuracies for latent classes. We applied our proposed method to analyze COVID-19 data collected in Houston, Texas, US between January 1, 2021 and December 31, 2021. Early reports on the COVID-19 pandemic showed that the severity of a SARS-CoV-2 infection tends to vary greatly by case. We found that while demographic characteristics explain some of the differences in individuals' experience with COVID-19, some unaccounted-for latent variables were associated with the disease.
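The continuous-time Markov chain underlying the model has a simple closed form in the two-state (binary outcome) case: the matrix exponential of the generator reduces to one exponential term. A minimal sketch, with arbitrary rates; in the latent-class setting each class would carry its own rate pair.

```python
import math

def two_state_ctmc_P(lam, mu, t):
    """Transition probability matrix P(t) of a two-state continuous-time
    Markov chain with rate lam for 0 -> 1 and mu for 1 -> 0, from the
    closed form of exp(Q t) for the 2x2 generator Q."""
    s = lam + mu
    e = math.exp(-s * t)
    p01 = lam / s * (1 - e)          # P(state 1 at time t | state 0 at time 0)
    p10 = mu / s * (1 - e)           # P(state 0 at time t | state 1 at time 0)
    return [[1 - p01, p01],
            [p10, 1 - p10]]
```

As t grows, both rows converge to the stationary distribution (mu/(lam+mu), lam/(lam+mu)); class-specific rate ratios compare the lam (or mu) values across latent classes.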
Affiliation(s)
- Jacky C Kuo
- Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Wenyaw Chan
- Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Luis Leon-Novelo
- Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
- David R Lairson
- Department of Management, Policy and Community Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Armand Brown
- Bureau of Epidemiology, Houston Health Department, Houston, Texas, USA
- Kayo Fujimoto
- Department of Health Promotion and Behavioral Sciences, University of Texas Health Science Center at Houston, Houston, Texas, USA
37
Yao T, Li Y, Pan Y, Mei T. HIRI-ViT: Scaling Vision Transformer With High Resolution Inputs. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:6431-6442. [PMID: 38502628 DOI: 10.1109/tpami.2024.3379457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/21/2024]
Abstract
The hybrid deep models of Vision Transformer (ViT) and Convolution Neural Network (CNN) have emerged as a powerful class of backbones for vision tasks. Scaling up the input resolution of such hybrid backbones naturally strengthens model capacity, but inevitably suffers from heavy computational cost that scales quadratically. Instead, we present a new hybrid backbone with HIgh-Resolution Inputs (namely HIRI-ViT), which upgrades the prevalent four-stage ViT to a five-stage ViT tailored for high-resolution inputs. HIRI-ViT is built upon the seminal idea of decomposing typical CNN operations into two parallel CNN branches in a cost-efficient manner. One high-resolution branch directly takes primary high-resolution features as inputs, but uses fewer convolution operations. The other low-resolution branch first performs down-sampling and then utilizes more convolution operations over such low-resolution features. Experiments on both the recognition task (ImageNet-1K dataset) and dense prediction tasks (COCO and ADE20K datasets) demonstrate the superiority of HIRI-ViT. Most remarkably, under comparable computational cost (∼5.0 GFLOPs), HIRI-ViT achieves the best published Top-1 accuracy to date of 84.3% on ImageNet with 448×448 inputs, an absolute improvement of 0.9% over the 83.4% of iFormer-S with 224×224 inputs.
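The cost argument behind the two-branch design is easy to verify with a back-of-the-envelope FLOPs count: convolution cost grows quadratically with input resolution, so a light full-resolution branch plus heavier half-resolution branches stays well below a single full-resolution stem. The channel counts below are illustrative, not the paper's configuration.

```python
def conv_flops(h, w, cin, cout, k=3):
    """Multiply-accumulate count of one k x k convolution on an h x w map."""
    return h * w * cin * cout * k * k

# Doubling the input resolution quadruples the cost of a single-branch stem:
single_224 = conv_flops(224, 224, 3, 64)
single_448 = conv_flops(448, 448, 3, 64)
assert single_448 == 4 * single_224

# Two-branch sketch: a thin convolution at full resolution plus heavier
# convolutions at half resolution, versus everything at full resolution.
two_branch = conv_flops(448, 448, 3, 16) + 2 * conv_flops(224, 224, 3, 64)
assert two_branch < single_448
print(two_branch / single_448)
```

With these toy channel counts the two-branch stem costs three quarters of the single full-resolution stem, and the saving compounds across stages.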
38
Liu G, Zhang J, Chan AB, Hsiao JH. Human attention guided explainable artificial intelligence for computer vision models. Neural Netw 2024; 177:106392. [PMID: 38788290 DOI: 10.1016/j.neunet.2024.106392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2023] [Revised: 05/11/2024] [Accepted: 05/13/2024] [Indexed: 05/26/2024]
Abstract
Explainable artificial intelligence (XAI) has been increasingly investigated to enhance the transparency of black-box artificial intelligence models, promoting better user understanding and trust. Developing an XAI that is faithful to models and plausible to users is both a necessity and a challenge. This work examines whether embedding human attention knowledge into saliency-based XAI methods for computer vision models could enhance their plausibility and faithfulness. Two novel XAI methods for object detection models, namely FullGrad-CAM and FullGrad-CAM++, were first developed to generate object-specific explanations by extending the current gradient-based XAI methods for image classification models. Using human attention as the objective plausibility measure, these methods achieve higher explanation plausibility. Interestingly, all current XAI methods when applied to object detection models generally produce saliency maps that are less faithful to the model than human attention maps from the same object detection task. Accordingly, human attention-guided XAI (HAG-XAI) was proposed to learn from human attention how to best combine explanatory information from the models to enhance explanation plausibility by using trainable activation functions and smoothing kernels to maximize the similarity between XAI saliency map and human attention map. The proposed XAI methods were evaluated on widely used BDD-100K, MS-COCO, and ImageNet datasets and compared with typical gradient-based and perturbation-based XAI methods. Results suggest that HAG-XAI enhanced explanation plausibility and user trust at the expense of faithfulness for image classification models, and it enhanced plausibility, faithfulness, and user trust simultaneously and outperformed existing state-of-the-art XAI methods for object detection models.
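A toy version of the human-attention-guided combination: choose the mixing weight between two model saliency maps that maximizes similarity to a human attention map. The paper learns trainable activation functions and smoothing kernels rather than a single scalar, so this grid search is only a sketch of the objective; the names are invented.

```python
import math

def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def fit_mixing_weight(maps_a, maps_b, human_maps, grid=21):
    """Grid-search the alpha that combines two flattened saliency maps so
    the mixture best matches human attention (mean cosine similarity)."""
    best_alpha, best_sim = 0.0, -1.0
    for g in range(grid):
        alpha = g / (grid - 1)
        sims = []
        for a, b, h in zip(maps_a, maps_b, human_maps):
            mix = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
            sims.append(cosine(mix, h))
        mean_sim = sum(sims) / len(sims)
        if mean_sim > best_sim:
            best_alpha, best_sim = alpha, mean_sim
    return best_alpha, best_sim

maps_a = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0]]
maps_b = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
alpha, sim = fit_mixing_weight(maps_a, maps_b, human_maps=maps_a)
```

When the human maps happen to coincide with one source, the fit puts all weight on that source, which is the sanity check used in the test below.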
Affiliation(s)
- Guoyang Liu
- School of Integrated Circuits, Shandong University, Jinan, China; Department of Psychology, University of Hong Kong, Pokfulam Road, Hong Kong.
- Antoni B Chan
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong.
- Janet H Hsiao
- Division of Social Science, Hong Kong University of Science and Technology, Clearwater Bay, Hong Kong; Department of Psychology, University of Hong Kong, Pokfulam Road, Hong Kong.
39
Yamada A, Hanaoka S, Takenaga T, Miki S, Yoshikawa T, Nomura Y. Investigation of distributed learning for automated lesion detection in head MR images. Radiol Phys Technol 2024; 17:725-738. [PMID: 39048847 PMCID: PMC11341643 DOI: 10.1007/s12194-024-00827-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2024] [Revised: 06/11/2024] [Accepted: 07/14/2024] [Indexed: 07/27/2024]
Abstract
In this study, we investigated the application of distributed learning, including federated learning and cyclical weight transfer, in the development of computer-aided detection (CADe) software for (1) cerebral aneurysm detection in magnetic resonance (MR) angiography images and (2) brain metastasis detection in brain contrast-enhanced MR images. We used datasets collected from various institutions, scanner vendors, and magnetic field strengths for each target CADe software. We compared the performance of multiple strategies, including a centralized strategy, in which software development is conducted at a development institution after collecting de-identified data from multiple institutions. Our results showed that the performance of CADe software trained through distributed learning was equal to or better than that trained through the centralized strategy. However, the distributed learning strategy that achieved the highest performance depended on the target CADe software. Hence, distributed learning can become one of the strategies for CADe software development using data collected from multiple institutions.
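The two distributed strategies compared in the study can be sketched abstractly: federated averaging aggregates locally trained weights on a server, while cyclical weight transfer passes one model from institution to institution. The local-update rule used in the test is a stand-in for actual institution-level training.

```python
def fedavg(institution_weights, sizes):
    """Federated averaging: institutions train locally; the server takes a
    data-size-weighted mean of the parameter vectors."""
    total = sum(sizes)
    dim = len(institution_weights[0])
    return [sum(w[i] * s for w, s in zip(institution_weights, sizes)) / total
            for i in range(dim)]

def cyclical_weight_transfer(init, institutions, local_update, cycles=2):
    """Cyclical weight transfer: a single model visits the institutions in
    turn, and each applies its own local update to the incoming weights."""
    w = init
    for _ in range(cycles):
        for inst in institutions:
            w = local_update(w, inst)
    return w
```

In both schemes no raw images leave an institution; only parameters move, which is the privacy property that motivates distributed CADe development.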
Affiliation(s)
- Aiki Yamada
- Department of Medical Engineering, Graduate School of Science and Engineering, Chiba University, 1-33 Yayoi-Cho, Inage-Ku, Chiba, 263-8522, Japan.
- Department of Radiology, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-8655, Japan.
- Shouhei Hanaoka
- Department of Radiology, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-8655, Japan
- Tomomi Takenaga
- Department of Radiology, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-8655, Japan
- Soichiro Miki
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-8655, Japan
- Takeharu Yoshikawa
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-8655, Japan
- Yukihiro Nomura
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-8655, Japan
- Center for Frontier Medical Engineering, Chiba University, 1-33 Yayoi-Cho, Inage-Ku, Chiba, 263-8522, Japan
40
Xiao G, Yu J, Ma J, Fan DP, Shao L. Latent Semantic Consensus for Deterministic Geometric Model Fitting. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:6139-6153. [PMID: 38478435 DOI: 10.1109/tpami.2024.3376731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/20/2024]
Abstract
Estimating reliable geometric model parameters from data with severe outliers is a fundamental and important task in computer vision. This paper attempts to sample high-quality subsets and select model instances to estimate parameters in multi-structural data. To address this, we propose an effective method called Latent Semantic Consensus (LSC). The principle of LSC is to preserve the latent semantic consensus in both data points and model hypotheses. Specifically, LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses, respectively. Then, LSC explores the distributions of points in the two latent semantic spaces to remove outliers, generate high-quality model hypotheses, and effectively estimate model instances. Finally, LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting, owing to its deterministic fitting nature and efficiency. Compared with several state-of-the-art model fitting methods, our LSC achieves significant superiority in both accuracy and speed on synthetic data and real images.
41
Huang YH, Cao YP, Lai YK, Shan Y, Gao L. NeRF-Texture: Synthesizing Neural Radiance Field Textures. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:5986-6000. [PMID: 38564349 DOI: 10.1109/tpami.2024.3382198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/04/2024]
Abstract
Texture synthesis is a fundamental problem in computer graphics that would benefit various applications. Existing methods are effective in handling 2D image textures. In contrast, many real-world textures contain meso-structure in the 3D geometry space, such as grass, leaves, and fabrics, which cannot be effectively modeled using only 2D image textures. We propose a novel texture synthesis method with Neural Radiance Fields (NeRF) to capture and synthesize textures from given multi-view images. In the proposed NeRF texture representation, a scene with fine geometric details is disentangled into the meso-structure textures and the underlying base shape. This allows textures with meso-structure to be effectively learned as latent features situated on the base shape, which are fed into a NeRF decoder trained simultaneously to represent the rich view-dependent appearance. Using this implicit representation, we can synthesize NeRF-based textures through patch matching of latent features. However, inconsistencies between the metrics of the reconstructed content space and the latent feature space may compromise the synthesis quality. To enhance matching performance, we further regularize the distribution of latent features by incorporating a clustering constraint. In addition to generating NeRF textures over a planar domain, our method can also synthesize NeRF textures over curved surfaces, which are practically useful. Experimental results and evaluations demonstrate the effectiveness of our approach.
42
Sun Z, Yang Q, Yan N, Chen S, Zhu J, Zhao J, Sun S. Utilizing deep learning algorithms for automated oil spill detection in medium resolution optical imagery. MARINE POLLUTION BULLETIN 2024; 206:116777. [PMID: 39083910 DOI: 10.1016/j.marpolbul.2024.116777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/24/2024] [Revised: 07/22/2024] [Accepted: 07/23/2024] [Indexed: 08/02/2024]
Abstract
This study evaluates the performance of three typical convolutional neural network-based deep learning algorithms for oil spill detection using medium-resolution optical satellite imagery from Sentinel-2 MSI, Landsat-8 OLI, and Landsat-9 OLI2. Oil slick training and validation datasets were created through a semi-automatic labeling approach, based on chronic and accidental oil spill cases reported worldwide. The research enhances the UNet, BiSeNetV2, and DeepLabV3+ architectures by integrating attention mechanisms, including the Squeeze-and-Excitation (SE) module, the Convolutional Block Attention Module (CBAM), and a Simple, parameter-free Attention Module (SimAM), to determine the optimal model for oil spill detection. Notably, UNet integrated with CBAM, especially with sun glint as a feature, significantly outperformed the others, achieving a micro-average F1 score of 88.8%. This research highlights deep learning's potential in optical remote sensing for oil spill detection, stressing its escalating relevance with the growing deployment of medium- to high-resolution optical satellites.
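One of the attention modules evaluated, Squeeze-and-Excitation, is compact enough to sketch in plain Python: squeeze each channel to a scalar by global average pooling, pass the vector through a small two-layer bottleneck, and rescale the channels by the resulting gates. Weights are supplied by the caller here; in a real network they are learned.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def se_block(feature_maps, w1, w2):
    """Squeeze-and-Excitation over a list of 2D channel maps.
    w1: hidden x channels weights, w2: channels x hidden weights."""
    # squeeze: per-channel global average pool
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # excitation: FC -> ReLU -> FC -> sigmoid
    hidden = [max(0.0, sum(wij * zj for wij, zj in zip(wi, z))) for wi in w1]
    gates = [sigmoid(sum(wij * hj for wij, hj in zip(wi, hidden))) for wi in w2]
    # scale: reweight every pixel of each channel by its gate
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(feature_maps, gates)]

fmaps = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
out = se_block(fmaps, w1=[[1.0, 0.0], [0.0, 1.0]], w2=[[1.0, 0.0], [0.0, 1.0]])
```

CBAM extends this channel gating with a spatial attention map; SimAM derives the gates without any extra parameters. All three plug into a backbone at the same point.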
Affiliation(s)
- Zhen Sun
- Institute of Estuarine and Coastal Research, School of Ocean Engineering and Technology, Sun Yat-sen University, and Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519082, China
- Qingshu Yang
- Institute of Estuarine and Coastal Research, School of Ocean Engineering and Technology, Sun Yat-sen University, and Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519082, China
- Nanyang Yan
- Guangzhou Urban Planning & Design Survey Research Institute, Guangzhou 510060, China; Collaborative Innovation Center for Natural Resources Planning and Marine Technology of Guangzhou, Guangzhou 510060, China
- Siyu Chen
- School of Marine Sciences, Sun Yat-sen University, Zhuhai 519082, China
- Jianhang Zhu
- School of Marine Sciences, Sun Yat-sen University, Zhuhai 519082, China
- Jun Zhao
- School of Marine Sciences, Sun Yat-sen University, Zhuhai 519082, China; Guangdong Provincial Key Laboratory of Marine Resources and Coastal Engineering, Guangzhou 510275, China; Pearl River Estuary Marine Ecosystem Research Station, Ministry of Education, Zhuhai 519000, China
- Shaojie Sun
- School of Marine Sciences, Sun Yat-sen University, Zhuhai 519082, China; Guangdong Provincial Key Laboratory of Marine Resources and Coastal Engineering, Guangzhou 510275, China; Pearl River Estuary Marine Ecosystem Research Station, Ministry of Education, Zhuhai 519000, China.
43
Schumann Y, Dottermusch M, Schweizer L, Krech M, Lempertz T, Schüller U, Neumann P, Neumann JE. Morphology-based molecular classification of spinal cord ependymomas using deep neural networks. Brain Pathol 2024; 34:e13239. [PMID: 38205683 PMCID: PMC11328346 DOI: 10.1111/bpa.13239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Accepted: 12/31/2023] [Indexed: 01/12/2024] Open
Abstract
Based on DNA-methylation, ependymomas growing in the spinal cord comprise two major molecular types termed spinal (SP-EPN) and myxopapillary ependymomas (MPE(-A/B)), which differ with respect to their clinical features and prognosis. Due to the existing discrepancy between histomorphogical diagnoses and classification using methylation data, we asked whether deep neural networks can predict the DNA methylation class of spinal cord ependymomas from hematoxylin and eosin stained whole-slide images. Using explainable AI, we further aimed to prospectively improve the consistency of histology-based diagnoses with DNA methylation profiling by identifying and quantifying distinct morphological patterns of these molecular ependymoma types. We assembled a case series of 139 molecularly characterized spinal cord ependymomas (nMPE = 84, nSP-EPN = 55). Self-supervised and weakly-supervised neural networks were used for classification. We employed attention analysis and supervised machine-learning methods for the discovery and quantification of morphological features and their correlation to the diagnoses of experienced neuropathologists. Our best performing model predicted the DNA methylation class with 98% test accuracy and used self-supervised learning to outperform pretrained encoder-networks (86% test accuracy). In contrast, the diagnoses of neuropathologists matched the DNA methylation class in only 83% of cases. Domain-adaptation techniques improved model generalization to an external validation cohort by up to 22%. Statistically significant morphological features were identified per molecular type and quantitatively correlated to human diagnoses. The approach was extended to recently defined subtypes of myxopapillary ependymomas (MPE-(A/B), 80% test accuracy). In summary, we demonstrated the accurate prediction of the DNA methylation class of spinal cord ependymomas (SP-EPN, MPE(-A/B)) using hematoxylin and eosin stained whole-slide images. 
Our approach may prospectively serve as a supplementary resource for integrated diagnostics and may even help to establish a standardized, high-quality level of histology-based diagnostics across institutions, in particular in low-income countries where expensive DNA-methylation analyses may not be readily available.
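Weakly-supervised classification of whole-slide images is commonly built on attention-based multiple-instance pooling: patch embeddings are weighted by learned attention scores and averaged into a slide-level feature. The abstract does not specify the exact architecture, so the sketch below is only illustrative of the pooling idea; the function names, shapes, and weights are assumptions, not the authors' model.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_mil_logit(patch_feats, w_attn, w_cls):
    """Slide-level prediction from patch embeddings via attention pooling:
    each patch gets a learned weight, and the slide feature is the
    attention-weighted mean of the patch features."""
    alpha = softmax(patch_feats @ w_attn)   # (n_patches,) weights, sum to 1
    slide_feat = alpha @ patch_feats        # (feat_dim,) pooled slide embedding
    return float(slide_feat @ w_cls)        # scalar logit for the molecular class

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 4))             # 5 patches, 4-dim embeddings
logit = attention_mil_logit(feats, np.zeros(4), np.ones(4))
# with zero attention parameters the pooling reduces to a plain mean
```

Attention weights of this kind are also what makes the attention analysis mentioned in the abstract possible: high-weight patches indicate which morphology drove the slide-level call.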
Affiliation(s)
- Yannis Schumann
  - Chair for High Performance Computing, Helmut-Schmidt-University Hamburg, Hamburg, Germany
- Matthias Dottermusch
  - Center for Molecular Neurobiology (ZMNH), University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
  - Institute of Neuropathology, UKE, Hamburg, Germany
- Leonille Schweizer
  - Institute of Neurology (Edinger Institute), University Hospital Frankfurt, Goethe University, Frankfurt am Main, Germany
  - German Cancer Consortium (DKTK), Partner Site Frankfurt/Mainz, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Frankfurt Cancer Institute (FCI), Frankfurt am Main, Germany
- Maja Krech
  - Institute for Neuropathology, Charité Berlin, Berlin, Germany
- Tasja Lempertz
  - Center for Molecular Neurobiology (ZMNH), University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Ulrich Schüller
  - Institute of Neuropathology, UKE, Hamburg, Germany
  - Research Institute Children's Cancer Center Hamburg, UKE, Hamburg, Germany
  - Department of Pediatric Hematology and Oncology, UKE, Hamburg, Germany
- Philipp Neumann
  - Chair for High Performance Computing, Helmut-Schmidt-University Hamburg, Hamburg, Germany
- Julia E Neumann
  - Center for Molecular Neurobiology (ZMNH), University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
  - Institute of Neuropathology, UKE, Hamburg, Germany
44
Meng L, Li Y, Duan W. Three-stage polyp segmentation network based on reverse attention feature purification with Pyramid Vision Transformer. Comput Biol Med 2024; 179:108930. [PMID: 39067285 DOI: 10.1016/j.compbiomed.2024.108930] [Received: 02/13/2024] [Revised: 06/30/2024] [Accepted: 07/18/2024] [Indexed: 07/30/2024]
Abstract
Colorectal polyps serve as potential precursors of colorectal cancer, and automating polyp segmentation aids physicians in accurately identifying potential polyp regions, thereby reducing misdiagnoses and missed diagnoses. However, existing models often fall short in accurately segmenting polyps due to the high degree of similarity between polyp regions and surrounding tissue in terms of color, texture, and shape. To address this challenge, this study proposes a novel three-stage polyp segmentation network, named Reverse Attention Feature Purification with Pyramid Vision Transformer (RAFPNet), which adopts an iterative feedback UNet architecture to refine polyp saliency maps for precise segmentation. Initially, a Multi-Scale Feature Aggregation (MSFA) module is introduced to generate preliminary polyp saliency maps. Subsequently, a Reverse Attention Feature Purification (RAFP) module is devised to effectively suppress low-level surrounding tissue features while enhancing high-level semantic polyp information based on the preliminary saliency maps. Finally, the UNet architecture is leveraged to further refine the feature maps in a coarse-to-fine manner. Extensive experiments conducted on five widely used polyp segmentation datasets and three video polyp segmentation datasets demonstrate the superior performance of RAFPNet over state-of-the-art models across multiple evaluation metrics.
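The abstract does not spell out the RAFP formulation, but reverse attention is commonly implemented by weighting feature maps with one minus the sigmoid of the current saliency map, so the network attends to the not-yet-salient (boundary and background) evidence. A minimal NumPy sketch of that building block; shapes and names are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reverse_attention(features, saliency):
    """Suppress regions already judged salient so subsequent layers can
    refine the remaining (boundary/background) evidence.
    features: (C, H, W) feature maps; saliency: (H, W) saliency logits."""
    reverse_weight = 1.0 - sigmoid(saliency)      # high where saliency is low
    return features * reverse_weight[None, :, :]  # broadcast over channels

feats = np.ones((2, 4, 4))
sal = np.full((4, 4), 10.0)        # strongly salient everywhere
out = reverse_attention(feats, sal)  # → features almost fully suppressed
```

In an iterative-feedback design, the output of such a step feeds the next refinement stage, which is consistent with the coarse-to-fine refinement described above.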
Affiliation(s)
- Lingbing Meng
  - School of Computer and Software Engineering, Anhui Institute of Information Technology, China
- Yuting Li
  - School of Computer and Software Engineering, Anhui Institute of Information Technology, China
- Weiwei Duan
  - School of Computer and Software Engineering, Anhui Institute of Information Technology, China
45
Dimitriadis SI. ℛSCZ: A Riemannian schizophrenia diagnosis framework based on the multiplexity of EEG-based dynamic functional connectivity patterns. Comput Biol Med 2024; 180:108862. [PMID: 39068901 DOI: 10.1016/j.compbiomed.2024.108862] [Received: 02/11/2024] [Revised: 06/30/2024] [Accepted: 07/06/2024] [Indexed: 07/30/2024]
Abstract
Abnormal electrophysiological (EEG) activity has been widely reported in schizophrenia (SCZ). In the last decade, research has focused on the automatic diagnosis of SCZ via the investigation of aberrant EEG activity and connectivity linked to this mental disorder. These studies followed various preprocessing steps of EEG activity, focusing on frequency-dependent functional connectivity brain network (FCBN) construction while disregarding the topological dependency among edges. An FCBN belongs to the family of symmetric positive definite (SPD) matrices, which form a Riemannian manifold. Due to its unique geometric properties, the whole analysis of FCBNs can be performed on the Riemannian geometry of the SPD space. The advantage of analyzing FCBNs on the SPD space is that it takes into account all the pairwise interdependencies as a whole. However, only a few studies have adopted an FCBN analysis on the SPD manifold, and no study exists on the analysis of dynamic FCBN (dFCBN) tailored to SCZ. In the present study, I analyzed two open EEG-SCZ datasets under a Riemannian geometry of SPD matrices for the dFCBN analysis, and also proposed a multiplexity index that quantifies the associations of multi-frequency brainwave patterns. I adopted a machine-learning procedure employing leave-one-subject-out cross-validation (LOSO-CV), using snapshots of dFCBN from (N-1) subjects to train a battery of classifiers. Each classifier operated on the inter-subject dFCBN distances of sample covariance matrices (SCMs), following a rhythm-dependent decision and a multiplex-dependent one. The proposed ℛSCZ decoder, supported by both the Riemannian geometry of SPD matrices and the multiplexity index DC, reached absolute accuracy (100%) in both datasets in the virtual default mode network (DMN) source space.
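Distances between SPD connectivity matrices are typically computed under the affine-invariant Riemannian metric. The abstract does not state which metric the decoder uses, so the following NumPy sketch is only an illustration of the kind of inter-matrix distance such a pipeline can operate on:

```python
import numpy as np

def spd_logm(S):
    """Matrix logarithm of a symmetric positive definite matrix
    via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def airm_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices:
    d(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F."""
    w, V = np.linalg.eigh(A)
    A_inv_sqrt = (V / np.sqrt(w)) @ V.T
    M = A_inv_sqrt @ B @ A_inv_sqrt
    return np.linalg.norm(spd_logm(M), ord="fro")

A = np.eye(2)
B = np.diag([np.e, np.e])
d = airm_distance(A, B)  # log-eigenvalues are (1, 1), so d = sqrt(2)
```

A distance matrix built this way over (N-1) training subjects' covariance snapshots is exactly the kind of input a LOSO-CV battery of distance-based classifiers can consume.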
Affiliation(s)
- Stavros I Dimitriadis
  - Department of Clinical Psychology and Psychobiology, University of Barcelona, Passeig Vall D'Hebron 171, 08035, Barcelona, Spain
  - Institut de Neurociencies, University of Barcelona, Municipality of Horta-Guinardó, 08035, Barcelona, Spain
  - Integrative Neuroimaging Lab, Thessaloniki, 55133, Makedonia, Greece
  - Neuroinformatics Group, Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, College of Biomedical and Life Sciences, Cardiff University, Maindy Rd, CF24 4HQ, Cardiff, Wales, United Kingdom
46
Bugler H, Berto R, Souza R, Harris AD. Frequency and phase correction of GABA-edited magnetic resonance spectroscopy using complex-valued convolutional neural networks. Magn Reson Imaging 2024; 111:186-195. [PMID: 38744351 DOI: 10.1016/j.mri.2024.05.008] [Received: 10/18/2023] [Revised: 05/09/2024] [Accepted: 05/10/2024] [Indexed: 05/16/2024]
Abstract
PURPOSE To determine the significance of complex-valued inputs and complex-valued convolutions compared to real-valued inputs and real-valued convolutions in convolutional neural networks (CNNs) for frequency and phase correction (FPC) of GABA-edited magnetic resonance spectroscopy (MRS) data. METHODS An ablation study using simulated data was performed to determine the most effective input (real or complex) and convolution type (real or complex) to predict frequency and phase shifts in GABA-edited MEGA-PRESS data using CNNs. The best CNN model was subsequently compared using both simulated and in vivo data to two recently proposed deep learning (DL) methods for FPC of GABA-edited MRS. All methods were trained using the same experimental setup and evaluated using the signal-to-noise ratio (SNR) and linewidth of the GABA peak, choline artifact, and by visually assessing the reconstructed final difference spectrum. Statistical significance was assessed using the Wilcoxon signed rank test. RESULTS The ablation study showed that using complex values for the input represented by real and imaginary channels in our model input tensor, with complex convolutions was most effective for FPC. Overall, in the comparative study using simulated data, our CC-CNN model (that received complex-valued inputs with complex convolutions) outperformed the other models as evaluated by the mean absolute error. CONCLUSION Our results indicate that the optimal CNN configuration for GABA-edited MRS FPC uses a complex-valued input and complex convolutions. Overall, this model outperformed existing DL models.
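A complex convolution is conventionally assembled from four real convolutions via the complex product, which is the standard construction behind complex-valued CNN layers (the study's actual 2-D network layers are not reproduced here). A minimal 1-D NumPy sketch of this building block:

```python
import numpy as np

def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex-valued convolution built from four real convolutions:
    (x_re + i x_im) * (w_re + i w_im)
      = (x_re*w_re - x_im*w_im) + i (x_re*w_im + x_im*w_re)."""
    out_re = np.convolve(x_re, w_re, mode="valid") - np.convolve(x_im, w_im, mode="valid")
    out_im = np.convolve(x_re, w_im, mode="valid") + np.convolve(x_im, w_re, mode="valid")
    return out_re, out_im

# sanity check against NumPy's native complex convolution
x = np.array([1.0 + 2.0j, 3.0 - 1.0j, 0.5 + 0.5j])
w = np.array([2.0 - 1.0j, 1.0 + 1.0j])
re, im = complex_conv1d(x.real, x.imag, w.real, w.imag)
ref = np.convolve(x, w, mode="valid")
```

This cross-coupling of real and imaginary channels is what distinguishes a complex convolution from simply stacking the real and imaginary parts as two independent real channels, and is the configuration the ablation study found most effective.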
Affiliation(s)
- Hanna Bugler
  - Department of Biomedical Engineering, University of Calgary, Canada
  - Department of Radiology, University of Calgary, Canada
  - Hotchkiss Brain Institute, University of Calgary, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Canada
- Rodrigo Berto
  - Department of Biomedical Engineering, University of Calgary, Canada
  - Department of Radiology, University of Calgary, Canada
  - Hotchkiss Brain Institute, University of Calgary, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Canada
- Roberto Souza
  - Hotchkiss Brain Institute, University of Calgary, Canada
  - Department of Electrical and Software Engineering, University of Calgary, Canada
- Ashley D Harris
  - Department of Radiology, University of Calgary, Canada
  - Hotchkiss Brain Institute, University of Calgary, Canada
  - Alberta Children's Hospital Research Institute, University of Calgary, Canada
47
Hashimoto F, Onishi Y, Ote K, Tashima H, Yamaya T. Two-step optimization for accelerating deep image prior-based PET image reconstruction. Radiol Phys Technol 2024; 17:776-781. [PMID: 39096446 DOI: 10.1007/s12194-024-00831-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2024] [Revised: 07/25/2024] [Accepted: 07/27/2024] [Indexed: 08/05/2024]
Abstract
Deep learning, particularly convolutional neural networks (CNNs), has advanced positron emission tomography (PET) image reconstruction. However, it requires extensive, high-quality training datasets. Unsupervised learning methods, such as deep image prior (DIP), have shown promise for PET image reconstruction. Although DIP-based PET image reconstruction methods demonstrate superior performance, they involve highly time-consuming calculations. This study proposed a two-step optimization method to accelerate end-to-end DIP-based PET image reconstruction and improve PET image quality. The proposed two-step method comprised a pre-training step using conditional DIP denoising, followed by an end-to-end reconstruction step with fine-tuning. Evaluations using Monte Carlo simulation data demonstrated that the proposed two-step method significantly reduced the computation time and improved the image quality, thereby rendering it a practical and efficient approach for end-to-end DIP-based PET image reconstruction.
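As a heavily simplified numerical illustration of the two-step schedule (pre-train against a cheap denoised image, then fine-tune end-to-end through the forward model), the sketch below replaces the CNN with a linear map and the PET system model with a random matrix; all names, sizes, and step sizes are toy assumptions, not the authors' setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a linear "network" x = W @ z replaces the CNN, and a random
# matrix A replaces the PET system (projection) model.
z = rng.normal(size=8)                           # fixed network input (the "prior")
A = rng.normal(size=(6, 8))                      # forward operator
x_true = rng.normal(size=8)
y = A @ x_true                                   # measurements (noiseless, for brevity)
x_denoised = x_true + 0.1 * rng.normal(size=8)   # cheap denoised image: pre-training target

W = 0.1 * rng.normal(size=(8, 8))

# Step 1: pre-train against the denoised image (conditional-DIP-style warm start).
for _ in range(500):
    r = W @ z - x_denoised
    W -= 0.05 * np.outer(r, z) / (z @ z)
res_pretrain = np.linalg.norm(A @ (W @ z) - y)

# Step 2: end-to-end fine-tuning against the measurements through A.
for _ in range(500):
    r = A @ (W @ z) - y
    W -= 0.01 * np.outer(A.T @ r, z) / (z @ z)
res_final = np.linalg.norm(A @ (W @ z) - y)      # fine-tuning reduces the data-fit error
```

The point of the warm start is visible even in this toy: step 2 begins close to a good solution, so far fewer end-to-end iterations through the (expensive) forward model are needed.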
Affiliation(s)
- Fumio Hashimoto
  - Central Research Laboratory, Hamamatsu Photonics K.K., 5000 Hirakuchi, Hamana-Ku, Hamamatsu, 434-8601, Japan
  - Graduate School of Science and Engineering, Chiba University, 1-33, Yayoicho, Inage-Ku, Chiba, 263-8522, Japan
  - National Institutes for Quantum Science and Technology, 4-9-1, Anagawa, Inage-Ku, Chiba, 263-8555, Japan
- Yuya Onishi
  - Central Research Laboratory, Hamamatsu Photonics K.K., 5000 Hirakuchi, Hamana-Ku, Hamamatsu, 434-8601, Japan
- Kibo Ote
  - Central Research Laboratory, Hamamatsu Photonics K.K., 5000 Hirakuchi, Hamana-Ku, Hamamatsu, 434-8601, Japan
- Hideaki Tashima
  - National Institutes for Quantum Science and Technology, 4-9-1, Anagawa, Inage-Ku, Chiba, 263-8555, Japan
- Taiga Yamaya
  - Graduate School of Science and Engineering, Chiba University, 1-33, Yayoicho, Inage-Ku, Chiba, 263-8522, Japan
  - National Institutes for Quantum Science and Technology, 4-9-1, Anagawa, Inage-Ku, Chiba, 263-8555, Japan
48
Rubaiyat AHM, Li S, Yin X, Shifat-E-Rabbi M, Zhuang Y, Rohde GK. End-to-End Signal Classification in Signed Cumulative Distribution Transform Space. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:5936-5950. [PMID: 38427542 PMCID: PMC11345860 DOI: 10.1109/tpami.2024.3372455] [Indexed: 03/03/2024]
Abstract
This paper presents a new end-to-end signal classification method using the signed cumulative distribution transform (SCDT). We adopt a transport generative model to define the classification problem. We then make use of mathematical properties of the SCDT to render the problem easier in the transform domain, and solve for the class of an unknown sample using a nearest local subspace (NLS) search algorithm in the SCDT domain. Experiments show that the proposed method provides high-accuracy classification results while being computationally cheap, data efficient, and robust to out-of-distribution samples compared with existing end-to-end classification methods. The implementation of the proposed method in Python is integrated as part of the software package PyTransKit [1].
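For a nonnegative signal, the (unsigned) CDT amounts to sampling the signal's inverse CDF at the reference distribution's quantile levels; the signed variant additionally splits the signal by its Jordan decomposition into positive and negative parts (not shown). A minimal NumPy sketch of the unsigned transform against a uniform reference; the grid and normalization choices are assumptions:

```python
import numpy as np

def cdt(signal, x):
    """Cumulative distribution transform of a nonnegative 1-D signal
    (treated as a density on the grid x) w.r.t. a uniform reference:
    returns the signal's inverse CDF sampled at evenly spaced quantiles."""
    seg = 0.5 * (signal[1:] + signal[:-1]) * np.diff(x)  # trapezoid masses
    s = seg / seg.sum()                                  # normalize to unit mass
    cdf = np.concatenate(([0.0], np.cumsum(s)))
    t = np.linspace(0.0, 1.0, len(x))                    # reference quantile levels
    return np.interp(t, cdf, x)                          # S^{-1}(t)

x = np.linspace(0.0, 1.0, 101)
flat = cdt(np.ones_like(x), x)   # a uniform density transforms to the identity map
```

The appeal of working in this domain is that certain nonlinear deformations of the signal become linear operations on its transform, which is what makes a subspace-search classifier viable there.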
49
Liu Z, Lv Q, Lee CH, Shen L. Segmenting medical images with limited data. Neural Netw 2024; 177:106367. [PMID: 38754215 DOI: 10.1016/j.neunet.2024.106367] [Received: 11/09/2023] [Revised: 05/03/2024] [Accepted: 05/03/2024] [Indexed: 05/18/2024]
Abstract
While computer vision has proven valuable for medical image segmentation, its application faces challenges such as limited dataset sizes and the complexity of effectively leveraging unlabeled images. To address these challenges, we present a novel semi-supervised, consistency-based approach termed the data-efficient medical segmenter (DEMS). The DEMS features an encoder-decoder architecture and incorporates the developed online automatic augmenter (OAA) and residual robustness enhancement (RRE) blocks. The OAA augments input data with various image transformations, thereby diversifying the dataset to improve the generalization ability. The RRE enriches feature diversity and introduces perturbations to create varied inputs for different decoders, thereby providing enhanced variability. Moreover, we introduce a sensitive loss to further enhance consistency across different decoders and stabilize the training process. Extensive experimental results on both our own and three public datasets affirm the effectiveness of DEMS. Under extreme data shortage scenarios, our DEMS achieves 16.85% and 10.37% improvement in Dice score compared with the U-Net and the top-performing state-of-the-art method, respectively. Given its superior data efficiency, DEMS could present significant advancements in medical segmentation under small data regimes. The project homepage can be accessed at https://github.com/NUS-Tim/DEMS.
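The Dice score used to report these improvements is the standard overlap measure 2|P∩T| / (|P| + |T|) between predicted and ground-truth masks; a brief NumPy sketch (the smoothing constant is a common convention, not taken from the paper):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient for binary masks: 2|P∩T| / (|P| + |T|).
    eps avoids division by zero when both masks are empty."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

p = np.array([[1, 1, 0, 0]])
t = np.array([[1, 0, 0, 0]])
score = dice_score(p, t)   # 2*1 / (2 + 1) ≈ 0.667
```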
Affiliation(s)
- Zhaoshan Liu
  - Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore
- Qiujie Lv
  - Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore
  - School of Intelligent Systems Engineering, Sun Yat-sen University, No. 66, Gongchang Road, Guangming District, 518107, China
- Chau Hung Lee
  - Department of Radiology, Tan Tock Seng Hospital, 11 Jalan Tan Tock Seng, Singapore, 308433, Singapore
- Lei Shen
  - Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore
50
Wollek A, Hyska S, Sedlmeyr T, Haitzer P, Rueckel J, Sabel BO, Ingrisch M, Lasser T. German CheXpert Chest X-ray Radiology Report Labeler. ROFO-FORTSCHR RONTG 2024; 196:956-965. [PMID: 38295825 DOI: 10.1055/a-2234-8268] [Indexed: 08/17/2024]
Abstract
PURPOSE The aim of this study was to develop an algorithm to automatically extract annotations from German thoracic radiology reports to train deep learning-based chest X-ray classification models. MATERIALS AND METHODS An automatic label extraction model for German thoracic radiology reports was designed based on the CheXpert architecture. The algorithm can extract labels for twelve common chest pathologies, the presence of support devices, and "no finding". For iterative improvements and to generate a ground truth, a web-based multi-reader annotation interface was created. With the proposed annotation interface, a radiologist annotated 1086 retrospectively collected radiology reports from 2020-2021 (data set 1). The effect of automatically extracted labels on chest radiograph classification performance was evaluated on an additional, in-house pneumothorax data set (data set 2), containing 6434 chest radiographs with corresponding reports, by comparing a DenseNet-121 model trained on extracted labels from the associated reports, image-based pneumothorax labels, and publicly available data, respectively. RESULTS Comparing automated to manual labeling on data set 1: "mention extraction" class-wise F1 scores ranged from 0.8 to 0.995, the "negation detection" F1 scores from 0.624 to 0.981, and F1 scores for "uncertainty detection" from 0.353 to 0.725. Extracted pneumothorax labels on data set 2 had a sensitivity of 0.997 [95 % CI: 0.994, 0.999] and specificity of 0.991 [95 % CI: 0.988, 0.994]. The model trained on publicly available data achieved an area under the receiver operating curve (AUC) for pneumothorax classification of 0.728 [95 % CI: 0.694, 0.760], while the models trained on automatically extracted labels and on manual annotations achieved values of 0.858 [95 % CI: 0.832, 0.882] and 0.934 [95 % CI: 0.918, 0.949], respectively. CONCLUSION Automatic label extraction from German thoracic radiology reports is a promising substitute for manual labeling. 
By reducing the time required for data annotation, larger training data sets can be created, resulting in improved overall model performance. Our results demonstrated that a pneumothorax classifier trained on automatically extracted labels strongly outperformed the model trained on publicly available data, without the need for additional annotation time, and performed competitively compared to manually labeled data. KEY POINTS · An algorithm for automatic German thoracic radiology report annotation was developed. · Automatic label extraction is a promising substitute for manual labeling. · The classifier trained on extracted labels outperformed the model trained on publicly available data. CITATION · Wollek A, Hyska S, Sedlmeyr T et al. German CheXpert Chest X-ray Radiology Report Labeler. Fortschr Röntgenstr 2024; 196: 956-965.
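CheXpert-style labelers classify each pathology mention as positive, negative, or uncertain using curated phrase rules. The toy Python sketch below uses a deliberately tiny, hypothetical German cue list to show the three-valued decision; the actual labeler's rule set is far larger and more nuanced:

```python
import re

# Hypothetical, minimal cue lists for illustration only; the published labeler
# uses a much larger curated German phrase inventory.
NEGATION_CUES = [r"\bkein(e|en|em|er)?\b", r"\bohne\b", r"\bausschluss\b"]
UNCERTAINTY_CUES = [r"\bfraglich\b", r"\bverdacht auf\b", r"\bdd\b"]

def label_mention(sentence: str) -> str:
    """Assign a CheXpert-style three-valued label to a sentence that
    mentions a pathology: negative, uncertain, or positive."""
    s = sentence.lower()
    if any(re.search(p, s) for p in NEGATION_CUES):
        return "negative"
    if any(re.search(p, s) for p in UNCERTAINTY_CUES):
        return "uncertain"
    return "positive"

label_mention("Kein Pneumothorax.")          # → "negative"
label_mention("Verdacht auf Pneumothorax.")  # → "uncertain"
```

The per-class F1 ranges reported above (mention extraction vs. negation vs. uncertainty detection) reflect exactly these three sub-tasks, with uncertainty detection being the hardest.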
Affiliation(s)
- Alessandro Wollek
  - Munich Institute of Biomedical Engineering, Technical University of Munich, Garching b. München, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Garching b. München, Germany
- Sardi Hyska
  - Department of Radiology, Ludwig-Maximilians-University Hospital Munich, München, Germany
- Thomas Sedlmeyr
  - Munich Institute of Biomedical Engineering, Technical University of Munich, Garching b. München, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Garching b. München, Germany
- Philip Haitzer
  - Munich Institute of Biomedical Engineering, Technical University of Munich, Garching b. München, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Garching b. München, Germany
- Johannes Rueckel
  - Department of Radiology, Ludwig-Maximilians-University Hospital Munich, München, Germany
  - Institute of Neuroradiology, Ludwig-Maximilians-University Hospital Munich, München, Germany
- Bastian O Sabel
  - Institute for Clinical Radiology, Ludwig-Maximilians-University Hospital Munich, München, Germany
- Michael Ingrisch
  - Department of Radiology, Ludwig-Maximilians-University Hospital Munich, München, Germany
- Tobias Lasser
  - Munich Institute of Biomedical Engineering, Technical University of Munich, Garching b. München, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Garching b. München, Germany