1
Wang X, Wang M, Liu X, Mao Y, Chen Y, Dai S. Surveillance-image-based outdoor air quality monitoring. Environ Sci Ecotechnol 2024; 18:100319. PMID: 37841651; PMCID: PMC10569950; DOI: 10.1016/j.ese.2023.100319.
Abstract
Air pollution threatens human health, necessitating effective and convenient air quality monitoring. Recently, there has been a growing interest in using camera images for air quality estimation. However, a major challenge has been nighttime detection due to the limited visibility of nighttime images. Here we present a hybrid deep learning model, capitalizing on the temporal continuity of air quality changes for estimating outdoor air quality from surveillance images. Our model, which integrates a convolutional neural network (CNN) and long short-term memory (LSTM), adeptly captures spatial-temporal image features, enabling air quality estimation at any time of day, including PM2.5 and PM10 concentrations, as well as the air quality index (AQI). Compared to independent CNN networks that solely extract spatial features, our model demonstrates superior accuracy on self-constructed datasets with R² = 0.94 and RMSE = 5.11 μg m⁻³ for PM2.5, R² = 0.92 and RMSE = 7.30 μg m⁻³ for PM10, and R² = 0.94 and RMSE = 5.38 for AQI. Furthermore, our model excels in daytime air quality estimation and enhances nighttime predictions, elevating overall accuracy. Validation across diverse image datasets and comparative analyses underscore the applicability and superiority of our model for air quality monitoring.
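The CNN-plus-LSTM pairing summarized above is a common recipe for regressing scalar targets from an image sequence. The following PyTorch sketch illustrates that general pattern under assumed layer sizes, clip length, and a three-way output head (PM2.5, PM10, AQI); it is not the authors' released architecture.

```python
import torch
import torch.nn as nn

class CnnLstmAirQuality(nn.Module):
    """Toy CNN+LSTM regressor: per-frame spatial features -> temporal LSTM -> PM2.5/PM10/AQI."""
    def __init__(self, feat_dim=128, hidden=64, n_outputs=3):
        super().__init__()
        self.cnn = nn.Sequential(                       # spatial feature extractor (assumed sizes)
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)  # models temporal continuity
        self.head = nn.Linear(hidden, n_outputs)                 # PM2.5, PM10, AQI

    def forward(self, frames):                          # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])                    # predict from the last time step

model = CnnLstmAirQuality()
dummy = torch.randn(2, 8, 3, 128, 128)                  # two 8-frame surveillance clips
print(model(dummy).shape)                               # torch.Size([2, 3])
```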
Affiliation(s)
- Xiaochu Wang
- School of Geography, Nanjing Normal University, Nanjing, 210023, China
- Key Laboratory of Virtual Geographic Environment, Nanjing Normal University, Ministry of Education, Nanjing, 210023, China
- Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing Normal University, Nanjing, 210023, China
- Meizhen Wang
- School of Geography, Nanjing Normal University, Nanjing, 210023, China
- Key Laboratory of Virtual Geographic Environment, Nanjing Normal University, Ministry of Education, Nanjing, 210023, China
- Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing Normal University, Nanjing, 210023, China
- Xuejun Liu
- School of Geography, Nanjing Normal University, Nanjing, 210023, China
- Key Laboratory of Virtual Geographic Environment, Nanjing Normal University, Ministry of Education, Nanjing, 210023, China
- Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing Normal University, Nanjing, 210023, China
- Ying Mao
- School of Geography, Nanjing Normal University, Nanjing, 210023, China
- Key Laboratory of Virtual Geographic Environment, Nanjing Normal University, Ministry of Education, Nanjing, 210023, China
- Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing Normal University, Nanjing, 210023, China
- Yang Chen
- School of Geography, Nanjing Normal University, Nanjing, 210023, China
- Key Laboratory of Virtual Geographic Environment, Nanjing Normal University, Ministry of Education, Nanjing, 210023, China
- Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing Normal University, Nanjing, 210023, China
- Songsong Dai
- School of Geography, Nanjing Normal University, Nanjing, 210023, China
- Key Laboratory of Virtual Geographic Environment, Nanjing Normal University, Ministry of Education, Nanjing, 210023, China
- Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing Normal University, Nanjing, 210023, China
2
Bergen RV, Rajotte JF, Yousefirizi F, Rahmim A, Ng RT. Assessing privacy leakage in synthetic 3-D PET imaging using transversal GAN. Comput Methods Programs Biomed 2024; 243:107910. PMID: 37976611; DOI: 10.1016/j.cmpb.2023.107910.
Abstract
BACKGROUND AND OBJECTIVE Training computer-vision related algorithms on medical images for disease diagnosis or image segmentation is difficult in large part due to privacy concerns. For this reason, generative image models are highly sought after to facilitate data sharing. However, 3-D generative models are understudied, and investigation of their privacy leakage is needed. METHODS We introduce our 3-D generative model, Transversal GAN (TrGAN), using head & neck PET images conditioned on tumor masks as a case study. We define quantitative measures of image fidelity and utility, and propose a novel framework for evaluating the privacy-utility trade-off through a membership inference attack. These metrics are evaluated in the course of training to identify ideal fidelity, utility and privacy trade-offs and establish the relationships between these parameters. RESULTS We show that the discriminator of the TrGAN is vulnerable to attack, and that an attacker can identify which samples were used in training with almost perfect accuracy (AUC = 0.99). We also show that an attacker with access to only the generator cannot reliably classify whether a sample had been used for training (AUC = 0.51). We further propose and demonstrate a general decision procedure for any deep learning based generative model, which allows the user to quantify and evaluate the trade-off between downstream utility and privacy protection. CONCLUSIONS TrGAN can generate 3-D medical images that retain important image features and statistical properties of the training data set, with minimal privacy loss as determined by a membership inference attack. Our utility-privacy decision procedure may be beneficial to researchers who wish to share data or lack a sufficient number of large labeled image datasets.
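The membership inference attack used in this kind of evaluation is typically summarized by the ROC AUC of a per-sample attack score (for example, the discriminator's output) over training members versus held-out non-members. The snippet below illustrates that computation on synthetic scores; the score distributions and sample sizes are placeholders, not the TrGAN evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic attack scores: members (training samples) score higher on average when there is leakage.
member_scores = rng.normal(loc=1.0, scale=0.5, size=500)      # samples seen during training
nonmember_scores = rng.normal(loc=0.0, scale=0.5, size=500)   # held-out samples

scores = np.concatenate([member_scores, nonmember_scores])
labels = np.concatenate([np.ones(500), np.zeros(500)])         # 1 = member, 0 = non-member

auc = roc_auc_score(labels, scores)
print(f"membership-inference AUC: {auc:.2f}")   # ~0.5 means no leakage, ~1.0 means full leakage
```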
Affiliation(s)
- Robert V Bergen
- Data Science Institute, University of British Columbia, BC V6T 1Z4, Canada.
- Fereshteh Yousefirizi
- Department of Integrative Oncology, BC Cancer Research Institute, BC V5Z 1L3, Canada
- Arman Rahmim
- Department of Integrative Oncology, BC Cancer Research Institute, BC V5Z 1L3, Canada; Department of Radiology, University of British Columbia, BC V5Z 1M9, Canada
- Raymond T Ng
- Data Science Institute, University of British Columbia, BC V6T 1Z4, Canada
3
Wulamu A, Luo J, Chen S, Zheng H, Wang T, Yang R, Jiao L, Zhang T. CASMatching strategy for automated detection and quantification of carotid artery stenosis based on digital subtraction angiography. Comput Methods Programs Biomed 2024; 243:107871. PMID: 37925855; DOI: 10.1016/j.cmpb.2023.107871.
Abstract
BACKGROUND AND OBJECTIVE Automated detection and quantification of carotid artery stenosis is a crucial task in establishing a computer-aided diagnostic system for brain diseases. Digital subtraction angiography (DSA) is known as the "gold standard" for carotid stenosis diagnosis. It is commonly used to identify carotid artery stenosis and measure morphological indices of the stenosis. However, using deep learning to detect stenosis based on DSA images and further quantitatively predicting the morphological indices remain a challenge due to the absence of prior work. In this paper, we propose a quantitative method for predicting morphological indices of carotid stenosis. METHODS Our method adopts a two-stage pipeline, first locating regions suitable for predicting morphological indices with an object detection model, and then using a regression model to predict the indices. A novel Carotid Artery Stenosis Matching (CASMatching) strategy is introduced into the object detection to model the matching relationship between a stenosis and multiple normal vessel segments. The proposed Match-ness branch predicts a Match-ness score for each normal vessel segment to indicate the degree of matching to the stenosis. A novel Direction Distance-IoU (2DIoU) loss based on the Distance-IoU loss is proposed to make the model focus more on bounding box regression in the direction of vessel extension. After detection, the normal vessel segment with the highest Match-ness score and the stenosis are intercepted from the original image, then fed into a regression model to predict morphological indices and calculate the degree of stenosis. RESULTS Our method is trained and evaluated on a dataset collected from three different manufacturers' monoplane X-ray systems. The results show that the proposed components in the object detector substantially improve the detection performance of normal vascular segments. For the prediction of morphological indices, our model achieves Mean Absolute Errors of 0.378, 0.221, and 4.9 for reference vessel diameter (RVD), minimum lumen diameter (MLD), and stenosis degree. CONCLUSIONS Our method can precisely localize the carotid stenosis and the normal vessel segment suitable for predicting RVD of the stenosis, and further achieve accurate quantification, providing a novel solution for the quantification of carotid artery stenosis.
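The 2DIoU loss is described as building on the standard Distance-IoU (DIoU) loss, which penalizes 1 − IoU plus the squared center distance normalized by the squared diagonal of the smallest enclosing box. The sketch below implements only that standard DIoU term; the direction-aware weighting along the vessel axis is the paper's addition and is not reproduced here.

```python
import torch

def diou_loss(pred, target, eps=1e-7):
    """Standard Distance-IoU loss on (x1, y1, x2, y2) boxes; the paper's 2DIoU adds a
    direction-aware term along the vessel axis, which is not reproduced here."""
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distance between box centers.
    cxp = (pred[:, 0] + pred[:, 2]) / 2; cyp = (pred[:, 1] + pred[:, 3]) / 2
    cxt = (target[:, 0] + target[:, 2]) / 2; cyt = (target[:, 1] + target[:, 3]) / 2
    center_dist = (cxp - cxt) ** 2 + (cyp - cyt) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    return (1 - iou + center_dist / diag).mean()

pred = torch.tensor([[10., 10., 50., 40.]])
gt = torch.tensor([[12., 12., 48., 42.]])
print(diou_loss(pred, gt))
```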
Affiliation(s)
- Aziguli Wulamu
- Department of Computer, School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, China.
- Jichang Luo
- Department of Neurosurgery, Xuanwu Hospital, Capital Medical University, Beijing, China; China International Neuroscience Institute (China-INI), Beijing, China
- Saian Chen
- Department of Computer, School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, China
- Han Zheng
- Education Department of Guangxi Zhuang Autonomous Region, Key Laboratory of AI and Information Processing (Hechi University), Hechi, Guangxi 546300, China.
- Tao Wang
- Department of Neurosurgery, Xuanwu Hospital, Capital Medical University, Beijing, China; China International Neuroscience Institute (China-INI), Beijing, China
- Renjie Yang
- Department of Neurosurgery, Xuanwu Hospital, Capital Medical University, Beijing, China; China International Neuroscience Institute (China-INI), Beijing, China
- Liqun Jiao
- Department of Neurosurgery, Xuanwu Hospital, Capital Medical University, Beijing, China; China International Neuroscience Institute (China-INI), Beijing, China; Department of Interventional Radiology, Xuanwu Hospital, Capital Medical University, Beijing, China.
- Taohong Zhang
- Department of Computer, School of Computer and Communication Engineering, University of Science and Technology Beijing (USTB), Beijing, China; Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, China.
4
Zhou Z, Geng JJ. Learned associations serve as target proxies during difficult but not easy visual search. Cognition 2024; 242:105648. PMID: 37897882; DOI: 10.1016/j.cognition.2023.105648.
Abstract
The target template contains information in memory that is used to guide attention during visual search and is typically thought of as containing features of the actual target object. However, when targets are hard to find, it is advantageous to use other information in the visual environment that is predictive of the target's location to help guide attention. The purpose of these studies was to test if newly learned associations between face and scene category images lead observers to use scene information as a proxy for the face target. Our results showed that scene information was used as a proxy for the target to guide attention but only when the target face was difficult to discriminate from the distractor face; when the faces were easy to distinguish, attention was no longer guided by the scene unless the scene was presented earlier. The results suggest that attention is flexibly guided by both target features as well as features of objects that are predictive of the target location. The degree to which each contributes to guiding attention depends on the efficiency with which that information can be used to decode the location of the target in the current moment. The results contribute to the view that attentional guidance is highly flexible in its use of information to rapidly locate the target.
Affiliation(s)
- Zhiheng Zhou
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA 95618, USA.
- Joy J Geng
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA 95618, USA; Department of Psychology, University of California, One Shields Ave, Davis, CA 95616, USA.
5
Krasich K, O'Neill K, Murray S, Brockmole JR, De Brigard F, Nuthmann A. A computational modeling approach to investigating mind wandering-related adjustments to gaze behavior during scene viewing. Cognition 2024; 242:105624. PMID: 37944314; DOI: 10.1016/j.cognition.2023.105624.
Abstract
Research on gaze control has long shown that increased visual-cognitive processing demands in scene viewing are associated with longer fixation durations. More recently, though, longer durations have also been linked to mind wandering, a perceptually decoupled state of attention marked by decreased visual-cognitive processing. Toward better understanding the relationship between fixation durations and visual-cognitive processing, we ran simulations using an established random-walk model for saccade timing and programming and assessed which model parameters best predicted modulations in fixation durations associated with mind wandering compared to attentive viewing. Mind wandering-related fixation durations were best described as an increase in the variability of the fixation-generating process, leading to more variable-sometimes very long-durations. In contrast, past research showed that increased processing demands increased the mean duration of the fixation-generating process. The findings thus illustrate that mind wandering and processing demands modulate fixation durations through different mechanisms in scene viewing. This suggests that processing demands cannot be inferred from changes in fixation durations without understanding the underlying mechanism by which these changes were generated.
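As a rough illustration of the modeling idea, the NumPy sketch below simulates a random-walk timer in which a counter steps toward a threshold each millisecond and a fixation ends when the threshold is reached; keeping the mean stepping rate fixed while making it more variable across fixations inflates the variability of simulated durations. The rates, threshold, and duration cap are made-up values, and this stand-in is not the specific saccade-timing model fitted in the study.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_fixations(n, rate_mean, rate_sd, threshold=100):
    """Random-walk timer: each ms the counter increments with probability `rate`;
    a fixation ends when the counter reaches `threshold`."""
    durations = []
    for _ in range(n):
        rate = np.clip(rng.normal(rate_mean, rate_sd), 0.05, 1.0)  # per-fixation stepping rate
        steps = rng.random(5000) < rate                            # 5 s cap per fixation
        durations.append(np.argmax(np.cumsum(steps) >= threshold) + 1)
    return np.array(durations)

attentive = simulate_fixations(2000, rate_mean=0.40, rate_sd=0.02)
wandering = simulate_fixations(2000, rate_mean=0.40, rate_sd=0.10)  # same mean rate, more variable

for name, d in [("attentive", attentive), ("mind-wandering", wandering)]:
    print(f"{name}: mean = {d.mean():.0f} ms, sd = {d.std():.0f} ms")
```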
Affiliation(s)
- Kristina Krasich
- Center for Cognitive Neuroscience, Duke University, Durham, NC, USA; Duke Institute for Brain Sciences, Duke University, Durham, NC, USA.
- Kevin O'Neill
- Center for Cognitive Neuroscience, Duke University, Durham, NC, USA; Duke Institute for Brain Sciences, Duke University, Durham, NC, USA; Department of Psychology & Neuroscience, Duke University, Durham, NC, USA
- Samuel Murray
- Philosophy Department, Providence College, Providence, RI, USA
- James R Brockmole
- Department of Psychology, University of Notre Dame, Notre Dame, IN, USA
- Felipe De Brigard
- Center for Cognitive Neuroscience, Duke University, Durham, NC, USA; Duke Institute for Brain Sciences, Duke University, Durham, NC, USA; Department of Psychology & Neuroscience, Duke University, Durham, NC, USA; Department of Philosophy, Duke University, Durham, NC, USA
6
Kumar S, Bhagat V, Sahu P, Chaube MK, Behera AK, Guizani M, Gravina R, Di Dio M, Fortino G, Curry E, Alsamhi SH. A novel multimodal framework for early diagnosis and classification of COPD based on CT scan images and multivariate pulmonary respiratory diseases. Comput Methods Programs Biomed 2024; 243:107911. PMID: 37981453; DOI: 10.1016/j.cmpb.2023.107911.
Abstract
BACKGROUND AND OBJECTIVE Chronic Obstructive Pulmonary Disease (COPD) is one of the world's most serious diseases; its early diagnosis with existing methods such as statistical machine learning techniques, medical diagnostic tools, and conventional medical procedures is challenging because these methods often misclassify COPD and take a long time to produce accurate predictions. Due to the severe consequences of COPD, detection and accurate diagnosis at an early stage are essential. This paper aims to design and develop a multimodal framework for early diagnosis and accurate prediction of COPD patients based on prepared Computerized Tomography (CT) scan images and lung sound/cough (audio) samples using machine learning techniques. METHOD The proposed multimodal framework extracts texture, histogram intensity, chroma, Mel-Frequency Cepstral Coefficients (MFCCs), and Gaussian scale space from the prepared CT images and lung sound/cough samples. Data from the All India Institute of Medical Sciences (AIIMS), Raipur, India, and an open respiratory CT image and lung sound/cough (audio) sample dataset validate the proposed framework. The discriminatory features are selected from the extracted feature sets using unsupervised ML techniques, and customized ensemble learning techniques are applied to perform early classification and assess the severity levels of COPD patients. RESULTS The proposed framework provided 97.50%, 98%, and 95.30% accuracy for early diagnosis of COPD patients based on the fusion technique, the CT diagnostic model, and the cough sample model, respectively. CONCLUSION Finally, we compare the performance of the proposed framework with existing methods, current approaches, and conventional benchmark techniques for early diagnosis.
Affiliation(s)
- Santosh Kumar
- Department of Computer Science and Engineering, IIIT-Naya Raipur, Chhattisgarh, India.
- Vijesh Bhagat
- Department of Computer Science and Engineering, IIIT-Naya Raipur, Chhattisgarh, India.
- Prakash Sahu
- Department of Computer Science and Engineering, IIIT-Naya Raipur, Chhattisgarh, India.
- Ajoy Kumar Behera
- Department of Pulmonary Medicine & TB, All India Institute of Medical Sciences (AIIMS), Raipur, Chhattisgarh, India.
- Mohsen Guizani
- Machine Learning Department, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates.
- Raffaele Gravina
- Department of Informatics, Modeling, Electronic, and System Engineering, University of Calabria, 87036 Rende, Italy.
- Michele Di Dio
- Department of Informatics, Modeling, Electronic, and System Engineering, University of Calabria, 87036 Rende, Italy; Annunziata Hospital Cosenza, Italy.
- Giancarlo Fortino
- Department of Informatics, Modeling, Electronic, and System Engineering, University of Calabria, 87036 Rende, Italy.
- Edward Curry
- Insight Centre for Data Analytics, University of Galway, Galway, Ireland.
- Saeed Hamood Alsamhi
- Insight Centre for Data Analytics, University of Galway, Galway, Ireland; Faculty of Engineering, IBB University, Ibb, Yemen.
7
Glüge S, Balabanov S, Koelzer VH, Ott T. Evaluation of deep learning training strategies for the classification of bone marrow cell images. Comput Methods Programs Biomed 2024; 243:107924. PMID: 37979517; DOI: 10.1016/j.cmpb.2023.107924.
Abstract
BACKGROUND AND OBJECTIVE The classification of bone marrow (BM) cells by light microscopy is an important cornerstone of hematological diagnosis, performed thousands of times a day by highly trained specialists in laboratories worldwide. As the manual evaluation of blood or BM smears is very time-consuming and prone to inter-observer variation, new reliable automated systems are needed. METHODS We aim to improve the automatic classification performance of hematological cell types. Therefore, we evaluate four state-of-the-art Convolutional Neural Network (CNN) architectures on a dataset of 171,374 microscopic cytological single-cell images obtained from BM smears from 945 patients diagnosed with a variety of hematological diseases. We further evaluate the effect of an in-domain vs. out-of-domain pre-training, and assess whether class activation maps provide human-interpretable explanations for the models' predictions. RESULTS The best performing pre-trained model (Regnet_y_32gf) yields mean precision, recall, and F1 scores of 0.787±0.060, 0.755±0.061, and 0.762±0.050, respectively. This is a 53.5% improvement in precision and 7.3% improvement in recall over previous results with CNNs (ResNeXt-50) that were trained from scratch. The out-of-domain pre-training apparently yields general feature extractors/filters that apply very well to the BM cell classification use case. The class activation maps on cell types with characteristic morphological features were found to be consistent with the explanations of a human domain expert. For example, the Auer rods in the cytoplasm were the predictive cellular feature for correctly classified images of faggot cells. CONCLUSIONS Our study provides data that can help hematology laboratories to choose the optimal training strategy for blood cell classification deep learning models to improve computer-assisted blood and bone marrow cell identification. It also highlights the need for more specific training data, i.e., images of difficult-to-classify classes, including cells labeled with disease information.
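The in-domain versus out-of-domain pre-training comparison above amounts to loading pre-trained weights and swapping the classification head before fine-tuning. The sketch below shows that generic recipe with torchvision, assuming the regnet_y_32gf ImageNet weights shipped in recent torchvision releases, that the model's head is its fc attribute, and a placeholder class count; it is not the authors' training pipeline.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CELL_CLASSES = 21  # assumed number of bone-marrow cell classes; adjust to the dataset

# Out-of-domain (ImageNet) pre-training, then replace the classification head (large weight download).
model = models.regnet_y_32gf(weights=models.RegNet_Y_32GF_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CELL_CLASSES)

# Optionally freeze the backbone and train only the new head first.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")

optimizer = torch.optim.AdamW(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)              # stand-in for single-cell image crops
labels = torch.randint(0, NUM_CELL_CLASSES, (4,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```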
Affiliation(s)
- Stefan Glüge
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Schloss 1, 8820 Wädenswil, Switzerland.
- Stefan Balabanov
- Department of Medical Oncology and Haematology, University Hospital Zurich and University of Zurich, Rämistrasse 100, 8091 Zurich, Switzerland
- Viktor Hendrik Koelzer
- Department of Pathology and Molecular Pathology, University Hospital Zurich and University of Zurich, Schmelzbergstrasse 12, 8091 Zurich, Switzerland
- Thomas Ott
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Schloss 1, 8820 Wädenswil, Switzerland
8
Képeš E, Vrábel J, Brázdil T, Holub P, Pořízka P, Kaiser J. Interpreting convolutional neural network classifiers applied to laser-induced breakdown optical emission spectra. Talanta 2024; 266:124946. PMID: 37454514; DOI: 10.1016/j.talanta.2023.124946.
Abstract
Laser-induced breakdown spectroscopy (LIBS) is a well-established industrial tool with emerging relevance in high-stakes applications. To achieve its required analytical performance, LIBS is often coupled with advanced pattern-recognition algorithms, including machine learning models. Namely, artificial neural networks (ANNs) have recently become a frequently applied part of LIBS practitioners' toolkit. Nevertheless, ANNs are generally applied in spectroscopy as black-box models, without a real insight into their predictions. Here, we apply various post-hoc interpretation techniques with the aim of understanding the decision-making of convolutional neural networks. Namely, we find synthetic spectra that yield perfect expected classification predictions and denote these spectra as class-specific prototype spectra. We investigate the simplest possible convolutional neural network (consisting of a single convolutional layer and a single fully connected layer) trained to classify the extended calibration dataset collected for the ChemCam laser-induced breakdown spectroscopy instrument of the Curiosity Mars rover. The trained convolutional neural network predominantly learned meaningful spectroscopic features which correspond to the elements comprising the major oxides found in the calibration targets. In addition, the discrete convolution operation with the learnt filters results in a crude baseline correction.
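Prototype inputs of the kind described are commonly obtained by activation maximization: freezing the trained network and running gradient ascent on the input until the desired class is predicted with high confidence. The sketch below applies that generic recipe to a stand-in, untrained 1-D CNN so it runs end to end; the architecture, spectrum length, and class count are placeholders, not the ChemCam classifier.

```python
import torch
import torch.nn as nn

# Stand-in 1-D CNN classifier for spectra (single convolutional layer + fully connected layer).
n_channels, n_classes = 1, 9
net = nn.Sequential(
    nn.Conv1d(n_channels, 8, kernel_size=11, padding=5), nn.ReLU(),
    nn.AdaptiveAvgPool1d(64), nn.Flatten(),
    nn.Linear(8 * 64, n_classes),
)
net.eval()

def prototype_spectrum(model, target_class, length=2048, steps=200, lr=0.1):
    """Gradient ascent on the input so the frozen model assigns high probability to `target_class`."""
    x = torch.zeros(1, 1, length, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x)
        loss = -logits[0, target_class] + logits.logsumexp(dim=1).mean()  # = -log p(target | x)
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(min=0)        # emission spectra are non-negative
    return x.detach().squeeze()

proto = prototype_spectrum(net, target_class=3)
print(proto.shape, float(net(proto.view(1, 1, -1)).softmax(1)[0, 3]))
```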
Affiliation(s)
- Erik Képeš
- Central European Institute of Technology, Brno University of Technology, Purkyňova 656/123, CZ-61200, Brno, Czech Republic; Brno University of Technology, Faculty of Mechanical Engineering, Institute of Physical Engineering, Technická 2, CZ-61669, Brno, Czech Republic.
- Jakub Vrábel
- Central European Institute of Technology, Brno University of Technology, Purkyňova 656/123, CZ-61200, Brno, Czech Republic.
- Tomáš Brázdil
- Faculty of Informatics, Masaryk University, Botanická 68A, CZ-60200, Brno, Czech Republic.
- Petr Holub
- Institute of Computer Science, Masaryk University, Šumavská 416/15, CZ-60200, Brno, Czech Republic.
- Pavel Pořízka
- Central European Institute of Technology, Brno University of Technology, Purkyňova 656/123, CZ-61200, Brno, Czech Republic; Brno University of Technology, Faculty of Mechanical Engineering, Institute of Physical Engineering, Technická 2, CZ-61669, Brno, Czech Republic.
- Jozef Kaiser
- Central European Institute of Technology, Brno University of Technology, Purkyňova 656/123, CZ-61200, Brno, Czech Republic; Brno University of Technology, Faculty of Mechanical Engineering, Institute of Physical Engineering, Technická 2, CZ-61669, Brno, Czech Republic.
9
Xu F, Pan D, Zheng H, Ouyang Y, Jia Z, Zeng H. EESCN: A novel spiking neural network method for EEG-based emotion recognition. Comput Methods Programs Biomed 2024; 243:107927. PMID: 38000320; DOI: 10.1016/j.cmpb.2023.107927.
Abstract
BACKGROUND AND OBJECTIVE Although existing artificial neural networks have achieved good results in electroencephalography (EEG) emotion recognition, further improvements are needed in terms of bio-interpretability and robustness. In this research, we aim to develop a highly efficient and high-performance method for emotion recognition based on EEG. METHODS We propose an Emo-EEGSpikeConvNet (EESCN), a novel emotion recognition method based on spiking neural network (SNN). It consists of a neuromorphic data generation module and a NeuroSpiking framework. The neuromorphic data generation module converts EEG data into 2D frame format as input to the NeuroSpiking framework, while the NeuroSpiking framework is used to extract spatio-temporal features of EEG for classification. RESULTS EESCN achieves high emotion recognition accuracies on DEAP and SEED-IV datasets, ranging from 94.56% to 94.81% on DEAP and a mean accuracy of 79.65% on SEED-IV. Compared to existing SNN methods, EESCN significantly improves EEG emotion recognition performance. In addition, it also has the advantages of faster running speed and a smaller memory footprint. CONCLUSIONS EESCN has shown excellent performance and efficiency in EEG-based emotion recognition, with potential for practical applications that require portability under resource constraints.
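Spiking networks such as the one above are built from leaky integrate-and-fire (LIF) units that accumulate input over discrete time steps and emit binary spikes when a threshold is crossed. The few lines below simulate a single LIF neuron in NumPy purely to illustrate those dynamics; the decay, threshold, and input stream are made-up values unrelated to the EESCN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def lif_neuron(inputs, beta=0.9, threshold=1.0):
    """Leaky integrate-and-fire: the membrane potential decays by `beta`, integrates input,
    and emits a spike (then resets) whenever it crosses `threshold`."""
    v, spikes = 0.0, []
    for x in inputs:
        v = beta * v + x
        spike = v >= threshold
        spikes.append(int(spike))
        if spike:
            v -= threshold          # soft reset
    return np.array(spikes)

# Stand-in input current, e.g. one EEG-derived feature stream over 100 time steps.
current = rng.uniform(0.0, 0.3, size=100)
print("spike train:", lif_neuron(current))
```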
Affiliation(s)
- FeiFan Xu
- Hangzhou Dianzi University, School of Computer Science and Technology, HangZhou, ZheJiang, China.
- Deng Pan
- Hangzhou Dianzi University, School of Computer Science and Technology, HangZhou, ZheJiang, China.
- Haohao Zheng
- Hangzhou Dianzi University, School of Computer Science and Technology, HangZhou, ZheJiang, China.
- Yu Ouyang
- Hangzhou Dianzi University, School of Computer Science and Technology, HangZhou, ZheJiang, China.
- Zhe Jia
- Hangzhou Dianzi University, School of Computer Science and Technology, HangZhou, ZheJiang, China.
- Hong Zeng
- Hangzhou Dianzi University, School of Computer Science and Technology, HangZhou, ZheJiang, China; Key Laboratory of Brain Machine Collaborative of Zhejiang Province, HangZhou, ZheJiang, China.
10
Yang YH, Fukiage T, Sun Z, Nishida S. Psychophysical measurement of perceived motion flow of naturalistic scenes. iScience 2023; 26:108307. PMID: 38025782; PMCID: PMC10679809; DOI: 10.1016/j.isci.2023.108307.
Abstract
The neural and computational mechanisms underlying visual motion perception have been extensively investigated over several decades, but little attempt has been made to measure and analyze how human observers perceive the map of motion vectors, or optical flow, in complex naturalistic scenes. Here, we developed a psychophysical method to assess human-perceived motion flows using local vector matching and a flash probe. The estimated perceived flow for naturalistic movies agreed with the physically correct flow (ground truth) at many points, but also showed consistent deviations from the ground truth (flow illusions) at other points. Comparisons with the predictions of various computational models, including cutting-edge computer vision algorithms and coordinate transformation models, indicated that some flow illusions are attributable to lower-level factors such as spatiotemporal pooling and signal loss, while others reflect higher-level computations, including vector decomposition. Our study demonstrates a promising data-driven psychophysical paradigm for an advanced understanding of visual motion perception.
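Agreement between an estimated flow field and the ground truth is commonly summarized per probe location by the endpoint error (length of the vector difference) and the angular difference. The snippet below computes both for toy vectors; it only illustrates the comparison and is not the study's analysis pipeline.

```python
import numpy as np

def flow_errors(estimated, ground_truth):
    """Per-point endpoint error (length of the vector difference) and angular difference in degrees."""
    est = np.asarray(estimated, dtype=float)
    gt = np.asarray(ground_truth, dtype=float)
    epe = np.linalg.norm(est - gt, axis=1)
    ang_est = np.degrees(np.arctan2(est[:, 1], est[:, 0]))
    ang_gt = np.degrees(np.arctan2(gt[:, 1], gt[:, 0]))
    ang_diff = (ang_est - ang_gt + 180) % 360 - 180   # wrap to [-180, 180)
    return epe, ang_diff

perceived = np.array([[1.0, 0.2], [0.0, -1.0], [2.0, 2.0]])   # matched probe vectors (toy values)
truth = np.array([[1.0, 0.0], [0.2, -1.0], [1.5, 1.5]])
epe, ang = flow_errors(perceived, truth)
print("endpoint error:", epe.round(2), "angular difference (deg):", ang.round(1))
```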
Affiliation(s)
- Yung-Hao Yang
- Cognitive Informatics Laboratory, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan
- Taiki Fukiage
- Human Information Science Laboratory, NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, 3-1, Morinosato-Wakamiya, Atsugi, Kanagawa 243-0198, Japan
- Zitang Sun
- Cognitive Informatics Laboratory, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan
- Shin’ya Nishida
- Cognitive Informatics Laboratory, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan
- Human Information Science Laboratory, NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, 3-1, Morinosato-Wakamiya, Atsugi, Kanagawa 243-0198, Japan
11
Croom S, Zhou H, Firestone C. Seeing and understanding epistemic actions. Proc Natl Acad Sci U S A 2023; 120:e2303162120. PMID: 37983484; DOI: 10.1073/pnas.2303162120.
Abstract
Many actions have instrumental aims, in which we move our bodies to achieve a physical outcome in the environment. However, we also perform actions with epistemic aims, in which we move our bodies to acquire information and learn about the world. A large literature on action recognition investigates how observers represent and understand the former class of actions; but what about the latter class? Can one person tell, just by observing another person's movements, what they are trying to learn? Here, five experiments explore epistemic action understanding. We filmed volunteers playing a "physics game" consisting of two rounds: Players shook an opaque box and attempted to determine i) the number of objects hidden inside, or ii) the shape of the objects inside. Then, independent subjects watched these videos and were asked to determine which videos came from which round: Who was shaking for number and who was shaking for shape? Across several variations, observers successfully determined what an actor was trying to learn, based only on their actions (i.e., how they shook the box)-even when the box's contents were identical across rounds. These results demonstrate that humans can infer epistemic intent from physical behaviors, adding a new dimension to research on action understanding.
Affiliation(s)
- Sholei Croom
- Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD 21218
- Hanbei Zhou
- Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD 21218
- Chaz Firestone
- Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD 21218
12
Capone C, Lupo C, Muratore P, Paolucci PS. Beyond spiking networks: The computational advantages of dendritic amplification and input segregation. Proc Natl Acad Sci U S A 2023; 120:e2220743120. PMID: 38019856; DOI: 10.1073/pnas.2220743120.
Abstract
The brain can efficiently learn a wide range of tasks, motivating the search for biologically inspired learning rules for improving current artificial intelligence technology. Most biological models are composed of point neurons and cannot achieve state-of-the-art performance in machine learning. Recent works have proposed that input segregation (neurons receive sensory information and higher-order feedback in segregated compartments), and nonlinear dendritic computation would support error backpropagation in biological neurons. However, these approaches require propagating errors with a fine spatiotemporal structure to all the neurons, which is unlikely to be feasible in a biological network. To relax this assumption, we suggest that bursts and dendritic input segregation provide a natural support for target-based learning, which propagates targets rather than errors. A coincidence mechanism between the basal and the apical compartments allows for generating high-frequency bursts of spikes. This architecture supports a burst-dependent learning rule, based on the comparison between the target bursting activity triggered by the teaching signal and the one caused by the recurrent connections, providing support for target-based learning. We show that this framework can be used to efficiently solve spatiotemporal tasks, such as context-dependent store and recall of three-dimensional trajectories, and navigation tasks. Finally, we suggest that this neuronal architecture naturally allows for orchestrating "hierarchical imitation learning", enabling the decomposition of challenging long-horizon decision-making tasks into simpler subtasks. We show a possible implementation of this in a two-level network, where the higher-level network produces the contextual signal for the lower-level network.
Affiliation(s)
- Cristiano Capone
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Roma, Rome 00185, Italy
- Cosimo Lupo
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Roma, Rome 00185, Italy
- Paolo Muratore
- Scuola Internazionale Superiore di Studi Avanzati (SISSA), Visual Neuroscience Lab, Trieste 34136, Italy
13
Lin S, Ramani V, Martin M, Arjunan P, Chong A, Biljecki F, Ignatius M, Poolla K, Miller C. District-scale surface temperatures generated from high-resolution longitudinal thermal infrared images. Sci Data 2023; 10:859. PMID: 38042845; DOI: 10.1038/s41597-023-02749-0.
Abstract
This paper describes a dataset collected by infrared thermography, a non-contact, non-intrusive technique to acquire data and analyze the built environment in various aspects. While most studies focus on the city and building scales, an observatory installed on a rooftop provides high temporal and spatial resolution observations with dynamic interactions on the district scale. The rooftop infrared thermography observatory with a multi-modal platform capable of assessing a wide range of dynamic processes in urban systems was deployed in Singapore. It was placed on the top of two buildings that overlook the outdoor context of the National University of Singapore campus. The platform collects remote sensing data from tropical areas on a temporal scale, allowing users to determine the temperature trend of individual features such as buildings, roads, and vegetation. The dataset includes 1,365,921 thermal images collected on average at approximately 10-second intervals from two locations during ten months.
Affiliation(s)
- Subin Lin
- Berkeley Education Alliance for Research in Singapore, CREATE Tower 1 Create 6 Way, 138602, Singapore, Singapore
- Vasantha Ramani
- Berkeley Education Alliance for Research in Singapore, CREATE Tower 1 Create 6 Way, 138602, Singapore, Singapore
- Miguel Martin
- Berkeley Education Alliance for Research in Singapore, CREATE Tower 1 Create 6 Way, 138602, Singapore, Singapore
- Pandarasamy Arjunan
- Berkeley Education Alliance for Research in Singapore, CREATE Tower 1 Create 6 Way, 138602, Singapore, Singapore
- Robert Bosch Centre for Cyber-physical Systems, Indian Institute of Science, Bengaluru, Karnataka, 560012, India
- Adrian Chong
- Department of the Built Environment, College of Design and Engineering, National University of Singapore, 4 Architecture Drive, 117566, Singapore, Singapore
- Filip Biljecki
- Department of Architecture, College of Design and Engineering, National University of Singapore, 4 Architecture Drive, 117566, Singapore, Singapore
- Department of Real Estate, Business School, National University of Singapore, 15 Kent Ridge Drive, 119245, Singapore, Singapore
- Marcel Ignatius
- Department of Architecture, College of Design and Engineering, National University of Singapore, 4 Architecture Drive, 117566, Singapore, Singapore
- Kameshwar Poolla
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Clayton Miller
- Department of the Built Environment, College of Design and Engineering, National University of Singapore, 4 Architecture Drive, 117566, Singapore, Singapore.
14
Zhu H, Yang H, Guo L, Zhang Y, Wang Y, Huang M, Wu M, Shen Q, Yang R, Cao X. FaceScape: 3D Facial Dataset and Benchmark for Single-View 3D Face Reconstruction. IEEE Trans Pattern Anal Mach Intell 2023; 45:14528-14545. PMID: 37607140; DOI: 10.1109/tpami.2023.3307338.
Abstract
In this article, we present a large-scale detailed 3D face dataset, FaceScape, and the corresponding benchmark to evaluate single-view facial 3D reconstruction. By training on FaceScape data, a novel algorithm is proposed to predict elaborate riggable 3D face models from a single image input. FaceScape dataset releases 16,940 textured 3D faces, captured from 847 subjects and each with 20 specific expressions. The 3D models contain the pore-level facial geometry that is also processed to be topologically uniform. These fine 3D facial models can be represented as a 3D morphable model for coarse shapes and displacement maps for detailed geometry. Taking advantage of the large-scale and high-accuracy dataset, a novel algorithm is further proposed to learn the expression-specific dynamic details using a deep neural network. The learned relationship serves as the foundation of our 3D face prediction system from a single image input. Different from most previous methods, our predicted 3D models are riggable with highly detailed geometry under different expressions. We also use FaceScape data to generate the in-the-wild and in-the-lab benchmark to evaluate recent methods of single-view face reconstruction. The accuracy is reported and analyzed on the dimensions of camera pose and focal length, which provides a faithful and comprehensive evaluation and reveals new challenges. The unprecedented dataset, benchmark, and code have been released to the public for research purposes.
15
Hou C, Gu S, Xu C, Qian Y. Incremental Learning for Simultaneous Augmentation of Feature and Class. IEEE Trans Pattern Anal Mach Intell 2023; 45:14789-14806. PMID: 37610915; DOI: 10.1109/tpami.2023.3307670.
Abstract
With the emergence of new data collection methods in many dynamic-environment applications, samples are gathered gradually in accumulated feature spaces. The incorporation of new types of features may, in turn, increase the number of classes. For instance, in activity recognition, using the old features during warm-up, we can separate different warm-up exercises. With the accumulation of new attributes obtained from newly added sensors, we can better separate the newly appeared formal exercises. Learning for such simultaneous augmentation of feature and class is crucial but rarely studied, particularly when the labeled samples with full observations are limited. In this paper, we tackle this problem by proposing a novel incremental learning method for Simultaneous Augmentation of Feature and Class (SAFC) in a two-stage way. To guarantee the reusability of the model trained on previous data, we add a regularizer to the current model, which can provide a solid prior in training the new classifier. We also present theoretical analyses of the generalization bound, which can validate the efficiency of model inheritance. After solving the one-shot problem, we also extend it to multi-shot. Experimental results demonstrate the effectiveness of our approaches, including in activity recognition applications.
16
Su Z, Zhang J, Wang L, Zhang H, Liu Z, Pietikainen M, Liu L. Lightweight Pixel Difference Networks for Efficient Visual Representation Learning. IEEE Trans Pattern Anal Mach Intell 2023; 45:14956-14974. PMID: 37527290; DOI: 10.1109/tpami.2023.3300513.
Abstract
Recently, there have been tremendous efforts in developing lightweight Deep Neural Networks (DNNs) with satisfactory accuracy, which can enable the ubiquitous deployment of DNNs in edge devices. The core challenge of developing compact and efficient DNNs lies in how to balance the competing goals of achieving high accuracy and high efficiency. In this paper we propose two novel types of convolutions, dubbed Pixel Difference Convolution (PDC) and Binary PDC (Bi-PDC) which enjoy the following benefits: capturing higher-order local differential information, computationally efficient, and able to be integrated with existing DNNs. With PDC and Bi-PDC, we further present two lightweight deep networks named Pixel Difference Networks (PiDiNet) and Binary PiDiNet (Bi-PiDiNet) respectively to learn highly efficient yet more accurate representations for visual tasks including edge detection and object recognition. Extensive experiments on popular datasets (BSDS500, ImageNet, LFW, YTF, etc.) show that PiDiNet and Bi-PiDiNet achieve the best accuracy-efficiency trade-off. For edge detection, PiDiNet is the first network that can be trained without ImageNet, and can achieve the human-level performance on BSDS500 at 100 FPS and with 1 M parameters. For object recognition, among existing Binary DNNs, Bi-PiDiNet achieves the best accuracy and a nearly 2× reduction of computational cost on ResNet18.
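Central pixel-difference convolution replaces each sampled value x(p_i) in the kernel window with the difference x(p_i) − x(p_0) from the center pixel, which algebraically equals a vanilla convolution minus a 1×1 convolution whose weight is the kernel's spatial sum. The PyTorch sketch below follows that identity as one illustrative reading of PDC; it is not the released PiDiNet code, and the other PDC variants are not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralPDC(nn.Module):
    """Central pixel-difference convolution:
    y(p0) = sum_i w_i * (x(p_i) - x(p0)) = conv(x, w) - (sum_i w_i) * x(p0)."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.1)
        self.padding = kernel_size // 2

    def forward(self, x):
        vanilla = F.conv2d(x, self.weight, padding=self.padding)
        center = F.conv2d(x, self.weight.sum(dim=(2, 3), keepdim=True))  # 1x1 conv with summed weights
        return vanilla - center

layer = CentralPDC(3, 8)
out = layer(torch.randn(1, 3, 32, 32))
print(out.shape)                      # torch.Size([1, 8, 32, 32])
# Sanity check: on a constant image, interior responses are zero because all local differences vanish.
print(layer(torch.ones(1, 3, 32, 32))[:, :, 1:-1, 1:-1].abs().max().item())
```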
17
Yao R, Du S, Cui W, Ye A, Wen F, Zhang H, Tian Z, Gao Y. Hunter: Exploring High-Order Consistency for Point Cloud Registration With Severe Outliers. IEEE Trans Pattern Anal Mach Intell 2023; 45:14760-14776. PMID: 37695971; DOI: 10.1109/tpami.2023.3312592.
Abstract
After decades of investigation, point cloud registration is still a challenging task in practice, especially when the correspondences are contaminated by a large number of outliers. This may result in a rapidly decreasing probability of generating a hypothesis close to the true transformation, leading to the failure of point cloud registration. To tackle this problem, we propose a transformation estimation method, named Hunter, for robust point cloud registration with severe outliers. The core of Hunter is to design a global-to-local exploration scheme to robustly find the correct correspondences. The global exploration aims to exploit guided sampling to generate promising initial alignments. To this end, a hypergraph-based consistency reasoning module is introduced to learn the high-order consistency among correct correspondences, which is able to yield a more distinct inlier cluster that facilitates the generation of all-inlier hypotheses. Moreover, we propose a preference-based local exploration module that exploits the preference information of the top-k promising hypotheses to find a better transformation. This module can efficiently obtain multiple reliable transformation hypotheses by using a multi-initialization searching strategy. Finally, we present a distance-angle based hypothesis selection criterion to choose the most reliable transformation, which can avoid selecting symmetrically aligned false transformations. Experimental results on simulated, indoor, and outdoor datasets demonstrate that Hunter can achieve significant superiority over the state-of-the-art methods, including both learning-based and traditional methods (as shown in Fig. 1). Moreover, experimental results also indicate that Hunter can achieve more stable performance compared with all other methods with severe outliers.
18
Xie X, Lang C, Miao S, Cheng G, Li K, Han J. Mutual-Assistance Learning for Object Detection. IEEE Trans Pattern Anal Mach Intell 2023; 45:15171-15184. PMID: 37756169; DOI: 10.1109/tpami.2023.3319634.
Abstract
Object detection is a fundamental yet challenging task in computer vision. Despite the great strides made over recent years, modern detectors may still produce unsatisfactory performance due to certain factors, such as non-universal object features and a single regression manner. In this paper, we draw on the idea of mutual-assistance (MA) learning and accordingly propose a robust one-stage detector, referred to as MADet, to address these weaknesses. First, the spirit of MA is manifested in the head design of the detector. Decoupled classification and regression features are reintegrated to provide shared offsets, avoiding inconsistency between feature-prediction pairs induced by zero or erroneous offsets. Second, the spirit of MA is captured in the optimization paradigm of the detector. Both anchor-based and anchor-free regression fashions are utilized jointly to boost the capability to retrieve objects with various characteristics, especially for large aspect ratios, occlusion from similar-sized objects, etc. Furthermore, we meticulously devise a quality assessment mechanism to facilitate adaptive sample selection and loss term reweighting. Extensive experiments on standard benchmarks verify the effectiveness of our approach. On MS-COCO, MADet achieves 42.5% AP with vanilla ResNet50 backbone, dramatically surpassing multiple strong baselines and setting a new state of the art.
19
Lei C, Jiang X, Chen Q. Robust Reflection Removal With Flash-Only Cues in the Wild. IEEE Trans Pattern Anal Mach Intell 2023; 45:15530-15545. PMID: 37703147; DOI: 10.1109/tpami.2023.3314972.
Abstract
We propose a simple yet effective reflection-free cue for robust reflection removal from a pair of flash and ambient (no-flash) images. The reflection-free cue exploits a flash-only image obtained by subtracting the ambient image from the corresponding flash image in raw data space. The flash-only image is equivalent to an image taken in a dark environment with only a flash on. This flash-only image is visually reflection-free and thus can provide robust cues to infer the reflection in the ambient image. Since the flash-only image usually has artifacts, we further propose a dedicated model that not only utilizes the reflection-free cue but also avoids introducing artifacts, which helps accurately estimate reflection and transmission. Our experiments on real-world images with various types of reflection demonstrate the effectiveness of our model with reflection-free flash-only cues: our model outperforms state-of-the-art reflection removal approaches by more than 5.23 dB in PSNR. We extend our approach to handheld photography to address the misalignment between the flash and no-flash pair. With misaligned training data and the alignment module, our aligned model outperforms our previous version by more than 3.19 dB in PSNR on a misaligned dataset. We also study using linear RGB images as training data.
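The flash-only cue itself reduces to a per-pixel subtraction in linear raw space. The NumPy sketch below mimics it on synthetic arrays under the idealized assumptions that both captures are linear, aligned, and noise-free and that the flash brightens the transmitted scene but not the reflection; it is not the authors' processing pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear raw captures of the same scene (values in [0, 1]).
scene = rng.uniform(0.05, 0.4, size=(256, 256))        # transmission layer (what we want to keep)
reflection = rng.uniform(0.0, 0.2, size=(256, 256))    # reflection layer (present in ambient light only)
flash_boost = 0.3 * scene                               # extra light the flash adds to the scene

ambient_raw = scene + reflection                        # ambient image: transmission + reflection
flash_raw = scene + reflection + flash_boost            # flash image: reflection essentially unchanged

# Flash-only image: subtract the ambient image from the flash image in linear raw space.
flash_only = np.clip(flash_raw - ambient_raw, 0.0, None)

# The reflection cancels out, so the flash-only image is (ideally) reflection-free.
print(np.allclose(flash_only, flash_boost))             # True for this noiseless toy example
```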
20
Jiang S, Li J, Zhang J, Wang Y, Xu T. Dynamic Loss for Robust Learning. IEEE Trans Pattern Anal Mach Intell 2023; 45:14420-14434. PMID: 37665707; DOI: 10.1109/tpami.2023.3311636.
Abstract
Label noise and class imbalance are common challenges encountered in real-world datasets. Existing approaches for robust learning often focus on addressing either label noise or class imbalance individually, resulting in suboptimal performance when both biases are present. To bridge this gap, this work introduces a novel meta-learning-based dynamic loss that adapts the objective functions during the training process to effectively learn a classifier from long-tailed noisy data. Specifically, our dynamic loss consists of two components: a label corrector and a margin generator. The label corrector is responsible for correcting noisy labels, while the margin generator generates per-class classification margins by capturing the underlying data distribution and the learning state of the classifier. In addition, we employ a hierarchical sampling strategy that enriches a small amount of unbiased metadata with diverse and challenging samples. This enables the joint optimization of the two components in the dynamic loss through meta-learning, allowing the classifier to effectively adapt to clean and balanced test data. Extensive experiments conducted on multiple real-world and synthetic datasets with various types of data biases, including CIFAR-10/100, Animal-10N, ImageNet-LT, and Webvision, demonstrate that our method achieves state-of-the-art accuracy.
21
Ishikawa Y, Sugino T, Okubo K, Nakajima Y. Detecting the location of lung cancer on thoracoscopic images using deep convolutional neural networks. Surg Today 2023; 53:1380-1387. PMID: 37354240; DOI: 10.1007/s00595-023-02708-7.
Abstract
OBJECTIVES The prevalence of minimally invasive surgeries has increased the need for tumor detection using thoracoscopic images during lung cancer surgery. We conducted this study to analyze the efficacy of a deep convolutional neural network (DCNN) for tumor detection using recorded thoracoscopic images of pulmonary surfaces. MATERIALS AND METHODS We collected 644 intraoperative thoracoscopic images of changes in pulmonary appearance from 427 patients with lung cancer between 2012 and 2021. The lesion areas on the thoracoscopic images were detected by bounding boxes using an advanced version of YOLO, a well-known DCNN for object detection. The DCNN model was trained and evaluated by a 15-fold cross-validation scheme. Each predicted bounding box was considered a successful detection when it overlapped more than 50% of the lesion areas annotated by board-certified surgeons. RESULTS AND CONCLUSIONS Precision, recall, and F1-measure values of 91.9%, 90.5%, and 91.1%, respectively, were obtained. The presence of lymphatic vessel invasion was associated with successful detection (p = 0.045). The presence of pathological pleural invasion also showed a tendency toward successful detection (p = 0.081). The proposed DCNN-based algorithm yielded a tumor detection accuracy of more than 90%. These algorithms will help surgeons automatically detect lung cancer displayed on a screen.
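The success criterion described above (a predicted box overlapping more than 50% of the annotated lesion) reduces to a simple box-overlap check. The sketch below computes the fraction of the annotated box covered by the prediction, with hypothetical pixel coordinates; it is a generic illustration rather than the study's evaluation script, which may instead use the IoU convention.

```python
import numpy as np

def overlap_fraction(pred, annot):
    """Fraction of the annotated lesion box covered by the predicted box (boxes as x1, y1, x2, y2)."""
    ix1, iy1 = max(pred[0], annot[0]), max(pred[1], annot[1])
    ix2, iy2 = min(pred[2], annot[2]), min(pred[3], annot[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    annot_area = (annot[2] - annot[0]) * (annot[3] - annot[1])
    return inter / annot_area if annot_area > 0 else 0.0

prediction = (120, 80, 260, 200)     # hypothetical predicted lesion box (pixels)
annotation = (140, 90, 250, 210)     # hypothetical surgeon-annotated box

frac = overlap_fraction(prediction, annotation)
print(f"overlap = {frac:.2f} -> {'successful' if frac > 0.5 else 'missed'} detection")
```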
Affiliation(s)
- Yuya Ishikawa
- Department of Thoracic Surgery, Tokyo Medical and Dental University, Tokyo, Japan
- Takaaki Sugino
- Department of Biomedical Information, Institute of Biomaterials and Bioengineering, Tokyo Medical and Dental University, 2-3-10, Surugadai, Chiyoda-ku, Tokyo, 101-0062, Japan
- Kenichi Okubo
- Department of Thoracic Surgery, Tokyo Medical and Dental University, Tokyo, Japan
- Yoshikazu Nakajima
- Department of Biomedical Information, Institute of Biomaterials and Bioengineering, Tokyo Medical and Dental University, 2-3-10, Surugadai, Chiyoda-ku, Tokyo, 101-0062, Japan.
| |
Collapse
|
22
|
Wang Z, Chen C, Dong D. Instance Weighted Incremental Evolution Strategies for Reinforcement Learning in Dynamic Environments. IEEE Trans Neural Netw Learn Syst 2023; 34:9742-9756. [PMID: 35349452 DOI: 10.1109/tnnls.2022.3160173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Evolution strategies (ESs), as a family of black-box optimization algorithms, have recently emerged as a scalable alternative to reinforcement learning (RL) approaches such as Q-learning or policy gradient and are much faster when many central processing units (CPUs) are available due to better parallelization. In this article, we propose a systematic incremental learning method for ES in dynamic environments. The goal is to adjust a previously learned policy to a new one incrementally whenever the environment changes. We incorporate an instance weighting mechanism with ES to facilitate its learning adaptation while retaining the scalability of ES. During parameter updating, higher weights are assigned to instances that contain more new knowledge, thus encouraging the search distribution to move toward new promising areas of parameter space. We propose two easy-to-implement metrics to calculate the weights: instance novelty and instance quality. Instance novelty measures an instance's difference from the previous optimum in the original environment, while instance quality corresponds to how well an instance performs in the new environment. The resulting algorithm, instance weighted incremental evolution strategies (IW-IESs), is verified to achieve significantly improved performance on challenging RL tasks ranging from robot navigation to locomotion. This article thus introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.
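As a hedged sketch of the general idea rather than the exact IW-IES update, the following shows a vanilla ES gradient estimate in which each sampled instance is weighted by a mix of its quality in the new environment and its novelty relative to the previous optimum; the rank normalization and the 0.5/0.5 mix are illustrative choices:

    import numpy as np

    def rank_normalize(values):
        """Map values to [0, 1] by rank; robust to scale differences."""
        ranks = np.argsort(np.argsort(values))
        return ranks / max(len(values) - 1, 1)

    def iw_es_step(theta, prev_optimum, fitness_fn, sigma=0.1, lr=0.01, pop=64, seed=0):
        rng = np.random.default_rng(seed)
        eps = rng.standard_normal((pop, theta.size))
        candidates = theta[None, :] + sigma * eps

        quality = np.array([fitness_fn(c) for c in candidates])               # fitness in the new environment
        novelty = np.linalg.norm(candidates - prev_optimum[None, :], axis=1)  # distance from the old optimum

        weights = 0.5 * rank_normalize(quality) + 0.5 * rank_normalize(novelty)
        grad_estimate = (weights[:, None] * eps).sum(axis=0) / (pop * sigma)
        return theta + lr * grad_estimate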
Collapse
|
23
|
Ju C, Guan C. Tensor-CSPNet: A Novel Geometric Deep Learning Framework for Motor Imagery Classification. IEEE Trans Neural Netw Learn Syst 2023; 34:10955-10969. [PMID: 35749326 DOI: 10.1109/tnnls.2022.3172108] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Deep learning (DL) has been widely investigated in a vast majority of applications in electroencephalography (EEG)-based brain-computer interfaces (BCIs), especially for motor imagery (MI) classification in the past five years. The mainstream DL methodology for the MI-EEG classification exploits the temporospatial patterns of EEG signals using convolutional neural networks (CNNs), which have been particularly successful in visual images. However, since the statistical characteristics of visual images depart radically from EEG signals, a natural question arises whether an alternative network architecture exists apart from CNNs. To address this question, we propose a novel geometric DL (GDL) framework called Tensor-CSPNet, which characterizes spatial covariance matrices derived from EEG signals on symmetric positive definite (SPD) manifolds and fully captures the temporospatiofrequency patterns using existing deep neural networks on SPD manifolds, integrating with experiences from many successful MI-EEG classifiers to optimize the framework. In the experiments, Tensor-CSPNet attains or slightly outperforms the current state-of-the-art performance on the cross-validation and holdout scenarios in two commonly used MI-EEG datasets. Moreover, the visualization and interpretability analyses also exhibit the validity of Tensor-CSPNet for the MI-EEG classification. To conclude, in this study, we provide a feasible answer to the question by generalizing the DL methodologies on SPD manifolds, which indicates the start of a specific GDL methodology for the MI-EEG classification.
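The SPD inputs that Tensor-CSPNet operates on are spatial covariance matrices of (band-pass-filtered) EEG segments. A minimal sketch of that preprocessing step; the small ridge term is an illustrative way to guarantee positive definiteness, and the paper's exact segmentation and filtering are not reproduced:

    import numpy as np

    def spatial_covariance(segment, eps=1e-6):
        """segment: (channels, time) EEG window; returns a (channels, channels) SPD matrix."""
        centered = segment - segment.mean(axis=1, keepdims=True)
        cov = centered @ centered.T / (segment.shape[1] - 1)
        return cov + eps * np.eye(cov.shape[0])  # small ridge keeps the matrix positive definite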
Collapse
|
24
|
Fan H, Xu P, Chen X, Li Y, Zhang Z, Hsu J, Le M, Ye E, Gao B, Demos H, Yao H, Ye T. Mask R-CNN provides efficient and accurate measurement of chondrocyte viability in the label-free assessment of articular cartilage. Osteoarthr Cartil Open 2023; 5:100415. [PMID: 38025155 PMCID: PMC10679817 DOI: 10.1016/j.ocarto.2023.100415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2023] [Accepted: 11/03/2023] [Indexed: 12/01/2023] Open
Abstract
Objective Chondrocyte viability (CV) can be measured with the label-free method using second harmonic generation (SHG) and two-photon excitation autofluorescence (TPAF) imaging. To automate the image processing for the label-free CV measurement, we previously demonstrated a two-step deep-learning method: Step 1 used a U-Net to segment the lacuna area on SHG images; Step 2 used dual CNN networks to count live cells and the total number of cells in extracted cell clusters from TPAF images. This study aims to develop one-step deep learning methods to improve the efficiency of CV measurement. Method TPAF/SHG images were acquired simultaneously on cartilage samples from rats and pigs using two-photon microscopes and were merged to form RGB color images with red, green, and blue channels assigned to emission bands of oxidized flavoproteins, reduced forms of nicotinamide adenine dinucleotide, and SHG signals, respectively. Based on the Mask R-CNN, we designed a deep learning network and its denoising version using Wiener deconvolution for CV measurement. Results Using training and test datasets from rat and porcine cartilage, we have demonstrated that Mask R-CNN-based networks can segment and classify individual cells with a single-step processing flow. The absolute error (difference between the measured and the ground-truth CV) of the CV measurement using the Mask R-CNN with or without Wiener deconvolution denoising reaches 0.01 or 0.08, respectively; the error of the previous CV networks is 0.18, significantly larger than that of the Mask R-CNN methods. Conclusions Mask R-CNN-based deep-learning networks improve efficiency and accuracy of the label-free CV measurement.
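Two of the simpler steps described above can be sketched directly: merging the two TPAF emission bands and the SHG channel into an RGB input (channel assignment follows the text), and turning per-cell class predictions into a viability value. The Mask R-CNN itself is assumed to be an external model:

    import numpy as np

    def merge_to_rgb(fad, nadh, shg):
        """Assign oxidized-flavoprotein, NAD(P)H and SHG images to R, G, B channels."""
        rgb = np.stack([fad, nadh, shg], axis=-1).astype(np.float32)
        return rgb / max(float(rgb.max()), 1e-6)

    def chondrocyte_viability(cell_classes):
        """cell_classes: per-instance predictions, e.g. 'live' or 'dead', from the segmentation model."""
        total = len(cell_classes)
        live = sum(1 for c in cell_classes if c == "live")
        return live / total if total else float("nan")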
Collapse
Affiliation(s)
- Hongming Fan
- Department of Bioengineering, Clemson University, SC, USA
| | - Pei Xu
- School of Computing, Clemson University, SC, USA
| | - Xun Chen
- Department of Bioengineering, Clemson University, SC, USA
| | - Yang Li
- School of Medicine, Yale University, New Haven, CT, USA
| | - Zhao Zhang
- Department of Bioengineering, Clemson University, SC, USA
| | - Jennifer Hsu
- Department of Bioengineering, Clemson University, SC, USA
- School of Computing, Clemson University, SC, USA
| | - Michael Le
- Department of Bioengineering, Clemson University, SC, USA
| | - Emily Ye
- College of Medicine, Medical University of South Carolina, Charleston, SC, USA
| | - Bruce Gao
- Department of Bioengineering, Clemson University, SC, USA
| | - Harry Demos
- Department of Orthopaedics & Physical Medicine, Medical University of South Carolina, Charleston, SC, USA
| | - Hai Yao
- Department of Bioengineering, Clemson University, SC, USA
- Department of Orthopaedics & Physical Medicine, Medical University of South Carolina, Charleston, SC, USA
- Department of Oral Health Sciences, Medical University of South Carolina, Charleston, SC, USA
| | - Tong Ye
- Department of Bioengineering, Clemson University, SC, USA
- Department of Regenerative Medicine and Cell Biology, Medical University of South Carolina, Charleston, SC, USA
| |
Collapse
|
25
|
Qu J, Dong W, Li Y, Hou S, Du Q. An Interpretable Unsupervised Unrolling Network for Hyperspectral Pansharpening. IEEE Trans Cybern 2023; 53:7943-7956. [PMID: 37027771 DOI: 10.1109/tcyb.2023.3241165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Existing deep convolutional neural networks (CNNs) have recently achieved great success in pansharpening. However, most deep CNN-based pansharpening models are based on a "black-box" architecture and require supervision, making these methods rely heavily on ground-truth data and lose their interpretability for specific problems during network training. This study proposes a novel interpretable unsupervised end-to-end pansharpening network, called IU2PNet, which explicitly encodes the well-studied pansharpening observation model into an unsupervised unrolling iterative adversarial network. Specifically, we first design a pansharpening model whose iterative process can be computed by the half-quadratic splitting algorithm. Then, the iterative steps are unfolded into a deep interpretable iterative generative dual adversarial network (iGDANet). The generator in iGDANet interweaves multiple deep feature pyramid denoising modules and deep interpretable convolutional reconstruction modules. In each iteration, the generator establishes an adversarial game with the spatial and spectral discriminators to update both spectral and spatial information without ground-truth images. Extensive experiments show that, compared with the state-of-the-art methods, our proposed IU2PNet exhibits very competitive performance in terms of quantitative evaluation metrics and qualitative visual effects.
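The half-quadratic splitting iteration that IU2PNet unrolls alternates a data-consistency update with a prior (denoising) update. The following is a generic, hedged sketch of such a loop with plain callables standing in for the learned modules; the operators, step size, and penalty weight are placeholders, not the paper's architecture:

    def hqs_iterations(x0, y, forward_op, adjoint_op, denoiser, step=0.1, mu=0.5, n_iter=5):
        """Generic half-quadratic splitting: the x-update enforces data consistency,
        the z-update applies the prior (played by denoising modules in IU2PNet)."""
        x, z = x0, x0
        for _ in range(n_iter):
            # data sub-problem: gradient step on ||y - A x||^2 + mu ||x - z||^2
            x = x - step * (adjoint_op(forward_op(x) - y) + mu * (x - z))
            # prior sub-problem: proximal/denoising step
            z = denoiser(x)
        return z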
Collapse
|
26
|
Liu B, Lu D, Wei D, Wu X, Wang Y, Zhang Y, Zheng Y. Improving Medical Vision-Language Contrastive Pretraining With Semantics-Aware Triage. IEEE Trans Med Imaging 2023; 42:3579-3589. [PMID: 37440389 DOI: 10.1109/tmi.2023.3294980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/15/2023]
Abstract
Medical contrastive vision-language pretraining has shown great promise in many downstream tasks, such as data-efficient/zero-shot recognition. Current studies pretrain the network with a contrastive loss by treating the paired image-reports as positive samples and the unpaired ones as negative samples. However, unlike natural datasets, many medical images or reports from different cases can be highly similar, especially for normal cases, and treating all the unpaired ones as negative samples could undermine the learned semantic structure and impose an adverse effect on the representations. Therefore, we design a simple yet effective approach for better contrastive learning in the medical vision-language field. Specifically, by simplifying the computation of similarity between medical image-report pairs into the calculation of inter-report similarity, the image-report tuples are divided into positive, negative, and additional neutral groups. With this better categorization of samples, a more suitable contrastive loss is constructed. For evaluation, we perform extensive experiments by applying the proposed model-agnostic strategy to two state-of-the-art pretraining frameworks. The consistent improvements on four common downstream tasks, including cross-modal retrieval, zero-shot/data-efficient image classification, and image segmentation, demonstrate the effectiveness of the proposed strategy in the medical field.
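The triage step described above reduces to thresholding inter-report similarity into three groups. A minimal sketch with illustrative thresholds; the paper's actual grouping rule and how the neutral group enters the loss are not reproduced here:

    def triage_pairs(report_similarities, pos_threshold=0.9, neg_threshold=0.5):
        """Assign each candidate image-report pair to a contrastive group based on
        how similar its report is to the anchor report."""
        groups = []
        for s in report_similarities:
            if s >= pos_threshold:
                groups.append("positive")    # treated as an extra positive
            elif s <= neg_threshold:
                groups.append("negative")    # safe to contrast against
            else:
                groups.append("neutral")     # ambiguous: excluded from the negatives
        return groups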
Collapse
|
27
|
Guan Y, Li Y, Liu R, Meng Z, Li Y, Ying L, Du YP, Liang ZP. Subspace Model-Assisted Deep Learning for Improved Image Reconstruction. IEEE Trans Med Imaging 2023; 42:3833-3846. [PMID: 37682643 DOI: 10.1109/tmi.2023.3313421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/10/2023]
Abstract
Image reconstruction from limited and/or sparse data is known to be an ill-posed problem and a priori information/constraints have played an important role in solving the problem. Early constrained image reconstruction methods utilize image priors based on general image properties such as sparsity, low-rank structures, spatial support bound, etc. Recent deep learning-based reconstruction methods promise to produce even higher quality reconstructions by utilizing more specific image priors learned from training data. However, learning high-dimensional image priors requires huge amounts of training data that are currently not available in medical imaging applications. As a result, deep learning-based reconstructions often suffer from two known practical issues: a) sensitivity to data perturbations (e.g., changes in data sampling scheme), and b) limited generalization capability (e.g., biased reconstruction of lesions). This paper proposes a new method to address these issues. The proposed method synergistically integrates model-based and data-driven learning in three key components. The first component uses the linear vector space framework to capture global dependence of image features; the second exploits a deep network to learn the mapping from a linear vector space to a nonlinear manifold; the third is an unrolling-based deep network that captures local residual features with the aid of a sparsity model. The proposed method has been evaluated with magnetic resonance imaging data, demonstrating improved reconstruction in the presence of data perturbation and/or novel image features. The method may enhance the practical utility of deep learning-based image reconstruction.
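The first, model-based component above captures global image features in a linear vector space. A hedged one-function sketch of that idea: project the image onto a fixed low-dimensional basis and keep the residual for the learned components to refine. Here U is an assumed basis matrix (e.g., estimated from training images); this is not the paper's full pipeline:

    import numpy as np

    def subspace_split(x, U):
        """x: flattened image vector; U: (pixels, rank) basis of the linear subspace.
        Returns the subspace component and the residual left for the deep modules."""
        coeffs, *_ = np.linalg.lstsq(U, x, rcond=None)
        subspace_part = U @ coeffs
        return subspace_part, x - subspace_part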
Collapse
|
28
|
Farhat N, Lazebnik T, Monteny J, Moons CPH, Wydooghe E, van der Linden D, Zamansky A. Digitally-enhanced dog behavioral testing. Sci Rep 2023; 13:21252. [PMID: 38040814 PMCID: PMC10692085 DOI: 10.1038/s41598-023-48423-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Accepted: 11/27/2023] [Indexed: 12/03/2023] Open
Abstract
Behavioral traits in dogs are assessed for a wide range of purposes such as determining selection for breeding, chance of being adopted or prediction of working aptitude. Most methods for assessing behavioral traits are questionnaire or observation-based, requiring significant amounts of time, effort and expertise. In addition, these methods might be also susceptible to subjectivity and bias, negatively impacting their reliability. In this study, we proposed an automated computational approach that may provide a more objective, robust and resource-efficient alternative to current solutions. Using part of a 'Stranger Test' protocol, we tested n = 53 dogs for their response to the presence and neutral actions of a stranger. Dog coping styles were scored by three dog behavior experts. Moreover, data were collected from their owners/trainers using the Canine Behavioral Assessment and Research Questionnaire (C-BARQ). An unsupervised clustering of the dogs' trajectories revealed two main clusters showing a significant difference in the stranger-directed fear C-BARQ category, as well as a good separation between (sufficiently) relaxed dogs and dogs with excessive behaviors towards strangers based on expert scoring. Based on the clustering, we obtained a machine learning classifier for expert scoring of coping styles towards strangers, which reached an accuracy of 78%. We also obtained a regression model predicting C-BARQ scores with varying performance, the best being Owner-Directed Aggression (with a mean average error of 0.108) and Excitability (with a mean square error of 0.032). This case study demonstrates a novel paradigm of 'machine-based' dog behavioral assessment, highlighting the value and great promise of AI in this context.
Collapse
Affiliation(s)
| | - Teddy Lazebnik
- Ariel University, Ariel, Israel.
- University College London, London, UK.
| | | | | | | | | | | |
Collapse
|
29
|
Qiu Z, Yang H, Fu J, Liu D, Xu C, Fu D. Learning Degradation-Robust Spatiotemporal Frequency-Transformer for Video Super-Resolution. IEEE Trans Pattern Anal Mach Intell 2023; 45:14888-14904. [PMID: 37669199 DOI: 10.1109/tpami.2023.3312166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/07/2023]
Abstract
Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos. Existing VSR techniques usually recover HR frames by extracting pertinent textures from nearby frames with known degradation processes. Despite significant progress, grand challenges remain in effectively extracting and transmitting high-quality textures from highly degraded, low-quality sequences affected by blur, additive noise, and compression artifacts. This work proposes FTVSR++, a novel degradation-robust Frequency-Transformer for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain. First, video frames are split into patches and each patch is transformed into spectral maps in which each channel represents a frequency band. This permits fine-grained self-attention on each frequency band so that real visual texture can be distinguished from artifacts. Second, a novel dual frequency attention (DFA) mechanism is proposed to capture the global and local frequency relations, which can handle different complicated degradation processes in real-world scenarios. Third, we explore different self-attention schemes for video processing in the frequency domain and discover that a "divided attention", which conducts joint space-frequency attention before applying temporal-frequency attention, leads to the best video enhancement quality. Extensive experiments on three widely-used VSR datasets show that FTVSR++ outperforms state-of-the-art methods on different low-quality videos with clear visual margins.
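The first step above, turning each patch into per-frequency-band maps, can be approximated with a radial split of the 2D spectrum. This is only a rough stand-in for the transform used by FTVSR++ (which the abstract does not fully specify); the number of bands and the FFT-magnitude representation are illustrative assumptions:

    import numpy as np

    def patch_frequency_bands(patch, n_bands=4):
        """patch: (h, w) grayscale patch; returns (n_bands, h, w) band-limited magnitude maps."""
        h, w = patch.shape
        spectrum = np.fft.fftshift(np.fft.fft2(patch))
        fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
        fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
        radius = np.sqrt(fy ** 2 + fx ** 2)
        edges = np.linspace(0.0, float(radius.max()) + 1e-9, n_bands + 1)
        bands = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (radius >= lo) & (radius < hi)
            bands.append(np.abs(spectrum) * mask)   # keep only this radial frequency band
        return np.stack(bands, axis=0)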
Collapse
|
30
|
Niemann A, Tulamo R, Netti E, Preim B, Berg P, Cebral J, Robertson A, Saalfeld S. Multimodal exploration of the intracranial aneurysm wall. Int J Comput Assist Radiol Surg 2023; 18:2243-2252. [PMID: 36877287 PMCID: PMC10480333 DOI: 10.1007/s11548-023-02850-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2022] [Accepted: 02/02/2023] [Indexed: 03/07/2023]
Abstract
PURPOSE Intracranial aneurysms (IAs) are pathological changes of the intracranial vessel wall, although clinical image data can only show the vessel lumen. Histology can provide wall information but is typically restricted to ex vivo 2D slices where the shape of the tissue is altered. METHODS We developed a visual exploration pipeline for a comprehensive view of an IA. We extract multimodal information (like stain classification and segmentation of histologic images) and combine them via 2D to 3D mapping and virtual inflation of deformed tissue. Histological data, including four stains, micro-CT data and segmented calcifications as well as hemodynamic information like wall shear stress (WSS), are combined with the 3D model of the resected aneurysm. RESULTS Calcifications were mostly present in the tissue part with increased WSS. In the 3D model, an area of increased wall thickness was identified and correlated to histology, where the Oil red O (ORO) stained images showed a lipid accumulation and the alpha-smooth muscle actin (aSMA) stained images showed a slight loss of muscle cells. CONCLUSION Our visual exploration pipeline combines multimodal information about the aneurysm wall to improve the understanding of wall changes and IA development. The user can identify regions and correlate how hemodynamic forces, e.g. WSS, are reflected by histological structures of the vessel wall, wall thickness and calcifications.
Collapse
Affiliation(s)
- Annika Niemann
- Department of Simulation and Graphics, Otto-von-Guericke University, Magdeburg, Germany
- STIMULATE Research Campus, Magdeburg, Germany
| | - Riikka Tulamo
- Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Eliisa Netti
- Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Bernhard Preim
- Department of Simulation and Graphics, Otto-von-Guericke University, Magdeburg, Germany
- STIMULATE Research Campus, Magdeburg, Germany
| | - Philipp Berg
- STIMULATE Research Campus, Magdeburg, Germany
- Department of Medical Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
| | - Juan Cebral
- Computational Hemodynamics Lab, Georg Mason University, Fairfax, USA
| | - Anne Robertson
- Department of Mechanical Engineering and Materials Science, University of Pittsburgh, Pittsburgh, USA
| | - Sylvia Saalfeld
- Department of Simulation and Graphics, Otto-von-Guericke University, Magdeburg, Germany.
- STIMULATE Research Campus, Magdeburg, Germany.
| |
Collapse
|
31
|
Guo L, Nahm W. Texture synthesis for generating realistic-looking bronchoscopic videos. Int J Comput Assist Radiol Surg 2023; 18:2287-2293. [PMID: 37162734 PMCID: PMC10632244 DOI: 10.1007/s11548-023-02874-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2022] [Accepted: 03/15/2023] [Indexed: 05/11/2023]
Abstract
PURPOSE Synthetic realistic-looking bronchoscopic videos are needed to develop and evaluate depth estimation methods as part of investigating vision-based bronchoscopic navigation system. To generate these synthetic videos under the circumstance where access to real bronchoscopic images/image sequences is limited, we need to create various realistic-looking image textures of the airway inner surface with large size using a small number of real bronchoscopic image texture patches. METHODS A generative adversarial networks-based method is applied to create realistic-looking textures of the airway inner surface by learning from a limited number of small texture patches from real bronchoscopic images. By applying a purely convolutional architecture without any fully connected layers, this method allows the production of textures with arbitrary size. RESULTS Authentic image textures of airway inner surface are created. An example of the synthesized textures and two frames of the thereby generated bronchoscopic video are shown. The necessity and sufficiency of the generated textures as image features for further depth estimation methods are demonstrated. CONCLUSIONS The method can generate textures of the airway inner surface that meet the requirements for the texture itself and for the thereby generated bronchoscopic videos, including "realistic-looking," "long-term temporal consistency," "sufficient image features for depth estimation," and "large size and variety of synthesized textures." Besides, it also shows advantages with respect to the easy accessibility to required data source. A further validation of this approach is planned by utilizing the realistic-looking bronchoscopic videos with textures generated by this method as training and test data for some depth estimation networks.
Collapse
Affiliation(s)
- Lu Guo
- Karlsruhe Institute of Technology, Kaiserstraße 12, Karlsruhe, 76131, Germany.
| | - Werner Nahm
- Karlsruhe Institute of Technology, Kaiserstraße 12, Karlsruhe, 76131, Germany
| |
Collapse
|
32
|
Li X, Long M, Huang J, Wu J, Shen H, Zhou F, Hou J, Xu Y, Wang D, Mei L, Liu Y, Hu T, Lei C. An orientation-free ring feature descriptor with stain-variability normalization for pathology image matching. Comput Biol Med 2023; 167:107675. [PMID: 37976825 DOI: 10.1016/j.compbiomed.2023.107675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2023] [Revised: 10/08/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023]
Abstract
Comprehensively analyzing the corresponding regions in the images of serial slices stained using different methods is a common but important operation in pathological diagnosis. To help increase the efficiency of the analysis, various image registration methods have been proposed to match the corresponding regions in different images, but their performance is highly influenced by the rotations, deformations, and variations of staining between the serial pathology images. In this work, we propose an orientation-free ring feature descriptor with stain-variability normalization for pathology image matching. Specifically, we normalize image staining to similar levels to minimize the impact of staining differences on pathology image matching. To overcome the rotation and deformation issues, we propose a rotation-invariant, orientation-free ring feature descriptor that generates novel adaptive bins from ring features to build feature vectors. We measure the Euclidean distance of the feature vectors to evaluate keypoint similarity to achieve pathology image matching. A total of 46 pairs of clinical pathology images with hematoxylin-eosin and immunohistochemistry staining were used to verify the performance of our method. Experimental results indicate that our method meets the pathology image matching accuracy requirements (error < 300 μm) and is especially competent for large-angle rotation cases common in clinical practice.
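Two pieces of this pipeline are simple enough to sketch: a rotation-invariant ring descriptor (mean intensity over concentric rings around a keypoint; the paper's adaptive bins are more elaborate) and nearest-neighbour matching by Euclidean distance. The ring count, radius, and distance threshold below are illustrative:

    import numpy as np

    def ring_descriptor(image, center, n_rings=8, max_radius=32.0):
        """Mean intensity per concentric ring around (row, col) center; rotation invariant."""
        rows, cols = np.indices(image.shape)
        radius = np.sqrt((rows - center[0]) ** 2 + (cols - center[1]) ** 2)
        edges = np.linspace(0.0, max_radius, n_rings + 1)
        desc = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (radius >= lo) & (radius < hi)
            desc.append(image[mask].mean() if mask.any() else 0.0)
        return np.asarray(desc)

    def match_keypoints(descs_a, descs_b, max_dist=0.5):
        """Greedy nearest-neighbour matching of descriptor sets by Euclidean distance."""
        matches = []
        for i, d in enumerate(descs_a):
            dists = np.linalg.norm(descs_b - d, axis=1)
            j = int(np.argmin(dists))
            if dists[j] < max_dist:
                matches.append((i, j, float(dists[j])))
        return matches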
Collapse
Affiliation(s)
- Xiaoxiao Li
- The Institute of Technological Sciences, Wuhan University, Wuhan 430072, China
| | - Mengping Long
- The Institute of Technological Sciences, Wuhan University, Wuhan 430072, China; Department of Pathology, Peking University Cancer Hospital, Beijing 100142, China
| | - Jin Huang
- The Institute of Technological Sciences, Wuhan University, Wuhan 430072, China
| | - Jianghua Wu
- Department of Pathology, Peking University Cancer Hospital, Beijing 100142, China
| | - Hui Shen
- Department of Hematology, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
| | - Fuling Zhou
- Department of Hematology, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
| | - Jinxuan Hou
- Department of Thyroid and Breast Surgery, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
| | - Yu Xu
- Department of Radiation and Medical Oncology, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
| | - Du Wang
- The Institute of Technological Sciences, Wuhan University, Wuhan 430072, China
| | - Liye Mei
- The Institute of Technological Sciences, Wuhan University, Wuhan 430072, China; School of Computer Science, Hubei University of Technology, Wuhan, 430068, China.
| | - Yiqiang Liu
- Department of Pathology, Peking University Cancer Hospital, Beijing 100142, China
| | - Taobo Hu
- The Institute of Technological Sciences, Wuhan University, Wuhan 430072, China; Department of Breast Surgery, Peking University People's Hospital, Beijing, 100044, China
| | - Cheng Lei
- The Institute of Technological Sciences, Wuhan University, Wuhan 430072, China; Suzhou Institute of Wuhan University, Suzhou, 215000, China; Shenzhen Institute of Wuhan University, Shenzhen, 518057, China.
| |
Collapse
|
33
|
Li Z, Wang Y, Zhao Q, Zhang S, Meng D. A Tensor-Based Online RPCA Model for Compressive Background Subtraction. IEEE Trans Neural Netw Learn Syst 2023; 34:10668-10682. [PMID: 35536805 DOI: 10.1109/tnnls.2022.3170789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Background subtraction of videos has been a fundamental research topic in computer vision in the past decades. To alleviate the computation burden and enhance the efficiency, background subtraction from online compressive measurements has recently attracted much attention. However, current methods still have limitations. First, they are all based on matrix modeling, which breaks the spatial structure within video frames. Second, they generally ignore the complex disturbance within the background, which reduces the efficiency of the low-rank assumption. To alleviate this issue, we propose a tensor-based online compressive video reconstruction and background subtraction method, abbreviated as NIOTenRPCA, by explicitly modeling the background disturbance in different frames as nonidentical but correlated noise. By virtue of such sophisticated modeling, the proposed method can well adapt to complex video scenes and, thus, perform more robustly. Extensive experiments on a series of real-world video datasets have demonstrated the effectiveness of the proposed method compared with the existing state of the arts. The code of our method is released on the website: https://github.com/crystalzina/NIOTenRPCA.
Collapse
|
34
|
Wang Z, Zhu H, Huang B, Wang Z, Lu W, Chen N, Wang Y. M-MSSEU: source-free domain adaptation for multi-modal stroke lesion segmentation using shadowed sets and evidential uncertainty. Health Inf Sci Syst 2023; 11:46. [PMID: 37780536 PMCID: PMC10539264 DOI: 10.1007/s13755-023-00247-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2023] [Accepted: 09/08/2023] [Indexed: 10/03/2023] Open
Abstract
Due to the unavailability of source domain data encountered in unsupervised domain adaptation, there has been an increasing number of studies on source-free domain adaptation (SFDA) in recent years. To better solve the SFDA problem and effectively leverage the multi-modal information in medical images, this paper presents a novel SFDA method for multi-modal stroke lesion segmentation in which evidential deep learning is used instead of a convolutional neural network. Specifically, for multi-modal stroke images, we design a multi-modal opinion fusion module which uses Dempster-Shafer evidence theory for decision fusion of different modalities. Besides, for the SFDA problem, we use a pseudo label learning method, which obtains pseudo labels from the pre-trained source model to perform the adaptation process. To address the unreliability of pseudo labels caused by domain shift, we propose a pseudo label filtering scheme using shadowed sets theory and a pseudo label refining scheme using evidential uncertainty. These two schemes can automatically extract unreliable parts in pseudo labels and jointly improve the quality of pseudo labels with low computational costs. Experiments on two multi-modal stroke lesion datasets demonstrate the superiority of our method over other state-of-the-art SFDA methods.
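The opinion-fusion module above combines per-modality evidence with Dempster-Shafer theory. A hedged sketch of the reduced combination rule commonly used in evidential deep learning follows; evidence vectors are assumed non-negative (e.g., from a softplus head), and this generic rule is not necessarily the paper's exact formulation:

    import numpy as np

    def opinion_from_evidence(evidence):
        """Map non-negative per-class evidence to (belief vector, uncertainty mass)."""
        k = evidence.size
        s = evidence.sum() + k
        return evidence / s, k / s

    def fuse_opinions(b1, u1, b2, u2):
        """Reduced Dempster-Shafer combination of two subjective opinions."""
        conflict = float(np.sum(np.outer(b1, b2)) - np.sum(b1 * b2))  # mass on disagreeing classes
        scale = 1.0 - conflict
        b = (b1 * b2 + b1 * u2 + b2 * u1) / scale
        u = (u1 * u2) / scale
        return b, u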
Collapse
Affiliation(s)
- Zhicheng Wang
- School of Information Science and Engineering, East China University of Science and Technology, No.130 Meilong Road, Shanghai, 200237 China
| | - Hongqing Zhu
- School of Information Science and Engineering, East China University of Science and Technology, No.130 Meilong Road, Shanghai, 200237 China
| | - Bingcang Huang
- Department of Radiology, Gongli Hospital of Shanghai Pudong New Area, Shanghai, 200135 China
| | - Ziying Wang
- School of Information Science and Engineering, East China University of Science and Technology, No.130 Meilong Road, Shanghai, 200237 China
| | - Weiping Lu
- Department of Radiology, Gongli Hospital of Shanghai Pudong New Area, Shanghai, 200135 China
| | - Ning Chen
- School of Information Science and Engineering, East China University of Science and Technology, No.130 Meilong Road, Shanghai, 200237 China
| | - Ying Wang
- Shanghai Health Commission Key Lab of Artificial Intelligence (AI)-Based Management of Inflammation and Chronic Diseases, Sino-French Cooperative Central Lab, Gongli Hospital of Shanghai Pudong New Area, Shanghai, 200135 China
| |
Collapse
|
35
|
Chen Z, Wu XJ, Xu T, Kittler J. Discriminative Dictionary Pair Learning With Scale-Constrained Structured Representation for Image Classification. IEEE Trans Neural Netw Learn Syst 2023; 34:10225-10239. [PMID: 37015383 DOI: 10.1109/tnnls.2022.3165217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
The dictionary pair learning (DPL) model aims to design a synthesis dictionary and an analysis dictionary to accomplish the goal of rapid sample encoding. In this article, we propose a novel structured representation learning algorithm based on the DPL for image classification. It is referred to as discriminative DPL with scale-constrained structured representation (DPL-SCSR). The proposed DPL-SCSR utilizes the binary label matrix of dictionary atoms to project the representation into the corresponding label space of the training samples. By imposing a non-negative constraint, the learned representation adaptively approximates a block-diagonal structure. This innovative transformation is also capable of controlling the scale of the block-diagonal representation by enforcing the sum of within-class coefficients of each sample to 1, which means that the dictionary atoms of each class compete to represent the samples from the same class. This implies that the requirement of similarity preservation is considered from the perspective of the constraint on the sum of coefficients. More importantly, the DPL-SCSR does not need to design a classifier in the representation space as the label matrix of the dictionary can also be used as an efficient linear classifier. Finally, the DPL-SCSR imposes the l2,p -norm on the analysis dictionary to make the process of feature extraction more interpretable. The DPL-SCSR seamlessly incorporates the scale-constrained structured representation learning, within-class similarity preservation of representation, and the linear classifier into one regularization term, which dramatically reduces the complexity of training and parameter tuning. The experimental results on several popular image classification datasets show that our DPL-SCSR can deliver superior performance compared with the state-of-the-art (SOTA) dictionary learning methods. The MATLAB code of this article is available at https://github.com/chenzhe207/DPL-SCSR.
Collapse
|
36
|
Chen Z, Wu XJ, Xu T, Kittler J. Fast Self-Guided Multi-View Subspace Clustering. IEEE Trans Image Process 2023; 32:6514-6525. [PMID: 37030827 DOI: 10.1109/tip.2023.3261746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Multi-view subspace clustering is an important topic in cluster analysis. Its aim is to utilize the complementary information conveyed by multiple views of objects to be clustered. Recently, view-shared anchor learning based multi-view clustering methods have been developed to speed up the learning of a common data representation. Although widely applied to large-scale scenarios, most of the existing approaches are still faced with two limitations. First, they do not pay sufficient attention to the negative impact caused by certain noisy views with unclear clustering structures. Second, many of them only focus on the multi-view consistency, yet are incapable of capturing the cross-view diversity. As a result, the learned complementary features may be inaccurate and adversely affect clustering performance. To solve these two challenging issues, we propose a Fast Self-guided Multi-view Subspace Clustering (FSMSC) algorithm which skillfully integrates the view-shared anchor learning and global-guided-local self-guidance learning into a unified model. Such an integration is inspired by the observation that the view with clean clustering structures will play a more crucial role in grouping the clusters when the features of all views are concatenated. Specifically, we first learn a locally-consistent data representation shared by all views in the local learning module, then we learn a globally-discriminative data representation from multi-view concatenated features in the global learning module. Afterwards, a feature selection matrix constrained by the l2,1-norm is designed to construct a guidance from global learning to local learning. In this way, the multi-view consistent and diverse information can be simultaneously utilized and the negative impact caused by noisy views can be overcome to some extent. Extensive experiments on different datasets demonstrate the effectiveness of our proposed fast self-guided learning model, and its promising performance compared to both state-of-the-art non-deep and deep multi-view clustering algorithms. The code of this paper is available at https://github.com/chenzhe207/FSMSC.
Collapse
|
37
|
He H, Zhang J, Zhuang B, Cai J, Tao D. End-to-End One-Shot Human Parsing. IEEE Trans Pattern Anal Mach Intell 2023; 45:14481-14496. [PMID: 37535486 DOI: 10.1109/tpami.2023.3301672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/05/2023]
Abstract
Previous human parsing methods are limited to parsing humans into pre-defined classes, which is inflexible for practical fashion applications that often have new fashion item classes. In this paper, we define a novel one-shot human parsing (OSHP) task that requires parsing humans into an open set of classes defined by any test example. During training, only base classes are exposed, which only overlap with part of the test-time classes. To address three main challenges in OSHP, i.e., small sizes, testing bias, and similar parts, we devise an End-to-end One-shot human Parsing Network (EOP-Net). Firstly, an end-to-end human parsing framework is proposed to parse the query image into both coarse-grained and fine-grained human classes, which embeds rich semantic information that is shared across different granularities to identify the small-sized human classes. Then, we gradually smooth the training-time static prototypes to obtain robust class representations. Moreover, we employ a dynamic objective to encourage the network to enhance the features' representational capability in the early training phase while improving the features' transferability in the late training phase. Therefore, our method can quickly adapt to the novel classes and mitigate the testing bias issue. In addition, we add a contrastive loss at the prototype level to enforce inter-class distances, thereby discriminating the similar parts. For comprehensive evaluations on the new task, we tailor three existing popular human parsing benchmarks to the OSHP task. Experiments demonstrate that EOP-Net outperforms representative one-shot segmentation models by large margins and serves as a strong baseline for further research.
Collapse
|
38
|
Tian Y, Zhang H, Liu Y, Wang L. Recovering 3D Human Mesh From Monocular Images: A Survey. IEEE Trans Pattern Anal Mach Intell 2023; 45:15406-15425. [PMID: 37494160 DOI: 10.1109/tpami.2023.3298850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/28/2023]
Abstract
Estimating human pose and shape from monocular images is a long-standing problem in computer vision. Since the release of statistical body models, 3D human mesh recovery has been drawing broader attention. With the same goal of obtaining well-aligned and physically plausible mesh results, two paradigms have been developed to overcome challenges in the 2D-to-3D lifting process: i) an optimization-based paradigm, where different data terms and regularization terms are exploited as optimization objectives; and ii) a regression-based paradigm, where deep learning techniques are embraced to solve the problem in an end-to-end fashion. Meanwhile, continuous efforts are devoted to improving the quality of 3D mesh labels for a wide range of datasets. Though remarkable progress has been achieved in the past decade, the task is still challenging due to flexible body motions, diverse appearances, complex environments, and insufficient in-the-wild annotations. To the best of our knowledge, this is the first survey that focuses on the task of monocular 3D human mesh recovery. We start with the introduction of body models and then elaborate recovery frameworks and training objectives by providing in-depth analyses of their strengths and weaknesses. We also summarize datasets, evaluation metrics, and benchmark results. Open issues and future directions are discussed in the end, hoping to motivate researchers and facilitate their research in this area.
Collapse
|
39
|
Zeng P, Song R, Lin Y, Li H, Chen S, Shi M, Cai G, Gong Z, Huang K, Chen Z. Abnormal maxillary sinus diagnosing on CBCT images via object detection and 'straight-forward' classification deep learning strategy. J Oral Rehabil 2023; 50:1465-1480. [PMID: 37665121 DOI: 10.1111/joor.13585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2023] [Revised: 04/06/2023] [Accepted: 08/18/2023] [Indexed: 09/05/2023]
Abstract
BACKGROUND A pathological maxillary sinus can affect implant treatment and even result in failure of maxillary sinus lift and implant surgery. However, maxillary sinus abnormalities are challenging to diagnose on CBCT images, especially for young dentists or dentists in grassroots medical institutions without systematic training in general medicine. OBJECTIVES To develop a deep-learning-based screening model incorporating object detection and a 'straight-forward' classification strategy to screen out maxillary sinus abnormalities on CBCT images. METHODS The large area of background noise outside the maxillary sinus can affect the generalisation and prediction accuracy of the model, and the diversity and imbalanced distribution of imaging manifestations may bring challenges to intellectualization. Thus, we adopted object detection to limit the model's observation zone and a 'straight-forward' classification strategy with various tuning methods to adapt to dental clinical needs and extract typical features of diverse manifestations, turning the task into a 'normal-or-not' classification. RESULTS We successfully constructed a deep-learning model consisting of well-trained detector and diagnostor modules. This model achieved ideal AUROC and AUPRC values of 0.953 and 0.887, reaching more than 90% accuracy at the optimal cut-off. McNemar and Kappa tests verified no statistical difference and high consistency between the prediction and ground truth. A dentist-model comparison test showed the model's statistically higher diagnostic performance than that of dental students. A visualisation method confirmed the model's effectiveness in region recognition and feature extraction. CONCLUSION The deep-learning model incorporating object detection and a 'straight-forward' classification strategy could achieve satisfactory predictive performance for screening maxillary sinus abnormalities on CBCT images.
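The detector-then-classifier workflow above can be summarized in a few lines. This is a hedged sketch with the detector and classifier passed in as callables; their assumed return formats are noted in the comments and are not taken from the paper:

    def screen_maxillary_sinus(image, detector, classifier, score_threshold=0.5):
        """Two-stage screening: crop detected sinus regions, then classify normal-or-not.

        detector(image) is assumed to return (x1, y1, x2, y2, score) tuples;
        classifier(crop) is assumed to return the probability of abnormality.
        """
        findings = []
        for (x1, y1, x2, y2, score) in detector(image):
            if score < score_threshold:
                continue                      # ignore low-confidence sinus proposals
            crop = image[y1:y2, x1:x2]        # restrict the observation zone to the sinus
            findings.append(((x1, y1, x2, y2), classifier(crop)))
        return findings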
Collapse
Affiliation(s)
- Peisheng Zeng
- Hospital of Stomatology, Guanghua School of Stomatology, Sun Yat-sen University and Guangdong Research Center for Dental and Cranial Rehabilitation and Material Engineering, Guangzhou, China
| | - Rihui Song
- School of Biomedical Engineering, Sun Yat-sen University, Guangzhou, China
| | - Yixiong Lin
- Hospital of Stomatology, Guanghua School of Stomatology, Sun Yat-sen University and Guangdong Research Center for Dental and Cranial Rehabilitation and Material Engineering, Guangzhou, China
| | - Haopeng Li
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Shijie Chen
- Hospital of Stomatology, Guanghua School of Stomatology, Sun Yat-sen University and Guangdong Research Center for Dental and Cranial Rehabilitation and Material Engineering, Guangzhou, China
| | - Mengru Shi
- Hospital of Stomatology, Guanghua School of Stomatology, Sun Yat-sen University and Guangdong Research Center for Dental and Cranial Rehabilitation and Material Engineering, Guangzhou, China
| | - Gengbin Cai
- Hospital of Stomatology, Guanghua School of Stomatology, Sun Yat-sen University and Guangdong Research Center for Dental and Cranial Rehabilitation and Material Engineering, Guangzhou, China
| | - Zhuohong Gong
- Hospital of Stomatology, Guanghua School of Stomatology, Sun Yat-sen University and Guangdong Research Center for Dental and Cranial Rehabilitation and Material Engineering, Guangzhou, China
| | - Kai Huang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Zetao Chen
- Hospital of Stomatology, Guanghua School of Stomatology, Sun Yat-sen University and Guangdong Research Center for Dental and Cranial Rehabilitation and Material Engineering, Guangzhou, China
| |
Collapse
|
40
|
Sun R, Wei L, Hou X, Chen Y, Han B, Xie Y, Nie S. Molecular-subtype guided automatic invasive breast cancer grading using dynamic contrast-enhanced MRI. Comput Methods Programs Biomed 2023; 242:107804. [PMID: 37716219 DOI: 10.1016/j.cmpb.2023.107804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/25/2022] [Revised: 04/05/2023] [Accepted: 09/05/2023] [Indexed: 09/18/2023]
Abstract
BACKGROUND AND OBJECTIVES Histological grade and molecular subtype are valuable references for assigning personalized or precision medicine, as they are significant prognostic indicators of the biological behavior of invasive breast cancer (IBC). The objective was to evaluate a two-stage deep learning framework for IBC grading that incorporates molecular-subtype (MS) information using DCE-MRI. METHODS In Stage I, an innovative neural network called IOS2-DA is developed, which includes a dense atrous-spatial pyramid pooling block with a pooling layer (DA) and inception-octconved blocks with double kernel squeeze-and-excitations (IOS2). This method focuses on the imaging manifestation of IBC grades and performs preliminary prediction using a novel class F1-score loss function. In Stage II, an MS attention branch is introduced to fine-tune the integrated deep vectors from IOS2-DA via Kullback-Leibler divergence. The MS-guided information is weighted with preliminary results to obtain classification values, which are analyzed by ensemble learning for tumor grade prediction on three MRI post-contrast series. Objective assessment is quantitatively evaluated by receiver operating characteristic curve analysis. The DeLong test is applied to measure statistical significance (P < 0.05). RESULTS The molecular-subtype guided IOS2-DA performs significantly better than the single IOS2-DA in terms of accuracy (0.927), precision (0.942), AUC (0.927, 95% CI: [0.908, 0.946]), and F1-score (0.930). The gradient-weighted class activation maps show that the feature representations extracted from IOS2-DA are consistent with tumor areas. CONCLUSIONS IOS2-DA demonstrates its potential in non-invasive tumor grade prediction. With respect to the correlation between MS and histological grade, it exhibits remarkable clinical prospects in the application of relevant clinical biomarkers to enhance the diagnostic effectiveness of IBC grading. Therefore, DCE-MRI appears to be a feasible imaging modality for the thorough preoperative assessment of breast biological behavior and carcinoma prognosis.
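The Kullback-Leibler term used in Stage II is, at its core, a divergence between two predicted distributions. A minimal sketch of that term follows; which distributions are paired and how the term is weighted follow the paper and are not reproduced here:

    import numpy as np

    def softmax(logits):
        z = logits - logits.max()
        e = np.exp(z)
        return e / e.sum()

    def kl_divergence(p_logits, q_logits, eps=1e-8):
        """KL(P || Q) between two softmax distributions derived from logit vectors."""
        p, q = softmax(p_logits), softmax(q_logits)
        return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))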
Collapse
Affiliation(s)
- Rong Sun
- School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 516 Jun-Gong Road, Shanghai 200093, China
| | - Long Wei
- School of Computer Science and Technology, Shandong Jianzhu University, Shandong, China
| | - Xuewen Hou
- School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 516 Jun-Gong Road, Shanghai 200093, China
|
|