1. Qin H, Ma X, Ding Y, Li X, Zhang Y, Ma Z, Wang J, Luo J, Liu X. BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to Real-Network Performance. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:10674-10686. [PMID: 37027695] [DOI: 10.1109/tnnls.2023.3243259]
Abstract
Deep neural networks, such as the deep-FSMN, have been widely studied for keyword spotting (KWS) applications but suffer from expensive computation and storage. Therefore, network compression technologies such as binarization are studied to deploy KWS models at the edge. In this article, we present a strong yet efficient binary neural network for KWS, namely BiFSMNv2, pushing it to real-network accuracy. First, we present a dual-scale thinnable 1-bit architecture (DTA) that recovers the representation capability of the binarized computation units through dual-scale activation binarization and liberates the speedup potential from an overall architecture perspective. Second, we construct a frequency-independent distillation (FID) scheme for KWS binarization-aware training, which distills the high- and low-frequency components independently to mitigate the information mismatch between full-precision and binarized representations. Moreover, we propose the learning propagation binarizer (LPB), a general and efficient binarizer that enables the forward and backward propagation of binary KWS networks to be continuously improved through learning. We implement and deploy BiFSMNv2 on real-world ARMv8 hardware with a novel fast bitwise computation kernel (FBCK), which is proposed to fully use registers and increase instruction throughput. Comprehensive experiments show that our BiFSMNv2 outperforms existing binary networks for KWS by convincing margins across different datasets and achieves accuracy comparable to full-precision networks (only a tiny 1.51% drop on Speech Commands V1-12). We highlight that, benefiting from the compact architecture and optimized hardware kernel, BiFSMNv2 achieves an impressive 25.1× speedup and 20.2× storage saving on edge hardware.
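As context for the speedup and storage numbers above, the generic 1-bit scheme that binary networks build on can be sketched as follows; this is the standard sign-plus-scale binarizer from the binary-network literature, not BiFSMNv2's learned propagation binarizer (LPB):

```python
import numpy as np

def binarize_weights(w):
    """Standard 1-bit binarization: keep only sign(w) plus one
    full-precision per-tensor scale alpha, so matrix multiplies can run
    on bitwise XNOR/popcount kernels instead of floating-point units."""
    alpha = np.mean(np.abs(w))        # per-tensor scaling factor
    return alpha * np.sign(w)         # each weight becomes +/- alpha

w = np.array([0.3, -1.2, 0.7, -0.1])
wb = binarize_weights(w)              # -> [0.575, -0.575, 0.575, -0.575]
```

Storage drops from 32 bits to roughly 1 bit per weight (plus one scale per tensor), which is the source of the order-of-magnitude storage savings such papers report.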
2. Lai Y, Guan W, Luo L, Guo Y, Song H, Meng H. Bayesian Estimation of Inverted Beta Mixture Models With Extended Stochastic Variational Inference for Positive Vector Classification. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:6948-6962. [PMID: 36279334] [DOI: 10.1109/tnnls.2022.3213518]
Abstract
The finite inverted beta mixture model (IBMM) has been proven efficient in modeling positive vectors. Under the traditional variational inference framework, the critical challenge in Bayesian estimation of the IBMM is that the computational cost of performing inference with large datasets is prohibitively expensive, which often limits Bayesian approaches to small datasets. The recently proposed stochastic variational inference (SVI) framework provides an efficient alternative that allows inference on large datasets. Nevertheless, when the SVI framework is applied to non-Gaussian statistical models, the evidence lower bound (ELBO) cannot be explicitly calculated due to intractable moment computations. Therefore, an algorithm under the SVI framework cannot directly optimize the ELBO by stochastic optimization, and an analytically tractable solution cannot be derived. To address this problem, we propose a more flexible extended version of the SVI framework, namely the extended SVI (ESVI) framework, which can be used in many non-Gaussian statistical models. First, approximation strategies are applied to further lower-bound the ELBO and avoid intractable moment calculations. Then, stochastic optimization with noisy natural gradients is used to optimize this lower bound. The excellent performance and effectiveness of the proposed method are verified on real-data evaluations.
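The central trick can be sketched in generic variational-inference notation (the symbols here are illustrative, not taken from the paper): when the expectation term of the ELBO has no closed form, bound the joint density itself and optimize the resulting looser but tractable objective.

```latex
% ELBO for data X and latent variables \Theta; the first expectation is
% intractable for many non-Gaussian likelihoods:
\mathcal{L}(q) = \mathbb{E}_q[\ln p(X,\Theta)] - \mathbb{E}_q[\ln q(\Theta)] \le \ln p(X)
% Choose \tilde{p} with \ln p(X,\Theta) \ge \ln \tilde{p}(X,\Theta) in closed form, so
\tilde{\mathcal{L}}(q) = \mathbb{E}_q[\ln \tilde{p}(X,\Theta)] - \mathbb{E}_q[\ln q(\Theta)] \le \mathcal{L}(q)
% and run stochastic natural-gradient ascent on \tilde{\mathcal{L}} instead of \mathcal{L}.
```

Because the surrogate still lower-bounds the evidence, maximizing it remains a valid, if looser, variational objective.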
3. Du H, Wang J, Liu M, Wang Y, Meijering E. SwinPA-Net: Swin Transformer-Based Multiscale Feature Pyramid Aggregation Network for Medical Image Segmentation. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:5355-5366. [PMID: 36121961] [DOI: 10.1109/tnnls.2022.3204090]
Abstract
The precise segmentation of medical images is one of the key challenges in pathology research and clinical practice. However, many medical image segmentation tasks face large differences between lesion types and strong similarity in shape and color between lesions and surrounding tissues, which seriously limits segmentation accuracy. In this article, a novel method called the Swin Pyramid Aggregation network (SwinPA-Net) is proposed, combining two designed modules with the Swin Transformer to learn more powerful and robust features. The two modules, the dense multiplicative connection (DMC) module and the local pyramid attention (LPA) module, aggregate the multiscale context information of medical images. The DMC module cascades multiscale semantic feature information through dense multiplicative feature fusion, which minimizes the interference of shallow background noise to improve feature expression and addresses the problem of excessive variation in lesion size and type. Moreover, the LPA module guides the network to focus on the region of interest by merging global and local attention, which helps to separate lesions from visually similar surroundings. The proposed network is evaluated on two public benchmark datasets for polyp segmentation and skin lesion segmentation, as well as a private clinical dataset for laparoscopic image segmentation. Compared with existing state-of-the-art (SOTA) methods, SwinPA-Net achieves the best performance, outperforming the second-best method on mean Dice score by 1.68%, 0.8%, and 1.2% on the three tasks, respectively.
4. Guo YR, Bai YQ. Two-dimensional k-subspace clustering and its applications on image recognition. Int J Mach Learn Cyb 2023. [DOI: 10.1007/s13042-023-01790-0]
5. Baghdadi A, Manouchehri N, Patterson Z, Fan W, Bouguila N. Hierarchical Dirichlet and Pitman–Yor process mixtures of shifted-scaled Dirichlet distributions for proportional data modeling. Comput Intell 2022. [DOI: 10.1111/coin.12558]
Affiliation(s)
- Ali Baghdadi: Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada
- Narges Manouchehri: Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada
- Zachary Patterson: Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada
- Wentao Fan: Department of Computer Science and Technology, Huaqiao University, Xiamen, China
- Nizar Bouguila: Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada
6. Ma Z, Lai Y, Xie J, Meng D, Kleijn WB, Guo J, Yu J. Dirichlet Process Mixture of Generalized Inverted Dirichlet Distributions for Positive Vector Data With Extended Variational Inference. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:6089-6102. [PMID: 34086578] [DOI: 10.1109/tnnls.2021.3072209]
Abstract
A Bayesian nonparametric approach is proposed for estimating a Dirichlet process (DP) mixture of generalized inverted Dirichlet distributions, i.e., an infinite generalized inverted Dirichlet mixture model (InGIDMM). The generalized inverted Dirichlet distribution has been proven efficient in modeling vectors that contain only positive elements. Under the classical variational inference (VI) framework, the key challenge in the Bayesian estimation of the InGIDMM is that the expectation of the joint distribution of data and variables cannot be explicitly calculated; therefore, numerical methods are usually applied to simulate the optimal posterior distributions. With the recently proposed extended VI (EVI) framework, we introduce lower-bound approximations to the original variational objective function such that an analytically tractable solution can be derived, removing the need for numerical simulation. By applying the DP mixture technique, the InGIDMM can automatically determine the number of mixture components from the observed data. Moreover, the DP mixture model with an infinite number of mixture components also avoids the problems of underfitting and overfitting. The performance of the proposed approach is demonstrated with both synthesized data and real-life applications.
7. Liu M, Zhang C, Bai H, Zhang R, Zhao Y. Cross-Part Learning for Fine-Grained Image Classification. IEEE Transactions on Image Processing 2021; 31:748-758. [PMID: 34928798] [DOI: 10.1109/tip.2021.3135477]
Abstract
Recent techniques have achieved remarkable improvements in fine-grained visual classification (FGVC) by mining subtle yet distinctive features. While prior works directly combine discriminative features extracted from different parts, we argue that the potential interactions between parts and their contributions to category prediction should be taken into consideration, so that significant parts contribute more to the decision on the subcategory. To this end, we present a Cross-Part Convolutional Neural Network (CP-CNN), trained in a weakly supervised manner, to explore cross-learning among multi-regional features. Specifically, a context transformer is implemented to encourage joint feature learning across different parts under the guidance of a navigator. The part with the highest confidence is regarded as the navigator and delivers distinguishing characteristics to the parts with lower confidence while their complementary information is retained. To locate discriminative but subtle parts precisely, a part proposal generator (PPG) is designed with feature enhancement blocks, through which complex scale variations caused by viewpoint diversity can be effectively alleviated. Extensive experiments on three benchmark datasets demonstrate that our proposed method consistently outperforms existing state-of-the-art methods.
8. Application of Dirichlet Process and Support Vector Machine Techniques for Mapping Alteration Zones Associated with Porphyry Copper Deposit Using ASTER Remote Sensing Imagery. Minerals 2021. [DOI: 10.3390/min11111235]
Abstract
The application of machine learning (ML) algorithms to remote sensing data is of growing importance, particularly for mapping hydrothermal alteration zones associated with porphyry copper deposits. The unsupervised Dirichlet Process (DP) and the supervised Support Vector Machine (SVM) techniques can both be applied to this mapping task. The main objective of this investigation is to develop a workflow that accurately produces the best training data as input for supervised methods such as SVM. For this purpose, the Zefreh porphyry copper deposit, located in the Urumieh-Dokhtar Magmatic Arc (UDMA) of central Iran, was selected as the training area. Initially, using ASTER data, the different alteration zones of the Zefreh deposit were detected by Band Ratio, Relative Band Depth (RBD), Linear Spectral Unmixing (LSU), Spectral Feature Fitting (SFF), and Orthogonal Subspace Projection (OSP) techniques. Then, using the DP method, the exact extent of each alteration was determined. Finally, the detected alterations were used as training data to identify similar alteration zones in the full ASTER scene using the SVM and Spectral Angle Mapper (SAM) methods. Several high-potential zones were identified in the study area, and field surveys and laboratory analysis were used to validate the image-processing results. This investigation demonstrates that the SVM algorithm for mapping hydrothermal alteration zones associated with porphyry copper deposits is broadly applicable to ASTER data and can be used for prospectivity mapping in many metallogenic provinces around the world.
9. Lai Y, Guan W, Luo L, Ruan Q, Ping Y, Song H, Meng H, Pan Y. Extended variational inference for Dirichlet process mixture of Beta-Liouville distributions for proportional data modeling. Int J Intell Syst 2021. [DOI: 10.1002/int.22721]
Affiliation(s)
- Yuping Lai: School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
- Wenbo Guan: School of Information Science and Technology, North China University of Technology, Beijing, China
- Lijuan Luo: School of Business and Management, Shanghai International Studies University, Shanghai, China
- Qiang Ruan: DigApis Information Security Technology Co. Ltd., Nantong, Jiangsu, China
- Yuan Ping: School of Information Engineering, Xuchang University, Xuchang, China
- Heping Song: School of Computer Science and Communications Engineering, Jiangsu University, Zhenjiang, China
- Hongying Meng: Electronic and Electrical Engineering Department, Brunel University, London, UK
- Yu Pan: School of Business and Management, Shanghai International Studies University, Shanghai, China
10. Li X, Chang D, Ma Z, Tan ZH, Xue JH, Cao J, Guo J. Deep InterBoost networks for small-sample image classification. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.06.135]
13. Xu Y, Ye T, Wang X, Lai Y, Qiu J, Zhang L, Zhang X. GMM with parameters initialization based on SVD for network threat detection. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-200066]
Abstract
In the field of security, data labels are often unknown or too expensive to obtain, so clustering methods are used to detect threat behavior contained in big data. The most widely used probabilistic clustering model is the Gaussian Mixture Model (GMM), which is flexible and makes it possible to incorporate prior knowledge for modeling the uncertainty of the data. Therefore, in this paper, we use a GMM to build the threat behavior detection model. Commonly, Expectation Maximization (EM) and Variational Inference (VI) are used to estimate the optimal parameters of a GMM; however, both EM and VI are quite sensitive to the initial values of the parameters. We therefore propose to use Singular Value Decomposition (SVD) to initialize the parameters. First, SVD factorizes the data matrix into its singular values and singular vectors. Then we calculate the number of GMM components from the first two singular values and the dimension of the data. Next, the other parameters of the GMM, namely the mixing coefficients, the means, and the covariances, are calculated based on the number of components. Finally, these initialization values are fed into EM and VI to estimate the optimal parameters of the GMM. The experimental results indicate that our proposed method performs well for initializing the parameters of GMM clustering when EM and VI are used for parameter estimation.
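As an illustration of this pipeline, here is a minimal numpy sketch; the rule mapping the first two singular values to the number of components K, and the split along the leading singular direction, are stand-in heuristics rather than the paper's exact formulas:

```python
import numpy as np

def svd_init_gmm_params(X, k_max=10):
    """Illustrative SVD-based initialization of GMM parameters.
    The number of components K is derived from the first two singular
    values (heuristic stand-in), then mixing coefficients, means, and
    covariances are computed from groups of samples."""
    n, d = X.shape
    U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    # Heuristic: ratio of the first two singular values, clipped to [2, k_max].
    K = int(np.clip(round(s[0] / s[1]), 2, k_max))
    # Split samples along the leading singular direction into K groups.
    order = np.argsort(U[:, 0])
    groups = np.array_split(order, K)
    pi = np.array([len(g) / n for g in groups])            # mixing coefficients
    mu = np.array([X[g].mean(axis=0) for g in groups])     # component means
    cov = np.array([np.cov(X[g].T) + 1e-6 * np.eye(d) for g in groups])
    return pi, mu, cov

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
pi, mu, cov = svd_init_gmm_params(X)   # feed these into EM or VI as starting values
```

The returned triple would then be passed to an EM or VI routine as its starting point instead of a random initialization.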
Affiliation(s)
- Yanping Xu: School of Cyberspace Security, Hangzhou Dianzi University, Xiasha Higher Education Zone, Hangzhou, Zhejiang Province, China
- Tingcong Ye: School of Cyberspace Security, Hangzhou Dianzi University, Xiasha Higher Education Zone, Hangzhou, Zhejiang Province, China
- Xin Wang: School of Business and Management, Shanghai International Studies University, Shanghai, China
- Yuping Lai: School of Information Science and Technology, North China University of Technology, Shijingshan District, Beijing, China
- Jian Qiu: Center for Undergraduate Education, Westlake University, Xihu District, Hangzhou, China
- Lingjun Zhang: School of Computer Science and Technology, Hangzhou Dianzi University, Xiasha Higher Education Zone, Hangzhou, Zhejiang, China
- Xia Zhang: School of Cyberspace Security, Hangzhou Dianzi University, Xiasha Higher Education Zone, Hangzhou, Zhejiang Province, China
15. Qiao Y, Wu Y, Duo F, Lin W, Yang J. Siamese Neural Networks for User Identity Linkage Through Web Browsing. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:2741-2751. [PMID: 31425058] [DOI: 10.1109/tnnls.2019.2929575]
Abstract
Linking the online identities of users among countless heterogeneous network services on the Internet can provide an explicit digital representation of users, which can benefit both research and industry. In recent years, user identity linkage (UIL) through the Internet has become an emerging task with great potential and many challenges. Existing works mainly focus on online social networks, considering inconsistent profiles, content, and networks as features, or use sparse location-based data sets to link the online behaviors of a real person. To extend the UIL problem to a general scenario, we link the web-browsing behaviors of users, which can help to distinguish specific users from others, such as children or malicious users. More specifically, we propose a Siamese neural network (NN) architecture-based UIL (SAUIL) model that learns and compares the highest-level feature representations of input web-browsing behaviors with deep NNs. Although the numbers of matching and nonmatching pairs in the UIL problem are highly imbalanced, previous studies have not considered imbalanced UIL data sets. Therefore, we further address the imbalanced learning issue by proposing a cost-sensitive SAUIL (C-SAUIL) model, which assumes higher costs for misclassifying the minority class. In the experiments, the proposed models are robust and exhibit good performance on very large, real-world data sets collected from different regions with distinct characteristics.
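The cost-sensitive idea can be sketched with a plain contrastive loss over Siamese embedding distances, where matching pairs (the minority class) carry a higher misclassification cost. The loss form and the cost values here are illustrative assumptions, not C-SAUIL's exact objective:

```python
import numpy as np

def cost_sensitive_contrastive_loss(d, y, margin=1.0, c_match=5.0, c_nonmatch=1.0):
    """Contrastive loss over pairwise embedding distances d with pair
    labels y (1 = same user, the minority class). c_match > c_nonmatch
    penalizes errors on the rare matching pairs more heavily."""
    match_term = c_match * y * d ** 2
    nonmatch_term = c_nonmatch * (1 - y) * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(match_term + nonmatch_term))

d = np.array([0.2, 0.9, 1.5, 0.1])    # distances between paired embeddings
y = np.array([1, 0, 0, 1])            # 1 = matching pair
loss_weighted = cost_sensitive_contrastive_loss(d, y)
loss_flat = cost_sensitive_contrastive_loss(d, y, c_match=1.0)
```

With equal costs the loss reduces to the ordinary contrastive loss, so the class imbalance would pull the model toward the majority nonmatching class.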
16. Shu T, Zhang B, Tang YY. Sparse Supervised Representation-Based Classifier for Uncontrolled and Imbalanced Classification. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:2847-2856. [PMID: 30582555] [DOI: 10.1109/tnnls.2018.2884444]
Abstract
Sparse representation-based classification (SRC) has been utilized in many applications and is an effective algorithm in machine learning. However, the performance of SRC highly depends on the data distribution: existing works have shown that SRC cannot obtain satisfactory results on uncontrolled data sets, and it cannot deal with imbalanced classification either. In this paper, we propose a model named the sparse supervised representation classifier (SSRC) to solve both issues. The SSRC involves the class label information during the test-sample representation phase to deal with uncontrolled data sets: each class has the opportunity to linearly represent the test sample in its own subspace, which decreases the influence of the uncontrolled data distribution. To classify imbalanced data sets, a class weight learning model is proposed and added to SSRC, in which each class weight is learned from its corresponding training samples. Experimental results on the AR face database (uncontrolled) and 15 KEEL data sets (imbalanced, with imbalance ratios ranging from 1.48 to 61.18) show that SSRC can effectively classify uncontrolled and imbalanced data sets.
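The class-wise representation idea can be sketched as follows: let each class linearly represent the test sample in its own subspace and pick the class with the smallest reconstruction residual. This minimal version uses a ridge-regularized least squares solve instead of a sparse solver and omits the learned class weights, so treat it as a sketch of the principle rather than SSRC itself:

```python
import numpy as np

def class_subspace_classify(X_train, y_train, x, reg=1e-3):
    """Represent x in each class's own subspace via ridge-regularized
    least squares; return the class with the smallest residual."""
    best_class, best_residual = None, np.inf
    for c in np.unique(y_train):
        A = X_train[y_train == c].T                      # columns: class-c samples
        coef = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ x)
        residual = np.linalg.norm(x - A @ coef)
        if residual < best_residual:
            best_class, best_residual = c, residual
    return best_class

# Class 0 spans the z = 0 plane; class 1 spans vectors with equal x and y.
X = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
y = np.array([0, 0, 1, 1])
pred = class_subspace_classify(X, y, np.array([1.0, 0.5, 0.0]))   # -> 0
```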
17. Yang J, Wu X, Liang J, Sun X, Cheng MM, Rosin PL, Wang L. Self-Paced Balance Learning for Clinical Skin Disease Recognition. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:2832-2846. [PMID: 31199274] [DOI: 10.1109/tnnls.2019.2917524]
Abstract
Class imbalance is a challenging problem in many classification tasks. It induces biased classification results for minority classes that contain fewer training samples than others. Most existing approaches aim to remedy the imbalanced number of instances among categories by resampling the majority and minority classes accordingly. However, the imbalanced level of difficulty of recognizing different categories is also crucial, especially when distinguishing among many classes. For example, in the task of clinical skin disease recognition, several rare diseases have a small number of training samples but are easy to diagnose because of their distinct visual properties. On the other hand, some common skin diseases, e.g., eczema, are hard to recognize due to the lack of special symptoms. To address this problem, we propose a self-paced balance learning (SPBL) algorithm in this paper. Specifically, we introduce a comprehensive metric termed the complexity of image category, which combines both sample number and recognition difficulty. First, the complexity is initialized using the model of the first pace, where a pace is one iteration in the self-paced learning paradigm. We then assign each class a penalty weight that is larger for more complex categories and smaller for easier ones, after which the curriculum is reconstructed by rearranging the training samples. Consequently, the model can iteratively learn discriminative representations by balancing the complexity in each pace. Experimental results on the SD-198 and SD-260 benchmark data sets demonstrate that the proposed SPBL algorithm performs favorably against state-of-the-art methods. We also demonstrate the SPBL algorithm's generalization capacity on various tasks, such as indoor scene image recognition and object classification.
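A score of this shape, combining sample scarcity with recognition difficulty, can be sketched as follows; the mixing rule, the alpha parameter, and the normalization are assumptions for illustration, not the paper's definition of category complexity:

```python
import numpy as np

def class_penalty_weights(n_samples, val_accuracy, alpha=0.5):
    """Combine sample scarcity (fewer samples -> harder) with recognition
    difficulty (lower per-class validation accuracy -> harder) into one
    complexity score, then normalize into per-class penalty weights."""
    scarcity = 1.0 - n_samples / n_samples.max()
    difficulty = 1.0 - val_accuracy
    complexity = alpha * scarcity + (1.0 - alpha) * difficulty
    return complexity / complexity.sum()

n = np.array([500.0, 50.0, 480.0])     # class 0: common but hard (eczema-like)
acc = np.array([0.60, 0.95, 0.90])     # class 1: rare but visually distinct
wts = class_penalty_weights(n, acc)
```

Note how the common-but-hard class 0 ends up weighted above the common-and-easy class 2, which is the behavior a pure resampling scheme would miss.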
18. He X, Tang J, Du X, Hong R, Ren T, Chua TS. Fast Matrix Factorization With Nonuniform Weights on Missing Data. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:2791-2804. [PMID: 30676983] [DOI: 10.1109/tnnls.2018.2890117]
Abstract
Matrix factorization (MF) has been widely used to discover the low-rank structure and to predict the missing entries of a data matrix. In many real-world learning systems, the data matrix can be very high-dimensional but sparse. This poses an imbalanced learning problem, since the scale of missing entries is usually much larger than that of observed entries, yet they cannot be ignored due to their valuable negative signal. For efficiency, existing work typically applies a uniform weight on missing entries to allow a fast learning algorithm. However, this simplification decreases modeling fidelity, resulting in suboptimal performance for downstream applications. In this paper, we weight the missing data nonuniformly and, more generically, allow any weighting strategy on the missing data. To address the efficiency challenge, we propose a fast learning method whose time complexity is determined by the number of observed entries in the data matrix rather than the matrix size. The key idea is twofold: 1) we apply truncated singular value decomposition on the weight matrix to get a more compact representation of the weights, and 2) we learn the MF parameters with elementwise alternating least squares (eALS) and memorize the key intermediate variables to avoid repeating unnecessary computations. We conduct extensive experiments on two recommendation benchmarks, demonstrating the correctness, efficiency, and effectiveness of our fast eALS method.
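The objective being accelerated, a weighted squared error over every entry of the matrix, can be sketched with a naive alternating least squares solver. The SVD compression of the weights and the eALS caching tricks that make it scale are deliberately omitted, so this version costs O(matrix size) rather than O(observed entries):

```python
import numpy as np

def weighted_als(R, W, k=2, n_iters=20, reg=0.01, seed=0):
    """Naive ALS for weighted MF: minimize sum_{u,i} W[u,i] * (R[u,i] -
    P[u] @ Q[i])**2 + ridge terms, where every entry, observed or
    missing, carries its own weight."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(scale=0.1, size=(m, k))
    Q = rng.normal(scale=0.1, size=(n, k))
    for _ in range(n_iters):
        for u in range(m):                      # user factors, row by row
            Wu = np.diag(W[u])
            P[u] = np.linalg.solve(Q.T @ Wu @ Q + reg * np.eye(k), Q.T @ Wu @ R[u])
        for i in range(n):                      # item factors, column by column
            Wi = np.diag(W[:, i])
            Q[i] = np.linalg.solve(P.T @ Wi @ P + reg * np.eye(k), P.T @ Wi @ R[:, i])
    return P, Q

rng = np.random.default_rng(1)
R = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))   # exactly rank 2
W = np.where(R > 0, 1.0, 0.1)                           # nonuniform weights
P, Q = weighted_als(R, W)
```

Each row update is an independent weighted ridge regression, which is why the per-entry weights force the naive solver to touch every cell; the paper's contribution is avoiding exactly that cost.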
19. Expert Refined Topic Models to Edit Topic Clusters in Image Analysis Applied to Welding Engineering. Informatics 2020. [DOI: 10.3390/informatics7030021]
Abstract
This paper proposes a new method to generate edited topics or clusters to analyze images for prioritizing quality issues. The approach is associated with a new way for subject matter experts to edit the cluster definitions by "zapping" or "boosting" pixels. We refer to the information entered by users or experts as "high-level" data, and we are apparently the first to allow in our model for the possibility of errors coming from the experts. A collapsed Gibbs sampler is proposed that permits efficient processing for datasets involving tens of thousands of records. Numerical examples illustrate the benefits of the high-level data for improving accuracy as measured by Kullback–Leibler (KL) distance. The numerical examples include a tungsten inert gas example from the literature. In addition, a novel laser aluminum alloy image application illustrates the assignment of welds to groups that correspond to part conformance standards.
20. Chang D, Ding Y, Xie J, Bhunia AK, Li X, Ma Z, Wu M, Guo J, Song YZ. The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification. IEEE Transactions on Image Processing 2020; 29:4683-4695. [PMID: 32092002] [DOI: 10.1109/tip.2020.2973812]
Abstract
The key to solving fine-grained image categorization is finding discriminative local regions that correspond to subtle visual traits. Great strides have been made, with complex networks designed specifically to learn part-level discriminative feature representations. In this paper, we show that it is possible to cultivate subtle details without overly complicated network designs or training mechanisms: a single loss is all it takes. The main trick lies in how we delve into individual feature channels early on, as opposed to the convention of starting from a consolidated feature map. The proposed loss function, termed the mutual-channel loss (MC-Loss), consists of two channel-specific components: a discriminality component and a diversity component. The discriminality component forces all feature channels belonging to the same class to be discriminative, through a novel channel-wise attention mechanism. The diversity component additionally constrains channels so that they become mutually exclusive across the spatial dimension. The end result is a set of feature channels, each of which reflects different locally discriminative regions for a specific class. The MC-Loss can be trained end-to-end, without the need for any bounding-box or part annotations, and yields highly discriminative regions during inference. Experimental results show that our MC-Loss, when implemented on top of common base networks, can achieve state-of-the-art performance on all four fine-grained categorization datasets (CUB-Birds, FGVC-Aircraft, Flowers-102, and Stanford Cars). Ablative studies further demonstrate the superiority of the MC-Loss when compared with other recently proposed general-purpose losses for visual classification, on two different base networks.
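To make the diversity component concrete, here is a small numpy sketch of the "spatial softmax, then cross-channel max" reduction it is built on; the shapes, the number of channels per class (xi), and the reduction order are our reading of the idea, so treat this as an approximation rather than the paper's exact loss term:

```python
import numpy as np

def diversity_score(F, n_classes, xi):
    """For each class's xi channels: softmax over spatial positions,
    take the channel-wise max at each position, sum over space, and
    average over classes. A higher value means the class's channels
    attend to more distinct spatial locations."""
    F = F.reshape(n_classes, xi, -1)                   # (class, channel, space)
    e = np.exp(F - F.max(axis=2, keepdims=True))
    soft = e / e.sum(axis=2, keepdims=True)            # spatial softmax per channel
    return float(soft.max(axis=1).sum(axis=1).mean())  # lies in [1, xi]

rng = np.random.default_rng(0)
F = rng.normal(size=(4 * 3, 49))                       # 4 classes, xi=3, 7x7 map
score = diversity_score(F, n_classes=4, xi=3)
```

The score equals 1 when all of a class's channels focus on the same position and approaches xi when they attend to disjoint positions, so maximizing it pushes channels apart spatially.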
21. Fan J, Zhang Q, Zhu J, Zhang M, Yang Z, Cao H. Robust deep auto-encoding Gaussian process regression for unsupervised anomaly detection. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.09.078]
22. Shen H, Li H. A gradient approximation algorithm based weight momentum for restricted Boltzmann machine. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.07.074]