1
Pourpanah F, Abdar M, Luo Y, Zhou X, Wang R, Lim CP, Wang XZ, Wu QMJ. A Review of Generalized Zero-Shot Learning Methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:4051-4070. [PMID: 35849673; DOI: 10.1109/tpami.2022.3191696] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 06/15/2023]
Abstract
Generalized zero-shot learning (GZSL) aims to train a model for classifying data samples under the condition that some output classes are unknown during supervised learning. To address this challenging task, GZSL leverages semantic information about the seen (source) and unseen (target) classes to bridge the gap between them. Since its introduction, many GZSL models have been formulated. In this review paper, we present a comprehensive review of GZSL. First, we provide an overview of GZSL, including its problems and challenges. Then, we introduce a hierarchical categorization of GZSL methods and discuss the representative methods in each category. In addition, we discuss the available benchmark data sets and applications of GZSL, along with the research gaps and directions for future investigations.
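In GZSL, the label search space at test time is the union of seen and unseen classes. The class semantic vectors are what connect the two groups. The following is a minimal, hypothetical embedding-based classifier, not a method taken from the review. It assumes that a linear visual-to-semantic projection `W` has already been learned on seen classes, and its function names and toy class vectors are illustrative. The optional `gamma` term applies calibrated stacking, one common way to counter bias toward seen classes: it subtracts a constant from every seen-class score.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def project(x: Sequence[float], W: Sequence[Sequence[float]]) -> List[float]:
    """Linear map from visual space to semantic space (one row of W per semantic dimension)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gzsl_predict(
    x: Sequence[float],
    W: Sequence[Sequence[float]],
    class_semantics: Dict[str, List[float]],
    seen: Set[str],
    gamma: float = 0.0,
) -> str:
    """Pick the best class among seen and unseen classes (the GZSL search space).

    gamma > 0 applies calibrated stacking: seen-class scores are reduced by gamma
    so that unseen classes are not systematically out-scored.
    """
    z = project(x, W)

    def score(c: str) -> float:
        return cosine(z, class_semantics[c]) - (gamma if c in seen else 0.0)

    return max(class_semantics, key=score)


# Toy example: "horse" is a seen class, "zebra" is unseen.
W_identity = [[1.0, 0.0], [0.0, 1.0]]
semantics = {"horse": [1.0, 0.0], "zebra": [0.7, 0.7]}
print(gzsl_predict([0.9, 0.3], W_identity, semantics, seen={"horse"}))             # horse
print(gzsl_predict([0.9, 0.3], W_identity, semantics, seen={"horse"}, gamma=0.1))  # zebra
```

With `gamma=0` the seen class wins by a small margin. A modest calibration term is enough to flip the decision to the unseen class.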
2
Jin Y, Lu H, Li Z, Wang Y. A cross-modal deep metric learning model for disease diagnosis based on chest x-ray images. Multimedia Tools and Applications 2023; 82:1-22. [PMID: 37362731; PMCID: PMC10015533; DOI: 10.1007/s11042-023-14790-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2022] [Revised: 08/12/2022] [Accepted: 02/05/2023] [Indexed: 06/28/2023]
Abstract
Unknown diseases often emerge with few or no samples available. Zero-shot learning and few-shot learning have promising applications in medical image analysis. In this paper, we propose a Cross-Modal Deep Metric Learning Generalized Zero-Shot Learning (CM-DML-GZSL) model. The proposed network consists of a visual feature extractor, a fixed semantic feature extractor, and a deep regression module; it is a two-stream network for multiple modalities. In a multi-label setting, each sample has, on average, a small number of positive labels and a large number of negative labels. This positive-negative imbalance dominates the optimization procedure. It may prevent an effective correspondence between visual features and semantic vectors from being established during training, resulting in low accuracy. To address this, a novel weighted focused Euclidean distance metric loss is introduced. This loss not only dynamically increases the weight of hard samples and decreases the weight of easy samples, but also strengthens the connection between samples and the semantic vectors of their positive labels. This helps mitigate bias in predicting unseen classes in the generalized zero-shot learning setting. Experimental results on large publicly available datasets demonstrate that, by dynamically adjusting sample weights, the weighted focused Euclidean distance metric loss enables zero-shot multi-label learning for chest X-ray diagnosis.
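The abstract does not give the loss formula, so what follows is only a hypothetical sketch of the behaviour it describes. It is a Euclidean metric loss over label semantic vectors in which hard samples receive larger weights through a focal-loss-style modulating factor. Positives and negatives are averaged separately so that the many negative labels cannot dominate. The closeness score `p = exp(-d)`, the hinge `margin` and the exponent `gamma` are all assumptions, not the CM-DML-GZSL definitions.

```python
import math
from typing import List, Sequence


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def focal_weighted_distance_loss(
    z: Sequence[float],
    label_vecs: List[Sequence[float]],
    targets: List[int],
    gamma: float = 2.0,
    margin: float = 1.0,
) -> float:
    """Hypothetical focal-style Euclidean metric loss for one multi-label sample.

    z          -- visual embedding already regressed into the semantic space
    label_vecs -- one semantic vector per label
    targets    -- 1 for a positive label, 0 for a negative label
    """
    pos_terms: List[float] = []
    neg_terms: List[float] = []
    for s, y in zip(label_vecs, targets):
        d = sq_dist(z, s)
        p = math.exp(-d)  # closeness in (0, 1]; equals 1 when z coincides with s
        if y == 1:
            # Far-away positive = hard sample -> weight near 1; near positive -> weight near 0.
            pos_terms.append((1.0 - p) ** gamma * d)
        else:
            # Negatives are pushed beyond the margin; close negatives are hard -> larger weight.
            neg_terms.append(p ** gamma * max(0.0, margin - d))
    # Averaging each group separately counters the positive-negative imbalance.
    pos = sum(pos_terms) / len(pos_terms) if pos_terms else 0.0
    neg = sum(neg_terms) / len(neg_terms) if neg_terms else 0.0
    return pos + neg


# A sample lying exactly on its positive label and far from its negative label costs nothing.
print(focal_weighted_distance_loss([0.0, 0.0], [[0.0, 0.0], [3.0, 0.0]], [1, 0]))  # 0.0
```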
Affiliation(s)
- Yufei Jin
- China JiLiang University, Hangzhou, 310018 Zhejiang China
- Key Laboratory of Electromagnetic Wave Information Technology and Metrology of Zhejiang Province, Hangzhou, 310018 Zhejiang China
- Huijuan Lu
- China JiLiang University, Hangzhou, 310018 Zhejiang China
- Key Laboratory of Electromagnetic Wave Information Technology and Metrology of Zhejiang Province, Hangzhou, 310018 Zhejiang China
- Zhao Li
- Zhejiang University, Hangzhou, 310018 Zhejiang China
- Yanbin Wang
- College of Computer Science and Technology, Zhejiang University, Hangzhou, 310018 Zhejiang China
3
Wei K, Deng C, Yang X, Tao D. Incremental Zero-Shot Learning. IEEE Transactions on Cybernetics 2022; 52:13788-13799. [PMID: 34591777; DOI: 10.1109/tcyb.2021.3110369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
The goal of zero-shot learning (ZSL) is to correctly recognize objects from unseen classes without corresponding training samples. Existing ZSL methods are trained on a set of predefined classes and cannot learn from a stream of training data. However, in many real-world applications, training data are collected incrementally; this is one of the main reasons why ZSL methods cannot be applied to certain real-world situations. Accordingly, to handle such practical learning tasks, we introduce a novel ZSL setting, referred to as incremental ZSL (IZSL). Its goal is to accumulate historical knowledge and alleviate catastrophic forgetting, enabling better recognition when the model is incrementally trained on new classes. We further propose a novel method to realize IZSL, which employs a generative replay strategy to produce virtual samples of previously seen classes. Historical knowledge is then transferred from the former learning step to the current step through joint training on both real new data and virtual old data. Subsequently, a knowledge distillation strategy is leveraged to distill knowledge from the former model into the current model, which regularizes the training of the current model. In addition, our method can be flexibly combined with most generative ZSL methods to tackle IZSL. Extensive experiments on three challenging benchmarks indicate that the proposed method effectively tackles the IZSL problem, whereas existing ZSL methods fail.
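The two mechanisms named in the abstract, generative replay and knowledge distillation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- `generator` stands for any conditional feature generator. `toy_generator` is a hypothetical Gaussian centred on the class semantic vector.
- The distillation term is the standard temperature-scaled KL divergence between the old and new models' softmax outputs. It is restricted to the classes the old model knew, which are assumed to occupy the first output positions.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Feature = List[float]


def softmax(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature T."""
    scaled = [l / T for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(old_logits: Sequence[float], new_logits: Sequence[float], T: float = 2.0) -> float:
    """KL(p_old || p_new) at temperature T, scaled by T**2.

    The new model has more output classes than the old one, so only the
    logits of the old classes (assumed to come first) are compared.
    """
    p = softmax(old_logits, T)
    q = softmax(list(new_logits)[: len(old_logits)], T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * T * T


def joint_batch(
    real_new: List[Tuple[Feature, str]],
    generator: Callable[[Feature, random.Random], Feature],
    old_class_semantics: Dict[str, Feature],
    n_per_old_class: int,
    rng: random.Random,
) -> List[Tuple[Feature, str]]:
    """Generative replay: mix real samples of new classes with virtual samples of old classes."""
    virtual = [
        (generator(sem, rng), label)
        for label, sem in old_class_semantics.items()
        for _ in range(n_per_old_class)
    ]
    batch = list(real_new) + virtual
    rng.shuffle(batch)
    return batch


def toy_generator(sem: Feature, rng: random.Random) -> Feature:
    """Hypothetical stand-in for a trained conditional generator."""
    return [s + rng.gauss(0.0, 0.1) for s in sem]


rng = random.Random(0)
batch = joint_batch([([0.2, 0.9], "new_cls")], toy_generator,
                    {"old_a": [1.0, 0.0], "old_b": [0.0, 1.0]}, 3, rng)
print(len(batch))                                       # 7
print(distillation_loss([2.0, 0.5], [2.0, 0.5, -1.0]))  # 0.0
```

The distillation term is zero when the new model reproduces the old model's predictions on the old classes. It grows as the new model drifts, which is how it regularizes incremental training.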
4
Ji Z, Yu X, Yu Y, Pang Y, Zhang Z. Semantic-Guided Class-Imbalance Learning Model for Zero-Shot Image Classification. IEEE Transactions on Cybernetics 2022; 52:6543-6554. [PMID: 34043516; DOI: 10.1109/tcyb.2020.3004641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
In this article, we focus on the task of zero-shot image classification (ZSIC), which equips a learning system with the ability to recognize images from unseen classes. Compared with traditional image classification, ZSIC is more susceptible to class imbalance because it relies more heavily on transferring class-level knowledge. In the real world, the numbers of samples in different categories generally follow a long-tailed distribution. Under traditional batch-based training, the discriminative information in sample-scarce seen classes is therefore hard to transfer to related unseen classes, which substantially degrades overall generalization. To alleviate class imbalance in ZSIC, we propose a sample-balanced training process that encourages all training classes to contribute equally to the learned model. Specifically, we randomly select the same number of images from each training class to form a training batch, ensuring that sample-scarce classes contribute as much as classes with sufficient samples in each iteration. Instances from the same class differ in how representative they are of that class. We therefore develop an efficient semantic-guided feature fusion model that assigns different weights to the selected samples according to their class representativeness. This yields a discriminative class visual prototype for the subsequent visual-semantic interaction. Extensive experiments on three imbalanced ZSIC benchmark datasets, for both traditional and generalized ZSIC, demonstrate that our approach achieves promising results, especially for unseen categories closely related to sample-scarce seen categories. In addition, experimental results on two class-balanced datasets show that the proposed approach also improves classification performance over the baseline model.
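The sample-balanced batch construction described above can be sketched as follows:

- `balanced_batch` draws the same number of indices from every training class. For classes with fewer images than the per-class quota, it draws with replacement. This is an assumption, since the abstract does not say how such classes are handled.
- `weighted_prototype` is a hypothetical stand-in for the semantic-guided feature fusion step. It averages a class's selected features using softmax weights over per-sample representativeness scores. How those scores are computed is deliberately left abstract.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def balanced_batch(labels: Sequence[int], per_class: int, rng: random.Random) -> List[int]:
    """Return sample indices with exactly `per_class` entries from each class."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    batch: List[int] = []
    for y in sorted(by_class):
        idxs = by_class[y]
        if len(idxs) >= per_class:
            batch.extend(rng.sample(idxs, per_class))  # without replacement
        else:
            # Sample-scarce class: resample so it still contributes `per_class` items (assumption).
            batch.extend(rng.choices(idxs, k=per_class))
    rng.shuffle(batch)
    return batch


def weighted_prototype(features: List[List[float]], scores: List[float]) -> List[float]:
    """Fuse one class's selected features, weighting more representative samples more heavily."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]  # softmax weights over representativeness scores
    total = sum(w)
    dim = len(features[0])
    return [sum(wi * f[d] for wi, f in zip(w, features)) / total for d in range(dim)]


labels = [0] * 10 + [1] * 2 + [2] * 5  # long-tailed toy label list
batch = balanced_batch(labels, per_class=3, rng=random.Random(0))
print(sorted(Counter(labels[i] for i in batch).items()))  # [(0, 3), (1, 3), (2, 3)]
```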
5
Hu Y, Chapman A, Wen G, Hall DW. What Can Knowledge Bring to Machine Learning?—A Survey of Low-shot Learning for Structured Data. ACM Trans Intell Syst Technol 2022. [DOI: 10.1145/3510030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Supervised machine learning has several drawbacks that make it difficult to use in many situations, including heavy reliance on massive training data, limited generalizability, and poor expressiveness of high-level semantics. Low-shot learning attempts to address these drawbacks. It allows a model to achieve good predictive power with very little or no training data, and structured knowledge plays a key role in it as a high-level semantic representation of human knowledge. This article reviews the fundamental factors of low-shot learning technologies, focusing on how structured knowledge operates under different low-shot conditions. We also introduce other techniques relevant to low-shot learning. Finally, we point out the limitations of low-shot learning, the prospects and gaps in industrial applications, and future research directions.
Affiliation(s)
- Yang Hu
- University of Southampton, United Kingdom and South China University of Technology, Guangzhou, Guangdong, China
- Adriane Chapman
- University of Southampton, Southampton, Hampshire, United Kingdom
- Guihua Wen
- South China University of Technology, Guangzhou, Guangdong, China
- Dame Wendy Hall
- University of Southampton, Southampton, Hampshire, United Kingdom
6
Mi JX, Zhang Z, Tai D, Zhou LF. Attribute self-representation steered by exclusive lasso for zero-shot learning. Appl Intell 2022. [DOI: 10.1007/s10489-022-03497-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
7
Li X, Fang M, Chen B. An active unseen sample selection framework for generalized zero-shot classification. Int J Mach Learn Cybern 2022. [DOI: 10.1007/s13042-022-01509-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
8
Xu X, Aggarwal D, Shankar K. Instantaneous Property Prediction and Inverse Design of Plasmonic Nanostructures Using Machine Learning: Current Applications and Future Directions. Nanomaterials (Basel) 2022; 12:633. [PMID: 35214962; PMCID: PMC8874423; DOI: 10.3390/nano12040633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Revised: 02/06/2022] [Accepted: 02/08/2022] [Indexed: 02/06/2023]
Abstract
Advances in plasmonic materials and devices have given rise to a variety of applications in photocatalysis, microscopy, nanophotonics, and metastructures. With growing computing power and the advent of artificial neural networks, machine learning (ML) can significantly accelerate the characterization and design of plasmonic nanostructures compared with conventional finite-difference time-domain (FDTD) simulations. ML-based methods not only return optical spectra and optimal design parameters with high accuracy, but also maintain consistently high computational efficiency regardless of structural complexity. This work reviews the prominent ML methods used in forward simulation and inverse design of plasmonic nanomaterials, such as convolutional neural networks, generative adversarial networks, genetic algorithms, and encoder-decoder networks. Moreover, we acknowledge the current limitations of ML methods in the context of plasmonics and provide perspectives on future research directions.
Affiliation(s)
- Karthik Shankar
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada; (X.X.); (D.A.)
9
Zhao Y, Xu T, Liu X, Guo D, Hu Z, Liu H, Li Y. Visual feature synthesis with semantic reconstructor for traditional and generalized zero-shot object classification. Int J Intell Syst 2022. [DOI: 10.1002/int.22811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Ye Zhao
- School of Computer and Information, Hefei University of Technology, Hefei, China
- Tingting Xu
- School of Computer and Information, Hefei University of Technology, Hefei, China
- Xueliang Liu
- School of Computer and Information, Hefei University of Technology, Hefei, China
- Dan Guo
- School of Computer and Information, Hefei University of Technology, Hefei, China
- Zhenzhen Hu
- School of Computer and Information, Hefei University of Technology, Hefei, China
- Hengchang Liu
- School of Computer Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yicong Li
- School of Computing, National University of Singapore, Singapore
10
Shermin T, Teng SW, Sohel F, Murshed M, Lu G. Bidirectional Mapping Coupled GAN for Generalized Zero-Shot Learning. IEEE Transactions on Image Processing 2021; 31:721-733. [PMID: 34928799; DOI: 10.1109/tip.2021.3135480] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
Bidirectional mapping-based generalized zero-shot learning (GZSL) methods rely on the quality of synthesized features to recognize seen and unseen data. Therefore, learning a joint distribution of seen and unseen classes while preserving the distinction between them is crucial for GZSL methods. However, existing methods learn only the underlying distribution of seen data, even though unseen class semantics are available in the GZSL problem setting. Most methods neglect to retain the seen-unseen class distinction and use the learned distribution to recognize both seen and unseen data; consequently, they do not perform well. In this work, we utilize the available unseen class semantics alongside seen class semantics and learn the joint distribution through a strong visual-semantic coupling. We propose a bidirectional mapping coupled generative adversarial network (BMCoGAN) by extending the concept of the coupled generative adversarial network into a bidirectional mapping model. We further integrate Wasserstein generative adversarial optimization to supervise joint distribution learning. We also design a loss optimization that retains the distinctive information of seen and unseen classes in the synthesized features and reduces bias towards seen classes. It pushes synthesized seen features towards real seen features and pulls synthesized unseen features away from real seen features. We evaluate BMCoGAN on benchmark datasets and demonstrate its superior performance against contemporary methods.
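The seen-unseen distinction objective described above pulls synthesized seen features toward real seen features and pushes synthesized unseen features away from them. A simple distance-based loss can illustrate this. It is a hypothetical sketch, not the BMCoGAN loss:

- Each synthesized seen feature is assumed to be paired with a real feature of the same class.
- The squared Euclidean distance and the hinge `margin` are also assumptions.
- The adversarial (Wasserstein) terms are omitted entirely.

```python
from typing import List, Sequence


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def seen_unseen_distinction_loss(
    fake_seen: List[List[float]],
    real_seen: List[List[float]],
    fake_unseen: List[List[float]],
    margin: float = 1.0,
) -> float:
    """Hypothetical pull/push loss on synthesized features.

    fake_seen[i] is assumed to share a class with real_seen[i].
    """
    # Pull: synthesized seen features should match their real counterparts.
    pull = sum(sq_dist(f, r) for f, r in zip(fake_seen, real_seen)) / len(fake_seen)
    # Push: each synthesized unseen feature should stay at least `margin`
    # (in squared distance) away from its nearest real seen feature.
    push = sum(
        max(0.0, margin - min(sq_dist(u, r) for r in real_seen)) for u in fake_unseen
    ) / len(fake_unseen)
    return pull + push


real = [[0.0, 0.0], [2.0, 0.0]]
print(seen_unseen_distinction_loss(real, real, [[0.0, 3.0]]))  # 0.0
print(seen_unseen_distinction_loss(real, real, [[0.0, 0.0]]))  # 1.0
```

An unseen feature synthesized on top of a real seen feature incurs the full margin. This penalty discourages the generator from collapsing unseen classes onto seen ones.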
11
Lai N, Kan M, Han C, Song X, Shan S. Learning to Learn Adaptive Classifier-Predictor for Few-Shot Learning. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:3458-3470. [PMID: 32755872; DOI: 10.1109/tnnls.2020.3011526] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 06/11/2023]
Abstract
Few-shot learning aims to learn a well-performing model from a few labeled examples. Recently, several works have proposed learning a predictor that directly generates model parameter weights through the episodic training strategy of meta-learning, achieving fairly promising performance. However, the predictor in these works is task-agnostic, which means that it cannot adjust to novel tasks in the testing phase. In this article, we propose a novel meta-learning method that learns how to learn a task-adaptive classifier-predictor, which generates classifier weights for few-shot classification. Specifically, a meta classifier-predictor module (MPM) is introduced to learn how to adaptively update a task-agnostic classifier-predictor into a task-specialized one on a novel task, using a newly proposed center-uniqueness loss function. Compared with previous works, our task-adaptive classifier-predictor better captures the characteristics of each category in a novel task and thus generates a more accurate and effective classifier. Our method is evaluated on two commonly used few-shot classification benchmarks, miniImageNet and tieredImageNet. An ablation study verifies the necessity of learning a task-adaptive classifier-predictor and the effectiveness of the newly proposed center-uniqueness loss. Moreover, our method achieves state-of-the-art performance on both benchmarks, demonstrating its superiority.
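A classifier-predictor maps the few labeled support examples of a task to classifier weights. The sketch below shows only the simplest task-agnostic predictor one could use: each class weight is the L2-normalized mean of that class's support features, and a normalized query is then classified by dot product. The abstract does not describe the task-specific adaptation performed by MPM or the center-uniqueness loss in enough detail to reproduce them, so both are omitted. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def predict_classifier_weights(support: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Task-agnostic predictor: one weight vector per class = normalized mean support feature."""
    weights = {}
    for cls, feats in support.items():
        dim = len(feats[0])
        mean = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
        weights[cls] = l2_normalize(mean)
    return weights


def classify(query: Sequence[float], weights: Dict[str, List[float]]) -> str:
    """Score the normalized query against every predicted weight vector; return the best class."""
    q = l2_normalize(query)
    return max(weights, key=lambda c: sum(a * b for a, b in zip(q, weights[c])))


# A 2-way, 2-shot toy task.
support = {"a": [[1.0, 0.0], [0.8, 0.2]], "b": [[0.0, 1.0], [0.1, 0.9]]}
w = predict_classifier_weights(support)
print(classify([0.9, 0.1], w))  # a
```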
12
Feng L, Zhao C. Transfer Increment for Generalized Zero-Shot Learning. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:2506-2520. [PMID: 32663133; DOI: 10.1109/tnnls.2020.3006322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Zero-shot learning (ZSL) is a successful paradigm for categorizing objects from previously unseen classes. However, it suffers severe performance degradation in the generalized ZSL (GZSL) setting, i.e., when test images come from both seen and unseen classes. In this article, we present a simple but effective mechanism for GZSL and more open scenarios based on a transfer-increment strategy. On the one hand, a dual-knowledge-source-based generative model is constructed to tackle the missing-data problem. Specifically, it synthesizes virtual exemplars by considering two sources together: local relational knowledge extracted from the label-embedding space, and global relational knowledge, namely the estimated data center in the feature-embedding space. On the other hand, we further explore how to train generative models under the GZSL setting. Two incremental training modes are designed to learn the unseen classes directly from the synthesized exemplars, instead of training classifiers on the seen and synthesized unseen exemplars together. This not only enables effective learning of unseen classes but also requires fewer computing and storage resources in practical applications. Comprehensive experiments are conducted on five benchmark data sets. Because the proposed transfer-increment strategy considers both the generation and the training of virtual exemplars, it yields a significant improvement over state-of-the-art methods in both conventional ZSL and GZSL tasks.
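The combination of local relational knowledge (from the label-embedding space) and global relational knowledge (an estimated data center in the feature space) can be sketched in two steps. First, estimate an unseen class's feature center as a similarity-weighted mix of seen-class centers. Then sample virtual exemplars around that center. This is an assumption-laden illustration rather than the paper's generative model: the softmax-over-cosine weighting, the `temperature` parameter, and the isotropic Gaussian sampling are our own choices.

```python
import math
import random
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_unseen_center(
    unseen_sem: Sequence[float],
    seen_sems: List[Sequence[float]],
    seen_centers: List[Sequence[float]],
    temperature: float = 1.0,
) -> List[float]:
    """Weight the seen-class feature centers (global knowledge) by label-embedding
    similarity to the unseen class (local knowledge)."""
    sims = [cosine(unseen_sem, s) / temperature for s in seen_sems]
    weights = softmax(sims)
    dim = len(seen_centers[0])
    return [sum(w * c[d] for w, c in zip(weights, seen_centers)) for d in range(dim)]


def synthesize_exemplars(center: Sequence[float], n: int, sigma: float,
                         rng: random.Random) -> List[List[float]]:
    """Draw n virtual exemplars from an isotropic Gaussian around the estimated center."""
    return [[c + rng.gauss(0.0, sigma) for c in center] for _ in range(n)]


seen_sems = [[1.0, 0.0], [0.0, 1.0]]
seen_centers = [[0.0, 0.0], [2.0, 2.0]]
center = estimate_unseen_center([1.0, 1.0], seen_sems, seen_centers)
print(center)  # [1.0, 1.0]
virtual = synthesize_exemplars(center, n=5, sigma=0.1, rng=random.Random(0))
print(len(virtual))  # 5
```

In the spirit of the incremental training modes, exemplars like these could then be used to fit the unseen-class classifier on its own, without retraining on the seen data.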
13
14
15
Song J, Shi G, Xie X, Wu Q, Zhang M. Domain-aware Stacked AutoEncoders for zero-shot learning. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
16
17
Zhang H, Liu J, Yao Y, Long Y. Pseudo distribution on unseen classes for generalized zero shot learning. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2020.05.021] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]
18
Ji Z, Cui B, Li H, Jiang YG, Xiang T, Hospedales T, Fu Y. Deep Ranking for Image Zero-Shot Multi-Label Classification. IEEE Transactions on Image Processing 2020; 29:6549-6560. [PMID: 32406834; DOI: 10.1109/tip.2020.2991527] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 06/11/2023]
Abstract
During the past decade, both multi-label learning and zero-shot learning have attracted considerable research attention, and significant progress has been made. Multi-label learning algorithms aim to predict multiple labels for one instance, whereas most existing zero-shot learning approaches aim to predict a single test label for each unseen class by transferring knowledge from auxiliary seen classes to target unseen classes. However, relatively little effort has been made to predict multiple labels in the zero-shot setting, which is a challenging task. In this work, we investigate and formalize a flexible framework consisting of two components: visual-semantic embedding and zero-shot multi-label prediction. First, we present a deep regression model that projects visual features into the semantic space, explicitly exploiting the correlations in the intermediate semantic layer of word vectors and making label prediction possible. Then, we formulate label prediction as a pairwise problem and employ Ranking SVM to seek the unique multi-label correlations in the embedding space. Furthermore, we provide a transductive multi-label zero-shot prediction approach that exploits the manifold structure of the test data. We demonstrate the effectiveness of the proposed approach on three popular multi-label datasets, obtaining state-of-the-art performance in both conventional and generalized ZSL settings.
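Formulating label prediction as a pairwise problem means that, for each image, every positive label should score higher than every negative label. The sketch below scores each label by the dot product between the regressed visual embedding and the label's word vector. It then evaluates a Ranking-SVM-style pairwise hinge loss. This is a minimal illustration under stated assumptions, not the paper's trained model: the dot-product scoring, the margin of 1, and the top-k decision rule are our own choices.

```python
from typing import List, Sequence, Set


def label_scores(z: Sequence[float], word_vecs: List[Sequence[float]]) -> List[float]:
    """Score each label by the dot product between the embedded image and the label's word vector."""
    return [sum(a * b for a, b in zip(z, w)) for w in word_vecs]


def pairwise_ranking_hinge(scores: Sequence[float], positives: Set[int], margin: float = 1.0) -> float:
    """Average hinge over all (positive, negative) label pairs, as in a Ranking-SVM objective."""
    pos = [i for i in range(len(scores)) if i in positives]
    neg = [i for i in range(len(scores)) if i not in positives]
    terms = [max(0.0, margin - scores[p] + scores[n]) for p in pos for n in neg]
    return sum(terms) / len(terms) if terms else 0.0


def top_k_labels(scores: Sequence[float], k: int) -> List[int]:
    """Predict the k highest-scoring labels."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


z = [1.0, 2.0]  # hypothetical regressed embedding of one image
word_vecs = [[1.0, 1.0], [0.5, 0.0], [0.5, 1.0], [0.0, 0.0]]
s = label_scores(z, word_vecs)
print(s)                                  # [3.0, 0.5, 2.5, 0.0]
print(pairwise_ranking_hinge(s, {0, 2}))  # 0.0
print(top_k_labels(s, 2))                 # [0, 2]
```

The hinge is zero only when every positive label outscores every negative label by at least the margin. This pairwise view differs from treating each label as an independent binary decision.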