1
Quiñones R, Samal A, Das Choudhury S, Muñoz-Arriola F. OSC-CO2: coattention and cosegmentation framework for plant state change with multiple features. FRONTIERS IN PLANT SCIENCE 2023; 14:1211409. [PMID: 38023863] [PMCID: PMC10644038] [DOI: 10.3389/fpls.2023.1211409]
Abstract
Cosegmentation and coattention are extensions of traditional segmentation methods aimed at detecting a common object (or objects) in a group of images. Current cosegmentation and coattention methods are ineffective for objects, such as plants, that change their morphological state while being captured in different modalities and views. Object State Change using Coattention-Cosegmentation (OSC-CO2) is an end-to-end unsupervised deep-learning framework that enhances traditional segmentation techniques by processing, analyzing, selecting, and combining candidate segmentation results that are likely to contain most of the target object's pixels, and then producing a final segmented image. The framework leverages coattention-based convolutional neural networks (CNNs) and cosegmentation-based dense conditional random fields (CRFs) to improve segmentation accuracy in high-dimensional plant imagery with evolving plant objects. The efficacy of OSC-CO2 is demonstrated using plant growth sequences imaged with infrared, visible, and fluorescence cameras from multiple views on a remote-sensing, high-throughput phenotyping platform, and is evaluated using the Jaccard index and precision. We also introduce CosegPP+, a structured dataset that provides quantitative information on the efficacy of our framework. Results show that OSC-CO2 outperformed state-of-the-art segmentation and cosegmentation methods, improving segmentation accuracy by 3% to 45%.
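The two evaluation measures named in the abstract are standard and easy to reproduce. A minimal NumPy sketch for binary segmentation masks follows; the function names are illustrative, not from the paper:

```python
import numpy as np

def jaccard_index(pred, gt):
    """Intersection-over-union of two binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

def precision(pred, gt):
    """Fraction of predicted foreground pixels that are truly foreground."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    predicted = pred.sum()
    return tp / predicted if predicted > 0 else 1.0

# Toy 4x4 masks: prediction recovers 2 of the 3 ground-truth pixels
# and adds 1 false positive -> intersection 2, union 4.
gt = np.array([[1, 1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]])
pred = np.array([[1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 1]])
```

On these masks the Jaccard index is 0.5 and precision is 2/3, illustrating how the two measures penalize misses and false positives differently.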
Affiliation(s)
- Rubi Quiñones
- School of Computing, University of Nebraska-Lincoln, Lincoln, NE, United States
- Computer Science Department, Southern Illinois University Edwardsville, Edwardsville, IL, United States
- Ashok Samal
- School of Computing, University of Nebraska-Lincoln, Lincoln, NE, United States
- Sruti Das Choudhury
- School of Computing, University of Nebraska-Lincoln, Lincoln, NE, United States
- School of Natural Resources, University of Nebraska-Lincoln, Lincoln, NE, United States
- Francisco Muñoz-Arriola
- School of Natural Resources, University of Nebraska-Lincoln, Lincoln, NE, United States
- Department of Biological Systems Engineering, University of Nebraska-Lincoln, Lincoln, NE, United States
2
Wang Q, Tao Z, Xia W, Gao Q, Cao X, Jiao L. Adversarial Multiview Clustering Networks With Adaptive Fusion. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:7635-7647. [PMID: 35113790] [DOI: 10.1109/tnnls.2022.3145048]
Abstract
Existing deep multiview clustering (MVC) methods are mainly based on autoencoder networks, which seek common latent variables to reconstruct the original input of each view individually. However, due to the view-specific reconstruction loss, it is challenging to extract consistent latent representations over multiple views for clustering. To address this challenge, we propose adversarial MVC (AMvC) networks in this article. The proposed AMvC generates each view's samples conditioned on the fused latent representations among different views to encourage a more consistent clustering structure. Specifically, multiview encoders are used to extract latent descriptions from all the views, and the corresponding generators are used to generate the reconstructed samples. The discriminative networks and the mean squared loss are jointly utilized for training the multiview encoders and generators to balance the distinctness and consistency of each view's latent representation. Moreover, an adaptive fusion layer is developed to obtain a shared latent representation, on which a clustering loss and the l1,2-norm constraint are further imposed to improve clustering performance and distinguish the latent space. Experimental results on video, image, and text datasets demonstrate the effectiveness of our AMvC over several state-of-the-art deep MVC methods.
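One plausible reading of such an adaptive fusion layer is a convex combination of per-view latent codes with learnable view weights. The sketch below fixes the weights for illustration; in the paper they are trained jointly with the encoders:

```python
import numpy as np

def softmax(w):
    """Numerically stable softmax over raw view weights."""
    e = np.exp(w - w.max())
    return e / e.sum()

def adaptive_fusion(latents, weights):
    """Fuse per-view latent codes with normalized view weights.

    latents: list of (n_samples, d) arrays, one per view.
    weights: raw scalar weight per view; the softmax keeps the
             combination convex so no view is discarded outright.
    """
    alpha = softmax(np.asarray(weights, dtype=float))
    return sum(a * z for a, z in zip(alpha, latents))

# Two views, 3 samples, 2-dim latents; equal raw weights reduce the
# fusion to a plain average of the two views.
z1 = np.zeros((3, 2))
z2 = np.ones((3, 2))
fused = adaptive_fusion([z1, z2], [0.0, 0.0])
```

With equal raw weights the fused representation is the element-wise mean of the views; training would shift the weights toward the more reliable views.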
3
Yan X, Mao Y, Ye Y, Yu H, Wang FY. Explanation guided cross-modal social image clustering. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.01.065]
4
Huang A, Chen W, Zhao T, Chen CW. Joint Learning of Latent Similarity and Local Embedding for Multi-View Clustering. IEEE TRANSACTIONS ON IMAGE PROCESSING 2021; 30:6772-6784. [PMID: 34310300] [DOI: 10.1109/tip.2021.3096086]
Abstract
Spectral clustering has been an attractive topic in the field of computer vision due to the extensive growth of applications, such as image segmentation, clustering and representation. In this problem, the construction of the similarity matrix is a vital element affecting clustering performance. In this paper, we propose a multi-view joint learning (MVJL) framework to achieve both a reliable similarity matrix and a latent low-dimensional embedding. Specifically, the similarity matrix to be learned is represented as a convex hull of similarity matrices from different views, where the nuclear norm is imposed to capture the principal information of multiple views and improve robustness against noise/outliers. Moreover, an effective low-dimensional representation is obtained by applying local embedding on the similarity matrix, which preserves the local intrinsic structure of data through dimensionality reduction. With these techniques, we formulate the MVJL as a joint optimization problem and derive its mathematical solution with the alternating direction method of multipliers strategy and the proximal gradient descent method. The solution, which consists of a similarity matrix and a low-dimensional representation, is ultimately integrated with spectral clustering or K-means for multi-view clustering. Extensive experimental results on real-world datasets demonstrate that MVJL achieves superior clustering performance over other state-of-the-art methods.
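The core construction in the abstract, representing the learned similarity matrix as a convex combination of per-view similarity matrices, can be sketched directly. Here the simplex weights are fixed for illustration; MVJL learns them jointly with the embedding:

```python
import numpy as np

def fuse_similarities(S_views, alpha):
    """Convex combination of per-view similarity matrices.

    S_views: list of (n, n) symmetric similarity matrices, one per view.
    alpha:   nonnegative weights summing to 1 (a point in the simplex).
             MVJL optimizes these; here they are supplied by hand.
    """
    alpha = np.asarray(alpha, dtype=float)
    assert np.all(alpha >= 0) and np.isclose(alpha.sum(), 1.0)
    return sum(a * S for a, S in zip(alpha, S_views))

# Two toy 3x3 view similarities weighted 0.7 / 0.3: the fused matrix
# keeps unit self-similarity while borrowing cross-sample affinity
# from the second view.
S1 = np.eye(3)
S2 = np.ones((3, 3))
S = fuse_similarities([S1, S2], [0.7, 0.3])
```

The fused matrix would then be handed to local embedding and finally spectral clustering or K-means, as the abstract describes; the nuclear-norm and ADMM machinery of the paper is omitted here.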
6
Wang Q, Ding Z, Tao Z, Gao Q, Fu Y. Generative Partial Multi-View Clustering With Adaptive Fusion and Cycle Consistency. IEEE TRANSACTIONS ON IMAGE PROCESSING 2021; 30:1771-1783. [PMID: 33417549] [DOI: 10.1109/tip.2020.3048626]
Abstract
With the rapid development of data collection sources and feature extraction methods, multi-view data have become easy to obtain and have received increasing research attention in recent years, among which multi-view clustering (MVC) forms a mainstream research direction and is widely used in data analysis. However, existing MVC methods mainly assume that each sample appears in all the views, without considering the incomplete-view case due to data corruption, sensor failure, equipment malfunction, etc. In this study, we design and build a generative partial multi-view clustering model with adaptive fusion and cycle consistency, named GP-MVC, to solve the incomplete multi-view problem by explicitly generating the data of missing views. The main idea of GP-MVC is two-fold. First, multi-view encoder networks are trained to learn common low-dimensional representations, followed by a clustering layer to capture the shared cluster structure across multiple views. Second, view-specific generative adversarial networks with multi-view cycle consistency are developed to generate the missing data of one view conditioned on the shared representation given by the other views. These two steps promote each other: the learned common representation facilitates data imputation, and the generated data further exploit view consistency. Moreover, a weighted adaptive fusion scheme is implemented to exploit the complementary information among different views. Experimental results on four benchmark datasets show the effectiveness of the proposed GP-MVC over state-of-the-art methods.
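The multi-view cycle consistency described above can be illustrated with toy linear encoders and generators: a view-1 sample mapped into the shared space, used to generate its missing view-2 counterpart, and mapped back should reproduce the original sample. These identity-preserving scalar maps are illustrative stand-ins for the adversarially trained networks in GP-MVC:

```python
import numpy as np

# Hypothetical linear encoder/generator pairs for two views sharing a
# latent space; chosen so the round trip is exact for demonstration.
E1 = lambda x: x * 0.5    # view-1 encoder into the shared latent space
G2 = lambda z: z * 2.0    # generate view-2 data from the latent code
E2 = lambda x: x * 0.25   # view-2 encoder into the shared latent space
G1 = lambda z: z * 4.0    # generate view-1 data from the latent code

def cycle_loss(x1):
    """||x1 - G1(E2(G2(E1(x1))))||^2: view-1 -> view-2 -> view-1 trip."""
    x2_hat = G2(E1(x1))    # impute the missing view-2 sample
    x1_rec = G1(E2(x2_hat))  # map the imputation back to view 1
    return float(np.sum((x1 - x1_rec) ** 2))

x1 = np.array([1.0, -2.0, 3.0])
loss = cycle_loss(x1)
```

In training, this loss term would push the generators to produce missing-view data consistent with the views actually observed.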
7
Huang A, Wang Z, Zheng Y, Zhao T, Lin CW. Embedding Regularizer Learning for Multi-View Semi-Supervised Classification. IEEE TRANSACTIONS ON IMAGE PROCESSING 2021; 30:6997-7011. [PMID: 34357859] [DOI: 10.1109/tip.2021.3101917]
Abstract
Classification remains challenging when confronted with multi-view data with limited labels. In this paper, we propose an embedding regularizer learning scheme for multi-view semi-supervised classification (ERL-MVSC). The proposed framework integrates diversity, sparsity, and consensus to dexterously manipulate multi-view data with limited labels. To encourage diversity, ERL-MVSC recasts a linear regression model to derive view-specific embedding regularizers and automatically determines their weights, which tactfully incorporates the complementary information of different views. To ensure sparsity, ERL-MVSC imposes an l2,1-norm on a fused embedding regularizer to exploit the sparse local structure of samples, thereby conveying valuable classification information and enhancing robustness against noise/outliers. To enhance consensus, ERL-MVSC learns a shared predicted label matrix, which serves as the common target of multi-view classification. With these techniques, we formulate ERL-MVSC as a joint optimization problem over an embedding regularizer and a predicted label matrix, which can be solved by a coordinate descent method. Extensive experimental results on real-world datasets demonstrate the effectiveness and superiority of the proposed algorithm.
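The l2,1-norm used for the sparsity term is simply the sum of the Euclidean norms of a matrix's rows; penalizing it drives whole rows to zero, which is the row-wise (sample-wise) sparsity the abstract credits for robustness. A minimal sketch:

```python
import numpy as np

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W.

    Minimizing this as a penalty zeroes out entire rows, so samples
    (rows) that behave like outliers contribute nothing to the model.
    """
    return float(np.sqrt((W ** 2).sum(axis=1)).sum())

W = np.array([[3.0, 4.0],    # row norm 5
              [0.0, 0.0],    # an already-zero row adds nothing
              [0.0, 2.0]])   # row norm 2
```

For this toy matrix the value is 5 + 0 + 2 = 7; contrast with the Frobenius norm, which would not favor zeroing whole rows.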
8
Huang A, Zhao T, Lin CW. Multi-View Data Fusion Oriented Clustering via Nuclear Norm Minimization. IEEE TRANSACTIONS ON IMAGE PROCESSING 2020; PP:9600-9613. [PMID: 33055030] [DOI: 10.1109/tip.2020.3029883]
Abstract
Image clustering remains challenging when handling image data from heterogeneous sources. Fusing the independent and complementary information existing in heterogeneous sources helps improve image clustering performance. To this end, we propose a joint learning framework for multi-view image data fusion and clustering based on nuclear norm minimization. Specifically, we first formulate the problem as matrix factorization into a shared clustering indicator matrix and a representative coefficient matrix. The former is constrained with orthogonality and nonnegativity, which ensures the validity of the clustering assignments. The latter is subject to nuclear norm minimization to achieve compression of principal components for performance improvement. Then, an alternating minimization strategy is employed to efficiently decompose the multi-variable optimization problem into several small solvable sub-problems with closed-form solutions. Extensive experimental results on real-world image and video datasets demonstrate the superiority of the proposed method over other state-of-the-art methods.
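When a nuclear-norm term appears in an alternating-minimization scheme like the one described, the standard closed-form sub-problem solution is singular value thresholding. The paper does not publish its update rules here, so this is the generic operator rather than the authors' exact step:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the proximal operator of
    tau * nuclear norm. Each singular value is shrunk by tau and
    values below tau are dropped, yielding a low-rank estimate."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

# Toy matrix with singular values 5 and 1: thresholding at 2 keeps a
# shrunken dominant component and removes the weak one entirely.
X = np.diag([5.0, 1.0])
Y = svt(X, 2.0)
```

Here the result is diag(3, 0), i.e. the rank drops from 2 to 1, which is exactly the "compression of principal components" the abstract refers to.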
9
Zhang L, Sun J, Wang T, Min Y, Lu H. Visual Saliency Detection via Kernelized Subspace Ranking with Active Learning. IEEE TRANSACTIONS ON IMAGE PROCESSING 2019; 29:2258-2270. [PMID: 31613758] [DOI: 10.1109/tip.2019.2945679]
Abstract
The saliency detection task has witnessed booming interest for years, owing to the growth of the computer vision community. In this paper, we introduce a new saliency model, referred to as KSR-AL, that performs active learning with a kernelized subspace ranker (KSR). This pool-based active learning algorithm ranks the informativeness of unlabeled data by considering both uncertainty sampling and information density, thereby minimizing the cost of labeling. The informative images are selected to train the KSR iteratively and incrementally. The learning model is built on object-level proposals and region-based convolutional neural network (R-CNN) features, jointly learning a Rank-SVM classifier and a subspace projection. When the active learning process meets its stopping criterion, the saliency map of each image is generated by a weighted fusion of its top-ranked proposals, whose ranking scores are given by the learned ranker. We show that KSR-AL achieves a reduction in annotation cost, as well as an improvement in performance, compared with the supervised learning scheme. The proposed algorithm also outperforms the state-of-the-art methods. These improvements are demonstrated by extensive experiments on six publicly available benchmark datasets.
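The pool-based selection combining uncertainty sampling with information density can be sketched with a simplified stand-in criterion; the paper's actual ranker scores object proposals with a learned Rank-SVM in a subspace, which is abstracted away here:

```python
import numpy as np

def select_queries(scores, density, k):
    """Rank unlabeled samples by uncertainty times local density and
    return the indices of the top-k to send for labeling.

    scores:  decision values of the current model on the pool; values
             near 0 (the boundary) are treated as most uncertain.
    density: local information-density estimates for the same samples,
             so isolated outliers are not over-queried.
    """
    uncertainty = 1.0 / (1.0 + np.abs(scores))   # high near the boundary
    informativeness = uncertainty * density
    return np.argsort(-informativeness)[:k]

# A 4-sample pool: sample 1 sits near the boundary, sample 3 is
# moderately uncertain but lies in a dense region.
scores = np.array([2.0, 0.1, -0.05, 1.5])
density = np.array([1.0, 1.0, 0.5, 2.0])
picked = select_queries(scores, density, 2)
```

Sample 2 is the most uncertain of all but lies in a sparse region, so the density term demotes it; the selection favors samples that are both ambiguous and representative, which is what keeps the labeling budget small.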