1. Cai J, Hao J, Yang H, Zhao X, Yang Y. A Review on Semi-supervised Clustering. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.02.088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2023]
2. Xu Y, Cui X, Zhang L, Zhao T, Wang Y. Metastasis-related gene identification by compound constrained NMF and a semisupervised cluster approach using pancancer multiomics features. Comput Biol Med 2022; 151:106263. [PMID: 36371902 DOI: 10.1016/j.compbiomed.2022.106263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Revised: 10/16/2022] [Accepted: 10/30/2022] [Indexed: 11/11/2022]
Abstract
In recent years, with the gradual increase in pancancer-related research, more attention has been given to the field of pancancer metastasis. However, the molecular mechanism of pancancer metastasis remains unclear, and identification methods for pancancer metastasis-related genes are still lacking. In view of this research status, we developed a novel pipeline to identify pancancer metastasis-related genes based on compound constrained nonnegative matrix factorization (CCNMF). To solve the above problems, the following modules were designed. A correntropy operator and feature similarity fusion (FSF) were first adopted to process the multiomics features of genes; thus, the influences caused by irrelevant biomolecular patterns, manifested as non-Gaussian noise, were minimized. CCNMF was then adopted to handle the above features with compound constraints consisting of a gene relation network and a "metastasis-related" gene set, which maximizes the biological interpretability of the metafeatures generated by NMF. Since a negative set of pancancer "metastasis-related" genes could hardly be obtained, semisupervised analyses were performed on the gene features acquired at each step of our pipeline to examine our method's effect. Of the 236 candidates identified by the above method, 83% were associated with the metastasis of one or more cancers, and 71.9% were identified as immune-related in pancancer, in addition to the hallmark genes. Our study provides an effective and interpretable method for identifying metastasis-related as well as immune-related genes, and the method is successfully applied to TCGA pancancer data.
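
The abstract names a correntropy operator as the means of damping non-Gaussian noise but does not spell out how it is applied. As a minimal sketch only, the standard sample correntropy with an unnormalised Gaussian kernel can be compared against mean squared error on a vector containing one outlier. The function names, the kernel width `sigma`, and the toy vectors are illustrative assumptions, not taken from the paper.

```python
import math


def correntropy(x, y, sigma=1.0):
    """Sample correntropy of two equal-length vectors.

    Uses an unnormalised Gaussian kernel, so identical vectors score 1.0.
    sigma is an assumed kernel width, chosen here only for illustration.
    """
    return sum(
        math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for a, b in zip(x, y)
    ) / len(x)


def mean_squared_error(x, y):
    """Plain MSE, shown for contrast with correntropy."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Toy data (hypothetical): one coordinate of y_outlier is grossly corrupted.
x = [1.0, 2.0, 3.0, 4.0]
y_outlier = [1.0, 2.0, 3.0, 104.0]

print("correntropy:", correntropy(x, y_outlier))
print("mse:", mean_squared_error(x, y_outlier))
```

Each coordinate contributes at most 1 to the correntropy sum, so a single large deviation lowers the score by a bounded amount, whereas the same deviation dominates the squared error. This bounded influence is why correntropy-based measures are often described as robust to heavy-tailed noise.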
Affiliation(s)
- Yining Xu
  - Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, TIB #20, Harbin, 150000, Hei Long Jiang, China.
- Xinran Cui
  - Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, TIB #20, Harbin, 150000, Hei Long Jiang, China.
- Liyuan Zhang
  - Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, TIB #20, Harbin, 150000, Hei Long Jiang, China.
- Tianyi Zhao
  - School of Medicine and Health, Harbin Institute of Technology, 92 Xidazhi Street, TIB #20, Harbin, 150000, Hei Long Jiang, China.
- Yadong Wang
  - Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, TIB #20, Harbin, 150000, Hei Long Jiang, China.
3. Semi-supervised nonnegative matrix factorization with pairwise constraints for image clustering. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01614-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
4. Huang D, Hu J, Li T, Du S, Chen H. Consistency regularization for deep semi-supervised clustering with pairwise constraints. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01599-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
5. Hazratgholizadeh R, Balafar MA, Derakhshi MRF. Active constrained deep embedded clustering with dual source. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03752-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
6. Liu M, Yang Z, Han W, Chen J, Sun W. Semi-supervised multi-view binary learning for large-scale image clustering. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03205-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
7. Pan S, Zhu C, Zhao XM, Coelho LP. A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments. Nat Commun 2022; 13:2326. [PMID: 35484115 PMCID: PMC9051138 DOI: 10.1038/s41467-022-29843-y] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 10/19/2021] [Accepted: 03/31/2022] [Indexed: 12/14/2022]
Abstract
Metagenomic binning is the step in building metagenome-assembled genomes (MAGs) in which sequences predicted to originate from the same genome are automatically grouped together. The most widely used methods for binning are reference-independent: they operate de novo and enable the recovery of genomes from previously unsampled clades. However, they do not leverage the knowledge in existing databases. Here, we introduce SemiBin, an open source tool that uses deep siamese neural networks to implement a semi-supervised approach, i.e. SemiBin exploits the information in reference genomes, while retaining the capability of reconstructing high-quality bins that are outside the reference dataset. Using simulated and real microbiome datasets from several different habitats from GMGCv1 (Global Microbial Gene Catalog), including the human gut, non-human guts, and environmental habitats (ocean and soil), we show that SemiBin outperforms existing state-of-the-art binning methods. In particular, compared to other methods, SemiBin returns more high-quality bins with larger taxonomic diversity, including more distinct genera and species.
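
The abstract says only that SemiBin uses deep siamese neural networks; it does not give the loss or the architecture. To illustrate the general idea of a siamese setup, the sketch below passes both members of a pair through one shared embedding and scores them with the classic margin-based contrastive loss. The linear embedding, the margin, the toy points, and all names are assumptions made for illustration and should not be read as SemiBin's implementation.

```python
import math


def embed(x, weights):
    """Shared linear embedding; the same weights are applied to both inputs."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def contrastive_loss(x1, x2, same, weights, margin=1.0):
    """Margin-based contrastive loss for one pair.

    same=True pulls the embeddings together (squared distance);
    same=False pushes them apart until they are at least `margin` away.
    """
    d = math.dist(embed(x1, weights), embed(x2, weights))
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D inputs and an identity embedding, for illustration only.
identity = [[1.0, 0.0], [0.0, 1.0]]
near_pair = ((0.0, 0.0), (0.3, 0.4))  # Euclidean distance 0.5
far_pair = ((0.0, 0.0), (3.0, 4.0))   # Euclidean distance 5.0

print(contrastive_loss(*near_pair, same=True, weights=identity))
print(contrastive_loss(*near_pair, same=False, weights=identity))
print(contrastive_loss(*far_pair, same=False, weights=identity))
```

A pair labelled as different incurs no loss once its embeddings are farther apart than the margin, so training effort concentrates on pairs that are still confusable.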
Affiliation(s)
- Shaojun Pan
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China.
- Chengkai Zhu
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China.
  - School of Life Sciences, Fudan University, Shanghai, China.
- Xing-Ming Zhao
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China.
  - MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China.
  - Zhangjiang Fudan International Innovation Center, Shanghai, China.
- Luis Pedro Coelho
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China.
8. Zhou P, Wang N, Zhao S, Zhang Y. Robust semi-supervised clustering via data transductive warping. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03493-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
9. Deep Semi-Supervised Algorithm for Learning Cluster-Oriented Representations of Medical Images Using Partially Observable DICOM Tags and Images. Diagnostics (Basel) 2021; 11:diagnostics11101920. [PMID: 34679618 PMCID: PMC8534981 DOI: 10.3390/diagnostics11101920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2021] [Revised: 10/13/2021] [Accepted: 10/15/2021] [Indexed: 11/16/2022]
Abstract
The task of automatically extracting large homogeneous datasets of medical images based on detailed criteria and/or semantic similarity can be challenging because the acquisition and storage of medical images in clinical practice are not fully standardised and can be prone to errors, which are often made unintentionally by medical professionals during manual input. In this paper, we propose an algorithm for learning cluster-oriented representations of medical images by fusing images with partially observable DICOM tags. Pairwise relations are modelled by thresholding the Gower distance measure, which is calculated using eight DICOM tags. We trained the models using 30,000 images, and we tested them using a disjoint test set consisting of 8000 images, gathered retrospectively from the PACS repository of the Clinical Hospital Centre Rijeka in 2017. We compare our method against the standard and deep unsupervised clustering algorithms, as well as the popular semi-supervised algorithms combined with the most commonly used feature descriptors. Our model achieves an NMI score of 0.584 with respect to the anatomic region, and an NMI score of 0.793 with respect to the modality. The results suggest that DICOM data can be used to generate pairwise constraints that can help improve medical image clustering, even when using only a small number of constraints.
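
The thresholding step described in this abstract can be sketched roughly as follows. Gower distance averages per-field dissimilarities (mismatch for categorical fields, range-normalised absolute difference for numeric ones) over the fields observed in both records, which suits partially observable tags. The three example fields, the two thresholds, and the function names are hypothetical; the paper's eight tags and its actual threshold values are not given in the abstract.

```python
from itertools import combinations


def gower_distance(a, b, numeric_ranges):
    """Gower distance between two records; None marks an unobserved tag.

    numeric_ranges maps the index of each numeric field to its value range.
    Every other field is treated as categorical. Returns None when the two
    records share no observed field.
    """
    total, used = 0.0, 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue  # skip tags missing in either record
        if i in numeric_ranges:
            rng = numeric_ranges[i]
            total += abs(x - y) / rng if rng > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
        used += 1
    return total / used if used else None


def pairwise_constraints(records, numeric_ranges, must_below=0.2, cannot_above=0.8):
    """Threshold Gower distances into must-link and cannot-link index pairs.

    The two thresholds are illustrative assumptions; pairs falling between
    them are left unconstrained.
    """
    must, cannot = [], []
    for i, j in combinations(range(len(records)), 2):
        d = gower_distance(records[i], records[j], numeric_ranges)
        if d is None:
            continue
        if d <= must_below:
            must.append((i, j))
        elif d >= cannot_above:
            cannot.append((i, j))
    return must, cannot


# Hypothetical records: (modality, body part, slice thickness in mm).
records = [
    ("CT", "CHEST", 1.0),
    ("CT", "CHEST", 1.5),
    ("MR", "HEAD", None),
    ("MR", None, 5.0),
]
numeric_ranges = {2: 4.0}  # assumed thickness range: 5.0 - 1.0

must, cannot = pairwise_constraints(records, numeric_ranges)
print("must-link:", must)
print("cannot-link:", cannot)
```

Averaging only over jointly observed fields keeps a missing tag from being counted as a mismatch, at the cost of basing some distances on very few fields.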
10. Lima BVA, Neto ADD, Silva LES, Machado VP. Deep semi‐supervised classification based in deep clustering and cross‐entropy. INT J INTELL SYST 2021. [DOI: 10.1002/int.22446] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/07/2022]
Affiliation(s)
- Bruno Vicente Alves Lima
  - Department of Computer and Automation, Federal University of Rio Grande do Norte, Natal, Rio Grande do Norte, Brazil
- Adrião Duarte Dória Neto
  - Department of Computer and Automation, Federal University of Rio Grande do Norte, Natal, Rio Grande do Norte, Brazil
11. Qing Y, Zeng Y, Cao Q, Huang GB. End-to-end novel visual categories learning via auxiliary self-supervision. Neural Netw 2021; 139:24-32. [PMID: 33677376 DOI: 10.1016/j.neunet.2021.02.015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/17/2020] [Revised: 01/18/2021] [Accepted: 02/12/2021] [Indexed: 10/22/2022]
Abstract
Semi-supervised learning has largely alleviated the strong demand for large amounts of annotations in deep learning. However, most methods adopt the common assumption that labeled data are always available for the same classes as the unlabeled data, which is impractical and restrictive for real-world applications. In this research work, our focus is on semi-supervised learning when the categories of unlabeled data and labeled data are disjoint from each other. The main challenge is how to effectively transfer knowledge from labeled data to unlabeled data when they are independent of each other and do not belong to the same categories. Previous state-of-the-art methods have proposed to construct pairwise similarity pseudo labels as supervising signals. However, two issues are inherent in these methods: (1) All previous methods comprise multiple training phases, which makes it difficult to train the model in an end-to-end fashion. (2) Strong dependence on the quality of pairwise similarity pseudo labels limits the performance, as pseudo labels are vulnerable to noise and bias. Therefore, we propose to use self-supervision as an auxiliary task during model training, such that labeled data and unlabeled data share the same set of surrogate labels and the overall supervising signals have strong regularization. By doing so, all modules in the proposed algorithm can be trained simultaneously, which boosts the learning capability because end-to-end learning can be achieved. Moreover, we propose to utilize local structure information in feature space during pairwise pseudo label construction, as local properties are more robust to noise. Extensive experiments have been conducted on three frequently used visual datasets, i.e., CIFAR-10, CIFAR-100 and SVHN. Experimental results indicate the effectiveness of our proposed algorithm, as we achieve new state-of-the-art performance for novel visual categories learning on these three datasets.
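
The abstract does not say how local structure is turned into pairwise pseudo labels. One plausible reading, shown here purely as an assumed illustration, is to mark two samples as similar only when each is among the other's k nearest neighbours in feature space. The mutual-neighbour rule, the value of `k`, the toy features, and the function names are all hypothetical and are not the authors' procedure.

```python
import math


def knn_indices(features, k):
    """For each sample, return the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, fi in enumerate(features):
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def pairwise_pseudo_labels(features, k=1):
    """Binary similarity matrix: 1 if two samples are mutual k-NN, else 0."""
    nn = knn_indices(features, k)
    n = len(features)
    return [
        [1 if i == j or (j in nn[i] and i in nn[j]) else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 2-D features: two tight groups plus one point between them.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (2.5, 2.6)]
labels = pairwise_pseudo_labels(features, k=1)
for row in labels:
    print(row)
```

Requiring the neighbour relation to hold in both directions leaves the in-between point unpaired, which is the kind of conservative behaviour one would want if noisy pseudo labels are the main concern.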
Affiliation(s)
- Yuanyuan Qing
  - School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore.
- Yijie Zeng
  - School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore.
- Qi Cao
  - School of Computing Science, University of Glasgow, Singapore 567739, Singapore.
- Guang-Bin Huang
  - School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore.