1
Jiang M, Gu X, Xu Y, Wang J. Metabolism-associated molecular classification and prognosis signature of head and neck squamous cell carcinoma. Heliyon 2024; 10:e27587. [PMID: 38501009 PMCID: PMC10945276 DOI: 10.1016/j.heliyon.2024.e27587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Revised: 02/25/2024] [Accepted: 03/04/2024] [Indexed: 03/20/2024] Open
Abstract
Although the fundamental processes and chemical changes in metabolic programs have been elucidated in many cancers, the expression patterns of metabolism-related genes in head and neck squamous cell carcinoma (HNSCC) remain unclear. The mRNA expression profiles of 502 tumour and 44 normal samples were extracted from The Cancer Genome Atlas. We explored the biological functions and prognostic roles of metabolism-associated genes in patients with HNSCC. The results indicated that patients with HNSCC could be divided into three molecular subtypes (C1, C2 and C3) based on 249 metabolism-related genes. Clinical characteristics, prognostic outcomes, and biological functions differed markedly among the three subtypes. The molecular subtypes also had different tumour microenvironments and immune infiltration levels. The established prognostic model with 17 signature genes could predict the prognosis of patients with HNSCC and was validated using an independent cohort dataset. An individual risk scoring tool was developed using the risk score and clinical parameters; the risk score was an independent prognostic factor for patients with HNSCC. The risk strata differed in clinical characteristics, biological features, tumour microenvironments and immune infiltration levels. Our study could be used for clinical risk management and to help conduct precision medicine for patients with HNSCC.
Affiliation(s)
- Mengxian Jiang
- Department of Otorhinolaryngology Head and Neck Surgery, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei Province, 430000, China
- Xiang Gu
- Department of Otorhinolaryngology Head and Neck Surgery, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei Province, 430000, China
- Yexing Xu
- Department of Otorhinolaryngology Head and Neck Surgery, Maternal and Child Health of Hubei Province, Tongji Medical College, Huazhong University of Science and Technology, Hubei Province, 430000, China
- Jing Wang
- Department of Otorhinolaryngology Head and Neck Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Hubei Province, 430000, China
2
Zhou N, Choi KS, Chen B, Du Y, Liu J, Xu Y. Correntropy-Based Low-Rank Matrix Factorization With Constraint Graph Learning for Image Clustering. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:10433-10446. [PMID: 35507622 DOI: 10.1109/tnnls.2022.3166931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
This article proposes a novel low-rank matrix factorization model for semisupervised image clustering. In order to alleviate the negative effect of outliers, the maximum correntropy criterion (MCC) is incorporated as a metric to build the model. To utilize the label information to improve the clustering results, a constraint graph learning framework is proposed to adaptively learn the local structure of the data by considering the label information. Furthermore, an iterative algorithm based on Fenchel conjugate (FC) and block coordinate update (BCU) is proposed to solve the model. The convergence properties of the proposed algorithm are analyzed, which shows that the algorithm exhibits both objective sequential convergence and iterate sequential convergence. Experiments are conducted on six real-world image datasets, and the proposed algorithm is compared with eight state-of-the-art methods. The results show that the proposed method can achieve better performance in most situations in terms of clustering accuracy and mutual information.
3
Ding Y, Zhou H, Zou Q, Yuan L. Identification of drug-side effect association via correntropy-loss based matrix factorization with neural tangent kernel. Methods 2023; 219:73-81. [PMID: 37783242 DOI: 10.1016/j.ymeth.2023.09.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 09/18/2023] [Accepted: 09/20/2023] [Indexed: 10/04/2023] Open
Abstract
Adverse drug reactions include side effects, allergic reactions, and secondary infections. Severe adverse reactions can cause cancer, deformity, or mutation. Monitoring drug side effects supports post-marketing drug safety supervision and provides an important basis for revising drug instructions; its purpose is to detect and control drug safety risks in a timely manner. Traditional methods are time-consuming. To accelerate the discovery of side effects, we propose a machine-learning-based method, called correntropy-loss based matrix factorization with neural tangent kernel (CLMF-NTK), to predict drug side effects. Our method and other computational methods are tested on three benchmark datasets, and the results show that our method achieves the best predictive performance.
Affiliation(s)
- Yijie Ding
- Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou 571158, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China; School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Hongmei Zhou
- Beidahuang Industry Group General Hospital, Harbin 150001, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China.
- Lei Yuan
- Department of Hepatobiliary Surgery, Quzhou People's Hospital, 100# Minjiang Main Road, Quzhou 324000, China.
4
Huang X, Bajpai AK, Sun J, Xu F, Lu L, Yousefi S. A new gene-scoring method for uncovering novel glaucoma-related genes using non-negative matrix factorization based on RNA-seq data. Front Genet 2023; 14:1204909. [PMID: 37377596 PMCID: PMC10292752 DOI: 10.3389/fgene.2023.1204909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 05/30/2023] [Indexed: 06/29/2023] Open
Abstract
Early diagnosis and treatment of glaucoma are challenging. The discovery of glaucoma biomarkers based on gene expression data could potentially provide new insights for early diagnosis, monitoring, and treatment options of glaucoma. Non-negative Matrix Factorization (NMF) has been widely used in numerous transcriptome data analyses in order to identify subtypes and biomarkers of different diseases; however, its application in glaucoma biomarker discovery has not been previously reported. Our study applied NMF to extract latent representations of RNA-seq data from BXD mouse strains and sorted the genes based on a novel gene scoring method. The enrichment ratio of the glaucoma-reference genes, extracted from multiple relevant resources, was compared using both the classical differentially expressed gene (DEG) analysis and NMF methods. The complete pipeline was validated using an independent RNA-seq dataset. Findings showed our NMF method significantly improved the enrichment detection of glaucoma genes. The application of NMF with the scoring method showed great promise in the identification of marker genes for glaucoma.
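As a rough illustration of the NMF step this abstract refers to, the sketch below factors a small non-negative matrix with the classical multiplicative update rules for the squared-error objective. It is a generic textbook variant, not the authors' pipeline: the function name `nmf_multiplicative`, the toy matrix, and all parameter values are assumptions made for this example, and the paper's gene-scoring method is not reproduced because the abstract does not describe it.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate non-negative X (features x samples) as W @ H.

    Uses multiplicative updates for the squared Frobenius error.
    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps   # non-negative basis (latent patterns)
    H = rng.random((rank, m)) + eps   # non-negative coefficients per sample
    for _ in range(n_iter):
        # Each factor is rescaled elementwise, so it stays non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: a 6 x 5 non-negative matrix of exact rank 2.
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 5))
W, H = nmf_multiplicative(X, rank=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In an expression-data setting, the rows of `X` would typically be genes and the columns samples, so each column of `W` can be read as a latent expression pattern.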
Affiliation(s)
- Xiaoqin Huang
- Department of Ophthalmology, University of Tennessee Health Science Center, Memphis, TN, United States
- Akhilesh K. Bajpai
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, United States
- Jian Sun
- Integrated Data Science Section, Research Technologies Branch, National Institute of Allergy and Infectious Diseases, National Institute of Health (NIH), Bethesda, MD, United States
- Fuyi Xu
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, United States
- School of Pharmacy, Binzhou Medical University, Yantai, Shandong, China
- Lu Lu
- Department of Ophthalmology, University of Tennessee Health Science Center, Memphis, TN, United States
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, United States
- Siamak Yousefi
- Department of Ophthalmology, University of Tennessee Health Science Center, Memphis, TN, United States
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, United States
5
Zhong W, Wu Y, Zhu M, Zhong H, Huang C, Lin Y, Huang J. Alternative splicing and alternative polyadenylation define tumor immune microenvironment and pharmacogenomic landscape in clear cell renal carcinoma. Molecular Therapy. Nucleic Acids 2022; 27:927-946. [PMID: 35211354 PMCID: PMC8829526 DOI: 10.1016/j.omtn.2022.01.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/03/2021] [Accepted: 01/17/2022] [Indexed: 12/29/2022]
Abstract
Two major posttranscriptional mechanisms—alternative splicing (AS) and alternative polyadenylation (APA)—have attracted much attention in cancer research. Nevertheless, their roles in clear cell renal carcinoma (ccRCC) are still ill defined. Herein, this study was conducted to uncover the implications of AS and APA events in ccRCC progression. Through consensus molecular clustering analysis, two AS or APA RNA processing phenotypes were separately constructed with distinct prognosis, tumor-infiltrating immune cells, responses to immunotherapy, and chemotherapy. The AS or APA score was constructed to quantify AS or APA RNA processing patterns of individual ccRCCs with principal-component analysis. Both high AS and APA scores were characterized by undesirable survival outcomes, relatively high response to immunotherapy, and low sensitivity to targeted drugs, such as sorafenib and pazopanib. Moreover, several small molecular compounds were predicted for patients with a high AS or APA score. There was a positive correlation between AS and APA scores. Their interplay contributed to poor prognosis and reshaped the tumor immune microenvironment. Collectively, this study is the first to comprehensively analyze two major posttranscriptional events in ccRCC. Our findings uncovered the potential functions of AS and APA events and identified their therapeutic potential in immunotherapy and targeted therapy.
Affiliation(s)
- Weimin Zhong
- Central Laboratory at The Fifth Hospital of Xiamen, Xiamen 361101, Fujian Province, China
- Yulong Wu
- Department of Urology at The Fifth Hospital of Xiamen, Xiamen 361101, Fujian Province, China
- Maoshu Zhu
- Central Laboratory at The Fifth Hospital of Xiamen, Xiamen 361101, Fujian Province, China
- Hongbin Zhong
- Department of Nephrology at The Fifth Hospital of Xiamen, Xiamen 361101, Fujian Province, China
- Chaoqun Huang
- Central Laboratory at The Fifth Hospital of Xiamen, Xiamen 361101, Fujian Province, China
- Yao Lin
- Central Laboratory at The Second Affiliated Hospital of Fujian Traditional Chinese Medical University, Innovation and transformation center, Fujian University of Traditional Chinese Medicine, Fuzhou 350122, China
- Jiyi Huang
- Department of Nephrology at The Fifth Hospital of Xiamen, Xiamen 361101, Fujian Province, China
6
Shao D, Dai Y, Li N, Cao X, Zhao W, Cheng L, Rong Z, Huang L, Wang Y, Zhao J. Artificial intelligence in clinical research of cancers. Brief Bioinform 2021; 23:6470966. [PMID: 34929741 PMCID: PMC8769909 DOI: 10.1093/bib/bbab523] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/11/2021] [Revised: 11/06/2021] [Accepted: 11/13/2021] [Indexed: 12/16/2022] Open
Abstract
Several factors, including advances in computational algorithms, the availability of high-performance computing hardware, and the assembly of large community-based databases, have led to the extensive application of Artificial Intelligence (AI) in the biomedical domain for nearly 20 years. AI algorithms have attained expert-level performance in cancer research. However, only a few AI-based applications have been approved for use in the real world. Whether AI will eventually be capable of replacing medical experts has been a hot topic. In this article, we first summarize the status of cancer research using AI over the past two decades, including the consensus on the procedure of AI based on an ideal paradigm and current efforts to incorporate expertise and domain knowledge. Next, we survey the data available for AI in the biomedical domain. Then, we review the methods and applications of AI in cancer clinical research, categorized by data type, including radiographic imaging, cancer genome, medical records, drug information and biomedical literature. Finally, we discuss the challenges in moving AI from theoretical research to real-world cancer research applications and the perspectives on the future role of AI in cancer treatment.
Affiliation(s)
- Dan Shao
- College of Computer Science and Technology, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Yinfei Dai
- College of Computer Science and Technology, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Nianfeng Li
- College of Computer Science and Technology, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Xuqing Cao
- Department of Neurology, People's Hospital of Ningxia Hui Autonomous Region (The Affiliated people's Hospital of Ningxia Medical University and The First Affiliated Hospital of Northwest Minzu University), Yinchuan 750002, China
- Wei Zhao
- Department of Biochemistry and Molecular Biology, Ningxia Medical University, Yinchuan 750002, China
- Li Cheng
- Department of Electrical Diagnosis, Affiliated Hospital of Changchun University of Traditional Chinese Medicine, Changchun, 130021, China
- Zhuqing Rong
- School of Science, Key Laboratory of Human Health Status Identification and Function Enhancement of Jilin Province, Changchun University, Changchun 130022, China
- Lan Huang
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yan Wang
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Jing Zhao
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, 43210, USA
7
Fan M, Yuan W, Liu W, Gao X, Xu M, Wang S, Li L. A deep matrix factorization framework for identifying underlying tissue-specific patterns of DCE-MRI: applications for molecular subtype classification in breast cancer. Phys Med Biol 2021; 66. [PMID: 34787109 DOI: 10.1088/1361-6560/ac3a25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2021] [Accepted: 11/16/2021] [Indexed: 11/12/2022]
Abstract
Objective. Breast cancer is heterogeneous in that different angiogenesis and blood flow characteristics could be present within a tumor. The pixel kinetics of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) can assume several distinct signal patterns related to specific tissue characteristics. Identification of the latent, tissue-specific dynamic patterns of intratumor heterogeneity can shed light on the biological mechanisms underlying the heterogeneity of tumors. Approach. To mine this information, we propose a deep matrix factorization-based dynamic decomposition (DMFDE) model specifically designed according to DCE-MRI characteristics. The time-series imaging data were decomposed into tissue-specific dynamic patterns and their corresponding proportion maps. The image pixel matrix and the reference matrix of population-level kinetics obtained by clustering the dynamic signals were used as the inputs. Two multilayer neural network branches were designed to collaboratively project the input matrix into a latent dynamic pattern and a dynamic proportion matrix, which was justified using simulated data. Clinical implications of DMFDE were assessed by radiomics analysis of proportion maps obtained from the tumor/parenchyma region for classifying the luminal A subtype. Main results. The decomposition performance of DMFDE was evaluated by the root mean square error and was shown to be better than that of the conventional convex analysis of mixtures (CAM) method. The predictive model with K = 3, 4, and 5 dynamic proportion maps generated AUC values of 0.780, 0.786 and 0.790, respectively, in distinguishing between luminal A and nonluminal A tumors, which are better than that of the CAM method (AUC = 0.726). The combination of statistical features from images with different proportion maps has the highest prediction value (AUC = 0.813), which is significantly higher than that based on CAM. Conclusion. This proposed method identified the latent dynamic patterns associated with different molecular subtypes, and radiomics analysis based on the pixel compositions of the uncovered dynamic patterns was able to determine molecular subtypes of breast cancer.
Affiliation(s)
- Ming Fan
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Wei Yuan
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Weifen Liu
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, 23955-6900, Saudi Arabia
- Maosheng Xu
- Department of Radiology, First Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, Zhejiang, People's Republic of China
- Shiwei Wang
- Department of Radiology, First Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, Zhejiang, People's Republic of China
- Lihua Li
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
8
Wang Y, Ma Z, Wong KC, Li X. Evolving Multiobjective Cancer Subtype Diagnosis From Cancer Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2431-2444. [PMID: 32086219 DOI: 10.1109/tcbb.2020.2974953] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Detection and diagnosis of cancer are essential for early prevention and effective treatment. Many methods have been proposed to tackle subtype diagnosis from cancer gene expression data, but they often suffer from low diagnostic ability and poor generalization. This article studies a multiobjective PSO-based hybrid algorithm (MOPSOHA) that simultaneously optimizes four objectives, namely the number of features, the accuracy, and two entropy-based measures (relevance and redundancy), to diagnose cancer data with high classification power and robustness. First, we propose a novel binary encoding strategy to choose informative gene subsets to optimize those objective functions. Second, a mutation operator is designed to enhance the exploration capability of the swarm. Finally, a local search method based on the "best/1" mutation operator of the differential evolution (DE) algorithm is employed to exploit neighborhoods with sparse high-quality solutions, since the base vector always approaches promising areas. To demonstrate the effectiveness of MOPSOHA, it is tested on 41 cancer datasets, including thirty-five cancer gene expression datasets and six independent disease datasets. Compared with other state-of-the-art algorithms, MOPSOHA performs better on most of the benchmark datasets.
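For readers unfamiliar with the "best/1" operator named in this abstract, the following minimal sketch shows its standard real-valued form in differential evolution: each mutant is the current best vector plus a scaled difference of two other population members. The function name `de_best_1`, the toy population, the scale factor `F`, and the assumption that lower fitness is better are all illustrative; how MOPSOHA adapts the operator to binary-encoded gene subsets is not stated in the abstract and is not shown here.

```python
import numpy as np

def de_best_1(population, fitness, F=0.5, seed=0):
    """Generic DE 'best/1' mutation: v_i = x_best + F * (x_r1 - x_r2).

    Assumes minimisation (lower fitness is better). r1 and r2 are
    distinct indices different from i. Names are illustrative only.
    """
    rng = np.random.default_rng(seed)
    pop = np.asarray(population, dtype=float)
    best = pop[np.argmin(fitness)]          # base vector: current best
    n = len(pop)
    mutants = np.empty_like(pop)
    for i in range(n):
        candidates = [j for j in range(n) if j != i]
        r1, r2 = rng.choice(candidates, size=2, replace=False)
        mutants[i] = best + F * (pop[r1] - pop[r2])
    return mutants

# Hypothetical toy population of five 2-D vectors, scored by squared norm.
pop = np.array([[1.0, 2.0], [0.5, -0.5], [3.0, 1.0], [-2.0, 0.0], [0.1, 0.2]])
fit = np.sum(pop ** 2, axis=1)
mutants = de_best_1(pop, fit, F=0.5)
print(mutants)
```

Because every mutant is centred on the best vector, the operator searches the neighbourhood of the current best solution, which matches the exploitation role the abstract assigns to it.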
9
Jiao CN, Liu JX, Wang J, Shang J, Zheng CH. Visualization and Analysis of Single cell RNA-seq Data by Maximizing Correntropy based Non-negative Low Rank Representation. IEEE J Biomed Health Inform 2021; 26:1872-1882. [PMID: 34495855 DOI: 10.1109/jbhi.2021.3110766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
The exploration of single cell RNA-sequencing (scRNA-seq) technology generates a new perspective to analyze biological problems. One of the major applications of scRNA-seq data is to discover subtypes of cells by cell clustering. Nevertheless, it is challenging for traditional methods to handle scRNA-seq data with a high level of technical noise and notorious dropouts. To better analyze single cell data, a novel scRNA-seq data analysis model called Maximum correntropy criterion based Non-negative and Low Rank Representation (MccNLRR) is introduced. Specifically, the maximum correntropy criterion, as an effective loss function, is more robust to the high noise and large outliers present in the data. Moreover, the low rank representation is proven to be a powerful tool for capturing the global and local structures of data. Therefore, some important information, such as the similarity of cells in the subspace, is also extracted by it. Then, an iterative algorithm based on half-quadratic optimization and the alternating direction method is developed to solve the complex optimization problem. Before the experiments, we also analyze the convergence and robustness of MccNLRR. Finally, the results of cell clustering, visualization analysis, and gene marker selection on scRNA-seq data reveal that the MccNLRR method can distinguish cell subtypes accurately and robustly.
10
Yu N, Wu MJ, Liu JX, Zheng CH, Xu Y. Correntropy-Based Hypergraph Regularized NMF for Clustering and Feature Selection on Multi-Cancer Integrated Data. IEEE Transactions on Cybernetics 2021; 51:3952-3963. [PMID: 32603306 DOI: 10.1109/tcyb.2020.3000799] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 06/11/2023]
Abstract
Non-negative matrix factorization (NMF) has become one of the most powerful methods for clustering and feature selection. However, the performance of the traditional NMF method severely degrades when the data contain noise and outliers or the manifold structure of the data is not taken into account. In this article, a novel method called correntropy-based hypergraph regularized NMF (CHNMF) is proposed to solve the above problem. Specifically, we use the correntropy instead of the Euclidean norm in the loss term of CHNMF, which improves the robustness of the algorithm. A hypergraph regularization term is also added to the objective function, which can capture high-order geometric information across more sample points. Then, the half-quadratic (HQ) optimization technique is adopted to solve the complex optimization problem of CHNMF. Finally, extensive experimental results on multi-cancer integrated data indicate that the proposed CHNMF method is superior to other state-of-the-art methods for clustering and feature selection.
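The robustness argument in this abstract can be made concrete with a small sketch. Under a Gaussian kernel, half-quadratic optimization of a correntropy loss alternates between computing weights exp(-e^2 / (2 sigma^2)) from the current residuals e and solving a weighted least-squares problem, so entries with large residuals contribute almost nothing. The code below shows only that weighting step on toy data. Applying the kernel entrywise, the bandwidth `sigma`, and the function names are assumptions for illustration; the hypergraph term and the actual CHNMF update rules are not reproduced.

```python
import numpy as np

def correntropy_weights(X, W, H, sigma=1.0):
    """Half-quadratic auxiliary weights for a Gaussian-kernel correntropy loss.

    Assumption for this sketch: the kernel is applied to each entry of the
    residual X - W @ H. Large residuals receive weights near zero.
    """
    E = X - W @ H
    return np.exp(-(E ** 2) / (2.0 * sigma ** 2))

def weighted_squared_error(X, W, H, weights):
    """Weighted least-squares objective solved in the other HQ half-step."""
    return float(np.sum(weights * (X - W @ H) ** 2))

# Hypothetical toy data: an exact factorisation with one corrupted entry.
rng = np.random.default_rng(0)
W = rng.random((4, 2))
H = rng.random((2, 3))
X = W @ H
X[0, 0] += 100.0                      # a single gross outlier

weights = correntropy_weights(X, W, H, sigma=1.0)
plain = float(np.sum((X - W @ H) ** 2))          # Euclidean loss
robust = weighted_squared_error(X, W, H, weights)
print(weights.round(3))
print(plain, robust)
```

The outlier dominates the plain Euclidean loss but is effectively switched off in the weighted objective, which is the behaviour the abstract attributes to replacing the Euclidean norm with correntropy.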
11
Ding Q, Sun Y, Shang J, Li F, Zhang Y, Liu JX. NMFNA: A Non-negative Matrix Factorization Network Analysis Method for Identifying Modules and Characteristic Genes of Pancreatic Cancer. Front Genet 2021; 12:678642. [PMID: 34367241 PMCID: PMC8340025 DOI: 10.3389/fgene.2021.678642] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/10/2021] [Accepted: 06/03/2021] [Indexed: 01/15/2023] Open
Abstract
Pancreatic cancer (PC) is a highly fatal disease, yet its causes remain unclear. Comprehensive analysis of different types of PC genetic data plays a crucial role in understanding its pathogenic mechanisms. Currently, non-negative matrix factorization (NMF)-based methods are widely used for genetic data analysis. Nevertheless, it is a challenge for them to integrate and decompose different types of genetic data simultaneously. In this paper, an NMF network analysis method, NMFNA, is proposed, which introduces a graph-regularized constraint to the NMF, for identifying modules and characteristic genes from two-type PC data of methylation (ME) and copy number variation (CNV). First, three PC networks, i.e., ME network, CNV network, and ME-CNV network, are constructed using the Pearson correlation coefficient (PCC). Then, modules are detected from these three PC networks effectively due to the introduced graph-regularized constraint, which is the highlight of the NMFNA. Finally, both gene ontology (GO) and pathway enrichment analyses are performed, and characteristic genes are detected by the multimeasure score, to deeply understand biological functions of PC core modules. Experimental results demonstrated that the NMFNA facilitates the integration and decomposition of two types of PC data simultaneously and can further serve as an alternative method for detecting modules and characteristic genes from multiple genetic data of complex diseases.
Affiliation(s)
- Qian Ding
- School of Computer Science, Qufu Normal University, Rizhao, China
- Yan Sun
- School of Computer Science, Qufu Normal University, Rizhao, China
- Junliang Shang
- School of Computer Science, Qufu Normal University, Rizhao, China
- Feng Li
- School of Computer Science, Qufu Normal University, Rizhao, China
- Yuanyuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
- Jin-Xing Liu
- School of Computer Science, Qufu Normal University, Rizhao, China
12
Wang CY, Gao YL, Liu JX, Dai LY, Shang J. Sparse robust graph-regularized non-negative matrix factorization based on correntropy. J Bioinform Comput Biol 2021; 19:2050047. [PMID: 33410727 DOI: 10.1142/s021972002050047x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Non-negative Matrix Factorization (NMF) has become a popular dimensionality reduction method in recent years. The traditional NMF method is highly sensitive to data noise. In this paper, we propose a model called Sparse Robust Graph-regularized Non-negative Matrix Factorization based on Correntropy (SGNMFC). The maximized correntropy replaces the traditional minimized Euclidean distance to improve the robustness of the algorithm. Through the kernel function, correntropy can give less weight to outliers and noise in data but give greater weight to meaningful data. Meanwhile, the geometric structure of the high-dimensional data is preserved in the low-dimensional manifold through graph regularization. Feature selection and sample clustering are commonly used methods for analyzing genes. Sparse constraints are applied to the loss function to reduce matrix complexity and analysis difficulty. Compared with five similar methods, the effectiveness of the SGNMFC model is demonstrated by differentially expressed gene selection and sample clustering experiments on three datasets from The Cancer Genome Atlas (TCGA).
Affiliation(s)
- Chuan-Yuan Wang
- School of Computer Science, Qufu Normal University, Rizhao, Shandong, P. R. China
- Ying-Lian Gao
- Qufu Normal University Library, Qufu Normal University, Rizhao, Shandong, P. R. China
- Jin-Xing Liu
- School of Computer Science, Qufu Normal University, Rizhao, Shandong, P. R. China
- Ling-Yun Dai
- School of Computer Science, Qufu Normal University, Rizhao, Shandong, P. R. China
- Junliang Shang
- School of Computer Science, Qufu Normal University, Rizhao, Shandong, P. R. China
13
Zhang X, Ma H, Zou Q, Wu J. Analysis of Cyclin-Dependent Kinase 1 as an Independent Prognostic Factor for Gastric Cancer Based on Statistical Methods. Front Cell Dev Biol 2020; 8:620164. [PMID: 33365314 PMCID: PMC7750425 DOI: 10.3389/fcell.2020.620164] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/22/2020] [Accepted: 11/03/2020] [Indexed: 12/12/2022] Open
Abstract
OBJECTIVE: The aim of this study was to investigate the expression of cyclin-dependent kinase 1 (CDK1) in gastric cancer (GC), evaluate its relationship with the clinicopathological features and prognosis of GC, and analyze the advantage of CDK1 as a potential independent prognostic factor for GC. METHODS: The Cancer Genome Atlas (TCGA) data and corresponding clinical features of GC were collected. First, the target gene was selected by combining five topological analysis methods, and its expression in paracancerous and GC tissues was analyzed with the Limma package and the Wilcoxon test. Second, the correlation between gene expression and clinical features was analyzed by logistic regression. Finally, survival analysis was carried out using the Kaplan-Meier method. The prognostic value of the gene was evaluated by univariate and multivariate Cox analyses, and its potential biological function was explored by gene set enrichment analysis (GSEA). RESULTS: CDK1 was selected as one of the most important genes associated with GC. The expression level of CDK1 in GC tissues was significantly higher than that in paracancerous tissues and was significantly correlated with pathological stage and grade. The survival rate of the CDK1 high expression group was significantly lower than that of the low expression group. CDK1 expression was significantly correlated with overall survival (OS). CDK1 expression was mainly involved in prostate cancer, small cell lung cancer, and GC and was enriched in the WNT signaling pathway and T cell receptor signaling pathway. CONCLUSION: CDK1 may serve as an independent prognostic factor for GC. It is also expected to be a new target for molecular targeted therapy of GC.
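The Kaplan-Meier step mentioned in the methods can be illustrated with a short sketch of the product-limit estimator: at each distinct event time, the running survival probability is multiplied by one minus the fraction of at-risk subjects who had the event. The function name `kaplan_meier` and the five follow-up records are invented for this example and have no connection to the TCGA cohort; the abstract does not say which software the authors used for this step.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times  : follow-up time for each subject
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns the distinct event times and the survival estimate at each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)              # still under observation at t
        deaths = np.sum((times == t) & events)    # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Hypothetical toy cohort: five subjects, two of them censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
t, s = kaplan_meier(times, events)
print(t)   # distinct event times: 2, 3, 5
print(s)   # survival estimates: 0.8, 0.6, 0.3
```

Comparing a high-expression and a low-expression group, as in the abstract, would amount to running the estimator once per group and then testing the difference between the two curves.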
Affiliation(s)
- Xu Zhang
  - School of Mathematics and Statistics, Southwest University, Chongqing, China
- Hua Ma
  - School of Mathematics and Statistics, Southwest University, Chongqing, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Hainan Key Laboratory for Computational Science and Application, Hainan Normal University, Haikou, China
- Jin Wu
  - School of Management, Shenzhen Polytechnic, Shenzhen, China
|
14
|
Zhou N, Chen B, Du Y, Jiang T, Liu J, Xu Y. Maximum Correntropy Criterion-Based Robust Semisupervised Concept Factorization for Image Representation. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:3877-3891. [PMID: 31722499 DOI: 10.1109/tnnls.2019.2947156] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/10/2023]
Abstract
Concept factorization (CF) has shown great advantages for both clustering and data representation and is particularly useful for image representation. Compared with nonnegative matrix factorization (NMF), CF can be applied to data containing negative values. However, the performance of the CF method and its extensions degrades substantially in the presence of outliers, and CF is an unsupervised method that cannot incorporate label information. In this article, we propose a novel CF method, with a model built on the maximum correntropy criterion (MCC). In order to capture the local geometry information of data, our method integrates the robust adaptive embedding and CF into a unified framework. The label information is utilized in the adaptive learning process. Furthermore, an iterative strategy based on the accelerated block coordinate update is proposed. The convergence property of the proposed method is analyzed to ensure that the algorithm converges to a reliable solution. The experimental results on four real-world image data sets show that the new method can almost always filter out the negative effects of the outliers and outperform several state-of-the-art image representation methods.
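The robustness argument in this abstract can be illustrated with a small sketch. It compares an ordinary squared-error loss with a correntropy-induced loss built from a Gaussian kernel. The function names, the kernel width `sigma`, and the toy residuals are assumptions made for illustration; this is not the authors' objective or algorithm.

```python
import numpy as np


def squared_loss(residuals):
    """Sum of squared residuals; a single large outlier can dominate it."""
    return float(np.sum(residuals ** 2))


def correntropy_loss(residuals, sigma=1.0):
    """Correntropy-induced loss: sum of 1 - exp(-r^2 / (2 * sigma^2)).

    Each residual contributes at most 1, so the influence of an outlier
    is bounded regardless of its magnitude.
    """
    return float(np.sum(1.0 - np.exp(-residuals ** 2 / (2.0 * sigma ** 2))))


# Hypothetical residuals: identical except for one gross outlier.
clean = np.array([0.1, -0.2, 0.05, 0.0])
with_outlier = np.array([0.1, -0.2, 0.05, 50.0])

sq_increase = squared_loss(with_outlier) - squared_loss(clean)
mcc_increase = correntropy_loss(with_outlier) - correntropy_loss(clean)
print(sq_increase, mcc_increase)
```

Under the squared loss the outlier adds its full squared magnitude, whereas under the correntropy-induced loss it adds at most 1, which is the bounded-influence property that motivates MCC-based models.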
|
15
|
Peng S, Ser W, Chen B, Lin Z. Robust orthogonal nonnegative matrix tri-factorization for data representation. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106054] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 10/24/2022]
|
16
|
Qiang J, Ding W, Kuijjer M, Quackenbush J, Chen P. Clustering Sparse Data With Feature Correlation With Application to Discover Subtypes in Cancer. IEEE Access: Practical Innovations, Open Solutions 2020; 8:67775-67789. [PMID: 36329870 PMCID: PMC9629797 DOI: 10.1109/access.2020.2982569] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/16/2023]
Abstract
In this paper, given data with high-dimensional features, we study the problem of how to calculate the similarity between two samples by considering a feature interaction network, which represents the relationship between features. This differs from traditional methods, which learn similarities based on a sample network that represents the relationship between samples. To overcome the data sparseness problem, we propose a novel network-based similarity metric for computing the similarity between samples, which incorporates the knowledge of the feature interaction network. Our similarity metric uses a new Feature Alignment Similarity measure, which does not directly compute the similarities among samples, but projects each sample into a feature interaction network and measures the similarity between two samples using the similarities between the vertices of the samples in the network. As such, even when two samples do not share any common features, they are likely to have higher similarity values when their features share similar network regions. To ensure that the metric is useful in a real-world application, we apply it to discover subtypes in tumor mutational data by incorporating the information of the gene interaction network. Our experimental results on synthetic data and real-world tumor mutational data show that our approach outperforms the top competitors in cancer subtype discovery. Furthermore, our approach can identify cancer subtypes that cannot be detected by other clustering algorithms in real cancer data.
Affiliation(s)
- Jipeng Qiang
  - Department of Computer Science, Yangzhou University, Yangzhou 225127, China
  - Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- Wei Ding
  - Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- Marieke Kuijjer
  - Centre for Molecular Medicine Norway, University of Oslo Faculty of Medicine, 0318 Oslo, Norway
- John Quackenbush
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Ping Chen
  - Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
|
17
|
Xiao Q, Luo J, Liang C, Li G, Cai J, Ding P, Liu Y. Identifying lncRNA and mRNA Co-Expression Modules from Matched Expression Data in Ovarian Cancer. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:623-634. [PMID: 30106686 DOI: 10.1109/tcbb.2018.2864129] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/08/2023]
Abstract
Long non-coding RNAs (lncRNAs) have been shown to be involved in multiple biological processes and play critical roles in tumorigenesis. Numerous lncRNAs have been discovered in diverse species, but the functions of most lncRNAs still remain unclear. Meanwhile, their expression patterns and regulation mechanisms are also far from being fully understood. With the advances of high-throughput technologies, the increasing availability of genomic data creates opportunities for deciphering the molecular mechanism and underlying pathogenesis of human diseases. Here, we develop an integrative framework called JONMF to identify lncRNA-mRNA co-expression modules based on the sample-matched lncRNA and mRNA expression profiles. We formulate the module detection task as an optimization problem with joint orthogonal non-negative matrix factorization that could effectively prevent multicollinearity and produce a good modularity interpretation. The constructed lncRNA-mRNA co-expression network and the gene-gene interaction network are used as the network-regularized constraints to improve the module accuracy, while the sparsity constraints are simultaneously utilized to achieve modular sparse solutions. We applied JONMF to human ovarian cancer dataset and the experiment results demonstrate that the proposed method can effectively discover biologically functional co-expression modules, which may provide insights into the function of lncRNAs and molecular mechanism of human diseases.
|
18
|
Gao J, Liu L, Yao S, Huang X, Mamitsuka H, Zhu S. HPOAnnotator: improving large-scale prediction of HPO annotations by low-rank approximation with HPO semantic similarities and multiple PPI networks. BMC Med Genomics 2019; 12:187. [PMID: 31865916 PMCID: PMC6927106 DOI: 10.1186/s12920-019-0625-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND As a standardized vocabulary of phenotypic abnormalities associated with human diseases, the Human Phenotype Ontology (HPO) has been widely used by researchers to annotate phenotypes of genes/proteins. For saving the cost and time spent on experiments, many computational approaches have been proposed. They are able to alleviate the problem to some extent, but their performances are still far from satisfactory. METHOD For inferring large-scale protein-phenotype associations, we propose HPOAnnotator that incorporates multiple Protein-Protein Interaction (PPI) information and the hierarchical structure of HPO. Specifically, we use a dual graph to regularize Non-negative Matrix Factorization (NMF) in a way that the information from different sources can be seamlessly integrated. In essence, HPOAnnotator solves the sparsity problem of a protein-phenotype association matrix by using a low-rank approximation. RESULTS By combining the hierarchical structure of HPO and co-annotations of proteins, our model can well capture the HPO semantic similarities. Moreover, graph Laplacian regularizations are imposed in the latent space so as to utilize multiple PPI networks. The performance of HPOAnnotator has been validated under cross-validation and independent test. Experimental results have shown that HPOAnnotator outperforms the competing methods significantly. CONCLUSIONS Through extensive comparisons with the state-of-the-art methods, we conclude that the proposed HPOAnnotator is able to achieve the superior performance as a result of using a low-rank approximation with a graph regularization. It is promising in that our approach can be considered as a starting point to study more efficient matrix factorization-based algorithms.
Affiliation(s)
- Junning Gao
  - School of Computer Science and Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, 220 Handan Road, Shanghai, 200433 China
- Lizhi Liu
  - School of Computer Science and Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, 220 Handan Road, Shanghai, 200433 China
- Shuwei Yao
  - School of Computer Science and Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, 220 Handan Road, Shanghai, 200433 China
- Xiaodi Huang
  - School of Computing and Mathematics, Charles Sturt University, Elizabeth Mitchell Dr, Albury, NSW 2640 Australia
- Hiroshi Mamitsuka
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kashiwada Gokasho, Uji, Kyoto, 611-0011 Japan
  - Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, 02150 Finland
- Shanfeng Zhu
  - School of Computer Science and Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, 220 Handan Road, Shanghai, 200433 China
  - Shanghai Institute of Artificial Intelligence Algorithms and ISTBI, Fudan University, Shanghai, 200433 China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
|
19
|
Wang Y, Liu Q, Huang S, Yuan B. Learning a Structural and Functional Representation for Gene Expressions: To Systematically Dissect Complex Cancer Phenotypes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1729-1742. [PMID: 28489545 DOI: 10.1109/tcbb.2017.2702161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/07/2023]
Abstract
Cancer is a heterogeneous disease, thus one of the central problems is how to dissect the resulting complex phenotypes in terms of their biological building blocks. Computationally, this is to represent and interpret high dimensional observations through a structural and conceptual abstraction into the most influential determinants underlying the problem. The working hypothesis of this report is that gene interaction is largely responsible for the manifestation of complex cancer phenotypes, and is thus where the representation is to be conceptualized. Here, we report a representation learning strategy combined with regularizations, in which gene expressions are described in terms of a regularized product of meta-genes and their expression levels. The meta-genes are constrained by gene interactions, thus representing their original topological contexts. The expression levels are supervised by their conditional dependencies among the observations, thus providing a cluster-specific constraint. We obtain both of these structural constraints using a node-based graphical model. Our representation allows the selection of more influential modules, thus implicating their possible roles in neoplastic transformations. We validate our representation strategy by its robust recognition of various cancer phenotypes compared with various classical methods. The modules discovered are either shared across or specific to different types or stages of human cancers, all of which are consistent with literature and biology.
|
20
|
Woo J, Prince JL, Stone M, Xing F, Gomez AD, Green JR, Hartnick CJ, Brady TJ, Reese TG, Wedeen VJ, El Fakhri G. A Sparse Non-Negative Matrix Factorization Framework for Identifying Functional Units of Tongue Behavior From MRI. IEEE Transactions on Medical Imaging 2019; 38:730-740. [PMID: 30235120 PMCID: PMC6422735 DOI: 10.1109/tmi.2018.2870939] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
Muscle coordination patterns of lingual behaviors are synergies generated by deforming local muscle groups in a variety of ways. Functional units are functional muscle groups of local structural elements within the tongue that compress, expand, and move in a cohesive and consistent manner. Identifying the functional units using tagged-magnetic resonance imaging (MRI) sheds light on the mechanisms of normal and pathological muscle coordination patterns, yielding improvement in surgical planning, treatment, or rehabilitation procedures. In this paper, to mine this information, we propose a matrix factorization and probabilistic graphical model framework to produce building blocks and their associated weighting map using motion quantities extracted from tagged-MRI. Our tagged-MRI imaging and accurate voxel-level tracking provide previously unavailable internal tongue motion patterns, thus revealing the inner workings of the tongue during speech or other lingual behaviors. We then employ spectral clustering on the weighting map to identify the cohesive regions defined by the tongue motion that may involve multiple or undocumented regions. To evaluate our method, we perform a series of experiments. We first use two-dimensional images and synthetic data to demonstrate the accuracy of our method. We then use three-dimensional synthetic and in vivo tongue motion data using protrusion and simple speech tasks to identify subject-specific and data-driven functional units of the tongue in localized regions.
Affiliation(s)
- Jonghye Woo
  - Department of Radiology, Massachusetts General Hospital and Harvard Medical School
- Jerry L. Prince
  - Department of Electrical and Computer Engineering at Johns Hopkins University
- Fangxu Xing
  - Department of Radiology, Massachusetts General Hospital and Harvard Medical School
- Arnold D. Gomez
  - Department of Electrical and Computer Engineering at Johns Hopkins University
- Thomas J. Brady
  - Department of Radiology, Massachusetts General Hospital and Harvard Medical School
- Timothy G. Reese
  - Department of Radiology, Massachusetts General Hospital and Harvard Medical School
- Van J. Wedeen
  - Department of Radiology, Massachusetts General Hospital and Harvard Medical School
- Georges El Fakhri
  - Department of Radiology, Massachusetts General Hospital and Harvard Medical School
|
21
|
Xiao Q, Luo J, Liang C, Cai J, Li G, Cao B. CeModule: an integrative framework for discovering regulatory patterns from genomic data in cancer. BMC Bioinformatics 2019; 20:67. [PMID: 30732558 PMCID: PMC6367773 DOI: 10.1186/s12859-019-2654-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/06/2017] [Accepted: 01/24/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Non-coding RNAs (ncRNAs) are emerging as key regulators and play critical roles in a wide range of tumorigenesis. Recent studies have suggested that long non-coding RNAs (lncRNAs) could interact with microRNAs (miRNAs) and indirectly regulate miRNA targets through competing interactions. Therefore, uncovering the competing endogenous RNA (ceRNA) regulatory mechanism of lncRNAs, miRNAs and mRNAs in post-transcriptional level will aid in deciphering the underlying pathogenesis of human polygenic diseases and may unveil new diagnostic and therapeutic opportunities. However, the functional roles of vast majority of cancer specific ncRNAs and their combinational regulation patterns are still insufficiently understood. RESULTS Here we develop an integrative framework called CeModule to discover lncRNA, miRNA and mRNA-associated regulatory modules. We fully utilize the matched expression profiles of lncRNAs, miRNAs and mRNAs and establish a model based on joint orthogonality non-negative matrix factorization for identifying modules. Meanwhile, we impose the experimentally verified miRNA-lncRNA interactions, the validated miRNA-mRNA interactions and the weighted gene-gene network into this framework to improve the module accuracy through the network-based penalties. The sparse regularizations are also used to help this model obtain modular sparse solutions. Finally, an iterative multiplicative updating algorithm is adopted to solve the optimization problem. CONCLUSIONS We applied CeModule to two cancer datasets including ovarian cancer (OV) and uterine corpus endometrial carcinoma (UCEC) obtained from TCGA. The modular analysis indicated that the identified modules involving lncRNAs, miRNAs and mRNAs are significantly associated and functionally enriched in cancer-related biological processes and pathways, which may provide new insights into the complex regulatory mechanism of human diseases at the system level.
Affiliation(s)
- Qiu Xiao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
  - Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha, 410081, China
- Jiawei Luo
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Cheng Liang
  - College of Information Science and Engineering, Shandong Normal University, Jinan, 250000, China
- Jie Cai
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Guanghui Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Buwen Cao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
|
22
|
Laplacian regularized low-rank representation for cancer samples clustering. Comput Biol Chem 2018; 78:504-509. [PMID: 30528509 DOI: 10.1016/j.compbiolchem.2018.11.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/25/2018] [Accepted: 11/07/2018] [Indexed: 12/18/2022]
Abstract
Clustering of cancer samples based on biomolecular data has become an important tool for cancer classification. The recognition of cancer types is of great importance for cancer treatment. In this paper, in order to improve the accuracy of cancer recognition, we propose to use Laplacian regularized Low-Rank Representation (LLRR) to cluster the cancer samples based on genomic data. In the LLRR method, the high-dimensional genomic data are approximately treated as samples extracted from a combination of several low-rank subspaces. The purpose of the LLRR method is to seek the lowest-rank representation matrix based on a dictionary. Because a manifold-based Laplacian regularization is introduced into LLRR, in contrast to the Low-Rank Representation (LRR) method, LLRR captures not only the global geometric structure but also the intrinsic local structure of high-dimensional observation data. Moreover, in LLRR, the original data themselves are selected as the dictionary, so the lowest-rank representation is actually an expression of similarity between the samples. Therefore, according to the low-rank representation matrix, samples with high similarity are considered to come from the same subspace and are grouped into one class. The experimental results on real genomic data illustrate that the LLRR method, compared with LRR and MLLRR, is more robust to noise, has a better ability to learn the inherent subspace structure of data, and achieves remarkable performance in the clustering of cancer samples.
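The role of the Laplacian regularizer described in this abstract can be illustrated with a short sketch. It builds an unnormalized graph Laplacian L = D - W from a symmetric affinity matrix and checks that the penalty tr(Z L Z^T) equals half the affinity-weighted sum of squared distances between the columns of Z, which is why minimizing it keeps neighbouring samples close in the learned representation. The matrix sizes, the random affinities, and all function names are assumptions for illustration only; this is not the authors' LLRR solver.

```python
import numpy as np


def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W for a symmetric affinity matrix W."""
    D = np.diag(W.sum(axis=1))
    return D - W


def laplacian_penalty(Z, L):
    """Manifold regularizer tr(Z L Z^T); column i of Z is the code of sample i."""
    return float(np.trace(Z @ L @ Z.T))


def pairwise_penalty(Z, W):
    """Equivalent form: 0.5 * sum_ij W_ij * ||z_i - z_j||^2."""
    n = W.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff = Z[:, i] - Z[:, j]
            total += W[i, j] * float(diff @ diff)
    return 0.5 * total


# Hypothetical toy data: 5 samples with 3-dimensional codes.
rng = np.random.default_rng(0)
A = rng.random((5, 5))
W = (A + A.T) / 2.0          # symmetric affinities
np.fill_diagonal(W, 0.0)     # no self-loops
Z = rng.standard_normal((3, 5))
L = graph_laplacian(W)

print(laplacian_penalty(Z, L), pairwise_penalty(Z, W))
```

The two printed values agree up to floating-point error, so the trace form used in objective functions and the pairwise form used to explain the intuition are interchangeable.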
|
23
|
Peng S, Ser W, Chen B, Sun L, Lin Z. Correntropy based graph regularized concept factorization for clustering. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.07.049] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]
|
24
|
Li X, Wong KC. A Comparative Study for Identifying the Chromosome-Wide Spatial Clusters from High-Throughput Chromatin Conformation Capture Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:774-787. [PMID: 28333638 DOI: 10.1109/tcbb.2017.2684800] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
In recent years, high-throughput sequencing technologies have enabled massive insights into genomic annotations. In contrast, the full-scale three-dimensional arrangements of genomic regions are relatively unknown. Thanks to the recent breakthroughs in High-throughput Chromosome Conformation Capture (Hi-C) techniques, non-negative matrix factorization (NMF) has been adopted to identify local spatial clusters of genomic regions from Hi-C data. However, such non-negative matrix factorization entails a high-dimensional non-convex objective function to be optimized with non-negative constraints. We propose and compare more than ten optimization algorithms to improve the identification of local spatial clusters via NMF. To optimize the high-dimensional, non-convex, and constrained objective function, we draw inspiration from nature to perform in silico evolution. The proposed algorithms consist of a population of candidates to be evolved, while NMF acts as a local search during the evolution. The population-based optimization algorithm coordinates and guides the non-negative matrix factorization toward global optima. Experimental results show that the proposed algorithms can improve the quality of non-negative matrix factorization over recent state-of-the-art methods. The effectiveness and robustness of the proposed algorithms are supported by comprehensive performance benchmarking on chromosome-wide Hi-C contact maps of yeast and human. In addition, time complexity analysis, convergence analysis, parameter analysis, biological case studies, and gene ontology similarity analysis are conducted to demonstrate the robustness of the proposed methods from different perspectives.
|
25
|
Shang R, Wang W, Stolkin R, Jiao L. Non-Negative Spectral Learning and Sparse Regression-Based Dual-Graph Regularized Feature Selection. IEEE Transactions on Cybernetics 2018; 48:793-806. [PMID: 28287996 DOI: 10.1109/tcyb.2017.2657007] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 06/06/2023]
Abstract
Feature selection is an important approach for reducing the dimension of high-dimensional data. In recent years, many feature selection algorithms have been proposed, but most of them only exploit information from the data space. They often neglect useful information contained in the feature space, and do not make full use of the characteristics of the data. To overcome this problem, this paper proposes a new unsupervised feature selection algorithm, called non-negative spectral learning and sparse regression-based dual-graph regularized feature selection (NSSRD). NSSRD is based on the feature selection framework of joint embedding learning and sparse regression, but extends this framework by introducing the feature graph. By using low dimensional embedding learning in both data space and feature space, NSSRD simultaneously exploits the geometric information of both spaces. Second, the algorithm uses non-negative constraints to constrain the low-dimensional embedding matrix of both feature space and data space, ensuring that the elements in the matrix are non-negative. Third, NSSRD unifies the embedding matrix of the feature space and the sparse transformation matrix. To ensure the sparsity of the feature array, the sparse transformation matrix is constrained using the -norm. Thus feature selection can obtain accurate discriminative information from these matrices. Finally, NSSRD uses an iterative and alternative updating rule to optimize the objective function, enabling it to select the representative features more quickly and efficiently. This paper explains the objective function, the iterative updating rules and a proof of convergence. Experimental results show that NSSRD is significantly more effective than several other feature selection algorithms from the literature, on a variety of test data.
|
26
|
Abstract
Background Matrix factorization is a well-established pattern discovery tool that has seen numerous applications in biomedical data analytics, such as gene expression co-clustering, patient stratification, and gene-disease association mining. Matrix factorization learns a latent data model that takes a data matrix and transforms it into a latent feature space, enabling generalization, noise removal and feature discovery. However, factorization algorithms are numerically intensive, and hence there is a pressing challenge to scale current algorithms to work with large datasets. Our focus in this paper is matrix tri-factorization, a popular method that is not limited by the assumption of standard matrix factorization that data reside in one latent space. Matrix tri-factorization avoids this assumption by inferring a separate latent space for each dimension in a data matrix, and a latent mapping of interactions between the inferred spaces, making the approach particularly suitable for biomedical data mining. Results We developed a block-wise approach for latent factor learning in matrix tri-factorization. The approach partitions a data matrix into disjoint submatrices that are treated independently and fed into a parallel factorization system. An appealing property of the proposed approach is its mathematical equivalence with serial matrix tri-factorization. In a study on large biomedical datasets we show that our approach scales well on multi-processor and multi-GPU architectures. On a four-GPU system we demonstrate that our approach can be more than 100 times faster than its single-processor counterpart. Conclusions A general approach for scaling non-negative matrix tri-factorization is proposed. The approach is especially useful for parallel matrix factorization implemented in a multi-GPU environment. We expect the new approach will be useful in emerging procedures for latent factor analysis, notably for data integration, where many large data matrices need to be collectively factorized.
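The block-wise idea in this abstract can be illustrated with a small sketch. For a tri-factorization X ≈ U S V^T, the reconstruction error over the full matrix equals the sum of errors over disjoint submatrices, each of which needs only the matching rows of U and V plus the shared S, so the blocks can be evaluated independently. The partitioning into two row blocks and two column blocks, the matrix sizes, and the function names are assumptions for illustration; the sketch does not reproduce the authors' update rules or their multi-GPU scheme.

```python
import numpy as np


def full_sq_error(X, U, S, V):
    """Squared Frobenius error of the tri-factorization X ~ U S V^T."""
    return float(np.sum((X - U @ S @ V.T) ** 2))


def blockwise_sq_error(X, U, S, V, row_blocks, col_blocks):
    """Same error, accumulated over disjoint submatrices of X.

    Each block uses only the rows of U and V that index it, so the
    per-block terms could be computed by separate workers.
    """
    total = 0.0
    for rows in row_blocks:
        for cols in col_blocks:
            X_block = X[np.ix_(rows, cols)]
            approx = U[rows] @ S @ V[cols].T
            total += float(np.sum((X_block - approx) ** 2))
    return total


# Hypothetical toy problem: 6 x 8 non-negative data, latent ranks 2 and 3.
rng = np.random.default_rng(1)
X = rng.random((6, 8))
U = rng.random((6, 2))
S = rng.random((2, 3))
V = rng.random((8, 3))

row_blocks = np.array_split(np.arange(6), 2)
col_blocks = np.array_split(np.arange(8), 2)

print(full_sq_error(X, U, S, V),
      blockwise_sq_error(X, U, S, V, row_blocks, col_blocks))
```

Both values agree up to floating-point error, which is a simple instance of the equivalence between block-wise and serial computation that the abstract refers to.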
|
27
|
Han S, Cai H, Che D, Zhang Y, Huang Y, Xie M. Metrical Consistency NMF for Predicting Gene-Phenotype Associations. Interdiscip Sci 2017; 10:189-194. [PMID: 28391494 DOI: 10.1007/s12539-017-0224-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/18/2016] [Revised: 02/13/2017] [Accepted: 03/09/2017] [Indexed: 10/19/2022]
Abstract
Discovering gene-phenotype associations is important for understanding disease mechanisms. Nonnegative matrix factorization (NMF) has been widely used in computational biology for its good performance and interpretability. In this paper, we propose a novel metrical consistency NMF (MCNMF) method for candidate gene prioritization. The MCNMF method assumes that phenotype similarities, calculated in various independent ways, should be consistent when the associations between genes and phenotypes are completely known. Experimental results show that our method can recover the gene-phenotype associations effectively and outperforms the comparative methods.
Affiliation(s)
- Shuai Han
  - College of Software, Nankai University, Tianjin, 300350, China
- Hong Cai
  - College of Software, Nankai University, Tianjin, 300350, China
- Dan Che
  - College of Software, Nankai University, Tianjin, 300350, China
- Yaogong Zhang
  - College of Software, Nankai University, Tianjin, 300350, China
- Yalou Huang
  - College of Software, Nankai University, Tianjin, 300350, China
- Maoqiang Xie
  - College of Software, Nankai University, Tianjin, 300350, China
|
28
|
Wang S, Cong Y, Fan H, Liu L, Li X, Yang Y, Tang Y, Zhao H, Yu H. Computer-Aided Endoscopic Diagnosis Without Human-Specific Labeling. IEEE Trans Biomed Eng 2016; 63:2347-2358. [DOI: 10.1109/tbme.2016.2530141] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 11/09/2022]
|
29
|
Ou W, Yu S, Li G, Lu J, Zhang K, Xie G. Multi-view non-negative matrix factorization by patch alignment framework with view consistency. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.09.133] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 10/22/2022]
|
30
|
Differential Regulatory Analysis Based on Coexpression Network in Cancer Research. BioMed Research International 2016; 2016:4241293. [PMID: 27597964 PMCID: PMC4997028 DOI: 10.1155/2016/4241293] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 04/14/2016] [Revised: 06/09/2016] [Accepted: 06/12/2016] [Indexed: 12/15/2022]
Abstract
With the rapid development of high-throughput techniques and the accumulation of big transcriptomic data, many computational methods and algorithms, such as differential analysis and network analysis, have been proposed to explore genome-wide gene expression characteristics. These efforts aim to transform underlying genomic information into valuable knowledge in biological and medical research fields. Recently, many integrative research methods have been dedicated to interpreting the development and progression of neoplastic diseases, and differential regulatory analysis (DRA) based on the gene coexpression network (GCN) increasingly serves as a robust complement to regular differential expression analysis in revealing regulatory functions of cancer-related genes, such as evading growth suppressors and resisting cell death. Differential regulatory analysis based on GCN is promising and shows its essential role in discovering the system properties of carcinogenesis features. Here we briefly review the paradigm of differential regulatory analysis based on GCN. We also focus on the applications of differential regulatory analysis based on GCN in cancer research and point out that DRA is necessary for revealing the underlying molecular mechanisms in large-scale carcinogenesis studies.
|
31
|
Mohammadi M, Sharifi Noghabi H, Abed Hodtani G, Rajabi Mashhadi H. Robust and stable gene selection via Maximum–Minimum Correntropy Criterion. Genomics 2016; 107:83-87. [DOI: 10.1016/j.ygeno.2015.12.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/14/2015] [Revised: 12/13/2015] [Accepted: 12/23/2015] [Indexed: 11/17/2022]
|
32
|
Chen J, Ma Q, Hu X, Zhang M, Qin D, Lu X. Gene selection and cancer classification using Monte Carlo and nonnegative matrix factorization. RSC Adv 2016. [DOI: 10.1039/c6ra05694f] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022] Open
Abstract
Cancer classification is a key problem for identifying the genomic biomarkers and treating cancerous tumors in clinical research.
Affiliation(s)
- Jing Chen
- Key Laboratory of Bioelectrochemistry & Environmental Analysis of Gansu Province
- College of Chemistry & Chemical Engineering
- Northwest Normal University
- P. R. China
- Qin Ma
- Key Laboratory of Bioelectrochemistry & Environmental Analysis of Gansu Province
- College of Chemistry & Chemical Engineering
- Northwest Normal University
- P. R. China
- Xiaoyan Hu
- Key Laboratory of Bioelectrochemistry & Environmental Analysis of Gansu Province
- College of Chemistry & Chemical Engineering
- Northwest Normal University
- P. R. China
- Miao Zhang
- Key Laboratory of Bioelectrochemistry & Environmental Analysis of Gansu Province
- College of Chemistry & Chemical Engineering
- Northwest Normal University
- P. R. China
- Dongdong Qin
- Key Laboratory of Bioelectrochemistry & Environmental Analysis of Gansu Province
- College of Chemistry & Chemical Engineering
- Northwest Normal University
- P. R. China
- Xiaoquan Lu
- Key Laboratory of Bioelectrochemistry & Environmental Analysis of Gansu Province
- College of Chemistry & Chemical Engineering
- Northwest Normal University
- P. R. China
|
33
|
Wang Y, Pan C, Xiang S, Zhu F. Robust Hyperspectral Unmixing With Correntropy-Based Metric. IEEE Trans Image Process 2015; 24:4027-4040. [PMID: 26186789 DOI: 10.1109/tip.2015.2456508] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 06/04/2023]
Abstract
Hyperspectral unmixing is one of the crucial steps for many hyperspectral applications. The problem of hyperspectral unmixing has proved to be a difficult task in unsupervised work settings where the endmembers and abundances are both unknown. In addition, this task becomes more challenging in the case that the spectral bands are degraded by noise. This paper presents a robust model for unsupervised hyperspectral unmixing. Specifically, our model is developed with the correntropy-based metric where the nonnegative constraints on both endmembers and abundances are imposed to keep physical significance. Besides, a sparsity prior is explicitly formulated to constrain the distribution of the abundances of each endmember. To solve our model, a half-quadratic optimization technique is developed to convert the original complex optimization problem into an iteratively reweighted nonnegative matrix factorization with sparsity constraints. As a result, the optimization of our model can adaptively assign small weights to noisy bands and put more emphasis on noise-free bands. In addition, with sparsity constraints, our model can naturally generate sparse abundances. Experiments on synthetic and real data demonstrate the effectiveness of our model in comparison to the related state-of-the-art unmixing models.
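The reweighting idea in the abstract above can be sketched as follows: each band gets a Gaussian-kernel weight computed from its current reconstruction error, and a weighted nonnegative matrix factorization step follows. This is only a simplified illustration, not the authors' algorithm. It omits the sparsity prior, and the function name `correntropy_nmf`, the bandwidth heuristic (mean band residual), and the synthetic data are assumptions introduced here.

```python
import numpy as np

def correntropy_nmf(X, rank, n_iter=300, eps=1e-9, seed=0):
    """Iteratively reweighted NMF with per-band (row) weights.

    X: nonnegative array of shape (n_bands, n_pixels).
    Returns endmember-like W, abundance-like H, and the final band weights.
    """
    rng = np.random.default_rng(seed)
    n_bands, n_pixels = X.shape
    W = rng.random((n_bands, rank)) + eps
    H = rng.random((rank, n_pixels)) + eps
    weights = np.ones(n_bands)
    for _ in range(n_iter):
        # Reweighting step: bands with large residuals get small weights.
        resid = np.sum((X - W @ H) ** 2, axis=1)
        scale = np.mean(resid) + eps  # assumed bandwidth heuristic
        weights = np.exp(-resid / (2.0 * scale))
        # Weighted multiplicative updates for sum_i weights[i] * ||X[i] - (W H)[i]||^2.
        WtD = W.T * weights  # scales each column of W.T by its band weight
        H *= (WtD @ X) / (WtD @ W @ H + eps)
        # The row weights cancel in the update for W.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H, weights

# Synthetic data (hypothetical): a rank-3 nonnegative matrix with one noisy band.
rng = np.random.default_rng(1)
n_bands, n_pixels, rank = 40, 200, 3
X = rng.random((n_bands, rank)) @ rng.random((rank, n_pixels))
noisy_band = 7
X[noisy_band] += rng.random(n_pixels)  # corrupt one band with nonnegative noise

W, H, weights = correntropy_nmf(X, rank)
print(weights[noisy_band], np.median(weights))
```

Because the corrupted band cannot be reconstructed well from the shared factors, its residual stays large and its weight drops, so it contributes little to the update of `H`.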
|
34
|
Mohammadi M, Hodtani GA, Yassi M. A robust Correntropy-based method for analyzing multisample aCGH data. Genomics 2015; 106:257-64. [DOI: 10.1016/j.ygeno.2015.07.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/18/2015] [Revised: 07/14/2015] [Accepted: 07/20/2015] [Indexed: 11/16/2022]
|
35
|
Liu X, Wang J, Yin M, Edwards B, Xu P. Supervised learning of sparse context reconstruction coefficients for data representation and classification. Neural Comput Appl 2015. [DOI: 10.1007/s00521-015-2042-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
|
36
|
|
37
|
Yang X, Yang F. Completing tags by local learning: a novel image tag completion method based on neighborhood tag vector predictor. Neural Comput Appl 2015. [DOI: 10.1007/s00521-015-1983-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
|
38
|
|
39
|
Li L, Yang J, Zhao K, Xu Y, Zhang H, Fan Z. Graph Regularized Non-negative Matrix Factorization By Maximizing Correntropy. Journal of Computers 2014. [DOI: 10.4304/jcp.9.11.2570-2579] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
|
40
|
Chen X, Jian C. Gene expression data clustering based on graph regularized subspace segmentation. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2014.06.023] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 11/26/2022]
|
41
|
Wang JJY, Gao X. Max-min distance nonnegative matrix factorization. Neural Netw 2014; 61:75-84. [PMID: 25462636 DOI: 10.1016/j.neunet.2014.10.006] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 02/10/2014] [Revised: 10/12/2014] [Accepted: 10/16/2014] [Indexed: 11/16/2022]
Abstract
Nonnegative Matrix Factorization (NMF) has been a popular representation method for pattern classification problems. It tries to decompose a nonnegative matrix of data samples as the product of a nonnegative basis matrix and a nonnegative coefficient matrix. The columns of the coefficient matrix can be used as new representations of these data samples. However, traditional NMF methods ignore class labels of the data samples. In this paper, we propose a novel supervised NMF algorithm to improve the discriminative ability of the new representation by using the class labels. Using the class labels, we separate all the data sample pairs into within-class pairs and between-class pairs. To improve the discriminative ability of the new NMF representations, we propose to minimize the maximum distance of the within-class pairs in the new NMF space, and meanwhile to maximize the minimum distance of the between-class pairs. With this criterion, we construct an objective function and optimize it with regard to basis and coefficient matrices, and slack variables alternatively, resulting in an iterative algorithm. The proposed algorithm is evaluated on three pattern classification problems and experiment results show that it outperforms the state-of-the-art supervised NMF methods.
Affiliation(s)
- Jim Jing-Yan Wang
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.
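The basic decomposition that the abstract above builds on, a nonnegative data matrix approximated by the product of a nonnegative basis matrix and a nonnegative coefficient matrix, can be sketched with the classical multiplicative updates. This is plain unsupervised NMF, not the supervised max-min distance algorithm proposed in the paper. The function name `nmf` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=300, eps=1e-9, seed=0):
    """Approximate nonnegative X (n_features, n_samples) as W @ H.

    W is the basis matrix (n_features, rank); each column of H
    (rank, n_samples) is the new representation of one data sample.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates for the squared Frobenius error;
        # they keep W and H nonnegative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

# Synthetic data (hypothetical): 30 features, 50 samples, true rank 4.
rng = np.random.default_rng(42)
X = rng.random((30, 4)) @ rng.random((4, 50))

W, H, errors = nmf(X, rank=4)
print(H.shape, errors[0], errors[-1])
```

A supervised variant such as the one described in the abstract would add terms to the objective that use class labels, so that the columns of `H` are closer for same-class pairs and further apart for different-class pairs.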
|
42
|
Liu JX, Liu J, Gao YL, Mi JX, Ma CX, Wang D. A class-information-based penalized matrix decomposition for identifying plants core genes responding to abiotic stresses. PLoS One 2014; 9:e106097. [PMID: 25180509 PMCID: PMC4152128 DOI: 10.1371/journal.pone.0106097] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/21/2014] [Accepted: 07/29/2014] [Indexed: 12/03/2022] Open
Abstract
In terms of making genes expression data more interpretable and comprehensible, there exists a significant superiority on sparse methods. Many sparse methods, such as penalized matrix decomposition (PMD) and sparse principal component analysis (SPCA), have been applied to extract plants core genes. Supervised algorithms, especially the support vector machine-recursive feature elimination (SVM-RFE) method, always have good performance in gene selection. In this paper, we draw into class information via the total scatter matrix and put forward a class-information-based penalized matrix decomposition (CIPMD) method to improve the gene identification performance of PMD-based method. Firstly, the total scatter matrix is obtained based on different samples of the gene expression data. Secondly, a new data matrix is constructed by decomposing the total scatter matrix. Thirdly, the new data matrix is decomposed by PMD to obtain the sparse eigensamples. Finally, the core genes are identified according to the nonzero entries in eigensamples. The results on simulation data show that CIPMD method can reach higher identification accuracies than the conventional gene identification methods. Moreover, the results on real gene expression data demonstrate that CIPMD method can identify more core genes closely related to the abiotic stresses than the other methods.
Affiliation(s)
- Jin-Xing Liu (corresponding author)
- School of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong, China
- Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Jian Liu
- School of Communication, Qufu Normal University, Rizhao, Shandong, China
- Ying-Lian Gao
- Library of Qufu Normal University, Qufu Normal University, Rizhao, Shandong, China
- Jian-Xun Mi
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- Chun-Xia Ma
- School of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong, China
- Dong Wang
- School of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong, China
|
43
|
Yang M, Li X, Li Z, Ou Z, Liu M, Liu S, Li X, Yang S. Gene features selection for three-class disease classification via multiple orthogonal partial least square discriminant analysis and S-plot using microarray data. PLoS One 2013; 8:e84253. [PMID: 24386356 PMCID: PMC3875537 DOI: 10.1371/journal.pone.0084253] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/03/2013] [Accepted: 11/12/2013] [Indexed: 11/28/2022] Open
Abstract
Motivation: DNA microarray analysis is characterized by obtaining a large number of gene variables from a small number of observations. Cluster analysis is widely used to analyze DNA microarray data to make classification and diagnosis of disease. Because there are so many irrelevant and insignificant genes in a dataset, a feature selection approach must be employed in data analysis. The performance of cluster analysis of this high-throughput data depends on whether the feature selection approach chooses the most relevant genes associated with disease classes.
Results: Here we proposed a new method using multiple Orthogonal Partial Least Squares-Discriminant Analysis (mOPLS-DA) models and S-plots to select the most relevant genes to conduct three-class disease classification and prediction. We tested our method using Golub’s leukemia microarray data. For three classes with subtypes, we proposed hierarchical orthogonal partial least squares-discriminant analysis (OPLS-DA) models and S-plots to select features for two main classes and their subtypes. For three classes in parallel, we employed three OPLS-DA models and S-plots to choose marker genes for each class. The power of feature selection to classify and predict three-class disease was evaluated using cluster analysis. Further, the general performance of our method was tested using four public datasets and compared with those of four other feature selection methods. The results revealed that our method effectively selected the most relevant features for disease classification and prediction, and its performance was better than that of the other methods.
Affiliation(s)
- Mingxing Yang
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
- Department of Electronic Science, School of Physics and Mechanical & Electrical Engineering, Xiamen University, Xiamen, China
- Xiumin Li
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
- Zhibin Li
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
- Zhimin Ou
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
- Ming Liu
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
- Suhuan Liu
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
- Xuejun Li (corresponding author)
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
- Department of Endocrinology and Diabetes, the First Affiliated Hospital of Xiamen University, Xiamen, China
- Shuyu Yang (corresponding author)
- Xiamen Diabetes Institute, the First Affiliated Hospital of Xiamen University, Xiamen, China
|
44
|
Abusamra H. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data of Glioma. Procedia Computer Science 2013. [DOI: 10.1016/j.procs.2013.10.003] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 11/30/2022]
|