1
|
Li H, Wang J, Zhang N, Zhang W. Binary matrix factorization via collaborative neurodynamic optimization. Neural Netw 2024; 176:106348. [PMID: 38735099 DOI: 10.1016/j.neunet.2024.106348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2024] [Revised: 03/19/2024] [Accepted: 04/25/2024] [Indexed: 05/14/2024]
Abstract
Binary matrix factorization is an important tool for dimension reduction for high-dimensional datasets with binary attributes and has been successfully applied in numerous areas. This paper presents a collaborative neurodynamic optimization approach to binary matrix factorization based on the original combinatorial optimization problem formulation and quadratic unconstrained binary optimization problem reformulations. The proposed approach employs multiple discrete Hopfield networks operating concurrently in search of local optima. In addition, a particle swarm optimization rule is used to reinitialize neuronal states iteratively to escape from local minima toward better ones. Experimental results on eight benchmark datasets are elaborated to demonstrate the superior performance of the proposed approach against six baseline algorithms in terms of factorization error. Additionally, the viability of the proposed approach is demonstrated for pattern discovery on three datasets.
Collapse
Affiliation(s)
- Hongzong Li
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong.
| | - Jun Wang
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong; School of Data Science, City University of Hong Kong, Kowloon, Hong Kong.
| | - Nian Zhang
- Department of Electrical & Computer Engineering, University of the District of Columbia, Washington, DC, USA.
| | - Wei Zhang
- Chongqing Engineering Research Center of Internet of Things and Intelligent Control Technology, Chongqing Three Gorges University, Chongqing, China.
| |
Collapse
|
2
|
Zhang W, Wendt C, Bowler R, Hersh CP, Safo SE. Robust integrative biclustering for multi-view data. Stat Methods Med Res 2022; 31:2201-2216. [PMID: 36113157 PMCID: PMC10153449 DOI: 10.1177/09622802221122427] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
In many biomedical research, multiple views of data (e.g. genomics, proteomics) are available, and a particular interest might be the detection of sample subgroups characterized by specific groups of variables. Biclustering methods are well-suited for this problem as they assume that specific groups of variables might be relevant only to specific groups of samples. Many biclustering methods exist for detecting row-column clusters in a view but few methods exist for data from multiple views. The few existing algorithms are heavily dependent on regularization parameters for getting row-column clusters, and they impose unnecessary burden on users thus limiting their use in practice. We extend an existing biclustering method based on sparse singular value decomposition for single-view data to data from multiple views. Our method, integrative sparse singular value decomposition (iSSVD), incorporates stability selection to control Type I error rates, estimates the probability of samples and variables to belong to a bicluster, finds stable biclusters, and results in interpretable row-column associations. Simulations and real data analyses show that integrative sparse singular value decomposition outperforms several other single- and multi-view biclustering methods and is able to detect meaningful biclusters. iSSVD is a user-friendly, computationally efficient algorithm that will be useful in many disease subtyping applications.
Collapse
Affiliation(s)
- Weijie Zhang
- Division of Biostatistics, 5635University of Minnesota, MN, USA
| | - Christine Wendt
- Division of Pulmonary, Allergy and Critical Care, 5635University of Minnesota, MN, USA
| | - Russel Bowler
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, 551774National Jewish Health, Denver, USA
| | - Craig P Hersh
- Channing Division of Network Medicine, Brigham and Women's Hospital, 1811Harvard Medical School, USA
| | - Sandra E Safo
- Division of Biostatistics, 5635University of Minnesota, MN, USA
| |
Collapse
|
3
|
Wan C, Dang P, Zhao T, Zang Y, Zhang C, Cao S. Bias Aware Probabilistic Boolean Matrix Factorization. PROCEEDINGS OF MACHINE LEARNING RESEARCH 2022; 180:2035-2044. [PMID: 37576874 PMCID: PMC10421704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 08/15/2023]
Abstract
Boolean matrix factorization (BMF) is a combinatorial problem arising from a wide range of applications including recommendation system, collaborative filtering, and dimensionality reduction. Currently, the noise model of existing BMF methods is often assumed to be homoscedastic; however, in real world data scenarios, the deviations of observed data from their true values are almost surely diverse due to stochastic noises, making each data point not equally suitable for fitting a model. In this case, it is not ideal to treat all data points as equally distributed. Motivated by such observations, we introduce a probabilistic BMF model that recognizes the object- and feature-wise bias distribution respectively, called bias aware BMF (BABF). To the best of our knowledge, BABF is the first approach for Boolean decomposition with consideration of the feature-wise and object-wise bias in binary data. We conducted experiments on datasets with different levels of background noise, bias level, and sizes of the signal patterns, to test the effectiveness of our method in various scenarios. We demonstrated that our model outperforms the state-of-the-art factorization methods in both accuracy and efficiency in recovering the original datasets, and the inferred bias level is highly significantly correlated with true existing bias in both simulated and real world datasets.
Collapse
Affiliation(s)
- Changlin Wan
- Indiana University, Indianapolis, Indiana, United States
- Purdue University, West Lafayette, Indiana, United States
| | - Pengtao Dang
- Indiana University, Indianapolis, Indiana, United States
| | - Tong Zhao
- Amazon, Seattle, Washington, United States
| | - Yong Zang
- Indiana University, Indianapolis, Indiana, United States
| | - Chi Zhang
- Indiana University, Indianapolis, Indiana, United States
| | - Sha Cao
- Indiana University, Indianapolis, Indiana, United States
| |
Collapse
|
4
|
Trnecka M, Vyjidacek R. Revisiting the GreCon algorithm for Boolean matrix factorization. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
5
|
Malik OA, Ushijima-Mwesigwa H, Roy A, Mandal A, Ghosh I. Binary matrix factorization on special purpose hardware. PLoS One 2021; 16:e0261250. [PMID: 34914786 PMCID: PMC8675762 DOI: 10.1371/journal.pone.0261250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2021] [Accepted: 11/25/2021] [Indexed: 11/22/2022] Open
Abstract
Many fundamental problems in data mining can be reduced to one or more NP-hard combinatorial optimization problems. Recent advances in novel technologies such as quantum and quantum-inspired hardware promise a substantial speedup for solving these problems compared to when using general purpose computers but often require the problem to be modeled in a special form, such as an Ising or quadratic unconstrained binary optimization (QUBO) model, in order to take advantage of these devices. In this work, we focus on the important binary matrix factorization (BMF) problem which has many applications in data mining. We propose two QUBO formulations for BMF. We show how clustering constraints can easily be incorporated into these formulations. The special purpose hardware we consider is limited in the number of variables it can handle which presents a challenge when factorizing large matrices. We propose a sampling based approach to overcome this challenge, allowing us to factorize large rectangular matrices. In addition to these methods, we also propose a simple baseline algorithm which outperforms our more sophisticated methods in a few situations. We run experiments on the Fujitsu Digital Annealer, a quantum-inspired complementary metal-oxide-semiconductor (CMOS) annealer, on both synthetic and real data, including gene expression data. These experiments show that our approach is able to produce more accurate BMFs than competing methods.
Collapse
Affiliation(s)
- Osman Asif Malik
- Department of Applied Mathematics, University of Colorado Boulder, Boulder, CO, United States of America
| | | | - Arnab Roy
- Fujitsu Research of America, Inc., Sunnyvale, CA, United States of America
| | - Avradip Mandal
- Fujitsu Research of America, Inc., Sunnyvale, CA, United States of America
| | - Indradeep Ghosh
- Fujitsu Research of America, Inc., Sunnyvale, CA, United States of America
| |
Collapse
|
6
|
Liang L, Zhu K, Tao J, Lu S. ORN: Inferring patient-specific dysregulation status of pathway modules in cancer with OR-gate Network. PLoS Comput Biol 2021; 17:e1008792. [PMID: 33819263 PMCID: PMC8049496 DOI: 10.1371/journal.pcbi.1008792] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2020] [Revised: 04/15/2021] [Accepted: 02/15/2021] [Indexed: 01/26/2023] Open
Abstract
Pathway level understanding of cancer plays a key role in precision oncology. However, the current amount of high-throughput data cannot support the elucidation of full pathway topology. In this study, instead of directly learning the pathway network, we adapted the probabilistic OR gate to model the modular structure of pathways and regulon. The resulting model, OR-gate Network (ORN), can simultaneously infer pathway modules of somatic alterations, patient-specific pathway dysregulation status, and downstream regulon. In a trained ORN, the differentially expressed genes (DEGs) in each tumour can be explained by somatic mutations perturbing a pathway module. Furthermore, the ORN handles one of the most important properties of pathway perturbation in tumours, the mutual exclusivity. We have applied the ORN to lower-grade glioma (LGG) samples and liver hepatocellular carcinoma (LIHC) samples in TCGA and breast cancer samples from METABRIC. Both datasets have shown abnormal pathway activities related to immune response and cell cycles. In LGG samples, ORN identified pathway modules closely related to glioma development and revealed two pathways closely related to patient survival. We had similar results with LIHC samples. Additional results from the METABRIC datasets showed that ORN could characterize critical mechanisms of cancer and connect them to less studied somatic mutations (e.g., BAP1, MIR604, MICAL3, and telomere activities), which may generate novel hypothesis for targeted therapy.
Collapse
Affiliation(s)
- Lifan Liang
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Kunju Zhu
- Clinical Medicine Research Institute, Jinan University, Guangzhou, Guangdong, China
| | - Junyan Tao
- Department of Pathology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Songjian Lu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| |
Collapse
|