1. Liu C, Li R, Wu S, Che H, Jiang D, Yu Z, Wong HS. Self-Guided Partial Graph Propagation for Incomplete Multiview Clustering. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:10803-10816. [PMID: 37028079 DOI: 10.1109/tnnls.2023.3244021] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/19/2023]
Abstract
In this work, we study a more realistic and challenging scenario in multiview clustering (MVC), referred to as incomplete MVC (IMVC), where some instances in certain views are missing. The key to IMVC is how to adequately exploit complementary and consistency information under the incompleteness of data. However, most existing methods address the incompleteness problem at the instance level, and they require sufficient information to perform data recovery. In this work, we develop a new approach to facilitate IMVC based on the graph propagation perspective. Specifically, a partial graph is used to describe the similarity of samples for incomplete views, such that the issue of missing instances can be translated into the missing entries of the partial graph. In this way, a common graph can be adaptively learned to self-guide the propagation process by exploiting the consistency information, and the propagated graph of each view is in turn used to refine the common self-guided graph in an iterative manner. Thus, the associated missing entries can be inferred through graph propagation by exploiting the consistency information across all views. On the other hand, existing approaches focus on the consistency structure only, and the complementary information has not been sufficiently exploited due to the data incompleteness issue. By contrast, under the proposed graph propagation framework, an exclusive regularization term can be naturally adopted to exploit the complementary information in our method. Extensive experiments demonstrate the effectiveness of the proposed method in comparison with state-of-the-art methods. The source code of our method is available at https://github.com/CLiu272/TNNLS-PGP.
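The idea of turning missing instances into missing entries of a partial graph can be illustrated with a toy sketch. This is not the authors' algorithm: it performs a single averaging-and-fill step rather than the iterative self-guided propagation described in the abstract, and the names `common_graph` and `fill_missing` as well as the example matrices are hypothetical.

```python
from typing import List, Optional

# A partial graph: similarity matrix where None marks an entry that is
# unavailable because one of the two samples is missing in that view.
PartialGraph = List[List[Optional[float]]]


def common_graph(views: List[PartialGraph]) -> List[List[float]]:
    """Average each entry over the views in which it is observed (assumed rule)."""
    n = len(views[0])
    common = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            observed = [v[i][j] for v in views if v[i][j] is not None]
            # If no view observes the pair, fall back to zero similarity.
            common[i][j] = sum(observed) / len(observed) if observed else 0.0
    return common


def fill_missing(view: PartialGraph, common: List[List[float]]) -> List[List[float]]:
    """Replace the missing entries of one view with the common-graph values."""
    n = len(view)
    return [
        [common[i][j] if view[i][j] is None else view[i][j] for j in range(n)]
        for i in range(n)
    ]


# Three samples, two views: sample 2 is missing in view 1, sample 0 in view 2.
view1: PartialGraph = [
    [1.0, 0.8, None],
    [0.8, 1.0, None],
    [None, None, None],
]
view2: PartialGraph = [
    [None, None, None],
    [None, 1.0, 0.4],
    [None, 0.4, 1.0],
]

common = common_graph([view1, view2])
filled1 = fill_missing(view1, common)
```

In this toy case the similarity between samples 1 and 2, which view 1 cannot observe, is borrowed from view 2 through the common graph; the pair (0, 2) is observed in neither view and stays at the fallback value.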
2. Zhu S, Wang W, Fang W, Cui M. Autoencoder-assisted latent representation learning for survival prediction and multi-view clustering on multi-omics cancer subtyping. Mathematical Biosciences and Engineering 2023; 20:21098-21119. [PMID: 38124589 DOI: 10.3934/mbe.2023933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Cancer subtyping (or cancer subtype identification) based on multi-omics data has played an important role in advancing diagnosis, prognosis and treatment, which has triggered the development of advanced multi-view clustering algorithms. However, the high dimensionality and heterogeneity of multi-omics data strongly affect the performance of these methods. In this paper, we propose to learn an informative latent representation based on an autoencoder (AE) to naturally capture nonlinear omic features in lower dimensions, which is helpful for identifying the similarity of patients. Moreover, to take advantage of survival information or clinical information, a multi-omic survival analysis approach is embedded when integrating the similarity graph of heterogeneous data at the multi-omics level. Then, the clustering method is performed on the integrated similarity to generate subtype groups. In the experimental part, the effectiveness of the proposed framework is confirmed by evaluating five different multi-omics datasets taken from The Cancer Genome Atlas. The results show that the AE-assisted multi-omics clustering method can identify clinically significant cancer subtypes.
Affiliation(s)
- Shuwei Zhu
  - School of Artificial Intelligence and Computer Science, Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
- Wenping Wang
  - School of Artificial Intelligence and Computer Science, Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
- Wei Fang
  - School of Artificial Intelligence and Computer Science, Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
- Meiji Cui
  - School of Intelligent Manufacturing, Nanjing University of Science and Technology, Nanjing 210094, China
3. Ge S, Liu J, Cheng Y, Meng X, Wang X. Multi-view spectral clustering with latent representation learning for applications on multi-omics cancer subtyping. Brief Bioinform 2023; 24:6850565. [PMID: 36445207 DOI: 10.1093/bib/bbac500] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/13/2022] [Revised: 09/19/2022] [Accepted: 10/22/2022] [Indexed: 11/30/2022] Open
Abstract
Driven by multi-omics data, some multi-view clustering algorithms have been successfully applied to cancer subtype prediction, aiming to identify subtypes with biometric differences in the same cancer, thereby improving the clinical prognosis of patients and enabling the design of personalized treatment plans. Because the number of patients in omics data is much smaller than the number of genes, multi-view spectral clustering based on similarity learning has been widely developed. However, these algorithms still suffer from some problems, such as over-reliance of the clustering results on the quality of pre-defined similarity matrices, inability to reasonably handle noise and redundant information in high-dimensional omics data, and neglect of complementary information between omics data. This paper proposes a multi-view spectral clustering with latent representation learning (MSCLRL) method to alleviate the above problems. First, MSCLRL generates a corresponding low-dimensional latent representation for each omics data type, which can effectively retain the unique information of each omics and improve the robustness and accuracy of the similarity matrix. Second, the obtained latent representations are assigned appropriate weights by MSCLRL, and global similarity learning is performed to generate an integrated similarity matrix. Third, the integrated similarity matrix is used to feed back and update the low-dimensional representation of each omics. Finally, the final integrated similarity matrix is used for clustering. On 10 benchmark multi-omics datasets and 2 separate cancer case studies, the experiments confirmed that the proposed method obtained statistically and biologically meaningful cancer subtypes.
Affiliation(s)
- Shuguang Ge
  - School of Information and Control Engineering, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China
  - Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China
- Jian Liu
  - School of Information and Control Engineering, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China
  - Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China
- Yuhu Cheng
  - School of Information and Control Engineering, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China
  - Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China
- Xiaojing Meng
  - School of Medical Information and Engineering, Xuzhou Medical University, No. 209, Tongshan Road, 221116 Xuzhou, Jiangsu, China
- Xuesong Wang
  - School of Information and Control Engineering, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China
  - Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, No. 1, Daxue Road, 221116 Xuzhou, Jiangsu, China
4. Chen J, Rong W, Tao G, Cai H. Similarity Fusion via Exploiting High Order Proximity for Cancer Subtyping. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:658-667. [PMID: 34971537 DOI: 10.1109/tcbb.2021.3139597] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
Identifying cancer subtypes holds essential promise for improving prognosis and personalized treatment. Cancer subtyping based on multi-omics data has become a hotspot in bioinformatics research. One of the critical approaches to handling data heterogeneity in multi-omics data is to first model each omics data type as a separate similarity graph. Then, the information of multiple graphs is integrated into a unified graph. However, a significant challenge is how to measure the similarity of nodes in each graph and preserve the cluster information of each graph. To that end, we exploit a new high order proximity in each graph and propose a similarity fusion method to fuse the high order proximity of multiple graphs while preserving the cluster information of multiple graphs. Compared with the current techniques employing the first order proximity, exploiting high order proximity contributes to attaining accurate similarity. The proposed similarity fusion method makes full use of the complementary information from multi-omics data. Experiments on six benchmark multi-omics datasets and two individual cancer case studies confirm that our proposed method achieves statistically significant and biologically meaningful cancer subtypes.
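The abstract does not define its high order proximity, so the following is only a generic illustration of the contrast with first order proximity. It assumes second order proximity is taken as the two-step random-walk transition matrix of a similarity graph; the helper names and the example graph are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(w: Matrix) -> Matrix:
    """Turn a non-negative similarity matrix into row-stochastic transition probabilities."""
    out = []
    for row in w:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two dense matrices."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def second_order_proximity(w: Matrix) -> Matrix:
    """Two-step random-walk proximity: P @ P with P the row-normalized graph."""
    p = row_normalize(w)
    return matmul(p, p)


# Path graph 0 - 1 - 2: nodes 0 and 2 share a neighbour but have no direct edge.
w = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
first = row_normalize(w)
second = second_order_proximity(w)
```

Here `first[0][2]` is zero because nodes 0 and 2 are not directly connected, whereas `second[0][2]` is positive because both are linked through node 1, which is the kind of indirect similarity that first order proximity misses.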
5. Alfatemi A, Peng H, Rong W, Zhang B, Cai H. Patient subgrouping with distinct survival rates via integration of multiomics data on a Grassmann manifold. BMC Med Inform Decis Mak 2022; 22:190. [PMID: 35870923 PMCID: PMC9308936 DOI: 10.1186/s12911-022-01938-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Accepted: 07/15/2022] [Indexed: 11/10/2022] Open
Abstract
Background: Patient subgroups are important for easily understanding a disease and for providing precise yet personalized treatment through multiple omics dataset integration. Multiomics datasets are produced daily. Thus, the fusion of heterogeneous big data into intrinsic structures is an urgent problem. Novel mathematical methods are needed to process these data in a straightforward way. Results: We developed a novel method for subgrouping patients with distinct survival rates via the integration of multiple omics datasets and by using principal component analysis to reduce the high data dimensionality. Then, we constructed similarity graphs for patients, merged the graphs in a subspace, and analyzed them on a Grassmann manifold. The proposed method could identify patient subgroups that had not been reported previously by selecting the most critical information during the merging at each level of the omics dataset. Our method was tested on empirical multiomics datasets from The Cancer Genome Atlas. Conclusion: Through the integration of microRNA, gene expression, and DNA methylation data, our method accurately identified patient subgroups and achieved superior performance compared with popular methods. Supplementary Information: The online version contains supplementary material available at 10.1186/s12911-022-01938-y.
6. Ranek JS, Stanley N, Purvis JE. Integrating temporal single-cell gene expression modalities for trajectory inference and disease prediction. Genome Biol 2022; 23:186. [PMID: 36064614 PMCID: PMC9442962 DOI: 10.1186/s13059-022-02749-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/01/2022] [Accepted: 08/16/2022] [Indexed: 02/04/2023] Open
Abstract
Background: Current methods for analyzing single-cell datasets have relied primarily on static gene expression measurements to characterize the molecular state of individual cells. However, capturing temporal changes in cell state is crucial for the interpretation of dynamic phenotypes such as the cell cycle, development, or disease progression. RNA velocity infers the direction and speed of transcriptional changes in individual cells, yet it is unclear how these temporal gene expression modalities may be leveraged for predictive modeling of cellular dynamics. Results: Here, we present the first task-oriented benchmarking study that investigates integration of temporal sequencing modalities for dynamic cell state prediction. We benchmark ten integration approaches on ten datasets spanning different biological contexts, sequencing technologies, and species. We find that integrated data more accurately infers biological trajectories and achieves increased performance on classifying cells according to perturbation and disease states. Furthermore, we show that simple concatenation of spliced and unspliced molecules performs consistently well on classification tasks and can be used over more memory intensive and computationally expensive methods. Conclusions: This work illustrates how integrated temporal gene expression modalities may be leveraged for predicting cellular trajectories and sample-associated perturbation and disease phenotypes. Additionally, this study provides users with practical recommendations for task-specific integration of single-cell gene expression modalities.
Affiliation(s)
- Jolene S. Ranek
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, USA
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Natalie Stanley
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, USA
  - Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Jeremy E. Purvis
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, USA
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, USA
7. Consistent Affinity Representation Learning with Dual Low-rank Constraints for Multi-view Subspace Clustering. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
8. Khan A, Maji P. Multi-Manifold Optimization for Multi-View Subspace Clustering. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:3895-3907. [PMID: 33606638 DOI: 10.1109/tnnls.2021.3054789] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 06/12/2023]
Abstract
The meaningful patterns embedded in high-dimensional multi-view data sets typically tend to have a much more compact representation that often lies close to a low-dimensional manifold. Identification of hidden structures in such data mainly depends on the proper modeling of the geometry of low-dimensional manifolds. In this regard, this article presents a manifold optimization-based integrative clustering algorithm for multi-view data. To identify consensus clusters, the algorithm constructs a joint graph Laplacian that contains denoised cluster information of the individual views. It optimizes a joint clustering objective while reducing the disagreement between the cluster structures conveyed by the joint and individual views. The optimization is performed alternately over k-means and Stiefel manifolds. The Stiefel manifold helps to model the nonlinearities and differential clusters within the individual views, whereas the k-means manifold tries to elucidate the best-fit joint cluster structure of the data. A gradient-based movement is performed separately on the manifold of each view so that individual nonlinearity is preserved while looking for shared cluster information. The convergence of the proposed algorithm is established over the manifold, and an asymptotic convergence bound is obtained to quantify theoretically how fast the sequence of iterates generated by the algorithm converges to an optimal solution. The integrative clustering on benchmark and multi-omics cancer data sets demonstrates that the proposed algorithm outperforms state-of-the-art multi-view clustering approaches.
9. Al-Kuhali HA, Shan M, Hael MA, Al-Hada EA, Al-Murisi SA, Al-Kuhali AA, Aldaifl AAQ, Amin ME. Multiview clustering of multi-omics data integration by using a penalty model. BMC Bioinformatics 2022; 23:288. [PMID: 35864439 PMCID: PMC9306064 DOI: 10.1186/s12859-022-04826-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2021] [Accepted: 06/20/2022] [Indexed: 11/10/2022] Open
Abstract
Background: Methods for the multiview clustering and integration of multi-omics data have been developed recently to solve problems caused by data noise or limited sample size and to integrate multi-omics data with consistent (common) and differential cluster patterns. However, the integration of such data still suffers from limited performance and low accuracy. Results: In this study, a computational framework for the multiview clustering method based on the penalty model is presented to overcome the challenges of low accuracy and limited performance in the case of integrating multi-omics data with consistent (common) and differential cluster patterns. The performance of the proposed method was evaluated on synthetic data and four real multi-omics datasets and then compared with approaches presented in the literature under different scenarios. The results imply that our method exhibits competitive performance compared with recently developed techniques when the underlying clusters are consistent with synthetic data. In the case of the differential clusters, the proposed method also presents an enhanced performance. In addition, with regard to real omics data, the developed method exhibits better performance, demonstrating its ability to provide more detailed information within each data type and working better to integrate multi-omics data with consistent (common) and differential cluster patterns. This study shows that the proposed method offers more significant differences in survival times across all types of cancer. Conclusions: A new multiview clustering method is proposed in this study based on synthetic and real data. This method performs better than other techniques previously presented in the literature in terms of integrating multi-omics data with consistent and differential cluster patterns and determining the significance of difference in survival times.
Affiliation(s)
- Hamas A Al-Kuhali
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, China
- Ma Shan
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, China
- Eman A Al-Hada
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, China
- Ammar A Q Aldaifl
  - School of Information Engineering, Wuhan University of Technology, Wuhan, China
- Mohammed Elmustafa Amin
  - Department of Mathematics, Faculty of Science and Technology, Omdurman Islamic University, Khartoum, Sudan
10. Liu C, Cao W, Wu S, Shen W, Jiang D, Yu Z, Wong HS. Supervised Graph Clustering for Cancer Subtyping Based on Survival Analysis and Integration of Multi-Omic Tumor Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1193-1202. [PMID: 32750893 DOI: 10.1109/tcbb.2020.3010509] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/11/2023]
Abstract
Identifying cancer subtypes by integration of multi-omic data is beneficial to improve the understanding of disease progression, and provides more precise treatment for patients. Cancer subtype identification is usually accomplished by clustering patients with unsupervised learning approaches. Thus, most existing integrative cancer subtyping methods are performed in an entirely unsupervised way. An integrative cancer subtyping approach can be improved to discover clinically more relevant cancer subtypes when considering the clinical survival response variables. In this study, we propose Survival Supervised Graph Clustering (S2GC) for cancer subtyping by taking into consideration survival information. Specifically, we use a graph to represent similarity of patients, and develop a multi-omic survival analysis embedding with patient-to-patient similarity graph learning for cancer subtype identification. The multi-view (omic) survival analysis model and graph of patients are jointly learned in a unified way. The learned optimal graph can be utilized to cluster cancer subtypes directly. In the proposed model, the survival analysis model and adaptive graph learning could positively reinforce each other. Consequently, the survival time can be considered as supervised information to improve the quality of the similarity graph and explore clinically more relevant subgroups of patients. Experiments on several representative multi-omic cancer datasets demonstrate that the proposed method achieves better results than a number of state-of-the-art methods. The results also suggest that our method is able to identify biologically meaningful subgroups for different cancer types. (Our Matlab source code is available online at GitHub: https://github.com/CLiu272/S2GC).
11. Li X, Ma J, Leng L, Han M, Li M, He F, Zhu Y. MoGCN: A Multi-Omics Integration Method Based on Graph Convolutional Network for Cancer Subtype Analysis. Front Genet 2022; 13:806842. [PMID: 35186034 PMCID: PMC8847688 DOI: 10.3389/fgene.2022.806842] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 11/01/2021] [Accepted: 01/14/2022] [Indexed: 12/17/2022] Open
Abstract
In light of the rapid accumulation of large-scale omics datasets, numerous studies have attempted to characterize the molecular and clinical features of cancers from a multi-omics perspective. However, there are great challenges in integrating multi-omics using machine learning methods for cancer subtype classification. In this study, MoGCN, a multi-omics integration model based on graph convolutional network (GCN), was developed for cancer subtype classification and analysis. Genomics, transcriptomics and proteomics datasets for 511 breast invasive carcinoma (BRCA) samples were downloaded from the Cancer Genome Atlas (TCGA). The autoencoder (AE) and the similarity network fusion (SNF) methods were used to reduce dimensionality and construct the patient similarity network (PSN), respectively. Then the vector features and the PSN were input into the GCN for training and testing. Feature extraction and network visualization were used for further biological knowledge discovery and subtype classification. In the analysis of multi-dimensional omics data of the BRCA samples in TCGA, MoGCN achieved the highest accuracy in cancer subtype classification compared with several popular algorithms. Moreover, MoGCN can extract the most significant features of each omics layer and provide candidate functional molecules for further analysis of their biological effects. In addition, network visualization showed that MoGCN could make clinically intuitive diagnoses. The generality of MoGCN was proven on the TCGA pan-kidney cancer datasets. MoGCN and datasets are publicly available at https://github.com/Lifoof/MoGCN. Our study shows that MoGCN performs well for heterogeneous data integration and the interpretability of classification results, which confers great potential for applications in biomarker identification and clinical diagnosis.
Affiliation(s)
- Xiao Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Jie Ma
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Ling Leng
  - Stem Cell and Regenerative Medicine Lab, Department of Medical Science Research Center, State Key Laboratory of Complex Severe and Rare Diseases, Translational Medicine Center, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing, China
- Mingfei Han
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Mansheng Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Fuchu He
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Yunping Zhu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
12. Ward RA, Aghaeepour N, Bhattacharyya RP, Clish CB, Gaudillière B, Hacohen N, Mansour MK, Mudd PA, Pasupneti S, Presti RM, Rhee EP, Sen P, Spec A, Tam JM, Villani AC, Woolley AE, Hsu JL, Vyas JM. Harnessing the Potential of Multiomics Studies for Precision Medicine in Infectious Disease. Open Forum Infect Dis 2021; 8:ofab483. [PMID: 34805429 PMCID: PMC8598922 DOI: 10.1093/ofid/ofab483] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/31/2021] [Accepted: 09/21/2021] [Indexed: 12/11/2022] Open
Abstract
The field of infectious diseases currently takes a reactive approach and treats infections as they present in patients. Although certain populations are known to be at greater risk of developing infection (eg, immunocompromised), we lack a systems approach to define the true risk of future infection for a patient. Guided by impressive gains in "omics" technologies, future strategies for infectious diseases should take a precision approach to infection through identification of patients at intermediate and high risk of infection and deploy targeted preventative measures (ie, prophylaxis). The advances of high-throughput immune profiling by multiomics approaches (ie, transcriptomics, epigenomics, metabolomics, proteomics) hold the promise to identify patients at increased risk of infection and enable risk-stratifying approaches to be applied in the clinic. Integration of patient-specific data using machine learning improves the effectiveness of prediction, providing the technologies needed to propel the field of infectious diseases medicine into the era of personalized medicine.
Collapse
Affiliation(s)
- Rebecca A Ward
- Division of Infectious Disease, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
| | - Nima Aghaeepour
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, California, USA
- Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Palo Alto, California, USA
| | - Roby P Bhattacharyya
- Division of Infectious Disease, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Clary B Clish
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Brice Gaudillière
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, California, USA
- Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, USA
| | - Nir Hacohen
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Cancer for Cancer Research, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
| | - Michael K Mansour
- Division of Infectious Disease, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
| | - Philip A Mudd
- Department of Emergency Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Shravani Pasupneti
- Division of Pulmonary, Allergy and Critical Care Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
- Veterans Affairs Palo Alto Health Care System, Medical Service, Palo Alto, California, USA
| | - Rachel M Presti
- Division of Infectious Diseases, Department of lnternal Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
- Center for Vaccines and Immunity to Microbial Pathogens, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Eugene P Rhee
- The Nephrology Division and Endocrine Unit, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Pritha Sen
- Division of Infectious Disease, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Center for Immunology and Inflammatory Diseases, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
| | - Andrej Spec
- Division of Infectious Diseases, Department of lnternal Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Jenny M Tam
- Harvard Medical School, Boston, Massachusetts, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, Massachusetts, USA
| | - Alexandra-Chloé Villani
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Center for Immunology and Inflammatory Diseases, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
| | - Ann E Woolley
- Division of Infectious Diseases, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Joe L Hsu
- Division of Pulmonary, Allergy and Critical Care Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
- Veterans Affairs Palo Alto Health Care System, Medical Service, Palo Alto, California, USA
| | - Jatin M Vyas
- Division of Infectious Disease, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
| |
|
13
|
Wang H, Han G, Zhang B, Tao G, Cai H. Multi-View Learning a Decomposable Affinity Matrix via Tensor Self-Representation on Grassmann Manifold. IEEE TRANSACTIONS ON IMAGE PROCESSING: A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:8396-8409. [PMID: 34587010 DOI: 10.1109/tip.2021.3114995] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/13/2023]
Abstract
Multi-view clustering aims to partition objects into potential categories by utilizing cross-view information. One of the core issues is to sufficiently leverage different views to learn a latent subspace, within which the clustering task is performed. Recently, it has been shown that representing the multi-view data by a tensor and then learning a latent self-expressive tensor is effective. However, early works mainly focus on learning an essential tensor representation from multi-view data, and the resulting affinity matrix is treated as a byproduct or is computed by a simple average in Euclidean space, thereby destroying the intrinsic clustering structure. To address this, we propose a novel multi-view clustering method that directly learns a well-structured affinity matrix driven by the clustering task on the Grassmann manifold. Specifically, we first employ a tensor learning model to unify multiple feature spaces into a latent low-rank tensor space. Each individual view is then merged on the Grassmann manifold to obtain both an integrative subspace and a consensus affinity matrix, driven by the clustering task. The two parts are modeled by a unified objective function and optimized jointly to mine a decomposable affinity matrix. Extensive experiments on eight real-world datasets show that our method achieves superior performance over other popular methods.
|
14
|
Tian J, Zhao J, Zheng C. Clustering of cancer data based on Stiefel manifold for multiple views. BMC Bioinformatics 2021; 22:268. [PMID: 34034643 PMCID: PMC8152349 DOI: 10.1186/s12859-021-04195-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/24/2021] [Accepted: 05/12/2021] [Indexed: 12/23/2022] Open
Abstract
Background: In recent years, various sequencing techniques have been used to collect biomedical omics datasets. It is usually possible to obtain multiple types of omics data from a single patient sample. Clustering of omics data plays an indispensable role in biological and medical research, and it helps reveal data structures from multiple collections. Nevertheless, clustering of omics data poses many challenges. The primary challenges in omics data analysis come from the high dimensionality of the data and the small sample size. Therefore, it is difficult to find a suitable integration method for structural analysis of multiple datasets. Results: In this paper, a multi-view clustering method based on the Stiefel manifold (MCSM) is proposed. The MCSM method comprises three core steps. First, we establish a binary optimization model for the simultaneous clustering problem. Second, we solve the optimization problem with a linear search algorithm based on the Stiefel manifold. Finally, we integrate the clustering results obtained from three omics by using the k-nearest neighbor method. We applied this approach to four cancer datasets from TCGA. The results show that our method is superior to several state-of-the-art methods, which depend on the hypothesis that the underlying omics cluster class is the same. Conclusion: In particular, our approach performs better than the compared approaches when the underlying clusters are inconsistent. For patients with different subtypes, both consistent and differential clusters can be identified at the same time.
Affiliation(s)
- Jing Tian
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
| | - Jianping Zhao
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
| | - Chunhou Zheng
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- School of Computer Science and Technology, Anhui University, Hefei, China
| |
|