1
|
Li Z, Melograna F, Hoskens H, Duroux D, Marazita ML, Walsh S, Weinberg SM, Shriver MD, Müller-Myhsok B, Claes P, Van Steen K. netMUG: a novel network-guided multi-view clustering workflow for dissecting genetic and facial heterogeneity. Front Genet 2023; 14:1286800. [PMID: 38125750 PMCID: PMC10731261 DOI: 10.3389/fgene.2023.1286800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2023] [Accepted: 11/14/2023] [Indexed: 12/23/2023] Open
Abstract
Introduction: Multi-view data offer advantages over single-view data for characterizing individuals, which is crucial in precision medicine toward personalized prevention, diagnosis, or treatment follow-up. Methods: Here, we develop a network-guided multi-view clustering framework named netMUG to identify actionable subgroups of individuals. This pipeline first adopts sparse multiple canonical correlation analysis to select multi-view features possibly informed by extraneous data, which are then used to construct individual-specific networks (ISNs). Finally, the individual subtypes are automatically derived by hierarchical clustering on these network representations. Results: We applied netMUG to a dataset containing genomic data and facial images to obtain BMI-informed multi-view strata and showed how it could be used for a refined obesity characterization. Benchmark analysis of netMUG on synthetic data with known strata of individuals indicated its superior performance compared with both baseline and benchmark methods for multi-view clustering. The clustering derived from netMUG achieved an adjusted Rand index of 1 with respect to the synthesized true labels. In addition, the real-data analysis revealed subgroups strongly linked to BMI and genetic and facial determinants of these subgroups. Discussion: netMUG provides a powerful strategy, exploiting individual-specific networks to identify meaningful and actionable strata. Moreover, the implementation is easy to generalize to accommodate heterogeneous data sources or highlight data structures.
Collapse
Affiliation(s)
- Zuqi Li
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
| | - Federico Melograna
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Hanne Hoskens
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
| | - Diane Duroux
- BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
| | - Mary L. Marazita
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Susan Walsh
- Department of Biology, Indiana University Indianapolis, Indianapolis, IN, United States
| | - Seth M. Weinberg
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Mark D. Shriver
- Department of Anthropology, Pennsylvania State University, State College, PA, United States
| | | | - Peter Claes
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Department of Electrical Engineering, KU Leuven, Leuven, Belgium
- Murdoch Children’s Research Institute, Melbourne, VIC, Australia
| | - Kristel Van Steen
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
- BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
| |
Collapse
|
2
|
Hassan Zada MS, Yuan B, Khan WA, Anjum A, Reiff-Marganiec S, Saleem R. A unified graph model based on molecular data binning for disease subtyping. J Biomed Inform 2022; 134:104187. [PMID: 36055637 DOI: 10.1016/j.jbi.2022.104187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2022] [Revised: 08/05/2022] [Accepted: 08/25/2022] [Indexed: 11/19/2022]
Abstract
Molecular disease subtype discovery from omics data is an important research problem in precision medicine.The biggest challenges are the skewed distribution and data variability in the measurements of omics data. These challenges complicate the efficient identification of molecular disease subtypes defined by clinical differences, such as survival. Existing approaches adopt kernels to construct patient similarity graphs from each view through pairwise matching. However, the distance functions used in kernels are unable to utilize the potentially critical information of extreme values and data variability which leads to the lack of robustness. In this paper, a novel robust distance metric (ROMDEX) is proposed to construct similarity graphs for molecular disease subtypes from omics data, which is able to address the data variability and extreme values challenges. The proposed approach is validated on multiple TCGA cancer datasets, and the results are compared with multiple baseline disease subtyping methods. The evaluation of results is based on Kaplan-Meier survival time analysis, which is validated using statistical tests e.g, Cox-proportional hazard (Cox p-value). We reject the null hypothesis that the cohorts have the same hazard, for the P-values less than 0.05. The proposed approach achieved best P-values of 0.00181, 0.00171, and 0.00758 for Gene Expression, DNA Methylation, and MicroRNA data respectively, which shows significant difference in survival between the cohorts. In the results, the proposed approach outperformed the existing state-of-the-art (MRGC, PINS, SNF, Consensus Clustering and Icluster+) disease subtyping approaches on various individual disease views of multiple TCGA datasets.
Collapse
Affiliation(s)
| | - Bo Yuan
- School of Computing and Mathematical Sciences, University of Leicester, United Kingdom.
| | - Wajahat Ali Khan
- School of Computing and Engineering, University of Derby, United Kingdom.
| | - Ashiq Anjum
- School of Computing and Mathematical Sciences, University of Leicester, United Kingdom.
| | | | - Rabia Saleem
- School of Computing and Engineering, University of Derby, United Kingdom.
| |
Collapse
|