1
Ballard JL, Wang Z, Li W, Shen L, Long Q. Deep learning-based approaches for multi-omics data integration and analysis. BioData Min 2024; 17:38. [PMID: 39358793 PMCID: PMC11446004 DOI: 10.1186/s13040-024-00391-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 09/06/2024] [Indexed: 10/04/2024] Open
Abstract
BACKGROUND The rapid growth of deep learning, as well as the vast and ever-growing amount of available data, has provided ample opportunity for advances in fusion and analysis of complex and heterogeneous data types. Different data modalities provide complementary information that can be leveraged to gain a more complete understanding of each subject. In the biomedical domain, multi-omics data includes molecular (genomics, transcriptomics, proteomics, epigenomics, metabolomics, etc.) and imaging (radiomics, pathomics) modalities which, when combined, have the potential to improve performance on prediction, classification, clustering and other tasks. Deep learning encompasses a wide variety of methods, each of which has certain strengths and weaknesses for multi-omics integration. METHOD In this review, we categorize recent deep learning-based approaches by their basic architectures and discuss their unique capabilities in relation to one another. We also discuss some emerging themes advancing the field of multi-omics integration. RESULTS Deep learning-based multi-omics integration methods were categorized broadly into non-generative (feedforward neural networks, graph convolutional neural networks, and autoencoders) and generative (variational methods, generative adversarial models, and a generative pretrained model). Generative methods have the advantage of being able to impose constraints on the shared representations to enforce certain properties or incorporate prior knowledge. They can also be used to generate or impute missing modalities. Recent advances achieved by these methods include the ability to handle incomplete data as well as going beyond the traditional molecular omics data types to integrate other modalities such as imaging data. CONCLUSION We expect to see further growth in methods that can handle missingness, as this is a common challenge in working with complex and heterogeneous data. Additionally, methods that integrate more data types are expected to improve performance on downstream tasks by capturing a comprehensive view of each sample.
Affiliation(s)
- Jenna L Ballard
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA.
- Zexuan Wang
- Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania, 209 S. 33rd Street, Philadelphia, PA, 19104, USA
- Wenrui Li
- Department of Statistics, University of Connecticut, 215 Glenbrook Road, Storrs, CT, 06269, USA
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, 19104, USA.
- Qi Long
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, 19104, USA.
2
Hulot JS, Janiak P, Boutinaud P, Boutouyrie P, Chézalviel-Guilbert F, Christophe JJ, Cohen A, Damy T, Djadi-Prat J, Firat H, Hervé PY, Isnard R, Jondeau G, Mousseaux E, Pernot M, Prot P, Tyl B, Soulat G, Logeart D. Rationale and design of the PACIFIC-PRESERVED (PhenomApping, ClassIFication and Innovation for Cardiac dysfunction in patients with heart failure and PRESERVED left ventricular ejection fraction) study. Arch Cardiovasc Dis 2024; 117:332-342. [PMID: 38644067 DOI: 10.1016/j.acvd.2024.02.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 02/23/2024] [Accepted: 02/26/2024] [Indexed: 04/23/2024]
Abstract
BACKGROUND Heart failure with preserved ejection fraction (HFpEF) is a heterogeneous syndrome that is poorly defined, reflecting an incomplete understanding of its pathophysiology. AIM To redefine the phenotypic spectrum of HFpEF. METHODS The PACIFIC-PRESERVED study is a prospective multicentre cohort study designed to perform multidimensional deep phenotyping of patients diagnosed with HFpEF (left ventricular ejection fraction ≥ 50%), patients with heart failure with reduced ejection fraction (left ventricular ejection fraction ≤ 40%) and subjects without overt heart failure (3:2:1 ratio). The study proposes prospective investigations in patients during a 1-day hospital stay: physical examination; electrocardiogram; performance-based tests; blood samples; cardiac magnetic resonance imaging; transthoracic echocardiography (rest and low-level exercise); myocardial shear wave elastography; chest computed tomography; and non-invasive measurement of arterial stiffness. Dyspnoea, depression, general health and quality of life will be assessed by dedicated questionnaires. A biobank will be established. After the hospital stay, patients are asked to wear a connected garment (with digital sensors) to collect electrocardiography, pulmonary and activity variables in real-life conditions (for up to 14 days). Data will be centralized for machine-learning-based analyses, with the aim of reclassifying HFpEF into more distinct subgroups, improving understanding of the disease mechanisms and identifying new biological pathways and molecular targets. The study will also serve as a platform to enable the development of innovative technologies and strategies for the diagnosis and stratification of patients with HFpEF. CONCLUSIONS PACIFIC-PRESERVED is a prospective multicentre phenomapping study, using novel analytical techniques, which will provide a unique data resource to better define HFpEF and identify new clinically meaningful subgroups of patients.
Affiliation(s)
- Jean-Sébastien Hulot
- Université Paris Cité, INSERM, PARCC, 75015 Paris, France; CIC1418 and DMU CARTE, Hôpital Européen Georges-Pompidou, AP-HP, 75015 Paris, France.
- Pierre Boutouyrie
- Université Paris Cité, INSERM, PARCC, 75015 Paris, France; Pharmacology and DMU CARTE, Hôpital Européen Georges-Pompidou, AP-HP, 75015 Paris, France
- Ariel Cohen
- Cardiology, Hôpital Saint-Antoine, AP-HP, ICAN 1166, Sorbonne Université, 75012 Paris, France
- Thibaud Damy
- Cardiology, Hôpital Henri-Mondor, AP-HP, 94000 Créteil, France
- Juliette Djadi-Prat
- Clinical Research Unit, Hôpital Européen Georges-Pompidou, AP-HP, 75015 Paris, France
- Richard Isnard
- Cardiology, Hôpital Pitié-Salpêtrière, AP-HP, 75013 Paris, France
- Elie Mousseaux
- Université Paris Cité, INSERM, PARCC, 75015 Paris, France; Cardiac Imaging Radiology, Hôpital Européen Georges-Pompidou, AP-HP, 75015 Paris, France
- Mathieu Pernot
- Physics for Medicine Paris, INSERM U1273, ESPCI Paris, PSL University, CNRS FRE 2031, 75015 Paris, France
- Gilles Soulat
- Université Paris Cité, INSERM, PARCC, 75015 Paris, France; Cardiac Imaging Radiology, Hôpital Européen Georges-Pompidou, AP-HP, 75015 Paris, France
- Damien Logeart
- Cardiology, Hôpital Lariboisière, AP-HP, 75018 Paris, France
3
Way GP, Sailem H, Shave S, Kasprowicz R, Carragher NO. Evolution and impact of high content imaging. SLAS DISCOVERY : ADVANCING LIFE SCIENCES R & D 2023; 28:292-305. [PMID: 37666456 DOI: 10.1016/j.slasd.2023.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 08/09/2023] [Accepted: 08/29/2023] [Indexed: 09/06/2023]
Abstract
The field of high content imaging has steadily evolved and expanded substantially across many industry and academic research institutions since it was first described in the early 1990s. High content imaging refers to the automated acquisition and analysis of microscopic images from a variety of biological sample types. Integration of high content imaging microscopes with multiwell plate handling robotics enables high content imaging to be performed at scale and to support medium- to high-throughput screening of pharmacological, genetic and diverse environmental perturbations upon complex biological systems ranging from 2D cell cultures to 3D tissue organoids to small model organisms. In this perspective article the authors provide a collective view on the following key discussion points relevant to the evolution of high content imaging:
- Evolution and impact of high content imaging: An academic perspective
- Evolution and impact of high content imaging: An industry perspective
- Evolution of high content image analysis
- Evolution of high content data analysis pipelines towards multiparametric and phenotypic profiling applications
- The role of data integration and multiomics
- The role and evolution of image data repositories and sharing standards
- Future perspective of high content imaging hardware and software
Affiliation(s)
- Gregory P Way
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Heba Sailem
- School of Cancer and Pharmaceutical Sciences, King's College London, UK
- Steven Shave
- GlaxoSmithKline Medicines Research Centre, Gunnels Wood Rd, Stevenage SG1 2NY, UK; Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, UK
- Richard Kasprowicz
- GlaxoSmithKline Medicines Research Centre, Gunnels Wood Rd, Stevenage SG1 2NY, UK
- Neil O Carragher
- Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, UK.
4
Wanjiku RN, Nderu L, Kimwele M. Improved transfer learning using textural features conflation and dynamically fine-tuned layers. PeerJ Comput Sci 2023; 9:e1601. [PMID: 37810335 PMCID: PMC10557498 DOI: 10.7717/peerj-cs.1601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 08/29/2023] [Indexed: 10/10/2023]
Abstract
Transfer learning involves using knowledge a model previously learnt on one task to address another task. However, this process works well only when the tasks are closely related. It is, therefore, important to select data points that are closely relevant to the previous task and to fine-tune the suitable layers of the pre-trained model for effective transfer. This work utilises the least divergent textural features of the target datasets and pre-trained model's layers, minimising the knowledge lost during the transfer learning process. This study extends previous works on selecting data points with good textural features and dynamically selected layers using divergence measures by combining them into one model pipeline. Five pre-trained models are used: ResNet50, DenseNet169, InceptionV3, VGG16 and MobileNetV2 on nine datasets: CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST, Stanford Dogs, Caltech 256, ISIC 2016, ChestX-ray8 and MIT Indoor Scenes. Experimental results show that data points with lower textural feature divergence and layers with more positive weights give better accuracy than other data points and layers. The data points with lower divergence give an average improvement of 3.54% to 6.75%, while the layers improve by 2.42% to 13.04% for the CIFAR-100 dataset. Combining the two methods gives an extra accuracy improvement of 1.56%. This combined approach shows that data points with lower divergence from the source dataset samples can lead to a better adaptation for the target task. The results also demonstrate that selecting layers with more positive weights reduces instances of trial and error in selecting fine-tuning layers for pre-trained models.
Affiliation(s)
- Lawrence Nderu
- Computing, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
- Michael Kimwele
- Computing, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
5
Ye X, Shang Y, Shi T, Zhang W, Sakurai T. Multi-omics clustering for cancer subtyping based on latent subspace learning. Comput Biol Med 2023; 164:107223. [PMID: 37490833 DOI: 10.1016/j.compbiomed.2023.107223] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/30/2023] [Revised: 06/07/2023] [Accepted: 06/30/2023] [Indexed: 07/27/2023]
Abstract
The increased availability of high-throughput technologies has enabled biomedical researchers to learn about disease etiology across multiple omics layers, which shows promise for improving cancer subtype identification. Many computational methods have been developed to perform clustering on multi-omics data; however, only a few of them are applicable to partial multi-omics data, in which some samples lack data for some omics types. In this study, we propose a novel multi-omics clustering method based on latent subspace learning (MCLS), which can handle missing multi-omics data during clustering. We utilize the data with complete omics to construct a latent subspace using PCA-based feature extraction and singular value decomposition (SVD). The data with incomplete multi-omics are then projected to the latent subspace, and spectral clustering is performed to find the clusters. The proposed MCLS method is evaluated on seven different cancer datasets on three levels of omics in both full and partial cases compared to several state-of-the-art methods. The experimental results show that the proposed MCLS method is more efficient and effective than the compared methods for cancer subtype identification in multi-omics data analysis, which provides important references to a comprehensive understanding of cancer and biological mechanisms. AVAILABILITY: The proposed method is freely accessible at https://github.com/ShangCS/MCLS.
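The three stages named in this abstract (a latent subspace learned from complete samples, projection of incomplete samples, spectral clustering) can be illustrated with a minimal NumPy sketch. It is not the authors' MCLS implementation, which is available at the repository above. The function names, the per-omics least-squares projection maps, the averaging over observed omics types, and the two-cluster Fiedler-vector split are all assumptions made for illustration.

```python
import numpy as np

def fit_latent_subspace(views, n_components):
    """Learn a shared subspace from samples that have every omics type.

    views: list of (n_complete, d_v) arrays, one per omics type.
    """
    means = [v.mean(axis=0) for v in views]
    centered = [v - m for v, m in zip(views, means)]
    # PCA via SVD on the concatenated, centered features.
    u, s, _ = np.linalg.svd(np.hstack(centered), full_matrices=False)
    latent = u[:, :n_components] * s[:n_components]
    # Assumption: one least-squares map per omics type into the shared space.
    maps = [np.linalg.lstsq(c, latent, rcond=None)[0] for c in centered]
    return means, maps, latent

def project_partial(sample_views, means, maps):
    """Place one sample in the subspace; missing omics types are passed as None."""
    estimates = [(x - m) @ w
                 for x, m, w in zip(sample_views, means, maps) if x is not None]
    if not estimates:
        raise ValueError("sample has no observed omics type")
    # Assumption: average the estimates from the omics types that were observed.
    return np.mean(estimates, axis=0)

def spectral_bipartition(points, sigma=None):
    """Two-way spectral clustering via the sign of the Fiedler vector."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    if sigma is None:
        sigma = np.median(dist[np.triu_indices_from(dist, k=1)])
    affinity = np.exp(-dist ** 2 / (2 * sigma ** 2))
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)
```

A sample measured on the first and third omics types only would be placed with `project_partial([x0, None, x2], means, maps)` and then clustered together with the complete samples. More than two clusters would need k-means on several Laplacian eigenvectors, which this sketch leaves out.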
Affiliation(s)
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan; Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan.
- Yifan Shang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Tianyi Shi
- Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
- Weihang Zhang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan; Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
6
Raza Abidi SS, Naqvi A, Worthen G, Vinson A, Abidi S, Kiberd B, Skinner T, West K, Tennankore KK. Multiview Clustering to Identify Novel Kidney Donor Phenotypes for Assessing Graft Survival in Older Transplant Recipients. KIDNEY360 2023; 4:951-961. [PMID: 37291713 PMCID: PMC10371275 DOI: 10.34067/kid.0000000000000190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 05/08/2023] [Indexed: 06/10/2023]
Abstract
Key Points An unsupervised machine learning clustering algorithm identified distinct deceased kidney donor phenotypes among older recipients. Recipients of certain donor phenotypes were at a relatively higher risk of all-cause graft loss even after accounting for recipient factors. The use of unsupervised clustering to support kidney allocation systems may be an important area for future study. Background Older transplant recipients are at a relatively increased risk of graft failure after transplantation, and some of this risk may relate to donor characteristics. Unsupervised clustering using machine learning may be a novel approach to identify donor phenotypes that may then be used to evaluate outcomes for older recipients. Using a cohort of older recipients, the purpose of this study was to (1) use unsupervised clustering to identify donor phenotypes and (2) determine the risk of death/graft failure for recipients of each donor phenotype. Methods We analyzed a nationally representative cohort of kidney transplant recipients aged 65 years or older captured using the Scientific Registry of Transplant Recipients between 2000 and 2017. Unsupervised clustering was used to generate phenotypes using donor characteristics inclusive of variables in the kidney donor risk index (KDRI). Cluster assignment was internally validated. Outcomes included all-cause graft failure (including mortality) and delayed graft function. Differences in the distribution of KDRI scores were also compared across the clusters. All-cause graft failure was compared for recipients of donor kidneys from each cluster using a multivariable Cox survival analysis. Results Overall, 23,558 donors were separated into five clusters. The area under the curve for internal validation of cluster assignment was 0.89. Recipients of donor kidneys from two clusters were found to be at high risk of all-cause graft failure relative to the lowest risk cluster (adjusted hazards ratio, 1.86; 95% confidence interval, 1.69 to 2.05 and 1.73; 95% confidence interval, 1.61 to 1.87). Only one of these high-risk clusters had high proportions of donors with established risk factors (i.e., hypertension, diabetes). KDRI scores were similar for the highest and lowest risk clusters (1.40 [1.18–1.67] and 1.37 [1.15–1.65], respectively). Conclusions Unsupervised clustering can identify novel donor phenotypes comprising established donor characteristics that, in turn, may be associated with different risks of graft loss for older transplant recipients.
Affiliation(s)
- Syed Sibte Raza Abidi
- Division of Nephrology, Department of Medicine, Dalhousie University, Halifax, Nova Scotia, Canada
7
Ye X, Shi T, Cui Y, Sakurai T. Interactive gene identification for cancer subtyping based on multi-omics clustering. Methods 2023; 211:61-67. [PMID: 36804215 DOI: 10.1016/j.ymeth.2023.02.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 02/06/2023] [Accepted: 02/12/2023] [Indexed: 02/17/2023] Open
Abstract
Recent advances in multi-omics databases offer the opportunity to explore complex systems of cancers across hierarchical biological levels. Some methods have been proposed to identify the genes that play a vital role in disease development by integrating multi-omics. However, the existing methods identify the related genes separately, neglecting the gene interactions that are related to the multigenic disease. In this study, we develop a learning framework to identify the interactive genes based on multi-omics data including gene expression. Firstly, we integrate different omics based on their similarities and apply spectral clustering for cancer subtype identification. Then, a gene co-expression network is constructed for each cancer subtype. Finally, we detect the interactive genes in the co-expression network by learning the dense subgraphs based on the L1 properties of eigenvectors in the modularity matrix. We apply the proposed learning framework to a multi-omics cancer dataset to identify the interactive genes for each cancer subtype. The detected genes are examined by DAVID and KEGG tools for systematic gene ontology enrichment analysis. The analysis results show that the detected genes have relationships to cancer development and the genes in different cancer subtypes are related to different biological processes and pathways, which are expected to yield important references for understanding tumor heterogeneity and improving patient survival.
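The last two steps of this abstract (building a co-expression network, then reading gene groups off the eigenvectors of its modularity matrix) can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the correlation threshold, the function names, and the simple sign split of the leading eigenvector are hypothetical stand-ins, and the L1-based dense-subgraph learning described by the authors is not reproduced here.

```python
import numpy as np

def coexpression_adjacency(expr, threshold=0.6):
    """expr: (n_samples, n_genes). Link genes whose |Pearson r| exceeds threshold."""
    corr = np.corrcoef(expr, rowvar=False)
    adj = (np.abs(corr) > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

def leading_modularity_split(adj, tol=1e-8):
    """Split genes by the sign of the leading eigenvector of the modularity matrix."""
    degree = adj.sum(axis=1)
    two_m = degree.sum()  # twice the number of edges
    if two_m == 0:
        raise ValueError("network has no edges")
    # Modularity matrix: B = A - k k^T / (2m).
    modularity = adj - np.outer(degree, degree) / two_m
    _, vecs = np.linalg.eigh(modularity)  # eigenvalues in ascending order
    lead = vecs[:, -1]
    # Genes with near-zero entries (e.g. isolated genes) join neither group.
    return np.flatnonzero(lead > tol), np.flatnonzero(lead < -tol)
```

Applied to the expression matrix of one subtype, `leading_modularity_split(coexpression_adjacency(expr))` returns two index arrays of candidate co-expressed gene groups. Which group receives the positive sign is arbitrary.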
Affiliation(s)
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan.
- Tianyi Shi
- Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
- Yaxuan Cui
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan; Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
8
Flores JE, Claborne DM, Weller ZD, Webb-Robertson BJM, Waters KM, Bramer LM. Missing data in multi-omics integration: Recent advances through artificial intelligence. Front Artif Intell 2023; 6:1098308. [PMID: 36844425 PMCID: PMC9949722 DOI: 10.3389/frai.2023.1098308] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 11/14/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023] Open
Abstract
Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has presented the need for the development of integration approaches that are able to capture the complex, often non-linear, interactions that define these biological systems and are adapted to the challenges of combining the heterogeneous data across 'omic views. A principal challenge to multi-omic integration is missing data because not all biomolecules are measured in all samples. Due to cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic technologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analyses of multi-omics data; however, many of these techniques assume access to completely observed data. A subset of these methods incorporate mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations, and we discuss potential avenues for further developments as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.
Affiliation(s)
- Javier E. Flores
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Daniel M. Claborne
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Zachary D. Weller
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Bobbie-Jo M. Webb-Robertson
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Katrina M. Waters
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Lisa M. Bramer
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
9
Shi X, Liang C, Wang H. Multiview Robust Graph-Based Clustering for Cancer Subtype Identification. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:544-556. [PMID: 35044919 DOI: 10.1109/tcbb.2022.3143897] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
Cancer subtype identification classifies cancers into groups according to their molecular characteristics and clinical manifestations and is the basis for more personalized diagnosis and therapy. Public datasets such as The Cancer Genome Atlas (TCGA) have collected a massive amount of multi-omics data. The accumulation of these datasets provides unprecedented opportunities to study the mechanism of cancers and further identify cancer subtypes at a comprehensive level. In this paper, we propose a multi-view robust graph-based clustering (MRGC) method to effectively identify cancer subtypes. Our method first learns robust latent representations from the raw omics data to alleviate the influence of noise; a set of similarity matrices is then adaptively learned based on these new representations. Finally, a global similarity graph is obtained by exploiting the consensus structure from the graphs. As a result, the three parts in our method can reinforce each other in a mutual iterative manner. We conduct extensive experiments on both generic machine learning datasets and cancer datasets. The experimental results confirm that our model can achieve satisfactory clustering performance compared to several state-of-the-art approaches. Moreover, we demonstrate the practicability of MRGC by carrying out a case study on hepatocellular carcinoma.
10
Shetta O, Niranjan M, Dasmahapatra S. Convex Multi-View Clustering Via Robust Low Rank Approximation With Application to Multi-Omic Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3340-3352. [PMID: 34705655 DOI: 10.1109/tcbb.2021.3122961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Recent advances in high-throughput technologies have made large amounts of biomedical omics data accessible to the scientific community. Single omic data clustering has proved its impact in the biomedical and biological research fields. Multi-omic data clustering and multi-omic data integration techniques have shown improved clustering performance and biological insight. Cancer subtype clustering is an important task in the medical field to be able to identify a suitable treatment procedure and prognosis for cancer patients. State-of-the-art multi-view clustering methods are based on non-convex objectives which only guarantee non-global solutions that are high in computational complexity. Only a few convex multi-view methods are present. However, their models do not take into account the intrinsic manifold structure of the data. In this paper, we introduce a convex graph regularized multi-view clustering method that is robust to outliers. We compare our algorithm to state-of-the-art convex and non-convex multi-view and single view clustering methods, and show its superiority in clustering cancer subtypes on publicly available cancer genomic datasets from the TCGA repository. We also show our method's better ability to potentially discover cancer subtypes compared to other state-of-the-art multi-view methods.
11
Tian J, Zhao J, Zheng C. Clustering of cancer data based on Stiefel manifold for multiple views. BMC Bioinformatics 2021; 22:268. [PMID: 34034643 PMCID: PMC8152349 DOI: 10.1186/s12859-021-04195-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/24/2021] [Accepted: 05/12/2021] [Indexed: 12/23/2022] Open
Abstract
Background In recent years, various sequencing techniques have been used to collect biomedical omics datasets. It is usually possible to obtain multiple types of omics data from a single patient sample. Clustering of omics data plays an indispensable role in biological and medical research, and it is helpful to reveal data structures from multiple collections. Nevertheless, clustering of omics data poses many challenges. The primary challenges in omics data analysis come from the high dimensionality of the data and the small sample size. Therefore, it is difficult to find a suitable integration method for structural analysis of multiple datasets. Results In this paper, a multi-view clustering based on Stiefel manifold method (MCSM) is proposed. The MCSM method comprises three core steps. Firstly, we established a binary optimization model for the simultaneous clustering problem. Secondly, we solved the optimization problem by a linear search algorithm based on the Stiefel manifold. Finally, we integrated the clustering results obtained from three omics by using the k-nearest neighbor method. We applied this approach to four cancer datasets from TCGA. The result shows that our method is superior to several state-of-the-art methods, which depends on the hypothesis that the underlying omics cluster class is the same. Conclusion Particularly, our approach has better performance than compared approaches when the underlying clusters are inconsistent. For patients with different subtypes, both consistent and differential clusters can be identified at the same time.
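To make the second step of this abstract more concrete, the sketch below shows what a single search step constrained to the Stiefel manifold (matrices with orthonormal columns, XᵀX = I) can look like. It reads the abstract's search algorithm as a backtracking line search, which is an assumption. The objective trace(XᵀAX), the QR retraction, and the function name are likewise illustrative choices, not the MCSM optimization model.

```python
import numpy as np

def stiefel_ascent_step(x, a, step=1.0, shrink=0.5, max_tries=30):
    """One backtracking line-search step for maximizing trace(X^T A X), X^T X = I.

    x: (n, k) matrix with orthonormal columns; a: symmetric (n, n) matrix.
    """
    def objective(m):
        return np.trace(m.T @ a @ m)

    grad = 2.0 * a @ x                       # Euclidean gradient
    sym = (x.T @ grad + grad.T @ x) / 2.0
    tangent = grad - x @ sym                 # projection onto the tangent space at x
    start = objective(x)
    for _ in range(max_tries):
        # QR retraction: map the trial point back onto the manifold.
        q, _r = np.linalg.qr(x + step * tangent)
        if objective(q) >= start:
            return q
        step *= shrink                       # backtrack when the trial point is worse
    return x
```

Because every returned matrix comes from a QR factorization (or is the unchanged input), the orthonormality constraint holds after each step, and the objective never decreases from one step to the next.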
Affiliation(s)
- Jing Tian
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- Jianping Zhao
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China.
- Chunhou Zheng
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China; School of Computer Science and Technology, Anhui University, Hefei, China
12
Adossa N, Khan S, Rytkönen KT, Elo LL. Computational strategies for single-cell multi-omics integration. Comput Struct Biotechnol J 2021; 19:2588-2596. [PMID: 34025945 PMCID: PMC8114078 DOI: 10.1016/j.csbj.2021.04.060] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 02/19/2021] [Revised: 04/23/2021] [Accepted: 04/24/2021] [Indexed: 02/06/2023] Open
Abstract
Single-cell omics technologies are currently solving biological and medical problems that earlier have remained elusive, such as discovery of new cell types, cellular differentiation trajectories and communication networks across cells and tissues. Current advances especially in single-cell multi-omics hold high potential for breakthroughs by integration of multiple different omics layers. To pair with the recent biotechnological developments, many computational approaches to process and analyze single-cell multi-omics data have been proposed. In this review, we first introduce recent developments in single-cell multi-omics in general and then focus on the available data integration strategies. The integration approaches are divided into three categories: early, intermediate, and late data integration. For each category, we describe the underlying conceptual principles and main characteristics, as well as provide examples of currently available tools and how they have been applied to analyze single-cell multi-omics data. Finally, we explore the challenges and prospective future directions of single-cell multi-omics data integration, including examples of adopting multi-view analysis approaches used in other disciplines to single-cell multi-omics.
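The early, intermediate, and late categories named in this abstract differ mainly in where the omics layers are combined. The toy NumPy functions below illustrate one common reading of each category. They are assumptions chosen for brevity (feature concatenation, averaged sample-by-sample correlation matrices, and a co-association matrix over per-layer cluster labels), not tools described in the review.

```python
import numpy as np

def zscore(x):
    """Standardize each feature (column) of a samples-by-features matrix."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)

def early_integration(views):
    """Early: concatenate standardized feature matrices before any modelling."""
    return np.hstack([zscore(v) for v in views])

def intermediate_integration(views):
    """Intermediate: fuse per-layer sample-by-sample similarities into one matrix."""
    sims = [np.corrcoef(zscore(v)) for v in views]  # rows are samples
    return np.mean(sims, axis=0)

def late_integration(label_sets):
    """Late: combine per-layer cluster labels into a co-association matrix."""
    votes = [np.equal.outer(l, l).astype(float) for l in label_sets]
    return np.mean(votes, axis=0)
```

Each function takes one array per omics layer, with the same samples in the same row order. Entry (i, j) of the `late_integration` result is the fraction of layers that put samples i and j in the same cluster.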
Affiliation(s)
- Nigatu Adossa
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Sofia Khan
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Kalle T. Rytkönen
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Institute of Biomedicine, University of Turku, 20520 Turku, Finland
- Laura L. Elo
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Institute of Biomedicine, University of Turku, 20520 Turku, Finland