1
Zhang S, Lv J, Zhang J, Fan Z, Gu B, Fan B, Li C, Wang C, Zhang T. Benchmarking multi-omics integrative clustering methods for subtype identification in colorectal cancer. Computer Methods and Programs in Biomedicine 2025; 261:108603. [PMID: 39826483 DOI: 10.1016/j.cmpb.2025.108603] [Received: 06/05/2024] [Revised: 11/27/2024] [Accepted: 01/12/2025] [Indexed: 01/22/2025]
Abstract
BACKGROUND AND OBJECTIVE Colorectal cancer (CRC) is a heterogeneous malignancy that imposes a substantial global burden of incidence and mortality. The traditional tumor-node-metastasis staging system has exhibited certain limitations. With the advancement of omics technologies, researchers are directing their focus toward developing a more precise multi-omics molecular classification. The use of unsupervised multi-omics integrative clustering methods in CRC therefore calls for a comprehensive benchmark with practical guidelines. METHODS In this study, we obtained CRC multi-omics data, encompassing DNA methylation, gene expression, and protein expression, from The Cancer Genome Atlas (TCGA) database. We then generated interrelated CRC multi-omics data with various structures based on realistic multi-omics correlations, and performed a comprehensive evaluation of eight representative methods categorized as early integration, intermediate integration, and late integration using complementary benchmarks for subtype classification accuracy. Lastly, we employed these methods to integrate real-world CRC multi-omics data; survival and differential analyses were used to highlight differences among the newly identified multi-omics subtypes. RESULTS Through in-depth comparisons, we observed that similarity network fusion (SNF) exhibited exceptional performance in integrating multi-omics data derived from simulations. Additionally, SNF effectively distinguished CRC patients into five subgroups with the highest classification accuracy. Moreover, we found significant survival differences and molecular distinctions among SNF subtypes. CONCLUSIONS The findings consistently demonstrate that SNF outperforms other methods in CRC multi-omics integrative clustering. The significant survival differences and molecular distinctions among SNF subtypes provide novel insights into the multi-omics perspective on CRC heterogeneity, with potential implications for clinical treatment.
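To make the "early integration" category in this abstract concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that early integration means standardizing each omics block feature by feature and concatenating the blocks per sample before any clustering step. The function names `zscore_columns` and `early_integration` and all data values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(block):
    """Standardize each feature (column) of one omics block to mean 0, SD 1."""
    columns = list(zip(*block))
    scaled_columns = []
    for col in columns:
        mu = mean(col)
        sd = pstdev(col)
        # A constant feature carries no signal; map it to zeros instead of dividing by 0.
        scaled_columns.append([0.0 if sd == 0 else (v - mu) / sd for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def early_integration(blocks):
    """Concatenate standardized omics blocks sample by sample (rows are samples)."""
    n_samples = len(blocks[0])
    if any(len(b) != n_samples for b in blocks):
        raise ValueError("all omics blocks must describe the same samples")
    scaled = [zscore_columns(b) for b in blocks]
    return [sum((s[i] for s in scaled), []) for i in range(n_samples)]


# Toy data (made-up values): 4 samples, 2 methylation features, 1 expression feature.
methylation = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]]
expression = [[5.0], [6.0], [50.0], [60.0]]
fused = early_integration([methylation, expression])
print(len(fused), "samples x", len(fused[0]), "features")
```

The fused matrix would then be passed to a single clustering algorithm. Intermediate and late integration differ in that they combine the omics layers inside the model or after per-layer clustering, respectively.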
Affiliation(s)
- Shuai Zhang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
- Jiali Lv
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
- Jinglan Zhang
- School of Life Science, Shandong University, Qingdao, 266237, China
- Zhe Fan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
- Bingbing Gu
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
- Bingbing Fan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
- Chunxia Li
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
- Cheng Wang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China.
- Tao Zhang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China; Department of Epidemiology and Biostatistics, School of Public Health, Tianjin Medical University, Tianjin, 300070, China.
2
Mildau K, Ehlers H, Meisenburg M, Del Pup E, Koetsier RA, Torres Ortega LR, de Jonge NF, Singh KS, Ferreira D, Othibeng K, Tugizimana F, Huber F, van der Hooft JJJ. Effective data visualization strategies in untargeted metabolomics. Nat Prod Rep 2024. [PMID: 39620439 PMCID: PMC11610048 DOI: 10.1039/d4np00039k] [Received: 07/30/2024] [Indexed: 12/11/2024]
Abstract
Covering: 2014 to 2023 for metabolomics, 2002 to 2023 for information visualization.
LC-MS/MS-based untargeted metabolomics is a rapidly developing research field spawning increasing numbers of computational metabolomics tools assisting researchers with their complex data processing, analysis, and interpretation tasks. In this article, we review the entire untargeted metabolomics workflow from the perspective of information visualization, visual analytics and visual data integration. Data visualization is a crucial step at every stage of the metabolomics workflow, where it provides core components of data inspection, evaluation, and sharing capabilities. However, due to the large number of available data analysis tools and corresponding visualization components, it is hard for both users and developers to get an overview of what is already available and which tools are suitable for their analysis. In addition, there is little cross-pollination between the fields of data visualization and metabolomics, leaving visual tools to be designed in a secondary and mostly ad hoc fashion. With this review, we aim to bridge the gap between the fields of untargeted metabolomics and data visualization. First, we introduce data visualization to the untargeted metabolomics field as a topic worthy of its own dedicated research, and provide a primer on cutting-edge data visualization research for both researchers and developers active in metabolomics. We extend this primer with a discussion of best practices for data visualization as they have emerged from data visualization studies. Second, we provide a practical roadmap to the visual tool landscape and its use within the untargeted metabolomics field. Here, for several computational analysis stages within the untargeted metabolomics workflow, we provide an overview of commonly used visual strategies with practical examples. 
In this context, we will also outline promising areas for further research and development. We end the review with a set of recommendations for developers and users on how to make the best use of visualizations for more effective and transparent communication of results.
Affiliation(s)
- Kevin Mildau
- Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands.
- Henry Ehlers
- Visualization Group, Institute of Visual Computing and Human-Centered Technology, TU Wien, Vienna, Austria.
- Mara Meisenburg
- Adaptation Physiology Group, Wageningen University & Research, Wageningen, The Netherlands
- Elena Del Pup
- Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands.
- Robert A Koetsier
- Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands.
- Niek F de Jonge
- Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands.
- Kumar Saurabh Singh
- Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands.
- Maastricht University Faculty of Science and Engineering, Plant Functional Genomics Maastricht, Limburg, The Netherlands
- Faculty of Environment, Science and Economy, University of Exeter, Penryn, Cornwall, UK
- Kgalaletso Othibeng
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
- Fidele Tugizimana
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
- Florian Huber
- Centre for Digitalisation and Digitality, Düsseldorf University of Applied Sciences, Düsseldorf, Germany
- Justin J J van der Hooft
- Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands.
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
3
Anwar MY, Highland H, Buchanan VL, Graff M, Young K, Taylor KD, Tracy RP, Durda P, Liu Y, Johnson CW, Aguet F, Ardlie KG, Gerszten RE, Clish CB, Lange LA, Ding J, Goodarzi MO, Chen YDI, Peloso GM, Guo X, Stanislawski MA, Rotter JI, Rich SS, Justice AE, Liu CT, North K. Machine learning-based clustering identifies obesity subgroups with differential multi-omics profiles and metabolic patterns. Obesity (Silver Spring) 2024; 32:2024-2034. [PMID: 39497627 PMCID: PMC11540333 DOI: 10.1002/oby.24137] [Received: 03/23/2024] [Revised: 06/18/2024] [Accepted: 07/22/2024] [Indexed: 11/08/2024]
Abstract
OBJECTIVE Individuals living with obesity are differentially susceptible to cardiometabolic diseases. We hypothesized that an integrative multi-omics approach might improve identification of subgroups of individuals with obesity who have distinct cardiometabolic disease patterns. METHODS We performed machine learning-based, integrative unsupervised clustering to identify proteomics- and metabolomics-defined subpopulations of individuals living with obesity (BMI ≥ 30 kg/m²), leveraging data from 243 individuals in the Multi-Ethnic Study of Atherosclerosis (MESA) cohort. Omics that contributed to the observed clusters were functionally characterized. We performed multivariate regression to assess whether the individuals in each cluster demonstrated differential patterns of cardiometabolic traits. RESULTS We identified two distinct clusters (iCluster1 and 2). iCluster2 had significantly higher average BMI values, fasting blood glucose, and inflammation. iCluster1 was associated with higher levels of total cholesterol and high-density lipoprotein cholesterol. Pathways mediating cell growth, lipogenesis, and energy expenditures were positively associated with iCluster1. Inflammatory response and insulin resistance pathways were positively associated with iCluster2. CONCLUSIONS Although the two identified clusters may represent progressive obesity-related pathologic processes measured at different stages, other mechanisms in combination could also underpin the identified clusters given no significant age difference between the comparative groups. For instance, clusters may reflect differences in dietary/behavioral patterns or differential rates of metabolic damage.
Affiliation(s)
- Mohammad Y Anwar
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Heather Highland
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Victoria Lynn Buchanan
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Mariaelisa Graff
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Kristin Young
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Kent D Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, California, USA
- Russell P Tracy
- Department of Pathology and Laboratory Medicine, Larner College of Medicine, University of Vermont, Burlington, Vermont, USA
- Peter Durda
- Department of Pathology and Laboratory Medicine, Larner College of Medicine, University of Vermont, Burlington, Vermont, USA
- Yongmei Liu
- Department of Medicine, Duke University Medical Center, Durham, North Carolina, USA
- Craig W Johnson
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
- Francois Aguet
- Program of Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Kristin G Ardlie
- Program of Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Robert E Gerszten
- Cardiovascular Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Clary B Clish
- Metabolite Profiling Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Leslie A Lange
- Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Jingzhong Ding
- Section of Gerontology and Geriatric Medicine, Department of Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
- Mark O Goodarzi
- Division of Endocrinology, Diabetes, and Metabolism, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Yii-Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, California, USA
- Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston University, Boston, Massachusetts, USA
- Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, California, USA
- Maggie A Stanislawski
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, California, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
- Anne E Justice
- Department of Population Health Sciences, Geisinger Health System, Danville, Pennsylvania, USA
- Ching-Ti Liu
- Department of Biostatistics, Boston University School of Public Health, Boston University, Boston, Massachusetts, USA
- Kari North
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
4
Acharya D, Mukhopadhyay A. A comprehensive review of machine learning techniques for multi-omics data integration: challenges and applications in precision oncology. Brief Funct Genomics 2024; 23:549-560. [PMID: 38600757 DOI: 10.1093/bfgp/elae013] [Received: 10/13/2023] [Revised: 03/12/2024] [Accepted: 03/22/2024] [Indexed: 04/12/2024]
Abstract
Multi-omics data play a crucial role in precision medicine, mainly in helping to understand the diverse biological interactions between different omics layers. Machine learning approaches have been extensively employed in this context over the years. This review aims to comprehensively summarize and categorize these advancements, focusing on the integration of multi-omics data, which includes genomics, transcriptomics, proteomics and metabolomics, alongside clinical data. We discuss various machine learning techniques and computational methodologies used for integrating distinct omics datasets and provide valuable insights into their application. The review emphasizes both the challenges and opportunities present in multi-omics data integration, precision medicine and patient stratification, offering practical recommendations for method selection in various scenarios. Recent advances in deep learning and network-based approaches are also explored, highlighting their potential to harmonize diverse biological information layers. Additionally, we present a roadmap for the integration of multi-omics data in precision oncology, outlining the advantages, challenges and implementation difficulties. Hence, this review offers a thorough overview of current literature, providing researchers with insights into machine learning techniques for patient stratification, particularly in precision oncology. Contact: anirban@klyuniv.ac.in.
Affiliation(s)
- Debabrata Acharya
- Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
- Anirban Mukhopadhyay
- Department of Computer Science & Engineering, University of Kalyani, Kalyani-741235, West Bengal, India
5
Wang T, Cui S, Lyu C, Wang Z, Li Z, Han C, Liu W, Wang Y, Xu R. Molecular precision medicine: Multi-omics-based stratification model for acute myeloid leukemia. Heliyon 2024; 10:e36155. [PMID: 39263156 PMCID: PMC11388765 DOI: 10.1016/j.heliyon.2024.e36155] [Received: 02/25/2024] [Revised: 08/01/2024] [Accepted: 08/11/2024] [Indexed: 09/13/2024]
Abstract
Acute myeloid leukemia (AML), the most common malignancy of the hematopoietic system, poses challenges in treatment efficacy, relapse, and drug resistance. In this study, we used 151 RNA sequencing datasets, 194 DNA methylation datasets, and 200 somatic mutation datasets from the AML cohort in the TCGA database to develop a multi-omics stratification model. This model enables comparison of prognosis, clinical features, gene mutations, immune microenvironment and drug sensitivity across subgroups. External validation datasets were sourced from the GEO database, which includes 562 mRNA datasets and 136 miRNA datasets from 984 adult AML patients. Using the multi-omics-based stratification model, we classified 126 AML patients into 4 clusters (CS). CS4 had the best prognosis, with the youngest age, highest M3 subtype proportion, fewest copy number alterations, and common mutations in WT1, FLT3, and KIT genes. It showed sensitivity to HDAC inhibitors and BCL-2 inhibitors. Both the M3 subtype and CS4 were identified as independent protective factors for survival. Conversely, CS3 had the worst prognosis due to older age, high copy number alterations, and frequent mutations in RUNX1, DNMT3A, and TP53 genes. Additionally, it showed higher proportions of cytotoxic cells and Tregs, suggesting potential sensitivity to mTOR inhibitors. CS1 had a better prognosis than CS2, with more copy number alterations, while CS2 had higher monocyte proportions. CS1 showed good sensitivity to cytarabine, while CS2 was sensitive to RXR agonists. Both CS1 and CS2, which predominantly featured mutations in FLT3, NPM1, and DNMT3A genes, benefited from FLT3 inhibitors. Using the Kappa test, our stratification model underwent robust validation in the miRNA and mRNA external validation datasets. With advancements in sequencing technology and machine learning algorithms, AML is poised to transition towards multi-omics precision medicine in the future. 
We hope that our study offers new perspectives on multi-drug combination clinical trials and multi-targeted precision medicine for AML.
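The Kappa test mentioned in this abstract measures agreement between two label assignments beyond what chance would produce. The sketch below computes Cohen's kappa in plain Python. It assumes, hypothetically, that validation compares the cluster labels two classifiers assign to the same patients; the abstract does not specify the exact setup, and the label lists are made-up.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement: probability that both raters pick the same label independently.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels for six patients from two hypothetical classifiers.
reference = ["CS1", "CS1", "CS2", "CS3", "CS4", "CS4"]
predicted = ["CS1", "CS1", "CS2", "CS3", "CS4", "CS3"]
kappa = cohen_kappa(reference, predicted)
print(round(kappa, 3))
```

Kappa equals 1 for perfect agreement and is close to 0 when agreement is no better than chance, which makes it stricter than raw accuracy when cluster sizes are unbalanced.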
Affiliation(s)
- Teng Wang
- The First Clinical Medical College, Shandong University of Traditional Chinese Medicine, Jinan, China
- Siyuan Cui
- Key Laboratory of Integrated Traditional Chinese and Western Medicine for Hematology, Health Commission of Shandong Province, Shandong, 250014, China
- Institute of Hematology, Shandong University of Traditional Chinese Medicine, Shandong, 250014, China
- Department of Hematology, Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Shandong, 250014, China
- Chunyi Lyu
- The First Clinical Medical College, Shandong University of Traditional Chinese Medicine, Jinan, China
- Zhenzhen Wang
- Key Laboratory of Integrated Traditional Chinese and Western Medicine for Hematology, Health Commission of Shandong Province, Shandong, 250014, China
- Institute of Hematology, Shandong University of Traditional Chinese Medicine, Shandong, 250014, China
- Department of Hematology, Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Shandong, 250014, China
- Zonghong Li
- The First Clinical Medical College, Shandong University of Traditional Chinese Medicine, Jinan, China
- Chen Han
- The First Clinical Medical College, Shandong University of Traditional Chinese Medicine, Jinan, China
- Weilin Liu
- College of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
- Yan Wang
- Key Laboratory of Integrated Traditional Chinese and Western Medicine for Hematology, Health Commission of Shandong Province, Shandong, 250014, China
- Institute of Hematology, Shandong University of Traditional Chinese Medicine, Shandong, 250014, China
- Department of Hematology, Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Shandong, 250014, China
- Ruirong Xu
- Key Laboratory of Integrated Traditional Chinese and Western Medicine for Hematology, Health Commission of Shandong Province, Shandong, 250014, China
- Institute of Hematology, Shandong University of Traditional Chinese Medicine, Shandong, 250014, China
- Department of Hematology, Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Shandong, 250014, China
6
Zhao Y, Jia Q, Goodrich J, Darst B, Conti DV. An extension of latent unknown clustering integrating multi-omics data (LUCID) incorporating incomplete omics data. Bioinformatics Advances 2024; 4:vbae123. [PMID: 39224838 PMCID: PMC11368387 DOI: 10.1093/bioadv/vbae123] [Received: 10/19/2023] [Revised: 07/23/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024]
Abstract
Motivation Latent unknown clustering integrating multi-omics data is a novel statistical model designed for multi-omics data analysis. It integrates omics data with exposures and an outcome through a latent cluster, elucidating how exposures influence processes reflected in multi-omics measurements, ultimately affecting an outcome. A significant challenge in multi-omics analysis is the issue of list-wise missingness. To address this, we extend the model to incorporate list-wise missingness within an integrated imputation framework, which can also handle sporadic missingness when necessary. Results Simulation studies demonstrate that our integrated imputation approach produces consistent and less biased estimates, closely reflecting true underlying values. We applied this model to data from the ISGlobal/ATHLETE "Exposome Data Challenge Event" to explore the association between maternal exposure to hexachlorobenzene and childhood body mass index by integrating incomplete proteomics data from 1301 children. The model successfully estimated proteomics profiles for two clusters representing higher and lower body mass index, characterizing the potential profiles linking prenatal hexachlorobenzene levels and childhood body mass index. Availability and implementation The proposed methods have been implemented in the R package LUCIDus. The source code is available at https://github.com/USCbiostats/LUCIDus.
Affiliation(s)
- Yinqi Zhao
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, United States
- Qiran Jia
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, United States
- Jesse Goodrich
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, United States
- Burcu Darst
- Public Health Sciences Division, Fred Hutch Cancer Center, Seattle, WA 98109, United States
- David V Conti
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, United States
7
Rintala TJ, Fortino V. COPS: A novel platform for multi-omic disease subtype discovery via robust multi-objective evaluation of clustering algorithms. PLoS Comput Biol 2024; 20:e1012275. [PMID: 39102448 PMCID: PMC11326705 DOI: 10.1371/journal.pcbi.1012275] [Received: 08/28/2023] [Revised: 08/15/2024] [Accepted: 06/25/2024] [Indexed: 08/07/2024]
Abstract
Recent research on multi-view clustering algorithms for complex disease subtyping often overlooks aspects like clustering stability and critical assessment of prognostic relevance. Furthermore, current frameworks do not allow for a comparison between data-driven and pathway-driven clustering, highlighting a significant gap in the methodology. We present the COPS R-package, tailored for robust evaluation of single and multi-omics clustering results. COPS features advanced methods, including similarity networks, kernel-based approaches, dimensionality reduction, and pathway knowledge integration. Some of these methods are not accessible through R, and some correspond to new approaches proposed with COPS. Our framework was rigorously applied to multi-omics data across seven cancer types, including breast, prostate, and lung, utilizing mRNA, CNV, miRNA, and DNA methylation data. Unlike previous studies, our approach contrasts data- and knowledge-driven multi-view clustering methods and incorporates cross-fold validation for robustness. Clustering outcomes were assessed using the ARI score, survival analysis via Cox regression models including relevant covariates, and the stability of the results. While survival analysis and gold-standard agreement are standard metrics, they vary considerably across methods and datasets. Therefore, it is essential to assess multi-view clustering methods using multiple criteria, from cluster stability to prognostic relevance, and to provide ways of comparing these metrics simultaneously to select the optimal approach for disease subtype discovery in novel datasets. Emphasizing multi-objective evaluation, we applied the Pareto efficiency concept to gauge the equilibrium of evaluation metrics in each cancer case-study. 
Affinity Network Fusion, Integrative Non-negative Matrix Factorization, and Multiple Kernel K-Means with linear or Pathway Induced Kernels were the most stable and effective in discerning groups with significantly different survival outcomes in several case studies.
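The Pareto efficiency concept described in this abstract can be illustrated with a short sketch. It is not the COPS implementation. It assumes every metric is oriented so that higher is better, and the method names and scores are made-up placeholders rather than results from the study.

```python
def dominates(p, q):
    """True if score vector p is at least as good as q everywhere and better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical scores per method: (ARI, stability, survival -log10 p-value).
scores = {
    "method_A": (0.6, 0.9, 2.0),
    "method_B": (0.7, 0.8, 1.5),
    "method_C": (0.5, 0.7, 1.0),
}
front = pareto_front(scores)
print(front)
```

Here `method_C` is dropped because `method_A` matches or beats it on every metric, while `method_A` and `method_B` both remain because each wins on at least one criterion. This is why a Pareto front suits multi-objective comparison: it avoids collapsing the metrics into one arbitrary weighted score.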
Affiliation(s)
- Teemu J. Rintala
- Institute of Biomedicine, School of Medicine, University of Eastern Finland, Kuopio, Finland
- Vittorio Fortino
- Institute of Biomedicine, School of Medicine, University of Eastern Finland, Kuopio, Finland
8
Yang L, Yuan L, Liu G. Comprehensive evaluation of disulfidptosis in intestinal immunity and biologic therapy response in Ulcerative Colitis. Heliyon 2024; 10:e34516. [PMID: 39148969 PMCID: PMC11324823 DOI: 10.1016/j.heliyon.2024.e34516] [Received: 03/04/2024] [Revised: 07/10/2024] [Accepted: 07/10/2024] [Indexed: 08/17/2024]
Abstract
Objective Ulcerative Colitis (UC) manifests as a chronic inflammatory condition of the intestines, marked by ongoing immune system dysregulation. Disulfidptosis, a newly identified cell death mechanism, is intimately linked to the onset and advancement of inflammation. However, the role of disulfidptosis in UC remains unclear. Methods We screened differentially expressed genes (DEGs) associated with disulfidptosis in multiple UC datasets, narrowed down the number of target genes using lasso regression, conducted immune infiltration analysis, and constructed a clinical diagnostic model. Additionally, we explored the association between disulfidptosis-related key genes and disease remission in UC patients receiving biologic therapy. Finally, we confirmed the expression of key genes in FHC cells and UC tissue samples. Results In the differential analysis, we identified 20 DEGs associated with disulfidptosis. Immune infiltration results revealed that five genes (PDLIM1, SLC7A11, MYH10, NUBPL, OXSM) exhibited strong correlations with immune cells and pathways. Using GO, KEGG and WGCNA analyses, we discovered that gene modules highly correlated with disulfidptosis-related gene expression were significantly enriched in inflammation-related pathways. Additionally, we developed a nomogram based on these five immune-related disulfidptosis genes for UC diagnosis, showing robust diagnostic capability and clinical efficacy. Kaplan-Meier survival analysis revealed a significant link between changes in the expression levels of these genes and disease remission in UC patients receiving biologic therapy. In line with previous studies, similar expression changes of the target genes were seen in both UC cell models and tissue samples. Conclusions This study utilized bioinformatic analysis and machine learning to identify and analyze features associated with disulfidptosis in multiple UC datasets. 
This enhances our comprehension of the role disulfidptosis plays in intestinal immunity and inflammation in UC, providing new perspectives for developing innovative treatments for UC.
Affiliation(s)
- Lichao Yang
- Department of General Surgery, The Second Xiangya Hospital of Central South University, 410011, Changsha, China
- Lianwen Yuan
- Department of General Surgery, The Second Xiangya Hospital of Central South University, 410011, Changsha, China
- Ganglei Liu
- Department of General Surgery, The Second Xiangya Hospital of Central South University, 410011, Changsha, China
9
Ji Q, Zheng Y, Zhou L, Chen F, Li W. Unveiling divergent treatment prognoses in IDHwt-GBM subtypes through multiomics clustering: a swift dual MRI-mRNA model for precise subtype prediction. J Transl Med 2024; 22:578. [PMID: 38890658 PMCID: PMC11186189 DOI: 10.1186/s12967-024-05401-6] [Received: 04/13/2024] [Accepted: 06/13/2024] [Indexed: 06/20/2024]
Abstract
BACKGROUND IDH1-wildtype glioblastoma multiforme (IDHwt-GBM) is a highly heterogeneous and aggressive brain tumour characterised by a dismal prognosis and significant challenges in accurately predicting patient outcomes. To address these issues and personalise treatment approaches, we aimed to develop and validate robust multiomics molecular subtypes of IDHwt-GBM. Through this, we sought to uncover the distinct molecular signatures underlying these subtypes, paving the way for improved diagnosis and targeted therapy for this challenging disease. METHODS To identify stable molecular subtypes among 184 IDHwt-GBM patients from TCGA, we used the consensus clustering method to consolidate the results from ten advanced multiomics clustering approaches based on mRNA, lncRNA, and mutation data. We developed subtype prediction models using the PAM and machine learning algorithms based on mRNA and MRI data for enhanced clinical utility. These models were validated in five independent datasets, and an online interactive system was created. We conducted a comprehensive assessment of the clinical impact, drug treatment response, and molecular associations of the IDHwt-GBM subtypes. RESULTS In the TCGA cohort, two molecular subtypes, class 1 and class 2, were identified through multiomics clustering of IDHwt-GBM patients. There was a significant difference in survival between class 1 and class 2 patients, with a hazard ratio (HR) of 1.68 [1.15-2.47]. This difference was validated in other datasets (CGGA: HR = 1.75 [1.04-2.94]; CPTAC: HR = 1.79 [1.09-2.91]; GLASS: HR = 1.66 [1.09-2.54]; UCSF: HR = 1.33 [1.00-1.77]; UPENN: HR = 1.29 [1.04-1.58]). Additionally, class 2 was more sensitive to treatment with radiotherapy combined with temozolomide, and this sensitivity was validated in the GLASS cohort. Correspondingly, class 2 and class 1 exhibited significant differences in mutation patterns, enriched pathways, programmed cell death (PCD), and the tumour immune microenvironment. 
Class 2 had more mutation signatures associated with defective DNA mismatch repair (P = 0.0021). Enriched pathways of differentially expressed genes in class 1 and class 2 (P-adjust < 0.05) were mainly related to ferroptosis, the PD-1 checkpoint pathway, the JAK-STAT signalling pathway, and other programmed cell death and immune-related pathways. The different cell death modes and immune microenvironments were validated across multiple datasets. Finally, our developed survival prediction model, which integrates molecular subtypes, age, and sex, demonstrated clinical benefits based on the decision curve in the test set. We deployed the molecular subtyping prediction model and survival prediction model online, allowing interactive use and facilitating user convenience. CONCLUSIONS Molecular subtypes were identified and verified through multiomics clustering in IDHwt-GBM patients. These subtypes are linked to specific mutation patterns, the immune microenvironment, prognoses, and treatment responses.
Affiliation(s)
- Qiang Ji
  - Department of Neuro-Oncology, Cancer Center, China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - National Institute for Data Science in Health and Medicine, Capital Medical University, Beijing, China
- Yi Zheng
  - Department of Neuro-Oncology, Cancer Center, China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing, China
- Lili Zhou
  - Department of Neuro-Oncology, Cancer Center, China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Feng Chen
  - Department of Neuro-Oncology, Cancer Center, China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.
- Wenbin Li
  - Department of Neuro-Oncology, Cancer Center, China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.
  - National Institute for Data Science in Health and Medicine, Capital Medical University, Beijing, China.
|
10
|
Soeorg H, Kalamees R, Lutsar I, Metsvaht T. Subgroup identification-based model selection to improve the predictive performance of individualized dosing. J Pharmacokinet Pharmacodyn 2024; 51:253-263. [PMID: 38400995 DOI: 10.1007/s10928-024-09909-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 02/13/2024] [Indexed: 02/26/2024]
Abstract
Currently, model-informed precision dosing uses one population pharmacokinetic model that best fits the target population. We aimed to develop a subgroup identification-based model selection approach to improve the predictive performance of individualized dosing, using vancomycin in neonates/infants as a test case. Data from neonates/infants with at least one vancomycin concentration were randomly divided into training and test datasets. Population predictions from published vancomycin population pharmacokinetic models were calculated. The single best-performing model based on various performance metrics, including median absolute percentage error (APE) and percentage of predictions within 20% (P20) or 60% (P60) of measurement, was determined. Clustering based on median APEs or clinical and demographic characteristics and model selection by genetic algorithm were used to group neonates/infants according to their best-performing model. Subsequently, classification trees to predict the best-performing model using clinical and demographic characteristics were developed. A total of 208 vancomycin treatment episodes in the training dataset and 88 in the test dataset were included. Of 30 identified models from the literature, the single best-performing model for the training dataset had P20 26.2-42.6% in the test dataset. The best-performing clustering approach based on median APEs or clinical and demographic characteristics and model selection by genetic algorithm had P20 44.1-45.5% in the test dataset, whereas P60 was comparable. Our proof-of-concept study shows that the prediction of the best-performing model for each patient according to the proposed model selection approaches has the potential to improve the predictive performance of model-informed precision dosing compared with the single best-performing model approach.
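The performance metrics named in this abstract (median APE, P20, P60) can be illustrated with a minimal sketch. The concentrations below are hypothetical, the function names are illustrative, and treating "within 20%" as an APE of at most 20 is an assumption; the study's exact definitions may differ.

```python
from statistics import median


def absolute_percentage_errors(predicted, measured):
    """Return |predicted - measured| / measured * 100 for each pair."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    return [abs(p - m) / m * 100.0 for p, m in zip(predicted, measured)]


def percent_within(apes, threshold):
    """Percentage of predictions whose APE is at most `threshold` percent."""
    return 100.0 * sum(a <= threshold for a in apes) / len(apes)


# Hypothetical concentrations (mg/L), for illustration only.
measured = [10.0, 20.0, 15.0, 8.0]
predicted = [11.0, 30.0, 15.0, 4.0]

apes = absolute_percentage_errors(predicted, measured)
summary = {
    "median_ape": median(apes),
    "p20": percent_within(apes, 20.0),
    "p60": percent_within(apes, 60.0),
}
print(summary)
```

A lower median APE and a higher P20 or P60 both indicate closer agreement between model predictions and measured concentrations.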
Affiliation(s)
- Hiie Soeorg
  - Department of Microbiology, University of Tartu, Ravila 19, Tartu, 50411, Estonia.
- Riste Kalamees
  - Department of Microbiology, University of Tartu, Ravila 19, Tartu, 50411, Estonia
- Irja Lutsar
  - Department of Microbiology, University of Tartu, Ravila 19, Tartu, 50411, Estonia
- Tuuli Metsvaht
  - Department of Microbiology, University of Tartu, Ravila 19, Tartu, 50411, Estonia
  - Pediatric Intensive Care Unit, Tartu University Hospital, Puusepa 8, Tartu, 50406, Estonia
|
11
|
Novoloaca A, Broc C, Beloeil L, Yu WH, Becker J. Comparative analysis of integrative classification methods for multi-omics data. Brief Bioinform 2024; 25:bbae331. [PMID: 38985929 PMCID: PMC11234228 DOI: 10.1093/bib/bbae331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 05/31/2024] [Indexed: 07/12/2024] Open
Abstract
Recent advances in sequencing, mass spectrometry, and cytometry technologies have enabled researchers to collect multiple 'omics data types from a single sample. These large datasets have led to a growing consensus that a holistic approach is needed to identify new candidate biomarkers and unveil mechanisms underlying disease etiology, a key to precision medicine. While many reviews and benchmarks have been conducted on unsupervised approaches, their supervised counterparts have received less attention in the literature and no gold standard has emerged yet. In this work, we present a thorough comparison of a selection of six methods, representative of the main families of intermediate integrative approaches (matrix factorization, multiple kernel methods, ensemble learning, and graph-based methods). As a non-integrative control, random forest was performed on concatenated and separated data types. Methods were evaluated for classification performance on both simulated and real-world datasets, the latter being carefully selected to cover different medical applications (infectious diseases, oncology, and vaccines) and data modalities. A total of 15 simulation scenarios were designed from the real-world datasets to explore a large and realistic parameter space (e.g. sample size, dimensionality, class imbalance, effect size). On real data, the method comparison showed that integrative approaches performed better than or as well as their non-integrative counterpart. By contrast, DIABLO and the four random forest alternatives outperformed the others across the majority of simulation scenarios. The strengths and limitations of these methods are discussed in detail, as well as guidelines for future applications.
Affiliation(s)
- Alexei Novoloaca
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Camilo Broc
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Laurent Beloeil
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
- Wen-Han Yu
  - Bill & Melinda Gates Medical Research Institute, Cambridge, Massachusetts, MA 02139, United States
- Jérémie Becker
  - BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
|
12
|
Brooks TG, Lahens NF, Mrčela A, Grant GR. Challenges and best practices in omics benchmarking. Nat Rev Genet 2024; 25:326-339. [PMID: 38216661 DOI: 10.1038/s41576-023-00679-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/14/2023] [Indexed: 01/14/2024]
Abstract
Technological advances enabling massively parallel measurement of biological features - such as microarrays, high-throughput sequencing and mass spectrometry - have ushered in the omics era, now in its third decade. The resulting complex landscape of analytical methods has naturally fostered the growth of an omics benchmarking industry. Benchmarking refers to the process of objectively comparing and evaluating the performance of different computational or analytical techniques when processing and analysing large-scale biological data sets, such as transcriptomics, proteomics and metabolomics. With thousands of omics benchmarking studies published over the past 25 years, the field has matured to the point where the foundations of benchmarking have been established and well described. However, generating meaningful benchmarking data and properly evaluating performance in this complex domain remains challenging. In this Review, we highlight some common oversights and pitfalls in omics benchmarking. We also establish a methodology to bring the issues that can be addressed into focus and to be transparent about those that cannot: this takes the form of a spreadsheet template of guidelines for comprehensive reporting, intended to accompany publications. In addition, a survey of recent developments in benchmarking is provided as well as specific guidance for commonly encountered difficulties.
Affiliation(s)
- Thomas G Brooks
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Nicholas F Lahens
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Antonijo Mrčela
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Gregory R Grant
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA.
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA.
|
13
|
Shannon CP, Lee AH, Tebbutt SJ, Singh A. A Commentary on Multi-omics Data Integration in Systems Vaccinology. J Mol Biol 2024; 436:168522. [PMID: 38458605 DOI: 10.1016/j.jmb.2024.168522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 03/04/2024] [Accepted: 03/04/2024] [Indexed: 03/10/2024]
Affiliation(s)
- Amy Hy Lee
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada
- Scott J Tebbutt
  - PROOF Centre of Excellence, Vancouver, Canada; Department of Medicine, The University of British Columbia, Vancouver, Canada; Centre for Heart Lung Innovation, Vancouver, Canada
- Amrit Singh
  - Centre for Heart Lung Innovation, Vancouver, Canada; Department of Anesthesiology, Pharmacology and Therapeutics, The University of British Columbia, Vancouver, Canada.
|
14
|
Williams A. Multiomics data integration, limitations, and prospects to reveal the metabolic activity of the coral holobiont. FEMS Microbiol Ecol 2024; 100:fiae058. [PMID: 38653719 PMCID: PMC11067971 DOI: 10.1093/femsec/fiae058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 03/25/2024] [Accepted: 04/22/2024] [Indexed: 04/25/2024] Open
Abstract
Since their radiation in the Middle Triassic period ∼240 million years ago, stony corals have survived past climate fluctuations and five mass extinctions. Their long-term survival underscores the inherent resilience of corals, particularly when considering the nutrient-poor marine environments in which they have thrived. However, coral bleaching has emerged as a global threat to coral survival, requiring rapid advancements in coral research to understand holobiont stress responses and allow for interventions before extensive bleaching occurs. This review encompasses the potential, as well as the limits, of multiomics data applications when applied to the coral holobiont. Synopses for how different omics tools have been applied to date and their current restrictions are discussed, in addition to ways these restrictions may be overcome, such as recruiting new technology to studies, utilizing novel bioinformatics approaches, and generally integrating omics data. Lastly, this review presents considerations for the design of holobiont multiomics studies to support lab-to-field advancements of coral stress marker monitoring systems. Although much of the bleaching mechanism has eluded investigation to date, multiomic studies have already produced key findings regarding the holobiont's stress response, and have the potential to advance the field further.
Affiliation(s)
- Amanda Williams
  - Microbial Biology Graduate Program, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08901, United States
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08901, United States
|
15
|
Liu W, Pratte KA, Castaldi PJ, Hersh C, Bowler RP, Banaei-Kashani F, Kechris KJ. A Generalized Higher-order Correlation Analysis Framework for Multi-Omics Network Inference. bioRxiv 2024:2024.01.22.576667. [PMID: 38328226 PMCID: PMC10849540 DOI: 10.1101/2024.01.22.576667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Multiple -omics (genomics, proteomics, etc.) profiles are commonly generated to gain insight into a disease or physiological system. Constructing multi-omics networks with respect to the trait(s) of interest provides an opportunity to understand relationships between molecular features but integration is challenging due to multiple data sets with high dimensionality. One approach is to use canonical correlation to integrate one or two omics types and a single trait of interest. However, these types of methods may be limited due to (1) not accounting for higher-order correlations existing among features, (2) computational inefficiency when extending to more than two omics data when using a penalty term-based sparsity method, and (3) lack of flexibility for focusing on specific correlations (e.g., omics-to-phenotype correlation versus omics-to-omics correlations). In this work, we have developed a novel multi-omics network analysis pipeline called Sparse Generalized Tensor Canonical Correlation Analysis Network Inference (SGTCCA-Net) that can effectively overcome these limitations. We also introduce an implementation to improve the summarization of networks for downstream analyses. Simulation and real-data experiments demonstrate the effectiveness of our novel method for inferring omics networks and features of interest.
Affiliation(s)
- Weixuan Liu
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Peter J. Castaldi
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, United States
- Craig Hersh
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, United States
- Russell P. Bowler
  - Division of Pulmonary Medicine, Department of Medicine, National Jewish Health, Denver, CO, USA
- Farnoush Banaei-Kashani
  - Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO, USA
- Katerina J. Kechris
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
|
16
|
Bodinier B, Vuckovic D, Rodrigues S, Filippi S, Chiquet J, Chadeau-Hyam M. Automated calibration of consensus weighted distance-based clustering approaches using sharp. Bioinformatics 2023; 39:btad635. [PMID: 37847776 PMCID: PMC10627366 DOI: 10.1093/bioinformatics/btad635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 09/24/2023] [Accepted: 10/16/2023] [Indexed: 10/19/2023] Open
Abstract
MOTIVATION In consensus clustering, a clustering algorithm is used in combination with a subsampling procedure to detect stable clusters. Previous studies on both simulated and real data suggest that consensus clustering outperforms native algorithms. RESULTS We extend here consensus clustering to allow for attribute weighting in the calculation of pairwise distances using existing regularized approaches. We propose a procedure for the calibration of the number of clusters (and regularization parameter) by maximizing the sharp score, a novel stability score calculated directly from consensus clustering outputs, making it extremely computationally competitive. Our simulation study shows better clustering performances of (i) approaches calibrated by maximizing the sharp score compared to existing calibration scores and (ii) weighted compared to unweighted approaches in the presence of features that do not contribute to cluster definition. Application on real gene expression data measured in lung tissue reveals clear clusters corresponding to different lung cancer subtypes. AVAILABILITY AND IMPLEMENTATION The R package sharp (version ≥1.4.3) is available on CRAN at https://CRAN.R-project.org/package=sharp.
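The subsampling idea behind consensus clustering, as described in this abstract, can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions and not the sharp package: the one-dimensional k-means helper, the function names, and the toy data are all hypothetical, and only the consensus (co-membership) matrix is computed. The sharp score and attribute weighting are not reproduced.

```python
import random
from itertools import combinations


def kmeans_1d(values, k, rng, iters=20):
    """Tiny Lloyd's k-means on scalars; returns one label per value."""
    centers = rng.sample(values, k)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def consensus_matrix(values, k, n_subsamples=50, fraction=0.8, seed=0):
    """Fraction of subsamples in which each pair of items shares a cluster,
    counted only over subsamples that contain both items."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(fraction * n))
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), size)
        labels = kmeans_1d([values[i] for i in idx], k, rng)
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            sampled[a][b] += 1
            sampled[b][a] += 1
            if la == lb:
                together[a][b] += 1
                together[b][a] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical data: two well-separated groups of three items each.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
consensus = consensus_matrix(values, k=2)
print(round(consensus[0][1], 2), round(consensus[0][3], 2))
```

Entries near 1 mark pairs that are consistently co-clustered and entries near 0 mark pairs that are consistently separated. A stability score such as the one the abstract describes would then be derived from a matrix of this kind.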
Affiliation(s)
- Barbara Bodinier
  - Department of Epidemiology and Biostatistics, Imperial College London, Norfolk place, London W2 1PG, United Kingdom
- Dragana Vuckovic
  - Department of Epidemiology and Biostatistics, Imperial College London, Norfolk place, London W2 1PG, United Kingdom
- Sabrina Rodrigues
  - Department of Epidemiology and Biostatistics, Imperial College London, Norfolk place, London W2 1PG, United Kingdom
- Sarah Filippi
  - Department of Mathematics, Imperial College London, London SW7 2RH, United Kingdom
- Julien Chiquet
  - UMR MIA Paris-Saclay, AgroParisTech/INRAE, Palaiseau 91123, France
- Marc Chadeau-Hyam
  - Department of Epidemiology and Biostatistics, Imperial College London, Norfolk place, London W2 1PG, United Kingdom
|
17
|
Chen C, Wang J, Pan D, Wang X, Xu Y, Yan J, Wang L, Yang X, Yang M, Liu G. Applications of multi-omics analysis in human diseases. MedComm (Beijing) 2023; 4:e315. [PMID: 37533767 PMCID: PMC10390758 DOI: 10.1002/mco2.315] [Citation(s) in RCA: 56] [Impact Index Per Article: 28.0] [Received: 02/22/2023] [Revised: 05/25/2023] [Accepted: 05/31/2023] [Indexed: 08/04/2023] Open
Abstract
Multi-omics usually refers to the combined application of multiple high-throughput screening technologies, represented by genomics, transcriptomics, single-cell transcriptomics, proteomics, metabolomics, spatial transcriptomics, and so on, which play a great role in promoting the study of human diseases. Most current reviews focus on describing the development of multi-omics technologies, data integration, and application to a particular disease; however, few of them provide a comprehensive and systematic introduction to multi-omics. This review outlines the existing technical categories of multi-omics and cautions for experimental design; focuses on integrated analysis methods for multi-omics, especially machine learning and deep learning approaches to multi-omics data integration and the corresponding tools; covers the application of multi-omics in medical research (e.g., cancer, neurodegenerative diseases, aging, and drug target discovery) as well as the corresponding open-source analysis tools and databases; and finally discusses the challenges and future directions of multi-omics integration and application in precision medicine. With the development of high-throughput technologies and data integration algorithms, single-cell multi-omics and spatial multi-omics, as important directions of multi-omics for future disease research, are also introduced in detail. This review will provide important guidance for researchers, especially those who are just entering multi-omics medical research.
Affiliation(s)
- Chongyang Chen
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
  - Co-innovation Center of Neurodegeneration, Nantong University, Nantong, China
- Jing Wang
  - Shenzhen Key Laboratory of Modern Toxicology, Shenzhen Medical Key Discipline of Health Toxicology (2020–2024), Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Donghui Pan
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Xinyu Wang
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Yuping Xu
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Junjie Yan
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Lizhen Wang
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Xifei Yang
  - Shenzhen Key Laboratory of Modern Toxicology, Shenzhen Medical Key Discipline of Health Toxicology (2020–2024), Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Min Yang
  - Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Gong-Ping Liu
  - Co-innovation Center of Neurodegeneration, Nantong University, Nantong, China
  - Department of Pathophysiology, School of Basic Medicine, Key Laboratory of Ministry of Education of China and Hubei Province for Neurological Disorders, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
|
18
|
O'Connor LM, O'Connor BA, Lim SB, Zeng J, Lo CH. Integrative multi-omics and systems bioinformatics in translational neuroscience: A data mining perspective. J Pharm Anal 2023; 13:836-850. [PMID: 37719197 PMCID: PMC10499660 DOI: 10.1016/j.jpha.2023.06.011] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 11/29/2022] [Revised: 06/20/2023] [Accepted: 06/25/2023] [Indexed: 09/19/2023] Open
Abstract
Bioinformatic analysis of large and complex omics datasets has become increasingly useful in modern day biology by providing a great depth of information, with its application to neuroscience termed neuroinformatics. Data mining of omics datasets has enabled the generation of new hypotheses based on differentially regulated biological molecules associated with disease mechanisms, which can be tested experimentally for improved diagnostic and therapeutic targeting of neurodegenerative diseases. Importantly, integrating multi-omics data using a systems bioinformatics approach will advance the understanding of the layered and interactive network of biological regulation that exchanges systemic knowledge to facilitate the development of a comprehensive human brain profile. In this review, we first summarize data mining studies utilizing datasets from the individual type of omics analysis, including epigenetics/epigenomics, transcriptomics, proteomics, metabolomics, lipidomics, and spatial omics, pertaining to Alzheimer's disease, Parkinson's disease, and multiple sclerosis. We then discuss multi-omics integration approaches, including independent biological integration and unsupervised integration methods, for more intuitive and informative interpretation of the biological data obtained across different omics layers. We further assess studies that integrate multi-omics in data mining which provide convoluted biological insights and offer proof-of-concept proposition towards systems bioinformatics in the reconstruction of brain networks. Finally, we recommend a combination of high dimensional bioinformatics analysis with experimental validation to achieve translational neuroscience applications including biomarker discovery, therapeutic development, and elucidation of disease mechanisms. 
We conclude by providing future perspectives and opportunities in applying integrative multi-omics and systems bioinformatics to achieve precision phenotyping of neurodegenerative diseases and towards personalized medicine.
Affiliation(s)
- Lance M. O'Connor
  - College of Biological Sciences, University of Minnesota, Minneapolis, MN, 55455, USA
- Blake A. O'Connor
  - School of Pharmacy, University of Wisconsin, Madison, WI, 53705, USA
- Su Bin Lim
  - Department of Biochemistry and Molecular Biology, Ajou University School of Medicine, Suwon, 16499, South Korea
- Jialiu Zeng
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
- Chih Hung Lo
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
|
19
|
Park J, Lee JW, Park M. Comparison of cancer subtype identification methods combined with feature selection methods in omics data analysis. BioData Min 2023; 16:18. [PMID: 37420304 DOI: 10.1186/s13040-023-00334-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 06/30/2023] [Indexed: 07/09/2023] Open
Abstract
BACKGROUND Cancer subtype identification is important for the early diagnosis of cancer and the provision of adequate treatment. Prior to identifying the subtype of cancer in a patient, feature selection is also crucial for reducing the dimensionality of the data by detecting genes that contain important information about the cancer subtype. Numerous cancer subtyping methods have been developed, and their performance has been compared. However, combinations of feature selection and subtype identification methods have rarely been considered. This study aimed to identify the best combination of variable selection and subtype identification methods in single omics data analysis. RESULTS Combinations of six filter-based methods and six unsupervised subtype identification methods were investigated using The Cancer Genome Atlas (TCGA) datasets for four cancers. The number of features selected varied, and several evaluation metrics were used. Although no single combination was found to have a distinctively good performance, Consensus Clustering (CC) and Neighborhood-Based Multi-omics Clustering (NEMO) used with variance-based feature selection had a tendency to show lower p-values, and nonnegative matrix factorization (NMF) stably showed good performance in many cases unless the Dip test was used for feature selection. In terms of accuracy, the combination of NMF and similarity network fusion (SNF) with Monte Carlo Feature Selection (MCFS) and Minimum-Redundancy Maximum Relevance (mRMR) showed good overall performance. NMF always showed among the worst performances without feature selection in all datasets, but performed much better when used with various feature selection methods. iClusterBayes (ICB) had decent performance when used without feature selection. CONCLUSIONS Rather than a single method clearly emerging as optimal, the best methodology was different depending on the data used, the number of features selected, and the evaluation method. 
A guideline for choosing the best combination method under various situations is provided.
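Of the filter-based methods mentioned in this abstract, variance-based feature selection is the simplest to illustrate. The sketch below is a minimal pure-Python version under stated assumptions: the function name, gene names, and expression values are hypothetical, and the study's actual filtering thresholds are not given here.

```python
from statistics import pvariance


def top_variance_features(matrix, feature_names, n_keep):
    """Rank features (columns of a samples x features matrix) by population
    variance and return the names of the n_keep most variable ones."""
    columns = list(zip(*matrix))
    variances = {name: pvariance(col) for name, col in zip(feature_names, columns)}
    ranked = sorted(feature_names, key=lambda name: variances[name], reverse=True)
    return ranked[:n_keep]


# Hypothetical expression values: 3 samples (rows) x 4 genes (columns).
expression = [
    [1.0, 5.0, 2.0, 0.0],
    [1.1, 9.0, 2.0, 5.0],
    [0.9, 1.0, 2.0, 10.0],
]
genes = ["geneA", "geneB", "geneC", "geneD"]

selected = top_variance_features(expression, genes, n_keep=2)
print(selected)
```

The reduced feature set would then be passed to a subtype identification method; features that barely vary across samples (here the constant `geneC`) are dropped because they cannot help separate clusters.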
Affiliation(s)
- JiYoon Park
  - Department of Statistics, Korea University, 145 Anam-Ro, Seongbuk-Gu, Seoul, 02841, South Korea
- Jae Won Lee
  - Department of Statistics, Korea University, 145 Anam-Ro, Seongbuk-Gu, Seoul, 02841, South Korea
- Mira Park
  - Department of Preventive Medicine, Eulji University, 77 Gyeryong-Ro, Jung-Gu, Daejeon, 34824, South Korea.
|
20
|
Chen Y, Meng J, Lu X, Li X, Wang C. Clustering analysis revealed the autophagy classification and potential autophagy regulators' sensitivity of pancreatic cancer based on multi-omics data. Cancer Med 2023; 12:733-746. [PMID: 35684936 PMCID: PMC9844610 DOI: 10.1002/cam4.4932] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/07/2022] [Revised: 05/06/2022] [Accepted: 05/24/2022] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND Pancreatic ductal adenocarcinoma (PDAC) is a lethal malignancy that is unresponsive to conventional therapeutic modalities because of its high heterogeneity, underscoring the necessity and priority of searching for effective biomarkers and drugs. Autophagy, an evolutionarily conserved biological process, is upregulated in PDAC, and its regulation is linked to a poor prognosis. Increased autophagy sequesters MHC-I on PDAC cells and weakens antigen presentation and the antitumor immune response, indicating the therapeutic potential of autophagy inhibitors. METHODS By applying 10 state-of-the-art multi-omics clustering algorithms, we constructed a robust PDAC classification model to reveal the autophagy-related genes among different subgroups. OUTCOMES After building a more comprehensive regulatory network for exploring potential autophagy regulators, we identified the top 20 autophagy-related hub genes (GAPDH, MAPK3, RHEB, SQSTM1, EIF2S1, RAB5A, CTSD, MAP1LC3B, RAB7A, RAB11A, FADD, CDKN2A, HSP90AB1, VEGFA, RELA, DDIT3, HSPA5, BCL2L1, BAG3, and ERBB2), six miRNAs, five transcription factors, and five immune infiltrated cells as biomarkers. The drug sensitivity database was screened based on the biomarkers to predict possible drug-targeting signal pathways, with the aim of yielding novel insights and promoting the progress of anticancer therapeutic strategies. CONCLUSION We successfully constructed an autophagy-related mRNA/miRNA/TF/immune cell network based on a multi-omics analysis with 10 state-of-the-art algorithms, and screened the drug sensitivity dataset to detect potential signal pathways that might be targets of autophagy modulators.
Collapse
Affiliation(s)
- Yonghao Chen
- Department of Gastroenterology, West China Hospital of Sichuan University, Chengdu, Sichuan, P.R. China
| | - Jialin Meng
- Department of Urology, The First Affiliated Hospital of Anhui Medical University, Hefei, P.R. China
- Institute of Urology, Anhui Medical University, Hefei, P.R. China
- Anhui Province Key Laboratory of Genitourinary Diseases, Anhui Medical University, Hefei, P.R. China
| | - Xiaofan Lu
- State Key Laboratory of Natural Medicines, Research Center of Biostatistics and Computational Pharmacy, China Pharmaceutical University, Nanjing, P.R. China
| | - Xiao Li
- Department of Gastroenterology, West China Hospital of Sichuan University, Chengdu, Sichuan, P.R. China
| | - Chunhui Wang
- Department of Gastroenterology, West China Hospital of Sichuan University, Chengdu, Sichuan, P.R. China
|
21
|
Ni Y, He J, Chalise P. Randomized singular value decomposition for integrative subtype analysis of 'omics data' using non-negative matrix factorization. Stat Appl Genet Mol Biol 2023; 22:sagmb-2022-0047. [PMID: 37937887 DOI: 10.1515/sagmb-2022-0047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Accepted: 09/25/2023] [Indexed: 11/09/2023]
Abstract
Integration of multiple 'omics datasets for differentiating cancer subtypes is a powerful technique that leverages the consistent and complementary information across multi-omics data. Matrix factorization is a common technique used in integrative clustering for identifying latent subtype structure across multi-omics data. High dimensionality of the omics data and long computation time have been common challenges of clustering methods. To address these challenges, we propose randomized singular value decomposition (RSVD) for integrative clustering using Non-negative Matrix Factorization: intNMF-rsvd. The method utilizes RSVD to reduce the dimensionality by projecting the data into an eigenvector space with a user-specified lower rank. Then, clustering analysis is carried out by estimating a common basis matrix across the projected multi-omics datasets. The performance of the proposed method was assessed using simulated datasets and compared with six state-of-the-art integrative clustering methods using real-life datasets from The Cancer Genome Atlas Study. intNMF-rsvd was found to work efficiently and competitively compared with standard intNMF and other multi-omics clustering methods. Most importantly, intNMF-rsvd can handle a large number of features and significantly reduce the computation time. The identified subtypes can be utilized for further clinical association studies to understand the etiology of the disease.
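The dimensionality-reduction idea in this abstract can be illustrated with a minimal sketch. It is not the authors' intNMF-rsvd code: it covers only the randomized range-finding stage that underlies RSVD (random projection followed by orthonormalization), and it omits both the SVD of the small projected matrix and the NMF step. The function names, the `oversample` parameter, and the samples-by-features layout are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def orthonormal_columns(y, tol=1e-10):
    """Modified Gram-Schmidt on the columns of y; near-zero columns are dropped."""
    basis = []
    for col in zip(*y):
        v = list(col)
        for q in basis:
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return transpose(basis)  # each column is one orthonormal vector


def randomized_feature_projection(x, rank, oversample=5, seed=0):
    """Project a samples-by-features matrix onto a random low-rank feature basis.

    Returns (q, reduced): q is features-by-k with orthonormal columns,
    and reduced = x q is the samples-by-k representation.
    """
    rng = random.Random(seed)
    n_samples = len(x)
    # Gaussian test matrix: one row per sample, rank + oversample columns.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(rank + oversample)]
             for _ in range(n_samples)]
    y = matmul(transpose(x), omega)  # sampled directions in feature space
    q = orthonormal_columns(y)
    return q, matmul(x, q)
```

When the input has exact rank r, the basis returned here has r columns and `reduced` multiplied by the transpose of `q` reproduces the input up to rounding error; for noisy data the approximation quality depends on how quickly the singular values decay.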
Affiliation(s)
- Yonghui Ni
- Department of Biostatistics and Data Science, University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS 66160, USA
| | - Jianghua He
- Department of Biostatistics and Data Science, University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS 66160, USA
| | - Prabhakar Chalise
- Department of Biostatistics and Data Science, University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS 66160, USA
|
22
|
Athieniti E, Spyrou GM. A guide to multi-omics data collection and integration for translational medicine. Comput Struct Biotechnol J 2022; 21:134-149. [PMID: 36544480 PMCID: PMC9747357 DOI: 10.1016/j.csbj.2022.11.050] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 06/23/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/02/2022]
Abstract
The emerging high-throughput technologies have led to the shift in the design of translational medicine projects towards collecting multi-omics patient samples and, consequently, their integrated analysis. However, the complexity of integrating these datasets has triggered new questions regarding the appropriateness of the available computational methods. Currently, there is no clear consensus on the best combination of omics to include and the data integration methodologies required for their analysis. This article aims to guide the design of multi-omics studies in the field of translational medicine regarding the types of omics and the integration method to choose. We review articles that perform the integration of multiple omics measurements from patient samples. We identify five objectives in translational medicine applications: (i) detect disease-associated molecular patterns, (ii) subtype identification, (iii) diagnosis/prognosis, (iv) drug response prediction, and (v) understand regulatory processes. We describe common trends in the selection of omic types combined for different objectives and diseases. To guide the choice of data integration tools, we group them into the scientific objectives they aim to address. We describe the main computational methods adopted to achieve these objectives and present examples of tools. We compare tools based on how they deal with the computational challenges of data integration and comment on how they perform against predefined objective-specific evaluation criteria. Finally, we discuss examples of tools for downstream analysis and further extraction of novel insights from multi-omics datasets.
Affiliation(s)
- Efi Athieniti
- Department of Bioinformatics, The Cyprus Institute of Neurology and Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
| | - George M. Spyrou
- Department of Bioinformatics, The Cyprus Institute of Neurology and Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
|
23
|
Zou Y, Cao C, Wang Y, Zhou Y, Yao S, Zhang L, Zheng K, Zhang H, Qin W, Qin K, Xiong H, Yuan X, Fu S, Wang Y, Xiong H. Multi-omics consensus portfolio to refine the classification of lung adenocarcinoma with prognostic stratification, tumor microenvironment, and unique sensitivity to first-line therapies. Transl Lung Cancer Res 2022; 11:2243-2260. [PMID: 36519025 PMCID: PMC9742627 DOI: 10.21037/tlcr-22-775] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/09/2022] [Accepted: 11/21/2022] [Indexed: 09/09/2023]
Abstract
BACKGROUND Molecular classification of lung adenocarcinoma (LUAD) based on transcriptomic features has been widely studied. The complementarity of data obtained from multilayer molecular biology could help LUAD classification by combining multi-omics information. METHODS We divided samples from The Cancer Genome Atlas (TCGA) (n=437) into four subtypes (CS1, CS2, CS3 and CS4) using 10 comprehensive multi-omics clustering methods in the "movics" R package. External validation sets from different sequencing technologies confirmed the robustness of the grouping model. The relationship between subtypes, prognosis, molecular features, tumor microenvironment and response to first-line therapy was further analyzed. Next, we used univariate Cox regression analysis and Lasso regression analysis to explore the application of biomarkers in clinical prognosis and constructed a prognostic model. RESULTS CS1 showed the worst overall survival (OS) among all four clusters, possibly related to its poor immune infiltration, higher tumor mutation and worse chromosomal stability. Patients in different subtypes differed significantly in cancer stem cell characteristics, activation of cancer-related pathways, and sensitivity to chemotherapy and immunotherapy. The prognostic model showed good predictive performance. The 1-, 2- and 3-year areas under the curve of risk score were 0.779, 0.742 and 0.678, respectively. Seven genes (DKK1, TSPAN7, ID1, DLGAP5, HHIPL2, CD40 and SEMA3C) used to build the model may be potential therapeutic targets for LUAD. CONCLUSIONS Four LUAD subtypes with different molecular characteristics and clinical implications were identified successfully through bioinformatic analysis. Our results may contribute to precision medicine and inform the development of rational clinical strategies for targeted and immune therapies.
Affiliation(s)
- Yanmei Zou
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Chenlin Cao
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Yali Wang
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Yilu Zhou
- Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton, UK
- Institute for Life Sciences, University of Southampton, Southampton, UK
| | - Shuo Yao
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Lili Zhang
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Kun Zheng
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Hong Zhang
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Wan Qin
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Kai Qin
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Huihua Xiong
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Xianglin Yuan
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Shengling Fu
- Department of Thoracic Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Yihua Wang
- Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton, UK
- Institute for Life Sciences, University of Southampton, Southampton, UK
| | - Hua Xiong
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
|
24
|
Gross SM, Dane MA, Smith RL, Devlin KL, McLean IC, Derrick DS, Mills CE, Subramanian K, London AB, Torre D, Evangelista JE, Clarke DJB, Xie Z, Erdem C, Lyons N, Natoli T, Pessa S, Lu X, Mullahoo J, Li J, Adam M, Wassie B, Liu M, Kilburn DF, Liby TA, Bucher E, Sanchez-Aguila C, Daily K, Omberg L, Wang Y, Jacobson C, Yapp C, Chung M, Vidovic D, Lu Y, Schurer S, Lee A, Pillai A, Subramanian A, Papanastasiou M, Fraenkel E, Feiler HS, Mills GB, Jaffe JD, Ma’ayan A, Birtwistle MR, Sorger PK, Korkola JE, Gray JW, Heiser LM. A multi-omic analysis of MCF10A cells provides a resource for integrative assessment of ligand-mediated molecular and phenotypic responses. Commun Biol 2022; 5:1066. [PMID: 36207580 PMCID: PMC9546880 DOI: 10.1038/s42003-022-03975-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/05/2021] [Accepted: 09/12/2022] [Indexed: 02/01/2023]
Abstract
The phenotype of a cell and its underlying molecular state are strongly influenced by extracellular signals, including growth factors, hormones, and extracellular matrix proteins. While these signals are normally tightly controlled, their dysregulation leads to phenotypic and molecular states associated with diverse diseases. To develop a detailed understanding of the linkage between molecular and phenotypic changes, we generated a comprehensive dataset that catalogs the transcriptional, proteomic, epigenomic and phenotypic responses of MCF10A mammary epithelial cells after exposure to the ligands EGF, HGF, OSM, IFNG, TGFB and BMP2. Systematic assessment of the molecular and cellular phenotypes induced by these ligands comprises the LINCS Microenvironment (ME) perturbation dataset, which has been curated and made publicly available for community-wide analysis and development of novel computational methods (synapse.org/LINCS_MCF10A). In illustrative analyses, we demonstrate how this dataset can be used to discover functionally related molecular features linked to specific cellular phenotypes. Beyond these analyses, this dataset will serve as a resource for the broader scientific community to mine for biological insights, to compare signals carried across distinct molecular modalities, and to develop new computational methods for integrative data analysis.
Affiliation(s)
- Sean M. Gross
- Department of Biomedical Engineering, OHSU, Portland, OR USA
| | - Mark A. Dane
- Department of Biomedical Engineering, OHSU, Portland, OR USA
| | - Rebecca L. Smith
- Department of Biomedical Engineering, OHSU, Portland, OR USA
| | - Kaylyn L. Devlin
- Department of Biomedical Engineering, OHSU, Portland, OR USA
| | - Ian C. McLean
- Department of Biomedical Engineering, OHSU, Portland, OR USA
| | - Daniel S. Derrick
- Department of Biomedical Engineering, OHSU, Portland, OR USA
| | - Caitlin E. Mills
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA USA
| | - Kartik Subramanian
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA USA
| | - Alexandra B. London
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY USA
| | - Denis Torre
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY USA
| | - John Erol Evangelista
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY USA
| | - Daniel J. B. Clarke
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY USA
| | - Zhuorui Xie
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY USA
| | - Cemal Erdem
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC USA
| | - Nicholas Lyons
- Broad Institute of MIT and Harvard, Cambridge, MA USA
| | - Ted Natoli
- Broad Institute of MIT and Harvard, Cambridge, MA USA
| | - Sarah Pessa
- Broad Institute of MIT and Harvard, Cambridge, MA USA
| | - Xiaodong Lu
- Broad Institute of MIT and Harvard, Cambridge, MA USA
| | - James Mullahoo
- Broad Institute of MIT and Harvard, Cambridge, MA USA
| | - Jonathan Li
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA USA
| | - Miriam Adam
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA USA
| | - Brook Wassie
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA USA
| | - Moqing Liu
- Department of Biomedical Engineering, OHSU, Portland, OR USA
| | - David F. Kilburn
- Department of Biomedical Engineering, OHSU, Portland, OR USA
| | - Tiera A. Liby
- Department of Biomedical Engineering, OHSU, Portland, OR USA
| | - Elmar Bucher
- Department of Biomedical Engineering, OHSU, Portland, OR USA
| | - Crystal Sanchez-Aguila
- Department of Biomedical Engineering, OHSU, Portland, OR USA
| | - Kenneth Daily
- Sage Bionetworks, Seattle, WA USA
| | - Larsson Omberg
- Sage Bionetworks, Seattle, WA USA
| | - Yunguan Wang
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA USA
| | - Connor Jacobson
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA USA
| | - Clarence Yapp
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA USA
| | - Mirra Chung
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA USA
| | - Dusica Vidovic
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33136 USA; Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136 USA; Institute for Data Science & Computing, University of Miami, Miami, FL 33136 USA
| | - Yiling Lu
- Department of Genomic Medicine, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX USA
| | - Stephan Schurer
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33136 USA; Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136 USA; Institute for Data Science & Computing, University of Miami, Miami, FL 33136 USA
| | - Albert Lee
- Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, USA
| | - Ajay Pillai
- Human Genome Research Institute, National Institutes of Health, Bethesda, USA
| | - Aravind Subramanian
- Broad Institute of MIT and Harvard, Cambridge, MA USA
| | - Malvina Papanastasiou
- Broad Institute of MIT and Harvard, Cambridge, MA USA
| | - Ernest Fraenkel
- Broad Institute of MIT and Harvard, Cambridge, MA USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA USA
| | - Heidi S. Feiler
- Department of Biomedical Engineering, OHSU, Portland, OR USA; Knight Cancer Institute, OHSU, Portland, OR USA
| | - Gordon B. Mills
- Knight Cancer Institute, OHSU, Portland, OR USA; Division of Oncological Sciences, OHSU, Portland, OR USA
| | - Jake D. Jaffe
- Broad Institute of MIT and Harvard, Cambridge, MA USA
| | - Avi Ma’ayan
- Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY USA
| | - Marc R. Birtwistle
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC USA
| | - Peter K. Sorger
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA USA
| | - James E. Korkola
- Department of Biomedical Engineering, OHSU, Portland, OR USA; Knight Cancer Institute, OHSU, Portland, OR USA
| | - Joe W. Gray
- Department of Biomedical Engineering, OHSU, Portland, OR USA; Knight Cancer Institute, OHSU, Portland, OR USA
| | - Laura M. Heiser
- Department of Biomedical Engineering, OHSU, Portland, OR USA; Knight Cancer Institute, OHSU, Portland, OR USA
|
25
|
Suter P, Dazert E, Kuipers J, Ng CKY, Boldanova T, Hall MN, Heim MH, Beerenwinkel N. Multi-omics subtyping of hepatocellular carcinoma patients using a Bayesian network mixture model. PLoS Comput Biol 2022; 18:e1009767. [PMID: 36067230 PMCID: PMC9481159 DOI: 10.1371/journal.pcbi.1009767] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/21/2021] [Revised: 09/16/2022] [Accepted: 07/18/2022] [Indexed: 11/18/2022]
Abstract
Comprehensive molecular characterization of cancer subtypes is essential for predicting clinical outcomes and searching for personalized treatments. We present bnClustOmics, a statistical model and computational tool for multi-omics unsupervised clustering, which serves a dual purpose: Clustering patient samples based on a Bayesian network mixture model and learning the networks of omics variables representing these clusters. The discovered networks encode interactions among all omics variables and provide a molecular characterization of each patient subgroup. We conducted simulation studies that demonstrated the advantages of our approach compared to other clustering methods in the case where the generative model is a mixture of Bayesian networks. We applied bnClustOmics to a hepatocellular carcinoma (HCC) dataset comprising genome (mutation and copy number), transcriptome, proteome, and phosphoproteome data. We identified three main HCC subtypes together with molecular characteristics, some of which are associated with survival even when adjusting for the clinical stage. Cluster-specific networks shed light on the links between genotypes and molecular phenotypes of samples within their respective clusters and suggest targets for personalized treatments.
Affiliation(s)
- Polina Suter
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Eva Dazert
- Biozentrum, University of Basel, Basel, Switzerland
| | - Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Charlotte K. Y. Ng
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department for BioMedical Research (DBMR), University of Bern, Bern, Switzerland
- Department of Biomedicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Institute of Medical Genetics and Pathology, University Hospital Basel, University of Basel, Basel, Switzerland
| | - Tuyana Boldanova
- Department of Biomedicine, University Hospital Basel, University of Basel, Basel, Switzerland
| | | | - Markus H. Heim
- Department of Biomedicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Department of Gastroenterology and Hepatology, Clarunis, University Center for Gastrointestinal and Liver Diseases, Basel, Switzerland
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
26
|
Zhang J, Yang L, Zhang Y, Tang D, Liu T. Non-parameter clustering algorithm based on saturated neighborhood graph. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
27
|
Leng D, Zheng L, Wen Y, Zhang Y, Wu L, Wang J, Wang M, Zhang Z, He S, Bo X. A benchmark study of deep learning-based multi-omics data fusion methods for cancer. Genome Biol 2022; 23:171. [PMID: 35945544 PMCID: PMC9361561 DOI: 10.1186/s13059-022-02739-2] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 02/22/2022] [Accepted: 07/26/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND A fused method using a combination of multi-omics data enables a comprehensive study of complex biological processes and highlights the interrelationship of relevant biomolecules and their functions. Driven by high-throughput sequencing technologies, several promising deep learning methods have been proposed for fusing multi-omics data generated from a large number of samples. RESULTS In this study, 16 representative deep learning methods are comprehensively evaluated on simulated, single-cell, and cancer multi-omics datasets. For each of the datasets, two tasks are designed: classification and clustering. The classification performance is evaluated by using three benchmarking metrics including accuracy, F1 macro, and F1 weighted. Meanwhile, the clustering performance is evaluated by using four benchmarking metrics including the Jaccard index (JI), C-index, silhouette score, and Davies-Bouldin score. For the cancer multi-omics datasets, the methods' strength in capturing the association of multi-omics dimensionality reduction results with survival and clinical annotations is further evaluated. The benchmarking results indicate that moGAT achieves the best classification performance. Meanwhile, efmmdVAE, efVAE, and lfmmdVAE show the most promising performance across all complementary contexts in clustering tasks. CONCLUSIONS Our benchmarking results not only provide a reference for biomedical researchers to choose appropriate deep learning-based multi-omics data fusion methods, but also suggest the future directions for the development of more effective multi-omics data fusion methods. The deep learning frameworks are available at https://github.com/zhenglinyi/DL-mo.
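As an illustration of how one of the clustering metrics named above can be computed, the following minimal sketch implements a pair-counting Jaccard index between a reference labelling and a predicted clustering. This is an assumption about the variant: the benchmark may define its JI differently, and the function name and the convention for the degenerate case are hypothetical choices made here.

```python
from itertools import combinations


def pair_jaccard(labels_true, labels_pred):
    """Pair-counting Jaccard index between two labellings of the same samples.

    Counts sample pairs grouped together in both labellings, divided by the
    pairs grouped together in at least one of them.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labellings must cover the same samples")
    both = only_true = only_pred = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            both += 1
        elif same_true:
            only_true += 1
        elif same_pred:
            only_pred += 1
    denom = both + only_true + only_pred
    # Assumed convention: if no pair is co-clustered anywhere, report 1.0.
    return both / denom if denom else 1.0
```

Because the index is computed over pairs, it does not depend on how cluster labels are named, so `pair_jaccard([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1.0.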
Affiliation(s)
- Dongjin Leng
- Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
| | - Linyi Zheng
- School of Informatics, Xiamen University, Xiamen, People’s Republic of China
| | - Yuqi Wen
- Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
| | - Yunhao Zhang
- School of Informatics, Xiamen University, Xiamen, People’s Republic of China
| | - Lianlian Wu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, People’s Republic of China
| | - Jing Wang
- School of Medicine, Tsinghua University, Beijing, People’s Republic of China
| | - Meihong Wang
- School of Informatics, Xiamen University, Xiamen, People’s Republic of China
| | - Zhongnan Zhang
- School of Informatics, Xiamen University, Xiamen, People’s Republic of China
| | - Song He
- Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
| | - Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing, People’s Republic of China
|
28
|
Zhao N, Xing Y, Hu Y, Chang H. Exploration of the Immunotyping Landscape and Immune Infiltration-Related Prognostic Markers in Ovarian Cancer Patients. Front Oncol 2022; 12:916251. [PMID: 35880167 PMCID: PMC9307664 DOI: 10.3389/fonc.2022.916251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Accepted: 06/01/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND Increasing evidence indicates that immune cell infiltration (ICI) affects the prognosis of multiple cancers. This study aims to explore the immunotypes and ICI-related biomarkers in ovarian cancer. METHODS The ICI levels were quantified with the CIBERSORT and ESTIMATE algorithms. The unsupervised consensus clustering method determined immunotypes based on the ICI profiles. Characteristic genes were identified with the Boruta algorithm. Then, the ICI score, a novel prognostic marker, was generated with the principal component analysis of the characteristic genes. The relationships between the ICI scores and clinical features were revealed. Further, an ICI signature was integrated after the univariate Cox, lasso, and stepwise regression analyses. The accuracy and robustness of the model were tested by three independent cohorts. The roles of the model in the immunophenoscores (IPS), tumor immune dysfunction and exclusion (TIDE) scores, and immunotherapy responses were also explored. Finally, risk genes (GBP1P1, TGFBI, PLA2G2D) and immune cell marker genes (CD11B, NOS2, CD206, CD8A) were tested by qRT-PCR in clinical tissues. RESULTS Three immunotypes were identified, and ICI scores were generated based on the 75 characteristic genes. CD8 TCR pathways, chemokine-related pathways, and lymphocyte activation were critical to immunophenotyping. Higher ICI scores contributed to better prognoses. An independent prognostic factor, a three-gene signature, was integrated to calculate patients’ risk scores. Higher TIDE scores, lower ICI scores, lower IPS, lower immunotherapy responses, and worse prognoses were revealed in high-risk patients. Macrophage polarization and CD8 T cell infiltration were indicated to play potentially important roles in the development of ovarian cancer in the clinical validation cohort. CONCLUSIONS Our study characterized the immunotyping landscape and provided novel immune infiltration-related prognostic markers in ovarian cancer.
Affiliation(s)
- Na Zhao
- Department of Gynecology, Dongying People’s Hospital, Dongying, China
| | - Yujuan Xing
- Department of Gynecology, Dongying People’s Hospital, Dongying, China
| | - Yanfang Hu
- Department of Gynecology, Dongying People’s Hospital, Dongying, China
- *Correspondence: Yanfang Hu; Hao Chang
| | - Hao Chang
- Department of Cancer Research, Hanyu Biomed Center Beijing, Beijing, China
- *Correspondence: Yanfang Hu; Hao Chang
|
29
|
Zhang X, Zhou Z, Xu H, Liu CT. Integrative clustering methods for multi-omics data. WILEY INTERDISCIPLINARY REVIEWS. COMPUTATIONAL STATISTICS 2022; 14. [PMID: 35573155 PMCID: PMC9097984 DOI: 10.1002/wics.1553] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Abstract
Integrative analysis of multi-omics data has drawn much attention from the scientific community due to the technological advancements which have generated various omics data. Leveraging these multi-omics data potentially provides a more comprehensive view of the disease mechanism or biological processes. Integrative multi-omics clustering is an unsupervised integrative method specifically used to find coherent groups of samples or features by utilizing information across multi-omics data. It aims to better stratify diseases and to suggest biological mechanisms and potential targeted therapies for the diseases. However, applying integrative multi-omics clustering is both statistically and computationally challenging due to various reasons such as high dimensionality and heterogeneity. In this review, we summarized integrative multi-omics clustering methods into three general categories: concatenated clustering, clustering of clusters, and interactive clustering based on when and how the multi-omics data are processed for clustering. We further classified the methods into different approaches under each category based on the main statistical strategy used during clustering. In addition, we have provided recommended practices tailored to four real-life scenarios to help researchers to strategize their selection in integrative multi-omics clustering methods for their future studies.
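The first category named in this abstract, concatenated clustering, can be sketched as a preprocessing step: each omics block is standardized feature by feature and the blocks are joined sample-wise before any clustering algorithm is applied. The sketch below is illustrative only and is not taken from the review; the function names are hypothetical, and per-feature z-scoring is an assumed way of putting layers with different scales on a comparable footing.

```python
from statistics import mean, pstdev


def zscore_columns(block):
    """Standardize each feature (column) of a samples-by-features block."""
    scaled_cols = []
    for col in zip(*block):
        mu, sd = mean(col), pstdev(col)
        # Constant features carry no information; map them to zero.
        scaled_cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def concatenate_omics(blocks):
    """Join several omics blocks (same samples, same row order) feature-wise."""
    scaled = [zscore_columns(block) for block in blocks]
    n_samples = len(scaled[0])
    if any(len(block) != n_samples for block in scaled):
        raise ValueError("all omics blocks must contain the same samples")
    return [sum((block[i] for block in scaled), []) for i in range(n_samples)]
```

The combined matrix can then be passed to any single-table clustering method. In the clustering-of-clusters category, by contrast, each block would be clustered separately and the resulting assignments combined afterwards.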
Affiliation(s)
- Xiaoyu Zhang
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
- Zhenwei Zhou
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
- Hanfei Xu
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
- Ching-Ti Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
|
30
|
Vahabi N, Michailidis G. Unsupervised Multi-Omics Data Integration Methods: A Comprehensive Review. Front Genet 2022; 13:854752. [PMID: 35391796 PMCID: PMC8981526 DOI: 10.3389/fgene.2022.854752] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 01/14/2022] [Accepted: 02/28/2022] [Indexed: 12/26/2022] Open
Abstract
Through the developments of Omics technologies and dissemination of large-scale datasets, such as those from The Cancer Genome Atlas, Alzheimer’s Disease Neuroimaging Initiative, and Genotype-Tissue Expression, it is becoming increasingly possible to study complex biological processes and disease mechanisms more holistically. However, to obtain a comprehensive view of these complex systems, it is crucial to integrate data across various Omics modalities, and also leverage external knowledge available in biological databases. This review aims to provide an overview of multi-Omics data integration methods with different statistical approaches, focusing on unsupervised learning tasks, including disease onset prediction, biomarker discovery, disease subtyping, module discovery, and network/pathway analysis. We also briefly review feature selection methods, multi-Omics data sets, and resources/tools that constitute critical components for carrying out the integration.
Affiliation(s)
- Nasim Vahabi
- Informatics Institute, University of Florida, Gainesville, FL, United States
- George Michailidis
- Informatics Institute, University of Florida, Gainesville, FL, United States
|
31
|
Pierre-Jean M, Mauger F, Deleuze JF, Le Floch E. PIntMF: Penalized Integrative Matrix Factorization method for multi-omics data. Bioinformatics 2021; 38:900-907. [PMID: 34849583 PMCID: PMC8796362 DOI: 10.1093/bioinformatics/btab786] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/11/2021] [Revised: 09/30/2021] [Accepted: 11/11/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION It is more and more common to perform multi-omics analyses to explore the genome at diverse levels and not only at a single level. Through integrative statistical methods, multi-omics data have the power to reveal new biological processes, potential biomarkers and subgroups in a cohort. Matrix factorization (MF) is an unsupervised statistical method that allows a clustering of individuals, but also reveals relevant omics variables from the various blocks. RESULTS Here, we present PIntMF (Penalized Integrative Matrix Factorization), an MF model with sparsity, positivity and equality constraints. To induce sparsity in the model, we used a classical Lasso penalization on variable and individual matrices. For the matrix of samples, sparsity helps in the clustering, while normalization (matching an equality constraint) of inferred coefficients is added to improve interpretation. Moreover, we added an automatic tuning of the sparsity parameters using the famous glmnet package. We also proposed three criteria to help the user to choose the number of latent variables. PIntMF was compared with other state-of-the-art integrative methods including feature selection techniques in both synthetic and real data. PIntMF succeeds in finding relevant clusters as well as variables in two types of simulated data (correlated and uncorrelated). Next, PIntMF was applied to two real datasets (Diet and cancer), and it revealed interpretable clusters linked to available clinical data. Our method outperforms the existing ones on two criteria (clustering and variable selection). We show that PIntMF is an easy, fast and powerful tool to extract patterns and cluster samples from multi-omics data. AVAILABILITY AND IMPLEMENTATION An R package is available at https://github.com/mpierrejean/pintmf. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
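The three constraints named in the abstract above (sparsity via a Lasso penalty, positivity, and an equality constraint through normalization) can be illustrated on a single vector of sample coefficients. The sketch below is only a building block under stated assumptions: it is not the PIntMF algorithm or its R API, and the function names, the example weights and the penalty value are hypothetical.

```python
def soft_threshold(value, penalty):
    """Lasso proximal operator: shrink a coefficient toward zero by `penalty`."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def constrain_sample_weights(weights, penalty):
    """Apply sparsity, then positivity, then normalization to sum to one."""
    # Sparsity and positivity: shrink, then clip negative values to zero.
    shrunk = [max(soft_threshold(w, penalty), 0.0) for w in weights]
    total = sum(shrunk)
    if total == 0.0:
        # Everything was shrunk away; there is nothing to normalize.
        return shrunk
    # Equality constraint: coefficients of one sample sum to one.
    return [w / total for w in shrunk]


# Hypothetical raw coefficients of one sample over four latent variables.
weights = constrain_sample_weights([0.9, 0.05, -0.2, 0.3], penalty=0.1)
print([round(w, 3) for w in weights])
```

With normalized, non-negative coefficients, assigning each sample to its largest latent variable is one simple way to read off a clustering.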
Affiliation(s)
- Florence Mauger
- Centre National de Recherche en Génomique Humaine, CEA, Université de Paris-Saclay, Evry, France
- Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine, CEA, Université de Paris-Saclay, Evry, France
- Edith Le Floch
- Centre National de Recherche en Génomique Humaine, CEA, Université de Paris-Saclay, Evry, France
|
32
|
Cheng K, Martin-Sancho L, Pal LR, Pu Y, Riva L, Yin X, Sinha S, Nair NU, Chanda SK, Ruppin E. Genome-scale metabolic modeling reveals SARS-CoV-2-induced metabolic changes and antiviral targets. Mol Syst Biol 2021; 17:e10260. [PMID: 34709707 PMCID: PMC8552660 DOI: 10.15252/msb.202110260] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 01/30/2021] [Revised: 09/29/2021] [Accepted: 09/30/2021] [Indexed: 12/15/2022] Open
Abstract
Tremendous progress has been made to control the COVID-19 pandemic caused by the SARS-CoV-2 virus. However, effective therapeutic options are still rare. Drug repurposing and combination represent practical strategies to address this urgent unmet medical need. Viruses, including coronaviruses, are known to hijack host metabolism to facilitate viral proliferation, making targeting host metabolism a promising antiviral approach. Here, we describe an integrated analysis of 12 published in vitro and human patient gene expression datasets on SARS-CoV-2 infection using genome-scale metabolic modeling (GEM), revealing complicated host metabolism reprogramming during SARS-CoV-2 infection. We next applied the GEM-based metabolic transformation algorithm to predict anti-SARS-CoV-2 targets that counteract the virus-induced metabolic changes. We successfully validated these targets using published drug and genetic screen data and by performing an siRNA assay in Caco-2 cells. Further generating and analyzing RNA-sequencing data of remdesivir-treated Vero E6 cell samples, we predicted metabolic targets acting in combination with remdesivir, an approved anti-SARS-CoV-2 drug. Our study provides clinical data-supported candidate anti-SARS-CoV-2 targets for future evaluation, demonstrating host metabolism targeting as a promising antiviral strategy.
Affiliation(s)
- Kuoyuan Cheng
- Cancer Data Science Laboratory (CDSL), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, MD, USA
- Biological Sciences Graduate Program (BISI), University of Maryland, College Park, MD, USA
- Laura Martin-Sancho
- Immunity and Pathogenesis Program, Infectious and Inflammatory Disease Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Lipika R Pal
- Cancer Data Science Laboratory (CDSL), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, MD, USA
- Yuan Pu
- Immunity and Pathogenesis Program, Infectious and Inflammatory Disease Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Laura Riva
- Immunity and Pathogenesis Program, Infectious and Inflammatory Disease Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Present address: Calibr, a Division of The Scripps Research Institute, La Jolla, CA, USA
- Xin Yin
- Immunity and Pathogenesis Program, Infectious and Inflammatory Disease Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- State Key Laboratory of Veterinary Biotechnology, Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin, China
- Sanju Sinha
- Cancer Data Science Laboratory (CDSL), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, MD, USA
- Biological Sciences Graduate Program (BISI), University of Maryland, College Park, MD, USA
- Nishanth Ulhas Nair
- Cancer Data Science Laboratory (CDSL), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, MD, USA
- Sumit K Chanda
- Immunity and Pathogenesis Program, Infectious and Inflammatory Disease Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Eytan Ruppin
- Cancer Data Science Laboratory (CDSL), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
|
33
|
Cheng K, Martin-Sancho L, Pal LR, Pu Y, Riva L, Yin X, Sinha S, Nair NU, Chanda SK, Ruppin E. Genome-scale metabolic modeling reveals SARS-CoV-2-induced metabolic changes and antiviral targets. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2021:2021.01.27.428543. [PMID: 33532779 PMCID: PMC7852273 DOI: 10.1101/2021.01.27.428543] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]
Abstract
Tremendous progress has been made to control the COVID-19 pandemic caused by the SARS-CoV-2 virus. However, effective therapeutic options are still rare. Drug repurposing and combination represent practical strategies to address this urgent unmet medical need. Viruses, including coronaviruses, are known to hijack host metabolism to facilitate viral proliferation, making targeting host metabolism a promising antiviral approach. Here, we describe an integrated analysis of 12 published in vitro and human patient gene expression datasets on SARS-CoV-2 infection using genome-scale metabolic modeling (GEM), revealing complicated host metabolism reprogramming during SARS-CoV-2 infection. We next applied the GEM-based metabolic transformation algorithm to predict anti-SARS-CoV-2 targets that counteract the virus-induced metabolic changes. We successfully validated these targets using published drug and genetic screen data and by performing an siRNA assay in Caco-2 cells. Further generating and analyzing RNA-sequencing data of remdesivir-treated Vero E6 cell samples, we predicted metabolic targets acting in combination with remdesivir, an approved anti-SARS-CoV-2 drug. Our study provides clinical data-supported candidate anti-SARS-CoV-2 targets for future evaluation, demonstrating host metabolism-targeting as a promising antiviral strategy.
Affiliation(s)
- Kuoyuan Cheng
- Cancer Data Science Laboratory (CDSL), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, MD, USA
- Biological Sciences Graduate Program (BISI), University of Maryland, College Park, MD, USA
- Laura Martin-Sancho
- Immunity and Pathogenesis Program, Infectious and Inflammatory Disease Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Lipika R. Pal
- Cancer Data Science Laboratory (CDSL), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, MD, USA
- Yuan Pu
- Immunity and Pathogenesis Program, Infectious and Inflammatory Disease Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Laura Riva
- Immunity and Pathogenesis Program, Infectious and Inflammatory Disease Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Xin Yin
- Immunity and Pathogenesis Program, Infectious and Inflammatory Disease Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- State Key Laboratory of Veterinary Biotechnology, Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin, China
- Sanju Sinha
- Cancer Data Science Laboratory (CDSL), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, MD, USA
- Biological Sciences Graduate Program (BISI), University of Maryland, College Park, MD, USA
- Nishanth Ulhas Nair
- Cancer Data Science Laboratory (CDSL), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, MD, USA
- Sumit K. Chanda
- Immunity and Pathogenesis Program, Infectious and Inflammatory Disease Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Eytan Ruppin
- Cancer Data Science Laboratory (CDSL), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
|
34
|
Tan K, Huang W, Liu X, Hu J, Dong S. A Hierarchical Graph Convolution Network for Representation Learning of Gene Expression Data. IEEE J Biomed Health Inform 2021; 25:3219-3229. [PMID: 33449889 DOI: 10.1109/jbhi.2021.3052008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
The curse of dimensionality, which is caused by high dimensionality and low sample size, is a major challenge in gene expression data analysis. However, the real situation is even worse: labelling data is laborious and time-consuming, so only a small part of the limited samples will be labelled. Having so few labelled samples further increases the difficulty of training deep learning models. Interpretability is an important requirement in biomedicine. Many existing deep learning methods try to provide interpretability, but they are rarely applied to gene expression data. Recent semi-supervised graph convolution network methods try to address these problems by smoothing the label information over a graph. However, to the best of our knowledge, these methods only utilize graphs in either the feature space or the sample space, which restricts their performance. We propose a transductive semi-supervised representation learning method called a hierarchical graph convolution network (HiGCN) to aggregate the information of gene expression data in both feature and sample spaces. HiGCN first utilizes external knowledge to construct a feature graph and a similarity kernel to construct a sample graph. Then, two spatial-based GCNs are used to aggregate information on these graphs. To validate the model's performance, synthetic and real datasets are provided to lend empirical support. Compared with two recent models and three traditional models, HiGCN learns better representations of gene expression data, and these representations improve the performance of downstream tasks, especially when the model is trained on a few labelled samples. Important features can be extracted from our model to provide reliable interpretability.
|
35
|
Picard M, Scott-Boyer MP, Bodein A, Périn O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J 2021; 19:3735-3746. [PMID: 34285775 PMCID: PMC8258788 DOI: 10.1016/j.csbj.2021.06.030] [Citation(s) in RCA: 205] [Impact Index Per Article: 51.3] [Received: 03/30/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 12/25/2022] Open
Abstract
Increased availability of high-throughput technologies has generated an ever-growing volume of omics data that seek to portray many different but complementary biological layers including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, only include one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies by paying special attention to machine learning applications.
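Two of the five strategies listed in the abstract above can be contrasted in a few lines. The sketch below assumes that early integration means concatenating standardized omics blocks before any model is fitted, and that late integration means combining per-block predictions, here by majority vote. These simplified definitions, the function names and the toy data are illustrative assumptions, not taken from the review.

```python
from collections import Counter
from statistics import mean, pstdev


def zscore_columns(block):
    """Standardize each feature (column) of one omics block."""
    scaled_columns = []
    for column in zip(*block):
        mu, sd = mean(column), pstdev(column)
        # Constant features carry no signal; map them to zero.
        scaled_columns.append([(x - mu) / sd if sd > 0 else 0.0 for x in column])
    return [list(row) for row in zip(*scaled_columns)]


def early_integration(blocks):
    """Concatenate standardized blocks sample by sample into one matrix."""
    scaled = [zscore_columns(block) for block in blocks]
    return [sum(rows, []) for rows in zip(*scaled)]


def late_integration(predictions):
    """Majority vote over per-block predictions for each sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical data: three samples, two expression features, one methylation feature.
expression = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
methylation = [[0.1], [0.5], [0.9]]
joint = early_integration([expression, methylation])

# Hypothetical labels predicted by three separate per-omics models.
votes = late_integration([["A", "B", "A"], ["A", "B", "B"], ["B", "B", "A"]])
print(len(joint), len(joint[0]), votes)
```

Standardizing each block before concatenation keeps a layer with large raw values from dominating the joint matrix; the vote in the late strategy never looks at the features at all.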
Affiliation(s)
- Milan Picard
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Corresponding author.
|
36
|
Tarazona S, Arzalluz-Luque A, Conesa A. Undisclosed, unmet and neglected challenges in multi-omics studies. NATURE COMPUTATIONAL SCIENCE 2021; 1:395-402. [PMID: 38217236 DOI: 10.1038/s43588-021-00086-z] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 03/18/2021] [Accepted: 05/17/2021] [Indexed: 01/15/2024]
Abstract
Multi-omics approaches have become a reality in both large genomics projects and small laboratories. However, the multi-omics research community still faces a number of issues that have either not been sufficiently discussed or for which current solutions are still limited. In this Perspective, we elaborate on these limitations and suggest points of attention for future research. We finally discuss new opportunities and challenges brought to the field by the rapid development of single-cell high-throughput molecular technologies.
Affiliation(s)
- Sonia Tarazona
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
- Angeles Arzalluz-Luque
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
- Ana Conesa
- Microbiology and Cell Science Department, Institute for Food and Agricultural Research, University of Florida, Gainesville, FL, USA
- Genetics Institute, University of Florida, Gainesville, FL, USA
- Institute for Integrative Systems Biology, Spanish National Research Council, Valencia, Spain
|
37
|
TSCCA: A tensor sparse CCA method for detecting microRNA-gene patterns from multiple cancers. PLoS Comput Biol 2021; 17:e1009044. [PMID: 34061840 PMCID: PMC8195367 DOI: 10.1371/journal.pcbi.1009044] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/05/2020] [Revised: 06/11/2021] [Accepted: 05/05/2021] [Indexed: 12/22/2022] Open
Abstract
Existing studies have demonstrated that dysregulation of microRNAs (miRNAs or miRs) is involved in the initiation and progression of cancer. Many efforts have been devoted to identifying microRNAs as potential biomarkers for cancer diagnosis, prognosis and therapeutic targets. With the rapid development of miRNA sequencing technology, a vast amount of miRNA expression data for multiple cancers has been collected. These invaluable data repositories provide new paradigms to explore the relationship between miRNAs and cancer. Thus, there is an urgent need to explore the complex cancer-related miRNA-gene patterns by integrating multi-omics data in a pan-cancer paradigm. In this study, we present a tensor sparse canonical correlation analysis (TSCCA) method for identifying cancer-related miRNA-gene modules across multiple cancers. TSCCA is able to overcome the drawbacks of existing solutions and capture both the cancer-shared and cancer-specific miRNA-gene co-expressed modules with better biological interpretations. We comprehensively evaluate the performance of TSCCA using a set of simulated data and matched miRNA/gene expression data across 33 cancer types from the TCGA database. We uncover several dysfunctional miRNA-gene modules with important biological functions and statistical significance. These modules can advance our understanding of miRNA regulatory mechanisms of cancer and provide insights into miRNA-based treatments for cancer.
MicroRNAs (miRNAs) are a class of small non-coding RNAs. Previous studies have revealed that miRNA-gene regulatory modules play key roles in the occurrence and development of cancer. However, little has been done to discover miRNA-gene regulatory modules from a pan-cancer view. Thus, new methods are urgently needed to explore the complex cancer-related miRNA-gene patterns by integrating multi-omics data of multiple cancers. To build the connections between miRNA-gene regulatory modules across different cancer types, we propose a tensor sparse canonical correlation analysis (TSCCA) method. Our specific contributions are two-fold: (1) We propose a sparse statistical learning model TSCCA and an efficient block-coordinate descent algorithm to solve it. (2) We apply TSCCA to a multi-omics data set of 33 cancer types from TCGA and identify some cancer-related miRNA-gene modules with important biological functions and statistical significance.
|
38
|
Casadio M, Biancaniello F, Overi D, Venere R, Carpino G, Gaudio E, Alvaro D, Cardinale V. Molecular Landscape and Therapeutic Strategies in Cholangiocarcinoma: An Integrated Translational Approach towards Precision Medicine. Int J Mol Sci 2021; 22:5613. [PMID: 34070643 PMCID: PMC8199244 DOI: 10.3390/ijms22115613] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/30/2021] [Revised: 05/17/2021] [Accepted: 05/19/2021] [Indexed: 12/15/2022] Open
Abstract
Cholangiocarcinomas (CCAs) are heterogeneous biliary tract malignancies with dismal prognosis, mainly due to tumor aggressiveness, late diagnosis, and poor response to current therapeutic options. High-throughput technologies have been used as a fundamental tool in unveiling CCA molecular landscape, and several molecular classifications have been proposed, leading to various targeted therapy trials. In this review, we aim to analyze the critical issues concerning the status of precision medicine in CCA, discussing molecular signatures and clusters, related to both anatomical classification and different etiopathogenesis, and the latest therapeutic strategies. Furthermore, we propose an integrated approach comprising the CCA molecular mechanism, pathobiology, clinical and histological findings, and treatment perspectives for the ultimate purpose of improving the methods of patient allocations in clinical trials and the response to personalized therapies.
Affiliation(s)
- Marco Casadio
- Department of Translational and Precision Medicine, Sapienza University of Rome, Viale dell’Università 37, 00185 Rome, Italy
- Francesca Biancaniello
- Department of Translational and Precision Medicine, Sapienza University of Rome, Viale dell’Università 37, 00185 Rome, Italy
- Diletta Overi
- Department of Anatomical, Histological, Forensic Medicine and Orthopedics Sciences, Sapienza University of Rome, Via Borelli 50, 00161 Rome, Italy
- Rosanna Venere
- Department of Translational and Precision Medicine, Sapienza University of Rome, Viale dell’Università 37, 00185 Rome, Italy
- Guido Carpino
- Department of Movement, Human and Health Sciences, Division of Health Sciences, University of Rome “Foro Italico”, Piazza Lauro de Bosis 6, 00135 Rome, Italy
- Eugenio Gaudio
- Department of Anatomical, Histological, Forensic Medicine and Orthopedics Sciences, Sapienza University of Rome, Via Borelli 50, 00161 Rome, Italy
- Domenico Alvaro
- Department of Translational and Precision Medicine, Sapienza University of Rome, Viale dell’Università 37, 00185 Rome, Italy
- Vincenzo Cardinale
- Medical-Surgical and Biotechnologies Sciences, Polo Pontino, Sapienza University of Rome, Corso della Repubblica 79, 04100 Latina, Italy
|
39
|
Tolani P, Gupta S, Yadav K, Aggarwal S, Yadav AK. Big data, integrative omics and network biology. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2021; 127:127-160. [PMID: 34340766 DOI: 10.1016/bs.apcsb.2021.03.006] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 12/12/2022]
Abstract
A cell integrates various signals through a network of biomolecules that crosstalk to synergistically regulate the replication, transcription, translation and other metabolic activities of a cell. These networks regulate the signal perception and processing that drive biological functions. The biological complexity cannot be fully captured by a single -omics discipline. The holistic study of an organism, in health, perturbation, exposure to environment and disease, falls under systems biology. The bottom-up molecular approaches (genes, mRNA, protein, metabolite, etc.) have laid the foundation of current biological knowledge, spanning viruses, bacteria, fungi, plants and animals. Yet, these techniques provide a rather myopic view of biology at the molecular level. Understanding how the interconnected molecular components are formed and rewired in disease or on exposure to environmental stimuli is the holy grail of modern biology. The omics era was heralded by the genomics revolution, but advanced sequencing techniques are now also ubiquitous in transcriptomics, proteomics, metabolomics and lipidomics. Multi-omics data analysis and integration techniques are driving the quest for deeper insights into how the different layers of biomolecules talk to each other in diverse contexts.
Affiliation(s)
- Priya Tolani
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India
- Srishti Gupta
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
- Kirti Yadav
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; Department of Pharmaceutical Biotechnology, Delhi Pharmaceutical Sciences and Research University, New Delhi, India
- Suruchi Aggarwal
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; Department of Molecular Biology and Biotechnology, Cotton University, Guwahati, Assam, India
- Amit Kumar Yadav
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India
|
40
|
Palou-Márquez G, Subirana I, Nonell L, Fernández-Sanlés A, Elosua R. DNA methylation and gene expression integration in cardiovascular disease. Clin Epigenetics 2021; 13:75. [PMID: 33836805 PMCID: PMC8034168 DOI: 10.1186/s13148-021-01064-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 01/08/2021] [Accepted: 03/29/2021] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND The integration of different layers of omics information is an opportunity to tackle the complexity of cardiovascular diseases (CVD) and to identify new predictive biomarkers and potential therapeutic targets. Our aim was to integrate DNA methylation and gene expression data in an effort to identify biomarkers related to cardiovascular disease risk in a community-based population. We accessed data from the Framingham Offspring Study, a cohort study with data on DNA methylation (Infinium HumanMethylation450 BeadChip; Illumina) and gene expression (Human Exon 1.0 ST Array; Affymetrix). Using the MOFA2 R package, we integrated these data to identify biomarkers related to the risk of presenting a cardiovascular event. RESULTS Four independent latent factors (9, 19, 21 [only in women] and 27), driven by DNA methylation, were associated with cardiovascular disease independently of classical risk factors and cell-type counts. In a sensitivity analysis, we also identified factor 21 as associated with CVD in women. Factors 9, 21 and 27 were also associated with coronary heart disease risk. Moreover, in a replication effort in an independent study, three of the genes included in factor 27 were also present in a factor identified to be associated with myocardial infarction (CDC42BPB, MAN2A2 and RPTOR). Factor 9 was related to age and cell-type proportions; factor 19 was related to age and B-cell count; factor 21 pointed to human immunodeficiency virus infection-related pathways and inflammation; and factor 27 was related to lifestyle factors such as alcohol consumption, smoking and body mass index. Inclusion of factor 21 (only in women) improved the discriminative and reclassification capacity of the Framingham classical risk function, and factor 27 improved its discrimination. CONCLUSIONS Unsupervised multi-omics data integration methods have the potential to provide insights into the pathogenesis of cardiovascular diseases. We identified four independent factors (one only in women) pointing to inflammation, endothelium homeostasis, visceral fat, cardiac remodeling and lifestyles as key players in the determination of cardiovascular risk. Moreover, two of these factors improved the predictive capacity of a classical risk function.
Affiliation(s)
- Guillermo Palou-Márquez
- Cardiovascular Epidemiology and Genetics Research Group, Hospital del Mar Medical Research Institute (IMIM), Dr Aiguader 88, 08003, Barcelona, Spain
- Pompeu Fabra University (UPF), Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Isaac Subirana
- Cardiovascular Epidemiology and Genetics Research Group, Hospital del Mar Medical Research Institute (IMIM), Dr Aiguader 88, 08003, Barcelona, Spain
- CIBER Epidemiology and Public Health (CIBERESP), Barcelona, Spain
- Lara Nonell
- MARGenomics, Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
- Alba Fernández-Sanlés
- Cardiovascular Epidemiology and Genetics Research Group, Hospital del Mar Medical Research Institute (IMIM), Dr Aiguader 88, 08003, Barcelona, Spain
- MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
- Roberto Elosua
- Cardiovascular Epidemiology and Genetics Research Group, Hospital del Mar Medical Research Institute (IMIM), Dr Aiguader 88, 08003, Barcelona, Spain
- CIBER Cardiovascular Diseases (CIBERCV), Barcelona, Spain
- Medicine Department, Faculty of Medicine, University of Vic-Central University of Catalonia (UVic-UCC), Vic, Spain
|
41
|
Zhong Q, Lu M, Yuan W, Cui Y, Ouyang H, Fan Y, Wang Z, Wu C, Qiao J, Hang J. Eight-lncRNA signature of cervical cancer were identified by integrating DNA methylation, copy number variation and transcriptome data. J Transl Med 2021; 19:58. [PMID: 33557879 PMCID: PMC8045209 DOI: 10.1186/s12967-021-02705-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 10/18/2020] [Accepted: 01/12/2021] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Copy number variation (CNV) indicates genetic changes in malignant tumors. Abnormal expression of long non-coding RNAs (lncRNAs) resulting from genomic and epigenetic abnormalities plays a driving role in the tumorigenesis of cervical cancer. However, the role of lncRNA-related CNV in cervical cancer remains largely unclear. METHODS Messenger RNA (mRNA), DNA methylation, and DNA copy number data were collected from 292 cervical cancer specimens. The prognosis-related subtypes of cervical cancer were determined by multi-omics integration analysis, and protein-coding genes (PCGs) and lncRNAs with subtype-specific expression were identified. The CNV pattern of the subtype-specific lncRNAs was analyzed to identify the subtype-specific lncRNAs. A prognostic risk model based on lncRNAs was established using the least absolute shrinkage and selection operator (LASSO). RESULTS Multi-omics integration analysis identified three molecular subtypes incorporating 617 differentially expressed lncRNAs and 1395 differentially expressed PCGs. The 617 lncRNAs were found to intersect with disease-related lncRNAs. Functional enrichment showed that the 617 lncRNAs were mainly involved in tumor metabolism, immunity and other pathways, such as the p53 and cAMP signaling pathways, which are closely related to the development of cervical cancer. Finally, based on CNV patterns consistent with the differential expression analysis, we established an lncRNA-based signature consisting of 8 lncRNAs, namely RUSC1-AS1, LINC01990, LINC01411, LINC02099, H19, LINC00452, ADPGK-AS1, and C1QTNF1-AS1. The combination of the 8 lncRNAs was associated with a significantly poorer prognosis in cervical cancer patients, which was also verified in an independent dataset. CONCLUSION Our study expanded the network of CNVs and improved the understanding of the regulatory network of lncRNAs in cervical cancer, providing novel biomarkers for the prognostic management of cervical cancer patients.
Affiliation(s)
- Qihang Zhong
- Institute of Systems Biomedicine, School of Basic Medical Sciences, Peking University Health Science Center, Peking University, HaiDian District, No. 38 XueYuan Road, Beijing, 100191, China
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, HaiDian District, No. 49 North HuaYuan Road, Beijing, 100191, China
- Minzhen Lu
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, HaiDian District, No. 49 North HuaYuan Road, Beijing, 100191, China
- National Clinical Research Center for Obstetrics and Gynecology, Beijing, 100191, China
- Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, 100191, China
- Wanqiong Yuan
- Department of Orthopedics, Peking University Third Hospital, Beijing, 100091, China
- Beijing Key Laboratory of Spinal Disease Research, Beijing, 100191, China
- Yueyi Cui
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, HaiDian District, No. 49 North HuaYuan Road, Beijing, 100191, China
- Hanqiang Ouyang
- Department of Orthopedics, Peking University Third Hospital, Beijing, 100091, China
- Yong Fan
- Key Laboratory for Major Obstetric Diseases of Guangdong Province, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, 510150, China
- Zhaohui Wang
- State Key Laboratory of Molecular Developmental Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
- Congying Wu
- Institute of Systems Biomedicine, School of Basic Medical Sciences, Peking University Health Science Center, Peking University, HaiDian District, No. 38 XueYuan Road, Beijing, 100191, China
- Jie Qiao
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, HaiDian District, No. 49 North HuaYuan Road, Beijing, 100191, China
- National Clinical Research Center for Obstetrics and Gynecology, Beijing, 100191, China
- Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, 100191, China
- Jing Hang
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, HaiDian District, No. 49 North HuaYuan Road, Beijing, 100191, China
- National Clinical Research Center for Obstetrics and Gynecology, Beijing, 100191, China
- Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China
- Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, 100191, China
|
42
|
Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat Commun 2021; 12:124. [PMID: 33402734 PMCID: PMC7785750 DOI: 10.1038/s41467-020-20430-7] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Received: 03/25/2020] [Accepted: 12/02/2020] [Indexed: 01/08/2023]
Abstract
High-dimensional multi-omics data are now standard in biology. They can greatly enhance our understanding of biological systems when effectively integrated. To achieve proper integration, joint Dimensionality Reduction (jDR) methods are among the most efficient approaches. However, several jDR methods are available, urging the need for a comprehensive benchmark with practical guidelines. We perform a systematic evaluation of nine representative jDR methods using three complementary benchmarks. First, we evaluate their performances in retrieving ground-truth sample clustering from simulated multi-omics datasets. Second, we use TCGA cancer data to assess their strengths in predicting survival, clinical annotations and known pathways/biological processes. Finally, we assess their classification of multi-omics single-cell data. From these in-depth comparisons, we observe that intNMF performs best in clustering, while MCIA offers an effective behavior across many contexts. The code developed for this benchmark study is implemented in a Jupyter notebook—multi-omics mix (momix)—to foster reproducibility, and support users and future developers. Advances in omics technology have resulted in the generation of multi-view data for cancer samples. Here, the authors compare dimensionality reduction techniques using simulated and TCGA data and identify the features of the methods with superior performance.
|
43
|
Genitsaridi E, Hoare DJ, Kypraios T, Hall DA. A Review and a Framework of Variables for Defining and Characterizing Tinnitus Subphenotypes. Brain Sci 2020; 10:E938. [PMID: 33291859 PMCID: PMC7762072 DOI: 10.3390/brainsci10120938] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/27/2020] [Revised: 11/26/2020] [Accepted: 12/01/2020] [Indexed: 02/07/2023]
Abstract
Tinnitus patients can present with various characteristics, such as those related to the tinnitus perception, symptom severity, and pattern of comorbidities. It is speculated that this phenotypic heterogeneity is associated with differences in the underlying pathophysiology and personal reaction to the condition. However, there is as yet no established protocol for tinnitus profiling or subtyping, hindering progress in treatment development. This review summarizes data on variables that have been used in studies investigating phenotypic differences in subgroups of tinnitus, including variables used to both define and compare subgroups. A PubMed search led to the identification of 64 eligible articles. In most studies, variables for subgrouping were chosen by the researchers (hypothesis-driven approach). Other approaches included application of unsupervised machine-learning techniques for the definition of subgroups (data-driven), and subgroup definition based on the response to a tinnitus treatment (treatment response). A framework of 94 variable concepts was created to summarize variables used across all studies. Frequency statistics for the use of each variable concept are presented, demonstrating those most and least commonly assessed. This review highlights the high dimensionality of tinnitus heterogeneity. The framework of variables can contribute to the design of future studies, helping to decide on tinnitus assessment and subgrouping.
Affiliation(s)
- Eleni Genitsaridi
- Hearing Sciences, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, Nottingham NG7 2RD, UK
- National Institute for Health Research Nottingham Biomedical Research Centre, Nottingham NG1 5DU, UK
- Derek J. Hoare
- Hearing Sciences, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, Nottingham NG7 2RD, UK
- National Institute for Health Research Nottingham Biomedical Research Centre, Nottingham NG1 5DU, UK
- Theodore Kypraios
- School of Mathematical Sciences, University of Nottingham, Nottingham NG7 2RD, UK
- Deborah A. Hall
- Hearing Sciences, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, Nottingham NG7 2RD, UK
- National Institute for Health Research Nottingham Biomedical Research Centre, Nottingham NG1 5DU, UK
- Queens Medical Centre, Nottingham University Hospitals NHS Trust, Nottingham NG7 2UH, UK
- University of Nottingham Malaysia, Semenyih 43500, Selangor Darul Ehsan, Malaysia
|