1
Giordano R, Arendt-Nielsen L, Gerra MC, Kappel A, Østergaard SE, Capriotti C, Dallabona C, Petersen KKS. Pain mechanistic networks: the development using supervised multivariate data analysis and implications for chronic pain. Pain 2025; 166:847-857. [PMID: 39297729 DOI: 10.1097/j.pain.0000000000003410] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/08/2024] [Accepted: 08/20/2024] [Indexed: 03/20/2025]
Abstract
Chronic postoperative pain is present in approximately 20% of patients undergoing total knee arthroplasty. Studies indicate that pain mechanisms are associated with the development and maintenance of chronic postoperative pain. The current study assessed pain sensitivity, inflammation, microRNAs, and psychological factors and combined these in a network to describe chronic postoperative pain. This study involved 75 patients with and without chronic postoperative pain after total knee arthroplasty. Clinical pain intensity, Oxford Knee Score, and pain catastrophizing were assessed as clinical parameters. Quantitative sensory testing was used to evaluate pain sensitivity, and microRNAs and inflammatory markers were likewise analyzed. Supervised multivariate data analysis with "Data Integration Analysis for Biomarker discovery using Latent cOmponents" (DIABLO) was used to describe the chronic postoperative pain intensity. Two DIABLO models were constructed by dividing the patients into 3 or 2 groups defined by clinical pain intensity. The DIABLO models explained chronic postoperative pain and identified factors involved in pain mechanistic networks among the assessments included in the analysis. Models of 3 or 2 patient groups using the assessments and the networks could explain 81% and 69% of the variability in clinical postoperative pain intensity. Reducing the number of parameters stabilized the models and reduced the explanatory value to 69% and 51%. This is the first study to use the DIABLO model for chronic postoperative pain and to demonstrate how different pain mechanisms form a pain mechanistic network. The complex model explained 81% of the variability of clinical pain intensity, whereas the less complex model explained 51% of the variability of clinical pain intensity.
Affiliation(s)
- Rocco Giordano
  - Department of Oral and Maxillofacial Surgery, Aalborg University Hospital, Aalborg, Denmark
  - Center for Neuroplasticity and Pain (CNAP), SMI®, Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- Lars Arendt-Nielsen
  - Center for Neuroplasticity and Pain (CNAP), SMI®, Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
  - Center for Mathematical Modeling of Knee Osteoarthritis (MathKOA), Department of Material and Production, Faculty of Engineering and Science, Aalborg University, Aalborg, Denmark
  - Department of Gastroenterology & Hepatology, MechSense, Aalborg University Hospital, Aalborg, Denmark
  - Steno Diabetes Center North Denmark, Clinical Institute, Aalborg University Hospital, Aalborg, Denmark
- Maria Carla Gerra
  - Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Andreas Kappel
  - Interdisciplinary Orthopedics, Department of Orthopedic Surgery, Aalborg University Hospital, Aalborg, Denmark
- Svend Erik Østergaard
  - Interdisciplinary Orthopedics, Department of Orthopedic Surgery, Aalborg University Hospital, Aalborg, Denmark
- Camila Capriotti
  - Center for Neuroplasticity and Pain (CNAP), SMI®, Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
  - Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Cristina Dallabona
  - Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Kristian Kjær-Staal Petersen
  - Center for Neuroplasticity and Pain (CNAP), SMI®, Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
  - Center for Mathematical Modeling of Knee Osteoarthritis (MathKOA), Department of Material and Production, Faculty of Engineering and Science, Aalborg University, Aalborg, Denmark
2
Bhargava M, Crouser ED. Application of laboratory models for sarcoidosis research. J Autoimmun 2024; 149:103184. [PMID: 38443221 DOI: 10.1016/j.jaut.2024.103184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 02/12/2024] [Accepted: 02/15/2024] [Indexed: 03/07/2024]
Abstract
This manuscript will review the implications and applications of sarcoidosis models towards advancing our understanding of sarcoidosis disease mechanisms, identification of biomarkers, and preclinical testing of novel therapies. Emerging disease models and innovative research tools will also be considered.
Affiliation(s)
- Maneesh Bhargava
  - University of Minnesota Medical Center, Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, 420 Delaware Street SE, MMC 276. Minneapolis, MN 55455, USA
- Elliott D Crouser
  - Ohio State University Wexner Medicine Center, Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, 241 W. 11th Street, Suite 5000, Columbus, OH 43201, USA
3
Zhang R, Datta S. asmbPLS: biomarker identification and patient survival prediction with multi-omics data. Front Genet 2024; 15:1444054. [PMID: 39649094 PMCID: PMC11621212 DOI: 10.3389/fgene.2024.1444054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Accepted: 11/11/2024] [Indexed: 12/10/2024] Open
Abstract
Introduction With the advancement of high-throughput studies, an increasing wealth of high-dimensional multi-omics data is being collected from the same patient cohort. However, leveraging this multi-omics data to predict survival outcomes poses a significant challenge due to its complex structure. Methods In this article, we present a novel approach, the Adaptive Sparse Multi-Block Partial Least Squares (asmbPLS) Regression model, which introduces a dynamic assignment of penalty factors to distinct blocks within various PLS components, facilitating effective feature selection and prediction. Results We compared the proposed method with several state-of-the-art algorithms in terms of prediction performance, feature selection and computation efficiency. We conducted comprehensive evaluations using both simulated data with various scenarios and a real dataset from melanoma patients to validate the effectiveness and efficiency of the asmbPLS method. Additionally, we used the lung squamous cell carcinoma (LUSC) dataset from The Cancer Genome Atlas (TCGA) to further assess the feature selection capability of asmbPLS. Discussion The inherent nature of asmbPLS imparts it with higher sensitivity in feature selection compared to other methods. Furthermore, an R package called asmbPLS implementing this method is made publicly available.
Affiliation(s)
- Susmita Datta
  - Department of Biostatistics, University of Florida, Gainesville, FL, United States
4
Hernández-Lemus E, Ochoa S. Methods for multi-omic data integration in cancer research. Front Genet 2024; 15:1425456. [PMID: 39364009 PMCID: PMC11446849 DOI: 10.3389/fgene.2024.1425456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Accepted: 08/28/2024] [Indexed: 10/05/2024] Open
Abstract
Multi-omics data integration is a term that refers to the process of combining and analyzing data from different omic experimental sources, such as genomics, transcriptomics, methylation assays, and microRNA sequencing, among others. Such data integration approaches have the potential to provide a more comprehensive functional understanding of biological systems and have numerous applications in areas such as disease diagnosis, prognosis and therapy. However, quantitative integration of multi-omic data is a complex task that requires the use of highly specialized methods and approaches. Here, we discuss a number of data integration methods that have been developed with multi-omics data in view, including statistical methods, machine learning approaches, and network-based approaches. We also discuss the challenges and limitations of such methods and provide examples of their applications in the literature. Overall, this review aims to provide an overview of the current state of the field and highlight potential directions for future research.
Affiliation(s)
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Soledad Ochoa
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Department of Obstetrics and Gynecology, Cedars-Sinai Medical Center, Los Angeles, CA, United States
5
Tiwari P, Tripathi LP. Long Non-Coding RNAs, Nuclear Receptors and Their Cross-Talks in Cancer-Implications and Perspectives. Cancers (Basel) 2024; 16:2920. [PMID: 39199690 PMCID: PMC11352509 DOI: 10.3390/cancers16162920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 07/30/2024] [Accepted: 08/14/2024] [Indexed: 09/01/2024] Open
Abstract
Long non-coding RNAs (lncRNAs) play key roles in various epigenetic and post-transcriptional events in the cell, thereby significantly influencing cellular processes including gene expression, development and diseases such as cancer. Nuclear receptors (NRs) are a family of ligand-regulated transcription factors that typically regulate transcription of genes involved in a broad spectrum of cellular processes, immune responses and in many diseases including cancer. Owing to their many overlapping roles as modulators of gene expression, the paths traversed by lncRNA and NR-mediated signaling often cross each other; these lncRNA-NR cross-talks are being increasingly recognized as important players in many cellular processes and diseases such as cancer. Here, we review the individual roles of lncRNAs and NRs, especially growth factor modulated receptors such as androgen receptors (ARs), in various types of cancers and how the cross-talks between lncRNAs and NRs are involved in cancer progression and metastasis. We discuss the challenges involved in characterizing lncRNA-NR associations and how to overcome them. Furthering our understanding of the mechanisms of lncRNA-NR associations is crucial to realizing their potential as prognostic features, diagnostic biomarkers and therapeutic targets in cancer biology.
Affiliation(s)
- Prabha Tiwari
  - Department of Microbiology and Immunology, Keio University School of Medicine, Shinjuku, Tokyo 160-8582, Japan
- Lokesh P. Tripathi
  - Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Kanagawa, Japan
  - AI Center for Health and Biomedical Research (ArCHER), National Institutes of Biomedical Innovation, Health and Nutrition, Kento Innovation Park NK Building, 3-17 Senrioka Shinmachi, Settsu 566-0002, Osaka, Japan
6
Bartzis G, Peeters CFW, Ligterink W, Van Eeuwijk FA. A guided network estimation approach using multi-omic information. BMC Bioinformatics 2024; 25:202. [PMID: 38816801 PMCID: PMC11137963 DOI: 10.1186/s12859-024-05778-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 04/11/2024] [Indexed: 06/01/2024] Open
Abstract
INTRODUCTION In systems biology, an organism is viewed as a system of interconnected molecular entities. To understand the functioning of organisms it is essential to integrate information about the variations in the concentrations of those molecular entities. This information can be structured as a set of networks with interconnections and with some hierarchical relations between them. Few methods exist for the reconstruction of integrative networks. OBJECTIVE In this work, we propose an integrative network reconstruction method in which the network organization for a particular type of omics data is guided by the network structure of a related type of omics data upstream in the omic cascade. The structure of these guiding data can be either already known or be estimated from the guiding data themselves. METHODS The method consists of three steps. First, a network structure for the guiding data should be provided. Next, responses in the target set are regressed on the full set of predictors in the guiding data with a Lasso penalty to reduce the number of predictors and an L2 penalty on the differences between coefficients for predictors that share edges in the network for the guiding data. Finally, a network is reconstructed on the fitted target responses as functions of the predictors in the guiding data. This way we condition the target network on the network of the guiding data. CONCLUSIONS We illustrate our approach on two examples in Arabidopsis. The method detects groups of metabolites that have a similar genetic or transcriptomic basis.
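The regression step described in the abstract above (a Lasso penalty plus an L2 penalty on coefficient differences across network edges) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper or its software: the function names, the 1/(2n) scaling of the squared-error loss, the penalty weights `lam1` and `lam2`, and the choice of proximal gradient descent as the solver. The sum of squared differences over edges is written as a quadratic form in the graph Laplacian of the guiding network.

```python
import numpy as np


def graph_laplacian(n_features, edges):
    """Laplacian L of an undirected graph, so that
    b @ L @ b == sum over edges (j, k) of (b[j] - b[k])**2."""
    L = np.zeros((n_features, n_features))
    for j, k in edges:
        L[j, j] += 1.0
        L[k, k] += 1.0
        L[j, k] -= 1.0
        L[k, j] -= 1.0
    return L


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def network_guided_lasso(X, y, edges, lam1, lam2, n_iter=500):
    """Hypothetical solver: minimise
        1/(2n) * ||y - X b||^2 + lam1 * ||b||_1 + lam2 * b' L b
    by proximal gradient descent (ISTA)."""
    n, p = X.shape
    L = graph_laplacian(p, edges)
    # Hessian of the smooth part; its largest eigenvalue bounds the step size.
    H = X.T @ X / n + 2.0 * lam2 * L
    step = 1.0 / np.linalg.eigvalsh(H).max()
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n + 2.0 * lam2 * (L @ b)
        b = soft_threshold(b - step * grad, step * lam1)
    return b
```

In this sketch, one such regression would be fitted per response in the target set, with `edges` listing the pairs of guiding-data predictors connected in the guiding network. Raising `lam2` pulls the coefficients of connected predictors toward each other, while raising `lam1` sets more coefficients exactly to zero.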
Affiliation(s)
- Georgios Bartzis
  - Mathematical and Statistical Methods Group - Biometris, Wageningen University and Research, Wageningen, The Netherlands
- Carel F W Peeters
  - Mathematical and Statistical Methods Group - Biometris, Wageningen University and Research, Wageningen, The Netherlands
- Wilco Ligterink
  - Laboratory of Plant Physiology, Wageningen University and Research, Wageningen, The Netherlands
- Fred A Van Eeuwijk
  - Mathematical and Statistical Methods Group - Biometris, Wageningen University and Research, Wageningen, The Netherlands
7
Vieira FG, Bispo R, Lopes MB. Integration of Multi-Omics Data for the Classification of Glioma Types and Identification of Novel Biomarkers. Bioinform Biol Insights 2024; 18:11779322241249563. [PMID: 38812741 PMCID: PMC11135104 DOI: 10.1177/11779322241249563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Accepted: 04/09/2024] [Indexed: 05/31/2024] Open
Abstract
Glioma is currently one of the most prevalent types of primary brain cancer. Given its high level of heterogeneity along with the complex biological molecular markers, many efforts have been made to accurately classify the type of glioma in each patient, which, in turn, is critical to improve early diagnosis and increase survival. Nonetheless, as a result of the fast-growing technological advances in high-throughput sequencing and evolving molecular understanding of glioma biology, its classification has been recently subject to significant alterations. In this study, we integrate multiple glioma omics modalities (including mRNA, DNA methylation, and miRNA) from The Cancer Genome Atlas (TCGA), while using the revised glioma reclassified labels, with a supervised method based on sparse canonical correlation analysis (DIABLO) to discriminate between glioma types. We were able to find a set of highly correlated features distinguishing glioblastoma from lower-grade gliomas (LGGs) that were mainly associated with the disruption of receptor tyrosine kinases signaling pathways and extracellular matrix organization and remodeling. Concurrently, the discrimination of the LGG types was characterized primarily by features involved in ubiquitination and DNA transcription processes. Furthermore, we could identify several novel glioma biomarkers likely helpful in both diagnosis and prognosis of the patients, including the genes PPP1R8, GPBP1L1, KIAA1614, C14orf23, CCDC77, BVES, EXD3, CD300A, and HEPN1. Collectively, this comprehensive approach not only allowed a highly accurate discrimination of the different TCGA glioma patients but also presented a step forward in advancing our comprehension of the underlying molecular mechanisms driving glioma heterogeneity. 
Ultimately, our study also revealed novel candidate biomarkers that might constitute potential therapeutic targets, marking a significant stride toward personalized and more effective treatment strategies for patients with glioma.
Affiliation(s)
- Francisca G Vieira
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
- Regina Bispo
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
  - Department of Mathematics, NOVA School of Science and Technology, Caparica, Portugal
- Marta B Lopes
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
  - Department of Mathematics, NOVA School of Science and Technology, Caparica, Portugal
  - UNIDEMI, Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
8
Zeng IS. Integrating omics atlas in health informatics system design-an opinion article. Front Digit Health 2024; 6:1374359. [PMID: 38784702 PMCID: PMC11111845 DOI: 10.3389/fdgth.2024.1374359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 04/22/2024] [Indexed: 05/25/2024] Open
Affiliation(s)
- Irene Suilan Zeng
  - Department of Biostatistics and Epidemiology, Auckland University of Technology, Auckland, New Zealand
  - School of Clinical Science, Faculty of Health and Environmental Sciences, Auckland University of Technology, Auckland, New Zealand
9
Gygi JP, Konstorum A, Pawar S, Aron E, Kleinstein SH, Guan L. A supervised Bayesian factor model for the identification of multi-omics signatures. Bioinformatics 2024; 40:btae202. [PMID: 38603606 PMCID: PMC11078774 DOI: 10.1093/bioinformatics/btae202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 02/29/2024] [Accepted: 04/10/2024] [Indexed: 04/13/2024] Open
Abstract
MOTIVATION Predictive biological signatures provide utility as biomarkers for disease diagnosis and prognosis, as well as prediction of responses to vaccination or therapy. These signatures are identified from high-throughput profiling assays through a combination of dimensionality reduction and machine learning techniques. The genes, proteins, metabolites, and other biological analytes that compose signatures also generate hypotheses on the underlying mechanisms driving biological responses, thus improving biological understanding. Dimensionality reduction is a critical step in signature discovery to address the large number of analytes in omics datasets, especially for multi-omics profiling studies with tens of thousands of measurements. Latent factor models, which can account for the structural heterogeneity across diverse assays, effectively integrate multi-omics data and reduce dimensionality to a small number of factors that capture correlations and associations among measurements. These factors provide biologically interpretable features for predictive modeling. However, multi-omics integration and predictive modeling are generally performed independently in sequential steps, leading to suboptimal factor construction. Combining these steps can yield better multi-omics signatures that are more predictive while still being biologically meaningful. RESULTS We developed a supervised variational Bayesian factor model that extracts multi-omics signatures from high-throughput profiling datasets that can span multiple data types. Signature-based multiPle-omics intEgration via lAtent factoRs (SPEAR) adaptively determines factor rank, emphasis on factor structure, data relevance and feature sparsity. The method improves the reconstruction of underlying factors in synthetic examples and prediction accuracy of coronavirus disease 2019 severity and breast cancer tumor subtypes. 
AVAILABILITY AND IMPLEMENTATION SPEAR is a publicly available R-package hosted at https://bitbucket.org/kleinstein/SPEAR.
Affiliation(s)
- Jeremy P Gygi
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
- Anna Konstorum
  - Department of Pathology, Yale School of Medicine, New Haven, CT 06520, United States
- Shrikant Pawar
  - Department of Genetics, Yale Center for Genomic Analysis (YCGA), Yale School of Medicine, New Haven, CT 06520, United States
- Edel Aron
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
- Steven H Kleinstein
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
  - Department of Pathology, Yale School of Medicine, New Haven, CT 06520, United States
  - Department of Immunobiology, Yale School of Medicine, New Haven, CT 06520, United States
- Leying Guan
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT 06520, United States
10
Sharma V, Singh A, Chauhan S, Sharma PK, Chaudhary S, Sharma A, Porwal O, Fuloria NK. Role of Artificial Intelligence in Drug Discovery and Target Identification in Cancer. Curr Drug Deliv 2024; 21:870-886. [PMID: 37670704 DOI: 10.2174/1567201821666230905090621] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/31/2022] [Revised: 03/08/2023] [Accepted: 03/24/2023] [Indexed: 09/07/2023]
Abstract
Drug discovery and development (DDD) is a highly complex process that necessitates precise monitoring and extensive data analysis at each stage. Furthermore, the DDD process is both time-consuming and costly. To tackle these concerns, artificial intelligence (AI) technology can be used, which facilitates rapid and precise analysis of extensive datasets within a limited timeframe. The pathophysiology of cancer is complicated and requires extensive research for novel drug discovery and development. The first stage in the process of drug discovery and development involves identifying targets. Cell structure and molecular functioning are complex due to the vast number of molecules that function constantly, performing various roles. Furthermore, scientists are continually discovering novel cellular mechanisms and molecules, expanding the range of potential targets. Accurately identifying the correct target is a crucial step in the preparation of a treatment strategy. Various forms of AI, such as machine learning, neural-based learning, deep learning, and network-based learning, are currently being utilised in applications, online services, and databases. These technologies facilitate the identification and validation of targets, ultimately contributing to the success of projects. This review focuses on the different types and subcategories of AI databases utilised in the field of drug discovery and target identification for cancer.
Affiliation(s)
- Vishal Sharma
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Amit Singh
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Sanjana Chauhan
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Pramod Kumar Sharma
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Shubham Chaudhary
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Astha Sharma
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Omji Porwal
  - Department of Pharmacognosy, Faculty of Pharmacy, Tishk International University, Erbil 44001, Iraq
11
Hai Y, Ma J, Yang K, Wen Y. Bayesian linear mixed model with multiple random effects for prediction analysis on high-dimensional multi-omics data. Bioinformatics 2023; 39:btad647. [PMID: 37882747 PMCID: PMC10627352 DOI: 10.1093/bioinformatics/btad647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 09/24/2023] [Accepted: 10/24/2023] [Indexed: 10/27/2023] Open
Abstract
MOTIVATION Accurate disease risk prediction is an essential step in the modern quest for precision medicine. While high-dimensional multi-omics data have provided unprecedented data resources for prediction studies, their high dimensionality and complex inter/intra-relationships have posed significant analytical challenges. RESULTS We proposed a two-step Bayesian linear mixed model framework (TBLMM) for risk prediction analysis on multi-omics data. TBLMM models the predictive effects from multi-omics data using a hybrid of sparsity regression and a linear mixed model with multiple random effects. It can resemble the shape of the true effect size distributions and accounts for non-linear effects, including interaction effects, among multi-omics data via kernel fusion. It infers its parameters via a computationally efficient variational Bayes algorithm. Through extensive simulation studies and the prediction analyses on the positron emission tomography imaging outcomes using data obtained from the Alzheimer's Disease Neuroimaging Initiative, we have demonstrated that TBLMM can consistently outperform the existing method in predicting the risk of complex traits. AVAILABILITY AND IMPLEMENTATION The corresponding R package is available on GitHub (https://github.com/YaluWen/TBLMM).
Affiliation(s)
- Yang Hai
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
  - Department of Statistics, University of Auckland, Auckland 1010, New Zealand
- Jixiang Ma
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Kaixin Yang
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Yalu Wen
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
  - Department of Statistics, University of Auckland, Auckland 1010, New Zealand
12
Downing T, Angelopoulos N. A primer on correlation-based dimension reduction methods for multi-omics analysis. J R Soc Interface 2023; 20:20230344. [PMID: 37817584 PMCID: PMC10565429 DOI: 10.1098/rsif.2023.0344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 09/19/2023] [Indexed: 10/12/2023] Open
Abstract
The continuing advances of omic technologies mean that it is now more tangible to measure the numerous features collectively reflecting the molecular properties of a sample. When multiple omic methods are used, statistical and computational approaches can exploit these large, connected profiles. Multi-omics is the integration of different omic data sources from the same biological sample. In this review, we focus on correlation-based dimension reduction approaches for single omic datasets, followed by methods for pairs of omics datasets, before detailing further techniques for three or more omic datasets. We also briefly detail network methods when three or more omic datasets are available and which complement correlation-oriented tools. To aid readers new to this area, these are all linked to relevant R packages that can implement these procedures. Finally, we discuss scenarios of experimental design and present road maps that simplify the selection of appropriate analysis methods. This review will help researchers navigate emerging methods for multi-omics and integrating diverse omic datasets appropriately. This raises the opportunity of implementing population multi-omics with large sample sizes as omics technologies and our understanding improve.
Affiliation(s)
- Tim Downing
  - Pirbright Institute, Pirbright, Surrey, UK
  - Department of Biotechnology, Dublin City University, Dublin, Ireland
13
Jiang C, Geng L, Wang J, Liang Y, Guo X, Liu C, Zhao Y, Jin J, Liu Z, Mu Y. Multiplexed Gene Engineering Based on dCas9 and gRNA-tRNA Array Encoded on Single Transcript. Int J Mol Sci 2023; 24:8535. [PMID: 37239880 DOI: 10.3390/ijms24108535] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/27/2023] [Revised: 05/04/2023] [Accepted: 05/05/2023] [Indexed: 05/28/2023] Open
Abstract
Simultaneous multiplexed genome engineering and targeting of multiple genomic loci are valuable for elucidating gene interactions and characterizing genetic networks that affect phenotypes. Here, we developed a general CRISPR-based platform to perform four functions and target multiple genome loci encoded in a single transcript. To establish multiple functions for multiple loci targets, we fused four RNA hairpins, MS2, PP7, com and boxB, to stem-loops of gRNA (guide RNA) scaffolds, separately. The RNA-hairpin-binding domains MCP, PCP, Com and λN22 were fused with different functional effectors. These paired combinations of cognate-RNA hairpins and RNA-binding proteins generated the simultaneous, independent regulation of multiple target genes. To ensure that all proteins and RNAs are expressed in one transcript, multiple gRNAs were constructed in a tandemly arrayed tRNA (transfer RNA)-gRNA architecture, and the triplex sequence was cloned between the protein-coding sequences and the tRNA-gRNA array. By leveraging this system, we illustrate the transcriptional activation, transcriptional repression, DNA methylation and DNA demethylation of endogenous targets using up to 16 individual CRISPR gRNAs delivered on a single transcript. This system provides a powerful platform to investigate synthetic biology questions and engineer complex-phenotype medical applications.
Collapse
Affiliation(s)
- Chaoqian Jiang
  - Key Laboratory of Animal Cellular and Genetic Engineering of Heilongjiang Province, Northeast Agricultural University, Harbin 150030, China
  - College of Life Science, Northeast Agricultural University, Harbin 150030, China
- Lishuang Geng
  - Key Laboratory of Animal Cellular and Genetic Engineering of Heilongjiang Province, Northeast Agricultural University, Harbin 150030, China
  - College of Life Science, Northeast Agricultural University, Harbin 150030, China
- Jinpeng Wang
  - Key Laboratory of Animal Cellular and Genetic Engineering of Heilongjiang Province, Northeast Agricultural University, Harbin 150030, China
- Yingjuan Liang
  - Key Laboratory of Animal Cellular and Genetic Engineering of Heilongjiang Province, Northeast Agricultural University, Harbin 150030, China
- Xiaochen Guo
  - Key Laboratory of Animal Cellular and Genetic Engineering of Heilongjiang Province, Northeast Agricultural University, Harbin 150030, China
- Chang Liu
  - Key Laboratory of Animal Cellular and Genetic Engineering of Heilongjiang Province, Northeast Agricultural University, Harbin 150030, China
- Yunjing Zhao
  - Key Laboratory of Animal Cellular and Genetic Engineering of Heilongjiang Province, Northeast Agricultural University, Harbin 150030, China
- Junxue Jin
  - Key Laboratory of Animal Cellular and Genetic Engineering of Heilongjiang Province, Northeast Agricultural University, Harbin 150030, China
  - College of Life Science, Northeast Agricultural University, Harbin 150030, China
- Zhonghua Liu
  - Key Laboratory of Animal Cellular and Genetic Engineering of Heilongjiang Province, Northeast Agricultural University, Harbin 150030, China
  - College of Life Science, Northeast Agricultural University, Harbin 150030, China
- Yanshuang Mu
  - Key Laboratory of Animal Cellular and Genetic Engineering of Heilongjiang Province, Northeast Agricultural University, Harbin 150030, China
  - College of Life Science, Northeast Agricultural University, Harbin 150030, China
14
Zhang R, Datta S. asmbPLS: Adaptive Sparse Multi-block Partial Least Square for Survival Prediction using Multi-Omics Data. bioRxiv 2023:2023.04.03.535442. [PMID: 37066143 PMCID: PMC10103991 DOI: 10.1101/2023.04.03.535442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
BACKGROUND As high-throughput studies advance, more and more high-dimensional multi-omics data are available and collected from the same patient cohort. Using multi-omics data as predictors to predict survival outcomes is challenging due to the complex structure of such data. RESULTS In this article, we introduce an adaptive sparse multi-block partial least square (asmbPLS) regression method by assigning different penalty factors to different blocks in different PLS components for feature selection and prediction. We compared the proposed method with several competitive algorithms in many aspects including prediction performance, feature selection and computation efficiency. The performance and the efficiency of our method were demonstrated using both the simulated and the real data. CONCLUSIONS In summary, asmbPLS achieved a competitive performance in prediction, feature selection, and computation efficiency. We anticipate asmbPLS to be a valuable tool for multi-omics research. An R package called asmbPLS implementing this method is made publicly available on GitHub.
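The idea of block-specific penalties in the abstract above can be illustrated with a minimal Python sketch. The abstract does not state the penalty form, so soft-thresholding (a common choice in sparse PLS methods) is an assumption here, and the block names, weights, and penalty values are hypothetical. This is not the asmbPLS package's implementation.

```python
def soft_threshold(weights, penalty):
    """Shrink each weight toward zero by `penalty`; weights no larger than it become 0."""
    shrunk = []
    for w in weights:
        magnitude = abs(w) - penalty
        if magnitude > 0:
            shrunk.append(magnitude if w > 0 else -magnitude)
        else:
            shrunk.append(0.0)
    return shrunk


def sparsify_blocks(block_weights, block_penalties):
    """Apply a separate (assumed) penalty to the feature weights of each omics block."""
    return {
        name: soft_threshold(weights, block_penalties[name])
        for name, weights in block_weights.items()
    }


# Hypothetical feature weights for two omics blocks in one PLS component.
block_weights = {"mrna": [0.9, -0.2, 0.05], "mirna": [0.4, -0.6]}
# Hypothetical penalties: the second block is penalized more heavily.
block_penalties = {"mrna": 0.1, "mirna": 0.5}

sparse = sparsify_blocks(block_weights, block_penalties)
for name, weights in sparse.items():
    kept = sum(1 for w in weights if w != 0.0)
    print(name, weights, "features kept:", kept)
```

A larger penalty on a block removes more of its features, which is how block-wise penalties can double as feature selection.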
15
Ochoa S, Hernández-Lemus E. Functional impact of multi-omic interactions in breast cancer subtypes. Front Genet 2023; 13:1078609. [PMID: 36685900 PMCID: PMC9850112 DOI: 10.3389/fgene.2022.1078609] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/24/2022] [Accepted: 12/15/2022] [Indexed: 01/07/2023] Open
Abstract
Multi-omic approaches are expected to deliver a broader molecular view of cancer. However, the promised mechanistic explanations have not quite settled yet. Here, we propose a theoretical and computational analysis framework to semi-automatically produce network models of the regulatory constraints influencing a biological function. This way, we identified functions significantly enriched on the analyzed omics and described associated features, for each of the four breast cancer molecular subtypes. For instance, we identified functions sustaining over-representation of invasion-related processes in the basal subtype and DNA modification processes in the normal tissue. We found limited overlap on the omics-associated functions between subtypes; however, a startling feature intersection within subtype functions also emerged. The examples presented highlight new, potentially regulatory features, with sound biological reasons to expect a connection with the functions. Multi-omic regulatory networks thus constitute reliable models of the way omics are connected, demonstrating a capability for systematic generation of mechanistic hypothesis.
Affiliation(s)
- Soledad Ochoa
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Correspondence: Enrique Hernández-Lemus
16
Athieniti E, Spyrou GM. A guide to multi-omics data collection and integration for translational medicine. Comput Struct Biotechnol J 2022; 21:134-149. [PMID: 36544480 PMCID: PMC9747357 DOI: 10.1016/j.csbj.2022.11.050] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 06/23/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/02/2022] Open
Abstract
The emerging high-throughput technologies have led to the shift in the design of translational medicine projects towards collecting multi-omics patient samples and, consequently, their integrated analysis. However, the complexity of integrating these datasets has triggered new questions regarding the appropriateness of the available computational methods. Currently, there is no clear consensus on the best combination of omics to include and the data integration methodologies required for their analysis. This article aims to guide the design of multi-omics studies in the field of translational medicine regarding the types of omics and the integration method to choose. We review articles that perform the integration of multiple omics measurements from patient samples. We identify five objectives in translational medicine applications: (i) detect disease-associated molecular patterns, (ii) subtype identification, (iii) diagnosis/prognosis, (iv) drug response prediction, and (v) understand regulatory processes. We describe common trends in the selection of omic types combined for different objectives and diseases. To guide the choice of data integration tools, we group them into the scientific objectives they aim to address. We describe the main computational methods adopted to achieve these objectives and present examples of tools. We compare tools based on how they deal with the computational challenges of data integration and comment on how they perform against predefined objective-specific evaluation criteria. Finally, we discuss examples of tools for downstream analysis and further extraction of novel insights from multi-omics datasets.
Affiliation(s)
- Efi Athieniti
  - Department of Bioinformatics, The Cyprus Institute of Neurology and Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
- George M. Spyrou
  - Department of Bioinformatics, The Cyprus Institute of Neurology and Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
17
Lei J, Cai Z, He X, Zheng W, Liu J. An approach of gene regulatory network construction using mixed entropy optimizing context-related likelihood mutual information. Bioinformatics 2022; 39:6808612. [PMID: 36342190 PMCID: PMC9805593 DOI: 10.1093/bioinformatics/btac717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 09/18/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022] Open
Abstract
MOTIVATION The question of how to construct gene regulatory networks has long been a focus of biological research. Mutual information can be used to measure nonlinear relationships, and it has been widely used in the construction of gene regulatory networks. However, this method cannot measure indirect regulatory relationships under the influence of multiple genes, which reduces the accuracy of inferring gene regulatory networks. APPROACH This work proposes a method for constructing gene regulatory networks based on mixed entropy optimizing context-related likelihood mutual information (MEOMI). First, two entropy estimators were combined to calculate the mutual information between genes. Then, distribution optimization was performed using a context-related likelihood algorithm to eliminate some indirect regulatory relationships and obtain the initial gene regulatory network. To obtain the complex interaction between genes and eliminate redundant edges in the network, the initial gene regulatory network was further optimized by calculating the conditional mutual inclusive information (CMI2) between gene pairs under the influence of multiple genes. The network was iteratively updated to reduce the impact of mutual information on the overestimation of the direct regulatory intensity. RESULTS The experimental results show that the MEOMI method performed better than several other kinds of gene network construction methods on DREAM challenge simulated datasets (DREAM3 and DREAM5), three real Escherichia coli datasets (E. coli SOS pathway network, E. coli SOS DNA repair network and E. coli community network) and two human datasets. AVAILABILITY AND IMPLEMENTATION Source code and dataset are available at https://github.com/Dalei-Dalei/MEOMI/ and http://122.205.95.139/MEOMI/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
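The background-correction step described in the abstract above can be sketched in Python. The abstract does not give the formula, so this sketch assumes the widely used context likelihood of relatedness (CLR) form, in which each mutual information value is converted to a z-score against the background of both genes involved. The matrix values are made up for illustration, and this is not the MEOMI source code.

```python
import math
from statistics import mean, pstdev


def _positive_z(value, mu, sd):
    """Z-score clipped at zero; returns 0 when the background has no spread."""
    if sd == 0:
        return 0.0
    return max(0.0, (value - mu) / sd)


def clr_scores(mi):
    """Rescore a symmetric mutual information matrix against each gene's background."""
    n = len(mi)
    # Background mean and standard deviation of each gene's off-diagonal MI values.
    background = []
    for i in range(n):
        row = [mi[i][j] for j in range(n) if j != i]
        background.append((mean(row), pstdev(row)))
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            z_i = _positive_z(mi[i][j], *background[i])
            z_j = _positive_z(mi[i][j], *background[j])
            scores[i][j] = math.sqrt(z_i * z_i + z_j * z_j)
    return scores


# Hypothetical symmetric MI matrix for four genes.
mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.2],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
]
scores = clr_scores(mi)
for row in scores:
    print([round(s, 3) for s in row])
```

Pairs whose mutual information is unremarkable relative to both genes' backgrounds score zero, which is one way weak or indirect associations can be pruned before any further refinement.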
Affiliation(s)
- Jimeng Lei
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
  - Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zongheng Cai
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
  - Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xinyi He
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wanting Zheng
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
18
Network-based integration of multi-omics data for clinical outcome prediction in neuroblastoma. Sci Rep 2022; 12:15425. [PMID: 36104347 PMCID: PMC9475034 DOI: 10.1038/s41598-022-19019-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/28/2022] [Accepted: 08/23/2022] [Indexed: 11/08/2022] Open
Abstract
Multi-omics data are increasingly being gathered for investigations of complex diseases such as cancer. However, high dimensionality, small sample size, and heterogeneity of different omics types pose huge challenges to integrated analysis. In this paper, we evaluate two network-based approaches for integration of multi-omics data in an application of clinical outcome prediction of neuroblastoma. We derive Patient Similarity Networks (PSN) as the first step for individual omics data by computing distances among patients from omics features. The fusion of different omics can be investigated in two ways: the network-level fusion is achieved using the Similarity Network Fusion algorithm for fusing the PSNs derived for individual omics types; and the feature-level fusion is achieved by fusing the network features obtained from individual PSNs. We demonstrate our methods on two high-risk neuroblastoma datasets from the SEQC project and the TARGET project. We propose Deep Neural Network and Machine Learning methods with Recursive Feature Elimination as the predictor of survival status of neuroblastoma patients. Our results indicate that network-level fusion outperformed feature-level fusion for integration of different omics data, whereas feature-level fusion is more suitable for incorporating different feature types derived from the same omics type. We conclude that the network-based methods are capable of handling heterogeneity and high dimensionality well in the integration of multi-omics.
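The first step named in the abstract above, deriving a patient similarity network from distances between patients, can be sketched as follows. The abstract does not specify the distance or the similarity transform, so Euclidean distance with a Gaussian kernel and the bandwidth `sigma` are assumptions, and the patient IDs and feature values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def patient_similarity_network(features, sigma=1.0):
    """Return weighted edges {(patient_a, patient_b): similarity} for one omics type.

    Similarity is exp(-d^2 / (2 * sigma^2)), an assumed Gaussian kernel on distance d.
    """
    patients = sorted(features)
    edges = {}
    for i, p in enumerate(patients):
        for q in patients[i + 1:]:
            d = euclidean(features[p], features[q])
            edges[(p, q)] = math.exp(-(d ** 2) / (2 * sigma ** 2))
    return edges


# Hypothetical omics features for three patients.
features = {
    "P1": [1.0, 2.0],
    "P2": [1.1, 2.1],
    "P3": [5.0, 7.0],
}
network = patient_similarity_network(features, sigma=1.0)
for edge, weight in network.items():
    print(edge, round(weight, 4))
```

One such network would be built per omics type; network-level fusion then combines the networks themselves, while feature-level fusion combines features computed from each network.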
19
Shan X, Chen J, Dong K, Zhou W, Zhang S. Deciphering the Spatial Modular Patterns of Tissues by Integrating Spatial and Single-Cell Transcriptomic Data. J Comput Biol 2022; 29:650-663. [PMID: 35727094 DOI: 10.1089/cmb.2021.0617] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) provides a powerful tool to analyze the expression level of tissues at a cellular resolution. However, it cannot capture the spatial organization of cells in a tissue. Spatially resolved transcriptomics (ST) technologies have been developed to address this issue. However, the emerging ST technologies are still inefficient at single-cell resolution and/or fail to capture sufficient reads. To this end, we adopted a partial least squares-based method (spatial modular patterns [SpaMOD]) to simultaneously integrate the two data modalities, as well as the networks related to cells and spots, to identify the cell-spot comodules for deciphering the SpaMOD of tissues. We applied SpaMOD to three paired scRNA-seq and ST datasets, derived from the mouse brain, granuloma, and pancreatic ductal adenocarcinoma, respectively. The identified cell-spot comodules provide detailed biological insights into the spatial relationships between cell populations and their spatial locations in the tissue.
Affiliation(s)
- Xu Shan
  - Department of Software Engineering, Yunnan University, Kunming, China
- Jinyu Chen
  - College of Statistics and Data Science, Faculty of Science, Beijing University of Technology, Beijing, China
- Kangning Dong
  - NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Wei Zhou
  - Department of Software Engineering, Yunnan University, Kunming, China
- Shihua Zhang
  - NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
  - Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, China
20
Drouard G, Ollikainen M, Mykkänen J, Raitakari O, Lehtimäki T, Kähönen M, Mishra PP, Wang X, Kaprio J. Multi-Omics Integration in a Twin Cohort and Predictive Modeling of Blood Pressure Values. OMICS 2022; 26:130-141. [PMID: 35259029 PMCID: PMC8978565 DOI: 10.1089/omi.2021.0201] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 04/21/2023]
Abstract
Abnormal blood pressure is strongly associated with risk of high-prevalence diseases, making the study of blood pressure a major public health challenge. Although biological mechanisms underlying hypertension at the single omic level have been discovered, multi-omics integrative analyses using continuous variations in blood pressure values remain limited. We used a multi-omics regression-based method, called sparse multi-block partial least square, for integrative, explanatory, and predictive purposes in the study of systolic and diastolic blood pressure values. Various datasets were obtained from the Finnish Twin Cohort for up to 444 twins. Blocks of omics data (transcriptomic, methylation, and metabolomic), as well as polygenic risk scores and clinical data, were integrated into the modeling and supported by cross-validation. The predictive contribution of each omics block when predicting blood pressure values was investigated using external participants from the Young Finns Study. In addition to revealing interesting inter-omics associations, we found that each block of omics heterogeneously improved the predictions of blood pressure values once the multi-omics data were integrated. The modeling revealed a plurality of clinical, transcriptomic, and metabolomic factors consistent with the literature and that play a leading role in explaining unit variations in blood pressure. These findings demonstrate (1) the robustness of our integrative method to harness results obtained by single omics discriminant analyses, and (2) the added value of predictive and exploratory gains of a multi-omics approach in studies of complex phenotypes such as blood pressure.
Affiliation(s)
- Gabin Drouard
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Address correspondence to: Gabin Drouard, MSc, Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Tukholmankatu 8, Helsinki 00014, Finland
- Miina Ollikainen
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Juha Mykkänen
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
  - Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- Olli Raitakari
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
  - Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
  - Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku, Finland
- Terho Lehtimäki
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center-Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Mika Kähönen
  - Department of Clinical Physiology, Tampere University Hospital, and Finnish Cardiovascular Research Center-Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Pashupati P. Mishra
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center-Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Xiaoling Wang
  - Georgia Prevention Institute (GPI), Medical College of Georgia, Augusta University, Augusta, Georgia, USA
- Jaakko Kaprio
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
21
Bayley JC, Hadley CC, Harmanci AO, Harmanci AS, Klisch TJ, Patel AJ. Multiple approaches converge on three biological subtypes of meningioma and extract new insights from published studies. Sci Adv 2022; 8:eabm6247. [PMID: 35108039 PMCID: PMC11313601 DOI: 10.1126/sciadv.abm6247] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 09/29/2021] [Accepted: 12/10/2021] [Indexed: 05/09/2023]
Abstract
One-fifth of meningiomas classified as benign by World Health Organization (WHO) histopathological grading will behave malignantly. To better diagnose these tumors, several groups turned to DNA methylation, whereas we combined RNA-sequencing (RNA-seq) and cytogenetics. Both approaches were more accurate than histopathology in identifying aggressive tumors, but whether they revealed similar tumor types was unclear. We therefore performed unbiased DNA methylation, RNA-seq, and cytogenetic profiling on 110 primary meningiomas (WHO grade I and II). Each technique distinguished the same three groups (two benign and one malignant) as our previous molecular classification; integrating these methods into one classifier further improved accuracy. Computational modeling revealed strong correlations between transcription and cytogenetic changes, particularly loss of chromosome 1p, in malignant tumors. Applying our classifier to data from previous studies also resolved certain anomalies entailed by grouping tumors by WHO grade. Accurate classification will therefore elucidate meningioma biology as well as improve diagnosis and prognosis.
Affiliation(s)
- James C. Bayley
  - Department of Neurosurgery, Baylor College of Medicine, Houston, TX 77030, USA
- Caroline C. Hadley
  - Department of Neurosurgery, Baylor College of Medicine, Houston, TX 77030, USA
- Arif O. Harmanci
  - Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX 77030, USA
- Akdes S. Harmanci
  - Department of Neurosurgery, Baylor College of Medicine, Houston, TX 77030, USA
- Tiemo J. Klisch
  - Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Akash J. Patel
  - Department of Neurosurgery, Baylor College of Medicine, Houston, TX 77030, USA
  - Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, TX 77030, USA
  - Department of Otolaryngology–Head and Neck Surgery, Baylor College of Medicine, Houston, TX 77030, USA
22
Nguyen H, Tran D, Tran B, Roy M, Cassell A, Dascalu S, Draghici S, Nguyen T. SMRT: Randomized Data Transformation for Cancer Subtyping and Big Data Analysis. Front Oncol 2021; 11:725133. [PMID: 34745946 PMCID: PMC8563705 DOI: 10.3389/fonc.2021.725133] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/14/2021] [Accepted: 09/28/2021] [Indexed: 12/25/2022] Open
Abstract
Cancer is an umbrella term that includes a range of disorders, from those that are fast-growing and lethal to indolent lesions with low or delayed potential for progression to death. The treatment options, as well as treatment success, are highly dependent on the correct subtyping of individual patients. With the advancement of high-throughput platforms, we have the opportunity to differentiate among cancer subtypes from a holistic perspective that takes into consideration phenomena at different molecular levels (mRNA, methylation, etc.). This demands powerful integrative methods to leverage large multi-omics datasets for a better subtyping. Here we introduce Subtyping Multi-omics using a Randomized Transformation (SMRT), a new method for multi-omics integration and cancer subtyping. SMRT offers the following advantages over existing approaches: (i) the scalable analysis pipeline allows researchers to integrate multi-omics data and analyze hundreds of thousands of samples in minutes, (ii) the ability to integrate data types with different numbers of patients, (iii) the ability to analyze un-matched data of different types, and (iv) the ability to offer users a convenient data analysis pipeline through a web application. We also improve the efficiency of our ensemble-based, perturbation clustering to support analysis on machines with memory constraints. In an extensive analysis, we compare SMRT with eight state-of-the-art subtyping methods using 37 TCGA and two METABRIC datasets comprising a total of almost 12,000 patient samples from 28 different types of cancer. We also performed a number of simulation studies. We demonstrate that SMRT outperforms other methods in identifying subtypes with significantly different survival profiles. In addition, SMRT is extremely fast, being able to analyze hundreds of thousands of samples in minutes. The web application is available at http://SMRT.tinnguyen-lab.com. 
The R package will be deposited to CRAN as part of our PINSPlus software suite.
Affiliation(s)
- Hung Nguyen
  - Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
- Duc Tran
  - Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
- Bang Tran
  - Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
- Monikrishna Roy
  - Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
- Adam Cassell
  - Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
- Sergiu Dascalu
  - Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
- Sorin Draghici
  - Department of Computer Science, Wayne State University, Detroit, MI, United States
- Tin Nguyen
  - Department of Computer Science and Engineering, University of Nevada Reno, Reno, NV, United States
23
Zheng X, Amos CI, Frost HR. Pan-cancer evaluation of gene expression and somatic alteration data for cancer prognosis prediction. BMC Cancer 2021; 21:1053. [PMID: 34563154 PMCID: PMC8467202 DOI: 10.1186/s12885-021-08796-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/09/2021] [Accepted: 08/16/2021] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Over the past decades, approaches for diagnosing and treating cancer have seen significant improvement. However, the variability of patient and tumor characteristics has limited progress on methods for prognosis prediction. The development of high-throughput omics technologies now provides multiple approaches for characterizing tumors. Although a large number of published studies have focused on integration of multi-omics data and use of pathway-level models for cancer prognosis prediction, there still exists a gap of knowledge regarding the prognostic landscape across multi-omics data for multiple cancer types using both gene-level and pathway-level predictors. METHODS In this study, we systematically evaluated three often available types of omics data (gene expression, copy number variation and somatic point mutation) covering both DNA-level and RNA-level features. We evaluated the landscape of predictive performance of these three omics modalities for 33 cancer types in the TCGA using a Lasso or Group Lasso-penalized Cox model and either gene or pathway level predictors. RESULTS We constructed the prognostic landscape using three types of omics data for 33 cancer types on both the gene and pathway levels. Based on this landscape, we found that predictive performance is cancer type dependent and we also highlighted the cancer types and omics modalities that support the most accurate prognostic models. In general, models estimated on gene expression data provide the best predictive performance on either gene or pathway level and adding copy number variation or somatic point mutation data to gene expression data does not improve predictive performance, with some exceptional cohorts including low grade glioma and thyroid cancer. In general, pathway-level models have better interpretative performance, higher stability and smaller model size across multiple cancer types and omics data types relative to gene-level models. 
CONCLUSIONS Based on this landscape and comprehensive comparison, models estimated on gene expression data provide the best predictive performance on either gene or pathway level. Pathway-level models have better interpretative performance, higher stability and smaller model size relative to gene-level models.
Affiliation(s)
- Xingyu Zheng
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Christopher I Amos
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
  - Department of Medicine, Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- H Robert Frost
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
24
Eigenvector-based sparse canonical correlation analysis: Fast computation for estimation of multiple canonical vectors. J Multivar Anal 2021. [DOI: 10.1016/j.jmva.2021.104781] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
25
Vahabi N, McDonough CW, Desai AA, Cavallari LH, Duarte JD, Michailidis G. Cox-sMBPLS: An Algorithm for Disease Survival Prediction and Multi-Omics Module Discovery Incorporating Cis-Regulatory Quantitative Effects. Front Genet 2021; 12:701405. [PMID: 34408773 PMCID: PMC8366414 DOI: 10.3389/fgene.2021.701405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2021] [Accepted: 07/07/2021] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND The development of high-throughput techniques has enabled profiling a large number of biomolecules across a number of molecular compartments. The challenge then becomes to integrate such multimodal Omics data to gain insights into biological processes and disease onset and progression mechanisms. Further, given the high dimensionality of such data, incorporating prior biological information on interactions between molecular compartments when developing statistical models for data integration is beneficial, especially in settings involving a small number of samples. RESULTS We develop a supervised model for time to event data (e.g., death, biochemical recurrence) that simultaneously accounts for redundant information within Omics profiles and leverages prior biological associations between them through a multi-block PLS framework. The interactions between data from different molecular compartments (e.g., epigenome, transcriptome, methylome, etc.) were captured by using cis-regulatory quantitative effects in the proposed model. The model, coined Cox-sMBPLS, exhibits superior prediction performance and improved feature selection based on both simulation studies and analysis of data from heart failure patients. CONCLUSION The proposed supervised Cox-sMBPLS model can effectively incorporate prior biological information in the survival prediction system, leading to improved prediction performance and feature selection. It also enables the identification of multi-Omics modules of biomolecules that impact the patients’ survival probability and also provides insights into potential relevant risk factors that merit further investigation.
Affiliation(s)
- Nasim Vahabi
  - Informatics Institute, University of Florida, Gainesville, FL, United States
- Caitrin W McDonough
  - Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, University of Florida, Gainesville, FL, United States
- Ankit A Desai
  - Department of Medicine, Indiana University, Indianapolis, IN, United States
- Larisa H Cavallari
  - Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, University of Florida, Gainesville, FL, United States
- Julio D Duarte
  - Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, University of Florida, Gainesville, FL, United States
- George Michailidis
  - Informatics Institute, University of Florida, Gainesville, FL, United States
Collapse
|
26
|
Duan R, Gao L, Gao Y, Hu Y, Xu H, Huang M, Song K, Wang H, Dong Y, Jiang C, Zhang C, Jia S. Evaluation and comparison of multi-omics data integration methods for cancer subtyping. PLoS Comput Biol 2021; 17:e1009224. [PMID: 34383739 PMCID: PMC8384175 DOI: 10.1371/journal.pcbi.1009224] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 02/27/2021] [Revised: 08/24/2021] [Accepted: 06/28/2021] [Indexed: 11/18/2022] Open
Abstract
Computational integrative analysis has become a significant approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating these methods has become a complicated problem due to the lack of gold standards. Moreover, questions of practical importance remain to be addressed regarding the impact of selecting appropriate data types and combinations on the performance of integrative studies. Here, we constructed three classes of benchmarking datasets of nine cancers in TCGA by considering all eleven combinations of four multi-omics data types. Using these datasets, we conducted a comprehensive evaluation of ten representative integration methods for cancer subtyping in terms of accuracy (measured by combining clustering accuracy and clinical significance), robustness, and computational efficiency. We subsequently investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Refuting the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. Our analyses also suggested several effective combinations for most of the cancers studied, which may be of particular interest to researchers in omics data analysis.
Cancer is one of the most heterogeneous diseases, characterized by diverse morphological, phenotypic, and genomic profiles between tumors and their subtypes. Identifying cancer subtypes can help patients receive precise treatments. With the development of high-throughput technologies, genomics, epigenomics, and transcriptomics data have been generated for large cancer patient cohorts. It is believed that the more omics data we use, the more accurate the identification of cancer subtypes will be. To examine this assumption, we first constructed three classes of benchmarking datasets to conduct a comprehensive evaluation and comparison of ten representative multi-omics data integration methods for cancer subtyping by considering their accuracy, robustness, and computational efficiency. Then, we investigated the influence of different omics data and their various combinations on the effectiveness of cancer subtyping. Our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. We hope that our work may help researchers choose a proper method and an effective data combination when identifying cancer subtypes using data integration methods.
Affiliation(s)
- Ran Duan
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Yong Gao
  - Department of Computer Science, The University of British Columbia Okanagan, Kelowna, British Columbia, Canada
- Yuxuan Hu
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Han Xu
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Mingfeng Huang
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Kuo Song
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Hongda Wang
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Yongqiang Dong
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Chaoqun Jiang
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Chenxing Zhang
  - School of Computer Science and Technology, Xidian University, Xi’an, China
- Songwei Jia
  - School of Computer Science and Technology, Xidian University, Xi’an, China
|
27
|
Kosvyra A, Ntzioni E, Chouvarda I. Network analysis with biological data of cancer patients: A scoping review. J Biomed Inform 2021; 120:103873. [PMID: 34298154 DOI: 10.1016/j.jbi.2021.103873] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/08/2020] [Revised: 06/30/2021] [Accepted: 07/18/2021] [Indexed: 12/25/2022]
Abstract
BACKGROUND & OBJECTIVE: Network Analysis (NA) is a mathematical method that allows exploring relations between units and representing them as a graph. Although NA was initially associated with the social sciences, over the past two decades it has been introduced into bioinformatics. The recent growth in the use of networks in biological data analysis reveals the need to further investigate this area. In this work, we attempt to identify the use of NA with biological data, and specifically: (a) what types of data are used and whether they are integrated or not, (b) what the purpose of the analysis is, predictive or descriptive, and (c) the outcome of such analyses, specifically in cancer diseases.
METHODS & MATERIALS: The literature review was conducted on two databases, PubMed and IEEE, and was restricted to journal articles of the last decade (January 2010 - December 2019). At a first level, all articles were screened by title and abstract; at a second level, the screening was conducted by reading the full-text article, following the predefined inclusion and exclusion criteria, leading to 131 articles of interest. A table was created with the information of interest and was used for the classification of the articles. The articles were initially classified into analysis studies and studies that propose a new algorithm or methodology. Each of these categories was further screened by the following clustering criteria: (a) data used, (b) study purpose, (c) study outcome. Specifically for the studies proposing a new algorithm, the novelty presented in each one was identified.
RESULTS & CONCLUSIONS: In the past five years, researchers have been focusing on creating new algorithms and methodologies to enhance this field. The articles' classification revealed that only 25% of the analyses integrate multi-omics data, although 50% of the newly developed algorithms follow this integrative direction. Moreover, only 20% of the analyses and 10% of the newly developed methodologies have a predictive purpose. Regarding the results of the works reviewed, 75% of the studies focus on identifying gene signatures, prognostic or not. In conclusion, this review revealed the need for deploying predictive and multi-omics integrative algorithms and methodologies that can be used to enhance cancer diagnosis, prognosis and treatment.
Affiliation(s)
- A Kosvyra
  - Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- E Ntzioni
  - Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- I Chouvarda
  - Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
|
28
|
Akbarzadeh M, Dehkordi SR, Roudbar MA, Sargolzaei M, Guity K, Sedaghati-Khayat B, Riahi P, Azizi F, Daneshpour MS. GWAS findings improved genomic prediction accuracy of lipid profile traits: Tehran Cardiometabolic Genetic Study. Sci Rep 2021; 11:5780. [PMID: 33707626 PMCID: PMC7952573 DOI: 10.1038/s41598-021-85203-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/05/2020] [Accepted: 02/26/2021] [Indexed: 12/15/2022] Open
Abstract
In recent decades, ongoing GWAS findings have led to novel therapeutic modifications, whole-genome risk prediction in particular. Here, we proposed a method that integrates the traditional genomic best linear unbiased prediction (gBLUP) approach with GWAS information to boost genetic prediction accuracy and gene-based heritability estimation. This study was conducted in the framework of the Tehran Cardio-metabolic Genetic Study (TCGS), containing 14,827 individuals and 649,932 SNP markers. Five SNP subsets were selected based on GWAS results: the top 1%, 5%, 10%, and 50% significant SNPs, and SNPs reported as associated in previous studies. Furthermore, we randomly selected SNP subsets of the same size as each of the five subsets. Prediction accuracy for lipid profile traits was assessed with the gBLUP method using tenfold cross-validation with 10 repeats. Our results revealed that genetic prediction based on selected subsets of SNPs obtained from the dataset outperformed prediction based on previously reported SNPs. The selected SNP subsets gave more precise predictions than the full SNP set and much more precise predictions than randomly selected SNPs. Also, the common SNPs that captured the most prediction accuracy in the selected sets captured the highest gene-based heritability. However, it should be kept in mind that a small number of SNPs obtained from GWAS results can capture a notable proportion of the variance and prediction accuracy.
Affiliation(s)
- Mahdi Akbarzadeh
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, POBox: 19195-4763, Tehran, Iran
- Saeid Rasekhi Dehkordi
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, POBox: 19195-4763, Tehran, Iran
- Mahmoud Amiri Roudbar
  - Department of Animal Science, Safiabad-Dezful Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education & Extension Organization (AREEO), Dezful, Iran
- Mehdi Sargolzaei
  - Department of Pathobiology, Ontario Veterinary College, University of Guelph, Guelph, Canada
  - Select Sires Inc., Plain City, USA
- Kamran Guity
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, POBox: 19195-4763, Tehran, Iran
- Bahareh Sedaghati-Khayat
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, POBox: 19195-4763, Tehran, Iran
- Parisa Riahi
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, POBox: 19195-4763, Tehran, Iran
- Fereidoun Azizi
  - Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Maryam S Daneshpour
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, POBox: 19195-4763, Tehran, Iran
|
29
|
Vlachavas EI, Bohn J, Ückert F, Nürnberg S. A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research. Int J Mol Sci 2021; 22:2822. [PMID: 33802234 PMCID: PMC8000236 DOI: 10.3390/ijms22062822] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/31/2021] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023] Open
Abstract
Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.
Affiliation(s)
- Efstathios Iason Vlachavas
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Jonas Bohn
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Frank Ückert
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
- Sylvia Nürnberg
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
|
30
|
Tan K, Huang W, Hu J, Dong S. A multi-omics supervised autoencoder for pan-cancer clinical outcome endpoints prediction. BMC Med Inform Decis Mak 2020; 20:129. [PMID: 32646413 PMCID: PMC7477832 DOI: 10.1186/s12911-020-1114-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] Open
Abstract
Background: With the rapid development of sequencing technologies, collecting diverse types of cancer omics data has become more cost-effective. Many computational methods have attempted to represent and fuse multiple omics into a comprehensive view of cancer. However, different types of omics are related and heterogeneous. Most of the existing methods do not consider the differences between omics, so the biological knowledge in individual omics may not be fully exploited. For a given task (e.g. predicting overall survival), these methods tend to use sample similarity or domain knowledge to learn a more reasonable representation of omics, but this is not enough.
Methods: To learn more useful representations for individual omics and fuse them to improve predictive ability, we proposed an autoencoder-based method named MOSAE (Multi-omics Supervised Autoencoder). In our method, a specific autoencoder was designed for each omics type according to its dimensionality to generate omics-specific representations. Then, a supervised autoencoder was constructed on top of each specific autoencoder, using labels to force it to learn both omics-specific and task-specific representations. Finally, the representations of different omics generated by the supervised autoencoders were fused in a traditional but powerful way, and the fused representation was used for subsequent predictive tasks.
Results: We applied our method to the TCGA Pan-Cancer dataset to predict four different clinical outcome endpoints (OS, PFI, DFI, and DSS). Compared with traditional and state-of-the-art methods, MOSAE achieved better predictive performance. We also tested the effect of each improvement; all had a positive effect on predictive performance.
Conclusions: Predicting clinical outcome endpoints is very important for precision medicine and personalized medicine, and multi-omics fusion is an effective way to address this problem. MOSAE is a powerful multi-omics fusion method that can generate both omics-specific and task-specific representations for given endpoint predictive tasks and improve predictive performance.
Affiliation(s)
- Kaiwen Tan
  - Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road, Guangzhou, 381, China
- Weixian Huang
  - Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road, Guangzhou, 381, China
- Jinlong Hu
  - Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road, Guangzhou, 381, China
- Shoubin Dong
  - Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road, Guangzhou, 381, China
|
31
|
Huang J, Chen J, Zhang B, Zhu L, Cai H. Evaluation of gene-drug common module identification methods using pharmacogenomics data. Brief Bioinform 2020; 22:5860683. [PMID: 32591780 DOI: 10.1093/bib/bbaa087] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/30/2020] [Revised: 04/06/2020] [Accepted: 04/23/2020] [Indexed: 01/21/2023] Open
Abstract
Accurately identifying the interactions between genomic factors and the response to cancer drugs plays an important role in drug discovery, drug repositioning and cancer treatment. A number of studies have revealed that interactions between genes and drugs are 'many-genes-to-many-drugs' interactions, i.e. common modules, as opposed to 'one-gene-to-one-drug' interactions. Such modules fully explain the interactions between complex biological regulatory mechanisms and cancer drugs. However, strategies for effectively and robustly identifying the underlying common modules in pharmacogenomics data remain to be improved. In this paper, we aim to provide a detailed evaluation of three categories of state-of-the-art common module identification techniques from a machine learning perspective, including non-negative matrix factorization (NMF), partial least squares (PLS) and network analyses. We first evaluate the performance of six methods, namely SNMNMF, NetNMF, SNPLS, O2PLS, NSBM and HOGMMNC, using two series of simulated data sets with different noise levels and outlier ratios. Then, we conduct experiments using a real-world data set of 2091 genes and 101 drugs in 392 cancer cell lines and compare the experimental results in terms of biological process term enrichment, gene-drug interactions and drug-drug interactions. Finally, we present interesting findings from our evaluation study and discuss the advantages and drawbacks of each method. Supplementary information: Supplementary file is available at Briefings in Bioinformatics online.
Affiliation(s)
- Jie Huang
  - South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
- Jiazhou Chen
  - South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
- Bin Zhang
  - South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
- Lei Zhu
  - South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
- Hongmin Cai
  - South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
|
32
|
Chen T, Tyagi S. Integrative computational epigenomics to build data-driven gene regulation hypotheses. Gigascience 2020; 9:giaa064. [PMID: 32543653 PMCID: PMC7297091 DOI: 10.1093/gigascience/giaa064] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/25/2020] [Revised: 05/25/2020] [Accepted: 05/26/2020] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Diseases are complex phenotypes often arising as an emergent property of a non-linear network of genetic and epigenetic interactions. To translate this resulting state into a causal relationship with a subset of regulatory features, many experiments deploy an array of laboratory assays from multiple modalities. Often, each of these resulting datasets is large, heterogeneous, and noisy. Thus, it is non-trivial to unify these complex datasets into an interpretable phenotype. Although recent methods address this problem with varying degrees of success, they are constrained by their scopes or limitations. Therefore, an important gap in the field is the lack of a universal data harmonizer with the capability to arbitrarily integrate multi-modal datasets.
RESULTS: In this review, we perform a critical analysis of methods with the explicit aim of harmonizing data, as opposed to case-specific integration. This revealed that matrix factorization, latent variable analysis, and deep learning are potent strategies. Finally, we describe the properties of an ideal universal data harmonization framework.
CONCLUSIONS: A sufficiently advanced universal harmonizer has major medical implications: (1) identifying dysregulated biological pathways responsible for a disease is a powerful diagnostic tool; (2) investigating these pathways further allows the biological community to better understand a disease's mechanisms; and (3) precision medicine also benefits from developments in this area, particularly in the context of the growing field of selective epigenome editing, which can suppress or induce a desired phenotype.
Affiliation(s)
- Tyrone Chen
  - 25 Rainforest Walk, School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
- Sonika Tyagi
  - 25 Rainforest Walk, School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
|
33
|
Ochoa S, de Anda-Jáuregui G, Hernández-Lemus E. Multi-Omic Regulation of the PAM50 Gene Signature in Breast Cancer Molecular Subtypes. Front Oncol 2020; 10:845. [PMID: 32528899 PMCID: PMC7259379 DOI: 10.3389/fonc.2020.00845] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 12/01/2019] [Accepted: 04/29/2020] [Indexed: 12/24/2022] Open
Abstract
Breast cancer is a disease that exhibits heterogeneity from the genomic to the clinical level. This heterogeneity is thought to be captured (at least partially) by the so-called breast cancer molecular subtypes. These molecular subtypes were initially defined based on the unsupervised clustering of gene expression and its correlation with already known histological, morphological, phenotypic and clinical features. Later, a 50-gene signature, PAM50, was defined in order to identify the biological subtype of a given sample within the clinical setting. The PAM50 signature was obtained by the use of unsupervised statistical methods, and therefore no limitation was set on the biological relevance (or lack thereof) of the selected genes beyond their predictive capacity. An open question that remains is which regulatory elements drive the various expression behaviors of this set of genes in the different molecular subtypes. This question becomes more relevant as the measurement of more biological layers of regulation becomes accessible. In this work, we analyzed the gene expression regulation of the 50 genes in the PAM50 signature, in terms of (a) gene co-expression, (b) transcription factors, (c) micro-RNAs, and (d) methylation. Using data from The Cancer Genome Atlas (TCGA) for the Luminal A and B, Basal, and HER2-enriched molecular subtypes as well as normal tumor-adjacent tissue, we identified predictors for gene expression through the use of an elastic net model. We compare and contrast the sets of identified regulators for the gene signature in each molecular subtype, and systematically compare them to the current literature. We also identified a unique set of predictors for the expression of genes in the PAM50 signature associated with each of the molecular subtypes. Most selected predictors are exclusive to a single PAM50 gene, and predictors are not shared across subtypes. There are only 13 coding transcripts and 2 miRNAs selected for the four subtypes. MiR-21 and miR-10b connect almost all the PAM50 genes in all the subtypes and normal tissue, but do so in an exclusive manner, suggesting a cancer switch from miR-10b coordination in normal tissue to miR-21. The PAM50 gene sets of selected predictors that enrich for a function across subtypes support the view that different regulatory molecular mechanisms are taking place. With this study we aim at a wider understanding of the regulatory mechanisms that differentiate the expression of the PAM50 signature, which in turn could perhaps help understand the molecular basis of the differences between the molecular subtypes.
Affiliation(s)
- Soledad Ochoa
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Graduate Program in Biomedical Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Guillermo de Anda-Jáuregui
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Cátedras Conacyt para Jóvenes Investigadores, National Council on Science and Technology, Mexico City, Mexico
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
|
34
|
Tini G, Marchetti L, Priami C, Scott-Boyer MP. Multi-omics integration-a comparison of unsupervised clustering methodologies. Brief Bioinform 2020; 20:1269-1279. [PMID: 29272335 DOI: 10.1093/bib/bbx167] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 09/02/2017] [Revised: 11/06/2017] [Indexed: 12/19/2022] Open
Abstract
With the recent developments in the field of multi-omics integration, interest in factors such as data preprocessing, the choice of integration method and the number of different omics considered has increased. In this work, the impact of these factors on the problem of sample classification is explored by comparing the performance of five unsupervised algorithms: Multiple Canonical Correlation Analysis, Multiple Co-Inertia Analysis, Multiple Factor Analysis, Joint and Individual Variation Explained and Similarity Network Fusion. These methods were applied to three real data sets taken from the literature and to several ad hoc simulated scenarios to discuss classification performance under different conditions of noise and signal strength across the data types. The impact of experimental design, feature selection and parameter training has also been evaluated to identify important conditions that can affect the accuracy of the result.
|
35
|
Chen J, Han G, Xu A, Cai H. Identification of Multidimensional Regulatory Modules Through Multi-Graph Matching With Network Constraints. IEEE Trans Biomed Eng 2020; 67:987-998. [DOI: 10.1109/tbme.2019.2927157] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022]
|
36
|
Oh M, Park S, Kim S, Chae H. Machine learning-based analysis of multi-omics data on the cloud for investigating gene regulations. Brief Bioinform 2020; 22:66-76. [PMID: 32227074 DOI: 10.1093/bib/bbaa032] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/21/2019] [Revised: 02/05/2020] [Accepted: 02/25/2020] [Indexed: 02/06/2023] Open
Abstract
Gene expressions are subtly regulated by quantifiable measures of genetic molecules such as interaction with other genes, methylation, mutations, transcription factor and histone modifications. Integrative analysis of multi-omics data can help scientists understand the condition or patient-specific gene regulation mechanisms. However, analysis of multi-omics data is challenging since it requires not only the analysis of multiple omics data sets but also mining complex relations among different genetic molecules by using state-of-the-art machine learning methods. In addition, analysis of multi-omics data needs quite large computing infrastructure. Moreover, interpretation of the analysis results requires collaboration among many scientists, often requiring reperforming analysis from different perspectives. Many of the aforementioned technical issues can be nicely handled when machine learning tools are deployed on the cloud. In this survey article, we first survey machine learning methods that can be used for gene regulation study, and we categorize them according to five different goals: gene regulatory subnetwork discovery, disease subtype analysis, survival analysis, clinical prediction and visualization. We also summarize the methods in terms of multi-omics input types. Then, we explain why the cloud is potentially a good solution for the analysis of multi-omics data, followed by a survey of two state-of-the-art cloud systems, Galaxy and BioVLAB. Finally, we discuss important issues when the cloud is used for the analysis of multi-omics data for the gene regulation study.
Collapse
Affiliation(s)
- Minsik Oh
- Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Korea
| | - Sungjoon Park
- Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Korea
| | - Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Korea; Bioinformatics Institute, Seoul National University, Seoul, 08826, Korea
| | - Heejoon Chae
- Division of Computer Science, Sookmyung Women's University, Seoul, 04310, Korea
|
37
|
Mallik S, Zhao Z. Graph- and rule-based learning algorithms: a comprehensive review of their applications for cancer type classification and prognosis using genomic data. Brief Bioinform 2020; 21:368-394. [PMID: 30649169 PMCID: PMC7373185 DOI: 10.1093/bib/bby120] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/27/2018] [Revised: 10/26/2018] [Accepted: 11/21/2018] [Indexed: 12/20/2022]
Abstract
Cancer is well recognized as a complex disease with dysregulated molecular networks or modules. Graph- and rule-based analytics have been applied extensively for cancer classification as well as prognosis using large genomic and other data over the past decade. This article provides a comprehensive review of various graph- and rule-based machine learning algorithms that have been applied to numerous genomics data to determine the cancer-specific gene modules, identify gene signature-based classifiers and carry out other related objectives of potential therapeutic value. This review focuses mainly on the methodological design and features of these algorithms to facilitate the application of these graph- and rule-based analytical approaches for cancer classification and prognosis. Based on the type of data integration, we divided all the algorithms into three categories: model-based integration, pre-processing integration and post-processing integration. Each category is further divided into four sub-categories (supervised, unsupervised, semi-supervised and survival-driven learning analyses) based on learning style. Therefore, a total of 11 categories of methods are summarized with their inputs, objectives and description, advantages and potential limitations. Next, we briefly demonstrate well-known and most recently developed algorithms for each sub-category along with salient information, such as data profiles, statistical or feature selection methods and outputs. Finally, we summarize the appropriate use and efficiency of all categories of graph- and rule mining-based learning methods when input data and specific objective are given. This review aims to help readers to select and use the appropriate algorithms for cancer classification and prognosis study.
Affiliation(s)
- Saurav Mallik
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
| | - Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
|
38
|
Amiri Roudbar M, Mohammadabadi MR, Ayatollahi Mehrgardi A, Abdollahi-Arpanahi R, Momen M, Morota G, Brito Lopes F, Gianola D, Rosa GJM. Integration of single nucleotide variants and whole-genome DNA methylation profiles for classification of rheumatoid arthritis cases from controls. Heredity (Edinb) 2020; 124:658-674. [PMID: 32127659 DOI: 10.1038/s41437-020-0301-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/27/2019] [Revised: 02/17/2020] [Accepted: 02/17/2020] [Indexed: 12/16/2022]
Abstract
This study evaluated the use of multiomics data for classification accuracy of rheumatoid arthritis (RA). Three approaches were used and compared in terms of prediction accuracy: (1) whole-genome prediction (WGP) using SNP marker information only, (2) whole-methylome prediction (WMP) using methylation profiles only, and (3) whole-genome/methylome prediction (WGMP) combining both omics layers. The number of SNPs and of methylation sites varied in each scenario, with either 1, 10, or 50% of these preselected based on four approaches: randomly, evenly spaced, lowest p value (genome-wide association or epigenome-wide association study), and estimated effect size using a Bayesian ridge regression (BRR) model. To remove effects of high levels of pairwise linkage disequilibrium (LD), SNPs were also preselected with an LD-pruning method. Five Bayesian regression models were studied for classification, including BRR, Bayes-A, Bayes-B, Bayes-C, and the Bayesian LASSO. Adjusting methylation profiles for cellular heterogeneity within whole blood samples had a detrimental effect on the classification ability of the models. Overall, WGMP using the Bayes-B model had the best performance. In particular, selecting SNPs based on LD-pruning with 1% of the methylation sites selected based on BRR included in the model, and fitting the most significant SNP as a fixed effect, was the best method for predicting disease risk, with a classification accuracy of 0.975. Our results showed that multiomics data can be used to effectively predict the risk of RA and identify cases in early stages to prevent or alter disease progression via appropriate interventions.
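The preselection step in this abstract (keeping 1, 10, or 50% of markers chosen randomly, evenly spaced, or by lowest p value) can be illustrated with a minimal sketch. The function names and the rounding rule below are assumptions made for illustration; the abstract does not give the authors' implementation, and the effect-size (BRR) and LD-pruning variants are not shown.

```python
import numpy as np


def _n_keep(n_features: int, fraction: float) -> int:
    # Hypothetical rounding rule: keep at least one feature.
    return max(1, int(round(n_features * fraction)))


def preselect_random(n_features: int, fraction: float, seed: int = 0) -> np.ndarray:
    """Pick a random subset of feature indices without replacement."""
    rng = np.random.default_rng(seed)
    k = _n_keep(n_features, fraction)
    return np.sort(rng.choice(n_features, size=k, replace=False))


def preselect_evenly_spaced(n_features: int, fraction: float) -> np.ndarray:
    """Pick feature indices at (approximately) equal intervals along the index axis."""
    k = _n_keep(n_features, fraction)
    positions = np.linspace(0, n_features - 1, num=k)
    return np.unique(positions.round().astype(int))


def preselect_lowest_p(p_values: np.ndarray, fraction: float) -> np.ndarray:
    """Pick the features with the smallest association p values."""
    k = _n_keep(len(p_values), fraction)
    return np.sort(np.argsort(p_values)[:k])
```

Each function returns column indices, so a marker matrix `X` (samples by markers) would be reduced with `X[:, idx]` before fitting a classifier.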
Affiliation(s)
- Mahmoud Amiri Roudbar
- Department of Animal Science, Safiabad-Dezful Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education & Extension Organization (AREEO), Dezful, Iran.
| | - Mohammad Reza Mohammadabadi
- Department of Animal Science, College of Agriculture, Shahid Bahonar University of Kerman, 76169-133, Kerman, Iran
| | - Ahmad Ayatollahi Mehrgardi
- Department of Animal Science, College of Agriculture, Shahid Bahonar University of Kerman, 76169-133, Kerman, Iran
| | - Rostam Abdollahi-Arpanahi
- Department of Animal and Poultry Science, College of Aburaihan, University of Tehran, 465, Pakdasht, Tehran, Iran
| | - Mehdi Momen
- Department of Surgical Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Gota Morota
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
| | - Fernando Brito Lopes
- Department of Animal Sciences, Sao Paulo State University, Julio de Mesquita Filho (UNESP), Prof. Paulo Donato Castelane, Jaboticabal, SP, 14884-900, Brazil
| | - Daniel Gianola
- Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53792, USA
| | - Guilherme J M Rosa
- Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53792, USA
|
39
|
Subramanian I, Verma S, Kumar S, Jere A, Anamika K. Multi-omics Data Integration, Interpretation, and Its Application. Bioinform Biol Insights 2020; 14:1177932219899051. [PMID: 32076369 PMCID: PMC7003173 DOI: 10.1177/1177932219899051] [Citation(s) in RCA: 649] [Impact Index Per Article: 129.8] [Received: 11/07/2019] [Accepted: 11/09/2019] [Indexed: 12/22/2022]
Abstract
To study complex biological processes holistically, it is imperative to take an integrative approach that combines multi-omics data to highlight the interrelationships of the involved biomolecules and their functions. With the advent of high-throughput techniques and availability of multi-omics data generated from a large set of samples, several promising tools and methods have been developed for data integration and interpretation. In this review, we collected the tools and methods that adopt an integrative approach to analyze multiple omics data and summarized their ability to address applications such as disease subtyping, biomarker prediction, and deriving insights into the data. We provide the methodology, use-cases, and limitations of these tools; a brief account of multi-omics data repositories and visualization portals; and challenges associated with multi-omics data integration.
Affiliation(s)
- Abhay Jere
- Innovation Cell, Ministry of Human Resource Development, New Delhi, India
|
40
|
Csala A, Zwinderman AH, Hof MH. Multiset sparse partial least squares path modeling for high dimensional omics data analysis. BMC Bioinformatics 2020; 21:9. [PMID: 31918677 PMCID: PMC6953292 DOI: 10.1186/s12859-019-3286-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/05/2019] [Accepted: 11/20/2019] [Indexed: 12/29/2022]
Abstract
BACKGROUND Recent technological developments have enabled the measurement of a plethora of biomolecular data from various omics domains, and research is ongoing on statistical methods to leverage these omics data to better model and understand biological pathways and genetic architectures of complex phenotypes. Current reviews report that the simultaneous analysis of multiple (i.e. three or more) high dimensional omics data sources is still challenging and suitable statistical methods are unavailable. Often mentioned challenges are the lack of accounting for the hierarchical structure between omics domains and the difficulty of interpretation of genomewide results. This study is motivated to address these challenges. We propose multiset sparse Partial Least Squares path modeling (msPLS), a generalized penalized form of Partial Least Squares path modeling, for the simultaneous modeling of biological pathways across multiple omics domains. msPLS simultaneously models the effect of multiple molecular markers, from multiple omics domains, on the variation of multiple phenotypic variables, while accounting for the relationships between data sources, and provides sparse results. The sparsity in the model helps to provide interpretable results from analyses of hundreds of thousands of biomolecular variables. RESULTS With simulation studies, we quantified the ability of msPLS to discover associated variables among high dimensional data sources. Furthermore, we analysed high dimensional omics datasets to explore biological pathways associated with Marfan syndrome and with Chronic Lymphocytic Leukaemia. Additionally, we compared the results of msPLS to the results of Multi-Omics Factor Analysis (MOFA), which is an alternative method to analyse this type of data. CONCLUSIONS msPLS is a multiset multivariate method for the integrative analysis of multiple high dimensional omics data sources. It accounts for the relationship between multiple high dimensional data sources while it provides interpretable results through its sparse solutions. The biomarkers found by msPLS in the omics datasets can be interpreted in terms of biological pathways associated with the pathophysiology of Marfan syndrome and of Chronic Lymphocytic Leukaemia. Additionally, msPLS outperforms MOFA in terms of variation explained in the Chronic Lymphocytic Leukaemia dataset while it identifies the two most important clinical markers for Chronic Lymphocytic Leukaemia. AVAILABILITY: http://uva.csala.me/mspls and https://github.com/acsala/2018_msPLS.
Affiliation(s)
- Attila Csala
- Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Amsterdam, 1105 AZ The Netherlands
| | - Aeilko H. Zwinderman
- Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Amsterdam, 1105 AZ The Netherlands
| | - Michel H. Hof
- Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Amsterdam, 1105 AZ The Netherlands
|
41
|
Shao W, Han Z, Cheng J, Cheng L, Wang T, Sun L, Lu Z, Zhang J, Zhang D, Huang K. Integrative Analysis of Pathological Images and Multi-Dimensional Genomic Data for Early-Stage Cancer Prognosis. IEEE Trans Med Imaging 2020; 39:99-110. [PMID: 31170067 DOI: 10.1109/tmi.2019.2920608] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Indexed: 05/11/2023]
Abstract
The integrative analysis of histopathological images and genomic data has received increasing attention for studying the complex mechanisms driving cancers. However, most image-genomic studies have been restricted to combining histopathological images with a single modality of genomic data (e.g., mRNA transcription or genetic mutation), and thus neglect the fact that the molecular architecture of cancer is manifested at multiple levels, including genetic, epigenetic, transcriptional, and post-transcriptional events. To address this issue, we propose a novel ordinal multi-modal feature selection (OMMFS) framework that can simultaneously identify important features from both pathological images and multi-modal genomic data (i.e., mRNA transcription, copy number variation, and DNA methylation data) for the prognosis of cancer patients. Our model is based on a generalized sparse canonical correlation analysis framework, by which we also take advantage of the ordinal survival information among different patients for survival outcome prediction. We evaluate our method on three early-stage cancer datasets derived from The Cancer Genome Atlas (TCGA) project, and the experimental results demonstrate that both the selected image and multi-modal genomic markers are strongly correlated with survival, enabling more effective stratification of patients with distinct survival than the compared methods, which is often difficult for early-stage cancer patients.
|
42
|
Hernández-Lemus E, Reyes-Gopar H, Espinal-Enríquez J, Ochoa S. The Many Faces of Gene Regulation in Cancer: A Computational Oncogenomics Outlook. Genes (Basel) 2019; 10:E865. [PMID: 31671657 PMCID: PMC6896122 DOI: 10.3390/genes10110865] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/31/2019] [Revised: 10/16/2019] [Accepted: 10/24/2019] [Indexed: 12/16/2022]
Abstract
Cancer is a complex disease at many different levels. The molecular phenomenology of cancer is also quite rich. The mutational and genomic origins of cancer and their downstream effects on processes such as the reprogramming of the gene regulatory control and the molecular pathways depending on such control have been recognized as central to the characterization of the disease. More important though is the understanding of their causes, prognosis, and therapeutics. There is a multitude of factors associated with anomalous control of gene expression in cancer. Many of these factors can now be studied comprehensively by means of experiments based on diverse omic technologies. However, characterizing each dimension of the phenomenon individually has proven to fall short in presenting a clear picture of expression regulation as a whole. In this review article, we discuss some of the more relevant factors affecting gene expression control, both under normal conditions and in tumor settings. We describe the different omic approaches that we can use as well as the computational genomic analysis needed to track down these factors. Then we present theoretical and computational frameworks developed to integrate the amount of diverse information provided by such single-omic analyses. We contextualize this within a systems biology-based multi-omic regulation setting, aimed at better understanding the complex interplay of gene expression deregulation in cancer.
Affiliation(s)
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico.
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
| | - Helena Reyes-Gopar
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico.
| | - Jesús Espinal-Enríquez
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico.
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
| | - Soledad Ochoa
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico.
|
43
|
Pfeffer M, Uschmajew A, Amaro A, Pfeffer U. Data Fusion Techniques for the Integration of Multi-Domain Genomic Data from Uveal Melanoma. Cancers (Basel) 2019; 11:cancers11101434. [PMID: 31561508 PMCID: PMC6826760 DOI: 10.3390/cancers11101434] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/20/2019] [Revised: 08/29/2019] [Accepted: 09/15/2019] [Indexed: 11/16/2022]
Abstract
Uveal melanoma (UM) is a rare cancer that is well characterized at the molecular level. Two to four classes have been identified by the analyses of gene expression (mRNA, ncRNA), DNA copy number, DNA-methylation and somatic mutations, yet no factual integration of these data has been reported. We therefore applied novel algorithms for data fusion, joint Singular Value Decomposition (jSVD) and joint Constrained Matrix Factorization (jCMF), as well as similarity network fusion (SNF), to integrate the gene expression, methylation and copy number data of the Cancer Genome Atlas (TCGA) UM dataset. Variant features that most strongly affect the definition of classes were extracted for biological interpretation of the classes. Data fusion allows for the identification of the two to four classes previously described. Not all of these classes are evident at all levels, indicating that integrative analyses add to genomic discrimination power. The classes are also characterized by different frequencies of somatic mutations in putative driver genes (GNAQ, GNA11, SF3B1, BAP1). Innovative data fusion techniques confirm, as expected, the existence of two main types of uveal melanoma mainly characterized by copy number alterations. Subtypes were also confirmed but are somewhat less defined. Data fusion allows for real integration of multi-domain genomic data.
Affiliation(s)
- Max Pfeffer
- Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany.
| | - André Uschmajew
- Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany.
| | - Adriana Amaro
- IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy.
| | - Ulrich Pfeffer
- IRCCS Ospedale Policlinico San Martino, 16132 Genova, Italy.
|
44
|
Singh A, Shannon CP, Gautier B, Rohart F, Vacher M, Tebbutt SJ, Lê Cao KA. DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics 2019; 35:3055-3062. [PMID: 30657866 PMCID: PMC6735831 DOI: 10.1093/bioinformatics/bty1054] [Citation(s) in RCA: 478] [Impact Index Per Article: 79.7] [Received: 05/29/2018] [Revised: 12/17/2018] [Accepted: 01/14/2019] [Indexed: 12/15/2022]
Abstract
MOTIVATION In the continuously expanding omics era, novel computational and statistical strategies are needed for data integration and identification of biomarkers and molecular signatures. We present Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO), a multi-omics integrative method that seeks common information across different data types through the selection of a subset of molecular features, while discriminating between multiple phenotypic groups. RESULTS Using simulations and benchmark multi-omics studies, we show that DIABLO identifies features with superior biological relevance compared with existing unsupervised integrative methods, while achieving predictive performance comparable to state-of-the-art supervised approaches. DIABLO is versatile, allowing for modular-based analyses and cross-over study designs. In two case studies, DIABLO identified both known and novel multi-omics biomarkers consisting of mRNAs, miRNAs, CpGs, proteins and metabolites. AVAILABILITY AND IMPLEMENTATION DIABLO is implemented in the mixOmics R Bioconductor package with functions for parameter choice and visualization to assist in the interpretation of the integrative analyses, along with tutorials on http://mixomics.org and in our Bioconductor vignette. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Amrit Singh
- Prevention of Organ Failure (PROOF) Centre of Excellence, University of British Columbia, Vancouver, BC, Canada
| | - Casey P Shannon
- Prevention of Organ Failure (PROOF) Centre of Excellence, University of British Columbia, Vancouver, BC, Canada
| | - Benoît Gautier
- The University of Queensland Diamantina Institute, Translational Research Institute, Woolloongabba, Queensland, Australia
| | - Florian Rohart
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, Queensland, Australia
| | - Michaël Vacher
- Australian eHealth Research Centre, Commonwealth Scientific and Industrial Research Organisation, Brisbane, Queensland, Australia
| | - Scott J Tebbutt
- Prevention of Organ Failure (PROOF) Centre of Excellence, University of British Columbia, Vancouver, BC, Canada
| | - Kim-Anh Lê Cao
- Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia
|
45
|
Shi Q, Hu B, Zeng T, Zhang C. Multi-view Subspace Clustering Analysis for Aggregating Multiple Heterogeneous Omics Data. Front Genet 2019; 10:744. [PMID: 31497031 PMCID: PMC6712585 DOI: 10.3389/fgene.2019.00744] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/21/2018] [Accepted: 07/16/2019] [Indexed: 12/18/2022]
Abstract
Integration of distinct biological data types could provide a comprehensive view of biological processes or complex diseases. The combinations of molecules responsible for different phenotypes form multiple embedded (expression) subspaces, so identifying the intrinsic data structure is challenging for regular integration methods. In this paper, we propose a novel framework of “Multi-view Subspace Clustering Analysis (MSCA),” which could measure the local similarities of samples in the same subspace and obtain the global consensus sample patterns (structures) for multiple data types, thereby comprehensively capturing the underlying heterogeneity of samples. Applied to various synthetic datasets, MSCA effectively recognizes the predefined sample patterns and is robust to data noise. Given a real biological dataset, i.e., Cancer Cell Line Encyclopedia (CCLE) data, MSCA successfully identifies cell clusters of common aberrations across cancer types. A remarkable superiority over the state-of-the-art methods, such as iClusterPlus, SNF, and ANF, has also been demonstrated in our simulation and case studies.
Affiliation(s)
- Qianqian Shi
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
| | - Bing Hu
- Department of Applied Mathematics, College of Science, Zhejiang University of Technology, Hangzhou, China
| | - Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institute of Biological Sciences, Chinese Academy of Sciences, Shanghai, China; Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, China
|
46
|
Chen J, Zhang S. Discovery of two-level modular organization from matched genomic data via joint matrix tri-factorization. Nucleic Acids Res 2019; 46:5967-5976. [PMID: 29878151 PMCID: PMC6158745 DOI: 10.1093/nar/gky440] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/29/2018] [Accepted: 05/08/2018] [Indexed: 12/22/2022]
Abstract
With the rapid development of biotechnology, multi-dimensional genomic data are available for us to study the regulatory associations among multiple levels. Thus, it is essential to develop a tool to identify not only the modular patterns from multiple levels, but also the relationships among these modules. In this study, we adopt a novel non-negative matrix factorization framework (NetNMF) to integrate pairwise genomic data in a network manner. NetNMF could reveal the modules of each dimension and the connections within and between both types of modules. We first demonstrated the effectiveness of NetNMF using a set of simulated data and compared it with two typical NMF methods. Further, we applied it to two different types of pairwise genomic datasets including microRNA (miRNA) and gene expression data from The Cancer Genome Atlas and gene expression and pharmacological data from the Cancer Genome Project. We respectively identified a two-level miRNA–gene module network and a two-level gene–drug module network. Not only do the majority of identified modules have significant functional implications, but the three types of module pairs also have close biological associations. This module discovery tool provides us with comprehensive insights into the mechanisms of how the two levels of molecules cooperate with each other.
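For readers unfamiliar with the building block behind this abstract, the following is a minimal sketch of plain non-negative matrix factorization with multiplicative updates. It is not NetNMF: the network terms and the tri-factorization named in the title are omitted, and the function name, iteration count, and random initialization are assumptions chosen for illustration.

```python
import numpy as np


def nmf(X: np.ndarray, rank: int, n_iter: int = 200, seed: int = 0, eps: float = 1e-9):
    """Factor a non-negative matrix X (n x m) as W (n x rank) @ H (rank x m).

    Uses the classic multiplicative update rules for the Frobenius-norm
    objective; `eps` guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Update H with W fixed, then W with H fixed; both stay non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(X - W @ H)))
    return W, H, errors
```

In a module-discovery setting, one simple reading is to assign each row of `X` to the factor with the largest entry in the corresponding row of `W`, for example `W.argmax(axis=1)`.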
Affiliation(s)
- Jinyu Chen
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
| | - Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
|
47
|
Rappoport N, Shamir R. Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res 2019; 46:10546-10562. [PMID: 30295871 PMCID: PMC6237755 DOI: 10.1093/nar/gky889] [Citation(s) in RCA: 259] [Impact Index Per Article: 43.2] [Received: 07/18/2018] [Accepted: 09/20/2018] [Indexed: 12/18/2022]
Abstract
Recent high throughput experimental methods have been used to collect large biomedical omics datasets. Clustering of single omic datasets has proven invaluable for biological and medical research. The decreasing cost and development of additional high throughput methods now enable measurement of multi-omic data. Clustering multi-omic data has the potential to reveal further systems-level insights, but raises computational and biological challenges. Here, we review algorithms for multi-omics clustering, and discuss key issues in applying these algorithms. Our review covers methods developed specifically for omic data as well as generic multi-view methods developed in the machine learning community for joint clustering of multiple data types. In addition, using cancer data from TCGA, we perform an extensive benchmark spanning ten different cancer types, providing the first systematic comparison of leading multi-omics and multi-view clustering algorithms. The results highlight key issues regarding the use of single- versus multi-omics, the choice of clustering strategy, the power of generic multi-view methods and the use of approximated p-values for gauging solution quality. Due to the growing use of multi-omics data, we expect these issues to be important for future progress in the field.
Affiliation(s)
- Nimrod Rappoport
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| | - Ron Shamir
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
|
48
|
Tang H, Zeng T, Chen L. High-Order Correlation Integration for Single-Cell or Bulk RNA-seq Data Analysis. Front Genet 2019; 10:371. [PMID: 31080457 PMCID: PMC6497731 DOI: 10.3389/fgene.2019.00371] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/21/2018] [Accepted: 04/09/2019] [Indexed: 12/19/2022]
Abstract
Quantifying or labeling the sample type with high quality is a challenging task, which is a key step for understanding complex diseases. Reducing noise in the data and ensuring that the extracted intrinsic patterns agree with the primary data structure are important in sample clustering and classification. Here we propose an effective data integration framework named HCI (High-order Correlation Integration), which takes advantage of a high-order correlation matrix incorporated with pattern fusion analysis (PFA) to realize high-dimensional data feature extraction. On the one hand, the high-order Pearson's correlation coefficient can highlight the latent patterns underlying noisy input datasets and thus improve the accuracy and robustness of the algorithms currently available for sample clustering. On the other hand, the PFA can identify intrinsic sample patterns efficiently from different input matrices by optimally adjusting the signal effects. To validate the effectiveness of our new method, we first applied HCI to four single-cell RNA-seq datasets to distinguish the cell types, and we found that HCI is capable of identifying the prior-known cell types of single-cell samples from scRNA-seq data with higher accuracy and robustness than other methods under different conditions. Second, we also integrated heterogeneous omics data from TCGA datasets and GEO datasets including bulk RNA-seq data, where HCI outperformed the other methods at identifying distinct cancer subtypes. Within an additional case study, we also constructed the mRNA-miRNA regulatory network of colorectal cancer based on the feature weight estimated from HCI, where the differentially expressed mRNAs and miRNAs were significantly enriched in well-known functional sets of colorectal cancer, such as KEGG pathways and IPA disease annotations. All these results supported that HCI has extensive flexibility and applicability in sample clustering with different types and organizations of RNA-seq data.
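One common reading of a "high-order" Pearson correlation is the correlation of correlation profiles: the first-order matrix compares samples by their features, and the second-order matrix compares samples by their first-order correlation vectors. The sketch below assumes that reading; the abstract does not state that HCI computes it exactly this way, and the function name and the samples-by-features layout are illustrative assumptions. The pattern fusion analysis step is not shown.

```python
import numpy as np


def high_order_correlation(X: np.ndarray, order: int = 2) -> np.ndarray:
    """Return a sample-by-sample correlation matrix of the given order.

    X has one row per sample and one column per feature. Order 1 is the
    ordinary Pearson correlation between samples; each further order
    correlates the rows of the previous correlation matrix.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    C = np.corrcoef(X)  # np.corrcoef treats each row as one variable
    for _ in range(order - 1):
        C = np.corrcoef(C)
    return C
```

The resulting matrix can be passed to any clustering routine that accepts a similarity matrix, or converted to a distance with `1 - C`.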
Affiliation(s)
- Hui Tang
- Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Tao Zeng
- Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Luonan Chen
- Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
- CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, China
49
Yang Z, Michailidis G. Quantifying heterogeneity of expression data based on principal components. Bioinformatics 2019; 35:553-559. [PMID: 30060088 DOI: 10.1093/bioinformatics/bty671] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/24/2017] [Revised: 07/05/2018] [Accepted: 07/27/2018] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: The diversity of biological omics data provides richness of information, but also presents an analytic challenge. While there has been much methodological and theoretical development on the statistical handling of large volumes of biological data, far less attention has been devoted to characterizing their veracity and variability.
RESULTS: We propose a method of statistically quantifying heterogeneity among multiple groups of datasets, derived from different omics modalities over various experimental and/or disease conditions. It draws upon strategies from analysis of variance and principal component analysis in order to reduce dimensionality of the variability across multiple data groups. The resulting hypothesis-based inference procedure is demonstrated with synthetic and real data from a cell line study of growth factor responsiveness based on a factorial experimental design.
AVAILABILITY AND IMPLEMENTATION: Source code and datasets are freely available at https://github.com/yangzi4/gPCA.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zi Yang
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
50
Jain Y, Ding S, Qiu J. Sliced inverse regression for integrative multi-omics data analysis. Stat Appl Genet Mol Biol 2019; 18:/j/sagmb.ahead-of-print/sagmb-2018-0028/sagmb-2018-0028.xml. [PMID: 30685747 DOI: 10.1515/sagmb-2018-0028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/21/2022]
Abstract
Advances in next-generation sequencing, transcriptomics, proteomics and other high-throughput technologies have enabled simultaneous measurement of multiple types of genomic data for cancer samples. Taken together, these data may reveal biological insights that analysis of a single type of genomic data cannot. This study proposes a novel use of a supervised dimension reduction method, sliced inverse regression, for multi-omics data analysis to improve prediction over single data type analysis. The study further proposes an integrative sliced inverse regression method (integrative SIR) for simultaneous analysis of multiple omics data types of cancer samples, including miRNA, mRNA and proteomics, to achieve integrative dimension reduction and to further improve prediction performance. Numerical results show that integrative analysis of multi-omics data is beneficial compared with single data source analysis and, more importantly, that supervised dimension reduction methods have advantages over unsupervised dimension reduction methods in integrative data analysis in terms of classification and prediction.
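For readers unfamiliar with sliced inverse regression, the following is a minimal sketch of the classical single-dataset procedure (whiten the predictors, slice the samples by the response, and take the leading eigenvectors of the weighted covariance of slice means). It is not the integrative SIR proposed in the paper; the function name, parameters, and simulated data are assumptions for illustration only.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_components=1):
    """Classical SIR: estimate directions b such that y depends on X via X @ b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Whiten X using the inverse square root of its covariance matrix.
    mu = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt

    # Slice samples by sorted response and average Z within each slice.
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of M, mapped back to the original X scale.
    _, vecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :n_components]
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)

# Simulated example (hypothetical): y depends on X only through its first column.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = X[:, 0] + 0.5 * X[:, 0] ** 3 + 0.1 * rng.normal(size=2000)
B = sliced_inverse_regression(X, y, n_slices=10, n_components=1)
```

The projection `X @ B` can then be passed to any classifier or regressor. Because slicing uses the response, the reduction is supervised, which is the property the abstract contrasts with unsupervised methods.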
Affiliation(s)
- Yashita Jain
- Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA
- Shanshan Ding
- Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA; Department of Applied Economics and Statistics, University of Delaware, 531 S College Ave., Newark, DE 19711, USA
- Jing Qiu
- Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA; Department of Applied Economics and Statistics, University of Delaware, 531 S College Ave., Newark, DE 19711, USA