1
A Novel Attention-Mechanism Based Cox Survival Model by Exploiting Pan-Cancer Empirical Genomic Information. Cells 2022; 11:cells11091421. [PMID: 35563727 PMCID: PMC9100007 DOI: 10.3390/cells11091421] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/22/2022] [Revised: 04/15/2022] [Accepted: 04/19/2022] [Indexed: 01/27/2023]
Abstract
Cancer prognosis is essential for early diagnosis, biomarker selection, and medical therapy. In the past decade, deep learning has successfully solved a variety of biomedical problems. However, because human cancer transcriptome data are high-dimensional and training samples are few, there is still no mature deep learning-based survival analysis model that fully overcomes overfitting during training and delivers accurate prognosis. To address these problems, we introduce a novel framework called SAVAE-Cox for survival analysis of high-dimensional transcriptome data. This model adopts a novel attention mechanism and takes full advantage of an adversarial transfer learning strategy. We trained the model on 16 types of TCGA cancer RNA-seq data sets. Experiments show that our model outperformed state-of-the-art survival analysis models such as the Cox proportional hazards model (Cox-ph), Cox-lasso, Cox-ridge, Cox-nnet, and VAECox on the concordance index. In addition, we carried out feature analysis experiments. Based on the experimental results, we conclude that our model is helpful for revealing cancer-related genes and biological functions.
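The concordance index used for the comparison above can be illustrated with a minimal sketch of Harrell's pairwise definition. This is not taken from the paper; the function name `concordance_index` and its arguments are hypothetical, and tied event times are simply skipped.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs that are ordered correctly.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the sample is censored
    risks  : predicted risk scores (higher means earlier expected failure)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter
        # time belongs to an observed event (not a censored sample).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ranking.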
2
Reel PS, Reel S, Pearson E, Trucco E, Jefferson E. Using machine learning approaches for multi-omics data analysis: A review. Biotechnol Adv 2021; 49:107739. [PMID: 33794304 DOI: 10.1016/j.biotechadv.2021.107739] [Citation(s) in RCA: 243] [Impact Index Per Article: 81.0] [Received: 12/15/2020] [Revised: 03/01/2021] [Accepted: 03/25/2021] [Indexed: 02/06/2023]
Abstract
With the development of modern high-throughput omic measurement platforms, it has become essential for biomedical studies to undertake an integrative (combined) approach to fully utilise these data to gain insights into biological systems. Data from various omics sources such as genetics, proteomics, and metabolomics can be integrated to unravel the intricate working of systems biology using machine learning-based predictive algorithms. Machine learning methods offer novel techniques to integrate and analyse the various omics data enabling the discovery of new biomarkers. These biomarkers have the potential to help in accurate disease prediction, patient stratification and delivery of precision medicine. This review paper explores different integrative machine learning methods which have been used to provide an in-depth understanding of biological systems during normal physiological functioning and in the presence of a disease. It provides insight and recommendations for interdisciplinary professionals who envisage employing machine learning skills in multi-omics studies.
Affiliation(s)
- Parminder S Reel
  - Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Smarti Reel
  - Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Ewan Pearson
  - Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Emanuele Trucco
  - VAMPIRE project, Computing, School of Science and Engineering, University of Dundee, Dundee, United Kingdom
- Emily Jefferson
  - Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
3
Kim S, Kim K, Choe J, Lee I, Kang J. Improved survival analysis by learning shared genomic information from pan-cancer data. Bioinformatics 2021; 36:i389-i398. [PMID: 32657401 PMCID: PMC7355236 DOI: 10.1093/bioinformatics/btaa462] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 12/28/2022]
Abstract
Motivation: Recent advances in deep learning have offered solutions to many biomedical tasks. However, there remains a challenge in applying deep learning to survival analysis using human cancer transcriptome data. Because the number of genes, which are the input variables of the survival model, is larger than the number of available cancer patient samples, deep-learning models are prone to overfitting. To address the issue, we introduce a new deep-learning architecture called VAECox. VAECox uses transfer learning and fine tuning. Results: We pre-trained a variational autoencoder on all RNA-seq data in 20 TCGA datasets and transferred the trained weights to our survival prediction model. Then we fine-tuned the transferred weights while training the survival model on each dataset. Results show that our model outperformed previous models such as Cox Proportional Hazard with LASSO and ridge penalty and Cox-nnet on 7 of 10 TCGA datasets in terms of C-index. The results signify that the transferred information obtained from entire cancer transcriptome data helped our survival prediction model reduce overfitting and show robust performance in unseen cancer patient samples. Availability and implementation: Our implementation of VAECox is available at https://github.com/dmis-lab/VAECox. Supplementary information: Supplementary data are available at Bioinformatics online.
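Cox-type survival networks such as those compared above are typically trained by minimising the negative log partial likelihood of the predicted risk scores. The following is a minimal NumPy sketch of that loss, not the VAECox implementation; the function name `cox_neg_log_partial_likelihood` is hypothetical, and the sketch assumes there are no tied event times.

```python
import numpy as np


def cox_neg_log_partial_likelihood(risk, times, events):
    """Negative log partial likelihood of a Cox model.

    risk   : predicted log-risk score per sample (e.g. network output)
    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    Assumes no tied event times (illustrative simplification).
    """
    risk = np.asarray(risk, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    # Sort by descending time so the risk set of sample i is samples 0..i.
    order = np.argsort(-times)
    risk, events = risk[order], events[order]
    # Running log-sum-exp of the risk scores over each risk set.
    log_risk_set = np.logaddexp.accumulate(risk)
    # Only observed events contribute a term to the likelihood.
    return -np.sum((risk - log_risk_set)[events])
```

Censored samples still appear in the risk sets of earlier events; they only lack a term of their own in the sum.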
Affiliation(s)
- Sunkyu Kim
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul 02841, Republic of Korea
- Keonwoo Kim
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul 02841, Republic of Korea
- Junseok Choe
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul 02841, Republic of Korea
- Inggeol Lee
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul 02841, Republic of Korea
- Jaewoo Kang
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul 02841, Republic of Korea
  - Interdisciplinary Graduate Program in Bioinformatics, College of Informatics, Korea University, Seoul 02841, Republic of Korea
4
Baldwin E, Han J, Luo W, Zhou J, An L, Liu J, Zhang HH, Li H. On fusion methods for knowledge discovery from multi-omics datasets. Comput Struct Biotechnol J 2020; 18:509-517. [PMID: 32206210 PMCID: PMC7078495 DOI: 10.1016/j.csbj.2020.02.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/01/2019] [Revised: 01/25/2020] [Accepted: 02/19/2020] [Indexed: 12/22/2022]
Abstract
Recent years have witnessed the tendency of measuring a biological sample on multiple omics scales for a comprehensive understanding of how biological activities on varying levels are perturbed by genetic variants, environments, and their interactions. This new trend raises substantial challenges to data integration and fusion, of which the latter is a specific type of integration that applies a uniform method in a scalable manner, to solve biological problems which the multi-omics measurements target. Fusion-based analysis has advanced rapidly in the past decade, thanks to application drivers and theoretical breakthroughs in mathematics, statistics, and computer science. We will briefly address these methods from methodological and mathematical perspectives and categorize them into three types of approaches: data fusion (a narrowed definition as compared to the general data fusion concept), model fusion, and mixed fusion. We will demonstrate at least one typical example in each specific category to exemplify the characteristics, principles, and applications of the methods in general, as well as discuss the gaps and potential issues for future studies.
Affiliation(s)
- Edwin Baldwin
  - Department of Biosystems Engineering, University of Arizona, United States
- Jiali Han
  - Department of Systems and Industrial Engineering, University of Arizona, United States
- Wenting Luo
  - Department of Biosystems Engineering, University of Arizona, United States
- Jin Zhou
  - Department of Epidemiology and Biostatistics, University of Arizona, United States
- Lingling An
  - Department of Biosystems Engineering, University of Arizona, United States
  - Department of Epidemiology and Biostatistics, University of Arizona, United States
- Jian Liu
  - Department of Systems and Industrial Engineering, University of Arizona, United States
- Hao Helen Zhang
  - Department of Mathematics, University of Arizona, United States
- Haiquan Li
  - Department of Biosystems Engineering, University of Arizona, United States
5
El-Manzalawy Y, Hsieh TY, Shivakumar M, Kim D, Honavar V. Min-redundancy and max-relevance multi-view feature selection for predicting ovarian cancer survival using multi-omics data. BMC Med Genomics 2018; 11:71. [PMID: 30255801 PMCID: PMC6157248 DOI: 10.1186/s12920-018-0388-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 12/11/2022]
Abstract
BACKGROUND: Large-scale collaborative precision medicine initiatives (e.g., The Cancer Genome Atlas (TCGA)) are yielding rich multi-omics data. Integrative analyses of the resulting multi-omics data, such as somatic mutation, copy number alteration (CNA), DNA methylation, miRNA, gene expression, and protein expression, offer tantalizing possibilities for realizing the promise and potential of precision medicine in cancer prevention, diagnosis, and treatment by substantially improving our understanding of underlying mechanisms as well as the discovery of novel biomarkers for different types of cancers. However, such analyses present a number of challenges, including heterogeneity and high dimensionality of omics data. METHODS: We propose a novel framework for multi-omics data integration using multi-view feature selection. We introduce a novel multi-view feature selection algorithm, MRMR-mv, an adaptation of the well-known Min-Redundancy and Maximum-Relevance (MRMR) single-view feature selection algorithm to the multi-view setting. RESULTS: We report results of experiments using an ovarian cancer multi-omics dataset derived from the TCGA database on the task of predicting ovarian cancer survival. Our results suggest that multi-view models outperform both view-specific models (i.e., models trained and tested using a single type of omics data) and models based on two baseline data fusion methods. CONCLUSIONS: Our results demonstrate the potential of multi-view feature selection in integrative analyses and predictive modeling from multi-omics data.
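For orientation, the single-view MRMR idea that MRMR-mv adapts can be sketched as a greedy loop that rewards relevance to the target and penalises redundancy with features already chosen. This is not the authors' multi-view algorithm. The function name `mrmr_select` is hypothetical, and absolute Pearson correlation is used here as a simple stand-in for the mutual information that MRMR is usually defined with.

```python
import numpy as np


def mrmr_select(X, y, k):
    """Greedy single-view mRMR: choose k columns of X.

    Relevance and redundancy are both measured with absolute Pearson
    correlation (an assumption of this sketch, replacing mutual information).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    # Relevance of each feature to the target.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    # Pairwise redundancy between features.
    similarity = np.abs(np.corrcoef(X, rowvar=False))
    # Start from the single most relevant feature.
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, n_features):
        remaining = [j for j in range(n_features) if j not in selected]
        # Score = relevance minus mean redundancy with the selected set.
        scores = [relevance[j] - similarity[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```

With a duplicated feature in the input, the second pick skips the duplicate in favour of a less relevant but non-redundant feature, which is the behaviour the min-redundancy term is meant to produce.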
Affiliation(s)
- Yasser El-Manzalawy
  - Artificial Intelligence Research Laboratory, College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
  - The Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA, 16802, USA
  - The Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, PA, 16802, USA
- Tsung-Yu Hsieh
  - Artificial Intelligence Research Laboratory, College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
  - School of Electrical Engineering and Computer Science, Pennsylvania State University, University Park, PA, 16802, USA
  - The Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA, 16802, USA
- Manu Shivakumar
  - Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- Dokyoon Kim
  - Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
- Vasant Honavar
  - Artificial Intelligence Research Laboratory, College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
  - School of Electrical Engineering and Computer Science, Pennsylvania State University, University Park, PA, 16802, USA
  - The Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA, 16802, USA
  - The Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, PA, 16802, USA
6
Shivakumar M, Lee Y, Bang L, Garg T, Sohn KA, Kim D. Identification of epigenetic interactions between miRNA and DNA methylation associated with gene expression as potential prognostic markers in bladder cancer. BMC Med Genomics 2017; 10:30. [PMID: 28589857 PMCID: PMC5461531 DOI: 10.1186/s12920-017-0269-y] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Indexed: 01/22/2023]
Abstract
Background: One of the fundamental challenges in cancer is to detect the regulators of gene expression changes during cancer progression. Through transcriptional silencing of critical cancer-related genes, epigenetic changes such as DNA methylation play a crucial role in cancer. In addition, miRNA, another major component of the epigenome, is a regulator at the post-transcriptional level that modulates transcriptome changes. However, a mechanistic role of synergistic interactions between DNA methylation and miRNA as epigenetic regulators on transcriptomic changes and its association with clinical outcomes such as survival have remained largely unexplored in cancer. Methods: In this study, we propose an integrative framework to identify epigenetic interactions between methylation and miRNA associated with transcriptomic changes. To test the utility of the proposed framework, the bladder cancer data set, including DNA methylation, miRNA expression, and gene expression data, from The Cancer Genome Atlas (TCGA) was analyzed for this study. Results: First, we found 120 genes associated with interactions between the two epigenomic components. Then, 11 significant epigenetic interactions between miRNA and methylation, which target E2F3, CCND1, UTP6, CDADC1, SLC35E3, METRNL, TPCN2, NACC2, VGLL4, and PTEN, were found to be associated with survival. To this end, exploration of TCGA bladder cancer data identified epigenetic interactions that are associated with survival as potential prognostic markers in bladder cancer. Conclusions: Given the importance and prevalence of these interactions of epigenetic events in bladder cancer, it is timely to understand further how different epigenetic components interact and influence each other. Electronic supplementary material: The online version of this article (doi:10.1186/s12920-017-0269-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Manu Shivakumar
  - Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- Younghee Lee
  - Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, USA
- Lisa Bang
  - Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- Tullika Garg
  - Mowad Urology Department, Geisinger Health System, Danville, PA, USA
- Kyung-Ah Sohn
  - Department of Software and Computer Engineering, Ajou University, Suwon, South Korea
- Dokyoon Kim
  - Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA
7
Kim D, Li R, Dudek SM, Ritchie MD. Predicting censored survival data based on the interactions between meta-dimensional omics data in breast cancer. J Biomed Inform 2015; 56:220-8. [PMID: 26048077 DOI: 10.1016/j.jbi.2015.05.019] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 02/06/2015] [Revised: 05/15/2015] [Accepted: 05/27/2015] [Indexed: 12/27/2022]
Abstract
Evaluation of survival models to predict cancer patient prognosis is one of the most important areas of emphasis in cancer research. A binary classification approach has difficulty directly predicting survival due to the characteristics of censored observations and the fact that the predictive power depends on the threshold used to set two classes. In contrast, the traditional Cox regression approach has some drawbacks in the sense that it does not allow for the identification of interactions between genomic features, which could have key roles associated with cancer prognosis. In addition, data integration is regarded as one of the important issues in improving the predictive power of survival models since cancer could be caused by multiple alterations through meta-dimensional genomic data including genome, epigenome, transcriptome, and proteome. Here we have proposed a new integrative framework designed to perform these three functions simultaneously: (1) predicting censored survival data; (2) integrating meta-dimensional omics data; (3) identifying interactions within/between meta-dimensional genomic features associated with survival. In order to predict censored survival time, martingale residuals were calculated as a new continuous outcome and a new fitness function used by the grammatical evolution neural network (GENN) based on mean absolute difference of martingale residuals was implemented. To test the utility of the proposed framework, a simulation study was conducted, followed by an analysis of meta-dimensional omics data including copy number, gene expression, DNA methylation, and protein expression data in breast cancer retrieved from The Cancer Genome Atlas (TCGA). On the basis of the results from breast cancer dataset, we were able to identify interactions not only within a single dimension of genomic data but also between meta-dimensional omics data that are associated with survival. 
Notably, the predictive power of our best meta-dimensional model was 73% which outperformed all of the other models conducted based on a single dimension of genomic data. Breast cancer is an extremely heterogeneous disease and the high levels of genomic diversity within/between breast tumors could affect the risk of therapeutic responses and disease progression. Thus, identifying interactions within/between meta-dimensional omics data associated with survival in breast cancer is expected to deliver direction for improved meta-dimensional prognostic biomarkers and therapeutic targets.
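The martingale residuals mentioned above turn a censored outcome into a continuous one: each sample's residual is its observed event indicator minus its expected number of events. As a minimal sketch, and assuming a null model with no covariates (the abstract does not state which model the residuals were derived from), the expected count is the Nelson-Aalen cumulative hazard at the sample's follow-up time. The function name `null_martingale_residuals` is hypothetical.

```python
import numpy as np


def null_martingale_residuals(times, events):
    """Martingale residuals M_i = delta_i - H(t_i) under a covariate-free model.

    H is the Nelson-Aalen cumulative hazard: the sum, over distinct event
    times s <= t_i, of (events at s) / (samples still at risk at s).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    cum_hazard = np.zeros_like(times)
    for i, t in enumerate(times):
        h = 0.0
        # Distinct observed event times up to and including t.
        for s in np.unique(times[(events == 1) & (times <= t)]):
            deaths = np.sum((times == s) & (events == 1))
            at_risk = np.sum(times >= s)
            h += deaths / at_risk
        cum_hazard[i] = h
    return events - cum_hazard
```

The residuals sum to zero by construction, and a sample that survives much longer than expected receives a strongly negative value.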
Affiliation(s)
- Dokyoon Kim
  - Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Ruowang Li
  - Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Scott M Dudek
  - Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Marylyn D Ritchie
  - Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
  - Geisinger Health System, Danville, PA, USA
8
Kim D, Joung JG, Sohn KA, Shin H, Park YR, Ritchie MD, Kim JH. Knowledge boosting: a graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction. J Am Med Inform Assoc 2014; 22:109-20. [PMID: 25002459 PMCID: PMC4433357 DOI: 10.1136/amiajnl-2013-002481] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Indexed: 02/05/2023]
Abstract
Objective: Cancer can involve gene dysregulation via multiple mechanisms, so no single level of genomic data fully elucidates tumor behavior due to the presence of numerous genomic variations within or between levels in a biological system. We have previously proposed a graph-based integration approach that combines multi-omics data including copy number alteration, methylation, miRNA, and gene expression data for predicting clinical outcome in cancer. However, genomic features likely interact with other genomic features in complex signaling or regulatory networks, since cancer is caused by alterations in pathways or complete processes. Methods: Here we propose a new graph-based framework for integrating multi-omics data and genomic knowledge to improve power in predicting clinical outcomes and elucidate interplay between different levels. To highlight the validity of our proposed framework, we used an ovarian cancer dataset from The Cancer Genome Atlas for predicting stage, grade, and survival outcomes. Results: Integrating multi-omics data with genomic knowledge to construct pre-defined features resulted in higher performance in clinical outcome prediction and higher stability. For the grade outcome, the model with gene expression data produced an area under the receiver operating characteristic curve (AUC) of 0.7866. However, models of the integration with pathway, Gene Ontology, chromosomal gene set, and motif gene set consistently outperformed the model with genomic data only, attaining AUCs of 0.7873, 0.8433, 0.8254, and 0.8179, respectively. Conclusions: Integrating multi-omics data and genomic knowledge to improve understanding of molecular pathogenesis and underlying biology in cancer should improve diagnostic and prognostic indicators and the effectiveness of therapies.
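The AUC values reported above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name `roc_auc` is hypothetical and the code is an illustration, not the evaluation pipeline used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels : 1 for positive cases, 0 for negative cases
    scores : predicted scores (higher means more likely positive)
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # A positive scored above a negative is a win; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

On this scale, the gap between 0.7866 and 0.8433 means roughly six more correctly ordered positive-negative pairs per hundred.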
Affiliation(s)
- Dokyoon Kim
  - Division of Biomedical Informatics, Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul, Korea
  - Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA
- Je-Gun Joung
  - Division of Biomedical Informatics, Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul, Korea
  - Translational Bioinformatics Lab (TBL), Samsung Genome Institute (SGI), Samsung Medical Center, Seoul, Korea
- Kyung-Ah Sohn
  - Division of Biomedical Informatics, Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul, Korea
  - Department of Information and Computer Engineering, Ajou University, Suwon, Korea
- Hyunjung Shin
  - Department of Industrial and Information Systems Engineering, Ajou University, Suwon, Korea
- Yu Rang Park
  - Division of Biomedical Informatics, Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul, Korea
  - Department of Biomedical Informatics, Asan Medical Center, Seoul, Korea
- Marylyn D Ritchie
  - Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA
- Ju Han Kim
  - Division of Biomedical Informatics, Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul, Korea
  - Systems Biomedical Informatics Research Center, Seoul National University, Seoul, Korea
9
Kim D, Shin H, Sohn KA, Verma A, Ritchie MD, Kim JH. Incorporating inter-relationships between different levels of genomic data into cancer clinical outcome prediction. Methods 2014; 67:344-53. [PMID: 24561168 DOI: 10.1016/j.ymeth.2014.02.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 11/24/2013] [Revised: 01/25/2014] [Accepted: 02/07/2014] [Indexed: 01/06/2023]
Abstract
In order to improve our understanding of cancer and develop multi-layered theoretical models for the underlying mechanism, it is essential to have enhanced understanding of the interactions between multiple levels of genomic data that contribute to tumor formation and progression. Although there exist recent approaches such as a graph-based framework that integrates multi-omics data including copy number alteration, methylation, gene expression, and miRNA data for cancer clinical outcome prediction, most previous methods treat each type of genomic data as independent, and the possible interplay between them is not explicitly incorporated into the model. However, cancer involves dysregulation at multiple levels of the biological system, including the genomic, epigenomic, transcriptomic, and proteomic levels. Thus, genomic features are likely to interact with other genomic features at different genomic levels. In order to deepen our knowledge, it would be desirable to incorporate such inter-relationship information when integrating multi-omics data for cancer clinical outcome prediction. In this study, we propose a new graph-based framework that integrates not only multi-omics data but also the inter-relationships between them to better elucidate cancer clinical outcomes. In order to highlight the validity of the proposed framework, serous cystadenocarcinoma data from TCGA was adopted as a pilot task. The proposed model incorporating inter-relationship between different genomic features showed significantly improved performance compared to the model that does not consider inter-relationship when integrating multi-omics data. For example, for the pair of miRNA and gene expression data, the model integrating miRNA, gene expression, and the inter-relationship between them, with an AUC of 0.8476 (REI), outperformed the model combining miRNA and gene expression data, with an AUC of 0.8404. Similar results were also obtained for other pairs between different levels of genomic data. Integration of different levels of data and inter-relationship between them can aid in extracting new biological knowledge by drawing an integrative conclusion from many pieces of information collected from diverse types of genomic data, eventually leading to more effective screening strategies and alternative therapies that may improve outcomes.
Affiliation(s)
- Dokyoon Kim
  - Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Republic of Korea
  - Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Hyunjung Shin
  - Department of Industrial & Information Systems Engineering, Ajou University, San 5, Wonchun-dong, Yeoungtong-gu, 443-749 Suwon, Republic of Korea
- Kyung-Ah Sohn
  - Department of Information and Computer Engineering, Ajou University, Suwon 443-749, Republic of Korea
- Anurag Verma
  - Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Marylyn D Ritchie
  - Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Ju Han Kim
  - Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Republic of Korea
  - Systems Biomedical Informatics Research Center, Seoul National University, Seoul 110799, Republic of Korea
10
Sohn KA, Kim D, Lim J, Kim JH. Relative impact of multi-layered genomic data on gene expression phenotypes in serous ovarian tumors. BMC Syst Biol 2013; 7 Suppl 6:S9. [PMID: 24521303 PMCID: PMC3906601 DOI: 10.1186/1752-0509-7-s6-s9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/24/2022]
Abstract
Background: The emerging multi-layers of genomic data have provided unprecedented opportunities for cancer research, especially for the association study between gene expressions and other types of genomic features. No previous approaches, however, provide an adequate statistical framework for, or global analysis of, the relative impact of different genomic feature layers on gene expression phenotypes. Methods: We propose an integrative statistical framework based on a sparse regression to model the impact of multi-layered genomic features on gene expression traits. The proposed approach can be regarded as an integrative expression Quantitative Trait Loci approach in which not only the genetic variations of SNPs or copy number variations but also other features at both genomic and epigenomic levels are used to explain the expression of genes. To highlight the validity of the proposed approach, the TCGA ovarian cancer dataset was analysed as a pilot task. Results: The analysis shows that our integrative approach has consistently superior power in predicting gene expression levels compared to that from each single data type-based analysis. Moreover, the proposed method has the advantage of producing a substantially reduced number of spurious associations. We provide an interesting characterization of genes in terms of their genomic association patterns. Important genomic features reported in previous ovarian cancer research are successfully identified as major hubs in the resulting association network between heterogeneous types of genomic features and genes. Conclusions: In this paper, we model the gene expression phenotypes with respect to multiple different types of genomic data in an integrative framework. Our analysis reveals the global view on the relative contribution of different genomic feature types to gene expression phenotypes in ovarian cancer.
11
Kim D, Shin H, Joung JG, Lee SY, Kim JH. Intra-relation reconstruction from inter-relation: miRNA to gene expression. BMC Syst Biol 2013; 7 Suppl 3:S8. [PMID: 24521265 PMCID: PMC3852212 DOI: 10.1186/1752-0509-7-s3-s8] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/22/2023]
Abstract
BACKGROUND: In computational biology, novel knowledge has been obtained mostly by identifying 'intra-relation,' the relation between entities on a specific biological level such as gene expression or microRNA (miRNA), and many such studies have been successful. However, intra-relations do not fully explain complex cancer mechanisms because the inter-relation information between different levels of genomic data is missing, e.g. miRNA and its target genes. The 'inter-relation' between different levels of genomic data can be constructed from biological experimental data as well as genomic knowledge. METHODS: Previously, we proposed a graph-based framework that integrates multiple layers of genomic data (copy number alteration, DNA methylation, gene expression, and miRNA expression) for cancer clinical outcome prediction. However, the limitation of that work was that we integrated multiple layers of genomic data without considering inter-relationship information between genomic features. In this paper, we propose a new integrative framework that combines a genomic dataset from gene expression and genomic knowledge from inter-relation between miRNA and gene expression for clinical outcome prediction as a pilot study. RESULTS: In order to demonstrate the validity of the proposed method, the prediction of short-term/long-term survival for 82 patients with glioblastoma multiforme (GBM) was adopted as a base task. Based on our results, the accuracy of our predictive model increases because of the incorporation of information fused over the genomic dataset from gene expression and genomic knowledge from inter-relation between miRNA and gene expression. CONCLUSIONS: In the present study, the intra-relation of gene expression was reconstructed from inter-relation between miRNA and gene expression for prediction of short-term/long-term survival of GBM patients. Our finding suggests that the utilization of external knowledge representing miRNA-mediated regulation of gene expression is substantially useful for elucidating the cancer phenotype.
Affiliation(s)
- Dokyoon Kim
  - Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea
  - Systems Biomedical Informatics Research Center, Seoul National University, Seoul 110799, Korea
  - Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA
- Hyunjung Shin
  - Department of Industrial Engineering, Ajou University, San 5, Wonchun-dong, Yeoungtong-gu, 443-749, Suwon, Korea
- Je-Gun Joung
  - Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea
  - Systems Biomedical Informatics Research Center, Seoul National University, Seoul 110799, Korea
  - Translational Bioinformatics Lab (TBL), Samsung Genome Institute (SGI), Samsung Medical Center, Seoul, Korea
- Su-Yeon Lee
  - Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea
  - Systems Biomedical Informatics Research Center, Seoul National University, Seoul 110799, Korea
- Ju Han Kim
  - Seoul National University Biomedical Informatics (SNUBI), Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea
  - Systems Biomedical Informatics Research Center, Seoul National University, Seoul 110799, Korea