1
|
Hotness prediction of scientific topics based on a bibliographic knowledge graph. Inf Process Manag 2022. [DOI: 10.1016/j.ipm.2022.102980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
2
|
Park C, Kim B, Park T. DeepHisCoM: deep learning pathway analysis using hierarchical structural component models. Brief Bioinform 2022; 23:6590446. [DOI: 10.1093/bib/bbac171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Revised: 04/04/2022] [Accepted: 04/18/2022] [Indexed: 11/13/2022]
Abstract
Many statistical methods for pathway analysis have been used to identify pathways associated with disease along with biological factors such as genes and proteins. However, most pathway analysis methods neglect the complex nonlinear relationship between biological factors and pathways. In this study, we propose Deep-learning pathway analysis using Hierarchical structured CoMponent models (DeepHisCoM), which uses deep learning to capture the complex nonlinear contribution of biological factors to pathways by constructing a multilayered model that accounts for hierarchical biological structure. Through simulation studies, DeepHisCoM was shown to have higher power for nonlinear pathway effects and comparable power for linear pathway effects when compared with conventional pathway methods. Application to hepatocellular carcinoma (HCC) omics datasets, including metabolomic, transcriptomic and metagenomic datasets, demonstrated that DeepHisCoM successfully identified three well-known pathways that are highly associated with HCC: lysine degradation; valine, leucine and isoleucine biosynthesis; and phenylalanine, tyrosine and tryptophan. Application to the coronavirus disease-2019 (COVID-19) single-nucleotide polymorphism (SNP) dataset also showed that DeepHisCoM identified four pathways that are highly associated with the severity of COVID-19: the mitogen-activated protein kinase (MAPK) signaling pathway, the gonadotropin-releasing hormone (GnRH) signaling pathway, hypertrophic cardiomyopathy and dilated cardiomyopathy. Code is available at https://github.com/chanwoo-park-official/DeepHisCoM.
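The hierarchical structure described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the repository linked above). It assumes, purely for illustration, that each pathway has its own small one-hidden-layer network mapping its biological factors to a scalar pathway score, and that the phenotype is modeled by a logistic link on a weighted sum of pathway scores. All names (`pathway_score`, `predict_probability`, the example pathway and weights) are hypothetical.

```python
import math

def pathway_score(x, hidden_w, out_w):
    """Map one pathway's biological factors to a scalar via a tiny tanh network."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in hidden_w]
    return sum(w * h for w, h in zip(out_w, hidden))

def predict_probability(sample, pathways, pathway_effects, intercept=0.0):
    """Combine pathway scores linearly, then apply a logistic link (assumed)."""
    logit = intercept
    for name, spec in pathways.items():
        x = [sample[f] for f in spec["factors"]]
        score = pathway_score(x, spec["hidden_w"], spec["out_w"])
        logit += pathway_effects[name] * score
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical toy data: one pathway with two factors and one hidden unit.
sample = {"geneA": 1.0, "geneB": 2.0}
pathways = {
    "pathway1": {
        "factors": ["geneA", "geneB"],
        "hidden_w": [[0.5, 0.5]],  # one hidden unit
        "out_w": [1.0],
    }
}
p_null = predict_probability(sample, pathways, {"pathway1": 0.0})
p_effect = predict_probability(sample, pathways, {"pathway1": 1.0})
```

In this reading, a pathway-level effect of zero leaves the predicted probability at the baseline, so testing a pathway amounts to testing whether its effect differs from zero.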
Affiliation(s)
- Chanwoo Park
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
- Boram Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
|
3
|
Transducer Cascades for Biological Literature-Based Discovery. INFORMATION 2022. [DOI: 10.3390/info13050262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
G protein-coupled receptors (GPCRs) control the response of cells to many signals and, as such, are involved in most cellular processes. As membrane receptors, they are accessible at the surface of the cell. GPCRs are also the largest family of membrane receptors, with more than 800 representatives in mammalian genomes. For these reasons, they are ideal targets for drugs. Although about one third of approved drugs target GPCRs, only about 16% of GPCRs are targeted by drugs. One of the difficulties comes from the lack of knowledge about the intracellular events triggered by these molecules. In the last two decades, scientists have started mapping the signaling networks triggered by GPCRs. However, it soon became apparent that the system is very complex, which led to the publication of more than 320,000 scientific papers. Clearly, a human cannot take such a massive body of information into account. These papers are a mine of information about both ontological knowledge and experimental results related to GPCRs, which must be exploited in order to build signaling networks. The ABLISS project aims to build GPCR networks automatically using automated deductive reasoning, which allows all available data to be integrated. To this end, we automated the extraction of network information from the literature using Natural Language Processing (NLP). We focused mainly on the experimental results about GPCRs reported in the scientific papers, as so far there is no source gathering all these experimental results. We designed a relational database in order to make them available to the scientific community later. After introducing the more general objectives of the ABLISS project, we describe the formalism in detail. We then explain the NLP program that we implemented using finite-state methods (Unitex graph cascades) and discuss the extracted facts. Finally, we present the design of the relational database that stores the facts extracted from the selected papers.
|
4
|
Zhang J, Zhu M, Qian Y. protein2vec: Predicting Protein-Protein Interactions Based on LSTM. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1257-1266. [PMID: 32750870 DOI: 10.1109/tcbb.2020.3003941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The semantic similarity of gene ontology (GO) terms is widely used to predict protein-protein interactions (PPIs). Traditional semantic similarity measures are based mainly on manually crafted features, which may ignore some important hidden information in the gene ontology. Moreover, those methods usually derive the similarity between proteins from the similarity between GO terms using simple statistical rules, such as MAX and BMA (best-match average), oversimplifying the potentially complex relationship between the proteins and the GO terms annotated to them. To overcome these two deficiencies, we propose a new method named protein2vec, which characterizes a protein with a vector based on the GO terms annotated to it and combines the information of both the GO and known PPIs. We first apply a network embedding algorithm to the GO network to generate a feature vector for each GO term. Then, a Long Short-Term Memory (LSTM) network encodes the feature vectors of the GO terms annotated to a protein into another vector (called the protein vector). Finally, two protein vectors are fed into a feedforward neural network to predict the interaction between the two corresponding proteins. The experimental results show that protein2vec outperforms almost all commonly used traditional semantic similarity methods.
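To make the MAX and BMA rules mentioned in this abstract concrete, here is a minimal sketch of both, computed from a matrix of term-to-term similarities. It assumes one common BMA variant (the average of the two directional mean best matches); other variants exist, and the abstract does not say which one the compared methods use. The function names and the similarity values are hypothetical.

```python
from statistics import mean

def max_similarity(sim):
    """MAX rule: the single highest term-to-term similarity."""
    return max(max(row) for row in sim)

def bma_similarity(sim):
    """Best-match average: mean of row-wise and column-wise best matches."""
    row_best = mean(max(row) for row in sim)        # best partner per term of protein A
    col_best = mean(max(col) for col in zip(*sim))  # best partner per term of protein B
    return (row_best + col_best) / 2

# Hypothetical similarities: rows are GO terms of protein A,
# columns are GO terms of protein B.
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.4, 0.8],
]
```

Both rules collapse the whole matrix into a single number, which is the simplification that the abstract argues a learned protein vector can avoid.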
|
5
|
Kim SA, Kang N, Park T. Hierarchical Structured Component Analysis for Microbiome Data Using Taxonomy Assignments. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1302-1312. [PMID: 33211665 DOI: 10.1109/tcbb.2020.3039326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The recent advent of high-throughput sequencing technology has enabled us to study the associations between the human microbiome and diseases. The DNA sequences of microbiome samples are clustered into operational taxonomic units (OTUs) according to their similarity. The OTU table, which contains the counts of OTUs present in each sample, is used to measure correlations between OTUs and disease status and to find key microbes for predicting disease status. Various statistical methods have been proposed for such microbiome data analysis. However, none of these methods reflects the hierarchy of taxonomy information. In this paper, we propose a hierarchical structural component model for microbiome data (HisCoM-microb) that uses taxonomy information as well as OTU table data. The proposed HisCoM-microb consists of two layers: one for OTUs and the other for taxa at the higher taxonomy level. We then simultaneously calculate coefficient estimates for the OTUs and taxa of the two layers in the hierarchical model. Through this analysis, we can infer the association between taxa or OTUs and disease status, considering the impact of taxonomic structure on disease status. Both a simulation study and real microbiome data analysis show that HisCoM-microb can successfully reveal the relations between each taxon and disease status and identify the key OTUs of the disease at the same time.
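The two-layer data structure described here (OTUs nested within higher-level taxa) can be pictured with a short sketch that rolls an OTU table up to the taxon level. This only illustrates the hierarchy, not HisCoM-microb itself: the model estimates weights for OTUs and taxa rather than taking plain sums. The sample names, OTU identifiers and taxon assignments are hypothetical.

```python
from collections import defaultdict

def aggregate_by_taxon(otu_table, otu_to_taxon):
    """Sum OTU counts within each higher-level taxon, separately per sample."""
    taxa_table = {}
    for sample, counts in otu_table.items():
        totals = defaultdict(int)
        for otu, count in counts.items():
            totals[otu_to_taxon[otu]] += count
        taxa_table[sample] = dict(totals)
    return taxa_table

# Hypothetical OTU table: sample -> {OTU -> read count}.
otu_table = {
    "sample1": {"OTU1": 10, "OTU2": 5, "OTU3": 0},
    "sample2": {"OTU1": 2, "OTU2": 0, "OTU3": 7},
}
# Hypothetical taxonomy assignment: OTU -> genus.
otu_to_taxon = {"OTU1": "GenusA", "OTU2": "GenusA", "OTU3": "GenusB"}

taxa_table = aggregate_by_taxon(otu_table, otu_to_taxon)
```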
|
6
|
Hwangbo S, Lee S, Lee S, Hwang H, Kim I, Park T. Kernel-based hierarchical structural component models for pathway analysis. Bioinformatics 2022; 38:3078-3086. [PMID: 35460238 DOI: 10.1093/bioinformatics/btac276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 04/08/2022] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Pathway analyses have led to more insight into the underlying biological functions related to the phenotype of interest in various types of omics data. Pathway-based statistical approaches have been actively developed, but most of them do not consider correlations among pathways. Because it is well known that quite a few biomarkers overlap between pathways, these approaches may provide misleading results. In addition, most pathway-based approaches tend to assume that biomarkers within a pathway have linear associations with the phenotype of interest, even though the relationships are more complex.
RESULTS: To model complex effects, including nonlinear effects, we propose a new approach, Hierarchical structural CoMponent analysis using Kernel (HisCoM-Kernel). The proposed method models nonlinear associations between biomarkers and phenotype by extending kernel machine regression and analyzes entire pathways simultaneously by using the biomarker-pathway hierarchical structure. HisCoM-Kernel is a flexible model that can be applied to various omics data. It was successfully applied to three omics datasets generated by different technologies. Our simulation studies showed that HisCoM-Kernel provided higher statistical power than other existing pathway-based methods in all datasets. The application of HisCoM-Kernel to three types of omics datasets showed its superior performance compared to existing methods in identifying more biologically meaningful pathways, including those reported in previous studies.
AVAILABILITY AND IMPLEMENTATION: Freely available at http://statgen.snu.ac.kr/software/HisCom-Kernel/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Suhyun Hwangbo
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080, Korea
- Sungyoung Lee
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080, Korea
- Seungyeoun Lee
  - Department of Mathematics and Statistics, Sejong University, Sejong, 05006, Korea
- Heungsun Hwang
  - Department of Psychology, McGill University, Montreal, QC, H3A 1B1, Canada
- Inyoung Kim
  - Department of Statistics, Virginia Tech, Blacksburg, Virginia, 24060, U.S.A
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea
  - Department of Statistics, Seoul National University, Seoul, 151-747, Korea
|
7
|
An Unsupervised Approach to Structuring and Analyzing Repetitive Semantic Structures in Free Text of Electronic Medical Records. J Pers Med 2022; 12:jpm12010025. [PMID: 35055340 PMCID: PMC8778877 DOI: 10.3390/jpm12010025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Revised: 12/11/2021] [Accepted: 12/30/2021] [Indexed: 11/17/2022]
Abstract
Electronic medical records (EMRs) contain a great deal of valuable patient data, which is, however, unstructured. Moreover, there is a lack of both labeled medical text data in Russian and tools for automatic annotation. As a result, it is currently hardly feasible for researchers to use EMR text data to train machine learning models in the biomedical domain. We present an unsupervised approach to medical data annotation. Syntactic trees are produced from the initial sentences using morphological and syntactic analyses. In the retrieved trees, similar subtrees are grouped using Node2Vec and Word2Vec and labeled using domain vocabularies and Wikidata categories. The use of Wikidata categories increased the fraction of labeled sentences 5.5 times compared to labeling with domain vocabularies only. We show on a validation dataset that the proposed labeling method generates meaningful labels correctly for 92.7% of groups. Annotation with domain vocabularies and Wikidata categories covered more than 82% of the sentences in the corpus; when extended with timestamp and event labels, coverage reached 97% of sentences. The resulting method can be used to label EMRs in Russian automatically. Additionally, the proposed methodology can be applied to other languages that lack resources for automatic labeling and domain vocabularies.
|
8
|
Kim Y, Lee S, Jang JY, Lee S, Park T. Identifying miRNA-mRNA Integration Set Associated With Survival Time. Front Genet 2021; 12:634922. [PMID: 34267778 PMCID: PMC8276759 DOI: 10.3389/fgene.2021.634922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2020] [Accepted: 04/06/2021] [Indexed: 11/26/2022]
Abstract
In the “personalized medicine” era, one of the most difficult problems is the identification of combined markers from different omics platforms. Many methods have been developed to identify candidate markers for each type of omics data, but few methods facilitate the identification of multiple markers on multi-omics platforms. microRNAs (miRNAs) are well known to affect phenotypes only indirectly, by regulating mRNA expression and/or protein translation. To put this knowledge into practice, we suggest a miRNA-mRNA integration model for survival time analysis, called mimi-surv, which accounts for this biological relationship in order to identify such integrated markers more efficiently. Through simulation studies, we found that the statistical power of mimi-surv was better than that of other models. Application to real datasets from Seoul National University Hospital and The Cancer Genome Atlas demonstrated that mimi-surv successfully identified miRNA-mRNA integration sets associated with progression-free survival of pancreatic ductal adenocarcinoma (PDAC) patients. Only mimi-surv found miR-96, a previously unidentified PDAC-related miRNA, in these two real datasets. Furthermore, mimi-surv was shown to identify more PDAC-related miRNAs than other methods because it used the known structure for miRNA-mRNA regularization. An implementation of mimi-surv is available at http://statgen.snu.ac.kr/software/mimi-surv.
Affiliation(s)
- Yongkang Kim
  - Department of Statistics, Seoul National University, Seoul, South Korea
- Sungyoung Lee
  - Center for Precision Medicine, Seoul National University Hospital, Seoul, South Korea
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, South Korea
- Jin-Young Jang
  - Department of Surgery and Cancer Research Institute, Seoul National University College of Medicine, Seoul, South Korea
- Seungyeoun Lee
  - Department of Mathematics and Statistics, Sejong University, Seoul, South Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul, South Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
|
9
|
Kim B, Cho EJ, Yoon JH, Kim SS, Cheong JY, Cho SW, Park T. Pathway-Based Integrative Analysis of Metabolome and Microbiome Data from Hepatocellular Carcinoma and Liver Cirrhosis Patients. Cancers (Basel) 2020; 12:E2705. [PMID: 32967314 PMCID: PMC7563418 DOI: 10.3390/cancers12092705] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/29/2020] [Revised: 09/14/2020] [Accepted: 09/16/2020] [Indexed: 12/12/2022]
Abstract
Aberrations of the human microbiome are associated with diverse liver diseases, including hepatocellular carcinoma (HCC). Even if we can associate specific microbes with particular diseases, it is difficult to know mechanistically how the microbe contributes to the pathophysiology. Here, we sought to reveal the functional potential of the HCC-associated microbiome with the human metabolome which is known to play a role in connecting host phenotype to microbiome function. To utilize both microbiome and metabolomic data sets, we propose an innovative, pathway-based analysis, Hierarchical structural Component Model for pathway analysis of Microbiome and Metabolome (HisCoM-MnM), for integrating microbiome and metabolomic data. In particular, we used pathway information to integrate these two omics data sets, thus providing insight into biological interactions between different biological layers, with regard to the host's phenotype. The application of HisCoM-MnM to data sets from 103 and 97 patients with HCC and liver cirrhosis (LC), respectively, showed that this approach could identify HCC-related pathways related to cancer metabolic reprogramming, in addition to the significant metabolome and metagenome that make up those pathways.
Affiliation(s)
- Boram Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Eun Ju Cho
  - Department of Internal Medicine and Liver Research Institute, Seoul National University College of Medicine, Seoul 03080, Korea
- Jung-Hwan Yoon
  - Department of Internal Medicine and Liver Research Institute, Seoul National University College of Medicine, Seoul 03080, Korea
- Soon Sun Kim
  - Department of Gastroenterology, Ajou University School of Medicine, Suwon 16499, Korea
- Jae Youn Cheong
  - Department of Gastroenterology, Ajou University School of Medicine, Suwon 16499, Korea
- Sung Won Cho
  - Department of Gastroenterology, Ajou University School of Medicine, Suwon 16499, Korea
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
|
10
|
Leem S, Huh I, Park T. Enhanced Permutation Tests via Multiple Pruning. Front Genet 2020; 11:509. [PMID: 32670346 PMCID: PMC7330123 DOI: 10.3389/fgene.2020.00509] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/24/2019] [Accepted: 04/27/2020] [Indexed: 11/25/2022]
Abstract
Big multi-omics data in bioinformatics often consists of a huge number of features and relatively small numbers of samples. In addition, features from multi-omics data have their own specific characteristics depending on whether they are from genomics, proteomics, metabolomics, etc. Due to these distinct characteristics, standard statistical analyses using parametric-based assumptions may sometimes fail to provide exact asymptotic results. To resolve this issue, permutation tests can be a way to exactly analyze multi-omics data because they are distribution-free and flexible to use. In permutation tests, p-values are evaluated by estimating the locations of test statistics in an empirical null distribution generated by random shuffling. However, the permutation approach can be infeasible when the number of features increases, because more stringent control of type I error is needed for multiple hypothesis testing, and consequently, much larger numbers of permutations are required to reach significance. To address this problem, we propose a well-organized strategy, “ENhanced Permutation tests via multiple Pruning (ENPP).” ENPP prunes the features in every permutation round if they are determined to be non-significant. In other words, if the feature statistics from the permuted datasets exceed the feature statistics from the original dataset, beyond a predetermined threshold, the feature is determined to be non-significant. If so, ENPP removes the feature and iterates the process without the feature in the next permutation round. Our simulation study showed that the ENPP method could remove about 50% of the features at the first permutation round, and, by the 100th permutation round, 98% of the features had been removed and only 7.4% of the computation time with the original unpruned permutation approach had elapsed. In addition, we applied this approach to a real data set (Korea Association REsource: KARE) of 327,872 SNPs to find association with a non-normally distributed phenotype (fasting plasma glucose), interpreted the results, and discussed the feasibility and advantages of the approach.
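The pruning idea in this abstract can be sketched in a few lines. The following is an illustrative reading, not the authors' ENPP implementation: it assumes the "predetermined threshold" is a maximum number of permutation rounds (`max_exceed`, a hypothetical parameter) in which the permuted statistic may reach or exceed the observed one, and it uses an absolute difference in group means as a stand-in test statistic. The toy features and all function names are likewise assumptions.

```python
import random

def abs_mean_diff(values, labels):
    """Test statistic: absolute difference in group means (labels are 0/1)."""
    g1 = [v for v, l in zip(values, labels) if l == 1]
    g0 = [v for v, l in zip(values, labels) if l == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

def pruned_permutation_test(features, labels, n_perm=200, max_exceed=10, seed=0):
    """Permutation test that drops a feature once it exceeds max_exceed hits."""
    rng = random.Random(seed)
    observed = {name: abs_mean_diff(vals, labels) for name, vals in features.items()}
    exceed = {name: 0 for name in features}
    rounds_used = {name: 0 for name in features}
    active = set(features)
    perm = list(labels)
    for _ in range(n_perm):
        if not active:
            break
        rng.shuffle(perm)  # one shared label shuffle per round
        for name in list(active):
            rounds_used[name] += 1
            if abs_mean_diff(features[name], perm) >= observed[name]:
                exceed[name] += 1
                if exceed[name] > max_exceed:
                    active.discard(name)  # pruned: skipped in later rounds
    # Empirical p-values over the rounds each feature actually took part in.
    pvals = {n: (exceed[n] + 1) / (rounds_used[n] + 1) for n in features}
    return pvals, active

# Hypothetical toy data: 10 controls followed by 10 cases.
labels = [0] * 10 + [1] * 10
features = {
    "signal": [i / 10 for i in range(10)] + [5 + i / 10 for i in range(10)],
    "noise": [1, 2] * 10,  # identical group means, observed statistic is 0
}
pvals, active = pruned_permutation_test(features, labels)
```

Here the "noise" feature is dropped after a handful of rounds, so the remaining rounds are spent only on "signal"; this is where the computational saving reported in the abstract would come from.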
Affiliation(s)
- Sangseob Leem
  - Department of Statistics, Seoul National University, Seoul, South Korea
- Iksoo Huh
  - College of Nursing and Research Institute of Nursing Science, Seoul National University, Seoul, South Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul, South Korea
|
11
|
Nair AA, Tang X, Thompson KJ, Vedell PT, Kalari KR, Subramanian S. Frequency of MicroRNA Response Elements Identifies Pathologically Relevant Signaling Pathways in Triple-Negative Breast Cancer. iScience 2020; 23:101249. [PMID: 32629614 PMCID: PMC7322352 DOI: 10.1016/j.isci.2020.101249] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/15/2019] [Revised: 03/24/2020] [Accepted: 06/03/2020] [Indexed: 02/02/2023]
Abstract
Complex interactions between mRNAs and microRNAs influence cellular functions. The mRNA-microRNA interactions also determine the post-transcriptional availability of mRNAs and unbound microRNAs. MicroRNAs bind to one or more microRNA response elements (MREs) located on the 3′UTR of mRNAs. In this study, we leveraged MREs and their frequencies in cancer and matched normal tissues to obtain insights into disease-specific interactions between mRNAs and microRNAs. We developed a bioinformatics method, “ReMIx”, that utilizes RNA sequencing (RNA-Seq) data to quantify MRE frequencies across the transcriptome. We applied ReMIx to triple-negative (TN) breast cancer tumor-normal adjacent pairs and identified MREs specific to TN tumors. ReMIx identified candidate mRNAs and microRNAs in the MAPK signaling cascade. Further analysis of MAPK gene regulatory networks revealed microRNA partners that influence and modulate MAPK signaling. In conclusion, we demonstrate a novel method of using MREs in the identification of functionally relevant mRNA-microRNA interactions in TN breast cancer.
Highlights
- The bioinformatics method ReMIx identifies differential microRNA response elements (MREs)
- Tumor-specific MRE frequencies are observed in triple-negative breast cancer (TNBC)
- MRE analysis identifies MAPK signaling genes as therapeutic targets for TNBC
- MRE frequencies can be used to identify pathologically relevant pathways
Affiliation(s)
- Asha A Nair
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA
- Xiaojia Tang
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA
- Kevin J Thompson
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA
- Peter T Vedell
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA
- Krishna R Kalari
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA
- Subbaya Subramanian
  - Department of Surgery, University of Minnesota, 420 Delaware St SE, Minneapolis, MN 55455, USA
  - Masonic Cancer Center, University of Minnesota, Minneapolis, MN 55455, USA
  - Center for Immunology, University of Minnesota, Minneapolis, MN 55455, USA
|
12
|
Mok L, Park T. HisCoM-PAGE: software for hierarchical structural component models for pathway analysis of gene expression data. Genomics Inform 2019; 17:e45. [PMID: 31896245 PMCID: PMC6944051 DOI: 10.5808/gi.2019.17.4.e45] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2019] [Accepted: 11/22/2019] [Indexed: 12/04/2022]
Abstract
To identify pathways associated with survival phenotypes using gene expression data, we recently proposed the hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE) method. The HisCoM-PAGE software can consider hierarchical structural relationships between genes and pathways and analyze multiple pathways simultaneously. It can be applied to various types of gene expression data, such as microarray data or RNA sequencing data. We expect that the HisCoM-PAGE software will make our method more easily accessible to researchers who want to perform pathway analysis for survival times.
Affiliation(s)
- Lydia Mok
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
|
13
|
Mok L, Kim Y, Lee S, Choi S, Lee S, Jang JY, Park T. HisCoM-PAGE: Hierarchical Structural Component Models for Pathway Analysis of Gene Expression Data. Genes (Basel) 2019; 10:E931. [PMID: 31739607 PMCID: PMC6896173 DOI: 10.3390/genes10110931] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/23/2019] [Revised: 11/06/2019] [Accepted: 11/07/2019] [Indexed: 01/10/2023]
Abstract
Although there have been several analyses for identifying cancer-associated pathways, based on gene expression data, most of these are based on single pathway analyses, and thus do not consider correlations between pathways. In this paper, we propose a hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE), which accounts for the hierarchical structure of genes and pathways, as well as the correlations among pathways. Specifically, HisCoM-PAGE focuses on the survival phenotype and identifies its associated pathways. Moreover, its application to real biological data analysis of pancreatic cancer data demonstrated that HisCoM-PAGE could successfully identify pathways associated with pancreatic cancer prognosis. Simulation studies comparing the performance of HisCoM-PAGE with other competing methods such as Gene Set Enrichment Analysis (GSEA), Global Test, and Wald-type Test showed HisCoM-PAGE to have the highest power to detect causal pathways in most simulation scenarios.
Affiliation(s)
- Lydia Mok
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Yongkang Kim
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
- Sungyoung Lee
  - Center for Precision Medicine, Seoul National University Hospital, Seoul 03080, Korea
- Sungkyoung Choi
  - Department of Applied Mathematics, Hanyang University (ERICA), Ansan 15588, Korea
- Seungyeoun Lee
  - Department of Mathematics and Statistics, Sejong University, Seoul 05006, Korea
- Jin-Young Jang
  - Department of Surgery, Seoul National University College of Medicine, Seoul 03080, Korea
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
|
14
|
Kim Y, Park T. HisCoM-mimi: Software for Hierarchical Structural Component Analysis for miRNA-mRNA Integration Model for Binary Phenotypes. Genomics Inform 2019; 17:e10. [PMID: 30929411 PMCID: PMC6459173 DOI: 10.5808/gi.2019.17.1.e10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/12/2019] [Accepted: 03/11/2019] [Indexed: 11/20/2022]
Abstract
To identify miRNA-mRNA interaction pairs associated with binary phenotypes, we propose a hierarchical structural component model for miRNA-mRNA integration (HisCoM-mimi). Information on known mRNA targets provided by TargetScan is used to perform HisCoM-mimi. However, multiple databases can be used to find miRNA-mRNA signatures with known biological information through different algorithms. To take these additional databases into account, we present our advanced application software for HisCoM-mimi for binary phenotypes. The proposed HisCoM-mimi supports both TargetScan and miRTarBase, which provides manually verified information initially gathered by text-mining the literature. By integrating information from miRTarBase into HisCoM-mimi, a broad range of target information derived from the research literature can be analyzed. Another improvement of the new HisCoM-mimi approach is the inclusion of updated algorithms that provide the lasso and elastic-net penalties for users who want to fit a model with a smaller number of selected miRNAs and mRNAs. We expect that our HisCoM-mimi software will make advanced methods accessible to researchers who want to identify miRNA-mRNA interaction pairs related to binary phenotypes.
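For readers unfamiliar with the penalties named in this abstract, one widely used way to write the elastic-net penalty on a coefficient vector $\beta$ is shown below. The parametrization (a strength $\lambda \ge 0$ and a mixing weight $\alpha \in [0, 1]$) is an assumption for illustration; the abstract does not state how the HisCoM-mimi software parametrizes it.

```latex
P_{\lambda,\alpha}(\beta)
  = \lambda \left[ \alpha \lVert \beta \rVert_1
  + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right],
\qquad
\lVert \beta \rVert_1 = \sum_j \lvert \beta_j \rvert,
\quad
\lVert \beta \rVert_2^2 = \sum_j \beta_j^2 .
```

Setting $\alpha = 1$ gives the lasso penalty and $\alpha = 0$ gives a ridge penalty. The $\ell_1$ term can shrink coefficients exactly to zero, which is how such penalties yield a model with fewer selected miRNAs and mRNAs.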
Affiliation(s)
- Yongkang Kim
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
|