1
|
A Multi-Granularity Information-Based Method for Learning High-Dimensional Bayesian Network Structures. Cognit Comput 2021. [DOI: 10.1007/s12559-021-09891-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/29/2022]
|
2
|
Nguyen H, Tran D, Tran B, Pehlivan B, Nguyen T. A comprehensive survey of regulatory network inference methods using single cell RNA sequencing data. Brief Bioinform 2021; 22:bbaa190. [PMID: 34020546 PMCID: PMC8138892 DOI: 10.1093/bib/bbaa190] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 10/07/2019] [Revised: 06/19/2020] [Accepted: 07/24/2020] [Indexed: 12/13/2022]
Abstract
A gene regulatory network is a complicated set of interactions between genetic materials, which dictates how cells develop in living organisms and react to their surrounding environment. Robust comprehension of these interactions would help explain how cells function as well as predict their reactions to external factors. This knowledge can benefit both developmental biology and clinical research such as drug development or epidemiology research. Recently, the rapid advance of single-cell sequencing technologies, which pushed the limit of transcriptomic profiling to the individual cell level, has opened up an entirely new area for regulatory network research. To exploit this abundant new source of data and take advantage of single-cell resolution, a number of computational methods have been proposed to uncover the interactions hidden by the averaging process in standard bulk sequencing. In this article, we review 15 such network inference methods developed for single-cell data. We discuss their underlying assumptions, inference techniques, usability, and pros and cons. In an extensive analysis using simulation, we also assess the methods' performance, sensitivity to dropout and time complexity. The main objective of this survey is to assist not only life scientists in selecting suitable methods for their data and analysis purposes but also computational scientists in developing new methods, by highlighting outstanding challenges in the field that remain to be addressed in future development.
Affiliation(s)
- Hung Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557
- Duc Tran
- Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557
- Bang Tran
- Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557
- Bahadir Pehlivan
- Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557
- Tin Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557
|
3
|
Kontio JAJ, Rinta-Aho MJ, Sillanpää MJ. Estimating Linear and Nonlinear Gene Coexpression Networks by Semiparametric Neighborhood Selection. Genetics 2020; 215:597-607. [PMID: 32414870 PMCID: PMC7337083 DOI: 10.1534/genetics.120.303186] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/13/2020] [Accepted: 05/11/2020] [Indexed: 11/18/2022]
Abstract
Although nonlinear relationships between genes are acknowledged, only a few methods exist for estimating nonlinear gene coexpression networks or gene regulatory networks (GCNs/GRNs), and they share common deficiencies. These methods often consider only pairwise associations between genes and are therefore poorly capable of identifying higher-order regulatory patterns in which multiple genes should be considered simultaneously. Another critical issue in current nonlinear GCN/GRN estimation approaches is that they consider linear and nonlinear dependencies at the same time, nonparametrically and in confounded form. This severely undermines the possibility of finding nonlinear associations, since the power to detect nonlinear dependencies is lower than the power to detect linear dependencies, and the sparsity-inducing procedures might favor linear relationships over nonlinear ones merely because of small sample sizes. In this paper, we propose a method to estimate undirected nonlinear GCNs independently from the linear associations between genes, based on a novel semiparametric neighborhood selection procedure capable of identifying complex nonlinear associations between genes. Simulation studies using the common DREAM3 and DREAM9 datasets show that the proposed method compares favorably with current nonlinear GCN/GRN estimation methods.
Affiliation(s)
- Juho A J Kontio
- Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland
- Marko J Rinta-Aho
- Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland
- Mikko J Sillanpää
- Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland
- Infotech Oulu, University of Oulu, 90014, Finland
|
4
|
Jin Z, Kang J, Yu T. Missing value imputation for LC-MS metabolomics data by incorporating metabolic network and adduct ion relations. Bioinformatics 2019; 34:1555-1561. [PMID: 29272352 DOI: 10.1093/bioinformatics/btx816] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/05/2017] [Accepted: 12/19/2017] [Indexed: 12/20/2022]
Abstract
Motivation: Metabolomics data generated from liquid chromatography-mass spectrometry platforms often contain missing values. Existing imputation methods do not consider underlying feature relations and the metabolic network information. As a result, the imputation results may not be optimal. Results: We proposed an imputation algorithm that incorporates the existing metabolic network, adduct ion relations even for unknown compounds, as well as linear and nonlinear associations between feature intensities to build a feature-level network. The algorithm uses support vector regression for missing value imputation based on features in the neighborhood on the network. We compared our proposed method with widely used methods. As judged by the normalized root mean squared error in real data-based simulations, our proposed method can achieve better accuracy. Availability and implementation: The R package is available at http://web1.sph.emory.edu/users/tyu8/MINMA. Contact: jiankang@umich.edu or tianwei.yu@emory.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhuxuan Jin
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Jian Kang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Tianwei Yu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
|
5
|
Ji C, Lin S, Yao D, Li M, Chen W, Zheng S, Zhao Z. Identification of promising prognostic genes for relapsed acute lymphoblastic leukemia. Blood Cells Mol Dis 2019; 77:113-119. [PMID: 31030124 DOI: 10.1016/j.bcmd.2019.04.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 04/05/2019] [Accepted: 04/17/2019] [Indexed: 11/16/2022]
Abstract
PURPOSE: The present study aimed to identify the molecular mechanism of acute lymphoblastic leukemia (ALL) and to explore valuable prognostic biomarkers for relapsed ALL. METHODS: A gene expression dataset comprising 59 samples from ALL survivors without recurrence (good group) and 114 samples from ALL patients who died of recurrence (poor group) was downloaded from the TCGA database. The differentially expressed genes (DEGs) between the good and poor groups were identified, followed by pathway and functional enrichment analyses. Subsequently, a logistic regression model and survival analysis were performed. RESULTS: In total, 637 up-regulated and 578 down-regulated DEGs were revealed between the good and poor groups. These DEGs were mainly enriched in functions including transcription and pathways such as focal adhesion. Genes including alpha-protein kinase 1 (ALPK1), zinc finger protein 695 (ZNF695), actinin alpha 4 (ACTN4), calreticulin (CALR), and F-box and leucine rich repeat protein 5 (FBXL5) were outstanding in the survival analysis. CONCLUSION: Transcription and focal adhesion might play important roles in ALL progression. Furthermore, genes including ALPK1, ZNF695, ACTN4, CALR, and FBXL5 might be novel prognostic genes for relapsed ALL.
Affiliation(s)
- Chai Ji
- Child Health Care Department, Children's Hospital Zhejiang University School of Medicine, Hangzhou, Zhejiang 310003, China
- Shengliang Lin
- Child Health Care Department, Children's Hospital Zhejiang University School of Medicine, Hangzhou, Zhejiang 310003, China
- Dan Yao
- Child Health Care Department, Children's Hospital Zhejiang University School of Medicine, Hangzhou, Zhejiang 310003, China
- Mingyan Li
- Child Health Care Department, Children's Hospital Zhejiang University School of Medicine, Hangzhou, Zhejiang 310003, China
- Weijun Chen
- Child Health Care Department, Children's Hospital Zhejiang University School of Medicine, Hangzhou, Zhejiang 310003, China
- Shuangshuang Zheng
- Child Health Care Department, Children's Hospital Zhejiang University School of Medicine, Hangzhou, Zhejiang 310003, China
- Zhengyan Zhao
- Child Health Care Department, Children's Hospital Zhejiang University School of Medicine, Hangzhou, Zhejiang 310003, China
|
6
|
Yu T. Nonlinear variable selection with continuous outcome: A fully nonparametric incremental forward stagewise approach. Stat Anal Data Min 2018; 11:188-197. [DOI: 10.1002/sam.11381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Tianwei Yu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia
|
7
|
Li Y, Bai W, Zhang X. Identifying heterogeneous subtypes of gastric cancer and subtype-specific subpaths of microRNA-target pathways. Mol Med Rep 2017; 17:3583-3590. [PMID: 29286091 PMCID: PMC5802161 DOI: 10.3892/mmr.2017.8329] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/12/2016] [Accepted: 11/15/2017] [Indexed: 01/13/2023]
Abstract
The present study aimed to classify gastric cancer (GC) into subtypes and to screen the subtype-specific genes, their targeted microRNAs (miRNAs) and enriched pathways to explore the putative mechanism of each GC subtype. The GSE13861 data set was downloaded from the Gene Expression Omnibus and used to screen differentially expressed genes (DEGs) in GC samples based on the detection of imbalanced differential signal algorithm. The specific genes in each subtype were identified with the cut-off criterion of U>0.04, pathway enrichment analysis was performed and the subtype-specific subpaths of miRNA-target pathways were determined. A total of 1,263 DEGs were identified in the primary gastric adenocarcinoma (PGD) samples, which were subsequently divided into four subtypes according to hierarchical cluster analysis. Identification of the subpaths of each subtype indicated that the subpath related to subtype 1 was miRNA (miR)-202/calcium voltage-gated channel subunit α1 (CACNA1E)/type II diabetes mellitus. The nuclear factor-κB signaling pathway was the most significantly specific pathway and subpath identified for subtype 2, which was regulated by miR-338-targeted suppression of C-C motif chemokine ligand 21 (CCL21). For subtype 3, significantly related pathways included ubiquitin-mediated proteolysis and proteasome, and the important subpath was miR-146B/proteasome 26S subunit, non-ATPase 3 (PSMD3)/proteasome. Focal adhesion was the significant pathway indicated for subtype 4, and the subpaths were miR-34A/vinculin (VCL)/focal adhesion and miR-34C/VCL/focal adhesion. In addition, Helicobacter pylori infection was higher in GC subtype 1 than in other subtypes. Specific genes, such as CACNA1E, CCL21, PSMD3 and VCL, may be used as potential feature genes to identify different subtypes of GC, and their associated subpaths may partially explain the pathogenetic mechanism of each GC subtype.
Affiliation(s)
- Yuanhang Li
- Medical Department, Cancer Hospital of China Medical University, Shenyang, Liaoning 110042, P.R. China
- Weijun Bai
- Medical Department, Cancer Hospital of China Medical University, Shenyang, Liaoning 110042, P.R. China
- Xu Zhang
- Radiotherapy Department, Cancer Hospital of China Medical University, Shenyang, Liaoning 110042, P.R. China
|
8
|
Nonlinear Network Reconstruction from Gene Expression Data Using Marginal Dependencies Measured by DCOL. PLoS One 2016; 11:e0158247. [PMID: 27380516 PMCID: PMC4933395 DOI: 10.1371/journal.pone.0158247] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/10/2016] [Accepted: 06/13/2016] [Indexed: 12/29/2022]
Abstract
Reconstruction of networks from high-throughput expression data is an important tool to identify new regulatory relations. Given that nonlinear and complex relations exist between biological units, methods that can utilize nonlinear dependencies may yield insights that are not provided by methods using linear associations alone. We have previously developed a distance to measure predictive nonlinear relations, the Distance based on Conditional Ordered List (DCOL), which is sensitive and computationally efficient on large matrices. In this study, we explore the utility of DCOL in the reconstruction of networks, by combining it with local false discovery rate (lfdr)–based inference. We demonstrate in simulations that the new method named nlnet is effective in recovering hidden nonlinear modules. We also demonstrate its utility using a single cell RNA seq dataset. The method is available as an R package at https://cran.r-project.org/web/packages/nlnet.
|
9
|
K-Profiles: A Nonlinear Clustering Method for Pattern Detection in High Dimensional Data. Biomed Res Int 2015; 2015:918954. [PMID: 26339652 PMCID: PMC4538770 DOI: 10.1155/2015/918954] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/05/2014] [Accepted: 12/18/2014] [Indexed: 01/23/2023]
Abstract
With modern technologies such as microarray, deep sequencing, and liquid chromatography-mass spectrometry (LC-MS), it is possible to measure the expression levels of thousands of genes/proteins simultaneously to unravel important biological processes. A first step towards elucidating hidden patterns and understanding the massive data is the application of clustering techniques. Nonlinear relations, which have mostly gone unutilized in contrast to linear correlations, are prevalent in high-throughput data. In many cases, nonlinear relations can model the biological relationship more precisely and reflect critical patterns in the biological systems. Using the general dependency measure, Distance Based on Conditional Ordered List (DCOL), that we introduced before, we designed the nonlinear K-profiles clustering method, which can be seen as the nonlinear counterpart of the K-means clustering algorithm. The method has a built-in statistical testing procedure that ensures genes not belonging to any cluster do not impact the estimation of cluster profiles. Results from extensive simulation studies showed that K-profiles clustering not only outperformed the traditional linear K-means algorithm, but also performed significantly better than our previous General Dependency Hierarchical Clustering (GDHC) algorithm. We further analyzed a gene expression dataset, on which K-profiles clustering generated biologically meaningful results.
|
10
|
An F, Zhang Z, Xia M, Xing L. Subpath analysis of each subtype of head and neck cancer based on the regulatory relationship between miRNAs and biological pathways. Oncol Rep 2015; 34:1745-54. [DOI: 10.3892/or.2015.4150] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/09/2015] [Accepted: 05/25/2015] [Indexed: 11/06/2022]
|
11
|
Yin H, Hou X, Tao T, Lv X, Zhang L, Duan W. Neurite outgrowth resistance to rho kinase inhibitors in PC12 Adh cell. Cell Biol Int 2015; 39:563-76. [PMID: 25571866 DOI: 10.1002/cbin.10423] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 05/26/2014] [Accepted: 12/26/2014] [Indexed: 01/21/2023]
Abstract
Rho kinase (ROCK) inhibitors are promising agents for neural injury disorders, and their mechanism is associated with neurite outgrowth. However, neurite outgrowth resistance occurred when PC12 Adh cells were treated with ROCK inhibitors for a longer time. PC12 Adh cells were treated with the ROCK inhibitor Y27632 or nerve growth factor (NGF) for different durations. Neurite outgrowth resistance occurred when PC12 Adh cells were exposed to Y27632 (33 µM) for 3 or more days, but not when they were exposed to NGF (100 ng/mL). Gene expression in PC12 Adh cells treated with Y27632 (33 µM) or NGF (100 ng/mL) for 2 or 4 days was assayed by gene microarray, and the reliability of the results was confirmed by real-time RT-PCR. Cluster analysis showed that the gene expression profile of PC12 Adh cells treated with Y27632 for 4 days differed from that of cells treated with Y27632 for 2 days and from those of cells treated with NGF for 2 and 4 days. Pathway analysis suggested that the neurite outgrowth resistance could be associated with up-regulation of inflammatory pathways, especially rno04610 (complement and coagulation cascades), and down-regulation of cell cycle pathways, especially rno04110.
Affiliation(s)
- Hua Yin
- Key Laboratory of Molecular Biology for Sinomedicine, Yunnan University of Traditional Chinese Medicine, 1076, Yuhua Road, University City of Chenggong, Kunming, 650500, China
|