1
Wu Y, Qian B, Wang A, Dong H, Zhu E, Ma B. iLSGRN: inference of large-scale gene regulatory networks based on multi-model fusion. Bioinformatics 2023; 39:btad619. [PMID: 37851379 PMCID: PMC10589915 DOI: 10.1093/bioinformatics/btad619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/04/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023]
Abstract
MOTIVATION Gene regulatory networks (GRNs) describe the interactions between genes and help reveal the biological mechanisms at work in the cell. Reconstructing GRNs from gene expression data has been a central computational problem in systems biology. However, because large-scale GRNs are high-dimensional and non-linear, inferring them accurately and efficiently is still a challenging task. RESULTS In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. First, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model leveraging the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the non-linear ordinary differential equation model of GRNs and improve the accuracy and stability of the inference algorithm. Extensive experiments on datasets of different scales show that our method yields an appreciable improvement over state-of-the-art methods. Furthermore, we perform cross-validation experiments on real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION The proposed method is written in Python and is available at: https://github.com/lab319/iLSGRN.
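The abstract does not say how the XGBoost and RF importance vectors are fused. The sketch below assumes the simplest option, a weighted average of the normalised importances of candidate regulators for one target gene; the function names and the `weight` parameter are hypothetical. In practice the inputs would come from fitted models, for example the `feature_importances_` attribute of scikit-learn's `RandomForestRegressor` and of the scikit-learn wrapper for XGBoost.

```python
def fuse_importances(imp_a, imp_b, weight=0.5):
    """Combine two per-regulator importance vectors into one score.

    Each vector is normalised to sum to 1, then the two are mixed with a
    convex weight (hypothetical fusion rule, not necessarily iLSGRN's).
    """
    if len(imp_a) != len(imp_b):
        raise ValueError("importance vectors must have equal length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    def normalise(v):
        total = sum(v)
        return [x / total for x in v] if total > 0 else [0.0] * len(v)

    a, b = normalise(imp_a), normalise(imp_b)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


def rank_regulators(names, scores):
    """Return regulator names sorted from most to least important."""
    return [name for name, _ in sorted(zip(names, scores), key=lambda p: -p[1])]
```

Normalising first keeps the fusion well defined even when one importance source (for example, raw gain totals) is not already scaled to sum to one.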
Affiliation(s)
- Yiming Wu
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Bing Qian
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Anqi Wang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China
- Heng Dong
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Enqiang Zhu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
- Baoshan Ma
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
2
Peng H, Xu J, Liu K, Liu F, Zhang A, Zhang X. EIEPCF: accurate inference of functional gene regulatory networks by eliminating indirect effects from confounding factors. Brief Funct Genomics 2023:elad040. [PMID: 37642217 DOI: 10.1093/bfgp/elad040] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 08/14/2023] [Indexed: 08/31/2023]
Abstract
Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and it also provides an important foundation for breeding vegetable and fruit varieties that resist diseases and rot. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships between genes obtained by these methods are biased. Eliminating indirect effects in GRNs remains a significant challenge for researchers. In this work, we propose a novel approach for inferring functional GRNs, named EIEPCF (eliminating indirect effects produced by confounding factors), which eliminates indirect effects caused by confounding factors. The method removes the influence of confounding factors on regulatory factors and target genes by measuring the similarity between their residuals. Validation of EIEPCF on simulation studies, the gold-standard networks provided by the DREAM3 Challenge and the real gene networks of Escherichia coli demonstrates that it achieves significantly higher accuracy than other popular computational methods for inferring GRNs. As a case study, we used EIEPCF to reconstruct a cold-resistance-specific GRN from cold-resistance gene expression data in Arabidopsis thaliana. The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.
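The abstract does not give EIEPCF's residual model or similarity measure. As a minimal sketch, assuming a single known confounder and linear effects, comparing residuals reduces to a partial correlation: regress both the regulator and the target on the confounder, then correlate what is left. All function names here are hypothetical.

```python
import math
from statistics import fmean


def residuals(y, z):
    """Residuals of an ordinary least-squares fit of y on a single covariate z."""
    zm, ym = fmean(z), fmean(y)
    szz = sum((zi - zm) ** 2 for zi in z)
    slope = sum((zi - zm) * (yi - ym) for zi, yi in zip(z, y)) / szz
    intercept = ym - slope * zm
    return [yi - (intercept + slope * zi) for zi, yi in zip(z, y)]


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    am, bm = fmean(a), fmean(b)
    num = sum((x - am) * (y - bm) for x, y in zip(a, b))
    den = math.sqrt(sum((x - am) ** 2 for x in a) * sum((y - bm) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def residual_similarity(regulator, target, confounder):
    """Correlation between regulator and target after removing the confounder's linear effect."""
    return pearson(residuals(regulator, confounder), residuals(target, confounder))
```

For a regulator and a target that are both driven by the same confounder (for example `z = [1, 2, 3, 4, 5, 6]`, `x = [3, 3, 7, 7, 11, 11]`, `y = [4, 7, 8, 11, 16, 19]`), the raw correlation of `x` and `y` is about 0.94, while their residual similarity is essentially zero, so the apparent link would be treated as indirect.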
Affiliation(s)
- Huixiang Peng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Kangchen Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
- Fang Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
3
Nakulugamuwa Gamage H, Chetty M, Lim S, Hallinan J. MICFuzzy: A maximal information content based fuzzy approach for reconstructing genetic networks. PLoS One 2023; 18:e0288174. [PMID: 37418430 DOI: 10.1371/journal.pone.0288174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Accepted: 06/21/2023] [Indexed: 07/09/2023]
Abstract
In systems biology, the accurate reconstruction of Gene Regulatory Networks (GRNs) is crucial, since these networks can help solve complex biological problems. Among the many methods available for GRN reconstruction, those based on information theory and fuzzy concepts remain popular. However, most of these methods are not only complex, incurring a high computational burden, but may also produce many false positives, leading to inaccurate inferred networks. In this paper, we propose a novel hybrid fuzzy GRN inference model called MICFuzzy, which aggregates the effects of the Maximal Information Coefficient (MIC). The model has an information theory-based pre-processing stage whose output serves as the input to the novel fuzzy model. In this pre-processing stage, the MIC component filters relevant genes for each target gene, significantly reducing the computational burden of the fuzzy model when it selects regulatory genes from these filtered gene lists. The fuzzy model uses the regulatory effect of the identified activator-repressor gene pairs to determine target gene expression levels. This approach supports accurate network inference by recovering many true regulatory interactions while significantly reducing false regulatory predictions. The performance of MICFuzzy was evaluated using DREAM3 and DREAM4 challenge data and the real SOS gene expression dataset. MICFuzzy outperformed the other state-of-the-art methods in terms of F-score, Matthews Correlation Coefficient, Structural Accuracy, and SS_mean, and outperformed most of them in terms of efficiency. MICFuzzy was also more efficient than the classical fuzzy model, because its design reduces combinatorial computation.
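The pre-processing idea, keeping only the most strongly associated genes as candidate regulators of each target before the expensive fuzzy search, can be sketched as below. Computing MIC itself requires a dedicated implementation (such as the minepy package), so this sketch swaps in absolute Pearson correlation as a stand-in dependence score; `candidate_regulators`, `abs_pearson` and the cut-off `k` are hypothetical, and any MIC function with the same two-argument shape could be passed as `score`.

```python
import math
from statistics import fmean


def abs_pearson(a, b):
    """Absolute Pearson correlation; a linear stand-in for MIC in this sketch."""
    am, bm = fmean(a), fmean(b)
    num = sum((x - am) * (y - bm) for x, y in zip(a, b))
    den = math.sqrt(sum((x - am) ** 2 for x in a) * sum((y - bm) ** 2 for y in b))
    return abs(num / den) if den > 0 else 0.0


def candidate_regulators(expr, target, k, score=abs_pearson):
    """Keep the k genes whose dependence score with `target` is highest.

    expr maps gene name -> expression profile (same sample order for every gene).
    """
    others = [g for g in expr if g != target]
    ranked = sorted(others, key=lambda g: score(expr[g], expr[target]), reverse=True)
    return ranked[:k]
```

Only the k survivors per target would then enter the combinatorial activator-repressor search, which is where the reduction in computational burden comes from.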
Affiliation(s)
- Madhu Chetty
- Health Innovation and Transformation Centre, Federation University, Churchill, Victoria, Australia
- Suryani Lim
- Health Innovation and Transformation Centre, Federation University, Churchill, Victoria, Australia
4
Wu YH, Huang YA, Li JQ, You ZH, Hu PW, Hu L, Leung VCM, Du ZH. Knowledge graph embedding for profiling the interaction between transcription factors and their target genes. PLoS Comput Biol 2023; 19:e1011207. [PMID: 37339154 PMCID: PMC10313080 DOI: 10.1371/journal.pcbi.1011207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 06/30/2023] [Accepted: 05/23/2023] [Indexed: 06/22/2023]
Abstract
Interactions between transcription factors and their target genes form the main part of the human gene regulatory network, yet they remain a complicating factor in biological research. Specifically, for nearly half of the interactions recorded in established databases, the interaction type has yet to be confirmed. Although several computational methods exist to predict gene interactions and their types, no method is yet available that predicts them solely from topology information. To this end, we propose a graph-based prediction model called KGE-TGI, trained in a multi-task learning manner on a knowledge graph that we constructed specifically for this problem. The KGE-TGI model relies on topology information rather than being driven by gene expression data. In this paper, we formulate the task of predicting the interaction types of transcription factors and target genes as a multi-label classification problem for link types on a heterogeneous graph, coupled with a closely related link prediction problem. We constructed a ground-truth dataset as a benchmark and evaluated the proposed method on it. In 5-fold cross-validation experiments, the proposed method achieved average AUC values of 0.9654 and 0.9339 in the link prediction and link type classification tasks, respectively. In addition, the results of a series of comparison experiments show that introducing knowledge information significantly benefits prediction and that our methodology achieves state-of-the-art performance on this problem.
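The abstract does not name the embedding model behind KGE-TGI. To illustrate how a knowledge graph embedding can score typed links, the sketch below swaps in TransE, a classic and much simpler model in which each relation is a translation vector and a triple (head, relation, tail) is plausible when head + relation lies close to tail. The two-dimensional embeddings are hand-set toy values rather than learned ones, and the helper names are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def predict_relation(head, tail, relations):
    """Pick the relation type whose translation best maps head onto tail.

    relations maps a relation name (e.g. a regulation type) -> embedding vector.
    """
    return max(relations, key=lambda name: transe_score(head, relations[name], tail))


# Toy, hand-set embeddings for one transcription factor and one target gene.
tf, gene = [0.0, 0.0], [1.0, 0.0]
relation_embeddings = {"activation": [1.0, 0.0], "repression": [-1.0, 0.0]}
print(predict_relation(tf, gene, relation_embeddings))
```

Picking the single best relation is a simplification: KGE-TGI treats link types as a multi-label problem, so a fuller model would threshold each relation's score independently.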
Affiliation(s)
- Yang-Han Wu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
- Yu-An Huang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Jian-Qiang Li
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Peng-Wei Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Lun Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Victor C. M. Leung
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
- Zhi-Hua Du
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
5
Schiffthaler B, van Zalen E, Serrano AR, Street NR, Delhomme N. Seiðr: Efficient calculation of robust ensemble gene networks. Heliyon 2023; 9:e16811. [PMID: 37313140 PMCID: PMC10258422 DOI: 10.1016/j.heliyon.2023.e16811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Revised: 05/22/2023] [Accepted: 05/29/2023] [Indexed: 06/15/2023]
Abstract
Gene regulatory and gene co-expression networks are powerful research tools for identifying biological signals within high-dimensional gene expression data. In recent years, research has focused on addressing shortcomings of these techniques with regard to the low signal-to-noise ratio, non-linear interactions and dataset-dependent biases of published methods. Furthermore, it has been shown that aggregating networks from multiple methods improves results. Despite this, few usable and scalable software tools have been implemented to perform such best-practice analyses. Here, we present Seidr (stylized Seiðr), a software toolkit designed to assist scientists in gene regulatory and gene co-expression network inference. Seidr creates community networks to reduce algorithmic bias and uses noise-corrected network backboning to prune noisy edges. Using benchmarks in real-world conditions across three eukaryotic model organisms, Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana, we show that individual algorithms are biased toward functional evidence for certain gene-gene interactions. We further demonstrate that the community network is less biased, providing robust performance across different standards and comparisons for the model organisms. Finally, we apply Seidr to a drought-stress network in Norway spruce (Picea abies (L.) H. Karst.) as an example application in a non-model species. We demonstrate how a network inferred with Seidr can be used to identify key components and communities and to suggest functions for non-annotated genes.
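A community network combines the edge lists of several inference methods into one consensus. The sketch below uses plain mean-rank aggregation, which is only one of several possible schemes and not necessarily the one Seidr applies by default; the function names are hypothetical and every method is assumed to score the same set of edges.

```python
def to_ranks(scores):
    """Map each edge to its rank (1 = strongest score); tied edges share the mean rank."""
    ordered = sorted(scores, key=lambda edge: -scores[edge])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the block of edges tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for edge in ordered[i:j + 1]:
            ranks[edge] = shared
        i = j + 1
    return ranks


def community_network(method_scores):
    """Average each edge's rank across methods; a lower mean rank means stronger consensus."""
    edges = set(method_scores[0])
    if any(set(scores) != edges for scores in method_scores):
        raise ValueError("every method must score the same set of edges")
    rank_tables = [to_ranks(scores) for scores in method_scores]
    mean_rank = {e: sum(t[e] for t in rank_tables) / len(rank_tables) for e in edges}
    return sorted(edges, key=lambda e: mean_rank[e]), mean_rank
```

Working with ranks rather than raw scores matters because different methods report edge weights on incomparable scales (correlations, importances, mutual information), and ranks put them on a common footing before averaging.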
Affiliation(s)
- Bastian Schiffthaler
- Department of Plant Physiology, Umea Plant Science Center, Umea University, Umea, Sweden
- Elena van Zalen
- Department of Plant Physiology, Umea Plant Science Center, Umea University, Umea, Sweden
- Alonso R. Serrano
- Department of Plant Physiology, Umea Plant Science Center, Swedish University of Agricultural Sciences, Umea, Sweden
- Nathaniel R. Street
- Department of Plant Physiology, Umea Plant Science Center, Umea University, Umea, Sweden
- Nicolas Delhomme
- Department of Plant Physiology, Umea Plant Science Center, Swedish University of Agricultural Sciences, Umea, Sweden
6
Jiang X, Liu K, Peng H, Fang J, Zhang A, Han Y, Zhang X. Comparative network analysis reveals the dynamics of organic acid diversity during fruit ripening in peach (Prunus persica L. Batsch). BMC Plant Biol 2023; 23:16. [PMID: 36617558 PMCID: PMC9827700 DOI: 10.1186/s12870-023-04037-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/12/2022] [Accepted: 01/02/2023] [Indexed: 06/17/2023]
Abstract
BACKGROUND Organic acids are important components that determine the fruit flavor of peach (Prunus persica L. Batsch). However, the dynamics of organic acid diversity during fruit ripening and the key genes that modulate organic acid metabolism remain largely unknown in this fruit tree, whose yield ranks sixth in the world. RESULTS In this study, we used 3D transcriptome data containing three dimensions of information, namely time, phenotype and gene expression, from 5 different peach varieties to construct gene co-expression networks throughout peach fruit ripening. With the inferred networks, a time-ordered comparative network analysis was performed to select a high-acid-specific gene co-expression network and then clarify the regulatory factors controlling organic acid accumulation. As a result, network modules related to organic acid synthesis and metabolism under high-acid versus low-acid comparison conditions were identified for further study. In addition, we obtained 20 candidate genes as regulatory factors related to organic acid metabolism in peach. CONCLUSIONS The study provides new insights into the dynamics of organic acid accumulation during fruit ripening, complements the results of classical co-expression network analysis and establishes a foundation for discovering key genes from time-series transcriptome data of multiple varieties.
Affiliation(s)
- Xiaohan Jiang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Kangchen Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Huixiang Peng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Jing Fang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Yuepeng Han
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
- Center of Economic Botany, Core Botanical Gardens, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, Hubei, China
7
Ye Q, Guo NL. Inferencing Bulk Tumor and Single-Cell Multi-Omics Regulatory Networks for Discovery of Biomarkers and Therapeutic Targets. Cells 2022; 12:cells12010101. [PMID: 36611894 PMCID: PMC9818242 DOI: 10.3390/cells12010101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 12/22/2022] [Accepted: 12/24/2022] [Indexed: 12/28/2022]
Abstract
Current cancer treatment lacks sufficiently accurate biomarkers and effective therapeutic targets. Multi-omics regulatory networks in patient bulk tumors and single cells can shed light on molecular disease mechanisms. Integrating multi-omics data with large-scale patient electronic medical records (EMRs) can lead to the discovery of biomarkers and therapeutic targets. In this review, multi-omics data harmonization methods are introduced, and common approaches to molecular network inference are summarized. Our Prediction Logic Boolean Implication Networks (PLBINs) have advantages over other methods in constructing genome-scale multi-omics networks in bulk tumors and single cells in terms of computational efficiency, scalability, and accuracy. Based on the constructed multi-modal regulatory networks, graph theory network centrality metrics can be used to prioritize candidate biomarkers and therapeutic targets. Our approach, which integrates multi-omics profiles in a patient cohort with large-scale patient EMRs such as the SEER-Medicare cancer registry and combines this with extensive external validation, can identify potential biomarkers applicable to large patient populations. These methodologies form a conceptually innovative framework for analyzing the information available from research laboratories and healthcare systems, accelerating the discovery of biomarkers and therapeutic targets to ultimately improve cancer patient survival outcomes.
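The review mentions graph-theory centrality metrics without fixing a particular one. Degree centrality, the fraction of the other genes in the network that a gene is directly connected to, is the simplest of them; the sketch below ranks genes by it. It treats the network as undirected, uses hypothetical function and gene names, and does not attempt the PLBIN construction itself.

```python
from collections import Counter


def degree_centrality(edges):
    """Normalised degree centrality for an undirected network.

    edges is an iterable of (gene_a, gene_b) pairs without duplicates;
    self-loops are ignored.
    """
    degree = Counter()
    for a, b in edges:
        if a != b:
            degree[a] += 1
            degree[b] += 1
    n = len(degree)
    if n < 2:
        return {node: 0.0 for node in degree}
    return {node: d / (n - 1) for node, d in degree.items()}


def prioritise(edges, k):
    """Return the k most central genes (ties broken alphabetically) as candidates."""
    centrality = degree_centrality(edges)
    return sorted(centrality, key=lambda node: (-centrality[node], node))[:k]
```

Betweenness or eigenvector centrality would slot into the same `prioritise` pattern; degree is shown only because it needs no graph library.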
Affiliation(s)
- Qing Ye
- West Virginia University Cancer Institute, Morgantown, WV 26506, USA
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA
- Nancy Lan Guo
- West Virginia University Cancer Institute, Morgantown, WV 26506, USA
- Department of Occupational and Environmental Health Sciences, School of Public Health, West Virginia University, Morgantown, WV 26506, USA
- Correspondence: ; Tel.: +1-304-293-6455
8
Jia Z, Zhang X. Accurate determination of causalities in gene regulatory networks by dissecting downstream target genes. Front Genet 2022; 13:923339. [PMID: 36568360 PMCID: PMC9768335 DOI: 10.3389/fgene.2022.923339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 11/08/2022] [Indexed: 12/12/2022]
Abstract
Accurate determination of causalities between genes is a challenge in the inference of gene regulatory networks (GRNs) from gene expression profiles. Although many methods have been developed for the reconstruction of GRNs, most of them are insufficient for determining causalities or regulatory directions. In this work, we present a novel method, DDTG, to improve the accuracy of causality determination in GRN inference by dissecting downstream target genes. In the proposed method, the topology and hierarchy of GRNs are determined by mutual information and conditional mutual information, and the regulatory directions of GRNs are determined by Taylor formula-based regression. In addition, indirect interactions are removed by exploiting the sparseness of the network topology to improve the accuracy of network inference. The method is validated on the benchmark GRNs from the DREAM3 and DREAM4 challenges. The results demonstrate the superior performance of DDTG in causality determination compared with several popular GRN inference methods. This work provides a useful tool for inferring causal gene regulatory networks.
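DDTG uses mutual information (MI) to find the topology and conditional mutual information (CMI) to expose indirect links. A minimal sketch of both quantities follows, assuming expression values have already been discretized into bins; the abstract does not state DDTG's discretization or estimator, so simple plug-in entropies are used here, and the function names are hypothetical. The Taylor-formula regression used to orient edges is not reproduced.

```python
import math
from collections import Counter


def entropy(*columns):
    """Plug-in joint entropy (in nats) of one or more discrete sequences."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum(c / n * math.log(c / n) for c in joint.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(z) - entropy(x, y, z)
```

For example, with `x = [0, 0, 1, 1, 0, 1, 0, 1]` and `y = z = x` (a chain in which Y merely copies Z, which copies X), `mutual_information(x, y)` equals ln 2, while `conditional_mutual_information(x, y, z)` is zero: conditioning on the intermediate gene removes the apparent X-Y dependence, which is the signal an indirect-interaction filter looks for.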
Affiliation(s)
- Zhigang Jia
- School of Mathematics and Statistics, Xinyang Normal University, Xinyang, China
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China
- *Correspondence: Xiujun Zhang,
9
Lei J, Cai Z, He X, Zheng W, Liu J. An approach of gene regulatory network construction using mixed entropy optimizing context-related likelihood mutual information. Bioinformatics 2022; 39:btac717. [PMID: 36342190 PMCID: PMC9805593 DOI: 10.1093/bioinformatics/btac717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 09/18/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022]
Abstract
MOTIVATION The question of how to construct gene regulatory networks has long been a focus of biological research. Mutual information can measure nonlinear relationships, and it has been widely used in the construction of gene regulatory networks. However, it cannot measure indirect regulatory relationships under the influence of multiple genes, which reduces the accuracy of inferred gene regulatory networks. APPROACH This work proposes a method for constructing gene regulatory networks based on mixed entropy optimizing context-related likelihood mutual information (MEOMI). First, two entropy estimators were combined to calculate the mutual information between genes. Then, distribution optimization was performed using a context-related likelihood algorithm to eliminate some indirect regulatory relationships and obtain the initial gene regulatory network. To capture the complex interactions between genes and eliminate redundant edges, the initial network was further optimized by calculating the conditional mutual inclusive information (CMI2) between gene pairs under the influence of multiple genes. The network was updated iteratively to reduce the overestimation of direct regulatory intensity by mutual information. RESULTS The experimental results show that MEOMI performed better than several other gene network construction methods on DREAM challenge simulated datasets (DREAM3 and DREAM5), three real Escherichia coli datasets (E.coli SOS pathway network, E.coli SOS DNA repair network and E.coli community network) and two human datasets. AVAILABILITY AND IMPLEMENTATION Source code and datasets are available at https://github.com/Dalei-Dalei/MEOMI/ and http://122.205.95.139/MEOMI/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
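The abstract calls the second step a context-related likelihood algorithm. Assuming it follows the widely used context likelihood of relatedness (CLR) scheme, each MI value is turned into a z-score against the background MI distribution of each of its two genes, negative z-scores are clipped to zero, and the two z-scores are combined; an edge that is only high relative to a uniformly high background is thereby down-weighted. The sketch takes an already computed symmetric MI matrix; MEOMI's mixed-entropy estimator and its CMI2 refinement are not reproduced, and the function name is hypothetical.

```python
import math
from statistics import fmean, pstdev


def clr(mi):
    """CLR-style rescoring of a symmetric MI matrix given as a list of lists."""
    n = len(mi)
    z = []
    for i in range(n):
        # Background distribution for gene i: its MI with every other gene.
        row = [mi[i][k] for k in range(n) if k != i]
        mu, sd = fmean(row), pstdev(row)
        z.append([max(0.0, (mi[i][j] - mu) / sd) if sd > 0 else 0.0 for j in range(n)])
    # Combine the two genes' z-scores; the diagonal is set to zero.
    return [
        [0.0 if i == j else math.sqrt(z[i][j] ** 2 + z[j][i] ** 2) for j in range(n)]
        for i in range(n)
    ]
```

For the toy matrix `[[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.1, 0.1, 0.0]]`, the strong pair scores about 1.414 (that is, the square root of 2), while both weak pairs drop to zero.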
Affiliation(s)
- Jimeng Lei
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zongheng Cai
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xinyi He
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wanting Zheng
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
10
Abstract
Maximal information coefficient (MIC) explores the associations between pairs of variables in complex relationships. It estimates the correlation through optimized partitions of the axes. However, when a relationship is affected by certain kinds of noise, MIC may overestimate the correlation value, which can cause a noisy relationship to be misidentified as a noiseless one. In this article, a novel method, the weighted information coefficient mean (WICM), is proposed to detect unbiased associations in large data sets. First, we mathematically analyze why a noisy relationship can receive an abnormal correlation value. Then, WICM is presented in two core steps: the first detects potential overestimation among relationships with high values, and the second rectifies the overestimation by calculating the information coefficient mean instead of just selecting the maximum element of the characteristic matrix. Finally, experiments on functional relationships and real-world data relationships show that WICM resolves the overestimation feasibly and effectively.
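The abstract specifies neither the weights in WICM nor the rule that flags a suspicious maximum. The sketch below therefore only contrasts the two summaries of an MIC characteristic matrix that the abstract describes: the maximum entry, which MIC reports, and the mean of the entries, which WICM uses when overestimation is suspected. The fixed threshold `high` and the unweighted mean are hypothetical stand-ins for WICM's actual detection rule and weighting.

```python
from statistics import fmean


def mic_from_characteristic(matrix):
    """MIC is the largest entry of the characteristic matrix (rows may differ in length)."""
    return max(max(row) for row in matrix)


def mean_corrected_score(matrix, high=0.9):
    """Hypothetical correction: keep the maximum unless it reaches `high`,
    in which case report the mean of all entries instead."""
    peak = mic_from_characteristic(matrix)
    if peak < high:
        return peak
    return fmean(value for row in matrix for value in row)
```

A single spuriously perfect cell, as in `[[1.0, 0.2], [0.3]]`, yields an MIC of 1.0 but a corrected score of 0.5, which is the kind of deflation of a noise-driven maximum the abstract describes.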
Affiliation(s)
- Chuanlu Liu
- Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Shuliang Wang
- Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Institute of E-Government, Beijing Institute of Technology, Beijing, China
- Hanning Yuan
- Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Xiaojia Liu
- Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
11
Jiang X, Zhang X. RSNET: inferring gene regulatory networks by a redundancy silencing and network enhancement technique. BMC Bioinformatics 2022; 23:165. [PMID: 35524190 PMCID: PMC9074326 DOI: 10.1186/s12859-022-04696-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2021] [Accepted: 04/25/2022] [Indexed: 11/29/2022]
Abstract
Background Current gene regulatory network (GRN) inference methods are notorious for the large number of indirect interactions hidden in their predictions. Separating indirect interactions from direct ones remains an important challenge in the reconstruction of GRNs. To address this issue, we developed a redundancy silencing and network enhancement technique (RSNET) for inferring GRNs. Results To assess the performance of the RSNET method, we performed experiments on several gold-standard networks using a simulation study, the DREAM challenge dataset and an Escherichia coli network. The results show that RSNET performed better than the compared methods in sensitivity and accuracy. As a case study, we used RSNET to construct a functional GRN for apple fruit ripening from gene expression data. Conclusions In the proposed method, redundant interactions, including weak and indirect connections, are silenced adaptively by recursive optimization, and highly dependent nodes are constrained in the model to keep the real interactions. This study provides a useful tool for inferring clean networks. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04696-w.
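RSNET's recursive redundancy-silencing optimization is not spelled out in the abstract. To illustrate the underlying goal of removing indirect edges, the sketch swaps in a different and much simpler technique, the data processing inequality (DPI) step from ARACNE: in any fully connected triplet of genes, the weakest edge is assumed to be indirect and is dropped. The function name, the `tolerance` parameter and the toy weights are hypothetical.

```python
from itertools import combinations


def dpi_prune(weights, tolerance=0.0):
    """ARACNE-style data processing inequality pruning.

    weights maps frozenset({gene_a, gene_b}) -> association strength. In every
    fully connected triplet, the weakest edge is treated as indirect and removed
    if it falls below the other two (scaled by 1 - tolerance). All triplets are
    judged on the original weights, so the result does not depend on visit order.
    """
    genes = sorted({g for edge in weights for g in edge})
    removed = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if not all(edge in weights for edge in triangle):
            continue
        weakest = min(triangle, key=lambda edge: weights[edge])
        others = [weights[edge] for edge in triangle if edge != weakest]
        if weights[weakest] < min(others) * (1 - tolerance):
            removed.add(weakest)
    return {edge: w for edge, w in weights.items() if edge not in removed}
```

With edges A-B (0.8), B-C (0.7), A-C (0.3) and C-D (0.5), only the A-B-C triangle is complete, so A-C is removed as the likely indirect path through B, and the other three edges survive.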
Affiliation(s)
- Xiaohan Jiang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China
12
Yadav AK, Shukla R, Singh TR. Topological parameters, patterns, and motifs in biological networks. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00012-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
13
Wang T, Zhang X. Genome-wide dynamic network analysis reveals the potential genes for MeJA-induced growth-to-defense transition. BMC Plant Biol 2021; 21:450. [PMID: 34615468 PMCID: PMC8493714 DOI: 10.1186/s12870-021-03185-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/05/2021] [Accepted: 08/23/2021] [Indexed: 05/13/2023]
Abstract
BACKGROUND Methyl jasmonate (MeJA), a lipid-derived stress hormone, mediates plant resistance to biotic and abiotic stress. Understanding MeJA-induced plant defense provides insight into how plants respond to environmental stimuli. RESULT In this work, a dynamic network analysis method was used to quantitatively identify the tipping point of the growth-to-defense transition and detect the associated genes. Based on dense time-series RNA-seq data of MeJA-treated Arabidopsis thaliana, 146 genes were detected as dynamic network biomarker (DNB) members and the critical defense transition was identified. GO functional analysis showed that these DNB genes were significantly enriched in defense terms. Network analysis of the DNB genes and differentially expressed genes showed that hub genes, including SYP121, SYP122, WRKY33 and MPK11, play a vital role in the plant growth-to-defense transition. CONCLUSIONS This dynamic network analysis of MeJA-induced plant resistance provides an important guideline for understanding the growth-to-defense transition in plants' responses to environmental stimuli. This study also provides a database of the key genes of plant defense induced by MeJA.
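The abstract does not state how DNB members are scored. DNB studies commonly use the composite index I = SD_d * PCC_d / PCC_o: the average standard deviation of a candidate gene group, times the average absolute correlation within the group, divided by the average absolute correlation between the group and all other genes. The index is expected to rise sharply near a tipping point. The sketch below assumes that standard form and a samples x genes matrix for a single time point; the function name is hypothetical, and whether this paper uses exactly this index is an assumption.

```python
import numpy as np


def dnb_composite_index(X, members):
    """Composite DNB index I = SD_d * PCC_d / PCC_o for one time point (assumed form).

    X       : samples x genes expression matrix
    members : column indices of the candidate DNB genes (at least two)
    """
    X = np.asarray(X, dtype=float)
    members = list(members)
    member_set = set(members)
    others = [j for j in range(X.shape[1]) if j not in member_set]
    corr = np.abs(np.corrcoef(X, rowvar=False))  # |Pearson r| between genes

    # SD_d: average standard deviation of the candidate genes
    sd_d = X[:, members].std(axis=0, ddof=1).mean()
    # PCC_d: average |r| over distinct pairs of candidate genes (diagonal excluded)
    inner = corr[np.ix_(members, members)]
    m = len(members)
    pcc_d = (inner.sum() - np.trace(inner)) / (m * (m - 1))
    # PCC_o: average |r| between candidate genes and all remaining genes
    pcc_o = corr[np.ix_(members, others)].mean()
    return sd_d * pcc_d / pcc_o
```

Computing the index at every time point and looking for a sharp peak is one way to locate a tipping point; how the candidate group is chosen (clustering, cut-offs) is omitted and would be a further assumption.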
Affiliation(s)
- Tengfei Wang
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, 430074, Wuhan, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, 430074, Wuhan, China
- University of Chinese Academy of Sciences, 100049, Beijing, China
- Xiujun Zhang
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, 430074, Wuhan, China.
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, 430074, Wuhan, China.
14
Liu W, Jiang Y, Peng L, Sun X, Gan W, Zhao Q, Tang H. Inferring Gene Regulatory Networks Using the Improved Markov Blanket Discovery Algorithm. Interdiscip Sci 2021; 14:168-181. [PMID: 34495484 DOI: 10.1007/s12539-021-00478-9] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 05/07/2021] [Revised: 08/22/2021] [Accepted: 08/24/2021] [Indexed: 11/26/2022]
Abstract
Inferring gene regulatory networks (GRNs) from microarray data can help us understand the mechanisms of life and eventually develop effective therapies. Many computational methods are currently used to infer GRNs. However, owing to high-dimensional data and small sample sizes, these methods often introduce redundant regulatory relationships. Therefore, a novel network inference method based on an improved Markov blanket discovery algorithm, IMBDANET, is proposed to infer GRNs. Specifically, for each target gene, the data processing inequality was applied within the Markov blanket discovery algorithm to accurately distinguish direct regulatory genes from indirect ones. Finally, the direct regulatory genes were used to construct GRNs, and the network structure was optimized according to an importance degree score. Experimental results on six public network datasets show that the proposed method can be used effectively to infer GRNs.
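IMBDANET embeds the data processing inequality (DPI) in a Markov blanket search that the abstract does not detail. As a stand-alone illustration of the DPI itself, the sketch below applies the familiar ARACNE-style triangle rule: in every fully connected gene triplet, the edge with the smallest mutual information (MI) is treated as indirect and dropped if it is strictly weaker than the other two. The input format (a dict keyed by frozenset gene pairs) and the `tolerance` parameter are assumptions; this is not IMBDANET's algorithm.

```python
from itertools import combinations


def dpi_prune(mi, tolerance=0.0):
    """Remove likely indirect edges with the data processing inequality.

    mi        : dict mapping frozenset({gene_a, gene_b}) -> mutual information
    tolerance : fraction by which the weakest edge must fall below the other two
    """
    genes = sorted({g for edge in mi for g in edge})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(e in mi for e in edges):
            continue  # the DPI only applies to closed triangles
        weakest = min(edges, key=mi.__getitem__)
        others = [mi[e] for e in edges if e != weakest]
        if mi[weakest] < min(others) * (1.0 - tolerance):
            to_remove.add(weakest)
    # Removals are collected first and applied together, so the result
    # does not depend on the order in which triplets are visited.
    return {e: v for e, v in mi.items() if e not in to_remove}
```

For example, with MI(A,B) = 0.9, MI(B,C) = 0.8 and MI(A,C) = 0.3, the A-C edge is explained as indirect (A to B to C) and removed, while an edge outside any triangle is left alone.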
Affiliation(s)
- Wei Liu
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
- Yi Jiang
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Li Peng
- School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
- Xingen Sun
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Wenqing Gan
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China.
- Huanrong Tang
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China.
15
Zhang Q, Wang D, Han K, Huang DS. Predicting TF-DNA Binding Motifs from ChIP-seq Datasets Using the Bag-Based Classifier Combined With a Multi-Fold Learning Scheme. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1743-1751. [PMID: 32946398 DOI: 10.1109/tcbb.2020.3025007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/11/2023]
Abstract
The rapid development of high-throughput sequencing technology provides unique opportunities for studying transcription factor binding sites, but also brings new computational challenges. Recently, a series of discriminative motif discovery (DMD) methods have been proposed that offer promising solutions to these challenges. However, because of the huge computational cost, most of them have to adopt approximate schemes that either sacrifice the accuracy of motif representation or tune motif parameters indirectly. In this paper, we propose a bag-based classifier combined with a multi-fold learning scheme (BCMF) to discover motifs from ChIP-seq datasets. First, BCMF naturally formulates input sequences as labeled bags. Then, a bag-based classifier, combined with a bag feature extraction strategy, is applied to construct the objective function, and a multi-fold learning scheme is used to solve it. Compared with existing DMD tools, BCMF features three improvements: 1) it learns the position weight matrix (PWM) directly in a continuous space; 2) it represents a positive bag with a feature fused from its k "most positive" patterns; and 3) it applies a more advanced learning scheme. Experimental results on 134 ChIP-seq datasets show that BCMF substantially outperforms existing DMD methods (including DREME, HOMER, XXmotif, motifRG, EDCOD and our previous work).
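The abstract mentions representing a positive bag by a feature fused from its k "most positive" patterns. A minimal sketch, assuming each sequence is one bag whose instances are all of its length-w windows, that each window is scored with a 4 x w log-odds PWM (rows A, C, G, T), and that the fusion is a plain average of the k best window scores. BCMF's actual fusion and multi-fold learning scheme are not reproduced; `window_scores` and `bag_feature` are hypothetical helpers.

```python
import numpy as np

BASES = "ACGT"  # row order assumed for the PWM


def window_scores(seq, pwm):
    """Score every length-w window of seq with a 4 x w log-odds PWM."""
    w = pwm.shape[1]
    idx = [BASES.index(base) for base in seq.upper()]
    return np.array([
        sum(pwm[idx[start + j], j] for j in range(w))
        for start in range(len(idx) - w + 1)
    ])


def bag_feature(seq, pwm, k=2):
    """Fuse the k highest-scoring windows ("most positive" patterns) by averaging."""
    scores = np.sort(window_scores(seq, pwm))[::-1]
    return float(scores[:k].mean())
```

With a toy PWM that gives +1 for each position of the motif ACG and -1 otherwise, the sequence TTACGTT has one window scoring 3 and four scoring -3, so k = 1 gives 3.0 and k = 2 gives 0.0. Averaging several top windows makes the bag feature less sensitive to a single lucky match than taking only the maximum.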
16
Towle-Miller LM, Miecznikowski JC, Zhang F, Tritchler DL. SuMO-Fil: Supervised multi-omic filtering prior to performing network analysis. PLoS One 2021; 16:e0255579. [PMID: 34343218 PMCID: PMC8330944 DOI: 10.1371/journal.pone.0255579] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/08/2020] [Accepted: 07/20/2021] [Indexed: 11/18/2022] Open
Abstract
Multi-omic analyses that integrate many high-dimensional datasets often suffer from substantially reduced statistical power and require time-consuming computations. To remedy these issues, we present SuMO-Fil, a pre-processing method for Supervised Multi-Omic Filtering that removes variables or features considered to be irrelevant noise. SuMO-Fil is intended to be run before downstream analyses that detect supervised gene networks in sparse settings. We accomplish this by implementing variable filters based on low similarity across the datasets in conjunction with low similarity with the outcome. This approach can improve accuracy and reduce run times for a variety of computationally expensive downstream analyses, for example when the downstream analysis includes sparse canonical correlation analysis. Filtering methods specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. Under a variety of signal settings, SuMO-Fil performs favorably compared with popular filtering techniques based on low means or low variances, eliminating non-network features while maintaining important biological signal. We show that methods such as supervised sparse canonical correlation become faster and more accurate after SuMO-Fil is applied, greatly improving the scalability of these approaches.
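The filtering idea can be sketched for two datasets measured on the same samples. The sketch assumes absolute Pearson correlation as the similarity measure and illustrative cut-offs of 0.3, and it reads "low similarity across the datasets in conjunction with low similarity with the outcome" as: drop a feature only when both similarities are low. These choices and the function names are assumptions, not SuMO-Fil's published filters.

```python
import numpy as np


def abs_corr(A, B):
    """|Pearson r| between every column of A and every column of B."""
    A = (A - A.mean(axis=0)) / A.std(axis=0)
    B = (B - B.mean(axis=0)) / B.std(axis=0)
    return np.abs(A.T @ B) / A.shape[0]


def filter_features(X, Y, outcome, cross_cut=0.3, outcome_cut=0.3):
    """Return indices of the columns of X to keep.

    A column is dropped only when it is weakly correlated with every column
    of the other dataset Y AND weakly correlated with the outcome.
    """
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    y = np.asarray(outcome, float).reshape(-1, 1)
    cross = abs_corr(X, Y).max(axis=1)     # best match in the other dataset
    to_outcome = abs_corr(X, y)[:, 0]      # similarity with the outcome
    keep = (cross >= cross_cut) | (to_outcome >= outcome_cut)
    return np.flatnonzero(keep)
```

Running the same filter with the roles of X and Y swapped would prune the second dataset before a downstream method such as sparse canonical correlation analysis.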
Affiliation(s)
- Lorin M. Towle-Miller
- Department of Biostatistics, University at Buffalo, Buffalo, NY, United States of America
- Fan Zhang
- Department of Biostatistics, University at Buffalo, Buffalo, NY, United States of America
- David L. Tritchler
- Department of Biostatistics, University at Buffalo, Buffalo, NY, United States of America
- Biostatistics Division, University of Toronto, Toronto, Ontario, Canada
17
Tian Y, Su X, Su Y, Zhang X. EMODMI: A Multi-Objective Optimization Based Method to Identify Disease Modules. IEEE Transactions on Emerging Topics in Computational Intelligence 2021. [DOI: 10.1109/tetci.2020.3014923] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/09/2022]
18
Pyne S, Anand A. Rapid Reconstruction of Time-varying Gene Regulatory Networks with Limited Main Memory. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1608-1619. [PMID: 31613774 DOI: 10.1109/tcbb.2019.2946826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Reconstructing the time-varying gene regulatory networks underlying time-series gene expression data is a fundamental challenge in computational systems biology. The challenge grows many-fold when networks must be constructed for hundreds to thousands of genes. There have been constant efforts to design an algorithm that performs the reconstruction correctly and also scales efficiently (with respect to both time and memory) to such a large number of genes. However, existing algorithms either do not offer time efficiency, or offer it at other costs: memory inefficiency or the imposition of a constraint known as the 'smoothly time-varying assumption'. In this article, two novel algorithms, 'an algorithm for reconstructing Time-varying Gene regulatory networks with Shortlisted candidate regulators - which is Light on memory' (TGS-Lite) and 'TGS-Lite Plus' (TGS-Lite+), are proposed; both are time-efficient and memory-efficient and do not impose the smoothly time-varying assumption. Additionally, they offer state-of-the-art reconstruction correctness, as demonstrated on three benchmark datasets. Source Code: https://github.com/sap01/TGS-Lite-supplem/tree/master/sourcecode.
19
Zhou Y, Wang S, Yan H, Pang B, Zhang X, Pang L, Wang Y, Xu J, Hu J, Lan Y, Ping Y. Identifying Key Somatic Copy Number Alterations Driving Dysregulation of Cancer Hallmarks in Lower-Grade Glioma. Front Genet 2021; 12:654736. [PMID: 34163522 PMCID: PMC8215700 DOI: 10.3389/fgene.2021.654736] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/17/2021] [Accepted: 04/26/2021] [Indexed: 01/17/2023] Open
Abstract
Somatic copy-number alterations (SCNAs) are pervasive and highly heterogeneous in human cancers and are major contributors to cancer development. However, the driver roles of SCNAs in cancer are insufficiently characterized. We combined network propagation and linear regression models into an integrative strategy that identifies driver SCNAs and dissects their functional roles by integrating copy number and gene expression profiles in lower-grade glioma (LGG). Applying this strategy to 511 LGG patients, we identified 98 driver genes that dysregulated 29 cancer hallmark signatures, forming 143 active gene-hallmark pairs. These active gene-hallmark pairs stratified LGG patients into four subtypes with significantly different survival times. The two new subtypes, which shared similarly poor prognoses (the poorest of the four), were driven by two different gene sets: one including EGFR, CDKN2A, CDKN2B, IFNA8 and IFNA5, and the other including CDK4, AVIL and DTX3. The SCNAs of the two gene sets could disorder the same cancer hallmark signatures (including E2F_TARGETS and G2M_CHECKPOINT) in a mutually exclusive manner. Compared with previous methods, our strategy could not only capture known cancer genes and directly dissect the functional roles of their SCNAs in LGG, but also discover the functions of new driver genes in LGG, such as IFNA5, IFNA8 and DTX3. Additionally, our method can be applied to a variety of cancer types to explore the pathogenesis of driver SCNAs and improve the treatment and diagnosis of cancer.
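The abstract does not say which propagation scheme was combined with the regression models. Random walk with restart (RWR) is one common form of network propagation: scores diffuse from seed genes (for example, genes carrying an SCNA) across the network while a fraction `restart` of the mass returns to the seeds at every step. The sketch below is a generic RWR under that assumption, using a dense symmetric adjacency matrix; the function name and parameter values are illustrative.

```python
import numpy as np


def random_walk_with_restart(A, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate seed scores over a network given by adjacency matrix A.

    Iterates p <- (1 - restart) * W p + restart * p0 until the L1 change is below tol,
    where W is A with each column normalised to sum to 1.
    """
    A = np.asarray(A, dtype=float)
    col_sums = A.sum(axis=0)
    W = A / np.where(col_sums == 0, 1.0, col_sums)  # column-normalised transitions
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)              # restart distribution over seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p
```

On the path graph 0-1-2-3 seeded at node 0 with `restart=0.5`, the scores converge to 26/45, 14/45, 4/45 and 1/45, so the propagated signal decays with network distance from the seed.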
Affiliation(s)
- Yao Zhou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shuai Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Haoteng Yan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Bo Pang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xinxin Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Lin Pang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yihan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jinyuan Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jing Hu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yujia Lan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yanyan Ping
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
20
He W, Tang J, Zou Q, Guo F. MMFGRN: a multi-source multi-model fusion method for gene regulatory network reconstruction. Brief Bioinform 2021; 22:6261916. [PMID: 33939795 DOI: 10.1093/bib/bbab166] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/26/2021] [Revised: 03/08/2021] [Accepted: 04/08/2021] [Indexed: 01/05/2023] Open
Abstract
Many biological processes, such as cell growth and differentiation and the onset and progression of disease, are controlled by gene regulatory networks (GRNs), so sustained research on GRNs is important. Determining gene-gene relationships from gene expression data is a complex problem. Because it is difficult to uncover the regularities behind gene-gene relationships efficiently with biochemical experiments alone, various computational methods have been used to construct GRNs, with some success. In this paper, we propose a novel method, MMFGRN (for "Multi-source Multi-model Fusion for Gene Regulatory Network reconstruction"), to reconstruct GRNs. To make full use of limited datasets and explore the potential regulatory relationships contained in different data types, we construct the MMFGRN model from three perspectives: a single time-series data model, a single steady-state data model and a joint time-series and steady-state data model. We then use a weighted fusion strategy to obtain the final global ranking of regulatory links. MMFGRN yields the best performance on the DREAM4 InSilico_Size10 data, outperforming other popular inference algorithms, with an overall area under the receiver operating characteristic curve of 0.909 and an area under the precision-recall curve (AUPR) of 0.770 on the 10-gene network. As network scale increases, our method also retains certain advantages, with an overall AUPR of 0.335 on the DREAM4 InSilico_Size100 data. These results demonstrate the robustness of MMFGRN across network scales. The integration strategy proposed in this paper also provides a new idea for reconstructing biological network models without prior knowledge, which can help researchers decipher the elusive mechanisms of life.
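The abstract does not give MMFGRN's fusion weights or how scores are normalized. A minimal sketch of weighted fusion of per-model link scores, assuming each model returns a score per (regulator, target) pair, that scores are min-max normalized within each model before weighting, and that a link missing from a model contributes zero. `fuse_rankings` and the example weights are hypothetical.

```python
def fuse_rankings(score_tables, weights):
    """Weighted fusion of link scores from several models into one global ranking.

    score_tables : list of dicts mapping (regulator, target) -> score
    weights      : one non-negative weight per model
    """
    if len(score_tables) != len(weights):
        raise ValueError("one weight per model is required")
    fused = {}
    for scores, w in zip(score_tables, weights):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for edge, s in scores.items():
            # min-max normalise within the model, then add the weighted contribution
            fused[edge] = fused.get(edge, 0.0) + w * (s - lo) / span
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Normalizing within each model keeps a model whose raw scores happen to span a larger range (for example, importance scores versus correlations) from dominating the fused ranking; the weights then express how much each data perspective is trusted.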
Affiliation(s)
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
21
Mahmoodi SH, Aghdam R, Eslahchi C. An order independent algorithm for inferring gene regulatory network using quantile value for conditional independence tests. Sci Rep 2021; 11:7605. [PMID: 33828122 PMCID: PMC8027014 DOI: 10.1038/s41598-021-87074-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/24/2020] [Accepted: 03/24/2021] [Indexed: 10/31/2022] Open
Abstract
In recent years, because experimental methods are difficult and inefficient, numerous computational methods have been introduced for inferring the structure of Gene Regulatory Networks (GRNs). The Path Consistency (PC) algorithm is one of the popular methods for inferring GRN structure. However, this group of methods still has limitations, and there is room for improvement. First, PC-based algorithms are sensitive to the ordering of nodes, i.e. different node orders result in different network structures. Second, the networks inferred by these methods are highly dependent on the threshold used for independence testing. Third, selecting the set of conditional genes optimally remains a challenge, and this choice affects the performance and computational complexity of PC-based algorithms. We introduce a novel algorithm, the Order Independent PC-based algorithm using Quantile value (OIPCQ), which improves the accuracy of GRN learning and resolves the order dependency issue. Quantile-based thresholds are used for the different orders of conditional mutual information (CMI) tests. For conditional gene selection, we consider paths between genes of length equal to or greater than 2, whereas other well-known PC-based methods only consider paths of length 2. We applied OIPCQ to various networks from the DREAM3 and DREAM4 in silico challenges. As real-world case studies, we used OIPCQ to reconstruct the SOS DNA network of Escherichia coli and a GRN for acute myeloid leukemia based on RNA sequencing data from The Cancer Genome Atlas. The results show that OIPCQ produces the same network structure for all permutations of the genes and, compared with other well-known PC-based methods, improves the resulting GRN by accurately quantifying the causal regulation strength.
According to the GRN constructed by OIPCQ for acute myeloid leukemia, two previously reported regulators, BCLAF1 and NRSF, are significantly important. However, the highest-degree nodes in this GRN are ZBTB7A and PU1, which play a significant role in cancer, especially leukemia. OIPCQ is freely accessible at https://github.com/haammim/OIPCQ-and-OIPCQ2.
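Two ideas from the abstract can be illustrated together: a quantile-based threshold for each order of conditional mutual information (CMI) test, and order independence obtained by running every test of one order against a frozen copy of the graph and applying all removals at once. The sketch below assumes Gaussian data (so MI and first-order CMI follow from correlations and partial correlations), stops at order 1, and conditions only on current neighbours, unlike OIPCQ's path-based selection of conditional genes. Function names and the quantile `q` are hypothetical.

```python
import numpy as np
from itertools import combinations


def gaussian_mi(r):
    """Mutual information of two jointly Gaussian variables with correlation r."""
    return -0.5 * np.log(1.0 - min(r * r, 1.0 - 1e-12))


def partial_corr(C, i, j, k):
    """Correlation of genes i and j after conditioning on gene k."""
    return (C[i, j] - C[i, k] * C[j, k]) / np.sqrt((1 - C[i, k] ** 2) * (1 - C[j, k] ** 2))


def skeleton(X, q=0.5):
    """Order-0 and order-1 edge removal with quantile thresholds.

    X is a samples x genes matrix; edges are (i, j) tuples with i < j.
    """
    C = np.corrcoef(X, rowvar=False)
    edges = set(combinations(range(C.shape[0]), 2))

    # Order 0: drop edges whose MI is below the q-quantile of all order-0 MI values.
    mi0 = {e: gaussian_mi(C[e]) for e in edges}
    t0 = np.quantile(list(mi0.values()), q)
    edges = {e for e in edges if mi0[e] >= t0}

    # Order 1: every test reads the same frozen snapshot of the graph, and all
    # removals are applied together, so gene order cannot change the result.
    snapshot = frozenset(edges)

    def neighbours(v):
        return {a if b == v else b for a, b in snapshot if v in (a, b)}

    cmi1 = {}
    for i, j in snapshot:
        conditioning = (neighbours(i) | neighbours(j)) - {i, j}
        if conditioning:
            cmi1[(i, j)] = min(gaussian_mi(partial_corr(C, i, j, k)) for k in conditioning)
    if not cmi1:
        return set(snapshot)
    t1 = np.quantile(list(cmi1.values()), q)
    return {e for e in snapshot if e not in cmi1 or cmi1[e] >= t1}
```

Because every removal at a given order uses the same snapshot, permuting the gene columns yields the same edge set after relabeling, which is the property the abstract describes.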
Affiliation(s)
- Sayyed Hadi Mahmoodi
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- Rosa Aghdam
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran; School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Changiz Eslahchi
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran; School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
22
Zhang Y, Chang X, Liu X. Inference of gene regulatory networks using pseudo-time series data. Bioinformatics 2021; 37:2423-2431. [PMID: 33576787 DOI: 10.1093/bioinformatics/btab099] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/04/2020] [Revised: 01/18/2021] [Accepted: 02/10/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Inferring gene regulatory networks (GRNs) from high-throughput data is an important and challenging problem in systems biology. Although numerous GRN inference methods have been developed, most have focused on verification with specific datasets. However, owing to the complexity and diversity of biological networks, it is difficult to establish directed topological networks that are suitable for both time-series and non-time-series datasets. RESULTS Here, we propose a novel method, GNIPLR (Gene networks inference based on projection and lagged regression), to infer GRNs from time-series or non-time-series gene expression data. GNIPLR projects gene data twice, using the LASSO projection (LSP) algorithm and the linear projection (LP) approximation, to produce a linear and monotonic pseudo-time series, and then determines the direction of regulation with lagged regression analyses. The proposed algorithm was validated using simulated and real biological data. We also applied GNIPLR to the liver hepatocellular carcinoma (LIHC) and bladder urothelial carcinoma (BLCA) cancer expression datasets. In these analyses, GNIPLR achieved significantly higher accuracy and AUC values than other popular methods. AVAILABILITY The GNIPLR tool is freely available at https://github.com/zyllluck/GNIPLR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
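GNIPLR first orders samples along a pseudo-time using LASSO and linear projections, which is not reproduced here. Assuming the samples are already in (pseudo-)time order, the sketch below illustrates the generic lagged-regression idea for orienting an edge: if gene a's value at step t predicts gene b's value at step t + 1 better than the reverse, the edge is oriented a -> b. The lag of 1, the use of R^2 as the comparison criterion and the function names are assumptions.

```python
import numpy as np


def lag1_r2(source, target):
    """R^2 of the lagged regression target[t+1] ~ intercept + slope * source[t]."""
    x = np.asarray(source, dtype=float)[:-1]
    y = np.asarray(target, dtype=float)[1:]
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def infer_direction(a, b):
    """Orient the a-b edge by which gene's past better predicts the other's next value."""
    return "a->b" if lag1_r2(a, b) > lag1_r2(b, a) else "b->a"
```

In a series where b at step t + 1 is mostly 0.9 times a at step t plus small noise, `infer_direction(a, b)` returns "a->b", because a's past explains b well while b's past carries almost no information about a's next value.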
Affiliation(s)
- Yuelei Zhang
- Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China; Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China; School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
- Xiao Chang
- Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China
- Xiaoping Liu
- Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China; School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
23
Zhao M, He W, Tang J, Zou Q, Guo F. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform 2021; 22:6128842. [PMID: 33539514 DOI: 10.1093/bib/bbab009] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 09/28/2020] [Revised: 12/11/2020] [Accepted: 01/06/2021] [Indexed: 12/12/2022] Open
Abstract
Gene regulatory networks (GRNs) are an important mechanism for maintaining life processes, controlling biochemical reactions and regulating compound levels, and they play an important role in various organisms and systems. Reconstructing GRNs can help us understand the molecular mechanisms of organisms and reveal the essential rules of many biological processes and reactions. To deal with processing uncertainty, various network reconstruction algorithms rely on specific assumptions that affect their prediction accuracy. To study why a certain method is more suitable for a specific research problem or type of experimental data, we classify methods as model-based, information-based or machine learning-based, since clearly different types of computational tools can be built to infer GRNs. For each category, we discuss several classical, representative and recent methods, analyzing their core ideas, general steps and characteristics. We compare the performance of state-of-the-art GRN reconstruction technologies on simulated and real networks at different scales. Using standardized performance metrics and common benchmarks, we quantitatively evaluate the stability of various methods and the sensitivity of the same algorithm when applied to networks of different scales. The aim of this study is to identify the most appropriate method for a specific GRN, which can help biologists and medical scientists discover potential drug targets and identify cancer biomarkers.
Affiliation(s)
- Mengyuan Zhao
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Wenying He
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang
- University of South Carolina, Tianjin, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
24
Sun Y, Li C, Pang S, Yao Q, Chen L, Li Y, Zeng R. Kinase-substrate Edge Biomarkers Provide a More Accurate Prognostic Prediction in ER-negative Breast Cancer. Genomics, Proteomics & Bioinformatics 2020; 18:525-538. [PMID: 33450402 PMCID: PMC8377385 DOI: 10.1016/j.gpb.2019.11.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2018] [Revised: 08/27/2019] [Accepted: 11/11/2019] [Indexed: 11/19/2022]
Abstract
The estrogen receptor (ER)-negative breast cancer subtype is aggressive and has few treatment options. To identify specific prognostic factors for ER-negative breast cancer, this study included 705,729 and 1034 patients with invasive breast cancer from the Surveillance, Epidemiology, and End Results (SEER) and The Cancer Genome Atlas (TCGA) databases, respectively. To identify key differential kinase-substrate node and edge biomarkers between ER-negative and ER-positive breast cancer patients, we adopted a network-based method using correlation coefficients between molecular pairs in the kinase regulatory network. Integrated analysis of the clinical and molecular data revealed the significant prognostic power of kinase-substrate node and edge features for both breast cancer subtypes. Two promising kinase-substrate edge features, CSNK1A1-NFATC3 and SRC-OCLN, were identified for more accurate prognostic prediction in ER-negative breast cancer patients.
Affiliation(s)
- Yidi Sun
- CAS Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; University of Chinese Academy of Sciences, Shanghai 200031, China
- Chen Li
- CAS Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Shichao Pang
- Department of Statistics, School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Qianlan Yao
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Luonan Chen
- CAS Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Department of Life Sciences, ShanghaiTech University, Shanghai 201210, China; CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China.
- Yixue Li
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China; Department of Life Sciences, ShanghaiTech University, Shanghai 201210, China; Bio-Med Big Data Center, Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai 200032, China; Shanghai Center for Bioinformation Technology, Shanghai Academy of Science & Technology, Shanghai 201203, China.
- Rong Zeng
- CAS Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Department of Life Sciences, ShanghaiTech University, Shanghai 201210, China.
25
Malekpour SA, Alizad-Rahvar AR, Sadeghi M. LogicNet: probabilistic continuous logics in reconstructing gene regulatory networks. BMC Bioinformatics 2020; 21:318. [PMID: 32690031 PMCID: PMC7372900 DOI: 10.1186/s12859-020-03651-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/16/2019] [Accepted: 07/10/2020] [Indexed: 11/10/2022] Open
Abstract
Background Gene Regulatory Networks (GRNs) have previously been studied using Boolean/multi-state logics. Although gene expression values are usually scaled into the range [0, 1], these GRN inference methods apply a threshold to discretize the data, which loses information. Most studies apply fuzzy logics to infer logical gene-gene interactions from continuous data. However, all these approaches require an a priori known network structure. Results Here, by introducing a new probabilistic logic for continuous data, we propose a novel logic-based approach (called LogicNet) for simultaneously reconstructing the GRN structure and identifying the logics among the regulatory genes from continuous gene expression data. In contrast to previous approaches, LogicNet does not require an a priori known network structure to infer the logics. The proposed probabilistic logic is superior to existing fuzzy logics and more relevant to biological contexts. LogicNet also outperforms several mutual information-based and regression-based tools for reconstructing GRNs. Conclusions LogicNet reconstructs GRNs and logic functions without requiring prior knowledge of the network structure. Moreover, LogicNet can also be applied to detect logic functions from known regulatory gene-target interactions. We also conclude that computational modeling of the logical interactions among regulatory genes significantly improves GRN reconstruction accuracy.
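The abstract does not define LogicNet's probabilistic logic, so the sketch below does not implement it. For orientation only, it contrasts the standard fuzzy (Zadeh min/max) operators with the classical product and probabilistic-sum operators on expression values scaled to [0, 1], and shows how thresholding first, as Boolean methods do, discards information. All function names and the 0.5 threshold are illustrative assumptions.

```python
def fuzzy_and(a, b):
    return min(a, b)            # Zadeh conjunction: ignores the larger input entirely


def fuzzy_or(a, b):
    return max(a, b)            # Zadeh disjunction


def prob_and(a, b):
    return a * b                # product: P(A and B) if A and B are independent


def prob_or(a, b):
    return a + b - a * b        # probabilistic sum: P(A or B) if independent


def boolean_and(a, b, threshold=0.5):
    """Discretise each input first, then apply Boolean AND."""
    return float(a >= threshold and b >= threshold)
```

For inputs 0.8 and 0.5, the fuzzy AND gives 0.5 while the product gives 0.4, so the product still reflects both activation levels. After thresholding, inputs of 0.51 and 0.49 (nearly identical expression) fall on opposite sides, turning a Boolean AND output from 1 into 0; this is the information loss the abstract attributes to discretization.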
Affiliation(s)
- Seyed Amir Malekpour
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
- Amir Reza Alizad-Rahvar
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
26
Chen X, Gu J, Neuwald AF, Hilakivi-Clarke L, Clarke R, Xuan J. BICORN: An R package for integrative inference of de novo cis-regulatory modules. Sci Rep 2020; 10:7960. [PMID: 32409786 PMCID: PMC7224214 DOI: 10.1038/s41598-020-63043-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/26/2019] [Accepted: 01/15/2020] [Indexed: 12/18/2022] Open
Abstract
Genome-wide transcription factor (TF) binding signal analyses reveal co-localization of TF binding sites based on inferred cis-regulatory modules (CRMs). CRMs play a key role in understanding the cooperation of multiple TFs under specific conditions. However, the functions of CRMs and their effects on nearby gene transcription are highly dynamic and context-specific and therefore are challenging to characterize. BICORN (Bayesian Inference of COoperative Regulatory Network) builds a hierarchical Bayesian model and infers context-specific CRMs based on TF-gene binding events and gene expression data for a particular cell type. BICORN automatically searches for a list of candidate CRMs based on the input TF bindings at regulatory regions associated with genes of interest. Applying Gibbs sampling, BICORN iteratively estimates model parameters of CRMs, TF activities, and corresponding regulation on gene transcription, which it models as a sparse network of functional CRMs regulating target genes. The BICORN package is implemented in R (version 3.4 or later) and is publicly available on the CRAN server at https://cran.r-project.org/web/packages/BICORN/index.html.
Affiliation(s)
- Xi Chen
  - Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 900 North Glebe Road, Arlington, VA, 22203, USA
- Jinghua Gu
  - Baylor Research Institute, 3310 Live Oak St, Dallas, TX, 75204, USA
- Andrew F Neuwald
  - Institute for Genome Sciences and Department Biochemistry & Molecular Biology, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Leena Hilakivi-Clarke
  - Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, 3970 Reservoir Road, Washington, DC, 20057, USA
- Robert Clarke
  - Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, 3970 Reservoir Road, Washington, DC, 20057, USA
- Jianhua Xuan
  - Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 900 North Glebe Road, Arlington, VA, 22203, USA
27
Specific functions for Mediator complex subunits from different modules in the transcriptional response of Arabidopsis thaliana to abiotic stress. Sci Rep 2020; 10:5073. [PMID: 32193425 PMCID: PMC7081235 DOI: 10.1038/s41598-020-61758-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/11/2019] [Accepted: 02/26/2020] [Indexed: 11/22/2022]
Abstract
Adverse environmental conditions are detrimental to plant growth and development. Acclimation to abiotic stress conditions involves activation of signaling pathways which often results in changes in gene expression via networks of transcription factors (TFs). Mediator is a highly conserved co-regulator complex and an essential component of the transcriptional machinery in eukaryotes. Some Mediator subunits have been implicated in stress-responsive signaling pathways; however, much remains unknown regarding the role of plant Mediator in abiotic stress responses. Here, we use RNA-seq to analyze the transcriptional response of Arabidopsis thaliana to heat, cold and salt stress conditions. We identify a set of common abiotic stress regulons and describe the sequential and combinatorial nature of TFs involved in their transcriptional regulation. Furthermore, we identify stress-specific roles for the Mediator subunits MED9, MED16, MED18 and CDK8, and putative TFs connecting them to different stress signaling pathways. Our data also indicate different modes of action for subunits or modules of Mediator at the same gene loci, including a co-repressor function for MED16 prior to stress. These results illuminate a poorly understood but important player in the transcriptional response of plants to abiotic stress and identify target genes and mechanisms as a prelude to further biochemical characterization.
28
Tang H, Tang Y, Zeng T, Chen L. Gene expression analysis reveals the tipping points during infant brain development for human and chimpanzee. BMC Genomics 2020; 21:74. [PMID: 32138647 PMCID: PMC7057467 DOI: 10.1186/s12864-020-6465-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/04/2020] [Accepted: 01/08/2020] [Indexed: 12/14/2022]
Abstract
BACKGROUND Postpartum developmental delay has been proposed as an important phenotype of human evolution that contributes to many human-specific features, including increased brain size and advanced human-specific cognitive traits. However, the biological processes and molecular functions underlying early brain development remain poorly understood, especially in humans and other primates. RESULTS In this paper, we comparatively and extensively studied dorsolateral prefrontal cortex expression data in human and chimpanzee to investigate the critical processes or biological events during early brain development at a molecular level. Using the dynamic network biomarker (DNB) model, we found tipping points at around 3 months and 1 month, which are crucial periods in infant human and chimpanzee brain development, respectively. In particular, we showed that human postnatal development and the corresponding expression changes are delayed threefold relative to chimpanzee, and we also revealed that many common biological processes are highly involved in these critical periods for both human and chimpanzee, e.g., physiological system development functions, nervous system development, organismal development and tissue morphology. These findings suggest that the maximal rates of brain growth occur in these two critical periods in human and chimpanzee, respectively. In addition, unlike chimpanzee, our results also showed that humans can further develop a number of advanced behavioral functions around this tipping point (around 3 months), such as learning and memory. CONCLUSION This work not only provides biological insights into primate brain development at a molecular level but also opens a new way to study the criticality of nonlinear biological processes based on observed omics data.
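The DNB criterion is usually summarized by a composite index that peaks when a group of genes simultaneously shows large fluctuations, strong correlation within the group and weak correlation with the rest of the network. The sketch below assumes the commonly used form CI = SD_in * PCC_in / PCC_out, computed from replicate samples at each time point, on simulated data with a hypothetical candidate module; identifying the module itself, which a real analysis must also do, is not shown.

```python
import numpy as np

def dnb_composite_index(X, module):
    """CI = SD_in * PCC_in / PCC_out for one time point.

    X: samples x genes matrix of replicates at a single time point.
    module: column indices of the candidate DNB group (at least 2 genes).
    """
    module = np.asarray(module)
    others = np.setdiff1d(np.arange(X.shape[1]), module)
    corr = np.abs(np.corrcoef(X, rowvar=False))               # |Pearson r| for every gene pair
    sd_in = X[:, module].std(axis=0, ddof=1).mean()            # average fluctuation inside the group
    block = corr[np.ix_(module, module)]
    k = len(module)
    pcc_in = (block.sum() - np.trace(block)) / (k * (k - 1))   # mean off-diagonal |r| inside
    pcc_out = corr[np.ix_(module, others)].mean()              # mean |r| between group and the rest
    return sd_in * pcc_in / pcc_out

# Simulated series: at time index 3 the module genes share one large common fluctuation,
# mimicking the loss of stability expected near a tipping point.
rng = np.random.default_rng(1)
n_samples, n_genes, module = 20, 30, [0, 1, 2, 3, 4]
scores = []
for t in range(5):
    X = rng.normal(size=(n_samples, n_genes))
    if t == 3:
        X[:, module] += rng.normal(scale=3.0, size=(n_samples, 1))
    scores.append(dnb_composite_index(X, module))
tipping_point = int(np.argmax(scores))
print(tipping_point)
```

The time point with the largest index is flagged as the candidate tipping point; with real data the replicates per time point are usually few, so the index is noisy and is typically interpreted alongside the module search rather than on its own.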
Affiliation(s)
- Hui Tang
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, 200031 China
- Ying Tang
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, 200031 China
- Tao Zeng
  - Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, 201210 China
- Luonan Chen
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, 200031 China
  - CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223 China
  - Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, 201210 China
29
Wang H, Lian Y, Li C, Ma Y, Yan Z, Dong C. SIN-KNO: A method of gene regulatory network inference using single-cell transcription and gene knockout data. J Bioinform Comput Biol 2020; 17:1950035. [PMID: 32019417 DOI: 10.1142/s0219720019500355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
As a tool for interpreting and analyzing genetic data, a gene regulatory network (GRN) can reveal regulatory relationships between genes, proteins, and small molecules, and can help explain physiological activities and functions within biological cells, interactions in pathways, and how changes arise in the organism. Traditional GRN research analyzes regulatory relationships through the average of cellular gene expression, so these methods have difficulty capturing the cell heterogeneity of gene expression. Existing methods for inferring GRNs from single-cell transcriptional data lack expression information for genes at steady state, and the high dimensionality of single-cell data leads to high time and space complexity. To solve these problems in traditional GRN inference methods (the lack of cellular heterogeneity information, the complexity of single-cell data and the lack of steady-state information), we propose a method for GRN inference using single-cell transcription and gene knockout data, called SINgle-cell transcription data-KNOckout data (SIN-KNO), which combines the dynamic and steady-state information about regulatory relationships contained in gene expression. Capturing cell heterogeneity helps explain differences in gene expression across cells, so gene expression changes can be observed more accurately. Gene knockout data record the steady-state expression levels of all other genes when one gene is knocked out. Classifying the genes before analyzing the single-cell data can rule out a large number of nonexistent regulatory relationships, greatly reducing the number of relationships that must be inferred. To demonstrate its efficiency, the proposed method was compared with several typical methods in this area, including GENIE3, JUMP3, and SINCERITIES.
The evaluation results indicate that the proposed method can analyze the diverse information contained in the two types of data, build a more accurate gene regulatory network, and improve computational efficiency. The method provides a new approach for dealing with the large datasets and high computational complexity of single-cell data in GRN inference.
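One way to read the knockout-based pruning step: if knocking out gene i leaves the steady-state level of gene j essentially unchanged, the edge i -> j can be dropped before the single-cell data are analyzed. The sketch below assumes a relative-change threshold of 0.2 and uses a hypothetical helper name; SIN-KNO's actual gene classification rule may differ.

```python
import numpy as np

def candidate_edges_from_knockouts(knockout, wild_type, threshold=0.2):
    """Keep (regulator, target) pairs whose knockout visibly shifts the target.

    knockout[i, j]: steady-state level of gene j after knocking out gene i.
    wild_type[j]: steady-state level of gene j without any knockout.
    """
    rel_change = np.abs(knockout - wild_type) / (np.abs(wild_type) + 1e-9)
    np.fill_diagonal(rel_change, 0.0)   # ignore the knocked-out gene itself
    regulators, targets = np.nonzero(rel_change >= threshold)
    return set(zip(regulators.tolist(), targets.tolist()))

wild_type = np.ones(4)
knockout = np.ones((4, 4))
np.fill_diagonal(knockout, 0.0)   # each knocked-out gene is silenced
knockout[0, 2] = 0.5              # knocking out gene 0 lowers gene 2
knockout[1, 3] = 1.5              # knocking out gene 1 raises gene 3
edges = candidate_edges_from_knockouts(knockout, wild_type)
print(sorted(edges))              # [(0, 2), (1, 3)]
```

Here only 2 of the 12 possible directed pairs among 4 genes survive, so the later single-cell inference step would only need to score those candidates.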
Affiliation(s)
- Huiqing Wang
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Yuanyuan Lian
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Chun Li
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Yue Ma
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Zhiliang Yan
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Chunlin Dong
  - Dryland Agriculture Research Center, Shanxi Academy of Agricultural Sciences, Taiyuan, Shanxi, China
30
Pyne S, Kumar AR, Anand A. Rapid Reconstruction of Time-Varying Gene Regulatory Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:278-291. [PMID: 30072338 DOI: 10.1109/tcbb.2018.2861698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Rapid advancements in high-throughput technologies have resulted in genome-scale time series datasets. Uncovering the temporal sequence of gene regulatory events, in the form of time-varying gene regulatory networks (GRNs), demands computationally fast, accurate, and scalable algorithms. Existing algorithms fall into two categories: those that are time-intensive and hence unscalable, and those that impose structural constraints to become scalable. In this paper, a novel algorithm, namely 'an algorithm for reconstructing Time-varying Gene regulatory networks with Shortlisted candidate regulators' (TGS), is proposed. TGS is time-efficient and does not impose any structural constraints. Moreover, it achieves this flexibility and time-efficiency without losing accuracy. TGS consistently outperforms state-of-the-art algorithms in true positive detection on three benchmark synthetic datasets. However, TGS does not perform as well in false positive rejection. To mitigate this issue, TGS+ is proposed. TGS+ demonstrates competitive false positive rejection power while maintaining the superior speed and true positive detection power of TGS. Nevertheless, the main memory requirements of both TGS variants grow exponentially with the number of genes, which they tackle by restricting the maximum number of regulators for each gene. Relaxing this restriction remains a challenge, as the actual number of regulators is not known a priori.
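The trade-off described above can be illustrated in two parts: shortlisting a few candidate regulators per target, then counting how many regulator subsets a search would have to consider. The shortlisting criterion here (absolute Pearson correlation) is a stand-in chosen for brevity and is not TGS's own criterion, and both helper names are hypothetical. The subset count is one way to see why capping the number of regulators per gene matters: without a cap it grows as 2^k in the number k of shortlisted candidates.

```python
import numpy as np
from math import comb

def shortlist_regulators(X, target, k):
    """Hypothetical stand-in for a shortlisting step.

    X: time points x genes. Returns the k genes with the largest |Pearson r| to the target.
    """
    scores = np.abs(np.corrcoef(X, rowvar=False)[target])
    scores[target] = -np.inf                 # a gene is not its own candidate regulator
    return sorted(np.argsort(scores)[::-1][:k].tolist())

def n_candidate_regulator_sets(k, max_regulators):
    """Number of regulator subsets of size <= max_regulators among k shortlisted genes."""
    return sum(comb(k, s) for s in range(max_regulators + 1))

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 10))                              # 50 time points x 10 genes
X[:, 0] = X[:, 3] + X[:, 5] + 0.1 * rng.normal(size=50)    # gene 0 driven by genes 3 and 5
print(shortlist_regulators(X, target=0, k=2))
print(n_candidate_regulator_sets(20, 20))   # 1048576 (= 2**20, no cap)
print(n_candidate_regulator_sets(20, 3))    # 1351 (at most 3 regulators)
```

With 20 shortlisted candidates, a cap of 3 regulators reduces the candidate sets from about a million to 1351, which mirrors the kind of restriction the abstract describes while also showing its cost: any gene with more than 3 true regulators can no longer be fully recovered.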
31
A Stable, Unified Density Controlled Memetic Algorithm for Gene Regulatory Network Reconstruction Based on Sparse Fuzzy Cognitive Maps. Neural Process Lett 2019. [DOI: 10.1007/s11063-019-10056-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]
32
Fan A, Wang H, Xiang H, Zou X. Inferring Large-Scale Gene Regulatory Networks Using a Randomized Algorithm Based on Singular Value Decomposition. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1997-2008. [PMID: 29993839 DOI: 10.1109/tcbb.2018.2825446] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
Reconstructing large-scale gene regulatory networks (GRNs) is a challenging problem in computational biology. Various methods for inferring GRNs have been developed, but they fail to accurately infer GRNs with a large number of genes. Additionally, existing indexes for evaluating the constructed networks have obvious disadvantages because GRNs in most biological systems are sparse. In this paper, we develop a new method, denoted IGRSVD, for inferring GRNs from noisy large-scale time series data based on randomized singular value decomposition (RSVD) and ordinary differential equation (ODE)-based optimization. The three major contributions of this paper are as follows. First, the IGRSVD algorithm uses RSVD to handle the noise and reduce the original large-scale data into small-scale problems. Second, we propose two new evaluation indexes, the expected value accuracy (EVA) and the expected value error (EVE), which evaluate the performance of inferred networks by accounting for network sparsity. Finally, the proposed IGRSVD algorithm is compared with the existing SVD algorithm and the PCA_CMI algorithm using four subsets from E. coli and datasets from the DREAM challenge. The experimental results demonstrate that the IGRSVD algorithm is effective and more suitable for reconstructing large-scale networks.
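For readers unfamiliar with randomized SVD, here is a minimal NumPy sketch in the style of Halko, Martinsson and Tropp: project onto a Gaussian sketch, apply a few power iterations, orthonormalize, and take an exact SVD of the small projected matrix. The abstract does not say how IGRSVD couples this step to its ODE-based optimization, so only the decomposition and a low-rank denoising use are shown; the rank, oversampling and power-iteration defaults are hypothetical.

```python
import numpy as np

def randomized_svd(A, rank, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate the leading `rank` singular triplets of A (m x n)."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(size=(A.shape[1], rank + n_oversamples))  # Gaussian test matrix
    Y = A @ omega                                                # sample the range of A
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(Y)                                   # re-orthonormalize for stability
        Y = A @ (A.T @ Q)                                        # power iteration sharpens the spectrum
    Q, _ = np.linalg.qr(Y)                                       # orthonormal basis for the sketch
    B = Q.T @ A                                                  # small (rank + oversamples) x n matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                                              # lift back to m dimensions
    return U[:, :rank], s[:rank], Vt[:rank]

# Usage: denoise a synthetic genes x time-points matrix with rank-3 structure.
rng = np.random.default_rng(42)
signal = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 40))   # 500 genes x 40 time points
noisy = signal + 0.1 * rng.normal(size=signal.shape)
U, s, Vt = randomized_svd(noisy, rank=3)
denoised = U @ np.diag(s) @ Vt
print(np.linalg.norm(denoised - signal) < np.linalg.norm(noisy - signal))
```

Only the small projected matrix B is decomposed exactly, which is where the savings over a full SVD come from when the number of genes is large.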
33
Muldoon JJ, Yu JS, Fassia MK, Bagheri N. Network inference performance complexity: a consequence of topological, experimental and algorithmic determinants. Bioinformatics 2019; 35:3421-3432. [PMID: 30932143 PMCID: PMC6748731 DOI: 10.1093/bioinformatics/btz105] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/12/2018] [Revised: 01/24/2019] [Accepted: 02/11/2019] [Indexed: 12/21/2022]
Abstract
MOTIVATION Network inference algorithms aim to uncover key regulatory interactions governing cellular decision-making, disease progression and therapeutic interventions. Having an accurate blueprint of this regulation is essential for understanding and controlling cell behavior. However, the utility and impact of these approaches are limited because the ways in which various factors shape inference outcomes remain largely unknown. RESULTS We identify and systematically evaluate determinants of performance, including network properties, experimental design choices and data processing, by developing new metrics that quantify confidence across algorithms in comparable terms. We conducted a multifactorial analysis that demonstrates how stimulus target, regulatory kinetics, induction and resolution dynamics, and noise differentially impact widely used algorithms in significant and previously unrecognized ways. The results show that even when high-quality data are paired with high-performing algorithms, inferred models can sometimes lead to misleading conclusions. Lastly, we validate these findings and the utility of the confidence metrics using realistic in silico gene regulatory networks. This new characterization approach provides a way to more rigorously interpret how algorithms infer regulation from biological datasets. AVAILABILITY AND IMPLEMENTATION Code is available at http://github.com/bagherilab/networkinference/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Joseph J Muldoon
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
  - Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL, USA
- Jessica S Yu
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Mohammad-Kasim Fassia
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
  - Department of Biomedical Engineering, Northwestern University, Evanston, IL, USA
- Neda Bagheri
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
  - Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL, USA
  - Center for Synthetic Biology, Northwestern University, Evanston, IL, USA
  - Chemistry of Life Processes Institute, Northwestern University, Evanston, IL, USA
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
34
Zhang W, Zhang F, Zhang J, Wang N. Hierarchical parameter estimation of GRN based on topological analysis. IET Syst Biol 2019; 12:294-303. [PMID: 30472694 DOI: 10.1049/iet-syb.2018.5015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
Abstract
Reverse engineering of gene regulatory networks (GRNs) is an important and challenging task in systems biology. Existing parameter estimation approaches that treat all model parameters as equally important are usually computationally expensive or infeasible, especially for complex biological networks. To improve the efficiency of computational modeling, this paper applies a hierarchical estimation methodology to GRN modeling based on topological analysis. It divides the nodes of a network into priority levels using a graph-based measure and a genetic algorithm. Nodes in the first level, which correspond to root strongly connected components (SCCs) in the digraph of the GRN, are given top priority in parameter estimation. The estimated parameters of vertices in each priority level are used to infer the parameters of nodes in the next level. The proposed hierarchical estimation methodology obtains lower error indexes while consuming fewer computational resources than the single estimation methodology. Experimental results with in silico networks and a realistic network show that gene networks decompose into no more than four levels, which is consistent with the inherent modularity of GRNs. In addition, the proposed hierarchical parameter estimation achieves a balance between computational efficiency and accuracy.
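The leveling idea can be sketched with graph algorithms alone: collapse the GRN digraph into strongly connected components (Tarjan's algorithm here), give root SCCs (those with no incoming edges from other SCCs) level 1, and give every other SCC one more than the highest level upstream of it. This assumes a longest-path leveling rule and omits the graph-based measure and genetic algorithm the authors also use, so it approximates their priority assignment rather than reproducing it; the function names and the toy network are hypothetical.

```python
from collections import defaultdict

def strongly_connected_components(graph):
    """Tarjan's algorithm. graph: dict mapping each node to a list of successor nodes."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of an SCC: pop it off the stack
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(frozenset(component))

    nodes = set(graph) | {w for succs in graph.values() for w in succs}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return sccs

def priority_levels(graph):
    """Root SCCs get level 1; every other SCC gets 1 + the max level of its upstream SCCs."""
    sccs = strongly_connected_components(graph)
    comp_of = {v: i for i, comp in enumerate(sccs) for v in comp}
    preds = defaultdict(set)
    for v, succs in graph.items():
        for w in succs:
            if comp_of[v] != comp_of[w]:
                preds[comp_of[w]].add(comp_of[v])
    level = {}

    def comp_level(c):
        if c not in level:
            level[c] = 1 + max((comp_level(p) for p in preds[c]), default=0)
        return level[c]

    return {v: comp_level(comp_of[v]) for v in comp_of}

# Toy GRN: A <-> B feed into a C <-> D loop; both groups regulate E.
grn = {"A": ["B", "E"], "B": ["A", "C"], "C": ["D"], "D": ["C", "E"]}
print(priority_levels(grn))
```

In a hierarchical scheme, parameters for level-1 genes (A, B) would be fitted first and then held fixed as known inputs while fitting level 2 (C, D), and so on.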
Affiliation(s)
- Wei Zhang
  - Department of Control Science and Engineering, Zhejiang University, Zheda Road 38, Hangzhou, People's Republic of China
- Feng Zhang
  - Department of Control Science and Engineering, Zhejiang University, Zheda Road 38, Hangzhou, People's Republic of China
- Jianming Zhang
  - Department of Control Science and Engineering, Zhejiang University, Zheda Road 38, Hangzhou, People's Republic of China
- Ning Wang
  - Department of Control Science and Engineering, Zhejiang University, Zheda Road 38, Hangzhou, People's Republic of China
35
Chen X, Gu J, Wang X, Jung JG, Wang TL, Hilakivi-Clarke L, Clarke R, Xuan J. CRNET: an efficient sampling approach to infer functional regulatory networks by integrating large-scale ChIP-seq and time-course RNA-seq data. Bioinformatics 2019; 34:1733-1740. [PMID: 29280996 DOI: 10.1093/bioinformatics/btx827] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 12/08/2016] [Accepted: 12/20/2017] [Indexed: 12/28/2022]
Abstract
Motivation NGS techniques have been widely applied in genetic and epigenetic studies. Multiple ChIP-seq and RNA-seq profiles can now be jointly used to infer functional regulatory networks (FRNs). However, existing methods suffer from either oversimplified assumptions about transcription factor (TF) regulation or slow sampling convergence for FRN inference from large-scale ChIP-seq and time-course RNA-seq data. Results We developed an efficient Bayesian integration method (CRNET) for FRN inference that uses a two-stage Gibbs sampler to iteratively estimate hidden TF activities and the posterior probabilities of binding events. A novel statistical measure that jointly considers regulation strength and regression error enables the sampling process of CRNET to converge quickly, making CRNET very efficient for large-scale FRN inference. Experiments on synthetic and benchmark data showed significantly improved performance of CRNET compared with existing methods. CRNET was applied to breast cancer data to identify FRNs functional at promoter or enhancer regions in breast cancer MCF-7 cells. The transcription factor MYC is predicted to be a key functional factor in both promoter and enhancer FRNs. We experimentally validated the regulatory effects of MYC on CRNET-predicted target genes using appropriate RNAi approaches in MCF-7 cells. Availability and implementation R scripts of CRNET are available at http://www.cbil.ece.vt.edu/software.htm. Contact xuan@vt.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xi Chen
  - Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jinghua Gu
  - Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Xiao Wang
  - Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jin-Gyoung Jung
  - Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, USA
- Tian-Li Wang
  - Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, USA
- Leena Hilakivi-Clarke
  - Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA
- Robert Clarke
  - Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA
- Jianhua Xuan
  - Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
36
Pirgazi J, Khanteymoori AR, Jalilkhani M. TIGRNCRN: Trustful inference of gene regulatory network using clustering and refining the network. J Bioinform Comput Biol 2019; 17:1950018. [DOI: 10.1142/s0219720019500185] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
In this study, to deal with noise and uncertainty in gene expression data, learning networks, especially Bayesian networks, which can use prior knowledge, were used to infer gene regulatory networks. Learning networks are methods that combine a network structure with a learning process to obtain relationships. Correlation metrics are one way to measure the relationship between genes, but high correlation between genes does not necessarily mean that they have a causal effect on each other. Common methods for inferring gene regulatory networks have yet to pay attention to biological importance, and as a result their predictions are less accurate in terms of biological significance. Hence, in the proposed method, highly correlated genes were grouped into the same cluster, and edges between genes in the same cluster were prevented. Finally, after Bayesian network modeling, regulatory interactions were refined and improved using biological correlation, based on the knowledge gained from clustering. To demonstrate its efficiency, the proposed method was compared with several common methods in this area, including GENIE3 and BMALR. The evaluation results indicate that the proposed method recognized regulatory relations well in the Bayesian modeling process, owing to its use of the biological knowledge hidden in the data, and that it recognizes gene regulatory networks comparably to the main methods in this field.
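The clustering constraint can be sketched as follows, assuming single-linkage clusters formed by joining genes whose absolute Pearson correlation is at least 0.9 (both the linkage rule and the threshold are hypothetical, not taken from the paper). Every ordered pair of genes inside one cluster becomes a forbidden edge that a Bayesian network structure search would then skip.

```python
import numpy as np
from itertools import permutations

def correlation_clusters(X, threshold=0.9):
    """Single-linkage clusters via union-find: genes joined if |Pearson r| >= threshold.

    X: samples x genes. Returns clusters with at least two genes.
    """
    n = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] >= threshold:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return [c for c in clusters.values() if len(c) > 1]

def forbidden_edges(clusters):
    """Directed edges excluded from the structure search: every ordered pair within a cluster."""
    return {(i, j) for c in clusters for i, j in permutations(c, 2)}

rng = np.random.default_rng(7)
base = rng.normal(size=(100, 3))
X = np.column_stack([
    base[:, 0], base[:, 0] + 0.05 * rng.normal(size=100),   # genes 0 and 1 nearly identical
    base[:, 1], -base[:, 1] + 0.05 * rng.normal(size=100),  # genes 2 and 3 anti-correlated
    base[:, 2],                                              # gene 4 independent
])
clusters = correlation_clusters(X)
blacklist = forbidden_edges(clusters)
print(sorted(blacklist))
```

Using absolute correlation treats strongly anti-correlated genes the same as positively correlated ones, which matches the stated concern that correlation alone does not establish a causal direction.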
Affiliation(s)
- Jamshid Pirgazi
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
- Ali Reza Khanteymoori
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Maryam Jalilkhani
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
37
Li M, Li C, Liu WX, Liu C, Cui J, Li Q, Ni H, Yang Y, Wu C, Chen C, Zhen X, Zeng T, Zhao M, Chen L, Wu J, Zeng R, Chen L. Dysfunction of PLA2G6 and CYP2C44-associated network signals imminent carcinogenesis from chronic inflammation to hepatocellular carcinoma. J Mol Cell Biol 2019; 9:489-503. [PMID: 28655161 PMCID: PMC5907842 DOI: 10.1093/jmcb/mjx021] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 12/17/2016] [Accepted: 06/16/2017] [Indexed: 12/14/2022]
Abstract
Little is known about how chronic inflammation contributes to the progression of hepatocellular carcinoma (HCC), especially cancer initiation. To uncover the critical transition from chronic inflammation to HCC and the molecular mechanisms at a network level, we analyzed time-series proteomic data of woodchuck hepatitis virus/c-myc mice and age-matched wt-C57BL/6 mice using our dynamical network biomarker (DNB) model. DNB analysis indicated that the 5th month after birth of the transgenic mice was the critical period of cancer initiation, just before the critical transition, which is consistent with clinical symptoms. Meanwhile, the DNB-associated network showed a drastic inversion of protein expression and coexpression levels before and after the critical transition. Two DNB members, PLA2G6 and CYP2C44, along with their associated differentially expressed proteins, were found to induce dysfunction of arachidonic acid metabolism, further activate inflammatory responses through inflammatory mediator regulation of transient receptor potential channels, and finally lead to impaired liver detoxification and malignant transition to cancer. As a c-Myc target, PLA2G6 was positively correlated with c-Myc in expression, showing a trend from decreasing to increasing during carcinogenesis, with the minimum at the critical transition or tipping point. The same trend in the homologs of PLA2G6 and c-Myc was also observed during human hepatocarcinogenesis, with the minimum at high-grade dysplastic nodules (a stage just before carcinogenesis). Our study implies that PLA2G6 might function as an oncogene, like the well-known c-Myc, during hepatocarcinogenesis, while downregulation of PLA2G6 and c-Myc could be a warning signal of imminent carcinogenesis.
Affiliation(s)
- Meiyi Li
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Minhang Hospital, Fudan University, Shanghai, China
- Chen Li
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Wei-Xin Liu
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Conghui Liu
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jingru Cui
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Qingrun Li
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Hong Ni
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Yingcheng Yang
  - International Co-operation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Institute, Second Military Medical University, Shanghai, China
- Chaochao Wu
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Chunlei Chen
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Xing Zhen
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Tao Zeng
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Mujun Zhao
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Lei Chen
  - International Co-operation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Institute, Second Military Medical University, Shanghai, China
  - National Center for Liver Cancer, Shanghai, China
- Jiarui Wu
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
  - Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, China
| | - Rong Zeng
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China.,School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| | - Luonan Chen
- Key Laboratory of Systems Biology, CAS center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai, China.,Minhang Hospital, Fudan University, Shanghai, China.,School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| |
Collapse
|
38
|
Ahn H, Jo K, Jeong D, Pak M, Hur J, Jung W, Kim S. PropaNet: Time-Varying Condition-Specific Transcriptional Network Construction by Network Propagation. Front Plant Sci 2019; 10:698. [PMID: 31258543 PMCID: PMC6587906 DOI: 10.3389/fpls.2019.00698] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/04/2019] [Accepted: 05/09/2019] [Indexed: 06/09/2023]
Abstract
Transcription factors (TFs) have a significant influence on the state of a cell because they regulate multiple downstream genes. Thus, experimental and computational biologists have made great efforts to construct TF gene networks for regulatory interactions between TFs and their target genes. Now, an important research question is how to use TF networks to investigate the response of a plant to stress at the level of transcriptional control using time-series transcriptome data. In this article, we present a new computational method, PropaNet, to investigate the dynamics of TF networks from time-series transcriptome data using two state-of-the-art network analysis techniques, influence maximization and network propagation. PropaNet uses the influence maximization technique to produce a list of TFs ranked by how well each TF explains the differentially expressed genes (DEGs) at each time point. Then, a network propagation technique is used to select the group of TFs that best explains the DEGs as a whole. For the analysis of Arabidopsis time-series datasets from AtGenExpress, we used PlantRegMap as a template TF network and performed PropaNet analysis to investigate the transcriptional dynamics of Arabidopsis under cold and heat stress. The time-varying TF networks showed that Arabidopsis responded to cold and heat stress quite differently. For cold stress, bHLH and bZIP type TFs were the first responding TFs, and the cold signal influenced histone variants and various genes involved in cell architecture, osmosis and restructuring of cells. By contrast, heat stress led to up-regulation of genes related to accelerating differentiation and starting re-differentiation. In terms of energy metabolism, plants under heat stress showed elevated metabolic processes, resulting in an exhausted state. We believe that PropaNet will be useful for constructing condition-specific, time-varying TF networks from time-series data in response to stress.
PropaNet is available at http://biohealth.snu.ac.kr/software/PropaNet.
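Network propagation, the second PropaNet step, is commonly implemented as a random walk with restart. The sketch below is a generic version, not PropaNet's code: it assumes an undirected template network given as a dense NumPy adjacency matrix, and the restart probability `alpha` and the function name are hypothetical. Scores end up concentrated on nodes that are well connected to the seed nodes.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, alpha=0.3, tol=1e-10, max_iter=1000):
    """Spread seed scores over an undirected network by random walk with restart.

    adj   : (n, n) symmetric, non-negative adjacency matrix of the template network
    seeds : length-n non-negative start scores (e.g. DEG signal mapped to genes)
    alpha : restart probability (hypothetical default, not taken from PropaNet)
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    # Column-normalize so each column is a transition distribution; isolated nodes stay 0.
    W = np.divide(adj, col_sums, out=np.zeros_like(adj), where=col_sums > 0)
    p0 = np.asarray(seeds, dtype=float)
    p0 = p0 / p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        # Walk one step along the edges, then restart at the seeds with probability alpha.
        p_next = (1 - alpha) * (W @ p) + alpha * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Path graph 0-1-2-3 with all seed mass on node 0.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
scores = random_walk_with_restart(path, [1, 0, 0, 0])
print(np.round(scores, 3))
```

On the path graph the propagated score decays with distance from the seed, which is the behaviour used to favour TFs close to many DEGs.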
Affiliation(s)
- Hongryul Ahn
- Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Kyuri Jo
- Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Dabin Jeong
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Minwoo Pak
- Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
- Jihye Hur
- Department of Crop Science, Konkuk University, Seoul, South Korea
- Woosuk Jung
- Department of Crop Science, Konkuk University, Seoul, South Korea
- Sun Kim
- Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea

39
Care MA, Westhead DR, Tooze RM. Parsimonious Gene Correlation Network Analysis (PGCNA): a tool to define modular gene co-expression for refined molecular stratification in cancer. NPJ Syst Biol Appl 2019; 5:13. [PMID: 30993001 PMCID: PMC6459838 DOI: 10.1038/s41540-019-0090-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/04/2018] [Accepted: 03/18/2019] [Indexed: 12/11/2022]
Abstract
Cancers converge onto shared patterns that arise from constraints placed by the biology of the originating cell lineage and microenvironment on programs driven by oncogenic events. Here we define consistent expression modules reflecting this structure in colon and breast cancer by exploiting expression data resources and a new computationally efficient approach that we validate against other comparable methods. This approach, Parsimonious Gene Correlation Network Analysis (PGCNA), allows comparison of network structures between these cancer types identifying shared modules of gene co-expression reflecting: cancer hallmarks, functional and structural gene batteries, copy number variation and biology of originating lineage. These networks along with the mapping of outcome data at gene and module level provide an interactive resource that generates context for relationships between genes within and between such modules. Assigning module expression values (MEVs) provides a tool to summarize network level gene expression in individual cases illustrating potential utility in classification and allowing analysis of linkage between module expression and mutational state. Exploiting TCGA data thus defines both recurrent patterns of association between module expression and mutation at data-set level, and exemplifies the polarization of mutation patterns with the leading edge of module expression at individual case level. We illustrate the scalable nature of the approach within immune response related modules, which in the context of breast cancer demonstrates the selective association of immune subsets, in particular mast cells, with the underlying mutational pattern. Together our analyses provide evidence for a generalizable framework to enhance molecular stratification in cancer.
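PGCNA's name refers to a parsimonious correlation network. One simple way to obtain such a network, sketched here as an assumption rather than the published procedure, is to keep only each gene's `k` strongest correlations before module detection; the function name and the default `k=3` are illustrative.

```python
import numpy as np

def top_k_correlation_edges(expr, k=3):
    """Sparsify a gene correlation network by keeping each gene's k strongest partners.

    expr : (n_genes, n_samples) expression matrix
    k    : edges kept per gene (hypothetical default)
    Returns undirected edges as (i, j) tuples with i < j.
    """
    corr = np.corrcoef(np.asarray(expr, dtype=float))
    np.fill_diagonal(corr, -np.inf)  # a gene never selects itself
    edges = set()
    for i in range(corr.shape[0]):
        # Indices of the k largest correlations in row i, strongest first.
        for j in np.argsort(corr[i])[::-1][:k]:
            edges.add((min(i, int(j)), max(i, int(j))))
    return edges

expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.0, 3.2, 3.9, 5.1],
    [5.0, 1.0, 4.0, 2.0, 3.0],
    [5.2, 0.9, 4.1, 2.2, 2.8],
])
print(sorted(top_k_correlation_edges(expr, k=1)))
```

Because every gene contributes at most `k` edges, the edge count grows linearly with the number of genes rather than quadratically, which keeps downstream clustering cheap.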
Affiliation(s)
- Matthew A. Care
- Section of Experimental Haematology, Leeds Institute of Medical Research, University of Leeds, Leeds, LS9 7TF UK
- Bioinformatics Group, School of Molecular and Cellular Biology, University of Leeds, Leeds, LS2 9JT UK
- David R. Westhead
- Bioinformatics Group, School of Molecular and Cellular Biology, University of Leeds, Leeds, LS2 9JT UK
- Reuben M. Tooze
- Section of Experimental Haematology, Leeds Institute of Medical Research, University of Leeds, Leeds, LS9 7TF UK

40
Shi M, Shen W, Chong Y, Wang HQ. Improving GRN re-construction by mining hidden regulatory signals. IET Syst Biol 2019; 11:174-181. [PMID: 29125126 PMCID: PMC8687237 DOI: 10.1049/iet-syb.2017.0013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/09/2023]
Abstract
Inferring gene regulatory networks (GRNs) from gene expression data is an important but challenging issue in systems biology. Here, the authors propose a dictionary learning-based approach that aims to infer GRNs by globally mining regulatory signals, known or latent. Gene expression is often regulated by various regulatory factors, some of which are observed and some of which are latent. The authors assume that all regulators are unknown for a target gene and the expression of the target gene can be mapped into a regulatory space spanned by all the regulators. Specifically, the authors modify the dictionary learning model, k-SVD, according to the sparse property of GRNs for mining the regulatory signals. The recovered regulatory signals are then used as a pool of regulatory factors to calculate a confidence score for a given transcription factor regulating a target gene. The capability of recovering hidden regulatory signals was verified on simulated data. Comparative experiments for GRN inference between the proposed algorithm (OURM) and some state-of-the-art algorithms, e.g. GENIE3 and ARACNE, on real-world data sets show the superior performance of OURM in inferring GRNs: higher area under the receiver operating characteristic curves and area under the precision-recall curves.
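The k-SVD dictionary learning model that the authors modify alternates a sparse-coding step with a dictionary update. A minimal sketch of the sparse-coding step, using orthogonal matching pursuit (OMP) in NumPy, is shown below; the sparsity-aware modification from the paper and the dictionary update are not reproduced, and `omp` is a hypothetical helper name.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Orthogonal matching pursuit: approximate y using at most n_nonzero atoms of D.

    D : (n_samples, n_atoms) dictionary with unit-norm columns (candidate signals)
    y : (n_samples,) signal to explain, e.g. one target gene's expression profile
    """
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support, sol = [], np.zeros(0)
    for _ in range(n_nonzero):
        # Greedily add the atom most correlated with what is still unexplained.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef

D = np.eye(4)
coef = omp(D, [0.0, 3.0, 0.0, -2.0], n_nonzero=2)
print(coef)
```

The non-zero entries of `coef` play the role of the few regulatory signals that explain a target; in the paper these recovered signals are then scored against known transcription factors.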
Affiliation(s)
- Ming Shi
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, People's Republic of China
- Weiming Shen
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, People's Republic of China
- Yanwen Chong
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, People's Republic of China
- Hong-Qiang Wang
- Machine Intelligence and Computational Biology Laboratory, Institute of Intelligent Machines, Chinese Academy of Sciences, PO Box 1130, Hefei 230031, People's Republic of China

41
Zhang F, Liu X, Zhang A, Jiang Z, Chen L, Zhang X. Genome-wide dynamic network analysis reveals a critical transition state of flower development in Arabidopsis. BMC Plant Biol 2019; 19:11. [PMID: 30616516 PMCID: PMC6323737 DOI: 10.1186/s12870-018-1589-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/20/2018] [Accepted: 12/04/2018] [Indexed: 05/06/2023]
Abstract
BACKGROUND The flowering transition, which is controlled by a complex and intricate gene regulatory network, plays an important role in plant reproduction. It is a challenge to identify the critical transition state as well as the genes that control the transition of flower development. With the emergence of massively parallel sequencing, a large amount of time-course transcriptome data has greatly facilitated the exploration of developmental phase transitions in plants. Although some network-based bioinformatics analyses have attempted to identify the genes that control the phase transition, they generally overlooked the dynamics of regulation and produced unreliable results. In addition, the results of these methods are not self-explanatory. RESULTS In this work, to reveal a critical transition state and identify the transition-specific genes of flower development, we performed a genome-wide dynamic network analysis of temporal gene expression data in Arabidopsis using the dynamic network biomarker (DNB) method. In the analysis, the DNB model, which exploits collective fluctuations and correlations of different metabolites at the network level, was used to detect the imminent critical transition state, or tipping point. The genes that control the phase transition can be identified by the difference in weighted correlations between the genes of interest and the other genes in the global network. To construct the gene regulatory network controlling the flowering transition, we applied the NARROMI algorithm, which reduces noisy, redundant and indirect regulations, to the expression data of the transition-specific genes. The critical transition state detected during flower formation corresponded to flowering development on days 7 to 9 in Arabidopsis.
Among the 233 genes identified as highly fluctuating at the transition state, a high percentage had maximum expression in pollen, and 24 genes were validated to participate in stress reaction processes as well as other floral-related pathways. Composed of three major subnetworks, a gene regulatory network with 150 nodes and 225 edges was found to be highly correlated with the flowering transition. Gene ontology (GO) enrichment analysis revealed that the identified genes are enriched in catalytic activity, metabolic process and cellular process. CONCLUSIONS This study provides a novel insight into identifying the real causality of the phase transition with genome-wide dynamic network analysis.
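The DNB criterion is usually summarized by a composite index that rises when a group of genes fluctuates strongly (high SD_d), becomes tightly inter-correlated (high PCC_d), and decouples from the rest of the network (low PCC_o). The sketch below computes that standard index at one time point; it assumes a genes-by-samples expression matrix per time point, and it does not reproduce the weighted-correlation gene selection used in this study.

```python
import numpy as np

def dnb_composite_index(expr, module):
    """Composite index CI = SD_d * PCC_d / PCC_o for a candidate DNB module.

    expr   : (n_genes, n_samples) expression at one time point
    module : indices of the candidate DNB genes (at least two)
    SD_d   : mean standard deviation of the module genes
    PCC_d  : mean |Pearson r| between pairs of module genes
    PCC_o  : mean |Pearson r| between module genes and all other genes
    """
    expr = np.asarray(expr, dtype=float)
    module = np.asarray(module)
    others = np.setdiff1d(np.arange(expr.shape[0]), module)
    corr = np.abs(np.corrcoef(expr))
    m = len(module)
    sd_d = expr[module].std(axis=1, ddof=1).mean()
    inner = corr[np.ix_(module, module)]
    pcc_d = (inner.sum() - m) / (m * (m - 1))  # drop the diagonal of 1s
    pcc_o = corr[np.ix_(module, others)].mean()
    return sd_d * pcc_d / pcc_o

rng = np.random.default_rng(0)
shared = rng.normal(size=20)
expr = rng.normal(size=(8, 20))                          # background genes: independent noise
expr[:3] = 3 * shared + 0.1 * rng.normal(size=(3, 20))   # genes 0-2 fluctuate together
print(dnb_composite_index(expr, [0, 1, 2]), dnb_composite_index(expr, [3, 4, 5]))
```

Tracking this index across time points and looking for a sharp peak is how a tipping point such as the day 7 to 9 window would be flagged.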
Affiliation(s)
- Fuping Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074 China
- University of Chinese Academy of Sciences, Beijing, 100049 China
- Xiaoping Liu
- Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
- Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074 China
- Zhonglin Jiang
- Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
- Luonan Chen
- Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
- Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074 China

42
Huynh-Thu VA, Geurts P. Unsupervised Gene Network Inference with Decision Trees and Random Forests. Methods Mol Biol 2019; 1883:195-215. [PMID: 30547401 DOI: 10.1007/978-1-4939-8882-2_8] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/11/2022]
Abstract
In this chapter, we introduce the reader to a popular family of machine learning algorithms, called decision trees. We then review several approaches based on decision trees that have been developed for the inference of gene regulatory networks (GRNs). Decision trees have indeed several nice properties that make them well-suited for tackling this problem: they are able to detect multivariate interacting effects between variables, are non-parametric, have good scalability, and have very few parameters. In particular, we describe in detail the GENIE3 algorithm, a state-of-the-art method for GRN inference.
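GENIE3 decomposes network inference into one regression problem per target gene and scores each candidate regulator by its tree-ensemble importance. To keep the sketch dependency-free, it uses an ensemble of depth-1 regression trees (stumps) on bootstrap samples instead of full random forests, so it illustrates the decomposition and the variance-reduction importance rather than GENIE3 itself; `stump_importance`, `genie3_like` and all defaults are hypothetical.

```python
import numpy as np

def stump_importance(X, y, n_trees=200, max_features=None, seed=0):
    """Importance of each candidate regulator (column of X) for one target y.

    Each 'tree' is a depth-1 regression tree (stump) grown on a bootstrap sample
    and restricted to a random subset of regulators; the variance reduction of its
    split is credited to the chosen regulator. Scores are normalized to sum to 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n, p = X.shape
    k = max_features or max(1, int(np.sqrt(p)))
    importance = np.zeros(p)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)                 # bootstrap sample
        Xb, yb = X[idx], y[idx]
        total = ((yb - yb.mean()) ** 2).sum()
        best_gain, best_j = 0.0, None
        for j in rng.choice(p, size=k, replace=False):   # random regulator subset
            order = np.argsort(Xb[:, j])
            xs, ys = Xb[order, j], yb[order]
            for s in range(1, n):
                if xs[s] == xs[s - 1]:
                    continue                             # no threshold between ties
                left, right = ys[:s], ys[s:]
                sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
                if total - sse > best_gain:
                    best_gain, best_j = total - sse, j
        if best_j is not None:
            importance[best_j] += best_gain
    norm = importance.sum()
    return importance / norm if norm > 0 else importance

def genie3_like(expr, **kwargs):
    """One regression per target: W[i, j] = importance of gene i for target gene j.

    expr : (n_samples, n_genes) expression matrix
    """
    n_genes = expr.shape[1]
    W = np.zeros((n_genes, n_genes))
    for j in range(n_genes):
        regs = [i for i in range(n_genes) if i != j]
        W[regs, j] = stump_importance(expr[:, regs], expr[:, j], **kwargs)
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))                     # three candidate regulators
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=30)    # only regulator 0 drives the target
imp = stump_importance(X, y)
print(np.round(imp, 2))
```

Ranking all entries of the matrix returned by `genie3_like` gives the edge list; full trees would additionally capture the multivariate interaction effects the chapter highlights.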
Affiliation(s)
- Vân Anh Huynh-Thu
- Department of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium
- Pierre Geurts
- Department of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium

43
Yang B, Xu Y, Maxwell A, Koh W, Gong P, Zhang C. MICRAT: a novel algorithm for inferring gene regulatory networks using time series gene expression data. BMC Syst Biol 2018; 12:115. [PMID: 30547796 PMCID: PMC6293491 DOI: 10.1186/s12918-018-0635-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/11/2022]
Abstract
Background Reconstruction of gene regulatory networks (GRNs), also known as reverse engineering of GRNs, aims to infer the potential regulation relationships between genes. With the development of biotechnology, such as gene chip microarray and RNA-sequencing, the high-throughput data generated provide us with more opportunities to infer the gene-gene interaction relationships using gene expression data and hence understand the underlying mechanism of biological processes. Gene regulatory networks are known to exhibit a multiplicity of interaction mechanisms which include functional and non-functional, and linear and non-linear relationships. Meanwhile, the regulatory interactions between genes and gene products are not spontaneous since various processes involved in producing fully functional and measurable concentrations of transcriptional factors/proteins lead to a delay in gene regulation. Many different approaches for reconstructing GRNs have been proposed, but the existing GRN inference approaches such as probabilistic Boolean networks and dynamic Bayesian networks have various limitations and relatively low accuracy. Inferring GRNs from time series microarray data or RNA-sequencing data remains a very challenging inverse problem due to its nonlinearity, high dimensionality, sparse and noisy data, and significant computational cost, which motivates us to develop more effective inference methods. Results We developed a novel algorithm, MICRAT (Maximal Information coefficient with Conditional Relative Average entropy and Time-series mutual information), for inferring GRNs from time series gene expression data. Maximal information coefficient (MIC) is an effective measure of dependence for two-variable relationships. It captures a wide range of associations, both functional and non-functional, and thus has good performance on measuring the dependence between two genes. Our approach mainly includes two procedures. 
Firstly, it employs the maximal information coefficient to construct an undirected graph representing the underlying relationships between genes. Secondly, it directs the edges in the undirected graph to infer regulators and their targets. In this procedure, the conditional relative average entropies of each pair of nodes (or genes) are used to indicate edge directions. Since a time delay may exist between the expression of regulators and their target genes, time series mutual information is combined with these entropies to direct the edges. We evaluated the performance of MICRAT on synthetic datasets as well as real gene expression data and compared it with other GRN inference methods. We inferred five 10-gene and five 100-gene networks from the DREAM4 challenge, which were generated using the gene expression simulator GeneNetWeaver (GNW). MICRAT was also used to reconstruct GRNs from real gene expression data, including part of the DNA damage response pathway (SOS DNA repair network) and an experimental dataset in E. coli. The results showed that MICRAT significantly improved inference accuracy compared with other inference methods, such as TDBN. Conclusion In this work, we proposed a novel algorithm, MICRAT, for inferring GRNs from time series gene expression data by taking into account the dependence and time delay between the expression of a regulator and its target genes. This approach employed maximal information coefficients to reconstruct an undirected graph representing the underlying relationships between genes. The edges were directed by combining conditional relative average entropy with time course mutual information of gene pairs. The proposed algorithm was evaluated on the benchmark GRNs provided by the DREAM4 challenge and on part of the real SOS DNA repair network in E. coli.
The experimental study showed that our approach was comparable to other methods on 10-gene datasets and outperformed other methods on 100-gene datasets in GRN inference from time series datasets.
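The time-delay idea in MICRAT orients an edge by asking whether the past of one gene is more informative about the future of the other than the reverse. The sketch below uses a plain histogram estimate of mutual information with a one-step lag; it does not implement MIC or the conditional relative average entropy, and the function names and bin count are assumptions.

```python
import numpy as np

def mutual_information(x, y, bins=4):
    """Plug-in mutual information (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells (0 * log 0 = 0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def orient_edge(x, y, lag=1, bins=4):
    """Return 'x->y' if x's past is more informative about y's future than vice versa."""
    forward = mutual_information(x[:-lag], y[lag:], bins)
    backward = mutual_information(y[:-lag], x[lag:], bins)
    return "x->y" if forward >= backward else "y->x"

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = np.zeros(300)
y[1:] = x[:-1] + 0.1 * rng.normal(size=299)  # y follows x with a one-step delay
print(orient_edge(x, y))
```

Histogram estimators are biased upward on short series, so with the few time points typical of DREAM4 data a permutation baseline or a more robust estimator would be needed in practice.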
Affiliation(s)
- Bei Yang
- School of Information & Engineering, Zhengzhou University, Zhengzhou, 450000, China; Center of Precision Medicine, Zhengzhou University, Zhengzhou, 450000, China
- Yaohui Xu
- School of Information & Engineering, Zhengzhou University, Zhengzhou, 450000, China
- Andrew Maxwell
- School of Computing, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
- Wonryull Koh
- School of Computing, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
- Ping Gong
- Environmental Lab, US Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA
- Chaoyang Zhang
- School of Computing, University of Southern Mississippi, Hattiesburg, MS, 39406, USA

44
Legeay M, Aubourg S, Renou JP, Duval B. Large scale study of anti-sense regulation by differential network analysis. BMC Syst Biol 2018; 12:95. [PMID: 30458828 PMCID: PMC6245689 DOI: 10.1186/s12918-018-0613-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/21/2022]
Abstract
Background Systems biology aims to analyse regulation mechanisms within the cell. By mapping interactions observed in different situations, differential network analysis has shown its power to reveal specific cellular responses or specific dysfunctional regulations. In this work, we propose to explore on a large scale the role of natural anti-sense transcription in gene regulation mechanisms, and we focus our study on apple (Malus domestica) in the context of fruit ripening in cold storage. Results We present a differential functional analysis of the sense and anti-sense transcriptomic data that reveals functional terms linked to the ripening process. To develop our differential network analysis, we introduce our inference method of an Extended Core Network; this method is inspired by C3NET, but extends the notion of significant interactions. By comparing two extended core networks, one inferred with sense data and the other inferred with sense and anti-sense data, our differential analysis is first performed from a local view and reveals AS-impacted genes, that is, genes whose important interactions are impacted by anti-sense transcription. The motifs surrounding AS-impacted genes gather transcripts with functions mostly consistent with the biological context of the data, and the method allows us to identify new actors involved in ripening and cold acclimation pathways and to decipher their interactions. Then, from a more global view, we compute minimal sub-networks that connect the AS-impacted genes using Steiner trees. These Steiner trees allow us to study the rewiring of the AS-impacted genes in the network with anti-sense actors. Conclusion Anti-sense transcription is usually ignored in transcriptomic studies. The large-scale differential analysis of apple data that we propose reveals that anti-sense regulation may have an important impact on several cellular stress response mechanisms.
Our data mining process makes it possible to highlight specific interactions that deserve further experimental investigation.
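C3NET, on which the Extended Core Network is based, keeps for every gene only its single strongest significant interaction. A minimal sketch is shown below, assuming Gaussian-approximated mutual information computed from Pearson correlation and a hypothetical fixed threshold in place of a proper significance test; the extended notion of significant interactions from this paper is not reproduced.

```python
import numpy as np

def c3net_core_edges(expr, mi_threshold=0.1):
    """C3NET-style core network: each gene keeps only its single best partner.

    expr : (n_genes, n_samples). MI is estimated with the Gaussian formula
    MI = -0.5 * log(1 - r^2), an assumption made here for brevity.
    """
    r = np.corrcoef(np.asarray(expr, dtype=float))
    np.fill_diagonal(r, 0.0)                       # ignore self-pairs
    mi = -0.5 * np.log(1.0 - np.clip(r ** 2, 0.0, 1.0 - 1e-12))
    edges = set()
    for i in range(mi.shape[0]):
        j = int(np.argmax(mi[i]))
        if mi[i, j] > mi_threshold:                # keep the best edge only if "significant"
            edges.add((min(i, j), max(i, j)))
    return edges

expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.0, 3.2, 3.9, 5.1],
    [5.0, 1.0, 4.0, 2.0, 3.0],
    [5.2, 0.9, 4.1, 2.2, 2.8],
])
print(sorted(c3net_core_edges(expr)))
```

A differential analysis in the spirit of the paper would build one such network from sense data and one from sense plus anti-sense data, then compare the edges incident to each gene.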
Affiliation(s)
- Marc Legeay
- LERIA, Université d'Angers, 2 bd Lavoisier, Angers, 49045, France; IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, 49071, France
- Sébastien Aubourg
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, 49071, France
- Jean-Pierre Renou
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, 49071, France
- Béatrice Duval
- LERIA, Université d'Angers, 2 bd Lavoisier, Angers, 49045, France

45
Yang B, Chen Y, Zhang W, Lv J, Bao W, Huang DS. HSCVFNT: Inference of Time-Delayed Gene Regulatory Network Based on Complex-Valued Flexible Neural Tree Model. Int J Mol Sci 2018; 19:E3178. [PMID: 30326663 PMCID: PMC6214043 DOI: 10.3390/ijms19103178] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/11/2018] [Revised: 10/08/2018] [Accepted: 10/10/2018] [Indexed: 11/17/2022]
Abstract
Gene regulatory network (GRN) inference can help us understand the growth and development of animals and plants and reveal the mysteries of biology. Many computational approaches have been proposed to infer GRNs. However, these inference approaches have hardly met the needs of modeling, and redundancy-reduction methods based on a single information-theoretic measure have poor universality and stability. To overcome these limitations, this paper proposes a novel algorithm, named HSCVFNT, to infer gene regulatory networks with time-delayed regulations by using a hybrid scoring method and a complex-valued flexible neural tree (CVFNT). The regulations of each target gene can be obtained by iteratively performing HSCVFNT. For each target gene, the HSCVFNT algorithm uses a novel scoring method based on time-delayed mutual information (TDMI), time-delayed maximum information coefficient (TDMIC) and time-delayed correlation coefficient (TDCC) to reduce the redundancy of regulatory relationships and obtain the candidate regulatory factor set. Then, the TDCC method is used to create a time-delayed gene expression time-series matrix. Finally, a complex-valued flexible neural tree model is proposed to infer the time-delayed regulations of each target gene from the time-delayed time-series matrix. Three real time-series expression datasets, from the SOS (Save Our Soul) DNA repair system in E. coli and from Saccharomyces cerevisiae, are used to evaluate the performance of the HSCVFNT algorithm. HSCVFNT obtains F-scores of 0.923, 0.8 and 0.625 for SOS network and IRMA (In vivo Reverse-Engineering and Modeling Assessment) network inference, respectively, which are 5.5%, 14.3% and 72.2% higher than the best performance of other state-of-the-art GRN inference methods and time-delayed methods.
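The time-delayed correlation coefficient (TDCC) step can be illustrated by scanning candidate lags and keeping the one at which a regulator's earlier expression best correlates with the target's later expression. This is a generic sketch: the function name and `max_lag` are assumptions, and TDMI and TDMIC are not included.

```python
import numpy as np

def time_delayed_correlation(x, y, max_lag=3):
    """Return (best_lag, r): the lag at which x best correlates with later values of y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        a = x[: len(x) - lag]   # regulator at time t
        b = y[lag:]             # target at time t + lag
        r = np.corrcoef(a, b)[0, 1]
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, float(best_r)

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = rng.normal(size=100)
y[2:] = x[:-2]  # y repeats x two steps later
print(time_delayed_correlation(x, y, max_lag=3))
```

Shifting each candidate regulator by its best lag before model fitting is one way to build the time-delayed expression matrix the abstract describes.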
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
- Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan 250002, China
- Wei Zhang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
- Jiaguo Lv
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
- Wenzheng Bao
- School of Computer Science, China University of Mining and Technology, Xuzhou 221000, China
- De-Shuang Huang
- Institute of Machine Learning and Systems Biology, Tongji University, Shanghai 200092, China

46
Pirgazi J, Khanteymoori AR. A robust gene regulatory network inference method base on Kalman filter and linear regression. PLoS One 2018; 13:e0200094. [PMID: 30001352 PMCID: PMC6044105 DOI: 10.1371/journal.pone.0200094] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 01/15/2018] [Accepted: 06/19/2018] [Indexed: 11/24/2022]
Abstract
The reconstruction of the topology of gene regulatory networks (GRNs) using high-throughput genomic data, such as microarray gene expression data, is an important problem in systems biology. The main challenge in gene expression data is the high number of genes and low number of samples; the data are also often contaminated with noise. In this paper, to deal with the noisy data, a Kalman filter-based method that can use prior knowledge when learning the network was used. In the first phase of the proposed method, named KFLR, noisy regulations with low correlations were removed using mutual information. The proposed method used a new closed-form solution to compute the posterior probabilities of the edges from regulators to the target gene within a hybrid framework of Bayesian model averaging and linear regression methods. To show its efficiency, the proposed method was compared with several well-known methods. The evaluation results indicate that the proposed method improved inference accuracy and recovered regulatory relations better from noisy data.
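When the regression weights linking regulators to a target are treated as a static hidden state, a Kalman filter reduces to recursive Bayesian linear regression, with the prior mean and variance acting as the prior knowledge the abstract mentions. The sketch below shows only that idea; KFLR's mutual-information pre-filtering and its closed-form Bayesian model averaging over edges are not reproduced, and the function name, noise variance and prior variance are assumptions.

```python
import numpy as np

def kalman_regression(X, y, prior_mean=None, prior_var=10.0, noise_var=1.0):
    """Estimate weights w in y_t = x_t . w + noise with a Kalman filter.

    The state w is assumed static (no process noise), so each update is one step
    of recursive Bayesian linear regression; prior_mean/prior_var encode prior
    knowledge about the regulator-to-target edges.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    w = np.zeros(p) if prior_mean is None else np.asarray(prior_mean, dtype=float).copy()
    P = np.eye(p) * prior_var                # state covariance
    for x_t, y_t in zip(X, y):
        s = x_t @ P @ x_t + noise_var        # innovation variance
        K = P @ x_t / s                      # Kalman gain
        w = w + K * (y_t - x_t @ w)          # correct the estimate with the innovation
        P = P - np.outer(K, x_t @ P)         # shrink the covariance
    return w, P

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))                # expression of three candidate regulators
w_true = np.array([1.5, 0.0, -2.0])          # regulator 1 has no effect
y = X @ w_true + 0.1 * rng.normal(size=200)  # target gene expression
w_hat, P = kalman_regression(X, y, noise_var=0.01)
print(np.round(w_hat, 2))
```

The diagonal of `P` gives a per-edge uncertainty, which is the kind of quantity a posterior edge probability can be built on.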
Affiliation(s)
- Jamshid Pirgazi
- Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
- Ali Reza Khanteymoori
- Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran

47
Lee CJ, Kang D, Lee S, Lee S, Kang J, Kim S. In silico experiment system for testing hypothesis on gene functions using three condition specific biological networks. Methods 2018; 145:10-15. [PMID: 29758273 DOI: 10.1016/j.ymeth.2018.05.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/07/2018] [Revised: 04/30/2018] [Accepted: 05/03/2018] [Indexed: 01/18/2023]
Abstract
Determining the functions of a gene requires time-consuming, expensive biological experiments. Scientists can speed up this process if literature information and biological networks are adequately provided. In this paper, we present a web-based information system that performs in silico experiments to computationally test hypotheses about the function of a gene. A hypothesis specified in English by the user is converted to a set of genes using a literature and knowledge mining system called BEST. Condition-specific TF, miRNA and PPI (protein-protein interaction) networks are automatically generated by projecting gene and miRNA expression data onto template networks. An in silico experiment then tests how well the target genes are connected to the knockout gene through the condition-specific networks, and the result visualizes the paths from the knockout gene to the target genes in the three networks. Statistical and information-theoretic scores are provided on the resulting web page to help scientists accept or reject the hypothesis being tested. The system was extensively tested on three data sets, namely the E2f1, Lrrk2, and Dicer1 knockout data sets, and reproduced the gene functions reported in the original research papers. In addition, we tested it comprehensively using every disease name in MalaCards as a hypothesis to show its effectiveness. The in silico experiment system can be very useful in suggesting biological mechanisms that can be further tested in vivo or in vitro. AVAILABILITY http://biohealth.snu.ac.kr/software/insilico/.
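The connectivity test described above can be illustrated with a breadth-first search from the knockout gene over a directed network. This is a hypothetical sketch: the toy edge list, the gene names, and the `shortest_paths` helper are assumptions, and it covers only path finding in a single network, not the three condition-specific networks or the statistical and information-theoretic scores the system reports.

```python
from collections import deque


def shortest_paths(edges, source, targets):
    """Return one shortest directed path from source to each target (None if unreachable).

    Hypothetical helper: edges is a list of (regulator, regulated) pairs.
    """
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    # Breadth-first search, remembering the parent that first reached each node
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    paths = {}
    for t in targets:
        if t not in parent:
            paths[t] = None
            continue
        # Walk parent links back to the source, then reverse
        path, node = [], t
        while node is not None:
            path.append(node)
            node = parent[node]
        paths[t] = path[::-1]
    return paths


# Hypothetical toy network: knockout gene "KO" and targets "T1", "T2", "T3".
edges = [("KO", "A"), ("A", "T1"), ("KO", "B"), ("B", "A"), ("B", "T2")]
paths = shortest_paths(edges, "KO", ["T1", "T2", "T3"])
for target, path in paths.items():
    print(target, " -> ".join(path) if path else "unreachable")
reached = sum(p is not None for p in paths.values())
print(f"{reached}/{len(paths)} targets reachable")
```

Here T1 and T2 are reached in two steps while T3 is unreachable; a real system would then score such connectivity statistically rather than simply counting reachable targets.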
Affiliation(s)
- Chai-Jin Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Dongwon Kang
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
- Sangseon Lee
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
- Sunwon Lee
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea; Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea; Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea
48
Pirayre A, Couprie C, Duval L, Pesquet JC. BRANE Clust: Cluster-Assisted Gene Regulatory Network Inference Refinement. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:850-860. [PMID: 28368827 DOI: 10.1109/tcbb.2017.2688355] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
Discovering meaningful gene interactions is crucial for identifying novel regulatory processes in cells. Accurately building the corresponding graphs remains challenging because the available data admit a large number of possible solutions. Nonetheless, enforcing a priori constraints on the graph structure, such as modularity, may reduce network indeterminacy. BRANE Clust (Biologically-Related A priori Network Enhancement with Clustering) refines gene regulatory network (GRN) inference using cluster information. It works as a post-processing tool for inference methods (e.g., CLR, GENIE3). In BRANE Clust, clustering is based on inverting a system of linear equations involving a graph-Laplacian matrix that promotes a modular structure. Our approach is validated on the DREAM4 and DREAM5 datasets with objective measures, showing significant comparative improvements. We provide additional insights into the discovery of novel regulatory or co-expressed links in the inferred Escherichia coli network, evaluated using the STRING database. The relevance of the clustering is compared computationally (against SIMoNe, WGCNA, X-means) and biologically (against RegulonDB). BRANE Clust software is available at: http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-clust.html.
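One common way to cluster by inverting a linear system built on a graph Laplacian is the seeded random-walker formulation: with Laplacian L = D - W split into seeded (S) and unseeded (U) genes, each cluster c solves L_UU x_c = -L_US m_c, where m_c marks the seeds of c, and each unseeded gene joins the cluster with the largest x_c. Whether BRANE Clust uses exactly this system is an assumption here; the sketch is illustrative, with hypothetical helper names, seeds, and toy weights, and it leaves out BRANE Clust's refinement of the inferred edges.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def random_walker_clusters(W, seeds):
    """Hypothetical helper: W is a symmetric weight matrix, seeds maps node -> label."""
    n = len(W)
    # Graph Laplacian L = D - W
    L = [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
    unseeded = [i for i in range(n) if i not in seeds]
    labels = sorted(set(seeds.values()))
    L_UU = [[L[i][j] for j in unseeded] for i in unseeded]
    probs = {}
    for c in labels:
        # Right-hand side -L_US m_c, with m_c[s] = 1 for the seeds of cluster c
        rhs = [-sum(L[i][s] for s, lab in seeds.items() if lab == c) for i in unseeded]
        probs[c] = solve(L_UU, rhs)
    assignment = dict(seeds)
    for k, node in enumerate(unseeded):
        assignment[node] = max(labels, key=lambda c: probs[c][k])
    return assignment


# Toy graph: two tightly linked triangles joined by one weak edge (2-3).
W = [[0.0] * 6 for _ in range(6)]
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i][j] = W[j][i] = w
assignment = random_walker_clusters(W, {0: "A", 5: "B"})
print([assignment[i] for i in range(6)])
```

With one seed per triangle, the weak bridge keeps each triangle in its seed's cluster; the modular structure comes entirely from the Laplacian weights.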
49
dynGENIE3: dynamical GENIE3 for the inference of gene networks from time series expression data. Sci Rep 2018; 8:3384. [PMID: 29467401 PMCID: PMC5821733 DOI: 10.1038/s41598-018-21715-0] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Received: 05/05/2017] [Accepted: 02/06/2018] [Indexed: 11/22/2022]
Abstract
The elucidation of gene regulatory networks is one of the major challenges of systems biology. The gene measurements exploited by network inference methods are typically available either as steady-state expression vectors or as time series expression data. In our previous work, we proposed the GENIE3 method, which exploits variable importance scores derived from Random forests to identify the regulators of each target gene. This method provided state-of-the-art performance on several benchmark datasets, but it could not be applied specifically to time series expression data. We propose here an adaptation of GENIE3, called dynamical GENIE3 (dynGENIE3), for handling both time series and steady-state expression data. The proposed method is evaluated extensively on the artificial DREAM4 benchmarks and on three real time series expression datasets. Although dynGENIE3 does not systematically yield the best performance on every network, it is competitive with diverse methods from the literature while preserving the main advantages of GENIE3 in terms of scalability.
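In the full dynGENIE3 paper (not in the abstract above), each target gene j is modeled by an ODE, dx_j/dt = -alpha_j x_j + f_j(x), and f_j is learned with a Random forest. Time series become regression data through a finite-difference approximation of the derivative, while at steady state dx_j/dt = 0 gives f_j(x) = alpha_j x_j. The sketch below builds those training pairs in plain Python. The function names and toy data are hypothetical, alpha is assumed known, the target is excluded from its own inputs as GENIE3 does for candidate regulators (an assumption for this sketch), and the Random forest step is left out; one option would be to fit scikit-learn's `RandomForestRegressor` on the pairs and read its `feature_importances_` as regulator scores.

```python
def time_series_pairs(times, series, target, alpha):
    """Hypothetical helper: (inputs, outputs) for one target gene from one time series.

    series[k][g] is the expression of gene g at times[k]. The output at step k
    approximates f_j(x(t_k)) by
        (x_j(t_{k+1}) - x_j(t_k)) / (t_{k+1} - t_k) + alpha * x_j(t_k).
    """
    X, y = [], []
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        xk = series[k]
        deriv = (series[k + 1][target] - xk[target]) / dt  # finite-difference derivative
        X.append([v for g, v in enumerate(xk) if g != target])  # candidate regulators
        y.append(deriv + alpha * xk[target])
    return X, y


def steady_state_pairs(samples, target, alpha):
    """Hypothetical helper: at steady state dx_j/dt = 0, so f_j(x) = alpha * x_j."""
    X = [[v for g, v in enumerate(x) if g != target] for x in samples]
    y = [alpha * x[target] for x in samples]
    return X, y


# Toy time series of two genes; gene 1 is the target, gene 0 its only candidate regulator.
times = [0.0, 1.0, 2.0]
series = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]
X, y = time_series_pairs(times, series, target=1, alpha=0.5)
print(X, y)  # [[1.0], [0.5]] [0.5, 0.5]
```

Both helpers return data in the same (inputs, output) shape, which is what lets a single regression model per target consume time series and steady-state samples together.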
50
Yu B, Xu JM, Li S, Chen C, Chen RX, Wang L, Zhang Y, Wang MH. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method. Oncotarget 2017; 8:80373-80392. [PMID: 29113310 PMCID: PMC5655205 DOI: 10.18632/oncotarget.21268] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/20/2017] [Accepted: 08/27/2017] [Indexed: 01/31/2023]
Abstract
Gene regulatory network (GRN) research reveals complex life phenomena from the perspective of gene interaction and is an important research field in systems biology. Traditional Bayesian networks have high computational complexity, and their network-structure scoring models rely on a single feature. Information-based approaches cannot identify the direction of regulation. To address these shortcomings, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian networks (DBN), which for the first time constructs multiple time-delayed GRNs by combining a comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm to learn network structure profiles, that is, to construct the search space. Redundant regulations are then removed with a recursive optimization (RO) algorithm, which reduces the false positive rate. Next, the network structure profiles are decomposed losslessly into a set of cliques, which significantly reduces the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and to search for the optimal network structure. DBNCS is evaluated on benchmark GRN datasets from the DREAM challenge and on the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results support the rationale of the algorithm design and demonstrate its outstanding performance in GRN inference.
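CMI2NI scores gene pairs with a conditional dependence measure so that links explained by a third gene can be dropped from the search space. CMI2 itself is not reproduced here; as a simpler stand-in, the sketch computes the standard conditional mutual information of jointly Gaussian variables from covariance determinants, CMI(X;Y|Z) = 0.5 log(det C(X,Z) det C(Y,Z) / (det C(Z) det C(X,Y,Z))), with a single conditioning gene. The helper names and toy vectors are hypothetical.

```python
import math


def covariance(cols):
    """Sample covariance matrix of a list of equal-length variables."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    return [[sum((a[k] - ma) * (b[k] - mb) for k in range(n)) / (n - 1)
             for b, mb in zip(cols, means)]
            for a, ma in zip(cols, means)]


def det(M):
    """Determinant by cofactor expansion (adequate for the small matrices used here)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def gaussian_cmi(x, y, z):
    """Gaussian conditional mutual information CMI(X;Y|Z), in nats."""
    num = det(covariance([x, z])) * det(covariance([y, z]))
    den = det(covariance([z])) * det(covariance([x, y, z]))
    return 0.5 * math.log(num / den)


# Toy profiles built from three orthogonal, zero-mean patterns.
a, b, c = [1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]
z = a
x = [ai + bi for ai, bi in zip(a, b)]
y_indirect = [ai + ci for ai, ci in zip(a, c)]                # linked to x only through z
y_direct = [ai + bi + ci for ai, bi, ci in zip(a, b, c)]      # shares x's own variation
print(gaussian_cmi(x, y_indirect, z), gaussian_cmi(x, y_direct, z))
```

With this toy data the first value is essentially zero, because x and y_indirect co-vary only through z, while the second equals 0.5 ln 2 (about 0.347); a pruning threshold between the two would keep only the direct link.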
Affiliation(s)
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- CAS Key Laboratory of Geospace Environment, Department of Geophysics and Planetary Science, University of Science and Technology of China, Hefei 230026, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Jia-Meng Xu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Shan Li
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Cheng Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Rui-Xin Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Lei Wang
- Key Laboratory of Eco-chemical Engineering, Ministry of Education, Laboratory of Inorganic Synthesis and Applied Chemistry, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
- Yan Zhang
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- College of Electromechanical Engineering, Qingdao University of Science and Technology, Qingdao 266061, China
- Ming-Hui Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China