1
|
Xin J, Wang M, Qu L, Chen Q, Wang W, Wang Z. BIC-LP: A Hybrid Higher-Order Dynamic Bayesian Network Score Function for Gene Regulatory Network Reconstruction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:188-199. [PMID: 38127613 DOI: 10.1109/tcbb.2023.3345317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2023]
Abstract
Reconstructing gene regulatory networks(GRNs) is an increasingly hot topic in bioinformatics. Dynamic Bayesian network(DBN) is a stochastic graph model commonly used as a vital model for GRN reconstruction. But probabilistic characteristics of biological networks and the existence of data noise bring great challenges to GRN reconstruction and always lead to many false positive/negative edges. ScoreLasso is a hybrid DBN score function combining DBN and linear regression with good performance. Its performance is, however, limited by first-order assumption and ignorance of the initial network of DBN. In this article, an integrated model based on higher-order DBN model, higher-order Lasso linear regression model and Pearson correlation model is proposed. Based on this, a hybrid higher-order DBN score function for GRN reconstruction is proposed, namely BIC-LP. BIC-LP score function is constructed by adding terms based on Lasso linear regression coefficients and Pearson correlation coefficients on classical BIC score function. Therefore, it could capture more information from dataset and curb information loss, compared with both many existing Bayesian family score functions and many state-of-the-art methods for GRN reconstruction. Experimental results show that BIC-LP can reasonably eliminate some false positive edges while retaining most true positive edges, so as to achieve better GRN reconstruction performance.
Collapse
|
2
|
Liu X, Shi N, Wang Y, Ji Z, He S. Data-Driven Boolean Network Inference Using a Genetic Algorithm With Marker-Based Encoding. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1558-1569. [PMID: 33513105 DOI: 10.1109/tcbb.2021.3055646] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
The inference of Boolean networks is crucial for analyzing the topology and dynamics of gene regulatory networks. Many data-driven approaches using evolutionary algorithms have been proposed based on time-series data. However, the ability to infer both network topology and dynamics is restricted by their inflexible encoding schemes. To address this problem, we propose a novel Boolean network inference algorithm for inferring both network topology and dynamics simultaneously. The main idea is that, we use a marker-based genetic algorithm to encode both regulatory nodes and logical operators in a chromosome. By using the markers and introducing more logical operators, the proposed algorithm can infer more diverse candidate Boolean functions. The proposed algorithm is applied to five networks, including two artificial Boolean networks and three real-world gene regulatory networks. Compared with other algorithms, the experimental results demonstrate that our proposed algorithm infers more accurate topology and dynamics.
Collapse
|
3
|
Liu X, Wang Y, Shi N, Ji Z, He S. GAPORE: Boolean network inference using a genetic algorithm with novel polynomial representation and encoding scheme. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
|
4
|
|
5
|
Zheng R, Li M, Chen X, Zhao S, Wu FX, Pan Y, Wang J. An Ensemble Method to Reconstruct Gene Regulatory Networks Based on Multivariate Adaptive Regression Splines. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:347-354. [PMID: 30794516 DOI: 10.1109/tcbb.2019.2900614] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Gene regulatory networks (GRNs) play a key role in biological processes. However, GRNs are diverse under different biological conditions. Reconstructing gene regulatory networks (GRNs) from gene expression has become an important opportunity and challenge in the past decades. Although there are a lot of existing methods to infer the topology of GRNs, such as mutual information, random forest, and partial least squares, the accuracy is still low due to the noise and high dimension of the expression data. In this paper, we introduce an ensemble Multivariate Adaptive Regression Splines (MARS) based method to reconstruct the directed GRNs from multifactorial gene expression data, called PBMarsNet. PBMarsNet incorporates part mutual information (PMI) to pre-weight the candidate regulatory genes and then uses MARS to detect the nonlinear regulatory links. Moreover, we apply bootstrap to run the MARS multiple times and average the outputs of each MARS as the final score of regulatory links. The results on DREAM4 challenge and DREAM5 challenge datasets show PBMarsNet has a superior performance and generalization over other state-of-the-art methods.
Collapse
|
6
|
|
7
|
Malekpour SA, Alizad-Rahvar AR, Sadeghi M. LogicNet: probabilistic continuous logics in reconstructing gene regulatory networks. BMC Bioinformatics 2020; 21:318. [PMID: 32690031 PMCID: PMC7372900 DOI: 10.1186/s12859-020-03651-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2019] [Accepted: 07/10/2020] [Indexed: 11/10/2022] Open
Abstract
Background Gene Regulatory Networks (GRNs) have been previously studied by using Boolean/multi-state logics. While the gene expression values are usually scaled into the range [0, 1], these GRN inference methods apply a threshold to discretize the data, resulting in missing information. Most of studies apply fuzzy logics to infer the logical gene-gene interactions from continuous data. However, all these approaches require an a priori known network structure. Results Here, by introducing a new probabilistic logic for continuous data, we propose a novel logic-based approach (called the LogicNet) for the simultaneous reconstruction of the GRN structure and identification of the logics among the regulatory genes, from the continuous gene expression data. In contrast to the previous approaches, the LogicNet does not require an a priori known network structure to infer the logics. The proposed probabilistic logic is superior to the existing fuzzy logics and is more relevant to the biological contexts than the fuzzy logics. The performance of the LogicNet is superior to that of several Mutual Information-based and regression-based tools for reconstructing GRNs. Conclusions The LogicNet reconstructs GRNs and logic functions without requiring prior knowledge of the network structure. Moreover, in another application, the LogicNet can be applied for logic function detection from the known regulatory genes-target interactions. We also conclude that computational modeling of the logical interactions among the regulatory genes significantly improves the GRN reconstruction accuracy.
Collapse
Affiliation(s)
- Seyed Amir Malekpour
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
| | - Amir Reza Alizad-Rahvar
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
| | - Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
| |
Collapse
|
8
|
Chen H, Maduranga DAK, Mundra PA, Zheng J. Bayesian Data Fusion of Gene Expression and Histone Modification Profiles for Inference of Gene Regulatory Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:516-525. [PMID: 30207963 DOI: 10.1109/tcbb.2018.2869590] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Accurately reconstructing gene regulatory networks (GRNs) from high-throughput gene expression data has been a major challenge in systems biology for decades. Many approaches have been proposed to solve this problem. However, there is still much room for the improvement of GRN inference. Integrating data from different sources is a promising strategy. Epigenetic modifications have a close relationship with gene regulation. Hence, epigenetic data such as histone modification profiles can provide useful information for uncovering regulatory interactions between genes. In this paper, we propose a method to integrate epigenetic data into the inference of GRNs. In particular, a dynamic Bayesian network (DBN) is employed to infer gene regulations from time-series gene expression data. Epigenetic data (histone modification profiles here) are integrated into the prior probability distribution of the Bayesian model. Our method has been validated on both synthetic and real datasets. Experimental results show that the integration of epigenetic data can significantly improve the performance of GRN inference. As more epigenetic datasets become available, our method would be useful for elucidating the gene regulatory mechanisms driving various cellular activities. The source code and testing datasets are available at https://github.com/Zheng-Lab/MetaGRN/tree/master/histonePrior.
Collapse
|
9
|
Ahmed SS, Roy S, Kalita J. Assessing the Effectiveness of Causality Inference Methods for Gene Regulatory Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:56-70. [PMID: 29994618 DOI: 10.1109/tcbb.2018.2853728] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Causality inference is the use of computational techniques to predict possible causal relationships for a set of variables, thereby forming a directed network. Causality inference in Gene Regulatory Networks (GRNs) is an important, yet challenging task due to the limits of available data and lack of efficiency in existing causality inference techniques. A number of techniques have been proposed and applied to infer causal relationships in various domains, although they are not specific to regulatory network inference. In this paper, we assess the effectiveness of methods for inferring causal GRNs. We introduce seven different inference methods and apply them to infer directed edges in GRNs. We use time-series expression data from the DREAM challenges to assess the methods in terms of quality of inference and rank them based on performance. The best method is applied to Breast Cancer data to infer a causal network. Experimental results show that Causation Entropy is best, however, highly time-consuming and not feasible to use in a relatively large network. We infer Breast Cancer GRN with the second-best method, Transfer Entropy. The topological analysis of the network reveals that top out-degree genes such as SLC39A5 which are considered central genes, play important role in cancer progression.
Collapse
|
10
|
Li M, Zheng R, Li Y, Wu FX, Wang J. MGT-SM: A Method for Constructing Cellular Signal Transduction Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:417-424. [PMID: 28541220 DOI: 10.1109/tcbb.2017.2705143] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
A cellular signal transduction network is an important means to describe biological responses to environmental stimuli and exchange of biological signals. Constructing the cellular signal transduction network provides an important basis for the study of the biological activities, the mechanism of the diseases, drug targets and so on. The statistical approaches to network inference are popular in literature. Granger test has been used as an effective method for causality inference. Compared with bivariate granger tests, multivariate granger tests reduce the indirect causality and were used widely for the construction of cellular signal transduction networks. A multivariate Granger test requires that the number of time points in the time-series data is more than the number of nodes involved in the network. However, there are many real datasets with a few time points which are much less than the number of nodes in the network. In this study, we propose a new multivariate Granger test-based framework to construct cellular signal transduction network, called MGT-SM. Our MGT-SM uses SVD to compute the coefficient matrix from gene expression data and adopts Monte Carlo simulation to estimate the significance of directed edges in the constructed networks. We apply the proposed MGT-SM to Yeast Synthetic Network and MDA-MB-468, and evaluate its performance in terms of the recall and the AUC. The results show that MGT-SM achieves better results, compared with other popular methods (CGC2SPR, PGC, and DBN).
Collapse
|
11
|
Inference of Large-scale Time-delayed Gene Regulatory Network with Parallel MapReduce Cloud Platform. Sci Rep 2018; 8:17787. [PMID: 30542062 PMCID: PMC6290780 DOI: 10.1038/s41598-018-36180-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2018] [Accepted: 11/16/2018] [Indexed: 02/06/2023] Open
Abstract
Inference of gene regulatory network (GRN) is crucial to understand intracellular physiological activity and function of biology. The identification of large-scale GRN has been a difficult and hot topic of system biology in recent years. In order to reduce the computation load for large-scale GRN identification, a parallel algorithm based on restricted gene expression programming (RGEP), namely MPRGEP, is proposed to infer instantaneous and time-delayed regulatory relationships between transcription factors and target genes. In MPRGEP, the structure and parameters of time-delayed S-system (TDSS) model are encoded into one chromosome. An original hybrid optimization approach based on genetic algorithm (GA) and gene expression programming (GEP) is proposed to optimize TDSS model with MapReduce framework. Time-delayed GRNs (TDGRN) with hundreds of genes are utilized to test the performance of MPRGEP. The experiment results reveal that MPRGEP could infer more accurately gene regulatory network than other state-of-art methods, and obtain the convincing speedup.
Collapse
|
12
|
Yang B, Chen Y, Zhang W, Lv J, Bao W, Huang DS. HSCVFNT: Inference of Time-Delayed Gene Regulatory Network Based on Complex-Valued Flexible Neural Tree Model. Int J Mol Sci 2018; 19:E3178. [PMID: 30326663 PMCID: PMC6214043 DOI: 10.3390/ijms19103178] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2018] [Revised: 10/08/2018] [Accepted: 10/10/2018] [Indexed: 11/17/2022] Open
Abstract
Gene regulatory network (GRN) inference can understand the growth and development of animals and plants, and reveal the mystery of biology. Many computational approaches have been proposed to infer GRN. However, these inference approaches have hardly met the need of modeling, and the reducing redundancy methods based on individual information theory method have bad universality and stability. To overcome the limitations and shortcomings, this thesis proposes a novel algorithm, named HSCVFNT, to infer gene regulatory network with time-delayed regulations by utilizing a hybrid scoring method and complex-valued flexible neural network (CVFNT). The regulations of each target gene can be obtained by iteratively performing HSCVFNT. For each target gene, the HSCVFNT algorithm utilizes a novel scoring method based on time-delayed mutual information (TDMI), time-delayed maximum information coefficient (TDMIC) and time-delayed correlation coefficient (TDCC), to reduce the redundancy of regulatory relationships and obtain the candidate regulatory factor set. Then, the TDCC method is utilized to create time-delayed gene expression time-series matrix. Finally, a complex-valued flexible neural tree model is proposed to infer the time-delayed regulations of each target gene with the time-delayed time-series matrix. Three real time-series expression datasets from (Save Our Soul) SOS DNA repair system in E. coli and Saccharomyces cerevisiae are utilized to evaluate the performance of the HSCVFNT algorithm. As a result, HSCVFNT obtains outstanding F-scores of 0.923, 0.8 and 0.625 for SOS network and (In vivo Reverse-Engineering and Modeling Assessment) IRMA network inference, respectively, which are 5.5%, 14.3% and 72.2% higher than the best performance of other state-of-the-art GRN inference methods and time-delayed methods.
Collapse
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China.
| | - Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan 250002, China.
| | - Wei Zhang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China.
| | - Jiaguo Lv
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China.
| | - Wenzheng Bao
- School of Computer Science, China University of Mining and Technology, Xuzhou 221000, China.
| | - De-Shuang Huang
- Institute of Machine Learning and Systems Biology, Tongji University, Shanghai 200092, China.
| |
Collapse
|
13
|
Yu B, Xu JM, Li S, Chen C, Chen RX, Wang L, Zhang Y, Wang MH. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method. Oncotarget 2017; 8:80373-80392. [PMID: 29113310 PMCID: PMC5655205 DOI: 10.18632/oncotarget.21268] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2017] [Accepted: 08/27/2017] [Indexed: 01/31/2023] Open
Abstract
Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction, which is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. In order to make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian network (DBN) to construct the multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. DBNCS algorithm first uses CMI2NI (conditional mutual inclusive information-based network inference) algorithm for network structure profiles learning, namely the construction of search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reduce the false positive rate. Secondly, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of DBNCS algorithm is evaluated by the benchmark GRN datasets from DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs.
Collapse
Affiliation(s)
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- CAS Key Laboratory of Geospace Environment, Department of Geophysics and Planetary Science, University of Science and Technology of China, Hefei 230026, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Jia-Meng Xu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Shan Li
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Cheng Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Rui-Xin Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Lei Wang
- Key Laboratory of Eco-chemical Engineering, Ministry of Education, Laboratory of Inorganic Synthesis and Applied Chemistry, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
| | - Yan Zhang
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- College of Electromechanical Engineering, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Ming-Hui Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| |
Collapse
|