1
|
Wu Y, Qian B, Wang A, Dong H, Zhu E, Ma B. iLSGRN: inference of large-scale gene regulatory networks based on multi-model fusion. Bioinformatics 2023; 39:btad619. [PMID: 37851379 PMCID: PMC10589915 DOI: 10.1093/bioinformatics/btad619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Revised: 10/04/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023] Open
Abstract
MOTIVATION Gene regulatory networks (GRNs) are a way of describing the interaction between genes, which contribute to revealing the different biological mechanisms in the cell. Reconstructing GRNs based on gene expression data has been a central computational problem in systems biology. However, due to the high dimensionality and non-linearity of large-scale GRNs, accurately and efficiently inferring GRNs is still a challenging task. RESULTS In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. Firstly, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model leveraging the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the non-linear ordinary differential equations model of GRNs and improve the accuracy and stability of the inference algorithm. The extensive experiments on different scale datasets show that our method makes sensible improvement compared with the state-of-the-art methods. Furthermore, we perform cross-validation experiments on the real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION The proposed method is written in the Python language, and is available at: https://github.com/lab319/iLSGRN.
Collapse
Affiliation(s)
- Yiming Wu
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Bing Qian
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Anqi Wang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China
| | - Heng Dong
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Enqiang Zhu
- Institution of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
| | - Baoshan Ma
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| |
Collapse
|
2
|
Bao W, Lin X, Yang B, Chen B. Gene Regulatory Identification Based on the Novel Hybrid Time-Delayed Method. Front Genet 2022; 13:888786. [PMID: 35664311 PMCID: PMC9161097 DOI: 10.3389/fgene.2022.888786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2022] [Accepted: 04/06/2022] [Indexed: 11/28/2022] Open
Abstract
Gene regulatory network (GRN) inference with biology data is a difficult and serious issue in the field of system biology. In order to detect the direct associations of GRN more accurately, a novel two-step GRN inference technique based on the time-delayed correlation coefficient (TDCC) and time-delayed complex-valued S-system model (TDCVSS) is proposed. First, a TDCC algorithm is utilized to construct an initial network. Second, a TDCVSS model is utilized to prune the network topology in order to delete false-positive regulatory relationships for each target gene. The complex-valued restricted additive tree and complex-valued differential evolution are proposed to approximate the optimal TDCVSS model. Finally, the overall network could be inferred by integrating the regulations of all target genes. Two real gene expression datasets from E. coli and S. cerevisiae gene networks are utilized to evaluate the performances of our proposed two-step GRN inference algorithm. The results demonstrated that the proposed algorithm could infer GRN more correct than classical methods and time-delayed methods.
Collapse
Affiliation(s)
- Wenzheng Bao
- School of Information Engineering, Xuzhou University of Technology, Xuzhou, China
| | - Xiao Lin
- Department of Pharmaceutics, Zaozhuang Municipal Hospital, Zaozhuang, China
- *Correspondence: Xiao Lin,
| | - Bin Yang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang, China 277160
| | - Baitong Chen
- Xuzhou Municipal First People’s Hospital, Xuzhou, China
| |
Collapse
|
3
|
Data-driven learning how oncogenic gene expression locally alters heterocellular networks. Nat Commun 2022; 13:1986. [PMID: 35418177 PMCID: PMC9007999 DOI: 10.1038/s41467-022-29636-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2021] [Accepted: 03/22/2022] [Indexed: 11/21/2022] Open
Abstract
Developing drugs increasingly relies on mechanistic modeling and simulation. Models that capture causal relations among genetic drivers of oncogenesis, functional plasticity, and host immunity complement wet experiments. Unfortunately, formulating such mechanistic cell-level models currently relies on hand curation, which can bias how data is interpreted or the priority of drug targets. In modeling molecular-level networks, rules and algorithms are employed to limit a priori biases in formulating mechanistic models. Here we combine digital cytometry with Bayesian network inference to generate causal models of cell-level networks linking an increase in gene expression associated with oncogenesis with alterations in stromal and immune cell subsets from bulk transcriptomic datasets. We predict how increased Cell Communication Network factor 4, a secreted matricellular protein, alters the tumor microenvironment using data from patients diagnosed with breast cancer and melanoma. Predictions are then tested using two immunocompetent mouse models for melanoma, which provide consistent experimental results. While mechanistic models play increasing roles in immuno-oncology, hand network curation is current practice. Here the authors use a Bayesian data-driven approach to infer how expression of a secreted oncogene alters the cellular landscape within the tumor.
Collapse
|
4
|
Li Y, Jann T, Vera-Licona P. Benchmarking time-series data discretization on inference methods. Bioinformatics 2019; 35:3102-3109. [PMID: 30657860 DOI: 10.1093/bioinformatics/btz036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2018] [Revised: 12/10/2018] [Accepted: 01/14/2019] [Indexed: 12/15/2022] Open
Abstract
SUMMARY The rapid development in quantitatively measuring DNA, RNA and protein has generated a great interest in the development of reverse-engineering methods, that is, data-driven approaches to infer the network structure or dynamical model of the system. Many reverse-engineering methods require discrete quantitative data as input, while many experimental data are continuous. Some studies have started to reveal the impact that the choice of data discretization has on the performance of reverse-engineering methods. However, more comprehensive studies are still greatly needed to systematically and quantitatively understand the impact that discretization methods have on inference methods. Furthermore, there is an urgent need for systematic comparative methods that can help select between discretization methods. In this work, we consider four published intracellular networks inferred with their respective time-series datasets. We discretized the data using different discretization methods. Across all datasets, changing the data discretization to a more appropriate one improved the reverse-engineering methods' performance. We observed no universal best discretization method across different time-series datasets. Thus, we propose DiscreeTest, a two-step evaluation metric for ranking discretization methods for time-series data. The underlying assumption of DiscreeTest is that an optimal discretization method should preserve the dynamic patterns observed in the original data across all variables. We used the same datasets and networks to show that DiscreeTest is able to identify an appropriate discretization among several candidate methods. To our knowledge, this is the first time that a method for benchmarking and selecting an appropriate discretization method for time-series data has been proposed. AVAILABILITY AND IMPLEMENTATION All the datasets, reverse-engineering methods and source code used in this paper are available in Vera-Licona's lab Github repository: https://github.com/VeraLiconaResearchGroup/Benchmarking_TSDiscretizations. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Yuezhe Li
- R.D. Berlin Center for Cell Analysis and Modeling, University of Connecticut School of Medicine, Farmington, CT, USA
| | - Tiffany Jann
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
| | - Paola Vera-Licona
- Center for Quantitative Medicine, University of Connecticut School of Medicine, Farmington, CT, USA.,Department of Cell Biology, University of Connecticut School of Medicine, Farmington, CT, USA.,Department of Pediatrics, University of Connecticut School of Medicine, Farmington, CT, USA.,Institute for Systems Genomics, University of Connecticut School of Medicine, Farmington, CT, USA
| |
Collapse
|
5
|
Yang B, Xu Y, Maxwell A, Koh W, Gong P, Zhang C. MICRAT: a novel algorithm for inferring gene regulatory networks using time series gene expression data. BMC SYSTEMS BIOLOGY 2018; 12:115. [PMID: 30547796 PMCID: PMC6293491 DOI: 10.1186/s12918-018-0635-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Background Reconstruction of gene regulatory networks (GRNs), also known as reverse engineering of GRNs, aims to infer the potential regulation relationships between genes. With the development of biotechnology, such as gene chip microarray and RNA-sequencing, the high-throughput data generated provide us with more opportunities to infer the gene-gene interaction relationships using gene expression data and hence understand the underlying mechanism of biological processes. Gene regulatory networks are known to exhibit a multiplicity of interaction mechanisms which include functional and non-functional, and linear and non-linear relationships. Meanwhile, the regulatory interactions between genes and gene products are not spontaneous since various processes involved in producing fully functional and measurable concentrations of transcriptional factors/proteins lead to a delay in gene regulation. Many different approaches for reconstructing GRNs have been proposed, but the existing GRN inference approaches such as probabilistic Boolean networks and dynamic Bayesian networks have various limitations and relatively low accuracy. Inferring GRNs from time series microarray data or RNA-sequencing data remains a very challenging inverse problem due to its nonlinearity, high dimensionality, sparse and noisy data, and significant computational cost, which motivates us to develop more effective inference methods. Results We developed a novel algorithm, MICRAT (Maximal Information coefficient with Conditional Relative Average entropy and Time-series mutual information), for inferring GRNs from time series gene expression data. Maximal information coefficient (MIC) is an effective measure of dependence for two-variable relationships. It captures a wide range of associations, both functional and non-functional, and thus has good performance on measuring the dependence between two genes. Our approach mainly includes two procedures. Firstly, it employs maximal information coefficient for constructing an undirected graph to represent the underlying relationships between genes. Secondly, it directs the edges in the undirected graph for inferring regulators and their targets. In this procedure, the conditional relative average entropies of each pair of nodes (or genes) are employed to indicate the directions of edges. Since the time delay might exist in the expression of regulators and target genes, time series mutual information is combined to cooperatively direct the edges for inferring the potential regulators and their targets. We evaluated the performance of MICRAT by applying it to synthetic datasets as well as real gene expression data and compare with other GRN inference methods. We inferred five 10-gene and five 100-gene networks from the DREAM4 challenge that were generated using the gene expression simulator GeneNetWeaver (GNW). MICRAT was also used to reconstruct GRNs on real gene expression data including part of the DNA-damaged response pathway (SOS DNA repair network) and experimental dataset in E. Coli. The results showed that MICRAT significantly improved the inference accuracy, compared to other inference methods, such as TDBN, etc. Conclusion In this work, a novel algorithm, MICRAT, for inferring GRNs from time series gene expression data was proposed by taking into account dependence and time delay of expressions of a regulator and its target genes. This approach employed maximal information coefficients for reconstructing an undirected graph to represent the underlying relationships between genes. The edges were directed by combining conditional relative average entropy with time course mutual information of pairs of genes. The proposed algorithm was evaluated on the benchmark GRNs provided by the DREAM4 challenge and part of the real SOS DNA repair network in E. Coli. The experimental study showed that our approach was comparable to other methods on 10-gene datasets and outperformed other methods on 100-gene datasets in GRN inference from time series datasets.
Collapse
Affiliation(s)
- Bei Yang
- School of Information & Engineering, Zhengzhou University, Zhengzhou, 450000, China. .,Center of Precision Medicine, Zhengzhou University, Zhengzhou, 450000, China.
| | - Yaohui Xu
- School of Information & Engineering, Zhengzhou University, Zhengzhou, 450000, China
| | - Andrew Maxwell
- School of Computing, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
| | - Wonryull Koh
- School of Computing, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
| | - Ping Gong
- Environmental Lab, US Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA
| | - Chaoyang Zhang
- School of Computing, University of Southern Mississippi, Hattiesburg, MS, 39406, USA.
| |
Collapse
|
6
|
Yang B, Chen Y, Zhang W, Lv J, Bao W, Huang DS. HSCVFNT: Inference of Time-Delayed Gene Regulatory Network Based on Complex-Valued Flexible Neural Tree Model. Int J Mol Sci 2018; 19:E3178. [PMID: 30326663 PMCID: PMC6214043 DOI: 10.3390/ijms19103178] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2018] [Revised: 10/08/2018] [Accepted: 10/10/2018] [Indexed: 11/17/2022] Open
Abstract
Gene regulatory network (GRN) inference can understand the growth and development of animals and plants, and reveal the mystery of biology. Many computational approaches have been proposed to infer GRN. However, these inference approaches have hardly met the need of modeling, and the reducing redundancy methods based on individual information theory method have bad universality and stability. To overcome the limitations and shortcomings, this thesis proposes a novel algorithm, named HSCVFNT, to infer gene regulatory network with time-delayed regulations by utilizing a hybrid scoring method and complex-valued flexible neural network (CVFNT). The regulations of each target gene can be obtained by iteratively performing HSCVFNT. For each target gene, the HSCVFNT algorithm utilizes a novel scoring method based on time-delayed mutual information (TDMI), time-delayed maximum information coefficient (TDMIC) and time-delayed correlation coefficient (TDCC), to reduce the redundancy of regulatory relationships and obtain the candidate regulatory factor set. Then, the TDCC method is utilized to create time-delayed gene expression time-series matrix. Finally, a complex-valued flexible neural tree model is proposed to infer the time-delayed regulations of each target gene with the time-delayed time-series matrix. Three real time-series expression datasets from (Save Our Soul) SOS DNA repair system in E. coli and Saccharomyces cerevisiae are utilized to evaluate the performance of the HSCVFNT algorithm. As a result, HSCVFNT obtains outstanding F-scores of 0.923, 0.8 and 0.625 for SOS network and (In vivo Reverse-Engineering and Modeling Assessment) IRMA network inference, respectively, which are 5.5%, 14.3% and 72.2% higher than the best performance of other state-of-the-art GRN inference methods and time-delayed methods.
Collapse
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China.
| | - Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan 250002, China.
| | - Wei Zhang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China.
| | - Jiaguo Lv
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China.
| | - Wenzheng Bao
- School of Computer Science, China University of Mining and Technology, Xuzhou 221000, China.
| | - De-Shuang Huang
- Institute of Machine Learning and Systems Biology, Tongji University, Shanghai 200092, China.
| |
Collapse
|
7
|
Barman S, Kwon YK. A novel mutual information-based Boolean network inference method from time-series gene expression data. PLoS One 2017; 12:e0171097. [PMID: 28178334 PMCID: PMC5298315 DOI: 10.1371/journal.pone.0171097] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2016] [Accepted: 01/16/2017] [Indexed: 11/27/2022] Open
Abstract
Background Inferring a gene regulatory network from time-series gene expression data in systems biology is a challenging problem. Many methods have been suggested, most of which have a scalability limitation due to the combinatorial cost of searching a regulatory set of genes. In addition, they have focused on the accurate inference of a network structure only. Therefore, there is a pressing need to develop a network inference method to search regulatory genes efficiently and to predict the network dynamics accurately. Results In this study, we employed a Boolean network model with a restricted update rule scheme to capture coarse-grained dynamics, and propose a novel mutual information-based Boolean network inference (MIBNI) method. Given time-series gene expression data as an input, the method first identifies a set of initial regulatory genes using mutual information-based feature selection, and then improves the dynamics prediction accuracy by iteratively swapping a pair of genes between sets of the selected regulatory genes and the other genes. Through extensive simulations with artificial datasets, MIBNI showed consistently better performance than six well-known existing methods, REVEAL, Best-Fit, RelNet, CST, CLR, and BIBN in terms of both structural and dynamics prediction accuracy. We further tested the proposed method with two real gene expression datasets for an Escherichia coli gene regulatory network and a fission yeast cell cycle network, and also observed better results using MIBNI compared to the six other methods. Conclusions Taken together, MIBNI is a promising tool for predicting both the structure and the dynamics of a gene regulatory network.
Collapse
Affiliation(s)
- Shohag Barman
- School of Electrical Engineering, University of Ulsan, Daehak-ro, Nam-gu, Ulsan, Republic of Korea
| | - Yung-Keun Kwon
- School of Electrical Engineering, University of Ulsan, Daehak-ro, Nam-gu, Ulsan, Republic of Korea
- * E-mail:
| |
Collapse
|
8
|
Yang B, Zhang W, Wang H, Song C, Chen Y. TDSDMI: Inference of time-delayed gene regulatory network using S-system model with delayed mutual information. Comput Biol Med 2016; 72:218-25. [DOI: 10.1016/j.compbiomed.2016.03.024] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2015] [Revised: 03/04/2016] [Accepted: 03/29/2016] [Indexed: 01/06/2023]
|