1
Pušnik Ž, Mraz M, Zimic N, Moškon M. SAILoR: Structure-Aware Inference of Logic Rules. PLoS One 2024; 19:e0304102. [PMID: 38861487 PMCID: PMC11166287 DOI: 10.1371/journal.pone.0304102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 05/07/2024] [Indexed: 06/13/2024]
Abstract
Boolean networks provide an effective mechanism for describing interactions and dynamics of gene regulatory networks (GRNs). Deriving accurate Boolean descriptions of GRNs is a challenging task. The number of experiments is usually much smaller than the number of genes. In addition, binarization leads to a loss of information and inconsistencies arise in binarized time-series data. The inference of Boolean networks from binarized time-series data alone often leads to complex and overfitted models. To obtain relevant Boolean models of gene regulatory networks, inference methods could incorporate data from multiple sources and prior knowledge in terms of general network structure and/or exact interactions. We propose the Boolean network inference method SAILoR (Structure-Aware Inference of Logic Rules). SAILoR incorporates time-series gene expression data in combination with provided reference networks to infer accurate Boolean models. SAILoR automatically extracts topological properties from reference networks. These can describe a more general structure of the GRN or can be more precise and describe specific interactions. SAILoR infers a Boolean network by learning from both continuous and binarized time-series data. It navigates between two main objectives, topological similarity to reference networks and correspondence with gene expression data. By incorporating the NSGA-II multi-objective genetic algorithm, SAILoR relies on the wisdom of crowds. Our results indicate that SAILoR can infer accurate and biologically relevant Boolean descriptions of GRNs from both a static and a dynamic perspective. We show that SAILoR improves the static accuracy of the inferred network compared to the network inference method dynGENIE3. Furthermore, we compared the performance of SAILoR with other Boolean network inference approaches including Best-Fit, REVEAL, MIBNI, GABNI, ATEN, and LogBTF. 
We have shown that by incorporating prior knowledge about the overall network structure, SAILoR can improve the structural correctness of the inferred Boolean networks while maintaining dynamic accuracy. To demonstrate the applicability of SAILoR, we inferred context-specific Boolean subnetworks of female Drosophila melanogaster before and after mating.
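As a rough illustration of what a "Boolean description" of a GRN and its "dynamic accuracy" can look like, the sketch below defines a toy three-gene Boolean network with synchronous updates and scores a candidate rule set against a binarized time series. The genes, rules, and the `dynamic_accuracy` helper are hypothetical and chosen only for illustration; this is not SAILoR's implementation and does not reproduce its NSGA-II search or its topological objective.

```python
# Toy synchronous Boolean network; all names and rules are illustrative assumptions.

def step(state, rules):
    """Apply every update rule to the current state at once (synchronous update)."""
    return tuple(rule(state) for rule in rules)

def simulate(state, rules, n_steps):
    """Return the trajectory [s_0, s_1, ..., s_n] starting from `state`."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state, rules)
        trajectory.append(state)
    return trajectory

def dynamic_accuracy(rules, series):
    """Fraction of gene values at t+1 that the rules predict correctly from the state at t."""
    correct = total = 0
    for current, observed_next in zip(series, series[1:]):
        predicted = step(current, rules)
        correct += sum(p == o for p, o in zip(predicted, observed_next))
        total += len(observed_next)
    return correct / total

# State is (A, B, C). Reference rules: A' = NOT C, B' = A, C' = A AND B.
true_rules = [
    lambda s: int(not s[2]),
    lambda s: s[0],
    lambda s: s[0] & s[1],
]
# A candidate network that gets the rule for C wrong (OR instead of AND).
candidate_rules = [
    lambda s: int(not s[2]),
    lambda s: s[0],
    lambda s: s[0] | s[1],
]

series = simulate((1, 0, 0), true_rules, 4)
acc_true = dynamic_accuracy(true_rules, series)
acc_candidate = dynamic_accuracy(candidate_rules, series)
```

A multi-objective search of the kind described in the abstract would trade a score like this against a separate measure of structural similarity to the reference networks.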
Affiliation(s)
- Žiga Pušnik
  - Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Miha Mraz
  - Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Nikolaj Zimic
  - Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Miha Moškon
  - Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
2
Nakulugamuwa Gamage H, Chetty M, Lim S, Hallinan J. MICFuzzy: A maximal information content based fuzzy approach for reconstructing genetic networks. PLoS One 2023; 18:e0288174. [PMID: 37418430 PMCID: PMC10328247 DOI: 10.1371/journal.pone.0288174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Accepted: 06/21/2023] [Indexed: 07/09/2023]
Abstract
In systems biology, the accurate reconstruction of Gene Regulatory Networks (GRNs) is crucial since these networks can facilitate the solving of complex biological problems. Amongst the plethora of methods available for GRN reconstruction, information theory and fuzzy concepts-based methods have abiding popularity. However, most of these methods are not only complex, incurring a high computational burden, but they may also produce a high number of false positives, leading to inaccurate inferred networks. In this paper, we propose a novel hybrid fuzzy GRN inference model called MICFuzzy which involves the aggregation of the effects of Maximal Information Coefficient (MIC). This model has an information theory-based pre-processing stage, the output of which is applied as an input to the novel fuzzy model. In this preprocessing stage, the MIC component filters relevant genes for each target gene to significantly reduce the computational burden of the fuzzy model when selecting the regulatory genes from these filtered gene lists. The novel fuzzy model uses the regulatory effect of the identified activator-repressor gene pairs to determine target gene expression levels. This approach facilitates accurate network inference by generating a high number of true regulatory interactions while significantly reducing false regulatory predictions. The performance of MICFuzzy was evaluated using DREAM3 and DREAM4 challenge data, and the SOS real gene expression dataset. MICFuzzy outperformed the other state-of-the-art methods in terms of F-score, Matthews Correlation Coefficient, Structural Accuracy, and SS_mean, and outperformed most of them in terms of efficiency. MICFuzzy also had improved efficiency compared with the classical fuzzy model since the design of MICFuzzy leads to a reduction in combinatorial computation.
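Two of the structural measures named in this abstract, the F-score and the Matthews Correlation Coefficient, can be computed directly from a predicted and a reference edge set. The sketch below uses the standard definitions; the four-gene example, the exclusion of self-loops, and the function name `edge_metrics` are assumptions for illustration and are not taken from the MICFuzzy paper.

```python
import math

def edge_metrics(predicted, reference, n_genes):
    """F-score and MCC for directed edges (regulator, target), self-loops excluded."""
    possible = n_genes * (n_genes - 1)      # all ordered gene pairs without self-loops
    tp = len(predicted & reference)         # edges predicted and present
    fp = len(predicted - reference)         # edges predicted but absent
    fn = len(reference - predicted)         # edges present but missed
    tn = possible - tp - fp - fn            # non-edges correctly left out
    f_score = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f_score, mcc

# Hypothetical four-gene example.
reference_edges = {(0, 1), (1, 2), (2, 3), (3, 0)}
predicted_edges = {(0, 1), (1, 2), (0, 3)}
f_score, mcc = edge_metrics(predicted_edges, reference_edges, n_genes=4)
```

Because true negatives dominate in sparse networks, MCC is often reported alongside the F-score: it uses all four confusion-matrix cells, whereas the F-score ignores true negatives.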
Affiliation(s)
- Madhu Chetty
  - Health Innovation and Transformation Centre, Federation University, Churchill, Victoria, Australia
- Suryani Lim
  - Health Innovation and Transformation Centre, Federation University, Churchill, Victoria, Australia
3
Shachaf LI, Roberts E, Cahan P, Xiao J. Gene regulation network inference using k-nearest neighbor-based mutual information estimation: revisiting an old DREAM. BMC Bioinformatics 2023; 24:84. [PMID: 36879188 PMCID: PMC9990267 DOI: 10.1186/s12859-022-05047-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/14/2021] [Accepted: 11/08/2022] [Indexed: 03/08/2023]
Abstract
BACKGROUND: A cell exhibits a variety of responses to internal and external cues. These responses are possible, in part, due to the presence of an elaborate gene regulatory network (GRN) in every single cell. In the past 20 years, many groups have worked on reconstructing the topological structure of GRNs from large-scale gene expression data using a variety of inference algorithms. Insights gained about participating players in GRNs may ultimately lead to therapeutic benefits. Mutual information (MI) is a widely used metric within this inference/reconstruction pipeline as it can detect any correlation (linear and non-linear) between any number of variables (n-dimensions). However, the use of MI with continuous data (for example, normalized fluorescence intensity measurement of gene expression levels) is sensitive to data size, correlation strength and underlying distributions, and often requires laborious and, at times, ad hoc optimization. RESULTS: In this work, we first show that estimating MI of a bi- and tri-variate Gaussian distribution using k-nearest neighbor (kNN) MI estimation results in significant error reduction as compared to commonly used methods based on fixed binning. Second, we demonstrate that implementing the MI-based kNN Kraskov-Stögbauer-Grassberger (KSG) algorithm leads to a significant improvement in GRN reconstruction for popular inference algorithms, such as Context Likelihood of Relatedness (CLR). Finally, through extensive in-silico benchmarking we show that a new inference algorithm CMIA (Conditional Mutual Information Augmentation), inspired by CLR, in combination with the KSG-MI estimator, outperforms commonly used methods. CONCLUSIONS: Using three canonical datasets containing 15 synthetic networks, the newly developed method for GRN reconstruction, which combines CMIA and the KSG-MI estimator, achieves an improvement of 20-35% in precision-recall measures over the current gold standard in the field.
This new method will enable researchers to discover new gene interactions or better choose gene candidates for experimental validations.
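The kNN-based estimator mentioned in the abstract can be sketched for the bivariate case as follows. This is a minimal, brute-force version of the first KSG estimator, I(X;Y) ≈ ψ(k) + ψ(N) − ⟨ψ(n_x + 1) + ψ(n_y + 1)⟩, using the max-norm in the joint space. The sample size, `k = 3`, the correlated-Gaussian test data, and the function names are assumptions for illustration; it is not the authors' code and omits CLR and CMIA entirely.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_table(n_max):
    """psi(m) for integers m = 1..n_max, using psi(1) = -gamma and psi(m) = psi(m-1) + 1/(m-1)."""
    table = [0.0, -EULER_GAMMA]  # index 0 is unused padding
    for m in range(2, n_max + 1):
        table.append(table[-1] + 1.0 / (m - 1))
    return table

def ksg_mi(xs, ys, k=3):
    """KSG (algorithm 1) estimate of I(X;Y) in nats; O(N^2 log N) brute force."""
    n = len(xs)
    psi = digamma_table(n)
    acc = 0.0
    for i in range(n):
        # Max-norm distances from point i to every other point in the joint (x, y) space.
        dists = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n) if j != i
        )
        eps = dists[k - 1]  # distance to the k-th nearest neighbour
        # Count neighbours strictly closer than eps in each marginal space.
        nx = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        acc += psi[nx + 1] + psi[ny + 1]
    return psi[k] + psi[n] - acc / n

# Correlated Gaussian pair with known MI = -0.5 * ln(1 - rho^2), plus an independent control.
rng = random.Random(0)
rho, n_samples = 0.9, 400
xs = [rng.gauss(0, 1) for _ in range(n_samples)]
ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
zs = [rng.gauss(0, 1) for _ in range(n_samples)]

mi_true = -0.5 * math.log(1 - rho ** 2)
mi_dependent = ksg_mi(xs, ys)
mi_independent = ksg_mi(xs, zs)
```

Unlike fixed binning, the neighbourhood size adapts to the local sample density, which is the property the abstract credits for the reduced estimation error. For larger datasets a k-d tree would replace the brute-force distance scan.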
Affiliation(s)
- Lior I Shachaf
  - Department of Biophysics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
- Elijah Roberts
  - Department of Biophysics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
  - 10x Genomics, 6230 Stoneridge Mall Road, Pleasanton, CA, 94588-3260, USA
- Patrick Cahan
  - Department of Biomedical Engineering, Department of Molecular Biology and Genetics, Institute for Cell Engineering, Johns Hopkins School of Medicine, 733 N. Broadway, Baltimore, MD, 21205, USA
- Jie Xiao
  - Department of Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, 725 N. Wolfe Street, WBSB 708, Baltimore, MD, 21205, USA
4
Seçilmiş D, Hillerton T, Tjärnberg A, Nelander S, Nordling TEM, Sonnhammer ELL. Knowledge of the perturbation design is essential for accurate gene regulatory network inference. Sci Rep 2022; 12:16531. [PMID: 36192495 PMCID: PMC9529923 DOI: 10.1038/s41598-022-19005-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/12/2021] [Accepted: 08/23/2022] [Indexed: 11/08/2022]
Abstract
The gene regulatory network (GRN) of a cell executes genetic programs in response to environmental and internal cues. Two distinct classes of methods are used to infer regulatory interactions from gene expression: those that only use observed changes in gene expression, and those that use both the observed changes and the perturbation design, i.e. the targets used to cause the changes in gene expression. Considering that the GRN by definition converts input cues to changes in gene expression, it may be conjectured that the latter methods would yield more accurate inferences but this has not previously been investigated. To address this question, we evaluated a number of popular GRN inference methods that either use the perturbation design or not. For the evaluation we used targeted perturbation knockdown gene expression datasets with varying noise levels generated by two different packages, GeneNetWeaver and GeneSpider. The accuracy was evaluated on each dataset using a variety of measures. The results show that on all datasets, methods using the perturbation design matrix consistently and significantly outperform methods not using it. This was also found to be the case on a smaller experimental dataset from E. coli. Targeted gene perturbations combined with inference methods that use the perturbation design are indispensable for accurate GRN inference.
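To make concrete what "using the perturbation design" can mean, the sketch below assumes a linear steady-state model Y = −A⁻¹P, where A is the network matrix, P the perturbation design matrix (one column per experiment), and Y the measured expression changes. Under that assumption, noise-free data with an invertible Y give A = −P Y⁻¹. The model, the three-gene matrix, and the helper names are illustrative assumptions not stated in the abstract, and this is not one of the methods the paper benchmarks.

```python
def mat_mul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_inv(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def negate(m):
    return [[-v for v in row] for row in m]

def infer_network(y, p):
    """Recover A from Y = -A^{-1} P, assuming noise-free data and invertible Y."""
    return mat_mul(negate(p), mat_inv(y))

# Hypothetical three-gene network; entry [i][j] is the effect of gene j on gene i.
a_true = [[-1.0, 0.0, 0.5],
          [0.8, -1.0, 0.0],
          [0.0, -0.6, -1.0]]
# Design matrix: each experiment knocks down exactly one gene.
p = [[-1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
# Simulated steady-state response under the assumed model.
y = mat_mul(negate(mat_inv(a_true)), p)

a_hat = infer_network(y, p)
```

A method that sees only Y has to guess which gene was targeted in each experiment, whereas P supplies that information directly; with noisy data the exact inverse would be replaced by a regularized or least-squares fit.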
Affiliation(s)
- Deniz Seçilmiş
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121, Solna, Sweden
- Thomas Hillerton
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121, Solna, Sweden
- Andreas Tjärnberg
  - Center for Developmental Genetics, New York University, New York, USA
- Sven Nelander
  - Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, 75185, Uppsala, Sweden
- Torbjörn E M Nordling
  - Department of Mechanical Engineering, National Cheng Kung University, Tainan, 701, Taiwan, ROC
  - Department of Applied Physics and Electronics, Umeå University, 90187, Umeå, Sweden
- Erik L L Sonnhammer
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121, Solna, Sweden
5
Seçilmiş D, Hillerton T, Sonnhammer ELL. GRNbenchmark - a web server for benchmarking directed gene regulatory network inference methods. Nucleic Acids Res 2022; 50:W398-W404. [PMID: 35609981 PMCID: PMC9252735 DOI: 10.1093/nar/gkac377] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/28/2022] [Revised: 04/20/2022] [Accepted: 05/19/2022] [Indexed: 11/30/2022]
Abstract
Accurate inference of gene regulatory networks (GRN) is an essential component of systems biology, and there is a constant development of new inference methods. The most common approach to assess accuracy for publications is to benchmark the new method against a selection of existing algorithms. This often leads to a very limited comparison, potentially biasing the results, which may stem from tuning the benchmark's properties or incorrect application of other methods. These issues can be avoided by a web server with a broad range of data properties and inference algorithms, that makes it easy to perform comprehensive benchmarking of new methods, and provides a more objective assessment. Here we present https://GRNbenchmark.org/ - a new web server for benchmarking GRN inference methods, which provides the user with a set of benchmarks with several datasets, each spanning a range of properties including multiple noise levels. As soon as the web server has performed the benchmarking, the accuracy results are made privately available to the user via interactive summary plots and underlying curves. The user can then download these results for any purpose, and decide whether or not to make them public to share with the community.
Affiliation(s)
- Deniz Seçilmiş
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
- Thomas Hillerton
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
- Erik L L Sonnhammer
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden