1
Zhang D, Gao S, Liu ZP, Gao R. LogicGep: Boolean networks inference using symbolic regression from time-series transcriptomic profiling data. Brief Bioinform 2024; 25:bbae286. [PMID: 38886006 PMCID: PMC11182660 DOI: 10.1093/bib/bbae286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 05/09/2024] [Accepted: 06/06/2024] [Indexed: 06/20/2024] Open
Abstract
Reconstructing the topology of gene regulatory networks from gene expression data has been extensively studied. With the abundance of functional transcriptomic data now available, it is feasible to systematically decipher regulatory interaction dynamics in a logic form such as a Boolean network (BN) framework, which qualitatively indicates how multiple regulators combine to affect a common target gene. However, inferring both the network topology and gene interaction dynamics simultaneously is still a challenging problem, since gene expression data are typically noisy and data discretization is prone to information loss. We propose a new method for BN inference from time-series transcriptional profiles, called LogicGep. LogicGep formulates the identification of Boolean functions as a symbolic regression problem that learns the Boolean function expression, and solves it efficiently through multi-objective optimization using an improved gene expression programming algorithm. To avoid overly emphasizing dynamic characteristics at the expense of topological ones, as traditional methods often do, a set of promising Boolean formulas for each target gene is first evolved, and a feed-forward neural network trained with continuous expression data is subsequently employed to pick out the final solution. We validated the efficacy of LogicGep using multiple datasets including both synthetic and real-world experimental data. The results show that LogicGep infers accurate BN models, outperforming other representative BN inference algorithms in both network topology reconstruction and the identification of Boolean functions. Moreover, the execution of LogicGep is hundreds of times faster than other methods, especially in the case of large network inference.
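As a rough illustration of what a Boolean network model is (this is not the LogicGep algorithm), the sketch below simulates a synchronous BN and scores a candidate Boolean rule against a discretized time series. The three genes, their rules, and the helper names `step`, `simulate`, and `rule_error` are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]
Rule = Callable[[State], bool]

# Hypothetical three-gene network; the rules are illustrative only.
RULES: Dict[str, Rule] = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"] and not s["C"],
    "C": lambda s: s["A"] or s["B"],
}


def step(state: State) -> State:
    """Synchronously update every gene from the current state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def simulate(state: State, n_steps: int) -> List[State]:
    """Return the trajectory of n_steps + 1 states starting from `state`."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


def rule_error(gene: str, rule: Rule, trajectory: List[State]) -> int:
    """Count transitions where `rule` mispredicts the next value of `gene`."""
    return sum(
        rule(prev) != nxt[gene]
        for prev, nxt in zip(trajectory, trajectory[1:])
    )


start = {"A": True, "B": False, "C": False}
trajectory = simulate(start, 4)
# A candidate rule can be compared with the data by its prediction error.
error_of_candidate = rule_error("B", lambda s: s["A"], trajectory)
```

An inference method searches over candidate rules such as the lambda passed to `rule_error`; a prediction-error count of this kind is one plausible objective, stated here as an assumption rather than as the objective used in the paper.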
Affiliation(s)
- Dezhen Zhang
  - Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Shuhua Gao
  - Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Zhi-Ping Liu
  - Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Rui Gao
  - Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
2
Wei PJ, Guo Z, Gao Z, Ding Z, Cao RF, Su Y, Zheng CH. Inference of gene regulatory networks based on directed graph convolutional networks. Brief Bioinform 2024; 25:bbae309. [PMID: 38935070 PMCID: PMC11209731 DOI: 10.1093/bib/bbae309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 05/17/2024] [Indexed: 06/28/2024] Open
Abstract
Inferring gene regulatory networks (GRNs) is one of the important challenges in systems biology, and many outstanding computational methods have been proposed; however, challenges remain, especially on real datasets. In this study, we propose a directed graph convolutional neural network-based method for GRN inference (DGCGRN). To better understand and process the directed graph structure data of GRNs, a directed graph convolutional neural network is constructed, which retains the structural information of the directed graph while also making full use of neighbor node features. A local augmentation strategy is adopted in the graph neural network to solve the problem of poor prediction accuracy caused by the large number of low-degree nodes in GRNs. In addition, for real data such as E. coli, sequence features are obtained by extracting hidden features using Bi-GRU and calculating the statistical physicochemical characteristics of the gene sequence. At the training stage, a dynamic update strategy is used to convert the obtained edge prediction scores into edge weights to guide the subsequent training process of the model. The results on synthetic benchmark datasets and real datasets show that the prediction performance of DGCGRN is significantly better than that of existing models. Furthermore, the case studies on bladder uroepithelial carcinoma and lung cancer cells also illustrate the performance of the proposed model.
Affiliation(s)
- Pi-Jing Wei
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Ziqiang Guo
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Zhen Gao
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Zheng Ding
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Rui-Fen Cao
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Yansen Su
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Chun-Hou Zheng
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Anhui, China
3
Lee J, Kim N, Cho KH. Decoding the principle of cell-fate determination for its reverse control. NPJ Syst Biol Appl 2024; 10:47. [PMID: 38710700 PMCID: PMC11074314 DOI: 10.1038/s41540-024-00372-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 04/16/2024] [Indexed: 05/08/2024] Open
Abstract
Understanding and manipulating cell fate determination is pivotal in biology. Cell fate is determined by intricate and nonlinear interactions among molecules, making mathematical model-based quantitative analysis indispensable for its elucidation. Nevertheless, obtaining the essential dynamic experimental data for model development has been a significant obstacle. However, recent advancements in large-scale omics data technology are providing the necessary foundation for developing such models. Based on accumulated experimental evidence, we can postulate that cell fate is governed by a limited number of core regulatory circuits. Following this concept, we present a conceptual control framework that leverages single-cell RNA-seq data for dynamic molecular regulatory network modeling, aiming to identify and manipulate core regulatory circuits and their master regulators to drive desired cellular state transitions. We illustrate the proposed framework by applying it to the reversion of lung cancer cell states, although it is more broadly applicable to understanding and controlling a wide range of cell-fate determination processes.
Affiliation(s)
- Jonghoon Lee
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Namhee Kim
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
  - biorevert, Inc., Daejeon, Republic of Korea
- Kwang-Hyun Cho
  - Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
4
Malekpour SA, Haghverdi L, Sadeghi M. Single-cell multi-omics analysis identifies context-specific gene regulatory gates and mechanisms. Brief Bioinform 2024; 25:bbae180. [PMID: 38653489 PMCID: PMC11036345 DOI: 10.1093/bib/bbae180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 01/29/2024] [Accepted: 04/02/2024] [Indexed: 04/25/2024] Open
Abstract
There is growing interest in inferring context-specific gene regulatory networks from single-cell RNA sequencing (scRNA-seq) data. This involves identifying the regulatory relationships between transcription factors (TFs) and genes in individual cells, and then characterizing these relationships at the level of specific cell types or cell states. In this study, we introduce scGATE (single-cell gene regulatory gate) as a novel computational tool for inferring TF-gene interaction networks and reconstructing Boolean logic gates involving regulatory TFs using scRNA-seq data. In contrast to current Boolean models, scGATE eliminates the need for individual formulations and likelihood calculations for each Boolean rule (e.g. AND, OR, XOR). By employing a Bayesian framework, scGATE infers the Boolean rule after fitting the model to the data, resulting in significant reductions in time complexity for logic-based studies. We have applied single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq) data and TF DNA binding motifs to filter out TFs that are not relevant to the regulation of a given gene. By integrating single-cell clustering with these external cues, scGATE is able to infer context-specific networks. The performance of scGATE is evaluated using synthetic and real single-cell multi-omics data from mouse tissues and human blood, demonstrating its superiority over existing tools for reconstructing TF-gene networks. Additionally, scGATE provides a flexible framework for understanding the complex combinatorial and cooperative relationships among TFs regulating target genes by inferring Boolean logic gates among them.
Affiliation(s)
- Seyed Amir Malekpour
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), 19395-5746, Tehran, Iran
- Laleh Haghverdi
  - Berlin Institute for Medical Systems Biology, Max Delbrück Center (BIMSB-MDC) in the Helmholtz Association, Berlin, Germany
- Mehdi Sadeghi
  - Department of Medical Genetics, National Institute of Genetic Engineering and Biotechnology, 1497716316, Tehran, Iran
5
Gonzalez G, Herath I, Veselkov K, Bronstein M, Zitnik M. Combinatorial prediction of therapeutic perturbations using causally-inspired neural networks. bioRxiv 2024:2024.01.03.573985. [PMID: 38260532 PMCID: PMC10802439 DOI: 10.1101/2024.01.03.573985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
As an alternative to target-driven drug discovery, phenotype-driven approaches identify compounds that counteract the overall disease effects by analyzing phenotypic signatures. Our study introduces a novel approach to this field, aiming to expand the search space for new therapeutic agents. We introduce PDGrapher, a causally-inspired graph neural network model designed to predict arbitrary perturbagens (sets of therapeutic targets) capable of reversing disease effects. Unlike existing methods that learn responses to perturbations, PDGrapher solves the inverse problem, which is to infer the perturbagens necessary to achieve a specific response, i.e., directly predicting perturbagens by learning which perturbations elicit a desired response. Experiments across eight datasets of genetic and chemical perturbations show that PDGrapher successfully predicted effective perturbagens in up to 9% additional test samples and ranked therapeutic targets up to 35% higher than competing methods. A key innovation of PDGrapher is its direct prediction capability, which contrasts with the indirect, computationally intensive models traditionally used in phenotype-driven drug discovery that only predict changes in phenotypes due to perturbations. The direct approach enables PDGrapher to train up to 30 times faster, representing a significant leap in efficiency. Our results suggest that PDGrapher can advance phenotype-driven drug discovery, offering a fast and comprehensive approach to identifying therapeutically useful perturbations.
Affiliation(s)
- Guadalupe Gonzalez
  - Imperial College London, London, UK
  - Prescient Design, Genentech, South San Francisco, CA, USA
  - F. Hoffmann-La Roche Ltd, Basel, Switzerland
- Isuru Herath
  - Merck & Co., South San Francisco, CA, USA
  - Cornell University, Ithaca, NY, USA
- Marinka Zitnik
  - Harvard Medical School, Boston, MA, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
6
Kim H, Choi H, Lee D, Kim J. A review on gene regulatory network reconstruction algorithms based on single cell RNA sequencing. Genes Genomics 2024; 46:1-11. [PMID: 38032470 DOI: 10.1007/s13258-023-01473-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Accepted: 10/24/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND: Understanding gene regulatory networks (GRNs) is essential for unraveling the molecular mechanisms governing cellular behavior. With the advent of high-throughput transcriptome measurement technology, researchers have aimed to reverse engineer biological systems, extracting gene regulatory rules from their outputs, which are represented by gene expression data. Bulk RNA sequencing, a widely used method for measuring gene expression, has been employed for GRN reconstruction. However, it falls short in capturing dynamic changes in gene expression at the level of individual cells, since it averages gene expression across mixed cell populations. OBJECTIVE: In this review, we provide an overview of 15 GRN reconstruction tools and discuss their respective strengths and limitations, particularly in the context of single cell RNA sequencing (scRNA-seq). METHODS: Recent advancements in scRNA-seq break new ground in GRN reconstruction. They offer snapshots of individual cell transcriptomes and capture dynamic changes. We emphasize how these technological breakthroughs have enhanced GRN reconstruction. CONCLUSION: GRN reconstructors can be classified based on their requirement for a cellular trajectory, which represents a dynamical cellular process such as differentiation, aging, or disease progression. Benchmarking studies support the superiority of GRN reconstructors that do not require trajectory analysis in identifying regulator-target relationships. However, methods equipped with trajectory analysis demonstrate better performance in identifying key regulatory factors. In conclusion, researchers should select a suitable GRN reconstructor based on their specific research objectives.
Affiliation(s)
- Hyeonkyu Kim
  - School of Systems Biomedical Science, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, 06978, Republic of Korea
- Hwisoo Choi
  - School of Systems Biomedical Science, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, 06978, Republic of Korea
- Daewon Lee
  - School of Art and Technology, Chung-Ang University, 4726 Seodong-Daero, Anseong-Si, Gyeonggi-Do, 17546, Republic of Korea
- Junil Kim
  - School of Systems Biomedical Science, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, 06978, Republic of Korea
7
Nakulugamuwa Gamage H, Chetty M, Lim S, Hallinan J. MICFuzzy: A maximal information content based fuzzy approach for reconstructing genetic networks. PLoS One 2023; 18:e0288174. [PMID: 37418430 PMCID: PMC10328247 DOI: 10.1371/journal.pone.0288174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Accepted: 06/21/2023] [Indexed: 07/09/2023] Open
Abstract
In systems biology, the accurate reconstruction of Gene Regulatory Networks (GRNs) is crucial since these networks can facilitate the solving of complex biological problems. Amongst the plethora of methods available for GRN reconstruction, information theory and fuzzy concepts-based methods have abiding popularity. However, most of these methods are not only complex, incurring a high computational burden, but they may also produce a high number of false positives, leading to inaccurate inferred networks. In this paper, we propose a novel hybrid fuzzy GRN inference model called MICFuzzy which involves the aggregation of the effects of Maximal Information Coefficient (MIC). This model has an information theory-based pre-processing stage, the output of which is applied as an input to the novel fuzzy model. In this preprocessing stage, the MIC component filters relevant genes for each target gene to significantly reduce the computational burden of the fuzzy model when selecting the regulatory genes from these filtered gene lists. The novel fuzzy model uses the regulatory effect of the identified activator-repressor gene pairs to determine target gene expression levels. This approach facilitates accurate network inference by generating a high number of true regulatory interactions while significantly reducing false regulatory predictions. The performance of MICFuzzy was evaluated using DREAM3 and DREAM4 challenge data, and the SOS real gene expression dataset. MICFuzzy outperformed the other state-of-the-art methods in terms of F-score, Matthews Correlation Coefficient, Structural Accuracy, and SS_mean, and outperformed most of them in terms of efficiency. MICFuzzy also had improved efficiency compared with the classical fuzzy model since the design of MICFuzzy leads to a reduction in combinatorial computation.
Affiliation(s)
- Madhu Chetty
  - Health Innovation and Transformation Centre, Federation University, Churchill, Victoria, Australia
- Suryani Lim
  - Health Innovation and Transformation Centre, Federation University, Churchill, Victoria, Australia
8
Nabuco Leva Ferreira de Freitas JA, Bischof O. Dynamic modeling of the cellular senescence gene regulatory network. Heliyon 2023; 9:e14007. [PMID: 36938415 PMCID: PMC10015196 DOI: 10.1016/j.heliyon.2023.e14007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 02/13/2023] [Accepted: 02/17/2023] [Indexed: 02/27/2023] Open
Abstract
Cellular senescence is a cell fate that prominently impacts physiological and pathophysiological processes. Diverse cellular stresses induce it, and dramatic gene expression changes accompany it. However, determining the interactions comprising the gene regulatory network (GRN) governing senescence remains challenging. Recent advances in signal processing techniques provide opportunities to reconstruct GRNs. Here, we describe a GRN for senescence integrating time-series transcriptome and transcription factor depletion datasets. Specifically, we infer a set of differential equations using the "Sparse Identification of Nonlinear Dynamics" (SINDy) algorithm, discriminate genes with potential hidden regulators, validate the inferred GRN for time-points not included in the training data, and comprehensively benchmark our approach. Our work is a proof of concept for a data-driven GRN reconstruction method, consolidating an iterative, powerful mathematical platform for senescence modeling that can be used to test hypotheses in silico and has the potential for future discoveries of clinical impact.
Affiliation(s)
- José Américo Nabuco Leva Ferreira de Freitas
  - IMRB, Mondor Institute for Biomedical Research, INSERM U955 – Université Paris Est Créteil, UPEC, Faculté de Médecine de Créteil 8, rue du Général Sarrail, 94010 Créteil
  - Sorbonne Université, UMR 8256, Biological Adaptation and Ageing B2A–IBPS, F-75005, Paris, France
  - INSERM U1164, F-75005, Paris, France
- Oliver Bischof
  - IMRB, Mondor Institute for Biomedical Research, INSERM U955 – Université Paris Est Créteil, UPEC, Faculté de Médecine de Créteil 8, rue du Général Sarrail, 94010 Créteil
  - Corresponding author.
9
Abid D, Brent MR. NetProphet 3: a machine learning framework for transcription factor network mapping and multi-omics integration. Bioinformatics 2023; 39:7000334. [PMID: 36692138 PMCID: PMC9912366 DOI: 10.1093/bioinformatics/btad038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Revised: 01/11/2023] [Accepted: 01/18/2023] [Indexed: 01/25/2023] Open
Abstract
MOTIVATION Many methods have been proposed for mapping the targets of transcription factors (TFs) from gene expression data. It is known that combining outputs from multiple methods can improve performance. To date, outputs have been combined by using either simplistic formulae, such as geometric mean, or carefully hand-tuned formulae that may not generalize well to new inputs. Finally, the evaluation of accuracy has been challenging due to the lack of genome-scale, ground-truth networks. RESULTS We developed NetProphet3, which combines scores from multiple analyses automatically, using a tree boosting algorithm trained on TF binding location data. We also developed three independent, genome-scale evaluation metrics. By these metrics, NetProphet3 is more accurate than other commonly used packages, including NetProphet 2.0, when gene expression data from direct TF perturbations are available. Furthermore, its integration mode can forge a consensus network from gene expression data and TF binding location data. AVAILABILITY AND IMPLEMENTATION All data and code are available at https://zenodo.org/record/7504131#.Y7Wu3i-B2x8. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dhoha Abid
  - Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Michael R Brent
  - Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
10
Zhang J, Singh R. Investigating the Complexity of Gene Co-expression Estimation for Single-cell Data. bioRxiv 2023:2023.01.24.525447. [PMID: 36747724 PMCID: PMC9900775 DOI: 10.1101/2023.01.24.525447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Abstract
With the rapid advance of single-cell RNA sequencing (scRNA-seq) technology, understanding biological processes at a more refined single-cell level is becoming possible. Gene co-expression estimation is an essential step in this direction. It can annotate functionalities of unknown genes or construct the basis of gene regulatory network inference. This study thoroughly tests the existing gene co-expression estimation methods on simulation datasets with known ground truth co-expression networks. We generate these novel datasets using two simulation processes that use the parameters learned from the experimental data. We demonstrate that these simulations better capture the underlying properties of the real-world single-cell datasets than previously tested simulations for the task. Our performance results on tens of simulated and eight experimental datasets show that all methods produce estimations with a high false discovery rate potentially caused by high-sparsity levels in the data. Finally, we find that commonly used pre-processing approaches, such as normalization and imputation, do not improve the co-expression estimation. Overall, our benchmark setup contributes to the co-expression estimator development, and our study provides valuable insights for the community of single-cell data analyses.
Affiliation(s)
- Jiaqi Zhang
  - Department of Computer Science, Brown University
- Ritambhara Singh
  - Department of Computer Science, Center for Computational Molecular Biology, Brown University
11
Koumadorakis DE, Krokidis MG, Dimitrakopoulos GN, Vrahatis AG. A Consensus Gene Regulatory Network for Neurodegenerative Diseases Using Single-Cell RNA-Seq Data. Adv Exp Med Biol 2023; 1423:215-224. [PMID: 37525047 DOI: 10.1007/978-3-031-31978-5_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/02/2023]
Abstract
Gene regulatory network (GRN) inference from gene expression data is a highly complex and challenging task in systems biology. Despite the challenges, GRN methods have emerged, and for complex diseases such as neurodegenerative diseases, they have the potential to provide vital information and identify key regulators. However, each GRN method produces predictions based on its own assumptions, providing limited biological insights. For that reason, the current work focused on the development of an ensemble method from individual GRN methods to address this issue. Four state-of-the-art GRN algorithms were selected to form a consensus GRN from their common gene interactions. Each algorithm uses a different construction method, and for more robust behavior, both static and dynamic methods were selected. The algorithms were applied to a scRNA-seq dataset from the CK-p25 Mus musculus model during neurodegeneration. The top subnetworks were constructed from the consensus network, and potential key regulators were identified. The results also demonstrated the overlap between the algorithms for the current dataset and the necessity for an ensemble approach. This work aims to demonstrate the creation of an ensemble network and provide insights into whether a combination of different GRN methods can produce valuable results.
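The idea of forming a consensus network from the interactions that several methods share can be sketched as a simple vote over directed edges. This is a minimal sketch, not the authors' pipeline: the function name `consensus_edges`, the method labels, and the gene names are hypothetical, and requiring support from every method (a strict intersection) is one assumed reading of "common gene interactions".

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def consensus_edges(networks: Dict[str, Iterable[Edge]], min_support: int) -> Set[Edge]:
    """Keep edges predicted by at least `min_support` of the input methods."""
    # set(edges) ensures each method votes at most once per edge.
    counts = Counter(edge for edges in networks.values() for edge in set(edges))
    return {edge for edge, n in counts.items() if n >= min_support}


# Hypothetical outputs of three inference methods.
predicted = {
    "method_1": [("g1", "g2"), ("g2", "g3")],
    "method_2": [("g1", "g2"), ("g3", "g1")],
    "method_3": [("g1", "g2"), ("g2", "g3")],
}

strict = consensus_edges(predicted, min_support=len(predicted))  # intersection
relaxed = consensus_edges(predicted, min_support=2)              # majority vote
```

Lowering `min_support` trades precision for coverage, which is one way to examine how much the individual methods overlap on a given dataset.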
Affiliation(s)
- Dimitrios E Koumadorakis
  - Bioinformatics and Human Electrophysiology Lab (BiHELab), Department of Informatics, Ionian University, Corfu, Greece
- Marios G Krokidis
  - Bioinformatics and Human Electrophysiology Lab (BiHELab), Department of Informatics, Ionian University, Corfu, Greece
- Georgios N Dimitrakopoulos
  - Bioinformatics and Human Electrophysiology Lab (BiHELab), Department of Informatics, Ionian University, Corfu, Greece
- Aristidis G Vrahatis
  - Bioinformatics and Human Electrophysiology Lab (BiHELab), Department of Informatics, Ionian University, Corfu, Greece
12
Seçilmiş D, Hillerton T, Tjärnberg A, Nelander S, Nordling TEM, Sonnhammer ELL. Knowledge of the perturbation design is essential for accurate gene regulatory network inference. Sci Rep 2022; 12:16531. [PMID: 36192495 PMCID: PMC9529923 DOI: 10.1038/s41598-022-19005-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/12/2021] [Accepted: 08/23/2022] [Indexed: 11/08/2022] Open
Abstract
The gene regulatory network (GRN) of a cell executes genetic programs in response to environmental and internal cues. Two distinct classes of methods are used to infer regulatory interactions from gene expression: those that only use observed changes in gene expression, and those that use both the observed changes and the perturbation design, i.e. the targets used to cause the changes in gene expression. Considering that the GRN by definition converts input cues to changes in gene expression, it may be conjectured that the latter methods would yield more accurate inferences but this has not previously been investigated. To address this question, we evaluated a number of popular GRN inference methods that either use the perturbation design or not. For the evaluation we used targeted perturbation knockdown gene expression datasets with varying noise levels generated by two different packages, GeneNetWeaver and GeneSpider. The accuracy was evaluated on each dataset using a variety of measures. The results show that on all datasets, methods using the perturbation design matrix consistently and significantly outperform methods not using it. This was also found to be the case on a smaller experimental dataset from E. coli. Targeted gene perturbations combined with inference methods that use the perturbation design are indispensable for accurate GRN inference.
Affiliation(s)
- Deniz Seçilmiş
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121, Solna, Sweden
- Thomas Hillerton
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121, Solna, Sweden
- Andreas Tjärnberg
  - Center for Developmental Genetics, New York University, New York, USA
- Sven Nelander
  - Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, 75185, Uppsala, Sweden
- Torbjörn E M Nordling
  - Department of Mechanical Engineering, National Cheng Kung University, Tainan, 701, Taiwan, ROC
  - Department of Applied Physics and Electronics, Umeå University, 90187, Umeå, Sweden
- Erik L L Sonnhammer
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121, Solna, Sweden
13
Song Q, Zhu X, Jin L, Chen M, Zhang W, Su J. SMGR: a joint statistical method for integrative analysis of single-cell multi-omics data. NAR Genom Bioinform 2022; 4:lqac056. [PMID: 35910046 PMCID: PMC9326599 DOI: 10.1093/nargab/lqac056] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 01/04/2022] [Revised: 06/16/2022] [Accepted: 07/20/2022] [Indexed: 12/12/2022] Open
Abstract
Unravelling the regulatory programs from single-cell multi-omics data has long been one of the major challenges in genomics, especially in the emerging single-cell field. Currently there is a huge gap between fast-growing single-cell multi-omics data and effective methods for the integrative analysis of these inherently sparse and heterogeneous data. In this study, we have developed a novel method, Single-cell Multi-omics Gene co-Regulatory algorithm (SMGR), to detect coherent functional regulatory signals and target genes from joint single-cell RNA-sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data obtained from different samples. Given that scRNA-seq and scATAC-seq data can be captured by a zero-inflated negative binomial distribution, we utilize a generalized linear regression model to identify the latent representation of consistently expressed genes and peaks, thus enabling the identification of co-regulatory programs and the elucidation of regulatory mechanisms. Results from both simulation and experimental data demonstrate that SMGR outperforms the existing methods with considerably improved accuracy. To illustrate the biological insights of SMGR, we apply SMGR to mixed-phenotype acute leukemia (MPAL) and identify the MPAL-specific regulatory program with significant peak-gene links, which greatly enhances our understanding of the regulatory mechanisms and potential targets of this complex tumor.
Affiliation(s)
- Qianqian Song
- Center for Cancer Genomics and Precision Oncology, Wake Forest Baptist Comprehensive Cancer Center, Atrium Health Wake Forest Baptist, Winston-Salem, NC 27157, USA
- Xuewei Zhu
- Department of Internal Medicine, Section on Molecular Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27101, USA
- Lingtao Jin
- Department of Molecular Medicine, UT Health San Antonio, San Antonio, TX 78229, USA
- Minghan Chen
- Wake Forest University, Department of Computer Science, Winston-Salem, NC 27109, USA
- Wei Zhang
- Center for Cancer Genomics and Precision Oncology, Wake Forest Baptist Comprehensive Cancer Center, Atrium Health Wake Forest Baptist, Winston-Salem, NC 27157, USA
- Jing Su
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA
|
14
|
Combining kinetic orders for efficient S-System modelling of gene regulatory network. Biosystems 2022; 220:104736. [PMID: 35863700 DOI: 10.1016/j.biosystems.2022.104736] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/08/2022] [Revised: 07/10/2022] [Accepted: 07/10/2022] [Indexed: 11/21/2022]
Abstract
S-System models, non-linear differential equation models, are widely used for reconstructing gene regulatory networks from temporal gene expression data. An S-System model involves two states, generation and degeneration, and uses the kinetic parameters gij and hij, to represent the direction, nature, and intensity of the genetic interactions. The need for learning a large number of model parameters results in increased computational expense. Previously, we improved the performance of the algorithm using dynamic allocation of the maximum in-degree for each gene. While the method was effective for smaller networks, a large amount of computation was still needed for larger networks. This problem arose mainly due to the increased occurrence of invalid networks during optimization, primarily because the two kinetic parameters (gij and hij) of the S-System model converge independently during optimization. Being independent, these two parameters can converge to values that can indicate contradictory gene interactions, specifically inhibition or activation. In this study, to address this major challenge in S-System modelling, we developed a novel method that includes two features: a penalty term that penalizes those networks with invalid kinetic orders, and a parameter, wij, derived by combining the kinetic parameters gij and hij. The novel penalty term was used for candidate selection during the process of optimizing the DRNI (Dynamically Regulated Network Initialization) algorithm. Rather than remaining constant, it is dynamic, with its magnitude dependent on the number of invalid interactions in the given network. This approach encourages the generation of valid candidate solutions, and eliminates invalid networks in a systematic manner. 
The previous DRNI method, a two-stage approach which uses dynamic allocation of the maximum in-degree for each gene, was further improved by adding a third stage which applies the proposed wij to handle the invalid regulations that may still exist in the candidate solutions. The method was tested on different gene expression datasets, and was able to reduce the number of iterations and produce improved network accuracies. For a 20-gene network, the number of generations required for convergence was reduced by 300, and the F-score improved by 0.05 compared to our previously reported DRNI approach. For the well-known 10-gene networks of the DREAM4 challenge, our method improved the average area under the ROC curve.
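For readers unfamiliar with the model class, the standard S-System form from the literature is dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, where the first term is production and the second is degradation. The sketch below evaluates that right-hand side and takes one explicit Euler step. It illustrates only the generic model; the DRNI algorithm, the penalty term and the combined parameter wij described in the abstract are not reproduced, and the function names and toy parameters are assumptions.

```python
import math


def s_system_rhs(x, alpha, beta, g, h):
    """Return dX_i/dt for every gene i under the canonical S-System form.

    x      : current expression levels (must be positive)
    alpha  : production rate constants
    beta   : degradation rate constants
    g, h   : kinetic-order matrices for production and degradation
    """
    n = len(x)
    dxdt = []
    for i in range(n):
        production = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        dxdt.append(production - degradation)
    return dxdt


def euler_step(x, dt, alpha, beta, g, h):
    """Advance the state by one explicit Euler step of size dt."""
    rates = s_system_rhs(x, alpha, beta, g, h)
    return [xi + dt * ri for xi, ri in zip(x, rates)]
```

In this form a negative g_ij means gene j inhibits the production of gene i, and a positive value means activation, which is why independently converging g_ij and h_ij can end up implying contradictory interaction signs.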
|
15
|
Madsen CD, Hein J, Workman CT. Systematic inference of indirect transcriptional regulation by protein kinases and phosphatases. PLoS Comput Biol 2022; 18:e1009414. [PMID: 35731801 PMCID: PMC9255832 DOI: 10.1371/journal.pcbi.1009414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2021] [Revised: 07/05/2022] [Accepted: 05/17/2022] [Indexed: 11/18/2022] Open
Abstract
Gene expression is controlled by pathways of regulatory factors often involving the activity of protein kinases on transcription factor proteins. Despite this well-established mechanism, surprisingly few well-described pathways in eukaryotes include the regulatory role of protein kinases on transcription factors.
To address this, PhosTF was developed to infer functional regulatory interactions and pathways in both simulated and real biological networks, based on linear cyclic causal models with latent variables. GeneNetWeaverPhos, an extension of GeneNetWeaver, was developed to allow the simulation of perturbations in known networks that included the activity of protein kinases and phosphatases on gene regulation. Over 2000 genome-wide gene expression profiles, where the loss or gain of regulatory genes could be observed to perturb gene regulation, were then used to infer the existence of regulatory interactions, and their mode of regulation in the budding yeast Saccharomyces cerevisiae.
Despite the additional complexity, our inference performed comparably to the best methods that inferred transcription factor regulation assessed in the DREAM4 challenge on similar simulated networks. Inference on integrated genome-scale data sets for yeast identified ∼ 8800 protein kinase/phosphatase-transcription factor interactions and ∼ 6500 interactions among protein kinases and/or phosphatases. Both types of regulatory predictions captured statistically significant numbers of known interactions of their type. Surprisingly, kinases and phosphatases regulated transcription factors by a negative mode of regulation (deactivation) in over 70% of the predictions.
Affiliation(s)
- Christian Degnbol Madsen
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Kongens Lyngby, Denmark
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Jotun Hein
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Christopher T. Workman
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Kongens Lyngby, Denmark
|
16
|
Lasri A, Shahrezaei V, Sturrock M. Benchmarking imputation methods for network inference using a novel method of synthetic scRNA-seq data generation. BMC Bioinformatics 2022; 23:236. [PMID: 35715748 PMCID: PMC9204969 DOI: 10.1186/s12859-022-04778-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 05/31/2022] [Indexed: 11/30/2022] Open
Abstract
Background: Single cell RNA-sequencing (scRNA-seq) has very rapidly become the new workhorse of modern biology, providing an unprecedented global view on cellular diversity and heterogeneity. In particular, the structure of gene-gene expression correlation contains information on the underlying gene regulatory networks. However, interpretation of scRNA-seq data is challenging due to specific experimental error and biases that are unique to this kind of data, including drop-out (or technical zeros). Methods: To deal with this problem, several methods for imputation of zeros for scRNA-seq have been developed. However, it is not clear how these processing steps affect inference of genetic networks from single cell data. Here, we introduce Biomodelling.jl, a tool for generation of synthetic scRNA-seq data using multiscale modelling of stochastic gene regulatory networks in growing and dividing cells. Results: Our tool produces realistic transcription data with a known ground truth network topology that can be used to benchmark different approaches for gene regulatory network inference. Using this tool we investigate the impact of different imputation methods on the performance of several network inference algorithms. Conclusions: Biomodelling.jl provides a versatile and useful tool for future development and benchmarking of network inference approaches using scRNA-seq data. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04778-9.
Affiliation(s)
- Ayoub Lasri
- Department of Physiology and Medical Physics, Royal College of Surgeons in Ireland, Dublin, Ireland
- Vahid Shahrezaei
- Department of Mathematics, Faculty of Natural Sciences, Imperial College London, London, SW7 2AZ, UK
- Marc Sturrock
- Department of Physiology and Medical Physics, Royal College of Surgeons in Ireland, Dublin, Ireland
|
17
|
Seçilmiş D, Hillerton T, Sonnhammer ELL. GRNbenchmark - a web server for benchmarking directed gene regulatory network inference methods. Nucleic Acids Res 2022; 50:W398-W404. [PMID: 35609981 PMCID: PMC9252735 DOI: 10.1093/nar/gkac377] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/28/2022] [Revised: 04/20/2022] [Accepted: 05/19/2022] [Indexed: 11/30/2022] Open
Abstract
Accurate inference of gene regulatory networks (GRN) is an essential component of systems biology, and there is a constant development of new inference methods. The most common approach to assess accuracy for publications is to benchmark the new method against a selection of existing algorithms. This often leads to a very limited comparison, potentially biasing the results, which may stem from tuning the benchmark's properties or incorrect application of other methods. These issues can be avoided by a web server with a broad range of data properties and inference algorithms, that makes it easy to perform comprehensive benchmarking of new methods, and provides a more objective assessment. Here we present https://GRNbenchmark.org/ - a new web server for benchmarking GRN inference methods, which provides the user with a set of benchmarks with several datasets, each spanning a range of properties including multiple noise levels. As soon as the web server has performed the benchmarking, the accuracy results are made privately available to the user via interactive summary plots and underlying curves. The user can then download these results for any purpose, and decide whether or not to make them public to share with the community.
Affiliation(s)
- Deniz Seçilmiş
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
- Thomas Hillerton
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
- Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
|
18
|
Sonawane AR, Aikawa E, Aikawa M. Connections for Matters of the Heart: Network Medicine in Cardiovascular Diseases. Front Cardiovasc Med 2022; 9:873582. [PMID: 35665246 PMCID: PMC9160390 DOI: 10.3389/fcvm.2022.873582] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/11/2022] [Accepted: 04/19/2022] [Indexed: 01/18/2023] Open
Abstract
Cardiovascular diseases (CVD) are diverse disorders affecting the heart and vasculature in millions of people worldwide. Like other fields, CVD research has benefitted from the deluge of multiomics biomedical data. Current CVD research focuses on disease etiologies and mechanisms, identifying disease biomarkers, developing appropriate therapies and drugs, and stratifying patients into correct disease endotypes. Systems biology offers an alternative to traditional reductionist approaches and provides impetus for a comprehensive outlook toward diseases. As a focus area, network medicine specifically aids the translational aspect of in silico research. This review discusses the approach of network medicine and its application to CVD research.
Affiliation(s)
- Abhijeet Rajendra Sonawane
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Center for Excellence in Vascular Biology, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Elena Aikawa
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Center for Excellence in Vascular Biology, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Masanori Aikawa
- Center for Interdisciplinary Cardiovascular Sciences, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Center for Excellence in Vascular Biology, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
|
19
|
Passemiers A, Moreau Y, Raimondi D. Fast and accurate inference of gene regulatory networks through robust precision matrix estimation. Bioinformatics 2022; 38:2802-2809. [PMID: 35561176 PMCID: PMC9113237 DOI: 10.1093/bioinformatics/btac178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 03/14/2022] [Accepted: 03/22/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Transcriptional regulation mechanisms allow cells to adapt and respond to external stimuli by altering gene expression. The possible cell transcriptional states are determined by the underlying gene regulatory network (GRN), and reliably inferring such a network would be invaluable for understanding biological processes and disease progression. RESULTS In this article, we present a novel method for the inference of GRNs, called PORTIA, which is based on robust precision matrix estimation, and we show that it compares favorably with state-of-the-art methods while being orders of magnitude faster. We extensively validated PORTIA using the DREAM and MERLIN+P datasets as benchmarks. In addition, we propose a novel scoring metric that builds on graph-theoretical concepts. AVAILABILITY AND IMPLEMENTATION The code and instructions for data acquisition and full reproduction of our results are available at https://github.com/AntoinePassemiers/PORTIA-Manuscript. PORTIA is available on PyPI as a Python package (portia-grn). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
20
|
Inference of Molecular Regulatory Systems Using Statistical Path-Consistency Algorithm. Entropy 2022; 24:e24050693. [PMID: 35626576 PMCID: PMC9142129 DOI: 10.3390/e24050693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 05/12/2022] [Accepted: 05/12/2022] [Indexed: 11/16/2022]
Abstract
One of the key challenges in systems biology and molecular sciences is how to infer regulatory relationships between genes and proteins using high-throughput omics datasets. Although a wide range of methods have been designed to reverse engineer the regulatory networks, recent studies show that the inferred network may depend on the variable order in the dataset. In this work, we develop a new algorithm, called the statistical path-consistency algorithm (SPCA), to solve the problem of the dependence on variable order. This method generates a number of different variable orders using random samples, and then infers a network by using the path-consistency algorithm based on each variable order. We propose measures to determine the edge weights using the corresponding edge weights in the inferred networks, and choose the edges with the largest weights as the putative regulations between genes or proteins. The developed method is rigorously assessed using the six benchmark networks in DREAM challenges, the mitogen-activated protein (MAP) kinase pathway, and a cancer-specific gene regulatory network. The inferred networks are compared with those obtained by using two up-to-date inference methods. The accuracy of the inferred networks shows that the developed method is effective for discovering molecular regulatory systems.
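The order-resampling idea in this abstract (infer one network per random variable order, then combine edge weights and keep the largest) can be sketched as follows. This is an illustration under stated assumptions, not the SPCA implementation: the mean is used as the combining measure, `infer` is a hypothetical callable standing in for any order-dependent inference routine, and `toy_infer` is a made-up stand-in that adds a spurious edge between the first two variables of each order.

```python
import random
from collections import defaultdict


def aggregate_over_orders(variables, infer, n_orders=50, top_k=3, seed=0):
    """Average edge weights over networks inferred from random variable orders.

    infer(order) must return a dict mapping (u, v) pairs to weights.
    Edges are treated as undirected; absent edges count as weight 0.
    """
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(n_orders):
        order = list(variables)
        rng.shuffle(order)
        for edge, weight in infer(order).items():
            totals[frozenset(edge)] += weight
    mean_weight = {edge: total / n_orders for edge, total in totals.items()}
    ranked = sorted(mean_weight.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


def toy_infer(order):
    """Hypothetical order-dependent method: one stable edge, one spurious edge."""
    edges = {("A", "B"): 1.0}
    spurious = (order[0], order[1])
    if frozenset(spurious) != frozenset(("A", "B")):
        edges[spurious] = 0.5
    return edges


top_edges = aggregate_over_orders(["A", "B", "C", "D"], toy_infer, top_k=1)
```

Because the spurious edge changes with the order while the stable edge appears in every run, averaging pushes the stable edge to the top of the ranking.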
|
21
|
Saremi M, Amirmazlaghani M. Reconstruction of Gene Regulatory Networks Using Multiple Datasets. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1827-1839. [PMID: 33539303 DOI: 10.1109/tcbb.2021.3057241] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
MOTIVATION Laboratory gene regulatory data for a species are sporadic. Despite the abundance of gene regulatory network algorithms that employ single data sets, few algorithms can combine the vast but dispersed sources of data and extract the potential information. With a motivation to compensate for this shortage, we developed an algorithm called GENEREF that can accumulate information from multiple types of data sets in an iterative manner, with each iteration boosting the performance of the prediction results. RESULTS The algorithm is examined extensively on data extracted from the quintuple DREAM4 networks and DREAM5's Escherichia coli and Saccharomyces cerevisiae networks and sub-networks. Many single-dataset and multi-dataset algorithms were compared to test the performance of the algorithm. Results show that GENEREF surpasses non-ensemble state-of-the-art multi-perturbation algorithms on the selected networks and is competitive with present multiple-dataset algorithms. Specifically, it outperforms dynGENIE3 and is on par with iRafNet. Also, we argue that a scoring method based solely on the AUPR criterion would be more trustworthy than the traditional score. AVAILABILITY The Python implementation along with the data sets and results can be downloaded from github.com/msaremi/GENEREF.
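Since this abstract argues for scoring on AUPR alone, a small sketch of how an area under the precision-recall curve can be estimated for a ranked edge list may help. The code computes average precision, a common step-wise estimate of AUPR; it is not the DREAM scoring procedure or GENEREF's evaluation code, and the function name and toy edges are assumptions.

```python
def average_precision(scored_edges, true_edges):
    """Average precision of a ranked edge list against a gold standard.

    scored_edges : dict mapping (regulator, target) to a confidence score
    true_edges   : set of (regulator, target) pairs in the gold standard
    """
    if not true_edges:
        return 0.0
    # Rank predicted edges from most to least confident.
    ranked = sorted(scored_edges, key=lambda e: scored_edges[e], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, edge in enumerate(ranked, start=1):
        if edge in true_edges:
            hits += 1
            # Precision at the recall level reached by this true edge.
            precision_sum += hits / rank
    return precision_sum / len(true_edges)
```

True edges that never appear in `scored_edges` contribute zero precision, so a method is penalized for gold-standard edges it fails to rank at all.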
|
22
|
Skok Gibbs C, Jackson CA, Saldi GA, Tjärnberg A, Shah A, Watters A, De Veaux N, Tchourine K, Yi R, Hamamsy T, Castro DM, Carriero N, Gorissen BL, Gresham D, Miraldi ER, Bonneau R. High-performance single-cell gene regulatory network inference at scale: the Inferelator 3.0. Bioinformatics 2022; 38:2519-2528. [PMID: 35188184 PMCID: PMC9048651 DOI: 10.1093/bioinformatics/btac117] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 05/25/2021] [Revised: 12/08/2021] [Accepted: 02/17/2022] [Indexed: 12/04/2022] Open
Abstract
MOTIVATION Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. Methods for inferring and reconstructing networks from genomics data have evolved rapidly over the last decade in response to advances in sequencing technology and machine learning. The scale of data collection has increased dramatically; the largest genome-wide gene expression datasets have grown from thousands of measurements to millions of single cells, and new technologies are on the horizon to increase to tens of millions of cells and above. RESULTS In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator is able to integrate the largest single-cell datasets and learn cell-type-specific gene regulatory networks. Compared to other network inference methods, the Inferelator learns new and informative Saccharomyces cerevisiae networks from single-cell gene expression data, measured by recovery of a known gold standard. We demonstrate its scaling capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million) single-cell gene expression dataset with paired single-cell chromatin accessibility data. AVAILABILITY AND IMPLEMENTATION The inferelator software is available on GitHub (https://github.com/flatironinstitute/inferelator) under the MIT license and has been released as python packages with associated documentation (https://inferelator.readthedocs.io/). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Claudia Skok Gibbs
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Center for Data Science, New York University, New York, NY 10003, USA
- Christopher A Jackson
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Department of Biology, New York University, New York, NY 10003, USA
- Giuseppe-Antonio Saldi
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Department of Biology, New York University, New York, NY 10003, USA
- Andreas Tjärnberg
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Department of Biology, New York University, New York, NY 10003, USA
- Aashna Shah
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Aaron Watters
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Nicholas De Veaux
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Ren Yi
- Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
- Tymor Hamamsy
- Center for Data Science, New York University, New York, NY 10003, USA
- Dayanne M Castro
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Department of Biology, New York University, New York, NY 10003, USA
- Nicholas Carriero
- Flatiron Institute, Scientific Computing Core, Simons Foundation, New York, NY 10010, USA
- Bram L Gorissen
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- David Gresham
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Department of Biology, New York University, New York, NY 10003, USA
- Emily R Miraldi
- Divisions of Immunobiology and Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
- Richard Bonneau
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Center for Data Science, New York University, New York, NY 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Department of Biology, New York University, New York, NY 10003, USA
- Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
|
23
|
Erbe R, Gore J, Gemmill K, Gaykalova DA, Fertig EJ. The use of machine learning to discover regulatory networks controlling biological systems. Mol Cell 2022; 82:260-273. [PMID: 35016036 PMCID: PMC8905511 DOI: 10.1016/j.molcel.2021.12.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/21/2021] [Revised: 12/06/2021] [Accepted: 12/13/2021] [Indexed: 01/22/2023]
Abstract
Biological systems are composed of a vast web of multiscale molecular interactors and interactions. High-throughput technologies, both bulk and single cell, now allow for investigation of the properties and quantities of these interactors. Computational algorithms and machine learning methods then provide the tools to derive meaningful insights from the resulting data sets. One such approach is graphical network modeling, which provides a computational framework to explicitly model the molecular interactions within and between the cells comprising biological systems. These graphical networks aim to describe a putative chain of cause and effect between interacting molecules. This feature allows for determination of key molecules in a biological process, accelerated generation of mechanistic hypotheses, and simulation of experimental outcomes. We review the computational concepts and applications of graphical network models across molecular scales for both intracellular and intercellular regulatory biology, examples of successful applications, and the future directions needed to overcome current limitations.
Affiliation(s)
- Rossin Erbe
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Jessica Gore
- Institute for Genome Sciences, University of Maryland Medical Center, Baltimore, MD, USA
- Kelly Gemmill
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Daria A Gaykalova
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Institute for Genome Sciences, University of Maryland Medical Center, Baltimore, MD, USA; Department of Otorhinolaryngology-Head and Neck Surgery, University of Maryland Medical Center, Baltimore, MD, USA; Marlene & Stewart Greenebaum Comprehensive Cancer Center, University of Maryland Medical Center, Baltimore, MD, USA
- Elana J Fertig
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
|
24
|
Hayes SMS, Sachs JR, Cho CR. From complex data to biological insight: 'DEKER' feature selection and network inference. J Pharmacokinet Pharmacodyn 2021; 49:81-99. [PMID: 34791577 PMCID: PMC8837529 DOI: 10.1007/s10928-021-09792-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 10/12/2021] [Indexed: 11/12/2022]
Abstract
Network inference is a valuable approach for gaining mechanistic insight from high-dimensional biological data. Existing methods for network inference focus on ranking all possible relations (edges) among all measured quantities such as genes, proteins, metabolites (features) observed, which yields a dense network that is challenging to interpret. Identifying a sparse, interpretable network using these methods thus requires an error-prone thresholding step which compromises their performance. In this article we propose a new method, DEKER-NET, that addresses this limitation by directly identifying a sparse, interpretable network without thresholding, improving real-world performance. DEKER-NET uses a novel machine learning method for feature selection in an iterative framework for network inference. DEKER-NET is extremely flexible, handling linear and nonlinear relations while making no assumptions about the underlying distribution of data, and is suitable for categorical or continuous variables. We test our method on the Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge data, demonstrating that it can directly identify sparse, interpretable networks without thresholding while maintaining performance comparable to the hypothetical best-case thresholded network of other methods.
Affiliation(s)
- Sean M S Hayes
- Quantitative Pharmacology and Pharmacometrics, Merck & Co., Inc., Kenilworth, NJ, USA
- Jeffrey R Sachs
- Quantitative Pharmacology and Pharmacometrics, Merck & Co., Inc., Kenilworth, NJ, USA
- Carolyn R Cho
- Quantitative Pharmacology and Pharmacometrics, Merck & Co., Inc., Kenilworth, NJ, USA
|
25
|
Chen J, Cheong C, Lan L, Zhou X, Liu J, Lyu A, Cheung WK, Zhang L. DeepDRIM: a deep neural network to reconstruct cell-type-specific gene regulatory network using single-cell RNA-seq data. Brief Bioinform 2021; 22:bbab325. [PMID: 34424948 PMCID: PMC8499812 DOI: 10.1093/bib/bbab325] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 05/28/2021] [Revised: 07/12/2021] [Accepted: 07/26/2021] [Indexed: 01/11/2023] Open
Abstract
Single-cell RNA sequencing has made it possible to capture gene activity at single-cell resolution, thus allowing reconstruction of cell-type-specific gene regulatory networks (GRNs). The available algorithms for reconstructing GRNs are commonly designed for bulk RNA-seq data, and few of them can analyze scRNA-seq data while handling dropout events and cellular heterogeneity. In this paper, we represent the joint gene expression distribution of a gene pair as an image and propose a novel supervised deep neural network called DeepDRIM, which utilizes the image of the target TF-gene pair and those of its potential neighbors to reconstruct GRNs from scRNA-seq data. Because it considers the neighborhood context of each TF-gene pair, DeepDRIM can effectively eliminate the false positives caused by transitive gene-gene interactions. We compared DeepDRIM with nine GRN reconstruction algorithms designed for either bulk or single-cell RNA-seq data. It achieves clearly better performance on the scRNA-seq data collected from eight cell lines. The simulated data show that DeepDRIM is robust to the dropout rate, the cell number and the size of the training data. We further applied DeepDRIM to the scRNA-seq gene expression of B cells from the bronchoalveolar lavage fluid of patients with mild and severe coronavirus disease 2019. We focused on the cell-type-specific GRN alteration and observed that targets of TFs that were differentially expressed between the two statuses were enriched in lysosome, apoptosis, response to decreased oxygen level and microtubule, which have been shown to be associated with coronavirus infection.
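The phrase "joint gene expression distribution of a gene pair as an image" can be illustrated with a normalized 2D histogram over cells. The sketch below is a minimal version under assumptions: equal-width bins over the observed range, no log transform, and a hypothetical function name. DeepDRIM's actual preprocessing and network architecture are not described in the abstract and are not reproduced here.

```python
def joint_expression_image(x, y, bins=8):
    """Build a bins x bins 2D histogram of paired expression values.

    x, y : expression of the TF and the candidate target across the same cells
    Returns a nested list whose entries sum to 1 (an empirical joint distribution).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")

    def bin_index(value, lo, hi):
        if hi == lo:  # constant gene: everything falls in the first bin
            return 0
        # The maximum value is clamped into the last bin.
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    x_lo, x_hi, y_lo, y_hi = min(x), max(x), min(y), max(y)
    image = [[0.0] * bins for _ in range(bins)]
    for xv, yv in zip(x, y):
        image[bin_index(xv, x_lo, x_hi)][bin_index(yv, y_lo, y_hi)] += 1.0
    n_cells = len(x)
    return [[count / n_cells for count in row] for row in image]
```

An image of this kind is what a convolutional network would take as input for each gene pair; dropout events show up as extra mass in the first row and column.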
Affiliation(s)
- Jiaxing Chen
  - Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- ChinWang Cheong
  - Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Liang Lan
  - Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Xin Zhou
  - Department of Biomedical Engineering, Vanderbilt University, Vanderbilt Place Nashville, 37235, TN, USA
- Jiming Liu
  - Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Aiping Lyu
  - School of Chinese Medicine, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- William K Cheung
  - Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
- Lu Zhang
  - Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong
|
26
|
scLink: Inferring Sparse Gene Co-expression Networks from Single-cell Expression Data. GENOMICS PROTEOMICS & BIOINFORMATICS 2021; 19:475-492. [PMID: 34252628 PMCID: PMC8896229 DOI: 10.1016/j.gpb.2020.11.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 08/28/2020] [Revised: 10/23/2020] [Accepted: 12/26/2020] [Indexed: 11/23/2022]
Abstract
A system-level understanding of the regulation and coordination mechanisms of gene expression is essential for studying the complexity of biological processes in health and disease. With the rapid development of single-cell RNA sequencing technologies, it is now possible to investigate gene interactions in a cell type-specific manner. Here we propose the scLink method, which uses statistical network modeling to understand the co-expression relationships among genes and construct sparse gene co-expression networks from single-cell gene expression data. We use both simulation and real data studies to demonstrate the advantages of scLink and its ability to improve single-cell gene network analysis. The scLink R package is available at https://github.com/Vivianstats/scLink.
|
27
|
Inferring and analyzing gene regulatory networks from multi-factorial expression data: a complete and interactive suite. BMC Genomics 2021; 22:387. [PMID: 34039282 PMCID: PMC8152307 DOI: 10.1186/s12864-021-07659-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/17/2020] [Accepted: 04/28/2021] [Indexed: 11/29/2022] Open
Abstract
Background: High-throughput transcriptomic datasets are often examined to discover new actors and regulators of a biological response. To this end, graphical interfaces have been developed that allow a broad range of users to conduct standard analyses from RNA-seq data, even with little programming experience. Although existing solutions usually provide adequate procedures for normalization, exploration or differential expression, more advanced features, such as gene clustering or regulatory network inference, are often missing or do not reflect current state-of-the-art methodologies. Results: We developed a user interface called DIANE (Dashboard for the Inference and Analysis of Networks from Expression data), designed to harness the potential of multi-factorial expression datasets from any organism through a precise set of methods. DIANE's interactive workflow provides normalization, dimensionality reduction, differential expression and ontology enrichment. Gene clustering can be performed and explored via configurable Mixture Models, and Random Forests are used to infer gene regulatory networks. DIANE also includes a novel procedure to assess the statistical significance of regulator-target influence measures based on permutations for Random Forest importance metrics. All along the pipeline, session reports and results can be downloaded to ensure clear and reproducible analyses. Conclusions: We demonstrate the value and the benefits of DIANE using a recently published data set describing the transcriptional response of Arabidopsis thaliana under the combination of temperature, drought and salinity perturbations. We show that DIANE can intuitively carry out informative exploration and statistical procedures with RNA-Seq data, perform model-based clustering of gene expression profiles and go further into gene network reconstruction, providing relevant candidate genes or signalling pathways to explore.
DIANE is available as a web service (https://diane.bpmp.inrae.fr), or can be installed and locally launched as a complete R package. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07659-2.
|
28
|
Nguyen H, Tran D, Tran B, Pehlivan B, Nguyen T. A comprehensive survey of regulatory network inference methods using single cell RNA sequencing data. Brief Bioinform 2021; 22:bbaa190. [PMID: 34020546 PMCID: PMC8138892 DOI: 10.1093/bib/bbaa190] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 10/07/2019] [Revised: 06/19/2020] [Accepted: 07/24/2020] [Indexed: 12/13/2022] Open
Abstract
A gene regulatory network is a complicated set of interactions between genetic materials, which dictates how cells develop in living organisms and react to their surrounding environment. Robust comprehension of these interactions would help explain how cells function as well as predict their reactions to external factors. This knowledge can benefit both developmental biology and clinical research such as drug development or epidemiology research. Recently, the rapid advance of single-cell sequencing technologies, which has pushed the limit of transcriptomic profiling to the individual cell level, has opened up an entirely new area for regulatory network research. To exploit this new abundant source of data and take advantage of data in single-cell resolution, a number of computational methods have been proposed to uncover the interactions hidden by the averaging process in standard bulk sequencing. In this article, we review 15 such network inference methods developed for single-cell data. We discuss their underlying assumptions, inference techniques, usability, and pros and cons. In an extensive analysis using simulation, we also assess the methods' performance, sensitivity to dropout and time complexity. The main objective of this survey is to assist not only life scientists in selecting suitable methods for their data and analysis purposes but also computational scientists in developing new methods, by highlighting outstanding challenges in the field that remain to be addressed in future development.
Affiliation(s)
- Hung Nguyen
  - Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557
- Duc Tran
  - Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557
- Bang Tran
  - Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557
- Bahadir Pehlivan
  - Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557
- Tin Nguyen
  - Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557
|
29
|
Seçilmiş D, Hillerton T, Nelander S, Sonnhammer ELL. Inferring the experimental design for accurate gene regulatory network inference. Bioinformatics 2021; 37:3553-3559. [PMID: 33978748 PMCID: PMC8545292 DOI: 10.1093/bioinformatics/btab367] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/02/2020] [Revised: 03/29/2021] [Accepted: 05/11/2021] [Indexed: 11/17/2022] Open
Abstract
Motivation Accurate inference of gene regulatory interactions is of importance for understanding the mechanisms of underlying biological processes. For gene expression data gathered from targeted perturbations, gene regulatory network (GRN) inference methods that use the perturbation design are the top performing methods. However, the connection between the perturbation design and gene expression can be obfuscated due to problems, such as experimental noise or off-target effects, limiting the methods’ ability to reconstruct the true GRN. Results In this study, we propose an algorithm, IDEMAX, to infer the effective perturbation design from gene expression data in order to eliminate the potential risk of fitting a disconnected perturbation design to gene expression. We applied IDEMAX to synthetic data from two different data generation tools, GeneNetWeaver and GeneSPIDER, and assessed its effect on the experiment design matrix as well as the accuracy of the GRN inference, followed by application to a real dataset. The results show that our approach consistently improves the accuracy of GRN inference compared to using the intended perturbation design when much of the signal is hidden by noise, which is often the case for real data. Availability and implementation https://bitbucket.org/sonnhammergrni/idemax. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Deniz Seçilmiş
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, Solna, 17121, Sweden
- Thomas Hillerton
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, Solna, 17121, Sweden
- Sven Nelander
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Erik L L Sonnhammer
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, Solna, 17121, Sweden
|
30
|
Feng G, Lu W, Pedrycz W, Yang J, Liu X. The Learning of Fuzzy Cognitive Maps With Noisy Data: A Rapid and Robust Learning Method With Maximum Entropy. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:2080-2092. [PMID: 31443065 DOI: 10.1109/tcyb.2019.2933438] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]
Abstract
Numerous learning methods for fuzzy cognitive maps (FCMs), such as the Hebbian-based and the population-based learning methods, have been developed for modeling and simulating dynamic systems. However, these methods are faced with several obvious limitations. Most of these models are extremely time consuming when learning the large-scale FCMs with hundreds of nodes. Furthermore, the FCMs learned by those algorithms lack robustness when the experimental data contain noise. In addition, reasonable distribution of the weights is rarely considered in these algorithms, which could result in the reduction of the performance of the resulting FCM. In this article, a straightforward, rapid, and robust learning method is proposed to learn FCMs from noisy data, especially, to learn large-scale FCMs. The crux of the proposed algorithm is to equivalently transform the learning problem of FCMs to a classic-constrained convex optimization problem in which the least-squares term ensures the robustness of the well-learned FCM and the maximum entropy term regularizes the distribution of the weights of the well-learned FCM. A series of experiments covering two frequently used activation functions (the sigmoid and hyperbolic tangent functions) are performed on both synthetic datasets with noise and real-world datasets. The experimental results show that the proposed method is rapid and robust against data containing noise and that the well-learned weights have better distribution. In addition, the FCMs learned by the proposed method also exhibit superior performance in comparison with the existing methods.
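The model being learned in this line of work, a fuzzy cognitive map with a sigmoid or hyperbolic tangent activation, can be sketched as a simple iterated update. This is a minimal illustration of the standard FCM state update only, not the learning method proposed in the paper; the function names, the weight orientation (`weights[j][i]` is the influence of node j on node i) and the example weights are assumptions made for the sketch.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fcm_step(state, weights, activation=sigmoid):
    """Advance all node activations by one time step.

    Assumed convention: weights[j][i] is the influence of node j on node i.
    """
    n = len(state)
    return [
        activation(sum(weights[j][i] * state[j] for j in range(n)))
        for i in range(n)
    ]


def simulate(state, weights, steps, activation=sigmoid):
    """Return the trajectory [x(0), x(1), ..., x(steps)]."""
    trajectory = [list(state)]
    for _ in range(steps):
        state = fcm_step(state, weights, activation)
        trajectory.append(state)
    return trajectory


# Hypothetical two-node map: node 0 excites node 1, node 1 inhibits node 0.
example_weights = [[0.0, 0.8], [-0.6, 0.0]]
trajectory = simulate([0.2, 0.9], example_weights, steps=5)

# The same update with the hyperbolic tangent activation.
tanh_trajectory = simulate([0.2, 0.9], example_weights, steps=5, activation=math.tanh)
```

Learning an FCM then amounts to choosing the weights so that simulated trajectories match observed ones, which is the problem the abstract casts as constrained convex optimization.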
|
31
|
Muzio G, O’Bray L, Borgwardt K. Biological network analysis with deep learning. Brief Bioinform 2021; 22:1515-1530. [PMID: 33169146 PMCID: PMC7986589 DOI: 10.1093/bib/bbaa257] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Received: 04/30/2020] [Revised: 08/26/2020] [Accepted: 09/11/2020] [Indexed: 12/17/2022] Open
Abstract
Recent advancements in experimental high-throughput technologies have expanded the availability and quantity of molecular data in biology. Given the importance of interactions in biological processes, such as the interactions between proteins or the bonds within a chemical compound, this data is often represented in the form of a biological network. The rise of this data has created a need for new computational tools to analyze networks. One major trend in the field is to use deep learning for this goal and, more specifically, to use methods that work with networks, the so-called graph neural networks (GNNs). In this article, we describe biological networks and review the principles and underlying algorithms of GNNs. We then discuss domains in bioinformatics in which graph neural networks are frequently being applied at the moment, such as protein function prediction, protein-protein interaction prediction and in silico drug discovery and development. Finally, we highlight application areas such as gene regulatory networks and disease diagnosis where deep learning is emerging as a new tool to answer classic questions like gene interaction prediction and automatic disease prediction from data.
Affiliation(s)
- Giulia Muzio
  - Machine Learning and Computational Biology Lab at ETH Zürich
- Leslie O’Bray
  - Machine Learning and Computational Biology Lab at ETH Zürich
|
33
|
Kang Y, Thieffry D, Cantini L. Evaluating the Reproducibility of Single-Cell Gene Regulatory Network Inference Algorithms. Front Genet 2021; 12:617282. [PMID: 33828580 PMCID: PMC8019823 DOI: 10.3389/fgene.2021.617282] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/14/2020] [Accepted: 02/24/2021] [Indexed: 12/13/2022] Open
Abstract
Networks are powerful tools to represent and investigate biological systems. The development of algorithms inferring regulatory interactions from functional genomics data has been an active area of research. With the advent of single-cell RNA-seq (scRNA-seq) data, numerous methods specifically designed to take advantage of single-cell datasets have been proposed. However, published benchmarks on single-cell network inference are mostly based on simulated data. When applied to real data, these benchmarks take into account only a small set of genes and only compare the inferred networks with an imposed ground truth. Here, we benchmark six single-cell network inference methods based on their reproducibility, i.e., their ability to infer similar networks when applied to two independent datasets for the same biological condition. We tested each of these methods on real data from three biological conditions: human retina, T-cells in colorectal cancer, and human hematopoiesis. When networks with up to 100,000 links are considered, GENIE3 proves to be the most reproducible algorithm and, together with GRNBoost2, shows higher intersection with ground-truth biological interactions. These results are independent of the single-cell sequencing platform, the cell type annotation system and the number of cells constituting the dataset. Finally, GRNBoost2 and CLR show more reproducible performance once a more stringent thresholding is applied to the networks (1,000–100 links). To ensure reproducibility and ease extension of this benchmark study, we implemented all the analyses in scNET, a Jupyter notebook available at https://github.com/ComputationalSystemsBiology/scNET.
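Reproducibility in the sense described above, similarity between networks inferred from two independent datasets, can be sketched by thresholding each network to its top-k links and comparing the resulting edge sets. This is an illustrative sketch only: the abstract does not state which similarity index the authors use, so the Jaccard index here is an assumption, and the function names and toy scores are hypothetical.

```python
def top_k_edges(scores, k):
    """Return the set of the k highest-scoring (regulator, target) edges."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {edge for edge, _ in ranked[:k]}


def reproducibility(scores_a, scores_b, k):
    """Jaccard index between the top-k edge sets of two inferred networks.

    The Jaccard index is an assumed choice of similarity measure.
    """
    edges_a = top_k_edges(scores_a, k)
    edges_b = top_k_edges(scores_b, k)
    union = edges_a | edges_b
    if not union:
        return 0.0
    return len(edges_a & edges_b) / len(union)


# Toy edge scores from the same method applied to two independent datasets.
run_a = {("TF1", "G1"): 0.9, ("TF1", "G2"): 0.7, ("TF2", "G1"): 0.2}
run_b = {("TF1", "G1"): 0.8, ("TF2", "G1"): 0.6, ("TF1", "G2"): 0.1}

strict = reproducibility(run_a, run_b, k=2)   # only the two strongest links per run
lenient = reproducibility(run_a, run_b, k=3)  # all three links per run
```

Varying `k` mirrors the thresholding in the abstract, where the ranking of methods changes between networks of up to 100,000 links and more stringent cut-offs.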
Affiliation(s)
- Yoonjee Kang
  - Computational Systems Biology Team, Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR 8197, INSERM U1024, Ecole Normale Supérieure, Paris Sciences et Lettres Research University, Paris, France
- Denis Thieffry
  - Computational Systems Biology Team, Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR 8197, INSERM U1024, Ecole Normale Supérieure, Paris Sciences et Lettres Research University, Paris, France
- Laura Cantini
  - Computational Systems Biology Team, Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR 8197, INSERM U1024, Ecole Normale Supérieure, Paris Sciences et Lettres Research University, Paris, France
|
34
|
Johnson JS, De Veaux N, Rives AW, Lahaye X, Lucas SY, Perot BP, Luka M, Garcia-Paredes V, Amon LM, Watters A, Abdessalem G, Aderem A, Manel N, Littman DR, Bonneau R, Ménager MM. A Comprehensive Map of the Monocyte-Derived Dendritic Cell Transcriptional Network Engaged upon Innate Sensing of HIV. Cell Rep 2021; 30:914-931.e9. [PMID: 31968263 PMCID: PMC7039998 DOI: 10.1016/j.celrep.2019.12.054] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/11/2019] [Revised: 06/25/2019] [Accepted: 12/13/2019] [Indexed: 01/12/2023] Open
Abstract
Transcriptional programming of the innate immune response is pivotal for host protection. However, the transcriptional mechanisms that link pathogen sensing with innate activation remain poorly understood. During HIV-1 infection, human dendritic cells (DCs) can detect the virus through an innate sensing pathway, leading to antiviral interferon and DC maturation. Here, we develop an iterative experimental and computational approach to map the HIV-1 innate response circuitry in monocyte-derived DCs (MDDCs). By integrating genome-wide chromatin accessibility with expression kinetics, we infer a gene regulatory network that links 542 transcription factors with 21,862 target genes. We observe that an interferon response is required, yet insufficient, to drive MDDC maturation and identify PRDM1 and RARA as essential regulators of the interferon response and MDDC maturation, respectively. Our work provides a resource for interrogation of regulators of HIV replication and innate immunity, highlighting complexity and cooperativity in the regulatory circuit controlling the response to infection. Pathogen sensing leads to host transcriptional reprogramming to protect against infection. However, it is unclear how transcription factor activity is coordinated across the genome. Johnson et al. integrate chromatin accessibility and gene expression data to infer and validate a gene regulatory network that directs the innate immune response to HIV.
Affiliation(s)
- Jarrod S Johnson
  - Department of Biochemistry, University of Utah, Salt Lake City, UT 84112, USA; Center for Infectious Disease Research, Seattle, WA 98109, USA.
- Nicholas De Veaux
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Alexander W Rives
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Xavier Lahaye
  - Immunity and Cancer Department, Institut Curie, PSL Research University, INSERM U932, 75005 Paris, France
- Sasha Y Lucas
  - Center for Infectious Disease Research, Seattle, WA 98109, USA
- Brieuc P Perot
  - Laboratory of Inflammatory Responses and Transcriptomic Networks in Diseases, Imagine Institute, INSERM UMR 1163, ATIP-Avenir Team, Université de Paris, 24 Boulevard du Montparnasse, 75015 Paris, France
- Marine Luka
  - Laboratory of Inflammatory Responses and Transcriptomic Networks in Diseases, Imagine Institute, INSERM UMR 1163, ATIP-Avenir Team, Université de Paris, 24 Boulevard du Montparnasse, 75015 Paris, France
- Victor Garcia-Paredes
  - Laboratory of Inflammatory Responses and Transcriptomic Networks in Diseases, Imagine Institute, INSERM UMR 1163, ATIP-Avenir Team, Université de Paris, 24 Boulevard du Montparnasse, 75015 Paris, France
- Lynn M Amon
  - Center for Infectious Disease Research, Seattle, WA 98109, USA
- Aaron Watters
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Ghaith Abdessalem
  - Laboratory of Inflammatory Responses and Transcriptomic Networks in Diseases, Imagine Institute, INSERM UMR 1163, ATIP-Avenir Team, Université de Paris, 24 Boulevard du Montparnasse, 75015 Paris, France
- Alan Aderem
  - Center for Infectious Disease Research, Seattle, WA 98109, USA; Department of Immunology, University of Washington School of Medicine, Seattle, WA 98109, USA
- Nicolas Manel
  - Immunity and Cancer Department, Institut Curie, PSL Research University, INSERM U932, 75005 Paris, France
- Dan R Littman
  - The Kimmel Center for Biology and Medicine of the Skirball Institute, New York University School of Medicine, New York, NY 10016, USA; Howard Hughes Medical Institute, New York University School of Medicine, New York, NY 10016, USA
- Richard Bonneau
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA; Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA; Center for Data Science, New York University, New York, NY 10011, USA
- Mickaël M Ménager
  - Laboratory of Inflammatory Responses and Transcriptomic Networks in Diseases, Imagine Institute, INSERM UMR 1163, ATIP-Avenir Team, Université de Paris, 24 Boulevard du Montparnasse, 75015 Paris, France; The Kimmel Center for Biology and Medicine of the Skirball Institute, New York University School of Medicine, New York, NY 10016, USA.
|
35
|
Kimura S, Fukutomi R, Tokuhisa M, Okada M. Inference of Genetic Networks From Time-Series and Static Gene Expression Data: Combining a Random-Forest-Based Inference Method With Feature Selection Methods. Front Genet 2021; 11:595912. [PMID: 33384716 PMCID: PMC7770182 DOI: 10.3389/fgene.2020.595912] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/17/2020] [Accepted: 11/23/2020] [Indexed: 11/17/2022] Open
Abstract
Several researchers have focused on random-forest-based inference methods because of their excellent performance. Some of these inference methods also have a useful ability to analyze both time-series and static gene expression data. However, they are only of use in ranking all of the candidate regulations by assigning them confidence values. None have been capable of detecting the regulations that actually affect a gene of interest. In this study, we propose a method to remove unpromising candidate regulations by combining the random-forest-based inference method with a series of feature selection methods. In addition to detecting unpromising regulations, our proposed method uses outputs from the feature selection methods to adjust the confidence values of all of the candidate regulations that have been computed by the random-forest-based inference method. Numerical experiments showed that the combined application with the feature selection methods improved the performance of the random-forest-based inference method on 99 of the 100 trials performed on the artificial problems. However, the improvement tends to be small, since our combined method succeeded in removing only 19% of the candidate regulations at most. The combined application with the feature selection methods moreover makes the computational cost higher. While a bigger improvement at a lower computational cost would be ideal, we see no impediments to our investigation, given that our aim is to extract as much useful information as possible from a limited amount of gene expression data.
Affiliation(s)
- Shuhei Kimura
  - Faculty of Engineering, Tottori University, Tottori, Japan
- Ryo Fukutomi
  - Graduate School of Sustainability Science, Tottori University, Tottori, Japan
- Mariko Okada
  - Laboratory of Cell Systems, Institute of Protein Research, Osaka University, Osaka, Japan
|
36
|
Ma B, Fang M, Jiao X. Inference of gene regulatory networks based on nonlinear ordinary differential equations. Bioinformatics 2020; 36:4885-4893. [DOI: 10.1093/bioinformatics/btaa032] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/10/2019] [Revised: 12/30/2019] [Accepted: 01/15/2020] [Indexed: 01/05/2023] Open
Abstract
Motivation
Gene regulatory networks (GRNs) capture the regulatory interactions between genes, resulting from the fundamental biological processes of transcription and translation. In some cases, the topology of GRNs is not known, and has to be inferred from gene expression data. Most of the existing GRN reconstruction algorithms are applied to either time-series data or steady-state data. Although time-series data include more information about the system dynamics, steady-state data imply stability of the underlying regulatory networks.
Results
In this article, we propose a method for inferring GRNs from time-series and steady-state data jointly. We make use of a non-linear ordinary differential equations framework to model dynamic gene regulation and an importance measurement strategy to infer all putative regulatory links efficiently. The proposed method is evaluated extensively on the artificial DREAM4 dataset and two real gene expression datasets of yeast and Escherichia coli. Based on public benchmark datasets, the proposed method outperforms other popular inference algorithms in terms of overall score. A comparison of performance on datasets of different scales shows that our method maintains good robustness and accuracy at a low computational complexity.
Availability and implementation
The proposed method is written in the Python language, and is available at: https://github.com/lab319/GRNs_nonlinear_ODEs
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Baoshan Ma
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Mingkun Fang
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Xiangtian Jiao
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
|
37
|
Adabor ES, Acquaah-Mensah GK. DOKI: Domain knowledge-driven inference method for reverse-engineering transcriptional regulatory relationships among genes in cancer. Comput Biol Med 2020; 125:104017. [PMID: 33010618 DOI: 10.1016/j.compbiomed.2020.104017] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/01/2020] [Revised: 09/16/2020] [Accepted: 09/20/2020] [Indexed: 11/18/2022]
Abstract
Efficient reverse-engineering methods are important for identifying transcriptional regulatory relationships among genes in cancer. These methods are becoming increasingly useful in this era where huge volumes of data are generated through the use of high-throughput technologies such as next-generation sequencing technologies and microarrays. However, it is important to improve current methods because of complications involved in modelling complex biological systems. In this paper, we present a novel approach, Domain Knowledge-driven Inference (DOKI), for identification of transcriptional regulatory relationships among genes, given a biological context such as cancer. Combining data normalization, the use of a probability distribution function and Kullback-Leibler Divergence, DOKI incorporates a domain knowledge-driven criterion to make determinations of the existence of regulatory relationships between given transcription factors and given specific gene targets. Characteristics of DOKI enable it to adequately handle complexities inherent in data, and accurately unearth linear and higher-order dependent relationships among genes. DOKI performed equally well with one established high-performing method and better than three other high-performing methods on relatively small data sets. However, it remarkably outperformed these methods on larger data sets to demonstrate its utility. Furthermore, we demonstrate the relevance of such inference algorithms for identifying novel relationships among genes in breast cancer, as some of the consensus results representing novel relationships were confirmed in previously published experimental results. Thus, DOKI will facilitate current efforts to gain etiological insights and help uncover new targeted therapies for various diseases.
Affiliation(s)
- Emmanuel S Adabor
  - School of Technology, Ghana Institute of Management and Public Administration, Achimota, Accra, Ghana.
- George K Acquaah-Mensah
  - Pharmaceutical Sciences Department, Massachusetts College of Pharmacy and Health Sciences (MCPHS University), 19 Foster Street, Worcester, MA, USA
|
38
|
Che D, Guo S, Jiang Q, Chen L. PFBNet: a priori-fused boosting method for gene regulatory network inference. BMC Bioinformatics 2020; 21:308. [PMID: 32664870 PMCID: PMC7362553 DOI: 10.1186/s12859-020-03639-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/29/2019] [Accepted: 07/02/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Inferring gene regulatory networks (GRNs) from gene expression data remains a challenge in systems biology. In the past decade, numerous methods have been developed for the inference of GRNs. It remains a challenge because the data are noisy and high dimensional, and there is a large number of potential interactions. RESULTS We present a novel method, namely priori-fused boosting network inference method (PFBNet), to infer GRNs from time-series expression data by using the non-linear model of Boosting and the prior information (e.g., the knockout data) fusion scheme. Specifically, PFBNet first calculates the confidences of the regulation relationships using the boosting-based model, where the information about the accumulation impact of the gene expressions at previous time points is taken into account. Then, a newly defined strategy is applied to fuse the information from the prior data by elevating the confidences of the regulation relationships from the corresponding regulators. CONCLUSIONS The experiments on the benchmark datasets from the DREAM challenge as well as the E. coli datasets show that PFBNet achieves significantly better performance than other state-of-the-art methods (Jump3, GENIE3-lag, HiDi, iRafNet and BiXGBoost).
Affiliation(s)
- Dandan Che
- Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
- Shun Guo
- Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
- Qingshan Jiang
- Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
- Lifei Chen
- School of Mathematics and Computer Science, Fujian Normal University, Fujian, 350117 China
|
39
|
Gene regulatory network inference from sparsely sampled noisy data. Nat Commun 2020; 11:3493. [PMID: 32661225 PMCID: PMC7359369 DOI: 10.1038/s41467-020-17217-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/07/2019] [Accepted: 06/11/2020] [Indexed: 12/16/2022] Open
Abstract
The complexity of biological systems is encoded in gene regulatory networks. Unravelling this intricate web is a fundamental step in understanding the mechanisms of life and eventually developing efficient therapies to treat and cure diseases. The major obstacle in inferring gene regulatory networks is the lack of data. While time series data are nowadays widely available, they are typically noisy, with low sampling frequency and overall small number of samples. This paper develops a method called BINGO to specifically deal with these issues. Benchmarked with both real and simulated time-series data covering many different gene regulatory networks, BINGO clearly and consistently outperforms state-of-the-art methods. The novelty of BINGO lies in a nonparametric approach featuring statistical sampling of continuous gene expression profiles. BINGO's superior performance and ease of use, even by non-specialists, make gene regulatory network inference available to any researcher, helping to decipher the complex mechanisms of life.
|
40
|
Khan A, Saha G, Pal RK. Modified Half-System Based Method for Reverse Engineering of Gene Regulatory Networks. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1303-1316. [PMID: 30640623 DOI: 10.1109/tcbb.2019.2892450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The accurate reconstruction of gene regulatory networks for proper understanding of the intricacies of complex biological mechanisms still provides motivation for researchers. Due to accessibility of various gene expression data, we can now attempt to computationally infer genetic interactions. Among the established network inference techniques, S-system is preferred because of its efficiency in replicating biological systems though it is computationally more expensive. This provides motivation for us to develop a similar system with lesser computational load. In this work, we have proposed a novel methodology for reverse engineering of gene regulatory networks based on a new technique: half-system. Half-systems use half the number of parameters compared to S-systems and thus significantly reduce the computational complexity. We have implemented our proposed technique for reconstructing four benchmark networks from their corresponding temporal expression profiles: an 8-gene, a 10-gene, and two 20-gene networks. Being a new technique, to the best of our knowledge, there are no comparable results for this in the contemporary literature. Therefore, we have compared our results with those obtained from the contemporary literature using other methodologies, including the state-of-the-art method, GENIE3. The results obtained in this work stack favourably against the competition, even showing quantifiable improvements in some cases.
|
41
|
Razaghi-Moghadam Z, Nikoloski Z. Supervised learning of gene-regulatory networks based on graph distance profiles of transcriptomics data. NPJ Syst Biol Appl 2020; 6:21. [PMID: 32606380 PMCID: PMC7327016 DOI: 10.1038/s41540-020-0140-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/08/2019] [Accepted: 06/09/2020] [Indexed: 02/07/2023] Open
Abstract
Characterisation of gene-regulatory network (GRN) interactions provides a stepping stone to understanding how genes affect cellular phenotypes. Yet, despite advances in profiling technologies, GRN reconstruction from gene expression data remains a pressing problem in systems biology. Here, we devise a supervised learning approach, GRADIS, which utilises support vector machines to reconstruct GRNs based on distance profiles obtained from a graph representation of transcriptomics data. By employing the data from Escherichia coli and Saccharomyces cerevisiae as well as synthetic networks from the DREAM4 and DREAM5 network inference challenges, we demonstrate that our GRADIS approach outperforms the state-of-the-art supervised and unsupervised approaches. This holds when predictions about target genes for individual transcription factors as well as for the entire network are considered. We employ experimentally verified GRNs from E. coli and S. cerevisiae to validate the predictions and obtain further insights into the performance of the proposed approach. Our GRADIS approach offers the possibility for usage of other network-based representations of large-scale data, and can be readily extended to help the characterisation of other cellular networks, including protein–protein and protein–metabolite interactions.
Affiliation(s)
- Zahra Razaghi-Moghadam
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
- Systems Biology and Mathematical Modeling group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
- Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
- Systems Biology and Mathematical Modeling group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
|
42
|
Abstract
Motivation A major challenge in molecular and cellular biology is to map out the regulatory networks of cells. As regulatory interactions can typically not be directly observed experimentally, various computational methods have been proposed to disentangle direct and indirect effects. Most of these rely on assumptions that are rarely met or cannot be adapted to a given context. Results We present a network inference method that is based on a simple response logic with minimal presumptions. It requires that we can experimentally observe whether or not some of the system’s components respond to perturbations of some other components, and then identifies the directed networks that most accurately account for the observed propagation of the signal. To cope with the intractable number of possible networks, we developed a logic programming approach that can infer networks of hundreds of nodes, while being robust to noisy, heterogeneous or missing data. This allows direct integration of prior network knowledge and additional constraints such as sparsity. We systematically benchmark our method on KEGG pathways, and show that it outperforms existing approaches in DREAM3 and DREAM4 challenges. Applied to a novel perturbation dataset on PI3K and MAPK pathways in isogenic models of a colon cancer cell line, it generates plausible network hypotheses that explain distinct sensitivities toward various targeted inhibitors due to different PI3K mutants. Availability and implementation A Python/Answer Set Programming implementation can be accessed at github.com/GrossTor/response-logic. Data and analysis scripts are available at github.com/GrossTor/response-logic-projects. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Torsten Gross
- Institut für Pathologie, Charité-Universitätsmedizin, Berlin
- IRI Life Sciences, Humboldt Universität zu Berlin, Berlin
- Berlin Institute of Health, Berlin, Germany
- Yibing Yan
- Oncology Biomarker Development, Genentech Inc., South San Francisco, CA, USA
- Nils Blüthgen
- Institut für Pathologie, Charité-Universitätsmedizin, Berlin
- IRI Life Sciences, Humboldt Universität zu Berlin, Berlin
- Berlin Institute of Health, Berlin, Germany
|
43
|
Cirrone J, Brooks MD, Bonneau R, Coruzzi GM, Shasha DE. OutPredict: multiple datasets can improve prediction of expression and inference of causality. Sci Rep 2020; 10:6804. [PMID: 32321967 PMCID: PMC7176633 DOI: 10.1038/s41598-020-63347-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/30/2019] [Accepted: 03/26/2020] [Indexed: 01/09/2023] Open
Abstract
The ability to accurately predict the causal relationships from transcription factors to genes would greatly enhance our understanding of transcriptional dynamics. This could lead to applications in which one or more transcription factors could be manipulated to effect a change in genes leading to the enhancement of some desired trait. Here we present a method called OutPredict that constructs a model for each gene based on time series (and other) data and that predicts a gene's expression at a previously unseen subsequent time point. The model also infers causal relationships based on the most important transcription factors for each gene model, some of which have been validated from previous physical experiments. The method benefits from known network edges and steady-state data to enhance predictive accuracy. Our results across B. subtilis, Arabidopsis, E. coli, Drosophila and the DREAM4 simulated in silico dataset show improved predictive accuracy ranging from 40% to 60% over other state-of-the-art methods. We find that gene expression models can benefit from the addition of steady-state data to predict expression values of time series. Finally, we validate, based on limited available data, that the influential edges we infer correspond to known relationships significantly more than expected by chance or by state-of-the-art methods.
Affiliation(s)
- Jacopo Cirrone
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, NY, 10012, USA.
- Matthew D Brooks
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA
- Richard Bonneau
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, NY, 10012, USA
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, 10010, USA
- Gloria M Coruzzi
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA
- Dennis E Shasha
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, NY, 10012, USA
|
44
|
Liu L, Liu J. Reconstructing gene regulatory networks via memetic algorithm and LASSO based on recurrent neural networks. Soft Comput 2020. [DOI: 10.1007/s00500-019-04185-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
|
45
|
Shi J, Zhao J, Liu X, Chen L, Li T. Quantifying Direct Dependencies in Biological Networks by Multiscale Association Analysis. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:449-458. [PMID: 29994264 DOI: 10.1109/tcbb.2018.2846648] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Partial correlation (PC) or conditional mutual information (CMI) is widely used in detecting direct dependencies between the observed variables in biological networks by eliminating indirect correlations/associations, but it fails whenever there are some strong correlations in a network. In this paper, we theoretically develop a multiscale association analysis to overcome this flaw. We propose a new measure, partial association (PA), based on the multiscale conditional mutual information. We show that linear PA and nonlinear PA have clear advantages over PC and CMI from both theoretical and computational aspects. Both simulated models and real omics datasets demonstrate that PA is superior to PC and CMI in terms of accuracy, and is a powerful tool to identify the direct associations or reconstruct molecular networks based on the observed data. Survival and functional analyses of the hub genes in the gene networks reconstructed from TCGA data for different cancers also validated the effectiveness of our method.
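To make the baseline in this abstract concrete, the sketch below computes ordinary partial correlation (PC) from the inverse covariance matrix. It illustrates only the standard PC measure that the abstract compares against, not the proposed partial association (PA), whose definition is not given here. The function name `partial_correlation` and the simulated chain x -> y -> z are assumptions chosen for illustration.

```python
import numpy as np

def partial_correlation(data):
    """Partial correlation of each variable pair given all other variables.

    data: array of shape (n_samples, n_variables).
    Uses the standard identity pcorr_ij = -P_ij / sqrt(P_ii * P_jj),
    where P is the inverse (precision) of the covariance matrix.
    """
    precision = np.linalg.pinv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Simulated chain x -> y -> z: x and z are correlated only through y.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + 0.5 * rng.normal(size=2000)
z = y + 0.5 * rng.normal(size=2000)
data = np.column_stack([x, y, z])

corr = np.corrcoef(data, rowvar=False)  # marginal correlations
pcorr = partial_correlation(data)       # correlations given the third variable
```

In this toy chain the marginal correlation between x and z is large, while their partial correlation given y is close to zero, which is the sense in which PC removes indirect associations.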
|
46
|
Li W, Zhang W, Zhang J. A Novel Model Integration Network Inference Algorithm with Clustering and Hub Genes Finding. Mol Inform 2020; 39:e1900075. [DOI: 10.1002/minf.201900075] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/29/2019] [Accepted: 01/14/2020] [Indexed: 11/08/2022]
Affiliation(s)
- Wenchao Li
- State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control of Zhejiang University, Hangzhou, China
- Wei Zhang
- State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control of Zhejiang University, Hangzhou, China
- Jianming Zhang
- State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control of Zhejiang University, Hangzhou, China
|
47
|
Jackson CA, Castro DM, Saldi GA, Bonneau R, Gresham D. Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments. eLife 2020; 9:e51254. [PMID: 31985403 PMCID: PMC7004572 DOI: 10.7554/elife.51254] [Citation(s) in RCA: 92] [Impact Index Per Article: 23.0] [Received: 08/21/2019] [Accepted: 01/10/2020] [Indexed: 11/13/2022] Open
Abstract
Understanding how gene expression programs are controlled requires identifying regulatory relationships between transcription factors and target genes. Gene regulatory networks are typically constructed from gene expression data acquired following genetic perturbation or environmental stimulus. Single-cell RNA sequencing (scRNAseq) captures the gene expression state of thousands of individual cells in a single experiment, offering advantages in combinatorial experimental design, large numbers of independent measurements, and accessing the interaction between the cell cycle and environmental responses that is hidden by population-level analysis of gene expression. To leverage these advantages, we developed a method for scRNAseq in budding yeast (Saccharomyces cerevisiae). We pooled diverse transcriptionally barcoded gene deletion mutants in 11 different environmental conditions and determined their expression state by sequencing 38,285 individual cells. We benchmarked a framework for learning gene regulatory networks from scRNAseq data that incorporates multitask learning and constructed a global gene regulatory network comprising 12,228 interactions.
Affiliation(s)
- Christopher A Jackson
- Center for Genomics and Systems Biology, New York University, New York, United States
- Department of Biology, New York University, New York, United States
- Richard Bonneau
- Center for Genomics and Systems Biology, New York University, New York, United States
- Department of Biology, New York University, New York, United States
- Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, United States
- Center for Data Science, New York University, New York, United States
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, United States
- David Gresham
- Center for Genomics and Systems Biology, New York University, New York, United States
- Department of Biology, New York University, New York, United States
|
48
|
Deep learning for inferring gene relationships from single-cell expression data. Proc Natl Acad Sci U S A 2019; 116:27151-27158. [PMID: 31822622 DOI: 10.1073/pnas.1911536116] [Citation(s) in RCA: 106] [Impact Index Per Article: 21.2] [Indexed: 12/22/2022] Open
Abstract
Several methods were developed to mine gene-gene relationships from expression data. Examples include correlation and mutual information methods for coexpression analysis, clustering and undirected graphical models for functional assignments, and directed graphical models for pathway reconstruction. Using an encoding for gene expression data, followed by deep neural networks analysis, we present a framework that can successfully address all of these diverse tasks. We show that our method, convolutional neural network for coexpression (CNNC), improves upon prior methods in tasks ranging from predicting transcription factor targets to identifying disease-related genes to causality inference. CNNC's encoding provides insights about some of the decisions it makes and their biological basis. CNNC is flexible and can easily be extended to integrate additional types of genomics data, leading to further improvements in its performance.
|
49
|
A Stable, Unified Density Controlled Memetic Algorithm for Gene Regulatory Network Reconstruction Based on Sparse Fuzzy Cognitive Maps. Neural Process Lett 2019. [DOI: 10.1007/s11063-019-10056-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]
|
50
|
Liu L, Liu J. A sparse and decomposed particle swarm optimization for inferring gene regulatory networks based on fuzzy cognitive maps. J Bioinform Comput Biol 2019; 17:1950023. [PMID: 31617458 DOI: 10.1142/s0219720019500239] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/09/2023]
Abstract
Inferring gene regulatory networks (GRNs) is vital to understand the complex cellular processes and reveal the regulatory mechanisms among genes. Although various methods have been developed, more accurate algorithms which can control the sparseness of GRNs still need to be developed. In this work, we model GRNs by fuzzy cognitive maps (FCMs), and a node in an FCM means a gene. Then, a new sparse and decomposed particle swarm optimization, termed as SDPSOFCM-GRN, is proposed to train FCMs, which employs the least absolute shrinkage and selection operator (Lasso) to control the network sparseness with a decomposed strategy. In the experiments, the performance of SDPSOFCM-GRN is validated on synthetic data and the well-known benchmark DREAM3 and DREAM4. The results show that SDPSOFCM-GRN can well control the sparseness of GRNs, and infer directed GRNs with high accuracy and efficiency.
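For readers unfamiliar with fuzzy cognitive maps, the sketch below simulates one in its commonly used form, where each node's next activation is a sigmoid of the weighted sum of current activations. This is an assumption-laden illustration: the abstract does not state the exact transfer function or update rule used by SDPSOFCM-GRN, and the names `fcm_step`, `simulate_fcm`, the steepness `lam`, and the toy weight matrix are all hypothetical. The particle swarm training and Lasso penalty from the abstract are not reproduced.

```python
import numpy as np

def sigmoid(x, lam=1.0):
    """Logistic transfer function with steepness lam (an illustrative choice)."""
    return 1.0 / (1.0 + np.exp(-lam * x))

def fcm_step(state, weights, lam=1.0):
    """One FCM update: weights[j, i] is the influence of node j on node i."""
    return sigmoid(state @ weights, lam)

def simulate_fcm(initial, weights, steps, lam=1.0):
    """Return the trajectory of node activations, including the initial state."""
    trajectory = [np.asarray(initial, dtype=float)]
    for _ in range(steps):
        trajectory.append(fcm_step(trajectory[-1], weights, lam))
    return np.array(trajectory)

# Toy 3-gene map with a sparse weight matrix: most entries are zero,
# which is the kind of structure a sparsity penalty would encourage.
weights = np.array([
    [0.0, 0.8, 0.0],   # gene 0 activates gene 1
    [0.0, 0.0, -0.6],  # gene 1 represses gene 2
    [0.5, 0.0, 0.0],   # gene 2 activates gene 0
])
trajectory = simulate_fcm([0.2, 0.5, 0.9], weights, steps=5)
```

Under this formulation, inferring a GRN amounts to finding a weight matrix whose simulated trajectory matches the observed expression time series, and zero entries correspond to absent regulatory edges.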
Affiliation(s)
- Luowen Liu
- Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi'an 710071, P. R. China
- Jing Liu
- Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi'an 710071, P. R. China
|