1
|
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure-primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. Genome Biol 2024; 25:24. [PMID: 38238840 PMCID: PMC10797903 DOI: 10.1186/s13059-023-03134-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2023] [Accepted: 11/30/2023] [Indexed: 01/22/2024] Open
Abstract
BACKGROUND Modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of genome-wide transcription factor activity (TFA) making it difficult to separate covariance and regulatory interactions. Inference of regulatory interactions and TFA requires aggregation of complementary evidence. Estimating TFA explicitly is problematic as it disconnects GRN inference and TFA estimation and is unable to account for, for example, contextual transcription factor-transcription factor interactions, and other higher order features. Deep-learning offers a potential solution, as it can model complex interactions and higher-order latent features, although does not provide interpretable models and latent features. RESULTS We propose a novel autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor) for modeling, and a metric, explained relative variance (ERV), for interpretation of GRNs. We evaluate SupirFactor with ERV in a wide set of contexts. Compared to current state-of-the-art GRN inference methods, SupirFactor performs favorably. We evaluate latent feature activity as an estimate of TFA and biological function in S. cerevisiae as well as in peripheral blood mononuclear cells (PBMC). CONCLUSION Here we present a framework for structure-primed inference and interpretation of GRNs, SupirFactor, demonstrating interpretability using ERV in multiple biological and experimental settings. SupirFactor enables TFA estimation and pathway analysis using latent factor activity, demonstrated here on two large-scale single-cell datasets, modeling S. cerevisiae and PBMC. We find that the SupirFactor model facilitates biological analysis acquiring novel functional and regulatory insight.
Collapse
Affiliation(s)
- Andreas Tjärnberg
- Center for Developmental Genetics, New York University, New York, NY, 10003, USA.
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA.
- Department of Biology, NYU, New York, NY, 10008, USA.
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA.
- Department of Neuro-Science, University of Wisconsin-Madison - Waisman Center, Madison, USA.
| | - Maggie Beheler-Amass
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
- Department of Biology, NYU, New York, NY, 10008, USA
| | - Christopher A Jackson
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
- Department of Biology, NYU, New York, NY, 10008, USA
| | - Lionel A Christiaen
- Center for Developmental Genetics, New York University, New York, NY, 10003, USA
- Department of Biology, NYU, New York, NY, 10008, USA
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
| | - David Gresham
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
- Department of Biology, NYU, New York, NY, 10008, USA
| | - Richard Bonneau
- Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA.
- Department of Biology, NYU, New York, NY, 10008, USA.
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, 10010, USA.
- Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY, 10003, USA.
- Center For Data Science, NYU, New York, NY, 10008, USA.
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA.
| |
Collapse
|
2
|
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.02.526909. [PMID: 36778259 PMCID: PMC9915715 DOI: 10.1101/2023.02.02.526909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
The modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of regulatory features in genome-wide screens. Most GRN inference methods are therefore forced to model relationships between regulatory genes and their targets with expression as a proxy for the upstream independent features, complicating validation and predictions produced by modeling frameworks. Separating covariance and regulatory influence requires aggregation of independent and complementary sets of evidence, such as transcription factor (TF) binding and target gene expression. However, the complete regulatory state of the system, e.g. TF activity (TFA) is unknown due to a lack of experimental feasibility, making regulatory relations difficult to infer. Some methods attempt to account for this by modeling TFA as a latent feature, but these models often use linear frameworks that are unable to account for non-linearities such as saturation, TF-TF interactions, and other higher order features. Deep learning frameworks may offer a solution, as they are capable of modeling complex interactions and capturing higher-order latent features. However, these methods often discard central concepts in biological systems modeling, such as sparsity and latent feature interpretability, in favor of increased model complexity. We propose a novel deep learning autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), that scales to single cell genomic data and maintains interpretability to perform GRN inference and estimate TFA as a latent feature. We demonstrate that SupirFactor outperforms current leading GRN inference methods, predicts biologically relevant TFA and elucidates functional regulatory pathways through aggregation of TFs.
Collapse
Affiliation(s)
- Andreas Tjärnberg
- Center for Developmental Genetics, New York University, New York 10003 NY, USA
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA
| | - Maggie Beheler-Amass
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
| | - Christopher A Jackson
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
| | - Lionel A Christiaen
- Center for Developmental Genetics, New York University, New York 10003 NY, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
| | - David Gresham
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
| | - Richard Bonneau
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY 10003, USA
- Center For Data Science, NYU, New York, NY 10008, USA
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
| |
Collapse
|
3
|
Seçilmiş D, Nelander S, Sonnhammer ELL. Optimal Sparsity Selection Based on an Information Criterion for Accurate Gene Regulatory Network Inference. Front Genet 2022; 13:855770. [PMID: 35923701 PMCID: PMC9340570 DOI: 10.3389/fgene.2022.855770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2022] [Accepted: 05/30/2022] [Indexed: 11/25/2022] Open
Abstract
Accurate inference of gene regulatory networks (GRNs) is important to unravel unknown regulatory mechanisms and processes, which can lead to the identification of treatment targets for genetic diseases. A variety of GRN inference methods have been proposed that, under suitable data conditions, perform well in benchmarks that consider the entire spectrum of false-positives and -negatives. However, it is very challenging to predict which single network sparsity gives the most accurate GRN. Lacking criteria for sparsity selection, a simplistic solution is to pick the GRN that has a certain number of links per gene, which is guessed to be reasonable. However, this does not guarantee finding the GRN that has the correct sparsity or is the most accurate one. In this study, we provide a general approach for identifying the most accurate and sparsity-wise relevant GRN within the entire space of possible GRNs. The algorithm, called SPA, applies a “GRN information criterion” (GRNIC) that is inspired by two commonly used model selection criteria, Akaike and Bayesian Information Criterion (AIC and BIC) but adapted to GRN inference. The results show that the approach can, in most cases, find the GRN whose sparsity is close to the true sparsity and close to as accurate as possible with the given GRN inference method and data. The datasets and source code can be found at https://bitbucket.org/sonnhammergrni/spa/.
Collapse
Affiliation(s)
- Deniz Seçilmiş
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
| | - Sven Nelander
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Erik L. L. Sonnhammer
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
- *Correspondence: Erik L. L. Sonnhammer,
| |
Collapse
|
4
|
Hillerton T, Seçilmiş D, Nelander S, Sonnhammer ELL. Fast and accurate gene regulatory network inference by normalized least squares regression. Bioinformatics 2022; 38:2263-2268. [PMID: 35176145 PMCID: PMC9004640 DOI: 10.1093/bioinformatics/btac103] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2021] [Revised: 01/10/2022] [Accepted: 02/15/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Inferring an accurate gene regulatory network (GRN) has long been a key goal in the field of systems biology. To do this, it is important to find a suitable balance between the maximum number of true positive and the minimum number of false-positive interactions. Another key feature is that the inference method can handle the large size of modern experimental data, meaning the method needs to be both fast and accurate. The Least Squares Cut-Off (LSCO) method can fulfill both these criteria, however as it is based on least squares it is vulnerable to known issues of amplifying extreme values, small or large. In GRN this manifests itself with genes that are erroneously hyper-connected to a large fraction of all genes due to extremely low value fold changes. RESULTS We developed a GRN inference method called Least Squares Cut-Off with Normalization (LSCON) that tackles this problem. LSCON extends the LSCO algorithm by regularization to avoid hyper-connected genes and thereby reduce false positives. The regularization used is based on normalization, which removes effects of extreme values on the fit. We benchmarked LSCON and compared it to Genie3, LASSO, LSCO and Ridge regression, in terms of accuracy, speed and tendency to predict hyper-connected genes. The results show that LSCON achieves better or equal accuracy compared to LASSO, the best existing method, especially for data with extreme values. Thanks to the speed of least squares regression, LSCON does this an order of magnitude faster than LASSO. AVAILABILITY AND IMPLEMENTATION Data: https://bitbucket.org/sonnhammergrni/lscon; Code: https://bitbucket.org/sonnhammergrni/genespider. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Thomas Hillerton
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, 17121 Solna, Sweden
| | - Deniz Seçilmiş
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, 17121 Solna, Sweden
| | - Sven Nelander
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 75185 Uppsala, Sweden
| | | |
Collapse
|
5
|
Zhivkoplias EK, Vavulov O, Hillerton T, Sonnhammer ELL. Generation of Realistic Gene Regulatory Networks by Enriching for Feed-Forward Loops. Front Genet 2022; 13:815692. [PMID: 35222536 PMCID: PMC8872634 DOI: 10.3389/fgene.2022.815692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Accepted: 01/13/2022] [Indexed: 11/13/2022] Open
Abstract
The regulatory relationships between genes and proteins in a cell form a gene regulatory network (GRN) that controls the cellular response to changes in the environment. A number of inference methods to reverse engineer the original GRN from large-scale expression data have recently been developed. However, the absence of ground-truth GRNs when evaluating the performance makes realistic simulations of GRNs necessary. One aspect of this is that local network motif analysis of real GRNs indicates that the feed-forward loop (FFL) is significantly enriched. To simulate this properly, we developed a novel motif-based preferential attachment algorithm, FFLatt, which outperformed the popular GeneNetWeaver network generation tool in reproducing the FFL motif occurrence observed in literature-based biological GRNs. It also preserves important topological properties such as scale-free topology, sparsity, and average in/out-degree per node. We conclude that FFLatt is well-suited as a network generation module for a benchmarking framework with the aim to provide fair and robust performance evaluation of GRN inference methods.
Collapse
Affiliation(s)
- Erik K. Zhivkoplias
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
| | - Oleg Vavulov
- Bioinformatics Institute, St. Petersburg, Russia
| | - Thomas Hillerton
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
| | - Erik L. L. Sonnhammer
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
- *Correspondence: Erik L. L. Sonnhammer,
| |
Collapse
|
6
|
Seçilmiş D, Hillerton T, Nelander S, Sonnhammer ELL. Inferring the experimental design for accurate gene regulatory network inference. Bioinformatics 2021; 37:3553-3559. [PMID: 33978748 PMCID: PMC8545292 DOI: 10.1093/bioinformatics/btab367] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2020] [Revised: 03/29/2021] [Accepted: 05/11/2021] [Indexed: 11/17/2022] Open
Abstract
Motivation Accurate inference of gene regulatory interactions is of importance for understanding the mechanisms of underlying biological processes. For gene expression data gathered from targeted perturbations, gene regulatory network (GRN) inference methods that use the perturbation design are the top performing methods. However, the connection between the perturbation design and gene expression can be obfuscated due to problems, such as experimental noise or off-target effects, limiting the methods’ ability to reconstruct the true GRN. Results In this study, we propose an algorithm, IDEMAX, to infer the effective perturbation design from gene expression data in order to eliminate the potential risk of fitting a disconnected perturbation design to gene expression. We applied IDEMAX to synthetic data from two different data generation tools, GeneNetWeaver and GeneSPIDER, and assessed its effect on the experiment design matrix as well as the accuracy of the GRN inference, followed by application to a real dataset. The results show that our approach consistently improves the accuracy of GRN inference compared to using the intended perturbation design when much of the signal is hidden by noise, which is often the case for real data. Availability and implementation https://bitbucket.org/sonnhammergrni/idemax. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Deniz Seçilmiş
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, Solna, 17121, Sweden
| | - Thomas Hillerton
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, Solna, 17121, Sweden
| | - Sven Nelander
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, Solna, 17121, Sweden
| |
Collapse
|
7
|
Seçilmiş D, Hillerton T, Morgan D, Tjärnberg A, Nelander S, Nordling TEM, Sonnhammer ELL. Uncovering cancer gene regulation by accurate regulatory network inference from uninformative data. NPJ Syst Biol Appl 2020; 6:37. [PMID: 33168813 PMCID: PMC7652823 DOI: 10.1038/s41540-020-00154-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2019] [Accepted: 10/15/2020] [Indexed: 01/11/2023] Open
Abstract
The interactions among the components of a living cell that constitute the gene regulatory network (GRN) can be inferred from perturbation-based gene expression data. Such networks are useful for providing mechanistic insights of a biological system. In order to explore the feasibility and quality of GRN inference at a large scale, we used the L1000 data where ~1000 genes have been perturbed and their expression levels have been quantified in 9 cancer cell lines. We found that these datasets have a very low signal-to-noise ratio (SNR) level causing them to be too uninformative to infer accurate GRNs. We developed a gene reduction pipeline in which we eliminate uninformative genes from the system using a selection criterion based on SNR, until reaching an informative subset. The results show that our pipeline can identify an informative subset in an overall uninformative dataset, allowing inference of accurate subset GRNs. The accurate GRNs were functionally characterized and potential novel cancer-related regulatory interactions were identified.
Collapse
Affiliation(s)
- Deniz Seçilmiş
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121, Solna, Sweden
| | - Thomas Hillerton
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121, Solna, Sweden
| | - Daniel Morgan
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121, Solna, Sweden
| | - Andreas Tjärnberg
- Center for Developmental Genetics, New York University, New York, NY, USA
| | - Sven Nelander
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Torbjörn E M Nordling
- Department of Mechanical Engineering, National Cheng Kung University, Tainan, Taiwan
| | - Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121, Solna, Sweden.
| |
Collapse
|
8
|
Perturbation-based gene regulatory network inference to unravel oncogenic mechanisms. Sci Rep 2020; 10:14149. [PMID: 32843692 PMCID: PMC7447758 DOI: 10.1038/s41598-020-70941-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2019] [Accepted: 07/22/2020] [Indexed: 01/01/2023] Open
Abstract
The gene regulatory network (GRN) of human cells encodes mechanisms to ensure proper functioning. However, if this GRN is dysregulated, the cell may enter into a disease state such as cancer. Understanding the GRN as a system can therefore help identify novel mechanisms underlying disease, which can lead to new therapies. To deduce regulatory interactions relevant to cancer, we applied a recent computational inference framework to data from perturbation experiments in squamous carcinoma cell line A431. GRNs were inferred using several methods, and the false discovery rate was controlled by the NestBoot framework. We developed a novel approach to assess the predictiveness of inferred GRNs against validation data, despite the lack of a gold standard. The best GRN was significantly more predictive than the null model, both in cross-validated benchmarks and for an independent dataset of the same genes under a different perturbation design. The inferred GRN captures many known regulatory interactions central to cancer-relevant processes in addition to predicting many novel interactions, some of which were experimentally validated, thus providing mechanistic insights that are useful for future cancer research.
Collapse
|
9
|
Morgan D, Tjärnberg A, Nordling TEM, Sonnhammer ELL. A generalized framework for controlling FDR in gene regulatory network inference. Bioinformatics 2018; 35:1026-1032. [DOI: 10.1093/bioinformatics/bty764] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2018] [Revised: 08/23/2018] [Accepted: 08/28/2018] [Indexed: 12/23/2022] Open
Affiliation(s)
- Daniel Morgan
- Department of Biochemistry and Biophysics, Stockholm Bioinformatics Center, Science for Life Laboratory, Stockholm University, Stockholm, Sweden
| | - Andreas Tjärnberg
- Department of Physics, Chemistry and Biology/Bioinformatics, Linköping University, Linköping, Sweden
| | - Torbjörn E M Nordling
- Department of Mechanical Engineering, National Cheng Kung University, Tainan, Taiwan
| | - Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm Bioinformatics Center, Science for Life Laboratory, Stockholm University, Stockholm, Sweden
| |
Collapse
|
10
|
Tjärnberg A, Morgan DC, Studham M, Nordling TEM, Sonnhammer ELL. GeneSPIDER – gene regulatory network inference benchmarking with controlled network and data properties. MOLECULAR BIOSYSTEMS 2017; 13:1304-1312. [DOI: 10.1039/c7mb00058h] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
Abstract
A key question in network inference, that has not been properly answered, is what accuracy can be expected for a given biological dataset and inference method.
Collapse
Affiliation(s)
- Andreas Tjärnberg
- Stockholm Bioinformatics Center
- Science for Life Laboratory
- Sweden
- Department of Biochemistry and Biophysics
- Stockholm University
| | - Daniel C. Morgan
- Stockholm Bioinformatics Center
- Science for Life Laboratory
- Sweden
- Department of Biochemistry and Biophysics
- Stockholm University
| | - Matthew Studham
- Stockholm Bioinformatics Center
- Science for Life Laboratory
- Sweden
- Department of Biochemistry and Biophysics
- Stockholm University
| | - Torbjörn E. M. Nordling
- Stockholm Bioinformatics Center
- Science for Life Laboratory
- Sweden
- Department of Mechanical Engineering
- National Cheng Kung University
| | - Erik L. L. Sonnhammer
- Stockholm Bioinformatics Center
- Science for Life Laboratory
- Sweden
- Department of Biochemistry and Biophysics
- Stockholm University
| |
Collapse
|
11
|
Mangan NM, Brunton SL, Proctor JL, Kutz JN. Inferring Biological Networks by Sparse Identification of Nonlinear Dynamics. ACTA ACUST UNITED AC 2016. [DOI: 10.1109/tmbmc.2016.2633265] [Citation(s) in RCA: 172] [Impact Index Per Article: 21.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
12
|
Mayer G, Marcus K, Eisenacher M, Kohl M. Boolean modeling techniques for protein co-expression networks in systems medicine. Expert Rev Proteomics 2016; 13:555-69. [PMID: 27105325 DOI: 10.1080/14789450.2016.1181546] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Abstract
INTRODUCTION Application of systems biology/systems medicine approaches is promising for proteomics/biomedical research, but requires selection of an adequate modeling type. AREAS COVERED This article reviews the existing Boolean network modeling approaches, which provide in comparison with alternative modeling techniques several advantages for the processing of proteomics data. Application of methods for inference, reduction and validation of protein co-expression networks that are derived from quantitative high-throughput proteomics measurements is presented. It's also shown how Boolean models can be used to derive system-theoretic characteristics that describe both the dynamical behavior of such networks as a whole and the properties of different cell states (e.g. healthy or diseased cell states). Furthermore, application of methods derived from control theory is proposed in order to simulate the effects of therapeutic interventions on such networks, which is a promising approach for the computer-assisted discovery of biomarkers and drug targets. Finally, the clinical application of Boolean modeling analyses is discussed. Expert commentary: Boolean modeling of proteomics data is still in its infancy. Progress in this field strongly depends on provision of a repository with public access to relevant reference models. Also required are community supported standards that facilitate input of both proteomics and patient related data (e.g. age, gender, laboratory results, etc.).
Collapse
Affiliation(s)
- Gerhard Mayer
- a Medizinisches Proteom Center (MPC) , Ruhr-Universität Bochum , Bochum , Germany
| | - Katrin Marcus
- a Medizinisches Proteom Center (MPC) , Ruhr-Universität Bochum , Bochum , Germany
| | - Martin Eisenacher
- a Medizinisches Proteom Center (MPC) , Ruhr-Universität Bochum , Bochum , Germany
| | - Michael Kohl
- a Medizinisches Proteom Center (MPC) , Ruhr-Universität Bochum , Bochum , Germany
| |
Collapse
|
13
|
Tjärnberg A, Nordling TEM, Studham M, Nelander S, Sonnhammer ELL. Avoiding pitfalls in L1-regularised inference of gene networks. MOLECULAR BIOSYSTEMS 2015; 11:287-96. [DOI: 10.1039/c4mb00419a] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
Abstract
L1 regularisation methods fail to infer the correct network even when the data are so informative that all existing links can be proven to exist.
Collapse
Affiliation(s)
- Andreas Tjärnberg
- Stockholm Bioinformatics Centre
- Science for Life Laboratory
- 17121 Solna
- Sweden
- Department of Biochemistry and Biophysics
| | - Torbjörn E. M. Nordling
- Stockholm Bioinformatics Centre
- Science for Life Laboratory
- 17121 Solna
- Sweden
- Department of Immunology
| | - Matthew Studham
- Stockholm Bioinformatics Centre
- Science for Life Laboratory
- 17121 Solna
- Sweden
| | - Sven Nelander
- Department of Immunology
- Genetics and Pathology
- Uppsala University
- Rudbeck laboratory
- 75185 Uppsala
| | - Erik L. L. Sonnhammer
- Stockholm Bioinformatics Centre
- Science for Life Laboratory
- 17121 Solna
- Sweden
- Department of Biochemistry and Biophysics
| |
Collapse
|
14
|
Studham ME, Tjärnberg A, Nordling TEM, Nelander S, Sonnhammer ELL. Functional association networks as priors for gene regulatory network inference. ACTA ACUST UNITED AC 2014; 30:i130-8. [PMID: 24931976 PMCID: PMC4058914 DOI: 10.1093/bioinformatics/btu285] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Motivation: Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data are inadequate for reliable inference of the network, informative priors have been shown to improve the accuracy of inferences. Results: This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic datasets indicates that even noisy priors reflect some causal information that can improve GRN inference accuracy. Our analysis on yeast data indicates that using the functional association databases FunCoup and STRING as priors can give a small improvement in GRN inference accuracy with biological data. Contact:matthew.studham@scilifelab.se Supplementary information:Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Matthew E Studham
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, SwedenStockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
| | - Andreas Tjärnberg
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, SwedenStockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
| | - Torbjörn E M Nordling
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, SwedenStockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
| | - Sven Nelander
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
| | - Erik L L Sonnhammer
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, SwedenStockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, SwedenStockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
| |
Collapse
|
15
|
He FQ, Wang W, Zheng P, Sudhakar P, Sun J, Zeng AP. Essential O2-responsive genes of Pseudomonas aeruginosa and their network revealed by integrating dynamic data from inverted conditions. Integr Biol (Camb) 2014; 6:215-23. [PMID: 24413814 DOI: 10.1039/c3ib40180d] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
Identification of the gene network through which Pseudomonas aeruginosa PAO1 (PA) adapts to altered oxygen-availability environments is essential for a better understanding of stress responses and pathogenicity of PA. We performed high-time-resolution (HTR) transcriptome analyses of PA in a continuous cultivation system during the transition from high oxygen tension to low oxygen tension (HLOT) and the reversed transition from low to high oxygen tension (LHOT). From those genes responsive to both transient conditions, we identified 85 essential oxygen-availability responsive genes (EORGs), including the expected ones (arcDABC) encoding enzymes for arginine fermentation. We then constructed the regulatory network for the EORGs of PA by integrating information from binding motif searching, literature and HTR data. Notably, our results show that only the sub-networks controlled by the well-known oxygen-responsive transcription factors show a very high consistency between the inferred network and literature knowledge, e.g. 87.5% and 83.3% of the obtained sub-network controlled by the anaerobic regulator (ANR) and a quorum sensing regulator RhIR, respectively. These results not only reveal stringent EORGs of PA and their transcription regulatory network, but also highlight that achieving a high accuracy of the inferred regulatory network might be feasible only for the apparently affected regulators under the given conditions but not for all the expressed regulators on a genome scale.
Collapse
Affiliation(s)
- Feng Q He
- Helmholtz Centre for Infection Research, D-38124, Braunschweig, Germany
| | | | | | | | | | | |
Collapse
|