1
Segura-Ortiz A, García-Nieto J, Aldana-Montes JF, Navas-Delgado I. Multi-objective context-guided consensus of a massive array of techniques for the inference of Gene Regulatory Networks. Comput Biol Med 2024; 179:108850. [PMID: 39013340 DOI: 10.1016/j.compbiomed.2024.108850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 07/03/2024] [Accepted: 07/03/2024] [Indexed: 07/18/2024]
Abstract
BACKGROUND AND OBJECTIVE Gene Regulatory Network (GRN) inference is a fundamental task in biology and medicine, as it enables a deeper understanding of the intricate mechanisms of gene expression present in organisms. This bioinformatics problem has been addressed in the literature through multiple computational approaches. Techniques developed for inferring from expression data have employed Bayesian networks, ordinary differential equations (ODEs), machine learning, information theory measures and neural networks, among others. The diversity of implementations and their respective customization have led to the emergence of many tools and multiple specialized domains derived from them, understood as subsets of networks with specific characteristics that are challenging to detect a priori. This specialization has introduced significant uncertainty when choosing the most appropriate technique for a particular dataset. This proposal, named MO-GENECI, builds upon the basic idea of the previous proposal GENECI and optimizes consensus among different inference techniques, through a carefully refined multi-objective evolutionary algorithm guided by various objective functions, linked to the biological context at hand. METHODS MO-GENECI has been tested on an extensive and diverse academic benchmark of 106 gene regulatory networks from multiple sources and sizes. The evaluation of MO-GENECI compared its performance to individual techniques using key metrics (AUROC and AUPR) for gene regulatory network inference. Friedman's statistical ranking provided an ordered classification, followed by non-parametric Holm tests to determine statistical significance. RESULTS MO-GENECI's Pareto front approximation facilitates easy selection of an appropriate solution based on generic input data characteristics. 
The best solution consistently emerged as the winner in all statistical tests, and in many cases, the median precision solution showed no statistically significant difference compared to the winner. CONCLUSIONS MO-GENECI has not only achieved more accurate results than individual techniques, but has also overcome the uncertainty associated with the initial choice of technique, due to its flexibility and adaptability. It is shown to intelligently select the most suitable techniques for each case. The source code is hosted in a public repository on GitHub under the MIT license: https://github.com/AdrianSeguraOrtiz/MO-GENECI. Moreover, to facilitate its installation and use, the software associated with this implementation has been packaged as a Python package available on PyPI: https://pypi.org/project/geneci/.
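As a rough illustration of what "consensus among different inference techniques" can mean, the sketch below combines per-technique edge confidences with a weighted average. It is an assumption-laden toy, not the MO-GENECI implementation: the function name `weighted_consensus`, the dictionary layout, and the weighted-average rule are all hypothetical, and the multi-objective evolutionary search that would choose the weights is not shown.

```python
def weighted_consensus(edge_scores, weights):
    """Combine edge confidences from several techniques into one consensus.

    edge_scores: list of dicts, one per technique, mapping
                 (regulator, target) -> confidence in [0, 1].
    weights:     one non-negative weight per technique (hypothetical inputs;
                 an optimizer would normally propose these).
    """
    if len(edge_scores) != len(weights):
        raise ValueError("need exactly one weight per technique")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and not all zero")

    # Normalize so the weights sum to 1 and the consensus stays in [0, 1].
    normalized = [w / total for w in weights]

    consensus = {}
    for scores, weight in zip(edge_scores, normalized):
        for edge, confidence in scores.items():
            # An edge missing from a technique contributes 0 for that technique.
            consensus[edge] = consensus.get(edge, 0.0) + weight * confidence
    return consensus


# Toy example with two made-up techniques and three candidate edges.
technique_a = {("G1", "G2"): 0.9, ("G2", "G3"): 0.2}
technique_b = {("G1", "G2"): 0.5, ("G1", "G3"): 0.8}
consensus = weighted_consensus([technique_a, technique_b], weights=[3, 1])

for edge, score in sorted(consensus.items()):
    print(edge, round(score, 3))
```

In this toy setup, a weight vector is one candidate solution; an evolutionary algorithm would score many such vectors against several objective functions and keep the non-dominated ones.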
Affiliation(s)
- Adrián Segura-Ortiz
  - Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain.
- José García-Nieto
  - Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- José F Aldana-Montes
  - Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- Ismael Navas-Delgado
  - Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
2
Yu J, Leng J, Yuan F, Sun D, Wu LY. Reverse network diffusion to remove indirect noise for better inference of gene regulatory networks. Bioinformatics 2024; 40:btae435. [PMID: 38963312 PMCID: PMC11236096 DOI: 10.1093/bioinformatics/btae435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2024] [Revised: 06/24/2024] [Accepted: 07/03/2024] [Indexed: 07/05/2024]
Abstract
MOTIVATION Gene regulatory networks (GRNs) are vital tools for delineating regulatory relationships between transcription factors and their target genes. The boom in computational biology and various biotechnologies has made inferring GRNs from multi-omics data a hot topic. However, when networks are constructed from gene expression data, they often suffer from false-positive problems due to the transitive effects of correlation. The presence of spurious noise edges obscures the real gene interactions, which makes downstream analyses, such as detecting gene function modules and predicting disease-related genes, difficult and inefficient. Therefore, there is an urgent and compelling need to develop network denoising methods to improve the accuracy of GRN inference. RESULTS In this study, we proposed a novel network denoising method named REverse Network Diffusion On Random walks (RENDOR). RENDOR is designed to enhance the accuracy of GRNs afflicted by indirect effects. RENDOR takes noisy networks as input, models higher-order indirect interactions between genes by transitive closure, eliminates false-positive effects using the inverse network diffusion method, and produces refined networks as output. We conducted a comparative assessment of GRN inference accuracy before and after denoising on simulated networks and real GRNs. Our results emphasized that the network derived from RENDOR more accurately and effectively captures gene interactions. This study demonstrates the significance of removing network indirect noise and highlights the effectiveness of the proposed method in enhancing the signal-to-noise ratio of noisy networks. AVAILABILITY AND IMPLEMENTATION The R package RENDOR is provided at https://github.com/Wu-Lab/RENDOR and other source code and data are available at https://github.com/Wu-Lab/RENDOR-reproduce.
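To make the idea of "transitive closure" and its inversion concrete, here is a minimal sketch under a simplifying assumption: the observed network is modeled as the sum of all powers of the direct network, and the direct network is recovered with the alternating series familiar from the network deconvolution literature. This is not RENDOR's algorithm, whose random-walk diffusion model is not specified in the abstract; the function names and the three-gene example are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add_indirect_effects(direct, terms=20):
    """Assumed noise model: observed = direct + direct^2 + direct^3 + ..."""
    n = len(direct)
    total = [row[:] for row in direct]
    power = [row[:] for row in direct]
    for _ in range(terms - 1):
        power = matmul(power, direct)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


def remove_indirect_effects(observed, terms=20):
    """Invert the model above: direct = observed - observed^2 + observed^3 - ...

    The series only converges when the spectral radius of `observed` is
    below 1, so real inputs would need rescaling first.
    """
    n = len(observed)
    total = [row[:] for row in observed]
    power = [row[:] for row in observed]
    sign = -1.0
    for _ in range(terms - 1):
        power = matmul(power, observed)
        total = [[total[i][j] + sign * power[i][j] for j in range(n)]
                 for i in range(n)]
        sign = -sign
    return total


# Toy chain A -> B -> C: there is no direct A -> C edge.
direct = [[0.0, 0.4, 0.0],
          [0.0, 0.0, 0.4],
          [0.0, 0.0, 0.0]]
observed = add_indirect_effects(direct)       # gains a spurious A -> C weight
recovered = remove_indirect_effects(observed)  # spurious weight is removed

print("observed A->C:", observed[0][2])
print("recovered A->C:", recovered[0][2])
```

The chain example shows the false-positive mechanism the abstract describes: A and C appear connected only because both are linked through B, and inverting the assumed diffusion model removes that edge while leaving the two direct edges intact.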
Affiliation(s)
- Jiating Yu
  - School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing 210044, China
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jiacheng Leng
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  - Zhejiang Lab, Hangzhou 311121, China
- Fan Yuan
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Duanchen Sun
  - School of Mathematics, Shandong University, Jinan 250100, China
- Ling-Yun Wu
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
3
Moeckel C, Mouratidis I, Chantzi N, Uzun Y, Georgakopoulos-Soares I. Advances in computational and experimental approaches for deciphering transcriptional regulatory networks: Understanding the roles of cis-regulatory elements is essential, and recent research utilizing MPRAs, STARR-seq, CRISPR-Cas9, and machine learning has yielded valuable insights. Bioessays 2024; 46:e2300210. [PMID: 38715516 DOI: 10.1002/bies.202300210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 04/22/2024] [Accepted: 04/23/2024] [Indexed: 05/16/2024]
Abstract
Understanding the influence of cis-regulatory elements on gene regulation poses numerous challenges given complexities stemming from variations in transcription factor (TF) binding, chromatin accessibility, structural constraints, and cell-type differences. This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and covers construction methods ranging from expression-based approaches to supervised machine learning. Additionally, key experimental methods, including MPRAs and CRISPR-Cas9-based screening, which have significantly contributed to understanding TF binding preferences and cis-regulatory element functions, are explored. Lastly, the potential of machine learning and artificial intelligence to unravel cis-regulatory logic is analyzed. These computational advances have far-reaching implications for precision medicine, therapeutic target discovery, and the study of genetic variations in health and disease.
Affiliation(s)
- Camille Moeckel
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Ioannis Mouratidis
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, USA
- Nikol Chantzi
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Yasin Uzun
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, USA
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Ilias Georgakopoulos-Soares
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, USA
4
Kion-Crosby W, Barquist L. Network depth affects inference of gene sets from bacterial transcriptomes using denoising autoencoders. Bioinformatics Advances 2024; 4:vbae066. [PMID: 39027639 PMCID: PMC11256956 DOI: 10.1093/bioadv/vbae066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 04/05/2024] [Accepted: 05/02/2024] [Indexed: 07/20/2024]
Abstract
SUMMARY The increasing number of publicly available bacterial gene expression datasets provides an unprecedented resource for the study of gene regulation in diverse conditions, but emphasizes the need for self-supervised methods for the automated generation of new hypotheses. One approach for inferring coordinated regulation from bacterial expression data is through neural networks known as denoising autoencoders (DAEs), which encode large datasets in a reduced bottleneck layer. We have generalized this application of DAEs to include deep networks and explore the effects of network architecture on gene set inference using deep learning. We developed a DAE-based pipeline to extract gene sets from transcriptomic data in Escherichia coli, validated our method by comparing inferred gene sets with known pathways, and used this pipeline to explore how the choice of network architecture impacts gene set recovery. We find that increasing network depth leads the DAEs to explain gene expression in terms of fewer, more concisely defined gene sets, and that adjusting the width results in a tradeoff between generalizability and biological inference. Finally, leveraging our understanding of the impact of DAE architecture, we apply our pipeline to an independent uropathogenic E. coli dataset to identify genes uniquely induced during human colonization. AVAILABILITY AND IMPLEMENTATION https://github.com/BarquistLab/DAE_architecture_exploration.
Affiliation(s)
- Willow Kion-Crosby
  - Helmholtz Institute for RNA-based Infection Research (HIRI)/Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
  - Faculty of Medicine, University of Würzburg, 97080 Würzburg, Germany
- Lars Barquist
  - Helmholtz Institute for RNA-based Infection Research (HIRI)/Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
  - Faculty of Medicine, University of Würzburg, 97080 Würzburg, Germany
  - Department of Biology, University of Toronto, Mississauga, ON L5L 1C6, Canada
5
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure-primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. Genome Biol 2024; 25:24. [PMID: 38238840 PMCID: PMC10797903 DOI: 10.1186/s13059-023-03134-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 11/30/2023] [Indexed: 01/22/2024] Open
Abstract
BACKGROUND Modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of genome-wide transcription factor activity (TFA), making it difficult to separate covariance and regulatory interactions. Inference of regulatory interactions and TFA requires aggregation of complementary evidence. Estimating TFA explicitly is problematic as it disconnects GRN inference and TFA estimation and is unable to account for, for example, contextual transcription factor-transcription factor interactions and other higher-order features. Deep learning offers a potential solution, as it can model complex interactions and higher-order latent features, although it does not provide interpretable models and latent features. RESULTS We propose a novel autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), for modeling, and a metric, explained relative variance (ERV), for interpretation of GRNs. We evaluate SupirFactor with ERV in a wide set of contexts. Compared to current state-of-the-art GRN inference methods, SupirFactor performs favorably. We evaluate latent feature activity as an estimate of TFA and biological function in S. cerevisiae as well as in peripheral blood mononuclear cells (PBMC). CONCLUSION Here we present a framework for structure-primed inference and interpretation of GRNs, SupirFactor, demonstrating interpretability using ERV in multiple biological and experimental settings. SupirFactor enables TFA estimation and pathway analysis using latent factor activity, demonstrated here on two large-scale single-cell datasets, modeling S. cerevisiae and PBMC. We find that the SupirFactor model facilitates biological analysis, yielding novel functional and regulatory insight.
Affiliation(s)
- Andreas Tjärnberg
  - Center for Developmental Genetics, New York University, New York, NY, 10003, USA.
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA.
  - Department of Biology, NYU, New York, NY, 10008, USA.
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA.
  - Department of Neuro-Science, University of Wisconsin-Madison - Waisman Center, Madison, USA.
- Maggie Beheler-Amass
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Christopher A Jackson
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Lionel A Christiaen
  - Center for Developmental Genetics, New York University, New York, NY, 10003, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
  - Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
- David Gresham
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Richard Bonneau
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA.
  - Department of Biology, NYU, New York, NY, 10008, USA.
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, 10010, USA.
  - Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY, 10003, USA.
  - Center For Data Science, NYU, New York, NY, 10008, USA.
  - Prescient Design, a Genentech accelerator, New York, NY, 10010, USA.
6
Raja R, Khanum S, Aboulmouna L, Maurya MR, Gupta S, Subramaniam S, Ramkrishna D. Modeling transcriptional regulation of the cell cycle using a novel cybernetic-inspired approach. Biophys J 2024; 123:221-234. [PMID: 38102827 PMCID: PMC10808046 DOI: 10.1016/j.bpj.2023.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 09/18/2023] [Accepted: 12/12/2023] [Indexed: 12/17/2023] Open
Abstract
Quantitative understanding of cellular processes, such as cell cycle and differentiation, is impeded by various forms of complexity ranging from myriad molecular players and their multilevel regulatory interactions, cellular evolution with multiple intermediate stages, lack of elucidation of cause-effect relationships among the many system players, and the computational complexity associated with the profusion of variables and parameters. In this paper, we present a modeling framework based on the cybernetic concept that biological regulation is inspired by objectives embedding rational strategies for dimension reduction, process stage specification through the system dynamics, and innovative causal association of regulatory events with the ability to predict the evolution of the dynamical system. The elementary step of the modeling strategy involves stage-specific objective functions that are computationally determined from experiments, augmented with dynamical network computations involving endpoint objective functions, mutual information, change-point detection, and maximal clique centrality. We demonstrate the power of the method through application to the mammalian cell cycle, which involves thousands of biomolecules engaged in signaling, transcription, and regulation. Starting with a fine-grained transcriptional description obtained from RNA sequencing measurements, we develop an initial model, which is then dynamically modeled using the cybernetic-inspired method, based on the strategies described above. The cybernetic-inspired method is able to distill the most significant interactions from a multitude of possibilities. In addition to capturing the complexity of regulatory processes in a mechanistically causal and stage-specific manner, we identify the functional network modules, including novel cell cycle stages. Our model is able to predict future cell cycles consistent with experimental measurements. 
We posit that this innovative framework has the promise to extend to the dynamics of other biological processes, with a potential to provide novel mechanistic insights.
Affiliation(s)
- Rubesh Raja
  - The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana
- Sana Khanum
  - The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana
- Lina Aboulmouna
  - Department of Bioengineering, University of California San Diego, La Jolla, California
- Mano R Maurya
  - Department of Bioengineering, University of California San Diego, La Jolla, California
- Shakti Gupta
  - Department of Bioengineering, University of California San Diego, La Jolla, California
- Shankar Subramaniam
  - Department of Bioengineering, University of California San Diego, La Jolla, California; Departments of Computer Science and Engineering, Cellular and Molecular Medicine, San Diego Supercomputer Center, and the Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, California.
- Doraiswami Ramkrishna
  - The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana.
7
Tangeman JA, Rebull SM, Grajales-Esquivel E, Weaver JM, Bendezu-Sayas S, Robinson ML, Lachke SA, Del Rio-Tsonis K. Integrated single-cell multiomics uncovers foundational regulatory mechanisms of lens development and pathology. Development 2024; 151:dev202249. [PMID: 38180241 PMCID: PMC10906490 DOI: 10.1242/dev.202249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 11/28/2023] [Indexed: 01/06/2024]
Abstract
Ocular lens development entails epithelial to fiber cell differentiation, defects in which cause congenital cataracts. We report the first single-cell multiomic atlas of lens development, leveraging snRNA-seq, snATAC-seq and CUT&RUN-seq to discover previously unreported mechanisms of cell fate determination and cataract-linked regulatory networks. A comprehensive profile of cis- and trans-regulatory interactions, including for the cataract-linked transcription factor MAF, is established across a temporal trajectory of fiber cell differentiation. Furthermore, we identify an epigenetic paradigm of cellular differentiation, defined by progressive loss of the H3K27 methylation writer Polycomb repressive complex 2 (PRC2). PRC2 localizes to heterochromatin domains across master-regulator transcription factor gene bodies, suggesting it safeguards epithelial cell fate. Moreover, we demonstrate that FGF hyper-stimulation in vivo leads to MAF network activation and the emergence of novel lens cell states. Collectively, these data depict a comprehensive portrait of lens fiber cell differentiation, while defining regulatory effectors of cell identity and cataract formation.
Affiliation(s)
- Jared A. Tangeman
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056, USA
  - Cell, Molecular, and Structural Biology Program, Miami University, Oxford, OH 45056, USA
- Sofia M. Rebull
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056, USA
- Erika Grajales-Esquivel
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056, USA
- Jacob M. Weaver
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056, USA
  - Cell, Molecular, and Structural Biology Program, Miami University, Oxford, OH 45056, USA
- Stacy Bendezu-Sayas
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056, USA
  - Cell, Molecular, and Structural Biology Program, Miami University, Oxford, OH 45056, USA
- Michael L. Robinson
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056, USA
  - Cell, Molecular, and Structural Biology Program, Miami University, Oxford, OH 45056, USA
- Salil A. Lachke
  - Department of Biological Sciences, University of Delaware, Newark, DE 19716, USA
  - Center for Bioinformatics & Computational Biology, University of Delaware, Newark, DE 19713, USA
- Katia Del Rio-Tsonis
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056, USA
  - Cell, Molecular, and Structural Biology Program, Miami University, Oxford, OH 45056, USA
8
Arriojas A, Patalano S, Macoska J, Zarringhalam K. A Bayesian noisy logic model for inference of transcription factor activity from single cell and bulk transcriptomic data. NAR Genom Bioinform 2023; 5:lqad106. [PMID: 38094309 PMCID: PMC10716740 DOI: 10.1093/nargab/lqad106] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/07/2023] [Revised: 11/12/2023] [Accepted: 11/24/2023] [Indexed: 12/20/2023] Open
Abstract
The advent of high-throughput sequencing has made it possible to measure the expression of genes at relatively low cost. However, direct measurement of regulatory mechanisms, such as transcription factor (TF) activity, is still not readily feasible in a high-throughput manner. Consequently, there is a need for computational approaches that can reliably estimate regulator activity from observable gene expression data. In this work, we present a noisy Boolean logic Bayesian model for TF activity inference from differential gene expression data and causal graphs. Our approach provides a flexible framework to incorporate biologically motivated TF-gene regulation logic models. Using simulations and controlled over-expression experiments in cell cultures, we demonstrate that our method can accurately identify TF activity. Moreover, we apply our method to bulk and single-cell transcriptomics measurements to investigate transcriptional regulation of fibroblast phenotypic plasticity. Finally, to facilitate usage, we provide user-friendly software packages and a web interface to query TF activity from user-supplied differential gene expression data: https://umbibio.math.umb.edu/nlbayes/.
Affiliation(s)
- Argenis Arriojas
  - Department of Mathematics, University of Massachusetts Boston, Boston, MA 02125, USA
  - Department of Physics, University of Massachusetts Boston, Boston, MA 02125, USA
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA 02125, USA
- Susan Patalano
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA 02125, USA
- Jill Macoska
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA 02125, USA
- Kourosh Zarringhalam
  - Department of Mathematics, University of Massachusetts Boston, Boston, MA 02125, USA
  - Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA 02125, USA
9
Wu Y, Qian B, Wang A, Dong H, Zhu E, Ma B. iLSGRN: inference of large-scale gene regulatory networks based on multi-model fusion. Bioinformatics 2023; 39:btad619. [PMID: 37851379 PMCID: PMC10589915 DOI: 10.1093/bioinformatics/btad619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/04/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023] Open
Abstract
MOTIVATION Gene regulatory networks (GRNs) are a way of describing the interactions between genes, which contributes to revealing the different biological mechanisms in the cell. Reconstructing GRNs based on gene expression data has been a central computational problem in systems biology. However, due to the high dimensionality and non-linearity of large-scale GRNs, accurately and efficiently inferring GRNs is still a challenging task. RESULTS In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. First, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model leveraging the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the non-linear ordinary differential equations model of GRNs and improve the accuracy and stability of the inference algorithm. Extensive experiments on datasets of different scales show that our method achieves a sensible improvement over state-of-the-art methods. Furthermore, we perform cross-validation experiments on real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION The proposed method is written in Python and is available at: https://github.com/lab319/iLSGRN.
Affiliation(s)
- Yiming Wu
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Bing Qian
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Anqi Wang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China
- Heng Dong
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Enqiang Zhu
  - Institution of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
- Baoshan Ma
  - School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
10
Shojaee A, Huang SSC. Robust discovery of gene regulatory networks from single-cell gene expression data by Causal Inference Using Composition of Transactions. Brief Bioinform 2023; 24:bbad370. [PMID: 37897702 PMCID: PMC10612495 DOI: 10.1093/bib/bbad370] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/18/2023] [Revised: 09/06/2023] [Accepted: 09/29/2023] [Indexed: 10/30/2023] Open
Abstract
Gene regulatory networks (GRNs) drive organism structure and functions, so the discovery and characterization of GRNs is a major goal in biological research. However, accurate identification of causal regulatory connections and inference of GRNs using gene expression datasets, more recently from single-cell RNA-seq (scRNA-seq), has been challenging. Here we employ the innovative method of Causal Inference Using Composition of Transactions (CICT) to uncover GRNs from scRNA-seq data. The basis of CICT is that if all gene expressions were random, a non-random regulatory gene should induce its targets at levels different from the background random process, resulting in distinct patterns in the whole relevance network of gene-gene associations. CICT proposes novel network features derived from a relevance network, which enable any machine learning algorithm to predict causal regulatory edges and infer GRNs. We evaluated CICT using simulated and experimental scRNA-seq data in a well-established benchmarking pipeline and showed that CICT outperformed existing network inference methods representing diverse approaches with many-fold higher accuracy. Furthermore, we demonstrated that GRN inference with CICT was robust to different levels of sparsity in scRNA-seq data, the characteristics of data and ground truth, the choice of association measure and the complexity of the supervised machine learning algorithm. Our results suggest aiming at directly predicting causality to recover regulatory relationships in complex biological networks substantially improves accuracy in GRN inference.
Affiliation(s)
- Abbas Shojaee
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
- Shao-shan Carol Huang
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
11
Peterson EJR, Brooks AN, Reiss DJ, Kaur A, Do J, Pan M, Wu WJ, Morrison R, Srinivas V, Carter W, Arrieta-Ortiz ML, Ruiz RA, Bhatt A, Baliga NS. MtrA modulates Mycobacterium tuberculosis cell division in host microenvironments to mediate intrinsic resistance and drug tolerance. Cell Rep 2023; 42:112875. [PMID: 37542718 PMCID: PMC10480492 DOI: 10.1016/j.celrep.2023.112875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Revised: 04/21/2023] [Accepted: 07/11/2023] [Indexed: 08/07/2023]
Abstract
The success of Mycobacterium tuberculosis (Mtb) is largely attributed to its ability to physiologically adapt and withstand diverse localized stresses within host microenvironments. Here, we present a data-driven model (EGRIN 2.0) that captures the dynamic interplay of environmental cues and genome-encoded regulatory programs in Mtb. Analysis of EGRIN 2.0 shows how modulation of the MtrAB two-component signaling system tunes Mtb growth in response to related host microenvironmental cues. Disruption of MtrAB by tunable CRISPR interference confirms that the signaling system regulates multiple peptidoglycan hydrolases, among other targets, that are important for cell division. Further, MtrA decreases the effectiveness of antibiotics by mechanisms of both intrinsic resistance and drug tolerance. Together, the model-enabled dissection of complex MtrA regulation highlights its importance as a drug target and illustrates how EGRIN 2.0 facilitates discovery and mechanistic characterization of Mtb adaptation to specific host microenvironments within the host.
Affiliation(s)
- David J Reiss
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Amardeep Kaur
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Julie Do
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Min Pan
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Wei-Ju Wu
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Robert Morrison
  - Laboratory of Malaria, Immunology and Vaccinology, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20892, USA
- Warren Carter
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Rene A Ruiz
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Apoorva Bhatt
  - School of Biosciences and Institute of Microbiology and Infection, University of Birmingham, Birmingham B15 2TT, UK
- Nitin S Baliga
  - Institute for Systems Biology, Seattle, WA 98109, USA; Departments of Biology and Microbiology, University of Washington, Seattle, WA 98195, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, WA 98195, USA; Lawrence Berkeley National Lab, Berkeley, CA 94720, USA.
12
Li R, Rozum JC, Quail MM, Qasim MN, Sindi SS, Nobile CJ, Albert R, Hernday AD. Inferring gene regulatory networks using transcriptional profiles as dynamical attractors. PLoS Comput Biol 2023; 19:e1010991. [PMID: 37607190 PMCID: PMC10473541 DOI: 10.1371/journal.pcbi.1010991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 09/01/2023] [Accepted: 07/19/2023] [Indexed: 08/24/2023]
Abstract
Genetic regulatory networks (GRNs) regulate the flow of genetic information from the genome to expressed messenger RNAs (mRNAs) and thus are critical to controlling the phenotypic characteristics of cells. Numerous methods exist for profiling mRNA transcript levels and identifying protein-DNA binding interactions at the genome-wide scale. These enable researchers to determine the structure and output of transcriptional regulatory networks, but uncovering the complete structure and regulatory logic of GRNs remains a challenge. The field of GRN inference aims to meet this challenge using computational modeling to derive the structure and logic of GRNs from experimental data and to encode this knowledge in Boolean networks, Bayesian networks, ordinary differential equation (ODE) models, or other modeling frameworks. However, most existing models do not incorporate dynamic transcriptional data since it has historically been less widely available in comparison to "static" transcriptional data. We report the development of an evolutionary algorithm-based ODE modeling approach (named EA) that integrates kinetic transcription data and the theory of attractor matching to infer GRN architecture and regulatory logic. Our method outperformed six leading GRN inference methods, none of which incorporate kinetic transcriptional data, in predicting regulatory connections among TFs when applied to a small-scale engineered synthetic GRN in Saccharomyces cerevisiae. Moreover, we demonstrate the potential of our method to predict unknown transcriptional profiles that would be produced upon genetic perturbation of the GRN governing a two-state cellular phenotypic switch in Candida albicans. We established an iterative refinement strategy to facilitate candidate selection for experimentation; the experimental results in turn provide validation or improvement for the model. In this way, our GRN inference approach can expedite the development of a sophisticated mathematical model that can accurately describe the structure and dynamics of the in vivo GRN.
Affiliation(s)
- Ruihao Li
  - Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
- Jordan C. Rozum
  - Department of Systems Science and Industrial Engineering, Binghamton University (State University of New York), Binghamton, New York, United States of America
- Morgan M. Quail
  - Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
- Mohammad N. Qasim
  - Quantitative and Systems Biology Graduate Program, University of California, Merced, Merced, California, United States of America
- Suzanne S. Sindi
  - Department of Applied Mathematics, University of California, Merced, Merced, California, United States of America
- Clarissa J. Nobile
  - Department of Molecular Cell Biology, University of California, Merced, Merced, California, United States of America
  - Health Sciences Research Institute, University of California, Merced, Merced, California, United States of America
- Réka Albert
  - Department of Physics, Pennsylvania State University, University Park, University Park, Pennsylvania, United States of America
  - Department of Biology, Pennsylvania State University, University Park, University Park, Pennsylvania, United States of America
- Aaron D. Hernday
  - Department of Molecular Cell Biology, University of California, Merced, Merced, California, United States of America
  - Health Sciences Research Institute, University of California, Merced, Merced, California, United States of America
13
Marku M, Pancaldi V. From time-series transcriptomics to gene regulatory networks: A review on inference methods. PLoS Comput Biol 2023; 19:e1011254. [PMID: 37561790 PMCID: PMC10414591 DOI: 10.1371/journal.pcbi.1011254] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/12/2023]
Abstract
Inference of gene regulatory networks has been an active area of research for around 20 years, leading to the development of sophisticated inference algorithms based on a variety of assumptions and approaches. With the ever increasing demand for more accurate and powerful models, the inference problem remains of broad scientific interest. The abstract representation of biological systems through gene regulatory networks represents a powerful method to study such systems, encoding different amounts and types of information. In this review, we summarize the different types of inference algorithms specifically based on time-series transcriptomics, giving an overview of the main applications of gene regulatory networks in computational biology. This review is intended to give an updated reference of regulatory networks inference tools to biologists and researchers new to the topic and guide them in selecting the appropriate inference method that best fits their questions, aims, and experimental data.
Affiliation(s)
- Malvina Marku
  - CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Vera Pancaldi
  - CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
  - Barcelona Supercomputing Center, Barcelona, Spain
14
Kernfeld E, Yang Y, Weinstock JS, Battle A, Cahan P. A systematic comparison of computational methods for expression forecasting. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.28.551039. [PMID: 37577640 PMCID: PMC10418073 DOI: 10.1101/2023.07.28.551039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Due to the abundance of single cell RNA-seq data, a number of methods for predicting expression after perturbation have recently been published. Expression prediction methods are enticing because they promise to answer pressing questions in fields ranging from developmental genetics to cell fate engineering and because they are faster, cheaper, and higher-throughput than their experimental counterparts. However, the absolute and relative accuracy of these methods is poorly characterized, limiting their informed use, their improvement, and the interpretation of their predictions. To address these issues, we created a benchmarking platform that combines a panel of large-scale perturbation datasets with an expression forecasting software engine that encompasses or interfaces to current methods. We used our platform to systematically assess methods, parameters, and sources of auxiliary data. We found that uninformed baseline predictions, which were not always included in prior evaluations, yielded the same or better mean absolute error than benchmarked methods in all test cases. These results cast doubt on the ability of current expression forecasting methods to provide mechanistic insights or to rank hypotheses for experimental follow-up. However, given the rapid pace of innovation in the field, new approaches may yield more accurate expression predictions. Our platform will serve as a neutral benchmark to improve methods and to identify contexts in which expression prediction can succeed.
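The comparison against an uninformed baseline described above can be sketched as follows. This is only an illustration on made-up numbers: the abstract does not say which baseline the authors used, so predicting each gene's mean training expression is an assumption, and the function names and toy values are hypothetical.

```python
from statistics import mean


def mean_absolute_error(predicted, observed):
    """Average absolute difference between paired predictions and observations."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def mean_baseline(training_profiles):
    """Assumed uninformed baseline: each gene's mean expression over training samples."""
    n_genes = len(training_profiles[0])
    return [mean(profile[g] for profile in training_profiles) for g in range(n_genes)]


# Toy data (hypothetical): rows are samples, columns are genes.
training = [[1.0, 4.0, 2.0], [3.0, 6.0, 2.0]]
observed_after_perturbation = [2.0, 5.0, 3.0]
method_prediction = [4.0, 5.0, 1.0]  # stand-in for the output of some forecasting method

baseline_prediction = mean_baseline(training)
baseline_mae = mean_absolute_error(baseline_prediction, observed_after_perturbation)
method_mae = mean_absolute_error(method_prediction, observed_after_perturbation)
```

In this toy case the baseline has the lower error, which is the kind of outcome the benchmark reports; real evaluations would use held-out perturbation data.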
15
Tangeman JA, Rebull SM, Grajales-Esquivel E, Weaver JM, Bendezu-Sayas S, Robinson ML, Lachke SA, Rio-Tsonis KD. Integrated single-cell multiomics uncovers foundational regulatory mechanisms of lens development and pathology. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.10.548451. [PMID: 37502967 PMCID: PMC10369908 DOI: 10.1101/2023.07.10.548451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Ocular lens development entails epithelial to fiber cell differentiation, defects in which cause congenital cataract. We report the first single-cell multiomic atlas of lens development, leveraging snRNA-seq, snATAC-seq, and CUT&RUN-seq to discover novel mechanisms of cell fate determination and cataract-linked regulatory networks. A comprehensive profile of cis- and trans-regulatory interactions, including for the cataract-linked transcription factor MAF, is established across a temporal trajectory of fiber cell differentiation. Further, we divulge a conserved epigenetic paradigm of cellular differentiation, defined by progressive loss of H3K27 methylation writer Polycomb repressive complex 2 (PRC2). PRC2 localizes to heterochromatin domains across master-regulator transcription factor gene bodies, suggesting it safeguards epithelial cell fate. Moreover, we demonstrate that FGF hyper-stimulation in vivo leads to MAF network activation and the emergence of novel lens cell states. Collectively, these data depict a comprehensive portrait of lens fiber cell differentiation, while defining regulatory effectors of cell identity and cataract formation.
Affiliation(s)
- Jared A Tangeman
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056 USA
  - Cell, Molecular, and Structural Biology Program, Miami University, Oxford, OH 45056 USA
- Sofia M Rebull
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056 USA
- Erika Grajales-Esquivel
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056 USA
- Jacob M Weaver
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056 USA
  - Cell, Molecular, and Structural Biology Program, Miami University, Oxford, OH 45056 USA
- Stacy Bendezu-Sayas
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056 USA
  - Cell, Molecular, and Structural Biology Program, Miami University, Oxford, OH 45056 USA
- Michael L Robinson
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056 USA
  - Cell, Molecular, and Structural Biology Program, Miami University, Oxford, OH 45056 USA
- Salil A Lachke
  - Department of Biological Sciences, University of Delaware, Newark, DE 19716 USA
  - Center for Bioinformatics & Computational Biology, University of Delaware, Newark, DE 19713 USA
- Katia Del Rio-Tsonis
  - Department of Biology and Center for Visual Sciences, Miami University, Oxford, OH 45056 USA
  - Cell, Molecular, and Structural Biology Program, Miami University, Oxford, OH 45056 USA
16
Mbebi AJ, Nikoloski Z. Gene regulatory network inference using mixed-norms regularized multivariate model with covariance selection. PLoS Comput Biol 2023; 19:e1010832. [PMID: 37523414 PMCID: PMC10414675 DOI: 10.1371/journal.pcbi.1010832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 08/10/2023] [Accepted: 07/11/2023] [Indexed: 08/02/2023]
Abstract
Despite extensive research efforts, reconstruction of gene regulatory networks (GRNs) from transcriptomics data remains a pressing challenge in systems biology. While non-linear approaches for reconstruction of GRNs show improved performance over simpler alternatives, we do not yet have understanding if joint modelling of multiple target genes may improve performance, even under linearity assumptions. To address this problem, we propose two novel approaches that cast the GRN reconstruction problem as a blend between regularized multivariate regression and graphical models that combine the L2,1-norm with classical regularization techniques. We used data and networks from the DREAM5 challenge to show that the proposed models provide consistently good performance in comparison to contenders whose performance varies with data sets from simulation and experiments from model unicellular organisms Escherichia coli and Saccharomyces cerevisiae. Since the models' formulation facilitates the prediction of master regulators, we also used the resulting findings to identify master regulators over all data sets as well as their plasticity across different environments. Our results demonstrate that the identified master regulators are in line with experimental evidence from the model bacterium E. coli. Together, our study demonstrates that simultaneous modelling of several target genes results in improved inference of GRNs and can be used as an alternative in different applications.
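For readers unfamiliar with the L2,1-norm mentioned above, its usual definition for a coefficient matrix $B \in \mathbb{R}^{p \times q}$ is the sum of the Euclidean norms of the rows. The abstract does not state whether rows index regulators or target genes in the authors' models, so the orientation and the symbols $B$, $p$, $q$ below are assumptions made for illustration.

```latex
\|B\|_{2,1} \;=\; \sum_{i=1}^{p} \Bigl( \sum_{j=1}^{q} b_{ij}^{2} \Bigr)^{1/2}
```

Used as a penalty, this norm tends to shrink whole rows of $B$ to zero together, which is why it is commonly paired with multivariate regression when the same predictors should be selected across several responses.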
Affiliation(s)
- Alain J. Mbebi
  - Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Germany
  - Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Germany
- Zoran Nikoloski
  - Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Germany
  - Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Germany
17
Pan TC, Chockalingam SP, Aluru M, Aluru S. MCPNet: a parallel maximum capacity-based genome-scale gene network construction framework. Bioinformatics 2023; 39:btad373. [PMID: 37289522 PMCID: PMC10287961 DOI: 10.1093/bioinformatics/btad373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Revised: 04/06/2023] [Accepted: 06/06/2023] [Indexed: 06/10/2023]
Abstract
MOTIVATION Gene network reconstruction from gene expression profiles is a compute- and data-intensive problem. Numerous methods based on diverse approaches including mutual information, random forests, Bayesian networks, correlation measures, as well as their transforms and filters such as data processing inequality, have been proposed. However, an effective gene network reconstruction method that performs well in all three aspects of computational efficiency, data size scalability, and output quality remains elusive. Simple techniques such as Pearson correlation are fast to compute but ignore indirect interactions, while more robust methods such as Bayesian networks are prohibitively time consuming to apply to tens of thousands of genes. RESULTS We developed maximum capacity path (MCP) score, a novel maximum-capacity-path-based metric to quantify the relative strengths of direct and indirect gene-gene interactions. We further present MCPNet, an efficient, parallelized gene network reconstruction software based on MCP score, to reverse engineer networks in unsupervised and ensemble manners. Using synthetic and real Saccharomyces cerevisiae datasets as well as real Arabidopsis thaliana datasets, we demonstrate that MCPNet produces better quality networks as measured by AUPRC, is significantly faster than all other gene network reconstruction software, and also scales well to tens of thousands of genes and hundreds of CPU cores. Thus, MCPNet represents a new gene network reconstruction tool that simultaneously achieves quality, performance, and scalability requirements. AVAILABILITY AND IMPLEMENTATION Source code freely available for download at https://doi.org/10.5281/zenodo.6499747 and https://github.com/AluruLab/MCPNet, implemented in C++ and supported on Linux.
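The general notion of a maximum capacity path can be sketched on a toy association matrix. The capacity of a path is taken here as its weakest edge, and the best capacity over all paths is computed with a Floyd-Warshall-style max-min update. This is not MCPNet's implementation: the abstract does not give the formula for the MCP score, so the function name, the toy weights, and the comparison at the end are illustrative assumptions.

```python
def max_capacity_paths(weights):
    """All-pairs maximum-capacity (widest-path) values.

    weights[i][j] is a non-negative association score between genes i and j
    (0 means no edge). The capacity of a path is its weakest edge; the result
    holds, for every pair, the largest capacity over all connecting paths.
    """
    n = len(weights)
    capacity = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                # Best path from i to j that is allowed to pass through k.
                through_k = min(capacity[i][k], capacity[k][j])
                if through_k > capacity[i][j]:
                    capacity[i][j] = through_k
    return capacity


# Toy symmetric scores (hypothetical): genes 0-1 and 1-2 are strongly
# associated, while the direct 0-2 association is weak.
toy_weights = [
    [0.0, 0.9, 0.3],
    [0.9, 0.0, 0.8],
    [0.3, 0.8, 0.0],
]
toy_capacity = max_capacity_paths(toy_weights)
```

Here the widest path between genes 0 and 2 runs through gene 1 with capacity 0.8, well above the direct score of 0.3, which is the kind of contrast between direct and indirect interaction strength that the abstract describes.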
Affiliation(s)
- Tony C Pan
  - Department of Biomedical Informatics, Emory University, Woodruff Memorial Research Building 101 Woodruff Circle, 4th Floor East, Atlanta, GA 30322, United States
  - Institute for Data Engineering and Science, Georgia Institute of Technology, 756 W Peachtree St NW, 12th Floor, Atlanta, GA 30332, United States
- Sriram P Chockalingam
  - Institute for Data Engineering and Science, Georgia Institute of Technology, 756 W Peachtree St NW, 12th Floor, Atlanta, GA 30332, United States
- Maneesha Aluru
  - School of Biological Sciences, Georgia Institute of Technology, 310 Ferst Dr NW, Atlanta, GA 30332, United States
- Srinivas Aluru
  - Institute for Data Engineering and Science, Georgia Institute of Technology, 756 W Peachtree St NW, 12th Floor, Atlanta, GA 30332, United States
  - School of Computational Science and Engineering, Georgia Institute of Technology, 756 W Peachtree St NW, 13th Floor, Atlanta, GA 30332, United States
18
Shen B, Coruzzi G, Shasha D. EnsInfer: a simple ensemble approach to network inference outperforms any single method. BMC Bioinformatics 2023; 24:114. [PMID: 36964499 PMCID: PMC10037858 DOI: 10.1186/s12859-023-05231-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 03/15/2023] [Indexed: 03/26/2023]
Abstract
This study evaluates both a variety of existing base causal inference methods and a variety of ensemble methods. We show that: (i) base network inference methods vary in their performance across different datasets, so a method that works poorly on one dataset may work well on another; (ii) a non-homogeneous ensemble method in the form of a Naive Bayes classifier leads overall to as good or better results than using the best single base method or any other ensemble method; (iii) for the best results, the ensemble method should integrate all methods that satisfy a statistical test of normality on training data. The resulting ensemble model EnsInfer easily integrates all kinds of RNA-seq data as well as new and existing inference methods. The paper categorizes and reviews state-of-the-art underlying methods, describes the EnsInfer ensemble approach in detail, and presents experimental results. The source code and data used will be made available to the community upon publication.
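The idea of stacking base inference methods with a Naive Bayes classifier can be sketched as below. This is not the EnsInfer code: the Gaussian likelihood is an assumption (suggested only by the abstract's mention of a normality test), and the function names, the variance smoothing term, and the toy scores are hypothetical. Each candidate edge is described by the scores that the base methods assign to it, and the classifier is trained on edges with known labels.

```python
import math


def fit_gaussian_nb(features, labels):
    """Estimate, per class, the prior and each feature's mean and variance."""
    model = {}
    for cls in set(labels):
        rows = [f for f, y in zip(features, labels) if y == cls]
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        variances = [
            # Small constant added to avoid zero variance (an assumed smoothing choice).
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + 1e-6
            for j in range(n_features)
        ]
        model[cls] = (len(rows) / len(labels), means, variances)
    return model


def edge_probability(model, scores):
    """Posterior probability that an edge with these base-method scores is true (class 1)."""
    log_posterior = {}
    for cls, (prior, means, variances) in model.items():
        log_p = math.log(prior)
        for value, mu, var in zip(scores, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        log_posterior[cls] = log_p
    top = max(log_posterior.values())  # subtract the maximum for numerical stability
    weights = {cls: math.exp(lp - top) for cls, lp in log_posterior.items()}
    return weights[1] / sum(weights.values())


# Toy training set (hypothetical): each row holds two base-method scores for one
# candidate edge; label 1 marks a known true edge, 0 a known non-edge.
training_scores = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
training_labels = [1, 1, 1, 0, 0, 0]

ensemble = fit_gaussian_nb(training_scores, training_labels)
likely_edge = edge_probability(ensemble, [0.85, 0.8])
unlikely_edge = edge_probability(ensemble, [0.15, 0.2])
```

Ranking all candidate edges by this posterior gives one consensus ordering from several base methods.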
Affiliation(s)
- Bingran Shen
  - Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, New York, 10012, USA
- Gloria Coruzzi
  - Department of Biology, Center for Genomics and Systems Biology, New York University, 12 Waverly Pl, New York, 10003, USA
- Dennis Shasha
  - Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, New York, 10012, USA.
19
Abid D, Brent MR. NetProphet 3: a machine learning framework for transcription factor network mapping and multi-omics integration. Bioinformatics 2023; 39:7000334. [PMID: 36692138 PMCID: PMC9912366 DOI: 10.1093/bioinformatics/btad038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Revised: 01/11/2023] [Accepted: 01/18/2023] [Indexed: 01/25/2023]
Abstract
MOTIVATION Many methods have been proposed for mapping the targets of transcription factors (TFs) from gene expression data. It is known that combining outputs from multiple methods can improve performance. To date, outputs have been combined by using either simplistic formulae, such as geometric mean, or carefully hand-tuned formulae that may not generalize well to new inputs. Finally, the evaluation of accuracy has been challenging due to the lack of genome-scale, ground-truth networks. RESULTS We developed NetProphet3, which combines scores from multiple analyses automatically, using a tree boosting algorithm trained on TF binding location data. We also developed three independent, genome-scale evaluation metrics. By these metrics, NetProphet3 is more accurate than other commonly used packages, including NetProphet 2.0, when gene expression data from direct TF perturbations are available. Furthermore, its integration mode can forge a consensus network from gene expression data and TF binding location data. AVAILABILITY AND IMPLEMENTATION All data and code are available at https://zenodo.org/record/7504131#.Y7Wu3i-B2x8. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
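The "simplistic formulae" that the abstract contrasts with its learned combination can be illustrated with a geometric-mean consensus of edge scores. This sketch shows only that baseline, not NetProphet3's tree boosting; the function name, the edge keys, and the toy scores are hypothetical, and scores are assumed to be strictly positive.

```python
import math


def geometric_mean_consensus(score_tables):
    """Combine per-method edge scores by geometric mean.

    score_tables is a list of dicts mapping (tf, target) to a positive score.
    Only edges scored by every method are combined.
    """
    shared_edges = set(score_tables[0])
    for table in score_tables[1:]:
        shared_edges &= set(table)
    return {
        edge: math.exp(sum(math.log(table[edge]) for table in score_tables) / len(score_tables))
        for edge in shared_edges
    }


# Toy scores (hypothetical) from two inference methods.
method_a = {("TF1", "G1"): 0.9, ("TF1", "G2"): 0.1, ("TF2", "G1"): 0.5}
method_b = {("TF1", "G1"): 0.4, ("TF1", "G2"): 0.4}

consensus = geometric_mean_consensus([method_a, method_b])
```

A fixed formula like this weights every method equally for every edge, which is the limitation a trained combiner is meant to address.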
Affiliation(s)
- Dhoha Abid
  - Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Michael R Brent
  - Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
20
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.02.526909. [PMID: 36778259 PMCID: PMC9915715 DOI: 10.1101/2023.02.02.526909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
Abstract
The modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of regulatory features in genome-wide screens. Most GRN inference methods are therefore forced to model relationships between regulatory genes and their targets with expression as a proxy for the upstream independent features, complicating validation and predictions produced by modeling frameworks. Separating covariance and regulatory influence requires aggregation of independent and complementary sets of evidence, such as transcription factor (TF) binding and target gene expression. However, the complete regulatory state of the system, e.g. TF activity (TFA) is unknown due to a lack of experimental feasibility, making regulatory relations difficult to infer. Some methods attempt to account for this by modeling TFA as a latent feature, but these models often use linear frameworks that are unable to account for non-linearities such as saturation, TF-TF interactions, and other higher order features. Deep learning frameworks may offer a solution, as they are capable of modeling complex interactions and capturing higher-order latent features. However, these methods often discard central concepts in biological systems modeling, such as sparsity and latent feature interpretability, in favor of increased model complexity. We propose a novel deep learning autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), that scales to single cell genomic data and maintains interpretability to perform GRN inference and estimate TFA as a latent feature. We demonstrate that SupirFactor outperforms current leading GRN inference methods, predicts biologically relevant TFA and elucidates functional regulatory pathways through aggregation of TFs.
Affiliation(s)
- Andreas Tjärnberg
  - Center for Developmental Genetics, New York University, New York 10003 NY, USA
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA
- Maggie Beheler-Amass
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
- Christopher A Jackson
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
- Lionel A Christiaen
  - Center for Developmental Genetics, New York University, New York 10003 NY, USA
  - Department of Biology, NYU, New York, NY 10008, USA
  - Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
  - Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
- David Gresham
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
- Richard Bonneau
  - Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
  - Department of Biology, NYU, New York, NY 10008, USA
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
  - Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY 10003, USA
  - Center For Data Science, NYU, New York, NY 10008, USA
  - Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
21
Escorcia-Rodríguez JM, Gaytan-Nuñez E, Hernandez-Benitez EM, Zorro-Aranda A, Tello-Palencia MA, Freyre-González JA. Improving gene regulatory network inference and assessment: The importance of using network structure. Front Genet 2023; 14:1143382. [PMID: 36926589 PMCID: PMC10012345 DOI: 10.3389/fgene.2023.1143382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 02/20/2023] [Indexed: 03/03/2023]
Abstract
Gene regulatory networks are graph models representing cellular transcription events. Networks are far from complete due to time and resource consumption for experimental validation and curation of the interactions. Previous assessments have shown the modest performance of the available network inference methods based on gene expression data. Here, we study several caveats on the inference of regulatory networks and methods assessment through the quality of the input data and gold standard, and the assessment approach with a focus on the global structure of the network. We used synthetic and biological data for the predictions and experimentally-validated biological networks as the gold standard (ground truth). Standard performance metrics and graph structural properties suggest that methods inferring co-expression networks should no longer be assessed equally with those inferring regulatory interactions. While methods inferring regulatory interactions perform better in global regulatory network inference than co-expression-based methods, the latter is better suited to infer function-specific regulons and co-regulation networks. When merging expression data, the size increase should outweigh the noise inclusion and graph structure should be considered when integrating the inferences. We conclude with guidelines to take advantage of inference methods and their assessment based on the applications and available expression datasets.
Collapse
Affiliation(s)
- Juan M Escorcia-Rodríguez
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
| | - Estefani Gaytan-Nuñez
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico.,Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
| | - Ericka M Hernandez-Benitez
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico.,Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
| | - Andrea Zorro-Aranda
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico.,Department of Chemical Engineering, Universidad de Antioquia, Medellín, Colombia
| | - Marco A Tello-Palencia
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico.,Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
| | - Julio A Freyre-González
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
|
22
|
Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022; 21:780-795. [PMID: 36698974 PMCID: PMC9841310 DOI: 10.1016/j.csbj.2022.12.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022] Open
Abstract
Molecular interaction networks lay the foundation for studying how biological functions are controlled by the complex interplay of genes and proteins. Investigating perturbed processes using biological networks has been instrumental in uncovering mechanisms that underlie complex disease phenotypes. Rapid advances in omics technologies have prompted the generation of high-throughput datasets, enabling large-scale, network-based analyses. Consequently, various modeling techniques, including network enrichment, differential network extraction, and network inference, have proven to be useful for gaining new mechanistic insights. We provide an overview of recent network-based methods and their core ideas to facilitate the discovery of disease modules or candidate mechanisms. Knowledge generated from these computational efforts will benefit biomedical research, especially drug development and precision medicine. We further discuss current challenges and provide perspectives in the field, highlighting the need for more integrative and dynamic network approaches to model disease development and progression.
Affiliation(s)
- Gihanna Galindez
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Sepideh Sadegh
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany.,Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany.,Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
|
23
|
A Framework for Comparison and Assessment of Synthetic RNA-Seq Data. Genes (Basel) 2022; 13:genes13122362. [PMID: 36553629 PMCID: PMC9778097 DOI: 10.3390/genes13122362] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/09/2022] [Revised: 12/05/2022] [Accepted: 12/06/2022] [Indexed: 12/16/2022] Open
Abstract
Methods for generating synthetic bulk and single-cell RNA-seq data are ever-growing in number and have multiple and diverse applications. They are often aimed at benchmarking bioinformatics algorithms for purposes such as sample classification, differential expression analysis, correlation and network studies, and the optimization of data integration and normalization techniques. Here, we propose a general framework to compare synthetically generated RNA-seq data and select a data-generating tool that is suitable for a set of specific study goals. Because there are multiple methods for synthetic RNA-seq data generation, researchers can use the proposed framework to make an informed choice of the RNA-seq data simulation algorithm and software best suited to their specific scientific questions.
|
24
|
Hao Y, Lu L, Liu A, Lin X, Xiao L, Kong X, Li K, Liang F, Xiong J, Qu L, Li Y, Li J. Integrating bioinformatic strategies in spatial life science research. Brief Bioinform 2022; 23:bbac415. [PMID: 36198665 PMCID: PMC9677476 DOI: 10.1093/bib/bbac415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Revised: 08/15/2022] [Accepted: 08/27/2022] [Indexed: 12/14/2022] Open
Abstract
As space exploration programs progress, manned space missions will become more frequent and travel farther from Earth, putting a greater emphasis on astronaut health. Through the collaborative efforts of researchers from various countries, the effects of space environment factors on living systems are gradually being uncovered. Although a large number of interconnected research findings have been produced, the connections among them remain unclear, and many unknown effects are left to be discovered. Simultaneously, several valuable data resources have emerged, accumulating data measuring biological effects in space that can be used to further investigate the unknown biological adaptations. In this review, the previous findings and their correlations are sorted out to facilitate the understanding of biological adaptations to space and the design of countermeasures. The biological effect measurement methods and data types are also organized to provide references for experimental design and data analysis. To aid deeper exploration of the data resources, we summarize common characteristics of the data generated from longitudinal experiments, outline challenges and caveats in data analysis, and provide corresponding solutions by recommending bioinformatics strategies and available models and tools.
Affiliation(s)
- Yangyang Hao
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Liang Lu
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Anna Liu
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Xue Lin
- Department of Bioinformatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
| | - Li Xiao
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Xiaoyue Kong
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
| | - Kai Li
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Fengji Liang
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Jianghui Xiong
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Lina Qu
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Yinghui Li
- The State Key Laboratory of Space Medicine Fundamentals and Application, China Astronaut Research and Training Center, No. 26 Beiqing Road, Haidian District, Beijing, 100094, China
| | - Jian Li
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
|
25
|
Hawe JS, Saha A, Waldenberger M, Kunze S, Wahl S, Müller-Nurasyid M, Prokisch H, Grallert H, Herder C, Peters A, Strauch K, Theis FJ, Gieger C, Chambers J, Battle A, Heinig M. Network reconstruction for trans acting genetic loci using multi-omics data and prior information. Genome Med 2022; 14:125. [PMID: 36344995 PMCID: PMC9641770 DOI: 10.1186/s13073-022-01124-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 10/11/2022] [Indexed: 11/09/2022] Open
Abstract
BACKGROUND Molecular measurements of the genome, the transcriptome, and the epigenome, often termed multi-omics data, provide an in-depth view of biological systems, and their integration is crucial for gaining insights into complex regulatory processes. These data can be used to explain disease-related genetic variants by linking them to intermediate molecular traits (quantitative trait loci, QTL). Molecular networks regulating cellular processes leave footprints in QTL results as so-called trans-QTL hotspots. Reconstructing these networks is a complex endeavor, and the use of biological prior information can improve network inference. However, previous efforts were limited in the types of priors used or have only been applied to model systems. In this study, we reconstruct the regulatory networks underlying trans-QTL hotspots using human cohort data and data-driven prior information. METHODS We devised a new strategy to integrate QTL with human population-scale multi-omics data. State-of-the-art network inference methods, including BDgraph and glasso, were applied to these data. Comprehensive prior information to guide network inference was manually curated from large-scale biological databases. The inference approach was extensively benchmarked using simulated data and cross-cohort replication analyses. The best-performing methods were subsequently applied to real-world human cohort data. RESULTS Our benchmarks showed that prior-based strategies outperform methods without prior information in simulated data and show better replication across datasets. Application of our approach to human cohort data highlighted two novel regulatory networks related to schizophrenia and lean body mass for which we generated novel functional hypotheses. CONCLUSIONS We demonstrate that existing biological knowledge can improve the integrative analysis of networks underlying trans associations and generate novel hypotheses about regulatory mechanisms.
Affiliation(s)
- Johann S Hawe
- Institute of Computational Biology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,German Heart Centre Munich, Department of Cardiology, Technical University Munich, Munich, Germany.,Department of Informatics, Technical University of Munich, Garching, Germany
| | - Ashis Saha
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Melanie Waldenberger
- Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
| | - Sonja Kunze
- Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
| | - Simone Wahl
- Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
| | - Martina Müller-Nurasyid
- Institute of Genetic Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,IBE, Faculty of Medicine, LMU Munich, 81377, Munich, Germany.,Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany.,Department of Internal Medicine I (Cardiology), Hospital of the Ludwig-Maximilians-University (LMU) Munich, Munich, Germany
| | - Holger Prokisch
- Institute of Human Genetics, School of Medicine, Technische Universität München, Munich, Germany
| | - Harald Grallert
- Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,German Center for Diabetes Research (DZD), Neuherberg, Germany
| | - Christian Herder
- German Center for Diabetes Research (DZD), Neuherberg, Germany.,Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Düsseldorf, Germany.,Division of Endocrinology and Diabetology, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Annette Peters
- Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
| | - Konstantin Strauch
- Institute of Genetic Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany.,Chair of Genetic Epidemiology, IBE, Faculty of Medicine, LMU Munich, Munich, Germany
| | - Fabian J Theis
- Department of Informatics, Technical University of Munich, Garching, Germany.,Department of Mathematics, Technical University of Munich, Garching, Germany
| | - Christian Gieger
- Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,German Center for Diabetes Research (DZD), Neuherberg, Germany
| | - John Chambers
- Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London, UK.,Lee Kong Chian School of Medicine, Nanyang Technological University, 308232, Singapore, Singapore
| | - Alexis Battle
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.,Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Matthias Heinig
- Institute of Computational Biology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany.,Department of Informatics, Technical University of Munich, Garching, Germany.,Munich Heart Association, Partner Site Munich, DZHK (German Centre for Cardiovascular Research), 10785, Berlin, Germany.
|
26
|
Ferrari C, Manosalva Pérez N, Vandepoele K. MINI-EX: Integrative inference of single-cell gene regulatory networks in plants. Mol Plant 2022; 15:1807-1824. [PMID: 36307979 DOI: 10.1016/j.molp.2022.10.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/01/2022] [Revised: 09/30/2022] [Accepted: 10/21/2022] [Indexed: 05/26/2023]
Abstract
Multicellular organisms, such as plants, are characterized by highly specialized and tightly regulated cell populations, establishing specific morphological structures and executing distinct functions. Gene regulatory networks (GRNs) describe condition-specific interactions of transcription factors (TFs) regulating the expression of target genes, underpinning these specific functions. Efficient and validated methods to identify cell-type-specific GRNs from single-cell data in plants are lacking, which limits our understanding of the organization of specific cell types in both model species and crops. We therefore developed MINI-EX (Motif-Informed Network Inference based on single-cell EXpression data), an integrative approach to infer cell-type-specific networks in plants. MINI-EX uses single-cell transcriptomic data to define expression-based networks and integrates TF motif information to filter the inferred regulons, resulting in networks with increased accuracy. Next, regulons are assigned to different cell types, leveraging cell-specific expression, and candidate regulators are prioritized using network centrality measures, functional annotations, and expression specificity. This embedded prioritization strategy offers a unique and efficient means to unravel signaling cascades in specific cell types controlling a biological process of interest. We demonstrate the stability of MINI-EX toward input data sets with a low number of cells and its robustness toward missing data, and show that it infers state-of-the-art networks with better performance than other related single-cell network tools. MINI-EX successfully identifies key regulators controlling root development in Arabidopsis and rice, leaf development in Arabidopsis, and ear development in maize, enhancing our understanding of cell-type-specific regulation and unraveling the roles of different regulators controlling the development of specific cell types in plants.
Affiliation(s)
- Camilla Ferrari
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium
| | - Nicolás Manosalva Pérez
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium
| | - Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, 9052 Ghent, Belgium.
|
27
|
Cummins B, Motta FC, Moseley RC, Deckard A, Campione S, Gameiro M, Gedeon T, Mischaikow K, Haase SB. Experimental guidance for discovering genetic networks through hypothesis reduction on time series. PLoS Comput Biol 2022; 18:e1010145. [PMID: 36215333 PMCID: PMC9584434 DOI: 10.1371/journal.pcbi.1010145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Revised: 10/20/2022] [Accepted: 09/05/2022] [Indexed: 11/19/2022] Open
Abstract
Large programs of dynamic gene expression, like cell cycles and circadian rhythms, are controlled by a relatively small "core" network of transcription factors and post-translational modifiers, working in concerted mutual regulation. Recent work suggests that system-independent, quantitative features of the dynamics of gene expression can be used to identify core regulators. We introduce an approach of iterative network hypothesis reduction from time-series data in which increasingly complex features of the dynamic expression of individual genes, pairs of genes, and entire collections of genes are used to infer functional network models that can produce the observed transcriptional program. The culmination of our work is a computational pipeline, Iterative Network Hypothesis Reduction from Temporal Dynamics (Inherent dynamics pipeline), that provides a priority listing of targets for genetic perturbation to experimentally infer network structure. We demonstrate the capability of this integrated computational pipeline on synthetic and yeast cell-cycle data.
Affiliation(s)
- Breschine Cummins
- Department of Mathematical Sciences, Montana State University, Bozeman, Montana, United States of America
| | - Francis C. Motta
- Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, Florida, United States of America
| | - Robert C. Moseley
- Department of Biology, Duke University, Durham, North Carolina, United States of America
| | - Anastasia Deckard
- Geometric Data Analytics, Durham, North Carolina, United States of America
| | - Sophia Campione
- Department of Biology, Duke University, Durham, North Carolina, United States of America
| | - Marcio Gameiro
- Department of Mathematics, Rutgers University, New Brunswick, New Jersey, United States of America
- Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, São Paulo, Brazil
| | - Tomáš Gedeon
- Department of Mathematical Sciences, Montana State University, Bozeman, Montana, United States of America
| | - Konstantin Mischaikow
- Department of Mathematics, Rutgers University, New Brunswick, New Jersey, United States of America
| | - Steven B. Haase
- Department of Biology, Duke University, Durham, North Carolina, United States of America
|
28
|
Perkel JM. Smart software untangles gene regulation in cells. Nature 2022; 609:428-431. [PMID: 36064802 DOI: 10.1038/d41586-022-02826-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
|
29
|
You Y, Sun X, Ma M, He J, Li L, Porto FW, Lin S. Trypsin is a coordinate regulator of N and P nutrients in marine phytoplankton. Nat Commun 2022; 13:4022. [PMID: 35821503 PMCID: PMC9276738 DOI: 10.1038/s41467-022-31802-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/06/2021] [Accepted: 07/05/2022] [Indexed: 11/30/2022] Open
Abstract
Trypsin is best known as a digestive enzyme in animals, but remains unexplored in phytoplankton, the major primary producers in the ocean. Here we report the prevalence of trypsin genes in global ocean phytoplankton and significant influences of environmental nitrogen (N) and phosphorus (P) on their expression. Using CRISPR/Cas9-mediated knockout and overexpression analyses, we further reveal that a trypsin in Phaeodactylum tricornutum (PtTryp2) functions to repress N acquisition, but its expression decreases under N-deficiency to promote N acquisition. In contrast, PtTryp2 promotes phosphate uptake per se, and its expression increases under P-deficiency to further reinforce P acquisition. Furthermore, PtTryp2 knockout led to amplitude magnification of the nitrate and phosphate uptake ‘seesaw’, whereas PtTryp2 overexpression dampened it, linking PtTryp2 to stabilizing N:P stoichiometry. Our data demonstrate that PtTryp2 is a coordinate regulator of N:P stoichiometric homeostasis. The study opens a window for deciphering how phytoplankton adapt to nutrient-variable marine environments.
Using CRISPR/Cas9-mediated knockout and overexpression analyses, this study shows that a trypsin in the diatom Phaeodactylum tricornutum promotes phosphorus uptake and inhibits nitrogen uptake, but its expression is downregulated under nitrogen stress and upregulated under phosphorus stress. Together, the findings suggest this trypsin is a coordinate regulator of nutrient homeostasis.
Affiliation(s)
- Yanchun You
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, Fujian, 361102, China
| | - Xueqiong Sun
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, Fujian, 361102, China
| | - Minglei Ma
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, Fujian, 361102, China
| | - Jiamin He
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, Fujian, 361102, China
| | - Ling Li
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, Fujian, 361102, China
| | - Felipe Wendt Porto
- Department of Marine Sciences, University of Connecticut, Groton, CT, 06340, USA
| | - Senjie Lin
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, Fujian, 361102, China.,Department of Marine Sciences, University of Connecticut, Groton, CT, 06340, USA.
|
30
|
Hammelman J, Patel T, Closser M, Wichterle H, Gifford D. Ranking reprogramming factors for cell differentiation. Nat Methods 2022; 19:812-822. [PMID: 35710610 PMCID: PMC10460539 DOI: 10.1038/s41592-022-01522-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/12/2021] [Accepted: 05/13/2022] [Indexed: 12/16/2022]
Abstract
Transcription factor over-expression is a proven method for reprogramming cells to a desired cell type for regenerative medicine and therapeutic discovery. However, a general method for identifying the reprogramming factors needed to create an arbitrary cell type remains an open problem. Here we examine the success rate of methods and data for differentiation by testing the ability of nine computational methods (CellNet, GarNet, EBseq, AME, DREME, HOMER, KMAC, diffTF and DeepAccess) to discover and rank candidate factors for eight target cell types with known reprogramming solutions. We compare methods that use gene expression, biological networks and chromatin accessibility data, and comprehensively test parameters and preprocessing of input data to optimize performance. We find that the best factor identification methods can identify an average of 50-60% of reprogramming factors within the top ten candidates, and that methods using chromatin accessibility perform best. Among the chromatin accessibility methods, the complex methods DeepAccess and diffTF have higher correlation with the ranked significance of transcription factor candidates within reprogramming protocols for differentiation. We provide evidence that AME and diffTF are optimal methods for transcription factor recovery that will allow for systematic prioritization of transcription factor candidates to aid in the design of new reprogramming protocols.
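The "fraction of reprogramming factors within the top ten candidates" reported above can be read as a recovery-at-k score. The minimal Python sketch below shows one way such a score could be computed; the function name `recovery_at_k`, the placeholder factor names and the ranking are hypothetical illustrations, not the evaluation code of the study.

```python
def recovery_at_k(ranked_candidates, known_factors, k=10):
    """Fraction of known reprogramming factors found among the top-k ranked candidates."""
    known = set(known_factors)
    if not known:
        raise ValueError("known_factors must not be empty")
    top_k = set(ranked_candidates[:k])
    return len(top_k & known) / len(known)


# Hypothetical ranking produced by some method: TF01 is the best candidate.
ranking = ["TF%02d" % i for i in range(1, 21)]
# Hypothetical known reprogramming solution for one target cell type.
known_solution = {"TF02", "TF07", "TF15", "TF30"}

score = recovery_at_k(ranking, known_solution, k=10)
print(score)
```

Averaging this score over target cell types would give a per-method summary comparable in spirit to the 50-60% figure quoted in the abstract.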
Affiliation(s)
- Jennifer Hammelman
- Computational and Systems Biology, MIT, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
| | - Tulsi Patel
- Departments of Pathology and Cell Biology, Neuroscience, Rehabilitation and Regenerative Medicine (in Neurology), Columbia University Irving Medical Center, New York, NY, USA
- Center for Motor Neuron Biology and Disease, Columbia University Irving Medical Center, New York, NY, USA
- Columbia Stem Cell Initiative, Columbia University Irving Medical Center, New York, NY, USA
| | - Michael Closser
- Departments of Pathology and Cell Biology, Neuroscience, Rehabilitation and Regenerative Medicine (in Neurology), Columbia University Irving Medical Center, New York, NY, USA
- Center for Motor Neuron Biology and Disease, Columbia University Irving Medical Center, New York, NY, USA
- Columbia Stem Cell Initiative, Columbia University Irving Medical Center, New York, NY, USA
| | - Hynek Wichterle
- Departments of Pathology and Cell Biology, Neuroscience, Rehabilitation and Regenerative Medicine (in Neurology), Columbia University Irving Medical Center, New York, NY, USA
- Center for Motor Neuron Biology and Disease, Columbia University Irving Medical Center, New York, NY, USA
- Columbia Stem Cell Initiative, Columbia University Irving Medical Center, New York, NY, USA
| | - David Gifford
- Computational and Systems Biology, MIT, Cambridge, MA, USA.
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA.
- Department of Biological Engineering, MIT, Cambridge, MA, USA.
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA.
|
31
|
Jansen JE, Aschenbrenner D, Uhlig HH, Coles MC, Gaffney EA. A method for the inference of cytokine interaction networks. PLoS Comput Biol 2022; 18:e1010112. [PMID: 35731827 PMCID: PMC9216621 DOI: 10.1371/journal.pcbi.1010112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2021] [Accepted: 04/15/2022] [Indexed: 11/19/2022] Open
Abstract
Cell-cell communication is mediated by many soluble mediators, including over 40 cytokines. Cytokines, e.g. TNF, IL1β, IL5, IL6, IL12 and IL23, represent important therapeutic targets in immune-mediated inflammatory diseases (IMIDs), such as inflammatory bowel disease (IBD), psoriasis, asthma, rheumatoid and juvenile arthritis. The identification of cytokines that are causative drivers of, and not just associated with, inflammation is fundamental for selecting therapeutic targets that should be studied in clinical trials. As in vitro models of cytokine interactions provide a simplified framework to study complex in vivo interactions, and can easily be perturbed experimentally, they are key for identifying such targets. We present a method to extract a minimal, weighted cytokine interaction network, given in vitro data on the effects of the blockage of single cytokine receptors on the secretion rate of other cytokines. Existing biological network inference methods typically consider the correlation structure of the underlying dataset, but this can make them poorly suited for highly connected, non-linear cytokine interaction data. Our method uses ordinary differential equation systems to represent cytokine interactions, and efficiently computes the configuration with the lowest Akaike information criterion value for all possible network configurations. It enables us to study indirect cytokine interactions and quantify inhibition effects. The extracted network can also be used to predict the combined effects of inhibiting various cytokines simultaneously. The model equations can easily be adjusted to incorporate more complicated dynamics and accommodate temporal data. We validate our method using synthetic datasets and apply our method to an experimental dataset on the regulation of IL23, a cytokine with therapeutic relevance in psoriasis and IBD. We validate several model predictions against experimental data that were not used for model fitting. 
In summary, we present a novel method specifically designed to efficiently infer cytokine interaction networks from cytokine perturbation data in the context of IMIDs.
Cytokines are the messenger molecules of the immune system, allowing intercellular communication and mediating effective immune responses. They are an important therapeutic target in immune-mediated inflammatory diseases such as inflammatory bowel disease (IBD) and rheumatoid arthritis. Cytokines interact in a tightly regulated network and, depending on the context, a particular cytokine can be involved in anti-inflammatory or inflammatory activities. In order to determine which cytokines to target in specific disease types and patient subsets, it is critical to study the effects of the inhibition of one or more cytokines on the larger cytokine interaction network. We present a novel method to extract a minimal, weighted network from cytokine interaction data. Existing biological network inference methods typically consider the correlation structure of the underlying dataset and/or make further assumptions about the dataset, such as the existence of a small core of regulators. This can make them poorly suited for highly connected, non-linear cytokine interaction data. We validated our method using synthetic data and applied it to a dataset on the regulation of IL23, a cytokine implicated in IBD pathogenesis. Predictions of the extracted IL23 network were validated using additional experimental data and were used to support the view of the cytokines IL1 and IL23 as promising targets for those patients who fail to respond to TNFα inhibition, the current gold standard in IBD treatment.
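The idea of choosing, among all candidate network configurations, the one with the lowest Akaike information criterion can be sketched in a few lines of Python. This is only a simplified stand-in under stated assumptions: it replaces the ordinary differential equation models of the paper with a linear least-squares model without intercept, enumerates configurations by brute force rather than efficiently, and uses the Gaussian least-squares form AIC = n ln(RSS/n) + 2k. All names and the toy data are hypothetical.

```python
from itertools import combinations
from math import log


def solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting.

    Assumes the matrix is non-singular (candidate inputs linearly independent).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_rss(columns, y):
    """Least-squares fit of y on the given input columns; return (weights, RSS)."""
    if not columns:
        return [], sum(v * v for v in y)
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    w = solve(gram, rhs)
    pred = [sum(wj * c[i] for wj, c in zip(w, columns)) for i in range(len(y))]
    return w, sum((yi - pi) ** 2 for yi, pi in zip(y, pred))


def best_network_by_aic(x_cols, y):
    """Enumerate every subset of candidate inputs and keep the lowest-AIC one.

    Assumes noisy data, so the residual sum of squares is never exactly zero.
    """
    n = len(y)
    best = None
    for k in range(len(x_cols) + 1):
        for subset in combinations(range(len(x_cols)), k):
            w, rss = fit_rss([x_cols[j] for j in subset], y)
            aic = n * log(rss / n) + 2 * k
            if best is None or aic < best[0]:
                best = (aic, subset, w)
    return best


# Toy data: the target depends on inputs 0 and 2 only, plus small noise.
X_COLS = [
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
NOISE = [0.05, -0.05, -0.05, 0.05, -0.05, 0.05, 0.05, -0.05]
Y = [2 * a - c + e for a, c, e in zip(X_COLS[0], X_COLS[2], NOISE)]

aic, subset, weights = best_network_by_aic(X_COLS, Y)
print(subset, [round(w, 3) for w in weights])
```

Brute-force enumeration grows as 2^m in the number m of candidate interactions, so a sketch like this is only practical for small networks.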
Affiliation(s)
- Joanneke E. Jansen
- Wolfson Centre for Mathematical Biology, Mathematical Institute, University of Oxford, Oxford, United Kingdom
- Translational Gastroenterology Unit, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
- Kennedy Institute of Rheumatology, University of Oxford, Oxford, United Kingdom
| | - Dominik Aschenbrenner
- Translational Gastroenterology Unit, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
- Autoimmunity, Transplantation and Inflammation, Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
| | - Holm H. Uhlig
- Translational Gastroenterology Unit, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
- Department of Paediatrics, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
| | - Mark C. Coles
- Kennedy Institute of Rheumatology, University of Oxford, Oxford, United Kingdom
| | - Eamonn A. Gaffney
- Wolfson Centre for Mathematical Biology, Mathematical Institute, University of Oxford, Oxford, United Kingdom
|
32
|
Xavier JB, Monk JM, Poudel S, Norsigian CJ, Sastry AV, Liao C, Bento J, Suchard MA, Arrieta-Ortiz ML, Peterson EJ, Baliga NS, Stoeger T, Ruffin F, Richardson RA, Gao CA, Horvath TD, Haag AM, Wu Q, Savidge T, Yeaman MR. Mathematical models to study the biology of pathogens and the infectious diseases they cause. iScience 2022; 25:104079. [PMID: 35359802 PMCID: PMC8961237 DOI: 10.1016/j.isci.2022.104079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/18/2022]
Abstract
Mathematical models have many applications in infectious diseases: epidemiologists use them to forecast outbreaks and design containment strategies; systems biologists use them to study complex processes sustaining pathogens, from the metabolic networks empowering microbial cells to ecological networks in the microbiome that protects its host. Here, we (1) review important models relevant to infectious diseases and (2) draw parallels among models ranging widely in scale. We end by discussing a minimal set of information for a model to promote its use by others and to enable predictions that help us better fight pathogens and the diseases they cause.
Affiliation(s)
- Joao B. Xavier
- Program for Computational and Systems Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | | | - Saugat Poudel
- Department of Bioengineering, UC San Diego, San Diego, CA, USA
| | | | - Anand V. Sastry
- Department of Bioengineering, UC San Diego, San Diego, CA, USA
| | - Chen Liao
- Program for Computational and Systems Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Jose Bento
- Computer Science Department, Boston College, Chestnut Hill, MA, USA
| | - Marc A. Suchard
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA
| | | | | | | | - Thomas Stoeger
- Department of Chemical and Biological Engineering; Northwestern University, Evanston, IL 60208, USA
- Successful Clinical Response in Pneumonia Therapy (SCRIPT) Systems Biology Center, Northwestern University, Chicago, IL, USA
| | - Felicia Ruffin
- Division of Infectious Diseases, Department of Medicine, Duke University School of Medicine, Durham, NC, USA
| | - Reese A.K. Richardson
- Department of Chemical and Biological Engineering; Northwestern University, Evanston, IL 60208, USA
- Successful Clinical Response in Pneumonia Therapy (SCRIPT) Systems Biology Center, Northwestern University, Chicago, IL, USA
| | - Catherine A. Gao
- Successful Clinical Response in Pneumonia Therapy (SCRIPT) Systems Biology Center, Northwestern University, Chicago, IL, USA
- Division of Pulmonary and Critical Care, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Thomas D. Horvath
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Pathology, Texas Children’s Microbiome Center, Texas Children’s Hospital, Houston, TX 77030, USA
| | - Anthony M. Haag
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Pathology, Texas Children’s Microbiome Center, Texas Children’s Hospital, Houston, TX 77030, USA
| | - Qinglong Wu
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Pathology, Texas Children’s Microbiome Center, Texas Children’s Hospital, Houston, TX 77030, USA
| | - Tor Savidge
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Pathology, Texas Children’s Microbiome Center, Texas Children’s Hospital, Houston, TX 77030, USA
| | - Michael R. Yeaman
- David Geffen School of Medicine at UCLA & Lundquist Institute for Infection & Immunity at Harbor UCLA Medical Center, Los Angeles, CA, USA
|
33
|
Niu P, Soto MJ, Yoon BJ, Dougherty ER, Alexander FJ, Blaby I, Qian X. Protocol for condition-dependent metabolite yield prediction using the TRIMER pipeline. STAR Protoc 2022; 3:101184. [PMID: 35243375 PMCID: PMC8866898 DOI: 10.1016/j.xpro.2022.101184] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
This protocol explains the pipeline for condition-dependent metabolite yield prediction using Transcription Regulation Integrated with MEtabolic Regulation (TRIMER). TRIMER targets metabolic engineering applications via a hybrid model integrating transcription factor (TF)-gene regulatory network (TRN) with a Bayesian network (BN) inferred from transcriptomic expression data to effectively regulate metabolic reactions. For E. coli and yeast, TRIMER achieves reliable knockout phenotype and flux predictions from the deletion of one or more TFs at the genome scale. For complete details on the use and execution of this protocol, please refer to Niu et al. (2021). TRIMER is a package for transcription-regulated metabolic predictions; Global dependency modeling by Bayesian network enables condition-dependent prediction; We present the step-by-step TRIMER implementation for metabolic engineering; We demonstrate the analyses for E. coli and yeast mutants.
Affiliation(s)
- Puhua Niu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
| | - Maria J. Soto
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
| | - Edward R. Dougherty
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
| | - Francis J. Alexander
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
| | - Ian Blaby
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Corresponding author
| | - Xiaoning Qian
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
- Corresponding author
|
34
|
Gibbs CS, Jackson CA, Saldi GA, Tjärnberg A, Shah A, Watters A, De Veaux N, Tchourine K, Yi R, Hamamsy T, Castro DM, Carriero N, Gorissen BL, Gresham D, Miraldi ER, Bonneau R. High performance single-cell gene regulatory network inference at scale: The Inferelator 3.0. Bioinformatics 2022; 38:2519-2528. [PMID: 35188184 PMCID: PMC9048651 DOI: 10.1093/bioinformatics/btac117] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 05/25/2021] [Revised: 12/08/2021] [Accepted: 02/17/2022] [Indexed: 12/04/2022]
Abstract
Motivation Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. Methods for inferring and reconstructing networks from genomics data have evolved rapidly over the last decade in response to advances in sequencing technology and machine learning. The scale of data collection has increased dramatically; the largest genome-wide gene expression datasets have grown from thousands of measurements to millions of single cells, and new technologies are on the horizon to increase to tens of millions of cells and above. Results In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator is able to integrate the largest single-cell datasets and learn cell-type-specific gene regulatory networks. Compared to other network inference methods, the Inferelator learns new and informative Saccharomyces cerevisiae networks from single-cell gene expression data, measured by recovery of a known gold standard. We demonstrate its scaling capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million) single-cell gene expression dataset with paired single-cell chromatin accessibility data. Availability and implementation The inferelator software is available on GitHub (https://github.com/flatironinstitute/inferelator) under the MIT license and has been released as python packages with associated documentation (https://inferelator.readthedocs.io/). Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Claudia Skok Gibbs
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, USA.,Center For Data Science, NYU, New York, NY, USA
| | - Christopher A Jackson
- Center For Genomics and Systems Biology, NYU, New York, NY, USA.,Department of Biology, NYU, New York, NY, USA
| | - Giuseppe-Antonio Saldi
- Center For Genomics and Systems Biology, NYU, New York, NY, USA.,Department of Biology, NYU, New York, NY, USA
| | - Andreas Tjärnberg
- Center For Genomics and Systems Biology, NYU, New York, NY, USA.,Department of Biology, NYU, New York, NY, USA
| | - Aashna Shah
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, USA
| | - Aaron Watters
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, USA
| | - Nicholas De Veaux
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, USA
| | | | - Ren Yi
- Courant Institute of Mathematical Sciences, Computer Science Department, NYU, New York, NY, USA
| | | | - Dayanne M Castro
- Center For Genomics and Systems Biology, NYU, New York, NY, USA.,Department of Biology, NYU, New York, NY, USA
| | - Nicholas Carriero
- Flatiron Institute, Scientific Computing Core, Simons Foundation, New York, NY, USA
| | - Bram L Gorissen
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - David Gresham
- Center For Genomics and Systems Biology, NYU, New York, NY, USA.,Department of Biology, NYU, New York, NY, USA
| | - Emily R Miraldi
- Divisions of Immunobiology and Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.,Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
| | - Richard Bonneau
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, USA.,Center For Data Science, NYU, New York, NY, USA.,Center For Genomics and Systems Biology, NYU, New York, NY, USA.,Department of Biology, NYU, New York, NY, USA.,Courant Institute of Mathematical Sciences, Computer Science Department, NYU, New York, NY, USA
|
35
|
Zorro-Aranda A, Escorcia-Rodríguez JM, González-Kise JK, Freyre-González JA. Curation, inference, and assessment of a globally reconstructed gene regulatory network for Streptomyces coelicolor. Sci Rep 2022; 12:2840. [PMID: 35181703 PMCID: PMC8857197 DOI: 10.1038/s41598-022-06658-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/02/2021] [Accepted: 01/31/2022] [Indexed: 12/12/2022]
Abstract
Streptomyces coelicolor A3(2) is a model microorganism for the study of Streptomycetes, antibiotic production, and secondary metabolism in general. Even though S. coelicolor has an outstanding variety of regulators among bacteria, little effort has been made to study its transcription globally. We manually curated 29 years of literature and databases to assemble a meta-curated, experimentally validated gene regulatory network (GRN) with 5386 genes and 9707 regulatory interactions (~41% of the total expected interactions). This provides the most extensive and up-to-date reconstruction available for the regulatory circuitry of this organism. Only ~6% (534/9707) of the interactions are supported by experiments confirming the binding of the transcription factor to the upstream region of the target gene, the so-called “strong” evidence, while for the remaining interactions there is no confirmation of direct binding. To tackle network incompleteness, we performed network inference using several methods (including two proposed here) for motif identification in DNA sequences and GRN inference from transcriptomics. Further, we contrasted the structural properties and functional architecture of the networks to assess the reliability of the predictions, finding the inference from DNA sequence data to be the most trustworthy approach. Finally, we show two applications of the inferred and the curated networks. The inference allowed us to propose novel transcription factors for the key Streptomyces antibiotic regulatory proteins (SARPs). The curated network allowed us to study the conservation of the system-level components between S. coelicolor and Corynebacterium glutamicum. There we identified the basal machinery as the common signature between the two organisms. The curated networks were deposited in Abasy Atlas (https://abasy.ccg.unam.mx/) while the inferences are available as Supplementary Material.
Affiliation(s)
- Andrea Zorro-Aranda
- Regulatory Systems Biology Research Group, Laboratory of Systems and Synthetic Biology, Center for Genomics Sciences, Universidad Nacional Autónoma de México, Av. Universidad s/n, Col. Chamilpa, 62210, Cuernavaca, Morelos, México.,Bioprocess Research Group, Department of Chemical Engineering, Universidad de Antioquia, Calle 70 No. 52-21, Medellín, Colombia
| | - Juan Miguel Escorcia-Rodríguez
- Regulatory Systems Biology Research Group, Laboratory of Systems and Synthetic Biology, Center for Genomics Sciences, Universidad Nacional Autónoma de México, Av. Universidad s/n, Col. Chamilpa, 62210, Cuernavaca, Morelos, México
| | - José Kenyi González-Kise
- Regulatory Systems Biology Research Group, Laboratory of Systems and Synthetic Biology, Center for Genomics Sciences, Universidad Nacional Autónoma de México, Av. Universidad s/n, Col. Chamilpa, 62210, Cuernavaca, Morelos, México.,Undergraduate Program in Genomic Sciences, Center for Genomics Sciences, Universidad Nacional Autónoma de México, Av. Universidad s/n, Col. Chamilpa, 62210, Cuernavaca, Morelos, México
| | - Julio Augusto Freyre-González
- Regulatory Systems Biology Research Group, Laboratory of Systems and Synthetic Biology, Center for Genomics Sciences, Universidad Nacional Autónoma de México, Av. Universidad s/n, Col. Chamilpa, 62210, Cuernavaca, Morelos, México.
|
36
|
Aluru M, Shrivastava H, Chockalingam SP, Shivakumar S, Aluru S. EnGRaiN: a supervised ensemble learning method for recovery of large-scale gene regulatory networks. Bioinformatics 2022; 38:1312-1319. [PMID: 34888624 DOI: 10.1093/bioinformatics/btab829] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/15/2021] [Revised: 10/29/2021] [Accepted: 12/03/2021] [Indexed: 01/05/2023]
Abstract
MOTIVATION Reconstruction of genome-scale networks from gene expression data is an actively studied problem. A wide range of methods have been proposed that differ in the types of interactions they uncover, with varying trade-offs between sensitivity and specificity. To leverage the benefits of multiple such methods, ensemble network methods that combine predictions from the resulting networks have been developed, promising results better than or as good as the individual networks. Perhaps owing to the difficulty in obtaining accurate training examples, these ensemble methods have hitherto been unsupervised. RESULTS In this article, we introduce EnGRaiN, the first supervised ensemble learning method to construct gene networks. The supervision for training is provided by small training datasets of true edge connections (positives) and edges known to be absent (negatives) among gene pairs. We demonstrate the effectiveness of EnGRaiN using simulated datasets as well as a curated collection of Arabidopsis thaliana datasets we created from microarray datasets available from public repositories. Compared with unsupervised methods for ensemble network construction, EnGRaiN not only shows better results in terms of receiver operating characteristic and precision-recall characteristics for both real and simulated datasets, but also generates networks that can be mined for elucidating complex biological interactions. AVAILABILITY AND IMPLEMENTATION EnGRaiN software and the datasets used in the study are publicly available at the GitHub repository: https://github.com/AluruLab/EnGRaiN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
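The general shape of a supervised ensemble over edge predictions can be sketched as follows. This is an illustration under stated assumptions, not the EnGRaiN implementation: each candidate edge is described by the confidence scores that several unsupervised inference methods assign to it, a small labelled set of known present and absent edges is used for training, and a plain logistic regression fitted by gradient descent stands in for whatever learner the authors use. The data are synthetic and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_edge_classifier(F, labels, lr=0.1, steps=500):
    """Fit logistic regression on per-edge method scores by gradient descent."""
    n, m = F.shape
    w = np.zeros(m)
    b = 0.0
    for _ in range(steps):
        grad = sigmoid(F @ w + b) - labels  # gradient of the mean log-loss
        w -= lr * (F.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def edge_probability(F, w, b):
    """Ensemble confidence that each candidate edge is present."""
    return sigmoid(F @ w + b)


# Synthetic training set: 50 known-present and 50 known-absent edges,
# each scored by three hypothetical inference methods.
rng = np.random.default_rng(1)
labels = np.repeat([1.0, 0.0], 50)
F = labels[:, None] + 0.5 * rng.normal(size=(100, 3))

w, b = train_edge_classifier(F, labels)
probs = edge_probability(F, w, b)
print("mean confidence, true edges:", probs[labels == 1].mean())
print("mean confidence, absent edges:", probs[labels == 0].mean())
```

In practice the trained model would be applied to the unlabelled candidate edges, and the resulting probabilities ranked to form the ensemble network.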
Affiliation(s)
- Maneesha Aluru
- Department of Biology, Georgia Institute of Technology, Atlanta, GA 30308, USA
| | | | - Sriram P Chockalingam
- Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, GA 30308, USA
| | - Shruti Shivakumar
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30308, USA
| | - Srinivas Aluru
- Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, GA 30308, USA.,Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30308, USA
|
37
|
Swift J, Greenham K, Ecker JR, Coruzzi GM, McClung CR. The biology of time: dynamic responses of cell types to developmental, circadian and environmental cues. The Plant Journal: For Cell and Molecular Biology 2022; 109:764-778. [PMID: 34797944 PMCID: PMC9215356 DOI: 10.1111/tpj.15589] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/19/2021] [Revised: 11/10/2021] [Accepted: 11/15/2021] [Indexed: 05/26/2023]
Abstract
As sessile organisms, plants are finely tuned to respond dynamically to developmental, circadian and environmental cues. Genome-wide studies investigating these types of cues have uncovered the intrinsically different ways they can impact gene expression over time. Recent advances in single-cell sequencing and time-based bioinformatic algorithms are now beginning to reveal the dynamics of these time-based responses within individual cells and plant tissues. Here, we review what these techniques have revealed about the spatiotemporal nature of gene regulation, paying particular attention to the three distinct ways in which plant tissues are time sensitive. (i) First, we discuss how studying plant cell identity can reveal developmental trajectories hidden in pseudotime. (ii) Next, we present evidence that indicates that plant cell types keep their own local time through tissue-specific regulation of the circadian clock. (iii) Finally, we review what determines the speed of environmental signaling responses, and how they can be contingent on developmental and circadian time. By these means, this review sheds light on how these different scales of time-based responses can act with tissue and cell-type specificity to elicit changes in whole plant systems.
Affiliation(s)
- Joseph Swift
- Plant Biology Laboratory, The Salk Institute for Biological Studies, 10010 N Torrey Pines Rd, La Jolla, CA 92037, USA
| | - Kathleen Greenham
- Department of Plant and Microbial Biology, University of Minnesota, St Paul, MN 55108, USA
| | - Joseph R. Ecker
- Plant Biology Laboratory, The Salk Institute for Biological Studies, 10010 N Torrey Pines Rd, La Jolla, CA 92037, USA
- Howard Hughes Medical Institute, The Salk Institute for Biological Studies, 10010 N Torrey Pines Rd, La Jolla, CA 92037, USA
| | - Gloria M. Coruzzi
- Department of Biology, Center for Genomics and Systems Biology, New York University, NY, USA
|
38
|
Qin G, Knijnenburg TA, Gibbs DL, Moser R, Monnat RJ, Kemp CJ, Shmulevich I. A functional module states framework reveals transcriptional states for drug and target prediction. Cell Rep 2022; 38:110269. [PMID: 35045296 DOI: 10.1016/j.celrep.2021.110269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2021] [Revised: 08/24/2021] [Accepted: 12/23/2021] [Indexed: 11/03/2022]
Abstract
Cells are complex systems in which many functions are performed by different genetically defined and encoded functional modules. To systematically understand how these modules respond to drug or genetic perturbations, we develop a functional module states framework. Using this framework, we (1) define the drug-induced transcriptional state space for breast cancer cell lines using large public gene expression datasets and reveal that the transcriptional states are associated with drug concentration and drug targets, (2) identify potential targetable vulnerabilities through integrative analysis of transcriptional states after drug treatment and gene knockdown-associated cancer dependency, and (3) use functional module states to predict transcriptional state-dependent drug sensitivity and build prediction models for drug response. This approach demonstrates a similar prediction performance as approaches using high-dimensional gene expression values, with the added advantage of more clearly revealing biologically relevant transcriptional states and key regulators.
Affiliation(s)
- Guangrong Qin
- Institute for Systems Biology, Seattle, WA 98109, USA.
| | | | - David L Gibbs
- Institute for Systems Biology, Seattle, WA 98109, USA
| | - Russell Moser
- Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Raymond J Monnat
- Department of Laboratory Medicine/Pathology & Genome Sciences, University of Washington, Seattle, WA 98195-7705, USA
|
39
|
Zheng L, Liu Z, Yang Y, Shen HB. Accurate inference of gene regulatory interactions from spatial gene expression with deep contrastive learning. Bioinformatics 2022; 38:746-753. [PMID: 34664632 DOI: 10.1093/bioinformatics/btab718] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/18/2021] [Revised: 09/19/2021] [Accepted: 10/15/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Reverse engineering of gene regulatory networks (GRNs) has long been an attractive research topic in systems biology. Computational prediction of gene regulatory interactions has remained a challenging problem due to the complexity of gene expression and scarce information resources. High-throughput spatial gene expression data, such as in situ hybridization images that exhibit temporal and spatial expression patterns, have provided abundant and reliable information for the inference of GRNs. However, computational tools for analyzing spatial gene expression data are highly underdeveloped. RESULTS In this study, we develop a new method for identifying gene regulatory interactions from gene expression images, called ConGRI. The method features a contrastive learning scheme and a deep Siamese convolutional neural network architecture, which automatically learns high-level feature embeddings for the expression images and then feeds the embeddings to an artificial neural network to determine whether or not the interaction exists. We apply the method to a Drosophila embryogenesis dataset and identify GRNs of eye development and mesoderm development. Experimental results show that ConGRI outperforms previous traditional and deep learning methods by a large margin, achieving accuracies of 76.7% and 68.7% for the GRNs of early eye development and mesoderm development, respectively. It also reveals some master regulators for Drosophila eye development. AVAILABILITY AND IMPLEMENTATION https://github.com/lugimzheng/ConGRI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lujing Zheng
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- SJTU Paris Elite Institute of Technology (SPEIT), Shanghai Jiao Tong University, Shanghai 200240, China
| | - Zhenhuan Liu
- Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Yang Yang
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai 200240, China
| | - Hong-Bin Shen
- Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
- Institute of Image Processing and Pattern Recognition and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai Jiao Tong University, Shanghai 200240, China
|
40
|
Emerging Machine Learning Techniques for Modelling Cellular Complex Systems in Alzheimer's Disease. Advances in Experimental Medicine and Biology 2022; 1338:199-208. [PMID: 34973026 DOI: 10.1007/978-3-030-78775-2_24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
Abstract
We live in the big data era in the biomedical field, where machine learning makes a very important contribution to the interpretation of complex biological processes and diseases, since it has the potential to create predictive models from multidimensional data sets. Part of the application of machine learning in biomedical science is to study and model complex cellular systems such as biological networks. In this context, the study of complex diseases, such as Alzheimer's disease (AD), benefits from established methodologies of network science and machine learning, as they offer algorithmic tools and techniques that can address the limitations and challenges of modeling and studying cellular AD-related networks. In this paper we analyze the opportunities and challenges at the intersection of machine learning and network biology and whether this can affect the biological interpretation and clarification of diseases. Specifically, we focus on gene regulatory network (GRN) techniques, which use omics data and machine learning to construct a network that captures all the information at the molecular level for the disease under study. We record the emerging machine learning techniques that focus on ensemble tree-based techniques in the area of classification and regression. Their potential for unraveling the complexity of model cellular systems in complex diseases, such as AD, offers the opportunity for novel machine learning methodologies to decipher the mechanisms of the various AD processes.
|
41
|
Network Biology and Artificial Intelligence Drive the Understanding of the Multidrug Resistance Phenotype in Cancer. Drug Resist Updat 2022; 60:100811. [DOI: 10.1016/j.drup.2022.100811] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/29/2021] [Revised: 01/22/2022] [Accepted: 01/24/2022] [Indexed: 02/07/2023]
|
42
|
Immanuel SRC, Arrieta-Ortiz ML, Ruiz RA, Pan M, Lopez Garcia de Lomana A, Peterson EJR, Baliga NS. Quantitative prediction of conditional vulnerabilities in regulatory and metabolic networks using PRIME. NPJ Syst Biol Appl 2021; 7:43. [PMID: 34873198 PMCID: PMC8648758 DOI: 10.1038/s41540-021-00205-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/05/2021] [Accepted: 11/02/2021] [Indexed: 12/04/2022]
Abstract
The ability of Mycobacterium tuberculosis (Mtb) to adopt heterogeneous physiological states underlies its success in evading the immune system and tolerating antibiotic killing. Drug tolerant phenotypes are a major reason why the tuberculosis (TB) mortality rate is so high, with over 1.8 million deaths annually. To develop new TB therapeutics that better treat the infection (faster and more completely), a systems-level approach is needed to reveal the complexity of network-based adaptations of Mtb. Here, we report a new predictive model called PRIME (Phenotype of Regulatory influences Integrated with Metabolism and Environment) to uncover environment-specific vulnerabilities within the regulatory and metabolic networks of Mtb. Through extensive performance evaluations using genome-wide fitness screens, we demonstrate that PRIME makes mechanistically accurate predictions of context-specific vulnerabilities within the integrated regulatory and metabolic networks of Mtb, accurately rank-ordering targets for potentiating treatment with frontline drugs.
Affiliation(s)
| | | | - Rene A Ruiz
- Institute for Systems Biology, Seattle, WA, USA
| | - Min Pan
- Institute for Systems Biology, Seattle, WA, USA
| | - Adrian Lopez Garcia de Lomana
- Institute for Systems Biology, Seattle, WA, USA
- Center for Systems Biology, University of Iceland, Reykjavik, Iceland
| | | | - Nitin S Baliga
- Institute for Systems Biology, Seattle, WA, USA.
- Departments of Biology and Microbiology, University of Washington, Seattle, WA, USA.
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA.
- Lawrence Berkeley National Lab, Berkeley, CA, USA.
|
43
|
Niu P, Soto MJ, Yoon BJ, Dougherty ER, Alexander FJ, Blaby I, Qian X. TRIMER: Transcription Regulation Integrated with Metabolic Regulation. iScience 2021; 24:103218. [PMID: 34761179 PMCID: PMC8567008 DOI: 10.1016/j.isci.2021.103218] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/19/2021] [Revised: 08/22/2021] [Accepted: 09/29/2021] [Indexed: 01/01/2023]
Abstract
There has been extensive research in predictive modeling of genome-scale metabolic reaction networks. Living systems involve complex stochastic processes arising from interactions among different biomolecules. For more accurate and robust prediction of target metabolic behavior under different conditions, not only metabolic reactions but also the genetic regulatory relationships involving transcription factors (TFs) affecting these metabolic reactions should be modeled. We have developed a modeling and simulation pipeline enabling the analysis of Transcription Regulation Integrated with Metabolic Regulation: TRIMER. TRIMER utilizes a Bayesian network (BN) inferred from transcriptomes to model the transcription factor regulatory network. TRIMER then infers the probabilities of the gene states relevant to the metabolism of interest, and predicts the metabolic fluxes and their changes that result from the deletion of one or more transcription factors at the genome scale. We demonstrate TRIMER’s applicability to both simulated and experimental data and provide performance comparison with other existing approaches. TRIMER models transcription-regulated metabolism using Bayesian network modeling; TRIMER integrates prior knowledge (regulatory interaction) with data (expression); TRIMER enables metabolic behavior prediction for general knockout strategies; TRIMER includes a simulator as an evaluation platform for similar hybrid models; TRIMER reliably predicts metabolite yields for both simulated and experimental data.
Affiliation(s)
- Puhua Niu
- Texas A&M University, Department of Electrical and Computer Engineering, College Station, TX, 77843, USA
- Maria J. Soto
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Byung-Jun Yoon
- Texas A&M University, Department of Electrical and Computer Engineering, College Station, TX, 77843, USA
- Brookhaven National Laboratory, Computational Science Initiative, Upton, NY, 11973, USA
- Edward R. Dougherty
- Texas A&M University, Department of Electrical and Computer Engineering, College Station, TX, 77843, USA
- Francis J. Alexander
- Brookhaven National Laboratory, Computational Science Initiative, Upton, NY, 11973, USA
- Ian Blaby
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Corresponding author
- Xiaoning Qian
- Texas A&M University, Department of Electrical and Computer Engineering, College Station, TX, 77843, USA
- Brookhaven National Laboratory, Computational Science Initiative, Upton, NY, 11973, USA
- Corresponding author
|
44
|
Saint-André V. Computational biology approaches for mapping transcriptional regulatory networks. Comput Struct Biotechnol J 2021; 19:4884-4895. [PMID: 34522292 PMCID: PMC8426465 DOI: 10.1016/j.csbj.2021.08.028] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/24/2021] [Revised: 08/16/2021] [Accepted: 08/16/2021] [Indexed: 12/13/2022] Open
Abstract
Transcriptional Regulatory Networks (TRNs) are mainly responsible for the cell-type- or cell-state-specific expression of gene sets from the same DNA sequence. However, so far there are no precise maps of TRNs available for each cell-type or cell-state, and no ideal tool to map those networks clearly and in full from biological samples. In this review, major approaches and tools to map TRNs from high-throughput data are presented, depending on the type of methods or data used to infer them, and their advantages and limitations are discussed. After summarizing the main principles defining the topology and structure–function relationships in TRNs, an overview of the extensive work done to map TRNs from bulk transcriptomic data will be presented by type of methodological approach. Most recent modellings of TRNs using other types of molecular data or integrating different data types, including single-cell RNA-sequencing and chromatin information, will then be discussed, before briefly concluding with improvements expected to come in the field.
Affiliation(s)
- Violaine Saint-André
- Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, Paris, France
|
45
|
Ghosh Roy G, Geard N, Verspoor K, He S. PoLoBag: Polynomial Lasso Bagging for signed gene regulatory network inference from expression data. Bioinformatics 2021; 36:5187-5193. [PMID: 32697830 DOI: 10.1093/bioinformatics/btaa651] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/03/2020] [Revised: 06/06/2020] [Accepted: 07/16/2020] [Indexed: 02/01/2023] Open
Abstract
MOTIVATION Inferring gene regulatory networks (GRNs) from expression data is a significant systems biology problem. A useful inference algorithm should not only unveil the global structure of the regulatory mechanisms but also the details of regulatory interactions such as edge direction (from regulator to target) and sign (activation/inhibition). Many popular GRN inference algorithms cannot infer edge signs, and those that can infer signed GRNs cannot simultaneously infer edge directions or network cycles. RESULTS To address these limitations of existing algorithms, we propose Polynomial Lasso Bagging (PoLoBag) for signed GRN inference with both edge directions and network cycles. PoLoBag is an ensemble regression algorithm in a bagging framework where Lasso weights estimated on bootstrap samples are averaged. These bootstrap samples incorporate polynomial features to capture higher-order interactions. Results demonstrate that PoLoBag is consistently more accurate for signed inference than state-of-the-art algorithms on simulated and real-world expression datasets. AVAILABILITY AND IMPLEMENTATION Algorithm and data are freely available at https://github.com/gourabghoshroy/PoLoBag. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
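The general idea named in this abstract (Lasso weights estimated on bootstrap samples with polynomial features, then averaged) can be illustrated with a small sketch. This is not the authors' implementation, which is available at the repository linked above. The function names, the choice of pairwise products as the polynomial features, the penalty `alpha`, and the pure-Python coordinate-descent solver are all assumptions made for illustration.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value towards zero by penalty (the Lasso proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_cd(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def polynomial_features(row):
    """Assumed feature map: linear terms followed by all pairwise products."""
    feats = list(row)
    for a in range(len(row)):
        for b in range(a + 1, len(row)):
            feats.append(row[a] * row[b])
    return feats


def bagged_lasso_weights(X, y, alpha=0.05, n_bootstrap=30, seed=0):
    """Average Lasso weights over bootstrap resamples of the expression samples."""
    rng = random.Random(seed)
    n = len(X)
    P = [polynomial_features(row) for row in X]
    total = [0.0] * len(P[0])
    for _ in range(n_bootstrap):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        w = lasso_cd([P[i] for i in idx], [y[i] for i in idx], alpha)
        total = [t + wi for t, wi in zip(total, w)]
    return [t / n_bootstrap for t in total]
```

Here `X` would hold the expression of candidate regulators (one row per sample) and `y` the expression of one target gene. In this sketch the sign of an averaged weight is read as activation or inhibition, and its magnitude as a rough confidence for the regulator-to-target edge.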
Affiliation(s)
- Gourab Ghosh Roy
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK
- School of Computing and Information Systems, University of Melbourne, Melbourne, VIC 3052, Australia
- Nicholas Geard
- School of Computing and Information Systems, University of Melbourne, Melbourne, VIC 3052, Australia
- Karin Verspoor
- School of Computing and Information Systems, University of Melbourne, Melbourne, VIC 3052, Australia
- Shan He
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK
|
46
|
Computational Phosphorylation Network Reconstruction: An Update on Methods and Resources. Methods Mol Biol 2021. [PMID: 34270057 DOI: 10.1007/978-1-0716-1625-3_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2023]
Abstract
Most proteins undergo some form of modification after translation, and phosphorylation is one of the most relevant and ubiquitous post-translational modifications. The succession of protein phosphorylation and dephosphorylation catalyzed by protein kinase and phosphatase, respectively, constitutes a key mechanism of molecular information flow in cellular systems. The protein interactions of kinases, phosphatases, and their regulatory subunits and substrates are the main part of phosphorylation networks. To elucidate the landscape of phosphorylation events has been a central goal pursued by both experimental and computational approaches. Substrate specificity (e.g., sequence, structure) or the phosphoproteome has been utilized in an array of different statistical learning methods to infer phosphorylation networks. In this chapter, different computational phosphorylation network inference-related methods and resources are summarized and discussed.
|
47
|
Tripathi RK, Wilkins O. Single cell gene regulatory networks in plants: Opportunities for enhancing climate change stress resilience. Plant, Cell & Environment 2021; 44:2006-2017. [PMID: 33522607 PMCID: PMC8359182 DOI: 10.1111/pce.14012] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/01/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 05/05/2023]
Abstract
Global warming poses major challenges for plant survival and agricultural productivity. Thus, efforts to enhance stress resilience in plants are key strategies for protecting food security. Gene regulatory networks (GRNs) are a critical mechanism conferring stress resilience. Until recently, predicting GRNs of the individual cells that make up plants and other multicellular organisms was impeded by aggregate population scale measurements of transcriptome and other genome-scale features. With the advancement of high-throughput single cell RNA-seq and other single cell assays, learning GRNs for individual cells is now possible, in principle. In this article, we report on recent advances in experimental and analytical methodologies for single cell sequencing assays especially as they have been applied to the study of plants. We highlight recent advances and ongoing challenges for scGRN prediction, and finally, we highlight the opportunity to use scGRN discovery for studying and ultimately enhancing abiotic stress resilience in plants.
Affiliation(s)
- Rajiv K. Tripathi
- Department of Biological Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Olivia Wilkins
- Department of Biological Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
|
48
|
Genetic program activity delineates risk, relapse, and therapy responsiveness in multiple myeloma. NPJ Precis Oncol 2021; 5:60. [PMID: 34183722 PMCID: PMC8239045 DOI: 10.1038/s41698-021-00185-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/01/2020] [Accepted: 05/13/2021] [Indexed: 01/19/2023] Open
Abstract
Despite recent advancements in the treatment of multiple myeloma (MM), nearly all patients ultimately relapse and many become refractory to multiple lines of therapies. Therefore, we not only need the ability to predict which patients are at high risk for disease progression but also a means to understand the mechanisms underlying their risk. Here, we report a transcriptional regulatory network (TRN) for MM inferred from cross-sectional multi-omics data from 881 patients that predicts how 124 chromosomal abnormalities and somatic mutations causally perturb 392 transcription regulators of 8549 genes to manifest in distinct clinical phenotypes and outcomes. We identified 141 genetic programs whose activity profiles stratify patients into 25 distinct transcriptional states and proved to be more predictive of outcomes than did mutations. The coherence of these programs and accuracy of our network-based risk prediction was validated in two independent datasets. We observed subtype-specific vulnerabilities to interventions with existing drugs and revealed plausible mechanisms for relapse, including the establishment of an immunosuppressive microenvironment. Investigation of the t(4;14) clinical subtype using the TRN revealed that 16% of these patients exhibit an extreme-risk combination of genetic programs (median progression-free survival of 5 months) that create a distinct phenotype with targetable genes and pathways.
|
49
|
Gupta C, Ramegowda V, Basu S, Pereira A. Using Network-Based Machine Learning to Predict Transcription Factors Involved in Drought Resistance. Front Genet 2021; 12:652189. [PMID: 34249082 PMCID: PMC8264776 DOI: 10.3389/fgene.2021.652189] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/11/2021] [Accepted: 05/13/2021] [Indexed: 12/13/2022] Open
Abstract
Gene regulatory networks underpin stress response pathways in plants. However, parsing these networks to prioritize key genes underlying a particular trait is challenging. Here, we have built the Gene Regulation and Association Network (GRAiN) of rice (Oryza sativa). GRAiN is an interactive query-based web-platform that allows users to study functional relationships between transcription factors (TFs) and genetic modules underlying abiotic-stress responses. We built GRAiN by applying a combination of different network inference algorithms to publicly available gene expression data. We propose a supervised machine learning framework that complements GRAiN in prioritizing genes that regulate stress signal transduction and modulate gene expression under drought conditions. Our framework converts intricate network connectivity patterns of 2160 TFs into a single drought score. We observed that TFs with the highest drought scores define the functional, structural, and evolutionary characteristics of drought resistance in rice. Our approach accurately predicted the function of OsbHLH148 TF, which we validated using in vitro protein-DNA binding assays and mRNA sequencing loss-of-function mutants grown under control and drought stress conditions. Our network and the complementary machine learning strategy lends itself to predicting key regulatory genes underlying other agricultural traits and will assist in the genetic engineering of desirable rice varieties.
Affiliation(s)
- Chirag Gupta
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- Venkategowda Ramegowda
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- Supratim Basu
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- Andy Pereira
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
|
50
|
Wang N, Lefaudeux D, Mazumder A, Li JJ, Hoffmann A. Identifying the combinatorial control of signal-dependent transcription factors. PLoS Comput Biol 2021; 17:e1009095. [PMID: 34166361 PMCID: PMC8263068 DOI: 10.1371/journal.pcbi.1009095] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/06/2020] [Revised: 07/07/2021] [Accepted: 05/18/2021] [Indexed: 12/13/2022] Open
Abstract
The effectiveness of immune responses depends on the precision of stimulus-responsive gene expression programs. Cells specify which genes to express by activating stimulus-specific combinations of stimulus-induced transcription factors (TFs). Their activities are decoded by a gene regulatory strategy (GRS) associated with each response gene. Here, we examined whether the GRSs of target genes may be inferred from stimulus-response (input-output) datasets, which remains an unresolved model-identifiability challenge. We developed a mechanistic modeling framework and computational workflow to determine the identifiability of all possible combinations of synergistic (AND) or non-synergistic (OR) GRSs involving three transcription factors. Considering different sets of perturbations for stimulus-response studies, we found that two thirds of GRSs are easily distinguishable but that substantially more quantitative data is required to distinguish the remaining third. To enhance the accuracy of the inference with timecourse experimental data, we developed an advanced error model that avoids error overestimates by distinguishing between value and temporal error. Incorporating this error model into a Bayesian framework, we show that GRS models can be identified for individual genes by considering multiple datasets. Our analysis rationalizes the allocation of experimental resources by identifying most informative TF stimulation conditions. Applying this computational workflow to experimental data of immune response genes in macrophages, we found that a much greater fraction of genes are combinatorially controlled than previously reported by considering compensation among transcription factors. Specifically, we revealed that a group of known NFκB target genes may also be regulated by IRF3, which is supported by chromatin immuno-precipitation analysis. 
Our study provides a computational workflow for designing and interpreting stimulus-response gene expression studies to identify underlying gene regulatory strategies and further a mechanistic understanding.
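The identifiability question raised in this abstract can be illustrated with a purely Boolean toy, which is much simpler than the mechanistic, quantitative framework the authors describe. The sketch below assumes a hand-picked set of AND/OR gates over three transcription factors `a`, `b`, `c` and groups together the gates that give identical on/off responses under a chosen set of stimulation conditions. The gate list, the names, and the binary readout are illustrative assumptions, not the paper's model.

```python
from itertools import product

# Hypothetical Boolean stand-ins for gene regulatory strategies (GRSs).
GRS = {
    "a": lambda a, b, c: a,
    "b": lambda a, b, c: b,
    "c": lambda a, b, c: c,
    "a AND b": lambda a, b, c: a and b,
    "a AND c": lambda a, b, c: a and c,
    "b AND c": lambda a, b, c: b and c,
    "a OR b": lambda a, b, c: a or b,
    "a OR c": lambda a, b, c: a or c,
    "b OR c": lambda a, b, c: b or c,
    "a AND b AND c": lambda a, b, c: a and b and c,
    "a OR b OR c": lambda a, b, c: a or b or c,
    "(a AND b) OR c": lambda a, b, c: (a and b) or c,
    "(a AND c) OR b": lambda a, b, c: (a and c) or b,
    "(b AND c) OR a": lambda a, b, c: (b and c) or a,
    "(a OR b) AND c": lambda a, b, c: (a or b) and c,
    "(a OR c) AND b": lambda a, b, c: (a or c) and b,
    "(b OR c) AND a": lambda a, b, c: (b or c) and a,
}


def group_by_response(conditions):
    """Group gates whose outputs coincide on every given (a, b, c) condition."""
    groups = {}
    for name, gate in GRS.items():
        signature = tuple(bool(gate(*cond)) for cond in conditions)
        groups.setdefault(signature, []).append(name)
    return list(groups.values())


# Every on/off combination of the three TFs.
all_conditions = list(product([False, True], repeat=3))

# Stimuli that each activate exactly one TF.
single_tf_conditions = [
    (True, False, False),
    (False, True, False),
    (False, False, True),
]
```

Calling `group_by_response(all_conditions)` separates every gate in this toy set, whereas `group_by_response(single_tf_conditions)` leaves all gates that require two active TFs in one indistinguishable group. This mirrors, in a very reduced form, why the choice of stimulation conditions matters for telling strategies apart.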
Affiliation(s)
- Ning Wang
- Institute for Quantitative and Computational Biosciences (QCBio), University of California, Los Angeles, California, United States of America
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, California, United States of America
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Diane Lefaudeux
- Institute for Quantitative and Computational Biosciences (QCBio), University of California, Los Angeles, California, United States of America
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, California, United States of America
- Anup Mazumder
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, California, United States of America
- Jingyi Jessica Li
- Institute for Quantitative and Computational Biosciences (QCBio), University of California, Los Angeles, California, United States of America
- Department of Statistics, University of California, Los Angeles, California, United States of America
- Alexander Hoffmann
- Institute for Quantitative and Computational Biosciences (QCBio), University of California, Los Angeles, California, United States of America
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, California, United States of America
|