1
Musilova J, Vafek Z, Puniya BL, Zimmer R, Helikar T, Sedlar K. Augusta: From RNA-Seq to gene regulatory networks and Boolean models. Comput Struct Biotechnol J 2024; 23:783-790. [PMID: 38312198 PMCID: PMC10837063 DOI: 10.1016/j.csbj.2024.01.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 01/17/2024] [Accepted: 01/19/2024] [Indexed: 02/06/2024]
Abstract
Computational models of gene regulation help us understand regulatory mechanisms and are used extensively in a wide range of areas, e.g., biotechnology or medicine, with significant benefits. Unfortunately, only a few computational gene regulatory models of whole genomes allow static and dynamic analysis, owing to the lack of sophisticated tools for their reconstruction. Here, we describe Augusta, an open-source Python package for Gene Regulatory Network (GRN) and Boolean Network (BN) inference from high-throughput gene expression data. Augusta can reconstruct genome-wide models suitable for static and dynamic analyses. Augusta uses a unique approach in which the first estimate of a GRN inferred from expression data is further refined by predicting transcription factor binding motifs in the promoters of regulated genes and by incorporating verified interactions obtained from databases. Moreover, the refined GRN is transformed into a draft BN by searching the curated model database and assigning logical rules to the incoming edges of target genes; because the model is provided in the SBML file format, these rules can be further edited manually. The approach is applicable even if information about the organism under study is not available in the databases, which is typically the case for non-model organisms, including most microbes. Augusta can be operated from the command line and is thus easy to use for automated prediction of models for various genomes. The Augusta package is freely available at github.com/JanaMus/Augusta. Documentation and tutorials are available at augusta.readthedocs.io.
Affiliation(s)
- Jana Musilova
- Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno 61600, Czech Republic
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln 68588, NE, USA
- Zdenek Vafek
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln 68588, NE, USA
- Institute of Forensic Engineering, Brno University of Technology, Brno 61200, Czech Republic
- Bhanwar Lal Puniya
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln 68588, NE, USA
- Ralf Zimmer
- Department of Informatics, Ludwig-Maximilians-Universität München, Munich 80539, Germany
- Tomas Helikar
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln 68588, NE, USA
- Karel Sedlar
- Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno 61600, Czech Republic
- Department of Informatics, Ludwig-Maximilians-Universität München, Munich 80539, Germany
2
Wang Y, Zheng P, Cheng YC, Wang Z, Aravkin A. Gene regulatory network inference with covariance dynamics. Math Biosci 2024:109284. [PMID: 39168402 DOI: 10.1016/j.mbs.2024.109284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 06/25/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024]
Abstract
Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does not fully utilize the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix, and solve this dynamics as an optimization problem to determine the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods using synthetic data and experimental data. Our results demonstrate that WENDY performs well across different data sets.
Affiliation(s)
- Yue Wang
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, 10027, NY, USA
- Peng Zheng
- Institute for Health Metrics and Evaluation, Seattle, 98195, WA, USA; Department of Health Metrics Sciences, University of Washington, Seattle, 98195, WA, USA
- Yu-Chen Cheng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA; Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, 02138, MA, USA
- Zikun Wang
- Laboratory of Genetics, The Rockefeller University, New York, 10065, NY, USA
- Aleksandr Aravkin
- Department of Applied Mathematics, University of Washington, Seattle, 98195, WA, USA
3
Martínez-Enguita D, Dwivedi SK, Jörnsten R, Gustafsson M. NCAE: data-driven representations using a deep network-coherent DNA methylation autoencoder identify robust disease and risk factor signatures. Brief Bioinform 2023; 24:bbad293. [PMID: 37587790 PMCID: PMC10516364 DOI: 10.1093/bib/bbad293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 07/25/2023] [Accepted: 07/29/2023] [Indexed: 08/18/2023]
Abstract
Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature discovery workflow for DNA methylation analysis utilizing network-coherent autoencoders (NCAEs) with biologically relevant latent embeddings. First, we explored the architecture space of autoencoders trained on a large-scale pan-tissue compendium (n = 75 272) of human epigenome-wide association studies. We observed the emergence of co-localized patterns in the deep autoencoder latent space representations that corresponded to biological network modules. We determined the NCAE configuration with the strongest co-localization and centrality signals in the human protein interactome. Leveraging the NCAE embeddings, we then trained interpretable deep neural networks for risk factor (aging, smoking) and disease (systemic lupus erythematosus) prediction and classification tasks. Remarkably, our NCAE embedding-based models outperformed existing predictors, revealing novel DNA methylation signatures enriched in gene sets and pathways associated with the studied condition in each case. Our data-driven biomarker discovery workflow provides a generally applicable pipeline to capture relevant risk factor and disease information. By surpassing the limitations of knowledge-driven methods, our approach enhances the understanding of complex epigenetic processes, facilitating the development of more effective diagnostic and therapeutic strategies.
Affiliation(s)
- David Martínez-Enguita
- Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
- Sanjiv K Dwivedi
- Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
- Rebecka Jörnsten
- Department of Mathematical Sciences, Chalmers University of Technology, Sweden
- Mika Gustafsson
- Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
4
Choi Y, Li R, Quon G. siVAE: interpretable deep generative models for single-cell transcriptomes. Genome Biol 2023; 24:29. [PMID: 36803416 PMCID: PMC9940350 DOI: 10.1186/s13059-023-02850-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/08/2022] [Accepted: 01/06/2023] [Indexed: 02/22/2023]
Abstract
Neural networks such as variational autoencoders (VAE) perform dimensionality reduction for the visualization and analysis of genomic data, but are limited in their interpretability: it is unknown which data features are represented by each embedding dimension. We present siVAE, a VAE that is interpretable by design, thereby enhancing downstream analysis tasks. Through interpretation, siVAE also identifies gene modules and hubs without explicit gene network inference. We use siVAE to identify gene modules whose connectivity is associated with diverse phenotypes such as iPSC neuronal differentiation efficiency and dementia, showcasing the wide applicability of interpretable generative models for genomic data analysis.
Affiliation(s)
- Yongin Choi
- Graduate Group in Biomedical Engineering, University of California, Davis, Davis, CA, USA
- Genome Center, University of California, Davis, Davis, CA, USA
- Ruoxin Li
- Genome Center, University of California, Davis, Davis, CA, USA
- Graduate Group in Biostatistics, University of California, Davis, Davis, CA, USA
- Gerald Quon
- Graduate Group in Biomedical Engineering, University of California, Davis, Davis, CA, USA.
- Genome Center, University of California, Davis, Davis, CA, USA.
- Department of Molecular and Cellular Biology, University of California, Davis, Davis, CA, USA.
5
Cerutti C, Zhang L, Tribollet V, Shi JR, Brillet R, Gillet B, Hughes S, Forcet C, Shi TL, Vanacker JM. Computational identification of new potential transcriptional partners of ERRα in breast cancer cells: specific partners for specific targets. Sci Rep 2022; 12:3826. [PMID: 35264626 PMCID: PMC8907200 DOI: 10.1038/s41598-022-07744-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/27/2021] [Accepted: 02/17/2022] [Indexed: 12/26/2022]
Abstract
Estrogen-related receptors (ERRs) are orphan members of the nuclear receptor superfamily that act as transcription factors (TFs). In contrast to classical nuclear receptors, the activities of the ERRs are not controlled by a natural ligand. Regulation of their activities thus relies on the availability of transcriptional co-regulators. In this paper, we focus on ERRα, whose involvement in cancer progression has been broadly demonstrated. We propose a new approach to identify potential co-activators, starting from previously identified ERRα-activated genes in a breast cancer (BC) cell line. Taking mRNA expression in two sets of human BC cells as the major endpoint, we used sparse partial least squares modeling to uncover new transcriptional regulators associated with ERRα. Among them, DDX21, MYBBP1A, NFKB1, and SETD7 are functionally relevant in MDA-MB-231 cells, specifically activating the expression of subsets of ERRα-activated genes. We studied SETD7 (SET7) in more detail and showed its co-localization with ERRα and its ERRα-dependent transcriptional and phenotypic effects. Our results thus demonstrate the ability of a modeling approach to identify new transcriptional partners from gene expression. Finally, experimental results show that ERRα cooperates with distinct co-regulators to control the expression of distinct sets of target genes, thus reinforcing the combinatorial specificity of transcription.
Affiliation(s)
- Catherine Cerutti
- Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Ling Zhang
- Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Violaine Tribollet
- Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Jing-Ru Shi
- Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Riwan Brillet
- Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Benjamin Gillet
- Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Sandrine Hughes
- Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Christelle Forcet
- Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Tie-Liu Shi
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Jean-Marc Vanacker
- Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France.
6
Suriyalaksh M, Raimondi C, Mains A, Segonds-Pichon A, Mukhtar S, Murdoch S, Aldunate R, Krueger F, Guimerà R, Andrews S, Sales-Pardo M, Casanueva O. Gene regulatory network inference in long-lived C. elegans reveals modular properties that are predictive of novel aging genes. iScience 2022; 25:103663. [PMID: 35036864 PMCID: PMC8753122 DOI: 10.1016/j.isci.2021.103663] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2021] [Revised: 09/09/2021] [Accepted: 12/15/2021] [Indexed: 11/24/2022]
Abstract
We design a “wisdom-of-the-crowds” GRN inference pipeline and couple it to complex network analysis to understand the organizational principles governing gene regulation in long-lived glp-1/Notch Caenorhabditis elegans. The GRN has three layers (input, core, and output) and is topologically equivalent to the bow-tie/hourglass structures prevalent among metabolic networks. To assess the functional importance of the structural layers, we screened 80% of regulators and discovered 50 new aging genes, 86% with human orthologues. Genes essential for longevity, including ones involved in insulin-like signaling (ILS), are at the core, indicating that the GRN's structure is predictive of functionality. We used in vivo reporters and a novel functional network covering 5,497 genetic interactions to make mechanistic predictions. We used genetic epistasis to test some of these predictions, uncovering a novel transcriptional regulator, sup-37, that works alongside DAF-16/FOXO. We present a framework with predictive power that can accelerate discovery in C. elegans and potentially humans.
Highlights
- Gene regulatory inference provides a global network of long-lived animals
- The large-scale topology of the network has an hourglass structure
- Membership in the core of the hourglass is a good predictor of functionality
- Discovered 50 novel aging genes, including sup-37, a DAF-16-dependent gene
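The abstract does not spell out how the individual inference methods are combined into one network. A common "wisdom-of-the-crowds" scheme, used for example in the DREAM5 network-inference challenge, ranks each method's edges and then averages the ranks. The Python sketch below assumes that scheme and uses made-up edge scores, so the pipeline in the paper may combine methods differently.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def rank_edges(scores: Dict[Edge, float]) -> Dict[Edge, int]:
    """Rank one method's edges by descending confidence (1 = most confident)."""
    ordered = sorted(scores, key=lambda e: (-scores[e], e))
    return {edge: i + 1 for i, edge in enumerate(ordered)}


def community_network(predictions: List[Dict[Edge, float]]) -> List[Edge]:
    """Average each edge's rank across methods and return edges best-first.

    An edge that a method did not score receives that method's worst rank + 1.
    Ties are broken alphabetically so the output is deterministic.
    """
    all_edges = set().union(*predictions)
    total_rank = {edge: 0.0 for edge in all_edges}
    for scores in predictions:
        ranks = rank_edges(scores)
        missing = len(ranks) + 1
        for edge in all_edges:
            total_rank[edge] += ranks.get(edge, missing)
    return sorted(all_edges, key=lambda e: (total_rank[e] / len(predictions), e))


# Hypothetical confidence scores from three inference methods.
method_1 = {("tf1", "g1"): 0.9, ("tf1", "g2"): 0.5, ("tf2", "g1"): 0.1}
method_2 = {("tf1", "g1"): 0.2, ("tf1", "g2"): 0.8, ("tf2", "g1"): 0.6}
method_3 = {("tf1", "g1"): 0.7, ("tf1", "g2"): 0.3, ("tf2", "g1"): 0.6}
consensus = community_network([method_1, method_2, method_3])
```

The three methods disagree about which edge ranks first, yet the averaged ranks still produce one consensus ordering. That ordering can then be thresholded to keep the top edges. Rank averaging also avoids comparing raw scores whose scales differ between methods.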
Affiliation(s)
- Abraham Mains
- Babraham Institute, Babraham, Cambridge CB22 3AT, UK
- Rebeca Aldunate
- Escuela de Biotecnología, Facultad de Ciencias, Universidad Santo Tomas, Santiago, Chile
- Felix Krueger
- Babraham Institute, Babraham, Cambridge CB22 3AT, UK
- Roger Guimerà
- ICREA, Barcelona 08010, Catalonia, Spain; Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
- Simon Andrews
- Babraham Institute, Babraham, Cambridge CB22 3AT, UK
- Marta Sales-Pardo
- Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
7
Conard AM, Goodman N, Hu Y, Perrimon N, Singh R, Lawrence C, Larschan E. TIMEOR: a web-based tool to uncover temporal regulatory mechanisms from multi-omics data. Nucleic Acids Res 2021; 49:W641-W653. [PMID: 34125906 PMCID: PMC8262710 DOI: 10.1093/nar/gkab384] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/05/2021] [Revised: 04/13/2021] [Accepted: 04/28/2021] [Indexed: 01/17/2023]
Abstract
Uncovering how transcription factors regulate their targets at the DNA, RNA and protein levels over time is critical for defining gene regulatory networks (GRNs) and assigning mechanisms in normal and diseased states. RNA-seq is a standard method for measuring gene regulation using an established set of analysis stages. However, none of the currently available pipeline methods for interpreting ordered genomic data (in time or space) use time-series models to assign cause-and-effect relationships within GRNs, adapt to diverse experimental designs, or enable user interpretation through a web-based platform. Furthermore, methods integrating ordered RNA-seq data with protein–DNA binding data to distinguish direct from indirect interactions are urgently needed. We present TIMEOR (Trajectory Inference and Mechanism Exploration with Omics data in R), the first web-based and adaptive time-series multi-omics pipeline method that infers the relationship between gene regulatory events across time. TIMEOR addresses the critical need for methods to determine causal regulatory mechanism networks by leveraging time-series RNA-seq, motif analysis, protein–DNA binding data, and protein-protein interaction networks. TIMEOR’s user-catered approach helps non-coders generate new hypotheses and validate known mechanisms. We used TIMEOR to identify a novel link between insulin stimulation and the circadian rhythm cycle. TIMEOR is available at https://github.com/ashleymaeconard/TIMEOR.git and http://timeor.brown.edu.
Affiliation(s)
- Ashley Mae Conard
- Computer Science Department, Brown University, Providence, RI 02912, USA; Center for Computational and Molecular Biology, Brown University, Providence, RI 02912, USA
- Nathaniel Goodman
- Computer Science Department, Brown University, Providence, RI 02912, USA
- Yanhui Hu
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Director of Bioinformatics DRSC/TRiP Functional Genomics Resources, Harvard Medical School, Boston, MA 02115, USA
- Norbert Perrimon
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115, USA
- Ritambhara Singh
- Computer Science Department, Brown University, Providence, RI 02912, USA; Center for Computational and Molecular Biology, Brown University, Providence, RI 02912, USA
- Charles Lawrence
- Center for Computational and Molecular Biology, Brown University, Providence, RI 02912, USA; Applied Math Department, Brown University, Providence, RI 02912, USA
- Erica Larschan
- Center for Computational and Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, RI 02912, USA
8
Grimes T, Datta S. SeqNet: An R Package for Generating Gene-Gene Networks and Simulating RNA-Seq Data. J Stat Softw 2021; 98:10.18637/jss.v098.i12. [PMID: 34321962 PMCID: PMC8315007 DOI: 10.18637/jss.v098.i12] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022]
Abstract
Gene expression data provide an abundant resource for inferring connections in gene regulatory networks. While methodologies developed for this task have shown success, a challenge remains in comparing the performance among methods. Gold-standard datasets are scarce and limited in use. And while tools for simulating expression data are available, they are not designed to resemble the data obtained from RNA-seq experiments. SeqNet is an R package that provides tools for generating a rich variety of gene network structures and simulating RNA-seq data from them. This produces in silico RNA-seq data for benchmarking and assessing gene network inference methods. The package is available on CRAN and on GitHub at https://github.com/tgrimes/SeqNet.
Affiliation(s)
- Tyler Grimes
- University of Florida, Department of Biostatistics
9
Parkinson's Disease Master Regulators on Substantia Nigra and Frontal Cortex and Their Use for Drug Repositioning. Mol Neurobiol 2020; 58:1517-1534. [PMID: 33211252 DOI: 10.1007/s12035-020-02203-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/20/2020] [Accepted: 11/03/2020] [Indexed: 12/14/2022]
Abstract
Parkinson's disease (PD) is among the most prevalent neurodegenerative diseases. Available evidence supports the view of PD as a complex disease that results from interactions between genetic and environmental factors. Given the diagnostic and therapeutic challenges and the elusive etiology of PD, alternative methodological approaches for elucidating the pathophysiological mechanisms of the disease and proposing novel potential therapeutic interventions have become increasingly necessary. In the present study, we first reconstructed the transcriptional regulatory networks (TN), centered on transcription factors (TF), of two brain regions affected in PD, the substantia nigra pars compacta (SNc) and the frontal cortex (FCtx). Then, we used case-control study data from these regions to identify TFs working as master regulators (MR) of the disease, based on region-specific TNs. Twenty-nine regulatory units enriched with differentially expressed genes were identified for the SNc, and twenty for the FCtx, all of which were considered MR candidates for PD. Three consensus MR candidates were found for SNc and FCtx, namely ATF2, SLC30A9, and ZFP69B. To search for novel potential therapeutic interventions, we used these consensus MR candidate signatures as input to the Connectivity Map (CMap), a computational drug repositioning web tool. This analysis identified four drugs that simultaneously reverse the expression patterns of all three consensus MR candidates (benperidol, harmaline, tubocurarine chloride, and vorinostat), which are thus suggested as novel potential PD therapeutic interventions.
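The abstract does not name the statistic used to call a regulatory unit "enriched with differentially expressed genes". A one-sided hypergeometric test on the overlap between a TF's regulon and the DEG list is a common choice for this step. The Python sketch below assumes that test and uses hypothetical gene names.

```python
from math import comb
from typing import Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def regulon_enrichment(regulon: Set[str], degs: Set[str], universe: Set[str]) -> float:
    """One-sided p-value that a regulon contains more DEGs than expected by chance.

    Both sets are first restricted to the universe of genes actually measured.
    """
    regulon = regulon & universe
    degs = degs & universe
    overlap = len(regulon & degs)
    return hypergeom_sf(overlap, len(universe), len(degs), len(regulon))


# Hypothetical example: 10 measured genes, 5 DEGs, and a regulon of exactly those 5.
universe = {f"g{i}" for i in range(10)}
degs = {f"g{i}" for i in range(5)}
p_value = regulon_enrichment(degs, degs, universe)
```

Here the regulon captures all five DEGs, so the p-value equals 1/C(10, 5) = 1/252. In a real analysis, one p-value per TF would be corrected for multiple testing before the TF is called an MR candidate.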
10
A review of methods for the reconstruction and analysis of integrated genome-scale models of metabolism and regulation. Biochem Soc Trans 2020; 48:1889-1903. [DOI: 10.1042/bst20190840] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/08/2020] [Revised: 07/16/2020] [Accepted: 08/21/2020] [Indexed: 02/07/2023]
Abstract
The current survey aims to describe the main methodologies for extending the reconstruction and analysis of genome-scale metabolic models and phenotype simulation with Flux Balance Analysis mathematical frameworks, via the integration of Transcriptional Regulatory Networks and/or gene expression data. Although the surveyed methods are aimed at improving phenotype simulations obtained from these models, the perspective of reconstructing integrated genome-scale models of metabolism and gene expression for diverse prokaryotes is still an open challenge.
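Regulatory FBA (rFBA) is a classic way to couple a transcriptional regulatory network to a genome-scale metabolic model. It switches reactions off when their gene-protein-reaction (GPR) rules evaluate to false under the current regulatory state. The Python sketch below shows only that coupling step and makes several assumptions. GPR rules are written as nested tuples rather than parsed from strings. Genes with no known regulatory state default to "on". The flux-balance optimization itself is omitted.

```python
from typing import Dict, Tuple, Union

# A GPR rule is a gene ID or a nested ("and" | "or", rule, rule, ...) tuple.
Rule = Union[str, Tuple]
Bounds = Dict[str, Tuple[float, float]]


def gpr_active(rule: Rule, gene_on: Dict[str, bool]) -> bool:
    """Evaluate a GPR rule; genes missing from gene_on are assumed to be on."""
    if isinstance(rule, str):
        return gene_on.get(rule, True)
    op, *args = rule
    values = [gpr_active(arg, gene_on) for arg in args]
    if op == "and":
        return all(values)
    if op == "or":
        return any(values)
    raise ValueError(f"unknown operator: {op!r}")


def constrain_bounds(bounds: Bounds, gprs: Dict[str, Rule], gene_on: Dict[str, bool]) -> Bounds:
    """Return a copy of the flux bounds with inactive reactions closed to (0, 0)."""
    new_bounds = dict(bounds)
    for rxn, rule in gprs.items():
        if rxn in new_bounds and not gpr_active(rule, gene_on):
            new_bounds[rxn] = (0.0, 0.0)
    return new_bounds


# Hypothetical reactions, GPR rules, and a regulatory state from a TRN simulation.
bounds = {"R1": (0.0, 10.0), "R2": (-5.0, 5.0), "R3": (0.0, 1000.0)}
gprs = {"R1": ("and", "g1", ("or", "g2", "g3")), "R2": "g4", "R3": ("or", "g5", "g6")}
state = {"g1": True, "g2": False, "g3": True, "g4": False, "g5": False, "g6": False}
constrained = constrain_bounds(bounds, gprs, state)
```

In an rFBA-style loop, the regulatory state would be updated at each time step and the constrained bounds passed to an FBA solver. The original bounds are copied rather than mutated, so each step starts from the unregulated model.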
11
Manzoni C, Lewis PA, Ferrari R. Network Analysis for Complex Neurodegenerative Diseases. Curr Genet Med Rep 2020. [DOI: 10.1007/s40142-020-00181-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/14/2022]
Abstract
Purpose of Review
Biomedicine is witnessing a paradigm shift in the way complex disorders are investigated. In particular, the need for big data interpretation has led to the development of pipelines that require the cooperation of different fields of expertise, including medicine, functional biology, informatics, mathematics and systems biology. This review sits at the crossroads of different disciplines and surveys recent developments in the use of graph theory (in the form of network analysis) to interpret large and diverse datasets in the context of complex neurodegenerative diseases. It is aimed at a professional audience with different backgrounds.
Recent Findings
Biomedicine has entered the era of big data, and this is actively changing the way we approach and perform research. The increase in size and power of biomedical studies has led to the establishment of multi-centre, international working groups coordinating open access platforms for data generation, storage and analysis. Particularly, pipelines for data interpretation are under development, and network analysis is gaining momentum since it represents a versatile approach to study complex systems made of interconnected multiple players.
Summary
We will describe the era of big data in biomedicine and survey the major freely accessible multi-omics datasets. We will then introduce the principles of graph theory and provide examples of network analysis applied to the interpretation of complex neurodegenerative disorders.
12
Mercatelli D, Scalambra L, Triboli L, Ray F, Giorgi FM. Gene regulatory network inference resources: A practical overview. Biochim Biophys Acta Gene Regul Mech 2019; 1863:194430. [PMID: 31678629 DOI: 10.1016/j.bbagrm.2019.194430] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 05/31/2019] [Revised: 09/06/2019] [Accepted: 09/09/2019] [Indexed: 02/08/2023]
Abstract
Transcriptional regulation is a fundamental molecular mechanism involved in almost every aspect of life, from homeostasis to development, from metabolism to behavior, from reaction to stimuli to disease progression. In recent years, the concept of Gene Regulatory Networks (GRNs) has grown popular as an effective applied biology approach for describing the complex and highly dynamic set of transcriptional interactions, due to its easy-to-interpret features. Since cataloguing, predicting and understanding every GRN connection in all species and cellular contexts remains a great challenge for biology, researchers have developed numerous tools and methods to infer regulatory processes. In this review, we catalogue these methods in six major areas, based on the dominant underlying information leveraged to infer GRNs: Coexpression, Sequence Motifs, Chromatin Immunoprecipitation (ChIP), Orthology, Literature and Protein-Protein Interaction (PPI) specifically focused on transcriptional complexes. The methods described here cover a wide range of user-friendliness: from web tools that require no prior computational expertise to command line programs and algorithms for large scale GRN inferences. Each method for GRN inference described herein effectively illustrates a type of transcriptional relationship, with many methods being complementary to others. While a truly holistic approach for inferring and displaying GRNs remains one of the greatest challenges in the field of systems biology, we believe that the integration of multiple methods described herein provides an effective means with which experimental and computational biologists alike may obtain the most complete pictures of transcriptional relationships. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Daniele Mercatelli
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Laura Scalambra
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Luca Triboli
- Centre for Integrative Biology (CIBIO), University of Trento, Italy
- Forest Ray
- Department of Systems Biology, Columbia University Medical Center, New York, NY, United States
- Federico M Giorgi
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.
13
Schubert M, Colomé-Tatché M, Foijer F. Gene networks in cancer are biased by aneuploidies and sample impurities. Biochim Biophys Acta Gene Regul Mech 2019; 1863:194444. [PMID: 31654805 DOI: 10.1016/j.bbagrm.2019.194444] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/31/2019] [Revised: 09/05/2019] [Accepted: 10/14/2019] [Indexed: 12/14/2022]
Abstract
Gene regulatory network inference is a standard technique for obtaining structured regulatory information from, for instance, gene expression measurements. Methods performing this task have been extensively evaluated on synthetic and, to a lesser extent, real data sets. In contrast to these test evaluations, applications to gene expression data of human cancers are often limited by fewer samples and more potential regulatory links, and are biased by copy number aberrations as well as cell mixtures and sample impurities. Here, we take networks inferred from TCGA cohorts as an example to show that (1) transcription factor annotations are essential to obtain reliable networks, and (2) even for state-of-the-art methods, we expect that between 20 and 80% of edges are caused by copy number changes and cell mixtures rather than transcription factor regulation.
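The copy-number bias described here is easy to reproduce in a toy setting. The Python sketch below uses hypothetical values and assumes both genes lie in the same amplified region, so a single copy-number vector drives both. Two genes with unrelated noise then look strongly co-expressed until copy number is regressed out of each. This only illustrates the bias and one simple correction; it is not the analysis performed in the paper.

```python
from statistics import fmean
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residualize(y: Sequence[float], covariate: Sequence[float]) -> List[float]:
    """Residuals of the ordinary least-squares fit y ~ intercept + slope * covariate."""
    mc, my = fmean(covariate), fmean(y)
    scc = sum((c - mc) ** 2 for c in covariate)
    slope = sum((c - mc) * (v - my) for c, v in zip(covariate, y)) / scc
    return [v - my - slope * (c - mc) for c, v in zip(covariate, y)]


# Hypothetical samples: both genes scale with one shared copy-number profile.
copy_number = [1, 2, 3, 4, 5, 6]
gene_a = [2 * c + e for c, e in zip(copy_number, [0.1, -0.1, 0.1, -0.1, 0.1, -0.1])]
gene_b = [3 * c + e for c, e in zip(copy_number, [0.1, 0.1, -0.1, -0.1, 0.1, 0.1])]

raw_r = pearson(gene_a, gene_b)
adjusted_r = pearson(residualize(gene_a, copy_number), residualize(gene_b, copy_number))
```

The raw correlation is above 0.99, which a co-expression method would report as a strong edge. After copy number is regressed out, the correlation falls to essentially zero, because the two genes share no signal beyond their common dosage.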
Affiliation(s)
- Michael Schubert
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, the Netherlands; Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany.
- Maria Colomé-Tatché
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, the Netherlands; Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany; TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Floris Foijer
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, the Netherlands.
14
Kizhakkethil Youseph AS, Chetty M, Karmakar G. Reverse engineering genetic networks using nonlinear saturation kinetics. Biosystems 2019; 182:30-41. [PMID: 31185246 DOI: 10.1016/j.biosystems.2019.103977] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/16/2018] [Revised: 04/25/2019] [Accepted: 05/27/2019] [Indexed: 01/01/2023]
Abstract
A gene regulatory network (GRN) represents a set of genes along with their regulatory interactions. Cellular behavior is driven by genetic-level interactions. The dynamics of such systems show nonlinear saturation kinetics, which can best be modeled by Michaelis-Menten (MM) and Hill equations. Although the MM equation is widely used for modeling biochemical processes, it has rarely been applied to reverse engineering GRNs. In this paper, we develop a complete framework for a novel GRN inference model using MM kinetics. A set of coupled equations is first proposed for modeling GRNs. In the coupled model, the Michaelis-Menten constant associated with regulation by a gene is the same regardless of which gene is being regulated. Parameter estimation for the proposed model is carried out using an evolutionary optimization method, namely trigonometric differential evolution (TDE). Subsequently, the model is further improved, and the regulations of different genes by a given gene are made distinct by allowing a different Michaelis-Menten constant for each regulation. Apart from making the model more biologically relevant, this improvement results in a decoupled GRN model with fast estimation of model parameters. Further, to enhance exploitation of the search, we propose a local search algorithm based on hill-climbing heuristics. A novel mutation operation is also proposed to avoid population stagnation and premature convergence. Real-life benchmark data sets generated in vivo are used to validate the proposed model. We also analyze realistic in silico datasets generated using GeneNetWeaver. A comparison of the proposed model's performance with that of other existing methods shows its potential.
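The abstract describes the model only in words. The Python sketch below assumes one generic form consistent with that description. Gene i is driven by saturating Michaelis-Menten terms with a separate constant K_ij for each regulation (the decoupled variant), plus basal production and first-order decay: dx_i/dt = b_i + sum over j of w_ij * x_j / (K_ij + x_j) - lambda_i * x_i. Because each equation then involves only gene i's own parameters, they can be fitted one gene at a time, which is what makes estimation fast. The paper's exact equations, its sign convention for repression, and its TDE-based fitting may differ; here the system is simply integrated with forward Euler.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def mm_rhs(x: Sequence[float], w: Matrix, K: Matrix,
           basal: Sequence[float], decay: Sequence[float]) -> List[float]:
    """dx_i/dt = b_i + sum_j w_ij * x_j / (K_ij + x_j) - decay_i * x_i.

    Assumed convention: w_ij > 0 means j activates i, w_ij < 0 means j represses i,
    and w_ij == 0 means there is no edge from j to i.
    """
    n = len(x)
    return [
        basal[i]
        + sum(w[i][j] * x[j] / (K[i][j] + x[j]) for j in range(n) if w[i][j] != 0.0)
        - decay[i] * x[i]
        for i in range(n)
    ]


def simulate(x0: Sequence[float], w: Matrix, K: Matrix, basal: Sequence[float],
             decay: Sequence[float], dt: float = 0.01, steps: int = 2000) -> List[List[float]]:
    """Forward-Euler integration; expression levels are clipped at zero."""
    x = list(x0)
    trajectory = [x[:]]
    for _ in range(steps):
        dx = mm_rhs(x, w, K, basal, decay)
        x = [max(0.0, xi + dt * di) for xi, di in zip(x, dx)]
        trajectory.append(x[:])
    return trajectory


# Hypothetical two-gene cascade: gene 1 is constitutive, gene 1 activates gene 2.
w = [[0.0, 0.0], [2.0, 0.0]]
K = [[1.0, 1.0], [1.0, 1.0]]
traj = simulate([0.0, 0.0], w, K, basal=[1.0, 0.0], decay=[1.0, 1.0])
```

Gene 1 has only basal production and settles at b_1 / lambda_1 = 1. Gene 2 is activated by gene 1 and settles at 2 * 1 / (1 + 1) = 1. An inference method would run this logic in reverse, searching for w, K, basal and decay values whose simulated trajectories match measured time-series data.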
Affiliation(s)
- Madhu Chetty
- School of Science, Engineering and Information Technology, Federation University Australia, Gippsland 3842, Australia
- Gour Karmakar
- School of Science, Engineering and Information Technology, Federation University Australia, Gippsland 3842, Australia