1
Unger Avila P, Padvitski T, Leote AC, Chen H, Saez-Rodriguez J, Kann M, Beyer A. Gene regulatory networks in disease and ageing. Nat Rev Nephrol 2024; 20:616-633. [PMID: 38867109 DOI: 10.1038/s41581-024-00849-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/15/2024] [Indexed: 06/14/2024]
Abstract
The precise control of gene expression is required for the maintenance of cellular homeostasis and proper cellular function, and the declining control of gene expression with age is considered a major contributor to age-associated changes in cellular physiology and disease. The coordination of gene expression can be represented through models of the molecular interactions that govern gene expression levels, so-called gene regulatory networks. Gene regulatory networks can represent interactions that occur through signal transduction, those that involve regulatory transcription factors, or statistical models of gene-gene relationships based on the premise that certain sets of genes tend to be coexpressed across a range of conditions and cell types. Advances in experimental and computational technologies have enabled the inference of these networks on an unprecedented scale and at unprecedented precision. Here, we delineate different types of gene regulatory networks and their cell-biological interpretation. We describe methods for inferring such networks from large-scale, multi-omics datasets and present applications that have aided our understanding of cellular ageing and disease mechanisms.
Affiliation(s)
- Paula Unger Avila
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Tsimafei Padvitski
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Ana Carolina Leote
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
- He Chen
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
  - Department II of Internal Medicine, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Julio Saez-Rodriguez
  - Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany
- Martin Kann
  - Department II of Internal Medicine, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
  - Center for Molecular Medicine Cologne, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Andreas Beyer
  - Cluster of Excellence on Cellular Stress Responses in Aging-associated Diseases (CECAD), University of Cologne, Cologne, Germany
  - Center for Molecular Medicine Cologne, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
  - Institute for Genetics, Faculty of Mathematics and Natural Sciences, University of Cologne, Cologne, Germany
2
Kernfeld E, Keener R, Cahan P, Battle A. Transcriptome data are insufficient to control false discoveries in regulatory network inference. Cell Syst 2024; 15:709-724.e13. [PMID: 39173585 DOI: 10.1016/j.cels.2024.07.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 05/31/2024] [Accepted: 07/22/2024] [Indexed: 08/24/2024]
Abstract
Inference of causal transcriptional regulatory networks (TRNs) from transcriptomic data suffers notoriously from false positives. Approaches to control the false discovery rate (FDR), for example, via permutation, bootstrapping, or multivariate Gaussian distributions, suffer from several complications: difficulty in distinguishing direct from indirect regulation, nonlinear effects, and causal structure inference requiring "causal sufficiency," meaning experiments that are free of any unmeasured, confounding variables. Here, we use a recently developed statistical framework, model-X knockoffs, to control the FDR while accounting for indirect effects, nonlinear dose-response, and user-provided covariates. We adjust the procedure to estimate the FDR correctly even when measured against incomplete gold standards. However, benchmarking against chromatin immunoprecipitation (ChIP) and other gold standards reveals higher observed than reported FDR. This indicates that unmeasured confounding is a major driver of FDR in TRN inference. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Eric Kernfeld
  - Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA
- Rebecca Keener
  - Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA
- Patrick Cahan
  - Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA
  - Institute for Cell Engineering, Johns Hopkins Medicine, Baltimore, MD, USA
  - Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Alexis Battle
  - Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Genetic Medicine, Johns Hopkins Medicine, Baltimore, MD, USA
  - Malone Center for Engineering and Healthcare, Johns Hopkins University, Baltimore, MD, USA
  - Data Science and AI Institute, Johns Hopkins University, Baltimore, MD, USA
3
Kundu S. A mathematically rigorous algorithm to define, compute and assess relevance of the probable dissociation constants in characterizing a biochemical network. Sci Rep 2024; 14:3507. [PMID: 38347039 PMCID: PMC10861591 DOI: 10.1038/s41598-024-53231-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 01/30/2024] [Indexed: 02/15/2024] [Open Access]
Abstract
Metabolism results from enzymatic and non-enzymatic interactions of several molecules, is easily parameterized with the dissociation constant and occurs via biochemical networks. The dissociation constant is an empirically determined parameter and cannot be used directly to investigate in silico models of biochemical networks. Here, we develop and present an algorithm to define, compute and assess the relevance of the probable dissociation constant for every reaction of a biochemical network. The reactants and reactions of this network are modelled by a stoichiometry number matrix. The algorithm computes the null space and then serially generates subspaces by combinatorially summing the spanning vectors that are non-trivial and unique. This is done until the terms of each row either monotonically diverge or form an alternating sequence whose terms can be partitioned into subsets with almost the same number of oppositely signed terms. For a selected null space-generated subspace the algorithm utilizes several statistical and mathematical descriptors to select and bin terms from each row into distinct outcome-specific subsets. The terms of each subset are summed, mapped to the real-valued open interval [Formula: see text] and used to populate a reaction-specific outcome vector. The p1-norm for this vector is then the probable dissociation constant for this reaction. These steps are continued until every reaction of a modelled network is unambiguously annotated. The assertions presented are complemented by computational studies of a biochemical network for aerobic glycolysis. The fundamental premise of this work is that every row of a null space-generated subspace is a valid reaction and can, therefore, be modelled as a reaction-specific sequence vector with a dimension that corresponds to the cardinality of the subspace after excluding all trivial and redundant vectors. A major finding of this study is that the row-wise sum or the sum of the terms contained in each reaction-specific sequence vector is mapped unambiguously to a positive real number. This means that the probable dissociation constants, for all reactions, can be directly computed from the stoichiometry number matrix and are suitable indicators of outcome for every reaction of the modelled biochemical network. Additionally, we find that the unambiguous annotation for a biochemical network will require a minimum number of iterations and will determine computational complexity.
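As an illustration of the first step named in this abstract (computing the null space of a stoichiometry matrix), the following is a minimal sketch only, not the author's algorithm. It assumes a hypothetical four-reaction linear pathway, with metabolites as rows and reactions as columns (an orientation chosen here for illustration), and uses exact rational arithmetic. The function name `null_space` and the toy matrix `S` are assumptions introduced for this example.

```python
from fractions import Fraction


def null_space(matrix):
    """Return a basis (list of vectors) for the null space of `matrix`,
    computed by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # column c is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        # Clear column c in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for f in free:
        # Set one free variable to 1 and solve for the pivot variables.
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            vec[p] = -rows[row_idx][f]
        basis.append(vec)
    return basis


# Hypothetical toy pathway: -> A -> B -> C ->
# Rows: metabolites A, B, C. Columns: the four reactions.
S = [
    [1, -1, 0, 0],
    [0, 1, -1, 0],
    [0, 0, 1, -1],
]
basis = null_space(S)
```

Each vector in `basis` is a reaction-rate combination that leaves every metabolite balanced. The later steps described in the abstract (combinatorial summing of spanning vectors, binning, and the norm computation) are not reproduced here.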
Affiliation(s)
- Siddhartha Kundu
  - Department of Biochemistry, All India Institute of Medical Sciences, Ansari Nagar, New Delhi, 110029, India
4
Greulich P. Quantitative Modelling in Stem Cell Biology and Beyond: How to Make Best Use of It. Curr Stem Cell Rep 2023; 9:67-76. [PMID: 38145009 PMCID: PMC10739548 DOI: 10.1007/s40778-023-00230-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/17/2023] [Indexed: 12/26/2023]
Abstract
Purpose of Review: This article gives a broad overview of quantitative modelling approaches in biology and provides guidance on how to employ them to boost stem cell research, by helping to answer biological questions and to predict the outcome of biological processes. Recent Findings: The twenty-first century has seen a steady increase in the proportion of cell biology publications employing mathematical modelling to aid experimental research. However, quantitative modelling is often used as a rather decorative element to confirm experimental findings, an approach which often yields only marginal added value, and is in many cases scientifically questionable. Summary: Quantitative modelling can boost biological research in manifold ways, but one has to take some careful considerations before embarking on a modelling campaign, in order to maximise its added value, to avoid pitfalls that may lead to wrong results, and to be aware of its fundamental limitations, imposed by the risks of over-fitting and "universality".
Affiliation(s)
- Philip Greulich
  - School of Mathematical Sciences, University of Southampton, Southampton, UK
  - Institute for Life Sciences, University of Southampton, Southampton, UK
5
Kundu S. ReDirection: an R-package to compute the probable dissociation constant for every reaction of a user-defined biochemical network. Front Mol Biosci 2023; 10:1206502. [PMID: 37942290 PMCID: PMC10628733 DOI: 10.3389/fmolb.2023.1206502] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/15/2023] [Accepted: 09/14/2023] [Indexed: 11/10/2023] [Open Access]
Abstract
Biochemical networks integrate enzyme-mediated substrate conversions with non-enzymatic complex formation and disassembly to accomplish complex biochemical and physiological functions. The choice of parameters and constraints used in most of these studies is numerically motivated and network-specific. Although sound in theory, the outcomes that result depart significantly from the intracellular milieu and are less likely to retain relevance in a clinical setting. There is a need for a computational tool which is biochemically relevant, mathematically rigorous, and unbiased, and can ascribe functionality to and generate potentially testable hypotheses for a user-defined biochemical network. Here, we present "ReDirection," an R-package which computes the probable dissociation constant for every reaction of a biochemical network directly from a null space-generated subspace of the stoichiometry number matrix of the modeled network. "ReDirection" delineates this subspace by excluding all trivial and redundant or duplicate occurrences of non-trivial vectors, combinatorially summing the vectors that remain and verifying that the upper or lower bounds of the sequence of terms formed by each row of this subspace belong to the open real-valued intervals (−∞, −1) or (1, ∞) or whether the number of terms that are differently signed are almost equal. "ReDirection" iterates these steps until these bounds are consistent and unambiguous for all reactions of the modeled biochemical network. Thereafter, "ReDirection" filters the terms from each row of this subspace, bins them to outcome-specific subsets, sums and maps this to an outcome-specific reaction vector, and computes the p1-norm, which is the probable dissociation constant for a reaction. "ReDirection" works on first principles, does not discriminate between enzymatic and non-enzymatic reactions, offers a biochemically relevant and mathematically rigorous environment to explore user-defined biochemical networks under baseline and perturbed conditions, and can be used to address empirically intractable biochemical problems. The utility and relevance of "ReDirection" are highlighted by numerical studies on stoichiometric number models of biochemical networks of galactose metabolism and heme and cholesterol biosynthesis. "ReDirection" is freely available and accessible from the comprehensive R archive network (CRAN) with the URL (https://cran.r-project.org/package=ReDirection).
Affiliation(s)
- Siddhartha Kundu
  - Department of Biochemistry, All India Institute of Medical Sciences, New Delhi, India
6
Marku M, Pancaldi V. From time-series transcriptomics to gene regulatory networks: A review on inference methods. PLoS Comput Biol 2023; 19:e1011254. [PMID: 37561790 PMCID: PMC10414591 DOI: 10.1371/journal.pcbi.1011254] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/12/2023] [Open Access]
Abstract
Inference of gene regulatory networks has been an active area of research for around 20 years, leading to the development of sophisticated inference algorithms based on a variety of assumptions and approaches. With the ever increasing demand for more accurate and powerful models, the inference problem remains of broad scientific interest. The abstract representation of biological systems through gene regulatory networks represents a powerful method to study such systems, encoding different amounts and types of information. In this review, we summarize the different types of inference algorithms specifically based on time-series transcriptomics, giving an overview of the main applications of gene regulatory networks in computational biology. This review is intended to give an updated reference of regulatory networks inference tools to biologists and researchers new to the topic and guide them in selecting the appropriate inference method that best fits their questions, aims, and experimental data.
Affiliation(s)
- Malvina Marku
  - CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Vera Pancaldi
  - CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
  - Barcelona Supercomputing Center, Barcelona, Spain
7
Wen Y, Huang J, Guo S, Elyahu Y, Monsonego A, Zhang H, Ding Y, Zhu H. Applying causal discovery to single-cell analyses using CausalCell. eLife 2023; 12:e81464. [PMID: 37129360 PMCID: PMC10229139 DOI: 10.7554/elife.81464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Accepted: 05/01/2023] [Indexed: 05/03/2023] [Open Access]
Abstract
Correlation between objects is prone to occur coincidentally, and exploring correlation or association in most situations does not answer scientific questions rich in causality. Causal discovery (also called causal inference) infers causal interactions between objects from observational data. Reported causal discovery methods and single-cell datasets make applying causal discovery to single cells a promising direction. However, evaluating and choosing causal discovery methods and developing and performing proper workflow remain challenges. We report the workflow and platform CausalCell (http://www.gaemons.net/causalcell/causalDiscovery/) for performing single-cell causal discovery. The workflow/platform is developed upon benchmarking four kinds of causal discovery methods and is examined by analyzing multiple single-cell RNA-sequencing (scRNA-seq) datasets. Our results suggest that different situations need different methods and that the constraint-based PC algorithm with kernel-based conditional independence tests works best in most situations. Related issues are discussed and tips for best practices are given. Inferred causal interactions in single cells provide valuable clues for investigating molecular interactions and gene regulations, identifying critical diagnostic and therapeutic targets, and designing experimental and clinical interventions.
Affiliation(s)
- Yujian Wen
  - Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Jielong Huang
  - Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Shuhui Guo
  - Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Yehezqel Elyahu
  - The Shraga Segal Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Alon Monsonego
  - The Shraga Segal Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Hai Zhang
  - Network Center, Southern Medical University, Guangzhou, China
- Yanqing Ding
  - Department of Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Hao Zhu
  - Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
  - Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou, China
  - Guangdong Provincial Key Lab of Single Cell Technology and Application, Southern Medical University, Guangzhou, China
8
Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022; 21:780-795. [PMID: 36698974 PMCID: PMC9841310 DOI: 10.1016/j.csbj.2022.12.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022] [Open Access]
Abstract
Molecular interaction networks lay the foundation for studying how biological functions are controlled by the complex interplay of genes and proteins. Investigating perturbed processes using biological networks has been instrumental in uncovering mechanisms that underlie complex disease phenotypes. Rapid advances in omics technologies have prompted the generation of high-throughput datasets, enabling large-scale, network-based analyses. Consequently, various modeling techniques, including network enrichment, differential network extraction, and network inference, have proven to be useful for gaining new mechanistic insights. We provide an overview of recent network-based methods and their core ideas to facilitate the discovery of disease modules or candidate mechanisms. Knowledge generated from these computational efforts will benefit biomedical research, especially drug development and precision medicine. We further discuss current challenges and provide perspectives in the field, highlighting the need for more integrative and dynamic network approaches to model disease development and progression.
Affiliation(s)
- Gihanna Galindez
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Sepideh Sadegh
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
  - Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Tim Kacprowski
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Markus List
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
9
Jia Z, Zhang X. Accurate determination of causalities in gene regulatory networks by dissecting downstream target genes. Front Genet 2022; 13:923339. [PMID: 36568360 PMCID: PMC9768335 DOI: 10.3389/fgene.2022.923339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 11/08/2022] [Indexed: 12/12/2022] [Open Access]
Abstract
Accurate determination of causalities between genes is a challenge in the inference of gene regulatory networks (GRNs) from the gene expression profile. Although many methods have been developed for the reconstruction of GRNs, most of them are insufficient in determining causalities or regulatory directions. In this work, we present a novel method, namely, DDTG, to improve the accuracy of causality determination in GRN inference by dissecting downstream target genes. In the proposed method, the topology and hierarchy of GRNs are determined by mutual information and conditional mutual information, and the regulatory directions of GRNs are determined by Taylor formula-based regression. In addition, indirect interactions are removed with the sparseness of the network topology to improve the accuracy of network inference. The method is validated on the benchmark GRNs from DREAM3 and DREAM4 challenges. The results demonstrate the superior performance of the DDTG method on causality determination of GRNs compared to some popular GRN inference methods. This work provides a useful tool to infer the causal gene regulatory network.
Affiliation(s)
- Zhigang Jia
  - School of Mathematics and Statistics, Xinyang Normal University, Xinyang, China
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
- Xiujun Zhang
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China
  - *Correspondence: Xiujun Zhang
10
Chen G, Liu ZP. Inferring causal gene regulatory network via GreyNet: From dynamic grey association to causation. Front Bioeng Biotechnol 2022; 10:954610. [PMID: 36237217 PMCID: PMC9551017 DOI: 10.3389/fbioe.2022.954610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Accepted: 08/15/2022] [Indexed: 11/23/2022] [Open Access]
Abstract
Gene regulatory network (GRN) provides abundant information on gene interactions, which contributes to demonstrating pathology, predicting clinical outcomes, and identifying drug targets. Existing high-throughput experiments provide rich time-series gene expression data to reconstruct the GRN to further gain insights into the mechanism of organisms responding to external stimuli. Numerous machine-learning methods have been proposed to infer gene regulatory networks. Nevertheless, machine learning, especially deep learning, is generally a “black box,” which lacks interpretability. The causality has not been well recognized in GRN inference procedures. In this article, we introduce grey theory integrated with the adaptive sliding window technique to flexibly capture instant gene–gene interactions in the uncertain regulatory system. Then, we incorporate generalized multivariate Granger causality regression methods to transform the dynamic grey association into causation to generate directional regulatory links. We evaluate our model on the DREAM4 in silico benchmark dataset and real-world hepatocellular carcinoma (HCC) time-series data. We achieved competitive results on the DREAM4 compared with other state-of-the-art algorithms and gained meaningful GRN structure on HCC data respectively.
Affiliation(s)
- Guangyi Chen
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, China
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, China
  - Center for Intelligent Medicine, Shandong University, Jinan, Shandong, China
  - *Correspondence: Zhi-Ping Liu
11
Deshpande A, Chu LF, Stewart R, Gitter A. Network inference with Granger causality ensembles on single-cell transcriptomics. Cell Rep 2022; 38:110333. [PMID: 35139376 PMCID: PMC9093087 DOI: 10.1016/j.celrep.2022.110333] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 02/05/2019] [Revised: 02/19/2021] [Accepted: 01/12/2022] [Indexed: 12/20/2022] [Open Access]
Abstract
Cellular gene expression changes throughout a dynamic biological process, such as differentiation. Pseudotimes estimate cells' progress along a dynamic process based on their individual gene expression states. Ordering the expression data by pseudotime provides information about the underlying regulator-gene interactions. Because the pseudotime distribution is not uniform, many standard mathematical methods are inapplicable for analyzing the ordered gene expression states. Here we present single-cell inference of networks using Granger ensembles (SINGE), an algorithm for gene regulatory network inference from ordered single-cell gene expression data. SINGE uses kernel-based Granger causality regression to smooth irregular pseudotimes and missing expression values. It aggregates predictions from an ensemble of regression analyses to compile a ranked list of candidate interactions between transcriptional regulators and target genes. In two mouse embryonic stem cell differentiation datasets, SINGE outperforms other contemporary algorithms. However, a more detailed examination reveals caveats about poor performance for individual regulators and uninformative pseudotimes.
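For readers unfamiliar with the Granger causality idea this abstract builds on, here is a minimal sketch of the plain linear, lag-1 version. It is not SINGE itself: the kernel smoothing over irregular pseudotimes, the handling of missing values, and the ensemble aggregation described above are omitted. The function names, the synthetic series, and the log-ratio score are assumptions made for illustration.

```python
import math
import random


def rss_one(target, z):
    """Residual sum of squares of the least-squares fit target ~ a * z."""
    a = sum(t * v for t, v in zip(target, z)) / sum(v * v for v in z)
    return sum((t - a * v) ** 2 for t, v in zip(target, z))


def rss_two(target, z1, z2):
    """Residual sum of squares of target ~ a * z1 + b * z2,
    solved through the 2x2 normal equations."""
    s11 = sum(u * u for u in z1)
    s22 = sum(v * v for v in z2)
    s12 = sum(u * v for u, v in zip(z1, z2))
    t1 = sum(t * u for t, u in zip(target, z1))
    t2 = sum(t * v for t, v in zip(target, z2))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return sum((t - a * u - b * v) ** 2 for t, u, v in zip(target, z1, z2))


def granger_score(x, y):
    """Log ratio of residual sums of squares: how much does the previous
    value of x improve a prediction of y beyond y's own previous value?"""
    target, y_lag, x_lag = y[1:], y[:-1], x[:-1]
    return math.log(rss_one(target, y_lag) / rss_two(target, y_lag, x_lag))


# Synthetic example: a hypothetical regulator x drives target y with lag 1.
rng = random.Random(0)
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1))

forward = granger_score(x, y)  # x -> y, the simulated direction
reverse = granger_score(y, x)  # y -> x, no such effect was simulated
```

A larger score suggests that the lagged regulator adds predictive information. In this synthetic setup `forward` should clearly exceed `reverse`. On real, irregularly sampled single-cell data this simple form would not apply directly, which is the gap the abstract says SINGE addresses.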
Affiliation(s)
- Atul Deshpande
  - Department of Electrical and Computer Engineering, University of Wisconsin - Madison, Madison, WI 53706, USA
  - Morgridge Institute for Research, Madison, WI 53715, USA
- Li-Fang Chu
  - Morgridge Institute for Research, Madison, WI 53715, USA
- Ron Stewart
  - Morgridge Institute for Research, Madison, WI 53715, USA
- Anthony Gitter
  - Morgridge Institute for Research, Madison, WI 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI 53792, USA
12
Lecca P. Machine Learning for Causal Inference in Biological Networks: Perspectives of This Challenge. Front Bioinform 2021; 1:746712. [PMID: 36303798 PMCID: PMC9581010 DOI: 10.3389/fbinf.2021.746712] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/24/2021] [Accepted: 09/08/2021] [Indexed: 11/13/2022] [Open Access]
Abstract
Most machine learning-based methods predict outcomes rather than understanding causality. Machine learning methods have been proved to be efficient in finding correlations in data, but unskilled at determining causation. This issue severely limits the applicability of machine learning methods to infer the causal relationships between the entities of a biological network, and more generally of any dynamical system, such as medical intervention strategies and clinical outcomes system, that is representable as a network. From the perspective of those who want to use the results of network inference not only to understand the mechanisms underlying the dynamics, but also to understand how the network reacts to external stimuli (e.g., environmental factors, therapeutic treatments), tools that can understand the causal relationships between data are highly demanded. Given the increasing popularity of machine learning techniques in computational biology and the recent literature proposing the use of machine learning techniques for the inference of biological networks, we would like to present the challenges that mathematics and computer science research faces in generalising machine learning to an approach capable of understanding causal relationships, and the prospects that achieving this will open up for the medical application domains of systems biology, the main paradigm of which is precisely network biology at any physical scale.
Affiliation(s)
- Paola Lecca
  - Faculty of Computer Science, Free University of Bozen-Bolzano, Piazza Domenicani, Bolzano, Italy
13
Oh VKS, Li RW. Temporal Dynamic Methods for Bulk RNA-Seq Time Series Data. Genes (Basel) 2021; 12:352. [PMID: 33673721 PMCID: PMC7997275 DOI: 10.3390/genes12030352] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/26/2021] [Revised: 02/19/2021] [Accepted: 02/22/2021] [Indexed: 02/06/2023] [Open Access]
Abstract
Dynamic studies in time course experimental designs and clinical approaches have been widely used by the biomedical community. These applications are particularly relevant in stimuli-response models under environmental conditions, characterization of gradient biological processes in developmental biology, identification of therapeutic effects in clinical trials, disease progression models, cell-cycle, and circadian periodicity. Despite their feasibility and popularity, sophisticated dynamic methods that are well validated in large-scale comparative studies, in terms of statistical and computational rigor, are less benchmarked compared to their static counterparts. To date, a number of novel methods in bulk RNA-Seq data have been developed for the various time-dependent stimuli, circadian rhythms, cell-lineage in differentiation, and disease progression. Here, we comprehensively review a key set of representative dynamic strategies and discuss current issues associated with the detection of dynamically changing genes. We also provide recommendations for future directions for studying non-periodical, periodical time course data, and meta-dynamic datasets.
Affiliation(s)
- Vera-Khlara S. Oh
  - Animal Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA
  - Department of Computer Science and Statistics, College of Natural Sciences, Jeju National University, Jeju City 63243, Korea
- Robert W. Li
  - Animal Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA