1
|
Karamveer, Uzun Y. Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods. Bioinform Biol Insights 2024; 18:11779322241287120. [PMID: 39502448 PMCID: PMC11536393 DOI: 10.1177/11779322241287120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2024] [Accepted: 09/10/2024] [Indexed: 11/08/2024] Open
Abstract
Gene regulatory networks are powerful tools for modeling genetic interactions that control the expression of genes driving cell differentiation, and single-cell sequencing offers a unique opportunity to build these networks with high-resolution genomic data. There are many proposed computational methods to build these networks using single-cell data, and different approaches are used to benchmark these methods. However, a comprehensive discussion specifically focusing on benchmarking approaches is missing. In this article, we lay the GRN terminology, present an overview of common gold-standard studies and data sets, and define the performance metrics for benchmarking network construction methodologies. We also point out the advantages and limitations of different benchmarking approaches, suggest alternative ground truth data sets that can be used for benchmarking, and specify additional considerations in this context.
Collapse
Affiliation(s)
- Karamveer
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| | - Yasin Uzun
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Penn State Cancer Institute, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| |
Collapse
|
2
|
Morin A, Chu C, Pavlidis P. Identifying Reproducible Transcription Regulator Coexpression Patterns with Single Cell Transcriptomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.15.580581. [PMID: 38559016 PMCID: PMC10979919 DOI: 10.1101/2024.02.15.580581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 04/04/2024]
Abstract
The proliferation of single cell transcriptomics has potentiated our ability to unveil patterns that reflect dynamic cellular processes, rather than cell type compositional effects that emerge from bulk tissue samples. In this study, we leverage a broad collection of single cell RNA-seq data to identify the gene partners whose expression is most coordinated with each human and mouse transcription regulator (TR). We assembled 120 human and 103 mouse scRNA-seq datasets from the literature (>28 million cells), constructing a single cell coexpression network for each. We aimed to understand the consistency of TR coexpression profiles across a broad sampling of biological contexts, rather than examine the preservation of context-specific signals. Our workflow therefore explicitly prioritizes the patterns that are most reproducible across cell types. Towards this goal, we characterize the similarity of each TR's coexpression within and across species. We create single cell coexpression rankings for each TR, demonstrating that this aggregated information recovers literature curated targets on par with ChIP-seq data. We then combine the coexpression and ChIP-seq information to identify candidate regulatory interactions supported across methods and species. Finally, we highlight interactions for the important neural TR ASCL1 to demonstrate how our compiled information can be adopted for community use.
Collapse
Affiliation(s)
- Alexander Morin
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC, Canada
| | - Chingpan Chu
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC, Canada
| | - Paul Pavlidis
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
| |
Collapse
|
3
|
Kernfeld E, Yang Y, Weinstock J, Battle A, Cahan P. A systematic comparison of computational methods for expression forecasting. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.07.28.551039. [PMID: 37577640 PMCID: PMC10418073 DOI: 10.1101/2023.07.28.551039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/15/2023]
Abstract
Expression forecasting methods use machine learning models to predict how a cell will alter its transcriptome upon perturbation. Such methods are enticing because they promise to answer pressing questions in fields ranging from developmental genetics to cell fate engineering and because they are a fast, cheap, and accessible complement to the corresponding experiments. However, the absolute and relative accuracy of these methods is poorly characterized, limiting their informed use, their improvement, and the interpretation of their predictions. To address these issues, we created a benchmarking platform that combines a panel of 11 large-scale perturbation datasets with an expression forecasting software engine that encompasses or interfaces to a wide variety of methods. We used our platform to systematically assess methods, parameters, and sources of auxiliary data, finding that performance strongly depends on the choice of metric, and especially for simple metrics like mean squared error, it is uncommon for expression forecasting methods to out-perform simple baselines. Our platform will serve as a resource to improve methods and to identify contexts in which expression forecasting can succeed.
Collapse
|
4
|
Maulding ND, Seninge L, Stuart JM. Associating transcription factors to single-cell trajectories with DREAMIT. Genome Biol 2024; 25:220. [PMID: 39143494 PMCID: PMC11323358 DOI: 10.1186/s13059-024-03368-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Accepted: 08/06/2024] [Indexed: 08/16/2024] Open
Abstract
Inferring gene regulatory networks from single-cell RNA-sequencing trajectories has been an active area of research yet methods are still needed to identify regulators governing cell transitions. We developed DREAMIT (Dynamic Regulation of Expression Across Modules in Inferred Trajectories) to annotate transcription-factor activity along single-cell trajectory branches, using ensembles of relations to target genes. Using a benchmark representing several different tissues, as well as external validation with ATAC-Seq and Perturb-Seq data on hematopoietic cells, the method was found to have higher tissue-specific sensitivity and specificity over competing approaches.
Collapse
Affiliation(s)
- Nathan D Maulding
- UCSC Genomics Institute, Biomolecular Engineering, University of California, Santa Cruz, USA
| | - Lucas Seninge
- UCSC Genomics Institute, Biomolecular Engineering, University of California, Santa Cruz, USA
| | - Joshua M Stuart
- UCSC Genomics Institute, Biomolecular Engineering, University of California, Santa Cruz, USA.
| |
Collapse
|
5
|
Loers JU, Vermeirssen V. A single-cell multimodal view on gene regulatory network inference from transcriptomics and chromatin accessibility data. Brief Bioinform 2024; 25:bbae382. [PMID: 39207727 PMCID: PMC11359808 DOI: 10.1093/bib/bbae382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2024] [Revised: 06/27/2024] [Accepted: 07/23/2024] [Indexed: 09/04/2024] Open
Abstract
Eukaryotic gene regulation is a combinatorial, dynamic, and quantitative process that plays a vital role in development and disease and can be modeled at a systems level in gene regulatory networks (GRNs). The wealth of multi-omics data measured on the same samples and even on the same cells has lifted the field of GRN inference to the next stage. Combinations of (single-cell) transcriptomics and chromatin accessibility allow the prediction of fine-grained regulatory programs that go beyond mere correlation of transcription factor and target gene expression, with enhancer GRNs (eGRNs) modeling molecular interactions between transcription factors, regulatory elements, and target genes. In this review, we highlight the key components for successful (e)GRN inference from (sc)RNA-seq and (sc)ATAC-seq data exemplified by state-of-the-art methods as well as open challenges and future developments. Moreover, we address preprocessing strategies, metacell generation and computational omics pairing, transcription factor binding site detection, and linear and three-dimensional approaches to identify chromatin interactions as well as dynamic and causal eGRN inference. We believe that the integration of transcriptomics together with epigenomics data at a single-cell level is the new standard for mechanistic network inference, and that it can be further advanced with integrating additional omics layers and spatiotemporal data, as well as with shifting the focus towards more quantitative and causal modeling strategies.
Collapse
Affiliation(s)
- Jens Uwe Loers
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
| | - Vanessa Vermeirssen
- Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
| |
Collapse
|
6
|
Wang Y, Zhou F, Guan J. SFINN: inferring gene regulatory network from single-cell and spatial transcriptomic data with shared factor neighborhood and integrated neural network. Bioinformatics 2024; 40:btae433. [PMID: 38950180 PMCID: PMC11236097 DOI: 10.1093/bioinformatics/btae433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2024] [Revised: 06/18/2024] [Accepted: 06/28/2024] [Indexed: 07/03/2024] Open
Abstract
MOTIVATION The rise of single-cell RNA sequencing (scRNA-seq) technology presents new opportunities for constructing detailed cell type-specific gene regulatory networks (GRNs) to study cell heterogeneity. However, challenges caused by noises, technical errors, and dropout phenomena in scRNA-seq data pose significant obstacles to GRN inference, making the design of accurate GRN inference algorithms still essential. The recent growth of both single-cell and spatial transcriptomic sequencing data enables the development of supervised deep learning methods to infer GRNs on these diverse single-cell datasets. RESULTS In this study, we introduce a novel deep learning framework based on shared factor neighborhood and integrated neural network (SFINN) for inferring potential interactions and causalities between transcription factors and target genes from single-cell and spatial transcriptomic data. SFINN utilizes shared factor neighborhood to construct cellular neighborhood network based on gene expression data and additionally integrates cellular network generated from spatial location information. Subsequently, the cell adjacency matrix and gene pair expression are fed into an integrated neural network framework consisting of a graph convolutional neural network and a fully-connected neural network to determine whether the genes interact. Performance evaluation in the tasks of gene interaction and causality prediction against the existing GRN reconstruction algorithms demonstrates the usability and competitiveness of SFINN across different kinds of data. SFINN can be applied to infer GRNs from conventional single-cell sequencing data and spatial transcriptomic data. AVAILABILITY AND IMPLEMENTATION SFINN can be accessed at GitHub: https://github.com/JGuan-lab/SFINN.
Collapse
Affiliation(s)
- Yongjie Wang
- Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
| | - Fengfan Zhou
- Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
| | - Jinting Guan
- Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai 200240, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
| |
Collapse
|
7
|
Trimbour R, Deutschmann IM, Cantini L. Molecular mechanisms reconstruction from single-cell multi-omics data with HuMMuS. Bioinformatics 2024; 40:btae143. [PMID: 38460192 PMCID: PMC11065476 DOI: 10.1093/bioinformatics/btae143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 12/20/2023] [Accepted: 03/07/2024] [Indexed: 03/11/2024] Open
Abstract
MOTIVATION The molecular identity of a cell results from a complex interplay between heterogeneous molecular layers. Recent advances in single-cell sequencing technologies have opened the possibility to measure such molecular layers of regulation. RESULTS Here, we present HuMMuS, a new method for inferring regulatory mechanisms from single-cell multi-omics data. Differently from the state-of-the-art, HuMMuS captures cooperation between biological macromolecules and can easily include additional layers of molecular regulation. We benchmarked HuMMuS with respect to the state-of-the-art on both paired and unpaired multi-omics datasets. Our results proved the improvements provided by HuMMuS in terms of transcription factor (TF) targets, TF binding motifs and regulatory regions prediction. Finally, once applied to snmC-seq, scATAC-seq and scRNA-seq data from mouse brain cortex, HuMMuS enabled to accurately cluster scRNA profiles and to identify potential driver TFs. AVAILABILITY AND IMPLEMENTATION HuMMuS is available at https://github.com/cantinilab/HuMMuS.
Collapse
Affiliation(s)
- Remi Trimbour
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, F-75015 Paris, France
- Institut de Biologie de l’Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005 Paris, France
| | - Ina Maria Deutschmann
- Institut de Biologie de l’Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005 Paris, France
| | - Laura Cantini
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, F-75015 Paris, France
- Institut de Biologie de l’Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005 Paris, France
| |
Collapse
|
8
|
Stock M, Popp N, Fiorentino J, Scialdone A. Topological benchmarking of algorithms to infer gene regulatory networks from single-cell RNA-seq data. Bioinformatics 2024; 40:btae267. [PMID: 38627250 PMCID: PMC11096270 DOI: 10.1093/bioinformatics/btae267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Revised: 02/28/2024] [Accepted: 04/16/2024] [Indexed: 05/18/2024] Open
Abstract
MOTIVATION In recent years, many algorithms for inferring gene regulatory networks from single-cell transcriptomic data have been published. Several studies have evaluated their accuracy in estimating the presence of an interaction between pairs of genes. However, these benchmarking analyses do not quantify the algorithms' ability to capture structural properties of networks, which are fundamental, e.g., for studying the robustness of a gene network to external perturbations. Here, we devise a three-step benchmarking pipeline called STREAMLINE that quantifies the ability of algorithms to capture topological properties of networks and identify hubs. RESULTS To this aim, we use data simulated from different types of networks as well as experimental data from three different organisms. We apply our benchmarking pipeline to four inference algorithms and provide guidance on which algorithm should be used depending on the global network property of interest. AVAILABILITY AND IMPLEMENTATION STREAMLINE is available at https://github.com/ScialdoneLab/STREAMLINE. The data generated in this study are available at https://doi.org/10.5281/zenodo.10710444.
Collapse
Affiliation(s)
- Marco Stock
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich 85354, Germany
| | - Niclas Popp
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
| | - Jonathan Fiorentino
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
| | - Antonio Scialdone
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
| |
Collapse
|
9
|
Yuan Y, Huo Q, Zhang Z, Wang Q, Wang J, Chang S, Cai P, Song KM, Galbraith DW, Zhang W, Huang L, Song R, Ma Z. Decoding the gene regulatory network of endosperm differentiation in maize. Nat Commun 2024; 15:34. [PMID: 38167709 PMCID: PMC10762121 DOI: 10.1038/s41467-023-44369-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2023] [Accepted: 12/11/2023] [Indexed: 01/05/2024] Open
Abstract
The persistent cereal endosperm constitutes the majority of the grain volume. Dissecting the gene regulatory network underlying cereal endosperm development will facilitate yield and quality improvement of cereal crops. Here, we use single-cell transcriptomics to analyze the developing maize (Zea mays) endosperm during cell differentiation. After obtaining transcriptomic data from 17,022 single cells, we identify 12 cell clusters corresponding to five endosperm cell types and revealing complex transcriptional heterogeneity. We delineate the temporal gene-expression pattern from 6 to 7 days after pollination. We profile the genomic DNA-binding sites of 161 transcription factors differentially expressed between cell clusters and constructed a gene regulatory network by combining the single-cell transcriptomic data with the direct DNA-binding profiles, identifying 181 regulons containing genes encoding transcription factors along with their high-confidence targets, Furthermore, we map the regulons to endosperm cell clusters, identify cell-cluster-specific essential regulators, and experimentally validated three predicted key regulators. This study provides a framework for understanding cereal endosperm development and function at single-cell resolution.
Collapse
Affiliation(s)
- Yue Yuan
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China
- Sanya Institute of China Agricultural University, Sanya, 572025, China
- Hainan Yazhou Bay Seed Laboratory, Sanya, 572025, China
| | - Qiang Huo
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China
| | - Ziru Zhang
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China
| | - Qun Wang
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China
| | - Juanxia Wang
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China
| | - Shuaikang Chang
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China
| | - Peng Cai
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China
| | - Karen M Song
- Department of Biology, Trinity College of Arts and Sciences, Duke University, Durham, NC, 27708, USA
| | - David W Galbraith
- School of Plant Sciences and Bio5 Institute, University of Arizona, Tucson, AZ, 85721, USA
| | - Weixiao Zhang
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China
| | - Long Huang
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China
| | - Rentao Song
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China.
- Sanya Institute of China Agricultural University, Sanya, 572025, China.
- Hainan Yazhou Bay Seed Laboratory, Sanya, 572025, China.
| | - Zeyang Ma
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China.
- Sanya Institute of China Agricultural University, Sanya, 572025, China.
- Hainan Yazhou Bay Seed Laboratory, Sanya, 572025, China.
| |
Collapse
|
10
|
Manosalva Pérez N, Ferrari C, Engelhorn J, Depuydt T, Nelissen H, Hartwig T, Vandepoele K. MINI-AC: inference of plant gene regulatory networks using bulk or single-cell accessible chromatin profiles. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2024; 117:280-301. [PMID: 37788349 DOI: 10.1111/tpj.16483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/23/2023] [Revised: 09/13/2023] [Accepted: 09/16/2023] [Indexed: 10/05/2023]
Abstract
Gene regulatory networks (GRNs) represent the interactions between transcription factors (TF) and their target genes. Plant GRNs control transcriptional programs involved in growth, development, and stress responses, ultimately affecting diverse agricultural traits. While recent developments in accessible chromatin (AC) profiling technologies make it possible to identify context-specific regulatory DNA, learning the underlying GRNs remains a major challenge. We developed MINI-AC (Motif-Informed Network Inference based on Accessible Chromatin), a method that combines AC data from bulk or single-cell experiments with TF binding site (TFBS) information to learn GRNs in plants. We benchmarked MINI-AC using bulk AC datasets from different Arabidopsis thaliana tissues and showed that it outperforms other methods to identify correct TFBS. In maize, a crop with a complex genome and abundant distal AC regions, MINI-AC successfully inferred leaf GRNs with experimentally confirmed, both proximal and distal, TF-target gene interactions. Furthermore, we showed that both AC regions and footprints are valid alternatives to infer AC-based GRNs with MINI-AC. Finally, we combined MINI-AC predictions from bulk and single-cell AC datasets to identify general and cell-type specific maize leaf regulators. Focusing on C4 metabolism, we identified diverse regulatory interactions in specialized cell types for this photosynthetic pathway. MINI-AC represents a powerful tool for inferring accurate AC-derived GRNs in plants and identifying known and novel candidate regulators, improving our understanding of gene regulation in plants.
Collapse
Affiliation(s)
- Nicolás Manosalva Pérez
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
| | - Camilla Ferrari
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
| | - Julia Engelhorn
- Molecular Physiology Department, Heinrich-Heine University, 40225, Düsseldorf, Germany
- Max Planck Institute for Plant Breeding Research, 50829, Cologne, Germany
| | - Thomas Depuydt
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
| | - Hilde Nelissen
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
| | - Thomas Hartwig
- Molecular Physiology Department, Heinrich-Heine University, 40225, Düsseldorf, Germany
- Max Planck Institute for Plant Breeding Research, 50829, Cologne, Germany
- Cluster of Excellence on Plant Sciences, Düsseldorf, Germany
| | - Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, 9052, Ghent, Belgium
| |
Collapse
|
11
|
Badia-I-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, Saez-Rodriguez J. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023; 24:739-754. [PMID: 37365273 DOI: 10.1038/s41576-023-00618-5] [Citation(s) in RCA: 82] [Impact Index Per Article: 82.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/12/2023] [Indexed: 06/28/2023]
Abstract
The interplay between chromatin, transcription factors and genes generates complex regulatory circuits that can be represented as gene regulatory networks (GRNs). The study of GRNs is useful to understand how cellular identity is established, maintained and disrupted in disease. GRNs can be inferred from experimental data - historically, bulk omics data - and/or from the literature. The advent of single-cell multi-omics technologies has led to the development of novel computational methods that leverage genomic, transcriptomic and chromatin accessibility information to infer GRNs at an unprecedented resolution. Here, we review the key principles of inferring GRNs that encompass transcription factor-gene interactions from transcriptomics and chromatin accessibility data. We focus on the comparison and classification of methods that use single-cell multimodal data. We highlight challenges in GRN inference, in particular with respect to benchmarking, and potential further developments using additional data modalities.
Collapse
Affiliation(s)
- Pau Badia-I-Mompel
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Lorna Wessels
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Department of Vascular Biology and Tumor Angiogenesis, European Center for Angioscience, Medical Faculty, MannHeim Heidelberg University, Mannheim, Germany
| | - Sophia Müller-Dott
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Rémi Trimbour
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, Paris, France
| | - Ricardo O Ramirez Flores
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | | | - Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany.
| |
Collapse
|
12
|
Bocci F, Jia D, Nie Q, Jolly MK, Onuchic J. Theoretical and computational tools to model multistable gene regulatory networks. REPORTS ON PROGRESS IN PHYSICS. PHYSICAL SOCIETY (GREAT BRITAIN) 2023; 86:10.1088/1361-6633/acec88. [PMID: 37531952 PMCID: PMC10521208 DOI: 10.1088/1361-6633/acec88] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/13/2023] [Accepted: 08/02/2023] [Indexed: 08/04/2023]
Abstract
The last decade has witnessed a surge of theoretical and computational models to describe the dynamics of complex gene regulatory networks, and how these interactions can give rise to multistable and heterogeneous cell populations. As the use of theoretical modeling to describe genetic and biochemical circuits becomes more widespread, theoreticians with mathematical and physical backgrounds routinely apply concepts from statistical physics, non-linear dynamics, and network theory to biological systems. This review aims at providing a clear overview of the most important methodologies applied in the field while highlighting current and future challenges. It also includes hands-on tutorials to solve and simulate some of the archetypical biological system models used in the field. Furthermore, we provide concrete examples from the existing literature for theoreticians that wish to explore this fast-developing field. Whenever possible, we highlight the similarities and differences between biochemical and regulatory networks and 'classical' systems typically studied in non-equilibrium statistical and quantum mechanics.
Collapse
Affiliation(s)
- Federico Bocci
- The NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, CA 92697, USA
- Department of Mathematics, University of California, Irvine, CA 92697, USA
| | - Dongya Jia
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
| | - Qing Nie
- The NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, CA 92697, USA
- Department of Mathematics, University of California, Irvine, CA 92697, USA
| | - Mohit Kumar Jolly
- Centre for BioSystems Science and Engineering, Indian Institute of Science, Bangalore 560012, India
| | - José Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
- Department of Physics and Astronomy, Rice University, Houston, TX 77005, USA
- Department of Chemistry, Rice University, Houston, TX 77005, USA
- Department of Biosciences, Rice University, Houston, TX 77005, USA
| |
Collapse
|
13
|
Lu S, Keleş S. Debiased personalized gene coexpression networks for population-scale scRNA-seq data. Genome Res 2023; 33:932-947. [PMID: 37295843 PMCID: PMC10519377 DOI: 10.1101/gr.277363.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2022] [Accepted: 06/07/2023] [Indexed: 06/12/2023]
Abstract
Population-scale single-cell RNA-seq (scRNA-seq) data sets create unique opportunities for quantifying expression variation across individuals at the gene coexpression network level. Estimation of coexpression networks is well established for bulk RNA-seq; however, single-cell measurements pose novel challenges owing to technical limitations and noise levels of this technology. Gene-gene correlation estimates from scRNA-seq tend to be severely biased toward zero for genes with low and sparse expression. Here, we present Dozer to debias gene-gene correlation estimates from scRNA-seq data sets and accurately quantify network-level variation across individuals. Dozer corrects correlation estimates in the general Poisson measurement model and provides a metric to quantify genes measured with high noise. Computational experiments establish that Dozer estimates are robust to mean expression levels of the genes and the sequencing depths of the data sets. Compared with alternatives, Dozer results in fewer false-positive edges in the coexpression networks, yields more accurate estimates of network centrality measures and modules, and improves the faithfulness of networks estimated from separate batches of the data sets. We showcase unique analyses enabled by Dozer in two population-scale scRNA-seq applications. Coexpression network-based centrality analysis of multiple differentiating human induced pluripotent stem cell (iPSC) lines yields biologically coherent gene groups that are associated with iPSC differentiation efficiency. Application with population-scale scRNA-seq of oligodendrocytes from postmortem human tissues of Alzheimer's disease and controls uniquely reveals coexpression modules of innate immune response with distinct coexpression levels between the diagnoses. Dozer represents an important advance in estimating personalized coexpression networks from scRNA-seq data.
Collapse
Affiliation(s)
- Shan Lu
- Department of Statistics, University of Wisconsin, Madison, Wisconsin 53706, USA
| | - Sündüz Keleş
- Department of Statistics, University of Wisconsin, Madison, Wisconsin 53706, USA;
- Department of Biostatistics and Medical Informatics, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin 53706, USA
| |
Collapse
|
14
|
Zhang S, Pyne S, Pietrzak S, Halberg S, McCalla SG, Siahpirani AF, Sridharan R, Roy S. Inference of cell type-specific gene regulatory networks on cell lineages from single cell omic datasets. Nat Commun 2023; 14:3064. [PMID: 37244909 PMCID: PMC10224950 DOI: 10.1038/s41467-023-38637-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2022] [Accepted: 05/10/2023] [Indexed: 05/29/2023] Open
Abstract
Cell type-specific gene expression patterns are outputs of transcriptional gene regulatory networks (GRNs) that connect transcription factors and signaling proteins to target genes. Single-cell technologies such as single cell RNA-sequencing (scRNA-seq) and single cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq), can examine cell-type specific gene regulation at unprecedented detail. However, current approaches to infer cell type-specific GRNs are limited in their ability to integrate scRNA-seq and scATAC-seq measurements and to model network dynamics on a cell lineage. To address this challenge, we have developed single-cell Multi-Task Network Inference (scMTNI), a multi-task learning framework to infer the GRN for each cell type on a lineage from scRNA-seq and scATAC-seq data. Using simulated and real datasets, we show that scMTNI is a broadly applicable framework for linear and branching lineages that accurately infers GRN dynamics and identifies key regulators of fate transitions for diverse processes such as cellular reprogramming and differentiation.
Collapse
Affiliation(s)
- Shilu Zhang
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
| | - Saptarshi Pyne
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
| | - Stefan Pietrzak
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, USA
| | - Spencer Halberg
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
| | - Sunnie Grace McCalla
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Alireza Fotuhi Siahpirani
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Rupa Sridharan
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, USA
| | - Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA.
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA.
| |
Collapse
|