1
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. Bioinformatics Advances 2024; 4:vbae099. [PMID: 39143982; PMCID: PMC11321866; DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation: Not applicable.
Affiliation(s)
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Aydin Wells
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
- Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Deisy Morselli Gysi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
- Department of Physics, Northeastern University, Boston, MA 02115, United States
- Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Wisconsin Institute for Discovery, Madison, WI 53715, United States
- Anaïs Baudot
- Aix Marseille Université, INSERM, MMG, Marseille, France
- Serdar Bozdag
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Department of Mathematics, University of North Texas, Denton, TX 76203, United States
- Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Morgridge Institute for Research, Madison, WI 53715, United States
- Sara J C Gosline
- Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
- Pengfei Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Pietro H Guzzi
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
- Heng Huang
- Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
- Meng Jiang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Ziynet Nesibe Kesimoglu
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Mehmet Koyuturk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
- Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
- Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, England
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
- Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, United States
- Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
- Donna K Slonim
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Hanghang Tong
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Xinan Holly Yang
- Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
- Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
- Haiyuan Yu
- Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
2
Omelyanchuk NA, Lavrekha VV, Bogomolov AG, Dolgikh VA, Sidorenko AD, Zemlyanskaya EV. Computational Reconstruction of the Transcription Factor Regulatory Network Induced by Auxin in Arabidopsis thaliana L. Plants (Basel, Switzerland) 2024; 13:1905. [PMID: 39065433; PMCID: PMC11280061; DOI: 10.3390/plants13141905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Revised: 07/05/2024] [Accepted: 07/06/2024] [Indexed: 07/28/2024]
Abstract
In plant hormone signaling, transcription factor regulatory networks (TFRNs), which link the master transcription factors to the biological processes under their control, remain insufficiently characterized despite their crucial function. Here, we identify a TFRN involved in the response to the key plant hormone auxin and define its impact on auxin-driven biological processes. To reconstruct the TFRN, we developed a three-step procedure based on the integrated analysis of differentially expressed gene lists and a representative collection of transcription factor binding profiles. Its implementation is available as part of the CisCross web server. With the new method, we distinguished two transcription factor subnetworks: the first operates before auxin treatment and is switched off upon hormone application; the second is switched on by the hormone. Moreover, we characterized the functioning of the auxin-regulated TFRN in the control of chlorophyll and lignin biosynthesis, abscisic acid signaling, and ribosome biogenesis.
Affiliation(s)
- Nadya A. Omelyanchuk
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
- Viktoriya V. Lavrekha
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
- Anton G. Bogomolov
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
- Vladislav A. Dolgikh
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
- Aleksandra D. Sidorenko
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
- Elena V. Zemlyanskaya
- Department of Systems Biology, Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, 630090 Novosibirsk, Russia
3
Cassan O, Lecellier CH, Martin A, Bréhélin L, Lèbre S. Optimizing data integration improves gene regulatory network inference in Arabidopsis thaliana. Bioinformatics 2024; 40:btae415. [PMID: 38913855; PMCID: PMC11227367; DOI: 10.1093/bioinformatics/btae415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 06/12/2024] [Accepted: 06/21/2024] [Indexed: 06/26/2024]
Abstract
MOTIVATION: Gene regulatory networks (GRNs) are traditionally inferred from gene expression profiles monitoring a specific condition or treatment. In the last decade, integrative strategies have successfully emerged to guide GRN inference from gene expression with complementary prior data. However, datasets used as prior information and validation gold standards are often related and limited to a subset of genes. This lack of complete and independent evaluation calls for new criteria to robustly estimate the optimal intensity of prior data integration in the inference process. RESULTS: We address this issue for two regression-based GRN inference models, a weighted random forest (weightedRF) and a generalized linear model estimated under a weighted LASSO penalty with stability selection (weightedLASSO). These approaches are applied to data from the root response to nitrate induction in Arabidopsis thaliana. For each gene, we measure how the integration of transcription factor binding motifs influences model prediction. We propose a new approach, DIOgene, that uses model prediction error and a simulated null hypothesis to optimize data integration strength in a hypothesis-driven, gene-specific manner. This integration scheme reveals a strong diversity of optimal integration intensities between genes, and offers good performance in minimizing prediction error as well as retrieving experimental interactions. Experimental results show that DIOgene compares favorably against state-of-the-art approaches and allows the recovery of master regulators of nitrate induction. AVAILABILITY AND IMPLEMENTATION: The R code and notebooks demonstrating the use of the proposed approaches are available in the repository https://github.com/OceaneCsn/integrative_GRN_N_induction.
Affiliation(s)
- Océane Cassan
- LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
- Charles-Henri Lecellier
- LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
- IGMM, Univ Montpellier, CNRS, Montpellier, 34090, France
- Antoine Martin
- IPSIM, CNRS, INRAE, Institut Agro, Univ Montpellier, 34060, Montpellier, France
- Sophie Lèbre
- LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
- IMAG, Univ Montpellier, CNRS, Montpellier, 34090, France
- Université Paul-Valéry-Montpellier 3, Montpellier, 34090, France
4
Huo Q, Song R, Ma Z. Recent advances in exploring transcriptional regulatory landscape of crops. Frontiers in Plant Science 2024; 15:1421503. [PMID: 38903438; PMCID: PMC11188431; DOI: 10.3389/fpls.2024.1421503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 05/23/2024] [Indexed: 06/22/2024]
Abstract
Crop breeding entails developing and selecting plant varieties with improved agronomic traits. Modern molecular techniques, such as genome editing, enable more efficient manipulation of plant phenotype by altering the expression of particular regulatory or functional genes. Hence, it is essential to thoroughly comprehend the transcriptional regulatory mechanisms that underpin these traits. In the multi-omics era, a large amount of omics data has been generated for diverse crop species, including genomics, epigenomics, transcriptomics, proteomics, and single-cell omics. The abundant data resources and the emergence of advanced computational tools offer unprecedented opportunities for obtaining a holistic view and profound understanding of the regulatory processes linked to desirable traits. This review focuses on integrated network approaches that utilize multi-omics data to investigate gene expression regulation. Various types of regulatory networks and their inference methods are discussed, focusing on recent advancements in crop plants. The integration of multi-omics data has been proven to be crucial for the construction of high-confidence regulatory networks. With the refinement of these methodologies, they will significantly enhance crop breeding efforts and contribute to global food security.
Affiliation(s)
- Zeyang Ma
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, China
5
Manosalva Pérez N, Ferrari C, Engelhorn J, Depuydt T, Nelissen H, Hartwig T, Vandepoele K. MINI-AC: inference of plant gene regulatory networks using bulk or single-cell accessible chromatin profiles. The Plant Journal: for Cell and Molecular Biology 2024; 117:280-301. [PMID: 37788349; DOI: 10.1111/tpj.16483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 09/13/2023] [Accepted: 09/16/2023] [Indexed: 10/05/2023]
Abstract
Gene regulatory networks (GRNs) represent the interactions between transcription factors (TFs) and their target genes. Plant GRNs control transcriptional programs involved in growth, development, and stress responses, ultimately affecting diverse agricultural traits. While recent developments in accessible chromatin (AC) profiling technologies make it possible to identify context-specific regulatory DNA, learning the underlying GRNs remains a major challenge. We developed MINI-AC (Motif-Informed Network Inference based on Accessible Chromatin), a method that combines AC data from bulk or single-cell experiments with TF binding site (TFBS) information to learn GRNs in plants. We benchmarked MINI-AC using bulk AC datasets from different Arabidopsis thaliana tissues and showed that it outperforms other methods in identifying correct TFBSs. In maize, a crop with a complex genome and abundant distal AC regions, MINI-AC successfully inferred leaf GRNs with experimentally confirmed TF-target gene interactions, both proximal and distal. Furthermore, we showed that both AC regions and footprints are valid alternatives for inferring AC-based GRNs with MINI-AC. Finally, we combined MINI-AC predictions from bulk and single-cell AC datasets to identify general and cell-type specific maize leaf regulators. Focusing on C4 metabolism, we identified diverse regulatory interactions in specialized cell types for this photosynthetic pathway. MINI-AC represents a powerful tool for inferring accurate AC-derived GRNs in plants and identifying known and novel candidate regulators, improving our understanding of gene regulation in plants.
Affiliation(s)
- Nicolás Manosalva Pérez
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
- Camilla Ferrari
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
- Julia Engelhorn
- Molecular Physiology Department, Heinrich-Heine University, 40225, Düsseldorf, Germany
- Max Planck Institute for Plant Breeding Research, 50829, Cologne, Germany
- Thomas Depuydt
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
- Hilde Nelissen
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
- Thomas Hartwig
- Molecular Physiology Department, Heinrich-Heine University, 40225, Düsseldorf, Germany
- Max Planck Institute for Plant Breeding Research, 50829, Cologne, Germany
- Cluster of Excellence on Plant Sciences, Düsseldorf, Germany
- Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, 9052, Ghent, Belgium
6
Jin M, Shan Y, Peng Y, Wang W, Zhang H, Liu K, Heckel DG, Wu K, Tabashnik BE, Xiao Y. Downregulation of a transcription factor associated with resistance to Bt toxin Vip3Aa in the invasive fall armyworm. Proc Natl Acad Sci U S A 2023; 120:e2306932120. [PMID: 37874855; PMCID: PMC10622909; DOI: 10.1073/pnas.2306932120] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/27/2023] [Accepted: 09/11/2023] [Indexed: 10/26/2023]
Abstract
Transgenic crops producing insecticidal proteins from Bacillus thuringiensis (Bt) have revolutionized control of some major pests. However, more than 25 cases of field-evolved practical resistance have reduced the efficacy of transgenic crops producing crystalline (Cry) Bt proteins, spurring adoption of alternatives including crops producing the Bt vegetative insecticidal protein Vip3Aa. Although practical resistance to Vip3Aa has not been reported yet, better understanding of the genetic basis of resistance to Vip3Aa is urgently needed to proactively monitor, delay, and counter pest resistance. This is especially important for fall armyworm (Spodoptera frugiperda), which has evolved practical resistance to Cry proteins and is one of the world's most damaging pests. Here, we report the identification of an association between downregulation of the transcription factor gene SfMyb and resistance to Vip3Aa in S. frugiperda. Results from a genome-wide association study, fine-scale mapping, and RNA-Seq identified this gene as a compelling candidate for contributing to the 206-fold resistance to Vip3Aa in a laboratory-selected strain. Experimental reduction of SfMyb expression in a susceptible strain using RNA interference (RNAi) or CRISPR/Cas9 gene editing decreased susceptibility to Vip3Aa, confirming that reduced expression of this gene can cause resistance to Vip3Aa. Relative to the wild-type promoter for SfMyb, the promoter in the resistant strain has deletions and lower activity. Data from yeast one-hybrid assays, genomics, RNA-Seq, RNAi, and proteomics identified genes that are strong candidates for mediating the effects of SfMyb on Vip3Aa resistance. The results reported here may facilitate progress in understanding and managing pest resistance to Vip3Aa.
Affiliation(s)
- Minghui Jin
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518116, China
- The State Key Laboratory for Biology of Plant Disease and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Yinxue Shan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518116, China
- Yan Peng
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518116, China
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Wenhui Wang
- The State Key Laboratory for Biology of Plant Disease and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Huihui Zhang
- Institute of Entomology, School of Life Sciences, Central China Normal University, Wuhan 430079, China
- Kaiyu Liu
- Institute of Entomology, School of Life Sciences, Central China Normal University, Wuhan 430079, China
- David G. Heckel
- Department of Entomology, Max Planck Institute for Chemical Ecology, Jena D-07745, Germany
- Kongming Wu
- The State Key Laboratory for Biology of Plant Disease and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Yutao Xiao
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518116, China
7
Chen Y, Guo Y, Guan P, Wang Y, Wang X, Wang Z, Qin Z, Ma S, Xin M, Hu Z, Yao Y, Ni Z, Sun Q, Guo W, Peng H. A wheat integrative regulatory network from large-scale complementary functional datasets enables trait-associated gene discovery for crop improvement. Molecular Plant 2023; 16:393-414. [PMID: 36575796; DOI: 10.1016/j.molp.2022.12.019] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 07/18/2022] [Revised: 11/28/2022] [Accepted: 12/18/2022] [Indexed: 06/17/2023]
Abstract
Gene regulation is central to all aspects of organism growth, and understanding it using large-scale functional datasets can provide a whole view of biological processes controlling complex phenotypic traits in crops. However, the connection between massive functional datasets and trait-associated gene discovery for crop improvement is still lacking. In this study, we constructed a wheat integrative gene regulatory network (wGRN) by combining an updated genome annotation and diverse complementary functional datasets, including gene expression, sequence motif, transcription factor (TF) binding, chromatin accessibility, and evolutionarily conserved regulation. wGRN contains 7.2 million genome-wide interactions covering 5947 TFs and 127 439 target genes, which were further verified using known regulatory relationships, condition-specific expression, gene functional information, and experiments. We used wGRN to assign genome-wide genes to 3891 specific biological pathways and accurately prioritize candidate genes associated with complex phenotypic traits in genome-wide association studies. In addition, wGRN was used to enhance the interpretation of a spike temporal transcriptome dataset to construct high-resolution networks. We further unveiled novel regulators that enhance the power of spike phenotypic trait prediction using machine learning and contribute to the spike phenotypic differences among modern wheat accessions. Finally, we developed an interactive webserver, wGRN (http://wheat.cau.edu.cn/wGRN), for the community to explore gene regulation and discover trait-associated genes. Collectively, this community resource establishes the foundation for using large-scale functional datasets to guide trait-associated gene discovery for crop improvement.
Affiliation(s)
- Yongming Chen
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Yiwen Guo
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Panfeng Guan
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Yongfa Wang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Xiaobo Wang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Zihao Wang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Zhen Qin
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Shengwei Ma
- Hainan Yazhou Bay Seed Laboratory, Sanya, Hainan, China; State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
- Mingming Xin
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Zhaorong Hu
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Yingyin Yao
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Zhongfu Ni
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Qixin Sun
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Weilong Guo
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China.
- Huiru Peng
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China.
8
Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022; 21:780-795. [PMID: 36698974; PMCID: PMC9841310; DOI: 10.1016/j.csbj.2022.12.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022]
Abstract
Molecular interaction networks lay the foundation for studying how biological functions are controlled by the complex interplay of genes and proteins. Investigating perturbed processes using biological networks has been instrumental in uncovering mechanisms that underlie complex disease phenotypes. Rapid advances in omics technologies have prompted the generation of high-throughput datasets, enabling large-scale, network-based analyses. Consequently, various modeling techniques, including network enrichment, differential network extraction, and network inference, have proven to be useful for gaining new mechanistic insights. We provide an overview of recent network-based methods and their core ideas to facilitate the discovery of disease modules or candidate mechanisms. Knowledge generated from these computational efforts will benefit biomedical research, especially drug development and precision medicine. We further discuss current challenges and provide perspectives in the field, highlighting the need for more integrative and dynamic network approaches to model disease development and progression.
Affiliation(s)
- Gihanna Galindez
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Sepideh Sadegh
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
|
9
|
Wang Y, Lee H, Fear JM, Berger I, Oliver B, Przytycka TM. NetREX-CF integrates incomplete transcription factor data with gene expression to reconstruct gene regulatory networks. Commun Biol 2022; 5:1282. [PMID: 36418514 PMCID: PMC9684490 DOI: 10.1038/s42003-022-04226-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 11/04/2022] [Indexed: 11/25/2022]
Abstract
The inference of Gene Regulatory Networks (GRNs) is one of the key challenges in systems biology. Leading algorithms utilize, in addition to gene expression, prior knowledge such as Transcription Factor (TF) DNA binding motifs or results of TF binding experiments. However, such prior knowledge is typically incomplete, so integrating it with gene expression to infer GRNs remains difficult. To address this challenge, we introduce NetREX-CF (Regulatory Network Reconstruction using EXpression and Collaborative Filtering), a GRN reconstruction approach that combines Collaborative Filtering, to address the incompleteness of the prior knowledge, with a biologically justified model of gene expression (a sparse Network Component Analysis based model). We validated NetREX-CF using yeast data and then used it to construct the GRN for Drosophila Schneider 2 (S2) cells. To corroborate the GRN, we performed a large-scale RNA-Seq analysis followed by a high-throughput RNAi treatment against all 465 expressed TFs in the cell line. Our knockdown results not only extensively validate the GRN we built but also provide a benchmark that the community can use for evaluating GRNs. Finally, using previously published human data, we demonstrate that NetREX-CF can infer GRNs from single-cell RNA-Seq and outperforms other methods.
Affiliation(s)
- Yijie Wang
- Computer Science Department, Indiana University, Bloomington, IN, 47408, USA.
| | - Hangnoh Lee
- Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
| | - Justin M Fear
- Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
| | - Isabelle Berger
- Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
| | - Brian Oliver
- Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA.
| | - Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA.
|
10
|
Heuts BMH, Arza-Apalategi S, Frölich S, Bergevoet SM, van den Oever SN, van Heeringen SJ, van der Reijden BA, Martens JHA. Identification of transcription factors dictating blood cell development using a bidirectional transcription network-based computational framework. Sci Rep 2022; 12:18656. [PMID: 36333382 PMCID: PMC9636203 DOI: 10.1038/s41598-022-21148-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/17/2022] [Accepted: 09/23/2022] [Indexed: 11/06/2022]
Abstract
Advanced computational methods exploit gene expression and epigenetic datasets to predict gene regulatory networks controlled by transcription factors (TFs). These methods have identified cell fate-determining TFs but require large amounts of reference data and experimental expertise. Here, we present an easy-to-use network-based computational framework that exploits enhancers defined by bidirectional transcription, using CAGE sequencing data as its sole input, to correctly predict TFs key to various human cell types. We then applied this Analysis Algorithm for Networks Specified by Enhancers based on CAGE (ANANSE-CAGE) to predict TFs driving red and white blood cell development and THP-1 leukemia cell immortalization. Further, we predicted TFs that are differentially important to either cell line-associated or primary-associated MLL-AF9-driven gene programs, and in primary MLL-AF9 acute leukemia. Our approach identified experimentally validated as well as thus far unexplored TFs in these processes. ANANSE-CAGE will be useful for identifying transcription factors that are key to any cell fate change using only CAGE-seq data as input.
Affiliation(s)
- B M H Heuts
- Department of Molecular Biology, Faculty of Science, RIMLS, Radboud University, 6525 GA, Nijmegen, The Netherlands
| | - S Arza-Apalategi
- Department of Laboratory Medicine, Laboratory of Hematology, Radboud Institute for Molecular Life Sciences (RIMLS), Radboud University Medical Center, 6525 GA, Nijmegen, The Netherlands
| | - S Frölich
- Department of Molecular Developmental Biology, Faculty of Science, RIMLS, Radboud University, 6525 GA, Nijmegen, The Netherlands
| | - S M Bergevoet
- Department of Laboratory Medicine, Laboratory of Hematology, Radboud Institute for Molecular Life Sciences (RIMLS), Radboud University Medical Center, 6525 GA, Nijmegen, The Netherlands
| | - S N van den Oever
- Department of Molecular Biology, Faculty of Science, RIMLS, Radboud University, 6525 GA, Nijmegen, The Netherlands
| | - S J van Heeringen
- Department of Molecular Developmental Biology, Faculty of Science, RIMLS, Radboud University, 6525 GA, Nijmegen, The Netherlands
| | - B A van der Reijden
- Department of Laboratory Medicine, Laboratory of Hematology, Radboud Institute for Molecular Life Sciences (RIMLS), Radboud University Medical Center, 6525 GA, Nijmegen, The Netherlands.
| | - J H A Martens
- Department of Molecular Biology, Faculty of Science, RIMLS, Radboud University, 6525 GA, Nijmegen, The Netherlands.
|
11
|
Hérault L, Poplineau M, Remy E, Duprez E. Single Cell Transcriptomics to Understand HSC Heterogeneity and Its Evolution upon Aging. Cells 2022; 11:3125. [PMID: 36231086 PMCID: PMC9563410 DOI: 10.3390/cells11193125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 09/15/2022] [Accepted: 09/27/2022] [Indexed: 11/16/2022]
Abstract
Single-cell transcriptomic technologies enable the uncovering and characterization of cellular heterogeneity and pave the way for studies aiming to understand its origin and consequences. The hematopoietic system is a model system well suited to benefit from this technological advance because it is characterized by different cellular states. Each cellular state, and its interconnections, may be defined by a specific location in the global transcriptional landscape sustained by a complex regulatory network. This transcriptomic signature is not fixed and evolves over time to give rise to less efficient hematopoietic stem cells (HSCs), leading to well-documented hematopoietic aging. Here, we review the advances in single-cell transcriptomic approaches for understanding HSC heterogeneity and HSC deregulation upon aging. We also discuss the new bioinformatics tools developed for the analysis of the resulting large and complex datasets. Finally, since hematopoiesis is driven by fine-tuned and complex networks that must be interconnected, we highlight how mathematical modeling helps connect such multilayered information and predict how HSCs behave while aging.
Affiliation(s)
- Léonard Hérault
- I2M, CNRS, Aix Marseille University, 13009 Marseille, France
- Epigenetic Factors in Normal and Malignant Hematopoiesis Lab., CRCM, CNRS, INSERM, Institut Paoli Calmettes, Aix Marseille University, 13009 Marseille, France
| | - Mathilde Poplineau
- Epigenetic Factors in Normal and Malignant Hematopoiesis Lab., CRCM, CNRS, INSERM, Institut Paoli Calmettes, Aix Marseille University, 13009 Marseille, France
- Equipe Labellisée Ligue Nationale Contre le Cancer, 75013 Paris, France
| | - Elisabeth Remy
- I2M, CNRS, Aix Marseille University, 13009 Marseille, France
| | - Estelle Duprez
- Epigenetic Factors in Normal and Malignant Hematopoiesis Lab., CRCM, CNRS, INSERM, Institut Paoli Calmettes, Aix Marseille University, 13009 Marseille, France
- Equipe Labellisée Ligue Nationale Contre le Cancer, 75013 Paris, France
|
12
|
Abstract
The question of the heritability of behavior has long fascinated scientists and the broader public. It is now widely accepted that most behavioral variation has a genetic component, although the degree of genetic influence differs widely across behaviors. Starting with Mendel's remarkable discovery of "inheritance factors," it has become increasingly clear that specific genetic variants that influence behavior can be identified. This goal is not without its challenges: Unlike pea morphology, most natural behavioral variation has a complex genetic architecture. However, we can now apply powerful genome-wide approaches to connect variation in DNA to variation in behavior as well as analyses of behaviorally related variation in brain gene expression, which together have provided insights into both the genetic mechanisms underlying behavior and the dynamic relationship between genes and behavior, respectively, in a wide range of species and for a diversity of behaviors. Here, we focus on two systems to illustrate both of these approaches: the genetic basis of burrowing in deer mice and transcriptomic analyses of division of labor in honey bees. Finally, we discuss the troubled relationship between the field of behavioral genetics and eugenics, which reminds us that we must be cautious about how we discuss and contextualize the connections between genes and behavior, especially in humans.
Affiliation(s)
- Hopi E. Hoekstra
- Department of Organismic & Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Department of Molecular & Cellular Biology, Harvard University, Cambridge, MA 02138
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138
- HHMI, Harvard University, Cambridge, MA 02138
| | - Gene E. Robinson
- Department of Entomology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
- Neuroscience Program, University of Illinois at Urbana–Champaign, Urbana, IL 61801
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801
|
13
|
Zorro-Aranda A, Escorcia-Rodríguez JM, González-Kise JK, Freyre-González JA. Curation, inference, and assessment of a globally reconstructed gene regulatory network for Streptomyces coelicolor. Sci Rep 2022; 12:2840. [PMID: 35181703 PMCID: PMC8857197 DOI: 10.1038/s41598-022-06658-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/02/2021] [Accepted: 01/31/2022] [Indexed: 12/12/2022]
Abstract
Streptomyces coelicolor A3(2) is a model microorganism for the study of Streptomycetes, antibiotic production, and secondary metabolism in general. Even though S. coelicolor has an outstanding variety of regulators among bacteria, little effort has been made to study its transcription globally. We manually curated 29 years of literature and databases to assemble a meta-curated, experimentally validated gene regulatory network (GRN) with 5386 genes and 9707 regulatory interactions (~41% of the total expected interactions). This provides the most extensive and up-to-date reconstruction available for the regulatory circuitry of this organism. Only ~6% (534/9707) of the interactions are supported by experiments confirming the binding of the transcription factor to the upstream region of the target gene, the so-called “strong” evidence, while for the remaining interactions there is no confirmation of direct binding. To tackle network incompleteness, we performed network inference using several methods (including two proposed here) for motif identification in DNA sequences and GRN inference from transcriptomics. Further, we contrasted the structural properties and functional architecture of the networks to assess the reliability of the predictions, finding the inference from DNA sequence data to be the most trustworthy approach. Finally, we show two applications of the inferred and the curated networks. The inference allowed us to propose novel transcription factors for the key Streptomyces antibiotic regulatory proteins (SARPs). The curated network allowed us to study the conservation of the system-level components between S. coelicolor and Corynebacterium glutamicum. There we identified the basal machinery as the common signature between the two organisms. The curated networks were deposited in Abasy Atlas (https://abasy.ccg.unam.mx/) while the inferences are available as Supplementary Material.
Affiliation(s)
- Andrea Zorro-Aranda
- Regulatory Systems Biology Research Group, Laboratory of Systems and Synthetic Biology, Center for Genomics Sciences, Universidad Nacional Autónoma de México, Av. Universidad s/n, Col. Chamilpa, 62210, Cuernavaca, Morelos, México
- Bioprocess Research Group, Department of Chemical Engineering, Universidad de Antioquia, Calle 70 No. 52-21, Medellín, Colombia
| | - Juan Miguel Escorcia-Rodríguez
- Regulatory Systems Biology Research Group, Laboratory of Systems and Synthetic Biology, Center for Genomics Sciences, Universidad Nacional Autónoma de México, Av. Universidad s/n, Col. Chamilpa, 62210, Cuernavaca, Morelos, México
| | - José Kenyi González-Kise
- Regulatory Systems Biology Research Group, Laboratory of Systems and Synthetic Biology, Center for Genomics Sciences, Universidad Nacional Autónoma de México, Av. Universidad s/n, Col. Chamilpa, 62210, Cuernavaca, Morelos, México
- Undergraduate Program in Genomic Sciences, Center for Genomics Sciences, Universidad Nacional Autónoma de México, Av. Universidad s/n, Col. Chamilpa, 62210, Cuernavaca, Morelos, México
| | - Julio Augusto Freyre-González
- Regulatory Systems Biology Research Group, Laboratory of Systems and Synthetic Biology, Center for Genomics Sciences, Universidad Nacional Autónoma de México, Av. Universidad s/n, Col. Chamilpa, 62210, Cuernavaca, Morelos, México.
|
14
|
Deshpande A, Chu LF, Stewart R, Gitter A. Network inference with Granger causality ensembles on single-cell transcriptomics. Cell Rep 2022; 38:110333. [PMID: 35139376 PMCID: PMC9093087 DOI: 10.1016/j.celrep.2022.110333] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 02/05/2019] [Revised: 02/19/2021] [Accepted: 01/12/2022] [Indexed: 12/20/2022]
Abstract
Cellular gene expression changes throughout a dynamic biological process, such as differentiation. Pseudotimes estimate cells' progress along a dynamic process based on their individual gene expression states. Ordering the expression data by pseudotime provides information about the underlying regulator-gene interactions. Because the pseudotime distribution is not uniform, many standard mathematical methods are inapplicable for analyzing the ordered gene expression states. Here we present single-cell inference of networks using Granger ensembles (SINGE), an algorithm for gene regulatory network inference from ordered single-cell gene expression data. SINGE uses kernel-based Granger causality regression to smooth irregular pseudotimes and missing expression values. It aggregates predictions from an ensemble of regression analyses to compile a ranked list of candidate interactions between transcriptional regulators and target genes. In two mouse embryonic stem cell differentiation datasets, SINGE outperforms other contemporary algorithms. However, a more detailed examination reveals caveats about poor performance for individual regulators and uninformative pseudotimes.
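The core idea of Granger causality mentioned in this abstract can be illustrated with a much simpler model than the one SINGE uses. The sketch below is not SINGE: it assumes regularly sampled, fully observed time series and an ordinary least-squares lagged regression, whereas the abstract describes a kernel-based regression that handles irregular pseudotimes and missing values. The function name `granger_score` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def granger_score(regulator, target, lag=1):
    """Return log(RSS_restricted / RSS_full) for a lagged linear model.

    The restricted model predicts the target from its own past only; the
    full model also uses the regulator's past. A larger score means the
    regulator's history improves the prediction of the target.
    """
    x = np.asarray(regulator, dtype=float)
    y = np.asarray(target, dtype=float)
    n = len(y)
    response = y[lag:]
    # Column k-1 holds the series shifted back by k time steps.
    own_past = np.column_stack([y[lag - k:n - k] for k in range(1, lag + 1)])
    reg_past = np.column_stack([x[lag - k:n - k] for k in range(1, lag + 1)])
    intercept = np.ones((n - lag, 1))
    restricted = np.hstack([intercept, own_past])
    full = np.hstack([intercept, own_past, reg_past])

    def rss(design):
        # Residual sum of squares of the least-squares fit.
        coef, *_ = np.linalg.lstsq(design, response, rcond=None)
        resid = response - design @ coef
        return float(resid @ resid)

    return float(np.log(rss(restricted) / rss(full)))


# Toy data (an assumption for illustration): y follows x with a one-step delay.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.zeros(200)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=199)

forward = granger_score(x, y)  # x -> y, expected to be clearly positive
reverse = granger_score(y, x)  # y -> x, expected to be close to zero
print(f"x->y: {forward:.3f}, y->x: {reverse:.3f}")
```

Scoring every regulator-target pair this way and sorting by score would give a ranked list of candidate interactions, which is the kind of output the abstract describes; the ensemble aggregation step is not reproduced here.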
Affiliation(s)
- Atul Deshpande
- Department of Electrical and Computer Engineering, University of Wisconsin - Madison, Madison, WI 53706, USA; Morgridge Institute for Research, Madison, WI 53715, USA
| | - Li-Fang Chu
- Morgridge Institute for Research, Madison, WI 53715, USA
| | - Ron Stewart
- Morgridge Institute for Research, Madison, WI 53715, USA
| | - Anthony Gitter
- Morgridge Institute for Research, Madison, WI 53715, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI 53792, USA.
|
15
|
Zhang J, Ibrahim F, Najmulski E, Katholos G, Altarawy D, Heath LS, Tulin SL. Developmental gene regulatory network connections predicted by machine learning from gene expression data alone. PLoS One 2021; 16:e0261926. [PMID: 34962963 PMCID: PMC8714117 DOI: 10.1371/journal.pone.0261926] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/27/2021] [Accepted: 12/14/2021] [Indexed: 12/13/2022]
Abstract
Gene regulatory network (GRN) inference can now take advantage of powerful machine learning algorithms to complement traditional experimental methods in building gene networks. However, the dynamical nature of embryonic development, which involves time-dependent interactions between thousands of transcription factors, signaling molecules, and effector genes, makes it one of the most challenging arenas for GRN prediction. In this work, we show that successful GRN predictions for a developmental network can be obtained from gene expression data alone with the Priors Enriched Absent Knowledge (PEAK) network inference algorithm. PEAK is a noise-robust method that models gene expression dynamics via ordinary differential equations and selects the best network based on information-theoretic criteria coupled with the machine learning algorithm Elastic Net. We test our GRN prediction methodology using two gene expression datasets for the purple sea urchin, Strongylocentrotus purpuratus, and cross-check our results against existing GRN models that have been constructed and validated by over 30 years of experimental results. We find a remarkably high degree of sensitivity in identifying known gene interactions in the network (maximum 81.58%). We also generate novel predictions for interactions that have not yet been described, which provide a resource for researchers to use to further complete the sea urchin GRN. Published ChIP-seq data and spatial co-expression analysis further support a subset of the top novel predictions. We conclude that GRN predictions that match known gene interactions can be produced using gene expression data alone from developmental time series experiments.
Affiliation(s)
- Jingyi Zhang
- Department of Computer Science, Virginia Tech, Blacksburg, VA, United States of America
| | - Farhan Ibrahim
- Department of Computer Science, Virginia Tech, Blacksburg, VA, United States of America
| | - Emily Najmulski
- Department of Biology, Canisius College, Buffalo, NY, United States of America
| | - George Katholos
- Department of Biology, Canisius College, Buffalo, NY, United States of America
| | - Doaa Altarawy
- Department of Computer Science, Virginia Tech, Blacksburg, VA, United States of America
- Computer and Systems Engineering Department, Alexandria University, Alexandria, Egypt
| | - Lenwood S. Heath
- Department of Computer Science, Virginia Tech, Blacksburg, VA, United States of America
| | - Sarah L. Tulin
- Department of Biology, Canisius College, Buffalo, NY, United States of America
|
16
|
Saint-André V. Computational biology approaches for mapping transcriptional regulatory networks. Comput Struct Biotechnol J 2021; 19:4884-4895. [PMID: 34522292 PMCID: PMC8426465 DOI: 10.1016/j.csbj.2021.08.028] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/24/2021] [Revised: 08/16/2021] [Accepted: 08/16/2021] [Indexed: 12/13/2022]
Abstract
Transcriptional Regulatory Networks (TRNs) are mainly responsible for the cell-type- or cell-state-specific expression of gene sets from the same DNA sequence. However, so far there are no precise maps of TRNs available for each cell type or cell state, and no ideal tool to map those networks clearly and in full from biological samples. In this review, major approaches and tools to map TRNs from high-throughput data are presented, grouped by the type of methods or data used to infer them, and their advantages and limitations are discussed. After summarizing the main principles defining the topology and structure–function relationships in TRNs, an overview of the extensive work done to map TRNs from bulk transcriptomic data is presented by type of methodological approach. The most recent approaches to modelling TRNs using other types of molecular data or integrating different data types, including single-cell RNA-sequencing and chromatin information, are then discussed, before a brief conclusion on the improvements expected in the field.
Affiliation(s)
- Violaine Saint-André
- Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, Paris, France
|
17
|
Xu Q, Georgiou G, Frölich S, van der Sande M, Veenstra G, Zhou H, van Heeringen S. ANANSE: an enhancer network-based computational approach for predicting key transcription factors in cell fate determination. Nucleic Acids Res 2021; 49:7966-7985. [PMID: 34244796 PMCID: PMC8373078 DOI: 10.1093/nar/gkab598] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 11/04/2020] [Revised: 06/02/2021] [Accepted: 06/28/2021] [Indexed: 12/21/2022]
Abstract
Proper cell fate determination is largely orchestrated by complex gene regulatory networks centered around transcription factors. However, experimental elucidation of key transcription factors that drive cellular identity is currently often intractable. Here, we present ANANSE (ANalysis Algorithm for Networks Specified by Enhancers), a network-based method that exploits enhancer-encoded regulatory information to identify the key transcription factors in cell fate determination. As cell type-specific transcription factors predominantly bind to enhancers, we use regulatory networks based on enhancer properties to prioritize transcription factors. First, we predict genome-wide binding profiles of transcription factors in various cell types using enhancer activity and transcription factor binding motifs. Subsequently, applying these inferred binding profiles, we construct cell type-specific gene regulatory networks, and then predict key transcription factors controlling cell fate transitions using differential networks between cell types. This method outperforms existing approaches in correctly predicting major transcription factors previously identified to be sufficient for trans-differentiation. Finally, we apply ANANSE to define an atlas of key transcription factors in 18 normal human tissues. In conclusion, we present a ready-to-implement computational tool for efficient prediction of transcription factors in cell fate determination and to study transcription factor-mediated regulatory mechanisms. ANANSE is freely available at https://github.com/vanheeringen-lab/ANANSE.
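The last step described in this abstract, prioritizing transcription factors from a differential network between two cell types, can be sketched in a few lines. This is a simplified stand-in and not the ANANSE scoring scheme: it assumes each network is a plain mapping from (TF, target gene) pairs to interaction weights and ranks TFs by the summed weight gain of their outgoing edges. The function names and the toy weights are hypothetical.

```python
from collections import defaultdict


def differential_network(source_net, target_net):
    """Keep edges whose weight is higher in the target cell type.

    Both arguments map (tf, gene) tuples to interaction weights; an edge
    absent from the source network is treated as having weight 0.
    """
    diff = {}
    for edge, target_weight in target_net.items():
        gain = target_weight - source_net.get(edge, 0.0)
        if gain > 0:
            diff[edge] = gain
    return diff


def rank_tfs(diff_net):
    """Rank TFs by the total weight gained across their outgoing edges."""
    scores = defaultdict(float)
    for (tf, _gene), gain in diff_net.items():
        scores[tf] += gain
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy networks (assumed values, for illustration only).
source = {("TF_A", "g1"): 0.9, ("TF_A", "g2"): 0.8, ("TF_B", "g1"): 0.1}
target = {
    ("TF_A", "g1"): 0.9,
    ("TF_B", "g1"): 0.7,
    ("TF_B", "g3"): 0.6,
    ("TF_C", "g2"): 0.2,
}

diff = differential_network(source, target)
ranking = rank_tfs(diff)
for tf, score in ranking:
    print(f"{tf}: {score:.2f}")
```

In this toy case TF_B gains the most edge weight in the target cell type and is ranked first, while TF_A, whose remaining edge is unchanged, drops out of the differential network entirely.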
Affiliation(s)
- Quan Xu
- Radboud University, Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, 6525GA Nijmegen, The Netherlands
| | - Georgios Georgiou
- Radboud University, Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, 6525GA Nijmegen, The Netherlands
| | - Siebren Frölich
- Radboud University, Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, 6525GA Nijmegen, The Netherlands
| | - Maarten van der Sande
- Radboud University, Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, 6525GA Nijmegen, The Netherlands
| | - Gert Jan C Veenstra
- Radboud University, Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, 6525GA Nijmegen, The Netherlands
| | - Huiqing Zhou
- Radboud University, Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, 6525GA Nijmegen, The Netherlands
- Radboud University Medical Center, Department of Human Genetics, Radboud Institute for Molecular Life Sciences, 6525GA Nijmegen, The Netherlands
| | - Simon J van Heeringen
- Radboud University, Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, 6525GA Nijmegen, The Netherlands
|
18
|
Benowitz KM, Coleman JM, Allan CW, Matzkin LM. Contributions of cis- and trans-Regulatory Evolution to Transcriptomic Divergence across Populations in the Drosophila mojavensis Larval Brain. Genome Biol Evol 2021; 12:1407-1418. [PMID: 32653899 PMCID: PMC7495911 DOI: 10.1093/gbe/evaa145] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Accepted: 07/06/2020] [Indexed: 12/22/2022]
Abstract
Natural selection on gene expression was originally predicted to result primarily in cis- rather than trans-regulatory evolution, due to the expectation of reduced pleiotropy. Despite this, numerous studies have ascribed recent evolutionary divergence in gene expression predominantly to trans-regulation. Performing RNA-seq on single isofemale lines from genetically distinct populations of the cactophilic fly Drosophila mojavensis and their F1 hybrids, we recapitulated this pattern in both larval brains and whole bodies. However, we demonstrate that improving the measurement of brain expression divergence between populations by using seven additional genotypes considerably reduces the estimate of trans-regulatory contributions to expression evolution. We argue that the finding of trans-regulatory predominance can result from biases due to environmental variation in expression or other sources of noise, and that cis-regulation is likely a greater contributor to transcriptional evolution across D. mojavensis populations. Lastly, we merge these lines of data to identify several previously hypothesized and intriguing novel candidate genes, and suggest that the integration of regulatory and population-level transcriptomic data can provide useful filters for the identification of potentially adaptive genes.
Affiliation(s)
| | - Joshua M Coleman
- Department of Entomology, University of Arizona
- Department of Biological Sciences, University of Alabama in Huntsville
| | | | - Luciano M Matzkin
- Department of Entomology, University of Arizona
- Department of Ecology and Evolutionary Biology, University of Arizona
- BIO5 Institute, University of Arizona
|
19
|
Asma H, Halfon MS. Annotating the Insect Regulatory Genome. Insects 2021; 12:591. [PMID: 34209769 PMCID: PMC8305585 DOI: 10.3390/insects12070591] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/28/2021] [Revised: 06/23/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]
Abstract
An ever-growing number of insect genomes is being sequenced across the evolutionary spectrum. Comprehensive annotation of not only genes but also regulatory regions is critical for reaping the full benefits of this sequencing. Driven by developments in sequencing technologies and in both empirical and computational discovery strategies, the past few decades have witnessed dramatic progress in our ability to identify cis-regulatory modules (CRMs), sequences such as enhancers that play a major role in regulating transcription. Nevertheless, providing a timely and comprehensive regulatory annotation of newly sequenced insect genomes is an ongoing challenge. We review here the methods being used to identify CRMs in both model and non-model insect species, and focus on two tools that we have developed, REDfly and SCRMshaw. These resources can be paired together in a powerful combination to facilitate insect regulatory annotation over a broad range of species, with an accuracy equal to or better than that of other state-of-the-art methods.
Affiliation(s)
- Hasiba Asma
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
| | - Marc S. Halfon
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- NY State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, NY 14203, USA
|
20
|
De Clercq I, Van de Velde J, Luo X, Liu L, Storme V, Van Bel M, Pottie R, Vaneechoutte D, Van Breusegem F, Vandepoele K. Integrative inference of transcriptional networks in Arabidopsis yields novel ROS signalling regulators. Nat Plants 2021; 7:500-513. [PMID: 33846597 DOI: 10.1038/s41477-021-00894-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 08/11/2020] [Accepted: 03/04/2021] [Indexed: 05/12/2023]
Abstract
Gene regulation is a dynamic process in which transcription factors (TFs) play an important role in controlling spatiotemporal gene expression. To enhance our global understanding of regulatory interactions in Arabidopsis thaliana, different regulatory input networks capturing complementary information about DNA motifs, open chromatin, TF-binding and expression-based regulatory interactions were combined using a supervised learning approach, resulting in an integrated gene regulatory network (iGRN) covering 1,491 TFs and 31,393 target genes (1.7 million interactions). This iGRN outperforms the different input networks to predict known regulatory interactions and has a similar performance to recover functional interactions compared to state-of-the-art experimental methods. The iGRN correctly inferred known functions for 681 TFs and predicted new gene functions for hundreds of unknown TFs. For regulators predicted to be involved in reactive oxygen species (ROS) stress regulation, we confirmed in total 75% of TFs with a function in ROS and/or physiological stress responses. This includes 13 ROS regulators, previously not connected to any ROS or stress function, that were experimentally validated in our ROS-specific phenotypic assays of loss- or gain-of-function lines. In conclusion, the presented iGRN offers a high-quality starting point to enhance our understanding of gene regulation in plants by integrating different experimental data types.
Affiliation(s)
- Inge De Clercq
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.
- VIB Center for Plant Systems Biology, Ghent, Belgium.
- Jan Van de Velde
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Xiaopeng Luo
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
- Li Liu
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Veronique Storme
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
- Michiel Van Bel
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
- Robin Pottie
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
- Dries Vaneechoutte
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Frank Van Breusegem
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
- Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.
- VIB Center for Plant Systems Biology, Ghent, Belgium.
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium.
21
Hackett SR, Baltz EA, Coram M, Wranik BJ, Kim G, Baker A, Fan M, Hendrickson DG, Berndl M, McIsaac RS. Learning causal networks using inducible transcription factors and transcriptome-wide time series. Mol Syst Biol 2021; 16:e9174. [PMID: 32181581 PMCID: PMC7076914 DOI: 10.15252/msb.20199174] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 08/13/2019] [Revised: 02/13/2020] [Accepted: 02/19/2020] [Indexed: 11/27/2022] Open
Abstract
We present IDEA (the Induction Dynamics gene Expression Atlas), a dataset constructed by independently inducing hundreds of transcription factors (TFs) and measuring timecourses of the resulting gene expression responses in budding yeast. Each experiment captures a regulatory cascade connecting a single induced regulator to the genes it causally regulates. We discuss the regulatory cascade of a single TF, Aft1, in detail; however, IDEA contains > 200 TF induction experiments with 20 million individual observations and 100,000 signal‐containing dynamic responses. As an application of IDEA, we integrate all timecourses into a whole‐cell transcriptional model, which is used to predict and validate multiple new and underappreciated transcriptional regulators. We also find that the magnitudes of coefficients in this model are predictive of genetic interaction profile similarities. In addition to being a resource for exploring regulatory connectivity between TFs and their target genes, our modeling approach shows that combining rapid perturbations of individual genes with genome‐scale time‐series measurements is an effective strategy for elucidating gene regulatory networks.
Affiliation(s)
- Griffin Kim
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Adam Baker
- Calico Life Sciences LLC, South San Francisco, CA, USA
22
Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
- School of Mathematics and Statistics, University of Sydney, Australia
- Lexin Li
- Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
- Haiyan Huang
- Department of Statistics, University of California, Berkeley
23
Ludl AA, Michoel T. Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast. Mol Omics 2021; 17:241-251. [PMID: 33438713 DOI: 10.1039/d0mo00140f] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/16/2022]
Abstract
Causal gene networks model the flow of information within a cell. Reconstructing causal networks from omics data is challenging because correlation does not imply causation. When genomics and transcriptomics data from a segregating population are combined, genomic variants can be used to orient the direction of causality between gene expression traits. Instrumental variable methods use a local expression quantitative trait locus (eQTL) as a randomized instrument for a gene's expression level, and assign target genes based on distal eQTL associations. Mediation-based methods additionally require that distal eQTL associations are mediated by the source gene. A detailed comparison between these methods has not yet been conducted, due to the lack of a standardized implementation of different methods, the limited sample size of most multi-omics datasets, and the absence of ground-truth networks for most organisms. Here we used Findr, a software package providing uniform implementations of instrumental variable, mediation, and coexpression-based methods, a recent dataset of 1012 segregants from a cross between two budding yeast strains, and the Yeastract database of known transcriptional interactions to compare causal gene network inference methods. We found that causal inference methods result in a significant overlap with the ground-truth, whereas coexpression did not perform better than random. A subsampling analysis revealed that the performance of mediation saturates at large sample sizes, due to a loss of sensitivity when residual correlations become significant. Instrumental variable methods on the other hand contain false positive predictions, due to genomic linkage between eQTL instruments. Instrumental variable and mediation-based methods also have complementary roles for identifying causal genes underlying transcriptional hotspots. 
Instrumental variable methods correctly predicted STB5 targets for a hotspot centred on the transcription factor STB5, whereas mediation failed due to Stb5p auto-regulating its own expression. Mediation suggests a new candidate gene, DNM1, for a hotspot on Chr XII, whereas instrumental variable methods could not distinguish between multiple genes located within the hotspot. In conclusion, causal inference from genomics and transcriptomics data is a powerful approach for reconstructing causal gene networks, which could be further improved by the development of methods to control for residual correlations in mediation analyses, and for genomic linkage and pleiotropic effects from transcriptional hotspots in instrumental variable analyses.
Affiliation(s)
- Adriaan-Alexander Ludl
- Computational Biology Unit, Department of Informatics, University of Bergen, PO Box 7803, 5020 Bergen, Norway.
24
Wu L, Han L, Li Q, Wang G, Zhang H, Li L. Using Interactome Big Data to Crack Genetic Mysteries and Enhance Future Crop Breeding. MOLECULAR PLANT 2021; 14:77-94. [PMID: 33340690 DOI: 10.1016/j.molp.2020.12.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 09/14/2020] [Revised: 12/11/2020] [Accepted: 12/14/2020] [Indexed: 05/27/2023]
Abstract
The functional genes underlying phenotypic variation and their interactions represent "genetic mysteries". Understanding and utilizing these genetic mysteries are key solutions for mitigating the current threats to agriculture posed by population growth and individual food preferences. Due to advances in high-throughput multi-omics technologies, we are stepping into an Interactome Big Data era that is certain to revolutionize genetic research. In this article, we provide a brief overview of current strategies to explore genetic mysteries. We then introduce the methods for constructing and analyzing the Interactome Big Data and summarize currently available interactome resources. Next, we discuss how Interactome Big Data can be used as a versatile tool to dissect genetic mysteries. We propose an integrated strategy that could revolutionize genetic research by combining Interactome Big Data with machine learning, which involves mining information hidden in Big Data to identify the genetic models or networks that control various traits, and also provide a detailed procedure for systematic dissection of genetic mysteries. Finally, we discuss three promising future breeding strategies utilizing the Interactome Big Data to improve crop yields and quality.
Affiliation(s)
- Leiming Wu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Linqian Han
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Qing Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Guoying Wang
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hongwei Zhang
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China.
25
Rivera J, Keränen SVE, Gallo SM, Halfon MS. REDfly: the transcriptional regulatory element database for Drosophila. Nucleic Acids Res 2020; 47:D828-D834. [PMID: 30329093 PMCID: PMC6323911 DOI: 10.1093/nar/gky957] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 09/06/2018] [Accepted: 10/04/2018] [Indexed: 12/21/2022] Open
Abstract
The REDfly database provides a comprehensive curation of experimentally-validated Drosophila transcriptional cis-regulatory elements and includes information on DNA sequence, experimental evidence, patterns of regulated gene expression, and more. Now in its thirteenth year, REDfly has grown to over 23 000 records of tested reporter gene constructs and 2200 tested transcription factor binding sites. Recent developments include the start of curation of predicted cis-regulatory modules in addition to experimentally-verified ones, improved search and filtering, and increased interaction with the authors of curated papers. An expanded data model that will capture information on temporal aspects of gene regulation, regulation in response to environmental and other non-developmental cues, sexually dimorphic gene regulation, and non-endogenous (ectopic) aspects of reporter gene expression is under development and expected to be in place within the coming year. REDfly is freely accessible at http://redfly.ccr.buffalo.edu, and news about database updates and new features can be followed on Twitter at @REDfly_database.
Affiliation(s)
- John Rivera
- Center for Computational Research, State University of New York at Buffalo, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Steven M Gallo
- Center for Computational Research, State University of New York at Buffalo, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Marc S Halfon
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biomedical Informatics, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Molecular and Cellular Biology and Program in Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
26
Pyne S, Kumar AR, Anand A. Rapid Reconstruction of Time-Varying Gene Regulatory Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:278-291. [PMID: 30072338 DOI: 10.1109/tcbb.2018.2861698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Rapid advancements in high-throughput technologies have resulted in genome-scale time series datasets. Uncovering the temporal sequence of gene regulatory events, in the form of time-varying gene regulatory networks (GRNs), demands computationally fast, accurate, and scalable algorithms. The existing algorithms can be divided into two categories: ones that are time-intensive and hence unscalable; and others that impose structural constraints to become scalable. In this paper, a novel algorithm, namely 'an algorithm for reconstructing Time-varying Gene regulatory networks with Shortlisted candidate regulators' (TGS), is proposed. TGS is time-efficient and does not impose any structural constraints. Moreover, it provides such flexibility and time-efficiency, without losing its accuracy. TGS consistently outperforms the state-of-the-art algorithms in true positive detection, on three benchmark synthetic datasets. However, TGS does not perform as well in false positive rejection. To mitigate this issue, TGS+ is proposed. TGS+ demonstrates competitive false positive rejection power, while maintaining the superior speed and true positive detection power of TGS. Nevertheless, the main memory requirements of both TGS variants grow exponentially with the number of genes, which they tackle by restricting the maximum number of regulators for each gene. Relaxing this restriction remains a challenge as the actual number of regulators is not known a priori.
27
Abstract
Inferring gene regulatory networks from expression data is a very challenging problem that has raised the interest of the scientific community. Different algorithms have been proposed to try to solve this issue, but it has been shown that different methods have some particular biases and strengths, and none of them is the best across all types of data and datasets. As a result, the idea of aggregating various network inferences through a consensus mechanism naturally arises. In this chapter, a common framework to standardize already proposed consensus methods is presented, and based on this framework different proposals are introduced and analyzed in two different scenarios: Homogeneous and Heterogeneous. The first scenario reflects situations where the networks to be aggregated are rather similar because they are obtained with inference algorithms working on the same data, whereas the second scenario deals with very diverse networks because various sources of data are used to generate the individual networks. A procedure for combining multiple network inference algorithms is analyzed in a systematic way. The results show that there is a very significant difference between these two scenarios, and that the best way to combine networks in the Heterogeneous scenario is not the most commonly used. We show in particular that aggregation in the Heterogeneous scenario can be very beneficial if the individual networks are combined with our new proposed method ScaleLSum.
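The idea of aggregating several inferred networks through a consensus mechanism can be illustrated with a minimal Python sketch. This is not the ScaleLSum method named in the abstract, whose details are not given here; it is a generic rank-averaging (Borda-style) consensus. The function name `rank_average_consensus`, the edge tuples, and the scores are all hypothetical.

```python
from collections import defaultdict


def rank_average_consensus(networks):
    """Combine edge-score dictionaries by averaging each edge's rank.

    `networks` is a list of dicts mapping (regulator, target) to a score,
    where a higher score means a more confident edge. An edge missing
    from a network is treated as having score 0 in that network.
    """
    all_edges = set()
    for net in networks:
        all_edges.update(net)

    rank_sums = defaultdict(float)
    for net in networks:
        # Strongest edge first; ties are broken by edge name so the
        # result is deterministic.
        ordered = sorted(all_edges, key=lambda e: (-net.get(e, 0.0), e))
        for rank, edge in enumerate(ordered, start=1):
            rank_sums[edge] += rank

    n = len(networks)
    # Lowest mean rank first, i.e. best consensus edge at the top.
    return sorted(
        ((edge, rank_sums[edge] / n) for edge in all_edges),
        key=lambda item: (item[1], item[0]),
    )


# Hypothetical scores from two inference algorithms.
net_a = {("TF1", "G1"): 0.9, ("TF1", "G2"): 0.2, ("TF2", "G1"): 0.5}
net_b = {("TF1", "G1"): 0.7, ("TF2", "G1"): 0.8}
consensus = rank_average_consensus([net_a, net_b])
```

Working on ranks rather than raw scores sidesteps the fact that different algorithms report scores on different scales, which is one reason rank-based schemes are a common baseline for this kind of aggregation.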
28
Ko Y, Kim J, Rodriguez-Zas SL. Markov chain Monte Carlo simulation of a Bayesian mixture model for gene network inference. Genes Genomics 2019; 41:547-555. [PMID: 30741379 DOI: 10.1007/s13258-019-00789-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/16/2018] [Accepted: 01/21/2019] [Indexed: 12/31/2022]
Abstract
BACKGROUND Simultaneous measurement of expression levels for thousands of genes provides rich information about many different aspects of biological mechanisms. A major computational challenge is to find methods to extract new biological insights from this wealth of data. Complex biological processes are often regulated under various conditions or circumstances, and the associated gene interactions change dynamically depending on the biological context. Thus, inference of such dynamic relationships between genes with consideration of biological conditions is very challenging. METHOD In this study, we propose a comprehensive and integrated approach to infer the dynamic relationships between genes and evaluate this approach on three distinct gene networks. RESULTS This study demonstrates the advantage of integrating Markov chain Monte Carlo (MCMC) simulation into a Bayesian mixture model to overcome the high-dimension, low sample size (HDLSS) problem as well as to identify context-specific biological modules. Such biological modules were identified through the summarization of sampled network structures obtained from MCMC simulation. CONCLUSION This novel approach gives a comprehensive understanding of the dynamically regulated biological modules.
Affiliation(s)
- Younhee Ko
- Division of Biomedical Engineering, Hankuk University of Foreign Studies, Gyeonggi-do, 17035, South Korea
- Jaebum Kim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea.
- Sandra L Rodriguez-Zas
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA.
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA.
29
Siahpirani AF, Chasman D, Roy S. Integrative Approaches for Inference of Genome-Scale Gene Regulatory Networks. Methods Mol Biol 2019; 1883:161-194. [PMID: 30547400 DOI: 10.1007/978-1-4939-8882-2_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/22/2022]
Abstract
Transcriptional regulatory networks specify the regulatory proteins of target genes that control the context-specific expression levels of genes. With our ability to profile the different types of molecular components of cells under different conditions, we are now uniquely positioned to infer regulatory networks in diverse biological contexts such as different cell types, tissues, and time points. In this chapter, we cover two main classes of computational methods to integrate different types of information to infer genome-scale transcriptional regulatory networks. The first class of methods focuses on integrative methods for specifically inferring connections between transcription factors and target genes by combining gene expression data with regulatory edge-specific knowledge. The second class of methods integrates upstream signaling networks with transcriptional regulatory networks by combining gene expression data with protein-protein interaction networks and proteomic datasets. We conclude with a section on practical applications of a network inference algorithm to infer a genome-scale regulatory network.
Affiliation(s)
- Alireza Fotuhi Siahpirani
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Deborah Chasman
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
30
Ng FSL, Ruau D, Wernisch L, Göttgens B. A graphical model approach visualizes regulatory relationships between genome-wide transcription factor binding profiles. Brief Bioinform 2018; 19:162-173. [PMID: 27780826 PMCID: PMC5496675 DOI: 10.1093/bib/bbw102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/08/2015] [Indexed: 11/16/2022] Open
Abstract
Integrated analysis of multiple genome-wide transcription factor (TF)-binding profiles will be vital to advance our understanding of the global impact of TF binding. However, existing methods for measuring similarity in large numbers of chromatin immunoprecipitation assays with sequencing (ChIP-seq), such as correlation, mutual information or enrichment analysis, are limited in their ability to display functionally relevant TF relationships. In this study, we propose the use of graphical models to determine conditional independence between TFs and showed that network visualization provides a promising alternative to distinguish ‘direct’ versus ‘indirect’ TF interactions. We applied four algorithms to measure ‘direct’ dependence to a compendium of 367 mouse haematopoietic TF ChIP-seq samples and obtained a consensus network known as a ‘TF association network’ where edges in the network corresponded to likely causal pairwise relationships between TFs. The ‘TF association network’ illustrates the role of TFs in developmental pathways, is reminiscent of combinatorial TF regulation, corresponds to known protein–protein interactions and indicates substantial TF-binding reorganization in leukemic cell types. With the rapid increase in TF ChIP-Seq data sets, the approach presented here will be a powerful tool to study transcriptional programmes across a wide range of biological systems.
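The distinction between 'direct' and 'indirect' association drawn in this abstract can be illustrated with a first-order partial correlation, one common measure of conditional dependence. The abstract does not name the four algorithms the authors applied, so the Python sketch below is only an illustration of the general principle; the variable names and toy binding signals are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy signals: tf_a and tf_b both follow tf_c plus unrelated noise,
# so their association is indirect (it is explained by tf_c).
tf_c = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise_a = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
noise_b = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5]
tf_a = [c + e for c, e in zip(tf_c, noise_a)]
tf_b = [c + e for c, e in zip(tf_c, noise_b)]

raw = pearson(tf_a, tf_b)                       # high marginal correlation
direct = partial_correlation(tf_a, tf_b, tf_c)  # near zero given tf_c
```

In this toy case the marginal correlation is high while the partial correlation is essentially zero, which is the pattern a graphical model would read as an indirect edge to be dropped.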
Affiliation(s)
- Felicia S L Ng
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
- David Ruau
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
- Lorenz Wernisch
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
- Berthold Göttgens
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
- Corresponding author: Berthold Gottgens, Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge CB2 0XY, UK. Tel: 01223-336829; Fax: 01223-762670; E-mail:
31
Wang Y, Cho DY, Lee H, Fear J, Oliver B, Przytycka TM. Reprogramming of regulatory network using expression uncovers sex-specific gene regulation in Drosophila. Nat Commun 2018; 9:4061. [PMID: 30283019 PMCID: PMC6170494 DOI: 10.1038/s41467-018-06382-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/24/2018] [Accepted: 08/13/2018] [Indexed: 02/07/2023] Open
Abstract
Gene regulatory networks (GRNs) describe regulatory relationships between transcription factors (TFs) and their target genes. Computational methods to infer GRNs typically combine evidence across different conditions to infer context-agnostic networks. We develop a method, Network Reprogramming using EXpression (NetREX), that constructs a context-specific GRN given context-specific expression data and a context-agnostic prior network. NetREX remodels the prior network to obtain the topology that provides the best explanation for expression data. Because NetREX utilizes prior network topology, we also develop PriorBoost, a method that evaluates a prior network in terms of its consistency with the expression data. We validate NetREX and PriorBoost using the "gold standard" E. coli GRN from the DREAM5 network inference challenge and apply them to construct sex-specific Drosophila GRNs. NetREX constructed sex-specific Drosophila GRNs that, on all applied measures, outperform networks obtained from other methods indicating that NetREX is an important milestone toward building more accurate GRNs.
Affiliation(s)
- Yijie Wang
- National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA
- Dong-Yeon Cho
- National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA
- Hangnoh Lee
- Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Justin Fear
- Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Brian Oliver
- Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA.
- Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA.
32
Corso-Díaz X, Jaeger C, Chaitankar V, Swaroop A. Epigenetic control of gene regulation during development and disease: A view from the retina. Prog Retin Eye Res 2018; 65:1-27. [PMID: 29544768 PMCID: PMC6054546 DOI: 10.1016/j.preteyeres.2018.03.002] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 09/19/2017] [Revised: 02/01/2018] [Accepted: 03/08/2018] [Indexed: 12/20/2022]
Abstract
Complex biological processes, such as organogenesis and homeostasis, are stringently regulated by genetic programs that are fine-tuned by epigenetic factors to establish cell fates and/or to respond to the microenvironment. Gene regulatory networks that guide cell differentiation and function are modulated and stabilized by modifications to DNA, RNA and proteins. In this review, we focus on two key epigenetic changes - DNA methylation and histone modifications - and discuss their contribution to retinal development, aging and disease, especially in the context of age-related macular degeneration (AMD) and diabetic retinopathy. We highlight less-studied roles of DNA methylation and provide the RNA expression profiles of epigenetic enzymes in human and mouse retina in comparison to other tissues. We also review computational tools and emergent technologies to profile, analyze and integrate epigenetic information. We suggest implementation of editing tools and single-cell technologies to trace and perturb the epigenome for delineating its role in transcriptional regulation. Finally, we present our thoughts on exciting avenues for exploring epigenome in retinal metabolism, disease modeling, and regeneration.
Affiliation(s)
- Ximena Corso-Díaz
- Neurobiology-Neurodegeneration & Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Catherine Jaeger
- Neurobiology-Neurodegeneration & Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Vijender Chaitankar
- Neurobiology-Neurodegeneration & Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Anand Swaroop
- Neurobiology-Neurodegeneration & Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, MD, 20892, USA.
33
Liu ZP. Towards precise reconstruction of gene regulatory networks by data integration. QUANTITATIVE BIOLOGY 2018. [DOI: 10.1007/s40484-018-0139-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
34
Yang B, Wittkopp PJ. Structure of the Transcriptional Regulatory Network Correlates with Regulatory Divergence in Drosophila. Mol Biol Evol 2017; 34:1352-1362. [PMID: 28333240 DOI: 10.1093/molbev/msx068] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 01/09/2023] Open
Abstract
Transcriptional control of gene expression is regulated by biochemical interactions between cis-regulatory DNA sequences and trans-acting factors that form complex regulatory networks. Genetic changes affecting both cis- and trans-acting sequences in these networks have been shown to alter patterns of gene expression as well as higher-order organismal phenotypes. Here, we investigate how the structure of these regulatory networks relates to patterns of polymorphism and divergence in gene expression. To do this, we compared a transcriptional regulatory network inferred for Drosophila melanogaster to differences in gene regulation observed between two strains of D. melanogaster as well as between two pairs of closely related species: Drosophila sechellia and Drosophila simulans, and D. simulans and D. melanogaster. We found that the number of transcription factors predicted to directly regulate a gene ("in-degree") was negatively correlated with divergence in both gene expression (mRNA abundance) and cis-regulation. This observation suggests that the number of transcription factors directly regulating a gene's expression affects the conservation of cis-regulation and gene expression over evolutionary time. We also tested the hypothesis that transcription factors regulating more target genes (higher "out-degree") are less likely to evolve changes in their cis-regulation and expression (presumably due to increased pleiotropy), but found little support for this predicted relationship. Taken together, these data show how the architecture of regulatory networks can influence regulatory evolution.
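The two network quantities named in this abstract, "in-degree" (number of transcription factors regulating a gene) and "out-degree" (number of targets a transcription factor regulates), can be illustrated with a minimal Python sketch. The edge list and gene names below are hypothetical placeholders, not data from the study, and the function is an illustration rather than the authors' pipeline.

```python
from collections import Counter


def degree_counts(edges):
    """Return (in_degree, out_degree) Counters for directed TF -> target edges."""
    in_degree = Counter()
    out_degree = Counter()
    for tf, target in set(edges):  # set() ignores duplicated edges
        out_degree[tf] += 1        # the TF regulates one more target
        in_degree[target] += 1     # the target has one more regulator
    return in_degree, out_degree


# Hypothetical toy network (assumed names, for illustration only).
toy_edges = [("tfA", "gene1"), ("tfB", "gene1"), ("tfA", "gene2"), ("tfB", "tfA")]
in_deg, out_deg = degree_counts(toy_edges)
```

Correlating `in_deg` with a per-gene divergence measure would then require expression data, which this sketch does not attempt to model.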
Affiliation(s)
- Bing Yang: Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI
- Patricia J Wittkopp: Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI; Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
|
35
|
Kang Y, Liow HH, Maier EJ, Brent MR. NetProphet 2.0: mapping transcription factor networks by exploiting scalable data resources. Bioinformatics 2017; 34:249-257. [PMID: 28968736 PMCID: PMC5860202 DOI: 10.1093/bioinformatics/btx563] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/14/2017] [Revised: 03/14/2017] [Accepted: 09/11/2017] [Indexed: 11/15/2022]
Abstract
Motivation: Cells process information, in part, through transcription factor (TF) networks, which control the rates at which individual genes produce their products. A TF network map is a graph that indicates which TFs bind and directly regulate each gene. Previous work has described network mapping algorithms that rely exclusively on gene expression data and ‘integrative’ algorithms that exploit a wide range of data sources including chromatin immunoprecipitation sequencing (ChIP-seq) of many TFs, genome-wide chromatin marks, and binding specificities for many TFs determined in vitro. However, such resources are available only for a few major model systems and cannot be easily replicated for new organisms or cell types. Results: We present NetProphet 2.0, a ‘data light’ algorithm for TF network mapping, and show that it is more accurate at identifying direct targets of TFs than other, similarly data light algorithms. In particular, it improves on the accuracy of NetProphet 1.0, which used only gene expression data, by exploiting three principles. First, combining multiple approaches to network mapping from expression data can improve accuracy relative to the constituent approaches. Second, TFs with similar DNA binding domains bind similar sets of target genes. Third, even a noisy, preliminary network map can be used to infer DNA binding specificities from promoter sequences and these inferred specificities can be used to further improve the accuracy of the network map. Availability and implementation: Source code and comprehensive documentation are freely available at https://github.com/yiming-kang/NetProphet_2.0. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yiming Kang: Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
- Hien-Haw Liow: Department of Mathematics, Washington University, Saint Louis, MO, USA
- Ezekiel J Maier: Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
- Michael R Brent: Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
|
36
|
Inference and interrogation of a coregulatory network in the context of lipid accumulation in Yarrowia lipolytica. NPJ Syst Biol Appl 2017; 3:21. [PMID: 28955503 PMCID: PMC5554221 DOI: 10.1038/s41540-017-0024-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 03/17/2017] [Revised: 07/07/2017] [Accepted: 07/13/2017] [Indexed: 12/14/2022]
Abstract
Complex phenotypes, such as lipid accumulation, result from cooperativity between regulators and the integration of multiscale information. However, the elucidation of such regulatory programs by experimental approaches may be challenging, particularly in context-specific conditions. In particular, we know very little about the regulators of lipid accumulation in the oleaginous yeast of industrial interest Yarrowia lipolytica. This lack of knowledge limits the development of this yeast as an industrial platform, due to the time-consuming and costly laboratory efforts required to design strains with the desired phenotypes. In this study, we aimed to identify context-specific regulators and mechanisms, to guide explorations of the regulation of lipid accumulation in Y. lipolytica. Using gene regulatory network inference, and considering the expression of 6539 genes over 26 time points from GSE35447 for biolipid production and a list of 151 transcription factors, we reconstructed a gene regulatory network comprising 111 transcription factors, 4451 target genes and 17048 regulatory interactions (YL-GRN-1) supported by evidence of protein-protein interactions. This study, based on network interrogation and wet laboratory validation (a) highlights the relevance of our proposed measure, the transcription factors influence, for identifying phases corresponding to changes in physiological state without prior knowledge (b) suggests new potential regulators and drivers of lipid accumulation and
|
37
|
Reverse engineering highlights potential principles of large gene regulatory network design and learning. NPJ Syst Biol Appl 2017. [PMID: 28649444 PMCID: PMC5481436 DOI: 10.1038/s41540-017-0019-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022]
Abstract
Inferring transcriptional gene regulatory networks from transcriptomic datasets is a key challenge of systems biology, with potential impacts ranging from medicine to agronomy. Several techniques are presently used to experimentally assay transcription factor-to-target relationships, providing important information about real gene regulatory network connections. These techniques include classical ChIP-seq, yeast one-hybrid, or more recently, DAP-seq or target technologies. These techniques are usually used to validate algorithm predictions. Here, we developed a reverse engineering approach based on mathematical and computer simulation to evaluate the impact that this prior knowledge on gene regulatory networks may have on training machine learning algorithms. First, we developed a gene regulatory network-simulating engine called FRANK (Fast Randomizing Algorithm for Network Knowledge) that is able to simulate large gene regulatory networks (containing 10^4 genes) with characteristics of gene regulatory networks observed in vivo. FRANK also generates stable or oscillatory gene expression directly produced by the simulated gene regulatory networks. The development of FRANK leads to important general conclusions concerning the design of large and stable gene regulatory networks harboring scale-free properties (built ex nihilo). In combination with a supervised (accepting prior knowledge) support vector machine algorithm, we (i) address biologically oriented questions concerning our capacity to accurately reconstruct gene regulatory networks, and in particular we demonstrate that prior-knowledge structure is crucial for accurate learning, and (ii) draw conclusions to inform experimental design to perform learning able to solve gene regulatory networks in the future.
By demonstrating that our predictions concerning the influence of the prior-knowledge structure on support vector machine learning capacity hold true on real data (Escherichia coli K14 network reconstruction using network and transcriptomic data), we show that the formalism used to build FRANK can to some extent be a reasonable model for gene regulatory networks in real cells. This work by Carré et al. addresses central questions in biology: how are very large gene regulatory networks (GRNs) organized, how do they generate stable gene expression, and can they be learnt using machine learning algorithms? In this work, the authors developed an algorithm able to simulate large GRNs. From these networks they simulate stable or oscillating gene expression and highlight some mathematical rules controlling such collective behavior (several thousands of genes). They discuss the consequent hypotheses concerning the organization of GRNs in real cells. Using this simulation tool, the authors also demonstrate that it is likely possible to computationally learn GRNs from transcriptomic data and prior knowledge on the network (actual known connections obtained from yeast one-hybrid or ChIP-seq, for instance). They particularly highlight the crucial importance of the prior-knowledge structure in their capacity to learn large GRNs.
|
38
|
Banf M, Rhee SY. Enhancing gene regulatory network inference through data integration with markov random fields. Sci Rep 2017; 7:41174. [PMID: 28145456 PMCID: PMC5286517 DOI: 10.1038/srep41174] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 10/04/2016] [Accepted: 12/16/2016] [Indexed: 02/06/2023]
Abstract
A gene regulatory network links transcription factors to their target genes and represents a map of transcriptional regulation. Much progress has been made in deciphering gene regulatory networks computationally. However, gene regulatory network inference for most eukaryotic organisms remains challenging. To improve the accuracy of gene regulatory network inference and facilitate candidate selection for experimentation, we developed an algorithm called GRACE (Gene Regulatory network inference ACcuracy Enhancement). GRACE exploits biological a priori and heterogeneous data integration to generate high-confidence network predictions for eukaryotic organisms using Markov Random Fields in a semi-supervised fashion. GRACE uses a novel optimization scheme to integrate regulatory evidence and biological relevance. It is particularly suited for model learning with sparse regulatory gold standard data. We show GRACE’s potential to produce high-confidence regulatory networks compared to state-of-the-art approaches using Drosophila melanogaster and Arabidopsis thaliana data. In an A. thaliana developmental gene regulatory network, GRACE recovers cell cycle-related regulatory mechanisms and further hypothesizes several novel regulatory links, including a putative control mechanism of vascular structure formation due to modifications in cell proliferation.
Affiliation(s)
- Michael Banf: Department of Plant Biology, Carnegie Institution for Science, 93405 Stanford, USA
- Seung Y Rhee: Department of Plant Biology, Carnegie Institution for Science, 93405 Stanford, USA
|
39
|
Qin J, Yan B, Hu Y, Wang P, Wang J. Applications of integrative OMICs approaches to gene regulation studies. QUANTITATIVE BIOLOGY 2016. [DOI: 10.1007/s40484-016-0085-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
40
|
Banf M, Rhee SY. Computational inference of gene regulatory networks: Approaches, limitations and opportunities. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2016; 1860:41-52. [PMID: 27641093 DOI: 10.1016/j.bbagrm.2016.09.003] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 04/13/2016] [Revised: 09/08/2016] [Accepted: 09/08/2016] [Indexed: 10/21/2022]
Abstract
Gene regulatory networks lie at the core of cell function control. In E. coli and S. cerevisiae, the study of gene regulatory networks has led to the discovery of regulatory mechanisms responsible for the control of cell growth, differentiation and responses to environmental stimuli. In plants, computational rendering of gene regulatory networks is gaining momentum, thanks to the recent availability of high-quality genomes and transcriptomes and development of computational network inference approaches. Here, we review current techniques, challenges and trends in gene regulatory network inference and highlight challenges and opportunities for plant science. We provide plant-specific application examples to guide researchers in selecting methodologies that suit their particular research questions. Given the interdisciplinary nature of gene regulatory network inference, we tried to cater to both biologists and computer scientists to help them engage in a dialogue about concepts and caveats in network inference. Specifically, we discuss problems and opportunities in heterogeneous data integration for eukaryotic organisms and common caveats to be considered during network model evaluation. This article is part of a Special Issue entitled: Plant Gene Regulatory Mechanisms and Networks, edited by Dr. Erich Grotewold and Dr. Nathan Springer.
Affiliation(s)
- Michael Banf: Department of Plant Biology, Carnegie Institution for Science, 260 Panama Street, Stanford 93405, United States
- Seung Y Rhee: Department of Plant Biology, Carnegie Institution for Science, 260 Panama Street, Stanford 93405, United States
|
41
|
Utilizing Regulatory Networks for Pluripotency Assessment in Stem Cells. CURRENT STEM CELL REPORTS 2016. [DOI: 10.1007/s40778-016-0054-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
42
|
Van de Velde J, Van Bel M, Vaneechoutte D, Vandepoele K. A Collection of Conserved Noncoding Sequences to Study Gene Regulation in Flowering Plants. PLANT PHYSIOLOGY 2016; 171:2586-98. [PMID: 27261064 PMCID: PMC4972296 DOI: 10.1104/pp.16.00821] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 05/18/2016] [Accepted: 05/31/2016] [Indexed: 05/03/2023]
Abstract
Transcription factors (TFs) regulate gene expression by binding cis-regulatory elements, of which the identification remains an ongoing challenge owing to the prevalence of large numbers of nonfunctional TF binding sites. Powerful comparative genomics methods, such as phylogenetic footprinting, can be used for the detection of conserved noncoding sequences (CNSs), which are functionally constrained and can greatly help in reducing the number of false-positive elements. In this study, we applied a phylogenetic footprinting approach for the identification of CNSs in 10 dicot plants, yielding 1,032,291 CNSs associated with 243,187 genes. To annotate CNSs with TF binding sites, we made use of binding site information for 642 TFs originating from 35 TF families in Arabidopsis (Arabidopsis thaliana). In three species, the identified CNSs were evaluated using TF chromatin immunoprecipitation sequencing data, resulting in significant overlap for the majority of data sets. To identify ultraconserved CNSs, we included genomes of additional plant families and identified 715 binding sites for 501 genes conserved in dicots, monocots, mosses, and green algae. Additionally, we found that genes that are part of conserved mini-regulons have a higher coherence in their expression profile than other divergent gene pairs. All identified CNSs were integrated in the PLAZA 3.0 Dicots comparative genomics platform (http://bioinformatics.psb.ugent.be/plaza/versions/plaza_v3_dicots/) together with new functionalities facilitating the exploration of conserved cis-regulatory elements and their associated genes. The availability of this data set in a user-friendly platform enables the exploration of functional noncoding DNA to study gene regulation in a variety of plant species, including crops.
Affiliation(s)
- Jan Van de Velde: Department of Plant Systems Biology, Vlaams Instituut voor Biotechnologie, B-9052 Ghent, Belgium (J.V.d.V., M.V.B., D.V., K.V.); and Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium (J.V.d.V., M.V.B., D.V., K.V.)
- Michiel Van Bel: Department of Plant Systems Biology, Vlaams Instituut voor Biotechnologie, B-9052 Ghent, Belgium (J.V.d.V., M.V.B., D.V., K.V.); and Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium (J.V.d.V., M.V.B., D.V., K.V.)
- Dries Vaneechoutte: Department of Plant Systems Biology, Vlaams Instituut voor Biotechnologie, B-9052 Ghent, Belgium (J.V.d.V., M.V.B., D.V., K.V.); and Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium (J.V.d.V., M.V.B., D.V., K.V.)
- Klaas Vandepoele: Department of Plant Systems Biology, Vlaams Instituut voor Biotechnologie, B-9052 Ghent, Belgium (J.V.d.V., M.V.B., D.V., K.V.); and Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium (J.V.d.V., M.V.B., D.V., K.V.)
|
43
|
Mostovoy Y, Thiemicke A, Hsu TY, Brem RB. The Role of Transcription Factors at Antisense-Expressing Gene Pairs in Yeast. Genome Biol Evol 2016; 8:1748-61. [PMID: 27190003 PMCID: PMC4943177 DOI: 10.1093/gbe/evw104] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Abstract
Genes encoded close to one another on the chromosome are often coexpressed, by a mechanism and regulatory logic that remain poorly understood. We surveyed the yeast genome for tandem gene pairs oriented tail-to-head at which expression antisense to the upstream gene was conserved across species. The intergenic region at most such tandem pairs is a bidirectional promoter, shared by the downstream gene mRNA and the upstream antisense transcript. Genomic analyses of these intergenic loci revealed distinctive patterns of transcription factor regulation. Mutation of a given transcription factor verified its role as a regulator in trans of tandem gene pair loci, including the proximally initiating upstream antisense transcript and downstream mRNA and the distally initiating upstream mRNA. To investigate cis-regulatory activity at such a locus, we focused on the stress-induced NAD(P)H dehydratase YKL151C and its downstream neighbor, the metabolic enzyme GPM1. Previous work has implicated the region between these genes in regulation of GPM1 expression; our mutation experiments established its function in rich medium as a repressor in cis of the distally initiating YKL151C sense RNA, and an activator of the proximally initiating YKL151C antisense RNA. Wild-type expression of all three transcripts required the transcription factor Gcr2. Thus, at this locus, the intergenic region serves as a focal point of regulatory input, driving antisense expression and mediating the coordinated regulation of YKL151C and GPM1. Together, our findings implicate transcription factors in the joint control of neighboring genes specialized to opposing conditions and the antisense transcripts expressed between them.
Affiliation(s)
- Yulia Mostovoy: Department of Molecular and Cell Biology, University of California, Berkeley, California; Present address: Cardiovascular Research Institute, University of California, San Francisco, CA
- Alexander Thiemicke: Department of Molecular and Cell Biology, University of California, Berkeley, California; Program in Molecular Medicine, Friedrich-Schiller-Universität, Jena, Germany; Present address: Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN
- Tiffany Y Hsu: Department of Molecular and Cell Biology, University of California, Berkeley, California; Present address: Graduate Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA
- Rachel B Brem: Department of Molecular and Cell Biology, University of California, Berkeley, California; Present address: Buck Institute for Research on Aging, Novato, CA
|
44
|
Chaitankar V, Karakülah G, Ratnapriya R, Giuste FO, Brooks MJ, Swaroop A. Next generation sequencing technology and genomewide data analysis: Perspectives for retinal research. Prog Retin Eye Res 2016; 55:1-31. [PMID: 27297499 DOI: 10.1016/j.preteyeres.2016.06.001] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 03/15/2016] [Revised: 06/06/2016] [Accepted: 06/08/2016] [Indexed: 02/08/2023]
Abstract
The advent of high throughput next generation sequencing (NGS) has accelerated the pace of discovery of disease-associated genetic variants and genomewide profiling of expressed sequences and epigenetic marks, thereby permitting systems-based analyses of ocular development and disease. Rapid evolution of NGS and associated methodologies presents significant challenges in acquisition, management, and analysis of large data sets and for extracting biologically or clinically relevant information. Here we illustrate the basic design of commonly used NGS-based methods, specifically whole exome sequencing, transcriptome, and epigenome profiling, and provide recommendations for data analyses. We briefly discuss systems biology approaches for integrating multiple data sets to elucidate gene regulatory or disease networks. While we provide examples from the retina, the NGS guidelines reviewed here are applicable to other tissues/cell types as well.
Affiliation(s)
- Vijender Chaitankar: Neurobiology-Neurodegeneration & Repair Laboratory, National Eye Institute, National Institutes of Health, 6 Center Drive, Bethesda, MD, 20892-0610, USA
- Gökhan Karakülah: Neurobiology-Neurodegeneration & Repair Laboratory, National Eye Institute, National Institutes of Health, 6 Center Drive, Bethesda, MD, 20892-0610, USA
- Rinki Ratnapriya: Neurobiology-Neurodegeneration & Repair Laboratory, National Eye Institute, National Institutes of Health, 6 Center Drive, Bethesda, MD, 20892-0610, USA
- Felipe O Giuste: Neurobiology-Neurodegeneration & Repair Laboratory, National Eye Institute, National Institutes of Health, 6 Center Drive, Bethesda, MD, 20892-0610, USA
- Matthew J Brooks: Neurobiology-Neurodegeneration & Repair Laboratory, National Eye Institute, National Institutes of Health, 6 Center Drive, Bethesda, MD, 20892-0610, USA
- Anand Swaroop: Neurobiology-Neurodegeneration & Repair Laboratory, National Eye Institute, National Institutes of Health, 6 Center Drive, Bethesda, MD, 20892-0610, USA
|
45
|
Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat Methods 2016; 13:366-70. [PMID: 26950747 DOI: 10.1038/nmeth.3799] [Citation(s) in RCA: 200] [Impact Index Per Article: 25.0] [Received: 07/23/2015] [Accepted: 01/26/2016] [Indexed: 12/22/2022]
Abstract
Mapping perturbed molecular circuits that underlie complex diseases remains a great challenge. We developed a comprehensive resource of 394 cell type- and tissue-specific gene regulatory networks for human, each specifying the genome-wide connectivity among transcription factors, enhancers, promoters and genes. Integration with 37 genome-wide association studies (GWASs) showed that disease-associated genetic variants, including variants that do not reach genome-wide significance, often perturb regulatory modules that are highly specific to disease-relevant cell types or tissues. Our resource opens the door to systematic analysis of regulatory programs across hundreds of human cell types and tissues (http://regulatorycircuits.org).
|
46
|
Joshi A, Beck Y, Michoel T. Multi-species network inference improves gene regulatory network reconstruction for early embryonic development in Drosophila. J Comput Biol 2016; 22:253-65. [PMID: 25844666 DOI: 10.1089/cmb.2014.0290] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 01/15/2023]
Abstract
Gene regulatory network inference uses genome-wide transcriptome measurements in response to genetic, environmental, or dynamic perturbations to predict causal regulatory influences between genes. We hypothesized that evolution also acts as a suitable network perturbation and that integration of data from multiple closely related species can lead to improved reconstruction of gene regulatory networks. To test this hypothesis, we predicted networks from temporal gene expression data for 3,610 genes measured during early embryonic development in six Drosophila species and compared predicted networks to gold standard networks of ChIP-chip and ChIP-seq interactions for developmental transcription factors in five species. We found that (i) the performance of single-species networks was independent of the species where the gold standard was measured; (ii) differences between predicted networks reflected the known phylogeny and differences in biology between the species; (iii) an integrative consensus network that minimized the total number of edge gains and losses with respect to all single-species networks performed better than any individual network. Our results show that in an evolutionarily conserved system, integration of data from comparable experiments in multiple species improves the inference of gene regulatory networks. They provide a basis for future studies on the numerous multispecies gene expression datasets for other biological processes available in the literature.
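The consensus criterion in point (iii) can be illustrated with a small Python sketch. It assumes unweighted networks represented as sets of directed edges, in which case an edge-wise majority vote minimizes the total number of edge gains and losses. The species networks below are hypothetical, and the study's actual procedure may differ (for example, by using weighted or ranked edges).

```python
from collections import Counter


def consensus_network(networks):
    """Keep each edge that appears in more than half of the input networks.

    For unweighted edge sets, this edge-wise majority vote minimizes the
    summed symmetric-difference size between the consensus and the inputs.
    """
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge for edge, c in counts.items() if 2 * c > len(networks)}


def total_edge_changes(candidate, networks):
    """Total edge gains plus losses of `candidate` relative to every input."""
    return sum(len(set(candidate) ^ set(net)) for net in networks)


# Hypothetical single-species networks (placeholder gene names).
species_nets = [
    {("a", "b"), ("b", "c")},
    {("a", "b")},
    {("a", "b"), ("c", "d")},
]
consensus = consensus_network(species_nets)
cost = total_edge_changes(consensus, species_nets)
```

In this toy case only the edge shared by all three networks survives, and no single input network achieves a lower total cost than the consensus.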
Affiliation(s)
- Anagha Joshi: Division of Developmental Biology, The Roslin Institute, The University of Edinburgh, Midlothian, Scotland, United Kingdom
|
47
|
Ong E, Szedlak A, Kang Y, Smith P, Smith N, McBride M, Finlay D, Vuori K, Mason J, Ball ED, Piermarocchi C, Paternostro G. A scalable method for molecular network reconstruction identifies properties of targets and mutations in acute myeloid leukemia. J Comput Biol 2016; 22:266-88. [PMID: 25844667 DOI: 10.1089/cmb.2014.0297] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/11/2022]
Abstract
A key aim of systems biology is the reconstruction of molecular networks. We do not yet, however, have networks that integrate information from all datasets available for a particular clinical condition. This is in part due to the limited scalability, in terms of required computational time and power, of existing algorithms. Network reconstruction methods should also be scalable in the sense of allowing scientists from different backgrounds to efficiently integrate additional data. We present a network model of acute myeloid leukemia (AML). In the current version (AML 2.1), we have used gene expression data (both microarray and RNA-seq) from 5 different studies comprising a total of 771 AML samples and a protein-protein interactions dataset. Our scalable network reconstruction method is in part based on the well-known property of gene expression correlation among interacting molecules. The difficulty of distinguishing between direct and indirect interactions is addressed by optimizing the coefficient of variation of gene expression, using a validated gold-standard dataset of direct interactions. Computational time is much reduced compared to other network reconstruction methods. A key feature is the study of the reproducibility of interactions found in independent clinical datasets. An analysis of the most significant clusters, and of the network properties (intraset efficiency, degree, betweenness centrality, and PageRank) of common AML mutations demonstrated the biological significance of the network. A statistical analysis of the response of blast cells from 11 AML patients to a library of kinase inhibitors provided an experimental validation of the network. A combination of network and experimental data identified CDK1, CDK2, CDK4, and CDK6 and other kinases as potential therapeutic targets in AML.
|
48
|
Experimental and Computational Considerations in the Study of RNA-Binding Protein-RNA Interactions. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2016; 907:1-28. [PMID: 27256380 DOI: 10.1007/978-3-319-29073-7_1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/12/2022]
Abstract
After an RNA is transcribed, it undergoes a variety of processing steps that can change the encoded protein sequence (through alternative splicing and RNA editing), regulate the stability of the RNA, and control subcellular localization, timing, and rate of translation. The recent explosion in genomics techniques has enabled transcriptome-wide profiling of RNA processing in an unbiased manner. However, it has also brought with it both experimental challenges in developing improved methods to probe distinct processing steps, as well as computational challenges in data storage, processing, and analysis tools to enable large-scale interpretation in the genomics era. In this chapter we review experimental techniques and challenges in profiling various aspects of RNA processing, as well as recent efforts to develop analyses integrating multiple data sources and techniques to infer RNA regulatory networks.
|
49
|
Roy S, Siahpirani AF, Chasman D, Knaack S, Ay F, Stewart R, Wilson M, Sridharan R. A predictive modeling approach for cell line-specific long-range regulatory interactions. Nucleic Acids Res 2015; 43:8694-712. [PMID: 26338778 PMCID: PMC4605315 DOI: 10.1093/nar/gkv865] [Citation(s) in RCA: 76] [Impact Index Per Article: 8.4] [Received: 05/31/2015] [Revised: 08/16/2015] [Accepted: 08/17/2015] [Indexed: 01/28/2023]
Abstract
Long-range regulatory interactions among distal enhancers and target genes are important for tissue-specific gene expression. Genome-scale identification of these interactions in a cell line-specific manner, especially using the fewest possible datasets, is a significant challenge. We develop a novel computational approach, Regulatory Interaction Prediction for Promoters and Long-range Enhancers (RIPPLE), that integrates published Chromosome Conformation Capture (3C) data sets with a minimal set of regulatory genomic data sets to predict enhancer-promoter interactions in a cell line-specific manner. Our results suggest that CTCF, RAD21, a general transcription factor (TBP) and activating chromatin marks are important determinants of enhancer-promoter interactions. To predict interactions in a new cell line and to generate genome-wide interaction maps, we develop an ensemble version of RIPPLE and apply it to generate interactions in five human cell lines. Computational validation of these predictions using existing ChIA-PET and Hi-C data sets showed that RIPPLE accurately predicts interactions among enhancers and promoters. Enhancer-promoter interactions tend to be organized into subnetworks representing coordinately regulated sets of genes that are enriched for specific biological processes and cis-regulatory elements. Overall, our work provides a systematic approach to predict and interpret enhancer-promoter interactions in a genome-wide, cell type-specific manner using a few experimentally tractable measurements.
Affiliation(s)
- Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Deborah Chasman
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Sara Knaack
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Ferhat Ay
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Ron Stewart
- Morgridge Institute for Research, Madison, WI 53715, USA
- Michael Wilson
- Genetics & Genome Biology Program, Hospital for Sick Children (SickKids) and Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, ON, Canada
- Rupa Sridharan
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Department of Cell and Regenerative Biology, University of Wisconsin, Madison, WI 53715, USA
|
50
|
Bellot P, Olsen C, Salembier P, Oliveras-Vergés A, Meyer PE. NetBenchmark: a bioconductor package for reproducible benchmarks of gene regulatory network inference. BMC Bioinformatics 2015; 16:312. [PMID: 26415849 PMCID: PMC4587916 DOI: 10.1186/s12859-015-0728-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 03/18/2015] [Accepted: 09/06/2015] [Indexed: 11/19/2022]
Abstract
BACKGROUND In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow those methods to be evaluated accurately and reproducibly. Hence, we propose a new tool that performs a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. RESULTS Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. CONCLUSIONS The benchmarking framework, which uses various datasets, highlights the specialization of some methods toward particular network types and data. As a result, it is possible to identify the techniques that perform well overall.
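The kind of evaluation this abstract describes, comparing ranked edge predictions from several inference methods against a known network, can be sketched in a few lines of Python. This is only an illustrative sketch and not the NetBenchmark API (which is an R/Bioconductor package): the function name `precision_recall_at_k`, the method labels, the edge scores, and the choice to treat edges as undirected are all assumptions.

```python
def precision_recall_at_k(scored_edges, gold_edges, k):
    """Precision and recall of the k highest-scoring predicted edges.

    scored_edges: list of ((gene_a, gene_b), score) pairs.
    gold_edges:   list of (gene_a, gene_b) pairs in the reference network.
    Edges are compared as unordered pairs (an assumption of this sketch).
    """
    gold = {frozenset(edge) for edge in gold_edges}
    ranked = sorted(scored_edges, key=lambda item: item[1], reverse=True)[:k]
    hits = sum(1 for edge, _ in ranked if frozenset(edge) in gold)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold-standard network and predictions from two made-up methods.
gold = [("A", "B"), ("B", "C"), ("C", "D")]
predictions = {
    "method_x": [(("A", "B"), 0.9), (("C", "B"), 0.8), (("A", "D"), 0.7), (("C", "D"), 0.2)],
    "method_y": [(("A", "D"), 0.9), (("A", "C"), 0.6), (("A", "B"), 0.5), (("B", "D"), 0.1)],
}

results = {
    name: precision_recall_at_k(edges, gold, k=2)
    for name, edges in predictions.items()
}
for name, (precision, recall) in results.items():
    print(f"{name}: precision@2={precision:.2f} recall@2={recall:.2f}")
```

Repeating such a comparison across simulated networks with different topologies, sample sizes, and noise levels is what lets a benchmark separate methods that are specialized from those that are broadly reliable.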
Affiliation(s)
- Pau Bellot
- Universitat Politecnica de Catalunya BarcelonaTECH, Department of Signal Theory and Communications, UPC-Campus Nord, C/ Jordi Girona, 1-3, Barcelona, 08034, Spain.
- Bioinformatics and Systems Biology (BioSys), Faculty of Sciences, Université de Liège (ULg), 27 Blvd du Rectorat, Liège, 4000, Belgium.
- Catharina Olsen
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium.
- Interuniversity Institute of Bioinformatics Brussels, (IB)², Brussels, Belgium.
- Philippe Salembier
- Universitat Politecnica de Catalunya BarcelonaTECH, Department of Signal Theory and Communications, UPC-Campus Nord, C/ Jordi Girona, 1-3, Barcelona, 08034, Spain.
- Albert Oliveras-Vergés
- Universitat Politecnica de Catalunya BarcelonaTECH, Department of Signal Theory and Communications, UPC-Campus Nord, C/ Jordi Girona, 1-3, Barcelona, 08034, Spain.
- Patrick E Meyer
- Bioinformatics and Systems Biology (BioSys), Faculty of Sciences, Université de Liège (ULg), 27 Blvd du Rectorat, Liège, 4000, Belgium.
|