1
Wu W, Ma X, Wang Q, Gong M, Gao Q. Learning deep representation and discriminative features for clustering of multi-layer networks. Neural Netw 2024; 170:405-416. [PMID: 38029721 DOI: 10.1016/j.neunet.2023.11.053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 09/29/2023] [Accepted: 11/22/2023] [Indexed: 12/01/2023]
Abstract
The multi-layer network consists of the interactions between different layers, where each layer of the network is depicted as a graph, providing a comprehensive way to model the underlying complex systems. The layer-specific modules of multi-layer networks are critical to understanding the structure and function of the system. However, existing methods fail to characterize and balance the connectivity and specificity of layer-specific modules in networks because of the complicated inter- and intra-coupling of various layers. To address the above issues, a joint learning graph clustering algorithm (DRDF) for detecting layer-specific modules in multi-layer networks is proposed, which simultaneously learns the deep representation and discriminative features. Specifically, DRDF learns the deep representation with deep nonnegative matrix factorization, where the high-order topology of the multi-layer network is gradually and precisely characterized. Moreover, it addresses the specificity of modules with discriminative feature learning, where the intra-class compactness and inter-class separation of pseudo-labels of clusters are explored as self-supervised information, thereby providing a more accurate method to explicitly model the specificity of the multi-layer network. Finally, DRDF balances the connectivity and specificity of layer-specific modules with joint learning, where the overall objective of the graph clustering algorithm and optimization rules are derived. The experiments on ten multi-layer networks showed that DRDF not only outperforms eight baselines on graph clustering but also enhances the robustness of algorithms.
Affiliation(s)
- Wenming Wu
  - School of Computer Science and Technology, Xidian University, No. 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
- Xiaoke Ma
  - School of Computer Science and Technology, Xidian University, No. 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
- Quan Wang
  - School of Computer Science and Technology, Xidian University, No. 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
- Maoguo Gong
  - School of Electronic Engineering, Xidian University, No. 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
- Quanxue Gao
  - School of Telecommunication, Xidian University, No. 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
2
Rai MN, Rai R, Sethiya P, Parsania C. Transcriptome analysis reveals a common adaptive transcriptional response of Candida glabrata to diverse environmental stresses. Res Microbiol 2023:104073. [PMID: 37100335 DOI: 10.1016/j.resmic.2023.104073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Revised: 04/10/2023] [Accepted: 04/17/2023] [Indexed: 04/28/2023]
Abstract
Candida glabrata, an opportunistic fungal pathogen, causes superficial and life-threatening infections in humans. In the host microenvironment, C. glabrata encounters a variety of stresses, and its ability to cope with these stresses is crucial for its pathogenesis. To gain insights into how C. glabrata adapts to adverse environmental conditions, we examined its transcriptional landscape under heat, osmotic, cell wall, oxidative, and genotoxic stresses using RNA sequencing and found that C. glabrata displays a diverse transcriptional response involving ∼75% of its genome for adaptation to different environmental stresses. C. glabrata mounts a central common adaptation response wherein ∼25% of all genes (n = 1370) are regulated in a similar fashion under different environmental stresses. A transcriptional signature of elevated cellular translation and diminished mitochondrial activity characterizes the common adaptation response. Transcriptional regulatory association networks of common adaptation response genes revealed a set of 29 transcription factors acting as potential activators and repressors of associated adaptive response genes. Overall, the current work delineates the adaptive responses of C. glabrata to diverse environmental stresses and reports the existence of a common adaptive transcriptional response upon prolonged exposure to environmental stresses.
Affiliation(s)
- Maruti Nandan Rai
  - Institute for Sustainability, Energy, and Environment, University of Illinois at Urbana-Champaign, IL, USA
- Rikky Rai
  - Citrus Research and Education Center, University of Florida, FL, USA
- Pooja Sethiya
  - Centre for Infectious Diseases and Microbiology, The Westmead Institute for Medical Research, Westmead 2145 NSW, University of Sydney, Australia
- Chirag Parsania
  - Gene and Stem Cell Therapy Program, Centenary Institute, Camperdown, NSW 2050; Faculty of Medicine and Health, University of Sydney, Australia
3
Starr AL, Gokhman D, Fraser HB. Accounting for cis-regulatory constraint prioritizes genes likely to affect species-specific traits. Genome Biol 2023; 24:11. [PMID: 36658652 PMCID: PMC9850818 DOI: 10.1186/s13059-023-02846-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Accepted: 01/04/2023] [Indexed: 01/20/2023]
Abstract
Measuring allele-specific expression in interspecies hybrids is a powerful way to detect cis-regulatory changes underlying adaptation. However, it remains difficult to identify genes most likely to explain species-specific traits. Here, we outline a simple strategy that leverages population-scale allele-specific RNA-seq data to identify genes that show constrained cis-regulation within species yet show divergence between species. Applying this strategy to data from human-chimpanzee hybrid cortical organoids, we identify signatures of lineage-specific selection on genes related to saccharide metabolism, neurodegeneration, and primary cilia. We also highlight cis-regulatory divergence in CUX1 and EDNRB that may shape the trajectory of human brain development.
Affiliation(s)
- David Gokhman
  - Department of Biology, Stanford University, Stanford, CA, USA
  - Present Address: Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Hunter B Fraser
  - Department of Biology, Stanford University, Stanford, CA, USA
4
Iyer NR, Shin J, Cuskey S, Tian Y, Nicol NR, Doersch TE, Seipel F, McCalla SG, Roy S, Ashton RS. Modular derivation of diverse, regionally discrete human posterior CNS neurons enables discovery of transcriptomic patterns. Sci Adv 2022; 8:eabn7430. [PMID: 36179024 PMCID: PMC9524835 DOI: 10.1126/sciadv.abn7430] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/16/2021] [Accepted: 08/16/2022] [Indexed: 06/02/2023]
Abstract
Our inability to derive the neuronal diversity that comprises the posterior central nervous system (pCNS) using human pluripotent stem cells (hPSCs) poses an impediment to understanding human neurodevelopment and disease in the hindbrain and spinal cord. Here, we establish a modular, monolayer differentiation paradigm that recapitulates both rostrocaudal (R/C) and dorsoventral (D/V) patterning, enabling derivation of diverse pCNS neurons with discrete regional specificity. First, neuromesodermal progenitors (NMPs) with discrete HOX profiles are converted to pCNS progenitors (pCNSPs). Then, by tuning D/V signaling, pCNSPs are directed to locomotor or somatosensory neurons. Expansive single-cell RNA-sequencing (scRNA-seq) analysis coupled with a novel computational pipeline allowed us to detect hundreds of transcriptional markers within region-specific phenotypes, enabling discovery of gene expression patterns across R/C and D/V developmental axes. These findings highlight the potential of these resources to advance a mechanistic understanding of pCNS development, enhance in vitro models, and inform therapeutic strategies.
Affiliation(s)
- Nisha R. Iyer
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI, USA
- Junha Shin
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Stephanie Cuskey
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Yucheng Tian
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI, USA
- Noah R. Nicol
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI, USA
- Tessa E. Doersch
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Frank Seipel
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI, USA
- Sunnie Grace McCalla
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Randolph S. Ashton
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI, USA
5
Bulbul Ahmed M, Humayan Kabir A. Understanding of the various aspects of gene regulatory networks related to crop improvement. Gene 2022; 833:146556. [PMID: 35609798 DOI: 10.1016/j.gene.2022.146556] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Revised: 03/14/2022] [Accepted: 05/06/2022] [Indexed: 12/30/2022]
Abstract
The hierarchical relationship between transcription factors, associated proteins, and their target genes is defined by a gene regulatory network (GRN). GRNs allow us to understand how the genotype and environment of a plant are incorporated to control the downstream physiological responses. During plant growth or environmental acclimatization, GRNs are diverse and can be differently regulated across tissue types and organs. An overview of recent advances in the development of GRNs that speed up basic and applied plant research is given here. Furthermore, an overview of genome and transcriptome studies involving GRN research, along with their advances and applications, is discussed. In addition, different approaches to GRN prediction are elucidated. In this review, we also describe the role of GRNs in crop improvement, crop plant manipulation, stress responses, speed breeding, and identifying genetic variations/loci. Finally, the challenges and prospects of GRNs in plant biology are discussed.
Affiliation(s)
- Md Bulbul Ahmed
  - Plant Science Department, McGill University, 21111 Lakeshore Road, Ste. Anne de Bellevue H9X3V9, Quebec, Canada; Institut de Recherche en Biologie Végétale (IRBV), University of Montreal, Montréal, Québec H1X 2B2, Canada
6
Abstract
Gene expression divergence through evolutionary processes is thought to be important for achieving programmed development in multicellular organisms. To test this premise in filamentous fungi, we investigated transcriptional profiles of 3,942 single-copy orthologous genes (SCOGs) in five related sordariomycete species that have morphologically diverged in the formation of their flask-shaped perithecia. We compared expression of the SCOGs to inferred gene expression levels of the most recent common ancestor of the five species, ranking genes from their largest increases to smallest increases in expression during perithecial development in each of the five species. We found that a large proportion of the genes that exhibited evolved increases in gene expression were important for normal perithecial development in Fusarium graminearum. Many of these genes were previously uncharacterized, encoding hypothetical proteins without any known functional protein domains. Interestingly, the developmental stages during which aberrant knockout phenotypes appeared largely coincided with the elevated expression of the deleted genes. In addition, we identified novel genes that affected normal perithecial development in Magnaporthe oryzae and Neurospora crassa, which were functionally and transcriptionally diverged from the orthologous counterparts in F. graminearum. Furthermore, comparative analysis of developmental transcriptomes and phylostratigraphic analysis suggested that genes encoding hypothetical proteins are generally young and transcriptionally divergent between related species. This study provides tangible evidence of shifts in gene expression that led to acquisition of novel function of orthologous genes in each lineage and demonstrates that several genes with hypothetical function are crucial for shaping multicellular fruiting bodies.

IMPORTANCE The fungal class Sordariomycetes includes numerous important plant and animal pathogens. It also provides model systems for studying fungal fruiting body development, as its members develop fruiting bodies with a few well-characterized tissue types on common growth media and have rich genomic resources that enable comparative and functional analyses. To understand transcriptional divergence of key developmental genes between five related sordariomycete fungi, we performed targeted knockouts of genes inferred to have evolved significant upward shifts in expression. We found that many previously uncharacterized genes play indispensable roles at different stages of fruiting body development, which have undergone transcriptional activation in specific lineages. These novel genes are predicted to be phylogenetically young and tend to be involved in lineage- or species-specific function. Transcriptional activation of genes with unknown function seems to be more frequent than previously thought, which may be crucial for rapid adaptation to changing environments for successful sexual reproduction.
7
Zhang S, Knaack S, Roy S. Enabling Studies of Genome-Scale Regulatory Network Evolution in Large Phylogenies with MRTLE. Methods Mol Biol 2022; 2477:439-455. [PMID: 35524131 PMCID: PMC9794031 DOI: 10.1007/978-1-0716-2257-5_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]
Abstract
Transcriptional regulatory networks specify context-specific patterns of genes and play a central role in how species evolve and adapt. Inferring genome-scale regulatory networks in non-model species is the first step for examining patterns of conservation and divergence of regulatory networks. Transcriptomic data obtained under varying environmental stimuli in multiple species are becoming increasingly available, which can be used to infer regulatory networks. However, inference and analysis of multiple gene regulatory networks in a phylogenetic setting remains challenging. We developed an algorithm, Multi-species Regulatory neTwork LEarning (MRTLE), to facilitate such studies of regulatory network evolution. MRTLE is a probabilistic graphical model-based algorithm that uses phylogenetic structure, transcriptomic data for multiple species, and sequence-specific motifs in each species to simultaneously infer genome-scale regulatory networks across multiple species. We applied MRTLE to study regulatory network evolution across six ascomycete yeasts using transcriptomic measurements collected across different stress conditions. MRTLE networks recapitulated experimentally derived interactions in the model organism S. cerevisiae as well as non-model species, and it was more beneficial for network inference than methods that do not use phylogenetic information. We examined the regulatory networks across species and found that regulators associated with significant expression and network changes are involved in stress-related processes. MRTLE and its associated downstream analysis provide a scalable and principled framework to examine evolutionary dynamics of transcriptional regulatory networks across multiple species in a large phylogeny.
Affiliation(s)
- Shilu Zhang
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Sara Knaack
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
8
Ovens K, Eames BF, McQuillan I. Comparative Analyses of Gene Co-expression Networks: Implementations and Applications in the Study of Evolution. Front Genet 2021; 12:695399. [PMID: 34484293 PMCID: PMC8414652 DOI: 10.3389/fgene.2021.695399] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 04/14/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022]
Abstract
Similarities and differences in the associations of biological entities among species can provide us with a better understanding of evolutionary relationships. Often the evolution of new phenotypes results from changes to interactions in pre-existing biological networks and comparing networks across species can identify evidence of conservation or adaptation. Gene co-expression networks (GCNs), constructed from high-throughput gene expression data, can be used to understand evolution and the rise of new phenotypes. The increasing abundance of gene expression data makes GCNs a valuable tool for the study of evolution in non-model organisms. In this paper, we cover motivations for why comparing these networks across species can be valuable for the study of evolution. We also review techniques for comparing GCNs in the context of evolution, including local and global methods of graph alignment. While some protein-protein interaction (PPI) bioinformatic methods can be used to compare co-expression networks, they often disregard highly relevant properties, including the existence of continuous and negative values for edge weights. Also, the lack of comparative datasets in non-model organisms has hindered the study of evolution using PPI networks. We also discuss limitations and challenges associated with cross-species comparison using GCNs, and provide suggestions for utilizing co-expression network alignments as an indispensable tool for evolutionary studies going forward.
Affiliation(s)
- Katie Ovens
  - Augmented Intelligence & Precision Health Laboratory (AIPHL), Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- B. Frank Eames
  - Department of Anatomy, Physiology, & Pharmacology, University of Saskatchewan, Saskatoon, SK, Canada
- Ian McQuillan
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
9
Tripathi RK, Wilkins O. Single cell gene regulatory networks in plants: Opportunities for enhancing climate change stress resilience. Plant Cell Environ 2021; 44:2006-2017. [PMID: 33522607 PMCID: PMC8359182 DOI: 10.1111/pce.14012] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/01/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 05/05/2023]
Abstract
Global warming poses major challenges for plant survival and agricultural productivity. Thus, efforts to enhance stress resilience in plants are key strategies for protecting food security. Gene regulatory networks (GRNs) are a critical mechanism conferring stress resilience. Until recently, predicting GRNs of the individual cells that make up plants and other multicellular organisms was impeded by aggregate population scale measurements of transcriptome and other genome-scale features. With the advancement of high-throughput single cell RNA-seq and other single cell assays, learning GRNs for individual cells is now possible, in principle. In this article, we report on recent advances in experimental and analytical methodologies for single cell sequencing assays especially as they have been applied to the study of plants. We highlight recent advances and ongoing challenges for scGRN prediction, and finally, we highlight the opportunity to use scGRN discovery for studying and ultimately enhancing abiotic stress resilience in plants.
Affiliation(s)
- Rajiv K. Tripathi
  - Department of Biological Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Olivia Wilkins
  - Department of Biological Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
10
Heineike BM, El-Samad H. Paralogs in the PKA Regulon Traveled Different Evolutionary Routes to Divergent Expression in Budding Yeast. Front Fungal Biol 2021; 2:642336. [PMID: 37744115 PMCID: PMC10512328 DOI: 10.3389/ffunb.2021.642336] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/15/2020] [Accepted: 03/24/2021] [Indexed: 09/26/2023]
Abstract
Functional divergence of duplicate genes, or paralogs, is an important driver of novelty in evolution. In the model yeast Saccharomyces cerevisiae, there are 547 paralog gene pairs that survive from an interspecies Whole Genome Hybridization (WGH) that occurred ~100 MYA. In this work, we report that ~1/6th (110) of these WGH paralog pairs (or ohnologs) are differentially expressed with a striking pattern upon Protein Kinase A (PKA) inhibition. One member of each pair in this group has low basal expression that increases upon PKA inhibition, while the other has moderate and unchanging expression. For these genes, expression of orthologs upon PKA inhibition in the non-WGH species Kluyveromyces lactis and for PKA-related stresses in other budding yeasts shows unchanging expression, suggesting that lack of responsiveness to PKA was likely the typical ancestral phenotype prior to duplication. Promoter sequence analysis across related budding yeast species further revealed that the subsequent emergence of PKA-dependence took different evolutionary routes. In some examples, regulation by PKA and differential expression appear to have arisen following the WGH, while in others, regulation by PKA appears to have arisen in one of the two parental lineages prior to the WGH. More broadly, our results illustrate the unique opportunities presented by a WGH event for generating functional divergence by bringing together two parental lineages with separately evolved regulation into one species. We propose that functional divergence of two ohnologs can be facilitated through such regulatory divergence.
Affiliation(s)
- Benjamin M. Heineike
  - Bioinformatics Graduate Program, University of California, San Francisco, San Francisco, CA, United States
- Hana El-Samad
  - Department of Biochemistry and Biophysics, California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, CA, United States
  - Chan Zuckerberg Biohub, San Francisco, CA, United States
11
Shin J, Marx H, Richards A, Vaneechoutte D, Jayaraman D, Maeda J, Chakraborty S, Sussman M, Vandepoele K, Ané JM, Coon J, Roy S. A network-based comparative framework to study conservation and divergence of proteomes in plant phylogenies. Nucleic Acids Res 2021; 49:e3. [PMID: 33219668 PMCID: PMC7797074 DOI: 10.1093/nar/gkaa1041] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/31/2019] [Revised: 09/19/2020] [Accepted: 10/19/2020] [Indexed: 12/23/2022]
Abstract
Comparative functional genomics offers a powerful approach to study species evolution. To date, the majority of these studies have focused on the transcriptome in mammalian and yeast phylogenies. Here, we present a novel multi-species proteomic dataset and a computational pipeline to systematically compare the protein levels across multiple plant species. Globally, we find that protein levels diverge according to phylogenetic distance but are more constrained than mRNA levels. Module-level comparative analysis of groups of proteins shows that proteins that are more highly expressed tend to be more conserved. To interpret the evolutionary patterns of conservation and divergence, we develop a novel network-based integrative analysis pipeline that combines publicly available transcriptomic datasets to define co-expression modules. Our analysis pipeline can be used to relate the changes in protein levels to different species-specific phenotypic traits. We present a case study with the rhizobia-legume symbiosis process that supports the role of autophagy in this symbiotic association.
Affiliation(s)
- Junha Shin
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Harald Marx
  - Department of Microbiology and Ecosystem Science, University of Vienna, Althanstraße 14, 1090 Vienna, Austria
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Alicia Richards
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Dries Vaneechoutte
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, Ghent, Belgium
  - VIB Center for Plant Systems Biology, VIB, Technologiepark 927, Ghent, Belgium
- Dhileepkumar Jayaraman
  - Department of Agronomy, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Junko Maeda
  - Department of Agronomy, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Sanhita Chakraborty
  - Department of Agronomy, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Michael Sussman
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Klaas Vandepoele
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, Ghent, Belgium
  - VIB Center for Plant Systems Biology, VIB, Technologiepark 927, Ghent, Belgium
- Jean-Michel Ané
  - Department of Agronomy, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Joshua Coon
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA
12
Mehta TK, Koch C, Nash W, Knaack SA, Sudhakar P, Olbei M, Bastkowski S, Penso-Dolfin L, Korcsmaros T, Haerty W, Roy S, Di-Palma F. Evolution of regulatory networks associated with traits under selection in cichlids. Genome Biol 2021; 22:25. [PMID: 33419455 PMCID: PMC7791837 DOI: 10.1186/s13059-020-02208-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/23/2020] [Accepted: 11/18/2020] [Indexed: 12/15/2022]
Abstract
BACKGROUND Seminal studies of vertebrate protein evolution speculated that gene regulatory changes can drive anatomical innovations. However, very little is known about gene regulatory network (GRN) evolution associated with phenotypic effect across ecologically diverse species. Here we use a novel approach for comparative GRN analysis in vertebrate species to study GRN evolution in representative species of the most striking examples of adaptive radiations, the East African cichlids. We previously demonstrated how the explosive phenotypic diversification of East African cichlids can be attributed to diverse molecular mechanisms, including accelerated regulatory sequence evolution and gene expression divergence.

RESULTS To investigate these mechanisms across species at a genome-wide scale, we develop a novel computational pipeline that predicts regulators for co-extant and ancestral co-expression modules along a phylogeny, and candidate regulatory regions associated with traits under selection in cichlids. As a case study, we apply our approach to a well-studied adaptive trait, the visual system, for which we report striking cases of network rewiring for visual opsin genes, identify discrete regulatory variants, and investigate their association with cichlid visual system evolution. In regulatory regions of visual opsin genes, in vitro assays confirm that transcription factor binding site mutations disrupt regulatory edges across species and segregate according to lake species phylogeny and ecology, suggesting GRN rewiring in radiating cichlids.

CONCLUSIONS Our approach reveals numerous novel potential candidate regulators and regulatory regions across cichlid genomes, including some novel and some previously reported associations to known adaptive evolutionary traits.
Affiliation(s)
- Christopher Koch
  - Department of Biostatistics and Medical Informatics, UW Madison, Madison, USA
- Sara A Knaack
  - Wisconsin Institute for Discovery (WID), Madison, USA
- Marton Olbei
  - Earlham Institute (EI), Norwich, UK
  - Quadram Institute, Norwich, UK
- Sarah Bastkowski
  - Earlham Institute (EI), Norwich, UK
  - Quadram Institute, Norwich, UK
- Tamas Korcsmaros
  - Earlham Institute (EI), Norwich, UK
  - Quadram Institute, Norwich, UK
- Sushmita Roy
  - Department of Biostatistics and Medical Informatics, UW Madison, Madison, USA
  - Wisconsin Institute for Discovery (WID), Madison, USA
  - Department of Computer Sciences, UW Madison, Madison, USA
- Federica Di-Palma
  - Earlham Institute (EI), Norwich, UK
  - Norwich Medical School, University of East Anglia, Norwich, UK
  - School of Biological Sciences, University of East Anglia, Norwich, UK
13
|
Grote A, Li Y, Liu C, Voronin D, Geber A, Lustigman S, Unnasch TR, Welch L, Ghedin E. Prediction pipeline for discovery of regulatory motifs associated with Brugia malayi molting. PLoS Negl Trop Dis 2020; 14:e0008275. [PMID: 32574217 PMCID: PMC7337397 DOI: 10.1371/journal.pntd.0008275] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2019] [Revised: 07/06/2020] [Accepted: 04/07/2020] [Indexed: 11/19/2022] Open
Abstract
Filarial nematodes can cause debilitating diseases in humans. They have complicated life cycles involving an insect vector and mammalian hosts, and they go through a number of developmental molts. While whole genome sequences of parasitic worms are now available, very little is known about transcription factor (TF) binding sites and their cognate transcription factors that play a role in regulating development. To address this gap, we developed a novel motif prediction pipeline, Emotif Alpha, that integrates ten different motif discovery algorithms, multiple statistical tests, and a comparative analysis of conserved elements between the filarial worms Brugia malayi and Onchocerca volvulus, and the free-living nematode Caenorhabditis elegans. We identified stage-specific TF binding motifs in B. malayi, with a particular focus on those potentially involved in the L3-L4 molt, a stage important for the establishment of infection in the mammalian host. Using an in vitro molting system, we tested and validated three of these motifs demonstrating the accuracy of the motif prediction pipeline. Diseases caused by parasitic worms such as the filariae are among the leading causes of morbidity in the developing world. Very little is known about how development is regulated in these vector-transmitted parasites. We have developed a computational method to identify motifs that correspond to transcription factor binding sites in the genome of the parasitic worm, Brugia malayi, one of the causative agents of lymphatic filariasis. Using this approach, we were able to predict stage-specific transcription factor binding sites involved in a stage of the molting process important for the establishment of the infection. We validated the role of these motifs using an in vitro molting system.
Affiliation(s)
- Alexandra Grote
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Yichao Li
  - School of Computer Science and Electrical Engineering, Ohio University, Athens, Ohio, United States of America
- Canhui Liu
  - Center for Global Infectious Disease Research, University of South Florida, Tampa, Florida, United States of America
- Denis Voronin
  - Laboratory of Molecular Parasitology, Lindsley F. Kimball Research Institute, New York Blood Center, New York, New York, United States of America
- Adam Geber
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Sara Lustigman
  - Laboratory of Molecular Parasitology, Lindsley F. Kimball Research Institute, New York Blood Center, New York, New York, United States of America
- Thomas R. Unnasch
  - Center for Global Infectious Disease Research, University of South Florida, Tampa, Florida, United States of America
- Lonnie Welch
  - School of Computer Science and Electrical Engineering, Ohio University, Athens, Ohio, United States of America
  - * E-mail: (LW); (EG)
- Elodie Ghedin
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
  - Department of Epidemiology, School of Global Public Health, New York University, New York, New York, United States of America
  - * E-mail: (LW); (EG)
|
14
|
Sethiya P, Rai MN, Rai R, Parsania C, Tan K, Wong KH. Transcriptomic analysis reveals global and temporal transcription changes during Candida glabrata adaptation to an oxidative environment. Fungal Biol 2020; 124:427-439. [DOI: 10.1016/j.funbio.2019.12.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2019] [Revised: 12/13/2019] [Accepted: 12/17/2019] [Indexed: 01/14/2023]
|
15
|
Erola P, Björkegren JLM, Michoel T. Model-based clustering of multi-tissue gene expression data. Bioinformatics 2020; 36:1807-1813. [PMID: 31688915 PMCID: PMC7162352 DOI: 10.1093/bioinformatics/btz805] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2018] [Revised: 09/05/2019] [Accepted: 10/31/2019] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION Recently, it has become feasible to generate large-scale, multi-tissue gene expression data, where expression profiles are obtained from multiple tissues or organs sampled from dozens to hundreds of individuals. When traditional clustering methods are applied to this type of data, important information is lost, because they either require all tissues to be analyzed independently, ignoring dependencies and similarities between tissues, or to merge tissues in a single, monolithic dataset, ignoring individual characteristics of tissues. RESULTS We developed a Bayesian model-based multi-tissue clustering algorithm, revamp, which can incorporate prior information on physiological tissue similarity, and which results in a set of clusters, each consisting of a core set of genes conserved across tissues as well as differential sets of genes specific to one or more subsets of tissues. Using data from seven vascular and metabolic tissues from over 100 individuals in the STockholm Atherosclerosis Gene Expression (STAGE) study, we demonstrate that multi-tissue clusters inferred by revamp are more enriched for tissue-dependent protein-protein interactions compared to alternative approaches. We further demonstrate that revamp results in easily interpretable multi-tissue gene expression associations to key coronary artery disease processes and clinical phenotypes in the STAGE individuals. AVAILABILITY AND IMPLEMENTATION Revamp is implemented in the Lemon-Tree software, available at https://github.com/eb00/lemon-tree. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pau Erola
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Midlothian EH25 9RG, UK
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol BS8 2BN, UK
- Johan L M Björkegren
  - Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Integrated Cardio Metabolic Centre (ICMC), Karolinska Institutet, Huddinge 141 57, Sweden
- Tom Michoel
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Midlothian EH25 9RG, UK
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen N-5020, Norway
|
16
|
Liebeskind BJ, Aldrich RW, Marcotte EM. Ancestral reconstruction of protein interaction networks. PLoS Comput Biol 2019; 15:e1007396. [PMID: 31658251 PMCID: PMC6837550 DOI: 10.1371/journal.pcbi.1007396] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2018] [Revised: 11/07/2019] [Accepted: 09/11/2019] [Indexed: 11/19/2022] Open
Abstract
The molecular and cellular basis of novelty is an active area of research in evolutionary biology. Until very recently, the vast majority of cellular phenomena were so difficult to sample that cross-species studies of biochemistry were rare and comparative analysis at the level of biochemical systems was almost impossible. Recent advances in systems biology are changing what is possible, however, and comparative phylogenetic methods that can handle this new data are wanted. Here, we introduce the term “phylogenetic latent variable models” (PLVMs, pronounced “plums”) for a class of models that has recently been used to infer the evolution of cellular states from systems-level molecular data, and develop a new parameterization and fitting strategy that is useful for comparative inference of biochemical networks. We deploy this new framework to infer the ancestral states and evolutionary dynamics of protein-interaction networks by analyzing >16,000 predominantly metazoan co-fractionation and affinity-purification mass spectrometry experiments. Based on these data, we estimate ancestral interactions across unikonts, broadly recovering protein complexes involved in translation, transcription, proteostasis, transport, and membrane trafficking. Using these results, we predict an ancient core of the Commander complex made up of CCDC22, CCDC93, C16orf62, and DSCR3, with more recent additions of COMMD-containing proteins in tetrapods. We also use simulations to develop model fitting strategies and discuss future model developments. Our ability to probe the inner workings of cells is constantly growing. This is true not only for workhorse model organisms like fruit flies and brewer’s yeast, but increasingly for organisms whose biology is less well trodden—corals, butterflies, exotic plants and fungi, and even precious clinical samples are all fair game. 
However, the mathematical models that we use to compare biology across species and infer evolutionary dynamics have not kept pace. Sophisticated models exist for DNA and protein sequences, but models that can handle functional cellular data are in their infancy. In this study we introduce a new model that we use to infer the evolutionary history of protein interaction networks from cutting-edge high-throughput proteomics data. We use this model to reconstruct the cell biology of the ancestors we share with fungi and slime molds, and propose a path by which a recently described protein complex involved in human development might have evolved.
Affiliation(s)
- Benjamin J. Liebeskind
  - Center for Systems and Synthetic Biology, Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
  - Department of Neuroscience, University of Texas at Austin, Austin, Texas, United States of America
- Richard W. Aldrich
  - Department of Neuroscience, University of Texas at Austin, Austin, Texas, United States of America
- Edward M. Marcotte
  - Center for Systems and Synthetic Biology, Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
  - * E-mail:
|
17
|
Chasman D, Iyer N, Fotuhi Siahpirani A, Estevez Silva M, Lippmann E, McIntosh B, Probasco MD, Jiang P, Stewart R, Thomson JA, Ashton RS, Roy S. Inferring Regulatory Programs Governing Region Specificity of Neuroepithelial Stem Cells during Early Hindbrain and Spinal Cord Development. Cell Syst 2019; 9:167-186.e12. [PMID: 31302154 DOI: 10.1016/j.cels.2019.05.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2018] [Revised: 05/05/2019] [Accepted: 05/30/2019] [Indexed: 12/19/2022]
Abstract
Neuroepithelial stem cells (NSC) from different anatomical regions of the embryonic neural tube's rostrocaudal axis can differentiate into diverse central nervous system tissues, but the transcriptional regulatory networks governing these processes are incompletely understood. Here, we measure region-specific NSC gene expression along the rostrocaudal axis in a human pluripotent stem cell model of early central nervous system development over a 72-h time course, spanning the hindbrain to cervical spinal cord. We introduce Escarole, a probabilistic clustering algorithm for non-stationary time series, and combine it with prior-based regulatory network inference to identify genes that are regulated dynamically and predict their upstream regulators. We identify known regulators of patterning and neural development, including the HOX genes, and predict a direct regulatory connection between the transcription factor POU3F2 and target gene STMN2. We demonstrate that POU3F2 is required for expression of STMN2, suggesting that this regulatory connection is important for region specificity of NSCs.
Affiliation(s)
- Deborah Chasman
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Nisha Iyer
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Alireza Fotuhi Siahpirani
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
- Maria Estevez Silva
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Ethan Lippmann
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Brian McIntosh
  - Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
- Mitchell D Probasco
  - Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
- Peng Jiang
  - Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
- Ron Stewart
  - Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
- James A Thomson
  - Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
- Randolph S Ashton
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA
|
18
|
Kuang MC, Kominek J, Alexander WG, Cheng JF, Wrobel RL, Hittinger CT. Repeated Cis-Regulatory Tuning of a Metabolic Bottleneck Gene during Evolution. Mol Biol Evol 2019; 35:1968-1981. [PMID: 29788479 PMCID: PMC6063270 DOI: 10.1093/molbev/msy102] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open
Abstract
Repeated evolutionary events imply underlying genetic constraints that can make evolutionary mechanisms predictable. Morphological traits are thought to evolve frequently through cis-regulatory changes because these mechanisms bypass constraints in pleiotropic genes that are reused during development. In contrast, the constraints acting on metabolic traits during evolution are less well studied. Here we show how a metabolic bottleneck gene has repeatedly adopted similar cis-regulatory solutions during evolution, likely due to its pleiotropic role integrating flux from multiple metabolic pathways. Specifically, the genes encoding phosphoglucomutase activity (PGM1/PGM2), which connect GALactose catabolism to glycolysis, have gained and lost direct regulation by the transcription factor Gal4 several times during yeast evolution. Through targeted mutations of predicted Gal4-binding sites in yeast genomes, we show this galactose-mediated regulation of PGM1/2 supports vigorous growth on galactose in multiple yeast species, including Saccharomyces uvarum and Lachancea kluyveri. Furthermore, the addition of galactose-inducible PGM1 alone is sufficient to improve the growth on galactose of multiple species that lack this regulation, including Saccharomyces cerevisiae. The strong association between regulation of PGM1/2 by Gal4 even enables remarkably accurate predictions of galactose growth phenotypes between closely related species. This repeated mode of evolution suggests that this specific cis-regulatory connection is a common way that diverse yeasts can govern flux through the pathway, likely due to the constraints imposed by this pleiotropic bottleneck gene. Since metabolic pathways are highly interconnected, we argue that cis-regulatory evolution might be widespread at pleiotropic genes that control metabolic bottlenecks and intersections.
Affiliation(s)
- Meihua Christina Kuang
  - Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI; Graduate Program in Cellular and Molecular Biology, University of Wisconsin-Madison, Madison, WI
- Jacek Kominek
  - Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI
- William G Alexander
  - Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI
- Russell L Wrobel
  - Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI
- Chris Todd Hittinger
  - Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI; Graduate Program in Cellular and Molecular Biology, University of Wisconsin-Madison, Madison, WI; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI
|
19
|
Erola P, Bonnet E, Michoel T. Learning Differential Module Networks Across Multiple Experimental Conditions. Methods Mol Biol 2019; 1883:303-321. [PMID: 30547406 DOI: 10.1007/978-1-4939-8882-2_13] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Module network inference is a statistical method to reconstruct gene regulatory networks, which uses probabilistic graphical models to learn modules of coregulated genes and their upstream regulatory programs from genome-wide gene expression and other omics data. Here, we review the basic theory of module network inference, present protocols for common gene regulatory network reconstruction scenarios based on the Lemon-Tree software, and show, using human gene expression data, how the software can also be applied to learn differential module networks across multiple experimental conditions.
Affiliation(s)
- Pau Erola
  - Division of Genetics and Genomics, Roslin Institute, University of Edinburgh, Midlothian, Scotland, UK
- Eric Bonnet
  - Centre National de Recherche en Génomique Humaine, Institut de Biologie François Jacob, Direction de la Recherche Fondamentale, CEA, Evry, France
- Tom Michoel
  - Division of Genetics and Genomics, The Roslin Institute, University of Edinburgh, Midlothian, Scotland, UK
  - Current Address: Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
|
20
|
Siahpirani AF, Roy S. A prior-based integrative framework for functional transcriptional regulatory network inference. Nucleic Acids Res 2018; 45:e21. [PMID: 27794550 PMCID: PMC5389674 DOI: 10.1093/nar/gkw963] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2015] [Accepted: 10/12/2016] [Indexed: 12/16/2022] Open
Abstract
Transcriptional regulatory networks specify regulatory proteins controlling the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation, but remains an open challenge. Expression-based network inference is among the most popular methods to infer regulatory networks; however, networks inferred from such methods have low overlap with experimentally derived (e.g. ChIP-chip and transcription factor (TF) knockouts) networks. Currently we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, to integrate expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our and other state-of-the-art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests natural genetic variation as the most informative perturbation for network inference and identifies core TFs whose targets are predictable from expression. Multiple reasons make the identification of targets of other TFs difficult, including network architecture and insufficient variation of TF mRNA level. Finally, we demonstrate the utility of our inference algorithm to infer stress-specific regulatory networks and for regulator prioritization.
Affiliation(s)
- Alireza F Siahpirani
  - Department of Computer Sciences, University of Wisconsin-Madison, 1210 W. Dayton St. Madison, WI 53706-1613, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Discovery Building 330 North Orchard St. Madison, WI 53715, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, K6/446 Clinical Sciences Center 600 Highland Avenue Madison, WI 53792-4675, USA
|
21
|
Wang Z, Gudibanda A, Ugwuowo U, Trail F, Townsend JP. Using evolutionary genomics, transcriptomics, and systems biology to reveal gene networks underlying fungal development. FUNGAL BIOL REV 2018. [DOI: 10.1016/j.fbr.2018.02.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
|
22
|
Benchouaia M, Ripoche H, Sissoko M, Thiébaut A, Merhej J, Delaveau T, Fasseu L, Benaissa S, Lorieux G, Jourdren L, Le Crom S, Lelandais G, Corel E, Devaux F. Comparative Transcriptomics Highlights New Features of the Iron Starvation Response in the Human Pathogen Candida glabrata. Front Microbiol 2018; 9:2689. [PMID: 30505294 PMCID: PMC6250833 DOI: 10.3389/fmicb.2018.02689] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2018] [Accepted: 10/22/2018] [Indexed: 11/21/2022] Open
Abstract
In this work, we used comparative transcriptomics to identify regulatory outliers (ROs) in the human pathogen Candida glabrata. ROs are genes that have very different expression patterns compared to their orthologs in other species. From comparative transcriptome analyses of the response of eight yeast species to toxic doses of selenite, a pleiotropic stress inducer, we identified 38 ROs in C. glabrata. Using transcriptome analyses of the C. glabrata response to five different stresses, we identified five ROs that were particularly responsive to iron starvation, a process which is very important for C. glabrata virulence. Global chromatin immunoprecipitation and gene profiling analyses showed that four of these genes are actually new targets of the iron starvation-responsive Aft2 transcription factor in C. glabrata. Two of them (HBS1 and DOM34b) are required for optimal growth of C. glabrata in iron-limited conditions. In S. cerevisiae, the orthologs of these two genes are involved in ribosome rescue by the NO GO decay (NGD) pathway. Hence, our results suggest a specific contribution of NGD co-factors to the C. glabrata adaptation to iron starvation.
Affiliation(s)
- Médine Benchouaia
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
- Hugues Ripoche
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
- Mariam Sissoko
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
- Antonin Thiébaut
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
- Jawad Merhej
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
- Thierry Delaveau
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
- Laure Fasseu
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
- Sabrina Benaissa
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
- Geneviève Lorieux
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
- Laurent Jourdren
  - École Normale Supérieure, PSL Research University, CNRS, Inserm U1024, Institut de Biologie de l’École Normale Supérieure, Plateforme Génomique, Paris, France
- Stéphane Le Crom
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, UMR 7138, Évolution, Paris, France
- Gaëlle Lelandais
  - UMR 9198, Institute for Integrative Biology of the Cell, CEA, CNRS, Université Paris-Sud, UPSay, Gif-sur-Yvette, France
- Eduardo Corel
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, UMR 7138, Évolution, Paris, France
- Frédéric Devaux
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
  - *Correspondence: Frédéric Devaux
|
23
|
Trail F, Wang Z, Stefanko K, Cubba C, Townsend JP. The ancestral levels of transcription and the evolution of sexual phenotypes in filamentous fungi. PLoS Genet 2017; 13:e1006867. [PMID: 28704372 PMCID: PMC5509106 DOI: 10.1371/journal.pgen.1006867] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2017] [Accepted: 06/13/2017] [Indexed: 12/29/2022] Open
Abstract
Changes in gene expression have been hypothesized to play an important role in the evolution of divergent morphologies. To test this hypothesis in a model system, we examined differences in fruiting body morphology of five filamentous fungi in the Sordariomycetes, culturing them in a common garden environment and profiling genome-wide gene expression at five developmental stages. We reconstructed ancestral gene expression phenotypes, identifying genes with the largest evolved increases in gene expression across development. Conducting knockouts and performing phenotypic analysis in two divergent species typically demonstrated altered fruiting body development in the species that had evolved increased expression. Our evolutionary approach to finding relevant genes proved far more efficient than other gene deletion studies targeting whole genomes or gene families. Combining gene expression measurements with knockout phenotypes facilitated the refinement of Bayesian networks of the genes underlying fruiting body development, regulation of which is one of the least understood processes of multicellular development.
Affiliation(s)
- Frances Trail
  - Department of Plant Biology, Michigan State University, East Lansing, MI, United States of America
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, United States of America
- Zheng Wang
  - Department of Biostatistics, Yale University, New Haven, CT, United States of America
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States of America
- Kayla Stefanko
  - Department of Plant Biology, Michigan State University, East Lansing, MI, United States of America
- Caitlyn Cubba
  - Department of Plant Biology, Michigan State University, East Lansing, MI, United States of America
- Jeffrey P. Townsend
  - Department of Biostatistics, Yale University, New Haven, CT, United States of America
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States of America
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
|
24
|
Koch C, Konieczka J, Delorey T, Lyons A, Socha A, Davis K, Knaack SA, Thompson D, O'Shea EK, Regev A, Roy S. Inference and Evolutionary Analysis of Genome-Scale Regulatory Networks in Large Phylogenies. Cell Syst 2017; 4:543-558.e8. [PMID: 28544882 PMCID: PMC5515301 DOI: 10.1016/j.cels.2017.04.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2016] [Revised: 02/20/2017] [Accepted: 04/26/2017] [Indexed: 11/22/2022]
Abstract
Changes in transcriptional regulatory networks can significantly contribute to species evolution and adaptation. However, identification of genome-scale regulatory networks is an open challenge, especially in non-model organisms. Here, we introduce multi-species regulatory network learning (MRTLE), a computational approach that uses phylogenetic structure, sequence-specific motifs, and transcriptomic data, to infer the regulatory networks in different species. Using simulated data from known networks and transcriptomic data from six divergent yeasts, we demonstrate that MRTLE predicts networks with greater accuracy than existing methods because it incorporates phylogenetic information. We used MRTLE to infer the structure of the transcriptional networks that control the osmotic stress responses of divergent, non-model yeast species and then validated our predictions experimentally. Interrogating these networks reveals that gene duplication promotes network divergence across evolution. Taken together, our approach facilitates study of regulatory network evolutionary dynamics across multiple poorly studied species.
Affiliation(s)
- Christopher Koch
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Jay Konieczka
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Toni Delorey
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Ana Lyons
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Amanda Socha
- Dartmouth College, Biology department, Hanover, NH 03755, USA
- Kathleen Davis
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Sara A Knaack
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Dawn Thompson
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Erin K O'Shea
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA
- Howard Hughes Medical Institute, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Faculty of Arts and Sciences Center for Systems Biology, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Department of Molecular and Cellular Biology, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Aviv Regev
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
- Sushmita Roy
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
|
25
|
Roy S, Sridharan R. Chromatin module inference on cellular trajectories identifies key transition points and poised epigenetic states in diverse developmental processes. Genome Res 2017; 27:1250-1262. [PMID: 28424352 PMCID: PMC5495076 DOI: 10.1101/gr.215004.116] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 08/21/2016] [Accepted: 04/12/2017] [Indexed: 12/13/2022]
Abstract
Changes in chromatin state play important roles in cell fate transitions. Current computational approaches to analyze chromatin modifications across multiple cell types do not model how the cell types are related on a lineage or over time. To overcome this limitation, we developed a method called Chromatin Module INference on Trees (CMINT), a probabilistic clustering approach to systematically capture chromatin state dynamics across multiple cell types. Compared to existing approaches, CMINT can handle complex lineage topologies, capture higher quality clusters, and reliably detect chromatin transitions between cell types. We applied CMINT to gain novel insights in two complex processes: reprogramming to induced pluripotent stem cells (iPSCs) and hematopoiesis. In reprogramming, chromatin changes could occur without large gene expression changes, different combinations of activating marks were associated with specific reprogramming factors, there was an order of acquisition of chromatin marks at pluripotency loci, and multivalent states (comprising previously undetermined combinations of activating and repressive histone modifications) were enriched for CTCF. In the hematopoietic system, we defined critical decision points in the lineage tree, identified regulatory elements that were enriched in cell-type–specific regions, and found that the underlying chromatin state was achieved by specific erasure of preexisting chromatin marks in the precursor cell or by de novo assembly. Our method provides a systematic approach to model the dynamics of chromatin state to provide novel insights into the relationships among cell types in diverse cell-fate specification processes.
Affiliation(s)
- Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53715, USA; Wisconsin Institute for Discovery, Madison, Wisconsin 53715, USA
- Rupa Sridharan
- Wisconsin Institute for Discovery, Madison, Wisconsin 53715, USA; Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53715, USA
|
26
|
Zinc Cluster Transcription Factors Alter Virulence in Candida albicans. Genetics 2016; 205:559-576. [PMID: 27932543 DOI: 10.1534/genetics.116.195024] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 08/19/2016] [Accepted: 11/16/2016] [Indexed: 11/18/2022]
Abstract
Almost all humans are colonized with Candida albicans. However, in immunocompromised individuals, this benign commensal organism becomes a serious, life-threatening pathogen. Here, we describe and analyze the regulatory networks that modulate innate responses in the host niches. We identified Zcf15 and Zcf29, two Zinc Cluster transcription Factors (ZCF) that are required for C. albicans virulence. Previous sequence analysis of clinical C. albicans isolates from immunocompromised patients indicates that both ZCF genes diverged during clonal evolution. Using in vivo animal models, ex vivo cell culture methods, and in vitro sensitivity assays, we demonstrate that knockout mutants of both ZCF15 and ZCF29 are hypersensitive to reactive oxygen species (ROS), suggesting they help neutralize the host-derived ROS produced by phagocytes, as well as establish a sustained infection in vivo. Transcriptomic analysis of mutants under resting conditions where cells were not experiencing oxidative stress revealed a large network that controls macro and micronutrient homeostasis, which likely contributes to overall pathogen fitness in host niches. Under oxidative stress, both transcription factors regulate a separate set of genes involved in detoxification of ROS and down-regulating ribosome biogenesis. ChIP-seq analysis, which reveals vastly different binding partners for each transcription factor (TF) before and after oxidative stress, further confirms these results. Furthermore, the absence of a dominant binding motif likely facilitates their mobility, and supports the notion that they represent a recent expansion of the ZCF family in the pathogenic Candida species. Our analyses provide a framework for understanding new aspects of the interface between C. albicans and host defense response, and extend our understanding of how complex cell behaviors are linked to the evolution of TFs.
|
27
|
Fused Regression for Multi-source Gene Regulatory Network Inference. PLoS Comput Biol 2016; 12:e1005157. [PMID: 27923054 PMCID: PMC5140053 DOI: 10.1371/journal.pcbi.1005157] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 04/22/2016] [Accepted: 09/20/2016] [Indexed: 12/03/2022]
Abstract
Understanding gene regulatory networks is critical to understanding cellular differentiation and response to external stimuli. Methods for global network inference have been developed and applied to a variety of species. Most approaches consider the problem of network inference independently in each species, despite evidence that gene regulation can be conserved even in distantly related species. Further, network inference is often confined to single data-types (single platforms) and single cell types. We introduce a method for multi-source network inference that allows simultaneous estimation of gene regulatory networks in multiple species or biological processes through the introduction of priors based on known gene relationships such as orthology incorporated using fused regression. This approach improves network inference performance even when orthology mapping and conservation are incomplete. We refine this method by presenting an algorithm that extracts the true conserved subnetwork from a larger set of potentially conserved interactions and demonstrate the utility of our method in cross species network inference. Last, we demonstrate our method’s utility in learning from data collected on different experimental platforms. Gene regulatory networks describing related biological processes are thought to share conserved interaction structure. This assumption motivates a great deal of work in model systems–where discovery of gene regulation may be more experimentally tractable–but is difficult to directly evaluate using existing methods. The presence of shared structure in a well studied model system or process should make the problem of network inference in a related process easier, but this information is not often applied to the discovery of global gene regulatory networks. Further, to be able to successfully translate findings between different organisms, it is important to be able to identify where regulatory structure is different. 
We provide a method based on penalized fused regression for inferring gene regulatory networks given prior knowledge about the similarity of interactions in each network. This method is demonstrated on synthetic data, and applied to the problem of inferring networks in distantly related bacterial organisms. We then introduce an extension of the method to deal with the condition of uncertainty over the degree of regulatory conservation by simultaneously inferring gene conservation and interaction weights.
|
28
|
Thompson DA, Cubillos FA. Natural gene expression variation studies in yeast. Yeast 2016; 34:3-17. [PMID: 27668700 DOI: 10.1002/yea.3210] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 07/26/2016] [Revised: 09/16/2016] [Accepted: 09/18/2016] [Indexed: 11/06/2022]
Abstract
The rise of sequence information across different yeast species and strains is driving an increasing number of studies in the emerging field of genomics to associate polymorphic variants, mRNA abundance and phenotypic differences between individuals. Here, we gathered evidence from recent studies covering several layers that define the genotype-phenotype gap, such as mRNA abundance, allele-specific expression and translation efficiency to demonstrate how genetic variants co-evolve and define an individual's genome. Moreover, we exposed several antecedents where inter- and intra-specific studies led to opposite conclusions, probably owing to genetic divergence. Future studies in this area will benefit from the access to a massive array of well-annotated genomes and new sequencing technologies, which will allow the fine breakdown of the complex layers that delineate the genotype-phenotype map. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Francisco A Cubillos
- Centro de Estudios en Ciencia y Tecnología de Alimentos, Universidad de Santiago de Chile, Santiago, Chile; Millennium Nucleus for Fungal Integrative and Synthetic Biology; Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile, Santiago, Chile
|
29
|
Zuo C, Chen K, Hewitt KJ, Bresnick EH, Keleş S. A Hierarchical Framework for State-Space Matrix Inference and Clustering. Ann Appl Stat 2016; 10:1348-1372. [PMID: 29910842 DOI: 10.1214/16-aoas938] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/14/2023]
Abstract
In recent years, a large number of genomic and epigenomic studies have been focusing on the integrative analysis of multiple experimental datasets measured over a large number of observational units. The objectives of such studies include not only inferring a hidden state of activity for each unit over individual experiments, but also detecting highly associated clusters of units based on their inferred states. Although there are a number of methods tailored for specific datasets, there is currently no state-of-the-art modeling framework for this general class of problems. In this paper, we develop the MBASIC (Matrix Based Analysis for State-space Inference and Clustering) framework. MBASIC consists of two parts: state-space mapping and state-space clustering. In state-space mapping, it maps observations onto a finite state-space, representing the activation states of units across conditions. In state-space clustering, MBASIC incorporates a finite mixture model to cluster the units based on their inferred state-space profiles across all conditions. Both the state-space mapping and clustering can be simultaneously estimated through an Expectation-Maximization algorithm. MBASIC flexibly adapts to a large number of parametric distributions for the observed data, as well as the heterogeneity in replicate experiments. It allows for imposing structural assumptions on each cluster, and enables model selection using information criterion. In our data-driven simulation studies, MBASIC showed significant accuracy in recovering both the underlying state-space variables and clustering structures. We applied MBASIC to two genome research problems using large numbers of datasets from the ENCODE project. The first application grouped genes based on transcription factor occupancy profiles of their promoter regions in two different cell types. 
The second application focused on identifying groups of loci that are similar to a GATA2 binding site that is functional at its endogenous locus by utilizing transcription factor occupancy data and illustrated applicability of MBASIC in a wide variety of problems. In both studies, MBASIC showed higher levels of raw data fidelity than analyzing these data with a two-step approach using ENCODE results on transcription factor occupancy data.
Affiliation(s)
- Chandler Zuo
- Department of Statistics, University of Wisconsin, Madison, WI, U.S.A.; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, U.S.A.
- Kailei Chen
- Department of Statistics, University of Wisconsin, Madison, WI, U.S.A.; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, U.S.A.
- Kyle J Hewitt
- Department of Cell and Regenerative Biology, University of Wisconsin, Madison, WI, U.S.A.
- Emery H Bresnick
- Department of Cell and Regenerative Biology, University of Wisconsin, Madison, WI, U.S.A.
- Sündüz Keleş
- Department of Statistics, University of Wisconsin, Madison, WI, U.S.A.; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, U.S.A.
|
30
|
Fotuhi Siahpirani A, Ay F, Roy S. A multi-task graph-clustering approach for chromosome conformation capture data sets identifies conserved modules of chromosomal interactions. Genome Biol 2016; 17:114. [PMID: 27233632 PMCID: PMC4882777 DOI: 10.1186/s13059-016-0962-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 02/27/2016] [Accepted: 04/22/2016] [Indexed: 01/13/2023]
Abstract
Chromosome conformation capture methods are being increasingly used to study three-dimensional genome architecture in multiple cell types and species. An important challenge is to examine changes in three-dimensional architecture across cell types and species. We present Arboretum-Hi-C, a multi-task spectral clustering method, to identify common and context-specific aspects of genome architecture. Compared to standard clustering, Arboretum-Hi-C produced more biologically consistent patterns of conservation. Most clusters are conserved and enriched for either high- or low-activity genomic signals. Most genomic regions diverge between clusters with similar chromatin state except for a few that are associated with lamina-associated domains and open chromatin.
Affiliation(s)
- Ferhat Ay
- La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, 92037, CA, USA
- Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin, Madison, 53717, WI, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, 53717, WI, USA
|
31
|
Niu Z, Chasman D, Eisfeld AJ, Kawaoka Y, Roy S. Multi-task consensus clustering of genome-wide transcriptomes from related biological conditions. Bioinformatics 2016; 32:1509-17. [PMID: 26801959 DOI: 10.1093/bioinformatics/btw007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/06/2015] [Accepted: 01/04/2016] [Indexed: 12/27/2022]
Abstract
MOTIVATION: Identifying the shared and pathogen-specific components of host transcriptional regulatory programs is important for understanding the principles of regulation of immune response. Recent efforts in systems biology studies of infectious diseases have resulted in a large collection of datasets measuring host transcriptional response to various pathogens. Computational methods to identify and compare gene expression modules across different infections offer a powerful way to identify strain-specific and shared components of the regulatory program. An important challenge is to identify statistically robust gene expression modules as well as to reliably detect genes that change their module memberships between infections. RESULTS: We present MULCCH (MULti-task spectral Consensus Clustering for Hierarchically related tasks), a consensus extension of a multi-task clustering algorithm to infer high-confidence strain-specific host response modules under infections from multiple virus strains. On simulated data, MULCCH more accurately identifies genes exhibiting pathogen-specific patterns compared to non-consensus and non-multi-task clustering approaches. Application of MULCCH to mammalian transcriptional response to a panel of influenza viruses showed that our method identifies clusters with greater coherence compared to non-consensus methods. Further, MULCCH derived clusters are enriched for several immune system-related processes and regulators. In summary, MULCCH provides a reliable module-based approach to identify molecular pathways and gene sets characterizing commonality and specificity of host response to viruses of different pathogenicities. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://bitbucket.org/roygroup/mulcch. CONTACT: sroy@biostat.wisc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhen Niu
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA; Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Deborah Chasman
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Amie J Eisfeld
- Influenza Research Institute, Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI, 53711, USA
- Yoshihiro Kawaoka
- Influenza Research Institute, Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI, 53711, USA; Division of Virology, Department of Microbiology and Immunology, Institute of Medical Science, University of Tokyo, Tokyo, Japan
- Sushmita Roy
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA; Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA
|
32
|
Joshi A, Beck Y, Michoel T. Multi-species network inference improves gene regulatory network reconstruction for early embryonic development in Drosophila. J Comput Biol 2016; 22:253-65. [PMID: 25844666 DOI: 10.1089/cmb.2014.0290] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 01/15/2023]
Abstract
Gene regulatory network inference uses genome-wide transcriptome measurements in response to genetic, environmental, or dynamic perturbations to predict causal regulatory influences between genes. We hypothesized that evolution also acts as a suitable network perturbation and that integration of data from multiple closely related species can lead to improved reconstruction of gene regulatory networks. To test this hypothesis, we predicted networks from temporal gene expression data for 3,610 genes measured during early embryonic development in six Drosophila species and compared predicted networks to gold standard networks of ChIP-chip and ChIP-seq interactions for developmental transcription factors in five species. We found that (i) the performance of single-species networks was independent of the species where the gold standard was measured; (ii) differences between predicted networks reflected the known phylogeny and differences in biology between the species; (iii) an integrative consensus network that minimized the total number of edge gains and losses with respect to all single-species networks performed better than any individual network. Our results show that in an evolutionarily conserved system, integration of data from comparable experiments in multiple species improves the inference of gene regulatory networks. They provide a basis for future studies on the numerous multispecies gene expression datasets for other biological processes available in the literature.
Affiliation(s)
- Anagha Joshi
- Division of Developmental Biology, The Roslin Institute, The University of Edinburgh, Midlothian, Scotland, United Kingdom
|
33
|
Knaack SA, Thompson DA, Roy S. Reconstruction and Analysis of the Evolution of Modular Transcriptional Regulatory Programs Using Arboretum. Methods Mol Biol 2016; 1361:375-89. [PMID: 26483033 PMCID: PMC5689457 DOI: 10.1007/978-1-4939-3079-1_21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 02/12/2023]
Abstract
Comparative functional genomics aims to measure and compare genome-wide functional data such as transcriptomes, proteomes, and epigenomes across multiple species to study the conservation and divergence patterns of such quantitative measurements. However, computational methods to systematically compare these quantitative genomic profiles across multiple species are in their infancy. We developed Arboretum, a novel algorithm to identify modules of co-expressed genes and trace their evolutionary history across multiple species from a complex phylogeny. To interpret the results from Arboretum we developed several measures to examine the extent of conservation and divergence in modules and their relationship to species lifestyle, cis-regulatory elements, and gene duplication. We applied Arboretum to study the evolution of modular transcriptional regulatory programs controlling transcriptional response to different environmental stresses in the yeast Ascomycota phylogeny. We found that modules of similar patterns of expression captured the transcriptional responses to different stresses across species; however, the genes exhibiting these patterns were not the same. Divergence in module membership was associated with changes in lifestyle and with specific clades, and gene duplication was a major factor contributing to the divergence of module membership.
Affiliation(s)
- Sara A. Knaack
- Wisconsin Institute for Discovery, University of Wisconsin at Madison, Madison, WI, USA
- Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin at Madison, Madison, WI, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin at Madison, Madison, WI, USA
|
34
|
Thompson DA. Comparative Transcriptomics in Yeasts. Methods Mol Biol 2015; 1361:67-76. [PMID: 26483016 DOI: 10.1007/978-1-4939-3079-1_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2023]
Abstract
Comparative functional genomics approaches have already shed an important light on the evolution of gene expression that underlies phenotypic diversity. However, comparison across many species in a phylogeny presents several major challenges. Here, we describe our experimental framework for comparative transcriptomics in a complex phylogeny.
Affiliation(s)
- Dawn A Thompson
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA, 02142, USA.
|
35
|
Roy S, Thompson D. Evolution of regulatory networks in Candida glabrata: learning to live with the human host. FEMS Yeast Res 2015; 15:fov087. [PMID: 26449820 DOI: 10.1093/femsyr/fov087] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Accepted: 09/17/2015] [Indexed: 12/12/2022]
Abstract
The opportunistic human fungal pathogen Candida glabrata is second only to C. albicans as the cause of Candida infections and yet is more closely related to Saccharomyces cerevisiae. Recent advances in functional genomics technologies and computational approaches to decipher regulatory networks, and the comparison of these networks among these and other Ascomycete species, have revealed both unique and shared strategies in adaptation to a human commensal/opportunistic pathogen lifestyle and antifungal drug resistance in C. glabrata. Recently, several C. glabrata sister species in the Nakaseomyces clade representing both human associated (commensal) and environmental isolates have had their genomes sequenced and analyzed. This has paved the way for comparative functional genomics studies to characterize the regulatory networks in these species to identify informative patterns of conservation and divergence linked to phenotypic evolution in the Nakaseomyces lineage.
Affiliation(s)
- Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin Madison, Madison, WI 53715, USA; Wisconsin Institute for Discovery, University of Wisconsin, Madison, WI 53715, USA
- Dawn Thompson
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
|
36
|
Sarda S, Hannenhalli S. High-Throughput Identification of Cis-Regulatory Rewiring Events in Yeast. Mol Biol Evol 2015; 32:3047-63. [PMID: 26399482 DOI: 10.1093/molbev/msv203] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]
Abstract
A coregulated module of genes ("regulon") can have evolutionarily conserved expression patterns and yet have diverged upstream regulators across species. For instance, the ribosomal genes regulon is regulated by the transcription factor (TF) TBF1 in Candida albicans, while in Saccharomyces cerevisiae it is regulated by RAP1. Only a handful of such rewiring events have been established, and the prevalence or conditions conducive to such events are not well known. Here, we develop a novel probabilistic scoring method to comprehensively screen for regulatory rewiring within regulons across 23 yeast species. Investigation of 1,713 regulons and 176 TFs yielded 5,353 significant rewiring events at 5% false discovery rate (FDR). Besides successfully recapitulating known rewiring events, our analyses also suggest TF candidates for certain processes reported to be under distinct regulatory controls in S. cerevisiae and C. albicans, for which the implied regulators are not known: 1) Oxidative stress response (Sc-MSN2 to Ca-FKH2) and 2) nutrient modulation (Sc-RTG1 to Ca-GCN4/Ca-UME6). Furthermore, a stringent screen to detect TF rewiring at individual genes identified 1,446 events at 10% FDR. Overall, these events are supported by strong coexpression between the predicted regulator and its target gene(s) in a species-specific fashion (>50-fold). Independent functional analyses of rewiring TF pairs revealed greater functional interactions and shared biological processes between them (P = 1 × 10^-3). Our study represents the first comprehensive assessment of regulatory rewiring, with a novel approach that has generated a unique high-confidence resource of several specific events, suggesting that evolutionary rewiring is relatively frequent and may be a significant mechanism of regulatory innovation.
Affiliation(s)
- Shrutii Sarda
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park
- Sridhar Hannenhalli
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park
|
37
|
Thompson D, Regev A, Roy S. Comparative analysis of gene regulatory networks: from network reconstruction to evolution. Annu Rev Cell Dev Biol 2015; 31:399-428. [PMID: 26355593 DOI: 10.1146/annurev-cellbio-100913-012908] [Citation(s) in RCA: 95] [Impact Index Per Article: 10.6] [Indexed: 11/09/2022]
Abstract
Regulation of gene expression is central to many biological processes. Although reconstruction of regulatory circuits from genomic data alone is therefore desirable, this remains a major computational challenge. Comparative approaches that examine the conservation and divergence of circuits and their components across strains and species can help reconstruct circuits as well as provide insights into the evolution of gene regulatory processes and their adaptive contribution. In recent years, advances in genomic and computational tools have led to a wealth of methods for such analysis at the sequence, expression, pathway, module, and entire network level. Here, we review computational methods developed to study transcriptional regulatory networks using comparative genomics, from sequence to functional data. We highlight how these methods use evolutionary conservation and divergence to reliably detect regulatory components as well as estimate the extent and rate of divergence. Finally, we discuss the promise and open challenges in linking regulatory divergence to phenotypic divergence and adaptation.
Affiliation(s)
- Dawn Thompson
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
|
38
|
Brion C, Pflieger D, Friedrich A, Schacherer J. Evolution of intraspecific transcriptomic landscapes in yeasts. Nucleic Acids Res 2015; 43:4558-68. [PMID: 25897111 PMCID: PMC4482089 DOI: 10.1093/nar/gkv363] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 02/20/2015] [Accepted: 04/02/2015] [Indexed: 01/13/2023]
Abstract
Variations in gene expression have been widely explored in order to obtain an accurate overview of the changes in regulatory networks that underlie phenotypic diversity. Numerous studies have characterized differences in genomic expression between large numbers of individuals of model organisms such as Saccharomyces cerevisiae. To more broadly survey the evolution of the transcriptomic landscape across species, we measured whole-genome expression in a large collection of another yeast species: Lachancea kluyveri (formerly Saccharomyces kluyveri), using RNAseq. Interestingly, this species diverged from the S. cerevisiae lineage prior to its ancestral whole genome duplication. Moreover, L. kluyveri harbors a chromosome-scale compositional heterogeneity due to a 1-Mb ancestral introgressed region as well as a large set of unique unannotated genes. In this context, our comparative transcriptomic analysis clearly showed a link between gene evolutionary history and expression behavior. Indeed, genes that have been recently acquired or are under function relaxation tend to be less transcribed, show a higher intraspecific variation (plasticity), and are less involved in networks (connectivity). Moreover, utilizing this approach in L. kluyveri also highlighted specific regulatory network signatures in aerobic respiration, amino-acid biosynthesis and glycosylation, presumably due to its different lifestyle. Our data set sheds an important light on the evolution of intraspecific transcriptomic variation across distant species.
Affiliation(s)
- Christian Brion
  - Department of Genetics, Genomics and Microbiology, University of Strasbourg, CNRS, UMR7156, Strasbourg, France
- David Pflieger
  - Department of Genetics, Genomics and Microbiology, University of Strasbourg, CNRS, UMR7156, Strasbourg, France
- Anne Friedrich
  - Department of Genetics, Genomics and Microbiology, University of Strasbourg, CNRS, UMR7156, Strasbourg, France
- Joseph Schacherer
  - Department of Genetics, Genomics and Microbiology, University of Strasbourg, CNRS, UMR7156, Strasbourg, France
|
39
|
Bonnet E, Calzone L, Michoel T. Integrative multi-omics module network inference with Lemon-Tree. PLoS Comput Biol 2015; 11:e1003983. [PMID: 25679508 PMCID: PMC4332478 DOI: 10.1371/journal.pcbi.1003983] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 07/29/2014] [Accepted: 10/14/2014] [Indexed: 01/05/2023] Open
Abstract
Module network inference is an established statistical method to reconstruct co-expression modules and their upstream regulatory programs from integrated multi-omics datasets measuring the activity levels of various cellular components across different individuals, experimental conditions or time points of a dynamic process. We have developed Lemon-Tree, an open-source, platform-independent, modular, extensible software package implementing state-of-the-art ensemble methods for module network inference. We benchmarked Lemon-Tree using large-scale tumor datasets and showed that Lemon-Tree algorithms compare favorably with state-of-the-art module network inference software. We also analyzed a large dataset of somatic copy-number alterations and gene expression levels measured in glioblastoma samples from The Cancer Genome Atlas and found that Lemon-Tree correctly identifies known glioblastoma oncogenes and tumor suppressors as master regulators in the inferred module network. Novel candidate driver genes predicted by Lemon-Tree were validated using tumor pathway and survival analyses. Lemon-Tree is available from http://lemon-tree.googlecode.com under the GNU General Public License version 2.0.
Affiliation(s)
- Eric Bonnet
  - Institut Curie, Paris, France
  - INSERM U900, Paris, France
  - Mines ParisTech, Fontainebleau, France
  - * E-mail: (EB); (TM)
- Laurence Calzone
  - Institut Curie, Paris, France
  - INSERM U900, Paris, France
  - Mines ParisTech, Fontainebleau, France
- Tom Michoel
  - Division of Genetics & Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, United Kingdom
  - * E-mail: (EB); (TM)
|
40
|
Abu-Jamous B, Fa R, Roberts DJ, Nandi AK. Comprehensive analysis of forty yeast microarray datasets reveals a novel subset of genes (APha-RiB) consistently negatively associated with ribosome biogenesis. BMC Bioinformatics 2014; 15:322. [PMID: 25267386 PMCID: PMC4262117 DOI: 10.1186/1471-2105-15-322] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 06/10/2014] [Accepted: 09/22/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The scale and complexity of genomic data lend themselves to analysis using sophisticated mathematical techniques to yield information that can generate new hypotheses and so guide further experimental investigations. An ensemble clustering method can perform consensus clustering over the same set of genes from different microarray datasets by combining results from different clustering methods into a single consensus result.
RESULTS: In this paper we have performed a comprehensive analysis of forty yeast microarray datasets. The recently described Bi-CoPaM method can analyse expressions of the same set of genes from various microarray datasets while using different clustering methods, and then combine these results into a single consensus result whose clusters' tightness is tunable from tight, specific clusters to wide, overlapping clusters. This has been applied in a novel way to genome-wide data from forty yeast microarray datasets to discover two clusters of genes that are consistently co-expressed over all of these datasets from different biological contexts and various experimental conditions. Most strikingly, the average expression profiles of those clusters are consistently negatively correlated in all of the forty datasets, while neither profile leads nor lags the other.
CONCLUSIONS: The first cluster is enriched with ribosomal biogenesis genes. The biological processes of most of the genes in the second cluster are either unknown or apparently unrelated, although they show high connectivity in protein-protein and genetic interaction networks. Therefore, it is possible that this mostly uncharacterised cluster and the ribosomal biogenesis cluster are transcriptionally oppositely regulated by some common machinery. Moreover, we anticipate that the genes included in this previously unknown cluster participate in generic, in contrast to specific, stress response processes. These novel findings illuminate coordinated gene expression in yeast and suggest several hypotheses for future experimental functional work. Additionally, we have demonstrated the usefulness of the Bi-CoPaM-based approach, which may be helpful for the analysis of other groups of (microarray) datasets from other species and systems for the exploration of global genetic co-expression.
Affiliation(s)
- Basel Abu-Jamous
  - Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, UB8 3PH UK
- Rui Fa
  - Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, UB8 3PH UK
- David J Roberts
  - National Health Service Blood and Transplant, Oxford, UK
  - Radcliffe Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
- Asoke K Nandi
  - Department of Electronic and Computer Engineering, Brunel University, Uxbridge, Middlesex, UB8 3PH UK
  - Department of Mathematical Information Technology, University of Jyväskylä, Jyväskylä, Finland
|
41
|
Primate iPS cells as tools for evolutionary analyses. Stem Cell Res 2014; 12:622-9. [PMID: 24631741 DOI: 10.1016/j.scr.2014.02.001] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Received: 12/18/2013] [Revised: 01/31/2014] [Accepted: 02/01/2014] [Indexed: 11/21/2022] Open
Abstract
Induced pluripotent stem cells (iPSCs) are regarded as a central tool to understand human biology in health and disease. Similarly, iPSCs from non-human primates should be a central tool to understand human evolution, in particular for assessing the conservation of regulatory networks in iPSC models. Here, we have generated human, gorilla, bonobo and cynomolgus monkey iPSCs and assessed their usefulness in such a framework. We show that these cells are comparable in their differentiation potential and are generally similar to human, cynomolgus and rhesus monkey embryonic stem cells (ESCs). RNA sequencing reveals that expression differences among clones, individuals and stem cell types are all of very similar magnitude within a species. In contrast, expression differences between closely related primate species are three times larger, and most genes show significant expression differences among the analyzed species. However, pseudogenes differ more than twice as much, suggesting that evolution of expression levels in primate stem cells is rapid, but constrained. These patterns in pluripotent stem cells are comparable to those found in other tissues except testis. Hence, primate iPSCs reveal insights into general primate gene expression evolution and should provide a rich source to identify conserved and species-specific gene expression patterns for cellular phenotypes.
|
42
|
Ragan MA. Yeast rises to the occasion. eLife 2013; 2:e00933. [PMID: 23795300 PMCID: PMC3687331 DOI: 10.7554/elife.00933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Genetic analyses of 15 species of yeast have shed new light on the divergence of gene regulation during evolution, with significant changes occurring after an event in which a whole genome was duplicated.
Affiliation(s)
- Mark A Ragan
  - Institute for Molecular Bioscience and the School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, Australia
|
43
|
Thompson DA, Roy S, Chan M, Styczynsky MP, Pfiffner J, French C, Socha A, Thielke A, Napolitano S, Muller P, Kellis M, Konieczka JH, Wapinski I, Regev A. Evolutionary principles of modular gene regulation in yeasts. eLife 2013; 2:e00603. [PMID: 23795289 PMCID: PMC3687341 DOI: 10.7554/elife.00603] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Received: 02/22/2013] [Accepted: 05/02/2013] [Indexed: 12/20/2022] Open
Abstract
Divergence in gene regulation can play a major role in evolution. Here, we used a phylogenetic framework to measure mRNA profiles in 15 yeast species from the phylum Ascomycota and reconstruct the evolution of their modular regulatory programs along a time course of growth on glucose over 300 million years. We found that modules have diverged proportionally to phylogenetic distance, with prominent changes in gene regulation accompanying changes in lifestyle and ploidy, especially in carbon metabolism. Paralogs have significantly contributed to regulatory divergence, typically within a very short window from their duplication. Paralogs from a whole genome duplication (WGD) event have a uniquely substantial contribution that extends over a longer span. Similar patterns occur when considering the evolution of the heat shock regulatory program measured in eight of the species, suggesting that these are general evolutionary principles. DOI: http://dx.doi.org/10.7554/eLife.00603.001
The incredible diversity of living creatures belies the fact that their genes are quite similar. In the 1970s Mary-Claire King and Allan Wilson proposed that a process called gene regulation, which determines when, where and how genes are expressed as proteins, is responsible for this diversity. Four decades later, the central role of gene regulation in evolution has been confirmed in a wide range of species including bacteria, fungi, flies and mammals, although the details remain poorly understood. In recent years it has been suggested that the duplication of genes, and sometimes the duplication of whole genomes, has had a crucial influence on the part played by gene regulation in the evolution of many different species. Ascomycota fungi are uniquely suited to the study of genetics and evolution because of their diversity (they include C. albicans, a fungus that is found in the human mouth and gut, and various species of yeast) and because many of their genomes have already been sequenced. Moreover, their genomes are relatively small, which simplifies the task of working out how they have changed over the course of evolution. It is also known that species in this branch of the tree of life diverged before and after an event in which a whole genome was duplicated.
Ascomycota fungi use glucose as a source of carbon in different ways during aerobic growth. Most, including C. albicans, are respiratory and rely on oxidative phosphorylation processes to produce energy. However, a small number (including S. cerevisiae and S. pombe, two types of yeast that are widely used as model organisms) prefer to ferment glucose, even when oxygen is available. Species that favor the latter respiro-fermentative lifestyle have evolved independently at least twice: once after the whole genome duplication event that led to S. cerevisiae, and once when S. pombe and the other fission yeasts evolved.
Thompson et al. have measured mRNA profiles in 15 different species of yeast and reconstructed how the regulation of groups of genes (modules) has evolved over a period of more than 300 million years. They found that modules have diverged proportionally to evolutionary time, with prominent changes in gene regulation being associated with changes in lifestyle (especially changes in carbon metabolism) and a whole genome duplication event. Gene duplication events result in gene paralogs (identical genes at different places in the genome), and these have made significant contributions to the evolution of different forms of gene regulation, especially just after the duplication event. Moreover, the paralogs produced in whole genome duplication events have resulted in bigger changes over longer periods of time. Similar patterns were observed in the regulation of the genes involved in the response to heat shock in eight of the species, which suggests that these are general evolutionary principles.
The changes in gene expression associated with the respiro-fermentative lifestyle may also have implications for our understanding of cancer: healthy cells rely on oxidative phosphorylation to produce energy whereas, similar to yeast cells, most cancerous cells rely on respiro-fermentation. Furthermore, yeast cells and cancer cells both support their rapid growth and proliferation by using glucose for biosynthesis to support cell division, although this process is not fully understood. Normal cells, on the other hand, use glucose primarily for energy and tend not to divide rapidly. Thompson et al. found that the genes encoding enzymes in two biosynthetic pathways (one that produces the nucleotides necessary for DNA replication, and one that synthesizes glycine) are induced in respiro-fermentative yeasts but repressed in respiratory yeast cells. The fact that similar changes are observed in the same two pathways when normal cells become cancer cells suggests that these pathways have an important role in the development of cancer. The framework developed by Thompson et al. could also be used to explore the evolution of gene regulation in other species and biological processes. DOI: http://dx.doi.org/10.7554/eLife.00603.002
Affiliation(s)
- Dawn A Thompson
  - Broad Institute of MIT and Harvard, Cambridge, United States
|