1
Genome-Wide Identification and Characterization of Polygalacturonase Gene Family in Maize (Zea mays L.). Int J Mol Sci 2021; 22:ijms221910722. [PMID: 34639068 PMCID: PMC8509529 DOI: 10.3390/ijms221910722] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/04/2021] [Revised: 09/27/2021] [Accepted: 09/29/2021] [Indexed: 11/29/2022] Open
Abstract
Polygalacturonase (PG, EC 3.2.1.15) is a crucial enzyme for pectin degradation and is involved in various developmental processes such as fruit ripening, pollen development, cell expansion, and organ abscission. However, information on the PG gene family in the maize (Zea mays L.) genome and on the specific members involved in maize anther development is still lacking. In this study, we identified 55 PG family genes from the maize genome and further characterized their evolutionary relationships and expression patterns. Phylogenetic analysis revealed that ZmPGs are grouped into six Clades, and gene structures within the same Clade are highly conserved, suggesting their functional conservation. The ZmPGs are randomly distributed across maize chromosomes, and collinearity analysis showed that many ZmPGs might be derived from tandem duplications and segmental duplications, and that these genes are under purifying selection. Furthermore, gene expression analysis provided insights into possible functional divergence among ZmPGs. Based on RNA-seq data analysis, we found that many ZmPGs are expressed in various tissues, while 18 ZmPGs are highly expressed in maize anther; their detailed expression profiles at different anther developmental stages were further investigated using RT-qPCR analysis. These results provide valuable information for further functional characterization and application of the ZmPGs in maize.
2
Vancaester E, Depuydt T, Osuna-Cruz CM, Vandepoele K. Comprehensive and Functional Analysis of Horizontal Gene Transfer Events in Diatoms. Mol Biol Evol 2021; 37:3243-3257. [PMID: 32918458 DOI: 10.1093/molbev/msaa182] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 12/14/2022] Open
Abstract
Diatoms are a diverse group of mainly photosynthetic algae, responsible for 20% of worldwide oxygen production, which can rapidly respond to favorable conditions and often outcompete other phytoplankton. We investigated the contribution of horizontal gene transfer (HGT) to their ecological success. A large-scale phylogeny-based prokaryotic HGT detection procedure across nine sequenced diatoms showed that 3-5% of their proteome has a horizontal origin and that a large influx occurred at the ancestor of diatoms. More than 90% of HGT genes are expressed, and species-specific HGT genes in Phaeodactylum tricornutum undergo strong purifying selection. Genes derived from HGT are implicated in several processes, including environmental sensing, and expand the metabolic toolbox. Cobalamin (vitamin B12) is an essential cofactor for roughly half of the diatoms and is only produced by bacteria. Five consecutive genes involved in the final synthesis of the cobalamin biosynthetic pathway, which could function as scavenging and repair genes, were detected as HGT. The full suite of these genes was detected in the cold-adapted diatom Fragilariopsis cylindrus. This might give diatoms originating from the Southern Ocean, a region typically depleted in cobalamin, a competitive advantage. Overall, we show that HGT is a prevalent mechanism that is actively used in diatoms to expand their adaptive capabilities.
Affiliation(s)
- Emmelien Vancaester
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; VIB Center for Plant Systems Biology, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Thomas Depuydt
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; VIB Center for Plant Systems Biology, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Cristina Maria Osuna-Cruz
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; VIB Center for Plant Systems Biology, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; VIB Center for Plant Systems Biology, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
3
Li D, Kishta MS, Wang J. Regulation of pluripotency and reprogramming by RNA binding proteins. Curr Top Dev Biol 2020; 138:113-138. [PMID: 32220295 DOI: 10.1016/bs.ctdb.2020.01.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/11/2022]
Abstract
Embryonic stem cells have the capacities of self-renewal and pluripotency. Pluripotency establishment (somatic cell reprogramming), maintenance, and execution (differentiation) require orchestrated regulatory mechanisms of a cell's molecular machinery, including signaling pathways, epigenetics, transcription, translation, and protein degradation. RNA binding proteins (RBPs) take part in every process of RNA regulation, and recent studies have begun to address their important functions in the regulation of pluripotency and reprogramming. Here, we discuss the roles of RBPs in key regulatory steps in the control of pluripotency and reprogramming. Among RNA binding proteins is a group of RNA helicases that are responsible for RNA structure remodeling with important functional implications. We highlight the largest family of RNA helicases, the DDX (DEAD-box) helicase family, and our current understanding of their functions specifically in the regulation of pluripotency and reprogramming.
Affiliation(s)
- Dan Li
- Department of Cell, Developmental and Regenerative Biology; The Black Family Stem Cell Institute; Icahn School of Medicine at Mount Sinai, New York, NY, United States; The Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Mohamed S Kishta
- Hormones Department, Medical Research Division, National Research Centre, Cairo, Egypt; Stem Cell Lab., Center of Excellence for Advanced Sciences, National Research Centre, Cairo, Egypt; Department of Medicine, Columbia Center for Human Development, Columbia University Irving Medical Center, New York, NY, United States
- Jianlong Wang
- Department of Cell, Developmental and Regenerative Biology; The Black Family Stem Cell Institute; Icahn School of Medicine at Mount Sinai, New York, NY, United States; The Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Department of Medicine, Columbia Center for Human Development, Columbia University Irving Medical Center, New York, NY, United States.
4
Abstract
One of the most widely recognized features of biological systems is their modularity. The modules that constitute biological systems are said to be redeployed and combined across several conditions, thus acting as building blocks. In this work, we analyse to what extent these building blocks are reusable compared with those found in randomized versions of a system. We develop a notion of decompositions of systems into phenotypic building blocks, which allows them to overlap while maximizing the number of times a building block is reused across several conditions. Different biological systems present building blocks whose reusability ranges from single use (e.g. condition specific) to constitutive, although their average reusability is not always higher than that of random equivalents of the system. These decompositions reveal a distinct distribution of building block sizes in real biological systems. This distribution stems, in part, from the peculiar usage pattern of the elements of biological systems, and constitutes a new angle from which to study the evolution of modularity.
Affiliation(s)
- Victor Mireles
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany; International Max Planck Research School for Computational Biology and Scientific Computing, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Tim O F Conrad
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
5
Jain A, Perisa D, Fliedner F, von Haeseler A, Ebersberger I. The Evolutionary Traceability of a Protein. Genome Biol Evol 2019; 11:531-545. [PMID: 30649284 PMCID: PMC6394115 DOI: 10.1093/gbe/evz008] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Accepted: 01/11/2019] [Indexed: 12/12/2022] Open
Abstract
Orthologs document the evolution of genes and metabolic capacities encoded in extant and ancient genomes. However, the similarity between orthologs decays with time, and ultimately it becomes insufficient to infer common ancestry. This leaves ancient gene set reconstructions incomplete and distorted to an unknown extent. Here we introduce “evolutionary traceability” as a measure that quantifies, for each protein, the evolutionary distance beyond which the sensitivity of the ortholog search becomes limiting. Using yeast, we show that genes that were thought to date back to the last universal common ancestor are of high traceability. Their functions mostly involve catalysis, ion transport, and ribonucleoprotein complex assembly. In turn, the fraction of yeast genes whose traceability is not sufficient to infer their presence in the last universal common ancestor is enriched for regulatory functions. Computing the traceabilities of genes that have been experimentally characterized as being essential for a self-replicating cell reveals that many of the genes that lack orthologs outside bacteria have low traceability. This leaves open whether their orthologs in the eukaryotic and archaeal domains have been overlooked. Looking at the example of REC8, a protein essential for chromosome cohesion, we demonstrate how a traceability-informed adjustment of the search sensitivity identifies hitherto missed orthologs in the fast-evolving microsporidia. Taken together, the evolutionary traceability helps to differentiate between true absence and nondetection of orthologs, and thus improves our understanding of the evolutionary conservation of functional protein networks. “protTrace,” a software tool for computing evolutionary traceability, is freely available at https://github.com/BIONF/protTrace.git; last accessed February 10, 2019.
Affiliation(s)
- Arpit Jain
- Applied Bioinformatics Group, Institute of Cell Biology & Neuroscience, Goethe University, Frankfurt, Germany
- Dominik Perisa
- Applied Bioinformatics Group, Institute of Cell Biology & Neuroscience, Goethe University, Frankfurt, Germany
- Fabian Fliedner
- Applied Bioinformatics Group, Institute of Cell Biology & Neuroscience, Goethe University, Frankfurt, Germany
- Arndt von Haeseler
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University Vienna, Austria; Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Austria
- Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology & Neuroscience, Goethe University, Frankfurt, Germany; Senckenberg Biodiversity and Climate Research Center (BiK-F), Frankfurt, Germany; LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
6
Mackeh R, Marr AK, Fadda A, Kino T. C2H2-Type Zinc Finger Proteins: Evolutionarily Old and New Partners of the Nuclear Hormone Receptors. Nuclear Receptor Signaling 2018; 15:1550762918801071. [PMID: 30718982 PMCID: PMC6348741 DOI: 10.1177/1550762918801071] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 12/20/2016] [Accepted: 02/02/2017] [Indexed: 12/21/2022]
Abstract
Nuclear hormone receptors (NRs) are evolutionarily conserved ligand-dependent transcription factors. They are essential for human life, mediating the actions of lipophilic molecules, such as steroid hormones and metabolites of fatty acids, cholesterol, and external toxic compounds. The C2H2-type zinc finger proteins (ZNFs) form the largest family of transcription factors in humans and are characterized by multiple, tandemly arranged zinc fingers. Many of the C2H2-type ZNFs are conserved throughout evolution, suggesting their involvement in preserved biological activities, such as general transcriptional regulation and development/differentiation of organs/tissues observed in the early embryonic phase. However, some C2H2-type ZNFs, such as those with the Krüppel-associated box (KRAB) domain, appeared relatively late in evolution and have significantly increased family members in mammals including humans, possibly modulating their complicated transcriptional network and/or supporting the morphological development/functions specific to them. Such evolutionary characteristics of the C2H2-type ZNFs indicate that these molecules influence the NR functions conserved through evolution, whereas some also adjust them to meet the specific needs of higher organisms. We review the interaction between NRs and C2H2-type ZNFs by focusing on some of the latter molecules.
7
de Souza MM, Zerlotini A, Geistlinger L, Tizioto PC, Taylor JF, Rocha MIP, Diniz WJS, Coutinho LL, Regitano LCA. A comprehensive manually-curated compendium of bovine transcription factors. Sci Rep 2018; 8:13747. [PMID: 30213987 PMCID: PMC6137171 DOI: 10.1038/s41598-018-32146-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/17/2018] [Accepted: 08/29/2018] [Indexed: 01/28/2023] Open
Abstract
Transcription factors (TFs) are pivotal regulatory proteins that control gene expression in a context-dependent and tissue-specific manner. In contrast to humans, for which comprehensive curated TF collections exist, bovine TFs are only rudimentarily recorded and characterized. In this article, we present a manually-curated compendium of 865 sequence-specific DNA-binding bovine TFs, which we analyzed for domain family distribution, evolutionary conservation, and tissue-specific expression. In addition, we provide a list of putative transcription cofactors derived from known interactions with the identified TFs. Since there is a general lack of knowledge concerning the regulation of gene expression in cattle, the curated list of TFs should provide a basis for an improved comprehension of regulatory mechanisms that are specific to the species.
Affiliation(s)
- Marcela M de Souza
- Post-graduation Program of Evolutionary Genetics and Molecular Biology, Federal University of São Carlos, São Carlos, São Paulo, 13560-970, Brazil; Animal Biotechnology, Embrapa Pecuária Sudeste, São Carlos, São Paulo, 13560-970, Brazil
- Adhemar Zerlotini
- Bioinformatic Multi-user Laboratory, Embrapa Informática Agropecuária, Campinas, São Paulo, 70770-901, Brazil
- Ludwig Geistlinger
- Animal Biotechnology, Embrapa Pecuária Sudeste, São Carlos, São Paulo, 13560-970, Brazil
- Jeremy F Taylor
- Division of Animal Science, University of Missouri, Columbia, Missouri, 65211-5300, USA
- Marina I P Rocha
- Post-graduation Program of Evolutionary Genetics and Molecular Biology, Federal University of São Carlos, São Carlos, São Paulo, 13560-970, Brazil
- Wellison J S Diniz
- Post-graduation Program of Evolutionary Genetics and Molecular Biology, Federal University of São Carlos, São Carlos, São Paulo, 13560-970, Brazil
- Luiz L Coutinho
- Functional Genomic Center, University of São Paulo, Piracicaba, São Paulo, 13418-900, Brazil
- Luciana C A Regitano
- Animal Biotechnology, Embrapa Pecuária Sudeste, São Carlos, São Paulo, 13560-970, Brazil.
8
Schumacher J, Herlyn H. Correlates of evolutionary rates in the murine sperm proteome. BMC Evol Biol 2018; 18:35. [PMID: 29580206 PMCID: PMC5870804 DOI: 10.1186/s12862-018-1157-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/14/2017] [Accepted: 03/19/2018] [Indexed: 01/20/2023] Open
Abstract
Background: Protein-coding genes expressed in sperm evolve at different rates. To gain deeper insight into the factors underlying this heterogeneity we examined the relative importance of a diverse set of previously described rate correlates in determining the evolution of murine sperm proteins. Results: Using partial rank correlations we detected several major rate indicators: Phyletic gene age, numbers of protein-protein interactions, and survival essentiality emerged as particularly important rate correlates in murine sperm proteins. Tissue specificity, numbers of paralogs, and untranslated region lengths also correlate significantly with sperm genes’ evolutionary rates, albeit to a lesser extent. Multifunctionality, coding sequence or average intron lengths, and mean expression level have insignificant or virtually no independent effects on evolutionary rates in murine sperm genes. Gene ontology enrichment analyses of three equally sized murine sperm protein groups classified based on their evolutionary rates indicate strongest sperm-specific functional specialization in the most quickly evolving gene class. Conclusions: We propose a model according to which slowly evolving murine sperm proteins tend to be constrained by factors such as survival essentiality, network connectivity, and/or broad expression. In contrast, evolutionary change may arise especially in less constrained sperm proteins, which might, moreover, be prone to specialize to reproduction-related functions. Our results should be taken into account in future studies on rate variations of reproductive genes. Electronic supplementary material: The online version of this article (10.1186/s12862-018-1157-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Julia Schumacher
- Institute of Organismic and Molecular Evolution, Anthropology, Johannes Gutenberg University, Mainz, Germany.
- Holger Herlyn
- Institute of Organismic and Molecular Evolution, Anthropology, Johannes Gutenberg University, Mainz, Germany.
9
Schmitz JF, Zimmer F, Bornberg-Bauer E. Mechanisms of transcription factor evolution in Metazoa. Nucleic Acids Res 2016; 44:6287-97. [PMID: 27288445 PMCID: PMC5291267 DOI: 10.1093/nar/gkw492] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 01/28/2016] [Revised: 05/18/2016] [Accepted: 05/22/2016] [Indexed: 11/12/2022] Open
Abstract
Transcription factors (TFs) are pivotal for the regulation of virtually all cellular processes, including growth and development. Expansions of TF families are causally linked to increases in organismal complexity. Here we study the evolutionary dynamics, genetic causes and functional implications of the five largest metazoan TF families. We find that family expansions dominate across the whole metazoan tree; however, some branches experience exceptional family-specific accelerated expansions. Additionally, we find that such expansions are often predated by modular domain rearrangements, which spur the expansion of a new sub-family by separating it from the rest of the TF family in terms of protein-protein interactions. This separation allows for radical shifts in the functional spectrum of a duplicated TF. We also find functional differentiation inside TF sub-families as changes in expression specificity. Furthermore, accelerated family expansions are facilitated by repeats of sequence motifs such as C2H2 zinc fingers. We quantify whole genome duplications and single gene duplications as sources of TF family expansions, implying that some, but not all, TF duplicates are preferentially retained. We conclude that trans-regulatory changes (domain rearrangements) are instrumental for fundamental functional innovations, that cis-regulatory changes (affecting expression) accomplish wide-spread fine tuning, and that both jointly contribute to the functional diversification of TFs.
Affiliation(s)
- Jonathan F Schmitz
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, Hüfferstrasse 1, D-48149 Münster, Germany
- Fabian Zimmer
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, Hüfferstrasse 1, D-48149 Münster, Germany; Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Erich Bornberg-Bauer
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, Hüfferstrasse 1, D-48149 Münster, Germany
10
Yang L, Wang S, Zhou M, Chen X, Zuo Y, Sun D, Lv Y. Comparative analysis of housekeeping and tissue-selective genes in human based on network topologies and biological properties. Mol Genet Genomics 2016; 291:1227-41. [PMID: 26897376 DOI: 10.1007/s00438-016-1178-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/26/2015] [Accepted: 01/26/2016] [Indexed: 01/14/2023]
Abstract
Housekeeping genes are genes that are turned on most of the time in almost every tissue to maintain cellular functions. Tissue-selective genes are predominantly expressed in one or a few biologically relevant tissue types. Benefitting from the massive gene expression microarray data obtained over the past decades, the properties of housekeeping and tissue-selective genes can now be investigated on a large scale. In this study, we analyzed the topological properties of housekeeping and tissue-selective genes in the protein-protein interaction (PPI) network. Furthermore, we compared the biological properties and amino acid usage between these two gene groups. The results indicated that there were significant differences in topological properties between housekeeping and tissue-selective genes in the PPI network, and that housekeeping genes had higher centrality properties and may play important roles in the complex biological network environment. We also found that there were significant differences in multiple biological properties and many amino acid compositions. Functional gene enrichment and subcellular localization analyses were also performed to characterize housekeeping and tissue-selective genes. The results indicated that the two gene groups showed significantly different enrichment in drug targets, disease genes and toxin targets, and were found in different subcellular localizations. Finally, the discrimination between the properties of the two gene groups was measured by the F-score, and expression stage was the most discriminative of all properties. These findings may elucidate the biological mechanisms underlying housekeeping and tissue-selective genes and may contribute to better annotation of housekeeping and tissue-selective genes in other organisms.
Affiliation(s)
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Meng Zhou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Xiaowen Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yongchun Zuo
- The National Research Center for Animal Transgenic Biotechnology, Inner Mongolia University, Hohhot, 010021, China
- Dianjun Sun
- Center for Endemic Disease Control, Chinese Center for Disease Control and Prevention, Harbin Medical University, Harbin, 150081, China.
- Yingli Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China.
11
Liang Y, Yu Y, Shen X, Dong H, Lyu M, Xu L, Ma Z, Liu T, Cao J. Dissecting the complex molecular evolution and expression of polygalacturonase gene family in Brassica rapa ssp. chinensis. Plant Molecular Biology 2015; 89:629-46. [PMID: 26506823 DOI: 10.1007/s11103-015-0390-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/16/2015] [Accepted: 10/06/2015] [Indexed: 05/22/2023]
Abstract
Polygalacturonases (PGs) participate in pectin disassembly of the cell wall and belong to one of the largest hydrolase families in plants. In this study, we identified 99 PG genes in Brassica rapa. Comprehensive analysis of phylogeny, gene structures, physico-chemical properties and coding sequence evolution demonstrated that plant PGs should be classified into seven divergent clades and that each clade's members had specific sequence and structure characteristics, and/or were under specific selection pressures. Genomic distribution and retention rate analysis implied that duplication events and biased retention contributed to the PG family's expansion. Promoter divergence analysis using the "shared motif method" revealed a significant correlation between regulatory and coding sequence evolution of PGs, and indicated that Clades A and E were of ancient origin. Quantitative real-time PCR analysis showed that expression patterns of PGs displayed group specificities in B. rapa. In particular, nearly half of the PG family members, especially those of Clades C, D and F, are closely related to reproductive development. Most duplicates showed similar expression profiles, suggesting that dosage constraints accounted for preservation after duplication. Promoter-GUS assay further indicated PGs' extensive roles and possible redundancy during reproductive development. This work can provide a scientific classification of plant PGs, dissect the internal relationships between their evolution and expression, and promote functional research.
Affiliation(s)
- Ying Liang
- Laboratory of Cell and Molecular Biology, Institute of Vegetable Science, Zhejiang University, Hangzhou, 310058, China.
- Zhejiang Provincial Key Laboratory of Horticultural Plant Integrative Biology, Zhejiang University, Hangzhou, 310058, China.
- Youjian Yu
- Laboratory of Cell and Molecular Biology, Institute of Vegetable Science, Zhejiang University, Hangzhou, 310058, China.
- Department of Horticulture, College of Agriculture and Food Science, Zhejiang A & F University, Lin'an, 311300, China.
- Xiuping Shen
- Laboratory of Cell and Molecular Biology, Institute of Vegetable Science, Zhejiang University, Hangzhou, 310058, China.
- Zhejiang Provincial Key Laboratory of Horticultural Plant Integrative Biology, Zhejiang University, Hangzhou, 310058, China.
- Heng Dong
- Laboratory of Cell and Molecular Biology, Institute of Vegetable Science, Zhejiang University, Hangzhou, 310058, China.
- Zhejiang Provincial Key Laboratory of Horticultural Plant Integrative Biology, Zhejiang University, Hangzhou, 310058, China.
- Meiling Lyu
- Laboratory of Cell and Molecular Biology, Institute of Vegetable Science, Zhejiang University, Hangzhou, 310058, China.
- Zhejiang Provincial Key Laboratory of Horticultural Plant Integrative Biology, Zhejiang University, Hangzhou, 310058, China.
- Liai Xu
- Laboratory of Cell and Molecular Biology, Institute of Vegetable Science, Zhejiang University, Hangzhou, 310058, China.
- Zhejiang Provincial Key Laboratory of Horticultural Plant Integrative Biology, Zhejiang University, Hangzhou, 310058, China.
- Zhiming Ma
- Laboratory of Cell and Molecular Biology, Institute of Vegetable Science, Zhejiang University, Hangzhou, 310058, China.
- Zhejiang Provincial Key Laboratory of Horticultural Plant Integrative Biology, Zhejiang University, Hangzhou, 310058, China.
- Tingting Liu
- Laboratory of Cell and Molecular Biology, Institute of Vegetable Science, Zhejiang University, Hangzhou, 310058, China.
- Zhejiang Provincial Key Laboratory of Horticultural Plant Integrative Biology, Zhejiang University, Hangzhou, 310058, China.
- Jiashu Cao
- Laboratory of Cell and Molecular Biology, Institute of Vegetable Science, Zhejiang University, Hangzhou, 310058, China.
- Zhejiang Provincial Key Laboratory of Horticultural Plant Integrative Biology, Zhejiang University, Hangzhou, 310058, China.
12
Abstract
Post-transcriptional gene regulation (PTGR) concerns processes involved in the maturation, transport, stability and translation of coding and non-coding RNAs. RNA-binding proteins (RBPs) and ribonucleoproteins coordinate RNA processing and PTGR. The introduction of large-scale quantitative methods, such as next-generation sequencing and modern protein mass spectrometry, has renewed interest in the investigation of PTGR and the protein factors involved at a systems-biology level. Here, we present a census of 1,542 manually curated RBPs that we have analysed for their interactions with different classes of RNA, their evolutionary conservation, their abundance and their tissue-specific expression. Our analysis is a critical step towards the comprehensive characterization of proteins involved in human RNA metabolism.
Affiliation(s)
- Stefanie Gerstberger
- Howard Hughes Medical Institute and Laboratory for RNA Molecular Biology, The Rockefeller University, 1230 York Ave, New York 10065, USA
- Markus Hafner
- Laboratory of Muscle Stem Cells and Gene Regulation, National Institute of Arthritis and Musculoskeletal and Skin Disease, National Institutes of Health, Bethesda, Maryland 20892, USA
- Thomas Tuschl
- Howard Hughes Medical Institute and Laboratory for RNA Molecular Biology, The Rockefeller University, 1230 York Ave, New York 10065, USA
13
Schaefer MH, Yang JS, Serrano L, Kiel C. Protein conservation and variation suggest mechanisms of cell type-specific modulation of signaling pathways. PLoS Comput Biol 2014; 10:e1003659. [PMID: 24922536 PMCID: PMC4055412 DOI: 10.1371/journal.pcbi.1003659] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 01/16/2014] [Accepted: 04/21/2014] [Indexed: 02/04/2023] Open
Abstract
Many proteins and signaling pathways are present in most cell types and tissues and yet perform specialized functions. To elucidate mechanisms by which these ubiquitous pathways are modulated, we overlaid information about cross-cell line protein abundance and variability, and evolutionary conservation onto functional pathway components and topological layers in the pathway hierarchy. We found that the input (receptors) and the output (transcription factors) layers evolve more rapidly than proteins in the intermediary transmission layer. In contrast, protein expression variability decreases from the input to the output layer. We observed that the differences in protein variability between the input and transmission layer can be attributed to both the network position and the tendency of variable proteins to physically interact with constitutively expressed proteins. Differences in protein expression variability and conservation are also accompanied by the tendency of conserved and constitutively expressed proteins to acquire somatic mutations, while germline mutations tend to occur in cell type-specific proteins. Thus, conserved core proteins in the transmission layer could perform a fundamental role in most cell types and are therefore less tolerant to germline mutations. In summary, we propose that the core signal transmission machinery is largely modulated by a variable input layer through physical protein interactions. We hypothesize that the bow-tie organization of cellular signaling on the level of protein abundance variability contributes to the specificity of the signal response in different cell types. Cell function is determined by highly organized networks of biological molecules. An important class of protein pathways maintains the transmission of signals from the cell membrane to the nucleus. These signaling pathways are reused for different purposes at an evolutionary scale and in different cell types of the same organism. 
However, it is largely unknown how this flexibility is achieved and how this flexibility is balanced with the high degree of evolutionary conservation of some signaling proteins and the need for robustness against intra- and extra-cellular perturbations. We show how functional roles of signaling proteins determine patterns of evolutionary conservation, protein abundance (the average over different human cell lines and its variability) and disease mutations. Projecting pathway annotations on protein-protein interaction (PPI) networks, a picture emerges in which PPIs between variable and less conserved receptors and stable and conserved proteins of the core signal transmission machinery largely modulate signaling activity in a tissue-specific manner. This has important implications for the distribution of disease mutations in signaling pathways, which need to be considered for the understanding of their effect.
Affiliation(s)
- Martin H. Schaefer
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jae-Seong Yang
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Luis Serrano
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Christina Kiel
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- * E-mail: (MHS); (LS); (CK)
|
14
|
Murali T, Pacifico S, Finley RL. Integrating the interactome and the transcriptome of Drosophila. BMC Bioinformatics 2014; 15:177. [PMID: 24913703 PMCID: PMC4229734 DOI: 10.1186/1471-2105-15-177] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/29/2013] [Accepted: 05/28/2014] [Indexed: 12/29/2022]
Abstract
Background Networks of interacting genes and gene products mediate most cellular and developmental processes. High throughput screening methods combined with literature curation are identifying many of the protein-protein interactions (PPI) and protein-DNA interactions (PDI) that constitute these networks. Most of the detection methods, however, fail to identify the in vivo spatial or temporal context of the interactions. Thus, the interaction data are a composite of the individual networks that may operate in specific tissues or developmental stages. Genome-wide expression data may be useful for filtering interaction data to identify the subnetworks that operate in specific spatial or temporal contexts. Here we take advantage of the extensive interaction and expression data available for Drosophila to analyze how interaction networks may be unique to specific tissues and developmental stages. Results We ranked genes on a scale from ubiquitously expressed to tissue or stage specific and examined their interaction patterns. Interestingly, ubiquitously expressed genes have many more interactions among themselves than do non-ubiquitously expressed genes both in PPI and PDI networks. While the PDI network is enriched for interactions between tissue-specific transcription factors and their tissue-specific targets, a preponderance of the PDI interactions are between ubiquitous and non-ubiquitously expressed genes and proteins. In contrast to PDI, PPI networks are depleted for interactions among tissue- or stage- specific proteins, which instead interact primarily with widely expressed proteins. In light of these findings, we present an approach to filter interaction data based on gene expression levels normalized across tissues or developmental stages. We show that this filter (the percent maximum or pmax filter) can be used to identify subnetworks that function within individual tissues or developmental stages. 
Conclusions These observations suggest that protein networks are frequently organized into hubs of widely expressed proteins to which are attached various tissue- or stage-specific proteins. This is consistent with earlier analyses of human PPI data and suggests a similar organization of interaction networks across species. This organization implies that tissue or stage specific networks can be best identified from interactome data by using filters designed to include both ubiquitously expressed and specifically expressed genes and proteins.
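The percent-maximum idea in the abstract above can be illustrated with a minimal sketch. The abstract only says that expression is normalized across tissues or stages before filtering, so the exact rule used here is an assumption: each gene is scaled to a percentage of its own maximum, and an interaction is kept in a tissue when both partners reach a cutoff. The function names, the 50% threshold, and the toy data are all hypothetical.

```python
# Hypothetical sketch of a percent-maximum ("pmax") style filter.
# The 50% cutoff and the "both partners must pass" rule are assumptions,
# not details taken from the cited paper.

def percent_max(expression):
    """Scale each gene's expression to a percentage of its own maximum across tissues."""
    scaled = {}
    for gene, by_tissue in expression.items():
        peak = max(by_tissue.values())
        scaled[gene] = {
            tissue: (100.0 * value / peak if peak > 0 else 0.0)
            for tissue, value in by_tissue.items()
        }
    return scaled


def tissue_subnetwork(interactions, pmax, tissue, threshold=50.0):
    """Keep interactions whose two partners both reach the pmax threshold in `tissue`."""
    return [
        (a, b)
        for a, b in interactions
        if pmax.get(a, {}).get(tissue, 0.0) >= threshold
        and pmax.get(b, {}).get(tissue, 0.0) >= threshold
    ]


# Toy data: one widely expressed hub and two tissue-restricted partners.
expression = {
    "hub": {"gut": 80.0, "brain": 100.0},
    "gutA": {"gut": 40.0, "brain": 2.0},
    "brainB": {"gut": 1.0, "brain": 30.0},
}
interactions = [("hub", "gutA"), ("hub", "brainB"), ("gutA", "brainB")]

pmax = percent_max(expression)
gut_edges = tissue_subnetwork(interactions, pmax, "gut")
brain_edges = tissue_subnetwork(interactions, pmax, "brain")
```

Because each gene is normalized to its own peak, a tissue-restricted gene with low absolute expression can still pass the filter in its tissue, while the widely expressed hub passes in both. This matches the hub-plus-specific-partner organization the abstract describes.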
Affiliation(s)
- Russell L Finley
  - Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, Detroit, Michigan 48201, USA.
|
15
|
An Integrated Analysis of Lineage-specific Small Proteins Across Eight Eukaryotes Reveals Functional and Evolutionary Significance. PROG BIOCHEM BIOPHYS 2012. [DOI: 10.3724/sp.j.1206.2011.00290] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
|
16
|
Vaquerizas JM, Teichmann SA, Luscombe NM. How do you find transcription factors? Computational approaches to compile and annotate repertoires of regulators for any genome. Methods Mol Biol 2012; 786:3-19. [PMID: 21938617 DOI: 10.1007/978-1-61779-292-2_1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
Transcription factors (TFs) play an important role in regulating gene expression. The availability of complete genome sequences and associated functional genomic data offer excellent opportunities to understand the transcriptional regulatory system of an entire organism. To do so, however, it is essential to compile a reliable dataset of regulatory components. Here, we review computational methods and publicly accessible resources that help identify TF-coding genes in prokaryotic and eukaryotic genomes. Since the regulatory functions of most TFs remain unknown, we also discuss approaches for combining diverse genomic datasets that will help elucidate their chromosomal organisation, expression, and evolutionary conservation. These analysis methods provide a solid foundation for further investigations of the transcriptional regulatory system.
|
17
|
Cavalli FMG, Bourgon R, Huber W, Vaquerizas JM, Luscombe NM. SpeCond: a method to detect condition-specific gene expression. Genome Biol 2011; 12:R101. [PMID: 22008066 PMCID: PMC3333772 DOI: 10.1186/gb-2011-12-10-r101] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 04/21/2011] [Revised: 09/27/2011] [Accepted: 10/18/2011] [Indexed: 01/31/2023]
Abstract
Transcriptomic studies routinely measure expression levels across numerous conditions. These datasets allow identification of genes that are specifically expressed in a small number of conditions. However, there are currently no statistically robust methods for identifying such genes. Here we present SpeCond, a method to detect condition-specific genes that outperforms alternative approaches. We apply the method to a dataset of 32 human tissues to determine 2,673 specifically expressed genes. An implementation of SpeCond is freely available as a Bioconductor package at http://www.bioconductor.org/packages/release/bioc/html/SpeCond.html.
Affiliation(s)
- Florence M G Cavalli
  - EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.
|
18
|
Hao L, Ge X, Wan H, Hu S, Lercher MJ, Yu J, Chen WH. Human functional genetic studies are biased against the medically most relevant primate-specific genes. BMC Evol Biol 2010; 10:316. [PMID: 20961448 PMCID: PMC2970608 DOI: 10.1186/1471-2148-10-316] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 07/02/2010] [Accepted: 10/20/2010] [Indexed: 12/02/2022]
Abstract
Background Many functional, structural and evolutionary features of human genes have been observed to correlate with expression breadth and/or gene age. Here, we systematically explore these correlations. Results Gene age and expression breadth are strongly correlated, but contribute independently to the variation of functional, structural and evolutionary features, even when we take account of variation in mRNA expression level. Human genes without orthologs in distant species ('young' genes) tend to be tissue-specific in their expression. As computational inference of gene function often relies on the existence of homologs in other species, and experimental characterization is facilitated by broad and high expression, young, tissue-specific human genes are often the least characterized. At the same time, young genes are most likely to be medically relevant. Conclusions Our results indicate that functional characterization of human genes is biased against young, tissue-specific genes that are mostly medically relevant. The biases should not be taken lightly because they may pose serious obstacles to our understanding of the molecular basis of human diseases. Future studies should thus be designed to specifically explore the properties of primate-specific genes.
Affiliation(s)
- Lili Hao
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, 100029 Beijing, China
|
19
|
Selection against Robertsonian fusions involving housekeeping genes in the house mouse: integrating data from gene expression arrays and chromosome evolution. Chromosome Res 2010; 18:801-8. [DOI: 10.1007/s10577-010-9153-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/11/2010] [Revised: 08/03/2010] [Accepted: 08/04/2010] [Indexed: 10/19/2022]
|
20
|
Ramsköld D, Wang ET, Burge CB, Sandberg R. An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data. PLoS Comput Biol 2009; 5:e1000598. [PMID: 20011106 PMCID: PMC2781110 DOI: 10.1371/journal.pcbi.1000598] [Citation(s) in RCA: 642] [Impact Index Per Article: 42.8] [Received: 07/07/2009] [Accepted: 11/04/2009] [Indexed: 01/25/2023]
Abstract
The parts of the genome transcribed by a cell or tissue reflect the biological processes and functions it carries out. We characterized the features of mammalian tissue transcriptomes at the gene level through analysis of RNA deep sequencing (RNA-Seq) data across human and mouse tissues and cell lines. We observed that roughly 8,000 protein-coding genes were ubiquitously expressed, contributing to around 75% of all mRNAs by message copy number in most tissues. These mRNAs encoded proteins that were often intracellular, and tended to be involved in metabolism, transcription, RNA processing or translation. In contrast, genes for secreted or plasma membrane proteins were generally expressed in only a subset of tissues. The distribution of expression levels was broad but fairly continuous: no support was found for the concept of distinct expression classes of genes. Expression estimates that included reads mapping to coding exons only correlated better with qRT-PCR data than estimates which also included 3′ untranslated regions (UTRs). Muscle and liver had the least complex transcriptomes, in that they expressed predominantly ubiquitous genes and a large fraction of the transcripts came from a few highly expressed genes, whereas brain, kidney and testis expressed more complex transcriptomes with the vast majority of genes expressed and relatively small contributions from the most expressed genes. mRNAs expressed in brain had unusually long 3′UTRs, and mean 3′UTR length was higher for genes involved in development, morphogenesis and signal transduction, suggesting added complexity of UTR-based regulation for these genes. Our results support a model in which variable exterior components feed into a large, densely connected core composed of ubiquitously expressed intracellular proteins. A variety of genes are active within the nuclei of our cells. 
Some are needed for the day-to-day maintenance of cell functions, while others have roles that are more specific to certain tissues or particular cell types; for example, only the pancreas produces insulin. As a result, every tissue has its own profile of gene activity. Since active genes produce RNA, tissue differences in gene activity can be probed by characterizing the RNA they contain. Essentially the entire set of RNAs or ‘transcriptome’ has been sequenced from various tissues, and we used these data to compare the degree of specialization of different tissues and to investigate the set of ‘core’ genes active in every tissue. A central observation was that there are an abundance of such core genes, and that these genes account for the majority of the transcriptome in each tissue. These findings will aid in the understanding of what makes tissues, and cell types, different from each other and what each requires to function.
Affiliation(s)
- Daniel Ramsköld
  - Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- Eric T. Wang
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Christopher B. Burge
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Rickard Sandberg
  - Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- * E-mail: (CBB); (RS)
|
21
|
A census of human transcription factors: function, expression and evolution. Nat Rev Genet 2009; 10:252-63. [PMID: 19274049 DOI: 10.1038/nrg2538] [Citation(s) in RCA: 1088] [Impact Index Per Article: 72.5] [Indexed: 12/18/2022]
Abstract
Transcription factors are key cellular components that control gene expression: their activities determine how cells function and respond to the environment. Currently, there is great interest in research into human transcriptional regulation. However, surprisingly little is known about these regulators themselves. For example, how many transcription factors does the human genome contain? How are they expressed in different tissues? Are they evolutionarily conserved? Here, we present an analysis of 1,391 manually curated sequence-specific DNA-binding transcription factors, their functions, genomic organization and evolutionary conservation. Much remains to be explored, but this study provides a solid foundation for future investigations to elucidate regulatory mechanisms underlying diverse mammalian biological processes.
|
22
|
Bossi A, Lehner B. Tissue specificity and the human protein interaction network. Mol Syst Biol 2009; 5:260. [PMID: 19357639 PMCID: PMC2683721 DOI: 10.1038/msb.2009.17] [Citation(s) in RCA: 239] [Impact Index Per Article: 15.9] [Received: 10/23/2008] [Accepted: 02/23/2009] [Indexed: 12/18/2022]
Abstract
A protein interaction network describes a set of physical associations that can occur between proteins. However, within any particular cell or tissue only a subset of proteins is expressed and so only a subset of interactions can occur. Integrating interaction and expression data, we analyze here this interplay between protein expression and physical interactions in humans. Proteins only expressed in restricted cell types, like recently evolved proteins, make few physical interactions. Most tissue-specific proteins do, however, bind to universally expressed proteins, and so can function by recruiting or modifying core cellular processes. Conversely, most ‘housekeeping’ proteins that are expressed in all cells also make highly tissue-specific protein interactions. These results suggest a model for the evolution of tissue-specific biology, and show that most, and possibly all, ‘housekeeping’ proteins actually have important tissue-specific molecular interactions.
Affiliation(s)
- Alice Bossi
  - EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation, UPF, Barcelona, Spain
|
23
|
Ghosh T, Soni K, Scaria V, Halimani M, Bhattacharjee C, Pillai B. MicroRNA-mediated up-regulation of an alternatively polyadenylated variant of the mouse cytoplasmic β-actin gene. Nucleic Acids Res 2008; 36:6318-32. [PMID: 18835850 PMCID: PMC2577349 DOI: 10.1093/nar/gkn624] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Indexed: 01/07/2023]
Abstract
Actin is a major cytoskeletal protein in eukaryotes. Recent studies suggest more diverse functional roles for this protein. Actin mRNA is known to be localized to neuronal synapses and undergoes rapid deadenylation during early developmental stages. However, its 3′-untranslated region (UTR) is not characterized and there are no experimentally determined polyadenylation (polyA) sites in actin mRNA. We have found that the cytoplasmic β-actin (Actb) gene generates two alternative transcripts terminated at tandem polyA sites. We used 3′-RACE, EST end analysis and in situ hybridization to unambiguously establish the existence of two 3′-UTRs of varying length in Actb transcript in mouse neuronal cells. Further analyses showed that these two tandem polyA sites are used in a tissue-specific manner. Although the longer 3′-UTR was expressed at a relatively lower level, it conferred higher translational efficiency to the transcript. The longer transcript harbours a conserved mmu-miR-34a/34b-5p target site. Sequence-specific anti-miRNA molecule, mutations of the miRNA target region in the 3′-UTR resulted in reduced expression. The expression was restored by a mutant miRNA complementary to the mutated target region implying that miR-34 binding to Actb 3′-UTR up-regulates target gene expression. Heterogeneity of the Actb 3′-UTR could shed light on the mechanism of miRNA-mediated regulation of messages in neuronal cells.
Affiliation(s)
- Tanay Ghosh
  - Institute of Genomics and Integrative Biology (IGIB), Mall Road, New Delhi 110007, India
|
24
|
On the nature of human housekeeping genes. Trends Genet 2008; 24:481-4. [DOI: 10.1016/j.tig.2008.08.004] [Citation(s) in RCA: 204] [Impact Index Per Article: 12.8] [Received: 05/04/2008] [Revised: 07/31/2008] [Accepted: 08/02/2008] [Indexed: 01/27/2023]
|
25
|
Freilich S, Goldovsky L, Ouzounis CA, Thornton JM. Metabolic innovations towards the human lineage. BMC Evol Biol 2008; 8:247. [PMID: 18782449 PMCID: PMC2553087 DOI: 10.1186/1471-2148-8-247] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 01/03/2008] [Accepted: 09/09/2008] [Indexed: 01/09/2023]
Abstract
BACKGROUND We describe a function-driven approach to the analysis of metabolism which takes into account the phylogenetic origin of biochemical reactions to reveal subtle lineage-specific metabolic innovations, undetectable by more traditional methods based on sequence comparison. The origins of reactions and thus entire pathways are inferred using a simple taxonomic classification scheme that describes the evolutionary course of events towards the lineage of interest. We investigate the evolutionary history of the human metabolic network extracted from a metabolic database, construct a network of interconnected pathways and classify this network according to the taxonomic categories representing eukaryotes, metazoa and vertebrates. RESULTS It is demonstrated that lineage-specific innovations correspond to reactions and pathways associated with key phenotypic changes during evolution, such as the emergence of cellular organelles in eukaryotes, cell adhesion cascades in metazoa and the biosynthesis of complex cell-specific biomolecules in vertebrates. CONCLUSION This phylogenetic view of metabolic networks puts gene innovations within an evolutionary context, demonstrating how the emergence of a phenotype in a lineage provides a platform for the development of specialized traits.
Affiliation(s)
- Shiri Freilich
  - The European Bioinformatics Institute, EMBL Cambridge Outstation, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.
|
26
|
Woody OZ, Doxey AC, McConkey BJ. Assessing the evolution of gene expression using microarray data. Evol Bioinform Online 2008; 4:139-52. [PMID: 19204814 PMCID: PMC2614203 DOI: 10.4137/ebo.s628] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/04/2022]
Abstract
Classical studies of the evolution of gene function have predominantly focused on mutations within protein coding regions. With the advent of microarrays, however, it has become possible to evaluate the transcriptional activity of a gene as an additional characteristic of function. Recent studies have revealed an equally important role for gene regulation in the retention and evolution of duplicate genes. Here we review approaches to assessing the evolution of gene expression using microarray data, and discuss potential influences on expression divergence. Currently, there are no established standards on how best to identify and quantify instances of expression divergence. There have also been few efforts to date that incorporate suspected influences into mathematical models of expression divergence. Such developments will be crucial to a comprehensive understanding of the role gene duplications and expression evolution play in the emergence of complex traits and functional diversity. An integrative approach to gene family evolution, including both orthologous and paralogous genes, has the potential to bring strong predictive power both to the functional annotation of extant proteins and to the inference of functional characteristics of ancestral gene family members.
Affiliation(s)
- Owen Z Woody
  - Department of Biology, University of Waterloo, Waterloo, Ontario, Canada
|
27
|
Zhu J, He F, Song S, Wang J, Yu J. How many human genes can be defined as housekeeping with current expression data? BMC Genomics 2008; 9:172. [PMID: 18416810 PMCID: PMC2396180 DOI: 10.1186/1471-2164-9-172] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Received: 12/20/2007] [Accepted: 04/16/2008] [Indexed: 12/16/2022]
Abstract
Background Housekeeping (HK) genes are ubiquitously expressed in all tissue/cell types and constitute a basal transcriptome for the maintenance of basic cellular functions. Partitioning transcriptomes into HK and tissue-specific (TS) genes relatively is fundamental for studying gene expression and cellular differentiation. Although many studies have aimed at large-scale and thorough categorization of human HK genes, a meaningful consensus has yet to be reached. Results We collected two latest gene expression datasets (both EST and microarray data) from public databases and analyzed the gene expression profiles in 18 human tissues that have been well-documented by both two data types. Benchmarked by a manually-curated HK gene collection (HK408), we demonstrated that present data from EST sampling was far from saturated, and the inadequacy has limited the gene detectability and our understanding of TS expressions. Due to a likely over-stringent threshold, microarray data showed higher false negative rate compared with EST data, leading to a significant underestimation of HK genes. Based on EST data, we found that 40.0% of the currently annotated human genes were universally expressed in at least 16 of 18 tissues, as compared to only 5.1% specifically expressed in a single tissue. Our current EST-based estimate on human HK genes ranged from 3,140 to 6,909 in number, a ten-fold increase in comparison with previous microarray-based estimates. Conclusion We concluded that a significant fraction of human genes, at least in the currently annotated data depositories, was broadly expressed. Our understanding of tissue-specific expression was still preliminary and required much more large-scale and high-quality transcriptomic data in future studies. The new HK gene list categorized in this study will be useful for genome-wide analyses on structural and functional features of HK genes.
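The breadth-based partition mentioned in the abstract above (expressed in at least 16 of 18 tissues versus detected in a single tissue) can be sketched as follows. Only the two cutoffs come from the abstract. How a gene counts as "detected" in a tissue, the label names, and the toy data are assumptions made for illustration.

```python
# Hypothetical sketch: label genes by expression breadth.
# Cutoffs (>= 16 tissues, exactly 1 tissue) follow the abstract; the
# detection rule that produces each gene's tissue set is assumed.

def classify_by_breadth(detected, broad_min=16):
    """Map each gene to a breadth label given the set of tissues where it is detected."""
    labels = {}
    for gene, tissues in detected.items():
        breadth = len(tissues)
        if breadth >= broad_min:
            labels[gene] = "broad"
        elif breadth == 1:
            labels[gene] = "single-tissue"
        elif breadth == 0:
            labels[gene] = "undetected"
        else:
            labels[gene] = "intermediate"
    return labels


# Toy data over 18 placeholder tissue names.
tissues = [f"tissue{i}" for i in range(18)]
detected = {
    "geneA": set(tissues[:17]),  # seen in 17 of 18 tissues
    "geneB": {"tissue3"},        # seen in one tissue only
    "geneC": set(tissues[:6]),   # seen in 6 tissues
    "geneD": set(),              # never detected
}
labels = classify_by_breadth(detected)
```

The counts per label, divided by the number of annotated genes, would give proportions of the kind the abstract reports. The result depends strongly on the detection rule, which is the sampling-depth caveat the abstract raises for EST data.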
Affiliation(s)
- Jiang Zhu
  - Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
|
28
|
Greco D, Somervuo P, Di Lieto A, Raitila T, Nitsch L, Castrén E, Auvinen P. Physiology, pathology and relatedness of human tissues from gene expression meta-analysis. PLoS One 2008; 3:e1880. [PMID: 18382664 PMCID: PMC2268968 DOI: 10.1371/journal.pone.0001880] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 11/23/2007] [Accepted: 02/25/2008] [Indexed: 11/18/2022]
Abstract
BACKGROUND Development and maintenance of the identity of tissues is of central importance for multicellular organisms. Based on gene expression profiles, it is possible to divide genes in housekeeping genes and those whose expression is preferential in one or a few tissues and which provide specialized functions that have a strong effect on the physiology of the whole organism. RESULTS We have surveyed the gene expression in 78 normal human tissues integrating publicly available microarray gene expression data. A total amount of 1601 genes were identified as selectively expressed in one or more tissues. The tissue-selective genes covered a wide range of cellular and molecular functions, and could be linked to 361 human diseases with Mendelian inheritance. Based on the gene expression profiles, we were able to form a network of tissues reflecting their functional relatedness and, to certain extent, their development. Using co-citation driven gene network technique and promoter analysis, we predicted a transcriptional module where the co-operation of the transcription factors E2F and NF-kappaB can possibly regulate a number of genes involved in the neurogenesis that takes place in the adult hippocampus. CONCLUSIONS Here we propose that integration of gene expression data from Affymetrix GeneChip experiments is possible through re-annotation and commonly used pre-processing methods. We suggest that some functional aspects of the tissues can be explained by the co-operation of multiple transcription factors that regulate the expression of selected groups of genes.
Affiliation(s)
- Dario Greco
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
|
29
|
Farré D, Bellora N, Mularoni L, Messeguer X, Albà MM. Housekeeping genes tend to show reduced upstream sequence conservation. Genome Biol 2008; 8:R140. [PMID: 17626644 PMCID: PMC2323216 DOI: 10.1186/gb-2007-8-7-r140] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Received: 10/20/2006] [Revised: 02/16/2007] [Accepted: 07/13/2007] [Indexed: 01/09/2023]
Abstract
Mammalian housekeeping genes show significantly lower promoter sequence conservation, especially upstream of position -500 with respect to the transcription start site, than genes expressed in a subset of tissues. Background Understanding the constraints that operate in mammalian gene promoter sequences is of key importance to understand the evolution of gene regulatory networks. The level of promoter conservation varies greatly across orthologous genes, denoting differences in the strength of the evolutionary constraints. Here we test the hypothesis that the number of tissues in which a gene is expressed is related in a significant manner to the extent of promoter sequence conservation. Results We show that mammalian housekeeping genes, expressed in all or nearly all tissues, show significantly lower promoter sequence conservation, especially upstream of position -500 with respect to the transcription start site, than genes expressed in a subset of tissues. In addition, we evaluate the effect of gene function, CpG island content and protein evolutionary rate on promoter sequence conservation. Finally, we identify a subset of transcription factors that bind to motifs that are specifically over-represented in housekeeping gene promoters. Conclusion This is the first report that shows that the promoters of housekeeping genes show reduced sequence conservation with respect to genes expressed in a more tissue-restricted manner. This is likely to be related to simpler gene expression, requiring a smaller number of functional cis-regulatory motifs.
Affiliation(s)
- Domènec Farré
  - Centre for Genomic Regulation, Dr Aiguader 88, Barcelona 08003, Spain
- Nicolás Bellora
  - Centre for Genomic Regulation, Dr Aiguader 88, Barcelona 08003, Spain
  - Universitat Pompeu Fabra, Dr Aiguader 88, Barcelona 08003, Spain
- Loris Mularoni
  - Fundació Institut Municipal d'Investigació Mèdica, Dr Aiguader 88, Barcelona 08003, Spain
- Xavier Messeguer
  - Universitat Politècnica de Catalunya, Jordi Girona 1-3, Barcelona 08034, Spain
- M Mar Albà
  - Universitat Pompeu Fabra, Dr Aiguader 88, Barcelona 08003, Spain
  - Fundació Institut Municipal d'Investigació Mèdica, Dr Aiguader 88, Barcelona 08003, Spain
  - Catalan Institution for Research and Advanced Studies, Pg Lluis Companys 23, Barcelona 08010, Spain
|
30
|
Lopez-Bigas N, De S, Teichmann SA. Functional protein divergence in the evolution of Homo sapiens. Genome Biol 2008; 9:R33. [PMID: 18279504 PMCID: PMC2374701 DOI: 10.1186/gb-2008-9-2-r33] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Received: 10/18/2007] [Revised: 12/24/2007] [Accepted: 02/15/2008] [Indexed: 12/23/2022]
Abstract
BACKGROUND Protein-coding regions in a genome evolve by sequence divergence and gene gain and loss, altering the gene content of the organism. However, it is not well understood how this has given rise to the enormous diversity of metazoa present today. RESULTS To obtain a global view of human genomic evolution, we quantify the divergence of proteins by functional category at different evolutionary distances from human. CONCLUSION This analysis highlights some general systems-level characteristics of human evolution: regulatory processes, such as signal transducers, transcription factors and receptors, have a high degree of plasticity, while core processes, such as metabolism, transport and protein synthesis, are largely conserved. Additionally, this study reveals a dynamic picture of selective forces at short, medium and long evolutionary timescales. Certain functional categories, such as 'development' and 'organogenesis', exhibit temporal patterns of sequence divergence in eukaryotes relative to human. This framework for a grammar of human evolution supports previously postulated theories of robustness and evolvability.
Affiliation(s)
- Nuria Lopez-Bigas
- Research Unit on Biomedical Informatics, Experimental and Health Science Department, Universitat Pompeu Fabra, Dr. Aiguader, Barcelona, 08003, Spain
- Subhajyoti De
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
- Sarah A Teichmann
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
|
31
|
Construction, visualisation, and clustering of transcription networks from microarray expression data. PLoS Comput Biol 2007; 3:2032-42. [PMID: 17967053 PMCID: PMC2041979 DOI: 10.1371/journal.pcbi.0030206] [Citation(s) in RCA: 198] [Impact Index Per Article: 11.6] [Received: 02/27/2007] [Accepted: 09/05/2007] [Indexed: 12/20/2022] Open
Abstract
Network analysis transcends conventional pairwise approaches to data analysis as the context of components in a network graph can be taken into account. Such approaches are increasingly being applied to genomics data, where functional linkages are used to connect genes or proteins. However, while microarray gene expression datasets are now abundant and of high quality, few approaches have been developed for analysis of such data in a network context. We present a novel approach for 3-D visualisation and analysis of transcriptional networks generated from microarray data. These networks consist of nodes representing transcripts connected by virtue of their expression profile similarity across multiple conditions. Analysing genome-wide gene transcription across 61 mouse tissues, we describe the unusual topography of the large and highly structured networks produced, and demonstrate how they can be used to visualise, cluster, and mine large datasets. This approach is fast, intuitive, and versatile, and allows the identification of biological relationships that may be missed by conventional analysis techniques. This work has been implemented in a freely available open-source application named BioLayout Express3D.
This paper describes a novel approach for analysis of gene expression data. In this approach, normalized gene expression data is transformed into a graph where nodes in the graph represent transcripts connected to each other by virtue of their coexpression across multiple tissues or samples. The graph paradigm has many advantages for such analyses. Graph clustering of the derived network performs extremely well in comparison to traditional pairwise schemes. We show that this approach is robust and able to accommodate large datasets such as the Genomics Institute of the Novartis Research Foundation mouse tissue atlas. The entire approach and algorithms are combined into a single open-source JAVA application that allows users to perform this analysis and further mining on their own data and to visualize the results interactively in 3-D. The approach is not limited to gene expression data but would also be useful for other complex biological datasets. We use the method to investigate the relationship between the phylogenetic age of transcripts and their tissue specificity.
|
32
|
Freilich S, Massingham T, Blanc E, Goldovsky L, Thornton JM. Relating tissue specialization to the differentiation of expression of singleton and duplicate mouse proteins. Genome Biol 2006; 7:R89. [PMID: 17029626 PMCID: PMC1794571 DOI: 10.1186/gb-2006-7-10-r89] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 05/31/2006] [Revised: 07/26/2006] [Accepted: 10/09/2006] [Indexed: 01/22/2023] Open
Abstract
An analysis of the relationship between duplication events, the time they took place and the expression breadth of the duplicated genes supports the subfunctionalization model, in which expression divergence following gene duplication promotes the retention of a gene in multicellular species.
Background: Gene duplications have been hypothesized to be a major factor in enabling the evolution of tissue differentiation. Analyses of the expression profiles of duplicate genes in mammalian tissues have indicated that, with time, the expression patterns of duplicate genes diverge and become more tissue specific. We explored the relationship between duplication events, the time at which they took place, and both the expression breadth of the duplicated genes and the cumulative expression breadth of the gene family to which they belong.
Results: We show that only duplicates that arose through post-multicellularity duplication events show a tendency to become more specifically expressed, whereas such a tendency is not observed for duplicates that arose in a unicellular ancestor. Unlike the narrow expression profile of the duplicated genes, the overall expression of gene families tends to maintain a global expression pattern.
Conclusion: The work presented here supports the view suggested by the subfunctionalization model, namely that expression divergence in different tissues, following gene duplication, promotes the retention of a gene in the genome of multicellular species. The global expression profile of the gene families suggests division of expression between family members, whose expression becomes specialized. Because specialization of expression is coupled with an increased rate of sequence divergence, it can facilitate the evolution of new, tissue-specific functions.
Affiliation(s)
- Shiri Freilich
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK.
|
33
|
Yanai I, Korbel JO, Boue S, McWeeney SK, Bork P, Lercher MJ. Similar gene expression profiles do not imply similar tissue functions. Trends Genet 2006; 22:132-8. [PMID: 16480787 DOI: 10.1016/j.tig.2006.01.006] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Received: 03/24/2005] [Revised: 12/01/2005] [Accepted: 01/13/2006] [Indexed: 10/25/2022]
Abstract
Although similarities in gene expression among tissues are commonly inferred to reflect functional constraints, this has never been formally tested. Furthermore, it is unclear which evolutionary processes are responsible for the observed similarities. When examining genome-wide expression data in mouse, we found that patterns of expression similarity between tissues extend to genes that are unlikely to function in the tissues. Thus, ectopic expression can seem coordinated across tissues. This indicates that knowledge of gene expression patterns per se is insufficient to infer gene function. Ectopic expression is possibly explained as expression leakage, caused by spreading of chromatin modifications or the transcription apparatus into neighboring genes.
Affiliation(s)
- Itai Yanai
- Department of Molecular Genetics, Weizmann Institute of Science, 76100 Rehovot, Israel
|