1
Christmas MJ, Kaplow IM, Genereux DP, Dong MX, Hughes GM, Li X, Sullivan PF, Hindle AG, Andrews G, Armstrong JC, Bianchi M, Breit AM, Diekhans M, Fanter C, Foley NM, Goodman DB, Goodman L, Keough KC, Kirilenko B, Kowalczyk A, Lawless C, Lind AL, Meadows JRS, Moreira LR, Redlich RW, Ryan L, Swofford R, Valenzuela A, Wagner F, Wallerman O, Brown AR, Damas J, Fan K, Gatesy J, Grimshaw J, Johnson J, Kozyrev SV, Lawler AJ, Marinescu VD, Morrill KM, Osmanski A, Paulat NS, Phan BN, Reilly SK, Schäffer DE, Steiner C, Supple MA, Wilder AP, Wirthlin ME, Xue JR, Birren BW, Gazal S, Hubley RM, Koepfli KP, Marques-Bonet T, Meyer WK, Nweeia M, Sabeti PC, Shapiro B, Smit AFA, Springer MS, Teeling EC, Weng Z, Hiller M, Levesque DL, Lewin HA, Murphy WJ, Navarro A, Paten B, Pollard KS, Ray DA, Ruf I, Ryder OA, Pfenning AR, Lindblad-Toh K, Karlsson EK. Evolutionary constraint and innovation across hundreds of placental mammals. Science 2023; 380:eabn3943. [PMID: 37104599 PMCID: PMC10250106 DOI: 10.1126/science.abn3943] [Citation(s) in RCA: 70] [Impact Index Per Article: 70.0] [Received: 11/23/2021] [Accepted: 12/16/2022] [Indexed: 04/29/2023]
Abstract
Zoonomia is the largest comparative genomics resource for mammals produced to date. By aligning genomes for 240 species, we identify bases that, when mutated, are likely to affect fitness and alter disease risk. At least 332 million bases (~10.7%) in the human genome are unusually conserved across species (evolutionarily constrained) relative to neutrally evolving repeats, and 4552 ultraconserved elements are nearly perfectly conserved. Of 101 million significantly constrained single bases, 80% are outside protein-coding exons and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Changes in genes and regulatory elements are associated with exceptional mammalian traits, such as hibernation, that could inform therapeutic development. Earth's vast and imperiled biodiversity offers distinctive power for identifying genetic variants that affect genome function and organismal phenotypes.
Affiliation(s)
- Matthew J. Christmas
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Irene M. Kaplow
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Michael X. Dong
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Graham M. Hughes
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
- Xue Li
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Morningside Graduate School of Biomedical Sciences, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Patrick F. Sullivan
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Allyson G. Hindle
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
- Gregory Andrews
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Joel C. Armstrong
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matteo Bianchi
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Ana M. Breit
- School of Biology and Ecology, University of Maine, Orono, ME 04469, USA
- Mark Diekhans
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Cornelia Fanter
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
- Nicole M. Foley
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
- Daniel B. Goodman
- Department of Microbiology and Immunology, University of California San Francisco, San Francisco, CA 94143, USA
- Kathleen C. Keough
- Fauna Bio, Inc., Emeryville, CA 94608, USA
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
- Bogdan Kirilenko
- Faculty of Biosciences, Goethe-University, 60438 Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Amanda Kowalczyk
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Colleen Lawless
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
- Abigail L. Lind
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
- Jennifer R. S. Meadows
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Lucas R. Moreira
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Ruby W. Redlich
- Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Louise Ryan
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
- Ross Swofford
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Alejandro Valenzuela
- Department of Experimental and Health Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Franziska Wagner
- Museum of Zoology, Senckenberg Natural History Collections Dresden, 01109 Dresden, Germany
- Ola Wallerman
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Ashley R. Brown
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Joana Damas
- The Genome Center, University of California Davis, Davis, CA 95616, USA
- Kaili Fan
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- Jenna Grimshaw
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
- Jeremy Johnson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Sergey V. Kozyrev
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Alyssa J. Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Voichita D. Marinescu
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Kathleen M. Morrill
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Morningside Graduate School of Biomedical Sciences, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Austin Osmanski
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
- Nicole S. Paulat
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
- BaDoi N. Phan
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
- Steven K. Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
- Daniel E. Schäffer
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Cynthia Steiner
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
- Megan A. Supple
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Aryn P. Wilder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
- Morgan E. Wirthlin
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Allen Institute for Brain Science, Seattle, WA 98109, USA
- James R. Xue
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Bruce W. Birren
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Steven Gazal
- Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Klaus-Peter Koepfli
- Center for Species Survival, Smithsonian’s National Zoo and Conservation Biology Institute, Washington, DC 20008, USA
- Computer Technologies Laboratory, ITMO University, St. Petersburg 197101, Russia
- Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA 22630, USA
- Tomas Marques-Bonet
- Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08036 Barcelona, Spain
- Department of Medicine and Life Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Barcelona, Spain
- Wynn K. Meyer
- Department of Biological Sciences, Lehigh University, Bethlehem, PA 18015, USA
- Martin Nweeia
- Department of Comprehensive Care, School of Dental Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Department of Vertebrate Zoology, Canadian Museum of Nature, Ottawa, Ontario K2P 2R1, Canada
- Department of Vertebrate Zoology, Smithsonian Institution, Washington, DC 20002, USA
- Narwhal Genome Initiative, Department of Restorative Dentistry and Biomaterials Sciences, Harvard School of Dental Medicine, Boston, MA 02115, USA
- Pardis C. Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Howard Hughes Medical Institute, Harvard University, Cambridge, MA 02138, USA
- Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Mark S. Springer
- Department of Evolution, Ecology and Organismal Biology, University of California Riverside, Riverside, CA 92521, USA
- Emma C. Teeling
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Michael Hiller
- Faculty of Biosciences, Goethe-University, 60438 Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Harris A. Lewin
- The Genome Center, University of California Davis, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
- John Muir Institute for the Environment, University of California Davis, Davis, CA 95616, USA
- William J. Murphy
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
- Arcadi Navarro
- Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- Department of Medicine and Life Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
- BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation, 08005 Barcelona, Spain
- CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08003 Barcelona, Spain
- Benedict Paten
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Katherine S. Pollard
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- David A. Ray
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
- Irina Ruf
- Division of Messel Research and Mammalogy, Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt am Main, Germany
- Oliver A. Ryder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
- Department of Evolution, Behavior and Ecology, School of Biological Sciences, University of California San Diego, La Jolla, CA 92039, USA
- Andreas R. Pfenning
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Kerstin Lindblad-Toh
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Elinor K. Karlsson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, USA
2
Wei K, Ma L, Zhang T. Characterization of gene promoters in pig: conservative elements, regulatory motifs and evolutionary trend. PeerJ 2019; 7:e7204. [PMID: 31275764 PMCID: PMC6598670 DOI: 10.7717/peerj.7204] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/18/2019] [Accepted: 05/29/2019] [Indexed: 02/04/2023]
Abstract
It is vital to understand the conservation and evolution of gene promoter sequences in order to understand environmental adaptation. The level of promoter conservation varies greatly between housekeeping (HK) and tissue-specific (TS) genes, denoting differences in the strength of the evolutionary constraints. Here, we analyzed promoter conservation and evolution to exploit differential regulation between HK and TS genes. The analysis of conserved elements showed that CpG islands, short tandem repeats and G-quadruplex sequences are highly enriched in HK promoters relative to TS promoters. In addition, the type and density of regulatory motifs are much higher in TS promoters than in HK promoters, indicating that TS genes show more complex regulatory patterns than HK genes. Moreover, the evolutionary dynamics of promoters showed a trend similar to that of coding sequences. HK promoters are under more stringent selective pressure in the long-term evolutionary process. HK genes tend to show increased upstream sequence conservation due to stringent selection pressures acting on the promoter regions. The specificity of TS gene expression may be due to complex regulatory motifs acting in different tissues or conditions. The results from this study can be used to deepen our understanding of adaptive evolution.
Affiliation(s)
- Kai Wei
- College of Life Science, Shihezi University, Shihezi, Xinjiang, China; Center of Life and Food Sciences Weihenstephan, Technische Universität München, Freising, Bayern, Germany
- Lei Ma
- College of Life Science, Shihezi University, Shihezi, Xinjiang, China
- Tingting Zhang
- College of Life Science, Shihezi University, Shihezi, Xinjiang, China
3
Zhu YP, Wang M, Xiang Y, Qiu L, Hu S, Zhang Z, Mattjus P, Zhu X, Zhang Y. Nach Is a Novel Subgroup at an Early Evolutionary Stage of the CNC-bZIP Subfamily Transcription Factors from the Marine Bacteria to Humans. Int J Mol Sci 2018; 19:ijms19102927. [PMID: 30261635 PMCID: PMC6213907 DOI: 10.3390/ijms19102927] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/07/2018] [Revised: 09/19/2018] [Accepted: 09/22/2018] [Indexed: 02/07/2023]
Abstract
Normal growth and development, as well as adaptive responses to various intracellular and environmental stresses, are tightly controlled by transcriptional networks. Genomic sequences that are evolutionarily conserved across species highlight the architecture of certain regulatory elements in these networks. Among the most conserved transcription factors is the basic-region leucine zipper (bZIP) family. Herein, we have performed phylogenetic analysis of these bZIP proteins and found, to our surprise, that a few homologous proteins of the family members Jun, Fos, ATF2, BATF, C/EBP and CNC (cap’n’collar) exist in either viruses or bacteria, although expansion and diversification of this bZIP superfamily have occurred in vertebrates from metazoan. Interestingly, a specific group of bZIP proteins is identified, designated Nach (Nrf and CNC homology), because of their strong conservation with all the known CNC and NF-E2 p45 subunit-related factors Nrf1 and Nrf2. Further experimental evidence has also been provided, revealing that Nach1 and Nach2 from the marine bacteria exert distinctive functions, when compared with human Nrf1 and Nrf2, in the transcriptional regulation of antioxidant response element (ARE)-battery genes. Collectively, further insights into these Nach/CNC-bZIP subfamily transcription factors provide a better understanding of the distinct biological functions of these factors expressed in distinct species from the marine bacteria to humans.
Affiliation(s)
- Yu-Ping Zhu
- The Laboratory of Cell Biochemistry and Topogenetic Regulation, College of Bioengineering and Faculty of Sciences, Chongqing University, No. 174 Shazheng Street, Shapingba District, Chongqing 400044, China.
- Meng Wang
- The Laboratory of Cell Biochemistry and Topogenetic Regulation, College of Bioengineering and Faculty of Sciences, Chongqing University, No. 174 Shazheng Street, Shapingba District, Chongqing 400044, China.
- Yuancai Xiang
- The Laboratory of Cell Biochemistry and Topogenetic Regulation, College of Bioengineering and Faculty of Sciences, Chongqing University, No. 174 Shazheng Street, Shapingba District, Chongqing 400044, China.
- Lu Qiu
- The Laboratory of Cell Biochemistry and Topogenetic Regulation, College of Bioengineering and Faculty of Sciences, Chongqing University, No. 174 Shazheng Street, Shapingba District, Chongqing 400044, China.
- Shaofan Hu
- The Laboratory of Cell Biochemistry and Topogenetic Regulation, College of Bioengineering and Faculty of Sciences, Chongqing University, No. 174 Shazheng Street, Shapingba District, Chongqing 400044, China.
- Zhengwen Zhang
- Institute of Neuroscience and Psychology, School of Life Sciences, University of Glasgow, 42 Western Common Road, Glasgow G22 5PQ, Scotland, UK.
- Peter Mattjus
- Department of Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Artillerigatan 6A, III, BioCity, FI-20520 Turku, Finland.
- Xiaomei Zhu
- Shanghai Center for Quantitative Life Science and Department of Physics, Shanghai University, 99 Shangda Road, Shanghai 200444, China.
- Yiguo Zhang
- The Laboratory of Cell Biochemistry and Topogenetic Regulation, College of Bioengineering and Faculty of Sciences, Chongqing University, No. 174 Shazheng Street, Shapingba District, Chongqing 400044, China.
4
Aslibekyan S, Almeida M, Tintle N. Pathway analysis approaches for rare and common variants: insights from Genetic Analysis Workshop 18. Genet Epidemiol 2014; 38 Suppl 1:S86-91. [PMID: 25112195 DOI: 10.1002/gepi.21831] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 12/20/2022]
Abstract
Pathway analysis, broadly defined as a group of methods incorporating a priori biological information from public databases, has emerged as a promising approach for analyzing high-dimensional genomic data. As part of Genetic Analysis Workshop 18, seven research groups applied pathway analysis techniques to whole-genome sequence data from the San Antonio Family Study. Overall, the groups found that the potential of pathway analysis to improve detection of causal variants by lowering the multiple-testing burden and incorporating biologic insight remains largely unrealized. Specifically, there is a lack of best practices at each stage of the pathway approach: annotation, analysis, interpretation, and follow-up. Annotation of genetic variants is inconsistent across databases, incomplete, and biased toward known genes. At the analysis stage insufficient statistical power remains a major challenge. Analyses combining rare and common variants may have an inflated type I error rate and may not improve detection of causal genes. Inclusion of known causal genes may not improve statistical power, although the fraction of explained phenotypic variance may be a more appropriate metric. Interpretation of findings is further complicated by evidence in support of interactions between pathways and by the lack of consensus on how to best incorporate functional information. Finally, all presented approaches warranted follow-up studies, both to reduce the likelihood of false-positive findings and to identify specific causal variants within a given pathway. Despite the initial promise of pathway analysis for modeling biological complexity of disease phenotypes, many methodological challenges currently remain to be addressed.
Affiliation(s)
- Stella Aslibekyan
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
5
Rao YS, Chai XW, Wang ZF, Nie QH, Zhang XQ. Impact of GC content on gene expression pattern in chicken. Genet Sel Evol 2013; 45:9. [PMID: 23557030 PMCID: PMC3641017 DOI: 10.1186/1297-9686-45-9] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 07/20/2012] [Accepted: 03/16/2013] [Indexed: 11/21/2022]
Abstract
Background: GC content varies greatly between different genomic regions in many eukaryotes. In order to determine whether this organization, named isochore organization, influences gene expression patterns, the relationship between GC content and gene expression has been investigated in man and mouse. However, to date, this question is still a matter for debate. Among the avian species, chicken (Gallus gallus) is the best studied representative with a complete genome sequence. The distinctive features and organization of its sequence make it a good model to explore important issues in genome structure and evolution.
Methods: Only nuclear genes with complete information on protein-coding sequence with no evidence of multiple-splicing forms were included in this study. Chicken protein coding sequences, complete mRNA sequences (or full length cDNA sequences), and 5′ untranslated region sequences (5′ UTR) were downloaded from Ensembl and chicken expression data originated from a previous work. Three indices, i.e. expression level, expression breadth and maximum expression level, were used to measure the expression pattern of a given gene. CpG islands were identified using hgTables of the UCSC Genome Browser. Correlation analysis between variables was performed by SAS Proprietary Software Release 8.1.
Results: In chicken, the GC content of 5′ UTR is significantly and positively correlated with expression level, expression breadth, and maximum expression level, whereas that of coding sequences and introns and at the third coding position are negatively correlated with expression level and expression breadth, and not correlated with maximum expression level. These significant trends are independent of recombination rate, chromosome size and gene density. Furthermore, multiple linear regression analysis indicated that GC content in genes could explain approximately 10% of the variation in gene expression.
Conclusions: GC content is significantly associated with gene expression pattern and could be one of the important regulation factors in the chicken genome.
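The correlation analysis described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they report using SAS, and the abstract does not say which correlation coefficient was computed); Pearson's r is assumed here, and the function names `gc_content` and `pearson_r`, the gene labels, the sequences and the expression values are all hypothetical, invented for illustration.

```python
from math import sqrt


def gc_content(seq: str) -> float:
    """Return the G+C fraction among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / counted


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data, invented for illustration only (not from the study).
utr5_sequences = {"geneA": "GCGCGGCC", "geneB": "ATGCATGC", "geneC": "ATATATAA"}
expression_level = {"geneA": 9.0, "geneB": 5.0, "geneC": 1.5}

genes = sorted(utr5_sequences)
gc_values = [gc_content(utr5_sequences[g]) for g in genes]
expr_values = [expression_level[g] for g in genes]
print(f"r = {pearson_r(gc_values, expr_values):.3f}")
```

The same two functions could be applied separately to 5′ UTR, coding and intron sequences to compare the sign of the correlation per region, which is the contrast the abstract reports.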
Affiliation(s)
- You Sheng Rao
- Department of Biological Technology, Jiangxi Educational Institute, Jiangxi, Nanchang 330029, China
6
Evolutionary rate of human tissue-specific genes are related with transposable element insertions. Genetica 2013; 140:513-23. [PMID: 23337972 DOI: 10.1007/s10709-013-9700-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/05/2012] [Accepted: 01/12/2013] [Indexed: 01/05/2023]
Abstract
The influence of transposable elements (TEs) on genome evolution has been widely studied. However, it remains unclear whether TE insertions also affect the evolutionary rate of human genes. In this study, we have compared the differences in TEs and evolutionary rates between human tissue-specific genes. Our results showed that various functional categories of human tissue-specific genes contained different TE numbers and divergent values of Ka/Ks, with human nucleic acid binding transcription factor activity genes having the lowest TE density and Ka/Ks value. Interestingly, we also found that human tissue-specific genes with TEs have undergone faster evolution than those without TEs. Therefore, TEs have a significant impact on the evolutionary rates of human tissue-specific genes. Furthermore, local genomic properties such as gene length, GC content and recombination rate may reflect a true transpositional bias for the particular TEs. Our results may provide important insights for further elucidating the evolution of human tissue-specific genes.
7
Coordinated Networks of microRNAs and Transcription Factors with Evolutionary Perspectives. Advances in Experimental Medicine and Biology 2013; 774:169-87. [DOI: 10.1007/978-94-007-5590-1_10] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/04/2023]
8
Iwama H, Murao K, Imachi H, Ishida T. Transcriptional double-autorepression feedforward circuits act for multicellularity and nervous system development. BMC Genomics 2011; 12:228. [PMID: 21569329 PMCID: PMC3116505 DOI: 10.1186/1471-2164-12-228] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/23/2010] [Accepted: 05/11/2011] [Indexed: 12/21/2022]
Abstract
Background: The transcriptional regulatory network is considered to be built from a set of circuit patterns called network motifs. Experimental studies have provided instances where a feedforward circuit (FFC) appears with modification of autoregulation, but little is known systematically about such autoregulation-integrated FFCs. Therefore, we aimed to examine whether the autoregulation-integrated FFC is a network motif relevant to describing the human transcriptional regulatory systems, and explored the relationship of such network motifs with biological functions.
Results: Based on human-mouse evolutionarily conserved transcription factor binding sites (TFBSs) in 76600 conserved blocks for 5169 genes, we compiled the human transcriptional connections into a matrix, and examined the number of FFC appearances in comparison with randomized networks. The results revealed that the configuration of autoregulation integrated in the FFC critically affects the abundance or avoidance of FFC appearances. In particular, an FFC comprising two repressors that are both autoregulated was revealed as a significant network motif, which we termed the double-autoregulation FFC (DAR-FFC). Interestingly, this network motif preferentially constitutes effector transcriptional circuits with functions in cell-cell signaling and multicellular organization, and is particularly related to nervous system development.
Conclusions: We have revealed that the configuration of autoregulation integrated in the FFCs is a critical factor for abundance or avoidance of the appearance of the FFCs. In particular, we have identified the DAR-FFC as a distinctive integrated network motif endowed with properties that are indispensable for forming the transcriptional regulatory circuits involved in multicellular organization and nervous system development. This is the first report showing that the DAR-FFC is a significant network motif.
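The motif-counting idea in this abstract (count feedforward circuits, then compare with randomized networks) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not describe their randomization scheme, so a common degree-preserving edge swap is assumed; repression versus activation is not modeled; and the names `count_ffc` and `rewire` and the four-node toy network are hypothetical.

```python
import random
from itertools import permutations


def count_ffc(edges):
    """Count feedforward circuits X->Y, X->Z, Y->Z over distinct X, Y, Z.

    Returns (total, both_auto), where both_auto counts circuits in which
    both regulators X and Y also carry a self-loop (autoregulation).
    """
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    total = both_auto = 0
    for x, y, z in permutations(sorted(nodes), 3):
        if (x, y) in edge_set and (x, z) in edge_set and (y, z) in edge_set:
            total += 1
            if (x, x) in edge_set and (y, y) in edge_set:
                both_auto += 1
    return total, both_auto


def rewire(edges, n_swaps, rng):
    """Assumed null model: swap targets of random edge pairs.

    In- and out-degrees are preserved; self-loops are left untouched and
    swaps that would create a duplicate edge or a new self-loop are skipped.
    """
    unique = sorted(set(edges))
    loops = [e for e in unique if e[0] == e[1]]
    rest = [e for e in unique if e[0] != e[1]]
    present = set(rest)
    for _ in range(n_swaps):
        if len(rest) < 2:
            break
        i, j = rng.sample(range(len(rest)), 2)
        (a, b), (c, d) = rest[i], rest[j]
        new1, new2 = (a, d), (c, b)
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {rest[i], rest[j]}
        present |= {new1, new2}
        rest[i], rest[j] = new1, new2
    return loops + rest


# Hypothetical toy network (regulator, target); not data from the study.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "A"), ("B", "B"), ("C", "D")]
observed_total, observed_dar = count_ffc(edges)

rng = random.Random(0)
null_totals = [count_ffc(rewire(edges, 50, rng))[0] for _ in range(100)]
print(observed_total, observed_dar, sum(null_totals) / len(null_totals))
```

Comparing the observed count with the distribution of `null_totals` gives a rough sense of whether a circuit type is over- or under-represented; a real analysis would need a much larger network and a formal significance test.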
Affiliation(s)
- Hisakazu Iwama
- Life Science Research Center, Kagawa University, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa 761-0793, Japan.
9
Iwama H, Murao K, Imachi H, Ishida T. MicroRNA Networks Alter to Conform to Transcription Factor Networks Adding Redundancy and Reducing the Repertoire of Target Genes for Coordinated Regulation. Mol Biol Evol 2010; 28:639-46. [DOI: 10.1093/molbev/msq231] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]
10
Farré D, Albà MM. Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates. Mol Biol Evol 2009; 27:325-35. [PMID: 19822635 DOI: 10.1093/molbev/msp242] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 12/29/2022]
Abstract
Gene duplication is a major mechanism for molecular evolutionary innovation. Young gene duplicates typically exhibit elevated rates of protein evolution and, according to a number of recent studies, increased expression divergence. However, the nature of these changes is still poorly understood. To gain novel insights into the functional consequences of gene duplication, we have undertaken an in-depth analysis of a large data set of gene families containing primate- and/or rodent-specific gene duplicates. We have found a clear tendency toward an increase in protein, promoter, and expression divergence with increasing number of duplication events undergone by each gene since the human-mouse split. In addition, gene duplication is significantly associated with a reduction in expression breadth and intensity. Interestingly, it is possible to identify three main groups regarding the evolution of gene expression following gene duplication. The first group, which comprises around 25% of the families, shows patterns compatible with tissue-expression partitioning. The second and largest group, comprising 33-53% of the families, shows broad expression of one of the gene copies and reduced, overlapping, expression of the other copy or copies. This can be attributed, in most cases, to loss of expression in several tissues of one or more gene copies. Finally, a substantial number of families, 19-35%, maintain a very high level of tissue-expression overlap (>0.8) after tens of millions of years of evolution. These families may have been subject to selection for increased gene dosage.
|
11
|
Park C, Makova KD. Coding region structural heterogeneity and turnover of transcription start sites contribute to divergence in expression between duplicate genes. Genome Biol 2009; 10:R10. [PMID: 19175934 PMCID: PMC2687787 DOI: 10.1186/gb-2009-10-1-r10] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 10/11/2008] [Revised: 12/24/2008] [Accepted: 01/28/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Gene expression divergence is one manifestation of functional differences between duplicate genes. Although rapid accumulation of expression divergence between duplicate gene copies has been observed, the driving mechanisms behind this phenomenon have not been explored in detail. RESULTS: We examine which factors influence expression divergence between human duplicate genes, utilizing the latest genome-wide data sets. We conclude that the turnover of transcription start sites between duplicate genes occurs rapidly after gene duplication and that gene pairs with shared transcription start sites have significantly higher expression similarity than those without shared transcription start sites. Moreover, we find that most (55%) duplicate gene pairs do not retain the same coding sequence structure between the two duplicate copies and this also contributes to divergence in their expression. Furthermore, the proportion of aligned sequences in cis-regulatory regions between the two copies is positively correlated with expression similarity. Surprisingly, we find no effect of copy-specific transposable element insertions on the divergence of duplicate gene expression. CONCLUSIONS: Our results suggest that turnover of transcription start sites, structural heterogeneity of coding sequences, and divergence of cis-regulatory regions between copies play a pivotal role in determining the expression divergence of duplicate genes.
Affiliation(s)
- Chungoo Park
- Center for Comparative Genomics and Bioinformatics, Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA.
|
12
|
Miura H, Tomaru Y, Nakanishi M, Kondo S, Hayashizaki Y, Suzuki M. Identification of DNA regions and a set of transcriptional regulatory factors involved in transcriptional regulation of several human liver-enriched transcription factor genes. Nucleic Acids Res 2008; 37:778-92. [PMID: 19074951 PMCID: PMC2647325 DOI: 10.1093/nar/gkn978] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023] Open
Abstract
Mammalian tissue- and/or time-specific transcription is primarily regulated in a combinatorial fashion through interactions between a specific set of transcriptional regulatory factors (TRFs) and their cognate cis-regulatory elements located in the regulatory regions. In exploring the DNA regions and TRFs involved in combinatorial transcriptional regulation, we noted that individual knockdown of a set of human liver-enriched TRFs such as HNF1A, HNF3A, HNF3B, HNF3G and HNF4A resulted in perturbation of the expression of several single TRF genes, such as HNF1A, HNF3G and CEBPA genes. We thus searched the potential binding sites for these five TRFs in the highly conserved genomic regions around these three TRF genes and found several putative combinatorial regulatory regions. Chromatin immunoprecipitation analysis revealed that almost all of the putative regulatory DNA regions were bound by the TRFs as well as two coactivators (CBP and p300). The strong transcription-enhancing activity of the putative combinatorial regulatory region located downstream of the CEBPA gene was confirmed. EMSA demonstrated specific bindings of these HNFs to the target DNA region. Finally, co-transfection reporter assays with various combinations of expression vectors for these HNF genes demonstrated the transcriptional activation of the CEBPA gene in a combinatorial manner by these TRFs.
Affiliation(s)
- Hisashi Miura
- RIKEN Omics Science Center, RIKEN Yokohama Institute 1-7-22 Suehiro-Cho, Tsurumi-Ku, Yokohama, Kanagawa 230-0045, Japan
|
13
|
Comparative analysis of distinct non-coding characteristics potentially contributing to the divergence of human tissue-specific genes. Genetica 2008; 136:127-34. [DOI: 10.1007/s10709-008-9323-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/04/2007] [Accepted: 08/25/2008] [Indexed: 10/21/2022]
|
14
|
Computational analysis of constraints on noncoding regions, coding regions and gene expression in relation to Plasmodium phenotypic diversity. PLoS One 2008; 3:e3122. [PMID: 18769675 PMCID: PMC2518851 DOI: 10.1371/journal.pone.0003122] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 07/28/2008] [Accepted: 08/02/2008] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND: Malaria-causing Plasmodium species exhibit marked differences including host choice and preference for invading particular cell types. The genetic bases of phenotypic differences between parasites can be understood, in part, by investigating constraints on gene expression and genic sequences, both coding and regulatory. METHODOLOGY/PRINCIPAL FINDINGS: We investigated the evolutionary constraints on sequence and expression of parasitic genes by applying comparative genomics approaches to 6 Plasmodium genomes and 2 genome-wide expression studies. We found that the coding regions of Plasmodium transcription factor and sexual development genes are relatively less constrained, as are those of genes encoding CCCH zinc fingers and invasion proteins, which all play important roles in these parasites. Transcription factors and genes with stage-restricted expression have conserved upstream regions and so do several gene classes critical to the parasite's lifestyle, namely, ion transport, invasion, chromatin assembly and CCCH zinc fingers. Additionally, a cross-species comparison of expression patterns revealed that Plasmodium-specific genes exhibit significant expression divergence. CONCLUSIONS/SIGNIFICANCE: Overall, constraints on Plasmodium's protein coding regions confirm observations from other eukaryotes in that transcription factors are under relatively lower constraint. Proteins relevant to the parasite's unique lifestyle also have lower constraint on their coding regions. Greater conservation between Plasmodium species in terms of promoter motifs suggests tight regulatory control of lifestyle genes. However, an interspecies divergence in expression patterns of these genes suggests that either expression is controlled via genomic or epigenomic features not encoded in the proximal promoter sequence, or alternatively, the combinatorial interactions between motifs confer species-specific expression patterns.
|
15
|
Weak correlation between sequence conservation in promoter regions and in protein-coding regions of human-mouse orthologous gene pairs. BMC Genomics 2008; 9:152. [PMID: 18384671 PMCID: PMC2335122 DOI: 10.1186/1471-2164-9-152] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 08/21/2007] [Accepted: 04/02/2008] [Indexed: 12/30/2022] Open
Abstract
Background: Interspecies sequence comparison is a powerful tool to extract functional or evolutionary information from the genomes of organisms. A number of studies have compared protein sequences or promoter sequences between mammals, which provided many insights into genomics. However, the correlation between protein conservation and promoter conservation remains controversial. Results: We examined promoter conservation as well as protein conservation for 6,901 human and mouse orthologous genes, and observed a very weak correlation between them. We further investigated their relationship by decomposing it based on functional categories, and identified categories with significant tendencies. Remarkably, the 'ribosome' category showed significantly low promoter conservation, despite its high protein conservation, and the 'extracellular matrix' category showed significantly high promoter conservation, in spite of its low protein conservation. Conclusion: Our results show the relation of gene function to protein conservation and promoter conservation, and revealed that there seem to be nonparallel components between protein and promoter sequence evolution.
|
16
|
Farré D, Bellora N, Mularoni L, Messeguer X, Albà MM. Housekeeping genes tend to show reduced upstream sequence conservation. Genome Biol 2008; 8:R140. [PMID: 17626644 PMCID: PMC2323216 DOI: 10.1186/gb-2007-8-7-r140] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Received: 10/20/2006] [Revised: 02/16/2007] [Accepted: 07/13/2007] [Indexed: 01/09/2023] Open
Abstract
Mammalian housekeeping genes show significantly lower promoter sequence conservation, especially upstream of position -500 with respect to the transcription start site, than genes expressed in a subset of tissues. Background: Understanding the constraints that operate in mammalian gene promoter sequences is of key importance to understand the evolution of gene regulatory networks. The level of promoter conservation varies greatly across orthologous genes, denoting differences in the strength of the evolutionary constraints. Here we test the hypothesis that the number of tissues in which a gene is expressed is related in a significant manner to the extent of promoter sequence conservation. Results: We show that mammalian housekeeping genes, expressed in all or nearly all tissues, show significantly lower promoter sequence conservation, especially upstream of position -500 with respect to the transcription start site, than genes expressed in a subset of tissues. In addition, we evaluate the effect of gene function, CpG island content and protein evolutionary rate on promoter sequence conservation. Finally, we identify a subset of transcription factors that bind to motifs that are specifically over-represented in housekeeping gene promoters. Conclusion: This is the first report that shows that the promoters of housekeeping genes show reduced sequence conservation with respect to genes expressed in a more tissue-restricted manner. This is likely to be related to simpler gene expression, requiring a smaller number of functional cis-regulatory motifs.
Affiliation(s)
- Domènec Farré
- Centre for Genomic Regulation, Dr Aiguader 88, Barcelona 08003, Spain
- Nicolás Bellora
- Centre for Genomic Regulation, Dr Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra, Dr Aiguader 88, Barcelona 08003, Spain
- Loris Mularoni
- Fundació Institut Municipal d'Investigació Mèdica, Dr Aiguader 88, Barcelona 08003, Spain
- Xavier Messeguer
- Universitat Politècnica de Catalunya, Jordi Girona 1-3, Barcelona 08034, Spain
- M Mar Albà
- Universitat Pompeu Fabra, Dr Aiguader 88, Barcelona 08003, Spain
- Fundació Institut Municipal d'Investigació Mèdica, Dr Aiguader 88, Barcelona 08003, Spain
- Catalan Institution for Research and Advanced Studies, Pg Lluis Companys 23, Barcelona 08010, Spain
|
17
|
Iwama H, Hori Y, Matsumoto K, Murao K, Ishida T. ReAlignerV: web-based genomic alignment tool with high specificity and robustness estimated by species-specific insertion sequences. BMC Bioinformatics 2008; 9:112. [PMID: 18294369 PMCID: PMC2267439 DOI: 10.1186/1471-2105-9-112] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/13/2007] [Accepted: 02/22/2008] [Indexed: 11/23/2022] Open
Abstract
Background: Detecting conserved noncoding sequences (CNSs) across species highlights the functional elements. Alignment procedures combined with computational prediction of transcription factor binding sites (TFBSs) can narrow down key regulatory elements. Repeat masking processes are often performed before alignment to mask insertion sequences such as transposable elements (TEs). However, recently such TEs have been reported to influence the gene regulatory network evolution. Therefore, an alignment approach that is robust to TE insertions is meaningful for finding novel conserved TFBSs in TEs. Results: We constructed a web server 'ReAlignerV' for complex alignment of genomic sequences. ReAlignerV returns ladder-like schematic alignments that integrate predicted TFBSs and the location of TEs. It also provides pair-wise alignments in which the predicted TFBS sites and their names are shown alongside each sequence. Furthermore, we evaluated false positive aligned sites by focusing on the species-specific TEs (SSTEs), and found that ReAlignerV has a higher specificity and robustness to insertions for sequences having more than 20% TE content, compared to LAGAN, AVID, MAVID and BLASTZ. Conclusion: ReAlignerV can be applied successfully to TE-insertion-rich sequences without prior repeat masking, and this increases the chances of finding regulatory sequences hidden in TEs, which are important sources of the regulatory network evolution. ReAlignerV can be accessed through and downloaded from .
Affiliation(s)
- Hisakazu Iwama
- Life Science Research Center, Kagawa University, Ikenobe 1750-1, Miki-cho, Kita-gun, Kagawa, 761-0793, Japan.
|
18
|
Lee AP, Yang Y, Brenner S, Venkatesh B. TFCONES: a database of vertebrate transcription factor-encoding genes and their associated conserved noncoding elements. BMC Genomics 2007; 8:441. [PMID: 18045502 PMCID: PMC2148067 DOI: 10.1186/1471-2164-8-441] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 07/06/2007] [Accepted: 11/29/2007] [Indexed: 02/04/2023] Open
Abstract
Background: Transcription factors (TFs) regulate gene transcription and play pivotal roles in various biological processes such as development, cell cycle progression, cell differentiation and tumor suppression. Identifying cis-regulatory elements associated with TF-encoding genes is a crucial step in understanding gene regulatory networks. To this end, we have used a comparative genomics approach to identify putative cis-regulatory elements associated with TF-encoding genes in vertebrates. Description: We have created a database named TFCONES (Transcription Factor Genes & Associated COnserved Noncoding ElementS) () which contains all human, mouse and fugu TF-encoding genes and conserved noncoding elements (CNEs) associated with them. The CNEs were identified by gene-by-gene alignments of orthologous TF-encoding gene loci using MLAGAN. We also predicted putative transcription factor binding sites within the CNEs. A significant proportion of human-fugu CNEs contain experimentally defined binding sites for transcriptional activators and repressors, indicating that a majority of the CNEs may function as transcriptional regulatory elements. The TF-encoding genes that are involved in nervous system development are generally enriched for human-fugu CNEs. Users can retrieve TF-encoding genes and their associated CNEs by conducting a keyword search or by selecting a family of DNA-binding proteins. Conclusion: The conserved noncoding elements identified in TFCONES represent a catalog of highly prioritized putative cis-regulatory elements of TF-encoding genes and are candidates for functional assay.
Affiliation(s)
- Alison P Lee
- Institute of Molecular and Cell Biology, 61 Biopolis Drive, Singapore 138673, Singapore.
|
19
|
Mahony S, Corcoran DL, Feingold E, Benos PV. Regulatory conservation of protein coding and microRNA genes in vertebrates: lessons from the opossum genome. Genome Biol 2007; 8:R84. [PMID: 17506886 PMCID: PMC1929153 DOI: 10.1186/gb-2007-8-5-r84] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 11/06/2006] [Revised: 01/29/2007] [Accepted: 05/16/2007] [Indexed: 02/07/2023] Open
Abstract
A study of conservation of non-coding sequences, cis-regulatory elements and biological functions of regulated genes in opossum and other vertebrates enables better estimation of promoter conservation and transcription factor binding site turnover among mammals. Background: Being the first noneutherian mammal sequenced, Monodelphis domestica (opossum) offers great potential for enhancing our understanding of the evolutionary processes that take place in mammals. This study focuses on the evolutionary relationships between conservation of noncoding sequences, cis-regulatory elements, and biologic functions of regulated genes in opossum and eight vertebrate species. Results: Analysis of 145 intergenic microRNA and all protein coding genes revealed that the upstream sequences of the former are up to twice as conserved as the latter among mammals, except in the first 500 base pairs, where the conservation is similar. Comparison of promoter conservation in 513 protein coding genes and related transcription factor binding sites (TFBSs) showed that 41% of the known human TFBSs are located in the 6.7% of promoter regions that are conserved between human and opossum. Some core biologic processes exhibited significantly fewer conserved TFBSs in human-opossum comparisons, suggesting greater functional divergence. A new measure of efficiency in multigenome phylogenetic footprinting (base regulatory potential rate [BRPR]) shows that including human-opossum conservation increases specificity in finding human TFBSs. Conclusion: Opossum facilitates better estimation of promoter conservation and TFBS turnover among mammals. The fact that substantial TFBS numbers are located in a small proportion of the human-opossum conserved sequences emphasizes the importance of marsupial genomes for phylogenetic footprinting-based motif discovery strategies. The BRPR measure is expected to help select genome combinations for optimal performance of these algorithms. Finally, although the etiology of the increased conservation upstream of microRNA genes remains unknown, it is expected to have strong implications for our understanding of regulation of their expression.
Affiliation(s)
- Shaun Mahony
- Department of Computational Biology, School of Medicine, University of Pittsburgh, Fifth Avenue, Pittsburgh, PA 15260, USA
- David L Corcoran
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, DeSoto Street, Pittsburgh, PA 15261, USA
- Eleanor Feingold
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, DeSoto Street, Pittsburgh, PA 15261, USA
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, DeSoto Street, Pittsburgh, PA 15261, USA
- Panayiotis V Benos
- Department of Computational Biology, School of Medicine, University of Pittsburgh, Fifth Avenue, Pittsburgh, PA 15260, USA
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, DeSoto Street, Pittsburgh, PA 15261, USA
- University of Pittsburgh Cancer Institute, School of Medicine, University of Pittsburgh, Centre Avenue, Pittsburgh, PA 15232, USA
|
20
|
Woolfe A, Goode DK, Cooke J, Callaway H, Smith S, Snell P, McEwen GK, Elgar G. CONDOR: a database resource of developmentally associated conserved non-coding elements. BMC Dev Biol 2007; 7:100. [PMID: 17760977 PMCID: PMC2020477 DOI: 10.1186/1471-213x-7-100] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.2] [Received: 06/16/2007] [Accepted: 08/30/2007] [Indexed: 12/04/2022]
Abstract
Background: Comparative genomics is currently one of the most popular approaches to study the regulatory architecture of vertebrate genomes. Fish-mammal genomic comparisons have proved powerful in identifying conserved non-coding elements likely to be distal cis-regulatory modules such as enhancers, silencers or insulators that control the expression of genes involved in the regulation of early development. The scientific community is showing increasing interest in characterizing the function, evolution and language of these sequences. Despite this, there remains little in the way of user-friendly access to a large dataset of such elements in conjunction with the analysis and the visualization tools needed to study them. Description: Here we present CONDOR (COnserved Non-coDing Orthologous Regions) available at: . In an interactive and intuitive way the website displays data on > 6800 non-coding elements associated with over 120 early developmental genes and conserved across vertebrates. The database regularly incorporates results of ongoing in vivo zebrafish enhancer assays of the CNEs carried out in-house, which currently number ~100. Included and highlighted within this set are elements derived from duplication events both at the origin of vertebrates and more recently in the teleost lineage, thus providing valuable data for studying the divergence of regulatory roles between paralogs. CONDOR therefore provides a number of tools and facilities to allow scientists to progress in their own studies on the function and evolution of developmental cis-regulation. Conclusion: By providing access to data with an approachable graphics interface, the CONDOR database presents a rich resource for further studies into the regulation and evolution of genes involved in early development.
Affiliation(s)
- Adam Woolfe
- School of Biological Sciences, Queen Mary, University of London, Mile End Road, London E1 4NS, UK
- Genomic Functional Analysis Section, National Human Genome Research Institute, National Institutes of Health, Rockville, MD 20870, USA
- Debbie K Goode
- School of Biological Sciences, Queen Mary, University of London, Mile End Road, London E1 4NS, UK
- Julie Cooke
- School of Biological Sciences, Queen Mary, University of London, Mile End Road, London E1 4NS, UK
- Heather Callaway
- School of Biological Sciences, Queen Mary, University of London, Mile End Road, London E1 4NS, UK
- Sarah Smith
- School of Biological Sciences, Queen Mary, University of London, Mile End Road, London E1 4NS, UK
- Phil Snell
- School of Biological Sciences, Queen Mary, University of London, Mile End Road, London E1 4NS, UK
- Gayle K McEwen
- School of Biological Sciences, Queen Mary, University of London, Mile End Road, London E1 4NS, UK
- Genomic Functional Analysis Section, National Human Genome Research Institute, National Institutes of Health, Rockville, MD 20870, USA
- Greg Elgar
- School of Biological Sciences, Queen Mary, University of London, Mile End Road, London E1 4NS, UK
|
21
|
Beaster-Jones L, Schubert M, Holland LZ. Cis-regulation of the amphioxus engrailed gene: Insights into evolution of a muscle-specific enhancer. Mech Dev 2007; 124:532-42. [PMID: 17624741 DOI: 10.1016/j.mod.2007.06.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 12/19/2006] [Revised: 06/04/2007] [Accepted: 06/05/2007] [Indexed: 11/24/2022]
Abstract
To gain insights into the relation between evolution of cis-regulatory DNA and evolution of gene function, we identified tissue-specific enhancers of the engrailed gene of the basal chordate amphioxus (Branchiostoma floridae) and compared their ability to direct expression in both amphioxus and its nearest chordate relative, the tunicate Ciona intestinalis. In amphioxus embryos, the native engrailed gene is expressed in three domains - the eight most anterior somites, a few cells in the central nervous system (CNS) and a few ectodermal cells. In contrast, in C. intestinalis, in which muscle development is highly divergent, engrailed expression is limited to the CNS. To characterize the tissue-specific enhancers of amphioxus engrailed, we first showed that 7.8kb of upstream DNA of amphioxus engrailed directs expression to all three domains in amphioxus that express the native gene. We then identified the amphioxus engrailed muscle-specific enhancer as the 1.2kb region of upstream DNA with the highest sequence identity to the mouse en-2 jaw muscle enhancer. This amphioxus enhancer directed expression to both the somites in amphioxus and to the larval muscles in C. intestinalis. These results show that even though expression of the native engrailed has apparently been lost in developing C. intestinalis muscles, they express the transcription factors necessary to activate transcription from the amphioxus engrailed enhancer, suggesting that gene networks may not be completely disrupted if an individual component is lost.
Affiliation(s)
- Laura Beaster-Jones
- Marine Biology Research Division, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA 92093-0202, USA
|
22
|
Jiang C, Han L, Su B, Li WH, Zhao Z. Features and Trend of Loss of Promoter-Associated CpG Islands in the Human and Mouse Genomes. Mol Biol Evol 2007; 24:1991-2000. [PMID: 17591602 DOI: 10.1093/molbev/msm128] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Indexed: 12/28/2022] Open
Abstract
CpG islands (CGIs) are often considered as gene markers, but the number of CGIs varies among mammalian genomes that have similar numbers of genes. In this study, we investigated the distribution of CGIs in the promoter regions of 3,197 human-mouse orthologous gene pairs and found that the mouse genome has notably fewer CGIs in the promoter regions and less pronounced CGI characteristics than does the human genome. We further inferred the ancestral state of CGIs using the dog genome as a reference and examined the nucleotide substitution pattern and the mutational direction in the conserved regions of human and mouse CGIs. The results reveal many losses of CGIs in both genomes but the loss rate in the mouse lineage is two to four times the rate in the human lineage. We found an intriguing feature of CGI loss, namely that the loss of a CGI usually starts from erosion at both edges and gradually moves towards the center. We found functional bias in the genes that have lost promoter-associated CGIs in the human or mouse lineage. Finally, our analysis indicates that the association of CGIs with housekeeping genes is not as strong as previously estimated. Our study provides a detailed view of the evolution of promoter-associated CGIs in the human and mouse genomes and our findings are helpful for understanding the evolution of mammalian genomes and the role of CGIs in gene function.
Affiliation(s)
- Cizhong Jiang
- Department of Psychiatry and Center for the Study of Biological Complexity, Virginia Commonwealth, USA
|
23
|
Abstract
With almost 20 genomes sequenced from unicellular ascomycetes (Saccharomycotina), and the prospect of many more in the pipeline, we review the patterns and processes of yeast genome evolution. A central core of about 4000 genes is shared by all the sequenced yeast genomes. Gains of genes by horizontal gene transfer seem to be very rare. Gene losses are more frequent, and losses of whole sets of genes in some pathways in some species can be understood in terms of species-specific differences in biology. The wholesale loss of redundant copies of duplicated genes after whole-genome duplication in the ancestor of one clade of yeasts is likely to have caused the emergence of many reproductively isolated lineages of yeasts at that time, but other processes are responsible for species barriers that arose more recently among close relatives of Saccharomyces cerevisiae.
Affiliation(s)
- Devin R Scannell
- Smurfit Institute of Genetics, Trinity College, Dublin 2, Ireland
|
24
|
Davuluri RV. Bioinformatics tools for modeling transcription factor target genes and epigenetic changes. Methods Mol Biol 2007; 408:129-151. [PMID: 18314581 DOI: 10.1007/978-1-59745-547-3_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The combinatorial control of gene regulatory switches involves both transcription factor (TF) complexes and associated epigenetic modifications to the chromatin template. Novel high-throughput technologies, such as chromatin immunoprecipitation on microarrays (ChIP-chip), have enabled genome-wide in vivo identification of TF target regulatory regions and related epigenetic modifications, which led to the view of highly dynamic TF-DNA interactions in activated or repressed promoters. Consequently, modeling and elucidating the combinatorial interaction of TFs and corresponding cis-regulatory modules in target promoters is of paramount interest. An estimated 5% of the genes in mammalian genomes code for TF proteins, and computational modeling of cis-regulatory logic would rapidly increase the pace of experimental confirmation of TF target promoters at the bench. The purpose of this chapter is to discuss the use of different bioinformatics tools for predicting the target genes of TFs of interest in mammalian genomes, and the application of these methods in the analysis of ChIP-chip experimental data. The author describes the most commonly used databases and prediction programs that are available on the World Wide Web and demonstrates the use of some of these programs with an example. A list of these programs is provided along with their Uniform Resource Locators (URLs), and guidelines for successful application are suggested.
Affiliation(s)
- Ramana V Davuluri
- OSU Comprehensive Cancer Center, Ohio State University, Columbus, USA
|
25
|
Jin VX, Singer GAC, Agosto-Pérez FJ, Liyanarachchi S, Davuluri RV. Genome-wide analysis of core promoter elements from conserved human and mouse orthologous pairs. BMC Bioinformatics 2006; 7:114. [PMID: 16522199 PMCID: PMC1475891 DOI: 10.1186/1471-2105-7-114] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.2] [Received: 08/25/2005] [Accepted: 03/07/2006] [Indexed: 01/20/2023] Open
Abstract
Background The canonical core promoter elements consist of the TATA box, initiator (Inr), downstream core promoter element (DPE), TFIIB recognition element (BRE) and the newly-discovered motif 10 element (MTE). The motifs for these core promoter elements are highly degenerate, which tends to lead to a high false discovery rate when attempting to detect them in promoter sequences. Results In this study, we have performed the first analysis of these core promoter elements in orthologous mouse and human promoters with experimentally-supported transcription start sites. We have identified these various elements using a combination of positional weight matrices (PWMs) and the degree of conservation of orthologous mouse and human sequences – a procedure that significantly reduces the false positive rate of motif discovery. Our analysis of 9,010 orthologous mouse-human promoter pairs revealed two combinations of three-way synergistic effects, TATA-Inr-MTE and BRE-Inr-MTE. The former has previously been putatively identified in human, but the latter represents a novel synergistic relationship. Conclusion Our results demonstrate that DNA sequence conservation can greatly improve the identification of functional core promoter elements in the human genome. The data also underscores the importance of synergistic occurrence of two or more core promoter elements. Furthermore, the sequence data and results presented here can help build better computational models for predicting the transcription start sites in the promoter regions, which remains one of the most challenging problems.
Affiliation(s)
- Victor X Jin
- Human Cancer Genetics Program, Comprehensive Cancer Center, Department of Molecular Virology, Immunology, and Medical Genetics, The Ohio State University, Columbus, OH 43210, USA
- Gregory AC Singer
- Human Cancer Genetics Program, Comprehensive Cancer Center, Department of Molecular Virology, Immunology, and Medical Genetics, The Ohio State University, Columbus, OH 43210, USA
- Francisco J Agosto-Pérez
- Human Cancer Genetics Program, Comprehensive Cancer Center, Department of Molecular Virology, Immunology, and Medical Genetics, The Ohio State University, Columbus, OH 43210, USA
- Sandya Liyanarachchi
- Human Cancer Genetics Program, Comprehensive Cancer Center, Department of Molecular Virology, Immunology, and Medical Genetics, The Ohio State University, Columbus, OH 43210, USA
- Ramana V Davuluri
- Human Cancer Genetics Program, Comprehensive Cancer Center, Department of Molecular Virology, Immunology, and Medical Genetics, The Ohio State University, Columbus, OH 43210, USA
26
Choi D, Fang Y, Mathers WD. Condition-specific coregulation with cis-regulatory motifs and modules in the mouse genome. Genomics 2006; 87:500-8. [PMID: 16431075 DOI: 10.1016/j.ygeno.2005.11.015] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 06/06/2005] [Accepted: 11/26/2005] [Indexed: 11/30/2022]
Abstract
Deciphering genetic regulatory codes remains a challenge. Here, we present an effective approach to identifying in vivo condition-specific coregulation with cis-regulatory motifs and modules in the mouse genome. A resampling-based algorithm was adopted to cluster our microarray data of a stress response, which generated 35 tight clusters with unique expression patterns containing 811 genes of 5652 genes significantly altered. Database searches identified many known motifs within the 3-kb regulatory regions of 40 genes from 3 clusters and modules with six to nine motifs that were commonly shared by 60-100% of these genes. The upstream regulatory region contained the highest frequency of these common motifs. CisModule program predictions were comparable with the results from database searches and found four potentially novel motifs. This result indicates that these motifs and modules could be responsible for gene coregulation of the stress response in the lacrimal gland.
Affiliation(s)
- Dongseok Choi
- Division of Biostatistics, Department of Public Health & Preventive Medicine, Oregon Health & Science University, 3375 SW Terwilliger Boulevard, Portland, OR 97239, USA
27
Sauer T, Shelest E, Wingender E. Evaluating phylogenetic footprinting for human-rodent comparisons. Bioinformatics 2005; 22:430-7. [PMID: 16332706 DOI: 10.1093/bioinformatics/bti819] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 01/08/2023]
Abstract
MOTIVATION: 'Phylogenetic footprinting' is a widely applied approach to identify regulatory regions and potential transcription factor binding sites (TFBSs) using alignments of non-coding orthologous regions from two or more organisms. A systematic evaluation of its validity and usability based on known TFBSs is needed to use phylogenetic footprinting most effectively in the identification of unknown TFBSs. RESULTS: In this paper we use 2678 human, mouse and rat TFBSs from the TRANSFAC database for this evaluation. To ensure the retrieval of correct orthologous sequences, we combine gene annotation and sequence homology searches. Demanding a sequence identity of at least 65% is most effective in discriminating TFBSs from non-functional sequence parts, while different alignment algorithms only have a minor influence on TFBS identification by human-rodent comparisons. With this threshold approximately 72% of the known TFBSs are found conserved, a number which varies significantly between different transcription factors and also depends on the function of the regulated gene. TFBSs for certain transcription factors do not require strict sequence conservation but instead may show a high pattern conservation, limiting somewhat the validity of purely sequence-based phylogenetic footprinting.
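A sequence-identity threshold of this kind can be sketched as a sliding-window scan over a pairwise alignment. The Python below is only an illustration under stated assumptions: the abstract gives the 65% cutoff but not the window size or how gaps are handled, so the `width` parameter, the rule that gap columns never count as matches, and the function names are hypothetical.

```python
def window_identity(aln_a, aln_b, start, width):
    """Fraction of columns in the window where both rows carry the same base.

    Gap characters ('-') never count as matches (an assumption of this sketch).
    """
    matches = 0
    for x, y in zip(aln_a[start:start + width], aln_b[start:start + width]):
        if x == y and x != "-":
            matches += 1
    return matches / width

def conserved_windows(aln_a, aln_b, width, threshold=0.65):
    """Return start columns of windows whose identity meets the threshold."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    return [
        start
        for start in range(len(aln_a) - width + 1)
        if window_identity(aln_a, aln_b, start, width) >= threshold
    ]

# Toy aligned human/rodent rows (made-up sequences).
human = "ACGTACGTAC"
rodent = "ACGTTTTTAC"
print(conserved_windows(human, rodent, width=4))
```

A known binding site would then be called conserved if it falls inside one of the reported windows. As the abstract notes, sites that preserve a binding pattern without strict sequence identity would be missed by such a purely sequence-based filter.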
Affiliation(s)
- Tilman Sauer
- Department of Bioinformatics, UKG, Georg-August-University of Goettingen Goldschmidtstrasse 1, 37077 Goettingen, Germany.
28
Choi SS, Bush EC, Lahn BT. Different classes of tissue-specific genes show different levels of noncoding conservation. Genomics 2005; 87:433-6. [PMID: 16303283 DOI: 10.1016/j.ygeno.2005.09.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 06/21/2005] [Revised: 09/17/2005] [Accepted: 09/21/2005] [Indexed: 10/25/2022]
Abstract
We divide tissue-specific genes into two major classes: regulators, defined as genes participating in tissue-specific transcriptional regulation, and effectors, defined as genes involved in rendering the physiological properties of cells. We show that regulators tend to have significantly greater noncoding conservation than effectors. We further show that within the regulator class, tissue-specific transcription factors generally have the greatest noncoding conservation, whereas signal receptors generally have the least noncoding conservation. Using noncoding conservation as a proxy for the complexity of cis-regulatory DNA, we extrapolate that different classes of tissue-specific genes tend to have different levels of cis-regulatory complexity and that greater complexity can be found in genes involved in transcriptional regulation, especially transcription factors.
Affiliation(s)
- Sun Shim Choi
- Howard Hughes Medical Institute, Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
29
Siebert S, Thomsen S, Reimer MM, Bosch TCG. Control of foot differentiation in Hydra: Phylogenetic footprinting indicates interaction of head, bud and foot patterning systems. Mech Dev 2005; 122:998-1007. [PMID: 15922570 DOI: 10.1016/j.mod.2005.04.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 12/09/2004] [Revised: 04/27/2005] [Accepted: 04/27/2005] [Indexed: 10/25/2022]
Abstract
Homeodomain transcription factor CnNK-2 seems to play a major role in foot formation in Hydra. Recently, we reported in vitro evidence indicating that CnNK-2 has autoregulatory features and regulates expression of the morphogenetic peptide pedibin. We proposed that CnNK-2 and pedibin synergistically orchestrate foot differentiation processes. Here, we further analyzed the regulatory network controlling foot formation in Hydra. By phylogenetic footprinting we compared the CnNK-2 5'-flanking sequence from two closely related species, Hydra vulgaris and Hydra oligactis. Unexpectedly, we detected a highly conserved binding site for HNF-3beta, a vertebrate Forkhead transcription factor, in the CnNK-2 5'-flanking region. The Hydra HNF-3beta homolog budhead is predominantly expressed in the apical region of the body column and early during budding. Budhead is absent from tissue expressing CnNK-2 and thought to be involved in determining tissue for head differentiation. By electrophoretic mobility shift assays we demonstrate an in vitro interaction between recombinant budhead protein and the interspecific conserved HNF-3beta binding motif in the CnNK-2 5'-flanking region. Our results strengthen the view of CnNK-2 as an important regulator during foot patterning processes. Furthermore, they point to budhead as a candidate for a transcriptional regulator of CnNK-2 and to an interaction of foot and head patterning processes in Hydra at the molecular level.
Affiliation(s)
- S Siebert
- Zoological Institute, Christian-Albrechts University of Kiel, Am Botanischen Garten 1-9, 24118 Kiel, Germany
30
Sironi M, Menozzi G, Comi GP, Cagliani R, Bresolin N, Pozzoli U. Analysis of intronic conserved elements indicates that functional complexity might represent a major source of negative selection on non-coding sequences. Hum Mol Genet 2005; 14:2533-46. [PMID: 16037065 DOI: 10.1093/hmg/ddi257] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.9] [Indexed: 02/01/2023]
Abstract
The non-coding portion of the human genome is punctuated by a large number of multispecies conserved sequence (MCS) elements with largely unknown function. We demonstrate that MCSs are unevenly distributed in human introns, with the majority of relatively short introns (< 9 kb long) displaying no or few MCSs and with MCS density reaching up to 10% of total size in longer introns. After correction for intron length, MCSs were found to be enriched within genes involved in development and transcription, and depleted in immune response loci. Moreover, many central nervous system tissues show a preferential expression of MCS-rich genes, and MCS enrichment significantly correlates with gene functional complexity in terms of distinct protein domains. Analysis of human-mouse orthologous pairs indicated a significant association between intronic MCS density and conservation of protein sequence, promoter regions and untranslated sequences. Moreover, MCS density correlates with the predicted occurrence of human-mouse conserved alternative splicing events. These observations suggest that evolution acts on human genes as integrated units of coding and regulatory capacity and that functional complexity might represent a major source of negative selection on non-coding sequences. To substantiate our result, we also searched previously experimentally identified intronic regulatory elements and found that about half of these sequences map to an MCS. In particular, the notion that mutations in MCSs can result in human genetic diseases is supported, because three previously identified intronic pathological variations were found to occur within MCSs, and human disease and cancer genes were found to be significantly enriched in MCSs.
Affiliation(s)
- Manuela Sironi
- Scientific Institute IRCCS E. Medea, 23842 Bosisio Parini (LC), Italy
31
Suzuki Y, Ikeo K, Gojobori T. [Genome network]. Nihon Yakurigaku Zasshi 2005; 126:55-9. [PMID: 16141619 DOI: 10.1254/fpj.126.55] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
32
Yang J, Su AI, Li WH. Gene Expression Evolves Faster in Narrowly Than in Broadly Expressed Mammalian Genes. Mol Biol Evol 2005; 22:2113-8. [PMID: 15987875 DOI: 10.1093/molbev/msi206] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.2] [Indexed: 12/25/2022]
Abstract
Despite much recent interest, it remains unclear what determines the rate of evolution of gene expression. To study this issue we develop a new measure, called "Expression Conservation Index" (ECI), to quantify the degree of tissue-expression conservation between two homologous genes. Applying this measure to a large set of gene expression data from human and mouse, we show that tissue expression tends to evolve rapidly for genes that are expressed in only a limited number of tissues, whereas tissue expression can be conserved for a long time for genes expressed in a large number of tissues. Therefore, expression breadth is an important determinant for evolutionary conservation of tissue expression. In addition, we find a rapid decrease in ECI with the synonymous divergence between duplicate genes, suggesting fast divergence in tissue expression between duplicate genes.
Affiliation(s)
- Jing Yang
- Department of Ecology and Evolution, University of Chicago, USA