601
Ferretti V, Poitras C, Bergeron D, Coulombe B, Robert F, Blanchette M. PReMod: a database of genome-wide mammalian cis-regulatory module predictions. Nucleic Acids Res 2006; 35:D122-6. [PMID: 17148480 PMCID: PMC1761432 DOI: 10.1093/nar/gkl879] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.8] [Indexed: 11/21/2022]
Abstract
We describe PReMod, a new database of genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes. The prediction algorithm, described previously in Blanchette et al. (2006) Genome Res., 16, 656–668, exploits the fact that many known CRMs are made of clusters of phylogenetically conserved and repeated transcription factor (TF) binding sites. Contrary to other existing databases, PReMod is not restricted to modules located proximal to genes, but in fact mostly contains distal predicted CRMs (pCRMs). Through its web interface, PReMod allows users to (i) identify pCRMs around a gene of interest; (ii) identify pCRMs that have binding sites for a given TF (or a set of TFs) or (iii) download the entire dataset for local analyses. Queries can also be refined by filtering for specific chromosomal regions, for specific regions relative to genes or for the presence of CpG islands. The output includes information about the binding sites predicted within the selected pCRMs, and a graphical display of their distribution within the pCRMs. It also provides a visual depiction of the chromosomal context of the selected pCRMs in terms of neighboring pCRMs and genes, all of which are linked to the UCSC Genome Browser and the NCBI. PReMod: .
Affiliation(s)
- Christian Poitras
- Institut de Recherches Cliniques de Montréal, 110 Pine Avenue West, Montréal, Qc, Canada H2W 1R7
- Dominique Bergeron
- Institut de Recherches Cliniques de Montréal, 110 Pine Avenue West, Montréal, Qc, Canada H2W 1R7
- Benoit Coulombe
- Institut de Recherches Cliniques de Montréal, 110 Pine Avenue West, Montréal, Qc, Canada H2W 1R7
- François Robert
- Institut de Recherches Cliniques de Montréal, 110 Pine Avenue West, Montréal, Qc, Canada H2W 1R7
- Mathieu Blanchette
- McGill Center for Bioinformatics, McGill University, 3775 University Street, room #332, Montréal, Qc, Canada H3A 2B4
- To whom correspondence should be addressed. Tel: 514 398 5209; Fax: 514 398 3387;
602
Wang H, Zhang Y, Cheng Y, Zhou Y, King DC, Taylor J, Chiaromonte F, Kasturi J, Petrykowska H, Gibb B, Dorman C, Miller W, Dore LC, Welch J, Weiss MJ, Hardison RC. Experimental validation of predicted mammalian erythroid cis-regulatory modules. Genome Res 2006; 16:1480-92. [PMID: 17038566 PMCID: PMC1665632 DOI: 10.1101/gr.5353806] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Received: 04/06/2006] [Accepted: 06/07/2006] [Indexed: 11/25/2022]
Abstract
Multiple alignments of genome sequences are helpful guides to functional analysis, but predicting cis-regulatory modules (CRMs) accurately from such alignments remains an elusive goal. We predict CRMs for mammalian genes expressed in red blood cells by combining two properties gleaned from aligned, noncoding genome sequences: a positive regulatory potential (RP) score, which detects similarity to patterns in alignments distinctive for regulatory regions, and conservation of a binding site motif for the essential erythroid transcription factor GATA-1. Within eight target loci, we tested 75 noncoding segments by reporter gene assays in transiently transfected human K562 cells and/or after site-directed integration into murine erythroleukemia cells. Segments with a high RP score and a conserved exact match to the binding site consensus are validated at a good rate (50%-100%, with rates increasing at higher RP), whereas segments with lower RP scores or nonconsensus binding motifs tend to be inactive. Active DNA segments were shown to be occupied by GATA-1 protein by chromatin immunoprecipitation, whereas sites predicted to be inactive were not occupied. We verify four previously known erythroid CRMs and identify 28 novel ones. Thus, high RP in combination with another feature of a CRM, such as a conserved transcription factor binding site, is a good predictor of functional CRMs. Genome-wide predictions based on RP and a large set of well-defined transcription factor binding sites are available through servers at http://www.bx.psu.edu/.
Affiliation(s)
- Hao Wang
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
- Ying Zhang
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Intercollege Graduate Degree Program in Genetics
- Yong Cheng
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
- Yuepin Zhou
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
- David C. King
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Intercollege Graduate Degree Program in Integrative Biosciences
- James Taylor
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Computer Science and Engineering
- Francesca Chiaromonte
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Statistics
- Jyotsna Kasturi
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Computer Science and Engineering
- Hanna Petrykowska
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
- Brian Gibb
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
- Christine Dorman
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
- Webb Miller
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Computer Science and Engineering
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Louis C. Dore
- Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- John Welch
- Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Mitchell J. Weiss
- Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Ross C. Hardison
- Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences
- Department of Biochemistry and Molecular Biology
603
Loots G, Ovcharenko I. ECRbase: database of evolutionary conserved regions, promoters, and transcription factor binding sites in vertebrate genomes. Bioinformatics 2006; 23:122-4. [PMID: 17090579 DOI: 10.1093/bioinformatics/btl546] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Indexed: 11/14/2022]
Abstract
Evolutionary conservation of DNA sequences provides a tool for the identification of functional elements in genomes. We have created a database of evolutionary conserved regions (ECRs) in vertebrate genomes, entitled ECRbase, which is constructed from a collection of whole-genome alignments produced by the ECR Browser. ECRbase features a database of syntenic blocks that recapitulate the evolution of rearrangements in vertebrates and a comprehensive collection of promoters in all vertebrate genomes generated using multiple sources of gene annotation. The database also contains a collection of annotated transcription factor binding sites (TFBSs) in evolutionary conserved and promoter elements. ECRbase currently includes human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish and fugu genomes. It is freely accessible at http://ecrbase.dcode.org.
604
Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S, Dubchak I, Holt A, Lewis KD, Plajzer-Frick I, Akiyama J, De Val S, Afzal V, Black BL, Couronne O, Eisen MB, Visel A, Rubin EM. In vivo enhancer analysis of human conserved non-coding sequences. Nature 2006; 444:499-502. [PMID: 17086198 DOI: 10.1038/nature05295] [Citation(s) in RCA: 867] [Impact Index Per Article: 48.2] [Received: 06/14/2006] [Accepted: 09/22/2006] [Indexed: 12/16/2022]
Abstract
Identifying the sequences that direct the spatial and temporal expression of genes and defining their function in vivo remains a significant challenge in the annotation of vertebrate genomes. One major obstacle is the lack of experimentally validated training sets. In this study, we made use of extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements, and characterized the in vivo enhancer activity of a large group of non-coding elements in the human genome that are conserved in human-pufferfish, Takifugu (Fugu) rubripes, or ultraconserved in human-mouse-rat. We tested 167 of these extremely conserved sequences in a transgenic mouse enhancer assay. Here we report that 45% of these sequences functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While directing expression in a broad range of anatomical structures in the embryo, the majority of the 75 enhancers directed expression to various regions of the developing nervous system. We identified sequence signatures enriched in a subset of these elements that targeted forebrain expression, and used these features to rank all approximately 3,100 non-coding elements in the human genome that are conserved between human and Fugu. The testing of the top predictions in transgenic mice resulted in a threefold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of human gene enhancers that have been characterized in vivo, and illustrate the utility of such training sets for a variety of biological applications, including decoding the regulatory vocabulary of the human genome.
Affiliation(s)
- Len A Pennacchio
- US Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA.
605
Mazzarelli JM, Brestelli J, Gorski RK, Liu J, Manduchi E, Pinney DF, Schug J, White P, Kaestner KH, Stoeckert CJ. EPConDB: a web resource for gene expression related to pancreatic development, beta-cell function and diabetes. Nucleic Acids Res 2006; 35:D751-5. [PMID: 17071715 PMCID: PMC1781120 DOI: 10.1093/nar/gkl748] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
EPConDB () is a public web site that supports research in diabetes, pancreatic development and beta-cell function by providing information about genes expressed in cells of the pancreas. EPConDB displays expression profiles for individual genes and information about transcripts, promoter elements and transcription factor binding sites. Gene expression results are obtained from studies examining tissue expression, pancreatic development and growth, differentiation of insulin-producing cells, islet or beta-cell injury, and genetic models of impaired beta-cell function. The expression datasets are derived using different microarray platforms, including the BCBC PancChips and Affymetrix gene expression arrays. Other datasets include semi-quantitative RT–PCR and MPSS expression studies. For selected microarray studies, lists of differentially expressed genes, derived from PaGE analysis, are displayed on the site. EPConDB provides database queries and tools to examine the relationship between a gene, its transcriptional regulation, protein function and expression in pancreatic tissues.
Affiliation(s)
- Joan M. Mazzarelli
- To whom correspondence should be addressed. Tel: +1 610 521 1738; Fax: +1 215 573 3111;
606
Walhout AJM. Unraveling transcription regulatory networks by protein-DNA and protein-protein interaction mapping. Genome Res 2006; 16:1445-54. [PMID: 17053092 DOI: 10.1101/gr.5321506] [Citation(s) in RCA: 113] [Impact Index Per Article: 6.3] [Indexed: 11/25/2022]
Abstract
Metazoan genomes contain thousands of protein-coding and noncoding RNA genes, most of which are differentially expressed, i.e., at different locations or at different times during development, function, or pathology of the organism. Differential gene expression is achieved in part by the action of regulatory transcription factors (TFs) that bind to cis-regulatory elements that are often located in or near their target genes. Each TF likely regulates many targets in the context of intricate transcription regulatory networks. Up to 10% of a genome may encode TFs, but only a handful of these have been studied in detail. Here, I will discuss the different steps involved in the mapping and analysis of transcription regulatory networks, including the identification of network nodes (TFs and their target sequences) and edges (TF-TF dimers and TF-DNA target interactions), integration with other data types, and network properties and emerging principles that provide insights into differential gene expression.
Affiliation(s)
- Albertha J M Walhout
- Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA.
607
Bird CP, Stranger BE, Dermitzakis ET. Functional variation and evolution of non-coding DNA. Curr Opin Genet Dev 2006; 16:559-64. [PMID: 17055246 DOI: 10.1016/j.gde.2006.10.003] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Received: 07/21/2006] [Accepted: 10/06/2006] [Indexed: 10/24/2022]
Abstract
The focus of large genomic studies has shifted from only looking at genes and protein-coding sequences to exploring the full set of elements in each genome. The explosion of comparative sequencing data has led to an increase in methodologies, approaches and ideas on how to analyze the unknown fraction of the genome, namely the non-protein-coding fraction. The main issues relate to the discovery, evolutionary analysis and natural variation of non-coding DNA, and the parameters that prevent us from fully understanding the properties of non-coding DNA.
Affiliation(s)
- Christine P Bird
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
608
Pollard KS, Salama SR, King B, Kern AD, Dreszer T, Katzman S, Siepel A, Pedersen JS, Bejerano G, Baertsch R, Rosenbloom KR, Kent J, Haussler D. Forces shaping the fastest evolving regions in the human genome. PLoS Genet 2006; 2:e168. [PMID: 17040131 PMCID: PMC1599772 DOI: 10.1371/journal.pgen.0020168] [Citation(s) in RCA: 317] [Impact Index Per Article: 17.6] [Received: 12/27/2005] [Accepted: 08/23/2006] [Indexed: 01/19/2023]
Abstract
Comparative genomics allows us to search the human genome for segments that were extensively changed in the last approximately 5 million years since divergence from our common ancestor with chimpanzee, but are highly conserved in other species and thus are likely to be functional. We found 202 genomic elements that are highly conserved in vertebrates but show evidence of significantly accelerated substitution rates in human. These are mostly in non-coding DNA, often near genes associated with transcription and DNA binding. Resequencing confirmed that the five most accelerated elements are dramatically changed in human but not in other primates, with seven times more substitutions in human than in chimp. The accelerated elements, and in particular the top five, show a strong bias for adenine and thymine to guanine and cytosine nucleotide changes and are disproportionately located in high recombination and high guanine and cytosine content environments near telomeres, suggesting either biased gene conversion or isochore selection. In addition, there is some evidence of directional selection in the regions containing the two most accelerated regions. A combination of evolutionary forces has contributed to accelerated evolution of the fastest evolving elements in the human genome.
Affiliation(s)
- Katherine S Pollard
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America.
609
Bailey PJ, Klos JM, Andersson E, Karlén M, Källström M, Ponjavic J, Muhr J, Lenhard B, Sandelin A, Ericson J. A global genomic transcriptional code associated with CNS-expressed genes. Exp Cell Res 2006; 312:3108-19. [PMID: 16919269 DOI: 10.1016/j.yexcr.2006.06.017] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Received: 04/22/2006] [Revised: 06/05/2006] [Accepted: 06/12/2006] [Indexed: 01/28/2023]
Abstract
Highly conserved non-coding DNA regions (HCNR) occur frequently in vertebrate genomes, but their functional roles remain unclear. Here, we provide evidence that a large portion of HCNRs are enriched for binding sites for Sox, POU and Homeodomain transcription factors, and such HCNRs can act as cis-regulatory regions active in neural stem cells. Strikingly, these HCNRs are linked to several hundreds of genes expressed in the developing CNS and they may exert locus-wide regulatory effects on multiple genes flanking their genomic location. Moreover, these data imply a unifying transcriptional logic for a large set of CNS-expressed genes in which Sox and POU proteins act as generic promoters of transcription while Homeodomain proteins control the spatial expression of genes through active repression.
Affiliation(s)
- Peter J Bailey
- Department of Cell and Molecular Biology, Medical Nobel Institute, Karolinska Institute, S-171 77 Stockholm, Sweden
610
Derti A, Roth FP, Church GM, Wu CT. Mammalian ultraconserved elements are strongly depleted among segmental duplications and copy number variants. Nat Genet 2006; 38:1216-20. [PMID: 16998490 DOI: 10.1038/ng1888] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.3] [Received: 03/30/2006] [Accepted: 08/23/2006] [Indexed: 02/05/2023]
Abstract
An earlier search in the human, mouse and rat genomes for sequences that are 100% conserved in orthologous segments and ≥200 bp in length identified 481 distinct sequences. These human-mouse-rat sequences, which represent ultraconserved elements (UCEs), are believed to be important for functions involving DNA binding, RNA processing and the regulation of transcription and development. In vivo and additional computational studies of UCEs and other highly conserved sequences are consistent with these functional associations, with some observations indicating enhancer-like activity for these elements. Here, we show that UCEs are significantly depleted among segmental duplications and copy number variants. Notably, of the UCEs that are found in segmental duplications or copy number variants, the majority overlap exons, indicating, along with other findings presented, that UCEs overlapping exons represent a distinct subset.
Affiliation(s)
- Adnan Derti
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, USA
611
Xu X, Scott MM, Deneris ES. Shared long-range regulatory elements coordinate expression of a gene cluster encoding nicotinic receptor heteromeric subtypes. Mol Cell Biol 2006; 26:5636-49. [PMID: 16847319 PMCID: PMC1592759 DOI: 10.1128/mcb.00456-06] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 11/20/2022]
Abstract
The nicotinic acetylcholine receptor (nAChR) beta4/alpha3/alpha5 gene cluster encodes several heteromeric transmitter receptor subtypes that are essential for cholinergic synaptic transmission in adrenal gland, autonomic ganglia, pineal gland, and several nuclei in the central nervous system. However, the transcriptional mechanisms coordinating expression of these subunit genes in different cell populations are unknown. Here, we used transgenic methods to investigate long-range transcriptional control of the cluster. A 132-kb P1-derived artificial chromosome (PAC) encoding the rat cluster recapitulated the neurally- and endocrine-restricted expression patterns of the endogenous beta4/alpha3/alpha5 genes. Mutation of ETS factor binding sites in an enhancer, beta43', embedded in the beta4 3'-untranslated exon resulted in greatly diminished beta4, alpha3, and alpha5 expression in adrenal gland and to a lesser extent in the superior cervical ganglion (SCG) but not in other tissues. Phylogenetic sequence analyses revealed several conserved noncoding regions (CNRs) upstream of beta4 and alpha5. Deletion of one of them (CNR4) located 20 kb upstream of beta4 resulted in a dramatic decrease in beta4 and alpha3 expression in the pineal gland and SCG. CNR4 was sufficient to direct LacZ transgene expression to SCG neurons, which express the endogenous beta4alpha3alpha5 subunits, and pineal cells, which express the endogenous beta4alpha3 combination. Finally, CNR4 was able to direct transgene expression to major sites of expression of the endogenous cluster in the brain. Together, our findings support a model in which cell type-specific shared long-range regulatory elements are required for coordinate expression of clustered nAChR genes.
Affiliation(s)
- Xiaohong Xu
- Case School of Medicine, Department of Neuroscience, 2109 Adelbert Rd., Cleveland, OH 44106-4975, USA
612
Gardiner EJ, Hirons L, Hunter CA, Willett P. Genomic data analysis using DNA structure: an analysis of conserved nongenic sequences and ultraconserved elements. J Chem Inf Model 2006; 46:753-61. [PMID: 16563006 DOI: 10.1021/ci050384i] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 01/21/2023]
Abstract
Recent comparative studies of the human and mouse genomes have revealed sets of conserved nongenic sequences (CNGs) and sets of ultraconserved elements (UCEs). Both sets of sequences, which exhibit extremely high levels of conservation, extend over hundreds of bases and have no known function. Since there is no detectable sequence homology between paralogous CNGs or UCEs in either of the species, an alignment-free technique is needed for their analysis. We have previously compiled a database of the structural properties of all 32,896 unique DNA octamers, including information on stability, the minimum energy conformation, and flexibility. We have used Fourier techniques to analyze the UCEs and CNGs in terms of their octamer structural properties, to reveal structural correlations which may indicate possible functions for some of these sequences.
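The figure of 32,896 unique octamers quoted in this abstract can be checked with a short sketch. It assumes, which the abstract does not state, that "unique" means an octamer and its reverse complement are counted once because they describe the same double-stranded molecule. The function names are illustrative and not taken from the paper.

```python
from itertools import product

# Assumption: two k-mers are "the same" if one is the reverse complement
# of the other, since both strands describe one double-stranded DNA duplex.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def count_unique_kmers(k: int) -> int:
    """Count k-mers, treating each k-mer and its reverse complement as one."""
    canonical = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        # Keep the lexicographically smaller of the two strands as the
        # canonical representative of the pair.
        canonical.add(min(kmer, reverse_complement(kmer)))
    return len(canonical)


if __name__ == "__main__":
    print(count_unique_kmers(8))  # 32896
```

Under this assumption the count agrees with the closed form (4^8 + 4^4) / 2 = 32,896, where the 4^4 = 256 term accounts for octamers that are their own reverse complement.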
Affiliation(s)
- Eleanor J Gardiner
- Centre for Chemical Biology, Krebs Institute for Biomolecular Science, Department of Chemistry, University of Sheffield, Sheffield S3 7HF, United Kingdom.
613
El-Mogharbel N, Wakefield M, Deakin JE, Tsend-Ayush E, Grützner F, Alsop A, Ezaz T, Marshall Graves JA. DMRT gene cluster analysis in the platypus: new insights into genomic organization and regulatory regions. Genomics 2006; 89:10-21. [PMID: 16962738 DOI: 10.1016/j.ygeno.2006.07.017] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Received: 05/25/2006] [Revised: 07/31/2006] [Accepted: 07/31/2006] [Indexed: 10/24/2022]
Abstract
We isolated and characterized a cluster of platypus DMRT genes and compared their arrangement, location, and sequence across vertebrates. The DMRT gene cluster on human 9p24.3 harbors, in order, DMRT1, DMRT3, and DMRT2, which share a DM domain. DMRT1 is highly conserved and involved in sexual development in vertebrates, and deletions in this region cause sex reversal in humans. Sequence comparisons of DMRT genes between species have been valuable in identifying exons, control regions, and conserved nongenic regions (CNGs). The addition of platypus sequences is expected to be particularly valuable, since monotremes fill a gap in the vertebrate genome coverage. We therefore isolated and fully sequenced platypus BAC clones containing DMRT3 and DMRT2 as well as DMRT1 and then generated multispecies alignments and ran prediction programs followed by experimental verification to annotate this gene cluster. We found that the three genes have 58-66% identity to their human orthologues, lie in the same order as in other vertebrates, and colocate on 1 of the 10 platypus sex chromosomes, X5. We also predict that optimal annotation of the newly sequenced platypus genome will be challenging. The analysis of platypus sequence revealed differences in structure and sequence of the DMRT gene cluster. Multispecies comparison was particularly effective for detecting CNGs, revealing several novel potential regulatory regions within DMRT3 and DMRT2 as well as DMRT1. RT-PCR indicated that platypus DMRT1 and DMRT3 are expressed specifically in the adult testis (and not ovary), but DMRT2 has a wider expression profile, as it does for other mammals. The platypus DMRT1 expression pattern, and its location on an X chromosome, suggests an involvement in monotreme sexual development.
Affiliation(s)
- Nisrine El-Mogharbel
- Comparative Genomics Group, Research School of Biological Sciences, Australian National University, P.O. Box 475, Canberra, ACT 2601, Australia.
614
Franch R, Louro B, Tsalavouta M, Chatziplis D, Tsigenopoulos CS, Sarropoulou E, Antonello J, Magoulas A, Mylonas CC, Babbucci M, Patarnello T, Power DM, Kotoulas G, Bargelloni L. A genetic linkage map of the hermaphrodite teleost fish Sparus aurata L. Genetics 2006; 174:851-61. [PMID: 16951080 PMCID: PMC1602104 DOI: 10.1534/genetics.106.059014] [Citation(s) in RCA: 102] [Impact Index Per Article: 5.7] [Indexed: 11/18/2022]
Abstract
The gilthead sea bream (Sparus aurata L.) is a marine fish of great importance for fisheries and aquaculture. It also has a peculiar sex-determination system, being a protandrous hermaphrodite. Here we report the construction of a first-generation genetic linkage map for S. aurata, based on 204 microsatellite markers. Twenty-six linkage groups (LG) were found. The total map length was 1241.9 cM. The ratio between sex-specific map lengths was 1:1.2 (male:female). Comparison with a preliminary radiation hybrid (RH) map reveals a good concordance, as all markers located in a single LG are located in a single RH group, except for Ad-25 and CId-31. Comparison with the Tetraodon nigroviridis genome revealed a considerable number of evolutionary conserved regions (ECRs) between the two species. The mean size of ECRs was 182 bp (sequence identity 60-90%). Forty-one ECRs have a known chromosomal location in the pufferfish genome. Despite the limited number of anchoring points, significant syntenic relationships were found. The linkage map presented here provides a robust comparative framework for QTL analysis in S. aurata and is a step toward the identification of genetic loci involved both in the determination of economically important traits and in the individual timing of sex reversal.
Affiliation(s)
- Rafaella Franch
- Department of Public Health, University of Padova, 35121 Padova, Italy
615
Inoue F, Nagayoshi S, Ota S, Islam ME, Tonou-Fujimori N, Odaira Y, Kawakami K, Yamasu K. Genomic organization, alternative splicing, and multiple regulatory regions of the zebrafish fgf8 gene. Dev Growth Differ 2006; 48:447-62. [PMID: 16961592 DOI: 10.1111/j.1440-169x.2006.00882.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Indexed: 11/27/2022]
Abstract
Fgf8 is among the members of the fibroblast growth factor (FGF) family that play pivotal roles in vertebrate development. In the present study, the genomic DNA of the zebrafish fgf8 gene was cloned to elucidate the regulatory mechanism behind the temporally and spatially restricted expression of the gene in vertebrate embryos. Structural analysis revealed that the exon-intron organization of fgf8 is highly conserved during vertebrate evolution, from teleosts to mammals. Close inspection of the genomic sequence and reverse transcription-polymerase chain reaction analysis revealed that zebrafish fgf8 encodes two splicing variants, corresponding to Fgf8a and Fgf8b, among the four to seven splicing variants known in mammals. Misexpression of the two variants in zebrafish embryos following mRNA injection showed that both variants have dorsalizing activities on zebrafish embryos, with Fgf8b being more potent. Reporter gene analysis of the transcriptional regulation of zebrafish fgf8 suggested that its complicated expression pattern, which is considered essential for its multiple roles in development, is mediated by combinations of different regulatory regions in the upstream and downstream regions of the gene. Furthermore, comparison of the genomic sequence of fgf8 among different vertebrate species suggests that this regulatory mechanism is conserved during vertebrate evolution.
Affiliation(s)
- Fumitaka Inoue
- Department of Life Science, Graduate School of Science and Engineering, Saitama University, Saitama City, Saitama 338-8570, Japan
616
Sun H, Skogerbø G, Chen R. Conserved distances between vertebrate highly conserved elements. Hum Mol Genet 2006; 15:2911-22. [PMID: 16923797 DOI: 10.1093/hmg/ddl232] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 01/30/2023]
Abstract
Large numbers of sequence elements with very high (>95%) sequence conservation between the human and other vertebrate genomes have been reported and ascribed putative cis-regulatory functions. We have investigated the structural relationships between such elements in mammalian genomes and find that not only their sequences, but also the distances between them are significantly (P < 2.2 × 10^-16) more conserved than corresponding distances between orthologous protein-coding genes or between exons within these genes. Regions of largely conserved distance between consecutive highly conserved elements (HCEs) generally overlap previously identified HCE clusters, but may be far longer (up to 20 Mb) and possibly cover close to 25% of the human genome sequence. Similar conservation of distance is found between bird (chicken) and mammalian genomes and is also discernible in comparisons between fish and mammals. The data suggest either that a substantial number of essential (functionally active) elements with lower sequence conservation occupy the space between the HCEs or that distance itself is an important factor in transcriptional regulation or chromatin modelling.
Affiliation(s)
- Hong Sun
- Bioinformatics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing, P.R. China
|
617
|
Salerno W, Havlak P, Miller J. Scale-invariant structure of strongly conserved sequence in genomic intersections and alignments. Proc Natl Acad Sci U S A 2006; 103:13121-5. [PMID: 16924100 PMCID: PMC1559763 DOI: 10.1073/pnas.0605735103] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022] Open
Abstract
A power-law distribution of the length of perfectly conserved sequence from mouse/human whole-genome intersection and alignment is exhibited. Spatial correlations of these elements within the mouse genome are studied. It is argued that these power-law distributions and correlations are comprised in part by functional noncoding sequence and ought to be accounted for in estimating the statistical significance of apparent sequence conservation. These inter-genomic correlations of conservation are placed in the context of previously observed intra-genomic correlations, and their possible origins and consequences are discussed.
Affiliation(s)
- Paul Havlak
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
- Jonathan Miller
- *Department of Biochemistry and Molecular Biology and
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
- To whom correspondence should be addressed. E-mail:
|
618
|
Marques AT, Antunes A, Fernandes PA, Ramos MJ. Comparative evolutionary genomics of the HADH2 gene encoding Abeta-binding alcohol dehydrogenase/17beta-hydroxysteroid dehydrogenase type 10 (ABAD/HSD10). BMC Genomics 2006; 7:202. [PMID: 16899120 PMCID: PMC1559703 DOI: 10.1186/1471-2164-7-202] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 04/02/2006] [Accepted: 08/09/2006] [Indexed: 11/17/2022] Open
Abstract
Background: The Aβ-binding alcohol dehydrogenase/17β-hydroxysteroid dehydrogenase type 10 (ABAD/HSD10) is an enzyme involved in pivotal metabolic processes and in the mitochondrial dysfunction seen in Alzheimer's disease. Here we use comparative genomic analyses to study the evolution of the HADH2 gene encoding ABAD/HSD10 across several eukaryotic species. Results: Both vertebrate and nematode HADH2 genes showed a six-exon/five-intron organization, while those of the insects had a reduced and varied number of exons (two to three). Eutherian mammal HADH2 genes revealed some highly conserved noncoding regions, which may indicate the presence of functional elements, namely in the region about 1 kb upstream of the transcription start site and in the first part of intron 1. These regions were also conserved between Tetraodon and Fugu fishes. We identified a conserved alternative splicing event between human and dog, which has a nine-amino-acid deletion, causing the removal of the strand βF. This strand is one of the seven strands that compose the core β-sheet of the Rossmann fold dinucleotide-binding motif characteristic of the short chain dehydrogenase/reductase (SDR) family members. However, the fact that the substrate binding cleft residues are retained and the existence of a shared variant between human and dog suggest that it might be functional. Molecular adaptation analyses across eutherian mammal orthologues revealed the existence of sites under positive selection, some of which are localized in the substrate-binding cleft and in the insertion 1 region on loop D (an important region for Aβ binding to the enzyme). Interestingly, a higher than expected number of nonsynonymous substitutions was observed between human/chimpanzee and orangutan, with six out of the seven amino acid replacements being under molecular adaptation (including three in loop D and one in the substrate binding loop).
Conclusion: Our study revealed that HADH2 genes maintained a reasonably conserved organization across a large evolutionary distance. The conserved noncoding regions identified among mammals and between pufferfishes, the evidence of an alternative splicing variant conserved between human and dog, and the detection of positive selection across eutherian mammals may be of importance for further research on ABAD/HSD10 function and its implication in Alzheimer's disease.
Affiliation(s)
- Alexandra T Marques
- REQUIMTE, Departamento de Química, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 687, 4169-007 Porto, Portugal
- Agostinho Antunes
- REQUIMTE, Departamento de Química, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 687, 4169-007 Porto, Portugal
- Pedro A Fernandes
- REQUIMTE, Departamento de Química, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 687, 4169-007 Porto, Portugal
- Maria J Ramos
- REQUIMTE, Departamento de Química, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 687, 4169-007 Porto, Portugal
|
619
|
Faraco JH, Appelbaum L, Marin W, Gaus SE, Mourrain P, Mignot E. Regulation of hypocretin (orexin) expression in embryonic zebrafish. J Biol Chem 2006; 281:29753-61. [PMID: 16867991 DOI: 10.1074/jbc.m605811200] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.9] [Indexed: 11/06/2022] Open
Abstract
Hypocretins/orexins are neuropeptides involved in the regulation of sleep and energy balance in mammals. Conservation of gene sequence, hypothalamic localization of cell bodies, and projection patterns in adult zebrafish suggest that the architecture and function of the hypocretin system are conserved in fish. We report on the complete genomic structure of the zebrafish and Tetraodon hypocretin genes and the complete predicted hypocretin protein sequences from five teleosts. Using whole mount in situ hybridization, we have traced the development of hypocretin cells in zebrafish from onset of expression at 22 h post-fertilization through the first week of development. Promoter elements of similar size from zebrafish and Tetraodon were capable of driving efficient and specific expression of enhanced green fluorescent protein in developing zebrafish embryos, thus defining a minimal promoter region able to accurately mimic the native hypocretin pattern. This enhanced green fluorescent protein expression also revealed a complex pattern of projections within the hypothalamus, to the midbrain, and to the spinal cord. To further analyze the promoter, a series of deletion and substitution constructs were injected into embryos, and resulting promoter activity was monitored in the first week of development. A critical region of 250 base pairs was identified containing a core 13-base pair element essential for hypocretin expression.
Affiliation(s)
- Juliette H Faraco
- Stanford University Center for Narcolepsy, Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, California 94305, USA
|
620
|
Xie X, Kamal M, Lander ES. A family of conserved noncoding elements derived from an ancient transposable element. Proc Natl Acad Sci U S A 2006; 103:11659-64. [PMID: 16864796 PMCID: PMC1518811 DOI: 10.1073/pnas.0604768103] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.3] [Indexed: 01/29/2023] Open
Abstract
The evolutionary origin of the conserved noncoding elements (CNEs) in the human genome remains poorly understood but may hold important clues to their biological functions. Here, we report the discovery of a CNE family with approximately 124 instances in the human genome that demonstrates a clear signature of having been derived from an ancient transposon. The CNE family is also present in the chicken genome, although typically not at orthologous locations. The CNE family is closely related to the active transposon SINE3 in zebrafish and also to a previously uncharacterized transposon in the coelacanth, the so-called "living fossil" belonging to the lobe-finned fish lineage. The mammal, bird, zebrafish, and coelacanth families all share a highly similar core element of approximately 180 bp but have important differences in their 5' and 3' ends. The core element has thus been preserved over 450 million years of evolution, implying an important biological function. In addition, we identify 95 additional CNE families that likely predate the mammalian radiation. The results highlight both the creative role of transposons and the importance of CNE families.
Affiliation(s)
- Xiaohui Xie
- *Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142
- Michael Kamal
- *Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142
- Eric S. Lander
- *Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139; and
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115
- To whom correspondence should be addressed. E-mail:
|
621
|
Gómez-Skarmeta JL, Lenhard B, Becker TS. New technologies, new findings, and new concepts in the study of vertebrate cis-regulatory sequences. Dev Dyn 2006; 235:870-85. [PMID: 16395688 DOI: 10.1002/dvdy.20659] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 01/05/2023] Open
Abstract
All vertebrates share a similar early embryonic body plan and use the same regulatory genes for their development. The availability of numerous sequenced vertebrate genomes and significant advances in bioinformatics have resulted in the finding that the genomic regions of many of these developmental regulatory genes also contain highly conserved noncoding sequence. In silico discovery of conserved noncoding regions and of transcription factor binding sites as well as the development of methods for high throughput transgenesis in Xenopus and zebrafish are dramatically increasing the speed with which regulatory elements can be discovered, characterized, and tested in the context of whole live embryos. We review here some of the recent technological developments that will likely lead to a surge in research on how vertebrate genomes encode regulation of transcriptional activity, how regulatory sequences constrain genomic architecture, and ultimately how vertebrate form has evolved.
|
622
|
Lee TI, Jenner RG, Boyer LA, Guenther MG, Levine SS, Kumar RM, Chevalier B, Johnstone SE, Cole MF, Isono KI, Koseki H, Fuchikami T, Abe K, Murray HL, Zucker JP, Yuan B, Bell GW, Herbolsheimer E, Hannett NM, Sun K, Odom DT, Otte AP, Volkert TL, Bartel DP, Melton DA, Gifford DK, Jaenisch R, Young RA. Control of developmental regulators by Polycomb in human embryonic stem cells. Cell 2006; 125:301-13. [PMID: 16630818 PMCID: PMC3773330 DOI: 10.1016/j.cell.2006.02.043] [Citation(s) in RCA: 1742] [Impact Index Per Article: 96.8] [Received: 10/25/2005] [Revised: 01/20/2006] [Accepted: 02/23/2006] [Indexed: 12/31/2022]
Abstract
Polycomb group proteins are essential for early development in metazoans, but their contributions to human development are not well understood. We have mapped the Polycomb Repressive Complex 2 (PRC2) subunit SUZ12 across the entire nonrepeat portion of the genome in human embryonic stem (ES) cells. We found that SUZ12 is distributed across large portions of over two hundred genes encoding key developmental regulators. These genes are occupied by nucleosomes trimethylated at histone H3K27, are transcriptionally repressed, and contain some of the most highly conserved noncoding elements in the genome. We found that PRC2 target genes are preferentially activated during ES cell differentiation and that the ES cell regulators OCT4, SOX2, and NANOG cooccupy a significant subset of these genes. These results indicate that PRC2 occupies a special set of developmental genes in ES cells that must be repressed to maintain pluripotency and that are poised for activation during ES cell differentiation.
Affiliation(s)
- Tong Ihn Lee
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Richard G. Jenner
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Laurie A. Boyer
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Matthew G. Guenther
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Stuart S. Levine
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Roshan M. Kumar
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Brett Chevalier
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Sarah E. Johnstone
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Megan F. Cole
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Kyo-ichi Isono
- Developmental Genetics Group, RIKEN Center for Allergy and Immunology, 1-7-22, Suehiro, Tsurumiku, Yokohama, Kanagawa 230-0045, Japan
- Haruhiko Koseki
- Developmental Genetics Group, RIKEN Center for Allergy and Immunology, 1-7-22, Suehiro, Tsurumiku, Yokohama, Kanagawa 230-0045, Japan
- Takuya Fuchikami
- Technology and Development Team for Mammalian Cellular Dynamics, BioResource Center, RIKEN Tsukuba Institute, 3-1-1, Koyadai, Tsukuba, Ibaraki 230-0045, Japan
- Kuniya Abe
- Technology and Development Team for Mammalian Cellular Dynamics, BioResource Center, RIKEN Tsukuba Institute, 3-1-1, Koyadai, Tsukuba, Ibaraki 230-0045, Japan
- Heather L. Murray
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Jacob P. Zucker
- Howard Hughes Medical Institute, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Bingbing Yuan
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- George W. Bell
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Nancy M. Hannett
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Kaiming Sun
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Duncan T. Odom
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Arie P. Otte
- Swammerdam Institute for Life Sciences, University of Amsterdam, 1098 SM Amsterdam, The Netherlands
- Thomas L. Volkert
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- David P. Bartel
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Douglas A. Melton
- Howard Hughes Medical Institute, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- David K. Gifford
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- MIT CSAIL, 32 Vassar Street, Cambridge, MA 02139, USA
- Rudolf Jaenisch
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Richard A. Young
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Contact:
|
623
|
Abstract
Stem cells encapsulate the fundamental problem of metazoan biology in miniature: How do cells establish and maintain their fates? Increasing evidence indicates that stem cell chromatin activates proliferation genes and represses differentiation genes. Understanding how these configurations are stabilized by Polycomb group proteins will advance our understanding of embryonic development, tissue homeostasis, regeneration, aging, and oncogenesis.
Affiliation(s)
- Michael Buszczak
- Howard Hughes Laboratories and Embryology Department, Carnegie Institution of Washington, 3520 San Martin Drive, Baltimore, MD 21218, USA
|
624
|
Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, Fry B, Meissner A, Wernig M, Plath K, Jaenisch R, Wagschal A, Feil R, Schreiber SL, Lander ES. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 2006; 125:315-26. [PMID: 16630819 DOI: 10.1016/j.cell.2006.02.041] [Citation(s) in RCA: 3913] [Impact Index Per Article: 217.4] [Received: 11/04/2005] [Revised: 01/18/2006] [Accepted: 02/23/2006] [Indexed: 02/06/2023]
Abstract
The most highly conserved noncoding elements (HCNEs) in mammalian genomes cluster within regions enriched for genes encoding developmentally important transcription factors (TFs). This suggests that HCNE-rich regions may contain key regulatory controls involved in development. We explored this by examining histone methylation in mouse embryonic stem (ES) cells across 56 large HCNE-rich loci. We identified a specific modification pattern, termed "bivalent domains," consisting of large regions of H3 lysine 27 methylation harboring smaller regions of H3 lysine 4 methylation. Bivalent domains tend to coincide with TF genes expressed at low levels. We propose that bivalent domains silence developmental genes in ES cells while keeping them poised for activation. We also found striking correspondences between genome sequence and histone methylation in ES cells, which become notably weaker in differentiated cells. These results highlight the importance of DNA sequence in defining the initial epigenetic landscape and suggest a novel chromatin-based mechanism for maintaining pluripotency.
Affiliation(s)
- Bradley E Bernstein
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA.
|
625
|
Prabhakar S, Poulin F, Shoukry M, Afzal V, Rubin EM, Couronne O, Pennacchio LA. Close sequence comparisons are sufficient to identify human cis-regulatory elements. Genome Res 2006; 16:855-63. [PMID: 16769978 PMCID: PMC1484452 DOI: 10.1101/gr.4717506] [Citation(s) in RCA: 156] [Impact Index Per Article: 8.7] [Indexed: 02/04/2023]
Abstract
Cross-species DNA sequence comparison is the primary method used to identify functional noncoding elements in human and other large genomes. However, little is known about the relative merits of evolutionarily close and distant sequence comparisons. To address this problem, we identified evolutionarily conserved noncoding regions in primate, mammalian, and more distant comparisons using a uniform approach (Gumby) that facilitates unbiased assessment of the impact of evolutionary distance on predictive power. We benchmarked computational predictions against previously identified cis-regulatory elements at diverse genomic loci and also tested numerous extremely conserved human-rodent sequences for transcriptional enhancer activity using an in vivo enhancer assay in transgenic mice. Human regulatory elements were identified with acceptable sensitivity (53%-80%) and true-positive rate (27%-67%) by comparison with one to five other eutherian mammals or six other simian primates. More distant comparisons (marsupial, avian, amphibian, and fish) failed to identify many of the empirically defined functional noncoding elements. Our results highlight the practical utility of close sequence comparisons, and the loss of sensitivity entailed by more distant comparisons. We derived an intuitive relationship between ancient and recent noncoding sequence conservation from whole-genome comparative analysis that explains most of the observations from empirical benchmarking. Lastly, we determined that, in addition to strength of conservation, genomic location and/or density of surrounding conserved elements must also be considered in selecting candidate enhancers for in vivo testing at embryonic time points.
Affiliation(s)
- Shyam Prabhakar
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA
- Corresponding author. Fax (510) 486-4229.
- Francis Poulin
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Malak Shoukry
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Veena Afzal
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Edward M. Rubin
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA
- Olivier Couronne
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA
- Len A. Pennacchio
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA
- Corresponding author. Fax (510) 486-4229.
|
627
|
Richler E, Reichert JG, Buxbaum JD, McInnes LA. Autism and ultraconserved non-coding sequence on chromosome 7q. Psychiatr Genet 2006; 16:19-23. [PMID: 16395125 DOI: 10.1097/01.ypg.0000180683.18665.ef] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 01/22/2023]
Abstract
OBJECTIVE: Autism has been linked to a broad region on chromosome 7q that contains a large number of genes involved in transcription and development. This region is also enriched for ultraconserved non-coding elements, defined as human-rodent sequences that are 100% aligned over ≥200 base pairs, which have a high likelihood of being functional. Therefore, as only a few rare coding variants have been detected in the autism candidate genes on 7q examined to date, we decided to screen these ultraconserved elements for possible autism susceptibility alleles. METHODS: We used denaturing high-performance liquid chromatography and DNA sequencing to perform variant detection in a total of 146 cases with autism, 96 from the Autism Genetic Resource Exchange and 50 from the Central Valley of Costa Rica, as well as 124 controls from the Polymorphism Discovery Resource Panel. We screened 10 consecutive ultraconserved elements in, or flanking, the genes DLX5/6, AUTS2 and FOXP2 on chromosome 7q. RESULTS: Although we did find several rare variants in autism cases that were not present in controls, we also observed rare variants present in controls and not cases. The most common variant occurred in controls at a frequency of 3.3%. Interestingly, two ultraconserved elements each harbored three independent variants and one ultraconserved element harbored two independent variants, suggesting that ultraconservation is maintained chiefly by a decreased tendency toward fixation, rather than a significantly lower mutation rate. CONCLUSIONS: Our results show that these sequences are unlikely to harbor major autism susceptibility alleles.
Affiliation(s)
- Esther Richler
- Department of Psychiatry, Mount Sinai School of Medicine, New York, USA
|
628
|
Feng J, Bi C, Clark BS, Mady R, Shah P, Kohtz JD. The Evf-2 noncoding RNA is transcribed from the Dlx-5/6 ultraconserved region and functions as a Dlx-2 transcriptional coactivator. Genes Dev 2006; 20:1470-84. [PMID: 16705037 PMCID: PMC1475760 DOI: 10.1101/gad.1416106] [Citation(s) in RCA: 549] [Impact Index Per Article: 30.5] [Indexed: 02/06/2023]
Abstract
The identification of ultraconserved noncoding sequences in vertebrates has been associated with developmental regulators and DNA-binding proteins. One of the first of these was identified in the intergenic region between the Dlx-5 and Dlx-6 genes, members of the Dlx/dll homeodomain-containing protein family. In previous experiments, we showed that Sonic hedgehog treatment of forebrain neural explants results in the activation of Dlx-2 and the novel noncoding RNA (ncRNA), Evf-1. In this report, we show that the Dlx-5/6 ultraconserved region is transcribed to generate an alternatively spliced form of Evf-1, the ncRNA Evf-2. Evf-2 specifically cooperates with Dlx-2 to increase the transcriptional activity of the Dlx-5/6 enhancer in a target and homeodomain-specific manner. A stable complex containing the Evf-2 ncRNA and the Dlx-2 protein forms in vivo, suggesting that the Evf-2 ncRNA activates transcriptional activity by directly influencing Dlx-2 activity. These experiments identify a novel mechanism whereby transcription is controlled by the cooperative actions of an ncRNA and a homeodomain protein. The possibility that a subset of vertebrate ultraconserved regions may function at both the DNA and RNA level to control key developmental regulators may explain why ultraconserved sequences exhibit 90% or more conservation even after 450 million years of vertebrate evolution.
Affiliation(s)
- Jianchi Feng
- Program in Neurobiology and Department of Pediatrics, Children's Memorial Hospital and Feinberg School of Medicine, Northwestern University, Chicago, Illinois 60614, USA
|
629
|
Rigoutsos I, Huynh T, Miranda K, Tsirigos A, McHardy A, Platt D. Short blocks from the noncoding parts of the human genome have instances within nearly all known genes and relate to biological processes. Proc Natl Acad Sci U S A 2006; 103:6605-10. [PMID: 16636294 PMCID: PMC1447521 DOI: 10.1073/pnas.0601688103] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.2] [Indexed: 12/17/2022] Open
Abstract
Using an unsupervised pattern-discovery method, we processed the human intergenic and intronic regions and catalogued all variable-length patterns with identically conserved copies and multiplicities above what is expected by chance. Among the millions of discovered patterns, we found a subset of 127,998 patterns, termed pyknons, which have additional nonoverlapping instances in the untranslated and protein-coding regions of 30,675 transcripts from 20,059 human genes. The pyknons arrange combinatorially in the untranslated and coding regions of numerous human genes where they form mosaics. Consecutive instances of pyknons in these regions show a strong bias in their relative placement, favoring distances of approximately 22 nucleotides. We also found pyknons to be enriched in a statistically significant manner in genes involved in specific processes, e.g., cell communication, transcription, regulation of transcription, signaling, transport, etc. For approximately 1/3 of the pyknons, the intergenic/intronic instances of their reverse complement lie within 380,084 nonoverlapping regions, typically 60-80 nucleotides long, which are predicted to form double-stranded, energetically stable, hairpin-shaped RNA secondary structures; additionally, the pyknons subsume approximately 40% of the known microRNA sequences, thus suggesting a possible link with posttranscriptional gene silencing and RNA interference. Cross-genome comparisons reveal that many of the pyknons have instances in the 3' UTRs of genes from other vertebrates and invertebrates where they are overrepresented in similar biological processes, as in the human genome. These unexpected findings suggest potential unique functional connections between the coding and noncoding parts of the human genome.
Affiliation(s)
- Isidore Rigoutsos
- IBM Thomas J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA.
|
630
|
Hadrys T, Punnamoottil B, Pieper M, Kikuta H, Pezeron G, Becker TS, Prince V, Baker R, Rinkwitz S. Conserved co-regulation and promoter sharing of hoxb3a and hoxb4a in zebrafish. Dev Biol 2006; 297:26-43. [PMID: 16860306 DOI: 10.1016/j.ydbio.2006.04.446] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 09/12/2005] [Revised: 03/16/2006] [Accepted: 04/12/2006] [Indexed: 10/24/2022]
Abstract
The expression of zebrafish hoxb3a and hoxb4a has been found to be mediated through five transcripts, hoxb3a transcripts I-III and hoxb4a transcripts I-II, driven by four promoters. A "master" promoter, located about 2 kb downstream of hoxb5a, controls transcription of a pre-mRNA comprising exon sequences of both genes. This unique gene structure is proposed to provide a novel mechanism to ensure overlapping, tissue-specific expression of both genes in the posterior hindbrain and spinal cord. Transgenic approaches were used to analyze the functions of zebrafish hoxb3a/hoxb4a promoters and enhancer sequences containing regions of homology that were previously identified by comparative genomics. Two neural enhancers were shown to establish specific anterior expression borders within the hindbrain and mediate expression in defined neuronal populations derived from hindbrain rhombomeres (r) 5 to 8, suggesting a late role of the genes in neuronal cell lineage specification. Species comparison showed that the zebrafish hoxb3a r5 and r6 enhancer corresponded to a sequence within the mouse HoxA cluster controlling activity of Hoxa3 in r5 and r6, whereas a homologous region within the HoxB cluster activated Hoxb3 expression but limited to r5. We conclude that the similarity of hoxb3a/Hoxa3 regulatory mechanisms reflects the shared descent of both genes from a single ancestral paralog group 3 gene.
Affiliation(s)
- Thorsten Hadrys
- Department of Physiology and Neuroscience, NYU Medical School, New York, NY 10016, USA
631
Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 2006; 441:87-90. [PMID: 16625209 DOI: 10.1038/nature04696] [Citation(s) in RCA: 369] [Impact Index Per Article: 20.5] [Received: 11/27/2005] [Accepted: 03/02/2006] [Indexed: 01/15/2023]
Abstract
Hundreds of highly conserved distal cis-regulatory elements have been characterized so far in vertebrate genomes. Many thousands more are predicted on the basis of comparative genomics. However, in stark contrast to the genes that they regulate, in invertebrates virtually none of these regions can be traced by using sequence similarity, leaving their evolutionary origins obscure. Here we show that a class of conserved, primarily non-coding regions in tetrapods originated from a previously unknown short interspersed repetitive element (SINE) retroposon family that was active in the Sarcopterygii (lobe-finned fishes and terrestrial vertebrates) in the Silurian period at least 410 million years ago (ref. 4), and seems to be recently active in the 'living fossil' Indonesian coelacanth, Latimeria menadoensis. Using a mouse enhancer assay we show that one copy, 0.5 million bases from the neuro-developmental gene ISL1, is an enhancer that recapitulates multiple aspects of Isl1 expression patterns. Several other copies represent new, possibly regulatory, alternatively spliced exons in the middle of pre-existing Sarcopterygian genes. One of these, a more than 200-base-pair ultraconserved region, 100% identical in mammals, and 80% identical to the coelacanth SINE, contains a 31-amino-acid-residue alternatively spliced exon of the messenger RNA processing gene PCBP2 (ref. 6). These add to a growing list of examples in which relics of transposable elements have acquired a function that serves their host, a process termed 'exaptation', and provide an origin for at least some of the many highly conserved vertebrate-specific genomic sequences.
Affiliation(s)
- Gill Bejerano
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA.
632
Abstract
Rett syndrome (RTT) is an X-linked dominant disabling neurodevelopmental disorder caused by loss of function mutations in the MECP2 gene, located at Xq28, which encodes a multifunctional protein. MECP2 expression is regulated in a developmental stage- and cell-type-specific manner. The need for tightly controlled MeCP2 levels in brain is strongly suggested by neurologically abnormal phenotypes of mouse models with mild overexpression and by mental retardation in human males with MECP2 duplication. We set out to identify long-range cis-regulatory sequences that differentially regulate MECP2 transcription and, when mutated, may contribute to the pathogenesis of RTT, autism or X-linked mental retardation. By inter-species sequence comparisons, we detected 27 highly conserved non-coding DNA sequences within a 210 kb region covering MECP2. We functionally confirmed four enhancer and two silencer elements by performing luciferase reporter assays in four different human cell lines. The transcription factor binding capability of the identified regulatory elements was tested by gel shift assays. To locate the human MECP2 core promoter, we dissected the promoter region by reporter assays with deletion constructs. We then used chromosome conformation capture methods to document long-range interactions of three enhancers and two silencers with the MECP2 promoter. Acting over distances of up to 130 kb, these elements may influence chromatin configurations and regulate MECP2 transcription. Our study has defined the "MECP2 functional expression module" and identified enhancer and silencer elements that are likely to be responsible for the tissue-specific, developmental stage-specific or splice-variant-specific control of MeCP2 protein expression.
Affiliation(s)
- Jinglan Liu
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
633
Jiménez-Delgado S, Crespo M, Permanyer J, Garcia-Fernàndez J, Manzanares M. Evolutionary genomics of the recently duplicated amphioxus Hairy genes. Int J Biol Sci 2006; 2:66-72. [PMID: 16733536 PMCID: PMC1458425 DOI: 10.7150/ijbs.2.66] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 01/18/2006] [Accepted: 02/24/2006] [Indexed: 11/20/2022] Open
Abstract
Amphioxus Hairy genes have gone through a number of lineage-specific duplications, resulting in eight members, some of which are differentially expressed in the embryo. In order to gain insights into the evolution and function of this gene family we have compared their genomic structure and searched for conserved non-coding sequence elements. We have found that introns have been lost independently from these genes at least twice and after the duplication events. By carrying out phylogenetic footprinting between paralogues expressed in the embryo, we have found a differential distribution of conserved elements that could explain the limited overlap in expression patterns of Hairy genes in the amphioxus embryo. Furthermore, clustering of RBP-Jk binding sites in these conserved elements suggests that amphioxus Hairy genes are downstream targets of the Notch signaling pathway, as occurs in vertebrates. All of this evidence suggests that amphioxus Hairy genes have gone through a process of subfunctionalization shortly after their duplication, representing an extreme and rapid case of the duplication-degeneration-complementation model.
Affiliation(s)
- Senda Jiménez-Delgado
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Avda. Diagonal 645, 08028 Barcelona, Spain
- Miguel Crespo
- Instituto de Investigaciones Biomédicas CSIC-UAM, Arturo Duperier 4, 28029 Madrid, Spain
- Jon Permanyer
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Avda. Diagonal 645, 08028 Barcelona, Spain
- Jordi Garcia-Fernàndez
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Avda. Diagonal 645, 08028 Barcelona, Spain
- Miguel Manzanares
- Instituto de Investigaciones Biomédicas CSIC-UAM, Arturo Duperier 4, 28029 Madrid, Spain
634
Blanchette M, Bataille AR, Chen X, Poitras C, Laganière J, Lefèbvre C, Deblois G, Giguère V, Ferretti V, Bergeron D, Coulombe B, Robert F. Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res 2006; 16:656-68. [PMID: 16606704 PMCID: PMC1457048 DOI: 10.1101/gr.4866006] [Citation(s) in RCA: 216] [Impact Index Per Article: 12.0] [Indexed: 12/27/2022]
Abstract
The identification of regulatory regions is one of the most important and challenging problems toward the functional annotation of the human genome. In higher eukaryotes, transcription-factor (TF) binding sites are often organized in clusters called cis-regulatory modules (CRM). While the prediction of individual TF-binding sites is a notoriously difficult problem, CRM prediction has proven to be somewhat more reliable. Starting from a set of predicted binding sites for more than 200 TF families documented in Transfac, we describe an algorithm relying on the principle that CRMs generally contain several phylogenetically conserved binding sites for a few different TFs. The method allows the prediction of more than 118,000 CRMs within the human genome. A subset of these is shown to be bound in vivo by TFs using ChIP-chip. Their analysis reveals, among other things, that CRM density varies widely across the genome, with CRM-rich regions often being located near genes encoding transcription factors involved in development. Predicted CRMs show a surprising enrichment near the 3' end of genes and in regions far from genes. We document the tendency for certain TFs to bind modules located in specific regions with respect to their target genes and identify TFs likely to be involved in tissue-specific regulation. The set of predicted CRMs, which is made available as a public database called PReMod (http://genomequebec.mcgill.ca/PReMod), will help analyze regulatory mechanisms in specific biological systems.
Affiliation(s)
- Mathieu Blanchette
- McGill Centre for Bioinformatics, Montreal, Quebec, Canada, H3A 2B4
- Corresponding authors. Fax (514) 398-3387; fax (514) 987-5743
- Alain R. Bataille
- Institut de Recherches Cliniques de Montréal, Montreal, Quebec, Canada H2W 1R7
- Xiaoyu Chen
- McGill Centre for Bioinformatics, Montreal, Quebec, Canada, H3A 2B4
- Christian Poitras
- Institut de Recherches Cliniques de Montréal, Montreal, Quebec, Canada H2W 1R7
- Josée Laganière
- Molecular Oncology Group, Department of Medicine, Oncology and Biochemistry, McGill University, Montreal, Quebec, Canada H3A 1A1
- Céline Lefèbvre
- Molecular Oncology Group, Department of Medicine, Oncology and Biochemistry, McGill University, Montreal, Quebec, Canada H3A 1A1
- Geneviève Deblois
- Molecular Oncology Group, Department of Medicine, Oncology and Biochemistry, McGill University, Montreal, Quebec, Canada H3A 1A1
- Vincent Giguère
- Molecular Oncology Group, Department of Medicine, Oncology and Biochemistry, McGill University, Montreal, Quebec, Canada H3A 1A1
- Vincent Ferretti
- McGill University and Genome Quebec Innovation Center, Montreal, Quebec, Canada H3A 1A4
- Dominique Bergeron
- Institut de Recherches Cliniques de Montréal, Montreal, Quebec, Canada H2W 1R7
- Benoit Coulombe
- Institut de Recherches Cliniques de Montréal, Montreal, Quebec, Canada H2W 1R7
- François Robert
- Institut de Recherches Cliniques de Montréal, Montreal, Quebec, Canada H2W 1R7
- Corresponding authors. Fax (514) 398-3387; fax (514) 987-5743
635
Feltus FA, Singh HP, Lohithaswa HC, Schulze SR, Silva TD, Paterson AH. A comparative genomics strategy for targeted discovery of single-nucleotide polymorphisms and conserved-noncoding sequences in orphan crops. Plant Physiol 2006; 140:1183-91. [PMID: 16607031 PMCID: PMC1435799 DOI: 10.1104/pp.105.074203] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 05/08/2023]
Abstract
Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50 million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species.
Affiliation(s)
- F A Feltus
- Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia 30602, USA
636
McEwen GK, Woolfe A, Goode D, Vavouri T, Callaway H, Elgar G. Ancient duplicated conserved noncoding elements in vertebrates: a genomic and functional analysis. Genome Res 2006; 16:451-65. [PMID: 16533910 PMCID: PMC1457030 DOI: 10.1101/gr.4143406] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Indexed: 12/25/2022]
Abstract
Fish-mammal genomic comparisons have proved powerful in identifying conserved noncoding elements likely to be cis-regulatory in nature, and the majority of those tested in vivo have been shown to act as tissue-specific enhancers associated with genes involved in transcriptional regulation of development. Although most of these elements share little sequence identity to each other, a small number are remarkably similar and appear to be the product of duplication events. Here, we searched for duplicated conserved noncoding elements in the human genome, using comparisons with Fugu to select putative cis-regulatory sequences. We identified 124 families of duplicated elements, each containing between two and five members, that are highly conserved within and between vertebrate genomes. In 74% of cases, we were able to assign a specific set of paralogous genes with annotation relating to transcriptional regulation and/or development to each family, thus removing much of the ambiguity in identifying associated genes. We find that duplicate elements have the potential to up-regulate reporter gene expression in a tissue-specific manner and that expression domains often overlap, but are not necessarily identical, between family members. Over two thirds of the families are conserved in duplicate in fish and appear to predate the large-scale duplication events thought to have occurred at the origin of vertebrates. We propose a model whereby gene duplication and the evolution of cis-regulatory elements can be considered in the context of increased morphological diversity and the emergence of the modern vertebrate body plan.
Affiliation(s)
- Gayle K. McEwen
- School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, United Kingdom
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, United Kingdom
- MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 2SR, United Kingdom
- Adam Woolfe
- School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, United Kingdom
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, United Kingdom
- Debbie Goode
- School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, United Kingdom
- Tanya Vavouri
- School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, United Kingdom
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, United Kingdom
- Heather Callaway
- School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, United Kingdom
- Greg Elgar
- School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, United Kingdom
- Corresponding author. Fax 0044 207 882 3000
637
Hallikas O, Palin K, Sinjushina N, Rautiainen R, Partanen J, Ukkonen E, Taipale J. Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell 2006; 124:47-59. [PMID: 16413481 DOI: 10.1016/j.cell.2005.10.042] [Citation(s) in RCA: 350] [Impact Index Per Article: 19.4] [Received: 06/14/2005] [Revised: 09/21/2005] [Accepted: 10/21/2005] [Indexed: 12/21/2022]
Abstract
Understanding the regulation of human gene expression requires knowledge of the "second genetic code," which consists of the binding specificities of transcription factors (TFs) and the combinatorial code by which TF binding sites are assembled to form tissue-specific enhancer elements. Using a novel high-throughput method, we determined the DNA binding specificities of GLIs 1-3, Tcf4, and c-Ets1, which mediate transcriptional responses to the Hedgehog (Hh), Wnt, and Ras/MAPK signaling pathways. To identify mammalian enhancer elements regulated by these pathways on a genomic scale, we developed a computational tool, enhancer element locator (EEL). We show that EEL can be used to identify Hh and Wnt target genes and to predict activated TFs based on changes in gene expression. Predictions validated in transgenic mouse embryos revealed the presence of multiple tissue-specific enhancers in mouse c-Myc and N-Myc genes, which has implications for organ-specific growth control and tumor-type specificity of oncogenes.
Affiliation(s)
- Outi Hallikas
- Molecular and Cancer Biology Program, Biomedicum Helsinki, University of Helsinki, Finland
638
Dobbs MB, Gurnett CA, Pierce B, Exner GU, Robarge J, Morcuende JA, Cole WG, Templeton PA, Foster B, Bowcock AM. HOXD10 M319K mutation in a family with isolated congenital vertical talus. J Orthop Res 2006; 24:448-53. [PMID: 16450407 DOI: 10.1002/jor.20052] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Indexed: 02/04/2023]
Abstract
Congenital vertical talus (CVT) is a primary dislocation of the talonavicular joint that often occurs in neuromusculoskeletal syndromes, but may also be seen as an isolated abnormality. Six families with isolated CVT were ascertained. DNA was isolated from 21 affected individuals and 17 unaffected individuals from these families, as well as from five sporadic patients with CVT. Variable expressivity was noted in three families, manifesting as clubfoot in three individuals. Genome-wide linkage analysis generated a maximum two-point logarithm of odds score on chromosome 2q with D2S1353 (Zmax = 1.43 at theta(max) = 0.1), 17 Mb from the HOXD gene cluster. DNA from one affected individual of each family was subjected to mutational analysis of the HOXD10 gene. A single missense mutation was identified (M319K, 956T > A) in the homeodomain recognition helix of the HOXD10 gene that segregated with disease in one large British family. This mutation was recently described in a family of Italian descent with CVT and Charcot-Marie-Tooth deformity. HOXD10 gene mutations were not identified in any of the other families or sporadic patients with CVT, suggesting that genetic heterogeneity underlies this disorder.
Affiliation(s)
- Matthew B Dobbs
- Department of Orthopedic Surgery, Washington University School of Medicine, One Child Place, Suite 45 Saint Louis, Missouri 63110, USA.
639
Kamal M, Xie X, Lander ES. A large family of ancient repeat elements in the human genome is under strong selection. Proc Natl Acad Sci U S A 2006; 103:2740-5. [PMID: 16477033 PMCID: PMC1413850 DOI: 10.1073/pnas.0511238103] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.3] [Indexed: 01/03/2023] Open
Abstract
Although conserved noncoding elements (CNEs) constitute the majority of sequences under purifying selection in the human genome, they remain poorly understood. CNEs seem to be largely unique, with no large families of similar elements reported to date. Here, we search for CNEs among the ancestral repeat classes in the human genome and report the discovery of a large CNE family containing >900 members. This family belongs to the MER121 class of repeats. Although the MER121 family members show considerable sequence variation among one another, the individual copies show striking conservation in orthologous locations across the human, dog, mouse, and rat genomes. The element is also present and conserved in orthologous locations in the marsupial, but its genome-wide dispersal postdates the divergence from birds. The comparative genomic data indicate that MER121 does not encode a family of either protein-coding or RNA genes. Although the precise function of these elements remains unknown, the evidence suggests that this unusual family may play a cis-regulatory or structural role in mammalian genomes.
Affiliation(s)
- Michael Kamal
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142
- Xiaohui Xie
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142
- Eric S. Lander
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115
640
Abstract
The last common ancestor between fish and mammals dates back to the very origin of the vertebrate lineage and today, half of modern vertebrates are fish. It is thus not surprising that several fish species have played important roles in recent years to advance our understanding of vertebrate genome evolution, to inform us on the structure of human genes, and, somewhat more unexpectedly, to provide leads to understanding the function of genes involved in human diseases. Genome sequence comparisons between such distantly related organisms are highly informative due to the accumulation of neutral mutations in nonfunctional regions. Yet humans and fishes share many developmental pathways, organ systems, and physiological mechanisms, making conclusions relevant to human biology. The respective advantages of zebrafish, medaka, Tetraodon, or Takifugu have been well exploited so far with bioinformatics analyses and molecular biology techniques. However, the full potential of fish genomics is about to be unleashed with the integration of more traditional disciplines such as biochemistry and physiology, with the study of additional species such as carp, trout, or tilapia and a broadening of its applications to environmental genomics or aquaculture.
Affiliation(s)
- Hugues Roest Crollius
- Dyogen Lab, Centre National de la Recherche Scientifique UMR8541, Ecole Normale Supérieure, 75005 Paris, France.
641
Bagheri-Fam S, Barrionuevo F, Dohrmann U, Günther T, Schüle R, Kemler R, Mallo M, Kanzler B, Scherer G. Long-range upstream and downstream enhancers control distinct subsets of the complex spatiotemporal Sox9 expression pattern. Dev Biol 2006; 291:382-97. [PMID: 16458883 DOI: 10.1016/j.ydbio.2005.11.013] [Citation(s) in RCA: 125] [Impact Index Per Article: 6.9] [Received: 05/06/2005] [Revised: 11/09/2005] [Accepted: 08/29/2005] [Indexed: 11/20/2022]
Abstract
SOX9 is an evolutionarily conserved transcription factor that is expressed in a variety of tissues, with essential functions in cartilage, testis, heart, glial cell, inner ear and neural crest development. By comparing human and pufferfish genomic sequences, we previously identified eight highly conserved sequence elements between 290 kb 5' and 450 kb 3' to human SOX9. In this study, we assayed the regulatory potential of elements E1 to E7 in transgenic mice using a lacZ reporter gene driven by a 529 bp minimal mouse Sox9 promoter. We found that three of these elements and the Sox9 promoter control distinct subsets of the tissue-specific expression pattern of Sox9. E3, located 251 kb 5' to SOX9, directs lacZ expression to cranial neural crest cells and to the inner ear. E1 is located 28 kb 5' to SOX9 and controls expression in the node, notochord, gut, bronchial epithelium and pancreas. Transgene expression in the neuroectoderm is mediated by E7, located 95 kb 3' to SOX9, which regulates expression in the telencephalon and midbrain, and by the Sox9 minimal promoter which controls expression in the ventral spinal cord and hindbrain. We show that E3-directed reporter gene expression in neural crest cells of the first but not of the second and third pharyngeal arch is dependent on beta-catenin, revealing a complex regulation of Sox9 in cranial neural crest cells. Moreover, we identify and discuss highly conserved transcription factor binding sites within enhancer E3 that are in good agreement with current models for neural crest and inner ear development. Finally, we identify enhancer E1 as a cis-regulatory element conserved between vertebrates and invertebrates, indicating that some cis-regulatory sequences that control developmental genes in vertebrates might be phylogenetically ancient.
Affiliation(s)
- Stefan Bagheri-Fam
- Institute of Human Genetics and Anthropology, University of Freiburg, Breisacherstr. 33, D-79106 Freiburg, Germany
642
Ogino H, McConnell WB, Grainger RM. Highly efficient transgenesis in Xenopus tropicalis using I-SceI meganuclease. Mech Dev 2006; 123:103-13. [PMID: 16413175 DOI: 10.1016/j.mod.2005.11.006] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.6] [Received: 09/19/2005] [Revised: 11/23/2005] [Accepted: 11/23/2005] [Indexed: 02/07/2023]
Abstract
In this study, we report a highly efficient transgenesis technique for Xenopus tropicalis based on a method described first for Medaka. This simple procedure entails co-injection of meganuclease I-SceI and a transgene construct flanked by two I-SceI sites into fertilized eggs. Approximately 30% of injected embryos express transgenes in a promoter-dependent manner. About 1/3 of such embryos show incorporation of the transgene at the one-cell stage and the remainder are 'half-transgenics' suggesting incorporation at the two-cell stage. Transgenes from both classes of embryos are shown to be transmitted and expressed in offspring. The procedure also works efficiently in Xenopus laevis. Because the needle injection procedure does not significantly damage embryos, a high fraction develop normally and can, as well, be injected with a second reagent, for example an mRNA or antisense morpholino oligonucleotide, thus allowing one to perform several genetic manipulations on embryos at one time. This simple and efficient technique will be a powerful tool for high-throughput transgenesis assays in founder animals, and for facilitating genetic studies in the fast-breeding diploid frog, X. tropicalis.
Affiliation(s)
- Hajime Ogino
- Department of Biology, University of Virginia, Charlottesville, VA 22904, USA
643
Keightley PD, Kryukov GV, Sunyaev S, Halligan DL, Gaffney DJ. Evolutionary constraints in conserved nongenic sequences of mammals. Genome Res 2005; 15:1373-8. [PMID: 16204190 PMCID: PMC1240079 DOI: 10.1101/gr.3942005] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Indexed: 11/24/2022]
Abstract
Mammalian genomes contain many highly conserved nongenic sequences (CNGs) whose functional significance is poorly understood. Sets of CNGs have previously been identified by selecting the most conserved elements from a chromosome or genome, but in these highly selected samples, conservation may be unrelated to purifying selection. Furthermore, conservation of CNGs may be caused by mutation rate variation rather than selective constraints. To account for the effect of selective sampling, we have examined conservation of CNGs in taxa whose evolution is largely independent of the taxa from which the CNGs were initially identified, and we have controlled for mutation rate variation in the genome. We show that selective constraints in CNGs and their flanks are about one-half as strong in hominids as in murids, implying that hominids have accumulated many slightly deleterious mutations in functionally important nongenic regions. This is likely to be a consequence of the low effective population size of hominids leading to a reduced effectiveness of selection. We estimate that there are one and two times as many conserved nucleotides in CNGs as in known protein-coding genes of hominids and murids, respectively. Polymorphism frequencies in CNGs indicate that purifying selection operates in these sequences. During hominid evolution, we estimate that a total of about three deleterious mutations in CNGs and protein-coding genes have been selectively eliminated per diploid genome each generation, implying that deleterious mutations are eliminated from populations non-independently and that sex is necessary for long-term population persistence.
Affiliation(s)
- Peter D Keightley
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom.
644
Deng W, Zhu X, Skogerbø G, Zhao Y, Fu Z, Wang Y, He H, Cai L, Sun H, Liu C, Li B, Bai B, Wang J, Jia D, Sun S, He H, Cui Y, Wang Y, Bu D, Chen R. Organization of the Caenorhabditis elegans small non-coding transcriptome: genomic features, biogenesis, and expression. Genome Res 2006; 16:20-9. [PMID: 16344563 PMCID: PMC1356125 DOI: 10.1101/gr.4139206] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.2] [Received: 04/16/2005] [Accepted: 08/22/2005] [Indexed: 01/14/2023]
Abstract
Recent evidence points to considerable transcription occurring in non-protein-coding regions of eukaryote genomes. However, their lack of conservation and demonstrated function have created controversy over whether these transcripts are functional. Applying a novel cloning strategy, we have cloned 100 novel and 61 known or predicted Caenorhabditis elegans full-length ncRNAs. Studying the genomic environment and transcriptional characteristics have shown that two-thirds of all ncRNAs, including many intronic snoRNAs, are independently transcribed under the control of ncRNA-specific upstream promoter elements. Furthermore, the transcription levels of at least 60% of the ncRNAs vary with developmental stages. We identified two new classes of ncRNAs, stem-bulge RNAs (sbRNAs) and snRNA-like RNAs (snlRNAs), both featuring distinct internal motifs, secondary structures, upstream elements, and high and developmentally variable expression. Most of the novel ncRNAs are conserved in Caenorhabditis briggsae, but only one homolog was found outside the nematodes. Preliminary estimates indicate that the C. elegans transcriptome contains approximately 2700 small non-coding RNAs, potentially acting as regulatory elements in nematode development.
Affiliation(s)
- Wei Deng
- Bioinformatics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
|
645
|
Juriloff DM, Harris MJ, McMahon AP, Carroll TJ, Lidral AC. Wnt9b is the mutated gene involved in multifactorial nonsyndromic cleft lip with or without cleft palate in A/WySn mice, as confirmed by a genetic complementation test. Birth Defects Res A Clin Mol Teratol 2006; 76:574-9. [PMID: 16998816 DOI: 10.1002/bdra.20302] [Citation(s) in RCA: 100] [Impact Index Per Article: 5.6] [Indexed: 11/07/2022]
Abstract
BACKGROUND: Nonsyndromic cleft lip (CL) with or without cleft palate (CLP) is a common human birth defect with complex genetic etiology. One of the unidentified genes maps to chromosome 17q21. A mouse strain, A/WySn, has CLP with complex genetic etiology that models the human defect, and 1 of its causative genes, clf1, maps to a region homologous to human 17q21. Extensive studies of the candidate region pointed to a novel insertion of an IAP transposon 3' from the gene Wnt9b as the clf1 mutation. Independently a recessive knockout mutation of Wnt9b (Wnt9b-) was reported to cause a lethal syndrome that includes some CLP. METHODS: A standard genetic test of allelism between clf1 and the Wnt9b- mutation was done. A total of 83 F1 embryos at gestation day 14 (GD 14) from Wnt9b-/+ males crossed with A/WySn females, and 79 BC1 GD 14 embryos from F1 Wnt9b-/clf1 males back-crossed to A/WySn females were observed for CL. Embryo genotypes at clf1 and Wnt9b were obtained from DNA markers. Genotypes for a second unlinked modifier locus from A/WySn, clf2, were similarly obtained. RESULTS: The compound mutant embryos (Wnt9b-/clf1) had high frequencies of CL: 27% in the F1 and 63% in the BC1. The clf2 modifier gene was found to have 3 alleles segregating in this study and to strongly influence the penetrance of CL in the compound mutant. CONCLUSIONS: The noncomplementation of clf1 and Wnt9b- confirms that clf1 is a mutation of the Wnt9b gene. The homologous human WNT9B gene and 3' conserved noncoding region should be examined for a role in human nonsyndromic CLP.
Affiliation(s)
- Diana M Juriloff
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada.
|
646
|
Siepel A, Pollard KS, Haussler D. New Methods for Detecting Lineage-Specific Selection. Lecture Notes in Computer Science 2006. [DOI: 10.1007/11732990_17] [Citation(s) in RCA: 124] [Impact Index Per Article: 6.9] [Indexed: 12/04/2022]
|
647
|
Lunter G, Ponting CP, Hein J. Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput Biol 2006; 2:e5. [PMID: 16410828 PMCID: PMC1326222 DOI: 10.1371/journal.pcbi.0020005] [Citation(s) in RCA: 148] [Impact Index Per Article: 8.2] [Received: 09/02/2005] [Accepted: 11/30/2005] [Indexed: 01/05/2023] Open
Abstract
It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Here we introduce a new model and comparative method that, instead of nucleotide substitutions, uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection. The model predicts the distribution of indels under neutrality, and shows an excellent fit to human-mouse ancestral repeat data. Across the genome, many unusually long ungapped regions are detected that are unaccounted for by the neutral model, and which we predict to be highly enriched in functional DNA that has been subject to purifying selection with respect to indels. We use the model to determine the proportion under indel-purifying selection to be between 2.56% and 3.25% of human euchromatin. Since annotated protein-coding genes comprise only 1.2% of euchromatin, these results lend further weight to the proposition that more than half the functional complement of the human genome is non-protein-coding. The method is surprisingly powerful at identifying selected sequence using only two or three mammalian genomes. Applying the method to the human, mouse, and dog genomes, we identify 90 Mb of human sequence under indel-purifying selection, at a predicted 10% false-discovery rate and 75% sensitivity. As expected, most of the identified sequence represents unannotated material, while the recovered proportions of known protein-coding and microRNA genes closely match the predicted sensitivity of the method. The method's high sensitivity to functional sequence such as microRNAs suggests that as yet unannotated microRNA genes are enriched among the sequences identified. Furthermore, its independence of substitutions allowed us to identify sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels. The ability to identify elements under heterogeneous selection enables, for the first time, the genome-wide investigation of positive selection on functional elements other than protein-coding genes.
Affiliation(s)
- Gerton Lunter
- MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, Oxford, United Kingdom.
|
648
|
Van Hellemont R, Monsieurs P, Thijs G, De Moor B, Van de Peer Y, Marchal K. A novel approach to identifying regulatory motifs in distantly related genomes. Genome Biol 2005; 6:R113. [PMID: 16420672 PMCID: PMC1414112 DOI: 10.1186/gb-2005-6-13-r113] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 05/31/2005] [Revised: 08/22/2005] [Accepted: 12/01/2005] [Indexed: 11/25/2022] Open
Abstract
A two-step procedure for identifying regulatory motifs in distantly related organisms is described that combines the advantages of sequence alignment and motif detection approaches. Although proven successful in the identification of regulatory motifs, phylogenetic footprinting methods still show some shortcomings. To assess these difficulties, most apparent when applying phylogenetic footprinting to distantly related organisms, we developed a two-step procedure that combines the advantages of sequence alignment and motif detection approaches. The results on well-studied benchmark datasets indicate that the presented method outperforms other methods when the sequences become either too long or too heterogeneous in size.
Affiliation(s)
- Ruth Van Hellemont
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Pieter Monsieurs
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Gert Thijs
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Bart De Moor
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Yves Van de Peer
- Plant Systems Biology, Bioinformatics and Evolutionary Genomics, VIB/Ghent University, Technologiepark 927, 9052 Gent, Belgium
- Kathleen Marchal
- ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium
- Department of Microbial and Molecular Systems, KU Leuven, Kasteelpark Arenberg 20, 3001 Leuven-Heverlee, Belgium
|
649
|
Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H, Antonarakis SE, Dermitzakis ET, Hirschhorn JN. Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet 2005; 38:223-7. [PMID: 16380714 DOI: 10.1038/ng1710] [Citation(s) in RCA: 187] [Impact Index Per Article: 9.8] [Received: 10/14/2005] [Accepted: 11/04/2005] [Indexed: 02/06/2023]
Abstract
Noncoding genetic variants are likely to influence human biology and disease, but recognizing functional noncoding variants is difficult. Approximately 3% of noncoding sequence is conserved among distantly related mammals, suggesting that these evolutionarily conserved noncoding regions (CNCs) are selectively constrained and contain functional variation. However, CNCs could also merely represent regions with lower local mutation rates. Here we address this issue and show that CNCs are selectively constrained in humans by analyzing HapMap genotype data. Specifically, new (derived) alleles of SNPs within CNCs are rarer than new alleles in nonconserved regions (P = 3 × 10⁻¹⁸), indicating that evolutionary pressure has suppressed CNC-derived allele frequencies. Intronic CNCs and CNCs near genes show greater allele frequency shifts, with magnitudes comparable to those for missense variants. Thus, conserved noncoding variants are more likely to be functional. Allele frequency distributions highlight selectively constrained genomic regions that should be intensively surveyed for functionally important variation.
Affiliation(s)
- Jared A Drake
- Program in Genomics and Division of Endocrinology, Children's Hospital, Boston, Massachusetts 02115, USA
|
650
|
Stone EA, Cooper GM, Sidow A. Trade-offs in detecting evolutionarily constrained sequence by comparative genomics. Annu Rev Genomics Hum Genet 2005; 6:143-64. [PMID: 16124857 DOI: 10.1146/annurev.genom.6.080604.162146] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Indexed: 11/09/2022]
Abstract
As whole-genome sequencing efforts extend beyond more traditional model organisms to include a deep diversity of species, comparative genomic analyses will be further empowered to reveal insights into the human genome and its evolution. The discovery and annotation of functional genomic elements is a necessary step toward a detailed understanding of our biology, and sequence comparisons have proven to be an integral tool for that task. This review is structured to broadly reflect the statistical challenges in discriminating these functional elements from the bulk of the genome that has evolved neutrally. Specifically, we review the comparative genomics literature in terms of specificity, sensitivity, and phylogenetic scope, as well as the trade-offs that relate these factors in standard analyses. We consider the impact of an expanding diversity of orthologous sequences on our ability to resolve functional elements. This impact is assessed through both recent comparative analyses of deep alignments and mathematical modeling.
Affiliation(s)
- Eric A Stone
- Department of Statistics, Stanford University, Stanford, California 94305, USA
|