3801
|
Computer-aided drug design: the next 20 years. J Comput Aided Mol Des 2007; 21:591-601. [PMID: 17989929 DOI: 10.1007/s10822-007-9142-y] [Citation(s) in RCA: 90] [Impact Index Per Article: 5.0] [Received: 09/27/2007] [Accepted: 10/18/2007] [Indexed: 10/22/2022]
Abstract
This perspectives article has been taken from a talk the author gave at the symposium in honor of Yvonne C. Martin's retirement, held at the American Chemical Society spring meeting in Chicago on March 25, 2007. The talk was intended as a somewhat lighthearted attempt to gaze into the future; inevitably, in print, things will come across more seriously than was intended. As we all know, the past is rarely predictive of the future.
|
3802
|
Abstract
Models have been a tool of science at least since the 18th century and serve a variety of purposes, from focusing abstract thoughts to representing scaled-down versions of things for study. Generally, animal models are needed when it is impractical or unethical to study the target animal. Biologists have taken modeling by analogy beyond most other disciplines, deriving the relationship between model and target through evolution. The "unity in diversity" concept suggests that homology between model and target foretells functional similarities. Animal model studies have been invaluable for elucidating general strategies, pathways and processes, and for guiding the development of hypotheses to test in target animals. The vast majority of animals used as models are used in biomedical preclinical trials. The predictive value of those animal studies is carefully monitored, thus providing an ideal dataset for evaluating the efficacy of animal models. On average, the extrapolated results from studies using tens of millions of animals fail to accurately predict human responses. Inadequacies in experimental designs may account for some of the failure. However, recent discoveries of unexpected variation in genome organization and regulation may reveal a heretofore unknown lack of homology between model animals and target animals that could account for a significant proportion of the weakness in predictive ability. A better understanding of the mechanisms of gene regulation may provide needed insight to improve the predictability of animal models.
Affiliation(s)
- R J Wall
- Animal Bioscience and Biotechnology Lab, Agricultural Research Service, Beltsville, MD, USA.
|
3803
|
Abstract
In the last decade, governments, medical charities, pharmaceutical companies and disease advocacy organizations have spent considerable time and money developing biobanks to aid drug discovery and the investigation of disease. This article identifies and assesses the various expectations that have driven the investment in different types of biobanks. It suggests that they have been the focus of unrealistic promises about producing a ‘biobank revolution’ that will transform biomedicine and healthcare. We need more modest expectations about what can be achieved, and need to tackle certain conceptual and methodological challenges for biobanks to fulfill their potential.
Affiliation(s)
- Richard Tutton
- Lancaster University, Centre for the Economic & Social Aspects of Genomics (CESAGen), Institute for Advanced Studies, Lancaster, LA1 4YD, UK
|
3804
|
Touching Base. Nat Genet 2007. [DOI: 10.1038/ng1107-1311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
|
3805
|
Lombardo A, Genovese P, Beausejour CM, Colleoni S, Lee YL, Kim KA, Ando D, Urnov FD, Galli C, Gregory PD, Holmes MC, Naldini L. Gene editing in human stem cells using zinc finger nucleases and integrase-defective lentiviral vector delivery. Nat Biotechnol 2007; 25:1298-306. [PMID: 17965707 DOI: 10.1038/nbt1353] [Citation(s) in RCA: 654] [Impact Index Per Article: 36.3] [Received: 03/21/2007] [Accepted: 10/09/2007] [Indexed: 11/08/2022]
Abstract
Achieving the full potential of zinc-finger nucleases (ZFNs) for genome engineering in human cells requires their efficient delivery to the relevant cell types. Here we exploited the infectivity of integrase-defective lentiviral vectors (IDLV) to express ZFNs and provide the template DNA for gene correction in different cell types. IDLV-mediated delivery supported high rates (13-39%) of editing at the IL-2 receptor common gamma-chain gene (IL2RG) across different cell types. IDLVs also mediated site-specific gene addition by a process that required ZFN cleavage and homologous template DNA, thus establishing a platform that can target the insertion of transgenes into a predetermined genomic site. Using IDLV delivery and ZFNs targeting distinct loci, we observed high levels of gene addition (up to 50%) in a panel of human cell lines, as well as human embryonic stem cells (5%), allowing rapid, selection-free isolation of clonogenic cells with the desired genetic modification.
Affiliation(s)
- Angelo Lombardo
- San Raffaele Telethon Institute for Gene Therapy, via Olgettina, 58, 20132 Milan, Italy
|
3806
|
|
3807
|
Affiliation(s)
- Michael Ashburner
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
|
3808
|
Becker TS, Lenhard B. The random versus fragile breakage models of chromosome evolution: a matter of resolution. Mol Genet Genomics 2007; 278:487-91. [PMID: 17851692 DOI: 10.1007/s00438-007-0287-0] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Received: 07/05/2007] [Accepted: 08/28/2007] [Indexed: 12/01/2022]
Abstract
Conserved synteny (the sharing of at least one orthologous gene by a pair of chromosomes from two species) can, in the strictest sense, be viewed as sequence conservation between chromosomes of two related species, irrespective of whether coding or non-coding sequence is examined. The recent sequencing of multiple vertebrate genomes indicates that certain chromosomal segments of considerable size are conserved in gene order as well as underlying non-coding sequence across all vertebrates. Some of these segments lost genes or non-coding sequence and/or underwent breakage only in teleost genomes, presumably because the evolutionary pressure on these regions to remain intact was relaxed after an additional round of whole genome duplication. Random reporter insertions into zebrafish chromosomes combined with computational genome-wide analysis indicate that large chromosomal areas of multiple genes contain long-range regulatory elements, which act on their target genes from several gene distances away. In addition, computational breakpoint analyses suggest that recurrent evolutionary breaks are found in "fragile regions" or "hotspots", outside of the conserved blocks of synteny. These findings cannot be accommodated by the random breakage model and suggest that this view of genome and chromosomal evolution requires substantial reassessment.
|
3809
|
Abstract
The landscapes of mammalian genomes are characterized by complex patterns of intersecting and overlapping sense and antisense transcription, giving rise to large numbers of coding and non-protein-coding RNAs (ncRNAs). A recent report by Kapranov and colleagues (1) describes three potentially novel classes of RNAs located at the very edges of protein-coding genes. The presence of RNAs from one of these classes appears to be correlated with the expression levels of their associated genes. These results suggest that a proportion of these RNAs might have roles in the cis-regulation of the expression of neighbouring protein-coding genes.
MESH Headings
- Animals
- Dosage Compensation, Genetic
- Evolution, Molecular
- Gene Expression
- Gene Silencing
- Genome
- Genome, Human
- Genomic Imprinting
- Humans
- Mammals
- MicroRNAs/genetics
- Models, Genetic
- Protein Biosynthesis/genetics
- RNA/chemistry
- RNA/classification
- RNA/genetics
- RNA/metabolism
- RNA, Small Interfering/genetics
- RNA, Small Nuclear/genetics
- RNA, Small Nucleolar/genetics
- RNA, Untranslated/chemistry
- RNA, Untranslated/genetics
- RNA, Untranslated/metabolism
- Sequence Analysis, RNA
- Transcription, Genetic
Affiliation(s)
- Jasmina Ponjavic
- MRC Functional Genetics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, UK
|
3810
|
Poland GA, Ovsyannikova IG, Jacobson RM, Smith DI. Heterogeneity in vaccine immune response: the role of immunogenetics and the emerging field of vaccinomics. Clin Pharmacol Ther 2007; 82:653-64. [PMID: 17971814 DOI: 10.1038/sj.clpt.6100415] [Citation(s) in RCA: 157] [Impact Index Per Article: 8.7] [Indexed: 02/07/2023]
Abstract
Recent advances in the fields of immunology, genetics, molecular biology, bioinformatics, and the Human Genome Project have allowed for the emergence of the field of vaccinomics. Vaccinomics encompasses the fields of immunogenetics and immunogenomics as applied to understanding the mechanisms of heterogeneity in immune responses to vaccines. In this study, we examine the role of HLA genes, cytokine genes, and cell surface receptor genes as examples of how genetic polymorphism leads to individual and population variations in immune responses to vaccines. In turn, these data, in concert with new high-throughput technology, inform the immune-response network theory of vaccine response. Such information can be used in the directed and rational development of new vaccines, and this new golden age of vaccinology has been termed "predictive vaccinology", which will predict the likelihood of a vaccine response or an adverse response to a vaccine, the number of doses needed and even whether a vaccine is likely to be of benefit (i.e., is the individual at risk for the outcome for which the vaccine is being administered?).
Affiliation(s)
- G A Poland
- Mayo Vaccine Research Group and the Program in Translational Immunovirology and Biodefense, Mayo Clinic College of Medicine, Rochester, Minnesota, USA.
|
3811
|
Ruepp A, Brauner B, Dunger-Kaltenbach I, Frishman G, Montrone C, Stransky M, Waegele B, Schmidt T, Doudieu ON, Stümpflen V, Mewes HW. CORUM: the comprehensive resource of mammalian protein complexes. Nucleic Acids Res 2007; 36:D646-50. [PMID: 17965090 PMCID: PMC2238909 DOI: 10.1093/nar/gkm936] [Citation(s) in RCA: 274] [Impact Index Per Article: 15.2] [Indexed: 11/21/2022] Open
Abstract
Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The CORUM (http://mips.gsf.de/genre/proj/corum/index.html) database is a collection of experimentally verified mammalian protein complexes. Information is manually derived by expert annotators through critical reading of the scientific literature. Information about protein complexes includes protein complex names, subunits, literature references as well as the function of the complexes. For functional annotation, we use the FunCat catalogue, which makes it possible to organize the protein complex space into biologically meaningful subsets. The database contains more than 1750 protein complexes that are built from 2400 different genes, thus representing 12% of the protein-coding genes in human. A web-based system is available to query, view and download the data. CORUM provides a comprehensive dataset of protein complexes for discoveries in systems biology, analyses of protein networks and protein complex-associated diseases. Comparable to the MIPS reference dataset of protein complexes from yeast, CORUM intends to serve as a reference for mammalian protein complexes.
Affiliation(s)
- Andreas Ruepp
- Institute for Bioinformatics (MIPS), German Research Center for Environmental Health, Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany.
|
3812
|
Singh LN, Wang LS, Hannenhalli S. TREMOR--a tool for retrieving transcriptional modules by incorporating motif covariance. Nucleic Acids Res 2007; 35:7360-71. [PMID: 17962303 PMCID: PMC2189735 DOI: 10.1093/nar/gkm885] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 01/19/2023] Open
Abstract
A transcriptional module (TM) is a collection of transcription factors (TFs) that, as a group, co-regulate multiple, functionally related genes. The task of identifying TMs poses an important biological challenge. Since TFs belong to evolutionarily and structurally related families, TF family members often bind to similar DNA motifs and can confound sequence-based approaches to TM identification. A previous approach to TM detection addresses this issue by pre-selecting a single representative from each TF family. One problem with this approach is that closely related transcription factors can still target sufficiently distinct genes in a biologically meaningful way, and thus, pre-selecting a single family representative may in principle miss certain TMs. Here we report a method, TREMOR (Transcriptional Regulatory Module Retriever). This method uses the Mahalanobis distance to assess the validity of a TM and automatically incorporates the inter-TF binding similarity without resorting to pre-selecting family representatives. The application of TREMOR on human muscle-specific, liver-specific and cell-cycle-related genes reveals TFs and TMs that were validated from literature and also reveals additional related genes.
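The abstract above names the Mahalanobis distance but does not say which features TREMOR feeds into it. The following is a minimal sketch of the distance itself, not of TREMOR: the function names and the toy data are hypothetical, and the rows could stand for any per-gene score vectors. It shows how covariance between features changes the distance compared with a plain Euclidean measure.

```python
import math
from typing import List, Sequence


def mean_vector(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of a list of equally long numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance matrix (divides by n - 1)."""
    n = len(rows)
    mu = mean_vector(rows)
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[c / (n - 1) for c in line] for line in cov]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def mahalanobis(point: Sequence[float], rows: Sequence[Sequence[float]]) -> float:
    """Distance of `point` from the mean of `rows`, scaled by their covariance."""
    mu = mean_vector(rows)
    diff = [p - m for p, m in zip(point, mu)]
    # Solving cov @ y = diff avoids forming the inverse explicitly.
    y = solve(covariance_matrix(rows), diff)
    return math.sqrt(sum(d * yi for d, yi in zip(diff, y)))


# Hypothetical toy data: two positively correlated features.
background = [(0, 0), (1, 1), (2, 2), (1, 0), (1, 2)]
along = mahalanobis((2, 2), background)    # follows the correlation
against = mahalanobis((2, 0), background)  # runs against it
```

Both query points lie at the same Euclidean distance from the mean (1, 1), yet the point that runs against the correlation is farther in the Mahalanobis sense. That property is what lets a covariance-aware measure account for similar, correlated inputs without discarding any of them.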
Affiliation(s)
- Larry N Singh
- Penn Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA 19104, USA
|
3813
|
Reymond A, Henrichsen CN, Harewood L, Merla G. Side effects of genome structural changes. Curr Opin Genet Dev 2007; 17:381-6. [PMID: 17913489 DOI: 10.1016/j.gde.2007.08.009] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.6] [Received: 08/14/2007] [Accepted: 08/17/2007] [Indexed: 12/13/2022]
Abstract
The first extensive catalog of structural human variation was recently released. It showed that large stretches of genomic DNA that vary considerably in copy number were extremely abundant. Thus it is conceivable that they play a major role in functional variation. Consistently, genomic insertions and deletions were shown to contribute to phenotypic differences by modifying not only the expression levels of genes within the aneuploid segments but also of normal copy-number neighboring genes. In this report, we review the possible mechanisms behind this latter effect.
Affiliation(s)
- Alexandre Reymond
- Center for Integrative Genomics, Genopode Building, University of Lausanne, CH-1015 Lausanne, Switzerland.
|
3814
|
Conboy CM, Spyrou C, Thorne NP, Wade EJ, Barbosa-Morais NL, Wilson MD, Bhattacharjee A, Young RA, Tavaré S, Lees JA, Odom DT. Cell cycle genes are the evolutionarily conserved targets of the E2F4 transcription factor. PLoS One 2007; 2:e1061. [PMID: 17957245 PMCID: PMC2020443 DOI: 10.1371/journal.pone.0001061] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.8] [Received: 08/13/2007] [Accepted: 09/27/2007] [Indexed: 12/21/2022] Open
Abstract
Maintaining quiescent cells in G0 phase is achieved in part through the multiprotein subunit complex known as DREAM, and in human cell lines the transcription factor E2F4 directs this complex to its cell cycle targets. We found that E2F4 binds a highly overlapping set of human genes among three diverse primary tissues and an asynchronous cell line, which suggests that tissue-specific binding partners and chromatin structure have minimal influence on E2F4 targeting. To investigate the conservation of these transcription factor binding events, we identified the mouse genes bound by E2f4 in seven primary mouse tissues and a cell line. E2f4 bound a set of mouse genes that was common among mouse tissues, but largely distinct from the genes bound in human. The evolutionarily conserved set of E2F4 bound genes is highly enriched for functionally relevant regulatory interactions important for maintaining cellular quiescence. In contrast, we found minimal mRNA expression perturbations in this core set of E2f4 bound genes in the liver, kidney, and testes of E2f4 null mice. Thus, the regulatory mechanisms maintaining quiescence are robust even to complete loss of conserved transcription factor binding events.
Affiliation(s)
- Caitlin M. Conboy
- Cancer Research UK-Cambridge Research Institute, Li Ka Shing Centre, Cambridge, United Kingdom
- Christiana Spyrou
- Cancer Research UK-Cambridge Research Institute, Li Ka Shing Centre, Cambridge, United Kingdom
- Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, United Kingdom
- Natalie P. Thorne
- Cancer Research UK-Cambridge Research Institute, Li Ka Shing Centre, Cambridge, United Kingdom
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, United Kingdom
- Elizabeth J. Wade
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut, United States of America
- Nuno L. Barbosa-Morais
- Cancer Research UK-Cambridge Research Institute, Li Ka Shing Centre, Cambridge, United Kingdom
- Michael D. Wilson
- Cancer Research UK-Cambridge Research Institute, Li Ka Shing Centre, Cambridge, United Kingdom
- Richard A. Young
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
- Simon Tavaré
- Cancer Research UK-Cambridge Research Institute, Li Ka Shing Centre, Cambridge, United Kingdom
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, United Kingdom
- Jacqueline A. Lees
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Duncan T. Odom
- Cancer Research UK-Cambridge Research Institute, Li Ka Shing Centre, Cambridge, United Kingdom
|
3815
|
Emmrich F. Abstracts of the 3rd World Congress on Regenerative Medicine, October 18-20, 2007, Leipzig, Germany. Regen Med 2007; 2:485-740. [PMID: 17941763 DOI: 10.2217/17460751.2.5.485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Affiliation(s)
- Frank Emmrich
- Congress President Fraunhofer Institute for Cell Therapy and Immunology IZI, Leipzig, Germany
|
3816
|
Wang J, Ungar LH, Tseng H, Hannenhalli S. MetaProm: a neural network based meta-predictor for alternative human promoter prediction. BMC Genomics 2007; 8:374. [PMID: 17941982 PMCID: PMC2194789 DOI: 10.1186/1471-2164-8-374] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 05/23/2007] [Accepted: 10/17/2007] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND: De novo eukaryotic promoter prediction is important for discovering novel genes and understanding gene regulation. In spite of the great advances made in the past decade, recent studies revealed that the overall performances of the current promoter prediction programs (PPPs) are still poor, and predictions made by individual PPPs do not overlap each other. Furthermore, most PPPs are trained and tested on the most-upstream promoters; their performances on alternative promoters have not been assessed. RESULTS: In this paper, we evaluate the performances of current major promoter prediction programs (i.e., PSPA, FirstEF, McPromoter, DragonGSF, DragonPF, and FProm) using 42,536 distinct human gene promoters on a genome-wide scale, and with emphasis on alternative promoters. We describe an artificial neural network (ANN) based meta-predictor program that integrates predictions from the current PPPs and the predicted promoters' relation to CpG islands. Our specific analysis of recently discovered alternative promoters reveals that although only 41% of the 3' most promoters overlap a CpG island, 74% of 5' most promoters overlap a CpG island. CONCLUSION: Our assessment of six PPPs on 1.06 × 10^9 bp of human genome sequence reveals the specific strengths and weaknesses of individual PPPs. Our meta-predictor outperforms any individual PPP in sensitivity and specificity. Furthermore, we discovered that the 5' alternative promoters are more likely to be associated with a CpG island.
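The 41% and 74% figures above are fractions of promoters that overlap a CpG island. As an illustration of that kind of tally only, the sketch below computes such a fraction for intervals on a single chromosome. The half-open coordinate convention, the function names and the toy intervals are all assumptions; the abstract does not describe how the authors defined or computed overlap.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumed convention)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(promoters: List[Interval], islands: List[Interval]) -> float:
    """Fraction of promoters that overlap at least one island."""
    if not promoters:
        return 0.0
    hits = sum(1 for p in promoters if any(overlaps(p, i) for i in islands))
    return hits / len(promoters)


# Hypothetical coordinates, for illustration only.
promoters = [(0, 10), (20, 30), (40, 50), (60, 70)]
islands = [(5, 8), (29, 35), (50, 55)]
share = fraction_overlapping(promoters, islands)
```

The nested scan is quadratic in the number of intervals, which is fine for a toy example; at genome scale a sorted sweep or an interval tree would be the usual choice.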
Affiliation(s)
- Junwen Wang
- Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Core Genotyping Facility, Advanced Technology Program, SAIC-Frederick, Frederick, MD 21702, USA
- Division of Cancer Epidemiology and Genetics, NCI, NIH, Bethesda, MD 20892, USA
- Lyle H Ungar
- Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Hung Tseng
- Department of Dermatology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Cell and Developmental Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Center for Research on Reproduction and Women's Health, University of Pennsylvania, Philadelphia, PA 19104, USA
- Sridhar Hannenhalli
- Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA
|
3817
|
Wakaguri H, Yamashita R, Suzuki Y, Sugano S, Nakai K. DBTSS: database of transcription start sites, progress report 2008. Nucleic Acids Res 2007; 36:D97-101. [PMID: 17942421 PMCID: PMC2238895 DOI: 10.1093/nar/gkm901] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.6] [Indexed: 11/13/2022] Open
Abstract
DBTSS is a database of transcriptional start sites, based on our unique collection of precise, experimentally determined 5'-end sequences of full-length cDNAs. Since its first release in 2002, several major updates have been made. In this update, we expanded the human transcriptional start site dataset by 19 million uniquely mapped, and RefSeq-associated, 5'-end sequences, which were generated by a newly introduced Solexa sequencer. Moreover, in order to provide means for interpreting those massive TSS data, we implemented two new analytical tools: one for connecting expression information with predicted transcription factor binding sites; the other for examining evolutionary conservation or species-specificity of promoters and transcripts, which can be browsed by our own comparative genome viewer. With the expanded dataset and the enhanced functionalities, DBTSS provides a unique platform that enables in-depth transcriptome analyses. DBTSS is accessible at http://dbtss.hgc.jp/.
Affiliation(s)
- Hiroyuki Wakaguri
- Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
|
3818
|
Affiliation(s)
- James F Leckman
- Child Study Center, Yale University School of Medicine, New Haven, Connecticut 06520-7900, USA.
|
3819
|
Yates T, Okoniewski MJ, Miller CJ. X:Map: annotation and visualization of genome structure for Affymetrix exon array analysis. Nucleic Acids Res 2007; 36:D780-6. [PMID: 17932061 PMCID: PMC2238884 DOI: 10.1093/nar/gkm779] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Indexed: 12/22/2022] Open
Abstract
Affymetrix exon arrays aim to target every known and predicted exon in the human, mouse or rat genomes, and have reporters that extend beyond protein coding regions to other areas of the transcribed genome. This combination of increased coverage and precision is important because a substantial proportion of protein coding genes are predicted to be alternatively spliced, and because many non-coding genes are known also to be of biological significance. In order to fully exploit these arrays, it is necessary to associate each reporter on the array with the features of the genome it is targeting, and to relate these to gene and genome structure. X:Map is a genome annotation database that provides this information. Data can be browsed using a novel Google-maps based interface, and analysed and further visualized through an associated BioConductor package. The database can be found at http://xmap.picr.man.ac.uk.
Affiliation(s)
- Tim Yates
- Cancer Research UK, Bioinformatics Group, Paterson Institute for Cancer Research, The University of Manchester, Christie Hospital Site, Wilmslow Road, Withington, Manchester, M20 4BX, UK
|
3820
|
Madsen BE, Villesen P, Wiuf C. A periodic pattern of SNPs in the human genome. Genome Res 2007; 17:1414-9. [PMID: 17673700 PMCID: PMC1987342 DOI: 10.1101/gr.6223207] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 12/20/2006] [Accepted: 06/18/2007] [Indexed: 11/24/2022]
Abstract
By surveying a filtered, high-quality set of SNPs in the human genome, we have found that SNPs positioned 1, 2, 4, 6, or 8 bp apart are more frequent than SNPs positioned 3, 5, 7, or 9 bp apart. The observed pattern is not restricted to genomic regions that are known to cause sequencing or alignment errors, for example, transposable elements (SINE, LINE, and LTR), tandem repeats, and large duplicated regions. However, we found that the pattern is almost entirely confined to what we define as "periodic DNA." Periodic DNA is a genomic region with a high degree of periodicity in nucleotide usage. It turned out that periodic DNA is mainly small regions (average length 16.9 bp), widely distributed in the genome. Furthermore, periodic DNA has a 1.8 times higher SNP density than the rest of the genome and SNPs inside periodic DNA have a significantly higher genotyping error rate than SNPs outside periodic DNA. Our results suggest that not all SNPs in the human genome are created by independent single nucleotide mutations, and that care should be taken in analysis of SNPs from periodic DNA. The latter may have important consequences for SNP and association studies.
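The central observation above is a tally of how often SNPs lie a given number of base pairs apart. A minimal sketch of such a tally follows. It assumes that distances are measured between adjacent SNPs on one chromosome and that positions are integer coordinates; neither detail is stated in the abstract, and the function name and example positions are hypothetical.

```python
from collections import Counter
from typing import Iterable


def spacing_counts(positions: Iterable[int], max_distance: int = 9) -> Counter:
    """Count gaps between adjacent SNP positions, up to max_distance bp."""
    ordered = sorted(set(positions))  # drop duplicates, sort by coordinate
    counts: Counter = Counter()
    for left, right in zip(ordered, ordered[1:]):
        gap = right - left
        if gap <= max_distance:
            counts[gap] += 1
    return counts


# Hypothetical SNP coordinates on one chromosome.
snps = [100, 101, 103, 107, 120, 122]
gaps = spacing_counts(snps)
```

Comparing the counts at distances 1, 2, 4, 6 and 8 with those at 3, 5, 7 and 9 over a real SNP set would show whether the reported even-distance excess is present in that data.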
Affiliation(s)
- Bo Eskerod Madsen
- Bioinformatics Research Center (BiRC), University of Aarhus, Hoegh-Guldbergs Gade 10, DK-8000 Aarhus C, Denmark
- Palle Villesen
- Bioinformatics Research Center (BiRC), University of Aarhus, Hoegh-Guldbergs Gade 10, DK-8000 Aarhus C, Denmark
- Carsten Wiuf
- Bioinformatics Research Center (BiRC), University of Aarhus, Hoegh-Guldbergs Gade 10, DK-8000 Aarhus C, Denmark
- Molecular Diagnostic Laboratory, Aarhus University Hospital, Brendstrupgaardsvej 90, DK-8200 Aarhus N, Denmark
|
3821
|
Cho JH, Weaver CT. The genetics of inflammatory bowel disease. Gastroenterology 2007; 133:1327-39. [PMID: 17919503 DOI: 10.1053/j.gastro.2007.08.032] [Citation(s) in RCA: 120] [Impact Index Per Article: 6.7] [Received: 07/27/2007] [Accepted: 08/01/2007] [Indexed: 12/21/2022]
Affiliation(s)
- Judy H Cho
- Inflammatory Bowel Disease Center, Section of Digestive Diseases, Yale University, New Haven, Connecticut 06520-8019, USA.
|
3822
|
Scherrer K, Jost J. Gene and genon concept: coding versus regulation. A conceptual and information-theoretic analysis of genetic storage and expression in the light of modern molecular biology. Theory Biosci 2007; 126:65-113. [PMID: 18087760 PMCID: PMC2242853 DOI: 10.1007/s12064-007-0012-x] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Received: 06/19/2007] [Accepted: 07/13/2007] [Indexed: 01/15/2023]
Abstract
We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term "genon". In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. 
Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon.
Affiliation(s)
- Klaus Scherrer
- Institut Jacques Monod, CNRS and Univ. Paris 7, 2, place Jussieu, 75251 Paris-Cedex 5, France
- Jürgen Jost
- Max Planck Institute for Mathematics in the Sciences MPI MIS, Inselstrasse 22, 04103 Leipzig, Germany
|
3823
|
Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, Beazley C, Ingle CE, Dunning M, Flicek P, Koller D, Montgomery S, Tavaré S, Deloukas P, Dermitzakis ET. Population genomics of human gene expression. Nat Genet 2007; 39:1217-24. [PMID: 17873874 PMCID: PMC2683249 DOI: 10.1038/ng2142] [Citation(s) in RCA: 892] [Impact Index Per Article: 49.6] [Received: 05/30/2007] [Accepted: 08/29/2007] [Indexed: 01/09/2023]
Abstract
Genetic variation influences gene expression, and this variation in gene expression can be efficiently mapped to specific genomic regions and variants. Here we have used gene expression profiling of Epstein-Barr virus-transformed lymphoblastoid cell lines of all 270 individuals genotyped in the HapMap Consortium to elucidate the detailed features of genetic variation underlying gene expression variation. We find that gene expression is heritable and that differentiation between populations is in agreement with earlier small-scale studies. A detailed association analysis of over 2.2 million common SNPs per population (5% frequency in HapMap) with gene expression identified at least 1,348 genes with association signals in cis and at least 180 in trans. Replication in at least one independent population was achieved for 37% of cis signals and 15% of trans signals, respectively. Our results strongly support an abundance of cis-regulatory variation in the human genome. Detection of trans effects is limited but suggests that regulatory variation may be the key primary effect contributing to phenotypic variation in humans. We also explore several methodologies that improve the current state of analysis of gene expression variation.
Affiliation(s)
- Barbara E. Stranger
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
- Alexandra C. Nica
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
- Matthew S. Forrest
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
- Antigone Dimas
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
- Christine P. Bird
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
- Claude Beazley
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
- Catherine E. Ingle
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
- Mark Dunning
- Department of Oncology, University of Cambridge, Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Paul Flicek
- European Bioinformatics Institute, Hinxton, UK
- Daphne Koller
- Computer Science Department, Stanford University, Stanford, CA 94305-9010, USA
- Stephen Montgomery
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
- Simon Tavaré
- Department of Oncology, University of Cambridge, Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Panagiotis Deloukas
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
|
3824
|
Gerle B, Koroknai A, Fejer G, Bakos A, Banati F, Szenthe K, Wolf H, Niller HH, Minarovits J, Salamon D. Acetylated histone H3 and H4 mark the upregulated LMP2A promoter of Epstein-Barr virus in lymphoid cells. J Virol 2007; 81:13242-7. [PMID: 17898065 PMCID: PMC2169097 DOI: 10.1128/jvi.01396-07] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/20/2022] Open
Abstract
We analyzed the levels of acetylated histones and histone H3 dimethylated on lysine 4 (H3K4me2) at the LMP2A promoter (LMP2Ap) of Epstein-Barr virus in well-characterized type I and type III lymphoid cell line pairs and additionally in the nasopharyngeal carcinoma cell line C666-1 by using chromatin immunoprecipitation. We found that enhanced levels of acetylated histones marked the upregulated LMP2Ap in lymphoid cells. In contrast, in C666-1 cells, the highly DNA-methylated, inactive LMP2Ap was also enriched in acetylated histones and H3K4me2. Our results suggest that the combinatorial effects of DNA methylation, histone acetylation, and H3K4me2 modulate the activity of LMP2Ap.
Affiliation(s)
- Borbala Gerle
- Microbiological Research Group, National Center for Epidemiology, Pihenö u. 1, H-1529 Budapest, Hungary
|
3825
|
Ghosh S, Hirsch HA, Sekinger EA, Kapranov P, Struhl K, Gingeras TR. Differential analysis for high density tiling microarray data. BMC Bioinformatics 2007; 8:359. [PMID: 17892592 PMCID: PMC2231405 DOI: 10.1186/1471-2105-8-359] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 01/24/2007] [Accepted: 09/24/2007] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND High density oligonucleotide tiling arrays are an effective and powerful platform for conducting unbiased genome-wide studies. The ab initio probe selection method employed in tiling arrays is unbiased, and thus ensures consistent sampling across coding and non-coding regions of the genome. These arrays are being increasingly used to study the associated processes of transcription, transcription factor binding, chromatin structure and their association. Studies of differential expression and/or regulation provide critical insight into the mechanics of transcription and regulation that occurs during the developmental program of a cell. The time-course experiment, which comprises an in-vivo system and the proposed analyses, is used to determine if annotated and un-annotated portions of genome manifest coordinated differential response to the induced developmental program. RESULTS We have proposed a novel approach, based on a piece-wise function, to analyze genome-wide differential response. This enables segmentation of the response based on protein-coding and non-coding regions; for genes the methodology also partitions differential response with a 5' versus 3' versus intra-genic bias. CONCLUSION The algorithm, built upon the framework of Significance Analysis of Microarrays, uses a generalized logic to define regions/patterns of coordinated differential change. By not adhering to the gene-centric paradigm, discordant differential expression patterns between exons and introns have been identified at an FDR of less than 12 percent. A co-localization of differential binding between RNA Polymerase II and tetra-acetylated histone has been quantified at a p-value < 0.003; it is most significant at the 5' end of genes, at a p-value < 10^-13. The prototype R code has been made available as supplementary material [see Additional file 1].
Affiliation(s)
- Heather A Hirsch
- Dept. Biological Chemistry & Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Kevin Struhl
- Dept. Biological Chemistry & Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
|
3826
|
|
3827
|
Abstract
Although the number of protein-encoding human genes is more limited than many had estimated, the human transcript repertoire is much more diverse than anticipated. In part, transcript diversity is generated through the use of alternative promoters and alternate splicing. In addition, based on discoveries using technologies such as full-length cDNA libraries and whole genome tiling microarrays, it is now likely that non-protein-encoding transcripts comprise a substantial fraction of the human RNA population. Much attention is currently focused on understanding the role of alternative promoters in generating transcript diversity, both for non-protein-encoding (ncRNAs) and protein-encoding RNAs.
|
3828
|
Patay BA, Topol EJ. Is there a genetic basis for acute coronary syndrome? Nat Clin Pract Cardiovasc Med 2007; 4:596-7. [PMID: 17876345 DOI: 10.1038/ncpcardio1006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/19/2007] [Accepted: 07/31/2007] [Indexed: 11/09/2022]
Affiliation(s)
- Bradley A Patay
- Division of Cardiovascular Diseases, Scripps Clinic and Research Foundation, 10666 North Torrey Pines Road, SW 206, La Jolla, CA 92037, USA
|
3829
|
Abstract
MOTIVATION Contemporary, high-throughput sequencing efforts have identified a rich source of naturally occurring single nucleotide polymorphisms (SNPs), a subset of which occur in the coding region of genes and result in a change in the encoded amino acid sequence (non-synonymous coding SNPs or 'nsSNPs'). It is hypothesized that a subset of these nsSNPs may underlie common human disease. Testing all these polymorphisms for disease association would be time consuming and expensive. Thus, computational methods have been developed to both prioritize candidate nsSNPs and make sense of their likely molecular physiologic impact. RESULTS We have developed a method to prioritize nsSNPs and have applied it to the human protein kinase gene family. The results of our analyses provide high quality predictions and outperform available whole genome prediction methods (74% versus 83% prediction accuracy). Our analyses and methods consider both DNA sequence conservation, which most traditional methods are based on, as well as unique structural and functional features of kinases. We provide a ranked list of common kinase nsSNPs that have a higher probability of impacting human disease based on our analyses.
Affiliation(s)
- Ali Torkamani
- Department of Medicine, Center for Human Genetics and Genomics, The Scripps Research Institute, University of California, San Diego, La Jolla, CA 92093, USA
|
3830
|
Seemann SE, Gilchrist MJ, Hofacker IL, Stadler PF, Gorodkin J. Detection of RNA structures in porcine EST data and related mammals. BMC Genomics 2007; 8:316. [PMID: 17845718 PMCID: PMC2072958 DOI: 10.1186/1471-2164-8-316] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 05/29/2007] [Accepted: 09/10/2007] [Indexed: 11/18/2022] Open
Abstract
Background Non-coding RNAs (ncRNAs) are involved in a wide spectrum of regulatory functions. Within recent years, there have been increasing reports of observed polyadenylated ncRNAs and mRNA like ncRNAs in eukaryotes. To investigate this further, we examined the large data set in the Sino-Danish PigEST resource which also contains expression information distributed on 97 non-normalized cDNA libraries. Results We constructed a pipeline, EST2ncRNA, to search for known and novel ncRNAs. The pipeline utilises sequence similarity to ncRNA databases (blast), structure similarity to Rfam (RaveNnA) as well as multiple alignments to predict conserved novel putative RNA structures (RNAz). EST2ncRNA was fed with 48,000 contigs and 73,000 singletons available from the PigEST resource. Using the pipeline we identified known RNA structures in 137 contigs and single reads (conreads), and predicted high confidence RNA structures in non-protein coding regions of additional 1,262 conreads. Of these, structures in 270 conreads overlap with existing predictions in human. To sum up, the PigEST resource comprises trans-acting elements (ncRNAs) in 715 contigs and 340 singletons as well as cis-acting elements (inside UTRs) in 311 contigs and 51 singletons, of which 18 conreads contain both predictions of trans- and cis-acting elements. The predicted RNAz candidates were compared with the PigEST expression information and we identify 114 contigs with an RNAz prediction and expression in at least ten of the non-normalised cDNA libraries. We conclude that the contigs with RNAz and known predictions are in general expressed at a much lower level than protein coding transcripts. In addition, we also observe that our ncRNA candidates constitute about one to two percent of the genes expressed in the cDNA libraries. Intriguingly, the cDNA libraries from developmental (brain) tissues contain the highest amount of ncRNA candidates, about two percent. 
These observations are related to existing knowledge and hypotheses about the role of ncRNAs in higher organisms. Furthermore, about 80% of porcine coding transcripts (of 18,600 identified) as well as less than one-third of ORF-free transcripts are conserved at least in the closely related bovine genome. Approximately one percent of the coding and 10% of the remaining matches are unique between the PigEST data and cow genome. Based on the pig-cow alignments, we searched for similarities to 16 other organisms using the alignments available from UCSC, which resulted in, for instance, an 87% coverage by the human genome. Conclusion Besides recovering several of the already annotated functional RNA structures, we predicted a large number of high confidence conserved secondary structures in polyadenylated porcine transcripts. Our observations of relatively low expression levels of predicted ncRNA candidates, together with the observations of a higher relative amount in cDNA libraries from developmental stages, are in agreement with the current paradigm of ncRNA roles in higher organisms and support the idea of polyadenylated ncRNAs.
Affiliation(s)
- Stefan E Seemann
- Division of Genetics and Bioinformatics, IBHV, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Germany
- Michael J Gilchrist
- The Wellcome Trust/Cancer Research UK Gurdon Institute, Cambridge, CB2 1QN, UK
- Ivo L Hofacker
- Institute for Theoretical Chemistry and Structural Biology, University of Vienna, Austria
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Germany
- Institute for Theoretical Chemistry and Structural Biology, University of Vienna, Austria
- Jan Gorodkin
- Division of Genetics and Bioinformatics, IBHV, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark
|
3831
|
De Santa F, Totaro MG, Prosperini E, Notarbartolo S, Testa G, Natoli G. The histone H3 lysine-27 demethylase Jmjd3 links inflammation to inhibition of polycomb-mediated gene silencing. Cell 2007; 130:1083-94. [PMID: 17825402 DOI: 10.1016/j.cell.2007.08.019] [Citation(s) in RCA: 753] [Impact Index Per Article: 41.8] [Received: 05/28/2007] [Revised: 07/09/2007] [Accepted: 08/10/2007] [Indexed: 12/13/2022]
Abstract
Epigenetic chromatin marks restrict the ability of differentiated cells to change gene expression programs in response to environmental cues and to transdifferentiate. Polycomb group (PcG) proteins mediate gene silencing and repress transdifferentiation in a manner dependent on histone H3 lysine 27 trimethylation (H3K27me3). However, macrophages migrated into inflamed tissues can transdifferentiate, but it is unknown whether inflammation alters PcG-dependent silencing. Here we show that the JmjC-domain protein Jmjd3 is a H3K27me demethylase expressed in macrophages in response to bacterial products and inflammatory cytokines. Jmjd3 binds PcG target genes and regulates their H3K27me3 levels and transcriptional activity. The discovery of an inducible enzyme that erases a histone mark controlling differentiation and cell identity provides a link between inflammation and reprogramming of the epigenome, which could be the basis for macrophage plasticity and might explain the differentiation abnormalities in chronic inflammation.
Affiliation(s)
- Francesca De Santa
- Department of Experimental Oncology, European Institute of Oncology, Campus IFOM-IEO, Via Adamello 16, 20139 Milan, Italy
|
3832
|
|
3833
|
Abstract
The laboratory mouse is widely considered the model organism of choice for studying the diseases of humans, with whom it shares 99% of its genes. A distinguished history of mouse genetic experimentation has been further advanced by the development of powerful new tools to manipulate the mouse genome. The recent launch of several international initiatives to analyse the function of all mouse genes through mutagenesis, molecular analysis and phenotyping underscores the utility of the mouse for translating the information stored in the human genome into increasingly accurate models of human disease.
Affiliation(s)
- Nadia Rosenthal
- Mouse Biology Unit, EMBL Monterotondo Outstation, via Ramarini 32, 00016, Monterotondo, Rome, Italy.
|
3834
|
Keller DM, McWeeney S, Arsenlis A, Drouin J, Wright CVE, Wang H, Wollheim CB, White P, Kaestner KH, Goodman RH. Characterization of pancreatic transcription factor Pdx-1 binding sites using promoter microarray and serial analysis of chromatin occupancy. J Biol Chem 2007; 282:32084-92. [PMID: 17761679 DOI: 10.1074/jbc.m700899200] [Citation(s) in RCA: 85] [Impact Index Per Article: 4.7] [Indexed: 11/06/2022] Open
Abstract
The homeobox transcription factor Pdx-1 is necessary for pancreas organogenesis and beta cell function; however, most Pdx-1-regulated genes are unknown. To further the understanding of Pdx-1 in beta cell biology, we have characterized its genomic targets in NIT-1 cells, a mouse insulinoma cell line. To identify novel targets, we developed a microarray that includes traditional promoters as well as non-coding conserved elements, micro-RNAs, and elements identified through an unbiased approach termed serial analysis of chromatin occupancy. In total, 583 new Pdx-1 target genes were identified, many of which contribute to energy sensing and insulin release in pancreatic beta cells. By analyzing 31 of the protein-coding Pdx-1 target genes, we show that 29 are expressed in beta cells and, of these, 68% are down- or up-regulated in cells expressing a dominant negative mutant of Pdx-1. We additionally show that many Pdx-1 targets also interact with NeuroD1/BETA2, including the micro-RNA miR-375, a known regulator of insulin secretion.
Affiliation(s)
- David M Keller
- Vollum Institute, and Division of Biostatistics, Department of Public Health and Preventative Medicine, Oregon Health & Science University, Portland, Oregon 97239, USA.
|
3835
|
Affiliation(s)
- Leonid Kruglyak
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA.
|
3836
|
Herszberg B, Mata X, Giulotto E, Decaunes P, Piras FM, Chowdhary BP, Chaffaux S, Guérin G. Characterization of the equine glycogen debranching enzyme gene (AGL): Genomic and cDNA structure, localization, polymorphism and expression. Gene 2007; 404:1-9. [PMID: 17905541 DOI: 10.1016/j.gene.2007.07.034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 06/12/2007] [Revised: 07/20/2007] [Accepted: 07/24/2007] [Indexed: 10/22/2022]
Abstract
Glycogen debranching enzyme (AGL) is a multifunctional enzyme acting in the glycogen degradation pathway. In humans, AGL activity deficiency causes type III glycogen storage disease (Cori-Forbes disease). One particularity of AGL gene expression lies in the multiple alternative splicing in its 5' region. The AGL gene was localized on ECA5q14-q15. The sequence of the equine cDNA was determined to be 7.5 kb in length with an open reading frame of 4602 bp. The gene is 69 kb long and contains 35 exons. The equine AGL gene is ubiquitously expressed and presents five tissue-dependent cDNA variants arising from alternative splicing of the first exons. The equine skeletal muscle and heart contain four out of six variants previously described in humans, and the equine liver expresses three of these four human variants. We identified a new alternative splicing variant expressed in equine skeletal and heart muscles. All these mRNA variants most probably encode only two different protein isoforms of 1533 and 1377 amino acids. Four SNPs were detected in the mRNA. The equine in silico promoter sequence reveals a structure similar to those of other mammalian species. The disposition of the transcription factor binding sites does not correlate with the transcription start sites of tissue-specific variants.
Affiliation(s)
- Bérénice Herszberg
- Institut National de la Recherche Agronomique, UR339, Centre de Recherches de Jouy, Laboratoire de Génétique biochimique et de Cytogénétique, 78350 Jouy-en-Josas, France
|
3837
|
Kullberg M, Hallström B, Arnason U, Janke A. Expressed sequence tags as a tool for phylogenetic analysis of placental mammal evolution. PLoS One 2007; 2:e775. [PMID: 17712423 PMCID: PMC1942079 DOI: 10.1371/journal.pone.0000775] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 01/23/2007] [Accepted: 07/24/2007] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND We investigate the usefulness of expressed sequence tags, ESTs, for establishing divergences within the tree of placental mammals. This is done on the example of the established relationships among primates (human), lagomorphs (rabbit), rodents (rat and mouse), artiodactyls (cow), carnivorans (dog) and proboscideans (elephant). METHODOLOGY/PRINCIPAL FINDINGS We have produced 2000 ESTs (1.2 mega bases) from a marsupial mouse and characterized the data for their use in phylogenetic analysis. The sequences were used to identify putative orthologous sequences from whole genome projects. Although most ESTs stem from single sequence reads, the frequency of potential sequencing errors was found to be lower than allelic variation. Most of the sequences represented slowly evolving housekeeping-type genes, with an average amino acid distance of 6.6% between human and mouse. Positive Darwinian selection was identified at only a few single sites. Phylogenetic analyses of the EST data yielded trees that were consistent with those established from whole genome projects. CONCLUSIONS The general quality of EST sequences and the general absence of positive selection in these sequences make ESTs an attractive tool for phylogenetic analysis. The EST approach allows, at reasonable costs, a fast extension of data sampling from species outside the genome projects.
Affiliation(s)
- Morgan Kullberg
- Department of Cell and Organism Biology, Division of Evolutionary Molecular Systematics, University of Lund, Lund, Sweden.
|
3838
|
Abstract
While less than 1.5% of the mammalian genome encodes proteins, it is now evident that the vast majority is transcribed, mainly into non-protein-coding RNAs. This raises the question of what fraction of the genome is functional, i.e., composed of sequences that yield functional products, are required for the expression (regulation or processing) of these products, or are required for chromosome replication and maintenance. Many of the observed noncoding transcripts are differentially expressed, and, while most have not yet been studied, increasing numbers are being shown to be functional and/or trafficked to specific subcellular locations, as well as exhibit subtle evidence of selection. On the other hand, analyses of conservation patterns indicate that only approximately 5% (3%-8%) of the human genome is under purifying selection for functions common to mammals. However, these estimates rely on the assumption that reference sequences (usually ancient transposon-derived sequences) have evolved neutrally, which may not be the case, and if so would lead to an underestimate of the fraction of the genome under evolutionary constraint. These analyses also do not detect functional sequences that are evolving rapidly and/or have acquired lineage-specific functions. Indeed, many regulatory sequences and known functional noncoding RNAs, including many microRNAs, are not conserved over significant evolutionary distances, and recent evidence from the ENCODE project suggests that many functional elements show no detectable level of sequence constraint. Thus, it is likely that much more than 5% of the genome encodes functional information, and although the upper bound is unknown, it may be considerably higher than currently thought.
Affiliation(s)
- Michael Pheasant
- ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland 4072, Australia
|
3839
|
Royce TE, Rozowsky JS, Gerstein MB. Toward a universal microarray: prediction of gene expression through nearest-neighbor probe sequence identification. Nucleic Acids Res 2007; 35:e99. [PMID: 17686789 PMCID: PMC1976448 DOI: 10.1093/nar/gkm549] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Indexed: 12/02/2022] Open
Abstract
A generic DNA microarray design applicable to any species would greatly benefit comparative genomics. We have addressed the feasibility of such a design by leveraging the great feature densities and relatively unbiased nature of genomic tiling microarrays. Specifically, we first divided each Homo sapiens Refseq-derived gene's spliced nucleotide sequence into all of its possible contiguous 25 nt subsequences. For each of these 25 nt subsequences, we searched a recent human transcript mapping experiment's probe design for the 25 nt probe sequence having the fewest mismatches with the subsequence, but that did not match the subsequence exactly. Signal intensities measured with each gene's nearest-neighbor features were subsequently averaged to predict their gene expression levels in each of the experiment's thirty-three hybridizations. We examined the fidelity of this approach in terms of both sensitivity and specificity for detecting actively transcribed genes, for transcriptional consistency between exons of the same gene, and for reproducibility between tiling array designs. Taken together, our results provide proof-of-principle for probing nucleic acid targets with off-target, nearest-neighbor features.
Affiliation(s)
- Thomas E Royce
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, USA
|
3840
|
Lindsey-Boltz LA, Sancar A. RNA polymerase: the most specific damage recognition protein in cellular responses to DNA damage? Proc Natl Acad Sci U S A 2007; 104:13213-4. [PMID: 17684092 PMCID: PMC1948916 DOI: 10.1073/pnas.0706316104] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022] Open
Affiliation(s)
- Laura A. Lindsey-Boltz
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599-7260
- *To whom correspondence may be addressed.
- Aziz Sancar
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599-7260
- *To whom correspondence may be addressed.
|
3841
|
Koch CM, Andrews RM, Flicek P, Dillon SC, Karaöz U, Clelland GK, Wilcox S, Beare DM, Fowler JC, Couttet P, James KD, Lefebvre GC, Bruce AW, Dovey OM, Ellis PD, Dhami P, Langford CF, Weng Z, Birney E, Carter NP, Vetrie D, Dunham I. The landscape of histone modifications across 1% of the human genome in five human cell lines. Genome Res 2007; 17:691-707. [PMID: 17567990 PMCID: PMC1891331 DOI: 10.1101/gr.5704207] [Citation(s) in RCA: 318] [Impact Index Per Article: 17.7] [Indexed: 01/26/2023]
Abstract
We generated high-resolution maps of histone H3 lysine 9/14 acetylation (H3ac), histone H4 lysine 5/8/12/16 acetylation (H4ac), and histone H3 at lysine 4 mono-, di-, and trimethylation (H3K4me1, H3K4me2, H3K4me3, respectively) across the ENCODE regions. Studying each modification in five human cell lines including the ENCODE Consortium common cell lines GM06990 (lymphoblastoid) and HeLa-S3, as well as K562, HFL-1, and MOLT4, we identified clear patterns of histone modification profiles with respect to genomic features. H3K4me3, H3K4me2, and H3ac modifications are tightly associated with the transcriptional start sites (TSSs) of genes, while H3K4me1 and H4ac have more widespread distributions. TSSs reveal characteristic patterns of both types of modification present and the position relative to TSSs. These patterns differ between active and inactive genes and in particular the state of H3K4me3 and H3ac modifications is highly predictive of gene activity. Away from TSSs, modification sites are enriched in H3K4me1 and relatively depleted in H3K4me3 and H3ac. Comparison between cell lines identified differences in the histone modification profiles associated with transcriptional differences between the cell lines. These results provide an overview of the functional relationship among histone modifications and gene expression in human cells.
Affiliation(s)
- Christoph M. Koch
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Robert M. Andrews
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Paul Flicek
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Shane C. Dillon
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Ulaş Karaöz
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
| | - Gayle K. Clelland
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Sarah Wilcox
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - David M. Beare
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Joanna C. Fowler
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Phillippe Couttet
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Keith D. James
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Gregory C. Lefebvre
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Alexander W. Bruce
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Oliver M. Dovey
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Peter D. Ellis
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Pawandeep Dhami
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Cordelia F. Langford
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Zhiping Weng
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
- Biomedical Engineering Department, Boston University, Boston, Massachusetts 02215, USA
| | - Ewan Birney
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Nigel P. Carter
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - David Vetrie
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Ian Dunham
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Corresponding author; fax 44 1223 494919
|
3842
|
Euskirchen GM, Rozowsky JS, Wei CL, Lee WH, Zhang ZD, Hartman S, Emanuelsson O, Stolc V, Weissman S, Gerstein MB, Ruan Y, Snyder M. Mapping of transcription factor binding regions in mammalian cells by ChIP: comparison of array- and sequencing-based technologies. Genome Res 2007; 17:898-909. [PMID: 17568005 PMCID: PMC1891348 DOI: 10.1101/gr.5583007] [Citation(s) in RCA: 162] [Impact Index Per Article: 9.0] [Indexed: 11/24/2022]
Abstract
Recent progress in mapping transcription factor (TF) binding regions can largely be credited to chromatin immunoprecipitation (ChIP) technologies. We compared strategies for mapping TF binding regions in mammalian cells using two different ChIP schemes: ChIP with DNA microarray analysis (ChIP-chip) and ChIP with DNA sequencing (ChIP-PET). We first investigated parameters central to obtaining robust ChIP-chip data sets by analyzing STAT1 targets in the ENCODE regions of the human genome, and then compared ChIP-chip to ChIP-PET. We devised methods for scoring and comparing results among various tiling arrays and examined parameters such as DNA microarray format, oligonucleotide length, hybridization conditions, and the use of competitor Cot-1 DNA. The best performance was achieved with high-density oligonucleotide arrays, oligonucleotides ≥50 bases (b), the presence of competitor Cot-1 DNA and hybridizations conducted in microfluidics stations. When target identification was evaluated as a function of array number, 80%-86% of targets were identified with three or more arrays. Comparison of ChIP-chip with ChIP-PET revealed strong agreement for the highest ranked targets with less overlap for the low ranked targets. With advantages and disadvantages unique to each approach, we found that ChIP-chip and ChIP-PET are frequently complementary in their relative abilities to detect STAT1 targets for the lower ranked targets; each method detected validated targets that were missed by the other method. The most comprehensive list of STAT1 binding regions is obtained by merging results from ChIP-chip and ChIP-sequencing. Overall, this study provides information for robust identification, scoring, and validation of TF targets using ChIP-based technologies.
Affiliation(s)
- Ghia M. Euskirchen
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8103, USA
| | - Joel S. Rozowsky
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
| | | | | | - Zhengdong D. Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
| | - Stephen Hartman
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8103, USA
| | - Olof Emanuelsson
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
| | - Viktor Stolc
- Center for Nanotechnology, NASA Ames Research Center, Moffett Field, California 94035, USA
| | - Sherman Weissman
- Department of Genetics, Yale University School of Medicine, New Haven, Connecticut 06520-8005, USA
| | - Mark B. Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
| | - Yijun Ruan
- Genome Institute of Singapore, Singapore 138672
| | - Michael Snyder
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8103, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
- Corresponding author; fax (203) 432-6161
|
3843
|
Xi H, Yu Y, Fu Y, Foley J, Halees A, Weng Z. Analysis of overrepresented motifs in human core promoters reveals dual regulatory roles of YY1. Genome Res 2007; 17:798-806. [PMID: 17567998 PMCID: PMC1891339 DOI: 10.1101/gr.5754707] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.1] [Indexed: 11/25/2022]
Abstract
A set of 723 high-quality human core promoter sequences was compiled and analyzed for overrepresented motifs. Besides the two well-characterized core promoter motifs (TATA and Inr), several known motifs (YY1, Sp1, NRF-1, NRF-2, CAAT, and CREB) and one potentially new motif (motif8) were found. Interestingly, YY1 and motif8 mostly reside immediately downstream from the TSS. In particular, the YY1 motif occurs primarily in genes with 5'-UTRs shorter than 40 base pairs (bp) and its locations coincide with the translation start site. We verified that the YY1 motif is bound by YY1 in vitro. We then performed detailed analysis on YY1 chromatin immunoprecipitation data with a whole-genome human promoter microarray (ChIP-chip) and revealed that the thus identified promoters in HeLa cells were highly enriched with the YY1 motif. Moreover, the motif overlapped with the translation start sites on the plus strand of a group of genes, many with short 5'-UTRs, and with the transcription start sites on the minus strand of another distinct group of genes; together, the two groups of genes accounted for the majority of the YY1-bound promoters in the ChIP-chip data. Furthermore, the first group of genes was highly enriched in the functional categories of ribosomal proteins and nuclear-encoded mitochondrial proteins. We suggest that the YY1 motif plays a dual role in both transcription and translation initiation of these genes. We also discuss the evolutionary advantages of housing a transcriptional element inside the transcript in terms of the migration of these genes in the human genome.
Affiliation(s)
- Hualin Xi
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
| | - Yong Yu
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
| | - Yutao Fu
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
| | - Jonathan Foley
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02215, USA
| | - Anason Halees
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
| | - Zhiping Weng
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02215, USA
- Corresponding author; fax (617) 353-6766
|
3844
|
Zadissa A, McEwan JC, Brown CM. Inference of transcriptional regulation using gene expression data from the bovine and human genomes. BMC Genomics 2007; 8:265. [PMID: 17683551 PMCID: PMC1978505 DOI: 10.1186/1471-2164-8-265] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 08/29/2006] [Accepted: 08/03/2007] [Indexed: 01/12/2023] Open
Abstract
Background: Gene expression is in part regulated by sequences in promoters that bind transcription factors. Thus, co-expressed genes may have shared sequence motifs representing putative transcription factor binding sites (TFBSs). However, for agriculturally important animals the genomic sequence is often incomplete. The more complete human genome may be usable for this prediction by taking advantage of the expected evolutionary conservation in TFBSs between the species. Results: A method of de novo TFBS prediction based on MEME was implemented, tested, and validated on a muscle-specific dataset. Muscle-specific expression data from EST library analysis in cattle were used to predict sets of genes whose expression was enriched in muscle and cardiac tissues. The upstream 1500 bases from calculated orthologous genes were extracted from the human reference set. A set of common motifs was discovered in these promoters. Slightly over one third of these motifs were identified as known TFBSs, including known muscle-specific binding sites. This analysis also predicted several highly significantly overrepresented sites that may be novel TFBSs. An independent analysis of the equivalent bovine genomic sequences was also performed; this gave less detailed results than the human analysis owing to the quality of both orthologue prediction and assembly in promoter regions. However, the most common motifs could be detected in both sets. Conclusion: Using promoter sequences from human genes is a useful approach when studying gene expression in species with limited or non-existent genomic sequence. As the bovine genome becomes better annotated it can in turn serve as the reference genome for other agriculturally important ruminants, such as sheep, goat and deer.
Affiliation(s)
- Amonida Zadissa
- Biochemistry Department, University of Otago, PO Box 56, Dunedin, New Zealand.
|
3845
|
Trinklein ND, Karaöz U, Wu J, Halees A, Force Aldred S, Collins PJ, Zheng D, Zhang ZD, Gerstein MB, Snyder M, Myers RM, Weng Z. Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome. Genome Res 2007; 17:720-31. [PMID: 17567992 PMCID: PMC1891333 DOI: 10.1101/gr.5716607] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 11/24/2022]
Abstract
The regulation of transcriptional initiation in the human genome is a critical component of global gene regulation, but a complete catalog of human promoters currently does not exist. In order to identify regulatory regions, we developed four computational methods to integrate 129 sets of ENCODE-wide chromatin immunoprecipitation data. They collectively predicted 1393 regions. Roughly 47% of the regions were unique to one method, as each method makes different assumptions about the data. Overall, predicted regions tend to localize to highly conserved, DNase I hypersensitive, and actively transcribed regions in the genome. Interestingly, a significant portion of the regions overlaps with annotated 3'-UTRs, suggesting that some of them might regulate anti-sense transcription. The majority of the predicted regions are >2 kb away from the 5'-ends of previously annotated human cDNAs and hence are novel. These novel regions may regulate unannotated transcripts or may represent new alternative transcription start sites of known genes. We tested 163 such regions for promoter activity in four cell lines using transient transfection assays, and 25% of them showed transcriptional activity above background in at least one cell line. We also performed 5'-RACE experiments on 62 novel regions, and 76% of the regions were associated with the 5'-ends of at least two RACE products. Our results suggest that there are at least 35% more functional promoters in the human genome than currently annotated.
Affiliation(s)
- Nathan D. Trinklein
- Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Ulaş Karaöz
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
| | - Jiaqian Wu
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520, USA
| | - Anason Halees
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
| | - Shelley Force Aldred
- Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Patrick J. Collins
- Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
| | - Deyou Zheng
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
| | - Zhengdong D. Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
| | - Mark B. Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
| | - Michael Snyder
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
| | - Richard M. Myers
- Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
- Corresponding authors; fax (617) 353-6766, fax (650) 725-9689
| | - Zhiping Weng
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
- Biomedical Engineering Department, Boston University, Boston, Massachusetts 02215, USA
- Corresponding authors; fax (617) 353-6766, fax (650) 725-9689
|
3846
|
Zhang ZD, Paccanaro A, Fu Y, Weissman S, Weng Z, Chang J, Snyder M, Gerstein MB. Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions. Genome Res 2007; 17:787-97. [PMID: 17567997 PMCID: PMC1891338 DOI: 10.1101/gr.5573107] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Indexed: 11/24/2022]
Abstract
The comprehensive inventory of functional elements in 44 human genomic regions carried out by the ENCODE Project Consortium enables for the first time a global analysis of the genomic distribution of transcriptional regulatory elements. In this study we developed an intuitive and yet powerful approach to analyze the distribution of regulatory elements found in many different ChIP-chip experiments on a 10–100-kb scale. First, we focus on the overall chromosomal distribution of regulatory elements in the ENCODE regions and show that it is highly nonuniform. We demonstrate, in fact, that regulatory elements are associated with the location of known genes. Further examination on a local, single-gene scale shows an enrichment of regulatory elements near both transcription start and end sites. Our results indicate that overall these elements are clustered into regulatory rich "islands" and poor "deserts." Next, we examine how consistent the nonuniform distribution is between different transcription factors. We perform on all the factors a multivariate analysis in the framework of a biplot, which enhances biological signals in the experiments. This groups transcription factors into sequence-specific and sequence-nonspecific clusters. Moreover, with experimental variation carefully controlled, detailed correlations show that the distribution of sites was generally reproducible for a specific factor between different laboratories and microarray platforms. Data sets associated with histone modifications have particularly strong correlations. Finally, we show how the correlations between factors change when only regulatory elements far from the transcription start sites are considered.
Affiliation(s)
- Zhengdong D. Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
| | - Alberto Paccanaro
- Department of Computer Science Royal Holloway, University of London, Egham Hill, TW20 0EX, United Kingdom
| | - Yutao Fu
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
| | - Sherman Weissman
- Department of Genetics, Yale University, New Haven, Connecticut 06510, USA
| | - Zhiping Weng
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
- Biomedical Engineering Department, Boston University, Boston, Massachusetts 02215, USA
| | - Joseph Chang
- Department of Statistics, Yale University, New Haven, Connecticut 06520, USA
| | - Michael Snyder
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, Connecticut 06520, USA
| | - Mark B. Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Program in Computational Biology and Bioinformatics Yale University, New Haven, Connecticut 06520, USA
- Corresponding author; fax (360) 838-7861
|
3847
|
Ruan Y, Ooi HS, Choo SW, Chiu KP, Zhao XD, Srinivasan K, Yao F, Choo CY, Liu J, Ariyaratne P, Bin WG, Kuznetsov VA, Shahab A, Sung WK, Bourque G, Palanisamy N, Wei CL. Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res 2007; 17:828-38. [PMID: 17568001 PMCID: PMC1891342 DOI: 10.1101/gr.6018607] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Indexed: 11/25/2022]
Abstract
Identification of unconventional functional features such as fusion transcripts is a challenging task in the effort to annotate all functional DNA elements in the human genome. Paired-End diTag (PET) analysis possesses a unique capability to accurately and efficiently characterize the two ends of DNA fragments, which may have either normal or unusual compositions. This unique nature of PET analysis makes it an ideal tool for uncovering unconventional features residing in the human genome. Using the PET approach for comprehensive transcriptome analysis, we were able to identify fusion transcripts derived from genome rearrangements and actively expressed retrotransposed pseudogenes, which would be difficult to capture by other means. Here, we demonstrate this unique capability through the analysis of 865,000 individual transcripts in two types of cancer cells. In addition to the characterization of a large number of differentially expressed alternative 5' and 3' transcript variants and novel transcriptional units, we identified 70 fusion transcript candidates in this study. One was validated as the product of a fusion gene between BCAS4 and BCAS3 resulting from an amplification followed by a translocation event between the two loci, chr20q13 and chr17q23. Through an examination of PETs that mapped to multiple genomic locations, we identified 4055 retrotransposed loci in the human genome, of which at least three were found to be transcriptionally active. The PET mapping strategy presented here promises to be a useful tool in annotating the human genome, especially aberrations in human cancer genomes.
Affiliation(s)
- Yijun Ruan
- Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
- Corresponding authors; fax 65-64789059
| | - Hong Sain Ooi
- Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
| | - Siew Woh Choo
- Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
| | - Kuo Ping Chiu
- Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
| | - Xiao Dong Zhao
- Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
| | - K.G. Srinivasan
- Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
| | - Fei Yao
- Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
| | - Chiou Yu Choo
- Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
| | - Jun Liu
- Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
| | - Pramila Ariyaratne
- Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
| | - Wilson G.W. Bin
- Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
| | - Vladimir A. Kuznetsov
- Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
| | - Atif Shahab
- Bioinformatics Institute, Singapore 138671, Singapore
| | - Wing-Kin Sung
- Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
- School of Computing, National University of Singapore, Singapore 117543, Singapore
| | - Guillaume Bourque
- Information and Mathematical Science Group, Genome Institute of Singapore, Singapore 138672, Singapore
| | | | - Chia-Lin Wei
- Genome Technology and Biology Group, Genome Institute of Singapore, Singapore 138672, Singapore
- Corresponding authors; fax 65-64789059
|
3848
|
Rozowsky JS, Newburger D, Sayward F, Wu J, Jordan G, Korbel JO, Nagalakshmi U, Yang J, Zheng D, Guigó R, Gingeras TR, Weissman S, Miller P, Snyder M, Gerstein MB. The DART classification of unannotated transcription within the ENCODE regions: associating transcription with known and novel loci. Genome Res 2007; 17:732-45. [PMID: 17567993 PMCID: PMC1891334 DOI: 10.1101/gr.5696007] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/25/2022]
Abstract
For the approximately 1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of "unannotated transcription." We use a number of disparate features to classify the 6988 novel TARs: array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that approximately 14% of the novel TARs can be associated with known genes, while approximately 21% can be clustered into approximately 200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.
Affiliation(s)
- Joel S. Rozowsky
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, Connecticut 06520-8114, USA
- Corresponding authors; fax (203) 432-5175, fax (360) 838-7861
| | - Daniel Newburger
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, Connecticut 06520-8114, USA
| | - Fred Sayward
- Center for Medical Informatics, Yale University, New Haven, Connecticut 06520-8009, USA
| | - Jiaqian Wu
- Molecular, Cellular, and Developmental Biology Department, Yale University, New Haven, Connecticut 06520, USA
| | - Greg Jordan
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, Connecticut 06520-8114, USA
| | - Jan O. Korbel
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, Connecticut 06520-8114, USA
| | - Ugrappa Nagalakshmi
- Molecular, Cellular, and Developmental Biology Department, Yale University, New Haven, Connecticut 06520, USA
| | - Jin Yang
- Center for Medical Informatics, Yale University, New Haven, Connecticut 06520-8009, USA
| | - Deyou Zheng
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, Connecticut 06520-8114, USA
| | - Roderic Guigó
- Grup de Recerca en Informática Biomèdica, Institut Municipal d’Investigació Mèdica/Universitat Pompeu Fabra, 37-49, 08003, Barcelona, Catalonia, Spain
| | | | - Sherman Weissman
- Department of Genetics, Yale University, New Haven, Connecticut 06520, USA
| | - Perry Miller
- Center for Medical Informatics, Yale University, New Haven, Connecticut 06520-8009, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
| | - Michael Snyder
- Molecular, Cellular, and Developmental Biology Department, Yale University, New Haven, Connecticut 06520, USA
| | - Mark B. Gerstein
- Molecular Biophysics and Biochemistry Department, Yale University, New Haven, Connecticut 06520-8114, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Corresponding authors; fax (203) 432-5175, fax (360) 838-7861
|
3849
|
Margulies EH, Cooper GM, Asimenos G, Thomas DJ, Dewey CN, Siepel A, Birney E, Keefe D, Schwartz AS, Hou M, Taylor J, Nikolaev S, Montoya-Burgos JI, Löytynoja A, Whelan S, Pardi F, Massingham T, Brown JB, Bickel P, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Stone EA, Rosenbloom KR, Kent WJ, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro VVB, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Hinrichs A, Trumbower H, Clawson H, Zweig A, Kuhn RM, Barber G, Harte R, Karolchik D, Field MA, Moore RA, Matthewson CA, Schein JE, Marra MA, Antonarakis SE, Batzoglou S, Goldman N, Hardison R, Haussler D, Miller W, Pachter L, Green ED, Sidow A. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res 2007; 17:760-74. [PMID: 17567995 PMCID: PMC1891336 DOI: 10.1101/gr.6034307] [Citation(s) in RCA: 149] [Impact Index Per Article: 8.3] [Indexed: 01/10/2023]
Abstract
A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.
Affiliation(s)
- Elliott H Margulies
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
|
3850
|
King DC, Taylor J, Zhang Y, Cheng Y, Lawson HA, Martin J, ENCODE groups for Transcriptional Regulation and Multispecies Sequence Analysis, Chiaromonte F, Miller W, Hardison RC. Finding cis-regulatory elements using comparative genomics: some lessons from ENCODE data. Genome Res 2007; 17:775-86. [PMID: 17567996 PMCID: PMC1891337 DOI: 10.1101/gr.5592107] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.2] [Indexed: 01/31/2023]
Abstract
Identification of functional genomic regions using interspecies comparison will be most effective when the full span of relationships between genomic function and evolutionary constraint are utilized. We find that sets of putative transcriptional regulatory sequences, defined by ENCODE experimental data, have a wide span of evolutionary histories, ranging from stringent constraint shown by deep phylogenetic comparisons to recent selection on lineage-specific elements. This diversity of evolutionary histories can be captured, at least in part, by the suite of available comparative genomics tools, especially after correction for regional differences in the neutral substitution rate. Putative transcriptional regulatory regions show alignability in different clades, and the genes associated with them are enriched for distinct functions. Some of the putative regulatory regions show evidence for recent selection, including a primate-specific, distal promoter that may play a novel role in regulation.
Affiliation(s)
- David C. King
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- James Taylor
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Ying Zhang
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Yong Cheng
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Heather A. Lawson
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Anthropology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Joel Martin
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Francesca Chiaromonte
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Webb Miller
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Ross C. Hardison
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Corresponding author. E-mail ; fax (814) 863-7024
|