3751
Abstract
The increasing use of gene expression microarrays, and depositing of the resulting data into public repositories, means that more investigators are interested in using the technology either directly or through meta analysis of the publicly available data. The tools available for data analysis have generally been developed for use by experts in the field, making them difficult to use by the general research community. For those interested in entering the field, especially those without a background in statistics, it is difficult to understand why experimental results can be so variable. The purpose of this review is to go through the workflow of a typical microarray experiment, to show that decisions made at each step, from choice of platform through statistical analysis methods to biological interpretation, are all sources of this variability.
3752
Mangan ME, Williams JM, Lathe SM, Karolchik D, Lathe WC. UCSC genome browser: deep support for molecular biomedical research. Biotechnology Annual Review 2008; 14:63-108. [PMID: 18606360 DOI: 10.1016/s1387-2656(08)00003-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023]
Abstract
The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework to display many other types of data such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources such as images of in situ localizations. Support for solving researcher's issues is provided with active discussion mailing lists and by providing updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).
3753
Dermitzakis ET. Regulatory variation and evolution: implications for disease. Advances in Genetics 2008; 61:295-306. [PMID: 18282511 DOI: 10.1016/s0065-2660(07)00011-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/17/2022]
Abstract
In the past few years, it has become apparent that there is a substantial amount of noncoding DNA that contributes to genome function. However, the multidimensionality of noncoding DNA properties does not allow us to readily identify, characterize, and assess the functional impact of mutations, polymorphisms, and interspecific substitutions. In this chapter, we discuss the evolutionary properties of some of the known noncoding genomic elements, namely regulatory regions, and the extensions of this to other potentially functionally important noncoding regions such as conserved noncoding regions. The implications of this analysis for studies looking at molecular phenotypes such as gene expression and whole-organism phenotypes (e.g., disease) are presented in the context of the exploration of noncoding DNA properties. The aim is to take advantage of current and emerging analysis methods for noncoding DNA to elucidate the genetic causes of phenotypic variation.
Affiliation(s)
- Emmanouil T Dermitzakis
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1SA Cambridge, United Kingdom
3754
Affiliation(s)
- Jamison D Feramisco
- Department of Internal Medicine, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, USA
3755
Higgs DR, Vernimmen D, Wood B. Long-range regulation of alpha-globin gene expression. Advances in Genetics 2008; 61:143-73. [PMID: 18282505 DOI: 10.1016/s0065-2660(07)00005-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Indexed: 12/12/2022]
Abstract
Over the past 20 years, there has been an increasing awareness that gene expression can be regulated by multiple cis-acting sequences located at considerable distances (10-1000 kb) from the genes they control. Detailed investigation of a few specialized mammalian genes, including the genes controlling the synthesis of hemoglobin, provide important models to understand how such long-range regulatory elements act. In general, these elements contain a high density of evolutionarily conserved, transcription factor-binding sites and in many ways resemble the upstream regulatory elements found adjacent to the promoters of genes in simpler organisms, differing only in the distance over which they act. We have investigated in detail how the remote regulatory elements of the alpha-globin cluster become activated as hematopoietic stem cells (HSCs) undergo commitment, lineage specification, and differentiation to form red blood cells. In turn, we have addressed how, during this process, the upstream elements control the correct spatial and temporal expression from the alpha-gene promoter which lies approximately 60 kb downstream of these elements. At present too few loci have been studied to determine whether there are general principles underlying long-range regulation but some common themes are emerging.
Affiliation(s)
- Douglas R Higgs
- MRC Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford OX3 9DS, United Kingdom
3756
Kimura H, Hayashi-Takanaka Y, Goto Y, Takizawa N, Nozaki N. The Organization of Histone H3 Modifications as Revealed by a Panel of Specific Monoclonal Antibodies. Cell Struct Funct 2008; 33:61-73. [DOI: 10.1247/csf.07035] [Citation(s) in RCA: 246] [Impact Index Per Article: 14.5] [Indexed: 11/11/2022] Open
Affiliation(s)
- Hiroshi Kimura
- Nuclear Function and Dynamics Unit, HMRO, Graduate School of Medicine, Kyoto University
- Cell Biology Group, Kansai Advanced Research Center, National Institute of Information and Communications Technology
- Graduate School of Frontier Biosciences, Osaka University
- Yoko Hayashi-Takanaka
- Nuclear Function and Dynamics Unit, HMRO, Graduate School of Medicine, Kyoto University
- Cell Biology Group, Kansai Advanced Research Center, National Institute of Information and Communications Technology
- Yuji Goto
- Nuclear Function and Dynamics Unit, HMRO, Graduate School of Medicine, Kyoto University
- Present address: College of Life and Health Sciences, Chubu University
- Nanako Takizawa
- Nuclear Function and Dynamics Unit, HMRO, Graduate School of Medicine, Kyoto University
3757
Mathew CG. New links to the pathogenesis of Crohn disease provided by genome-wide association scans. Nat Rev Genet 2008; 9:9-14. [PMID: 17968351 DOI: 10.1038/nrg2203] [Citation(s) in RCA: 148] [Impact Index Per Article: 8.7] [Indexed: 12/19/2022]
Abstract
Genome-wide association scans (GWAS) using large case-control samples and several hundred thousand genetic markers have uncovered at least ten new genomic regions associated with susceptibility to Crohn disease, a chronic inflammatory bowel disorder. The new loci include genes with diverse roles in the immune response and several gene deserts, which may contain regulatory sequences or encode novel functional transcripts. The results so far suggest that genome scans may re-define our ideas on the nature of causal variants in complex disease.
Affiliation(s)
- Christopher G Mathew
- Division of Genetics and Molecular Medicine, King's College London School of Medicine, 8th Floor Guy's Tower, Guy's Hospital, London SE1 9RT, UK.
3758
Abstract
Traits related to energy balance and obesity are exceptionally complex, with varying contributions of genetic susceptibility and interacting environmental factors. The use of mouse models has been a powerful driving force in understanding the genetic architecture of polygenic traits such as obesity. However, the use of mouse models for analysis of complex traits is at an important crossroad. Genome-wide association studies in humans are now leading to direct identification of obesity genes. In this review, we focus on three areas representing the current and future roles of mouse models regarding genetics of complex obesity. First, we summarize increasingly powerful ways to harness the strength of mouse models for discovery of genes affecting polygenic obesity. Second, we examine the status of using a systems biology approach to dissect the genetic architecture of obesity. And third, we explore the effects of recent findings indicating increasing levels of complexity in the nature of variation underlying, and the heritability of, complex traits such as obesity.
Affiliation(s)
- Daniel Pomp
- Department of Nutrition, Carolina Center for Genome Science, University of North Carolina, Chapel Hill, North Carolina 27599, USA.
3759
Huppert JL. Thermodynamic prediction of RNA–DNA duplex-forming regions in the human genome. Molecular BioSystems 2008; 4:686-91. [DOI: 10.1039/b800354h] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 12/27/2022]
3760
Frith MC, Valen E, Krogh A, Hayashizaki Y, Carninci P, Sandelin A. A code for transcription initiation in mammalian genomes. Genome Res 2008; 18:1-12. [PMID: 18032727 PMCID: PMC2134772 DOI: 10.1101/gr.6831208] [Citation(s) in RCA: 196] [Impact Index Per Article: 11.5] [Received: 01/21/2007] [Accepted: 10/14/2007] [Indexed: 11/24/2022]
Abstract
Genome-wide detection of transcription start sites (TSSs) has revealed that RNA Polymerase II transcription initiates at millions of positions in mammalian genomes. Most core promoters do not have a single TSS, but an array of closely located TSSs with different rates of initiation. As a rule, genes have more than one such core promoter; however, defining the boundaries between core promoters is not trivial. These discoveries prompt a re-evaluation of our models for transcription initiation. We describe a new framework for understanding the organization of transcription initiation. We show that initiation events are clustered on the chromosomes at multiple scales (clusters within clusters), indicating multiple regulatory processes. Within the smallest of such clusters, which can be interpreted as core promoters, the local DNA sequence predicts the relative transcription start usage of each nucleotide with a remarkable 91% accuracy, implying the existence of a DNA code that determines TSS selection. Conversely, the total expression strength of such clusters is only partially determined by the local DNA sequence. Thus, the overall control of transcription can be understood as a combination of large- and small-scale effects; the selection of transcription start sites is largely governed by the local DNA sequence, whereas the transcriptional activity of a locus is regulated at a different level; it is affected by distal features or events such as enhancers and chromatin remodeling.
Affiliation(s)
- Martin C. Frith
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- ARC Centre in Bioinformatics, Institute for Molecular Bioscience, University of Queensland, Brisbane, Qld 4072, Australia
- Eivind Valen
- The Bioinformatics Centre, Department of Molecular Biology & Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 København N, Denmark
- Anders Krogh
- The Bioinformatics Centre, Department of Molecular Biology & Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 København N, Denmark
- Yoshihide Hayashizaki
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Piero Carninci
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Albin Sandelin
- The Bioinformatics Centre, Department of Molecular Biology & Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 København N, Denmark
3761
Tress ML, Wesselink JJ, Frankish A, López G, Goldman N, Löytynoja A, Massingham T, Pardi F, Whelan S, Harrow J, Valencia A. Determination and validation of principal gene products. Bioinformatics 2008; 24:11-7. [PMID: 18006548 PMCID: PMC2734078 DOI: 10.1093/bioinformatics/btm547] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION: Alternative splicing has the potential to generate a wide range of protein isoforms. For many computational applications and for experimental research, it is important to be able to concentrate on the isoform that retains the core biological function. For many genes this is far from clear. RESULTS: We have combined five methods into a pipeline that allows us to detect the principal variant for a gene. Most of the methods were based on conservation between species, at the level of both gene and protein. The five methods used were the conservation of exonic structure, the detection of non-neutral evolution, the conservation of functional residues, the existence of a known protein structure and the abundance of vertebrate orthologues. The pipeline was able to determine a principal isoform for 83% of a set of well-annotated genes with multiple variants.
Affiliation(s)
- Michael L Tress
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid, Spain.
3762
3763
Nebert DW, Zhang G, Vesell ES. From human genetics and genomics to pharmacogenetics and pharmacogenomics: past lessons, future directions. Drug Metab Rev 2008; 40:187-224. [PMID: 18464043 PMCID: PMC2752627 DOI: 10.1080/03602530801952864] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.1] [Indexed: 02/06/2023]
Abstract
A brief history of human genetics and genomics is provided, comparing recent progress in those fields with that in pharmacogenetics and pharmacogenomics, which are subsets of genetics and genomics, respectively. Sequencing of the entire human genome, the mapping of common haplotypes of single-nucleotide polymorphisms (SNPs), and cost-effective genotyping technologies leading to genome-wide association (GWA) studies - have combined convincingly in the past several years to demonstrate the requirements needed to separate true associations from the plethora of false positives. While research in human genetics has moved from monogenic to oligogenic to complex diseases, its pharmacogenetics branch has followed, usually a few years behind. The continuous discoveries, even today, of new surprises about our genome cause us to question reviews declaring that "personalized medicine is almost here" or that "individualized drug therapy will soon be a reality." As summarized herein, numerous reasons exist to show that an "unequivocal genotype" or even an "unequivocal phenotype" is virtually impossible to achieve in current limited-size studies of human populations. This problem (of insufficiently stringent criteria) leads to a decrease in statistical power and, consequently, equivocal interpretation of most genotype-phenotype association studies. It remains unclear whether personalized medicine or individualized drug therapy will ever be achievable by means of DNA testing alone.
Affiliation(s)
- Daniel W Nebert
- Division of Human Genetics, Department of Pediatrics & Molecular Developmental Biology, Cincinnati, Ohio 45267-0056, USA.
3764
Graveley BR. The haplo-spliceo-transcriptome: common variations in alternative splicing in the human population. Trends Genet 2008; 24:5-7. [PMID: 18054116 PMCID: PMC2372159 DOI: 10.1016/j.tig.2007.10.004] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 09/11/2007] [Revised: 10/19/2007] [Accepted: 10/19/2007] [Indexed: 10/22/2022]
Abstract
Numerous inherited human genetic disorders are caused by defects in pre-mRNA splicing. Two recent studies have added a new twist to the link between genetic variation and pre-mRNA splicing by identifying SNPs that correlate with heritable changes in alternative splicing but do not cause disease. This suggests that allele-specific alternative splicing is a mechanism that accounts for individual variation in the human population.
Affiliation(s)
- Brenton R Graveley
- Department of Genetics and Developmental Biology, University of Connecticut Health Center, Farmington, CT 06030-3301, USA.
3765
Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F, Kober KM, Miller W, Pedersen JS, Pohl A, Raney BJ, Rhead B, Rosenbloom KR, Smith KE, Stanke M, Thakkapallayil A, Trumbower H, Wang T, Zweig AS, Haussler D, Kent WJ. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 2008; 36:D773-9. [PMID: 18086701 PMCID: PMC2238835 DOI: 10.1093/nar/gkm966] [Citation(s) in RCA: 405] [Impact Index Per Article: 23.8] [Received: 09/18/2007] [Revised: 10/17/2007] [Accepted: 10/17/2007] [Indexed: 01/06/2023] Open
Abstract
The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrate and 21 invertebrate species as of September 2007. For each assembly, the GBD contains a collection of annotation data aligned to the genomic sequence. Highlights of this year's additions include a 28-species human-based vertebrate conservation annotation, an enhanced UCSC Genes set, and more human variation, MGC, and ENCODE data. The database is optimized for fast interactive performance with a set of web-based tools that may be used to view, manipulate, filter and download the annotation data. New toolset features include the Genome Graphs tool for displaying genome-wide data sets, session saving and sharing, better custom track management, expanded Genome Browser configuration options and a Genome Browser wiki site. The downloadable GBD data, the companion Genome Browser toolset and links to documentation and related information can be found at: http://genome.ucsc.edu/.
Affiliation(s)
- D Karolchik
- Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA.
3766
Abstract
OBJECTIVE: To assess the evidence for a genetic basis to magic. DESIGN: Literature review. SETTING: Harry Potter novels of J K Rowling. PARTICIPANTS: Muggles, witches, wizards, and squibs. INTERVENTIONS: Limited. MAIN OUTCOME MEASURES: Family and twin studies, magical ability, and specific magical skills. RESULTS: Magic shows strong evidence of heritability, with familial aggregation and concordance in twins. Evidence suggests magical ability to be a quantitative trait. Specific magical skills, notably being able to speak to snakes, predict the future, and change hair colour, all seem heritable. CONCLUSIONS: A multilocus model with a dominant gene for magic might exist, controlled epistatically by one or more loci, possibly recessive in nature. Magical enhancers regulating gene expression may be involved, combined with mutations at specific genes implicated in speech and hair colour such as FOXP2 and MCR1.
3767
Torarinsson E, Yao Z, Wiklund ED, Bramsen JB, Hansen C, Kjems J, Tommerup N, Ruzzo WL, Gorodkin J. Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Genome Res 2007; 18:242-51. [PMID: 18096747 DOI: 10.1101/gr.6887408] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.1] [Indexed: 12/15/2022]
Abstract
Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment methods to misalign, or even refuse to align, homologous ncRNAs, consequently obscuring that structural signal. We have used CMfinder, a structure-oriented local alignment tool, to search the ENCODE regions of vertebrate multiple alignments. In agreement with other studies, we find a large number of potential RNA structures in the ENCODE regions. We report 6587 candidate regions with an estimated false-positive rate of 50%. More intriguingly, many of these candidates may be better represented by alignments taking the RNA secondary structure into account than those based on primary sequence alone, often quite dramatically. For example, approximately one-quarter of our predicted motifs show revisions in >50% of their aligned positions. Furthermore, our results are strongly complementary to those discovered by sequence-alignment-based approaches--84% of our candidates are not covered by Washietl et al., increasing the number of ncRNA candidates in the ENCODE region by 32%. In a group of 11 ncRNA candidates that were tested by RT-PCR, 10 were confirmed to be present as RNA transcripts in human tissue, and most show evidence of significant differential expression across tissues. Our results broadly suggest caution in any analysis relying on multiple sequence alignments in less well-conserved regions, clearly support growing appreciation for the biological significance of ncRNAs, and strongly support the argument for considering RNA structure directly in any searches for these elements.
Affiliation(s)
- Elfar Torarinsson
- Section for Genetics and Bioinformatics, IBVH, Faculty of Life Sciences, University of Copenhagen, 1870 Frederiksberg C, Denmark
3768
Mourier T, Carret C, Kyes S, Christodoulou Z, Gardner PP, Jeffares DC, Pinches R, Barrell B, Berriman M, Griffiths-Jones S, Ivens A, Newbold C, Pain A. Genome-wide discovery and verification of novel structured RNAs in Plasmodium falciparum. Genome Res 2007; 18:281-92. [PMID: 18096748 PMCID: PMC2203626 DOI: 10.1101/gr.6836108] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.3] [Indexed: 01/08/2023]
Abstract
We undertook a genome-wide search for novel noncoding RNAs (ncRNA) in the malaria parasite Plasmodium falciparum. We used the RNAz program to predict structures in the noncoding regions of the P. falciparum 3D7 genome that were conserved with at least one of seven other Plasmodium spp. genome sequences. By using Northern blot analysis for 76 high-scoring predictions and microarray analysis for the majority of candidates, we have verified the expression of 33 novel ncRNA transcripts including four members of a ncRNA family in the asexual blood stage. These transcripts represent novel structured ncRNAs in P. falciparum and are not represented in any RNA databases. We provide supporting evidence for purifying selection acting on the experimentally verified ncRNAs by comparing the nucleotide substitutions in the predicted ncRNA candidate structures in P. falciparum with the closely related chimp malaria parasite P. reichenowi. The high confirmation rate within a single parasite life cycle stage suggests that many more of the predictions may be expressed in other stages of the organism's life cycle.
Affiliation(s)
- Tobias Mourier
- Ancient DNA and Evolution Group, Department of Biology, University of Copenhagen, Copenhagen DK-2100, Denmark
3769
Abstract
G-quadruplex or G4 DNA, a four-stranded DNA structure formed in G-rich sequences, has been hypothesized to be a structural motif involved in gene regulation. In this study, we examined the regulatory role of potential G4 DNA motifs (PG4Ms) located in the putative transcriptional regulatory region (TRR, -500 to +500) of genes across the human genome. We found that PG4Ms in the 500-bp region downstream of the annotated transcription start site (TSS; PG4M(D500)) are associated with gene expression. Generally, PG4M(D500)-positive genes are expressed at higher levels than PG4M(D500)-negative genes, and an increased number of PG4M(D500) provides a cumulative effect. This observation was validated by controlling for attributes, including gene family, function, and promoter similarity. We also observed an asymmetric pattern of PG4M(D500) distribution between strands, whereby the frequency of PG4M(D500) in the coding strand is generally higher than that in the template strand. Further analysis showed that the presence of PG4M(D500) and its strand asymmetry are associated with significant enrichment of RNAP II at the putative TRR. On the basis of these results, we propose a model of G4 DNA-mediated stimulation of transcription with the hypothesis that PG4M(D500) contributes to gene transcription by maintaining the DNA in an open conformation, while the asymmetric distribution of PG4M(D500) considerably reduces the probability of blocking the progression of the RNA polymerase complex on the template strand. Our findings provide a comprehensive view of the regulatory function of G4 DNA in gene transcription.
3770
Abstract
In the last year, high-throughput sequencing technologies have progressed from proof-of-concept to production quality. While these methods produce high-quality reads, they have yet to produce reads comparable in length to Sanger-based sequencing. Current fragment assembly algorithms have been implemented and optimized for mate-paired Sanger-based reads, and thus do not perform well on short reads produced by short read technologies. We present a new Eulerian assembler that generates nearly optimal short read assemblies of bacterial genomes and describe an approach to assemble reads in the case of the popular hybrid protocol when short and long Sanger-based reads are combined.
3771
Bi C, Leeder JS, Vyhlidal CA. A comparative study on computational two-block motif detection: algorithms and applications. Mol Pharm 2007; 5:3-16. [PMID: 18076137 DOI: 10.1021/mp7001126] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
Abstract
Since the completion of human genome sequencing, cataloging of all genomic functional elements has been one of the challenging problems in bioinformatics. Deciphering cis-regulatory elements in the human genome still remains elusive although much effort has been expended. This paper reviews a suite of methods for two-block motif discovery including mathematical modeling, de novo motif-finding based on multiple local alignment, and genomic sequence scanning method for putative sites. We formulate a general method to address this challenge and compare two major existing algorithms (i.e., greedy local search and Gibbs sampling) implemented to solve the popular two-block structured motif discovery issue. We demonstrate how to use this suite of methods and apply them to human nuclear receptor response elements (i.e., protein binding sites of several relevant nuclear receptors, HNF4alpha, CAR/RXR, and PXR/RXR).
Affiliation(s)
- Chengpeng Bi
- Bioinformatics and Intelligent Computing, Division of Clinical Pharmacology and Toxicology, Children's Mercy Hospitals and Clinics, 2401 Gillham Road, Kansas City, Missouri 64108, USA.
3772
Life-history traits drive the evolutionary rates of mammalian coding and noncoding genomic elements. Proc Natl Acad Sci U S A 2007; 104:20443-8. [PMID: 18077382 DOI: 10.1073/pnas.0705658104] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.4] [Indexed: 01/19/2023] Open
Abstract
A comprehensive phylogenetic framework is indispensable for investigating the evolution of genomic features in mammals as a whole, and particularly in humans. Using the ENCODE sequence data, we estimated mammalian neutral evolutionary rates and selective pressures acting on conserved coding and noncoding elements. We show that neutral evolutionary rates can be explained by the generation time (GT) hypothesis. Accordingly, primates (especially humans), having longer GTs than other mammals, display slower rates of neutral evolution. The evolution of constrained elements, particularly of nonsynonymous sites, is in agreement with the expectations of the nearly neutral theory of molecular evolution. We show that rates of nonsynonymous substitutions (dN) depend on the population size of a species. The results are robust to the exclusion of hypermutable CpG prone sites. The average rate of evolution in conserved noncoding sequences (CNCs) is 1.7 times higher than in nonsynonymous sites. Despite this, CNCs evolve at similar or even lower rates than nonsynonymous sites in the majority of basal branches of the eutherian tree. This observation could be the result of an overall gradual or, alternatively, lineage-specific relaxation of CNCs. The latter hypothesis was supported by the finding that 3 of the 20 longest CNCs displayed significant relaxation of individual branches. This observation may explain why the evolution of CNCs fits the expectations of the nearly neutral theory less well than the evolution of nonsynonymous sites.
3773
Metrics of sequence constraint overlook regulatory sequences in an exhaustive analysis at phox2b. Genome Res 2007; 18:252-60. [PMID: 18071029 DOI: 10.1101/gr.6929408] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.5] [Indexed: 11/25/2022]
Abstract
Despite its recognized utility, the extent to which evolutionary sequence conservation-based approaches may systematically overlook functional noncoding sequences remains unclear. We have tiled across sequence encompassing the zebrafish phox2b gene, ultimately evaluating 48 amplicons corresponding to all noncoding sequences therein for enhancer activity in zebrafish. Post hoc analyses of this interval utilizing five commonly used measures of evolutionary constraint (AVID, MLAGAN, SLAGAN, phastCons, WebMCS) demonstrate that each systematically overlooks regulatory sequences. These established algorithms detected only 29%-61% of our identified regulatory elements, consistent with the suggestion that many regulatory sequences may not be readily detected by metrics of sequence constraint. However, we were able to discriminate functional from nonfunctional sequences based upon GC composition and identified position weight matrices (PWM), demonstrating that, in at least one case, deleting sequences containing a subset of these PWMs from one identified regulatory element abrogated its regulatory function. Collectively, these data demonstrate that the noncoding functional component of vertebrate genomes may far exceed estimates predicated on evolutionary constraint.
3774
Kun E, Kirsten E, Hakam A, Bauer PI, Mendeleyev J. Identification of poly(ADP-ribose) polymerase-1 as the OXPHOS-generated ATP sensor of nuclei of animal cells. Biochem Biophys Res Commun 2007; 366:568-73. [PMID: 18073140 DOI: 10.1016/j.bbrc.2007.12.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 11/23/2007] [Accepted: 12/03/2007] [Indexed: 11/19/2022]
Abstract
Our results show that in the intact normal animal cell mitochondrial ATP is directly connected to nuclear PARP-1 by way of a specific adenylate kinase enzymatic path. This mechanism is demonstrated in two models: (a) by its inhibition with a specific inhibitor of adenylate kinase, and (b) by disruption of ATP synthesis through uncoupling of OXPHOS. In each instance the de-inhibited PARP-1 is quantitatively determined by enzyme kinetics. The nuclear binding site of PARP-1 is Topo I, and is identified as a critical "switchpoint" indicating the nuclear element that connects OXPHOS with mRNA synthesis in real time. The mitochondrial-nuclear PARP-1 pathway is not operative in cancer cells.
Affiliation(s)
- Ernest Kun
- UCSF Helen Diller Family Comprehensive Cancer Center, Department of Anatomy, University of California, School of Medicine, San Francisco Medical Center, San Francisco, CA 94143, USA
3775
Abstract
Biliary cancers comprise carcinomas of the gallbladder as well as of the intrahepatic, hilar and extrahepatic bile ducts. Many different etiologies and risk factors contribute to the heterogeneity of this disease. It is often diagnosed at an advanced stage when potentially curative resection is not feasible. Due to the lack of randomised Phase III studies, there is no standard regimen for chemotherapy in biliary cancer. Recent investigations into the underlying molecular mechanisms involved in biliary carcinogenesis and tumour growth have contributed greatly to our understanding of biliary cancer. Through a better understanding of these mechanisms, improved and more specific diagnostic, therapeutic and preventive strategies may be developed. Although fluoropyrimidines and gemcitabine remain the backbone of routine chemotherapy in advanced disease, new agents such as epidermal growth factor receptor blockers and angiogenesis inhibitors may hold promise for improving the outcome for patients with biliary cancer.
Affiliation(s)
- Florian Eckel
- Technical University of Munich, Department of Internal Medicine, Klinikum rechts der Isar, Ismaninger Strasse 22, 81675 Munich, Germany.
3776
Multivalent engagement of chromatin modifications by linked binding modules. Nat Rev Mol Cell Biol 2007; 8:983-94. [PMID: 18037899 DOI: 10.1038/nrm2298] [Citation(s) in RCA: 801] [Impact Index Per Article: 44.5] [Indexed: 12/14/2022]
Abstract
Various chemical modifications on histones and regions of associated DNA play crucial roles in genome management by binding specific factors that, in turn, serve to alter the structural properties of chromatin. These so-called effector proteins have typically been studied with the biochemist's paring knife: the capacity to recognize specific chromatin modifications has been mapped to an increasing number of domains that frequently appear in the nuclear subset of the proteome, often present in large, multisubunit complexes that bristle with modification-dependent binding potential. We propose that multivalent interactions on a single histone tail and beyond may have a significant, if not dominant, role in chromatin transactions.
3777
Bina M. The genome browser at UCSC for locating genes, and much more! Mol Biotechnol 2007; 38:269-75. [PMID: 18058261 DOI: 10.1007/s12033-007-9019-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 10/31/2007] [Accepted: 11/06/2007] [Indexed: 11/24/2022]
Abstract
For beginners in the field, this review highlights the key features of the genome browser at UCSC for data display, and provides nearly step-by-step procedures for creating publication-quality maps. The browser offers an engine (Blat) for searching a known genomic DNA for correspondence with protein and DNA sequences specified by the user. The results provide links to graphical displays, known as maps. Users can create "designer maps" by adding Tracks to view various types of data and specific landmarks. The browser offers an extensive list of options. They include the position of annotated genes, the position of reference cDNA sequences (RefSeq from GenBank), the position of alternatively spliced mRNA species, and predictions derived from computational models to identify potential transcription start sites and potential protein binding elements in genomic DNA. Several tracks can be tailored for comparative genomics. The browser also offers tracks for displaying large-scale experimental data including gene expression profiles, exon chips, and single-nucleotide polymorphisms.
Affiliation(s)
- Minou Bina
- Department of Chemistry, Purdue University, West Lafayette, IN, 47907, USA.
3778
Clamp M, Fry B, Kamal M, Xie X, Cuff J, Lin MF, Kellis M, Lindblad-Toh K, Lander ES. Distinguishing protein-coding and noncoding genes in the human genome. Proc Natl Acad Sci U S A 2007; 104:19428-33. [PMID: 18040051 PMCID: PMC2148306 DOI: 10.1073/pnas.0709013104] [Citation(s) in RCA: 381] [Impact Index Per Article: 21.2] [Received: 08/01/2007] [Indexed: 11/18/2022] Open
Abstract
Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of approximately 24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation: the alternative hypothesis is that most of these ORFs are actually valid human genes that reflect gene innovation in the primate lineage or gene loss in the other lineages. Here, we reject this hypothesis by carefully analyzing the nonconserved ORFs, specifically their properties in other primates. We show that the vast majority of these ORFs are random occurrences. The analysis yields, as a by-product, a major revision of the current human catalogs, cutting the number of protein-coding genes to approximately 20,500. Specifically, it suggests that nonconserved ORFs should be added to the human gene catalog only if there is clear evidence of an encoded protein. It also provides a principled methodology for evaluating future proposed additions to the human gene catalog. Finally, the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.
Affiliation(s)
- Michele Clamp
- Broad Institute of Massachusetts Institute of Technology and Harvard, 7 Cambridge Center, Cambridge, MA 02142
- Ben Fry
- Broad Institute of Massachusetts Institute of Technology and Harvard, 7 Cambridge Center, Cambridge, MA 02142
- Mike Kamal
- Broad Institute of Massachusetts Institute of Technology and Harvard, 7 Cambridge Center, Cambridge, MA 02142
- Xiaohui Xie
- Broad Institute of Massachusetts Institute of Technology and Harvard, 7 Cambridge Center, Cambridge, MA 02142
- James Cuff
- Broad Institute of Massachusetts Institute of Technology and Harvard, 7 Cambridge Center, Cambridge, MA 02142
- Michael F. Lin
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
- Manolis Kellis
- Broad Institute of Massachusetts Institute of Technology and Harvard, 7 Cambridge Center, Cambridge, MA 02142
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
- Kerstin Lindblad-Toh
- Broad Institute of Massachusetts Institute of Technology and Harvard, 7 Cambridge Center, Cambridge, MA 02142
- Eric S. Lander
- Broad Institute of Massachusetts Institute of Technology and Harvard, 7 Cambridge Center, Cambridge, MA 02142
- Department of Biology and
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142; and
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115
3779
St. Laurent G, Wahlestedt C. Noncoding RNAs: couplers of analog and digital information in nervous system function? Trends Neurosci 2007; 30:612-21. [DOI: 10.1016/j.tins.2007.10.002] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.4] [Received: 07/31/2007] [Revised: 10/03/2007] [Accepted: 10/04/2007] [Indexed: 12/14/2022]
3780
Mason CE, Seringhaus MR, Sattler de Sousa e Brito C. Personalized genomic medicine with a patchwork, partially owned genome. The Yale Journal of Biology and Medicine 2007; 80:145-51. [PMID: 18449389 PMCID: PMC2347364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
"His book was known as the Book of Sand, because neither the book nor the sand have any beginning or end." - Jorge Luis Borges
The human genome is a three billion-letter recipe for the genesis of a human being, directing development from a single-celled embryo to the trillions of adult cells. Since the sequencing of the human genome was announced in 2001, researchers have an increased ability to discern the genetic basis for diseases. This reference genome has opened the door to genomic medicine, aimed at detecting and understanding all genetic variations of the human genome that contribute to the manifestation and progression of disease. The overarching vision of genomic (or "personalized") medicine is to custom-tailor each treatment for maximum effectiveness in an individual patient. Detecting the variation in a patient's deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and protein structures is no longer an insurmountable hurdle. Today, the challenge for genomic medicine lies in contextualizing those myriad genetic variations in terms of their functional consequences for a person's health and development throughout life and in terms of that patient's susceptibility to disease and differential clinical responses to medication. Additionally, several recent developments have complicated our understanding of the nominal human genome and, thereby, altered the progression of genomic medicine. In this brief review, we shall focus on these developments and examine how they are changing our understanding of our genome.
Affiliation(s)
- Christopher E. Mason
- Program on Neurogenetics, Yale University Medical School, New Haven, Connecticut
- Information Society Project, Yale Law School, New Haven, Connecticut
- To whom all correspondence should be addressed: Christopher E. Mason, Department of Genetics, Yale University Medical School, 300 Cedar Street, New Haven, CT 06511
3781
Laje G, McMahon FJ. The pharmacogenetics of major depression: past, present, and future. Biol Psychiatry 2007; 62:1205-7. [PMID: 17949692 DOI: 10.1016/j.biopsych.2007.09.016] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 09/18/2007] [Accepted: 09/19/2007] [Indexed: 12/21/2022]
Affiliation(s)
- Gonzalo Laje
- Genetic Basis of Mood and Anxiety Disorders Unit, Mood and Anxiety Program, National Institute of Mental Health, Bethesda, MD 20892, USA
3782
Miller W, Rosenbloom K, Hardison RC, Hou M, Taylor J, Raney B, Burhans R, King DC, Baertsch R, Blankenberg D, Kosakovsky Pond SL, Nekrutenko A, Giardine B, Harris RS, Tyekucheva S, Diekhans M, Pringle TH, Murphy WJ, Lesk A, Weinstock GM, Lindblad-Toh K, Gibbs RA, Lander ES, Siepel A, Haussler D, Kent WJ. 28-way vertebrate alignment and conservation track in the UCSC Genome Browser. Genome Res 2007; 17:1797-808. [PMID: 17984227 PMCID: PMC2099589 DOI: 10.1101/gr.6761107] [Citation(s) in RCA: 214] [Impact Index Per Article: 11.9] [Received: 06/04/2007] [Accepted: 08/30/2007] [Indexed: 01/17/2023]
Abstract
This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu. This article illustrates the power of this resource for exploring vertebrate and mammalian evolution, using three examples. First, we present several vignettes involving insertions and deletions within protein-coding regions, including a look at some human-specific indels. Then we study the extent to which start codons and stop codons in the human sequence are conserved in other species, showing that start codons are in general more poorly conserved than stop codons. Finally, an investigation of the phylogenetic depth of conservation for several classes of functional elements in the human genome reveals striking differences in the rates and modes of decay in alignability. Each functional class has a distinctive period of stringent constraint, followed by decays that allow (for the case of regulatory regions) or reject (for coding regions and ultraconserved elements) insertions and deletions.
Affiliation(s)
- Webb Miller
- Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania 16802, USA.
3783
Wilkins JM, Loughlin J, Snelling SJB. Osteoarthritis genetics: current status and future prospects. ACTA ACUST UNITED AC 2007. [DOI: 10.2217/17460816.2.6.607] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
3784
Stevens JC, Banks GT, Festing MFW, Fisher EMC. Quiet mutations in inbred strains of mice. Trends Mol Med 2007; 13:512-9. [PMID: 17981508 DOI: 10.1016/j.molmed.2007.10.001] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.3] [Received: 07/27/2007] [Revised: 09/27/2007] [Accepted: 10/01/2007] [Indexed: 11/22/2022]
Abstract
The year 2009 is the 100th anniversary of the founding of the first inbred strain of mouse, called DBA. During the last 100 years, inbred strains have proved their value for biomedical research and the number of such strains has mushroomed to over 450, each with different genotypic and phenotypic characteristics and useful for the study of disease and normal function. However, although inbred strains are stable, they are not fixed entities and researchers need to be aware of the phenomena of new mutations and of genetic drift, which occur within all mouse colonies. If the mutations are what we term in this review 'quiet mutations', then they might result in rather unexpected and sometimes tremendously valuable results. Here, we discuss these phenomena and look at how new genomic technologies might help us to detect 'quiet mutations' and use them to our advantage.
Affiliation(s)
- James C Stevens
- Department of Neurodegenerative Disease, Institute of Neurology, University College London, Queen Square, London WC1N 3BG, UK
3785
Brown BD, Gentner B, Cantore A, Colleoni S, Amendola M, Zingale A, Baccarini A, Lazzari G, Galli C, Naldini L. Endogenous microRNA can be broadly exploited to regulate transgene expression according to tissue, lineage and differentiation state. Nat Biotechnol 2007; 25:1457-67. [PMID: 18026085 DOI: 10.1038/nbt1372] [Citation(s) in RCA: 440] [Impact Index Per Article: 24.4] [Received: 07/27/2007] [Accepted: 11/04/2007] [Indexed: 12/19/2022]
Abstract
We have shown previously that transgene expression can be suppressed in hematopoietic cells using vectors that are responsive to microRNA (miRNA) regulation. Here we investigate the potential of this approach for more sophisticated control of transgene expression. Analysis of the relationship between miRNA expression levels and target mRNA suppression suggested that suppression depends on a threshold miRNA concentration. Using this information, we generated vectors that rapidly adjust transgene expression in response to changes in miRNA expression. These vectors sharply segregated transgene expression between closely related states of therapeutically relevant cells, including dendritic cells, hematopoietic and embryonic stem cells, and their progeny, allowing positive/negative selection according to the cells' differentiation state. Moreover, two miRNA target sites were combined to restrict transgene expression to a specific cell type in the liver. Notably, the vectors did not detectably perturb endogenous miRNA expression or regulation of natural targets. The properties of miRNA-regulated vectors should allow for safer and more effective therapeutic applications.
Affiliation(s)
- Brian D Brown
- San Raffaele Telethon Institute for Gene Therapy, San Raffaele Scientific Institute, via Olgettina, 58, 20132 Milan, Italy
3786
Acevedo LG, Iniguez AL, Holster HL, Zhang X, Green R, Farnham PJ. Genome-scale ChIP-chip analysis using 10,000 human cells. Biotechniques 2007; 43:791-7. [PMID: 18251256 PMCID: PMC2268896 DOI: 10.2144/000112625] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.1] [Indexed: 11/23/2022] Open
Abstract
The technique of chromatin immunoprecipitation (ChIP) is a powerful method for identifying in vivo DNA binding sites of transcription factors and for studying chromatin modifications. Unfortunately, the large number of cells needed for the standard ChIP protocol has hindered the analysis of many biologically interesting cell populations that are difficult to obtain in large numbers. New ChIP methods involving the use of carrier chromatin have been developed that allow the one-gene-at-a-time analysis of very small numbers of cells. However, such methods are not useful if the resultant sample will be applied to genomic microarrays or used in ChIP-sequencing assays. Therefore, we have miniaturized the ChIP protocol such that as few as 10,000 cells (without the addition of carrier reagents) can be used to obtain enough sample material to analyze the entire human genome. We demonstrate the reproducibility of this MicroChIP technique using 2.1 million feature high-density oligonucleotide arrays and antibodies to RNA polymerase II and to histone H3 trimethylated on lysine 27 or lysine 9.
Affiliation(s)
- Luis G. Acevedo
- Cell and Molecular Biology Program, University of Wisconsin, Madison, WI
- Peggy J. Farnham
- Department of Pharmacology and the Genome Center, University of California-Davis, Davis, CA
3787
Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CLG, Davis C, Ewing B, Oommen S, Lau C, Yu HC, Li J, Roe BA, Green P, Gerhard DS, Temple G, Haussler D, Brent MR. Targeted discovery of novel human exons by comparative genomics. Genome Res 2007; 17:1763-73. [PMID: 17989246 PMCID: PMC2099585 DOI: 10.1101/gr.7128207] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Received: 09/10/2007] [Accepted: 10/15/2007] [Indexed: 01/20/2023]
Abstract
A complete and accurate set of human protein-coding gene annotations is perhaps the single most important resource for genomic research after the human-genome sequence itself, yet the major gene catalogs remain incomplete and imperfect. Here we describe a genome-wide effort, carried out as part of the Mammalian Gene Collection (MGC) project, to identify human genes not yet in the gene catalogs. Our approach was to produce gene predictions by algorithms that rely on comparative sequence data but do not require direct cDNA evidence, then to test predicted novel genes by RT-PCR. We have identified 734 novel gene fragments (NGFs) containing 2188 exons with, at most, weak prior cDNA support. These NGFs correspond to an estimated 563 distinct genes, of which >160 are completely absent from the major gene catalogs, while hundreds of others represent significant extensions of known genes. The NGFs appear to be predominantly protein-coding genes rather than noncoding RNAs, unlike novel transcribed sequences identified by technologies such as tiling arrays and CAGE. They tend to be expressed at low levels and in a tissue-specific manner, and they are enriched for roles in motor activity, cell adhesion, connective tissue, and central nervous system development. Our results demonstrate that many important genes and gene fragments have been missed by traditional approaches to gene discovery but can be identified by their evolutionary signatures using comparative sequence data. However, they suggest that hundreds, not thousands, of protein-coding genes are completely missing from the current gene catalogs.
Affiliation(s)
- Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA.
3788
Abstract
This paper is a response to the increasing difficulty biologists find in agreeing upon a definition of the gene, and indeed, the increasing disarray in which that concept finds itself. After briefly reviewing these problems, we propose an alternative to both the concept and the word gene—an alternative that, like the gene, is intended to capture the essence of inheritance, but which is both richer and more expressive. It is also clearer in its separation of what the organism statically is (what it tangibly inherits) and what it dynamically does (its functionality and behavior). Our proposal of a genetic functor, or genitor, is a sweeping extension of the classical genotype/phenotype paradigm, yet it appears to be faithful to the findings of contemporary biology, encompassing many of the recently emerging—and surprisingly complex—links between structure and functionality.
Affiliation(s)
- Evelyn Fox Keller
- Program in Science, Technology, and Society, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- David Harel
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- * To whom correspondence should be addressed. E-mail:
3789
Abstract
In this issue of Cell, Guenther et al. (2007) analyze the presence of chromatin marks and RNA polymerase at transcription start sites in the human genome. Their results reveal that many "inactive" genes harbor histone marks associated with active transcription at their 5' ends and that although these genes initiate transcription, they do not generate full-length transcripts.
Affiliation(s)
- Matthew C Lorincz
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z3, Canada.
3790
Yu J, Vodyanik MA, Smuga-Otto K, Antosiewicz-Bourget J, Frane JL, Tian S, Nie J, Jonsdottir GA, Ruotti V, Stewart R, Slukvin II, Thomson JA. Induced pluripotent stem cell lines derived from human somatic cells. Science 2007; 318:1917-20. [PMID: 18029452 DOI: 10.1126/science.1151526] [Citation(s) in RCA: 7209] [Impact Index Per Article: 400.5] [Indexed: 11/02/2022]
Abstract
Somatic cell nuclear transfer allows trans-acting factors present in the mammalian oocyte to reprogram somatic cell nuclei to an undifferentiated state. We show that four factors (OCT4, SOX2, NANOG, and LIN28) are sufficient to reprogram human somatic cells to pluripotent stem cells that exhibit the essential characteristics of embryonic stem (ES) cells. These induced pluripotent human stem cells have normal karyotypes, express telomerase activity, express cell surface markers and genes that characterize human ES cells, and maintain the developmental potential to differentiate into advanced derivatives of all three primary germ layers. Such induced pluripotent human cell lines should be useful in the production of new disease models and in drug development, as well as for applications in transplantation medicine, once technical limitations (for example, mutation through viral integration) are eliminated.
Affiliation(s)
- Junying Yu
- Genome Center of Wisconsin, Madison, WI 53706-1580, USA.
3791
Abstract
Epigenetic research aims to understand heritable gene regulation that is not directly encoded in the DNA sequence. Epigenetic mechanisms such as DNA methylation and histone modifications modulate the packaging of the DNA in the nucleus and thereby influence gene expression. Patterns of epigenetic information are faithfully propagated over multiple cell divisions, which makes epigenetic regulation a key mechanism for cellular differentiation and cell fate decisions. In addition, incomplete erasure of epigenetic information can lead to complex patterns of non-Mendelian inheritance. Stochastic and environment-induced epigenetic defects are known to play a major role in cancer and ageing, and they may also contribute to mental disorders and autoimmune diseases. Recent technical advances such as ChIP-on-chip and ChIP-seq have started to convert epigenetic research into a high-throughput endeavor, to which bioinformatics is expected to make significant contributions. Here, we review pioneering computational studies that have contributed to epigenetic research. In addition, we give a brief introduction to epigenetics, targeted at bioinformaticians who are new to the field, and we outline future challenges in computational epigenetics.
Affiliation(s)
- Christoph Bock
- Max-Planck-Institut für Informatik, Saarbrücken, Germany.
3792
Bryne JC, Valen E, Tang MHE, Marstrand T, Winther O, da Piedade I, Krogh A, Lenhard B, Sandelin A. JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. Nucleic Acids Res 2007; 36:D102-6. [PMID: 18006571 PMCID: PMC2238834 DOI: 10.1093/nar/gkm955] [Citation(s) in RCA: 526] [Impact Index Per Article: 29.2] [Indexed: 12/22/2022] Open
Abstract
JASPAR is a popular open-access database for matrix models describing DNA-binding preferences for transcription factors and other DNA patterns. With its third major release, JASPAR has been expanded and equipped with additional functions aimed at both casual and power users. The heart of the JASPAR database (the JASPAR CORE sub-database) has increased by 12% in size, and three new specialized sub-databases have been added. New functions include clustering of matrix models by similarity, generation of random matrices by sampling from selected sets of existing models and a language-independent Web Service applications programming interface for matrix retrieval. JASPAR is available at http://jaspar.genereg.net.
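As a rough illustration of what a "matrix model" of DNA-binding preference is used for, the sketch below converts a position frequency matrix into log-odds weights and scans a sequence for the best-scoring window. It is a minimal sketch under stated assumptions, not JASPAR's own code or API: the function names `pfm_to_pwm` and `best_hit`, the toy counts in `TOY_PFM`, the pseudocount of 0.8 and the uniform background of 0.25 are all hypothetical choices made for the example, and the input sequence is assumed to contain only uppercase A, C, G and T.

```python
import math

BASES = "ACGT"

# Hypothetical toy count matrix (4 columns), not taken from JASPAR.
TOY_PFM = {
    "A": [0, 10, 0, 10],
    "C": [0, 0, 0, 0],
    "G": [0, 0, 0, 0],
    "T": [10, 0, 10, 0],
}


def pfm_to_pwm(pfm, pseudocount=0.8, background=0.25):
    """Turn per-column base counts into log2-odds weights against a uniform background."""
    pwm = []
    for i in range(len(pfm["A"])):
        total = sum(pfm[b][i] for b in BASES) + pseudocount
        column = {}
        for b in BASES:
            # The pseudocount keeps unseen bases from producing log(0).
            freq = (pfm[b][i] + pseudocount * background) / total
            column[b] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window, or None if the sequence is too short."""
    width = len(pwm)
    hits = [
        (sum(pwm[i][base] for i, base in enumerate(sequence[s:s + width])), s)
        for s in range(len(sequence) - width + 1)
    ]
    return max(hits) if hits else None


if __name__ == "__main__":
    pwm = pfm_to_pwm(TOY_PFM)
    print(best_hit(pwm, "GGCTATAGC"))
```

Only the forward strand is scanned here; a fuller scan would also score the reverse complement of each window.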
Affiliation(s)
- Jan Christian Bryne
- Computational Biology Unit, Bergen Center for Computational Science, University of Bergen, Thormøhlensgate 55, N-5008 Bergen, Norway
3793
Asthana S, Roytberg M, Stamatoyannopoulos J, Sunyaev S. Analysis of sequence conservation at nucleotide resolution. PLoS Comput Biol 2007; 3:e254. [PMID: 18166073 PMCID: PMC2230682 DOI: 10.1371/journal.pcbi.0030254] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Received: 05/10/2007] [Accepted: 11/13/2007] [Indexed: 12/02/2022] Open
Abstract
One of the major goals of comparative genomics is to understand the evolutionary history of each nucleotide in the human genome sequence, and the degree to which it is under selective pressure. Ascertainment of selective constraint at nucleotide resolution is particularly important for predicting the functional significance of human genetic variation and for analyzing the sequence substructure of cis-regulatory sequences and other functional elements. Current methods for analysis of sequence conservation are focused on delineation of conserved regions comprising tens or even hundreds of consecutive nucleotides. We therefore developed a novel computational approach designed specifically for scoring evolutionary conservation at individual base-pair resolution. Our approach estimates the rate at which each nucleotide position is evolving, computes the probability of neutrality given this rate estimate, and summarizes the result in a Sequence CONservation Evaluation (SCONE) score. We computed SCONE scores in a continuous fashion across 1% of the human genome for which high-quality sequence information from up to 23 genomes is available. We show that SCONE scores are clearly correlated with the allele frequency of human polymorphisms in both coding and noncoding regions. We find that the majority of noncoding conserved nucleotides lie outside of longer conserved elements predicted by other conservation analyses, and are experiencing ongoing selection in modern humans as evident from the allele frequency spectrum of human polymorphism. We also applied SCONE to analyze the distribution of conserved nucleotides within functional regions. These regions are markedly enriched in individually conserved positions and short (<15 bp) conserved “chunks.” Our results collectively suggest that the majority of functionally important noncoding conserved positions are highly fragmented and reside outside of canonically defined long conserved noncoding sequences. A small subset of these fragmented positions may be identified with high confidence.
The structure of the human genome remains largely unknown, including which parts of the genome are functionally relevant and which parts are “junk.” The availability of genomic sequence from a large number of mammals allows a more detailed exploration of this structure, using comparison of related sequences from different species to identify portions of the genome that have remained unchanged, conserved by the action of natural selection, and thus likely to be functionally significant. To date, most efforts focused on localizing the functional fraction of the human genome have been based on identifying contiguous stretches of positions conserved in multiple species. Here, we present an analysis that is based instead on a single-position measure of conservation called SCONE. Our analysis suggests that the majority of conserved and putatively functional positions are highly fragmented and lie outside contiguous regions of conserved sequence. A subset of these fragmented positions may be identified based on local clustering.
Affiliation(s)
- Saurabh Asthana
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Mikhail Roytberg
- Computational Biology Group, Institute of Mathematical Problems in Biology, Russian Academy of Sciences, Pushchino, Russia
- John Stamatoyannopoulos
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- * To whom correspondence should be addressed. E-mail: (SS), (JS)
- Shamil Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- * To whom correspondence should be addressed. E-mail: (SS), (JS)
|
3794
|
Abstract
Chromosomal inversions have an important role in evolution, and an increasing number of inversion polymorphisms are being identified in the human population. The evolutionary history of these inversions and the mechanisms by which they arise are therefore of significant interest. Previously, a polymorphic inversion on human chromosome Xq28 that includes the FLNA and EMD loci was discovered and hypothesized to have been the result of nonallelic homologous recombination (NAHR) between near-identical inverted duplications flanking this region. Here, we carried out an in-depth study of the orthologous region in 27 additional eutherians and report that this inversion is not specific to humans, but has occurred independently and repeatedly at least 10 times in multiple eutherian lineages. Moreover, inverted duplications flank the FLNA-EMD region in all 16 species for which high-quality sequence assemblies are available. Based on detailed sequence analyses, we propose a model in which the observed inverted duplications originated from a common duplication event that predates the eutherian radiation. Subsequent gene conversion homogenized the duplications, thereby providing a continuous substrate for NAHR that led to the recurrent inversion of this segment of the genome. These results provide an extreme example in support of the evolutionary breakpoint reusage hypothesis and point out that some near-identical human segmental duplications may, in fact, have originated >100 million years ago.
|
3795
|
Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc Natl Acad Sci U S A 2007; 104:18613-8. [PMID: 18003932 DOI: 10.1073/pnas.0703637104] [Citation(s) in RCA: 300] [Impact Index Per Article: 16.7] [Indexed: 02/05/2023]
Abstract
The evolutionary forces that establish and hone target gene networks of transcription factors are largely unknown. Transposition of retroelements may play a role, but its global importance, beyond a few well-described examples for isolated genes, is not clear. We report that LTR class I endogenous retrovirus (ERV) retroelements considerably impact the transcriptional network of human tumor suppressor protein p53. A total of 1,509 of approximately 319,000 human ERV LTR regions have a near-perfect p53 DNA binding site. The LTR10 and MER61 families are particularly enriched for copies with a p53 site. These ERV families are primate-specific and transposed actively near the time when the New World and Old World monkey lineages split. Other mammalian species lack these p53 response elements. Analysis of published genomewide ChIP data for p53 indicates that more than one-third of identified p53 binding sites are accounted for by ERV copies with a p53 site. ChIP and expression studies for individual genes indicate that human ERV p53 sites are likely part of the p53 transcriptional program and direct regulation of p53 target genes. These results demonstrate how retroelements can significantly shape the regulatory network of a transcription factor in a species-specific manner.
|
3796
|
Wilming LG, Gilbert JGR, Howe K, Trevanion S, Hubbard T, Harrow JL. The vertebrate genome annotation (Vega) database. Nucleic Acids Res 2007; 36:D753-60. [PMID: 18003653 PMCID: PMC2238886 DOI: 10.1093/nar/gkm987] [Citation(s) in RCA: 161] [Impact Index Per Article: 8.9] [Indexed: 11/13/2022]
Abstract
The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) was first made public in 2004 and has been designed to view manual annotation of human, mouse and zebrafish genomic sequences produced at the Wellcome Trust Sanger Institute. Since its initial release, the number of human annotated loci has more than doubled to close to 33 000, and the database now contains comprehensive annotation on 20 of the 24 human chromosomes, four whole mouse chromosomes and around 40% of the zebrafish Danio rerio genome. In addition, we offer manual annotation of a number of haplotype regions in mouse and human and regions of comparative interest in pig and dog that are unique to Vega.
Affiliation(s)
- L G Wilming
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
|
3797
|
Yu X, Lin J, Zack DJ, Qian J. Identification of tissue-specific cis-regulatory modules based on interactions between transcription factors. BMC Bioinformatics 2007; 8:437. [PMID: 17996093 PMCID: PMC2194798 DOI: 10.1186/1471-2105-8-437] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 05/18/2007] [Accepted: 11/09/2007] [Indexed: 01/22/2023]
Abstract
BACKGROUND Evolutionary conservation has been used successfully to help identify cis-acting DNA regions that are important in regulating tissue-specific gene expression. Motivated by increasing evidence that some DNA regulatory regions are not evolutionarily conserved, we have developed an approach for cis-regulatory region identification that does not rely upon evolutionary sequence conservation.

RESULTS The conservation-independent approach is based on an empirical potential energy between interacting transcription factors (TFs). In this analysis, the potential energy is defined as a function of the number of TF interactions in a genomic region and the strength of the interactions. By identifying sets of interacting TFs, the analysis locates regions enriched with the binding sites of these interacting TFs. We applied this approach to 30 human tissues and identified 6232 putative cis-regulatory modules (CRMs) regulating 2130 tissue-specific genes. Interestingly, some genes appear to be regulated by different CRMs in different tissues. Known regulatory regions are highly enriched in our predicted CRMs. In addition, DNase I hypersensitive sites, which tend to be associated with active regulatory regions, significantly overlap with the predicted CRMs, but not with more conserved regions. We also find that conserved and non-conserved CRMs regulate distinct gene groups. Conserved CRMs control more essential genes and genes involved in fundamental cellular activities such as transcription. In contrast, non-conserved CRMs, in general, regulate more non-essential genes, such as genes related to neural activity.

CONCLUSION These results demonstrate that identifying relevant sets of binding motifs can help in the mapping of DNA regulatory regions, and suggest that non-conserved CRMs play an important role in gene regulation.
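The abstract says only that the potential energy depends on the number of TF interactions in a region and on their strength; it does not give the functional form. The following is a minimal toy sketch of that idea, not the authors' method: the TF names, the strength values, and the additive scoring rule in `region_energy` are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations

# Hypothetical pairwise interaction strengths (illustrative values only,
# not taken from the paper).
INTERACTION_STRENGTH = {
    frozenset({"TF_A", "TF_B"}): 2.0,
    frozenset({"TF_A", "TF_C"}): 0.5,
}


def region_energy(sites, strengths=INTERACTION_STRENGTH):
    """Return a toy potential energy for one genomic region.

    `sites` lists the TFs with a predicted binding site in the region.
    Each pair of sites whose TFs interact lowers the energy by the strength
    of that interaction, so more and stronger interactions give a more
    negative score. This additive rule is an assumption.
    """
    energy = 0.0
    for tf1, tf2 in combinations(sites, 2):
        energy -= strengths.get(frozenset({tf1, tf2}), 0.0)
    return energy


dense = region_energy(["TF_A", "TF_B", "TF_C"])  # A-B and A-C interact
sparse = region_energy(["TF_B", "TF_C"])         # no interacting pair
```

Under this toy rule, a region is a stronger CRM candidate the lower its energy, which is one simple way to rank windows by enrichment for sites of interacting TFs.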
Affiliation(s)
- Xueping Yu
- Wilmer Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.
|
3798
|
Lis JT. Imaging Drosophila gene activation and polymerase pausing in vivo. Nature 2007; 450:198-202. [PMID: 17994086 DOI: 10.1038/nature06324] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Received: 09/06/2007] [Accepted: 09/28/2007] [Indexed: 01/15/2023]
Abstract
Since the early 1960s, imaging studies of Drosophila sp. polytene chromosomes have provided unique views of gene transcription in vivo. The dramatic changes in chromatin structure that accompany gene activation can be visualized as chromosome puffs. Now, live-cell imaging techniques coupled with protein-DNA crosslinking assays on a genome-wide scale allow more detailed mechanistic questions to be addressed and are prompting the re-evaluation of models of transcription regulation in both Drosophila and mammals.
Affiliation(s)
- John T Lis
- Molecular Biology and Genetics, 416 Biotechnology Building, Cornell University, Ithaca, New York 14853, USA.
|
3799
|
Stark A, Lin MF, Kheradpour P, Pedersen JS, Parts L, Carlson JW, Crosby MA, Rasmussen MD, Roy S, Deoras AN, Ruby JG, Brennecke J, Harvard FlyBase curators, Berkeley Drosophila Genome Project, Hodges E, Hinrichs AS, Caspi A, Paten B, Park SW, Han MV, Maeder ML, Polansky BJ, Robson BE, Aerts S, van Helden J, Hassan B, Gilbert DG, Eastman DA, Rice M, Weir M, Hahn MW, Park Y, Dewey CN, Pachter L, Kent WJ, Haussler D, Lai EC, Bartel DP, Hannon GJ, Kaufman TC, Eisen MB, Clark AG, Smith D, Celniker SE, Gelbart WM, Kellis M. Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature 2007; 450:219-32. [PMID: 17994088 PMCID: PMC2474711 DOI: 10.1038/nature06340] [Citation(s) in RCA: 468] [Impact Index Per Article: 26.0] [Received: 07/21/2007] [Accepted: 10/04/2007] [Indexed: 12/25/2022]
Abstract
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.
Affiliation(s)
- Alexander Stark
- The Broad Institute, Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts 02140, USA
|
3800
|
Rose D, Hackermüller J, Washietl S, Reiche K, Hertel J, Findeiß S, Stadler PF, Prohaska SJ. Computational RNomics of drosophilids. BMC Genomics 2007; 8:406. [PMID: 17996037 PMCID: PMC2216035 DOI: 10.1186/1471-2164-8-406] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Received: 01/15/2007] [Accepted: 11/08/2007] [Indexed: 11/11/2022]
Abstract
BACKGROUND Recent experimental and computational studies have provided overwhelming evidence for a plethora of diverse transcripts that are unrelated to protein-coding genes. One subclass consists of those RNAs that require distinctive secondary structure motifs to exert their biological function and hence exhibit distinctive patterns of sequence conservation characteristic for positive selection on RNA secondary structure. The deep-sequencing of 12 drosophilid species coordinated by the NHGRI provides an ideal data set for comparative computational approaches to determine those genomic loci that code for evolutionarily conserved RNA motifs. This class of loci includes the majority of the known small ncRNAs as well as structured RNA motifs in mRNAs. We report here on a genome-wide survey using RNAz.

RESULTS We obtain 16 000 high-quality predictions among which we recover the majority of the known ncRNAs. Taking a pessimistically estimated false discovery rate of 40% into account, this implies that at least some ten thousand loci in the Drosophila genome show the hallmarks of stabilizing selection acting on RNA structure, and hence are most likely functional at the RNA level. A subset of RNAz predictions overlapping with TRF1 and BRF binding sites [Isogai et al., EMBO J. 26: 79-89 (2007)], which are plausible candidates of Pol III transcripts, have been studied in more detail. Among these sequences we identify several "clusters" of ncRNA candidates with striking structural similarities.

CONCLUSION The statistical evaluation of the RNAz predictions in comparison with a similar analysis of vertebrate genomes [Washietl et al., Nat. Biotech. 23: 1383-1390 (2005)] shows that qualitatively similar fractions of structured RNAs are found in introns, UTRs, and intergenic regions. The intergenic RNA structures, however, are concentrated much more closely around known protein-coding loci, suggesting that flies have a significantly smaller complement of independent structured ncRNAs compared to mammals.
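The "some ten thousand loci" figure in the abstract follows from discounting the 16 000 predictions by the 40% false discovery rate. A minimal sketch of that arithmetic, with hypothetical function and variable names:

```python
def expected_true_predictions(n_predictions: int, fdr: float) -> int:
    """Expected number of true positives among `n_predictions`
    when a fraction `fdr` of them are false discoveries."""
    return round(n_predictions * (1.0 - fdr))


# Figures from the abstract: 16 000 predictions, 40% false discovery rate.
n_true = expected_true_predictions(16_000, 0.40)
```

This gives 9 600 loci, consistent with the abstract's rounded "some ten thousand".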
Affiliation(s)
- Dominic Rose
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Jörg Hackermüller
- Fraunhofer Institute for Cell Therapy and Immunology, Deutscher Platz 5e, Leipzig, Germany, D-04103
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Stefan Washietl
- Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, Wien, Austria, A-1090
- Kristin Reiche
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Jana Hertel
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, Wien, Austria, A-1090
- Sven Findeiß
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Fraunhofer Institute for Cell Therapy and Immunology, Deutscher Platz 5e, Leipzig, Germany, D-04103
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, Wien, Austria, A-1090
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, USA, NM 87501
- Sonja J Prohaska
- Biomedical Informatics, Arizona State University, Tempe, PO-Box 878809, USA, AZ 85287
|