3151
Quesneville H, Bergman CM, Andrieu O, Autard D, Nouaud D, Ashburner M, Anxolabehere D. Combined evidence annotation of transposable elements in genome sequences. PLoS Comput Biol 2005; 1:166-75. [PMID: 16110336; PMCID: PMC1185648; DOI: 10.1371/journal.pcbi.0010022] [Citation(s) in RCA: 267] [Impact Index Per Article: 13.4] [Received: 05/11/2005] [Accepted: 06/30/2005] [Indexed: 11/18/2022] [Open access]
Abstract
Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced by combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated “TE models” in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments a few hundred nucleotides long and from highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other species in the genus Drosophila.

A first step in adding value to the large-scale DNA sequences generated by genome projects is annotation: marking biological features on the raw string of adenines, cytosines, guanines, and thymines. The predominant goal in genome annotation thus far has been to identify gene sequences that encode proteins; however, many functional sequences exist in non-protein-coding regions, and their annotation remains incomplete. Mobile, repetitive DNA segments known as transposable elements (TEs) are one class of functional sequence in non-protein-coding regions; they can make up large fractions of genome sequences (e.g., about 45% in the human) and can play important roles in gene and chromosome structure and regulation. As a consequence, there has been increasing interest in the computational identification of TEs in genome sequences. Borrowing current ideas from the field of gene annotation, the authors have developed a pipeline to predict TEs in genome sequences that combines multiple sources of evidence from different computational methods. The authors' combined-evidence pipeline represents an important step towards raising the standards of TE annotation to the same quality as that of genes, and should help catalyze a better understanding of the biological role of these fascinating sequences.
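The nesting estimate above (518 of 6,013 copies, about 8.6%) depends on detecting TE copies that sit inside another copy. The sketch below shows one simplified way to flag such copies from annotated intervals. It is not the pipeline described in the abstract. It assumes each copy is a hypothetical half-open `(start, end)` interval on a single sequence, and it treats containment within another copy's span as nesting. A real annotation would also have to recognize a host element that the insertion has split into separate fragments.

```python
from typing import List, Tuple

# Hypothetical representation: one TE copy = half-open (start, end) interval on one sequence.
Interval = Tuple[int, int]


def nested_copies(tes: List[Interval]) -> List[int]:
    """Return indices of TE copies whose interval lies inside another, distinct copy.

    Simplification (assumption): containment in another copy's span counts as
    nesting; fragmented hosts are not reassembled first.
    """
    nested = []
    for i, (start_i, end_i) in enumerate(tes):
        for j, (start_j, end_j) in enumerate(tes):
            contains = start_j <= start_i and end_i <= end_j
            # Identical intervals are not counted as nesting each other.
            if i != j and contains and (start_j, end_j) != (start_i, end_i):
                nested.append(i)
                break
    return nested


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    copies = [(0, 1000), (200, 500), (1200, 1800), (1300, 1400)]
    print(nested_copies(copies))  # [1, 3]
    print(percent(518, 6013))     # 8.6, matching the figure quoted above
```

The quadratic scan keeps the containment rule easy to read. For genome-scale annotation, a sort-and-sweep or an interval tree would be the usual choice.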
Affiliation(s)
- Hadi Quesneville
- Laboratoire Dynamique du Génome et Evolution, Institut Jacques Monod, Paris, France.
3152
Makeyev AV, Kim CB, Ruddle FH, Enkhmandakh B, Erdenechimeg L, Bayarsaihan D. HnRNP A3 genes and pseudogenes in the vertebrate genomes. J Exp Zool A Comp Exp Biol 2005; 303:259-71. [PMID: 15776420; DOI: 10.1002/jez.a.164] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
The hnRNP A/B type proteins are abundant nuclear factors that bind to Pol II transcripts and are involved in numerous RNA-related activities. To date, most data on the hnRNP A/B family have been obtained with recombinant proteins and cell cultures. Further characterization could come from examining the impact of various modifications in intact functional loci; however, such characterization is hampered by the presence of numerous and widely dispersed hnRNP A/B-related sequences in the mammalian genome. We found hnRNP A3, a little-studied member of the hnRNP A/B family, among candidate transcription factors that interact with the regulatory region of the Hoxc8 gene, and screened the human and mouse genomes for genes that encode hnRNP A3. We demonstrate that the sequence previously reported as the human hnRNP A3 gene (Accession number S63912), located on 10p11.1, is a processed pseudogene of the functional intron-containing locus HNRPA3, which we have identified on 2q31.2. We have also identified its murine orthologs on mouse chromosome 2D and rat chromosome 3q23. Alternative splice variants were identified affecting the N-terminus and the middle of hnRNP A3. Fourteen and 28 additional loci in the human and mouse genomes, respectively, were mapped and identified as hnRNP A3 processed pseudogenes. In addition, we found and compared hnRNP A3 orthologous genes in Gallus gallus, Xenopus tropicalis, and Danio rerio. The present in silico analysis is a necessary step toward further functional characterization of hnRNP A3.
Affiliation(s)
- Aleksandr V Makeyev
- Department of Genetics and Development, Columbia University, NYC, NY 10032, USA
3153
Magness CL, Fellin PC, Thomas MJ, Korth MJ, Agy MB, Proll SC, Fitzgibbon M, Scherer CA, Miner DG, Katze MG, Iadonato SP. Analysis of the Macaca mulatta transcriptome and the sequence divergence between Macaca and human. Genome Biol 2005; 6:R60. [PMID: 15998449; PMCID: PMC1175991; DOI: 10.1186/gb-2005-6-7-r60] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.9] [Received: 01/18/2005] [Revised: 04/04/2005] [Accepted: 05/23/2005] [Indexed: 11/17/2022] [Open access]
Abstract
We report the initial sequencing and comparative analysis of the Macaca mulatta transcriptome. Cloned sequences from 11 tissues, nine animals, and three species (M. mulatta, M. fascicularis, and M. nemestrina) were sampled, resulting in the generation of 48,642 sequence reads. These data represent an initial sampling of the putative rhesus orthologs for 6,216 human genes. Mean nucleotide diversity within M. mulatta and sequence divergence among M. fascicularis, M. nemestrina, and M. mulatta are also reported.
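The abstract reports mean nucleotide diversity but does not state which estimator was used. A common definition is π, the average proportion of differing sites over all pairs of aligned sequences. The sketch below computes that quantity under two assumptions of ours:

- The input is already aligned to equal length.
- Any site with a gap or ambiguous base in either member of a pair is skipped for that pair.

```python
from itertools import combinations
from typing import Sequence

VALID_BASES = set("ACGT")


def pairwise_p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping non-ACGT sites."""
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID_BASES and y in VALID_BASES:
            compared += 1
            differing += int(x != y)
    if compared == 0:
        raise ValueError("no comparable sites in this pair")
    return differing / compared


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean pairwise p-distance over all sequence pairs (an estimate of pi)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    distances = [pairwise_p_distance(a, b) for a, b in combinations(seqs, 2)]
    return sum(distances) / len(distances)
```

For example, `nucleotide_diversity(["ACGT", "ACGA", "ACGT"])` averages the pairwise values 0.25, 0 and 0.25, giving 1/6. Applying `pairwise_p_distance` to one sequence from each of two species gives a simple uncorrected divergence estimate. Published analyses often apply a substitution-model correction on top of this.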
Affiliation(s)
- Charles L Magness
- Illumigen Biosciences Inc., Suite 450, 2203 Airport Way South, Seattle, WA 98134, USA
- P Campion Fellin
- Illumigen Biosciences Inc., Suite 450, 2203 Airport Way South, Seattle, WA 98134, USA
- Matthew J Thomas
- Department of Microbiology, University of Washington, Seattle, WA 98195-8070, USA
- Marcus J Korth
- Department of Microbiology, University of Washington, Seattle, WA 98195-8070, USA
- Michael B Agy
- Washington National Primate Research Center, University of Washington, Seattle, WA 98195-8070, USA
- Sean C Proll
- Department of Microbiology, University of Washington, Seattle, WA 98195-8070, USA
- Matthew Fitzgibbon
- Department of Microbiology, University of Washington, Seattle, WA 98195-8070, USA
- Christina A Scherer
- Illumigen Biosciences Inc., Suite 450, 2203 Airport Way South, Seattle, WA 98134, USA
- Douglas G Miner
- Illumigen Biosciences Inc., Suite 450, 2203 Airport Way South, Seattle, WA 98134, USA
- Michael G Katze
- Department of Microbiology, University of Washington, Seattle, WA 98195-8070, USA
- Washington National Primate Research Center, University of Washington, Seattle, WA 98195-8070, USA
- Shawn P Iadonato
- Illumigen Biosciences Inc., Suite 450, 2203 Airport Way South, Seattle, WA 98134, USA
3154
Wernersson R, Schierup MH, Jørgensen FG, Gorodkin J, Panitz F, Stærfeldt HH, Christensen OF, Mailund T, Hornshøj H, Klein A, Wang J, Liu B, Hu S, Dong W, Li W, Wong GKS, Yu J, Wang J, Bendixen C, Fredholm M, Brunak S, Yang H, Bolund L. Pigs in sequence space: a 0.66X coverage pig genome survey based on shotgun sequencing. BMC Genomics 2005; 6:70. [PMID: 15885146; PMCID: PMC1142312; DOI: 10.1186/1471-2164-6-70] [Citation(s) in RCA: 244] [Impact Index Per Article: 12.2] [Received: 12/13/2004] [Accepted: 05/10/2005] [Indexed: 02/01/2023] [Open access]
Abstract
Background: Comparative whole-genome analysis of Mammalia can benefit from the addition of more species. The pig is an obvious choice due to its economic and medical importance as well as its evolutionary position in the artiodactyls.
Results: We have generated ~3.84 million shotgun sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP" and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human-mouse alignment, and the resulting three-species alignments were annotated using the human genome annotation. Ultra-conserved elements and miRNAs were identified. The results show that for each of these types of orthologous data, pig is much closer to human than mouse is. Purifying selection has been more efficient in pig than in human, but not as efficient as in mouse, and pig seems to have an isochore structure most similar to that of human.
Conclusion: Adding the pig to the set of species sequenced at low coverage improves understanding of the selective pressures that have acted on the human genome, because the pig bisects the evolutionary branch between human and mouse, with the mouse branch being approximately 3 times as long as the human branch. Additionally, the joint alignment of the shotgun sequences to the human-mouse alignment offers investigators a rapid way of defining specific regions for analysis and resequencing.
Affiliation(s)
- Rasmus Wernersson
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Mikkel H Schierup
- Bioinformatics Research Center, University of Aarhus, Aarhus, Denmark
- Frank G Jørgensen
- Bioinformatics Research Center, University of Aarhus, Aarhus, Denmark
- Jan Gorodkin
- Division of Genetics, The Royal Veterinary and Agricultural University, Copenhagen, Denmark
- Frank Panitz
- Department of Animal Breeding and Genetics, Danish Institute of Agricultural Sciences, Foulum, Denmark
- Hans-Henrik Stærfeldt
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Ole F Christensen
- Bioinformatics Research Center, University of Aarhus, Aarhus, Denmark
- Thomas Mailund
- Bioinformatics Research Center, University of Aarhus, Aarhus, Denmark
- Henrik Hornshøj
- Department of Animal Breeding and Genetics, Danish Institute of Agricultural Sciences, Foulum, Denmark
- Ami Klein
- Division of Genetics, The Royal Veterinary and Agricultural University, Copenhagen, Denmark
- Jun Wang
- Institute of Human Genetics, University of Aarhus, Aarhus, Denmark
- Beijing Genomics Institute, Beijing, China
- Bin Liu
- Beijing Genomics Institute, Beijing, China
- Wei Dong
- Beijing Genomics Institute, Beijing, China
- Wei Li
- Beijing Genomics Institute, Beijing, China
- Jun Yu
- Beijing Genomics Institute, Beijing, China
- Jian Wang
- Beijing Genomics Institute, Beijing, China
- Christian Bendixen
- Department of Animal Breeding and Genetics, Danish Institute of Agricultural Sciences, Foulum, Denmark
- Merete Fredholm
- Division of Genetics, The Royal Veterinary and Agricultural University, Copenhagen, Denmark
- Søren Brunak
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Lars Bolund
- Institute of Human Genetics, University of Aarhus, Aarhus, Denmark
- Beijing Genomics Institute, Beijing, China
3155
Bishop AL, Baker S, Jenks S, Fookes M, Gaora PO, Pickard D, Anjum M, Farrar J, Hien TT, Ivens A, Dougan G. Analysis of the hypervariable region of the Salmonella enterica genome associated with tRNA(leuX). J Bacteriol 2005; 187:2469-82. [PMID: 15774890; PMCID: PMC1065210; DOI: 10.1128/jb.187.7.2469-2482.2005] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 01/19/2023] [Open access]
Abstract
The divergence of Salmonella enterica and Escherichia coli is estimated to have occurred approximately 140 million years ago. Despite this evolutionary distance, the genomes of these two species still share extensive synteny and homology. However, there are significant differences between the two species in terms of genes putatively acquired via various horizontal transfer events. Here we report on the composition and distribution across the Salmonella genus of a chromosomal region designated SPI-10 in Salmonella enterica serovar Typhi and located adjacent to tRNA(leuX). We find that across the Salmonella genus the tRNA(leuX) region is a hypervariable hot spot for horizontal gene transfer; different isolates from the same S. enterica serovar can exhibit significant variation in this region. Many P4 phage, plasmid, and transposable element-associated genes are found adjacent to tRNA(leuX) in both Salmonella and E. coli, suggesting that these mobile genetic elements have played a major role in driving the variability of this region.
Affiliation(s)
- Anne L Bishop
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
3156
Abstract
Exon shuffling, a major mechanism of gene evolution, scrambles existing sequences to create new genes. However, is it possible for an exon to be created from scratch? Here we conduct a survey of rat and mouse genomes and identify 2302 putative rodent-specific exons absent from the human genome. Analysis of rodent transcripts supporting these exons indicates that over half appear to be alternatively spliced in genes orthologous between rodents and human. This study demonstrates the importance of sequencing genomes from multiple species to accurately document the evolution of gene structure.
Affiliation(s)
- Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, The Huck Institute for Life Sciences, and Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, PA 16802, USA.
3157
Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Church DM, DiCuccio M, Edgar R, Federhen S, Helmberg W, Kenton DL, Khovayko O, Lipman DJ, Madden TL, Maglott DR, Ostell J, Pontius JU, Pruitt KD, Schuler GD, Schriml LM, Sequeira E, Sherry ST, Sirotkin K, Starchenko G, Suzek TO, Tatusov R, Tatusova TA, Wagner L, Yaschenko E. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2005; 33:D39-45. [PMID: 15608222; PMCID: PMC540016; DOI: 10.1093/nar/gki062] [Citation(s) in RCA: 317] [Impact Index Per Article: 15.9] [Indexed: 11/13/2022] [Open access]
Abstract
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data retrieval systems and computational resources for the analysis of data in GenBank and other biological data made available through NCBI's website. NCBI resources include Entrez, Entrez Programming Utilities, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of the resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov.
Affiliation(s)
- David L Wheeler
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.
3158
Lee Y, Tsai J, Sunkara S, Karamycheva S, Pertea G, Sultana R, Antonescu V, Chan A, Cheung F, Quackenbush J. The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes. Nucleic Acids Res 2005; 33:D71-4. [PMID: 15608288; PMCID: PMC540018; DOI: 10.1093/nar/gki064] [Citation(s) in RCA: 131] [Impact Index Per Article: 6.6] [Indexed: 11/12/2022] [Open access]
Abstract
Although the list of completed genome sequencing projects has expanded rapidly, sequencing and analysis of expressed sequence tags (ESTs) remain a primary tool for discovery of novel genes in many eukaryotes and a key element in genome annotation. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi) are a collection of 77 species-specific databases that use a highly refined protocol to analyze gene and EST sequences in an attempt to identify and characterize expressed transcripts and to present them on the Web in a user-friendly, consistent fashion. A Gene Index database is constructed for each selected organism by first clustering, then assembling EST and annotated cDNA and gene sequences from GenBank. This process produces a set of unique, high-fidelity virtual transcripts, or tentative consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to genetic and physical maps, to provide links to orthologous and paralogous genes, and as a resource for comparative and functional genomic analysis.
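The clustering step described above, grouping EST and cDNA sequences before assembly, can be illustrated with a deliberately simple rule. Two sequences go into the same cluster whenever they share an exact k-mer, and clusters merge transitively through a union-find structure. This is an illustrative stand-in, not the TIGR protocol. Our assumptions are:

- the k-mer rule itself;
- the default value of `k`;
- the name `cluster_by_shared_kmers`.

The sketch does not handle reverse complements or the later assembly of clusters into TC sequences.

```python
from typing import Dict, List


def cluster_by_shared_kmers(seqs: Dict[str, str], k: int = 16) -> List[List[str]]:
    """Group sequences that share at least one exact k-mer, transitively (single linkage)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen: Dict[str, str] = {}  # k-mer -> first sequence that contained it
    for name, seq in seqs.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_seen:
                union(first_seen[kmer], name)
            else:
                first_seen[kmer] = name

    clusters: Dict[str, List[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

With `k=8`, the reads `"ACGTTGCAAGGCTTAC"` and `"GCAAGGCTTACCGTAA"` share `GCAAGGCT` and fall into one cluster, while `"TTTTGGGGCCCCAAAA"` stays alone. Exact shared k-mers are cheap to index. The rule also merges unrelated transcripts that share repeats or low-complexity sequence, which is why real pipelines mask repeats and demand longer, scored overlaps.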
Affiliation(s)
- Y Lee
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
3159
Larsson TP, Murray CG, Hill T, Fredriksson R, Schiöth HB. Comparison of the current RefSeq, Ensembl and EST databases for counting genes and gene discovery. FEBS Lett 2005; 579:690-8. [PMID: 15670830; DOI: 10.1016/j.febslet.2004.12.046] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Received: 11/15/2004] [Revised: 12/13/2004] [Accepted: 12/13/2004] [Indexed: 11/25/2022]
Abstract
Large amounts of refined sequence material in the form of predicted, curated and annotated genes and expressed sequence tags (ESTs) have recently been added to the NCBI databases. We matched the transcript sequences of RefSeq, Ensembl and dbEST in an attempt to provide an updated overview of how many unique human genes can be found. The results indicate that there are about 25,000 unique genes in the union of RefSeq and Ensembl, with 12-18% and 8-13% of the genes in each set unique to the other set, respectively. About 20% of all genes had splice variants. There is a considerable number of ESTs (2,200,000) that do not match the identified genes, and we used an in-house pipeline to identify 22 novel genes from Genscan predictions that have considerable EST coverage. The study provides insight into the current status of human gene catalogues and shows that considerable refinement of methods and datasets is needed to reach a conclusive gene count.
Affiliation(s)
- Thomas P Larsson
- Department of Neuroscience, Uppsala University, BMC Box 593, 751 24 Uppsala, Sweden.
3160
Perreault J, Noël JF, Brière F, Cousineau B, Lucier JF, Perreault JP, Boire G. Retropseudogenes derived from the human Ro/SS-A autoantigen-associated hY RNAs. Nucleic Acids Res 2005; 33:2032-41. [PMID: 15817567; PMCID: PMC1074747; DOI: 10.1093/nar/gki504] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022] [Open access]
Abstract
We report the characterization in the human genome of 966 pseudogenes derived from the four human Y (hY) RNAs, components of the Ro/SS-A autoantigen. About 95% of the Y RNA pseudogenes are found in corresponding locations on the chimpanzee and human chromosomes. In contrast, Y pseudogenes in mice are both infrequent and found in different genomic regions. In addition to this rodent/primate discrepancy, the conservation of hY pseudogenes relative to hY genes suggests that they arose after the rodent/primate divergence. Flanking regions of hY pseudogenes contain convincing evidence for involvement of the L1 retrotransposition machinery. Although Alu elements are found in close proximity to most hY pseudogenes, these are not chimeric retrogenes. Point mutations in hY RNA transcripts specifically affecting binding of the Ro60 protein likely contributed to their selection for direct trans retrotransposition. This represents a novel requirement for the selection of specific RNAs for genomic integration by the L1 retrotransposition machinery. Over 40% of the hY pseudogenes are found in intronic regions of protein-coding genes. Given the functions of proteins known to bind subsets of hY RNAs, hY pseudogenes constitute a new class of L1-dependent non-autonomous retroelements, potentially involved in post-transcriptional regulation of gene expression.
Affiliation(s)
- Jonathan Perreault
- RNA group/Groupe ARN, Université de Sherbrooke, Sherbrooke, Quebec, J1H 5N4, Canada
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Quebec, J1H 5N4, Canada
- Jean-François Noël
- RNA group/Groupe ARN, Université de Sherbrooke, Sherbrooke, Quebec, J1H 5N4, Canada
- Department of Microbiology and Infectiology, Faculty of Medicine, Université de Sherbrooke, Sherbrooke, Quebec, J1H 5N4, Canada
- Francis Brière
- RNA group/Groupe ARN, Université de Sherbrooke, Sherbrooke, Quebec, J1H 5N4, Canada
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Quebec, J1H 5N4, Canada
- Benoit Cousineau
- RNA group/Groupe ARN, Université de Sherbrooke, Sherbrooke, Quebec, J1H 5N4, Canada
- Department of Microbiology and Immunology, McGill University, 3775 University Street, Montréal, Quebec, H3A 2B4, Canada
- Jean-François Lucier
- RNA group/Groupe ARN, Université de Sherbrooke, Sherbrooke, Quebec, J1H 5N4, Canada
- Jean-Pierre Perreault
- RNA group/Groupe ARN, Université de Sherbrooke, Sherbrooke, Quebec, J1H 5N4, Canada
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Quebec, J1H 5N4, Canada
- Gilles Boire
- RNA group/Groupe ARN, Université de Sherbrooke, Sherbrooke, Quebec, J1H 5N4, Canada
- Department of Medicine, Université de Sherbrooke, Sherbrooke, Quebec, J1H 5N4, Canada
- To whom correspondence should be addressed. Tel: +1 819 564 5261; Fax: +1 819 564 5265;
3161
Boles KS, Barchet W, Diacovo T, Cella M, Colonna M. The tumor suppressor TSLC1/NECL-2 triggers NK-cell and CD8+ T-cell responses through the cell-surface receptor CRTAM. Blood 2005; 106:779-86. [PMID: 15811952; DOI: 10.1182/blood-2005-02-0817] [Citation(s) in RCA: 143] [Impact Index Per Article: 7.2] [Indexed: 12/13/2022] [Open access]
Abstract
The tumor suppressor in lung cancer-1 (TSLC1) gene is frequently silenced in human lung carcinomas, and its expression suppresses tumorigenesis in nude mice. TSLC1 encodes a cell-surface protein called Necl-2 that belongs to the Nectin and Nectin-like (Necl) family of molecules. Necl-2 mediates epithelial cell junctions by homotypic contacts and/or heterotypic interactions with other Nectins and Necls. Thus, it inhibits tumorigenesis by ensuring that epithelial cells grow in organized layers. Here, we demonstrate that natural killer (NK) cells and CD8+ T cells recognize Necl-2 through a receptor known as class I-restricted T-cell-associated molecule (CRTAM), which is expressed only on activated cells. CRTAM-Necl-2 interactions promote cytotoxicity of NK cells and interferon gamma (IFN-gamma) secretion of CD8+ T cells in vitro as well as NK cell-mediated rejection of tumors expressing Necl-2 in vivo. These results provide evidence for an additional mechanism of tumor suppression mediated by TSLC1 that involves cytotoxic lymphocytes. Furthermore, they reveal Necl-2 as one of the molecular targets that allows the immunosurveillance network to distinguish tumor cells from normal cells.
Affiliation(s)
- Kent S Boles
- Washington University School of Medicine, 660 S. Euclid, St Louis, MO 63110, USA
3162
Kirschbaum-Slager N, Parmigiani RB, Camargo AA, de Souza SJ. Identification of human exons overexpressed in tumors through the use of genome and expressed sequence data. Physiol Genomics 2005; 21:423-32. [PMID: 15784694; DOI: 10.1152/physiolgenomics.00237.2004] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022] [Open access]
Abstract
Alternative splicing is one of the major sources of the large transcriptional diversity found in human cells. Splicing variants have been shown to be associated with features such as spreading and progression in several human tumors. Therefore, such variants may be of great importance as both diagnostic and therapeutic tools. Here, using a set of criteria regarding the expression pattern of splicing variants together with statistical analyses, we screened the genome for exons overexpressed in tumors of specific tissues. However, as in other analyses attempting to identify tumor-associated variants, our list of candidates was seriously inflated with cases of genes differentially expressed in tumors. To exclude these cases and increase the probability of finding bona fide regulated splicing variants, we performed a serial analysis of gene expression (SAGE), excluding genes shown to be upregulated in tumors. This allowed us to predict the overexpression of single exons in specific tumors. Our final group of candidates includes 1,386 exons belonging to 638 genes. Experimental validation of a few candidates in normal tissue, tumor cell lines, and patient samples suggests that most of these candidates are indeed tumor-associated exons. Further functional classification of our candidate genes shows that our final list is slightly enriched in cancer-related genes.
3163
Abstract
Motivation: We introduce GMAP, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. Methodology underlying the program includes a minimal sampling strategy for genomic mapping, oligomer chaining for approximate alignment, sandwich dynamic programming (DP) for splice site detection, and microexon identification with statistical significance testing.
Results: On a set of human messenger RNAs with random mutations at 1% and 3% rates, GMAP identified all splice sites accurately in over 99.3% of the sequences, one-tenth the error rate of existing programs. On a large set of human expressed sequence tags, GMAP provided higher-quality alignments more often than BLAT did. On a set of Arabidopsis cDNAs, GMAP performed comparably to GeneSeqer. In these experiments, GMAP was several-fold faster than existing programs.
Availability: Source code for GMAP and associated programs is available at http://www.gene.com/share/gmap
Supplementary information: http://www.gene.com/share/gmap.
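To make "oligomer chaining for approximate alignment" concrete, the sketch below works in three steps:

1. Sample query k-mers at a fixed step.
2. Look each sampled k-mer up in a genome k-mer index.
3. Keep the longest chain of hits that increases in both query and genome coordinates, allowing large genomic gaps so a chain can jump an intron.

This is a generic seed-and-chain illustration written for this listing, not GMAP's actual sampling or chaining algorithm. The parameters `k`, `step` and `max_gap` and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[int, int]  # (query offset, genome offset) of a shared k-mer


def build_index(genome: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the genome to the list of its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def sampled_hits(query: str, index: Dict[str, List[int]], k: int, step: int) -> List[Hit]:
    """Look up one query k-mer every `step` bases and return all genome hits."""
    hits: List[Hit] = []
    for q in range(0, len(query) - k + 1, step):
        for g in index.get(query[q:q + k], []):
            hits.append((q, g))
    return hits


def best_chain(hits: List[Hit], max_gap: int) -> List[Hit]:
    """Longest chain of hits increasing in both coordinates (O(n^2) dynamic programming).

    Two hits may be linked only if the genome skips at most `max_gap` more
    bases than the query does between them, which lets a chain cross an intron.
    """
    hits = sorted(hits)
    if not hits:
        return []
    score = [1] * len(hits)
    prev = [-1] * len(hits)
    for j, (qj, gj) in enumerate(hits):
        for i in range(j):
            qi, gi = hits[i]
            extra_genome = (gj - gi) - (qj - qi)
            if qi < qj and gi < gj and extra_genome <= max_gap and score[i] + 1 > score[j]:
                score[j] = score[i] + 1
                prev[j] = i
    j = max(range(len(hits)), key=score.__getitem__)
    chain: List[Hit] = []
    while j != -1:
        chain.append(hits[j])
        j = prev[j]
    return chain[::-1]
```

The example uses a toy genome `"AAAAA" + "GATTACAGCT" + "T" * 10 + "CCGGAACGTA" + "AAAAA"` and the spliced query `"GATTACAGCT" + "CCGGAACGTA"`, with `k=4` and `step=2`. With `max_gap=20`, `best_chain` links all eight sampled hits. The genome offset jumps from 11 to 25 while the query advances only 4 bases, and that jump is where this sketch would place the 10-base intron. With `max_gap=5` the chain stops at the first exon.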
Affiliation(s)
- Thomas D Wu
- Department of Bioinformatics, Genentech, Inc., South San Francisco, CA 94080, USA.
3164
La Rota M, Kantety RV, Yu JK, Sorrells ME. Nonrandom distribution and frequencies of genomic and EST-derived microsatellite markers in rice, wheat, and barley. BMC Genomics 2005; 6:23. [PMID: 15720707; PMCID: PMC550658; DOI: 10.1186/1471-2164-6-23] [Citation(s) in RCA: 159] [Impact Index Per Article: 8.0] [Received: 07/27/2004] [Accepted: 02/18/2005] [Indexed: 11/23/2022] [Open access]
Abstract
Background: Earlier comparative maps between the genomes of rice (Oryza sativa L.), barley (Hordeum vulgare L.) and wheat (Triticum aestivum L.) were linkage maps based on cDNA-RFLP markers. The low number of polymorphic RFLP markers has limited the development of dense genetic maps in wheat and the number of available anchor points in comparative maps. Higher-density comparative maps using PCR-based anchor markers are necessary to better estimate the conservation of colinearity among cereal genomes. The purposes of this study were to characterize the proportion of transcribed DNA sequences containing simple sequence repeats (SSRs or microsatellites) by length and motif for wheat, barley and rice, and to determine in silico rice genome locations for primer sets developed for wheat and barley expressed sequence tags.
Results: The proportions of SSR types (di-, tri-, tetra-, and penta-nucleotide repeats) and motifs varied with the length of the SSRs within and among the three species, with trinucleotide SSRs being the most frequent. Distributions of genomic microsatellites (gSSRs), EST-derived microsatellites (EST-SSRs), and transcribed regions in the contiguous sequence of rice chromosome 1 were highly correlated. More than 13,000 primer pairs were developed for use by the cereal research community as potential markers in wheat, barley and rice.
Conclusion: Trinucleotide SSRs were the most common type in each of the species; however, the relative proportions of SSR types and motifs differed among rice, wheat, and barley. Genomic microsatellites were found to be primarily located in gene-rich regions of the rice genome. Microsatellite markers derived from non-redundant EST-SSRs are an economical and efficient alternative to RFLP for comparative mapping in cereals.
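The SSR classes in this abstract are di- to penta-nucleotide repeats, grouped by motif. Perfect repeats of this kind can be located with backreferencing regular expressions. The sketch below reports perfect repeats only. The minimum copy numbers in `MIN_COPIES` are illustrative assumptions, not the criteria used in the study. Motifs that are themselves repeats of a shorter unit are skipped, so that a CA repeat is not also reported as `CACA` and a homopolymer is not reported as `AA`.

```python
import re
from typing import List, Tuple

# Assumed minimum number of tandem copies per motif length (not taken from the study).
MIN_COPIES = {2: 6, 3: 5, 4: 4, 5: 4}


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repeat of a shorter unit ('ATAT' -> False, 'AGC' -> True)."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, motif, copies) for perfect di- to penta-nucleotide SSRs."""
    seq = seq.upper()
    found = []
    for unit, min_copies in MIN_COPIES.items():
        # A motif of `unit` bases followed by at least (min_copies - 1) exact repeats of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                copies = (match.end() - match.start()) // unit
                found.append((match.start(), match.end(), motif, copies))
    return sorted(found)
```

For `"GG" + "CA" * 8 + "TT" + "AGC" * 6 + "G"`, the sketch reports a dinucleotide SSR `CA` with 8 copies at positions 2 to 18. It also reports a trinucleotide SSR `AGC` with 6 copies at positions 20 to 38. Tallying the `motif` lengths over a set of ESTs gives the kind of type and motif proportions the study compares, though imperfect (interrupted) repeats would need a different search.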
Affiliation(s)
- Mauricio La Rota
- Department of Plant Breeding and Genetics, 240 Emerson Hall, Cornell University, Ithaca, NY, 14853, USA
- Ramesh V Kantety
- Department of Plant & Soil Science, 138 ARC Building, Alabama A&M University, Normal, AL, 35762, USA
- Ju-Kyung Yu
- Department of Plant Breeding and Genetics, 240 Emerson Hall, Cornell University, Ithaca, NY, 14853, USA
- Mark E Sorrells
- Department of Plant Breeding and Genetics, 240 Emerson Hall, Cornell University, Ithaca, NY, 14853, USA
3165
Ryman KD, Meier KC, Nangle EM, Ragsdale SL, Korneeva NL, Rhoads RE, MacDonald MR, Klimstra WB. Sindbis virus translation is inhibited by a PKR/RNase L-independent effector induced by alpha/beta interferon priming of dendritic cells. J Virol 2005; 79:1487-99. [PMID: 15650175 PMCID: PMC544143 DOI: 10.1128/jvi.79.3.1487-1499.2005] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.8] [Indexed: 01/13/2023]
Abstract
The tropism of Sindbis virus (SB) for cells of the dendritic cell (DC) lineage and the virulence of SB in vivo are largely determined by the efficacy of alpha/beta interferon (IFN-alpha/beta)-mediated antiviral responses. These responses are essentially intact in the absence of PKR and/or RNase L (K. D. Ryman, L. J. White, R. E. Johnston, and W. B. Klimstra, Viral Immunol. 15:53-76, 2002). In the present studies, we investigated the nature of the antiviral effects and the identity of the antiviral effectors primed by IFN-alpha/beta treatment of bone marrow-derived DCs (BMDCs) generated from mice deficient in PKR and RNase L (TD). IFN-alpha/beta priming exerted significant antiviral activity at very early stages of SB replication and most likely inhibited the initial translation of infecting genomes. The early effect targeted cap-dependent translation: protein synthesis from an SB-like RNA and from a simple RNA was inhibited by interferon treatment, whereas an element driven by an encephalomyocarditis virus internal ribosome entry site exhibited no inhibition. Phosphorylation of the alpha subunit of eukaryotic translation initiation factor 2 was defective after virus infection of TD cells, suggesting other mechanisms of translation inhibition. To identify components of these alternative antiviral pathway(s), we have compared global gene regulation in BMDCs derived from normal 129 Sv/Ev, IFNAR1-/-, and TD mice following infection with SB or treatment with IFN-alpha/beta. Candidate effectors of alternative antiviral pathways were those genes induced by virus infection or IFN-alpha/beta treatment in 129 Sv/Ev- and TD-derived BMDCs but not in virus-infected or IFN-alpha/beta-treated IFNAR1-/- cells. Statistical analyses of gene array data identified 44 genes that met these criteria; these genes are discussed.
Affiliation(s)
- K D Ryman
- Department of Microbiology and Immunology, Center for Molecular and Tumor Virology, Louisiana State University Health Sciences Center, 1501 Kings Hwy., Shreveport, LA 71130-3932, USA.
3166
Song J, Xu Y, White S, Miller KWP, Wolinsky M. SNPsFinder--a web-based application for genome-wide discovery of single nucleotide polymorphisms in microbial genomes. Bioinformatics 2005; 21:2083-4. [PMID: 15691853 DOI: 10.1093/bioinformatics/bti176] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation in closely related microbial species, strains or isolates. Some SNPs confer selective advantages on microbial pathogens during infection, and many others are powerful genetic markers for distinguishing closely related strains or isolates that could not be distinguished otherwise. To facilitate SNP discovery in microbial genomes, we have developed a web-based application, SNPsFinder, for genome-wide identification of SNPs. SNPsFinder takes multiple genome sequences as input to identify SNPs within homologous regions. It can also take contig sequences and sequence quality scores from ongoing sequencing projects for SNP prediction. If genome sequence annotation is available, SNPsFinder uses it to map the predicted SNP regions to known genes or regions, to assist further evaluation of the functional significance of the predicted SNPs. SNPsFinder can generate PCR primers for all predicted SNP regions according to user-specified parameters to facilitate experimental validation. The results from SNPsFinder analysis are accessible through the World Wide Web. AVAILABILITY The SNPsFinder program is available at http://snpsfinder.lanl.gov/. SUPPLEMENTARY INFORMATION The user's manual is available at http://snpsfinder.lanl.gov/UsersManual/
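A minimal sketch of the core comparison, assuming two homologous regions have already been aligned without gaps (SNPsFinder's own alignment and primer-design steps are not reproduced here). The per-base Phred quality list and the cutoff of 20 are illustrative assumptions standing in for the contig quality scores mentioned above.

```python
from typing import List, Optional, Tuple


def call_snps(ref: str, query: str, qual: Optional[List[int]] = None,
              min_qual: int = 20) -> List[Tuple[int, str, str]]:
    """Return (position, ref_base, query_base) for mismatching columns.

    `ref` and `query` are assumed to be an ungapped alignment of equal length;
    `qual` (optional) holds per-base quality scores for `query`.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to equal length")
    snps = []
    for i, (r, q) in enumerate(zip(ref.upper(), query.upper())):
        if r == q or "N" in (r, q) or "-" in (r, q):
            continue  # identical, ambiguous, or gap columns are not SNP calls
        if qual is not None and qual[i] < min_qual:
            continue  # low-confidence base, e.g. from an unfinished contig
        snps.append((i, r, q))
    return snps
```

Filtering on quality before reporting mirrors the reason quality scores matter for draft assemblies: a sequencing error and a true SNP look identical in the alignment itself.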
Affiliation(s)
- Jian Song
- Bioscience Division, Los Alamos National Laboratory, NM 87545, USA
3167
Li M, Ma B, Kisman D, Tromp J. PatternHunter II: highly sensitive and fast homology search. J Bioinform Comput Biol 2005; 2:417-39. [PMID: 15359419 DOI: 10.1142/s0219720004000661] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.1] [Received: 07/17/2003] [Revised: 01/29/2003] [Accepted: 01/31/2004] [Indexed: 11/18/2022]
Abstract
Extending the single optimized spaced seed of PatternHunter [20] to multiple seeds, PatternHunter II simultaneously remedies the lack of sensitivity of BLASTN and the lack of speed of Smith-Waterman for homology search. At BLASTN speed, PatternHunter II approaches Smith-Waterman sensitivity, bringing homology search methodology research full circle.
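The spaced-seed idea can be sketched as follows. An ungapped alignment is written as a 0/1 string (1 = identical column), and a seed is a 0/1 mask whose 1 positions must all fall on matches; with several seeds, a region is detected if any one of them hits. The weight-11 seed `111010010100110111` is the one published for the original PatternHunter; pairing it with a contiguous 11-mer is an illustrative assumption, not the seed set actually optimized for PatternHunter II.

```python
from typing import Iterable

PH_SEED = "111010010100110111"  # weight-11 spaced seed of the original PatternHunter
CONTIG_11 = "1" * 11            # BLASTN-style contiguous word, used here for contrast


def seed_hits(seed: str, alignment: str) -> bool:
    """True if the seed mask lands entirely on matches at some offset.

    alignment: '1' = identical column, '0' = mismatch (ungapped alignment).
    seed: '1' = position that must match, '0' = don't-care position.
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(alignment) - len(seed) + 1):
        if all(alignment[start + i] == "1" for i in care):
            return True
    return False


def any_seed_hits(seeds: Iterable[str], alignment: str) -> bool:
    """Multiple-seed detection: the region is found if any single seed hits."""
    return any(seed_hits(s, alignment) for s in seeds)
```

Two small alignments show why combining seeds helps: the spaced seed detects an alignment whose matches follow its own pattern with no run of 11, while the contiguous seed detects `"1" * 11 + "0" * 7`, whose matches are all bunched at one end and which the spaced seed misses.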
Affiliation(s)
- Ming Li
- Department of Computer Science, University of Waterloo, Waterloo, ON, Canada N2L 3G.
3168
Abril JF, Castelo R, Guigó R. Comparison of splice sites in mammals and chicken. Genome Res 2005; 15:111-9. [PMID: 15590946 PMCID: PMC540285 DOI: 10.1101/gr.3108805] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.2] [Received: 08/04/2004] [Accepted: 11/11/2004] [Indexed: 01/02/2023]
Abstract
We have carried out an initial analysis of the dynamics of the recent evolution of splice-site sequences on a large collection of human, rodent (mouse and rat), and chicken introns. Our results indicate that the sequences of splice sites are largely homogeneous within Tetrapoda. We have also found that orthologous splice signals between human and rodents and within rodents are more conserved than unrelated splice sites, but the additional conservation can be explained mostly by background intron conservation. In contrast, additional conservation over background is detectable in orthologous mammalian and chicken splice sites. Our results also indicate that the U2 and U12 intron classes seem to have evolved independently since the split of mammals and birds; we have not been able to find a convincing case of interconversion between these two classes in our collections of orthologous introns. Similarly, we have not found a single case of switching between AT-AC and GT-AG subtypes within U12 introns, suggesting that this event has been rare in recent evolutionary times. Switching between GT-AG and the noncanonical GC-AG U2 subtypes, on the contrary, does not appear to be unusual; in particular, T to C mutations appear to be relatively well tolerated in GT-AG introns with very strong donor sites.
Affiliation(s)
- Josep F Abril
- Grup de Recerca en Informàtica Biomèdica, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, and Programa de Bioinformàtica i Genòmica, Centre de Regulació Genòmica, C/ Dr. Aiguader 80, E-08003 Barcelona, Catalonia, Spain
3169
Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T, Smith SF, North P, Callaway H, Kelly K, Walter K, Abnizova I, Gilks W, Edwards YJK, Cooke JE, Elgar G. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol 2005; 3:e7. [PMID: 15630479 PMCID: PMC526512 DOI: 10.1371/journal.pbio.0030007] [Citation(s) in RCA: 685] [Impact Index Per Article: 34.3] [Received: 07/30/2004] [Accepted: 10/21/2004] [Indexed: 02/06/2023]
Abstract
In addition to protein-coding sequence, the human genome contains a significant amount of regulatory DNA, the identification of which is proving somewhat recalcitrant to both in silico and functional methods. An approach that has been used with some success is comparative sequence analysis, whereby equivalent genomic regions from different organisms are compared in order to identify both similarities and differences. In general, similarities in sequence between highly divergent organisms imply functional constraint. We have used a whole-genome comparison between humans and the pufferfish, Fugu rubripes, to identify nearly 1,400 highly conserved non-coding sequences. Given the evolutionary divergence between these species, it is likely that these sequences are found in, and furthermore are essential to, all vertebrates. Most, and possibly all, of these sequences are located in and around genes that act as developmental regulators. Some of these sequences are over 90% identical across more than 500 bases, being more highly conserved than coding sequence between these two species. Despite this, we cannot find any similar sequences in invertebrate genomes. To begin functionally testing this set of sequences, we have used a rapid in vivo assay system using zebrafish embryos that allows tissue-specific enhancer activity to be identified. Functional data are presented for highly conserved non-coding sequences associated with four unrelated developmental regulators (SOX21, PAX6, HLXB9, and SHH), in order to demonstrate the suitability of this screen for a wide range of genes and expression patterns. Of 25 sequence elements tested around these four genes, 23 show significant enhancer activity in one or more tissues. We have identified a set of non-coding sequences that are highly conserved throughout vertebrates.
They are found in clusters across the human genome, principally around genes that are implicated in the regulation of development, including many transcription factors. These highly conserved non-coding sequences are likely to form part of the genomic circuitry that uniquely defines vertebrate development.
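The conservation figures quoted above (over 90% identity across more than 500 bases) suggest a simple sliding-window scan over a pairwise alignment. The sketch below is an illustration under stated assumptions rather than the authors' pipeline: it expects two already-aligned sequences of equal length, counts identical non-gap columns in each window with a running sum, and merges overlapping windows that pass the identity cutoff.

```python
from typing import List, Tuple


def conserved_intervals(a: str, b: str, window: int = 500,
                        min_identity: float = 0.90) -> List[Tuple[int, int]]:
    """Half-open alignment intervals covered by windows with identity >= min_identity."""
    if len(a) != len(b):
        raise ValueError("expected two aligned sequences of equal length")
    if len(a) < window:
        return []
    # 1 where both sequences share the same base, 0 for mismatches and gap columns
    match = [1 if x == y and x != "-" else 0 for x, y in zip(a.upper(), b.upper())]
    intervals: List[List[int]] = []
    count = sum(match[:window])  # identities in the first window
    for start in range(len(match) - window + 1):
        if start > 0:
            # slide one column: add the entering column, drop the leaving one
            count += match[start + window - 1] - match[start - 1]
        if count / window >= min_identity:
            end = start + window
            if intervals and start <= intervals[-1][1]:
                intervals[-1][1] = end  # touches or overlaps the previous window: extend
            else:
                intervals.append([start, end])
    return [(s, e) for s, e in intervals]
```

The running sum keeps the scan linear in alignment length, which matters when the input is a whole-genome alignment rather than a single locus.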
Affiliation(s)
- Adam Woolfe
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Martin Goodson
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Debbie K Goode
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Phil Snell
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Gayle K McEwen
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Tanya Vavouri
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Sarah F Smith
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Phil North
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Heather Callaway
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Krys Kelly
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Klaudia Walter
- Medical Research Council Biostatistics Unit, Institute of Public Health, Addenbrooke's Hospital, Cambridge, United Kingdom
- Irina Abnizova
- Medical Research Council Biostatistics Unit, Institute of Public Health, Addenbrooke's Hospital, Cambridge, United Kingdom
- Walter Gilks
- Medical Research Council Biostatistics Unit, Institute of Public Health, Addenbrooke's Hospital, Cambridge, United Kingdom
- Yvonne J. K Edwards
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Julie E Cooke
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Greg Elgar
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
3170
Brown AC, Kai K, May ME, Brown DC, Roopenian DC. ExQuest, a novel method for displaying quantitative gene expression from ESTs. Genomics 2004; 83:528-39. [PMID: 14962679 DOI: 10.1016/j.ygeno.2003.09.012] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Received: 05/13/2003] [Accepted: 09/08/2003] [Indexed: 12/21/2022]
Abstract
There is a pressing need for interactive bioinformatics tools that empower investigators with the means to extract information and organize it in a simplified but meaningful format. A wealth of mammalian gene expression data is readily accessible, much of which is based on expressed sequence tags (ESTs). Many mammalian ESTs are derived from tissue-specific cDNA libraries in which the number of ESTs representing a specific gene approximates the transcriptional expression level in the source tissue. Our program ExQuest (Expressional Quantification of ESTs) organizes the public EST database (dbEST) into hierarchical tissue classes and reports tissue or developmental gene expression patterns for both mRNA and genomic sequences. ExQuest also displays tissue expression patterns of genes in the context of assembled chromosomes. These interactive "transcriptome" maps provide a novel tool for investigating the genomic basis of gene expression as well as prioritizing candidate genes within genetically mapped mutant and quantitative trait loci.
Affiliation(s)
- Aaron C Brown
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
3171
Sémon M, Mouchiroud D, Duret L. Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance. Hum Mol Genet 2004; 14:421-7. [PMID: 15590696 DOI: 10.1093/hmg/ddi038] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.4] [Indexed: 11/14/2022]
Abstract
Mammalian chromosomes are characterized by large-scale variations of DNA base composition (the so-called isochores). Contradicting previous studies, Lercher et al. (Hum. Mol. Genet., 12, 2411, 2003) recently reported a strong correlation between gene expression breadth and GC-content, suggesting that there might be a selective pressure favoring the concentration of housekeeping genes in GC-rich isochores. We reassessed this issue by examining in human and mouse the correlation between gene expression and GC-content, using different measures of gene expression (EST, SAGE and microarray) and different measures of GC-content. We show that correlations between GC-content and expression are very weak, and may vary according to the method used to measure expression. Such weak correlations have a very low predictive value. The strong correlations reported by Lercher et al. (2003) arise because they measured variables over windows of neighboring genes, and we show here that using gene windows artificially enhances the correlation. The assertion that the expression of a given gene depends on the GC-content of the region where it is located is therefore not supported by the data.
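Why averaging over windows of neighboring genes can inflate a weak relationship is easy to check with a toy simulation. Every parameter in the sketch below (region size, effect size, window width, random seed) is an arbitrary assumption, and the data are synthetic rather than the authors' EST, SAGE or microarray measurements: GC content has a shared regional component, expression depends only weakly on each gene's own GC content, and window averaging removes most of the gene-level noise.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def window_means(values, k):
    """Average over every run of k neighboring genes (sliding window along the chromosome)."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


def simulate(n_genes=20000, region_size=100, slope=0.2, k=20, seed=1):
    """Return (gene-level r, window-level r) for one simulated chromosome."""
    rng = random.Random(seed)
    gc, expr = [], []
    region_level = 0.0
    for i in range(n_genes):
        if i % region_size == 0:
            region_level = rng.gauss(0, 1)        # new isochore-like GC level
        g = region_level + rng.gauss(0, 1)        # gene GC content (arbitrary units)
        gc.append(g)
        expr.append(slope * g + rng.gauss(0, 1))  # weak gene-level effect plus noise
    return pearson(gc, expr), pearson(window_means(gc, k), window_means(expr, k))
```

With these settings the gene-level correlation stays weak (roughly 0.27 in expectation), while the correlation between window means is considerably higher, even though the per-gene relationship that generated the data is unchanged.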
Affiliation(s)
- Marie Sémon
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558 Université Claude Bernard Lyon 1, 16 rue Raphaël Dubois, 69622 Villeurbanne Cedex, France.
3172
Kellner WA, Sullivan RT, Carlson BH, Thomas JW. Uprobe: a genome-wide universal probe resource for comparative physical mapping in vertebrates. Genome Res 2004; 15:166-73. [PMID: 15590945 PMCID: PMC540286 DOI: 10.1101/gr.3066805] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.1] [Indexed: 11/25/2022]
Abstract
Interspecies comparisons are important for deciphering the functional content and evolution of genomes. The expansive array of >70 public vertebrate genomic bacterial artificial chromosome (BAC) libraries can provide a means of comparative mapping, sequencing, and functional analysis of targeted chromosomal segments that is independent of and complementary to whole-genome sequencing. However, at the present time, no complementary resource exists for the efficient targeted physical mapping of the majority of these BAC libraries. Universal overgo-hybridization probes, designed from regions of sequenced genomes that are highly conserved between species, have been demonstrated to be an effective resource for the isolation of orthologous regions from multiple BAC libraries in parallel. Here we report the application of the universal probe design principle across entire genomes, and the subsequent creation of a complementary probe resource, Uprobe, for screening vertebrate BAC libraries. Uprobe currently consists of whole-genome sets of universal overgo-hybridization probes designed for screening mammalian or avian/reptilian libraries. Retrospective analysis, experimental validation of the probe design process on a panel of representative BAC libraries, and estimates of probe coverage across the genome indicate that the majority of all eutherian and avian/reptilian genes or regions of interest can be isolated using Uprobe. In future, the universal probe design strategy will be applied to create an expanded number of whole-genome probe sets encompassing all vertebrate genomes.
Affiliation(s)
- Wendy A Kellner
- Emory University School of Medicine, Department of Human Genetics, Atlanta, Georgia 30322, USA
3173
Messina DN, Glasscock J, Gish W, Lovett M. An ORFeome-based analysis of human transcription factor genes and the construction of a microarray to interrogate their expression. Genome Res 2004; 14:2041-7. [PMID: 15489324 PMCID: PMC528918 DOI: 10.1101/gr.2584104] [Citation(s) in RCA: 111] [Impact Index Per Article: 5.3] [Indexed: 11/25/2022]
Abstract
Transcription factors (TFs) are essential regulators of gene expression, and mutated TF genes have been shown to cause numerous human genetic diseases. Yet to date, no single, comprehensive database of human TFs exists. In this work, we describe the collection of an essentially complete set of TF genes from one depiction of the human ORFeome, and the design of a microarray to interrogate their expression. Taking 1468 known TFs from TRANSFAC, InterPro, and FlyBase, we used this seed set to search the ScriptSure human transcriptome database for additional genes. ScriptSure's genome-anchored transcript clusters allowed us to work with a nonredundant high-quality representation of the human transcriptome. We performed a high-stringency BLASTN similarity search and a protein motif search of the human ORFeome using hidden Markov models of DNA-binding domains known to occur exclusively or primarily in TFs. Four hundred ninety-four additional TF genes were identified in the overlap between the two searches, bringing our estimate of the total number of human TFs to 1962. Zinc finger genes are by far the most abundant family (762 members), followed by homeobox (199 members) and basic helix-loop-helix genes (117 members). We designed a microarray of 50-mer oligonucleotide probes targeted to a unique region of the coding sequence of each gene. We have successfully used this microarray to interrogate TF gene expression in species as diverse as chickens and mice, as well as in humans.
Affiliation(s)
- David N Messina
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
3174
Locke DP, Jiang Z, Pertz LM, Misceo D, Archidiacono N, Eichler EE. Molecular evolution of the human chromosome 15 pericentromeric region. Cytogenet Genome Res 2004; 108:73-82. [PMID: 15545718 DOI: 10.1159/000080804] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.9] [Received: 11/18/2003] [Accepted: 12/09/2003] [Indexed: 11/19/2022]
Abstract
We present a detailed molecular evolutionary analysis of 1.2 Mb from the pericentromeric region of human 15q11. Sequence analysis indicates the region has been subject to extensive interchromosomal and intrachromosomal duplications during primate evolution. Comparative FISH analyses among non-human primates show remarkable quantitative and qualitative differences in the organization and duplication history of this region - including lineage-specific deletions and duplication expansions. Phylogenetic and comparative analyses reveal that the region is composed of at least 24 distinct segmental duplications or duplicons that have populated the pericentromeric regions of the human genome over the last 40 million years of human evolution. The value of combining both cytogenetic and experimental data in understanding the complex forces which have shaped these regions is discussed.
Affiliation(s)
- D P Locke
- Department of Genetics, Center for Computational Genomics, Case Western Reserve University School of Medicine and University Hospitals of Cleveland, Cleveland, OH, USA
3175
Abstract
Patterns of codon usage bias were studied in the moss model species Physcomitrella patens. A total of 92 nuclear, protein coding genes were employed, and estimated levels of gene expression were tested for association with two measures of codon usage bias and other variables hypothesized to be associated with gene expression. Codon bias was found to be positively associated both with estimated levels of gene expression and GC content in the coding parts of studied genes. However, GC content in noncoding parts, that is, introns and 5' and 3' untranslated regions (UTRs), was not associated with estimated levels of gene expression. It is argued that codon bias is not shaped by mutational bias, but rather by weak natural selection for translational efficiency in P. patens. The possible role of life history characteristics in shaping patterns of codon usage in this species is discussed.
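The abstract does not name its two codon usage bias measures, so the sketch below uses GC3 (G+C content at third codon positions) purely as an example of a composition-based codon statistic; it is an assumption, not necessarily either of the measures used in the study. The same `gc_fraction` helper can be applied to introns or UTRs to compare coding and noncoding GC content.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); 0.0 if there are none."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def gc3(cds: str) -> float:
    """GC content at third positions of the complete codons of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return gc_fraction(cds[2:usable:3])
```

Unlike stricter variants, this version does not exclude start, stop, or single-codon amino acids (Met, Trp), which carry no synonymous choice.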
Affiliation(s)
- H K Stenøien
- Plant Ecology/Department of Ecology and Evolution, Evolutionary Biology Centre, Uppsala University, Villav. 14, Uppsala SE-752 36, Sweden.
3176
Villesen P, Aagaard L, Wiuf C, Pedersen FS. Identification of endogenous retroviral reading frames in the human genome. Retrovirology 2004; 1:32. [PMID: 15476554 PMCID: PMC524368 DOI: 10.1186/1742-4690-1-32] [Citation(s) in RCA: 135] [Impact Index Per Article: 6.4] [Received: 09/22/2004] [Accepted: 10/11/2004] [Indexed: 01/16/2023]
Abstract
BACKGROUND Human endogenous retroviruses (HERVs) comprise a large class of repetitive retroelements. Except for the evolutionarily young HERV-K group, most HERVs are ancient and invaded our genome at least 25 million years ago. The vast majority of the encoded genes are degenerate due to mutational decay, and only a few non-HERV-K loci are known to retain intact reading frames. Additional intact HERV genes may exist, since retroviral reading frames have not been systematically annotated on a genome-wide scale. RESULTS By clustering hits from multiple BLAST searches using known retroviral sequences, we have mapped 1.1% of the human genome as retrovirus-related. The coding potential of all identified HERV regions was analyzed by annotating viral open reading frames (vORFs), and we report 7836 loci as verified by protein homology criteria. Among 59 intact or almost-intact viral polyproteins scattered around the human genome, we have found 29 envelope genes, including two novel gammaretroviral types. One encodes a protein similar to a recently discovered zebrafish retrovirus (ZFERV), while another shows partial, C-terminal homology to Syncytin (HERV-W/FRD). CONCLUSIONS This compilation of HERV sequences and their coding potential provides a useful tool for pursuing functional analysis such as RNA expression profiling and effects of viral proteins, which may, in turn, reveal a role for HERVs in human health and disease. All data are publicly available through a database at http://www.retrosearch.dk.
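Annotating viral reading frames starts from stretches of sequence that contain no stop codon. The sketch below scans the three forward frames for such stop-free stretches; the 100-codon minimum is an assumed default, the reverse strand would be handled by scanning the reverse complement, and it does not attempt the protein-homology verification described above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_free_stretches(seq: str, min_codons: int = 100):
    """Return (frame, start, end) for forward-strand stretches with no stop codon.

    Coordinates are 0-based and half-open; `end` is the start of the terminating
    stop codon, or the end of the last complete codon if the stretch runs off the sequence.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    hits.append((frame, start, i))
                start = i + 3  # the next stretch begins after the stop codon
        last = frame + max(0, len(seq) - frame) // 3 * 3
        if (last - start) // 3 >= min_codons:
            hits.append((frame, start, last))
    return hits
```

Scanning stop to stop rather than requiring an ATG start is a deliberate choice here: degenerate retroviral genes often lack a usable start codon while still retaining long open stretches.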
Affiliation(s)
- Palle Villesen
- Bioinformatics Research Center, University of Aarhus, Høegh-Guldbergs Gade 10, Bldg. 090, DK-8000 Aarhus, Denmark
- Lars Aagaard
- Bioinformatics Research Center, University of Aarhus, Høegh-Guldbergs Gade 10, Bldg. 090, DK-8000 Aarhus, Denmark
- Carsten Wiuf
- Bioinformatics Research Center, University of Aarhus, Høegh-Guldbergs Gade 10, Bldg. 090, DK-8000 Aarhus, Denmark
- Finn Skou Pedersen
- Department of Molecular Biology, University of Aarhus, C. F. Møllers Allé, Bldg. 130, DK-8000 Aarhus, Denmark
- Department of Medical Microbiology and Immunology, University of Aarhus, DK-8000 Aarhus, Denmark
3177
Tengs T, LaFramboise T, Den RB, Hayes DN, Zhang J, DebRoy S, Gentleman RC, O'Neill K, Birren B, Meyerson M. Genomic representations using concatenates of Type IIB restriction endonuclease digestion fragments. Nucleic Acids Res 2004; 32:e121. [PMID: 15329383 PMCID: PMC516078 DOI: 10.1093/nar/gnh120] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/15/2022]
Abstract
We have developed a method for genomic representation using Type IIB restriction endonucleases. Representation by concatenation of restriction digests, or RECORD, is an approach to sample the fragments generated by cleavage with these enzymes. Here, we show that the RECORD libraries may be used for digital karyotyping and for pathogen identification by computational subtraction.
Affiliation(s)
- Torstein Tengs
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
3178
Krüger J, Sczyrba A, Kurtz S, Giegerich R. e2g: an interactive web-based server for efficiently mapping large EST and cDNA sets to genomic sequences. Nucleic Acids Res 2004; 32:W301-4. [PMID: 15215398 PMCID: PMC441616 DOI: 10.1093/nar/gkh478] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
e2g is a web-based server which efficiently maps large expressed sequence tag (EST) and cDNA datasets to genomic DNA. It significantly extends the volume of data that can be mapped in reasonable time, and makes this improved efficiency available as a web service. Our server hosts large collections of EST sequences (e.g. 4.1 million mouse ESTs of 1.87 Gb) in precomputed indexed data structures for efficient sequence comparison. The user can upload a genomic DNA sequence of interest and rapidly compare this to the complete collection of ESTs on the server. This delivers a mapping of the ESTs on the genomic DNA. The e2g web interface provides a graphical overview of the mapping. Alignments of the mapped EST regions with parts of the genomic sequence are visualized. Zooming functions allow the user to interactively explore the results. Mapped sequences can be downloaded for further analysis. e2g is available on the Bielefeld University Bioinformatics Server at http://bibiserv.techfak.uni-bielefeld.de/e2g/.
Affiliation(s)
- Jan Krüger
- Technische Fakultät, Universität Bielefeld, D-33594 Bielefeld, Germany
3179
McGinnis S, Madden TL. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res 2004; 32:W20-5. [PMID: 15215342 PMCID: PMC441573 DOI: 10.1093/nar/gkh435] [Citation(s) in RCA: 1349] [Impact Index Per Article: 64.2] [Indexed: 11/14/2022]
Abstract
Basic Local Alignment Search Tool (BLAST) is one of the most heavily used sequence analysis tools available in the public domain. There is now a wide choice of BLAST algorithms that can be used to search many different sequence databases via the BLAST web pages (http://www.ncbi.nlm.nih.gov/BLAST/). All the algorithm-database combinations can be executed with default parameters or with customized settings, and the results can be viewed in a variety of ways. A new online resource, the BLAST Program Selection Guide, has been created to assist in the definition of search strategies. This article discusses optimal search strategies and highlights some BLAST features that can make your searches more powerful.
Affiliation(s)
- Scott McGinnis
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.
3180
Leipzig J, Pevzner P, Heber S. The Alternative Splicing Gallery (ASG): bridging the gap between genome and transcriptome. Nucleic Acids Res 2004; 32:3977-83. [PMID: 15292448 PMCID: PMC506815 DOI: 10.1093/nar/gkh731] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022]
Abstract
Alternative splicing essentially increases the diversity of the transcriptome and has important implications for physiology, development and the genesis of diseases. Conventionally, alternative splicing is investigated in a case-by-case fashion, but this becomes cumbersome and error-prone if genes have a large number of different splice variants. We use a different approach and integrate all transcripts derived from a gene into a single splicing graph. Each transcript corresponds to a path in the graph, and alternative splicing is displayed by bifurcations. This representation preserves the relationships between different splicing variants and allows us to systematically investigate all possible putative transcripts. We built a database of splicing graphs for human genes, using transcript information from various major sources (Ensembl, RefSeq, STACK, TIGR and UniGene). A Web interface allows users to display the splicing graphs, to interactively assemble transcripts and to access their sequences as well as neighboring genomic regions. We also provide for each gene an exhaustive pre-computed catalog of putative transcripts--in total more than 1.2 million sequences. We found that approximately 65% of the investigated genes show evidence for alternative splicing, and in 5% of the cases, a single gene might produce over 100 transcripts.
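A splicing graph of the kind described above can be sketched in a few lines. Here each transcript is assumed to be an ordered list of hypothetical exon identifiers (the ASG itself is built from genomic coordinates, which this sketch does not model); edges join consecutive exons, bifurcations are exons with more than one successor, and enumerating every source-to-sink path yields a catalog of putative transcripts, including exon combinations never observed together.

```python
from typing import Dict, List, Sequence, Set


def build_splicing_graph(transcripts: Sequence[Sequence[str]]) -> Dict[str, Set[str]]:
    """Merge transcripts (ordered exon IDs) into one graph: exon -> set of next exons."""
    graph: Dict[str, Set[str]] = {}
    for exons in transcripts:
        for exon in exons:
            graph.setdefault(exon, set())
        for a, b in zip(exons, exons[1:]):
            graph[a].add(b)  # splice junction a -> b
    return graph


def bifurcations(graph: Dict[str, Set[str]]) -> List[str]:
    """Exons followed by more than one alternative next exon."""
    return sorted(n for n, nxt in graph.items() if len(nxt) > 1)


def putative_transcripts(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Enumerate every source-to-sink path; the graph is assumed acyclic (exon order)."""
    targets = {b for nxt in graph.values() for b in nxt}
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if not graph[node]:
            paths.append(path)  # reached a terminal exon
            return
        for nxt in sorted(graph[node]):
            walk(nxt, path + [nxt])

    for source in sorted(n for n in graph if n not in targets):
        walk(source, [source])
    return paths
```

With two observed isoforms, `e1-e2-e4-e5` and `e1-e3-e4-e6`, the graph has bifurcations at `e1` and `e4` and yields four putative transcripts. The number of paths multiplies at each bifurcation, which illustrates how a single gene can give rise to a very large catalog.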
Affiliation(s)
- Jeremy Leipzig
- Department of Computer Science, College of Engineering, North Carolina State University, Raleigh, NC 27695-7566, USA
3181
Gotea V, Veeramachaneni V, Makałowski W. Mastering seeds for genomic size nucleotide BLAST searches. Nucleic Acids Res 2003; 31:6935-41. [PMID: 14627826 PMCID: PMC290255 DOI: 10.1093/nar/gkg886] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
One of the most common activities in bioinformatics is the search for similar sequences. These searches are usually carried out with the help of programs from the NCBI BLAST family. As the majority of searches are routinely performed with default parameters, a question that should be addressed is how reliable the results obtained using the default parameter values are, i.e. what fraction of potential matches have been retrieved by these searches. Our primary focus is on the initial hit parameter, also known as the seed or word, used by the NCBI BLASTn, MegaBLAST and other similar programs in searches for similar nucleotide sequences. We show that the use of default values for the initial hit parameter can have a big negative impact on the proportion of potentially similar sequences that are retrieved. We also show how the hit probability of different seeds varies with the minimum length and similarity of sequences desired to be retrieved and describe methods that help in determining appropriate seeds. The experimental results described in this paper illustrate situations in which these methods are most applicable and also show the relationship between the various BLAST parameters.
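To make the seed-sensitivity argument concrete, here is a minimal Python sketch that computes the probability that an ungapped alignment contains at least one exact word match of length W, which is the condition for a contiguous BLASTN-style seed to fire. It assumes every alignment column is an independent match with probability equal to the alignment's identity; real alignments contain gaps and correlated substitutions, and the paper's own methods may differ. The function name `seed_hit_probability` is hypothetical.

```python
def seed_hit_probability(length: int, identity: float, word: int) -> float:
    """Probability that an ungapped alignment of `length` columns contains at
    least one run of `word` consecutive matches, assuming each column is an
    independent match with probability `identity` (i.i.d. model)."""
    if word > length:
        return 0.0
    # state[k]: probability of no hit so far and a current trailing run of k matches
    state = [0.0] * word
    state[0] = 1.0
    hit = 0.0
    for _ in range(length):
        nxt = [0.0] * word
        for k, prob in enumerate(state):
            if prob == 0.0:
                continue
            nxt[0] += prob * (1.0 - identity)  # a mismatch resets the run
            if k + 1 == word:
                hit += prob * identity  # run reaches `word`: the seed fires
            else:
                nxt[k + 1] += prob * identity
        state = nxt
    return hit


# Compare word sizes for a 100-column alignment at 80% identity.
for w in (11, 16, 28):
    print(w, round(seed_hit_probability(100, 0.80, w), 3))
```

Longer words lower the hit probability for the same alignment, which is the sensitivity trade-off the abstract describes; 11 and 28 are the word sizes commonly used by default for BLASTN and MegaBLAST, respectively.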
Affiliation(s)
- Valer Gotea
- Institute of Molecular Evolutionary Genetics and Department of Biology, The Pennsylvania State University, 514 Mueller Lab, University Park, PA 16802, USA
3182
Carmel I, Tal S, Vig I, Ast G. Comparative analysis detects dependencies among the 5' splice-site positions. RNA 2004; 10:828-40. [PMID: 15100438 PMCID: PMC1370573 DOI: 10.1261/rna.5196404] [Citation(s) in RCA: 163] [Impact Index Per Article: 7.8] [Received: 09/29/2003] [Accepted: 02/04/2004] [Indexed: 05/24/2023]
Abstract
Human-mouse comparative genomics is an informative tool to assess sequence functionality as inferred from its conservation level. We used this approach to examine dependency among different positions of the 5' splice site. We compiled a data set of 50,493 homologous human-mouse internal exons and analyzed the frequency of changes among different positions of homologous human-mouse 5' splice-site pairs. We found mutual relationships between positions +4 and +5, +5 and +6, -2 and +5, and -1 and +5. We also demonstrated the association between the exonic and the intronic positions of the 5' splice site, in which a stronger interaction of U1 snRNA and the intronic portion of the 5' splice site compensates for weak interaction of U1 snRNA and the exonic portion of the 5' splice site, and vice versa. By using an ex vivo system that mimics the effect of mutation in the 5' splice site leading to familial dysautonomia, we demonstrated that U1 snRNA base-pairing with positions +6 and -1 is the only functional requirement for mRNA splicing of this 5' splice site. Our findings indicate the importance of U1 snRNA base-pairing to the exonic portion of the 5' splice site.
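One simple way to test whether changes at two splice-site positions are coupled is to cross-tabulate, over aligned human-mouse 5' splice-site pairs, whether each position differs between the species, and then compute a Pearson chi-square statistic on the resulting 2x2 table. The sketch below assumes 9-nt splice sites spanning positions -3 to +6 (with no position 0) and uses hypothetical function names; the abstract does not state which statistic the authors used, so this illustrates the general approach rather than their method.

```python
from typing import List, Sequence, Tuple

# Assumed layout: 3 exonic positions (-3..-1) then 6 intronic positions (+1..+6).
POSITIONS = [-3, -2, -1, 1, 2, 3, 4, 5, 6]
INDEX = {pos: i for i, pos in enumerate(POSITIONS)}


def change_table(pairs: Sequence[Tuple[str, str]], p: int, q: int) -> List[List[int]]:
    """2x2 counts: rows = changed at position p (0/1), columns = changed at q (0/1)."""
    table = [[0, 0], [0, 0]]
    i, j = INDEX[p], INDEX[q]
    for human, mouse in pairs:
        table[int(human[i] != mouse[i])][int(human[j] != mouse[j])] += 1
    return table


def chi_square_2x2(table: List[List[int]]) -> float:
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column leaves the statistic undefined
    return n * (a * d - b * c) ** 2 / denom


# Toy, made-up human/mouse pairs (not data from the paper).
pairs = [("CAGGTAAGT", "CAGGTAAGT"), ("CAGGTAAGT", "CAGGTACAT"),
         ("AAGGTAAGT", "AAGGTAAGT"), ("CAGGTAAGT", "CAGGTACTT")]
t = change_table(pairs, 4, 5)
print(t, chi_square_2x2(t))
```

In practice one would repeat this for every pair of positions and correct for multiple testing before calling any pair dependent.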
Affiliation(s)
- Ido Carmel
- Department of Human Genetics and Molecular Medicine, Sackler Faculty of Medicine, Tel Aviv University, Ramat Aviv 69978, Israel
3183
Galante PAF, Sakabe NJ, Kirschbaum-Slager N, de Souza SJ. Detection and evaluation of intron retention events in the human transcriptome. RNA 2004; 10:757-65. [PMID: 15100430 PMCID: PMC1370565 DOI: 10.1261/rna.5123504] [Citation(s) in RCA: 169] [Impact Index Per Article: 8.0] [Received: 07/08/2003] [Accepted: 01/26/2004] [Indexed: 05/21/2023]
Abstract
Alternative splicing is a very frequent phenomenon in the human transcriptome. There are four major types of alternative splicing: exon skipping, alternative 3' splice site, alternative 5' splice site, and intron retention. Here we present a large-scale analysis of intron retention in a set of 21,106 known human genes. We observed that 14.8% of these genes showed evidence of at least one intron retention event. Most of the events are located within the untranslated regions (UTRs) of human transcripts. For those retained introns interrupting the coding region, the GC content, codon usage, and the frequency of stop codons suggest that these sequences are under selection for coding potential. Furthermore, 26% of the introns within the coding region participate in the coding of a protein domain. A comparison with mouse shows that at least 22% of all informative examples of retained introns in human are also present in the mouse transcriptome. We discuss that the data we present suggest that a significant fraction of the observed events is not spurious and might reflect biological significance. The analyses also allowed us to generate a reliable set of intron retention events that can be used for the identification of splicing regulatory elements.
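The core coordinate test behind calling an intron retention event can be sketched in a few lines of Python: an intron of a reference transcript counts as retained in another transcript if a single exon of that transcript covers the whole intron. This is a simplified illustration under stated assumptions (both transcripts aligned to the same genomic strand, 1-based inclusive coordinates, hypothetical function names); the filtering criteria the authors applied are not described in the abstract and are not reproduced here.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates (assumed)


def introns(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons as (start, end) introns."""
    ordered = sorted(exons)
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
        if right_start - left_end > 1
    ]


def retained_introns(reference: List[Interval], candidate: List[Interval]) -> List[Interval]:
    """Introns of `reference` that lie entirely inside one exon of `candidate`."""
    return [
        (start, end)
        for start, end in introns(reference)
        if any(c_start <= start and end <= c_end for c_start, c_end in candidate)
    ]


spliced = [(100, 200), (301, 400), (501, 600)]
unspliced_first = [(100, 400), (501, 600)]  # first intron kept in the transcript
print(retained_introns(spliced, unspliced_first))
```

Classifying each retained intron as lying in a UTR or in the coding region would additionally require the CDS coordinates of the reference transcript.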
3184
Höng JC, Ivanov NV, Hodor P, Xia M, Wei N, Blevins R, Gerhold D, Borodovsky M, Liu Y. Identification of new human cadherin genes using a combination of protein motif search and gene finding methods. J Mol Biol 2004; 337:307-17. [PMID: 15003449 DOI: 10.1016/j.jmb.2004.01.026] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Received: 10/01/2003] [Revised: 01/09/2004] [Accepted: 01/14/2004] [Indexed: 12/14/2022]
Abstract
We have combined protein motif search and gene finding methods to identify genes encoding proteins containing specific domains. Particularly, we have focused on finding new human genes of the cadherin superfamily proteins, which represent a major group of cell-cell adhesion receptors contributing to embryonic neuronal morphogenesis. Models for three cadherin protein motifs were generated from over 100 already annotated cadherin domains and used to search the complete translated human genome. The genomic sequence regions containing motif "hits" were analyzed by eukaryotic GeneMark.hmm to identify the exon-intron structure of new genes. Three new genes CDH-J, PCDH-J and FAT-J were found. The predicted proteins PCDH-J and FAT-J were classified into protocadherin and FAT-like subfamilies, respectively, based on the number and organization of cadherin domains and presence of subfamily-specific conserved amino acid residues. Expression of FAT-J was shown in almost all tested tissues. The exon-intron organization of CDH-J was experimentally verified by PCR with specifically designed primers and its tissue-specific expression was demonstrated. The described methodology can be applied to discover new genes encoding proteins from families with well-characterized structural and functional domains.
Affiliation(s)
- Julia C Höng
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
3185
Imanishi T, Itoh T, Suzuki Y, O'Donovan C, Fukuchi S, Koyanagi KO, Barrero RA, Tamura T, Yamaguchi-Kabata Y, Tanino M, Yura K, Miyazaki S, Ikeo K, Homma K, Kasprzyk A, Nishikawa T, Hirakawa M, Thierry-Mieg J, Thierry-Mieg D, Ashurst J, Jia L, Nakao M, Thomas MA, Mulder N, Karavidopoulou Y, Jin L, Kim S, Yasuda T, Lenhard B, Eveno E, Suzuki Y, Yamasaki C, Takeda JI, Gough C, Hilton P, Fujii Y, Sakai H, Tanaka S, Amid C, Bellgard M, Bonaldo MDF, Bono H, Bromberg SK, Brookes AJ, Bruford E, Carninci P, Chelala C, Couillault C, de Souza SJ, Debily MA, Devignes MD, Dubchak I, Endo T, Estreicher A, Eyras E, Fukami-Kobayashi K, Gopinath GR, Graudens E, Hahn Y, Han M, Han ZG, Hanada K, Hanaoka H, Harada E, Hashimoto K, Hinz U, Hirai M, Hishiki T, Hopkinson I, Imbeaud S, Inoko H, Kanapin A, Kaneko Y, Kasukawa T, Kelso J, Kersey P, Kikuno R, Kimura K, Korn B, Kuryshev V, Makalowska I, Makino T, Mano S, Mariage-Samson R, Mashima J, Matsuda H, Mewes HW, Minoshima S, Nagai K, Nagasaki H, Nagata N, Nigam R, Ogasawara O, Ohara O, Ohtsubo M, Okada N, Okido T, Oota S, Ota M, Ota T, Otsuki T, Piatier-Tonneau D, Poustka A, Ren SX, Saitou N, Sakai K, Sakamoto S, Sakate R, Schupp I, Servant F, Sherry S, Shiba R, Shimizu N, Shimoyama M, Simpson AJ, Soares B, Steward C, Suwa M, Suzuki M, Takahashi A, Tamiya G, Tanaka H, Taylor T, Terwilliger JD, Unneberg P, Veeramachaneni V, Watanabe S, Wilming L, Yasuda N, Yoo HS, Stodolsky M, Makalowski W, Go M, Nakai K, Takagi T, Kanehisa M, Sakaki Y, Quackenbush J, Okazaki Y, Hayashizaki Y, Hide W, Chakraborty R, Nishikawa K, Sugawara H, Tateno Y, Chen Z, Oishi M, Tonellato P, Apweiler R, Okubo K, Wagner L, Wiemann S, Strausberg RL, Isogai T, Auffray C, Nomura N, Gojobori T, Sugano S. Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol 2004; 2:e162. [PMID: 15103394 PMCID: PMC393292 DOI: 10.1371/journal.pbio.0020162] [Citation(s) in RCA: 234] [Impact Index Per Article: 11.1] [Received: 12/19/2003] [Accepted: 04/01/2004] [Indexed: 01/08/2023]
Abstract
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
Affiliation(s)
- Tadashi Imanishi
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Takeshi Itoh
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Bioinformatics Laboratory, Genome Research Department, National Institute of Agrobiological Sciences, Ibaraki, Japan
- Yutaka Suzuki
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Claire O'Donovan
- EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Satoshi Fukuchi
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Roberto A Barrero
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Takuro Tamura
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- BITS Company, Shizuoka, Japan
- Yumi Yamaguchi-Kabata
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Motohiko Tanino
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Kei Yura
- Quantum Bioinformatics Group, Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Kyoto, Japan
- Satoru Miyazaki
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Kazuho Ikeo
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Keiichi Homma
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Arek Kasprzyk
- EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Tetsuo Nishikawa
- Reverse Proteomics Research Institute, Chiba, Japan
- Central Research Laboratory, Hitachi, Tokyo, Japan
- Mika Hirakawa
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Jean Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Centre National de la Recherche Scientifique (CNRS), Laboratoire de Physique Mathematique, Montpellier, France
- Danielle Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Centre National de la Recherche Scientifique (CNRS), Laboratoire de Physique Mathematique, Montpellier, France
- Jennifer Ashurst
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Libin Jia
- National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Mitsuteru Nakao
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Michael A Thomas
- Department of Biological Sciences, Idaho State University, Pocatello, Idaho, United States of America
- Nicola Mulder
- EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Youla Karavidopoulou
- EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Lihua Jin
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Sangsoo Kim
- Korea Research Institute of Bioscience and Biotechnology, Taejeon, Korea
- Boris Lenhard
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
- Eric Eveno
- Genexpress—CNRS—Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Yoshiyuki Suzuki
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Chisato Yamasaki
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Jun-ichi Takeda
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Craig Gough
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Phillip Hilton
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Yasuyuki Fujii
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Hiroaki Sakai
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Tokyo Research Laboratories, Kyowa Hakko Kogyo Company, Tokyo, Japan
- Susumu Tanaka
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Clara Amid
- MIPS—Institute for Bioinformatics, GSF—National Research Center for Environment and Health, Neuherberg, Germany
- Matthew Bellgard
- Centre for Bioinformatics and Biological Computing, School of Information Technology, Murdoch University, Murdoch, Western Australia, Australia
- Maria de Fatima Bonaldo
- Medical Education and Biomedical Research Facility, University of Iowa, Iowa City, Iowa, United States of America
- Hidemasa Bono
- Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Susan K Bromberg
- Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Anthony J Brookes
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
- Elspeth Bruford
- HUGO Gene Nomenclature Committee, University College London, London, United Kingdom
- Claude Chelala
- Genexpress—CNRS—Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Christine Couillault
- Genexpress—CNRS—Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Marie-Anne Debily
- Genexpress—CNRS—Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Inna Dubchak
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Toshinori Endo
- Department of Bioinformatics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Eduardo Eyras
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Kaoru Fukami-Kobayashi
- Bioresource Information Division, RIKEN BioResource Center, RIKEN Tsukuba Institute, Ibaraki, Japan
- Gopal R. Gopinath
- Genome Knowledgebase, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Esther Graudens
- Genexpress—CNRS—Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Yoonsoo Hahn
- Korea Research Institute of Bioscience and Biotechnology, Taejeon, Korea
- Michael Han
- MIPS—Institute for Bioinformatics, GSF—National Research Center for Environment and Health, Neuherberg, Germany
- Ze-Guang Han
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Chinese National Human Genome Center at Shanghai, Shanghai, China
- Kousuke Hanada
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Hideki Hanaoka
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Erimi Harada
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Katsuyuki Hashimoto
- Division of Genetic Resources, National Institute of Infectious Diseases, Tokyo, Japan
- Ursula Hinz
- Swiss Institute of Bioinformatics, Geneva, Switzerland
- Momoki Hirai
- Graduate School of Frontier Sciences, Department of Integrated Biosciences, University of Tokyo, Chiba, Japan
- Teruyoshi Hishiki
- Functional Genomics Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Ian Hopkinson
- Department of Primary Care and Population Sciences, Royal Free University College Medical School, University College London, London, United Kingdom
- Clinical and Molecular Genetics Unit, The Institute of Child Health, London, United Kingdom
- Sandrine Imbeaud
- Genexpress—CNRS—Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Hidetoshi Inoko
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Department of Genetic Information, Division of Molecular Life Science, School of Medicine, Tokai University, Kanagawa, Japan
- Alexander Kanapin
- EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Yayoi Kaneko
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Takeya Kasukawa
- Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Janet Kelso
- South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Paul Kersey
- EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Bernhard Korn
- RZPD Resource Center for Genome Research, Heidelberg, Germany
- Vladimir Kuryshev
- Molecular Genome Analysis, German Cancer Research Center-DKFZ, Heidelberg, Germany
- Izabela Makalowska
- Pennsylvania State University, University Park, Pennsylvania, United States of America
- Takashi Makino
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Shuhei Mano
- Department of Genetic Information, Division of Molecular Life Science, School of Medicine, Tokai University, Kanagawa, Japan
- Regine Mariage-Samson
- Genexpress—CNRS—Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Jun Mashima
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Hideo Matsuda
- Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
- Hans-Werner Mewes
- MIPS—Institute for Bioinformatics, GSF—National Research Center for Environment and Health, Neuherberg, Germany
- Shinsei Minoshima
- Medical Photobiology Department, Photon Medical Research Center, Hamamatsu University School of Medicine, Shizuoka, Japan
- Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
- Hideki Nagasaki
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Naoki Nagata
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Rajni Nigam
- Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Osamu Ogasawara
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Masafumi Ohtsubo
- Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
- Norihiro Okada
- Department of Biological Sciences, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Kanagawa, Japan
- Toshihisa Okido
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Satoshi Oota
- Bioresource Information Division, RIKEN BioResource Center, RIKEN Tsukuba Institute, Ibaraki, Japan
- Motonori Ota
- Global Scientific Information and Computing Center, Tokyo Institute of Technology, Tokyo, Japan
- Toshio Ota
- Tokyo Research Laboratories, Kyowa Hakko Kogyo Company, Tokyo, Japan
- Tetsuji Otsuki
- Molecular Biology Laboratory, Medicinal Research Laboratories, Taisho Pharmaceutical Company, Saitama, Japan
- Annemarie Poustka
- Molecular Genome Analysis, German Cancer Research Center-DKFZ, Heidelberg, Germany
- Shuang-Xi Ren
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Chinese National Human Genome Center at Shanghai, Shanghai, China
- Naruya Saitou
- Department of Population Genetics, National Institute of Genetics, Shizuoka, Japan
- Katsunaga Sakai
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Shigetaka Sakamoto
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Ryuichi Sakate
- Graduate School of Frontier Sciences, Department of Integrated Biosciences, University of Tokyo, Chiba, Japan
- Ingo Schupp
- Molecular Genome Analysis, German Cancer Research Center-DKFZ, Heidelberg, Germany
- Florence Servant
- EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Stephen Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Rie Shiba
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Nobuyoshi Shimizu
- Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
- Mary Shimoyama
- Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Bento Soares
- Medical Education and Biomedical Research Facility, University of Iowa, Iowa City, Iowa, United States of America
- Charles Steward
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Makiko Suwa
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Mami Suzuki
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Aiko Takahashi
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Gen Tamiya
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Department of Genetic Information, Division of Molecular Life Science, School of Medicine, Tokai University, Kanagawa, Japan
- Hiroshi Tanaka
- Department of Bioinformatics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Todd Taylor
- Human Genome Research Group, Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
| | - Joseph D Terwilliger
- 58Columbia University and Columbia Genome CenterNew York, New YorkUnited States of America
| | - Per Unneberg
- 59Department of Biotechnology, Royal Institute of TechnologyStockholmSweden
| | - Vamsi Veeramachaneni
- 48Pennsylvania State UniversityUniversity Park, PennsylvaniaUnited States of America
| | - Shinya Watanabe
- 3Human Genome Center, The Institute of Medical Science, The University of TokyoTokyoJapan
| | - Laurens Wilming
- 15The Wellcome Trust Sanger Institute, Wellcome Trust Genome CampusCambridgeUnited Kingdom
| | - Norikazu Yasuda
- 1Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and TechnologyTokyoJapan
- 7Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics ConsortiumTokyoJapan
| | - Hyang-Sook Yoo
- 18Korea Research Institute of Bioscience and BiotechnologyTaejeonKorea
| | - Marvin Stodolsky
- 60Biology Division and Genome Task Group, Office of Biological and Environmental Research, United States Department of EnergyWashington, D.CUnited States of America
| | - Wojciech Makalowski
- 48Pennsylvania State UniversityUniversity Park, PennsylvaniaUnited States of America
| | - Mitiko Go
- (61) Faculty of Bio-Science, Nagahama Institute of Bio-Science and Technology, Shiga, Japan
- Kenta Nakai
- (3) Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Toshihisa Takagi
- (3) Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Minoru Kanehisa
- (12) Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Yoshiyuki Sakaki
- (3) Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- (57) Human Genome Research Group, Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- John Quackenbush
- (62) Institute for Genomic Research, Rockville, Maryland, United States of America
- Yasushi Okazaki
- (26) Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Yoshihide Hayashizaki
- (26) Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Winston Hide
- (44) South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Ranajit Chakraborty
- (63) Center for Genome Information, Department of Environmental Health, University of Cincinnati, Cincinnati, Ohio, United States of America
- Ken Nishikawa
- (5) Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Hideaki Sugawara
- (5) Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Yoshio Tateno
- (5) Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Zhu Chen
- (21) Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- (37) Chinese National Human Genome Center at Shanghai, Shanghai, China
- (64) State Key Laboratory of Medical Genomics, Shanghai Institute of Hematology, Rui-Jin Hospital, Shanghai Second Medical University, Shanghai, China
- Peter Tonellato
- (65) PointOne Systems, Wauwatosa, Wisconsin, United States of America
- Rolf Apweiler
- (4) EMBL Outstation–European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Kousaku Okubo
- (5) Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- (40) Functional Genomics Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Lukas Wagner
- (13) National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Stefan Wiemann
- (47) Molecular Genome Analysis, German Cancer Research Center-DKFZ, Heidelberg, Germany
- Robert L Strausberg
- (16) National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Takao Isogai
- (10) Reverse Proteomics Research Institute, Chiba, Japan
- (66) Graduate School of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan
- Charles Auffray
- (20) Genexpress–CNRS–Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- (21) Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Nobuo Nomura
- (40) Functional Genomics Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Takashi Gojobori
- (1) Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- (5) Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- (67) Department of Genetics, Graduate University for Advanced Studies, Shizuoka, Japan
- Sumio Sugano
- (3) Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- (40) Functional Genomics Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- (68) Department of Medical Genome Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
3186
Osoegawa K, Zhu B, Shu CL, Ren T, Cao Q, Vessere GM, Lutz MM, Jensen-Seaman MI, Zhao S, de Jong PJ. BAC resources for the rat genome project. Genome Res 2004; 14:780-5. [PMID: 15060022 PMCID: PMC383325 DOI: 10.1101/gr.2033904] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.2] [Received: 09/30/2003] [Accepted: 12/28/2003] [Indexed: 02/07/2023]
Abstract
Two 11-fold redundant bacterial artificial chromosome (BAC) libraries (RPCI-32 and CHORI-230) have been constructed to support the rat genome project. The first library was constructed using a male Brown Norway (BN/SsNHsd) rat as a DNA source long before plans for rat genome sequencing had been launched. The second library was prepared from a highly inbred female (BN/SsNHsd/MCW) rat in support of the rat genome sequencing project. The use of an inbred rat strain is essential to avoid problems with genome assembly resulting from the difficulty of distinguishing haplotype variation from variation among duplicons. We have demonstrated the suitability of the libraries through a detailed quality assessment showing large insert sizes, a narrow size distribution, consistent redundancy for many markers, and long-range continuity of BAC contig maps. The widespread use of the two libraries as an integral part of the rat genome project has led to database annotation of many clones, providing rat researchers with a rich resource of BAC clones that can be screened in silico for genes of interest.
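The "11-fold redundant" figure can be interpreted through the standard Clarke-Carbon (Poisson) approximation. Under that approximation, the chance that a given locus is absent from a library of c-fold redundancy is e^(-c). The sketch below assumes clones are placed uniformly and independently across the genome, which real libraries only approximate because cloning bias is ignored. The function name is illustrative and does not come from the paper.

```python
import math


def prob_locus_missing(redundancy: float) -> float:
    """Clarke-Carbon/Poisson estimate of the probability that a given locus
    is covered by no clone in a library of the given fold redundancy.
    Assumes uniform, independent clone positions (no cloning bias)."""
    if redundancy < 0:
        raise ValueError("redundancy must be non-negative")
    return math.exp(-redundancy)


p_missing = prob_locus_missing(11.0)
print(f"P(locus absent, 11-fold library)  = {p_missing:.2e}")  # 1.67e-05
print(f"P(locus present, 11-fold library) = {1 - p_missing:.6f}")
```

Under this model, even an 11-fold library is expected to miss a small fraction of loci. In practice, uneven cloning efficiency usually pushes the real gap rate above the Poisson figure.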
Affiliation(s)
- Kazutoyo Osoegawa
- Children's Hospital and Research Center at Oakland, Oakland, California 94609, USA
3187
Silander K, Mohlke KL, Scott LJ, Peck EC, Hollstein P, Skol AD, Jackson AU, Deloukas P, Hunt S, Stavrides G, Chines PS, Erdos MR, Narisu N, Conneely KN, Li C, Fingerlin TE, Dhanjal SK, Valle TT, Bergman RN, Tuomilehto J, Watanabe RM, Boehnke M, Collins FS. Genetic variation near the hepatocyte nuclear factor-4 alpha gene predicts susceptibility to type 2 diabetes. Diabetes 2004; 53:1141-9. [PMID: 15047633 DOI: 10.2337/diabetes.53.4.1141] [Citation(s) in RCA: 204] [Impact Index Per Article: 9.7] [Indexed: 11/13/2022]
Abstract
The Finland-United States Investigation Of NIDDM Genetics (FUSION) study aims to identify genetic variants that predispose to type 2 diabetes by studying affected sibling pair families from Finland. Chromosome 20 showed our strongest initial evidence for linkage. It currently has a maximum logarithm of odds (LOD) score of 2.48 at 70 cM in a set of 495 families. In this study, we searched for diabetes susceptibility variant(s) at 20q13 by genotyping single nucleotide polymorphism (SNP) markers in case and control DNA pools. Of 291 SNPs successfully typed in a 7.5-Mb interval, the strongest association confirmed by individual genotyping was with SNP rs2144908, located 1.3 kb downstream of the primary beta-cell promoter P2 of hepatocyte nuclear factor-4 alpha (HNF4A). This SNP showed association with diabetes disease status (odds ratio [OR] 1.33, 95% CI 1.06-1.65, P = 0.011) and with several diabetes-related traits. Most of the evidence for linkage at 20q13 could be attributed to the families carrying the risk allele. We subsequently found nine additional associated SNPs spanning a 64-kb region, including the P2 and P1 promoters and exons 1-3. Our results and the independent observation of association of SNPs near the P2 promoter with diabetes in a separate study population of Ashkenazi Jewish origin suggest that variant(s) located near or within HNF4A increase susceptibility to type 2 diabetes.
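An allelic odds ratio with a 95% confidence interval, like the one reported above, can be computed from a 2x2 table of allele counts using Woolf's log-scale method. The counts below are hypothetical and chosen only for illustration. The FUSION analysis may have used a different estimator, for example one that accounts for family structure, so treat this as a sketch of the arithmetic rather than a reproduction of the study.

```python
import math
from typing import Tuple


def allelic_odds_ratio(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale Wald) confidence interval.

    Arguments are allele counts from a 2x2 table; all must be positive.
    Returns (odds_ratio, lower, upper)."""
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(counts) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) is sqrt(1/a + 1/b + 1/c + 1/d).
    se_log = math.sqrt(sum(1.0 / c for c in counts))
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


# Hypothetical allele counts (cases: risk/other, controls: risk/other).
or_, lower, upper = allelic_odds_ratio(200, 100, 150, 100)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With these made-up counts the point estimate is 1.33, but the interval spans 1.0. This shows that the same odds ratio can be non-significant when the sample is small.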
Affiliation(s)
- Kaisa Silander
- Genome Technology Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
3188
Kho AT, Zhao Q, Cai Z, Butte AJ, Kim JYH, Pomeroy SL, Rowitch DH, Kohane IS. Conserved mechanisms across development and tumorigenesis revealed by a mouse development perspective of human cancers. Genes Dev 2004; 18:629-40. [PMID: 15075291 PMCID: PMC387239 DOI: 10.1101/gad.1182504] [Citation(s) in RCA: 129] [Impact Index Per Article: 6.1] [Received: 12/30/2003] [Accepted: 02/25/2004] [Indexed: 11/24/2022]
Abstract
Identification of common mechanisms underlying organ development and primary tumor formation should yield new insights into tumor biology and facilitate the generation of relevant cancer models. We have developed a novel method to project the gene expression profiles of medulloblastomas (MBs)--human cerebellar tumors--onto a mouse cerebellar development sequence: postnatal days 1-60 (P1-P60). Genomically, human medulloblastomas were closest to mouse P1-P10 cerebella, and normal human cerebella were closest to mouse P30-P60 cerebella. Furthermore, metastatic MBs were highly associated with mouse P5 cerebella, suggesting that a clinically distinct subset of tumors is identifiable by molecular similarity to a precise developmental stage. Genewise, down- and up-regulated MB genes segregate to late and early stages of development, respectively. Comparable results for human lung cancer vis-a-vis the developing mouse lung suggest the generalizability of this multiscalar developmental perspective on tumor biology. Our findings indicate both a recapitulation of tissue-specific developmental programs in diverse solid tumors and the utility of tumor characterization on the developmental time axis for identifying novel aspects of clinical and biological behavior.
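The idea of placing each tumor on a developmental time axis can be illustrated by assigning a profile to the reference stage it correlates with most strongly. This is a deliberately simplified stand-in: Kho et al. describe their own projection method, which is not reproduced here. The four-gene "profiles", stage values, and function names below are invented for illustration.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def closest_stage(profile: List[float], stages: Dict[str, List[float]]) -> str:
    """Return the developmental stage whose profile is most correlated."""
    return max(stages, key=lambda stage: pearson(profile, stages[stage]))


# Invented log-expression values for four genes: two early-high, two late-high.
mouse_stages = {
    "P1": [9.0, 8.5, 2.0, 1.5],
    "P10": [7.0, 7.5, 4.0, 3.0],
    "P30": [4.0, 4.5, 6.5, 6.0],
    "P60": [2.0, 2.5, 8.5, 9.0],
}
tumor = [8.8, 8.0, 2.5, 1.0]
normal = [2.2, 2.0, 8.0, 9.5]
print(closest_stage(tumor, mouse_stages), closest_stage(normal, mouse_stages))  # P1 P60
```

Correlation ignores overall expression level and compares only the shape of a profile. That makes it a convenient, if crude, way to match samples measured on different arrays or in different species.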
Affiliation(s)
- Alvin T Kho
- Children's Hospital Informatics Program, Children's Hospital Boston, MA 02115, USA
3189
Riley DE, Krieger JN. Simple repeat replacements support similar functions of distinct repeats in inter-species mRNA homologs. Gene 2004; 328:17-24. [PMID: 15019980 DOI: 10.1016/j.gene.2003.12.036] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 09/12/2003] [Revised: 12/09/2003] [Accepted: 12/30/2003] [Indexed: 10/26/2022]
Abstract
Simple repeats are ubiquitous in metazoan genomes, but their function has remained elusive. We reported that distinct, short tandem repeats (STRs) were coupled with rigorous polarity and register, suggesting order in simple repeat usage. Several STRs that lacked internal, canonical base pairs were associated with mRNAs encoding membrane functions and transcription factors. We hypothesized that diverse, simple repeats, such as (AC)n, (GU)n, (AG)n, (CU)n and (CUUU)n, had similar functions. The hypothesis predicts that closely related mRNAs would sometimes exhibit STR replacements. Comparing aquaporin 3 mRNAs in rodents and humans, (GU)20 replaced (AG)29. Comparing biglycan mRNAs, (GU)25 replaced (CA)12. Comparing immunoglobulin superfamily member 9 mRNAs, the STR-couple (CU)17(GU)9 replaced the STR-couple (GU)19(GC)4. Comparing tumor necrosis factor receptor-21 mRNAs, (GU)24 replaced (CUUU)16. In a collection of 52 rodent-H. sapiens homologous mRNAs that had STRs, six (11.5%) STR-STR replacements occurred, significantly more than expected based on an STR frequency of 0.13% in all reported UTRs (p<0.001). Database studies and the observation of STR replacements among transcript homologs independently support the hypothesis that diverse repeat sequences, such as (AG)n, (AC)n, (GU)n, (CU)n and (CUUU)n, have similar usage that is consistent with analogous functions.
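The significance claim can be sanity-checked with an exact binomial tail. The check treats each of the 52 homolog pairs as an independent trial whose chance of showing an STR-STR replacement equals the 0.13% STR frequency. That null model is an assumption made here; the abstract does not state which test the authors used.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_pairs, observed, p_str = 52, 6, 0.0013
print(f"observed rate  = {observed / n_pairs:.1%}")  # 11.5%
print(f"expected count = {n_pairs * p_str:.3f}")
print(f"P(X >= {observed}) = {binom_upper_tail(observed, n_pairs, p_str):.1e}")
```

Under this null, fewer than 0.07 replacements would be expected among 52 pairs. Six observed replacements therefore give a tail probability far below the reported p<0.001 threshold.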
Affiliation(s)
- Donald E Riley
- Department of Urology, University of Washington, Seattle, WA 98195, USA.
3190
Abildgaard L, Ramsing NB, Finster K. Characterization of the marine propionate-degrading, sulfate-reducing bacterium Desulfofaba fastidiosa sp. nov. and reclassification of Desulfomusa hansenii as Desulfofaba hansenii comb. nov. Int J Syst Evol Microbiol 2004; 54:393-9. [PMID: 15023950 DOI: 10.1099/ijs.0.02820-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
A rod-shaped, slightly curved sulfate reducer, designated strain P2T, was isolated from the sulfate–methane transition zone of a marine sediment. Cells were motile by means of a single polar flagellum. The strain reduced sulfate, thiosulfate and sulfite to sulfide and used propionate, lactate and 1-propanol as electron donors. Strain P2T also grew by fermentation of lactate. Propionate was oxidized incompletely to acetate and CO2. The DNA G+C content was 48.8 mol%. Sequence analysis of the small-subunit rDNA and the dissimilatory sulfite reductase gene revealed that strain P2T was related to the genera Desulfonema, Desulfococcus, Desulfosarcina, ‘Desulfobotulus’, Desulfofaba, Desulfomusa and Desulfofrigus. These genera include incomplete as well as complete oxidizers of substrates. Strain P2T shared important morphological and physiological traits with Desulfofaba gelida and Desulfomusa hansenii, including the ability to oxidize propionate incompletely to acetate. The 16S rRNA gene similarities of P2T to Desulfofaba gelida and Desulfomusa hansenii were 92.9% and 91.5%, respectively. Combining phenotypic and genotypic traits, we propose strain P2T to be a member of the genus Desulfofaba. The name Desulfofaba fastidiosa sp. nov. (type strain P2T = DSM 15249T = ATCC BAA-815T) is proposed, reflecting the limited number of substrates consumed by the strain. In addition, the reclassification of Desulfomusa hansenii as a member of the genus Desulfofaba, Desulfofaba hansenii comb. nov., is proposed. A common line of descent and a number of shared phenotypic traits support this reclassification.
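Pairwise 16S rRNA gene similarity values such as those quoted above are usually percent identities computed over an alignment. The sketch below counts identical positions among aligned columns and skips any column with a gap in either sequence. That gap convention is an assumption, and published values depend on the alignment tool and gap treatment used. The function name is illustrative.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over aligned columns, ignoring columns with a gap ('-')
    in either sequence. Both inputs must be rows of the same alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" or y == "-":
            continue  # gapped column: excluded from the denominator
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


print(percent_identity("ACGT-A", "ACGTTG"))  # 80.0
```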
Affiliation(s)
- Lone Abildgaard
- Department of Microbial Ecology, Bldg 540, Institute of Biological Sciences, University of Aarhus, 8000 Aarhus C, Denmark
- Niels Birger Ramsing
- Department of Microbial Ecology, Bldg 540, Institute of Biological Sciences, University of Aarhus, 8000 Aarhus C, Denmark
- Kai Finster
- Department of Microbial Ecology, Bldg 540, Institute of Biological Sciences, University of Aarhus, 8000 Aarhus C, Denmark
3191
Allen JE, Pertea M, Salzberg SL. Computational gene prediction using multiple sources of evidence. Genome Res 2004; 14:142-8. [PMID: 14707176 PMCID: PMC314291 DOI: 10.1101/gr.1562804] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.3] [Indexed: 11/24/2022]
Abstract
This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program, called Combiner, takes as input a genomic sequence and the locations of gene predictions from ab initio gene finders, protein sequence alignments, expressed sequence tag and cDNA alignments, splice site predictions, and other evidence. Three different algorithms for combining evidence in the Combiner were implemented and tested on 1783 confirmed genes in Arabidopsis thaliana. Our results show that combining gene prediction evidence consistently outperforms even the best individual gene finder and, in some cases, can produce dramatic improvements in sensitivity and specificity.
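The simplest way to combine several evidence sources is per-base weighted voting. The sketch below shows that idea only; it is not one of the three Combiner algorithms. A base is called coding when the weighted share of sources calling it coding exceeds a threshold, and runs of coding bases are then merged into exon intervals. The source names, weights, and threshold are hypothetical. Unlike Combiner, this sketch does not check splice sites or reading-frame consistency.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based


def vote_exons(seq_len: int,
               predictions: Dict[str, List[Interval]],
               weights: Dict[str, float],
               threshold: float = 0.5) -> List[Interval]:
    """Call a base coding when the weighted share of sources calling it coding
    exceeds `threshold`, then merge consecutive coding bases into intervals.
    Assumes each source's exons do not overlap one another."""
    total = sum(weights[src] for src in predictions)
    if total <= 0:
        raise ValueError("source weights must sum to a positive value")
    score = [0.0] * seq_len
    for src, exons in predictions.items():
        for start, end in exons:
            for pos in range(max(0, start), min(seq_len, end)):
                score[pos] += weights[src]

    merged: List[Interval] = []
    run_start = None
    for pos in range(seq_len + 1):  # the extra step closes a run at the sequence end
        coding = pos < seq_len and score[pos] / total > threshold
        if coding and run_start is None:
            run_start = pos
        elif not coding and run_start is not None:
            merged.append((run_start, pos))
            run_start = None
    return merged


# Hypothetical evidence tracks on a 30-bp toy sequence, all weighted equally.
tracks = {
    "gene_finder_a": [(0, 10), (20, 30)],
    "gene_finder_b": [(2, 10)],
    "est_alignment": [(0, 8), (20, 25)],
}
print(vote_exons(30, tracks, {name: 1.0 for name in tracks}))  # [(0, 10), (20, 25)]
```

Raising the weight of a more reliable source, such as a spliced cDNA alignment, lets it outvote weaker ab initio calls. Learning those weights from training data is where methods like Combiner go further than this sketch.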
Affiliation(s)
- Jonathan E Allen
- The Institute for Genomic Research, Rockville, Maryland 20850, USA.
3192
Porcel BM, Delfour O, Castelli V, De Berardinis V, Friedlander L, Cruaud C, Ureta-Vidal A, Scarpelli C, Wincker P, Schächter V, Saurin W, Gyapay G, Salanoubat M, Weissenbach J. Numerous novel annotations of the human genome sequence supported by a 5'-end-enriched cDNA collection. Genome Res 2004; 14:463-71. [PMID: 14962985 PMCID: PMC353234 DOI: 10.1101/gr.1481104] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 02/03/2023]
Abstract
A collection of 90,000 human cDNA clones generated to increase the fraction of "full-length" cDNAs available was analyzed by sequence alignment on the human genome assembly. Five hundred fifty-two gene models not found in LocusLink, with coding regions of at least 300 bp, were defined by using this collection. Exon composition proposed for novel genes showed an average of 4.7 exons per gene. In 20% of the cases, at least half of the exons predicted for new genes coincided with evolutionarily conserved regions defined by sequence comparisons with the pufferfish Tetraodon nigroviridis. Among this subset, CpG islands were observed at the 5' end of 75%. In-frame stop codons upstream of the initiator ATG were present in 49% of the new genes, and 16% contained a coding region comprising at least 50% of the cDNA sequence. This cDNA resource also provided candidate small protein-coding genes, usually not included in genome annotations. In addition, analysis of a sample from this cDNA collection indicates that approximately 380 gene models described in LocusLink could be extended at their 5' end by at least one new exon. Finally, this cDNA resource provided experimental support for annotations based exclusively on predictions, thus representing a resource that substantially improves the human genome annotation.
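The "in-frame stop codon upstream of the initiator ATG" criterion is a simple sequence check. A stop codon in the same frame 5' of the ATG implies the coding region cannot extend further upstream, which supports the cDNA being complete at its 5' end. The sketch assumes the initiator ATG has already been chosen and uses a hypothetical function name.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def has_upstream_inframe_stop(cdna: str, atg_pos: int) -> bool:
    """True if a stop codon lies upstream of the ATG at `atg_pos` (0-based)
    in the same reading frame as that ATG."""
    seq = cdna.upper()
    if atg_pos < 0 or seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("atg_pos does not point at an ATG")
    # Walk back one codon at a time, staying in the ATG's frame.
    for i in range(atg_pos - 3, -1, -3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False


print(has_upstream_inframe_stop("TAGCCCATGAAA", 6))  # True: TAG is in frame
print(has_upstream_inframe_stop("CTAGCCATGAAA", 6))  # False: TAG is out of frame
```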
Affiliation(s)
- Betina M Porcel
- Genoscope-Centre National de Séquençage and CNRS UMR-8030, 91000 Evry, France
3193
Mariño-Ramírez L, Spouge JL, Kanga GC, Landsman D. Statistical analysis of over-represented words in human promoter sequences. Nucleic Acids Res 2004; 32:949-58. [PMID: 14963262 PMCID: PMC373387 DOI: 10.1093/nar/gkh246] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.1] [Indexed: 12/29/2022]
Abstract
The identification and characterization of regulatory sequence elements in the proximal promoter region of a gene can be facilitated by knowing the precise location of the transcriptional start site (TSS). Using known TSSs from over 5700 different human full-length cDNAs, this study extracted a set of 4737 distinct putative promoter regions (PPRs) from the human genome. Each PPR consisted of nucleotides from -2000 to +1000 bp, relative to the corresponding TSS. Since many regulatory regions contain short, highly conserved strings of less than 10 nucleotides, we counted eight-letter words within the PPRs, using z-scores and other related statistics to evaluate their over- and under-representation. Several over-represented eight-letter words have known biological functions described in the eukaryotic transcription factor database TRANSFAC; however, many did not. Besides calculating a P-value with the standard normal approximation associated with z-scores, we used two extra statistical controls to evaluate the significance of over-represented words. These controls have important implications for evaluating over- and under-represented words with z-scores.
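Counting eight-letter words and scoring them with z-scores can be sketched with a simple background model. Each word's expected count comes from the overall base composition, treating positions as independent. This i.i.d. model is an assumption made here; the study's own background and its two extra statistical controls are not reproduced. The toy input in the usage lines is invented.

```python
from collections import Counter
from math import prod, sqrt
from typing import Dict, List

VALID = frozenset("ACGT")


def count_words(seqs: List[str], k: int) -> Counter:
    """Count overlapping k-letter words, skipping words with non-ACGT symbols."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= VALID:
                counts[word] += 1
    return counts


def word_zscores(seqs: List[str], k: int = 8) -> Dict[str, float]:
    """z-score of each observed word against an i.i.d. base-composition model.

    expected = N * p(word) and variance = N * p * (1 - p), where N is the number
    of valid word positions. Overlapping occurrences make this variance only an
    approximation, and words never observed (candidate under-representation)
    are not scored here."""
    counts = count_words(seqs, k)
    bases = Counter(c for seq in seqs for c in seq.upper() if c in VALID)
    total = sum(bases.values())
    freq = {b: bases[b] / total for b in VALID}
    n_positions = sum(counts.values())
    scores: Dict[str, float] = {}
    for word, observed in counts.items():
        p = prod(freq[c] for c in word)
        sd = sqrt(n_positions * p * (1 - p))
        if sd == 0:
            continue  # degenerate composition (e.g. a single base); no z-score
        scores[word] = (observed - n_positions * p) / sd
    return scores


toy_promoters = ["ACGTACGTACGT"]  # invented toy input
print(word_zscores(toy_promoters, k=2))
```

For real promoter sets, k=8 and thousands of sequences would be used. The abstract's point that extra controls matter reflects the fact that the normal approximation behind these z-scores is poor for rare words.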
Affiliation(s)
- Leonardo Mariño-Ramírez
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, MSC 6075 Bethesda, MD 20894-6075, USA
3194
Sperisen P, Iseli C, Pagni M, Stevenson BJ, Bucher P, Jongeneel CV. trome, trEST and trGEN: databases of predicted protein sequences. Nucleic Acids Res 2004; 32:D509-11. [PMID: 14681469 PMCID: PMC308801 DOI: 10.1093/nar/gkh067] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
We previously introduced two new protein databases (trEST and trGEN) of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Here, we present the updates made to these two databases, plus a new database (trome), which uses alignments of EST data to HTG or full genomes to generate virtual transcripts and coding sequences. This new database is of higher quality and, because it stores the information in a much denser format, is much smaller. These new databases are in a Swiss-Prot-like format and are updated weekly (trEST and trGEN) or every 3 months (trome). They can be downloaded by anonymous ftp from ftp://ftp.isrec.isb-sib.ch/pub/databases.
Affiliation(s)
- Peter Sperisen
- Swiss Institute of Bioinformatics, Ludwig Institute for Cancer Research, Chemin des Boveresses 155, 1066 Epalinges s/Lausanne, Switzerland.
3195
Wheeler DL, Church DM, Edgar R, Federhen S, Helmberg W, Madden TL, Pontius JU, Schuler GD, Schriml LM, Sequeira E, Suzek TO, Tatusova TA, Wagner L. Database resources of the National Center for Biotechnology Information: update. Nucleic Acids Res 2004; 32:D35-40. [PMID: 14681353 PMCID: PMC308807 DOI: 10.1093/nar/gkh073] [Citation(s) in RCA: 247] [Impact Index Per Article: 11.8] [Indexed: 11/27/2022]
Abstract
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI’s website. NCBI resources include Entrez, PubMed, PubMed Central, LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SARS Coronavirus Resource, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
Affiliation(s)
- David L Wheeler
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.
3196
Margulies EH, Blanchette M, Haussler D, Green ED. Identification and characterization of multi-species conserved sequences. Genome Res 2003; 13:2507-18. [PMID: 14656959 PMCID: PMC403793 DOI: 10.1101/gr.1602203] [Citation(s) in RCA: 242] [Impact Index Per Article: 11.5] [Indexed: 12/21/2022]
Abstract
Comparative sequence analysis has become an essential component of studies aiming to elucidate genome function. The increasing availability of genomic sequences from multiple vertebrates is creating the need for computational methods that can detect highly conserved regions in a robust fashion. Towards that end, we are developing approaches for identifying sequences that are conserved across multiple species; we call these "Multi-species Conserved Sequences" (or MCSs). Here we report two strategies for MCS identification, demonstrating their ability to detect virtually all known actively conserved sequences (specifically, coding sequences) but very little neutrally evolving sequence (specifically, ancestral repeats). Importantly, we find that a substantial fraction of the bases within MCSs (approximately 70%) resides within non-coding regions; thus, the majority of sequences conserved across multiple vertebrate species has no known function. Initial characterization of these MCSs has revealed sequences that correspond to clusters of transcription factor-binding sites, non-coding RNA transcripts, and other candidate functional elements. Finally, the ability to detect MCSs represents a valuable metric for assessing the relative contribution of a species' sequence to identifying genomic regions of interest, and our results indicate that the currently available genome sequences are insufficient for the comprehensive identification of MCSs in the human genome.
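A generic way to flag multi-species conserved stretches is to slide a fixed-width window along a multiple alignment. Windows where enough columns are identical across all species are reported, and overlapping hits are merged. This is a swapped-in, simplified technique, not either of the two MCS strategies in the paper. The window size, cutoff, and toy alignment are hypothetical. Treating the first row as the reference (e.g. human) is also an assumption.

```python
from typing import List, Tuple


def conserved_windows(alignment: List[str], window: int,
                      min_frac: float) -> List[Tuple[int, int]]:
    """Return merged half-open [start, end) column ranges whose windows have at
    least `min_frac` fully conserved columns. Row 0 is taken as the reference."""
    ref = alignment[0]
    length = len(ref)
    if any(len(row) != length for row in alignment):
        raise ValueError("all alignment rows must have the same length")
    # A column counts as conserved if it is ungapped and every row matches row 0.
    conserved = [ref[i] != "-" and all(row[i] == ref[i] for row in alignment[1:])
                 for i in range(length)]
    hits = [(s, s + window) for s in range(length - window + 1)
            if sum(conserved[s:s + window]) / window >= min_frac]
    merged: List[Tuple[int, int]] = []
    for s, e in hits:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


toy_alignment = ["ACGTACGTAA", "ACGTACGTTT", "ACGTACGCCC"]  # invented 3-species example
print(conserved_windows(toy_alignment, window=4, min_frac=1.0))  # [(0, 7)]
```

Identity-based windows ignore the phylogenetic distances between species. Model-based approaches like those in the paper weight conservation by expected neutral divergence instead.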
Affiliation(s)
- Elliott H Margulies
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
3197
Freudenberg-Hua Y, Freudenberg J, Kluck N, Cichon S, Propping P, Nöthen MM. Single nucleotide variation analysis in 65 candidate genes for CNS disorders in a representative sample of the European population. Genome Res 2003; 13:2271-6. [PMID: 14525928 PMCID: PMC403700 DOI: 10.1101/gr.1299703] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.9] [Indexed: 11/25/2022]
Abstract
The detailed investigation of variation in functionally important regions of the human genome is expected to promote understanding of genetically complex diseases. We resequenced 65 candidate genes for CNS disorders in an average of 85 European individuals. The minor allele frequency (MAF), an indicator of weak purifying selection, was lowest in radical amino acid alterations, whereas similar MAF values were observed for synonymous variants and conservative amino acid alterations. In noncoding sequences, variants located in CpG islands tended to have a lower MAF than those outside CpG islands. The transition/transversion ratio was increased among both synonymous and conservative variants compared with noncoding variants. Conversely, the transition/transversion ratio was lowest among radical amino acid alterations. Furthermore, among nonsynonymous variants, transversions displayed lower MAF than did transitions. This suggests that transversions are associated with functionally important amino acid alterations. By comparing our data with public SNP databases, we found that variants with lower allele frequency are underrepresented in these databases. Therefore, radical variants have distinctly lower database coverage. However, these variants appear to be under weak purifying selection and thus could play a role in the etiology of genetically complex diseases.
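The two summary statistics used above, the transition/transversion ratio and the minor allele frequency, are straightforward to compute. Transitions are purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) changes; all other single-base substitutions are transversions. The sketch below covers only these definitions. The study's classification of amino acid changes as "radical" or "conservative" is not reproduced, and the function names are illustrative.

```python
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ts_tv_ratio(variants: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions among (ref, alt) pairs."""
    variants = list(variants)
    transitions = sum(is_transition(r, a) for r, a in variants)
    transversions = len(variants) - transitions
    return transitions / transversions if transversions else float("inf")


def minor_allele_frequency(allele_count: int, total_alleles: int) -> float:
    """Frequency of the rarer allele at a biallelic site."""
    if not 0 <= allele_count <= total_alleles or total_alleles == 0:
        raise ValueError("invalid allele counts")
    freq = allele_count / total_alleles
    return min(freq, 1.0 - freq)


print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))  # 2.0
# e.g. 85 diploid individuals = 170 chromosomes; hypothetical allele count of 150
print(minor_allele_frequency(150, 170))
```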
3198
Sakabe NJ, de Souza JES, Galante PAF, de Oliveira PSL, Passetti F, Brentani H, Osório EC, Zaiats AC, Leerkes MR, Kitajima JP, Brentani RR, Strausberg RL, Simpson AJG, de Souza SJ. ORESTES are enriched in rare exon usage variants affecting the encoded proteins. C R Biol 2003; 326:979-85. [PMID: 14744104 DOI: 10.1016/j.crvi.2003.09.027] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
Abstract
A significant fraction of the variability found in the human transcriptome is due to alternative splicing, including alternative exon usage (AEU), intron retention and use of cryptic splice sites. We present a large-scale analysis of AEU in the human transcriptome, comparing genome mappings of Open Reading Frame ESTs (ORESTES) with those of conventional ESTs. We show that ORESTES probe low-abundance messages more efficiently. In addition, most of the variants detected by ORESTES affect the structure of the corresponding proteins.
Affiliation(s)
- Noboru Jo Sakabe
- Ludwig Institute for Cancer Research, Sao Paulo Branch, Rua Prof Antonio Prudente 109, 4(o) andar, 01509-010, Sao Paulo, Brazil
3199
Zambon AC, McDearmon EL, Salomonis N, Vranizan KM, Johansen KL, Adey D, Takahashi JS, Schambelan M, Conklin BR. Time- and exercise-dependent gene regulation in human skeletal muscle. Genome Biol 2003; 4:R61. [PMID: 14519196 PMCID: PMC328450 DOI: 10.1186/gb-2003-4-10-r61] [Citation(s) in RCA: 189] [Impact Index Per Article: 8.6] [Received: 06/20/2003] [Revised: 08/12/2003] [Accepted: 08/18/2003] [Indexed: 01/09/2023]
Abstract
BACKGROUND Skeletal muscle remodeling is a critical component of an organism's response to environmental changes. Exercise causes structural changes in muscle and can induce phase shifts in circadian rhythms, fluctuations in physiology and behavior with a period of around 24 hours that are maintained by a core clock mechanism. Both exercise-induced remodeling and circadian rhythms rely on the transcriptional regulation of key genes. RESULTS We used DNA microarrays to determine the effects of resistance exercise (RE) on gene regulation in biopsy samples of human quadriceps muscle obtained 6 and 18 hours after an acute bout of isotonic exercise with one leg. We also profiled diurnal gene regulation at the same time points (2000 and 0800 hours) in the non-exercised leg. Comparison of our results with published circadian gene profiles in mice identified 44 putative genes that were regulated in a circadian fashion. We then used quantitative PCR to validate the circadian expression of selected gene orthologs in mouse skeletal muscle. CONCLUSIONS The coordinated regulation of the circadian clock genes Cry1, Per2, and Bmal1 6 hours after RE and of diurnal genes 18 hours after RE in the exercised leg suggests that RE may directly modulate circadian rhythms in human skeletal muscle.
Affiliation(s)
- Alexander C Zambon
  - Gladstone Institute of Cardiovascular Disease, Department of Medicine, University of California, San Francisco, CA 94141, USA
- Erin L McDearmon
  - Howard Hughes Medical Institute, Department of Neurobiology and Physiology, Northwestern University, Evanston, IL 60208, USA
- Nathan Salomonis
  - Gladstone Institute of Cardiovascular Disease, Department of Medicine, University of California, San Francisco, CA 94141, USA
- Karen M Vranizan
  - Gladstone Institute of Cardiovascular Disease, Department of Medicine, University of California, San Francisco, CA 94141, USA
  - Functional Genomics Lab, University of California, Berkeley, CA 94720, USA
- Kirsten L Johansen
  - Department of Medicine, University of California, San Francisco, CA 94141, USA
- Deborah Adey
  - Department of Medicine, University of California, San Francisco, CA 94141, USA
- Joseph S Takahashi
  - Howard Hughes Medical Institute, Department of Neurobiology and Physiology, Northwestern University, Evanston, IL 60208, USA
- Morris Schambelan
  - Department of Medicine, University of California, San Francisco, CA 94141, USA
- Bruce R Conklin
  - Gladstone Institute of Cardiovascular Disease, Department of Medicine, University of California, San Francisco, CA 94141, USA
  - Department of Medicine, University of California, San Francisco, CA 94141, USA
3200
del Val C, Glatting KH, Suhai S. cDNA2Genome: a tool for mapping and annotating cDNAs. BMC Bioinformatics 2003; 4:39. [PMID: 12964951 PMCID: PMC239864 DOI: 10.1186/1471-2105-4-39] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Received: 07/04/2003] [Accepted: 09/10/2003] [Indexed: 11/25/2022]
Abstract
BACKGROUND In recent years, several high-throughput cDNA sequencing projects have been funded worldwide with the aim of identifying complete novel human transcripts and characterizing their structure. However, some of these cDNAs are error-prone because of frameshifts and stop-codon errors caused by low sequence quality, or because of cloning of truncated inserts, among other reasons. Therefore, accurate CDS prediction from these sequences first requires the identification of potentially problematic cDNAs in order to speed up the subsequent annotation process. RESULTS cDNA2Genome is an application for the automatic high-throughput mapping and characterization of cDNAs. It uses current annotation data and the most up-to-date databases, especially for ESTs and mRNAs, in conjunction with a wide range of gene prediction approaches to perform a comprehensive assessment of the cDNA exon-intron structure. The final result of cDNA2Genome is an XML file containing all relevant information obtained in the process. This XML output can easily be used for further analysis, such as in program pipelines, or for integrating results into databases. The web interface to cDNA2Genome also presents this data in HTML, where the annotation is additionally shown in graphical form. cDNA2Genome has been implemented under the W3H task framework, which allows bioinformatics tools to be combined in tailor-made analysis task flows and many sequences to be computed sequentially or in parallel for large-scale analysis. CONCLUSIONS cDNA2Genome represents a new, versatile and easily extensible approach to the automated mapping and annotation of human cDNAs. The underlying approach allows sequential or parallel computation of sequences for high-throughput analysis of cDNAs.
Affiliation(s)
- Coral del Val
  - Department of Molecular Biophysics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Karl-Heinz Glatting
  - Department of Molecular Biophysics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Sandor Suhai
  - Department of Molecular Biophysics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany