1101
Wei CL, Ng P, Chiu KP, Wong CH, Ang CC, Lipovich L, Liu ET, Ruan Y. 5' Long serial analysis of gene expression (LongSAGE) and 3' LongSAGE for transcriptome characterization and genome annotation. Proc Natl Acad Sci U S A 2004; 101:11701-6. [PMID: 15272081 PMCID: PMC511040 DOI: 10.1073/pnas.0403514101] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.0] [Received: 01/20/2004] [Indexed: 11/18/2022] Open
Abstract
Complete genome annotation relies on precise identification of transcription units bounded by a transcription initiation site (TIS) and a polyadenylation site (PAS). To facilitate this process, we developed a set of two complementary methods, 5' Long serial analysis of gene expression (LS) and 3'LS. These analyses are based on the original SAGE and LS methods coupled with full-length cDNA cloning, and enable the high-throughput extraction of the first and the last 20 bp of each transcript. We demonstrate that the mapping of 5'LS and 3'LS tags to the genome allows the localization of TIS and PAS. By using 537 tag pairs mapping to the region of known genes, we confirmed that >90% of the tag pairs were appropriately assigned to the first and last exons. Moreover, by using tag sequences as primers for RT-PCRs, we were able to recover putative full-length transcripts in 81% of the attempts. This large-scale generation of transcript terminal tags is at least 20-40 times more efficient than full-length cDNA cloning and sequencing in the identification of complete transcription units. The apparent precision and deep coverage make 5'LS and 3'LS an advanced approach for genome annotation through whole-transcriptome characterization.
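The tag-pair idea in this abstract can be illustrated in silico. The sketch below is not the authors' pipeline: the function names, the exact-match lookup with `str.find`, and the toy sequences are all assumptions made for illustration. It takes the first and last 20 bp of a spliced transcript and locates both tags on a genomic sequence to bound a putative transcription unit.

```python
from typing import Optional, Tuple


def terminal_tags(transcript: str, tag_len: int = 20) -> Tuple[str, str]:
    """Return the 5' and 3' terminal tags (first and last tag_len bases)."""
    if len(transcript) < 2 * tag_len:
        raise ValueError("transcript is too short for two non-overlapping tags")
    return transcript[:tag_len], transcript[-tag_len:]


def locate_transcription_unit(
    genome: str, transcript: str, tag_len: int = 20
) -> Optional[Tuple[int, int]]:
    """Map both tags to the genome by exact match (a simplifying assumption).

    Returns (start, end) as a 0-based, end-exclusive interval, or None if
    either tag cannot be placed. Only the forward strand is searched.
    """
    five_prime, three_prime = terminal_tags(transcript, tag_len)
    start = genome.find(five_prime)
    if start == -1:
        return None
    # The 3' tag must lie downstream of the 5' tag.
    tag_start = genome.find(three_prime, start + tag_len)
    if tag_start == -1:
        return None
    return start, tag_start + tag_len
```

Because the tags come from the first and last exons, the intron between them does not prevent the pair from being placed. A real mapping step would also need to handle both strands, mismatches, poly(A) tails, and tags that occur more than once in the genome.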
Affiliation(s)
- Chia-Lin Wei
- Genome Institute of Singapore, 60 Biopolis Street, Genome 02-01, Singapore 138672
1102
Zhou G, Wang J, Zhang Y, Zhong C, Ni J, Wang L, Guo J, Zhang K, Yu L, Zhao S. Cloning, expression and subcellular localization of HN1 and HN1L genes, as well as characterization of their orthologs, defining an evolutionarily conserved gene family. Gene 2004; 331:115-23. [PMID: 15094197 DOI: 10.1016/j.gene.2004.02.025] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.0] [Received: 09/23/2003] [Revised: 01/20/2004] [Accepted: 02/04/2004] [Indexed: 10/26/2022]
Abstract
The present work reports the cloning and characterization of two novel human genes, HN1 (hematopoietic- and neurologic-expressed sequence 1) and HN1L (HN1-like gene), which are proposed to be involved in embryo development. HN1 maps to chromosome 17q25.2 and has two transcripts (1.0 and 1.6 kb in length, respectively) due to alternative splicing. HN1 is expressed abundantly in testis and skeletal muscle among 16 human tissues, and it is localized in the nucleus, as indicated by GFP fusion expression. Western blot confirmed that HN1 encodes a 16.5-kDa protein. HN1L is on chromosome 16p13.3 and has three splice variants of 2.0, 4.0 and 4.2 kb in length, respectively. HN1L is expressed in a variety of tissues such as liver, kidney, prostate, testis and uterus at varying levels. The HN1L gene encodes a 20-kDa protein, which is localized in both the nucleus and cytoplasm. Fourteen HN1 and sixteen HN1L homologous genes in different species were identified and analyzed by BLAST searches. In silico analyses of the 14 orthologous proteins of HN1 and 16 orthologous proteins of HN1L revealed that they are highly conserved in vertebrates. Additionally, we identified nine pseudogenes of HN1 (six) and HN1L (three) in the genomes of the human, mouse and rat. Based on sequence alignments and phylogenetic analysis, all these homologous genes and pseudogenes were defined as an HN1 gene family.
Affiliation(s)
- Guangjin Zhou
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Science, Fudan University, 220 Handan Road, Shanghai 200433, PR China
1103
Clegg N, Abbott D, Ferguson C, Coleman R, Nelson PS. Characterization and comparative analyses of transcriptomes from the normal and neoplastic human prostate. Prostate 2004; 60:227-39. [PMID: 15176052 DOI: 10.1002/pros.20055] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
BACKGROUND: The prostate gland is a highly specialized organ with functional attributes that serve to enhance the fertility of mammalian species. Pathological processes affecting the prostate include benign prostate hypertrophy and prostate carcinoma, diseases that account for major morbidity and mortality in middle-aged and elderly men. To facilitate studies of biological processes uniquely represented in the prostate and assess molecular alterations associated with prostate carcinoma, we sought to establish the diversity of gene expression in the normal and neoplastic prostate through the compilation and analysis of a prostate transcriptome. METHODS: We assembled and annotated ESTs derived from prostate cDNA libraries that were either produced in our laboratory or available from public sequence repositories such as CGAP, dbEST, and Unigene. Determinations of differential gene expression between the normal prostate, other normal tissues, and neoplastic prostate tissues were performed using statistical algorithms. Confirmation of differential expression was performed by quantitative PCR and Northern analysis. RESULTS: A total of 99,448 high-quality ESTs were assembled and annotated to produce a prostate transcriptome comprised of 24,580 distinct TUs. Comparative analyses of gene expression levels identified 61 TUs with exclusive expression in the prostate and 45 TUs with high levels of expression in the prostate relative to at least 25 other normal tissues (P > 0.99). Comparative analyses of ESTs derived from neoplastic prostate tissues identified 75 genes with dysregulated expression in cancer (P > 0.99). CONCLUSIONS: The human prostate expresses a diverse repertoire of genes that reflect a functionally complex organ. The identification of genes with prostate-restricted or enhanced expression may provide additional insights into the biochemical processes that interact to form the developmental, signaling, and metabolic pathways of the normal and neoplastic gland.
Affiliation(s)
- Nigel Clegg
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
1104
Nilsson HO, Ouis IS, Stenram U, Ljungh A, Moran AP, Wadström T, Al-Soud WA. High prevalence of Helicobacter Species detected in laboratory mouse strains by multiplex PCR-denaturing gradient gel electrophoresis and pyrosequencing. J Clin Microbiol 2004; 42:3781-8. [PMID: 15297530 PMCID: PMC497606 DOI: 10.1128/jcm.42.8.3781-3788.2004] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Received: 02/25/2004] [Revised: 04/16/2004] [Accepted: 05/04/2004] [Indexed: 11/20/2022] Open
Abstract
Rodent models have been developed to study the pathogenesis of diseases caused by Helicobacter pylori, as well as by other gastric and intestinal Helicobacter spp., but some murine enteric Helicobacter spp. cause hepatobiliary and intestinal tract diseases in specific inbred strains of laboratory mice. To identify these murine Helicobacter spp., we developed an assay based on PCR-denaturing gradient gel electrophoresis and pyrosequencing. Nine strains of mice, maintained in four conventional laboratory animal houses, were assessed for Helicobacter sp. carriage. Tissue samples from the liver, stomach, and small intestine, as well as feces and blood, were collected; and all specimens (n = 210) were screened by a Helicobacter genus-specific PCR. Positive samples were identified to the species level by multiplex denaturing gradient gel electrophoresis, pyrosequencing, and a H. ganmani-specific PCR assay. Histologic examination of 30 tissue samples from 18 animals was performed. All mice of eight of the nine strains tested were Helicobacter genus positive; H. bilis, H. hepaticus, H. typhlonius, H. ganmani, H. rodentium, and a Helicobacter sp. flexispira-like organism were identified. Helicobacter DNA was common in fecal (86%) and gastric tissue (55%) specimens, whereas samples of liver tissue (21%), small intestine tissue (17%), and blood (14%) were less commonly positive. Several mouse strains were colonized with more than one Helicobacter spp. Most tissue specimens analyzed showed no signs of inflammation; however, in one strain of mice, hepatitis was diagnosed in livers positive for H. hepaticus, and in another strain, gastric colonization by H. typhlonius was associated with gastritis. The diagnostic setup developed was efficient at identifying most murine Helicobacter spp.
Affiliation(s)
- Hans-Olof Nilsson
- Department of Medical Microbiology, Dermatology and Infection, Lund University, Sölvegatan 23, SE-223 62 Lund, Sweden
1105
Callen BP, Shearwin KE, Egan JB. Transcriptional interference between convergent promoters caused by elongation over the promoter. Mol Cell 2004; 14:647-56. [PMID: 15175159 DOI: 10.1016/j.molcel.2004.05.010] [Citation(s) in RCA: 132] [Impact Index Per Article: 6.3] [Received: 12/01/2003] [Revised: 04/05/2004] [Accepted: 04/12/2004] [Indexed: 01/21/2023]
Abstract
Transcriptional interference with convergent transcription from face-to-face promoters is a potentially important form of gene regulation in all organisms. Using LacZ reporter studies, the mechanism of interference was determined for a pair of face-to-face prokaryotic promoters in which a strong promoter interferes 5.6-fold with a weak promoter, 62 bp away. The promoters were variously rearranged to test different models of interference. Terminating transcription from the strong promoter before it reached the weak promoter dramatically reduced interference, indicating a requirement for the passage of the converging RNAP over the weak promoter. Based on in vitro experiments showing a slow rate of escape for open complexes at the weak promoter and their sensitivity to head-on collisions with elongating RNAP, a "sitting duck" model of interference is proposed and supported with in vivo permanganate footprinting. The model is further supported by the analysis of a second set of prokaryotic face-to-face promoters.
MESH Headings
- Gene Expression Regulation, Bacterial/genetics
- Genes, Regulator/genetics
- Models, Biological
- Prokaryotic Cells/metabolism
- Promoter Regions, Genetic/genetics
- Promoter Regions, Genetic/physiology
- RNA Interference/physiology
- RNA, Antisense/genetics
- RNA, Antisense/metabolism
- RNA, Messenger/genetics
- RNA, Messenger/metabolism
- Transcription, Genetic/genetics
- Transcription, Genetic/physiology
- Transcriptional Elongation Factors/genetics
Affiliation(s)
- Benjamin P Callen
- School of Molecular and Biomedical Science (Biochemistry), University of Adelaide, South Australia 5005, Australia
1106
Vinuesa CG, Goodnow CC. Illuminating autoimmune regulators through controlled variation of the mouse genome sequence. Immunity 2004; 20:669-79. [PMID: 15189733 DOI: 10.1016/j.immuni.2004.05.012] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Indexed: 12/31/2022]
Abstract
Gene variants in mice that have strong, Mendelian effects on autoimmune susceptibility have been one of the most productive entry points for identifying genes and processes regulating human autoimmunity. With the tools now available to map and identify new mouse Mendelian gene variants, the handful of spontaneous mutations accumulated over several decades have all been identified, and the main bottleneck lies in producing new Mendelian immune variants. We outline here a strategy to generate large sets of functional variants in genes controlling lupus and humoral immunity, based upon limited variation of the mouse genome sequence with the chemical mutagen, ENU, combined with a set of sensitive immunological screens.
Affiliation(s)
- Carola G Vinuesa
- John Curtin School of Medical Research and Australian Phenomics Facility, The Australian National University, Mills Road, Canberra, ACT 2601, Australia
1107
Schwartz F, Duka A, Duka I, Cui J, Gavras H. Novel targets of ANG II regulation in mouse heart identified by serial analysis of gene expression. Am J Physiol Heart Circ Physiol 2004; 287:H1957-66. [PMID: 15242839 DOI: 10.1152/ajpheart.00568.2004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Although the central role of ANG II in cardiovascular homeostasis is well appreciated, the molecular circuitry of its many actions is not completely understood. With the use of serial analysis of gene expression to assess global transcriptional changes in the heart of mice after continuous 7-day ANG II administration, we identified patterns of gene expression indicative of cardiac remodeling, including coordinate regulation of genes previously described in a context of processes associated with hypertrophy and fibrosis. In addition, we discovered several novel ANG II targets, including characterized genes of known function, recently annotated genes of unknown function, and the putative genes not yet present in current databases. The serial analysis of gene expression approach to assess the role of ANG II presented in this report provides new venues for inquiries into ANG II-mediated cardiac function.
Affiliation(s)
- Faina Schwartz
- Dept. of Medicine, Genetics Program, Boston Univ. School of Medicine, 715 Albany St., L-320, Boston, MA 02118, USA.
1108
Hoffmeister D, Thorson JS. Mechanistic Implications of Escherichia coli Galactokinase Structure-Based Engineering. Chembiochem 2004; 5:989-92. [PMID: 15239057 DOI: 10.1002/cbic.200400003] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
Affiliation(s)
- Dirk Hoffmeister
- Laboratory for Biosynthetic Chemistry, School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705, USA
1109
Nishant KT, Ravishankar H, Rao MRS. Characterization of a mouse recombination hot spot locus encoding a novel non-protein-coding RNA. Mol Cell Biol 2004; 24:5620-34. [PMID: 15169920 PMCID: PMC419864 DOI: 10.1128/mcb.24.12.5620-5634.2004] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.0] [Indexed: 11/20/2022] Open
Abstract
Our current knowledge of recombination hot spot activity in mammalian systems implicates a role for both the primary DNA sequence and the nature of the chromatin domain around it. In mice, the only recombination hot spots mapped to date have been confined to a cluster within the major histocompatibility complex (MHC) region. We present a high resolution analysis of a new recombination hot spot in the mouse genome which maps to mouse chromosome 8 C-D. Haplotype diversity analysis across 40 different strains of mice has enabled us to map recombination breakpoints to a 1-kb interval. This hot spot has a recombination intensity that is 10- to 100-fold above the genome average and has a mean gene conversion tract length of 371 bp. This meiotically active locus happens to be flanked by a transcribed region encoding a non-protein-coding RNA polymerase II transcript and the previously characterized repair site. Many of the primary DNA sequence features that have been reported for the mouse MHC hot spots are also shared by this hot spot locus and in addition, along with three other MHC hot spot loci, we show a new parallel feature of association of the crossover sites with the nuclear matrix.
Affiliation(s)
- K T Nishant
- Department of Biochemistry, Indian Institute of Science, Bangalore 560012, India
1110
Mutch DM, Simmering R, Donnicola D, Fotopoulos G, Holzwarth JA, Williamson G, Corthésy-Theulaz I. Impact of commensal microbiota on murine gastrointestinal tract gene ontologies. Physiol Genomics 2004; 19:22-31. [PMID: 15226484 DOI: 10.1152/physiolgenomics.00105.2004] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022] Open
Abstract
The gastrointestinal tract (GIT) of eukaryotes is colonized by a vast number of bacteria, where the commensal microbiota play an important role in defining the healthy gut. To investigate the influence of commensal bacteria on multiple regions of the host GIT transcriptome, the gene expression profiles of the corpus, jejunum, descending colon, and rectum of conventional (n = 3) and germ-free mice (n = 3) were examined using the Affymetrix Mu74Av2 GeneChip. Differentially regulated genes were identified using the global error assessment model, and a novel method of Gene Ontology (GO) clustering was used to identify significantly modulated biological functions. The microbiota modify the greatest number of genes in the jejunum (267 genes with an alpha < 0.001) and the fewest in the rectum (137 genes with an alpha < 0.001). Clustering genes by GO biological process and molecular function annotations revealed that, despite the large number of differentially regulated genes, the residential microbiota most significantly modified genes involved in such biological processes as immune function and water transport all along the length of the mouse GIT. Additionally, region-specific communication between the host and microbiota was identified in the corpus and jejunum, where tissue kallikrein and apoptosis regulator activities were modulated, respectively. These findings identify important interactions between the microbiota and the mouse gut tissue transcriptome and, furthermore, suggest that interactions between the microbial population and host GIT are implicated in the coordination of region-specific functions.
Affiliation(s)
- David M Mutch
- Nestlé Research Center, Vers-chez-les-Blanc, CH-1000 Lausanne 26, Switzerland
1111
Yalcin B, Fullerton J, Miller S, Keays DA, Brady S, Bhomra A, Jefferson A, Volpi E, Copley RR, Flint J, Mott R. Unexpected complexity in the haplotypes of commonly used inbred strains of laboratory mice. Proc Natl Acad Sci U S A 2004; 101:9734-9. [PMID: 15210992 PMCID: PMC470780 DOI: 10.1073/pnas.0401189101] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.1] [Received: 02/20/2004] [Indexed: 01/21/2023] Open
Abstract
Investigation of sequence variation in common inbred mouse strains has revealed a segmented pattern in which regions of high and low variant density are intermixed. Furthermore, it has been suggested that allelic strain distribution patterns also occur in well defined blocks and consequently could be used to map quantitative trait loci (QTL) in comparisons between inbred strains. We report a detailed analysis of polymorphism distribution in multiple inbred mouse strains over a 4.8-megabase region containing a QTL influencing anxiety. Our analysis indicates that it is only partly true that the genomes of inbred strains exist as a patchwork of segments of sequence identity and difference. We show that the definition of haplotype blocks is not robust and that methods for QTL mapping may fail if they assume a simple block-like structure.
Affiliation(s)
- B Yalcin
- Wellcome Trust Centre for Human Genetics, Oxford University, Roosevelt Drive, Oxford OX3 7BN, United Kingdom
1112
Zeeberg BR, Riss J, Kane DW, Bussey KJ, Uchio E, Linehan WM, Barrett JC, Weinstein JN. Mistaken identifiers: gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 2004; 5:80. [PMID: 15214961 PMCID: PMC459209 DOI: 10.1186/1471-2105-5-80] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.0] [Received: 03/05/2004] [Accepted: 06/23/2004] [Indexed: 11/10/2022] Open
Abstract
Background: When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names. Results: A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered. Conclusions: Users of Excel for analyses involving gene names should be aware of this problem, which can cause genes, including medically important ones, to be lost from view and which has contaminated even carefully curated public databases. We provide work-arounds and scripts for circumventing the problem.
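A pre-import check can flag identifiers that are likely to be affected by the two conversions described in this abstract. The sketch below is an illustration only, not the scripts the authors provide: the function name `excel_risk` and both regular expressions are assumptions, and the patterns are heuristics that will not capture every string Excel reinterprets.

```python
import re
from typing import Optional

# Heuristic patterns, assumed for illustration (not taken from the paper).
_MONTHS = "JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC"
# A month name or abbreviation followed by a day-like number, e.g. SEPT2, DEC1.
DATE_LIKE = re.compile(rf"^(?:{_MONTHS})-?\d{{1,2}}$", re.IGNORECASE)
# Digits, the letter E, then digits, e.g. a Riken-style ID such as 2310009E13.
FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def excel_risk(identifier: str) -> Optional[str]:
    """Classify an identifier as at risk of 'date' or 'float' conversion.

    Returns None when neither heuristic matches.
    """
    name = identifier.strip()
    if DATE_LIKE.match(name):
        return "date"
    if FLOAT_LIKE.match(name):
        return "float"
    return None


def flag_identifiers(identifiers):
    """Return a dict mapping each at-risk identifier to its risk category."""
    flagged = {}
    for name in identifiers:
        risk = excel_risk(name)
        if risk is not None:
            flagged[name] = risk
    return flagged
```

Flagged identifiers can then be protected before the file reaches a spreadsheet, for example by importing the column as text rather than with the default general format.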
Affiliation(s)
- Barry R Zeeberg
- Genomics & Bioinformatics Group, Laboratory of Molecular Pharmacology, Center for Cancer Research (CCR), National Cancer Institute (NCI), National Institutes of Health (NIH), Bldg 37 Rm 5041, NIH, 9000 Rockville Pike, Bethesda, MD 20892 USA
- Joseph Riss
- Laboratory of Biosystems and Cancer, CCR, Bldg 37 Rm 5032, NIH, 9000 Rockville Pike, Bethesda, MD 20892 USA
- David W Kane
- SRA International, 4300 Fair Lakes CT, Fairfax, VA 22033 USA
- Kimberly J Bussey
- Genomics & Bioinformatics Group, Laboratory of Molecular Pharmacology, Center for Cancer Research (CCR), National Cancer Institute (NCI), National Institutes of Health (NIH), Bldg 37 Rm 5041, NIH, 9000 Rockville Pike, Bethesda, MD 20892 USA
- Edward Uchio
- Urologic Oncology Branch, Bldg 10 Rm 2B47, National Institutes of Health, Bethesda, MD 20892 USA
- W Marston Linehan
- Urologic Oncology Branch, Bldg 10 Rm 2B47, National Institutes of Health, Bethesda, MD 20892 USA
- J Carl Barrett
- Laboratory of Biosystems and Cancer, CCR, Bldg 37 Rm 5032, NIH, 9000 Rockville Pike, Bethesda, MD 20892 USA
- John N Weinstein
- Genomics & Bioinformatics Group, Laboratory of Molecular Pharmacology, Center for Cancer Research (CCR), National Cancer Institute (NCI), National Institutes of Health (NIH), Bldg 37 Rm 5041, NIH, 9000 Rockville Pike, Bethesda, MD 20892 USA
1113
Schlamp CL, Thliveris AT, Li Y, Kohl LP, Knop C, Dietz JA, Larsen IV, Imesch P, Pinto LH, Nickells RW. Insertion of the beta Geo promoter trap into the Fem1c gene of ROSA3 mice. Mol Cell Biol 2004; 24:3794-803. [PMID: 15082774 PMCID: PMC387761 DOI: 10.1128/mcb.24.9.3794-3803.2004] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/20/2022] Open
Abstract
ROSA3 mice were developed by retroviral insertion of the beta Geo gene trap vector. Adult ROSA3 mice exhibit widespread expression of the trap gene in epithelial cells found in most organs. In the central nervous system the highest expression of beta Geo is found in CA1 pyramidal cells of the hippocampus, Purkinje cells of the cerebellum, and ganglion cells of the retina. Characterization of the genomic insertion site for beta Geo in ROSA3 mice shows that the trap vector is located in the first intron of Fem1c, a gene homologous to the sex-determining gene fem-1 of Caenorhabditis elegans. Transcription of the Rosa3 allele (R3) yields a spliced message that includes the first exon of Fem1c and the beta Geo coding region. Although normal processing of the Fem1c transcript is disrupted in homozygous Rosa3 (Fem1c(R3/R3)) mice, some tissues show low levels of a partially processed transcript containing exons 2 and 3. Since the entire coding region of Fem1c is located in these two exons, Fem1c(R3/R3) mice may still be able to express a putative FEM1C protein. To this extent, Fem1c(R3/R3) mice show no adverse effects in their sexual development or fertility or in the attenuation of neuronal cell death, another function that has been attributed to both fem-1 and a second mouse homolog, Fem1b. Examination of beta Geo expression in ganglion cells after exposure to damaging stimuli indicates that protein levels are rapidly depleted prior to cell death, making the beta Geo reporter gene a potentially useful marker to study early molecular events in damaged neurons.
Affiliation(s)
- Cassandra L Schlamp
- Department of Ophthalmology and Visual Sciences, University of Wisconsin, Madison, Wisconsin 53704, USA
1114
Sémon M, Duret L. Evidence that functional transcription units cover at least half of the human genome. Trends Genet 2004; 20:229-32. [PMID: 15109775 DOI: 10.1016/j.tig.2004.03.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.3] [Indexed: 10/26/2022]
Abstract
Transcriptome analyses have revealed that a large proportion of the human genome is transcribed. However, many of these transcripts might be functionless. To distinguish functional transcription units (FTUs) from spurious transcripts, we searched for the hallmarks of selective pressure against mutations that impair transcription. We analyzed the distribution of transposable elements, which are counter selected within FTUs. We show that these features are sufficiently informative to predict whether a sequence is transcribed and, if transcribed, in which orientation. Our results indicate that FTUs constitute at least 50% of the genome and that approximately one-third of these transcripts apparently do not encode proteins.
Affiliation(s)
- Marie Sémon
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558 Université Claude Bernard Lyon 1, 16 rue Raphaël Dubois, 69622 Villeurbanne Cedex, France
1115
Abramowitz J, Grenet D, Birnbaumer M, Torres HN, Birnbaumer L. XLalphas, the extra-long form of the alpha-subunit of the Gs G protein, is significantly longer than suspected, and so is its companion Alex. Proc Natl Acad Sci U S A 2004; 101:8366-71. [PMID: 15148396 PMCID: PMC420400 DOI: 10.1073/pnas.0308758101] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.4] [Indexed: 11/18/2022] Open
Abstract
Because of the use of alternate exons 1, mammals express two distinct forms of Gsalpha-subunits: the canonical 394-aa Gsalpha present in all tissues and a 700+-aa extra-long alphas (XLalphas) expressed in a more restricted manner. Both subunits transduce receptor signals into stimulation of adenylyl cyclase. The XL exon encodes the XL domain of XLalphas and, in a parallel ORF, a protein called Alex. Alex interacts with the XL domain of XLalphas and inhibits its adenylyl cyclase-stimulating function. In mice, rats, and humans, the XL exon is thought to contribute 422.3, 367.3, and 551.3 codons and to encode Alex proteins of 390, 357, and 561 aa, respectively. We report here that the XL exon is longer than presumed and contributes in mice, rats, and humans, respectively, an additional 364, 430, and 139 codons to XLalphas. We called the N-terminally extended XLalphas extra-extra-long Gsalpha, or XXLalphas. Alex is likewise longer. Its ORF also remains open in the 5' direction for approximately 2,000 nt, giving rise to Alex-extended, or AlexX. RT-PCR of murine total brain RNA shows that the entire XXL domain is encoded in a single exon. Furthermore, we discovered two truncated forms of XXLalphas, XXLb1 and XXLb2, in which, because of alternative splicing, the Gsalpha domain is replaced by different sequences. XXLb proteins are likely to be found as stable dimers with AlexX. The N-terminally longer proteins may play regulatory roles.
Affiliation(s)
- Joel Abramowitz
- Transmembrane Signal Transduction Group, Laboratory of Signal Transduction, National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, NC 27709, USA
1116
Beisel KW, Shiraki T, Morris KA, Pompeia C, Kachar B, Arakawa T, Bono H, Kawai J, Hayashizaki Y, Carninci P. Identification of unique transcripts from a mouse full-length, subtracted inner ear cDNA library. Genomics 2004; 83:1012-23. [PMID: 15177555 DOI: 10.1016/j.ygeno.2004.01.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Received: 10/09/2003] [Revised: 12/15/2003] [Accepted: 01/25/2004] [Indexed: 11/20/2022]
Abstract
A small-scale full-length library construction approach was developed to facilitate production of a mouse full-length cDNA encyclopedia representing approximately 250 enriched, normalized, and/or subtracted cDNA libraries. One library produced using this approach was a subtracted adult mouse inner ear cDNA library (sIEa). The average size of the inserts was approximately 2.5 kb, with the majority ranging from 0.5 to 7.0 kb. From this library 22,574 sequence reads were obtained from 15,958 independent clones. Sequencing and chromosomal localization established 5240 clusters, with 1302 clusters being unique and 359 representing new ESTs. Our sIEa library contributed 56.1% of the 7773 nonredundant Unigene clusters associated with the four mouse inner ear libraries in the NCBI dbEST. Based on homologous chromosomal regions between human and mouse, we identified 1018 UniGene clusters associated with the deafness locus critical regions. Of these, 59 clusters were found only in our sIEa library and represented approximately 50% of the identified critical regions.
Affiliation(s)
- Kirk W Beisel
- Department of Biomedical Sciences, Creighton University, 2500 California, Omaha, NE 68178, USA.
1117
Affiliation(s)
- John S Mattick
- ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland 4072, Australia.
1118
Wasserman WW, Sandelin A. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 2004; 5:276-87. [PMID: 15131651 DOI: 10.1038/nrg1315] [Citation(s) in RCA: 806] [Impact Index Per Article: 38.4] [Indexed: 11/08/2022]
Affiliation(s)
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics and British Columbia Women's and Children's Hospitals, and Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia V5Z 4H4, Canada
|
1119
|
Lavorgna G, Dahary D, Lehner B, Sorek R, Sanderson CM, Casari G. In search of antisense. Trends Biochem Sci 2004; 29:88-94. [PMID: 15102435 DOI: 10.1016/j.tibs.2003.12.002] [Citation(s) in RCA: 225] [Impact Index Per Article: 10.7] [Indexed: 11/28/2022]
Abstract
In recent years, natural antisense transcripts (NATs) have been implicated in many aspects of eukaryotic gene expression including genomic imprinting, RNA interference, translational regulation, alternative splicing, X-inactivation and RNA editing. Moreover, there is growing evidence to suggest that antisense transcription might have a key role in a range of human diseases. Consequently, there have been several recent attempts to identify novel NATs. To date, approximately 2500 mammalian NATs have been found, indicating that antisense transcription might be a common mechanism of regulating gene expression in human cells. There are increasingly diverse ways in which antisense transcription can regulate gene expression and evidence for the involvement of NATs in human disease is emerging. A range of bioinformatic resources could be used to assist future antisense research.
Affiliation(s)
- Giovanni Lavorgna
- Human Molecular Genetics Unit, Dibit-San Raffaele Scientific Institute, Via Olgettina 58, 20132 Milan, Italy.
|
1120
|
Yano Y, Saito R, Yoshida N, Yoshiki A, Wynshaw-Boris A, Tomita M, Hirotsune S. A new role for expressed pseudogenes as ncRNA: regulation of mRNA stability of its homologous coding gene. J Mol Med (Berl) 2004; 82:414-22. [PMID: 15148580 DOI: 10.1007/s00109-004-0550-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.0] [Received: 01/02/2004] [Accepted: 03/15/2004] [Indexed: 10/26/2022]
Abstract
We have earlier generated a mutant mouse in a course of making a transgenic line that exhibited interesting heterozygote phenotypes, which exhibited failure to thrive, severe bone deformities, and polycystic kidneys. This mutant mouse provided a clue to uncover a unique role of expressed pseudogenes. In this mutant the transgene was integrated into the vicinity of the expressing pseudogene of Makorin1 called Makorin1-p1. This insertion reduced transcription of the Makorin1-p1, resulting in destabilization of the Makorin1 mRNA in trans via a cis-acting RNA decay element within the 5' region of Makorin1 that is homologous between Makorin1 and Makorin1-p1. These findings demonstrate a novel and specific regulatory role of an expressed pseudogene as well as functional significance for noncoding RNAs. Next, we developed an original algorithm to determine how many pseudogenes are expressed. Based on our examination 2-3% of human processed pseudogenes are expressed using the most strict criteria. Interestingly, the mouse has a much smaller proportion of expressed pseudogenes (0.5-1%). Pseudogenes are functionally less constrained, and have accumulated more mutations than translated genes. If they have some functions in gene regulation, this property would allow more rapid functional diversification than protein-coding genes. In addition, some genetic phenomena that exhibit incomplete penetrance might be attributed to "mutation" or "variation" of pseudogenes.
Affiliation(s)
- Yoshihisa Yano
- Department of Genetic Disease Research, Osaka City University Graduate School of Medicine, Asahi-machi 1-4-3 Abeno, 545-8585 Osaka, Japan
|
1121
|
Lo PK, Wang FF. 5'-Heterogeneity of mouse Dda3 transcripts is attributed to differential initiation of transcription and alternative splicing. Arch Biochem Biophys 2004; 425:221-32. [PMID: 15111131 DOI: 10.1016/j.abb.2004.03.026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Received: 02/12/2004] [Revised: 03/22/2004] [Indexed: 10/26/2022]
Abstract
We have previously shown that mouse Dda3 gene is a p53 and p73 transcriptional target whose expression suppresses tumor cell growth. Here, we report the identification of multiple variants of Dda3 transcripts with diverse 5' sequences through 5' rapid amplification of cDNA ends (5'-RACE) and RT-PCR. Analysis by primer extension and RNase protection revealed that the 5'-heterogeneity was generated by transcription initiation at multiple sites in exon 1 and intron 1 and by alternative splicing. These transcripts, both coding and non-coding, exhibited distinct expression patterns in various adult tissues and were developmentally regulated. Furthermore, they were induced in a p53-dependent manner by various stress signals. These data demonstrated that differential initiation of transcription and alternative splicing both participate in the regulation of Dda3 gene expression.
Affiliation(s)
- Pang-Kuo Lo
- Institute of Biochemistry, National Yang-Ming University, Shih-Pai, Taipei 112, Taiwan
|
1122
|
Gunaratne PH, Wu JQ, Garcia AM, Hulyk S, Worley KC, Margolin JF, Gibbs RA. Concatenation cDNA sequencing for transcriptome analysis. C R Biol 2004; 326:971-7. [PMID: 14744103 DOI: 10.1016/j.crvi.2003.09.032] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
Abstract
We describe a high-throughput cDNA sequencing pipeline (http://www.hgsc.bcm.tmc.edu/projects/cdna) built in response to the emerging need for rapid sequencing of large cDNA collections. Using this strategy cDNA inserts are purified and joined through concatenation into large molecules. These 'pseudo-BACs' are subjected to random shotgun sequencing whereby the majority of cDNA inserts in the pool are sequenced. Using this concatenation cDNA sequencing platform, we have contributed more than 13000 full-length cDNA sequences from human and mouse to the Mammalian Gene Collection (MGC).
Affiliation(s)
- Preethi H Gunaratne
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
|
1123
|
Carter MG, Piao Y, Dudekula DB, Qian Y, VanBuren V, Sharov AA, Tanaka TS, Martin PR, Bassey UC, Stagg CA, Aiba K, Hamatani T, Matoba R, Kargul GJ, Ko MSH. The NIA cDNA project in mouse stem cells and early embryos. C R Biol 2004; 326:931-40. [PMID: 14744099 DOI: 10.1016/j.crvi.2003.09.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 01/27/2023]
Abstract
A catalog of mouse genes expressed in early embryos, embryonic and adult stem cells was assembled, including 250000 ESTs, representing approximately 39000 unique transcripts. The cDNA libraries, enriched in full-length clones, were condensed into the NIA 15 and 7.4K clone sets, freely distributed to the research community, providing a standard platform for expression studies using microarrays. They are essential tools for studying mammalian development and stem cell biology, and to provide hints about the differential nature of embryonic and adult stem cells.
Affiliation(s)
- Mark G Carter
- Developmental Genomics and Aging Section, Laboratory of Genetics, National Institute on Aging, National Institutes of Health, Baltimore, MD 21224, USA
|
1124
|
Han JS, Szak ST, Boeke JD. Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. Nature 2004; 429:268-74. [PMID: 15152245 DOI: 10.1038/nature02536] [Citation(s) in RCA: 377] [Impact Index Per Article: 18.0] [Received: 01/22/2004] [Accepted: 03/30/2004] [Indexed: 11/08/2022]
Abstract
LINE-1 (L1) elements are the most abundant autonomous retrotransposons in the human genome, accounting for about 17% of human DNA. The L1 retrotransposon encodes two proteins, open reading frame (ORF)1 and the ORF2 endonuclease/reverse transcriptase. L1 RNA and ORF2 protein are difficult to detect in mammalian cells, even in the context of overexpression systems. Here we show that inserting L1 sequences on a transcript significantly decreases RNA expression and therefore protein expression. This decreased RNA concentration does not result from major effects on the transcription initiation rate or RNA stability. Rather, the poor L1 expression is primarily due to inadequate transcriptional elongation. Because L1 is an abundant and broadly distributed mobile element, the inhibition of transcriptional elongation by L1 might profoundly affect expression of endogenous human genes. We propose a model in which L1 affects gene expression genome-wide by acting as a 'molecular rheostat' of target genes. Bioinformatic data are consistent with the hypothesis that L1 can serve as an evolutionary fine-tuner of the human transcriptome.
Affiliation(s)
- Jeffrey S Han
- Department of Molecular Biology and Genetics and High Throughput Biology Center, The Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
|
1125
|
Hertel SC, Chwieralski CE, Hinz M, Rio MC, Tomasetto C, Hoffmann W. Profiling trefoil factor family (TFF) expression in the mouse: identification of an antisense TFF1-related transcript in the kidney and liver. Peptides 2004; 25:755-62. [PMID: 15177869 DOI: 10.1016/j.peptides.2003.11.021] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.3] [Received: 09/03/2003] [Accepted: 11/14/2003] [Indexed: 12/13/2022]
Abstract
The expression of the trefoil factor family (TFF) genes (TFF1, TFF2, and TFF3) was systematically analyzed in 18 different organs from male or female mice using RT-PCR analysis. The expression patterns showed some gender-specific differences, e.g., TFF3 transcripts in the urinary bladder and liver. Furthermore, the murine expression profile differed from that in human, e.g., in the respiratory tract and uterine cervix. As a hallmark, an aberrant TFF1-related transcript was detected specifically in the kidney and liver of several mouse strains. Molecular characterization of this rare 1.8kb long transcript from the kidney clearly revealed that its 3' region originated from the antisense strand of the TFF1 locus containing particularly large parts of the antisense strands of introns 1 and 2. Homology searches using various databases revealed that this antisense TFF1-related transcript is subject of intense alternative splicing and no protein product encoded by this antisense TFF1-related transcript could be identified. Although the function of this transcript is not known currently, we can speculate that this antisense TFF1-related transcript might have a gene silencing effect particularly on TFF1 expression in the murine kidney and liver.
Affiliation(s)
- Silvia C Hertel
- Institut für Molekularbiologie und Medizinische Chemie, Otto-von-Guericke-Universität, D-39120 Magdeburg, Germany
|
1126
|
Vierstraete E, Verleyen P, Sas F, Van den Bergh G, De Loof A, Arckens L, Schoofs L. The instantly released Drosophila immune proteome is infection-specific. Biochem Biophys Res Commun 2004; 317:1052-60. [PMID: 15094375 DOI: 10.1016/j.bbrc.2004.03.150] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.1] [Received: 03/16/2004] [Indexed: 11/26/2022]
Abstract
In this study, we analyzed the hemolymph proteome of Drosophila third instar larvae, which were induced with a suspension of Gram-positive bacteria or yeast. Profiling of the hemolymph proteins of infected versus non-infected larvae was performed by two-dimensional difference gel electrophoresis. Infection with Micrococcus luteus or Saccharomyces cerevisiae induced, respectively, 20 and 19 differential protein spots. The majority of the spots are specifically regulated by one pathogen, whereas only a few spots correspond to proteins altered in all cases of challenging (including after challenge with lipopolysaccharides). All of the upregulated proteins can be assigned to specific aspects of the immune system, as they did not increase in the hemolymph of sterile pricked larvae. Next to known immune proteins, unannotated proteins were identified such as CG4306 protein, which has homologues with unknown function in all metazoan genome databases available today.
Affiliation(s)
- Evy Vierstraete
- Laboratory of Developmental Physiology, Genomics and Proteomics, K.U.Leuven, Naamsestraat 59, B-3000 Louvain, Belgium.
|
1127
|
Silva DG, Schönbach C, Brusic V, Socha LA, Nagashima T, Petrovsky N. Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system. BMC Genomics 2004; 5:28. [PMID: 15115540 PMCID: PMC420239 DOI: 10.1186/1471-2164-5-28] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 10/10/2003] [Accepted: 04/29/2004] [Indexed: 11/24/2022] Open
Abstract
Background: A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work we utilised the latest information on the mouse transcriptome as revealed by the RIKEN FANTOM2 project to identify novel human disease-related candidate genes. We define a new term "patholog" to mean a homolog of a human disease-related gene encoding a product (transcript, anti-sense or protein) potentially relevant to disease. Rather than just focus on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern.
Results: Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity (70–85% identity) to known human-disease genes. Using a newly developed biological information extraction and annotation tool (FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardio-vascular (4%), or other (14%), disorders.
Conclusions: Large scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs from within large-scale mouse transcript datasets.
Affiliation(s)
- Diego G Silva
- Medical Informatics Centre, University of Canberra, ACT 2601 Australia
- John Curtin School of Medical Research, Australian National University, Canberra ACT 2601, Australia
- Christian Schönbach
- Biomedical Knowledge Discovery Team, Bioinformatics Group, RIKEN Genomic Sciences Center, Yokohama 230-0045, Japan
- Luis A Socha
- Medical Informatics Centre, University of Canberra, ACT 2601 Australia
- John Curtin School of Medical Research, Australian National University, Canberra ACT 2601, Australia
- Takeshi Nagashima
- Biomedical Knowledge Discovery Team, Bioinformatics Group, RIKEN Genomic Sciences Center, Yokohama 230-0045, Japan
- Nikolai Petrovsky
- Medical Informatics Centre, University of Canberra, ACT 2601 Australia
- John Curtin School of Medical Research, Australian National University, Canberra ACT 2601, Australia
|
1128
|
Imanishi T, Itoh T, Suzuki Y, O'Donovan C, Fukuchi S, Koyanagi KO, Barrero RA, Tamura T, Yamaguchi-Kabata Y, Tanino M, Yura K, Miyazaki S, Ikeo K, Homma K, Kasprzyk A, Nishikawa T, Hirakawa M, Thierry-Mieg J, Thierry-Mieg D, Ashurst J, Jia L, Nakao M, Thomas MA, Mulder N, Karavidopoulou Y, Jin L, Kim S, Yasuda T, Lenhard B, Eveno E, Suzuki Y, Yamasaki C, Takeda JI, Gough C, Hilton P, Fujii Y, Sakai H, Tanaka S, Amid C, Bellgard M, Bonaldo MDF, Bono H, Bromberg SK, Brookes AJ, Bruford E, Carninci P, Chelala C, Couillault C, de Souza SJ, Debily MA, Devignes MD, Dubchak I, Endo T, Estreicher A, Eyras E, Fukami-Kobayashi K, Gopinath GR, Graudens E, Hahn Y, Han M, Han ZG, Hanada K, Hanaoka H, Harada E, Hashimoto K, Hinz U, Hirai M, Hishiki T, Hopkinson I, Imbeaud S, Inoko H, Kanapin A, Kaneko Y, Kasukawa T, Kelso J, Kersey P, Kikuno R, Kimura K, Korn B, Kuryshev V, Makalowska I, Makino T, Mano S, Mariage-Samson R, Mashima J, Matsuda H, Mewes HW, Minoshima S, Nagai K, Nagasaki H, Nagata N, Nigam R, Ogasawara O, Ohara O, Ohtsubo M, Okada N, Okido T, Oota S, Ota M, Ota T, Otsuki T, Piatier-Tonneau D, Poustka A, Ren SX, Saitou N, Sakai K, Sakamoto S, Sakate R, Schupp I, Servant F, Sherry S, Shiba R, Shimizu N, Shimoyama M, Simpson AJ, Soares B, Steward C, Suwa M, Suzuki M, Takahashi A, Tamiya G, Tanaka H, Taylor T, Terwilliger JD, Unneberg P, Veeramachaneni V, Watanabe S, Wilming L, Yasuda N, Yoo HS, Stodolsky M, Makalowski W, Go M, Nakai K, Takagi T, Kanehisa M, Sakaki Y, Quackenbush J, Okazaki Y, Hayashizaki Y, Hide W, Chakraborty R, Nishikawa K, Sugawara H, Tateno Y, Chen Z, Oishi M, Tonellato P, Apweiler R, Okubo K, Wagner L, Wiemann S, Strausberg RL, Isogai T, Auffray C, Nomura N, Gojobori T, Sugano S. Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol 2004; 2:e162. [PMID: 15103394 PMCID: PMC393292 DOI: 10.1371/journal.pbio.0020162] [Citation(s) in RCA: 234] [Impact Index Per Article: 11.1] [Received: 12/19/2003] [Accepted: 04/01/2004] [Indexed: 01/08/2023] Open
Abstract
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
Affiliation(s)
- Tadashi Imanishi
- (1) Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Takeshi Itoh
- (1) Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- (2) Bioinformatics Laboratory, Genome Research Department, National Institute of Agrobiological Sciences, Ibaraki, Japan
- Yutaka Suzuki
- (3) Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- (68) Department of Medical Genome Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Claire O'Donovan
- (4) EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Satoshi Fukuchi
- (5) Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Roberto A Barrero
- (5) Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Takuro Tamura
- (7) Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- (8) BITS Company, Shizuoka, Japan
- Yumi Yamaguchi-Kabata
- (1) Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Motohiko Tanino
- (1) Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- (7) Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Kei Yura
- (9) Quantum Bioinformatics Group, Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Kyoto, Japan
- Satoru Miyazaki
- (5) Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Kazuho Ikeo
- (5) Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Keiichi Homma
- (5) Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Arek Kasprzyk
- (4) EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Tetsuo Nishikawa
- (10) Reverse Proteomics Research Institute, Chiba, Japan
- (11) Central Research Laboratory, Hitachi, Tokyo, Japan
- Mika Hirakawa
- (12) Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Jean Thierry-Mieg
- (13) National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- (14) Centre National de la Recherche Scientifique (CNRS), Laboratoire de Physique Mathematique, Montpellier, France
- Danielle Thierry-Mieg
- (13) National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- (14) Centre National de la Recherche Scientifique (CNRS), Laboratoire de Physique Mathematique, Montpellier, France
- Jennifer Ashurst
- (15) The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Libin Jia
- (16) National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Mitsuteru Nakao
- (3) Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Michael A Thomas
- (17) Department of Biological Sciences, Idaho State University, Pocatello, Idaho, United States of America
- Nicola Mulder
- (4) EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Youla Karavidopoulou
- (4) EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Lihua Jin
- (5) Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Sangsoo Kim
- (18) Korea Research Institute of Bioscience and Biotechnology, Taejeon, Korea
- Boris Lenhard
- (19) Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
- Eric Eveno
- (20) Genexpress - CNRS - Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- (21) Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Yoshiyuki Suzuki
- (5) Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Chisato Yamasaki
- (1) Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Jun-ichi Takeda
- (1) Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Craig Gough
- (1) Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- (7) Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Phillip Hilton
- (1) Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- (7) Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Yasuyuki Fujii
- (1) Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- (7) Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Hiroaki Sakai
- (1) Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- (7) Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- (22) Tokyo Research Laboratories, Kyowa Hakko Kogyo Company, Tokyo, Japan
- Susumu Tanaka
- (1) Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- (7) Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Clara Amid
- (23) MIPS - Institute for Bioinformatics, GSF - National Research Center for Environment and Health, Neuherberg, Germany
- Matthew Bellgard
- (24) Centre for Bioinformatics and Biological Computing, School of Information Technology, Murdoch University, Murdoch, Western Australia, Australia
- Maria de Fatima Bonaldo
- (25) Medical Education and Biomedical Research Facility, University of Iowa, Iowa City, Iowa, United States of America
- Hidemasa Bono
- (26) Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Susan K Bromberg
- (27) Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Anthony J Brookes
- (19) Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
- Elspeth Bruford
- (28) HUGO Gene Nomenclature Committee, University College London, London, United Kingdom
- Claude Chelala
- (20) Genexpress - CNRS - Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Christine Couillault
- (20) Genexpress - CNRS - Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- (21) Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Marie-Anne Debily
- (20) Genexpress - CNRS - Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Inna Dubchak
- (32) Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | - Toshinori Endo
- 33Department of Bioinformatics, Medical Research Institute, Tokyo Medical and Dental UniversityTokyoJapan
| | | | - Eduardo Eyras
- 15The Wellcome Trust Sanger Institute, Wellcome Trust Genome CampusCambridgeUnited Kingdom
| | - Kaoru Fukami-Kobayashi
- 35Bioresource Information Division, RIKEN BioResource Center, RIKEN Tsukuba InstituteIbarakiJapan
| | - Gopal R. Gopinath
- 36Genome Knowledgebase, Cold Spring Harbor LaboratoryCold Spring Harbor, New YorkUnited States of America
| | - Esther Graudens
- 20Genexpress—CNRS—Functional Genomics and Systemic Biology for HealthVillejuif CedexFrance
- 21Sino-French Laboratory in Life Sciences and GenomicsShanghaiChina
| | - Yoonsoo Hahn
- 18Korea Research Institute of Bioscience and BiotechnologyTaejeonKorea
| | - Michael Han
- 23MIPS—Institute for Bioinformatics, GSF—National Research Center for Environment and HealthNeuherbergGermany
| | - Ze-Guang Han
- 21Sino-French Laboratory in Life Sciences and GenomicsShanghaiChina
- 37Chinese National Human Genome Center at ShanghaiShanghaiChina
| | - Kousuke Hanada
- 5Center for Information Biology and DNA Data Bank of Japan, National Institute of GeneticsShizuokaJapan
| | - Hideki Hanaoka
- 1Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and TechnologyTokyoJapan
| | - Erimi Harada
- 1Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and TechnologyTokyoJapan
- 7Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics ConsortiumTokyoJapan
| | - Katsuyuki Hashimoto
- 38Division of Genetic Resources, National Institute of Infectious DiseasesTokyoJapan
| | - Ursula Hinz
- 34Swiss Institute of BioinformaticsGenevaSwitzerland
| | - Momoki Hirai
- 39Graduate School of Frontier Sciences, Department of Integrated Biosciences, University of TokyoChibaJapan
| | - Teruyoshi Hishiki
- 40Functional Genomics Group, Biological Information Research Center, National Institute of Advanced Industrial Science and TechnologyTokyoJapan
| | - Ian Hopkinson
- 41Department of Primary Care and Population Sciences, Royal Free University College Medical School, University College LondonLondonUnited Kingdom
- 42Clinical and Molecular Genetics Unit, The Institute of Child HealthLondonUnited Kingdom
| | - Sandrine Imbeaud
- 20Genexpress—CNRS—Functional Genomics and Systemic Biology for HealthVillejuif CedexFrance
- 21Sino-French Laboratory in Life Sciences and GenomicsShanghaiChina
- Hidetoshi Inoko
- 1 Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- 7 Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- 43 Department of Genetic Information, Division of Molecular Life Science, School of Medicine, Tokai University, Kanagawa, Japan
- Alexander Kanapin
- 4 EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Yayoi Kaneko
- 1 Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- 7 Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Takeya Kasukawa
- 26 Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Janet Kelso
- 44 South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Paul Kersey
- 4 EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Bernhard Korn
- 46 RZPD Resource Center for Genome Research, Heidelberg, Germany
- Vladimir Kuryshev
- 47 Molecular Genome Analysis, German Cancer Research Center-DKFZ, Heidelberg, Germany
- Izabela Makalowska
- 48 Pennsylvania State University, University Park, Pennsylvania, United States of America
- Takashi Makino
- 5 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Shuhei Mano
- 43 Department of Genetic Information, Division of Molecular Life Science, School of Medicine, Tokai University, Kanagawa, Japan
- Regine Mariage-Samson
- 20 Genexpress—CNRS—Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Jun Mashima
- 5 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Hideo Matsuda
- 49 Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
- Hans-Werner Mewes
- 23 MIPS—Institute for Bioinformatics, GSF—National Research Center for Environment and Health, Neuherberg, Germany
- Shinsei Minoshima
- 50 Medical Photobiology Department, Photon Medical Research Center, Hamamatsu University School of Medicine, Shizuoka, Japan
- 52 Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
- Hideki Nagasaki
- 51 Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Naoki Nagata
- 1 Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Rajni Nigam
- 27 Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Osamu Ogasawara
- 3 Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Masafumi Ohtsubo
- 52 Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
- Norihiro Okada
- 53 Department of Biological Sciences, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Kanagawa, Japan
- Toshihisa Okido
- 5 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Satoshi Oota
- 35 Bioresource Information Division, RIKEN BioResource Center, RIKEN Tsukuba Institute, Ibaraki, Japan
- Motonori Ota
- 54 Global Scientific Information and Computing Center, Tokyo Institute of Technology, Tokyo, Japan
- Toshio Ota
- 22 Tokyo Research Laboratories, Kyowa Hakko Kogyo Company, Tokyo, Japan
- Tetsuji Otsuki
- 55 Molecular Biology Laboratory, Medicinal Research Laboratories, Taisho Pharmaceutical Company, Saitama, Japan
- Annemarie Poustka
- 47 Molecular Genome Analysis, German Cancer Research Center-DKFZ, Heidelberg, Germany
- Shuang-Xi Ren
- 21 Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- 37 Chinese National Human Genome Center at Shanghai, Shanghai, China
- Naruya Saitou
- 56 Department of Population Genetics, National Institute of Genetics, Shizuoka, Japan
- Katsunaga Sakai
- 5 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Shigetaka Sakamoto
- 5 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Ryuichi Sakate
- 39 Graduate School of Frontier Sciences, Department of Integrated Biosciences, University of Tokyo, Chiba, Japan
- Ingo Schupp
- 47 Molecular Genome Analysis, German Cancer Research Center-DKFZ, Heidelberg, Germany
- Florence Servant
- 4 EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Stephen Sherry
- 13 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Rie Shiba
- 1 Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- 7 Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Nobuyoshi Shimizu
- 52 Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
- Mary Shimoyama
- 27 Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Bento Soares
- 25 Medical Education and Biomedical Research Facility, University of Iowa, Iowa City, Iowa, United States of America
- Charles Steward
- 15 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Makiko Suwa
- 51 Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Mami Suzuki
- 5 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Aiko Takahashi
- 1 Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- 7 Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Gen Tamiya
- 1 Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- 7 Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- 43 Department of Genetic Information, Division of Molecular Life Science, School of Medicine, Tokai University, Kanagawa, Japan
- Hiroshi Tanaka
- 33 Department of Bioinformatics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Todd Taylor
- 57 Human Genome Research Group, Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Joseph D Terwilliger
- 58 Columbia University and Columbia Genome Center, New York, New York, United States of America
- Per Unneberg
- 59 Department of Biotechnology, Royal Institute of Technology, Stockholm, Sweden
- Vamsi Veeramachaneni
- 48 Pennsylvania State University, University Park, Pennsylvania, United States of America
- Shinya Watanabe
- 3 Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Laurens Wilming
- 15 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Norikazu Yasuda
- 1 Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- 7 Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Hyang-Sook Yoo
- 18 Korea Research Institute of Bioscience and Biotechnology, Taejeon, Korea
- Marvin Stodolsky
- 60 Biology Division and Genome Task Group, Office of Biological and Environmental Research, United States Department of Energy, Washington, D.C., United States of America
- Wojciech Makalowski
- 48 Pennsylvania State University, University Park, Pennsylvania, United States of America
- Mitiko Go
- 61 Faculty of Bio-Science, Nagahama Institute of Bio-Science and Technology, Shiga, Japan
- Kenta Nakai
- 3 Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Toshihisa Takagi
- 3 Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Minoru Kanehisa
- 12 Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Yoshiyuki Sakaki
- 3 Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- 57 Human Genome Research Group, Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- John Quackenbush
- 62 Institute for Genomic Research, Rockville, Maryland, United States of America
- Yasushi Okazaki
- 26 Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Yoshihide Hayashizaki
- 26 Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Winston Hide
- 44 South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Ranajit Chakraborty
- 63 Center for Genome Information, Department of Environmental Health, University of Cincinnati, Cincinnati, Ohio, United States of America
- Ken Nishikawa
- 5 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Hideaki Sugawara
- 5 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Yoshio Tateno
- 5 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Zhu Chen
- 21 Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- 37 Chinese National Human Genome Center at Shanghai, Shanghai, China
- 64 State Key Laboratory of Medical Genomics, Shanghai Institute of Hematology, Rui-Jin Hospital, Shanghai Second Medical University, Shanghai, China
- Peter Tonellato
- 65 PointOne Systems, Wauwatosa, Wisconsin, United States of America
- Rolf Apweiler
- 4 EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Kousaku Okubo
- 5 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- 40 Functional Genomics Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Lukas Wagner
- 13 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Stefan Wiemann
- 47 Molecular Genome Analysis, German Cancer Research Center-DKFZ, Heidelberg, Germany
- Robert L Strausberg
- 16 National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Takao Isogai
- 10 Reverse Proteomics Research Institute, Chiba, Japan
- 66 Graduate School of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan
- Charles Auffray
- 20 Genexpress—CNRS—Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- 21 Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Nobuo Nomura
- 40 Functional Genomics Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Takashi Gojobori
- 1 Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- 5 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- 67 Department of Genetics, Graduate University for Advanced Studies, Shizuoka, Japan
- Sumio Sugano
- 3 Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- 40 Functional Genomics Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- 68 Department of Medical Genome Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
|
1129
|
Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A 2004; 101:6062-7. [PMID: 15075390 PMCID: PMC395923 DOI: 10.1073/pnas.0400782101] [Citation(s) in RCA: 2811] [Impact Index Per Article: 133.9] [Received: 02/03/2004] [Accepted: 03/02/2004] [Indexed: 01/14/2023] Open
Abstract
The tissue-specific pattern of mRNA expression can indicate important clues about gene function. High-density oligonucleotide arrays offer the opportunity to examine patterns of gene expression on a genome scale. Toward this end, we have designed custom arrays that interrogate the expression of the vast majority of protein-encoding human and mouse genes and have used them to profile a panel of 79 human and 61 mouse tissues. The resulting data set provides the expression patterns for thousands of predicted genes, as well as known and poorly characterized genes, from mice and humans. We have explored this data set for global trends in gene expression, evaluated commonly used lines of evidence in gene prediction methodologies, and investigated patterns indicative of chromosomal organization of transcription. We describe hundreds of regions of correlated transcription and show that some are subject to both tissue and parental allele-specific expression, suggesting a link between spatial expression and imprinting.
Affiliation(s)
- Andrew I Su
- The Genomics Institute of the Novartis Research Foundation, 10675 John J. Hopkins Drive, San Diego, CA 92121, USA
|
1130
|
Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ, Wheeler R, Wong B, Drenkow J, Yamanaka M, Patel S, Brubaker S, Tammana H, Helt G, Struhl K, Gingeras TR. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 2004; 116:499-509. [PMID: 14980218 DOI: 10.1016/s0092-8674(04)00127-8] [Citation(s) in RCA: 849] [Impact Index Per Article: 40.4] [Received: 09/25/2003] [Revised: 12/05/2003] [Accepted: 01/19/2004] [Indexed: 01/25/2023]
Abstract
Using high-density oligonucleotide arrays representing essentially all nonrepetitive sequences on human chromosomes 21 and 22, we map the binding sites in vivo for three DNA binding transcription factors, Sp1, cMyc, and p53, in an unbiased manner. This mapping reveals an unexpectedly large number of transcription factor binding site (TFBS) regions, with a minimal estimate of 12,000 for Sp1, 25,000 for cMyc, and 1600 for p53 when extrapolated to the full genome. Only 22% of these TFBS regions are located at the 5' termini of protein-coding genes while 36% lie within or immediately 3' to well-characterized genes and are significantly correlated with noncoding RNAs. A significant number of these noncoding RNAs are regulated in response to retinoic acid, and overlapping pairs of protein-coding and noncoding RNAs are often coregulated. Thus, the human genome contains roughly comparable numbers of protein-coding and noncoding genes that are bound by common transcription factors and regulated by common environmental signals.
Affiliation(s)
- Simon Cawley
- Affymetrix, 3380 Central Expressway, Santa Clara, CA 95051, USA
|
1131
|
Mindnich R, Möller G, Adamski J. The role of 17 beta-hydroxysteroid dehydrogenases. Mol Cell Endocrinol 2004; 218:7-20. [PMID: 15130507 DOI: 10.1016/j.mce.2003.12.006] [Citation(s) in RCA: 266] [Impact Index Per Article: 12.7] [Received: 10/27/2003] [Revised: 12/09/2003] [Accepted: 12/15/2003] [Indexed: 11/17/2022]
Abstract
The biological activity of steroid hormones is regulated at the pre-receptor level by several enzymes, including 17 beta-hydroxysteroid dehydrogenases (17 beta-HSD). The latter are present in many microorganisms, invertebrates and vertebrates. Dysfunctions in human 17 beta-hydroxysteroid dehydrogenases result in disorders of reproductive biology and in neuronal diseases; the enzymes are also involved in the pathogenesis of various cancers. 17 beta-hydroxysteroid dehydrogenases show remarkable multifunctionality, being able to modulate concentrations not only of steroids but also of fatty and bile acids. Current knowledge of their genetics, biochemistry and medical implications is presented in this review.
Affiliation(s)
- R Mindnich
- GSF-National Research Center for Environment and Health, Institute of Experimental Genetics, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
|
1132
|
Zheng ZM. Regulation of alternative RNA splicing by exon definition and exon sequences in viral and mammalian gene expression. J Biomed Sci 2004. [PMID: 15067211 DOI: 10.1159/000077096] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 01/04/2023] Open
Abstract
Intron removal from a pre-mRNA by RNA splicing was once thought to be controlled mainly by intron splicing signals. However, viral and other eukaryotic RNA exon sequences have recently been found to regulate RNA splicing, polyadenylation, export, and nonsense-mediated RNA decay in addition to their coding function. Regulation of alternative RNA splicing by exon sequences is largely attributable to the presence of two major cis-acting elements in the regulated exons, the exonic splicing enhancer (ESE) and the suppressor or silencer (ESS). Two types of ESEs have been verified from more than 50 genes or exons: purine-rich ESEs, which are the more common, and non-purine-rich ESEs. In contrast, the sequences of ESSs identified in approximately 20 genes or exons are highly diverse and show little similarity to each other. Through interactions with cellular splicing factors, an ESE or ESS determines whether or not a regulated splice site, usually an upstream 3' splice site, will be used for RNA splicing. However, how these elements function precisely in selecting a regulated splice site is only partially understood. The balance between positive and negative regulation of splice site selection likely depends on the cis-element's identity and changes in cellular splicing factors under physiological or pathological conditions.
Affiliation(s)
- Zhi-Ming Zheng
- HIV and AIDS Malignancy Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.
|
1133
|
Sugino H, Yanase H, Hamada S, Kurokawa K, Asakawa S, Shimizu N, Yagi T. Distinct genomic sequence of the CNR/Pcdhα genes in chicken. Biochem Biophys Res Commun 2004; 316:437-45. [PMID: 15020237 DOI: 10.1016/j.bbrc.2004.02.067] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Received: 02/05/2004] [Indexed: 11/30/2022]
Abstract
CNR/Pcdhalpha family proteins were first identified as a receptor family that cooperates with Fyn, a member of the Src family of tyrosine kinases, and are known as synaptic cadherins. Here we report the complete genomic sequence and organization of the chicken (Gallus gallus) CNR/Pcdhalpha. The total length of chicken CNR/Pcdhalpha is 177 kb. The chicken CNR/Pcdhalpha cluster encodes 12 variable and 3 constant exons. The genomic organizations of the chicken, rat, mouse, and human CNR/Pcdhalpha are basically orthologous. The constant-region exons (CP1, CP2, and CP3) are highly conserved between chicken and mammals, with percent identities of 90.9%, 90.7%, and 91.8% at the amino-acid level for chicken versus rat, mouse, and human, respectively. In contrast, the percent identities of the variable-region exons between chicken and mammals were lower: 51.8%, 51.3%, and 52.7%, on average, for chicken versus rat, mouse, and human, respectively, at the amino-acid level. Moreover, the chicken variable-region exons (from v1 to v12) are highly conserved paralogously (91.4%: nucleic acid, 92.4%: amino acid) in comparison with those of mammals. The CG content of each variable exon in the chicken (v1 to v12) is 74% on average and the CpG dinucleotide frequency in each variable-region exon is twice that of mammals. Due to the high CG content, chicken variable exons (from v1 to v12) encode 3 to 4 frame-shifted open reading frames, which span 1.5-3.0 kb, in both the sense and anti-sense orientations.
Affiliation(s)
- Hidehiko Sugino
- Genome Information Research Center, Osaka University, 3-1, Yamadaoka Suita, Osaka 565-0871, Japan.
|
1134
|
Hayashizaki Y, Kawai J. A new approach to the distribution and storage of genetic resources. Nat Rev Genet 2004; 5:223-8. [PMID: 14970824 DOI: 10.1038/nrg1296] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
Affiliation(s)
- Yoshihide Hayashizaki
- Genome Exploration Research Group, Riken Genomic Sciences Center, Riken Yokohama Institute, and Japan and Preventure JST, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
|
1135
|
Elliott D. Pathways of post-transcriptional gene regulation in mammalian germ cell development. Cytogenet Genome Res 2004; 103:210-6. [PMID: 15051941 DOI: 10.1159/000076806] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.1] [Received: 08/14/2003] [Accepted: 09/11/2003] [Indexed: 11/19/2022] Open
Abstract
Male germ cell development is orchestrated by complex and disparate patterns of gene expression operating in different cell types. The mechanisms of gene expression underlying these have been dissected in the mouse because of its readily available genetics. These analyses have shown that, in addition to the traditional transcriptional mechanisms, post-transcriptional regulatory pathways of gene expression are essential for mouse spermatogenesis. Proteins essential for germ cell development have been identified which operate at different points throughout the life cycle of RNA, from pre-mRNA splicing to translation and RNA decay in the cytoplasm. Recent data suggest that these post-transcriptional pathways respond to environmental cues via signalling pathways.
Affiliation(s)
- D Elliott
- Institute of Human Genetics, Centre for Life Central Parkway, University of Newcastle upon Tyne, Newcastle, England.
|
1136
|
Kodzius R, Matsumura Y, Kasukawa T, Shimokawa K, Fukuda S, Shiraki T, Nakamura M, Arakawa T, Sasaki D, Kawai J, Harbers M, Carninci P, Hayashizaki Y. Absolute expression values for mouse transcripts: re-annotation of the READ expression database by the use of CAGE and EST sequence tags. FEBS Lett 2004; 559:22-6. [PMID: 14960301 DOI: 10.1016/s0014-5793(04)00018-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 10/30/2003] [Revised: 12/26/2003] [Accepted: 01/05/2004] [Indexed: 11/18/2022]
Abstract
The RIKEN expression array database (READ) provides comprehensive gene expression data for the mouse, which were obtained as relative values from microarray double-staining experiments with E17.5 mRNA as common reference. To assign absolute expression values for mouse transcripts within READ, we applied the E17.5 reference sample to CAGE (cap analysis of gene expression) and expressed sequence tag (EST) high-throughput tag sequencing. Newly assigned values within the READ database were validated by comparison to expression data from serial analysis of gene expression, CAGE and EST experiments. These experiments confirmed the great significance of the absolute expression values within the improved READ database. The new Absolute READ database on absolute expression data is available under.
Affiliation(s)
- Rimantas Kodzius
- Genome Science Laboratory, RIKEN, Wako Main Campus, Hirosawa 2-1, Wako, Saitama 351-0198, Japan
|
1137
|
Abstract
Many non-coding sequences transcribed from the mammalian genome are proving to have important regulatory roles, but the functions of the majority remain mysterious. For decades, researchers have focused most of their attention on protein-coding genes and proteins. With the completion of the human and mouse genomes and the accumulation of data on the mammalian transcriptome, the focus now shifts to non-coding DNA sequences, RNA-coding genes and their transcripts. Many non-coding transcribed sequences are proving to have important regulatory roles, but the functions of the majority remain mysterious.
Affiliation(s)
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|
1138
|
Nakane T, Satoh T, Inada Y, Nakayama J, Itoh F, Chiba S. Molecular cloning and expression of HRLRRP, a novel heart-restricted leucine-rich repeat protein. Biochem Biophys Res Commun 2004; 314:1086-92. [PMID: 14751244 DOI: 10.1016/j.bbrc.2003.12.202] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
We isolated a novel leucine-rich repeat protein (LRRP) cDNA from E13 mouse embryos by an in silico approach. The cDNA encoded a protein of 274 amino acids having 7 leucine-rich repeat motifs at the center of the protein. An in vitro transcription/translation study showed that the cDNA coded for a peptide of approximately 31 kDa. Northern blot analysis suggested that the mRNA of this novel LRRP was expressed only in the heart, although RT-PCR indicated slight expression in skeletal muscle as well. The transcripts of this gene and Nkx-2.5/Csx were detected in the early stage of cardiac differentiation of P19CL6 embryonal carcinoma cells treated with 1% dimethyl sulfoxide. A fusion protein of the novel LRRP and GFP was detected at a high level in mitochondria and at a low level in the nuclei of COS7 cells. The nuclei of the adult mouse heart were strongly stained with the antibody raised against the synthetic peptide of the protein. Therefore, we designated the gene as heart-restricted leucine-rich repeat protein (HRLRRP) and assume that mouse HRLRRP may play important roles in cardiac development and/or cardiac function.
Affiliation(s)
- Tokio Nakane
- Department of Molecular Pharmacology, Shinshu University School of Medicine, Asahi 3-1-1, 390-8621, Matsumoto, Japan.
|
1139
|
Retta SF, Avolio M, Francalanci F, Procida S, Balzac F, Degani S, Tarone G, Silengo L. Identification of Krit1B: a novel alternative splicing isoform of cerebral cavernous malformation gene-1. Gene 2004; 325:63-78. [PMID: 14697511 DOI: 10.1016/j.gene.2003.09.046] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 12/31/2022]
Abstract
Cerebral cavernous malformations (CCM) are vascular malformations, mostly located in the central nervous system, which occur in 0.1-0.5% of the population. They are characterized by abnormally enlarged and often leaking capillary cavities without intervening neural parenchyma. Some are clinically silent, whereas others cause seizures, intracerebral haemorrhage or focal neurological deficits. These vascular malformations can arise sporadically or may be inherited as an autosomal dominant condition with incomplete penetrance. At least 45% of families affected with cerebral cavernous malformations harbour a mutation in Krev interaction trapped-1 (Krit1) gene (cerebral cavernous malformation gene-1, CCM1). This gene contains 16 coding exons which encode a 736-amino acid protein containing three ankyrin repeats and a FERM domain. Neither the CCM1 pathogenetic mechanisms nor the function of the Krit1 protein are understood so far, although several hypotheses have been inferred from the predicted consequences of Krit1 mutations as well as from the identification of Krit1 as a binding partner of Rap1A, ICAP1A and microtubules. Here, we report the identification of Krit1B, a novel Krit1 isoform characterized by the alternative splicing of the 15th coding exon. We show that the Krit1B splice isoform is widely expressed in mouse cell lines and tissues, whereas its expression is highly restricted in human. In addition, we developed a real-time PCR strategy to accurately quantify the relative ratio of the two Krit1 alternative transcripts in different tissues, demonstrating a Krit1B/Krit1A ratio up to 20% in mouse thymus, but significantly lower ratios in other tissues. 
Bioinformatic analysis using exon/gene-prediction, comparative alignment and structure analysis programs supported the existence of Krit1 alternative transcripts lacking the 15th coding exon and showed that the splicing out of this exon occurs outside of potentially important Krit1 structural domains but in a region required for association with Rap1A, suggesting a subtle, yet important effect on the protein function. Our results indicate that maintenance of a proper ratio between Krit1A and Krit1B could be functionally relevant and suggest that the novel Krit1B isoform might expand our understanding of the role of Krit1 in CCM1 pathogenesis.
Affiliation(s)
- Saverio Francesco Retta
- Department of Genetics, Biology and Biochemistry, University of Torino and Experimental Medicine Research Centre, San Giovanni Battista Hospital, Via Santena 5/bis, 10126 Turin, Italy.
|
1140
|
Kampa D, Cheng J, Kapranov P, Yamanaka M, Brubaker S, Cawley S, Drenkow J, Piccolboni A, Bekiranov S, Helt G, Tammana H, Gingeras TR. Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Res 2004; 14:331-42. [PMID: 14993201 PMCID: PMC353210 DOI: 10.1101/gr.2094104] [Citation(s) in RCA: 373] [Impact Index Per Article: 17.8] [Indexed: 12/17/2022]
Abstract
In this report, we have achieved a richer view of the transcriptome for Chromosomes 21 and 22 by using high-density oligonucleotide arrays on cytosolic poly(A)(+) RNA. Conservatively, only 31.4% of the observed transcribed nucleotides correspond to well-annotated genes, whereas an additional 4.8% and 14.7% correspond to mRNAs and ESTs, respectively. Approximately 85% of the known exons were detected, and up to 21% of known genes have only a single isoform based on exon-skipping alternative expression. Overall, the expression of the well-characterized exons falls predominately into two categories, uniquely or ubiquitously expressed with an identifiable proportion of antisense transcripts. The remaining observed transcription (49.0%) was outside of any known annotation. These novel transcripts appear to be more cell-line-specific and have lower and less variation in expression than the well-characterized genes. Novel transcripts were further characterized based on their distance to annotations, transcript size, coding capacity, and identification as antisense to intronic sequences. By RT-PCR, 126 novel transcripts were independently verified, resulting in a 65% verification rate. These observations strongly support the argument for a re-evaluation of the total number of human genes and an alternative term for "gene" to encompass these growing, novel classes of RNA transcripts in the human genome.
Affiliation(s)
- Dione Kampa
- Affymetrix, Santa Clara, California 95051, USA.
|
1141
|
Veeramachaneni V, Makałowski W, Galdzicki M, Sood R, Makałowska I. Mammalian overlapping genes: the comparative perspective. Genome Res 2004; 14:280-6. [PMID: 14762064 PMCID: PMC327103 DOI: 10.1101/gr.1590904] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.0] [Indexed: 01/13/2023]
Abstract
It is believed that 3.2 billion bp of the human genome harbor approximately 35000 protein-coding genes. On average, one could expect one gene per 300000 nucleotides (nt). Although the distribution of the genes in the human genome is not random, it is rather surprising that a large number of genes overlap in the mammalian genomes. Thousands of overlapping genes were recently identified in the human and mouse genomes. However, the origin and evolution of overlapping genes are still unknown. We identified 1316 pairs of overlapping genes in humans and mice and studied their evolutionary patterns. It appears that these genes do not demonstrate greater than usual conservation. Studies of the gene structure and overlap pattern showed that only a small fraction of analyzed genes preserved exactly the same pattern in both organisms.
Affiliation(s)
- Vamsi Veeramachaneni
- Institute of Molecular Evolutionary Genetics, Department of Biology, Pennsylvania State University, State College, University Park, Pennsylvania 16802, USA
|
1142
|
Hamatani T, Carter MG, Sharov AA, Ko MSH. Dynamics of global gene expression changes during mouse preimplantation development. Dev Cell 2004; 6:117-31. [PMID: 14723852 DOI: 10.1016/s1534-5807(03)00373-3] [Citation(s) in RCA: 683] [Impact Index Per Article: 32.5] [Indexed: 12/29/2022]
Abstract
Understanding preimplantation development is important both for basic reproductive biology and for practical applications including regenerative medicine and livestock breeding. Global expression profiles revealed and characterized the distinctive patterns of maternal RNA degradation and zygotic gene activation, including two major transient waves of de novo transcription. The first wave corresponds to zygotic genome activation (ZGA); the second wave, named mid-preimplantation gene activation (MGA), precedes the dynamic morphological and functional changes from the morula to blastocyst stage. Further expression profiling of embryos treated with inhibitors of transcription, translation, and DNA replication revealed that the translation of maternal RNAs is required for the initiation of ZGA. We propose a cascade of gene activation from maternal RNA/protein sets to ZGA gene sets and thence to MGA gene sets. The large number of genes identified as involved in each phase is a first step toward analysis of the complex gene regulatory networks.
Affiliation(s)
- Toshio Hamatani
- Developmental Genomics and Aging Section, Laboratory of Genetics, National Institute on Aging, National Institutes of Health, 333 Cassell Drive, Suite 3000, Baltimore, MD 21224, USA
|
1143
|
Baranova A, Hammarsund M, Ivanov D, Skoblov M, Sangfelt O, Corcoran M, Borodina T, Makeeva N, Pestova A, Tyazhelova T, Nazarenko S, Gorreta F, Alsheddi T, Schlauch K, Nikitin E, Kapanadze B, Shagin D, Poltaraus A, Ivanovich Vorobiev A, Zabarovsky E, Lukianov S, Chandhoke V, Ibbotson R, Oscier D, Einhorn S, Grander D, Yankovsky N. Distinct organization of the candidate tumor suppressor gene RFP2 in human and mouse: multiple mRNA isoforms in both species- and human-specific antisense transcript RFP2OS. Gene 2004; 321:103-12. [PMID: 14636997 DOI: 10.1016/j.gene.2003.08.007] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.4] [Indexed: 11/24/2022]
Abstract
In the present study, we describe the human and mouse RFP2 gene structure, multiple RFP2 mRNA isoforms in the two species that have different 5' UTRs and a human-specific antisense transcript RFP2OS. Since the human RFP2 5' UTR is not conserved in mouse, these findings might indicate a different regulation of RFP2 in the two species. The predicted human and mouse RFP2 proteins are shown to contain a tripartite RING finger-B-box-coiled-coil domain (RBCC), also known as a TRIM domain, and therefore belong to a subgroup of RING finger proteins that are often involved in developmental and tumorigenic processes. Because homozygous deletions of chromosomal region 13q14.3 are found in a number of malignancies, including chronic lymphocytic leukemia (CLL) and multiple myeloma (MM), we suggest that RFP2 might be involved in tumor development. This study provides necessary information for evaluation of the role of RFP2 in malignant transformation and other biological processes.
MESH Headings
- Alternative Splicing
- Amino Acid Sequence
- Animals
- Chromosome Mapping
- Chromosomes, Human, Pair 13/genetics
- Cloning, Molecular
- DNA/chemistry
- DNA/genetics
- DNA-Binding Proteins/genetics
- Exons
- Female
- Gene Expression
- Genes/genetics
- Humans
- Introns
- Male
- Mice
- Molecular Sequence Data
- Promoter Regions, Genetic/genetics
- RNA, Antisense/genetics
- RNA, Antisense/metabolism
- RNA, Messenger/genetics
- RNA, Messenger/metabolism
- Sequence Alignment
- Sequence Analysis, DNA
- Sequence Homology, Amino Acid
- Species Specificity
- Transcription, Genetic
- Tumor Suppressor Proteins/genetics
Affiliation(s)
- Ancha Baranova
- Genome Analysis Laboratory, Institute of General Genetics, Russian Academy of Science, Moscow 119991, Russia.
|
1144
|
Porcel BM, Delfour O, Castelli V, De Berardinis V, Friedlander L, Cruaud C, Ureta-Vidal A, Scarpelli C, Wincker P, Schächter V, Saurin W, Gyapay G, Salanoubat M, Weissenbach J. Numerous novel annotations of the human genome sequence supported by a 5'-end-enriched cDNA collection. Genome Res 2004; 14:463-71. [PMID: 14962985 PMCID: PMC353234 DOI: 10.1101/gr.1481104] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 02/03/2023]
Abstract
A collection of 90,000 human cDNA clones generated to increase the fraction of "full-length" cDNAs available was analyzed by sequence alignment on the human genome assembly. Five hundred fifty-two gene models not found in LocusLink, with coding regions of at least 300 bp, were defined by using this collection. Exon composition proposed for novel genes showed an average of 4.7 exons per gene. In 20% of the cases, at least half of the exons predicted for new genes coincided with evolutionary conserved regions defined by sequence comparisons with the pufferfish Tetraodon nigroviridis. Among this subset, CpG islands were observed at the 5' end of 75%. In-frame stop codons upstream of the initiator ATG were present in 49% of the new genes, and 16% contained a coding region comprising at least 50% of the cDNA sequence. This cDNA resource also provided candidate small protein-coding genes, usually not included in genome annotations. In addition, analysis of a sample from this cDNA collection indicates that approximately 380 gene models described in LocusLink could be extended at their 5' end by at least one new exon. Finally, this cDNA resource provided an experimental support for annotations based exclusively on predictions, thus representing a resource substantially improving the human genome annotation.
Affiliation(s)
- Betina M Porcel
- Genoscope-Centre National de Séquençage and CNRS UMR-8030, 91000 Evry, France
|
1145
|
Abstract
Sequence data of entire eukaryotic genomes and their detailed comparison have provided new evidence on genome evolution. The major mechanisms involved in the increase of genome sizes are polyploidization and gene duplication. Subsequent gene silencing or mutations, preferentially in regulatory sequences of genes, modify the genome and permit the development of genes with new properties. Mechanisms such as lateral gene transfer, exon shuffling or the creation of new genes by transposition contribute to the evolution of a genome, but remain of relatively restricted relevance. Mechanisms to decrease genome sizes and, in particular, to remove specific DNA sequences, such as blocks of satellite DNAs, appear to involve the action of RNA interference (RNAi). RNAi mechanisms have been proven to be involved in chromatin packaging related with gene inactivation as well as in DNA excision during the macronucleus development in ciliates.
Affiliation(s)
- Wolfgang Hennig
- German Academic Exchange Service (DAAD) Laboratory, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai 200031, China.
|
1146
|
Zoraghi R, Corbin JD, Francis SH. Properties and functions of GAF domains in cyclic nucleotide phosphodiesterases and other proteins. Mol Pharmacol 2004; 65:267-78. [PMID: 14742667 DOI: 10.1124/mol.65.2.267] [Citation(s) in RCA: 117] [Impact Index Per Article: 5.6] [Indexed: 11/22/2022] Open
Affiliation(s)
- Roya Zoraghi
- Department of Molecular Physiology and Biophysics, Vanderbilt University School of Medicine, Nashville, Tennessee 37232-0615, USA
|
1147
|
Castrillo JI, Oliver SG. Yeast as a Touchstone in Post-genomic Research: Strategies for Integrative Analysis in Functional Genomics. BMB Rep 2004; 37:93-106. [PMID: 14761307 DOI: 10.5483/bmbrep.2004.37.1.093] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.9] [Indexed: 01/06/2023] Open
Abstract
The new complexity arising from the genome sequencing projects requires new comprehensive post-genomic strategies: advanced studies in regulatory mechanisms, application of new high-throughput technologies at a genome-wide scale, at the different levels of cellular complexity (genome, transcriptome, proteome and metabolome), efficient analysis of the results, and application of new bioinformatic methods in an integrative or systems biology perspective. This can be accomplished in studies with model organisms under controlled conditions. In this review a perspective on the favourable characteristics of yeast as a touchstone model in post-genomic research is presented. The state of the art, latest advances in the field and bottlenecks, new strategies, new regulatory mechanisms, applications (patents) and high-throughput technologies, most of them being developed and validated in yeast, are presented. The optimal characteristics of yeast as a well-defined system for comprehensive studies under controlled conditions make it a perfect model to be used in integrative, "systems biology" studies to get new insights into the mechanisms of regulation (regulatory networks) responsible for specific phenotypes under particular environmental conditions, to be applied to more complex organisms (e.g. plants, human).
Affiliation(s)
- Juan I Castrillo
- School of Biological Sciences, University of Manchester, 2205 Stopford Building, Oxford Road, Manchester M13 9PT, UK.
|
1148
|
Abstract
Recent years saw a dramatic increase in genomic and proteomic data in public archives. Now with the complete genome sequences of human and other species in hand, detailed analyses of the genome sequences will undoubtedly improve our understanding of biological systems and at the same time require sophisticated bioinformatic tools. Here we review the computational challenges that lie ahead and the exciting new developments in this field.
Affiliation(s)
- Ungsik Yu
- National Genome Information Center, Korea Research Institute of Bioscience Biotechnology, Daejeon 305-333, Korea
|
1149
|
Kiyosawa H, Abe K. Speculations on the role of natural antisense transcripts in mammalian X chromosome evolution. Cytogenet Genome Res 2004; 99:151-6. [PMID: 12900558 DOI: 10.1159/000071587] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 11/20/2002] [Accepted: 11/26/2002] [Indexed: 11/19/2022] Open
Abstract
Recent comprehensive transcriptome analyses in mice have revealed tremendous numbers of natural antisense transcripts in a hitherto ignored category of genes in eukaryotes. We discuss the possible biological roles of these transcripts and their relationships with mammalian sex chromosome evolution. Of 60,770 full-length cDNA sequences, as many as 2,500 pairs of sense-antisense transcripts (SATs) with the potential to form RNA duplex via their complementary sequences have been identified. This high number of antisense transcripts indicates their generic roles in gene expression regulation. These SATs are almost evenly distributed along the chromosomes, with the exception of the X chromosome. The rate of occurrence of SATs on the X chromosome is one-third to one-half that on the autosomes, and this under-representation must be related to a property intrinsic to the X chromosome. Here we hypothesize that monoallelically expressed antisense RNA regulates its sense partner, but that this regulatory system cannot operate on the mammalian X chromosome, as the mammalian X chromosome is effectively in a hemizygous state in both sexes. Loss of such regulation may be involved in the evolution of the X chromosome itself.
Affiliation(s)
- H Kiyosawa
- Technology and Development Team for Mammalian Cellular Dynamics, RIKEN Tsukuba Institute, BioResource Center, Tsukuba-shi, Ibaraki-ken, Japan.
|
1150
|
Hill DP, Begley DA, Finger JH, Hayamizu TF, McCright IJ, Smith CM, Beal JS, Corbani LE, Blake JA, Eppig JT, Kadin JA, Richardson JE, Ringwald M. The mouse Gene Expression Database (GXD): updates and enhancements. Nucleic Acids Res 2004; 32:D568-71. [PMID: 14681482 PMCID: PMC308803 DOI: 10.1093/nar/gkh069] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.6] [Indexed: 01/10/2023] Open
Abstract
The Gene Expression Database (GXD) is a community resource for gene expression information in the laboratory mouse. By collecting and integrating different types of expression data, GXD provides information about expression profiles in different mouse strains and mutants. Participation in the Gene Ontology (GO) project classifies genes and gene products with regard to molecular functions, biological processes, and cellular components. Integration with other Mouse Genome Informatics (MGI) databases places the gene expression information in the context of mouse genetic, genomic and phenotypic information. The integration of these types of information enables valuable insights into the molecular biology that underlies development and disease. The utility of GXD has been improved by the daily addition of new data and through the implementation of new query and display features. These improvements make it easier for users to interrogate and visualize expression data in the context of their specific needs. GXD is accessible through the MGI website at http://www.informatics.jax.org/ or directly at http://www.informatics.jax.org/menus/expression_menu.shtml.
Affiliation(s)
- David P Hill
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
|