201
Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, Lagarde J, Alioto T, Manzano C, Chrast J, Dike S, Wyss C, Henrichsen CN, Holroyd N, Dickson MC, Taylor R, Hance Z, Foissac S, Myers RM, Rogers J, Hubbard T, Harrow J, Guigó R, Gingeras TR, Antonarakis SE, Reymond A. Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res 2007; 17:746-59. [PMID: 17567994 PMCID: PMC1891335 DOI: 10.1101/gr.5660607] [Citation(s) in RCA: 162] [Impact Index Per Article: 9.5] [Received: 06/19/2006] [Accepted: 01/22/2007] [Indexed: 11/24/2022]
Abstract
This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.
Affiliation(s)
- France Denoeud
- Grup de Recerca en Informática Biomèdica, Institut Municipal d’Investigació Mèdica/Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain
- Catherine Ucla
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- Adam Frankish
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, United Kingdom
- Robert Castelo
- Grup de Recerca en Informática Biomèdica, Institut Municipal d’Investigació Mèdica/Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain
- Jorg Drenkow
- Affymetrix, Inc., Santa Clara, California 95051, USA
- Julien Lagarde
- Grup de Recerca en Informática Biomèdica, Institut Municipal d’Investigació Mèdica/Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain
- Tyler Alioto
- Center for Genomic Regulation, 08003 Barcelona, Catalonia, Spain
- Caroline Manzano
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- Jacqueline Chrast
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Sujit Dike
- Affymetrix, Inc., Santa Clara, California 95051, USA
- Carine Wyss
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- Nancy Holroyd
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, United Kingdom
- Mark C. Dickson
- Department of Genetics, Stanford Human Genome Center, Stanford University School of Medicine, Stanford, California 94305-5120, USA
- Ruth Taylor
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, United Kingdom
- Zahra Hance
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, United Kingdom
- Sylvain Foissac
- Center for Genomic Regulation, 08003 Barcelona, Catalonia, Spain
- Richard M. Myers
- Department of Genetics, Stanford Human Genome Center, Stanford University School of Medicine, Stanford, California 94305-5120, USA
- Jane Rogers
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, United Kingdom
- Tim Hubbard
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, United Kingdom
- Jennifer Harrow
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, United Kingdom
- Roderic Guigó
- Grup de Recerca en Informática Biomèdica, Institut Municipal d’Investigació Mèdica/Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain
- Center for Genomic Regulation, 08003 Barcelona, Catalonia, Spain
- Stylianos E. Antonarakis
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- Alexandre Reymond
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
202
HYBRIDdb: a database of hybrid genes in the human genome. BMC Genomics 2007; 8:128. [PMID: 17519042 PMCID: PMC1890557 DOI: 10.1186/1471-2164-8-128] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 09/26/2006] [Accepted: 05/23/2007] [Indexed: 11/30/2022] Open
Abstract
Background: Hybrid genes are candidate risk factors for human tumors by inducing mutation, translocation, inversion, or rearrangement of genes. The occurrence of hybrid genes may also have given rise to new transcripts during hominid evolution. Description: HYBRIDdb is a database of hybrid genes in humans. This system encompasses the bioinformatics analysis of mRNA, EST, cDNA, and genomic DNA sequences in the INSDC databases, and can be used to identify hybrid genes. We searched for hybrid genes among the 28,171 genes listed in the NCBI database, and analyzed their structural patterns in the human genome. The 2,344 gene pairs were detected as hybrid forms of transcriptional products. We classified the hybrid genes into two groups: chromosomal-mediated translocation fusion transcripts and transcription-mediated fusion transcripts. Conclusion: The HYBRIDdb database will provide genome scientists with insight into potential roles for hybrid genes in human evolution and disease.
203
Kapranov P, Willingham AT, Gingeras TR. Genome-wide transcription and the implications for genomic organization. Nat Rev Genet 2007; 8:413-23. [PMID: 17486121 DOI: 10.1038/nrg2083] [Citation(s) in RCA: 529] [Impact Index Per Article: 31.1] [Indexed: 12/16/2022]
Abstract
Recent evidence of genome-wide transcription in several species indicates that the amount of transcription that occurs cannot be entirely accounted for by current sets of genome-wide annotations. Evidence indicates that most of both strands of the human genome might be transcribed, implying extensive overlap of transcriptional units and regulatory elements. These observations suggest that genomic architecture is not colinear, but is instead interleaved and modular, and that the same genomic sequences are multifunctional: that is, used for multiple independently regulated transcripts and as regulatory regions. What are the implications and consequences of such an interleaved genomic architecture in terms of increased information content, transcriptional complexity, evolution and disease states?
Affiliation(s)
- Philipp Kapranov
- Affymetrix, Inc., 3420 Central Expressway, Santa Clara, California 95051, USA
204
Unneberg P, Claverie JM. Tentative mapping of transcription-induced interchromosomal interaction using chimeric EST and mRNA data. PLoS One 2007; 2:e254. [PMID: 17330142 PMCID: PMC1804257 DOI: 10.1371/journal.pone.0000254] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 12/15/2006] [Accepted: 02/06/2007] [Indexed: 11/18/2022] Open
Abstract
Recent studies on chromosome conformation show that chromosomes colocalize in the nucleus, bringing together active genes in transcription factories. This spatial proximity of actively transcribing genes could provide a means for RNA interaction at the transcript level. We have screened public databases for chimeric EST and mRNA sequences with the intent of mapping transcription-induced interchromosomal interactions. We suggest that chimeric transcripts may be the result of close encounters of active genes, either as functional products or "noise" in the transcription process, and that they could be used as probes for chromosome interactions. We have found a total of 5,614 chimeric ESTs and 587 chimeric mRNAs that meet our selection criteria. Due to their higher quality, the mRNA findings are of particular interest and we hope that they may serve as food for thought for specialists in diverse areas of molecular biology.
Affiliation(s)
- Per Unneberg
- Structural and Genomic Information Laboratory, Centre National de la Recherche Scientifique (CNRS) UPR-2589, Institut de Biologie Structurale et Microbiologie, Marseille, France.
205
Interpretation of multiple probe sets mapping to the same gene in Affymetrix GeneChips. BMC Bioinformatics 2007; 8:13. [PMID: 17224057 PMCID: PMC1784106 DOI: 10.1186/1471-2105-8-13] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Received: 12/06/2006] [Accepted: 01/15/2007] [Indexed: 11/10/2022] Open
Abstract
Background: Affymetrix GeneChip technology enables the parallel observations of tens of thousands of genes. It is important that the probe set annotations are reliable so that biological inferences can be made about genes which undergo differential expression. Probe sets representing the same gene might be expected to show similar fold changes/z-scores, however this is in fact not the case. Results: We have made a case study of the mouse Surf4, chosen because it is a gene that was reported to be represented by the same eight probe sets on the MOE430A array by both Affymetrix and Bioconductor in early 2004. Only five of the probe sets actually detect Surf4 transcripts. Two of the probe sets detect splice variants of Surf2. We have also studied the expression changes of the eight probe sets in a public-domain microarray experiment. The transcripts for Surf4 are correlated in time, and similarly the transcripts for Surf2 are also correlated in time. However, the transcripts for Surf4 and Surf2 are not correlated. This proof of principle shows that observations of expression can be used to confirm, or otherwise, annotation discrepancies. We have also investigated groups of probe sets on the RAE230A array that are assigned to the same LocusID, but which show large variances in differential expression in any one of three different experiments on rat. The probe set groups with high variances are found to represent cases of alternative splicing, use of alternative poly(A) signals, or incorrect annotations. Conclusion: Our results indicate that some probe sets should not be considered as unique measures of transcription, because the individual probes map to more than one transcript dependent upon the biological condition. Our results highlight the need for care when assessing whether groups of probe sets all measure the same transcript.
206
Kajimoto K, Hiura Y, Sumiya T, Yasui N, Okuda T, Iwai N. Exclusion of the Catechol-O-Methyltransferase Gene from Genes Contributing to Salt-Sensitive Hypertension in Dahl Salt-Sensitive Rats. Hypertens Res 2007; 30:459-67. [PMID: 17587758 DOI: 10.1291/hypres.30.459] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
Catechol-O-methyltransferase (COMT) is an enzyme that inactivates catecholamines. Several studies have suggested that this enzyme may play a role in blood pressure regulation. We previously reported that the expression levels of Comt mRNA in Dahl salt-sensitive (DS) rats were lower than those in Lewis (LEW) rats. However, the physiological significance of this phenomenon has not been investigated. The purpose of the present study was to evaluate the significance of lower expression of Comt in Dahl salt-sensitive hypertension. The Comt gene in DS rats has a palindromic insertion in 3'-untranslated region, which appears to be responsible for reduced mRNA stability. A genome-wide quantitative trait loci (QTL) analysis of blood pressure using 107 F2 rats indicated that a statistically significant QTL for pulse pressure was located at the Comt locus in chromosome 11. Microarray analysis confirmed that Comt was the only gene differentially expressed between DS and LEW rats in this chromosomal region. However, COMT inhibitors had no significant effects on blood pressure in either DS or LEW rats. Thus, Comt was excluded from the candidate genes contributing to salt-sensitive hypertension in DS rats. A true gene responsible for pulse pressure in this chromosome 11 region remains to be determined.
Affiliation(s)
- Kazuaki Kajimoto
- Department of Epidemiology, Research Institute, National Cardiovascular Center, Suita, Japan
207
Goodstadt L, Ponting CP. Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human. PLoS Comput Biol 2006; 2:e133. [PMID: 17009864 PMCID: PMC1584324 DOI: 10.1371/journal.pcbi.0020133] [Citation(s) in RCA: 125] [Impact Index Per Article: 6.9] [Received: 03/09/2006] [Accepted: 08/21/2006] [Indexed: 01/22/2023] Open
Abstract
Accurate predictions of orthology and paralogy relationships are necessary to infer human molecular function from experiments in model organisms. Previous genome-scale approaches to predicting these relationships have been limited by their use of protein similarity and their failure to take into account multiple splicing events and gene prediction errors. We have developed PhyOP, a new phylogenetic orthology prediction pipeline based on synonymous rate estimates, which accurately predicts orthology and paralogy relationships for transcripts, genes, exons, or genomic segments between closely related genomes. We were able to identify orthologue relationships to human genes for 93% of all dog genes from Ensembl. Among 1:1 orthologues, the alignments covered a median of 97.4% of protein sequences, and 92% of orthologues shared essentially identical gene structures. PhyOP accurately recapitulated genomic maps of conserved synteny. Benchmarking against predictions from Ensembl and Inparanoid showed that PhyOP is more accurate, especially in its predictions of paralogy. Nearly half (46%) of PhyOP paralogy predictions are unique. Using PhyOP to investigate orthologues and paralogues in the human and dog genomes, we found that the human assembly contains 3-fold more gene duplications than the dog. Species-specific duplicate genes, or “in-paralogues,” are generally shorter and have fewer exons than 1:1 orthologues, which is consistent with selective constraints and mutation biases based on the sizes of duplicated genes. In-paralogues have experienced elevated amino acid and synonymous nucleotide substitution rates. Duplicates possess similar biological functions for either the dog or human lineages. Having accounted for 2,954 likely pseudogenes and gene fragments, and after separating 346 erroneously merged genes, we estimated that the human genome encodes a minimum of 19,700 protein-coding genes, similar to the gene count of nematode worms. 
PhyOP is a fast and robust approach to orthology prediction that will be applicable to whole genomes from multiple closely related species. PhyOP will be particularly useful in predicting orthology for mammalian genomes that have been incompletely sequenced, and for large families of rapidly duplicating genes.
Biologists often exploit the evolutionary relationships between proteins in order to explain how their findings are relevant to the biology of other species, including Homo sapiens. The most natural way to define these relationships is to draw family trees showing, for example, which human protein is the counterpart (“orthologue”) of a protein in dog, and which human proteins have arisen by recent duplication of existing genes (“paralogues”). On a small scale this is relatively straightforward, but it is difficult to do this automatically on a genome-wide scale. In this paper the authors describe a new approach to drawing a giant family tree of all proteins from humans and dogs. They show how this tree allows them to refine some protein predictions and discard others that are likely to be nonfunctional dead sequences. Family relationships can show how the dog and human genomes have been rearranged since their last common ancestor. In addition, they help to identify the proteins that are specific to either dog or human, and which contribute to these species' biological differences. Giant trees, drawn from this method, will help to associate the differences, duplications, and evolution of proteins in different mammals with their distinctive physiologies and behaviours.
Affiliation(s)
- Leo Goodstadt
- Medical Research Council Functional Genetics Unit, University of Oxford, Department of Physiology, Anatomy, and Genetics, Oxford, United Kingdom.
208
Miura F, Kawaguchi N, Sese J, Toyoda A, Hattori M, Morishita S, Ito T. A large-scale full-length cDNA analysis to explore the budding yeast transcriptome. Proc Natl Acad Sci U S A 2006; 103:17846-51. [PMID: 17101987 PMCID: PMC1693835 DOI: 10.1073/pnas.0605645103] [Citation(s) in RCA: 194] [Impact Index Per Article: 10.8] [Received: 07/07/2006] [Indexed: 11/18/2022] Open
Abstract
We performed a large-scale cDNA analysis to explore the transcriptome of the budding yeast Saccharomyces cerevisiae. We sequenced two cDNA libraries, one from the cells exponentially growing in a minimal medium and the other from meiotic cells. Both libraries were generated by using a vector-capping method that allows the accurate mapping of transcription start sites (TSSs). Consequently, we identified 11,575 TSSs associated with 3,638 annotated genomic features, including 3,599 ORFs, to suggest that most yeast genes have two or more TSSs. In addition, we identified 45 previously undescribed introns, including those affecting current ORF annotations and those spliced alternatively. Furthermore, the analysis revealed 667 transcription units in the intergenic regions and transcripts derived from antisense strands of 367 known features. We also found that 348 ORFs carry TSSs in their 3'-halves to generate sense transcripts starting from inside the ORFs. These results indicate that the budding yeast transcriptome is considerably more complex than previously thought, and it shares many recently revealed characteristics with the transcriptomes of mammals and other higher eukaryotes. Thus, the genome-wide active transcription that generates novel classes of transcripts appears to be an intrinsic feature of the eukaryotic cells. The budding yeast will serve as a versatile model for the studies on these aspects of transcriptome, and the full-length cDNA clones can function as an invaluable resource in such studies.
Affiliation(s)
- Fumihito Miura
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
- Institute for Bioinformatics Research and Development, Japan Science and Technology Agency, Tokyo 102-0081, Japan
- Noriko Kawaguchi
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
- Institute for Bioinformatics Research and Development, Japan Science and Technology Agency, Tokyo 102-0081, Japan
- Jun Sese
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
- Atsushi Toyoda
- The Institute of Physical and Chemical Research (RIKEN) Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama 230-0045, Japan
- Masahira Hattori
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
- The Institute of Physical and Chemical Research (RIKEN) Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama 230-0045, Japan
- Kitasato Institute for Life Sciences, Kitasato University, Tokyo 108-8641, Japan
- Shinichi Morishita
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
- Institute for Bioinformatics Research and Development, Japan Science and Technology Agency, Tokyo 102-0081, Japan
- Takashi Ito
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
- Institute for Bioinformatics Research and Development, Japan Science and Technology Agency, Tokyo 102-0081, Japan
209
Kwaśnicka-Crawford DA, Carson AR, Scherer SW. IQCJ-SCHIP1, a novel fusion transcript encoding a calmodulin-binding IQ motif protein. Biochem Biophys Res Commun 2006; 350:890-9. [PMID: 17045569 DOI: 10.1016/j.bbrc.2006.09.136] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 08/30/2006] [Accepted: 09/21/2006] [Indexed: 10/24/2022]
Abstract
The existence of transcripts that span two adjacent, independent genes is considered rare in the human genome. This study characterizes a novel human fusion gene named IQCJ-SCHIP1. IQCJ-SCHIP1 is the longest isoform of a complex transcriptional unit that bridges two separate genes that encode distinct proteins, IQCJ, a novel IQ motif containing protein and SCHIP1, a schwannomin interacting protein that has been previously shown to interact with the Neurofibromatosis type 2 (NF2) protein. IQCJ-SCHIP1 is located on the chromosome 3q25 and comprises a 1692-bp transcript encompassing 11 exons spanning 828kb of the genomic DNA. We show that IQCJ-SCHIP1 mRNA is highly expressed in the brain. Protein encoded by the IQCJ-SCHIP1 gene was localized to cytoplasm and actin-rich regions and in differentiated PC12 cells was also seen in neurite extensions.
210
211
Takeda JI, Suzuki Y, Nakao M, Barrero RA, Koyanagi KO, Jin L, Motono C, Hata H, Isogai T, Nagai K, Otsuki T, Kuryshev V, Shionyu M, Yura K, Go M, Thierry-Mieg J, Thierry-Mieg D, Wiemann S, Nomura N, Sugano S, Gojobori T, Imanishi T. Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56,419 completely sequenced and manually annotated full-length cDNAs. Nucleic Acids Res 2006; 34:3917-28. [PMID: 16914452 PMCID: PMC1557807 DOI: 10.1093/nar/gkl507] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Received: 04/22/2006] [Revised: 07/03/2006] [Accepted: 07/03/2006] [Indexed: 11/12/2022] Open
Abstract
We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56,419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6,877 alternative splicing genes with 18,297 different alternative splicing variants. A total of 37,670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6,005 of the 6,877 genes. Notably, alternative splicing affected protein motifs in 3,015 genes, subcellular localizations in 2,982 genes and transmembrane domains in 1,348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or having overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants.
Affiliation(s)
- Jun-ichi Takeda
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Yutaka Suzuki
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
- Mitsuteru Nakao
- Computational Biology Research Center, National Institute of Advanced Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Kazusa DNA Research Institute, 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan
- Roberto A. Barrero
- Center for Information Biology and DDBJ, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Kanako O. Koyanagi
- Graduate School of Information Science and Technology, Hokkaido University, North 14, West 9, Kita-ku, Sapporo, Hokkaido 060-0814, Japan
- Lihua Jin
- Center for Information Biology and DDBJ, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Chie Motono
- Computational Biology Research Center, National Institute of Advanced Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Hiroko Hata
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
- Takao Isogai
- Reverse Proteomics Research Institute, 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan
- Helix Research Institute, Inc., 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan
- Keiichi Nagai
- Helix Research Institute, Inc., 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan
- Central Research Laboratory, Hitachi Ltd, 1-280, Higashi-koigakubo, Kokubunji-shi, Tokyo 185-8601, Japan
- Tetsuji Otsuki
- Helix Research Institute, Inc., 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan
- Vladimir Kuryshev
- Division of Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Masafumi Shionyu
- Faculty of Bio-Science, Nagahama Institute of Bio-Science and Technology, 1266 Tamura-cho, Nagahama, Shiga 526-0829, Japan
- Kei Yura
- Quantum Bioinformatics Team, Center for Computational Science and Engineering, Japan Atomic Energy Agency, 8-1 Umemidai, Kizu, Souraku, Kyoto 619-0215, Japan
- Core Research for Evolution Science and Technology, Japan Science and Technology Agency, Japan
- Mitiko Go
- Division of Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Ochanomizu University, 2-1-1 Otsuka, Bunkyo-ku, Tokyo 112-8610, Japan
- Jean Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Centre National de la Recherche Scientifique, Laboratoire de Physique Mathematique, Montpellier, France
- Danielle Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Centre National de la Recherche Scientifique, Laboratoire de Physique Mathematique, Montpellier, France
- Stefan Wiemann
- Division of Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Nobuo Nomura
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Sumio Sugano
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
- Takashi Gojobori
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Center for Information Biology and DDBJ, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Tadashi Imanishi
- Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, AIST Bio-IT Research Building, Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
- Graduate School of Information Science and Technology, Hokkaido University, North 14, West 9, Kita-ku, Sapporo, Hokkaido 060-0814, Japan
212
Hernández-Sánchez C, Bártulos O, Valenciano AI, Mansilla A, de Pablo F. The regulated expression of chimeric tyrosine hydroxylase-insulin transcripts during early development. Nucleic Acids Res 2006; 34:3455-64. [PMID: 16840532 PMCID: PMC1524912 DOI: 10.1093/nar/gkl436] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
Biological complexity does not appear to be simply correlated with gene number but rather other mechanisms contribute to the morphological and functional diversity across phyla. Such mechanisms regulate different transcriptional, translational and post-translational processes and include the recently identified transcription induced chimerism (TIC). We have found two novel chimeric transcripts in the chick and quail that result from the fusion of tyrosine hydroxylase (TH) and insulin into a single mature transcript. The th and insulin genes are located in tandem and they are generally transcribed independently. However, it appears that two chimeric transcripts containing exons from both the genes can also be produced in a regulated manner. The TH–INS1 and TH–INS2 chimeras differ in their insulin gene content, and they encode two novel isoforms of the TH protein with markedly reduced functionality when compared with the canonical TH. In addition, the TH–INS1 chimeric mRNA generates a small amount of insulin. We propose that TIC is an additional mechanism that can be employed to further regulate TH and insulin expression according to the specific needs of developing vertebrates.
Affiliation(s)
- Catalina Hernández-Sánchez
- Group of Growth Factors in Vertebrate Development, Centro de Investigaciones Biológicas, Consejo Superior de Investigaciones Científicas (CSIC), Ramiro de Maeztu 9, E-28040 Madrid, Spain.
213
214
Parra G, Reymond A, Dabbouseh N, Dermitzakis ET, Castelo R, Thomson TM, Antonarakis SE, Guigó R. Tandem chimerism as a means to increase protein complexity in the human genome. Genome Res 2006; 16:37-44. [PMID: 16344564 PMCID: PMC1356127 DOI: 10.1101/gr.4145906] [Citation(s) in RCA: 166] [Impact Index Per Article: 9.2] [Received: 05/17/2005] [Accepted: 09/28/2005] [Indexed: 11/24/2022]
Abstract
The "one-gene, one-protein" rule, coined by Beadle and Tatum, has been fundamental to molecular biology. The rule implies that the genetic complexity of an organism depends essentially on its gene number. The discovery, however, that alternative gene splicing and transcription are widespread phenomena dramatically altered our understanding of the genetic complexity of higher eukaryotic organisms; in these, a limited number of genes may potentially encode a much larger number of proteins. Here we investigate yet another phenomenon that may contribute to generate additional protein diversity. Indeed, by relying on both computational and experimental analysis, we estimate that at least 4%-5% of the tandem gene pairs in the human genome can be eventually transcribed into a single RNA sequence encoding a putative chimeric protein. While the functional significance of most of these chimeric transcripts remains to be determined, we provide strong evidence that this phenomenon does not correspond to mere technical artifacts and that it is a common mechanism with the potential of generating hundreds of additional proteins in the human genome.
Affiliation(s)
- Genís Parra
- Grup de Recerca en Informàtica Biomèdica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, and Programa de Bioinformàtica i Genòmica, Centre de Regulació Genòmica, E08003 Barcelona, Catalonia, Spain