1
Rodriguez JM, Maietta P, Ezkurdia I, Pietrelli A, Wesselink JJ, Lopez G, Valencia A, Tress ML. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res 2012; 41:D110-7. [PMID: 23161672 PMCID: PMC3531113 DOI: 10.1093/nar/gks1058] [Citation(s) in RCA: 153] [Impact Index Per Article: 12.8] [Indexed: 12/03/2022]
Abstract
Here, we present APPRIS (http://appris.bioinfo.cnio.es), a database that houses annotations of human splice isoforms. APPRIS has been designed to provide value to manual annotations of the human genome by adding reliable protein structural and functional data and information from cross-species conservation. The visual representation of the annotations provided by APPRIS for each gene allows annotators and researchers alike to easily identify functional changes brought about by splicing events. In addition to collecting, integrating and analyzing reliable predictions of the effect of splicing events, APPRIS also selects a single reference sequence for each gene, here termed the principal isoform, based on the annotations of structure, function and conservation for each transcript. APPRIS identifies a principal isoform for 85% of the protein-coding genes in the GENCODE 7 release for ENSEMBL. Analysis of the APPRIS data shows that at least 70% of the alternative (non-principal) variants would lose important functional or structural information relative to the principal isoform.
2
Hartman S, Touchton G, Wynn J, Geng T, Chong NW, Smith E. Characterization of expressed sequence tags from a Gallus gallus pineal gland cDNA library. Comp Funct Genomics 2008; 6:301-6. [PMID: 18629218 PMCID: PMC2447514 DOI: 10.1002/cfg.484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2005] [Revised: 05/22/2005] [Accepted: 06/01/2005] [Indexed: 11/29/2022]
Abstract
The pineal gland is the circadian oscillator in the chicken, regulating diverse
functions ranging from egg laying to feeding. Here, we describe the isolation and
characterization of expressed sequence tags (ESTs) isolated from a chicken pineal
gland cDNA library. A total of 192 unique sequences were analysed and submitted
to GenBank; 6% of the ESTs matched neither GenBank cDNA sequences nor the
newly assembled chicken genomic DNA sequence, three ESTs aligned with sequences
designated to be on the Z_random, while one matched a W chromosome sequence and
could be useful in cataloguing functionally important genes on this sex chromosome.
Additionally, single nucleotide polymorphisms (SNPs) were identified and validated
in 10 ESTs that showed 98% or higher sequence similarity to known chicken genes.
Here, we have described resources that may be useful in comparative and functional
genomic analysis of genes expressed in an important organ, the pineal gland, in a
model and agriculturally important organism.
Affiliation(s)
- Stefanie Hartman
- Comparative Genomics Lab, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, VA 24061, USA
3
Tress ML, Wesselink JJ, Frankish A, López G, Goldman N, Löytynoja A, Massingham T, Pardi F, Whelan S, Harrow J, Valencia A. Determination and validation of principal gene products. Bioinformatics 2008; 24:11-7. [PMID: 18006548 PMCID: PMC2734078 DOI: 10.1093/bioinformatics/btm547] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]
Abstract
MOTIVATION Alternative splicing has the potential to generate a wide range of protein isoforms. For many computational applications and for experimental research, it is important to be able to concentrate on the isoform that retains the core biological function. For many genes this is far from clear. RESULTS We have combined five methods into a pipeline that allows us to detect the principal variant for a gene. Most of the methods were based on conservation between species, at the level of both gene and protein. The five methods used were the conservation of exonic structure, the detection of non-neutral evolution, the conservation of functional residues, the existence of a known protein structure and the abundance of vertebrate orthologues. The pipeline was able to determine a principal isoform for 83% of a set of well-annotated genes with multiple variants.
Affiliation(s)
- Michael L Tress
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid, Spain.
4
Burt DW. Emergence of the chicken as a model organism: implications for agriculture and biology. Poult Sci 2007; 86:1460-71. [PMID: 17575197 DOI: 10.1093/ps/86.7.1460] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Indexed: 11/14/2022]
Abstract
Many of the features of the chicken make it an ideal model organism for phylogenetics and embryology, along with applications in agriculture and medicine. The availability of new tools such as whole genome gene expression arrays and single nucleotide polymorphism panels, coupled with the genome sequence, will enhance this position. These advances are reviewed and their implications are discussed.
Affiliation(s)
- D W Burt
- Roslin Institute, Edinburgh, Midlothian EH25 9PS, United Kingdom.
5
Burt DW, White SJ. Avian genomics in the 21st century. Cytogenet Genome Res 2007; 117:6-13. [PMID: 17675839 DOI: 10.1159/000103159] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 01/26/2007] [Accepted: 02/01/2007] [Indexed: 11/19/2022]
Abstract
The chicken has long been an important model organism for developmental biology, as well as a major source of protein with billions of birds used in meat and egg production each year. Chicken genomics has been transformed in recent years, with the characterisation of large EST collections and most recently with the assembly of the chicken genome sequence. As the first livestock genome to be fully sequenced it leads the way for others to follow--with zebra finch later this year. The genome sequence and the availability of three million genetic polymorphisms are expected to aid the identification of genes that control traits of importance in poultry. As the first bird genome to be sequenced it is a model for the remaining 9,600 species thought to exist today. Many of the features of avian biology and organisation of the chicken genome make it an ideal model organism for phylogenetics and embryology, along with applications in agriculture and medicine. The availability of new tools such as whole-genome gene expression arrays and SNP panels, coupled with information resources on the genes and proteins are likely to enhance this position.
Affiliation(s)
- D W Burt
- Department of Genomics and Genetics, Roslin Institute (Edinburgh), Roslin, Midlothian, UK.
6
Guigó R, Flicek P, Abril JF, Reymond A, Lagarde J, Denoeud F, Antonarakis S, Ashburner M, Bajic VB, Birney E, Castelo R, Eyras E, Ucla C, Gingeras TR, Harrow J, Hubbard T, Lewis SE, Reese MG. EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol 2006; 7 Suppl 1:S2.1-31. [PMID: 16925836 PMCID: PMC1810551 DOI: 10.1186/gb-2006-7-s1-s2] [Citation(s) in RCA: 198] [Impact Index Per Article: 11.0] [Indexed: 01/28/2023]
Abstract
BACKGROUND We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. RESULTS The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. CONCLUSION This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
Affiliation(s)
- Roderic Guigó
- Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain
- Member of the EGASP Organizing Committee
- Paul Flicek
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Josep F Abril
- Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain
- Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, Switzerland
- Julien Lagarde
- Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain
- France Denoeud
- Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain
- Stylianos Antonarakis
- University of Geneva Medical School and University Hospitals of Geneva, 1211 Geneva, Switzerland
- Michael Ashburner
- Department of Genetics, University of Cambridge, Cambridge CB3 2EH, UK
- Member of the EGASP Advisory Board
- Vladimir B Bajic
- South African National Bioinformatics Institute (SANBI), University of Western Cape, Bellville 7535, South Africa
- Member of the EGASP Advisory Board
- Ewan Birney
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Member of the EGASP Organizing Committee
- Robert Castelo
- Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain
- Eduardo Eyras
- Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain
- Catherine Ucla
- University of Geneva Medical School and University Hospitals of Geneva, 1211 Geneva, Switzerland
- Thomas R Gingeras
- Affymetrix Inc., Santa Clara, California 95051, USA
- Member of the EGASP Advisory Board
- Jennifer Harrow
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Member of the EGASP Organizing Committee
- Tim Hubbard
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Member of the EGASP Organizing Committee
- Suzanna E Lewis
- Department of Molecular and Cellular Biology, University of California, Berkeley, California 94792, USA
- Member of the EGASP Advisory Board
- Martin G Reese
- Omicia Inc., Christie Ave., Emeryville, California 94608, USA
- Member of the EGASP Advisory Board
7
Plass M, Eyras E. Differentiated evolutionary rates in alternative exons and the implications for splicing regulation. BMC Evol Biol 2006; 6:50. [PMID: 16792801 PMCID: PMC1543662 DOI: 10.1186/1471-2148-6-50] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 09/01/2005] [Accepted: 06/22/2006] [Indexed: 12/11/2022]
Abstract
BACKGROUND Alternatively spliced exons play an important role in the diversification of gene function in most metazoans and are highly regulated by conserved motifs in exons and introns. Two contradicting properties have been associated to evolutionary conserved alternative exons: higher sequence conservation and higher rate of non-synonymous substitutions, relative to constitutive exons. In order to clarify this issue, we have performed an analysis of the evolution of alternative and constitutive exons, using a large set of protein coding exons conserved between human and mouse and taking into account the conservation of the transcript exonic structure. Further, we have also defined a measure of the variation of the arrangement of exonic splicing enhancers (ESE-conservation score) to study the evolution of splicing regulatory sequences. We have used this measure to correlate the changes in the arrangement of ESEs with the divergence of exon and intron sequences. RESULTS We find evidence for a relation between the lack of conservation of the exonic structure and the weakening of the sequence evolutionary constraints in alternative and constitutive exons. Exons in transcripts with non-conserved exonic structures have higher synonymous (dS) and non-synonymous (dN) substitution rates than exons in conserved structures. Moreover, alternative exons in transcripts with non-conserved exonic structure are the least constrained in sequence evolution, and at high EST-inclusion levels they are found to be very similar to constitutive exons, whereas alternative exons in transcripts with conserved exonic structure have a dS significantly lower than average at all EST-inclusion levels. We also find higher conservation in the arrangement of ESEs in constitutive exons compared to alternative ones. Additionally, the sequence conservation at flanking introns remains constant for constitutive exons at all ESE-conservation values, but increases for alternative exons at high ESE-conservation values. CONCLUSION We conclude that most of the differences in dN observed between alternative and constitutive exons can be explained by the conservation of the transcript exonic structure. Low dS values are more characteristic of alternative exons with conserved exonic structure, but not of those with non-conserved exonic structure. Additionally, constitutive exons are characterized by a higher conservation in the arrangement of ESEs, and alternative exons with an ESE-conservation similar to that of constitutive exons are characterized by a conservation of the flanking intron sequences higher than average, indicating the presence of more intronic regulatory signals.
Affiliation(s)
- Mireya Plass
- Research Unit of Biomedical Informatics, IMIM – Pompeu Fabra University, E08003, Barcelona, Spain
- Eduardo Eyras
- Research Unit of Biomedical Informatics, IMIM – Pompeu Fabra University, E08003, Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), E08010, Barcelona, Spain
8
Parra G, Reymond A, Dabbouseh N, Dermitzakis ET, Castelo R, Thomson TM, Antonarakis SE, Guigó R. Tandem chimerism as a means to increase protein complexity in the human genome. Genome Res 2006; 16:37-44. [PMID: 16344564 PMCID: PMC1356127 DOI: 10.1101/gr.4145906] [Citation(s) in RCA: 166] [Impact Index Per Article: 9.2] [Received: 05/17/2005] [Accepted: 09/28/2005] [Indexed: 11/24/2022]
Abstract
The "one-gene, one-protein" rule, coined by Beadle and Tatum, has been fundamental to molecular biology. The rule implies that the genetic complexity of an organism depends essentially on its gene number. The discovery, however, that alternative gene splicing and transcription are widespread phenomena dramatically altered our understanding of the genetic complexity of higher eukaryotic organisms; in these, a limited number of genes may potentially encode a much larger number of proteins. Here we investigate yet another phenomenon that may contribute to generate additional protein diversity. Indeed, by relying on both computational and experimental analysis, we estimate that at least 4%-5% of the tandem gene pairs in the human genome can be eventually transcribed into a single RNA sequence encoding a putative chimeric protein. While the functional significance of most of these chimeric transcripts remains to be determined, we provide strong evidence that this phenomenon does not correspond to mere technical artifacts and that it is a common mechanism with the potential of generating hundreds of additional proteins in the human genome.
Affiliation(s)
- Genís Parra
- Grup de Recerca en Informàtica Biomèdica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, and Programa de Bioinformàtica i Genòmica, Centre de Regulació Genòmica, E08003 Barcelona, Catalonia, Spain
9
Abstract
The chicken genome sequence is important for several reasons. First, the chicken shared a common ancestor with mammals approximately 310 million years ago (Mya) at a phylogenetic distance not previously covered by other genome sequences. It therefore fills a gap in our knowledge and understanding of the evolution and conservation of genes, regulatory sequences, genomes, and karyotypes. The chicken is also a major source of protein in the world, with billions of birds used in meat and egg production each year. It is the first livestock species to be sequenced and so leads the way for others. The sequence and the 2.8 million genetic polymorphisms defined in a parallel project are expected to benefit agriculture and cast new light on animal domestication. Also, as the first bird to be sequenced, it is a model for the 9600 avian species thought to exist today. Many of the features of the chicken genome and its biology make it an ideal organism for studies in development and evolution, along with applications in agriculture and medicine.
Affiliation(s)
- David W Burt
- Department of Genomics and Genetics, Roslin Institute (Edinburgh), Midlothian EH25 9PS, United Kingdom.