1
Hubbard TJP, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L, Coates G, Fairley S, Fitzgerald S, Fernandez-Banet J, Gordon L, Graf S, Haider S, Hammond M, Holland R, Howe K, Jenkinson A, Johnson N, Kahari A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Kulesha E, Lawson D, Longden I, Megy K, Meidl P, Overduin B, Parker A, Pritchard B, Rios D, Schuster M, Slater G, Smedley D, Spooner W, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wilder S, Zadissa A, Birney E, Cunningham F, Curwen V, Durbin R, Fernandez-Suarez XM, Herrero J, Kasprzyk A, Proctor G, Smith J, Searle S, Flicek P. Ensembl 2009. Nucleic Acids Res 2008; 37:D690-7. [PMID: 19033362 PMCID: PMC2686571 DOI: 10.1093/nar/gkn828]
Abstract
The Ensembl project (http://www.ensembl.org) is a comprehensive genome information system featuring an integrated set of genome annotation, databases, and other information for chordate, selected model organism and disease vector genomes. As of release 51 (November 2008), Ensembl fully supports 45 species, and three additional species have preliminary support. New species in the past year include orangutan and six additional low coverage mammalian genomes. Major additions and improvements to Ensembl since our previous report include a major redesign of our website; generation of multiple genome alignments and ancestral sequences using the new Enredo-Pecan-Ortheus pipeline and development of our software infrastructure, particularly to support the Ensembl Genomes project (http://www.ensemblgenomes.org/).
Affiliation(s)
- T J P Hubbard
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
2
Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Eyre T, Fitzgerald S, Fernandez-Banet J, Gräf S, Haider S, Hammond M, Holland R, Howe KL, Howe K, Johnson N, Jenkinson A, Kähäri A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Megy K, Meidl P, Overduin B, Parker A, Pritchard B, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Slater G, Smedley D, Spudich G, Trevanion S, Vilella AJ, Vogel J, White S, Wood M, Birney E, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Herrero J, Hubbard TJP, Kasprzyk A, Proctor G, Smith J, Ureta-Vidal A, Searle S. Ensembl 2008. Nucleic Acids Res 2007; 36:D707-14. [PMID: 18000006 PMCID: PMC2238821 DOI: 10.1093/nar/gkm988]
Abstract
The Ensembl project (http://www.ensembl.org) is a comprehensive genome information system featuring an integrated set of genome annotation, databases and other information for chordate and selected model organism and disease vector genomes. As of release 47 (October 2007), Ensembl fully supports 35 species, with preliminary support for six additional species. New species in the past year include platypus and horse. Major additions and improvements to Ensembl since our previous report include extensive support for functional genomics data in the form of a specialized functional genomics database, genome-wide maps of protein–DNA interactions and the Ensembl regulatory build; support for customization of the Ensembl web interface through the addition of user accounts and user groups; and increased support for genome resequencing. We have also introduced new comparative genomics-based data mining options and report on the continued development of our software infrastructure.
Affiliation(s)
- P Flicek
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
3
Hubbard TJP, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, Ouverdin B, Parker A, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Severin J, Slater G, Smedley D, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wood M, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Flicek P, Kasprzyk A, Proctor G, Searle S, Smith J, Ureta-Vidal A, Birney E. Ensembl 2007. Nucleic Acids Res 2006; 35:D610-7. [PMID: 17148474 PMCID: PMC1761443 DOI: 10.1093/nar/gkl996]
Abstract
The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of chordate genome sequences. Over the past year the number of genomes available from Ensembl has increased from 15 to 33, with the addition of sites for the mammalian genomes of elephant, rabbit, armadillo, tenrec, platypus, pig, cat, bush baby, common shrew, microbat and european hedgehog; the fish genomes of stickleback and medaka and the second example of the genomes of the sea squirt (Ciona savignyi) and the mosquito (Aedes aegypti). Some of the major features added during the year include the first complete gene sets for genomes with low-sequence coverage, the introduction of new strain variation data and the introduction of new orthology/paralog annotations based on gene trees.
Affiliation(s)
- T J P Hubbard
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
4
Birney E, Andrews D, Caccamo M, Chen Y, Clarke L, Coates G, Cox T, Cunningham F, Curwen V, Cutts T, Down T, Durbin R, Fernandez-Suarez XM, Flicek P, Gräf S, Hammond M, Herrero J, Howe K, Iyer V, Jekosch K, Kähäri A, Kasprzyk A, Keefe D, Kokocinski F, Kulesha E, London D, Longden I, Melsopp C, Meidl P, Overduin B, Parker A, Proctor G, Prlic A, Rae M, Rios D, Redmond S, Schuster M, Sealy I, Searle S, Severin J, Slater G, Smedley D, Smith J, Stabenau A, Stalker J, Trevanion S, Ureta-Vidal A, Vogel J, White S, Woodwark C, Hubbard TJP. Ensembl 2006. Nucleic Acids Res 2006; 34:D556-61. [PMID: 16381931 PMCID: PMC1347495 DOI: 10.1093/nar/gkj133]
Abstract
The Ensembl () project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased by 4 to 19, with the addition of the mammalian genomes of Rhesus macaque and Opossum, the chordate genome of Ciona intestinalis and the import and integration of the yeast genome. The year has also seen extensive improvements to both data analysis and presentation, with the introduction of a redesigned website, the addition of RNA gene and regulatory annotation and substantial improvements to the integration of human genome variation data.
Affiliation(s)
- E Birney
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
5
Ashurst JL, Chen CK, Gilbert JGR, Jekosch K, Keenan S, Meidl P, Searle SM, Stalker J, Storey R, Trevanion S, Wilming L, Hubbard T. The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res 2005; 33:D459-65. [PMID: 15608237 PMCID: PMC540089 DOI: 10.1093/nar/gki135]
Abstract
The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) has been designed to be a community resource for browsing manual annotation of finished sequences from a variety of vertebrate genomes. Its core database is based on an Ensembl-style schema, extended to incorporate curation-specific metadata. In collaboration with the genome sequencing centres, Vega attempts to present consistent high-quality annotation of the published human chromosome sequences. In addition, it is also possible to view various finished regions from other vertebrates, including mouse and zebrafish. Vega displays only manually annotated gene structures built using transcriptional evidence, which can be examined in the browser. Attempts have been made to standardize the annotation procedure across each vertebrate genome, which should aid comparative analysis of orthologues across the different finished regions.
Affiliation(s)
- J L Ashurst
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
6
Hubbard T, Andrews D, Caccamo M, Cameron G, Chen Y, Clamp M, Clarke L, Coates G, Cox T, Cunningham F, Curwen V, Cutts T, Down T, Durbin R, Fernandez-Suarez XM, Gilbert J, Hammond M, Herrero J, Hotz H, Howe K, Iyer V, Jekosch K, Kahari A, Kasprzyk A, Keefe D, Keenan S, Kokocinsci F, London D, Longden I, McVicker G, Melsopp C, Meidl P, Potter S, Proctor G, Rae M, Rios D, Schuster M, Searle S, Severin J, Slater G, Smedley D, Smith J, Spooner W, Stabenau A, Stalker J, Storey R, Trevanion S, Ureta-Vidal A, Vogel J, White S, Woodwark C, Birney E. Ensembl 2005. Nucleic Acids Res 2005; 33:D447-53. [PMID: 15608235 PMCID: PMC540092 DOI: 10.1093/nar/gki138] [Received: 10/06/2004] [Revised: 11/01/2004] [Accepted: 11/01/2004]
Abstract
The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased by 7 to 16, with the addition of the six vertebrate genomes of chimpanzee, dog, cow, chicken, tetraodon and frog and the insect genome of honeybee. The majority have been annotated automatically using the Ensembl gene build system, showing its flexibility to reliably annotate a wide variety of genomes. With the increased number of vertebrate genomes, the comparative analysis provided to users has been greatly improved, with new website interfaces allowing annotation of different genomes to be directly compared. The Ensembl software system is being increasingly widely reused in different projects showing the benefits of a completely open approach to software development and distribution.
Affiliation(s)
- T Hubbard
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
7
Birney E, Andrews D, Bevan P, Caccamo M, Cameron G, Chen Y, Clarke L, Coates G, Cox T, Cuff J, Curwen V, Cutts T, Down T, Durbin R, Eyras E, Fernandez-Suarez XM, Gane P, Gibbins B, Gilbert J, Hammond M, Hotz H, Iyer V, Kahari A, Jekosch K, Kasprzyk A, Keefe D, Keenan S, Lehvaslaiho H, McVicker G, Melsopp C, Meidl P, Mongin E, Pettett R, Potter S, Proctor G, Rae M, Searle S, Slater G, Smedley D, Smith J, Spooner W, Stabenau A, Stalker J, Storey R, Ureta-Vidal A, Woodwark C, Clamp M, Hubbard T. Ensembl 2004. Nucleic Acids Res 2004; 32:D468-70. [PMID: 14681459 PMCID: PMC308772 DOI: 10.1093/nar/gkh038]
Abstract
The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. The facilities of the system range from sequence analysis to data storage and visualization and installations exist around the world both in companies and at academic sites. With a total of nine genome sequences available from Ensembl and more genomes to follow, recent developments have focused mainly on closer integration between genomes and external data.
Affiliation(s)
- E Birney
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK