501
Castellano S, Gladyshev VN, Guigó R, Berry MJ. SelenoDB 1.0: a database of selenoprotein genes, proteins and SECIS elements. Nucleic Acids Res 2008; 36:D332-8. [PMID: 18174224 PMCID: PMC2238826 DOI: 10.1093/nar/gkm731] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Indexed: 11/18/2022]
Abstract
Selenoproteins are a diverse group of proteins usually misidentified and misannotated in sequence databases. The presence of an in-frame UGA (stop) codon in the coding sequence of selenoprotein genes precludes their identification and correct annotation. The in-frame UGA codons are recoded to cotranslationally incorporate selenocysteine, a rare selenium-containing amino acid. The development of ad hoc experimental and, more recently, computational approaches has allowed the efficient identification and characterization of the selenoproteomes of a growing number of species. Today, dozens of selenoprotein families have been described and more are being discovered in recently sequenced species, but the correct genomic annotation is not available for the majority of these genes. SelenoDB is a long-term project that aims to provide, through the collaborative effort of experimental and computational researchers, automatic and manually curated annotations of selenoprotein genes, proteins and SECIS elements. Version 1.0 of the database includes an initial set of eukaryotic genomic annotations, with special emphasis on the human selenoproteome, for immediate inspection by selenium researchers or incorporation into more general databases. SelenoDB is freely available at http://www.selenodb.org.
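The annotation problem described in this abstract can be illustrated with a minimal sketch. It is not SelenoDB's method: the function name `in_frame_tga_positions` and the toy sequence are hypothetical, and real selenoprotein prediction would also need evidence such as a SECIS element. The sketch only shows why a pipeline that treats every in-frame UGA (TGA in DNA) as a stop would truncate such a gene.

```python
def in_frame_tga_positions(cds: str) -> list[int]:
    """Return 0-based nucleotide offsets of in-frame TGA/UGA codons.

    The final codon is skipped because it is assumed to be the true stop.
    Hypothetical helper for illustration only.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    codon_count = len(seq) // 3          # ignore any trailing partial codon
    hits = []
    for i in range(codon_count - 1):     # exclude the terminal codon
        if seq[3 * i:3 * i + 3] == "TGA":
            hits.append(3 * i)
    return hits


# Toy coding sequence (made up): ATG GCT TGA GGT TGC TAA
toy_cds = "ATGGCTTGAGGTTGCTAA"
candidates = in_frame_tga_positions(toy_cds)
```

Each offset returned is a candidate recoding site; whether it encodes selenocysteine or is a genuine stop cannot be decided from the codon alone.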
Affiliation(s)
- Sergi Castellano
- Department of Cell and Molecular Biology, University of Hawaii at Manoa, Honolulu, Hawaii, USA.
502
Megy K, Hammond M, Lawson D, Bruggner RV, Birney E, Collins FH. Genomic resources for invertebrate vectors of human pathogens, and the role of VectorBase. Infect Genet Evol 2008; 9:308-13. [PMID: 18262474 DOI: 10.1016/j.meegid.2007.12.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 10/23/2007] [Revised: 12/19/2007] [Accepted: 12/20/2007] [Indexed: 11/26/2022]
Abstract
High-throughput genome sequencing techniques have now reached vector biology with an emphasis on those species that are vectors of human pathogens. The first mosquito to be sequenced was Anopheles gambiae, the vector for Plasmodium parasites that cause malaria. Further mosquitoes have followed: Aedes aegypti (yellow fever and dengue fever vector) and Culex pipiens (lymphatic filariasis and West Nile fever). Species that are currently being sequenced include the body louse Pediculus humanus (typhus vector), the triatomine Rhodnius prolixus (Chagas disease vector) and the tick Ixodes scapularis (Lyme disease vector). The motivations for sequencing vector genomes are to further understand vector biology, with an eye on developing new control strategies (for example novel chemical attractants or repellents) or understanding the limitations of current strategies (for example the mechanism of insecticide resistance); to analyse the mechanisms driving their evolution; and to perform an exhaustive analysis of the gene repertory. The proliferation of genomic data creates the need for efficient and accessible storage. We present VectorBase, a genomic resource centre that is both involved in the annotation of vector genomes and acts as a portal for access to the genomic information (http://www.vectorbase.org).
Affiliation(s)
- K Megy
- EMBL, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.
503
Abstract
The increasing use of gene expression microarrays, and the depositing of the resulting data into public repositories, means that more investigators are interested in using the technology either directly or through meta-analysis of the publicly available data. The tools available for data analysis have generally been developed for use by experts in the field, making them difficult for the general research community to use. For those interested in entering the field, especially those without a background in statistics, it is difficult to understand why experimental results can be so variable. The purpose of this review is to go through the workflow of a typical microarray experiment, to show that decisions made at each step, from choice of platform through statistical analysis methods to biological interpretation, are all sources of this variability.
504
Heger A, Ponting CP. OPTIC: orthologous and paralogous transcripts in clades. Nucleic Acids Res 2008; 36:D267-70. [PMID: 17933761 PMCID: PMC2238935 DOI: 10.1093/nar/gkm852] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 08/13/2007] [Revised: 09/25/2007] [Accepted: 09/26/2007] [Indexed: 12/02/2022]
Abstract
The genome sequences of a large number of metazoan species are now known. As multiple closely related genomes are sequenced, comparative studies that previously focussed on only pairs of genomes can now be extended over whole clades. The orthologous and paralogous transcripts in clades (OPTIC) database currently provides sets of gene predictions and orthology assignments for three clades: (i) amniotes, including human, dog, mouse, opossum, platypus and chicken (17 443 orthologous groups); (ii) a Drosophila clade of 12 species (12 889 orthologous groups) and (iii) a nematode clade of four species (13 626 orthologous groups). Gene predictions, multiple alignments and phylogenetic trees are freely available to browse and download from http://genserv.anat.ox.ac.uk/clades. Further genomes and clades will be added in the future.
Affiliation(s)
- Andreas Heger
- Department of Physiology, Anatomy and Genetics, MRC Functional Genetics Unit, University of Oxford, Le Gros Clark Building, Oxford OX1 3QX, UK
505
Huppert JL. Thermodynamic prediction of RNA–DNA duplex-forming regions in the human genome. Mol Biosyst 2008; 4:686-91. [DOI: 10.1039/b800354h] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 12/27/2022]
506
Kageyama H, Kitamura Y, Hosono T, Kintaka Y, Seki M, Takenoya F, Hori Y, Nonaka N, Arata S, Shioda S. Visualization of ghrelin-producing neurons in the hypothalamic arcuate nucleus using ghrelin-EGFP transgenic mice. Regul Pept 2008; 145:116-21. [DOI: 10.1016/j.regpep.2007.09.026] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 01/23/2023]
507
Lee PH, Shatkay H. F-SNP: computationally predicted functional SNPs for disease association studies. Nucleic Acids Res 2008; 36:D820-4. [PMID: 17986460 PMCID: PMC2238878 DOI: 10.1093/nar/gkm904] [Citation(s) in RCA: 264] [Impact Index Per Article: 15.5] [Received: 08/17/2007] [Revised: 10/04/2007] [Accepted: 10/05/2007] [Indexed: 11/13/2022]
Abstract
The Functional Single Nucleotide Polymorphism (F-SNP) database integrates information obtained from 16 bioinformatics tools and databases about the functional effects of SNPs. These effects are predicted and indicated at the splicing, transcriptional, translational and post-translational level. As such, the database helps identify and focus on SNPs with potentially deleterious effects on human health. In particular, users can retrieve SNPs that disrupt genomic regions known to be functional, including splice sites and transcriptional regulatory regions. Users can also identify non-synonymous SNPs that may have deleterious effects on protein structure or function, interfere with protein translation or impede post-translational modification. A web interface enables easy navigation for obtaining information through multiple starting points and exploration routes (e.g. starting from SNP identifier, genomic region, gene or target disease). The F-SNP database is available at http://compbio.cs.queensu.ca/F-SNP/.
Affiliation(s)
- Phil Hyoun Lee
- Computational Biology and Machine Learning Lab, School of Computing, Queen's University, Kingston, ON, Canada.
508
Simons C, Makunin IV, Pheasant M, Mattick JS. Maintenance of transposon-free regions throughout vertebrate evolution. BMC Genomics 2007; 8:470. [PMID: 18093339 PMCID: PMC2241635 DOI: 10.1186/1471-2164-8-470] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 06/23/2007] [Accepted: 12/20/2007] [Indexed: 01/23/2023]
Abstract
Background: We recently reported the existence of large numbers of regions up to 80 kb long that lack transposon insertions in the human, mouse and opossum genomes. These regions are significantly associated with loci involved in developmental and transcriptional regulation. Results: Here we report that transposon-free regions (TFRs) are prominent genomic features of amphibian and fish lineages, and that many have been maintained throughout vertebrate evolution, although most transposon-derived sequences have entered these lineages after their divergence. The zebrafish genome contains 470 TFRs over 10 kb and a further 3,951 TFRs over 5 kb, which is comparable to the number identified in mammals. Two thirds of zebrafish TFRs over 10 kb are orthologous to TFRs in at least one mammal, and many have orthologous TFRs in all three mammalian genomes as well as in the genome of Xenopus tropicalis. This indicates that the mechanism responsible for the maintenance of TFRs has been active at these loci for over 450 million years. However, the majority of TFR bases cannot be aligned between distantly related species, demonstrating that TFRs are not the by-product of strong primary sequence conservation. Syntenically conserved TFRs are also more enriched for regulatory genes compared to lineage-specific TFRs. Conclusion: We suggest that TFRs contain extended regulatory sequences that contribute to the precise expression of genes central to early vertebrate development, and can be used as predictors of important regulatory regions.
Affiliation(s)
- Cas Simons
- Australian Research Council Special Research Center for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia QLD 4072, Australia.
509
Nozawa M, Kawahara Y, Nei M. Genomic drift and copy number variation of sensory receptor genes in humans. Proc Natl Acad Sci U S A 2007; 104:20421-6. [PMID: 18077390 PMCID: PMC2154446 DOI: 10.1073/pnas.0709956104] [Citation(s) in RCA: 108] [Impact Index Per Article: 6.0] [Received: 10/03/2007] [Indexed: 11/18/2022]
Abstract
The number of sensory receptor genes varies extensively among different mammalian species. This variation is believed to be caused partly by physiological requirements of animals and partly by genomic drift due to random duplication and deletion of genes. If the contribution of genomic drift is substantial, each species should contain a significant amount of copy number variation (CNV). We therefore investigated CNVs in sensory receptor genes among 270 healthy humans by using published CNV data. The results indicated that olfactory receptor (OR), taste receptor type 2, and vomeronasal receptor type 1 genes show a high level of intraspecific CNVs. In particular, >30% of the approximately 800 OR gene loci in humans were polymorphic with respect to copy number, and two randomly chosen individuals showed a copy number difference of approximately 11 in functional OR genes on average. There was no significant difference in the amount of CNVs between functional and nonfunctional OR genes. Because pseudogenes are expected to evolve in a neutral fashion, this observation suggests that functional OR genes also have evolved in a similar manner with respect to copy number change. In addition, we found that the evolutionary change of copy number of OR genes approximately follows the Gaussian process in probability theory, and the copy number divergence between populations has increased with evolutionary time. We therefore conclude that genomic drift plays an important role in generating intra- and interspecific CNVs of sensory receptor genes. Similar results were obtained when all annotated genes were analyzed.
Affiliation(s)
- Masafumi Nozawa
- Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, 328 Mueller Laboratory, University Park, PA 16802
- Yoshihiro Kawahara
- Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, 328 Mueller Laboratory, University Park, PA 16802
- Integrated Database Team, Japan Biological Information Research Center, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Masatoshi Nei
- Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, 328 Mueller Laboratory, University Park, PA 16802
510
Reumers J, Conde L, Medina I, Maurer-Stroh S, Van Durme J, Dopazo J, Rousseau F, Schymkowitz J. Joint annotation of coding and non-coding single nucleotide polymorphisms and mutations in the SNPeffect and PupaSuite databases. Nucleic Acids Res 2007; 36:D825-9. [PMID: 18086700 PMCID: PMC2238831 DOI: 10.1093/nar/gkm979] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are, together with copy number variation, the primary source of variation in the human genome. SNPs are associated with altered response to drug treatment, susceptibility to disease and other phenotypic variation. Furthermore, during genetic screens for disease-associated mutations in groups of patients and control individuals, the distinction between disease-causing mutation and polymorphism is often unclear. Annotation of the functional and structural implications of single nucleotide changes thus provides valuable information to interpret and guide experiments. The SNPeffect and PupaSuite databases are now synchronized to deliver annotations for both non-coding and coding SNPs, as well as annotations for the SwissProt set of human disease mutations. In addition, SNPeffect now contains predictions of Tango2, an improved aggregation detector, and Waltz, a novel predictor of amyloid-forming sequences, as well as improved predictors for regions that are recognized by the Hsp70 family of chaperones. The new PupaSuite version incorporates predictions for SNPs in silencers and miRNAs including their targets, as well as additional methods for predicting SNPs in TFBSs and splice sites. Predictions for mouse and rat genomes have also been added. In addition, a PupaSuite web service has been developed to enable programmatic data access. The combined database holds annotations for 4,965,073 regulatory as well as 133,505 coding human SNPs and 14,935 disease mutations, and phenotypic descriptions of 43,797 human proteins, and is accessible via http://snpeffect.vib.be and http://pupasuite.bioinfo.cipf.es/.
Affiliation(s)
- Joke Reumers
- Switch Laboratory, Department of Applied Biological Sciences, Vrije Universiteit Brussel, Switch Laboratory, VIB, Pleinlaan 2, 1050 Brussel, Belgium
511
Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F, Kober KM, Miller W, Pedersen JS, Pohl A, Raney BJ, Rhead B, Rosenbloom KR, Smith KE, Stanke M, Thakkapallayil A, Trumbower H, Wang T, Zweig AS, Haussler D, Kent WJ. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 2007; 36:D773-9. [PMID: 18086701 PMCID: PMC2238835 DOI: 10.1093/nar/gkm966] [Citation(s) in RCA: 403] [Impact Index Per Article: 22.4] [Indexed: 01/06/2023]
Abstract
The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrate and 21 invertebrate species as of September 2007. For each assembly, the GBD contains a collection of annotation data aligned to the genomic sequence. Highlights of this year's additions include a 28-species human-based vertebrate conservation annotation, an enhanced UCSC Genes set, and more human variation, MGC, and ENCODE data. The database is optimized for fast interactive performance with a set of web-based tools that may be used to view, manipulate, filter and download the annotation data. New toolset features include the Genome Graphs tool for displaying genome-wide data sets, session saving and sharing, better custom track management, expanded Genome Browser configuration options and a Genome Browser wiki site. The downloadable GBD data, the companion Genome Browser toolset and links to documentation and related information can be found at: http://genome.ucsc.edu/.
Affiliation(s)
- D Karolchik
- Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA.
512
Kuhn M, von Mering C, Campillos M, Jensen LJ, Bork P. STITCH: interaction networks of chemicals and proteins. Nucleic Acids Res 2007; 36:D684-8. [PMID: 18084021 PMCID: PMC2238848 DOI: 10.1093/nar/gkm795] [Citation(s) in RCA: 601] [Impact Index Per Article: 33.4] [Indexed: 01/21/2023]
Abstract
Knowledge about interactions between proteins and small molecules is essential for the understanding of molecular and cellular functions. However, information on such interactions is widely dispersed across numerous databases and the literature. To facilitate access to this data, STITCH (‘search tool for interactions of chemicals’) integrates information about interactions from metabolic pathways, crystal structures, binding experiments and drug–target relationships. Inferred information from phenotypic effects, text mining and chemical structure similarity is used to predict relations between chemicals. STITCH further allows exploring the network of chemical relations, also in the context of associated binding proteins. Each proposed interaction can be traced back to the original data sources. Our database contains interaction information for over 68 000 different chemicals, including 2200 drugs, and connects them to 1.5 million genes across 373 genomes and their interactions contained in the STRING database. STITCH is available at http://stitch.embl.de/.
Affiliation(s)
- Michael Kuhn
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
513
Prachumwat A, Li WH. Gene number expansion and contraction in vertebrate genomes with respect to invertebrate genomes. Genome Res 2007; 18:221-32. [PMID: 18083775 DOI: 10.1101/gr.7046608] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Indexed: 11/24/2022]
Abstract
Where did vertebrate genes come from? Here we address this question by analyzing eight completely sequenced land vertebrate genomes and six completely sequenced invertebrate genomes. Approximately 70% of the vertebrate genes can be found in the six invertebrate genomes with the standard homology search criteria (denoted as V.MCL), another approximately 6% can be found with relaxed search criteria, and an additional approximately 2% can be found in sequenced fungal and bacterial genomes. Thus, a substantial proportion of vertebrate genes (approximately 22%) cannot be found in the nonvertebrate genomes studied (denoted as Vonly). Interestingly, genes in Vonly are predominantly singletons, while the majority of genes in the other three groups belong to gene families. The proteins of Vonly tend to evolve faster than those of V.MCL. Surprisingly, in many cases the family sizes in V.MCL are only as large as or even smaller than their counterparts in the invertebrates, contrary to the general perception of a larger family size in vertebrates. Interestingly, in comparison with the family size in invertebrates, vertebrate gene families involved in regulation, signal transduction, transcription, protein transport, and protein modification tend to be expanded, whereas those involved in metabolic processes tend to be contracted. Furthermore, for almost all of the functional categories with family size expansion in vertebrates, the number of gene types (i.e., the number of singletons plus the number of gene families) tends to be over-represented in Vonly, but under-represented in V.MCL. Our study suggests that gene function is a major determinant of gene family size.
Affiliation(s)
- Anuphap Prachumwat
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
514
Bioinformatic prediction and analysis of eukaryotic protein kinases in the rat genome. Gene 2007; 410:147-53. [PMID: 18201844 DOI: 10.1016/j.gene.2007.12.003] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 10/04/2007] [Revised: 12/03/2007] [Accepted: 12/04/2007] [Indexed: 01/29/2023]
Abstract
Eukaryotic protein kinases, containing a conserved catalytic domain, represent one of the largest superfamilies of eukaryotic proteins and play distinct roles in cell signaling and diseases. The near completion of the rat genome sequencing project enables the evaluation of a near-complete set of rat protein kinases. Publicly accessible genetic sequence databases were searched for rat protein kinases, and 515 eukaryotic protein kinases, 40 atypical protein kinases and 45 kinase pseudogenes were identified. The rat has 509 putative protein kinases orthologous to human kinases. Unlike microtubule affinity-regulating kinases, the rat has a few more kinases, in addition to the orthologous pairs of mouse kinases. The comparison of 11 different eukaryotic species revealed the evolutionary conservation of this diverse family of proteins. The evolutionary rate studies of human disease and non-disease associated kinases suggested that relatively uniform selective pressures have been applied to these kinase classes. This bioinformatic study of the rat protein kinases provides a suitable framework for further characterization of the functional and structural properties of these protein kinases.
515
Tsui IF, Chari R, Buys TP, Lam WL. Public databases and software for the pathway analysis of cancer genomes. Cancer Inform 2007; 3:379-97. [PMID: 19455256 PMCID: PMC2410087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
The study of pathway disruption is key to understanding cancer biology. Advances in high-throughput technologies have led to the rapid accumulation of genomic data. The explosion in available data has generated opportunities for investigation of concerted changes that disrupt biological functions, which in turn has created a need for computational tools for pathway analysis. In this review, we discuss approaches to the analysis of genomic data and describe the publicly available resources for studying biological pathways.
Affiliation(s)
- Ivy F.L. Tsui
- Correspondence: Ivy Tsui, BC Cancer Research Centre, 675 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada. Tel: +1 604-675-8111; Fax: +1 604-675-8232.
516
Pittlik S, Domingues S, Meyer A, Begemann G. Expression of zebrafish aldh1a3 (raldh3) and absence of aldh1a1 in teleosts. Gene Expr Patterns 2007; 8:141-7. [PMID: 18178530 DOI: 10.1016/j.gep.2007.11.003] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 10/31/2007] [Revised: 11/23/2007] [Accepted: 11/28/2007] [Indexed: 10/22/2022]
Abstract
The vitamin A-derived morphogen retinoic acid (RA) plays important roles during the development of chordate animals. The Aldh1a-family of RA-synthesizing enzymes consists of three members, Aldh1a1-3 (Raldh1-3), that are dynamically expressed throughout development. We have searched the known teleost genomes for the presence of Raldh family members and have found that teleost fish possess orthologs of Aldh1a2 and Aldh1a3 only. Here we describe the expression of aldh1a3 in the zebrafish, Danio rerio. Whole mount in situ hybridization shows that aldh1a3 is expressed during eye development in the retina flanking the optic stalks and later is expressed ventrally, opposite the expression domain of aldh1a2. During inner ear morphogenesis, aldh1a3 is expressed in developing sensory epithelia of the cristae and utricular macula and is specifically up-regulated in epithelial projections throughout the formation of the walls of the semicircular canals and endolymphatic duct. In contrast to the mouse inner ear, which expresses all three Raldhs, we find that only aldh1a3 is expressed in the zebrafish otocyst, while aldh1a2 is present in the periotic mesenchyme. During larval stages, additional expression domains of aldh1a3 appear in the anterior pituitary and the swim bladder. Our analyses provide a starting point for genetic studies to examine the role of RA in these organs and emphasize the suitability of the zebrafish inner ear in dissecting the contribution of RA signaling to the development of the vestibular system.
Affiliation(s)
- Silke Pittlik
- Department of Biology, University of Konstanz, Fach M617, 78457 Konstanz, Germany
517
Bina M. The genome browser at UCSC for locating genes, and much more! Mol Biotechnol 2007; 38:269-75. [PMID: 18058261 DOI: 10.1007/s12033-007-9019-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 10/31/2007] [Accepted: 11/06/2007] [Indexed: 11/24/2022]
Abstract
For beginners in the field, this review highlights the key features of the genome browser at UCSC for data display, and provides nearly step-by-step procedures for creating publication-quality maps. The browser offers an engine (Blat) for searching a known genomic DNA for correspondence with protein and DNA sequences specified by the user. The results provide links to graphical displays, known as maps. Users can create "designer maps" by adding Tracks to view various types of data and specific landmarks. The browser offers an extensive list of options. They include the position of annotated genes, the position of reference cDNA sequences (RefSeq from GenBank), the position of alternatively spliced mRNA species, and predictions derived from computational models to identify potential transcription start sites and potential protein binding elements in genomic DNA. Several tracks can be tailored for comparative genomics. The browser also offers tracks for displaying large-scale experimental data including gene expression profiles, exon chips, and single-nucleotide polymorphisms.
Affiliation(s)
- Minou Bina
- Department of Chemistry, Purdue University, West Lafayette, IN, 47907, USA.
518
Wang K, Li M, Bucan M. Pathway-based approaches for analysis of genomewide association studies. Am J Hum Genet 2007; 81:1278-83. [PMID: 17966091 DOI: 10.1086/522374] [Citation(s) in RCA: 676] [Impact Index Per Article: 37.6] [Received: 04/18/2007] [Accepted: 08/01/2007] [Indexed: 12/30/2022]
Abstract
Published genomewide association (GWA) studies typically analyze and report single-nucleotide polymorphisms (SNPs) and their neighboring genes with the strongest evidence of association (the "most-significant SNPs/genes" approach), while paying little attention to the rest. Borrowing ideas from microarray data analysis, we demonstrate that pathway-based approaches, which jointly consider multiple contributing factors in the same pathway, might complement the most-significant SNPs/genes approach and provide additional insights into interpretation of GWA data on complex diseases.
Affiliation(s)
- Kai Wang
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA.
519
Ruan J, Li H, Chen Z, Coghlan A, Coin LJM, Guo Y, Hériché JK, Hu Y, Kristiansen K, Li R, Liu T, Moses A, Qin J, Vang S, Vilella AJ, Ureta-Vidal A, Bolund L, Wang J, Durbin R. TreeFam: 2008 Update. Nucleic Acids Res 2007; 36:D735-40. [PMID: 18056084 PMCID: PMC2238856 DOI: 10.1093/nar/gkm1005] [Citation(s) in RCA: 247] [Impact Index Per Article: 13.7] [Indexed: 11/16/2022]
Abstract
TreeFam (http://www.treefam.org) was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1314 families and automatically generated trees for another 14 351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new Perl API lets users easily extract data from the TreeFam MySQL database.
Affiliation(s)
- Jue Ruan, Heng Li, Zhongzhong Chen, Avril Coghlan, Lachlan James M. Coin, Yiran Guo, Jean-Karim Hériché, Yafeng Hu, Karsten Kristiansen, Ruiqiang Li, Tao Liu, Alan Moses, Junjie Qin, Søren Vang, Albert J. Vilella, Abel Ureta-Vidal, Lars Bolund, Jun Wang, Richard Durbin
- Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, Department of Epidemiology & Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK, Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, University of Aarhus, DK-8200 Aarhus N, Denmark, EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK and Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- *To whom correspondence should be addressed. +44 (0) 1223 834244; +44 (0) 1223 494919
|
520
|
Fan X, Dougan ST. The evolutionary origin of nodal-related genes in teleosts. Dev Genes Evol 2007; 217:807-13. [PMID: 17992538 DOI: 10.1007/s00427-007-0191-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 09/13/2007] [Accepted: 10/10/2007] [Indexed: 11/29/2022]
Abstract
Because of an extra whole-genome duplication, zebrafish and other teleosts have two copies of genes that are present in a single copy in tetrapod genomes. Some zebrafish genes, however, are present in triplicate. For example, the nodal-related genes encode secreted proteins of the transforming growth factor beta superfamily that are required in all vertebrates to induce the mesoderm and endoderm, pattern all three germ layers, and establish the left-right axis. Zebrafish have three nodal-related genes, called ndr1/squint, ndr2/cyclops, and ndr3/southpaw. As part of an analysis of enhancer elements controlling zebrafish nodal-related gene expression, we analyzed the nodal loci in the sequenced genomes of five teleost species and four tetrapod species. Each teleost genome contains three nodal-related genes, indicating that squint, cyclops, and southpaw orthologues were present early in the teleost lineage. The genes flanking the nodal-related genes are also conserved, demonstrating a high degree of conserved synteny. Although we found little homology outside of the coding sequences in this region, pufferfish enhancer sequences work in zebrafish embryos to drive reporter gene expression in the squint expression pattern. This indicates a high degree of functional conservation of enhancer elements within the teleosts. We conclude that the ancestral squint and cyclops genes arose during the teleost-specific whole-genome duplication event and that southpaw emerged from a subsequent duplication event involving ancestral squint.
Affiliation(s)
- Xiang Fan
- Department of Cellular Biology, The University of Georgia, Athens, GA 30602, USA
|
521
|
Shao H, Reed DR, Tordoff MG. Genetic loci affecting body weight and fatness in a C57BL/6J x PWK/PhJ mouse intercross. Mamm Genome 2007; 18:839-51. [PMID: 18008102 PMCID: PMC2131744 DOI: 10.1007/s00335-007-9069-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 03/14/2007] [Accepted: 09/25/2007] [Indexed: 11/28/2022]
Abstract
To determine the genetic variation that contributes to body composition in the mouse, we interbred a wild-derived strain (PWK/PhJ; PWK) with a common laboratory strain (C57BL/6J; B6). The parental, F(1), and F(2) mice were phenotyped at 18 weeks old for body weight and composition using dual-energy X-ray absorptiometry (DEXA). A total of 479 (244 male and 235 female) F(2) mice were genotyped for 117 polymorphic markers spanning the autosomes. Twenty-eight suggestive or significant linkages for four traits (body weight, adjusted lean and fat weight, and percent fat) were detected. Of these, three QTLs were novel: one on the proximal portion of Chr 5 for body weight (Bwq8; LOD = 4.7), one on Chr 3 for lean weight (Bwtq13; LOD = 3.6), and one on Chr 11 for percent fat (Adip19; LOD = 5.8). The remaining QTLs overlapped previously identified linkages, e.g., Adip5 on Chr 9. One QTL was sex-specific (present in males only) and seven were sex-biased (more prominent in one sex than the other). Most alleles that increased body weight were contributed by the B6 strain, and most alleles that increased percent fat were contributed by the PWK strain. Eight pairs of interacting loci were identified, none of which exactly overlapped the main-effect QTLs. Many of the QTLs found in the B6 x PWK cross map to the location of previously reported linkages, suggesting that some QTLs are common to many strains (consensus QTLs), but three new QTLs appear to be particular to the PWK strain. The location and type of QTLs detected in this new cross will assist in future efforts to identify the genetic variation that determines the ratio of lean to fat weight as well as body size in mice.
Affiliation(s)
- Hongguang Shao, Danielle R. Reed, Michael G. Tordoff
- Monell Chemical Senses Center, 3500 Market Street, Philadelphia, Pennsylvania 19104, USA
|
522
|
Miller W, Rosenbloom K, Hardison RC, Hou M, Taylor J, Raney B, Burhans R, King DC, Baertsch R, Blankenberg D, Kosakovsky Pond SL, Nekrutenko A, Giardine B, Harris RS, Tyekucheva S, Diekhans M, Pringle TH, Murphy WJ, Lesk A, Weinstock GM, Lindblad-Toh K, Gibbs RA, Lander ES, Siepel A, Haussler D, Kent WJ. 28-way vertebrate alignment and conservation track in the UCSC Genome Browser. Genome Res 2007; 17:1797-808. [PMID: 17984227 PMCID: PMC2099589 DOI: 10.1101/gr.6761107] [Citation(s) in RCA: 212] [Impact Index Per Article: 11.8] [Received: 06/04/2007] [Accepted: 08/30/2007] [Indexed: 01/17/2023]
Abstract
This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu. This article illustrates the power of this resource for exploring vertebrate and mammalian evolution, using three examples. First, we present several vignettes involving insertions and deletions within protein-coding regions, including a look at some human-specific indels. Then we study the extent to which start codons and stop codons in the human sequence are conserved in other species, showing that start codons are in general more poorly conserved than stop codons. Finally, an investigation of the phylogenetic depth of conservation for several classes of functional elements in the human genome reveals striking differences in the rates and modes of decay in alignability. Each functional class has a distinctive period of stringent constraint, followed by decays that allow (for the case of regulatory regions) or reject (for coding regions and ultraconserved elements) insertions and deletions.
Affiliation(s)
- Webb Miller
- Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania 16802, USA.
|
523
|
Salgado D, Gimenez G, Coulier F, Marcelle C. COMPARE, a multi-organism system for cross-species data comparison and transfer of information. Bioinformatics 2007; 24:447-9. [PMID: 18056065 DOI: 10.1093/bioinformatics/btm599] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Abstract
Motivation: COMPARE is a multi-organism web-based resource system designed to easily retrieve, correlate and interpret data across species. The COMPARE interface provides access to a wide array of information including genomic structure, expression data, annotations, pathways and literature links for human and three widely studied animal models (zebrafish, Drosophila and mouse). A consensus ortholog-finding pipeline combining several ortholog prediction methods allows accurate comparisons of data across species and has been utilized to transfer information from well studied organisms to more poorly annotated ones. Availability: http://compare.ibdml.univ-mrs.fr.
Affiliation(s)
- David Salgado
- Developmental Biology Institute of Marseille Luminy (IBDML), CNRS UMR 6216, Université de la Méditerranée, Campus de Luminy, case 907, 13288 Marseille, France
|
524
|
Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CLG, Davis C, Ewing B, Oommen S, Lau C, Yu HC, Li J, Roe BA, Green P, Gerhard DS, Temple G, Haussler D, Brent MR. Targeted discovery of novel human exons by comparative genomics. Genome Res 2007; 17:1763-73. [PMID: 17989246 PMCID: PMC2099585 DOI: 10.1101/gr.7128207] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Received: 09/10/2007] [Accepted: 10/15/2007] [Indexed: 01/20/2023]
Abstract
A complete and accurate set of human protein-coding gene annotations is perhaps the single most important resource for genomic research after the human-genome sequence itself, yet the major gene catalogs remain incomplete and imperfect. Here we describe a genome-wide effort, carried out as part of the Mammalian Gene Collection (MGC) project, to identify human genes not yet in the gene catalogs. Our approach was to produce gene predictions by algorithms that rely on comparative sequence data but do not require direct cDNA evidence, then to test predicted novel genes by RT-PCR. We have identified 734 novel gene fragments (NGFs) containing 2188 exons with, at most, weak prior cDNA support. These NGFs correspond to an estimated 563 distinct genes, of which >160 are completely absent from the major gene catalogs, while hundreds of others represent significant extensions of known genes. The NGFs appear to be predominantly protein-coding genes rather than noncoding RNAs, unlike novel transcribed sequences identified by technologies such as tiling arrays and CAGE. They tend to be expressed at low levels and in a tissue-specific manner, and they are enriched for roles in motor activity, cell adhesion, connective tissue, and central nervous system development. Our results demonstrate that many important genes and gene fragments have been missed by traditional approaches to gene discovery but can be identified by their evolutionary signatures using comparative sequence data. However, they suggest that hundreds, not thousands, of protein-coding genes are completely missing from the current gene catalogs.
Affiliation(s)
- Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA.
|
525
|
Ranwez V, Delsuc F, Ranwez S, Belkhir K, Tilak MK, Douzery EJ. OrthoMaM: a database of orthologous genomic markers for placental mammal phylogenetics. BMC Evol Biol 2007; 7:241. [PMID: 18053139 PMCID: PMC2249597 DOI: 10.1186/1471-2148-7-241] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.6] [Received: 07/13/2007] [Accepted: 11/30/2007] [Indexed: 11/23/2022]
Abstract
Background: Molecular sequence data have become the standard in modern day phylogenetics. In particular, several long-standing questions of mammalian evolutionary history have been recently resolved thanks to the use of molecular characters. Yet, most studies have focused on only a handful of standard markers. The availability of an ever increasing number of whole genome sequences is a gold mine for modern systematics. Genomic data now provide the opportunity to select new markers that are potentially relevant for further resolving branches of the mammalian phylogenetic tree at various taxonomic levels. Description: The EnsEMBL database was used to determine a set of orthologous genes from 12 available complete mammalian genomes. As targets for possible amplification and sequencing in additional taxa, more than 3,000 exons of length > 400 bp have been selected, among which 118, 368, 608, and 674 are respectively retrieved for 12, 11, 10, and 9 species. A bioinformatic pipeline has been developed to provide evolutionary descriptors for these candidate markers in order to assess their potential phylogenetic utility. The resulting OrthoMaM (Orthologous Mammalian Markers) database can be queried and alignments can be downloaded through a dedicated web interface. Conclusion: The importance of marker choice in phylogenetic studies has long been stressed. Our database centered on complete genome information now makes it possible to select promising markers for a given phylogenetic question or a systematic framework by querying a number of evolutionary descriptors. The usefulness of the database is illustrated with two biological examples. First, two potentially useful markers were identified for rodent systematics based on relevant evolutionary parameters and sequenced in additional species. Second, a complete, gapless 94 kb supermatrix of 118 orthologous exons was assembled for 12 mammals. Phylogenetic analyses using probabilistic methods unambiguously supported the new placental phylogeny by retrieving the monophyly of Glires, Euarchontoglires, Laurasiatheria, and Boreoeutheria. Muroid rodents thus do not represent a basal placental lineage, as was mistakenly reasserted in some recent phylogenomic analyses based on fewer taxa. We expect the OrthoMaM database to be useful for further resolving the phylogenetic tree of placental mammals and for better understanding the evolutionary dynamics of their genomes, i.e., the forces that shaped coding sequences in terms of selective constraints.
Affiliation(s)
- Vincent Ranwez
- Université Montpellier 2, CC064, Place Eugène Bataillon, 34 095 Montpellier Cedex 05, France.
|
526
|
Abstract
Alternative splicing is thought to be one of the major sources for functional diversity in higher eukaryotes. Interestingly, when mapping splicing events onto protein structures, about half of the events affect structured and even highly conserved regions, i.e., are non-trivial on the structure level. This has led to the controversial hypothesis that such splice variants result in nonsense-mediated mRNA decay or non-functional, unstructured proteins, which do not contribute to the functional diversity of an organism. Here we show in a comprehensive study on alternative splicing that proteins appear to be much more tolerant to structural deletions, insertions and replacements than previously thought. We find literature evidence that such non-trivial splicing isoforms exhibit different functional properties compared to their native counterparts and allow for interesting regulatory patterns on the protein network level. We provide examples that splicing events may represent transitions between different folds in the protein sequence–structure space and explain these links by a common genetic mechanism. Taken together, those findings hint at a more prominent role of splicing in protein structure evolution and at a different view of phenotypic plasticity of protein structures.
Affiliation(s)
- Fabian Birzele
- Practical Informatics and Bioinformatics Group, Department of Informatics, Ludwig-Maximilians-University, Amalienstrasse 17, D-80333 Munich, Germany.
|
527
|
Lemay DG, Neville MC, Rudolph MC, Pollard KS, German JB. Gene regulatory networks in lactation: identification of global principles using bioinformatics. BMC SYSTEMS BIOLOGY 2007; 1:56. [PMID: 18039394 PMCID: PMC2225983 DOI: 10.1186/1752-0509-1-56] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.3] [Received: 08/22/2007] [Accepted: 11/27/2007] [Indexed: 11/16/2022]
Abstract
Background: The molecular events underlying mammary development during pregnancy, lactation, and involution are incompletely understood. Results: Mammary gland microarray data, cellular localization data, protein-protein interactions, and literature-mined genes were integrated and analyzed using statistics, principal component analysis, gene ontology analysis, pathway analysis, and network analysis to identify global biological principles that govern molecular events during pregnancy, lactation, and involution. Conclusion: Several key principles were derived: (1) nearly a third of the transcriptome fluctuates to build, run, and disassemble the lactation apparatus; (2) genes encoding the secretory machinery are transcribed prior to lactation; (3) the diversity of the endogenous portion of the milk proteome is derived from fewer than 100 transcripts; (4) while some genes are differentially transcribed near the onset of lactation, the lactation switch is primarily post-transcriptionally mediated; (5) the secretion of materials during lactation occurs not by up-regulation of novel genomic functions, but by widespread transcriptional suppression of functions such as protein degradation and cell-environment communication; (6) the involution switch is primarily transcriptionally mediated; and (7) during early involution, the transcriptional state is partially reverted to the pre-lactation state. A new hypothesis for secretory diminution is suggested – milk production gradually declines because the secretory machinery is not transcriptionally replenished. A comprehensive network of protein interactions during lactation is assembled and new regulatory gene targets are identified. Less than one fifth of the transcriptionally regulated nodes in this lactation network have been previously explored in the context of lactation. Implications for future research in mammary and cancer biology are discussed.
Affiliation(s)
- Danielle G Lemay
- Department of Food Science and Technology, University of California, One Shields Ave, Davis, CA 95616, USA.
|
528
|
Distinguishing protein-coding and noncoding genes in the human genome. Proc Natl Acad Sci U S A 2007; 104:19428-33. [PMID: 18040051 DOI: 10.1073/pnas.0709013104] [Citation(s) in RCA: 370] [Impact Index Per Article: 20.6] [Indexed: 11/18/2022]
Abstract
Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of approximately 24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation: the alternative hypothesis is that most of these ORFs are actually valid human genes that reflect gene innovation in the primate lineage or gene loss in the other lineages. Here, we reject this hypothesis by carefully analyzing the nonconserved ORFs, specifically their properties in other primates. We show that the vast majority of these ORFs are random occurrences. The analysis yields, as a by-product, a major revision of the current human catalogs, cutting the number of protein-coding genes to approximately 20,500. Specifically, it suggests that nonconserved ORFs should be added to the human gene catalog only if there is clear evidence of an encoded protein. It also provides a principled methodology for evaluating future proposed additions to the human gene catalog. Finally, the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.
|
529
|
Jones P, Côté RG, Cho SY, Klie S, Martens L, Quinn AF, Thorneycroft D, Hermjakob H. PRIDE: new developments and new datasets. Nucleic Acids Res 2007; 36:D878-83. [PMID: 18033805 PMCID: PMC2238846 DOI: 10.1093/nar/gkm1021] [Citation(s) in RCA: 108] [Impact Index Per Article: 6.0] [Indexed: 12/19/2022] Open
Abstract
The PRIDE (http://www.ebi.ac.uk/pride) database of protein and peptide identifications was previously described in the NAR Database Special Edition in 2006. Since this publication, the volume of public data in the PRIDE relational database has increased by more than an order of magnitude. Several significant public datasets have been added, including identifications and processed mass spectra generated by the HUPO Brain Proteome Project and the HUPO Liver Proteome Project. The PRIDE software development team has made several significant changes and additions to the user interface and tool set associated with PRIDE. The focus of these changes has been to facilitate the submission process and to improve the mechanisms by which PRIDE can be queried. The PRIDE team has developed a Microsoft Excel workbook that allows the required data to be collated in a series of relatively simple spreadsheets, with automatic generation of PRIDE XML at the end of the process. The ability to query PRIDE has been augmented by the addition of a BioMart interface allowing complex queries to be constructed. Collaboration with groups outside the EBI has been fruitful in extending PRIDE, including an approach to encode iTRAQ quantitative data in PRIDE XML.
Affiliation(s)
- Philip Jones
- EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
530
|
Hornshøj H, Conley LN, Hedegaard J, Sørensen P, Panitz F, Bendixen C. Microarray expression profiles of 20.000 genes across 23 healthy porcine tissues. PLoS One 2007; 2:e1203. [PMID: 18030337 PMCID: PMC2065904 DOI: 10.1371/journal.pone.0001203] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 06/29/2007] [Accepted: 10/26/2007] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Gene expression microarrays have been intensively applied to screen for genes involved in specific biological processes of interest such as diseases or responses to environmental stimuli. For mammalian species, cataloging of the global gene expression profiles in large tissue collections under normal conditions has focused on the human and mouse genomes but is lacking for the pig genome. METHODOLOGY/PRINCIPAL FINDINGS Here we present the results from a large-scale porcine study establishing microarray cDNA expression profiles of approximately 20.000 genes across 23 healthy tissues. As expected, a large portion of the genes show tissue-specific expression in agreement with mappings to gene descriptions, Gene Ontology terms and KEGG pathways. Two-way hierarchical clustering identified expected tissue clusters in accordance with tissue type and a number of cDNA clusters having similar gene expression patterns across tissues. For one of these cDNA clusters, we demonstrate that possible tissue-associated gene function can be inferred for previously uncharacterized genes based on their shared expression patterns with functionally annotated genes. We show that gene expression in common porcine tissues is similar to the expression in homologous tissues of human. CONCLUSIONS/SIGNIFICANCE The results from this study constitute a valuable and publicly available resource of basic gene expression profiles in normal porcine tissues and will contribute to the identification and functional annotation of porcine genes.
Affiliation(s)
- Henrik Hornshøj
- Department of Genetics and Biotechnology, Faculty of Agricultural Sciences, University of Aarhus, Tjele, Denmark
- Lene Nagstrup Conley
- Department of Genetics and Biotechnology, Faculty of Agricultural Sciences, University of Aarhus, Tjele, Denmark
- Jakob Hedegaard
- Department of Genetics and Biotechnology, Faculty of Agricultural Sciences, University of Aarhus, Tjele, Denmark
- Peter Sørensen
- Department of Genetics and Biotechnology, Faculty of Agricultural Sciences, University of Aarhus, Tjele, Denmark
- Frank Panitz
- Department of Genetics and Biotechnology, Faculty of Agricultural Sciences, University of Aarhus, Tjele, Denmark
- Christian Bendixen
- Department of Genetics and Biotechnology, Faculty of Agricultural Sciences, University of Aarhus, Tjele, Denmark
|
531
|
Hedeler C, Wong HM, Cornell MJ, Alam I, Soanes DM, Rattray M, Hubbard SJ, Talbot NJ, Oliver SG, Paton NW. e-Fungi: a data resource for comparative analysis of fungal genomes. BMC Genomics 2007; 8:426. [PMID: 18028535 PMCID: PMC2242804 DOI: 10.1186/1471-2164-8-426] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 05/15/2007] [Accepted: 11/20/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The number of sequenced fungal genomes is ever increasing, with about 200 genomes already fully sequenced or in progress. Only a small percentage of those genomes have been comprehensively studied, for example using techniques from functional genomics. Comparative analysis has proven to be a useful strategy for enhancing our understanding of evolutionary biology and of the less well understood genomes. However, the data required for these analyses tends to be distributed in various heterogeneous data sources, making systematic comparative studies a cumbersome task. Furthermore, comparative analyses benefit from close integration of derived data sets that cluster genes or organisms in a way that eases the expression of requests that clarify points of similarity or difference between species. DESCRIPTION To support systematic comparative analyses of fungal genomes we have developed the e-Fungi database, which integrates a variety of data for more than 30 fungal genomes. Publicly available genome data, functional annotations, and pathway information has been integrated into a single data repository and complemented with results of comparative analyses, such as MCL and OrthoMCL cluster analysis, and predictions of signaling proteins and the sub-cellular localisation of proteins. To access the data, a library of analysis tasks is available through a web interface. The analysis tasks are motivated by recent comparative genomics studies, and aim to support the study of evolutionary biology as well as community efforts for improving the annotation of genomes. Web services for each query are also available, enabling the tasks to be incorporated into workflows. CONCLUSION The e-Fungi database provides fungal biologists with a resource for comparative studies of a large range of fungal genomes. Its analysis library supports the comparative study of genome data, functional annotation, and results of large scale analyses over all the genomes stored in the database. 
The database is accessible at http://www.e-fungi.org.uk, as is the WSDL for the web services.
Affiliation(s)
- Cornelia Hedeler
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK.
|
532
|
Jakubowska J, Hunt E, Chalmers M, McBride M, Dominiczak AF. VisGenome: visualization of single and comparative genome representations. Bioinformatics 2007; 23:2641-2. [PMID: 17965437 DOI: 10.1093/bioinformatics/btm394] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
Abstract
VisGenome visualizes single and comparative representations for the rat, the mouse and the human chromosomes at different levels of detail. The tool offers smooth zooming and panning which is more flexible than seen in other browsers. It presents information available in Ensembl for single chromosomes, as well as homologies (orthologue predictions including ortholog one2one, apparent ortholog one2one, ortholog many2many) for any two chromosomes from different species. The application can query supporting data from Ensembl by invoking a link in a browser.
Affiliation(s)
- Joanna Jakubowska
- Department of Computing Science, University of Glasgow, G12 8QQ, Scotland.
|
533
|
Buza TJ, McCarthy FM, Burgess SC. Experimental-confirmation and functional-annotation of predicted proteins in the chicken genome. BMC Genomics 2007; 8:425. [PMID: 18021451 PMCID: PMC2204016 DOI: 10.1186/1471-2164-8-425] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 08/07/2007] [Accepted: 11/19/2007] [Indexed: 11/11/2022] Open
Abstract
Background The chicken genome was sequenced because of its phylogenetic position as a non-mammalian vertebrate, its use as a biomedical model especially to study embryology and development, its role as a source of human disease organisms and its importance as the major source of animal derived food protein. However, genomic sequence data is, in itself, of limited value; generally it is not equivalent to understanding biological function. The benefit of having a genome sequence is that it provides a basis for functional genomics. However, the sequence data currently available is poorly structurally and functionally annotated and many genes do not have standard nomenclature assigned. Results We analysed eight chicken tissues and improved the chicken genome structural annotation by providing experimental support for the in vivo expression of 7,809 computationally predicted proteins, including 30 chicken proteins that were only electronically predicted or hypothetical translations in human. To improve functional annotation (based on Gene Ontology), we mapped these identified proteins to their human and mouse orthologs and used this orthology to transfer Gene Ontology (GO) functional annotations to the chicken proteins. The 8,213 orthology-based GO annotations that we produced represent an 8% increase in currently available chicken GO annotations. Orthologous chicken products were also assigned standardized nomenclature based on current chicken nomenclature guidelines. Conclusion We demonstrate the utility of high-throughput expression proteomics for rapid experimental structural annotation of a newly sequenced eukaryote genome. These experimentally-supported predicted proteins were further annotated by assigning the proteins with standardized nomenclature and functional annotation. This method is widely applicable to a diverse range of species. 
Moreover, information from one genome can be used to improve the annotation of other genomes and inform gene prediction algorithms.
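The orthology-based transfer of Gene Ontology annotations described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the protein identifiers and the example data are hypothetical, and it assumes only that each chicken protein has a list of human or mouse orthologs whose GO terms are already known.

```python
def transfer_go_terms(orthologs, go_annotations):
    """Assign each query protein the union of GO terms held by its orthologs.

    orthologs:      dict mapping a query protein ID to a list of ortholog IDs
    go_annotations: dict mapping an ortholog ID to a set of GO term IDs
    Both structures are assumptions made for this illustration.
    """
    transferred = {}
    for protein, ortholog_ids in orthologs.items():
        terms = set()
        for ortholog_id in ortholog_ids:
            # Orthologs without any known annotation contribute nothing.
            terms.update(go_annotations.get(ortholog_id, ()))
        if terms:
            transferred[protein] = sorted(terms)
    return transferred


# Hypothetical identifiers, used only to show the call.
example_orthologs = {
    "chicken_A": ["human_A", "mouse_A"],
    "chicken_B": ["human_B"],
}
example_go = {
    "human_A": {"GO:0005524"},
    "mouse_A": {"GO:0005524", "GO:0006468"},
}
example_result = transfer_go_terms(example_orthologs, example_go)
```

In this sketch a protein whose orthologs carry no annotation is simply left out of the result, so the transfer never creates empty annotation records.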
Affiliation(s)
- Teresia J Buza
- Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi State, MS 39762, USA.
|
534
|
Porterfield VM, Piontkivska H, Mintz EM. Identification of novel light-induced genes in the suprachiasmatic nucleus. BMC Neurosci 2007; 8:98. [PMID: 18021443 PMCID: PMC2216081 DOI: 10.1186/1471-2202-8-98] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Received: 08/29/2007] [Accepted: 11/19/2007] [Indexed: 11/16/2022] Open
Abstract
Background The transmission of information about the photic environment to the circadian clock involves a complex array of neurotransmitters, receptors, and second messenger systems. Exposure of an animal to light during the subjective night initiates rapid transcription of a number of immediate-early genes in the suprachiasmatic nucleus of the hypothalamus. Some of these genes have known roles in entraining the circadian clock, while others have unknown functions. Using laser capture microscopy, microarray analysis, and quantitative real-time PCR, we performed a comprehensive screen for changes in gene expression immediately following a 30-minute light pulse in the suprachiasmatic nucleus of mice. Results The results of the microarray screen successfully identified previously known light-induced genes as well as several novel genes that may be important in the circadian clock. Newly identified light-induced genes include early growth response 2, proviral integration site 3, growth-arrest and DNA-damage-inducible 45 beta, and TCDD-inducible poly(ADP-ribose) polymerase. Comparative analysis of promoter sequences revealed the presence of evolutionarily conserved CRE and associated TATA box elements in most of the light-induced genes, while other core clock genes generally lack this combination of promoter elements. Conclusion The photic signalling cascade in the suprachiasmatic nucleus activates an array of immediate-early genes, most of which have unknown functions in the circadian clock. Detected evolutionary conservation of CRE and TATA box elements in promoters of light-induced genes suggests that the functional role of these elements has likely remained the same over evolutionary time across mammalian orders.
|
535
|
Tzika AC, Helaers R, Van de Peer Y, Milinkovitch MC. MANTIS: a phylogenetic framework for multi-species genome comparisons. Bioinformatics 2007; 24:151-7. [PMID: 18025004 DOI: 10.1093/bioinformatics/btm567] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION Practitioners of comparative genomics face huge analytical challenges as whole genome sequences and functional/expression data accumulate. Furthermore, the field would greatly benefit from a better integration of this wealth of data with evolutionary concepts. RESULTS Here, we present MANTIS, a relational database for the analysis of (i) gains and losses of genes on specific branches of the metazoan phylogeny, (ii) reconstructed genome content of ancestral species and (iii) over- or under-representation of functions/processes and tissue specificity of gained, duplicated and lost genes. MANTIS estimates the most likely positions of gene losses on the true phylogeny using a maximum-likelihood function. A user-friendly interface and an extensive query system allow users to investigate questions pertaining to gene identity, phylogenetic mapping and function/expression parameters. AVAILABILITY MANTIS is freely available at http://www.mantisdb.org and constitutes the missing link between multi-species genome comparisons and functional analyses.
Affiliation(s)
- Athanasia C Tzika
- Laboratory of Evolutionary Genetics, Institute for Molecular Biology & Medicine, Université Libre de Bruxelles, Belgium
|
536
|
|
537
|
Abstract
Epigenetic research aims to understand heritable gene regulation that is not directly encoded in the DNA sequence. Epigenetic mechanisms such as DNA methylation and histone modifications modulate the packaging of the DNA in the nucleus and thereby influence gene expression. Patterns of epigenetic information are faithfully propagated over multiple cell divisions, which makes epigenetic regulation a key mechanism for cellular differentiation and cell fate decisions. In addition, incomplete erasure of epigenetic information can lead to complex patterns of non-Mendelian inheritance. Stochastic and environment-induced epigenetic defects are known to play a major role in cancer and ageing, and they may also contribute to mental disorders and autoimmune diseases. Recent technical advances such as ChIP-on-chip and ChIP-seq have started to convert epigenetic research into a high-throughput endeavor, to which bioinformatics is expected to make significant contributions. Here, we review pioneering computational studies that have contributed to epigenetic research. In addition, we give a brief introduction to epigenetics, targeted at bioinformaticians who are new to the field, and we outline future challenges in computational epigenetics.
Affiliation(s)
- Christoph Bock
- Max-Planck-Institut für Informatik, Saarbrücken, Germany.
|
538
|
Wilming LG, Gilbert JGR, Howe K, Trevanion S, Hubbard T, Harrow JL. The vertebrate genome annotation (Vega) database. Nucleic Acids Res 2007; 36:D753-60. [PMID: 18003653 PMCID: PMC2238886 DOI: 10.1093/nar/gkm987] [Citation(s) in RCA: 183] [Impact Index Per Article: 10.2] [Indexed: 11/13/2022] Open
Abstract
The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) was first made public in 2004 and has been designed to view manual annotation of human, mouse and zebrafish genomic sequences produced at the Wellcome Trust Sanger Institute. Since its initial release, the number of human annotated loci has more than doubled to close to 33 000, and the database now contains comprehensive annotation on 20 of the 24 human chromosomes, four whole mouse chromosomes and around 40% of the zebrafish Danio rerio genome. In addition, we offer manual annotation of a number of haplotype regions in mouse and human and regions of comparative interest in pig and dog that are unique to Vega.
Affiliation(s)
- L G Wilming
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
|
539
|
Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Eyre T, Fitzgerald S, Fernandez-Banet J, Gräf S, Haider S, Hammond M, Holland R, Howe KL, Howe K, Johnson N, Jenkinson A, Kähäri A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Megy K, Meidl P, Overduin B, Parker A, Pritchard B, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Slater G, Smedley D, Spudich G, Trevanion S, Vilella AJ, Vogel J, White S, Wood M, Birney E, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Herrero J, Hubbard TJP, Kasprzyk A, Proctor G, Smith J, Ureta-Vidal A, Searle S. Ensembl 2008. Nucleic Acids Res 2007; 36:D707-14. [PMID: 18000006 PMCID: PMC2238821 DOI: 10.1093/nar/gkm988] [Citation(s) in RCA: 371] [Impact Index Per Article: 20.6] [Indexed: 12/21/2022] Open
Abstract
The Ensembl project (http://www.ensembl.org) is a comprehensive genome information system featuring an integrated set of genome annotation, databases and other information for chordate and selected model organism and disease vector genomes. As of release 47 (October 2007), Ensembl fully supports 35 species, with preliminary support for six additional species. New species in the past year include platypus and horse. Major additions and improvements to Ensembl since our previous report include extensive support for functional genomics data in the form of a specialized functional genomics database, genome-wide maps of protein–DNA interactions and the Ensembl regulatory build; support for customization of the Ensembl web interface through the addition of user accounts and user groups; and increased support for genome resequencing. We have also introduced new comparative genomics-based data mining options and report on the continued development of our software infrastructure.
Affiliation(s)
- P Flicek
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
540
|
Orchard S, Salwinski L, Kerrien S, Montecchi-Palazzi L, Oesterheld M, Stümpflen V, Ceol A, Chatr-aryamontri A, Armstrong J, Woollard P, Salama JJ, Moore S, Wojcik J, Bader GD, Vidal M, Cusick ME, Gerstein M, Gavin AC, Superti-Furga G, Greenblatt J, Bader J, Uetz P, Tyers M, Legrain P, Fields S, Mulder N, Gilson M, Niepmann M, Burgoon L, De Las Rivas J, Prieto C, Perreau VM, Hogue C, Mewes HW, Apweiler R, Xenarios I, Eisenberg D, Cesareni G, Hermjakob H. The minimum information required for reporting a molecular interaction experiment (MIMIx). Nat Biotechnol 2007; 25:894-8. [PMID: 17687370 DOI: 10.1038/nbt1324] [Citation(s) in RCA: 223] [Impact Index Per Article: 12.4] [Indexed: 11/08/2022]
Abstract
A wealth of molecular interaction data is available in the literature, ranging from large-scale datasets to a single interaction confirmed by several different techniques. These data are all too often reported either as free text or in tables of variable format, and are often missing key pieces of information essential for a full understanding of the experiment. Here we propose MIMIx, the minimum information required for reporting a molecular interaction experiment. Adherence to these reporting guidelines will result in publications of increased clarity and usefulness to the scientific community and will support the rapid, systematic capture of molecular interaction data in public databases, thereby improving access to valuable interaction data.
Affiliation(s)
- Sandra Orchard
- European Molecular Biology Laboratory (EMBL) - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.
|
541
|
Rogers A, Antoshechkin I, Bieri T, Blasiar D, Bastiani C, Canaran P, Chan J, Chen WJ, Davis P, Fernandes J, Fiedler TJ, Han M, Harris TW, Kishore R, Lee R, McKay S, Müller HM, Nakamura C, Ozersky P, Petcherski A, Schindelman G, Schwarz EM, Spooner W, Tuli MA, Van Auken K, Wang D, Wang X, Williams G, Yook K, Durbin R, Stein LD, Spieth J, Sternberg PW. WormBase 2007. Nucleic Acids Res 2007; 36:D612-7. [PMID: 17991679 PMCID: PMC2238927 DOI: 10.1093/nar/gkm975] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.6] [Indexed: 12/01/2022] Open
Abstract
WormBase (www.wormbase.org) is the major publicly available database of information about Caenorhabditis elegans, an important system for basic biological and biomedical research. Derived from the initial ACeDB database of C. elegans genetic and sequence information, WormBase now includes the genomic, anatomical and functional information about C. elegans, other Caenorhabditis species and other nematodes. As such, it is a crucial resource not only for C. elegans biologists but also for the larger biomedical and bioinformatics communities. Coverage of core areas of C. elegans biology will allow the biomedical community to make full use of the results of intensive molecular genetic analysis and functional genomic studies of this organism. Improved search and display tools, wider cross-species comparisons and extended ontologies are some of the features that will help scientists extend their research and take advantage of other nematode species genome sequences.
Affiliation(s)
- Anthony Rogers
- Sanger Institute, Wellcome Trust Genome Campus Hinxton, Cambridgeshire CB10 1SA, UK
|
542
|
Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res 2007; 36:D154-8. [PMID: 17991681 PMCID: PMC2238936 DOI: 10.1093/nar/gkm952] [Citation(s) in RCA: 3193] [Impact Index Per Article: 177.4] [Indexed: 12/14/2022] Open
Abstract
miRBase is the central online repository for microRNA (miRNA) nomenclature, sequence data, annotation and target prediction. The current release (10.0) contains 5071 miRNA loci from 58 species, expressing 5922 distinct mature miRNA sequences: a growth of over 2000 sequences in the past 2 years. miRBase provides a range of data to facilitate studies of miRNA genomics: all miRNAs are mapped to their genomic coordinates. Clusters of miRNA sequences in the genome are highlighted, and can be defined and retrieved with any inter-miRNA distance. The overlap of miRNA sequences with annotated transcripts, both protein- and non-coding, is described. Finally, graphical views of the locations of a wide range of genomic features in model organisms allow for the first time the prediction of the likely boundaries of many miRNA primary transcripts. miRBase is available at http://microrna.sanger.ac.uk/.
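The idea of defining miRNA clusters by an inter-miRNA distance can be sketched in a few lines of Python. This is an illustration only, not the miRBase implementation: the function name, the tuple layout and the example loci are hypothetical, and strand is ignored for simplicity.

```python
def cluster_loci(loci, max_gap):
    """Group loci on the same chromosome whose gap is at most max_gap bases.

    loci:    iterable of (name, chromosome, start, end) tuples (assumed layout)
    max_gap: largest allowed distance between the end of the current cluster
             and the start of the next locus
    Returns a list of clusters, each a list of locus names in genomic order.
    """
    clusters = []
    current = []
    current_chrom = None
    current_end = None
    for name, chrom, start, end in sorted(loci, key=lambda l: (l[1], l[2])):
        if current and chrom == current_chrom and start - current_end <= max_gap:
            # Close enough to the running cluster: extend it.
            current.append(name)
            current_end = max(current_end, end)
        else:
            # New chromosome or gap too large: start a new cluster.
            if current:
                clusters.append(current)
            current = [name]
            current_chrom = chrom
            current_end = end
    if current:
        clusters.append(current)
    return clusters


# Hypothetical coordinates, used only to show the effect of the threshold.
example_loci = [
    ("mir-a", "chr1", 100, 180),
    ("mir-b", "chr1", 400, 480),
    ("mir-c", "chr1", 20000, 20080),
    ("mir-d", "chr2", 150, 230),
]
tight = cluster_loci(example_loci, 10000)
loose = cluster_loci(example_loci, 50000)
```

With the 10 000-base threshold, mir-c falls outside the cluster formed by mir-a and mir-b; raising the threshold to 50 000 merges all three chr1 loci, while mir-d always stays separate because it lies on another chromosome.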
Affiliation(s)
- Sam Griffiths-Jones
- Faculty of Life Sciences, University of Manchester, Michael Smith Building, Oxford Road, Manchester, UK.
|
543
|
Engström PG, Ho Sui SJ, Drivenes O, Becker TS, Lenhard B. Genomic regulatory blocks underlie extensive microsynteny conservation in insects. Genome Res 2007; 17:1898-908. [PMID: 17989259 DOI: 10.1101/gr.6669607] [Citation(s) in RCA: 149] [Impact Index Per Article: 8.3] [Indexed: 02/07/2023]
Abstract
Insect genomes contain larger blocks of conserved gene order (microsynteny) than would be expected under a random breakage model of chromosome evolution. We present evidence that microsynteny has been retained to keep large arrays of highly conserved noncoding elements (HCNEs) intact. These arrays span key developmental regulatory genes, forming genomic regulatory blocks (GRBs). We recently described GRBs in vertebrates, where most HCNEs function as enhancers and HCNE arrays specify complex expression programs of their target genes. Here we present a comparison of five Drosophila genomes showing that HCNE density peaks centrally in large synteny blocks containing multiple genes. Besides developmental regulators that are likely targets of HCNE enhancers, HCNE arrays often span unrelated neighboring genes. We describe differences in core promoters between the target genes and the unrelated genes that offer an explanation for the differences in their responsiveness to enhancers. We show examples of a striking correspondence between boundaries of synteny blocks, HCNE arrays, and Polycomb binding regions, confirming that the synteny blocks correspond to regulatory domains. Although few noncoding elements are highly conserved between Drosophila and the malaria mosquito Anopheles gambiae, we find that A. gambiae regions orthologous to Drosophila GRBs contain an equivalent distribution of noncoding elements highly conserved in the yellow fever mosquito Aëdes aegypti and coincide with regions of ancient microsynteny between Drosophila and mosquitoes. The structural and functional equivalence between insect and vertebrate GRBs marks them as an ancient feature of metazoan genomes and as a key to future studies of development and gene regulation.
Affiliation(s)
- Pär G Engström
- Computational Biology Unit, Bergen Center for Computational Science, University of Bergen, Bergen 5008, Norway
|
544
|
SERpredict: detection of tissue- or tumor-specific isoforms generated through exonization of transposable elements. BMC Genet 2007; 8:78. [PMID: 17986331 PMCID: PMC2194731 DOI: 10.1186/1471-2156-8-78] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Received: 05/24/2007] [Accepted: 11/06/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Transposed elements (TEs) are known to affect transcriptomes, because either new exons are generated from intronic transposed elements (this is called exonization), or the element inserts into the exon, leading to a new transcript. Several examples in the literature show that isoforms generated by an exonization are specific to a certain tissue (for example the heart muscle) or cause a disease. Thus, exonizations can have negative effects on the transcriptome of an organism. RESULTS As we aimed at detecting other tissue- or tumor-specific isoforms in human and mouse genomes which were generated through exonization of a transposed element, we designed the automated analysis pipeline SERpredict (SER = Specific Exonized Retroelement) making use of Bayesian Statistics. With this pipeline, we found several genes in which a transposed element formed a tissue- or tumor-specific isoform. CONCLUSION Our results show that SERpredict produces relevant results, demonstrating the importance of transposed elements in shaping both the human and the mouse transcriptomes. The effect of transposed elements on the human transcriptome is several times higher than the effect on the mouse transcriptome, due to the contribution of the primate-specific Alu elements.
|
545
|
Ding G, Sun Y, Li H, Wang Z, Fan H, Wang C, Yang D, Li Y. EPGD: a comprehensive web resource for integrating and displaying eukaryotic paralog/paralogon information. Nucleic Acids Res 2007; 36:D255-62. [PMID: 17984073 PMCID: PMC2238967 DOI: 10.1093/nar/gkm924] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open
Abstract
Gene duplication is common in all three domains of life, especially in eukaryotic genomes. The duplicates provide new material for the action of evolutionary forces such as selection or genetic drift. Here we describe a sophisticated procedure to extract duplicated genes (paralogs) from 26 available eukaryotic genomes, to pre-calculate several evolutionary indexes (evolutionary rate, synonymous distance/clock, transition redundant exchange clock, etc.) based on the paralog family, and to identify block or segmental duplications (paralogons). We also constructed an internet-accessible Eukaryotic Paralog Group Database (EPGD; http://epgd.biosino.org/EPGD/). The database is gene-centered and organized by paralog family. It focuses on paralogs and evolutionary duplication events. The paralog families and paralogons can be searched by text or sequence, and are downloadable from the website as plain text files. The database will be very useful for both experimentalists and bioinformaticians interested in the study of duplication events or paralog families.
Affiliation(s)
- Guohui Ding
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, P. R. China
|
546
|
Sprenger J, Lynn Fink J, Karunaratne S, Hanson K, Hamilton NA, Teasdale RD. LOCATE: a mammalian protein subcellular localization database. Nucleic Acids Res 2007; 36:D230-3. [PMID: 17986452 PMCID: PMC2238969 DOI: 10.1093/nar/gkm950] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.6] [Indexed: 11/24/2022] Open
Abstract
LOCATE is a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of mouse and human proteins. Over the past 2 years, the data in LOCATE have grown substantially. The database now contains high-quality localization data for 20% of the mouse proteome and general localization annotation for nearly 36% of the mouse proteome. The proteome annotated in LOCATE is from the RIKEN FANTOM Consortium Isoform Protein Sequence sets, which contain 58 128 mouse and 64 637 human protein isoforms. Other additions include computational subcellular localization predictions, automated computational classification of experimental localization image data, prediction of protein sorting signals and third-party submission of literature data. Collectively, this database provides a localization proteome for individual subcellular compartments that will underpin future systematic investigations of these regions. It is available at http://locate.imb.uq.edu.au/
Affiliation(s)
- Josefine Sprenger
- ARC Centre of Excellence in Bioinformatics, Institute for Molecular Bioscience, The University of Queensland, St Lucia, Queensland 4072, Australia
|
547
|
Halees AS, El-Badrawi R, Khabar KSA. ARED Organism: expansion of ARED reveals AU-rich element cluster variations between human and mouse. Nucleic Acids Res 2007; 36:D137-40. [PMID: 17984078 PMCID: PMC2238997 DOI: 10.1093/nar/gkm959] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.2] [Indexed: 01/01/2023] Open
Abstract
ARED Organism represents the expansion of the adenylate uridylate (AU)-rich element (ARE)-containing human mRNA database into the transcriptomes of mouse and rat. As a result, we performed quantitative assessment of ARE conservation in human, mouse and rat transcripts. We found that a significant proportion (∼25%) of human genes differ in their ARE patterns from mouse and rat transcripts. ARED-Integrated, another updated and expanded version of ARED, is a compilation of ARED versions 1.0 to 3.0 and updated version 4.0 that is devoted to human mRNAs. Thus, ARED-Integrated and ARED-Organism databases, both publicly available at http://brp.kfshrc.edu.sa/ARED, offer scientists a comprehensive view of AREs in the human transcriptome and the ability to study the comparative genomics of AREs in model organisms. This ultimately will help in inferring the biological consequences of ARE variation in these key animal models as opposed to humans, particularly in relation to the role of RNA stability in disease.
Affiliation(s)
- Anason S Halees
- The Biomolecular Research Program, King Faisal Specialist Hospital and Research Center, Riyadh 11211, Saudi Arabia
|
548
|
Bruford EA, Lush MJ, Wright MW, Sneddon TP, Povey S, Birney E. The HGNC Database in 2008: a resource for the human genome. Nucleic Acids Res 2007; 36:D445-8. [PMID: 17984084 PMCID: PMC2238870 DOI: 10.1093/nar/gkm881] [Citation(s) in RCA: 166] [Impact Index Per Article: 9.2] [Indexed: 11/13/2022] Open
Abstract
The HUGO Gene Nomenclature Committee (HGNC) aims to assign a unique and ideally meaningful name and symbol to every human gene. The HGNC database currently comprises over 24 000 public records containing approved human gene nomenclature and associated gene information. Following our recent relocation to the European Bioinformatics Institute our homepage can now be found at http://www.genenames.org, with direct links to the searchable HGNC database and other related database resources, such as the HCOP orthology search tool and manually curated gene family webpages.
Affiliation(s)
- Elspeth A Bruford
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
|
549
|
Liang C, Jaiswal P, Hebbard C, Avraham S, Buckler ES, Casstevens T, Hurwitz B, McCouch S, Ni J, Pujar A, Ravenscroft D, Ren L, Spooner W, Tecle I, Thomason J, Tung CW, Wei X, Yap I, Youens-Clark K, Ware D, Stein L. Gramene: a growing plant comparative genomics resource. Nucleic Acids Res 2007; 36:D947-53. [PMID: 17984077 PMCID: PMC2238951 DOI: 10.1093/nar/gkm968] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.4] [Indexed: 11/13/2022] Open
Abstract
Gramene (www.gramene.org) is a curated resource for genetic, genomic and comparative genomics data for the major crop species, including rice, maize, wheat and many other plant (mainly grass) species. Gramene is an open-source project. All data and software are freely downloadable through the ftp site (ftp.gramene.org/pub/gramene) and available for use without restriction. Gramene's core data types include genome assembly and annotations, other DNA/mRNA sequences, genetic and physical maps/markers, genes, quantitative trait loci (QTLs), proteins, ontologies, literature and comparative mappings. Since our last NAR publication 2 years ago, we have updated these data types to include new datasets and new connections among them. Completely new features include rice pathways for functional annotation of rice genes; genetic diversity data from rice, maize and wheat to show genetic variations among different germplasms; large-scale genome comparisons among Oryza sativa and its wild relatives for evolutionary studies; and the creation of orthologous gene sets and phylogenetic trees among rice, Arabidopsis thaliana, maize, poplar and several animal species (for reference purposes). We have significantly improved the web interface in order to provide a more user-friendly browsing experience, including a dropdown navigation menu system, a unified web page for markers, genes, QTLs and proteins, and enhanced quick search functions.
Affiliation(s)
- Chengzhi Liang
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
|
550
|
Courcelle E, Beausse Y, Letort S, Stahl O, Fremez R, Ngom-Bru C, Gouzy J, Faraut T. Narcisse: a mirror view of conserved syntenies. Nucleic Acids Res 2007; 36:D485-90. [PMID: 17981845 PMCID: PMC2238891 DOI: 10.1093/nar/gkm805] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Indexed: 01/21/2023] Open
Abstract
New methods and tools are needed to exploit the unprecedented source of information made available by the completed and ongoing whole genome sequencing projects. The Narcisse database is dedicated to the study of genome conservation, from sequence similarities to conserved chromosomal segments or conserved syntenies, for a large number of completely sequenced animal, plant and bacterial genomes. The query interface, a comparative genome browser, enables navigation between genome dotplots, comparative maps and sequence alignments. The Narcisse database can be accessed at http://narcisse.toulouse.inra.fr.
Affiliation(s)
- Emmanuel Courcelle
- Laboratoire Interactions Plantes Micro-organismes UMR 441/2594, INRA/CNRS and Laboratoire de Génétique Cellulaire UMR 444 INRA/ENVT, INRA, Centre de Recherches de Toulouse, 31326 Castanet Tolosan, France
|