201
Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S. The generic genome browser: a building block for a model organism system database. Genome Res 2002; 12:1599-610. [PMID: 12368253 PMCID: PMC187535 DOI: 10.1101/gr.403602] [Citation(s) in RCA: 953] [Impact Index Per Article: 43.3] [Received: 05/07/2002] [Accepted: 08/09/2002] [Indexed: 11/24/2022]
Abstract
The Generic Model Organism System Database Project (GMOD) seeks to develop reusable software components for model organism system databases. In this paper we describe the Generic Genome Browser (GBrowse), a Web-based application for displaying genomic annotations and other features. For the end user, features of the browser include the ability to scroll and zoom through arbitrary regions of a genome, to enter a region of the genome by searching for a landmark or performing a full text search of all features, and the ability to enable and disable tracks and change their relative order and appearance. The user can upload private annotations to view them in the context of the public ones, and publish those annotations to the community. For the data provider, features of the browser software include reliance on readily available open source components, simple installation, flexible configuration, and easy integration with other components of a model organism system Web site. GBrowse is freely available under an open source license. The software, its documentation, and support are available at http://www.gmod.org.
Affiliation(s)
- Lincoln D Stein
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11790, USA.
202
Roy PJ, Stuart JM, Lund J, Kim SK. Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans. Nature 2002; 418:975-9. [PMID: 12214599 DOI: 10.1038/nature01012] [Citation(s) in RCA: 323] [Impact Index Per Article: 14.7] [Indexed: 11/09/2022]
Abstract
Chromosomes are divided into domains of open chromatin, where genes have the potential to be expressed, and domains of closed chromatin, where genes are not expressed. Classic examples of open chromatin domains include 'puffs' on polytene chromosomes in Drosophila and extended loops from lampbrush chromosomes. If multiple genes were typically expressed together from a single open chromatin domain, the position of co-expressed genes along the chromosomes would appear clustered. To investigate whether co-expressed genes are clustered, we examined the chromosomal positions of the genes expressed in the muscle of Caenorhabditis elegans at the first larval stage. Here we show that co-expressed genes in C. elegans are clustered in groups of 2-5 along the chromosomes, suggesting that expression from a chromatin domain can extend over several genes. These observations reveal a higher-order organization of the structure of the genome, in which the order of the genes along the chromosome is correlated with their expression in specific tissues.
Affiliation(s)
- Peter J Roy
- Department of Developmental Biology, Stanford University Medical Center, California 94305, USA
203
Swan KA, Curtis DE, McKusick KB, Voinov AV, Mapa FA, Cancilla MR. High-throughput gene mapping in Caenorhabditis elegans. Genome Res 2002; 12:1100-5. [PMID: 12097347 PMCID: PMC186621 DOI: 10.1101/gr.208902] [Citation(s) in RCA: 425] [Impact Index Per Article: 19.3] [Indexed: 10/16/2022]
Abstract
Positional cloning of mutations in model genetic systems is a powerful method for the identification of targets of medical and agricultural importance. To facilitate the high-throughput mapping of mutations in Caenorhabditis elegans, we have identified a further 9602 putative new single nucleotide polymorphisms (SNPs) between two C. elegans strains, Bristol N2 and the Hawaiian mapping strain CB4856, by sequencing inserts from a CB4856 genomic DNA library and using an informatics pipeline to compare sequences with the canonical N2 genomic sequence. When combined with data from other laboratories, our marker set of 17,189 SNPs provides even coverage of the complete worm genome. To date, we have confirmed >1099 evenly spaced SNPs (one every 91 +/- 56 kb) across the six chromosomes and validated the utility of our SNP marker set and new fluorescence polarization-based genotyping methods for systematic and high-throughput identification of genes in C. elegans by cloning several proprietary genes. We illustrate our approach by recombination mapping and confirmation of the mutation in the cloned gene, dpy-18.
Affiliation(s)
- Kathryn A Swan
- Exelixis, Inc., South San Francisco, California 94083-0511, USA
204
Abstract
Lactational strategies and associated development of the young have been studied in a diverse range of species, and comparative analysis allows common trends and differences to be revealed. The whey fraction contains a vast number of proteins, many of which have not been assigned a function. However, it is expected that an understanding of the comparative biology of these proteins may provide some promise in assigning a function to the major whey proteins. Whey acidic protein is a major component of the whey fraction that has been studied across a range of species, revealing conservation of gene structure, whereas regulation and temporal expression patterns vary. This review focuses primarily on comparative analysis of whey acidic protein, highlighting gene structure, developmental and hormonal regulation, and potential functional roles for this protein. In addition, the contrasting regulation and secretion profiles of several other major whey proteins are discussed.
Affiliation(s)
- Kaylene J Simpson
- Department of Biochemistry and Molecular Biology, The University of Melbourne, Parkville, Victoria, Australia.
205
Blumenthal T, Evans D, Link CD, Guffanti A, Lawson D, Thierry-Mieg J, Thierry-Mieg D, Chiu WL, Duke K, Kiraly M, Kim SK. A global analysis of Caenorhabditis elegans operons. Nature 2002; 417:851-4. [PMID: 12075352 DOI: 10.1038/nature00831] [Citation(s) in RCA: 265] [Impact Index Per Article: 12.0] [Indexed: 11/08/2022]
Abstract
The nematode worm Caenorhabditis elegans and its relatives are unique among animals in having operons. Operons are regulated multigene transcription units, in which polycistronic pre-messenger RNA (pre-mRNA coding for multiple peptides) is processed to monocistronic mRNAs. This occurs by 3' end formation and trans-splicing using the specialized SL2 small nuclear ribonucleoprotein particle for downstream mRNAs. Previously, the correlation between downstream location in an operon and SL2 trans-splicing has been strong, but anecdotal. Although only 28 operons have been reported, the complete sequence of the C. elegans genome reveals numerous gene clusters. To determine how many of these clusters represent operons, we probed full-genome microarrays for SL2-containing mRNAs. We found significant enrichment for about 1,200 genes, including most of a group of several hundred genes represented by complementary DNAs that contain SL2 sequence. Analysis of their genomic arrangements indicates that >90% are downstream genes, falling in 790 distinct operons. Our evidence indicates that the genome contains at least 1,000 operons, 2-8 genes long, that contain about 15% of all C. elegans genes. Numerous examples of co-transcription of genes encoding functionally related proteins are evident. Inspection of the operon list should reveal previously unknown functional relationships.
Affiliation(s)
- Thomas Blumenthal
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Box B121, 4200 E. 9th Avenue, Denver, Colorado 80262, USA.
206
Coghlan A, Wolfe KH. Fourfold faster rate of genome rearrangement in nematodes than in Drosophila. Genome Res 2002; 12:857-67. [PMID: 12045140 PMCID: PMC1383740 DOI: 10.1101/gr.172702] [Citation(s) in RCA: 149] [Impact Index Per Article: 6.8] [Indexed: 11/24/2022]
Abstract
We compared the genome of the nematode Caenorhabditis elegans to 13% of that of Caenorhabditis briggsae, identifying 252 conserved segments along their chromosomes. We detected 517 chromosomal rearrangements, with the ratio of translocations to inversions to transpositions being approximately 1:1:2. We estimate that the species diverged 50-120 million years ago, and that since then there have been 4030 rearrangements between their whole genomes. Our estimate of the rearrangement rate, 0.4-1.0 chromosomal breakages/Mb per Myr, is at least four times that of Drosophila, which was previously reported to be the fastest rate among eukaryotes. The breakpoints of translocations are strongly associated with dispersed repeats and gene family members in the C. elegans genome.
Affiliation(s)
- Avril Coghlan
- Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Dublin 2, Ireland
207
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Res 2002; 12:996-1006. [PMID: 12045153 PMCID: PMC186604 DOI: 10.1101/gr.229102] [Citation(s) in RCA: 6744] [Impact Index Per Article: 306.5] [Indexed: 02/06/2023]
Abstract
As vertebrate genome sequences near completion and research refocuses to their analysis, the issue of effective genome annotation display becomes critical. A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu. This browser displays assembly contigs and gaps, mRNA and expressed sequence tag alignments, multiple gene predictions, cross-species homologies, single nucleotide polymorphisms, sequence-tagged sites, radiation hybrid data, transposon repeats, and more as a stack of coregistered tracks. Text and sequence-based searches provide quick and precise access to any region of specific interest. Secondary links from individual features lead to sequence details and supplementary off-site databases. One-half of the annotation tracks are computed at the University of California, Santa Cruz from publicly available sequence data; collaborators worldwide provide the rest. Users can stably add their own custom tracks to the browser for educational or research purposes. The conceptual and technical framework of the browser, its underlying MYSQL database, and overall use are described. The web site currently serves over 50,000 pages per day to over 3000 different users.
Affiliation(s)
- W James Kent
- Department of Molecular, Cellular, and Developmental Biology, University of California, Santa Cruz, CA 95064, USA.
208
Noureddine MA, Donaldson TD, Thacker SA, Duronio RJ. Drosophila Roc1a encodes a RING-H2 protein with a unique function in processing the Hh signal transducer Ci by the SCF E3 ubiquitin ligase. Dev Cell 2002; 2:757-70. [PMID: 12062088 DOI: 10.1016/s1534-5807(02)00164-8] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.3] [Indexed: 12/01/2022]
Abstract
Substrate specificity of SCF E3 ubiquitin ligases is thought to be determined by the F box protein subunit. Another component of SCF complexes is provided by members of the Roc1/Rbx1/Hrt1 gene family, which encode RING-H2 proteins. Drosophila contains three members of this gene family. We show that Roc1a mutant cells fail to proliferate. Further, while the F box protein Slimb is required for Cubitus interruptus (Ci) and Armadillo/beta-catenin (Arm) proteolysis, Roc1a mutant cells hyperaccumulate Ci but not Arm. This suggests that Slimb and Roc1a function in the same SCF complex to target Ci but that a different RING-H2 protein acts with Slimb to target Arm. Consequently, the identity of the Roc subunit may contribute to the selection of substrates by metazoan SCF complexes.
Affiliation(s)
- Maher A Noureddine
- Department of Biology, University of North Carolina, Chapel Hill 27599, USA
209
Abstract
The genomes of over 60 organisms from all three kingdoms of life are now entirely sequenced. In many respects, the inventory of proteins used in different kingdoms appears surprisingly similar. However, eukaryotes differ from other kingdoms in that they use many long proteins, and have more proteins with coiled-coil helices and with regions abundant in regular secondary structure. Particular structural domains are used in many pathways. Nevertheless, one domain tends to occur only once in one particular pathway. Many proteins do not have close homologues in different species (orphans) and there could even be folds that are specific to one species. This view implies that protein fold space is discrete. An alternative model suggests that structure space is continuous and that modern proteins evolved by aggregating fragments of ancient proteins. Either way, after having harvested proteomes by applying standard tools, the challenge now seems to be to develop better methods for comparative proteomics.
Affiliation(s)
- Burkhard Rost
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street, BB217, New York, NY 10032, USA.
211
Huang X, Cheng HJ, Tessier-Lavigne M, Jin Y. MAX-1, a novel PH/MyTH4/FERM domain cytoplasmic protein implicated in netrin-mediated axon repulsion. Neuron 2002; 34:563-76. [PMID: 12062040 DOI: 10.1016/s0896-6273(02)00672-4] [Citation(s) in RCA: 86] [Impact Index Per Article: 3.9] [Indexed: 01/27/2023]
Abstract
The netrin UNC-6 repels motor axons by activating the UNC-5 receptor alone or in combination with the UNC-40/DCC receptor. In a genetic screen for C. elegans mutants exhibiting partial defects in motor axon projections, we isolated the max-1 gene (required for motor neuron axon guidance). max-1 loss-of-function mutations cause fully penetrant but variable axon guidance defects. Mutations in unc-5 and unc-6, but not in unc-40, dominantly enhance the mutant phenotypes of max-1, whereas overexpression of unc-5 or unc-6, but not of unc-40, bypasses the requirement for max-1. MAX-1 proteins contain PH, MyTH4, and FERM domains and appear to be localized to neuronal processes. Human MAX-1 and UNC5H2 colocalize in discrete subcellular regions of transfected cells. Our results suggest a possible role for MAX-1 in netrin-induced axon repulsion by modulating the UNC-5 receptor signaling pathway.
Affiliation(s)
- Xun Huang
- Department of Molecular, Cellular, and Developmental Biology, Santa Cruz, CA 95064, USA
212
Graustein A, Gaspar JM, Walters JR, Palopoli MF. Levels of DNA polymorphism vary with mating system in the nematode genus Caenorhabditis. Genetics 2002; 161:99-107. [PMID: 12019226 PMCID: PMC1462083 DOI: 10.1093/genetics/161.1.99] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.1] [Indexed: 11/13/2022] Open
Abstract
Self-fertilizing species often harbor less genetic variation than cross-fertilizing species, and at least four different models have been proposed to explain this trend. To investigate further the relationship between mating system and genetic variation, levels of DNA sequence polymorphism were compared among three closely related species in the genus Caenorhabditis: two self-fertilizing species, Caenorhabditis elegans and C. briggsae, and one cross-fertilizing species, C. remanei. As expected, estimates of silent site nucleotide diversity were lower in the two self-fertilizing species. For the mitochondrial genome, diversity in the selfing species averaged 42% of diversity in C. remanei. Interestingly, the reduction in genetic variation was much greater for the nuclear than for the mitochondrial genome. For two nuclear genes, diversity in the selfing species averaged 6 and 13% of diversity in C. remanei. We argue that either population bottlenecks or the repeated action of natural selection, coupled with high levels of selfing, are likely to explain the observed reductions in species-wide genetic diversity.
Affiliation(s)
- Andrew Graustein
- Department of Biology, Bowdoin College, Brunswick, Maine 04011, USA
213
Abstract
It has been hypothesized that evolutionary changes will be more frequent in later ontogeny than early ontogeny because of developmental constraint. To test this hypothesis, a genomewide examination of molecular evolution through ontogeny was carried out using comparative genomic data in Caenorhabditis elegans and Caenorhabditis briggsae. We found that the mean rate of amino acid replacement is not significantly different between genes expressed during and after embryogenesis. However, synonymous substitution rates differed significantly between these two classes. A genomewide survey of correlation between codon bias and expression level found codon bias to be significantly correlated with mRNA expression (r_s = -0.30 and P < 10^-131) but does not alone explain differences in dS between classes. Surprisingly, it was found that genes expressed after embryogenesis have a significantly greater number of duplicates in both the C. elegans and C. briggsae genomes (P < 10^-20 and P < 10^-13) when compared with early-expressed and nonmodulated genes. A similarity in the distribution of duplicates of nonmodulated and early-expressed genes, as well as a disproportionately higher number of early pseudogenes, lend support to the hypothesis that this difference in duplicate number is caused by selection against gene duplicates of early-expressed genes, reflecting developmental constraint. Developmental constraint at the level of gene duplication may have important implications for macroevolutionary change.
Affiliation(s)
- Cristian I Castillo-Davis
- Department of Organismic and Evolutionary Biology, Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
214
Abstract
The aging genes/interventions database (AGEID) is a database of experimental results related to aging. AGEID is available as part of the science of aging knowledge environment on the World Wide Web at http://sageke.sciencemag.org/cgi/genesdb. The goal of AGEID is to catalog, in one location, every published experiment where life span has been measured in any organism. AGEID also includes information on genes that influence the incidence of age-associated disorders such as Alzheimer's disease and Parkinson's disease. AGEID gene/intervention reports are formatted pages containing the organism and strain background in which the particular experiment was performed, the type of genetic or environmental perturbation, the effect on life span, a description of the gene function and its role in longevity, protein homologs, and references. The use of this database by researchers who study aging should facilitate easy comparison of the genes and interventions that affect life span in different organisms.
215
Abstract
The review begins by providing a brief typology of biological databases on the Internet, illustrated by examples of the most influential resources of each kind. We then take an insider look at one typical on-line genomic resource -- the yeast genome database hosted at the Munich Information Center for Protein Sequences (MIPS) -- and explain how and why it has evolved from a basic sequence repository to a multidomain knowledge base. The role of community efforts in curating and annotating genome data is discussed. The crucial role of data integration and interoperability in developing next-generation genomic facilities is underscored.
Affiliation(s)
- Dmitrij Frishman
- Institute for Bioinformatics, GSF - National Research Center for Environment and Health, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany.
216
Abstract
euGenes is a genome information system and database that provides a common summary of eukaryote genes and genomes, at http://iubio.bio.indiana.edu/eugenes/. Seven popular genomes are included: human, mouse, fruitfly, Caenorhabditis elegans worm, Saccharomyces yeast, Arabidopsis mustard weed and zebrafish, with more planned. This information, automatically extracted and updated from several source databases, offers features not readily available through other genome databases to bioscientists looking for gene relationships across organisms. The database describes 150 000 known, predicted and orphan genes, using consistent gene names along with their homologies and associations with a standard vocabulary of molecular functions, cell locations and biological processes. Usable whole-genome maps including features, chromosome locations and molecular data integration are available, as are options to retrieve sequences from these genomes. Search and retrieval methods for these data are easy to use and efficient, allowing one to ask combined questions of sequence features, protein functions and other gene attributes, and fetch results in reports, computable tabular outputs or bulk database forms. These summarized data are useful for integration in other projects, such as gene expression databases. euGenes provides an extensible, flexible genome information system for many organisms.
Affiliation(s)
- Donald G Gilbert
- Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47405, USA.
217
Martin SL, Blackmon BP, Rajagopalan R, Houfek TD, Sceeles RG, Denn SO, Mitchell TK, Brown DE, Wing RA, Dean RA. MagnaportheDB: a federated solution for integrating physical and genetic map data with BAC end derived sequences for the rice blast fungus Magnaporthe grisea. Nucleic Acids Res 2002; 30:121-4. [PMID: 11752272 PMCID: PMC99159 DOI: 10.1093/nar/30.1.121] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Open
Abstract
We have created a federated database for genome studies of Magnaporthe grisea, the causal agent of rice blast disease, by integrating end sequence data from BAC clones, genetic marker data and BAC contig assembly data. A library of 9216 BAC clones providing >25-fold coverage of the entire genome was end sequenced and fingerprinted by HindIII digestion. The Image/FPC software package was then used to generate an assembly of 188 contigs covering >95% of the genome. The database contains the results of this assembly integrated with hybridization data of genetic markers to the BAC library. AceDB was used for the core database engine and a MySQL relational database, populated with numerical representations of BAC clones within FPC contigs, was used to create appropriately scaled images. The database is being used to facilitate sequencing efforts. It also gives researchers who are mapping known genes or other sequences of interest rapid and easy access to the fundamental organization of the M. grisea genome. This database, MagnaportheDB, can be accessed on the web at http://www.cals.ncsu.edu/fungal_genomics/mgdatabase/int.htm.
Affiliation(s)
- Stanton L Martin
- Fungal Genomics Laboratory, North Carolina State University, 840 Main Campus Drive, Suite 1200, Raleigh, NC 27606, USA
218
Misra S, Crosby MA, Mungall CJ, Matthews BB, Campbell KS, Hradecky P, Huang Y, Kaminker JS, Millburn GH, Prochnik SE, Smith CD, Tupy JL, Whitfied EJ, Bayraktaroglu L, Berman BP, Bettencourt BR, Celniker SE, de Grey ADNJ, Drysdale RA, Harris NL, Richter J, Russo S, Schroeder AJ, Shu SQ, Stapleton M, Yamada C, Ashburner M, Gelbart WM, Rubin GM, Lewis SE. Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol 2002; 3:RESEARCH0083. [PMID: 12537572 PMCID: PMC151185 DOI: 10.1186/gb-2002-3-12-research0083] [Citation(s) in RCA: 246] [Impact Index Per Article: 11.2] [Received: 10/16/2002] [Revised: 11/28/2002] [Accepted: 11/28/2002] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND: The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences. RESULTS: Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes. CONCLUSIONS: Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.
Affiliation(s)
- Sima Misra
- Department of Molecular and Cell Biology, University of California, Life Sciences Addition, Berkeley, CA 94720-3200, USA.
219
Abstract
GeneLynx is a meta-database providing an extensive collection of hyperlinks to human gene-specific information in diverse databases available on the Internet. The GeneLynx project is based on the simple notion that given any gene-specific identifier (accession number, gene name, text, or sequence), scientists should be able to access a single location that provides a set of links to all the publicly available information pertinent to the specified human gene. GeneLynx was implemented as an extensible relational database with an intuitive and user-friendly Web interface. The data are automatically extracted from more than 40 external resources, using appropriate approaches to maximize coverage of the available data. Construction and curation of the system is mediated by a custom set of software tools. An indexing utility is provided to facilitate the establishment of hyperlinks in external databases. A unique feature of the GeneLynx system is a communal curation system for user-aided annotation. GeneLynx can be accessed freely at http://www.genelynx.org.
Affiliation(s)
- B Lenhard
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
220
Basham SE, Rose LS. The Caenorhabditis elegans polarity gene ooc-5 encodes a Torsin-related protein of the AAA ATPase superfamily. Development 2001; 128:4645-56. [PMID: 11714689 DOI: 10.1242/dev.128.22.4645] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.0] [Indexed: 12/18/2022]
Abstract
The PAR proteins are required for polarity and asymmetric localization of cell fate determinants in C. elegans embryos. In addition, several of the PAR proteins are conserved and localized asymmetrically in polarized cells in Drosophila, Xenopus and mammals. We have previously shown that ooc-5 and ooc-3 mutations result in defects in spindle orientation and polarity in early C. elegans embryos. In particular, mutations in these genes affect the re-establishment of PAR protein asymmetry in the P1 cell of two-cell embryos. We now report that ooc-5 encodes a putative ATPase of the Clp/Hsp100 and AAA superfamilies of proteins, with highest sequence similarity to Torsin proteins; the gene for human Torsin A is mutated in individuals with early-onset torsion dystonia, a neuromuscular disease. Although Clp/Hsp100 and AAA family proteins have roles in diverse cellular activities, many are involved in the assembly or disassembly of proteins or protein complexes; thus, OOC-5 may function as a chaperone. OOC-5 protein co-localizes with a marker of the endoplasmic reticulum in all blastomeres of the early C. elegans embryo, in a pattern indistinguishable from that of OOC-3 protein. Furthermore, OOC-5 localization depends on the normal function of the ooc-3 gene. These results suggest that OOC-3 and OOC-5 function in the secretion of proteins required for the localization of PAR proteins in the P1 cell, and may have implications for the study of torsion dystonia.
Affiliation(s)
- S E Basham
- Section of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA
221
Lim CS, Mian IS, Dernburg AF, Campisi J. C. elegans clk-2, a gene that limits life span, encodes a telomere length regulator similar to yeast telomere binding protein Tel2p. Curr Biol 2001; 11:1706-10. [PMID: 11696330 DOI: 10.1016/s0960-9822(01)00526-7] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.5] [Indexed: 10/17/2022]
Abstract
An important quest in modern biology is to identify genes involved in aging. Model organisms such as the nematode Caenorhabditis elegans are particularly useful in this regard. The C. elegans genome has been sequenced [1], and single gene mutations that extend adult life span have been identified [2]. Among these longevity-controlling loci are four apparently unrelated genes that belong to the clk family. In mammals, telomere length and structure can influence cellular, and possibly organismal, aging. Here, we show that clk-2 encodes a regulator of telomere length in C. elegans.
Affiliation(s)
- C S Lim
- Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
222
Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L. The distributed annotation system. BMC Bioinformatics 2001; 2:7. [PMID: 11667947 PMCID: PMC58584 DOI: 10.1186/1471-2105-2-7] [Citation(s) in RCA: 319] [Impact Index Per Article: 13.9] [Received: 08/10/2001] [Accepted: 10/10/2001] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Currently, most genome annotation is curated by centralized groups with limited resources. Efforts to share annotations transparently among multiple groups have not yet been satisfactory. RESULTS: Here we introduce a concept called the Distributed Annotation System (DAS). DAS allows sequence annotations to be decentralized among multiple third-party annotators and integrated on an as-needed basis by client-side software. The communication between client and servers in DAS is defined by the DAS XML specification. Annotations are displayed in layers, one per server. Any client or server adhering to the DAS XML specification can participate in the system; we describe a simple prototype client and server example. CONCLUSIONS: The DAS specification is being used experimentally by Ensembl, WormBase, and the Berkeley Drosophila Genome Project. Continued success will depend on the readiness of the research community to adopt DAS and provide annotations. All components are freely available from the project website http://www.biodas.org/.
Affiliation(s)
- Robin D Dowell
- Howard Hughes Medical Institute and Department of Genetics, Washington University, St. Louis, MO 63110 USA
- Rodney M Jokerst
- Howard Hughes Medical Institute and Department of Genetics, Washington University, St. Louis, MO 63110 USA
- Allen Day
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724 USA
- Sean R Eddy
- Howard Hughes Medical Institute and Department of Genetics, Washington University, St. Louis, MO 63110 USA
- Lincoln Stein
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724 USA
223
Davy A, Bello P, Thierry-Mieg N, Vaglio P, Hitti J, Doucette-Stamm L, Thierry-Mieg D, Reboul J, Boulton S, Walhout AJ, Coux O, Vidal M. A protein-protein interaction map of the Caenorhabditis elegans 26S proteasome. EMBO Rep 2001; 2:821-8. [PMID: 11559592 PMCID: PMC1084039 DOI: 10.1093/embo-reports/kve184] [Citation(s) in RCA: 142] [Impact Index Per Article: 6.2] [Indexed: 11/14/2022] Open
Abstract
The ubiquitin-proteasome proteolytic pathway is pivotal in most biological processes. Despite a great level of information available for the eukaryotic 26S proteasome (the protease responsible for the degradation of ubiquitylated proteins), several structural and functional questions remain unanswered. To gain more insight into the assembly and function of the metazoan 26S proteasome, a two-hybrid-based protein interaction map was generated using 30 Caenorhabditis elegans proteasome subunits. The results recapitulate interactions reported for other organisms and reveal new potential interactions both within the 19S regulatory complex and between the 19S and 20S subcomplexes. Moreover, novel potential proteasome interactors were identified, including an E3 ubiquitin ligase, transcription factors, chaperone proteins and other proteins not yet functionally annotated. By providing a wealth of novel biological hypotheses, this interaction map constitutes a framework for further analysis of the ubiquitin-proteasome pathway in a multicellular organism amenable to both classical genetics and functional genomics.
Affiliation(s)
- A Davy
- CRBM, CNRS UPR-1086, IFR 24, 34293 Montpellier, France
224
Abstract
Caenorhabditis elegans is a powerful animal model for the study of functional genomics. The completed and well-annotated DNA sequence is available and a systematic study of gene function by RNA-interference-mediated knockdown of every gene is in progress. Full-genome DNA microarrays and DNA chips can be used to determine expression changes at different stages of development and in different mutant backgrounds, and a protein-interaction map based on the yeast two-hybrid approach is in progress. These high-capacity approaches to studying gene function will provide new insights into invertebrate and vertebrate biology.
Affiliation(s)
- S K Kim
- Department of Developmental Biology, Stanford University Medical School, Stanford, California 94305, USA.
225
Abstract
The genome sequence of an organism is an information resource unlike any that biologists have previously had access to. But the value of the genome is only as good as its annotation. It is the annotation that bridges the gap from the sequence to the biology of the organism. The aim of high-quality annotation is to identify the key features of the genome - in particular, the genes and their products. The tools and resources for annotation are developing rapidly, and the scientific community is becoming increasingly reliant on this information for all aspects of biological research.
Affiliation(s)
- L Stein
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA.
226
Abstract
The spatio-temporal expression pattern of a gene during development is a valuable piece of information. But there is no way to compare precisely the patterns of expression of different genes, or the way the patterns are changed in a mutant. One way to solve this problem is to construct digital reference images of development (a bioinformatics framework), to which expression patterns can be mapped and stored, then compared. Such frameworks are under active development in several model systems. They will form the basis of powerful and integrated gene expression databases, which facilitate comparisons between genes, tissues and species.
Affiliation(s)
- D Davidson
- MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK.