851
|
Salzberg SL, Hotopp JCD, Delcher AL, Pop M, Smith DR, Eisen MB, Nelson WC. Serendipitous discovery of Wolbachia genomes in multiple Drosophila species. Genome Biol 2005; 6:R23. [PMID: 15774024 PMCID: PMC1088942 DOI: 10.1186/gb-2005-6-3-r23] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.7] [Received: 12/22/2004] [Revised: 01/24/2005] [Accepted: 01/24/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of discovering additional genomic sequences beyond those originally sequenced. In particular, if the source DNA for a sequencing project came from a species that was colonized by another organism, then the project may yield substantial amounts of genomic DNA, including near-complete genomes, from the symbiotic or parasitic organism.
RESULTS: By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly: Drosophila ananassae, D. simulans, and D. mojavensis. We extracted all sequences with partial matches to a previously sequenced Wolbachia strain and assembled those sequences using customized software. For one of the three new species, the data recovered were sufficient to produce an assembly that covers more than 95% of the genome; for a second species the data produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80% of the genome; and for the third species the data cover approximately 6-7% of the genome.
CONCLUSIONS: The results of this study reveal an unexpected benefit of depositing raw data in a central genome sequence repository: new species can be discovered within this data. The differences between these three new Wolbachia genomes and the previously sequenced strain revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes. The three new genomes, with annotation, have been deposited in GenBank.
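The read-extraction step described in this abstract (pulling out all sequences with partial matches to a known reference) can be illustrated with a small k-mer screen. This is only a sketch under stated assumptions: the function names, the value of k, and the `min_shared` threshold are hypothetical and are not taken from the paper, whose customized software is not described here.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_reads(reads, reference, k=8, min_shared=2):
    """Keep reads that share at least min_shared k-mers with the reference.

    Both strands of the reference are indexed, so a read is kept
    regardless of its orientation. k and min_shared are illustrative
    values, not parameters reported by the authors.
    """
    ref_index = kmers(reference, k) | kmers(reverse_complement(reference), k)
    return [r for r in reads if len(kmers(r, k) & ref_index) >= min_shared]


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTACGATCGA"           # stand-in for a known genome
    reads = [
        "TGCAAGGCTTAC",                             # forward-strand match
        reverse_complement("AAGGCTTACGAT"),         # reverse-strand match
        "TTTTTTTTTTTT",                             # unrelated read
    ]
    print(screen_reads(reads, reference))
```

A real screen would tolerate mismatches (for example by using an alignment tool) rather than relying on exact shared k-mers; the sketch only shows the shape of the filtering step.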
Affiliation(s)
- Steven L Salzberg
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
- Arthur L Delcher
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
- Mihai Pop
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
- Douglas R Smith
- Agencourt Bioscience Corporation, 100 Cummings Center, Beverly, MA 01915, USA
- Michael B Eisen
- Center for Integrative Genomics, University of California, Berkeley, CA 94720, USA
- William C Nelson
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
|
852
|
Abstract
We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000–40,000. Only 2%–3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family. Comparative genome sequencing of indica and japonica rice reveals that duplication of genes and genomic regions has played a major part in the evolution of grass genomes
|
853
|
Srikanth S, Wang Z, Tu H, Nair S, Mathew MK, Hasan G, Bezprozvanny I. Functional properties of the Drosophila melanogaster inositol 1,4,5-trisphosphate receptor mutants. Biophys J 2005; 86:3634-46. [PMID: 15189860 PMCID: PMC1304265 DOI: 10.1529/biophysj.104.040121] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.9] [Indexed: 02/03/2023] Open
Abstract
The inositol 1,4,5-trisphosphate receptor (InsP3R) is an intracellular calcium (Ca2+) release channel that plays a crucial role in cell signaling. In Drosophila melanogaster a single InsP3R gene (itpr) encodes a protein (DmInsP3R) that is approximately 60% conserved with mammalian InsP3Rs. A number of itpr mutant alleles have been identified in genetic screens and studied for their effect on development and physiology. However, the functional properties of wild-type or mutant DmInsP3Rs have never been described. Here we use the planar lipid bilayer reconstitution technique to describe single-channel properties of embryonic and adult head DmInsP3R splice variants. The three mutants chosen in this study reside one in each of the three structural domains of the DmInsP3R: the amino-terminal ligand-binding domain (ug3), the middle coupling domain (wc703), and the channel-forming region (ka901). We discovered that (1) the major functional properties of DmInsP3R (conductance, gating, and sensitivity to InsP3 and Ca2+) are remarkably conserved with the mammalian InsP3R1; (2) the single-channel conductance of the adult head DmInsP3R isoform is 89 pS and that of the embryonic DmInsP3R isoform is 70 pS; (3) the ug3 mutation affects sensitivity of the DmInsP3Rs to activation by InsP3, but not their InsP3-binding properties; (4) wc703 channels have increased sensitivity to modulation by Ca2+; and (5) homomeric ka901 channels are not functional. We correlated the results obtained in planar lipid bilayer experiments with measurements of InsP3-induced Ca2+ fluxes in microsomes isolated from wild-type and heterozygous itpr mutants. Our study validates the use of D. melanogaster as an appropriate model for InsP3R structure-function studies and provides novel insights into the fundamental mechanisms of InsP3R function.
Affiliation(s)
- Sonal Srikanth
- Department of Physiology, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, USA
|
854
|
Vendramini D. Noncoding DNA and the teem theory of inheritance, emotions and innate behaviour. Med Hypotheses 2005; 64:512-9. [PMID: 15617858 DOI: 10.1016/j.mehy.2004.08.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/16/2004] [Accepted: 08/25/2004] [Indexed: 10/26/2022]
Abstract
The evolutionary function of noncoding 'junk' DNA remains one of the most challenging mysteries of genetics. Here a new model of DNA is proposed to explain this function. The hypothesis asserts the DNA molecule contains not one, but two separate modes of inheritance. In addition to exons that code for proteins and physical traits, it is argued noncoding repetitive elements code for the inheritance of emotions and innate behaviour in metazoans. That is to say, noncoding DNA functions as the medium of a second, hitherto unknown evolutionary process that genetically archives adaptive information, configured as emotions and acquired during the life of an organism, into an inheritable form. This second evolutionary process, here called 'Teemosis', is a selectionist process, but paradoxically, because it does not affect physical traits, it has no maladaptive Lamarckian consequences. The medical implications of the hypothesis are discussed.
|
855
|
Sasaki T, Matsumoto T, Antonio BA, Nagamura Y. From mapping to sequencing, post-sequencing and beyond. Plant & Cell Physiology 2005; 46:3-13. [PMID: 15659433 DOI: 10.1093/pcp/pci503] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Indexed: 05/24/2023]
Abstract
The Rice Genome Research Program (RGP) in Japan has been collaborating with the international community in elucidating a complete high-quality sequence of the rice genome. As the pioneer in large-scale analysis of the rice genome, the RGP has successfully established the fundamental tools for genome research such as a genetic map, a yeast artificial chromosome (YAC)-based physical map, a transcript map and a phage P1 artificial chromosome (PAC)/bacterial artificial chromosome (BAC) sequence-ready physical map, which serve as common resources for genome sequencing. Among the 12 rice chromosomes, the RGP is in charge of sequencing six chromosomes covering 52% of the 390 Mb total length of the genome. The contribution of the RGP to the realization of decoding the rice genome sequence with high accuracy and deciphering the genetic information in the genome will have a great impact in understanding the biology of the rice plant that provides a major food source for almost half of the world's population. A high-quality draft sequence (phase 2) was completed in December 2002. Since then, much of the finished quality sequence (phase 3) has become available in public databases. With the completion of sequencing in December 2004, it is expected that the genome sequence would facilitate innovative research in functional and applied genomics. A map-based genome sequence is indispensable for further improvement of current rice varieties and for development of novel varieties carrying agronomically important traits such as high yield potential and tolerance to both biotic and abiotic stresses. In addition to genome sequencing, various related projects have been initiated to generate valuable resources, which could serve as indispensable tools in clarifying the structure and function of the rice genome. 
These resources have been made available to the scientific community through the Rice Genome Resource Center (RGRC) of the National Institute of Agrobiological Sciences (NIAS) to enable rapid progress in research that will lead to thorough understanding of the rice plant. As the next trend in rice genome research will focus on determining the function of about 40,000-50,000 genes predicted in the genome as well as applying various genomics tools in rice breeding, an unlimited access to rice DNA and seed stocks will provide a broad community of scientists with the necessary materials for formulating new concepts, developing innovative research and making new scientific discoveries in rice genomics.
Affiliation(s)
- Takuji Sasaki
- National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602, Japan.
|
856
|
Goldsmith MR, Shimada T, Abe H. The genetics and genomics of the silkworm, Bombyx mori. Annual Review of Entomology 2005; 50:71-100. [PMID: 15355234 DOI: 10.1146/annurev.ento.50.071803.130456] [Citation(s) in RCA: 343] [Impact Index Per Article: 17.2] [Indexed: 05/10/2023]
Abstract
We review progress in applying molecular genetic and genomic technologies to studies in the domesticated silkworm, Bombyx mori, highlighting its use as a model for Lepidoptera, and in sericulture and biotechnology. Dense molecular linkage maps are being integrated with classical linkage maps for positional cloning and marker-assisted selection. Classical mutations have been identified by a candidate gene approach. Cytogenetic and sequence analyses show that the W chromosome is composed largely of nested full-length long terminal repeat retrotransposons. Z-chromosome-linked sequences show a lack of dosage compensation. The downstream sex differentiation mechanism has been studied via the silkworm homolog of doublesex. Expressed sequence tag databases have been used to discover Lepidoptera-specific genes, provide evidence for horizontal gene transfer, and construct microarrays. Physical maps using large-fragment bacterial artificial chromosome libraries have been constructed, and whole-genome shotgun sequencing is underway. Germline transformation and transient expression systems are well established and available for functional studies, high-level protein expression, and gene silencing via RNA interference.
Affiliation(s)
- Marian R Goldsmith
- Biological Sciences Department, University of Rhode Island, Kingston, Rhode Island 02881, USA.
|
857
|
Yazaki J, Kikuchi S. The genomic view of genes responsive to the antagonistic phytohormones, abscisic acid, and gibberellin. Vitamins and Hormones 2005; 72:1-30. [PMID: 16492467 DOI: 10.1016/s0083-6729(05)72001-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/31/2022]
Abstract
Genomics tools are now available for both a monocot (Oryza sativa) and a dicot (Arabidopsis thaliana) plant. Plants are not only a very important agricultural resource but also model organisms for biological research. The interaction between abscisic acid (ABA) and gibberellin (GA), which controls the transition from embryogenesis to germination in seeds, is an important subject for investigation with these tools, and such studies have examined the relationship between dormancy and germination. Genomics tools identified genes that had never before been annotated as ABA- or GA-responsive in plants, detected new interactions between genes responsive to the two hormones, and comprehensively characterized cis-elements of hormone-responsive genes in rice and Arabidopsis. In this research, ABA- and GA-regulated genes have been classified as encoding functional proteins (proteins that probably function in stress or PR tolerance) or regulatory proteins (protein factors involved in further regulation of signal transduction). Comparison of ABA- and/or GA-responsive genes in rice with those in Arabidopsis has shown that cis-elements are specific to each species. cis-Elements for the dehydration-stress response have been specified in Arabidopsis but not in rice. cis-Elements for protein storage are markedly more abundant in the upstream regions of rice genes than in those of Arabidopsis.
Affiliation(s)
- Junshi Yazaki
- Department of Molecular Genetics, National Institute of Agrobiological Sciences, 2-1-2 Kannon-dai, Tsukuba, Ibaraki 305-8602, Japan
|
858
|
Fouts DE, Mongodin EF, Mandrell RE, Miller WG, Rasko DA, Ravel J, Brinkac LM, DeBoy RT, Parker CT, Daugherty SC, Dodson RJ, Durkin AS, Madupu R, Sullivan SA, Shetty JU, Ayodeji MA, Shvartsbeyn A, Schatz MC, Badger JH, Fraser CM, Nelson KE. Major structural differences and novel potential virulence mechanisms from the genomes of multiple Campylobacter species. PLoS Biol 2005; 3:e15. [PMID: 15660156 PMCID: PMC539331 DOI: 10.1371/journal.pbio.0030015] [Citation(s) in RCA: 403] [Impact Index Per Article: 20.2] [Received: 07/02/2004] [Accepted: 11/11/2004] [Indexed: 12/19/2022] Open
Abstract
Sequencing and comparative genome analysis of four strains of Campylobacter including C. lari RM2100, C. upsaliensis RM3195, and C. coli RM2228 has revealed major structural differences that are associated with the insertion of phage- and plasmid-like genomic islands, as well as major variations in the lipooligosaccharide complex. Poly G tracts are longer, are greater in number, and show greater variability in C. upsaliensis than in the other species. Many genes involved in host colonization, including racR/S, cadF, cdt, ciaB, and flagellin genes, are conserved across the species, but variations that appear to be species specific are evident for a lipooligosaccharide locus, a capsular (extracellular) polysaccharide locus, and a novel Campylobacter putative licABCD virulence locus. The strains also vary in their metabolic profiles, as well as their resistance profiles to a range of antibiotics. It is evident that the newly identified hypothetical and conserved hypothetical proteins, as well as uncharacterized two-component regulatory systems and membrane proteins, may hold additional significant information on the major differences in virulence among the species, as well as the specificity of the strains for particular hosts.
Affiliation(s)
- Derrick E Fouts
- The Institute for Genomic Research, Rockville, Maryland, United States of America.
|
859
|
Abstract
MOTIVATION: EST sequences constitute an abundant, yet error-prone resource for computational biology. Expressed sequences are important in gene discovery and identification, and they are also crucial for the discovery and classification of alternative splicing. An important challenge when processing EST sequences is the reconstruction of mRNA by assembling EST clusters into consensus sequences.
RESULTS: In contrast to the more established assembly tools, we propose an algorithm that constructs a graph over sequence fragments of fixed size, and produces consensus sequences as traversals of this graph. We provide a tool implementing this algorithm, and perform an experiment where the consensus sequences produced by our implementation, as well as by currently available tools, are compared to mRNA. The results show that our proposed algorithm in a majority of the cases produces consensus of higher quality than the established sequence assemblers and at a competitive speed.
AVAILABILITY: The source code for the implementation is available under a GPL license from http://www.ii.uib.no/~ketil/bioinformatics/
CONTACT: ketil@ii.uib.no
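The idea of a graph over fixed-size sequence fragments, with consensus sequences read off as traversals, can be sketched in a few lines. The version below is a toy illustration and not the authors' algorithm: it assumes that nodes are k-mers, that edge weights count how often one k-mer follows another across the ESTs, and that the traversal greedily follows the heaviest edge. All names and the choice of k are hypothetical.

```python
from collections import defaultdict


def build_graph(ests, k):
    """Count how often each k-mer is directly followed by each next k-mer."""
    edges = defaultdict(lambda: defaultdict(int))
    for est in ests:
        for i in range(len(est) - k):
            edges[est[i:i + k]][est[i + 1:i + k + 1]] += 1
    return edges


def consensus(ests, k=4):
    """Greedy heaviest-edge traversal of the k-mer graph (illustrative only)."""
    edges = build_graph(ests, k)
    if not edges:
        return ""
    # Prefer a start node that no other k-mer leads into.
    targets = {t for outs in edges.values() for t in outs}
    starts = sorted(n for n in edges if n not in targets)
    node = starts[0] if starts else sorted(edges)[0]
    seq, seen = node, {node}
    while node in edges:
        # Sorting first makes tie-breaking deterministic.
        nxt = max(sorted(edges[node]), key=lambda t: edges[node][t])
        if nxt in seen:  # stop instead of looping through a repeat
            break
        seq += nxt[-1]
        seen.add(nxt)
        node = nxt
    return seq


if __name__ == "__main__":
    ests = [
        "ATGGCGTACC",   # 5' fragment
        "GCGTACCTGA",   # 3' fragment
        "ATGGCGTTCC",   # 5' fragment with one sequencing error (A -> T)
    ]
    print(consensus(ests, k=4))
```

In this example the erroneous EST adds a low-weight branch to the graph, and the traversal follows the better-supported path. A practical tool would also need to handle repeats, alternative splice forms, and reverse-strand reads, none of which the sketch attempts.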
Affiliation(s)
- Ketil Malde
- Department of Informatics, University of Bergen, Norway.
|
860
|
Abstract
Completing the primary genomic sequence of Arabidopsis thaliana was a major milestone, being the first plant genome and only the third high-quality finished eukaryotic genome sequence. Understanding how the genome sequence comprehensively encodes developmental programs and environmental responses is the next major challenge for all plant genome projects. This requires fully characterizing the genes, the regulatory sequences, and their functions. We discuss several functional genomics approaches to decode the linear sequence of the reference plant Arabidopsis thaliana, including full-length cDNA collections, microarrays, natural variation, knockout collections, and comparative sequence analysis. Genomics provides the essential tools to speed the work of the traditional molecular geneticist and is now a scientific discipline in its own right.
Affiliation(s)
- Justin O Borevitz
- Genomic Analysis Laboratory, Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, California 92037, USA.
|
861
|
Degtyarev S, Boykova T, Grishanin A, Belyakin S, Rubtsov N, Karamysheva T, Makarevich G, Akifyev A, Zhimulev I. The molecular structure of the DNA fragments eliminated during chromatin diminution in Cyclops kolensis. Genome Res 2004; 14:2287-94. [PMID: 15520291 PMCID: PMC525688 DOI: 10.1101/gr.2794604] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Received: 04/21/2004] [Accepted: 08/16/2004] [Indexed: 11/24/2022]
Abstract
Presumptive somatic cells of the copepod Cyclops kolensis specifically eliminate a large fraction of their genome by the process of chromatin diminution. The eliminated DNA (eDNA) remains only in the germline cells. Very little is known about the nature of the sequences eliminated from somatic cells. We cloned a fraction of the eDNA and sequenced 90 clones that total 32 kb. The following organizational patterns were demonstrated for the eDNA sequences. None contains an open reading frame. Each fragment contains 1-3 families of short repeats (10-30 bp) that are highly homologous within families (87%-100%). Most repeats are separated by spacers up to 50 bp long. Homologous regions (motifs), 15 to 300 bp in length, were found between fragments. Among the fragments there are groups in which the same motifs are ordered in the same fashion. However, spacers between the motifs differ in length and nucleotide composition. Ubiquitous motifs (those occurring in all fragments) were identified. Analysis of motifs revealed submotifs, each occurring within several motifs. Thus, motifs may be regarded as mosaic structures composed of submotifs (short repeats). Taken together, the results provide evidence of a high organizational ordering of the DNA sequences restricted to the germline. With this in mind, it appears incorrect to refer to this part of the genome as junk. Moreover, eDNA is redundant only for the somatic cells; its function is to be sought in germline cells.
Affiliation(s)
- Sergei Degtyarev
- Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
|
862
|
Abstract
Repetitive sequences make up a significant fraction of almost any genome, and an important and still open question in bioinformatics is how to represent all repeats in DNA sequences. We propose a new approach to repeat classification that represents all repeats in a genome as a mosaic of sub-repeats. Our key algorithmic idea also leads to new approaches to multiple alignment and fragment assembly. In particular, we show that our FragmentGluer assembler improves on Phrap and ARACHNE in assembly of BACs and bacterial genomes.
Affiliation(s)
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093, USA
|
863
|
Springer NM, Xu X, Barbazuk WB. Utility of different gene enrichment approaches toward identifying and sequencing the maize gene space. Plant Physiology 2004; 136:3023-33. [PMID: 15299128 PMCID: PMC523364 DOI: 10.1104/pp.104.043323] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.3] [Received: 03/22/2004] [Revised: 05/27/2004] [Accepted: 06/01/2004] [Indexed: 05/18/2023]
Abstract
Maize (Zea mays) possesses a large, highly repetitive genome, and consequently a number of reduced-representation sequencing approaches have been used to try to enrich for gene space while avoiding the difficulties associated with repetitive DNA. This article documents the ability of publicly available maize expressed sequence tags (ESTs) and Genome Survey Sequences (GSSs; many of which were isolated through the use of reduced-representation techniques) to recognize and provide coverage of 78 maize full-length cDNAs (FLCs). All 78 FLCs in the dataset were identified by at least three GSSs, indicating that the majority of maize genes have been identified by at least one currently available GSS. Both methyl-filtration and high-Cot enrichment methods provided a 7- to 8-fold increase in gene discovery rates as compared to random sequencing. The available maize GSSs aligned to 75% of the FLC nucleotides used to perform searches, while the ESTs aligned to 73% of the nucleotides. Our data suggest that at least approximately 95% of maize genes have been tagged by at least one GSS. While the GSSs are very effective for gene identification, relatively few (18%) of the FLCs are completely represented by GSSs. Analysis of the overlap of coverage and bias due to position within a gene suggests that RescueMu, methyl-filtration, and high-Cot methods are at least partially nonredundant.
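The coverage figures quoted in this abstract (the fraction of FLC nucleotides aligned by GSSs, and whether an FLC is completely represented) reduce to computing the union of alignment intervals on each cDNA. The following is a minimal sketch of that arithmetic, assuming alignments are already available as half-open (start, end) intervals in cDNA coordinates; the function names and the example intervals are hypothetical and do not come from the study.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_fraction(length, intervals):
    """Fraction of a cDNA of the given length covered by any alignment."""
    clipped = [
        (max(0, s), min(length, e))
        for s, e in intervals
        if s < e and s < length and e > 0
    ]
    covered = sum(e - s for s, e in merge_intervals(clipped))
    return covered / length


def fully_represented(length, intervals):
    """True when every nucleotide of the cDNA is covered."""
    return covered_fraction(length, intervals) == 1.0


if __name__ == "__main__":
    # Invented alignments of three sequences to a 1,000-nt cDNA.
    hits = [(0, 300), (200, 500), (700, 950)]
    print(covered_fraction(1000, hits))      # overlapping hits are counted once
    print(fully_represented(1000, hits))
```

Counting the union rather than the sum of interval lengths matters here, because overlapping sequences from different enrichment methods would otherwise inflate the coverage estimate.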
Affiliation(s)
- Nathan Michael Springer
- Center for Plant and Microbial Genomics, Department of Plant Biology, University of Minnesota, St. Paul, Minnesota 55108, USA.
|
864
|
Krzywinski J, Sangaré D, Besansky NJ. Satellite DNA from the Y chromosome of the malaria vector Anopheles gambiae. Genetics 2004; 169:185-96. [PMID: 15466420 PMCID: PMC1448884 DOI: 10.1534/genetics.104.034264] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022] Open
Abstract
Satellite DNA is an enigmatic component of genomic DNA with unclear function that has been regarded as "junk." Yet, persistence of these tandem highly repetitive sequences in heterochromatic regions of most eukaryotic chromosomes attests to their importance in the genome. We explored the Anopheles gambiae genome for the presence of satellite repeats and identified 12 novel satellite DNA families. Certain families were found in close juxtaposition within the genome. Six satellites, falling into two evolutionarily linked groups, were investigated in detail. Four of them were experimentally confirmed to be linked to the Y chromosome, whereas their relatives occupy centromeric regions of either the X chromosome or the autosomes. A complex evolutionary pattern was revealed among the AgY477-like satellites, suggesting their rapid turnover in the A. gambiae complex and, potentially, recombination between sex chromosomes. The substitution pattern suggested rolling circle replication as an array expansion mechanism in the Y-linked 53-bp satellite families. Despite residing in different portions of the genome, the 53-bp satellites share the same monomer lengths, apparently maintained by molecular drive or structural constraints. Potential functional centromeric DNA structures, consisting of twofold dyad symmetries flanked by a common sequence motif, have been identified in both satellite groups.
Affiliation(s)
- Jaroslaw Krzywinski
- Center for Tropical Disease Research and Training, Department of Biological Sciences, University of Notre Dame, Notre Dame, Indiana 46556, USA
|
865
|
Paces J, Zíka R, Paces V, Pavlícek A, Clay O, Bernardi G. Representing GC variation along eukaryotic chromosomes. Gene 2004; 333:135-41. [PMID: 15177688 DOI: 10.1016/j.gene.2004.02.041] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.4] [Received: 08/28/2003] [Accepted: 02/10/2004] [Indexed: 02/03/2023]
Abstract
Genome sequencing now permits direct visual representation, at any scale, of GC heterogeneity along the chromosomes of several higher eukaryotes. Plots can be easily obtained from the chromosomal sequences, yet sequence releases of mammalian or plant chromosomes still tend to use small scales or window sizes that obscure important large-scale compositional features. To faithfully reveal, at one glance, the compositional variation at a given scale, we have devised a simple scheme that combines line plots with color-coded shading of the regions underneath the plots. The scheme can be applied to different eukaryotic genomes to facilitate their comparison, as illustrated here for a sample of chromosomes chosen from seven selected species. As a complement to a previously published compact view of isochores in the human genome sequence, we include here an analogous map for the recently sequenced mouse genome, and discuss the contribution of repetitive DNA to the GC variation along the plots. Supplementary information, including a database of color-coded GC profiles for all recently sequenced eukaryotes and the program draw_chromosomes_gc.pl used to obtain them, are available at.
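The dependence of a GC plot on the chosen window size can be made concrete with a short sliding-window calculation. The sketch below is not the draw_chromosomes_gc.pl program mentioned in the abstract and does not reproduce its color-coded shading; it only computes the GC fraction per window, with a hypothetical function name and with ambiguous bases (such as N) excluded from the denominator as an assumption.

```python
def gc_profile(seq, window, step=None):
    """Return (window_start, gc_fraction) pairs along a sequence.

    step defaults to the window size (non-overlapping windows).
    Windows with no called A/C/G/T bases yield None.
    """
    step = step or window
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        called = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, gc / called if called else None))
    return profile


if __name__ == "__main__":
    # Invented sequence: a GC-rich block followed by an AT-rich block.
    seq = "GGCC" * 5 + "ATAT" * 5
    print(gc_profile(seq, window=20))   # resolves the two blocks
    print(gc_profile(seq, window=40))   # one window averages them together
```

Running the same sequence at two window sizes shows how the scale of the plot determines which compositional features remain visible.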
Affiliation(s)
- Jan Paces
- Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Flemingovo 2, Prague CZ-16637, Czech Republic
|
866
|
Brekke KM, Garrard WT. Assembly and analysis of the mouse immunoglobulin kappa gene sequence. Immunogenetics 2004; 56:490-505. [PMID: 15378297 DOI: 10.1007/s00251-004-0659-0] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.5] [Received: 10/21/2003] [Revised: 02/12/2004] [Indexed: 11/30/2022]
Abstract
The mechanisms regulating V gene usage leading to the immunoglobulin (Ig) repertoire have been of interest for many years but are only partially defined. To gain insight into these processes, we have assembled the nucleotide sequence of the Mus musculus Igkappa locus using data recently made available from genome-wide sequencing efforts. We found the locus to be 3.21 Mb in length and mapped all known functional, pseudo- and relic V gene segments onto the sequence, along with known regulatory elements. We corrected errors in former gene assignments, positions and orientations and identified a novel Vkappa4 gene segment. This assembly allowed the establishment of a unified nomenclature for the V genes based on their relative positions, similar to the nomenclature system adopted for the human Ig loci. The 5' boundary of the locus is defined by the presence of the tumor-associated calcium-signal transducer-2 gene located 19 kb upstream of Vkappa24-140, the most distal V gene. No non-Vkappa genes were found in the sequence of the locus. Detailed analysis of the sequences 0.5 kb upstream, within, and 0.5 kb downstream of each potentially functional V gene revealed interesting patterns of statistically significant clustering of transcription factor consensus binding sites, generally specific to a particular family. We found that E boxes were clustered not only in promoter regions, but also near recombination signal sequences. Family members of Vkappa4/5 genes exhibit a conserved pattern of octamer sites in their downstream regions, as well as Ebf sites in their introns, and Lef-1 sites in their upstream regions. We discuss potential functional implications of these findings in the context of possible combinatorial mechanisms for targeting V genes for rearrangement. The assembled sequence and its analyses are available as a resource to the scientific community.
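Scanning flanking sequence for a transcription factor consensus site, as in the E-box analysis described in this abstract, can be sketched with a regular expression. The example assumes the standard E-box consensus CANNTG; the function names and the test sequence are hypothetical, and the statistical test the authors used to call clustering is not reproduced.

```python
import re

# Lookahead so that overlapping matches are all reported.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(seq):
    """Return 0-based start positions of every CANNTG match in seq.

    The CANNTG pattern is its own reverse complement, so scanning one
    strand finds sites on both.
    """
    return [m.start() for m in E_BOX.finditer(seq.upper())]


def count_in_window(positions, start, end):
    """Count sites whose start falls in the half-open range [start, end)."""
    return sum(start <= p < end for p in positions)


if __name__ == "__main__":
    flank = "TTCAGCTGAACACGTGTT"     # invented flanking sequence
    sites = find_e_boxes(flank)
    print(sites)
    print(count_in_window(sites, 0, 8))
```

Counting sites in fixed windows upstream, within, and downstream of each gene segment would be the natural next step before any test for over-representation.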
Affiliation(s)
- Katherine M Brekke
- Department of Molecular Biology, University of Texas Southwestern Medical Center at Dallas, 5323 Harry Hines Blvd., Dallas, TX 75390-9148, USA
|
867
|
Berman BP, Pfeiffer BD, Laverty TR, Salzberg SL, Rubin GM, Eisen MB, Celniker SE. Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol 2004; 5:R61. [PMID: 15345045 PMCID: PMC522868 DOI: 10.1186/gb-2004-5-9-r61] [Citation(s) in RCA: 170] [Impact Index Per Article: 8.1] [Received: 07/14/2004] [Revised: 08/04/2004] [Accepted: 08/06/2004] [Indexed: 01/03/2023] Open
Abstract
27 predicted gene-regulatory regions in the Drosophila melanogaster genome were analyzed in vivo, confirming 15 active enhancer regions. A comparison with Drosophila pseudoobscura sequences revealed that conservation of binding-site clusters accurately discriminates functional regions from non-functional ones. BACKGROUND The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. RESULTS We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. CONCLUSIONS Measuring conservation of sequence features closely linked to function - such as binding-site clustering - makes better use of comparative sequence data than commonly used methods that examine only sequence identity.
Affiliation(s)
- Benjamin P Berman
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
| | - Barret D Pfeiffer
- Berkeley Drosophila Genome Project, Genome Sciences Department, Life Sciences Division, Lawrence Orlando Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Todd R Laverty
- Howard Hughes Medical Institute, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
| | - Steven L Salzberg
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20878, USA
| | - Gerald M Rubin
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Berkeley Drosophila Genome Project, Genome Sciences Department, Life Sciences Division, Lawrence Orlando Berkeley National Laboratory, Berkeley, CA 94720, USA
- Howard Hughes Medical Institute, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
| | - Michael B Eisen
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Genome Sciences Department, Genomics Division, Lawrence Orlando Berkeley National Laboratory, Berkeley, CA 94720, USA
- Center for Integrative Genomics, University of California, Berkeley, CA 94720, USA
| | - Susan E Celniker
- Berkeley Drosophila Genome Project, Genome Sciences Department, Life Sciences Division, Lawrence Orlando Berkeley National Laboratory, Berkeley, CA 94720, USA
|
868
|
Quesneville H, Nouaud D, Anxolabéhère D. Detection of new transposable element families in Drosophila melanogaster and Anopheles gambiae genomes. J Mol Evol 2004; 57 Suppl 1:S50-9. [PMID: 15008403 DOI: 10.1007/s00239-003-0007-2] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.0] [Indexed: 10/26/2022]
Abstract
The techniques that are usually used to detect transposable elements (TEs) in nucleic acid sequences rely on sequence similarity with previously characterized elements. However, these methods are likely to miss many elements in various organisms. We tested two strategies for the detection of unknown elements. The first, which we call "TBLASTX strategy," searches for TE sequences by comparing the six-frame translations of the nucleic acid sequences of known TEs with the genomic sequence of interest. The second, "repeat-based strategy," searches genomic sequences for long repeats and clusters them in groups of similar sequences. TE copies from a given family are expected to cluster together. We tested the Drosophila melanogaster genomic sequence and the recently sequenced Anopheles gambiae genome in which most TEs remain unknown. We showed that the "TBLASTX strategy" is very efficient as it detected at least 332 new TE families in D. melanogaster and 400 in A. gambiae. This was unexpected in Drosophila as TEs of this organism have been extensively studied. The "repeat-based strategy" appeared to be very inefficient because of two problems: (i) TE copies are heavily deleted and few copies share homologous regions, and (ii) segmental duplications are frequent and it is not easy to distinguish them from TE copies.
Affiliation(s)
- Hadi Quesneville
- Laboratoire Dynamique du Génome et Evolution, Institut Jacques Monod, 2, Place Jussieu, 75251 Paris Cedex 05, France.
|
869
|
Roberts M, Hunt BR, Yorke JA, Bolanos RA, Delcher AL. A Preprocessor for Shotgun Assembly of Large Genomes. J Comput Biol 2004; 11:734-52. [PMID: 15579242 DOI: 10.1089/cmb.2004.11.734] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.6] [Indexed: 01/09/2023] Open
Abstract
The whole-genome shotgun (WGS) assembly technique has been remarkably successful in efforts to determine the sequence of bases that make up a genome. WGS assembly begins with a large collection of short fragments that have been selected at random from a genome. The sequence of bases at each end of the fragment is determined, albeit imprecisely, resulting in a sequence of letters called a "read." Each letter in a read is assigned a quality value, which estimates the probability that a sequencing error occurred in determining that letter. Reads are typically cut off after about 500 letters, where sequencing errors become endemic. We report on a set of procedures that (1) corrects most of the sequencing errors, (2) changes quality values accordingly, and (3) produces a list of "overlaps," i.e., pairs of reads that plausibly come from overlapping parts of the genome. Our procedures, which we call collectively the "UMD Overlapper," can be run iteratively and as a preprocessor for other assemblers. We tested the UMD Overlapper on Celera's Drosophila reads. When we replaced Celera's overlap procedures in the front end of their assembler, it was able to produce a significantly improved genome.
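The two ideas in this abstract, per-base quality values and a list of plausible read overlaps, can be illustrated with a minimal sketch. It is not the UMD Overlapper: it assumes the common Phred-style scale (Q = -10 log10 P) for quality values, which the abstract does not specify, and it finds overlaps by a naive all-pairs suffix/prefix comparison. The function names and the `min_len` and `max_mismatch` parameters are hypothetical.

```python
def error_probability(q: int) -> float:
    """Convert a quality value to an error probability.

    Assumes the Phred-style scale Q = -10 * log10(P); the abstract
    does not say which scale the authors used.
    """
    return 10 ** (-q / 10)


def suffix_prefix_overlap(a: str, b: str, min_len: int = 3,
                          max_mismatch: int = 0) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`
    with at most `max_mismatch` mismatches; 0 if none reaches `min_len`."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-length:], b[:length]))
        if mismatches <= max_mismatch:
            return length
    return 0


def find_overlaps(reads, min_len=3, max_mismatch=0):
    """Naive all-pairs scan returning (i, j, overlap_length) tuples,
    meaning the end of read i plausibly overlaps the start of read j."""
    overlaps = []
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                n = suffix_prefix_overlap(a, b, min_len, max_mismatch)
                if n:
                    overlaps.append((i, j, n))
    return overlaps


reads = ["ACGTAC", "TACGGA", "GGATTT"]
print(find_overlaps(reads))
```

The quadratic scan is only workable for toy inputs; real overlappers index the reads first, and they tolerate mismatches in proportion to the error probabilities rather than with a fixed cutoff.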
Affiliation(s)
- Michael Roberts
- Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742-2431, USA
|
870
|
Granadino B, Rey-Campos J. EVG, the remnants of a primordial bilaterian's synteny of functionally unrelated genes. J Mol Evol 2004; 57:515-9. [PMID: 14738309 DOI: 10.1007/s00239-003-2503-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 06/28/2002] [Accepted: 05/19/2003] [Indexed: 10/26/2022]
Abstract
Extant genomes are the result of repeated duplications and subsequent divergence of primordial genes that assembled the genomes of the first living beings. Increased information on genome maps of different species is revealing conserved syntenies among different vertebrate taxa, which allow the history of current chromosomes to be traced back. However, inferring neighboring relationships between genes of more primitive genomes has proven to be very difficult. Most often, the ancestral arrangements of genes have been lost by multiple histories of internal duplications, chromosomal breaks, and large-scale genomic rearrangements. Here we describe a gene arrangement of nonrelated genes that seems to have endured evolution, at least from the separation of the two major clades of bilateria: deuterostomia and protostomia, approximately 1 billion years ago. In its simplest conception, this gene cluster, named EVG, groups the genes for a glucose transporter, an enolase, and a vesicle-associated membrane protein (VAMP). EVG might represent the evolutionary remnants of the gene organization of an ancient bilaterian genome.
Affiliation(s)
- Begoña Granadino
- Departamento de Biología Celular y Desarrollo, Centro de Investigaciones Biológicas, Consejo Superior de Investigaciones Científicas, Velázquez 144, 28006 Madrid, Spain
|
871
|
Hoffmaster AR, Ravel J, Rasko DA, Chapman GD, Chute MD, Marston CK, De BK, Sacchi CT, Fitzgerald C, Mayer LW, Maiden MCJ, Priest FG, Barker M, Jiang L, Cer RZ, Rilstone J, Peterson SN, Weyant RS, Galloway DR, Read TD, Popovic T, Fraser CM. Identification of anthrax toxin genes in a Bacillus cereus associated with an illness resembling inhalation anthrax. Proc Natl Acad Sci U S A 2004; 101:8449-54. [PMID: 15155910 PMCID: PMC420414 DOI: 10.1073/pnas.0402414101] [Citation(s) in RCA: 352] [Impact Index Per Article: 16.8] [Indexed: 12/13/2022] Open
Abstract
Bacillus anthracis is the etiologic agent of anthrax, an acute fatal disease among mammals. It was thought to differ from Bacillus cereus, an opportunistic pathogen and cause of food poisoning, by the presence of plasmids pXO1 and pXO2, which encode the lethal toxin complex and the poly-gamma-d-glutamic acid capsule, respectively. This work describes a non-B. anthracis isolate that possesses the anthrax toxin genes and is capable of causing a severe inhalation anthrax-like illness. Although initial phenotypic and 16S rRNA analysis identified this isolate as B. cereus, the rapid generation and analysis of a high-coverage draft genome sequence revealed the presence of a circular plasmid, named pBCXO1, with 99.6% similarity with the B. anthracis toxin-encoding plasmid, pXO1. Although homologues of the pXO2 encoded capsule genes were not found, a polysaccharide capsule cluster is encoded on a second, previously unidentified plasmid, pBC218. A/J mice challenged with B. cereus G9241 confirmed the virulence of this strain. These findings represent an example of how genomics could rapidly assist public health experts responding not only to clearly identified select agents but also to novel agents with similar pathogenic potentials. In this study, we combined a public health approach with genome analysis to provide insight into the correlation of phenotypic characteristics and their genetic basis.
Affiliation(s)
- Alex R Hoffmaster
- Epidemiologic Investigations Laboratory, Meningitis and Special Pathogens Branch, Centers for Disease Control and Prevention, 1600 Clifton Road, MS G34, Atlanta, GA 30333, USA
|
872
|
Havlak P, Chen R, Durbin KJ, Egan A, Ren Y, Song XZ, Weinstock GM, Gibbs RA. The Atlas genome assembly system. Genome Res 2004; 14:721-32. [PMID: 15060016 PMCID: PMC383319 DOI: 10.1101/gr.2264004] [Citation(s) in RCA: 101] [Impact Index Per Article: 4.8] [Indexed: 11/24/2022]
Abstract
Atlas is a suite of programs developed for assembly of genomes by a "combined approach" that uses DNA sequence reads from both BACs and whole-genome shotgun (WGS) libraries. The BAC clones afford advantages of localized assembly with reduced computational load, and provide a robust method for dealing with repeated sequences. Inclusion of WGS sequences facilitates use of different clone insert sizes and reduces data production costs. A core function of Atlas software is recruitment of WGS sequences into appropriate BACs based on sequence overlaps. Because construction of consensus sequences is from local assembly of these reads, only small (<0.1%) units of the genome are assembled at a time. Once assembled, each BAC is used to derive a genomic layout. This "sequence-based" growth of the genome map has greater precision than with non-sequence-based methods. Use of BACs allows correction of artifacts due to repeats at each stage of the process. This is aided by ancillary data such as BAC fingerprint, other genomic maps, and syntenic relations with other genomes. Atlas was used to assemble a draft DNA sequence of the rat genome; its major components including overlapper and split-scaffold are also being used in pure WGS projects.
Affiliation(s)
- Paul Havlak
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
|
873
|
Abstract
The Human Genome Project (HGP) is the most ambitious and important effort in the history of biology. It has provided a complete genetic blueprint for human life, and will provide important insights into human health and development. HGP involves a huge amount of data that is stored on computers all over the world. More than just vast amounts of DNA sequences, the project is about developing sets of integrated maps that involve genetic, physical, and sequence data. The data can be sorted, annotated and organized in many different ways using different types of database software, different analysis algorithms and different forms of interfaces. The genomic sequence of the human and substantial portions of the mouse genome are expected to be finished by 2005. Analytical chemists took the opportunity, addressing the problem of achieving a high throughput with good sensitivity. This paper discusses how analytical chemists saved the Human Genome Project or at least gave it a helping hand.
Affiliation(s)
- Subbiah Thangadurai
- Department of Geology and Mining, Guindy, Chennai-600 032, Tamil Nadu, India.
|
874
|
Eisenreich W, Ettenhuber C, Laupitz R, Theus C, Bacher A. Isotopolog perturbation techniques for metabolic networks: metabolic recycling of nutritional glucose in Drosophila melanogaster. Proc Natl Acad Sci U S A 2004; 101:6764-9. [PMID: 15096588 PMCID: PMC404119 DOI: 10.1073/pnas.0400916101] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022] Open
Abstract
Drosophila melanogaster strain Oregon-R(*) was grown on standard medium supplemented with [U-(13)C(6)]glucose. One to two days after hatching, flies were extracted with water. Glucose was isolated chromatographically from the extract and was analyzed by (13)C NMR spectroscopy. All (13)C signals of the isolated glucose were multiplets arising by (13)C(13)C coupling. Based on a comprehensive analysis of the coupling constants and heavy isotope shifts in glucose, the integrals of individual (13)C signal patterns afforded the concentrations of certain groups of (13)C isotopologs. These data were deconvoluted by a genetic algorithm affording the abundances of all single-labeled and of 15 multiply labeled isotopologs. Among the latter group, seven isotopologs were found at concentrations >0.1 mol % with [1,2-(13)C(2)]glucose as the most prominent species. The multiply (13)C-labeled glucose isotopologs are caused by metabolic remodeling of the proffered glucose via a complex network of catabolic and anabolic processes involving glycolysis and/or passage through the pentose phosphate, the Cori cycle and/or the citrate cycle. The perturbation method described can be adapted to a wide variety of experimental systems and isotope-labeled precursors.
Affiliation(s)
- Wolfgang Eisenreich
- Lehrstuhl für Organische Chemie und Biochemie, Technische Universität München, Lichtenbergstrasse 4, D-85747 Garching, Germany.
|
875
|
Martienssen RA, Rabinowicz PD, O'Shaughnessy A, McCombie WR. Sequencing the maize genome. Curr Opin Plant Biol 2004; 7:102-7. [PMID: 15003207 DOI: 10.1016/j.pbi.2004.01.010] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.1] [Indexed: 05/09/2023]
Abstract
Sequencing of complex genomes can be accomplished by enriching shotgun libraries for genes. In maize, gene-enrichment by copy-number normalization (high C(0)t) and methylation filtration (MF) have been used to generate up to two-fold coverage of the gene-space with less than 1 million sequencing reads. Simulations using sequenced bacterial artificial chromosome (BAC) clones predict that 5x coverage of gene-rich regions, accompanied by less than 1x coverage of subclones from BAC contigs, will generate high-quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. By sequencing several inbred strains, we propose a strategy for capturing this polymorphism to investigate hybrid vigor or heterosis.
Affiliation(s)
- Robert A Martienssen
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA.
|
876
|
Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera, Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C, Glodek A, Gu Z, Jennings D, Kraft CL, Nguyen T, Pfannkoch CM, Sitter C, Sutton GG, Venter JC, Woodage T, Smith D, Lee HM, Gustafson E, Cahill P, Kana A, Doucette-Stamm L, Weinstock K, Fechtel K, Weiss RB, Dunn DM, Green ED, Blakesley RW, Bouffard GG, De Jong PJ, Osoegawa K, Zhu B, Marra M, Schein J, Bosdet I, Fjell C, Jones S, Krzywinski M, Mathewson C, Siddiqui A, Wye N, McPherson J, Zhao S, Fraser CM, Shetty J, Shatsman S, Geer K, Chen Y, Abramzon S, Nierman WC, Havlak PH, Chen R, Durbin KJ, Simons R, Ren Y, Song XZ, Li B, Liu Y, Qin X, Cawley S, Worley KC, Cooney AJ, D'Souza LM, Martin K, Wu JQ, Gonzalez-Garay ML, Jackson AR, Kalafus KJ, McLeod MP, Milosavljevic A, Virk D, Volkov A, Wheeler DA, Zhang Z, Bailey JA, Eichler EE, Tuzun E, Birney E, Mongin E, Ureta-Vidal A, Woodwark C, Zdobnov E, Bork P, Suyama M, Torrents D, Alexandersson M, Trask BJ, Young JM, Huang H, Wang H, Xing H, Daniels S, Gietzen D, Schmidt J, Stevens K, Vitt U, Wingrove J, Camara F, Mar Albà M, Abril JF, Guigo R, Smit A, Dubchak I, Rubin EM, Couronne O, Poliakov A, Hübner N, Ganten D, Goesele C, Hummel O, Kreitler T, Lee YA, Monti J, Schulz H, Zimdahl H, Himmelbauer H, Lehrach H, Jacob HJ, Bromberg S, Gullings-Handley J, Jensen-Seaman MI, Kwitek AE, Lazar J, Pasko D, Tonellato PJ, Twigger S, Ponting CP, Duarte JM, Rice S, Goodstadt L, Beatson SA, Emes RD, Winter EE, Webber C, Brandt P, Nyakatura G, Adetobi M, Chiaromonte F, Elnitski L, Eswara P, Hardison RC, Hou M, Kolbe D, Makova K, Miller W, Nekrutenko A, Riemer C, Schwartz S, Taylor J, Yang S, Zhang Y, Lindpaintner K, Andrews TD, Caccamo M, Clamp M, Clarke L, Curwen V, Durbin R, Eyras E, Searle SM, Cooper GM, Batzoglou S, Brudno M, Sidow A, Stone EA, Venter JC, Payseur BA, Bourque G, López-Otín C, Puente XS, Chakrabarti K, Chatterji S, Dewey C, Pachter L, Bray N, Yap VB, Caspi A, Tesler G, Pevzner PA, Haussler D, Roskin KM, Baertsch R, Clawson H, Furey TS, Hinrichs AS, Karolchik D, Kent WJ, Rosenbloom KR, Trumbower H, Weirauch M, Cooper DN, Stenson PD, Ma B, Brent M, Arumugam M, Shteynberg D, Copley RR, Taylor MS, Riethman H, Mudunuri U, Peterson J, Guyer M, Felsenfeld A, Old S, Mockrin S, Collins F. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 2004; 428:493-521. [PMID: 15057822 DOI: 10.1038/nature02426] [Citation(s) in RCA: 1557] [Impact Index Per Article: 74.1] [Received: 12/31/2003] [Accepted: 02/20/2004] [Indexed: 01/16/2023]
Abstract
The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.
Affiliation(s)
- Richard A Gibbs
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, MS BCM226, One Baylor Plaza, Houston, Texas 77030, USA. http://www.hgsc.bcm.tmc.edu
|
877
|
Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO. Environmental genome shotgun sequencing of the Sargasso Sea. Science 2004; 304:66-74. [PMID: 15001713 DOI: 10.1126/science.1093857] [Citation(s) in RCA: 2464] [Impact Index Per Article: 117.3] [Indexed: 12/13/2022]
Abstract
We have applied "whole-genome shotgun sequencing" to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These data are estimated to derive from at least 1800 genomic species based on sequence relatedness, including 148 previously unknown bacterial phylotypes. We have identified over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors. Variation in species present and stoichiometry suggests substantial oceanic microbial diversity.
Affiliation(s)
- J Craig Venter
- Institute for Biological Energy Alternatives, 1901 Research Boulevard, Rockville, MD 20850, USA.
|
878
|
Nesvizhskii AI, Aebersold R. Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS. Drug Discov Today 2004; 9:173-81. [PMID: 14960397 DOI: 10.1016/s1359-6446(03)02978-7] [Citation(s) in RCA: 123] [Impact Index Per Article: 5.9] [Indexed: 11/27/2022]
Abstract
Tandem mass spectrometry has been used increasingly for high-throughput analysis of complex protein samples. A major challenge lies in the consistent, objective and transparent analysis of the large amounts of data generated by such experiments and in their dissemination and publication. Here, we review currently available computational tools and discuss the need for statistical criteria in the analysis of large proteomics datasets.
|
879
|
Abstract
The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is commonly used to order and orient these contigs using paired read information. This ordering of contigs is an essential step when finishing and analyzing the data from a whole-genome shotgun project. Most recent assemblers include a scaffolding module; however, users have little control over the scaffolding algorithm or the information produced. We thus developed a general-purpose scaffolder, called Bambus, which affords users significant flexibility in controlling the scaffolding parameters. Bambus was used recently to scaffold the low-coverage draft dog genome data. Most significantly, Bambus enables the use of linking data other than that inferred from mate-pair information. For example, the sequence of a completed genome can be used to guide the scaffolding of a related organism. We present several applications of Bambus: support for finishing, comparative genomics, analysis of the haplotype structure of genomes, and scaffolding of a mammalian genome at low coverage. Bambus is available as an open-source package from our Web site.
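The ordering half of scaffolding can be sketched as a graph problem. The example below is not Bambus's algorithm: it assumes each mate pair has already been reduced to a hypothetical `(left_contig, right_contig)` link meaning "left precedes right", keeps only links seen at least `min_support` times, and orders contigs by topological sort. Contig orientation and gap-size estimation, which a real scaffolder also handles, are left out.

```python
from collections import Counter, defaultdict, deque


def order_contigs(links, min_support=2):
    """Order contigs from pairwise 'left precedes right' links.

    `links` is an iterable of (left_contig, right_contig) tuples, one per
    supporting mate pair (a hypothetical input format). Links observed
    fewer than `min_support` times are treated as noise and ignored.
    """
    support = Counter(links)
    successors = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for (left, right), count in support.items():
        if count < min_support:
            continue
        successors[left].add(right)
        indegree[right] += 1
        nodes.update((left, right))

    # Kahn's algorithm; sorting makes the result deterministic.
    queue = deque(sorted(c for c in nodes if indegree[c] == 0))
    order = []
    while queue:
        contig = queue.popleft()
        order.append(contig)
        for nxt in sorted(successors[contig]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("conflicting links: the link graph contains a cycle")
    return order


links = [("c1", "c2"), ("c1", "c2"),
         ("c2", "c3"), ("c2", "c3"),
         ("c3", "c1")]  # single unsupported link, dropped as noise
print(order_contigs(links))
```

Because the input is just a list of links, evidence other than mate pairs (for example, contig order taken from a related finished genome, as the abstract mentions) could be fed in the same way.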
Affiliation(s)
- Mihai Pop
- The Institute for Genomic Research (TIGR), Rockville, Maryland 20850, USA.
|
880
|
Istrail S, Sutton GG, Florea L, Halpern AL, Mobarry CM, Lippert R, Walenz B, Shatkay H, Dew I, Miller JR, Flanigan MJ, Edwards NJ, Bolanos R, Fasulo D, Halldorsson BV, Hannenhalli S, Turner R, Yooseph S, Lu F, Nusskern DR, Shue BC, Zheng XH, Zhong F, Delcher AL, Huson DH, Kravitz SA, Mouchard L, Reinert K, Remington KA, Clark AG, Waterman MS, Eichler EE, Adams MD, Hunkapiller MW, Myers EW, Venter JC. Whole-genome shotgun assembly and comparison of human genome assemblies. Proc Natl Acad Sci U S A 2004; 101:1916-21. [PMID: 14769938 PMCID: PMC357027 DOI: 10.1073/pnas.0307971100] [Citation(s) in RCA: 119] [Impact Index Per Article: 5.7] [Indexed: 01/08/2023] Open
Abstract
We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304-1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860-921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.
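The coverage figures quoted above follow the usual relation coverage = (number of reads × mean read length) / genome size. The sketch below works the arithmetic backwards from the abstract's 27 million reads and 5.3-fold coverage. The 2.9 Gb genome size is an assumption made for illustration and is not stated in the abstract. The uncovered-fraction estimate uses the idealized Lander-Waterman approximation e^(-c), which assumes uniformly random read placement.

```python
import math


def coverage(num_reads: float, mean_read_len: float, genome_size: float) -> float:
    """Fold coverage: total sequenced bases divided by genome size."""
    return num_reads * mean_read_len / genome_size


def implied_read_length(num_reads: float, cov: float, genome_size: float) -> float:
    """Mean read length implied by a given coverage and read count."""
    return cov * genome_size / num_reads


def expected_uncovered_fraction(cov: float) -> float:
    """Lander-Waterman estimate of the fraction of bases hit by no read,
    valid only under uniformly random sampling."""
    return math.exp(-cov)


GENOME_SIZE = 2.9e9  # assumed human genome size in bp (not from the abstract)
NUM_READS = 27e6     # from the abstract
READ_COVERAGE = 5.3  # from the abstract

mean_len = implied_read_length(NUM_READS, READ_COVERAGE, GENOME_SIZE)
print(f"implied mean trimmed read length: {mean_len:.0f} bp")
print(f"idealized uncovered fraction: {expected_uncovered_fraction(READ_COVERAGE):.4f}")
```

Under these assumptions the implied mean trimmed read length is a little under 600 bp. Repeats and cloning bias mean real assemblies leave more gaps than the idealized estimate suggests.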
Affiliation(s)
- Sorin Istrail
- Applied Biosystems, 45 West Gude Drive, Rockville, MD 20850, USA
|
881
|
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biol 2004; 5:R12. [PMID: 14759262 PMCID: PMC395750 DOI: 10.1186/gb-2004-5-2-r12] [Citation(s) in RCA: 3815] [Impact Index Per Article: 181.7] [Received: 10/27/2003] [Revised: 12/15/2003] [Accepted: 12/17/2003] [Indexed: 11/29/2022] Open
Abstract
The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The new system is the first version of MUMmer to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The MUMmer sources are available at .
Affiliation(s)
- Stefan Kurtz
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
- Adam Phillippy
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
- Arthur L Delcher
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
- Michael Smoot
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
- Current address: Department of Computer Science, University of Virginia, Charlottesville, VA 22904, USA
- Martin Shumway
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
- Corina Antonescu
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
- Steven L Salzberg
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
|
882
|
Abstract
By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the overall accuracy of genome sequences and facilitates the use of these sequences for polymorphism discovery. We describe the algorithm and its application in a large set of recent genome sequencing projects. The number of erroneous base calls in these projects was reduced by 80%. In an analysis of over one million corrections, we found that AutoEditor made just one error per 8828 corrections. By substantially increasing the accuracy of base calling, AutoEditor can dramatically accelerate the process of finishing genomes, which involves closing all gaps and ensuring minimum quality standards for the final sequence. It also greatly improves our ability to discover single nucleotide polymorphisms (SNPs) between closely related strains and isolates of the same species.
Affiliation(s)
- Pawel Gajer
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
|
883
|
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biol 2004; 5:R12. [PMID: 14759262 DOI: 10.1186/gb-2004-5-2-r12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/08/2023] Open
Abstract
The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The new system is the first version of MUMmer to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The MUMmer sources are available at http://www.tigr.org/software/mummer.
Affiliation(s)
- Stefan Kurtz
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
|
884
|
Bauer H, Gromer S, Urbani A, Schnölzer M, Schirmer RH, Müller HM. Thioredoxin reductase from the malaria mosquito Anopheles gambiae. Eur J Biochem 2003; 270:4272-81. [PMID: 14622292 DOI: 10.1046/j.1432-1033.2003.03812.x] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
Abstract
The mosquito, Anopheles gambiae, is an important vector of Plasmodium falciparum malaria. Full genome analysis revealed that, as in Drosophila melanogaster, the enzyme glutathione reductase is absent in A. gambiae and functionally substituted by the thioredoxin system. The key enzyme of this system is thioredoxin reductase-1, a homodimeric FAD-containing protein of 55.3 kDa per subunit, which catalyses the reaction NADPH + H+ + thioredoxin disulfide → NADP+ + thioredoxin dithiol. The A. gambiae trxr gene is located on chromosome X as a single copy; it represents three splice variants coding for two cytosolic and one mitochondrial variant. The predominant isoform, A. gambiae thioredoxin reductase-1, was recombinantly expressed in Escherichia coli and functionally compared with the wild-type enzyme isolated in a final yield of 1.4 U·ml⁻¹ of packed insect cells. In redox titrations, the substrate A. gambiae thioredoxin-1 (Km = 8.5 µM, kcat = 15.4 s⁻¹ at pH 7.4 and 25 °C) was unable to oxidize NADPH-reduced A. gambiae thioredoxin reductase-1 to the fully oxidized state. This indicates that, in contrast to other disulfide reductases, A. gambiae thioredoxin reductase-1 oscillates during catalysis between the four-electron reduced state and a two-electron reduced state. The thioredoxin reductases of the malaria system were compared. A. gambiae thioredoxin reductase-1 shares 52% and 45% sequence identity with its orthologues from humans and P. falciparum, respectively. A major difference among the three enzymes is the structure of the C-terminal redox centre, reflected in the varying resistance of catalytic intermediates to autoxidation. The relevant sequences of this centre are Thr-Cys-Cys-SerOH in A. gambiae thioredoxin reductase, Gly-Cys-selenocysteine-GlyOH in human thioredoxin reductase, and Cys-X-X-X-X-Cys-GlyOH in the P. falciparum enzyme. These differences offer an interesting approach to the design of species-specific inhibitors. Notably, A. gambiae thioredoxin reductase-1 is not a selenoenzyme but instead contains a highly unusual redox-active Cys-Cys sequence.
Affiliation(s)
- Holger Bauer
- Biochemie Zentrum, Universität Heidelberg, Heidelberg, Germany
|
885
|
Abstract
Drosophila's importance as a model organism made it an obvious choice to be among the first genomes sequenced, and the Release 1 sequence of the euchromatic portion of the genome was published in March 2000. This accomplishment demonstrated that a whole genome shotgun (WGS) strategy could produce a reliable metazoan genome sequence. Despite the attention to sequencing methods, the nucleotide sequence is just the starting point for genome-wide analyses; at a minimum, the genome sequence must be interpreted using expressed sequence tag (EST) and complementary DNA (cDNA) evidence and computational tools to identify genes and predict the structures of their RNA and protein products. The functions of these products and the manner in which their expression and activities are controlled must then be assessed, a much more challenging task with no clear endpoint that requires a wide variety of experimental and computational methods. We first review the current state of the Drosophila melanogaster genome sequence and its structural annotation and then briefly summarize some promising approaches that are being taken to achieve an initial functional annotation.
Affiliation(s)
- Susan E Celniker
- Berkeley Drosophila Genome Project, Department of Genome Sciences, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA.
|
886
|
Abstract
The sequencing of eukaryotic genomes has lagged behind sequencing of organisms in the other domains of life, archaea and bacteria, primarily due to their greater size and complexity. With recent advances in high-throughput technologies such as robotics and improved computational resources, the number of eukaryotic genome sequencing projects has increased significantly. Among these are a number of sequencing projects of tropical pathogens of medical and veterinary importance, many of which are responsible for causing widespread morbidity and mortality in peoples of developing countries. Uncovering the complete gene complement of these organisms is proving to be of immense value in the development of novel methods of parasite control, such as antiparasitic drugs and vaccines, as well as the development of new diagnostic tools. Combining pathogen genome sequences with the host and vector genome sequences is promising to be a robust method for the identification of host-pathogen interactions. Finally, comparative sequencing of related species, especially of organisms used as model systems in the study of the disease, is beginning to realize its potential in the identification of genes, and the evolutionary forces that shape the genes, that are involved in evasion of the host immune response.
Affiliation(s)
- Jane M Carlton
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
|
887
|
Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, Chen N, Chinwalla A, Clarke L, Clee C, Coghlan A, Coulson A, D'Eustachio P, Fitch DHA, Fulton LA, Fulton RE, Griffiths-Jones S, Harris TW, Hillier LW, Kamath R, Kuwabara PE, Mardis ER, Marra MA, Miner TL, Minx P, Mullikin JC, Plumb RW, Rogers J, Schein JE, Sohrmann M, Spieth J, Stajich JE, Wei C, Willey D, Wilson RK, Durbin R, Waterston RH. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol 2003; 1:E45. [PMID: 14624247 PMCID: PMC261899 DOI: 10.1371/journal.pbio.0000045] [Citation(s) in RCA: 666] [Impact Index Per Article: 30.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2003] [Accepted: 09/04/2003] [Indexed: 11/19/2022] Open
Abstract
The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found strong evidence for 1,300 new C. elegans genes. In addition, comparisons of the two genomes will help to understand the evolutionary forces that mold nematode genomes. With the Caenorhabditis briggsae genome now in hand, C. elegans biologists have a powerful new research tool to refine their knowledge of gene function in C. elegans and to study the path of genome evolution.
MESH Headings
- Animals
- Biological Evolution
- Caenorhabditis/genetics
- Caenorhabditis elegans/genetics
- Chromosome Mapping
- Chromosomes, Artificial, Bacterial
- Cluster Analysis
- Codon
- Conserved Sequence
- Evolution, Molecular
- Exons
- Gene Library
- Genome
- Genomics/methods
- Interspersed Repetitive Sequences
- Introns
- MicroRNAs/genetics
- Models, Genetic
- Models, Statistical
- Molecular Sequence Data
- Multigene Family
- Open Reading Frames
- Physical Chromosome Mapping
- Plasmids/metabolism
- Protein Structure, Tertiary
- Proteins/chemistry
- RNA/chemistry
- RNA, Ribosomal/genetics
- RNA, Spliced Leader
- RNA, Transfer/genetics
- Sequence Analysis, DNA
- Species Specificity
Affiliation(s)
- Lincoln D Stein
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA.
|
888
|
Abstract
We propose an assembly algorithm, Barnacle, for sequences generated by the clone-based approach. We illustrate our approach by assembling the human genome. Our novel method abandons the original physical-mapping-first framework. As we show, Barnacle more effectively resolves conflicts due to repeated sequences, which are the main difficulty of the sequence assembly problem. In addition, we are able to detect inconsistencies in the underlying data. We present our results on the December 2001 freeze of the public working draft of the human genome and compare them with NCBI's assembly (Build 28). The Barnacle assembly of the December 2001 freeze of the public working draft and the source code of Barnacle are available at http://www.cs.rutgers.edu/~vchoi.
Affiliation(s)
- Vicky Choi
- Department of Computer Science, Rutgers University, Piscataway, NJ 08854, USA.
|
889
|
Waters E, Hohn MJ, Ahel I, Graham DE, Adams MD, Barnstead M, Beeson KY, Bibbs L, Bolanos R, Keller M, Kretz K, Lin X, Mathur E, Ni J, Podar M, Richardson T, Sutton GG, Simon M, Soll D, Stetter KO, Short JM, Noordewier M. The genome of Nanoarchaeum equitans: insights into early archaeal evolution and derived parasitism. Proc Natl Acad Sci U S A 2003; 100:12984-8. [PMID: 14566062 PMCID: PMC240731 DOI: 10.1073/pnas.1735403100] [Citation(s) in RCA: 349] [Impact Index Per Article: 15.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
The hyperthermophile Nanoarchaeum equitans is an obligate symbiont growing in coculture with the crenarchaeon Ignicoccus. Ribosomal protein and rRNA-based phylogenies place its branching point early in the archaeal lineage, representing the new archaeal kingdom Nanoarchaeota. The N. equitans genome (490,885 base pairs) encodes the machinery for information processing and repair, but lacks genes for lipid, cofactor, amino acid, or nucleotide biosyntheses. It is the smallest microbial genome sequenced to date, and also one of the most compact, with 95% of the DNA predicted to encode proteins or stable RNAs. Its limited biosynthetic and catabolic capacity indicates that N. equitans' symbiotic relationship to Ignicoccus is parasitic, making it the only known archaeal parasite. Unlike the small genomes of bacterial parasites that are undergoing reductive evolution, N. equitans has few pseudogenes or extensive regions of noncoding DNA. This organism represents a basal archaeal lineage and has a highly reduced genome.
Affiliation(s)
- Elizabeth Waters
- Diversa Corporation, 4955 Directors Place, San Diego, CA 92121, USA
|
890
|
Stuart JM, Segal E, Koller D, Kim SK. A gene-coexpression network for global discovery of conserved genetic modules. Science 2003; 302:249-55. [PMID: 12934013 DOI: 10.1126/science.1087447] [Citation(s) in RCA: 1447] [Impact Index Per Article: 65.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
To elucidate gene function on a global scale, we identified pairs of genes that are coexpressed over 3182 DNA microarrays from humans, flies, worms, and yeast. We found 22,163 such coexpression relationships, each of which has been conserved across evolution. This conservation implies that the coexpression of these gene pairs confers a selective advantage and therefore that these genes are functionally related. Many of these relationships provide strong evidence for the involvement of new genes in core biological functions such as the cell cycle, secretion, and protein expression. We experimentally confirmed the predictions implied by some of these links and identified cell proliferation functions for several genes. By assembling these links into a gene-coexpression network, we found several components that were animal-specific as well as interrelationships between newly evolved and ancient modules.
Affiliation(s)
- Joshua M Stuart
- Stanford Medical Informatics, 251 Campus Drive, Medical School Office Building X-215, Stanford, CA 94305-5329, USA
|
891
|
Kirkness EF, Bafna V, Halpern AL, Levy S, Remington K, Rusch DB, Delcher AL, Pop M, Wang W, Fraser CM, Venter JC. The Dog Genome: Survey Sequencing and Comparative Analysis. Science 2003; 301:1898-903. [PMID: 14512627 DOI: 10.1126/science.1086432] [Citation(s) in RCA: 349] [Impact Index Per Article: 15.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
A survey of the dog genome sequence (6.22 million sequence reads; 1.5x coverage) demonstrates the power of sample sequencing for comparative analysis of mammalian genomes and the generation of species-specific resources. More than 650 million base pairs (>25%) of dog sequence align uniquely to the human genome, including fragments of putative orthologs for 18,473 of 24,567 annotated human genes. Mutation rates, conserved synteny, repeat content, and phylogeny can be compared among human, mouse, and dog. A variety of polymorphic elements are identified that will be valuable for mapping the genetic basis of diseases and traits in the dog.
Affiliation(s)
- Ewen F Kirkness
- The Institute for Genomic Research, Rockville, MD 20850, USA
|
892
|
Abstract
We describe a whole-genome assembly program named PCAP for processing tens of millions of reads. The PCAP program has several features to address efficiency and accuracy issues in assembly. Multiple processors are used to perform most time-consuming computations in assembly. A more sensitive method is used to avoid missing overlaps caused by sequencing errors. Repetitive regions of reads are detected on the basis of many overlaps with other reads, instead of many shorter word matches with other reads. Contaminated end regions of reads are identified and removed. Generation of a consensus sequence for a contig is based on an alignment of reads in the contig, in which both base quality values and coverage information are used to determine every consensus base. The PCAP program was tested on a mouse whole-genome data set of 30 million reads and a human Chromosome 20 data set of 1.7 million reads. The program is freely available for academic use.
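The consensus step described in this abstract can be illustrated with a minimal sketch. It is not PCAP's implementation: the abstract does not state the scoring rule, so the rule used here (sum the quality values of the reads supporting each base in an alignment column and pick the base with the largest sum) is an assumption, as are the names `consensus_base` and `consensus` and the toy alignment columns.

```python
from collections import defaultdict


def consensus_base(column):
    """Choose a consensus base for one alignment column.

    `column` is a list of (base, quality) pairs, one per read that covers
    the column. Each base accumulates the qualities of its supporting
    reads, so both base quality and coverage influence the call.
    This scoring rule is an illustrative assumption, not PCAP's rule.
    """
    if not column:
        return "N"  # no read covers this column
    weight = defaultdict(int)
    for base, quality in column:
        weight[base] += quality
    # Iterate over sorted bases so ties resolve alphabetically.
    return max(sorted(weight), key=lambda b: weight[b])


def consensus(columns):
    """Concatenate the per-column calls into a consensus string."""
    return "".join(consensus_base(col) for col in columns)


# Hypothetical toy alignment: four columns of (base, quality) pairs.
columns = [
    [("A", 40), ("A", 35), ("C", 10)],  # two good A reads outweigh one weak C
    [("G", 12), ("T", 30), ("G", 15)],  # one strong T outweighs two weak G reads
    [("C", 20)],                        # single-read coverage
    [],                                 # uncovered column
]
print(consensus(columns))
```

The second column shows why quality and coverage are combined: a simple majority vote would call G, whereas the quality-weighted sum favors the single high-quality T.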
Affiliation(s)
- Xiaoqiu Huang
- Department of Computer Science, Iowa State University, Ames, Iowa 50011-1040, USA.
|
893
|
Owen AB, Stuart J, Mach K, Villeneuve AM, Kim S. A gene recommender algorithm to identify coexpressed genes in C. elegans. Genome Res 2003; 13:1828-37. [PMID: 12902378 PMCID: PMC403774 DOI: 10.1101/gr.1125403] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
One of the most important uses of whole-genome expression data is for the discovery of new genes with similar function to a given list of genes (the query) already known to have closely related function. We have developed an algorithm, called the gene recommender, that ranks genes according to how strongly they correlate with a set of query genes in those experiments for which the query genes are most strongly coregulated. We used the gene recommender to find other genes coexpressed with several sets of query genes, including genes known to function in the retinoblastoma complex. Genetic experiments confirmed that one gene (JC8.6) identified by the gene recommender acts with lin-35 Rb to regulate vulval cell fates, and that another gene (wrm-1) acts antagonistically. We find that the gene recommender returns lists of genes with better precision, for fixed levels of recall, than lists generated using the C. elegans expression topomap.
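The ranking idea in this abstract can be sketched in a few lines. This is a simplified illustration, not the published gene recommender: the abstract does not give the scoring formulas, so the experiment score used here (absolute mean of the query genes divided by their spread), the use of Pearson correlation against the mean query profile, the function names, and the toy expression values are all assumptions.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def recommend(expr, query, n_experiments):
    """Rank non-query genes by correlation with the query genes.

    expr: dict mapping gene name -> list of expression values, one per experiment.
    query: list of gene names assumed to be functionally related.
    n_experiments: how many of the most query-coherent experiments to keep.
    """
    n_total = len(next(iter(expr.values())))

    # Step 1 (assumed score): an experiment is informative when the query
    # genes move together, i.e. large |mean| relative to their spread.
    def coherence(j):
        values = [expr[g][j] for g in query]
        return abs(mean(values)) / (pstdev(values) + 1e-9)

    chosen = sorted(range(n_total), key=coherence, reverse=True)[:n_experiments]

    # Step 2: average query profile over the chosen experiments.
    profile = [mean(expr[g][j] for g in query) for j in chosen]

    # Step 3: score every other gene against that profile and rank.
    scores = {
        gene: pearson([values[j] for j in chosen], profile)
        for gene, values in expr.items()
        if gene not in query
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 3 query genes, 3 candidates, 4 experiments.
expr = {
    "q1": [2.0, -1.9, 0.1, 1.8],
    "q2": [2.1, -2.0, -0.9, 2.0],
    "q3": [1.9, -2.1, 1.0, 2.2],
    "geneA": [1.5, -1.4, 3.0, 1.6],   # tracks the query where the query is coherent
    "geneB": [-1.0, 1.2, 0.0, -1.1],  # moves opposite to the query
    "geneC": [0.1, 0.2, -2.5, 0.0],   # varies mainly in the incoherent experiment
}
ranking = recommend(expr, ["q1", "q2", "q3"], n_experiments=3)
print(ranking)
```

In the toy data the third experiment is dropped because the query genes disagree there, so `geneA` is ranked on the three experiments where the query genes are coregulated rather than on its large value in the discarded one.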
Affiliation(s)
- Art B Owen
- Department of Statistics, Stanford University, Stanford, California 94305, USA.
|
894
|
Lerat E, Rizzon C, Biémont C. Sequence divergence within transposable element families in the Drosophila melanogaster genome. Genome Res 2003; 13:1889-96. [PMID: 12869581 PMCID: PMC403780 DOI: 10.1101/gr.827603] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
The availability of the sequenced Drosophila melanogaster genome provides an opportunity to study sequence variation between copies within transposable element families. In this study, we analyzed the 624 copies of 22 transposable element (TE) families (14 LTR retrotransposons, five non-LTR retrotransposons, and three transposons). LTR and non-LTR retrotransposons possessed far fewer divergent elements than the transposons, suggesting that the difference depends on the transposition mechanism. However, there was not a continuous range of divergence of the copies in each class, which were either very similar to the canonical elements, or very divergent from them. This sequence homogeneity among TE family copies matches the theoretical models of the dynamics of these repeated sequences. The sequenced Drosophila genome thus appears to be composed of a mixture of TEs that are still active and of ancient relics that have degenerated and the distribution of which along the chromosomes results from natural selection. This clearly demonstrates that the TEs are highly active within the genome, suggesting that the genetic variability of the Drosophila genome is still being renewed by the action of TEs.
Affiliation(s)
- Emmanuelle Lerat
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, 69622 Villeurbanne cedex, France
|
895
|
Abstract
In this review, we describe the pathway for generating meiotic crossovers in Drosophila melanogaster females and how these events ensure the segregation of homologous chromosomes. As appears to be common to meiosis in most organisms, recombination is initiated with a double-strand break (DSB). The interesting differences between organisms appear to be associated with what chromosomal events are required for DSBs to form. In Drosophila females, the synaptonemal complex is required for most DSB formation. The repair of these breaks requires several DSB repair genes, some of which are meiosis-specific, and defects at this stage can have effects downstream on oocyte development. This has been suggested to result from a checkpoint-like signaling between the oocyte nucleus and gene products regulating oogenesis. Crossovers result from genetically controlled modifications to the DSB repair pathway. Finally, segregation of chromosomes joined by a chiasma requires a bipolar spindle. At least two kinesin motor proteins are required for the assembly of this bipolar spindle, and while the meiotic spindle lacks traditional centrosomes, some centrosome components are found at the spindle poles.
Affiliation(s)
- Kim S McKim
- Waksman Institute and Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854-8020, USA.
|
896
|
Guarnieri DJ, Heberlein U. Drosophila melanogaster, a genetic model system for alcohol research. Int Rev Neurobiol 2003; 54:199-228. [PMID: 12785288 DOI: 10.1016/s0074-7742(03)54006-5] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
In its natural environment, which consists of fermenting plant materials, the fruit fly Drosophila melanogaster encounters high levels of ethanol. Flies are well equipped to deal with the toxic effects of ethanol; they use it as an energy source and for lipid biosynthesis. The primary ethanol-metabolizing pathway in flies involves the enzymes alcohol dehydrogenase (ADH) and acetaldehyde dehydrogenase (ALDH); their role in adaptation to ethanol-rich environments has been studied extensively. The similarity between Drosophila and mammals is not restricted to the manner in which they metabolize ethanol; behaviors elicited by ethanol exposure are also remarkably similar in these organisms. Flies show signs of acute intoxication, which range from locomotor stimulation at low doses to complete sedation at higher doses, they develop tolerance upon intermittent ethanol exposure, and they appear to like ethanol, showing preference for ethanol-containing media. Molecular genetic analysis of ethanol-induced behaviors in Drosophila, while still in its early stages, has already revealed some surprising parallels with mammals. The availability of powerful tools for genetic manipulation in Drosophila, together with the high degree of conservation at the genomic level, make Drosophila a promising model organism to study the mechanism by which ethanol regulates behavior and the mechanisms underlying the organism's adaptation to long-term ethanol exposure.
Affiliation(s)
- Douglas J Guarnieri
- Department of Anatomy, Program in Neuroscience, University of California at San Francisco, San Francisco, CA 94143-0452, USA
|
897
|
|
898
|
Abstract
With the successful completion of the project to sequence the Plasmodium falciparum genome, researchers are now turning their attention to other malaria parasite species. Here, an update on the Plasmodium vivax genome sequencing project is presented, as part of the Trends in Parasitology series of reviews expanding on various aspects of P. vivax research.
Affiliation(s)
- Jane Carlton
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
|
899
|
Abstract
Drosophila discs-large (dlg) mutants exhibit multiple developmental abnormalities, including severe defects in neuronal differentiation and synaptic structure and function. These defects have been ascribed to the loss of a single gene product, Dlg-A, a scaffold protein thought to be expressed in many cell types. Here, we describe that additional isoforms arise as a consequence of different transcription start points and alternative splicing of dlg. At least five different dlg gene products are predicted. We identified a subset of dlg-derived cDNAs that include novel exons encoding a peptide homologous to the N terminus of the mammalian protein SAP97/hDLG (S97N). Dlg isoforms containing the S97N domain are expressed at larval neuromuscular junctions and within the CNS of both embryos and larvae but are not detectable in epithelial tissues. Strong hypomorphic dlg alleles exhibit decreased expression of S97N, which may account for neural-specific aspects of the pleiomorphic dlg mutant phenotype. Selective inhibition of the expression of S97N-containing proteins in embryos by double-strand RNA leads to severe defects in neuronal differentiation and axon guidance, without overt perturbations in epithelia. These results indicate that the differential expression of dlg products correlates with distinct functions in non-neural and neural cells. During embryonic development, proteins that include the S97N domain are essential for proper neuronal differentiation and organization, acting through mechanisms that may include the adequate localization of cell fate determinants.
|
900
|
Hong YS, Hogan JR, Wang X, Sarkar A, Sim C, Loftus BJ, Ren C, Huff ER, Carlile JL, Black K, Zhang HB, Gardner MJ, Collins FH. Construction of a BAC library and generation of BAC end sequence-tagged connectors for genome sequencing of the African malaria mosquito Anopheles gambiae. Mol Genet Genomics 2003; 268:720-8. [PMID: 12655398 DOI: 10.1007/s00438-003-0813-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2002] [Accepted: 01/06/2003] [Indexed: 11/28/2022]
Abstract
A Bacterial Artificial Chromosome (BAC) genomic DNA library of Anopheles gambiae, the major human malaria vector in sub-Saharan Africa, was constructed and characterized. This library (ND-TAM) is composed of 30,720 BAC clones in eighty 384-well plates. The estimated average insert size of the library is 133 kb, with an overall genome coverage of approximately 14-fold. The ends of approximately two-thirds of the clones in the library were sequenced, yielding 32,340 pair-mate ends. A statistical analysis (G-test) of the results of PCR screening of the library indicated a random distribution of BACs in the genome, although one gap encompassing the white locus on the X-chromosome was identified. Furthermore, combined with another previously constructed BAC library (ND-1), ~2,000 BACs have been physically mapped by polytene chromosomal in situ hybridization. These BAC end pair mates and physically mapped BACs have been useful for both the assembly of a fully sequenced A. gambiae genome and for linking the assembled sequence to the three polytene chromosomes. This ND-TAM library is now publicly available at both http://www.malaria.mr4.org/mr4pages/index.html/ and http://hbz.tamu.edu/, providing a valuable resource to the mosquito research community.
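The library figures quoted in this abstract can be cross-checked with a short calculation. The sketch below uses only numbers stated in the abstract (80 plates, 384 wells, 133 kb mean insert, about 14-fold coverage); the genome size it prints is derived from them and is not a figure reported by the authors. It assumes that every well holds one clone and that coverage equals total insert length divided by genome size. The constant names are illustrative.

```python
# Figures taken from the abstract.
PLATES = 80
WELLS_PER_PLATE = 384
MEAN_INSERT_KB = 133
REPORTED_COVERAGE = 14  # "approximately 14-fold"

# Number of clones, assuming one clone per well and no empty wells.
clones = PLATES * WELLS_PER_PLATE

# Total cloned DNA in the library, in kilobases.
total_insert_kb = clones * MEAN_INSERT_KB

# Genome size implied by coverage = total insert length / genome size.
implied_genome_mb = total_insert_kb / REPORTED_COVERAGE / 1000

print(clones)                       # 30720, matching the stated clone count
print(total_insert_kb)              # 4085760
print(round(implied_genome_mb, 1))  # 291.8
```

The clone count reproduces the 30,720 clones given in the abstract, and the stated insert size and coverage are mutually consistent with a genome of roughly 290 Mb under these assumptions.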
Affiliation(s)
- Y S Hong
- Center for Tropical Disease Research and Training, Department of Biological Sciences, University of Notre Dame, IN 46556, USA
|