101
|
Heidelberg JF, Paulsen IT, Nelson KE, Gaidos EJ, Nelson WC, Read TD, Eisen JA, Seshadri R, Ward N, Methe B, Clayton RA, Meyer T, Tsapin A, Scott J, Beanan M, Brinkac L, Daugherty S, DeBoy RT, Dodson RJ, Durkin AS, Haft DH, Kolonay JF, Madupu R, Peterson JD, Umayam LA, White O, Wolf AM, Vamathevan J, Weidman J, Impraim M, Lee K, Berry K, Lee C, Mueller J, Khouri H, Gill J, Utterback TR, McDonald LA, Feldblyum TV, Smith HO, Venter JC, Nealson KH, Fraser CM. Genome sequence of the dissimilatory metal ion-reducing bacterium Shewanella oneidensis. Nat Biotechnol 2002; 20:1118-23. [PMID: 12368813 DOI: 10.1038/nbt749] [Citation(s) in RCA: 588] [Impact Index Per Article: 26.7] [Received: 05/02/2002] [Accepted: 09/05/2002] [Indexed: 11/08/2022]
Abstract
Shewanella oneidensis is an important model organism for bioremediation studies because of its diverse respiratory capabilities, conferred in part by multicomponent, branched electron transport systems. Here we report the sequencing of the S. oneidensis genome, which consists of a 4,969,803-base pair circular chromosome with 4,758 predicted protein-encoding open reading frames (CDS) and a 161,613-base pair plasmid with 173 CDSs. We identified the first Shewanella lambda-like phage, providing a potential tool for further genome engineering. Genome analysis revealed 39 c-type cytochromes, including 32 previously unidentified in S. oneidensis, and a novel periplasmic [Fe] hydrogenase, which are integral members of the electron transport system. This genome sequence represents a critical step in the elucidation of the pathways for reduction (and bioremediation) of pollutants such as uranium (U) and chromium (Cr), and offers a starting point for defining this organism's complex electron transport systems and metal ion-reducing capabilities.
|
102
|
Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JMC, Wides R, Salzberg SL, Loftus B, Yandell M, Majoros WH, Rusch DB, Lai Z, Kraft CL, Abril JF, Anthouard V, Arensburger P, Atkinson PW, Baden H, de Berardinis V, Baldwin D, Benes V, Biedler J, Blass C, Bolanos R, Boscus D, Barnstead M, Cai S, Center A, Chaturverdi K, Christophides GK, Chrystal MA, Clamp M, Cravchik A, Curwen V, Dana A, Delcher A, Dew I, Evans CA, Flanigan M, Grundschober-Freimoser A, Friedli L, Gu Z, Guan P, Guigo R, Hillenmeyer ME, Hladun SL, Hogan JR, Hong YS, Hoover J, Jaillon O, Ke Z, Kodira C, Kokoza E, Koutsos A, Letunic I, Levitsky A, Liang Y, Lin JJ, Lobo NF, Lopez JR, Malek JA, McIntosh TC, Meister S, Miller J, Mobarry C, Mongin E, Murphy SD, O'Brochta DA, Pfannkoch C, Qi R, Regier MA, Remington K, Shao H, Sharakhova MV, Sitter CD, Shetty J, Smith TJ, Strong R, Sun J, Thomasova D, Ton LQ, Topalis P, Tu Z, Unger MF, Walenz B, Wang A, Wang J, Wang M, Wang X, Woodford KJ, Wortman JR, Wu M, Yao A, Zdobnov EM, Zhang H, Zhao Q, Zhao S, Zhu SC, Zhimulev I, Coluzzi M, della Torre A, Roth CW, Louis C, Kalush F, Mural RJ, Myers EW, Adams MD, Smith HO, Broder S, Gardner MJ, Fraser CM, Birney E, Bork P, Brey PT, Venter JC, Weissenbach J, Kafatos FC, Collins FH, Hoffman SL. The genome sequence of the malaria mosquito Anopheles gambiae. Science 2002; 298:129-49. [PMID: 12364791 DOI: 10.1126/science.1076181] [Citation(s) in RCA: 1399] [Impact Index Per Article: 63.6] [Indexed: 12/19/2022]
Abstract
Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.
|
103
|
Carlton JM, Angiuoli SV, Suh BB, Kooij TW, Pertea M, Silva JC, Ermolaeva MD, Allen JE, Selengut JD, Koo HL, Peterson JD, Pop M, Kosack DS, Shumway MF, Bidwell SL, Shallom SJ, van Aken SE, Riedmuller SB, Feldblyum TV, Cho JK, Quackenbush J, Sedegah M, Shoaibi A, Cummings LM, Florens L, Yates JR, Raine JD, Sinden RE, Harris MA, Cunningham DA, Preiser PR, Bergman LW, Vaidya AB, van Lin LH, Janse CJ, Waters AP, Smith HO, White OR, Salzberg SL, Venter JC, Fraser CM, Hoffman SL, Gardner MJ, Carucci DJ. Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature 2002; 419:512-9. [PMID: 12368865 DOI: 10.1038/nature01099] [Citation(s) in RCA: 532] [Impact Index Per Article: 24.2] [Received: 07/31/2002] [Accepted: 08/30/2002] [Indexed: 12/18/2022]
Abstract
Species of malaria parasite that infect rodents have long been used as models for malaria disease research. Here we report the whole-genome shotgun sequence of one species, Plasmodium yoelii yoelii, and comparative studies with the genome of the human malaria parasite Plasmodium falciparum clone 3D7. A synteny map of 2,212 P. y. yoelii contiguous DNA sequences (contigs) aligned to 14 P. falciparum chromosomes reveals marked conservation of gene synteny within the body of each chromosome. Of about 5,300 P. falciparum genes, more than 3,300 P. y. yoelii orthologues of predominantly metabolic function were identified. Over 800 copies of a variant antigen gene located in subtelomeric regions were found. This is the first genome sequence of a model eukaryotic parasite, and it provides insight into the use of such systems in the modelling of Plasmodium biology and disease.
|
104
|
Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DMA, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 2002; 419:498-511. [PMID: 12368864 PMCID: PMC3836256 DOI: 10.1038/nature01097] [Citation(s) in RCA: 3062] [Impact Index Per Article: 139.2] [Received: 07/31/2002] [Accepted: 09/02/2002] [Indexed: 11/08/2022]
Abstract
The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.
|
105
|
Gardner MJ, Shallom SJ, Carlton JM, Salzberg SL, Nene V, Shoaibi A, Ciecko A, Lynn J, Rizzo M, Weaver B, Jarrahi B, Brenner M, Parvizi B, Tallon L, Moazzez A, Granger D, Fujii C, Hansen C, Pederson J, Feldblyum T, Peterson J, Suh B, Angiuoli S, Pertea M, Allen J, Selengut J, White O, Cummings LM, Smith HO, Adams MD, Venter JC, Carucci DJ, Hoffman SL, Fraser CM. Sequence of Plasmodium falciparum chromosomes 2, 10, 11 and 14. Nature 2002; 419:531-4. [PMID: 12368868 DOI: 10.1038/nature01094] [Citation(s) in RCA: 139] [Impact Index Per Article: 6.3] [Received: 08/06/2002] [Accepted: 09/02/2002] [Indexed: 11/09/2022]
Abstract
The mosquito-borne malaria parasite Plasmodium falciparum kills an estimated 0.7-2.7 million people every year, primarily children in sub-Saharan Africa. Without effective interventions, a variety of factors, including the spread of parasites resistant to antimalarial drugs and the increasing insecticide resistance of mosquitoes, may cause the number of malaria cases to double over the next two decades. To stimulate basic research and facilitate the development of new drugs and vaccines, the genome of Plasmodium falciparum clone 3D7 has been sequenced using a chromosome-by-chromosome shotgun strategy. We report here the nucleotide sequences of chromosomes 10, 11 and 14, and a re-analysis of the chromosome 2 sequence. These chromosomes represent about 35% of the 23-megabase P. falciparum genome.
|
106
|
Fleischmann RD, Alland D, Eisen JA, Carpenter L, White O, Peterson J, DeBoy R, Dodson R, Gwinn M, Haft D, Hickey E, Kolonay JF, Nelson WC, Umayam LA, Ermolaeva M, Salzberg SL, Delcher A, Utterback T, Weidman J, Khouri H, Gill J, Mikula A, Bishai W, Jacobs WR, Venter JC, Fraser CM. Whole-genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol 2002; 184:5479-90. [PMID: 12218036 PMCID: PMC135346 DOI: 10.1128/jb.184.19.5479-5490.2002] [Citation(s) in RCA: 492] [Impact Index Per Article: 22.4] [Indexed: 11/20/2022]
Abstract
Virulence and immunity are poorly understood in Mycobacterium tuberculosis. We sequenced the complete genome of the M. tuberculosis clinical strain CDC1551 and performed a whole-genome comparison with the laboratory strain H37Rv in order to identify polymorphic sequences with potential relevance to disease pathogenesis, immunity, and evolution. We found large-sequence and single-nucleotide polymorphisms in numerous genes. Polymorphic loci included a phospholipase C, a membrane lipoprotein, members of an adenylate cyclase gene family, and members of the PE/PPE gene family, some of which have been implicated in virulence or the host immune response. Several gene families, including the PE/PPE gene family, also had significantly higher synonymous and nonsynonymous substitution frequencies compared to the genome as a whole. We tested a large sample of M. tuberculosis clinical isolates for a subset of the large-sequence and single-nucleotide polymorphisms and found widespread genetic variability at many of these loci. We performed phylogenetic and epidemiological analysis to investigate the evolutionary relationships among isolates and the origins of specific polymorphic loci. A number of these polymorphisms appear to have occurred multiple times as independent events, suggesting that these changes may be under selective pressure. Together, these results demonstrate that polymorphisms among M. tuberculosis strains are more extensive than initially anticipated, and genetic variation may have an important role in disease pathogenesis and immunity.
|
107
|
Woodage T, Venter JC, Broder S. Application of the human genome to obstetrics and gynecology. Clin Obstet Gynecol 2002; 45:711-29; discussion 730-2. [PMID: 12370610 DOI: 10.1097/00003081-200209000-00014] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
|
108
|
Eisen JA, Nelson KE, Paulsen IT, Heidelberg JF, Wu M, Dodson RJ, Deboy R, Gwinn ML, Nelson WC, Haft DH, Hickey EK, Peterson JD, Durkin AS, Kolonay JL, Yang F, Holt I, Umayam LA, Mason T, Brenner M, Shea TP, Parksey D, Nierman WC, Feldblyum TV, Hansen CL, Craven MB, Radune D, Vamathevan J, Khouri H, White O, Gruber TM, Ketchum KA, Venter JC, Tettelin H, Bryant DA, Fraser CM. The complete genome sequence of Chlorobium tepidum TLS, a photosynthetic, anaerobic, green-sulfur bacterium. Proc Natl Acad Sci U S A 2002; 99:9509-14. [PMID: 12093901 PMCID: PMC123171 DOI: 10.1073/pnas.132181499] [Citation(s) in RCA: 263] [Impact Index Per Article: 12.0] [Received: 01/08/2002] [Accepted: 03/28/2002] [Indexed: 11/18/2022]
Abstract
The complete genome of the green-sulfur eubacterium Chlorobium tepidum TLS was determined to be a single circular chromosome of 2,154,946 bp. This represents the first genome sequence from the phylum Chlorobia, whose members perform anoxygenic photosynthesis by the reductive tricarboxylic acid cycle. Genome comparisons have identified genes in C. tepidum that are highly conserved among photosynthetic species. Many of these have no assigned function and may play novel roles in photosynthesis or photobiology. Phylogenomic analysis reveals likely duplications of genes involved in biosynthetic pathways for photosynthesis and the metabolism of sulfur and nitrogen as well as strong similarities between metabolic processes in C. tepidum and many Archaeal species.
|
109
|
Mural RJ, Adams MD, Myers EW, Smith HO, Miklos GLG, Wides R, Halpern A, Li PW, Sutton GG, Nadeau J, Salzberg SL, Holt RA, Kodira CD, Lu F, Chen L, Deng Z, Evangelista CC, Gan W, Heiman TJ, Li J, Li Z, Merkulov GV, Milshina NV, Naik AK, Qi R, Shue BC, Wang A, Wang J, Wang X, Yan X, Ye J, Yooseph S, Zhao Q, Zheng L, Zhu SC, Biddick K, Bolanos R, Delcher AL, Dew IM, Fasulo D, Flanigan MJ, Huson DH, Kravitz SA, Miller JR, Mobarry CM, Reinert K, Remington KA, Zhang Q, Zheng XH, Nusskern DR, Lai Z, Lei Y, Zhong W, Yao A, Guan P, Ji RR, Gu Z, Wang ZY, Zhong F, Xiao C, Chiang CC, Yandell M, Wortman JR, Amanatides PG, Hladun SL, Pratts EC, Johnson JE, Dodson KL, Woodford KJ, Evans CA, Gropman B, Rusch DB, Venter E, Wang M, Smith TJ, Houck JT, Tompkins DE, Haynes C, Jacob D, Chin SH, Allen DR, Dahlke CE, Sanders R, Li K, Liu X, Levitsky AA, Majoros WH, Chen Q, Xia AC, Lopez JR, Donnelly MT, Newman MH, Glodek A, Kraft CL, Nodell M, Ali F, An HJ, Baldwin-Pitts D, Beeson KY, Cai S, Carnes M, Carver A, Caulk PM, Center A, Chen YH, Cheng ML, Coyne MD, Crowder M, Danaher S, Davenport LB, Desilets R, Dietz SM, Doup L, Dullaghan P, Ferriera S, Fosler CR, Gire HC, Gluecksmann A, Gocayne JD, Gray J, Hart B, Haynes J, Hoover J, Howland T, Ibegwam C, Jalali M, Johns D, Kline L, Ma DS, MacCawley S, Magoon A, Mann F, May D, McIntosh TC, Mehta S, Moy L, Moy MC, Murphy BJ, Murphy SD, Nelson KA, Nuri Z, Parker KA, Prudhomme AC, Puri VN, Qureshi H, Raley JC, Reardon MS, Regier MA, Rogers YHC, Romblad DL, Schutz J, Scott JL, Scott R, Sitter CD, Smallwood M, Sprague AC, Stewart E, Strong RV, Suh E, Sylvester K, Thomas R, Tint NN, Tsonis C, Wang G, Wang G, Williams MS, Williams SM, Windsor SM, Wolfe K, Wu MM, Zaveri J, Chaturvedi K, Gabrielian AE, Ke Z, Sun J, Subramanian G, Venter JC, Pfannkoch CM, Barnstead M, Stephenson LD. A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science 2002; 296:1661-71. 
[PMID: 12040188 DOI: 10.1126/science.1069193] [Citation(s) in RCA: 300] [Impact Index Per Article: 13.6] [Indexed: 11/02/2022]
Abstract
The high degree of similarity between the mouse and human genomes is demonstrated through analysis of the sequence of mouse chromosome 16 (Mmu 16), which was obtained as part of a whole-genome shotgun assembly of the mouse genome. The mouse genome is about 10% smaller than the human genome, owing to a lower repetitive DNA content. Comparison of the structure and protein-coding potential of Mmu 16 with that of the homologous segments of the human genome identifies regions of conserved synteny with human chromosomes (Hsa) 3, 8, 12, 16, 21, and 22. Gene content and order are highly conserved between Mmu 16 and the syntenic blocks of the human genome. Of the 731 predicted genes on Mmu 16, 509 align with orthologs on the corresponding portions of the human genome, 44 are likely paralogous to these genes, and 164 genes have homologs elsewhere in the human genome; there are 14 genes for which we could find no human counterpart.
|
110
|
Myers EW, Sutton GG, Smith HO, Adams MD, Venter JC. On the sequencing and assembly of the human genome. Proc Natl Acad Sci U S A 2002; 99:4145-6. [PMID: 11904395 PMCID: PMC123615 DOI: 10.1073/pnas.092136699] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]
|
111
|
Hoffman SL, Subramanian GM, Collins FH, Venter JC. Plasmodium, human and Anopheles genomics and malaria. Nature 2002; 415:702-9. [PMID: 11832959 DOI: 10.1038/415702a] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.3] [Indexed: 11/09/2022]
Abstract
The Plasmodium spp. parasites that cause malaria are transmitted to humans by Anopheles spp. mosquitoes. Scientists have now amassed a great body of knowledge about the parasite, its mosquito vector and human host. Yet this year there will be 300-500 million new malaria infections and 1-3 million deaths caused by the disease. We believe that integrated analyses of genome sequence, DNA polymorphisms, and messenger RNA and protein expression profiles will lead to greater understanding of the molecular basis of vector-human and host-parasite interactions and provide strategies to build upon these insights to develop interventions to mitigate human morbidity and mortality from malaria.
|
112
|
Subramanian G, Mural R, Hoffman SL, Venter JC, Broder S. Microbial disease in humans: A genomic perspective. Mol Diagn 2001; 6:243-52. [PMID: 11774190 DOI: 10.1054/modi.2001.28062] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
The approach of whole-genome shotgun sequencing coupled with the availability of computational algorithms to facilitate the assembly, gene prediction, and functional annotation of entire genomes has sparked a revolution in our understanding of the biology of free-living organisms. More than 40 bacterial genomes have been sequenced to date, of which several are important human pathogens. The capacity to sequence and assemble entire genomes of bacteria, pathogenic protozoans, and fungi in a rapid and cost-effective way has energized every aspect of microbial science. Comparative genome analysis allows us to dissect the evolutionary forces at work and provides insights into adaptations of microbes to their unique ecological niches. Factors that shape host-pathogen interactions and their outcomes include genetic polymorphisms in the microbial pathogen and host, both of which can impact on microbial virulence or host immune responses to infection. The availability of the genome sequence of entire organisms, together with the use of high-throughput sequence-based genomic technologies to define microbial and host physiological states, provides the unparalleled opportunity to better define clinical outcomes in the field of infectious diseases. There is one overarching lesson: completion of the genomic sequence of any species answers many questions, while at the same time it invites totally new questions.
|
113
|
Subramanian G, Adams MD, Venter JC, Broder S. Implications of the human genome for understanding human biology and medicine. JAMA 2001; 286:2296-307. [PMID: 11710896 DOI: 10.1001/jama.286.18.2296] [Citation(s) in RCA: 101] [Impact Index Per Article: 4.4] [Indexed: 11/14/2022]
Abstract
Clinical researchers, practicing physicians, patients, and the general public now live in a world in which the 2.9 billion nucleotide codes of the human genome are available as a resource for scientific discovery. Some of the findings from the sequencing of the human genome were expected, confirming knowledge presaged by many decades of research in both human and comparative genetics. Other findings are unexpected in their scientific and philosophical implications. In either case, the availability of the human genome is likely to have significant implications, first for clinical research and then for the practice of medicine. This article provides our reflections on what the new genomic knowledge might mean for the future of medicine and how the new knowledge relates to what we knew in the era before the availability of the genome sequence. In addition, practicing physicians in many communities are traditionally also ambassadors of science, called on to translate arcane data or the complex ramifications of biology into a language understood by the public at large. This article also may be useful for physicians who serve in this capacity in their communities. We address the following issues: the number of protein-coding genes in the human genome and certain classes of noncoding repeat elements in the genome; features of genome evolution, including large-scale duplications; an overview of the predicted protein set to highlight prominent differences between the human genome and other sequenced eukaryotic genomes; and DNA variation in the human genome. In addition, we show how this information lays the foundations for ongoing and future endeavors that will revolutionize biomedical research and our understanding of human health.
|
114
|
Cravchik A, Subramanian G, Broder S, Venter JC. Sequence analysis of the human genome: implications for the understanding of nervous system function and disease. Arch Neurol 2001; 58:1772-8. [PMID: 11708983 DOI: 10.1001/archneur.58.11.1772] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
Abstract
The recent publication of the sequence of the human genome will accelerate the discovery of new genetic susceptibility factors for human disease, leading to the development of novel diagnostics and therapeutics. The exhaustive analysis of the human genome sequence will be the focus of the biomedical research community for many years to come. In particular, comparative analysis of the available eukaryotic genome sequences is an important approach to further our understanding of gene structure, function, and evolution. Our initial analysis of the human genome sequence has revealed many interesting features that are relevant to nervous system function, evolution, and disease. We analyzed the prominent features of predicted human proteins involved in neuronal function and prepared a comparative analysis of 146 human genes that have alleles (or mutations) conferring susceptibility for 168 neurologic diseases.
|
115
|
Tettelin H, Nelson KE, Paulsen IT, Eisen JA, Read TD, Peterson S, Heidelberg J, DeBoy RT, Haft DH, Dodson RJ, Durkin AS, Gwinn M, Kolonay JF, Nelson WC, Peterson JD, Umayam LA, White O, Salzberg SL, Lewis MR, Radune D, Holtzapple E, Khouri H, Wolf AM, Utterback TR, Hansen CL, McDonald LA, Feldblyum TV, Angiuoli S, Dickinson T, Hickey EK, Holt IE, Loftus BJ, Yang F, Smith HO, Venter JC, Dougherty BA, Morrison DA, Hollingshead SK, Fraser CM. Complete genome sequence of a virulent isolate of Streptococcus pneumoniae. Science 2001; 293:498-506. [PMID: 11463916 DOI: 10.1126/science.1061217] [Citation(s) in RCA: 1032] [Impact Index Per Article: 44.9] [Indexed: 11/02/2022]
Abstract
The 2,160,837-base pair genome sequence of an isolate of Streptococcus pneumoniae, a Gram-positive pathogen that causes pneumonia, bacteremia, meningitis, and otitis media, contains 2236 predicted coding regions; of these, 1440 (64%) were assigned a biological role. Approximately 5% of the genome is composed of insertion sequences that may contribute to genome rearrangements through uptake of foreign DNA. Extracellular enzyme systems for the metabolism of polysaccharides and hexosamines provide a substantial source of carbon and nitrogen for S. pneumoniae and also damage host tissues and facilitate colonization. A motif identified within the signal peptide of proteins is potentially involved in targeting these proteins to the cell surface of low-guanine/cytosine (GC) Gram-positive species. Several surface-exposed proteins that may serve as potential vaccine candidates were identified. Comparative genome hybridization with DNA arrays revealed strain differences in S. pneumoniae that could contribute to differences in virulence and antigenicity.
|
116
|
Nierman WC, Feldblyum TV, Laub MT, Paulsen IT, Nelson KE, Eisen JA, Heidelberg JF, Alley MR, Ohta N, Maddock JR, Potocka I, Nelson WC, Newton A, Stephens C, Phadke ND, Ely B, DeBoy RT, Dodson RJ, Durkin AS, Gwinn ML, Haft DH, Kolonay JF, Smit J, Craven MB, Khouri H, Shetty J, Berry K, Utterback T, Tran K, Wolf A, Vamathevan J, Ermolaeva M, White O, Salzberg SL, Venter JC, Shapiro L, Fraser CM, Eisen J. Complete genome sequence of Caulobacter crescentus. Proc Natl Acad Sci U S A 2001; 98:4136-41. [PMID: 11259647 PMCID: PMC31192 DOI: 10.1073/pnas.061029298] [Citation(s) in RCA: 388] [Impact Index Per Article: 16.9] [Indexed: 11/18/2022]
Abstract
The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic environment, coordinates the cell division cycle and multiple cell differentiation events. With the annotated genome sequence, a full description of the genetic network that controls bacterial differentiation, cell growth, and cell cycle progression is within reach. Two-component signal transduction proteins are known to play a significant role in cell cycle progression. Genome analysis revealed that the C. crescentus genome encodes a significantly higher number of these signaling proteins (105) than any bacterial genome sequenced thus far. Another regulatory mechanism involved in cell cycle progression is DNA methylation. The occurrence of the recognition sequence for an essential DNA methylating enzyme that is required for cell cycle regulation is severely limited and shows a bias to intergenic regions. The genome contains multiple clusters of genes encoding proteins essential for survival in a nutrient poor habitat. Included are those involved in chemotaxis, outer membrane channel function, degradation of aromatic ring compounds, and the breakdown of plant-derived carbon sources, in addition to many extracytoplasmic function sigma factors, providing the organism with the ability to respond to a wide range of environmental fluctuations. C. crescentus is, to our knowledge, the first free-living alpha-class proteobacterium to be sequenced and will serve as a foundation for exploring the biology of this group of bacteria, which includes the obligate endosymbiont and human pathogen Rickettsia prowazekii, the plant pathogen Agrobacterium tumefaciens, and the bovine and human pathogen Brucella abortus.
|
117
|
Venter JC. Genomic impact on pharmaceutical development. Novartis Found Symp 2001; 229:14-5; discussion 15-8. [PMID: 11084924 DOI: 10.1002/047084664x.ch3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 02/18/2023]
|
118
|
Venter JC. From genome to therapy: integrating new technologies with drug development. Introduction. Novartis Found Symp 2001; 229:1-3; discussion 4. [PMID: 11084922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
|
119
|
Sparks AB, Peterson SN, Bell C, Loftus BJ, Hocking L, Cahill DP, Frassica FJ, Streeten EA, Levine MA, Fraser CM, Adams MD, Broder S, Venter JC, Kinzler KW, Vogelstein B, Ralston SH. Mutation screening of the TNFRSF11A gene encoding receptor activator of NF kappa B (RANK) in familial and sporadic Paget's disease of bone and osteosarcoma. Calcif Tissue Int 2001; 68:151-5. [PMID: 11351498 DOI: 10.1007/s002230001211] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.7] [Indexed: 11/28/2022]
Abstract
Paget's disease of bone (PDB) is a common disorder characterized by focal areas of increased and disorganized osteoclastic bone resorption, leading to bone pain, deformity, pathological fracture, and an increased risk of osteosarcoma. Genetic factors play an important role in the pathogenesis of Paget's disease. In some families, the disease has been found to be linked to a susceptibility locus on chromosome 18q21-22, which also contains the gene responsible for familial expansile osteolysis (FEO), a rare bone dysplasia with many similarities to Paget's disease. Insertion mutations of the TNFRSF11A gene encoding Receptor Activator of NF kappa B (RANK) have recently been found to be responsible for FEO and rare cases of early onset familial Paget's disease. Loss of heterozygosity (LOH) affecting the PDB/FEO critical region has also been described in osteosarcomas, suggesting that TNFRSF11A might also be involved in the development of osteosarcoma. In order to investigate the possible role of TNFRSF11A in the pathogenesis of Paget's disease and osteosarcoma, we conducted mutation screening of the TNFRSF11A gene in patients with familial and sporadic Paget's disease as well as DNA extracted from Pagetic bone lesions, an osteosarcoma arising in Pagetic bone and six osteosarcoma cell lines. No specific abnormalities of the TNFRSF11A gene were identified in a Pagetic osteosarcoma, the osteosarcoma cell lines, DNA extracted from Pagetic bone lesions, or DNA extracted from peripheral blood in patients with familial or sporadic Paget's disease including several individuals with early onset Paget's disease. These data indicate that TNFRSF11A mutations contribute neither to the vast majority of cases of sporadic or familial PDB, nor to the development of osteosarcoma.
|
120
|
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigó R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, 
Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Deslattes Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A, Zhu X. The sequence of the human genome. Science 2001; 291:1304-51. [PMID: 11181995 DOI: 10.1126/science.1058040] [Citation(s) in RCA: 7683] [Impact Index Per Article: 334.0] [Indexed: 12/13/2022]
Abstract
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies, a whole-genome assembly and a regional chromosome assembly, were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
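The coverage figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: it uses the rounded numbers from the abstract, and the variable and function names are hypothetical. The small gap between the computed value and the quoted 5.11-fold presumably reflects rounding or a slightly different genome-size denominator, which is an assumption here rather than something the abstract states.

```python
# Rounded figures taken from the abstract above.
total_sequenced_bp = 14.8e9      # total bases in high-quality reads
read_count = 27_271_853          # number of high-quality reads
consensus_bp = 2.91e9            # size of the euchromatic consensus sequence


def fold_coverage(sequenced_bp: float, genome_bp: float) -> float:
    """Average number of times each genome base is sequenced."""
    return sequenced_bp / genome_bp


# Mean read length implied by the totals (roughly 540 bp).
mean_read_length = total_sequenced_bp / read_count

# Coverage relative to the 2.91-Gbp consensus (close to the quoted 5.11-fold).
coverage_vs_consensus = fold_coverage(total_sequenced_bp, consensus_bp)

# Genome size that would make the quoted 5.11-fold figure exact.
implied_genome_bp = total_sequenced_bp / 5.11

print(f"mean read length: {mean_read_length:.0f} bp")
print(f"coverage vs consensus: {coverage_vs_consensus:.2f}-fold")
print(f"genome size implied by 5.11-fold: {implied_genome_bp / 1e9:.2f} Gbp")
```

The same `fold_coverage` helper applies to the shredded public data: 2.9-fold from that source plus 5.11-fold from the shotgun reads gives the roughly eightfold effective coverage mentioned in the abstract.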
|
121
|
Theologis A, Ecker JR, Palm CJ, Federspiel NA, Kaul S, White O, Alonso J, Altafi H, Araujo R, Bowman CL, Brooks SY, Buehler E, Chan A, Chao Q, Chen H, Cheuk RF, Chin CW, Chung MK, Conn L, Conway AB, Conway AR, Creasy TH, Dewar K, Dunn P, Etgu P, Feldblyum TV, Feng J, Fong B, Fujii CY, Gill JE, Goldsmith AD, Haas B, Hansen NF, Hughes B, Huizar L, Hunter JL, Jenkins J, Johnson-Hopson C, Khan S, Khaykin E, Kim CJ, Koo HL, Kremenetskaia I, Kurtz DB, Kwan A, Lam B, Langin-Hooper S, Lee A, Lee JM, Lenz CA, Li JH, Li Y, Lin X, Liu SX, Liu ZA, Luros JS, Maiti R, Marziali A, Militscher J, Miranda M, Nguyen M, Nierman WC, Osborne BI, Pai G, Peterson J, Pham PK, Rizzo M, Rooney T, Rowley D, Sakano H, Salzberg SL, Schwartz JR, Shinn P, Southwick AM, Sun H, Tallon LJ, Tambunga G, Toriumi MJ, Town CD, Utterback T, Van Aken S, Vaysberg M, Vysotskaia VS, Walker M, Wu D, Yu G, Fraser CM, Venter JC, Davis RW. Sequence and analysis of chromosome 1 of the plant Arabidopsis thaliana. Nature 2000; 408:816-20. [PMID: 11130712 DOI: 10.1038/35048500] [Citation(s) in RCA: 134] [Impact Index Per Article: 5.6] [Indexed: 11/09/2022]
Abstract
The genome of the flowering plant Arabidopsis thaliana has five chromosomes. Here we report the sequence of the largest, chromosome 1, in two contigs of around 14.2 and 14.6 megabases. The contigs extend from the telomeres to the centromeric borders, regions rich in transposons, retrotransposons and repetitive elements such as the 180-base-pair repeat. The chromosome represents 25% of the genome and contains about 6,850 open reading frames, 236 transfer RNAs (tRNAs) and 12 small nuclear RNAs. There are two clusters of tRNA genes at different places on the chromosome. One consists of 27 tRNA(Pro) genes and the other contains 27 tandem repeats of tRNA(Tyr)-tRNA(Tyr)-tRNA(Ser) genes. Chromosome 1 contains about 300 gene families with clustered duplications. There are also many repeat elements, representing 8% of the sequence.
|
122
|
Salanoubat M, Lemcke K, Rieger M, Ansorge W, Unseld M, Fartmann B, Valle G, Blöcker H, Perez-Alonso M, Obermaier B, Delseny M, Boutry M, Grivell LA, Mache R, Puigdomènech P, De Simone V, Choisne N, Artiguenave F, Robert C, Brottier P, Wincker P, Cattolico L, Weissenbach J, Saurin W, Quétier F, Schäfer M, Müller-Auer S, Gabel C, Fuchs M, Benes V, Wurmbach E, Drzonek H, Erfle H, Jordan N, Bangert S, Wiedelmann R, Kranz H, Voss H, Holland R, Brandt P, Nyakatura G, Vezzi A, D'Angelo M, Pallavicini A, Toppo S, Simionati B, Conrad A, Hornischer K, Kauer G, Löhnert TH, Nordsiek G, Reichelt J, Scharfe M, Schön O, Bargues M, Terol J, Climent J, Navarro P, Collado C, Perez-Perez A, Ottenwälder B, Duchemin D, Cooke R, Laudie M, Berger-Llauro C, Purnelle B, Masuy D, de Haan M, Maarse AC, Alcaraz JP, Cottet A, Casacuberta E, Monfort A, Argiriou A, Flores M, Liguori R, Vitale D, Mannhaupt G, Haase D, Schoof H, Rudd S, Zaccaria P, Mewes HW, Mayer KF, Kaul S, Town CD, Koo HL, Tallon LJ, Jenkins J, Rooney T, Rizzo M, Walts A, Utterback T, Fujii CY, Shea TP, Creasy TH, Haas B, Maiti R, Wu D, Peterson J, Van Aken S, Pai G, Militscher J, Sellers P, Gill JE, Feldblyum TV, Preuss D, Lin X, Nierman WC, Salzberg SL, White O, Venter JC, Fraser CM, Kaneko T, Nakamura Y, Sato S, Kato T, Asamizu E, Sasamoto S, Kimura T, Idesawa K, Kawashima K, Kishida Y, Kiyokawa C, Kohara M, Matsumoto M, Matsuno A, Muraki A, Nakayama S, Nakazaki N, Shinpo S, Takeuchi C, Wada T, Watanabe A, Yamada M, Yasuda M, Tabata S. Sequence and analysis of chromosome 3 of the plant Arabidopsis thaliana. Nature 2000; 408:820-2. [PMID: 11130713 DOI: 10.1038/35048706] [Citation(s) in RCA: 142] [Impact Index Per Article: 5.9] [Indexed: 11/09/2022]
Abstract
Arabidopsis thaliana is an important model system for plant biologists. In 1996 an international collaboration (the Arabidopsis Genome Initiative) was formed to sequence the whole genome of Arabidopsis and in 1999 the sequence of the first two chromosomes was reported. The sequence of the last three chromosomes and an analysis of the whole genome are reported in this issue. Here we present the sequence of chromosome 3, organized into four sequence segments (contigs). The two largest (13.5 and 9.2 Mb) correspond to the top (long) and the bottom (short) arms of chromosome 3, and the two small contigs are located in the genetically defined centromere. This chromosome encodes 5,220 of the roughly 25,500 predicted protein-coding genes in the genome. About 20% of the predicted proteins have significant homology to proteins in eukaryotic genomes for which the complete sequence is available, pointing to important conserved cellular functions among eukaryotes.
|
123
|
Abstract
Our genomic DNA sequence provides a unique glimpse of the provenance and evolution of our species, the migration of peoples, and the causation of disease. Understanding the genome may help resolve previously unanswerable questions, including perhaps which human characteristics are innate or acquired. Such an understanding will make it possible to study how genomic DNA sequence varies among populations and among individuals, including the role of such variation in the pathogenesis of important illnesses and responses to pharmaceuticals. The study of the genome and the associated proteomics of free-living organisms will eventually make it possible to localize and annotate every human gene, as well as the regulatory elements that control the timing, organ-site specificity, extent of gene expression, protein levels, and post-translational modifications. For any given physiological process, we will have a new paradigm for addressing its evolution, development, function, and mechanism.
|
124
|
|
125
|
Broder S, Venter JC. Sequencing the entire genomes of free-living organisms: the foundation of pharmacology in the new millennium. Annu Rev Pharmacol Toxicol 2000; 40:97-132. [PMID: 10836129 DOI: 10.1146/annurev.pharmtox.40.1.97] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
The power and effectiveness of clinical pharmacology are about to be transformed with a speed that earlier in this decade could not have been foreseen even by the most astute visionaries. In the very near future, we will have at our disposal the reference DNA sequence for the entire human genome, estimated to contain approximately 3.5 billion bp. At the same time, the science of whole genome sequencing is fostering the computational science of bioinformatics needed to develop practical applications for pharmacology and toxicology. Indeed, it is likely that pharmacology, toxicology, bioinformatics, and genomics will merge into a new branch of medical science for studying and developing pharmaceuticals from molecule to bedside.
|
126
|
Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, George RA, Lewis SE, Richards S, Ashburner M, Henderson SN, Sutton GG, Wortman JR, Yandell MD, Zhang Q, Chen LX, Brandon RC, Rogers YH, Blazej RG, Champe M, Pfeiffer BD, Wan KH, Doyle C, Baxter EG, Helt G, Nelson CR, Gabor GL, Abril JF, Agbayani A, An HJ, Andrews-Pfannkoch C, Baldwin D, Ballew RM, Basu A, Baxendale J, Bayraktaroglu L, Beasley EM, Beeson KY, Benos PV, Berman BP, Bhandari D, Bolshakov S, Borkova D, Botchan MR, Bouck J, Brokstein P, Brottier P, Burtis KC, Busam DA, Butler H, Cadieu E, Center A, Chandra I, Cherry JM, Cawley S, Dahlke C, Davenport LB, Davies P, de Pablos B, Delcher A, Deng Z, Mays AD, Dew I, Dietz SM, Dodson K, Doup LE, Downes M, Dugan-Rocha S, Dunkov BC, Dunn P, Durbin KJ, Evangelista CC, Ferraz C, Ferriera S, Fleischmann W, Fosler C, Gabrielian AE, Garg NS, Gelbart WM, Glasser K, Glodek A, Gong F, Gorrell JH, Gu Z, Guan P, Harris M, Harris NL, Harvey D, Heiman TJ, Hernandez JR, Houck J, Hostin D, Houston KA, Howland TJ, Wei MH, Ibegwam C, Jalali M, Kalush F, Karpen GH, Ke Z, Kennison JA, Ketchum KA, Kimmel BE, Kodira CD, Kraft C, Kravitz S, Kulp D, Lai Z, Lasko P, Lei Y, Levitsky AA, Li J, Li Z, Liang Y, Lin X, Liu X, Mattei B, McIntosh TC, McLeod MP, McPherson D, Merkulov G, Milshina NV, Mobarry C, Morris J, Moshrefi A, Mount SM, Moy M, Murphy B, Murphy L, Muzny DM, Nelson DL, Nelson DR, Nelson KA, Nixon K, Nusskern DR, Pacleb JM, Palazzolo M, Pittman GS, Pan S, Pollard J, Puri V, Reese MG, Reinert K, Remington K, Saunders RD, Scheeler F, Shen H, Shue BC, Sidén-Kiamos I, Simpson M, Skupski MP, Smith T, Spier E, Spradling AC, Stapleton M, Strong R, Sun E, Svirskas R, Tector C, Turner R, Venter E, Wang AH, Wang X, Wang ZY, Wassarman DA, Weinstock GM, Weissenbach J, Williams SM, Worley KC, Wu D, Yang S, Yao QA, Ye J, Yeh RF, Zaveri JS, Zhan M, Zhang G, Zhao Q, Zheng L, Zheng XH, Zhong FN, Zhong W, Zhou X, Zhu S, Zhu X, Smith HO, 
Gibbs RA, Myers EW, Rubin GM, Venter JC. The genome sequence of Drosophila melanogaster. Science 2000; 287:2185-95. [PMID: 10731132 DOI: 10.1126/science.287.5461.2185] [Citation(s) in RCA: 3976] [Impact Index Per Article: 165.7] [Indexed: 11/02/2022]
Abstract
The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.
|
127
|
Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KH, Remington KA, Anson EL, Bolanos RA, Chou HH, Jordan CM, Halpern AL, Lonardi S, Beasley EM, Brandon RC, Chen L, Dunn PJ, Lai Z, Liang Y, Nusskern DR, Zhan M, Zhang Q, Zheng X, Rubin GM, Adams MD, Venter JC. A whole-genome assembly of Drosophila. Science 2000; 287:2196-204. [PMID: 10731133 DOI: 10.1126/science.287.5461.2196] [Citation(s) in RCA: 994] [Impact Index Per Article: 41.4] [Indexed: 11/02/2022]
Abstract
We report on the quality of a whole-genome assembly of Drosophila melanogaster and the nature of the computer algorithms that accomplished it. Three independent external data sources essentially agree with and support the assembly's sequence and ordering of contigs across the euchromatic portion of the genome. In addition, there are isolated contigs that we believe represent nonrepetitive pockets within the heterochromatin of the centromeres. Comparison with a previously sequenced 2.9-megabase region indicates that sequencing accuracy within nonrepetitive segments is greater than 99.99% without manual curation. As such, this initial reconstruction of the Drosophila sequence should be of substantial value to the scientific community.
|
128
|
Tettelin H, Saunders NJ, Heidelberg J, Jeffries AC, Nelson KE, Eisen JA, Ketchum KA, Hood DW, Peden JF, Dodson RJ, Nelson WC, Gwinn ML, DeBoy R, Peterson JD, Hickey EK, Haft DH, Salzberg SL, White O, Fleischmann RD, Dougherty BA, Mason T, Ciecko A, Parksey DS, Blair E, Cittone H, Clark EB, Cotton MD, Utterback TR, Khouri H, Qin H, Vamathevan J, Gill J, Scarlato V, Masignani V, Pizza M, Grandi G, Sun L, Smith HO, Fraser CM, Moxon ER, Rappuoli R, Venter JC. Complete genome sequence of Neisseria meningitidis serogroup B strain MC58. Science 2000; 287:1809-15. [PMID: 10710307 DOI: 10.1126/science.287.5459.1809] [Citation(s) in RCA: 814] [Impact Index Per Article: 33.9] [Indexed: 11/02/2022]
Abstract
The 2,272,351-base pair genome of Neisseria meningitidis strain MC58 (serogroup B), a causative agent of meningitis and septicemia, contains 2158 predicted coding regions, 1158 (53.7%) of which were assigned a biological role. Three major islands of horizontal DNA transfer were identified; two of these contain genes encoding proteins involved in pathogenicity, and the third island contains coding sequences only for hypothetical proteins. Insights into the commensal and virulence behavior of N. meningitidis can be gleaned from the genome, in which sequences for structural proteins of the pilus are clustered and several coding regions unique to serogroup B capsular polysaccharide synthesis can be identified. Finally, N. meningitidis contains more genes that undergo phase variation than any pathogen studied to date, a mechanism that controls their expression and contributes to the evasion of the host immune system.
|
129
|
Pizza M, Scarlato V, Masignani V, Giuliani MM, Aricò B, Comanducci M, Jennings GT, Baldi L, Bartolini E, Capecchi B, Galeotti CL, Luzzi E, Manetti R, Marchetti E, Mora M, Nuti S, Ratti G, Santini L, Savino S, Scarselli M, Storni E, Zuo P, Broeker M, Hundt E, Knapp B, Blair E, Mason T, Tettelin H, Hood DW, Jeffries AC, Saunders NJ, Granoff DM, Venter JC, Moxon ER, Grandi G, Rappuoli R. Identification of vaccine candidates against serogroup B meningococcus by whole-genome sequencing. Science 2000; 287:1816-20. [PMID: 10710308 DOI: 10.1126/science.287.5459.1816] [Citation(s) in RCA: 916] [Impact Index Per Article: 38.2] [Indexed: 11/03/2022]
Abstract
Neisseria meningitidis is a major cause of bacterial septicemia and meningitis. Sequence variation of surface-exposed proteins and cross-reactivity of the serogroup B capsular polysaccharide with human tissues have hampered efforts to develop a successful vaccine. To overcome these obstacles, the entire genome sequence of a virulent serogroup B strain (MC58) was used to identify vaccine candidates. A total of 350 candidate antigens were expressed in Escherichia coli, purified, and used to immunize mice. The sera allowed the identification of proteins that are surface exposed, that are conserved in sequence across a range of strains, and that induce a bactericidal antibody response, a property known to correlate with vaccine efficacy in humans.
MESH Headings
- Amino Acid Sequence
- Animals
- Antibodies, Bacterial/biosynthesis
- Antibodies, Bacterial/blood
- Antigens, Bacterial/chemistry
- Antigens, Bacterial/genetics
- Antigens, Bacterial/immunology
- Antigens, Surface/chemistry
- Antigens, Surface/genetics
- Antigens, Surface/immunology
- Bacterial Capsules
- Bacterial Proteins/chemistry
- Bacterial Proteins/genetics
- Bacterial Proteins/immunology
- Bacterial Vaccines/genetics
- Bacterial Vaccines/immunology
- Conserved Sequence
- Escherichia coli/genetics
- Genome, Bacterial
- Humans
- Immune Sera/immunology
- Mice
- Neisseria meningitidis/classification
- Neisseria meningitidis/genetics
- Neisseria meningitidis/immunology
- Neisseria meningitidis/pathogenicity
- Open Reading Frames
- Recombinant Fusion Proteins/chemistry
- Recombinant Fusion Proteins/immunology
- Recombinant Fusion Proteins/isolation & purification
- Recombination, Genetic
- Sequence Analysis, DNA
- Serotyping
- Vaccination
- Virulence
|
130
|
Zhao S, Malek J, Mahairas G, Fu L, Nierman W, Venter JC, Adams MD. Human BAC ends quality assessment and sequence analyses. Genomics 2000; 63:321-32. [PMID: 10704280 DOI: 10.1006/geno.1999.6082] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
End sequences from bacterial artificial chromosomes (BACs) provide highly specific sequence markers in large-scale sequencing projects. To date, we have generated >300,000 end sequences from >186,000 human BAC clones with an average read length of >460 bp for a total of 141 Mb covering approximately 4.7% of the genome. Over 60% of the clones have BAC end sequences (BESs) from both ends representing more than fivefold coverage of the human genome by the paired-end clones. Our quality assessments and sequence analyses indicate that BESs from human BAC libraries developed at The California Institute of Technology (CalTech) and Roswell Park Cancer Institute have similar properties. The analyses have highlighted differences in insert size for different segments of the CalTech library. Problems with the fidelity of tracking of sequence data back to physical clones have been observed in some subsets of the overall BES dataset. The annotation results of BESs for the contents of available genomic sequences, sequence tagged sites, expressed sequence tags, protein encoding regions, and repeats indicate that this resource will be valuable in many areas of genome research.
|
131
|
Heidelberg JF, Eisen JA, Nelson WC, Clayton RA, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Umayam L, Gill SR, Nelson KE, Read TD, Tettelin H, Richardson D, Ermolaeva MD, Vamathevan J, Bass S, Qin H, Dragoi I, Sellers P, McDonald L, Utterback T, Fleishmann RD, Nierman WC, White O, Salzberg SL, Smith HO, Colwell RR, Mekalanos JJ, Venter JC, Fraser CM. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 2000; 406:477-83. [PMID: 10952301 PMCID: PMC8288016 DOI: 10.1038/35020000] [Citation(s) in RCA: 1309] [Impact Index Per Article: 54.5] [Indexed: 12/21/2022]
Abstract
Here we determine the complete genomic sequence of the gram negative, gamma-Proteobacterium Vibrio cholerae El Tor N16961 to be 4,033,460 base pairs (bp). The genome consists of two circular chromosomes of 2,961,146 bp and 1,072,314 bp that together encode 3,885 open reading frames. The vast majority of recognizable genes for essential cell functions (such as DNA replication, transcription, translation and cell-wall biosynthesis) and pathogenicity (for example, toxins, surface antigens and adhesins) are located on the large chromosome. In contrast, the small chromosome contains a larger fraction (59%) of hypothetical genes compared with the large chromosome (42%), and also contains many more genes that appear to have origins other than the gamma-Proteobacteria. The small chromosome also carries a gene capture system (the integron island) and host 'addiction' genes that are typically found on plasmids; thus, the small chromosome may have originally been a megaplasmid that was captured by an ancestral Vibrio species. The V. cholerae genomic sequence provides a starting point for understanding how a free-living, environmental organism emerged to become a significant human bacterial pathogen.
|
132
|
Lin X, Kaul S, Rounsley S, Shea TP, Benito MI, Town CD, Fujii CY, Mason T, Bowman CL, Barnstead M, Feldblyum TV, Buell CR, Ketchum KA, Lee J, Ronning CM, Koo HL, Moffat KS, Cronin LA, Shen M, Pai G, Van Aken S, Umayam L, Tallon LJ, Gill JE, Adams MD, Carrera AJ, Creasy TH, Goodman HM, Somerville CR, Copenhaver GP, Preuss D, Nierman WC, White O, Eisen JA, Salzberg SL, Fraser CM, Venter JC. Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana. Nature 1999; 402:761-8. [PMID: 10617197 DOI: 10.1038/45471] [Citation(s) in RCA: 417] [Impact Index Per Article: 16.7] [Indexed: 11/09/2022]
Abstract
Arabidopsis thaliana (Arabidopsis) is unique among plant model organisms in having a small genome (130-140 Mb), excellent physical and genetic maps, and little repetitive DNA. Here we report the sequence of chromosome 2 from the Columbia ecotype in two gap-free assemblies (contigs) of 3.6 and 16 megabases (Mb). The latter represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date. Chromosome 2 represents 15% of the genome and encodes 4,037 genes, 49% of which have no predicted function. Roughly 250 tandem gene duplications were found in addition to large-scale duplications of about 0.5 and 4.5 Mb between chromosomes 2 and 1 and between chromosomes 2 and 4, respectively. Sequencing of nearly 2 Mb within the genetically defined centromere revealed a low density of recognizable genes, and a high density and diverse range of vestigial and presumably inactive mobile elements. More unexpected is what appears to be a recent insertion of a continuous stretch of 75% of the mitochondrial genome into chromosome 2.
|
133
|
Hutchison CA, Peterson SN, Gill SR, Cline RT, White O, Fraser CM, Smith HO, Venter JC. Global transposon mutagenesis and a minimal Mycoplasma genome. Science 1999; 286:2165-9. [PMID: 10591650 DOI: 10.1126/science.286.5447.2165] [Citation(s) in RCA: 596] [Impact Index Per Article: 23.8] [Indexed: 11/02/2022]
Abstract
Mycoplasma genitalium with 517 genes has the smallest gene complement of any independently replicating cell so far identified. Global transposon mutagenesis was used to identify nonessential genes in an effort to learn whether the naturally occurring gene complement is a true minimal genome under laboratory growth conditions. The positions of 2209 transposon insertions in the completely sequenced genomes of M. genitalium and its close relative M. pneumoniae were determined by sequencing across the junction of the transposon and the genomic DNA. These junctions defined 1354 distinct sites of insertion that were not lethal. The analysis suggests that 265 to 350 of the 480 protein-coding genes of M. genitalium are essential under laboratory growth conditions, including about 100 genes of unknown function.
|
134
|
White O, Eisen JA, Heidelberg JF, Hickey EK, Peterson JD, Dodson RJ, Haft DH, Gwinn ML, Nelson WC, Richardson DL, Moffat KS, Qin H, Jiang L, Pamphile W, Crosby M, Shen M, Vamathevan JJ, Lam P, McDonald L, Utterback T, Zalewski C, Makarova KS, Aravind L, Daly MJ, Minton KW, Fleischmann RD, Ketchum KA, Nelson KE, Salzberg S, Smith HO, Venter JC, Fraser CM. Genome sequence of the radioresistant bacterium Deinococcus radiodurans R1. Science 1999; 286:1571-7. [PMID: 10567266 PMCID: PMC4147723 DOI: 10.1126/science.286.5444.1571] [Citation(s) in RCA: 678] [Impact Index Per Article: 27.1] [Indexed: 11/02/2022]
Abstract
The complete genome sequence of the radiation-resistant bacterium Deinococcus radiodurans R1 is composed of two chromosomes (2,648,638 and 412,348 base pairs), a megaplasmid (177,466 base pairs), and a small plasmid (45,704 base pairs), yielding a total genome of 3,284,156 base pairs. Multiple components distributed on the chromosomes and megaplasmid that contribute to the ability of D. radiodurans to survive under conditions of starvation, oxidative stress, and high amounts of DNA damage were identified. Deinococcus radiodurans represents an organism in which all systems for DNA repair, DNA damage export, desiccation and starvation recovery, and genetic redundancy are present in one cell.
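The replicon sizes in this abstract add up to the stated genome total, which can be confirmed with a short check. This is only an illustrative sketch: the sizes come from the abstract, while the dictionary labels ("chromosome I", "chromosome II") and variable names are hypothetical conveniences, not terms used in the source.

```python
# Replicon sizes in base pairs, as quoted in the abstract above.
# The labels are illustrative; the abstract only says "two chromosomes".
replicons_bp = {
    "chromosome I": 2_648_638,
    "chromosome II": 412_348,
    "megaplasmid": 177_466,
    "small plasmid": 45_704,
}

# Total genome size is the sum over all replicons.
total_bp = sum(replicons_bp.values())

# Fraction of the genome carried by each replicon.
shares = {name: size / total_bp for name, size in replicons_bp.items()}

print(f"total genome: {total_bp:,} bp")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The computed total is 3,284,156 bp, matching the figure given in the abstract.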
|
135
|
Lai Z, Jing J, Aston C, Clarke V, Apodaca J, Dimalanta ET, Carucci DJ, Gardner MJ, Mishra B, Anantharaman TS, Paxia S, Hoffman SL, Craig Venter J, Huff EJ, Schwartz DC. A shotgun optical map of the entire Plasmodium falciparum genome. Nat Genet 1999; 23:309-13. [PMID: 10610179 DOI: 10.1038/15484] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.7] [Indexed: 11/08/2022]
Abstract
The unicellular parasite Plasmodium falciparum is the cause of human malaria, resulting in 1.7-2.5 million deaths each year. To develop new means to treat or prevent malaria, the Malaria Genome Consortium was formed to sequence and annotate the entire 24.6-Mb genome. The plan, already underway, is to sequence libraries created from chromosomal DNA separated by pulsed-field gel electrophoresis (PFGE). The AT-rich genome of P. falciparum presents problems in terms of reliable library construction and the relative paucity of dense physical markers or extensive genetic resources. To deal with these problems, we reasoned that a high-resolution, ordered restriction map covering the entire genome could serve as a scaffold for the alignment and verification of sequence contigs developed by members of the consortium. Thus optical mapping was advanced to use simply extracted, unfractionated genomic DNA as its principal substrate. Ordered restriction maps (BamHI and NheI) derived from single molecules were assembled into 14 deep contigs corresponding to the molecular karyotype determined by PFGE (ref. 3).
|
136
|
Loftus BJ, Kim UJ, Sneddon VP, Kalush F, Brandon R, Fuhrmann J, Mason T, Crosby ML, Barnstead M, Cronin L, Deslattes Mays A, Cao Y, Xu RX, Kang HL, Mitchell S, Eichler EE, Harris PC, Venter JC, Adams MD. Genome duplications and other features in 12 Mb of DNA sequence from human chromosome 16p and 16q. Genomics 1999; 60:295-308. [PMID: 10493829 DOI: 10.1006/geno.1999.5927] [Citation(s) in RCA: 105] [Impact Index Per Article: 4.2] [Indexed: 11/22/2022]
Abstract
Several publicly funded large-scale sequencing efforts have been initiated with the goal of completing the first reference human genome sequence by the year 2005. Here we present the results of analysis of 11.8 Mb of genomic sequence from chromosome 16. The apparent gene density varies throughout the region, but the number of genes predicted (84) suggests that this is a gene-poor region. This result may also suggest that the total number of human genes is likely to be at the lower end of published estimates. One of the most interesting aspects of this region of the genome is the presence of highly homologous, recently duplicated tracts of sequence distributed throughout the p-arm. Such duplications have implications for mapping and gene analysis as well as the predisposition to recurrent chromosomal structural rearrangements associated with genetic disease.
|
137
|
Lin J, Qi R, Aston C, Jing J, Anantharaman TS, Mishra B, White O, Daly MJ, Minton KW, Venter JC, Schwartz DC. Whole-genome shotgun optical mapping of Deinococcus radiodurans. Science 1999; 285:1558-62. [PMID: 10477518 DOI: 10.1126/science.285.5433.1558] [Citation(s) in RCA: 150] [Impact Index Per Article: 6.0] [Indexed: 11/02/2022]
Abstract
A whole-genome restriction map of Deinococcus radiodurans, a radiation-resistant bacterium able to survive up to 15,000 grays of ionizing radiation, was constructed without using DNA libraries, the polymerase chain reaction, or electrophoresis. Very large, randomly sheared, genomic DNA fragments were used to construct maps from individual DNA molecules that were assembled into two circular overlapping maps (2.6 and 0.415 megabases), without gaps. A third smaller chromosome (176 kilobases) was identified and characterized. Aberrant nonlinear DNA structures that may define chromosome structure and organization, as well as intermediates in DNA repair, were directly visualized by optical mapping techniques after gamma irradiation.
|
138
|
Gardner MJ, Tettelin H, Carucci DJ, Cummings LM, Smith HO, Fraser CM, Venter JC, Hoffman SL. The malaria genome sequencing project: complete sequence of Plasmodium falciparum chromosome 2. Parassitologia 1999; 41:69-75. [PMID: 10697835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
Abstract
An international consortium has been formed to sequence the entire genome of the human malaria parasite Plasmodium falciparum. We sequenced chromosome 2 of clone 3D7 using a shotgun sequencing strategy. Chromosome 2 is 947 kb in length, has a base composition of 80.2% A + T, and contains 210 predicted genes. In comparison to the Saccharomyces cerevisiae genome, chromosome 2 has a lower gene density, a greater proportion of genes containing introns, and nearly twice as many proteins containing predicted non-globular domains. A group of putative surface proteins was identified, rifins, which are encoded by a gene family comprising up to 7% of the protein-encoding genes in the genome. The rifins exhibit considerable sequence diversity and may play an important role in antigenic variation. Sixteen genes encoded on chromosome 2 showed signs of a plastid or mitochondrial origin, including several genes involved in fatty acid biosynthesis. Completion of the chromosome 2 sequence demonstrated that the A + T-rich genome of P. falciparum can be sequenced by the shotgun approach. Within 2-3 years, the sequence of almost all P. falciparum genes will have been determined, paving the way for genetic, biochemical, and immunological research aimed at developing new drugs and vaccines against malaria.
|
139
|
Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Nelson WC, Ketchum KA, McDonald L, Utterback TR, Malek JA, Linher KD, Garrett MM, Stewart AM, Cotton MD, Pratt MS, Phillips CA, Richardson D, Heidelberg J, Sutton GG, Fleischmann RD, Eisen JA, White O, Salzberg SL, Smith HO, Venter JC, Fraser CM. Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature 1999; 399:323-9. [PMID: 10360571 DOI: 10.1038/20601] [Citation(s) in RCA: 1193] [Impact Index Per Article: 47.7] [Indexed: 11/09/2022]
Abstract
The 1,860,725-base-pair genome of Thermotoga maritima MSB8 contains 1,877 predicted coding regions, 1,014 (54%) of which have functional assignments and 863 (46%) of which are of unknown function. Genome analysis reveals numerous pathways involved in degradation of sugars and plant polysaccharides, and 108 genes that have orthologues only in the genomes of other thermophilic Eubacteria and Archaea. Of the Eubacteria sequenced to date, T. maritima has the highest percentage (24%) of genes that are most similar to archaeal genes. Eighty-one archaeal-like genes are clustered in 15 regions of the T. maritima genome that range in size from 4 to 20 kilobases. Conservation of gene order between T. maritima and Archaea in many of the clustered regions suggests that lateral gene transfer may have occurred between thermophilic Eubacteria and Archaea.
|
140
|
Jing J, Lai Z, Aston C, Lin J, Carucci DJ, Gardner MJ, Mishra B, Anantharaman TS, Tettelin H, Cummings LM, Hoffman SL, Venter JC, Schwartz DC. Optical mapping of Plasmodium falciparum chromosome 2. Genome Res 1999; 9:175-81. [PMID: 10022982 PMCID: PMC310721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/1998] [Accepted: 12/15/1998] [Indexed: 02/10/2023]
Abstract
Detailed restriction maps of microbial genomes are a valuable resource in genome sequencing studies but are toilsome to construct by contig construction of maps derived from cloned DNA. Analysis of genomic DNA enables large stretches of the genome to be mapped and circumvents library construction and associated cloning artifacts. We used pulsed-field gel electrophoresis-purified Plasmodium falciparum chromosome 2 DNA as the starting material for optical mapping, a system for making ordered restriction maps from ensembles of individual DNA molecules. DNA molecules were bound to derivatized glass surfaces, cleaved with NheI or BamHI, and imaged by digital fluorescence microscopy. Large pieces of the chromosome containing ordered DNA restriction fragments were mapped. Maps were assembled from 50 molecules producing an average contig depth of 15 molecules and high-resolution restriction maps covering the entire chromosome. Chromosome 2 was found to be 976 kb by optical mapping with NheI, and 946 kb with BamHI, which compares closely to the published size of 947 kb from large-scale sequencing. The maps were used to further verify assemblies from the plasmid library used for sequencing. Maps generated in silico from the sequence data were compared to the optical mapping data, and good correspondence was found. Such high-resolution restriction maps may become an indispensable resource for large-scale genome sequencing projects.
|
141
|
Tomita M, Hashimoto K, Takahashi K, Shimizu TS, Matsuzaki Y, Miyoshi F, Saito K, Tanida S, Yugi K, Venter JC, Hutchison CA. E-CELL: software environment for whole-cell simulation. Bioinformatics 1999; 15:72-84. [PMID: 10068694 DOI: 10.1093/bioinformatics/15.1.72] [Citation(s) in RCA: 327] [Impact Index Per Article: 13.1] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Genome sequencing projects and further systematic functional analyses of complete gene sets are producing an unprecedented mass of molecular information for a wide range of model organisms. This provides us with a detailed account of the cell with which we may begin to build models for simulating intracellular molecular processes to predict the dynamic behavior of living cells. Previous work in biochemical and genetic simulation has isolated well-characterized pathways for detailed analysis, but methods for building integrative models of the cell that incorporate gene regulation, metabolism and signaling have not been established. We, therefore, were motivated to develop a software environment for building such integrative models based on gene sets, and running simulations to conduct experiments in silico.
RESULTS: E-CELL, a modeling and simulation environment for biochemical and genetic processes, has been developed. The E-CELL system allows a user to define functions of proteins, protein-protein interactions, protein-DNA interactions, regulation of gene expression and other features of cellular metabolism, as a set of reaction rules. E-CELL simulates cell behavior by numerically integrating the differential equations described implicitly in these reaction rules. The user can observe, through a computer display, dynamic changes in concentrations of proteins, protein complexes and other chemical compounds in the cell. Using this software, we constructed a model of a hypothetical cell with only 127 genes sufficient for transcription, translation, energy production and phospholipid synthesis. Most of the genes are taken from Mycoplasma genitalium, the organism having the smallest known chromosome, whose complete 580 kb genome sequence was determined at TIGR in 1995. We discuss future applications of the E-CELL system with special respect to genome engineering.
AVAILABILITY: The E-CELL software is available upon request.
SUPPLEMENTARY INFORMATION: The complete list of rules of the developed cell model with kinetic parameters can be obtained via our web site at: http://e-cell.org/.
|
142
|
Hoffman SL, Rogers WO, Carucci DJ, Venter JC. From genomics to vaccines: malaria as a model system. Nat Med 1998; 4:1351-3. [PMID: 9846563 DOI: 10.1038/3934] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.1] [Indexed: 11/09/2022]
|
143
|
Gardner MJ, Tettelin H, Carucci DJ, Cummings LM, Aravind L, Koonin EV, Shallom S, Mason T, Yu K, Fujii C, Pederson J, Shen K, Jing J, Aston C, Lai Z, Schwartz DC, Pertea M, Salzberg S, Zhou L, Sutton GG, Clayton R, White O, Smith HO, Fraser CM, Adams MD, Venter JC, Hoffman SL. Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum. Science 1998; 282:1126-32. [PMID: 9804551 DOI: 10.1126/science.282.5391.1126] [Citation(s) in RCA: 370] [Impact Index Per Article: 14.2] [Indexed: 11/02/2022]
Abstract
Chromosome 2 of Plasmodium falciparum was sequenced; this sequence contains 947,103 base pairs and encodes 210 predicted genes. In comparison with the Saccharomyces cerevisiae genome, chromosome 2 has a lower gene density, introns are more frequent, and proteins are markedly enriched in nonglobular domains. A family of surface proteins, rifins, that may play a role in antigenic variation was identified. The complete sequencing of chromosome 2 has shown that sequencing of the A+T-rich P. falciparum genome is technically feasible.
MESH Headings
- Amino Acid Sequence
- Animals
- Antigens, Protozoan/chemistry
- Antigens, Protozoan/genetics
- Base Composition
- Chromosomes/genetics
- Evolution, Molecular
- Genes, Protozoan
- Genome, Protozoan
- Introns
- Membrane Proteins/chemistry
- Membrane Proteins/genetics
- Molecular Sequence Data
- Multigene Family
- Physical Chromosome Mapping
- Plasmodium falciparum/genetics
- Protozoan Proteins/chemistry
- Protozoan Proteins/genetics
- RNA, Protozoan/genetics
- RNA, Transfer, Glu/genetics
- Repetitive Sequences, Nucleic Acid
- Reverse Transcriptase Polymerase Chain Reaction
- Sequence Alignment
- Sequence Analysis, DNA
|
144
|
Dougherty BA, Hill C, Weidman JF, Richardson DR, Venter JC, Ross RP. Sequence and analysis of the 60 kb conjugative, bacteriocin-producing plasmid pMRC01 from Lactococcus lactis DPC3147. Mol Microbiol 1998; 29:1029-38. [PMID: 9767571 DOI: 10.1046/j.1365-2958.1998.00988.x] [Citation(s) in RCA: 135] [Impact Index Per Article: 5.2] [Indexed: 11/20/2022]
Abstract
The complete sequence of pMRC01, a large conjugative plasmid from Lactococcus lactis ssp. lactis DPC3147, has been determined. Using a shotgun sequencing approach, the 60,232 bp plasmid sequence was obtained by the assembly of 1056 underlying sequences (sevenfold average redundancy). Sixty-four open reading frames (ORFs) were identified. Analysis of the gene organization of pMRC01 suggests that the plasmid can be divided into three functional domains, with each approximately 20 kb region separated by insertion sequence (IS) elements. The three regions are (i) the conjugative transfer region, including a 16-gene Tra (transfer) operon; (ii) the bacteriocin production region, including an operon responsible for the synthesis of the novel bacteriocin lacticin 3147; and (iii) the phage resistance and plasmid replication region of the plasmid. The complete sequence of pMRC01 provides important information about these industrially relevant phenotypes and gives insight into the structure, function and evolution of large gram-positive conjugative plasmids in general. The completely sequenced pMRC01 plasmid should also provide a useful framework for the design of novel plasmids to be incorporated into starter strain improvement programmes for the dairy industry.
|
145
|
Fraser CM, Norris SJ, Weinstock GM, White O, Sutton GG, Dodson R, Gwinn M, Hickey EK, Clayton R, Ketchum KA, Sodergren E, Hardham JM, McLeod MP, Salzberg S, Peterson J, Khalak H, Richardson D, Howell JK, Chidambaram M, Utterback T, McDonald L, Artiach P, Bowman C, Cotton MD, Fujii C, Garland S, Hatch B, Horst K, Roberts K, Sandusky M, Weidman J, Smith HO, Venter JC. Complete genome sequence of Treponema pallidum, the syphilis spirochete. Science 1998; 281:375-88. [PMID: 9665876 DOI: 10.1126/science.281.5375.375] [Citation(s) in RCA: 697] [Impact Index Per Article: 26.8] [Indexed: 11/02/2022]
Abstract
The complete genome sequence of Treponema pallidum was determined and shown to be 1,138,006 base pairs containing 1041 predicted coding sequences (open reading frames). Systems for DNA replication, transcription, translation, and repair are intact, but catabolic and biosynthetic activities are minimized. The number of identifiable transporters is small, and no phosphoenolpyruvate:phosphotransferase carbohydrate transporters were found. Potential virulence factors include a family of 12 potential membrane proteins and several putative hemolysins. Comparison of the T. pallidum genome sequence with that of another pathogenic spirochete, Borrelia burgdorferi, the agent of Lyme disease, identified unique and common genes and substantiates the considerable diversity observed among pathogenic spirochetes.
|
146
|
Venter JC, Adams MD, Sutton GG, Kerlavage AR, Smith HO, Hunkapiller M. Shotgun sequencing of the human genome. Science 1998; 280:1540-2. [PMID: 9644018 DOI: 10.1126/science.280.5369.1540] [Citation(s) in RCA: 168] [Impact Index Per Article: 6.5] [Indexed: 11/02/2022]
|
147
|
Gardner MJ, Tettelin H, Carucci DJ, Cummings LM, Adams MD, Smith HO, Craig Venter J, Hoffman SL. The Malaria Genome Sequencing Project. Protist 1998. [DOI: 10.1016/s1434-4610(98)70014-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022]
|
148
|
Fraser CM, Casjens S, Huang WM, Sutton GG, Clayton R, Lathigra R, White O, Ketchum KA, Dodson R, Hickey EK, Gwinn M, Dougherty B, Tomb JF, Fleischmann RD, Richardson D, Peterson J, Kerlavage AR, Quackenbush J, Salzberg S, Hanson M, van Vugt R, Palmer N, Adams MD, Gocayne J, Weidman J, Utterback T, Watthey L, McDonald L, Artiach P, Bowman C, Garland S, Fuji C, Cotton MD, Horst K, Roberts K, Hatch B, Smith HO, Venter JC. Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 1997; 390:580-6. [PMID: 9403685 DOI: 10.1038/37551] [Citation(s) in RCA: 1498] [Impact Index Per Article: 55.5] [Indexed: 02/05/2023]
Abstract
The genome of the bacterium Borrelia burgdorferi B31, the aetiologic agent of Lyme disease, contains a linear chromosome of 910,725 base pairs and at least 17 linear and circular plasmids with a combined size of more than 533,000 base pairs. The chromosome contains 853 genes encoding a basic set of proteins for DNA replication, transcription, translation, solute transport and energy metabolism, but, like Mycoplasma genitalium, it contains no genes for cellular biosynthetic reactions. Because B. burgdorferi and M. genitalium are distantly related eubacteria, we suggest that their limited metabolic capacities reflect convergent evolution by gene loss from more metabolically competent progenitors. Of 430 genes on 11 plasmids, most have no known biological function; 39% of plasmid genes are paralogues that form 47 gene families. The biological significance of the multiple plasmid-encoded genes is not clear, although they may be involved in antigenic variation or immune evasion.
|
149
|
Caplan A, Venter JC. Using One's Head. Science 1997. [DOI: 10.1126/science.278.5343.1547-b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
150
|
Klenk HP, Clayton RA, Tomb JF, White O, Nelson KE, Ketchum KA, Dodson RJ, Gwinn M, Hickey EK, Peterson JD, Richardson DL, Kerlavage AR, Graham DE, Kyrpides NC, Fleischmann RD, Quackenbush J, Lee NH, Sutton GG, Gill S, Kirkness EF, Dougherty BA, McKenney K, Adams MD, Loftus B, Peterson S, Reich CI, McNeil LK, Badger JH, Glodek A, Zhou L, Overbeek R, Gocayne JD, Weidman JF, McDonald L, Utterback T, Cotton MD, Spriggs T, Artiach P, Kaine BP, Sykes SM, Sadow PW, D'Andrea KP, Bowman C, Fujii C, Garland SA, Mason TM, Olsen GJ, Fraser CM, Smith HO, Woese CR, Venter JC. The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 1997; 390:364-70. [PMID: 9389475 DOI: 10.1038/37052] [Citation(s) in RCA: 990] [Impact Index Per Article: 36.7] [Indexed: 02/05/2023]
Abstract
Archaeoglobus fulgidus is the first sulphur-metabolizing organism to have its genome sequence determined. Its genome of 2,178,400 base pairs contains 2,436 open reading frames (ORFs). The information processing systems and the biosynthetic pathways for essential components (nucleotides, amino acids and cofactors) have extensive correlation with their counterparts in the archaeon Methanococcus jannaschii. The genomes of these two Archaea indicate dramatic differences in the way these organisms sense their environment, perform regulatory and transport functions, and gain energy. In contrast to M. jannaschii, A. fulgidus has fewer restriction-modification systems, and none of its genes appears to contain inteins. A quarter (651 ORFs) of the A. fulgidus genome encodes functionally uncharacterized yet conserved proteins, two-thirds of which are shared with M. jannaschii (428 ORFs). Another quarter of the genome encodes new proteins indicating substantial archaeal gene diversity.
|