51
|
Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA. Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in marker gene phylogenetic trees. PLoS One 2011; 6:e18011. [PMID: 21437252 PMCID: PMC3060911 DOI: 10.1371/journal.pone.0018011] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.4] [Received: 10/25/2010] [Accepted: 02/20/2011] [Indexed: 02/01/2023] Open
Abstract
Background Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generated directly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as we argue here, in studies of very early events in the evolution of gene families and of species. Methodology/Principal Findings We designed and implemented new methods for analyzing metagenomic data and used them to search the Global Ocean Sampling (GOS) Expedition data set for novel lineages in three gene families commonly used in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies. Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties in making robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novel branches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as these novel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences. Conclusions/Significance Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come from uncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A third possibility is that some come from novel cellular lineages that are only distantly related to any organisms for which sequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the tree of life, we suggest that methods such as those described herein currently offer the best way to search for them.
|
52
|
Venter JC. Genome-sequencing anniversary. The human genome at 10: successes and challenges. Science 2011; 331:546-7. [PMID: 21292962 DOI: 10.1126/science.1202812] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 11/02/2022]
|
53
|
|
54
|
Gibson DG, Glass JI, Lartigue C, Noskov VN, Chuang RY, Algire MA, Benders GA, Montague MG, Ma L, Moodie MM, Merryman C, Vashee S, Krishnakumar R, Assad-Garcia N, Andrews-Pfannkoch C, Denisova EA, Young L, Qi ZQ, Segall-Shapiro TH, Calvey CH, Parmar PP, Hutchison CA, Smith HO, Venter JC. Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome. Science 2010; 329:52-6. [PMID: 20488990 DOI: 10.1126/science.1190719] [Citation(s) in RCA: 1317] [Impact Index Per Article: 94.1] [Indexed: 11/02/2022]
|
55
|
|
56
|
Pang AW, MacDonald JR, Pinto D, Wei J, Rafiq MA, Conrad DF, Park H, Hurles ME, Lee C, Venter JC, Kirkness EF, Levy S, Feuk L, Scherer SW. Towards a comprehensive structural variation map of an individual human genome. Genome Biol 2010; 11:R52. [PMID: 20482838 PMCID: PMC2898065 DOI: 10.1186/gb-2010-11-5-r52] [Citation(s) in RCA: 202] [Impact Index Per Article: 14.4] [Received: 02/20/2010] [Revised: 04/11/2010] [Accepted: 05/19/2010] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Several genomes have now been sequenced, with millions of genetic variants annotated. While significant progress has been made in mapping single nucleotide polymorphisms (SNPs) and small (<10 bp) insertion/deletions (indels), the annotation of larger structural variants has been less comprehensive. It is still unclear to what extent a typical genome differs from the reference assembly, and the analysis of the genomes sequenced to date have shown varying results for copy number variation (CNV) and inversions. RESULTS We have combined computational re-analysis of existing whole genome sequence data with novel microarray-based analysis, and detect 12,178 structural variants covering 40.6 Mb that were not reported in the initial sequencing of the first published personal genome. We estimate a total non-SNP variation content of 48.8 Mb in a single genome. Our results indicate that this genome differs from the consensus reference sequence by approximately 1.2% when considering indels/CNVs, 0.1% by SNPs and approximately 0.3% by inversions. The structural variants impact 4,867 genes, and >24% of structural variants would not be imputed by SNP-association. CONCLUSIONS Our results indicate that a large number of structural variants have been unreported in the individual genomes published to date. This significant extent and complexity of structural variants, as well as the growing recognition of their medical relevance, necessitate they be actively studied in health-related analyses of personal genomes. The new catalogue of structural variants generated for this genome provides a crucial resource for future comparison studies.
|
57
|
Benders GA, Noskov VN, Denisova EA, Lartigue C, Gibson DG, Assad-Garcia N, Chuang RY, Carrera W, Moodie M, Algire MA, Phan Q, Alperovich N, Vashee S, Merryman C, Venter JC, Smith HO, Glass JI, Hutchison CA. Cloning whole bacterial genomes in yeast. Nucleic Acids Res 2010; 38:2558-69. [PMID: 20211840 PMCID: PMC2860123 DOI: 10.1093/nar/gkq119] [Citation(s) in RCA: 109] [Impact Index Per Article: 7.8] [Received: 11/13/2009] [Revised: 02/08/2010] [Accepted: 02/09/2010] [Indexed: 01/21/2023] Open
Abstract
Most microbes have not been cultured, and many of those that are cultivatable are difficult, dangerous or expensive to propagate or are genetically intractable. Routine cloning of large genome fractions or whole genomes from these organisms would significantly enhance their discovery and genetic and functional characterization. Here we report the cloning of whole bacterial genomes in the yeast Saccharomyces cerevisiae as single-DNA molecules. We cloned the genomes of Mycoplasma genitalium (0.6 Mb), M. pneumoniae (0.8 Mb) and M. mycoides subspecies capri (1.1 Mb) as yeast circular centromeric plasmids. These genomes appear to be stably maintained in a host that has efficient, well-established methods for DNA manipulation.
|
58
|
Glass JI, Hutchison CA, Smith HO, Venter JC. A systems biology tour de force for a near-minimal bacterium. Mol Syst Biol 2009; 5:330. [PMID: 19953084 PMCID: PMC2824490 DOI: 10.1038/msb.2009.89] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Indexed: 11/17/2022] Open
|
59
|
Lartigue C, Vashee S, Algire MA, Chuang RY, Benders GA, Ma L, Noskov VN, Denisova EA, Gibson DG, Assad-Garcia N, Alperovich N, Thomas DW, Merryman C, Hutchison CA, Smith HO, Venter JC, Glass JI. Creating Bacterial Strains from Genomes That Have Been Cloned and Engineered in Yeast. Science 2009; 325:1693-6. [PMID: 19696314 DOI: 10.1126/science.1173759] [Citation(s) in RCA: 195] [Impact Index Per Article: 13.0] [Indexed: 11/02/2022]
|
60
|
Axelrod N, Lin Y, Ng PC, Stockwell TB, Crabtree J, Huang J, Kirkness E, Strausberg RL, Frazier ME, Venter JC, Kravitz S, Levy S. The HuRef Browser: a web resource for individual human genomics. Nucleic Acids Res 2008; 37:D1018-24. [PMID: 19036787 PMCID: PMC2686481 DOI: 10.1093/nar/gkn939] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022] Open
Abstract
The HuRef Genome Browser is a web application for the navigation and analysis of the previously published genome of a human individual, termed HuRef. The browser provides a comparative view between the NCBI human reference sequence and the HuRef assembly, and it enables the navigation of the HuRef genome in the context of HuRef, NCBI and Ensembl annotations. Single nucleotide polymorphisms, indels, inversions, structural and copy-number variations are shown in the context of existing functional annotations on either genome in the comparative view. Demonstrated here are some potential uses of the browser to enable a better understanding of individual human genetic variation. The browser provides full access to the underlying reads with sequence and quality information, the genome assembly and the evidence supporting the identification of DNA polymorphisms. The HuRef Browser is a unique and versatile tool for browsing genome assemblies and studying individual human sequence variation in a diploid context. The browser is available online at http://huref.jcvi.org.
|
61
|
Ng PC, Zhao Q, Levy S, Strausberg RL, Venter JC. Individual genomes instead of race for personalized medicine. Clin Pharmacol Ther 2008; 84:306-9. [PMID: 18714319 DOI: 10.1038/clpt.2008.114] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 11/09/2022]
Abstract
The cost of sequencing and genotyping is aggressively decreasing, enabling pervasive personalized genomic screening for drug reactions. Drug-metabolizing genes have been characterized sufficiently to enable practitioners to go beyond simplistic ethnic characterization and into the precisely targeted world of personal genomics. We examine six drug-metabolizing genes in J. Craig Venter and James Watson, two Caucasian men whose genomes were recently sequenced. Their genetic differences underscore the importance of personalized genomics over a race-based approach to medicine. To attain truly personalized medicine, the scientific community must aim to elucidate the genetic and environmental factors that contribute to drug reactions and not be satisfied with a simple race-based approach.
|
62
|
Shaw AK, Halpern AL, Beeson K, Tran B, Venter JC, Martiny JBH. It's all relative: ranking the diversity of aquatic bacterial communities. Environ Microbiol 2008; 10:2200-10. [PMID: 18637951 DOI: 10.1111/j.1462-2920.2008.01626.x] [Citation(s) in RCA: 147] [Impact Index Per Article: 9.2] [Indexed: 11/30/2022]
Abstract
The study of microbial diversity patterns is hampered by the enormous diversity of microbial communities and the lack of resources to sample them exhaustively. For many questions about richness and evenness, however, one only needs to know the relative order of diversity among samples rather than total diversity. We used 16S libraries from the Global Ocean Survey to investigate the ability of 10 diversity statistics (including rarefaction, non-parametric, parametric, curve extrapolation and diversity indices) to assess the relative diversity of six aquatic bacterial communities. Overall, we found that the statistics yielded remarkably similar rankings of the samples for a given sequence similarity cut-off. This correspondence, despite the different underlying assumptions of the statistics, suggests that diversity statistics are a useful tool for ranking samples of microbial diversity. In addition, sequence similarity cut-off influenced the diversity ranking of the samples, demonstrating that diversity statistics can also be used to detect differences in phylogenetic structure among microbial communities. Finally, a subsampling analysis suggests that further sequencing from these particular clone libraries would not have substantially changed the richness rankings of the samples.
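To make the idea of ranking samples by a diversity statistic concrete, here is a minimal sketch. It is not the authors' code and covers only two statistics (the Shannon index and the bias-corrected Chao1 richness estimator), not the ten examined in the study. The sample names and OTU counts are made up for illustration and are not GOS data.

```python
import math

def shannon(counts):
    """Shannon diversity index H' = -sum(p_i * ln p_i) over observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

def rank_samples(samples, statistic):
    """Return sample names ordered from most to least diverse under `statistic`."""
    return sorted(samples, key=lambda name: statistic(samples[name]), reverse=True)

# Hypothetical OTU abundance vectors (illustrative numbers only).
samples = {
    "site_A": [10, 10, 10, 10, 10],  # even community
    "site_B": [40, 5, 3, 1, 1],      # uneven, with rare OTUs
    "site_C": [25, 25],              # few OTUs
}

shannon_rank = rank_samples(samples, shannon)
chao1_rank = rank_samples(samples, chao1)
print("Shannon ranking:", shannon_rank)
print("Chao1 ranking:  ", chao1_rank)
```

In this toy case the two statistics agree that site_C is the least diverse but disagree on the top two samples, because Shannon rewards evenness while Chao1 rewards rare OTUs. The abstract reports that on the real 16S libraries the rankings were remarkably similar across statistics.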
|
63
|
Harrison LC, Callaghan J, Venter JC, Fraser CM, Kaliner ML. Atopy, autonomic function and beta-adrenergic receptor autoantibodies. CIBA FOUNDATION SYMPOSIUM 2008:248-62. [PMID: 6291881 DOI: 10.1002/9780470720721.ch14] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/19/2023]
Abstract
Atopic individuals (with asthma, allergic rhinitis or atopic eczema) have impaired sensitivity to beta-adrenergic agents. After the finding of antibodies to the beta-adrenergic receptor in the serum of a subject with allergic rhinitis, coded sera from atopic and control subjects were assayed for immunoglobulins that inhibited the specific binding of ¹²⁵I-labelled hydroxybenzylpindolol to beta-receptors in mammalian lung membranes. Antibodies were present in nine of 60 subjects: 3/19 normal control subjects, 1/9 pre-allergic, 4/17 asthma, 0/8 allergic rhinitis, and 1/7 cystic fibrosis patients. Antibodies of the IgG class in these sera were also demonstrated by indirect precipitation of solubilized lung beta-receptors. The autonomic sensitivity of the nine antibody-positive subjects (Ab+) was compared with that of antibody-negative subjects (Ab-). The Ab+ subjects required 15.0 ± 1.9 ng isoprenaline (isoproterenol) kg⁻¹ min⁻¹ i.v. to increase pulse pressure by at least 22 mmHg (Ab-, 7.7 ± 0.4; n = 20; P < 0.001), and 12.4 ± 1.8 ng isoprenaline kg⁻¹ min⁻¹ i.v. to increase plasma cyclic AMP concentrations by 50% (Ab-, 8.08 ± 0.62; n = 13; P < 0.02). Ab+ subjects required 2.06 ± 0.3% phenylephrine to dilate their pupils (Ab-, 2.55 ± 0.08; n = 57; P < 0.05) and 0.61 ± 0.08% carbachol to constrict their pupils (Ab-, 0.78 ± 0.03%; n = 57; P < 0.05). A role for autoantibodies as beta-receptor antagonists was further supported by showing that human lung cells (VA-13 line) cultured in the presence of globulins from Ab+ subjects had a markedly impaired cyclic AMP response to isoprenaline. These results suggest that autoantibodies to beta-receptors play a pathogenetic role in asthma and related disorders. They have important implications for the concept of autoimmunity.
|
64
|
Fedorova ND, Khaldi N, Joardar VS, Maiti R, Amedeo P, Anderson MJ, Crabtree J, Silva JC, Badger JH, Albarraq A, Angiuoli S, Bussey H, Bowyer P, Cotty PJ, Dyer PS, Egan A, Galens K, Fraser-Liggett CM, Haas BJ, Inman JM, Kent R, Lemieux S, Malavazi I, Orvis J, Roemer T, Ronning CM, Sundaram JP, Sutton G, Turner G, Venter JC, White OR, Whitty BR, Youngman P, Wolfe KH, Goldman GH, Wortman JR, Jiang B, Denning DW, Nierman WC. Genomic islands in the pathogenic filamentous fungus Aspergillus fumigatus. PLoS Genet 2008; 4:e1000046. [PMID: 18404212 PMCID: PMC2289846 DOI: 10.1371/journal.pgen.1000046] [Citation(s) in RCA: 357] [Impact Index Per Article: 22.3] [Received: 08/31/2007] [Accepted: 03/04/2008] [Indexed: 01/23/2023] Open
Abstract
We present the genome sequences of a new clinical isolate of the important human pathogen, Aspergillus fumigatus, A1163, and two closely related but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of A1163 with the recently sequenced A. fumigatus isolate Af293 has identified core, variable and up to 2% unique genes in each genome. While the core genes are 99.8% identical at the nucleotide level, identity for variable genes can be as low as 40%. The most divergent loci appear to contain heterokaryon incompatibility (het) genes associated with fungal programmed cell death such as developmental regulator rosA. Cross-species comparison has revealed that 8.5%, 13.5% and 12.6%, respectively, of A. fumigatus, N. fischeri and A. clavatus genes are species-specific. These genes are significantly smaller in size than core genes, contain fewer exons and exhibit a subtelomeric bias. Most of them cluster together in 13 chromosomal islands, which are enriched for pseudogenes, transposons and other repetitive elements. At least 20% of A. fumigatus-specific genes appear to be functional and involved in carbohydrate and chitin catabolism, transport, detoxification, secondary metabolism and other functions that may facilitate the adaptation to heterogeneous environments such as soil or a mammalian host. Contrary to what was suggested previously, their origin cannot be attributed to horizontal gene transfer (HGT), but instead is likely to involve duplication, diversification and differential gene loss (DDL). The role of duplication in the origin of lineage-specific genes is further underlined by the discovery of genomic islands that seem to function as designated “gene dumps” and, perhaps, simultaneously, as “gene factories”.
Aspergillus is an extremely diverse genus of filamentous ascomycetous fungi (molds) found ubiquitously in soil and decomposing vegetation. Being supreme opportunists, aspergilli have adapted to overcome various chemical, physical, and biological stresses found in heterogeneous environments. While most species in the genus are saprophytes, a surprising number are able to infect wounded plants and animals. Remarkably, the allergic human host also responds abnormally to the aspergilli with lung and sinus disease. The advent of immunosuppressive agents and other medical advances has created a large worldwide pool of human hosts susceptible to some Aspergillus species, including the world's most harmful mold and the causative agent of invasive aspergillosis, Aspergillus fumigatus. In this study, we have used the power of comparative genomics to gain insight into genetic mechanisms that may contribute to the metabolic versatility and pathogenicity of this important human pathogen. Comparison of the genomes of two A. fumigatus clinical isolates and two closely related, but rarely pathogenic species showed that their genomes contain several large isolate- and species-specific chromosomal islands. The metabolic capabilities encoded by these highly labile regions are likely to contribute to their rapid adaptation to heterogeneous environments such as soil or a living host.
|
65
|
Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, Lin Y, MacDonald JR, Pang AWC, Shago M, Stockwell TB, Tsiamouri A, Bafna V, Bansal V, Kravitz SA, Busam DA, Beeson KY, McIntosh TC, Remington KA, Abril JF, Gill J, Borman J, Rogers YH, Frazier ME, Scherer SW, Strausberg RL, Venter JC. The diploid genome sequence of an individual human. PLoS Biol 2007; 5:e254. [PMID: 17803354 PMCID: PMC1964779 DOI: 10.1371/journal.pbio.0050254] [Citation(s) in RCA: 1114] [Impact Index Per Article: 69.6] [Received: 05/09/2007] [Accepted: 07/30/2007] [Indexed: 01/20/2023] Open
Abstract
Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels) (1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
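The 22% figure can be checked against the event counts quoted in the abstract. The short sketch below sums only the five categories for which the abstract gives explicit counts; segmental duplications and copy number variation regions are described as "numerous" without a count and are therefore left out, which is an assumption of this check rather than something the abstract states.

```python
# Variant event counts as quoted in the abstract above.
variant_counts = {
    "SNPs": 3_213_401,
    "block substitutions": 53_823,
    "heterozygous indels": 292_102,
    "homozygous indels": 559_473,
    "inversions": 90,
}

total_events = sum(variant_counts.values())
non_snp_events = total_events - variant_counts["SNPs"]
non_snp_share = non_snp_events / total_events

print(f"total events:            {total_events:,}")   # 4,118,889
print(f"non-SNP events:          {non_snp_events:,}")  # 905,488
print(f"non-SNP share of events: {non_snp_share:.1%}")  # 22.0%
```

The enumerated categories add up to about 4.12 million events, consistent with "more than 4.1 million DNA variants", and the non-SNP share comes out at roughly 22%.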
|
66
|
Kannan N, Wu J, Anand GS, Yooseph S, Neuwald AF, Venter JC, Taylor SS. Evolution of allostery in the cyclic nucleotide binding module. Genome Biol 2007; 8:R264. [PMID: 18076763 PMCID: PMC2246266 DOI: 10.1186/gb-2007-8-12-r264] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.9] [Received: 08/29/2007] [Revised: 11/18/2007] [Accepted: 12/12/2007] [Indexed: 11/10/2022] Open
Abstract
Analysis of cyclic nucleotide binding (CNB) domains shows that they have evolved to sense a wide variety of second messenger signals; a mechanism for allosteric regulation by CNB domains is proposed. Background The cyclic nucleotide binding (CNB) domain regulates signaling pathways in both eukaryotes and prokaryotes. In this study, we analyze the evolutionary information embedded in genomic sequences to explore the diversity of signaling through the CNB domain and also how the CNB domain elicits a cellular response upon binding to cAMP. Results Identification and classification of CNB domains in Global Ocean Sampling and other protein sequences reveals that they typically are fused to a wide variety of functional domains. CNB domains have undergone major sequence variation during evolution. In particular, the sequence motif that anchors the cAMP phosphate (termed the PBC motif) is strikingly different in some families. This variation may contribute to ligand specificity inasmuch as members of the prokaryotic cooA family, for example, harbor a CNB domain that contains a non-canonical PBC motif and that binds a heme ligand in the cAMP binding pocket. Statistical comparison of the functional constraints imposed on the canonical and non-canonical PBC containing sequences reveals that a key arginine, which coordinates with the cAMP phosphate, has co-evolved with a glycine in a distal β2-β3 loop that allosterically couples cAMP binding to distal regulatory sites. Conclusion Our analysis suggests that CNB domains have evolved as a scaffold to sense a wide variety of second messenger signals. Based on sequence, structural and biochemical data, we propose a mechanism for allosteric regulation by CNB domains.
|
67
|
Kannan N, Wu J, Yooseph S, Neuwald AF, Venter JC, Taylor SS. Evolution of allostery in the cyclic nucleotide binding module: A comparative genomics study. FASEB J 2008. [DOI: 10.1096/fasebj.22.1_supplement.828.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
68
|
Gibson DG, Benders GA, Andrews-Pfannkoch C, Denisova EA, Baden-Tillson H, Zaveri J, Stockwell TB, Brownley A, Thomas DW, Algire MA, Merryman C, Young L, Noskov VN, Glass JI, Venter JC, Hutchison CA, Smith HO. Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science 2008; 319:1215-20. [PMID: 18218864 DOI: 10.1126/science.1151721] [Citation(s) in RCA: 757] [Impact Index Per Article: 47.3] [Indexed: 12/15/2022]
Abstract
We have synthesized a 582,970-base pair Mycoplasma genitalium genome. This synthetic genome, named M. genitalium JCVI-1.0, contains all the genes of wild-type M. genitalium G37 except MG408, which was disrupted by an antibiotic marker to block pathogenicity and to allow for selection. To identify the genome as synthetic, we inserted "watermarks" at intergenic sites known to tolerate transposon insertions. Overlapping "cassettes" of 5 to 7 kilobases (kb), assembled from chemically synthesized oligonucleotides, were joined by in vitro recombination to produce intermediate assemblies of approximately 24 kb, 72 kb ("1/8 genome"), and 144 kb ("1/4 genome"), which were all cloned as bacterial artificial chromosomes in Escherichia coli. Most of these intermediate clones were sequenced, and clones of all four 1/4 genomes with the correct sequence were identified. The complete synthetic genome was assembled by transformation-associated recombination cloning in the yeast Saccharomyces cerevisiae, then isolated and sequenced. A clone with the correct sequence was identified. The methods described here will be generally useful for constructing large DNA molecules from chemically synthesized pieces and also from combinations of natural and synthetic DNA segments.
|
69
|
Williamson SJ, Rusch DB, Yooseph S, Halpern AL, Heidelberg KB, Glass JI, Andrews-Pfannkoch C, Fadrosh D, Miller CS, Sutton G, Frazier M, Venter JC. The Sorcerer II Global Ocean Sampling Expedition: metagenomic characterization of viruses within aquatic microbial samples. PLoS One 2008; 3:e1456. [PMID: 18213365 PMCID: PMC2186209 DOI: 10.1371/journal.pone.0001456] [Citation(s) in RCA: 214] [Impact Index Per Article: 13.4] [Received: 08/31/2007] [Accepted: 12/12/2007] [Indexed: 12/02/2022] Open
Abstract
Viruses are the most abundant biological entities on our planet. Interactions between viruses and their hosts impact several important biological processes in the world's oceans such as horizontal gene transfer, microbial diversity and biogeochemical cycling. Interrogation of microbial metagenomic sequence data collected as part of the Sorcerer II Global Ocean Expedition (GOS) revealed a high abundance of viral sequences, representing approximately 3% of the total predicted proteins. Cluster analyses of the viral sequences revealed hundreds to thousands of viral genes encoding various metabolic and cellular functions. Quantitative analyses of viral genes of host origin performed on the viral fraction of aquatic samples confirmed the viral nature of these sequences and suggested that significant portions of aquatic viral communities behave as reservoirs of such genetic material. Distributional and phylogenetic analyses of these host-derived viral sequences also suggested that viral acquisition of environmentally relevant genes of host origin is a more abundant and widespread phenomenon than previously appreciated. The predominant viral sequences identified within microbial fractions originated from tailed bacteriophages and exhibited varying global distributions according to viral family. Recruitment of GOS viral sequence fragments against 27 complete aquatic viral genomes revealed that only one reference bacteriophage genome was highly abundant and was closely related, but not identical, to the cyanomyovirus P-SSM4. The co-distribution across all sampling sites of P-SSM4-like sequences with the dominant ecotype of its host, Prochlorococcus supports the classification of the viral sequences as P-SSM4-like and suggests that this virus may influence the abundance, distribution and diversity of one of the most dominant components of picophytoplankton in oligotrophic oceans. 
In summary, the abundance and broad geographical distribution of viral sequences within microbial fractions, the prevalence of genes among viral sequences that encode microbial physiological function and their distinct phylogenetic distribution lend strong support to the notion that viral-mediated gene acquisition is a common and ongoing mechanism for generating microbial diversity in the marine environment.
|
70
|
Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, van Belle C, Chandonia JM, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS, Strausberg RL, Frazier M, Venter JC. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol 2007; 5:e16. [PMID: 17355171 PMCID: PMC1821046 DOI: 10.1371/journal.pbio.0050016] [Citation(s) in RCA: 667] [Impact Index Per Article: 39.2] [Received: 03/24/2006] [Accepted: 08/15/2006] [Indexed: 02/04/2023] Open
Abstract
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature. 
The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. Given the wide-ranging roles microbes play in many ecosystems, metagenomics studies of microbial communities will reveal insights into protein families and their evolution. Because most microbes will not grow in the laboratory using current cultivation techniques, scientists have turned to cultivation-independent techniques to study microbial diversity. One such technique—shotgun sequencing—allows random sampling of DNA sequences to examine the genomic material present in a microbial community. We used shotgun sequencing to examine microbial communities in water samples collected by the Sorcerer II Global Ocean Sampling (GOS) expedition. Our analysis predicted more than six million proteins in the GOS data—nearly twice the number of proteins present in current databases. These predictions add tremendous diversity to known protein families and cover nearly all known prokaryotic protein families. Some of the predicted proteins had no similarity to any currently known proteins and therefore represent new families. A higher than expected fraction of these novel families is predicted to be of viral origin. We also found that several protein domains that were previously thought to be kingdom specific have GOS examples in other kingdoms. Our analysis opens the door for a multitude of follow-up protein family analyses and indicates that we are a long way from sampling all the protein families that exist in nature. The GOS data identified 6.12 million predicted proteins covering nearly all known prokaryotic protein families, and several new families. This almost doubles the number of known proteins and shows that we are far from identifying all the proteins in nature.
|
71
|
Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, Beeson K, Tran B, Smith H, Baden-Tillson H, Stewart C, Thorpe J, Freeman J, Andrews-Pfannkoch C, Venter JE, Li K, Kravitz S, Heidelberg JF, Utterback T, Rogers YH, Falcón LI, Souza V, Bonilla-Rosso G, Eguiarte LE, Karl DM, Sathyendranath S, Platt T, Bermingham E, Gallardo V, Tamayo-Castillo G, Ferrari MR, Strausberg RL, Nealson K, Friedman R, Frazier M, Venter JC. The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol 2007; 5:e77. [PMID: 17355176 PMCID: PMC1821060 DOI: 10.1371/journal.pbio.0050077] [Citation(s) in RCA: 1304] [Impact Index Per Article: 76.7] [Received: 07/14/2006] [Accepted: 01/16/2007] [Indexed: 11/19/2022] Open
Abstract
The world's oceans contain a complex mixture of micro-organisms that are for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed “fragment recruitment,” addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed “extreme assembly,” made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference. 
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS. Marine microbes remain elusive and mysterious, even though they are the most abundant life form in the ocean, form the base of the marine food web, and drive energy and nutrient cycling. We know so little about the vast majority of microbes because only a small percentage can be cultivated and studied in the lab. Here we report on the Global Ocean Sampling expedition, an environmental metagenomics project that aims to shed light on the role of marine microbes by sequencing their DNA without first needing to isolate individual organisms. A total of 41 different samples were taken from a wide variety of aquatic habitats collected over 8,000 km. The resulting 7.7 million sequencing reads provide an unprecedented look at the incredible diversity and heterogeneity in naturally occurring microbial populations. We have developed new bioinformatic methods to reconstitute large portions of both cultured and uncultured microbial genomes. Organism diversity is analyzed in relation to sampling locations and environmental pressures. Taken together, these data and analyses serve as a foundation for greatly expanding our understanding of individual microbial lineages and their evolution, the nature of marine microbial communities, and how they are impacted by and impact our world. The Sorcerer II GOS expedition, data sampling, and analysis are described. The immense diversity in the sequence data required novel comparative genomic assembly methods, which uncovered genomic differences that marker-based methods could not.
|
72
|
Kannan N, Taylor SS, Zhai Y, Venter JC, Manning G. Structural and functional diversity of the microbial kinome. PLoS Biol 2007; 5:e17. [PMID: 17355172 PMCID: PMC1821047 DOI: 10.1371/journal.pbio.0050017] [Citation(s) in RCA: 227] [Impact Index Per Article: 13.4] [Received: 05/18/2006] [Accepted: 09/20/2006] [Indexed: 11/19/2022] Open
Abstract
The eukaryotic protein kinase (ePK) domain mediates the majority of signaling and coordination of complex events in eukaryotes. By contrast, most bacterial signaling is thought to occur through structurally unrelated histidine kinases, though some ePK-like kinases (ELKs) and small molecule kinases are known in bacteria. Our analysis of the Global Ocean Sampling (GOS) dataset reveals that ELKs are as prevalent as histidine kinases and may play an equally important role in prokaryotic behavior. By combining GOS and public databases, we show that the ePK is just one subset of a diverse superfamily of enzymes built on a common protein kinase-like (PKL) fold. We explored this huge phylogenetic and functional space to cast light on the ancient evolution of this superfamily, its mechanistic core, and the structural basis for its observed diversity. We cataloged 27,677 ePKs and 18,699 ELKs, and classified them into 20 highly distinct families whose known members suggest regulatory functions. GOS data more than tripled the count of ELK sequences and enabled the discovery of novel families and classification and analysis of all ELKs. Comparison between and within families revealed ten key residues that are highly conserved across families. However, all but one of the ten residues has been eliminated in one family or another, indicating great functional plasticity. We show that loss of a catalytic lysine in two families is compensated by distinct mechanisms both involving other key motifs. This diverse superfamily serves as a model for further structural and functional analysis of enzyme evolution.
|
73
|
Yutin N, Suzuki MT, Teeling H, Weber M, Venter JC, Rusch DB, Béjà O. Assessing diversity and biogeography of aerobic anoxygenic phototrophic bacteria in surface waters of the Atlantic and Pacific Oceans using the Global Ocean Sampling expedition metagenomes. Environ Microbiol 2007; 9:1464-75. [PMID: 17504484 DOI: 10.1111/j.1462-2920.2007.01265.x] [Citation(s) in RCA: 141] [Impact Index Per Article: 8.3] [Indexed: 11/29/2022]
Abstract
Aerobic anoxygenic photosynthetic bacteria (AAnP) were recently proposed to be significant contributors to global oceanic carbon and energy cycles. However, AAnP abundance, spatial distribution, diversity and potential ecological importance remain poorly understood. Here we present metagenomic data from the Global Ocean Sampling expedition indicating that AAnP diversity and abundance vary in different oceanic regions. Furthermore, we show for the first time that the composition of AAnP assemblages changes between different oceanic regions, with specific bacterial assemblages adapted to open ocean or coastal areas respectively. Our results support the notion that marine AAnP populations are complex and dynamic, and compose an important fraction of bacterioplankton assemblages in certain oceanic areas.
|
74
|
Lartigue C, Glass JI, Alperovich N, Pieper R, Parmar PP, Hutchison CA, Smith HO, Venter JC. Genome transplantation in bacteria: changing one species to another. Science 2007; 317:632-8. [PMID: 17600181 DOI: 10.1126/science.1144622] [Citation(s) in RCA: 263] [Impact Index Per Article: 15.5] [Indexed: 11/02/2022]
Abstract
As a step toward propagation of synthetic genomes, we completely replaced the genome of a bacterial cell with one from another species by transplanting a whole genome as naked DNA. Intact genomic DNA from Mycoplasma mycoides large colony (LC), virtually free of protein, was transplanted into Mycoplasma capricolum cells by polyethylene glycol-mediated transformation. Cells selected for tetracycline resistance, carried by the M. mycoides LC chromosome, contain the complete donor genome and are free of detectable recipient genomic sequences. These cells that result from genome transplantation are phenotypically identical to the M. mycoides LC donor strain as judged by several criteria.
|
75
|
Carucci DJ, Gardner MJ, Tettelin H, Cummings LM, Smith HO, Adams MD, Venter JC, Hoffman SL. Sequencing the genome of Plasmodium falciparum. Curr Opin Infect Dis 2007; 11:531-4. [PMID: 17033418 DOI: 10.1097/00001432-199810000-00003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]
Abstract
Advances in microbial genomic sequencing have the potential to revolutionize the control of infectious diseases. Recently, a consortium of researchers and funding agencies from the United States and Great Britain have embarked on a project to sequence the genome from Plasmodium falciparum, the most important cause of human malaria. The Malaria Genome Sequencing Project has reached an important milestone with the completion of the entire DNA sequence and annotation of chromosome 2, a 950 kilobase chromosome of Plasmodium falciparum. This review article will provide an overview of the malaria genome sequencing project, highlight progress in the field of microbial sequencing, and suggest new directions for future malaria research.
|
76
|
Venter JC. Time 100 scientists & thinkers. Svante Paabo. TIME 2007; 169:116. [PMID: 17536326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/15/2023]
|
77
|
Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK, Batzer MA, Bustamante CD, Eichler EE, Hahn MW, Hardison RC, Makova KD, Miller W, Milosavljevic A, Palermo RE, Siepel A, Sikela JM, Attaway T, Bell S, Bernard KE, Buhay CJ, Chandrabose MN, Dao M, Davis C, Delehaunty KD, Ding Y, Dinh HH, Dugan-Rocha S, Fulton LA, Gabisi RA, Garner TT, Godfrey J, Hawes AC, Hernandez J, Hines S, Holder M, Hume J, Jhangiani SN, Joshi V, Khan ZM, Kirkness EF, Cree A, Fowler RG, Lee S, Lewis LR, Li Z, Liu YS, Moore SM, Muzny D, Nazareth LV, Ngo DN, Okwuonu GO, Pai G, Parker D, Paul HA, Pfannkoch C, Pohl CS, Rogers YH, Ruiz SJ, Sabo A, Santibanez J, Schneider BW, Smith SM, Sodergren E, Svatek AF, Utterback TR, Vattathil S, Warren W, White CS, Chinwalla AT, Feng Y, Halpern AL, Hillier LW, Huang X, Minx P, Nelson JO, Pepin KH, Qin X, Sutton GG, Venter E, Walenz BP, Wallis JW, Worley KC, Yang SP, Jones SM, Marra MA, Rocchi M, Schein JE, Baertsch R, Clarke L, Csürös M, Glasscock J, Harris RA, Havlak P, Jackson AR, Jiang H, Liu Y, Messina DN, Shen Y, Song HXZ, Wylie T, Zhang L, Birney E, Han K, Konkel MK, Lee J, Smit AFA, Ullmer B, Wang H, Xing J, Burhans R, Cheng Z, Karro JE, Ma J, Raney B, She X, Cox MJ, Demuth JP, Dumas LJ, Han SG, Hopkins J, Karimpour-Fard A, Kim YH, Pollack JR, Vinar T, Addo-Quaye C, Degenhardt J, Denby A, Hubisz MJ, Indap A, Kosiol C, Lahn BT, Lawson HA, Marklein A, Nielsen R, Vallender EJ, Clark AG, Ferguson B, Hernandez RD, Hirani K, Kehrer-Sawatzki H, Kolb J, Patil S, Pu LL, Ren Y, Smith DG, Wheeler DA, Schenck I, Ball EV, Chen R, Cooper DN, Giardine B, Hsu F, Kent WJ, Lesk A, Nelson DL, O'brien WE, Prüfer K, Stenson PD, Wallace JC, Ke H, Liu XM, Wang P, Xiang AP, Yang F, Barber GP, Haussler D, Karolchik D, Kern AD, Kuhn RM, Smith KE, Zwieg AS. Evolutionary and biomedical insights from the rhesus macaque genome. Science 2007; 316:222-34. 
[PMID: 17431167 DOI: 10.1126/science.1139247] [Citation(s) in RCA: 989] [Impact Index Per Article: 58.2] [Indexed: 12/12/2022]
Abstract
The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.
|
78
|
Venkatesh B, Kirkness EF, Loh YH, Halpern AL, Lee AP, Johnson J, Dandona N, Viswanathan LD, Tay A, Venter JC, Strausberg RL, Brenner S. Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii) genome. PLoS Biol 2007; 5:e101. [PMID: 17407382 PMCID: PMC1845163 DOI: 10.1371/journal.pbio.0050101] [Citation(s) in RCA: 265] [Impact Index Per Article: 15.6] [Received: 10/30/2006] [Accepted: 02/07/2007] [Indexed: 02/04/2023] Open
Abstract
Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras) provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4x coverage) and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element-like and long interspersed element-like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and conserved sequences between the human and elephant shark genomes is higher than that between human and teleost fish genomes. The elephant shark genome contains four putative Hox clusters, indicating that, unlike teleost fish genomes, it has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes.
|
79
|
Venkatesh B, Kirkness EF, Loh YH, Halpern AL, Lee AP, Johnson J, Dandona N, Viswanathan LD, Tay A, Venter JC, Strausberg RL, Brenner S. Ancient noncoding elements conserved in the human genome. Science 2007; 314:1892. [PMID: 17185593 DOI: 10.1126/science.1130708] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.5] [Indexed: 11/03/2022]
Abstract
Cartilaginous fishes represent the living group of jawed vertebrates that diverged from the common ancestor of human and teleost fish lineages about 530 million years ago. We generated approximately 1.4x genome sequence coverage for a cartilaginous fish, the elephant shark (Callorhinchus milii), and compared this genome with the human genome to identify conserved noncoding elements (CNEs). The elephant shark sequence revealed twice as many CNEs as were identified by whole-genome comparisons between teleost fishes and human. The ancient vertebrate-specific CNEs in the elephant shark and human genomes are likely to play key regulatory roles in vertebrate gene expression.
|
80
|
Venter JC. An Ointment for the Fly. Science 2006. [DOI: 10.1126/science.1134998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
Won for All. How the Drosophila Genome Was Sequenced. By Michael Ashburner. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2006. 123 pp. $19.95, £11. ISBN 0-87969-802-0.
The author offers a short, lively account (almost resembling diary extracts) of the collaborative effort that sequenced the Drosophila genome.
|
81
|
Goldberg SMD, Johnson J, Busam D, Feldblyum T, Ferriera S, Friedman R, Halpern A, Khouri H, Kravitz SA, Lauro FM, Li K, Rogers YH, Strausberg R, Sutton G, Tallon L, Thomas T, Venter E, Frazier M, Venter JC. A Sanger/pyrosequencing hybrid approach for the generation of high-quality draft assemblies of marine microbial genomes. Proc Natl Acad Sci U S A 2006; 103:11240-5. [PMID: 16840556 PMCID: PMC1544072 DOI: 10.1073/pnas.0604351103] [Citation(s) in RCA: 214] [Impact Index Per Article: 11.9] [Indexed: 11/18/2022] Open
Abstract
Since its introduction a decade ago, whole-genome shotgun sequencing (WGS) has been the main approach for producing cost-effective and high-quality genome sequence data. Until now, the Sanger sequencing technology that has served as a platform for WGS has not been truly challenged by emerging technologies. The recent introduction of the pyrosequencing-based 454 sequencing platform (454 Life Sciences, Branford, CT) offers a very promising sequencing technology alternative for incorporation in WGS. In this study, we evaluated the utility and cost-effectiveness of a hybrid sequencing approach using 3730xl Sanger data and 454 data to generate higher-quality lower-cost assemblies of microbial genomes compared to current Sanger sequencing strategies alone.
|
82
|
|
83
|
Glass JI, Assad-Garcia N, Alperovich N, Yooseph S, Lewis MR, Maruf M, Hutchison CA, Smith HO, Venter JC. Essential genes of a minimal bacterium. Proc Natl Acad Sci U S A 2006; 103:425-30. [PMID: 16407165 PMCID: PMC1324956 DOI: 10.1073/pnas.0510013103] [Citation(s) in RCA: 598] [Impact Index Per Article: 33.2] [Indexed: 11/18/2022] Open
Abstract
Mycoplasma genitalium has the smallest genome of any organism that can be grown in pure culture. It has a minimal metabolism and little genomic redundancy. Consequently, its genome is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. Using global transposon mutagenesis, we isolated and characterized gene disruption mutants for 100 different nonessential protein-coding genes. None of the 43 RNA-coding genes were disrupted. Herein, we identify 382 of the 482 M. genitalium protein-coding genes as essential, plus five sets of disrupted genes that encode proteins with potentially redundant essential functions, such as phosphate transport. Genes encoding proteins of unknown function constitute 28% of the essential protein-coding genes set. Disruption of some genes accelerated M. genitalium growth.
|
84
|
Hutchison CA, Smith HO, Pfannkoch C, Venter JC. Cell-free cloning using phi29 DNA polymerase. Proc Natl Acad Sci U S A 2005; 102:17332-6. [PMID: 16286637 PMCID: PMC1283157 DOI: 10.1073/pnas.0508809102] [Citation(s) in RCA: 137] [Impact Index Per Article: 7.2] [Indexed: 11/18/2022] Open
Abstract
We describe conditions for rolling-circle amplification (RCA) of individual DNA molecules 5-7 kb in size by >10^9-fold, using phi29 DNA polymerase. The principal difficulty with amplification of small amounts of template by RCA using phi29 DNA polymerase is "background" DNA synthesis that usually occurs when template is omitted, or at low template concentrations. Reducing the reaction volume while keeping the amount of template fixed increases the template concentration, resulting in a suppression of background synthesis. Cell-free cloning of single circular molecules by using phi29 DNA polymerase was achieved by carrying out the amplification reactions in very small volumes, typically 600 nl. This procedure allows cell-free cloning of individual synthetic DNA molecules that cannot be cloned in Escherichia coli, for example synthetic phage genomes carrying lethal mutations. It also allows cell-free cloning of genomic DNA isolated from bacteria. This DNA can be sequenced directly from the phi29 DNA polymerase reaction without further amplification. In contrast to PCR amplification, RCA using phi29 DNA polymerase does not produce mutant jackpots, and the high processivity of the enzyme eliminates stuttering at homopolymer tracts. Cell-free cloning has many potential applications to both natural and synthetic DNA. These include environmental DNA samples that have proven difficult to clone and synthetic genes encoding toxic products. The method may also speed genome sequencing by eliminating the need for biological cloning.
|
85
|
Remington KA, Heidelberg K, Venter JC. Taking metagenomic studies in context. Trends Microbiol 2005; 13:404. [PMID: 16039858 DOI: 10.1016/j.tim.2005.07.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 06/30/2005] [Revised: 06/30/2005] [Accepted: 07/08/2005] [Indexed: 11/25/2022]
|
86
|
|
87
|
Rand V, Huang J, Stockwell T, Ferriera S, Buzko O, Levy S, Busam D, Li K, Edwards JB, Eberhart C, Murphy KM, Tsiamouri A, Beeson K, Simpson AJG, Venter JC, Riggins GJ, Strausberg RL. Sequence survey of receptor tyrosine kinases reveals mutations in glioblastomas. Proc Natl Acad Sci U S A 2005; 102:14344-9. [PMID: 16186508 PMCID: PMC1242336 DOI: 10.1073/pnas.0507200102] [Citation(s) in RCA: 121] [Impact Index Per Article: 6.4] [Indexed: 12/11/2022] Open
Abstract
It is now clear that tyrosine kinases represent attractive targets for therapeutic intervention in cancer. Recent advances in DNA sequencing technology now provide the opportunity to survey mutational changes in cancer in a high-throughput and comprehensive manner. Here we report on the sequence analysis of members of the receptor tyrosine kinase (RTK) gene family in the genomes of glioblastoma brain tumors. Previous studies have identified a number of molecular alterations in glioblastoma, including amplification of the RTK epidermal growth factor receptor. We have identified mutations in two other RTKs: (i) fibroblast growth receptor 1, including the first mutations in the kinase domain in this gene observed in any cancer, and (ii) a frameshift mutation in the platelet-derived growth factor receptor-alpha gene. Fibroblast growth receptor 1, platelet-derived growth factor receptor-alpha, and epidermal growth factor receptor are all potential entry points to the phosphatidylinositol 3-kinase and mitogen-activated protein kinase intracellular signaling pathways already known to be important for neoplasia. Our results demonstrate the utility of applying DNA sequencing technology to systematically assess the coding sequence of genes within cancer genomes.
|
88
|
Gardner MJ, Bishop R, Shah T, de Villiers EP, Carlton JM, Hall N, Ren Q, Paulsen IT, Pain A, Berriman M, Wilson RJM, Sato S, Ralph SA, Mann DJ, Xiong Z, Shallom SJ, Weidman J, Jiang L, Lynn J, Weaver B, Shoaibi A, Domingo AR, Wasawo D, Crabtree J, Wortman JR, Haas B, Angiuoli SV, Creasy TH, Lu C, Suh B, Silva JC, Utterback TR, Feldblyum TV, Pertea M, Allen J, Nierman WC, Taracha ELN, Salzberg SL, White OR, Fitzhugh HA, Morzaria S, Venter JC, Fraser CM, Nene V. Genome Sequence of Theileria parva, a Bovine Pathogen That Transforms Lymphocytes. Science 2005; 309:134-7. [PMID: 15994558 DOI: 10.1126/science.1110439] [Citation(s) in RCA: 258] [Impact Index Per Article: 13.6] [Indexed: 11/02/2022]
Abstract
We report the genome sequence of Theileria parva, an apicomplexan pathogen causing economic losses to smallholder farmers in Africa. The parasite chromosomes exhibit limited conservation of gene synteny with Plasmodium falciparum, and its plastid-like genome represents the first example where all apicoplast genes are encoded on one DNA strand. We tentatively identify proteins that facilitate parasite segregation during host cell cytokinesis and contribute to persistent infection of transformed host cells. Several biosynthetic pathways are incomplete or absent, suggesting substantial metabolic dependence on the host cell. One protein family that may generate parasite antigenic diversity is not telomere-associated.
|
89
|
Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera, Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C, Glodek A, Gu Z, Jennings D, Kraft CL, Nguyen T, Pfannkoch CM, Sitter C, Sutton GG, Venter JC, Woodage T, Smith D, Lee HM, Gustafson E, Cahill P, Kana A, Doucette-Stamm L, Weinstock K, Fechtel K, Weiss RB, Dunn DM, Green ED, Blakesley RW, Bouffard GG, De Jong PJ, Osoegawa K, Zhu B, Marra M, Schein J, Bosdet I, Fjell C, Jones S, Krzywinski M, Mathewson C, Siddiqui A, Wye N, McPherson J, Zhao S, Fraser CM, Shetty J, Shatsman S, Geer K, Chen Y, Abramzon S, Nierman WC, Havlak PH, Chen R, Durbin KJ, Simons R, Ren Y, Song XZ, Li B, Liu Y, Qin X, Cawley S, Worley KC, Cooney AJ, D'Souza LM, Martin K, Wu JQ, Gonzalez-Garay ML, Jackson AR, Kalafus KJ, McLeod MP, Milosavljevic A, Virk D, Volkov A, Wheeler DA, Zhang Z, Bailey JA, Eichler EE, Tuzun E, Birney E, Mongin E, Ureta-Vidal A, Woodwark C, Zdobnov E, Bork P, Suyama M, Torrents D, Alexandersson M, Trask BJ, Young JM, Huang H, Wang H, Xing H, Daniels S, Gietzen D, Schmidt J, Stevens K, Vitt U, Wingrove J, Camara F, Mar Albà M, Abril JF, Guigo R, Smit A, Dubchak I, Rubin EM, Couronne O, Poliakov A, Hübner N, Ganten D, Goesele C, Hummel O, Kreitler T, Lee YA, Monti J, Schulz H, Zimdahl H, Himmelbauer H, Lehrach H, Jacob HJ, Bromberg S, Gullings-Handley J, Jensen-Seaman MI, Kwitek AE, Lazar J, Pasko D, Tonellato PJ, Twigger S, Ponting CP, Duarte JM, Rice S, Goodstadt L, Beatson SA, Emes RD, Winter EE, Webber C, Brandt P, Nyakatura G, Adetobi M, Chiaromonte F, Elnitski L, Eswara P, Hardison RC, Hou M, Kolbe D, Makova K, Miller W, Nekrutenko A, Riemer C, Schwartz S, Taylor J, Yang S, Zhang Y, Lindpaintner K, Andrews TD, Caccamo M, Clamp M, Clarke L, Curwen V, Durbin R, Eyras E, Searle SM, Cooper GM, Batzoglou 
S, Brudno M, Sidow A, Stone EA, Venter JC, Payseur BA, Bourque G, López-Otín C, Puente XS, Chakrabarti K, Chatterji S, Dewey C, Pachter L, Bray N, Yap VB, Caspi A, Tesler G, Pevzner PA, Haussler D, Roskin KM, Baertsch R, Clawson H, Furey TS, Hinrichs AS, Karolchik D, Kent WJ, Rosenbloom KR, Trumbower H, Weirauch M, Cooper DN, Stenson PD, Ma B, Brent M, Arumugam M, Shteynberg D, Copley RR, Taylor MS, Riethman H, Mudunuri U, Peterson J, Guyer M, Felsenfeld A, Old S, Mockrin S, Collins F. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 2004; 428:493-521. [PMID: 15057822 DOI: 10.1038/nature02426] [Citation(s) in RCA: 1512] [Impact Index Per Article: 75.6] [Received: 12/31/2003] [Accepted: 02/20/2004] [Indexed: 01/16/2023]
Abstract
The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.
|
90
|
Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO. Environmental genome shotgun sequencing of the Sargasso Sea. Science 2004; 304:66-74. [PMID: 15001713 DOI: 10.1126/science.1093857] [Citation(s) in RCA: 2428] [Impact Index Per Article: 121.4] [Indexed: 12/13/2022]
Abstract
We have applied "whole-genome shotgun sequencing" to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These data are estimated to derive from at least 1800 genomic species based on sequence relatedness, including 148 previously unknown bacterial phylotypes. We have identified over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors. Variation in species present and stoichiometry suggests substantial oceanic microbial diversity.
|
91
|
Istrail S, Sutton GG, Florea L, Halpern AL, Mobarry CM, Lippert R, Walenz B, Shatkay H, Dew I, Miller JR, Flanigan MJ, Edwards NJ, Bolanos R, Fasulo D, Halldorsson BV, Hannenhalli S, Turner R, Yooseph S, Lu F, Nusskern DR, Shue BC, Zheng XH, Zhong F, Delcher AL, Huson DH, Kravitz SA, Mouchard L, Reinert K, Remington KA, Clark AG, Waterman MS, Eichler EE, Adams MD, Hunkapiller MW, Myers EW, Venter JC. Whole-genome shotgun assembly and comparison of human genome assemblies. Proc Natl Acad Sci U S A 2004; 101:1916-21. [PMID: 14769938 PMCID: PMC357027 DOI: 10.1073/pnas.0307971100] [Citation(s) in RCA: 136] [Impact Index Per Article: 6.8] [Indexed: 01/08/2023]
Abstract
We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304-1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860-921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.
|
92
|
Carucci DJ, Gardner MJ, Tettelin H, Cummings LM, Smith HO, Adams MD, Hoffman SL, Venter JC. The Malaria Genome Sequencing Project. Expert Rev Mol Med 2004; 1998:1-9. [PMID: 14585131 DOI: 10.1017/s146239949800012x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
An international consortium of genome centres, advanced development teams and funding agencies has begun the task of sequencing the genome of the parasite Plasmodium falciparum, the most important cause of human malaria. Sequencing is proceeding chromosome by chromosome, and the annotated sequence of chromosome 2 is nearly finished. With the continual release of sequence data as they are generated, malaria researchers have access to a steady stream of genomic sequences and will soon have the complete annotation of all of the estimated 5000-7000 P. falciparum genes. The task will then be how to best apply these data to the development of new anti-malarial drugs, vaccines and diagnostic tests. This review provides a brief overview of the Malaria Genome Sequencing Project and suggests potential directions for future malaria research.
|
93
|
Smith HO, Hutchison CA, Pfannkoch C, Venter JC. Generating a synthetic genome by whole genome assembly: phiX174 bacteriophage from synthetic oligonucleotides. Proc Natl Acad Sci U S A 2003; 100:15440-5. [PMID: 14657399 PMCID: PMC307586 DOI: 10.1073/pnas.2237126100] [Citation(s) in RCA: 351] [Impact Index Per Article: 16.7] [Indexed: 11/18/2022]
Abstract
We have improved upon the methodology and dramatically shortened the time required for accurate assembly of 5- to 6-kb segments of DNA from synthetic oligonucleotides. As a test of this methodology, we have established conditions for the rapid (14-day) assembly of the complete infectious genome of bacteriophage phiX174 (5386 bp) from a single pool of chemically synthesized oligonucleotides. The procedure involves three key steps: (i) gel purification of pooled oligonucleotides to reduce contamination with molecules of incorrect chain length, (ii) ligation of the oligonucleotides under stringent annealing conditions (55 °C) to select against annealing of molecules with incorrect sequences, and (iii) assembly of ligation products into full-length genomes by polymerase cycling assembly, a nonexponential reaction in which each terminal oligonucleotide can be extended only once to produce a full-length molecule. We observed a discrete band of full-length assemblies upon gel analysis of the polymerase cycling assembly product, without any PCR amplification. PCR amplification was then used to obtain larger amounts of pure full-length genomes for circularization and infectivity measurements. The synthetic DNA had a lower infectivity than natural DNA, indicating approximately one lethal error per 500 bp. However, fully infectious phiX174 virions were recovered after electroporation into Escherichia coli. Sequence analysis of several infectious isolates verified the accuracy of these synthetic genomes. One such isolate had exactly the intended sequence. We propose to assemble larger genomes by joining separately assembled 5- to 6-kb segments; approximately 60 such segments would be required for a minimal cellular genome.
|
94
|
Kirkness EF, Bafna V, Halpern AL, Levy S, Remington K, Rusch DB, Delcher AL, Pop M, Wang W, Fraser CM, Venter JC. The Dog Genome: Survey Sequencing and Comparative Analysis. Science 2003; 301:1898-903. [PMID: 14512627 DOI: 10.1126/science.1086432] [Citation(s) in RCA: 393] [Impact Index Per Article: 18.7] [Indexed: 11/02/2022]
Abstract
A survey of the dog genome sequence (6.22 million sequence reads; 1.5x coverage) demonstrates the power of sample sequencing for comparative analysis of mammalian genomes and the generation of species-specific resources. More than 650 million base pairs (>25%) of dog sequence align uniquely to the human genome, including fragments of putative orthologs for 18,473 of 24,567 annotated human genes. Mutation rates, conserved synteny, repeat content, and phylogeny can be compared among human, mouse, and dog. A variety of polymorphic elements are identified that will be valuable for mapping the genetic basis of diseases and traits in the dog.
|
95
|
Abstract
Despite recent genetic evidence and the promise of individualized medicine, there is a continuing interest in using self-identified categories of race and ethnicity as variables in scientific and medical research. The U.S. Food and Drug Administration recently proposed a standardized approach for the collection of race and ethnicity data in clinical trials. We believe that this move fails to acknowledge new scientific data and recommend that relevant data from individuals be collected and used rather than broad group statistics. We also encourage that increased funding be committed to this important issue.
|
96
|
Broder S, Subramanian G, Venter JC. The Human Genome. Pharmacogenomics 2003. [DOI: 10.1002/3527600752.ch2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
97
|
Scherer SW, Cheung J, MacDonald JR, Osborne LR, Nakabayashi K, Herbrick JA, Carson AR, Parker-Katiraee L, Skaug J, Khaja R, Zhang J, Hudek AK, Li M, Haddad M, Duggan GE, Fernandez BA, Kanematsu E, Gentles S, Christopoulos CC, Choufani S, Kwasnicka D, Zheng XH, Lai Z, Nusskern D, Zhang Q, Gu Z, Lu F, Zeesman S, Nowaczyk MJ, Teshima I, Chitayat D, Shuman C, Weksberg R, Zackai EH, Grebe TA, Cox SR, Kirkpatrick SJ, Rahman N, Friedman JM, Heng HHQ, Pelicci PG, Lo-Coco F, Belloni E, Shaffer LG, Pober B, Morton CC, Gusella JF, Bruns GAP, Korf BR, Quade BJ, Ligon AH, Ferguson H, Higgins AW, Leach NT, Herrick SR, Lemyre E, Farra CG, Kim HG, Summers AM, Gripp KW, Roberts W, Szatmari P, Winsor EJT, Grzeschik KH, Teebi A, Minassian BA, Kere J, Armengol L, Pujana MA, Estivill X, Wilson MD, Koop BF, Tosi S, Moore GE, Boright AP, Zlotorynski E, Kerem B, Kroisel PM, Petek E, Oscier DG, Mould SJ, Döhner H, Döhner K, Rommens JM, Vincent JB, Venter JC, Li PW, Mural RJ, Adams MD, Tsui LC. Human chromosome 7: DNA sequence and biology. Science 2003; 300:767-72. [PMID: 12690205 PMCID: PMC2882961 DOI: 10.1126/science.1083423] [Citation(s) in RCA: 156] [Impact Index Per Article: 7.4] [Indexed: 01/07/2023]
Abstract
DNA sequence and annotation of the entire human chromosome 7, encompassing nearly 158 million nucleotides of DNA and 1917 gene structures, are presented. To generate a higher order description, additional structural features such as imprinted genes, fragile sites, and segmental duplications were integrated at the level of the DNA sequence with medical genetic data, including 440 chromosome rearrangement breakpoints associated with disease. This approach enabled the discovery of candidate genes for developmental diseases including autism.
|
98
|
Adams MD, Sutton GG, Smith HO, Myers EW, Venter JC. The independence of our genome assemblies. Proc Natl Acad Sci U S A 2003; 100:3025-6. [PMID: 16576752 PMCID: PMC152237 DOI: 10.1073/pnas.0637478100] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
|
99
|
Venter JC, Levy S, Stockwell T, Remington K, Halpern A. Massive parallelism, randomness and genomic advances. Nat Genet 2003; 33 Suppl:219-27. [PMID: 12610531 DOI: 10.1038/ng1114] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.9] [Indexed: 01/18/2023]
Abstract
In reviewing the past decade, it is clear that genomics was, and still is, driven by innovative technologies, perhaps more so than any other scientific area in recent memory. From the outset, computing, mathematics and new automated laboratory techniques have been key components in allowing the field to move forward rapidly. We highlight some key innovations that have come together to nurture the explosive growth that makes a new era of genomics a reality. We also document how these new approaches have fueled further innovations and discoveries.
|
100
|
|