51
Chubb D, Jefferys BR, Sternberg MJE, Kelley LA. Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe. Bioinformatics 2010; 26:2664-71. [PMID: 20843957 DOI: 10.1093/bioinformatics/btq527] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Databases of sequenced genomes are widely used to characterize the structure, function and evolutionary relationships of proteins. The ability to discern such relationships is widely expected to grow as sequencing projects provide novel information, bridging gaps in our map of the protein universe. RESULTS: We have plotted our progress in protein sequencing over the last two decades and found that the rate of novel sequence discovery is in a sustained period of decline. Consequently, PSI-BLAST, the most widely used method to detect remote evolutionary relationships, which relies upon the accumulation of novel sequence data, is now showing a plateau in performance. We interpret this trend as signalling our approach to a representative map of the protein universe and discuss its implications.
Affiliation(s)
- Daniel Chubb
- Department of Life Science, Imperial College London, London, UK.
52
Metabolic network analysis of Pseudomonas aeruginosa during chronic cystic fibrosis lung infection. J Bacteriol 2010; 192:5534-48. [PMID: 20709898 DOI: 10.1128/jb.00900-10] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.8] [Indexed: 12/15/2022]
Abstract
System-level modeling is beginning to be used to decipher high throughput data in the context of disease. In this study, we present an integration of expression microarray data with a genome-scale metabolic reconstruction of Pseudomonas aeruginosa in the context of a chronic cystic fibrosis (CF) lung infection. A genome-scale reconstruction of P. aeruginosa metabolism was tailored to represent the metabolic states of two clonally related lineages of P. aeruginosa isolated from the lungs of a CF patient at different points over a 44-month time course, giving a mechanistic glimpse into how the bacterial metabolism adapts over time in the CF lung. Metabolic capacities were analyzed to determine how tradeoffs between growth and other important cellular processes shift during disease progression. Genes whose knockouts were either significantly growth reducing or lethal in silico were also identified for each time point and serve as hypotheses for future drug targeting efforts specific to the stages of disease progression.
53
Qiu Y, Cho BK, Park YS, Lovley D, Palsson BØ, Zengler K. Structural and operational complexity of the Geobacter sulfurreducens genome. Genome Res 2010; 20:1304-11. [PMID: 20592237 DOI: 10.1101/gr.107540.110] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.1] [Indexed: 01/02/2023]
Abstract
Prokaryotic genomes can be annotated based on their structural, operational, and functional properties. These annotations provide the pivotal scaffold for understanding cellular functions on a genome-scale, such as metabolism and transcriptional regulation. Here, we describe a systems approach to simultaneously determine the structural and operational annotation of the Geobacter sulfurreducens genome. Integration of proteomics, transcriptomics, RNA polymerase, and sigma factor-binding information with deep-sequencing-based analysis of primary 5'-end transcripts allowed for a most precise annotation. The structural annotation comprises numerous previously undetected genes, noncoding RNAs, prevalent leaderless mRNA transcripts, and antisense transcripts. When compared with other prokaryotes, we found that the number of antisense transcripts inversely correlated with genome size. The operational annotation consists of 1453 operons, 22% of which have multiple transcription start sites that use different RNA polymerase holoenzymes. Several operons with multiple transcription start sites encoded genes with essential functions, giving insight into the regulatory complexity of the genome. The experimentally determined structural and operational annotations can be combined with functional annotation, yielding a new three-level annotation that greatly expands our understanding of prokaryotic genomes.
Affiliation(s)
- Yu Qiu
- Department of Bioengineering, University of California, San Diego, La Jolla, California 92093, USA
54
Recent progress and new challenges in metagenomics for biotechnology. Biotechnol Lett 2010; 32:1351-9. [PMID: 20495950 DOI: 10.1007/s10529-010-0306-9] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Received: 03/11/2010] [Accepted: 05/08/2010] [Indexed: 01/30/2023]
Abstract
A brief historical perspective on metagenomics is given followed by a discussion of the rapid progress in this field largely defined by transition to the next generation sequencing technologies. Problems and challenges connected to this transition are also addressed. The review focuses on recent literature describing metagenomic approaches connecting sequence information to functionality that are especially relevant to biotechnological applications, including metagenomics of specialized or enriched microbial communities, metagenomics combined with specific labeling techniques, metatranscriptomics and metaproteomics.
55
GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat Methods 2010; 7:455-7. [PMID: 20436475 DOI: 10.1038/nmeth.1457] [Citation(s) in RCA: 450] [Impact Index Per Article: 32.1] [Received: 01/06/2010] [Accepted: 03/26/2010] [Indexed: 11/09/2022]
Abstract
We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies.
56
Leggewie C, Puls M, Eggert T. Identifizierung und Expression neuer Biokatalysatoren [Identification and expression of novel biocatalysts]. Chem Ing Tech 2010. [DOI: 10.1002/cite.200900153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
57
Armengaud J. Proteogenomics and systems biology: quest for the ultimate missing parts. Expert Rev Proteomics 2010; 7:65-77. [DOI: 10.1586/epr.09.104] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 08/30/2023]
58
Lagesen K, Ussery DW, Wassenaar TM. Genome update: the 1000th genome--a cautionary tale. Microbiology 2010; 156:603-608. [PMID: 20093288 DOI: 10.1099/mic.0.038257-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022]
Abstract
There are now more than 1000 sequenced prokaryotic genomes deposited in public databases and available for analysis. Currently, although the sequence databases GenBank, DNA Database of Japan and EMBL are synchronized continually, there are slight differences in content at the genomes level for a variety of logistical reasons, including differences in format and loading errors, such as those caused by file transfer protocol interruptions. This means that the 1000th genome will be different in the various databases. Some of the data on the highly accessed web pages are inaccurate, leading to false conclusions for example about the largest bacterial genome sequenced. Biological diversity is far greater than many have thought. For example, analysis of multiple Escherichia coli genomes has led to an estimate of around 45 000 gene families - more genes than are recognized in the human genome. Moreover, of the 1000 genomes available, not a single protein is conserved across all genomes. Excluding the members of the Archaea, only a total of four genes are conserved in all bacteria: two protein genes and two RNA genes.
Affiliation(s)
- Karin Lagesen
- Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, Oslo University Hospital, Rikshospitalet, NO-0027, Oslo, Norway, and Department of Informatics, University of Oslo, PO Box 1080 Blindern, NO-0316, Oslo, Norway; Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, 2800 Lyngby, Denmark
- Dave W Ussery
- Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, 2800 Lyngby, Denmark
- Trudy M Wassenaar
- Molecular Microbiology and Genomics Consultants, Zotzenheim, Germany; Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, 2800 Lyngby, Denmark
59
Martin NF, Martin F. From Galactic archeology to soil metagenomics - surfing on massive data streams. New Phytol 2010; 185:343-347. [PMID: 20088974 DOI: 10.1111/j.1469-8137.2009.03138.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 05/28/2023]
60
Liolios K, Chen IMA, Mavromatis K, Tavernarakis N, Hugenholtz P, Markowitz VM, Kyrpides NC. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 2010; 38:D346-54. [PMID: 19914934 PMCID: PMC2808860 DOI: 10.1093/nar/gkp848] [Citation(s) in RCA: 312] [Impact Index Per Article: 22.3] [Received: 09/19/2009] [Accepted: 09/22/2009] [Indexed: 11/14/2022]
Abstract
The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/
Affiliation(s)
- Konstantinos Liolios
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- I-Min A. Chen
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Konstantinos Mavromatis
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Nektarios Tavernarakis
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Philip Hugenholtz
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Victor M. Markowitz
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Nikos C. Kyrpides
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
61
Veiga DFT, Dutta B, Balázsi G. Network inference and network response identification: moving genome-scale data to the next level of biological discovery. Mol Biosyst 2009; 6:469-80. [PMID: 20174676 DOI: 10.1039/b916989j] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 11/21/2022]
Abstract
The escalating amount of genome-scale data demands a pragmatic stance from the research community. How can we utilize this deluge of information to better understand biology, cure diseases, or engage cells in bioremediation or biomaterial production for various purposes? A research pipeline moving new sequence, expression and binding data towards practical end goals seems to be necessary. While most individual researchers are not motivated by such well-articulated pragmatic end goals, the scientific community has already self-organized to successfully convert genomic data into fundamentally new biological knowledge and practical applications. Here we review two important steps in this workflow: network inference and network response identification, applied to transcriptional regulatory networks. Among network inference methods, we concentrate on relevance networks due to their conceptual simplicity. We classify and discuss network response identification approaches as either data-centric or network-centric. Finally, we conclude with an outlook on what is still missing from these approaches and what may be ahead on the road to biological discovery.
Affiliation(s)
- Diogo F T Veiga
- Department of Systems Biology-Unit 950, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA.
62
Abstract
Over the last few decades, advances in cultivation-independent methods have significantly contributed to our understanding of microbial diversity and community composition in the environment. At the same time, cultivation-dependent methods have thrived, and the growing number of organisms obtained thereby have allowed for detailed studies of their physiology and genetics. Still, most microorganisms are recalcitrant to cultivation. This review not only conveys current knowledge about different isolation and cultivation strategies but also discusses what implications can be drawn from pure culture work for studies in microbial ecology. Specifically, in the light of single-cell individuality and genome heterogeneity, it becomes important to evaluate population-wide measurements carefully. An overview of various approaches in microbial ecology is given, and the cell as a central unit for understanding processes on a community level is discussed.
Affiliation(s)
- Karsten Zengler
- Bioengineering Department, University of California, San Diego, La Jolla, California 92093, USA.
63
Methodologies to increase the transformation efficiencies and the range of bacteria that can be transformed. Appl Microbiol Biotechnol 2009; 85:1301-13. [DOI: 10.1007/s00253-009-2349-1] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.7] [Received: 09/02/2009] [Revised: 11/06/2009] [Accepted: 11/07/2009] [Indexed: 10/20/2022]
64
Zhang W, Li F, Nie L. Integrating multiple 'omics' analysis for microbial biology: application and methodologies. Microbiology 2009; 156:287-301. [PMID: 19910409 DOI: 10.1099/mic.0.034793-0] [Citation(s) in RCA: 281] [Impact Index Per Article: 18.7] [Indexed: 01/08/2023]
Abstract
Recent advances in various 'omics' technologies enable quantitative monitoring of the abundance of various biological molecules in a high-throughput manner, and thus allow determination of their variation between different biological states on a genomic scale. Several popular 'omics' platforms that have been used in microbial systems biology include transcriptomics, which measures mRNA transcript levels; proteomics, which quantifies protein abundance; metabolomics, which determines abundance of small cellular metabolites; interactomics, which resolves the whole set of molecular interactions in cells; and fluxomics, which establishes dynamic changes of molecules within a cell over time. However, no single 'omics' analysis can fully unravel the complexities of fundamental microbial biology. Therefore, integration of multiple layers of information, the multi-'omics' approach, is required to acquire a precise picture of living micro-organisms. In spite of this being a challenging task, some attempts have been made recently to integrate heterogeneous 'omics' datasets in various microbial systems and the results have demonstrated that the multi-'omics' approach is a powerful tool for understanding the functional principles and dynamics of total cellular systems. This article reviews some basic concepts of various experimental 'omics' approaches, recent application of the integrated 'omics' for exploring metabolic and regulatory mechanisms in microbes, and advances in computational and statistical methodologies associated with integrated 'omics' analyses. Online databases and bioinformatic infrastructure available for integrated 'omics' analyses are also briefly discussed.
Affiliation(s)
- Weiwen Zhang
- Center for Ecogenomics, Biodesign Institute, Arizona State University, Tempe, AZ 85287-6501, USA
- Feng Li
- Division of Biometrics II, Office of Biometrics/OTS/CDER/FDA, Silver Spring, MD 20993-0002, USA
- Lei Nie
- Division of Biometrics IV, Office of Biometrics/OTS/CDER/FDA, Silver Spring, MD 20993-0002, USA
65
Microbial community genomics in eastern Mediterranean Sea surface waters. ISME J 2009; 4:78-87. [PMID: 19693100 DOI: 10.1038/ismej.2009.92] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Indexed: 11/09/2022]
Abstract
Offshore waters of the eastern Mediterranean Sea are one of the most oligotrophic regions on Earth in which the primary productivity is phosphorus limited. To study the unexplored function and physiology of microbes inhabiting this system, we have analyzed a genomic library from the eastern Mediterranean Sea surface waters by sequencing both termini of nearly 5000 clones. Genome recruitment strategies showed that the majority of high-scoring pairs corresponded to genomes from the Alphaproteobacteria (SAR11-like and Rhodobacterales), Cyanobacteria (Synechococcus and high-light adapted Prochlorococcus) and diverse uncultured Gammaproteobacteria. The community structure observed, as evaluated by either protein similarity scores or metabolic potential, was similar to that found in the euphotic zone of the ALOHA station off Hawaii but very different from that of deep aphotic zones in both the Mediterranean Sea and the Pacific Ocean. In addition, a strong enrichment toward phosphate and phosphonate uptake and utilization metabolism was also observed.