Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Liolios K, Tavernarakis N, Hugenholtz P, Kyrpides NC. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. Nucleic Acids Res 2006;34:D332-4. [PMID: 16381880 PMCID: PMC1347507 DOI: 10.1093/nar/gkj145] [Citation(s) in RCA: 196] [Impact Index Per Article: 10.9] [Indexed: 01/11/2023]

Cited by Other Article(s)

Loci Encoding Compounds Potentially Active against Drug-Resistant Pathogens amidst a Decreasing Pool of Novel Antibiotics. Appl Environ Microbiol 2019;85:AEM.01438-19. [PMID: 31540982 PMCID: PMC6856318 DOI: 10.1128/aem.01438-19] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/27/2019] [Accepted: 09/12/2019] [Indexed: 12/13/2022]

Abstract

Since the discovery of penicillin, microbes have been a source of antibiotics that inhibit the growth of pathogens. However, with the evolution of multidrug-resistant (MDR) strains, it remains unclear if there is an abundant or limited supply of natural products to be discovered that are effective against MDR isolates. To identify strains that are antagonistic to pathogens, we examined a set of 471 globally derived environmental Pseudomonas strains (env-Ps) for activity against a panel of 65 pathogens including Achromobacter spp., Burkholderia spp., Pseudomonas aeruginosa, and Stenotrophomonas spp. isolated from the lungs of cystic fibrosis (CF) patients. From more than 30,000 competitive interactions, 1,530 individual inhibitory events were observed. While strains from water habitats were not proportionate in antagonistic activity, MDR CF-derived pathogens (CF-Ps) were less susceptible to inhibition by env-Ps, suggesting that fewer natural products are effective against MDR strains. These results advocate for a directed strategy to identify unique drugs. To facilitate discovery of antibiotics against the most resistant pathogens, we developed a workflow in which phylogenetic and antagonistic data were merged to identify strains that inhibit MDR CF-Ps and subjected those env-Ps to transposon mutagenesis. Six different biosynthetic gene clusters (BGCs) were identified from four strains whose products inhibited pathogens including carbapenem-resistant P. aeruginosa. BGCs were rare in databases, suggesting the production of novel antibiotics. This strategy can be utilized to facilitate the discovery of needed antibiotics that are potentially active against the most drug-resistant pathogens.

IMPORTANCE Carbapenem-resistant P. aeruginosa is difficult to treat and has been deemed by the World Health Organization as a priority one pathogen for which antibiotics are most urgently needed. Although metagenomics and bioinformatic studies suggest that natural bacteria remain a source of novel compounds, the identification of genes and their products specific to activity against MDR pathogens remains problematic. Here, we examine water-derived pseudomonads and identify gene clusters whose compounds inhibit CF-derived MDR pathogens, including carbapenem-resistant P. aeruginosa.

Zhang M, Zundel Z, Myers CJ. SBOLExplorer: Data Infrastructure and Data Mining for Genetic Design Repositories. ACS Synth Biol 2019;8:2287-2294. [PMID: 31532640 DOI: 10.1021/acssynbio.9b00089] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/30/2022]

Computational methods and tools for binding site recognition between proteins and small molecules: from classical geometrical approaches to modern machine learning strategies. J Comput Aided Mol Des 2019;33:887-903. [PMID: 31628659 DOI: 10.1007/s10822-019-00235-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/01/2019] [Accepted: 10/11/2019] [Indexed: 10/25/2022]

Mishra A, Pokhrel P, Hoque MT. StackDPPred: a stacking based prediction of DNA-binding protein from sequence. Bioinformatics 2018;35:433-441. [DOI: 10.1093/bioinformatics/bty653] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 03/27/2018] [Accepted: 07/18/2018] [Indexed: 12/12/2022]

Wagner A, Norris S, Chatterjee P, Morris PF, Wildschutte H. Aquatic Pseudomonads Inhibit Oomycete Plant Pathogens of Glycine max. Front Microbiol 2018;9:1007. [PMID: 29896163 PMCID: PMC5986895 DOI: 10.3389/fmicb.2018.01007] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 12/06/2017] [Accepted: 04/30/2018] [Indexed: 11/17/2022]

Advanced In Silico Tools for Designing of Antigenic Epitope as Potential Vaccine Candidates Against Coronavirus. Bioinformatics: Sequences, Structures, Phylogeny 2018. [PMCID: PMC7120312 DOI: 10.1007/978-981-13-1562-6_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

Mukherjee S, Stamatis D, Bertsch J, Ovchinnikova G, Verezemska O, Isbandi M, Thomas AD, Ali R, Sharma K, Kyrpides NC, Reddy TBK. Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements. Nucleic Acids Res 2017;45:D446-D456. [PMID: 27794040 PMCID: PMC5210664 DOI: 10.1093/nar/gkw992] [Citation(s) in RCA: 135] [Impact Index Per Article: 19.3] [Received: 09/20/2016] [Revised: 10/11/2016] [Accepted: 10/19/2016] [Indexed: 01/28/2023]

Brumm PJ, Land ML, Mead DA. Complete genome sequences of Geobacillus sp. WCH70, a thermophilic strain isolated from wood compost. Stand Genomic Sci 2016;11:33. [PMID: 27123157 PMCID: PMC4847372 DOI: 10.1186/s40793-016-0153-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 08/06/2015] [Accepted: 04/19/2016] [Indexed: 11/10/2022]

Brumm P, Land ML, Hauser LJ, Jeffries CD, Chang YJ, Mead DA. Complete genome sequences of Geobacillus sp. Y412MC52, a xylan-degrading strain isolated from obsidian hot spring in Yellowstone National Park. Stand Genomic Sci 2015;10:81. [PMID: 26500717 PMCID: PMC4617443 DOI: 10.1186/s40793-015-0075-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/29/2014] [Accepted: 10/09/2015] [Indexed: 11/10/2022]

Brumm P, Land ML, Hauser LJ, Jeffries CD, Chang YJ, Mead DA. Complete genome sequences of Geobacillus sp. Y412MC52, a xylan-degrading strain isolated from obsidian hot spring in Yellowstone National Park. Stand Genomic Sci 2015. [PMID: 26500717 DOI: 10.1186/s40793-015-0075-0 10.1186/s40793-016-0133-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]

Brumm PJ, Land ML, Mead DA. Complete genome sequence of Geobacillus thermoglucosidasius C56-YS93, a novel biomass degrader isolated from obsidian hot spring in Yellowstone National Park. Stand Genomic Sci 2015;10:73. [PMID: 26442136 PMCID: PMC4593210 DOI: 10.1186/s40793-015-0031-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 09/08/2014] [Accepted: 06/29/2015] [Indexed: 11/29/2022]

Xu R, Zhou J, Wang H, He Y, Wang X, Liu B. Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation. BMC Syst Biol 2015;9 Suppl 1:S10. [PMID: 25708928 PMCID: PMC4331676 DOI: 10.1186/1752-0509-9-s1-s10] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Indexed: 02/05/2023]

Abstract

BACKGROUND

DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation. Several computational methods have been proposed in the literature for identifying DNA-binding proteins. However, most of them cannot provide a valuable knowledge base for our understanding of DNA-protein interactions.

RESULTS

We first present a new protein sequence encoding method called PSSM Distance Transformation, and then construct a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, the PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are transformed into uniform numeric representations by the distance transformation scheme. Lastly, the resulting uniform numeric representations are input into an SVM classifier for prediction, which determines whether a sequence can bind to DNA. In a benchmark test on 525 DNA-binding and 550 non-DNA-binding proteins using jackknife validation, the present model achieved an ACC of 79.96%, MCC of 0.622 and AUC of 86.50%. This performance is considerably better than most of the existing state-of-the-art predictive methods. When tested on a recently constructed independent dataset, PDB186, SVM-PSSM-DT also achieved the best performance, with ACC of 80.00%, MCC of 0.647 and AUC of 87.40%, and outperformed some existing state-of-the-art methods.

CONCLUSIONS

The experimental results demonstrate that PSSM Distance Transformation is a viable protein sequence encoding method and that SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins. A user-friendly web server for SVM-PSSM-DT was constructed, which is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/PSSM-DT/.

Brumm P, Land ML, Hauser LJ, Jeffries CD, Chang YJ, Mead DA. Complete genome sequences of Geobacillus sp. Y412MC52, a xylan-degrading strain isolated from obsidian hot spring in Yellowstone National Park. Stand Genomic Sci 2015. [PMID: 26500717 PMCID: PMC4617443 DOI: 10.1186/s40793-015-0075-0+10.1186/s40793-016-0133-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2023]

Urban M, Pant R, Raghunath A, Irvine AG, Pedro H, Hammond-Kosack KE. The Pathogen-Host Interactions database (PHI-base): additions and future developments. Nucleic Acids Res 2015;43:D645-55. [PMID: 25414340 PMCID: PMC4383963 DOI: 10.1093/nar/gku1165] [Citation(s) in RCA: 151] [Impact Index Per Article: 16.8] [Received: 09/19/2014] [Revised: 10/30/2014] [Accepted: 10/30/2014] [Indexed: 12/12/2022]

Marcus S, Lee H, Schatz MC. SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips. Bioinformatics 2014;30:3476-83. [PMID: 25398610 DOI: 10.1093/bioinformatics/btu756] [Citation(s) in RCA: 81] [Impact Index Per Article: 8.1] [Indexed: 11/14/2022]

Reddy TBK, Thomas AD, Stamatis D, Bertsch J, Isbandi M, Jansson J, Mallajosyula J, Pagani I, Lobos EA, Kyrpides NC. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification. Nucleic Acids Res 2014;43:D1099-106. [PMID: 25348402 DOI: 10.1093/nar/gku950] [Citation(s) in RCA: 259] [Impact Index Per Article: 25.9] [Indexed: 11/14/2022]

Simonyan V, Mazumder R. High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes (Basel) 2014;5:957-81. [PMID: 25271953 PMCID: PMC4276921 DOI: 10.3390/genes5040957] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 09/11/2014] [Revised: 09/22/2014] [Accepted: 09/22/2014] [Indexed: 12/30/2022]

Lua RC, Marciano DC, Katsonis P, Adikesavan AK, Wilkins AD, Lichtarge O. Prediction and redesign of protein-protein interactions. Prog Biophys Mol Biol 2014;116:194-202. [PMID: 24878423 DOI: 10.1016/j.pbiomolbio.2014.05.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 02/25/2014] [Revised: 05/02/2014] [Accepted: 05/17/2014] [Indexed: 12/14/2022]

Lin J, Qian J. Systems biology approach to integrative comparative genomics. Expert Rev Proteomics 2014;4:107-19. [PMID: 17288519 DOI: 10.1586/14789450.4.1.107] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 01/21/2023]

Cipriano MJ, Novichkov PN, Kazakov AE, Rodionov DA, Arkin AP, Gelfand MS, Dubchak I. RegTransBase--a database of regulatory sequences and interactions based on literature: a resource for investigating transcriptional regulation in prokaryotes. BMC Genomics 2013;14:213. [PMID: 23547897 PMCID: PMC3639892 DOI: 10.1186/1471-2164-14-213] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 11/15/2012] [Accepted: 03/22/2013] [Indexed: 11/10/2022]

Abstract

Background

Due to the constantly growing number of sequenced microbial genomes, comparative genomics has been playing a major role in the investigation of regulatory interactions in bacteria. Regulon inference mostly remains a field of semi-manual examination since absence of a knowledgebase and informatics platform for automated and systematic investigation restricts opportunities for computational prediction. Additionally, confirming computationally inferred regulons by experimental data is critically important.

Description

RegTransBase is an open-access platform with a user-friendly web interface publicly available at http://regtransbase.lbl.gov. It consists of two databases: a manually collected hierarchical regulatory interactions database based on more than 7000 scientific papers, which can serve as a knowledgebase for verification of predictions, and a large set of expert-curated transcription factor binding sites used in regulon inference by a variety of tools. RegTransBase captures the knowledge from published scientific literature using controlled vocabularies and contains various types of experimental data, such as: the activation or repression of transcription by an identified direct regulator; determination of the transcriptional regulatory function of a protein (or RNA) directly binding to DNA or RNA; mapping of binding sites for a regulatory protein; characterization of regulatory mutations. Analysis of the data collected from literature resulted in the creation of Putative Regulons from Experimental Data that are also available in RegTransBase.

Conclusions

RegTransBase is a powerful user-friendly platform for the investigation of regulation in prokaryotes. It uses a collection of validated regulatory sequences that can be easily extracted and used to infer regulatory interactions by comparative genomics techniques thus assisting researchers in the interpretation of transcriptional regulation data.

Tiwari MK, Singh R, Singh RK, Kim IW, Lee JK. Computational approaches for rational design of proteins with novel functionalities. Comput Struct Biotechnol J 2012;2:e201209002. [PMID: 24688643 PMCID: PMC3962203 DOI: 10.5936/csbj.201209002] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 05/31/2012] [Revised: 08/17/2012] [Accepted: 08/23/2012] [Indexed: 11/22/2022]

Plant and bacterial systems biology as platform for plant synthetic bio(techno)logy. J Biotechnol 2012;160:80-90. [DOI: 10.1016/j.jbiotec.2012.01.014] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 10/17/2011] [Revised: 01/10/2012] [Accepted: 01/17/2012] [Indexed: 11/17/2022]

Defining sequence space and reaction products within the cyanuric acid hydrolase (AtzD)/barbiturase protein family. J Bacteriol 2012;194:4579-88. [PMID: 22730121 DOI: 10.1128/jb.00791-12] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/20/2022]

Lee KS, Kim RN, Yoon BH, Kim DS, Choi SH, Kim DW, Nam SH, Kim A, Kang A, Park KH, Jung JE, Chae SH, Park HS. Bacterial genome mapper: A comparative bacterial genome mapping tool. Bioinformation 2012;8:532-4. [PMID: 22829725 PMCID: PMC3398773 DOI: 10.6026/97320630008532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2012] [Accepted: 06/03/2012] [Indexed: 11/29/2022]

Affiliation(s)

Kang Seon Lee Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea University of Science and Technology (UST), Daejeon 305-333, Korea These authors contributed equally to this work
Ryong Nam Kim Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea These authors contributed equally to this work
Byoung Ha Yoon Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea University of Science and Technology (UST), Daejeon 305-333, Korea These authors contributed equally to this work
Dae Soo Kim Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
Sang Haeng Choi Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
Dong Wook Kim Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
Seong Hyeuk Nam Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
Aeri Kim Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
Aram Kang Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea University of Science and Technology (UST), Daejeon 305-333, Korea
Kun Hyang Park Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
Jae Eun Jung Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
Sung Hwa Chae Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
Hong Seog Park Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea University of Science and Technology (UST), Daejeon 305-333, Korea

Kumar A, Suthers PF, Maranas CD. MetRxn: a knowledgebase of metabolites and reactions spanning metabolic models and databases. BMC Bioinformatics 2012;13:6. [PMID: 22233419 PMCID: PMC3277463 DOI: 10.1186/1471-2105-13-6] [Citation(s) in RCA: 100] [Impact Index Per Article: 8.3] [Received: 06/27/2011] [Accepted: 01/10/2012] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Increasingly, metabolite and reaction information is organized in the form of genome-scale metabolic reconstructions that describe the reaction stoichiometry, directionality, and gene to protein to reaction associations. A key bottleneck in the pace of reconstruction of new, high-quality metabolic models is the inability to directly make use of metabolite/reaction information from biological databases or other models due to incompatibilities in content representation (i.e., metabolites with multiple names across databases and models), stoichiometric errors such as elemental or charge imbalances, and incomplete atomistic detail (e.g., use of generic R-group or non-explicit specification of stereo-specificity).

DESCRIPTION

MetRxn is a knowledgebase that includes standardized metabolite and reaction descriptions by integrating information from BRENDA, KEGG, MetaCyc, Reactome.org and 44 metabolic models into a single unified data set. All metabolite entries have matched synonyms, resolved protonation states, and are linked to unique structures. All reaction entries are elementally and charge balanced. This is accomplished through the use of a workflow of lexicographic, phonetic, and structural comparison algorithms. MetRxn allows for the download of standardized versions of existing genome-scale metabolic models and the use of metabolic information for the rapid reconstruction of new ones.

CONCLUSIONS

The standardization in description allows for the direct comparison of the metabolite and reaction content between metabolic models and databases and the exhaustive prospecting of pathways for biotechnological production. This ever-growing dataset currently consists of over 76,000 metabolites participating in more than 72,000 reactions (including unresolved entries). MetRxn is hosted on a web-based platform that uses relational database models (MySQL).

Pagani I, Liolios K, Jansson J, Chen IMA, Smirnova T, Nosrat B, Markowitz VM, Kyrpides NC. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 2012;40:D571-9. [PMID: 22135293 PMCID: PMC3245063 DOI: 10.1093/nar/gkr1100] [Citation(s) in RCA: 375] [Impact Index Per Article: 31.3] [Received: 09/28/2011] [Revised: 11/02/2011] [Accepted: 11/03/2011] [Indexed: 12/03/2022]

Affiliation(s)

Ioanna Pagani Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
Konstantinos Liolios Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
Jakob Jansson Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
I-Min A. Chen Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
Tatyana Smirnova Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
Bahador Nosrat Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
Victor M. Markowitz Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
Nikos C. Kyrpides Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA

Siebers B, Zaparty M, Raddatz G, Tjaden B, Albers SV, Bell SD, Blombach F, Kletzin A, Kyrpides N, Lanz C, Plagens A, Rampp M, Rosinus A, von Jan M, Makarova KS, Klenk HP, Schuster SC, Hensel R. The complete genome sequence of Thermoproteus tenax: a physiologically versatile member of the Crenarchaeota. PLoS One 2011;6:e24222. [PMID: 22003381 PMCID: PMC3189178 DOI: 10.1371/journal.pone.0024222] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 09/20/2010] [Accepted: 08/08/2011] [Indexed: 11/18/2022]

Affiliation(s)

Bettina Siebers Faculty of Chemistry, Biofilm Centre, Molecular Enzyme Technology and Biochemistry, University of Duisburg-Essen, Essen, Germany
Melanie Zaparty Institute for Molecular and Cellular Anatomy, University of Regensburg, Regensburg, Germany
Guenter Raddatz Max-Planck-Institute for Biological Cybernetics, Tübingen, Germany
Britta Tjaden Prokaryotic RNA Biology, Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
Sonja-Verena Albers Molecular Biology of Archaea, Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
Steve D. Bell Sir William Dunn School of Pathology, Oxford University, Oxford, United Kingdom
Fabian Blombach Laboratory of Microbiology, Wageningen University, Wageningen, The Netherlands
Arnulf Kletzin Institute of Microbiology and Genetics, Technical University Darmstadt, Darmstadt, Germany
Nikos Kyrpides DOE Joint Genome Institute, Walnut Creek, California, United States of America
Christa Lanz Genome Centre, Max-Planck-Institute for Developmental Biology, Tuebingen, Germany
André Plagens Prokaryotic RNA Biology, Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
Markus Rampp Computer Centre Garching of the Max-Planck-Society (RZG), Max-Planck-Institute for Plasma Physics, München, Germany
Andrea Rosinus Genome Centre, Max-Planck-Institute for Developmental Biology, Tuebingen, Germany
Mathias von Jan DSMZ, German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
Kira S. Makarova National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, United States of America
Hans-Peter Klenk DSMZ, German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
Stephan C. Schuster Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, Pennsylvania, United States of America
Reinhard Hensel Prokaryotic RNA Biology, Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany

Nanni L, Lumini A, Gupta D, Garg A. Identifying bacterial virulent proteins by fusing a set of classifiers based on variants of Chou's pseudo amino acid composition and on evolutionary information. IEEE/ACM Trans Comput Biol Bioinform 2011;9:467-475. [PMID: 21860064 DOI: 10.1109/tcbb.2011.117] [Citation(s) in RCA: 113] [Impact Index Per Article: 8.7] [Indexed: 05/31/2023]

Raes J, Letunic I, Yamada T, Jensen LJ, Bork P. Toward molecular trait-based ecology through integration of biogeochemical, geographical and metagenomic data. Mol Syst Biol 2011;7:473. [PMID: 21407210 PMCID: PMC3094067 DOI: 10.1038/msb.2011.6] [Citation(s) in RCA: 148] [Impact Index Per Article: 11.4] [Received: 05/04/2010] [Accepted: 01/25/2011] [Indexed: 11/10/2022]

Abstract

Using metagenomic ‘parts lists’ to study microbial ecology remains a significant challenge. This work proposes a molecular trait-based approach to biogeography by integrating metagenomic data with external metadata and using functional community composition as readout.

Climatic factors drive functional and phylogenetic composition of ocean microbial communities.

Function dispersal is controlled by environmental conditions.

Functional richness has a clear latitudinal gradient and correlates with primary production.

Metagenomic data can be used as a predictor for ecosystem processes.

To understand the relationship between community composition and environment, functional readouts are the most direct. Metagenomic data enable such trait-based ecology at the molecular level.

Metagenomics (shotgun sequencing of pooled DNA of complete microbial communities) is widely used to investigate ecosystem functioning of environmental and clinical samples. However, the nature of this data (usually a gigantic collection of gene fragments of 1000s of organisms) makes it very hard to infer global patterns on microbial ecology of the environment at hand. To address important ecological questions such as ‘How do microbial communities adapt to the environmental conditions?’, ‘What drives the functional variation across the globe and to what extent do genes disperse?’ and ‘What drives variation of CO₂ uptake across different locations and communities?’, we integrated 25 ocean metagenomes from the Global Ocean Sampling project with geographical, meteorological and geophysicochemical data. We find that climatic factors (temperature, sunlight) are the major determinants of the functional and phylogenetic composition of an environment and the main limiting factor on whether functions disperse across the planet. We find a distinct latitudinal gradient in the size and diversity of the functional repertoire of ocean microbial communities, which peaks at 20°N and correlates with oceanic CO₂ uptake. The latter can also be predicted from the molecular functional composition of an environmental sample. Together, our results show that the functional community composition derived from metagenomes can be used as a quantitative predictor for molecular trait-based biogeography and ecology.

Using metagenomic ‘parts lists’ to infer global patterns on microbial ecology remains a significant challenge. To deduce important ecological indicators such as environmental adaptation, molecular trait dispersal, diversity variation and primary production from the gene pool of an ecosystem, we integrated 25 ocean metagenomes with geographical, meteorological and geophysicochemical data. We find that climatic factors (temperature, sunlight) are the major determinants of the biomolecular repertoire of each sample and the main limiting factor on functional trait dispersal (absence of biogeographic provincialism). Molecular functional richness and diversity show a distinct latitudinal gradient peaking at 20°N and correlate with primary production. The latter can also be predicted from the molecular functional composition of an environmental sample. Together, our results show that the functional community composition derived from metagenomes is an important quantitative readout for molecular trait-based biogeography and ecology.

Grzymski JJ, Dussaq AM. The significance of nitrogen cost minimization in proteomes of marine microorganisms. ISME JOURNAL 2011;6:71-80. [PMID: 21697958 PMCID: PMC3246230 DOI: 10.1038/ismej.2011.72] [Citation(s) in RCA: 88] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]

Galardini M, Biondi EG, Bazzicalupo M, Mengoni A. CONTIGuator: a bacterial genomes finishing tool for structural insights on draft genomes. SOURCE CODE FOR BIOLOGY AND MEDICINE 2011;6:11. [PMID: 21693004 PMCID: PMC3133546 DOI: 10.1186/1751-0473-6-11] [Citation(s) in RCA: 217] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/18/2011] [Accepted: 06/21/2011] [Indexed: 11/10/2022]

Uchôa NN, Ferreira RDP, Sachetto-Martins G, Müller AC. Ten years of the genomic era in Brazil: Impacts on technological development assessed by scientific production and patent analysis. WORLD PATENT INFORMATION 2011. [DOI: 10.1016/j.wpi.2010.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]

Salichos L, Rokas A. Evaluating ortholog prediction algorithms in a yeast model clade. PLoS One 2011;6:e18755. [PMID: 21533202 PMCID: PMC3076445 DOI: 10.1371/journal.pone.0018755] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2010] [Accepted: 03/15/2011] [Indexed: 11/18/2022] Open

Abstract

BACKGROUND

Accurate identification of orthologs is crucial for evolutionary studies and for functional annotation. Several algorithms have been developed for ortholog delineation, but so far, manually curated genome-scale biological databases of orthologous genes for algorithm evaluation have been lacking. We evaluated four popular ortholog prediction algorithms (MultiParanoid; OrthoMCL; RBH: Reciprocal Best Hit; and RSD: Reciprocal Smallest Distance; the last two extended into clustering algorithms cRBH and cRSD, respectively, so that they can predict orthologs across multiple taxa) against a set of 2,723 groups of high-quality curated orthologs from 6 Saccharomycete yeasts in the Yeast Gene Order Browser.

RESULTS

Examination of sensitivity [TP/(TP+FN)], specificity [TN/(TN+FP)], and accuracy [(TP+TN)/(TP+TN+FP+FN)] across a broad parameter range showed that cRBH was the most accurate and specific algorithm, whereas OrthoMCL was the most sensitive. Evaluation of the algorithms across a varying number of species showed that cRBH had the highest accuracy and lowest false discovery rate [FP/(FP+TP)], followed by cRSD. Of the six species in our set, three descended from an ancestor that underwent whole genome duplication. Subsequent differential duplicate loss events in the three descendants resulted in distinct classes of gene loss patterns, including cases where the genes retained in the three descendants are paralogs, constituting 'traps' for ortholog prediction algorithms. We found that the false discovery rate of all algorithms dramatically increased in these traps.
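
The four bracketed formulas above can be written out directly. The following is a minimal sketch, not the authors' evaluation code: the class name `ConfusionCounts`, the function names, and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of true/false positives and negatives (hypothetical container)."""
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a denominator is zero.
    return numerator / denominator if denominator else float("nan")


def sensitivity(c: ConfusionCounts) -> float:
    # TP / (TP + FN)
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # TN / (TN + FP)
    return _ratio(c.tn, c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    # (TP + TN) / (TP + TN + FP + FN)
    return _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)


def false_discovery_rate(c: ConfusionCounts) -> float:
    # FP / (FP + TP)
    return _ratio(c.fp, c.fp + c.tp)


if __name__ == "__main__":
    # Made-up counts, purely to exercise the functions.
    example = ConfusionCounts(tp=80, fp=20, tn=880, fn=20)
    for metric in (sensitivity, specificity, accuracy, false_discovery_rate):
        print(f"{metric.__name__}: {metric(example):.3f}")
```

Note that sensitivity and false discovery rate do not depend on the number of true negatives, whereas accuracy and specificity do, so the four metrics can rank the same set of algorithms differently.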

CONCLUSIONS

These results suggest that simple algorithms, like cRBH, may be better ortholog predictors than more complex ones (e.g., OrthoMCL and MultiParanoid) for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs (e.g., molecular phylogenetics), but that all algorithms fail to accurately predict orthologs when paralogy is rampant.

Zhang N, Bilsland E. Contributions of Saccharomyces cerevisiae to understanding mammalian gene function and therapy. Methods Mol Biol 2011;759:501-523. [PMID: 21863505 DOI: 10.1007/978-1-61779-173-4_28] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]

Auchtung TA, Shyndriayeva G, Cavanaugh CM. 16S rRNA phylogenetic analysis and quantification of Korarchaeota indigenous to the hot springs of Kamchatka, Russia. Extremophiles 2010;15:105-16. [PMID: 21153671 DOI: 10.1007/s00792-010-0340-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2010] [Accepted: 11/15/2010] [Indexed: 10/18/2022]

Ojo OO, Omabe M. Incorporating bioinformatics into biological science education in Nigeria: prospects and challenges. INFECTION GENETICS AND EVOLUTION 2010;11:784-7. [PMID: 21145989 DOI: 10.1016/j.meegid.2010.11.015] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/22/2010] [Revised: 11/26/2010] [Accepted: 11/26/2010] [Indexed: 10/18/2022]

Wyatt MA, Wang W, Roux CM, Beasley FC, Heinrichs DE, Dunman PM, Magarvey NA. Staphylococcus aureus nonribosomal peptide secondary metabolites regulate virulence. Science 2010;329:294-6. [PMID: 20522739 DOI: 10.1126/science.1188888] [Citation(s) in RCA: 95] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]

Schmidt am Busch M, Sedano A, Simonson T. Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition. PLoS One 2010;5:e10410. [PMID: 20463972 PMCID: PMC2864755 DOI: 10.1371/journal.pone.0010410] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2009] [Accepted: 03/31/2010] [Indexed: 11/19/2022] Open

Abstract

BACKGROUND

Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases.

METHODOLOGY/PRINCIPAL FINDINGS

We explore this strategy for four SCOP families: Small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately distant natural homologues of the initial templates; e.g., the SUPERFAMILY profile Hidden Markov Model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed.

CONCLUSIONS/SIGNIFICANCE

For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.

Otero JM, Nielsen J. Industrial Systems Biology. Ind Biotechnol (New Rochelle N Y) 2010. [DOI: 10.1002/9783527630233.ch2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open

Molecular systematics: A synthesis of the common methods and the state of knowledge. Cell Mol Biol Lett 2010;15:311-41. [PMID: 20213503 PMCID: PMC6275913 DOI: 10.2478/s11658-010-0010-8] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2009] [Accepted: 03/01/2010] [Indexed: 11/21/2022] Open

2x genomes--depth does matter. Genome Biol 2010;11:R16. [PMID: 20144222 PMCID: PMC2872876 DOI: 10.1186/gb-2010-11-2-r16] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2009] [Revised: 12/08/2009] [Accepted: 02/09/2010] [Indexed: 01/23/2023] Open

Abstract

The use of low coverage genomes in comparative evolutionary analyses skews estimates of gene gains and losses.

Background

Given the availability of full genome sequences, mapping gene gains, duplications, and losses during evolution should theoretically be straightforward. However, this endeavor suffers from overemphasis on detecting conserved genome features, which in turn has led to sequencing multiple eutherian genomes with low coverage rather than fewer genomes with high-coverage and more even distribution in the phylogeny. Although limitations associated with analysis of low coverage genomes are recognized, they have not been quantified.

Results

Here, using recently developed comparative genomic application systems, we evaluate the impact of low-coverage genomes on inferences pertaining to gene gains and losses when analyzing eukaryote genome evolution through gene duplication. We demonstrate that, when performing inference of genome content evolution, low-coverage genomes generate not only a massive number of false gene losses, but also striking artifacts in gene duplication inference, especially at the most recent common ancestor of low-coverage genomes. We show that the artifactual gains are caused by the low coverage of genome sequence per se rather than by the increased taxon sampling in a biased portion of the species tree.

Conclusions

We argue that it will remain difficult to differentiate artifacts from true changes in modes and tempo of genome evolution until there is better homogeneity in both taxon sampling and high-coverage sequencing. This is important for broadening the utility of full genome data to the community of evolutionary biologists, whose interests go well beyond widely conserved physiologies and developmental patterns as they seek to understand the generative mechanisms underlying biological diversity.

Celton M, Malpertuy A, Lelandais G, de Brevern AG. Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments. BMC Genomics 2010;11:15. [PMID: 20056002 PMCID: PMC2827407 DOI: 10.1186/1471-2164-11-15] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2009] [Accepted: 01/07/2010] [Indexed: 11/17/2022] Open

Abstract

Background

Microarray technologies produce large amounts of data. In a previous study, we showed the value of the k-Nearest Neighbour approach for restoring missing gene expression values, and its positive impact on gene clustering by a hierarchical algorithm. Since then, numerous replacement methods have been proposed to impute missing values (MVs) for microarray data. In this study, we evaluated twelve different usable methods and their influence on the quality of gene clustering. Notably, we used several datasets, from both kinetic and non-kinetic experiments in yeast and human.
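
As an illustration of the k-Nearest Neighbour idea mentioned above, here is a simplified sketch and not the implementation evaluated in the study. It assumes a genes-by-conditions matrix with `None` marking a missing value, measures similarity as the root mean squared difference over conditions observed in both genes, and fills each gap with the unweighted mean of the k nearest genes. The function names, the default `k`, and the toy data are all hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def _distance(a: Row, b: Row) -> float:
    """Root mean squared difference over columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common observations, so the rows cannot be compared
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(matrix: List[Row], k: int = 2) -> List[Row]:
    """Return a copy of `matrix` with missing entries filled where possible."""
    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other genes with an observed value in column j.
            candidates = [
                (_distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if math.isfinite(c[0])]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap in place if no usable neighbour exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


if __name__ == "__main__":
    # Toy expression matrix: rows are genes, columns are conditions.
    data = [
        [1.0, 2.0, None],
        [1.0, 2.0, 3.0],
        [1.1, 2.1, 5.0],
        [9.0, 9.0, 100.0],
    ]
    print(knn_impute(data, k=2)[0])
```

Distances are computed from the original matrix rather than from already imputed values, so the result does not depend on the order in which gaps are visited.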

Results

We underline the excellent efficiency of the approaches proposed and implemented by Bo and co-workers, especially the one based on expectation maximization (EM_array). These improvements were also observed for the imputation of extreme values, which are the most difficult to predict. We showed that the imputed MVs still have important effects on the stability of the gene clusters. The improvement in clustering obtained by hierarchical clustering remains limited and is not sufficient to completely restore the correct gene associations. However, a common tendency can be found between the quality of the imputation method and the gene cluster stability. Even though the comparison between clustering algorithms is a complex task, we observed that the k-means approach is more effective at conserving gene associations.

Conclusions

More than 6,000,000 independent simulations have assessed the quality of 12 imputation methods on five very different biological datasets. Important improvements have thus been made since our last study. The EM_array approach constitutes an efficient method for restoring missing gene expression values, with a lower estimation error level. Nonetheless, the presence of MVs, even at a low rate, is a major factor of gene cluster instability. Our study highlights the need for a systematic assessment of imputation methods and thus for dedicated benchmarks. A notable point is the specific influence of some biological datasets.

Liolios K, Chen IMA, Mavromatis K, Tavernarakis N, Hugenholtz P, Markowitz VM, Kyrpides NC. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 2010;38:D346-54. [PMID: 19914934 PMCID: PMC2808860 DOI: 10.1093/nar/gkp848] [Citation(s) in RCA: 312] [Impact Index Per Article: 22.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2009] [Accepted: 09/22/2009] [Indexed: 11/14/2022] Open

Affiliation(s)

Konstantinos Liolios Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
I-Min A. Chen Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
Konstantinos Mavromatis Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
Nektarios Tavernarakis Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
Philip Hugenholtz Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
Victor M. Markowitz Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
Nikos C. Kyrpides Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA

Taffs R, Aston JE, Brileya K, Jay Z, Klatt CG, McGlynn S, Mallette N, Montross S, Gerlach R, Inskeep WP, Ward DM, Carlson RP. In silico approaches to study mass and energy flows in microbial consortia: a syntrophic case study. BMC SYSTEMS BIOLOGY 2009;3:114. [PMID: 20003240 PMCID: PMC2799449 DOI: 10.1186/1752-0509-3-114] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/06/2009] [Accepted: 12/10/2009] [Indexed: 11/14/2022]

Abstract

BACKGROUND

Three methods were developed for the application of stoichiometry-based network analysis approaches including elementary mode analysis to the study of mass and energy flows in microbial communities. Each has distinct advantages and disadvantages suitable for analyzing systems with different degrees of complexity and a priori knowledge. These approaches were tested and compared using data from the thermophilic, phototrophic mat communities from Octopus and Mushroom Springs in Yellowstone National Park (USA). The models were based on three distinct microbial guilds: oxygenic phototrophs, filamentous anoxygenic phototrophs, and sulfate-reducing bacteria. Two phases, day and night, were modeled to account for differences in the sources of mass and energy and the routes available for their exchange.

RESULTS

The in silico models were used to explore fundamental questions in ecology including the prediction of and explanation for measured relative abundances of primary producers in the mat, theoretical tradeoffs between overall productivity and the generation of toxic by-products, and the relative robustness of various guild interactions.

CONCLUSION

The three modeling approaches represent a flexible toolbox for creating cellular metabolic networks to study microbial communities on scales ranging from cells to ecosystems. A comparison of the three methods highlights considerations for selecting the one most appropriate for a given microbial system. For instance, communities represented only by metagenomic data can be modeled using the pooled method which analyzes a community's total metabolic potential without attempting to partition enzymes to different organisms. Systems with extensive a priori information on microbial guilds can be represented using the compartmentalized technique, employing distinct control volumes to separate guild-appropriate enzymes and metabolites. If the complexity of a compartmentalized network creates an unacceptable computational burden, the nested analysis approach permits greater scalability at the cost of more user intervention through multiple rounds of pathway analysis.

Gao M, Skolnick J. A threading-based method for the prediction of DNA-binding proteins with application to the human genome. PLoS Comput Biol 2009;5:e1000567. [PMID: 19911048 PMCID: PMC2770119 DOI: 10.1371/journal.pcbi.1000567] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2009] [Accepted: 10/16/2009] [Indexed: 11/18/2022] Open

Abstract

Diverse mechanisms for DNA-protein recognition have been elucidated in numerous atomic complex structures from various protein families. These structural data provide an invaluable knowledge base not only for understanding DNA-protein interactions, but also for developing specialized methods that predict the DNA-binding function from protein structure. While such methods are useful, a major limitation is that they require an experimental structure of the target as input. To overcome this obstacle, we develop a threading-based method, DNA-Binding-Domain-Threader (DBD-Threader), for the prediction of DNA-binding domains and associated DNA-binding protein residues. Our method, which uses a template library composed of DNA-protein complex structures, requires only the target protein's sequence. In our approach, fold similarity and DNA-binding propensity are employed as two functional discriminating properties. In benchmark tests on 179 DNA-binding and 3,797 non-DNA-binding proteins, using templates whose sequence identity is less than 30% to the target, DBD-Threader achieves a sensitivity/precision of 56%/86%. This performance is considerably better than the standard sequence comparison method PSI-BLAST and is comparable to DBD-Hunter, which requires an experimental structure as input. Moreover, for over 70% of predicted DNA-binding domains, the backbone Root Mean Square Deviations (RMSDs) of the top-ranked structural models are within 6.5 Å of their experimental structures, with their associated DNA-binding sites identified at satisfactory accuracy. Additionally, DBD-Threader correctly assigned the SCOP superfamily for most predicted domains. To demonstrate that DBD-Threader is useful for automatic function annotation on a large scale, DBD-Threader was applied to 18,631 protein sequences from the human genome; 1,654 proteins are predicted to have DNA-binding function. Comparison with existing Gene Ontology (GO) annotations suggests that approximately 30% of our predictions are new. Finally, we present some interesting predictions in detail. In particular, it is estimated that approximately 20% of classic zinc finger domains play a functional role not related to direct DNA-binding.

am Busch MS, Mignon D, Simonson T. Computational protein design as a tool for fold recognition. Proteins 2009;77:139-58. [PMID: 19408297 DOI: 10.1002/prot.22426] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

A New Multiplexed Real-Time PCR Assay to Detect Campylobacter jejuni, C. coli, C. lari, and C. upsaliensis. FOOD ANAL METHOD 2009. [DOI: 10.1007/s12161-009-9110-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]

Ramos AA, Marques AR, Rodrigues M, Henriques N, Baumgartner A, Castilho R, Brenig B, Varela JC. Molecular and functional characterization of a cDNA encoding 4-hydroxy-3-methylbut-2-enyl diphosphate reductase from Dunaliella salina. JOURNAL OF PLANT PHYSIOLOGY 2009;166:968-77. [PMID: 19155093 DOI: 10.1016/j.jplph.2008.11.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/08/2008] [Revised: 11/10/2008] [Accepted: 11/10/2008] [Indexed: 05/03/2023]

Li H, Kristensen DM, Coleman MK, Mushegian A. Detection of biochemical pathways by probabilistic matching of phyletic vectors. PLoS One 2009;4:e5326. [PMID: 19390636 PMCID: PMC2670198 DOI: 10.1371/journal.pone.0005326] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2008] [Accepted: 02/10/2009] [Indexed: 11/18/2022] Open

Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 2009;10:67. [PMID: 19236712 PMCID: PMC2653490 DOI: 10.1186/1471-2105-10-67] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2008] [Accepted: 02/23/2009] [Indexed: 11/22/2022] Open