1
Loci Encoding Compounds Potentially Active against Drug-Resistant Pathogens amidst a Decreasing Pool of Novel Antibiotics. Appl Environ Microbiol 2019; 85:AEM.01438-19. [PMID: 31540982 PMCID: PMC6856318 DOI: 10.1128/aem.01438-19] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/27/2019] [Accepted: 09/12/2019] [Indexed: 12/13/2022] Open
Abstract
Since the discovery of penicillin, microbes have been a source of antibiotics that inhibit the growth of pathogens. However, with the evolution of multidrug-resistant (MDR) strains, it remains unclear if there is an abundant or limited supply of natural products to be discovered that are effective against MDR isolates. To identify strains that are antagonistic to pathogens, we examined a set of 471 globally derived environmental Pseudomonas strains (env-Ps) for activity against a panel of 65 pathogens including Achromobacter spp., Burkholderia spp., Pseudomonas aeruginosa, and Stenotrophomonas spp. isolated from the lungs of cystic fibrosis (CF) patients. From more than 30,000 competitive interactions, 1,530 individual inhibitory events were observed. While strains from water habitats were not proportionate in antagonistic activity, MDR CF-derived pathogens (CF-Ps) were less susceptible to inhibition by env-Ps, suggesting that fewer natural products are effective against MDR strains. These results advocate for a directed strategy to identify unique drugs. To facilitate discovery of antibiotics against the most resistant pathogens, we developed a workflow in which phylogenetic and antagonistic data were merged to identify strains that inhibit MDR CF-Ps and subjected those env-Ps to transposon mutagenesis.
Six different biosynthetic gene clusters (BGCs) were identified from four strains whose products inhibited pathogens including carbapenem-resistant P. aeruginosa. BGCs were rare in databases, suggesting the production of novel antibiotics. This strategy can be utilized to facilitate the discovery of needed antibiotics that are potentially active against the most drug-resistant pathogens. IMPORTANCE Carbapenem-resistant P. aeruginosa is difficult to treat and has been deemed by the World Health Organization as a priority one pathogen for which antibiotics are most urgently needed. Although metagenomics and bioinformatic studies suggest that natural bacteria remain a source of novel compounds, the identification of genes and their products specific to activity against MDR pathogens remains problematic. Here, we examine water-derived pseudomonads and identify gene clusters whose compounds inhibit CF-derived MDR pathogens, including carbapenem-resistant P. aeruginosa.
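As a rough check on the scale reported in this abstract, the sketch below assumes a full pairwise screen, with each of the 471 env-Ps tested once against each of the 65 pathogens. The abstract does not describe the assay layout, so that design and the function name `screen_summary` are assumptions made for illustration only.

```python
# Assumption: every env-Ps strain was paired once with every pathogen.
ENV_STRAINS = 471
PATHOGENS = 65
INHIBITORY_EVENTS = 1530


def screen_summary(n_strains: int, n_pathogens: int, n_hits: int):
    """Return (total pairings, fraction of pairings that were inhibitory)."""
    total = n_strains * n_pathogens
    return total, n_hits / total


total, hit_rate = screen_summary(ENV_STRAINS, PATHOGENS, INHIBITORY_EVENTS)
print(f"{total} pairings, {hit_rate:.1%} inhibitory")
# prints "30615 pairings, 5.0% inhibitory"
```

Under that assumption the pairing count is consistent with "more than 30,000 competitive interactions", and roughly one pairing in twenty produced inhibition.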
2
Zhang M, Zundel Z, Myers CJ. SBOLExplorer: Data Infrastructure and Data Mining for Genetic Design Repositories. ACS Synth Biol 2019; 8:2287-2294. [PMID: 31532640 DOI: 10.1021/acssynbio.9b00089] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/30/2022]
Abstract
This paper describes SBOLExplorer, a system that provides intuitive searching within the SynBioHub genetic design repository. SynBioHub stores genetic constructs encoded in the SBOL data format. These constructs can represent genetic parts, circuits, and sequences. They are often numerous, exist in various states of completeness and documentation, and do not lend themselves to simple searching and discovery. In particular, this paper focuses on improving the search capabilities of SynBioHub. Inspiration is drawn from the techniques used to organize and search over the World Wide Web, a linked data set with many of the same properties as the SBOL data in SynBioHub. SBOLExplorer integrates these methods into SynBioHub's data representation and search, providing significant improvement over the previous search implementation based on pattern matching.
Affiliation(s)
- Michael Zhang
  - School of Computing, University of Utah, Salt Lake City, Utah 84112, United States
- Zach Zundel
  - School of Computing, University of Utah, Salt Lake City, Utah 84112, United States
- Chris J. Myers
  - Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, Utah 84112, United States
3
Computational methods and tools for binding site recognition between proteins and small molecules: from classical geometrical approaches to modern machine learning strategies. J Comput Aided Mol Des 2019; 33:887-903. [PMID: 31628659 DOI: 10.1007/s10822-019-00235-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/01/2019] [Accepted: 10/11/2019] [Indexed: 10/25/2022]
Abstract
In the current "genomic era" the number of identified genes is growing exponentially. However, the biological function of a large number of the corresponding proteins is still unknown. Recognition of small molecule ligands (e.g., substrates, inhibitors, allosteric regulators, etc.) is pivotal for protein functions in the vast majority of the cases and knowledge of the region where these processes take place is essential for protein function prediction and drug design. In this regard, computational methods represent essential tools to tackle this problem. A significant number of software tools have been developed in the last few years which exploit either protein sequence information, structure information or both. This review describes the most recent developments in protein function recognition and binding site prediction, in terms of both freely-available and commercial solutions and tools, detailing the main characteristics of the considered tools and providing a comparative analysis of their performance.
4
Mishra A, Pokhrel P, Hoque MT. StackDPPred: a stacking based prediction of DNA-binding protein from sequence. Bioinformatics 2018; 35:433-441. [DOI: 10.1093/bioinformatics/bty653] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 03/27/2018] [Accepted: 07/18/2018] [Indexed: 12/12/2022] Open
Affiliation(s)
- Avdesh Mishra
  - Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Pujan Pokhrel
  - Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Md Tamjidul Hoque
  - Department of Computer Science, University of New Orleans, New Orleans, LA, USA
5
Wagner A, Norris S, Chatterjee P, Morris PF, Wildschutte H. Aquatic Pseudomonads Inhibit Oomycete Plant Pathogens of Glycine max. Front Microbiol 2018; 9:1007. [PMID: 29896163 PMCID: PMC5986895 DOI: 10.3389/fmicb.2018.01007] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 12/06/2017] [Accepted: 04/30/2018] [Indexed: 11/17/2022] Open
Abstract
Seedling root rot of soybeans caused by the host-specific pathogen Phytophthora sojae, and a large number of Pythium species, is an economically important disease across the Midwest United States that negatively impacts soybean yields. Research on biocontrol strategies for crop pathogens has focused on compounds produced by microbes from soil; however, recent studies suggest that aquatic bacteria express distinct compounds that efficiently inhibit a wide range of pathogens. Based on these observations, we hypothesized that freshwater strains of pseudomonads might be producing novel antagonistic compounds that inhibit the growth of oomycetes. To test this prediction, we utilized a collection of 330 Pseudomonas strains isolated from soil and freshwater habitats, and determined their activity against a panel of five oomycetes: Phytophthora sojae, Pythium heterothallicum, Pythium irregulare, Pythium sylvaticum, and Pythium ultimum, all of which are pathogenic on soybeans. Among the bacterial strains, 118 exhibited antagonistic activity against at least one oomycete species, and 16 strains were inhibitory to all pathogens. Antagonistic activity toward oomycetes was significantly more common for aquatic isolates than for soil isolates. One water-derived strain, 06C 126, was predicted to express a siderophore and exhibited diverse antagonistic profiles when tested on nutrient-rich and iron-depleted media, suggesting that more than one compound was produced that effectively inhibited oomycetes. These results support the concept that aquatic strains are an efficient source of compounds that inhibit pathogens. We outline a strategy to identify other strains that express unique compounds that may be useful biocontrol agents.
Affiliation(s)
- Paul F. Morris
  - Department of Biological Sciences, Bowling Green State University, Bowling Green, OH, United States
6
Advanced In Silico Tools for Designing of Antigenic Epitope as Potential Vaccine Candidates Against Coronavirus. Bioinformatics: Sequences, Structures, Phylogeny 2018. [PMCID: PMC7120312 DOI: 10.1007/978-981-13-1562-6_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Vaccines are the most economical and potent alternative to available medicines against various bacterial and viral diseases. Earlier, killed or attenuated pathogens were employed for vaccine development. At present, peptide vaccines are increasingly popular and are favoured over whole-pathogen vaccines because of their advantages over conventional vaccines. These vaccines are based either on single proteins or on synthetic peptides that include several B-cell and T-cell epitopes. However, the overall mechanism of action remains the same: the vaccine prompts the immune system to activate specific B-cell- and T-cell-mediated responses against the pathogen. Rino Rappuoli and others have contributed to this field by laying out a fully computational approach for discovering potential vaccine candidates, known as reverse vaccinology. In this approach, one begins with the genome information of the pathogen and, after applying multiple bioinformatics tools, ends up with a list of candidate epitopes. This book chapter introduces readers to reverse vaccinology using coronavirus as an example.
7
Mukherjee S, Stamatis D, Bertsch J, Ovchinnikova G, Verezemska O, Isbandi M, Thomas AD, Ali R, Sharma K, Kyrpides NC, Reddy TBK. Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements. Nucleic Acids Res 2017; 45:D446-D456. [PMID: 27794040 PMCID: PMC5210664 DOI: 10.1093/nar/gkw992] [Citation(s) in RCA: 135] [Impact Index Per Article: 19.3] [Received: 09/20/2016] [Revised: 10/11/2016] [Accepted: 10/19/2016] [Indexed: 01/28/2023] Open
Abstract
The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four-level classification system in the form of a Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields, of which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years.
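The four-level classification described in this abstract can be pictured as nested records. The sketch below is only an illustration: the class and field names are hypothetical and are not taken from the GOLD schema, and it models a strict tree, whereas the abstract notes that an Analysis Project may combine several Sequencing Projects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnalysisProject:  # level 4
    name: str


@dataclass
class SequencingProject:  # level 3
    name: str
    kind: str  # e.g. "isolate genome", "metagenome"
    analysis_projects: List[AnalysisProject] = field(default_factory=list)


@dataclass
class Sample:  # level 2: an Organism (isolate) or a Biosample (environmental)
    name: str
    is_isolate: bool
    sequencing_projects: List[SequencingProject] = field(default_factory=list)


@dataclass
class Study:  # level 1
    name: str
    samples: List[Sample] = field(default_factory=list)


def count_levels(study: Study) -> Dict[str, int]:
    """Count the records found at each level below one Study."""
    seq = [sp for s in study.samples for sp in s.sequencing_projects]
    return {
        "organisms": sum(s.is_isolate for s in study.samples),
        "biosamples": sum(not s.is_isolate for s in study.samples),
        "sequencing_projects": len(seq),
        "analysis_projects": sum(len(sp.analysis_projects) for sp in seq),
    }


# Hypothetical records, invented for this example.
example = Study("Hypothetical compost survey", [
    Sample("Isolate A", True, [
        SequencingProject("Isolate A genome", "isolate genome",
                          [AnalysisProject("Isolate A assembly")]),
    ]),
    Sample("Compost sample 1", False, [
        SequencingProject("Compost metagenome", "metagenome",
                          [AnalysisProject("Metagenome assembly"),
                           AnalysisProject("Genome from metagenome")]),
    ]),
])
print(count_levels(example))
```

A real catalog would need a many-to-many link between Analysis Projects and Sequencing Projects to represent combined assemblies.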
Affiliation(s)
- Supratim Mukherjee
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Dimitri Stamatis
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Jon Bertsch
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Galina Ovchinnikova
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Olena Verezemska
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Michelle Isbandi
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Alex D Thomas
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Rida Ali
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Kaushal Sharma
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Nikos C Kyrpides
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
  - Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- T B K Reddy
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
8
Brumm PJ, Land ML, Mead DA. Complete genome sequences of Geobacillus sp. WCH70, a thermophilic strain isolated from wood compost. Stand Genomic Sci 2016; 11:33. [PMID: 27123157 PMCID: PMC4847372 DOI: 10.1186/s40793-016-0153-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 08/06/2015] [Accepted: 04/19/2016] [Indexed: 11/10/2022] Open
Abstract
Geobacillus sp. WCH70 was one of several thermophilic organisms isolated from hot composts in the Middleton, WI area. Comparison of 16S rRNA sequences showed the strain may be a new species, and is most closely related to G. galactosidasius and G. toebii. The genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute and deposited at the NCBI in December 2009 (CP001638). The genome of Geobacillus species WCH70 consists of one circular chromosome of 3,893,306 bp with an average G + C content of 43 %, and two circular plasmids of 33,899 and 10,287 bp with an average G + C content of 40 %. Among sequenced organisms, Geobacillus sp. WCH70 shares highest Average Nucleotide Identity (86 %) with G. thermoglucosidasius strains, as well as similar genome organization. Geobacillus sp. WCH70 appears to be a highly adaptable organism, with an exceptionally high 125 annotated transposons in the genome. The organism also possesses four predicted restriction-modification systems not found in other Geobacillus species.
Affiliation(s)
- Phillip J. Brumm
  - C5-6 Technologies LLC, Fitchburg, Wisconsin USA
  - Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, Wisconsin USA
- Miriam L. Land
  - Oak Ridge National Laboratory, Oak Ridge, Tennessee USA
- David A. Mead
  - Lucigen Corporation, Middleton, Wisconsin USA
  - Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, Wisconsin USA
9
Brumm P, Land ML, Hauser LJ, Jeffries CD, Chang YJ, Mead DA. Complete genome sequences of Geobacillus sp. Y412MC52, a xylan-degrading strain isolated from obsidian hot spring in Yellowstone National Park. Stand Genomic Sci 2015; 10:81. [PMID: 26500717 PMCID: PMC4617443 DOI: 10.1186/s40793-015-0075-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/29/2014] [Accepted: 10/09/2015] [Indexed: 11/10/2022] Open
Abstract
Geobacillus sp. Y412MC52 was isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA under permit from the National Park Service. The genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute and deposited at the NCBI in December 2011 (CP002835). Based on 16S rRNA genes and average nucleotide identity, Geobacillus sp. Y412MC52 and the related Geobacillus sp. Y412MC61 appear to be members of a new species of Geobacillus. The genome of Geobacillus sp. Y412MC52 consists of one circular chromosome of 3,628,883 bp with an average G + C content of 52 % and one circular plasmid of 45,057 bp with an average G + C content of 45 %. Y412MC52 possesses arabinan, arabinoglucuronoxylan, and aromatic acid degradation clusters for degradation of hemicellulose from biomass. Transport and utilization clusters are also present for other carbohydrates including starch, cellobiose, and α- and β-galactooligosaccharides.
Affiliation(s)
- Yun-Juan Chang
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM USA
10
Brumm P, Land ML, Hauser LJ, Jeffries CD, Chang YJ, Mead DA. Complete genome sequences of Geobacillus sp. Y412MC52, a xylan-degrading strain isolated from obsidian hot spring in Yellowstone National Park. Stand Genomic Sci 2015. [PMID: 26500717 DOI: 10.1186/s40793-015-0075-0 10.1186/s40793-016-0133-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
Geobacillus sp. Y412MC52 was isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA under permit from the National Park Service. The genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute and deposited at the NCBI in December 2011 (CP002835). Based on 16S rRNA genes and average nucleotide identity, Geobacillus sp. Y412MC52 and the related Geobacillus sp. Y412MC61 appear to be members of a new species of Geobacillus. The genome of Geobacillus sp. Y412MC52 consists of one circular chromosome of 3,628,883 bp with an average G + C content of 52 % and one circular plasmid of 45,057 bp with an average G + C content of 45 %. Y412MC52 possesses arabinan, arabinoglucuronoxylan, and aromatic acid degradation clusters for degradation of hemicellulose from biomass. Transport and utilization clusters are also present for other carbohydrates including starch, cellobiose, and α- and β-galactooligosaccharides.
Affiliation(s)
- Yun-Juan Chang
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM USA
11
Brumm PJ, Land ML, Mead DA. Complete genome sequence of Geobacillus thermoglucosidasius C56-YS93, a novel biomass degrader isolated from obsidian hot spring in Yellowstone National Park. Stand Genomic Sci 2015; 10:73. [PMID: 26442136 PMCID: PMC4593210 DOI: 10.1186/s40793-015-0031-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 09/08/2014] [Accepted: 06/29/2015] [Indexed: 11/29/2022] Open
Abstract
Geobacillus thermoglucosidasius C56-YS93 was one of several thermophilic organisms isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA under permit from the National Park Service. Comparison of 16S rRNA sequences confirmed the classification of the strain as a G. thermoglucosidasius species. The genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute and deposited at the NCBI in December 2011 (CP002835). The genome of G. thermoglucosidasius C56-YS93 consists of one circular chromosome of 3,893,306 bp and two circular plasmids of 80,849 and 19,638 bp, with an average G + C content of 43.93 %. G. thermoglucosidasius C56-YS93 possesses a xylan degradation cluster not found in the other G. thermoglucosidasius sequenced strains. This cluster appears to be related to the xylan degradation cluster found in G. stearothermophilus. G. thermoglucosidasius C56-YS93 possesses two plasmids not found in the other two strains. One plasmid contains a novel gene cluster coding for proteins involved in proline degradation and metabolism; the other contains a collection of mostly hypothetical proteins.
Affiliation(s)
- Phillip J Brumm
  - Lucigen Corporation, Middleton, WI USA
  - Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, WI USA
- David A Mead
  - Lucigen Corporation, Middleton, WI USA
  - Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, WI USA
12
Xu R, Zhou J, Wang H, He Y, Wang X, Liu B. Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation. BMC Syst Biol 2015; 9 Suppl 1:S10. [PMID: 25708928 PMCID: PMC4331676 DOI: 10.1186/1752-0509-9-s1-s10] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Indexed: 02/05/2023]
Abstract
BACKGROUND DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation. Several computational methods have been proposed in the literature for DNA-binding protein identification. However, most of them cannot provide a valuable knowledge base for our understanding of DNA-protein interactions. RESULTS We first present a new protein sequence encoding method called PSSM Distance Transformation, and then construct a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, the PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are transformed into uniform numeric representations by the distance transformation scheme. Lastly, the resulting uniform numeric representations are input into an SVM classifier, which predicts whether a sequence can bind to DNA. In a benchmark test on 525 DNA-binding and 550 non-DNA-binding proteins using jackknife validation, the model achieved an ACC of 79.96%, an MCC of 0.622 and an AUC of 86.50%. This performance is considerably better than that of most existing state-of-the-art predictive methods. When tested on a recently constructed independent dataset, PDB186, SVM-PSSM-DT also achieved the best performance, with an ACC of 80.00%, an MCC of 0.647 and an AUC of 87.40%, and outperformed some existing state-of-the-art methods. CONCLUSIONS The experimental results demonstrate that PSSM Distance Transformation is an effective protein sequence encoding method and that SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins. A user-friendly web server for SVM-PSSM-DT was constructed, which is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/PSSM-DT/.
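The abstract says that variable-length PSSM profiles are turned into uniform numeric representations by a distance transformation, but it does not give the formula. The sketch below shows one plausible form, assumed here for illustration only: for each distance d and each PSSM column, average the product of scores at positions i and i + d. The function name `distance_transform` and the parameter `max_distance` are hypothetical.

```python
from typing import List


def distance_transform(pssm: List[List[float]], max_distance: int) -> List[float]:
    """Map an L x C score matrix to a vector of length max_distance * C.

    Assumed form: feature(d, c) = mean over i of pssm[i][c] * pssm[i + d][c].
    """
    if not pssm:
        raise ValueError("PSSM must have at least one row")
    length, n_cols = len(pssm), len(pssm[0])
    features: List[float] = []
    for d in range(1, max_distance + 1):
        pairs = length - d  # number of position pairs separated by d
        for c in range(n_cols):
            if pairs <= 0:
                features.append(0.0)  # sequence too short for this distance
                continue
            total = sum(pssm[i][c] * pssm[i + d][c] for i in range(pairs))
            features.append(total / pairs)
    return features


# Toy matrices with 2 columns; a real PSSM has 20 columns, one per amino acid.
short = [[1, 0], [2, 1], [3, 2]]
longer = [[1, 0], [2, 1], [3, 2], [0, 1], [1, 1]]
print(distance_transform(short, 2))  # prints [4.0, 1.0, 3.0, 0.0]
print(len(distance_transform(longer, 2)))  # prints 4
```

Both toy proteins yield vectors of the same length even though their sequences differ in length, which is the property a fixed-input classifier such as an SVM needs.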
Affiliation(s)
- Ruifeng Xu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
  - Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Jiyun Zhou
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Hongpeng Wang
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Yulan He
  - School of Engineering & Applied Science, Aston University, Birmingham, UK
- Xiaolong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
  - Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
  - Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
13
Brumm P, Land ML, Hauser LJ, Jeffries CD, Chang YJ, Mead DA. Complete genome sequences of Geobacillus sp. Y412MC52, a xylan-degrading strain isolated from obsidian hot spring in Yellowstone National Park. Stand Genomic Sci 2015. [PMID: 26500717 PMCID: PMC4617443 DOI: 10.1186/s40793-015-0075-0+10.1186/s40793-016-0133-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2023] Open
Abstract
Geobacillus sp. Y412MC52 was isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA under permit from the National Park Service. The genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute and deposited at the NCBI in December 2011 (CP002835). Based on 16S rRNA genes and average nucleotide identity, Geobacillus sp. Y412MC52 and the related Geobacillus sp. Y412MC61 appear to be members of a new species of Geobacillus. The genome of Geobacillus sp. Y412MC52 consists of one circular chromosome of 3,628,883 bp with an average G + C content of 52 % and one circular plasmid of 45,057 bp with an average G + C content of 45 %. Y412MC52 possesses arabinan, arabinoglucuronoxylan, and aromatic acid degradation clusters for degradation of hemicellulose from biomass. Transport and utilization clusters are also present for other carbohydrates including starch, cellobiose, and α- and β-galactooligosaccharides.
Affiliation(s)
- Yun-Juan Chang
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM USA
14
Urban M, Pant R, Raghunath A, Irvine AG, Pedro H, Hammond-Kosack KE. The Pathogen-Host Interactions database (PHI-base): additions and future developments. Nucleic Acids Res 2015; 43:D645-55. [PMID: 25414340 PMCID: PMC4383963 DOI: 10.1093/nar/gku1165] [Citation(s) in RCA: 151] [Impact Index Per Article: 16.8] [Received: 09/19/2014] [Revised: 10/30/2014] [Accepted: 10/30/2014] [Indexed: 12/12/2022] Open
Abstract
Rapidly evolving pathogens cause a diverse array of diseases and epidemics that threaten crop yield, food security as well as human, animal and ecosystem health. To combat infection, greater comparative knowledge of the pathogenic process in multiple species is required. The Pathogen-Host Interactions database (PHI-base) catalogues experimentally verified pathogenicity, virulence and effector genes from bacterial, fungal and protist pathogens. Mutant phenotypes are associated with gene information. The included pathogens infect a wide range of hosts including humans, animals, plants, insects, fish and other fungi. The current version, PHI-base 3.6, available at http://www.phi-base.org, stores information on 2875 genes, 4102 interactions, 110 host species, 160 pathogenic species (103 plant, 3 fungal and 54 animal infecting species) and 181 diseases drawn from 1243 references. Phenotypic and gene function information has been obtained by manual curation of the peer-reviewed literature. A controlled vocabulary consisting of nine high-level phenotype terms permits comparisons and data analysis across the taxonomic space. PHI-base phenotypes were mapped via their associated gene information to reference genomes available in Ensembl Genomes. Virulence genes and hotspots can be visualized directly in genome browsers. Future plans for PHI-base include development of tools facilitating community-led curation and inclusion of the corresponding host target(s).
Affiliation(s)
- Martin Urban
  - Department of Plant Biology and Crop Science, Rothamsted Research, Harpenden, Herts, AL5 2JQ, UK
- Rashmi Pant
  - Molecular Connections Private Limited, Basavanagudi, Bangalore 560 004, Karnataka, India
- Arathi Raghunath
  - Molecular Connections Private Limited, Basavanagudi, Bangalore 560 004, Karnataka, India
- Alistair G Irvine
  - Department of Computational and Systems Biology, Rothamsted Research, Harpenden, Herts, AL5 2JQ, UK
- Helder Pedro
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Kim E Hammond-Kosack
  - Department of Plant Biology and Crop Science, Rothamsted Research, Harpenden, Herts, AL5 2JQ, UK
15
Marcus S, Lee H, Schatz MC. SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips. Bioinformatics 2014; 30:3476-83. [PMID: 25398610 DOI: 10.1093/bioinformatics/btu756] [Citation(s) in RCA: 81] [Impact Index Per Article: 8.1] [Indexed: 11/14/2022]
Abstract
MOTIVATION Genomics is expanding from a single reference per species paradigm into a more comprehensive pan-genome approach that analyzes multiple individuals together. A compressed de Bruijn graph is a sophisticated data structure for representing the genomes of entire populations. It robustly encodes shared segments, simple single-nucleotide polymorphisms and complex structural variations far beyond what can be represented in a collection of linear sequences alone. RESULTS We explore deep topological relationships between suffix trees and compressed de Bruijn graphs and introduce an algorithm, splitMEM, that directly constructs the compressed de Bruijn graph in time and space linear to the total number of genomes for a given maximum genome size. We introduce suffix skips to traverse several suffix links simultaneously and use them to efficiently decompose maximal exact matches into graph nodes. We demonstrate the utility of splitMEM by analyzing the nine-strain pan-genome of Bacillus anthracis and up to 62 strains of Escherichia coli, revealing their core-genome properties.
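To make the data structure in this abstract concrete, the sketch below builds a small compressed de Bruijn graph in the most naive way: enumerate k-mers, link consecutive ones, then merge non-branching chains into single nodes. This is not the splitMEM algorithm, which works from suffix trees and maximal exact matches; it also ignores reverse complements. The function names and the two toy "genomes" are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Graph = Tuple[Set[str], Dict[str, Set[str]], Dict[str, Set[str]]]


def build_de_bruijn(sequences: Iterable[str], k: int) -> Graph:
    """Collect k-mer nodes plus successor/predecessor links seen in any sequence."""
    nodes: Set[str] = set()
    succ: Dict[str, Set[str]] = defaultdict(set)
    pred: Dict[str, Set[str]] = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            nodes.add(kmer)
            if i > 0:
                prev = seq[i - 1:i - 1 + k]
                succ[prev].add(kmer)
                pred[kmer].add(prev)
    return nodes, succ, pred


def compress_paths(graph: Graph) -> List[str]:
    """Merge every maximal non-branching chain of k-mers into one node label."""
    nodes, succ, pred = graph

    def mergeable(u: str, v: str) -> bool:
        # Edge u -> v is the only way out of u and the only way into v.
        return len(succ.get(u, ())) == 1 and len(pred.get(v, ())) == 1

    def extend(start: str, visited: Set[str]) -> str:
        label, cur = start, start
        visited.add(start)
        while len(succ.get(cur, ())) == 1:
            (nxt,) = succ[cur]
            if nxt in visited or not mergeable(cur, nxt):
                break
            label += nxt[-1]  # consecutive k-mers overlap by k - 1 characters
            visited.add(nxt)
            cur = nxt
        return label

    visited: Set[str] = set()
    labels: List[str] = []
    for v in sorted(nodes):  # chains that start at a branch point or a source
        preds = pred.get(v, ())
        if len(preds) == 1 and mergeable(next(iter(preds)), v):
            continue  # v belongs inside a chain that starts elsewhere
        labels.append(extend(v, visited))
    for v in sorted(nodes):  # leftover nodes lie on isolated cycles
        if v not in visited:
            labels.append(extend(v, visited))
    return labels


# Two toy genomes that differ by a single nucleotide (C vs A at position 4).
genomes = ["ATGCGTA", "ATGAGTA"]
print(compress_paths(build_de_bruijn(genomes, k=3)))
# prints ['ATG', 'GTA', 'TGAGT', 'TGCGT']
```

The shared prefix and suffix each collapse to one node and the single-nucleotide difference appears as two parallel nodes between them, which is the "bubble" shape such graphs use to encode a polymorphism.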
Collapse
Affiliation(s)
- Shoshana Marcus
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA and Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA
- Hayan Lee
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA and Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA
- Michael C Schatz
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA and Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA
16
Reddy TBK, Thomas AD, Stamatis D, Bertsch J, Isbandi M, Jansson J, Mallajosyula J, Pagani I, Lobos EA, Kyrpides NC. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification. Nucleic Acids Res 2014; 43:D1099-106. [PMID: 25348402 DOI: 10.1093/nar/gku950] [Citation(s) in RCA: 259] [Impact Index Per Article: 25.9] [Indexed: 11/14/2022] Open
Abstract
The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
Affiliation(s)
- T B K Reddy
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
- Alex D Thomas
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
- Dimitri Stamatis
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
- Jon Bertsch
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
- Michelle Isbandi
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
- Jakob Jansson
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
- Jyothi Mallajosyula
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
- Ioanna Pagani
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
- Elizabeth A Lobos
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
- Nikos C Kyrpides
- Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
17
Simonyan V, Mazumder R. High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes (Basel) 2014; 5:957-81. [PMID: 25271953 PMCID: PMC4276921 DOI: 10.3390/genes5040957] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 09/11/2014] [Revised: 09/22/2014] [Accepted: 09/22/2014] [Indexed: 12/30/2022] Open
Abstract
The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis.
Affiliation(s)
- Vahan Simonyan
- Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA.
- Raja Mazumder
- Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA.
18
Lua RC, Marciano DC, Katsonis P, Adikesavan AK, Wilkins AD, Lichtarge O. Prediction and redesign of protein-protein interactions. Prog Biophys Mol Biol 2014; 116:194-202. [PMID: 24878423 DOI: 10.1016/j.pbiomolbio.2014.05.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 02/25/2014] [Revised: 05/02/2014] [Accepted: 05/17/2014] [Indexed: 12/14/2022]
Abstract
Understanding the molecular basis of protein function remains a central goal of biology, with the hope to elucidate the role of human genes in health and in disease, and to rationally design therapies through targeted molecular perturbations. We review here some of the computational techniques and resources available for characterizing a critical aspect of protein function - those mediated by protein-protein interactions (PPI). We describe several applications and recent successes of the Evolutionary Trace (ET) in identifying molecular events and shapes that underlie protein function and specificity in both eukaryotes and prokaryotes. ET is a part of analytical approaches based on the successes and failures of evolution that enable the rational control of PPI.
Affiliation(s)
- Rhonald C Lua
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- David C Marciano
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Anbu K Adikesavan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Angela D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA.
19
Abstract
As more and more systems biology approaches are used to investigate the different types of biological macromolecules, increasing numbers of whole genomic studies are now available for a large array of organisms. Whether it is genomics, transcriptomics, proteomics, interactomics or metabolomics, the full complement of genomic information on all different levels can be juxtaposed between different organisms to reveal similarities or differences, and even to provide consensus models. At the intersection of comparative genomics and systems biology lies great possibility for discovery, analysis and prediction. This paper explores this nexus and the relationship from four general levels: DNA, RNA, protein and extragenomic. For each level, we provide an overview of the methods, discuss the potential challenges and survey the current research. Finally, we suggest some organizing principles and make proposals for new areas that will be important for future research.
Affiliation(s)
- Jimmy Lin
- Wilmer Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
20
Cipriano MJ, Novichkov PN, Kazakov AE, Rodionov DA, Arkin AP, Gelfand MS, Dubchak I. RegTransBase--a database of regulatory sequences and interactions based on literature: a resource for investigating transcriptional regulation in prokaryotes. BMC Genomics 2013; 14:213. [PMID: 23547897 PMCID: PMC3639892 DOI: 10.1186/1471-2164-14-213] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 11/15/2012] [Accepted: 03/22/2013] [Indexed: 11/10/2022] Open
Abstract
Background: Due to the constantly growing number of sequenced microbial genomes, comparative genomics has been playing a major role in the investigation of regulatory interactions in bacteria. Regulon inference mostly remains a field of semi-manual examination, since the absence of a knowledgebase and informatics platform for automated and systematic investigation restricts opportunities for computational prediction. Additionally, confirming computationally inferred regulons by experimental data is critically important. Description: RegTransBase is an open-access platform with a user-friendly web interface publicly available at http://regtransbase.lbl.gov. It consists of two databases: a manually collected hierarchical regulatory interactions database based on more than 7000 scientific papers, which can serve as a knowledgebase for verification of predictions, and a large set of expert-curated transcription factor binding sites used in regulon inference by a variety of tools. RegTransBase captures the knowledge from published scientific literature using controlled vocabularies and contains various types of experimental data, such as: the activation or repression of transcription by an identified direct regulator; determination of the transcriptional regulatory function of a protein (or RNA) directly binding to DNA or RNA; mapping of binding sites for a regulatory protein; characterization of regulatory mutations. Analysis of the data collected from literature resulted in the creation of Putative Regulons from Experimental Data that are also available in RegTransBase. Conclusions: RegTransBase is a powerful user-friendly platform for the investigation of regulation in prokaryotes. It uses a collection of validated regulatory sequences that can be easily extracted and used to infer regulatory interactions by comparative genomics techniques, thus assisting researchers in the interpretation of transcriptional regulation data.
Affiliation(s)
- Michael J Cipriano
- Department of Microbiology, University of California Davis, Davis, CA 95616, USA
21
Tiwari MK, Singh R, Singh RK, Kim IW, Lee JK. Computational approaches for rational design of proteins with novel functionalities. Comput Struct Biotechnol J 2012; 2:e201209002. [PMID: 24688643 PMCID: PMC3962203 DOI: 10.5936/csbj.201209002] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 05/31/2012] [Revised: 08/17/2012] [Accepted: 08/23/2012] [Indexed: 11/22/2022] Open
Abstract
Proteins are the most multifaceted macromolecules in living systems and have various important functions, including structural, catalytic, sensory, and regulatory functions. Rational design of enzymes is a great challenge to our understanding of protein structure and physical chemistry and has numerous potential applications. Protein design algorithms have been applied to design or engineer proteins that fold, fold faster, catalyze, catalyze faster, signal, and adopt preferred conformational states. The field of de novo protein design, although only a few decades old, is beginning to produce exciting results. Developments in this field are already having a significant impact on biotechnology and chemical biology. The application of powerful computational methods for functional protein designing has recently succeeded at engineering target activities. Here, we review recently reported de novo functional proteins that were developed using various protein design approaches, including rational design, computational optimization, and selection from combinatorial libraries, highlighting recent advances and successes.
Affiliation(s)
- Manish Kumar Tiwari
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea; These authors contributed equally
- Ranjitha Singh
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea; These authors contributed equally
- Raushan Kumar Singh
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea
- In-Won Kim
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea
- Jung-Kul Lee
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea; Institute of SK-KU Biomaterials, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea
22
Plant and bacterial systems biology as platform for plant synthetic bio(techno)logy. J Biotechnol 2012; 160:80-90. [DOI: 10.1016/j.jbiotec.2012.01.014] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 10/17/2011] [Revised: 01/10/2012] [Accepted: 01/17/2012] [Indexed: 11/17/2022]
23
Defining sequence space and reaction products within the cyanuric acid hydrolase (AtzD)/barbiturase protein family. J Bacteriol 2012; 194:4579-88. [PMID: 22730121 DOI: 10.1128/jb.00791-12] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/20/2022] Open
Abstract
Cyanuric acid hydrolases (AtzD) and barbiturases are homologous, found almost exclusively in bacteria, and comprise a rare protein family with no discernible linkage to other protein families or an X-ray structural class. There has been confusion in the literature and in genome projects regarding the reaction products, the assignment of individual sequences as either cyanuric acid hydrolases or barbiturases, and spurious connection of this family to another protein family. The present study has addressed those issues. First, the published enzyme reaction products of cyanuric acid hydrolase are incorrectly identified as biuret and carbon dioxide. The current study employed ¹³C nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry to show that cyanuric acid hydrolase releases carboxybiuret, which spontaneously decarboxylates to biuret. This is significant because it revealed that homologous cyanuric acid hydrolases and barbiturases catalyze completely analogous reactions. Second, enzymes that had been annotated incorrectly in genome projects have been reassigned here by bioinformatics, gene cloning, and protein characterization studies. Third, the AtzD/barbiturase family has previously been suggested to consist of members of the amidohydrolase superfamily, a large class of metallohydrolases. Bioinformatics and the lack of bound metals both argue against a connection to the amidohydrolase superfamily. Lastly, steady-state kinetic measurements and observations of protein stability suggested that the AtzD/barbiturase family might be an undistinguished protein family that has undergone some resurgence with the recent introduction of industrial s-triazine compounds such as atrazine and melamine into the environment.
24
Lee KS, Kim RN, Yoon BH, Kim DS, Choi SH, Kim DW, Nam SH, Kim A, Kang A, Park KH, Jung JE, Chae SH, Park HS. Bacterial genome mapper: A comparative bacterial genome mapping tool. Bioinformation 2012; 8:532-4. [PMID: 22829725 PMCID: PMC3398773 DOI: 10.6026/97320630008532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2012] [Accepted: 06/03/2012] [Indexed: 11/29/2022] Open
Abstract
Recently, next generation sequencing (NGS) technologies have led to a revolutionary increase in sequencing speed and cost-efficacy. Consequently, a vast number of contigs from many recently sequenced bacterial genomes remain to be accurately mapped and annotated, requiring the development of more convenient bioinformatics programs. In this paper, we present a newly developed web-based bioinformatics program, Bacterial Genome Mapper, which is suitable for mapping and annotating contigs that have been assembled from bacterial genome sequence raw data. By constructing a multiple alignment map between target contig sequences and two reference bacterial genome sequences, this program also provides very useful comparative genomics analysis of draft bacterial genomes. AVAILABILITY: The database is available for free at http://mbgm.kribb.re.kr.
Affiliation(s)
- Kang Seon Lee
- Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- University of Science and Technology (UST), Daejeon 305-333, Korea
- These authors contributed equally to this work
- Ryong Nam Kim
- Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- These authors contributed equally to this work
- Byoung Ha Yoon
- Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- University of Science and Technology (UST), Daejeon 305-333, Korea
- These authors contributed equally to this work
- Dae Soo Kim
- Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- Sang Haeng Choi
- Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- Dong Wook Kim
- Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- Seong Hyeuk Nam
- Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- Aeri Kim
- Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- Aram Kang
- Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- University of Science and Technology (UST), Daejeon 305-333, Korea
- Kun Hyang Park
- Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- Jae Eun Jung
- Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- Sung Hwa Chae
- Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- Hong Seog Park
- Genome Resource Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
- University of Science and Technology (UST), Daejeon 305-333, Korea
25
Kumar A, Suthers PF, Maranas CD. MetRxn: a knowledgebase of metabolites and reactions spanning metabolic models and databases. BMC Bioinformatics 2012; 13:6. [PMID: 22233419 PMCID: PMC3277463 DOI: 10.1186/1471-2105-13-6] [Citation(s) in RCA: 100] [Impact Index Per Article: 8.3] [Received: 06/27/2011] [Accepted: 01/10/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Increasingly, metabolite and reaction information is organized in the form of genome-scale metabolic reconstructions that describe the reaction stoichiometry, directionality, and gene to protein to reaction associations. A key bottleneck in the pace of reconstruction of new, high-quality metabolic models is the inability to directly make use of metabolite/reaction information from biological databases or other models due to incompatibilities in content representation (i.e., metabolites with multiple names across databases and models), stoichiometric errors such as elemental or charge imbalances, and incomplete atomistic detail (e.g., use of generic R-group or non-explicit specification of stereo-specificity). DESCRIPTION: MetRxn is a knowledgebase that includes standardized metabolite and reaction descriptions by integrating information from BRENDA, KEGG, MetaCyc, Reactome.org and 44 metabolic models into a single unified data set. All metabolite entries have matched synonyms, resolved protonation states, and are linked to unique structures. All reaction entries are elementally and charge balanced. This is accomplished through the use of a workflow of lexicographic, phonetic, and structural comparison algorithms. MetRxn allows for the download of standardized versions of existing genome-scale metabolic models and the use of metabolic information for the rapid reconstruction of new ones. CONCLUSIONS: The standardization in description allows for the direct comparison of the metabolite and reaction content between metabolic models and databases and the exhaustive prospecting of pathways for biotechnological production. This ever-growing dataset currently consists of over 76,000 metabolites participating in more than 72,000 reactions (including unresolved entries). MetRxn is hosted on a web-based platform that uses relational database models (MySQL).
Affiliation(s)
- Akhil Kumar
- Department of Chemical Engineering, The Pennsylvania State University, University Park, PA 16802, USA.
26
Pagani I, Liolios K, Jansson J, Chen IMA, Smirnova T, Nosrat B, Markowitz VM, Kyrpides NC. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 2012; 40:D571-9. [PMID: 22135293 PMCID: PMC3245063 DOI: 10.1093/nar/gkr1100] [Citation(s) in RCA: 375] [Impact Index Per Article: 31.3] [Received: 09/28/2011] [Revised: 11/02/2011] [Accepted: 11/03/2011] [Indexed: 12/03/2022] Open
Abstract
The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11,472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond.
Affiliation(s)
- Ioanna Pagani
- Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Konstantinos Liolios
- Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Jakob Jansson
- Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
- I-Min A. Chen
- Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Tatyana Smirnova
- Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Bahador Nosrat
- Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Victor M. Markowitz
- Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Nikos C. Kyrpides
- Department of Energy Joint Genome Institute, Microbial Genomics and Metagenomics Program, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley and Department of Energy Joint Genome Institute, Genome Portals Group, 2800 Mitchell Drive, Walnut Creek, CA, USA
27
Siebers B, Zaparty M, Raddatz G, Tjaden B, Albers SV, Bell SD, Blombach F, Kletzin A, Kyrpides N, Lanz C, Plagens A, Rampp M, Rosinus A, von Jan M, Makarova KS, Klenk HP, Schuster SC, Hensel R. The complete genome sequence of Thermoproteus tenax: a physiologically versatile member of the Crenarchaeota. PLoS One 2011; 6:e24222. [PMID: 22003381 PMCID: PMC3189178 DOI: 10.1371/journal.pone.0024222] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 09/20/2010] [Accepted: 08/08/2011] [Indexed: 11/18/2022] Open
Abstract
Here, we report on the complete genome sequence of the hyperthermophilic Crenarchaeum Thermoproteus tenax (strain Kra1, DSM 2078T), a type strain of the crenarchaeotal order Thermoproteales. Its circular 1.84-megabase genome harbors no extrachromosomal elements, and 2,051 open reading frames are identified, covering 90.6% of the complete sequence, which represents a high coding density. Derived from the gene content, T. tenax is a representative member of the Crenarchaeota. The organism is strictly anaerobic and sulfur-dependent with optimal growth at 86°C and pH 5.6. One particular feature is the great metabolic versatility, which is not accompanied by a distinct increase of genome size or information density as compared to other Crenarchaeota. T. tenax is able to grow chemolithoautotrophically (CO2/H2) as well as chemoorganoheterotrophically in the presence of various organic substrates. All pathways for synthesizing the 20 proteinogenic amino acids are present. In addition, two presumably complete gene sets for NADH:quinone oxidoreductase (complex I) were identified in the genome, and there is evidence that either NADH or reduced ferredoxin might serve as electron donor. Besides the typical archaeal A0A1-ATP synthase, a membrane-bound pyrophosphatase is found, which might contribute to energy conservation. Surprisingly, all genes required for dissimilatory sulfate reduction are present, which is confirmed by growth experiments. Also notable is the presence of two proteins (ParA family ATPase, actin-like protein) that might be involved in cell division in Thermoproteales, where the ESCRT system is absent, and of genes involved in genetic competence (DprA, ComF) that is so far unique within Archaea.
Affiliation(s)
- Bettina Siebers
- Faculty of Chemistry, Biofilm Centre, Molecular Enzyme Technology and Biochemistry, University of Duisburg-Essen, Essen, Germany
- * E-mail: (BS); (MZ)
- Melanie Zaparty
- Institute for Molecular and Cellular Anatomy, University of Regensburg, Regensburg, Germany
- * E-mail: (BS); (MZ)
- Guenter Raddatz
- Max-Planck-Institute for Biological Cybernetics, Tübingen, Germany
- Britta Tjaden
- Prokaryotic RNA Biology, Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
- Sonja-Verena Albers
- Molecular Biology of Archaea, Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
- Steve D. Bell
- Sir William Dunn School of Pathology, Oxford University, Oxford, United Kingdom
- Fabian Blombach
- Laboratory of Microbiology, Wageningen University, Wageningen, The Netherlands
- Arnulf Kletzin
- Institute of Microbiology and Genetics, Technical University Darmstadt, Darmstadt, Germany
- Nikos Kyrpides
- DOE Joint Genome Institute, Walnut Creek, California, United States of America
- Christa Lanz
- Genome Centre, Max-Planck-Institute for Developmental Biology, Tuebingen, Germany
- André Plagens
- Prokaryotic RNA Biology, Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
- Markus Rampp
- Computer Centre Garching of the Max-Planck-Society (RZG), Max-Planck-Institute for Plasma Physics, München, Germany
- Andrea Rosinus
- Genome Centre, Max-Planck-Institute for Developmental Biology, Tuebingen, Germany
- Mathias von Jan
- DSMZ, German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
- Kira S. Makarova
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, United States of America
- Hans-Peter Klenk
- DSMZ, German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
- Stephan C. Schuster
- Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Reinhard Hensel
- Prokaryotic RNA Biology, Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
| |
Collapse
|
28
|
Nanni L, Lumini A, Gupta D, Garg A. Identifying bacterial virulent proteins by fusing a set of classifiers based on variants of Chou's pseudo amino acid composition and on evolutionary information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 9:467-475. [PMID: 21860064 DOI: 10.1109/tcbb.2011.117] [Citation(s) in RCA: 113] [Impact Index Per Article: 8.7] [Indexed: 05/31/2023]
Abstract
The availability of a reliable method for predicting bacterial virulent proteins has several important applications in research efforts aimed at finding novel drug targets and vaccine candidates and at understanding virulence mechanisms in pathogens. In this work, we have studied several feature extraction approaches for representing proteins and propose a novel bacterial virulent protein prediction method, based on an ensemble of classifiers where the features are extracted directly from the amino acid sequence and from the evolutionary information of a given protein. We have evaluated and compared several ensembles obtained by combining six feature extraction methods and several classification approaches based on two general purpose classifiers (i.e., Support Vector Machine and a variant of input decimated ensemble) and their random subspace version. An extensive evaluation was performed according to a blind testing protocol, where the parameters of the system are optimized using the training set and the system is validated on three different independent data sets, allowing selection of the best-performing system and demonstrating the validity of the proposed method. Based on the results obtained using the blind test protocol, it is interesting to note that even though the best-performing stand-alone method is not always the same in each independent data set, the fusion of different methods enhances prediction efficiency in all the tested independent data sets.
|
29
|
Raes J, Letunic I, Yamada T, Jensen LJ, Bork P. Toward molecular trait-based ecology through integration of biogeochemical, geographical and metagenomic data. Mol Syst Biol 2011; 7:473. [PMID: 21407210 PMCID: PMC3094067 DOI: 10.1038/msb.2011.6] [Citation(s) in RCA: 148] [Impact Index Per Article: 11.4] [Received: 05/04/2010] [Accepted: 01/25/2011] [Indexed: 11/10/2022] Open
Abstract
Using metagenomic ‘parts lists' to study microbial ecology remains a significant challenge. This work proposes a molecular trait-based approach to biogeography by integrating metagenomic data with external metadata and using functional community composition as readout. Climatic factors drive functional and phylogenetic composition of ocean microbial communities. Function dispersal is controlled by environmental conditions. Functional richness has a clear latitudinal gradient and correlates with primary production. Metagenomic data can be used as a predictor for ecosystem processes. To understand the relationship between community composition and environment, functional readouts are the most direct. Metagenomic data enable such trait-based ecology at the molecular level.
Metagenomics (shotgun sequencing of pooled DNA of complete microbial communities) is widely used to investigate ecosystem functioning of environmental and clinical samples. However, the nature of these data (usually a gigantic collection of gene fragments from thousands of organisms) makes it very hard to infer global patterns on microbial ecology of the environment at hand. To address important ecological questions such as ‘How do microbial communities adapt to the environmental conditions?', ‘What drives the functional variation across the globe and to what extent do genes disperse?' and ‘What drives variation of CO2 uptake across different locations and communities?', we integrated 25 ocean metagenomes from the Global Ocean Sampling project with geographical, meteorological and geophysicochemical data. We find that climatic factors (temperature, sunlight) are the major determinants of the functional and phylogenetic composition of an environment and the main limiting factor on whether functions disperse across the planet. We find a distinct latitudinal gradient in the size and diversity of the functional repertoire of ocean microbial communities, which peaks at 20°N and correlates with oceanic CO2 uptake. The latter can also be predicted from the molecular functional composition of an environmental sample. Together, our results show that the functional community composition derived from metagenomes can be used as a quantitative predictor for molecular trait-based biogeography and ecology. Using metagenomic ‘parts lists' to infer global patterns on microbial ecology remains a significant challenge. To deduce important ecological indicators such as environmental adaptation, molecular trait dispersal, diversity variation and primary production from the gene pool of an ecosystem, we integrated 25 ocean metagenomes with geographical, meteorological and geophysicochemical data.
We find that climatic factors (temperature, sunlight) are the major determinants of the biomolecular repertoire of each sample and the main limiting factor on functional trait dispersal (absence of biogeographic provincialism). Molecular functional richness and diversity show a distinct latitudinal gradient peaking at 20°N and correlate with primary production. The latter can also be predicted from the molecular functional composition of an environmental sample. Together, our results show that the functional community composition derived from metagenomes is an important quantitative readout for molecular trait-based biogeography and ecology.
Affiliation(s)
- Jeroen Raes
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
|
30
|
Grzymski JJ, Dussaq AM. The significance of nitrogen cost minimization in proteomes of marine microorganisms. ISME Journal 2011; 6:71-80. [PMID: 21697958 PMCID: PMC3246230 DOI: 10.1038/ismej.2011.72] [Citation(s) in RCA: 88] [Impact Index Per Article: 6.8] [Indexed: 02/02/2023]
Abstract
Marine microorganisms thrive under low levels of nitrogen (N). N cost minimization is a major selective pressure imprinted on open-ocean microorganism genomes. Here we show that amino-acid sequences from the open ocean are reduced in N, but increased in average mass compared with coastal-ocean microorganisms. Nutrient limitation exerts significant pressure on organisms supporting the trade-off between N cost minimization and increased average mass of amino acids that is a function of increased A+T codon usage. N cost minimization, especially of highly expressed proteins, reduces the total cellular N budget by 2.7–10%; this minimization, in combination with reduction in genome size and cell size, is an evolutionary adaptation to nutrient limitation. The biogeochemical and evolutionary precedent for these findings suggests that N limitation is a stronger selective force in the ocean than biosynthetic costs and is an important evolutionary strategy in resource-limited ecosystems.
Affiliation(s)
- Joseph J Grzymski
- Division of Earth and Ecosystem Sciences, Desert Research Institute, Reno, NV, USA.
|
31
|
Galardini M, Biondi EG, Bazzicalupo M, Mengoni A. CONTIGuator: a bacterial genomes finishing tool for structural insights on draft genomes. Source Code for Biology and Medicine 2011; 6:11. [PMID: 21693004 PMCID: PMC3133546 DOI: 10.1186/1751-0473-6-11] [Citation(s) in RCA: 217] [Impact Index Per Article: 16.7] [Received: 03/18/2011] [Accepted: 06/21/2011] [Indexed: 11/10/2022]
Abstract
Recent developments in sequencing technologies have made it possible to sequence many bacterial genomes with limited cost and labor compared to previous techniques. However, a limiting step of genome sequencing is the finishing process, needed to infer the relative position of each contig and close sequencing gaps. An additional degree of complexity is introduced by bacterial species harboring more than one replicon, which are not handled by the currently available programs. The availability of a large number of bacterial genomes allows geneticists to use complete genomes (possibly from the same species) as templates for contig mapping. Here we present CONTIGuator, a software tool for mapping contigs onto a reference genome, which allows the visualization of a map of contigs, highlighting loss and/or gain of genetic elements and making it possible to finish multipartite genomes. The functionality of CONTIGuator was tested using four genomes, demonstrating its improved performance compared to currently available programs. Our approach appears efficient, with a clear visualization, allowing the user to perform comparative structural genomics analysis on draft genomes. CONTIGuator is a Python script for Linux environments that can be used on normal desktop machines and can be downloaded from http://contiguator.sourceforge.net.
Affiliation(s)
- Marco Galardini
- Department of Evolutionary Biology, University of Firenze, via Romana 17, I-50125 Firenze, Italy.
|
32
|
Uchôa NN, Ferreira RDP, Sachetto-Martins G, Müller AC. Ten years of the genomic era in Brazil: Impacts on technological development assessed by scientific production and patent analysis. World Patent Information 2011. [DOI: 10.1016/j.wpi.2010.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
33
|
Salichos L, Rokas A. Evaluating ortholog prediction algorithms in a yeast model clade. PLoS One 2011; 6:e18755. [PMID: 21533202 PMCID: PMC3076445 DOI: 10.1371/journal.pone.0018755] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 11/10/2010] [Accepted: 03/15/2011] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Accurate identification of orthologs is crucial for evolutionary studies and for functional annotation. Several algorithms have been developed for ortholog delineation, but so far, manually curated genome-scale biological databases of orthologous genes for algorithm evaluation have been lacking. We evaluated four popular ortholog prediction algorithms (MultiParanoid; OrthoMCL; RBH: Reciprocal Best Hit; and RSD: Reciprocal Smallest Distance; the last two extended into clustering algorithms cRBH and cRSD, respectively, so that they can predict orthologs across multiple taxa) against a set of 2,723 groups of high-quality curated orthologs from 6 Saccharomycete yeasts in the Yeast Gene Order Browser. RESULTS Examination of sensitivity [TP/(TP+FN)], specificity [TN/(TN+FP)], and accuracy [(TP+TN)/(TP+TN+FP+FN)] across a broad parameter range showed that cRBH was the most accurate and specific algorithm, whereas OrthoMCL was the most sensitive. Evaluation of the algorithms across a varying number of species showed that cRBH had the highest accuracy and lowest false discovery rate [FP/(FP+TP)], followed by cRSD. Of the six species in our set, three descended from an ancestor that underwent whole genome duplication. Subsequent differential duplicate loss events in the three descendants resulted in distinct classes of gene loss patterns, including cases where the genes retained in the three descendants are paralogs, constituting 'traps' for ortholog prediction algorithms. We found that the false discovery rate of all algorithms dramatically increased in these traps. CONCLUSIONS These results suggest that simple algorithms, like cRBH, may be better ortholog predictors than more complex ones (e.g., OrthoMCL and MultiParanoid) for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs (e.g., molecular phylogenetics), but that all algorithms fail to accurately predict orthologs when paralogy is rampant.
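The four bracketed measures in this abstract follow directly from confusion-matrix counts. The Python sketch below is only an illustration of those formulas: the function names and the example counts are hypothetical and are not taken from the study.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def ortholog_metrics(tp, fp, tn, fn):
    """Compute the four measures defined in the abstract from confusion counts."""
    return {
        "sensitivity": safe_ratio(tp, tp + fn),           # TP/(TP+FN)
        "specificity": safe_ratio(tn, tn + fp),           # TN/(TN+FP)
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        "false_discovery_rate": safe_ratio(fp, fp + tp),  # FP/(FP+TP)
    }


# Hypothetical counts, for illustration only (not data from the study).
metrics = ortholog_metrics(tp=80, fp=20, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and false discovery rate can move independently: a predictor that calls more pairs orthologous can raise sensitivity while also raising the false discovery rate, which is consistent with different algorithms ranking best on different measures.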
Affiliation(s)
- Leonidas Salichos
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
|
34
|
Zhang N, Bilsland E. Contributions of Saccharomyces cerevisiae to understanding mammalian gene function and therapy. Methods Mol Biol 2011; 759:501-523. [PMID: 21863505 DOI: 10.1007/978-1-61779-173-4_28] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 05/31/2023]
Abstract
Due to its genetic tractability and ease of manipulation, the yeast Saccharomyces cerevisiae has been extensively used as a model organism to understand how eukaryotic cells grow, divide, and respond to environmental changes. In this chapter, we reasoned that functional annotation of novel genes revealed by sequencing should adopt an integrative approach including both bioinformatics and experimental analysis to reveal functional conservation and divergence of complexes and pathways. The techniques and resources generated for systems biology studies in yeast have found a wide range of applications. Here we focused on using these technologies in revealing functions of genes from mammals, in identifying targets of novel and known drugs and in screening drugs targeting specific proteins and/or protein-protein interactions.
Affiliation(s)
- Nianshu Zhang
- Department of Biochemistry, Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK.
|
35
|
Auchtung TA, Shyndriayeva G, Cavanaugh CM. 16S rRNA phylogenetic analysis and quantification of Korarchaeota indigenous to the hot springs of Kamchatka, Russia. Extremophiles 2010; 15:105-16. [PMID: 21153671 DOI: 10.1007/s00792-010-0340-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 11/04/2010] [Accepted: 11/15/2010] [Indexed: 10/18/2022]
Abstract
The candidate archaeal division Korarchaeota is known primarily from deeply branching sequences of 16S rRNA genes PCR-amplified from hydrothermal springs. Parallels between the phylogeny of these genes and the geographic locations where they were identified suggested that Korarchaeota exhibit a high level of endemism. In this study, the influence of geographic isolation and select environmental factors on the diversification of the Korarchaeota was investigated. Fourteen hot springs from three different regions of Kamchatka, Russia were screened by PCR using Korarchaeota-specific and general Archaea 16S rRNA gene-targeting primers, cloning, and sequencing. Phylogenetic analyses of these sequences with Korarchaeota 16S rRNA sequences previously identified from around the world suggested that all Kamchatka sequences cluster together in a unique clade that subdivides by region within the peninsula. Consistent with endemism, 16S rRNA gene group-specific quantitative PCR of all Kamchatka samples detected only the single clade of Korarchaeota that was found by the non-quantitative PCR screening. In addition, their genes were measured in only low numbers; small Korarchaeota populations would present fewer chances for dispersal to and colonization of other sites. Across the entire division of Korarchaeota, common geographic locations, temperatures, or salinities of identification sites united sequence clusters at different phylogenetic levels, suggesting varied roles of these factors in the diversification of Korarchaeota.
Affiliation(s)
- Thomas A Auchtung
- Department of Organismic and Evolutionary Biology, Harvard University, Biological Laboratories 4083, 16 Divinity Avenue, Cambridge, MA 02138, USA
|
36
|
Ojo OO, Omabe M. Incorporating bioinformatics into biological science education in Nigeria: prospects and challenges. Infection, Genetics and Evolution 2010; 11:784-7. [PMID: 21145989 DOI: 10.1016/j.meegid.2010.11.015] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 04/22/2010] [Revised: 11/26/2010] [Accepted: 11/26/2010] [Indexed: 10/18/2022]
Abstract
The urgency to process and analyze the deluge of data created by proteomics and genomics studies worldwide has caused bioinformatics to gain prominence and importance. However, its multidisciplinary nature has created a unique demand for specialists trained in both biology and computing. Several countries, in response to this challenge, have developed a number of manpower training programmes. This review presents a description of the meaning, scope, history and development of bioinformatics, with a focus on the prospects and challenges facing bioinformatics education worldwide. The paper also provides an overview of attempts to introduce bioinformatics in Nigeria, describes the existing bioinformatics scenario in Nigeria, and suggests strategies for effective bioinformatics education in Nigeria.
Affiliation(s)
- O O Ojo
- Chevron Biotechnology Centre, Federal University of Technology, Yola, Nigeria.
|
37
|
Wyatt MA, Wang W, Roux CM, Beasley FC, Heinrichs DE, Dunman PM, Magarvey NA. Staphylococcus aureus nonribosomal peptide secondary metabolites regulate virulence. Science 2010; 329:294-6. [PMID: 20522739 DOI: 10.1126/science.1188888] [Citation(s) in RCA: 95] [Impact Index Per Article: 6.8] [Indexed: 11/02/2022]
Abstract
Staphylococcus aureus is a major human pathogen that is resistant to numerous antibiotics in clinical use. We found two nonribosomal peptide secondary metabolites--the aureusimines, made by S. aureus--that are not antibiotics, but function as regulators of virulence factor expression and are necessary for productive infections. In vivo mouse models of bacteremia showed that strains of S. aureus unable to produce aureusimines were attenuated and/or cleared from major organs, including the spleen, liver, and heart. Targeting aureusimine synthesis may offer novel leads for anti-infective drugs.
Affiliation(s)
- Morgan A Wyatt
- Department of Biochemistry and Biomedical Sciences, M. G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario L8N 3Z5, Canada
|
38
|
Schmidt am Busch M, Sedano A, Simonson T. Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition. PLoS One 2010; 5:e10410. [PMID: 20463972 PMCID: PMC2864755 DOI: 10.1371/journal.pone.0010410] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 10/15/2009] [Accepted: 03/31/2010] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases. METHODOLOGY/PRINCIPAL FINDINGS We explore this strategy for four SCOP families: small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately distant natural homologues of the initial templates; e.g., the SUPERFAMILY profile Hidden-Markov Model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed. CONCLUSIONS/SIGNIFICANCE For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Audrey Sedano
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
|
39
|
|
40
|
Molecular systematics: A synthesis of the common methods and the state of knowledge. Cell Mol Biol Lett 2010; 15:311-41. [PMID: 20213503 PMCID: PMC6275913 DOI: 10.2478/s11658-010-0010-8] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 08/16/2009] [Accepted: 03/01/2010] [Indexed: 11/21/2022] Open
Abstract
The comparative and evolutionary analysis of molecular data has allowed researchers to tackle biological questions that have long remained unresolved. The evolution of DNA and amino acid sequences can now be modeled accurately enough that the information conveyed can be used to reconstruct the past. The methods to infer phylogeny (the pattern of historical relationships among lineages of organisms and/or sequences) range from the simplest, based on parsimony, to more sophisticated and highly parametric ones based on likelihood and Bayesian approaches. In general, molecular systematics provides a powerful statistical framework for hypothesis testing and the estimation of evolutionary processes, including the estimation of divergence times among taxa. The field of molecular systematics has experienced a revolution in recent years, and, although there are still methodological problems and pitfalls, it has become an essential tool for the study of evolutionary patterns and processes at different levels of biological organization. This review aims to present a brief synthesis of the approaches and methodologies that are most widely used in the field of molecular systematics today, as well as indications of future trends and state-of-the-art approaches.
|
41
|
Abstract
The use of low coverage genomes in comparative evolutionary analyses skews estimates of gene gains and losses. Background Given the availability of full genome sequences, mapping gene gains, duplications, and losses during evolution should theoretically be straightforward. However, this endeavor suffers from overemphasis on detecting conserved genome features, which in turn has led to sequencing multiple eutherian genomes with low coverage rather than fewer genomes with high-coverage and more even distribution in the phylogeny. Although limitations associated with analysis of low coverage genomes are recognized, they have not been quantified. Results Here, using recently developed comparative genomic application systems, we evaluate the impact of low-coverage genomes on inferences pertaining to gene gains and losses when analyzing eukaryote genome evolution through gene duplication. We demonstrate that, when performing inference of genome content evolution, low-coverage genomes generate not only a massive number of false gene losses, but also striking artifacts in gene duplication inference, especially at the most recent common ancestor of low-coverage genomes. We show that the artifactual gains are caused by the low coverage of genome sequence per se rather than by the increased taxon sampling in a biased portion of the species tree. Conclusions We argue that it will remain difficult to differentiate artifacts from true changes in modes and tempo of genome evolution until there is better homogeneity in both taxon sampling and high-coverage sequencing. This is important for broadening the utility of full genome data to the community of evolutionary biologists, whose interests go well beyond widely conserved physiologies and developmental patterns as they seek to understand the generative mechanisms underlying biological diversity.
|
42
|
Celton M, Malpertuy A, Lelandais G, de Brevern AG. Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments. BMC Genomics 2010; 11:15. [PMID: 20056002 PMCID: PMC2827407 DOI: 10.1186/1471-2164-11-15] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Received: 09/02/2009] [Accepted: 01/07/2010] [Indexed: 11/17/2022] Open
Abstract
Background Microarray technologies produce large amounts of data. In a previous study, we showed the value of the k-Nearest Neighbour approach for restoring missing gene expression values and its positive impact on gene clustering by a hierarchical algorithm. Since then, numerous replacement methods have been proposed to impute missing values (MVs) for microarray data. In this study, we evaluated twelve different usable methods and their influence on the quality of gene clustering. We used several datasets, covering both kinetic and non-kinetic experiments from yeast and human. Results We underline the excellent efficiency of the approaches proposed and implemented by Bo and co-workers, especially one based on expectation maximization (EM_array). These improvements were also observed for the imputation of extreme values, which are the most difficult to predict. We showed that the imputed MVs still have important effects on the stability of the gene clusters. The improvement in clustering obtained by hierarchical clustering remains limited and is not sufficient to completely restore the correct gene associations. However, a common tendency can be found between the quality of the imputation method and the gene cluster stability. Even though comparing clustering algorithms is a complex task, we observed that the k-means approach is more efficient at conserving gene associations. Conclusions More than 6,000,000 independent simulations assessed the quality of 12 imputation methods on five very different biological datasets. Important improvements have been made since our last study. The EM_array approach constitutes an efficient method for restoring missing gene expression values, with a lower estimation error level. Nonetheless, the presence of MVs, even at a low rate, is a major factor in gene cluster instability. Our study highlights the need for a systematic assessment of imputation methods and thus for dedicated benchmarks. A noticeable point is the specific influence of some biological datasets.
Affiliation(s)
- Magalie Celton
- INSERM UMR-S 726, Equipe de Bioinformatique Génomique et Moléculaire, DSIMB, Université Paris Diderot-Paris 7, 2 place Jussieu, Paris, France
|
43
|
Liolios K, Chen IMA, Mavromatis K, Tavernarakis N, Hugenholtz P, Markowitz VM, Kyrpides NC. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 2010; 38:D346-54. [PMID: 19914934 PMCID: PMC2808860 DOI: 10.1093/nar/gkp848] [Citation(s) in RCA: 312] [Impact Index Per Article: 22.3] [Received: 09/19/2009] [Accepted: 09/22/2009] [Indexed: 11/14/2022] Open
Abstract
The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/
Affiliation(s)
- Konstantinos Liolios
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- I-Min A. Chen
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Konstantinos Mavromatis
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Nektarios Tavernarakis
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
| | - Philip Hugenholtz
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
| | - Victor M. Markowitz
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
| | - Nikos C. Kyrpides
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
| |
Collapse
|
44
Taffs R, Aston JE, Brileya K, Jay Z, Klatt CG, McGlynn S, Mallette N, Montross S, Gerlach R, Inskeep WP, Ward DM, Carlson RP. In silico approaches to study mass and energy flows in microbial consortia: a syntrophic case study. BMC SYSTEMS BIOLOGY 2009; 3:114. [PMID: 20003240 PMCID: PMC2799449 DOI: 10.1186/1752-0509-3-114] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.3] [Received: 06/06/2009] [Accepted: 12/10/2009] [Indexed: 11/14/2022]
Abstract
BACKGROUND: Three methods were developed for the application of stoichiometry-based network analysis approaches including elementary mode analysis to the study of mass and energy flows in microbial communities. Each has distinct advantages and disadvantages suitable for analyzing systems with different degrees of complexity and a priori knowledge. These approaches were tested and compared using data from the thermophilic, phototrophic mat communities from Octopus and Mushroom Springs in Yellowstone National Park (USA). The models were based on three distinct microbial guilds: oxygenic phototrophs, filamentous anoxygenic phototrophs, and sulfate-reducing bacteria. Two phases, day and night, were modeled to account for differences in the sources of mass and energy and the routes available for their exchange.
RESULTS: The in silico models were used to explore fundamental questions in ecology including the prediction of and explanation for measured relative abundances of primary producers in the mat, theoretical tradeoffs between overall productivity and the generation of toxic by-products, and the relative robustness of various guild interactions.
CONCLUSION: The three modeling approaches represent a flexible toolbox for creating cellular metabolic networks to study microbial communities on scales ranging from cells to ecosystems. A comparison of the three methods highlights considerations for selecting the one most appropriate for a given microbial system. For instance, communities represented only by metagenomic data can be modeled using the pooled method, which analyzes a community's total metabolic potential without attempting to partition enzymes to different organisms. Systems with extensive a priori information on microbial guilds can be represented using the compartmentalized technique, employing distinct control volumes to separate guild-appropriate enzymes and metabolites. If the complexity of a compartmentalized network creates an unacceptable computational burden, the nested analysis approach permits greater scalability at the cost of more user intervention through multiple rounds of pathway analysis.
Affiliation(s)
- Reed Taffs
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
  - Center for Biofilm Engineering, Montana State University, Bozeman, MT 59717, USA
- John E Aston
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
  - Center for Biofilm Engineering, Montana State University, Bozeman, MT 59717, USA
- Kristen Brileya
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
  - Center for Biofilm Engineering, Montana State University, Bozeman, MT 59717, USA
- Zackary Jay
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
- Christian G Klatt
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
- Shawn McGlynn
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
- Natasha Mallette
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
  - Center for Biofilm Engineering, Montana State University, Bozeman, MT 59717, USA
- Scott Montross
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
- Robin Gerlach
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
  - Center for Biofilm Engineering, Montana State University, Bozeman, MT 59717, USA
- William P Inskeep
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
- David M Ward
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
- Ross P Carlson
  - Thermal Biology Institute, Montana State University, Bozeman, MT 59717, USA
  - Center for Biofilm Engineering, Montana State University, Bozeman, MT 59717, USA
45
Gao M, Skolnick J. A threading-based method for the prediction of DNA-binding proteins with application to the human genome. PLoS Comput Biol 2009; 5:e1000567. [PMID: 19911048 PMCID: PMC2770119 DOI: 10.1371/journal.pcbi.1000567] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.3] [Received: 05/26/2009] [Accepted: 10/16/2009] [Indexed: 11/18/2022] Open
Abstract
Diverse mechanisms for DNA-protein recognition have been elucidated in numerous atomic complex structures from various protein families. These structural data provide an invaluable knowledge base not only for understanding DNA-protein interactions, but also for developing specialized methods that predict the DNA-binding function from protein structure. While such methods are useful, a major limitation is that they require an experimental structure of the target as input. To overcome this obstacle, we develop a threading-based method, DNA-Binding-Domain-Threader (DBD-Threader), for the prediction of DNA-binding domains and associated DNA-binding protein residues. Our method, which uses a template library composed of DNA-protein complex structures, requires only the target protein's sequence. In our approach, fold similarity and DNA-binding propensity are employed as two functional discriminating properties. In benchmark tests on 179 DNA-binding and 3,797 non-DNA-binding proteins, using templates whose sequence identity is less than 30% to the target, DBD-Threader achieves a sensitivity/precision of 56%/86%. This performance is considerably better than the standard sequence comparison method PSI-BLAST and is comparable to DBD-Hunter, which requires an experimental structure as input. Moreover, for over 70% of predicted DNA-binding domains, the backbone Root Mean Square Deviations (RMSDs) of the top-ranked structural models are within 6.5 Å of their experimental structures, with their associated DNA-binding sites identified at satisfactory accuracy. Additionally, DBD-Threader correctly assigned the SCOP superfamily for most predicted domains. To demonstrate that DBD-Threader is useful for automatic function annotation on a large scale, DBD-Threader was applied to 18,631 protein sequences from the human genome; 1,654 proteins are predicted to have DNA-binding function. Comparison with existing Gene Ontology (GO) annotations suggests that approximately 30% of our predictions are new. Finally, we present some interesting predictions in detail. In particular, it is estimated that approximately 20% of classic zinc finger domains play a functional role not related to direct DNA-binding.
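The sensitivity/precision pair quoted in this abstract can be read through the usual confusion-matrix definitions. The sketch below is only an illustration of those two formulas; the function names and the toy counts are assumptions made here and are not taken from the paper's benchmark.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real DNA-binding proteins that were predicted as such."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of predicted DNA-binding proteins that really bind DNA."""
    return true_pos / (true_pos + false_pos)


# Hypothetical toy counts, NOT the benchmark numbers from the abstract.
tp, fn, fp = 6, 4, 1
toy_sensitivity = sensitivity(tp, fn)  # 6 of 10 true binders recovered
toy_precision = precision(tp, fp)      # 6 of 7 predictions correct
```

A method can trade one measure against the other, which is presumably why the abstract reports both figures together.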
Affiliation(s)
- Mu Gao
  - Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
46
am Busch MS, Mignon D, Simonson T. Computational protein design as a tool for fold recognition. Proteins 2009; 77:139-58. [PMID: 19408297 DOI: 10.1002/prot.22426] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]
Abstract
Computationally designed protein sequences have been proposed as a basis to perform fold recognition and homology searching. To investigate this possibility, an automated procedure is used to completely redesign 24 SH3 proteins and 22 SH2 proteins. We use the experimental backbone coordinates as fixed templates in the folded state and a molecular mechanics model to compute the pairwise interaction energies between all sidechain types and conformations. Energy calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is then used to scan the sequence and conformational space for optimal solutions. We produced 200,000-450,000 sequences for each backbone template. The designed sequences resemble moderately distant, natural homologues of the initial templates, according to their identity scores and their similarity with respect to the Pfam sets of SH2 and SH3 domains. Standard homology detection tools document their native-like character: the Conserved Domain Database recognizes 61% (52%) of our low-energy sequences as SH3 (SH2) domains; the SUPERFAMILY Hidden Markov Model library recognizes 81% (84%). Conversely, position-specific scoring matrices (PSSMs) derived from our designed sequences can be used to detect natural homologues in sequence databases. Within SwissProt, a set of natural SH3 PSSMs detects 772 SH3 domains, for example; our designed PSSMs detect 67% of these, plus one additional sequence and two false positives. If six amino acids involved in substrate binding (a selective pressure not accounted for in our design) are reset to their experimental types, then 77% of the experimental SH3 domains are detected. Results for the SH2 domains are similar. Several directions to improve the method further are discussed.
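To make the PSSM step in this abstract concrete, here is a minimal sketch of how a log-odds position-specific scoring matrix can be derived from a set of equal-length sequences and used to score a candidate. It is not the authors' pipeline: the uniform background frequency, the pseudocount of 1, the function names, and the toy sequences are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Return one log2-odds score table per alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    pssm = []
    for col in range(length):
        # Start every residue at the pseudocount so unseen residues get a finite score.
        counts = {aa: pseudocount for aa in ALPHABET}
        for seq in aligned:
            counts[seq[col]] += 1.0
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background) for aa in ALPHABET})
    return pssm


def score(pssm: List[Dict[str, float]], seq: str) -> float:
    """Sum the per-column log-odds scores of an ungapped sequence."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Hypothetical toy "designed" sequences, not data from the paper.
designed = ["ALYDY", "ALYDF", "AIYDY"]
toy_pssm = build_pssm(designed)
```

In this sketch a sequence close to the designed set, such as `"ALYDY"`, scores higher than an unrelated one such as `"GGGGG"`; real homology searches use gapped alignments and calibrated background frequencies.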
Affiliation(s)
- Marcel Schmidt am Busch
  - Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
47
A New Multiplexed Real-Time PCR Assay to Detect Campylobacter jejuni, C. coli, C. lari, and C. upsaliensis. FOOD ANAL METHOD 2009. [DOI: 10.1007/s12161-009-9110-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]
48
Ramos AA, Marques AR, Rodrigues M, Henriques N, Baumgartner A, Castilho R, Brenig B, Varela JC. Molecular and functional characterization of a cDNA encoding 4-hydroxy-3-methylbut-2-enyl diphosphate reductase from Dunaliella salina. JOURNAL OF PLANT PHYSIOLOGY 2009; 166:968-77. [PMID: 19155093 DOI: 10.1016/j.jplph.2008.11.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 09/08/2008] [Revised: 11/10/2008] [Accepted: 11/10/2008] [Indexed: 05/03/2023]
Abstract
In green algae, the final step of the plastidial methylerythritol phosphate (MEP) pathway is catalyzed by 4-hydroxy-3-methylbut-2-enyl diphosphate reductase (HDR; EC: 1.17.1.2), an enzyme proposed to play a key role in the regulation of isoprenoid biosynthesis. Here we report the isolation and functional characterization of a 1959-bp Dunaliella salina HDR (DsHDR) cDNA encoding a deduced polypeptide of 474 amino acid residues. Phylogenetic analysis implied a cyanobacterial origin for plant and algal HDR genes. Steady-state DsHDR transcript levels were higher in D. salina cells submitted to nutritional depletion, high salt and/or high light, suggesting that DsHDR may respond to the same environmental cues as genes involved in carotenoid biosynthesis.
Affiliation(s)
- Ana A Ramos
  - Centre of Marine Sciences, University of Algarve, Campus de Gambelas, Faro 8005-139, Portugal
49
Li H, Kristensen DM, Coleman MK, Mushegian A. Detection of biochemical pathways by probabilistic matching of phyletic vectors. PLoS One 2009; 4:e5326. [PMID: 19390636 PMCID: PMC2670198 DOI: 10.1371/journal.pone.0005326] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/20/2008] [Accepted: 02/10/2009] [Indexed: 11/18/2022] Open
Abstract
A phyletic vector, also known as a phyletic (or phylogenetic) pattern, is a binary representation of the presences and absences of orthologous genes in different genomes. Joint occurrence of two or more genes in many genomes results in closely similar binary vectors representing these genes, and this similarity between gene vectors may be used as a measure of functional association between genes. Better understanding of quantitative properties of gene co-occurrences is needed for systematic studies of gene function and evolution. We used the probabilistic iterative algorithm Psi-square to find groups of similar phyletic vectors. An extended Psi-square algorithm, in which pseudocounts are implemented, shows better sensitivity in identifying proteins with known functional links than our earlier hierarchical clustering approach. At the same time, the specificity of inferring functional associations between genes in prokaryotic genomes is strongly dependent on the pathway: phyletic vectors of the genes involved in energy metabolism and in de novo biosynthesis of the essential precursors tend to be lumped together, whereas cellular modules involved in secretion, motility, assembly of cell surfaces, biosynthesis of some coenzymes, and utilization of secondary carbon sources tend to be identified with much greater specificity. It appears that the network of gene coinheritance in prokaryotes contains a giant connected component that encompasses most biosynthetic subsystems, along with a series of more independent modules involved in cell interaction with the environment.
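The idea of a phyletic vector, and of using vector similarity as a proxy for functional association, can be sketched in a few lines. The example below uses plain Jaccard similarity on binary presence/absence vectors; this is a deliberately simple stand-in and not the Psi-square algorithm described in the abstract. The gene names and the six-genome toy data are hypothetical.

```python
from typing import Dict, List


def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard similarity of two binary presence/absence vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must cover the same set of genomes")
    both = sum(1 for x, y in zip(a, b) if x and y)      # gene pair present together
    either = sum(1 for x, y in zip(a, b) if x or y)     # at least one gene present
    return both / either if either else 0.0


# Hypothetical toy data: each vector marks presence (1) or absence (0)
# of an orthologous gene in six genomes.
phyletic: Dict[str, List[int]] = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 0],
}

sim_ab = jaccard(phyletic["geneA"], phyletic["geneB"])  # co-occur in 3 of 4 genomes
sim_ac = jaccard(phyletic["geneA"], phyletic["geneC"])  # never co-occur
```

Here `geneA` and `geneB` share most of their occurrences and would be candidates for a functional link, while `geneA` and `geneC` never co-occur. A probabilistic method such as the one in the paper additionally accounts for how informative each genome is, which this sketch ignores.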
Affiliation(s)
- Hua Li
  - Stowers Institute for Medical Research, Kansas City, Missouri, United States of America.
50
Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 2009; 10:67. [PMID: 19236712 PMCID: PMC2653490 DOI: 10.1186/1471-2105-10-67] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.1] [Received: 10/20/2008] [Accepted: 02/23/2009] [Indexed: 11/22/2022] Open
Abstract
Background: The ever-increasing number of sequenced and annotated genomes has made management of their annotations a significant undertaking, especially for large eukaryotic genomes containing many thousands of genes. Typically, changes in gene and transcript numbers are used to summarize changes from release to release, but these measures say nothing about changes to individual annotations, nor do they provide any means to identify annotations in need of manual review.
Results: In response, we have developed a suite of quantitative measures to better characterize changes to a genome's annotations between releases, and to prioritize problematic annotations for manual review. We have applied these measures to the annotations of five eukaryotic genomes over multiple releases: H. sapiens, M. musculus, D. melanogaster, A. gambiae, and C. elegans.
Conclusion: Our results provide the first detailed, historical overview of how these genomes' annotations have changed over the years, and demonstrate the usefulness of these measures for genome annotation management.