Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Dutilh BE, He Y, Hekkelman ML, Huynen MA. Signature, a web server for taxonomic characterization of sequence samples using signature genes. Nucleic Acids Res 2008;36:W470-4. [PMID: 18487625 PMCID: PMC2447722 DOI: 10.1093/nar/gkn277] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/20/2022] Open



Cited by Other Article(s)

Barr JJ, Dutilh BE, Skennerton CT, Fukushima T, Hastie ML, Gorman JJ, Tyson GW, Bond PL. Metagenomic and metaproteomic analyses of Accumulibacter phosphatis-enriched floccular and granular biofilm. Environ Microbiol 2015;18:273-87. [PMID: 26279094 DOI: 10.1111/1462-2920.13019] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 02/28/2015] [Revised: 06/30/2015] [Accepted: 08/11/2015] [Indexed: 11/30/2022]

Affiliation(s)

Jeremy J Barr: Department of Biology, San Diego State University, San Diego, CA, USA; Advanced Water Management Centre (AWMC), The University of Queensland, Brisbane, Qld, Australia; Environmental Biotechnology Cooperative Research Centre (EBCRC), Sydney, NSW, Australia
Bas E Dutilh: Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, The Netherlands; Centre for Molecular and Biomedical Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands; Department of Marine Biology, Institute of Biology, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Connor T Skennerton: Advanced Water Management Centre (AWMC), The University of Queensland, Brisbane, Qld, Australia; Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, The University of Queensland, Brisbane, Qld, Australia; Division of Geological and Planetary Sciences, California Institute of Technology, Pasadena, CA, USA
Toshikazu Fukushima: Advanced Water Management Centre (AWMC), The University of Queensland, Brisbane, Qld, Australia; Division of Environmental Studies, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
Marcus L Hastie: Protein Discovery Centre, Queensland Institute of Medical Research (QIMR) Berghofer Medical Research Institute, Herston, Qld, Australia
Jeffrey J Gorman: Protein Discovery Centre, Queensland Institute of Medical Research (QIMR) Berghofer Medical Research Institute, Herston, Qld, Australia
Gene W Tyson: Advanced Water Management Centre (AWMC), The University of Queensland, Brisbane, Qld, Australia; Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, The University of Queensland, Brisbane, Qld, Australia
Philip L Bond: Advanced Water Management Centre (AWMC), The University of Queensland, Brisbane, Qld, Australia; Environmental Biotechnology Cooperative Research Centre (EBCRC), Sydney, NSW, Australia


Wang JD. Comparing virus classification using genomic materials according to different taxonomic levels. J Bioinform Comput Biol 2013;11:1343003. [PMID: 24372032 DOI: 10.1142/s0219720013430038] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]

Dutilh BE, Backus L, Edwards RA, Wels M, Bayjanov JR, van Hijum SAFT. Explaining microbial phenotypes on a genomic scale: GWAS for microbes. Brief Funct Genomics 2013;12:366-80. [PMID: 23625995 PMCID: PMC3743258 DOI: 10.1093/bfgp/elt008] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 12/24/2022] Open

Gori F, Tringe SG, Folino G, van Hijum SAFT, Op den Camp HJM, Jetten MSM, Marchiori E. Differences in sequencing technologies improve the retrieval of anammox bacterial genome from metagenomes. BMC Genomics 2013;14:7. [PMID: 23324532 PMCID: PMC3618311 DOI: 10.1186/1471-2164-14-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 03/19/2012] [Accepted: 12/13/2012] [Indexed: 11/27/2022] Open

Abstract

Background

Sequencing technologies have different biases in both single-genome and metagenomic sequencing; these can significantly affect ORF recovery and the population distribution of a metagenome. In this paper we investigate how well different technologies represent information related to a considered organism of interest in a metagenome, and whether it is beneficial to combine information obtained using different technologies. We comparatively analyze three metagenomic datasets acquired from a sample containing the anammox bacterium Candidatus ‘Brocadia fulgida’ (B. fulgida). These datasets were obtained using Roche 454 FLX and Sanger sequencing with two different libraries (shotgun and fosmid).

Results

In each dataset, the abundance of the reads annotated to B. fulgida was much lower than the abundance expected from available cell count information. This was due to the overrepresentation of GC-richer organisms, as shown by GC-content distribution of the reads. Nevertheless, by considering the union of B. fulgida reads over the three datasets, the number of B. fulgida ORFs recovered for at least 80% of their length was twice the amount recovered by the best technology. Indeed, while taxonomic distributions of reads in the three datasets were similar, the respective sets of B. fulgida ORFs recovered for a large part of their length were highly different, and depth of coverage patterns of 454 and Sanger were dissimilar.

Conclusions

Precautions should be taken to prevent the overrepresentation of GC-rich microbes in the datasets. This overrepresentation and the consistency of the taxonomic distributions of reads obtained with different sequencing technologies suggest that, in general, abundance biases might be mainly due to other steps of the sequencing protocols. Results show that biases against organisms of interest could be compensated by combining different sequencing technologies, owing to the differences in their genome-level sequencing biases, even if the species was present at similar abundances in the metagenomes.


Boleij A, Dutilh BE, Kortman GAM, Roelofs R, Laarakkers CM, Engelke UF, Tjalsma H. Bacterial responses to a simulated colon tumor microenvironment. Mol Cell Proteomics 2012;11:851-62. [PMID: 22713208 DOI: 10.1074/mcp.m112.019315] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 01/20/2023] Open

Linear normalised hash function for clustering gene sequences and identifying reference sequences from multiple sequence alignments. Microbial Informatics and Experimentation 2012;2:2. [PMID: 22587938 PMCID: PMC3351711 DOI: 10.1186/2042-5783-2-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/05/2011] [Accepted: 01/26/2012] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Comparative genomics has put additional demands on the assessment of similarity between sequences and their clustering as means for classification. However, defining the optimal number of clusters, cluster density and boundaries for sets of potentially related sequences of genes with variable degrees of polymorphism remains a significant challenge. The aim of this study was to develop a method that would identify the cluster centroids and the optimal number of clusters for a given sensitivity level and could work equally well for the different sequence datasets.

RESULTS

A novel method that combines the linear mapping hash function and multiple sequence alignment (MSA) was developed. This method takes advantage of the fact that sequences in the MSA output are already sorted by similarity, and identifies the optimal number of clusters, the cluster cut-offs, and the cluster centroids that can represent reference gene vouchers for the different species. The linear mapping hash function can map a distance matrix that is already ordered by similarity to indices, revealing gaps in the values around which the optimal cut-offs of the different clusters can be identified. The method was evaluated using sets of closely related (16S rRNA gene sequences of Nocardia species) and highly variable (VP1 genomic region of Enterovirus 71) sequences and outperformed existing unsupervised machine learning clustering methods and dimensionality reduction methods. This method does not require prior knowledge of the number of clusters or the distance between clusters, handles clusters of different sizes and shapes, and scales linearly with the dataset.

CONCLUSIONS

The combination of MSA with the linear mapping hash function is a computationally efficient way of gene sequence clustering and can be a valuable tool for the assessment of similarity, clustering of different microbial genomes, identifying reference sequences, and for the study of evolution of bacteria and viruses.
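
The idea of mapping ordered distances to indices so that gaps mark cluster cut-offs can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `linear_bucket` and `cluster_by_gaps`, the `n_buckets` parameter, and the toy distances are all assumptions made for illustration. The abstract does not specify how the hash is normalised or how gaps are scored.

```python
from typing import List, Sequence


def linear_bucket(value: float, lo: float, hi: float, n_buckets: int) -> int:
    """Linearly map a value in [lo, hi] to a bucket index in [0, n_buckets - 1]."""
    if hi == lo:
        return 0
    scaled = (value - lo) / (hi - lo) * n_buckets
    return min(int(scaled), n_buckets - 1)


def cluster_by_gaps(sorted_values: Sequence[float], n_buckets: int) -> List[List[float]]:
    """Split ascending values into clusters wherever at least one bucket is skipped.

    Hypothetical illustration: a skipped (empty) bucket between two consecutive
    values is treated as a gap, and a new cluster starts after it.
    """
    if not sorted_values:
        return []
    lo, hi = sorted_values[0], sorted_values[-1]
    clusters: List[List[float]] = []
    prev_bucket = None
    for value in sorted_values:
        bucket = linear_bucket(value, lo, hi, n_buckets)
        if prev_bucket is not None and bucket - prev_bucket <= 1:
            clusters[-1].append(value)  # same or adjacent bucket: no gap
        else:
            clusters.append([value])    # first value, or a gap was found
        prev_bucket = bucket
    return clusters


# Toy distances (assumed data), already in ascending order.
distances = [0.0, 0.1, 0.2, 2.0, 2.1, 5.0]
clusters = cluster_by_gaps(distances, n_buckets=10)
```

Because the input is already ordered, the sketch needs a single pass over the values, which is in keeping with the linear scaling claimed in the abstract. The choice of `n_buckets` here plays the role of a sensitivity setting and is likewise an assumption.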


Helal M, Kong F, Chen SCA, Bain M, Christen R, Sintchenko V. Defining reference sequences for Nocardia species by similarity and clustering analyses of 16S rRNA gene sequence data. PLoS One 2011;6:e19517. [PMID: 21687706 PMCID: PMC3110597 DOI: 10.1371/journal.pone.0019517] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 10/29/2010] [Accepted: 04/08/2011] [Indexed: 01/08/2023] Open

Abstract

Background

The intra- and inter-species genetic diversity of bacteria and the absence of ‘reference’, or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia.

Methods

A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm or the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization.

Results

The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as ‘centroids’ in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578.

Conclusion

The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra-species variability.
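
The abstract describes a cluster 'centroid' as the sequence from which the distances to all other sequences in the cluster are minimized. A minimal sketch of that selection step follows, assuming a precomputed symmetric distance matrix for one cluster. The function name `medoid_index` and the toy matrix are hypothetical, and the paper's own distance measure and tie-breaking are not stated in the abstract.

```python
from typing import Sequence


def medoid_index(distances: Sequence[Sequence[float]]) -> int:
    """Return the index of the item with the smallest summed distance to all others.

    `distances` is assumed to be a square matrix where distances[i][j] is the
    distance between sequence i and sequence j within a single cluster.
    Ties are resolved in favour of the lowest index.
    """
    if not distances:
        raise ValueError("distance matrix must not be empty")
    totals = [sum(row) for row in distances]
    return min(range(len(totals)), key=totals.__getitem__)


# Toy 3-sequence cluster (assumed data): sequence 1 is closest to the others.
toy = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
representative = medoid_index(toy)
```

In this toy case the row sums are 5.0, 3.0 and 6.0, so the second sequence (index 1) would be taken as the representative of the cluster.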


Mitra S, Rupek P, Richter DC, Urich T, Gilbert JA, Meyer F, Wilke A, Huson DH. Functional analysis of metagenomes and metatranscriptomes using SEED and KEGG. BMC Bioinformatics 2011;12 Suppl 1:S21. [PMID: 21342551 PMCID: PMC3044276 DOI: 10.1186/1471-2105-12-s1-s21] [Citation(s) in RCA: 99] [Impact Index Per Article: 7.6] [Indexed: 01/07/2023] Open

Molecular signatures for the Crenarchaeota and the Thaumarchaeota. Antonie van Leeuwenhoek 2010;99:133-57. [PMID: 20711675 DOI: 10.1007/s10482-010-9488-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 06/04/2010] [Accepted: 07/26/2010] [Indexed: 10/19/2022]

Abstract

Crenarchaeotes found in mesophilic marine environments were recently placed into a new phylum of Archaea called the Thaumarchaeota. However, very few molecular characteristics of this new phylum are currently known which can be used to distinguish them from the Crenarchaeota. In addition, their relationships to deep-branching archaeal lineages are unclear. We report here detailed analyses of protein sequences from Crenarchaeota and Thaumarchaeota that have identified many conserved signature indels (CSIs) and signature proteins (SPs) (i.e., proteins for which all significant blast hits are from these groups) that are specific for these archaeal groups. Of the identified signatures 6 CSIs and 13 SPs are specific for the Crenarchaeota phylum; 6 CSIs and >250 SPs are uniquely found in various Thaumarchaeota (viz. Cenarchaeum symbiosum, Nitrosopumilus maritimus and a number of uncultured marine crenarchaeotes) and 3 CSIs and ~10 SPs are found in both Thaumarchaeota and Crenarchaeota species. Some of the molecular signatures are also present in Korarchaeum cryptofilum, which forms the independent phylum Korarchaeota. Although some of these molecular signatures suggest a distant shared ancestry between Thaumarchaeota and Crenarchaeota, our identification of large numbers of Thaumarchaeota-specific proteins and their deep branching between the Crenarchaeota and Euryarchaeota phyla in phylogenetic trees shows that they are distinct from both Crenarchaeota and Euryarchaeota in both genetic and phylogenetic terms. These observations support the placement of marine mesophilic archaea into the separate phylum Thaumarchaeota. Additionally, many CSIs and SPs have been found that are specific for different orders within Crenarchaeota (viz. Sulfolobales-3 CSIs and 169 SPs, Thermoproteales-5 CSIs and 25 SPs, Desulfurococcales-4 SPs, and Sulfolobales and Desulfurococcales-2 CSIs and 18 SPs). 
The signatures described here provide novel means for distinguishing the Crenarchaeota and the Thaumarchaeota and for the classification of related and novel species in different environments. Functional studies on these signature proteins could lead to discovery of novel biochemical properties that are unique to these groups of archaea.


Genome analysis of Moraxella catarrhalis strain BBH18, [corrected] a human respiratory tract pathogen. J Bacteriol 2010;192:3574-83. [PMID: 20453089 DOI: 10.1128/jb.00121-10] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.2] [Indexed: 11/20/2022] Open

Mitra S, Klar B, Huson DH. Visual and statistical comparison of metagenomes. Bioinformatics 2009;25:1849-55. [PMID: 19515961 DOI: 10.1093/bioinformatics/btp341] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Indexed: 11/14/2022]

Dutilh BE, Snel B, Ettema TJG, Huynen MA. Signature genes as a phylogenomic tool. Mol Biol Evol 2008;25:1659-67. [PMID: 18492663 PMCID: PMC2464742 DOI: 10.1093/molbev/msn115] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Indexed: 01/06/2023] Open