Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Gao L, Qi J, Sun J, Hao B. Prokaryote phylogeny meets taxonomy: an exhaustive comparison of composition vector trees with systematic bacteriology. ACTA ACUST UNITED AC 2008;50:587-99. [PMID: 17879055 DOI: 10.1007/s11427-007-0084-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 04/16/2007] [Accepted: 07/21/2007] [Indexed: 10/22/2022]

Cited by Other Article(s)

Genome sequence of a multidrug-resistant Campylobacter coli strain isolated from a newborn with severe diarrhea in Lebanon. Folia Microbiol (Praha) 2022;67:319-328. [PMID: 34997523 DOI: 10.1007/s12223-021-00921-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/23/2020] [Accepted: 09/18/2021] [Indexed: 11/04/2022]

Zuo G, Hao B. CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy. Genomics, Proteomics & Bioinformatics 2015;13:321-31. [PMID: 26563468 PMCID: PMC4678791 DOI: 10.1016/j.gpb.2015.08.004] [Citation(s) in RCA: 146] [Impact Index Per Article: 16.2] [Received: 07/22/2015] [Accepted: 08/10/2015] [Indexed: 01/15/2023]

Zuo G, Xu Z, Hao B. Phylogeny and Taxonomy of Archaea: A Comparison of the Whole-Genome-Based CVTree Approach with 16S rRNA Sequence Analysis. Life (Basel) 2015;5:949-68. [PMID: 25789552 PMCID: PMC4390887 DOI: 10.3390/life5010949] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 12/09/2014] [Revised: 03/06/2015] [Accepted: 03/09/2015] [Indexed: 11/29/2022]

Prediction of success for polymerase chain reactions using the Markov maximal order model and support vector machine. J Theor Biol 2015;369:51-8. [DOI: 10.1016/j.jtbi.2015.01.017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/17/2014] [Revised: 12/21/2014] [Accepted: 01/14/2015] [Indexed: 11/18/2022]

Koumandou VL, Kossida S. Evolution of the F0F1 ATP synthase complex in light of the patchy distribution of different bioenergetic pathways across prokaryotes. PLoS Comput Biol 2014;10:e1003821. [PMID: 25188293 PMCID: PMC4154653 DOI: 10.1371/journal.pcbi.1003821] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 01/29/2014] [Accepted: 07/18/2014] [Indexed: 11/22/2022]

Abstract

Bacteria and archaea are characterized by an amazing metabolic diversity, which allows them to persist in diverse and often extreme habitats. Apart from oxygenic photosynthesis and oxidative phosphorylation, well-studied processes from chloroplasts and mitochondria of plants and animals, prokaryotes utilize various chemo- or lithotrophic modes, such as anoxygenic photosynthesis, iron oxidation and reduction, sulfate reduction, and methanogenesis. Most bioenergetic pathways have a similar general structure, with an electron transport chain composed of protein complexes acting as electron donors and acceptors, as well as a central cytochrome complex, mobile electron carriers, and an ATP synthase. While each pathway has been studied in considerable detail in isolation, not much is known about their relative evolutionary relationships. Wanting to address how this metabolic diversity evolved, we mapped the distribution of nine bioenergetic modes on a phylogenetic tree based on 16S rRNA sequences from 272 species representing the full diversity of prokaryotic lineages. This highlights the patchy distribution of many pathways across different lineages, and suggests either up to 26 independent origins or 17 horizontal gene transfer events. Next, we used comparative genomics and phylogenetic analysis of all subunits of the F0F1 ATP synthase, common to most bacterial lineages regardless of their bioenergetic mode. Our results indicate an ancient origin of this protein complex, and no clustering based on bioenergetic mode, which suggests that no special modifications are needed for the ATP synthase to work with different electron transport chains. Moreover, examination of the ATP synthase genetic locus indicates various gene rearrangements in the different bacterial lineages, ancient duplications of atpI and of the beta subunit of the F0 subcomplex, as well as more recent stochastic lineage-specific and species-specific duplications of all subunits. We discuss the implications of the overall pattern of conservation and flexibility of the F0F1 ATP synthase genetic locus.

Zuo G, Li Q, Hao B. On K-peptide length in composition vector phylogeny of prokaryotes. Comput Biol Chem 2014;53 Pt A:166-73. [PMID: 25205031 DOI: 10.1016/j.compbiolchem.2014.08.021] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Accepted: 07/11/2014] [Indexed: 11/25/2022]

Patil KR, McHardy AC. Alignment-free genome tree inference by learning group-specific distance metrics. Genome Biol Evol 2013;5:1470-84. [PMID: 23843191 PMCID: PMC3762195 DOI: 10.1093/gbe/evt105] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 12/22/2022]

Prokaryotic phylogenies inferred from whole-genome sequence and annotation data. BioMed Research International 2013;2013:409062. [PMID: 24073404 PMCID: PMC3773407 DOI: 10.1155/2013/409062] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/15/2013] [Revised: 06/26/2013] [Accepted: 07/22/2013] [Indexed: 11/25/2022]

Colosimo ME, Peterson MW, Mardis S, Hirschman L. Nephele: genotyping via complete composition vectors and MapReduce. Source Code for Biology and Medicine 2011;6:13. [PMID: 21851626 PMCID: PMC3182884 DOI: 10.1186/1751-0473-6-13] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 04/05/2011] [Accepted: 08/18/2011] [Indexed: 02/02/2023]

Abstract

BACKGROUND

Current sequencing technology makes it practical to sequence many samples of a given organism, raising new challenges for the processing and interpretation of large genomics data sets with associated metadata. Traditional computational phylogenetic methods are ideal for studying the evolution of gene/protein families and using those to infer the evolution of an organism, but are less than ideal for the study of the whole organism mainly due to the presence of insertions/deletions/rearrangements. These methods provide the researcher with the ability to group a set of samples into distinct genotypic groups based on sequence similarity, which can then be associated with metadata, such as host information, pathogenicity, and time or location of occurrence. Genotyping is critical to understanding, at a genomic level, the origin and spread of infectious diseases. Increasingly, genotyping is coming into use for disease surveillance activities, as well as for microbial forensics. The classic genotyping approach has been based on phylogenetic analysis, starting with a multiple sequence alignment. Genotypes are then established by expert examination of phylogenetic trees. However, these traditional single-processor methods are suboptimal for rapidly growing sequence datasets being generated by next-generation DNA sequencing machines, because they increase in computational complexity quickly with the number of sequences.

RESULTS

Nephele is a suite of tools that uses the complete composition vector algorithm to represent each sequence in the dataset as a vector derived from its constituent k-mers, bypassing the need for multiple sequence alignment, and affinity propagation clustering to group the sequences into genotypes based on a distance measure over the vectors. Our methods produce results that correlate well with expert-defined clades or genotypes, at a fraction of the computational cost of traditional phylogenetic methods run on traditional hardware. Nephele can use the open-source Hadoop implementation of MapReduce to parallelize execution using multiple compute nodes. We were able to generate a neighbour-joined tree of over 10,000 16S samples in less than 2 hours.

CONCLUSIONS

We conclude that using Nephele can substantially decrease the processing time required for generating genotype trees of tens to hundreds of organisms at genome scale sequence coverage.
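The alignment-free idea described in the Nephele abstract above (each sequence becomes a vector built from its k-mers, and vectors are compared with a distance measure) can be sketched roughly as follows. This is a simplified illustration, not Nephele's code: it uses raw overlapping k-mer counts and cosine distance, while the abstract does not spell out the complete composition vector construction or the distance actually used. The function names `kmer_vector` and `cosine_distance` and the toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_vector(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq.

    Assumption: raw counts only, with no background correction.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity of two sparse k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty vector (sequence shorter than k) shares nothing with anything.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    samples = {
        "s1": "ACGTACGTACGT",
        "s2": "ACGTACGAACGT",
        "s3": "TTTTGGGGCCCC",
    }
    vectors = {name: kmer_vector(seq, 3) for name, seq in samples.items()}
    names = sorted(vectors)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            print(x, y, round(cosine_distance(vectors[x], vectors[y]), 3))
```

The resulting pairwise distances are the kind of input that a clustering step or a neighbour-joining tree builder would consume; no alignment is computed at any point.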

Wang Z, Zhang XC, Le MH, Xu D, Stacey G, Cheng J. A protein domain co-occurrence network approach for predicting protein function and inferring species phylogeny. PLoS One 2011;6:e17906. [PMID: 21455299 PMCID: PMC3063783 DOI: 10.1371/journal.pone.0017906] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 10/23/2010] [Accepted: 02/16/2011] [Indexed: 11/18/2022]

Abstract

Protein Domain Co-occurrence Network (DCN) is a biological network that has not been fully studied. We analyzed the properties of the DCNs of H. sapiens, S. cerevisiae, C. elegans, D. melanogaster, and 15 plant genomes. These DCNs have the hallmark features of scale-free networks. We investigated the possibility of using DCNs to predict protein and domain functions. Based on our experiment conducted on 66 randomly selected proteins, the best of top 3 predictions made by our DCN-based aggregated neighbor-counting method achieved a semantic similarity score of 0.81 to the actual Gene Ontology terms of the proteins. Moreover, the top 3 predictions using neighbor-counting, χ², and an SVM-based method achieved an accuracy of 66%, 59%, and 61%, respectively, when used to predict specific Gene Ontology terms of human target domains. These predictions on average had a semantic similarity score of 0.82, 0.80, and 0.79 to the actual Gene Ontology terms, respectively. We also used DCNs to predict whether a domain is an enzyme domain, and our SVM-based and neighbor-inference method correctly classified 79% and 77% of the target domains, respectively. When using DCNs to classify a target domain into one of the six enzyme classes, we found that, as long as there is one EC number available in the neighboring domains, our SVM-based and neighboring-counting method correctly classified 92.4% and 91.9% of the target domains, respectively. Furthermore, we benchmarked the performance of using DCNs to infer species phylogenies on six different combinations of 398 single-chromosome prokaryotic genomes. The phylogenetic tree of 54 prokaryotic taxa generated by our DCNs-alignment-based method achieved a 93.45% similarity score compared to Bergey's taxonomy. In summary, our studies show that genome-wide DCNs contain rich information that can be effectively used to decipher protein function and reveal the evolutionary relationship among species.

Zuo G, Xu Z, Yu H, Hao B. Jackknife and bootstrap tests of the composition vector trees. Genomics, Proteomics & Bioinformatics 2010;8:262-7. [PMID: 21382595 PMCID: PMC5054193 DOI: 10.1016/s1672-0229(10)60028-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/21/2022]

Sun J, Xu Z, Hao B. Whole-genome based Archaea phylogeny and taxonomy: A composition vector approach. Chinese Science Bulletin-Chinese 2010;55:2323-2328. [PMID: 32214732 PMCID: PMC7089326 DOI: 10.1007/s11434-010-3008-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/23/2009] [Accepted: 08/13/2009] [Indexed: 11/24/2022]

Wang H, Xu Z, Gao L, Hao B. A fungal phylogeny based on 82 complete genomes using the composition vector method. BMC Evol Biol 2009;9:195. [PMID: 19664262 PMCID: PMC3087519 DOI: 10.1186/1471-2148-9-195] [Citation(s) in RCA: 159] [Impact Index Per Article: 10.6] [Received: 09/30/2008] [Accepted: 08/10/2009] [Indexed: 01/23/2023]

Abstract

BACKGROUND

Molecular phylogenetics and phylogenomics have greatly revised and enriched the fungal systematics in the last two decades. Most of the analyses have been performed by comparing single or multiple orthologous gene regions. Sequence alignment has always been an essential element in tree construction. These alignment-based methods (to be called the standard methods hereafter) need independent verification in order to put the fungal Tree of Life (TOL) on a secure footing. The ever-increasing number of sequenced fungal genomes and the recent success of our newly proposed alignment-free composition vector tree (CVTree, see Methods) approach have made the verification feasible.

RESULTS

In all, 82 fungal genomes covering 5 phyla were obtained from the relevant genome sequencing centers. An unscaled phylogenetic tree with 3 outgroup species was constructed by using the CVTree method. Overall, the resultant phylogeny infers all major groups in accordance with standard methods. Furthermore, the CVTree provides information on the placement of several currently unsettled groups. Within the sub-phylum Pezizomycotina, our phylogeny places the Dothideomycetes and Eurotiomycetes as sister taxa. Within the Sordariomycetes, it infers that Magnaporthe grisea and the Plectosphaerellaceae are closely related to the Sordariales and Hypocreales, respectively. Within the Eurotiales, it supports that Aspergillus nidulans is the early-branching species among the 8 aspergilli. Within the Onygenales, it groups Histoplasma and Paracoccidioides together, supporting that the Ajellomycetaceae is a distinct clade from Onygenaceae. Within the sub-phylum Saccharomycotina, the CVTree clearly resolves two clades: (1) species that translate CTG as serine instead of leucine (the CTG clade) and (2) species that have undergone whole-genome duplication (the WGD clade). It places Candida glabrata at the base of the WGD clade.

CONCLUSION

Using different input data and methodology, the CVTree approach is a good complement to the standard methods. The remarkable consistency between them has brought about more confidence to the current understanding of the fungal branch of TOL.

Xu Z, Hao B. CVTree update: a newly designed phylogenetic study platform using composition vectors and whole genomes. Nucleic Acids Res 2009;37:W174-8. [PMID: 19398429 PMCID: PMC2703908 DOI: 10.1093/nar/gkp278] [Citation(s) in RCA: 152] [Impact Index Per Article: 10.1] [Received: 01/30/2009] [Revised: 04/10/2009] [Accepted: 04/14/2009] [Indexed: 11/21/2022]

Lin GN, Cai Z, Lin G, Chakraborty S, Xu D. ComPhy: prokaryotic composite distance phylogenies inferred from whole-genome gene sets. BMC Bioinformatics 2009;10 Suppl 1:S5. [PMID: 19208152 PMCID: PMC2648732 DOI: 10.1186/1471-2105-10-s1-s5] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022]