1
Genome sequence of a multidrug-resistant Campylobacter coli strain isolated from a newborn with severe diarrhea in Lebanon. Folia Microbiol (Praha) 2022; 67:319-328. [PMID: 34997523 DOI: 10.1007/s12223-021-00921-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/23/2020] [Accepted: 09/18/2021] [Indexed: 11/04/2022]
Abstract
A multidrug-resistant (MDR) Campylobacter coli (C. coli) strain was isolated from a 2-month-old newborn who suffered from severe diarrhea in Lebanon. Here, whole-genome sequencing (WGS) analysis was deployed to determine the genetic basis of antimicrobial resistance and virulence in the C. coli isolate and to identify its epidemiological background (sequence type). The identity of the isolate was confirmed using API® Campy, MALDI-TOF, and 16S rRNA gene sequencing analysis. The antimicrobial susceptibility phenotype was determined using the disk diffusion assay. Our analysis showed that resistance to macrolides and quinolones was potentially associated with the presence of multiple point mutations in antibiotic targets on the chromosomal DNA. Furthermore, tetracycline and aminoglycoside resistance were encoded by genes on a pTet plasmid. The blaOXA-61 gene, which is associated with beta-lactam resistance, was also detected in the C. coli genome. A set of 30 genes associated with virulence in C. coli was detected using WGS analysis. MLST analysis classified the isolate as belonging to a new sequence type (ST-9588), a member of the ST-828 complex, which is mainly associated with humans and chickens. Taken together, this study provides the first WGS analysis of Campylobacter isolated from Lebanon. The detection of a variety of AMR and virulence determinants strongly emphasizes the need for studying the burden of Campylobacter in Lebanon and the Middle East and North Africa (MENA) region, where information on campylobacteriosis is scant.
2
Zuo G, Hao B. CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy. GENOMICS, PROTEOMICS & BIOINFORMATICS 2015; 13:321-31. [PMID: 26563468 PMCID: PMC4678791 DOI: 10.1016/j.gpb.2015.08.004] [Citation(s) in RCA: 146] [Impact Index Per Article: 16.2] [Received: 07/22/2015] [Accepted: 08/10/2015] [Indexed: 01/15/2023]
Abstract
A faithful phylogeny and an objective taxonomy for prokaryotes should agree with each other and ultimately follow the genome data. With the number of sequenced genomes reaching tens of thousands, both tree inference and detailed comparison with taxonomy are great challenges. We now provide one solution in the latest Release 3.0 of the alignment-free and whole-genome-based web server CVTree3. The server resides in a cluster of 64 cores and is equipped with an interactive, collapsible, and expandable tree display. It is capable of comparing the tree branching order with prokaryotic classification at all taxonomic ranks from domains down to species and strains. CVTree3 allows for inquiry by taxon names and trial on lineage modifications. In addition, it reports a summary of monophyletic and non-monophyletic taxa at all ranks as well as produces print-quality subtree figures. After giving an overview of retrospective verification of the CVTree approach, the power of the new server is described for the mega-classification of prokaryotes and determination of taxonomic placement of some newly-sequenced genomes. A few discrepancies between CVTree and 16S rRNA analyses are also summarized with regard to possible taxonomic revisions. CVTree3 is freely accessible to all users at http://tlife.fudan.edu.cn/cvtree3/ without login requirements.
Affiliation(s)
- Guanghong Zuo
- T-Life Research Center, Department of Physics, Fudan University, Shanghai 200433, China
- Bailin Hao
- T-Life Research Center, Department of Physics, Fudan University, Shanghai 200433, China.
3
Zuo G, Xu Z, Hao B. Phylogeny and Taxonomy of Archaea: A Comparison of the Whole-Genome-Based CVTree Approach with 16S rRNA Sequence Analysis. Life (Basel) 2015; 5:949-68. [PMID: 25789552 PMCID: PMC4390887 DOI: 10.3390/life5010949] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 12/09/2014] [Revised: 03/06/2015] [Accepted: 03/09/2015] [Indexed: 11/29/2022] Open
Abstract
A tripartite comparison of Archaea phylogeny and taxonomy at and above the rank order is reported: (1) the whole-genome-based and alignment-free CVTree using 179 genomes; (2) the 16S rRNA analysis exemplified by the All-Species Living Tree with 366 archaeal sequences; and (3) the Second Edition of Bergey's Manual of Systematic Bacteriology complemented by some current literature. A high degree of agreement is reached at these ranks. From the newly proposed archaeal phyla, Korarchaeota, Thaumarchaeota, Nanoarchaeota and Aigarchaeota, to the recent suggestion to divide the class Halobacteria into three orders, all gain substantial support from CVTree. In addition, the CVTree helped to determine the taxonomic position of some newly sequenced genomes without proper lineage information. A few discrepancies between the CVTree and the 16S rRNA approaches call for further investigation.
Affiliation(s)
- Guanghong Zuo
- Life Research Center and Department of Physics, Fudan University, 220 Handan Road, Shanghai 200433, China.
- Zhao Xu
- Thermo Fisher Scientific, 200 Oyster Point Blvd, South San Francisco, CA 94080, USA.
- Bailin Hao
- Life Research Center and Department of Physics, Fudan University, 220 Handan Road, Shanghai 200433, China.
4
Prediction of success for polymerase chain reactions using the Markov maximal order model and support vector machine. J Theor Biol 2015; 369:51-8. [DOI: 10.1016/j.jtbi.2015.01.017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/17/2014] [Revised: 12/21/2014] [Accepted: 01/14/2015] [Indexed: 11/18/2022]
5
Koumandou VL, Kossida S. Evolution of the F0F1 ATP synthase complex in light of the patchy distribution of different bioenergetic pathways across prokaryotes. PLoS Comput Biol 2014; 10:e1003821. [PMID: 25188293 PMCID: PMC4154653 DOI: 10.1371/journal.pcbi.1003821] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 01/29/2014] [Accepted: 07/18/2014] [Indexed: 11/22/2022] Open
Abstract
Bacteria and archaea are characterized by an amazing metabolic diversity, which allows them to persist in diverse and often extreme habitats. Apart from oxygenic photosynthesis and oxidative phosphorylation, well-studied processes from chloroplasts and mitochondria of plants and animals, prokaryotes utilize various chemo- or lithotrophic modes, such as anoxygenic photosynthesis, iron oxidation and reduction, sulfate reduction, and methanogenesis. Most bioenergetic pathways have a similar general structure, with an electron transport chain composed of protein complexes acting as electron donors and acceptors, as well as a central cytochrome complex, mobile electron carriers, and an ATP synthase. While each pathway has been studied in considerable detail in isolation, not much is known about their relative evolutionary relationships. Wanting to address how this metabolic diversity evolved, we mapped the distribution of nine bioenergetic modes on a phylogenetic tree based on 16S rRNA sequences from 272 species representing the full diversity of prokaryotic lineages. This highlights the patchy distribution of many pathways across different lineages, and suggests either up to 26 independent origins or 17 horizontal gene transfer events. Next, we used comparative genomics and phylogenetic analysis of all subunits of the F0F1 ATP synthase, common to most bacterial lineages regardless of their bioenergetic mode. Our results indicate an ancient origin of this protein complex, and no clustering based on bioenergetic mode, which suggests that no special modifications are needed for the ATP synthase to work with different electron transport chains. Moreover, examination of the ATP synthase genetic locus indicates various gene rearrangements in the different bacterial lineages, ancient duplications of atpI and of the beta subunit of the F0 subcomplex, as well as more recent stochastic lineage-specific and species-specific duplications of all subunits. We discuss the implications of the overall pattern of conservation and flexibility of the F0F1 ATP synthase genetic locus.
Affiliation(s)
- Vassiliki Lila Koumandou
- Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Athens, Greece
- Sophia Kossida
- Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Athens, Greece
6
Zuo G, Li Q, Hao B. On K-peptide length in composition vector phylogeny of prokaryotes. Comput Biol Chem 2014; 53 Pt A:166-73. [PMID: 25205031 DOI: 10.1016/j.compbiolchem.2014.08.021] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Accepted: 07/11/2014] [Indexed: 11/25/2022]
Abstract
Using an enlarged alphabet of K-tuples is the way to carry out alignment-free comparison of genomes in the composition vector (CV) approach to prokaryotic phylogeny. We summarize the known aspects concerning the choice of K and examine the results of using CVs with subtraction of a statistical background for K=3-9 and using raw CVs without subtraction for K=1-12. The criterion for evaluation consists in direct comparison with taxonomy. For prokaryotes the best performances are obtained for K=5 and 6 with subtraction and for K=11, 12 or even more without subtraction. In general, CVs with subtractions are slightly better and less CPU consuming, but CVs without subtraction may provide complementary information.
Affiliation(s)
- Guanghong Zuo
- T-Life Research Center, Fudan University, Shanghai 200433, China
- Qiang Li
- CAS-MPG Partner Institute for Computational Biology, Shanghai 200032, China
- Bailin Hao
- T-Life Research Center, Fudan University, Shanghai 200433, China.
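The abstract above contrasts raw composition vectors with vectors from which a statistical background has been subtracted, but it does not spell out either construction. The following is a minimal Python sketch under stated assumptions: the background for a K-string is predicted from its (K-1)- and (K-2)-string frequencies (the Markov-type estimate usually described for the CV method), and two vectors are compared by a cosine-based dissimilarity. The function names, the toy protein sequences, and the choice K=3 are hypothetical and serve only to illustrate the idea; the abstract itself reports K=5 and 6 as the best choices with subtraction.

```python
from collections import Counter
from math import sqrt


def kmer_freqs(seqs, k):
    """Relative frequencies of all overlapping k-strings in a list of sequences."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in counts.items()}


def composition_vector(seqs, k, subtract=True):
    """Composition vector as a dict {k-string: component}.

    subtract=False returns raw k-string frequencies.
    subtract=True returns (f - f0) / f0, where f0 is the frequency predicted
    from shorter strings (an assumed Markov-type background).
    """
    if k < 3:
        raise ValueError("background subtraction in this sketch needs k >= 3")
    f_k = kmer_freqs(seqs, k)
    if not subtract:
        return f_k
    f_k1 = kmer_freqs(seqs, k - 1)
    f_k2 = kmer_freqs(seqs, k - 2)
    alphabet = sorted({ch for s in seqs for ch in s})
    cv = {}
    for prefix, f_prefix in f_k1.items():
        for ch in alphabet:
            word = prefix + ch
            suffix = word[1:]
            if suffix not in f_k1:
                continue  # predicted frequency would be zero; skip the component
            # The middle (k-2)-string is a substring of the prefix, so it is always present.
            expected = f_prefix * f_k1[suffix] / f_k2[word[1:-1]]
            cv[word] = (f_k.get(word, 0.0) - expected) / expected
    return cv


def cv_distance(a, b):
    """Dissimilarity (1 - cosine correlation) / 2, which lies between 0 and 1."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    corr = dot / norm if norm else 0.0
    return (1.0 - corr) / 2.0


# Hypothetical toy "proteomes"; real input would be all proteins of a genome.
genome_a = ["MKVLAAGIVGLLLAQ", "MSTKKLLVAGAVGLA"]
genome_b = ["MKVLAAGIVGALLAQ", "MTTRKLIVAGSVGLA"]

cv_a = composition_vector(genome_a, 3)
cv_b = composition_vector(genome_b, 3)
d_ab = cv_distance(cv_a, cv_b)
```

A full pipeline would compute this dissimilarity for every genome pair and pass the resulting matrix to a distance-based tree builder; that step is omitted here.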
7
Patil KR, McHardy AC. Alignment-free genome tree inference by learning group-specific distance metrics. Genome Biol Evol 2013; 5:1470-84. [PMID: 23843191 PMCID: PMC3762195 DOI: 10.1093/gbe/evt105] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 12/22/2022] Open
Abstract
Understanding the evolutionary relationships between organisms is vital for their in-depth study. Gene-based methods are often used to infer such relationships, which are not without drawbacks. One can now attempt to use genome-scale information, because of the ever increasing number of genomes available. This opportunity also presents a challenge in terms of computational efficiency. Two fundamentally different methods are often employed for sequence comparisons, namely alignment-based and alignment-free methods. Alignment-free methods rely on the genome signature concept and provide a computationally efficient way that is also applicable to nonhomologous sequences. The genome signature contains evolutionary signal as it is more similar for closely related organisms than for distantly related ones. We used genome-scale sequence information to infer taxonomic distances between organisms without additional information such as gene annotations. We propose a method to improve genome tree inference by learning specific distance metrics over the genome signature for groups of organisms with similar phylogenetic, genomic, or ecological properties. Specifically, our method learns a Mahalanobis metric for a set of genomes and a reference taxonomy to guide the learning process. By applying this method to more than a thousand prokaryotic genomes, we showed that, indeed, better distance metrics could be learned for most of the 18 groups of organisms tested here. Once a group-specific metric is available, it can be used to estimate the taxonomic distances for other sequenced organisms from the group. This study also presents a large scale comparison between 10 methods--9 alignment-free and 1 alignment-based.
Affiliation(s)
- Kaustubh R Patil
- Max-Planck Research Group for Computational Genomics and Epidemiology, Max-Planck Institute for Informatics, Saarbrücken, Germany.
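For readers unfamiliar with the Mahalanobis metric mentioned in the abstract above, the short Python sketch below shows how such a metric scores the difference between two genome-signature vectors once a matrix M is available. It does not reproduce the learning step, which the abstract says is guided by a reference taxonomy; the matrices and vectors here are hypothetical toy values, and M is simply assumed to be symmetric positive semidefinite.

```python
from math import sqrt


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for vectors x, y and a square matrix M.

    M is assumed to be symmetric positive semidefinite; with the identity
    matrix the result reduces to the ordinary Euclidean distance.
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sqrt(sum(d[i] * Md[i] for i in range(len(d))))


# Hypothetical two-dimensional signatures and metrics, for illustration only.
identity = [[1.0, 0.0], [0.0, 1.0]]
weighted = [[4.0, 0.0], [0.0, 1.0]]  # stretches the first signature component

d_euclid = mahalanobis([0.0, 0.0], [3.0, 4.0], identity)
d_weighted = mahalanobis([0.0, 0.0], [3.0, 4.0], weighted)
```

A learned M would up-weight the signature components that best separate the taxa of a given group, which is the sense in which the metric is group-specific.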
8
Prokaryotic phylogenies inferred from whole-genome sequence and annotation data. BIOMED RESEARCH INTERNATIONAL 2013; 2013:409062. [PMID: 24073404 PMCID: PMC3773407 DOI: 10.1155/2013/409062] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/15/2013] [Revised: 06/26/2013] [Accepted: 07/22/2013] [Indexed: 11/25/2022]
Abstract
Phylogenetic trees are used to represent the evolutionary relationship among various groups of species. In this paper, a novel method for inferring prokaryotic phylogenies using multiple genomic information is proposed. The method is called CGCPhy and based on the distance matrix of orthologous gene clusters between whole-genome pairs. CGCPhy comprises four main steps. First, orthologous genes are determined by sequence similarity, genomic function, and genomic structure information. Second, genes involving potential HGT events are eliminated, since such genes are considered to be the highly conserved genes across different species and the genes located on fragments with abnormal genome barcode. Third, we calculate the distance of the orthologous gene clusters between each genome pair in terms of the number of orthologous genes in conserved clusters. Finally, the neighbor-joining method is employed to construct phylogenetic trees across different species. CGCPhy has been examined on different datasets from 617 complete single-chromosome prokaryotic genomes and achieved applicative accuracies on different species sets in agreement with Bergey's taxonomy in quartet topologies. Simulation results show that CGCPhy achieves high average accuracy and has a low standard deviation on different datasets, so it has an applicative potential for phylogenetic analysis.
9
Colosimo ME, Peterson MW, Mardis S, Hirschman L. Nephele: genotyping via complete composition vectors and MapReduce. SOURCE CODE FOR BIOLOGY AND MEDICINE 2011; 6:13. [PMID: 21851626 PMCID: PMC3182884 DOI: 10.1186/1751-0473-6-13] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 04/05/2011] [Accepted: 08/18/2011] [Indexed: 02/02/2023]
Abstract
BACKGROUND Current sequencing technology makes it practical to sequence many samples of a given organism, raising new challenges for the processing and interpretation of large genomics data sets with associated metadata. Traditional computational phylogenetic methods are ideal for studying the evolution of gene/protein families and using those to infer the evolution of an organism, but are less than ideal for the study of the whole organism mainly due to the presence of insertions/deletions/rearrangements. These methods provide the researcher with the ability to group a set of samples into distinct genotypic groups based on sequence similarity, which can then be associated with metadata, such as host information, pathogenicity, and time or location of occurrence. Genotyping is critical to understanding, at a genomic level, the origin and spread of infectious diseases. Increasingly, genotyping is coming into use for disease surveillance activities, as well as for microbial forensics. The classic genotyping approach has been based on phylogenetic analysis, starting with a multiple sequence alignment. Genotypes are then established by expert examination of phylogenetic trees. However, these traditional single-processor methods are suboptimal for rapidly growing sequence datasets being generated by next-generation DNA sequencing machines, because they increase in computational complexity quickly with the number of sequences.
RESULTS Nephele is a suite of tools that uses the complete composition vector algorithm to represent each sequence in the dataset as a vector derived from its constituent k-mers, bypassing the need for multiple sequence alignment, and affinity propagation clustering to group the sequences into genotypes based on a distance measure over the vectors. Our methods produce results that correlate well with expert-defined clades or genotypes, at a fraction of the computational cost of traditional phylogenetic methods run on traditional hardware. Nephele can use the open-source Hadoop implementation of MapReduce to parallelize execution using multiple compute nodes. We were able to generate a neighbour-joined tree of over 10,000 16S samples in less than 2 hours.
CONCLUSIONS We conclude that using Nephele can substantially decrease the processing time required for generating genotype trees of tens to hundreds of organisms at genome scale sequence coverage.
Affiliation(s)
- Marc E Colosimo
- The MITRE Corporation, 202 Burlington Rd, Bedford MA 01730, USA.
10
Wang Z, Zhang XC, Le MH, Xu D, Stacey G, Cheng J. A protein domain co-occurrence network approach for predicting protein function and inferring species phylogeny. PLoS One 2011; 6:e17906. [PMID: 21455299 PMCID: PMC3063783 DOI: 10.1371/journal.pone.0017906] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 10/23/2010] [Accepted: 02/16/2011] [Indexed: 11/18/2022] Open
Abstract
Protein Domain Co-occurrence Network (DCN) is a biological network that has not been fully-studied. We analyzed the properties of the DCNs of H. sapiens, S. cerevisiae, C. elegans, D. melanogaster, and 15 plant genomes. These DCNs have the hallmark features of scale-free networks. We investigated the possibility of using DCNs to predict protein and domain functions. Based on our experiment conducted on 66 randomly selected proteins, the best of top 3 predictions made by our DCN-based aggregated neighbor-counting method achieved a semantic similarity score of 0.81 to the actual Gene Ontology terms of the proteins. Moreover, the top 3 predictions using neighbor-counting, χ², and a SVM-based method achieved an accuracy of 66%, 59%, and 61%, respectively, when used to predict specific Gene Ontology terms of human target domains. These predictions on average had a semantic similarity score of 0.82, 0.80, and 0.79 to the actual Gene Ontology terms, respectively. We also used DCNs to predict whether a domain is an enzyme domain, and our SVM-based and neighbor-inference method correctly classified 79% and 77% of the target domains, respectively. When using DCNs to classify a target domain into one of the six enzyme classes, we found that, as long as there is one EC number available in the neighboring domains, our SVM-based and neighboring-counting method correctly classified 92.4% and 91.9% of the target domains, respectively. Furthermore, we benchmarked the performance of using DCNs to infer species phylogenies on six different combinations of 398 single-chromosome prokaryotic genomes. The phylogenetic tree of 54 prokaryotic taxa generated by our DCNs-alignment-based method achieved a 93.45% similarity score compared to the Bergey's taxonomy. In summary, our studies show that genome-wide DCNs contain rich information that can be effectively used to decipher protein function and reveal the evolutionary relationship among species.
Affiliation(s)
- Zheng Wang
- Department of Computer Science, University of Missouri, Columbia, Missouri, United States of America
- Xue-Cheng Zhang
- Christopher S. Bond Life Science Center, University of Missouri, Columbia, Missouri, United States of America
- Division of Plant Science, University of Missouri, Columbia, Missouri, United States of America
- Mi Ha Le
- Division of Plant Science, University of Missouri, Columbia, Missouri, United States of America
- Dong Xu
- Department of Computer Science, University of Missouri, Columbia, Missouri, United States of America
- Christopher S. Bond Life Science Center, University of Missouri, Columbia, Missouri, United States of America
- Informatics Institute, University of Missouri, Columbia, Missouri, United States of America
- Gary Stacey
- Christopher S. Bond Life Science Center, University of Missouri, Columbia, Missouri, United States of America
- Division of Plant Science, University of Missouri, Columbia, Missouri, United States of America
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, Missouri, United States of America
- Christopher S. Bond Life Science Center, University of Missouri, Columbia, Missouri, United States of America
- Informatics Institute, University of Missouri, Columbia, Missouri, United States of America
11
Zuo G, Xu Z, Yu H, Hao B. Jackknife and bootstrap tests of the composition vector trees. GENOMICS, PROTEOMICS & BIOINFORMATICS 2010; 8:262-7. [PMID: 21382595 PMCID: PMC5054193 DOI: 10.1016/s1672-0229(10)60028-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/21/2022]
Abstract
Composition vector trees (CVTrees) are inferred from whole-genome data by an alignment-free and parameter-free method. The agreement of these trees with the corresponding taxonomy provides an objective justification of the inferred phylogeny. In this work, we show the stability and self-consistency of CVTrees by performing bootstrap and jackknife re-sampling tests adapted to this alignment-free approach. Our ultimate goal is to advocate the viewpoint that time-consuming statistical re-sampling tests can be avoided altogether when using this alignment-free approach. Agreement with taxonomy should be taken as a major criterion to estimate prokaryotic phylogenetic trees.
Affiliation(s)
- Guanghong Zuo
- T-Life Research Center & Department of Physics, Fudan University, Shanghai 200433, China
- Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China
- Zhao Xu
- T-Life Research Center & Department of Physics, Fudan University, Shanghai 200433, China
- Applied Biosystems, Inc., Beijing 100027, China
- Hongjie Yu
- T-Life Research Center & Department of Physics, Fudan University, Shanghai 200433, China
- Fudan-VARI Center for Genetic Epidemiology, Fudan University, Shanghai 200433, China
- Bailin Hao
- T-Life Research Center & Department of Physics, Fudan University, Shanghai 200433, China
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
- Santa Fe Institute, Santa Fe, NM 87505, USA
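The abstract above refers to bootstrap and jackknife re-sampling adapted to an alignment-free setting without stating what is re-sampled. The Python sketch below assumes, purely for illustration, that the re-sampling unit is a protein in a genome's protein collection: a bootstrap replicate draws proteins with replacement, and a jackknife replicate keeps a random fraction without replacement. The function names, the 0.9 fraction, and the protein labels are hypothetical.

```python
import random


def bootstrap_sample(proteins, rng):
    """Draw len(proteins) proteins with replacement (assumed bootstrap unit: one protein)."""
    return [rng.choice(proteins) for _ in proteins]


def jackknife_sample(proteins, fraction, rng):
    """Keep a random subset of the proteins, without replacement."""
    n = max(1, round(len(proteins) * fraction))
    return rng.sample(proteins, n)


# Hypothetical protein identifiers standing in for one genome's proteome.
proteins = [f"prot{i}" for i in range(10)]
rng = random.Random(0)  # fixed seed so the replicate is reproducible

boot = bootstrap_sample(proteins, rng)
jack = jackknife_sample(proteins, 0.9, rng)
```

Each replicate would then be turned into a composition vector and a tree, and the stability of a branch estimated from how often it recurs across replicates.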
12
Sun J, Xu Z, Hao B. Whole-genome based Archaea phylogeny and taxonomy: A composition vector approach. CHINESE SCIENCE BULLETIN-CHINESE 2010; 55:2323-2328. [PMID: 32214732 PMCID: PMC7089326 DOI: 10.1007/s11434-010-3008-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/23/2009] [Accepted: 08/13/2009] [Indexed: 11/24/2022]
Abstract
The newly proposed alignment-free and parameter-free composition vector (CVTree) method has been successfully applied to infer phylogenetic relationships of viruses, chloroplasts, bacteria, and fungi from their whole-genome data. In this study we pay special attention to the phylogenetic positions of 56 Archaea genomes, among which 7 species have not been listed either in Bergey's Manual of Systematic Bacteriology or in Taxonomic Outline of Bacteria and Archaea (TOBA). By inspecting the stable monophyletic branchings in CVTrees reconstructed from a total of 861 genomes (56 Archaea plus 797 Bacteria, using 8 Eukarya as outgroups), definite taxonomic assignments were proposed for these not-fully-classified species. Further development of Archaea taxonomy may verify the predicted phylogenetic results of the CVTree approach.
Affiliation(s)
- JianDong Sun
- T-Life Research Center & Department of Physics, Fudan University, Shanghai, 200433 China
- Zhao Xu
- T-Life Research Center & Department of Physics, Fudan University, Shanghai, 200433 China
- BaiLin Hao
- T-Life Research Center & Department of Physics, Fudan University, Shanghai, 200433 China
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, 100190 China
- Santa Fe Institute, Santa Fe, New Mexico, 87501 USA
13
Wang H, Xu Z, Gao L, Hao B. A fungal phylogeny based on 82 complete genomes using the composition vector method. BMC Evol Biol 2009; 9:195. [PMID: 19664262 PMCID: PMC3087519 DOI: 10.1186/1471-2148-9-195] [Citation(s) in RCA: 159] [Impact Index Per Article: 10.6] [Received: 09/30/2008] [Accepted: 08/10/2009] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Molecular phylogenetics and phylogenomics have greatly revised and enriched the fungal systematics in the last two decades. Most of the analyses have been performed by comparing single or multiple orthologous gene regions. Sequence alignment has always been an essential element in tree construction. These alignment-based methods (to be called the standard methods hereafter) need independent verification in order to put the fungal Tree of Life (TOL) on a secure footing. The ever-increasing number of sequenced fungal genomes and the recent success of our newly proposed alignment-free composition vector tree (CVTree, see Methods) approach have made the verification feasible. RESULTS In all, 82 fungal genomes covering 5 phyla were obtained from the relevant genome sequencing centers. An unscaled phylogenetic tree with 3 outgroup species was constructed by using the CVTree method. Overall, the resultant phylogeny infers all major groups in accordance with standard methods. Furthermore, the CVTree provides information on the placement of several currently unsettled groups. Within the sub-phylum Pezizomycotina, our phylogeny places the Dothideomycetes and Eurotiomycetes as sister taxa. Within the Sordariomycetes, it infers that Magnaporthe grisea and the Plectosphaerellaceae are closely related to the Sordariales and Hypocreales, respectively. Within the Eurotiales, it supports that Aspergillus nidulans is the early-branching species among the 8 aspergilli. Within the Onygenales, it groups Histoplasma and Paracoccidioides together, supporting that the Ajellomycetaceae is a distinct clade from Onygenaceae. Within the sub-phylum Saccharomycotina, the CVTree clearly resolves two clades: (1) species that translate CTG as serine instead of leucine (the CTG clade) and (2) species that have undergone whole-genome duplication (the WGD clade). It places Candida glabrata at the base of the WGD clade. 
CONCLUSION Using different input data and methodology, the CVTree approach is a good complement to the standard methods. The remarkable consistency between them has brought about more confidence to the current understanding of the fungal branch of TOL.
Affiliation(s)
- Hao Wang
- T-life Research Center, Department of Physics, Fudan University, Shanghai 200433, PR China.
14
Xu Z, Hao B. CVTree update: a newly designed phylogenetic study platform using composition vectors and whole genomes. Nucleic Acids Res 2009; 37:W174-8. [PMID: 19398429 PMCID: PMC2703908 DOI: 10.1093/nar/gkp278] [Citation(s) in RCA: 152] [Impact Index Per Article: 10.1] [Received: 01/30/2009] [Revised: 04/10/2009] [Accepted: 04/14/2009] [Indexed: 11/21/2022] Open
Abstract
The CVTree web server (http://tlife.fudan.edu.cn/cvtree) presented here is a new implementation of the whole genome-based, alignment-free composition vector (CV) method for phylogenetic analysis. It is more efficient and user-friendly than the previously published version in the 2004 web server issue of Nucleic Acids Research. The development of the whole genome-based alignment-free CV method has provided an independent verification of the traditional phylogenetic analysis based on a single gene or a few genes. This new implementation attempts to meet the challenge of the ever increasing amount of genome data and includes in its database more than 850 prokaryotic genomes, which will be updated monthly from NCBI, and more than 80 fungal genomes collected manually from several sequencing centers. This new CVTree web server provides a faster and more stable research platform. Users can upload their own sequences to find their phylogenetic position among genomes selected from the server's inbuilt database. All sequence data used in a session may be downloaded as a compressed file. In addition to standard phylogenetic trees, users can also choose to output trees whose monophyletic branches are collapsed to various taxonomic levels. This feature is particularly useful for comparing phylogeny with taxonomy when dealing with thousands of genomes.
Affiliation(s)
- Zhao Xu
- T-Life Research Center, Fudan University, 220 Handan Road, Shanghai 200433, China.
15
Lin GN, Cai Z, Lin G, Chakraborty S, Xu D. ComPhy: prokaryotic composite distance phylogenies inferred from whole-genome gene sets. BMC Bioinformatics 2009; 10 Suppl 1:S5. [PMID: 19208152 PMCID: PMC2648732 DOI: 10.1186/1471-2105-10-s1-s5] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
Abstract
Background With the increasing availability of whole genome sequences, it is becoming more and more important to use complete genome sequences for inferring species phylogenies. We developed a new tool ComPhy, 'Composite Distance Phylogeny', based on a composite distance matrix calculated from the comparison of complete gene sets between genome pairs to produce a prokaryotic phylogeny. Results The composite distance between two genomes is defined by three components: Gene Dispersion Distance (GDD), Genome Breakpoint Distance (GBD) and Gene Content Distance (GCD). GDD quantifies the dispersion of orthologous genes along the genomic coordinates from one genome to another; GBD measures the shared breakpoints between two genomes; GCD measures the level of shared orthologs between two genomes. The phylogenetic tree is constructed from the composite distance matrix using a neighbor joining method. We tested our method on 9 datasets from 398 completely sequenced prokaryotic genomes. We have achieved above 90% agreement in quartet topologies between the tree created by our method and the tree from the Bergey's taxonomy. In comparison to several other phylogenetic analysis methods, our method showed consistently better performance. Conclusion ComPhy is a fast and robust tool for genome-wide inference of evolutionary relationship among genomes. It can be downloaded from .
Affiliation(s)
- Guan Ning Lin
- Digital Biology Laboratory, Informatics Institute, Computer Science Department and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA.