1
Van Etten J, Stephens TG, Bhattacharya D. A k-mer-Based Approach for Phylogenetic Classification of Taxa in Environmental Genomic Data. Syst Biol 2023; 72:1101-1118. [PMID: 37314057 DOI: 10.1093/sysbio/syad037] [Received: 11/07/2022] [Revised: 03/20/2023] [Accepted: 06/12/2023]
Abstract
In the age of genome sequencing, whole-genome data is readily and frequently generated, leading to a wealth of new information that can be used to advance various fields of research. New approaches, such as alignment-free phylogenetic methods that utilize k-mer-based distance scoring, are becoming increasingly popular given their ability to rapidly generate phylogenetic information from whole-genome data. However, these methods have not yet been tested using environmental data, which often tends to be highly fragmented and incomplete. Here, we compare the results of one alignment-free approach (which utilizes the D2 statistic) to traditional multi-gene maximum likelihood trees in 3 algal groups that have high-quality genome data available. In addition, we simulate lower-quality, fragmented genome data using these algae to test method robustness to genome quality and completeness. Finally, we apply the alignment-free approach to environmental metagenome assembled genome data of unclassified Saccharibacteria and Trebouxiophyte algae, and single-cell amplified data from uncultured marine stramenopiles to demonstrate its utility with real datasets. We find that in all instances, the alignment-free method produces phylogenies that are comparable, and often more informative, than those created using the traditional multi-gene approach. The k-mer-based method performs well even when there are significant missing data that include marker genes traditionally used for tree reconstruction. Our results demonstrate the value of alignment-free approaches for classifying novel, often cryptic or rare, species, that may not be culturable or are difficult to access using single-cell methods, but fill important gaps in the tree of life.
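As a rough illustration of the k-mer-based distance scoring mentioned in the abstract, the following minimal sketch computes the plain D2 statistic, i.e. the sum over all k-mers of the product of their counts in two sequences. The function names (`kmer_counts`, `d2`, `d2_similarity`) and the cosine-style normalization are assumptions made for this example; the paper may well use a different D2 variant and a different conversion to distances.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """Plain D2 statistic: sum over shared k-mers of count_a * count_b."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for k-mers absent from seq_b, so they add nothing.
    return sum(n * counts_b[w] for w, n in counts_a.items())


def d2_similarity(seq_a: str, seq_b: str, k: int) -> float:
    """Hypothetical normalization (not from the paper): D2 scaled by the
    self-scores, which equals the cosine similarity of the count vectors."""
    denom = sqrt(d2(seq_a, seq_a, k) * d2(seq_b, seq_b, k))
    return d2(seq_a, seq_b, k) / denom if denom else 0.0


if __name__ == "__main__":
    print(d2("ACGT", "ACGT", 2))             # 3 shared 2-mers, each seen once
    print(d2_similarity("ACGT", "TTTT", 2))  # no shared 2-mers
```

Because the score depends only on k-mer counts, it can be computed for fragmented assemblies by pooling the counts of all contigs, which is one reason such statistics are attractive for incomplete genomes.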
Affiliation(s)
- Julia Van Etten
- Graduate Program in Ecology and Evolution, Rutgers, The State University of New Jersey, 14 College Farm Road, New Brunswick, NJ 08901, USA
- Timothy G Stephens
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, 59 Dudley Road, New Brunswick, NJ 08901, USA
- Debashish Bhattacharya
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, 59 Dudley Road, New Brunswick, NJ 08901, USA
2
Vasundhara D, Raju VN, Hemalatha R, Nagpal R, Kumar M. Vaginal & gut microbiota diversity in pregnant women with bacterial vaginosis & effect of oral probiotics: An exploratory study. Indian J Med Res 2021; 153:492-502. [PMID: 34380796 PMCID: PMC8354056 DOI: 10.4103/ijmr.ijmr_350_19]
Abstract
Background & objectives: The vaginal microbiota undergoes subtle changes during pregnancy and may affect several aspects of pregnancy outcomes. There has been no comprehensive study characterizing the gestational vaginal and gut microbiota and the dynamics of the microbiota with oral probiotics among Indian women. Hence, this study aimed to explore the microbiota of pregnant women with normal microbiota and bacterial vaginosis (BV) environments and the effect of oral probiotics on the microbiota and the BV status in these women. Methods: Using a high-throughput Illumina-MiSeq sequencing approach, the 16S rRNA gene amplicons were analyzed and the vaginal and gut microbiota of pregnant women with and without BV, pre- and post-probiotic (Lactobacillus rhamnosus GR-1 and Lactobacillus reuteri RC-14) intervention for a month, was characterized. Results: The study revealed a compositional difference in the vaginal and gut microbiota between BV and healthy pregnant women. The vaginal microbiota of healthy women was characteristically predominated by Lactobacillus helveticus, followed by L. iners and L. gasseri; in contrast, women positive for BV harboured higher α-diversity and had lower abundance of L. helveticus. Similarly, Prevotella copri, a gut microbe associated with the normal environment, was detected in the vaginal samples of all pregnant women without BV but remained undetected in women with the infection, while all women with BV had Gardnerella vaginalis, which decreased significantly with probiotic treatment. Gut microbiota also revealed dominant abundance of P. copri in healthy women, whereas it was significantly lower in women with BV. The abundance of P. copri increased from 9.17 to 16.49 per cent in the probiotic group and reduced from 7.75 to 4.84 per cent in the placebo group. Interpretation & conclusions: This study showed gestational vaginal and gut microbiota differences in normal and BV environments. With probiotic treatment, the dynamics of L. helveticus and P. copri hint towards a possible role of probiotics in modulating the vaginal microbiota.
Affiliation(s)
- Donugama Vasundhara
- Department of Clinical Epidemiology, ICMR-National Institute of Nutrition, Hyderabad, Telangana, India
- Vankudavath Naik Raju
- Nutrition Information, Communication & Health Education (NICHE), ICMR-National Institute of Nutrition, Hyderabad, Telangana, India
- Ravinder Nagpal
- Department of Internal Medicine-Molecular Medicine; Department of Microbiology & Immunology, Wake Forest School of Medicine, Winston-Salem, NC, United States
- Manoj Kumar
- Department of Microbiology, ICMR-National Institute for Research in Environmental Health, Bhopal, Madhya Pradesh, India
3
Abstract
Inferring phylogenetic relationships among hundreds or thousands of microbial genomes is an increasingly common task. The conventional phylogenetic approach adopts multiple sequence alignment to compare gene-by-gene, concatenated multigene or whole-genome sequences, from which a phylogenetic tree would be inferred. These alignments follow the implicit assumption of full-length contiguity among homologous sequences. However, common events in microbial genome evolution (e.g., structural rearrangements and genetic recombination) violate this assumption. Moreover, aligning hundreds or thousands of sequences is computationally intensive and not scalable to the rate at which genome data are generated. Therefore, alignment-free methods present an attractive alternative strategy. Here we describe a scalable alignment-free strategy to infer phylogenetic relationships using complete genome sequences of bacteria and archaea, based on short, subsequences of length k (k-mers). We describe how this strategy can be extended to infer evolutionary relationships beyond a tree-like structure, to better capture both vertical and lateral signals of microbial evolution.
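To make the idea of comparing genomes through subsequences of length k concrete, here is a minimal sketch that builds a pairwise distance matrix from k-mer presence/absence. The Jaccard distance is swapped in purely for illustration and is not claimed to be the measure used in the work described above; the names `kmer_set` and `jaccard_distance_matrix` are likewise hypothetical.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set:
    """Set of distinct overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance_matrix(genomes: dict, k: int) -> dict:
    """Pairwise Jaccard distances (1 - shared/total k-mers) between genomes.

    `genomes` maps a name to a sequence string; the result is a nested dict
    so that dist[a][b] == dist[b][a] and dist[a][a] == 0.0.
    """
    names = sorted(genomes)
    sets = {name: kmer_set(genomes[name], k) for name in names}
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        union = sets[a] | sets[b]
        shared = sets[a] & sets[b]
        d = 1.0 - len(shared) / len(union) if union else 0.0
        dist[a][b] = dist[b][a] = d
    return dist


if __name__ == "__main__":
    toy = {"x": "ACGTAC", "y": "ACGTAC", "z": "TTTTTT"}
    matrix = jaccard_distance_matrix(toy, 3)
    print(matrix["x"]["y"], matrix["x"]["z"])
```

A matrix of this kind is the usual input to a distance-based tree-building method such as neighbor-joining, or to a network method when a strictly tree-like structure is not assumed.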
4
Ruiz C, McCarley A, Espejo ML, Cooper KK, Harmon DE. Comparative Genomics Reveals a Well-Conserved Intrinsic Resistome in the Emerging Multidrug-Resistant Pathogen Cupriavidus gilardii. mSphere 2019; 4:e00631-19. [PMID: 31578249 PMCID: PMC6796972 DOI: 10.1128/msphere.00631-19] [Received: 08/28/2019] [Accepted: 09/16/2019]
Abstract
The Gram-negative bacterium Cupriavidus gilardii is an emerging multidrug-resistant pathogen found in many environments. However, little is known about this species or its antibiotic resistance mechanisms. We used biochemical tests, antibiotic susceptibility experiments, and whole-genome sequencing to characterize an environmental C. gilardii isolate. Like clinical isolates, this isolate was resistant to meropenem, gentamicin, and other antibiotics. Resistance to these antibiotics appeared to be related to the large number of intrinsic antibiotic resistance genes found in this isolate. As determined by comparative genomics, this resistome was also well conserved in the only two other C. gilardii strains sequenced to date. The intrinsic resistome of C. gilardii did not include the colistin resistance gene mcr-5, which was in a transposon present only in one strain. The intrinsic resistome of C. gilardii was comprised of (i) many multidrug efflux pumps, such as a homolog of the Pseudomonas aeruginosa MexAB-OprM pump that may be involved in resistance to meropenem, other β-lactams, and aminoglycosides; (ii) a novel β-lactamase (OXA-837) that decreases susceptibility to ampicillin but not to other β-lactams tested; (iii) a new aminoglycoside 3-N-acetyltransferase [AAC(3)-IVb, AacC10] that decreases susceptibility to gentamicin and tobramycin; and (iv) a novel partially conserved aminoglycoside 3"-adenylyltransferase [ANT(3")-Ib, AadA32] that decreases susceptibility to spectinomycin and streptomycin. These findings provide the first mechanistic insight into the intrinsic resistance of C. gilardii to multiple antibiotics and its ability to become resistant to an increasing number of drugs during therapy. IMPORTANCE Cupriavidus gilardii is a bacterium that is gaining increasing attention both as an infectious agent and because of its potential use in the detoxification of toxic compounds and other biotechnological applications. In recent years, however, there has been an increasing number of reported infections, some of them fatal, caused by C. gilardii. These infections are hard to treat because this bacterium is naturally resistant to many antibiotics, including last-resort antibiotics, such as carbapenems. Moreover, this bacterium often becomes resistant to additional antibiotics during therapy. However, little is known about C. gilardii and its antibiotic resistance mechanisms. The significance of our research is in providing, for the first time, whole-genome information about the natural antibiotic resistance genes found in this bacterium and their conservation among different C. gilardii strains. This information may provide new insights into the appropriate use of antibiotics in combating infections caused by this emerging pathogen.
Affiliation(s)
- Cristian Ruiz
- Department of Biology, California State University, Northridge, Northridge, California, USA
- Ashley McCarley
- Department of Biology, California State University, Northridge, Northridge, California, USA
- Manuel Luis Espejo
- Department of Biology, California State University, Northridge, Northridge, California, USA
- Kerry K Cooper
- Department of Biology, California State University, Northridge, Northridge, California, USA
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, Arizona, USA
- Dana E Harmon
- Department of Biology, California State University, Northridge, Northridge, California, USA
5
Bernard G, Chan CX, Ragan MA. Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer. Sci Rep 2016; 6:28970. [PMID: 27363362 PMCID: PMC4929450 DOI: 10.1038/srep28970] [Received: 03/16/2016] [Accepted: 06/13/2016]
Abstract
Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.
Affiliation(s)
- Guillaume Bernard
- Institute for Molecular Bioscience, and ARC Centre of Excellence in Bioinformatics, The University of Queensland, Brisbane, QLD 4072, Australia
- Cheong Xin Chan
- Institute for Molecular Bioscience, and ARC Centre of Excellence in Bioinformatics, The University of Queensland, Brisbane, QLD 4072, Australia
- Mark A. Ragan
- Institute for Molecular Bioscience, and ARC Centre of Excellence in Bioinformatics, The University of Queensland, Brisbane, QLD 4072, Australia
6
Lin MF, Kitahara MV, Luo H, Tracey D, Geller J, Fukami H, Miller DJ, Chen CA. Mitochondrial genome rearrangements in the scleractinia/corallimorpharia complex: implications for coral phylogeny. Genome Biol Evol 2014; 6:1086-95. [PMID: 24769753 PMCID: PMC4040992 DOI: 10.1093/gbe/evu084]
Abstract
Corallimorpharia is a small Order of skeleton-less animals that is closely related to the reef-building corals (Scleractinia) and of fundamental interest in the context of understanding the potential impacts of climate change in the future on coral reefs. The relationship between the nominal Orders Corallimorpharia and Scleractinia is controversial—the former is either the closest outgroup to the Scleractinia or alternatively is derived from corals via skeleton loss. This latter scenario, the “naked coral” hypothesis, is strongly supported by analyses based on mitochondrial (mt) protein sequences, whereas the former is equally strongly supported by analyses of mt nucleotide sequences. The “naked coral” hypothesis seeks to link skeleton loss in the putative ancestor of corallimorpharians with a period of elevated oceanic CO2 during the Cretaceous, leading to the idea that these skeleton-less animals may be harbingers for the fate of coral reefs under global climate change. In an attempt to better understand their evolutionary relationships, we examined mt genome organization in a representative range (12 species, representing 3 of the 4 extant families) of corallimorpharians and compared these patterns with other Hexacorallia. The most surprising finding was that mt genome organization in Corallimorphus profundus, a deep-water species that is the most scleractinian-like of all corallimorpharians on the basis of morphology, was much more similar to the common scleractinian pattern than to those of other corallimorpharians. This finding is consistent with the idea that C. profundus represents a key position in the coral <-> corallimorpharian transition.
Affiliation(s)
- Mei-Fang Lin
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
7
Shida F, Mizuta S. Measurement of word frequencies in genomic DNA sequences based on partial alignment and fuzzy set. J Bioinform Comput Biol 2014; 12:1450019. [PMID: 25152044 DOI: 10.1142/s021972001450019x]
Abstract
With the rapid increase in the amount of data registered in biological sequence databases, the need for a fast method of sequence comparison applicable to large sequences is also increasing. In general, alignment is used for sequence comparison. However, alignment may not be appropriate for comparing large sequences such as whole genome sequences because of its high time complexity. In this article, we propose a semi-alignment-free method of sequence comparison based on word frequency distributions, in which we partially use alignment to measure word frequencies along with the idea of fuzzy set theory. Experiments with ten bacterial genome sequences demonstrated that the fuzzy measurement facilitates discrimination between close relatives and distant relatives.
Affiliation(s)
- Fumiya Shida
- Graduate School of Science and Technology, Hirosaki University, 3 Bunkyo-cho, Hirosaki, Aomori 036-8561, Japan
8
Prasanna AN, Mehra S. Comparative phylogenomics of pathogenic and non-pathogenic mycobacterium. PLoS One 2013; 8:e71248. [PMID: 24015186 PMCID: PMC3756022 DOI: 10.1371/journal.pone.0071248] [Received: 03/01/2013] [Accepted: 06/26/2013]
Abstract
Mycobacterium species are the source of a variety of infectious diseases in a range of hosts. Genome based methods are used to understand the adaptation of each pathogenic species to its unique niche. In this work, we report the comparison of pathogenic and non-pathogenic Mycobacterium genomes. Phylogenetic trees were constructed using sequence of core orthologs, gene content and gene order. It is found that the genome based methods can better resolve the inter-species evolutionary distances compared to the conventional 16S based tree. Phylogeny based on gene order highlights distinct evolutionary characteristics as compared to the methods based on sequence, as illustrated by the shift in the relative position of M. abscessus. This difference in gene order among the Mycobacterium species is further investigated using a detailed synteny analysis. It is found that while rearrangements between some Mycobacterium genomes are local within synteny blocks, few possess global rearrangements across the genomes. The study illustrates how a combination of different genome based methods is essential to build a robust phylogenetic relationship between closely related organisms.
Affiliation(s)
- Arun N. Prasanna
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
- Sarika Mehra
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
9
Luo H, Arndt W, Zhang Y, Shi G, Alekseyev M, Tang J, Hughes AL, Friedman R. Phylogenetic analysis of genome rearrangements among five mammalian orders. Mol Phylogenet Evol 2012; 65:871-82. [PMID: 22929217 PMCID: PMC4425404 DOI: 10.1016/j.ympev.2012.08.008] [Received: 03/30/2012] [Revised: 08/11/2012] [Accepted: 08/13/2012]
Abstract
Evolutionary relationships among placental mammalian orders have been controversial. Whole genome sequencing and new computational methods offer opportunities to resolve the relationships among 10 genomes belonging to the mammalian orders Primates, Rodentia, Carnivora, Perissodactyla and Artiodactyla. By application of the double cut and join distance metric, where gene order is the phylogenetic character, we computed genomic distances among the sampled mammalian genomes. With a marsupial outgroup, the gene order tree supported a topology in which Rodentia fell outside the cluster of Primates, Carnivora, Perissodactyla, and Artiodactyla. Results of breakpoint reuse rate and synteny block length analyses were consistent with the prediction of random breakage model, which provided a diagnostic test to support use of gene order as an appropriate phylogenetic character in this study. We discussed the influence of rate differences among lineages and other factors that may contribute to different resolutions of mammalian ordinal relationships by different methods of phylogenetic reconstruction.
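The double cut and join (DCJ) distance named in the abstract can be sketched for a simplified case. The code below assumes two genomes with identical gene content, each made only of circular chromosomes written as lists of signed integers; under that assumption the distance is the number of genes minus the number of cycles in the adjacency graph. Mammalian genomes have linear chromosomes, which also require counting paths, so this is an illustration of the idea and not the procedure used in the study. The function names are hypothetical.

```python
def adjacencies(genome):
    """Map every gene extremity to its neighbour in a genome given as a
    list of circular chromosomes, each a list of signed gene integers."""
    partner = {}
    for chrom in genome:
        for i, a in enumerate(chrom):
            b = chrom[(i + 1) % len(chrom)]
            # Reading left to right, gene +g runs tail->head, -g head->tail.
            right_of_a = (abs(a), "h" if a > 0 else "t")
            left_of_b = (abs(b), "t" if b > 0 else "h")
            partner[right_of_a] = left_of_b
            partner[left_of_b] = right_of_a
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance for circular genomes with equal gene content:
    number of genes minus number of cycles in the adjacency graph."""
    pa, pb = adjacencies(genome_a), adjacencies(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must contain the same genes")
    seen, cycles = set(), 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between an adjacency of A and an adjacency of B.
        while x not in seen:
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return len(pa) // 2 - cycles


if __name__ == "__main__":
    print(dcj_distance([[1, 2, 3]], [[1, -2, 3]]))  # one inversion
```

In this toy case a single inversion and a single chromosome fission each cost one operation, while identical genomes are at distance zero.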
Affiliation(s)
- Haiwei Luo
- Department of Biological Sciences, University of South Carolina, Columbia 29208, USA
- William Arndt
- Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
- Yiwei Zhang
- Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
- Guanqun Shi
- Department of Computer Science, University of California, Riverside, 92521, USA
- Max Alekseyev
- Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
- Jijun Tang
- Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
- Austin L. Hughes
- Department of Biological Sciences, University of South Carolina, Columbia 29208, USA
- Robert Friedman
- Department of Biological Sciences, University of South Carolina, Columbia 29208, USA
10
Lin Y, Rajan V, Moret BME. TIBA: a tool for phylogeny inference from rearrangement data with bootstrap analysis. Bioinformatics 2012; 28:3324-5. [DOI: 10.1093/bioinformatics/bts603]
11
Lin Y, Rajan V, Moret BME. Bootstrapping phylogenies inferred from rearrangement data. Algorithms Mol Biol 2012; 7:21. [PMID: 22931958 PMCID: PMC3487984 DOI: 10.1186/1748-7188-7-21] [Received: 12/22/2011] [Accepted: 07/26/2012]
Abstract
BACKGROUND Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well established probabilistic models. RESULTS We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches. CONCLUSIONS Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. 
Its support values follow a similar scale and its receiver-operating characteristics are nearly identical, indicating that it provides similar levels of sensitivity and specificity. Thus our assessment method makes it possible to conduct phylogenetic analyses on whole genomes with the same degree of confidence as for analyses on aligned sequences. Extensions to search-based inference methods such as maximum parsimony and maximum likelihood are possible, but remain to be thoroughly tested.
Affiliation(s)
- Yu Lin
- Laboratory for Computational Biology and Bioinformatics, EPFL, EPFL-IC-LCBB INJ230, Station 14, CH-1015 Lausanne, Switzerland
- Vaibhav Rajan
- Laboratory for Computational Biology and Bioinformatics, EPFL, EPFL-IC-LCBB INJ230, Station 14, CH-1015 Lausanne, Switzerland
- Bernard ME Moret
- Laboratory for Computational Biology and Bioinformatics, EPFL, EPFL-IC-LCBB INJ230, Station 14, CH-1015 Lausanne, Switzerland
12
Korenblat K, Volkovich Z, Bolshoy A. Robust classifying of prokaryotic genomes. Comput Biol Chem 2012; 40:20-9. [PMID: 22940609 DOI: 10.1016/j.compbiolchem.2012.07.001] [Received: 02/20/2012] [Revised: 07/03/2012] [Accepted: 07/03/2012]
Abstract
In this paper, we propose a method to classify prokaryotic genomes using the agglomerative information bottleneck method for unsupervised clustering. Although the method we present here is closely related to a group of methods based on detecting the presence or absence of genes, our method is different because it uses gene lengths as well. We show that this amended method is reliable. For robustness evaluation, we apply bootstrap and jackknife techniques to input data. As a result, we are able to propose an approach to determine the stability level of a cladogram. We demonstrate that the genome tree produced for a selected small group of genomes looks a lot like a phylogenetic tree of this group.
Affiliation(s)
- Katerina Korenblat
- Software Engineering Department, ORT Braude Academic College, Karmiel, Israel
13
Lin Y, Rajan V, Moret BME. Fast and accurate phylogenetic reconstruction from high-resolution whole-genome data and a novel robustness estimator. J Comput Biol 2011; 18:1131-9. [PMID: 21899420 DOI: 10.1089/cmb.2011.0114]
Abstract
The rapid accumulation of whole-genome data has renewed interest in the study of genomic rearrangements. Comparative genomics, evolutionary biology, and cancer research all require models and algorithms to elucidate the mechanisms, history, and consequences of these rearrangements. However, even simple models lead to NP-hard problems, particularly in the area of phylogenetic analysis. Current approaches are limited to small collections of genomes and low-resolution data (typically a few hundred syntenic blocks). Moreover, whereas phylogenetic analyses from sequence data are deemed incomplete unless bootstrapping scores (a measure of confidence) are given for each tree edge, no equivalent to bootstrapping exists for rearrangement-based phylogenetic analysis. We describe a fast and accurate algorithm for rearrangement analysis that scales up, in both time and accuracy, to modern high-resolution genomic data. We also describe a novel approach to estimate the robustness of results, an equivalent to the bootstrapping analysis used in sequence-based phylogenetic reconstruction. We present the results of extensive testing on both simulated and real data showing that our algorithm returns very accurate results, while scaling linearly with the size of the genomes and cubically with their number. We also present extensive experimental results showing that our approach to robustness testing provides excellent estimates of confidence, which, moreover, can be tuned to trade off thresholds between false positives and false negatives. Together, these two novel approaches enable us to attack heretofore intractable problems, such as phylogenetic inference for high-resolution vertebrate genomes, as we demonstrate on a set of six vertebrate genomes with 8,380 syntenic blocks. A copy of the software is available on demand.
Affiliation(s)
- Y Lin
- Laboratory for Computational Biology and Bioinformatics, EPFL, Lausanne, Switzerland
14
Fast and Accurate Phylogenetic Reconstruction from High-Resolution Whole-Genome Data and a Novel Robustness Estimator. 2010. [DOI: 10.1007/978-3-642-16181-0_12]