1
Jacobson D, Zheng Y, Plucinski MM, Qvarnstrom Y, Barratt JLN. Evaluation of various distance computation methods for construction of haplotype-based phylogenies from large MLST datasets. Mol Phylogenet Evol 2022; 177:107608. [PMID: 35963590 PMCID: PMC10127246 DOI: 10.1016/j.ympev.2022.107608] [Received: 04/18/2022] [Revised: 06/30/2022] [Accepted: 08/05/2022]
Abstract
Multi-locus sequence typing (MLST) is widely used to investigate genetic relationships among eukaryotic taxa, including parasitic pathogens. MLST analysis workflows typically involve construction of alignment-based phylogenetic trees, i.e., trees whose structure is computed from nucleotide differences observed in a multiple sequence alignment (MSA). Notably, alignment-based phylogenetic methods require that each isolate/taxon is represented by a single sequence. When multiple loci are sequenced, these sequences may be concatenated to produce one tree that includes information from all loci. Alignment-based phylogenetic techniques are robust and widely used, yet they have some shortcomings, including how heterozygous sites are handled, intolerance of missing data (i.e., partial genotypes), and differences in the way insertions-deletions (indels) are scored/treated during tree construction. In certain contexts, 'haplotype-based' methods may represent a viable alternative to alignment-based techniques, as they do not share these limitations. This is because haplotype-based methods assess genetic similarity based on the number of shared (i.e., intersecting) haplotypes rather than on similarities in nucleotide composition observed in an MSA. For haplotype-based comparisons, choosing an appropriate distance statistic is fundamental, and several statistics are available. However, a comprehensive assessment of the available statistics for their ability to produce a robust haplotype-based phylogenetic reconstruction has not yet been performed. We evaluated seven distance statistics by applying them to extant MLST datasets from the gastrointestinal parasite Cyclospora cayetanensis and two species of pathogenic nematode of the genus Strongyloides. We compared the genetic relationships identified using each statistic to epidemiologic, geographic, and host metadata.
We show that Barratt's heuristic definition of genetic distance was the most robust of the statistics evaluated. Consequently, we propose that Barratt's heuristic is a useful approach for challenging MLST datasets with features (e.g., high heterozygosity, partial genotypes, and indel- or repeat-based polymorphisms) that confound or preclude the use of alignment-based methods.
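The basic idea of a haplotype-based comparison, scoring similarity by the haplotypes two isolates share at each locus instead of by aligned nucleotide differences, can be sketched as follows. This is a minimal illustration assuming a simple per-locus Jaccard distance averaged over loci typed in both isolates. It is not Barratt's heuristic or any of the seven statistics evaluated in the paper, and the function name, isolates, and haplotype IDs are hypothetical.

```python
from typing import Dict, Set

# Hypothetical genotype representation: locus name -> set of haplotype IDs seen in
# an isolate. A heterozygous or mixed isolate carries several haplotypes at a locus;
# a locus that failed to sequence is simply absent (a partial genotype).
Genotype = Dict[str, Set[str]]


def haplotype_distance(a: Genotype, b: Genotype) -> float:
    """Mean per-locus Jaccard distance over loci typed in both isolates.

    Loci missing from either isolate are skipped rather than penalized; this is one
    simple (assumed) way to tolerate partial genotypes.
    """
    shared_loci = [locus for locus in a.keys() & b.keys() if a[locus] and b[locus]]
    if not shared_loci:
        raise ValueError("isolates share no typed loci")
    total = 0.0
    for locus in shared_loci:
        intersecting = a[locus] & b[locus]
        union = a[locus] | b[locus]
        total += 1.0 - len(intersecting) / len(union)
    return total / len(shared_loci)


if __name__ == "__main__":
    iso1 = {"locusA": {"A1", "A2"}, "locusB": {"B1"}, "locusC": {"C3"}}
    iso2 = {"locusA": {"A1"}, "locusB": {"B1"}}  # locusC not typed in iso2
    print(haplotype_distance(iso1, iso2))  # locusA: 0.5, locusB: 0.0 -> 0.25
```

Because comparison happens on haplotype sets, neither heterozygous sites nor indel scoring needs special handling, which mirrors the motivation given in the abstract.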
Affiliation(s)
- David Jacobson
- Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, GA, USA; Oak Ridge Associated Universities, Oak Ridge, TN, USA
- Yueli Zheng
- Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, GA, USA; Eagle Global Scientific, San Antonio, TX, USA
- Mateusz M Plucinski
- Malaria Branch, Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, GA, USA; U.S. President's Malaria Initiative, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Yvonne Qvarnstrom
- Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Joel L N Barratt
- Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, GA, USA.
2
Abstract
In 1981, the Journal of Molecular Evolution (JME) published an article entitled "Evolutionary trees from DNA sequences: A maximum likelihood approach" by Joseph (Joe) Felsenstein (J Mol Evol 17:368-376, 1981). This groundbreaking work laid the foundation for the emerging field of statistical phylogenetics, providing a tractable way of finding maximum likelihood (ML) estimates of evolutionary trees from DNA sequence data. With more than 9000 citations, it is the second most cited paper in JME, after Kimura's seminal paper on a model of nucleotide substitution (J Mol Evol 16:111-120, 1980), which has nearly 20,000 citations. On the occasion of the 50th anniversary of JME, we elaborate on the significance of Felsenstein's ML approach to estimating phylogenetic trees.
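The tractability of Felsenstein's 1981 approach comes from what is now called the pruning algorithm: a site's likelihood is computed by passing conditional likelihood vectors from the tips toward the root, so the sum over unobserved ancestral states never has to be enumerated explicitly. The sketch below is a minimal illustration assuming the Jukes-Cantor (JC69) substitution model, a fixed rooted tree with hypothetical branch lengths, and a single alignment column. A real ML analysis would also optimize branch lengths and search over topologies.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"


def jc69_prob(same: bool, t: float) -> float:
    """JC69 probability of ending in the same (or one specific other) base after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


class Node:
    """Tree node; tips carry a name, internal nodes carry (child, branch_length) pairs."""

    def __init__(self, name: Optional[str] = None,
                 children: Optional[List[Tuple["Node", float]]] = None) -> None:
        self.name = name
        self.children = children or []


def conditional_likelihoods(node: Node, site: Dict[str, str]) -> List[float]:
    """Post-order pass: entry x is P(observed tips below node | node in state x)."""
    if not node.children:
        # Tip: probability 1 for the observed base, 0 otherwise.
        return [1.0 if b == site[node.name] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node.children:
        child_l = conditional_likelihoods(child, site)
        for i in range(4):
            # Sum over the child's states j, weighted by P(i -> j along the branch).
            result[i] *= sum(jc69_prob(i == j, t) * child_l[j] for j in range(4))
    return result


def site_likelihood(root: Node, site: Dict[str, str]) -> float:
    """Weight the root vector by JC69's uniform stationary frequencies (1/4 each)."""
    return sum(0.25 * x for x in conditional_likelihoods(root, site))


if __name__ == "__main__":
    # Hypothetical tree ((A:0.1,B:0.2):0.05,C:0.3) and one alignment column.
    tree = Node(children=[
        (Node(children=[(Node("A"), 0.1), (Node("B"), 0.2)]), 0.05),
        (Node("C"), 0.3),
    ])
    column = {"A": "A", "B": "A", "C": "G"}
    print(math.log(site_likelihood(tree, column)))
```

Under the usual assumption that sites evolve independently, the log-likelihood of a whole alignment is the sum of per-site log-likelihoods. This is the quantity an ML tree search maximizes.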
Affiliation(s)
- David Posada
- CINBIO, Universidade de Vigo, 36310, Vigo, Spain.
- Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310, Vigo, Spain.
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain.
- Keith A Crandall
- Computational Biology Institute and Milken Institute School of Public Health, The George Washington University, Washington, DC, 20052, USA.
- Department of Biostatistics & Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC, 20052, USA.
3
Gouy M, Tannier E, Comte N, Parsons DP. Seaview Version 5: A Multiplatform Software for Multiple Sequence Alignment, Molecular Phylogenetic Analyses, and Tree Reconciliation. Methods Mol Biol 2021; 2231:241-260. [PMID: 33289897 DOI: 10.1007/978-1-0716-1036-7_15]
Abstract
We present Seaview version 5, a multiplatform program to perform multiple alignment and phylogenetic tree building from molecular sequence data. Seaview provides network access to sequence databases; alignment with an arbitrary algorithm; parsimony, distance, and maximum likelihood tree building with PhyML; and display, printing, and copying (to the clipboard or to SVG files) of rooted or unrooted, binary or multifurcating phylogenetic trees. While Seaview is primarily a program providing a graphical user interface that guides the user through the desired analyses, it also has a command-line mode suitable for user-provided scripts. Seaview version 5 introduces the ability to reconcile a gene tree with a reference species tree and to use this reconciliation to root and rearrange the gene tree. Seaview is freely available at http://doua.prabi.fr/software/seaview .
Affiliation(s)
- Manolo Gouy
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France.
- Eric Tannier
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France
- INRIA Grenoble-Rhône-Alpes, Montbonnot, France
4
Jermiin LS, Catullo RA, Holland BR. A new phylogenetic protocol: dealing with model misspecification and confirmation bias in molecular phylogenetics. NAR Genom Bioinform 2020; 2:lqaa041. [PMID: 33575594 PMCID: PMC7671319 DOI: 10.1093/nargab/lqaa041] [Received: 01/17/2020] [Revised: 05/18/2020] [Accepted: 06/04/2020]
Abstract
Molecular phylogenetics plays a key role in comparative genomics and has increasingly significant impacts on science, industry, government, public health and society. In this paper, we posit that the current phylogenetic protocol is missing two critical steps, and that their absence allows model misspecification and confirmation bias to unduly influence phylogenetic estimates. Based on the potential offered by well-established but under-used procedures, such as assessment of phylogenetic assumptions and tests of goodness of fit, we introduce a new phylogenetic protocol that will reduce confirmation bias and increase the accuracy of phylogenetic estimates.
Affiliation(s)
- Lars S Jermiin
- CSIRO Land & Water, Canberra, ACT 2601, Australia
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- School of Biology & Environment Science, University College Dublin, Belfield, Dublin 4, Ireland
- Earth Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Renee A Catullo
- CSIRO Land & Water, Canberra, ACT 2601, Australia
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- School of Science and Health & Hawkesbury Institute of the Environment, Western Sydney University, Penrith, NSW 2751, Australia
- Barbara R Holland
- School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
5
Naser-Khdour S, Minh BQ, Zhang W, Stone EA, Lanfear R. The Prevalence and Impact of Model Violations in Phylogenetic Analysis. Genome Biol Evol 2019; 11:3341-3352. [PMID: 31536115 PMCID: PMC6893154 DOI: 10.1093/gbe/evz193] [Accepted: 09/03/2019]
Abstract
In phylogenetic inference, we commonly use models of substitution which assume that sequence evolution is stationary, reversible, and homogeneous (SRH). Although the use of such models is often criticized, the extent of SRH violations and their effects on phylogenetic inference of tree topologies and edge lengths are not well understood. Here, we introduce and apply the maximal matched-pairs tests of homogeneity to assess the scale and impact of SRH model violations on 3,572 partitions from 35 published phylogenetic data sets. We show that roughly one-quarter of all the partitions we analyzed (23.5%) reject the SRH assumptions, and that for 25% of data sets, tree topologies inferred from all partitions differ significantly from topologies inferred using the subset of partitions that do not reject the SRH assumptions. This proportion increases when comparing trees inferred using the subset of partitions that reject the SRH assumptions to those inferred from partitions that do not reject the SRH assumptions. These results suggest that the extent and effects of model violation in phylogenetics may be substantial. They highlight the importance of testing for model violations and possibly excluding partitions that violate models prior to tree reconstruction. Our results also suggest that further effort in developing models that do not require SRH assumptions could lead to large improvements in the accuracy of phylogenomic inference. The scripts necessary to perform the analysis are available at https://github.com/roblanf/SRHtests, and the new tests we describe are available as a new option in IQ-TREE (http://www.iqtree.org).
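Matched-pairs tests operate on the 4×4 divergence matrix of two aligned sequences. Under stationary, reversible, homogeneous evolution, the expected matrix is symmetric, so significant asymmetry is evidence against the SRH assumptions. The sketch below computes Bowker's classic matched-pairs test of symmetry for one sequence pair. It is an illustration under that assumption, not the maximal variant introduced in the paper or IQ-TREE's implementation. It returns only the statistic and its degrees of freedom; a p-value would come from the chi-squared distribution (e.g., via SciPy), which is omitted here to keep the example dependency-free.

```python
from typing import List, Tuple

BASES = "ACGT"


def divergence_matrix(seq1: str, seq2: str) -> List[List[int]]:
    """4x4 counts N[i][j] of sites with base i in seq1 and base j in seq2.

    Sites where either sequence has a gap or an ambiguity code are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    index = {b: k for k, b in enumerate(BASES)}
    n = [[0] * 4 for _ in range(4)]
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in index and b in index:
            n[index[a]][index[b]] += 1
    return n


def bowker_symmetry_test(seq1: str, seq2: str) -> Tuple[float, int]:
    """Bowker's statistic: sum over i<j of (n_ij - n_ji)^2 / (n_ij + n_ji).

    Pairs with n_ij + n_ji == 0 contribute nothing and are dropped from the
    degrees of freedom (at most 6 for four nucleotides).
    """
    n = divergence_matrix(seq1, seq2)
    stat, df = 0.0, 0
    for i in range(4):
        for j in range(i + 1, 4):
            total = n[i][j] + n[j][i]
            if total > 0:
                stat += (n[i][j] - n[j][i]) ** 2 / total
                df += 1
    return stat, df


if __name__ == "__main__":
    # Three A->G differences and no G->A differences: maximally asymmetric.
    print(bowker_symmetry_test("AAAACCGT", "GGGACCGT"))  # -> (3.0, 1)
```

Applying such a test to every sequence pair in a partition, and summarizing the resulting p-values, is one way to flag partitions that violate the SRH assumptions before tree reconstruction.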
Affiliation(s)
- Suha Naser-Khdour
- Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
- Bui Quang Minh
- Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
- Research School of Computer Science, Australian National University, Canberra, Australian Capital Territory, Australia
- Wenqi Zhang
- Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
- Eric A Stone
- Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
- Robert Lanfear
- Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
6
Abstract
Codon usage depends on mutation bias, tRNA-mediated selection, and the need for high efficiency and accuracy in translation. One codon in a synonymous codon family is often strongly over-used, especially in highly expressed genes, which often leads to a high dN/dS ratio because dS is very small. Many different codon usage indices have been proposed to measure codon usage and codon adaptation. Sense codons can be misread by release factors, and stop codons can be misread by tRNAs; in rare cases, such misreading also contributes to codon usage. This chapter outlines the conceptual framework of codon evolution, illustrates codon-specific and gene-specific codon usage indices, and presents their applications. A new index of codon adaptation that accounts for background mutation bias (the Index of Translation Elongation) is presented and contrasted with the codon adaptation index (CAI), which does not consider background mutation bias. Both indices are used to re-analyze data from a recent paper claiming that translation elongation efficiency matters little in protein production. The reanalysis disproves the claim.
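The codon adaptation index contrasted above is conventionally computed as the geometric mean, over a gene's codons, of each codon's relative adaptiveness w. Here w is the codon's count in a reference set of highly expressed genes divided by the count of the most-used codon in its synonymous family. The sketch below assumes that standard definition and a 0.5 pseudocount for unobserved codons. The two-amino-acid codon table and the reference gene are hypothetical and deliberately incomplete. The chapter's Index of Translation Elongation, which additionally corrects for background mutation bias, is not implemented here.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately incomplete synonymous-codon table (standard genetic
# code, two amino acids only) to keep the sketch short.
SYNONYMOUS: Dict[str, List[str]] = {
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def split_codons(seq: str) -> List[str]:
    """Split a coding sequence into complete codons (a trailing partial codon is dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes: List[str]) -> Dict[str, float]:
    """w = codon count / count of the most-used synonymous codon, in the reference set."""
    counts = Counter(c for gene in reference_genes for c in split_codons(gene))
    w: Dict[str, float] = {}
    for family in SYNONYMOUS.values():
        # Assumed convention: unobserved codons get a pseudocount of 0.5 so log(w) is finite.
        adjusted = {c: counts[c] if counts[c] > 0 else 0.5 for c in family}
        best = max(adjusted.values())
        for c, n in adjusted.items():
            w[c] = n / best
    return w


def cai(gene: str, w: Dict[str, float]) -> float:
    """Geometric mean of w over the gene's codons that appear in the table."""
    logs = [math.log(w[c]) for c in split_codons(gene) if c in w]
    if not logs:
        raise ValueError("gene contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    reference = ["AAAAAAAAGGGTGGTGGC"]  # hypothetical highly expressed gene
    w = relative_adaptiveness(reference)
    print(cai("AAAGGT", w), cai("AAGGGC", w))
```

Because w is normalized within each synonymous family, CAI reflects only the choice among synonymous codons, not amino acid composition. That is also why it cannot by itself separate selection from background mutation bias, the limitation the chapter's new index addresses.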
7
Cruickshank RH, Thomas RH. Evolution of haplodiploidy in dermanyssine mites (Acari: Mesostigmata). Evolution 1999; 53:1796-1803. [PMID: 28565470 DOI: 10.1111/j.1558-5646.1999.tb04563.x] [Received: 07/09/1998] [Accepted: 05/26/1999]
Abstract
Haplodiploidy, a widespread phenomenon in which males are haploid and females are diploid, can be caused by a number of different underlying genetic systems. In the most common of these, arrhenotoky, males arise from unfertilized eggs, whereas females arise from fertilized eggs. In another system, pseudoarrhenotoky, males arise from fertilized eggs, but they eliminate the paternal genome at some point prior to spermatogenesis, with the consequence that they do not pass this genome to their offspring. In 1931, Schrader and Hughes-Schrader suggested that arrhenotoky arises through a series of stages involving pseudoarrhenotokous systems such as those found in many scale insects (Homoptera: Coccoidea); however, their hypothesis has been largely ignored. We have used a phylogenetic analysis of 751 base pairs of 28S rDNA from a group of mites (Mesostigmata: Dermanyssina) that contains arrhenotokous, pseudoarrhenotokous, and ancestrally diplodiploid members to test this hypothesis. Neighbor-joining, maximum-parsimony, and maximum-likelihood methods all indicate that the arrhenotokous members of this group form a clade that arose from a pseudoarrhenotokous ancestor, rather than directly from a diplodiploid one. This provides unequivocal support for the hypothesis of Schrader and Hughes-Schrader. The wider implications of this result for the evolution of uniparental genetic systems are discussed.
Affiliation(s)
- Robert H Cruickshank
- Division of Environmental and Evolutionary Biology, Institute of Biomedical and Life Sciences, Graham Kerr Building, University of Glasgow, Glasgow, G12 8QQ, United Kingdom
- Richard H Thomas
- Department of Zoology, The Natural History Museum, Cromwell Road, London, SW7 5BD, United Kingdom
8
Abstract
Most phylogenetic methods are model-based and depend on models of evolution designed to approximate the evolutionary processes. Several methods have been developed to identify suitable models of evolution for phylogenetic analysis of alignments of nucleotide or amino acid sequences and some of these methods are now firmly embedded in the phylogenetic protocol. However, in a disturbingly large number of cases, it appears that these models were used without acknowledgement of their inherent shortcomings. In this chapter, we discuss the problem of model selection and show how some of the inherent shortcomings may be identified and overcome.
Affiliation(s)
- Vivek Jayaswal
- School of Biomedical Sciences, Queensland University of Technology, Brisbane, QLD, Australia
- Faisal M Ababneh
- Department of Mathematics & Statistics, Al-Hussein Bin Talal University, Ma'an, Jordan
- John Robinson
- School of Mathematics & Statistics, University of Sydney, Sydney, NSW, Australia
9
Diversification of Sisorid catfishes (Teleostei: Siluriformes) in relation to the orogeny of the Himalayan Plateau. Sci Bull (Beijing) 2016. [DOI: 10.1007/s11434-016-1104-0]
10
Fares M. Modeling Evolution of Molecular Sequences. Natural Selection 2014:28-47. [DOI: 10.1201/b17795-4]
11
12
Morgan CC, Foster PG, Webb AE, Pisani D, McInerney JO, O'Connell MJ. Heterogeneous models place the root of the placental mammal phylogeny. Mol Biol Evol 2013; 30:2145-56. [PMID: 23813979 PMCID: PMC3748356 DOI: 10.1093/molbev/mst117]
Abstract
Heterogeneity among life traits in mammals has resulted in considerable phylogenetic conflict, particularly concerning the position of the placental root. Layered upon this are gene- and lineage-specific variation in amino acid substitution rates and compositional biases. Life trait variations that may impact upon mutational rates are longevity, metabolic rate, body size, and germ line generation time. Over the past 12 years, three main conflicting hypotheses have emerged for the placement of the placental root. These hypotheses place the Atlantogenata (common ancestor of Xenarthra plus Afrotheria), the Afrotheria, or the Xenarthra as the sister group to all other placental mammals. Model adequacy is critical for accurate tree reconstruction, and by failing to account for these compositional and character-exchange heterogeneities across the tree and data set, previous studies have not provided a strongly supported hypothesis for the placental root. For the first time, models that accommodate both tree and data set heterogeneity have been applied to mammal data. Here, we show the impact of accurate model assignment and the importance of data sets in accommodating model parameters while maintaining the power to reject competing hypotheses. Through these sophisticated methods, we demonstrate the importance of model adequacy and data set power, and provide strong support for the Atlantogenata over other competing hypotheses for the position of the placental root.
Affiliation(s)
- Claire C Morgan
- Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Glasnevin, Dublin, Ireland
13
A unified approach to the transition matrices of DNA substitution models. Math Biosci 2013; 242:111-6. [DOI: 10.1016/j.mbs.2012.12.008] [Received: 02/08/2011] [Revised: 06/04/2012] [Accepted: 12/19/2012]
14
Zhang R, Yap VB. Context-dependent substitution models for circular DNA. Infect Genet Evol 2013; 18:362-6. [PMID: 23499773 DOI: 10.1016/j.meegid.2013.03.001] [Received: 06/14/2012] [Revised: 02/25/2013] [Accepted: 03/02/2013]
Abstract
The most general context-dependent Markov substitution process, where each substitution event involves only one site and substitution rates depend on the whole sequence, is presented for the first time. The focus is on circular DNA sequences, where the problem of specifying the behaviour of the first and last sites in a linear sequence does not arise. Important special cases include (1) the established models where each site behaves independently, (2) models which are increasingly applied to non-coding DNA, where each site depends only on its immediate neighbouring sites, and (3) models where each site depends on the two closest neighbours on each side, such as the codon models. These special cases are classified and illustrated by published models. It is shown that the existing codon substitution models mix up the mutation and selection processes, rendering the substitution rates challenging to interpret. The classification suggests the study of a more interpretable codon model, where the mutation and selection processes are clearly delineated. Furthermore, this model allows a natural accommodation of possibly different selection pressures in overlapping reading frames, which may contribute to furthering the understanding of viral diseases. Also included are brief discussions on the stationary distribution of a context-dependent substitution process and a simple recipe for simulating it on a computer.
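A standard way to simulate a continuous-time Markov substitution process in which each event changes one site is the Gillespie (direct) method. At each step, compute the rate of every possible single-site change given the current sequence, draw an exponential waiting time from the total rate, then apply one change chosen in proportion to its rate. The sketch below applies it to a circular sequence, where indexing modulo the length gives every site two neighbours. The rate function is a hypothetical nearest-neighbour example (uniform rate plus a fourfold boost for C→T in a CpG context); it is neither a model from the paper nor necessarily the paper's simulation recipe.

```python
import random
from typing import Callable, List, Tuple

BASES = "ACGT"
# Rate of replacing `current` by `new`, given its left and right neighbours.
RateFn = Callable[[str, str, str, str], float]


def cpg_rate(left: str, current: str, right: str, new: str) -> float:
    """Hypothetical rate: 1.0 everywhere, except C->T at a CpG (C followed by G) is 4.0."""
    if current == "C" and right == "G" and new == "T":
        return 4.0
    return 1.0


def simulate_circular(seq: str, t_max: float, rate: RateFn, rng: random.Random) -> str:
    """Gillespie (direct-method) simulation of single-site substitutions on circular DNA."""
    s = list(seq)
    n = len(s)
    t = 0.0
    while True:
        # Enumerate every possible single-site change with its rate in the current context.
        events: List[Tuple[int, str]] = []
        rates: List[float] = []
        for i in range(n):
            left, right = s[(i - 1) % n], s[(i + 1) % n]  # modulo n: the circle has no ends
            for b in BASES:
                if b != s[i]:
                    events.append((i, b))
                    rates.append(rate(left, s[i], right, b))
        total = sum(rates)
        if total <= 0.0:
            return "".join(s)  # no change is possible
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_max:
            return "".join(s)
        # Pick one event with probability proportional to its rate, then apply it.
        i, b = rng.choices(events, weights=rates)[0]
        s[i] = b


if __name__ == "__main__":
    print(simulate_circular("ACGTTCGA", t_max=0.5, rate=cpg_rate, rng=random.Random(42)))
```

Recomputing every rate after each event costs O(n) per step. That is acceptable for a sketch, but only the changed site and its immediate neighbours actually need updating under this nearest-neighbour rate function.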
Affiliation(s)
- Rongli Zhang
- Department of Statistics and Applied Probability, National University of Singapore, Block S16 Level 7, 6 Science Drive 2, Singapore 117546, Singapore
15
Pham VH, Nguyen TV, Nguyen TTT, Dang LD, Hoang NH, Nguyen TV, Abe K. Rubella epidemic in Vietnam: characteristic of rubella virus genes from pregnant women and their fetuses/newborns with congenital rubella syndrome. J Clin Virol 2013; 57:152-6. [PMID: 23481444 DOI: 10.1016/j.jcv.2013.02.008] [Received: 11/08/2012] [Revised: 01/31/2013] [Accepted: 02/10/2013]
Abstract
BACKGROUND Rubella remains poorly controlled in Southeast Asia, including Vietnam. OBJECTIVES The aim of this study was to characterize rubella virus spread in Vietnam during 2011-2012. STUDY DESIGN Amniotic fluid, throat swab, and placenta samples were collected from 130 patients (110 cases from pregnant women with suspected rubella and 20 cases from fetuses/newborns). Viral RNA was obtained directly from clinical specimens and amplified by PCR, and the 739-nucleotide region of the E1 gene recommended by the WHO for genotype identification was sequenced. RESULTS By real-time PCR screening, viral RNA was detected in amniotic fluid from 103 of 110 (93.6%) pregnant women with suspected rubella and in throat swabs from all 20 (100%) fetuses/newborns. In addition, viral RNA was detected in the placenta in all fetus/newborn cases. All 20 fetuses/newborns presented with congenital cataract. Twenty-four strains with the E1 gene were obtained by PCR. Phylogenetic analysis with rubella reference sequences showed that all of the strains were genotype 2B. Interestingly, 94% (30/32) of Vietnamese strains, including 9 strains from the database, formed an independent cluster within genotype 2B, suggesting that indigenous viruses are prevalent in this region. CONCLUSIONS Rubella virus identified in Vietnam belonged to genotype 2B. Importantly, the infection rate of rubella virus in fetuses/newborns was 100%, and all of them had congenital cataract. Our results indicate that establishing rubella prevention in this area is an urgent task for improving maternal and child health.
Affiliation(s)
- Van Hung Pham
- Biomedical Laboratory, School of Medicine, University of Medicine and Pharmacy in Ho Chi Minh City, Ho Chi Minh City, Viet Nam
16
Findley K, Sun S, Fraser JA, Hsueh YP, Averette AF, Li W, Dietrich FS, Heitman J. Discovery of a modified tetrapolar sexual cycle in Cryptococcus amylolentus and the evolution of MAT in the Cryptococcus species complex. PLoS Genet 2012; 8:e1002528. [PMID: 22359516 PMCID: PMC3280970 DOI: 10.1371/journal.pgen.1002528] [Received: 09/02/2011] [Accepted: 12/21/2011]
Abstract
Sexual reproduction in fungi is governed by a specialized genomic region called the mating-type locus (MAT). The human fungal pathogenic and basidiomycetous yeast Cryptococcus neoformans has evolved a bipolar mating system (a, α) in which the MAT locus is unusually large (>100 kb) and encodes >20 genes including homeodomain (HD) and pheromone/receptor (P/R) genes. To understand how this unique bipolar mating system evolved, we investigated MAT in the closely related species Tsuchiyaea wingfieldii and Cryptococcus amylolentus and discovered two physically unlinked loci encoding the HD and P/R genes. Interestingly, the HD (B) locus sex-specific region is restricted (∼2 kb) and encodes two linked and divergently oriented homeodomain genes in contrast to the solo HD genes (SXI1α, SXI2a) of C. neoformans and Cryptococcus gattii. The P/R (A) locus contains the pheromone and pheromone receptor genes but has expanded considerably compared to other outgroup species (Cryptococcus heveanensis) and is linked to many of the genes also found in the MAT locus of the pathogenic Cryptococcus species. Our discovery of a heterothallic sexual cycle for C. amylolentus allowed us to establish the biological roles of the sex-determining regions. Matings between two strains of opposite mating-types (A1B1×A2B2) produced dikaryotic hyphae with fused clamp connections, basidia, and basidiospores. Genotyping progeny using markers linked and unlinked to MAT revealed that meiosis and uniparental mitochondrial inheritance occur during the sexual cycle of C. amylolentus. The sexual cycle is tetrapolar and produces fertile progeny of four mating-types (A1B1, A1B2, A2B1, and A2B2), but a high proportion of progeny are infertile, and fertility is biased towards one parental mating-type (A1B1). 
Our studies reveal insights into the plasticity and transitions in both mechanisms of sex determination (bipolar versus tetrapolar) and sexual reproduction (outcrossing versus inbreeding) with implications for similar evolutionary transitions and processes in fungi, plants, and animals.
Affiliation(s)
- Keisha Findley
- Genetics and Molecular Biology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina, United States of America
- Sheng Sun
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina, United States of America
- James A. Fraser
- School of Molecular and Microbial Sciences, University of Queensland, Brisbane, Australia
- Yen-Ping Hsueh
- Division of Biology, California Institute of Technology, Pasadena, California, United States of America
- Anna Floyd Averette
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina, United States of America
- Wenjun Li
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina, United States of America
- Fred S. Dietrich
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina, United States of America
- Joseph Heitman
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina, United States of America
17
Bérard J, Guéguen L. Accurate estimation of substitution rates with neighbor-dependent models in a phylogenetic context. Syst Biol 2012; 61:510-21. [PMID: 22331438 DOI: 10.1093/sysbio/sys024]
Abstract
Most models and algorithms developed to perform statistical inference from DNA data make the assumption that substitution processes affecting distinct nucleotide sites are stochastically independent. This assumption ensures both mathematical and computational tractability but is in disagreement with observed data in many situations, one well-known example being CpG dinucleotide hypermutability in mammalian genomes. In this paper, we consider the class of RN95 + YpR substitution models, which allows neighbor-dependent effects, including CpG hypermutability, to be taken into account through transitions between pyrimidine-purine dinucleotides. We show that it is possible to adapt inference methods originally developed under the assumption of independence between sites to RN95 + YpR models, using a mathematically rigorous framework provided by specific structural properties of this class of models. We assess how efficient this approach is at inferring the CpG hypermutability rate from aligned DNA sequences. The method is tested on simulated data and compared against several alternatives; the results suggest that it delivers a high degree of accuracy at a low computational cost. We then apply our method to an alignment of 10 DNA sequences from primate species. Model comparisons within the RN95 + YpR class show the importance of taking into account neighbor-dependent effects. An application of the method to the detection of hypomethylated islands is discussed.
Affiliation(s)
- Jean Bérard
- Institut Camille Jordan, UMR CNRS 5208, Université Lyon 1, Villeurbanne F-69622 Cedex, Université de Lyon, Lyon 69003, France
18
Gao JJ, Hu YG, Toda MJ, Katoh T, Tamura K. Phylogenetic relationships between Sophophora and Lordiphosa, with proposition of a hypothesis on the vicariant divergences of tropical lineages between the Old and New Worlds in the family Drosophilidae. Mol Phylogenet Evol 2011; 60:98-107. [DOI: 10.1016/j.ympev.2011.04.012] [Received: 10/18/2010] [Revised: 04/11/2011] [Accepted: 04/18/2011]
19
Kartavtsev YP. Sequence divergence at mitochondrial genes in animals: Applicability of DNA data in genetics of speciation and molecular phylogenetics. Mar Genomics 2011; 4:71-81. [DOI: 10.1016/j.margen.2011.02.002] [Received: 07/06/2010] [Revised: 01/26/2011] [Accepted: 02/23/2011]
20
Criscuolo A, Gribaldo S. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol 2010; 10:210. [PMID: 20626897 PMCID: PMC3017758 DOI: 10.1186/1471-2148-10-210] [Received: 02/10/2010] [Accepted: 07/13/2010]
Abstract
BACKGROUND The quality of multiple sequence alignments plays an important role in the accuracy of phylogenetic inference. It has been shown that removing ambiguously aligned regions, as well as other sources of bias such as highly variable (saturated) characters, can improve the overall performance of many phylogenetic reconstruction methods. A current scientific trend is to build phylogenetic trees from a large number of sequence datasets (semi-)automatically extracted from numerous complete genomes. Because these approaches do not allow precise manual curation of each dataset, there is a real need for efficient bioinformatic tools dedicated to this alignment character-trimming step. RESULTS Here we present new software, named BMGE (Block Mapping and Gathering with Entropy), designed to select regions in a multiple sequence alignment that are suited for phylogenetic inference. For each character, BMGE computes a score closely related to an entropy value. Calculation of these entropy-like scores is weighted with BLOSUM or PAM similarity matrices in order to distinguish between biologically expected and unexpected variability for each aligned character. Sets of contiguous characters with a score above a given threshold are considered unsuited for phylogenetic inference and are removed. Simulation analyses show that the character trimming performed by BMGE produces datasets leading to accurate trees, especially with alignments including distantly related sequences. BMGE also implements trimming and recoding methods aimed at minimizing phylogeny reconstruction artefacts due to compositional heterogeneity. CONCLUSIONS BMGE is able to perform biologically relevant trimming on a multiple alignment of DNA, codon or amino acid sequences. Java source code and executable are freely available at ftp://ftp.pasteur.fr/pub/GenSoft/projects/BMGE/.
Affiliation(s)
- Alexis Criscuolo
- Institut Pasteur, Unité de Biologie Moléculaire du Gène chez les Extrêmophiles, Département de Microbiologie, 25 rue du Dr Roux, 75015 Paris, France
- Simonetta Gribaldo
- Institut Pasteur, Unité de Biologie Moléculaire du Gène chez les Extrêmophiles, Département de Microbiologie, 25 rue du Dr Roux, 75015 Paris, France
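The column-scoring and trimming idea summarized in the BMGE abstract above can be illustrated with a minimal Python sketch. It is not BMGE: it assumes plain, unweighted Shannon entropy per column (no BLOSUM or PAM weighting), ignores gap characters, always drops all-gap columns, and judges columns one at a time rather than as contiguous blocks. The function names and the default `threshold` are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters.

    An all-gap column returns infinity so that it is always trimmed.
    """
    residues = [c for c in column.upper() if c != gap]
    if not residues:
        return math.inf
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def trim_columns(seqs, threshold=1.5, gap="-"):
    """Keep only the columns whose entropy is <= threshold (judged one by one)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    keep = [i for i in range(length)
            if column_entropy("".join(s[i] for s in seqs), gap) <= threshold]
    return ["".join(s[i] for i in keep) for s in seqs]
```

For example, in the four aligned sequences `ACGTA`, `ACGTC`, `ACGAG`, `ACGCT`, the last column (one of each base) has the maximum DNA entropy of 2 bits and is removed at the default threshold, while the fourth column (T, T, A, C) scores exactly 1.5 bits and is kept.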
21
Gouy M, Guindon S, Gascuel O. SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building. Mol Biol Evol 2009; 27:221-4. [PMID: 19854763 DOI: 10.1093/molbev/msp259]
22
Waddell PJ, Ota R, Penny D. Measuring fit of sequence data to phylogenetic model: gain of power using marginal tests. J Mol Evol 2009; 69:289-99. [PMID: 19851702 DOI: 10.1007/s00239-009-9268-8] [Received: 03/15/2009] [Accepted: 07/28/2009]
Abstract
Testing the fit of data to a model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this. Analyses that omit such tests discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (Unended quest: an intellectual autobiography. Fontana, London, 1976) once argued that evolutionary biology was unscientific as its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (Nature 297:197-200, 1982) to the present. We compare the general log-likelihood ratio statistic (the G or G² statistic) between the evolutionary tree model and the multinomial model with marginalized tests applied to an alignment (using placental mammal coding sequence data). The most general test does not reject the fit of data to model (P ≈ 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices strongly (P < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (P < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa, and not necessarily in those taxa expected from examining other regions of the genome. By marginalizing the 4^t patterns of the i.i.d. model to observed and expected parsimony counts (that is, from constant sites, to singletons, to parsimony-informative characters of a minimum possible length), the likelihood ratio test regains power, and it too rejects the evolutionary model with P << 0.001. Given such behavior over relatively recent evolutionary time, readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published trees may really be far larger than the analytical methods (e.g., bootstrap) report.
Affiliation(s)
- Peter J Waddell
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47906, USA.
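As a rough illustration of the log-likelihood ratio (G) statistic mentioned in the Waddell et al. abstract above, the Python sketch below computes G = 2 Σ O ln(O/E) for observed versus expected counts over a set of categories, for instance site-pattern classes after marginalization. This is a sketch under stated assumptions, not the authors' procedure: the expected counts must come from a fitted model, the degrees of freedom depend on that model, and the class counts in the example are hypothetical.

```python
import math


def g_statistic(observed, expected):
    """Log-likelihood ratio statistic G = 2 * sum(O * ln(O / E)).

    Categories with O == 0 contribute nothing (the limit of O*ln(O/E) as O -> 0).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if o < 0 or e <= 0:
            raise ValueError("need O >= 0 and E > 0 in every category")
        if o > 0:
            total += o * math.log(o / e)
    return 2.0 * total


# Hypothetical counts of constant, singleton and parsimony-informative sites,
# observed in an alignment vs. expected under some fitted substitution model.
observed = [700, 200, 100]
expected = [680, 230, 90]
print(round(g_statistic(observed, expected), 3))
```

The resulting G would then be compared against a chi-square reference distribution whose degrees of freedom reflect the number of classes and the parameters estimated from the data.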
23
Jermiin LS, Ho JWK, Lau KW, Jayaswal V. SeqVis: a tool for detecting compositional heterogeneity among aligned nucleotide sequences. Methods Mol Biol 2009; 537:65-91. [PMID: 19378140 DOI: 10.1007/978-1-59745-251-9_4]
Abstract
Compositional heterogeneity is a poorly appreciated attribute of aligned nucleotide and amino acid sequences. It is a common property of molecular phylogenetic data, and it has been found to occur across sequences and/or across sites. Most molecular phylogenetic methods assume that the sequences have evolved under globally stationary, reversible, and homogeneous conditions, implying that the sequences should be compositionally homogeneous. The presence of the above-mentioned compositional heterogeneity implies that the sequences must have evolved under more general conditions than is commonly assumed. Consequently, there is a need for reliable methods to detect the conditions under which alignments of nucleotides or amino acids may have evolved. In this chapter, we describe one such program. SeqVis is designed to survey aligned nucleotide sequences. We discuss the pros and cons of this program in the context of other methods to detect compositional heterogeneity and violated phylogenetic assumptions. The benefits provided by SeqVis are demonstrated in two studies of alignments of nucleotides, one of which contained 7542 nucleotides from 53 species.
Affiliation(s)
- Lars Sommer Jermiin
- School of Biological Sciences, Centre for Mathematical Biology and Sydney Bioinformatics, University of Sydney, Sydney, Australia
24
Stewart FJ, Young CR, Cavanaugh CM. Evidence for homologous recombination in intracellular chemosynthetic clam symbionts. Mol Biol Evol 2009; 26:1391-404. [PMID: 19289597 DOI: 10.1093/molbev/msp049]
Abstract
Homologous recombination is a fundamental mechanism for the genetic diversification of free-living bacteria. However, recombination may be limited in endosymbiotic bacteria, as these taxa are locked into an intracellular niche and may rarely encounter sources of foreign DNA. This study tested the hypothesis that vertically transmitted endosymbionts of deep-sea clams (Bivalvia: Vesicomyidae) show little or no evidence of recombination. Phylogenetic analysis of 13 loci distributed across the genomes of 14 vesicomyid symbionts revealed multiple, well-supported inconsistencies among gene tree topologies, and maximum likelihood-based tests rejected a hypothesis of shared evolutionary history (linkage) among loci. Further, multiple statistical methods confirmed the presence of recombination by detecting intragenic breakpoints in two symbiont loci. Recombination may be confined to a subset of vesicomyid symbionts, as some clades showed high levels of genomic stability, whereas others showed clear patterns of homologous exchange. Notably, a mosaic genome is present in symB, a symbiont lineage shown to have been acquired laterally (i.e., nonvertically) by Vesicomya sp. JdF clams. The majority of loci analyzed here supported a tight sister clustering of symB with the symbiont of a host species from the Mid-Atlantic Ridge, whereas others placed symB in a clade with symA, the dominant phylotype of V. sp. JdF clams. This result raises the hypothesis that lateral symbiont transfer between hosts may facilitate recombination by bringing divergent symbiont lineages into contact. Together, the data show that homologous recombination contributes to the diversification of vesicomyid clam symbionts, despite the intracellular lifestyle of these bacteria.
Affiliation(s)
- Frank J Stewart
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
25
O'Brien JD, Minin VN, Suchard MA. Learning to count: robust estimates for labeled distances between molecular sequences. Mol Biol Evol 2009; 26:801-14. [PMID: 19131426 DOI: 10.1093/molbev/msp003]
Abstract
Researchers routinely estimate distances between molecular sequences using continuous-time Markov chain models. We present a new method, robust counting, that protects against the possibly severe bias arising from model misspecification. We achieve this robustness by generalizing the conventional distance estimation to incorporate the empirical distribution of site patterns found in the observed pairwise sequence alignment. Our flexible framework allows for computing distances based only on a subset of possible substitutions. From this, we show how to estimate labeled codon distances, such as expected numbers of synonymous or nonsynonymous substitutions. We present two simulation studies. The first compares the relative bias and variance of conventional and robust labeled nucleotide estimators. In the second simulation, we demonstrate that robust counting furnishes accurate synonymous and nonsynonymous distance estimates based only on easy-to-fit models of nucleotide substitution, bypassing the need for computationally expensive codon models. We conclude with three empirical examples. In the first two examples, we investigate the evolutionary dynamics of the influenza A hemagglutinin gene using labeled codon distances. In the final example, we demonstrate the advantages of using robust synonymous distances to alleviate the effect of convergent evolution on phylogenetic analysis of an HIV transmission network.
Affiliation(s)
- John D O'Brien
- Department of Biomathematics, University of California, Los Angeles, USA
26
Klein J. Understanding the molecular epidemiology of foot-and-mouth-disease virus. Infect Genet Evol 2008; 9:153-61. [PMID: 19100342 PMCID: PMC7172361 DOI: 10.1016/j.meegid.2008.11.005] [Received: 07/04/2008] [Revised: 11/20/2008] [Accepted: 11/20/2008]
Abstract
Molecular epidemiology is an important tool for understanding and consequently controlling FMDV. In this review I present the basic information about the disease needed to perform molecular epidemiology. I give a short introduction to the history and impact of foot-and-mouth disease, its clinical picture, infection route, subclinical and persistent infections, general aspects of the transmission of FMDV, serotype-specific epidemiological characteristics, field epidemiology of FMDV, and the evolution and molecular epidemiology of FMDV. This is followed by two chapters describing the molecular epidemiology of foot-and-mouth disease in global surveillance and in outbreak investigation.
Affiliation(s)
- Joern Klein
- Norwegian University of Science and Technology, Faculty of Medicine, Department of Cancer Research and Molecular Medicine, N-7489 Trondheim, Norway.
27
Hoang PL, Trong KH, Tran TT, Huy TTT, Abe K. Detection of hepatitis A virus RNA from children patients with acute and fulminant hepatitis of unknown etiology in Vietnam: Genomic characterization of Vietnamese HAV strain. Pediatr Int 2008; 50:624-7. [PMID: 19261107 DOI: 10.1111/j.1442-200x.2008.02626.x]
Abstract
BACKGROUND Although Vietnam is thought to be a highly endemic region for hepatitis A virus (HAV) infection, there has been no report on the genomic characterization of HAV circulating in Vietnam. The purpose of the present study was therefore to identify viral infections in 33 children with acute or fulminant hepatitis of unknown etiology admitted to Children's Hospital No. 1 in Ho Chi Minh City, Vietnam. METHODS Anti-HAV IgM and IgG were assayed by ELISA. Viral RNA and DNA were detected by PCR. HAV genes amplified by PCR were sequenced and characterized by phylogenetic analysis. RESULTS Anti-HAV IgM was detected in 18 of 26 acute hepatitis patients (69.2%) and in one of seven (14.3%) fulminant hepatitis patients. Furthermore, HAV RNA in serum was identified by nested reverse transcription-polymerase chain reaction in five of 26 acute (19.2%) and two of seven (28.6%) fulminant hepatitis patients. Of the seven HAV-RNA-positive patients, two (28.6%) were negative for anti-HAV IgM. We also obtained seven isolates containing the HAV genome with the viral protein 1 (VP1) region sequence. All Vietnamese HAV isolates formed a cluster and belonged to genotype IA according to phylogenetic analysis based on short sequences of the VP1-2A junction region. CONCLUSION HAV is an important agent of fulminant hepatitis among children in Vietnam. To the authors' knowledge, this is the first report of a Vietnamese HAV strain confirmed by sequencing.
Affiliation(s)
- Phuc Le Hoang
- Department of Gastroenterology, Children's Hospital No 1, Ho Chi Minh City, Vietnam
28
Squartini F, Arndt PF. Quantifying the stationarity and time reversibility of the nucleotide substitution process. Mol Biol Evol 2008; 25:2525-35. [PMID: 18682605 DOI: 10.1093/molbev/msn169]
Abstract
Markov models describing the evolution of the nucleotide substitution process, widely used in phylogeny reconstruction, usually assume stationarity and time reversibility. Although these models give meaningful results when applied to biological data, it is not clear whether these 2 assumptions hold and, if not, how much sequence evolution processes deviate from them. To this end, we introduce 2 sets of indices that can be calculated from the nucleotide distribution and the substitution rates. The stationarity indices (STIs) can be used to test the validity of the equilibrium assumption. The irreversibility indices (IRIs) are derived from the Kolmogorov cycle conditions for time reversibility and quantify the degree of nontime reversibility of a process. We have computed STIs and IRIs for the evolutionary process of 2 lineages, Drosophila simulans and Homo sapiens. In the latter case, we use a modified form of the indices that takes into account the CpG decay process. In both cases, we find statistically significant deviations from the ideal case of a process that has reached stationarity and is time reversible.
Affiliation(s)
- Federico Squartini
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
29
O'Brien JD, She ZS, Suchard MA. Dating the time of viral subtype divergence. BMC Evol Biol 2008; 8:172. [PMID: 18541033 PMCID: PMC2443812 DOI: 10.1186/1471-2148-8-172] [Received: 01/28/2008] [Accepted: 06/09/2008]
Abstract
Precise dating of viral subtype divergence enables researchers to correlate divergence with geographic and demographic occurrences. When historical data are absent (that is, in the overwhelming majority of cases), viral sequence sampling on a time scale commensurate with the rate of substitution permits inference of the times of subtype divergence. Currently, researchers use two strategies to approach this task, both requiring strong conditions on the molecular clock assumption of substitution rate. As the underlying structure of the substitution rate process at the time of subtype divergence is not understood and likely highly variable, we present a simple method that estimates rates of substitution, and from there, times of divergence, without use of an assumed molecular clock. We accomplish this by blending estimates of the substitution rate for triplets of dated sequences where each sequence draws from a distinct viral subtype, providing a zeroth-order approximation for the rate between subtypes. As an example, we calculate the time of divergence for three genes among influenza subtypes A-H3N2 and B using subtype C as an outgroup. We estimate a time of divergence of approximately 100 years ago, substantially more recent than previous estimates, which range from 250 to 3800 years ago.
Affiliation(s)
- John D O'Brien
- Department of Biomathematics, UCLA, Los Angeles, CA 90095, USA.
30
Lebedev VS, Bannikova AA, Tesakov AS, Abramson NI. Molecular phylogeny of the genus Alticola (Cricetidae, Rodentia) as inferred from the sequence of the cytochrome b gene. Zool Scr 2007. [DOI: 10.1111/j.1463-6409.2007.00300.x]
31
Bérard J, Gouéré JB, Piau D. Solvable models of neighbor-dependent substitution processes. Math Biosci 2007; 211:56-88. [PMID: 18001806 DOI: 10.1016/j.mbs.2007.10.001] [Received: 05/29/2007] [Revised: 09/27/2007] [Accepted: 10/02/2007]
Abstract
We prove that a wide class of Markov models of neighbor-dependent substitution processes on the integer line is solvable. This class contains some models of nucleotidic substitutions recently introduced and studied empirically by molecular biologists. We show that the polynucleotidic frequencies at equilibrium solve some finite-size linear systems. This provides, to our knowledge for the first time, explicit and algebraic formulas for the stationary frequencies of non-degenerate neighbor-dependent models of DNA substitutions. Furthermore, we show that the dynamics of these stochastic processes and their distribution at equilibrium exhibit some stringent, rather unexpected, independence properties. For example, nucleotidic sites at distance at least three evolve independently, and all the sites, when encoded as purines and pyrimidines, evolve independently.
Affiliation(s)
- Jean Bérard
- Institut Camille Jordan - UMR 5208, Université Claude Bernard Lyon 1, 69622, Villeurbanne, France.
32
Morán T, Fontdevila A. On the phylogeny of the Drosophila hydei subgroup: New insights from combined analyses of nuclear and mitochondrial data. Mol Phylogenet Evol 2007; 43:1198-205. [PMID: 17292635 DOI: 10.1016/j.ympev.2006.12.021] [Received: 05/30/2006] [Revised: 10/04/2006] [Accepted: 12/28/2006]
Affiliation(s)
- Tomás Morán
- Grup de Biologia Evolutiva, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain.
33
Som A. A new approach for estimating the efficiencies of the nucleotide substitution models. Theory Biosci 2007; 125:133-45. [PMID: 17412292 DOI: 10.1016/j.thbio.2006.11.001] [Received: 07/24/2006] [Revised: 11/16/2006] [Accepted: 11/21/2006]
Abstract
In this article, a new approach is presented for estimating the efficiencies of the nucleotide substitution models in a four-taxon case and then this approach is used to estimate the relative efficiencies of six substitution models under a wide variety of conditions. In this approach, efficiencies of the models are estimated by using a simple probability distribution theory. To assess the accuracy of the new approach, efficiencies of the models are also estimated by using the direct estimation method. Simulation results from the direct estimation method confirmed that the new approach is highly accurate. The success of the new approach opens a unique opportunity to develop analytical methods for estimating the relative efficiencies of the substitution models in a straightforward way.
Affiliation(s)
- Anup Som
- Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5301, USA.
34
Abstract
Phylogenetic analysis has changed greatly in the last decade, and the most important themes in that change are reviewed here. Sequence data have become the most common source of phylogenetic information. This means that explicit models for evolutionary processes have been developed in a likelihood context, which allow more realistic data analyses. These models are becoming increasingly complex, both for nucleotides and for amino acid sequences, and so all such models need to be quantitatively assessed for each data set, to find the most appropriate one for use in any particular tree-building analysis. Bayesian analysis has been developed for tree-building and is greatly increasing in popularity. This is because a good heuristic strategy exists, which allows large data sets to be analyzed with complex evolutionary models in a practical time. Perhaps the most disappointing aspect of tree interpretation is the ongoing confusion between rooted and unrooted trees, while the effect of taxon and character sampling is often overlooked when constructing a phylogeny (especially in parasitology). The review finishes with a detailed consideration of the analysis of a multi-gene data set for several dozen taxa of Cryptosporidium (Apicomplexa), illustrating many of the theoretical and practical points highlighted in the review.
Affiliation(s)
- David A Morrison
- Department of Parasitology (SWEPAR), National Veterinary Institute and Swedish University of Agricultural Sciences, 751 89 Uppsala, Sweden
35
Yang Z, O'Brien JD, Zheng X, Zhu HQ, She ZS. Tree and rate estimation by local evaluation of heterochronous nucleotide data. Bioinformatics 2006; 23:169-76. [PMID: 17110369 PMCID: PMC7187891 DOI: 10.1093/bioinformatics/btl577]
Abstract
Motivation: Heterochronous gene sequence data are important for characterizing the evolutionary processes of fast-evolving organisms such as RNA viruses. Only a limited set of algorithms exists for estimating the rate of nucleotide substitution and inferring phylogenetic trees from such data. The authors here present a new method, Tree and Rate Estimation by Local Evaluation (TREBLE), that robustly calculates the rate of nucleotide substitution and the phylogeny with several orders of magnitude improvement in computational time. Methods: As the basis of its rate estimation, TREBLE uses, in a novel way, a geometric interpretation of the molecular clock assumption to deduce a local estimate of the rate of nucleotide substitution for triplets of dated sequences. Averaging the triplet estimates with variance weighting yields a global estimate of the rate. From this value, an iterative refinement procedure relying on statistical properties of the triplets then generates a final estimate of the global rate of nucleotide substitution. The estimated global rate is then used to find the tree from the pairwise distance matrix via an UPGMA-like algorithm. Results: Simulation studies show that TREBLE's point estimates of the rate of nucleotide substitution are comparable with the best of the available methods. Its confidence intervals are comparable with those of BEAST. TREBLE's phylogenetic reconstruction is significantly improved over the other distance matrix method but not as accurate as the Bayesian algorithm. Compared with three other algorithms, TREBLE reduces computational time by a minimum factor of 3000. Relative to the algorithm with the most accurate estimates for the rate of nucleotide substitution (i.e. BEAST), TREBLE is over 10 000 times more computationally efficient. Availability: Contact: jdobrien@ucla.edu
Affiliation(s)
- Zhu Yang
- State Key Lab for Turbulence and Complex Systems and Center for Theoretical Biology, Peking University, Beijing 100871, China
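The TREBLE abstract above says the tree is built from the pairwise distance matrix "via an UPGMA-like algorithm". For reference, a minimal sketch of standard UPGMA (average linkage) in Python follows. It is not TREBLE's variant: it omits branch lengths, breaks ties by whichever pair `min` returns first, and the function name and Newick-like output format are assumptions.

```python
def upgma(labels, dist):
    """Standard UPGMA on a symmetric distance matrix.

    labels: list of taxon names; dist: list of lists with dist[i][j] == dist[j][i].
    Returns the rooted topology as a Newick-like string without branch lengths.
    """
    def key(x, y):
        return (x, y) if x < y else (y, x)

    # Each active cluster id maps to (subtree string, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)

    while len(clusters) > 1:
        # Join the closest pair of active clusters.
        (a, b), _ = min(d.items(), key=lambda item: item[1])
        name_a, size_a = clusters.pop(a)
        name_b, size_b = clusters.pop(b)
        del d[(a, b)]
        # Distance to the merged cluster is the leaf-count-weighted average.
        for c in clusters:
            d_ac = d.pop(key(a, c))
            d_bc = d.pop(key(b, c))
            d[key(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = (f"({name_a},{name_b})", size_a + size_b)
        next_id += 1

    return next(iter(clusters.values()))[0]
```

On a four-taxon matrix where A and B are closest and D is equidistant from everything else, this sketch joins A with B first, then attaches C, then D.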
36
Kelchner SA, Thomas MA. Model use in phylogenetics: nine key questions. Trends Ecol Evol 2006; 22:87-94. [PMID: 17049674 DOI: 10.1016/j.tree.2006.10.004] [Received: 04/10/2006] [Revised: 09/19/2006] [Accepted: 10/05/2006]
Abstract
Models of character evolution underpin all phylogeny estimations, thus model adequacy remains a crucial issue for phylogenetics and its many applications. Although progress has been made in selecting appropriate models for phylogeny estimation, there is still concern about their purpose and proper use. How do we interpret models in a phylogenetic context? What are their effects on phylogeny estimation? How can we improve confidence in the models that we choose? That the phylogenetics community is asking such questions denotes an important stage in the use of explicit models. Here, we examine these and other common questions and draw conclusions about how the community is using and choosing models, and where this process will take us next.
Affiliation(s)
- Scot A Kelchner
- Department of Biological Sciences, Idaho State University, Pocatello, ID 83209-8007, USA.
37
Kartavtsev YP, Lee JS. Analysis of nucleotide diversity at the cytochrome b and cytochrome oxidase 1 genes at the population, species, and genus levels. Russ J Genet 2006. [DOI: 10.1134/s1022795406040016]
38
Ababneh F, Jermiin LS, Ma C, Robinson J. Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics 2006; 22:1225-31. [PMID: 16492684 DOI: 10.1093/bioinformatics/btl064]
Abstract
MOTIVATION Most phylogenetic methods assume that the sequences of nucleotides or amino acids have evolved under stationary, reversible and homogeneous conditions. When these assumptions are violated by the data, there is an increased probability of errors in the phylogenetic estimates. Methods to examine aligned sequences for these violations are available, but they are rarely used, possibly because they are not widely known or because they are poorly understood. RESULTS We describe and compare the available tests for symmetry of k-dimensional contingency tables from homologous sequences, and develop two new tests to evaluate different aspects of the evolutionary processes. For any pair of sequences, we consider a partition of the test for symmetry into a test for marginal symmetry and a test for internal symmetry. The proposed tests can be used to identify appropriate models for estimation of evolutionary relationships under a Markovian model. Simulations under evolutionary conditions of varying complexity were run to assess the performance of the tests. Finally, the tests were applied to an alignment of small-subunit ribosomal RNA sequences of five species of bacteria to outline the evolutionary processes under which they evolved. AVAILABILITY Programs written in R to do the tests on nucleotides are available from http://www.maths.usyd.edu.au/u/johnr/testsym/
Affiliation(s)
- Faisal Ababneh
- School of Mathematics and Statistics, University of Sydney NSW 2006, Australia
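The matched-pairs test of symmetry discussed in the Ababneh et al. abstract above starts from a divergence matrix: for two aligned sequences, entry (i, j) counts sites with nucleotide i in the first sequence and nucleotide j in the second. A minimal Python sketch of Bowker's classical symmetry statistic on that matrix follows. It assumes gaps and ambiguity codes are simply skipped, it is not the authors' R code, and it omits their marginal- and internal-symmetry tests. Counting degrees of freedom only over pairs (i, j) with a non-zero total is a common convention that we assume here rather than take from the paper.

```python
BASES = "ACGT"


def divergence_matrix(seq1, seq2):
    """4x4 counts F[i][j] of sites with base i in seq1 and base j in seq2.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    index = {b: k for k, b in enumerate(BASES)}
    f = [[0] * 4 for _ in range(4)]
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x in index and y in index:
            f[index[x]][index[y]] += 1
    return f


def bowker_symmetry(f):
    """Bowker's statistic sum_{i<j} (f_ij - f_ji)^2 / (f_ij + f_ji) and its degrees of freedom."""
    stat, df = 0.0, 0
    n = len(f)
    for i in range(n):
        for j in range(i + 1, n):
            total = f[i][j] + f[j][i]
            if total > 0:
                stat += (f[i][j] - f[j][i]) ** 2 / total
                df += 1
    return stat, df
```

Under symmetry the statistic is approximately chi-square distributed with `df` degrees of freedom, so a p-value can be obtained with, for example, `scipy.stats.chi2.sf(stat, df)` if SciPy is available.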
39
Morán T, Fontdevila A. Phylogeny and molecular evolution of the Drosophila hydei subgroup (Drosophila repleta group) inferred from the Xanthine dehydrogenase gene. Mol Phylogenet Evol 2005; 36:695-705. [PMID: 15935705 DOI: 10.1016/j.ympev.2005.04.009] [Received: 09/15/2004] [Revised: 03/18/2005] [Accepted: 04/05/2005]
Abstract
The hydei subgroup (Drosophila repleta group) consists of seven species divided into two complexes, bifurca and hydei, whose phylogenetic relationships are not well understood. To evaluate the molecular phylogeny of this subgroup, we analyzed 2085 bp of coding sequence of the Xanthine dehydrogenase gene in six available species of the hydei subgroup, with Drosophila buzzatii and Drosophila mulleri as outgroups. For phylogenetic reconstruction we adopted a maximum-likelihood framework, based on fitting descriptive models of nucleotide substitution to real data. We employed distance-based and weighted parsimony methods to construct candidate phylogenies. In all cases, we obtained only one completely resolved tree with strong statistical support for each node, showing a phylogeny that is partially discordant with the proposed systematics of the subgroup. This tree suggests that the two species complexes are paraphyletic, in contrast to classic phylogenies based on morphologic and cytologic traits. This discordance is discussed in relation to its implications for the evolutionary history of the hydei subgroup.
Affiliation(s)
- Tomás Morán
- Grup de Biología Evolutiva, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra (Barcelona), Spain
40
Merritt TJS, Young CR, Vogt RG, Wilkerson RC, Quattro JM. Intron retention identifies a malaria vector within the Anopheles (Nyssorhynchus) albitarsis complex (Diptera: Culicidae). Mol Phylogenet Evol 2005; 35:719-24. [PMID: 15878139 DOI: 10.1016/j.ympev.2005.03.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 12/21/2004] [Revised: 03/09/2005] [Accepted: 03/10/2005] [Indexed: 11/18/2022]
Affiliation(s)
- T J S Merritt
- Department of Biological Sciences, Program in Marine Science, Baruch Institute and School of the Environment, University of South Carolina, Columbia, SC 29208, USA.
41
Posada D, Buckley TR. Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst Biol 2004; 53:793-808. [PMID: 15545256 DOI: 10.1080/10635150490522304] [Citation(s) in RCA: 2289] [Impact Index Per Article: 120.5] [Indexed: 10/26/2022]
Abstract
Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001).
Affiliation(s)
- David Posada
- Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo, Vigo 36200, Spain.
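The AIC-based model averaging summarized in the Posada and Buckley abstract can be sketched in a few lines of Python. This is a minimal illustration assuming the standard definitions AIC = 2k - 2 ln L and Akaike weights w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i is a model's AIC minus the smallest AIC in the set. The function names, model fits, parameter counts, and gamma-shape (alpha) estimates are hypothetical values chosen for illustration, not results from the paper.

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def akaike_weights(aics: dict) -> dict:
    """Convert AIC scores into Akaike weights that sum to 1."""
    best = min(aics.values())
    rel = {m: math.exp(-(a - best) / 2) for m, a in aics.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

def model_average(weights: dict, estimates: dict) -> float:
    """Model-averaged estimate of a parameter, renormalizing the weights
    over only those models in which the parameter appears."""
    used = {m: w for m, w in weights.items() if m in estimates}
    total = sum(used.values())
    return sum(w * estimates[m] for m, w in used.items()) / total

# Hypothetical fits: (log-likelihood, free substitution-model parameters).
fits = {"JC69": (-2500.0, 0), "HKY85+G": (-2450.0, 5), "GTR+G": (-2447.0, 9)}
aics = {m: aic(lnl, k) for m, (lnl, k) in fits.items()}
weights = akaike_weights(aics)

# Hypothetical gamma-shape (alpha) estimates; JC69 has no alpha parameter.
alpha_avg = model_average(weights, {"HKY85+G": 0.50, "GTR+G": 0.60})
print({m: round(w, 3) for m, w in weights.items()})
print(round(alpha_avg, 3))
```

With these inputs HKY85+G receives a weight of about 0.73, GTR+G about 0.27, and JC69 essentially zero, so the averaged alpha lies between the two +G estimates. Branch lengths are left out of the parameter counts here because, on a fixed tree, they add the same constant to every model's AIC and do not change the weights.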
42
Bos DH, Posada D. Using models of nucleotide evolution to build phylogenetic trees. Dev Comp Immunol 2005; 29:211-27. [PMID: 15572070 DOI: 10.1016/j.dci.2004.07.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.1] [Received: 02/10/2004] [Revised: 06/17/2004] [Accepted: 07/31/2004] [Indexed: 05/24/2023]
Abstract
Molecular phylogenetics and its applications are popular and useful tools for making comparative investigations in genetics; however, estimating phylogenetic trees is not always straightforward. Some phylogenetic estimators use an explicit model of nucleotide evolution to estimate evolutionary parameters such as branch lengths and tree topology. There are many models to choose from, and use of the optimal model for a particular data set is important to avoid a loss of power and accuracy in phylogenetic estimations. Here, we review some molecular evolutionary forces and the parameters included in some common models of evolution used to interpret resulting patterns of molecular variation. We present some statistical methods of selecting a particular model of nucleotide evolution, and provide an empirical example of model selection. Statistical model selection strikes a balance between the bias introduced by some models and the increased variance of parameter estimates that results from using other models.
Affiliation(s)
- David H Bos
- School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch, New Zealand.
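As a concrete instance of the Bos and Posada point that an explicit model of nucleotide evolution turns raw sequence differences into branch-length estimates, the sketch below applies the simplest such model, Jukes-Cantor (JC69), to one pairwise comparison. JC69 is used here only for illustration; the models reviewed in the paper are more general, and the function names and toy sequences are hypothetical.

```python
import math

def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    valid = set("ACGT")
    compared = diffs = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in valid and b in valid:
            compared += 1
            diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared

def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: JC69 distance is undefined (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)

s1 = "ACGTACGTAC"
s2 = "ACGTACGTTT"
p = p_distance(s1, s2)  # 2 differences over 10 compared sites
d = jc69_distance(p)
print(p, round(d, 4))
```

Here the observed proportion of differences is 0.2, while the model-corrected distance is about 0.2326 substitutions per site, because JC69 accounts for multiple substitutions at the same site. The correction grows with p and is undefined at p >= 0.75.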
43
Abstract
Compositional heterogeneity among lineages can compromise phylogenetic analyses, because models in common use assume compositionally homogeneous data. Models that can accommodate compositional heterogeneity with few extra parameters are described here, and used in two examples where the true tree is known with confidence. It is shown using likelihood ratio tests that adequate modeling of compositional heterogeneity can be achieved with few composition parameters, and that the data need not be modeled with separate composition parameters for each branch in the tree. Tree searching and placement of composition vectors on the tree are done in a Bayesian framework using Markov chain Monte Carlo (MCMC) methods. Assessment of fit of the model to the data is made in both maximum likelihood (ML) and Bayesian frameworks. In an ML framework, overall model fit is assessed using the Goldman-Cox test, and the fit of the composition implied by a (possibly heterogeneous) model to the composition of the data is assessed using a novel tree- and model-based composition fit test. In a Bayesian framework, overall model fit and composition fit are assessed using posterior predictive simulation. It is shown that when composition is not accommodated, the model does not fit and incorrect trees are found; when composition is accommodated, the model fits and the known correct phylogenies are obtained.
Affiliation(s)
- Peter G Foster
- Department of Zoology, The Natural History Museum, Cromwell Road, London SW7 5BD, United Kingdom.
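For context on the compositional heterogeneity discussed in the Foster abstract, the sketch below computes the conventional Pearson chi-square statistic on a taxa-by-nucleotide count table, a quick screen commonly run before tree building. It is not the tree- and model-based composition fit test introduced in the abstract: it ignores phylogenetic relatedness among the sequences, so it should be read as a rough check only. The function name and toy sequences are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def composition_chi2(seqs: dict) -> tuple:
    """Pearson chi-square for a (taxa x A/C/G/T) contingency table.

    Returns (statistic, degrees_of_freedom); gaps and ambiguity codes are ignored.
    """
    counts = {name: Counter(b for b in s.upper() if b in BASES)
              for name, s in seqs.items()}
    row_tot = {name: sum(c[b] for b in BASES) for name, c in counts.items()}
    col_tot = {b: sum(c[b] for c in counts.values()) for b in BASES}
    grand = sum(row_tot.values())
    stat = 0.0
    for name, c in counts.items():
        for b in BASES:
            expected = row_tot[name] * col_tot[b] / grand
            if expected > 0:  # skip cells whose row or column total is zero
                stat += (c[b] - expected) ** 2 / expected
    df = (len(seqs) - 1) * (len(BASES) - 1)
    return stat, df

homogeneous = {"taxon1": "ACGT" * 5, "taxon2": "TGCA" * 5}
skewed = {"taxon1": "AAAATTTT", "taxon2": "GGGGCCCC"}
print(composition_chi2(homogeneous))
print(composition_chi2(skewed))
```

The statistic would then be compared with a chi-square distribution on the returned degrees of freedom. Because sequences on a tree are not independent samples, this simple test can be misleading, which is part of the motivation for the model-based approach described above.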
44
Jermiin L, Ho SY, Ababneh F, Robinson J, Larkum AW. The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Syst Biol 2004; 53:638-43. [PMID: 15371251 DOI: 10.1080/10635150490468648] [Citation(s) in RCA: 179] [Impact Index Per Article: 9.0] [Indexed: 10/26/2022]
Affiliation(s)
- Lars Jermiin
- School of Biological Sciences, University of Sydney, NSW 2006, Australia.
45
Ko WY, David RM, Akashi H. Molecular phylogeny of the Drosophila melanogaster species subgroup. J Mol Evol 2003; 57:562-73. [PMID: 14738315 DOI: 10.1007/s00239-003-2510-x] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.4] [Received: 11/08/2002] [Accepted: 06/02/2003] [Indexed: 11/30/2022]
Abstract
Although molecular and phenotypic evolution have been studied extensively in Drosophila melanogaster and its close relatives, phylogenetic relationships within the D. melanogaster species subgroup remain unresolved. In particular, recent molecular studies have not converged on the branching orders of the D. yakuba-D. teissieri and D. erecta-D. orena species pairs relative to the D. melanogaster-D. simulans-D. mauritiana-D. sechellia species complex. Here, we reconstruct the phylogeny of the melanogaster species subgroup using DNA sequence data from four nuclear genes. We have employed "vectorette PCR" to obtain sequence data for orthologous regions of the Alcohol dehydrogenase (Adh), Alcohol dehydrogenase related (Adhr), Glucose dehydrogenase (Gld), and rosy (ry) genes (totaling 7164 bp) from six melanogaster subgroup species (D. melanogaster, D. simulans, D. teissieri, D. yakuba, D. erecta, and D. orena) and three species from subgroups outside the melanogaster species subgroup [D. eugracilis (eugracilis subgroup), D. mimetica (suzukii subgroup), and D. lutescens (takahashii subgroup)]. Relationships within the D. simulans complex are not addressed. Phylogenetic analyses employing maximum parsimony, neighbor-joining, and maximum likelihood methods strongly support a D. yakuba-D. teissieri and D. erecta-D. orena clade within the melanogaster species subgroup. D. eugracilis is grouped closer to the melanogaster subgroup than a D. mimetica-D. lutescens clade. This tree topology is supported by reconstructions employing simple (single parameter) and more complex (nonreversible) substitution models.
Affiliation(s)
- Wen-Ya Ko
- Institute of Molecular Evolutionary Genetics and Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
46
47
Aiba N, Nishimura H, Arakawa Y, Abe K. Complete nucleotide sequence and phylogenetic analyses of hepatitis B virus isolated from two pileated gibbons. Virus Genes 2004; 27:219-26. [PMID: 14618082 DOI: 10.1023/a:1026387614162] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
We analyzed the full-length sequences of hepatitis B virus (HBV) recovered from two pileated gibbons (Hylobates pileatus) originally born in East Asia. Both animals carried a viral genome 3182 nt in length with a 33-nt deletion in the pre-S1 region; the two isolates were designated HBV PG-Makiko and HBV PG-Yohko, respectively. Both sequences had 65-90% similarity to human HBV isolates of genotypes A-G. Phylogenetic analysis demonstrated that both isolates were distinct from human and other nonhuman primate HBV isolates but grouped with gibbon isolates previously reported by others. Small spherical and tubular particles, as well as large particles with outer envelopes, were observed in the serum by immunoelectron microscopy. By immunohistochemical staining, HBsAg and HBcAg were detected in the cytoplasm and nuclei of hepatocytes, respectively. Our results suggest that the HBV found in these animals is indigenous to their hosts rather than a recent acquisition from humans.
Affiliation(s)
- Naoto Aiba
- Department of Pathology, National Institute of Infectious Diseases, Tokyo, Japan
48
Abstract
Guanine plus cytosine (GC) content ranges broadly among bacterial genomes. In this study, we explore the use of a Brownian-motion model for the evolution of GC content over time. This model assumes that GC content varies over time in a continuous and homogeneous manner. Using this model and a maximum-likelihood approach, we analyzed the evolution of GC content across several bacterial phylogenies. Using three independent tests, we found that the observed divergence in GC content was consistent with a homogeneous Brownian-motion model. For example, similar rates of GC content evolution were inferred in several different bacterial subclades, indicating that there is relatively little rate heterogeneity in GC content evolution over broad evolutionary time scales. We thus argue that the homogeneous Brownian-motion model provides a good working model for GC content evolution. We then use this model to determine the overall rate of GC content evolution among eubacteria. We also determine the time frame over which GC content remains similar in related taxa, using a flexible definition for "similarity" in GC content so that, depending on the context, more or less stringent criteria may be applied. Our results have implications for models of sequence evolution, including those used for phylogenetic reconstruction and for inferring unusual changes in GC content.
Affiliation(s)
- Eric Haywood-Farmer
- Department of Zoology, University of British Columbia, Vancouver V6T 1Z4, Canada
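Under the Brownian-motion model described in the abstract above, two lineages evolving independently for time t since their common ancestor differ in GC content by a normally distributed amount with mean 0 and variance 2σ²t, where σ² is the rate. The sketch below uses that fact for a simplified maximum-likelihood estimate of σ² from independent pairs of taxa. The pair-based setup, the rate value, the divergence times, and the function names are assumptions for illustration; the study itself fitted the model across bacterial phylogenies. Plain Brownian motion also ignores the [0, 1] bound on GC content, so the simulated rate is kept small.

```python
import math
import random

def simulate_pair(gc_root: float, rate: float, t: float, rng: random.Random):
    """Evolve GC content down two independent lineages for time t each.

    Under Brownian motion each lineage's change is Normal(0, rate * t).
    """
    sd = math.sqrt(rate * t)
    return gc_root + rng.gauss(0.0, sd), gc_root + rng.gauss(0.0, sd)

def ml_rate(pairs) -> float:
    """ML estimate of the Brownian-motion rate from independent pairs.

    Each pair is (gc_a, gc_b, t); since gc_a - gc_b ~ Normal(0, 2 * rate * t),
    setting the derivative of the log-likelihood to zero gives
    rate_hat = mean((gc_a - gc_b) ** 2 / (2 * t)).
    """
    return sum((a - b) ** 2 / (2 * t) for a, b, t in pairs) / len(pairs)

rng = random.Random(42)
true_rate = 1e-4  # hypothetical rate, (GC fraction)^2 per time unit
pairs = []
for _ in range(5000):
    t = rng.uniform(1.0, 50.0)  # hypothetical divergence times
    a, b = simulate_pair(0.5, true_rate, t, rng)
    pairs.append((a, b, t))

estimate = ml_rate(pairs)
print(f"true {true_rate:.2e}, estimated {estimate:.2e}")
```

With 5,000 pairs the estimate should fall within a few percent of the simulated rate, since the relative standard error of this estimator is about sqrt(2/n).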
49
50
Romeike J, Friedl T, Helms G, Ott S. Genetic diversity of algal and fungal partners in four species of Umbilicaria (Lichenized Ascomycetes) along a transect of the Antarctic peninsula. Mol Biol Evol 2002; 19:1209-17. [PMID: 12140232 DOI: 10.1093/oxfordjournals.molbev.a004181] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.4] [Indexed: 11/14/2022]
Abstract
Lichens from the genus Umbilicaria were collected across a 5,000-km transect through Antarctica and investigated for DNA sequence polymorphism in a region of 480-660 bp of the nuclear internal transcribed spacer region of ribosomal DNA. Sequences from both fungal (16 ascomycetes) and photosynthetic partners (22 chlorophytes from the genus Trebouxia) were determined and compared with homologs from lichens inhabiting more temperate, continental climates. The phylogenetic analyses reveal that Antarctic lichens have colonized their current habitats both through multiple independent colonization events from temperate embarkation zones and through recent long-range dispersal in the Antarctic of successful preexisting colonizers. Furthermore, the results suggest that relichenization (de novo establishment of the fungus-photosynthesizer symbiosis from nonlichenized algal and fungal cells) has occurred during the process of Antarctic lichen dispersal. Independent dispersal of algal and fungal cultures therefore can lead to a successful establishment of the lichen symbiosis even under harsh Antarctic conditions.
Affiliation(s)
- J Romeike
- Botanisches Institut, Heinrich-Heine-Universität, Universitätsstr. 1, Düsseldorf, Germany