151
Steel M, Linz S, Huson DH, Sanderson MJ. Identifying a species tree subject to random lateral gene transfer. J Theor Biol 2013; 322:81-93. [DOI: 10.1016/j.jtbi.2013.01.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 12/09/2012] [Revised: 01/09/2013] [Accepted: 01/10/2013] [Indexed: 11/26/2022]
152
Schumer M, Cui R, Boussau B, Walter R, Rosenthal G, Andolfatto P. An evaluation of the hybrid speciation hypothesis for Xiphophorus clemenciae based on whole genome sequences. Evolution 2013; 67:1155-68. [PMID: 23550763 PMCID: PMC3621027 DOI: 10.1111/evo.12009] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
Once thought rare in animal taxa, hybridization has been increasingly recognized as an important and common force in animal evolution. In the past decade, a number of studies have suggested that hybridization has driven speciation in some animal groups. We investigate the signature of hybridization in the genome of a putative hybrid species, Xiphophorus clemenciae, through whole genome sequencing of this species and its hypothesized progenitors. Based on analysis of this data, we find that X. clemenciae is unlikely to have been derived from admixture between its proposed parental species. However, we find significant evidence for recent gene flow between Xiphophorus species. Although we detect genetic exchange in two pairs of species analyzed, the proportion of genomic regions that can be attributed to hybrid origin is small, suggesting that strong behavioral premating isolation prevents frequent hybridization in Xiphophorus. The direction of gene flow between species is potentially consistent with a role for sexual selection in mediating hybridization.
Affiliation(s)
- Molly Schumer
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey 08544, USA.
153
Tarcz S, Przyboś E, Surmacz M. An assessment of haplotype variation in ribosomal and mitochondrial DNA fragments suggests incomplete lineage sorting in some species of the Paramecium aurelia complex (Ciliophora, Protozoa). Mol Phylogenet Evol 2013; 67:255-65. [DOI: 10.1016/j.ympev.2013.01.016] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 09/17/2012] [Revised: 01/23/2013] [Accepted: 01/29/2013] [Indexed: 01/11/2023]
154
Grechko VV. The problems of molecular phylogenetics with the example of squamate reptiles: Mitochondrial DNA markers. Mol Biol 2013. [DOI: 10.1134/s0026893313010056] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/23/2022]
155
Affiliation(s)
- David P. Mindell
- Department of Biochemistry & Biophysics, University of California, San Francisco, CA 94158, USA
156
Oliver JC. Microevolutionary processes generate phylogenomic discordance at ancient divergences. Evolution 2013; 67:1823-30. [PMID: 23730773 DOI: 10.1111/evo.12047] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 06/15/2012] [Accepted: 12/31/2012] [Indexed: 11/27/2022]
Abstract
Stochastic population processes may cause differences between species histories and gene histories. These processes are assumed to only influence the most recent divergences in the tree of life; however, there may be underappreciated potential for microevolutionary processes to impact deep divergences. I used multispecies coalescent models to determine the impact of stochastic processes on deep phylogenomic histories. Here I show phylogenomic discordance between gene histories and species histories is expected at deep divergences for many eukaryotic taxa, and the probability of discordance increases with population size, generation time, and the number of species in the tree. Five eukaryotic clades (angiosperms, birds, harpaline beetles, mammals, and nymphalid butterflies) demonstrate significant discordance potential at divergences over 50 million years old, and this discordance potential is independent of the age of divergence. These findings demonstrate population processes acting over very short timescales will leave a lasting impact on genomic histories, even for divergence events occurring tens to hundreds of millions of years ago.
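The dependence on population size and generation time described above can be illustrated with the standard three-taxon result from coalescent theory: the probability that a gene tree topology disagrees with the species tree is (2/3)·exp(-T), where T is the internal branch length in coalescent units. The sketch below is not taken from the paper. The function name, its parameters, and the example values are assumptions chosen for illustration, and it assumes a diploid population of constant size on the internal branch.

```python
import math


def discordance_probability(branch_years, effective_pop_size, generation_years):
    """Probability that a gene tree topology differs from a rooted
    three-taxon species tree under the multispecies coalescent.

    Assumes a constant diploid effective population size on the
    internal branch (an illustrative simplification).
    """
    generations = branch_years / generation_years
    # One coalescent unit is 2 * Ne generations for diploids.
    coalescent_units = generations / (2.0 * effective_pop_size)
    return (2.0 / 3.0) * math.exp(-coalescent_units)


# Hypothetical values: a 2-million-year internal branch.
small = discordance_probability(2e6, effective_pop_size=1e4, generation_years=1.0)
large = discordance_probability(2e6, effective_pop_size=1e6, generation_years=1.0)
print(f"Ne=1e4: {small:.3g}, Ne=1e6: {large:.3g}")
```

Only the length of the internal branch enters the formula, not how long ago the split occurred, which is consistent with the abstract's statement that discordance potential is independent of the age of divergence.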
Affiliation(s)
- Jeffrey C Oliver
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06520, USA.
157
Zheng Y, Zhang L. Effect of Incomplete Lineage Sorting on Tree-Reconciliation-Based Inference of Gene Duplication. ACTA ACUST UNITED AC 2013. [DOI: 10.1007/978-3-642-38036-5_26] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/18/2023]
158
Sen D, Brown CJ, Top EM, Sullivan J. Inferring the evolutionary history of IncP-1 plasmids despite incongruence among backbone gene trees. Mol Biol Evol 2013; 30:154-66. [PMID: 22936717 PMCID: PMC3525142 DOI: 10.1093/molbev/mss210] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 12/12/2022]
Abstract
Plasmids of the incompatibility group IncP-1 can transfer and replicate in many genera of the Proteobacteria. They are composed of backbone genes that encode a variety of essential functions and accessory genes that have implications for human health and environmental remediation. Although it is well understood that the accessory genes are transferred horizontally between plasmids, recent studies have also provided examples of recombination in the backbone genes of IncP-1 plasmids. As a consequence, phylogeny estimation based on backbone genes is expected to produce conflicting gene tree topologies. The main goal of this study was therefore to infer the evolutionary history of IncP-1 plasmids in the presence of both vertical and horizontal gene transfer. This was achieved by quantifying the incongruence among gene trees and attributing it to known causes such as 1) phylogenetic uncertainty, 2) coalescent stochasticity, and 3) horizontal inheritance. Topologies of gene trees exhibited more incongruence than could be attributed to phylogenetic uncertainty alone. Species-tree estimation using a Bayesian framework that takes coalescent stochasticity into account was well supported, but it differed slightly from the maximum-likelihood tree estimated by concatenation of backbone genes. After removal of the gene that demonstrated a signal of intergroup recombination, the concatenated tree was congruent with the species-tree estimate, which itself was robust to inclusion/exclusion of the recombinant gene. Thus, in spite of horizontal gene exchange both within and among IncP-1 subgroups, the backbone genome of these IncP-1 plasmids retains a detectable vertical evolutionary history.
Affiliation(s)
- Diya Sen
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho
- Bioinformatics and Computational Biology Graduate Program, University of Idaho
- Celeste J. Brown
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho
- Bioinformatics and Computational Biology Graduate Program, University of Idaho
- Department of Biological Sciences, University of Idaho
- Eva M. Top
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho
- Bioinformatics and Computational Biology Graduate Program, University of Idaho
- Department of Biological Sciences, University of Idaho
- Jack Sullivan
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho
- Bioinformatics and Computational Biology Graduate Program, University of Idaho
- Department of Biological Sciences, University of Idaho
159
Park HJ, Nakhleh L. Inference of reticulate evolutionary histories by maximum likelihood: the performance of information criteria. BMC Bioinformatics 2012; 13 Suppl 19:S12. [PMID: 23281614 PMCID: PMC3526433 DOI: 10.1186/1471-2105-13-s19-s12] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/19/2023]
Abstract
BACKGROUND Maximum likelihood has been widely used for over three decades to infer phylogenetic trees from molecular data. When reticulate evolutionary events occur, several genomic regions may have conflicting evolutionary histories, and a phylogenetic network may provide a more adequate model for representing the evolutionary history of the genomes or species. A maximum likelihood (ML) model has been proposed for this case and accounts for both mutation within a genomic region and reticulation across the regions. However, the performance of this model in terms of inferring information about reticulate evolution and properties that affect this performance have not been studied. RESULTS In this paper, we study the effect of the evolutionary diameter and height of a reticulation event on its identifiability under ML. We find both of them, particularly the diameter, have a significant effect. Further, we find that the number of genes (which can be generalized to the concept of "non-recombining genomic regions") that are transferred across a reticulation edge affects its detectability. Last but not least, a fundamental challenge with phylogenetic networks is that they allow an arbitrary level of complexity, giving rise to the model selection problem. We investigate the performance of two information criteria, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), for addressing this problem. We find that BIC performs well in general for controlling the model complexity and preventing ML from grossly overestimating the number of reticulation events. CONCLUSION Our results demonstrate that BIC provides a good framework for inferring reticulate evolutionary histories. Nevertheless, the results call for caution when interpreting the accuracy of the inference particularly for data sets with particular evolutionary features.
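The model selection step can be sketched with the textbook definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters, n the number of observations, and L the maximized likelihood. The candidate models, log-likelihoods, parameter counts, and number of loci below are hypothetical values invented for illustration. They are not results from the paper, and how parameters are counted for a phylogenetic network is left as an assumption.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2.0 * num_params - 2.0 * log_likelihood


def bic(log_likelihood, num_params, num_observations):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return num_params * math.log(num_observations) - 2.0 * log_likelihood


# Hypothetical candidates: (label, maximized log-likelihood, free parameters).
candidates = [
    ("tree, 0 reticulations", -1250.0, 5),
    ("network, 1 reticulation", -1244.0, 8),
    ("network, 2 reticulations", -1242.5, 11),
]
num_loci = 100  # hypothetical number of loci treated as observations

# Lower scores are better for both criteria.
best_by_aic = min(candidates, key=lambda c: aic(c[1], c[2]))
best_by_bic = min(candidates, key=lambda c: bic(c[1], c[2], num_loci))
print("AIC selects:", best_by_aic[0])
print("BIC selects:", best_by_bic[0])
```

Because the BIC penalty per parameter is ln(n), it exceeds the AIC penalty of 2 whenever n is larger than about 7.4, so BIC tends to favor networks with fewer reticulation events on data sets of realistic size.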
Affiliation(s)
- Hyun Jung Park
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA
160
Seetharam A, Stuart GW. Whole genome phylogenies for multiple Drosophila species. BMC Res Notes 2012; 5:670. [PMID: 23210901 PMCID: PMC3531268 DOI: 10.1186/1756-0500-5-670] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/09/2012] [Accepted: 11/27/2012] [Indexed: 11/23/2022]
Abstract
Background Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees. Results An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed. Conclusions These results indicate that using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between Drosophila species. 
Furthermore, protein filtering can be effectively applied to reduce incongruence in the dataset as well as to generate alternative phylogenies.
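Two steps of the alignment-free pipeline described above, building peptide frequency vectors and measuring the angle between summed species vectors, can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it deliberately omits the SVD dimension-reduction step, the peptide length `k=3` is an assumption, and the toy sequences and function names are hypothetical.

```python
import math
from collections import Counter


def peptide_counts(protein, k=3):
    """Count overlapping peptides of length k in one protein sequence."""
    return Counter(protein[i:i + k] for i in range(len(protein) - k + 1))


def species_vector(proteins, k=3):
    """Sum per-protein peptide counts into one sparse vector for a species."""
    total = Counter()
    for protein in proteins:
        total.update(peptide_counts(protein, k))
    return total


def angular_distance(u, v):
    """Angle in radians between two sparse count vectors."""
    dot = sum(count * v[peptide] for peptide, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Hypothetical toy "proteomes" (real inputs would be whole predicted proteomes).
species_a = species_vector(["MKTAYIAKQR", "MKTLLVAG"])
species_b = species_vector(["MKTAYIAKQK", "MKTLLVAG"])
species_c = species_vector(["GGSWPLHHNE", "PLHHNEQW"])

print(angular_distance(species_a, species_b))
print(angular_distance(species_a, species_c))
```

A matrix of such pairwise angles could then be passed to any distance-based tree-building method. In the method summarized in the abstract, the vectors are first projected into a reduced-dimensional space by SVD, which this sketch does not do.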
Affiliation(s)
- Arun Seetharam
- Department of Biology, Indiana State University, Terre Haute, Indiana 47809, USA
161
Abstract
Empirical codon models (ECMs) estimated from a large number of globular protein families outperformed mechanistic codon models in their description of the general process of protein evolution. Among other factors, ECMs implicitly model the influence of amino acid properties and multiple nucleotide substitutions (MNS). However, the estimation of ECMs requires large quantities of data, and until recently, only few suitable data sets were available. Here, we take advantage of several new Drosophila species genomes to estimate codon models from genome-wide data. The availability of large numbers of genomes over varying phylogenetic depths in the Drosophila genus allows us to explore various divergence levels. In consequence, we can use these data to determine the appropriate level of divergence for the estimation of ECMs, avoiding overestimation of MNS rates caused by saturation. To account for variation in evolutionary rates along the genome, we develop new empirical codon hidden Markov models (ecHMMs). These models significantly outperform previous ones with respect to maximum likelihood values, suggesting that they provide a better fit to the evolutionary process. Using ECMs and ecHMMs derived from genome-wide data sets, we devise new likelihood ratio tests (LRTs) of positive selection. We found classical LRTs very sensitive to the presence of MNSs, showing high false-positive rates, especially with small phylogenies. The new LRTs are more conservative than the classical ones, having acceptable false-positive rates and reduced power.
Affiliation(s)
- Nicola De Maio
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria.
162
Dávalos LM, Cirranello AL, Geisler JH, Simmons NB. Understanding phylogenetic incongruence: lessons from phyllostomid bats. Biol Rev Camb Philos Soc 2012; 87:991-1024. [PMID: 22891620 PMCID: PMC3573643 DOI: 10.1111/j.1469-185x.2012.00240.x] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Received: 06/22/2011] [Revised: 07/04/2012] [Accepted: 07/18/2012] [Indexed: 12/25/2022]
Abstract
All characters and trait systems in an organism share a common evolutionary history that can be estimated using phylogenetic methods. However, differential rates of change and the evolutionary mechanisms driving those rates result in pervasive phylogenetic conflict. These drivers need to be uncovered because mismatches between evolutionary processes and phylogenetic models can lead to high confidence in incorrect hypotheses. Incongruence between phylogenies derived from morphological versus molecular analyses, and between trees based on different subsets of molecular sequences has become pervasive as datasets have expanded rapidly in both characters and species. For more than a decade, evolutionary relationships among members of the New World bat family Phyllostomidae inferred from morphological and molecular data have been in conflict. Here, we develop and apply methods to minimize systematic biases, uncover the biological mechanisms underlying phylogenetic conflict, and outline data requirements for future phylogenomic and morphological data collection. We introduce new morphological data for phyllostomids and outgroups and expand previous molecular analyses to eliminate methodological sources of phylogenetic conflict such as taxonomic sampling, sparse character sampling, or use of different algorithms to estimate the phylogeny. We also evaluate the impact of biological sources of conflict: saturation in morphological changes and molecular substitutions, and other processes that result in incongruent trees, including convergent morphological and molecular evolution. Methodological sources of incongruence play some role in generating phylogenetic conflict, and are relatively easy to eliminate by matching taxa, collecting more characters, and applying the same algorithms to optimize phylogeny. 
The evolutionary patterns uncovered are consistent with multiple biological sources of conflict, including saturation in morphological and molecular changes, adaptive morphological convergence among nectar-feeding lineages, and incongruent gene trees. Applying methods to account for nucleotide sequence saturation reduces, but does not completely eliminate, phylogenetic conflict. We ruled out paralogy, lateral gene transfer, and poor taxon sampling and outgroup choices among the processes leading to incongruent gene trees in phyllostomid bats. Uncovering and countering the possible effects of introgression and lineage sorting of ancestral polymorphism on gene trees will require great leaps in genomic and allelic sequencing in this species-rich mammalian family. We also found evidence for adaptive molecular evolution leading to convergence in mitochondrial proteins among nectar-feeding lineages. In conclusion, the biological processes that generate phylogenetic conflict are ubiquitous, and overcoming incongruence requires better models and more data than have been collected even in well-studied organisms such as phyllostomid bats.
Affiliation(s)
- Liliana M Dávalos
- Department of Ecology and Evolution, and Consortium for Inter-Disciplinary Environmental Research, State University of New York at Stony Brook, Stony Brook, NY 11794, USA
- Andrea L Cirranello
- Division of Vertebrate Zoology (Mammalogy), American Museum of Natural History, New York, NY 10024, USA
- Department of Anatomical Sciences, State University of New York at Stony Brook, Stony Brook, NY 11794, USA
- Jonathan H Geisler
- Department of Anatomy, New York College of Osteopathic Medicine, Old Westbury, NY 11568, USA
- Nancy B Simmons
- Division of Vertebrate Zoology (Mammalogy), American Museum of Natural History, New York, NY 10024, USA
163
Obbard DJ, Maclennan J, Kim KW, Rambaut A, O'Grady PM, Jiggins FM. Estimating divergence dates and substitution rates in the Drosophila phylogeny. Mol Biol Evol 2012; 29:3459-73. [PMID: 22683811 PMCID: PMC3472498 DOI: 10.1093/molbev/mss150] [Citation(s) in RCA: 157] [Impact Index Per Article: 13.1] [Indexed: 01/04/2023]
Abstract
An absolute timescale for evolution is essential if we are to associate evolutionary phenomena, such as adaptation or speciation, with potential causes, such as geological activity or climatic change. Timescales in most phylogenetic studies use geologically dated fossils or phylogeographic events as calibration points, but more recently, it has also become possible to use experimentally derived estimates of the mutation rate as a proxy for substitution rates. The large radiation of drosophilid taxa endemic to the Hawaiian islands has provided multiple calibration points for the Drosophila phylogeny, thanks to the "conveyor belt" process by which this archipelago forms and is colonized by species. However, published date estimates for key nodes in the Drosophila phylogeny vary widely, and many are based on simplistic models of colonization and coalescence or on estimates of island age that are not current. In this study, we use new sequence data from seven species of Hawaiian Drosophila to examine a range of explicit coalescent models and estimate substitution rates. We use these rates, along with a published experimentally determined mutation rate, to date key events in drosophilid evolution. Surprisingly, our estimate for the date for the most recent common ancestor of the genus Drosophila based on mutation rate (25-40 Ma) is closer to being compatible with independent fossil-derived dates (20-50 Ma) than are most of the Hawaiian-calibration models and also has smaller uncertainty. We find that Hawaiian-calibrated dates are extremely sensitive to model choice and give rise to point estimates that range between 26 and 192 Ma, depending on the details of the model. Potential problems with the Hawaiian calibration may arise from systematic variation in the molecular clock due to the long generation time of Hawaiian Drosophila compared with other Drosophila and/or uncertainty in linking island formation dates with colonization dates. 
As either source of error will bias estimates of divergence time, we suggest mutation rate estimates be used until better models are available.
Affiliation(s)
- Darren J Obbard
- Institute of Evolutionary Biology, and Centre for Infection Immunity and Evolution, University of Edinburgh, Edinburgh, United Kingdom.
164
Random roots and lineage sorting. Mol Phylogenet Evol 2012; 64:12-20. [DOI: 10.1016/j.ympev.2012.02.029] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 08/02/2011] [Revised: 02/11/2012] [Accepted: 02/27/2012] [Indexed: 11/16/2022]
165
Abstract
Despite advances in genetic mapping of quantitative traits and in phylogenetic comparative approaches, these two perspectives are rarely combined. The joint consideration of multiple crosses among related taxa (whether species or strains) not only allows more precise mapping of the genetic loci (called quantitative trait loci, QTL) that contribute to important quantitative traits, but also offers the opportunity to identify the origin of a QTL allele on the phylogenetic tree that relates the taxa. We describe a formal method for combining multiple crosses to infer the location of a QTL on a tree. We further discuss experimental design issues for such endeavors, such as how many crosses are required and which sets of crosses are best. Finally, we explore the method's performance in computer simulations, and we illustrate its use through application to a set of four mouse intercrosses among five inbred strains, with data on HDL cholesterol.
166
Lin HT, Burleigh JG, Eulenstein O. Consensus properties for the deep coalescence problem and their application for scalable tree search. BMC Bioinformatics 2012; 13 Suppl 10:S12. [PMID: 22759417 PMCID: PMC3382448 DOI: 10.1186/1471-2105-13-s10-s12] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
Background To infer a species phylogeny from unlinked genes, phylogenetic inference methods must confront the biological processes that create incongruence between gene trees and the species phylogeny. Intra-specific gene variation in ancestral species can result in deep coalescence, also known as incomplete lineage sorting, which creates incongruence between gene trees and the species tree. One approach to account for deep coalescence in phylogenetic analyses is the deep coalescence problem, which takes a collection of gene trees and seeks the species tree that implies the fewest deep coalescence events. Although this approach is promising for phylogenetics, the consensus properties of this problem are mostly unknown and analyses of large data sets may be computationally prohibitive. Results We prove that the deep coalescence consensus tree problem satisfies the highly desirable Pareto property for clusters (clades). That is, in all instances, each cluster that is present in all of the input gene trees, called a consensus cluster, will also be found in every optimal solution. Moreover, we introduce a new divide and conquer method for the deep coalescence problem based on the Pareto property. This method refines the strict consensus of the input gene trees, thereby, in practice, often greatly reducing the complexity of the tree search and guaranteeing that the estimated species tree will satisfy the Pareto property. Conclusions Analyses of both simulated and empirical data sets demonstrate that the divide and conquer method can greatly improve upon the speed of heuristics that do not consider the Pareto consensus property, while also guaranteeing that the proposed solution fulfills the Pareto property. The divide and conquer method extends the utility of the deep coalescence problem to data sets with enormous numbers of taxa.
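The consensus clusters that the Pareto property refers to, meaning the clades present in every input gene tree, can be computed directly from the trees. The sketch below shows only that step and is an illustration rather than the authors' divide and conquer method. The nested-tuple tree encoding, the function names, and the three example gene trees are assumptions.

```python
def clusters(tree):
    """Return the set of non-trivial clusters (clades) of a rooted tree.

    A tree is a nested tuple whose leaves are taxon-name strings.
    """
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        leaf_set = frozenset().union(*(leaves_below(child) for child in node))
        found.add(leaf_set)
        return leaf_set

    all_taxa = leaves_below(tree)
    found.discard(all_taxa)  # the root cluster is shared trivially
    return found


def consensus_clusters(gene_trees):
    """Clusters that appear in every input gene tree."""
    cluster_sets = [clusters(tree) for tree in gene_trees]
    return set.intersection(*cluster_sets)


# Hypothetical gene trees on taxa A-E that disagree about A, B, and C.
gene_trees = [
    ((("A", "B"), "C"), ("D", "E")),
    (("A", ("B", "C")), ("D", "E")),
    ((("B", "A"), "C"), ("E", "D")),
]

for cluster in sorted(consensus_clusters(gene_trees), key=sorted):
    print(sorted(cluster))
```

By the Pareto property stated in the abstract, every optimal species tree under the deep coalescence criterion contains these clusters, so a search only needs to resolve the relationships that the strict consensus leaves open.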
Affiliation(s)
- Harris T Lin
- Department of Computer Science, Iowa State University, Ames, IA, USA
167
Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE. Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS One 2012; 7:e37135. [PMID: 22675423 PMCID: PMC3365034 DOI: 10.1371/journal.pone.0037135] [Citation(s) in RCA: 1954] [Impact Index Per Article: 162.8] [Received: 02/01/2012] [Accepted: 04/13/2012] [Indexed: 12/14/2022]
Abstract
The ability to efficiently and accurately determine genotypes is a keystone technology in modern genetics, crucial to studies ranging from clinical diagnostics, to genotype-phenotype association, to reconstruction of ancestry and the detection of selection. To date, high capacity, low cost genotyping has been largely achieved via “SNP chip” microarray-based platforms which require substantial prior knowledge of both genome sequence and variability, and once designed are suitable only for those targeted variable nucleotide sites. This method introduces substantial ascertainment bias and inherently precludes detection of rare or population-specific variants, a major source of information for both population history and genotype-phenotype association. Recent developments in reduced-representation genome sequencing experiments on massively parallel sequencers (commonly referred to as RAD-tag or RADseq) have brought direct sequencing to the problem of population genotyping, but increased cost and procedural and analytical complexity have limited their widespread adoption. Here, we describe a complete laboratory protocol, including a custom combinatorial indexing method, and accompanying software tools to facilitate genotyping across large numbers (hundreds or more) of individuals for a range of markers (hundreds to hundreds of thousands). Our method requires no prior genomic knowledge and achieves per-site and per-individual costs below that of current SNP chip technology, while requiring similar hands-on time investment, comparable amounts of input DNA, and downstream analysis times on the order of hours. Finally, we provide empirical results from the application of this method to both genotyping in a laboratory cross and in wild populations. Because of its flexibility, this modified RADseq approach promises to be applicable to a diversity of biological questions in a wide range of organisms.
Affiliation(s)
- Brant K Peterson
- Department of Organismic & Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts, United States of America.
168
Yu Y, Degnan JH, Nakhleh L. The probability of a gene tree topology within a phylogenetic network with applications to hybridization detection. PLoS Genet 2012; 8:e1002660. [PMID: 22536161 PMCID: PMC3330115 DOI: 10.1371/journal.pgen.1002660] [Citation(s) in RCA: 136] [Impact Index Per Article: 11.3] [Received: 10/10/2011] [Accepted: 03/05/2012] [Indexed: 11/29/2022]
Abstract
Gene tree topologies have proven a powerful data source for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist for very limited cases. However, no such methods exist for general cases, owing primarily to the fact that it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data is consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies at obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa. Species trees depict how species split and diverge. 
Within the branches of a species tree, gene trees, which depict the evolutionary histories of different genomic regions in the species, grow. Evolutionary analyses of the genomes of closely related organisms have highlighted the phenomenon that gene trees may disagree with each other as well as with the species tree that contains them due to deep coalescence. Furthermore, for several groups of organisms, hybridization plays an important role in their evolution and diversification. This evolutionary event also results in gene tree incongruence and gives rise to a species phylogeny that is a network. Thus, inferring the evolutionary histories of groups of organisms where hybridization is known, or suspected, to play an evolutionary role requires dealing simultaneously with hybridization and other sources of gene tree incongruence. Currently, no methods exist for doing this with general scenarios of hybridization. In this paper, we propose the first method for this task and demonstrate its performance. We revisit the analysis of a set of yeast species and another of Drosophila species, and show that evolutionary histories involving hybridization have higher support than the strictly diverging evolutionary histories estimated when not incorporating hybridization in the analysis.
Affiliation(s)
- Yun Yu
- Department of Computer Science, Rice University, Houston, Texas, United States of America
- James H. Degnan
- Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- National Institute of Mathematical and Biological Synthesis, Knoxville, Tennessee, United States of America
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas, United States of America
|
169
|
Rubin BER, Ree RH, Moreau CS. Inferring phylogenies from RAD sequence data. PLoS One 2012; 7:e33394. [PMID: 22493668 PMCID: PMC3320897 DOI: 10.1371/journal.pone.0033394] [Citation(s) in RCA: 208] [Impact Index Per Article: 17.3] [Received: 09/13/2011] [Accepted: 02/14/2012] [Indexed: 11/24/2022] Open
Abstract
Reduced-representation genome sequencing represents a new source of data for systematics, and its potential utility in interspecific phylogeny reconstruction has not yet been explored. One approach that seems especially promising is the use of inexpensive short-read technologies (e.g., Illumina, SOLiD) to sequence restriction-site associated DNA (RAD), that is, the regions of the genome that flank the recognition sites of restriction enzymes. In this study, we simulated the collection of RAD sequences from sequenced genomes of different taxa (Drosophila, mammals, and yeasts) and developed a proof-of-concept workflow to test whether informative data could be extracted and used to accurately reconstruct "known" phylogenies of species within each group. The workflow consists of three basic steps: first, sequences are clustered by similarity to estimate orthology; second, clusters are filtered by taxonomic coverage; and third, they are aligned and concatenated for "total evidence" phylogenetic analysis. We evaluated the performance of clustering and filtering parameters by comparing the resulting topologies with well-supported reference trees and we were able to identify conditions under which the reference tree was inferred with high support. For Drosophila, whole genome alignments allowed us to directly evaluate which parameters most consistently recovered orthologous sequences. For the parameter ranges explored, we recovered the best results at the low ends of sequence similarity and taxonomic representation of loci; these generated the largest supermatrices with the highest proportion of missing data. Applications of the method to mammals and yeasts were less successful, which we suggest may be due partly to their much deeper evolutionary divergence times compared to Drosophila (crown ages of approximately 100 and 300 versus 60 Mya, respectively). RAD sequences thus appear to hold promise for reconstructing phylogenetic relationships in younger clades in which sufficient numbers of orthologous restriction sites are retained across species.
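The second and third workflow steps described in this abstract (filtering clusters by taxonomic coverage, then concatenating them into a supermatrix with missing data) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy clusters, and the `?` missing-data symbol are assumptions made for illustration, and the clustering and alignment steps are taken as already done.

```python
# Illustrative sketch only; names and data are hypothetical, not from the paper.
# Each cluster maps taxon name -> aligned sequence (all sequences in a cluster
# are assumed to have the same length after alignment).

def filter_by_coverage(clusters, min_taxa):
    """Keep clusters that contain sequences from at least `min_taxa` taxa."""
    return [c for c in clusters if len(c) >= min_taxa]

def concatenate(clusters, taxa, missing="?"):
    """Concatenate clusters into a supermatrix, padding absent taxa."""
    parts = {t: [] for t in taxa}
    for cluster in clusters:
        length = len(next(iter(cluster.values())))
        for t in taxa:
            parts[t].append(cluster.get(t, missing * length))
    return {t: "".join(p) for t, p in parts.items()}

taxa = ["A", "B", "C"]
clusters = [
    {"A": "ACGT", "B": "ACGA", "C": "ACGG"},  # all three taxa present
    {"A": "TT", "B": "TA"},                   # taxon C missing
    {"C": "GG"},                              # only one taxon: filtered out
]

kept = filter_by_coverage(clusters, min_taxa=2)
supermatrix = concatenate(kept, taxa)
missing_fraction = sum(s.count("?") for s in supermatrix.values()) / sum(
    len(s) for s in supermatrix.values()
)
```

Lowering `min_taxa` keeps more loci but raises `missing_fraction`, which is the trade-off the abstract refers to when it reports the largest supermatrices with the highest proportion of missing data.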
Affiliation(s)
- Benjamin E R Rubin
- Committee on Evolutionary Biology, University of Chicago, Chicago, Illinois, United States of America.
|
170
|
Twyford AD, Ennos RA. Next-generation hybridization and introgression. Heredity (Edinb) 2012; 108:179-89. [PMID: 21897439 PMCID: PMC3282392 DOI: 10.1038/hdy.2011.68] [Citation(s) in RCA: 215] [Impact Index Per Article: 17.9] [Received: 02/26/2011] [Revised: 06/17/2011] [Accepted: 06/27/2011] [Indexed: 12/21/2022] Open
Abstract
Hybridization has a major role in evolution, from the introgression of important phenotypic traits between species to the creation of new species through hybrid speciation. Molecular studies of hybridization aim to understand the class of hybrids and the frequency of introgression, detect the signature of ancient hybridization, and understand the behaviour of introgressed loci in their new genomic background. This often involves a large investment in the design and application of molecular markers, leading to a compromise between the depth and breadth of genomic data. New techniques designed to assay a large sub-section of the genome, in association with next-generation sequencing (NGS) technologies, will allow genome-wide hybridization and introgression studies in organisms with no prior sequence data. These detailed genotypic data will unite the breadth of sampling of loci characteristic of population genetics with the depth of sequence information associated with molecular phylogenetics. In this review, we assess the theoretical and methodological constraints that limit our understanding of natural hybridization, and promote the use of NGS for detecting hybridization and introgression between non-model organisms. We also make recommendations for the ways in which emerging techniques, such as pooled barcoded amplicon sequencing and restriction site-associated DNA tags, should be used to overcome current limitations and enhance our understanding of this evolutionarily significant process.
|
171
|
Habel JC, Husemann M, Schmitt T, Zachos FE, Honnen AC, Petersen B, Parmakelis A, Stathi I. Microallopatry caused strong diversification in Buthus scorpions (Scorpiones: Buthidae) in the Atlas Mountains (NW Africa). PLoS One 2012; 7:e29403. [PMID: 22383951 PMCID: PMC3287997 DOI: 10.1371/journal.pone.0029403] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 09/08/2011] [Accepted: 11/28/2011] [Indexed: 11/26/2022] Open
Abstract
The immense biodiversity of the Atlas Mountains in North Africa might be the result of high rates of microallopatry caused by mountain barriers surpassing 4000 meters, leading to patchy habitat distributions. We test the influence of geographic structures on the phylogenetic patterns among Buthus scorpions using mtDNA sequences. We sampled 91 individuals of the genus Buthus from 51 locations scattered around the Atlas Mountains (Antiatlas, High Atlas, Middle Atlas and Jebel Sahro). We sequenced 452 bp of the Cytochrome Oxidase I gene, which proved to be highly variable within and among Buthus species. Our phylogenetic analysis yielded 12 distinct genetic groups, one of which comprised three subgroups, mostly in accordance with the orographic structure of the mountain systems. Main clades overlap with each other, while subclades are distributed parapatrically. Geographic structures likely acted as long-term barriers among populations, restricting gene flow and allowing for strong genetic differentiation. Thus, genetic structure and geographical distribution of genetic (sub)clusters follow the classical theory of allopatric differentiation, where distinct groups evolve without range overlap until reproductive isolation and ecological differentiation have built up. Philopatry and low dispersal ability of Buthus scorpions are the likely causes of the observed strong genetic differentiation at this small geographic scale.
Affiliation(s)
- Jan C Habel
- Natural History Museum Luxembourg, Invertebrate Biology, Luxembourg.
|
172
|
Piffaretti J, Vanlerberghe-Masutti F, Tayeh A, Clamens AL, D’Acier AC, Jousselin E. Molecular phylogeny reveals the existence of two sibling species in the aphid pest Brachycaudus helichrysi (Hemiptera: Aphididae). Zool Scr 2012. [DOI: 10.1111/j.1463-6409.2012.00531.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 12/01/2022]
|
173
|
Rasmussen MD, Kellis M. Unified modeling of gene duplication, loss, and coalescence using a locus tree. Genome Res 2012; 22:755-65. [PMID: 22271778 DOI: 10.1101/gr.123901.111] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.1] [Indexed: 11/24/2022]
Abstract
Gene phylogenies provide a rich source of information about the way evolution shapes genomes, populations, and phenotypes. In addition to substitutions, evolutionary events such as gene duplication and loss (as well as horizontal transfer) play a major role in gene evolution, and many phylogenetic models have been developed in order to reconstruct and study these events. However, these models typically make the simplifying assumption that population-related effects such as incomplete lineage sorting (ILS) are negligible. While this assumption may have been reasonable in some settings, it has become increasingly problematic as increased genome sequencing has led to denser phylogenies, where effects such as ILS are more prominent. To address this challenge, we present a new probabilistic model, DLCoal, that defines gene duplication and loss in a population setting, such that coalescence and ILS can be directly addressed. Interestingly, this model implies that in addition to the usual gene tree and species tree, there exists a third tree, the locus tree, which will likely have many applications. Using this model, we develop the first general reconciliation method that accurately infers gene duplications and losses in the presence of ILS, and we show its improved inference of orthologs, paralogs, duplications, and losses for a variety of clades, including flies, fungi, and primates. Also, our simulations show that gene duplications increase the frequency of ILS, further illustrating the importance of a joint model. Going forward, we believe that this unified model can offer insights to questions in both phylogenetics and population genetics.
Affiliation(s)
- Matthew D Rasmussen
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
|
174
|
Lavagnino N, Serra F, Arbiza L, Dopazo H, Hasson E. Evolutionary Genomics of Genes Involved in Olfactory Behavior in the Drosophila melanogaster Species Group. Evol Bioinform Online 2012; 8:89-104. [PMID: 22346339 PMCID: PMC3273929 DOI: 10.4137/ebo.s8484] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 11/05/2022] Open
Abstract
Previous comparative genomic studies of genes involved in olfactory behavior in Drosophila focused only on particular gene families such as odorant receptor and/or odorant binding proteins. However, olfactory behavior has a complex genetic architecture that is orchestrated by many interacting genes. In this paper, we present a comparative genomic study of olfactory behavior in Drosophila including an extended set of genes known to affect olfactory behavior. We took advantage of the recent burst of whole genome sequences and the development of powerful statistical tools to analyze genomic data and test evolutionary and functional hypotheses of olfactory genes in the six species of the Drosophila melanogaster species group for which whole genome sequences are available. Our study reveals widespread purifying selection and limited incidence of positive selection on olfactory genes. We show that the pace of evolution of olfactory genes is mostly independent of the life cycle stage, and of the number of life cycle stages, in which they participate in olfaction. However, we detected a relationship between evolutionary rates and the position that the gene products occupy in the olfactory system: genes occupying central positions tend to be more constrained than peripheral genes. Finally, we demonstrate that specialization to one host does not seem to be associated with bursts of adaptive evolution in olfactory genes in D. sechellia and D. erecta, the two specialist species analyzed; rather, different lineages have idiosyncratic evolutionary histories in which both historical and ecological factors have been involved.
Affiliation(s)
- Nicolás Lavagnino
- Departamento de Ecología, Genética y Evolución; Facultad de Ciencias Exactas y Naturales; Universidad de Buenos Aires; Buenos Aires; Argentina
|
175
|
Zhou X, Xu S, Xu J, Chen B, Zhou K, Yang G. Phylogenomic analysis resolves the interordinal relationships and rapid diversification of the laurasiatherian mammals. Syst Biol 2012; 61:150-64. [PMID: 21900649 PMCID: PMC3243735 DOI: 10.1093/sysbio/syr089] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.9] [Received: 10/28/2010] [Revised: 04/06/2011] [Accepted: 06/23/2011] [Indexed: 11/24/2022] Open
Abstract
Although great progress has been made in resolving the relationships of placental mammals, the position of several clades in Laurasiatheria remains controversial. In this study, we performed a phylogenetic analysis of 97 orthologs (46,152 bp) for 15 taxa, representing all laurasiatherian orders. Additionally, phylogenetic trees of laurasiatherian mammals with draft genome sequences were reconstructed based on 1608 exons (2,175,102 bp). Our reconstructions resolve the interordinal relationships within Laurasiatheria and corroborate the clades Scrotifera, Fereuungulata, and Cetartiodactyla. Furthermore, we tested alternative topologies within Laurasiatheria, and among alternatives for the phylogenetic position of Perissodactyla, a sister-group relationship with Cetartiodactyla receives the highest support. Thus, Pegasoferae (Perissodactyla + Carnivora + Pholidota + Chiroptera) does not appear to be a natural group. Divergence time estimates from these genes were compared with published estimates for splits within Laurasiatheria. Our estimates were similar to those of several studies and suggest that the divergences among these orders occurred within just a few million years.
Affiliation(s)
- Xuming Zhou
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210046, China
- Shixia Xu
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210046, China
- Junxiao Xu
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210046, China
- Bingyao Chen
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210046, China
- Kaiya Zhou
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210046, China
- Guang Yang
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210046, China
|
176
|
Skoracka A, Kuczyński L, Santos de Mendonça R, Dabert M, Szydło W, Knihinicki D, Truol G, Navia D. Cryptic species within the wheat curl mite Aceria tosichella (Keifer) (Acari: Eriophyoidea), revealed by mitochondrial, nuclear and morphometric data. Invertebr Syst 2012. [DOI: 10.1071/is11037] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Indexed: 11/23/2022]
Abstract
The wheat curl mite (WCM), Aceria tosichella (Keifer, 1969), is one of the primary pests of wheat and other cereals throughout the world. Traditional taxonomy recognises WCM as a single eriophyoid species; however, a recent study suggested that two genetic lineages of WCM in Australia might represent putative species. Here, we investigate WCM populations from different host plants in Australia, South America and Europe and test the hypothesis that WCM is, in fact, a complex of cryptic species. We used morphological data in combination with nucleotide sequences of the mitochondrial cytochrome c oxidase subunit I (COI) and nuclear D2 region of 28S rDNA and internal transcribed spacer region (ITS1, ITS2) sequences. The molecular analyses did not support the monophyly of A. tosichella because the outgroup A. tulipae (Keifer, 1938) is grouped within WCM. The molecular datasets indicated the existence of distinct lineages within WCM, with the distances between lineages corresponding to interspecific divergence. Morphological analyses failed to clearly separate WCM populations and lineages, but completely separated A. tulipae from A. tosichella. The results suggest that what has been recognised historically as a single species is, in fact, a complex of several genetically isolated evolutionary lineages that demonstrate potential as cryptic species. Hence, their discrimination using solely morphological criteria may be misleading. These findings are particularly significant because of the economic importance of WCM as a direct pest and vector of plant viruses.
|
177
|
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res 2011; 22:746-54. [PMID: 22207614 DOI: 10.1101/gr.125864.111] [Citation(s) in RCA: 260] [Impact Index Per Article: 20.0] [Indexed: 01/30/2023]
Abstract
Phylogenomics offers the potential to fully resolve the Tree of Life, but increasing genomic coverage also reveals conflicting evolutionary histories among genes, demanding new analytical strategies for elucidating a single history of life. Here, we outline a phylogenomic approach using a novel class of phylogenetic markers derived from ultraconserved elements and flanking DNA. Using species-tree analysis that accounts for discord among hundreds of independent loci, we show that this class of marker is useful for recovering deep-level phylogeny in placental mammals. In broad outline, our phylogeny agrees with recent phylogenomic studies of mammals, including several formerly controversial relationships. Our results also inform two outstanding questions in placental mammal phylogeny involving rapid speciation, where species-tree methods are particularly needed. Contrary to most phylogenomic studies, our study supports a first-diverging placental mammal lineage that includes elephants and tenrecs (Afrotheria). The level of conflict among gene histories is consistent with this basal divergence occurring in or near a phylogenetic "anomaly zone" where a failure to account for coalescent stochasticity will mislead phylogenetic inference. Addressing a long-standing phylogenetic mystery, we find some support from a high genomic coverage data set for a traditional placement of bats (Chiroptera) sister to a clade containing Perissodactyla, Cetartiodactyla, and Carnivora, and not nested within the latter clade, as has been suggested recently, although other results were conflicting. One of the most remarkable findings of our study is that ultraconserved elements and their flanking DNA are a rich source of phylogenetic information with strong potential for application across Amniotes.
Affiliation(s)
- John E McCormack
- Museum of Natural Science, Louisiana State University, Baton Rouge, Louisiana 70803, USA.
|
178
|
Yu Y, Warnow T, Nakhleh L. Algorithms for MDC-based multi-locus phylogeny inference: beyond rooted binary gene trees on single alleles. J Comput Biol 2011; 18:1543-59. [PMID: 22035329 PMCID: PMC3216099 DOI: 10.1089/cmb.2011.0174] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Indexed: 11/13/2022] Open
Abstract
One of the criteria for inferring a species tree from a collection of gene trees, when gene tree incongruence is assumed to be due to incomplete lineage sorting (ILS), is Minimize Deep Coalescence (MDC). Exact algorithms for inferring the species tree from rooted, binary trees under MDC were recently introduced. Nevertheless, in phylogenetic analyses of biological data sets, estimated gene trees may differ from true gene trees, be incompletely resolved, and not necessarily rooted. In this article, we propose new MDC formulations for the cases where the gene trees are unrooted/binary, rooted/non-binary, and unrooted/non-binary. Further, we prove structural theorems that allow us to extend the algorithms for the rooted/binary gene tree case to these cases in a straightforward manner. In addition, we devise MDC-based algorithms for cases when multiple alleles per species may be sampled. We study the performance of these methods in coalescent-based computer simulations.
Affiliation(s)
- Yun Yu
- Department of Computer Science, Rice University, Houston, Texas
- Tandy Warnow
- Department of Computer Sciences, University of Texas at Austin, Austin, Texas
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas
|
179
|
Ames RM, Money D, Ghatge VP, Whelan S, Lovell SC. Determining the evolutionary history of gene families. Bioinformatics 2011; 28:48-55. [PMID: 22039210 DOI: 10.1093/bioinformatics/btr592] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 01/19/2023]
Abstract
MOTIVATION: Recent large-scale studies of individuals within a population have demonstrated that there is widespread variation in copy number in many gene families. In addition, there is increasing evidence that the variation in gene copy number can give rise to substantial phenotypic effects. In some cases, these variations have been shown to be adaptive. These observations show that a full understanding of the evolution of biological function requires an understanding of gene gain and gene loss. Accurate, robust evolutionary models of gain and loss events are, therefore, required. RESULTS: We have developed weighted parsimony and maximum likelihood methods for inferring gain and loss events. To test these methods, we have used Markov models of gain and loss to simulate data with known properties. We examine three models: a simple birth-death model, a single rate model and a birth-death innovation model with parameters estimated from Drosophila genome data. We find that for all simulations maximum likelihood-based methods are very accurate for reconstructing the number of duplication events on the phylogenetic tree, and that maximum likelihood and weighted parsimony have similar accuracy for reconstructing the ancestral state. Our implementations are robust to different model parameters and provide accurate inferences of ancestral states and the number of gain and loss events. For ancestral reconstruction, we recommend weighted parsimony because it has similar accuracy to maximum likelihood, but is much faster. For inferring the number of individual gene loss or gain events, maximum likelihood is noticeably more accurate, albeit at greater computational cost. AVAILABILITY: www.bioinf.manchester.ac.uk/dupliphy CONTACT: simon.lovell@manchester.ac.uk; simon.whelan@manchester.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ryan M Ames
- Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, UK
|
180
|
Lee JY, Joseph L, Edwards SV. A species tree for the Australo-Papuan Fairy-wrens and allies (Aves: Maluridae). Syst Biol 2011; 61:253-71. [PMID: 21978990 DOI: 10.1093/sysbio/syr101] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Indexed: 11/14/2022] Open
Abstract
We explored the efficacy of species tree methods at the family level in birds, using the Australo-Papuan Fairy-wrens (Passeriformes: Maluridae) as a model system. Fairy-wrens of the genus Malurus are known for high intensities of sexual selection, resulting in some cases in rapid speciation. This history suggests that incomplete lineage sorting (ILS) of neutrally evolving loci could be substantial, a situation that could compromise traditional methods of combining loci in phylogenetic analysis. Using 18 molecular markers (5 anonymous loci, 7 exons, 5 introns, and 1 mitochondrial DNA locus), we show that gene tree monophyly across species could be rejected for 16 of 18 loci, suggesting substantial ILS at the family level in these birds. Using the software Concaterpillar, we also detect three statistically distinct clusters of gene trees among the 18 loci. Despite substantial variation in gene trees, species trees constructed using four different species tree estimation methods (BEST, BUCKy, and STAR) were generally well supported and similar to each other and to the concatenation tree, with a few mild discordances at nodes that could be explained by rapid and recent speciation events. By contrast, minimizing deep coalescences produced a species tree that was topologically more divergent from those of the other methods as measured by multidimensional scaling of trees. Additionally, gene and species trees were topologically more similar in the BEST analysis, presumably because of the species tree prior employed in BEST which appropriately assumes that gene trees are correlated with each other and with the species tree. Among the 18 loci, we also discovered 102 independent indel markers, which also proved phylogenetically informative, primarily among genera, and displayed a ∼4-fold bias towards deletions. 
As suggested in earlier work, the grasswrens (Amytornis) are sister to the rest of the family and the emu-wrens (Stipiturus) are sister to fairy-wrens (Malurus, Clytomyias). Our study shows that ILS is common at the family level in birds yet, despite this, species tree methods converge on broadly similar results for this family.
Affiliation(s)
- June Y Lee
- Museum of Comparative Zoology and Department of Organismic & Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
|
181
|
Jagadeeshan S, Haerty W, Singh RS. Is speciation accompanied by rapid evolution? Insights from comparing reproductive and nonreproductive transcriptomes in Drosophila. Int J Evol Biol 2011; 2011:595121. [PMID: 21869936 PMCID: PMC3159995 DOI: 10.4061/2011/595121] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/10/2011] [Revised: 04/04/2011] [Accepted: 05/19/2011] [Indexed: 12/18/2022]
Abstract
The tempo and mode of evolutionary change during speciation have remained contentious until recently. While much of the evidence claiming speciation is an abrupt and rapid process comes from fossil data, recent molecular phylogenetics show that the background of gradual evolution is often broken by accelerated rates of molecular evolution during speciation. However, what kinds of genes affect or are affected by speciation remains unexplored. Our analysis of 4843 protein-coding genes in five species of the Drosophila melanogaster subgroup shows that while ~70% of genes follow clock-like evolution, between 17% and 19.67% of loci show signatures of accelerated rates of evolution in recently formed species. These genes show 2-3-fold higher rates of substitution in recently diverged species compared to older species. This fraction of loci affects a diverse range of functions. Only a small proportion of reproductive genes experience speciation-related accelerated changes, but many sex- and reproduction-related genes show an interesting pattern of persistent rapid evolution, suggesting that sex- and reproduction-related genes are under constant selective pressures. The identification of loci associated with accelerated evolution allows us to address the mechanisms of rapid evolution and speciation, which in our study appear to be a combination of both selection and rapid demographical changes.
Affiliation(s)
- Santosh Jagadeeshan
- Department of Biology, McMaster University, Hamilton, ON, Canada L8S 4K1
- Smithsonian Tropical Research Institute, P. O. Box 0834-03092, Balboa, Ancón, Panama
- Wilfried Haerty
- Department of Biology, McMaster University, Hamilton, ON, Canada L8S 4K1
- Rama S. Singh
- Department of Biology, McMaster University, Hamilton, ON, Canada L8S 4K1
|
182
|
Arbiza L, Patricio M, Dopazo H, Posada D. Genome-wide heterogeneity of nucleotide substitution model fit. Genome Biol Evol 2011; 3:896-908. [PMID: 21824869 PMCID: PMC3175760 DOI: 10.1093/gbe/evr080] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/16/2022] Open
Abstract
At a genomic scale, the patterns that have shaped molecular evolution are believed to be largely heterogeneous. Consequently, comparative analyses should use appropriate probabilistic substitution models that capture the main features under which different genomic regions have evolved. While efforts have concentrated on the development and understanding of model selection techniques, no descriptions of overall relative substitution model fit at the genome level have been reported. Here, we provide a characterization of best-fit substitution models across three genomic data sets including coding regions from mammals, vertebrates, and Drosophila (24,000 alignments). According to the Akaike Information Criterion (AIC), 82 of 88 models considered were selected as best-fit models on at least one occasion, although with very different frequencies. Most parameter estimates also varied broadly among genes. Patterns found for vertebrates and Drosophila were quite similar and often more complex than those found in mammals. Phylogenetic trees derived from models in the 95% confidence interval set showed much less variance and were significantly closer to the tree estimated under the best-fit model than trees derived from models outside this interval. Although alternative criteria selected simpler models than the AIC, they suggested similar patterns. Altogether, our results show that at a genomic scale, different gene alignments for the same set of taxa are best explained by a large variety of different substitution models and that model choice has implications for different parameter estimates, including the inferred phylogenetic trees. After taking into account the differences related to sample size, our results suggest a noticeable diversity in the underlying evolutionary process. We conclude that the use of model selection techniques is important to obtain consistent phylogenetic estimates from real data at a genomic scale.
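As a rough illustration of the AIC-based selection this abstract refers to, the sketch below ranks candidate substitution models by AIC and builds a 95% confidence set from Akaike weights. The log-likelihoods and parameter counts are made-up values for a single hypothetical alignment, and the use of cumulative Akaike weights to form the confidence set is an assumption (one common construction), not a description of the authors' software.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def akaike_weights(scores):
    """Convert AIC scores into normalized relative model weights."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

def confidence_set(weights, level=0.95):
    """Smallest set of top-weighted models whose weights sum to >= level."""
    chosen, cumulative = [], 0.0
    for model, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(model)
        cumulative += w
        if cumulative >= level:
            break
    return chosen

# Hypothetical fits: model -> (log-likelihood, free substitution parameters).
fits = {"JC69": (-1250.0, 0), "HKY85": (-1210.0, 4), "GTR+G": (-1204.0, 9)}

scores = {m: aic(lnl, k) for m, (lnl, k) in fits.items()}
best_model = min(scores, key=scores.get)
weights = akaike_weights(scores)
models_95 = confidence_set(weights)
```

With these toy numbers the two richer models differ by only two AIC units, so both fall inside the 95% set, while the simplest model is effectively excluded.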
Affiliation(s)
- Leonardo Arbiza
- Department of Biochemistry, Genetics, and Immunology, University of Vigo, Vigo, Spain
|
183
|
Sjölander K, Datta RS, Shen Y, Shoffner GM. Ortholog identification in the presence of domain architecture rearrangement. Brief Bioinform 2011; 12:413-22. [PMID: 21712343 PMCID: PMC3178056 DOI: 10.1093/bib/bbr036] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 02/07/2023] Open
Abstract
Ortholog identification is used in gene functional annotation, species phylogeny estimation, phylogenetic profile construction and many other analyses. Bioinformatics methods for ortholog identification are commonly based on pairwise protein sequence comparisons between whole genomes. Phylogenetic methods of ortholog identification have also been developed; these methods can be applied to protein data sets sharing a common domain architecture or which share a single functional domain but differ outside this region of homology. While promiscuous domains represent a challenge to all orthology prediction methods, overall structural similarity is highly correlated with proximity in a phylogenetic tree, conferring a degree of robustness to phylogenetic methods. In this article, we review the issues involved in orthology prediction when data sets include sequences with structurally heterogeneous domain architectures, with particular attention to automated methods designed for high-throughput application, and present a case study to illustrate the challenges in this area.
Affiliation(s)
- Kimmen Sjölander
- 308C Stanley Hall #1762, Department of Bioengineering, University of California, Berkeley, CA 94720, USA.
|
184
|
Escobar JS, Scornavacca C, Cenci A, Guilhaumon C, Santoni S, Douzery EJP, Ranwez V, Glémin S, David J. Multigenic phylogeny and analysis of tree incongruences in Triticeae (Poaceae). BMC Evol Biol 2011; 11:181. [PMID: 21702931 PMCID: PMC3142523 DOI: 10.1186/1471-2148-11-181] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 01/25/2011] [Accepted: 06/24/2011] [Indexed: 11/30/2022]
Abstract
Background Introgressive events (e.g., hybridization, gene flow, horizontal gene transfer) and incomplete lineage sorting of ancestral polymorphisms are a challenge for phylogenetic analyses since different genes may exhibit conflicting genealogical histories. Grasses of the Triticeae tribe provide a particularly striking example of incongruence among gene trees. Previous phylogenies, mostly inferred with one gene, are in conflict for several taxon positions. Therefore, obtaining a resolved picture of relationships among genera and species of this tribe has been a challenging task. Here, we obtain the most comprehensive molecular dataset to date in Triticeae, including one chloroplastic and 26 nuclear genes. We aim to test whether it is possible to infer phylogenetic relationships in the face of (potentially) large-scale introgressive events and/or incomplete lineage sorting; to identify parts of the evolutionary history that have not evolved in a tree-like manner; and to decipher the biological causes of gene-tree conflicts in this tribe. Results We obtain resolved phylogenetic hypotheses using the supermatrix and Bayesian Concordance Factors (BCF) approaches despite numerous incongruences among gene trees. These phylogenies suggest the existence of 4-5 major clades within Triticeae, with Psathyrostachys and Hordeum being the deepest genera. In addition, we construct a multigenic network that highlights parts of the Triticeae history that have not evolved in a tree-like manner. Dasypyrum, Heteranthelium and genera of clade V, grouping Secale, Taeniatherum, Triticum and Aegilops, have evolved in a reticulated manner. Their relationships are thus better represented by the multigenic network than by the supermatrix or BCF trees. Noteworthy, we demonstrate that gene-tree incongruences increase with genetic distance and are greater in telomeric than centromeric genes. 
Together, our results suggest that recombination is the main factor decoupling gene trees from multigenic trees. Conclusions Our study is the first to propose a comprehensive, multigenic phylogeny of Triticeae. It clarifies several aspects of the relationships among genera and species of this tribe, and pinpoints biological groups with likely reticulate evolution. Importantly, this study extends previous results obtained in Drosophila by demonstrating that recombination can exacerbate gene-tree conflicts in phylogenetic reconstructions.
Affiliation(s)
- Juan S Escobar
- Institut National de la Recherche Agronomique, Centre de Montpellier, UMR Diversité et Adaptation des Plantes Cultivées, Domaine de Melgueil, 34130 Mauguio, France.
|
185
|
Roos C, Zinner D, Kubatko LS, Schwarz C, Yang M, Meyer D, Nash SD, Xing J, Batzer MA, Brameier M, Leendertz FH, Ziegler T, Perwitasari-Farajallah D, Nadler T, Walter L, Osterholz M. Nuclear versus mitochondrial DNA: evidence for hybridization in colobine monkeys. BMC Evol Biol 2011; 11:77. [PMID: 21435245 PMCID: PMC3068967 DOI: 10.1186/1471-2148-11-77] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.8] [Received: 10/18/2010] [Accepted: 03/24/2011] [Indexed: 12/22/2022]
Abstract
BACKGROUND Colobine monkeys constitute a diverse group of primates with major radiations in Africa and Asia. However, phylogenetic relationships among genera are under debate, and recent molecular studies with incomplete taxon-sampling revealed discordant gene trees. To solve the evolutionary history of colobine genera and to determine causes for possible gene tree incongruences, we combined presence/absence analysis of mobile elements with autosomal, X chromosomal, Y chromosomal and mitochondrial sequence data from all recognized colobine genera. RESULTS Gene tree topologies and divergence age estimates derived from different markers were similar, but differed in placing Piliocolobus/Procolobus and langur genera among colobines. Although insufficient data, homoplasy and incomplete lineage sorting might all have contributed to the discordance among gene trees, hybridization is favored as the main cause of the observed discordance. We propose that African colobines are paraphyletic, but might later have experienced female introgression from Piliocolobus/Procolobus into Colobus. In the late Miocene, colobines invaded Eurasia and diversified into several lineages. Among Asian colobines, Semnopithecus diverged first, indicating langur paraphyly. However, unidirectional gene flow from Semnopithecus into Trachypithecus via male introgression followed by nuclear swamping might have occurred until the earliest Pleistocene. CONCLUSIONS Overall, our study provides the most comprehensive view on colobine evolution to date and emphasizes that analyses of various molecular markers, such as mobile elements and sequence data from multiple loci, are crucial to better understand evolutionary relationships and to trace hybridization events. Our results also suggest that sex-specific dispersal patterns, promoted by a respective social organization of the species involved, can result in different hybridization scenarios.
Affiliation(s)
- Christian Roos
- Primate Genetics Laboratory, German Primate Center, Göttingen, Germany.
|
186
|
Parametric Analysis of Alignment and Phylogenetic Uncertainty. Bull Math Biol 2011; 73:795-810. [DOI: 10.1007/s11538-010-9610-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/16/2009] [Accepted: 11/04/2010] [Indexed: 10/18/2022]
|
187
|
Markova-Raina P, Petrov D. High sensitivity to aligner and high rate of false positives in the estimates of positive selection in the 12 Drosophila genomes. Genome Res 2011; 21:863-74. [PMID: 21393387 DOI: 10.1101/gr.115949.110] [Citation(s) in RCA: 108] [Impact Index Per Article: 8.3] [Indexed: 12/25/2022]
Abstract
We investigate the effect of aligner choice on inferences of positive selection using site-specific models of molecular evolution. We find that independently of the choice of aligner, the rate of false positives is unacceptably high. Our study is a whole-genome analysis of all protein-coding genes in 12 Drosophila genomes annotated in either all 12 species (~6690 genes) or in the six melanogaster group species. We compare six popular aligners: PRANK, T-Coffee, ClustalW, ProbCons, AMAP, and MUSCLE, and find that the aligner choice strongly influences the estimates of positive selection. Differences persist when we use (1) different stringency cutoffs, (2) different selection inference models, (3) alignments with or without gaps, and/or additional masking, (4) per-site versus per-gene statistics, (5) closely related melanogaster group species versus more distant 12 Drosophila genomes. Furthermore, we find that these differences are consequential for downstream analyses such as determination of over/under-represented GO terms associated with positive selection. Visual analysis indicates that most sites inferred as positively selected are, in fact, misaligned at the codon level, resulting in false positive rates of 48%-82%. PRANK, which has been reported to outperform other aligners in simulations, performed best in our empirical study as well. Unfortunately, PRANK still had a high, and unacceptable for most applications, false positives rate of 50%-55%. We identify misannotations and indels, many of which appear to be located in disordered protein regions, as primary culprits for the high misalignment-related error levels and discuss possible workaround approaches to this apparently pervasive problem in genome-wide evolutionary analyses.
Affiliation(s)
- Penka Markova-Raina
- Department of Biology, Stanford University, Stanford, California 94305, USA.
|
188
|
Hobolth A, Dutheil JY, Hawks J, Schierup MH, Mailund T. Incomplete lineage sorting patterns among human, chimpanzee, and orangutan suggest recent orangutan speciation and widespread selection. Genome Res 2011; 21:349-56. [PMID: 21270173 PMCID: PMC3044849 DOI: 10.1101/gr.114751.110] [Citation(s) in RCA: 153] [Impact Index Per Article: 11.8] [Indexed: 11/24/2022]
Abstract
We search the complete orangutan genome for regions where humans are more closely related to orangutans than to chimpanzees due to incomplete lineage sorting (ILS) in the ancestor of human and chimpanzees. The search uses our recently developed coalescent hidden Markov model (HMM) framework. We find ILS present in ∼1% of the genome, and that the ancestral species of human and chimpanzees never experienced a severe population bottleneck. The existence of ILS is validated with simulations, site pattern analysis, and analysis of rare genomic events. The existence of ILS allows us to disentangle the time of isolation of humans and orangutans (the speciation time) from the genetic divergence time, and we find speciation to be as recent as 9-13 million years ago (Mya; contingent on the calibration point). The analyses provide further support for a recent speciation of human and chimpanzee at ∼4 Mya and a diverse ancestor of human and chimpanzee with an effective population size of about 50,000 individuals. Posterior decoding infers ILS for each nucleotide in the genome, and we use this to deduce patterns of selection in the ancestral species. We demonstrate the effect of background selection in the common ancestor of humans and chimpanzees. In agreement with predictions from population genetics, ILS was found to be reduced in exons and gene-dense regions when we control for confounding factors such as GC content and recombination rate. Finally, we find the broad-scale recombination rate to be conserved through the complete ape phylogeny.
Affiliation(s)
- Asger Hobolth
- Bioinformatics Research Center, Aarhus University, DK-8000 Aarhus C, Denmark
- Julien Y. Dutheil
- Bioinformatics Research Center, Aarhus University, DK-8000 Aarhus C, Denmark
- John Hawks
- University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
- Mikkel H. Schierup
- Bioinformatics Research Center, Aarhus University, DK-8000 Aarhus C, Denmark
- Department of Biology, Aarhus University, DK-8000 Aarhus C, Denmark
- Corresponding author. Fax: 45-8942-3077.
- Thomas Mailund
- Bioinformatics Research Center, Aarhus University, DK-8000 Aarhus C, Denmark
- Corresponding author. Fax: 45-8942-3077.
|
189
|
Phylogenetic relationships in the spoon tarsus subgroup of Hawaiian drosophila: Conflict and concordance between gene trees. Mol Phylogenet Evol 2011; 58:492-501. [DOI: 10.1016/j.ympev.2010.12.015] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 08/21/2010] [Revised: 12/24/2010] [Accepted: 12/24/2010] [Indexed: 11/23/2022]
|
190
|
Ané C. Detecting phylogenetic breakpoints and discordance from genome-wide alignments for species tree reconstruction. Genome Biol Evol 2011; 3:246-58. [PMID: 21362638 PMCID: PMC3070431 DOI: 10.1093/gbe/evr013] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022]
Abstract
With the easy acquisition of sequence data, it is now possible to obtain and align whole genomes across multiple related species or populations. In this work, I assess the performance of a statistical method to reconstruct the whole distribution of phylogenetic trees along the genome, estimate the proportion of the genome for which a given clade is true, and infer a concordance tree that summarizes the dominant vertical inheritance pattern. There are two main issues when dealing with whole-genome alignments, as opposed to multiple genes: the size of the data and the detection of recombination breakpoints. These breakpoints partition the genomic alignment into phylogenetically homogeneous loci, where sites within a given locus all share the same phylogenetic tree topology. To delimitate these loci, I describe here a method based on the minimum description length (MDL) principle, implemented with dynamic programming for computational efficiency. Simulations show that combining MDL partitioning with Bayesian concordance analysis provides an efficient and robust way to estimate both the vertical inheritance signal and the horizontal phylogenetic signal. The method performed well both in the presence of incomplete lineage sorting and in the presence of horizontal gene transfer. A high level of systematic bias was found here, highlighting the need for good individual tree building methods, which form the basis for more elaborate gene tree/species tree reconciliation methods.
Affiliation(s)
- Cécile Ané
- Departments of Statistics and Botany, University of Wisconsin-Madison, USA.
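The abstract above describes partitioning an alignment into homogeneous loci with a penalized criterion solved by dynamic programming. The sketch below shows only the generic dynamic-programming idea, under loud assumptions: it segments a toy numeric sequence, uses a sum-of-squares segment cost, and charges a fixed per-segment penalty as a stand-in for a description-length term. It is not the MDL criterion or the implementation from the cited paper, and all names are hypothetical.

```python
from typing import Callable, List, Sequence


def sse(values: Sequence[float]) -> float:
    """Toy segment cost: sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def partition(
    values: Sequence[float],
    penalty: float,
    cost: Callable[[Sequence[float]], float] = sse,
) -> List[int]:
    """Return breakpoint indices minimizing total segment cost plus a
    fixed penalty per segment, found by O(n^2) dynamic programming."""
    n = len(values)
    best = [0.0] + [float("inf")] * n  # best[j]: optimal score for values[:j]
    prev = [0] * (n + 1)               # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            total = best[i] + cost(values[i:j]) + penalty
            if total < best[j]:
                best[j] = total
                prev[j] = i
    # Trace back through the stored segment starts to recover breakpoints.
    cuts: List[int] = []
    j = n
    while j > 0:
        i = prev[j]
        if i > 0:
            cuts.append(i)
        j = i
    return sorted(cuts)


# Toy signal with one obvious change point between positions 2 and 3.
print(partition([0, 0, 0, 5, 5, 5], penalty=1.0))
```

Raising the penalty makes extra segments more expensive, so a large enough value returns no breakpoints at all; this mirrors the general trade-off between fit and model complexity in penalized partitioning.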
|
191
|
Yang CC, Sakai H, Numa H, Itoh T. Gene tree discordance of wild and cultivated Asian rice deciphered by genome-wide sequence comparison. Gene 2011; 477:53-60. [PMID: 21277362 DOI: 10.1016/j.gene.2011.01.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/30/2010] [Revised: 01/13/2011] [Accepted: 01/14/2011] [Indexed: 12/21/2022]
Abstract
Although a large number of genes are expected to correctly solve a phylogenetic relationship, inconsistent gene tree topologies have been observed. This conflicting evidence in gene tree topologies, known as gene tree discordance, becomes increasingly important as advanced sequencing technologies produce an enormous amount of sequence information for phylogenomic studies among closely related species. Here, we aim to characterize the gene tree discordance of the Asian cultivated rice Oryza sativa and its progenitor, O. rufipogon, which will be an ideal case study of gene tree discordance. Using genome and cDNA sequences of O. sativa and O. rufipogon, we have conducted the first in-depth analyses of gene tree discordance in Asian rice. Our comparison of full-length cDNA sequences of O. rufipogon with the genome sequences of the japonica and indica cultivars of O. sativa revealed that 60% of the gene trees showed a topology consistent with the expected one, whereas the remaining genes supported significantly different topologies. Moreover, the proportions of the topologies deviated significantly from expectation, suggesting at least one hybridization event between the two subgroups of O. sativa, japonica and indica. In fact, a genome-wide alignment between japonica and indica indicated that significant portions of the indica genome are derived from japonica. In addition, literature concerning the pedigree of the indica cultivar strongly supported the hybridization hypothesis. Our molecular evolutionary analyses deciphered complicated evolutionary processes in closely related species. They also demonstrated the importance of gene tree discordance in the era of high-speed DNA sequencing.
Affiliation(s)
- Ching-chia Yang
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8561, Japan.
|
192
|
Yu Y, Than C, Degnan JH, Nakhleh L. Coalescent histories on phylogenetic networks and detection of hybridization despite incomplete lineage sorting. Syst Biol 2011; 60:138-49. [PMID: 21248369 DOI: 10.1093/sysbio/syq084] [Citation(s) in RCA: 126] [Impact Index Per Article: 9.7] [Indexed: 11/12/2022]
Abstract
Analyses of the increasingly available genomic data continue to reveal the extent of hybridization and its role in the evolutionary diversification of various groups of species. We show, through extensive coalescent-based simulations of multilocus data sets on phylogenetic networks, how divergence times before and after hybridization events can result in incomplete lineage sorting with gene tree incongruence signatures identical to those exhibited by hybridization. Evolutionary analysis of such data under the assumption of a species tree model can miss all hybridization events, whereas analysis under the assumption of a species network model would grossly overestimate hybridization events. These issues necessitate a paradigm shift in evolutionary analysis under these scenarios, from a model that assumes a priori a single source of gene tree incongruence to one that integrates multiple sources in a unifying framework. We propose a framework of coalescence within the branches of a phylogenetic network and show how this framework can be used to detect hybridization despite incomplete lineage sorting. We apply the model to simulated data and show that the signature of hybridization can be revealed as long as the interval between the divergence times of the species involved in hybridization is not too small. We reanalyze a data set of 106 loci from 7 in-group Saccharomyces species for which a species tree with no hybridization has been reported in the literature. Our analysis supports the hypothesis that hybridization occurred during the evolution of this group, explaining a large amount of the incongruence in the data. Our findings show that an integrative approach to gene tree incongruence and its reconciliation is needed. Our framework will help in systematically analyzing genomic data for the occurrence of hybridization and elucidating its evolutionary role.
Affiliation(s)
- Yun Yu
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
|
193
|
Guirao-Rico S, Aguadé M. Molecular evolution of the ligands of the insulin-signaling pathway: dilp genes in the genus Drosophila. Mol Biol Evol 2010; 28:1557-60. [PMID: 21196470 DOI: 10.1093/molbev/msq353] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
Drosophila melanogaster, unlike mammals, has seven insulin-like peptides (DILPs). In Drosophila, all seven genes (dilp1-7) are single copy in the 12 species studied, except for D. grimshawi with two tandem copies of dilp2. Our comparative analysis revealed that genes dilp1-dilp7 exhibit differential functional constraint, which is indicative of some functional divergence. Species of the subgenera Sophophora and Drosophila differ in some traits likely affected by the insulin-signaling pathway, such as adult body size. It is in the branch connecting the two subgenera that we found the footprint left by positive selection driving nonsynonymous changes at some dilp1 codons to fixation. Finally, the similar rate at which the two dilp2 copies of D. grimshawi have evolved since their duplication and the presence of a putative regulatory region highly conserved between the two paralogs would suggest that both copies were preserved either because of subfunctionalization or dose dependency rather than by the neofunctionalization of one of the two copies.
Affiliation(s)
- Sara Guirao-Rico
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
|
194
|
Compact genomes and complex evolution in the genus Brachypodium. Chromosoma 2010; 120:199-212. [DOI: 10.1007/s00412-010-0303-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 05/21/2010] [Revised: 12/01/2010] [Accepted: 12/03/2010] [Indexed: 12/31/2022]
|
195
|
Liu L, Yu L, Edwards SV. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol 2010; 10:302. [PMID: 20937096 PMCID: PMC2976751 DOI: 10.1186/1471-2148-10-302] [Citation(s) in RCA: 403] [Impact Index Per Article: 28.8] [Received: 12/29/2009] [Accepted: 10/11/2010] [Indexed: 12/01/2022]
Abstract
Background Several phylogenetic approaches have been developed to estimate species trees from collections of gene trees. However, maximum likelihood approaches for estimating species trees under the coalescent model are limited. Although the likelihood of a species tree under the multispecies coalescent model has already been derived by Rannala and Yang, it can be shown that the maximum likelihood estimate (MLE) of the species tree (topology, branch lengths, and population sizes) from gene trees under this formula does not exist. In this paper, we develop a pseudo-likelihood function of the species tree to obtain maximum pseudo-likelihood estimates (MPE) of species trees, with branch lengths of the species tree in coalescent units. Results We show that the MPE of the species tree is statistically consistent as the number M of genes goes to infinity. In addition, the probability that the MPE of the species tree matches the true species tree converges to 1 at rate $O(M^{-1})$. The simulation results confirm that the maximum pseudo-likelihood approach is statistically consistent even when the species tree is in the anomaly zone. We applied our method, Maximum Pseudo-likelihood for Estimating Species Trees (MP-EST) to a mammal dataset. The four major clades found in the MP-EST tree are consistent with those in the Bayesian concatenation tree. The bootstrap supports for the species tree estimated by the MP-EST method are more reasonable than the posterior probability supports given by the Bayesian concatenation method in reflecting the level of uncertainty in gene trees and controversies over the relationship of four major groups of placental mammals. Conclusions MP-EST can consistently estimate the topology and branch lengths (in coalescent units) of the species tree. Although the pseudo-likelihood is derived from coalescent theory, and assumes no gene flow or horizontal gene transfer (HGT), the MP-EST method is robust to a small amount of HGT in the dataset. 
In addition, increasing the number of genes does not increase the computational time substantially. The MP-EST method is fast for analyzing datasets that involve a large number of genes but a moderate number of species.
Affiliation(s)
- Liang Liu
- Department of Agriculture and Natural Resources, Delaware State University, Dover, DE 19901, USA.
|
196
|
Aguileta G, Marthey S, Chiapello H, Lebrun MH, Rodolphe F, Fournier E, Gendrault-Jacquemard A, Giraud T. Assessing the performance of single-copy genes for recovering robust phylogenies. Syst Biol 2008; 57:613-27. [PMID: 18709599 DOI: 10.1080/10635150802306527] [Citation(s) in RCA: 132] [Impact Index Per Article: 9.4] [Indexed: 10/21/2022]
Abstract
Phylogenies involving nonmodel species are based on a few genes, mostly chosen following historical or practical criteria. Because gene trees are sometimes incongruent with species trees, the resulting phylogenies may not accurately reflect the evolutionary relationships among species. The increase in availability of genome sequences now provides large numbers of genes that could be used for building phylogenies. However, for practical reasons only a few genes can be sequenced for a wide range of species. Here we asked whether we can identify a few genes, among the single-copy genes common to most fungal genomes, that are sufficient for recovering accurate and well-supported phylogenies. Fungi represent a model group for phylogenomics because many complete fungal genomes are available. An automated procedure was developed to extract single-copy orthologous genes from complete fungal genomes using a Markov Clustering Algorithm (Tribe-MCL). Using 21 complete, publicly available fungal genomes with reliable protein predictions, 246 single-copy orthologous gene clusters were identified. We inferred the maximum likelihood trees using the individual orthologous sequences and constructed a reference tree from concatenated protein alignments. The topologies of the individual gene trees were compared to that of the reference tree using three different methods. The performance of individual genes in recovering the reference tree was highly variable. Gene size and the number of variable sites were highly correlated and significantly affected the performance of the genes, but the average substitution rate did not. Two genes recovered exactly the same topology as the reference tree, and when concatenated provided high bootstrap values. The genes typically used for fungal phylogenies did not perform well, which suggests that current fungal phylogenies based on these genes may not accurately reflect the evolutionary relationships among species. 
Analyses on subsets of species showed that the phylogenetic performance did not seem to depend strongly on the sample. We expect that the best-performing genes identified here will be very useful for phylogenetic studies of fungi, at least at a large taxonomic scale. Furthermore, we compare the method developed here for finding genes for building robust phylogenies with previous ones and we advocate that our method could be applied to other groups of organisms when more complete genomes are available.
Affiliation(s)
- G Aguileta
- Laboratoire Ecologie, Systématique et Evolution, Université Paris-Sud, Orsay, UMR8079, Orsay, Cedex, France.
|
197
|
Gattolliat JL, Monaghan MT. DNA-based association of adults and larvae in Baetidae (Ephemeroptera) with the description of a new genus Adnoptilum in Madagascar. J N Am Benthol Soc 2010. [DOI: 10.1899/09-119.1] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 11/08/2022]
Affiliation(s)
- Michael T. Monaghan
- Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 301, 12587 Berlin, Germany
|
198
|
Payseur BA. Using differential introgression in hybrid zones to identify genomic regions involved in speciation. Mol Ecol Resour 2010; 10:806-20. [DOI: 10.1111/j.1755-0998.2010.02883.x] [Citation(s) in RCA: 164] [Impact Index Per Article: 11.7] [Indexed: 11/28/2022]
|
199
|
Ridout KE, Dixon CJ, Filatov DA. Positive selection differs between protein secondary structure elements in Drosophila. Genome Biol Evol 2010; 2:166-79. [PMID: 20624723 PMCID: PMC2997536 DOI: 10.1093/gbe/evq008] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]
Abstract
Different protein secondary structure elements have different physicochemical properties and roles in the protein, which may determine their evolutionary flexibility. However, it is not clear to what extent protein structure affects the way Darwinian selection acts at the amino acid level. Using phylogeny-based likelihood tests for positive selection, we have examined the relationship between protein secondary structure and selection across six species of Drosophila. We find that amino acids that form disordered regions, such as random coils, are far more likely to be under positive selection than expected from their proportion in the proteins, and residues in helices and β-structures are subject to less positive selection than predicted. In addition, it appears that sites undergoing positive selection are more likely than expected to occur close to one another in the protein sequence. Finally, on a genome-wide scale, we have determined that positively selected sites are found more frequently toward the gene ends. Our results demonstrate that protein structures with a greater degree of organization and strong hydrophobicity, represented here as helices and β-structures, are less tolerant to molecular adaptation than disordered, hydrophilic regions, across a diverse set of proteins.
Affiliation(s)
- Kate E Ridout
- Department of Plant Sciences, University of Oxford, Oxford, United Kingdom
|
200
|
Thomas PD. GIGA: a simple, efficient algorithm for gene tree inference in the genomic age. BMC Bioinformatics 2010; 11:312. [PMID: 20534164 PMCID: PMC2905364 DOI: 10.1186/1471-2105-11-312] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 08/19/2009] [Accepted: 06/09/2010] [Indexed: 11/10/2022]
Abstract
Background Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. Results We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. Conclusions GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. 
We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events.
Affiliation(s)
- Paul D Thomas
- Evolutionary Systems Biology Group, SRI International, Menlo Park, CA, USA.
|