26
Penny D, Hasegawa M, Waddell PJ, Hendy MD. Mammalian evolution: timing and implications from using the LogDeterminant transform for proteins of differing amino acid composition. Syst Biol 1999; 48:76-93. [PMID: 12078647 DOI: 10.1080/106351599260454] [Citation(s) in RCA: 66] [Impact Index Per Article: 2.6] [Indexed: 10/16/2022]
Abstract
We explore the tree of mammalian mtDNA sequences, using particularly the LogDet transform on amino acid sequences, the distance Hadamard transform, and the Closest Tree selection criterion. The amino acid compositions of different species differ significantly, even within mammals. After compensating for these differences, nearest-neighbor bootstrap results suggest that the tree is locally stable, though a few groups show slightly greater rearrangements when a large proportion of the constant sites are removed. Many parts of the trees we obtain agree with published protein ML trees. Interesting results include a preference for rodent monophyly. A few signals that conflict with the optimal tree were detected using the distance Hadamard transform (with results expressed as a Lento plot). One suggested rearrangement was the interchange of the positions of primates and rodents on the optimal tree. The basic stability of the tree, combined with two calibration points (whale/cow and horse/rhinoceros), together with a distant secondary calibration from the mammal/bird divergence, allows inferences of the times of divergence of putative clades. Allowing for sampling variances due to finite sequence length, most major divergences amongst lineages leading to modern orders appear to occur well before the Cretaceous/Tertiary (K/T) boundary. Implications arising from these early divergences are discussed, particularly the possibility of competition between the small dinosaurs and the new mammal clades.
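The LogDet idea above can be illustrated with a short sketch. Assumptions: the function names are our own, the example uses the 4-letter DNA alphabet for brevity (the paper applies the transform to amino acid sequences, which would need a 20-letter alphabet with every residue present in both sequences), and the normalization shown is the common paralinear form, which may differ from the exact variant used in the paper.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        # Choose the row with the largest absolute value in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def paralinear_distance(x, y, alphabet="ACGT"):
    """LogDet distance in paralinear form for two aligned sequences.

    d = -(1/k) * [ln det F - 0.5 * (ln det Px + ln det Py)]
    F      : k x k matrix of joint state frequencies (the divergence matrix)
    Px, Py : diagonal matrices of each sequence's state frequencies
    """
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty aligned sequences of equal length")
    k = len(alphabet)
    index = {state: i for i, state in enumerate(alphabet)}
    n = len(x)
    F = [[0.0] * k for _ in range(k)]
    for a, b in zip(x, y):
        F[index[a]][index[b]] += 1.0 / n
    det_f = determinant(F)
    if det_f <= 0.0:
        raise ValueError("divergence matrix is singular or too divergent for LogDet")
    px = [sum(F[i]) for i in range(k)]                       # row sums: x frequencies
    py = [sum(F[i][j] for i in range(k)) for j in range(k)]  # column sums: y frequencies
    log_marginals = sum(math.log(p) for p in px) + sum(math.log(p) for p in py)
    return -(math.log(det_f) - 0.5 * log_marginals) / k
```

Because the marginal frequencies are divided out, identical sequences give zero distance whatever their composition; the more important property, consistency when composition varies across lineages, is not something a two-sequence example can show.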
27
Matisoo-Smith E, Roberts RM, Irwin GJ, Allen JS, Penny D, Lambert DM. Patterns of prehistoric human mobility in Polynesia indicated by mtDNA from the Pacific rat. Proc Natl Acad Sci U S A 1998; 95:15145-50. [PMID: 9844030 PMCID: PMC24590 DOI: 10.1073/pnas.95.25.15145] [Citation(s) in RCA: 120] [Impact Index Per Article: 4.6] [Indexed: 11/18/2022]
Abstract
Human settlement of Polynesia was a major event in world prehistory. Despite the vastness of the distances covered, research suggests that prehistoric Polynesian populations maintained spheres of continuing interaction for at least some period of time in some regions. A low level of genetic variation in ancestral Polynesian populations, genetic admixture (both prehistoric and post-European contact), and severe population crashes resulting from introduction of European diseases make it difficult to trace prehistoric human mobility in the region by using only human genetic and morphological markers. We focus instead on an animal that accompanied the ancestral Polynesians on their voyages. DNA phylogenies derived from mitochondrial control-region sequences of Pacific rats (Rattus exulans) from east Polynesia are presented. A range of specific hypotheses regarding the degree of interaction within Polynesia are tested. These include the issues of multiple contacts between central east Polynesia and the geographically distinct archipelagos of New Zealand and Hawaii. Results are inconsistent with models of Pacific settlement involving substantial isolation after colonization and confirm the value of genetic studies on commensal species for elucidating the history of human settlement.
28
Bromham L, Rambaut A, Fortey R, Cooper A, Penny D. Testing the Cambrian explosion hypothesis by using a molecular dating technique. Proc Natl Acad Sci U S A 1998; 95:12386-9. [PMID: 9770496 PMCID: PMC22841 DOI: 10.1073/pnas.95.21.12386] [Citation(s) in RCA: 104] [Impact Index Per Article: 4.0] [Received: 06/03/1998] [Indexed: 11/18/2022]
Abstract
Molecular studies have the potential to shed light on the origin of the animal phyla by providing independent estimates of the divergence times, but have been criticized for failing to account adequately for variation in rate of evolution. A method of dating divergence times from molecular data addresses the criticisms of earlier studies and provides more realistic, but wider, confidence intervals. The data are not compatible with the Cambrian explosion hypothesis as an explanation for the origin of metazoan phyla, and provide additional support for an extended period of Precambrian metazoan diversification.
29
Penny D. Implant site for avian microchips. Vet Rec 1998; 143:288. [PMID: 9787430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2023]
30
Murray-McIntosh RP, Scrimshaw BJ, Hatfield PJ, Penny D. Testing migration patterns and estimating founding population size in Polynesia by using human mtDNA sequences. Proc Natl Acad Sci U S A 1998; 95:9047-52. [PMID: 9671802 PMCID: PMC21200 DOI: 10.1073/pnas.95.15.9047] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.0] [Received: 07/10/1997] [Accepted: 03/30/1998] [Indexed: 02/08/2023]
Abstract
The hypervariable 1 region of human mtDNA shows markedly reduced variability in Polynesians, and this variability decreases from western to eastern Polynesia. Fifty-four sequences from New Zealand Maori show that the mitochondrial variability, with just four haplotypes, is the lowest of any sizeable human group studied and that the frequency of haplotypes is markedly skewed. The Maori sequences, combined with 268 published sequences from the Pacific, are consistent with a series of founder effects from small populations settling new island groups. The distributions of haplotypes were used to estimate the number of females in the founding population of New Zealand Maori. The three-step simulation used a randomly selected founding population from eastern Polynesia, an expansionary phase in New Zealand, and finally the random selection of 54 haplotypes. The results are consistent with a founding population that includes approximately 70 women (between 50 and 100), and sensitivity analysis shows that this conclusion is robust to small changes in haplotype frequencies. This size is too large for models postulating a very small founding population of "castaways," but it is consistent with a general understanding of Maori oral history as well as the results of recent canoe voyages recreating early trans-oceanic voyages.
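The three-step simulation described above can be sketched as a simple Monte Carlo procedure. Everything numeric here is an assumption for illustration: the source haplotype frequencies in `SOURCE_FREQS` are hypothetical (not the published eastern Polynesian data), and the expansion is modeled crudely as a fixed number of doubling generations with random resampling of mothers.

```python
import random
from collections import Counter

# Hypothetical source-population haplotype frequencies (illustrative only,
# NOT the published eastern Polynesian data).
SOURCE_FREQS = {"H1": 0.60, "H2": 0.20, "H3": 0.10, "H4": 0.05, "H5": 0.05}


def simulate_sample(n_founders, rng, generations=5, growth=2, sample_size=54):
    """One replicate of a founder -> expansion -> sampling simulation."""
    if n_founders * growth ** generations < sample_size:
        raise ValueError("expanded population is smaller than the sample")
    haplotypes = list(SOURCE_FREQS)
    weights = list(SOURCE_FREQS.values())
    # Step 1: draw the founding women at random from the source population.
    population = rng.choices(haplotypes, weights=weights, k=n_founders)
    # Step 2: expansion; each generation draws mothers at random (with
    # replacement) from the previous one, so drift continues while numbers grow.
    for _ in range(generations):
        population = rng.choices(population, k=len(population) * growth)
    # Step 3: sample sequences without replacement from the expanded population.
    return rng.sample(population, sample_size)


def haplotype_count_distribution(n_founders, replicates=200, seed=1):
    """How often each number of distinct haplotypes appears in the final sample."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        sample = simulate_sample(n_founders, rng)
        counts[len(set(sample))] += 1
    return counts
```

Running `haplotype_count_distribution` over a range of founder numbers and comparing how often each yields the observed number of haplotypes is one way to read off a plausible founding size; the published analysis also compared haplotype frequencies, which this sketch does not.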
31
Penny D, Murray-McIntosh RP, Hendy MD. Estimating times of divergence with a change of rate: the orangutan/African ape divergence. Mol Biol Evol 1998; 15:608-10. [PMID: 9580991 DOI: 10.1093/oxfordjournals.molbev.a025962] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 02/07/2023]
32
Abstract
An RNA world is widely accepted as a probable stage in the early evolution of life. Two implications are that proteins have gradually replaced RNA as the main biological catalysts and that RNA has not taken on any major de novo catalytic function after the evolution of protein synthesis, that is, there is an essentially irreversible series of steps RNA --> RNP --> protein. This transition, as expected from a consideration of catalytic perfection, is essentially complete for reactions whose substrates are small molecules. Based on these principles we derive criteria for identifying RNAs in modern organisms that are relics from the RNA world and then examine the function and phylogenetic distribution of RNA for such remnants of the RNA world. This allows an estimate of the minimum complexity of the last ribo-organism, the stage just preceding the advent of genetically encoded protein synthesis. Despite the constraints placed on its size by a low fidelity of replication (the Eigen limit), we conclude that the genome of this organism reached a considerable level of complexity that included several RNA-processing steps. It would include a large protoribosome with many smaller RNAs involved in its assembly, pre-tRNAs and tRNA processing, an ability for recombination of RNA, some RNA editing, an ability to copy to the end of each RNA strand, and some transport functions. It is harder to recognize specific metabolic reactions that must have existed, but synthetic and bio-energetic functions would be necessary. Overall, this requires that such an organism maintained a multiple-copy, double-stranded linear RNA genome capable of recombination and splicing. The genome was most likely fragmented, allowing each "chromosome" to be replicated with minimum error, that is, within the Eigen limit. The model as developed serves as an outgroup to root the tree of life and is an alternative to using sequence data for inferring properties of the earliest cells.
33
Abstract
We describe a sequential (step by step) Darwinian model for the evolution of life from the late stages of the RNA world through to the emergence of eukaryotes and prokaryotes. The starting point is our model, derived from current RNA activity, of the RNA world just prior to the advent of genetically-encoded protein synthesis. By focusing on the function of the protoribosome we develop a plausible model for the evolution of a protein-synthesizing ribosome from a high-fidelity RNA polymerase that incorporated triplets of oligonucleotides. With the standard assumption that during the evolution of enzymatic activity, catalysis is transferred from RNA --> RNP --> protein, the first proteins in the "breakthrough organism" (the first to have encoded protein synthesis) would be nonspecific chaperone-like proteins rather than catalytic. Moreover, because some RNA molecules that pre-date protein synthesis under this model now occur as introns in some of the very earliest proteins, the model predicts these particular introns are older than the exons surrounding them, the "introns-first" theory. Many features of the model for the genome organization in the final RNA world ribo-organism are more prevalent in the eukaryotic genome and we suggest that the prokaryotic genome organization (a single, circular genome with one center of replication) was derived from a "eukaryotic-like" genome organization (a fragmented linear genome with multiple centers of replication). The steps from the proposed ribo-organism RNA genome --> eukaryotic-like DNA genome --> prokaryotic-like DNA genome are all relatively straightforward, whereas the transition prokaryotic-like genome --> eukaryotic-like genome appears impossible under a Darwinian mechanism of evolution, given the assumption of the transition RNA --> RNP --> protein. A likely molecular mechanism, "plasmid transfer," is available for the origin of prokaryotic-type genomes from a eukaryotic-like architecture.
Under this model prokaryotes are considered specialized and derived with reduced dependence on ssRNA biochemistry. A functional explanation is that prokaryote ancestors underwent selection for thermophily (high temperature) and/or for rapid reproduction (r selection) at least once in their history.
34
Waddell PJ, Penny D, Moore T. Hadamard conjugations and modeling sequence evolution with unequal rates across sites. Mol Phylogenet Evol 1997; 8:33-50. [PMID: 9242594 DOI: 10.1006/mpev.1997.0405] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.1] [Indexed: 02/04/2023]
Abstract
This paper considers the many different distributions that may approximate the distribution of site rates in DNA sequences and shows how the Hadamard conjugation may be modified to take these into account. This is done for both 2-state and 4-state data. Distributions which give simple closed forms include the gamma (Γ) distribution, the inverse Gaussian distribution (which is similar to the lognormal), and a mixture of either of these with a proportion of sites which cannot change (invariant sites). It is seen that the tail of a distribution can have major effects upon the coefficient of variation of site rates. Because the Hadamard conjugation can be used to either correct data or predict the data given the model (i.e., the likelihood of site patterns), light is shed on properties of maximum likelihood tree selection with unequal site rates. Analysis of rRNA shows how unequal rates across sites can change the optimal tree. Maximum likelihood analysis also shows that distinct distributions fit each data set, with the gamma often not being the best. Analyzing both these data and a long stretch of primate mtDNA reveals evidence of many "hidden" multiple substitutions, while signals not corresponding to the preferred biological tree generally decrease as unequal rates are allowed for. Last, we discuss the expected behavior of sequences evolving by models where stabilizing selection alone explains unequal site rates. Such models do not explain "synapomorphies" or informative changes in ancient molecules, because while stabilizing selection can vastly decrease change at a site, it will also vastly accelerate back-substitution (leaving only a covarion model to explain old synapomorphies). When and why models allowing a continuous distribution of site rates (e.g., gamma) will approximate covarion evolution requires further study.
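For the two-state case, one way to see how a gamma distribution of site rates enters the Hadamard conjugation: if a site's rate multiplier λ follows a gamma distribution with shape k and mean 1, then E[exp(λx)] = (1 - x/k)^(-k), so the logarithm in the usual inverse transform is replaced by ρ = k(1 - r^(-1/k)), which tends to ln r as k grows. The sketch below assumes this gamma-only case (no invariant-site mixture or inverse Gaussian), indexes splits by bitmask against a reference taxon, and uses function names of our own; it is not the paper's code.

```python
def hadamard(n_bits):
    """Sylvester Hadamard matrix H[i][j] = (-1)^popcount(i & j), size 2^n_bits."""
    size = 1 << n_bits
    return [[-1 if bin(i & j).count("1") % 2 else 1 for j in range(size)]
            for i in range(size)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def forward_gamma(gamma, k, n_bits):
    """Expected site-pattern frequencies s from tree spectrum gamma, Gamma(k) rates."""
    h = hadamard(n_bits)
    x = mat_vec(h, gamma)  # H gamma; every entry is <= 0 for a tree spectrum
    r = [(1 - xi / k) ** (-k) for xi in x]  # E[exp(lambda * x)], lambda ~ Gamma(k, mean 1)
    return [v / len(r) for v in mat_vec(h, r)]  # H^-1 = H / 2^n_bits


def inverse_gamma(s, k, n_bits):
    """Recover the tree spectrum: gamma = H^-1 rho, rho = k * (1 - r^(-1/k)), r = H s."""
    h = hadamard(n_bits)
    r = mat_vec(h, s)
    rho = [k * (1 - ri ** (-1 / k)) for ri in r]
    return [v / len(rho) for v in mat_vec(h, rho)]
```

A smaller shape k means more rate variation; applying the plain logarithm to data generated this way would underestimate the edge entries, which is the "hidden" multiple-substitution effect the abstract refers to.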
35
36
Abstract
The extent of terrestrial vertebrate extinctions at the end of the Cretaceous is poorly understood, and estimates have ranged from a mass extinction to limited extinctions of specific groups. Molecular and paleontological data demonstrate that modern bird orders started diverging in the Early Cretaceous; at least 22 avian lineages of modern birds cross the Cretaceous-Tertiary boundary. Data for several other terrestrial vertebrate groups indicate a similar pattern of survival and, taken together, favor incremental changes during a Cretaceous diversification of birds and mammals rather than an explosive radiation in the Early Tertiary.
37
Schardl CL, Leuchtmann A, Chung KR, Penny D, Siegel MR. Coevolution by Common Descent of Fungal Symbionts (Epichloe spp.) and Grass Hosts. Mol Biol Evol 1997. [DOI: 10.1093/oxfordjournals.molbev.a025746] [Citation(s) in RCA: 134] [Impact Index Per Article: 5.0] [Indexed: 11/14/2022]
38
Lockhart PJ, Larkum AW, Steel M, Waddell PJ, Penny D. Evolution of chlorophyll and bacteriochlorophyll: the problem of invariant sites in sequence analysis. Proc Natl Acad Sci U S A 1996; 93:1930-4. [PMID: 8700861 PMCID: PMC39885 DOI: 10.1073/pnas.93.5.1930] [Citation(s) in RCA: 150] [Impact Index Per Article: 5.4] [Indexed: 02/01/2023]
Abstract
Competing hypotheses seek to explain the evolution of oxygenic and anoxygenic processes of photosynthesis. Since chlorophyll is less reduced and precedes bacteriochlorophyll on the modern biosynthetic pathway, it has been proposed that chlorophyll preceded bacteriochlorophyll in its evolution. However, recent analyses of nucleotide sequences that encode chlorophyll and bacteriochlorophyll biosynthetic enzymes appear to provide support for an alternative hypothesis. This is that the evolution of bacteriochlorophyll occurred earlier than the evolution of chlorophyll. Here we demonstrate that the presence of invariant sites in sequence datasets leads to inconsistency in tree building (including maximum-likelihood methods). Homologous sequences with different biological functions often share invariant sites at the same nucleotide positions. However, different constraints can also result in additional invariant sites unique to the genes, which have specific and different biological functions. Consequently, the distribution of these sites can be uneven between the different types of homologous genes. The presence of invariant sites, shared by related biosynthetic genes as well as those unique to only some of these genes, has misled the recent evolutionary analysis of oxygenic and anoxygenic photosynthetic pigments. We evaluate an alternative scheme for the evolution of chlorophyll and bacteriochlorophyll.
39
Hickson RE, Simon C, Cooper A, Spicer GS, Sullivan J, Penny D. Conserved sequence motifs, alignment, and secondary structure for the third domain of animal 12S rRNA. Mol Biol Evol 1996; 13:150-69. [PMID: 8583888 DOI: 10.1093/oxfordjournals.molbev.a025552] [Citation(s) in RCA: 185] [Impact Index Per Article: 6.6] [Indexed: 01/31/2023]
Abstract
Secondary structure models are an important step for aligning sequences, understanding probabilities of nucleotide substitutions, and evaluating the reliability of phylogenetic reconstructions. A set of conserved sequence motifs is derived from comparative sequence analysis of 184 invertebrate and vertebrate taxa (including many taxa from the same genera, families, and orders) with reference to a secondary structure model for domain III of animal mitochondrial small subunit (12S) ribosomal RNA. A template is presented to assist with secondary structure drawing. Our model is similar to previous models but is more specific to mitochondrial DNA, fitting both invertebrate and vertebrate groups, including taxa with markedly different nucleotide compositions. The second half of the domain III sequence can be difficult to align precisely, even when secondary structure information is considered. This is especially true for comparisons of anciently diverged taxa, but well-conserved motifs assist in determining biologically meaningful alignments. Patterns of conservation and variability in both paired and unpaired regions make differential phylogenetic weighting in terms of "stems" and "loops" unsatisfactory. We emphasize looking carefully at the sequence data before and during analyses, and advocate the use of conserved motifs and other secondary structure information for assessing sequencing fidelity.
40
Hendy MD, Penny D. Complete families of linear invariants for some stochastic models of sequence evolution, with and without the molecular clock assumption. J Comput Biol 1996; 3:19-31. [PMID: 8697236 DOI: 10.1089/cmb.1996.3.19] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.4] [Indexed: 02/01/2023]
Abstract
For various models of sequence evolution, the set of linear functions of the frequencies of the nucleotide patterns forms a vector space, the invariant space. Here we distinguish between the model of nucleotide substitution, and the phylogenetic tree T describing the paths on which these changes occur. We describe a procedure to construct a basis of the invariant space for those models that are extensions of models incorporating Kimura's three-substitution-type (3ST) model of nucleotide change, including both the Jukes-Cantor and Cavender-Farris models. The dimension of the invariant space is determined, for those models where it is independent of the tree topology, as a function of the number of sequences. These are calculated where the nucleotide distribution at the root is unspecified, and both with, and without, the assumption of the molecular clock hypothesis. The invariants have a number of potential applications, including tree identification, and testing the fit of models (which could include the molecular clock) to sequence data.
41
Penny D, Steel M, Waddell PJ, Hendy MD. Improved analyses of human mtDNA sequences support a recent African origin for Homo sapiens. Mol Biol Evol 1995; 12:863-82. [PMID: 7476132 DOI: 10.1093/oxfordjournals.molbev.a040263] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 01/25/2023]
Abstract
New quantitative methods are applied to the 135 human mitochondrial sequences from the Vigilant et al. data set. General problems in analyzing large numbers of short sequences are discussed, and an improved strategy is suggested. A key feature is to focus not on individual trees but on the general "landscape" of trees. Over 1,000 searches were made from random starting trees with only one tree (a local optimum) being retained each time, thereby ensuring optima were found independently. A new tree comparison metric was developed that is unaffected by rearrangements of trees around many very short internal edges. Use of this metric showed that downweighting hypervariable sites revealed more evolutionary structure than studies that weighted all sites equally. Our results are consistent with convergence toward a global optimum. Crucial features are that the best optima show very strong regional differentiation, a common group of 49 African sequences is found in all the best optima, and the best optima contain the 16 !Kung sequences in a separate group of San people. The other 86 sequences form a heterogeneous mixture of Africans, Europeans, Australopapuans, and Asians. Thus all major human lineages occur in Africa, but only a subset occurs in the rest of the world. The existence of these African-only groups strongly contradicts multiregional theories for the origin of Homo sapiens that require widespread migration and interbreeding over the entire range of H. erectus. Only when the multiregional model is rejected is it appropriate to consider the root, based on a single locus, to be the center of origin of a population (otherwise different loci could give alternative geographic positions for the root). For these data, several methods locate the root within the group of 49 African sequences and are thus consistent with the recent African origin of H. sapiens.
We demonstrate that the time of the last common ancestor cannot be the time of major expansion in human numbers, and our results are thus also consistent with recent models that differentiate between the last common ancestor, expansion out of Africa, and the major expansion in human populations. Such a two-phase model is consistent with a wide range of molecular and archeological evidence.
42
43
Abstract
We describe techniques for assessing evolutionary trees constructed by the parsimony criterion, when sequences exhibit irregular base compositions. In particular, we extend a recently described frequency-dependent significance test to handle any number of taxa and describe a modification of the Kishino-Hasegawa sites test. These modifications are useful for detecting historical signals beyond those patterns which arise purely from irregular base compositions between the compared sequences. We apply the test to extend our earlier studies on chloroplast origins using 16S rDNA sequences, where a failure to compensate for irregular base compositions between the compared sequences provides statistically significant support for unjustified phylogenetic inferences. We also describe how the techniques can be modified to determine how "tree-like" data are, given independent variation in the base frequencies.
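As background for the modified sites test mentioned above, here is a sketch of the basic Kishino-Hasegawa comparison for parsimony: per-site step counts on two trees are differenced, and the summed difference is compared with its standard error. This is the standard normal-approximation version, not the composition-aware modification the abstract describes; the function name and inputs are assumptions.

```python
import math


def kishino_hasegawa(steps_a, steps_b):
    """Paired-sites comparison of two trees' parsimony scores.

    steps_a[i] and steps_b[i] are the numbers of steps site i needs on tree A
    and tree B. Returns (total difference, standard error, z). Under a normal
    approximation, |z| > 1.96 suggests the trees differ at about the 5% level.
    """
    if len(steps_a) != len(steps_b) or len(steps_a) < 2:
        raise ValueError("need per-site scores for the same sites, at least two")
    diffs = [a - b for a, b in zip(steps_a, steps_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance per site
    total = sum(diffs)
    se = math.sqrt(n * variance)  # standard error of the summed difference
    if se == 0.0:
        z = 0.0 if total == 0 else math.copysign(math.inf, total)
    else:
        z = total / se
    return total, se, z
```

The test assumes both trees were chosen before looking at the data; when one of them is the best tree found by a search, the plain version is known to be overconfident.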
44
Lento GM, Hickson RE, Chambers GK, Penny D. Use of spectral analysis to test hypotheses on the origin of pinnipeds. Mol Biol Evol 1995; 12:28-52. [PMID: 7877495 DOI: 10.1093/oxfordjournals.molbev.a040189] [Citation(s) in RCA: 106] [Impact Index Per Article: 3.7] [Indexed: 01/27/2023]
Abstract
The evolutionary origin of the pinnipeds (seals, sea lions, and walruses) is still uncertain. Most authors support a hypothesis of a monophyletic origin of the pinnipeds from a caniform carnivore. A minority view suggests a diphyletic origin with true seals being related to the mustelids (otters and ferrets). The phylogenetic relationships of the walrus to other pinniped and carnivore families are also still particularly problematic. Here we examined the relative support for mono- and diphyletic hypotheses using DNA sequence data from the mitochondrial small subunit (12S) rRNA and cytochrome b genes. We first analyzed a small group of taxa representing the three pinniped families (Phocidae, Otariidae, and Odobenidae) and caniform carnivore families thought to be related to them. We inferred phylogenetic reconstructions from DNA sequence data using standard parsimony and neighbor-joining algorithms for phylogenetic inference as well as a new method called spectral analysis (Hendy and Penny) in which phylogenetic information is displayed independently of any selected tree. We identified and compensated for potential sources of error known to lead to selection of incorrect phylogenetic trees. These include sampling error, unequal evolutionary rates on lineages, unequal nucleotide composition among lineages, unequal rates of change at different sites, and inappropriate tree selection criteria. To correct for these errors, we performed additional transformations of the observed substitution patterns in the sequence data, applied more stringent structural constraints to the analyses, and included several additional taxa to help resolve long, unbranched lineages in the tree. We find that there is strong support for a monophyletic origin of the pinnipeds from within the caniform carnivores, close to the bear/raccoon/panda radiation. Evidence for a diphyletic origin was very weak and can be partially attributed to unequal nucleotide compositions among the taxa analyzed. 
There is also slightly more evidence for grouping the walrus with the eared seals than with the true seals. A more conservative interpretation, however, is that the walrus is an early, but not the first, independent divergence from the common pinniped ancestor.
45
Redington A, Penny D. Regional ventricular wall motion abnormalities in tricuspid atresia after the Fontan procedure: flawed methodology may lead to a spurious finding of hypokinesia. J Am Coll Cardiol 1994; 24:271. [PMID: 8006279 DOI: 10.1016/0735-1097(94)90575-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 01/28/2023]
Abstract
We believe that the two-frame method described by Akagi et al. cannot adequately describe the highly abnormal wall motion characteristics of these post-Fontan ventricles, and the systolic hypokinesia they describe may be spurious. Our data show that the predominant abnormality is incoordinate relaxation of the ventricular wall, which in turn prolongs the time constant of relaxation and the isovolumetric relaxation time and leads to reduced early rapid filling. Indeed, it was these abnormalities of diastolic, not systolic, function that were the strongest predictor of poor exercise performance in our study of patients late after the Fontan procedure. We strongly believe that the analysis of ventricular wall motion requires sequential data throughout the cardiac cycle, with well defined reference points concerning the timing of cardiac events, so that misinterpretation can be avoided.
46
Hendy MD, Penny D, Steel MA. A discrete Fourier analysis for evolutionary trees. Proc Natl Acad Sci U S A 1994; 91:3339-43. [PMID: 8159749 PMCID: PMC43572 DOI: 10.1073/pnas.91.8.3339] [Citation(s) in RCA: 58] [Impact Index Per Article: 1.9] [Indexed: 01/29/2023]
Abstract
Discrete Fourier transformations have recently been developed to model the evolution of two-state characters (the Cavender/Farris model). We report here the extension of these transformations to provide invertible relationships between a phylogenetic tree T (with three probability parameters of nucleotide substitution on each edge corresponding to Kimura's 3ST model) and the expected frequencies of the nucleotide patterns in the sequences. We refer to these relationships as spectral analysis. In either model with independent and identically distributed site substitutions, spectral analysis allows a global correction for all multiple substitutions (second- and higher-order interactions), independent of any particular tree. From these corrected data we use a least-squares selection procedure, the closest tree algorithm, to infer an evolutionary tree. Other selection criteria such as parsimony or compatibility analysis could also be used; each of these criteria will be statistically consistent for these models. The closest tree algorithm selects a unique best-fit phylogenetic tree together with independent edge length parameters for each edge. The method is illustrated with an analysis of some primate hemoglobin sequences.
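The spectral analysis summarized above can be sketched for the two-state (Cavender/Farris) case, leaving the Kimura 3ST extension aside. Assumptions: splits of n taxa are indexed by bitmasks over taxa 1..n-1 (taxon 0 is the reference), the Hadamard matrix is built from parities of bitwise AND, and the helper names are hypothetical. The tree spectrum γ has one entry per edge split, with the entry for the empty split equal to minus the sum of the edge entries.

```python
import math


def hadamard(n_bits):
    """Sylvester Hadamard matrix H[i][j] = (-1)^popcount(i & j), size 2^n_bits."""
    size = 1 << n_bits
    return [[-1 if bin(i & j).count("1") % 2 else 1 for j in range(size)]
            for i in range(size)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def expected_spectrum(gamma, n_bits):
    """s = H^-1 exp(H gamma): expected split-pattern frequencies on a tree."""
    h = hadamard(n_bits)
    r = [math.exp(x) for x in mat_vec(h, gamma)]
    return [v / len(r) for v in mat_vec(h, r)]  # H^-1 = H / 2^n_bits


def tree_spectrum(s, n_bits):
    """gamma = H^-1 ln(H s): corrects for all multiple substitutions at once."""
    h = hadamard(n_bits)
    rho = [math.log(x) for x in mat_vec(h, s)]
    return [v / len(rho) for v in mat_vec(h, rho)]
```

Starting from a four-taxon tree, the round trip recovers the edge entries and leaves the two splits absent from the tree at (numerically) zero, which is what lets the corrected spectrum be inspected before any tree is chosen.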
47
Charleston MA, Hendy MD, Penny D. The effects of sequence length, tree topology, and number of taxa on the performance of phylogenetic methods. J Comput Biol 1994; 1:133-51. [PMID: 8790460 DOI: 10.1089/cmb.1994.1.133] [Citation(s) in RCA: 26] [Impact Index Per Article: 0.9] [Indexed: 02/02/2023]
Abstract
Simulations were used to study the performance of several character-based and distance-based phylogenetic methods in obtaining the correct tree from pseudo-randomly generated input data. The study included all the topologies of unrooted binary trees with from 4 to 10 pendant vertices (taxa) inclusive. The length of the character sequences used ranged exponentially from 10 to 10^5 characters. The methods studied include Closest Tree, Compatibility, Li's method, Maximum Parsimony, Neighbor-joining, Neighborliness, and UPGMA. We also provide a modification to Li's method (SimpLi) which is consistent with additive data. We give estimations of the sequence lengths required for given confidence in the output of these methods under the assumptions of molecular evolution used in this study. A notation for characterizing all tree topologies is described. We show that when the number of taxa, the maximum path length, and the minimum edge length are held constant, there is little but significant dependence of the performance of the methods on the tree topology. We show that those methods that are consistent with the model used perform similarly, whereas the inconsistent methods, UPGMA and Li's method, perform very poorly.
48
49
50
Steel MA, Lockhart PJ, Penny D. Confidence in evolutionary trees from biological sequence data. Nature 1993; 364:440-2. [PMID: 8332213 DOI: 10.1038/364440a0] [Citation(s) in RCA: 122] [Impact Index Per Article: 3.9] [Indexed: 01/29/2023]
Abstract
The reliable construction of evolutionary trees from nucleotide sequences often depends on randomization tests such as the bootstrap and PTP (cladistic permutation tail probability) tests. The genomes of bacteria, viruses, animals and plants, however, vary widely in their nucleotide frequencies. Where genomes have independently acquired similar G+C base compositions, signals in the data arise that cause methods of evolutionary tree reconstruction to estimate the wrong tree by grouping together sequences with similar G+C content. Under these conditions randomization tests can lead to both the rejection of the correct evolutionary hypothesis and acceptance of an incorrect hypothesis (such as with the contradictory inferences from the photosynthetic rbcS and rbcL sequences). We have proposed one approach to testing for the G+C content problem. Here we present a formalization of this method, a frequency-dependent significance test, which has general application.
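A common first check for the compositional heterogeneity discussed in this abstract is a chi-square test of homogeneity of base counts across sequences. The sketch below is that generic contingency-table test, not the frequency-dependent significance test the authors propose; it treats sites as independent, which ignores shared ancestry, so it should be read as a rough screen. The function names are our own.

```python
def gc_content(seq):
    """Fraction of G and C among the characters of an aligned sequence."""
    return sum(1 for base in seq if base in "GC") / len(seq)


def composition_chi_square(sequences, alphabet="ACGT"):
    """Chi-square statistic and degrees of freedom for base-count homogeneity."""
    counts = [[seq.count(base) for base in alphabet] for seq in sequences]
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            if expected > 0:  # skip states absent from every sequence
                chi2 += (observed - expected) ** 2 / expected
    df = (len(sequences) - 1) * (len(alphabet) - 1)
    return chi2, df
```

For two DNA sequences (df = 3) the 5% critical value is about 7.81, so a pair such as "GGGGCCCCAT" and "AAAATTTTGC" (statistic 7.2) would not be flagged despite G+C contents of 0.8 and 0.2, a reminder that such screens have little power on short sequences.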