1
Azouri D, Granit O, Alburquerque M, Mansour Y, Pupko T, Mayrose I. The Tree Reconstruction Game: Phylogenetic Reconstruction Using Reinforcement Learning. Mol Biol Evol 2024; 41:msae105. [PMID: 38829798 PMCID: PMC11180600 DOI: 10.1093/molbev/msae105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Revised: 05/17/2024] [Accepted: 05/28/2024] [Indexed: 06/05/2024]
Abstract
The computational search for the maximum-likelihood phylogenetic tree is an NP-hard problem. As such, current tree search algorithms might result in a tree that is the local optima, not the global one. Here, we introduce a paradigm shift for predicting the maximum-likelihood tree, by approximating long-term gains of likelihood rather than maximizing likelihood gain at each step of the search. Our proposed approach harnesses the power of reinforcement learning to learn an optimal search strategy, aiming at the global optimum of the search space. We show that when analyzing empirical data containing dozens of sequences, the log-likelihood improvement from the starting tree obtained by the reinforcement learning-based agent was 0.969 or higher compared to that achieved by current state-of-the-art techniques. Notably, this performance is attained without the need to perform costly likelihood optimizations apart from the training process, thus potentially allowing for an exponential increase in runtime. We exemplify this for data sets containing 15 sequences of length 18,000 bp and demonstrate that the reinforcement learning-based method is roughly three times faster than the state-of-the-art software. This study illustrates the potential of reinforcement learning in addressing the challenges of phylogenetic tree reconstruction.
Affiliation(s)
- Dana Azouri
  - School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
  - The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- Oz Granit
  - Blavatnik School of Computer Science, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- Michael Alburquerque
  - The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- Yishay Mansour
  - Blavatnik School of Computer Science, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
- Itay Mayrose
  - School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
2
Tao Q, Barba-Montoya J, Huuki LA, Durnan MK, Kumar S. Relative Efficiencies of Simple and Complex Substitution Models in Estimating Divergence Times in Phylogenomics. Mol Biol Evol 2020; 37:1819-1831. [PMID: 32119075 PMCID: PMC7253201 DOI: 10.1093/molbev/msaa049] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022]
Abstract
The conventional wisdom in molecular evolution is to apply parameter-rich models of nucleotide and amino acid substitutions for estimating divergence times. However, the actual extent of the difference between time estimates produced by highly complex models compared with those from simple models is yet to be quantified for contemporary data sets that frequently contain sequences from many species and genes. In a reanalysis of many large multispecies alignments from diverse groups of taxa, we found that the use of the simplest models can produce divergence time estimates and credibility intervals similar to those obtained from the complex models applied in the original studies. This result is surprising because the use of simple models underestimates sequence divergence for all the data sets analyzed. We found three fundamental reasons for the observed robustness of time estimates to model complexity in many practical data sets. First, the estimates of branch lengths and node-to-tip distances under the simplest model show an approximately linear relationship with those produced by using the most complex models applied on data sets with many sequences. Second, relaxed clock methods automatically adjust rates on branches that experience considerable underestimation of sequence divergences, resulting in time estimates that are similar to those from complex models. And, third, the inclusion of even a few good calibrations in an analysis can reduce the difference in time estimates from simple and complex models. The robustness of time estimates to model complexity in these empirical data analyses is encouraging, because all phylogenomics studies use statistical models that are oversimplified descriptions of actual evolutionary substitution processes.
Affiliation(s)
- Qiqing Tao
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Jose Barba-Montoya
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Louise A Huuki
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Mary Kathleen Durnan
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
3
Azouri D, Abadi S, Mansour Y, Mayrose I, Pupko T. Harnessing machine learning to guide phylogenetic-tree search algorithms. Nat Commun 2021; 12:1983. [PMID: 33790270 PMCID: PMC8012635 DOI: 10.1038/s41467-021-22073-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/23/2020] [Accepted: 02/26/2021] [Indexed: 02/01/2023]
Abstract
Inferring a phylogenetic tree is a fundamental challenge in evolutionary studies. Current paradigms for phylogenetic tree reconstruction rely on performing costly likelihood optimizations. With the aim of making tree inference feasible for problems involving more than a handful of sequences, inference under the maximum-likelihood paradigm integrates heuristic approaches to evaluate only a subset of all potential trees. Consequently, existing methods suffer from the known tradeoff between accuracy and running time. In this proof-of-concept study, we train a machine-learning algorithm over an extensive cohort of empirical data to predict the neighboring trees that increase the likelihood, without actually computing their likelihood. This provides means to safely discard a large set of the search space, thus potentially accelerating heuristic tree searches without losing accuracy. Our analyses suggest that machine learning can guide tree-search methodologies towards the most promising candidate trees.
Affiliation(s)
- Dana Azouri
  - School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel
  - The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel
- Shiran Abadi
  - School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel
- Yishay Mansour
  - Blavatnik School of Computer Science, Tel-Aviv University, Ramat Aviv, Tel-Aviv, Israel
- Itay Mayrose
  - School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel
4
Song J, Sun J, Kim S. Geographic variation of Granulilittorina exigua (Littorinidae, Gastropoda) in Korea based on the mitochondrial cytochrome b gene sequence. 2010. [DOI: 10.1080/12265071.2000.9647555] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/17/2022]
5
The effect of heterotachy in multigene analysis using the neighbor joining method. Mol Phylogenet Evol 2009; 52:846-51. [DOI: 10.1016/j.ympev.2009.05.025] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 12/30/2008] [Revised: 05/12/2009] [Accepted: 05/23/2009] [Indexed: 11/22/2022]
6
Roettger M, Martin W, Dagan T. A machine-learning approach reveals that alignment properties alone can accurately predict inference of lateral gene transfer from discordant phylogenies. Mol Biol Evol 2009; 26:1931-9. [PMID: 19443855 DOI: 10.1093/molbev/msp105] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
Among the methods currently used in phylogenomic practice to detect the presence of lateral gene transfer (LGT), one of the most frequently employed is the comparison of gene tree topologies for different genes. In cases where the phylogenies for different genes are incompatible, or discordant, for well-supported branches there are three simple interpretations for the result: 1) gene duplications (paralogy) followed by many independent gene losses have occurred, 2) LGT has occurred, or 3) the phylogeny is well supported but for reasons unknown is nonetheless incorrect. Here, we focus on the third possibility by examining the properties of 22,437 published multiple sequence alignments, the Bayesian maximum likelihood trees for which either do or do not suggest the occurrence of LGT by the criterion of discordant branches. The alignments that produce discordant phylogenies differ significantly in several salient alignment properties from those that do not. Using a support vector machine, we were able to predict the inference of discordant tree topologies with up to 80% accuracy from alignment properties alone.
Affiliation(s)
- Mayo Roettger
  - Institut für Botanik III, Heinrich-Heine Universität Düsseldorf, Germany
7
Gong L, Song X, Li M, Guo W, Hu L, Tian Q, Yang Y, Zhang Y, Zhong X, Wang D, Liu B. Extent and pattern of genetic differentiation within and between phenotypic populations of Leymus chinensis (Poaceae) revealed by AFLP analysis. 2007. [DOI: 10.1139/b07-072] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
The extent and pattern of genetic differentiation between two naturally occurring phenotypes, grey–green leaf (GGL) and yellow–green leaf (YGL), of Leymus chinensis (Trin.) Tzvel., which colonize distinct habitats in the Songnen Prairie in northeast China, were investigated by amplified fragment length polymorphism (AFLP) analysis. Twelve selected AFLP primer pairs amplified 593 reproducible bands, of which 148 (24.96%) were polymorphic among 69 individuals taken from three populations: two natural ones (YGL and GGL1) and one transplanted (GGL2). Cluster analysis based on the AFLP data categorized the plants into distinct groups that are in line with their phenotypes and population origins, thus denoting clear genetic differentiation between the two phenotypes. This, together with their adaptation to contrasting natural habitats, suggests that the two phenotypes probably represent stabilized ecotypes. The grouping was supported by multiple statistical analyses including Mantel’s test, principal coordinate analysis (PCOORDA), and analysis of molecular variance (AMOVA). The GGL phenotype harbors a higher level of within-population genetic diversity than YGL, possibly reflecting selection by habitat heterogeneity. Although GGL2 is largely similar to its original population (GGL1), further diversification since transplantation was evident. Sequence analysis of a subset of phenotype-specific or phenotype-enriched AFLP bands implicated diverse biological functions being involved in ecological adaptation and formation of the two phenotypes.
Affiliation(s)
- Lei Gong
  - Laboratory of Plant Molecular Epigenetics, Institute of Genetics and Cytology, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Grassland Vegetation of the Ministry of Education, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Applied Statistics of the Ministry of Education, Northeast Normal University, Changchun 130024, China
- Xinxin Song
  - Laboratory of Plant Molecular Epigenetics, Institute of Genetics and Cytology, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Grassland Vegetation of the Ministry of Education, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Applied Statistics of the Ministry of Education, Northeast Normal University, Changchun 130024, China
- Mu Li
  - Laboratory of Plant Molecular Epigenetics, Institute of Genetics and Cytology, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Grassland Vegetation of the Ministry of Education, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Applied Statistics of the Ministry of Education, Northeast Normal University, Changchun 130024, China
- Wanli Guo
  - Laboratory of Plant Molecular Epigenetics, Institute of Genetics and Cytology, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Grassland Vegetation of the Ministry of Education, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Applied Statistics of the Ministry of Education, Northeast Normal University, Changchun 130024, China
- Lanjuan Hu
  - Laboratory of Plant Molecular Epigenetics, Institute of Genetics and Cytology, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Grassland Vegetation of the Ministry of Education, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Applied Statistics of the Ministry of Education, Northeast Normal University, Changchun 130024, China
- Qin Tian
  - Laboratory of Plant Molecular Epigenetics, Institute of Genetics and Cytology, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Grassland Vegetation of the Ministry of Education, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Applied Statistics of the Ministry of Education, Northeast Normal University, Changchun 130024, China
- Yunfei Yang
  - Laboratory of Plant Molecular Epigenetics, Institute of Genetics and Cytology, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Grassland Vegetation of the Ministry of Education, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Applied Statistics of the Ministry of Education, Northeast Normal University, Changchun 130024, China
- Yufei Zhang
  - Laboratory of Plant Molecular Epigenetics, Institute of Genetics and Cytology, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Grassland Vegetation of the Ministry of Education, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Applied Statistics of the Ministry of Education, Northeast Normal University, Changchun 130024, China
- Xiaofang Zhong
  - Laboratory of Plant Molecular Epigenetics, Institute of Genetics and Cytology, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Grassland Vegetation of the Ministry of Education, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Applied Statistics of the Ministry of Education, Northeast Normal University, Changchun 130024, China
- Deli Wang
  - Laboratory of Plant Molecular Epigenetics, Institute of Genetics and Cytology, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Grassland Vegetation of the Ministry of Education, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Applied Statistics of the Ministry of Education, Northeast Normal University, Changchun 130024, China
- Bao Liu
  - Laboratory of Plant Molecular Epigenetics, Institute of Genetics and Cytology, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Grassland Vegetation of the Ministry of Education, Northeast Normal University, Changchun 130024, China
  - Key Laboratory for Applied Statistics of the Ministry of Education, Northeast Normal University, Changchun 130024, China
8
Pol D, Siddall ME. Biases in Maximum Likelihood and Parsimony: A Simulation Approach to a 10-Taxon Case. Cladistics 2001; 17:266-281. [DOI: 10.1111/j.1096-0031.2001.tb00123.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
9
Zhang X, Marchant A, Wilson KL, Bruhl JJ. Phylogenetic relationships of Carpha and its relatives (Schoeneae, Cyperaceae) inferred from chloroplast trnL intron and trnL–trnF intergenic spacer sequences. Mol Phylogenet Evol 2004; 31:647-57. [PMID: 15062800 DOI: 10.1016/j.ympev.2003.09.004] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 05/12/2003] [Revised: 09/08/2003] [Indexed: 10/26/2022]
Abstract
Within the tribe Schoeneae (Cyperaceae), the relationships between Carpha and its relatives have not been certain, and the limits and definition of Carpha have been controversial. Further, the relationships of species within Carpha have been unclear. In this study, cladistic analyses based on chloroplast trnL intron and trnL-trnF intergenic spacer sequence data were undertaken to estimate phylogenetic relationships in and around Carpha. This study found that Trianoptiles is sister to Carpha; Ptilothrix is sister to Cyathochaeta rather than to Carpha as suggested by some former authors; and Gymnoschoenus is distant from Carpha and its close relatives. The merging of Schoenoides back into Oreobolus is supported. The findings also revealed the non-monophyletic status of Costularia and of Schoenus, and indicated the phylogenetic relationships of species within Carpha.
Affiliation(s)
- Xiufu Zhang
  - Botany, University of New England, Armidale NSW 2351, Australia
10
Schwarzott D, Walker C, Schüssler A. Glomus, the largest genus of the arbuscular mycorrhizal fungi (Glomales), is nonmonophyletic. Mol Phylogenet Evol 2001; 21:190-7. [PMID: 11697915 DOI: 10.1006/mpev.2001.1007] [Citation(s) in RCA: 131] [Impact Index Per Article: 5.7] [Indexed: 11/22/2022]
Abstract
Arbuscular mycorrhizal (AM) fungi form a widespread and ecologically important symbiosis with plants in the land ecosystem. The phylogeny of the largest presently accepted genus, Glomus, of the monogeneric family Glomaceae (Glomales; AM fungi) was analyzed. Phylogenetic trees were computed from nearly full-length SSU rRNA gene sequences of 30 isolates, and show that "Glomus" is not monophyletic. Even after the very recent separation of Archaeospora and Paraglomus from "Glomus," the genus further separates into two suprageneric clades. One of them diverges further into two subclades, differing by phylogenetic distances equivalent to family level. The other, comprising Glomus versiforme, G. spurcum, and a species morphologically similar to G. etunicatum, is not closely related to the Glomaceae, but clusters together with the Acaulosporaceae and Gigasporaceae in a monophyletic clade. Based on the molecular evidence, a new family, separate from the Glomaceae, is required to accommodate this group of organisms, initially named Diversisporaceae fam. ined. The current taxonomic concept of the recently erected family Archaeosporaceae also requires future emendation, because Geosiphon pyriformis (Geosiphonaceae) renders Archaeospora, the sole genus formally included in this family, paraphyletic. The suborders Gigasporineae and Glomineae are not congruent with the natural phylogeny of the AM fungi. Our data necessitate a general reexamination of the generic concepts within the Glomales. In addition to the new family structure hypothesized herein, establishment of at least three new genera will be necessary in the future.
Affiliation(s)
- D Schwarzott
  - Institute of Botany, Technische Universität Darmstadt, Schnittspahnstrasse 10, D-64287 Darmstadt, Germany
11
Analysis of partial Glomales SSU rRNA gene sequences: implications for primer design and phylogeny. 2001. [DOI: 10.1017/s0953756200003725] [Citation(s) in RCA: 101] [Impact Index Per Article: 4.4] [Indexed: 11/06/2022]
12
Murphy WJ, Thomerson JE, Collier GE. Phylogeny of the Neotropical killifish family Rivulidae (Cyprinodontiformes, Aplocheiloidei) inferred from mitochondrial DNA sequences. Mol Phylogenet Evol 1999; 13:289-301. [PMID: 10603257 DOI: 10.1006/mpev.1999.0656] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.0] [Indexed: 11/22/2022]
Abstract
Phylogenetic relationships of 70 taxa representing 68 species of the Neotropical killifish family Rivulidae were derived from analysis of 1516 nucleotides sampled from four different segments of the mitochondrial genome: 12S rRNA, 16S rRNA, cytochrome oxidase I, and cytochrome b. The basal bifurcation of Cynolebiatinae and Rivulinae (Costa, 1990a,b) is supported; however, Terranatos, Maratecoara, and Plesiolebias are rivulins, not cynolebiatins. These three genera, along with the other recognized annual rivulin genera, form a monophyletic clade. Austrofundulus, Rachovia, Renova, Terranatos, and 3 species of the genus Pterolebias, all from northeastern South America, form a monophyletic clade excluding other species of Pterolebias. Pterolebias as presently understood is clearly polyphyletic. Trigonectes and Moema are supported as sister groups but do not form a monophyletic group with the genera Neofundulus and Renova as previously proposed. The suite of adaptations necessary for an annual life history has clearly been lost several times in the course of rivulid evolution. Also revealed is a considerable increase in substitution rate in most annual lineages relative to the nonannual Rivulus species. The widespread and speciose genus Rivulus is paraphyletic, representing both basal and terminal clades within the Rivulidae. Previous hypotheses regarding the vicariant origin of Greater Antillean Rivulus species are supported. Most rivulid clades show considerable endemism; thus, detailed analysis of rivulid phylogeny and distribution will contribute robust hypotheses to the clarification of Neotropical biogeography.
Affiliation(s)
- W J Murphy
  - Department of Biological Sciences, The University of Tulsa, 600 South College Avenue, Tulsa, Oklahoma 74104, USA
13
Siddall ME. Success of Parsimony in the Four-Taxon Case: Long-Branch Repulsion by Likelihood in the Farris Zone. Cladistics 1998; 14:209-220. [DOI: 10.1111/j.1096-0031.1998.tb00334.x] [Citation(s) in RCA: 156] [Impact Index Per Article: 6.0] [Indexed: 11/27/2022]
14
Mahieux R, Ibrahim F, Mauclere P, Herve V, Michel P, Tekaia F, Chappey C, Garin B, Van Der Ryst E, Guillemain B, Ledru E, Delaporte E, de The G, Gessain A. Molecular epidemiology of 58 new African human T-cell leukemia virus type 1 (HTLV-1) strains: identification of a new and distinct HTLV-1 molecular subtype in Central Africa and in Pygmies. J Virol 1997; 71:1317-33. [PMID: 8995656 PMCID: PMC191187 DOI: 10.1128/jvi.71.2.1317-1333.1997] [Citation(s) in RCA: 105] [Impact Index Per Article: 3.9] [Indexed: 02/03/2023]
Abstract
To gain new insights on the origin, evolution, and modes of dissemination of human T-cell leukemia virus type I (HTLV-1), we performed a molecular analysis of 58 new African HTLV-1 strains (18 from West Africa, 36 from Central Africa, and 4 from South Africa) originating from 13 countries. Of particular interest were eight strains from Pygmies of remote areas of Cameroon and the Central African Republic (CAR), considered to be the oldest inhabitants of these regions. Eight long-term activated T-cell lines producing HTLV-1 gag and env antigens were established from peripheral blood mononuclear cell cultures of HTLV-1 seropositive individuals, including three from Pygmies. A fragment of the env gene encompassing most of the gp21 transmembrane region was sequenced for the 58 new strains, while the complete long terminal repeat (LTR) region was sequenced for 9 strains, including 4 from Pygmies. Comparative sequence analyses and phylogenetic studies performed on both the env and LTR regions by the neighbor-joining and DNA parsimony methods demonstrated that all 22 strains from West and South Africa belong to the widespread cosmopolitan subtype (also called HTLV-1 subtype A). Within or alongside the previously described Zairian cluster (HTLV-1 subtype B), we discovered a number of new HTLV-1 variants forming different subgroups corresponding mainly to the geographical origins of the infected persons, Cameroon, Gabon, and Zaire. Six of the eight Pygmy strains clustered together within this Central African subtype, suggesting a common origin. Furthermore, three new strains (two originating from Pygmies from Cameroon and the CAR, respectively, and one from a Gabonese individual) were particularly divergent and formed a distinct new phylogenetic cluster, characterized by specific mutations and occupying in most analyses a unique phylogenetic position between the large Central African genotype (HTLV-1 subtype B) and the Melanesian subtype (HTLV-1 subtype C). 
We have tentatively named this new HTLV-1 genotype HTLV-1 subtype D. While the HTLV-1 subtype D strains were not closely related to any known African strain of simian T-cell leukemia virus type 1 (STLV-1), other Pygmy strains and some of the new Cameroonian and Gabonese HTLV-1 strains were very similar (>98% nucleotide identity) to chimpanzee STLV-1 strains, reinforcing the hypothesis of interspecies transmission between humans and monkeys in Central Africa.
Affiliation(s)
- R Mahieux
  - Unité d'Epidémiologie des Virus Oncogènes, Institut Pasteur, Paris, France
15
Abstract
Recent developments of statistical methods in molecular phylogenetics are reviewed. It is shown that the mathematical foundations of these methods are not well established, but computer simulations and empirical data indicate that currently used methods such as neighbor joining, minimum evolution, likelihood, and parsimony methods produce reasonably good phylogenetic trees when a sufficiently large number of nucleotides or amino acids are used. However, when the rate of evolution varies extensively from branch to branch, many methods may fail to recover the true topology. Solid statistical tests for examining the accuracy of trees obtained by neighbor joining, minimum evolution, and least-squares method are available, but the methods for likelihood and parsimony trees are yet to be refined. Parsimony, likelihood, and distance methods can all be used for inferring amino acid sequences of the proteins of ancestral organisms that have become extinct.
Affiliation(s)
- M Nei
  - Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park 16802, USA
16
Leitner T, Escanilla D, Franzén C, Uhlén M, Albert J. Accurate reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis. Proc Natl Acad Sci U S A 1996; 93:10864-9. [PMID: 8855273 PMCID: PMC38248 DOI: 10.1073/pnas.93.20.10864] [Citation(s) in RCA: 177] [Impact Index Per Article: 6.3] [Indexed: 02/02/2023]
Abstract
Phylogenetic analyses are increasingly used in attempts to clarify transmission patterns of human immunodeficiency virus type 1 (HIV-1), but there is a continuing discussion about their validity because convergent evolution and transmission of minor HIV variants may obscure epidemiological patterns. Here we have studied a unique HIV-1 transmission cluster consisting of nine infected individuals, for whom the time and direction of each virus transmission was exactly known. Most of the transmissions occurred between 1981 and 1983, and a total of 13 blood samples were obtained approximately 2-12 years later. The p17 gag and env V3 regions of the HIV-1 genome were directly sequenced from uncultured lymphocytes. A true phylogenetic tree was constructed based on the knowledge about when the transmissions had occurred and when the samples were obtained. This complex, known HIV-1 transmission history was compared with reconstructed molecular trees, which were calculated from the DNA sequences by several commonly used phylogenetic inference methods [Fitch-Margoliash, neighbor-joining, minimum-evolution, maximum-likelihood, maximum-parsimony, unweighted pair group method using arithmetic averages (UPGMA), and a Fitch-Margoliash method assuming a molecular clock (KITSCH)]. A majority of the reconstructed trees were good estimates of the true phylogeny; 12 of 13 taxa were correctly positioned in the most accurate trees. The choice of gene fragment was found to be more important than the choice of phylogenetic method and substitution model. However, methods that are sensitive to unequal rates of change performed more poorly (such as UPGMA and KITSCH, which assume a constant molecular clock). The rapidly evolving V3 fragment gave better reconstructions than p17, but a combined data set of both p17 and V3 performed best. 
The accuracy of the phylogenetic methods justifies their use in HIV-1 research and argues against convergent evolution and selective transmission of certain virus variants.
Affiliation(s)
- T Leitner
  - Department of Clinical Virology, Swedish Institute for Infectious Disease Control, Karolinska Institute, Stockholm, Sweden
17
Gehrig H, Schüssler A, Kluge M. Geosiphon pyriforme, a fungus forming endocytobiosis with Nostoc (cyanobacteria), is an ancestral member of the Glomales: evidence by SSU rRNA analysis. J Mol Evol 1996; 43:71-81. [PMID: 8660431 DOI: 10.1007/bf02352301] [Citation(s) in RCA: 124] [Impact Index Per Article: 4.4] [Indexed: 02/01/2023]
Abstract
Geosiphon pyriforme inhabiting the surface of humid soils represents the only known example of endocytobiosis between a fungus (Zygomycotina; macrosymbiont) and cyanobacteria (Nostoc; endosymbiont). In order to elucidate the taxonomical and evolutionary relationship of Geosiphon pyriforme to fungi forming arbuscular mycorrhiza (AM fungi), the small-subunit (SSU) ribosomal RNA genes of Geosiphon pyriforme and Glomus versiforme (Glomales; a typical AM fungus) were analyzed and aligned with SSU rRNA sequences of several Basidiomycetes, Ascomycetes, Chytridiomycetes, and Zygomycetes, together with all AM-fungal (Glomales) sequences published yet. The distinct group of the order Glomales, which includes Geosiphon, does not form a clade with any other group of Zygomycetes. Within the Glomales, two main lineages exist. One includes the families Gigasporaceae and Acaulosporaceae; the other one is represented by the genus Glomus, the members of which are very divergent. Glomus etunicatum and Geosiphon pyriforme both form independent lineages ancestral to the Glomales. The data provided by the present paper confirm clearly that Geosiphon represents a fungus belonging to the Glomales. The question remains still open as to whether or not Geosiphon is to be placed within or outside the genus Glomus, since this genus is probably polyphyletic and not well defined yet. Geosiphon shows the ability of a Glomus-like fungus to form a "primitive" symbiosis with a unicellular photoautotrophic organism, in this case a cyanobacterium, leading to the conclusion that a hypothetical association of a Glomus-like fungus with a green alga as a step during the evolution of the land plants appears probable.
Affiliation(s)
- H Gehrig
- Institut für Botanik der Technischen Hochschule Darmstadt, Germany
|
18
|
Bunyard BA, Chaichuchote S, Nicholson MS, Royse DJ. Ribosomal DNA analysis for resolution of genotypic classes of Pleurotus. Mycol Res 1996. [DOI: 10.1016/s0953-7562(96)80112-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
|
19
|
Gessain A, de Thé G. Geographic and molecular epidemiology of primate T lymphotropic retroviruses: HTLV-I, HTLV-II, STLV-I, STLV-PP, and PTLV-L. Adv Virus Res 1996; 47:377-426. [PMID: 8895837 DOI: 10.1016/s0065-3527(08)60740-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.1] [Indexed: 02/02/2023]
Affiliation(s)
- A Gessain
- Département du SIDA et des Rétrovirus, Institut Pasteur, Paris, France
|
20
|
Gessain A, Mahieux R, de Thé G. Genetic variability and molecular epidemiology of human and simian T cell leukemia/lymphoma virus type I. J Acquir Immune Defic Syndr Hum Retrovirol 1996; 13 Suppl 1:S132-45. [PMID: 8797716 DOI: 10.1097/00042560-199600001-00022] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.3] [Indexed: 02/02/2023]
Abstract
In the past few years, numerous investigators have demonstrated that human T cell leukemia/lymphoma virus type I (HTLV-I) possesses a great genetic stability, and recent data indicate that viral amplification via clonal expansion of infected cells, rather than by reverse transcription, could explain this remarkable genetic stability. In parallel, the molecular epidemiology of HTLV-I proviruses showed that the few nucleotide changes observed between isolates were specific for the geographical origin of the patients but not for the type of the associated pathologies (adult T cell leukemia/lymphoma, tropical spastic paraparesis/HTLV-I-associated myelopathy). Thus, based on sequence and/or restriction fragment length polymorphism analysis of more than 250 HTLV-I isolates originating from the main viral endemic areas, three major molecular geographical subtypes (or genotypes) emerged, strongly supported by phylogenetic analysis (high bootstrap values). Each of these genotypes (Cosmopolitan, Central African, and Melanesian) appeared to arise from ancient interspecies transmission between monkeys infected with simian T cell leukemia/lymphoma virus type I and humans. Furthermore, careful sequences analyses indicate that, within (or alongside) these three main genotypes, there are molecular subgroups defined clearly by several specific mutations but not always supported by phylogenetic analyses. Thus in Japan, there is evidence for two ancestral HTLV-I lineages: the classical Cosmopolitan genotype, representing approximately 25% of the HTLV-I present in Japan and clustering in the southern islands; and a related subgroup that we called the Japanese group. Similarly, within the Central African cluster, there are molecular subgroups defined by specific substitutions in either the env or the long terminal repeat. 
Furthermore, recent data from our laboratory indicate the presence of a new molecular phylogenetic group (fourth genotype) found among inhabitants of Central Africa, particularly in Pygmies. While geographical subtypes vary from 2 to 8% between themselves, HTLV-I quasi-species present within an individual appear to be much lower, with a variability of < 0.5%.
Affiliation(s)
- A Gessain
- Département du SIDA et des Rétrovirus, Institut Pasteur, Paris, France
|
21
|
Ibrahim F, de Thé G, Gessain A. Isolation and characterization of a new simian T-cell leukemia virus type 1 from naturally infected celebes macaques (Macaca tonkeana): complete nucleotide sequence and phylogenetic relationship with the Australo-Melanesian human T-cell leukemia virus type 1. J Virol 1995; 69:6980-93. [PMID: 7474117 PMCID: PMC189617 DOI: 10.1128/jvi.69.11.6980-6993.1995] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.2] [Indexed: 01/25/2023] Open
Abstract
A study of simian T-cell leukemia virus type 1 (STLV-1) infection in a captive colony of 23 Macaca tonkeana macaques indicated that 17 animals had high human T-cell leukemia virus type 1 (HTLV-1) antibody titers. Genealogical analysis suggested mainly a mother-to-offspring transmission of this STLV-1. Three long-term T-cell lines, established from peripheral blood mononuclear cell cultures from three STLV-1-seropositive monkeys, produced HTLV-1 Gag and Env antigens and retroviral particles. The first complete nucleotide sequence of an STLV-1 (9,025 bp), obtained for one of these isolates, indicated an overall genetic organization similar to that of HTLV-1 but with a nucleotide variability for the structural genes ranging from 7.8 to 13.1% compared with the HTLV-1 ATK and STLV-1 PTM3 Asian prototypes. The Tax and Rex regulatory proteins were well conserved, while the pX region, known to encode new proteins in HTLV-1 (open reading frames I and II), was more divergent than that in the ATK strain. Furthermore, a fragment of 522 bp of the gp21 env gene from uncultured peripheral blood mononuclear cell DNAs from five of the STLV-1-infected monkeys was sequenced. Phylogenetic trees constructed with the long terminal repeat and env (gp46 and gp21) regions demonstrated that this new STLV-1 occupies a unique position within the Asian STLV-1 and HTLV-1 isolates, being, by most analyses, related more to the Australo-Melanesian HTLV-1 topotype than to any other Asian STLV-1. These data raise new hypotheses on the possible interspecies viral transmission between monkeys carrying STLV-1 and early Australoid settlers, ancestors of the present day Australo-Melanesian inhabitants, during their migrations from the Southeast Asian land mass to the greater Australian continent.
Affiliation(s)
- F Ibrahim
- Unité d'Epidémiologie des Virus Oncogènes, Institut Pasteur, France
|
22
|
|