1
Sun N, Ma XY, Shi GH, Yang XH, Li W, Feng CG, Mi D, Li GG, Lu JQ. Chromosome-level genome provides insight into the evolution and conservation of the threatened goral (Naemorhedus goral). BMC Genomics 2024; 25:92. [PMID: 38254015 PMCID: PMC10804785 DOI: 10.1186/s12864-024-09987-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 01/08/2024] [Indexed: 01/24/2024] Open
Abstract
BACKGROUND Gorals (Naemorhedus) resemble both goats and antelopes, which has prompted much debate about species delimitation within the genus and about the phylogenetic status of Naemorhedus within the subfamily Caprinae. Their evolution is believed to be linked to the uplift of the Qinghai-Tibet Plateau (QTP). Resolving the phylogenetics of the genus requires better genetic information. RESULTS Based on a sample from the eastern margin of the QTP, we constructed the first reference genome for the Himalayan goral (Naemorhedus goral) using PacBio long-read sequencing and Hi-C technology. The 2.59 Gb assembled genome had a contig N50 of 3.70 Mb and a scaffold N50 of 106.66 Mb, and was anchored onto 28 pseudochromosomes. A total of 20,145 protein-coding genes were predicted in the assembled genome, of which 99.93% were functionally annotated. Phylogenetically, the goral was closely related to the muskox at the mitochondrial genome level and nested within the takin-muskox clade on the genome tree, rather than grouping with other so-called goat-antelopes. The cladogenetic events among muskox, takin and goral occurred sequentially during the late Miocene (~11-5 Mya), when the QTP experienced a third dramatic uplift with consequent profound changes in climate and environment. Several chromosome fusions and translocations were observed between goral and takin/muskox. The expanded gene families in the goral genome were mainly related to drug metabolism and disease, as were the positively selected genes. The effective population size (Ne) of the goral has decreased continuously since ~1 Mya, during the Pleistocene with its active glaciations. CONCLUSION The high-quality goral genome provides insights into the evolution and valuable information for the conservation of this threatened group.
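The contig and scaffold N50 values quoted in this abstract are standard assembly contiguity statistics. As a minimal sketch, assuming the common definition (the largest length L such that sequences of length at least L together cover at least half of the total assembly), N50 can be computed as follows. The function name `n50` and the toy lengths are illustrative only and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: walk the sequences from longest to shortest and
    return the length at which the running total first reaches half of
    the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for "at least half"
            return length


# Toy contig lengths (hypothetical, not from the goral assembly).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

A larger N50 means that more of the assembly sits in long sequences, which is why the scaffold N50 after Hi-C anchoring is far larger than the contig N50.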
Affiliation(s)
- Nan Sun
- School of Life Sciences, Zhengzhou University, 450001, Zhengzhou, Henan, China
- Xiao-Ying Ma
- College of Life Sciences, Academy of Plateau Science and Sustainability, Qinghai Normal University, 810008, Xining, Qinghai, China
- Guang-Hong Shi
- Qinghai Makehe Forestry Bureau, Golog Tibetan Autonomous Prefecture 814300, Qinghai, China
- Xiao-Hong Yang
- Xi'an Haorui Genomics Technology Co., LTD, 710116, Xi'an, Shaanxi, China
- Wei Li
- Xi'an Haorui Genomics Technology Co., LTD, 710116, Xi'an, Shaanxi, China
- Chen-Guang Feng
- School of Ecology and Environment, Northwestern Polytechnical University, 710129, Xi'an, Shaanxi, China
- Da Mi
- Xi'an Haorui Genomics Technology Co., LTD, 710116, Xi'an, Shaanxi, China
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, 710049, Xi'an, Shaanxi, China
- Guo-Gang Li
- College of Life Sciences, Academy of Plateau Science and Sustainability, Qinghai Normal University, 810008, Xining, Qinghai, China
- Ji-Qi Lu
- School of Life Sciences, Zhengzhou University, 450001, Zhengzhou, Henan, China
2
Bruno L, Ronchini M, Binelli G, Muto A, Chiappetta A, Bitonti MB, Gerola P. A Study of GUS Expression in Arabidopsis as a Tool for the Evaluation of Gene Evolution, Function and the Role of Expression Derived from Gene Duplication. Plants (Basel, Switzerland) 2023; 12:2051. [PMID: 37653968 PMCID: PMC10221982 DOI: 10.3390/plants12102051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 05/17/2023] [Accepted: 05/18/2023] [Indexed: 09/02/2023]
Abstract
Gene duplication played a fundamental role in eukaryote evolution and different copies of a given gene can be present in extant species, often with expressions and functions differentiated during evolution. We assume that, when such differentiation occurs in a gene copy, this may be indicated by its maintenance in all the derived species. To verify this hypothesis, we compared the histological expression domains of the three β-glucuronidase genes (AtGUS) present in Arabidopsis thaliana with the GUS evolutionary tree in angiosperms. We found that AtGUS gene expression overlaps in the shoot apex, the floral bud and the root hairs. In the root apex, AtGUS3 expression differs completely from AtGUS1 and AtGUS2, whose transcripts are present in the root cap meristem and columella, in the staminal cell niche, in the epidermis and in the proximal cortex. Conversely, AtGUS3 transcripts are limited to the old border-like cells of calyptra and those found along the protodermal cell line. The GUS evolutionary tree reveals that the two main clusters (named GUS1 and GUS3) originate from a duplication event predating angiosperm radiation. AtGUS3 belongs to the GUS3 cluster, while AtGUS1 and AtGUS2, which originate from a duplication event that occurred in an ancestor of the Brassicaceae family, are found together in the GUS1 cluster. There is another, previously undescribed cluster, called GUS4, originating from a very ancient duplication event. While the copy of GUS4 has been lost in many species, copies of GUS3 and GUS1 have been conserved in all species examined.
Affiliation(s)
- Leonardo Bruno
- Dipartimento di Biologia, Ecologia e Scienze della Terra, Università della Calabria, Arcavacata di Rende, 87036 Cosenza, Italy
- Matteo Ronchini
- Dipartimento di Scienze Teoriche e Applicate, Università degli Studi dell’Insubria, 21100 Varese, Italy
- Giorgio Binelli
- Dipartimento di Biotecnologie e Scienze della Vita, Università degli Studi dell’Insubria, 21100 Varese, Italy
- Antonella Muto
- Dipartimento di Biologia, Ecologia e Scienze della Terra, Università della Calabria, Arcavacata di Rende, 87036 Cosenza, Italy
- Adriana Chiappetta
- Dipartimento di Biologia, Ecologia e Scienze della Terra, Università della Calabria, Arcavacata di Rende, 87036 Cosenza, Italy
- Maria Beatrice Bitonti
- Dipartimento di Biologia, Ecologia e Scienze della Terra, Università della Calabria, Arcavacata di Rende, 87036 Cosenza, Italy
- Paolo Gerola
- Dipartimento di Scienze Teoriche e Applicate, Università degli Studi dell’Insubria, 21100 Varese, Italy
3
Cerón-Romero MA, Fonseca MM, de Oliveira Martins L, Posada D, Katz LA. Phylogenomic Analyses of 2,786 Genes in 158 Lineages Support a Root of the Eukaryotic Tree of Life between Opisthokonts and All Other Lineages. Genome Biol Evol 2022; 14:evac119. [PMID: 35880421 PMCID: PMC9366629 DOI: 10.1093/gbe/evac119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/11/2022] [Indexed: 12/02/2022] Open
Abstract
Advances in phylogenomics and high-throughput sequencing have allowed the reconstruction of deep phylogenetic relationships in the evolution of eukaryotes. Yet, the root of the eukaryotic tree of life remains elusive. The most popular hypothesis in textbooks and reviews is a root between Unikonta (Opisthokonta + Amoebozoa) and Bikonta (all other eukaryotes), which emerged from analyses of a single-gene fusion. Subsequent, highly cited studies based on concatenation of genes supported this hypothesis with some variations or proposed a root within Excavata. However, concatenation of genes does not consider phylogenetically-informative events like gene duplications and losses. A recent study using gene tree parsimony (GTP) suggested the root lies between Opisthokonta and all other eukaryotes, but only including 59 taxa and 20 genes. Here we use GTP with a duplication-loss model in a gene-rich and taxon-rich dataset (i.e., 2,786 gene families from two sets of 155 and 158 diverse eukaryotic lineages) to assess the root, and we iterate each analysis 100 times to quantify tree space uncertainty. We also contrasted our results and discarded alternative hypotheses from the literature using GTP and the likelihood-based method SpeciesRax. Our estimates suggest a root between Fungi or Opisthokonta and all other eukaryotes; but based on further analysis of genome size, we propose that the root between Opisthokonta and all other eukaryotes is the most likely.
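Gene tree parsimony scores a candidate species tree by how many evolutionary events are needed to reconcile each gene tree with it. The sketch below illustrates only the duplication part of such a score for one rooted gene tree and one fixed rooted species tree, using the standard LCA-mapping rule (a gene-tree node counts as a duplication when it maps to the same species-tree node as one of its children). It is not the authors' pipeline: the nested-tuple tree encoding, the function names, and the taxon labels A, B and C are all assumptions made for illustration, and losses and the root search are left out.

```python
def leafset(tree):
    """Set of species labels below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leafset(left) | leafset(right)


def species_clades(tree):
    """All clades (as leaf sets) of the species tree, leaves included."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    return species_clades(left) + species_clades(right) + [leafset(tree)]


def lca_map(species_set, clades):
    """Smallest species-tree clade that contains every species in the set."""
    return min((c for c in clades if species_set <= c), key=len)


def duplication_cost(gene_tree, species_tree):
    """Count gene-tree nodes inferred as duplications under LCA mapping."""
    clades = species_clades(species_tree)

    def walk(node):
        if isinstance(node, str):
            return 0
        left, right = node
        here = lca_map(leafset(node), clades)
        children = (lca_map(leafset(left), clades),
                    lca_map(leafset(right), clades))
        return walk(left) + walk(right) + int(here in children)

    return walk(gene_tree)


# Hypothetical three-taxon example.
species = (("A", "B"), "C")
print(duplication_cost((("A", "B"), "C"), species))  # congruent gene tree
print(duplication_cost((("A", "C"), "B"), species))  # discordant gene tree
```

Summing such costs over all gene families, and comparing the totals across candidate rootings, is the general idea behind using a duplication-loss model to choose among root hypotheses.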
Affiliation(s)
- Mario A Cerón-Romero
- Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA
- Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, Massachusetts, USA
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, USA
- Miguel M Fonseca
- CIIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
- Leonardo de Oliveira Martins
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Quadram Institute Bioscience, Norwich, United Kingdom
- David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Laura A Katz
- Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA
- Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, Massachusetts, USA
4
Åstrand J, Knight C, Robson J, Talle B, Wilson ZA. Evolution and diversity of the angiosperm anther: trends in function and development. Plant Reproduction 2021; 34:307-319. [PMID: 34173886 PMCID: PMC8566645 DOI: 10.1007/s00497-021-00416-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/11/2021] [Accepted: 05/28/2021] [Indexed: 05/21/2023]
Abstract
Anther development and dehiscence is considered from an evolutionary perspective to identify drivers for differentiation, functional conservation and to identify key questions for future male reproduction research. Development of viable pollen and its timely release from the anther are essential for fertilisation of angiosperm flowers. The formation and subsequent dehiscence of the anther are under tight regulatory control, and these processes are remarkably conserved throughout the diverse families of the angiosperm clade. Anther development is a complex process, which requires timely formation and communication between the multiple somatic anther cell layers (the epidermis, endothecium, middle layer and tapetum) and the developing pollen. These layers go through regulated development and selective degeneration to facilitate the formation and ultimate release of the pollen grains. Insight into the evolution and divergence of anther development and dehiscence, especially between monocots and dicots, is driving greater understanding of the male reproductive process and increased, resilient crop yields. This review focuses on anther structure from an evolutionary perspective by highlighting their diversity across plant species. We summarise new findings that illustrate the complexities of anther development and evaluate how they challenge established models of anther form and function, and how they may help to deliver future sustainable crop yields.
Affiliation(s)
- Johanna Åstrand
- School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough, Leicestershire LE12 5RD UK
- Christopher Knight
- School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough, Leicestershire LE12 5RD UK
- Jordan Robson
- School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough, Leicestershire LE12 5RD UK
- Behzad Talle
- School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough, Leicestershire LE12 5RD UK
- Zoe A. Wilson
- School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough, Leicestershire LE12 5RD UK
5
Mirarab S, Nakhleh L, Warnow T. Multispecies Coalescent: Theory and Applications in Phylogenetics. Annual Review of Ecology, Evolution, and Systematics 2021. [DOI: 10.1146/annurev-ecolsys-012121-095340] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/09/2022]
Abstract
Species tree estimation is a basic part of many biological research projects, ranging from answering basic evolutionary questions (e.g., how did a group of species adapt to their environments?) to addressing questions in functional biology. Yet, species tree estimation is very challenging, due to processes such as incomplete lineage sorting, gene duplication and loss, horizontal gene transfer, and hybridization, which can make gene trees differ from each other and from the overall evolutionary history of the species. Over the last 10–20 years, there has been tremendous growth in methods and mathematical theory for estimating species trees and phylogenetic networks, and some of these methods are now in wide use. In this survey, we provide an overview of the current state of the art, identify the limitations of existing methods and theory, and propose additional research problems and directions.
Affiliation(s)
- Siavash Mirarab
- Electrical and Computer Engineering Department, University of California, San Diego, La Jolla, California 92093, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
6
Li HT, Luo Y, Gan L, Ma PF, Gao LM, Yang JB, Cai J, Gitzendanner MA, Fritsch PW, Zhang T, Jin JJ, Zeng CX, Wang H, Yu WB, Zhang R, van der Bank M, Olmstead RG, Hollingsworth PM, Chase MW, Soltis DE, Soltis PS, Yi TS, Li DZ. Plastid phylogenomic insights into relationships of all flowering plant families. BMC Biol 2021; 19:232. [PMID: 34711223 PMCID: PMC8555322 DOI: 10.1186/s12915-021-01166-2] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Received: 06/17/2021] [Accepted: 10/14/2021] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Flowering plants (angiosperms) are dominant components of global terrestrial ecosystems, but phylogenetic relationships at the familial level and above remain only partially resolved, greatly impeding our full understanding of their evolution and early diversification. The plastome, typically mapped as a circular genome, has been the most important molecular data source for plant phylogeny reconstruction for decades. RESULTS Here, we assembled by far the largest plastid dataset of angiosperms, composed of 80 genes from 4792 plastomes of 4660 species in 2024 genera representing all currently recognized families. Our phylogenetic tree (PPA II) is essentially congruent with those of previous plastid phylogenomic analyses but generally provides greater clade support. In the PPA II tree, 75% of nodes at or above the ordinal level and 78% at or above the familial level were resolved with high bootstrap support (BP ≥ 90). We obtained strong support for many interordinal and interfamilial relationships that were poorly resolved previously within the core eudicots, such as Dilleniales, Saxifragales, and Vitales being resolved as successive sisters to the remaining rosids, and Santalales, Berberidopsidales, and Caryophyllales as successive sisters to the asterids. However, the placement of magnoliids, although resolved as sister to all other Mesangiospermae, is not well supported and disagrees with topologies inferred from nuclear data. Relationships among the five major clades of Mesangiospermae remain intractable despite increased sampling, probably due to an ancient rapid radiation. CONCLUSIONS We provide the most comprehensive dataset of plastomes to date and a well-resolved phylogenetic tree, which together provide a strong foundation for future evolutionary studies of flowering plants.
Affiliation(s)
- Hong-Tao Li
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Yang Luo
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Lu Gan
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Peng-Fei Ma
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Lian-Ming Gao
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Lijiang Forest Ecosystem National Observation and Research Station, Kunming Institute of Botany, Chinese Academy of Sciences, Lijiang, 674100, Yunnan, China
- Jun-Bo Yang
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Jie Cai
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Matthew A Gitzendanner
- Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- Biodiversity Institute, University of Florida, Gainesville, FL, 32611, USA
- Peter W Fritsch
- Botanical Research Institute of Texas, 1700 University Drive, Fort Worth, TX, 76017, USA
- Ting Zhang
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Jian-Jun Jin
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Department of Ecology, Evolution and Environmental Biology, Columbia University, New York, NY, 10025, USA
- Chun-Xia Zeng
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Hong Wang
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Wen-Bin Yu
- Center for Integrative Conservation, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla, 666303, Yunnan, China
- Rong Zhang
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Michelle van der Bank
- Department of Botany & Plant Biotechnology, University of Johannesburg, PO Box 524, Auckland Park, Johannesburg, Gauteng, 2006, South Africa
- Richard G Olmstead
- Department of Biology and Burke Museum, University of Washington, Seattle, WA, 98195-5325, USA
- Mark W Chase
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3DS, England, UK
- Department of Environment and Agriculture, Curtin University, Bentley, Western Australia, 6102, Australia
- Douglas E Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- Biodiversity Institute, University of Florida, Gainesville, FL, 32611, USA
- Pamela S Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- Biodiversity Institute, University of Florida, Gainesville, FL, 32611, USA
- Department of Biology, University of Florida, Gainesville, FL, 32611, USA
- Ting-Shuang Yi
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- De-Zhu Li
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
7
Genome-scale reconstructions to assess metabolic phylogeny and organism clustering. PLoS One 2020; 15:e0240953. [PMID: 33373364 PMCID: PMC7771690 DOI: 10.1371/journal.pone.0240953] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/03/2020] [Accepted: 11/28/2020] [Indexed: 12/28/2022] Open
Abstract
Approaches for systematizing information on relatedness between organisms are important in biology. Phylogenetic analyses based on sets of highly conserved genes are currently the basis for the Tree of Life. Genome-scale metabolic reconstructions contain high-quality information regarding the metabolic capability of an organism and are typically restricted to metabolically active enzyme-encoding genes. While there are many tools available to generate draft reconstructions, expert-level knowledge is still required to generate and manually curate high-quality genome-scale metabolic models and to fill gaps in their reaction networks. Here, we use the tool AutoKEGGRec to construct 975 genome-scale metabolic draft reconstructions encoded in the KEGG database without further curation. The organisms are selected across all three domains, and their metabolic networks serve as the basis for generating phylogenetic trees. We find that, using all reactions encoded, these metabolism-based comparisons give rise to a phylogenetic tree with close similarity to the Tree of Life. While this tree is quite robust to reasonable levels of noise in the metabolic reaction content of an organism, we find significant heterogeneity in how much noise an organism may tolerate before it is incorrectly placed in the tree. Furthermore, by using the protein sequences for particular metabolic functions and pathway sets, such as central carbon, nitrogen, and sulfur metabolism, as the basis for the organism comparisons, we generate highly specific phylogenetic trees. We believe the generation of phylogenetic trees based on metabolic reaction content, in particular when focused on specific functions and pathways, could aid the identification of functionally important metabolic enzymes and be of value for genome-scale metabolic modellers and enzyme engineers.
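Comparing organisms by metabolic reaction content amounts to treating each reconstruction as a set of reactions and deriving pairwise distances that a tree-building method can consume. The abstract does not state which distance measure was used, so the sketch below assumes the Jaccard distance purely for illustration. The organism names and reaction identifiers are made up, and the helper names are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of reaction identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:  # two empty reconstructions are treated as identical
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(reactions_by_organism):
    """Return (sorted organism names, square matrix of pairwise distances)."""
    names = sorted(reactions_by_organism)
    matrix = [
        [jaccard_distance(reactions_by_organism[x], reactions_by_organism[y])
         for y in names]
        for x in names
    ]
    return names, matrix


# Made-up reaction content for three hypothetical organisms.
toy = {
    "org1": {"rxn_a", "rxn_b", "rxn_c"},
    "org2": {"rxn_b", "rxn_c", "rxn_d"},
    "org3": {"rxn_x"},
}
names, matrix = distance_matrix(toy)
for name, row in zip(names, matrix):
    print(name, [round(d, 2) for d in row])
```

A matrix of this kind could then be passed to any distance-based clustering or tree-building routine; which one the study used is not specified in the abstract.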
8
Górecki P, Markin A, Eulenstein O. Exact median-tree inference for unrooted reconciliation costs. BMC Evol Biol 2020; 20:136. [PMID: 33115401 PMCID: PMC7593691 DOI: 10.1186/s12862-020-01700-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
Background Solving median tree problems under tree reconciliation costs is a classic and well-studied approach for inferring species trees from collections of discordant gene trees. These problems are NP-hard, and therefore are, in practice, typically addressed by local search heuristics. So far, however, such heuristics lack any provable correctness or precision. Further, even for small phylogenetic studies, it has been demonstrated that local search heuristics may only provide sub-optimal solutions. Obviating such heuristic uncertainties are exact dynamic programming solutions that allow solving tree reconciliation problems for smaller phylogenetic studies. Despite these promises, such exact solutions are only suitable for credibly rooted input gene trees, which constitute only a tiny fraction of the readily available gene trees. Standard gene tree inference approaches provide only unrooted gene trees and accurately rooting such trees is often difficult, if not impossible. Results Here, we describe complex dynamic programming solutions that represent the first nonnaïve exact solutions for solving the tree reconciliation problems for unrooted input gene trees. Further, we show that the asymptotic runtime of the proposed solutions does not increase when compared to the most time-efficient dynamic programming solutions for rooted input trees. Conclusions In an experimental evaluation, we demonstrate that the described solutions for unrooted gene trees are, like the solutions for rooted input gene trees, suitable for smaller phylogenetic studies. Finally, for the first time, we study the accuracy of classic local search heuristics for unrooted tree reconciliation problems.
Affiliation(s)
- Paweł Górecki
- University of Warsaw, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, Warsaw, 02-097, Poland
- Alexey Markin
- Department of Computer Science, Iowa State University, Atanasoff Hall 212, Ames, 50011, USA
- Oliver Eulenstein
- Department of Computer Science, Iowa State University, Atanasoff Hall 212, Ames, 50011, USA
9
Molloy EK, Warnow T. FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models. Bioinformatics 2020; 36:i57-i65. [PMID: 32657396 PMCID: PMC7355287 DOI: 10.1093/bioinformatics/btaa444] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 12/14/2022] Open
Abstract
MOTIVATION Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed. RESULTS We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology. We prove that FastMulRFS is statistically consistent under a generic model of GDL when adversarial GDL does not occur. Our extensive simulation study shows that FastMulRFS matches the accuracy of MulRF (which tries to solve the same optimization problem) and has better accuracy than prior methods, including ASTRAL-multi (the only method to date that has been proven statistically consistent under GDL), while being much faster than both methods. AVAILABILITY AND IMPLEMENTATION FastMulRFS is available on GitHub (https://github.com/ekmolloy/fastmulrfs). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
10
Bateman RM. Hunting the Snark: the flawed search for mythical Jurassic angiosperms. Journal of Experimental Botany 2020; 71:22-35. [PMID: 31538196 DOI: 10.1093/jxb/erz411] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/21/2019] [Accepted: 09/03/2019] [Indexed: 06/10/2023]
Abstract
Several recent palaeobotanical studies claim to have found and described pre-Cretaceous angiosperm macrofossils. With rare exceptions, these papers fail to define a flower, do not acknowledge that fossils require character-based rather than group-based classification, do not explicitly state which morphological features would unambiguously identify a fossil as angiospermous, ignore the modern conceptual framework of phylogeny reconstruction, and infer features in the fossils in question that are interpreted differently by (or even invisible to) other researchers. This unfortunate situation is compounded by the relevant fossils being highly disarticulated two-dimensional compression-impressions lacking anatomical preservation. Given current evidence, all supposed pre-Cretaceous angiosperms are assignable to other major clades among the gymnosperms sensu lato. By any workable morphological definition, flowers are not confined to, and therefore cannot delimit, the angiosperm clade. More precisely defined character states that are potentially diagnostic of angiosperms must by definition originate on the phylogenetic branch that immediately precedes the angiosperm crown group. Although the most reliable candidates for diagnostic characters (triploid endosperm reflecting double fertilization, closed carpel, bitegmic ovule, and phloem companion cells) are rarely preserved and/or difficult to detect unambiguously, similar characters have occasionally been preserved in high-quality permineralized non-angiosperm fossils. The angiosperm radiation documented by Early Cretaceous fossils involves only lineages closely similar to extant taxonomic families, lacks obvious morphological gaps, and (as agreed by both the fossil record and molecular phylogenies) was relatively rapid; these are all features that suggest a primary radiation.
It is unlikely that ancestors of the crown group common ancestor would have fulfilled a character-based definition of (and thereby required expansion of the concept of) an angiosperm; they would instead form a new element of the non-angiosperm members of the 'anthophyte' grade, competing with Caytonia to be viewed as morphologically determined sister group for angiosperms. Conclusions drawn from molecular phylogenetics should not be allowed to routinely constrain palaeobotanical inferences; reciprocal illumination between different categories of data offers greater explanatory power than immediately resorting to Grand Syntheses. The Jurassic angiosperm, essentially a product of molecular phylogenetics, may have become the holy grail of palaeobotany but it appears equally mythical.
11
Zhang LY, Ming H, Meng XL, Fang BZ, Jiao JY, Salam N, Zhang XT, Li WJ, Nie GX. Ornithinimicrobium cavernae sp. nov., an actinobacterium isolated from a karst cave. Antonie van Leeuwenhoek 2018; 112:179-186. [PMID: 30123944 DOI: 10.1007/s10482-018-1141-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/17/2018] [Accepted: 08/10/2018] [Indexed: 11/26/2022]
Abstract
A novel actinobacterium, designated strain CFH 30183T, was isolated from a soil sample collected from a karst cave in Luoyang, Henan Province. The taxonomic position of the strain was investigated using a polyphasic approach. Cells of strain CFH 30183T were observed to be Gram-stain positive, motile, asporogenous and coccoid to rod shaped. The strain was found to be aerobic and oxidase positive. On the basis of 16S rRNA gene sequence analysis, strain CFH 30183T was found to be closely related to Ornithinimicrobium murale 01-Gi-040T (97.8% sequence identity). The ANIb/ANIm values between strain CFH 30183T and O. murale DSM 22056T were found to be 80.3%/85.9%. Strain CFH 30183T was found to grow optimally at 28-32 °C, at pH 8.0-9.0 and in the presence of up to 7% NaCl (w/v). Whole cell hydrolysates of strain CFH 30183T contained L-ornithine as the diagnostic diamino acid, and arabinose, glucose, mannose and rhamnose as whole cell sugars. The respiratory quinone was determined to be MK-8(H4), while the major fatty acids were found to consist of iso-C15:0 and iso-C16:0. The polar lipids profile was found to include diphosphatidylglycerol, phosphatidylglycerol, phosphatidylinositol, an unidentified phospholipid, an unidentified phosphoglycolipid and four unidentified lipids. The DNA G + C content of strain CFH 30183T was calculated to be 70.9%. Based on the phenotypic, genotypic and phylogenetic data obtained, strain CFH 30183T is considered to represent a novel species of the genus Ornithinimicrobium, for which the name Ornithinimicrobium cavernae sp. nov. is proposed. The type strain is CFH 30183T (= KCTC 49018T = CGMCC 1.16393T).
Affiliation(s)
- Ling-Yu Zhang
- College of Fisheries, Henan Normal University, Xinxiang, 453007, People's Republic of China
- State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, People's Republic of China
| | - Hong Ming
- Synthetic Biology Engineering Laboratory of Henan Province, College of Life Sciences and Technology, Xinxiang Medical University, Xinxiang, 453003, People's Republic of China
| | - Xiao-Lin Meng
- College of Fisheries, Henan Normal University, Xinxiang, 453007, People's Republic of China
| | - Bao-Zhu Fang
- State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, People's Republic of China
| | - Jian-Yu Jiao
- State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, People's Republic of China
| | - Nimaichand Salam
- State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, People's Republic of China
| | - Xiao-Tong Zhang
- State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, People's Republic of China
| | - Wen-Jun Li
- College of Fisheries, Henan Normal University, Xinxiang, 453007, People's Republic of China.
- State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, People's Republic of China.
| | - Guo-Xing Nie
- College of Fisheries, Henan Normal University, Xinxiang, 453007, People's Republic of China.
|
12
|
Kuang T, Tornabene L, Li J, Jiang J, Chakrabarty P, Sparks JS, Naylor GJP, Li C. Phylogenomic analysis on the exceptionally diverse fish clade Gobioidei (Actinopterygii: Gobiiformes) and data-filtering based on molecular clocklikeness. Mol Phylogenet Evol 2018; 128:192-202. [PMID: 30036699 DOI: 10.1016/j.ympev.2018.07.018] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 08/23/2017] [Revised: 07/11/2018] [Accepted: 07/17/2018] [Indexed: 11/30/2022]
Abstract
The use of genome-scale data to infer phylogenetic relationships has gained in popularity in recent years due to the progress made in target-gene capture and sequencing techniques. Data filtering, the approach of excluding data inconsistent with the model from analyses, presumably could alleviate problems caused by systematic errors in phylogenetic inference. Different data filtering criteria, such as those based on evolutionary rate and molecular clocklikeness, have been proposed for selecting useful phylogenetic markers, yet few studies have tested these criteria using phylogenomic data. We developed a novel set of single-copy nuclear coding markers to capture thousands of target genes in gobioid fishes, a species-rich lineage of vertebrates, and tested the effects of data-filtering methods based on substitution rate and molecular clocklikeness while attempting to control for the compounding effects of missing data and variation in locus length. We found that molecular clocklikeness was a better predictor than overall substitution rate for phylogenetic usefulness of molecular markers in our study. In addition, when the 100 best ranked loci for our predictors were concatenated and analyzed using maximum likelihood, or combined in a coalescent-based species-tree analysis, the resulting trees showed a well-resolved topology of Gobioidei that mostly agrees with previous studies. However, trees generated from the 100 least clocklike loci frequently recovered conflicting, and in some cases clearly erroneous, topologies with strong support, thus indicating strong systematic biases in those datasets. Collectively, these results suggest that data filtering has the potential to improve the performance of phylogenetic inference both when using a concatenation approach and when using methods that rely on input from individual gene trees (i.e. coalescent species-tree approaches), which may be preferred in scenarios where incomplete lineage sorting is likely to be an issue.
Affiliation(s)
- Ting Kuang
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
| | - Luke Tornabene
- School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98105, USA
| | - Jingyan Li
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
| | - Jiamei Jiang
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
| | - Prosanta Chakrabarty
- Louisiana State University, Museum of Natural Science, Department of Biological Sciences, Baton Rouge, LA 70803, USA
| | - John S Sparks
- American Museum of Natural History, Central Park West at 79th Street, NY, NY 10024, USA
| | | | - Chenhong Li
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China.
|
13
|
Genome-Guided Phylo-Transcriptomic Methods and the Nuclear Phylogentic Tree of the Paniceae Grasses. Sci Rep 2017; 7:13528. [PMID: 29051622 PMCID: PMC5648822 DOI: 10.1038/s41598-017-13236-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 05/05/2017] [Accepted: 09/20/2017] [Indexed: 11/23/2022] Open
Abstract
The past few years have witnessed a paradigm shift in molecular systematics from phylogenetic methods (using one or a few genes) to those that can be described as phylogenomics (phylogenetic inference with entire genomes). One approach that has recently emerged is phylo-transcriptomics (transcriptome-based phylogenetic inference). As in any phylogenetics experiment, accurate orthology inference is critical to phylo-transcriptomics. To date, most analyses have inferred orthology based either on pure sequence similarity or using gene-tree approaches. The use of conserved genome synteny in orthology detection has been relatively under-employed in phylogenetics, mainly due to the cost of sequencing genomes. While current trends focus on the quantity of genes included in an analysis, the use of synteny is likely to improve the quality of ortholog inference. In this study, we combine de novo transcriptome data and sequenced genomes from an economically important group of grass species, the tribe Paniceae, to make phylogenomic inferences. This method, which we call “genome-guided phylo-transcriptomics”, is compared to other recently published orthology inference pipelines, and benchmarked using a set of sequenced genomes from across the grasses. These comparisons provide a framework for future researchers to evaluate the costs and benefits of adding sequenced genomes to transcriptome data sets.
|
14
|
Coestimation of Gene Trees and Reconciliations Under a Duplication-Loss-Coalescence Model. 2017. [DOI: 10.1007/978-3-319-59575-7_18] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/26/2022]
|
15
|
Kordi M, Bansal MS. On the Complexity of Duplication-Transfer-Loss Reconciliation with Non-Binary Gene Trees. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:587-599. [PMID: 28055898 DOI: 10.1109/tcbb.2015.2511761] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/06/2023]
Abstract
Duplication-Transfer-Loss (DTL) reconciliation has emerged as a powerful technique for studying gene family evolution in the presence of horizontal gene transfer. DTL reconciliation takes as input a gene family phylogeny and the corresponding species phylogeny, and reconciles the two by postulating speciation, gene duplication, horizontal gene transfer, and gene loss events. Efficient algorithms exist for finding optimal DTL reconciliations when the gene tree is binary. However, gene trees are frequently non-binary. With such non-binary gene trees, the reconciliation problem seeks to find a binary resolution of the gene tree that minimizes the reconciliation cost. Given the prevalence of non-binary gene trees, many efficient algorithms have been developed for this problem in the context of the simpler Duplication-Loss (DL) reconciliation model. Yet, no efficient algorithms exist for DTL reconciliation with non-binary gene trees and the complexity of the problem remains unknown. In this work, we resolve this open question by showing that the problem is, in fact, NP-hard. Our reduction applies to both the dated and undated formulations of DTL reconciliation. By resolving this long-standing open problem, this work will spur the development of both exact and heuristic algorithms for this important problem.
|
16
|
Zhao L, Li X, Zhang N, Zhang SD, Yi TS, Ma H, Guo ZH, Li DZ. Phylogenomic analyses of large-scale nuclear genes provide new insights into the evolutionary relationships within the rosids. Mol Phylogenet Evol 2016; 105:166-176. [DOI: 10.1016/j.ympev.2016.06.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 11/28/2015] [Revised: 06/06/2016] [Accepted: 06/27/2016] [Indexed: 12/28/2022]
|
17
|
Binet M, Gascuel O, Scornavacca C, Douzery EJP, Pardi F. Fast and accurate branch lengths estimation for phylogenomic trees. BMC Bioinformatics 2016; 17:23. [PMID: 26744021 PMCID: PMC4705742 DOI: 10.1186/s12859-015-0821-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 05/19/2015] [Accepted: 11/02/2015] [Indexed: 01/26/2023] Open
Abstract
Background: Branch lengths are an important attribute of phylogenetic trees, providing essential information for many studies in evolutionary biology. Yet part of the current methodology to reconstruct a phylogeny from genomic information, namely supertree methods, focuses on the topology or structure of the phylogenetic tree rather than the evolutionary divergences associated with it. Moreover, accurate methods to estimate branch lengths, typically based on probabilistic analysis of a concatenated alignment, are limited by large demands in memory and computing time, and may become impractical when the data sets are too large. Results: Here, we present a novel phylogenomic distance-based method, named ERaBLE (Evolutionary Rates and Branch Length Estimation), to estimate the branch lengths of a given reference topology, and the relative evolutionary rates of the genes employed in the analysis. ERaBLE uses as input data a potentially very large collection of distance matrices, where each matrix is obtained from a different genomic region, either directly from its sequence alignment or indirectly from a gene tree inferred from the alignment. Our experiments show that ERaBLE is very fast and fairly accurate when compared to other possible approaches for the same tasks. Specifically, it efficiently and accurately deals with large data sets, such as the OrthoMaM v8 database, composed of 6,953 exons from up to 40 mammals. Conclusions: ERaBLE may be used as a complement to supertree methods, or it may provide an efficient alternative to maximum likelihood analysis of concatenated alignments, to estimate branch lengths from phylogenomic data sets.
Affiliation(s)
- Manuel Binet
- Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), CNRS, Université de Montpellier, Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France; Institut des Sciences de l'Evolution de Montpellier, CNRS, IRD, EPHE, Université de Montpellier, France.
| | - Olivier Gascuel
- Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), CNRS, Université de Montpellier, Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France.
| | - Celine Scornavacca
- Institut de Biologie Computationnelle, Montpellier, France; Institut des Sciences de l'Evolution de Montpellier, CNRS, IRD, EPHE, Université de Montpellier, France.
| | - Emmanuel J P Douzery
- Institut des Sciences de l'Evolution de Montpellier, CNRS, IRD, EPHE, Université de Montpellier, France.
| | - Fabio Pardi
- Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), CNRS, Université de Montpellier, Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France.
|
18
|
Big Data in Plant Science: Resources and Data Mining Tools for Plant Genomics and Proteomics. Methods Mol Biol 2016; 1415:533-47. [PMID: 27115651 DOI: 10.1007/978-1-4939-3572-7_27] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 01/11/2023]
Abstract
In modern plant biology, progress is increasingly defined by scientists' ability to gather and analyze data sets of high volume and complexity, otherwise known as "big data". Arguably, the largest increase in the volume of plant data sets over the last decade is a consequence of the application of next-generation sequencing and mass-spectrometry technologies to the study of experimental model and crop plants. The increase in quantity and complexity of biological data brings challenges, mostly associated with data acquisition, processing, and sharing within the scientific community. Nonetheless, big data in plant science create unique opportunities for advancing our understanding of complex biological processes at an unprecedented level of accuracy, and establish a base for plant systems biology. In this chapter, we summarize the major drivers of big data in plant science and big data initiatives in life sciences with a focus on the scope and impact of iPlant, a representative cyberinfrastructure platform for plant science.
|
19
|
Impact of gene family evolutionary histories on phylogenetic species tree inference by gene tree parsimony. Mol Phylogenet Evol 2015; 96:9-16. [PMID: 26702957 DOI: 10.1016/j.ympev.2015.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/01/2015] [Revised: 10/11/2015] [Accepted: 12/03/2015] [Indexed: 11/21/2022]
Abstract
Complicated histories of gene duplication and loss pose challenges to molecular phylogenetic inference, especially in deep phylogenies. However, phylogenomic approaches such as gene tree parsimony (GTP) have an advantage over some other approaches in their ability to use gene families with duplications. GTP searches for the 'optimal' species tree by minimizing the total cost of biological events such as duplications, but the accuracy of GTP and the phylogenetic signal in the context of different gene families with distinct histories of duplication and loss are unclear. To evaluate how the evolutionary properties of different gene families affect species tree inference, 3900 gene families from seven angiosperms, encompassing a wide range of gene content and lineage-specific expansions and contractions, were analyzed. It was found that the gene content and total duplication number in a gene family strongly influence species tree inference accuracy, with the highest accuracy achieved at either very low or very high gene content (or duplication number) and the lowest accuracy centered on intermediate gene content (or duplication number); the relationship can be fitted by a binomial regression. In addition, among gene families with a similar level of average gene content, those with relatively higher lineage-specific expansion or duplication rates tend to show lower accuracy. Additional correlation tests support the view that high accuracy for gene families with large gene content may rely on abundant ancestral copies to provide many subtrees to resolve conflicts, whereas high accuracy for single or low copy gene families depends only on sequence substitution per se. The very low accuracy reached by gene families of intermediate gene content or duplication number may be due to insufficient subtrees to resolve the conflicts arising from loss of alternative copies. As these evolutionary properties can significantly influence species tree accuracy, I discuss the potential weighting of the duplication cost by evolutionary properties of gene families in future GTP analyses.
|
20
|
Wu M, Lan S, Cai B, Chen S, Chen H, Zhou S. The Complete Chloroplast Genome of Guadua angustifolia and Comparative Analyses of Neotropical-Paleotropical Bamboos. PLoS One 2015; 10:e0143792. [PMID: 26630488 PMCID: PMC4668023 DOI: 10.1371/journal.pone.0143792] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 10/10/2014] [Accepted: 11/10/2015] [Indexed: 11/19/2022] Open
Abstract
To elucidate chloroplast genome evolution within neotropical-paleotropical bamboos, we fully characterized the chloroplast genome of the woody bamboo Guadua angustifolia. This genome is 135,331 bp long and comprises an 82,839-bp large single-copy (LSC) region, a 12,898-bp small single-copy (SSC) region, and a pair of 19,797-bp inverted repeats (IRs). Comparative analyses revealed marked conservation of gene content and sequence evolutionary rates between neotropical and paleotropical woody bamboos. The neotropical herbaceous bamboo Cryptochloa strictiflora differs from woody bamboos in IR/SSC boundaries in that it exhibits slightly contracted IRs and a faster substitution rate. The G. angustifolia chloroplast genome is similar in size to that of neotropical herbaceous bamboos but is ~3 kb smaller than that of paleotropical woody bamboos. Dissimilarities in genome size are correlated with differences in the lengths of intergenic spacers, which are caused by large-fragment insertion and deletion. Phylogenomic analyses of 62 taxa yielded a tree topology identical to that found in preceding studies. Divergence time estimation suggested that most bamboo genera diverged after the Miocene and that speciation events of extant species occurred during or after the Pliocene.
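The region lengths quoted in this abstract can be checked against the reported total. The following is a minimal sketch, assuming the usual quadripartite plastome layout (one LSC, one SSC, two IR copies); the function name `plastome_length` is illustrative and does not come from the paper.

```python
# Region lengths (bp) as quoted in the abstract for Guadua angustifolia.
LSC = 82_839  # large single-copy region
SSC = 12_898  # small single-copy region
IR = 19_797   # one inverted repeat; the plastome carries two copies


def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total = plastome_length(LSC, SSC, IR)
print(total)  # 135331, the genome size stated in the abstract
```

The sum reproduces the stated 135,331 bp, so the four figures in the abstract are mutually consistent.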
Affiliation(s)
- Miaoli Wu
- Forestry College, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China
| | - Siren Lan
- Forestry College, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China
| | - Bangping Cai
- Xiamen Botanical Garden, Xiamen, 361000, Fujian, China
| | - Shipin Chen
- Forestry College, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China
| | - Hui Chen
- Forestry College, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China
| | - Shiliang Zhou
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, Beijing, 100093, China
|
21
|
Wang B, Zhang Y, Wei P, Sun M, Ma X, Zhu X. Identification of nuclear low-copy genes and their phylogenetic utility in rosids. Genome 2015; 57:547-54. [PMID: 25761707 DOI: 10.1139/gen-2014-0138] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
To date, the interordinal relationships in rosids remain poorly resolved. Previous studies based on chloroplast, mitochondrial, and nuclear DNA have produced conflicting phylogenetic resolutions, which have become a widely recognized problem in recent phylogenetic studies. Here, a total of 96 single-copy nuclear gene loci were identified from the KOG (eukaryotic orthologous groups) database, most of which were used for the first time for phylogenetic analysis of angiosperms. The orthologous sequence datasets from completely sequenced genomes of rosids were assembled for the resolution of the position of the COM (Celastrales-Oxalidales-Malpighiales) clade in rosids. Our analysis revealed strong and consistent support for the CM topology (the COM clade as sister to the malvids). Our results will contribute to further exploring the underlying cause of conflict between chloroplast, mitochondrial, and nuclear data. In addition, our study identified a few novel nuclear molecular markers with potential to investigate the deep phylogenetic relationships of plants or other eukaryotic taxonomic groups.
Affiliation(s)
- Baohua Wang
- School of Life Sciences, Nantong University, Nantong 226019, China
|
22
|
Martín M, Marín D, Serrot PH, Sabater B. Evolutionary reversion of editing sites of ndh genes suggests their origin in the Permian-Triassic, before the increase of atmospheric CO2. Front Ecol Evol 2015. [DOI: 10.3389/fevo.2015.00081] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022] Open
|
23
|
Boeckmann B, Marcet-Houben M, Rees JA, Forslund K, Huerta-Cepas J, Muffato M, Yilmaz P, Xenarios I, Bork P, Lewis SE, Gabaldón T. Quest for Orthologs Entails Quest for Tree of Life: In Search of the Gene Stream. Genome Biol Evol 2015; 7:1988-99. [PMID: 26133389 PMCID: PMC4524488 DOI: 10.1093/gbe/evv121] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/24/2022] Open
Abstract
Quest for Orthologs (QfO) is a community effort with the goal of improving and benchmarking orthology predictions. As quality assessment assumes prior knowledge on species phylogenies, we investigated the congruency between existing species trees by comparing the relationships of 147 QfO reference organisms from six Tree of Life (ToL)/species tree projects: The National Center for Biotechnology Information (NCBI) taxonomy, Opentree of Life, the sequenced species/species ToL, the 16S ribosomal RNA (rRNA) database, and trees published by Ciccarelli et al. (Ciccarelli FD, et al. 2006. Toward automatic reconstruction of a highly resolved tree of life. Science 311:1283–1287) and by Huerta-Cepas et al. (Huerta-Cepas J, Marcet-Houben M, Gabaldon T. 2014. A nested phylogenetic reconstruction approach provides scalable resolution in the eukaryotic Tree Of Life. PeerJ PrePrints 2:223). Our study reveals that each species tree suggests a different phylogeny: 87 of the 146 (60%) possible splits of a dichotomous and rooted tree are congruent, while all other splits are incongruent in at least one of the species trees. Topological differences are observed not only at deep speciation events, but also within younger clades, such as Hominidae, Rodentia, Laurasiatheria, or rosids. The evolutionary relationships of 27 archaea and bacteria are highly inconsistent. By assessing 458,108 gene trees from 65 genomes, we show that consistent species topologies are more often supported by gene phylogenies than contradicting ones. The largest concordant species tree includes at most 77 of the QfO reference organisms. Results are summarized in the form of a consensus ToL (http://swisstree.vital-it.ch/species_tree) that can serve different benchmarking purposes.
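The split counts in this abstract follow from a simple property: a rooted, fully dichotomous tree with n leaves has n - 1 internal nodes, hence 146 for 147 organisms. The sketch below illustrates how congruent splits between two rooted trees might be counted. It is an illustration only, not the authors' pipeline: the nested-tuple tree encoding, the function `clades`, and the five-taxon toy trees are all assumptions made for this example.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a name, an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf set under every internal node of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


# Toy trees that disagree only on the placement of gorilla (illustrative data).
tree_a = ((("human", "chimp"), "gorilla"), ("mouse", "rat"))
tree_b = ((("human", "gorilla"), "chimp"), ("mouse", "rat"))

shared = clades(tree_a) & clades(tree_b)
print(len(shared), "of", len(clades(tree_a)), "splits are congruent")

# The proportion quoted in the abstract, rounded to a whole percentage.
print(round(100 * 87 / 146))
```

With five leaves each toy tree has four internal nodes, three of which are shared; the same counting applied to the abstract's figures gives 87/146, or about 60%.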
Affiliation(s)
| | - Marina Marcet-Houben
- Bioinformatics and Genomics, Centre for Genomic Regulation, Barcelona, Spain; Universitat Pompeu Fabra, Barcelona, Spain
| | - Jonathan A Rees
- US National Evolutionary Synthesis Center, Duke University, Durham, NC
| | - Kristoffer Forslund
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Jaime Huerta-Cepas
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
| | - Pelin Yilmaz
- Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Bremen, Germany
| | - Ioannis Xenarios
- Swiss-Prot, Swiss Institute of Bioinformatics, Geneva, Switzerland; Vital-IT, Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; Germany Molecular Medicine Partnership Unit, University Hospital Heidelberg and European Molecular Biology Laboratory, Heidelberg, Germany; Max Delbrück Centre for Molecular Medicine, Berlin, Germany
| | | | - Toni Gabaldón
- Bioinformatics and Genomics, Centre for Genomic Regulation, Barcelona, Spain; Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca I Estudis Avançats, Barcelona, Spain
|
24
|
Schwartz RS, Harkins KM, Stone AC, Cartwright RA. A composite genome approach to identify phylogenetically informative data from next-generation sequencing. BMC Bioinformatics 2015; 16:193. [PMID: 26062548 PMCID: PMC4464851 DOI: 10.1186/s12859-015-0632-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 12/09/2014] [Accepted: 05/29/2015] [Indexed: 11/16/2022] Open
Abstract
Background: Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation. Results: In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets. Conclusions: SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.
Affiliation(s)
| | - Kelly M Harkins
- School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA; Department of Anthropology, University of California - Santa Cruz, Santa Cruz, CA, USA.
| | - Anne C Stone
- School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA.
| | - Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA; School of Life Sciences, Arizona State University, Tempe, AZ, USA.
|
25
|
Saarela JM, Wysocki WP, Barrett CF, Soreng RJ, Davis JI, Clark LG, Kelchner SA, Pires JC, Edger PP, Mayfield DR, Duvall MR. Plastid phylogenomics of the cool-season grass subfamily: clarification of relationships among early-diverging tribes. AoB Plants 2015; 7:plv046. [PMID: 25940204 PMCID: PMC4480051 DOI: 10.1093/aobpla/plv046] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 01/08/2015] [Accepted: 04/21/2015] [Indexed: 05/08/2023]
Abstract
Whole plastid genomes are being sequenced rapidly from across the green plant tree of life, and phylogenetic analyses of these are increasing resolution and support for relationships that have varied among or been unresolved in earlier single- and multi-gene studies. Pooideae, the cool-season grass lineage, is the largest of the 12 grass subfamilies and includes important temperate cereals, turf grasses and forage species. Although numerous studies of the phylogeny of the subfamily have been undertaken, relationships among some 'early-diverging' tribes conflict among studies, and some relationships among subtribes of Poeae have not yet been resolved. To address these issues, we newly sequenced 25 whole plastomes, which showed rearrangements typical of Poaceae. These plastomes represent 9 tribes and 11 subtribes of Pooideae, and were analysed with 20 existing plastomes for the subfamily. Maximum likelihood (ML), maximum parsimony (MP) and Bayesian inference (BI) robustly resolve most deep relationships in the subfamily. Complete plastome data provide increased nodal support compared with protein-coding data alone at nodes that are not maximally supported. Following the divergence of Brachyelytrum, Phaenospermateae, Brylkinieae-Meliceae and Ampelodesmeae-Stipeae are the successive sister groups of the rest of the subfamily. Ampelodesmeae are nested within Stipeae in the plastome trees, consistent with its hybrid origin between a phaenospermatoid and a stipoid grass (the maternal parent). The core Pooideae are strongly supported and include Brachypodieae, a Bromeae-Triticeae clade and Poeae. Within Poeae, a novel sister group relationship between Phalaridinae and Torreyochloinae is found, and the relative branching order of this clade and Aveninae, with respect to an Agrostidinae-Brizinae clade, are discordant between MP and ML/BI trees. 
Maximum likelihood and Bayesian analyses strongly support Airinae and Holcinae as the successive sister groups of a Dactylidinae-Loliinae clade.
Affiliation(s)
- Jeffery M Saarela
- Botany Section, Research and Collections, Canadian Museum of Nature, PO Box 3443 Stn. D, Ottawa, ON, Canada K1P 3P4
| | - William P Wysocki
- Biological Sciences, Northern Illinois University, 1425 W. Lincoln Hwy, DeKalb, IL 60115-2861, USA
| | - Craig F Barrett
- Department of Biological Sciences, California State University, 5151 State University Dr., Los Angeles, CA 90032-8201, USA
| | - Robert J Soreng
- Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA
| | - Jerrold I Davis
- Section of Plant Biology, Cornell University, 412 Mann Library, Ithaca, NY 14853, USA
| | - Lynn G Clark
- Ecology, Evolution and Organismal Biology, Iowa State University, 251 Bessey Hall, Ames, IA 50011-1020, USA
| | - Scot A Kelchner
- Biological Sciences, Idaho State University, 921 S. 8th Ave, Pocatello, ID 83209, USA
| | - J Chris Pires
- Division of Biological Sciences, University of Missouri, 1201 Rollins St, Columbia, MO 65211, USA
| | - Patrick P Edger
- Department of Plant and Microbial Biology, University of California - Berkeley, Berkeley, CA 94720, USA
| | - Dustin R Mayfield
- Division of Biological Sciences, University of Missouri, 1201 Rollins St, Columbia, MO 65211, USA
| | - Melvin R Duvall
- Biological Sciences, Northern Illinois University, 1425 W. Lincoln Hwy, DeKalb, IL 60115-2861, USA
|
26
|
Petersen G, Seberg O, Cuenca A, Stevenson DW, Thadeo M, Davis JI, Graham S, Ross TG. Phylogeny of the Alismatales (Monocotyledons) and the relationship of Acorus (Acorales?). Cladistics 2015; 32:141-159. [DOI: 10.1111/cla.12120] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/02/2015] [Indexed: 11/28/2022] Open
Affiliation(s)
- Gitte Petersen
- Natural History Museum of Denmark; University of Copenhagen; Sølvgade 83 Opg. S DK-1307 Copenhagen Denmark
| | - Ole Seberg
- Natural History Museum of Denmark; University of Copenhagen; Sølvgade 83 Opg. S DK-1307 Copenhagen Denmark
| | - Argelia Cuenca
- Natural History Museum of Denmark; University of Copenhagen; Sølvgade 83 Opg. S DK-1307 Copenhagen Denmark
| | | | | | - Jerrold I. Davis
- L. H. Bailey Hortorium and Section of Plant Biology; Cornell University; Ithaca NY 14853 USA
| | - Sean Graham
- Department of Botany; University of British Columbia; Vancouver BC V6T 1Z4 Canada
| | - T. Gregory Ross
- Department of Botany; University of British Columbia; Vancouver BC V6T 1Z4 Canada
|
27
|
Bansal MS, Wu YC, Alm EJ, Kellis M. Improved gene tree error correction in the presence of horizontal gene transfer. Bioinformatics 2015; 31:1211-8. [PMID: 25481006 PMCID: PMC4393519 DOI: 10.1093/bioinformatics/btu806] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2013] [Revised: 11/30/2014] [Accepted: 12/02/2014] [Indexed: 01/30/2023] Open
Abstract
MOTIVATION The accurate inference of gene trees is a necessary step in many evolutionary studies. Although the problem of accurate gene tree inference has received considerable attention, most existing methods are only applicable to gene families unaffected by horizontal gene transfer. As a result, the accurate inference of gene trees affected by horizontal gene transfer remains a largely unaddressed problem. RESULTS In this study, we introduce a new and highly effective method for gene tree error correction in the presence of horizontal gene transfer. Our method efficiently models horizontal gene transfers, gene duplications and losses, and uses a statistical hypothesis testing framework [Shimodaira-Hasegawa (SH) test] to balance sequence likelihood with topological information from a known species tree. Using a thorough simulation study, we show that existing phylogenetic methods yield inaccurate gene trees when applied to horizontally transferred gene families and that our method dramatically improves gene tree accuracy. We apply our method to a dataset of 11 cyanobacterial species and demonstrate the large impact of gene tree accuracy on downstream evolutionary analyses. AVAILABILITY AND IMPLEMENTATION An implementation of our method is available at http://compbio.mit.edu/treefix-dtl/ CONTACT : mukul@engr.uconn.edu or manoli@mit.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mukul S Bansal
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA, Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge and Broad Institute, Cambridge, MA, USA
| | - Yi-Chieh Wu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA, Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge and Broad Institute, Cambridge, MA, USA
| | - Eric J Alm
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA, Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge and Broad Institute, Cambridge, MA, USA
| | - Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA, Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge and Broad Institute, Cambridge, MA, USA
|
28
|
Sun M, Soltis DE, Soltis PS, Zhu X, Burleigh JG, Chen Z. Deep phylogenetic incongruence in the angiosperm clade Rosidae. Mol Phylogenet Evol 2015; 83:156-66. [DOI: 10.1016/j.ympev.2014.11.003] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2014] [Revised: 11/01/2014] [Accepted: 11/05/2014] [Indexed: 10/24/2022]
|
29
|
Chaudhary R, Boussau B, Burleigh JG, Fernández-Baca D. Assessing approaches for inferring species trees from multi-copy genes. Syst Biol 2014; 64:325-39. [PMID: 25540456 DOI: 10.1093/sysbio/syu128] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
With the availability of genomic sequence data, there is increasing interest in using genes with a possible history of duplication and loss for species tree inference. Here we assess the performance of both nonprobabilistic and probabilistic species tree inference approaches using gene duplication and loss and coalescence simulations. We evaluated the performance of gene tree parsimony (GTP) based on duplication (Only-dup), duplication and loss (Dup-loss), and deep coalescence (Deep-c) costs, the NJst distance method, the MulRF supertree method, and PHYLDOG, which jointly estimates gene trees and species tree using a hierarchical probabilistic model. We examined the effects of gene tree and species sampling, gene tree error, and duplication and loss rates on the accuracy of phylogenetic estimates. In the 10-taxon duplication and loss simulation experiments, MulRF is more accurate than the other methods when the duplication and loss rates are low, and Dup-loss is generally the most accurate when the duplication and loss rates are high. PHYLDOG performs well in 10-taxon duplication and loss simulations, but its run time is prohibitively long on larger data sets. In the larger duplication and loss simulation experiments, MulRF outperforms all other methods in experiments with at most 100 taxa; however, in the larger simulation, Dup-loss generally performs best. In all duplication and loss simulation experiments with more than 10 taxa, all methods perform better with more gene trees and fewer missing sequences, and they are all affected by gene tree error. Our results also highlight high levels of error in estimates of duplications and losses from GTP methods and demonstrate the usefulness of methods based on generic tree distances for large analyses.
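As an illustration of the duplication-only (Only-dup) cost named in the abstract above, the following minimal sketch counts the duplications implied when one gene tree is reconciled with a species tree under the usual least-common-ancestor mapping. It is not the authors' implementation: the nested-tuple tree encoding, the convention that gene-tree leaves are labelled with species names, and the function names `species_clusters` and `duplication_cost` are all assumptions made here for illustration.

```python
def species_clusters(tree):
    """Collect the leaf set of every node of a rooted species tree (nested tuples)."""
    out = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def duplication_cost(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species-tree node as one of their children."""
    clusters = species_clusters(species_tree)

    def lca(species):
        # The clusters containing a species set are nested, so the smallest one is the LCA.
        return min((c for c in clusters if species <= c), key=len)

    dups = 0

    def walk(node):
        nonlocal dups
        if isinstance(node, str):
            leaves = frozenset([node])
            return leaves, lca(leaves)
        kids = [walk(child) for child in node]
        leaves = frozenset().union(*(k[0] for k in kids))
        mapped = lca(leaves)
        if any(k[1] == mapped for k in kids):
            dups += 1  # node and a child share an image: inferred duplication
        return leaves, mapped

    walk(gene_tree)
    return dups


# Hypothetical example: species A and B each carry two gene copies.
species = (("A", "B"), "C")
gene = ((("A", "B"), ("A", "B")), "C")
print(duplication_cost(gene, species))
```

In gene tree parsimony, a candidate species tree would be scored by summing this count over all gene trees; searching the space of species trees is a separate step not shown here.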
Affiliation(s)
- Ruchi Chaudhary
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA; Department of Biology, University of Florida, Gainesville, FL 32611, USA; and Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne F-69622, France
| | - Bastien Boussau
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA; Department of Biology, University of Florida, Gainesville, FL 32611, USA; and Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne F-69622, France
| | - J Gordon Burleigh
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA; Department of Biology, University of Florida, Gainesville, FL 32611, USA; and Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne F-69622, France
| | - David Fernández-Baca
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA; Department of Biology, University of Florida, Gainesville, FL 32611, USA; and Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne F-69622, France
|
30
|
Wickett NJ, Mirarab S, Nguyen N, Warnow T, Carpenter E, Matasci N, Ayyampalayam S, Barker MS, Burleigh JG, Gitzendanner MA, Ruhfel BR, Wafula E, Der JP, Graham SW, Mathews S, Melkonian M, Soltis DE, Soltis PS, Miles NW, Rothfels CJ, Pokorny L, Shaw AJ, DeGironimo L, Stevenson DW, Surek B, Villarreal JC, Roure B, Philippe H, dePamphilis CW, Chen T, Deyholos MK, Baucom RS, Kutchan TM, Augustin MM, Wang J, Zhang Y, Tian Z, Yan Z, Wu X, Sun X, Wong GKS, Leebens-Mack J. Phylotranscriptomic analysis of the origin and early diversification of land plants. Proc Natl Acad Sci U S A 2014; 111:E4859-68. [PMID: 25355905 PMCID: PMC4234587 DOI: 10.1073/pnas.1323926111] [Citation(s) in RCA: 756] [Impact Index Per Article: 75.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
Reconstructing the origin and evolution of land plants and their algal relatives is a fundamental problem in plant phylogenetics, and is essential for understanding how critical adaptations arose, including the embryo, vascular tissue, seeds, and flowers. Despite advances in molecular systematics, some hypotheses of relationships remain weakly resolved. Inferring deep phylogenies with bouts of rapid diversification can be problematic; however, genome-scale data should significantly increase the number of informative characters for analyses. Recent phylogenomic reconstructions focused on the major divergences of plants have resulted in promising but inconsistent results. One limitation is sparse taxon sampling, likely resulting from the difficulty and cost of data generation. To address this limitation, transcriptome data for 92 streptophyte taxa were generated and analyzed along with 11 published plant genome sequences. Phylogenetic reconstructions were conducted using up to 852 nuclear genes and 1,701,170 aligned sites. Sixty-nine analyses were performed to test the robustness of phylogenetic inferences to permutations of the data matrix or to phylogenetic method, including supermatrix, supertree, and coalescent-based approaches, maximum-likelihood and Bayesian methods, partitioned and unpartitioned analyses, and amino acid versus DNA alignments. Among other results, we find robust support for a sister-group relationship between land plants and one group of streptophyte green algae, the Zygnematophyceae. Strong and robust support for a clade comprising liverworts and mosses is inconsistent with a widely accepted view of early land plant evolution, and suggests that phylogenetic hypotheses used to understand the evolution of fundamental plant traits should be reevaluated.
Affiliation(s)
- Norman J Wickett
- Chicago Botanic Garden, Glencoe, IL 60022; Program in Biological Sciences, Northwestern University, Evanston, IL 60208
| | - Siavash Mirarab
- Department of Computer Science, University of Texas, Austin, TX 78712
| | - Nam Nguyen
- Department of Computer Science, University of Texas, Austin, TX 78712
| | - Tandy Warnow
- Department of Computer Science, University of Texas, Austin, TX 78712
| | - Eric Carpenter
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E9
| | - Naim Matasci
- iPlant Collaborative, Tucson, AZ 85721; Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721
| | | | - Michael S Barker
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721
| | | | - Matthew A Gitzendanner
- Department of Biology and Genetics Institute, University of Florida, Gainesville, FL 32611
| | - Brad R Ruhfel
- Department of Biology and Department of Biological Sciences, Eastern Kentucky University, Richmond, KY 40475; Florida Museum of Natural History, Gainesville, FL 32611
| | - Eric Wafula
- Department of Biology, Pennsylvania State University, University Park, PA 16803
| | - Joshua P Der
- Department of Biology, Pennsylvania State University, University Park, PA 16803
| | | | - Sarah Mathews
- Arnold Arboretum of Harvard University, Cambridge, MA 02138
| | | | - Douglas E Soltis
- Department of Biology and Genetics Institute, University of Florida, Gainesville, FL 32611; Florida Museum of Natural History, Gainesville, FL 32611
| | - Pamela S Soltis
- Department of Biology and Genetics Institute, University of Florida, Gainesville, FL 32611; Florida Museum of Natural History, Gainesville, FL 32611
| | | | - Carl J Rothfels
- Department of Biology, Duke University, Durham, NC 27708; Department of Zoology, University of British Columbia, Vancouver, BC, Canada V6T 1Z4
| | - Lisa Pokorny
- Department of Biology, Duke University, Durham, NC 27708; Department of Biodiversity and Conservation, Real Jardín Botánico-Consejo Superior de Investigaciones Cientificas, 28014 Madrid, Spain
| | | | | | | | - Barbara Surek
- Botanical Institute, Universität zu Köln, Cologne D-50674, Germany
| | - Juan Carlos Villarreal
- Department fur Biologie, Systematische Botanik und Mykologie, Ludwig-Maximilians-Universitat, 80638 Munich, Germany
| | - Béatrice Roure
- Département de Biochimie, Centre Robert-Cedergren, Université de Montréal, Succursale Centre-Ville, Montreal, QC, Canada H3C 3J7
| | - Hervé Philippe
- Département de Biochimie, Centre Robert-Cedergren, Université de Montréal, Succursale Centre-Ville, Montreal, QC, Canada H3C 3J7; CNRS, Station d'Ecologie Expérimentale du CNRS, Moulis, 09200, France
| | | | - Tao Chen
- Shenzhen Fairy Lake Botanical Garden, The Chinese Academy of Sciences, Shenzhen, Guangdong 518004, China
| | - Michael K Deyholos
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E9
| | - Regina S Baucom
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109
| | - Toni M Kutchan
- Donald Danforth Plant Science Center, St. Louis, MO 63132
| | | | - Jun Wang
- BGI-Shenzhen, Bei shan Industrial Zone, Yantian District, Shenzhen 518083, China
| | - Yong Zhang
- CNRS, Station d'Ecologie Expérimentale du CNRS, Moulis, 09200, France
| | - Zhijian Tian
- BGI-Shenzhen, Bei shan Industrial Zone, Yantian District, Shenzhen 518083, China
| | - Zhixiang Yan
- BGI-Shenzhen, Bei shan Industrial Zone, Yantian District, Shenzhen 518083, China
| | - Xiaolei Wu
- BGI-Shenzhen, Bei shan Industrial Zone, Yantian District, Shenzhen 518083, China
| | - Xiao Sun
- BGI-Shenzhen, Bei shan Industrial Zone, Yantian District, Shenzhen 518083, China
| | - Gane Ka-Shu Wong
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E9; BGI-Shenzhen, Bei shan Industrial Zone, Yantian District, Shenzhen 518083, China; and Department of Medicine, University of Alberta, Edmonton, AB, Canada T6G 2E1
|
31
|
Zhao P, Capella-Gutiérrez S, Shi Y, Zhao X, Chen G, Gabaldón T, Ma XF. Transcriptomic analysis of a psammophyte food crop, sand rice (Agriophyllum squarrosum) and identification of candidate genes essential for sand dune adaptation. BMC Genomics 2014; 15:872. [PMID: 25287394 PMCID: PMC4459065 DOI: 10.1186/1471-2164-15-872] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2014] [Accepted: 09/29/2014] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Sand rice (Agriophyllum squarrosum) is an annual desert plant adapted to mobile sand dunes in arid and semi-arid regions of Central Asia. Sand rice seeds have excellent nutritional value and have historically been consumed by local populations in the desert regions of northwest China. Sand rice is a potential food crop resilient to ongoing climate change; however, partly because of the scarcity of genetic information, this species has undergone little agronomic modification through classical breeding in recent years. RESULTS We performed deep transcriptome sequencing of sand rice, which uncovered 67,741 unigenes. Phylogenetic analysis based on 221 single-copy genes showed a close relationship between sand rice and the recently domesticated crop sugar beet. Transcriptomic comparisons also showed a high level of global sequence conservation between these two species. Conservation of sand rice and sugar beet orthologs assigned to the "response to salt stress" gene ontology term suggests that sand rice is also a potential salt-tolerant plant. Furthermore, sand rice is far more tolerant to high temperature. A set of genes likely relevant to heat stress resistance was functionally annotated according to expression levels, sequence annotation, and comparisons with corresponding transcriptome profiling results in Arabidopsis. CONCLUSIONS The present work provides abundant genomic information for functional dissection of the important traits in sand rice. Future screening of the genetic variation among different ecotypes and construction of a draft genome sequence will further facilitate agronomic trait improvement and final domestication of sand rice. ELECTRONIC SUPPLEMENTARY MATERIAL The online version of this article (doi:10.1186/1471-2164-15-872) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Pengshan Zhao
- Key Laboratory of Stress Physiology and Ecology in Cold and Arid Regions, Gansu Province, Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, Lanzhou, 730000, People's Republic of China; Shapotou Desert Research and Experimental Station, Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, Lanzhou, 730000, People's Republic of China
| | - Salvador Capella-Gutiérrez
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader, 88, 08003, Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003, Barcelona, Spain; Yeast and Basidiomycete Research Group, CBS Fungal Biodiversity Centre, Uppsalalaan 8, 3584, LT, Utrecht, The Netherlands
| | - Yong Shi
- Key Laboratory of Stress Physiology and Ecology in Cold and Arid Regions, Gansu Province, Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, Lanzhou, 730000, People's Republic of China; Shapotou Desert Research and Experimental Station, Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, Lanzhou, 730000, People's Republic of China
| | - Xin Zhao
- Key Laboratory of Stress Physiology and Ecology in Cold and Arid Regions, Gansu Province, Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, Lanzhou, 730000, People's Republic of China; Shapotou Desert Research and Experimental Station, Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, Lanzhou, 730000, People's Republic of China
| | - Guoxiong Chen
- Key Laboratory of Stress Physiology and Ecology in Cold and Arid Regions, Gansu Province, Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, Lanzhou, 730000, People's Republic of China; Shapotou Desert Research and Experimental Station, Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, Lanzhou, 730000, People's Republic of China
| | - Toni Gabaldón
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader, 88, 08003, Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010, Barcelona, Spain
| | - Xiao-Fei Ma
- Key Laboratory of Stress Physiology and Ecology in Cold and Arid Regions, Gansu Province, Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, Lanzhou, 730000, People's Republic of China; Shapotou Desert Research and Experimental Station, Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, Lanzhou, 730000, People's Republic of China
|
32
|
Liu JX, Liu J, Gao YL, Mi JX, Ma CX, Wang D. A class-information-based penalized matrix decomposition for identifying plants core genes responding to abiotic stresses. PLoS One 2014; 9:e106097. [PMID: 25180509 PMCID: PMC4152128 DOI: 10.1371/journal.pone.0106097] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2014] [Accepted: 07/29/2014] [Indexed: 12/03/2022] Open
Abstract
Sparse methods are markedly better at making gene expression data interpretable and comprehensible. Many sparse methods, such as penalized matrix decomposition (PMD) and sparse principal component analysis (SPCA), have been applied to extract plant core genes. Supervised algorithms, especially the support vector machine-recursive feature elimination (SVM-RFE) method, consistently perform well in gene selection. In this paper, we introduce class information via the total scatter matrix and propose a class-information-based penalized matrix decomposition (CIPMD) method to improve the gene identification performance of PMD-based methods. First, the total scatter matrix is obtained from the different samples of the gene expression data. Second, a new data matrix is constructed by decomposing the total scatter matrix. Third, the new data matrix is decomposed by PMD to obtain the sparse eigensamples. Finally, the core genes are identified according to the nonzero entries in the eigensamples. Results on simulated data show that the CIPMD method can reach higher identification accuracy than conventional gene identification methods. Moreover, results on real gene expression data demonstrate that the CIPMD method can identify more core genes closely related to abiotic stresses than the other methods.
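The last two steps described in the abstract above (a sparse decomposition followed by reading off nonzero entries) can be illustrated with a minimal rank-1 sketch. This is not the CIPMD method itself: it omits the total-scatter-matrix construction, applies the decomposition directly to a small hypothetical samples-by-genes matrix, and uses a fixed soft threshold where PMD formulations typically choose the threshold to satisfy an L1 bound. The names `soft`, `sparse_rank1` and `core_genes`, and the toy data, are assumptions made here.

```python
import math


def soft(x, t):
    """Soft-thresholding: shrink x toward zero by t, returning zero if |x| <= t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def normalize(vec):
    """Scale a vector to unit Euclidean length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def sparse_rank1(X, threshold, iters=50):
    """Alternate u = Xv and v = soft(X^T u) to get a sparse loading vector over genes.

    X is a list of rows (samples) with one column per gene.
    """
    n, p = len(X), len(X[0])
    v = normalize([1.0] * p)
    for _ in range(iters):
        u = normalize([sum(X[i][j] * v[j] for j in range(p)) for i in range(n)])
        v = normalize([soft(sum(X[i][j] * u[i] for i in range(n)), threshold)
                       for j in range(p)])
    return v


def core_genes(X, names, threshold):
    """Return the genes whose loading is nonzero in the sparse vector."""
    v = sparse_rank1(X, threshold)
    return [gene for gene, weight in zip(names, v) if weight != 0.0]


# Hypothetical toy data: g1 and g2 vary strongly across samples, g3 and g4 barely move.
X = [[5, 4, 0.1, -0.1],
     [-5, -4, -0.1, 0.1],
     [4, 5, 0.1, 0.0],
     [-4, -5, 0.0, 0.1]]
print(core_genes(X, ["g1", "g2", "g3", "g4"], threshold=1.0))
```

A larger threshold zeroes more loadings and so selects fewer genes; how that trade-off is tuned in CIPMD is not stated in the abstract.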
Affiliation(s)
- Jin-Xing Liu
- School of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong, China
- Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Jian Liu
- School of Communication, Qufu Normal University, Rizhao, Shandong, China
| | - Ying-Lian Gao
- Library of Qufu Normal University, Qufu Normal University, Rizhao, Shandong, China
| | - Jian-Xun Mi
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
| | - Chun-Xia Ma
- School of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong, China
| | - Dong Wang
- School of Information Science and Engineering, Qufu Normal University, Rizhao, Shandong, China
|
33
|
Wisecaver JH, Brosnahan ML, Hackett JD. Horizontal gene transfer is a significant driver of gene innovation in dinoflagellates. Genome Biol Evol 2014; 5:2368-81. [PMID: 24259313 PMCID: PMC3879968 DOI: 10.1093/gbe/evt179] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
The dinoflagellates are an evolutionarily and ecologically important group of microbial eukaryotes. Previous work suggests that horizontal gene transfer (HGT) is an important source of gene innovation in these organisms. However, dinoflagellate genomes are notoriously large and complex, making genomic investigation of this phenomenon impractical with currently available sequencing technology. Fortunately, de novo transcriptome sequencing and assembly provides an alternative approach for investigating HGT. We sequenced the transcriptome of the dinoflagellate Alexandrium tamarense Group IV to investigate how HGT has contributed to gene innovation in this group. Our comprehensive A. tamarense Group IV gene set was compared with those of 16 other eukaryotic genomes. Ancestral gene content reconstruction of ortholog groups shows that A. tamarense Group IV has the largest number of gene families gained (314-1,563 depending on inference method) relative to all other organisms in the analysis (0-782). Phylogenomic analysis indicates that genes horizontally acquired from bacteria are a significant proportion of this gene influx, as are genes transferred from other eukaryotes either through HGT or endosymbiosis. The dinoflagellates also display curious cases of gene loss associated with mitochondrial metabolism including the entire Complex I of oxidative phosphorylation. Some of these missing genes have been functionally replaced by bacterial and eukaryotic xenologs. The transcriptome of A. tamarense Group IV lends strong support to a growing body of evidence that dinoflagellate genomes are extraordinarily impacted by HGT.
|
34
|
Ma PF, Zhang YX, Zeng CX, Guo ZH, Li DZ. Chloroplast Phylogenomic Analyses Resolve Deep-Level Relationships of an Intractable Bamboo Tribe Arundinarieae (Poaceae). Syst Biol 2014; 63:933-50. [DOI: 10.1093/sysbio/syu054] [Citation(s) in RCA: 178] [Impact Index Per Article: 17.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Affiliation(s)
- Peng-Fei Ma
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; and Kunming College of Life Sciences, University of Chinese Academy of Sciences, Kunming, Yunnan 650201, China
| | - Yu-Xiao Zhang
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; and Kunming College of Life Sciences, University of Chinese Academy of Sciences, Kunming, Yunnan 650201, China
| | - Chun-Xia Zeng
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; and Kunming College of Life Sciences, University of Chinese Academy of Sciences, Kunming, Yunnan 650201, China
| | - Zhen-Hua Guo
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; and Kunming College of Life Sciences, University of Chinese Academy of Sciences, Kunming, Yunnan 650201, China
| | - De-Zhu Li
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; and Kunming College of Life Sciences, University of Chinese Academy of Sciences, Kunming, Yunnan 650201, China
| |
Collapse
|
35
|
Abstract
In the tree reconciliation approach for species tree inference, a tree that has the minimum reconciliation score for given gene trees is taken as an estimate of the species tree. The scoring models used in existing tree reconciliation methods include the duplication, mutation, and deep coalescence costs. Since existing inference methods all are heuristic, their performances are often evaluated by using the Robinson-Foulds (RF) distance between the true species trees and the estimates output on simulated multi-locus datasets. To better understand these methods, we study the relationships between the duplication cost and the RF distance. We prove that the gap between the duplication cost and the RF distance is unbounded, but the symmetric duplication cost is logarithmically equivalent to the RF distance. The relationships between other reconciliation costs and the RF distance are also investigated.
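As an illustration of the evaluation measure named in this abstract, the following is a minimal sketch of the Robinson-Foulds distance for rooted trees, computed as the number of clades found in exactly one of the two trees. The nested-tuple tree encoding and the function names `clades` and `rf_distance` are assumptions made for this example and are not taken from the paper.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names.

    A tree is either a leaf name (str) or a tuple of subtrees (illustrative encoding).
    """
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        leaf_set = frozenset().union(*(leaves_below(child) for child in node))
        found.add(leaf_set)
        return leaf_set

    all_leaves = leaves_below(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these taxa
    return found


def rf_distance(tree_a, tree_b):
    """Count the clades present in exactly one of the two rooted trees."""
    return len(clades(tree_a) ^ clades(tree_b))


true_tree = ((("A", "B"), "C"), "D")
estimate = ((("A", "C"), "B"), "D")
print(rf_distance(true_tree, estimate))  # clades {A,B} and {A,C} differ, so 2
```

Some authors normalise this count or halve it; the sketch reports the raw symmetric-difference size.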
Affiliation(s)
- Yu Zheng
  - Department of Mathematics, National University of Singapore, Singapore
- Louxin Zhang
  - Department of Mathematics, National University of Singapore, Singapore
  - Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore
36
|
Abstract
This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.
Affiliation(s)
- Gergely J Szöllősi
  - ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Eric Tannier
  - ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Vincent Daubin
  - ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Bastien Boussau
  - ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
37
|
A daily-updated tree of (sequenced) life as a reference for genome research. Sci Rep 2013; 3:2015. [PMID: 23778980 PMCID: PMC6504836 DOI: 10.1038/srep02015] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2013] [Accepted: 05/10/2013] [Indexed: 11/08/2022] Open
Abstract
We report a daily-updated sequenced/species Tree Of Life (sTOL) as a reference for the increasing number of cellular organisms with their genomes sequenced. The sTOL builds on a likelihood-based weight calibration algorithm to consolidate NCBI taxonomy information in concert with unbiased sampling of molecular characters from whole genomes of all sequenced organisms. Via quantifying the extent of agreement between taxonomic and molecular data, we observe there are many potential improvements that can be made to the status quo classification, particularly in the Fungi kingdom; we also see that the current state of many animal genomes is rather poor. To augment the use of sTOL in providing evolutionary contexts, we integrate an ontology infrastructure and demonstrate its utility for evolutionary understanding on: nuclear receptors, stem cells and eukaryotic genomes. The sTOL (http://supfam.org/SUPERFAMILY/sTOL) provides a binary tree of (sequenced) life, and contributes to an analytical platform linking genome evolution, function and phenotype.
38
|
Som A. Causes, consequences and solutions of phylogenetic incongruence. Brief Bioinform 2014; 16:536-48. [PMID: 24872401 DOI: 10.1093/bib/bbu015] [Citation(s) in RCA: 91] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2014] [Accepted: 04/05/2014] [Indexed: 11/14/2022] Open
Abstract
Phylogenetic analysis is used to recover the evolutionary history of species, genes or proteins. Understanding phylogenetic relationships between organisms is a prerequisite of almost any evolutionary study, as contemporary species all share a common history through their ancestry. Moreover, it is important because of its wide applications that include understanding genome organization, epidemiological investigations, predicting protein functions, and deciding the genes to be analyzed in comparative studies. Despite immense progress in recent years, phylogenetic reconstruction involves many challenges that create uncertainty with respect to the true evolutionary relationships of the species or genes analyzed. One of the most notable difficulties is the widespread occurrence of incongruence among methods and also among individual genes or different genomic regions. Presence of widespread incongruence inhibits successful revealing of evolutionary relationships and applications of phylogenetic analysis. In this article, I concisely review the effect of various factors that cause incongruence in molecular phylogenies, the advances in the field that resolved some factors, and explore unresolved factors that cause incongruence along with possible ways for tackling them.
39
|
Abeysundera M, Kenney T, Field C, Gu H. Combining distance matrices on identical taxon sets for multi-gene analysis with singular value decomposition. PLoS One 2014; 9:e94279. [PMID: 24732341 PMCID: PMC3986248 DOI: 10.1371/journal.pone.0094279] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2013] [Accepted: 03/14/2014] [Indexed: 11/26/2022] Open
Abstract
We present a simple and effective method for combining distance matrices from multiple genes on identical taxon sets to obtain a single representative distance matrix from which to derive a combined-gene phylogenetic tree. The method applies singular value decomposition (SVD) to extract the greatest common signal present in the distances obtained from each gene. The first right eigenvector of the SVD, which corresponds to a weighted average of the distance matrices of all genes, can thus be used to derive a representative tree from multiple genes. We apply our method to three well known data sets and estimate the uncertainty using bootstrap methods. Our results show that this method works well for these three data sets and that the uncertainty in these estimates is small. A simulation study is conducted to compare the performance of our method with several other distance based approaches (namely SDM, SDM* and ACS97), and we find the performances of all these approaches are comparable in the consensus setting. The computational complexity of our method is similar to that of SDM. Besides constructing a representative tree from multiple genes, we also demonstrate how the subsequent eigenvalues and eigenvectors may be used to identify if there are conflicting signals in the data and which genes might be influential or outliers for the estimated combined-gene tree.
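The core idea in this abstract can be sketched in a few lines: each gene's distance matrix is flattened into one row, and the leading right singular vector of the stacked rows serves as the combined distance vector. The sketch below is an illustration only. It swaps a full SVD for plain power iteration on A^T A so that it needs no external library, and the function names, the flattening order, and the unit-length scaling are all assumptions rather than details confirmed by the paper.

```python
import math


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a symmetric distance matrix into one row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def first_right_singular_vector(rows, iterations=200):
    """Approximate the leading right singular vector by power iteration on A^T A.

    Assumes a non-zero matrix; the returned vector has unit length.
    """
    n_cols = len(rows[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = A v (one entry per gene)
        u = [sum(a * x for a, x in zip(row, v)) for row in rows]
        # w = A^T u (one entry per taxon pair)
        w = [sum(row[j] * u[i] for i, row in enumerate(rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def combine_gene_distances(gene_matrices):
    """Return a combined pairwise-distance vector, defined only up to scale."""
    rows = [upper_triangle(m) for m in gene_matrices]
    return first_right_singular_vector(rows)


# Two hypothetical genes on the same three taxa; the second evolves twice as fast.
gene1 = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
gene2 = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
combined = combine_gene_distances([gene1, gene2])
```

Because both hypothetical genes carry the same relative distances, the combined vector is proportional to (1, 2, 3). Rebuilding a tree from it would need a distance-based method such as neighbor joining, which is outside this sketch.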
Affiliation(s)
- Melanie Abeysundera
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
- Toby Kenney
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
- Chris Field
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
- Hong Gu
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
40
|
Rusin LY, Lyubetskaya EV, Gorbunov KY, Lyubetsky VA. Reconciliation of gene and species trees. BIOMED RESEARCH INTERNATIONAL 2014; 2014:642089. [PMID: 24800245 PMCID: PMC3985182 DOI: 10.1155/2014/642089] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 08/11/2013] [Accepted: 11/27/2013] [Indexed: 11/18/2022]
Abstract
The first part of the paper briefly overviews the problem of gene and species trees reconciliation with the focus on defining and algorithmic construction of the evolutionary scenario. Basic ideas are discussed for the aspects of mapping definitions, costs of the mapping and evolutionary scenario, imposing time scales on a scenario, incorporating horizontal gene transfers, binarization and reconciliation of polytomous trees, and construction of species trees and scenarios. The review does not intend to cover the vast diversity of literature published on these subjects. Instead, the authors strived to overview the problem of the evolutionary scenario as a central concept in many areas of evolutionary research. The second part provides detailed mathematical proofs for the solutions of two problems: (i) inferring a gene evolution along a species tree accounting for various types of evolutionary events and (ii) trees reconciliation into a single species tree when only gene duplications and losses are allowed. All proposed algorithms have a cubic time complexity and are mathematically proved to find exact solutions. Solving algorithms for problem (ii) can be naturally extended to incorporate horizontal transfers, other evolutionary events, and time scales on the species tree.
Affiliation(s)
- L. Y. Rusin
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoy Karetny Pereulok 19, Moscow 127994, Russia
  - Faculty of Biology, Moscow State University, Leninskie Gory 1-12, Moscow 119234, Russia
- E. V. Lyubetskaya
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoy Karetny Pereulok 19, Moscow 127994, Russia
- K. Y. Gorbunov
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoy Karetny Pereulok 19, Moscow 127994, Russia
- V. A. Lyubetsky
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoy Karetny Pereulok 19, Moscow 127994, Russia
41
|
Abstract
Amborella trichopoda is strongly supported as the single living species of the sister lineage to all other extant flowering plants, providing a unique reference for inferring the genome content and structure of the most recent common ancestor (MRCA) of living angiosperms. Sequencing the Amborella genome, we identified an ancient genome duplication predating angiosperm diversification, without evidence of subsequent, lineage-specific genome duplications. Comparisons between Amborella and other angiosperms facilitated reconstruction of the ancestral angiosperm gene content and gene order in the MRCA of core eudicots. We identify new gene families, gene duplications, and floral protein-protein interactions that first appeared in the ancestral angiosperm. Transposable elements in Amborella are ancient and highly divergent, with no recent transposon radiations. Population genomic analysis across Amborella's native range in New Caledonia reveals a recent genetic bottleneck and geographic structure with conservation implications.
42
|
Chaudhary R, Burleigh JG, Fernández-Baca D. Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance. Algorithms Mol Biol 2013; 8:28. [PMID: 24180377 PMCID: PMC3874668 DOI: 10.1186/1748-7188-8-28] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2013] [Accepted: 10/08/2013] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Constructing species trees from multi-copy gene trees remains a challenging problem in phylogenetics. One difficulty is that the underlying genes can be incongruent due to evolutionary processes such as gene duplication and loss, deep coalescence, or lateral gene transfer. Gene tree estimation errors may further exacerbate the difficulties of species tree estimation. RESULTS We present a new approach for inferring species trees from incongruent multi-copy gene trees that is based on a generalization of the Robinson-Foulds (RF) distance measure to multi-labeled trees (mul-trees). We prove that it is NP-hard to compute the RF distance between two mul-trees; however, it is easy to calculate this distance between a mul-tree and a singly-labeled species tree. Motivated by this, we formulate the RF problem for mul-trees (MulRF) as follows: Given a collection of multi-copy gene trees, find a singly-labeled species tree that minimizes the total RF distance from the input mul-trees. We develop and implement a fast SPR-based heuristic algorithm for the NP-hard MulRF problem. We compare the performance of the MulRF method (available at http://genome.cs.iastate.edu/CBL/MulRF/) with several gene tree parsimony approaches using gene tree simulations that incorporate gene tree error, gene duplications and losses, and/or lateral transfer. The MulRF method produces more accurate species trees than gene tree parsimony approaches. We also demonstrate that the MulRF method infers in minutes a credible plant species tree from a collection of nearly 2,000 gene trees. CONCLUSIONS Our new phylogenetic inference method, based on a generalized RF distance, makes it possible to quickly estimate species trees from large genomic data sets. Since the MulRF method, unlike gene tree parsimony, is based on a generic tree distance measure, it is appealing for analyses of genomic data sets, in which many processes such as deep coalescence, recombination, gene duplication and losses as well as phylogenetic error may contribute to gene tree discord. In experiments, the MulRF method estimated species trees accurately and quickly, demonstrating MulRF as an efficient alternative approach for phylogenetic inference from large-scale genomic data sets.
43
|
Bansal MS, Alm EJ, Kellis M. Reconciliation revisited: handling multiple optima when reconciling with duplication, transfer, and loss. J Comput Biol 2013; 20:738-54. [PMID: 24033262 PMCID: PMC3791060 DOI: 10.1089/cmb.2013.0073] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
Phylogenetic tree reconciliation is a powerful approach for inferring evolutionary events like gene duplication, horizontal gene transfer, and gene loss, which are fundamental to our understanding of molecular evolution. While duplication-loss (DL) reconciliation leads to a unique maximum-parsimony solution, duplication-transfer-loss (DTL) reconciliation yields a multitude of optimal solutions, making it difficult to infer the true evolutionary history of the gene family. This problem is further exacerbated by the fact that different event cost assignments yield different sets of optimal reconciliations. Here, we present an effective, efficient, and scalable method for dealing with these fundamental problems in DTL reconciliation. Our approach works by sampling the space of optimal reconciliations uniformly at random and aggregating the results. We show that even gene trees with only a few dozen genes often have millions of optimal reconciliations and present an algorithm to efficiently sample the space of optimal reconciliations uniformly at random in O(mn^2) time per sample, where m and n denote the number of genes and species, respectively. We use these samples to understand how different optimal reconciliations vary in their node mappings and event assignments and to investigate the impact of varying event costs. We apply our method to a biological dataset of approximately 4700 gene trees from 100 taxa and observe that 93% of event assignments and 73% of mappings remain consistent across different multiple optima. Our analysis represents the first systematic investigation of the space of optimal DTL reconciliations and has many important implications for the study of gene family evolution.
Affiliation(s)
- Mukul S. Bansal
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts
- Eric J. Alm
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts
- Manolis Kellis
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts
44
|
Beaulieu JM, Donoghue MJ. Fruit evolution and diversification in campanulid angiosperms. Evolution 2013; 67:3132-44. [PMID: 24151998 DOI: 10.1111/evo.12180] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2013] [Accepted: 05/30/2013] [Indexed: 11/27/2022]
Abstract
With increases in both the size and scope of phylogenetic trees, we are afforded a renewed opportunity to address long-standing comparative questions, such as whether particular fruit characters account for much of the variation in diversity among flowering plant clades. Studies to date have reported conflicting results, largely as a consequence of taxonomic scale and a reliance on potentially conservative statistical measures. Here we examine a larger and older angiosperm clade, the Campanulidae, and infer the rates of character transitions among the major fruit types, emphasizing the evolution of the achene fruits that are most frequently observed within the group. Our analyses imply that campanulids likely originated bearing capsules, and that all subsequent fruit diversity was derived from various modifications of this dry fruit type. We also found that the preponderance of lineages bearing achenes is a consequence of not only being a fruit type that is somewhat irreversible once it evolves, but one that also seems to have a positive association with diversification rates. Although these results imply the achene fruit type is a significant correlate of diversity patterns observed across campanulids, we conclude that it remains difficult to confidently and directly view this character state as the actual cause of increased diversification rates.
Affiliation(s)
- Jeremy M Beaulieu
  - Department of Ecology and Evolutionary Biology, Yale University, P.O. Box 208106, New Haven, Connecticut 06520.
45
|
Wan Y, Schwaninger HR, Baldo AM, Labate JA, Zhong GY, Simon CJ. A phylogenetic analysis of the grape genus (Vitis L.) reveals broad reticulation and concurrent diversification during neogene and quaternary climate change. BMC Evol Biol 2013; 13:141. [PMID: 23826735 PMCID: PMC3750556 DOI: 10.1186/1471-2148-13-141] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2012] [Accepted: 05/28/2013] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND Grapes are one of the most economically important fruit crops. There are about 60 species in the genus Vitis. The phylogenetic relationships among these species are of keen interest for the conservation and use of this germplasm. We selected 309 accessions from 48 Vitis species, varieties, and outgroups, examined ~11 kb (~3.4 Mb total) of aligned nuclear DNA sequences from 27 unlinked genes in a phylogenetic context, and estimated divergence times based on fossil calibrations. RESULTS Vitis formed a strongly supported clade. There was substantial support for species and less for the higher-level groupings (series). As estimated from extant taxa, the crown age of Vitis was 28 Ma and the divergence of subgenera (Vitis and Muscadinia) occurred at ~18 Ma. Higher clades in subgenus Vitis diverged 16 - 5 Ma with overlapping confidence intervals, and ongoing divergence formed extant species at 12 - 1.3 Ma. Several species had species-specific SNPs. NeighborNet analysis showed extensive reticulation at the core of subgenus Vitis representing the deeper nodes, with extensive reticulation radiating outward. Fitch Parsimony identified North America as the origin of the most recent common ancestor of extant Vitis species. CONCLUSIONS Phylogenetic patterns suggested origination of the genus in North America, fragmentation of an ancestral range during the Miocene, formation of extant species in the late Miocene-Pleistocene, and differentiation of species in the context of Pliocene-Quaternary tectonic and climatic change. Nuclear SNPs effectively resolved relationships at and below the species level in grapes and rectified several misclassifications of accessions in the repositories. Our results challenge current higher-level classifications, reveal the abundance of genetic diversity in the genus that is potentially available for crop improvement, and provide a valuable resource for species delineation, germplasm conservation and use.
Affiliation(s)
- Yizhen Wan
  - College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, People’s Republic of China
- Heidi R Schwaninger
  - US Department of Agriculture, Agriculture Research Service, Plant Genetic Resources Unit, New York State Agricultural Experiment Station, Cornell University, Geneva, NY 14456, USA
- Angela M Baldo
  - US Department of Agriculture, Agriculture Research Service, Plant Genetic Resources Unit, New York State Agricultural Experiment Station, Cornell University, Geneva, NY 14456, USA
  - US Department of Agriculture, Agriculture Research Service, Grape Genetic Research Unit, New York State Agricultural Experiment Station, Cornell University, Geneva, NY 14456, USA
- Joanne A Labate
  - US Department of Agriculture, Agriculture Research Service, Plant Genetic Resources Unit, New York State Agricultural Experiment Station, Cornell University, Geneva, NY 14456, USA
- Gan-Yuan Zhong
  - US Department of Agriculture, Agriculture Research Service, Plant Genetic Resources Unit, New York State Agricultural Experiment Station, Cornell University, Geneva, NY 14456, USA
  - US Department of Agriculture, Agriculture Research Service, Grape Genetic Research Unit, New York State Agricultural Experiment Station, Cornell University, Geneva, NY 14456, USA
- Charles J Simon
  - US Department of Agriculture, Agriculture Research Service, Plant Genetic Resources Unit, New York State Agricultural Experiment Station, Cornell University, Geneva, NY 14456, USA
46
|
Bansal MS, Eulenstein O. Algorithms for genome-scale phylogenetics using gene tree parsimony. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:939-956. [PMID: 24334388 DOI: 10.1109/tcbb.2013.103] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
The use of genomic data sets for phylogenetics is complicated by the fact that evolutionary processes such as gene duplication and loss, or incomplete lineage sorting (deep coalescence) cause incongruence among gene trees. One well-known approach that deals with this complication is gene tree parsimony, which, given a collection of gene trees, seeks a species tree that requires the smallest number of evolutionary events to explain the incongruence of the gene trees. However, a lack of efficient algorithms has limited the use of this approach. Here, we present efficient algorithms for SPR and TBR-based local search heuristics for gene tree parsimony under the 1) duplication, 2) loss, 3) duplication-loss, and 4) deep coalescence reconciliation costs. These novel algorithms improve upon the time complexities of previous algorithms for these problems by a factor of n, where n is the number of species in the collection of gene trees. Our algorithms provide a substantial improvement in runtime and scalability compared to previous implementations and enable large-scale gene tree parsimony analyses using any of the four reconciliation costs. Our algorithms have been implemented in the software packages DupTree and iGTP, and have already been used to perform several compelling phylogenetic studies.
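To make the duplication cost mentioned in this abstract concrete, here is a minimal sketch that scores one candidate species tree against gene trees using the standard LCA mapping: a gene-tree node counts as a duplication when it maps to the same species-tree node as one of its children. This is a simple quadratic-time illustration under an assumed nested-tuple encoding, with gene-tree leaves labelled by the species they come from. It is not the efficient SPR/TBR search described in the paper, nor the DupTree or iGTP implementation, and all function names are hypothetical.

```python
def species_clades(species_tree):
    """List every clade of the species tree, including single leaves and the root."""
    out = []

    def walk(node):
        if isinstance(node, str):
            clade = frozenset([node])
        else:
            clade = frozenset().union(*(walk(child) for child in node))
        out.append(clade)
        return clade

    walk(species_tree)
    return out


def duplication_cost(gene_tree, species_tree):
    """Count gene-tree nodes whose LCA mapping equals that of one of their children."""
    all_clades = species_clades(species_tree)

    def lca(species_set):
        # The LCA is the smallest species-tree clade containing all given species.
        return min((c for c in all_clades if species_set <= c), key=len)

    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            leaf = frozenset([node])
            return leaf, leaf
        results = [walk(child) for child in node]
        species_below = frozenset().union(*(species for species, _ in results))
        image = lca(species_below)
        if any(child_image == image for _, child_image in results):
            duplications += 1
        return species_below, image

    walk(gene_tree)
    return duplications


def total_duplication_cost(gene_trees, species_tree):
    """Gene tree parsimony score of one candidate species tree (duplications only)."""
    return sum(duplication_cost(g, species_tree) for g in gene_trees)


species = (("A", "B"), "C")
genes = [(("A", "B"), "C"), (("A", "C"), "B"), (("A", "A"), "B")]
print(total_duplication_cost(genes, species))  # 0 + 1 + 1 = 2
```

A heuristic search would evaluate this score for many candidate species trees and keep the lowest-scoring one.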
47
|
Zhao L, Zhang N, Ma PF, Liu Q, Li DZ, Guo ZH. Phylogenomic analyses of nuclear genes reveal the evolutionary relationships within the BEP clade and the evidence of positive selection in Poaceae. PLoS One 2013; 8:e64642. [PMID: 23734211 PMCID: PMC3667173 DOI: 10.1371/journal.pone.0064642] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2013] [Accepted: 04/16/2013] [Indexed: 11/23/2022] Open
Abstract
The BEP clade of the grass family (Poaceae) is composed of three subfamilies, i.e. Bambusoideae, Ehrhartoideae, and Pooideae. Controversies over the phylogenetic relationships among the three subfamilies persist in spite of great efforts. Previous evidence came mainly from plastid genes, with only a few nuclear genes utilized. Given the different evolutionary histories recorded by plastid and nuclear genes, it is indispensable to uncover their relationships based on nuclear genes. Here, eleven species with fully sequenced genomes and six species with transcriptomic data were included in this study. A total of 121 one-to-one orthologous groups (OGs) were identified and phylogenetic trees were reconstructed by different tree-building methods. Genes which might have undergone positive selection and played important roles in adaptive evolution were also investigated from 314 and 173 one-to-one OGs in two bamboo species and 14 grass species, respectively. Our results support the ((B, P) E) topology with high support values. Our findings also indicate that 24 and nine orthologs with statistically significant evidence of positive selection, from two bamboo species and 14 grass species, respectively, are mainly involved in abiotic and biotic stress response, reproduction and development, plant metabolism, and enzymes. In summary, this study demonstrates the power of the phylogenomic approach to shed light on the evolutionary relationships within the BEP clade, and offers valuable insights into adaptive evolution of the grass family.
Collapse
Affiliation(s)
- Lei Zhao
- Key Laboratory of Biodiversity and Biogeography, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Ning Zhang
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Peng-Fei Ma
- Key Laboratory of Biodiversity and Biogeography, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Qi Liu
- Institute of Genomic Medicine, Wenzhou Medical College, Wenzhou, Zhejiang, China
- De-Zhu Li
- Key Laboratory of Biodiversity and Biogeography, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Zhen-Hua Guo
- Key Laboratory of Biodiversity and Biogeography, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
|
48
|
Zimmer EA, Wen J. Reprint of: using nuclear gene data for plant phylogenetics: progress and prospects. Mol Phylogenet Evol 2013; 66:539-50. [PMID: 23375140 DOI: 10.1016/j.ympev.2013.01.005] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 09/23/2011] [Revised: 06/14/2012] [Accepted: 07/16/2012] [Indexed: 12/25/2022]
Abstract
The paper reviews the current state of low- and single-copy nuclear markers that have been applied successfully in plant phylogenetics to date, and discusses case studies highlighting the potential of massively parallel, high-throughput, or next-generation sequencing (NGS) approaches for molecular phylogenetic and evolutionary investigations. The current state, prospects, and challenges of specific single- or low-copy plant nuclear markers, as well as phylogenomic case studies, are presented and evaluated.
Affiliation(s)
- Elizabeth A Zimmer
- Department of Botany, National Museum of Natural History, MRC 166, Smithsonian Institution, Washington, DC 20013-7012, USA.
|
49
|
Górecki P, Eulenstein O, Tiuryn J. Unrooted tree reconciliation: a unified approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:522-536. [PMID: 23929875 DOI: 10.1109/tcbb.2013.22] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 06/02/2023]
Abstract
Tree comparison functions are widely used in phylogenetics for comparing evolutionary trees. Unrooted trees can be compared with rooted trees by identifying all rootings of the unrooted tree that minimize some provided comparison function between two rooted trees. The provided function satisfies the plateau property if all optimal rootings form a subtree, or plateau, in the unrooted tree, from which the rootings along every path toward a leaf have monotonically increasing costs. This property is sufficient for the linear-time identification of all optimal rootings and rooting costs. However, the plateau property has been proven for only a few rooted comparison functions, requiring individual proofs for each function without benefitting from inherent structural features of such functions. Here, we introduce the consistency condition, which is sufficient for a general function to satisfy the plateau property. For consistent functions, we introduce general linear-time solutions that identify optimal rootings and all rooting costs. Further, we identify novel relationships between consistent functions in terms of plateaus; in particular, the plateau of the well-studied duplication-loss function is part of a plateau of every other consistent function. We introduce a novel approach for identifying consistent cost functions by defining a formal language of Boolean costs. Formulas in this language can be interpreted as cost functions. Finally, we demonstrate the performance of our general linear-time solutions in practice using empirical and simulation studies.
Affiliation(s)
- Pawel Górecki
- Department of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Mazowieckie 02-097, Poland
|
50
|
Yoder JB, Briskine R, Mudge J, Farmer A, Paape T, Steele K, Weiblen GD, Bharti AK, Zhou P, May GD, Young ND, Tiffin P. Phylogenetic signal variation in the genomes of Medicago (Fabaceae). Syst Biol 2013; 62:424-38. [PMID: 23417680 DOI: 10.1093/sysbio/syt009] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 12/26/2022] Open
Abstract
Genome-scale data offer the opportunity to clarify phylogenetic relationships that are difficult to resolve with few loci, but they can also identify genomic regions with evolutionary history distinct from that of the species history. We collected whole-genome sequence data from 29 taxa in the legume genus Medicago, then aligned these sequences to the Medicago truncatula reference genome to confidently identify 87 596 variable homologous sites. We used this data set to estimate phylogenetic relationships among Medicago species, to investigate the number of sites needed to provide robust phylogenetic estimates and to identify specific genomic regions supporting topologies in conflict with the genome-wide phylogeny. Our full genomic data set resolves relationships within the genus that were previously intractable. Subsampling the data reveals considerable variation in phylogenetic signal and power in smaller subsets of the data. Even when sampling 5000 sites, no random sample of the data supports a topology identical to that of the genome-wide phylogeny. Phylogenetic relationships estimated from 500-site sliding windows revealed genome regions supporting several alternative species relationships among recently diverged taxa, consistent with the expected effects of deep coalescence or introgression in the recent history of Medicago.
Affiliation(s)
- Jeremy B Yoder
- Department of Plant Biology, University of Minnesota, Saint Paul MN 55108, USA
|