1
Boom AF, Migliore J, Ojeda Alayon DI, Kaymak E, Hardy OJ. Phylogenomics of Brachystegia: Insights into the origin of African miombo woodlands. Am J Bot 2024; 111:e16352. [PMID: 38853465 DOI: 10.1002/ajb2.16352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 03/29/2024] [Accepted: 04/01/2024] [Indexed: 06/11/2024]
Abstract
PREMISE
Phylogenetic approaches can provide valuable insights into how and when a biome emerged and developed, using its structuring species. In this context, Brachystegia Benth., a dominant genus of trees in miombo woodlands, is a key witness to the history of the largest woodland and savanna biome of Africa.
METHODS
We reconstructed the evolutionary history of the genus using targeted-enrichment sequencing of 60 Brachystegia specimens, achieving nearly complete species sampling. Phylogenomic inferences used supermatrix (RAxML-NG) and summary-method (ASTRAL-III) approaches. Conflicts between species and gene trees were assessed, and the phylogeny was time-calibrated in BEAST. Introgression between species was explored using PhyloNet.
RESULTS
The phylogenies were globally congruent regardless of the method used. Most species were recovered as monophyletic, unlike previous plastid phylogenetic reconstructions, in which lineages were shared among geographically close individuals independently of species identity. Still, most individual gene trees carried little phylogenetic information and, when informative, mostly conflicted with the reconstructed species trees. These results suggest incomplete lineage sorting and/or reticulate evolution, an interpretation supported by network analyses. The BEAST analysis supported a Pliocene origin for current Brachystegia lineages, with most diversification events dated to the Pliocene-Pleistocene.
CONCLUSIONS
These results suggest a recent origin of miombo species, consistent with their spatial expansion documented from plastid data. Brachystegia species may behave as a syngameon, a group of interfertile but still relatively well-delineated species, an aspect that deserves further investigation.
Affiliation(s)
- Arthur F Boom
- Royal Museum for Central Africa, Biology Department, Section Vertebrates, Tervuren, Belgium
- Université Libre de Bruxelles, Faculté des Sciences, Service Evolution Biologique et Ecologie, Bruxelles, Belgium
- Jérémy Migliore
- Université Libre de Bruxelles, Faculté des Sciences, Service Evolution Biologique et Ecologie, Bruxelles, Belgium
- Muséum départemental du Var, Toulon, France
- Dario I Ojeda Alayon
- Muséum départemental du Var, Toulon, France
- Department of Forest Biodiversity, Norwegian Institute of Bioeconomy Research, Ås, Norway
- Esra Kaymak
- Université Libre de Bruxelles, Faculté des Sciences, Service Evolution Biologique et Ecologie, Bruxelles, Belgium
- Okinawa Institute of Science and Technology (OIST), Okinawa, Japan
- Olivier J Hardy
- Université Libre de Bruxelles, Faculté des Sciences, Service Evolution Biologique et Ecologie, Bruxelles, Belgium
2
Lyu ZT, Zeng ZC, Wan H, Li Q, Tominaga A, Nishikawa K, Matsui M, Li SZ, Jiang ZW, Liu Y, Wang YY. Contrasting nidification behaviors facilitate diversification and colonization of the Music frogs under a changing paleoclimate. Commun Biol 2024; 7:638. [PMID: 38796601 PMCID: PMC11127999 DOI: 10.1038/s42003-024-06347-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Accepted: 05/17/2024] [Indexed: 05/28/2024]
Abstract
To cope with the complexity and variability of the terrestrial environment, amphibians have developed a wide range of reproductive and parental behaviors. Nest building occurs in some anuran species as a form of parental care. Species of the Music frog genus Nidirana are known for their unique courtship behavior and, in several congeners, mud nesting. However, the evolution of these frogs and their nidification behavior has yet to be studied. With phylogenomic and phylogeographic analyses based on a wide sampling of the genus, we find that Nidirana originated in central-southwestern China and that the nidification behavior first evolved ca. 19.3 Ma but was subsequently lost in several descendant lineages. Further population genomic analyses suggest that the nest-building species have an older diversification and colonization history, while congeners of the N. adenopleura complex, which do not exhibit nidification behavior, have experienced a recent rapid radiation. The presence and loss of nidification behavior in the Music frogs may be associated with paleoclimatic factors such as temperature and precipitation. This study highlights nidification behavior as a key evolutionary innovation that has contributed to the diversification of an amphibian group under past climate changes.
Affiliation(s)
- Zhi-Tong Lyu
- State Key Laboratory of Biocontrol, School of Ecology / School of Life Sciences, Sun Yat-sen University, Shenzhen, 518107, China
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610040, China
- Zhao-Chi Zeng
- State Key Laboratory of Biocontrol, School of Ecology / School of Life Sciences, Sun Yat-sen University, Shenzhen, 518107, China
- Han Wan
- State Key Laboratory of Biocontrol, School of Ecology / School of Life Sciences, Sun Yat-sen University, Shenzhen, 518107, China
- Qin Li
- Zhejiang Tiantong Forest Ecosystem National Observation and Research Station, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Atsushi Tominaga
- Faculty of Education, University of the Ryukyus, Senbaru 1 Nishihara, Okinawa, 903-0213, Japan
- Kanto Nishikawa
- Graduate School of Global Environmental Studies, Kyoto University, Yoshida-hon-machi, Sakyo-ku, Kyoto, 606-8501, Japan
- Masafumi Matsui
- Graduate School of Human and Environmental Studies, Kyoto University, Yoshida-Nihon-matsu, Sakyo-ku, Kyoto, 606-8501, Japan
- Shi-Ze Li
- Department of Food Science and Engineering, Moutai Institute, Renhuai, 564500, China
- Zhong-Wen Jiang
- Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- Yang Liu
- State Key Laboratory of Biocontrol, School of Ecology / School of Life Sciences, Sun Yat-sen University, Shenzhen, 518107, China
- Ying-Yong Wang
- State Key Laboratory of Biocontrol, School of Ecology / School of Life Sciences, Sun Yat-sen University, Shenzhen, 518107, China
3
Bernot JP, Owen CL, Wolfe JM, Meland K, Olesen J, Crandall KA. Major Revisions in Pancrustacean Phylogeny and Evidence of Sensitivity to Taxon Sampling. Mol Biol Evol 2023; 40:msad175. [PMID: 37552897 PMCID: PMC10414812 DOI: 10.1093/molbev/msad175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2022] [Revised: 06/14/2023] [Accepted: 06/19/2023] [Indexed: 08/10/2023]
Abstract
The clade Pancrustacea, comprising crustaceans and hexapods, is the most diverse group of animals on Earth, containing over 80% of animal species and half of animal biomass. It has been the subject of several recent phylogenomic analyses, yet relationships within Pancrustacea show a notable lack of stability. Here, the phylogeny is estimated with expanded taxon sampling, particularly of malacostracans. We show that small changes in taxon sampling have large impacts on phylogenetic estimation. By analyzing identical orthologs between two slightly different taxon sets, we show that the differences in the resulting topologies are due primarily to the effects of taxon sampling on the phylogenetic reconstruction method. We compare trees resulting from our phylogenomic analyses with those from the literature to explore the large tree space of pancrustacean phylogenetic hypotheses and find that statistical topology tests reject the previously published trees in favor of the maximum likelihood trees produced here. Our results reject several clades, including Caridoida, Eucarida, Multicrustacea, Vericrustacea, and Syncarida. Notably, we find Copepoda nested within Allotriocarida with high support and recover a novel relationship between decapods, euphausiids, and syncarids that we refer to as the Syneucarida. With denser taxon sampling, we find Stomatopoda sister to this latter clade, which we collectively name Stomatocarida, dividing Malacostraca into three clades: Leptostraca, Peracarida, and Stomatocarida. A new Bayesian divergence time estimation is conducted using 13 vetted fossils. We review our results in the context of other pancrustacean phylogenetic hypotheses and highlight 15 key taxa to sample in future studies.
Affiliation(s)
- James P Bernot
- Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Christopher L Owen
- Systematic Entomology Laboratory, USDA-ARS, ℅ National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Joanna M Wolfe
- Museum of Comparative Zoology and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Kenneth Meland
- Department of Biology, University of Bergen, Bergen, Norway
- Jørgen Olesen
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Keith A Crandall
- Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
4
Zhang C, Mirarab S. Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees. Mol Biol Evol 2022; 39:6750035. [PMID: 36201617 PMCID: PMC9750496 DOI: 10.1093/molbev/msac215] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 02/08/2022] [Revised: 09/20/2022] [Accepted: 10/03/2022] [Indexed: 01/07/2023]
Abstract
Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has created much debate on the merits of concatenation versus summary methods and practical obstacles to using summary methods more widely and to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low support branches from the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for the quartet-based species tree inference, the metric used in the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than the unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines.
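The contrast between contracting low-support branches and threshold-free weighting can be illustrated with a toy count for a single quartet. This is a simplified sketch, not the weighting scheme defined in the paper or ASTRAL's code: it assumes each gene tree has already been reduced to the topology it induces on one quartet {a, b, c, d} plus a support value in [0, 1] for that topology's internal branch, and it sums supports instead of counting each gene tree once. The function names are hypothetical.

```python
from collections import defaultdict


def quartet_key(pair1, pair2):
    """Canonical, order-independent label for the quartet topology pair1|pair2."""
    return frozenset([frozenset(pair1), frozenset(pair2)])


def quartet_scores(gene_quartets, weighted=True):
    """Score the resolutions of one quartet across gene trees.

    gene_quartets: iterable of ((pair1, pair2), support), support in [0, 1].
    weighted=True adds each gene tree's support (threshold-free weighting);
    weighted=False counts every gene tree once (unweighted counting).
    """
    scores = defaultdict(float)
    for (pair1, pair2), support in gene_quartets:
        scores[quartet_key(pair1, pair2)] += support if weighted else 1.0
    return dict(scores)


def best_resolution(scores):
    """Return the quartet topology with the highest score."""
    return max(scores, key=scores.get)


# Two well-supported gene trees favour ab|cd; three poorly supported ones favour ac|bd.
genes = [
    ((("a", "b"), ("c", "d")), 0.95),
    ((("a", "b"), ("c", "d")), 0.90),
    ((("a", "c"), ("b", "d")), 0.30),
    ((("a", "c"), ("b", "d")), 0.35),
    ((("a", "c"), ("b", "d")), 0.40),
]
print(quartet_scores(genes, weighted=False))
print(quartet_scores(genes))
```

With unweighted counting the three noisy gene trees win (3 against 2); with support weighting the two confident ones win (about 1.85 against 1.05), and no contraction threshold had to be chosen.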
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology, UC San Diego, La Jolla, CA, USA
5
Thureborn O, Razafimandimbison SG, Wikström N, Rydin C. Target capture data resolve recalcitrant relationships in the coffee family (Rubioideae, Rubiaceae). Front Plant Sci 2022; 13:967456. [PMID: 36160958 PMCID: PMC9493367 DOI: 10.3389/fpls.2022.967456] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/12/2022] [Accepted: 08/03/2022] [Indexed: 06/16/2023]
Abstract
Subfamily Rubioideae is the largest of the main lineages in the coffee family (Rubiaceae), with over 8,000 species and 29 tribes. Phylogenetic relationships among tribes and other major clades within this group of plants remain only partly resolved despite considerable effort. While previous studies have mainly used data from the organellar genomes and nuclear ribosomal DNA, we here use a large number of low-copy nuclear genes obtained via a target capture approach to infer phylogenetic relationships within Rubioideae. We included 101 Rubioideae species representing all currently recognized tribes except two (the monogeneric tribes Foonchewieae and Aitchisonieae), and all but one non-monogeneric tribe were represented by more than one genus. Using data from the 353 genes targeted by the universal Angiosperms353 probe set, we investigated the impact of data type, analytical approach, and potential paralogs on phylogenetic reconstruction. We inferred a robust phylogenetic hypothesis of Rubioideae, with the vast majority of nodes (or all) highly supported across all analyses and datasets and few incongruences between the inferred topologies. The results were similar to those of previous studies, but novel relationships were also identified. We found that supercontigs [coding sequence (CDS) + non-coding sequence] clearly outperformed CDS data in levels of support and gene tree congruence. The full datasets (353 genes) outperformed the datasets with potentially paralogous genes removed (186 genes) in levels of support but slightly increased gene tree incongruence. The pattern of gene tree conflict at short internal branches was often consistent with high levels of incomplete lineage sorting (ILS) due to rapid speciation in the group. While concatenation- and coalescence-based trees mainly agreed, the observed phylogenetic discordance between the two approaches may be best explained by their differences in accounting for ILS.
The use of target capture data greatly improved our confidence and understanding of the Rubioideae phylogeny, highlighted by the increased support for previously uncertain relationships and the increased possibility to explore sources of underlying phylogenetic discordance.
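Gene tree congruence of the kind compared here is often summarized per branch as the fraction of gene trees that contain that branch. A minimal sketch, assuming every gene tree covers all taxa and has already been reduced to its set of bipartitions (splits); dedicated tools such as IQ-TREE compute a more careful gene concordance factor that also handles missing taxa, and the helper names below are hypothetical.

```python
def canonical_split(side, taxa, ref):
    """Represent a bipartition by the side that excludes a fixed reference taxon."""
    side = frozenset(side)
    return frozenset(taxa) - side if ref in side else side


def gene_concordance(species_split, gene_tree_splits, taxa):
    """Fraction of gene trees whose split set contains species_split.

    gene_tree_splits: one list of splits (each given as one side) per gene tree.
    Assumes every gene tree contains all taxa.
    """
    ref = min(taxa)
    target = canonical_split(species_split, taxa, ref)
    hits = sum(
        1
        for splits in gene_tree_splits
        if target in {canonical_split(s, taxa, ref) for s in splits}
    )
    return hits / len(gene_tree_splits)


taxa = {"A", "B", "C", "D", "E"}
gene_trees = [
    [{"A", "B"}, {"D", "E"}],        # contains AB|CDE
    [{"C", "D", "E"}, {"D", "E"}],   # contains AB|CDE, written from the other side
    [{"A", "C"}, {"D", "E"}],        # conflicts: AC|BDE
]
print(gene_concordance({"A", "B"}, gene_trees, taxa))  # 2 of 3 gene trees agree
```

Canonicalizing each split to the side without a fixed reference taxon makes AB|CDE and CDE|AB compare equal, which is the only subtle step in the count.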
Affiliation(s)
- Olle Thureborn
- Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm, Sweden
- Niklas Wikström
- Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm, Sweden
- Bergius Foundation, Royal Swedish Academy of Sciences, Stockholm, Sweden
- Catarina Rydin
- Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm, Sweden
- Bergius Foundation, Royal Swedish Academy of Sciences, Stockholm, Sweden
6
Mahbub S, Sawmya S, Saha A, Reaz R, Rahman MS, Bayzid MS. Quartet Based Gene Tree Imputation Using Deep Learning Improves Phylogenomic Analyses Despite Missing Data. J Comput Biol 2022; 29:1156-1172. [PMID: 36048555 DOI: 10.1089/cmb.2022.0212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, for a combination of reasons (ranging from sampling biases to more biological causes, as in gene birth and loss), gene trees are often incomplete, meaning that not all species of interest have a common set of genes. Incomplete gene trees can potentially impact the accuracy of phylogenomic inference. We, for the first time, introduce the problem of imputing the quartet distribution induced by a set of incomplete gene trees, which involves adding the missing quartets back to the quartet distribution. We present Quartet based Gene tree Imputation using Deep Learning (QT-GILD), an automated and specially tailored unsupervised deep learning technique, accompanied by cues from natural language processing, which learns the quartet distribution in a given set of incomplete gene trees and generates a complete set of quartets accordingly. QT-GILD is a general-purpose technique needing no explicit modeling of the subject system or reasons for missing data or gene tree heterogeneity. Experimental studies on a collection of simulated and empirical datasets suggest that QT-GILD can effectively impute the quartet distribution, which results in a dramatic improvement in the species tree accuracy. Remarkably, QT-GILD not only imputes the missing quartets but can also account for gene tree estimation error. Therefore, QT-GILD advances the state-of-the-art in species tree estimation from gene trees in the face of missing data.
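The imputation problem starts from the observation that a gene tree lacking some taxa induces no topology at all for any quartet that includes a missing taxon. The sketch below only enumerates those missing quartets for one gene; it does not implement QT-GILD's deep-learning imputation, and the function name is hypothetical.

```python
from itertools import combinations
from math import comb


def missing_quartets(all_taxa, gene_taxa):
    """Quartets of all_taxa on which a gene tree over gene_taxa is silent."""
    present = set(gene_taxa)
    return [q for q in combinations(sorted(all_taxa), 4) if not present.issuperset(q)]


all_taxa = ["A", "B", "C", "D", "E", "F"]
gene_taxa = ["A", "B", "C", "D", "E"]  # taxon F is absent from this gene
missing = missing_quartets(all_taxa, gene_taxa)
# Every missing quartet contains F: C(6, 4) - C(5, 4) = 15 - 5 = 10 of them.
assert len(missing) == comb(6, 4) - comb(5, 4)
print(missing[:3])
```

Even one absent taxon leaves two thirds of this gene's quartets without information, which is the gap an imputation method has to fill.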
Affiliation(s)
- Sazan Mahbub
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Department of Computer Science, University of Maryland, College Park, Maryland, USA
- Shashata Sawmya
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Arpita Saha
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Rezwana Reaz
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- M Sohel Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
7
Xiong H, Wang D, Shao C, Yang X, Yang J, Ma T, Davis CC, Liu L, Xi Z. Species Tree Estimation and the Impact of Gene Loss Following Whole-Genome Duplication. Syst Biol 2022; 71:1348-1361. [PMID: 35689633 PMCID: PMC9558847 DOI: 10.1093/sysbio/syac040] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/08/2021] [Revised: 06/03/2022] [Accepted: 06/07/2022] [Indexed: 12/02/2022]
Abstract
Whole-genome duplication (WGD) occurs broadly and repeatedly across the history of eukaryotes and is recognized as a prominent evolutionary force, especially in plants. Immediately following WGD, most genes are present in two copies as paralogs. Because of this redundancy, one copy of a paralog pair commonly undergoes pseudogenization and is eventually lost. When speciation occurs shortly after WGD, however, differential loss of paralogs may lead to spurious phylogenetic inference resulting from the inclusion of pseudoorthologs, i.e., paralogous genes mistakenly identified as orthologs because they are present in single copies within each sampled species. The influence and impact of including pseudoorthologs versus true orthologs as a result of gene extinction (or incomplete laboratory sampling) have only recently gained empirical attention in the phylogenomics community. Moreover, few studies have yet investigated this phenomenon in an explicit coalescent framework. Here, using mathematical models, numerous simulated data sets, and two newly assembled empirical data sets, we assess the effect of pseudoorthologs on species tree estimation under varying degrees of incomplete lineage sorting (ILS) and differential gene loss scenarios following WGD. When gene loss occurs along the terminal branches of the species tree, alignment-based (BPP) and gene-tree-based (ASTRAL, MP-EST, and STAR) coalescent methods are adversely affected as the degree of ILS increases. Their accuracy can be greatly improved by sampling a sufficiently large number of genes. Under the same circumstances, however, concatenation methods consistently estimate incorrect species trees as the number of genes increases. Additionally, pseudoorthologs can greatly mislead species tree inference when gene loss occurs along the internal branches of the species tree. In this case, both coalescent and concatenation methods yield inconsistent results.
These results underscore the importance of understanding the influence of pseudoorthologs in the phylogenomics era. [Coalescent method; concatenation method; incomplete lineage sorting; pseudoorthologs; single-copy gene; whole-genome duplication.]
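The pseudoortholog mechanism can be seen in a toy case. Assume a WGD predates the split of species A, B and C, so the two paralog copies (labelled alpha and beta here) diverged before any speciation. If A and C retain alpha while B retains beta, the single-copy "orthologs" group A with C by paralog history, whatever the species tree is. A minimal sketch under those assumptions; the function is hypothetical and ignores ILS.

```python
def pseudoortholog_grouping(retained):
    """Group species by the paralog copy they retained after a shared WGD.

    retained: dict mapping species -> retained copy label.
    Assuming the WGD predates every speciation event, copies with the same
    label coalesce more recently than copies with different labels, so the
    gene tree groups species by label rather than by species relationships.
    """
    groups = {}
    for species, copy in retained.items():
        groups.setdefault(copy, []).append(species)
    return sorted(sorted(members) for members in groups.values())


# Species tree ((A,B),C); differential loss leaves A and C with alpha, B with beta.
print(pseudoortholog_grouping({"A": "alpha", "B": "beta", "C": "alpha"}))
# -> [['A', 'C'], ['B']], i.e. the gene tree ((A,C),B) conflicts with ((A,B),C)
```

Because every gene with this loss pattern yields the same wrong grouping, adding more such genes reinforces the error rather than averaging it out, which is why these cases are hard for both concatenation and coalescent methods.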
Affiliation(s)
- Haifeng Xiong
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Danying Wang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Chen Shao
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Xuchen Yang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Jialin Yang
- Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Tao Ma
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Charles C Davis
- Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA 02138, USA
- Liang Liu
- Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Zhenxiang Xi
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
8
Rahman MA, Tutul AA, Abdullah SM, Bayzid MS. CHAPAO: Likelihood and hierarchical reference-based representation of biomolecular sequences and applications to compressing multiple sequence alignments. PLoS One 2022; 17:e0265360. [PMID: 35436292 PMCID: PMC9015123 DOI: 10.1371/journal.pone.0265360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2021] [Accepted: 02/28/2022] [Indexed: 11/18/2022]
Abstract
Background
High-throughput experimental technologies are generating tremendous amounts of genomic data, offering valuable resources to answer important questions and extract biological insights. Storing this sheer amount of genomic data has become a major concern in bioinformatics. General purpose compression techniques (e.g. gzip, bzip2, 7-zip) are being widely used due to their pervasiveness and relatively good speed. However, they are not customized for genomic data and may fail to leverage special characteristics and redundancy of the biomolecular sequences.
Results
We present a new lossless compression method, CHAPAO (COmpressing Alignments using Hierarchical and Probabilistic Approach), which is designed specifically for multiple sequence alignments (MSAs) of biomolecular data and offers high compression gain. We introduce a novel hierarchical referencing technique to represent biomolecular sequences, which combines likelihood-based analyses of sequence similarities with graph-theoretic algorithms. We performed an extensive evaluation using a collection of real biological data from the avian phylogenomics project, the 1000 plants project (1KP), and 16S and 23S rRNA datasets. We report the performance of CHAPAO in comparison with general-purpose compression techniques as well as with MFCompress and Nucleotide Archival Format (NAF), two of the best-known methods designed specifically for FASTA files. Experimental results suggest that CHAPAO offers significant improvements in compression gain over most alternative methods. CHAPAO is freely available as open-source software at https://github.com/ashiq24/CHAPAO.
Conclusion
CHAPAO advances the state-of-the-art in compression algorithms and represents a potential alternative to the general purpose compression techniques as well as to the existing specialized compression techniques for biomolecular sequences.
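The idea of a reference-based representation can be sketched by storing each aligned sequence as the list of columns where it differs from an already-stored reference sequence. This is a simplified stand-in, not CHAPAO's algorithm: it picks each reference greedily by Hamming distance among earlier sequences, whereas the paper combines likelihood-based similarity with graph-theoretic algorithms, and it omits the entropy coding a real compressor would add. Function names are hypothetical.

```python
def hamming(a, b):
    """Number of alignment columns at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def encode_msa(seqs):
    """Encode aligned sequences as (reference index, payload) pairs.

    The first sequence is stored verbatim (reference None). Each later sequence
    stores only (column, character) differences from the most similar earlier
    sequence, chosen greedily by Hamming distance.
    """
    encoded = []
    for i, seq in enumerate(seqs):
        if i == 0:
            encoded.append((None, seq))
            continue
        ref = min(range(i), key=lambda j: hamming(seqs[j], seq))
        diffs = [(col, c) for col, (r, c) in enumerate(zip(seqs[ref], seq)) if r != c]
        encoded.append((ref, diffs))
    return encoded


def decode_msa(encoded):
    """Invert encode_msa; references always point to earlier, already-decoded rows."""
    seqs = []
    for ref, payload in encoded:
        if ref is None:
            seqs.append(payload)
        else:
            chars = list(seqs[ref])
            for col, c in payload:
                chars[col] = c
            seqs.append("".join(chars))
    return seqs


msa = ["ACGT-ACGTA", "ACGTTACGTA", "ACGA-ACGTA", "ACGATACGTA"]
encoded = encode_msa(msa)
assert decode_msa(encoded) == msa  # lossless round trip
print(encoded[1:])
```

Restricting references to earlier rows keeps decoding a single forward pass; choosing a better ordering of rows (the part CHAPAO addresses with graph algorithms) is what shrinks the difference lists further.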
Affiliation(s)
- Md Ashiqur Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Abdullah Aman Tutul
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Sifat Muhammad Abdullah
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Md. Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
9
Dasarathy G, Mossel E, Nowak R, Roch S. A stochastic Farris transform for genetic data under the multispecies coalescent with applications to data requirements. J Math Biol 2022; 84:36. [PMID: 35394192 PMCID: PMC9258723 DOI: 10.1007/s00285-022-01731-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/04/2020] [Revised: 02/15/2022] [Accepted: 02/17/2022] [Indexed: 10/18/2022]
Abstract
Species tree estimation faces many significant hurdles. Chief among them is that the trees describing the ancestral lineages of each individual gene (the gene trees) often differ from the species tree. The multispecies coalescent is commonly used to model this gene tree discordance, at least when it is believed to arise from incomplete lineage sorting, a population-genetic effect. Another significant challenge in this area is that the molecular sequences associated with each gene typically provide limited information about the gene trees themselves. While the modeling of sequence evolution by single-site substitutions is well studied, few species tree reconstruction methods with theoretical guarantees actually address this latter issue. Instead, a standard but unsatisfactory assumption is that gene trees are perfectly reconstructed before being fed into a so-called summary method. Hence much remains to be done in the development of inference methodologies that rigorously account for gene tree estimation error, or that avoid gene tree estimation altogether. In previous work, a data requirement trade-off was derived between the number of loci $m$ needed for an accurate reconstruction and the length of the locus sequences $k$. It was shown that to reconstruct an internal branch of length $f$, one needs $m$ to be of the order of $1/[f^{2}\sqrt{k}]$. That previous result was obtained under the restrictive assumption that mutation rates as well as population sizes are constant across the species phylogeny. Here we generalize this result beyond this assumption. Our main contribution is a novel reduction to the molecular clock case under the multispecies coalescent, which we refer to as a stochastic Farris transform.
As a corollary, we also obtain a new identifiability result of independent interest: for any species tree with $n \geq 3$ species, the rooted topology of the species tree can be identified from the distribution of its unrooted weighted gene trees even in the absence of a molecular clock.
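To make the data-requirement trade-off concrete: the earlier result summarized in this abstract says the number of loci needed grows like $1/[f^{2}\sqrt{k}]$ for an internal branch of length $f$ and loci of length $k$. The sketch below evaluates only that scaling with an arbitrary constant `c` (an assumption; the true constants are not given here), so only ratios between scenarios are meaningful. The function name is hypothetical.

```python
import math


def relative_loci_needed(f, k, c=1.0):
    """Loci needed up to an unknown constant c: m ~ c / (f**2 * sqrt(k)).

    f: internal branch length; k: sequence length per locus.
    Only ratios between calls are meaningful.
    """
    return c / (f ** 2 * math.sqrt(k))


base = relative_loci_needed(f=0.1, k=400)
# Halving the branch length quadruples the number of loci required.
print(relative_loci_needed(f=0.05, k=400) / base)
# Quadrupling the locus length only halves it.
print(relative_loci_needed(f=0.1, k=1600) / base)
```

The asymmetry is the point of the trade-off: short branches are expensive in loci, and longer loci buy back only a square-root factor.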
Affiliation(s)
- Gautam Dasarathy
- School of Electrical, Computer, and Energy Engineering, Arizona State University, Tempe, USA
- Elchanan Mossel
- Department of Mathematics and IDSS, Massachusetts Institute of Technology, Cambridge, USA
- Robert Nowak
- Department of Electrical and Computer Engineering, University of Wisconsin, Madison, USA
- Sebastien Roch
- Department of Mathematics, University of Wisconsin, Madison, USA
10
Abstract
Motivation
Phylogenomics faces a dilemma: on the one hand, the most accurate species and gene tree estimation methods are those that co-estimate them; on the other hand, these co-estimation methods do not scale to moderately large numbers of species. Summary-based methods, which first infer gene trees independently and then combine them, are much more scalable but are prone to gene tree estimation error, which is inevitable when inferring trees from limited-length data. Gene tree estimation error is not just random noise and can create biases such as long-branch attraction.
Results
We introduce a scalable likelihood-based approach to co-estimation under the multispecies coalescent model. The method, called quartet co-estimation (QuCo), takes as input independently inferred distributions over gene trees and computes the most likely species tree topology and internal branch length for each quartet, marginalizing over gene tree topologies and ignoring branch lengths by making several simplifying assumptions. It then updates the gene tree posterior probabilities based on the species tree. The focus on gene tree topologies and the heuristic division into quartets enable fast likelihood calculations. We benchmark our method with extensive simulations for quartet trees in zones known to produce biased species trees, and further with larger trees. We also run QuCo on a biological dataset of bees. Our results show better accuracy than the summary-based approach ASTRAL run on estimated gene trees.
Availability and implementation
QuCo is available at https://github.com/maryamrabiee/quco.
Supplementary information
Supplementary data are available at Bioinformatics online.
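The per-quartet computation described here can be sketched with the standard multispecies-coalescent probabilities for an unrooted quartet: with internal branch length t in coalescent units, a gene tree matches the species quartet with probability 1 - (2/3)e^(-t) and each of the two alternatives has probability (1/3)e^(-t). This is a simplified illustration, not QuCo's implementation: it assumes each gene's posterior over the three quartet topologies was computed under a uniform prior (so it is proportional to that gene's likelihood), searches t on a fixed grid, and then reweights each gene's posterior by the fitted species-tree probabilities. All names are hypothetical.

```python
import math


def msc_quartet_probs(species_topo, t):
    """P(gene topology j | species topology, internal branch t), j = 0, 1, 2."""
    match = 1.0 - (2.0 / 3.0) * math.exp(-t)
    other = (1.0 / 3.0) * math.exp(-t)
    return [match if j == species_topo else other for j in range(3)]


def log_likelihood(gene_posteriors, species_topo, t):
    """Sum over genes of log sum_j posterior_j * P(j | species, t)."""
    probs = msc_quartet_probs(species_topo, t)
    return sum(math.log(sum(p * q for p, q in zip(post, probs)))
               for post in gene_posteriors)


def coestimate_quartet(gene_posteriors, grid=None):
    """Fit the species quartet and t, then update each gene's posterior."""
    grid = grid or [0.01 * i for i in range(1, 501)]
    _, s_hat, t_hat = max(
        (log_likelihood(gene_posteriors, s, t), s, t) for s in range(3) for t in grid
    )
    probs = msc_quartet_probs(s_hat, t_hat)
    updated = []
    for post in gene_posteriors:
        weights = [p * q for p, q in zip(post, probs)]
        total = sum(weights)
        updated.append([w / total for w in weights])
    return s_hat, t_hat, updated


# Five genes lean towards topology 0, two lean towards topology 1.
gene_posteriors = [[0.7, 0.2, 0.1]] * 5 + [[0.2, 0.6, 0.2]] * 2
s_hat, t_hat, updated = coestimate_quartet(gene_posteriors)
print(s_hat, round(t_hat, 2), [round(x, 3) for x in updated[-1]])
```

The update step pulls the two discordant genes towards the fitted species topology in proportion to how strongly the fitted branch length predicts concordance, which is how species-level information can correct noisy gene-level estimates.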
Affiliation(s)
- Maryam Rabiee
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
11
Amaral DT, Romeiro-Brito M, Bonatelli IAS. Exploring Phylogenetic Relationships and Divergence Times of Bioluminescent Species Using Genomic and Transcriptomic Data. Methods Mol Biol 2022; 2525:409-423. [PMID: 35836087 DOI: 10.1007/978-1-0716-2473-9_32] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Next-generation sequencing (NGS) has come to dominate genomics and evolutionary biology, and a great amount of genomic data has accumulated for a diverse set of species. At the same time, phylogenetic approaches and programs are being developed to make better use of such large datasets. Phylogenomics is a promising field for accommodating and exploring all the information in NGS data with phylogenetic methods, and it is an important approach for investigating the evolution of bioluminescence in different organisms. To guarantee accurate results in phylogenomic studies, orthologous genes must be correctly identified before phylogenetic reconstruction. Here, we present a simplified step-by-step framework for phylogenetic analysis with divergence time estimation, beginning with an ortholog search. As empirical data, we use transcriptome sequences of six species of the superfamily Elateroidea (Coleoptera). We introduce several bioinformatics tools for handling genomic data, especially OrthoFinder, IQ-TREE, BEAST2, and treePL.
Affiliation(s)
- Danilo T Amaral
- Departamento de Biologia, Centro de Ciências Humanas e Biológicas, Universidade Federal de São Carlos (UFSCar), Sorocaba, Brazil.
- Programa de Pós Graduação em Biologia Comparada, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo (USP), Ribeirão Preto, Brazil.
- Monique Romeiro-Brito
- Departamento de Biologia, Centro de Ciências Humanas e Biológicas, Universidade Federal de São Carlos (UFSCar), Sorocaba, Brazil
- Isabel A S Bonatelli
- Departamento de Ecologia e Biologia Evolutiva, Universidade Federal de São Paulo (UNIFESP), Diadema, São Paulo, Brazil
12
How challenging RADseq data turned out to favor coalescent-based species tree inference. A case study in Aichryson (Crassulaceae). Mol Phylogenet Evol 2021; 167:107342. [PMID: 34785384 DOI: 10.1016/j.ympev.2021.107342] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/03/2020] [Revised: 07/05/2021] [Accepted: 10/29/2021] [Indexed: 12/24/2022]
Abstract
Analysing multiple genomic regions while incorporating detection and qualification of discordance among regions has become standard for understanding phylogenetic relationships. In plants, which usually have comparatively large genomes, this is feasible by combining reduced-representation library (RRL) methods with high-throughput sequencing, enabling the cost-effective acquisition of genomic data for thousands of loci from hundreds of samples. One popular RRL method is RADseq. A major disadvantage of established RADseq approaches is their rather short fragment and sequencing range, which yields loci with little individual phylogenetic information. This issue hampers the application of coalescent-based species tree inference. The modified RADseq protocol presented here targets ca. 5,000 loci of 300-600 nt length, sequenced with the latest short-read-sequencing (SRS) technology. This protocol has the potential to overcome this drawback. To illustrate the advantages of this approach we use the study group Aichryson Webb & Berthelot (Crassulaceae), a plant genus that diversified on the Canary Islands. The data analysis approach used here aims at careful quality control of the long-loci dataset. It involves:
- an informed selection of thresholds for accurate clustering;
- a thorough exploration of locus properties, such as locus length, coverage and variability, to identify potentially biased data;
- a comparative phylogenetic inference of filtered datasets, accompanied by an evaluation of the resulting BS support and gene and site concordance factor values, to improve the overall resolution of the resulting phylogenetic trees.

The final dataset contains variable loci with an average length of 373 nt and facilitates species tree estimation using a coalescent-based summary approach. Additional improvements brought by the approach are critically discussed.
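The locus-property exploration described above can be sketched in Python. This is not the study's pipeline. Locus length and the proportion of variable sites are computed from an aligned locus, and "coverage" is approximated by the number of samples present, since read depth would require the raw reads. The default thresholds in filter_loci are hypothetical placeholders rather than values chosen in the paper.

```python
GAP_OR_MISSING = set("-N?")  # characters treated as missing data (assumption)


def locus_stats(alignment):
    """Length, number of samples and proportion of variable sites of one locus.
    `alignment` maps sample name -> aligned sequence (all of equal length)."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    variable = 0
    for i in range(length):
        # Distinct observed states in this column, ignoring gaps/missing data.
        states = {s[i] for s in seqs if s[i] not in GAP_OR_MISSING}
        if len(states) > 1:
            variable += 1
    return {
        "length": length,
        "n_samples": len(seqs),
        "prop_variable": variable / length if length else 0.0,
    }


def filter_loci(loci, min_length=300, min_samples=4, max_prop_variable=0.2):
    """Keep loci passing all three thresholds (placeholder defaults)."""
    kept = {}
    for name, alignment in loci.items():
        st = locus_stats(alignment)
        if (st["length"] >= min_length
                and st["n_samples"] >= min_samples
                and st["prop_variable"] <= max_prop_variable):
            kept[name] = alignment
    return kept
```

Setting the thresholds from the distributions of these statistics across all loci, instead of fixing them in advance, is closer to the "informed selection" the abstract describes.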
13
Forthman M, Braun EL, Kimball RT. Gene tree quality affects empirical coalescent branch length estimation. ZOOL SCR 2021. [DOI: 10.1111/zsc.12512] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/28/2022]
Affiliation(s)
- Michael Forthman
- Department of Entomology & Nematology University of Florida Gainesville FL USA
- California State Collection of Arthropods Plant Pest Diagnostics Branch California Department of Food & Agriculture Sacramento CA USA
- Edward L. Braun
- Department of Biology University of Florida Gainesville FL USA
14
Rabier CE, Berry V, Stoltz M, Santos JD, Wang W, Glaszmann JC, Pardi F, Scornavacca C. On the inference of complex phylogenetic networks by Markov Chain Monte-Carlo. PLoS Comput Biol 2021; 17:e1008380. [PMID: 34478440 PMCID: PMC8445492 DOI: 10.1371/journal.pcbi.1008380] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/04/2020] [Revised: 09/16/2021] [Accepted: 07/13/2021] [Indexed: 11/19/2022]
Abstract
For various species, high-quality sequences and complete genomes are nowadays available for many individuals. This makes data analysis challenging, as methods need to be not only accurate but also time-efficient given the tremendous amount of data to process. In this article, we introduce an efficient method to infer the evolutionary history of individuals under the multispecies coalescent model in networks (MSNC). Phylogenetic networks are an extension of phylogenetic trees that can contain reticulate nodes. These nodes make it possible to model complex biological events such as horizontal gene transfer, hybridization and introgression. We present a novel way to compute the likelihood of biallelic markers sampled along genomes whose evolution involved such events. This likelihood computation is at the heart of a Bayesian network inference method called SnappNet. SnappNet extends to networks the Snapp method for inferring evolutionary trees under the multispecies coalescent model. SnappNet is available as a package of the well-known BEAST 2 software. Recently, the MCMC_BiMarkers method, implemented in PhyloNet, also extended Snapp to networks. Both methods take biallelic markers as input, rely on the same model of evolution and sample networks in a Bayesian framework, though they use different methods for computing priors. However, SnappNet relies on algorithms that are exponentially more time-efficient on non-trivial networks. Using simulations, we compare the performance of SnappNet and MCMC_BiMarkers. We show that both methods have similar abilities to recover simple networks, but SnappNet is more accurate than MCMC_BiMarkers on more complex network scenarios. Also, on complex networks, SnappNet is found to be much faster than MCMC_BiMarkers in terms of the time required for the likelihood computation. We finally illustrate SnappNet's performance on a rice data set. SnappNet infers a scenario that is consistent with previous results and provides additional understanding of rice evolution.
Affiliation(s)
- Charles-Elie Rabier
- Institut des Sciences de l’Evolution (ISEM), Université de Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, Montpellier, France
- Institut Montpelliérain Alexander Grothendieck (IMAG), Université de Montpellier, CNRS, Montpellier, France
- Vincent Berry
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, Montpellier, France
- Marnus Stoltz
- Institut des Sciences de l’Evolution (ISEM), Université de Montpellier, CNRS, EPHE, IRD, Montpellier, France
- João D. Santos
- CIRAD, UMR AGAP, Montpellier, France
- Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP), Université de Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Wensheng Wang
- Institute of Crop Sciences (ICS), Chinese Academy of Agricultural Sciences, Beijing, China
- Jean-Christophe Glaszmann
- CIRAD, UMR AGAP, Montpellier, France
- Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP), Université de Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Fabio Pardi
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, Montpellier, France
- Celine Scornavacca
- Institut des Sciences de l’Evolution (ISEM), Université de Montpellier, CNRS, EPHE, IRD, Montpellier, France
15
Gene flow in phylogenomics: Sequence capture resolves species limits and biogeography of Afromontane forest endemic frogs from the Cameroon Highlands. Mol Phylogenet Evol 2021; 163:107258. [PMID: 34252546 DOI: 10.1016/j.ympev.2021.107258] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/08/2020] [Revised: 06/28/2021] [Accepted: 07/07/2021] [Indexed: 11/21/2022]
Abstract
Puddle frogs of the Phrynobatrachus steindachneri species complex are a useful group for investigating speciation and phylogeography in Afromontane forests of the Cameroon Volcanic Line, western Central Africa. The species complex is represented by six morphologically relatively cryptic mitochondrial DNA lineages, only two of which are distinguished at the species level - southern P. jimzimkusi and Lake Oku endemic P. njiomock, leaving the remaining four lineages identified as 'P. steindachneri'. In this study, the six mtDNA lineages are subjected to genomic sequence capture analyses and morphological examination to delimit species and to study biogeography. The nuclear DNA data (387 loci; 571,936 aligned base pairs) distinguished all six mtDNA lineages, but the topological pattern and divergence depths supported only four main clades: P. jimzimkusi, P. njiomock, and only two divergent evolutionary lineages within the four 'P. steindachneri' mtDNA lineages. One of the two lineages is herein described as a new species, P. amieti sp. nov. Reticulate evolution (hybridization) was detected within the species complex with morphologically intermediate hybrid individuals placed between the parental species in phylogenomic analyses, forming a ladder-like phylogenetic pattern. The presence of hybrids is undesirable in standard phylogenetic analyses but is essential and beneficial in the network multispecies coalescent. This latter approach provided insight into the reticulate evolutionary history of these endemic frogs. Introgressions likely occurred during the Middle and Late Pleistocene climatic oscillations, due to the cyclic connections (likely dominating during cold glacials) and separations (during warm interglacials) of montane forests. The genomic phylogeographic pattern supports the separation of the southern (Mt. Manengouba to Mt. Oku) and northern mountains at the onset of the Pleistocene. 
Further subdivisions occurred in the Early Pleistocene, separating populations from the northernmost (Tchabal Mbabo, Gotel Mts.) and middle mountains (Mt. Mbam, Mt. Oku, Mambilla Plateau), as well as the microendemic lineage restricted to Lake Oku (Mt. Oku). This unique model system is highly threatened, as all the species within the complex have exhibited severe population declines in the past decade, placing them on the brink of extinction. In addition, Mount Oku is identified as being of particular conservation importance because it harbors three species of this complex. We therefore urge conservation action in the Cameroon Highlands to preserve their diversity before it is too late.
16
Mahbub M, Wahab Z, Reaz R, Rahman MS, Bayzid MS. wQFM: Highly Accurate Genome-scale Species Tree Estimation from Weighted Quartets. Bioinformatics 2021; 37:3734-3743. [PMID: 34086858 DOI: 10.1093/bioinformatics/btab428] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/30/2020] [Revised: 05/24/2021] [Accepted: 06/03/2021] [Indexed: 02/01/2023]
Abstract
MOTIVATION Species tree estimation from genes sampled throughout the whole genome is complicated by gene tree-species tree discordance. Incomplete lineage sorting (ILS), in which alleles can coexist in populations for periods that may span several speciation events, is one of the most frequent causes of this discordance. Quartet-based summary methods for estimating species trees from a collection of gene trees are becoming popular due to their high accuracy and statistical guarantees under ILS. One approach first generates quartets with appropriate weights, where weights correspond to the relative importance of quartets. It then amalgamates the weighted quartets into a single coherent species tree. This can provide a statistically consistent way of estimating species trees. However, handling weighted quartets is challenging. RESULTS We propose wQFM, a highly accurate method for species tree estimation from multi-locus data, which extends the quartet FM (QFM) algorithm to a weighted setting. wQFM was assessed on a collection of simulated and real biological datasets, including the avian phylogenomic dataset, one of the largest phylogenomic datasets to date. We compared wQFM with wQMC, the best alternative method for weighted quartet amalgamation, and with ASTRAL, one of the most accurate and widely used coalescent-based species tree estimation methods. Our results suggest that wQFM matches or improves upon the accuracy of wQMC and ASTRAL. AVAILABILITY wQFM is available in open source form at https://github.com/Mahim1997/wQFM-2020. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
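wQFM's exact quartet-weighting scheme is given in the paper. A common and simple choice, assumed in this Python sketch, is to weight each quartet topology by the number of gene trees that display it. Each unrooted gene tree is encoded as a list of its internal splits (the taxa on one side of each internal edge). This is a hypothetical representation that avoids writing a Newick parser, and every gene tree is assumed to contain all taxa. Only the weighting step is shown, not the QFM amalgamation.

```python
from collections import Counter
from itertools import combinations


def induced_quartet(splits, quartet):
    """Return the quartet topology {{x, y}, {z, w}} displayed by a tree given
    as a collection of splits, or None if the quartet is unresolved there."""
    a, b, c, d = quartet
    for left, right in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        for side in splits:
            # An edge separates the pairs if one pair is fully on this side
            # and the other pair is fully on the opposite side.
            left_here = all(x in side for x in left) and not any(x in side for x in right)
            right_here = all(x in side for x in right) and not any(x in side for x in left)
            if left_here or right_here:
                return frozenset((frozenset(left), frozenset(right)))
    return None


def quartet_weights(gene_trees, taxa):
    """Weight each quartet topology by how many gene trees display it."""
    weights = Counter()
    for splits in gene_trees:
        for quartet in combinations(sorted(taxa), 4):
            topo = induced_quartet(splits, quartet)
            if topo is not None:
                weights[topo] += 1
    return weights
```

Enumerating all quartets is O(n^4) per gene tree, which is fine for illustration but not for large taxon sets.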
Affiliation(s)
- Mahim Mahbub
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Zahin Wahab
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Rezwana Reaz
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- M Saifur Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
17
Nydam ML, Lemmon AR, Cherry JR, Kortyna ML, Clancy DL, Hernandez C, Cohen CS. Phylogenomic and morphological relationships among the botryllid ascidians (Subphylum Tunicata, Class Ascidiacea, Family Styelidae). Sci Rep 2021; 11:8351. [PMID: 33863944 PMCID: PMC8052435 DOI: 10.1038/s41598-021-87255-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/09/2020] [Accepted: 02/16/2021] [Indexed: 02/02/2023]
Abstract
Ascidians (Phylum Chordata, Class Ascidiacea) are a large group of invertebrates that occupy a central role in the ecology of marine benthic communities. Many ascidian species have been successfully introduced around the world via anthropogenic vectors. The botryllid ascidians (Order Stolidobranchia, Family Styelidae) are a group of 53 colonial species, several of which are widespread throughout temperate or tropical and subtropical waters. However, the systematics and biology of this group of ascidians are not well understood. To provide a systematic framework for this group, we have constructed a well-resolved phylogenomic tree using 200 novel loci and 55 specimens. A Principal Components Analysis of all species described in the literature used 31 taxonomic characteristics. It revealed that some species occupy a unique morphological space and can be easily identified using characteristics of adult colonies. For other species, additional information such as larval or life history characteristics may be required for taxonomic discrimination. Molecular barcodes are critical for guiding the delineation of morphologically similar species in this group.
Affiliation(s)
- Marie L Nydam
- Math and Science Program, Soka University of America, 1 University Drive, Aliso Viejo, CA, 92656, USA.
- Alan R Lemmon
- Department of Scientific Computing, Florida State University, 400 Dirac Science Library, Tallahassee, FL, 32306, USA
- Jesse R Cherry
- Department of Biological Science, Florida State University, 319 Stadium Drive, Tallahassee, FL, 32306, USA
- Michelle L Kortyna
- Department of Biological Science, Florida State University, 319 Stadium Drive, Tallahassee, FL, 32306, USA
- Darragh L Clancy
- Biology Department and Estuarine and Ocean Science Center, San Francisco State University, 3150 Paradise Drive, Tiburon, CA, 94920, USA
- Cecilia Hernandez
- Biology Department and Estuarine and Ocean Science Center, San Francisco State University, 3150 Paradise Drive, Tiburon, CA, 94920, USA
- C Sarah Cohen
- Biology Department and Estuarine and Ocean Science Center, San Francisco State University, 3150 Paradise Drive, Tiburon, CA, 94920, USA
18
Farah IT, Islam MM, Zinat KT, Rahman AH, Bayzid MS. Species tree estimation from gene trees by minimizing deep coalescence and maximizing quartet consistency: a comparative study and the presence of pseudo species tree terraces. Syst Biol 2021; 70:1213-1231. [PMID: 33844023 DOI: 10.1093/sysbio/syab026] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/31/2020] [Revised: 03/25/2021] [Accepted: 03/29/2021] [Indexed: 11/14/2022]
Abstract
Species tree estimation from multi-locus datasets is extremely challenging, especially in the presence of gene tree heterogeneity across the genome due to incomplete lineage sorting (ILS). Summary methods have been developed that estimate gene trees and then combine them to estimate a species tree by optimizing various optimality scores. In this study, we have extended and adapted the concept of phylogenetic terraces to species tree estimation by "summarizing" a set of gene trees. In this setting, multiple species trees with distinct topologies may have exactly the same optimality score (e.g., quartet score or extra lineage score). We particularly investigated the presence and impacts of equally optimal trees in species tree estimation from multi-locus data using summary methods by taking ILS into account. We analyzed two of the most popular ILS-aware optimization criteria: maximize quartet consistency (MQC) and minimize deep coalescence (MDC). Methods based on MQC are provably statistically consistent, whereas MDC is not a consistent criterion for species tree estimation. We present a comprehensive comparative study of these two optimality criteria. We ran experiments on a collection of datasets simulated under ILS. They indicate that MDC may reach quartet consistency scores competitive with or identical to those of MQC but could be significantly worse than MQC in terms of tree accuracy. This demonstrates the presence and impacts of equally optimal species trees. This is the first known study that provides the conditions for datasets to have equally optimal trees in the context of phylogenomic inference using summary methods.
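The MQC criterion can be made concrete with a short Python sketch. It scores a candidate species tree by the number of (gene tree, quartet) pairs on which the two trees display the same quartet topology. Each unrooted tree is encoded, hypothetically, as a list of its internal splits (the set of taxa on one side of each internal edge), and all trees are assumed to share one taxon set. The MDC score is not implemented. The tie in the example is only a toy illustration of two distinct species trees sharing the same score, not the dataset conditions the paper derives.

```python
from itertools import combinations


def induced_quartet(splits, quartet):
    """Quartet topology displayed by a split-encoded tree, or None if unresolved."""
    a, b, c, d = quartet
    for left, right in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        for side in splits:
            left_here = all(x in side for x in left) and not any(x in side for x in right)
            right_here = all(x in side for x in right) and not any(x in side for x in left)
            if left_here or right_here:
                return frozenset((frozenset(left), frozenset(right)))
    return None


def quartet_score(species_splits, gene_trees, taxa):
    """MQC objective: count (gene tree, quartet) pairs in which the gene tree
    displays the same quartet topology as the candidate species tree."""
    score = 0
    for quartet in combinations(sorted(taxa), 4):
        target = induced_quartet(species_splits, quartet)
        if target is None:
            continue
        score += sum(1 for g in gene_trees if induced_quartet(g, quartet) == target)
    return score


if __name__ == "__main__":
    taxa = "ABCDE"
    t_ab = [frozenset("AB"), frozenset("DE")]  # ((A,B),C,(D,E))
    t_ac = [frozenset("AC"), frozenset("DE")]  # ((A,C),B,(D,E))
    print(quartet_score(t_ab, [t_ab, t_ab, t_ac], taxa))  # 13
    print(quartet_score(t_ac, [t_ab, t_ab, t_ac], taxa))  # 11
    # With one gene tree of each kind, the two candidate species trees tie.
    print(quartet_score(t_ab, [t_ab, t_ac], taxa), quartet_score(t_ac, [t_ab, t_ac], taxa))  # 8 8
```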
Affiliation(s)
- Ishrat Tanzila Farah
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology Dhaka-1205, Bangladesh
- Md Muktadirul Islam
- Applied Statistics and Data Science (ASDS), Department of Statistics Jahangirnagar University Dhaka-1342, Bangladesh
- Kazi Tasnim Zinat
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology Dhaka-1205, Bangladesh
- Department of Computer Science University of Maryland, College Park, Maryland, USA
- Atif Hasan Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology Dhaka-1205, Bangladesh
- Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology Dhaka-1205, Bangladesh
19
20
Sarver BAJ, Herrera ND, Sneddon D, Hunter SS, Settles ML, Kronenberg Z, Demboski JR, Good JM, Sullivan J. Diversification, Introgression, and Rampant Cytonuclear Discordance in Rocky Mountains Chipmunks (Sciuridae: Tamias). Syst Biol 2021; 70:908-921. [PMID: 33410870 DOI: 10.1093/sysbio/syaa085] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 06/14/2020] [Revised: 11/02/2020] [Accepted: 11/03/2020] [Indexed: 12/18/2022]
Abstract
Evidence from natural systems suggests that hybridization between animal species is more common than traditionally thought, but the overall contribution of introgression to standing genetic variation within species remains unclear for most animal systems. Here, we use targeted exon-capture to sequence thousands of nuclear loci and complete mitochondrial genomes from closely related chipmunk species in the Tamias quadrivittatus group. These species are distributed across the Great Basin and the central and southern Rocky Mountains of North America. This recent radiation includes six overlapping, ecologically distinct species (T. canipes, T. cinereicollis, T. dorsalis, T. quadrivittatus, T. rufus, and T. umbrinus) that show evidence for widespread introgression across species boundaries. Such evidence has historically been derived from a handful of markers, typically mitochondrial loci; consequently, the extent of introgression of nuclear genes is less well characterized. We conducted a series of phylogenomic and species-tree analyses to resolve the phylogeny of six species in this group. In addition, we performed several population genomic analyses to characterize nuclear genomes and infer coancestry among individuals. Furthermore, we used emerging quartet-based approaches to simultaneously infer the species tree (SVDquartets) and identify introgression (HyDe). We found rampant introgression of mitochondrial genomes between some species pairs, sometimes involving up to three species. In spite of this, there appears to be little to no evidence for nuclear introgression. These findings mirror other genomic results where complete mitochondrial capture has occurred between chipmunk species in the absence of appreciable nuclear gene flow. The underlying causes of recurrent massive cytonuclear discordance remain unresolved in this group, but mitochondrial DNA appears highly misleading about population histories as a whole.
Collectively, it appears that chipmunk species boundaries are largely impermeable to nuclear gene flow and that hybridization, while pervasive with respect to mtDNA, has likely played a relatively minor role in the evolutionary history of this group.
Affiliation(s)
- Brice A J Sarver
- Department of Biological Sciences, University of Idaho, Moscow, Idaho
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, Idaho
- David Sneddon
- Department of Biological Sciences, University of Idaho, Moscow, Idaho
- Samuel S Hunter
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, Idaho
- UC-Davis Genome Center, Davis, California
- John R Demboski
- Department of Zoology, Denver Museum of Nature & Sciences, Denver, Colorado
- Jeffrey M Good
- Division of Biological Sciences, University of Montana, Missoula, Montana
- Wildlife Biology Program, University of Montana, Missoula, Montana
- Jack Sullivan
- Department of Biological Sciences, University of Idaho, Moscow, Idaho
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, Idaho
21
Bossert S, Murray EA, Pauly A, Chernyshov K, Brady SG, Danforth BN. Gene Tree Estimation Error with Ultraconserved Elements: An Empirical Study on Pseudapis Bees. Syst Biol 2020; 70:803-821. [PMID: 33367855 DOI: 10.1093/sysbio/syaa097] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/05/2019] [Revised: 11/18/2020] [Accepted: 12/02/2020] [Indexed: 11/12/2022]
Abstract
Summarizing individual gene trees to species phylogenies using two-step coalescent methods is now a standard strategy in the field of phylogenomics. However, practical implementations of summary methods suffer from gene tree estimation error, which is caused by various biological and analytical factors. The choice of gene tree inference method and its downstream effects on species tree estimation for empirical data sets remain greatly understudied. To better understand the impact of this method choice on gene and species tree accuracy, we compare gene trees estimated through four widely used programs under different model-selection criteria: PhyloBayes, MrBayes, IQ-Tree, and RAxML. We study their performance in the phylogenomic framework of >800 ultraconserved elements from the bee subfamily Nomiinae (Halictidae). Our taxon sampling focuses on the genus Pseudapis, a distinct lineage with diverse morphological features, but contentious morphology-based taxonomic classifications and no molecular phylogenetic guidance. We approximate topological accuracy of gene trees by assessing their ability to recover two uncontroversial, monophyletic groups. We compare branch lengths of individual trees using the stemminess metric (the relative length of internal branches). We further examine different strategies of removing uninformative loci and collapsing weakly supported nodes into polytomies. We then summarize gene trees with ASTRAL and compare resulting species phylogenies, including comparisons to concatenation-based estimates. Gene trees obtained with the reversible jump model search in MrBayes were most concordant on average, and all Bayesian methods yielded gene trees with better stemminess values. However, only one gene tree estimation approach yielded ASTRAL summary trees that consistently produced the most likely correct topology: IQ-Tree with automated model designation (ModelFinder program).
We discuss these findings and provide practical advice on gene tree estimation for summary methods. Lastly, we establish the first phylogeny-informed classification for Pseudapis s. l. and map the distribution of distinct morphological features of the group. [ASTRAL; Bees; concordance; gene tree estimation error; IQ-Tree; MrBayes; Nomiinae; PhyloBayes; RAxML; phylogenomics; stemminess].
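The abstract defines stemminess only as "the relative length of internal branches". The Python sketch below assumes the simplest reading: total internal branch length divided by total tree length. Other published variants compute the ratio per node, and the paper's exact formula may differ. The second function illustrates collapsing weakly supported nodes. Here a tree is stored as a hypothetical mapping from internal split to its support value, and removing a split is equivalent to collapsing that edge into a polytomy. The support threshold is a placeholder.

```python
def stemminess(branches):
    """Sum of internal branch lengths divided by total tree length.
    `branches` is a list of (length, is_internal) pairs for one tree."""
    total = sum(length for length, _ in branches)
    internal = sum(length for length, is_internal in branches if is_internal)
    return internal / total if total > 0 else 0.0


def collapse_weak_splits(split_support, min_support):
    """Drop internal splits whose support is below `min_support`; each dropped
    split corresponds to an edge collapsed into a polytomy."""
    return {split: support for split, support in split_support.items()
            if support >= min_support}
```

Higher values mean more of the tree's length sits on internal branches, which is the part of a gene tree that carries topological signal for summary methods.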
Affiliation(s)
- Silas Bossert
- Department of Entomology, Cornell University, Comstock Hall, Ithaca, NY 14853, USA
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- Department of Entomology, Washington State University, Pullman, Washington 99164, USA
- Elizabeth A Murray
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- Department of Entomology, Washington State University, Pullman, Washington 99164, USA
- Alain Pauly
- O.D. Taxonomy and Phylogeny, Royal Belgian Institute of Natural Sciences, Rue Vautier 29, 1000 Brussels, Belgium
- Kyrylo Chernyshov
- College of Arts and Sciences, Cornell University, Ithaca, NY 14853, USA
- Seán G Brady
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- Bryan N Danforth
- Department of Entomology, Cornell University, Comstock Hall, Ithaca, NY 14853, USA
22
Meerow AW, Gardner EM, Nakamura K. Phylogenomics of the Andean Tetraploid Clade of the American Amaryllidaceae (Subfamily Amaryllidoideae): Unlocking a Polyploid Generic Radiation Abetted by Continental Geodynamics. FRONTIERS IN PLANT SCIENCE 2020; 11:582422. [PMID: 33250911 PMCID: PMC7674842 DOI: 10.3389/fpls.2020.582422] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/11/2020] [Accepted: 10/12/2020] [Indexed: 05/27/2023]
Abstract
One of the two major clades of the endemic American Amaryllidaceae subfam. Amaryllidoideae constitutes the tetraploid-derived (n = 23) Andean-centered tribes, most of which have 46 chromosomes. Despite progress in resolving phylogenetic relationships of the group with plastid and nrDNA, certain subclades were poorly resolved or weakly supported in those previous studies. Sequence capture using anchored hybrid enrichment was employed across 95 species of the clade along with five outgroups and generated sequences of 524 nuclear genes and a partial plastome. Maximum likelihood phylogenetic analyses were conducted on concatenated supermatrices, and coalescent-based species tree analyses were run on the gene trees. These were followed by hybridization network, age diversification and biogeographic analyses. The four tribes Clinantheae, Eucharideae, Eustephieae, and Hymenocallideae (sister to Clinantheae) are resolved in all analyses with >90% and mostly 100% support, as are almost all genera within them. Nuclear gene supermatrix and species tree results were largely concordant; however, some instances of cytonuclear discordance were evident. Hybridization network analysis identified significant reticulation in Clinanthus, Hymenocallis, Stenomesson and the subclade of Eucharideae comprising Eucharis, Caliphruria, and Urceolina. Our data support a previous treatment of the latter as a single genus, Urceolina, with the addition of Eucrosia dodsonii. Biogeographic analysis and penalized likelihood age estimation suggest an origin for the complex in the Cauca, Desert and Puna Neotropical bioprovinces in the mid-Oligocene. Its history involved more dispersals than vicariances, but no extinctions. Hymenocallis represents the only instance of long-distance vicariance from the tropical Andean origin of its tribe Hymenocallideae. The absence of extinctions correlates with the lack of diversification rate shifts within the clade.
The Eucharideae experienced a sudden lineage radiation ca. 10 Mya. We tie many of the divergences in the Andean-centered lineages to the rise of the Andes, and suggest that the Amotape-Huancabamba Zone functioned both as a corridor (dispersal) and as a barrier to migration (vicariance). Several taxonomic changes are made. This is the largest DNA sequence data set to be applied within Amaryllidaceae to date.
Affiliation(s)
- Alan W. Meerow
- USDA-ARS-SHRS, National Clonal Germplasm Repository, Miami, FL, United States
- Elliot M. Gardner
- Singapore Botanic Gardens, National Parks Board, Singapore, Singapore
- Institute of Environment, Florida International University, Miami, FL, United States
- Kyoko Nakamura
- USDA-ARS-SHRS, National Clonal Germplasm Repository, Miami, FL, United States
23
Portik DM, Wiens JJ. Do Alignment and Trimming Methods Matter for Phylogenomic (UCE) Analyses? Syst Biol 2020; 70:440-462. [PMID: 32797207 DOI: 10.1093/sysbio/syaa064] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/15/2019] [Revised: 08/02/2020] [Accepted: 08/03/2020] [Indexed: 11/14/2022]
Abstract
Alignment is a crucial issue in molecular phylogenetics because different alignment methods can potentially yield very different topologies for individual genes. But it is unclear if the choice of alignment methods remains important in phylogenomic analyses, which incorporate data from hundreds or thousands of genes. For example, problematic biases in alignment might be multiplied across many loci, whereas alignment errors in individual genes might become irrelevant. The issue of alignment trimming (i.e., removing poorly aligned regions or missing data from individual genes) is also poorly explored. Here, we test the impact of 12 different combinations of alignment and trimming methods on phylogenomic analyses. We compare these methods using published phylogenomic data from ultraconserved elements (UCEs) from squamate reptiles (lizards and snakes), birds, and tetrapods. We compare the properties of alignments generated by different alignment and trimming methods (e.g., length, informative sites, missing data). We also test whether these data sets can recover well-established clades when analyzed with concatenated (RAxML) and species-tree methods (ASTRAL-III), using the full data (~5000 loci) and subsampled data sets (10% and 1% of loci). We show that different alignment and trimming methods can significantly impact various aspects of phylogenomic data sets (e.g., length, informative sites). However, these different methods generally had little impact on the recovery and support values for well-established clades, even across very different numbers of loci. Nevertheless, our results suggest several "best practices" for alignment and trimming. Intriguingly, the choice of phylogenetic methods impacted the phylogenetic results most strongly, with concatenated analyses recovering significantly more well-established clades (with stronger support) than the species-tree analyses.
[Alignment; concatenated analysis; phylogenomics; sequence length heterogeneity; species-tree analysis; trimming].
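Trimming, as defined in this abstract, removes poorly aligned regions or missing data from each gene. A minimal Python sketch of one simple trimming rule, dropping alignment columns whose gap fraction exceeds a threshold, is shown below. It is a generic illustration, not one of the 12 alignment/trimming combinations the study tested; the function name, the 0.5 default threshold, and the choice of `-` and `?` as gap/missing symbols are all assumptions.

```python
from typing import Dict

# Assumed convention: '-' is an alignment gap and '?' is missing data.
GAPS = set("-?")


def trim_gappy_columns(aln: Dict[str, str], max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Drop columns whose fraction of gap/missing characters exceeds max_gap_fraction.

    Hypothetical helper for illustration; `aln` maps taxon name -> aligned sequence.
    """
    taxa = list(aln)
    seqs = [aln[t] for t in taxa]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    # Keep a column only if its gap fraction is at or below the threshold.
    keep = [
        i for i in range(length)
        if sum(s[i] in GAPS for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {t: "".join(aln[t][i] for i in keep) for t in taxa}


aln = {"A": "AC-GT", "B": "A--GT", "C": "AC-GA", "D": "-CTG-"}
print(trim_gappy_columns(aln))
# -> {'A': 'ACGT', 'B': 'A-GT', 'C': 'ACGA', 'D': '-CG-'}
```

Only the third column (three gaps out of four taxa) exceeds the 0.5 threshold and is removed. Stricter thresholds shorten alignments and can reduce informative sites, which is one of the alignment properties the study compares.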
Affiliation(s)
- Daniel M Portik
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA; California Academy of Sciences, San Francisco, CA 94118, USA
- John J Wiens
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
24
Van Dam MH, Henderson JB, Esposito L, Trautwein M. Genomic Characterization and Curation of UCEs Improves Species Tree Reconstruction. Syst Biol 2020; 70:307-321. [PMID: 32750133 PMCID: PMC7875437 DOI: 10.1093/sysbio/syaa063] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/16/2019] [Revised: 07/26/2020] [Accepted: 07/29/2020] [Indexed: 12/12/2022]
Abstract
Ultraconserved genomic elements (UCEs) are generally treated as independent loci in phylogenetic analyses. The identification pipeline for UCE probes does not require prior knowledge of genetic identity, only selecting loci that are highly conserved, single copy, without repeats, and of a particular length. Here, we characterized UCEs from 11 phylogenomic studies across the animal tree of life, from birds to marine invertebrates. We found that within vertebrate lineages, UCEs are mostly intronic and intergenic, while in invertebrates, the majority are in exons. We then curated four different sets of UCE markers by genomic category from five different studies including: birds, mammals, fish, Hymenoptera (ants, wasps, and bees), and Coleoptera (beetles). Of genes captured by UCEs, we find that many are represented by two or more UCEs, corresponding to nonoverlapping segments of a single gene. We considered these UCEs to be nonindependent, merged all UCEs that belonged to a particular gene, constructed gene and species trees, and then evaluated the subsequent effect of merging cogenic UCEs on gene and species tree reconstruction. Average bootstrap support for merged UCE gene trees was significantly improved across all data sets, apparently driven by the increase in locus length. Additionally, we conducted simulations and found that gene trees generated from merged UCEs were more accurate than those generated by unmerged UCEs. As locus length improves gene tree accuracy, this modest degree of UCE characterization and curation impacts downstream analyses and demonstrates the advantages of incorporating basic genomic characterizations into phylogenomic analyses. [Anchored hybrid enrichment; ants; ASTRAL; bait capture; carangimorph; Coleoptera; conserved nonexonic elements; exon capture; gene tree; Hymenoptera; mammal; phylogenomic markers; songbird; species tree; ultraconserved elements; weevils.]
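The merging step described in this abstract (combining all UCEs that fall in the same gene into one longer locus) can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline: it assumes a UCE-to-gene mapping is already available (for example from aligning UCEs to an annotated reference genome), and the function name, the `None`-means-unannotated convention, and padding absent taxa with `?` are hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def merge_cogenic_uces(uce_alignments: Dict[str, Alignment],
                       uce_to_gene: Dict[str, Optional[str]]) -> Dict[str, Alignment]:
    """Concatenate UCEs that map to the same gene into one locus (hypothetical helper)."""
    merged: Dict[str, Alignment] = {}
    by_gene: Dict[str, List[str]] = defaultdict(list)
    for uce, aln in uce_alignments.items():
        gene = uce_to_gene.get(uce)
        if gene is None:
            merged[uce] = dict(aln)  # unannotated (e.g. intergenic) UCE stays a separate locus
        else:
            by_gene[gene].append(uce)
    for gene, uces in by_gene.items():
        taxa = sorted({t for u in uces for t in uce_alignments[u]})
        parts: Dict[str, List[str]] = {t: [] for t in taxa}
        for u in uces:  # input order; ideally this would follow position along the gene
            aln = uce_alignments[u]
            width = len(next(iter(aln.values())))
            for t in taxa:
                # Pad taxa absent from this UCE with missing data so columns stay aligned.
                parts[t].append(aln.get(t, "?" * width))
        merged[gene] = {t: "".join(p) for t, p in parts.items()}
    return merged


uces = {"uce-1": {"A": "AC", "B": "AG"}, "uce-2": {"A": "TTT"}, "uce-3": {"A": "G", "B": "C"}}
genes = {"uce-1": "geneX", "uce-2": "geneX", "uce-3": None}
print(merge_cogenic_uces(uces, genes))
# -> {'uce-3': {'A': 'G', 'B': 'C'}, 'geneX': {'A': 'ACTTT', 'B': 'AG???'}}
```

The merged `geneX` locus is five columns long instead of two separate loci of two and three columns; the abstract attributes the improved gene-tree support mainly to this increase in locus length.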
Affiliation(s)
- Matthew H Van Dam
- Entomology Department, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA; Center for Comparative Genomics, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA
- James B Henderson
- Center for Comparative Genomics, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA
- Lauren Esposito
- Entomology Department, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA; Center for Comparative Genomics, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA
- Michelle Trautwein
- Entomology Department, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA; Center for Comparative Genomics, Institute for Biodiversity Science and Sustainability, California Academy of Sciences, 55 Music Concourse Dr., San Francisco, CA 94118, USA
25
Bhattacharjee A, Bayzid MS. Machine learning based imputation techniques for estimating phylogenetic trees from incomplete distance matrices. BMC Genomics 2020; 21:497. [PMID: 32689946 PMCID: PMC7370488 DOI: 10.1186/s12864-020-06892-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/27/2020] [Accepted: 07/07/2020] [Indexed: 02/08/2023]
Abstract
BACKGROUND With the rapid growth rate of newly sequenced genomes, species tree inference from genes sampled throughout the whole genome has become a basic task in comparative and evolutionary biology. However, substantial challenges remain in leveraging these large-scale molecular data. One of the foremost challenges is to develop efficient methods that can handle missing data. Popular distance-based methods, such as NJ (neighbor joining) and UPGMA (unweighted pair group method with arithmetic mean), require complete distance matrices without any missing data. RESULTS We introduce two highly accurate machine learning based distance imputation techniques. These methods are based on matrix factorization and autoencoder-based deep learning architectures. We evaluated these two methods on a collection of simulated and biological datasets. Experimental results suggest that our proposed methods match or improve upon the best alternative distance imputation techniques. Moreover, these methods are scalable to large datasets with hundreds of taxa, and can handle a substantial amount of missing data. CONCLUSIONS This study shows, for the first time, the power and feasibility of applying deep learning techniques for imputing distance matrices. Thus, this study advances the state-of-the-art in phylogenetic tree construction in the presence of missing data. The proposed methods are available in open source form at https://github.com/Ananya-Bhattacharjee/ImputeDistances.
Affiliation(s)
- Ananya Bhattacharjee
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205 Bangladesh
- Department of Computer Science and Engineering, Eastern University, Dhaka, Bangladesh
- Md. Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205 Bangladesh
26
Chan KO, Hutter CR, Wood PL, Grismer LL, Brown RM. Larger, unfiltered datasets are more effective at resolving phylogenetic conflict: Introns, exons, and UCEs resolve ambiguities in Golden-backed frogs (Anura: Ranidae; genus Hylarana). Mol Phylogenet Evol 2020; 151:106899. [PMID: 32590046 DOI: 10.1016/j.ympev.2020.106899] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/03/2020] [Revised: 05/18/2020] [Accepted: 06/17/2020] [Indexed: 01/01/2023]
Abstract
Using FrogCap, a recently-developed sequence-capture protocol, we obtained >12,000 highly informative exons, introns, and ultraconserved elements (UCEs), which we used to illustrate variation in evolutionary histories of these classes of markers, and to resolve long-standing systematic problems in Southeast Asian Golden-backed frogs of the genus-complex Hylarana. We also performed a comprehensive suite of analyses to assess the relative performance of different genetic markers, data filtering strategies, tree inference methods, and different measures of branch support. To reduce gene tree estimation error, we filtered the data using different thresholds of taxon completeness (missing data) and parsimony informative sites (PIS). We then estimated species trees using concatenated datasets and Maximum Likelihood (IQ-TREE) in addition to summary (ASTRAL-III), distance-based (ASTRID), and site-based (SVDQuartets) multispecies coalescent methods. Topological congruence and branch support were examined using traditional bootstrap, local posterior probabilities, gene concordance factors, quartet frequencies, and quartet scores. Our results did not yield a single concordant topology. Instead, introns, exons, and UCEs clearly possessed different phylogenetic signals, resulting in conflicting, yet strongly-supported phylogenetic estimates. However, a combined analysis comprising the most informative introns, exons, and UCEs converged on a similar topology across all analyses, with the exception of SVDQuartets. Bootstrap values were consistently high despite high levels of incongruence and high proportions of gene trees supporting conflicting topologies. Although low bootstrap values did indicate low heuristic support, high bootstrap support did not necessarily reflect congruence or support for the correct topology. 
This study reiterates findings of some previous studies, which demonstrated that traditional bootstrap values can produce positively misleading measures of support in large phylogenomic datasets. We also showed a remarkably strong positive relationship between branch length and topological congruence across all datasets, implying that very short internodes remain a challenge to resolve, even with orders of magnitude more data than ever before. Overall, our results demonstrate that more data from unfiltered or combined datasets produced superior results. Although data filtering reduced gene tree incongruence, decreased amounts of data also biased phylogenetic estimation. A point of diminishing returns was evident, at which higher congruence (from more stringent filtering) at the expense of amount of data led to topological error as assessed by comparison to more complete datasets across different genomic markers. Additionally, we showed that applying a parameter-rich model to a partitioned analysis of concatenated data produces better results compared to unpartitioned, or even partitioned analysis using model selection. Despite some lingering uncertainties, a combined analysis of our genomic data and sequences supplemented from GenBank (on the basis of a few gene regions) revealed highly supported novel systematic arrangements. Based on these new findings, we transfer Amnirana nicobariensis into the genus Indosylvirana; and I. milleti and Hylarana celebensis to the genus Papurana. We also provisionally place H. attigua in the genus Papurana pending verification from positively identified (voucher substantiated) samples.
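The two filters named in this abstract, taxon completeness (missing data) and a minimum number of parsimony-informative sites (PIS), can be sketched in Python as below. This is an illustrative sketch only: the thresholds used in the example are arbitrary rather than the study's, the function names are hypothetical, and `-`, `?`, `N`, and `X` are assumed to denote gaps or missing data. As the abstract cautions, more stringent settings of these knobs reduce the amount of data and can bias the resulting estimates.

```python
from collections import Counter
from typing import Dict, List

# Assumed convention: these characters denote gaps or missing/ambiguous data.
MISSING = set("-?NnXx")


def pis_count(aln: Dict[str, str]) -> int:
    """Number of parsimony-informative sites: >= 2 states, each in >= 2 taxa."""
    n = 0
    for column in zip(*aln.values()):  # assumes equal-length aligned sequences
        counts = Counter(c.upper() for c in column if c not in MISSING)
        if sum(1 for k in counts.values() if k >= 2) >= 2:
            n += 1
    return n


def filter_loci(loci: Dict[str, Dict[str, str]], n_taxa: int,
                min_completeness: float, min_pis: int) -> List[str]:
    """Names of loci passing both a taxon-completeness and a PIS threshold."""
    kept = []
    for name, aln in loci.items():
        # A taxon counts as present only if it has at least one real character.
        present = sum(1 for s in aln.values() if any(c not in MISSING for c in s))
        if present / n_taxa >= min_completeness and pis_count(aln) >= min_pis:
            kept.append(name)
    return kept


loci = {
    "L1": {"A": "AAT", "B": "AAT", "C": "GGT", "D": "GGA"},  # 4/4 taxa, 2 PIS
    "L2": {"A": "ACGT", "B": "----", "C": "ACGA"},           # 2/4 taxa present
    "L3": {"A": "AC", "B": "AC", "C": "AC", "D": "AG"},      # 4/4 taxa, 0 PIS
}
print(filter_loci(loci, n_taxa=4, min_completeness=0.75, min_pis=1))  # -> ['L1']
```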
Affiliation(s)
- Kin Onn Chan
- Lee Kong Chian Natural History Museum, Faculty of Science, National University of Singapore, 2 Conservatory Drive, 117377, Singapore.
- Carl R Hutter
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Perry L Wood
- Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA; Department of Biological Sciences & Museum of Natural History, Auburn University, Auburn, AL 36849, USA
- L Lee Grismer
- Herpetology Laboratory, Department of Biology, La Sierra University, 4500 Riverwalk Parkway, Riverside, CA 92505, USA
- Rafe M Brown
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA
27
Islam M, Sarker K, Das T, Reaz R, Bayzid MS. STELAR: a statistically consistent coalescent-based species tree estimation method by maximizing triplet consistency. BMC Genomics 2020; 21:136. [PMID: 32039704 PMCID: PMC7011378 DOI: 10.1186/s12864-020-6519-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/22/2019] [Accepted: 01/20/2020] [Indexed: 12/14/2022]
Abstract
Background Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, estimating a species tree from a collection of gene trees can be complicated due to the presence of gene tree incongruence resulting from incomplete lineage sorting (ILS), which is modelled by the multi-species coalescent process. Maximum likelihood and Bayesian MCMC methods can potentially result in accurate trees, but they do not scale well to large datasets. Results We present STELAR (Species Tree Estimation by maximizing tripLet AgReement), a new fast and highly accurate statistically consistent coalescent-based method for estimating species trees from a collection of gene trees. We formalized the constrained triplet consensus (CTC) problem and showed that the solution to the CTC problem is a statistically consistent estimate of the species tree under the multi-species coalescent (MSC) model. STELAR is an efficient dynamic programming based solution to the CTC problem which is highly accurate and scalable. We evaluated the accuracy of STELAR in comparison with SuperTriplets, which is an alternate fast and highly accurate triplet-based supertree method, and with MP-EST and ASTRAL – two of the most popular and accurate coalescent-based methods. Experimental results suggest that STELAR matches the accuracy of ASTRAL and improves on MP-EST and SuperTriplets. Conclusions Theoretical and empirical results (on both simulated and real biological datasets) suggest that STELAR is a valuable technique for species tree estimation from gene tree distributions.
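STELAR's objective is to find the species tree that agrees with as many gene-tree triplets (rooted three-taxon subtrees) as possible. The Python sketch below only *scores* one candidate species tree against a set of rooted gene trees by brute force. It is not STELAR's dynamic-programming search. Trees are written as nested tuples of leaf names, a hypothetical representation chosen for brevity, and the function names are illustrative. A triplet is resolved as `xy|z` when some internal clade contains `x` and `y` but not `z`.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees (rooted)


def clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets below every internal node of a rooted tree given as nested tuples."""
    out: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def triplet_topology(cl: List[FrozenSet[str]], x: str, y: str, z: str) -> Optional[FrozenSet[str]]:
    """Return the sister pair of the induced rooted triplet, or None if unresolved."""
    for a, b, c in ((x, y, z), (x, z, y), (y, z, x)):
        if any(a in s and b in s and c not in s for s in cl):
            return frozenset((a, b))
    return None


def triplet_agreement(species_tree: Tree, gene_trees: List[Tree]) -> int:
    """Count gene-tree triplets whose resolution matches the species tree."""
    sp_cl = clusters(species_tree)
    score = 0
    for gt in gene_trees:
        g_cl = clusters(gt)
        taxa = sorted(set().union(*g_cl))
        for x, y, z in combinations(taxa, 3):
            g = triplet_topology(g_cl, x, y, z)
            if g is not None and g == triplet_topology(sp_cl, x, y, z):
                score += 1
    return score


species = ((("A", "B"), "C"), "D")
genes = [((("A", "B"), "C"), "D"), ((("A", "C"), "B"), "D")]
print(triplet_agreement(species, genes))  # -> 7
```

The first gene tree matches the candidate on all 4 triplets. The second disagrees only on the triplet {A, B, C}, so it contributes 3. This enumeration costs O(n^3) per gene tree. STELAR's contribution, per the abstract, is solving the maximization efficiently with dynamic programming, which this sketch does not attempt.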
Affiliation(s)
- Mazharul Islam
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Kowshika Sarker
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Trisha Das
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Rezwana Reaz
- Department of Computer Science, The University of Texas at Austin, Texas, 78712, USA
- Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
28
Zeng Z, Liang D, Li J, Lyu Z, Wang Y, Zhang P. Phylogenetic relationships of the Chinese torrent frogs (Ranidae: Amolops) revealed by phylogenomic analyses of AFLP-Capture data. Mol Phylogenet Evol 2020; 146:106753. [PMID: 32028033 DOI: 10.1016/j.ympev.2020.106753] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/30/2019] [Revised: 01/13/2020] [Accepted: 01/28/2020] [Indexed: 10/25/2022]
Abstract
The torrent frog genus Amolops contains nearly sixty species distributed in swift mountain streams throughout Southeast Asia. The taxonomy of this genus has proven complicated due to unstable morphological diagnostic characters. The relationships of Amolops species and species groups were not readily resolved with a small number of molecular markers. Here, we applied the novel AFLP-Capture approach and acquired two large datasets (242 anonymous nuclear sequences and the mitochondrial genome) from 70 Chinese Amolops samples to study their relationships. The phylogenies inferred from the nuclear data and the mitochondrial data were both robust and revealed a primary phylogenetic split between eastern and western Chinese Amolops species. The relationships of the six species groups were clarified. While the three species groups in east China (the A. ricketti, A. daiyunensis and A. hainanensis groups) were monophyletic, the three species groups in the west (the A. mantzorum, A. monticola and A. marmoratus groups) were not monophyletic, suggesting a need for further investigation and revision. The robust phylogenies also provided new insights into species relationships, especially for the A. mantzorum group, which has been difficult to resolve due to multiple speciation events occurring approximately 7-8 million years ago. The divergence times estimated with the nuclear data indicated that the ancestor of the Chinese Amolops appeared in the late Eocene or early Oligocene, and that speciation events in the Chinese Amolops were often related to geological events (e.g. the uplift of mountains and the formation of islands). By including the mitochondrial sequences from GenBank, a more comprehensive Amolops phylogeny was constructed that reflected the origin of the Chinese Amolops. Based on all these results, a dispersal scenario of the torrent frogs was hypothesized.
Our research serves as the first example of using AFLP-Capture to obtain a large amount of data for shallow-scale phylogenetic and taxonomic studies, which should be useful for other nonmodel organism groups.
Affiliation(s)
- Zhaochi Zeng
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Dan Liang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Jiaxuan Li
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Zhitong Lyu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Yingyong Wang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Peng Zhang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
29
Gatesy J, Sloan DB, Warren JM, Baker RH, Simmons MP, Springer MS. Partitioned coalescence support reveals biases in species-tree methods and detects gene trees that determine phylogenomic conflicts. Mol Phylogenet Evol 2019; 139:106539. [DOI: 10.1016/j.ympev.2019.106539] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/03/2018] [Revised: 06/10/2019] [Accepted: 06/17/2019] [Indexed: 12/26/2022]
30
Jones KE, Fér T, Schmickl RE, Dikow RB, Funk VA, Herrando-Moraira S, Johnston PR, Kilian N, Siniscalchi CM, Susanna A, Slovák M, Thapa R, Watson LE, Mandel JR. An empirical assessment of a single family-wide hybrid capture locus set at multiple evolutionary timescales in Asteraceae. APPLICATIONS IN PLANT SCIENCES 2019; 7:e11295. [PMID: 31667023 PMCID: PMC6814182 DOI: 10.1002/aps3.11295] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 02/27/2019] [Accepted: 09/05/2019] [Indexed: 05/23/2023]
Abstract
PREMISE Hybrid capture with high-throughput sequencing (Hyb-Seq) is a powerful tool for evolutionary studies. The applicability of an Asteraceae family-specific Hyb-Seq probe set and the outcomes of different phylogenetic analyses are investigated here. METHODS Hyb-Seq data from 112 Asteraceae samples were organized into groups at different taxonomic levels (tribe, genus, and species). For each group, data sets of non-paralogous loci were built and proportions of parsimony informative characters estimated. The impacts of analyzing alternative data sets, removing long branches, and type of analysis on tree resolution and inferred topologies were investigated in tribe Cichorieae. RESULTS Alignments of the Asteraceae family-wide Hyb-Seq locus set were parsimony informative at all taxonomic levels. Levels of resolution and topologies inferred at shallower nodes differed depending on the locus data set and the type of analysis, and were affected by the presence of long branches. DISCUSSION The approach used to build a Hyb-Seq locus data set influenced resolution and topologies inferred in phylogenetic analyses. Removal of long branches improved the reliability of topological inferences in maximum likelihood analyses. The Asteraceae Hyb-Seq probe set is applicable at multiple taxonomic depths, which demonstrates that probe sets do not necessarily need to be lineage-specific.
Affiliation(s)
- Katy E. Jones
- Botanischer Garten und Botanisches Museum Berlin, Freie Universität Berlin, Königin-Luise-Str. 6–8, 14195 Berlin, Germany
- Tomáš Fér
- Department of Botany, Faculty of Science, Charles University, Benátská 2, CZ 12800 Prague, Czech Republic
- Roswitha E. Schmickl
- Department of Botany, Faculty of Science, Charles University, Benátská 2, CZ 12800 Prague, Czech Republic
- Institute of Botany, The Czech Academy of Sciences, Zámek 1, CZ 25243 Průhonice, Czech Republic
- Rebecca B. Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, D.C. 20013-7012, USA
- Vicki A. Funk
- Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, D.C. 20013-7012, USA
- Paul R. Johnston
- Freie Universität Berlin, Evolutionary Biology, Berlin, Germany
- Berlin Center for Genomics in Biodiversity Research, Berlin, Germany
- Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Berlin, Germany
- Norbert Kilian
- Botanischer Garten und Botanisches Museum Berlin, Freie Universität Berlin, Königin-Luise-Str. 6–8, 14195 Berlin, Germany
- Carolina M. Siniscalchi
- Department of Biological Sciences, University of Memphis, Memphis, Tennessee 38152, USA
- Center for Biodiversity, University of Memphis, Memphis, Tennessee 38152, USA
- Alfonso Susanna
- Botanic Institute of Barcelona (IBB-CSIC-ICUB), Pg. del Migdia s.n., ES 08038 Barcelona, Spain
- Marek Slovák
- Department of Botany, Faculty of Science, Charles University, Benátská 2, CZ 12800 Prague, Czech Republic
- Plant Science and Biodiversity Centre, Slovak Academy of Sciences, SK-84523 Bratislava, Slovakia
- Ramhari Thapa
- Department of Biological Sciences, University of Memphis, Memphis, Tennessee 38152, USA
- Center for Biodiversity, University of Memphis, Memphis, Tennessee 38152, USA
- Linda E. Watson
- Department of Plant Biology, Ecology, and Evolution, Oklahoma State University, Stillwater, Oklahoma 74078, USA
- Jennifer R. Mandel
- Department of Biological Sciences, University of Memphis, Memphis, Tennessee 38152, USA
- Center for Biodiversity, University of Memphis, Memphis, Tennessee 38152, USA
31
Moumi NA, Das B, Tasnim Promi Z, Bristy NA, Bayzid MS. Quartet-based inference of cell differentiation trees from ChIP-Seq histone modification data. PLoS One 2019; 14:e0221270. [PMID: 31557185 PMCID: PMC6762093 DOI: 10.1371/journal.pone.0221270] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/17/2019] [Accepted: 08/04/2019] [Indexed: 01/23/2023]
Abstract
Understanding cell differentiation, the process by which distinct cell types are generated, plays a pivotal role in developmental and evolutionary biology. Transcriptomic information and epigenetic marks are useful for elucidating hierarchical developmental relationships among cell types. Standard phylogenetic approaches such as maximum parsimony, maximum likelihood and neighbor joining have previously been applied to ChIP-Seq histone modification data to infer cell-type trees, showing how diverse types of cells are related. In this study, we demonstrate the applicability and suitability of quartet-based phylogenetic tree estimation techniques for constructing cell-type trees. We propose two quartet-based pipelines for constructing cell phylogeny. Our methods were assessed for their validity in inferring hierarchical differentiation processes of various cell types in H3K4me3, H3K27me3, H3K36me3, and H3K27ac histone mark data. We also propose a robust metric for evaluating cell-type trees.
Affiliation(s)
- Nazifa Ahmed Moumi
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Badhan Das
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Zarin Tasnim Promi
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Nishat Anjum Bristy
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Md. Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
32
Mendes FK, Livera AP, Hahn MW. The perils of intralocus recombination for inferences of molecular convergence. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180244. [PMID: 31154973 DOI: 10.1098/rstb.2018.0244] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 01/10/2023]
Abstract
Accurate inferences of convergence require that the appropriate tree topology be used. If there is a mismatch between the tree a trait has evolved along and the tree used for analysis, then false inferences of convergence ('hemiplasy') can occur. To avoid problems of hemiplasy when there are high levels of gene tree discordance with the species tree, researchers have begun to construct tree topologies from individual loci. However, due to intralocus recombination, even locus-specific trees may contain multiple topologies within them. This implies that the use of individual tree topologies discordant with the species tree can still lead to incorrect inferences about molecular convergence. Here, we examine the frequency with which single exons and single protein-coding genes contain multiple underlying tree topologies, in primates and Drosophila, and quantify the effects of hemiplasy when using trees inferred from individual loci. In both clades, we find that there are most often multiple diagnosable topologies within single exons and whole genes, with 91% of Drosophila protein-coding genes containing multiple topologies. Because of this underlying topological heterogeneity, even using trees inferred from individual protein-coding genes results in 25% and 38% of substitutions falsely labelled as convergent in primates and Drosophila, respectively. While constructing local trees can reduce the problem of hemiplasy, our results suggest that it will be difficult to completely avoid false inferences of convergence. We conclude by suggesting several ways forward in the analysis of convergent evolution, for both molecular and morphological characters. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.
Affiliation(s)
- Fábio K Mendes
- Department of Computer Science, The University of Auckland, Auckland 1010, New Zealand; Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Andrew P Livera
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Matthew W Hahn
- Department of Biology, Indiana University, Bloomington, IN 47405, USA; Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
33
Vasilikopoulos A, Balke M, Beutel RG, Donath A, Podsiadlowski L, Pflug JM, Waterhouse RM, Meusemann K, Peters RS, Escalona HE, Mayer C, Liu S, Hendrich L, Alarie Y, Bilton DT, Jia F, Zhou X, Maddison DR, Niehuis O, Misof B. Phylogenomics of the superfamily Dytiscoidea (Coleoptera: Adephaga) with an evaluation of phylogenetic conflict and systematic error. Mol Phylogenet Evol 2019; 135:270-285. [DOI: 10.1016/j.ympev.2019.02.022] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 10/10/2018] [Revised: 02/22/2019] [Accepted: 02/25/2019] [Indexed: 02/07/2023]
34
Roch S, Nute M, Warnow T. Long-Branch Attraction in Species Tree Estimation: Inconsistency of Partitioned Likelihood and Topology-Based Summary Methods. Syst Biol 2019; 68:281-297. [PMID: 30247732 DOI: 10.1093/sysbio/syy061] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 03/16/2018] [Accepted: 09/12/2018] [Indexed: 11/13/2022]
Abstract
With advances in sequencing technologies, there are now massive amounts of genomic data from across all life, leading to the possibility that a robust Tree of Life can be constructed. However, "gene tree heterogeneity", which is when different genomic regions can evolve differently, is a common phenomenon in multi-locus data sets, and reduces the accuracy of standard methods for species tree estimation that do not take this heterogeneity into account. New methods have been developed for species tree estimation that specifically address gene tree heterogeneity, and that have been proven to converge to the true species tree when the number of loci and number of sites per locus both increase (i.e., the methods are said to be "statistically consistent"). Yet, little is known about the biologically realistic condition where the number of sites per locus is bounded. We show that when the sequence length of each locus is bounded (by any arbitrarily chosen value), the most common approaches to species tree estimation that take heterogeneity into account (i.e., traditional fully partitioned concatenated maximum likelihood and newer approaches, called summary methods, that estimate the species tree by combining estimated gene trees) are not statistically consistent, even when the heterogeneity is extremely constrained. The main challenge is the presence of conditions such as long branch attraction that create biased tree estimation when the number of sites is restricted. Hence, our study uncovers a fundamental challenge to species tree estimation using both traditional and new methods.
Affiliation(s)
- Sebastien Roch
- Department of Mathematics, University of Wisconsin-Madison, 480 Lincoln Dr, Madison, WI 53706, USA
- Michael Nute
- Department of Statistics, The University of Illinois at Urbana-Champaign, 725 S Wright St #101, Champaign, IL 61820, USA
- Tandy Warnow
- Department of Computer Science, The University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL 61801-2302, USA
35
Simmons MP, Sloan DB, Springer MS, Gatesy J. Gene-wise resampling outperforms site-wise resampling in phylogenetic coalescence analyses. Mol Phylogenet Evol 2019; 131:80-92. [DOI: 10.1016/j.ympev.2018.10.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/08/2018] [Accepted: 10/01/2018] [Indexed: 01/15/2023]
36
Almeida EAB, Packer L, Melo GAR, Danforth BN, Cardinal SC, Quinteiro FB, Pie MR. The diversification of neopasiphaeine bees during the Cenozoic (Hymenoptera: Colletidae). ZOOL SCR 2018. [DOI: 10.1111/zsc.12333] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 02/05/2023]
Affiliation(s)
- Eduardo A. B. Almeida
- Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia Ciências e Letras, Universidade de São Paulo Ribeirão Preto SP Brazil
- Gabriel A. R. Melo
- Departamento de Zoologia Universidade Federal do Paraná Curitiba PR Brazil
- Bryan N. Danforth
- Department of Entomology, Comstock Hall, Cornell University, Ithaca, New York
- Sophie C. Cardinal
- Agriculture and Agri‐Food Canada Canadian National Collection of Insects Ottawa Ontario Canada
- Fábio B. Quinteiro
- Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia Ciências e Letras, Universidade de São Paulo Ribeirão Preto SP Brazil
- Departamento de Ecologia, Zoologia e Genética, Instituto de Biologia Universidade Federal de Pelotas Pelotas Rio Grande do Sul Brazil
- Marcio R. Pie
- Departamento de Zoologia Universidade Federal do Paraná Curitiba PR Brazil
37
Pouchon C, Fernández A, Nassar JM, Boyer F, Aubert S, Lavergne S, Mavárez J. Phylogenomic Analysis of the Explosive Adaptive Radiation of the Espeletia Complex (Asteraceae) in the Tropical Andes. Syst Biol 2018; 67:1041-1060. [PMID: 30339252 DOI: 10.1093/sysbio/syy022] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 03/14/2017] [Accepted: 03/15/2018] [Indexed: 01/17/2023]
Abstract
The subtribe Espeletiinae (Asteraceae), endemic to the high-elevations in the Northern Andes, exhibits an exceptional diversity of species, growth-forms, and reproductive strategies. This complex of 140 species includes large trees, dichotomous trees, shrubs and the extraordinary giant caulescent rosettes, considered as a classic example of adaptation in tropical high-elevation ecosystems. The subtribe has also long been recognized as a prominent case of adaptive radiation, but the understanding of its evolution has been hampered by a lack of phylogenetic resolution. Herein, we produce the first fully resolved phylogeny of all morphological groups of Espeletiinae, using whole plastomes and about a million nuclear nucleotides obtained with an original de novo assembly procedure without reference genome, and analyzed with traditional and coalescent-based approaches that consider the possible impact of incomplete lineage sorting and hybridization on phylogenetic inference. We show that the diversification of Espeletiinae started from a rosette ancestor about 2.3 Ma, after the final uplift of the Northern Andes. This was followed by two independent radiations in the Colombian and Venezuelan Andes, with a few trans-cordilleran dispersal events among low-elevation tree lineages but none among high-elevation rosettes. We demonstrate complex scenarios of morphological change in Espeletiinae, usually implying the convergent evolution of growth-forms with frequent loss/gains of various traits. For instance, caulescent rosettes evolved independently in both countries, likely as convergent adaptations to life in tropical high-elevation habitats. Tree growth-forms evolved independently three times from the repeated colonization of lower elevations by high-elevation rosette ancestors. The rate of morphological diversification increased during the early phase of the radiation, after which it decreased steadily towards the present. 
On the other hand, the rate of species diversification in the best-sampled Venezuelan radiation was on average very high (3.1 spp/My), with significant rate variation among growth-forms (much higher in polycarpic caulescent rosettes). Our results point out a scenario where both adaptive morphological evolution and geographical isolation due to Pleistocene climatic oscillations triggered an exceptionally rapid radiation for a continental plant group.
Affiliation(s)
- Charles Pouchon
- Laboratoire d'Ecologie Alpine, UMR 5553, Université Grenoble Alpes-CNRS, Grenoble, France
- Angel Fernández
- Herbario IVIC, Centro de Biofísica y Bioquímica, Instituto Venezolano de Investigaciones Científicas, Apartado 20632, Caracas 1020-A, Venezuela
- Jafet M Nassar
- Laboratorio de Biología de Organismos, Centro de Ecología, Instituto Venezolano de Investigaciones Científicas, Apartado 20632, Caracas 1020-A, Venezuela
- Frédéric Boyer
- Laboratoire d'Ecologie Alpine, UMR 5553, Université Grenoble Alpes-CNRS, Grenoble, France
- Serge Aubert
- Laboratoire d'Ecologie Alpine, UMR 5553, Université Grenoble Alpes-CNRS, Grenoble, France; Station alpine Joseph-Fourier, UMS 3370, Université Grenoble Alpes-CNRS, Grenoble, France
- Sébastien Lavergne
- Laboratoire d'Ecologie Alpine, UMR 5553, Université Grenoble Alpes-CNRS, Grenoble, France
- Jesús Mavárez
- Laboratoire d'Ecologie Alpine, UMR 5553, Université Grenoble Alpes-CNRS, Grenoble, France
38
Rabiee M, Sayyari E, Mirarab S. Multi-allele species reconstruction using ASTRAL. Mol Phylogenet Evol 2018; 130:286-296. [PMID: 30393186 DOI: 10.1016/j.ympev.2018.10.033] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Received: 11/19/2017] [Revised: 10/23/2018] [Accepted: 10/24/2018] [Indexed: 11/29/2022]
Abstract
Genome-wide phylogeny reconstruction is becoming increasingly common, and one driving factor behind these phylogenomic studies is the promise that the potential discordance between gene trees and the species tree can be modeled. Incomplete lineage sorting is one cause of discordance that bridges population genetic and phylogenetic processes. ASTRAL is a species tree reconstruction method that seeks to find the tree with minimum quartet distance to an input set of inferred gene trees. However, the published ASTRAL algorithm only works with one sample per species. To account for polymorphisms in present-day species, one can sample multiple individuals per species to create multi-allele datasets. Here, we describe how ASTRAL can handle multi-allele datasets. We show that the quartet-based optimization problem extends naturally, and we introduce heuristic methods for building the search space specifically for the case of multi-individual datasets. We study the accuracy and scalability of the multi-individual version of ASTRAL-III using extensive simulation studies and compare it to NJst, the only other scalable method that can handle these datasets. We do not find strong evidence that using multiple individuals dramatically improves accuracy. When we study the trade-off between sampling more genes versus more individuals, we find that sampling more genes is more effective than sampling more individuals, even under conditions that we study where trees are shallow (median length: ≈1Ne) and ILS is extremely high.
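ASTRAL's objective, as stated above, is the species tree with minimum quartet distance to the input gene trees; equivalently, the tree whose induced four-taxon topologies agree with the gene trees as often as possible. The Python sketch below only scores that objective by brute force for one sample per species; it is not ASTRAL's search algorithm and does not cover the multi-individual extension. As an illustrative assumption, each unrooted tree is encoded as a list of non-trivial splits (the leaves on one side of each internal edge), and the helper names `quartet_topology` and `quartet_score` are hypothetical.

```python
from itertools import combinations


def quartet_topology(splits, quartet):
    """Return the quartet topology {{x, y}, {z, w}} displayed by a tree, or None.

    `splits` is a list of frozensets; each holds the leaves on one side of an
    internal edge of an unrooted tree (an illustrative encoding, not ASTRAL's).
    """
    a, b, c, d = quartet
    for left, right in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        for side in splits:
            left_in = all(x in side for x in left)
            left_out = not any(x in side for x in left)
            right_in = all(x in side for x in right)
            right_out = not any(x in side for x in right)
            # An edge that separates the two pairs resolves the quartet as left|right.
            if (left_in and right_out) or (right_in and left_out):
                return frozenset([frozenset(left), frozenset(right)])
    return None  # the tree leaves this quartet unresolved (polytomy)


def quartet_score(species_splits, gene_trees, taxa):
    """Count (quartet, gene tree) pairs whose topology matches the species tree."""
    score = 0
    for quartet in combinations(sorted(taxa), 4):
        target = quartet_topology(species_splits, quartet)
        if target is None:
            continue
        for gene_splits in gene_trees:
            if quartet_topology(gene_splits, quartet) == target:
                score += 1
    return score


# Hypothetical example: species tree ((A,B),C,(D,E)); gene tree 2 is ((A,C),B,(D,E)).
taxa = {"A", "B", "C", "D", "E"}
species = [frozenset("AB"), frozenset("DE")]
gene1 = [frozenset("AB"), frozenset("DE")]
gene2 = [frozenset("AC"), frozenset("DE")]
print(quartet_score(species, [gene1, gene2], taxa))  # 8 of a possible 10
```

Gene tree 1 agrees on all five quartets and gene tree 2 on three of them. Enumerating every quartet costs on the order of n^4 times the number of gene trees, so this is only practical for toy inputs; ASTRAL instead searches a constrained space of bipartitions.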
Affiliation(s)
- Maryam Rabiee
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093, United States
- Erfan Sayyari
- Department of Electrical and Computer Engineering, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093, United States
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093, United States
39
Herrando-Moraira S. Exploring data processing strategies in NGS target enrichment to disentangle radiations in the tribe Cardueae (Compositae). Mol Phylogenet Evol 2018; 128:69-87. [PMID: 30036700 DOI: 10.1016/j.ympev.2018.07.012] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 04/14/2018] [Revised: 07/13/2018] [Accepted: 07/14/2018] [Indexed: 12/17/2022]
Abstract
Target enrichment is a cost-effective sequencing technique that holds promise for elucidating evolutionary relationships in fast-evolving lineages. However, the potential biases and impact of bioinformatic sequence treatments on phylogenetic inference have not yet been thoroughly explored. Here, we investigate this issue with the ultimate goal of shedding light on a highly diversified group of Compositae (Asteraceae) comprising four main genera: Arctium, Cousinia, Saussurea, and Jurinea. Specifically, we compared sequence data extraction methods implemented in two easy-to-use workflows, PHYLUCE and HybPiper, and assessed the impact of two filtering practices intended to reduce phylogenetic noise. In addition, we compared two phylogenetic inference methods: (1) the concatenation approach, in which all loci were concatenated in a supermatrix; and (2) the coalescence approach, in which gene trees were produced independently and then used to construct a species tree under coalescence assumptions. Here we confirm the usefulness of the set of 1061 COS targets (a nuclear conserved orthology loci set developed for the Compositae) across a variety of taxonomic levels. Intergeneric relationships were completely resolved: there are two sister groups, Arctium-Cousinia and Saussurea-Jurinea, which are in agreement with a morphological hypothesis. Intrageneric relationships among species of Arctium, Cousinia, and Saussurea are also well defined. Conversely, conflicting species relationships remain for Jurinea. Methodological choices significantly affected phylogenies in terms of topology, branch length, and support. Across all analyses, the phylogeny obtained using HybPiper and the strictest scheme of removing fast-evolving sites was judged to be optimal.
Regarding methodological choices, we conclude that: (1) trees obtained under the coalescence approach are topologically more congruent with one another than those inferred using the concatenation approach; (2) refining treatments only improved support values under the concatenation approach; and (3) branch support values are maximized when fast-evolving sites are removed in the concatenation approach, and when a higher number of loci is analyzed in the coalescence approach.
Affiliation(s)
- Sonia Herrando-Moraira
- Botanic Institute of Barcelona (IBB, CSIC-ICUB), Pg. del Migdia, s.n., 08038 Barcelona, Spain.
40
Kuang T, Tornabene L, Li J, Jiang J, Chakrabarty P, Sparks JS, Naylor GJP, Li C. Phylogenomic analysis on the exceptionally diverse fish clade Gobioidei (Actinopterygii: Gobiiformes) and data-filtering based on molecular clocklikeness. Mol Phylogenet Evol 2018; 128:192-202. [PMID: 30036699 DOI: 10.1016/j.ympev.2018.07.018] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 08/23/2017] [Revised: 07/11/2018] [Accepted: 07/17/2018] [Indexed: 11/30/2022]
Abstract
The use of genome-scale data to infer phylogenetic relationships has gained in popularity in recent years due to the progress made in target-gene capture and sequencing techniques. Data filtering, the approach of excluding data inconsistent with the model from analyses, could presumably alleviate problems caused by systematic errors in phylogenetic inference. Different data filtering criteria, such as those based on evolutionary rate and molecular clocklikeness, as well as others, have been proposed for selecting useful phylogenetic markers, yet few studies have tested these criteria using phylogenomic data. We developed a novel set of single-copy nuclear coding markers to capture thousands of target genes in gobioid fishes, a species-rich lineage of vertebrates, and tested the effects of data-filtering methods based on substitution rate and molecular clocklikeness while attempting to control for the compounding effects of missing data and variation in locus length. We found that molecular clocklikeness was a better predictor than overall substitution rate for phylogenetic usefulness of molecular markers in our study. In addition, when the 100 best-ranked loci for our predictors were concatenated and analyzed using maximum likelihood, or combined in a coalescent-based species-tree analysis, the resulting trees showed a well-resolved topology of Gobioidei that mostly agrees with previous studies. However, trees generated from the 100 least clocklike loci frequently recovered conflicting, and in some cases clearly erroneous, topologies with strong support, thus indicating strong systematic biases in those datasets. Collectively, these results suggest that data filtering has the potential to improve the performance of phylogenetic inference when using both a concatenation approach and methods that rely on input from individual gene trees (i.e. coalescent species-tree approaches), which may be preferred in scenarios where incomplete lineage sorting is likely to be an issue.
Affiliation(s)
- Ting Kuang
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
- Luke Tornabene
- School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98105, USA
- Jingyan Li
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
- Jiamei Jiang
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
- Prosanta Chakrabarty
- Louisiana State University, Museum of Natural Science, Department of Biological Sciences, Baton Rouge, LA 70803, USA
- John S Sparks
- American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
- Chenhong Li
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
41
SVDquest: Improving SVDquartets species tree estimation using exact optimization within a constrained search space. Mol Phylogenet Evol 2018. [DOI: 10.1016/j.ympev.2018.03.006] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 11/22/2022]
42
Bangs MR, Douglas MR, Mussmann SM, Douglas ME. Unraveling historical introgression and resolving phylogenetic discord within Catostomus (Osteichthys: Catostomidae). BMC Evol Biol 2018; 18:86. [PMID: 29879898 PMCID: PMC5992631 DOI: 10.1186/s12862-018-1197-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 03/12/2018] [Accepted: 05/18/2018] [Indexed: 01/25/2023]
Abstract
BACKGROUND Porous species boundaries can be a source of conflicting hypotheses, particularly when coupled with variable data and/or methodological approaches. Their impacts can often be magnified when non-model organisms with complex histories of reticulation are investigated. One such example is the genus Catostomus (Osteichthys, Catostomidae), a freshwater fish clade with conflicting morphological and mitochondrial phylogenies. The former is hypothesized as reflecting the presence of admixed genotypes within morphologically distinct lineages, whereas the latter is interpreted as the presence of distinct morphologies that emerged multiple times through convergent evolution. We tested these hypotheses using multiple methods, including multispecies coalescent and concatenated approaches. Patterson's D-statistic was applied to resolve potential discord, examine introgression, and test the putative hybrid origin of two species. We also applied naïve binning to explore potential effects of concatenation. RESULTS We employed 14,007 loci generated from ddRAD sequencing of 184 individuals to derive the first highly supported nuclear phylogeny for Catostomus. Our phylogenomic analyses largely agreed with a morphological interpretation, with the exception of the placement of Xyrauchen texanus, which differs from both morphological and mitochondrial phylogenies. Additionally, our evaluation of the putative hybrid species C. columbianus revealed a lack of introgression and instead matched the mitochondrial phylogeny. Furthermore, D-statistic tests clarified all discrepancies based solely on mitochondrial data, with agreement among topologies derived from concatenation and multispecies coalescent approaches. Extensive historic introgression was detected across six species-pairs. Potential endemism in the Virgin and Little Colorado Rivers was also apparent, and the former genus Pantosteus was recovered as monophyletic, save for C. columbianus.
CONCLUSIONS Complex reticulated histories detected herein support the hypothesis that introgression was responsible for conflicts that occurred within the mitochondrial phylogeny, and explains discrepancies found between it and previous morphological phylogenies. Additionally, the hybrid origin of C. columbianus was refuted, but with the caveat that finer-grained sampling is still needed. Our diverse phylogenomic approaches provided largely concordant results, with naïve binning useful in exploring the single conflict. Considerable diversity was found within Catostomus across southwestern North America, with two drainages [Virgin River (UT) and Little Colorado River (AZ)] reflecting unique composition.
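The abstract relies on Patterson's D-statistic without spelling out the computation. In a common site-pattern formulation, for taxa related as (((P1, P2), P3), O), one counts biallelic sites where the outgroup O carries the ancestral allele A and P3 the derived allele B: ABBA sites (P2 shares B with P3) and BABA sites (P1 shares B with P3). Then D = (ABBA - BABA) / (ABBA + BABA); incomplete lineage sorting alone is expected to produce the two patterns equally often (D near 0), while positive values indicate excess allele sharing between P2 and P3 and negative values between P1 and P3. The Python sketch below assumes one aligned haploid sequence per taxon and skips gaps and ambiguity codes; it is not the authors' pipeline, it omits the block-jackknife significance test normally applied in practice, and the function name `d_statistic` is hypothetical.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Patterson's D from four aligned haploid sequences, topology (((P1,P2),P3),O).

    Returns (D, n_abba, n_baba). Only biallelic sites where P3 differs from the
    outgroup (i.e., P3 carries the derived allele) can be ABBA or BABA.
    """
    seqs = [s.upper() for s in (p1, p2, p3, outgroup)]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    n_abba = n_baba = 0
    for a1, a2, a3, o in zip(*seqs):
        site = (a1, a2, a3, o)
        if any(base not in "ACGT" for base in site):
            continue  # skip gaps, Ns and ambiguity codes
        if len(set(site)) != 2 or a3 == o:
            continue  # keep biallelic sites where P3 has the derived allele
        if a1 == o and a2 == a3:
            n_abba += 1  # P2 and P3 share the derived allele
        elif a1 == a3 and a2 == o:
            n_baba += 1  # P1 and P3 share the derived allele

    total = n_abba + n_baba
    d = (n_abba - n_baba) / total if total else float("nan")
    return d, n_abba, n_baba


# Hypothetical toy alignment: 2 ABBA sites, 1 BABA site, 3 uninformative sites.
d, abba, baba = d_statistic("AAGACT", "GGAACT", "GGGATT", "AAAACC")
print(abba, baba, round(d, 3))  # 2 1 0.333
```

Returning NaN when no ABBA or BABA sites exist keeps the function total; real analyses sum counts over many loci and assess significance by resampling genomic blocks.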
Affiliation(s)
- Max R Bangs
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, 72701, USA; School of Fisheries, Aquaculture and Aquatic Sciences, Auburn University, Auburn, AL, 36849, USA
- Marlis R Douglas
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, 72701, USA
- Steven M Mussmann
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, 72701, USA
- Michael E Douglas
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, 72701, USA
43
Springer MS, Gatesy J. Delimiting Coalescence Genes (C-Genes) in Phylogenomic Data Sets. Genes (Basel) 2018; 9:genes9030123. [PMID: 29495400 PMCID: PMC5867844 DOI: 10.3390/genes9030123] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 12/01/2017] [Revised: 02/02/2018] [Accepted: 02/19/2018] [Indexed: 02/07/2023]
Abstract
Coalescence methods have emerged as a popular alternative for inferring species trees with large genomic datasets, because these methods explicitly account for incomplete lineage sorting. However, statistical consistency of summary coalescence methods is not guaranteed unless several model assumptions are true, including the critical assumption that recombination occurs freely among but not within coalescence genes (c-genes), which are the fundamental units of analysis for these methods. Each c-gene has a single branching history, and large sets of these independent gene histories should be the input for genome-scale coalescence estimates of phylogeny. By contrast, numerous studies have reported the results of coalescence analyses in which complete protein-coding sequences are treated as c-genes even though exons for these loci can span more than a megabase of DNA. Empirical estimates of recombination breakpoints suggest that c-genes may be much shorter, especially when large clades with many species are the focus of analysis. Although this idea has been challenged recently in the literature, the inverse relationship between c-gene size and increased taxon sampling in a dataset (the 'recombination ratchet') is a fundamental property of c-genes. For taxonomic groups characterized by genes with long intron sequences, complete protein-coding sequences are likely not valid c-genes and are inappropriate units of analysis for summary coalescence methods unless they occur in recombination deserts that are devoid of incomplete lineage sorting (ILS). Finally, it has been argued that coalescence methods are robust when the no-recombination-within-loci assumption is violated, but recombination must matter at some scale because ILS, a by-product of recombination, is the raison d'être for coalescence methods. That is, extensive recombination is required to yield the large number of independently segregating c-genes used to infer a species tree.
If coalescent methods are powerful enough to infer the correct species tree for difficult phylogenetic problems in the anomaly zone, where concatenation is expected to fail because of ILS, then there should be a decreasing probability of inferring the correct species tree using longer loci with many intralocus recombination breakpoints (i.e., increased levels of concatenation).
Affiliation(s)
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA.
- John Gatesy
- Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA.
44
Yan DH, Gao Q, Sun X, Song X, Li H. ITS2 sequence-structure phylogeny reveals diverse endophytic Pseudocercospora fungi on poplars. Genetica 2018; 146:187-198. [PMID: 29397500 DOI: 10.1007/s10709-018-0011-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/15/2017] [Accepted: 01/31/2018] [Indexed: 01/13/2023]
Abstract
To match the new fungal nomenclature, which abolishes pleomorphic names for a fungus, the genus Pseudocercospora s. str. was suggested to host holomorphic Pseudocercospora fungi. However, Pseudocercospora fungi need additional phylogenetic loci to clarify the taxonomy and diversity of existing and yet-to-be-described species. Internal transcribed spacer 2 (ITS2) secondary structures have shown promise in characterizing species phylogeny in plants, animals and fungi. In the present study, a conserved model of ITS2 secondary structures was confirmed for fungi in the genus Pseudocercospora s. str. using the RNAshape program. The model has a typical eukaryotic four-helix ITS2 secondary structure. However, in Pseudocercospora fungi a single U base occurred in the conserved U-U mismatch motif in Helix 2, and a UG emerged in the UGGU motif in Helix 3. Phylogenetic analyses based on ITS2 sequence-secondary structures with compensatory base change characterizations are able to delimit more species in Pseudocercospora s. str. than phylogenetic inferences from traditional multi-locus alignments. The model was employed to explore the diversity of endophytic Pseudocercospora fungi in poplar trees. The results also showed that endophytic Pseudocercospora fungi were diverse in species and had evolved a specific lineage in poplar trees. This work suggests that ITS2 sequence-structures could serve as additional significant loci for species-level phylogenetic and taxonomic studies of Pseudocercospora fungi, and that Pseudocercospora endophytes could play important roles in the evolution and ecological function of Pseudocercospora fungi.
Affiliation(s)
- Dong-Hui Yan
- Research Institute of Forest Ecology, Environment and Protection, Chinese Academy of Forestry, Hai Dian District, Beijing, 100091, People's Republic of China.
- Qian Gao
- Research Institute of Forest Ecology, Environment and Protection, Chinese Academy of Forestry, Hai Dian District, Beijing, 100091, People's Republic of China
- Xiaoming Sun
- Research Institute of Forest Ecology, Environment and Protection, Chinese Academy of Forestry, Hai Dian District, Beijing, 100091, People's Republic of China
- Xiaoyu Song
- Research Institute of Forest Ecology, Environment and Protection, Chinese Academy of Forestry, Hai Dian District, Beijing, 100091, People's Republic of China
- Hongchang Li
- Research Institute of Forest Ecology, Environment and Protection, Chinese Academy of Forestry, Hai Dian District, Beijing, 100091, People's Republic of China
45
Streicher JW, Miller EC, Guerrero PC, Correa C, Ortiz JC, Crawford AJ, Pie MR, Wiens JJ. Evaluating methods for phylogenomic analyses, and a new phylogeny for a major frog clade (Hyloidea) based on 2214 loci. Mol Phylogenet Evol 2018; 119:128-143. [DOI: 10.1016/j.ympev.2017.10.013] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 05/29/2017] [Revised: 10/21/2017] [Accepted: 10/22/2017] [Indexed: 01/28/2023]
46
Folk RA, Mandel JR, Freudenstein JV. Ancestral Gene Flow and Parallel Organellar Genome Capture Result in Extreme Phylogenomic Discord in a Lineage of Angiosperms. Syst Biol 2018; 66:320-337. [PMID: 27637567 DOI: 10.1093/sysbio/syw083] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 08/27/2015] [Accepted: 09/04/2016] [Indexed: 11/12/2022]
Abstract
While hybridization has recently received a resurgence of attention from systematists and evolutionary biologists, there remains a dearth of case studies on ancient, diversified hybrid lineages (clades of organisms that originated through reticulation). Studies on these groups are valuable in that they would speak to the long-term phylogenetic success of lineages following gene flow between species. We present a phylogenomic view of Heuchera, long known for frequent hybridization, incorporating all three independent genomes: targeted nuclear (~400,000 bp), plastid (~160,000 bp), and mitochondrial (~470,000 bp) data. We analyze these data using multiple concatenation and coalescence strategies. The nuclear phylogeny is consistent with previous work and with morphology, confidently suggesting a monophyletic Heuchera. By contrast, analyses of both organellar genomes recover a grossly polyphyletic Heuchera, consisting of three primary clades with relationships extensively rearranged within these as well. A minority of nuclear loci also exhibit phylogenetic discord; yet these topologies remarkably never resemble the pattern of organellar loci and largely present low levels of discord inter alia. Two independent estimates of the coalescent branch length of the ancestor of Heuchera using nuclear data suggest rare or nonexistent incomplete lineage sorting with related clades, inconsistent with the observed gross polyphyly of organellar genomes (confirmed by simulation of gene trees under the coalescent). These observations, in combination with previous work, strongly suggest hybridization as the cause of this phylogenetic discord. [Ancient hybridization; chloroplast capture; incongruence; phylogenomics; reticulation.]
Affiliation(s)
- Ryan A Folk
- Herbarium, Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH 43212, USA
- Jennifer R Mandel
- Department of Biological Sciences, University of Memphis, Memphis, TN 38152, USA
- John V Freudenstein
- Herbarium, Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH 43212, USA
47
Zhang SQ, Che LH, Li Y, Liang D, Pang H, Ślipiński A, Zhang P. Evolutionary history of Coleoptera revealed by extensive sampling of genes and species. Nat Commun 2018; 9:205. [PMID: 29335414 PMCID: PMC5768713 DOI: 10.1038/s41467-017-02644-4] [Citation(s) in RCA: 182] [Impact Index Per Article: 30.3] [Received: 05/05/2017] [Accepted: 12/15/2017] [Indexed: 01/04/2023]
Abstract
Beetles (Coleoptera) are the most diverse and species-rich group of insects, and a robust, time-calibrated phylogeny is fundamental to understanding macroevolutionary processes that underlie their diversity. Here we infer the phylogeny and divergence times of all major lineages of Coleoptera by analyzing 95 protein-coding genes in 373 beetle species, including ~67% of the currently recognized families. The subordinal relationships are strongly supported as Polyphaga (Adephaga (Archostemata, Myxophaga)). The series and superfamilies of Polyphaga are mostly monophyletic. The species-poor Nosodendridae is robustly recovered in a novel position sister to Staphyliniformia, Bostrichiformia, and Cucujiformia. Our divergence time analyses suggest that the crown group of extant beetles occurred ~297 million years ago (Mya) and that ~64% of families originated in the Cretaceous. Most of the herbivorous families experienced a significant increase in diversification rate during the Cretaceous, thus suggesting that the rise of angiosperms in the Cretaceous may have been an 'evolutionary impetus' driving the hyperdiversity of herbivorous beetles.
Affiliation(s)
- Shao-Qian Zhang
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510006, China
- Li-Heng Che
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510006, China
- Yun Li
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510006, China
- Dan Liang
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510006, China
- Hong Pang
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510006, China
- Adam Ślipiński
- Australian National Insect Collection, CSIRO, GPO Box 1700, Canberra, ACT 2601, Australia
- Peng Zhang
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510006, China
48
Mallo D, Posada D. Multilocus inference of species trees and DNA barcoding. Philos Trans R Soc Lond B Biol Sci 2017; 371:rstb.2015.0335. [PMID: 27481787 PMCID: PMC4971187 DOI: 10.1098/rstb.2015.0335] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Accepted: 04/10/2016] [Indexed: 11/30/2022]
Abstract
The unprecedented amount of data resulting from next-generation sequencing has opened a new era in phylogenetic estimation. Although large datasets should, in theory, increase phylogenetic resolution, massive multilocus datasets have uncovered a great deal of phylogenetic incongruence among different genomic regions, due both to stochastic error and to the action of different evolutionary processes such as incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer. This incongruence violates one of the fundamental assumptions of the DNA barcoding approach, which assumes that gene history and species history are identical. In this review, we explain some of the most important challenges we will have to face to reconstruct the history of species, and the advantages and disadvantages of different strategies for the phylogenetic analysis of multilocus data. In particular, we describe the evolutionary events that can generate discordance between species trees and gene trees, compare the most popular methods for species tree reconstruction, highlight the challenges we need to face when using them and discuss their potential utility in barcoding. Current barcoding methods sacrifice a great amount of statistical power by only considering one locus, and a transition to multilocus barcodes would not only improve current barcoding methods, but also facilitate an eventual transition to species-tree-based barcoding strategies, which could better accommodate scenarios where the barcode gap is too small or nonexistent. This article is part of the themed issue ‘From DNA barcodes to biomes’.
Affiliation(s)
- Diego Mallo
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo 36310, Spain
- David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo 36310, Spain
49
Wen D, Nakhleh L. Coestimating Reticulate Phylogenies and Gene Trees from Multilocus Sequence Data. Syst Biol 2017; 67:439-457. [DOI: 10.1093/sysbio/syx085] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Received: 04/24/2017] [Accepted: 10/24/2017] [Indexed: 11/13/2022]
Affiliation(s)
- Luay Nakhleh
- Department of Computer Science
- Department of BioSciences, Rice University, 6100 Main Street, Houston, TX 77005, USA
50
Molloy EK, Warnow T. To Include or Not to Include: The Impact of Gene Filtering on Species Tree Estimation Methods. Syst Biol 2017; 67:285-303. [DOI: 10.1093/sysbio/syx077] [Citation(s) in RCA: 138] [Impact Index Per Article: 19.7] [Received: 12/08/2016] [Accepted: 09/13/2017] [Indexed: 01/27/2023]
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA