101
A Robust Phylogenomic Time Tree for Biotechnologically and Medically Important Fungi in the Genera Aspergillus and Penicillium. mBio 2019; 10:mBio.00925-19. [PMID: 31289177 PMCID: PMC6747717 DOI: 10.1128/mbio.00925-19] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Indexed: 02/05/2023]
Abstract
Understanding the evolution of traits across technologically and medically significant fungi requires a robust phylogeny. Even though species in the Aspergillus and Penicillium genera (family Aspergillaceae, class Eurotiomycetes) are some of the most technologically and medically relevant fungi, we still lack a genome-scale phylogeny of the lineage, as well as knowledge of the parts of the phylogeny that exhibit conflict among analyses. Here, we used a phylogenomic approach to infer evolutionary relationships among 81 genomes that span the diversity of Aspergillus and Penicillium species, to identify conflicts in the phylogeny, and to determine the likely factors underlying the observed conflicts. Using a data matrix composed of 1,668 genes, we found that while most branches of the phylogeny of the Aspergillaceae are robustly supported and recovered irrespective of the method of analysis, a few exhibit various degrees of conflict among our analyses. Further examination revealed that this conflict largely stems from incomplete lineage sorting and hybridization or introgression. Our analyses provide a robust and comprehensive evolutionary genomic roadmap for this important lineage, which will facilitate the examination of the diverse technologically and medically relevant traits of these fungi in an evolutionary context.
The filamentous fungal family Aspergillaceae contains >1,000 known species, mostly in the genera Aspergillus and Penicillium. Several species are used in the food, biotechnology, and drug industries (e.g., Aspergillus oryzae and Penicillium camemberti), while others are dangerous human and plant pathogens (e.g., Aspergillus fumigatus and Penicillium digitatum). To infer a robust phylogeny and pinpoint poorly resolved branches and their likely underlying contributors, we used 81 genomes spanning the diversity of Aspergillus and Penicillium to construct a 1,668-gene data matrix.
Phylogenies of the nucleotide and amino acid versions of this full data matrix, as well as of several additional data matrices, were generated using three different maximum likelihood schemes (i.e., gene-partitioned, unpartitioned, and coalescence) and both site-homogeneous and site-heterogeneous models (a total of 64 species-level phylogenies). Examination of the topological agreement among these phylogenies and of measures of internode certainty identified 11/78 (14.1%) bipartitions that were incongruent and pinpointed the likely contributing factors, which included incomplete lineage sorting, hidden paralogy, hybridization or introgression, and reconstruction artifacts associated with poor taxon sampling. Relaxed molecular clock analyses suggest that Aspergillaceae likely originated in the Lower Cretaceous and that the Aspergillus and Penicillium genera originated in the Upper Cretaceous. Our results shed light on the ongoing debate on Aspergillus systematics and taxonomy and provide a robust evolutionary and temporal framework for comparative genomic analyses in Aspergillaceae. More broadly, our approach provides a general template for the phylogenomic identification of resolved and contentious branches in densely genome-sequenced lineages across the tree of life.
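The bipartition count above is consistent with tree size: a fully resolved, unrooted tree on n taxa has n - 3 internal bipartitions, so 81 genomes give 78, and 11/78 is about 14.1%. The sketch below is a simplified illustration, not the authors' pipeline (which also relied on internode certainty measures): it flags bipartitions of a reference tree that are absent from at least one alternative analysis. Trees are assumed to be already parsed into sets of splits, and the function names and toy trees are hypothetical.

```python
from functools import partial


def canonical_split(side, all_taxa, ref):
    """Store a bipartition as the side that does NOT contain the reference
    taxon, so that AB|CDEF and CDEF|AB map to the same frozenset."""
    side = frozenset(side)
    return frozenset(all_taxa - side) if ref in side else side


def incongruent_splits(reference, alternatives):
    """Splits of the reference tree missing from at least one alternative
    tree (each tree is given as a set of canonical splits)."""
    return {s for s in reference if any(s not in alt for alt in alternatives)}


def n_internal_bipartitions(n_taxa):
    """Nontrivial bipartitions in a fully resolved, unrooted tree."""
    return n_taxa - 3


if __name__ == "__main__":
    taxa = frozenset("ABCDEF")
    canon = partial(canonical_split, all_taxa=taxa, ref="A")
    # Hypothetical trees ((A,B),(C,D),(E,F)) and ((A,B),(C,E),(D,F)).
    reference = {canon("AB"), canon("CD"), canon("EF")}
    alternative = {canon("AB"), canon("CE"), canon("DF")}
    conflicts = incongruent_splits(reference, [alternative])
    print(sorted("".join(sorted(s)) for s in conflicts))  # ['CD', 'EF']
    print(n_internal_bipartitions(81), round(100 * 11 / 78, 1))  # 78 14.1
```

With many analyses, the same comparison can be run against every alternative tree to rank branches by how often they are recovered.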
102
Vasilikopoulos A, Balke M, Beutel RG, Donath A, Podsiadlowski L, Pflug JM, Waterhouse RM, Meusemann K, Peters RS, Escalona HE, Mayer C, Liu S, Hendrich L, Alarie Y, Bilton DT, Jia F, Zhou X, Maddison DR, Niehuis O, Misof B. Phylogenomics of the superfamily Dytiscoidea (Coleoptera: Adephaga) with an evaluation of phylogenetic conflict and systematic error. Mol Phylogenet Evol 2019; 135:270-285. [DOI: 10.1016/j.ympev.2019.02.022] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 10/10/2018] [Revised: 02/22/2019] [Accepted: 02/25/2019] [Indexed: 02/07/2023]
103
Barrett K, Lange L. Peptide-based functional annotation of carbohydrate-active enzymes by conserved unique peptide patterns (CUPP). BIOTECHNOLOGY FOR BIOFUELS 2019; 12:102. [PMID: 31168320 PMCID: PMC6489277 DOI: 10.1186/s13068-019-1436-5] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 12/27/2018] [Accepted: 04/13/2019] [Indexed: 05/24/2023]
Abstract
BACKGROUND Insight into the function of carbohydrate-active enzymes is required to understand their biological role and industrial potential. Better use of the ample genomic data is needed to enable selection of the most interesting proteins for further studies. The basis for elaborating a new approach to sequence analysis is the hypothesis that, when conserved peptide patterns are used to determine the similarities between proteins, the exact spacing between conserved adjacent amino acids in the proteins plays a prominent functional role. Thus, the objective of developing the method of conserved unique peptide patterns (CUPP) is to construct a peptide-based grouping and to validate the method by providing evidence that CUPP captures function-related features of the individual carbohydrate-active enzymes (as defined by CAZy families). This approach facilitates grouping of enzymes at a level below protein families and/or subfamilies. A standardized, efficient, and robust approach to functional annotation of carbohydrate-active enzymes would support improved molecular insight into enzyme-substrate interaction. RESULTS A new nonalignment-based clustering and functional annotation tool was developed that uses conserved unique peptide patterns to perform automated clustering of proteins and formation of protein groups. A peptide-based model was constructed for each of these protein CUPP groups and used to automatically annotate the protein family, subfamily, and EC function of carbohydrate-active enzymes. CUPP prediction can annotate proteins (from any CAZy family) with high F-scores at the family (0.966), subfamily (0.961), and EC-function (0.843) levels. The speed of the CUPP program was demonstrated by predicting the 504,017 nonredundant proteins of CAZy in less than four CPU hours.
CONCLUSION It was possible to construct an automated system for clustering proteins within families and to use the resulting CUPP groups to directly build peptide-based models for genome annotation. The CUPP runtime, F-score, sensitivity, and precision of family and subfamily annotations match or improve on those of state-of-the-art tools. The speed of CUPP annotation is similar to that of the rapid DIAMOND annotation tool. CUPP facilitates automated annotation of full genome assemblies to any CAZy family.
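The CUPP abstract rests on the idea that conserved residues kept at an exact spacing carry functional signal. The sketch below illustrates only that idea: it extracts gapped peptide patterns through a fixed mask (1 keeps a residue, 0 becomes a wildcard, so spacing is preserved) and compares proteins by the Jaccard similarity of their pattern sets. The mask "11011", the function names, and the similarity measure are assumptions for illustration, not the CUPP algorithm or its parameters.

```python
def spaced_patterns(seq, mask="11011"):
    """Collect gapped peptide patterns from every window of len(mask).

    '1' keeps the residue, '0' is replaced by the wildcard 'x', so the exact
    spacing between the kept residues is part of the pattern.
    """
    k = len(mask)
    patterns = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        patterns.add("".join(c if m == "1" else "x" for c, m in zip(window, mask)))
    return patterns


def jaccard(a, b):
    """Shared patterns divided by all distinct patterns (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    p1 = spaced_patterns("MKTAYIAK")
    p2 = spaced_patterns("MKTGYIAK")  # hypothetical A4G variant
    print(sorted(p1 & p2))           # ['KTxYI']
    print(round(jaccard(p1, p2), 3))  # 0.143
```

Note that the pattern "KTxYI" survives the substitution because the changed residue falls on a wildcard position, which is the intuition behind tolerating variation between conserved, precisely spaced residues.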
Affiliation(s)
- Kristian Barrett
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Kgs. Lyngby, Denmark
- Lene Lange
- BioEconomy, Research & Advisory, Valby, Denmark
104
Laumer CE. Inferring Ancient Relationships with Genomic Data: A Commentary on Current Practices. Integr Comp Biol 2019; 58:623-639. [PMID: 29982611 DOI: 10.1093/icb/icy075] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/10/2023]
Abstract
Contemporary phylogeneticists enjoy an embarrassment of riches, not only in the volumes of data now available, but also in the diversity of bioinformatic tools for handling these data. Here, I discuss a subset of these tools I consider well-suited to the task of inferring ancient relationships with coding sequence data in particular, encompassing data generation, orthology assignment, alignment and gene tree inference, supermatrix construction, and analysis under the best-fitting models applicable to large-scale datasets. Throughout, I compare and critique methods, considering both their theoretical principles and the details of their implementation, and offering practical tips on usage where appropriate. I also entertain different motivations for analyzing what are almost always originally DNA sequence data as codons, amino acids, and higher-order recodings. Although presented in a linear order, I see value in using the diversity of tools available to us to assess the sensitivity of clades of biological interest to different gene and taxon sets and analytical modes, which can be an indication of the presence of systematic error, of which a few forms remain poorly controlled by even the best available inference methods.
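The commentary mentions analysing amino-acid data under higher-order recodings. One widely used example is Dayhoff 6-state recoding, which collapses the 20 amino acids into six groups of biochemically similar residues (AGPST, DENQ, HKR, ILMV, FWY, C). The sketch below applies it to aligned sequences; the digit labels 0 to 5 and the choice to pass gaps and ambiguity codes through unchanged are assumptions of this sketch, not recommendations from the commentary.

```python
# Dayhoff 6-state groups; the digit labels are arbitrary.
DAYHOFF6_GROUPS = ("AGPST", "DENQ", "HKR", "ILMV", "FWY", "C")
DAYHOFF6 = {aa: str(i) for i, group in enumerate(DAYHOFF6_GROUPS) for aa in group}


def recode_dayhoff6(seq):
    """Recode an amino-acid sequence into Dayhoff 6-state symbols.

    Characters outside the 20 standard residues (gaps, X, ?) are kept as-is.
    """
    return "".join(DAYHOFF6.get(c, c) for c in seq.upper())


if __name__ == "__main__":
    alignment = {"taxon1": "MKV-LW", "taxon2": "LRI-FY"}  # hypothetical rows
    for name, seq in alignment.items():
        print(name, recode_dayhoff6(seq))
    # taxon1 323-34
    # taxon2 323-44
```

Recoding trades resolution for robustness: substitutions within a group (e.g., I to L) become invisible, which can reduce the impact of compositional heterogeneity and saturation at the cost of phylogenetic signal.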
Affiliation(s)
- Christopher E Laumer
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, EMBL-EBI South Building, Hinxton CB10 1SD, UK
105
Kelnarova I, Jendek E, Grebennikov VV, Bocak L. First molecular phylogeny of Agrilus (Coleoptera: Buprestidae), the largest genus on Earth, with DNA barcode database for forestry pest diagnostics. BULLETIN OF ENTOMOLOGICAL RESEARCH 2019; 109:200-211. [PMID: 29784069 DOI: 10.1017/s0007485318000330] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 06/08/2023]
Abstract
All of the more than 3000 species of Agrilus beetles are phytophagous, and some cause economically significant damage to trees and shrubs. Facilitated by international trade, Agrilus species regularly invade new countries and continents. This necessitates rapid identification of Agrilus species as the first step for subsequent protective measures. This study provides the first DNA reference library for ~100 Agrilus species from the Northern Hemisphere based on three mitochondrial markers: cox1-5' (the DNA barcode fragment), cox1-3', and rrnL. All 329 Agrilus records available in the Barcode of Life Database format, including specimen images and geo data, are released through a public dataset 'Agrilus1 329' available at: dx.doi.org/10.5883/DS-AGRILUS1. All Agrilus species were identified using adult morphology and molecular phylogenetic trees, as well as distance- and tree-based algorithms. Most DNA-based species limits agree well with the morphology-based identification. Our results include cases of high intraspecific variability and multiple cases of species para- and polyphyly. DNA barcoding is a powerful species identification tool in Agrilus, although it frequently fails to recover morphologically delimited Agrilus species groups. Even though the current three-gene database covers only ~3% of the known Agrilus diversity, it contains representatives of all principal lineages from the Northern Hemisphere and represents the most extensive dataset built for DNA-delimited species identification within this genus so far. Molecular data analyses can rapidly and cost-effectively identify an unknown sample, including immature stages and/or non-native taxa, or species not yet formally named.
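Distance-based barcode identification, one of the approaches named above, can be illustrated with uncorrected p-distances and a nearest-neighbour rule. The reference sequences, the 2% threshold, the species labels, and the function names below are hypothetical; real studies often use longer cox1 fragments, model-corrected distances, and thresholds calibrated for the group in question.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    ignoring positions where either sequence has a gap or an N."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def identify(query, references, threshold=0.02):
    """Nearest-neighbour assignment: return (species, distance), with the
    species set to None when even the closest reference exceeds threshold."""
    best_name, best_d = min(
        ((name, p_distance(query, seq)) for name, seq in references.items()),
        key=lambda item: item[1],
    )
    return (best_name if best_d <= threshold else None), best_d


if __name__ == "__main__":
    # Hypothetical 100-bp reference fragments, not real Agrilus barcodes.
    references = {
        "Agrilus sp. A": "ACGTTGCAAC" * 10,
        "Agrilus sp. B": "ACGATGCTAC" * 10,
    }
    query = references["Agrilus sp. A"][:-1] + "G"  # one substitution
    print(identify(query, references))  # ('Agrilus sp. A', 0.01)
```

Returning None above the threshold, rather than always naming the closest species, mirrors the practical case of a sample from a taxon missing from the reference library.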
Affiliation(s)
- I Kelnarova
- Department of Zoology, Faculty of Science UP, Olomouc, Czech Republic
- E Jendek
- Department of Forest Protection and Entomology, Faculty of Forestry and Wood Sciences, Czech University of Life Sciences, Kamýcká 1176, CZ-165 21, Prague 6-Suchdol, Czech Republic
- V V Grebennikov
- Canadian Food Inspection Agency, 960 Carling Avenue, Ottawa, ON K1A 0Y9, Canada
- L Bocak
- Department of Zoology, Faculty of Science UP, Olomouc, Czech Republic
106
Six Impossible Things before Breakfast: Assumptions, Models, and Belief in Molecular Dating. Trends Ecol Evol 2019; 34:474-486. [PMID: 30904189 DOI: 10.1016/j.tree.2019.01.017] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 12/18/2018] [Revised: 01/29/2019] [Accepted: 01/31/2019] [Indexed: 01/16/2023]
Abstract
Confidence in molecular dating analyses has grown with the increasing sophistication of the methods. Some problematic cases where molecular dates disagreed with paleontological estimates appear to have been resolved with a growing agreement between molecules and fossils. But we cannot relax just yet. The growing analytical sophistication of many molecular dating methods relies on an increasingly large number of assumptions about evolutionary history and processes. Many of these assumptions are based on statistical tractability rather than being informed by improved understanding of molecular evolution, yet changing the assumptions can influence molecular dates. How can we tell if the answers we get are driven more by the assumptions we make than by the molecular data being analyzed?
107
Küppers GC, da Silva Paiva T, do Nascimento Borges B, Alfaro ER, Claps MC. A new oligotrich (Ciliophora, Oligotrichia) from Argentina, with redefinition of Novistrombidium Song and Bradbury. Eur J Protistol 2019; 69:20-36. [PMID: 30870724 DOI: 10.1016/j.ejop.2019.02.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/10/2018] [Revised: 02/05/2019] [Accepted: 02/11/2019] [Indexed: 11/27/2022]
Abstract
A new oligotrich similar to Novistrombidium was discovered in plankton samples from an artificial tributary of the Salado River, Buenos Aires province, Argentina, in summer 2010. Propecingulum fistoleramalliei sp. n. has an obovate, anteriorly truncated body with a conspicuous ventral furrow; it is flattened ventrally and has a prominent right apical protrusion. It temporarily attaches to the substratum by a posterior mucous thread. Rod-shaped extrusomes are arranged equidistantly and inserted directly above the girdle kinety. The macronucleus is globular to ellipsoidal. The contractile vacuole is located in the left anterior quarter of the cell, and the adoral zone is composed of 30-35 collar, 9-14 buccal, and two thigmotactic membranelles. The girdle kinety is dextrally spiraled and ventrally open; the ventral kinety lies posterior to the anterior end of the girdle kinety. The oral primordium develops posterior to the right thigmotactic membranelle and anterior to the stripe of extrusomes above the left lateral portion of the girdle kinety. The SSU rDNA phylogeny confirms once again that Novistrombidium is not monophyletic; consequently, we elevate the subgenus Propecingulum to genus rank and redefine the genus Novistrombidium.
Affiliation(s)
- Gabriela Cristina Küppers
- División Invertebrados, Museo Argentino de Ciencias Naturales "Bernardino Rivadavia", Buenos Aires, Argentina.
- Thiago da Silva Paiva
- Laboratório de Protistologia, Instituto de Biologia, Departamento de Zoologia, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- Bárbara do Nascimento Borges
- Laboratório de Biologia Molecular "Francisco Mauro Salzano", Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, PA, Brazil
- Elisa Raquel Alfaro
- División Invertebrados, Museo Argentino de Ciencias Naturales "Bernardino Rivadavia", Buenos Aires, Argentina
- María Cristina Claps
- Instituto de Limnología "Dr. R. A. Ringuelet", La Plata, Buenos Aires, Argentina
108
Chang JM, Floden EW, Herrero J, Gascuel O, Di Tommaso P, Notredame C. Incorporating alignment uncertainty into Felsenstein's phylogenetic bootstrap to improve its reliability. Bioinformatics 2019; 37:1506-1514. [PMID: 30726875 PMCID: PMC8275982 DOI: 10.1093/bioinformatics/btz082] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/10/2018] [Revised: 12/12/2018] [Accepted: 02/05/2019] [Indexed: 12/30/2022]
Abstract
Motivation Most evolutionary analyses are based on pre-estimated multiple sequence alignments. Wong et al. established the existence of an uncertainty induced by multiple sequence alignment when reconstructing phylogenies. They showed that in many cases different aligners produce different phylogenies, with no simple objective criterion sufficient to distinguish among these alternatives. Results We demonstrate that incorporating MSA-induced uncertainty into bootstrap sampling can significantly increase the correlation between clade correctness and the corresponding bootstrap value. Our procedure involves concatenating several alternative multiple sequence alignments of the same sequences, produced using different commonly used aligners. We then draw bootstrap replicates while favoring columns from the more unique aligners among those concatenated. We named this concatenation and bootstrapping method Weighted Partial Super Bootstrap (wpSBOOT). We show on three simulated datasets of 16, 32 and 64 tips that our method improves the predictive power of bootstrap values. We also used as a benchmark an empirical collection of 853 one-to-one orthologous genes from seven yeast species and found that wpSBOOT significantly improves discrimination between topologically correct and incorrect trees. Bootstrap values of wpSBOOT are comparable to similar readouts estimated using a single method. However, when trees are reduced using 50% and 95% bootstrap thresholds, wpSBOOT yields the lowest Type I error (fewest false positives). Availability The automated generation of replicates has been implemented in the T-Coffee package, which is available as open-source freeware from www.tcoffee.org. Supplementary information Supplementary data are available at Bioinformatics online.
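The mechanics described above (concatenate alternative alignments of the same sequences, then draw bootstrap replicates with column weights that depend on which aligner produced each column) can be sketched as follows. In this sketch the per-aligner weights are simply supplied by the caller, and the replicate length is taken to be that of one input alignment; how wpSBOOT actually derives its weights from aligner uniqueness is not reproduced, and all function names are hypothetical. The real implementation ships with T-Coffee.

```python
import random


def concatenate_with_origin(alignments):
    """Concatenate alignments given as {aligner: {taxon: aligned_seq}}.

    Returns the super-alignment and, for every column, the name of the
    aligner that produced it.
    """
    taxa = sorted(next(iter(alignments.values())))
    super_aln = {t: "" for t in taxa}
    origin = []
    for aligner, aln in alignments.items():
        if sorted(aln) != taxa:
            raise ValueError("all alignments must contain the same taxa")
        width = len(aln[taxa[0]])
        if any(len(seq) != width for seq in aln.values()):
            raise ValueError(f"{aligner}: rows differ in length")
        for t in taxa:
            super_aln[t] += aln[t]
        origin.extend([aligner] * width)
    return super_aln, origin


def weighted_bootstrap(super_aln, origin, aligner_weights, n_columns, rng):
    """One replicate: n_columns columns drawn with replacement, each with
    probability proportional to the weight of the aligner it came from."""
    weights = [aligner_weights[a] for a in origin]
    cols = rng.choices(range(len(origin)), weights=weights, k=n_columns)
    return {t: "".join(seq[c] for c in cols) for t, seq in super_aln.items()}


if __name__ == "__main__":
    alignments = {
        "alignerA": {"t1": "AC-GT", "t2": "ACTGT"},
        "alignerB": {"t1": "ACG-T", "t2": "ACTGT"},
    }
    sup, origin = concatenate_with_origin(alignments)
    # Hypothetical weights; replicate length = one input alignment (assumed).
    replicate = weighted_bootstrap(sup, origin, {"alignerA": 1.0, "alignerB": 2.0},
                                   n_columns=5, rng=random.Random(42))
    print(replicate)
```

Each replicate would then be passed to a tree-inference program, and support values computed over many replicates as in the classic bootstrap.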
Affiliation(s)
- Jia-Ming Chang
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Evan W Floden
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Javier Herrero
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Olivier Gascuel
- Unité Bioinformatique Evolutive, Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI)-USR 3756 CNRS and Institut Pasteur, Paris, France
- Paolo Di Tommaso
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Cedric Notredame
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
109
Di Franco A, Poujol R, Baurain D, Philippe H. Evaluating the usefulness of alignment filtering methods to reduce the impact of errors on evolutionary inferences. BMC Evol Biol 2019; 19:21. [PMID: 30634908 PMCID: PMC6330419 DOI: 10.1186/s12862-019-1350-2] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 08/10/2018] [Accepted: 01/02/2019] [Indexed: 11/10/2022]
Abstract
Background Multiple Sequence Alignments (MSAs) are the starting point of molecular evolutionary analyses. Errors in MSAs generate a non-historical signal that can lead to incorrect inferences. Therefore, numerous efforts have been made to reduce the impact of alignment errors, by improving alignment algorithms and by developing methods to filter out poorly aligned regions. However, MSAs contain not only alignment errors but also primary sequence errors. Such errors may originate from sequencing errors, from assembly errors, or from erroneous structural annotations (such as incorrect intron/exon boundaries). Even though their existence is acknowledged, the impact of primary sequence errors on evolutionary inference is poorly characterized. Results As a first step to fill this gap, we have developed a program called HmmCleaner, which detects and eliminates these errors from MSAs. It uses profile hidden Markov models (pHMMs) to identify sequence segments that poorly fit their MSA and selectively removes them. We assessed its performance using > 700 amino-acid MSAs from prokaryotes and eukaryotes, into which we introduced several types of simulated primary sequence errors. The sensitivity of HmmCleaner towards simulated primary sequence errors was > 95%. In a second step, we compared the impact of segment-filtering software (HmmCleaner and PREQUAL) with that of commonly used block-filtering software (BMGE and TrimAl) on evolutionary analyses. Using real data from vertebrates, we observed that segment-filtering methods improve the quality of evolutionary inference more than the currently used block-filtering methods. The former were especially effective at improving branch length inferences and at reducing the false positive rate during detection of positive selection. Conclusions Segment-filtering methods such as HmmCleaner accurately detect simulated primary sequence errors. Our results suggest that these errors are more detrimental than alignment errors.
However, they also show that stochastic (sampling) error is predominant in single-gene evolutionary inferences. Therefore, we argue that MSA filtering should focus on segment instead of block removal and that more studies are required to find the optimal balance between accuracy improvement and stochastic error increase brought by data removal. Electronic supplementary material The online version of this article (10.1186/s12862-019-1350-2) contains supplementary material, which is available to authorized users.
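The contrast between block filtering (dropping whole columns for every sequence) and segment filtering (masking a stretch of a single sequence) can be illustrated on a toy alignment. In the sketch below, block filtering removes columns whose gap fraction exceeds a threshold, while segment filtering masks residues that disagree with their column's majority residue only where such disagreements cluster. The majority-vote score is a deliberately crude, hypothetical stand-in for the profile HMM scoring used by HmmCleaner, and all thresholds and names are arbitrary assumptions.

```python
from collections import Counter


def block_filter(aln, max_gap_fraction=0.5):
    """Block filtering: drop whole columns whose gap fraction exceeds the threshold."""
    names = list(aln)
    n_cols = len(aln[names[0]])
    keep = [
        j for j in range(n_cols)
        if sum(aln[n][j] == "-" for n in names) / len(names) <= max_gap_fraction
    ]
    return {n: "".join(aln[n][j] for j in keep) for n in names}


def segment_filter(aln, window=3, min_mismatches=2):
    """Segment filtering (toy): in each sequence, mask with '?' residues that
    disagree with their column's majority residue, but only where such
    disagreements cluster (>= min_mismatches within some window). Isolated
    differences, which may be genuine substitutions, are kept."""
    names = list(aln)
    n_cols = len(aln[names[0]])
    majority = []
    for j in range(n_cols):
        residues = [aln[n][j] for n in names if aln[n][j] != "-"]
        majority.append(Counter(residues).most_common(1)[0][0] if residues else "-")
    filtered = {}
    for n in names:
        seq = aln[n]
        mismatch = [c != "-" and c != majority[j] for j, c in enumerate(seq)]
        flagged = [False] * n_cols
        for start in range(n_cols - window + 1):
            if sum(mismatch[start:start + window]) >= min_mismatches:
                for j in range(start, start + window):
                    flagged[j] = flagged[j] or mismatch[j]
        filtered[n] = "".join("?" if f else c for f, c in zip(flagged, seq))
    return filtered


if __name__ == "__main__":
    # Hypothetical alignment; s3 carries a clustered error, s5 a single substitution.
    aln = {"s1": "MKVLAT", "s2": "MKVLAT", "s3": "MKWQST", "s4": "MK-LAT", "s5": "MKVLST"}
    print(segment_filter(aln))
    print(block_filter({"a": "A-C", "b": "A-C", "c": "AGC"}))
```

In the toy data, segment filtering masks only the clustered errors in s3 ("MK???T") and leaves every other sequence intact, whereas block filtering discards the whole second column, including the valid residue G, which illustrates the data loss that the abstract links to increased stochastic error.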
Affiliation(s)
- Arnaud Di Franco
- Station d'Ecologie Théorique et Expérimentale de Moulis, CNRS, Moulis, France
- Raphaël Poujol
- Département de Biochimie, Centre Robert-Cedergren, Université de Montréal, Montréal, Québec, Canada
- Denis Baurain
- InBioS-PhytoSYSTEMS, Unité de Phylogénomique des Eucaryotes, Université de Liège, Liège, Belgium
- Hervé Philippe
- Station d'Ecologie Théorique et Expérimentale de Moulis, CNRS, Moulis, France; Département de Biochimie, Centre Robert-Cedergren, Université de Montréal, Montréal, Québec, Canada
110
Borowiec ML. Convergent Evolution of the Army Ant Syndrome and Congruence in Big-Data Phylogenetics. Syst Biol 2019; 68:642-656. [DOI: 10.1093/sysbio/syy088] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 05/24/2018] [Revised: 11/09/2018] [Accepted: 12/15/2018] [Indexed: 11/12/2022]
Affiliation(s)
- Marek L Borowiec
- Department of Entomology, Plant Pathology and Nematology, 875 Perimeter Drive, University of Idaho, Moscow, ID 83844, USA
- School of Life Sciences, Social Insect Research Group, Arizona State University, Tempe, AZ 85287, USA
- Department of Entomology and Nematology, One Shields Avenue, University of California at Davis, Davis, CA 95616, USA
111
Ashkenazy H, Sela I, Levy Karin E, Landan G, Pupko T. Multiple Sequence Alignment Averaging Improves Phylogeny Reconstruction. Syst Biol 2019; 68:117-130. [PMID: 29771363 PMCID: PMC6657586 DOI: 10.1093/sysbio/syy036] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 05/12/2017] [Revised: 05/07/2018] [Accepted: 05/09/2018] [Indexed: 01/11/2023]
Abstract
The classic methodology of inferring a phylogenetic tree from sequence data is composed of two steps. First, a multiple sequence alignment (MSA) is computed. Then, a tree is reconstructed assuming the MSA is correct. Yet, inferred MSAs have been shown to be inaccurate, and alignment errors reduce tree inference accuracy. It was previously proposed that filtering unreliable alignment regions can increase the accuracy of tree inference. However, it was also demonstrated that the benefit of this filtering is often obscured by the resulting loss of phylogenetic signal. In this work we explore an approach in which, instead of relying on a single MSA, we generate a large set of alternative MSAs and concatenate them into a single SuperMSA. By doing so, we account for phylogenetic signal contained in columns that are not present in the single MSA computed by alignment algorithms. Using simulations, we demonstrate that this approach results, on average, in more accurate trees than 1) using an unfiltered MSA or 2) using a single MSA with weights assigned to columns according to their reliability. Next, we explore in which regions of the MSA space our approach is expected to be beneficial. Finally, we provide a simple criterion for deciding whether the extra effort of computing a SuperMSA and inferring a tree from it is worthwhile. Based on these assessments, we expect our methodology to be useful in many cases in which diverged sequences are analyzed. The option to generate such a SuperMSA is available at http://guidance.tau.ac.il.
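Because a SuperMSA is built from alternative alignments of the same unaligned sequences, some of its columns encode homology statements that the single "best" MSA does not contain. The sketch below makes that concrete: it concatenates alternative alignments and reports the columns that are new relative to a base alignment, describing each column by the index of the residue it holds from each sequence. It is an illustrative toy under these assumptions, not the GUIDANCE implementation, and the function names are hypothetical.

```python
def column_signatures(aln):
    """Describe each column as a tuple giving, for every taxon (sorted by
    name), the index of its residue in the ungapped sequence, or None for a gap."""
    taxa = sorted(aln)
    next_index = {t: 0 for t in taxa}
    signatures = []
    for j in range(len(aln[taxa[0]])):
        column = []
        for t in taxa:
            if aln[t][j] == "-":
                column.append(None)
            else:
                column.append(next_index[t])
                next_index[t] += 1
        signatures.append(tuple(column))
    return signatures


def super_msa(alignments):
    """Concatenate alternative alignments of the same sequences, taxon by taxon."""
    first = alignments[0]
    taxa = sorted(first)
    for aln in alignments:
        if sorted(aln) != taxa or any(
            aln[t].replace("-", "") != first[t].replace("-", "") for t in taxa
        ):
            raise ValueError("alignments must align the same unaligned sequences")
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


def novel_columns(base, alternatives):
    """Homology statements (columns) present in alternatives but not in base."""
    seen = set(column_signatures(base))
    return {s for aln in alternatives for s in column_signatures(aln) if s not in seen}


if __name__ == "__main__":
    base = {"t1": "ACGT", "t2": "A-GT"}  # hypothetical alignments
    alt = {"t1": "ACGT", "t2": "AG-T"}
    print(super_msa([base, alt]))              # {'t1': 'ACGTACGT', 't2': 'A-GTAG-T'}
    print(len(novel_columns(base, [alt])))     # 2
```

Here the alternative alignment contributes two homology statements absent from the base alignment (C aligned with G, and G aligned with a gap), which is the kind of extra signal the abstract attributes to the SuperMSA.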
Affiliation(s)
- Haim Ashkenazy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel
- Itamar Sela
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Eli Levy Karin
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel
- Department of Molecular Biology & Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Giddy Landan
- Institute of Microbiology, Christian-Albrechts-University of Kiel, 24118 Kiel, Germany
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel
112
Rojas-Cruz A, Reyes-Bermúdez A. Phylogenetic analysis of Alphapapillomavirus based on L1, E6 and E7 regions suggests that carcinogenicity and tissue tropism have appeared multiple times during viral evolution. INFECTION GENETICS AND EVOLUTION 2018; 67:210-221. [PMID: 30458293 DOI: 10.1016/j.meegid.2018.11.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/22/2018] [Revised: 11/07/2018] [Accepted: 11/08/2018] [Indexed: 11/18/2022]
Abstract
Members of the Alphapapillomavirus genus are causative agents of cervical cancer and benign lesions in humans. These viruses are classified according to sequence similarities in their L1 region. Yet, viral carcinogenicity has been associated with variations in the proteins encoded by the E6 and E7 genes. To relate evolutionary history to the origin of carcinogenicity, we performed phylogenetic reconstructions using both nucleotide and predicted amino acid sequences of the L1, E6 and E7 genes. Whilst the phylogenetic analysis of L1 reconstructed the evolutionary history of the genus, phylogenies based on the E6 and E7 proteins support the idea that mutations at the amino acid motifs S/Tx[V/L] (E6) and LxCxE (E7) might be responsible for carcinogenic potential. These findings indicate that virulence within Alphapapillomavirus has appeared multiple times during evolution. Our results reveal that oncogenic potential is not a monophyletic, clade-specific adaptation but might be the result of positive selection on random mutations occurring in proteins involved in host infection during viral diversification.
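Motifs written as LxCxE and S/Tx[V/L] use x for any amino acid and a slash or brackets for alternative residues, so they translate directly into regular expressions. The sketch below scans a protein sequence for both motifs and reports non-overlapping matches with 1-based start positions; the toy sequence and the dictionary of motif names are hypothetical, and the scan ignores where in the protein a motif falls.

```python
import re

# 'x' (any residue) becomes '.', and 'S/T' or '[V/L]' becomes a character class.
MOTIFS = {
    "E7 LxCxE": re.compile(r"L.C.E"),
    "E6 S/TxV/L": re.compile(r"[ST].[VL]"),
}


def find_motifs(protein):
    """Return {motif_name: [(1-based start, matched text), ...]}."""
    seq = protein.upper()
    return {
        name: [(m.start() + 1, m.group()) for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    hits = find_motifs("MAELYCYEGSTVKETQL")  # hypothetical sequence
    print(hits)
    # {'E7 LxCxE': [(4, 'LYCYE')], 'E6 S/TxV/L': [(10, 'STV'), (15, 'TQL')]}
```

Comparing such hits across aligned homologues is one simple way to see whether motif presence tracks clades or is scattered across the tree.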
Affiliation(s)
- Alexis Rojas-Cruz
- Departamento de Biología, Facultad de Ciencias Básicas, Universidad de la Amazonia, Florencia 180002, Colombia
- Alejandro Reyes-Bermúdez
- Departamento de Biología, Facultad de Ciencias Básicas, Universidad de la Amazonia, Florencia 180002, Colombia.
113
Penzar D, Krivozubov M, Spirin S. PQ, a new program for phylogeny reconstruction. BMC Bioinformatics 2018; 19:374. [PMID: 30314446 PMCID: PMC6186109 DOI: 10.1186/s12859-018-2399-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2018] [Accepted: 09/25/2018] [Indexed: 12/04/2022]
Abstract
BACKGROUND Many algorithms and programs are available for phylogenetic reconstruction of protein families. Methods in wide use at present rely either on one of several distance-based principles or on the character-based principles of maximum parsimony or maximum likelihood. RESULTS We developed a novel program, named PQ, for reconstructing protein and nucleic acid phylogenies following a new character-based principle. When tested on natural sequences, PQ improves upon the results of maximum parsimony and maximum likelihood. With alignments of 10 and 15 sequences, it also outperforms the FastME program, which is based on one of the distance-based principles. Among all tested programs, PQ proved to be the least susceptible to long branch attraction. FastME outperforms PQ when processing alignments of 45 sequences, however. We confirm a recent result that on natural sequences FastME outperforms maximum parsimony and maximum likelihood. At the same time, both PQ and FastME are inferior to maximum parsimony and maximum likelihood on simulated sequences. PQ is open source and available to the public via an online interface. CONCLUSIONS The software we developed offers an open-source alternative for phylogenetic reconstruction of relatively small sets of proteins and nucleic acids, with up to a few tens of sequences.
Affiliation(s)
- Dmitry Penzar
- Faculty of Bioengineering and Bioinformatics, Moscow State University, 1 Leninskiye Gory, bld. 73, Moscow, 119991 Russia
- Mikhail Krivozubov
- Gamaleya Center of Epidemiology and Microbiology, 18 Gamaleya st., Moscow, 123098 Russia
- Sergey Spirin
- Faculty of Bioengineering and Bioinformatics, Moscow State University, 1 Leninskiye Gory, bld. 73, Moscow, 119991 Russia
- Belozersky Institute of Physico-Chemical Biology, Moscow State University, 1 Leninskiye Gory, bld. 40, Moscow, 119991 Russia
- Higher School of Economics, 20 Myasnitskaya st., Moscow, Russia
114
Masrati G, Dwivedi M, Rimon A, Gluck-Margolin Y, Kessel A, Ashkenazy H, Mayrose I, Padan E, Ben-Tal N. Broad phylogenetic analysis of cation/proton antiporters reveals transport determinants. Nat Commun 2018; 9:4205. [PMID: 30310075 PMCID: PMC6181914 DOI: 10.1038/s41467-018-06770-5] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 01/16/2018] [Accepted: 09/24/2018] [Indexed: 11/08/2022]
Abstract
Cation/proton antiporters (CPAs) play a major role in maintaining the homeostasis of living cells. CPAs are commonly divided into two main groups, CPA1 and CPA2, and are further characterized by two main phenotypes: ion selectivity and electrogenicity. However, tracing the evolutionary relationships of these transporters is challenging because of the high diversity within CPAs. Here, we conduct a comprehensive evolutionary analysis of 6537 representative CPAs, describing the full complexity of their phylogeny and revealing a sequence motif that appears to determine central phenotypic characteristics. In contrast to previous suggestions, we show that the CPA1/CPA2 division only partially correlates with electrogenicity. Our analysis further identifies two acidic residues in the binding site that carry the protons in electrogenic CPAs, and a polar residue in the unwound transmembrane helix 4 that determines ion selectivity. A rationally designed triple mutant successfully converted the electrogenic CPA EcNhaA into an electroneutral one.
Collapse
Affiliation(s)
- Gal Masrati
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Ramat-Aviv, 69978, Tel-Aviv, Israel
- Manish Dwivedi
- Department of Biological Chemistry, The Alexander Silberman Inst. of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Abraham Rimon
- Department of Biological Chemistry, The Alexander Silberman Inst. of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Yael Gluck-Margolin
- Department of Biological Chemistry, The Alexander Silberman Inst. of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Amit Kessel
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Ramat-Aviv, 69978, Tel-Aviv, Israel
- Haim Ashkenazy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Ramat-Aviv, 69978, Tel-Aviv, Israel
- Itay Mayrose
- Department of Molecular Biology and Ecology of Plant, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Ramat-Aviv, 69978, Tel-Aviv, Israel
- Etana Padan
- Department of Biological Chemistry, The Alexander Silberman Inst. of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Nir Ben-Tal
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Ramat-Aviv, 69978, Tel-Aviv, Israel
115
Villaverde T, Pokorny L, Olsson S, Rincón-Barrado M, Johnson MG, Gardner EM, Wickett NJ, Molero J, Riina R, Sanmartín I. Bridging the micro- and macroevolutionary levels in phylogenomics: Hyb-Seq solves relationships from populations to species and above. New Phytol 2018; 220:636-650. [PMID: 30016546 DOI: 10.1111/nph.15312] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 03/01/2018] [Accepted: 06/04/2018] [Indexed: 05/20/2023]
Abstract
Reconstructing phylogenetic relationships at the micro- and macroevolutionary levels within the same tree is problematic because of the need to use different data types and analytical frameworks. We test the power of target enrichment to provide phylogenetic resolution based on DNA sequences from above species to within populations, using a large herbarium sampling and Euphorbia balsamifera (Euphorbiaceae) as a case study. Target enrichment with custom probes was combined with genome skimming (Hyb-Seq) to sequence 431 low-copy nuclear genes and partial plastome DNA. We used supermatrix and multispecies-coalescent approaches, and Bayesian dating, to estimate phylogenetic relationships and divergence times. Euphorbia balsamifera, with a disjunct Rand Flora-type distribution on opposite sides of Africa, comprises three well-supported subspecies: western Sahelian sepium is sister to eastern African-southern Arabian adenensis and Macaronesian-southwest Moroccan balsamifera. Lineage divergence times support Late Miocene to Pleistocene diversification and climate-driven vicariance to explain the Rand Flora pattern. We show that probes designed using genomic resources from taxa not directly related to the focal group are effective in providing phylogenetic resolution at deep and shallow evolutionary levels. Low capture efficiency in herbarium samples increased the proportion of missing data but did not bias estimation of phylogenetic relationships or branch lengths.
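The supermatrix approach named above concatenates per-locus alignments into a single matrix; a taxon whose capture failed at a locus contributes only missing data for that locus, which is how low capture efficiency raises the proportion of missing data. A minimal Python sketch of that general technique follows. It assumes hypothetical in-memory alignments stored as `{taxon: aligned_sequence}` dicts and `?` as the missing-data symbol; it is an illustration, not the authors' Hyb-Seq pipeline.

```python
def concatenate(gene_alignments):
    """Concatenate per-gene alignments into a supermatrix.

    gene_alignments: list of dicts mapping taxon -> aligned sequence
    (a hypothetical input format). Taxa absent from a gene are padded
    with '?' (missing data) for that gene's full length.
    Returns (matrix, partitions); partitions are 1-based inclusive column ranges.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene alignment must have equal length")
        length = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "?" * length))
        partitions.append((start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return matrix, partitions


def missing_fraction(matrix):
    """Fraction of matrix cells coded as missing ('?') or gap ('-')."""
    cells = "".join(matrix.values())
    return sum(c in "?-" for c in cells) / len(cells)
```

For instance, two genes of 4 and 2 sites, where taxon B lacks the second gene and taxon C lacks the first, give a 3 × 6 matrix in which one third of the cells are missing. Recording the partition boundaries keeps per-gene substitution models possible downstream.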
Affiliation(s)
- Tamara Villaverde
- Real Jardín Botánico (RJB-CSIC), Plaza de Murillo 2, 28014, Madrid, Spain
- Lisa Pokorny
- Comparative Plant and Fungal Biology Department, Royal Botanic Gardens, Kew, Richmond, TW9 3DS, UK
- Sanna Olsson
- Department of Forest Ecology and Genetics, INIA Forest Research Centre (INIA-CIFOR), Ctra. de la Coruña km. 7.5, 28040, Madrid, Spain
- Matthew G Johnson
- Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX, 79409-43131, USA
- Department of Plant Science and Conservation, Chicago Botanic Garden, 1000 Lake Cook Road, Glencoe, IL, 60022, USA
- Norman J Wickett
- Department of Plant Science and Conservation, Chicago Botanic Garden, 1000 Lake Cook Road, Glencoe, IL, 60022, USA
- Program in Plant Biology and Conservation, Northwestern University, 2205 Tech Drive, Evanston, IL, 60208, USA
- Julià Molero
- Laboratori de Botànica, Departament de Biologia, Sanitat i Medi Ambient, Facultat de Farmàcia, Universitat de Barcelona, 08028, Barcelona, Spain
- Ricarda Riina
- Real Jardín Botánico (RJB-CSIC), Plaza de Murillo 2, 28014, Madrid, Spain
- Isabel Sanmartín
- Real Jardín Botánico (RJB-CSIC), Plaza de Murillo 2, 28014, Madrid, Spain
116
Kobayashi G, Goto R, Takano T, Kojima S. Molecular phylogeny of Maldanidae (Annelida): Multiple losses of tube-capping plates and evolutionary shifts in habitat depth. Mol Phylogenet Evol 2018; 127:332-344. [DOI: 10.1016/j.ympev.2018.04.036] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 11/24/2017] [Revised: 04/16/2018] [Accepted: 04/23/2018] [Indexed: 11/27/2022]
117
Krah FS, Bässler C, Heibl C, Soghigian J, Schaefer H, Hibbett DS. Evolutionary dynamics of host specialization in wood-decay fungi. BMC Evol Biol 2018; 18:119. [PMID: 30075699 PMCID: PMC6091043 DOI: 10.1186/s12862-018-1229-7] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 11/17/2017] [Accepted: 07/03/2018] [Indexed: 11/23/2022]
Abstract
Background The majority of wood decomposing fungi are mushroom-forming Agaricomycetes, which exhibit two main modes of plant cell wall decomposition: white rot, in which all plant cell wall components are degraded, including lignin, and brown rot, in which lignin is modified but not appreciably removed. Previous studies suggested that brown rot fungi tend to be specialists of gymnosperm hosts and that brown rot promotes gymnosperm specialization. However, these hypotheses were based on analyses of limited datasets of Agaricomycetes. Overcoming this limitation, we used a phylogeny of 1157 species integrating available sequences, assembled decay mode characters from the literature, and coded host specialization using the newly developed R package, rusda. Results We found that most brown rot fungi are generalists or gymnosperm specialists, whereas most white rot fungi are angiosperm specialists. A six-state model of the evolution of host specialization revealed high transition rates between generalism and specialization in both decay modes. However, while white rot lineages switched most frequently to angiosperm specialists, brown rot lineages switched most frequently to generalism. A time-calibrated phylogeny revealed that Agaricomycetes is older than the flowering plants but many of the large clades originated after the diversification of the angiosperms in the Cretaceous. Conclusions Our results challenge the current view that brown rot fungi are primarily gymnosperm specialists and reveal intensive white rot specialization to angiosperm hosts. We thus suggest that the brown rot-associated convergent loss of lignocellulose-degrading enzymes was correlated with host generalism, rather than gymnosperm specialism. A likelihood model of host specialization evolution together with a time-calibrated phylogeny further suggests that the rise of the angiosperms opened a new mega-niche for wood-decay fungi, which was exploited particularly well by white rot lineages.
Electronic supplementary material The online version of this article (10.1186/s12862-018-1229-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Franz-Sebastian Krah
- Plant Biodiversity Research Group, Center for Food and Life Sciences Weihenstephan, Technische Universität München, Freising, Germany
- Bavarian Forest National Park, Grafenau, Germany
- John Soghigian
- Department of Environmental Science, The Connecticut Agricultural Experiment Station, New Haven, CT, 06511, USA
- Hanno Schaefer
- Plant Biodiversity Research Group, Center for Food and Life Sciences Weihenstephan, Technische Universität München, Freising, Germany
- David S Hibbett
- Biology Department, Clark University, Worcester, MA, 01610, USA
118
Griesmann M, Chang Y, Liu X, Song Y, Haberer G, Crook MB, Billault-Penneteau B, Lauressergues D, Keller J, Imanishi L, Roswanjaya YP, Kohlen W, Pujic P, Battenberg K, Alloisio N, Liang Y, Hilhorst H, Salgado MG, Hocher V, Gherbi H, Svistoonoff S, Doyle JJ, He S, Xu Y, Xu S, Qu J, Gao Q, Fang X, Fu Y, Normand P, Berry AM, Wall LG, Ané JM, Pawlowski K, Xu X, Yang H, Spannagl M, Mayer KFX, Wong GKS, Parniske M, Delaux PM, Cheng S. Phylogenomics reveals multiple losses of nitrogen-fixing root nodule symbiosis. Science 2018; 361:science.aat1743. [DOI: 10.1126/science.aat1743] [Citation(s) in RCA: 198] [Impact Index Per Article: 33.0] [Received: 02/12/2018] [Accepted: 05/16/2018] [Indexed: 12/20/2022]
119
Abstract
BACKGROUND Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically, errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. RESULTS We propose an automatic method to detect such errors. We build a phylogeny including all the data and then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks the k leaves whose removal maximally reduces the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and, once the amount of filtering is controlled, often reduces gene tree discordance more than rogue taxon removal does. CONCLUSIONS TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink .
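The k-shrink objective described above can be made concrete with a small sketch. The Python code below is a minimal illustration assuming an unrooted tree given as a hypothetical list of `(node, node, branch_length)` edges. It defines the diameter as the longest leaf-to-leaf path and finds the k leaves whose removal minimizes it by exhaustive search. This brute force is exponential in k and is not TreeShrink's polynomial-time algorithm, nor does it include TreeShrink's statistical outlier tests; it only shows what the optimization problem asks for.

```python
from itertools import combinations


def leaf_distances(edges):
    """Return (leaves, dist) where dist[(a, b)] is the path length between leaves a and b.

    edges: list of (u, v, branch_length) tuples describing an unrooted tree
    (a hypothetical input format chosen for this sketch).
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    leaves = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
    leaf_set = set(leaves)
    dist = {}
    for src in leaves:
        # Iterative depth-first traversal accumulating branch length from src.
        stack = [(src, None, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            if node in leaf_set and node != src:
                dist[(src, node)] = d
            for nbr, w in adj[node]:
                if nbr != parent:
                    stack.append((nbr, node, d + w))
    return leaves, dist


def diameter(leaves, dist):
    """Largest leaf-to-leaf path length among the given leaves."""
    return max((dist[(a, b)] for a, b in combinations(leaves, 2)), default=0.0)


def k_shrink_bruteforce(edges, k):
    """Return (removed_leaves, new_diameter) for the best k-subset of leaves.

    Exhaustive search over all k-subsets: exponential in k, illustration only.
    """
    leaves, dist = leaf_distances(edges)
    best = None
    for removed in combinations(leaves, k):
        kept = [x for x in leaves if x not in removed]
        d = diameter(kept, dist)
        if best is None or d < best[1]:
            best = (removed, d)
    return best
```

Removing a leaf does not change the path lengths between the remaining leaves, so the diameter after removal is simply the largest precomputed distance among the kept leaves. On a four-leaf tree where leaf D hangs on a branch of length 10 and every other branch has length 1, the diameter is 12, and removing D (k = 1) shrinks it to 3.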
Affiliation(s)
- Uyen Mai
- Computer Science and Engineering, University of California at San Diego, San Diego, 92093 CA USA
- Siavash Mirarab
- Electrical and Computer Engineering, University of California at San Diego, San Diego, 92093 CA USA
120
Horizontal gene transfer constrains the timing of methanogen evolution. Nat Ecol Evol 2018; 2:897-903. [DOI: 10.1038/s41559-018-0513-7] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 08/10/2017] [Accepted: 02/20/2018] [Indexed: 11/08/2022]
121
Kim S, de Medeiros BAS, Byun BK, Lee S, Kang JH, Lee B, Farrell BD. West meets East: How do rainforest beetles become circum-Pacific? Evolutionary origin of Callipogon relictus and allied species (Cerambycidae: Prioninae) in the New and Old Worlds. Mol Phylogenet Evol 2018. [PMID: 29524651 DOI: 10.1016/j.ympev.2018.02.019] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 01/28/2023]
Abstract
The longhorn beetle genus Callipogon Audinet-Serville represents a small group of large wood-boring beetles whose distribution pattern exhibits a unique trans-Pacific disjunction between the East Asian temperate rainforest and the tropical rainforest of the Neotropics. To understand the biogeographic history underlying this circum-Pacific disjunct distribution, we reconstructed a molecular phylogeny of the subfamily Prioninae with extensive sampling of Callipogon using multilocus sequence data of 99 prionine and four parandrine samples (ingroups), together with two distant outgroup species. Our sampling of Callipogon includes 18 of the 24 currently accepted species, with complete representation of all species in our focal subgenera. Our phylogenetic analyses confirmed the purported affinity between the Palearctic Callipogon relictus and its Neotropical congeners. Furthermore, based on molecular dating under the fossilized birth-death (FBD) model with comprehensive fossil records and probabilistic ancestral range reconstructions, we estimated that crown group Callipogon originated in the Paleocene, circa 60 million years ago (Ma), across the Neotropics and the Eastern Palearctic. The divergence between the Palearctic C. relictus and its Neotropical congeners is explained as the result of a vicariance event following the demise of boreotropical forest across Beringia at the Eocene-Oligocene boundary. As C. relictus is the only relict species that attests to the lineage's formerly extensive distribution, we evaluated its conservation importance through species distribution modelling. Although we estimated a range expansion for C. relictus by 2050, we emphasize careful implementation of conservation programs to protect primary forest across its current habitats, as the species remains highly vulnerable to habitat disturbance.
Affiliation(s)
- Sangil Kim
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, USA
- Bruno A S de Medeiros
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, USA
- Bong-Kyu Byun
- Department of Biological Science and Biotechnology, Hannam University, Daejeon, Republic of Korea
- Seunghwan Lee
- Laboratory of Insect Biosystematics, Department of Agricultural Biotechnology, Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- Jung-Hoon Kang
- National Research Institute of Cultural Heritage, Cultural Heritage Administration, Daejeon, Republic of Korea
- Bongwoo Lee
- Division of Forest Biodiversity, Korea National Arboretum, Pocheon, Gyeonggi-do, Republic of Korea
- Brian D Farrell
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, USA
122
Vieira WAS, Lima WG, Nascimento ES, Michereff SJ, Câmara MPS, Doyle VP. The impact of phenotypic and molecular data on the inference of Colletotrichum diversity associated with Musa. Mycologia 2018; 109:912-934. [PMID: 29494311 DOI: 10.1080/00275514.2017.1418577] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 10/18/2022]
Abstract
Developing a comprehensive and reliable taxonomy for the Colletotrichum gloeosporioides species complex will require adopting data standards on the basis of an understanding of how methodological choices impact morphological evaluations and phylogenetic inference. We explored the impact of methodological choices in a morphological and molecular evaluation of Colletotrichum species associated with banana in Brazil. The choice of alignment filtering algorithm has a significant impact on topological inference and the retention of phylogenetically informative sites. Similarly, the choice of phylogenetic marker affects the delimitation of species boundaries, particularly if low phylogenetic signal is confounded with strong discordance, and inference of the species tree from multiple gene trees. According to both phylogenetic informativeness profiling and Bayesian concordance analyses, the most informative loci are DNA lyase (APN2), the intergenic spacer (IGS) between DNA lyase and the mating-type locus MAT1-2-1 (APN2/MAT-IGS), calmodulin (CAL), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), glutamine synthetase (GS), β-tubulin (TUB2), and a new marker, the intergenic spacer between GAPDH and a hypothetical protein (GAP2-IGS). Cornmeal agar minimizes the variance in conidial dimensions compared with potato dextrose agar and synthetic nutrient-poor agar, such that species are more readily distinguishable based on phenotypic differences. We apply these insights to investigate the diversity of Colletotrichum species associated with banana anthracnose in Brazil and report C. musae, C. tropicale, C. theobromicola, and C. siamense in association with banana anthracnose. One lineage did not cluster with any previously described species and is described here as C. chrysophilum.
Affiliation(s)
- Willie A S Vieira
- Departamento de Agronomia, Universidade Federal Rural de Pernambuco, Recife, Pernambuco, Brazil
- Waléria G Lima
- Departamento de Agronomia, Universidade Federal Rural de Pernambuco, Recife, Pernambuco, Brazil
- Eduardo S Nascimento
- Departamento de Agronomia, Universidade Federal Rural de Pernambuco, Recife, Pernambuco, Brazil
- Sami J Michereff
- Departamento de Agronomia, Universidade Federal Rural de Pernambuco, Recife, Pernambuco, Brazil
- Marcos P S Câmara
- Departamento de Agronomia, Universidade Federal Rural de Pernambuco, Recife, Pernambuco, Brazil
- Vinson P Doyle
- Department of Plant Pathology and Crop Physiology, Louisiana State University AgCenter, Louisiana State University, Baton Rouge, Louisiana 70803
123
Bogusz M, Whelan S. Phylogenetic Tree Estimation With and Without Alignment: New Distance Methods and Benchmarking. Syst Biol 2018; 66:218-231. [PMID: 27633353 DOI: 10.1093/sysbio/syw074] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 02/03/2016] [Accepted: 08/23/2016] [Indexed: 12/20/2022]
Abstract
Phylogenetic tree inference is a critical component of many systematic and evolutionary studies. The majority of these studies are based on the two-step process of multiple sequence alignment followed by tree inference, despite persistent evidence that the alignment step can lead to biased results. Here we present a two-part study. First, we present PaHMM-Tree, a novel neighbor joining-based method that estimates pairwise distances without assuming a single alignment. We then use simulations to benchmark its performance against a wide range of other phylogenetic tree inference methods, including the first comparison of alignment-free distance-based methods against more conventional tree estimation methods. Our new method for calculating pairwise distances based on statistical alignment provides distance estimates that are as accurate as those obtained using standard methods based on the true alignment. Pairwise distance estimates based on the two-step process tend to be substantially less accurate. This improved performance carries through to tree inference, where PaHMM-Tree provides more accurate tree estimates than all of the pairwise distance methods assessed. For close to moderately divergent sequence data, we find that the two-step methods using statistical inference, where information from all sequences is included in the estimation procedure, tend to perform better than PaHMM-Tree, particularly full statistical alignment, which simultaneously estimates both the tree and the alignment. For deep divergences, we find that the alignment step becomes so prone to error that our distance-based PaHMM-Tree outperforms all other methods of tree inference.
Finally, we find that the accuracy of alignment-free methods tends to decline faster than standard two-step methods in the presence of alignment uncertainty, and identify no conditions where alignment-free methods are equal to or more accurate than standard phylogenetic methods even in the presence of substantial alignment error. [Alignment-free; distance-based phylogenetics; pair Hidden Markov Models; phylogenetic inference; statistical alignment.].
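The standard route that PaHMM-Tree is benchmarked against computes pairwise distances from a fixed alignment. As one common example of such a distance (chosen here for illustration; the abstract does not say which substitution models were used), the Jukes-Cantor (JC69) correction d = -(3/4) ln(1 - (4/3) p) turns the observed proportion p of differing sites into an estimated number of substitutions per site. The sketch below assumes two already-aligned DNA strings and skips any column with a gap or ambiguous base in either sequence; it is not the pair-HMM statistical-alignment distance that PaHMM-Tree uses.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor (JC69) distance between two aligned DNA sequences.

    Columns containing a gap or a non-ACGT symbol in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    valid = set("ACGT")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    p = mismatches / compared  # observed p-distance
    if p >= 0.75:
        return math.inf  # saturated: the JC69 correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)
```

For two 10-site sequences that differ at one site, p = 0.1 and the corrected distance is about 0.1073. Because such a distance depends on a single alignment, any alignment error propagates directly into it, which is the weakness the abstract attributes to the two-step process at deep divergences.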
Affiliation(s)
- Marcin Bogusz
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
- Simon Whelan
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
124
Abstract
BACKGROUND Building evolutionary trees for massive sets of unaligned DNA sequences is both crucial and challenging. Reconstructing evolutionary trees for ultra-large sequence sets is hard, and massive multiple sequence alignment is itself time- and memory-consuming. The recently developed Hadoop and Spark platforms offer new opportunities for such classical computational biology problems. In this paper, we tried to solve multiple sequence alignment and evolutionary tree reconstruction in parallel. RESULTS HPTree, which is developed in this paper, can process large DNA sequence files quickly. It works well on files larger than 1 GB and performs better than other evolutionary reconstruction tools. Users can run HPTree to reconstruct evolutionary trees on computer clusters or cloud platforms (e.g., Amazon Cloud). HPTree could help with population evolution research and metagenomics analysis. CONCLUSIONS In this paper, we employ the Hadoop and Spark platforms to design an evolutionary tree reconstruction software tool for massive unaligned DNA sequences. Clustering and multiple sequence alignment are done in parallel, and the neighbour-joining method is used to build the tree. The software and its source code are available at http://lab.malab.cn/soft/HPtree/ .
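HPTree builds its tree with neighbour joining. The serial, textbook form of that method (Saitou and Nei) is sketched below in Python as an illustration only; it does not reproduce HPTree's Hadoop/Spark parallelization or its clustering and alignment steps. It assumes a small symmetric distance matrix given as a list of lists with hypothetical taxon labels, breaks ties in the Q-criterion by lowest index, and writes branch lengths with Python's `:g` format (six significant digits).

```python
def neighbor_joining(names, dist):
    """Textbook neighbour joining on a symmetric distance matrix.

    names: taxon labels; dist: list of lists with a zero diagonal.
    Returns the unrooted tree as a Newick string.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [[float(x) for x in row] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Q-criterion: join the pair minimizing (n - 2) * d_ij - r_i - r_j.
        _, i, j = min(
            ((n - 2) * d[a][b] - r[a] - r[b], a, b)
            for a in range(n)
            for b in range(a + 1, n)
        )
        # Branch lengths from the new internal node to nodes i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        merged = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining node.
        to_new = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [to_new[x]] for x, a in enumerate(keep)]
        d.append(to_new + [0.0])
        nodes = [nodes[k] for k in keep] + [merged]
    # Three nodes remain: attach them to a single central node.
    x, y, z = nodes
    lx = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    ly = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lz = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"
```

For example, the five-taxon matrix with rows a: (0, 5, 9, 9, 8), b: (5, 0, 10, 10, 9), c: (9, 10, 0, 8, 7), d: (9, 10, 8, 0, 3), e: (8, 9, 7, 3, 0) yields `(d:2,e:1,(c:4,(a:2,b:3):3):2);`. Each iteration scans all pairs, so this serial version costs O(n³) overall, which is one reason large inputs motivate distributed implementations.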
Affiliation(s)
- Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin, People's Republic of China
- Guangdong Province Key Laboratory of Popular High Performance Computers, Shenzhen University, Shenzhen, China
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Shixiang Wan
- School of Computer Science and Technology, Tianjin University, Tianjin, People's Republic of China
- Xiangxiang Zeng
- Department of Computer Science, Xiamen University, Xiamen, China
- Zhanshan Sam Ma
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
125
Quattrini AM, Faircloth BC, Dueñas LF, Bridge TCL, Brugler MR, Calixto‐Botía IF, DeLeo DM, Forêt S, Herrera S, Lee SMY, Miller DJ, Prada C, Rádis‐Baptista G, Ramírez‐Portilla C, Sánchez JA, Rodríguez E, McFadden CS. Universal target‐enrichment baits for anthozoan (Cnidaria) phylogenomics: New approaches to long‐standing problems. Mol Ecol Resour 2017; 18:281-295. [DOI: 10.1111/1755-0998.12736] [Citation(s) in RCA: 85] [Impact Index Per Article: 12.1] [Received: 07/29/2017] [Revised: 10/28/2017] [Accepted: 11/06/2017] [Indexed: 12/31/2022]
Affiliation(s)
- Brant C. Faircloth
- Department of Biological Sciences and Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
- Luisa F. Dueñas
- Departamento de Ciencias Biológicas‐Facultad de Ciencias, Laboratorio de Biología Molecular Marina (BIOMMAR), Universidad de los Andes, Bogotá, Colombia
- Tom C. L. Bridge
- Queensland Museum Network, Townsville, QLD, Australia
- Australian Research Council Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD, Australia
- Mercer R. Brugler
- Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
- Biological Sciences Department, NYC College of Technology, City University of New York, Brooklyn, NY, USA
- Iván F. Calixto‐Botía
- Departamento de Ciencias Biológicas‐Facultad de Ciencias, Laboratorio de Biología Molecular Marina (BIOMMAR), Universidad de los Andes, Bogotá, Colombia
- Department of Animal Ecology and Systematics, Justus Liebig Universität Giessen, Germany
- Danielle M. DeLeo
- Department of Biological Sciences, Florida International University, North Miami, FL, USA
- Biology Department, Temple University, Philadelphia, PA, USA
- Sylvain Forêt
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Santiago Herrera
- Department of Biological Sciences, Lehigh University, Bethlehem, PA, USA
- Simon M. Y. Lee
- State Key Laboratory of Quality Research in Chinese Medicine and Institute of Chinese Medical Sciences, University of Macau, Macao, China
- David J. Miller
- Australian Research Council Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD, Australia
- Carlos Prada
- Department of Biological Sciences, University of Rhode Island, Kingston, RI, USA
- Catalina Ramírez‐Portilla
- Departamento de Ciencias Biológicas‐Facultad de Ciencias, Laboratorio de Biología Molecular Marina (BIOMMAR), Universidad de los Andes, Bogotá, Colombia
- Department of Animal Ecology and Systematics, Justus Liebig Universität Giessen, Germany
- Juan A. Sánchez
- Departamento de Ciencias Biológicas‐Facultad de Ciencias, Laboratorio de Biología Molecular Marina (BIOMMAR), Universidad de los Andes, Bogotá, Colombia
- Estefanía Rodríguez
- Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
126
Revision of Podocotyloides Yamaguti, 1934 (Digenea: Opecoelidae), resurrection of Pedunculacetabulum Yamaguti, 1934 and the naming of a cryptic opecoelid species. Syst Parasitol 2017; 95:1-31. [DOI: 10.1007/s11230-017-9761-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 07/13/2017] [Accepted: 10/29/2017] [Indexed: 10/18/2022]
127
Mishra B, Choi YJ, Thines M. Phylogenomics of Bartheletia paradoxa reveals its basal position in Agaricomycotina and that the early evolutionary history of basidiomycetes was rapid and probably not strictly bifurcating. Mycol Prog 2017. [DOI: 10.1007/s11557-017-1349-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/28/2023]
128
Ratmann O, Wymant C, Colijn C, Danaviah S, Essex M, Frost S, Gall A, Gaseitsiwe S, Grabowski MK, Gray R, Guindon S, von Haeseler A, Kaleebu P, Kendall M, Kozlov A, Manasa J, Minh BQ, Moyo S, Novitsky V, Nsubuga R, Pillay S, Quinn TC, Serwadda D, Ssemwanga D, Stamatakis A, Trifinopoulos J, Wawer M, Brown AL, de Oliveira T, Kellam P, Pillay D, Fraser C, on behalf of the PANGEA-HIV Consort. HIV-1 full-genome phylogenetics of generalized epidemics in sub-Saharan Africa: impact of missing nucleotide characters in next-generation sequences. AIDS Res Hum Retroviruses 2017; 33:1083-1098. [PMID: 28540766 PMCID: PMC5597042 DOI: 10.1089/aid.2017.0061] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/16/2022]
Abstract
To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the “Phylogenetics and Networks for Generalised HIV Epidemics in Africa” consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3′ end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences (NGS) has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.
Affiliation(s)
- Oliver Ratmann
- MRC Centre for Outbreak Analyses and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- Chris Wymant
- Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Caroline Colijn
- Department of Mathematics, Imperial College London, London, United Kingdom
- Siva Danaviah
- Africa Health Research Institute, KwaZulu-Natal, South Africa
- Max Essex
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Botswana Harvard AIDS Institute Partnership, Gaborone, Botswana
- Simon Frost
- Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom
- Astrid Gall
- Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom
- Mary K. Grabowski
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Rakai Health Sciences Program, Entebbe, Uganda
- Ronald Gray
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Rakai Health Sciences Program, Entebbe, Uganda
- Stephane Guindon
- Department of Statistics, University of Auckland, Auckland, New Zealand
- Laboratoire d'Informatique, de Robotique et de Microelectronique de Montpellier–UMR 5506, CNRS & UM, Montpellier, France
- Arndt von Haeseler
- Centre for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, Vienna, Austria
- Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria
- Michelle Kendall
- Department of Mathematics, Imperial College London, London, United Kingdom
| | - Alexey Kozlov
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
| | - Justen Manasa
- Africa Health Research Institute, KwaZulu-Natal, South Africa
| | - Bui Quang Minh
- Centre for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, Vienna, Austria
| | - Sikhulile Moyo
- Botswana Harvard AIDS Institute Partnership, Gaborone, Botswana
| | - Vlad Novitsky
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Botswana Harvard AIDS Institute Partnership, Gaborone, Botswana
| | | | | | - Thomas C. Quinn
- Rakai Health Sciences Program, Entebbe, Uganda
- Division of Intramural Research, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, Maryland
- Department of Medicine Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
| | - David Serwadda
- Rakai Health Sciences Program, Entebbe, Uganda
- Makerere University School of Public Health, Makerere University College of Health Sciences, Kampala, Uganda
| | | | - Alexandros Stamatakis
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Jana Trifinopoulos
- Centre for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, Vienna, Austria
| | - Maria Wawer
- Department of Epidemiology Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Rakai Health Sciences Program, Entebbe, Uganda
| | - Andy Leigh Brown
- School of Biological Sciences, Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
| | - Tulio de Oliveira
- Nelson R. Mandela School of Medicine, School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
| | - Paul Kellam
- Department of Infectious Diseases and Immunity, Imperial College London, United Kingdom
| | - Deenan Pillay
- Africa Health Research Institute, KwaZulu-Natal, South Africa
- Division of Infection & Immunity, Faculty of Medical Sciences, University College London, London, United Kingdom
| | - Christophe Fraser
- Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
129
Edwards SV, Cloutier A, Baker AJ. Conserved Nonexonic Elements: A Novel Class of Marker for Phylogenomics. Syst Biol 2017; 66:1028-1044. [PMID: 28637293 PMCID: PMC5790140 DOI: 10.1093/sysbio/syx058] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 12/02/2016] [Revised: 06/03/2017] [Accepted: 06/06/2017] [Indexed: 01/12/2023]
Abstract
Noncoding markers have a particular appeal as tools for phylogenomic analysis because, at least in vertebrates, they appear less subject to strong variation in GC content among lineages. Thus far, ultraconserved elements (UCEs) and introns have been the most widely used noncoding markers. Here we analyze the evolutionary properties of a new type of noncoding marker, conserved nonexonic elements (CNEEs), which consist of noncoding elements that are estimated to evolve more slowly than the neutral rate across a set of species. Although they often include UCEs, CNEEs are distinct from UCEs because they are not ultraconserved and, most importantly, because only the core region is analyzed, rather than both the core and its flanking regions. Using a data set of 16 birds plus an alligator outgroup, and ∼3,600 to ∼3,800 loci per marker type, we found that although CNEEs were less variable than bioinformatically derived UCEs or introns, and in some cases exhibited a slower approach to branch resolution as determined by phylogenomic subsampling, CNEE alignments were of higher quality than those of the other markers, with fewer gaps and missing species. Phylogenetic resolution using coalescent approaches was comparable among the three marker types, with most nodes being fully and congruently resolved. Comparison of phylogenetic results across the three marker types indicated that one branch, the sister group to the passerine + falcon clade, was resolved differently, with moderate (>70%) bootstrap support, by CNEEs than by UCEs or introns. Overall, CNEEs appear promising as phylogenomic markers, yielding phylogenetic resolution as high as that of UCEs and introns but with fewer gaps, less ambiguity in alignments, and patterns of nucleotide substitution more consistent with the assumptions of commonly used methods of phylogenetic analysis.
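The alignment-quality comparison above (variability, gaps, missing species) can be made concrete with simple per-locus statistics. This is a minimal sketch under assumed definitions, not the authors' code: gap characters are `-`, `?` and `N`, a taxon counts as missing if it is absent or consists only of gap characters, and a site is variable if it shows at least two distinct states among non-gap characters. `alignment_summary` is a hypothetical name.

```python
from __future__ import annotations

# Assumed set of characters treated as gaps or missing data.
GAP_CHARS = set("-?N")


def alignment_summary(aln: dict[str, str], taxa: list[str]) -> dict[str, float]:
    """Per-locus statistics; assumes all sequences have equal length."""
    # Keep only taxa that have at least one non-gap character.
    present = {
        name: seq.upper()
        for name, seq in aln.items()
        if any(c not in GAP_CHARS for c in seq.upper())
    }
    missing = [t for t in taxa if t not in present]
    seqs = list(present.values())
    length = len(seqs[0]) if seqs else 0
    cells = length * len(seqs)
    gaps = sum(1 for s in seqs for c in s if c in GAP_CHARS)
    # A column is variable if it holds two or more distinct non-gap states.
    variable = sum(
        1
        for i in range(length)
        if len({s[i] for s in seqs if s[i] not in GAP_CHARS}) > 1
    )
    return {
        "missing_taxa": len(missing),
        "gap_fraction": gaps / cells if cells else 0.0,
        "variable_fraction": variable / length if length else 0.0,
    }


aln = {"a": "ACGT-", "b": "ACGA-", "c": "-----"}
print(alignment_summary(aln, ["a", "b", "c", "d"]))
```

Gap fraction is computed only over taxa that are present, so it does not double-count wholly missing species.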
Affiliation(s)
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, 26 Oxford Street, Harvard University, Cambridge, MA 02138 USA
- Alison Cloutier
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, 26 Oxford Street, Harvard University, Cambridge, MA 02138 USA
- Department of Natural History, Royal Ontario Museum, 100 Queen’s Park, Toronto, Ontario, M5S 2C6 Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcox Street, Toronto, Ontario, M5S 3B2 Canada
- Allan J. Baker
- Department of Natural History, Royal Ontario Museum, 100 Queen’s Park, Toronto, Ontario, M5S 2C6 Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcox Street, Toronto, Ontario, M5S 3B2 Canada
130
Hallas JM, Chichvarkhin A, Gosliner TM. Aligning evidence: concerns regarding multiple sequence alignments in estimating the phylogeny of the Nudibranchia suborder Doridina. R Soc Open Sci 2017; 4:171095. [PMID: 29134101 PMCID: PMC5666284 DOI: 10.1098/rsos.171095] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 08/09/2017] [Accepted: 09/20/2017] [Indexed: 06/07/2023]
Abstract
Molecular estimates of phylogenetic relationships rely heavily on multiple sequence alignment construction. There has been little consensus, however, on how to properly address issues pertaining to the alignment of variable regions. Here, we construct alignments from four commonly sequenced molecular markers (16S, 18S, 28S and cytochrome c oxidase subunit I) for the Nudibranchia using three different methodologies, namely (i) a strict mathematical algorithm, (ii) exclusion of variable or divergent regions and (iii) manual curation, and examine how different alignment construction methods can affect phylogenetic signal and phylogenetic estimates for the suborder Doridina. Phylogenetic informativeness (PI) profiles suggest that the molecular markers tested lack the power to resolve relationships at the base of the Doridina, while being more robust for family-level classifications. This agrees with the lack of consistent resolution among the 19 families within the Doridina across all three alignments. Most of the 19 families were recovered as monophyletic, and instances of non-monophyletic families were consistently recovered among analyses. We conclude that the alignment of variable regions has some effect on phylogenetic estimates of the Doridina, but that these effects can vary depending on the size and scope of the phylogenetic query and the PI of the molecular markers.
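Strategy (ii), excluding variable or divergent regions, is often approximated by discarding gap-rich alignment columns. The sketch below swaps in that simple gap-fraction rule; it is not the exclusion procedure used in this study, and dedicated trimming tools apply more elaborate criteria. The function name `drop_gappy_columns` and the 0.5 default threshold are assumptions.

```python
from __future__ import annotations


def drop_gappy_columns(aln: list[str], max_gap_fraction: float = 0.5) -> list[str]:
    """Remove columns whose fraction of '-' characters exceeds the threshold."""
    if not aln:
        return []
    length = len(aln[0])
    if any(len(row) != length for row in aln):
        raise ValueError("all rows must have the same length")
    keep = [
        i
        for i in range(length)
        if sum(row[i] == "-" for row in aln) / len(aln) <= max_gap_fraction
    ]
    return ["".join(row[i] for i in keep) for row in aln]


print(drop_gappy_columns(["AC-GT", "A--GT", "ACTG-"]))  # ['ACGT', 'A-GT', 'ACG-']
```

Only the third column (two gaps in three rows) exceeds the threshold here; lowering `max_gap_fraction` trims more aggressively at the cost of discarding potentially informative sites.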
Affiliation(s)
- Joshua M. Hallas
- Department of Biology, University of Nevada, Reno, 1664 N. Virginia St, Reno, NV 89557, USA
- Department of Invertebrate Zoology and Geology, California Academy of Sciences, 55 Music Concourse Dr, Golden Gate Park, San Francisco, CA 94118, USA
- Anton Chichvarkhin
- National Scientific Center of Marine Biology, Far East Branch of Russian Academy of Sciences, Palchevskogo 17, Vladivostok 690041, Russia
- Far Eastern Federal University, Sukhanova 8, Vladivostok 690950, Russia
- Terrence M. Gosliner
- Department of Invertebrate Zoology and Geology, California Academy of Sciences, 55 Music Concourse Dr, Golden Gate Park, San Francisco, CA 94118, USA
131
Huston DC, Cutmore SC, Cribb TH. Molecular phylogeny of the Haplosplanchnata Olson, Cribb, Tkach, Bray and Littlewood, 2003, with a description of Schikhobalotrema huffmani n. sp. Acta Parasitol 2017; 62:502-512. [PMID: 28682775 DOI: 10.1515/ap-2017-0060] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 11/29/2016] [Accepted: 03/27/2017] [Indexed: 11/15/2022]
Abstract
We describe Schikhobalotrema huffmani n. sp. from Tylosurus crocodilus (Péron and Lesueur) (Belonidae) collected off Lizard Island, Great Barrier Reef, Queensland, Australia, and Tylosurus gavialoides (Castelnau) collected from Moreton Bay, Queensland. Schikhobalotrema huffmani n. sp., Schikhobalotrema ablennis (Abdul-Salam and Khalil, 1987) Madhavi, 2005, Schikhobalotrema acutum (Linton, 1910) Skrjabin and Guschanskaja, 1955 and Schikhobalotrema adacutum (Manter, 1937) Skrjabin and Guschanskaja, 1955 are distinguished from all other species of Schikhobalotrema Skrjabin and Guschanskaja, 1955 by having ventral suckers that bear lateral lobes and have longitudinal apertures. Schikhobalotrema huffmani n. sp. differs from S. ablennis in having an obvious post-vitelline region and a longer forebody. From S. acutum, S. huffmani n. sp. differs in having a prostatic bulb smaller than the pharynx and a more anterior testis. From S. adacutum, S. huffmani n. sp. differs in having more prominent ventral sucker lobes, a conspicuous prostatic bulb and a longer forebody. We also report the first Australian record of Haplosplanchnus pachysomus (Eysenhardt, 1829) Looss, 1902, from Mugil cephalus Linnaeus (Mugilidae) collected in Moreton Bay. Molecular sequence data (ITS2, 18S and 28S rDNA) were generated for Schikhobalotrema huffmani n. sp., H. pachysomus and archived specimens of Hymenocotta mulli Manter, 1961. The new 18S and 28S molecular data were combined with published data for five other haplosplanchnid taxa to expand the phylogeny of the Haplosplanchnata. Bayesian inference and Maximum Likelihood analyses recovered identical tree topologies and showed the Haplosplanchnata to be a well-supported monophyletic group. However, relationships at and below the subfamily level remain poorly resolved.
132
Basso A, Babbucci M, Pauletto M, Riginella E, Patarnello T, Negrisolo E. The highly rearranged mitochondrial genomes of the crabs Maja crispata and Maja squinado (Majidae) and gene order evolution in Brachyura. Sci Rep 2017; 7:4096. [PMID: 28642542 PMCID: PMC5481413 DOI: 10.1038/s41598-017-04168-9] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 11/15/2016] [Accepted: 05/11/2017] [Indexed: 11/09/2022]
Abstract
We sequenced the mitochondrial genomes of the spider crabs Maja crispata and Maja squinado (Majidae, Brachyura). Both genomes contain the full set of 37 genes characteristic of bilaterian mitochondrial genomes, encoded on both the α- and β-strands. Both species exhibit the same gene order, which is unique among known animal genomes. In particular, all the genes located on the β-strand form a single block. This gene order was analysed together with the nine other gene orders known for the Brachyura. Our study confirms that the most widespread gene order (BraGO) represents the plesiomorphic condition for Brachyura and was established at the onset of this clade. All other gene orders result from transformational pathways originating from BraGO. The different gene orders exhibit variable levels of gene rearrangement, involving either only tRNAs or all types of genes. Local homoplastic arrangements were identified, whereas complete gene orders remain unique and represent signatures that may have diagnostic value. Brachyura appear to be a hot spot of gene order diversity within the phylum Arthropoda. Our analysis allowed us to track, for the first time, the full evolutionary pathways producing the Brachyuran gene orders. This goal was achieved by coupling sophisticated bioinformatic tools with phylogenetic analysis.
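One simple way to quantify how far two gene orders differ is to count breakpoints, i.e. adjacencies present in one order but absent from the other. This sketch is illustrative only and is not the approach reported by the authors. It ignores strand (so inversions are not distinguished), treats the mitochondrial genome as circular, and assumes every gene label is unique (duplicated tRNA genes would need distinct labels such as `trnL1` and `trnL2`). The gene lists below are hypothetical toy data, not the Maja gene orders.

```python
from __future__ import annotations


def adjacencies(order: list[str], circular: bool = True) -> set:
    """Unordered gene adjacencies; strand information is ignored."""
    pairs = list(zip(order, order[1:]))
    if circular and len(order) > 2:
        # Close the circle: the last gene is adjacent to the first.
        pairs.append((order[-1], order[0]))
    return {frozenset(p) for p in pairs}


def breakpoints(a: list[str], b: list[str], circular: bool = True) -> int:
    """Count adjacencies of `a` that are absent from `b`."""
    if len(set(a)) != len(a) or set(a) != set(b) or len(a) != len(b):
        raise ValueError("orders must contain the same, uniquely labelled genes")
    return len(adjacencies(a, circular) - adjacencies(b, circular))


ref = ["cox1", "cox2", "atp8", "atp6", "cox3"]
swapped = ["cox1", "cox2", "atp6", "atp8", "cox3"]
print(breakpoints(ref, swapped))  # 2
```

Because adjacencies are unordered and the order is circular, a rotated copy of the same arrangement scores zero breakpoints.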
Affiliation(s)
- Andrea Basso
- University of Padova, Department of Comparative Biomedicine and Food Science (BCA), 35020, Agripolis, Legnaro (PD), Italy
- Massimiliano Babbucci
- University of Padova, Department of Comparative Biomedicine and Food Science (BCA), 35020, Agripolis, Legnaro (PD), Italy
- Marianna Pauletto
- University of Padova, Department of Comparative Biomedicine and Food Science (BCA), 35020, Agripolis, Legnaro (PD), Italy
- Emilio Riginella
- University of Padova, Department of Biology, 35131, Padova, Italy
- Tomaso Patarnello
- University of Padova, Department of Comparative Biomedicine and Food Science (BCA), 35020, Agripolis, Legnaro (PD), Italy
- Enrico Negrisolo
- University of Padova, Department of Comparative Biomedicine and Food Science (BCA), 35020, Agripolis, Legnaro (PD), Italy
133
Le VS, Dang CC, Le QS. Improved mitochondrial amino acid substitution models for metazoan evolutionary studies. BMC Evol Biol 2017; 17:136. [PMID: 28606055 PMCID: PMC5469158 DOI: 10.1186/s12862-017-0987-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 02/13/2017] [Accepted: 06/03/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Amino acid substitution models play an essential role in inferring phylogenies from mitochondrial protein data. However, only a few empirical models have been estimated, and only from limited mitochondrial protein data covering a hundred species. The existing models are therefore unlikely to appropriately represent the amino acid substitutions in the hundreds of thousands of available metazoan mitochondrial protein sequences. RESULTS We selected 125,935 mitochondrial protein sequences from 34,448 species in the metazoan kingdom to estimate new amino acid substitution models targeting metazoa, vertebrates and invertebrate groups. The new models yield phylogenies with significantly better likelihoods than the existing models. We observed substantial distances between the phylogenies inferred with the existing models and the maximum likelihood phylogenies, indicating a considerable number of incorrect bipartitions in phylogenies inferred with the existing models. Finally, we used the new models and mitochondrial protein data to confirm that Testudines, Aves, and Crocodylia form a distinct clade within amniotes. CONCLUSIONS We introduced new mitochondrial amino acid substitution models for metazoan mitochondrial proteins. The new models outperform the existing models in inferring phylogenies from metazoan mitochondrial protein data. We strongly recommend that researchers use the new models when analysing metazoan mitochondrial protein data.
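Differences between phylogenies expressed as incorrect bipartitions are commonly measured with the Robinson-Foulds distance, the number of bipartitions found in one tree but not the other. The abstract does not state which distance was used, so treat this as a generic sketch. Trees are assumed to be nested Python tuples of leaf names, rooting is ignored, and each split is stored as the side that excludes a fixed reference leaf so that equivalent splits compare equal. All function names are hypothetical.

```python
from __future__ import annotations

from typing import Union

# Assumed encoding: a tree is a leaf name or a tuple of child subtrees.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> frozenset:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree: Tree) -> set:
    """Non-trivial splits, each stored as the side without a reference leaf."""
    leaves = leaf_set(tree)
    ref = min(leaves)
    splits = set()

    def visit(node: Tree) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Ignore splits that separate off a single leaf or no leaves at all.
        if 2 <= len(clade) <= len(leaves) - 2:
            splits.add(clade if ref not in clade else leaves - clade)
        return clade

    visit(tree)
    return splits


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Number of bipartitions present in exactly one of the two trees."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must share the same leaf set")
    return len(bipartitions(t1) ^ bipartitions(t2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(t1, t2))  # 2
```

Storing each split by the side lacking the reference leaf makes differently rooted drawings of the same unrooted tree compare as identical.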
Affiliation(s)
- Vinh Sy Le
- University of Engineering and Technology, Vietnam National University Hanoi, Hanoi, Vietnam
- Cuong Cao Dang
- University of Engineering and Technology, Vietnam National University Hanoi, Hanoi, Vietnam
- Quang Si Le
- School of Pharmacy and Biomedical Sciences, University of Portsmouth, Winston Churchill Avenue, Portsmouth, PO1 2UP, UK
134
Anderson FE, Williams BW, Horn KM, Erséus C, Halanych KM, Santos SR, James SW. Phylogenomic analyses of Crassiclitellata support major Northern and Southern Hemisphere clades and a Pangaean origin for earthworms. BMC Evol Biol 2017; 17:123. [PMID: 28558722 PMCID: PMC5450073 DOI: 10.1186/s12862-017-0973-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 08/22/2016] [Accepted: 05/18/2017] [Indexed: 11/26/2022]
Abstract
BACKGROUND Earthworms (Crassiclitellata) are a diverse group of annelids of substantial ecological and economic importance. Earthworms are primarily terrestrial infaunal animals, and as such are probably rather poor natural dispersers. Therefore, the near-global distribution of earthworms reflects an old and likely complex evolutionary history. Despite a long-standing interest in Crassiclitellata, relationships among and within major clades remain unresolved. METHODS In this study, we evaluate crassiclitellate phylogenetic relationships using 38 new transcriptomes in combination with publicly available transcriptome data. Our data include representatives of nearly all extant earthworm families and a representative of Moniligastridae, another terrestrial annelid group thought to be closely related to Crassiclitellata. We use a series of differentially filtered data matrices and analyses to examine the effects of data partitioning, missing data, compositional and branch-length heterogeneity, and outgroup inclusion. RESULTS AND DISCUSSION We recover a consistent, strongly supported ingroup topology irrespective of differences in methodology. The topology supports two major earthworm clades, each of which consists of a Northern Hemisphere subclade and a Southern Hemisphere subclade. Divergence time analysis results are concordant with the hypothesis that these north-south splits are the result of the breakup of the supercontinent Pangaea. CONCLUSIONS These results support several recently proposed revisions to the classical understanding of earthworm phylogeny, reveal two major clades that seem to reflect Pangaean distributions, and raise new questions about earthworm evolutionary relationships.
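Filtering matrices for missing data often means keeping only genes sampled for a minimum fraction of taxa. The abstract does not specify the filters used, so this is a hypothetical illustration of a taxon-occupancy filter. The data layout (a mapping from gene to the set of taxa in which it was recovered) and the function name `filter_by_occupancy` are assumptions.

```python
from __future__ import annotations


def filter_by_occupancy(
    gene_taxa: dict[str, set[str]], all_taxa: set[str], min_fraction: float
) -> list[str]:
    """Return genes (sorted) sampled for at least `min_fraction` of all taxa."""
    if not all_taxa:
        raise ValueError("all_taxa must not be empty")
    kept = [
        gene
        for gene, taxa in gene_taxa.items()
        if len(taxa & all_taxa) / len(all_taxa) >= min_fraction
    ]
    return sorted(kept)


all_taxa = {"t1", "t2", "t3", "t4"}
gene_taxa = {
    "g1": {"t1", "t2", "t3", "t4"},
    "g2": {"t1", "t2"},
    "g3": {"t1", "t2", "t3"},
}
print(filter_by_occupancy(gene_taxa, all_taxa, 0.75))  # ['g1', 'g3']
```

Running the same analysis over several thresholds yields a series of differentially filtered matrices whose results can then be compared.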
Affiliation(s)
- Frank E Anderson
- Department of Zoology, Southern Illinois University, Carbondale, IL, 62901, USA
- Bronwyn W Williams
- Department of Zoology, Southern Illinois University, Carbondale, IL, 62901, USA
- North Carolina Museum of Natural Sciences, Research Laboratory, Raleigh, North Carolina, 27699, USA
- Kevin M Horn
- Department of Zoology, Southern Illinois University, Carbondale, IL, 62901, USA
- Christer Erséus
- Department of Biological and Environmental Sciences, University of Gothenburg, 405 30, Göteborg, SE, Sweden
- Kenneth M Halanych
- Molette Biology Laboratory for Environmental and Climate Change Studies, Department of Biological Sciences, Auburn University, Auburn, AL, 36849, USA
- Scott R Santos
- Molette Biology Laboratory for Environmental and Climate Change Studies, Department of Biological Sciences, Auburn University, Auburn, AL, 36849, USA
- Samuel W James
- Department of Biology, University of Iowa, Iowa City, Iowa, 52242, USA
135
James AM, Jayasena AS, Zhang J, Berkowitz O, Secco D, Knott GJ, Whelan J, Bond CS, Mylne JS. Evidence for Ancient Origins of Bowman-Birk Inhibitors from Selaginella moellendorffii. Plant Cell 2017; 29:461-473. [PMID: 28298518 PMCID: PMC5385957 DOI: 10.1105/tpc.16.00831] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 11/07/2016] [Revised: 02/27/2017] [Accepted: 03/14/2017] [Indexed: 05/16/2023]
Abstract
Bowman-Birk Inhibitors (BBIs) are a well-known family of plant protease inhibitors first described 70 years ago. BBIs are known only in the legume (Fabaceae) and cereal (Poaceae) families, but peptides that mimic their trypsin-inhibitory loops exist in sunflowers (Helianthus annuus) and frogs. The disparate biosynthetic origins and distant phylogenetic distribution imply that these loops evolved independently, but their structural similarity suggests a common ancestor. Targeted bioinformatic searches for the BBI inhibitory loop revealed highly divergent BBI-like sequences in the seedless vascular spikemoss Selaginella moellendorffii. Using de novo transcriptomics, we confirmed expression of five transcripts in S. moellendorffii whose encoded proteins share homology with BBI inhibitory loops. The most highly expressed, BBI3, encodes a protein that inhibits trypsin. Two lysine residues had to be mutated to abolish trypsin inhibition, suggesting that BBI3 shares the double-headed inhibition mechanism of angiosperm BBIs. As Selaginella belongs to the lycopod plant lineage, which diverged ∼200 to 230 million years before the common ancestor of angiosperms, its BBI-like proteins imply that legume and cereal BBIs shared a common ancestor. Indeed, we discovered BBI sequences in six angiosperm families outside the Fabaceae and Poaceae. These findings provide the evolutionary missing links between the well-known legume and cereal BBI gene families.
Affiliation(s)
- Amy M James
- School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, Australia
- The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Perth 6009, Australia
- Achala S Jayasena
- School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, Australia
- The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Perth 6009, Australia
- Jingjing Zhang
- School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, Australia
- The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Perth 6009, Australia
- Oliver Berkowitz
- La Trobe University, School of Life Sciences, ARC Centre of Excellence in Plant Energy Biology, Melbourne 3086, Australia
- David Secco
- The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Perth 6009, Australia
- Gavin J Knott
- School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, Australia
- James Whelan
- La Trobe University, School of Life Sciences, ARC Centre of Excellence in Plant Energy Biology, Melbourne 3086, Australia
- Charles S Bond
- School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, Australia
- Joshua S Mylne
- School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, Australia
- The ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Perth 6009, Australia
136
Martin SB, Cutmore SC, Cribb TH. Revision of Neolebouria Gibson, 1976 (Digenea: Opecoelidae), with Trilobovarium n. g., for species infecting tropical and subtropical shallow-water fishes. Syst Parasitol 2017; 94:307-338. [DOI: 10.1007/s11230-017-9707-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 12/16/2016] [Accepted: 02/02/2017] [Indexed: 11/24/2022]
137
Ayad LAK, Pissis SP. MARS: improving multiple circular sequence alignment using refined sequences. BMC Genomics 2017; 18:86. [PMID: 28088189 PMCID: PMC5237495 DOI: 10.1186/s12864-016-3477-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 10/04/2016] [Accepted: 12/26/2016] [Indexed: 12/04/2022]
Abstract
Background A fundamental assumption of all widely used multiple sequence alignment techniques is that the left- and right-most positions of the input sequences are relevant to the alignment. However, the position where a sequence starts or ends can be totally arbitrary for a number of reasons: arbitrariness in the linearisation (sequencing) of a circular molecular structure, or inconsistencies introduced into sequence databases by different linearisation standards. These scenarios are relevant, for instance, to the multiple sequence alignment of mitochondrial DNA, viroid, viral or other genomes that have a circular molecular structure. A solution to these inconsistencies would be to identify a suitable rotation (cyclic shift) for each sequence; these refined sequences may in turn lead to improved multiple sequence alignments using the preferred multiple sequence alignment program. Results We present MARS, a new heuristic method for improving Multiple circular sequence Alignment using Refined Sequences. MARS was implemented in the C++ programming language as a program that computes the rotations (cyclic shifts) required to best align a set of input sequences. Experimental results, using real and synthetic data, show that MARS improves the alignments with respect to standard genetic measures and the inferred maximum-likelihood-based phylogenies, and outperforms state-of-the-art methods in terms of both accuracy and efficiency. Among other things, our results show that the average pairwise distance in the multiple sequence alignment of a dataset of widely studied mitochondrial DNA sequences is reduced by around 5% when MARS is applied before multiple sequence alignment is performed. Conclusions Analysing multiple sequences simultaneously is fundamental in biological research, and multiple sequence alignment has been found to be a popular method for this task.
Conventional alignment techniques cannot be used effectively when the position where sequences start is arbitrary. We present here a method, which can be used in conjunction with any multiple sequence alignment program, to address this problem effectively and efficiently.
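The core idea, choosing a cyclic shift for each circular sequence before alignment, can be illustrated with a brute-force search. This is not the MARS heuristic. The sketch simply tries every rotation of a query and keeps the one with the smallest Hamming distance to a reference of equal length, which costs O(n²) per pair and ignores insertions and deletions. `best_rotation` and `hamming` are hypothetical names.

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_rotation(reference: str, query: str) -> tuple[int, str]:
    """Return (shift, rotated query) minimising Hamming distance to reference.

    Brute force over all cyclic shifts; assumes equal sequence lengths.
    """
    if len(reference) != len(query):
        raise ValueError("this sketch assumes sequences of equal length")
    best_shift, best_dist = 0, hamming(reference, query)
    for k in range(1, len(query)):
        d = hamming(reference, query[k:] + query[:k])
        if d < best_dist:  # strict '<' keeps the smallest shift on ties
            best_shift, best_dist = k, d
    return best_shift, query[best_shift:] + query[:best_shift]


print(best_rotation("ACGTTGCA", "TTGCAACG"))  # (5, 'ACGTTGCA')
```

In a multiple-sequence setting, each input would be rotated against a chosen reference before being handed to the preferred alignment program, which is the workflow the abstract describes.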
Affiliation(s)
- Lorraine A K Ayad
- Department of Informatics, King's College London, Strand, London, WC2R 2LS, UK
- Solon P Pissis
- Department of Informatics, King's College London, Strand, London, WC2R 2LS, UK
138
139
Phylogenetics and Phylogenomics of Rust Fungi. Fungal Phylogenetics and Phylogenomics 2017; 100:267-307. [DOI: 10.1016/bs.adgen.2017.09.011] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Indexed: 12/13/2022]
140
Ngoc PCT, Greenhalgh R, Dermauw W, Rombauts S, Bajda S, Zhurov V, Grbić M, Van de Peer Y, Van Leeuwen T, Rouzé P, Clark RM. Complex Evolutionary Dynamics of Massively Expanded Chemosensory Receptor Families in an Extreme Generalist Chelicerate Herbivore. Genome Biol Evol 2016; 8:3323-3339. [PMID: 27797949 PMCID: PMC5203786 DOI: 10.1093/gbe/evw249] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Indexed: 12/20/2022]
Abstract
While mechanisms to detoxify plant-produced antiherbivore compounds have been associated with plant host use by herbivores, less is known about the role of chemosensory perception in their life histories. This is especially true for generalists, including chelicerate herbivores that evolved herbivory independently of the better-studied insect lineages. To shed light on chemosensory perception in a generalist herbivore, we characterized the chemosensory receptors (CRs) of the chelicerate two-spotted spider mite, Tetranychus urticae, an extreme generalist. Strikingly, T. urticae has more CRs than have been reported in any other arthropod to date. Including pseudogenes, 689 gustatory receptors were identified, as were 136 degenerin/Epithelial Na+ Channels (ENaCs) that have also been implicated as CRs in insects. The genomic distribution of T. urticae gustatory receptors indicates recurring bursts of lineage-specific proliferation, with receptor clusters comparable in extent to those observed in the CR-rich genomes of vertebrates or C. elegans. Although pseudogenization of many gustatory receptors within clusters suggests relaxed selection, a subset of receptors is expressed. Consistent with functions as CRs, the genomic distribution and expression of ENaCs in lineage-specific T. urticae expansions mirror those observed for gustatory receptors. The expansion of ENaCs in T. urticae to more than threefold the number reported in other animals was unexpected, raising the possibility that ENaCs in T. urticae have been co-opted to fulfill a major role performed by unrelated CRs in other animals. More broadly, our findings suggest an elaborate role for chemosensory perception in generalist herbivores that are of key ecological and agricultural importance.
Affiliation(s)
- Phuong Cao Thi Ngoc
- Department of Plant Systems Biology, VIB, Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Wannes Dermauw
- Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Stephane Rombauts
- Department of Plant Systems Biology, VIB, Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Sabina Bajda
- Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Vladimir Zhurov
- Department of Biology, The University of Western Ontario, London, ON, Canada
- Miodrag Grbić
- Department of Biology, The University of Western Ontario, London, ON, Canada
- University of La Rioja, Logroño, Spain
- Yves Van de Peer
- Department of Plant Systems Biology, VIB, Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent, Belgium
- Department of Genetics, Genomics Research Institute, University of Pretoria, Pretoria, South Africa
- Thomas Van Leeuwen
- Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Pierre Rouzé
- Department of Plant Systems Biology, VIB, Ghent, Belgium
- Richard M Clark
- Department of Biology, University of Utah, Salt Lake City, Utah
- Center for Cell and Genome Science, University of Utah, Salt Lake City, Utah
141
Chiner-Oms A, González-Candelas F. EvalMSA: A Program to Evaluate Multiple Sequence Alignments and Detect Outliers. Evol Bioinform Online 2016; 12:277-284. [PMID: 27920488 PMCID: PMC5127606 DOI: 10.4137/ebo.s40583] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/18/2016] [Revised: 10/02/2016] [Accepted: 10/05/2016] [Indexed: 12/01/2022]
Abstract
We present EvalMSA, a software tool for evaluating and detecting outliers in multiple sequence alignments (MSAs). This tool identifies divergent sequences in MSAs by scoring the contribution of each row in the alignment to its quality using a sum-of-pairs-based method and additional analyses. Our main goal is to provide users with objective data so that they can make informed decisions about the relevance and/or pertinence of including or retaining a particular sequence in an MSA. EvalMSA is written in standard Perl and also uses some routines from the statistical language R; therefore, the R-base package must be installed to obtain full functionality. Binary packages are freely available from http://sourceforge.net/projects/evalmsa/ for Linux and Windows.
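A sum-of-pairs-style outlier screen can be sketched as follows. EvalMSA's actual scoring and additional analyses are not reproduced here. This hypothetical version scores each row by its mean pairwise identity to every other row (counting only columns where neither sequence has a gap) and flags rows falling more than `z` population standard deviations below the mean row score. The threshold `z = 1.5` and all function names are arbitrary assumptions.

```python
from __future__ import annotations

from statistics import mean, pstdev


def pair_identity(a: str, b: str) -> float:
    """Identity over columns where neither sequence has a gap ('-')."""
    shared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def row_scores(aln: dict[str, str]) -> dict[str, float]:
    """Mean pairwise identity of each row against all other rows."""
    if len(aln) < 2:
        raise ValueError("need at least two sequences")
    names = list(aln)
    return {
        n: mean([pair_identity(aln[n], aln[m]) for m in names if m != n])
        for n in names
    }


def outliers(aln: dict[str, str], z: float = 1.5) -> list[str]:
    """Rows scoring more than z standard deviations below the mean score."""
    scores = row_scores(aln)
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return []
    return [n for n, s in scores.items() if s < mu - z * sd]


aln = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACGTACGT", "s4": "TGCATGCA"}
print(outliers(aln))  # ['s4']
```

The screen only reports candidates; as the abstract stresses, the decision to drop a sequence remains with the user.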
Affiliation(s)
- Alvaro Chiner-Oms
- Joint Research Unit "Infection and Public Health" FISABIO, Cavanilles Institute for Biodiversity and Evolutionary Biology, University of Valencia, Paterna, Valencia, Spain; CIBER in Epidemiology and Public Health, Madrid, Spain
- Fernando González-Candelas
- Joint Research Unit "Infection and Public Health" FISABIO, Cavanilles Institute for Biodiversity and Evolutionary Biology, University of Valencia, Paterna, Valencia, Spain; CIBER in Epidemiology and Public Health, Madrid, Spain
142
Abstract
Background: Multiple sequence alignment is an important task in bioinformatics, and alignments of large datasets containing hundreds or thousands of sequences are increasingly of interest. While many alignment methods exist, the most accurate alignments are likely to be based on stochastic models where sequences evolve down a tree with substitutions, insertions, and deletions. While some methods have been developed to estimate alignments under these stochastic models, only the Bayesian method BAli-Phy has been able to run on even moderately large datasets, containing about 100 sequences. A technique to extend BAli-Phy to enable alignments of thousands of sequences could potentially improve alignment and phylogenetic tree accuracy on large-scale data beyond the best-known methods today. Results: We use simulated data with up to 10,000 sequences representing a variety of model conditions, including some that are significantly divergent from the statistical models used in BAli-Phy and elsewhere. We give a method for incorporating BAli-Phy into PASTA and UPP, two strategies for enabling alignment methods to scale to large datasets, and give alignment and tree accuracy results measured against the ground truth from simulations. Comparable results are also given for other methods capable of aligning this many sequences. Conclusions: Extensions of BAli-Phy using PASTA and UPP produce significantly more accurate alignments and phylogenetic trees than the current leading methods. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3101-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Michael Nute
- Department of Statistics, University of Illinois at Urbana-Champaign, 725 S. Wright St, Champaign, 61820, IL, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Ave, Urbana, 61801, IL, USA; Department of Bioengineering, University of Illinois at Urbana-Champaign, 1270 Digital Computing Laboratory, MC-278, Urbana, 61801, IL, USA; National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, 1205 W. Clark St., MC-257, Urbana, 61801, IL, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1206 W. Gregory Dr., MC-195, Urbana, 61801, IL, USA
143
Iwai S, Weinmaier T, Schmidt BL, Albertson DG, Poloso NJ, Dabbagh K, DeSantis TZ. Piphillin: Improved Prediction of Metagenomic Content by Direct Inference from Human Microbiomes. PLoS One 2016; 11:e0166104. [PMID: 27820856 PMCID: PMC5098786 DOI: 10.1371/journal.pone.0166104] [Citation(s) in RCA: 198] [Impact Index Per Article: 24.8] [Received: 06/01/2016] [Accepted: 10/07/2016] [Indexed: 01/30/2023] Open
Abstract
Functional analysis of a clinical microbiome facilitates the elucidation of mechanisms by which microbiome perturbation can cause a phenotypic change in the patient. The direct approach for analyzing the functional capacity of the microbiome is shotgun metagenomics. An inexpensive alternative for estimating the functional capacity of a microbial community is to collect 16S rRNA gene profiles and then indirectly infer the abundance of functional genes. This inference approach has been implemented in the PICRUSt and Tax4Fun software tools. However, those tools have important limitations: they rely on outdated functional databases and uncertain phylogenetic trees, and they require very specific data pre-processing protocols. Here we introduce Piphillin, a straightforward algorithm that is independent of any proposed phylogenetic tree, leverages contemporary functional databases, and is not tied to any particular data pre-processing protocol. When all three inference tools were evaluated against actual shotgun metagenomics, Piphillin was superior to both PICRUSt and Tax4Fun in predicting gene composition in human clinical samples (p<0.01 and p<0.001, respectively), and Piphillin’s ability to predict disease associations with specific gene orthologs showed a 15% increase in balanced accuracy compared with PICRUSt. For laboratory animal samples, no tool showed a performance advantage over the others, and for environmental samples all three produced unsatisfactory predictions. Our results demonstrate that functional inference using the direct method implemented in Piphillin is preferable for clinical biospecimens. Piphillin is publicly available for academic use at http://secondgenome.com/Piphillin.
Affiliation(s)
- Shoko Iwai
- Informatics Department, Second Genome Inc., South San Francisco, California, United States of America
- Thomas Weinmaier
- Informatics Department, Second Genome Inc., South San Francisco, California, United States of America
- Brian L. Schmidt
- Bluestone Center for Clinical Research and the Department of Oral and Maxillofacial Surgery, New York University College of Dentistry, New York, New York, United States of America
- Donna G. Albertson
- Bluestone Center for Clinical Research and the Department of Oral and Maxillofacial Surgery, New York University College of Dentistry, New York, New York, United States of America
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, California, United States of America
- Neil J. Poloso
- Research and External Scientific Innovation Department, Allergan PLC, Irvine, California, United States of America
- Karim Dabbagh
- Informatics Department, Second Genome Inc., South San Francisco, California, United States of America
- Todd Z. DeSantis
- Informatics Department, Second Genome Inc., South San Francisco, California, United States of America
144
Nunez JCB, Oleksiak MF. A Cost-Effective Approach to Sequence Hundreds of Complete Mitochondrial Genomes. PLoS One 2016; 11:e0160958. [PMID: 27505419 PMCID: PMC4978415 DOI: 10.1371/journal.pone.0160958] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/20/2016] [Accepted: 07/27/2016] [Indexed: 12/11/2022] Open
Abstract
We present a cost-effective approach to sequencing whole mitochondrial genomes for hundreds of individuals. Our approach uses small reaction volumes and unmodified (non-phosphorylated) barcoded adaptors to minimize reagent costs. We demonstrate it by sequencing 383 Fundulus sp. mitochondrial genomes (192 F. heteroclitus and 191 F. majalis). Prior to sequencing, we amplified the mitochondrial genomes using 4–5 custom-made, overlapping primer pairs; sequencing was performed on an Illumina HiSeq 2500 platform. After low-quality and short sequences were removed, 2.9 million and 2.8 million reads remained for F. heteroclitus and F. majalis, respectively. Individual genomes were assembled for each species by mapping barcoded reads to a reference genome; for F. majalis, the reference genome was built de novo. On average, individual consensus sequences had high coverage: 61-fold for F. heteroclitus and 57-fold for F. majalis. The approach described in this paper is optimized for sequencing mitochondrial genomes on an Illumina platform; however, with appropriate modifications, it could readily be applied to other small genomes and sequencing platforms.
Affiliation(s)
- Joaquin C. B. Nunez
- University of Miami, Rosenstiel School of Marine and Atmospheric Science, Department of Marine Biology and Ecology, Miami, Florida, United States of America
- Marjorie F. Oleksiak
- University of Miami, Rosenstiel School of Marine and Atmospheric Science, Department of Marine Biology and Ecology, Miami, Florida, United States of America
145
Vanhoutreve R, Kress A, Legrand B, Gass H, Poch O, Thompson JD. LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system. BMC Bioinformatics 2016; 17:271. [PMID: 27387560 PMCID: PMC4936259 DOI: 10.1186/s12859-016-1146-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 03/29/2016] [Accepted: 07/01/2016] [Indexed: 11/13/2022] Open
Abstract
Background: A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, and prediction of molecular interactions. These applications, however sophisticated, are generally highly sensitive to the alignment used, and neglecting non-homologous or uncertain regions in the alignment can lead to significant bias in the subsequent inferences. Results: Here, we present a new method, LEON-BIS, which uses a robust Bayesian framework to estimate the homologous relations between sequences in a protein multiple alignment. Sequences are clustered into subfamilies, and relations are predicted at different levels, including ‘core blocks’, ‘regions’ and full-length proteins. The accuracy and reliability of the predictions are demonstrated in large-scale comparisons using well-annotated alignment databases, where homologous sequence segments are detected with very high sensitivity and specificity. Conclusions: LEON-BIS uses robust Bayesian statistics to distinguish the portions of multiple sequence alignments that are conserved either across the whole family or within subfamilies. LEON-BIS should thus be useful for automatic, high-throughput genome annotation, 2D/3D structure prediction, protein-protein interaction prediction, and similar tasks.
Affiliation(s)
- Renaud Vanhoutreve
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle de Strasbourg, Strasbourg, France
- Arnaud Kress
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle de Strasbourg, Strasbourg, France
- Baptiste Legrand
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle de Strasbourg, Strasbourg, France
- Hélène Gass
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle de Strasbourg, Strasbourg, France
- Olivier Poch
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle de Strasbourg, Strasbourg, France
- Julie D Thompson
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle de Strasbourg, Strasbourg, France.
146
Simmons MP, Gatesy J. Biases of tree-independent-character-subsampling methods. Mol Phylogenet Evol 2016; 100:424-443. [DOI: 10.1016/j.ympev.2016.04.022] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/09/2015] [Revised: 03/16/2016] [Accepted: 04/15/2016] [Indexed: 12/21/2022]
147
Jaiteh M, Taly A, Hénin J. Evolution of Pentameric Ligand-Gated Ion Channels: Pro-Loop Receptors. PLoS One 2016; 11:e0151934. [PMID: 26986966 PMCID: PMC4795631 DOI: 10.1371/journal.pone.0151934] [Citation(s) in RCA: 70] [Impact Index Per Article: 8.8] [Received: 10/07/2015] [Accepted: 03/07/2016] [Indexed: 01/27/2023] Open
Abstract
Pentameric ligand-gated ion channels (pLGICs) are ubiquitous neurotransmitter receptors in Bilateria, with a small number of known prokaryotic homologues. Here we describe a new inventory and phylogenetic analysis of pLGIC genes across all kingdoms of life. Our main finding is a set of pLGIC genes in unicellular eukaryotes, some of which are metazoan-like Cys-loop receptors, while others lack the Cys-loop cysteines, like their prokaryotic relatives. A number of such “Cys-less” receptors also appear in invertebrate metazoans. Together, these findings reveal a new distribution of pLGICs in eukaryotes. A broader distribution of prokaryotic channels also emerges, including a major new archaeal taxon, Thaumarchaeota. More generally, pLGICs now appear nearly ubiquitous in major taxonomic groups except multicellular plants and fungi. However, pLGICs are sparsely distributed in unicellular taxa, suggesting a high rate of gene loss and a non-essential role, in contrast with their essential role as synaptic receptors of the bilaterian nervous system. Multiple alignments of these highly divergent sequences reveal a small number of conserved residues clustered at the interface between the extracellular and transmembrane domains. Only the “Cys-loop” proline is absolutely conserved, suggesting the more fitting name “Pro loop” for that motif, and “Pro-loop receptors” for the superfamily. The inferred molecular phylogeny shows a Cys-loop clade and a Cys-less clade in eukaryotes, both containing metazoan and unicellular members. This suggests new hypotheses on the evolutionary history of the superfamily, such as a possible origin of the Cys-loop cysteines in an ancient unicellular eukaryote. Deeper phylogenetic relationships remain uncertain, particularly around the split between bacteria, archaea, and eukaryotes.
Affiliation(s)
- Mariama Jaiteh
- Laboratoire de Biochimie Théorique, Institut de Biologie Physico-Chimique, CNRS and Université Paris Diderot, Paris, France
- Antoine Taly
- Laboratoire de Biochimie Théorique, Institut de Biologie Physico-Chimique, CNRS and Université Paris Diderot, Paris, France
- Jérôme Hénin
- Laboratoire de Biochimie Théorique, Institut de Biologie Physico-Chimique, CNRS and Université Paris Diderot, Paris, France
148
Alors D, Lumbsch HT, Divakar PK, Leavitt SD, Crespo A. An Integrative Approach for Understanding Diversity in the Punctelia rudecta Species Complex (Parmeliaceae, Ascomycota). PLoS One 2016; 11:e0146537. [PMID: 26863231 PMCID: PMC4749632 DOI: 10.1371/journal.pone.0146537] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 05/08/2015] [Accepted: 12/18/2015] [Indexed: 11/23/2022] Open
Abstract
High levels of cryptic diversity have been documented in lichenized fungi, especially in Parmeliaceae, and integrating various lines of evidence, including coalescent-based species delimitation approaches, helps establish more robust species circumscriptions. In this study, we used an integrative taxonomic approach to delimit species in the lichen-forming fungal genus Punctelia (Parmeliaceae), with a particular focus on the cosmopolitan species P. rudecta. Nuclear and mitochondrial ribosomal DNA and protein-coding DNA sequences were analyzed in phylogenetic and coalescent-based frameworks. Additionally, morphological, ecological and geographical features of the sampled specimens were evaluated. Five major, strongly supported monophyletic clades were recognized in the genus Punctelia, and each clade could be characterized by distinct patterns in medullary chemistry. Punctelia rudecta as currently circumscribed was shown to be polyphyletic. A variety of empirical species delimitation methods provide evidence for a minimum of four geographically isolated species within the nominal taxon Punctelia rudecta, including a newly described saxicolous species, P. guanchica, and three corticolous species. To facilitate reliable sample identification for biodiversity, conservation, and air-quality biomonitoring research, these three species have been epitypified, and a new species has been described.
Affiliation(s)
- David Alors
- Departamento de Biología Vegetal II, Facultad de Farmacia, Universidad Complutense de Madrid, Plaza de Ramón y Cajal s/n, Madrid, Spain
- H. Thorsten Lumbsch
- Science and Education, Field Museum, Chicago, Illinois, United States of America
- Pradeep K. Divakar
- Departamento de Biología Vegetal II, Facultad de Farmacia, Universidad Complutense de Madrid, Plaza de Ramón y Cajal s/n, Madrid, Spain
- Steven D. Leavitt
- Science and Education, Field Museum, Chicago, Illinois, United States of America
- Ana Crespo
- Departamento de Biología Vegetal II, Facultad de Farmacia, Universidad Complutense de Madrid, Plaza de Ramón y Cajal s/n, Madrid, Spain
149
Cannon JT, Kocot KM. Phylogenomics Using Transcriptome Data. Methods Mol Biol 2016; 1452:65-80. [PMID: 27460370 DOI: 10.1007/978-1-4939-3774-5_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
This chapter presents a generalized protocol for conducting phylogenetic analyses using large-scale molecular datasets, specifically using transcriptome data from the Illumina sequencing platform. The general molecular lab bench protocol consists of RNA extraction, cDNA synthesis, and sequencing, in this case via Illumina. After sequences have been obtained, bioinformatics methods are used to assemble raw reads, identify coding regions, and categorize sequences from different species into groups of orthologous genes (OGs). The specific OGs to be used for phylogenetic inference are selected using a custom shell script. Finally, the selected orthologous groups are concatenated into a supermatrix. Generalized methods for phylogenomic inference using maximum likelihood and Bayesian inference software are presented.
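The final concatenation step can be sketched as follows. This is a minimal Python illustration, not the chapter's custom shell script: it assumes each orthologous group is already aligned and stored as a mapping from taxon name to aligned sequence, pads taxa missing from a group with `?`, and records each group's column range so the ranges could be used to define partitions for model-based inference. The function name `concatenate` is hypothetical.

```python
def concatenate(og_alignments: dict[str, dict[str, str]], missing: str = "?"):
    """Concatenate per-OG alignments into a supermatrix (illustrative sketch).

    Taxa absent from an OG are padded with `missing` characters. Returns the
    supermatrix (taxon -> concatenated sequence) and, for each OG, its
    1-based inclusive (start, end) column range in the supermatrix.
    """
    # Union of all taxa across OGs, in a stable (sorted) order.
    taxa = sorted({t for aln in og_alignments.values() for t in aln})
    rows = {t: [] for t in taxa}
    partitions = {}
    pos = 1
    for og in sorted(og_alignments):
        aln = og_alignments[og]
        lengths = {len(s) for s in aln.values()}
        # Every sequence within one OG alignment must have the same length.
        if len(lengths) != 1:
            raise ValueError(f"{og}: sequences are not all the same length")
        length = lengths.pop()
        for t in taxa:
            rows[t].append(aln.get(t, missing * length))
        partitions[og] = (pos, pos + length - 1)
        pos += length
    return {t: "".join(parts) for t, parts in rows.items()}, partitions
```

For example, with `{"OG1": {"A": "MKV", "B": "MKI"}, "OG2": {"A": "LL", "C": "LF"}}`, taxon C becomes `???LF`, taxon B becomes `MKI??`, and the partitions are OG1 = (1, 3) and OG2 = (4, 5). Padding with `?` rather than `-` marks the data as missing rather than as an alignment gap.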
Affiliation(s)
- Johanna Taylor Cannon
- Department of Zoology, Naturhistoriska Riksmuseet, 50007, SE-104 05, Stockholm, Sweden.
- Kevin Michael Kocot
- Department of Biological Sciences and Alabama Museum of Natural History, The University of Alabama, 307 Mary Harmon Bryant Hall, Tuscaloosa, AL, 35487, USA