1
Mulhair PO, McCarthy CGP, Siu-Ting K, Creevey CJ, O'Connell MJ. Filtering artifactual signal increases support for Xenacoelomorpha and Ambulacraria sister relationship in the animal tree of life. Curr Biol 2022; 32:5180-5188.e3. [PMID: 36356574 DOI: 10.1016/j.cub.2022.10.036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/13/2021] [Revised: 08/09/2022] [Accepted: 10/18/2022] [Indexed: 11/10/2022]
Abstract
Conflicting studies place a group of bilaterian invertebrates containing xenoturbellids and acoelomorphs, the Xenacoelomorpha, as either the primary emerging bilaterian phylum [1-6] or within Deuterostomia, sister to Ambulacraria [7-11]. Although their placement as sister to the rest of Bilateria supports relatively simple morphology in the ancestral bilaterian, their alternative placement within Deuterostomia suggests a morphologically complex ancestral bilaterian along with extensive loss of major phenotypic traits in the Xenacoelomorpha. Recent studies have questioned whether Deuterostomia should be considered monophyletic at all [10,12,13]. Hidden paralogy and poor phylogenetic signal present a major challenge for reconstructing species phylogenies [14-18]. Here, we assess whether these issues have contributed to the conflict over the placement of Xenacoelomorpha. We reanalyzed published datasets, enriching for orthogroups whose gene trees support well-resolved clans elsewhere in the animal tree [16]. We find that most genes in previously published datasets violate incontestable clans, suggesting that hidden paralogy and low phylogenetic signal affect the ability to reconstruct branching patterns at deep nodes in the animal tree. We demonstrate that removing orthogroups that cannot recapitulate incontestable relationships alters the final topology that is inferred, while simultaneously improving the fit of the model to the data. We discover increased, but ultimately not conclusive, support for the existence of Xenambulacraria in our set of filtered orthogroups. At a time when we are progressing toward sequencing all life on the planet, we argue that long-standing contentious issues in the tree of life will be resolved using smaller amounts of better quality data that can be modeled adequately [19].
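The clan-based filtering described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the nested-tuple tree encoding, the function names `subtree_leaf_sets`, `respects_clan`, and `filter_orthogroups`, and the toy taxa are all assumptions made for illustration. The idea is that an unrooted gene tree respects a clan when some branch separates the clan's taxa (those present in the gene tree) from all other taxa.

```python
def subtree_leaf_sets(tree):
    """Return (leaves, clades): all leaves under `tree` and the leaf set of every subtree.

    `tree` is a leaf name (str) or a tuple of subtrees (hypothetical encoding).
    """
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves, clades = frozenset(), []
    for child in tree:
        child_leaves, child_clades = subtree_leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def respects_clan(tree, clan):
    """True if some branch of the (unrooted) tree separates `clan` from the rest."""
    all_leaves, clades = subtree_leaf_sets(tree)
    present = frozenset(clan) & all_leaves
    # With fewer than two clan members, or no outsiders, the test is uninformative.
    if len(present) < 2 or present == all_leaves:
        return True
    # Each subtree defines a split; the clan may sit on either side of it.
    return any(c == present or all_leaves - c == present for c in clades)


def filter_orthogroups(gene_trees, clans):
    """Keep the names of gene trees that respect every supplied clan."""
    return [name for name, tree in gene_trees.items()
            if all(respects_clan(tree, clan) for clan in clans)]


# Toy data (hypothetical): one uncontroversial clan and two gene trees.
vertebrates = {"human", "mouse", "zebrafish"}
gene_trees = {
    "geneA": ((("human", "mouse"), "zebrafish"), ("fly", "beetle")),
    "geneB": ((("human", "fly"), "zebrafish"), ("mouse", "beetle")),
}
kept = filter_orthogroups(gene_trees, [vertebrates])
print(kept)
```

Here `geneB` is dropped because no branch of its tree groups the three vertebrates to the exclusion of the two insects, which is the kind of violation that hidden paralogy or weak signal can produce.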
Affiliation(s)
- Peter O Mulhair
  - Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK; Computational and Molecular Evolutionary Biology Research Group, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK
- Charley G P McCarthy
  - Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK
- Karen Siu-Ting
  - Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, UK
- Christopher J Creevey
  - Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, UK
- Mary J O'Connell
  - Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK; Computational and Molecular Evolutionary Biology Research Group, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK

2
Xiong H, Wang D, Shao C, Yang X, Yang J, Ma T, Davis CC, Liu L, Xi Z. Species Tree Estimation and the Impact of Gene Loss Following Whole-Genome Duplication. Syst Biol 2022; 71:1348-1361. [PMID: 35689633 PMCID: PMC9558847 DOI: 10.1093/sysbio/syac040] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/08/2021] [Revised: 06/03/2022] [Accepted: 06/07/2022] [Indexed: 12/02/2022]
Abstract
Whole-genome duplication (WGD) occurs broadly and repeatedly across the history of eukaryotes and is recognized as a prominent evolutionary force, especially in plants. Immediately following WGD, most genes are present in two copies as paralogs. Due to this redundancy, one copy of a paralog pair commonly undergoes pseudogenization and is eventually lost. When speciation occurs shortly after WGD, however, differential loss of paralogs may lead to spurious phylogenetic inference resulting from the inclusion of pseudoorthologs: paralogous genes mistakenly identified as orthologs because they are present in single copies within each sampled species. The influence and impact of including pseudoorthologs versus true orthologs as a result of gene extinction (or incomplete laboratory sampling) are only recently gaining empirical attention in the phylogenomics community. Moreover, few studies have investigated this phenomenon in an explicit coalescent framework. Here, using mathematical models, numerous simulated data sets, and two newly assembled empirical data sets, we assess the effect of pseudoorthologs on species tree estimation under varying degrees of incomplete lineage sorting (ILS) and differential gene loss scenarios following WGD. When gene loss occurs along the terminal branches of the species tree, alignment-based (BPP) and gene-tree-based (ASTRAL, MP-EST, and STAR) coalescent methods are adversely affected as the degree of ILS increases. This can be greatly improved by sampling a sufficiently large number of genes. Under the same circumstances, however, concatenation methods consistently estimate incorrect species trees as the number of genes increases. Additionally, pseudoorthologs can greatly mislead species tree inference when gene loss occurs along the internal branches of the species tree. Here, both coalescent and concatenation methods yield inconsistent results. These results underscore the importance of understanding the influence of pseudoorthologs in the phylogenomics era. [Coalescent method; concatenation method; incomplete lineage sorting; pseudoorthologs; single-copy gene; whole-genome duplication.]
Affiliation(s)
- Haifeng Xiong
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Danying Wang
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Chen Shao
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Xuchen Yang
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Jialin Yang
  - Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Tao Ma
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Charles C Davis
  - Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA 02138, USA
- Liang Liu
  - Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Zhenxiang Xi
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China

3
Zhang C, Zhao Y, Braun EL, Mirarab S. TAPER: Pinpointing errors in multiple sequence alignments despite varying rates of evolution. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13696] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022]
Affiliation(s)
- Chao Zhang
  - Bioinformatics and Systems Biology Program, University of California, San Diego, CA, USA
- Yiming Zhao
  - Electrical and Computer Engineering Department, University of California, San Diego, CA, USA
- Edward L. Braun
  - Department of Biology and Genetics Institute, University of Florida, Gainesville, FL, USA
- Siavash Mirarab
  - Electrical and Computer Engineering Department, University of California, San Diego, CA, USA

4
Shen XX, Steenwyk JL, Rokas A. Dissecting incongruence between concatenation- and quartet-based approaches in phylogenomic data. Syst Biol 2021; 70:997-1014. [PMID: 33616672 DOI: 10.1093/sysbio/syab011] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/08/2020] [Revised: 02/10/2021] [Accepted: 02/17/2021] [Indexed: 12/12/2022]
Abstract
Topological conflict or incongruence is widespread in phylogenomic data. Concatenation- and coalescent-based approaches often result in incongruent topologies, but the causes of this conflict can be difficult to characterize. We examined incongruence stemming from conflict between likelihood-based signal (quantified by the difference in gene-wise log likelihood score or ΔGLS) and quartet-based topological signal (quantified by the difference in gene-wise quartet score or ΔGQS) for every gene in three phylogenomic studies in animals, fungi, and plants, which were chosen because their concatenation-based IQ-TREE (T1) and quartet-based ASTRAL (T2) phylogenies are known to produce eight conflicting internal branches (bipartitions). By comparing the types of phylogenetic signal for all genes in these three data matrices, we found that 30% - 36% of genes in each data matrix are inconsistent, that is, each of these genes has higher log likelihood score for T1 versus T2 (i.e., ΔGLS >0) whereas its T1 topology has lower quartet score than its T2 topology (i.e., ΔGQS <0) or vice versa. Comparison of inconsistent and consistent genes using a variety of metrics (e.g., evolutionary rate, gene tree topology, distribution of branch lengths, hidden paralogy, and gene tree discordance) showed that inconsistent genes are more likely to recover neither T1 nor T2 and have higher levels of gene tree discordance than consistent genes. Simulation analyses demonstrate that removal of inconsistent genes from datasets with low levels of incomplete lineage sorting (ILS) and low and medium levels of gene tree estimation error (GTEE) reduced incongruence and increased accuracy. In contrast, removal of inconsistent genes from datasets with medium and high ILS levels and high GTEE levels eliminated or extensively reduced incongruence, but the resulting congruent species phylogenies were not always topologically identical to the true species trees.
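The consistency rule stated in this abstract (a gene is inconsistent when ΔGLS and ΔGQS have opposite signs) can be written as a small classifier. This is a sketch under stated assumptions, not the authors' code: the function name `classify_gene`, the handling of zero differences as "uninformative", and the example values are hypothetical.

```python
def classify_gene(delta_gls, delta_gqs):
    """Classify one gene from its two signal differences.

    delta_gls: gene-wise log-likelihood of T1 minus that of T2 (ΔGLS).
    delta_gqs: gene-wise quartet score of T1 minus that of T2 (ΔGQS).
    """
    # Assumption: a zero difference favors neither tree, so it is set aside.
    if delta_gls == 0 or delta_gqs == 0:
        return "uninformative"
    # Same sign: both signals favor the same topology.
    if (delta_gls > 0) == (delta_gqs > 0):
        return "consistent"
    # Opposite signs: likelihood favors one tree, quartets favor the other.
    return "inconsistent"


# Hypothetical per-gene values, for illustration only.
genes = {
    "gene1": (2.5, 0.04),    # both favor T1
    "gene2": (1.2, -0.03),   # likelihood favors T1, quartets favor T2
    "gene3": (-0.8, -0.10),  # both favor T2
}
labels = {name: classify_gene(gls, gqs) for name, (gls, gqs) in genes.items()}
print(labels)
```

Genes labeled "inconsistent" are the ones whose removal the abstract evaluates in its simulation analyses.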
Affiliation(s)
- Xing-Xing Shen
  - State Key Laboratory of Rice Biology and Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China; Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Jacob L Steenwyk
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Antonis Rokas
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA

5
Vazquez JM, Lynch VJ. Pervasive duplication of tumor suppressors in Afrotherians during the evolution of large bodies and reduced cancer risk. eLife 2021; 10:e65041. [PMID: 33513090 PMCID: PMC7952090 DOI: 10.7554/elife.65041] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 11/19/2020] [Accepted: 01/28/2021] [Indexed: 12/11/2022]
Abstract
The risk of developing cancer is correlated with body size and lifespan within species. Between species, however, there is no correlation between cancer and either body size or lifespan, indicating that large, long-lived species have evolved enhanced cancer protection mechanisms. Elephants and their relatives (Proboscideans) are a particularly interesting lineage for the exploration of mechanisms underlying the evolution of augmented cancer resistance because they evolved large bodies recently within a clade of smaller-bodied species (Afrotherians). Here, we explore the contribution of gene duplication to body size and cancer risk in Afrotherians. Unexpectedly, we found that tumor suppressor duplication was pervasive in Afrotherian genomes, rather than restricted to Proboscideans. Proboscideans, however, have duplicates in unique pathways that may underlie some aspects of their remarkable anti-cancer cell biology. These data suggest that duplication of tumor suppressor genes facilitated the evolution of increased body size by compensating for decreasing intrinsic cancer risk.
Affiliation(s)
- Juan M Vazquez
  - Department of Human Genetics, The University of Chicago, Chicago, United States
- Vincent J Lynch
  - Department of Biological Sciences, University at Buffalo, Buffalo, United States

6
Correia K, Mahadevan R. Pan‐Genome‐Scale Network Reconstruction: Harnessing Phylogenomics Increases the Quantity and Quality of Metabolic Models. Biotechnol J 2020; 15:e1900519. [DOI: 10.1002/biot.201900519] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/19/2019] [Revised: 07/22/2020] [Indexed: 12/31/2022]
Affiliation(s)
- Kevin Correia
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, 200 College Street, Toronto, Ontario M5S 3E5, Canada
- Radhakrishnan Mahadevan
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, 200 College Street, Toronto, Ontario M5S 3E5, Canada
  - Institute of Biomedical Engineering, University of Toronto, 164 College Street, Toronto, Ontario M5S 3G9, Canada

7
Agüero-Chapin G, Galpert D, Molina-Ruiz R, Ancede-Gallardo E, Pérez-Machado G, De la Riva GA, Antunes A. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules 2019; 10:E26. [PMID: 31878100 PMCID: PMC7022958 DOI: 10.3390/biom10010026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/19/2019] [Revised: 12/16/2019] [Accepted: 12/18/2019] [Indexed: 12/23/2022]
Abstract
Alignment-free (AF) methodologies have increased in popularity in the last decades as alternative tools to alignment-based (AB) algorithms for performing comparative sequence analyses. They have been especially useful to detect remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular alignment-free methodologies, as well as their applications to classification problems, have been described in previous reviews. Despite a new set of graph theory-derived sequence/structural descriptors that have been gaining relevance in the detection of remote homology, they have been omitted as AF predictors when the topic is addressed. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then bring out the state-of-the-art tools encoding graph theory-derived sequence/structure descriptors and their success for identifying remote homologs. We also highlight the tendency of integrating AF features/measures with the AB ones, either into the same prediction model or by assembling the predictions from different algorithms using voting/weighting strategies, for improving the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the achieved experiences in remote homology detection by both the most popular AF tools and other less known ones, we provide our own using the graphical-numerical methodologies, MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone by using several similar criteria.
Affiliation(s)
- Guillermin Agüero-Chapin
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Deborah Galpert
  - Departamento de Ciencia de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Reinaldo Molina-Ruiz
  - Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Evys Ancede-Gallardo
  - Programa de Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andrés Bello, Av. República 239, Santiago 8370146, Chile
- Gisselle Pérez-Machado
  - EpiDisease S.L. Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain
- Gustavo A. De la Riva
  - Laboratorio de Biotecnología Aplicada S. de R.L. de C.V., GRECA Inc., Carretera La Piedad-Carapán, km 3.5, La Piedad, Michoacán 59300, Mexico
  - Tecnológico Nacional de México, Instituto Tecnológico de la Piedad, Av. Ricardo Guzmán Romero, Santa Fe, La Piedad de Cavadas, Michoacán 59370, Mexico
- Agostinho Antunes
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal

8
Correia K, Yu SM, Mahadevan R. AYbRAH: a curated ortholog database for yeasts and fungi spanning 600 million years of evolution. Database (Oxford) 2019; 2019:5403499. [PMID: 30893420 PMCID: PMC6425859 DOI: 10.1093/database/baz022] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/30/2018] [Revised: 01/17/2019] [Accepted: 01/28/2019] [Indexed: 12/14/2022]
Abstract
Budding yeasts inhabit a range of environments by exploiting various metabolic traits. The genetic bases for these traits are mostly unknown, preventing their addition or removal in a chassis organism for metabolic engineering. Insight into the evolution of orthologs, paralogs and xenologs in the yeast pan-genome can help bridge these genotypes; however, existing phylogenomic databases do not span diverse yeasts, and sometimes cannot distinguish between these homologs. To help understand the molecular evolution of these traits in yeasts, we created Analyzing Yeasts by Reconstructing Ancestry of Homologs (AYbRAH), an open-source database of predicted and manually curated ortholog groups for 33 diverse fungi and yeasts in Dikarya, spanning 600 million years of evolution. OrthoMCL and OrthoDB were used to cluster protein sequence into ortholog and homolog groups, respectively; MAFFT and PhyML reconstructed the phylogeny of all homolog groups. Ortholog assignments for enzymes and small metabolite transporters were compared to their phylogenetic reconstruction, and curated to resolve any discrepancies. Information on homolog and ortholog groups can be viewed in the AYbRAH web portal (https://lmse.github.io/aybrah/), including functional annotations, predictions for mitochondrial localization and transmembrane domains, literature references and phylogenetic reconstructions. Ortholog assignments in AYbRAH were compared to HOGENOM, KEGG Orthology, OMA, eggNOG and PANTHER. PANTHER and OMA had the most congruent ortholog groups with AYbRAH, while the other phylogenomic databases had greater amounts of under-clustering, over-clustering or no ortholog annotations for proteins. Future plans are discussed for AYbRAH, and recommendations are made for other research communities seeking to create curated ortholog databases.
Affiliation(s)
- Kevin Correia
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, College Street, Toronto, ON, Canada
- Shi M Yu
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, College Street, Toronto, ON, Canada
- Radhakrishnan Mahadevan
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, College Street, Toronto, ON, Canada; Institute of Biomaterials and Biomedical Engineering, University of Toronto, College Street, Toronto, ON, Canada

9
Mead ME, Knowles SL, Raja HA, Beattie SR, Kowalski CH, Steenwyk JL, Silva LP, Chiaratto J, Ries LNA, Goldman GH, Cramer RA, Oberlies NH, Rokas A. Characterizing the Pathogenic, Genomic, and Chemical Traits of Aspergillus fischeri, a Close Relative of the Major Human Fungal Pathogen Aspergillus fumigatus. mSphere 2019; 4:e00018-19. [PMID: 30787113 PMCID: PMC6382966 DOI: 10.1128/msphere.00018-19] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 02/01/2019] [Accepted: 02/04/2019] [Indexed: 12/15/2022]
Abstract
Aspergillus fischeri is closely related to Aspergillus fumigatus, the major cause of invasive mold infections. Even though A. fischeri is commonly found in diverse environments, including hospitals, it rarely causes invasive disease. Why A. fischeri causes less human disease than A. fumigatus is unclear. A comparison of A. fischeri and A. fumigatus for pathogenic, genomic, and secondary metabolic traits revealed multiple differences in pathogenesis-related phenotypes. We observed that A. fischeri NRRL 181 is less virulent than A. fumigatus strain CEA10 in multiple animal models of disease, grows slower in low-oxygen environments, and is more sensitive to oxidative stress. Strikingly, the observed differences for some traits are of the same order of magnitude as those previously reported between A. fumigatus strains. In contrast, similar to what has previously been reported, the two species exhibit high genomic similarity; ∼90% of the A. fumigatus proteome is conserved in A. fischeri, including 48/49 genes known to be involved in A. fumigatus virulence. However, only 10/33 A. fumigatus biosynthetic gene clusters (BGCs) likely involved in secondary metabolite production are conserved in A. fischeri and only 13/48 A. fischeri BGCs are conserved in A. fumigatus. Detailed chemical characterization of A. fischeri cultures grown on multiple substrates identified multiple secondary metabolites, including two new compounds and one never before isolated as a natural product. Additionally, an A. fischeri deletion mutant of laeA, a master regulator of secondary metabolism, produced fewer secondary metabolites and in lower quantities, suggesting that regulation of secondary metabolism is at least partially conserved. These results suggest that the nonpathogenic A. fischeri possesses many of the genes important for A. fumigatus pathogenicity but is divergent with respect to its ability to thrive under host-relevant conditions and its secondary metabolism.
IMPORTANCE: Aspergillus fumigatus is the primary cause of aspergillosis, a devastating ensemble of diseases associated with severe morbidity and mortality worldwide. A. fischeri is a close relative of A. fumigatus but is not generally observed to cause human disease. To gain insights into the underlying causes of this remarkable difference in pathogenicity, we compared two representative strains (one from each species) for a range of pathogenesis-relevant biological and chemical characteristics. We found that disease progression in multiple A. fischeri mouse models was slower and caused less mortality than A. fumigatus. Remarkably, the observed differences between A. fischeri and A. fumigatus strains examined here closely resembled those previously described for two commonly studied A. fumigatus strains, AF293 and CEA10. A. fischeri and A. fumigatus exhibited different growth profiles when placed in a range of stress-inducing conditions encountered during infection, such as low levels of oxygen and the presence of chemicals that induce the production of reactive oxygen species. We also found that the vast majority of A. fumigatus genes known to be involved in virulence are conserved in A. fischeri, whereas the two species differ significantly in their secondary metabolic pathways. These similarities and differences that we report here are the first step toward understanding the evolutionary origin of a major fungal pathogen.
Affiliation(s)
- Matthew E Mead
  - Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, USA
- Sonja L Knowles
  - Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, Greensboro, North Carolina, USA
- Huzefa A Raja
  - Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, Greensboro, North Carolina, USA
- Sarah R Beattie
  - Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
- Caitlin H Kowalski
  - Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
- Jacob L Steenwyk
  - Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, USA
- Lilian P Silva
  - Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
- Jessica Chiaratto
  - Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
- Laure N A Ries
  - Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
- Gustavo H Goldman
  - Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
- Robert A Cramer
  - Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
- Nicholas H Oberlies
  - Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, Greensboro, North Carolina, USA
- Antonis Rokas
  - Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, USA

10
Knowles SL, Raja HA, Wright AJ, Lee AML, Caesar LK, Cech NB, Mead ME, Steenwyk JL, Ries LNA, Goldman GH, Rokas A, Oberlies NH. Mapping the Fungal Battlefield: Using in situ Chemistry and Deletion Mutants to Monitor Interspecific Chemical Interactions Between Fungi. Front Microbiol 2019; 10:285. [PMID: 30837981 PMCID: PMC6389630 DOI: 10.3389/fmicb.2019.00285] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 12/05/2018] [Accepted: 02/04/2019] [Indexed: 11/13/2022]
Abstract
Fungi grow in competitive environments, and to cope, they have evolved strategies, such as the ability to produce a wide range of secondary metabolites. This raises two related questions. First, how do secondary metabolites influence fungal ecology and interspecific interactions? Second, can these interspecific interactions provide a way to “see” how fungi respond, chemically, within a competitive environment? To evaluate these, and to gain insight into the secondary metabolic arsenal fungi possess, we co-cultured Aspergillus fischeri, a genetically tractable fungus that produces a suite of mycotoxins, with Xylaria cubensis, a fungus that produces the fungistatic compound and FDA-approved drug, griseofulvin. To monitor and characterize fungal chemistry in situ, we used the droplet-liquid microjunction-surface sampling probe (droplet probe). The droplet probe makes a microextraction at defined locations on the surface of the co-culture, followed by analysis of the secondary metabolite profile via liquid chromatography-mass spectrometry. Using this, we mapped and compared the spatial profiles of secondary metabolites from both fungi in monoculture versus co-culture. X. cubensis predominantly biosynthesized griseofulvin and dechlorogriseofulvin in monoculture. In contrast, under co-culture conditions a deadlock was formed between the two fungi, and X. cubensis biosynthesized the same two secondary metabolites, along with dechloro-5′-hydroxygriseofulvin and 5′-hydroxygriseofulvin, all of which have fungistatic properties, as well as mycotoxins like cytochalasin D and cytochalasin C. In contrast, in co-culture, A. fischeri increased the production of the mycotoxins fumitremorgin B and verruculogen, but otherwise remained unchanged relative to its monoculture. To evaluate whether secondary metabolites play an important role in defense and territory establishment, we co-cultured A. fischeri lacking the master regulator of secondary metabolism laeA with X. cubensis. We found that the reduced secondary metabolite biosynthesis of the ΔlaeA strain of A. fischeri eliminated the organism’s ability to compete in co-culture and led to its displacement by X. cubensis. These results demonstrate the potential of in situ chemical analysis and deletion mutant approaches for shedding light on the ecological roles of secondary metabolites and how they influence fungal ecological strategies; co-culturing may also stimulate the biosynthesis of secondary metabolites that are not produced in monoculture in the laboratory.
Affiliation(s)
- Sonja L Knowles
  - Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, Greensboro, NC, United States
- Huzefa A Raja
  - Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, Greensboro, NC, United States
- Allison J Wright
  - Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, Greensboro, NC, United States
- Ann Marie L Lee
  - Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, Greensboro, NC, United States
- Lindsay K Caesar
  - Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, Greensboro, NC, United States
- Nadja B Cech
  - Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, Greensboro, NC, United States
- Matthew E Mead
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States
- Jacob L Steenwyk
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States
- Laure N A Ries
  - Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
- Gustavo H Goldman
  - Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
- Antonis Rokas
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States
- Nicholas H Oberlies
  - Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, Greensboro, NC, United States

11
Bogaert KA, Blommaert L, Ljung K, Beeckman T, De Clerck O. Auxin Function in the Brown Alga Dictyota dichotoma. Plant Physiol 2019; 179:280-299. [PMID: 30420566 PMCID: PMC6324224 DOI: 10.1104/pp.18.01041] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 08/24/2018] [Accepted: 10/30/2018] [Indexed: 05/14/2023]
Abstract
Auxin controls body plan patterning in land plants and has been proposed to play a similar role in the development of brown algae (Phaeophyta) despite their distant evolutionary relationship with land plants. The mechanism of auxin action in brown algae remains controversial because of contradicting conclusions derived from pharmacological studies on Fucus. In this study, we used Dictyota dichotoma as a model system to show that auxin plays a role during the apical-basal patterning of the embryo of brown algae. Indole-3-acetic acid was detectable in D. dichotoma germlings and mature tissue. Although two-celled D. dichotoma zygotes normally develop a rhizoid from one pole and a thallus meristem from the other, addition of exogenous auxins to one-celled embryos affected polarization, and both poles of the spheroidal embryo developed into rhizoids instead. The effect was strongest at lower pH and when variable extrinsic informational cues were applied. 2-[4-(diethylamino)-2-hydroxybenzoyl]benzoic acid, an inhibitor of the ABC-B/multidrug resistance/P-glycoprotein subfamily of transporters in land plants, affected rhizoid formation by increasing rhizoid branching and inducing ectopic rhizoids. An in silico survey of auxin genes suggested that a diverse range of biosynthesis genes and transport genes, such as PIN-LIKES, and the ATP-binding cassette subfamily (ABC-B/multidrug resistance/P-glycoprotein) transporters from land plants have homologs in D. dichotoma and Ectocarpus siliculosus. Together with reports on auxin function in basal lineages of green algae, these results suggest that auxin function predates the divergence between the green and brown lineages and the transition toward land plants.
Affiliation(s)
- Kenny A Bogaert
- Department of Biology, Ghent University, 9000 Ghent, Belgium
- Karin Ljung
- Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-901 83 Umeå, Sweden
- Tom Beeckman
- VIB-UGent Center for Plant Systems Biology, B-9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium
12
Georgescu CH, Manson AL, Griggs AD, Desjardins CA, Pironti A, Wapinski I, Abeel T, Haas BJ, Earl AM. SynerClust: a highly scalable, synteny-aware orthologue clustering tool. Microb Genom 2018; 4. [PMID: 30418868 PMCID: PMC6321874 DOI: 10.1099/mgen.0.000231] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/18/2022]
Abstract
Accurate orthologue identification is a vital component of bacterial comparative genomic studies, but many popular sequence-similarity-based approaches do not scale well to the large numbers of genomes that are now generated routinely. Furthermore, most approaches do not take gene synteny into account, which is useful information for disentangling paralogues. Here, we present SynerClust, a user-friendly synteny-aware tool based on synergy that can process thousands of genomes. SynerClust was designed to analyse genomes with high levels of local synteny, particularly prokaryotes, which have operon structure. SynerClust's run-time is optimized by selecting cluster representatives at each node in the phylogeny, thus avoiding the need for exhaustive pairwise similarity searches. In benchmarking against Roary, Hieranoid2, PanX and Reciprocal Best Hit, SynerClust was able to more completely identify sets of core genes for datasets that included diverse strains, while using substantially less memory, and with scalability comparable to the fastest tools. Due to its scalability, ease of installation and use, and suitability for a variety of computing environments, orthogroup clustering using SynerClust will enable many large-scale prokaryotic comparative genomics efforts.
Affiliation(s)
- Thomas Abeel
- Broad Institute, Cambridge, MA, USA; Delft University of Technology, Delft, The Netherlands
13
Shen XX, Opulente DA, Kominek J, Zhou X, Steenwyk JL, Buh KV, Haase MAB, Wisecaver JH, Wang M, Doering DT, Boudouris JT, Schneider RM, Langdon QK, Ohkuma M, Endoh R, Takashima M, Manabe RI, Čadež N, Libkind D, Rosa CA, DeVirgilio J, Hulfachor AB, Groenewald M, Kurtzman CP, Hittinger CT, Rokas A. Tempo and Mode of Genome Evolution in the Budding Yeast Subphylum. Cell 2018; 175:1533-1545.e20. [PMID: 30415838 DOI: 10.1016/j.cell.2018.10.023] [Citation(s) in RCA: 344] [Impact Index Per Article: 57.3] [Received: 04/19/2018] [Revised: 08/12/2018] [Accepted: 10/04/2018] [Indexed: 11/17/2022]
Abstract
Budding yeasts (subphylum Saccharomycotina) are found in every biome and are as genetically diverse as plants or animals. To understand budding yeast evolution, we analyzed the genomes of 332 yeast species, including 220 newly sequenced ones, which represent nearly one-third of all known budding yeast diversity. Here, we establish a robust genus-level phylogeny comprising 12 major clades, infer the timescale of diversification from the Devonian period to the present, quantify horizontal gene transfer (HGT), and reconstruct the evolution of 45 metabolic traits and the metabolic toolkit of the budding yeast common ancestor (BYCA). We infer that BYCA was metabolically complex and chronicle the tempo and mode of genomic and phenotypic evolution across the subphylum, which is characterized by very low HGT levels and widespread losses of traits and the genes that control them. More generally, our results argue that reductive evolution is a major mode of evolutionary diversification.
Affiliation(s)
- Xing-Xing Shen
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Dana A Opulente
- Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, J.F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53706, USA
- Jacek Kominek
- Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, J.F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53706, USA
- Xiaofan Zhou
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA; Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, 510642 Guangzhou, China
- Jacob L Steenwyk
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Kelly V Buh
- Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, J.F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA
- Max A B Haase
- Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, J.F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53706, USA; Sackler Institute of Graduate Biomedical Sciences, NYU School of Medicine, New York, NY 10016, USA
- Jennifer H Wisecaver
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA; Department of Biochemistry, Center for Plant Biology, Purdue University, West Lafayette, IN 47907, USA
- Mingshuang Wang
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
- Drew T Doering
- Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, J.F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA
- James T Boudouris
- Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, J.F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA
- Rachel M Schneider
- Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, J.F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53706, USA
- Quinn K Langdon
- Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, J.F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA
- Moriya Ohkuma
- Japan Collection of Microorganisms, RIKEN BioResource Research Center, Tsukuba, Ibaraki 305-0074, Japan
- Rikiya Endoh
- Japan Collection of Microorganisms, RIKEN BioResource Research Center, Tsukuba, Ibaraki 305-0074, Japan
- Masako Takashima
- Japan Collection of Microorganisms, RIKEN BioResource Research Center, Tsukuba, Ibaraki 305-0074, Japan
- Ri-Ichiroh Manabe
- Division of Genomic Technologies, RIKEN Center For Life Science Technologies, Laboratory for Comprehensive Genomic Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
- Neža Čadež
- Biotechnical Faculty, University of Ljubljana, 1000 Ljubljana, Slovenia
- Diego Libkind
- Laboratorio de Microbiología Aplicada y Biotecnología, Instituto Andino Patagónico de Tecnologías Biológicas y Geoambientales (IPATEC), Consejo Nacional de Investigaciones, Científicas y Técnicas (CONICET)-Universidad Nacional del Comahue, 8400 Bariloche, Argentina
- Carlos A Rosa
- Departamento de Microbiologia, ICB, CP 486, Universidade Federal de Minas Gerais, Belo Horizonte, MG, 31270-901, Brazil
- Jeremy DeVirgilio
- Mycotoxin Prevention and Applied Microbiology Research Unit, National Center for Agricultural Utilization Research, Agricultural Research Service, U.S. Department of Agriculture, Peoria, IL 61604, USA
- Amanda Beth Hulfachor
- Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, J.F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA
- Cletus P Kurtzman
- Mycotoxin Prevention and Applied Microbiology Research Unit, National Center for Agricultural Utilization Research, Agricultural Research Service, U.S. Department of Agriculture, Peoria, IL 61604, USA
- Chris Todd Hittinger
- Laboratory of Genetics, Genome Center of Wisconsin, Wisconsin Energy Institute, J.F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53706, USA.
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA.
14
Goodswen SJ, Kennedy PJ, Ellis JT. A Gene-Based Positive Selection Detection Approach to Identify Vaccine Candidates Using Toxoplasma gondii as a Test Case Protozoan Pathogen. Front Genet 2018; 9:332. [PMID: 30177953 PMCID: PMC6109633 DOI: 10.3389/fgene.2018.00332] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/23/2018] [Accepted: 08/02/2018] [Indexed: 11/22/2022]
Abstract
Over the last two decades, various in silico approaches have been developed and refined that attempt to identify protein and/or peptide vaccine candidates from informative signals encoded in protein sequences of a target pathogen. To date, no signal has been identified that clearly indicates a protein will effectively contribute to a protective immune response in a host. The premise for this study is that proteins under positive selection from the immune system are more likely to be suitable vaccine candidates than proteins exposed to other selection pressures. Furthermore, our expectation is that protein sequence regions encoding major histocompatibility complex (MHC)-binding peptides will contain consecutive positive selection sites. Using freely available data and bioinformatic tools, we present a high-throughput approach through a pipeline that predicts positive selection sites, protein subcellular locations, and sequence locations of medium to high T-cell MHC class I binding peptides. Positive selection sites are estimated from a sequence alignment by comparing rates of synonymous (dS) and non-synonymous (dN) substitutions among protein coding sequences of orthologous genes in a phylogeny. The main pipeline output is a list of protein vaccine candidates predicted to be naturally exposed to the immune system and containing sites under positive selection. Candidates are ranked with respect to the number of consecutive sites located on protein sequence regions encoding MHCI-binding peptides. Results are constrained by the reliability of prediction programs and quality of input data. Protein sequences from Toxoplasma gondii ME49 strain (TGME49) were used as a case study. Surface antigen (SAG), dense granules (GRA), microneme (MIC), and rhoptry (ROP) proteins are considered worthy T. gondii candidates. Given 8263 TGME49 protein sequences processed anonymously, the top 10 predicted candidates were all worthy candidates. In particular, the top ten included ROP5 and ROP18, which are T. gondii virulence determinants. The chance of randomly selecting a ROP protein was 0.2% given 8263 sequences. We conclude that the approach described is a valuable addition to other in silico approaches to identify vaccine candidates worthy of laboratory validation and could be adapted for other apicomplexan parasite species (with appropriate data).
Affiliation(s)
- Stephen J Goodswen
- School of Life Sciences, University of Technology Sydney, Ultimo, NSW, Australia
- Paul J Kennedy
- School of Software, Faculty of Engineering and Information Technology, Centre for Artificial Intelligence, University of Technology Sydney, Ultimo, NSW, Australia
- John T Ellis
- School of Life Sciences, University of Technology Sydney, Ultimo, NSW, Australia
15
Mei S, Flemington EK, Zhang K. Transferring knowledge of bacterial protein interaction networks to predict pathogen targeted human genes and immune signaling pathways: a case study on M. tuberculosis. BMC Genomics 2018; 19:505. [PMID: 29954330 PMCID: PMC6027805 DOI: 10.1186/s12864-018-4873-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/21/2017] [Accepted: 06/18/2018] [Indexed: 12/11/2022]
Abstract
Background: Bacterial invasive infection and host immune response are fundamental to the understanding of pathogen pathogenesis and the discovery of effective therapeutic drugs. However, there are very few experimental studies on the signaling cross-talks between bacteria and human host to date. Methods: In this work, taking M. tuberculosis H37Rv (MTB), which is co-evolving with its human host, as an example, we propose a general computational framework that exploits the known bacterial pathogen protein interaction networks in the STRING database to predict pathogen-host protein interactions and their signaling cross-talks. In this framework, significant interlogs are derived from the known pathogen protein interaction networks to train a predictive l2-regularized logistic regression model. Results: The computational results show that the proposed method achieves excellent cross-validation performance as well as low predicted positive rates on the less significant interlogs and non-interlogs, indicating a low risk of false discovery. We further conduct gene ontology (GO) and pathway enrichment analyses of the predicted pathogen-host protein interaction networks, which potentially provide insights into the machinery by which M. tuberculosis H37Rv targets human genes and signaling pathways. In addition, we analyse the pathogen-host protein interactions related to drug resistance, inhibition of which potentially provides an alternative solution to M. tuberculosis H37Rv drug resistance. Conclusions: The proposed machine learning framework has been verified as effective for predicting bacteria-host protein interactions via known bacterial protein interaction networks. For the vast majority of bacterial pathogens that lack experimental studies of bacteria-host protein interactions, this framework is expected to be generally applicable. The predicted protein interaction networks between M. tuberculosis H37Rv and Homo sapiens, provided in the Additional files, promise applications in two areas: (1) providing an alternative solution to drug resistance; (2) revealing the patterns by which M. tuberculosis H37Rv genes target human immune signaling pathways.
Affiliation(s)
- Suyu Mei
- Software College, Shenyang Normal University, Shenyang, 110034, China.
- Erik K Flemington
- Department of Pathology, Tulane Cancer Center, Tulane University, New Orleans, LA, 70112, USA.
- Kun Zhang
- Department of Computer Science, Bioinformatics facility of Xavier NIH RCMI Cancer Research Center, Xavier University of Louisiana, New Orleans, LA, 70125, USA.
16
Galpert D, Fernández A, Herrera F, Antunes A, Molina-Ruiz R, Agüero-Chapin G. Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers. BMC Bioinformatics 2018; 19:166. [PMID: 29724166 PMCID: PMC5934817 DOI: 10.1186/s12859-018-2148-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/13/2017] [Accepted: 04/04/2018] [Indexed: 12/24/2022]
Abstract
BACKGROUND: The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Although several pairwise protein features were combined in a supervised big data approach, they were all, to some extent, alignment-based features, and the proposed algorithms were evaluated on a unique test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. RESULTS: The Spark Random Forest and Decision Trees with oversampling and undersampling techniques, built with only alignment-based similarity measures or combined with several alignment-free pairwise protein features, showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Only when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management could a higher success rate (98.71%) within the twilight zone be achieved for a yeast proteome pair that underwent a whole genome duplication. The feature selection study showed that alignment-based features were top-ranked for the best classifiers while the runners-up were alignment-free features related to amino acid composition. CONCLUSIONS: The incorporation of alignment-free features in supervised big data models did not significantly improve ortholog detection in yeast proteomes regarding the classification qualities achieved with just alignment-based similarity measures. However, the similarity of their classification performance to that of traditional ortholog detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.
Affiliation(s)
- Deborah Galpert
- Departamento de Ciencia de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Alberto Fernández
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071, Granada, Spain
- Francisco Herrera
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071, Granada, Spain
- Agostinho Antunes
- CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Reinaldo Molina-Ruiz
- Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Guillermin Agüero-Chapin
- CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal; Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), 54830, Santa Clara, Cuba
17
Kallal RJ, Fernández R, Giribet G, Hormiga G. A phylotranscriptomic backbone of the orb-weaving spider family Araneidae (Arachnida, Araneae) supported by multiple methodological approaches. Mol Phylogenet Evol 2018; 126:129-140. [PMID: 29635025 DOI: 10.1016/j.ympev.2018.04.007] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 01/08/2018] [Revised: 03/05/2018] [Accepted: 04/06/2018] [Indexed: 01/01/2023]
Abstract
The orb-weaving spider family Araneidae is extremely diverse (>3100 spp.) and its members can be charismatic terrestrial arthropods, many of them recognizable by their iconic orbicular snare web, such as the common garden spiders. Despite considerable effort to better understand their backbone relationships based on multiple sources of data (morphological, behavioral and molecular), pervasive low support remains in recent studies. In addition, no overarching phylogeny of araneids is available to date, hampering further comparative work. In this study, we analyze the transcriptomes of 33 taxa, including 19 araneids - 12 of them new to this study - representing most of the core family lineages, to examine the relationships within the family using genomic-scale datasets resulting from various methodological treatments, namely ortholog selection and gene occupancy as a measure of matrix completion. Six matrices were constructed to assess these effects by varying orthology inference method and gene occupancy threshold. Orthology methods used are the benchmarking tool BUSCO and the tree-based method UPhO; three gene occupancy thresholds (45%, 65%, 85%) were used to assess the effect of missing data. Gene tree and species tree-based methods (including multi-species coalescent and concatenation approaches, as well as maximum likelihood and Bayesian inference) were used totalling 17 analytical treatments. The monophyly of Araneidae and the placement of core araneid lineages were supported, together with some previously unsound backbone divergences; these include high support for Zygiellinae as the earliest diverging subfamily (followed by Nephilinae), the placement of Gasteracanthinae as sister group to Cyclosa and close relatives, and close relationships between the Araneus + Neoscona clade and Cyrtophorinae + Argiopinae clade. Incongruences were relegated to short branches in the clade comprising Cyclosa and its close relatives. 
We found congruence between most of the completed analyses, with minimal topological effects from occupancy/missing data and orthology assessment. The resulting number of genes by certain combinations of orthology and occupancy thresholds being analyzed had the greatest effect on the resulting trees, with anomalous outcomes recovered from analysis of lower numbers of genes.
Affiliation(s)
- Robert J Kallal
- Department of Biological Sciences, The George Washington University, 2029 G St. NW, Washington, DC 20052, USA.
- Rosa Fernández
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St., Cambridge, MA 02138, USA; Bioinformatics and Genomics Unit, Center for Genomic Regulation, Carrer del Dr. Aiguader 88, 08003 Barcelona, Spain
- Gonzalo Giribet
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St., Cambridge, MA 02138, USA
- Gustavo Hormiga
- Department of Biological Sciences, The George Washington University, 2029 G St. NW, Washington, DC 20052, USA
18
McGirr JA, Martin CH. Parallel evolution of gene expression between trophic specialists despite divergent genotypes and morphologies. Evol Lett 2018; 2:62-75. [PMID: 30283665 PMCID: PMC6089502 DOI: 10.1002/evl3.41] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 09/08/2017] [Revised: 12/22/2017] [Accepted: 01/03/2018] [Indexed: 12/20/2022]
Abstract
Parallel evolution of gene expression commonly underlies convergent niche specialization, but parallel changes in expression could also underlie divergent specialization. We investigated divergence in gene expression and whole-genome genetic variation across three sympatric Cyprinodon pupfishes endemic to San Salvador Island, Bahamas. This recent radiation consists of a generalist and two derived specialists adapted to novel niches: a scale-eating and a snail-eating pupfish. We sampled total mRNA from all three species at two early developmental stages and compared gene expression with whole-genome genetic differentiation among all three species in 42 resequenced genomes. Eighty percent of genes that were differentially expressed between snail-eaters and generalists were up or down regulated in the same direction between scale-eaters and generalists; however, there were no fixed variants shared between species underlying these parallel changes in expression. Genes showing parallel evolution of expression were enriched for effects on metabolic processes, whereas genes showing divergent expression were enriched for effects on cranial skeleton development and pigment biosynthesis, reflecting the most divergent phenotypes observed between specialist species. Our findings reveal that even divergent niche specialists may exhibit convergent adaptation to higher trophic levels through shared genetic pathways. This counterintuitive result suggests that parallel evolution in gene expression can accompany divergent ecological speciation during adaptive radiation.
Affiliation(s)
- Joseph A. McGirr
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27514
- Christopher H. Martin
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27514
19
Schreiber HL, Conover MS, Chou WC, Hibbing ME, Manson AL, Dodson KW, Hannan TJ, Roberts PL, Stapleton AE, Hooton TM, Livny J, Earl AM, Hultgren SJ. Bacterial virulence phenotypes of Escherichia coli and host susceptibility determine risk for urinary tract infections. Sci Transl Med 2017; 9:9/382/eaaf1283. [PMID: 28330863 DOI: 10.1126/scitranslmed.aaf1283] [Citation(s) in RCA: 98] [Impact Index Per Article: 14.0] [Received: 12/23/2015] [Revised: 05/12/2016] [Accepted: 12/12/2016] [Indexed: 01/01/2023]
Abstract
Urinary tract infections (UTIs) are caused by uropathogenic Escherichia coli (UPEC) strains. In contrast to many enteric E. coli pathogroups, no genetic signature has been identified for UPEC strains. We conducted a high-resolution comparative genomic study using E. coli isolates collected from the urine of women suffering from frequent recurrent UTIs. These isolates were genetically diverse and varied in their urovirulence, that is, their ability to infect the bladder in a mouse model of cystitis. We found no set of genes, including previously defined putative urovirulence factors (PUFs), that were predictive of urovirulence. In addition, in some patients, the E. coli strain causing a recurrent UTI had fewer PUFs than the supplanted strain. In competitive experimental infections in mice, the supplanting strain was more efficient at colonizing the mouse bladder than the supplanted strain. Despite the lack of a clear genomic signature for urovirulence, comparative transcriptomic and phenotypic analyses revealed that the expression of key conserved functions during culture, such as motility and metabolism, could be used to predict subsequent colonization of the mouse bladder. Together, our findings suggest that UTI risk and outcome may be determined by complex interactions between host susceptibility and the urovirulence potential of diverse bacterial strains.
Affiliation(s)
- Henry L Schreiber
- Department of Molecular Microbiology, Washington University, St. Louis, MO 63110, USA; Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Matt S Conover
- Department of Molecular Microbiology, Washington University, St. Louis, MO 63110, USA
- Wen-Chi Chou
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Michael E Hibbing
- Department of Molecular Microbiology, Washington University, St. Louis, MO 63110, USA
- Abigail L Manson
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
- Karen W Dodson
- Department of Molecular Microbiology, Washington University, St. Louis, MO 63110, USA
- Thomas J Hannan
- Department of Pathology and Immunology, Washington University, St. Louis, MO 63110, USA
- Pacita L Roberts
- Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Ann E Stapleton
- Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Thomas M Hooton
- Division of Infectious Diseases, Department of Medicine, University of Miami Miller School of Medicine, Miami, FL 33136, USA
- Jonathan Livny
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA.
| | - Ashlee M Earl
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA.
| | - Scott J Hultgren
- Department of Molecular Microbiology, Washington University, St. Louis, MO 63110, USA. .,Center for Women's Infectious Disease Research, Washington University, St. Louis, MO 63110, USA
| |
Collapse
|
20
|
Eberlein C, Nielly-Thibault L, Maaroufi H, Dubé AK, Leducq JB, Charron G, Landry CR. The Rapid Evolution of an Ohnolog Contributes to the Ecological Specialization of Incipient Yeast Species. Mol Biol Evol 2017; 34:2173-2186. [PMID: 28482005 DOI: 10.1093/molbev/msx153] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/11/2022] Open
Abstract
Identifying the molecular changes that lead to ecological specialization during speciation is one of the major goals of molecular evolution. One question that remains to be thoroughly investigated is whether ecological specialization derives strictly from adaptive changes and their associated trade-offs, or from conditionally neutral mutations that accumulate under relaxed selection. We used whole-genome sequencing, genome annotation and computational analyses to identify genes that have rapidly diverged between two incipient species of Saccharomyces paradoxus that occupy different climatic regions along a south-west to north-east gradient. As candidate loci for ecological specialization, we identified genes that show signatures of adaptation and accelerated rates of amino acid substitutions, causing asymmetric evolution between lineages. This set of genes includes a glycyl-tRNA-synthetase, GRS2, which is known to be transcriptionally induced under heat stress in the model and sister species S. cerevisiae. Molecular modelling, expression analysis and fitness assays suggest that the accelerated evolution of this gene in the Northern lineage may be caused by relaxed selection. GRS2 arose during the whole-genome duplication (WGD) that occurred 100 million years ago in the yeast lineage. While its ohnolog GRS1 has been preserved in all post-WGD species, GRS2 has frequently been lost and is evolving rapidly, suggesting that the fate of this ohnolog is still to be resolved. Our results suggest that the asymmetric evolution of GRS2 between the two incipient S. paradoxus species contributes to their restricted climatic distributions and thus that ecological specialization derives at least partly from relaxed selection rather than a molecular trade-off resulting from adaptive evolution.
Affiliation(s)
- Chris Eberlein
  - Département de Biologie, Université Laval, Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - PROTEO, The Quebec Network for Research on Protein Function, Engineering and Applications, Québec, QC, Canada
- Lou Nielly-Thibault
  - Département de Biologie, Université Laval, Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - PROTEO, The Quebec Network for Research on Protein Function, Engineering and Applications, Québec, QC, Canada
  - Big Data Research Center (CRDM), Université Laval, Québec, QC, Canada
- Halim Maaroufi
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Alexandre K Dubé
  - Département de Biologie, Université Laval, Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - PROTEO, The Quebec Network for Research on Protein Function, Engineering and Applications, Québec, QC, Canada
- Jean-Baptiste Leducq
  - Département de Biologie, Université Laval, Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Guillaume Charron
  - Département de Biologie, Université Laval, Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - PROTEO, The Quebec Network for Research on Protein Function, Engineering and Applications, Québec, QC, Canada
- Christian R Landry
  - Département de Biologie, Université Laval, Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - PROTEO, The Quebec Network for Research on Protein Function, Engineering and Applications, Québec, QC, Canada
  - Big Data Research Center (CRDM), Université Laval, Québec, QC, Canada
|
21
|
SMORE: Synteny Modulator of Repetitive Elements. Life (Basel) 2017; 7:life7040042. [PMID: 29088079 PMCID: PMC5745555 DOI: 10.3390/life7040042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/28/2017] [Revised: 10/27/2017] [Accepted: 10/28/2017] [Indexed: 12/19/2022] Open
Abstract
Several families of multicopy genes, such as transfer ribonucleic acids (tRNAs) and ribosomal RNAs (rRNAs), are subject to concerted evolution, an effect that keeps sequences of paralogous genes effectively identical. Under these circumstances, it is impossible to distinguish orthologs from paralogs on the basis of sequence similarity alone. Synteny, the preservation of relative genomic locations, however, remains informative for the disambiguation of evolutionary relationships in this situation. In this contribution, we describe an automatic pipeline for the evolutionary analysis of such cases that uses genome-wide alignments as a starting point to assign orthology relationships determined by synteny. The evolution of tRNAs in primates as well as the history of the Y RNA family in vertebrates and nematodes are used to showcase the method. The pipeline is freely available.
|
22
|
Mahajan G, Mande SC. Using structural knowledge in the protein data bank to inform the search for potential host-microbe protein interactions in sequence space: application to Mycobacterium tuberculosis. BMC Bioinformatics 2017; 18:201. [PMID: 28376709 PMCID: PMC5379762 DOI: 10.1186/s12859-017-1550-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 08/17/2016] [Accepted: 02/16/2017] [Indexed: 12/31/2022] Open
Abstract
Background: A comprehensive map of the human-M. tuberculosis (MTB) protein interactome would help fill the gaps in our understanding of the disease, and computational prediction can aid and complement experimental studies towards this end. Several sequence-based in silico approaches tap the existing data on experimentally validated protein-protein interactions (PPIs); these PPIs serve as templates from which novel interactions between pathogen and host are inferred. Such comparative approaches typically make use of local sequence alignment, which, in the absence of structural details about the interfaces mediating the template interactions, could lead to incorrect inferences, particularly when multi-domain proteins are involved.
Results: We propose leveraging the domain-domain interaction (DDI) information in PDB complexes to score and prioritize candidate PPIs between host and pathogen proteomes based on targeted sequence-level comparisons. Our method picks out a small set of human-MTB protein pairs as candidates for physical interactions, and the use of functional meta-data suggests that some of them could contribute to the in vivo molecular cross-talk between pathogen and host that regulates the course of the infection. Further, we present numerical data for Pfam domain families that highlights interaction specificity on the domain level. Not every instance of a pair of domains, for which interaction evidence has been found in a few instances (i.e. structures), is likely to functionally interact. Our sorting approach scores candidates according to how “distant” they are in sequence space from known examples of DDIs (templates). Thus, it provides a natural way to deal with the heterogeneity in domain-level interactions.
Conclusions: Our method represents a more informed application of local alignment to the sequence-based search for potential human-microbial interactions that uses available PPI data as a prior. Our approach is somewhat limited in its sensitivity by the restricted size and diversity of the template dataset, but, given the rapid accumulation of solved protein complex structures, its scope and utility are expected to keep steadily improving.
Affiliation(s)
- Gaurang Mahajan
  - National Centre for Cell Science, Ganeshkhind, Pune, 411 007, India
  - Indian Institute of Science Education and Research, Pashan, Pune, 411 008, India
- Shekhar C Mande
  - National Centre for Cell Science, Ganeshkhind, Pune, 411 007, India
|
23
|
Dupont PY, Cox MP. Genomic Data Quality Impacts Automated Detection of Lateral Gene Transfer in Fungi. G3 (Bethesda) 2017; 7:1301-1314. [PMID: 28235827 PMCID: PMC5386878 DOI: 10.1534/g3.116.038448] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 12/13/2016] [Accepted: 02/17/2017] [Indexed: 12/26/2022]
Abstract
Lateral gene transfer (LGT, also known as horizontal gene transfer), an atypical mechanism of transferring genes between species, has almost become the default explanation for genes that display an unexpected composition or phylogeny. Numerous methods of detecting LGT events all rely on two fundamental strategies: primary structure composition or gene tree/species tree comparisons. Discouragingly, the results of these different approaches rarely coincide. With the wealth of genome data now available, detection of laterally transferred genes is increasingly being attempted in large uncurated eukaryotic datasets. However, detection methods depend greatly on the quality of the underlying genomic data, which are typically complex for eukaryotes. Furthermore, given the automated nature of genomic data collection, it is typically impractical to manually verify all protein or gene models, orthology predictions, and multiple sequence alignments, requiring researchers to accept a substantial margin of error in their datasets. Using a test case comprising plant-associated genomes across the fungal kingdom, this study reveals that composition- and phylogeny-based methods have little statistical power to detect laterally transferred genes. In particular, phylogenetic methods reveal extreme levels of topological variation in fungal gene trees, the vast majority of which show departures from the canonical species tree. Therefore, it is inherently challenging to detect LGT events in typical eukaryotic genomes. This finding is in striking contrast to the large number of claims for laterally transferred genes in eukaryotic species that routinely appear in the literature, and questions how many of these proposed examples are statistically well supported.
Affiliation(s)
- Pierre-Yves Dupont
  - Statistics and Bioinformatics Group, Institute of Fundamental Sciences, Massey University, Palmerston North 4442, New Zealand
  - Bio-Protection Research Centre, Massey University, Palmerston North 4442, New Zealand
- Murray P Cox
  - Statistics and Bioinformatics Group, Institute of Fundamental Sciences, Massey University, Palmerston North 4442, New Zealand
  - Bio-Protection Research Centre, Massey University, Palmerston North 4442, New Zealand
|
24
|
Steenwyk JL, Soghigian JS, Perfect JR, Gibbons JG. Copy number variation contributes to cryptic genetic variation in outbreak lineages of Cryptococcus gattii from the North American Pacific Northwest. BMC Genomics 2016; 17:700. [PMID: 27590805 PMCID: PMC5009542 DOI: 10.1186/s12864-016-3044-0] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 06/08/2016] [Accepted: 08/24/2016] [Indexed: 12/13/2022] Open
Abstract
Background: Copy number variants (CNVs) are a class of structural variants (SVs) and are defined as fragments of DNA that are present at variable copy number in comparison with a reference genome. Recent advances in bioinformatics methodologies and sequencing technologies have enabled the high-resolution quantification of genome-wide CNVs. In pathogenic fungi SVs have been shown to alter gene expression, influence host specificity, and drive fungicide resistance, but little attention has focused specifically on CNVs. Using publicly available sequencing data, we identified 90 isolates across 212 Cryptococcus gattii genomes that belong to the VGII subgroups responsible for the recent deadly outbreaks in the North American Pacific Northwest. We generated CNV profiles for each sample to investigate the prevalence and function of CNV in C. gattii.
Results: We identified eight genetic clusters among publicly available Illumina whole genome sequence data from 212 C. gattii isolates through population structure analysis. Three clusters represent the VGIIa, VGIIb, and VGIIc subgroups from the North American Pacific Northwest. CNV was bioinformatically predicted and affected ~300–400 Kilobases (Kb) of the C. gattii VGII subgroup genomes. Sixty-seven loci, encompassing 58 genes, showed highly divergent patterns of copy number variation between VGII subgroups. Analysis of PFam domains within divergent CN variable genes revealed enrichment of protein domains associated with transport, cell wall organization and external encapsulating structure.
Conclusions: CNVs may contribute to pathological and phenotypic differences observed between the C. gattii VGIIa, VGIIb, and VGIIc subpopulations. Genes overlapping with population differentiated CNVs were enriched for several virulence related functional terms. These results uncover novel candidate genes to examine the genetic and functional underpinnings of C. gattii pathogenicity.
Affiliation(s)
- Jacob L Steenwyk
  - Biology Department, Clark University, 950 Main Street, Worcester, MA, USA
  - Current address: Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- John S Soghigian
  - Biology Department, Clark University, 950 Main Street, Worcester, MA, USA
  - Current address: Department of Environmental Sciences, The Connecticut Agricultural Experiment Station, New Haven, CT, USA
- John R Perfect
  - Division of Infectious Diseases, Department of Medicine, Duke University Medical Center, Durham, NC, USA
- John G Gibbons
  - Biology Department, Clark University, 950 Main Street, Worcester, MA, USA
|
25
|
Fernández R, Edgecombe GD, Giribet G. Exploring Phylogenetic Relationships within Myriapoda and the Effects of Matrix Composition and Occupancy on Phylogenomic Reconstruction. Syst Biol 2016; 65:871-89. [PMID: 27162151 PMCID: PMC4997009 DOI: 10.1093/sysbio/syw041] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Received: 12/06/2015] [Accepted: 04/28/2016] [Indexed: 11/14/2022] Open
Abstract
Myriapods, including the diverse and familiar centipedes and millipedes, are one of the dominant terrestrial arthropod groups. Although molecular evidence has shown that Myriapoda is monophyletic, its internal phylogeny remains contentious and understudied, especially when compared to those of Chelicerata and Hexapoda. Until now, efforts have focused on taxon sampling (e.g., by including a handful of genes from many species) or on maximizing matrix size (e.g., by including hundreds or thousands of genes in just a few species), but a phylogeny maximizing sampling at both levels remains elusive. In this study, we analyzed 40 Illumina transcriptomes representing 3 of the 4 myriapod classes (Diplopoda, Chilopoda, and Symphyla); 25 transcriptomes were newly sequenced to maximize representation at the ordinal level in Diplopoda and at the family level in Chilopoda. Ten supermatrices were constructed to explore the effect of several potential phylogenetic biases (e.g., rate of evolution, heterotachy) at 3 levels of gene occupancy per taxon (50%, 75%, and 90%). Analyses based on maximum likelihood and Bayesian mixture models retrieved monophyly of each myriapod class, and resulted in 2 alternative phylogenetic positions for Symphyla, as sister group to Diplopoda + Chilopoda, or closer to Diplopoda, the latter hypothesis having been traditionally supported by morphology. Within centipedes, all orders were well supported, but 2 deep nodes remained in conflict in the different analyses despite dense taxon sampling at the family level. Relationships among centipede orders in all analyses conducted with the most complete matrix (90% occupancy) are at odds not only with the sparser but more gene-rich supermatrices (75% and 50% supermatrices) and with the matrices optimizing phylogenetic informativeness or most conserved genes, but also with previous hypotheses based on morphology, development, or other molecular data sets. 
Our results indicate that a high percentage of ribosomal proteins in the most complete matrices, in conjunction with distance from the root, can act in concert to compromise the estimated relationships within the ingroup. We discuss the implications of these findings in the context of the ever more prevalent quest for completeness in phylogenomic studies.
Affiliation(s)
- Rosa Fernández
  - Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Gregory D Edgecombe
  - Department of Earth Sciences, The Natural History Museum, Cromwell Road, London SW7 5BD, UK
- Gonzalo Giribet
  - Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
|
26
|
Havird JC, Mitchell RT, Henry RP, Santos SR. Salinity-induced changes in gene expression from anterior and posterior gills of Callinectes sapidus (Crustacea: Portunidae) with implications for crustacean ecological genomics. Comp Biochem Physiol Part D Genomics Proteomics 2016; 19:34-44. [PMID: 27337176 DOI: 10.1016/j.cbd.2016.06.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 04/05/2016] [Revised: 05/31/2016] [Accepted: 06/08/2016] [Indexed: 01/05/2023]
Abstract
Decapods represent one of the most ecologically diverse taxonomic groups within crustaceans, making them ideal to study physiological processes like osmoregulation. However, prior studies have failed to consider the entire transcriptomic response of the gill, the primary organ responsible for ion transport, to changing salinity. Moreover, the molecular genetic differences between non-osmoregulatory and osmoregulatory gill types, as well as the hormonal basis of osmoregulation, remain underexplored. Here, we identified and characterized differentially expressed genes (DEGs) via RNA-Seq in anterior (non-osmoregulatory) and posterior (osmoregulatory) gills during high to low salinity transfer in the blue crab Callinectes sapidus, a well-studied model for crustacean osmoregulation. Overall, we confirmed previous expression patterns for individual ion transport genes and identified novel ones with salinity-mediated expression. Notable novel DEGs among salinities and gill types for C. sapidus included anterior gills having higher expression of structural genes such as actin and cuticle proteins while posterior gills exhibit elevated expression of ion transport and energy-related genes, with the latter likely linked to ion transport. Potential targets among recovered DEGs for hormonal regulation of ion transport between salinities and gill types included neuropeptide Y and a KCTD16-like protein. Using publicly available sequence data, constituents for a "core" gill transcriptome among decapods are presented, comprising genes involved in ion transport and energy conversion and consistent with salinity transfer experiments. Lastly, rarefaction analyses lead us to recommend that a modest number of sequence reads (~10-15M), but with increased biological replication, be utilized in future DEG analyses of crustaceans.
Affiliation(s)
- Justin C Havird
  - Department of Biological Sciences, Molette Laboratory for Climate Change and Environmental Studies, Auburn University, 101 Rouse Life Sciences Bldg., Auburn, AL 36849, USA
  - Dept. of Biology, Colorado State University, Room E106 Anatomy/Zoology Building, Fort Collins, CO 80523, USA
- Reed T Mitchell
  - Dept. of Biological Sciences, Auburn University, 101 Rouse Life Sciences Bldg., Auburn, AL 36849, USA
  - Walter Reed Biosystematics Unit, 4210 Silver Hill Rd, Suitland, MD, 20746, USA
- Raymond P Henry
  - Dept. of Biological Sciences, Auburn University, 101 Rouse Life Sciences Bldg., Auburn, AL 36849, USA
- Scott R Santos
  - Department of Biological Sciences, Molette Laboratory for Climate Change and Environmental Studies, Auburn University, 101 Rouse Life Sciences Bldg., Auburn, AL 36849, USA
|
27
|
Ballesteros JA, Hormiga G. A New Orthology Assessment Method for Phylogenomic Data: Unrooted Phylogenetic Orthology. Mol Biol Evol 2016; 33:2117-34. [PMID: 27189539 DOI: 10.1093/molbev/msw069] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 12/12/2022] Open
Abstract
Current sequencing technologies are making available unprecedented amounts of genetic data for a large variety of species including nonmodel organisms. Although many phylogenomic surveys spend considerable time finding orthologs from the wealth of sequence data, these results do not transcend the original study and after being processed for specific phylogenetic purposes these orthologs do not become stable orthology hypotheses. We describe a procedure to detect and document the phylogenetic distribution of orthologs allowing researchers to use this information to guide selection of loci best suited to test specific evolutionary questions. At the core of this pipeline is a new phylogenetic orthology method that is neither affected by the position of the root nor requires explicit assignment of outgroups. We discuss the properties of this new orthology assessment method and exemplify its utility for phylogenomics using a small insects dataset. In addition, we exemplify the pipeline to identify and document stable orthologs for the group of orb-weaving spiders (Araneoidea) using RNAseq data. The scripts used in this study, along with sample files and additional documentation, are available at https://github.com/ballesterus/UPhO.
Affiliation(s)
- Gustavo Hormiga
  - Department of Biological Sciences, The George Washington University
|
28
|
Standardized benchmarking in the quest for orthologs. Nat Methods 2016; 13:425-30. [PMID: 27043882 PMCID: PMC4827703 DOI: 10.1038/nmeth.3830] [Citation(s) in RCA: 132] [Impact Index Per Article: 16.5] [Received: 01/28/2016] [Accepted: 03/09/2016] [Indexed: 11/23/2022]
Abstract
Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision–recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.
|
29
|
Guimarães LC, Florczak-Wyspianska J, de Jesus LB, Viana MVC, Silva A, Ramos RTJ, Soares SDC. Inside the Pan-genome - Methods and Software Overview. Curr Genomics 2016; 16:245-52. [PMID: 27006628 PMCID: PMC4765519 DOI: 10.2174/1389202916666150423002311] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 01/26/2015] [Revised: 04/20/2015] [Accepted: 04/21/2015] [Indexed: 12/11/2022] Open
Abstract
The number of genomes that have been deposited in databases has increased exponentially after the advent of Next-Generation Sequencing (NGS), which produces high-throughput sequence data; this circumstance has demanded the development of new bioinformatics software and the creation of new areas, such as comparative genomics. In comparative genomics, the genetic content of an organism is compared against other organisms, which helps in the prediction of gene function and coding region sequences, identification of evolutionary events and determination of phylogenetic relationships. Moreover, by expanding comparative genomics to a large number of related bacteria, we can infer their lifestyles, gene repertoires and minimal genome size. In this context, a powerful approach called pan-genome analysis has been developed. This approach involves the genomic comparison of different strains of the same species, or even genus. Its main goal is to establish the total number of non-redundant genes that are present in a given dataset. The pan-genome consists of three parts: the core genome; the accessory or dispensable genome; and species-specific or strain-specific genes. Furthermore, a pan-genome is considered to be “open” as long as each additional genome adds significantly to the total gene repertoire and “closed” when newly added genomes no longer significantly increase it. To perform all of the required calculations, a substantial amount of software has been developed, based on orthologous and paralogous gene identification.
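The three-part partition and the open/closed distinction described in this abstract can be illustrated with a minimal Python sketch. It assumes that orthologous gene families have already been identified and that each genome is represented simply as a set of family identifiers; the strain and gene names below are hypothetical, and the function names are illustrative rather than taken from any of the software the review surveys.

```python
from collections import Counter


def partition_pangenome(genomes):
    """Split gene families into core, accessory and strain-specific sets.

    `genomes` maps a strain name to the set of gene-family identifiers
    found in that strain (hypothetical input format).
    """
    n_genomes = len(genomes)
    # Count in how many genomes each gene family occurs.
    counts = Counter(gene for genes in genomes.values() for gene in genes)
    core = {g for g, c in counts.items() if c == n_genomes}
    # With a single genome every family is trivially core, so only call a
    # family strain-specific when there is something to compare against.
    unique = {g for g, c in counts.items() if c == 1 and n_genomes > 1}
    accessory = set(counts) - core - unique
    return {"pan": set(counts), "core": core,
            "accessory": accessory, "unique": unique}


def new_families_per_genome(genomes):
    """Count previously unseen families added by each genome, in input order."""
    seen = set()
    added = []
    for strain, genes in genomes.items():
        added.append((strain, len(genes - seen)))
        seen |= genes
    return added


# Hypothetical toy dataset: three strains, five gene families.
example_genomes = {
    "strainA": {"dnaA", "gyrB", "recA", "toxX"},
    "strainB": {"dnaA", "gyrB", "recA", "capK"},
    "strainC": {"dnaA", "gyrB", "capK"},
}

parts = partition_pangenome(example_genomes)
print(sorted(parts["core"]))       # ['dnaA', 'gyrB']
print(sorted(parts["accessory"]))  # ['capK', 'recA']
print(sorted(parts["unique"]))     # ['toxX']
print(new_families_per_genome(example_genomes))
# [('strainA', 4), ('strainB', 1), ('strainC', 0)]
```

A count of new families that keeps falling toward zero as genomes are added is the intuition behind a "closed" pan-genome. This sketch uses a single input order only; a real analysis would typically average over many genome orderings and fit a curve before drawing that conclusion.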
Affiliation(s)
- Luis Carlos Guimarães
  - Department of General Biology, Institute of Biological Sciences, Federal University of Minas Gerais, Avenue Antônio Carlos, 6627, Belo Horizonte, Minas Gerais, Brazil
  - Department of Genetics, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Leandro Benevides de Jesus
  - Department of General Biology, Institute of Biological Sciences, Federal University of Minas Gerais, Avenue Antônio Carlos, 6627, Belo Horizonte, Minas Gerais, Brazil
- Marcus Vinícius Canário Viana
  - Department of General Biology, Institute of Biological Sciences, Federal University of Minas Gerais, Avenue Antônio Carlos, 6627, Belo Horizonte, Minas Gerais, Brazil
- Artur Silva
  - Department of Genetics, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Rommel Thiago Jucá Ramos
  - Department of Genetics, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Siomar de Castro Soares
  - Department of Immunology, Microbiology and Parasitology, Institute of Biological Sciences and Natural Sciences, Federal University of Triângulo Mineiro, Uberaba, Minas Gerais, Brazil
|
30
|
Roy Chowdhury P, DeMaere M, Chapman T, Worden P, Charles IG, Darling AE, Djordjevic SP. Comparative genomic analysis of toxin-negative strains of Clostridium difficile from humans and animals with symptoms of gastrointestinal disease. BMC Microbiol 2016; 16:41. [PMID: 26971047 PMCID: PMC4789261 DOI: 10.1186/s12866-016-0653-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 10/29/2015] [Accepted: 03/02/2016] [Indexed: 12/13/2022] Open
Abstract
Background: Clostridium difficile infections (CDI) are a significant health problem to humans and food animals. Clostridial toxins ToxA and ToxB encoded by genes tcdA and tcdB are located on a pathogenicity locus known as the PaLoc and are the major virulence factors of C. difficile. While toxin-negative strains of C. difficile are often isolated from faeces of animals and patients suffering from CDI, they are not considered to play a role in disease. Toxin-negative strains of C. difficile have been used successfully to treat recurring CDI but their propensity to acquire the PaLoc via lateral gene transfer and express clinically relevant levels of toxins has reinforced the need to characterise them genetically. In addition, further studies that examine the pathogenic potential of toxin-negative strains of C. difficile and the frequency by which toxin-negative strains may acquire the PaLoc are needed.
Results: We undertook a comparative genomic analysis of five Australian toxin-negative isolates of C. difficile that lack tcdA, tcdB and both binary toxin genes cdtA and cdtB that were recovered from humans and farm animals with symptoms of gastrointestinal disease. Our analyses show that the five C. difficile isolates cluster closely with virulent toxigenic strains of C. difficile belonging to the same sequence type (ST) and have virulence gene profiles akin to those in toxigenic strains. Furthermore, phage acquisition appears to have played a key role in the evolution of C. difficile.
Conclusions: Our results are consistent with the C. difficile global population structure comprising six clades each containing both toxin-positive and toxin-negative strains. Our data also suggest that toxin-negative strains of C. difficile encode a repertoire of putative virulence factors that are similar to those found in toxigenic strains of C. difficile, raising the possibility that acquisition of PaLoc by toxin-negative strains poses a threat to human health. Studies in appropriate animal models are needed to examine the pathogenic potential of toxin-negative strains of C. difficile and to determine the frequency by which toxin-negative strains may acquire the PaLoc.
Collapse
Affiliation(s)
- Piklu Roy Chowdhury
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia. .,NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, PMB 8, Camden, NSW, 2570, Australia.
| | - Matthew DeMaere
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia
| | - Toni Chapman
- NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, PMB 8, Camden, NSW, 2570, Australia
| | - Paul Worden
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia
| | - Ian G Charles
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia
- Institute of Food Research, Norwich Research Park, Colney, Norwich, NR4 7UA, UK
| | - Aaron E Darling
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia
| | - Steven P Djordjevic
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia.
|
31
|
Tekaia F. Inferring Orthologs: Open Questions and Perspectives. GENOMICS INSIGHTS 2016; 9:17-28. [PMID: 26966373 PMCID: PMC4778853 DOI: 10.4137/gei.s37925] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 11/18/2015] [Revised: 12/30/2015] [Accepted: 01/02/2016] [Indexed: 01/25/2023]
Abstract
With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet, the dynamic remodeling of genome content through gain, loss, transfer of genes, and segmental and whole-genome duplication hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations present additional difficulties to assess orthology. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling together with combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detecting speciation events appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
Affiliation(s)
- Fredj Tekaia
- Institut Pasteur, Unit of Structural Microbiology, CNRS URA 3528 and University Paris Diderot, Sorbonne Paris Cité, Paris, France
|
32
|
Barbosa R, Almeida P, Safar SVB, Santos RO, Morais PB, Nielly-Thibault L, Leducq JB, Landry CR, Gonçalves P, Rosa CA, Sampaio JP. Evidence of Natural Hybridization in Brazilian Wild Lineages of Saccharomyces cerevisiae. Genome Biol Evol 2016; 8:317-29. [PMID: 26782936 PMCID: PMC4779607 DOI: 10.1093/gbe/evv263] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Indexed: 01/12/2023]
Abstract
The natural biology of Saccharomyces cerevisiae, the best known unicellular model eukaryote, remains poorly documented and understood, although recent progress has started to change this situation. Studies carried out recently in the Northern Hemisphere revealed the existence of wild populations associated with oak trees in North America, Asia, and the Mediterranean region. However, in spite of these advances, the global distribution of natural populations of S. cerevisiae, especially in regions where oaks and other members of the Fagaceae are absent, is not well understood. Here we investigate the occurrence of S. cerevisiae in Brazil, a tropical region where oaks and other Fagaceae are absent. We report a candidate natural habitat of S. cerevisiae in South America and, using whole-genome data, we uncover new lineages that appear to have as closest relatives the wild populations found in North America and Japan. A population structure analysis revealed the penetration of the wine genotype into the wild Brazilian population, a first observation of the impact of domesticated microbe lineages on the genetic structure of wild populations. Unexpectedly, the Brazilian population shows conspicuous evidence of hybridization with an American population of Saccharomyces paradoxus. Introgressions from S. paradoxus were significantly enriched in genes encoding secondary active transmembrane transporters. We hypothesize that hybridization in tropical wild lineages may have facilitated the habitat transition accompanying the colonization of the tropical ecosystem.
Affiliation(s)
- Raquel Barbosa
- UCIBIO-REQUIMTE, Departamento de Ciências da Vida, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal
| | - Pedro Almeida
- UCIBIO-REQUIMTE, Departamento de Ciências da Vida, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal
| | - Silvana V B Safar
- Departamento de Microbiologia, ICB, C.P. 486, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - Renata Oliveira Santos
- Departamento de Microbiologia, ICB, C.P. 486, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - Paula B Morais
- Laboratório de Microbiologia Ambiental e Biotecnologia, Universidade Federal de Tocantins, Palmas, TO, Brazil
| | - Lou Nielly-Thibault
- Département de Biologie, Institut de Biologie Intégrative et Des Systèmes (IBIS), Université Laval, Pavillon Charles-Eugènes-Marchand, QC, Canada
| | - Jean-Baptiste Leducq
- Département des Sciences Biologiques, Pavillon Marie-Victorin, 90 Rue Vincent D'indy-Université de Montréal, Montréal, QC, Canada
| | - Christian R Landry
- Département de Biologie, Institut de Biologie Intégrative et Des Systèmes (IBIS), Université Laval, Pavillon Charles-Eugènes-Marchand, QC, Canada
| | - Paula Gonçalves
- UCIBIO-REQUIMTE, Departamento de Ciências da Vida, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal
| | - Carlos A Rosa
- Departamento de Microbiologia, ICB, C.P. 486, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - José Paulo Sampaio
- UCIBIO-REQUIMTE, Departamento de Ciências da Vida, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal
|
33
|
Hooper CM, Castleden IR, Aryamanesh N, Jacoby RP, Millar AH. Finding the Subcellular Location of Barley, Wheat, Rice and Maize Proteins: The Compendium of Crop Proteins with Annotated Locations (cropPAL). PLANT & CELL PHYSIOLOGY 2016; 57:e9. [PMID: 26556651 DOI: 10.1093/pcp/pcv170] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 09/01/2015] [Accepted: 10/27/2015] [Indexed: 05/10/2023]
Abstract
Barley, wheat, rice and maize provide the bulk of human nutrition and have extensive industrial use as agricultural products. The genomes of these crops each contain >40,000 genes encoding proteins; however, the major genome databases for these species lack annotation information of protein subcellular location for >80% of these gene products. We address this gap by constructing the compendium of crop protein subcellular locations called crop Proteins with Annotated Locations (cropPAL). Subcellular location is most commonly determined by fluorescent protein tagging of live cells or mass spectrometry detection in subcellular purifications, but can also be predicted from amino acid sequence or protein expression patterns. The cropPAL database collates 556 previously published studies from >300 research institutes in >30 countries and compiles eight pre-computed subcellular predictions for all Hordeum vulgare, Triticum aestivum, Oryza sativa and Zea mays protein sequences. The data collection, including metadata for proteins and published studies, can be accessed through a search portal at http://crop-PAL.org. The subcellular localization information housed in cropPAL helps to depict plant cells as compartmentalized protein networks that can be investigated for improving crop yield and quality, and developing new biotechnological solutions to agricultural challenges.
Affiliation(s)
- Cornelia M Hooper
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, WA 6009, Australia
| | - Ian R Castleden
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, WA 6009, Australia
| | - Nader Aryamanesh
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, WA 6009, Australia
| | - Richard P Jacoby
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, WA 6009, Australia
| | - A Harvey Millar
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, WA 6009, Australia
|
34
|
Impact of gene family evolutionary histories on phylogenetic species tree inference by gene tree parsimony. Mol Phylogenet Evol 2015; 96:9-16. [PMID: 26702957 DOI: 10.1016/j.ympev.2015.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/01/2015] [Revised: 10/11/2015] [Accepted: 12/03/2015] [Indexed: 11/21/2022]
Abstract
Complicated histories of gene duplication and loss pose a challenge to molecular phylogenetic inference, especially in deep phylogenies. However, phylogenomic approaches such as gene tree parsimony (GTP) have an advantage over some other approaches in their ability to use gene families with duplications. GTP searches for the 'optimal' species tree by minimizing the total cost of biological events such as duplications, but the accuracy of GTP and the phylogenetic signal in gene families with distinct histories of duplication and loss are unclear. To evaluate how the evolutionary properties of different gene families affect species tree inference, 3900 gene families from seven angiosperms, encompassing a wide range of gene content and lineage-specific expansions and contractions, were analyzed. It was found that the gene content and total duplication number in a gene family strongly influence species tree inference accuracy, with the highest accuracy achieved at either very low or very high gene content (or duplication number) and the lowest accuracy centered on intermediate gene content (or duplication number), as the relationship can fit a binomial regression. In addition, among gene families with a similar average gene content, those with relatively higher lineage-specific expansion or duplication rates tend to show lower accuracy. Additional correlation tests support the view that high accuracy for gene families with large gene content may rely on abundant ancestral copies providing many subtrees to resolve conflicts, whereas accuracy for single- or low-copy gene families depends only on sequence substitution per se. The very low accuracy reached by gene families of intermediate gene content or duplication number may be due to insufficient subtrees to resolve the conflicts arising from loss of alternative copies.
As these evolutionary properties can significantly influence species tree accuracy, I discussed the potential weighting of the duplication cost by evolutionary properties of gene families in future GTP analyses.
|
35
|
Galpert D, del Río S, Herrera F, Ancede-Gallardo E, Antunes A, Agüero-Chapin G. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species. BIOMED RESEARCH INTERNATIONAL 2015; 2015:748681. [PMID: 26605337 PMCID: PMC4641943 DOI: 10.1155/2015/748681] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/07/2015] [Revised: 07/26/2015] [Accepted: 08/20/2015] [Indexed: 11/17/2022]
Abstract
Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.
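RBH (reciprocal best hits), one of the baselines named in this abstract, can be illustrated with a minimal Python sketch. It assumes a single symmetric similarity score per gene pair, whereas real RBH pipelines usually run two directional searches; the gene identifiers and scores below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def reciprocal_best_hits(scores: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit.

    `scores` maps (gene_in_genome_A, gene_in_genome_B) to a similarity score.
    Ties are broken by first occurrence, a simplification for illustration.
    """
    best_for_a: Dict[str, Tuple[str, float]] = {}
    best_for_b: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        if a not in best_for_a or score > best_for_a[a][1]:
            best_for_a[a] = (b, score)
        if b not in best_for_b or score > best_for_b[b][1]:
            best_for_b[b] = (a, score)
    # Keep a pair only when the best-hit relation holds in both directions.
    return sorted(
        (a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a
    )


# Hypothetical alignment scores between two genomes.
example_scores = {
    ("sc1", "kl1"): 250.0,
    ("sc1", "kl2"): 90.0,
    ("sc2", "kl2"): 180.0,
    ("sc3", "kl2"): 200.0,
    ("sc3", "kl1"): 60.0,
}
rbh_pairs = reciprocal_best_hits(example_scores)
```

Here `sc2` is left unpaired because its best hit, `kl2`, prefers `sc3`. Most candidate pairs end up as non-orthologs in this way, which is the class imbalance the abstract refers to.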
Affiliation(s)
- Deborah Galpert
- Departamento de Ciencias de la Computación, Universidad Central “Marta Abreu” de Las Villas (UCLV), 54830 Santa Clara, Cuba
| | - Sara del Río
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071 Granada, Spain
| | - Francisco Herrera
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071 Granada, Spain
| | - Evys Ancede-Gallardo
- Centro de Bioactivos Químicos, Universidad Central “Marta Abreu” de Las Villas (UCLV), 54830 Santa Clara, Cuba
| | - Agostinho Antunes
- Centro Interdisciplinar de Investigação Marinha e Ambiental (CIMAR/CIIMAR), Universidade do Porto, Rua dos Bragas 177, 4050-123 Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
| | - Guillermin Agüero-Chapin
- Centro de Bioactivos Químicos, Universidad Central “Marta Abreu” de Las Villas (UCLV), 54830 Santa Clara, Cuba
- Centro Interdisciplinar de Investigação Marinha e Ambiental (CIMAR/CIIMAR), Universidade do Porto, Rua dos Bragas 177, 4050-123 Porto, Portugal
|
36
|
Schulz F, Martijn J, Wascher F, Lagkouvardos I, Kostanjšek R, Ettema TJG, Horn M. A Rickettsiales symbiont of amoebae with ancient features. Environ Microbiol 2015; 18:2326-42. [PMID: 25908022 DOI: 10.1111/1462-2920.12881] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 10/20/2014] [Revised: 03/03/2015] [Accepted: 03/16/2015] [Indexed: 11/28/2022]
Abstract
The Rickettsiae comprise intracellular bacterial symbionts and pathogens infecting diverse eukaryotes. Here, we provide a detailed characterization of 'Candidatus Jidaibacter acanthamoeba', a rickettsial symbiont of Acanthamoeba. The bacterium establishes the infection in its amoeba host within 2 h, where it replicates within vacuoles. Higher bacterial loads and accelerated spread of infection were observed at elevated temperatures. The infection had a negative impact on host growth rate, although no increased levels of host cell lysis were seen. Phylogenomic analysis identified this bacterium as a member of the Midichloriaceae. Its 2.4 Mb genome represents the largest among Rickettsiales and is characterized by a moderate degree of pseudogenization and a high coding density. We found an unusually large number of genes encoding proteins with eukaryotic-like domains such as ankyrins, leucine-rich repeats and tetratricopeptide repeats, which likely function in host interaction. There are a total of three divergent, independently acquired type IV secretion systems, and 35 flagellar genes representing the most complete set found in an obligate intracellular Alphaproteobacterium. The deeply branching phylogenetic position of 'Candidatus Jidaibacter acanthamoeba', together with its ancient features, places it close to the rickettsial ancestor and helps to better understand the transition from a free-living to an intracellular lifestyle.
Affiliation(s)
- Frederik Schulz
- Department of Microbiology and Ecosystem Science, University of Vienna, Althanstraße 14, Vienna, Austria
| | - Joran Martijn
- Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Husargatan 3, Uppsala, Sweden
| | - Florian Wascher
- Department of Microbiology and Ecosystem Science, University of Vienna, Althanstraße 14, Vienna, Austria
| | - Ilias Lagkouvardos
- Department of Microbiology and Ecosystem Science, University of Vienna, Althanstraße 14, Vienna, Austria
| | - Rok Kostanjšek
- Department of Biology, University of Ljubljana, Večna pot 111, Ljubljana, Slovenia
| | - Thijs J G Ettema
- Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Husargatan 3, Uppsala, Sweden
| | - Matthias Horn
- Department of Microbiology and Ecosystem Science, University of Vienna, Althanstraße 14, Vienna, Austria
|
37
|
Rock, paper, scissors: harnessing complementarity in ortholog detection methods improves comparative genomic inference. G3-GENES GENOMES GENETICS 2015; 5:629-38. [PMID: 25711833 PMCID: PMC4390578 DOI: 10.1534/g3.115.017095] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]
Abstract
Ortholog detection (OD) is a lynchpin of most statistical methods in comparative genomics. This task involves accurately identifying genes across species that descend from a common ancestral sequence. OD methods comprise a wide variety of approaches, each with its own benefits and costs under a variety of evolutionary and practical scenarios. In this article, we examine the proteomes of ten mammals by using four methodologically distinct, rigorously filtered OD methods. In head-to-head comparisons, we find that these algorithms significantly outperform one another for 38–45% of the genes analyzed. We leverage this high complementarity through the development of MOSAIC, or Multiple Orthologous Sequence Analysis and Integration by Cluster optimization, the first tool for integrating methodologically diverse OD methods. Relative to the four methods examined, MOSAIC more than quintuples the number of alignments for which all species are present while simultaneously maintaining or improving functional-, phylogenetic-, and sequence identity-based measures of ortholog quality. Further, this improvement in alignment quality yields more confidently aligned sites and higher levels of overall conservation, while simultaneously detecting up to 180% more positively selected sites. We close by highlighting MOSAIC-specific positively selected sites near the active site of TPSAB1, an enzyme linked to asthma, heart disease, and irritable bowel disease. MOSAIC alignments, source code, and full documentation are available at http://pythonhosted.org/bio-MOSAIC.
|
38
|
Jeffares DC, Tomiczek B, Sojo V, dos Reis M. A beginners guide to estimating the non-synonymous to synonymous rate ratio of all protein-coding genes in a genome. Methods Mol Biol 2015; 1201:65-90. [PMID: 25388108 DOI: 10.1007/978-1-4939-1438-8_4] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Indexed: 06/04/2023]
Abstract
The ratio of non-synonymous to synonymous substitutions (dN/dS) is a useful measure of the strength and mode of natural selection acting on protein-coding genes. It is widely used to study patterns of selection on protein-coding genes on a genomic scale, from the small genomes of viruses, bacteria, and parasitic eukaryotes to the largest eukaryotic genomes. In this chapter we describe all the steps necessary to calculate the dN/dS of all the genes using at least two genomes. We include a brief discussion on assigning orthologs, and of codon-aware alignment of orthologs. We then describe how to use the CODEML program of the PAML package for phylogenetic analysis to calculate the dN/dS and how to perform some statistical tests for positive selection. We then outline some methods for interpreting output and describe how one may use these data to make discoveries about the biology of the species under study. Finally, as a worked example we show all the steps we used to calculate dN/dS for 3,261 orthologs from six Plasmodium species, including tests for adaptive evolution (see worked_example.pdf).
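The abstract does not say which statistical tests for positive selection the chapter uses. A common choice with CODEML output is a likelihood ratio test between nested site models, and the sketch below assumes that setting with two degrees of freedom, which is an assumption rather than something the abstract states. The log-likelihood values are hypothetical placeholders, not results from the chapter.

```python
import math
from typing import Tuple


def lrt_two_df(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Likelihood ratio test for nested models differing by 2 parameters.

    The statistic is 2 * (lnL_alt - lnL_null). For a chi-square distribution
    with 2 degrees of freedom the survival function is exp(-x / 2), so no
    external statistics library is needed.
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods for one gene under a null model and a
# model that allows a class of sites with dN/dS > 1.
statistic, p_value = lrt_two_df(lnl_null=-2450.30, lnl_alt=-2444.10)
significant = p_value < 0.05
```

When the test is repeated for every gene in a genome, the resulting p-values would normally be corrected for multiple testing before any gene is reported as positively selected.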
Affiliation(s)
- Daniel C Jeffares
- Research Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, UK
|
39
|
Li J, Wong CF, Wong MT, Huang H, Leung FC. Modularized evolution in archaeal methanogens phylogenetic forest. Genome Biol Evol 2014; 6:3344-59. [PMID: 25502908 PMCID: PMC4986457 DOI: 10.1093/gbe/evu259] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Accepted: 11/17/2014] [Indexed: 11/13/2022]
Abstract
Methanogens are methane-producing archaea that play a key role in the global carbon cycle. To date, the evolutionary history of methanogens and closely related nonmethanogen species remains unresolved across studies based on different genetic markers, a discrepancy attributed to horizontal gene transfers (HGTs). In an effort to decipher both congruent and conflicting evolutionary events and to reconstruct coevolved gene clusters and the hierarchical structure of the archaeal methanogen phylogenetic forest, comprehensive evolutionary and network analyses were performed on 3,694 gene families from 41 methanogens and 33 closely related archaea. Our results show that 1) greater than 50% of genes are in topological dissonance with others; 2) the prevalent interorder HGTs, even for core genes, in methanogen genomes led to their scrambled phylogenetic relationships; 3) most methanogenesis-related genes have experienced at least one HGT; 4) greater than 20% of the genes in methanogen genomes were transferred horizontally from other archaea, with genes involved in cell-wall synthesis and defense systems having been transferred most frequently; 5) the coevolution network contains seven statistically robust modules, wherein the central module has the highest average node strength and comprises a majority of the core genes; 6) genes of different coevolutionary modules expanded at different times and in different evolutionary lineages, constructing diversified pan-genome structures; 7) the modularized evolution is also closely related to the vertical evolution signals and the HGT rate of the genes. Overall, this study presents a modularized phylogenetic forest that describes a combination of complicated vertical and nonvertical evolutionary processes for methanogenic archaeal species.
Affiliation(s)
- Jun Li
- School of Biological Sciences, Faculty of Science, The University of Hong Kong, China
| | - Chi-Fat Wong
- School of Biological Sciences, Faculty of Science, The University of Hong Kong, China
| | - Mabel Ting Wong
- School of Biological Sciences, Faculty of Science, The University of Hong Kong, China
- Present address: Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
| | - He Huang
- Center for Marine Environmental Studies, Ehime University, Japan
| | - Frederick C Leung
- School of Biological Sciences, Faculty of Science, The University of Hong Kong, China
- Bioinformatics Center, Nanjing Agricultural University, People's Republic of China
|
40
|
Trachana K, Forslund K, Larsson T, Powell S, Doerks T, von Mering C, Bork P. A phylogeny-based benchmarking test for orthology inference reveals the limitations of function-based validation. PLoS One 2014; 9:e111122. [PMID: 25369365 PMCID: PMC4219706 DOI: 10.1371/journal.pone.0111122] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/15/2014] [Accepted: 09/23/2014] [Indexed: 11/19/2022]
Abstract
Accurate orthology prediction is crucial for many applications in the post-genomic era. The lack of broadly accepted benchmark tests precludes a comprehensive analysis of orthology inference. So far, functional annotation between orthologs serves as a performance proxy. However, this violates the fundamental principle of orthology as an evolutionary definition, while it is often not applicable due to limited experimental evidence for most species. Therefore, we constructed high quality "gold standard" orthologous groups that can serve as a benchmark set for orthology inference in bacterial species. Herein, we used this dataset to demonstrate 1) why a manually curated, phylogeny-based dataset is more appropriate for benchmarking orthology than other popular practices and 2) how it guides database design and parameterization through careful error quantification. More specifically, we illustrate how function-based tests often fail to identify false assignments, misjudging the true performance of orthology inference methods. We also examined how our dataset can instruct the selection of a “core” species repertoire to improve detection accuracy. We conclude that including more genomes at the proper evolutionary distances can influence the overall quality of orthology detection. The curated gene families, called Reference Orthologous Groups, are publicly available at http://eggnog.embl.de/orthobench2.
Affiliation(s)
- Kalliopi Trachana
- Institute for Systems Biology, Seattle, WA, United States of America
| | - Kristoffer Forslund
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Tomas Larsson
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Sean Powell
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Tobias Doerks
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Christian von Mering
- Institute of Molecular Life Sciences, University of Zurich and Swiss Institute of Bioinformatics, Zurich, Switzerland
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Max-Delbruck-Centre for Molecular Medicine, Berlin, Germany
|
41
|
Pereira C, Denise A, Lespinet O. A meta-approach for improving the prediction and the functional annotation of ortholog groups. BMC Genomics 2014; 15 Suppl 6:S16. [PMID: 25573073 PMCID: PMC4240552 DOI: 10.1186/1471-2164-15-s6-s16] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]
Abstract
BACKGROUND In comparative genomics, orthologs are used to transfer annotation from genes already characterized to newly sequenced genomes. Many methods have been developed for finding orthologs in sets of genomes. However, the application of different methods on the same proteome set can lead to distinct orthology predictions. METHODS We developed a method based on a meta-approach that is able to combine the results of several methods for orthologous group prediction. The purpose of this method is to produce better quality results by using the overlapping results obtained from several individual orthologous gene prediction procedures. Our method proceeds in two steps. The first aims to construct seeds for groups of orthologous genes; these seeds correspond to the exact overlaps between the results of all or several methods. In the second step, these seed groups are expanded by using HMM profiles. RESULTS We evaluated our method on two standard reference benchmarks, OrthoBench and Orthology Benchmark Service. Our method presents a higher level of accurately predicted groups than the individual input methods of orthologous group prediction. Moreover, our method increases the number of annotated orthologous pairs without decreasing the annotation quality compared to twelve state-of-the-art methods. CONCLUSIONS The meta-approach based method appears to be a reliable procedure for predicting orthologous groups. Since a large number of methods for predicting groups of orthologous genes exist, it is quite conceivable to apply this meta-approach to several combinations of different methods.
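The first step described in this abstract, building seeds from the exact overlaps between the results of all or several methods, can be sketched in Python as follows. This is an illustrative reading of the abstract rather than the authors' implementation: the method names, gene identifiers, and the `min_methods` threshold are assumptions, and the second step (expanding seeds with HMM profiles) is not shown.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Group = FrozenSet[str]


def seed_groups(predictions: Dict[str, List[Set[str]]], min_methods: int) -> List[Group]:
    """Return ortholog groups predicted identically by at least `min_methods` methods."""
    counts: Counter = Counter()
    for groups in predictions.values():
        # Deduplicate within a method so each method votes once per group.
        for group in {frozenset(g) for g in groups}:
            counts[group] += 1
    kept = (group for group, n in counts.items() if n >= min_methods)
    return sorted(kept, key=lambda group: sorted(group))


# Hypothetical output of three orthology prediction methods.
predictions = {
    "method_a": [{"hs_G1", "mm_G1", "dr_G1"}, {"hs_G2", "mm_G2"}],
    "method_b": [{"hs_G1", "mm_G1", "dr_G1"}, {"hs_G2", "mm_G2", "dr_G2"}],
    "method_c": [{"hs_G1", "mm_G1", "dr_G1"}, {"hs_G2", "mm_G2"}],
}
seeds_all = seed_groups(predictions, min_methods=3)   # agreement of all methods
seeds_most = seed_groups(predictions, min_methods=2)  # agreement of several methods
```

Requiring agreement from all methods yields fewer but more conservative seeds, while lowering the threshold trades some precision for coverage.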
|
42
|
Ogilvie HA, Imin N, Djordjevic MA. Diversification of the C-TERMINALLY ENCODED PEPTIDE (CEP) gene family in angiosperms, and evolution of plant-family specific CEP genes. BMC Genomics 2014; 15:870. [PMID: 25287121 PMCID: PMC4197245 DOI: 10.1186/1471-2164-15-870] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 06/27/2014] [Accepted: 09/24/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Small, secreted signaling peptides work in parallel with phytohormones to control important aspects of plant growth and development. Genes from the C-TERMINALLY ENCODED PEPTIDE (CEP) family produce such peptides, which negatively regulate plant growth, especially under stress, and affect other important developmental processes. To illuminate how the CEP gene family has evolved within the plant kingdom, including its emergence, diversification and variation between lineages, a comprehensive survey was undertaken to identify and characterize CEP genes in 106 plant genomes. RESULTS Using a motif-based system developed for this study to identify canonical CEP peptide domains, a total of 916 CEP genes and 1,223 CEP domains were found in angiosperms and for the first time in gymnosperms. This defines a narrow band for the emergence of CEP genes in plants, from the divergence of lycophytes to the angiosperm/gymnosperm split. Both CEP genes and domains were found to have diversified in angiosperms, particularly in the Poaceae and Solanaceae plant families. Multispecies orthologous relationships were determined for 22% of identified CEP genes, and further analysis of those groups found selective constraints upon residues within the CEP peptide and within the previously little-characterized variable region. An examination of public Oryza sativa RNA-Seq datasets revealed an expression pattern that links OsCEP5 and OsCEP6 to panicle development and flowering, and CEP gene trees reveal these emerged from a duplication event associated with the Poaceae plant family. CONCLUSIONS The characterization of the plant-family specific CEP genes OsCEP5 and OsCEP6, the association of CEP genes with angiosperm-specific development processes like panicle development, and the diversification of CEP genes in angiosperms provide further support for the hypothesis that CEP genes have been integral to the evolution of novel traits within the angiosperm lineage.
Beyond these findings, the comprehensive set of CEP genes and their properties reported here will be a resource for future research on CEP genes and peptides.
Affiliation(s)
- Huw A Ogilvie: Research School of Biology, The Australian National University, Canberra, ACT 0200, Australia
- Nijat Imin: Research School of Biology, The Australian National University, Canberra, ACT 0200, Australia
- Michael A Djordjevic: Research School of Biology, The Australian National University, Canberra, ACT 0200, Australia
|
43
|
Andrade SCS, Montenegro H, Strand M, Schwartz ML, Kajihara H, Norenburg JL, Turbeville JM, Sundberg P, Giribet G. A Transcriptomic Approach to Ribbon Worm Systematics (Nemertea): Resolving the Pilidiophora Problem. Mol Biol Evol 2014; 31:3206-15. [DOI: 10.1093/molbev/msu253] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Indexed: 12/17/2022] Open
|
44
|
Literature-based gene curation and proposed genetic nomenclature for Cryptococcus. Eukaryot Cell 2014; 13:878-83. [PMID: 24813190 DOI: 10.1128/ec.00083-14] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022]
Abstract
Cryptococcus, a major cause of disseminated infections in immunocompromised patients, kills over 600,000 people per year worldwide. Genes involved in the virulence of the meningitis-causing fungus are being characterized at an increasing rate, and to date, at least 648 Cryptococcus gene names have been published. However, these data are scattered throughout the literature and are challenging to find. Furthermore, conflicts in locus identification exist, so that named genes have been subsequently published under new names or names associated with one locus have been used for another locus. To avoid these conflicts and to provide a central source of Cryptococcus gene information, we have collected all published Cryptococcus gene names from the scientific literature and associated them with standard Cryptococcus locus identifiers and have incorporated them into FungiDB (www.fungidb.org). FungiDB is a panfungal genome database that collects gene information and functional data and provides search tools for 61 species of fungi and oomycetes. We applied these published names to a manually curated ortholog set of all Cryptococcus species currently in FungiDB, including Cryptococcus neoformans var. neoformans strains JEC21 and B-3501A, C. neoformans var. grubii strain H99, and Cryptococcus gattii strains R265 and WM276, and have written brief descriptions of their functions. We also compiled a protocol for gene naming that summarizes guidelines proposed by members of the Cryptococcus research community. The centralization of genomic and literature-based information for Cryptococcus at FungiDB will help researchers communicate about genes of interest, such as those related to virulence, and will further facilitate research on the pathogen.
|
45
|
Dalquen DA, Dessimoz C. Bidirectional best hits miss many orthologs in duplication-rich clades such as plants and animals. Genome Biol Evol 2014; 5:1800-6. [PMID: 24013106 PMCID: PMC3814191 DOI: 10.1093/gbe/evt132] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Indexed: 11/12/2022] Open
Abstract
Bidirectional best hits (BBH), which entails identifying the pairs of genes in two different genomes that are more similar to each other than either is to any other gene in the other genome, is a simple and widely used method to infer orthology. A recent study has analyzed the link between BBH and orthology in bacteria and archaea and concluded that, given the very high consistency in BBH they observed among triplets of neighboring genes, a high proportion of BBH are likely to be bona fide orthologs. However, limited by their analysis setup, the previous study could not easily test the reverse question: which proportion of orthologs are BBH? In this follow-up study, we consider this question in theory and answer it based on conceptual arguments, simulated data, and real biological data from all three domains of life. Our analyses corroborate the findings of the previous study, but also show that because of the high rate of gene duplication in plants and animals, as much as 60% of orthologous relations are missed by the BBH criterion.
Affiliation(s)
- Daniel A Dalquen: Computational Biochemistry Research Group, ETH Zurich, Zürich, Switzerland
|
46
|
Fernández R, Laumer CE, Vahtera V, Libro S, Kaluziak S, Sharma PP, Pérez-Porro AR, Edgecombe GD, Giribet G. Evaluating topological conflict in centipede phylogeny using transcriptomic data sets. Mol Biol Evol 2014; 31:1500-13. [PMID: 24674821 DOI: 10.1093/molbev/msu108] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Indexed: 12/11/2022] Open
Abstract
Relationships between the five extant orders of centipedes have been considered solved based on morphology. Phylogenies based on samples of up to a few dozen genes have largely been congruent with the morphological tree apart from an alternative placement of one order, the relictual Craterostigmomorpha, consisting of two species in Tasmania and New Zealand. To address this incongruence, novel transcriptomic data were generated to sample all five orders of centipedes and also used as a test case for studying gene-tree incongruence. Maximum likelihood and Bayesian mixture model analyses of a data set composed of 1,934 orthologs with 45% missing data, as well as the 389 orthologs in the least saturated, stationary quartile, retrieve strong support for a sister-group relationship between Craterostigmomorpha and all other pleurostigmophoran centipedes, of which the latter group is newly named Amalpighiata. The Amalpighiata hypothesis, which shows little gene-tree incongruence and is robust to the influence of among-taxon compositional heterogeneity, implies convergent evolution in several morphological and behavioral characters traditionally used in centipede phylogenetics, such as maternal brood care, but accords with patterns of first appearances in the fossil record.
Affiliation(s)
- Rosa Fernández: Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA
- Christopher E Laumer: Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA
- Varpu Vahtera: Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA; Zoological Museum, Department of Biology, University of Turku, Turku, Finland
- Silvia Libro: Marine Science Center, Northeastern University, Nahant, MA
- Prashant P Sharma: Division of Invertebrate Zoology, American Museum of Natural History, New York, NY
- Alicia R Pérez-Porro: Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA; Centre d'Estudis Avançats de Blanes (CEAB-CSIC), Catalonia, Spain
- Gregory D Edgecombe: Department of Earth Sciences, The Natural History Museum, London, United Kingdom
- Gonzalo Giribet: Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA
|
47
|
Powell S, Forslund K, Szklarczyk D, Trachana K, Roth A, Huerta-Cepas J, Gabaldón T, Rattei T, Creevey C, Kuhn M, Jensen LJ, von Mering C, Bork P. eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Res 2013; 42:D231-9. [PMID: 24297252 PMCID: PMC3964997 DOI: 10.1093/nar/gkt1253] [Citation(s) in RCA: 436] [Impact Index Per Article: 39.6] [Indexed: 12/20/2022] Open
Abstract
With the increasing availability of various 'omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping track with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.
Affiliation(s)
- Sean Powell: European Molecular Biology Laboratory, Computational Biology Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany; University of Zurich and Swiss Institute of Bioinformatics, Institute of Molecular Life Sciences, Winterthurerstrasse 190, 8057 Zurich, Switzerland; Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109-5234, USA; Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), C/Dr. Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; CUBE-Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Althanstraße 14, 1090 Vienna, Austria; Institute of Biological, Environmental & Rural Sciences, Aberystwyth University, Penglais, Aberystwyth, Ceredigion, SY23 3FG, UK; Biotechnology Center, TU Dresden, 01062 Dresden, Germany; Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200, Copenhagen N, Denmark; and Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
|
48
|
Yang Y, Luo D. The origin of parasitism gene in nematodes: evolutionary analysis through the construction of domain trees. Evol Bioinform Online 2013; 9:453-66. [PMID: 24277980 PMCID: PMC3836563 DOI: 10.4137/ebo.s13032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
Inferring evolutionary history of parasitism genes is important to understand how evolutionary mechanisms affect the occurrences of parasitism genes. In this study, we constructed multiple domain trees for parasitism genes and genes under free-living conditions. Further analyses of horizontal gene transfer (HGT)-like phylogenetic incongruences, duplications, and speciations were performed based on these trees. By comparing these analyses, the contributions of pre-adaptations were found to be more important to the evolution of parasitism genes than those of duplications, and pre-adaptations are as crucial as previously reported HGTs to parasitism. Furthermore, speciation may also affect the evolution of parasitism genes. In addition, Pristionchus pacificus was suggested to be a common model organism for studies of parasitic nematodes, including root-knot species. These analyses provided information regarding mechanisms that may have contributed to the evolution of parasitism genes.
Affiliation(s)
- Yizi Yang: School of Life Sciences, Xiamen University, Xiamen, Fujian, China
|
49
|
Francis O, Han F, Adams JC. Molecular phylogeny of a RING E3 ubiquitin ligase, conserved in eukaryotic cells and dominated by homologous components, the muskelin/RanBPM/CTLH complex. PLoS One 2013; 8:e75217. [PMID: 24143168 PMCID: PMC3797097 DOI: 10.1371/journal.pone.0075217] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 11/02/2012] [Accepted: 08/13/2013] [Indexed: 01/11/2023] Open
Abstract
Ubiquitination is an essential post-translational modification that regulates signalling and protein turnover in eukaryotic cells. Specificity of ubiquitination is driven by ubiquitin E3 ligases, many of which remain poorly understood. One such is the mammalian muskelin/RanBP9/CTLH complex that includes eight proteins, five of which (RanBP9/RanBPM, TWA1, MAEA, Rmnd5 and muskelin), share striking similarities of domain architecture and have been implicated in regulation of cell organisation. In budding yeast, the homologous GID complex acts to down-regulate gluconeogenesis. In both complexes, Rmnd5/GID2 corresponds to a RING ubiquitin ligase. To better understand this E3 ligase system, we conducted molecular phylogenetic and sequence analyses of the related components. TWA1, Rmnd5, MAEA and WDR26 are conserved throughout all eukaryotic supergroups, albeit WDR26 was not identified in Rhizaria. RanBPM is absent from Excavates and from some sub-lineages. Armc8 and c17orf39 were represented across unikonts but in bikonts were identified only in Viridiplantae and in O. trifallax within alveolates. Muskelin is present only in Opisthokonts. Phylogenetic and sequence analyses of the shared LisH and CTLH domains of RanBPM, TWA1, MAEA and Rmnd5 revealed closer relationships and profiles of conserved residues between, respectively, Rmnd5 and MAEA, and RanBPM and TWA1. Rmnd5 and MAEA are also related by the presence of conserved, variant RING domains. Examination of how N- or C-terminal domain deletions alter the sub-cellular localisation of each protein in mammalian cells identified distinct contributions of the LisH domains to protein localisation or folding/stability. In conclusion, all components except muskelin are inferred to have been present in the last eukaryotic common ancestor. Diversification of this ligase complex in different eukaryotic lineages may result from the apparently fast evolution of RanBPM, differing requirements for WDR26, Armc8 or c17orf39, and the origin of muskelin in opisthokonts as a RanBPM-binding protein.
Affiliation(s)
- Ore Francis: School of Biochemistry, University of Bristol, Bristol, United Kingdom
- Fujun Han: School of Biochemistry, University of Bristol, Bristol, United Kingdom
- Josephine C. Adams: School of Biochemistry, University of Bristol, Bristol, United Kingdom
|
50
|
Capra JA, Stolzer M, Durand D, Pollard KS. How old is my gene? Trends Genet 2013; 29:659-68. [PMID: 23915718 DOI: 10.1016/j.tig.2013.07.001] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 04/18/2013] [Revised: 06/13/2013] [Accepted: 07/03/2013] [Indexed: 11/26/2022]
Abstract
Gene functions, interactions, disease associations, and ecological distributions are all correlated with gene age. However, it is challenging to estimate the intricate series of evolutionary events leading to a modern-day gene and then to reduce this history to a single age estimate. Focusing on eukaryotic gene families, we introduce a framework that can be used to compare current strategies for quantifying gene age, discuss key differences between these methods, and highlight several common problems. We argue that genes with complex evolutionary histories do not have a single well-defined age. As a result, care must be taken to articulate the goals and assumptions of any analysis that uses gene age estimates. Recent algorithmic advances offer the promise of gene age estimates that are fast, accurate, and consistent across gene families. This will enable a shift to integrated genome-wide analyses of all events in gene evolutionary histories in the near future.
Affiliation(s)
- John A Capra: Center for Human Genetics Research and Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, USA
|