1
Kumar M, Tibocha-Bonilla JD, Füssy Z, Lieng C, Schwenck SM, Levesque AV, Al-Bassam MM, Passi A, Neal M, Zuniga C, Kaiyom F, Espinoza JL, Lim H, Polson SW, Allen LZ, Zengler K. Mixotrophic growth of a ubiquitous marine diatom. SCIENCE ADVANCES 2024; 10:eado2623. [PMID: 39018398 PMCID: PMC466952 DOI: 10.1126/sciadv.ado2623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 06/12/2024] [Indexed: 07/19/2024]
Abstract
Diatoms are major players in the global carbon cycle, and their metabolism is affected by ocean conditions. Understanding the impact of changing inorganic nutrients in the oceans on diatoms is crucial, given the changes in global carbon dioxide levels. Here, we present a genome-scale metabolic model (iMK1961) for Cylindrotheca closterium, an in silico resource to understand uncharacterized metabolic functions in this ubiquitous diatom. iMK1961 represents the largest diatom metabolic model to date, comprising 1961 open reading frames and 6718 reactions. With iMK1961, we identified the metabolic response signature to cope with drastic changes in growth conditions. Comparing model predictions with Tara Oceans transcriptomics data unraveled C. closterium's metabolism in situ. Unexpectedly, the diatom only grows photoautotrophically in 21% of the sunlit ocean samples, while the majority of the samples indicate a mixotrophic (71%) or, in some cases, even a heterotrophic (8%) lifestyle in the light. Our findings highlight C. closterium's metabolic flexibility and its potential role in global carbon cycling.
Affiliation(s)
- Manish Kumar
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Juan D. Tibocha-Bonilla
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Zoltán Füssy
- Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Vestec, Czech Republic
- Chloe Lieng
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Sarah M. Schwenck
- Scripps Institution of Oceanography, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Alice V. Levesque
- Scripps Institution of Oceanography, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Mahmoud M. Al-Bassam
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Anurag Passi
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Maxwell Neal
- Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Cristal Zuniga
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Farrah Kaiyom
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Josh L. Espinoza
- Department of Microbial and Environmental Genomics, J. Craig Venter Institute, 4120 Capricorn Way, La Jolla, CA 92037, USA
- Hyungyu Lim
- Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Shawn W. Polson
- Department of Computer and Information Sciences, University of Delaware, 18 Amstel Ave., Newark, DE 19716, USA
- Center for Bioinformatics and Computational Biology, University of Delaware, 590 Avenue 1743, Newark, DE 19713, USA
- Lisa Zeigler Allen
- Scripps Institution of Oceanography, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Department of Microbial and Environmental Genomics, J. Craig Venter Institute, 4120 Capricorn Way, La Jolla, CA 92037, USA
- Karsten Zengler
- Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Center for Microbiome Innovation, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Program in Materials Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
2
Bohutínská M, Peichel CL. Divergence time shapes gene reuse during repeated adaptation. Trends Ecol Evol 2024; 39:396-407. [PMID: 38155043 DOI: 10.1016/j.tree.2023.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 11/15/2023] [Accepted: 11/20/2023] [Indexed: 12/30/2023]
Abstract
When diverse lineages repeatedly adapt to similar environmental challenges, the extent to which the same genes are involved (gene reuse) varies across systems. We propose that divergence time among lineages is a key factor driving this variability: as lineages diverge, the extent of gene reuse should decrease due to reductions in allele sharing, functional differentiation among genes, and restructuring of genome architecture. Indeed, we show that many genomic studies of repeated adaptation find that more recently diverged lineages exhibit higher gene reuse during repeated adaptation, but the relationship becomes less clear at older divergence time scales. Thus, future research should explore the factors shaping gene reuse and their interplay across broad divergence time scales for a deeper understanding of evolutionary repeatability.
Affiliation(s)
- Magdalena Bohutínská
- Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, 3012, Switzerland; Department of Botany, Faculty of Science, Charles University, Prague, 12800, Czech Republic
- Catherine L Peichel
- Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, 3012, Switzerland
3
Huang B, Xiao Y, Zhang Y. Asgard archaeal selenoproteome reveals a roadmap for the archaea-to-eukaryote transition of selenocysteine incorporation machinery. THE ISME JOURNAL 2024; 18:wrae111. [PMID: 38896033 PMCID: PMC11227280 DOI: 10.1093/ismejo/wrae111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 04/26/2024] [Accepted: 06/18/2024] [Indexed: 06/21/2024]
Abstract
Selenocysteine (Sec) is encoded by the UGA codon that normally functions as a stop signal and is specifically incorporated into selenoproteins via a unique recoding mechanism. The translational recoding of UGA as Sec is directed by an unusual RNA structure, the SECIS element. Although archaea and eukaryotes adopt similar Sec encoding machinery, their SECIS elements have no similarities to each other with regard to sequence and structure. We analyzed >400 Asgard archaeal genomes to examine the occurrence of both the Sec encoding system and selenoproteins in this archaeal superphylum, the closest prokaryotic relatives of eukaryotes. A comprehensive map of the Sec utilization trait has been generated, providing the most detailed understanding of the use of this nonstandard amino acid in Asgard archaea so far. By characterizing the selenoproteomes of all organisms, several selenoprotein-rich phyla and species were identified. Most Asgard archaeal selenoprotein genes possess eukaryotic SECIS-like structures with varying degrees of diversity. Moreover, euryarchaeal SECIS elements might originate from Asgard archaeal SECIS elements via lateral gene transfer, indicating a complex and dynamic scenario for the evolution of the SECIS element within archaea. Finally, a roadmap for the transition of eukaryotic SECIS elements from archaea was proposed, and selenophosphate synthetase may serve as a potential intermediate for the generation of the ancestral eukaryotic SECIS element. Our results offer new insights into the evolution of the Sec insertion machinery.
Affiliation(s)
- Biyan Huang
- Shenzhen Key Laboratory of Marine Bioresources and Ecology, Brain Disease and Big Data Research Institute, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen 518055, Guangdong Province, P. R. China
- Yao Xiao
- Shenzhen Key Laboratory of Marine Bioresources and Ecology, Brain Disease and Big Data Research Institute, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen 518055, Guangdong Province, P. R. China
- Yan Zhang
- Shenzhen Key Laboratory of Marine Bioresources and Ecology, Brain Disease and Big Data Research Institute, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen 518055, Guangdong Province, P. R. China
- Shenzhen-Hong Kong Institute of Brain Science-Shenzhen Fundamental Research Institutions, Shenzhen 518055, Guangdong Province, P. R. China
4
Carhuaricra-Huaman D, Setubal JC. Protein-Coding Gene Families in Prokaryote Genome Comparisons. Methods Mol Biol 2024; 2802:33-55. [PMID: 38819555 DOI: 10.1007/978-1-0716-3838-5_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
The identification of orthologous genes is relevant for comparative genomics, phylogenetic analysis, and functional annotation. There are many computational tools for the prediction of orthologous groups as well as web-based resources that offer orthology datasets for download and online analysis. This chapter presents a simple and practical guide to the process of orthologous group prediction, using a dataset of 10 prokaryotic proteomes as example. The orthology methods covered are OrthoMCL, COGtriangles, OrthoFinder2, and OMA. The authors compare the number of orthologous groups predicted by these various methods, and present a brief workflow for the functional annotation and reconstruction of phylogenies from inferred single-copy orthologous genes. The chapter also demonstrates how to explore two orthology databases: eggNOG6 and OrthoDB.
Affiliation(s)
- Dennis Carhuaricra-Huaman
- Programa de Pós-Graduação Interunidades em Bioinformática, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, SP, Brazil
- Research Group in Biotechnology Applied to Animal Health, Production and Conservation (SANIGEN), Laboratory of Biology and Molecular Genetics, Faculty of Veterinary Medicine, Universidad Nacional Mayor de San Marcos, Lima, Peru
- João Carlos Setubal
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil
5
Robicheau BM, Tolman J, Desai D, LaRoche J. Microevolutionary patterns in ecotypes of the symbiotic cyanobacterium UCYN-A revealed from a Northwest Atlantic coastal time series. SCIENCE ADVANCES 2023; 9:eadh9768. [PMID: 37774025 PMCID: PMC10541017 DOI: 10.1126/sciadv.adh9768] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/26/2023] [Accepted: 08/28/2023] [Indexed: 10/01/2023]
Abstract
UCYN-A is a globally important nitrogen-fixing symbiotic microbe often found in colder regions and coastal areas where nitrogen fixation has been overlooked. We present a 3-year coastal Northwest Atlantic time series of UCYN-A by integrating oceanographic data with weekly nifH and 16S rRNA gene sequencing and quantitative PCR assays for UCYN-A ecotypes. High UCYN-A relative abundances dominated by A1 to A4 ecotypes reoccurred annually in the coastal Northwest Atlantic. Although UCYN-A was detected every summer/fall, the ability to observe separate ecotypes may be highly dependent on sampling time given intense interannual and weekly variability of ecotype-specific occurrences. Additionally, much of UCYN-A's rarer diversity was populated by short-lived neutral mutational variants, therefore providing insight into UCYN-A's microevolutionary patterns. For instance, rare ASVs exhibited community composition restructuring annually, while also sharing a common connection to a dominant ASV within each ecotype. Our study provides additional perspectives for interpreting UCYN-A intraspecific diversity and underscores the need for high-resolution datasets when deciphering spatiotemporal ecologies within UCYN-A.
Affiliation(s)
- Brent M. Robicheau
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Jennifer Tolman
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Dhwani Desai
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Integrated Microbiome Resource, Dalhousie University, Halifax, Nova Scotia, Canada
- Julie LaRoche
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
6
Heinken A, Hertel J, Acharya G, Ravcheev DA, Nyga M, Okpala OE, Hogan M, Magnúsdóttir S, Martinelli F, Nap B, Preciat G, Edirisinghe JN, Henry CS, Fleming RMT, Thiele I. Genome-scale metabolic reconstruction of 7,302 human microorganisms for personalized medicine. Nat Biotechnol 2023; 41:1320-1331. [PMID: 36658342 PMCID: PMC10497413 DOI: 10.1038/s41587-022-01628-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 42.0] [Received: 11/09/2021] [Accepted: 11/30/2022] [Indexed: 01/21/2023]
Abstract
The human microbiome influences the efficacy and safety of a wide variety of commonly prescribed drugs. Designing precision medicine approaches that incorporate microbial metabolism would require strain- and molecule-resolved, scalable computational modeling. Here, we extend our previous resource of genome-scale metabolic reconstructions of human gut microorganisms with a greatly expanded version. AGORA2 (assembly of gut organisms through reconstruction and analysis, version 2) accounts for 7,302 strains, includes strain-resolved drug degradation and biotransformation capabilities for 98 drugs, and was extensively curated based on comparative genomics and literature searches. The microbial reconstructions performed very well against three independently assembled experimental datasets with an accuracy of 0.72 to 0.84, surpassing other reconstruction resources and predicted known microbial drug transformations with an accuracy of 0.81. We demonstrate that AGORA2 enables personalized, strain-resolved modeling by predicting the drug conversion potential of the gut microbiomes from 616 patients with colorectal cancer and controls, which greatly varied between individuals and correlated with age, sex, body mass index and disease stages. AGORA2 serves as a knowledge base for the human microbiome and paves the way to personalized, predictive analysis of host-microbiome metabolic interactions.
Affiliation(s)
- Almut Heinken
- School of Medicine, University of Galway, Galway, Ireland
- Ryan Institute, University of Galway, Galway, Ireland
- INSERM UMRS 1256, Nutrition, Genetics, and Environmental Risk Exposure (NGERE), University of Lorraine, Nancy, France
- Johannes Hertel
- School of Medicine, University of Galway, Galway, Ireland
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany
- Geeta Acharya
- Integrated BioBank of Luxembourg, Dudelange, Luxembourg
- Dmitry A Ravcheev
- School of Medicine, University of Galway, Galway, Ireland
- Ryan Institute, University of Galway, Galway, Ireland
- Marcus Hogan
- School of Medicine, University of Galway, Galway, Ireland
- Ryan Institute, University of Galway, Galway, Ireland
- Stefanía Magnúsdóttir
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, the Netherlands
- Filippo Martinelli
- School of Medicine, University of Galway, Galway, Ireland
- Ryan Institute, University of Galway, Galway, Ireland
- Bram Nap
- School of Medicine, University of Galway, Galway, Ireland
- Ryan Institute, University of Galway, Galway, Ireland
- German Preciat
- Leiden Academic Centre for Drug Research, Leiden University, Leiden, the Netherlands
- Janaka N Edirisinghe
- Computation Institute, University of Chicago, Chicago, IL, USA
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
- Christopher S Henry
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
- Ronan M T Fleming
- School of Medicine, University of Galway, Galway, Ireland
- Leiden Academic Centre for Drug Research, Leiden University, Leiden, the Netherlands
- Ines Thiele
- School of Medicine, University of Galway, Galway, Ireland
- Ryan Institute, University of Galway, Galway, Ireland
- Division of Microbiology, University of Galway, Galway, Ireland
- APC Microbiome Ireland, Cork, Ireland
7
Alim NTB, Koppenhöfer S, Lang AS, Beatty JT. Extracellular Polysaccharide Receptor and Receptor-Binding Proteins of the Rhodobacter capsulatus Bacteriophage-like Gene Transfer Agent RcGTA. Genes (Basel) 2023; 14:genes14051124. [PMID: 37239483 DOI: 10.3390/genes14051124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 05/16/2023] [Accepted: 05/17/2023] [Indexed: 05/28/2023]
Abstract
A variety of prokaryotes produce a bacteriophage-like gene transfer agent (GTA), and the alphaproteobacterial Rhodobacter capsulatus RcGTA is a model GTA. Some environmental isolates of R. capsulatus lack the ability to acquire genes transferred by the RcGTA (recipient capability). In this work, we investigated the reason why R. capsulatus strain 37b4 lacks recipient capability. The RcGTA head spike fiber and tail fiber proteins have been proposed to bind extracellular oligosaccharide receptors, and strain 37b4 lacks a capsular polysaccharide (CPS). The reason why strain 37b4 lacks a CPS was unknown, as was whether the provision of a CPS to 37b4 would result in recipient capability. To address these questions, we sequenced and annotated the strain 37b4 genome and used BLAST interrogations of this genome sequence to search for homologs of genes known to be needed for R. capsulatus recipient capability. We also created a cosmid-borne genome library from a wild-type strain, mobilized the library into 37b4, and used the cosmid-complemented strain 37b4 to identify genes needed for a gain of function, allowing for the acquisition of RcGTA-borne genes. The relative presence of CPS around a wild-type strain, 37b4, and cosmid-complemented 37b4 cells was visualized using light microscopy of stained cells. Fluorescently tagged head spike fiber and tail fiber proteins of the RcGTA particle were created and used to measure the relative binding to wild-type and 37b4 cells. We found that strain 37b4 lacks recipient capability because of an inability to bind RcGTA; the reason it is incapable of binding is that it lacks CPS, and the absence of CPS is due to the absence of genes previously shown to be needed for CPS production in another strain. In addition to the head spike fiber, we found that the tail fiber protein also binds to the CPS.
Affiliation(s)
- Nawshin T B Alim
- Department of Microbiology & Immunology, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Sonja Koppenhöfer
- Department of Biology, Memorial University of Newfoundland, St. John's, NL A1C 5S7, Canada
- Andrew S Lang
- Department of Biology, Memorial University of Newfoundland, St. John's, NL A1C 5S7, Canada
- J Thomas Beatty
- Department of Microbiology & Immunology, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
8
Karavaeva V, Sousa FL. Modular structure of complex II: An evolutionary perspective. BIOCHIMICA ET BIOPHYSICA ACTA. BIOENERGETICS 2023; 1864:148916. [PMID: 36084748 DOI: 10.1016/j.bbabio.2022.148916] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/15/2022] [Revised: 07/21/2022] [Accepted: 09/02/2022] [Indexed: 11/25/2022]
Abstract
Succinate dehydrogenases (SDHs) and fumarate reductases (FRDs) catalyse the interconversion of succinate and fumarate, a reaction highly conserved in all domains of life. The current classification of SDH/FRDs is based on the structure of the membrane anchor subunits and their cofactors. It is, however, unknown whether this classification would hold in the context of evolution. In this work, a large-scale comparative genomic analysis of complex II addresses the questions of its taxonomic distribution and phylogeny. Our findings report that for types C, D, and F, structural classification and phylogeny go hand in hand, while for types A, B and E the situation is more complex, highlighting the possibility for their classification into subgroups. Based on these findings, we proposed a revised version of the evolutionary scenario for these enzymes in which a primordial soluble module, corresponding to the cytoplasmatic subunits, would give rise to the current diversity via several independent membrane anchor attachment events.
Affiliation(s)
- Val Karavaeva
- Department of Functional and Evolutionary Ecology, University of Vienna, Djerassiplatz 1, 1030 Wien, Austria
- Filipa L Sousa
- Department of Functional and Evolutionary Ecology, University of Vienna, Djerassiplatz 1, 1030 Wien, Austria
9
Escorcia-Rodríguez JM, Esposito M, Freyre-González JA, Moreno-Hagelsieb G. Non-synonymous to synonymous substitutions suggest that orthologs tend to keep their functions, while paralogs are a source of functional novelty. PeerJ 2022; 10:e13843. [PMID: 36065404 PMCID: PMC9440661 DOI: 10.7717/peerj.13843] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/25/2020] [Accepted: 07/14/2022] [Indexed: 01/18/2023]
Abstract
Orthologs separate after lineages split from each other and paralogs after gene duplications. Thus, orthologs are expected to remain more functionally coherent across lineages, while paralogs have been proposed as a source of new functions. Because protein functional divergence follows from non-synonymous substitutions, we performed an analysis based on the ratio of non-synonymous to synonymous substitutions (dN/dS) as a proxy for functional divergence. We used five working definitions of orthology, including reciprocal best hits (RBH), among other definitions based on network analyses and clustering. The results showed that orthologs, by all definitions tested, had values of dN/dS noticeably lower than those of paralogs, suggesting that orthologs generally tend to be more functionally stable than paralogs. The differences in dN/dS ratios, and with them the suggestion that orthologs are functionally more stable, remained after eliminating gene comparisons with potential problems, such as genes with high codon usage biases, low coverage of either of the aligned sequences, or sequences with very high similarities. Separation by percent identity of the encoded proteins showed that the differences between the dN/dS ratios of orthologs and paralogs were more evident at high sequence identity, less so as identity dropped. These last results suggest that the differences between dN/dS ratios were partially related to differences in protein identity. However, they also suggested that paralogs undergo functional divergence relatively early after duplication. Our analyses indicate that choosing orthologs as probably functionally coherent remains the right approach in comparative genomics.
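As a rough illustration of the comparison described in this abstract, the sketch below computes dN/dS per gene pair and compares the median for orthologs against the median for paralogs. The function names and the input values are hypothetical toy inputs chosen for illustration; they are not data or code from the study, and estimating dN and dS themselves (for example with a codon substitution model) is assumed to have been done elsewhere.

```python
from statistics import median

def dn_ds(dn, ds):
    """Return the dN/dS ratio, or None when dS is zero (ratio undefined)."""
    return None if ds == 0 else dn / ds

def median_ratio(pairs):
    """Median dN/dS over (dN, dS) tuples, skipping undefined ratios."""
    ratios = [r for r in (dn_ds(dn, ds) for dn, ds in pairs) if r is not None]
    return median(ratios)

# Hypothetical (dN, dS) estimates, made up purely to exercise the functions.
orthologs = [(0.02, 0.40), (0.05, 0.50), (0.01, 0.25)]
paralogs = [(0.10, 0.40), (0.20, 0.50), (0.09, 0.30)]

print("median dN/dS, orthologs:", median_ratio(orthologs))
print("median dN/dS, paralogs:", median_ratio(paralogs))
```

A lower median dN/dS in one group is read here, as in the abstract, as a proxy for stronger purifying selection and therefore greater functional stability.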
Affiliation(s)
- Juan M. Escorcia-Rodríguez
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Mario Esposito
- Department of Biology, Wilfrid Laurier University, Waterloo, Canada
- Julio A. Freyre-González
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
10
Environmental Potential for Microbial 1,4-Dioxane Degradation Is Sparse despite Mobile Elements Playing a Role in Trait Distribution. Appl Environ Microbiol 2022; 88:e0209121. [PMID: 35297726 DOI: 10.1128/aem.02091-21] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/20/2022]
Abstract
1,4-Dioxane (dioxane) is an emerging contaminant of concern for which bioremediation is seen as a promising solution. To date, eight distinct gene families have been implicated in dioxane degradation, though only dioxane monooxygenase (DXMO) from Pseudonocardia dioxanivorans is routinely used as a biomarker in environmental surveys. In order to assess the functional and taxonomic diversity of bacteria capable of dioxane degradation, we collated existing, poorly-organized information on known biodegraders to create a curated suite of biomarkers with confidence levels for assessing 1,4-dioxane degradation potential. The characterized enzyme systems for dioxane degradation are frequently found on mobile elements, and we identified that many of the curated biomarkers are associated with other hallmarks of genomic rearrangements, indicating lateral gene transfer plays a role in dissemination of this trait. This is contrasted by the extremely limited phylogenetic distribution of known dioxane degraders, where all representatives belong to four classes within three bacterial phyla. Based on the curated set of expanded biomarkers, a search of more than 11,000 publicly available metagenomes identified a sparse and taxonomically limited distribution of potential dioxane degradation proteins. Our work provides an important and necessary structure to the current knowledge base for dioxane degradation and clarifies the potential for natural attenuation of dioxane across different environments. It further highlights a disconnect between the apparent mobility of these gene families and their limited distributions, indicating dioxane degradation may be difficult to integrate into a microorganism's metabolism. 
IMPORTANCE: New regulatory limits for 1,4-dioxane in groundwater have been proposed or adopted in many countries, including the United States and Canada, generating a direct need for remediation options as well as better tools for assessing the fate of dioxane in an environment. A comprehensive suite of biomarkers associated with dioxane degradation was identified and then leveraged to examine the global potential for dioxane degradation in natural and engineered environments. We identified consistent differences in the dioxane-degrading gene families associated with terrestrial, aquatic, and wetland environments, indicating that reliance on a single biomarker for assessing natural attenuation of dioxane is likely to miss key players. Most environments do not currently host the capacity for dioxane degradation; the sparse distribution of dioxane degradation potential highlights the need for bioaugmentation approaches over biostimulation of naturally occurring microbial communities.
11
Feregrino C, Tschopp P. Assessing evolutionary and developmental transcriptome dynamics in homologous cell types. Dev Dyn 2021; 251:1472-1489. [PMID: 34114716 PMCID: PMC9545966 DOI: 10.1002/dvdy.384] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/09/2021] [Revised: 05/19/2021] [Accepted: 06/04/2021] [Indexed: 12/03/2022]
Abstract
Background: During development, complex organ patterns emerge through the precise temporal and spatial specification of different cell types. On an evolutionary timescale, these patterns can change, resulting in morphological diversification. It is generally believed that homologous anatomical structures are built, largely, by homologous cell types. However, whether a common evolutionary origin of such cell types is always reflected in the conservation of their intrinsic transcriptional specification programs is less clear. Results: Here, we developed a user-friendly bioinformatics workflow to detect gene co-expression modules and test for their conservation across developmental stages and species boundaries. Using a paradigm of morphological diversification, the tetrapod limb, and single-cell RNA-sequencing data from two distantly related species, chicken and mouse, we assessed the transcriptional dynamics of homologous cell types during embryonic patterning. With mouse limb data as reference, we identified 19 gene co-expression modules with varying tissue or cell type-restricted activities. Testing for co-expression conservation revealed modules with high evolutionary turnover, while others seemed maintained (to different degrees, in module make-up, density or connectivity) over developmental and evolutionary timescales. Conclusions: We present an approach to identify evolutionary and developmental dynamics in gene co-expression modules during patterning-relevant stages of homologous cell type specification using single-cell RNA-sequencing data.
Affiliation(s)
- Christian Feregrino
- DUW Zoology, University of Basel, Basel, Switzerland
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Hannoversche Str. 28, Berlin, Germany
12
Kloosterman AM, Cimermancic P, Elsayed SS, Du C, Hadjithomas M, Donia MS, Fischbach MA, van Wezel GP, Medema MH. Expansion of RiPP biosynthetic space through integration of pan-genomics and machine learning uncovers a novel class of lanthipeptides. PLoS Biol 2020; 18:e3001026. [PMID: 33351797 PMCID: PMC7794033 DOI: 10.1371/journal.pbio.3001026] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Received: 05/19/2020] [Revised: 01/08/2021] [Accepted: 12/07/2020] [Indexed: 12/22/2022]
Abstract
Microbial natural products constitute a wide variety of chemical compounds, many of which can have antibiotic, antiviral, or anticancer properties that make them interesting for clinical purposes. Natural product classes include polyketides (PKs), nonribosomal peptides (NRPs), and ribosomally synthesized and post-translationally modified peptides (RiPPs). While variants of biosynthetic gene clusters (BGCs) for known classes of natural products are easy to identify in genome sequences, BGCs for new compound classes escape attention. In particular, evidence is accumulating that for RiPPs, subclasses known thus far may only represent the tip of an iceberg. Here, we present decRiPPter (Data-driven Exploratory Class-independent RiPP TrackER), a RiPP genome mining algorithm aimed at the discovery of novel RiPP classes. DecRiPPter combines a Support Vector Machine (SVM) that identifies candidate RiPP precursors with pan-genomic analyses to identify which of these are encoded within operon-like structures that are part of the accessory genome of a genus. Subsequently, it prioritizes such regions based on the presence of new enzymology and based on patterns of gene cluster and precursor peptide conservation across species. We then applied decRiPPter to mine 1,295 Streptomyces genomes, which led to the identification of 42 new candidate RiPP families that could not be found by existing programs. One of these was studied further and elucidated as a representative of a novel subfamily of lanthipeptides, which we designate class V. The 2D structure of the new RiPP, which we name pristinin A3 (1), was solved using nuclear magnetic resonance (NMR), tandem mass spectrometry (MS/MS) data, and chemical labeling. Two previously unidentified modifying enzymes are proposed to create the hallmark lanthionine bridges.
Taken together, our work highlights how novel natural product families can be discovered by methods going beyond sequence similarity searches to integrate multiple pathway discovery criteria. This study shows that decRiPPter, an innovative algorithmic approach using pan-genomics and machine learning, can discover novel types of ribosomally synthesized peptide (RIPP) natural products, including a new class of lanthipeptides.
Affiliation(s)
- Peter Cimermancic
  - Verily Life Sciences, South San Francisco, CA, United States of America
- Chao Du
  - Institute of Biology, Leiden University, the Netherlands
- Mohamed S. Donia
  - Department of Molecular Biology, Princeton University, NJ, United States of America
- Gilles P. van Wezel
  - Institute of Biology, Leiden University, the Netherlands
  - Netherlands Institute for Ecology (NIOO-KNAW), Wageningen, the Netherlands
  - * E-mail: (GPvW); (MHM)
- Marnix H. Medema
  - Bioinformatics group, Wageningen University, the Netherlands
  - * E-mail: (GPvW); (MHM)
|
13
|
Hernández-Salmerón JE, Moreno-Hagelsieb G. Progress in quickly finding orthologs as reciprocal best hits: comparing blast, last, diamond and MMseqs2. BMC Genomics 2020; 21:741. [PMID: 33099302 PMCID: PMC7585182 DOI: 10.1186/s12864-020-07132-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/26/2020] [Accepted: 10/09/2020] [Indexed: 02/07/2023]
Abstract
BACKGROUND Finding orthologs remains an important bottleneck in comparative genomics analyses. While the authors of software for the quick comparison of protein sequences evaluate the speed of their software and compare their results against the most usual software for the task, it is not common for them to evaluate their software for more particular uses, such as finding orthologs as reciprocal best hits (RBH). Here we compared RBH results obtained using software that runs faster than blastp, namely lastal, diamond, and MMseqs2. RESULTS We found that lastal required the least time to produce results. However, it yielded fewer results than any other program when comparing the proteins encoded by evolutionarily distant genomes. The program producing the number of RBH most similar to blastp was diamond run with the "ultra-sensitive" option. However, this option was diamond's slowest, with the "very-sensitive" option offering the best balance between speed and RBH results. The speed-up of the programs was much more evident when dealing with eukaryotic genomes, which code for more numerous proteins. For example, lastal took a median of approx. 1.5% of the blastp time to run with bacterial proteomes and 0.6% with eukaryotic ones, while diamond with the very-sensitive option took 7.4% and 5.2%, respectively. Though estimated error rates were very similar among the RBH obtained with all programs, RBH obtained with MMseqs2 had the lowest error rates among the programs tested. CONCLUSIONS The fast algorithms for pairwise protein comparison produced results very similar to blast in a fraction of the time, with diamond offering the best compromise in speed, sensitivity and quality, as long as a sensitivity option other than the default was chosen.
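The reciprocal best hit criterion compared in this abstract can be illustrated with a minimal sketch. It assumes each search has already been reduced to (query, subject, score) tuples; the function names and toy scores below are hypothetical and are not taken from the paper or from any of the tools it benchmarks.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, score); higher score is better


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Keep the highest-scoring subject for every query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data (hypothetical): a3's best hit is b1, but b1 prefers a1, so a3 has no RBH.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0), ("a2", "b2", 90.0), ("a3", "b1", 80.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 140.0), ("b2", "a2", 160.0)]
rbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Ties are broken here by first occurrence, which is a simplification; real pipelines typically also filter hits by coverage and E-value before applying the reciprocity test.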
Affiliation(s)
- Gabriel Moreno-Hagelsieb
  - Wilfrid Laurier University, Department of Biology, 75 University Ave W, Waterloo, N2L 3C5 ON Canada
|
14
|
Bonnici V, Maresi E, Giugno R. Challenges in gene-oriented approaches for pangenome content discovery. Brief Bioinform 2020; 22:5901976. [PMID: 32893299 DOI: 10.1093/bib/bbaa198] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/28/2020] [Revised: 05/14/2020] [Accepted: 08/04/2020] [Indexed: 01/17/2023]
Abstract
Given a group of genomes, represented as the sets of genes that belong to them, the discovery of the pangenomic content is based on the search for genetic homology among the genes in order to cluster them into families. Pangenomic analyses thus investigate the membership of the families in the given genomes. This approach is referred to as the gene-oriented approach, in contrast to other definitions of the problem that take into account different genomic features. In the past years, several tools have been developed to discover and analyse pangenomic contents. Because of the hardness of the problem, each tool applies a different strategy for discovering the pangenomic content. As a result, the performance of each tool depends on the composition of the input genomes. This review reports the main analysis instruments provided by the current state-of-the-art tools for the discovery of pangenomic contents. Moreover, unlike previous works, the presented study compares pangenomic tools from a methodological perspective, analysing the causes that lead a given methodology to outperform other tools. The analysis is performed by taking into account different bacterial populations, which are synthetically generated by changing evolutionary parameters. The benchmarks used to compare the pangenomic tools, in addition to the computational pipeline developed for this purpose, are available at https://github.com/InfOmics/pangenes-review. Contact: V. Bonnici, R. Giugno. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
Affiliation(s)
- Emiliano Maresi
  - The Microsoft Research, University of Trento Centre for Computational and Systems Biology
- Rosalba Giugno
  - Computer Science and Bioinformatics, referent of the Master Degree in Medical Bioinformatics
|
15
|
Kumar R, Bröms JE, Sjöstedt A. Exploring the Diversity Within the Genus Francisella - An Integrated Pan-Genome and Genome-Mining Approach. Front Microbiol 2020; 11:1928. [PMID: 32849479 PMCID: PMC7431613 DOI: 10.3389/fmicb.2020.01928] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/05/2020] [Accepted: 07/22/2020] [Indexed: 01/13/2023]
Abstract
Pan-genome analysis is a powerful method to explore genomic heterogeneity and diversity of bacterial species. Here we present a pan-genome analysis of the genus Francisella, comprising a dataset of 63 genomes and encompassing clinical as well as environmental isolates from distinct geographic locations. To determine the evolutionary relationship within the genus, we performed phylogenetic whole-genome studies utilizing the average nucleotide identity, average amino acid identity, core genes and non-recombinant loci markers. Based on the analyses, the phylogenetic trees obtained identified two distinct clades, A and B and a diverse cluster designated C. The sizes of the pan-, core-, cloud-, and shell-genomes of Francisella were estimated and compared to those of two other facultative intracellular pathogens, Legionella and Piscirickettsia. Francisella had the smallest core-genome, 692 genes, compared to 886 and 1,732 genes for Legionella and Piscirickettsia respectively, while the pan-genome of Legionella was more than twice the size of that of the other two genera. Also, the composition of the Francisella Type VI secretion system (T6SS) was analyzed. Distinct differences in the gene content of the T6SS were identified. In silico approaches performed to identify putative substrates of these systems revealed potential effectors targeting the cell wall, inner membrane, cellular nucleic acids as well as proteins, thus constituting attractive targets for site-directed mutagenesis. The comparative analysis performed here provides a comprehensive basis for the assessment of the phylogenomic relationship of members of the genus Francisella and for the identification of putative T6SS virulence traits.
Affiliation(s)
- Rajender Kumar
  - Department of Clinical Microbiology and Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå University, Umeå, Sweden
- Jeanette E Bröms
  - Department of Clinical Microbiology and Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå University, Umeå, Sweden
- Anders Sjöstedt
  - Department of Clinical Microbiology and Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå University, Umeå, Sweden
|
16
|
Kamminga T, Benis N, Martins Dos Santos V, Bijlsma JJE, Schaap PJ. Combined Transcriptome Sequencing of Mycoplasma hyopneumoniae and Infected Pig Lung Tissue Reveals Up-Regulation of Bacterial F1-Like ATPase and Down-Regulation of the P102 Cilium Adhesin in vivo. Front Microbiol 2020; 11:1679. [PMID: 32765473 PMCID: PMC7379848 DOI: 10.3389/fmicb.2020.01679] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/06/2019] [Accepted: 06/26/2020] [Indexed: 12/21/2022]
Abstract
Mycoplasma hyopneumoniae (M. hyopneumoniae) causes enzootic pneumonia in pigs, but it is still largely unknown which host-pathogen interactions enable persistent infection and cause disease. In this study, we analyzed the host and bacterial transcriptomes during infection using RNA sequencing. Comparison of the transcriptome of lung lesion tissue from infected pigs with lung tissue from non-infected animals identified 424 differentially expressed genes (FDR < 0.01 and fold change > 1.5 log2). These genes were part of the following major pathways of the immune system: interleukin signaling (type 4, 10, 13, and 18), regulation of Toll-like receptors by endogenous ligand, and activation of C3 and C5 in the complement system. Besides analyzing the lung transcriptome, a sampling protocol was developed to obtain enough bacterial mRNA from infected lung tissue for RNA sequencing. This was done by flushing infected lobes in the lung and subsequently enriching for bacterial RNA. On average, 2.2 million bacterial reads were obtained per biological replicate to analyze the bacterial in vivo transcriptome. We compared the in vivo bacterial transcriptome with the transcriptome of bacteria grown in vitro and identified 22 up-regulated and 30 down-regulated genes (FDR < 0.01 and fold change > 2 log2). Six out of seven genes in the operon encoding the mycoplasma-specific F1-like ATPase (MHP_RS02445-MHP_RS02475) and all genes in the operon MHP_RS01965-MHP_RS01990, with functions related to nucleotide metabolism, spermidine transport and glycerol-3-phosphate transport, were up-regulated in vivo. Down-regulated in vivo were genes related to glycerol uptake, cilium adhesion (P102), cell division and myo-inositol metabolism. In addition to providing a novel method to isolate bacterial mRNA from infected lung, this study provided insights into changes in gene expression during infection, which could help development of novel treatment strategies against enzootic pneumonia caused by M. hyopneumoniae.
Affiliation(s)
- Tjerko Kamminga
  - Laboratory of Systems and Synthetic Biology, Department of Agrotechnology and Food Sciences, Wageningen University and Research, Wageningen, Netherlands
  - Bioprocess Technology and Support, MSD Animal Health, Boxmeer, Netherlands
- Nirupama Benis
  - Laboratory of Systems and Synthetic Biology, Department of Agrotechnology and Food Sciences, Wageningen University and Research, Wageningen, Netherlands
- Vitor Martins Dos Santos
  - Laboratory of Systems and Synthetic Biology, Department of Agrotechnology and Food Sciences, Wageningen University and Research, Wageningen, Netherlands
- Peter J Schaap
  - Laboratory of Systems and Synthetic Biology, Department of Agrotechnology and Food Sciences, Wageningen University and Research, Wageningen, Netherlands
|
17
|
Gao K, Miller J. Primary orthologs from local sequence context. BMC Bioinformatics 2020; 21:48. [PMID: 32028880 PMCID: PMC7006074 DOI: 10.1186/s12859-020-3384-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/13/2019] [Accepted: 01/22/2020] [Indexed: 02/05/2023]
Abstract
BACKGROUND The evolutionary history of genes serves as a cornerstone of contemporary biology. Most conserved sequences in mammalian genomes don't code for proteins, yielding a need to infer evolutionary history of sequences irrespective of what kind of functional element they may encode. Thus, sequence-centric, as opposed to gene-centric, modes of inferring paths of sequence evolution are increasingly relevant. Customarily, homologous sequences derived from the same direct ancestor, whose ancestral position in two genomes is usually conserved, are termed "primary" (or "positional") orthologs. Methods based solely on similarity don't reliably distinguish primary orthologs from other homologs; for this, genomic context is often essential. Context-dependent identification of orthologs traditionally relies on genomic context over length scales characteristic of conserved gene order or whole-genome sequence alignment, and can be computationally intensive. RESULTS We demonstrate that short-range sequence context, as short as a single "maximal" match, distinguishes primary orthologs from other homologs across whole genomes. On mammalian whole genomes not preprocessed by repeat-masker, potential orthologs are extracted by genome intersection as "non-nested maximal matches": maximal matches that are not nested into other maximal matches. It emerges that on both nucleotide and gene scales, non-nested maximal matches recapitulate primary or positional orthologs with high precision and high recall, while the corresponding computation consumes less than one thirtieth of the computation time required by commonly applied whole-genome alignment methods. In regions of genomes that would be masked by repeat-masker, non-nested maximal matches recover orthologs that are inaccessible to Lastz net alignment, for which repeat-masking is a prerequisite. mmRBHs, reciprocal best hits of genes containing non-nested maximal matches, yield novel putative orthologs, e.g. around 1000 pairs of genes for human-chimpanzee. CONCLUSIONS We describe an intersection-based method that requires neither repeat-masking nor alignment to infer evolutionary history of sequences based on short-range genomic sequence context. Ortholog identification based on non-nested maximal matches is parameter-free and less computationally intensive than many alignment-based methods. It is especially suitable for genome-wide identification of orthologs, and may be applicable to unassembled genomes. We are agnostic as to the reasons for its effectiveness, which may reflect local variation of mean mutation rate.
Affiliation(s)
- Kun Gao
  - School of Science, Southwest University of Science and Technology, 59 Qinglong Road, Mianyang, Sichuan Province, 621010, People's Republic of China
- Jonathan Miller
  - Physics and Biology Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Kunigami-gun, Okinawa, 904-0495, Japan
|
18
|
van Gestel J, Ackermann M, Wagner A. Microbial life cycles link global modularity in regulation to mosaic evolution. Nat Ecol Evol 2019; 3:1184-1196. [PMID: 31332330 DOI: 10.1038/s41559-019-0939-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/03/2018] [Accepted: 06/03/2019] [Indexed: 11/09/2022]
Abstract
Microbes are exposed to changing environments, to which they can respond by adopting various lifestyles such as swimming, colony formation or dormancy. These lifestyles are often studied in isolation, thereby giving a fragmented view of the life cycle as a whole. Here, we study lifestyles in the context of this whole. We first use machine learning to reconstruct the expression changes underlying life cycle progression in the bacterium Bacillus subtilis, based on hundreds of previously acquired expression profiles. This yields a timeline that reveals the modular organization of the life cycle. By analysing over 380 Bacillales genomes, we then show that life cycle modularity gives rise to mosaic evolution in which life stages such as motility and sporulation are conserved and lost as discrete units. We postulate that this mosaic conservation pattern results from habitat changes that make these life stages obsolete or detrimental. Indeed, when evolving eight distinct Bacillales strains and species under laboratory conditions that favour colony growth, we observe rapid and parallel losses of the sporulation life stage across species, induced by mutations that affect the same global regulator. We conclude that a life cycle perspective is pivotal to understanding the causes and consequences of modularity in both regulation and evolution.
Affiliation(s)
- Jordi van Gestel
  - Department of Evolutionary Biology and Environmental Studies, University of Zürich, Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Department of Environmental Systems Science, ETH Zürich, Zürich, Switzerland
  - Department of Environmental Microbiology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, Switzerland
- Martin Ackermann
  - Department of Environmental Systems Science, ETH Zürich, Zürich, Switzerland
  - Department of Environmental Microbiology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, Switzerland
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zürich, Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - The Santa Fe Institute, Santa Fe, NM, USA
|
19
|
Dependency Between Protein-Protein Interactions and Protein Variability and Evolutionary Rates in Vertebrates: Observed Relationships and Stochastic Modeling. J Mol Evol 2019; 87:184-198. [PMID: 31302723 PMCID: PMC6658588 DOI: 10.1007/s00239-019-09899-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/05/2019] [Accepted: 07/01/2019] [Indexed: 01/23/2023]
Abstract
Recent developments in sequencing and growth of bioinformatics resources provide us with vast depositories of protein network and single nucleotide polymorphism data. It allows us to re-examine, on a larger and more comprehensive scale, the relationship between protein–protein interactions and protein variability and evolutionary rates. This relationship has remained far from unambiguously resolved for quite a long time, reflecting shifting analysis approaches in the literature, and growing data availability. In this study, we utilized several public genomic databases to investigate this relationship in human, mouse, pig, chicken, and zebrafish. We observed strong non-linear relationship patterns (tending towards convex decreasing function shapes) between protein variability and the density of corresponding protein–protein interactions across all five species. To investigate further, we carried out stochastic simulations, modeling the interplay between protein connectivity and variability. Our results indicate that a simple negative linear correlation model, often suggested (or tacitly assumed) in the literature, as either a null or an alternative hypothesis, is not a good fit with the observed data. After considering different (but still relatively simple, and not overfitting) simulation models, we found that a convex decreasing protein variability–connectivity function (specifically, exponential decay) led to a much better fit with the real data. We conclude that simple correlation models might be inadequate for describing protein variability–connectivity interplay in vertebrates; they often tend towards false negatives (showing no more than marginal linear or rank correlation where there are in fact strong non-random patterns).
|
20
|
Kim DW, Thawng CN, Lee K, Wellington EMH, Cha CJ. A novel sulfonamide resistance mechanism by two-component flavin-dependent monooxygenase system in sulfonamide-degrading actinobacteria. Environment International 2019; 127:206-215. [PMID: 30928844 DOI: 10.1016/j.envint.2019.03.046] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 01/23/2019] [Revised: 03/18/2019] [Accepted: 03/19/2019] [Indexed: 05/19/2023]
Abstract
Sulfonamide-degrading bacteria have been discovered in various environments, suggesting the presence of novel resistance mechanisms via drug inactivation. In this study, Microbacterium sp. CJ77, capable of utilizing various sulfonamides as a sole carbon source, was isolated from a composting facility. Genome and proteome analyses revealed that a gene cluster containing a flavin-dependent monooxygenase and a flavin reductase was highly up-regulated in response to sulfonamides. Biochemical analysis showed that the two-component monooxygenase system comprised the key enzymes for the initial cleavage of sulfonamides. Co-expression of the two-component system in Escherichia coli conferred decreased susceptibility to sulfamethoxazole, indicating that the genes encoding drug-inactivating enzymes are potential resistance determinants. Comparative genomic analysis revealed that the gene cluster containing sulfonamide monooxygenase (renamed as sulX) and flavin reductase (sulR) was highly conserved in a genomic island shared among sulfonamide-degrading actinobacteria, all of which also contained sul1-carrying class 1 integrons. These results suggest that sulfonamide metabolism may have evolved in sulfonamide-resistant bacteria that had already acquired the class 1 integron under sulfonamide selection pressures. Furthermore, the presence of multiple insertion sequence elements and putative composite transposon structures containing the sulX gene cluster indicated potential mobilization. This is the first study to report that sulX, responsible for both sulfonamide degradation and resistance, is prevalent in sulfonamide-degrading actinobacteria and that its genetic signatures indicate horizontal gene transfer of the novel resistance gene.
Affiliation(s)
- Dae-Wi Kim
  - Department of Systems Biotechnology and Center for Antibiotic Resistome, Chung-Ang University, Anseong 17456, Republic of Korea
- Cung Nawl Thawng
  - Department of Systems Biotechnology and Center for Antibiotic Resistome, Chung-Ang University, Anseong 17456, Republic of Korea
- Kihyun Lee
  - Department of Systems Biotechnology and Center for Antibiotic Resistome, Chung-Ang University, Anseong 17456, Republic of Korea
- Chang-Jun Cha
  - Department of Systems Biotechnology and Center for Antibiotic Resistome, Chung-Ang University, Anseong 17456, Republic of Korea
|
21
|
Goel P, Parvez S, Sharma A. Genomic analyses of aminoacyl tRNA synthetases from human-infecting helminths. BMC Genomics 2019; 20:333. [PMID: 31046663 PMCID: PMC6498573 DOI: 10.1186/s12864-019-5679-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/21/2018] [Accepted: 04/09/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Helminth infections affect ~60% of the human population that lives in tropical and subtropical regions worldwide. These infections result in diseases like schistosomiasis, lymphatic filariasis, river blindness and echinococcosis. Here we provide a comprehensive computational analysis of the aminoacyl tRNA synthetase (aaRS) enzyme family from 27 human-infecting helminths. Our analyses support the idea that several helminth aaRSs can be targeted for drug repurposing or for development of new drugs. For experimental validation, we focused on onchocerciasis (also known as "river blindness"), a filarial vector-borne disease that is prevalent in Africa and Latin America. We show that halofuginone (HF) can act as a potent inhibitor of Onchocerca volvulus prolyl tRNA synthetase (OvPRS). RESULTS The conserved enzyme family of aaRSs has been validated as druggable targets in numerous eukaryotic parasites. We thus embarked on assessing aaRSs from the genomes of 27 helminths that cause infections in humans. In order to delineate the distribution of aaRSs per genome, we utilized Hidden Markov Models of aaRS catalytic domains to identify all orthologues. We note that the Fasciola hepatica genome encodes the highest number of aaRS-like proteins (69), whereas Taenia asiatica has the lowest count (32). The number of genes for any particular aaRS-like protein varies from 1 to 8 in these 27 studied helminths. Sequence alignments of helminth-encoded lysyl, prolyl, leucyl and threonyl tRNA synthetases suggest that various known aaRS inhibitors like Cladosporin, Halofuginone, Benzoborale and Borrelidin may be of utility against helminths. The recombinantly expressed Onchocerca volvulus PRS was used as proof of concept for targeting aaRS with drug-like molecules like HF. CONCLUSIONS Systematic analysis of unique subdomains within helminth aaRSs reveals the presence of a number of non-canonical domains, like PAC3, Utp-14 and Pex2_Pex12, fused to catalytic domains in the predicted helminth aaRSs. We have established a platform for biochemical validation of a large number of helminth aaRSs that can be targeted using available inhibitors to jump-start drug repurposing against human helminths.
Affiliation(s)
- Preeti Goel
  - Structural Parasitology Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, 110067 India
  - Department of Toxicology, School of Chemical and Life Sciences, Jamia Hamdard, New Delhi, 110063 India
- Suhel Parvez
  - Department of Toxicology, School of Chemical and Life Sciences, Jamia Hamdard, New Delhi, 110063 India
- Amit Sharma
  - Structural Parasitology Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, 110067 India
|
22
|
Abstract
Best match graphs arise naturally as the first processing intermediate in algorithms for orthology detection. Let T be a phylogenetic (gene) tree and σ an assignment of leaves of T to species. The best match graph (G, σ) is a digraph that contains an arc from x to y if the genes x and y reside in different species and y is one of possibly many (evolutionary) closest relatives of x compared to all other genes contained in the species σ(y). Here, we characterize best match graphs and show that it can be decided in cubic time and quadratic space whether (G, σ) is derived from a tree in this manner. If the answer is affirmative, there is a unique least resolved tree that explains (G, σ), which can also be constructed in cubic time.
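The definition in this abstract can be made concrete with a small sketch that builds the arcs of a best match graph from a rooted tree. This is a naive illustration only, not the cubic-time recognition algorithm the abstract refers to. It assumes the tree is given as a child-to-parent dictionary and the species assignment as a leaf-to-species dictionary; "closest relative" is read as the candidate whose last common ancestor with x is deepest in the tree. All names and the toy tree are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Path from a node up to the root, including the node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca_depth(x: str, y: str, parent: Dict[str, str]) -> int:
    """Depth (root = 0) of the last common ancestor of x and y."""
    above_y = set(ancestors(y, parent))
    lca = next(n for n in ancestors(x, parent) if n in above_y)
    return len(ancestors(lca, parent)) - 1


def best_match_graph(parent: Dict[str, str], species: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Arcs (x, y): y is a closest relative of x among the genes of y's species."""
    arcs: Set[Tuple[str, str]] = set()
    for x in species:
        by_species: Dict[str, List[str]] = {}
        for y in species:
            if species[y] != species[x]:
                by_species.setdefault(species[y], []).append(y)
        for candidates in by_species.values():
            deepest = max(lca_depth(x, y, parent) for y in candidates)
            arcs.update((x, y) for y in candidates if lca_depth(x, y, parent) == deepest)
    return arcs


# Toy tree (hypothetical): root r has children u and b2; u has children a1 and b1.
parent = {"a1": "u", "b1": "u", "u": "r", "b2": "r"}
species = {"a1": "A", "b1": "B", "b2": "B"}
arcs = best_match_graph(parent, species)
```

In the toy tree the relation is not symmetric: b2 points to a1 because a1 is the only gene of species A, but a1 points only to b1, its closer relative in species B.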
|
23
|
Wright ES, Baum DA. Exclusivity offers a sound yet practical species criterion for bacteria despite abundant gene flow. BMC Genomics 2018; 19:724. [PMID: 30285620 PMCID: PMC6171291 DOI: 10.1186/s12864-018-5099-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/01/2018] [Accepted: 09/21/2018] [Indexed: 12/29/2022]
Abstract
BACKGROUND The question of whether bacterial species objectively exist has long divided microbiologists. A major source of contention stems from the fact that bacteria regularly engage in horizontal gene transfer (HGT), making it difficult to ascertain relatedness and draw boundaries between taxa. A natural way to define taxa is based on exclusivity of relatedness, which applies when members of a taxon are more closely related to each other than they are to any outsider. It is largely unknown whether exclusive bacterial taxa exist when averaging over the genome or are rare due to rampant hybridization. RESULTS Here, we analyze a collection of 701 genomes representing a wide variety of environmental isolates from the family Streptomycetaceae, whose members are competent at HGT. We find that the presence/absence of auxiliary genes in the pan-genome displays a hierarchical (tree-like) structure that correlates significantly with the genealogy of the core-genome. Moreover, we identified the existence of many exclusive taxa, although individual genes often contradict these taxa. These conclusions were supported by repeating the analysis on 1,586 genomes belonging to the genus Bacillus. However, despite confirming the existence of exclusive groups (taxa), we were unable to identify an objective threshold at which to assign the rank of species. CONCLUSIONS The existence of bacterial taxa is justified by considering average relatedness across the entire genome, as captured by exclusivity, but is rejected if one requires unanimous agreement of all parts of the genome. We propose using exclusivity to delimit taxa and conventional genome similarity thresholds to assign bacterial taxa to the species rank. This approach recognizes species that are phylogenetically meaningful, while also establishing some degree of comparability across species-ranked taxa in different bacterial clades.
Affiliation(s)
- Erik S Wright
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
  - Pittsburgh Center for Evolutionary Biology and Medicine, Pittsburgh, USA
- David A Baum
  - Department of Botany, University of Wisconsin-Madison, Madison, USA
|
24
|
Peng T, Xu Y, Zhang Y. Comparative genomics of molybdenum utilization in prokaryotes and eukaryotes. BMC Genomics 2018; 19:691. [PMID: 30231876 PMCID: PMC6147048 DOI: 10.1186/s12864-018-5068-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 08/06/2018] [Accepted: 09/11/2018] [Indexed: 01/14/2023]
Abstract
BACKGROUND Molybdenum (Mo) is an essential micronutrient for almost all biological systems and holds key positions in several enzymes involved in carbon, nitrogen and sulfur metabolism. In general, this transition metal needs to be coordinated to a unique pterin, thus forming a prosthetic group named molybdenum cofactor (Moco) at the catalytic sites of molybdoenzymes. The biochemical functions of many molybdoenzymes have been characterized; however, comprehensive analyses of the evolution of Mo metabolism and molybdoproteomes are quite limited. RESULTS In this study, we analyzed almost 5900 sequenced organisms to examine the occurrence of the Mo utilization trait at the levels of Mo transport system, Moco biosynthetic pathway and molybdoproteins in all three domains of life. A global map of Moco biosynthesis and molybdoproteins has been generated, which provides the most detailed picture of Mo utilization in prokaryotes and eukaryotes so far. Our results revealed that most prokaryotes and all higher eukaryotes utilize Mo, whereas many unicellular eukaryotes, such as parasites and most yeasts, have lost the ability to use this metal. By characterizing the molybdoproteomes of all organisms, we found many new molybdoprotein-rich species, especially in bacteria. A variety of new domain fusions were detected for different molybdoprotein families, suggesting the presence of novel proteins that are functionally linked to molybdoproteins or Moco biosynthesis. Moreover, a horizontal gene transfer event involving both the Moco biosynthetic pathway and molybdoproteins was identified. Finally, analysis of the relationship between environmental factors and Mo utilization showed new evolutionary trends of the Mo utilization trait. CONCLUSIONS Our data provide new insights into the evolutionary history of Mo utilization in nature.
Affiliation(s)
- Ting Peng
- Shenzhen Key Laboratory of Marine Bioresources and Ecology, College of Life Sciences and Oceanography, Shenzhen University, Guangdong Province, Shenzhen, 518060, China.,Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Yinzhen Xu
- Shenzhen Key Laboratory of Marine Bioresources and Ecology, College of Life Sciences and Oceanography, Shenzhen University, Guangdong Province, Shenzhen, 518060, China.,Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Yan Zhang
- Shenzhen Key Laboratory of Marine Bioresources and Ecology, College of Life Sciences and Oceanography, Shenzhen University, Guangdong Province, Shenzhen, 518060, China.
| |
|
25
|
Abstract
This chapter covers the theory and practice of ortholog gene set computation. In the theoretical part we give detailed and formal descriptions of the relevant concepts. We also cover the topic of graph-based clustering as a tool to compute ortholog gene sets. In the second part we provide an overview of practical considerations intended for researchers who need to determine orthologous genes from a collection of annotated genomes, briefly describing some of the most popular programs and resources currently available for this task.
|
26
|
Alvarez-Ponce D, Feyertag F, Chakraborty S. Position Matters: Network Centrality Considerably Impacts Rates of Protein Evolution in the Human Protein-Protein Interaction Network. Genome Biol Evol 2018; 9:1742-1756. [PMID: 28854629 PMCID: PMC5570066 DOI: 10.1093/gbe/evx117] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Accepted: 07/01/2017] [Indexed: 02/06/2023]
Abstract
The proteins of any organism evolve at disparate rates. A long list of factors affecting rates of protein evolution have been identified. However, the relative importance of each factor in determining rates of protein evolution remains unresolved. The prevailing view is that evolutionary rates are dominantly determined by gene expression, and that other factors such as network centrality have only a marginal effect, if any. However, this view is largely based on analyses in yeasts, and accurately measuring the importance of the determinants of rates of protein evolution is complicated by the fact that the different factors are often correlated with each other, and by the relatively poor quality of available functional genomics data sets. Here, we use correlation, partial correlation and principal component regression analyses to measure the contributions of several factors to the variability of the rates of evolution of human proteins. For this purpose, we analyzed the entire human protein–protein interaction data set and the human signal transduction network—a network data set of exceptionally high quality, obtained by manual curation, which is expected to be virtually free from false positives. In contrast with the prevailing view, we observe that network centrality (measured as the number of physical and nonphysical interactions, betweenness, and closeness) has a considerable impact on rates of protein evolution. Surprisingly, the impact of centrality on rates of protein evolution seems to be comparable, or even superior according to some analyses, to that of gene expression. Our observations seem to be independent of potentially confounding factors and from the limitations (biases and errors) of interactomic data sets.
|
27
|
A plastidial pantoate transporter with a potential role in pantothenate synthesis. Biochem J 2018; 475:813-825. [DOI: 10.1042/bcj20170883] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/20/2017] [Revised: 01/26/2018] [Accepted: 01/30/2018] [Indexed: 11/17/2022]
Abstract
The pantothenate (vitamin B5) synthesis pathway in plants is not fully defined because the subcellular site of its ketopantoate → pantoate reduction step is unclear. However, the pathway is known to be split between cytosol, mitochondria, and potentially plastids, and inferred to involve mitochondrial or plastidial transport of ketopantoate or pantoate. No proteins that mediate these transport steps have been identified. Comparative genomic and transcriptomic analyses identified Arabidopsis thaliana BASS1 (At1g78560) and its maize (Zea mays) ortholog as candidates for such a transport role. BASS1 proteins belong to the bile acid : sodium symporter family and share similarity with the Salmonella enterica PanS pantoate/ketopantoate transporter and with predicted bacterial transporters whose genes cluster on the chromosome with pantothenate synthesis genes. Furthermore, Arabidopsis BASS1 is co-expressed with genes related to metabolism of coenzyme A, the cofactor derived from pantothenate. Expression of Arabidopsis or maize BASS1 promoted the growth of a S. enterica panB panS mutant strain when pantoate, but not ketopantoate, was supplied, and increased the rate of [3H]pantoate uptake. Subcellular localization of green fluorescent protein fusions in Nicotiana tabacum BY-2 cells demonstrated that Arabidopsis BASS1 is targeted solely to the plastid inner envelope. Two independent Arabidopsis BASS1 knockout mutants accumulated pantoate ∼10-fold in leaves and had smaller seeds. Taken together, these data indicate that BASS1 is a physiologically significant plastidial pantoate transporter and that the pantoate reduction step in pantothenate biosynthesis could be at least partly localized in plastids.
|
28
|
Nourdin-Galindo G, Sánchez P, Molina CF, Espinoza-Rojas DA, Oliver C, Ruiz P, Vargas-Chacoff L, Cárcamo JG, Figueroa JE, Mancilla M, Maracaja-Coutinho V, Yañez AJ. Comparative Pan-Genome Analysis of Piscirickettsia salmonis Reveals Genomic Divergences within Genogroups. Front Cell Infect Microbiol 2017; 7:459. [PMID: 29164068 PMCID: PMC5671498 DOI: 10.3389/fcimb.2017.00459] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 07/11/2017] [Accepted: 10/16/2017] [Indexed: 11/13/2022]
Abstract
Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis, functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remains lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic function of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. The obtained data suggest that these genes could be directly associated with inter-genogroup differences in pathogenesis and host-pathogen interactions, information that could be useful in designing novel strategies for diagnosing and controlling P. salmonis infection.
Affiliation(s)
- Guillermo Nourdin-Galindo
- Facultad de Ciencias, Instituto de Bioquímica y Microbiología, Universidad Austral de Chile, Valdivia, Chile.,Laboratory of Integrative Bioinformatics, Facultad de Ciencias, Centro de Genómica y Bioinformática, Universidad Mayor, Santiago, Chile
| | - Patricio Sánchez
- Facultad de Ciencias, Instituto de Bioquímica y Microbiología, Universidad Austral de Chile, Valdivia, Chile.,Centro FONDAP, Interdisciplinary Center for Aquaculture Research, Concepción, Chile
| | - Cristian F Molina
- Facultad de Ciencias, Instituto de Bioquímica y Microbiología, Universidad Austral de Chile, Valdivia, Chile.,AUSTRAL-omics, Universidad Austral de Chile, Valdivia, Chile
| | - Daniela A Espinoza-Rojas
- Facultad de Ciencias, Instituto de Bioquímica y Microbiología, Universidad Austral de Chile, Valdivia, Chile.,Laboratory of Integrative Bioinformatics, Facultad de Ciencias, Centro de Genómica y Bioinformática, Universidad Mayor, Santiago, Chile
| | - Cristian Oliver
- Facultad de Ciencias, Instituto de Bioquímica y Microbiología, Universidad Austral de Chile, Valdivia, Chile.,Centro FONDAP, Interdisciplinary Center for Aquaculture Research, Concepción, Chile.,Laboratorio de Patología de Organismos Acuáticos y Biotecnología Acuícola, Facultad de Ciencias Biológicas, Universidad Andrés Bello, Viña del Mar, Chile
| | - Pamela Ruiz
- Facultad de Ciencias, Instituto de Bioquímica y Microbiología, Universidad Austral de Chile, Valdivia, Chile.,Centro FONDAP, Interdisciplinary Center for Aquaculture Research, Concepción, Chile
| | - Luis Vargas-Chacoff
- Facultad de Ciencias, Instituto de Ciencias Marinas y Limnológicas, Universidad Austral de Chile, Valdivia, Chile
| | - Juan G Cárcamo
- Facultad de Ciencias, Instituto de Bioquímica y Microbiología, Universidad Austral de Chile, Valdivia, Chile.,Centro FONDAP, Interdisciplinary Center for Aquaculture Research, Concepción, Chile
| | - Jaime E Figueroa
- Facultad de Ciencias, Instituto de Bioquímica y Microbiología, Universidad Austral de Chile, Valdivia, Chile.,Centro FONDAP, Interdisciplinary Center for Aquaculture Research, Concepción, Chile
| | - Marcos Mancilla
- Laboratorio de Diagnóstico y Biotecnología, ADL Diagnostic Chile SpA., Puerto Montt, Chile
| | - Vinicius Maracaja-Coutinho
- Laboratory of Integrative Bioinformatics, Facultad de Ciencias, Centro de Genómica y Bioinformática, Universidad Mayor, Santiago, Chile.,Laboratory of Integrative Bioinformatics, Instituto Vandique, João Pessoa, Brazil.,Beagle Bioinformatics, Santiago, Chile
| | - Alejandro J Yañez
- Facultad de Ciencias, Instituto de Bioquímica y Microbiología, Universidad Austral de Chile, Valdivia, Chile.,Centro FONDAP, Interdisciplinary Center for Aquaculture Research, Concepción, Chile.,AUSTRAL-omics, Universidad Austral de Chile, Valdivia, Chile
| |
|
29
|
Battenberg K, Lee EK, Chiu JC, Berry AM, Potter D. OrthoReD: a rapid and accurate orthology prediction tool with low computational requirement. BMC Bioinformatics 2017. [PMID: 28633662 PMCID: PMC5479036 DOI: 10.1186/s12859-017-1726-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
Background: Identifying orthologous genes is an initial step required for phylogenetics, and it is also a common strategy employed in functional genetics to find candidates for functionally equivalent genes across multiple species. At the same time, in silico orthology prediction tools often require large computational resources only available on computing clusters. Here we present OrthoReD, an open-source orthology prediction tool with accuracy comparable to published tools that requires only a desktop computer. The low computational resource requirement of OrthoReD is achieved by repeating orthology searches on one gene of interest at a time, thereby generating a reduced dataset to limit the scope of orthology search for each gene of interest. Results: The output of OrthoReD was highly similar to the outputs of two other published orthology prediction tools, OrthologID and/or OrthoDB, for the three datasets tested, which represented three phyla with different ranges of species diversity and different numbers of genomes included. Median CPU time for ortholog prediction per gene by OrthoReD executed on a desktop computer was <15 min even for the largest dataset tested, which included all coding sequences of 100 bacterial species. Conclusions: With high-throughput sequencing, unprecedented numbers of genes from non-model organisms are available, with increasing need for clear information about their orthologies and/or functional equivalents in model organisms. OrthoReD is not only fast and accurate as an orthology prediction tool, but also gives researchers flexibility in the number of genes analyzed at a time, without requiring a high-performance computing cluster.
Affiliation(s)
- Kai Battenberg
- Department of Plant Sciences, University of California, Davis, CA, USA.
| | - Ernest K Lee
- Department of Entomology and Nematology, University of California, Davis, CA, USA
| | - Joanna C Chiu
- Department of Entomology and Nematology, University of California, Davis, CA, USA
| | - Alison M Berry
- Department of Plant Sciences, University of California, Davis, CA, USA
| | - Daniel Potter
- Department of Plant Sciences, University of California, Davis, CA, USA
| |
|
30
|
Martín-Durán JM, Ryan JF, Vellutini BC, Pang K, Hejnol A. Increased taxon sampling reveals thousands of hidden orthologs in flatworms. Genome Res 2017; 27:1263-1272. [PMID: 28400424 PMCID: PMC5495077 DOI: 10.1101/gr.216226.116] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 09/30/2016] [Accepted: 04/10/2017] [Indexed: 11/25/2022]
Abstract
Gains and losses shape the gene complement of animal lineages and are a fundamental aspect of genomic evolution. Acquiring a comprehensive view of the evolution of gene repertoires is limited by the intrinsic limitations of common sequence similarity searches and available databases. Thus, a subset of the gene complement of an organism consists of hidden orthologs, i.e., those with no apparent homology to sequenced animal lineages—mistakenly considered new genes—but actually representing rapidly evolving orthologs or undetected paralogs. Here, we describe Leapfrog, a simple automated BLAST pipeline that leverages increased taxon sampling to overcome long evolutionary distances and identify putative hidden orthologs in large transcriptomic databases by transitive homology. As a case study, we used 35 transcriptomes of 29 flatworm lineages to recover 3427 putative hidden orthologs, some unidentified by OrthoFinder and HaMStR, two common orthogroup inference algorithms. Unexpectedly, we do not observe a correlation between the number of putative hidden orthologs in a lineage and its “average” evolutionary rate. Hidden orthologs do not show unusual sequence composition biases that might account for systematic errors in sequence similarity searches. Instead, gene duplication with divergence of one paralog and weak positive selection appear to underlie hidden orthology in Platyhelminthes. By using Leapfrog, we identify key centrosome-related genes and homeodomain classes previously reported as absent in free-living flatworms, e.g., planarians. Altogether, our findings demonstrate that hidden orthologs comprise a significant proportion of the gene repertoire in flatworms, qualifying the impact of gene losses and gains in gene complement evolution.
Affiliation(s)
- José M Martín-Durán
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen 5006, Norway
| | - Joseph F Ryan
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen 5006, Norway.,Whitney Laboratory for Marine Bioscience, University of Florida, St. Augustine, Florida 32080, USA
| | - Bruno C Vellutini
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen 5006, Norway
| | - Kevin Pang
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen 5006, Norway
| | - Andreas Hejnol
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen 5006, Norway
| |
|
31
|
Tulpan D, Leger S. The Plant Orthology Browser: An Orthology and Gene-Order Visualizer for Plant Comparative Genomics. THE PLANT GENOME 2017; 10. [PMID: 28464063 DOI: 10.3835/plantgenome2016.08.0078] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 05/21/2023]
Abstract
Worldwide genome sequencing efforts for plants with medium and large genomes require identification and visualization of orthologous genes, while their syntenic conservation becomes the pinnacle of any comparative and functional genomics study. Using gene models for 20 fully sequenced plant genomes, including model organisms and staple crops such as Aegilops tauschii Coss., Arabidopsis thaliana (L.) Heynh., Brachypodium distachyon (L.) Beauv., turnip (Brassica rapa L.), barley (Hordeum vulgare L.), rice (Oryza sativa L.), sorghum [Sorghum bicolor (L.) Moench], wheat (Triticum aestivum L.), red wild einkorn (Triticum urartu Tumanian ex Gandilyan), and maize (Zea mays L.), we computationally predicted 1,021,611 orthologs using stringent sequence similarity criteria. For each pair of plant species, we determined sets of conserved synteny blocks using strand orientation and physical mapping. Gene ontology (GO) annotations are added for each gene. Plant Orthology Browser (POB) includes three interconnected modules: (i) a gene-order visualization module implementing an interactive environment for exploration of gene order between any pair of chromosomes in two plant species, (ii) a synteny visualization module providing unique interactive dot plot representations of orthologous genes between a pair of chromosomes in two distinct plant species, and (iii) a search module that interconnects all modules via free-text search capability with online as-you-type suggestions and highlighting that allows exploration of the underlying information without constraint of interface-dependent search fields. The POB is a web-based orthology and annotation visualization tool, which currently supports 20 completely sequenced plant species with considerably large genomes and offers intuitive and highly interactive pairwise comparison and visualization of genomic traits via gene orthology.
|
32
|
Elevated Rate of Genome Rearrangements in Radiation-Resistant Bacteria. Genetics 2017; 205:1677-1689. [PMID: 28188144 PMCID: PMC5378121 DOI: 10.1534/genetics.116.196154] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/13/2016] [Accepted: 01/30/2017] [Indexed: 01/27/2023]
Abstract
A number of bacterial, archaeal, and eukaryotic species are known for their resistance to ionizing radiation. One of the challenges these species face is a potent environmental source of DNA double-strand breaks, potential drivers of genome structure evolution. Efficient and accurate DNA double-strand break repair systems have been demonstrated in several unrelated radiation-resistant species and are putative adaptations to the DNA damaging environment. Such adaptations are expected to compensate for the genome-destabilizing effect of environmental DNA damage and may be expected to result in a more conserved gene order in radiation-resistant species. However, here we show that rates of genome rearrangements, measured as loss of gene order conservation with time, are higher in radiation-resistant species in multiple, phylogenetically independent groups of bacteria. Comparison of indicators of selection for genome organization between radiation-resistant and phylogenetically matched, nonresistant species argues against tolerance to disruption of genome structure as a strategy for radiation resistance. Interestingly, an important mechanism affecting genome rearrangements in prokaryotes, the symmetrical inversions around the origin of DNA replication, shapes genome structure of both radiation-resistant and nonresistant species. In conclusion, the opposing effects of environmental DNA damage and DNA repair result in elevated rates of genome rearrangements in radiation-resistant bacteria.
|
33
|
Kamminga T, Koehorst JJ, Vermeij P, Slagman SJ, Martins Dos Santos VAP, Bijlsma JJE, Schaap PJ. Persistence of Functional Protein Domains in Mycoplasma Species and their Role in Host Specificity and Synthetic Minimal Life. Front Cell Infect Microbiol 2017; 7:31. [PMID: 28224116 PMCID: PMC5293770 DOI: 10.3389/fcimb.2017.00031] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/08/2016] [Accepted: 01/23/2017] [Indexed: 11/26/2022]
Abstract
Mycoplasmas are the smallest self-replicating organisms and obligate parasites of a specific vertebrate host. An in-depth analysis of the functional capabilities of mycoplasma species is fundamental to understand how some of the simplest forms of life on Earth succeeded in subverting complex hosts with highly sophisticated immune systems. In this study we present a genome-scale comparison, focused on identification of functional protein domains, of 80 publicly available mycoplasma genomes which were consistently re-annotated using a standardized annotation pipeline embedded in a semantic framework to keep track of the data provenance. We examined the pan- and core-domainome and studied predicted functional capability in relation to host specificity and phylogenetic distance. We show that the pan- and core-domainome of mycoplasma species is closed. A comparison with the proteome of the “minimal” synthetic bacterium JCVI-Syn3.0 allowed us to classify domains and proteins essential for minimal life. Many of those essential protein domains, essential Domains of Unknown Function (DUFs) and essential hypothetical proteins are not persistent across mycoplasma genomes, suggesting that mycoplasma species support alternative domain configurations that bypass their essentiality. Based on the protein domain composition, we could separate mycoplasma species infecting blood and tissue. For selected genomes of tissue-infecting mycoplasmas, we could also predict whether the host is ruminant, pig or human. Functionally closely related mycoplasma species, which have a highly similar protein domain repertoire but different hosts, could not be separated. This study provides a concise overview of the functional capabilities of mycoplasma species, which can be used as a basis to further understand host-pathogen interaction or to design synthetic minimal life.
Affiliation(s)
- Tjerko Kamminga
- Laboratory of Systems and Synthetic Biology, Department of Agrotechnology and Food Sciences, Wageningen University and Research, Wageningen, Netherlands; Bioprocess Technology and Support, MSD Animal Health, Boxmeer, Netherlands
| | - Jasper J Koehorst
- Laboratory of Systems and Synthetic Biology, Department of Agrotechnology and Food Sciences, Wageningen University and Research Wageningen, Netherlands
| | - Paul Vermeij
- Discovery and Technology, MSD Animal Health Boxmeer, Netherlands
| | - Simen-Jan Slagman
- Bioprocess Technology and Support, MSD Animal Health Boxmeer, Netherlands
| | - Vitor A P Martins Dos Santos
- Laboratory of Systems and Synthetic Biology, Department of Agrotechnology and Food Sciences, Wageningen University and Research Wageningen, Netherlands
| | | | - Peter J Schaap
- Laboratory of Systems and Synthetic Biology, Department of Agrotechnology and Food Sciences, Wageningen University and Research Wageningen, Netherlands
| |
|
34
|
de los Reyes P, Romero-Campero FJ, Ruiz MT, Romero JM, Valverde F. Evolution of Daily Gene Co-expression Patterns from Algae to Plants. FRONTIERS IN PLANT SCIENCE 2017; 8:1217. [PMID: 28751903 PMCID: PMC5508029 DOI: 10.3389/fpls.2017.01217] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/25/2017] [Accepted: 06/28/2017] [Indexed: 05/04/2023]
Abstract
Daily rhythms play a key role in transcriptome regulation in plants and microalgae, orchestrating responses that, among other processes, anticipate light transitions that are essential for their metabolism and development. The recent accumulation of genome-wide transcriptomic data generated under alternating light:dark periods from plants and microalgae has made possible integrative and comparative analyses that could help shed light on the evolution of daily rhythms in the green lineage. In this work, RNA-seq and microarray data generated over 24 h periods in different light regimes from the eudicot Arabidopsis thaliana and the microalgae Chlamydomonas reinhardtii and Ostreococcus tauri have been integrated and analyzed using gene co-expression networks. This analysis revealed a reduction in the size of the daily rhythmic transcriptome from around 90% in Ostreococcus, being heavily influenced by light transitions, to around 40% in Arabidopsis, where a certain independence from light transitions can be observed. A novel Multiple Bidirectional Best Hit (MBBH) algorithm was applied to associate single genes with a family of potential orthologues from evolutionarily distant species. Gene duplication, amplification and divergence of rhythmic expression profiles seem to have played a central role in the evolution of gene families in the green lineage such as Pseudo Response Regulators (PRRs), CONSTANS-Likes (COLs), and DNA-binding with One Finger (DOFs). Gene clustering and functional enrichment have been used to identify groups of genes with similar rhythmic gene expression patterns. The comparison of gene clusters between species based on potential orthologous relationships has unveiled a low to moderate level of conservation of daily rhythmic expression patterns. However, a strikingly high conservation was found for the gene clusters exhibiting their highest and/or lowest expression value during the light transitions.
Affiliation(s)
- Pedro de los Reyes
- Plant Development Unit, Institute for Plant Biochemistry and Photosynthesis, Consejo Superior de Investigaciones Científicas, Universidad de Sevilla, Seville, Spain
| | - Francisco J. Romero-Campero
- Plant Development Unit, Institute for Plant Biochemistry and Photosynthesis, Consejo Superior de Investigaciones Científicas, Universidad de Sevilla, Seville, Spain
- Department of Computer Science and Artificial Intelligence, Universidad de Sevilla, Seville, Spain
| | - M. Teresa Ruiz
- Plant Development Unit, Institute for Plant Biochemistry and Photosynthesis, Consejo Superior de Investigaciones Científicas, Universidad de Sevilla, Seville, Spain
| | - José M. Romero
- Plant Development Unit, Institute for Plant Biochemistry and Photosynthesis, Consejo Superior de Investigaciones Científicas, Universidad de Sevilla, Seville, Spain
| | - Federico Valverde
- Plant Development Unit, Institute for Plant Biochemistry and Photosynthesis, Consejo Superior de Investigaciones Científicas, Universidad de Sevilla, Seville, Spain
- *Correspondence: Federico Valverde
| |
|
35
|
Tao Y, Mace ES, Tai S, Cruickshank A, Campbell BC, Zhao X, Van Oosterom EJ, Godwin ID, Botella JR, Jordan DR. Whole-Genome Analysis of Candidate genes Associated with Seed Size and Weight in Sorghum bicolor Reveals Signatures of Artificial Selection and Insights into Parallel Domestication in Cereal Crops. FRONTIERS IN PLANT SCIENCE 2017. [PMID: 28769949 DOI: 10.3389/fpls.2017.01237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
Seed size and seed weight are major quality attributes and important determinants of yield that have been strongly selected for during crop domestication. Limited information is available about the genetic control and genes associated with seed size and weight in sorghum. This study identified sorghum orthologs of genes with proven effects on seed size and weight in other plant species and searched for evidence of selection during domestication by utilizing resequencing data from a diversity panel. In total, 114 seed size candidate genes were identified in sorghum, 63 of which exhibited signals of purifying selection during domestication. A significant number of these genes also had domestication signatures in maize and rice, consistent with the parallel domestication of seed size in cereals. Seed size candidate genes that exhibited differentially high expression levels in seed were also found more likely to be under selection during domestication, supporting the hypothesis that modification to seed size during domestication preferentially targeted genes for intrinsic seed size rather than genes associated with physiological factors involved in the carbohydrate supply and transport. Our results provide improved understanding of the complex genetic control of seed size and weight and the impact of domestication on these genes.
Affiliation(s)
- Yongfu Tao
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Warwick, QLD, Australia
| | - Emma S Mace
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Warwick, QLD, Australia
- Department of Agriculture and Fisheries, Hermitage Research Facility, Warwick, QLD, Australia
| | | | - Alan Cruickshank
- Department of Agriculture and Fisheries, Hermitage Research Facility, Warwick, QLD, Australia
| | - Bradley C Campbell
- School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
| | - Xianrong Zhao
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Warwick, QLD, Australia
| | - Erik J Van Oosterom
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD, Australia
| | - Ian D Godwin
- School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
| | - Jose R Botella
- School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
| | - David R Jordan
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Warwick, QLD, Australia
| |
|
36
|
Tao Y, Mace ES, Tai S, Cruickshank A, Campbell BC, Zhao X, Van Oosterom EJ, Godwin ID, Botella JR, Jordan DR. Whole-Genome Analysis of Candidate genes Associated with Seed Size and Weight in Sorghum bicolor Reveals Signatures of Artificial Selection and Insights into Parallel Domestication in Cereal Crops. FRONTIERS IN PLANT SCIENCE 2017; 8:1237. [PMID: 28769949 PMCID: PMC5513986 DOI: 10.3389/fpls.2017.01237] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 05/16/2017] [Accepted: 06/30/2017] [Indexed: 05/22/2023]
Abstract
Seed size and seed weight are major quality attributes and important determinants of yield that have been strongly selected for during crop domestication. Limited information is available about the genetic control and genes associated with seed size and weight in sorghum. This study identified sorghum orthologs of genes with proven effects on seed size and weight in other plant species and searched for evidence of selection during domestication by utilizing resequencing data from a diversity panel. In total, 114 seed size candidate genes were identified in sorghum, 63 of which exhibited signals of purifying selection during domestication. A significant number of these genes also had domestication signatures in maize and rice, consistent with the parallel domestication of seed size in cereals. Seed size candidate genes that exhibited differentially high expression levels in seed were also found more likely to be under selection during domestication, supporting the hypothesis that modification to seed size during domestication preferentially targeted genes for intrinsic seed size rather than genes associated with physiological factors involved in the carbohydrate supply and transport. Our results provide improved understanding of the complex genetic control of seed size and weight and the impact of domestication on these genes.
Affiliation(s)
- Yongfu Tao
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Warwick, QLD, Australia
- *Correspondence: Yongfu Tao
| | - Emma S. Mace
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Warwick, QLD, Australia
- Department of Agriculture and Fisheries, Hermitage Research Facility, Warwick, QLD, Australia
- Emma S. Mace
| | | | - Alan Cruickshank
- Department of Agriculture and Fisheries, Hermitage Research Facility, Warwick, QLD, Australia
| | - Bradley C. Campbell
- School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
| | - Xianrong Zhao
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Warwick, QLD, Australia
| | - Erik J. Van Oosterom
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD, Australia
| | - Ian D. Godwin
- School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
| | - Jose R. Botella
- School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
| | - David R. Jordan
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Warwick, QLD, Australia
- David R. Jordan
| |
|
37
|
Abstract
The two main species causing malaria in humans, Plasmodium falciparum and P. vivax, differ significantly from each other in their evolutionary response to common drugs, but the reasons for this are not clear. Here we utilized the recently available large-scale genome sequencing data from these parasites and compared the pattern of single nucleotide polymorphisms, which may be related to these differences. We found that there was a five-fold higher preference for AT nucleotides compared to GC nucleotides at synonymous single nucleotide polymorphism sites in P. vivax. The preference for AT nucleotides was also present at non-synonymous sites, which led to amino acid changes favouring those with codons of higher AT content. The substitution bias was also present at low and moderately conserved amino acid positions, but not at highly conserved positions. No marked bias was found at synonymous and non-synonymous sites in P. falciparum. The difference in the substitution bias between P. falciparum and P. vivax found in the present study may possibly contribute to their divergent evolutionary response to similar drug pressures.
|
38
|
Singh GP, Sharma A. South-East Asian strains of Plasmodium falciparum display higher ratio of non-synonymous to synonymous polymorphisms compared to African strains. F1000Res 2016; 5:1964. [PMID: 27853513 PMCID: PMC5089136 DOI: 10.12688/f1000research.9372.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/19/2016] [Indexed: 11/20/2022]
Abstract
Resistance to frontline anti-malarial drugs, including artemisinin, has repeatedly arisen in South-East Asia, but the reasons for this are not understood. Here we test whether evolutionary constraints on Plasmodium falciparum strains from South-East Asia differ from African strains. We find a significantly higher ratio of non-synonymous to synonymous polymorphisms in P. falciparum from South-East Asia compared to Africa, suggesting differences in the selective constraints on P. falciparum genome in these geographical regions. Furthermore, South-East Asian strains showed a higher proportion of non-synonymous polymorphism at conserved positions, suggesting reduced negative selection. There was a lower rate of mixed infection by multiple genotypes in samples from South-East Asia compared to Africa. We propose that a lower mixed infection rate in South-East Asia reduces intra-host competition between the parasite clones, reducing the efficiency of natural selection. This might increase the probability of fixation of fitness-reducing mutations including drug resistant ones.
Affiliation(s)
- Gajinder Pal Singh
- Molecular Medicine Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
| | - Amit Sharma
- Molecular Medicine Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
| |
|
39
|
Alvarez-Ponce D, Sabater-Muñoz B, Toft C, Ruiz-González MX, Fares MA. Essentiality Is a Strong Determinant of Protein Rates of Evolution during Mutation Accumulation Experiments in Escherichia coli. Genome Biol Evol 2016; 8:2914-2927. [PMID: 27566759 PMCID: PMC5630975 DOI: 10.1093/gbe/evw205] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 12/16/2022]
Abstract
The Neutral Theory of Molecular Evolution is considered the most powerful theory to understand the evolutionary behavior of proteins. One of the main predictions of this theory is that essential proteins should evolve slower than dispensable ones owing to increased selective constraints. Comparison of genomes of different species, however, has revealed only small differences between the rates of evolution of essential and nonessential proteins. In some analyses, these differences vanish once confounding factors are controlled for, whereas in other cases essentiality seems to have an independent, albeit small, effect. It has been argued that comparing relatively distant genomes may entail a number of limitations. For instance, many of the genes that are dispensable in controlled lab conditions may be essential in some of the conditions faced in nature. Moreover, essentiality can change during evolution, and rates of protein evolution are simultaneously shaped by a variety of factors, whose individual effects are difficult to isolate. Here, we conducted two parallel mutation accumulation experiments in Escherichia coli, during 5,500–5,750 generations, and compared the genomes at different points of the experiments. Our approach (a short-term experiment, under highly controlled conditions) enabled us to overcome many of the limitations of previous studies. We observed that essential proteins evolved substantially slower than nonessential ones during our experiments. Strikingly, rates of protein evolution were only moderately affected by expression level and protein length.
Affiliation(s)
| | - Beatriz Sabater-Muñoz
- Instituto de Biología Molecular y Celular de Plantas (CSIC-UPV), Valencia, Spain; Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin, Ireland
| | - Christina Toft
- Department of Genetics, University of Valencia, Valencia, Spain; Departamento de Biotecnología, Instituto de Agroquímica y Tecnología de los Alimentos (CSIC), Valencia, Spain
| | - Mario X Ruiz-González
- Instituto de Biología Molecular y Celular de Plantas (CSIC-UPV), Valencia, Spain; Current Address: Secretaría de Educación Superior, Ciencia, Tecnología e Innovación, Proyecto Prometeo; Departamento de Ciencias Biológicas, Universidad Técnica Particular de Loja, Loja, Ecuador
| | - Mario A Fares
- Instituto de Biología Molecular y Celular de Plantas (CSIC-UPV), Valencia, Spain; Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin, Ireland
| |
|
40
|
Phylogenomic networks reveal limited phylogenetic range of lateral gene transfer by transduction. ISME JOURNAL 2016; 11:543-554. [PMID: 27648812 PMCID: PMC5183456 DOI: 10.1038/ismej.2016.116] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 12/10/2015] [Revised: 06/24/2016] [Accepted: 07/08/2016] [Indexed: 01/01/2023]
Abstract
Bacteriophages are recognized DNA vectors and transduction is considered as a common mechanism of lateral gene transfer (LGT) during microbial evolution. Anecdotal events of phage-mediated gene transfer were studied extensively, however, a coherent evolutionary viewpoint of LGT by transduction, its extent and characteristics, is still lacking. Here we report a large-scale evolutionary reconstruction of transduction events in 3982 genomes. We inferred 17 158 recent transduction events linking donors, phages and recipients into a phylogenomic transduction network view. We find that LGT by transduction is mostly restricted to closely related donors and recipients. Furthermore, a substantial number of the transduction events (9%) are best described as gene duplications that are mediated by mobile DNA vectors. We propose to distinguish this type of paralogy by the term autology. A comparison of donor and recipient genomes revealed that genome similarity is a superior predictor of species connectivity in the network in comparison to common habitat. This indicates that genetic similarity, rather than ecological opportunity, is a driver of successful transduction during microbial evolution. A striking difference in the connectivity pattern of donors and recipients shows that while lysogenic interactions are highly species-specific, the host range for lytic phage infections can be much wider, serving to connect dense clusters of closely related species. Our results thus demonstrate that DNA transfer via transduction occurs within the context of phage–host specificity, but that this tight constraint can be breached, on rare occasions, to produce long-range LGTs of profound evolutionary consequences.
|
41
|
Harel A, Häggblom MM, Falkowski PG, Yee N. Evolution of prokaryotic respiratory molybdoenzymes and the frequency of their genomic co-occurrence. FEMS Microbiol Ecol 2016; 92:fiw187. [DOI: 10.1093/femsec/fiw187] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Accepted: 09/05/2016] [Indexed: 02/03/2023]
|
42
|
Brown NM, Mueller RS, Shepardson JW, Landry ZC, Morré JT, Maier CS, Hardy FJ, Dreher TW. Structural and functional analysis of the finished genome of the recently isolated toxic Anabaena sp. WA102. BMC Genomics 2016; 17:457. [PMID: 27296936 PMCID: PMC4907049 DOI: 10.1186/s12864-016-2738-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 12/24/2015] [Accepted: 05/12/2016] [Indexed: 11/29/2022]
Abstract
Background: Very few closed genomes of the cyanobacteria that commonly produce toxic blooms in lakes and reservoirs are available, limiting our understanding of the properties of these organisms. A new anatoxin-a-producing member of the Nostocaceae, Anabaena sp. WA102, was isolated from a freshwater lake in Washington State, USA, in 2013 and maintained in non-axenic culture. Results: The Anabaena sp. WA102 5.7 Mbp genome assembly has been closed with long-read, single-molecule sequencing and separately a draft genome assembly has been produced with short-read sequencing technology. The closed and draft genome assemblies are compared, showing a correlation between long repeats in the genome and the many gaps in the short-read assembly. Anabaena sp. WA102 encodes anatoxin-a biosynthetic genes, as does its close relative Anabaena sp. AL93 (also introduced in this study). These strains are distinguished by differences in the genes for light-harvesting phycobilins, with Anabaena sp. AL93 possessing a phycoerythrocyanin operon. Biologically relevant structural variants in the Anabaena sp. WA102 genome were detected only by long-read sequencing: a tandem triplication of the anaBCD promoter region in the anatoxin-a synthase gene cluster (not triplicated in Anabaena sp. AL93) and a 5-kbp deletion variant present in two-thirds of the population. The genome has a large number of mobile elements (160). Strikingly, there was no synteny with the genome of its nearest fully assembled relative, Anabaena sp. 90. Conclusion: Structural and functional genome analyses indicate that Anabaena sp. WA102 has a flexible genome. Genome closure, which can be readily achieved with long-read sequencing, reveals large scale (e.g., gene order) and local structural features that should be considered in understanding genome evolution and function.
Affiliation(s)
- Nathan M Brown
- Department of Microbiology, Oregon State University, 226 Nash Hall, Corvallis, 97331, OR, USA
| | - Ryan S Mueller
- Department of Microbiology, Oregon State University, 226 Nash Hall, Corvallis, 97331, OR, USA
| | - Jonathan W Shepardson
- Department of Microbiology, Oregon State University, 226 Nash Hall, Corvallis, 97331, OR, USA
| | - Zachary C Landry
- Department of Microbiology, Oregon State University, 226 Nash Hall, Corvallis, 97331, OR, USA
| | - Jeffrey T Morré
- Department of Chemistry, Oregon State University, 153 Gilbert Hall, Corvallis, 97331, OR, USA
| | - Claudia S Maier
- Department of Chemistry, Oregon State University, 153 Gilbert Hall, Corvallis, 97331, OR, USA
| | - F Joan Hardy
- Office of Environmental Public Health Sciences, Washington State Department of Health, Olympia, 98504, WA, USA
| | - Theo W Dreher
- Department of Microbiology, Oregon State University, 226 Nash Hall, Corvallis, 97331, OR, USA; Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR, USA.
| |
|
43
|
Kadibalban AS, Bogumil D, Landan G, Dagan T. DnaK-Dependent Accelerated Evolutionary Rate in Prokaryotes. Genome Biol Evol 2016; 8:1590-9. [PMID: 27189986 PMCID: PMC4898814 DOI: 10.1093/gbe/evw102] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 02/06/2023]
Abstract
Many proteins depend on an interaction with molecular chaperones in order to fold into a functional tertiary structure. Previous studies showed that protein interaction with the GroEL/GroES chaperonine and Hsp90 chaperone can buffer the impact of slightly deleterious mutations in the protein sequence. This capacity of GroEL/GroES to prevent protein misfolding has been shown to accelerate the evolution of its client proteins. Whether other bacterial chaperones have a similar effect on their client proteins is currently unknown. Here, we study the impact of DnaK (Hsp70) chaperone on the evolution of its client proteins. Evolutionary parameters were derived from comparison of the Escherichia coli proteome to 1,808,565 orthologous proteins in 1,149 proteobacterial genomes. Our analysis reveals a significant positive correlation between protein binding frequency with DnaK and evolutionary rate. Proteins with high binding affinity to DnaK evolve on average 4.3-fold faster than proteins in the lowest binding affinity class at the genus resolution. Differences in evolutionary rates of DnaK interactor classes are still significant after adjusting for possible effects caused by protein expression level. Furthermore, we observe an additive effect of DnaK and GroEL chaperones on the evolutionary rates of their common interactors. Finally, we found pronounced similarities in the physicochemical profiles that characterize proteins belonging to DnaK and GroEL interactomes. Our results thus implicate DnaK-mediated folding as a major component in shaping protein evolutionary dynamics in bacteria and supply further evidence for the long-term manifestation of chaperone-mediated folding on genome evolution.
Affiliation(s)
- A Samer Kadibalban
- Institute of General Microbiology, Christian-Albrechts University of Kiel, Kiel, Germany
| | - David Bogumil
- Present address: The Department of Life Sciences & the National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Giddy Landan
- Institute of General Microbiology, Christian-Albrechts University of Kiel, Kiel, Germany
| | - Tal Dagan
- Institute of General Microbiology, Christian-Albrechts University of Kiel, Kiel, Germany
| |
|
44
|
Standardized benchmarking in the quest for orthologs. Nat Methods 2016; 13:425-30. [PMID: 27043882 PMCID: PMC4827703 DOI: 10.1038/nmeth.3830] [Citation(s) in RCA: 132] [Impact Index Per Article: 16.5] [Received: 01/28/2016] [Accepted: 03/09/2016] [Indexed: 11/23/2022]
Abstract
Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision–recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.
|
45
|
Comparative genomics reveals new evolutionary and ecological patterns of selenium utilization in bacteria. ISME JOURNAL 2016; 10:2048-59. [PMID: 26800233 PMCID: PMC5029168 DOI: 10.1038/ismej.2015.246] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 06/02/2015] [Revised: 10/28/2015] [Accepted: 11/27/2015] [Indexed: 12/15/2022]
Abstract
Selenium (Se) is an important micronutrient for many organisms, which is required for the biosynthesis of selenocysteine, selenouridine and Se-containing cofactor. Several key genes involved in different Se utilization traits have been characterized; however, systematic studies on the evolution and ecological niches of Se utilization are very limited. Here, we analyzed more than 5200 sequenced organisms to examine the occurrence patterns of all Se traits in bacteria. A global species map of all Se utilization pathways has been generated, which demonstrates the most detailed understanding of Se utilization in bacteria so far. In addition, the selenophosphate synthetase gene, which is used to define the overall Se utilization, was also detected in some organisms that do not have any of the known Se traits, implying the presence of a novel Se form in this domain. Phylogenetic analyses of components of different Se utilization traits revealed new horizontal gene transfer events for each of them. Moreover, by characterizing the selenoproteomes of all organisms, we found a new selenoprotein-rich phylum and additional selenoprotein-rich species. Finally, the relationship between ecological environments and Se utilization was investigated and further verified by metagenomic analysis of environmental samples, which indicates new macroevolutionary trends of each Se utilization trait in bacteria. Our data provide insights into the general features of Se utilization in bacteria and should be useful for a further understanding of the evolutionary dynamics of Se utilization in nature.
|
46
|
Matelska D, Kurkowska M, Purta E, Bujnicki JM, Dunin-Horkawicz S. Loss of Conserved Noncoding RNAs in Genomes of Bacterial Endosymbionts. Genome Biol Evol 2016; 8:426-38. [PMID: 26782934 PMCID: PMC4779614 DOI: 10.1093/gbe/evw007] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 01/18/2023]
Abstract
The genomes of intracellular symbiotic or pathogenic bacteria, such as of Buchnera, Mycoplasma, and Rickettsia, are typically smaller compared with their free-living counterparts. Here we showed that noncoding RNA (ncRNA) families, which are conserved in free-living bacteria, frequently could not be detected by computational methods in the small genomes. Statistical tests demonstrated that their absence is not an artifact of low GC content or small deletions in these small genomes, and thus it was indicative of an independent loss of ncRNAs in different endosymbiotic lineages. By analyzing the synteny (conservation of gene order) between the reduced and nonreduced genomes, we revealed instances of protein-coding genes that were preserved in the reduced genomes but lost cis-regulatory elements. We found that the loss of cis-regulatory ncRNA sequences, which regulate the expression of cognate protein-coding genes, is characterized by the reduction of secondary structure formation propensity, GC content, and length of the corresponding genomic regions.
Affiliation(s)
- Dorota Matelska
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Warsaw, Poland
| | - Malgorzata Kurkowska
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Warsaw, Poland
| | - Elzbieta Purta
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Warsaw, Poland
| | - Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Warsaw, Poland; Laboratory of Structural Bioinformatics, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, Poznan, Poland
| | - Stanislaw Dunin-Horkawicz
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Warsaw, Poland
| |
|
47
|
Hooper CM, Castleden IR, Aryamanesh N, Jacoby RP, Millar AH. Finding the Subcellular Location of Barley, Wheat, Rice and Maize Proteins: The Compendium of Crop Proteins with Annotated Locations (cropPAL). PLANT & CELL PHYSIOLOGY 2016; 57:e9. [PMID: 26556651 DOI: 10.1093/pcp/pcv170] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 09/01/2015] [Accepted: 10/27/2015] [Indexed: 05/10/2023]
Abstract
Barley, wheat, rice and maize provide the bulk of human nutrition and have extensive industrial use as agricultural products. The genomes of these crops each contains >40,000 genes encoding proteins; however, the major genome databases for these species lack annotation information of protein subcellular location for >80% of these gene products. We address this gap, by constructing the compendium of crop protein subcellular locations called crop Proteins with Annotated Locations (cropPAL). Subcellular location is most commonly determined by fluorescent protein tagging of live cells or mass spectrometry detection in subcellular purifications, but can also be predicted from amino acid sequence or protein expression patterns. The cropPAL database collates 556 published studies, from >300 research institutes in >30 countries that have been previously published, as well as compiling eight pre-computed subcellular predictions for all Hordeum vulgare, Triticum aestivum, Oryza sativa and Zea mays protein sequences. The data collection including metadata for proteins and published studies can be accessed through a search portal http://crop-PAL.org. The subcellular localization information housed in cropPAL helps to depict plant cells as compartmentalized protein networks that can be investigated for improving crop yield and quality, and developing new biotechnological solutions to agricultural challenges.
Affiliation(s)
- Cornelia M Hooper
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, WA 6009, Australia
| | - Ian R Castleden
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, WA 6009, Australia
| | - Nader Aryamanesh
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, WA 6009, Australia
| | - Richard P Jacoby
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, WA 6009, Australia
| | - A Harvey Millar
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, WA 6009, Australia
| |
|
48
|
Oh Brother, Where Art Thou? Finding Orthologs in the Twilight and Midnight Zones of Sequence Similarity. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
49
|
Lehmann JS, Corey VC, Ricaldi JN, Vinetz JM, Winzeler EA, Matthias MA. Whole Genome Shotgun Sequencing Shows Selection on Leptospira Regulatory Proteins During in vitro Culture Attenuation. Am J Trop Med Hyg 2015; 94:302-313. [PMID: 26711524 PMCID: PMC4751964 DOI: 10.4269/ajtmh.15-0401] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/01/2015] [Accepted: 09/08/2015] [Indexed: 12/29/2022]
Abstract
Leptospirosis is the most common zoonotic disease worldwide with an estimated 500,000 severe cases reported annually, and case fatality rates of 12–25%, due primarily to acute kidney and lung injuries. Despite its prevalence, the molecular mechanisms underlying leptospirosis pathogenesis remain poorly understood. To identify virulence-related genes in Leptospira interrogans, we delineated cumulative genome changes that occurred during serial in vitro passage of a highly virulent strain of L. interrogans serovar Lai into a nearly avirulent isogenic derivative. Comparison of protein coding and computationally predicted noncoding RNA (ncRNA) genes between these two polyclonal strains identified 15 nonsynonymous single nucleotide variant (nsSNV) alleles that increased in frequency and 19 that decreased, whereas no changes in allelic frequency were observed among the ncRNA genes. Some of the nsSNV alleles were in six genes shown previously to be transcriptionally upregulated during exposure to in vivo-like conditions. Five of these nsSNVs were in evolutionarily conserved positions in genes related to signal transduction and metabolism. Frequency changes of minor nsSNV alleles identified in this study likely contributed to the loss of virulence during serial in vitro culture. The identification of new virulence-associated genes should spur additional experimental inquiry into their potential role in Leptospira pathogenesis.
Affiliation(s)
| | | | | | | | | | - Michael A. Matthias
- *Address correspondence to Michael A. Matthias, Department of Medicine, Division of Infectious Diseases, School of Medicine, University of California, San Diego School of Medicine, 9500 Gilman Drive, BRF 2, Room 4A15, La Jolla, CA 92093-0760.
| |
|
50
|
Jaakkola K, Somervuo P, Korkeala H. Comparative Genomic Hybridization Analysis of Yersinia enterocolitica and Yersinia pseudotuberculosis Identifies Genetic Traits to Elucidate Their Different Ecologies. BIOMED RESEARCH INTERNATIONAL 2015; 2015:760494. [PMID: 26605338 PMCID: PMC4641178 DOI: 10.1155/2015/760494] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 04/02/2015] [Accepted: 09/28/2015] [Indexed: 12/22/2022]
Abstract
Enteropathogenic Yersinia enterocolitica and Yersinia pseudotuberculosis are both etiological agents for intestinal infection known as yersiniosis, but their epidemiology and ecology bear many differences. Swine are the only known reservoir for Y. enterocolitica 4/O:3 strains, which are the most common cause of human disease, while Y. pseudotuberculosis has been isolated from a variety of sources, including vegetables and wild animals. Infections caused by Y. enterocolitica mainly originate from swine, but fresh produce has been the source for widespread Y. pseudotuberculosis outbreaks within recent decades. A comparative genomic hybridization analysis with a DNA microarray based on three Yersinia enterocolitica and four Yersinia pseudotuberculosis genomes was conducted to shed light on the genomic differences between enteropathogenic Yersinia. The hybridization results identified Y. pseudotuberculosis strains to carry operons linked with the uptake and utilization of substances not found in living animal tissues but present in soil, plants, and rotting flesh. Y. pseudotuberculosis also harbors a selection of type VI secretion systems targeting other bacteria and eukaryotic cells. These genetic traits are not found in Y. enterocolitica, and it appears that while Y. pseudotuberculosis has many tools beneficial for survival in varied environments, the Y. enterocolitica genome is more streamlined and adapted to their preferred animal reservoir.
Affiliation(s)
- Kaisa Jaakkola
- Department of Food Hygiene and Environmental Health, University of Helsinki, P.O. Box 66, 00014 Helsinki, Finland
| | - Panu Somervuo
- Department of Food Hygiene and Environmental Health, University of Helsinki, P.O. Box 66, 00014 Helsinki, Finland
| | - Hannu Korkeala
- Department of Food Hygiene and Environmental Health, University of Helsinki, P.O. Box 66, 00014 Helsinki, Finland
| |
|