1
|
Smug BJ, Szczepaniak K, Rocha EPC, Dunin-Horkawicz S, Mostowy RJ. Ongoing shuffling of protein fragments diversifies core viral functions linked to interactions with bacterial hosts. Nat Commun 2023; 14:7460. [PMID: 38016962 PMCID: PMC10684548 DOI: 10.1038/s41467-023-43236-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2023] [Accepted: 11/03/2023] [Indexed: 11/30/2023] Open
Abstract
Biological modularity enhances evolutionary adaptability. This principle is vividly exemplified by bacterial viruses (phages), which display extensive genomic modularity. Phage genomes are composed of independent functional modules that evolve separately and recombine in various configurations. While genomic modularity in phages has been extensively studied, less attention has been paid to protein modularity-proteins consisting of distinct building blocks that can evolve and recombine, enhancing functional and genetic diversity. Here, we use a set of 133,574 representative phage proteins and highly sensitive homology detection to capture instances of domain mosaicism, defined as fragment sharing between two otherwise unrelated proteins, and to understand its relationship with functional diversity in phage genomes. We discover that unrelated proteins from diverse functional classes frequently share homologous domains. This phenomenon is particularly pronounced within receptor-binding proteins, endolysins, and DNA polymerases. We also identify multiple instances of recent diversification via domain shuffling in receptor-binding proteins, neck passage structures, endolysins and some members of the core replication machinery, often transcending distant taxonomic and ecological boundaries. Our findings suggest that ongoing diversification via domain shuffling is reflective of a co-evolutionary arms race, driven by the need to overcome various bacterial resistance mechanisms against phages.
Collapse
Affiliation(s)
- Bogna J Smug
- Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
| | | | - Eduardo P C Rocha
- Institut Pasteur, Université Paris Cité, CNRS UMR3525, Microbial Evolutionary Genomics, Paris, France
| | - Stanislaw Dunin-Horkawicz
- Institute of Evolutionary Biology, Faculty of Biology & Biological and Chemical Research Centre, University of Warsaw, Żwirki i Wigury 101, 02-089, Warsaw, Poland
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
| | - Rafał J Mostowy
- Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland.
| |
Collapse
|
2
|
Manrubia S. The simple emergence of complex molecular function. PHILOSOPHICAL TRANSACTIONS. SERIES A, MATHEMATICAL, PHYSICAL, AND ENGINEERING SCIENCES 2022; 380:20200422. [PMID: 35599566 DOI: 10.1098/rsta.2020.0422] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/07/2023]
Abstract
At odds with a traditional view of molecular evolution that seeks a descent-with-modification relationship between functional sequences, new functions can emerge de novo with relative ease. At early times of molecular evolution, random polymers could have sufficed for the appearance of incipient chemical activity, while the cellular environment harbours a myriad of proto-functional molecules. The emergence of function is facilitated by several mechanisms intrinsic to molecular organization, such as redundant mapping of sequences into structures, phenotypic plasticity, modularity or cooperative associations between genomic sequences. It is the availability of niches in the molecular ecology that filters new potentially functional proposals. New phenotypes and subsequent levels of molecular complexity could be attained through combinatorial explorations of currently available molecular variants. Natural selection does the rest. This article is part of the theme issue 'Emergent phenomena in complex physical and socio-technical systems: from cells to societies'.
Collapse
Affiliation(s)
- Susanna Manrubia
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
- Systems Biology Department, National Biotechnology Centre (CSIC), c/Darwin 3, 28049 Madrid, Spain
| |
Collapse
|
3
|
Watson AK, Lopez P, Bapteste E. Hundreds of out-of-frame remodelled gene families in the E. coli pangenome. Mol Biol Evol 2021; 39:6430988. [PMID: 34792602 PMCID: PMC8788219 DOI: 10.1093/molbev/msab329] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023] Open
Abstract
All genomes include gene families with very limited taxonomic distributions that potentially represent new genes and innovations in protein-coding sequence, raising questions on the origins of such genes. Some of these genes are hypothesized to have formed de novo, from noncoding sequences, and recent work has begun to elucidate the processes by which de novo gene formation can occur. A special case of de novo gene formation, overprinting, describes the origin of new genes from noncoding alternative reading frames of existing open reading frames (ORFs). We argue that additionally, out-of-frame gene fission/fusion events of alternative reading frames of ORFs and out-of-frame lateral gene transfers could contribute to the origin of new gene families. To demonstrate this, we developed an original pattern-search in sequence similarity networks, enhancing the use of these graphs, commonly used to detect in-frame remodeled genes. We applied this approach to gene families in 524 complete genomes of Escherichia coli. We identified 767 gene families whose evolutionary history likely included at least one out-of-frame remodeling event. These genes with out-of-frame components represent ∼2.5% of all genes in the E. coli pangenome, suggesting that alternative reading frames of existing ORFs can contribute to a significant proportion of de novo genes in bacteria.
Collapse
Affiliation(s)
- Andrew K Watson
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
| | - Philippe Lopez
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
| | - Eric Bapteste
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
| |
Collapse
|
4
|
Ou Y, McInerney JO. Eukaryote Genes Are More Likely than Prokaryote Genes to Be Composites. Genes (Basel) 2019; 10:genes10090648. [PMID: 31466252 PMCID: PMC6769587 DOI: 10.3390/genes10090648] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2019] [Revised: 08/18/2019] [Accepted: 08/23/2019] [Indexed: 12/27/2022] Open
Abstract
The formation of new genes by combining parts of existing genes is an important evolutionary process. Remodelled genes, which we call composites, have been investigated in many species, however, their distribution across all of life is still unknown. We set out to examine the extent to which genomes from cells and mobile genetic elements contain composite genes. We identify composite genes as those that show partial homology to at least two unrelated component genes. In order to identify composite and component genes, we constructed sequence similarity networks (SSNs) of more than one million genes from all three domains of life, as well as viruses and plasmids. We identified non-transitive triplets of nodes in this network and explored the homology relationships in these triplets to see if the middle nodes were indeed composite genes. In total, we identified 221,043 (18.57%) composites genes, which were distributed across all genomic and functional categories. In particular, the presence of composite genes is statistically more likely in eukaryotes than prokaryotes.
Collapse
Affiliation(s)
- Yaqing Ou
- Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PL, UK.
| | - James O McInerney
- Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PL, UK.
- School of Life Sciences, University of Nottingham, Nottingham NG7 2UH, UK.
| |
Collapse
|
5
|
Krupovic M, Makarova KS, Wolf YI, Medvedeva S, Prangishvili D, Forterre P, Koonin EV. Integrated mobile genetic elements in Thaumarchaeota. Environ Microbiol 2019; 21:2056-2078. [PMID: 30773816 PMCID: PMC6563490 DOI: 10.1111/1462-2920.14564] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2018] [Revised: 02/10/2019] [Accepted: 02/13/2019] [Indexed: 12/20/2022]
Abstract
To explore the diversity of mobile genetic elements (MGE) associated with archaea of the phylum Thaumarchaeota, we exploited the property of most MGE to integrate into the genomes of their hosts. Integrated MGE (iMGE) were identified in 20 thaumarchaeal genomes amounting to 2 Mbp of mobile thaumarchaeal DNA. These iMGE group into five major classes: (i) proviruses, (ii) casposons, (iii) insertion sequence-like transposons, (iv) integrative-conjugative elements and (v) cryptic integrated elements. The majority of the iMGE belong to the latter category and might represent novel families of viruses or plasmids. The identified proviruses are related to tailed viruses of the order Caudovirales and to tailless icosahedral viruses with the double jelly-roll capsid proteins. The thaumarchaeal iMGE are all connected within a gene sharing network, highlighting pervasive gene exchange between MGE occupying the same ecological niche. The thaumarchaeal mobilome carries multiple auxiliary metabolic genes, including multicopper oxidases and ammonia monooxygenase subunit C (AmoC), and stress response genes, such as those for universal stress response proteins (UspA). Thus, iMGE might make important contributions to the fitness and adaptation of their hosts. We identified several iMGE carrying type I-B CRISPR-Cas systems and spacers matching other thaumarchaeal iMGE, suggesting antagonistic interactions between coexisting MGE and symbiotic relationships with the ir archaeal hosts.
Collapse
Affiliation(s)
- Mart Krupovic
- Institut Pasteur, Unité Biologie Moléculaire du Gène chez les Extrêmophiles, 75015, Paris, France
| | - Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
| | - Sofia Medvedeva
- Institut Pasteur, Unité Biologie Moléculaire du Gène chez les Extrêmophiles, 75015, Paris, France.,Center of Life Sciences, Skolkovo Institute of Science and Technology, Skolkovo, Russia.,Sorbonne Université, Collège doctoral, 75005, Paris, France
| | - David Prangishvili
- Institut Pasteur, Unité Biologie Moléculaire du Gène chez les Extrêmophiles, 75015, Paris, France
| | - Patrick Forterre
- Institut Pasteur, Unité Biologie Moléculaire du Gène chez les Extrêmophiles, 75015, Paris, France.,Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris- Sud, Université Paris-Saclay, Gif-sur-Yvette cedex, Paris, France
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
| |
Collapse
|
6
|
Corel E, Méheust R, Watson AK, McInerney JO, Lopez P, Bapteste E. Bipartite Network Analysis of Gene Sharings in the Microbial World. Mol Biol Evol 2019; 35:899-913. [PMID: 29346651 PMCID: PMC5888944 DOI: 10.1093/molbev/msy001] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open
Abstract
Extensive microbial gene flows affect how we understand virology, microbiology, medical sciences, genetic modification, and evolutionary biology. Phylogenies only provide a narrow view of these gene flows: plasmids and viruses, lacking core genes, cannot be attached to cellular life on phylogenetic trees. Yet viruses and plasmids have a major impact on cellular evolution, affecting both the gene content and the dynamics of microbial communities. Using bipartite graphs that connect up to 149,000 clusters of homologous genes with 8,217 related and unrelated genomes, we can in particular show patterns of gene sharing that do not map neatly with the organismal phylogeny. Homologous genes are recycled by lateral gene transfer, and multiple copies of homologous genes are carried by otherwise completely unrelated (and possibly nested) genomes, that is, viruses, plasmids and prokaryotes. When a homologous gene is present on at least one plasmid or virus and at least one chromosome, a process of "gene externalization," affected by a postprocessed selected functional bias, takes place, especially in Bacteria. Bipartite graphs give us a view of vertical and horizontal gene flow beyond classic taxonomy on a single very large, analytically tractable, graph that goes beyond the cellular Web of Life.
Collapse
Affiliation(s)
- Eduardo Corel
- Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
| | - Raphaël Méheust
- Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
| | - Andrew K Watson
- Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
| | - James O McInerney
- Chair in Evolutionary Biology, The University of Manchester, United Kingdom
| | - Philippe Lopez
- Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
| | - Eric Bapteste
- Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
| |
Collapse
|
7
|
Pathmanathan JS, Lopez P, Lapointe FJ, Bapteste E. CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection. Mol Biol Evol 2019; 35:252-255. [PMID: 29092069 PMCID: PMC5850286 DOI: 10.1093/molbev/msx283] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
Genes evolve by point mutations, but also by shuffling, fusion, and fission of genetic fragments. Therefore, similarity between two sequences can be due to common ancestry producing homology, and/or partial sharing of component fragments. Disentangling these processes is especially challenging in large molecular data sets, because of computational time. In this article, we present CompositeSearch, a memory-efficient, fast, and scalable method to detect composite gene families in large data sets (typically in the range of several million sequences). CompositeSearch generalizes the use of similarity networks to detect composite and component gene families with a greater recall, accuracy, and precision than recent programs (FusedTriplets and MosaicFinder). Moreover, CompositeSearch provides user-friendly quality descriptions regarding the distribution and primary sequence conservation of these gene families allowing critical biological analyses of these data.
Collapse
Affiliation(s)
| | - Philippe Lopez
- Institut de Biologie Paris-Seine (IBPS), UPMC Université Paris 06, Sorbonne Universités, Paris, France
| | | | - Eric Bapteste
- Institut de Biologie Paris-Seine (IBPS), UPMC Université Paris 06, Sorbonne Universités, Paris, France
| |
Collapse
|
8
|
Watson AK, Lannes R, Pathmanathan JS, Méheust R, Karkar S, Colson P, Corel E, Lopez P, Bapteste E. The Methodology Behind Network Thinking: Graphs to Analyze Microbial Complexity and Evolution. Methods Mol Biol 2019; 1910:271-308. [PMID: 31278668 DOI: 10.1007/978-1-4939-9074-0_9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
In the post genomic era, large and complex molecular datasets from genome and metagenome sequencing projects expand the limits of what is possible for bioinformatic analyses. Network-based methods are increasingly used to complement phylogenetic analysis in studies in molecular evolution, including comparative genomics, classification, and ecological studies. Using network methods, the vertical and horizontal relationships between all genes or genomes, whether they are from cellular chromosomes or mobile genetic elements, can be explored in a single expandable graph. In recent years, development of new methods for the construction and analysis of networks has helped to broaden the availability of these approaches from programmers to a diversity of users. This chapter introduces the different kinds of networks based on sequence similarity that are already available to tackle a wide range of biological questions, including sequence similarity networks, gene-sharing networks and bipartite graphs, and a guide for their construction and analyses.
Collapse
Affiliation(s)
- Andrew K Watson
- Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
| | - Romain Lannes
- Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
| | - Jananan S Pathmanathan
- Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
| | - Raphaël Méheust
- Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
| | - Slim Karkar
- Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
- Department of Ecology, Evolution, and Natural Resources, School of Environmental and Biological Sciences, Rutgers, The State University of NJ, New Brunswick, NJ, USA
| | - Philippe Colson
- Fondation Institut Hospitalo-Universitaire Méditerranée Infection, Pôle des Maladies Infectieuses et Tropicales Clinique et Biologique, Fédération de Bactériologie-Hygiène-Virologie, Centre Hospitalo-Universitaire Tione, Assistance Publique-Hôpitaux de Marseille, Marseille, France
- Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes (URMITE) UM63, CNRS 7278, IRD 198, INSERM U1095, Aix-Marseille University, Marseille, France
| | - Eduardo Corel
- Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
| | - Philippe Lopez
- Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
| | - Eric Bapteste
- Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France.
| |
Collapse
|
9
|
Kharchenko EP. OCCURRENCE OF SMALL HOMOLOGOUS AND COMPLEMENTARY FRAGMENTS IN HUMAN VIRUS GENOMES AND THEIR POSSIBLE ROLE. RUSSIAN JOURNAL OF INFECTION AND IMMUNITY 2018. [DOI: 10.15789/2220-7619-2017-4-393-404] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
Abstract
With computer analysis occurrence of small homologous and complementary fragments (21 nucleotides in length) has been studied in genomes of 14 human viruses causing most dangerous infections. The sample includes viruses with (+) and (–) single stranded RNA and DNA-containing hepatitis A virus. Analysis of occurrence of homologous sequences has shown the existence two extreme situations. On the one hand, the same virus contains homologous sequences to almost all other viruses (for example, Ebola virus, severe acute respiratory syndrome-related coronavirus, and mumps virus), and numerous homologous sequences to the same other virus (especially in severe acute respiratory syndrome-related coronavirus to Dengue virus and in Ebola virus to poliovirus). On the other hand, there are rare occurrence and not numerous homologous sequences in genomes of other viruses (rubella virus, hepatitis A virus, and hepatitis B virus). Similar situation exists for occurrence of complementary sequences. Rubella virus, the genome of which has the high content of guanine and cytosine, has no complementary sequences to almost all other viruses. Most viruses have moderate level of occurrence for homologous and complementary sequences. Autocomplementary sequences are numerous in most viruses and one may suggest that the genome of single stranded RNA viruses has branched secondary structure. In addition to possible role in recombination among strains autocomplementary sequences could be regulators of translation rate of virus proteins and determine its optimal proportion in virion assembly with genome and mRNA folding. Occurrence of small homologous and complementary sequences in RNA- and DNA-containing viruses may be the result of multiple recombinations in the past and the present and determine their adaptation and variability. Recombination may take place in coinfection of human and/or common hosts. Inclusion of homologous and complementary sequences into genome could not only renew viruses but also serve as memory of existence of a competitor for host and means of counteraction against a competitor in coinfection being an analogy of the bacterial CRISPR/Cas system.
Collapse
|
10
|
Shapiro JA. Living Organisms Author Their Read-Write Genomes in Evolution. BIOLOGY 2017; 6:E42. [PMID: 29211049 PMCID: PMC5745447 DOI: 10.3390/biology6040042] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 08/23/2017] [Revised: 11/17/2017] [Accepted: 11/28/2017] [Indexed: 12/18/2022]
Abstract
Evolutionary variations generating phenotypic adaptations and novel taxa resulted from complex cellular activities altering genome content and expression: (i) Symbiogenetic cell mergers producing the mitochondrion-bearing ancestor of eukaryotes and chloroplast-bearing ancestors of photosynthetic eukaryotes; (ii) interspecific hybridizations and genome doublings generating new species and adaptive radiations of higher plants and animals; and, (iii) interspecific horizontal DNA transfer encoding virtually all of the cellular functions between organisms and their viruses in all domains of life. Consequently, assuming that evolutionary processes occur in isolated genomes of individual species has become an unrealistic abstraction. Adaptive variations also involved natural genetic engineering of mobile DNA elements to rewire regulatory networks. In the most highly evolved organisms, biological complexity scales with "non-coding" DNA content more closely than with protein-coding capacity. Coincidentally, we have learned how so-called "non-coding" RNAs that are rich in repetitive mobile DNA sequences are key regulators of complex phenotypes. Both biotic and abiotic ecological challenges serve as triggers for episodes of elevated genome change. The intersections of cell activities, biosphere interactions, horizontal DNA transfers, and non-random Read-Write genome modifications by natural genetic engineering provide a rich molecular and biological foundation for understanding how ecological disruptions can stimulate productive, often abrupt, evolutionary transformations.
Collapse
Affiliation(s)
- James A Shapiro
- Department of Biochemistry and Molecular Biology, University of Chicago GCIS W123B, 979 E. 57th Street, Chicago, IL 60637, USA.
| |
Collapse
|
11
|
The "Giant Virus Finder" discovers an abundance of giant viruses in the Antarctic dry valleys. Arch Virol 2017; 162:1671-1676. [PMID: 28247094 DOI: 10.1007/s00705-017-3286-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2016] [Accepted: 12/02/2016] [Indexed: 12/20/2022]
Abstract
Mimivirus was identified in 2003 from a biofilm of an industrial water-cooling tower in England. Later, numerous new giant viruses were found in oceans and freshwater habitats, some of them having 2,500 genes. We have demonstrated their likely presence in four soil samples taken from the Kutch Desert (Gujarat, India). Here we describe a bioinformatics work-flow, called the "Giant Virus Finder" that is capable of discovering the likely presence of the genomes of giant viruses in metagenomic shotgun-sequenced datasets. The new workflow is applied to numerous hot and cold desert soil samples as well as some tundra- and forest soils. We show that most of these samples contain giant viruses, especially in the Antarctic dry valleys. The results imply that giant viruses could be frequent not only in aqueous habitats, but in a wide spectrum of soils on our planet.
Collapse
|
12
|
Abstract
UNLABELLED Virus genomes are prone to extensive gene loss, gain, and exchange and share no universal genes. Therefore, in a broad-scale study of virus evolution, gene and genome network analyses can complement traditional phylogenetics. We performed an exhaustive comparative analysis of the genomes of double-stranded DNA (dsDNA) viruses by using the bipartite network approach and found a robust hierarchical modularity in the dsDNA virosphere. Bipartite networks consist of two classes of nodes, with nodes in one class, in this case genomes, being connected via nodes of the second class, in this case genes. Such a network can be partitioned into modules that combine nodes from both classes. The bipartite network of dsDNA viruses includes 19 modules that form 5 major and 3 minor supermodules. Of these modules, 11 include tailed bacteriophages, reflecting the diversity of this largest group of viruses. The module analysis quantitatively validates and refines previously proposed nontrivial evolutionary relationships. An expansive supermodule combines the large and giant viruses of the putative order "Megavirales" with diverse moderate-sized viruses and related mobile elements. All viruses in this supermodule share a distinct morphogenetic tool kit with a double jelly roll major capsid protein. Herpesviruses and tailed bacteriophages comprise another supermodule, held together by a distinct set of morphogenetic proteins centered on the HK97-like major capsid protein. Together, these two supermodules cover the great majority of currently known dsDNA viruses. We formally identify a set of 14 viral hallmark genes that comprise the hubs of the network and account for most of the intermodule connections. IMPORTANCE Viruses and related mobile genetic elements are the dominant biological entities on earth, but their evolution is not sufficiently understood and their classification is not adequately developed. The key reason is the characteristic high rate of virus evolution that involves not only sequence change but also extensive gene loss, gain, and exchange. Therefore, in the study of virus evolution on a large scale, traditional phylogenetic approaches have limited applicability and have to be complemented by gene and genome network analyses. We applied state-of-the art methods of such analysis to reveal robust hierarchical modularity in the genomes of double-stranded DNA viruses. Some of the identified modules combine highly diverse viruses infecting bacteria, archaea, and eukaryotes, in support of previous hypotheses on direct evolutionary relationships between viruses from the three domains of cellular life. We formally identify a set of 14 viral hallmark genes that hold together the genomic network.
Collapse
|
13
|
Protein networks identify novel symbiogenetic genes resulting from plastid endosymbiosis. Proc Natl Acad Sci U S A 2016; 113:3579-84. [PMID: 26976593 DOI: 10.1073/pnas.1517551113] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
The integration of foreign genetic information is central to the evolution of eukaryotes, as has been demonstrated for the origin of the Calvin cycle and of the heme and carotenoid biosynthesis pathways in algae and plants. For photosynthetic lineages, this coordination involved three genomes of divergent phylogenetic origins (the nucleus, plastid, and mitochondrion). Major hurdles overcome by the ancestor of these lineages were harnessing the oxygen-evolving organelle, optimizing the use of light, and stabilizing the partnership between the plastid endosymbiont and host through retargeting of proteins to the nascent organelle. Here we used protein similarity networks that can disentangle reticulate gene histories to explore how these significant challenges were met. We discovered a previously hidden component of algal and plant nuclear genomes that originated from the plastid endosymbiont: symbiogenetic genes (S genes). These composite proteins, exclusive to photosynthetic eukaryotes, encode a cyanobacterium-derived domain fused to one of cyanobacterial or another prokaryotic origin and have emerged multiple, independent times during evolution. Transcriptome data demonstrate the existence and expression of S genes across a wide swath of algae and plants, and functional data indicate their involvement in tolerance to oxidative stress, phototropism, and adaptation to nitrogen limitation. Our research demonstrates the "recycling" of genetic information by photosynthetic eukaryotes to generate novel composite genes, many of which function in plastid maintenance.
Collapse
|
14
|
Hatzopoulos T, Watkins SC, Putonti C. PhagePhisher: a pipeline for the discovery of covert viral sequences in complex genomic datasets. Microb Genom 2016; 2:e000053. [PMID: 28348848 PMCID: PMC5320576 DOI: 10.1099/mgen.0.000053] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2015] [Accepted: 01/21/2016] [Indexed: 01/09/2023] Open
Abstract
Obtaining meaningful viral information from large sequencing datasets presents unique challenges distinct from prokaryotic and eukaryotic sequencing efforts. The difficulties surrounding this issue can be ascribed in part to the genomic plasticity of viruses themselves as well as the scarcity of existing information in genomic databases. The open-source software PhagePhisher (http://www.putonti-lab.com/phagephisher) has been designed as a simple pipeline to extract relevant information from complex and mixed datasets, and will improve the examination of bacteriophages, viruses, and virally related sequences, in a range of environments. Key aspects of the software include speed and ease of use; PhagePhisher can be used with limited operator knowledge of bioinformatics on a standard workstation. As a proof-of-concept, PhagePhisher was successfully implemented with bacteria–virus mixed samples of varying complexity. Furthermore, viral signals within microbial metagenomic datasets were easily and quickly identified by PhagePhisher, including those from prophages as well as lysogenic phages, an important and often neglected aspect of examining phage populations in the environment. PhagePhisher resolves viral-related sequences which may be obscured by or imbedded in bacterial genomes.
Collapse
|
15
|
Network-Thinking: Graphs to Analyze Microbial Complexity and Evolution. Trends Microbiol 2016; 24:224-237. [PMID: 26774999 PMCID: PMC4766943 DOI: 10.1016/j.tim.2015.12.003] [Citation(s) in RCA: 84] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2015] [Revised: 12/02/2015] [Accepted: 12/08/2015] [Indexed: 01/23/2023]
Abstract
The tree model and tree-based methods have played a major, fruitful role in evolutionary studies. However, with the increasing realization of the quantitative and qualitative importance of reticulate evolutionary processes, affecting all levels of biological organization, complementary network-based models and methods are now flourishing, inviting evolutionary biology to experience a network-thinking era. We show how relatively recent comers in this field of study, that is, sequence-similarity networks, genome networks, and gene families–genomes bipartite graphs, already allow for a significantly enhanced usage of molecular datasets in comparative studies. Analyses of these networks provide tools for tackling a multitude of complex phenomena, including the evolution of gene transfer, composite genes and genomes, evolutionary transitions, and holobionts. Introgressive processes shape the microbial world at all levels of organisation. This reticulated evolution is increasingly studied by sequence-similarity networks. They provide an inclusive accurate multilevel framework to study the web of life. Networks enhance analyses of microbial genes, genomes, communities, and of symbiosis.
Collapse
|
16
|
Abstract
Viruses are notorious for rapidly exchanging genetic information between close relatives and with the host cells they infect. This exchange has profound effects on the nature and rapidity of virus and host evolution. Recombination between dsDNA viruses is common, as is genetic exchange between dsDNA viruses or retroviruses and host genomes. Recombination between RNA virus genomes is also well known. In contrast, genetic exchange across viral kingdoms, for instance between nonretroviral RNA viruses or ssDNA viruses and host genomes or between RNA and DNA viruses, was previously thought to be practically nonexistent. However, there is now growing evidence for both RNA and ssDNA viruses recombining with host dsDNA genomes and, more surprisingly, RNA virus genes recombining with ssDNA virus genomes. Mechanisms are still unclear, but this deep recombination greatly expands the breadth of virus evolution and confounds virus taxonomy.
Collapse
Affiliation(s)
- Kenneth M Stedman
- Biology Department and Center for Life in Extreme Environments, Portland State University, Portland, Oregon 97207;
| |
Collapse
|
17
|
Cheng S, Karkar S, Bapteste E, Yee N, Falkowski P, Bhattacharya D. Sequence similarity network reveals the imprints of major diversification events in the evolution of microbial life. Front Ecol Evol 2014. [DOI: 10.3389/fevo.2014.00072] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023] Open
|