1
|
Goldman AL, Fulk EM, Momper LM, Heider C, Mulligan J, Osburn M, Masiello CA, Silberg JJ. Microbial sensor variation across biogeochemical conditions in the terrestrial deep subsurface. mSystems 2024; 9:e0096623. [PMID: 38059636 PMCID: PMC10805038 DOI: 10.1128/msystems.00966-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 11/08/2023] [Indexed: 12/08/2023] Open
Abstract
Microbes can be found in abundance many kilometers underground. While microbial metabolic capabilities have been examined across different geochemical settings, it remains unclear how changes in subsurface niches affect microbial needs to sense and respond to their environment. To address this question, we examined how microbial extracellular sensor systems vary with environmental conditions across metagenomes at different Deep Mine Microbial Observatory (DeMMO) subsurface sites. Because two-component systems (TCSs) directly sense extracellular conditions and convert this information into intracellular biochemical responses, we expected that this sensor family would vary across isolated oligotrophic subterranean environments that differ in abiotic and biotic conditions. TCSs were found at all six subsurface sites, the service water control, and the surface site, with an average of 0.88 sensor histidine kinases (HKs) per 100 genes across all sites. Abundance was greater in subsurface fracture fluids compared with surface-derived fluids, and candidate phyla radiation (CPR) bacteria presented the lowest HK frequencies. Measures of microbial diversity, such as the Shannon diversity index, revealed that HK abundance is inversely correlated with microbial diversity (r² = 0.81). Among the geochemical parameters measured, HK frequency correlated most strongly with variance in dissolved organic carbon (r² = 0.82). Taken together, these results implicate the abiotic and biotic properties of an ecological niche as drivers of sensor needs, and they suggest that microbes in environments with large fluctuations in organic nutrients (e.g., lacustrine, terrestrial, and coastal ecosystems) may require greater TCS diversity than ecosystems with low nutrients (e.g., open ocean).
IMPORTANCE
The ability to detect extracellular environmental conditions is a fundamental property of all life forms. Because microbial two-component sensor systems convert information about extracellular conditions into biochemical information that controls their behaviors, we evaluated how two-component sensor systems evolved within the deep Earth across multiple sites where abiotic and biotic properties vary. We show that these sensor systems remain abundant in microbial consortia at all subterranean sampling sites and observe correlations between sensor system abundances and abiotic (dissolved organic carbon variation) and biotic (consortia diversity) properties. These results suggest that multiple environmental properties may drive sensor protein evolution and highlight the need for further studies of metagenomic and geochemical data in parallel to understand the drivers of microbial sensor evolution.
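The abstract above relates HK frequency to the Shannon diversity index through a squared correlation coefficient. As a minimal sketch of those two quantities (not the authors' pipeline), the Python below computes the Shannon index from taxon counts and the squared Pearson correlation between two series. The function names and the per-site numbers are hypothetical placeholders, not data from the study.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-site taxon counts and HK frequencies (illustration only).
site_counts = [[50, 30, 20], [90, 5, 5], [25, 25, 25, 25]]
hk_per_100_genes = [0.9, 1.2, 0.7]

diversity = [shannon_index(c) for c in site_counts]
print(diversity, r_squared(diversity, hk_per_100_genes))
```

Note that r² alone does not carry the sign of the relationship; an inverse correlation such as the one reported would show up as a negative Pearson r before squaring.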
Affiliation(s)
- Emily M. Fulk
  - Systems, Synthetic, and Physical Biology Graduate Program, Rice University, Houston, Texas, USA
- Lily M. Momper
  - Department of Earth and Planetary Sciences, Northwestern University, Evanston, Illinois, USA
- Clinton Heider
  - Rice University, Center for Research Computing, Houston, Texas, USA
- John Mulligan
  - Rice University, Center for Research Computing, Houston, Texas, USA
- Magdalena Osburn
  - Department of Earth and Planetary Sciences, Northwestern University, Evanston, Illinois, USA
- Caroline A. Masiello
  - Department of Biosciences, Rice University, Houston, Texas, USA
  - Department of Earth, Environmental and Planetary Sciences, Rice University, Houston, Texas, USA
  - Department of Chemistry, Rice University, Houston, Texas, USA
- Jonathan J. Silberg
  - Department of Biosciences, Rice University, Houston, Texas, USA
  - Department of Bioengineering, Rice University, Houston, Texas, USA
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
|
2
|
Sequence Similarity Network Analysis Provides Insight into the Temporal and Geographical Distribution of Mutations in SARS-CoV-2 Spike Protein. Viruses 2022; 14:v14081672. [PMID: 36016294 PMCID: PMC9413517 DOI: 10.3390/v14081672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 07/28/2022] [Accepted: 07/28/2022] [Indexed: 11/17/2022] Open
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which still infects hundreds of thousands of people globally each day despite various countermeasures, has been mutating rapidly. Mutations in the spike (S) protein seem to play a vital role in viral stability, transmission, and adaptability. Therefore, to control the spread of the virus, it is important to gain insight into the evolution and transmission of the S protein. This study deals with the temporal and geographical distribution of mutant S proteins from sequences gathered across the US over a period of 19 months in 2020 and 2021. The S protein sequences are studied using two approaches: (i) multiple sequence alignment is used to identify prominent mutations and highly mutable regions and (ii) sequence similarity networks are subsequently employed to gain further insight and study mutation profiles of concerning variants across the defined time periods and states. Additionally, we tracked the variants using visualizations on geographical maps. The visualizations produced using the Directed Weighted All Nearest Neighbors (DiWANN) networks and maps provided insights into the transmission of the virus that reflect well the statistics reported for the time periods studied. We found that the networks created using DiWANN are superior to commonly used approximate distance networks created using BLAST bitscores. The study offers a richer computational approach to analyze the transmission profile of the prominent S protein mutations in SARS-CoV-2 and can be extended to other proteins and viruses.
|
3
|
Faure E, Ayata SD, Bittner L. Towards omics-based predictions of planktonic functional composition from environmental data. Nat Commun 2021; 12:4361. [PMID: 34272373 PMCID: PMC8285379 DOI: 10.1038/s41467-021-24547-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/24/2020] [Accepted: 05/25/2021] [Indexed: 02/06/2023] Open
Abstract
Marine microbes play a crucial role in climate regulation, biogeochemical cycles, and trophic networks. Unprecedented amounts of data on planktonic communities were recently collected, sparking a need for innovative data-driven methodologies to quantify and predict their ecosystemic functions. We reanalyze 885 marine metagenome-assembled genomes through a network-based approach and detect 233,756 protein functional clusters, from which 15% are functionally unannotated. We investigate all clusters' distributions across the global ocean through machine learning, identifying biogeographical provinces as the best predictors of protein functional clusters' abundance. The abundances of 14,585 clusters are predictable from the environmental context, including 1347 functionally unannotated clusters. We analyze the biogeography of these 14,585 clusters, identifying the Mediterranean Sea as an outlier in terms of protein functional clusters composition. Applicable to any set of sequences, our approach constitutes a step towards quantitative predictions of functional composition from the environmental context.
Affiliation(s)
- Emile Faure
  - Sorbonne Université, CNRS, Laboratoire d'Océanographie de Villefranche, LOV, Villefranche-sur-Mer, France
  - Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Sakina-Dorothée Ayata
  - Sorbonne Université, CNRS, Laboratoire d'Océanographie de Villefranche, LOV, Villefranche-sur-Mer, France
  - Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Lucie Bittner
  - Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
  - Institut Universitaire de France, Paris, France
|
4
|
Shafee T, Bacic A, Johnson K. Evolution of Sequence-Diverse Disordered Regions in a Protein Family: Order within the Chaos. Mol Biol Evol 2020; 37:2155-2172. [DOI: 10.1093/molbev/msaa096] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/11/2022] Open
Abstract
Approaches for studying the evolution of globular proteins are now well established yet are unsuitable for disordered sequences. Our understanding of the evolution of proteins containing disordered regions therefore lags that of globular proteins, limiting our capacity to estimate their evolutionary history, classify paralogs, and identify potential sequence–function relationships. Here, we overcome these limitations by using new analytical approaches that project representations of sequence space to dissect the evolution of proteins with both ordered and disordered regions, and the correlated changes between these. We use the fasciclin-like arabinogalactan proteins (FLAs) as a model family, since they contain a variable number of globular fasciclin domains as well as several distinct types of disordered regions: proline (Pro)-rich arabinogalactan (AG) regions and longer Pro-depleted regions.
Sequence space projections of fasciclin domains from 2019 FLAs from 78 species identified distinct clusters corresponding to different types of fasciclin domains. Clusters can be similarly identified in the seemingly random Pro-rich AG and Pro-depleted disordered regions. Sequence features of the globular and disordered regions clearly correlate with one another, implying coevolution of these distinct regions, as well as with the N-linked and O-linked glycosylation motifs. We reconstruct the overall evolutionary history of the FLAs, annotated with the changing domain architectures, glycosylation motifs, number and length of AG regions, and disordered region sequence features. Mapping these features onto the functionally characterized FLAs therefore enables their sequence–function relationships to be interrogated. These findings will inform research on the abundant disordered regions in protein families from all kingdoms of life.
Affiliation(s)
- Thomas Shafee
  - Department of Animal, Plant and Soil Sciences, La Trobe Institute for Agriculture & Food, La Trobe University, Melbourne, VIC, Australia
- Antony Bacic
  - Department of Animal, Plant and Soil Sciences, La Trobe Institute for Agriculture & Food, La Trobe University, Melbourne, VIC, Australia
  - Sino-Australia Plant Cell Wall Research Centre, College of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Lin’an, Hangzhou, China
- Kim Johnson
  - Department of Animal, Plant and Soil Sciences, La Trobe Institute for Agriculture & Food, La Trobe University, Melbourne, VIC, Australia
  - Sino-Australia Plant Cell Wall Research Centre, College of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Lin’an, Hangzhou, China
|
5
|
Imam N, Alam A, Ali R, Siddiqui MF, Ali S, Malik MZ, Ishrat R. In silico characterization of hypothetical proteins from Orientia tsutsugamushi str. Karp uncovers virulence genes. Heliyon 2019; 5:e02734. [PMID: 31720472 PMCID: PMC6838952 DOI: 10.1016/j.heliyon.2019.e02734] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/05/2019] [Revised: 04/29/2019] [Accepted: 10/23/2019] [Indexed: 11/20/2022] Open
Abstract
Scrub typhus, also known as bush typhus, is a disease with symptoms similar to those of chikungunya infection. It is caused by the gram-negative bacterium Orientia tsutsugamushi, which resides in its arthropod host, mites. The genome of Orientia tsutsugamushi str. Karp encodes 1,563 proteins, of which 344 are characterized as hypothetical. In the present study, we tried to identify the probable functions of these 344 hypothetical proteins (HPs). The characterized HPs belong to various protein classes, including enzymes, transporters, binding proteins, metabolic process, catalytic activity, and kinase activity. The HPs were further analyzed for virulence factors, and 62 were identified as the most virulent. In addition, we studied the protein sequence similarity network to visualize functional trends across protein superfamilies in the context of sequence similarity; it shows great potential for generating testable hypotheses about protein structure-function relationships. Furthermore, we calculated topological properties of the network and found that they obey power-law distributions, showing a fractal nature. We also identified two highly interconnected modules in the main network, which contained five hub proteins (KJV55465, KJV56211, KJV57212, KJV57203 and KJV57216) with a clustering coefficient of 1.0. Structural modeling (2D and 3D structure) of these five hub proteins was carried out, and the catalytic sites essential for their functioning were analyzed. The outcome of the present study may facilitate a better understanding of the mechanisms of virulence, pathogenesis, and adaptability to the host, and up-to-date annotations will make unknown genes easier to identify and target for experimentation. The information on the functional attributes and virulence characteristics of these HPs is envisaged to facilitate effective development of novel antibacterial drug targets for Orientia tsutsugamushi.
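The abstract reports hub proteins with a clustering coefficient of 1.0. As a small illustration of what that value means (a sketch on a made-up toy graph, not the study's network or software), the local clustering coefficient of a node is the fraction of pairs of its neighbors that are themselves connected. The node labels and the convention of returning 0.0 for nodes with fewer than two neighbors are assumptions made here.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Local clustering coefficient: linked neighbor pairs / possible neighbor pairs.

    `adj` maps each node to the set of its neighbors (undirected graph).
    Nodes with fewer than two neighbors get 0.0 by convention (an assumption).
    """
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Toy undirected graph: A, B, C, D form a clique; E hangs off A.
toy = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A"},
}

for name in sorted(toy):
    print(name, clustering_coefficient(toy, name))
```

A value of 1.0, as for B, C, and D here, means every pair of the node's neighbors is directly linked, so the node sits inside a fully connected neighborhood.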
Affiliation(s)
- Nikhat Imam
  - Institute of Computer Science and Information Technology, Magadh University, Bodhgaya, India
  - Centre for Interdisciplinary Research in Basic Science, Jamia Millia Islamia, New Delhi, India
- Aftab Alam
  - Centre for Interdisciplinary Research in Basic Science, Jamia Millia Islamia, New Delhi, India
- Rafat Ali
  - Centre for Interdisciplinary Research in Basic Science, Jamia Millia Islamia, New Delhi, India
- Mohd Faizan Siddiqui
  - International Medical Faculty, Osh State University, Osh City, 723500, Kyrgyz Republic (Kyrgyzstan)
- Sher Ali
  - Centre for Interdisciplinary Research in Basic Science, Jamia Millia Islamia, New Delhi, India
- Md. Zubbair Malik
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, Delhi, 110067, India
- Romana Ishrat
  - Centre for Interdisciplinary Research in Basic Science, Jamia Millia Islamia, New Delhi, India
|
6
|
Muscente AD, Bykova N, Boag TH, Buatois LA, Mángano MG, Eleish A, Prabhu A, Pan F, Meyer MB, Schiffbauer JD, Fox P, Hazen RM, Knoll AH. Ediacaran biozones identified with network analysis provide evidence for pulsed extinctions of early complex life. Nat Commun 2019; 10:911. [PMID: 30796215 PMCID: PMC6384941 DOI: 10.1038/s41467-019-08837-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 09/20/2018] [Accepted: 01/28/2019] [Indexed: 12/05/2022] Open
Abstract
Rocks of Ediacaran age (~635–541 Ma) contain the oldest fossils of large, complex organisms and their behaviors. These fossils document developmental and ecological innovations, and suggest that extinctions helped to shape the trajectory of early animal evolution. Conventional methods divide Ediacaran macrofossil localities into taxonomically distinct clusters, which may represent evolutionary, environmental, or preservational variation. Here, we investigate these possibilities with network analysis of body and trace fossil occurrences. By partitioning multipartite networks of taxa, paleoenvironments, and geologic formations into community units, we distinguish between biostratigraphic zones and paleoenvironmentally restricted biotopes, and provide empirically robust and statistically significant evidence for a global, cosmopolitan assemblage unique to terminal Ediacaran strata. The assemblage is taxonomically depauperate but includes fossils of recognizable eumetazoans, which lived between two episodes of biotic turnover. These turnover events were the first major extinctions of complex life and paved the way for the Cambrian radiation of animals.
The Ediacara biota, the first large, complex organisms to evolve on Earth, disappeared prior to the radiation of animals during the Cambrian Period. Here, Muscente et al. perform network analysis of Ediacaran fossils and show that there were two global extinction events before the Cambrian radiation.
Affiliation(s)
- A D Muscente
  - Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA, 02138, USA
  - Department of Geological Sciences, Jackson School of Geosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Natalia Bykova
  - Trofimuk Institute of Petroleum Geology and Geophysics, Siberian Branch Russian Academy of Sciences, Novosibirsk, 630090, Russia
- Thomas H Boag
  - Department of Geological Sciences, Stanford University, Stanford, CA, 94305, USA
- Luis A Buatois
  - Department of Geological Sciences, University of Saskatchewan, Saskatoon, SK, S7N 5E2, Canada
- M Gabriela Mángano
  - Department of Geological Sciences, University of Saskatchewan, Saskatoon, SK, S7N 5E2, Canada
- Ahmed Eleish
  - Department of Earth and Environmental Sciences, Rensselaer Polytechnic Institute, Jonsson-Rowland Science Center, 1W19, 110 8th Street, Troy, NY, 12180, USA
- Anirudh Prabhu
  - Department of Earth and Environmental Sciences, Rensselaer Polytechnic Institute, Jonsson-Rowland Science Center, 1W19, 110 8th Street, Troy, NY, 12180, USA
- Feifei Pan
  - Department of Earth and Environmental Sciences, Rensselaer Polytechnic Institute, Jonsson-Rowland Science Center, 1W19, 110 8th Street, Troy, NY, 12180, USA
- Michael B Meyer
  - Earth and Environmental Science Program, Harrisburg University of Science and Technology, Harrisburg, PA, 17101, USA
- James D Schiffbauer
  - Department of Geological Sciences, University of Missouri, Columbia, MO, 65211, USA
  - X-ray Microanalysis Core Facility, University of Missouri, Columbia, MO, 65211, USA
- Peter Fox
  - Department of Earth and Environmental Sciences, Rensselaer Polytechnic Institute, Jonsson-Rowland Science Center, 1W19, 110 8th Street, Troy, NY, 12180, USA
- Robert M Hazen
  - Geophysical Laboratory, Carnegie Institution for Science, 5251 Broad Branch Road, Washington, DC, 20015, USA
- Andrew H Knoll
  - Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA, 02138, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
|
7
|
Watson AK, Lannes R, Pathmanathan JS, Méheust R, Karkar S, Colson P, Corel E, Lopez P, Bapteste E. The Methodology Behind Network Thinking: Graphs to Analyze Microbial Complexity and Evolution. Methods Mol Biol 2019; 1910:271-308. [PMID: 31278668 DOI: 10.1007/978-1-4939-9074-0_9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/09/2023]
Abstract
In the post genomic era, large and complex molecular datasets from genome and metagenome sequencing projects expand the limits of what is possible for bioinformatic analyses. Network-based methods are increasingly used to complement phylogenetic analysis in studies in molecular evolution, including comparative genomics, classification, and ecological studies. Using network methods, the vertical and horizontal relationships between all genes or genomes, whether they are from cellular chromosomes or mobile genetic elements, can be explored in a single expandable graph. In recent years, development of new methods for the construction and analysis of networks has helped to broaden the availability of these approaches from programmers to a diversity of users. This chapter introduces the different kinds of networks based on sequence similarity that are already available to tackle a wide range of biological questions, including sequence similarity networks, gene-sharing networks and bipartite graphs, and a guide for their construction and analyses.
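To make the idea of a sequence similarity network concrete, here is a minimal sketch, not the chapter's own protocol: sequences are nodes, an edge joins two sequences whose similarity meets a threshold, and connected components then serve as candidate families. The `identity` function is a toy stand-in for a real alignment score such as one derived from BLAST, and the sequence names, sequences, and the 0.75 threshold are all hypothetical.

```python
from itertools import combinations


def identity(a, b):
    """Toy similarity: fraction of identical positions (stand-in for an alignment score)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def build_ssn(seqs, threshold):
    """Return an adjacency map with an edge wherever pairwise similarity >= threshold."""
    adj = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if identity(seqs[u], seqs[v]) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def connected_components(adj):
    """Group nodes into connected components with an iterative depth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Hypothetical equal-length peptide fragments (illustration only).
seqs = {"s1": "MKTAYIAK", "s2": "MKTAYIAR", "s3": "MKTGYIAR", "s4": "GGSLPWQE"}
ssn = build_ssn(seqs, threshold=0.75)
print(connected_components(ssn))
```

Raising the threshold splits components into smaller, more stringent groups, which is why the choice of cutoff is usually reported alongside any network built this way.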
Affiliation(s)
- Andrew K Watson
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
- Romain Lannes
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
- Jananan S Pathmanathan
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
- Raphaël Méheust
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
- Slim Karkar
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
  - Department of Ecology, Evolution, and Natural Resources, School of Environmental and Biological Sciences, Rutgers, The State University of NJ, New Brunswick, NJ, USA
- Philippe Colson
  - Fondation Institut Hospitalo-Universitaire Méditerranée Infection, Pôle des Maladies Infectieuses et Tropicales Clinique et Biologique, Fédération de Bactériologie-Hygiène-Virologie, Centre Hospitalo-Universitaire Timone, Assistance Publique-Hôpitaux de Marseille, Marseille, France
  - Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes (URMITE) UM63, CNRS 7278, IRD 198, INSERM U1095, Aix-Marseille University, Marseille, France
- Eduardo Corel
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
- Philippe Lopez
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
- Eric Bapteste
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
|
8
|
Abstract
Understanding how an animal organism and its gut microbes form an integrated biological organization, known as a holobiont, is becoming a central issue in biological studies. Such an organization inevitably involves a complex web of transmission processes that occur on different scales in time and space, across microbes and hosts. Network-based models are introduced in this chapter to tackle aspects of this complexity and to better take into account vertical and horizontal dimensions of transmission. Two types of network-based models are presented, sequence similarity networks and bipartite graphs. One interest of these networks is that they can consider a rich diversity of important players in microbial evolution that are usually excluded from evolutionary studies, like plasmids and viruses. These methods bring forward the notion of "gene externalization," which is defined as the presence of redundant copies of prokaryotic genes on mobile genetic elements (MGEs), and therefore emphasizes a related although distinct process from lateral gene transfer between microbial cells. This chapter introduces guidelines to the construction of these networks, reviews their analysis, and illustrates their possible biological interpretations and uses. The application to human gut microbiomes shows that sequences present in a higher diversity of MGEs have both biased functions and a broader microbial and human host range. These results suggest that an "externalized gut metagenome" is partly common to humans and benefits the gut microbial community. We conclude that testing relationships between microbial genes, microbes, and their animal hosts, using network-based methods, could help to unravel additional mechanisms of transmission in holobionts.
|
9
|
Shafee T, Anderson MA. A quantitative map of protein sequence space for the cis-defensin superfamily. Bioinformatics 2018; 35:743-752. [DOI: 10.1093/bioinformatics/bty697] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/18/2018] [Revised: 08/01/2018] [Accepted: 08/08/2018] [Indexed: 12/13/2022] Open
Affiliation(s)
- Thomas Shafee
  - Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Australia
- Marilyn A Anderson
  - Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Australia
|
10
|
Meng A, Corre E, Probert I, Gutierrez-Rodriguez A, Siano R, Annamale A, Alberti A, Da Silva C, Wincker P, Le Crom S, Not F, Bittner L. Analysis of the genomic basis of functional diversity in dinoflagellates using a transcriptome-based sequence similarity network. Mol Ecol 2018; 27:2365-2380. [PMID: 29624751 DOI: 10.1111/mec.14579] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/29/2017] [Revised: 02/23/2018] [Accepted: 03/21/2018] [Indexed: 02/06/2023]
Abstract
Dinoflagellates are one of the most abundant and functionally diverse groups of eukaryotes. Despite an overall scarcity of genomic information for dinoflagellates, constantly emerging high-throughput sequencing resources can be used to characterize and compare these organisms. We assembled de novo and processed 46 dinoflagellate transcriptomes and used a sequence similarity network (SSN) to compare the underlying genomic basis of functional features within the group. This approach constitutes the most comprehensive picture to date of the genomic potential of dinoflagellates. A core-predicted proteome composed of 252 connected components (CCs) of putative conserved protein domains (pCDs) was identified. Of these, 206 were novel and 16 lacked any functional annotation in public databases. Integration of functional information in our network analyses allowed investigation of pCDs specifically associated with functional traits. With respect to toxicity, sequences homologous to those of proteins found in species with toxicity potential (e.g., sxtA4 and sxtG) were not specific to known toxin-producing species. Although not fully specific to symbiosis, the most represented functions associated with proteins involved in the symbiotic trait were related to membrane processes and ion transport. Overall, our SSN approach led to identification of 45,207 and 90,794 specific and constitutive pCDs of, respectively, the toxic and symbiotic species represented in our analyses. Of these, 56% and 57%, respectively (i.e., 25,393 and 52,193 pCDs), completely lacked annotation in public databases. This stresses the extent of our lack of knowledge, while emphasizing the potential of SSNs to identify candidate pCDs for further functional genomic characterization.
Affiliation(s)
- Arnaud Meng
  - Sorbonne Universités, UPMC Univ Paris 06, Univ Antilles Guyane, Univ Nice Sophia Antipolis, CNRS, Evolution Paris Seine - Institut de Biologie Paris Seine (EPS - IBPS), Paris, France
- Erwan Corre
  - CNRS, UPMC, FR2424, ABiMS, Station Biologique, Roscoff, France
- Ian Probert
  - UPMC-CNRS, FR2424, Roscoff Culture Collection, Station Biologique de Roscoff, Place Georges Teissier, Roscoff, France
- Raffaele Siano
  - Ifremer - Centre de Brest, DYNECO PELAGOS, Plouzané, France
- Anita Annamale
  - CEA - Institut de Génomique, GENOSCOPE, Evry, France
  - CNRS, UMR8030, Evry, France
  - Université d'Evry Val d'Essonne, Evry, France
- Adriana Alberti
  - CEA - Institut de Génomique, GENOSCOPE, Evry, France
  - CNRS, UMR8030, Evry, France
  - Université d'Evry Val d'Essonne, Evry, France
- Corinne Da Silva
  - CEA - Institut de Génomique, GENOSCOPE, Evry, France
  - CNRS, UMR8030, Evry, France
  - Université d'Evry Val d'Essonne, Evry, France
- Patrick Wincker
  - CEA - Institut de Génomique, GENOSCOPE, Evry, France
  - CNRS, UMR8030, Evry, France
  - Université d'Evry Val d'Essonne, Evry, France
- Stéphane Le Crom
  - Sorbonne Universités, UPMC Univ Paris 06, Univ Antilles Guyane, Univ Nice Sophia Antipolis, CNRS, Evolution Paris Seine - Institut de Biologie Paris Seine (EPS - IBPS), Paris, France
- Fabrice Not
  - CNRS, UMR 7144, Station Biologique de Roscoff, Place Georges Teissier, Roscoff, France
- Lucie Bittner
  - Sorbonne Universités, UPMC Univ Paris 06, Univ Antilles Guyane, Univ Nice Sophia Antipolis, CNRS, Evolution Paris Seine - Institut de Biologie Paris Seine (EPS - IBPS), Paris, France
|
11
|
BRIDES: A New Fast Algorithm and Software for Characterizing Evolving Similarity Networks Using Breakthroughs, Roadblocks, Impasses, Detours, Equals and Shortcuts. PLoS One 2016; 11:e0161474. [PMID: 27580188 PMCID: PMC5007014 DOI: 10.1371/journal.pone.0161474] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/27/2016] [Accepted: 08/06/2016] [Indexed: 11/19/2022] Open
Abstract
Various types of genome and gene similarity networks along with their characteristics have been increasingly used for retracing different kinds of evolutionary and ecological relationships. Here, we present a new polynomial time algorithm and the corresponding software (BRIDES) to provide characterization of different types of paths existing in evolving (or augmented) similarity networks under the constraint that such paths contain at least one node that was not present in the original network. These different paths are denoted as Breakthroughs, Roadblocks, Impasses, Detours, Equal paths, and Shortcuts. The analysis of their distribution can allow discriminating among different evolutionary hypotheses concerning genomes or genes at hand. Our approach is based on an original application of the popular shortest path Dijkstra's and Yen's algorithms. The C++ and R versions of the BRIDES program are freely available at: https://github.com/etiennelord/BRIDES.
|
12
|
Network-Thinking: Graphs to Analyze Microbial Complexity and Evolution. Trends Microbiol 2016; 24:224-237. [PMID: 26774999 PMCID: PMC4766943 DOI: 10.1016/j.tim.2015.12.003] [Citation(s) in RCA: 84] [Impact Index Per Article: 10.5] [Received: 09/18/2015] [Revised: 12/02/2015] [Accepted: 12/08/2015] [Indexed: 01/23/2023]
Abstract
The tree model and tree-based methods have played a major, fruitful role in evolutionary studies. However, with the increasing realization of the quantitative and qualitative importance of reticulate evolutionary processes, affecting all levels of biological organization, complementary network-based models and methods are now flourishing, inviting evolutionary biology to experience a network-thinking era. We show how relatively recent comers in this field of study, that is, sequence-similarity networks, genome networks, and gene families–genomes bipartite graphs, already allow for a significantly enhanced usage of molecular datasets in comparative studies. Analyses of these networks provide tools for tackling a multitude of complex phenomena, including the evolution of gene transfer, composite genes and genomes, evolutionary transitions, and holobionts.
Introgressive processes shape the microbial world at all levels of organisation. This reticulated evolution is increasingly studied by sequence-similarity networks. They provide an inclusive, accurate multilevel framework to study the web of life. Networks enhance analyses of microbial genes, genomes, communities, and of symbiosis.
|
13
|
Harel A, Karkar S, Cheng S, Falkowski P, Bhattacharya D. Deciphering Primordial Cyanobacterial Genome Functions from Protein Network Analysis. Curr Biol 2015; 25:628-34. [DOI: 10.1016/j.cub.2014.12.061] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 08/04/2014] [Revised: 11/05/2014] [Accepted: 12/29/2014] [Indexed: 11/16/2022]
|