1. Sonnert ND, Rosen CE, Ghazi AR, Franzosa EA, Duncan-Lowey B, González-Hernández JA, Huck JD, Yang Y, Dai Y, Rice TA, Nguyen MT, Song D, Cao Y, Martin AL, Bielecka AA, Fischer S, Guan C, Oh J, Huttenhower C, Ring AM, Palm NW. A host-microbiota interactome reveals extensive transkingdom connectivity. Nature 2024; 628:171-179. [PMID: 38509360 DOI: 10.1038/s41586-024-07162-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2022] [Accepted: 02/05/2024] [Indexed: 03/22/2024]
Abstract
The myriad microorganisms that live in close association with humans have diverse effects on physiology, yet the molecular bases for these impacts remain mostly unknown[1-3]. Classical pathogens often invade host tissues and modulate immune responses through interactions with human extracellular and secreted proteins (the 'exoproteome'). Commensal microorganisms may also facilitate niche colonization and shape host biology by engaging host exoproteins; however, direct exoproteome-microbiota interactions remain largely unexplored. Here we developed and validated a novel technology, BASEHIT, that enables proteome-scale assessment of human exoproteome-microbiome interactions. Using BASEHIT, we interrogated more than 1.7 million potential interactions between 519 human-associated bacterial strains from diverse phylogenies and tissues of origin and 3,324 human exoproteins. The resulting interactome revealed an extensive network of transkingdom connectivity consisting of thousands of previously undescribed host-microorganism interactions involving 383 strains and 651 host proteins. Specific binding patterns within this network implied underlying biological logic; for example, conspecific strains exhibited shared exoprotein-binding patterns, and individual tissue isolates uniquely bound tissue-specific exoproteins. Furthermore, we observed dozens of unique and often strain-specific interactions with potential roles in niche colonization, tissue remodelling and immunomodulation, and found that strains with differing host interaction profiles had divergent interactions with host cells in vitro and effects on the host immune system in vivo. Overall, these studies expose a previously unexplored landscape of molecular-level host-microbiota interactions that may underlie causal effects of indigenous microorganisms on human health and disease.
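The screen size in this abstract is the product of the two panel sizes (519 × 3,324 = 1,725,156 strain-protein pairs). As a minimal sketch, assuming a hypothetical toy interactome with placeholder strain and protein names, the code below stores a bipartite interactome as a map from strain to bound exoproteins and scores shared binding patterns with Jaccard similarity. The abstract does not state which similarity measure the authors used, so Jaccard here is an illustrative choice.

```python
from itertools import combinations

# Size of the screen in the abstract: every strain x every exoprotein.
N_STRAINS = 519
N_EXOPROTEINS = 3_324
N_PAIRS = N_STRAINS * N_EXOPROTEINS  # 1,725,156 potential interactions


def jaccard(a, b):
    """Jaccard similarity between two sets of bound host proteins."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(interactome):
    """Jaccard similarity of binding profiles for every pair of strains.

    `interactome` maps strain name -> set of exoproteins that strain binds.
    """
    return {
        (s1, s2): jaccard(interactome[s1], interactome[s2])
        for s1, s2 in combinations(sorted(interactome), 2)
    }


# Hypothetical toy interactome (names are placeholders, not real hits).
toy = {
    "strainA_1": {"P1", "P2", "P3"},
    "strainA_2": {"P1", "P2"},
    "strainB_1": {"P4"},
}
sims = pairwise_similarity(toy)
```

Under this toy input, the two "strainA" isolates share two of three bound proteins, while neither overlaps with "strainB_1".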
Affiliation(s)
- Nicole D Sonnert
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Department of Microbial Pathogenesis, Yale School of Medicine, New Haven, CT, USA
- Connor E Rosen
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Andrew R Ghazi
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eric A Franzosa
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- John D Huck
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Yi Yang
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Yile Dai
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Tyler A Rice
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Mytien T Nguyen
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Deguang Song
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Yiyun Cao
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Anjelica L Martin
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Agata A Bielecka
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Suzanne Fischer
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Changhui Guan
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Julia Oh
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Curtis Huttenhower
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Aaron M Ring
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Department of Pharmacology, Yale School of Medicine, New Haven, CT, USA
- Noah W Palm
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
2. Zhao H, Li Z, Chen W, Zheng Z, Xie S. Accelerated Partially Shared Dictionary Learning With Differentiable Scale-Invariant Sparsity for Multi-View Clustering. IEEE Trans Neural Netw Learn Syst 2023; 34:8825-8839. [PMID: 35254997 DOI: 10.1109/tnnls.2022.3153310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Multiview dictionary learning (DL) is attracting attention in multiview clustering because of its efficient feature-learning ability. However, most existing multiview DL algorithms struggle to exploit consistent and complementary information in multiview data simultaneously, and to learn the most precise representation for multiview clustering, because of gaps between views. This article proposes an efficient multiview DL algorithm for multiview clustering that uses a partially shared DL model with a flexible ratio of shared sparse coefficients to extract both the consistency and the complementarity in multiview data. In particular, a differentiable scale-invariant function is used as the sparsity regularizer; like the ℓ0-norm regularizer, it measures the absolute sparsity of the coefficients, but it is continuous and differentiable almost everywhere. The corresponding optimization problem is solved by proximal splitting with extrapolation, and the proximal operator of the differentiable scale-invariant regularizer can be derived. Synthetic experiments show that the proposed algorithm recovers the synthetic dictionary well with reasonable convergence time. Multiview clustering experiments on six real-world multiview datasets show that the proposed algorithm is less sensitive to the regularizer parameter than the other algorithms. Furthermore, an appropriate coefficient-sharing ratio helps exploit consistent information while retaining complementary information from multiview data, thus improving multiview clustering performance. In addition, the proposed algorithm achieves the best multiview clustering performance among the compared algorithms and, in most cases, converges faster than the compared multiview algorithms.
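The abstract does not spell out its differentiable scale-invariant regularizer. As a stand-in, the sketch below uses the ℓ1/ℓ2 ratio, a widely used scale-invariant sparsity measure; this is an assumption for illustration, not necessarily the function in the paper. It contrasts the discontinuous ℓ0 count with a measure that ignores coefficient scale and is differentiable away from the coordinate hyperplanes.

```python
import math


def l0_count(x):
    """Number of nonzero coefficients (the l0 "norm"); not continuous."""
    return sum(1 for v in x if v != 0)


def l1_over_l2(x):
    """Scale-invariant sparsity measure ||x||_1 / ||x||_2.

    Equals 1 for a 1-sparse vector and sqrt(n) when all n entries have equal
    magnitude; multiplying x by any nonzero scalar leaves it unchanged.
    Differentiable wherever no coordinate is exactly zero.
    """
    norm2 = math.sqrt(sum(v * v for v in x))
    if norm2 == 0:
        raise ValueError("measure is undefined for the zero vector")
    return sum(abs(v) for v in x) / norm2
```

Sparser vectors get lower values, so minimizing such a term promotes sparsity much as ℓ0 does. Unlike a plain ℓ1 penalty, it cannot be reduced simply by shrinking every coefficient.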
3. Karlsen ST, Rau MH, Sánchez BJ, Jensen K, Zeidan AA. From genotype to phenotype: computational approaches for inferring microbial traits relevant to the food industry. FEMS Microbiol Rev 2023; 47:fuad030. [PMID: 37286882 PMCID: PMC10337747 DOI: 10.1093/femsre/fuad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/06/2023] [Indexed: 06/09/2023]
Abstract
When selecting microbial strains for the production of fermented foods, various microbial phenotypes need to be taken into account to achieve target product characteristics, such as biosafety, flavor, texture, and health-promoting effects. Through continuous advances in sequencing technologies, microbial whole-genome sequences of increasing quality can now be obtained both cheaper and faster, which increases the relevance of genome-based characterization of microbial phenotypes. Prediction of microbial phenotypes from genome sequences makes it possible to quickly screen large strain collections in silico to identify candidates with desirable traits. Several microbial phenotypes relevant to the production of fermented foods can be predicted using knowledge-based approaches, leveraging our existing understanding of the genetic and molecular mechanisms underlying those phenotypes. In the absence of this knowledge, data-driven approaches can be applied to estimate genotype-phenotype relationships based on large experimental datasets. Here, we review computational methods that implement knowledge- and data-driven approaches for phenotype prediction, as well as methods that combine elements from both approaches. Furthermore, we provide examples of how these methods have been applied in industrial biotechnology, with special focus on the fermented food industry.
Affiliation(s)
- Signe T Karlsen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Martin H Rau
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Benjamín J Sánchez
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Kristian Jensen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Ahmad A Zeidan
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
4. Zhao J, Wang X, Zou Q, Kang F, Peng J, Wang F. On improvability of hash clustering data from different sources by bipartite graph. Pattern Anal Appl 2022. [DOI: 10.1007/s10044-022-01125-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
5. Gemler BT, Mukherjee C, Howland CA, Huk D, Shank Z, Harbo LJ, Tabbaa OP, Bartling CM. Function-based classification of hazardous biological sequences: Demonstration of a new paradigm for biohazard assessments. Front Bioeng Biotechnol 2022; 10:979497. [PMID: 36277394 PMCID: PMC9585941 DOI: 10.3389/fbioe.2022.979497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Accepted: 08/31/2022] [Indexed: 12/04/2022]
Abstract
Bioengineering applies analytical and engineering principles to identify functional biological building blocks for biotechnology applications. While these building blocks are leveraged to improve the human condition, the lack of simple, machine-readable definitions of biohazards at the function level is creating a gap for biosafety practices. More specifically, traditional safety practices focus on the biohazards of known pathogens at the organism-level and may not accurately consider novel biodesigns with engineered functionalities at the genetic component-level. This gap is motivating the need for a paradigm shift from organism-centric procedures to function-centric biohazard identification and classification practices. To address this challenge, we present a novel methodology for classifying biohazards at the individual sequence level, which we then compiled to distinguish the biohazardous property of pathogenicity at the whole genome level. Our methodology is rooted in compilation of hazardous functions, defined as a set of sequences and associated metadata that describe coarse-level functions associated with pathogens (e.g., adherence, immune subversion). We demonstrate that the resulting database can be used to develop hazardous “fingerprints” based on the functional metadata categories. We verified that these hazardous functions are found at higher levels in pathogens compared to non-pathogens, and hierarchical clustering of the fingerprints can distinguish between these two groups. The methodology presented here defines the hazardous functions associated with bioengineering functional building blocks at the sequence level, which provide a foundational framework for classifying biological hazards at the organism level, thus leading to the improvement and standardization of current biosecurity and biosafety practices.
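One plausible reading of a hazardous "fingerprint" is the share of a genome's hazard-database hits in each functional category. The sketch below assumes that definition and compares fingerprints by cosine similarity. "adherence" and "immune_subversion" come from the abstract's examples; "toxin" and "secretion" are hypothetical placeholders. The paper's actual fingerprint construction and distance metric may differ.

```python
import math
from collections import Counter

# "adherence" and "immune_subversion" are examples named in the abstract;
# the other two categories are hypothetical placeholders.
CATEGORIES = ["adherence", "immune_subversion", "toxin", "secretion"]


def fingerprint(hits):
    """Fraction of a genome's hazard-database hits falling in each category.

    `hits` is a list of category labels, one per matched sequence; labels
    outside CATEGORIES are ignored.
    """
    counts = Counter(hits)
    total = sum(counts[c] for c in CATEGORIES)
    if total == 0:
        return [0.0] * len(CATEGORIES)
    return [counts[c] / total for c in CATEGORIES]


def cosine_similarity(u, v):
    """Cosine similarity of two fingerprints (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)
```

A matrix of such fingerprints, one row per genome, is the kind of input a hierarchical clustering routine (for example, `scipy.cluster.hierarchy.linkage` with `metric="cosine"`) could group into pathogen-like and non-pathogen-like clusters.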
6. Karaoz U, Brodie EL. microTrait: A Toolset for a Trait-Based Representation of Microbial Genomes. Front Bioinform 2022; 2:918853. [PMID: 36304272 PMCID: PMC9580909 DOI: 10.3389/fbinf.2022.918853] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/12/2022] [Accepted: 06/20/2022] [Indexed: 11/29/2023]
Abstract
Remote sensing approaches have revolutionized the study of macroorganisms, allowing theories of population and community ecology to be tested across increasingly large scales without much compromise in resolution of biological complexity. In microbial ecology, our remote window into the ecology of microorganisms is through the lens of genome sequencing. For microbial organisms, recent evidence from genomes recovered from metagenomic samples corroborates a highly complex view of their metabolic diversity and other associated traits, which map onto high physiological complexity. Regardless, during the first decades of this omics era, microbial ecological research has primarily focused on taxa and functional genes as ecological units, favoring breadth of coverage over resolution of biological complexity manifested as physiological diversity. Recently, the rate at which provisional draft genomes are generated has increased substantially, giving new insights into ecological processes and interactions. From a genotype perspective, the wide availability of genome-centric data requires new data synthesis approaches that place organismal genomes center stage in the study of environmental roles and functional performance. Extraction of ecologically relevant traits from microbial genomes will be essential to the future of microbial ecological research. Here, we present microTrait, a computational pipeline that infers and distills ecologically relevant traits from microbial genome sequences. microTrait maps a genome sequence into a trait space, including discrete and continuous traits, as well as simple and composite traits. Traits are inferred from genes and pathways representing energetic, resource acquisition, and stress tolerance mechanisms, while genome-wide signatures are used to infer composite, or life history, traits of microorganisms. This approach is extensible to any microbial habitat, although we provide initial examples of this approach with reference to soil microbiomes.
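The abstract says traits are inferred from the genes and pathways present in a genome. A common way to encode such knowledge is a boolean rule over gene presence. The sketch below assumes that representation; the rule format, trait names and gene names are hypothetical and do not reproduce microTrait's own rule definitions.

```python
def evaluate(rule, genes):
    """Evaluate a gene-presence rule against a set of detected genes.

    A rule is either a gene name (str) or a tuple (op, subrules) where op is
    "and" or "or".
    """
    if isinstance(rule, str):
        return rule in genes
    op, parts = rule
    results = [evaluate(p, genes) for p in parts]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical rule set: trait and gene names are placeholders.
TRAIT_RULES = {
    "trait_X": ("and", ["geneA", ("or", ["geneB", "geneC"])]),
    "trait_Y": "geneD",
}


def infer_traits(genes, rules=TRAIT_RULES):
    """Return the set of traits whose rule is satisfied by `genes`."""
    return {trait for trait, rule in rules.items() if evaluate(rule, genes)}
```

Nesting "and"/"or" lets one rule require a core gene plus any one of several alternative subunits. This is a typical pattern for pathways with isozymes.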
Affiliation(s)
- Ulas Karaoz
- Earth and Environmental Sciences, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
- Eoin L Brodie
- Earth and Environmental Sciences, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
- Department of Environmental Science, Policy and Management, University of California, Berkeley, CA, United States
7. Lu Y, Li Q, Li T. PPA-GCN: A Efficient GCN Framework for Prokaryotic Pathways Assignment. Front Genet 2022; 13:839453. [PMID: 35444686 PMCID: PMC9013948 DOI: 10.3389/fgene.2022.839453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Accepted: 03/17/2022] [Indexed: 11/17/2022]
Abstract
With the rapid development of sequencing technology, the number of completed microbial genomes has grown explosively. For a newly sequenced prokaryotic genome, gene functional annotation and metabolic pathway assignment are important foundations for all subsequent research. However, the overall rate at which genes are assigned to metabolic pathways is below 48%, and it is even lower for newly sequenced prokaryotic genomes, which has become a bottleneck for subsequent research. A high-precision metabolic pathway assignment framework is therefore urgently needed. Here, we developed PPA-GCN, a prokaryotic pathway assignment framework based on a graph convolutional network, to assist functional pathway assignment using KEGG information and genomic characteristics. In the framework, genomic gene synteny information was used to construct a network, and ideas from self-supervised learning were used to enhance the framework's learning ability. The framework is applicable to microbial genera with sufficient whole-genome sequences. To evaluate the assignment rate, genomes from three genera were used: Flavobacterium (65 genomes), Pseudomonas (100 genomes) and Staphylococcus (500 genomes). The initial functional pathway assignment rates of the three test genera were 27.7% (Flavobacterium), 49.5% (Pseudomonas) and 30.1% (Staphylococcus). PPA-GCN achieved assignment rates of 84.8% (Flavobacterium), 77.0% (Pseudomonas) and 71.0% (Staphylococcus). PPA-GCN was also shown to have strong fault tolerance. The framework provides new insights into metabolic pathway assignment, is likely to inform future deep learning applications for interpreting functional annotations, and extends to all prokaryotic genera with sufficient genomes.
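The abstract says gene synteny was used to build the network but does not give the construction. As a minimal sketch under stated assumptions, the code below treats each genome as an ordered list of hypothetical gene-family IDs on a linear chromosome. It links families that occur within `window` positions of each other and weights each edge by how often that happens across genomes. A weighted graph of this kind could serve as the adjacency input to a graph convolutional layer, but PPA-GCN's actual construction may differ.

```python
from collections import Counter


def synteny_edges(genomes, window=1):
    """Count co-localization of gene families across genomes.

    `genomes` is a list of genomes, each an ordered list of gene-family IDs
    along a (linear) chromosome. Two families are linked when they occur
    within `window` positions of each other; the edge weight is the number
    of such co-occurrences across all genomes. Circular wrap-around is ignored.
    """
    edges = Counter()
    for order in genomes:
        for i, a in enumerate(order):
            for b in order[i + 1 : i + 1 + window]:
                if a != b:
                    edges[tuple(sorted((a, b)))] += 1
    return edges
```

Edges that recur across many genomes of a genus indicate conserved neighborhoods. These are the kind of context a graph model could use to propagate pathway labels from annotated genes to their unannotated neighbors.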
Affiliation(s)
- Yuntao Lu
- Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
- Qi Li
- Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Tao Li
- Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
8. Hu EZ, Lan XR, Liu ZL, Gao J, Niu DK. A positive correlation between GC content and growth temperature in prokaryotes. BMC Genomics 2022; 23:110. [PMID: 35139824 PMCID: PMC8827189 DOI: 10.1186/s12864-022-08353-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 08/03/2021] [Accepted: 01/31/2022] [Indexed: 01/27/2023]
Abstract
Background: GC pairs are generally more stable than AT pairs; GC-rich genomes were proposed to be more adapted to high temperatures than AT-rich genomes. Previous studies consistently showed positive correlations between growth temperature and the GC contents of structural RNA genes. However, for whole-genome sequences and the silent sites of codons in protein-coding genes, the relationship between GC content and growth temperature has long been debated. Results: With a dataset much larger than previous studies (681 bacteria and 155 archaea with completely assembled genomes), our phylogenetic comparative analyses showed positive correlations between optimal growth temperature (Topt) and GC content both in bacterial and archaeal structural RNA genes and in bacterial whole genome sequences, chromosomal sequences, plasmid sequences, core genes, and accessory genes. However, in the 155 archaea, we did not observe a significant positive correlation of Topt with whole-genome GC content (GCw) or GC content at four-fold degenerate sites. We randomly drew 155 samples from the 681 bacteria for 1000 rounds. In most cases (> 95%), the positive correlations between Topt and genomic GC contents became statistically nonsignificant (P > 0.05). This result suggested that the small sample sizes might account for the lack of positive correlations between growth temperature and genomic GC content in the 155 archaea and the bacterial samples of previous studies. Comparing the GC content among four categories (psychrophiles/psychrotrophiles, mesophiles, thermophiles, and hyperthermophiles) also revealed a positive correlation between GCw and growth temperature in bacteria. By including the GCw of incompletely assembled genomes, we expanded the sample size of archaea to 303. Positive correlations between GCw and Topt appear, especially after excluding the halophilic archaea, whose GC contents might be strongly shaped by intense UV radiation.
Conclusions: This study explains the previous contradictory observations and ends a long debate. Prokaryotes growing in high temperatures have higher GC contents. Thermal adaptation is one possible explanation for the positive association. Meanwhile, we propose that the elevated efficiency of DNA repair in response to heat mutagenesis might have the by-product of increasing GC content, as happens in intracellular symbionts and marine bacterioplankton. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08353-7.
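The subsampling test in this abstract draws 155 of the 681 bacteria in each of 1000 rounds and recomputes the correlation. The sketch below shows the mechanics with whole-genome GC content and an ordinary Pearson r. This is a simplification: the study used phylogenetic comparative analyses and tested significance, and this code does neither. Mapping onto the abstract's setup would mean `k=155, rounds=1000`; the toy inputs in the test are made up.

```python
import math
import random


def gc_content(seq):
    """Fraction of G+C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def pearson_r(x, y):
    """Ordinary Pearson correlation (ignores phylogenetic non-independence)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def subsample_r(topt, gc, k, rounds=1000, seed=0):
    """Correlation in `rounds` random subsamples of size k, drawn without replacement."""
    rng = random.Random(seed)
    out = []
    for _ in range(rounds):
        idx = rng.sample(range(len(topt)), k)
        out.append(pearson_r([topt[i] for i in idx], [gc[i] for i in idx]))
    return out
```

The spread of the returned correlations shows how much a smaller sample can obscure a trend that is clear in the full dataset. That is the point the authors make about the 155 archaea.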
Affiliation(s)
- En-Ze Hu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Xin-Ran Lan
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Zhi-Ling Liu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Jie Gao
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Deng-Ke Niu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
9. Ben Khedher M, Ghedira K, Rolain JM, Ruimy R, Croce O. Application and Challenge of 3rd Generation Sequencing for Clinical Bacterial Studies. Int J Mol Sci 2022; 23:1395. [PMID: 35163319 PMCID: PMC8835973 DOI: 10.3390/ijms23031395] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 12/08/2021] [Revised: 01/20/2022] [Accepted: 01/24/2022] [Indexed: 02/04/2023]
Abstract
Over the past 25 years, the powerful combination of genome sequencing and bioinformatics analysis has played a crucial role in interpreting information encoded in bacterial genomes. High-throughput sequencing technologies have paved the way towards understanding an increasingly wide range of biological questions. This revolution has enabled advances in areas ranging from genome composition to how proteins interact with nucleic acids. This has created unprecedented opportunities through the integration of genomic data into clinics for the diagnosis of genetic traits associated with disease. Since then, these technologies have continued to evolve, and recently, long-read sequencing has overcome previous limitations in terms of accuracy, thus expanding its applications in genomics, transcriptomics and metagenomics. In this review, we describe a brief history of the bacterial genome sequencing revolution and its application in public health and molecular epidemiology. We present a chronology that encompasses the various technological developments: whole-genome shotgun sequencing, high-throughput sequencing, long-read sequencing. We mainly discuss the application of next-generation sequencing to decipher bacterial genomes. Secondly, we highlight how long-read sequencing technologies go beyond the limitations of traditional short-read sequencing. We intend to provide a description of the guiding principles of the 3rd generation sequencing applications and ongoing improvements in the field of microbial medical research.
Affiliation(s)
- Mariem Ben Khedher
- Bacteriology Laboratory, Archet 2 Hospital, CHU Nice, 06000 Nice, France
- Institute for Research on Cancer and Aging Nice (IRCAN), CNRS, INSERM, Université Côte d’Azur, 06108 Nice, France
- Kais Ghedira
- Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institute Pasteur of Tunis, Tunis 1002, Tunisia
- Jean-Marc Rolain
- IRD, APHM, MEPHI, IHU-Méditerranée Infection, Aix Marseille Université, 13005 Marseille, France
- Raymond Ruimy
- Bacteriology Laboratory, Archet 2 Hospital, CHU Nice, 06000 Nice, France
- Centre Méditerranéen de Médecine Moléculaire (C3M), INSERM, Université Côte D’Azur, 06108 Nice, France
- Olivier Croce
- Institute for Research on Cancer and Aging Nice (IRCAN), CNRS, INSERM, Université Côte d’Azur, 06108 Nice, France
10. Barnett SE, Youngblut ND, Koechli CN, Buckley DH. Multisubstrate DNA stable isotope probing reveals guild structure of bacteria that mediate soil carbon cycling. Proc Natl Acad Sci U S A 2021; 118:e2115292118. [PMID: 34799453 PMCID: PMC8617410 DOI: 10.1073/pnas.2115292118] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 08/20/2021] [Accepted: 10/10/2021] [Indexed: 11/18/2022]
Abstract
Soil microorganisms determine the fate of soil organic matter (SOM), and their activities compose a major component of the global carbon (C) cycle. We employed a multisubstrate, DNA-stable isotope probing experiment to track bacterial assimilation of C derived from distinct sources that varied in bioavailability. This approach allowed us to measure microbial contributions to SOM processing by measuring the C assimilation dynamics of diverse microorganisms as they interacted within soil. We identified and tracked 1,286 bacterial taxa that assimilated 13C in an agricultural soil over a period of 48 d. Overall 13C-assimilation dynamics of bacterial taxa, defined by the source and timing of the 13C they assimilated, exhibited low phylogenetic conservation. We identified bacterial guilds composed of taxa that had similar 13C assimilation dynamics. We show that C-source bioavailability explained significant variation in both C mineralization dynamics and guild structure, and that the growth dynamics of bacterial guilds differed significantly in response to C addition. We also demonstrate that the guild structure explains significant variation in the biogeographical distribution of bacteria at continental and global scales. These results suggest that an understanding of in situ growth dynamics is essential for understanding microbial contributions to soil C cycling. We interpret these findings in the context of bacterial life history strategies and their relationship to terrestrial C cycling.
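The abstract defines guilds as taxa with similar ¹³C assimilation dynamics, characterized by the source and timing of the ¹³C they assimilated. It does not describe the grouping method. As a hypothetical sketch, the code below represents each taxon by the set of (substrate, day) events at which it was ¹³C-labeled, scores pairs by Jaccard similarity, and forms guilds by single-linkage grouping above a threshold. Substrate names, days and the 0.5 threshold are illustrative assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def guilds(profiles, min_similarity=0.5):
    """Group taxa by single-linkage on Jaccard similarity of 13C-labeling events.

    `profiles` maps taxon -> set of (substrate, day) pairs at which the taxon
    was detected as 13C-labeled. Taxa end up in the same guild when a chain of
    pairwise similarities >= min_similarity connects them.
    """
    parent = {t: t for t in profiles}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= min_similarity:
            parent[find(a)] = find(b)

    groups = {}
    for t in profiles:
        groups.setdefault(find(t), set()).add(t)
    return sorted(groups.values(), key=sorted)
```

Single linkage is the simplest choice and can chain dissimilar taxa together through intermediates. An average-linkage clustering would be a more conservative alternative.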
Affiliation(s)
- Samuel E Barnett
- School of Integrative Plant Science, Cornell University, Ithaca, NY 14853
- Nicholas D Youngblut
- School of Integrative Plant Science, Cornell University, Ithaca, NY 14853
- Department of Microbiome Science, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Chantal N Koechli
- School of Integrative Plant Science, Cornell University, Ithaca, NY 14853
- Department of Biological Sciences, University of the Sciences, Philadelphia, PA 19104
- Daniel H Buckley
- School of Integrative Plant Science, Cornell University, Ithaca, NY 14853
11. Weissman JL, Dogra S, Javadi K, Bolten S, Flint R, Davati C, Beattie J, Dixit K, Peesay T, Awan S, Thielen P, Breitwieser F, Johnson PLF, Karig D, Fagan WF, Bewick S. Exploring the functional composition of the human microbiome using a hand-curated microbial trait database. BMC Bioinformatics 2021; 22:306. [PMID: 34098872 PMCID: PMC8186035 DOI: 10.1186/s12859-021-04216-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/03/2019] [Accepted: 05/25/2021] [Indexed: 12/15/2022]
Abstract
BACKGROUND: Even when microbial communities vary wildly in their taxonomic composition, their functional composition is often surprisingly stable. This suggests that a functional perspective could provide much deeper insight into the principles governing microbiome assembly. Much work to date analyzing the functional composition of microbial communities, however, relies heavily on inference from genomic features. Unfortunately, output from these methods can be hard to interpret and often suffers from relatively high error rates. RESULTS: We built and analyzed a domain-specific microbial trait database from known microbe-trait pairs recorded in the literature to better understand the functional composition of the human microbiome. Using a combination of phylogenetically conscious machine learning tools and a network science approach, we were able to link particular traits to areas of the human body, discover traits that determine the range of body areas a microbe can inhabit, and uncover drivers of metabolic breadth. CONCLUSIONS: Domain-specific trait databases are an effective compromise between noisy methods to infer complex traits from genomic data and exhaustive, expensive attempts at database curation from the literature that do not focus on any one subset of taxa. They provide an accurate account of microbial traits and, by limiting the number of taxa considered, are feasible to build within a reasonable time-frame. We present a database specific to the human microbiome, in the hope that this will prove useful for research into the functional composition of human-associated microbial communities.
Affiliation(s)
- J L Weissman
- Department of Biology, University of Maryland - College Park, College Park, MD, USA
- Sonia Dogra
- Department of Biology, University of Maryland - College Park, College Park, MD, USA
- Keyan Javadi
- Department of Biology, University of Maryland - College Park, College Park, MD, USA
- Samantha Bolten
- Department of Biology, University of Maryland - College Park, College Park, MD, USA
- Rachel Flint
- Department of Biology, University of Maryland - College Park, College Park, MD, USA
- Cyrus Davati
- Department of Biology, University of Maryland - College Park, College Park, MD, USA
- Jess Beattie
- Department of Biology, University of Maryland - College Park, College Park, MD, USA
- Keshav Dixit
- Department of Biology, University of Maryland - College Park, College Park, MD, USA
- Tejasvi Peesay
- Department of Biology, University of Maryland - College Park, College Park, MD, USA
- Shehar Awan
- Department of Biology, University of Maryland - College Park, College Park, MD, USA
- Peter Thielen
- Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, MD, USA
- Florian Breitwieser
- Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, MD, USA
- Philip L F Johnson
- Department of Biology, University of Maryland - College Park, College Park, MD, USA
- David Karig
- Bioengineering Department, Clemson University, Clemson, SC, USA
- William F Fagan
- Department of Biology, University of Maryland - College Park, College Park, MD, USA
- Sharon Bewick
- Biological Sciences Department, Clemson University, Clemson, SC, USA
| |
Collapse
|
12
|
Zhang GY, Chen XW, Zhou YR, Wang CD, Huang D, He XY. Kernelized multi-view subspace clustering via auto-weighted graph learning. Appl Intell 2021. [DOI: 10.1007/s10489-021-02365-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/30/2022]

13
Zimmermann J, Kaleta C, Waschina S. gapseq: informed prediction of bacterial metabolic pathways and reconstruction of accurate metabolic models. Genome Biol 2021; 22:81. [PMID: 33691770 PMCID: PMC7949252 DOI: 10.1186/s13059-021-02295-1] [Citation(s) in RCA: 86] [Impact Index Per Article: 28.7] [Received: 05/27/2020] [Accepted: 02/10/2021] [Indexed: 12/21/2022]
Abstract
Genome-scale metabolic models of microorganisms are powerful frameworks to predict phenotypes from an organism's genotype. While manual reconstructions are laborious, automated reconstructions often fail to recapitulate known metabolic processes. Here we present gapseq (https://github.com/jotech/gapseq), a new tool to predict metabolic pathways and automatically reconstruct microbial metabolic models using a curated reaction database and a novel gap-filling algorithm. On the basis of scientific literature and experimental data for 14,931 bacterial phenotypes, we demonstrate that gapseq outperforms state-of-the-art tools in predicting enzyme activity, carbon source utilisation, fermentation products, and metabolic interactions within microbial communities.
Affiliation(s)
- Johannes Zimmermann
- Christian-Albrechts-University Kiel, Institute of Experimental Medicine, Research Group Medical Systems Biology, Michaelis-Str. 5, Kiel, 24105 Germany
- Christoph Kaleta
- Christian-Albrechts-University Kiel, Institute of Experimental Medicine, Research Group Medical Systems Biology, Michaelis-Str. 5, Kiel, 24105 Germany
- Silvio Waschina
- Christian-Albrechts-University Kiel, Institute of Experimental Medicine, Research Group Medical Systems Biology, Michaelis-Str. 5, Kiel, 24105 Germany
- Christian-Albrechts-University Kiel, Institute of Human Nutrition and Food Science, Nutriinformatics, Heinrich-Hecht-Platz 10, Kiel, 24118 Germany

14
Discovering microbe-disease associations from the literature using a hierarchical long short-term memory network and an ensemble parser model. Sci Rep 2021; 11:4490. [PMID: 33627732 PMCID: PMC7904816 DOI: 10.1038/s41598-021-83966-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/09/2020] [Accepted: 02/08/2021] [Indexed: 02/07/2023]
Abstract
With recent advances in biotechnology and sequencing technology, the microbial community has been intensively studied and discovered to be associated with many chronic as well as acute diseases. Even though a tremendous number of studies describing the association between microbes and diseases have been published, text mining methods that focus on such associations have rarely been studied. We propose a framework that combines machine learning and natural language processing methods to analyze the association between microbes and diseases. A hierarchical long short-term memory network was used to detect sentences that describe the association. For the detected sentences, two different parse tree-based search methods were combined to find the relation-describing word. The ensemble model of constituency parsing for structural pattern matching and dependency-based relation extraction improved the prediction accuracy. By combining deep learning and parse tree-based extractions, our proposed framework could extract the microbe-disease association with higher accuracy. The evaluation results showed that our system achieved an F-score of 0.8764 and 0.8524 in binary decisions and extracting relation words, respectively. As a case study, we performed a large-scale analysis of the association between microbes and diseases. Additionally, a set of common microbes shared by multiple diseases was identified. This study could provide valuable information on the major microbes studied in connection with specific diseases. The code and data are available at https://github.com/DMnBI/mdi_predictor .

15
Cauchy loss induced block diagonal representation for robust multi-view subspace clustering. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.11.017] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]

16
Abstract
A synthesis of phenotypic and quantitative genomic traits is provided for bacteria and archaea, in the form of a scripted, reproducible workflow that standardizes and merges 26 sources. The resulting unified dataset covers 14 phenotypic traits, 5 quantitative genomic traits, and 4 environmental characteristics for approximately 170,000 strain-level and 15,000 species-aggregated records. It spans all habitat types, including soils, marine and fresh waters and sediments, host-associated environments, and thermal environments. Trait data can find use in clarifying major dimensions of ecological strategy variation across species. They can also be used in conjunction with species and abundance sampling to characterize trait mixtures in communities and responses of traits along environmental gradients.

17
Hornischer K, Khaledi A, Pohl S, Schniederjans M, Pezoldt L, Casilag F, Muthukumarasamy U, Bruchmann S, Thöming J, Kordes A, Häussler S. BACTOME-a reference database to explore the sequence- and gene expression-variation landscape of Pseudomonas aeruginosa clinical isolates. Nucleic Acids Res 2020; 47:D716-D720. [PMID: 30272193 PMCID: PMC6324029 DOI: 10.1093/nar/gky895] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 08/14/2018] [Accepted: 09/21/2018] [Indexed: 12/26/2022]
Abstract
Extensive use of next-generation sequencing (NGS) for pathogen profiling has the potential to transform our understanding of how genomic plasticity contributes to phenotypic versatility. However, the storage of large amounts of NGS data and visualization tools need to evolve to offer the scientific community fast and convenient access to these data. We introduce BACTOME as a database system that links aligned DNA- and RNA-sequencing reads of clinical Pseudomonas aeruginosa isolates with clinically relevant pathogen phenotypes. The database allows data extraction for any single isolate, gene or phenotype as well as data filtering and phenotypic grouping for specific research questions. With the integration of statistical tools we illustrate the usefulness of a relational database structure for the identification of phenotype-genotype correlations as an essential part of the discovery pipeline in genomic research. Furthermore, the database provides a compilation of DNA sequences and gene expression values of a plethora of clinical isolates to give a consensus DNA sequence and consensus gene expression signature. Deviations from the consensus thereby describe the genomic landscape and the transcriptional plasticity of the species P. aeruginosa. The database is available at https://bactome.helmholtz-hzi.de.
Affiliation(s)
- Klaus Hornischer
- Institute of Molecular Bacteriology, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
- Institute of Molecular Bacteriology, TWINCORE GmbH, Center for Clinical and Experimental Infection Research, D-30625 Hannover, Germany
- Molecular Health GmbH, D-69115 Heidelberg, Germany
- Ariane Khaledi
- Institute of Molecular Bacteriology, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
- Institute of Molecular Bacteriology, TWINCORE GmbH, Center for Clinical and Experimental Infection Research, D-30625 Hannover, Germany
- Sarah Pohl
- Institute of Molecular Bacteriology, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
- Institute of Molecular Bacteriology, TWINCORE GmbH, Center for Clinical and Experimental Infection Research, D-30625 Hannover, Germany
- Monika Schniederjans
- Institute of Molecular Bacteriology, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
- Institute of Molecular Bacteriology, TWINCORE GmbH, Center for Clinical and Experimental Infection Research, D-30625 Hannover, Germany
- Lorena Pezoldt
- Institute of Molecular Bacteriology, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
- Institute of Molecular Bacteriology, TWINCORE GmbH, Center for Clinical and Experimental Infection Research, D-30625 Hannover, Germany
- Fiordiligie Casilag
- Institute of Molecular Bacteriology, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
- Institute of Molecular Bacteriology, TWINCORE GmbH, Center for Clinical and Experimental Infection Research, D-30625 Hannover, Germany
- Uthayakumar Muthukumarasamy
- Institute of Molecular Bacteriology, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
- Institute of Molecular Bacteriology, TWINCORE GmbH, Center for Clinical and Experimental Infection Research, D-30625 Hannover, Germany
- Sebastian Bruchmann
- Institute of Molecular Bacteriology, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
- Institute of Molecular Bacteriology, TWINCORE GmbH, Center for Clinical and Experimental Infection Research, D-30625 Hannover, Germany
- Pathogen Genomics, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
- Janne Thöming
- Institute of Molecular Bacteriology, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
- Institute of Molecular Bacteriology, TWINCORE GmbH, Center for Clinical and Experimental Infection Research, D-30625 Hannover, Germany
- Adrian Kordes
- Institute of Molecular Bacteriology, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
- Institute of Molecular Bacteriology, TWINCORE GmbH, Center for Clinical and Experimental Infection Research, D-30625 Hannover, Germany
- Susanne Häussler
- Institute of Molecular Bacteriology, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
- Institute of Molecular Bacteriology, TWINCORE GmbH, Center for Clinical and Experimental Infection Research, D-30625 Hannover, Germany

18
San JE, Baichoo S, Kanzi A, Moosa Y, Lessells R, Fonseca V, Mogaka J, Power R, de Oliveira T. Current Affairs of Microbial Genome-Wide Association Studies: Approaches, Bottlenecks and Analytical Pitfalls. Front Microbiol 2020; 10:3119. [PMID: 32082269 PMCID: PMC7002396 DOI: 10.3389/fmicb.2019.03119] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 10/03/2019] [Accepted: 12/24/2019] [Indexed: 12/12/2022]
Abstract
Microbial genome-wide association studies (mGWAS) are a new and exciting research field that is adapting human GWAS methods to understand how variations in microbial genomes affect host or pathogen phenotypes, such as drug resistance, virulence, host specificity and prognosis. Several computational tools and methods have been developed or adapted from human GWAS to facilitate the discovery of novel mutations and structural variations that are associated with the phenotypes of interest. However, no comprehensive, end-to-end, user-friendly tool is currently available. The development of a broadly applicable pipeline presents a real opportunity for computational biologists. Here, we (i) review the prominent and promising tools, (ii) discuss analytical pitfalls and bottlenecks in mGWAS, (iii) provide insights into the selection of appropriate tools, and (iv) highlight the gaps that still need to be filled and how users and developers can work together to overcome these bottlenecks. mGWAS research can inform drug repositioning decisions as well as accelerate the discovery and development of more effective vaccines and antimicrobials for pressing infectious diseases of global health significance, such as HIV, TB, influenza, and malaria.
Affiliation(s)
- James Emmanuel San
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Shakuntala Baichoo
- Department of Digital Technologies, FoICDT, University of Mauritius, Réduit, Mauritius
- Aquillah Kanzi
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Yumna Moosa
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Richard Lessells
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Vagner Fonseca
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Laboratório de Genética Celular e Molecular, ICB, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- John Mogaka
- Discipline of Public Health, University of Kwazulu-Natal, Durban, South Africa
- Robert Power
- St Edmund Hall, Oxford University, Oxford, United Kingdom
- Tulio de Oliveira
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Department of Global Health, University of Washington, Seattle, WA, United States

19
Patterns of diverse gene functions in genomic neighborhoods predict gene function and phenotype. Sci Rep 2019; 9:19537. [PMID: 31863070 PMCID: PMC6925100 DOI: 10.1038/s41598-019-55984-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/26/2019] [Accepted: 12/02/2019] [Indexed: 01/01/2023]
Abstract
Genes with similar roles in the cell tend to cluster on chromosomes, thus benefiting from coordinated regulation. This allows gene function to be inferred by transferring annotations from genomic neighbors, following the guilt-by-association principle. We performed a systematic search for co-occurrence of >1000 gene functions in genomic neighborhoods across 1669 prokaryotic, 49 fungal and 80 metazoan genomes, revealing prevalent patterns that cannot be explained by clustering of functionally similar genes. Pairs of dissimilar gene functions (corresponding to semantically distant Gene Ontology terms) are very commonly co-located on chromosomes at statistically significant levels. These neighborhood associations are often as conserved across genomes as the known associations between similar functions, suggesting selective benefits from clustering of certain diverse functions, which may conceivably play complementary roles in the cell. We propose a simple encoding of chromosomal gene order, the neighborhood function profiles (NFP), which draws on diverse gene clustering patterns to predict gene function and phenotype. NFPs yield a 26–46% increase in predictive power over state-of-the-art approaches that propagate function across neighborhoods, thus providing hundreds of novel, high-confidence gene function inferences per genome. Furthermore, we demonstrate that copy number-neutral structural variation that shapes gene function distribution across chromosomes can predict the phenotype of individuals from their genome sequence.

20
Weissman JL, Fagan WF, Johnson PLF. Linking high GC content to the repair of double strand breaks in prokaryotic genomes. PLoS Genet 2019; 15:e1008493. [PMID: 31703064 PMCID: PMC6867656 DOI: 10.1371/journal.pgen.1008493] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 08/16/2019] [Revised: 11/20/2019] [Accepted: 10/25/2019] [Indexed: 01/21/2023]
Abstract
Genomic GC content varies widely among microbes for reasons unknown. While mutation bias partially explains this variation, prokaryotes near-universally have a higher GC content than predicted solely by this bias. Debate surrounds the relative importance of the remaining explanations of selection versus biased gene conversion favoring GC alleles. Some environments (e.g. soils) are associated with a high genomic GC content of their inhabitants, which implies that either high GC content is a selective adaptation to particular habitats, or that certain habitats favor increased rates of gene conversion. Here, we report a novel association between the presence of the non-homologous end joining DNA double-strand break repair pathway and GC content; this observation suggests that DNA damage may be a fundamental driver of GC content, leading in part to the many environmental patterns observed to date. We discuss potential mechanisms accounting for the observed association, and provide preliminary evidence that sites experiencing higher rates of double-strand breaks are under selection for increased GC content relative to the genomic background.
The overall nucleotide composition of an organism’s genome varies greatly between species. Previous work has identified certain environmental factors (e.g., oxygen availability) associated with the relative number of GC bases as opposed to AT bases in the genomes of species. Many of these environments that are associated with high GC content are also associated with relatively high rates of DNA damage. We show that organisms possessing the non-homologous end-joining DNA repair pathway, which is one mechanism to repair DNA double-strand breaks, have an elevated GC content relative to expectation. We also show that certain sites on the genome that are particularly susceptible to double-strand breaks have an elevated GC content. This leads us to suggest that an important underlying driver of variability in nucleotide composition across environments is the rate of DNA damage (specifically double-strand breaks) to which an organism living in each environment is exposed.
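GC content, the quantity at the center of this entry, is simply the fraction of G and C bases in a sequence, and "elevated GC content relative to the genomic background" can be read as a window's GC fraction minus the genome-wide fraction. The following is a minimal Python sketch of those two quantities only, not the authors' pipeline or statistical test; the helper names `gc_content` and `window_gc_excess`, the toy sequence, and the choice to ignore ambiguous bases such as N are all illustrative assumptions.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among the unambiguous bases (A, C, G, T) of seq.

    Ambiguous symbols such as N are ignored (an illustrative assumption).
    """
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (s.count("G") + s.count("C")) / acgt


def window_gc_excess(genome: str, start: int, end: int) -> float:
    """GC content of genome[start:end] minus the genome-wide GC content.

    A positive value means the window is GC-richer than the genomic background.
    """
    return gc_content(genome[start:end]) - gc_content(genome)


if __name__ == "__main__":
    toy_genome = "ATATATATGCGCGCGC"  # hypothetical toy sequence
    print(gc_content(toy_genome))                # 0.5
    print(window_gc_excess(toy_genome, 8, 16))  # 0.5
```

A real analysis would compare many windows (for example, sites prone to double-strand breaks) against an appropriate null model rather than a single difference.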
Affiliation(s)
- JL Weissman
- Department of Biology, University of Maryland - College Park, College Park, Maryland, United States of America
- William F. Fagan
- Department of Biology, University of Maryland - College Park, College Park, Maryland, United States of America
- Philip L. F. Johnson
- Department of Biology, University of Maryland - College Park, College Park, Maryland, United States of America

21
Bewick S, Gurarie E, Weissman JL, Beattie J, Davati C, Flint R, Thielen P, Breitwieser F, Karig D, Fagan WF. Trait-based analysis of the human skin microbiome. Microbiome 2019; 7:101. [PMID: 31277701 PMCID: PMC6612184 DOI: 10.1186/s40168-019-0698-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 07/20/2018] [Accepted: 05/19/2019] [Indexed: 05/04/2023]
Abstract
BACKGROUND The past decade of microbiome research has concentrated on cataloging the diversity of taxa in different environments. The next decade is poised to focus on microbial traits and function. Most existing methods for doing this perform pathway analysis using reference databases. This has both benefits and drawbacks. Function can go undetected if reference databases are coarse-grained or incomplete. Likewise, detection of a pathway does not guarantee expression of the associated function. Finally, function cannot be connected to specific microbial constituents, making it difficult to ascertain the types of organisms exhibiting particular traits, something that is important for understanding microbial success in specific environments. A complementary approach to pathway analysis is to use the wealth of microbial trait information collected over years of lab-based, culture experiments. METHODS Here, we use journal articles and Bergey's Manual of Systematic Bacteriology to develop a trait-based database for 971 human skin bacterial taxa. We then use this database to examine functional traits that are over/underrepresented among skin taxa. Specifically, we focus on three trait classes (binary, categorical, and quantitative) and compare trait values among skin taxa and microbial taxa more broadly. We compare binary traits using a Chi-square test, categorical traits using randomization trials, and quantitative traits using a nonparametric relative effects test based on global rankings using Tukey contrasts. RESULTS We find a number of traits that are over/underrepresented within the human skin microbiome. For example, spore formation, acid phosphatase, alkaline phosphatase, pigment production, catalase, and oxidase are all less common among skin taxa. As well, skin bacteria are less likely to be aerobic, favoring, instead, a facultative strategy.
They are also less likely to exhibit gliding motility, less likely to be spirillum or rod-shaped, and less likely to grow in chains. Finally, skin bacteria have more difficulty at high pH, prefer warmer temperatures, and are much less resilient to hypotonic conditions. CONCLUSIONS Our analysis shows how an approach that relies on information from culture experiments can both support findings from pathway analysis, and also generate new insights into the structuring principles of microbial communities.
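The Chi-square comparison of binary traits mentioned in the METHODS can be pictured as a 2×2 contingency test: one row for skin taxa and one for the broader taxon set, with columns for trait present and trait absent. The sketch below is a minimal standard-library Python version, assuming Pearson's statistic without Yates' continuity correction; the function name `chi_square_2x2` and the example counts are hypothetical, and the study may have used a different implementation or correction.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],   # group 1 (e.g. skin taxa): trait present, trait absent
         [c, d]]   # group 2 (e.g. other taxa): trait present, trait absent

    Returns (statistic, p_value) using 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be positive")
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of group and trait.
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: 20 of 100 skin taxa vs 40 of 100 other taxa form spores.
    stat, p = chi_square_2x2(20, 80, 40, 60)
    print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

With these made-up counts the statistic is about 9.52 and the p-value falls below 0.01, which is the kind of result that would flag a trait as underrepresented among skin taxa.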
Affiliation(s)
- Sharon Bewick
- Department of Biological Sciences, Clemson University, Clemson, SC 29631 USA
- Eliezer Gurarie
- Department of Biology, University of Maryland, College Park, MD 20742 USA
- JL Weissman
- Department of Biology, University of Maryland, College Park, MD 20742 USA
- Jess Beattie
- Department of Biology, University of Maryland, College Park, MD 20742 USA
- Cyrus Davati
- Department of Biology, University of Maryland, College Park, MD 20742 USA
- Rachel Flint
- Department of Biology, University of Maryland, College Park, MD 20742 USA
- Peter Thielen
- Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, MD 20723 USA
- Florian Breitwieser
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD 21205 USA
- David Karig
- Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, MD 20723 USA
- Department of Bioengineering, Clemson University, Clemson, SC 29631 USA
- William F. Fagan
- Department of Biology, University of Maryland, College Park, MD 20742 USA

22
Weissman JL, Laljani RMR, Fagan WF, Johnson PLF. Visualization and prediction of CRISPR incidence in microbial trait-space to identify drivers of antiviral immune strategy. ISME J 2019; 13:2589-2602. [PMID: 31239539 PMCID: PMC6776019 DOI: 10.1038/s41396-019-0411-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 10/04/2018] [Revised: 03/15/2019] [Accepted: 03/24/2019] [Indexed: 01/21/2023]
Abstract
Bacteria and archaea are locked in a near-constant battle with their viral pathogens. Despite previous mechanistic characterization of numerous prokaryotic defense strategies, the underlying ecological drivers of different strategies remain largely unknown, and predicting which species will adopt which strategy remains a challenge. Here, we focus on the CRISPR immune strategy and develop a phylogenetically corrected machine learning approach to build a predictive model of CRISPR incidence using data on more than 100 traits across more than 2600 species. We discover a strong but hitherto unknown negative interaction between CRISPR and aerobicity, which we hypothesize may result from interference between CRISPR-associated proteins and non-homologous end-joining DNA repair due to oxidative stress. Our predictive model also quantitatively confirms previous observations of an association between CRISPR and temperature. Finally, we contrast the environmental associations of different CRISPR system types (I, II, III) and restriction modification systems, all of which act as intracellular immune systems.
Affiliation(s)
- Jake L Weissman
- Department of Biology, University of Maryland, College Park, MD, USA
- Rohan M R Laljani
- Department of Biology, University of Maryland, College Park, MD, USA
- William F Fagan
- Department of Biology, University of Maryland, College Park, MD, USA

23
Schmutzer M, Barraclough TG. The role of recombination, niche-specific gene pools and flexible genomes in the ecological speciation of bacteria. Ecol Evol 2019; 9:4544-4556. [PMID: 31031926 PMCID: PMC6476844 DOI: 10.1002/ece3.5052] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/18/2019] [Revised: 02/16/2019] [Accepted: 02/18/2019] [Indexed: 12/21/2022]
Abstract
Bacteria diversify into genetic clusters analogous to those observed in sexual eukaryotes, but the definition of bacterial species is an ongoing problem. Recent work has focused on adaptation to distinct ecological niches as the main driver of clustering, but there remains debate about the role of recombination in that process. One view is that homologous recombination occurs too rarely for gene flow to constrain divergent selection. Another view is that homologous recombination is frequent enough in many bacterial populations that barriers to gene flow are needed to permit divergence. Niche-specific gene pools have been proposed as a general mechanism to limit gene flow. We use theoretical models to evaluate additional hypotheses that evolving genetic architecture, specifically the effect sizes of genes and gene gain and loss, can limit gene flow between diverging populations. Our model predicts that (a) in the presence of gene flow and recombination, ecological divergence is concentrated in few loci of large effect and (b) high rates of gene flow plus recombination promote gene loss and favor the evolution of niche-specific genes. The results show that changing genetic architecture and gene loss can facilitate ecological divergence, even without niche-specific gene pools. We discuss these results in the context of recent studies of sympatric divergence in microbes.

24
Perz AI, Giles CB, Brown CA, Porter H, Roopnarinesingh X, Wren JD. MNEMONIC: MetageNomic Experiment Mining to create an OTU Network of Inhabitant Correlations. BMC Bioinformatics 2019; 20:96. [PMID: 30871469 PMCID: PMC6419333 DOI: 10.1186/s12859-019-2623-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/25/2022]
Abstract
Background The number of publicly available metagenomic experiments in various environments has been growing rapidly, increasing the potential to identify similar shifts in species abundance between different experiments. This could be a powerful way to interpret new experiments, by identifying common themes and causes behind changes in species abundance. Results We propose a novel framework for comparing microbial shifts between conditions. Using data from one of the largest human metagenome projects to date, the American Gut Project (AGP), we obtain differential abundance vectors for microbes using experimental condition information provided with the AGP metadata, such as patient age, dietary habits, or health status. We show it can be used to identify similar and opposing shifts in microbial species, and infer putative interactions between microbes. Our results show that groups of shifts with similar effects on the microbiome can be identified and that similar dietary interventions display similar microbial abundance shifts. Conclusions Without comparison to prior data, it is difficult for experimentalists to know if their observed changes in species abundance have been observed by others, both in their conditions and in others they would never consider comparable. Yet, this can be a very important contextual factor in interpreting the significance of a shift. We’ve proposed and tested an algorithmic solution to this problem, which also allows for comparing the metagenomic signature shifts between conditions in the existing body of data.
Affiliation(s)
- Aleksandra I Perz
- Arthritis and Clinical Immunology Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104-5005, USA
- Cory B Giles
- Arthritis and Clinical Immunology Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104-5005, USA
- Department of Geriatric Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
- Chase A Brown
- Arthritis and Clinical Immunology Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104-5005, USA
- Oklahoma Center for Neuroscience, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
- Hunter Porter
- Arthritis and Clinical Immunology Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104-5005, USA
- Oklahoma Center for Neuroscience, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
- Xiavan Roopnarinesingh
- Arthritis and Clinical Immunology Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104-5005, USA
- Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
- Jonathan D Wren
- Arthritis and Clinical Immunology Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, OK, 73104-5005, USA
- Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
- Oklahoma Center for Neuroscience, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
- Department of Geriatric Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA

25
Uncovering carbohydrate metabolism through a genotype-phenotype association study of 56 lactic acid bacteria genomes. Appl Microbiol Biotechnol 2019; 103:3135-3152. [PMID: 30830251 PMCID: PMC6447522 DOI: 10.1007/s00253-019-09701-6] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 10/30/2018] [Revised: 02/14/2019] [Accepted: 02/14/2019] [Indexed: 11/09/2022]
Abstract
Owing to their unique potential to ferment carbohydrates, both homo- and heterofermentative lactic acid bacteria (LAB) are widely used in the food industry. Deciphering the genetic basis that determines the LAB fermentation type, and hence carbohydrate utilization, is paramount to optimizing LAB industrial processes. Deep sequencing of 24 LAB species and comparison with 32 publicly available genome sequences provided a comparative data set spanning five major LAB genera for further analysis. Phylogenomic reconstruction confirmed that Leuconostoc and Pediococcus species emerged independently from the Lactobacillus genus, within one of the three phylogenetic clades identified. These clades partially grouped LAB according to their fermentation types, suggesting that some metabolic capabilities were independently acquired during LAB evolution. To apply a genome-wide association study (GWAS) at the multigene family level, utilization of 49 carbohydrates was also profiled for these 56 LAB species. GWAS results indicated that obligately heterofermentative species lack 1-phosphofructokinase, which is required for D-mannose degradation in the homofermentative pathway. Heterofermentative species were often found to contain the araBAD operon, involved in L-arabinose degradation, which is important for heterofermentation. Taken together, our results provide helpful insights into the genetic determinants of LAB carbohydrate metabolism and open the way for further experimental research aimed at validating the role of these candidate genes in industrial applications.
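The abstract does not state which statistical test underlies the gene-family GWAS. As a minimal sketch, assuming each species is scored for presence/absence of one gene family and for a binary utilization phenotype (for example, grows on D-mannose or not), one simple association score is a two-sided Fisher's exact test on the resulting 2x2 table. The function names and toy data are hypothetical, and this sketch ignores phylogenetic non-independence among species, which a real LAB GWAS would need to account for.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: gene family present / absent; columns: carbohydrate used / not used.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability of a table with x in the top-left cell,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return sum(p for p in map(table_prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def family_association(presence: dict[str, bool], uses_sugar: dict[str, bool]) -> float:
    """Test one gene family against one carbohydrate phenotype across species."""
    species = presence.keys() & uses_sugar.keys()
    a = sum(presence[s] and uses_sugar[s] for s in species)
    b = sum(presence[s] and not uses_sugar[s] for s in species)
    c = sum(not presence[s] and uses_sugar[s] for s in species)
    d = sum(not presence[s] and not uses_sugar[s] for s in species)
    return fisher_exact_two_sided(a, b, c, d)


# Hypothetical toy data: the family is present exactly in the 4 species that use the sugar.
presence = {f"sp{i}": i < 4 for i in range(8)}
uses_sugar = {f"sp{i}": i < 4 for i in range(8)}
print(round(family_association(presence, uses_sugar), 4))  # 0.0286 (= 2/70)
```

In a genome-wide scan this test would be repeated for every gene family and carbohydrate pair, so the resulting p-values would also need multiple-testing correction.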
26
Barnett SE, Youngblut ND, Buckley DH. Data Analysis for DNA Stable Isotope Probing Experiments Using Multiple Window High-Resolution SIP. Methods Mol Biol 2019; 2046:109-128. [PMID: 31407300 DOI: 10.1007/978-1-4939-9721-3_9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/27/2022]
Abstract
DNA stable isotope probing (DNA-SIP) allows for the identification of microbes that assimilate isotopically labeled substrates into DNA. Here we describe the analysis of sequencing data using the multiple window high-resolution DNA-SIP method (MW-HR-SIP). MW-HR-SIP has improved accuracy over other methods and is easily implemented in the statistical platform R. We also discuss key experimental parameters to consider when designing DNA-SIP experiments and how these parameters affect the accuracy of the analysis.
Affiliation(s)
- Samuel E Barnett
- School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
- Nicholas D Youngblut
- Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Daniel H Buckley
- School of Integrative Plant Science, Cornell University, Ithaca, NY, USA
27
Tedersoo L, Drenkhan R, Anslan S, Morales-Rodriguez C, Cleary M. High-throughput identification and diagnostics of pathogens and pests: Overview and practical recommendations. Mol Ecol Resour 2019; 19:47-76. [PMID: 30358140 PMCID: PMC7379260 DOI: 10.1111/1755-0998.12959] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 03/19/2018] [Revised: 08/01/2018] [Accepted: 08/28/2018] [Indexed: 12/26/2022]
Abstract
High-throughput identification technologies provide efficient tools for understanding the ecology and functioning of microorganisms. Yet these methods have rarely been used for monitoring plant pathogens and pests or for testing ecological hypotheses about them, despite their immense importance in agriculture, forestry and plant community dynamics. The main objectives of this manuscript are the following: (a) to provide a comprehensive overview of the state-of-the-art high-throughput quantification and molecular identification methods used to address population dynamics, community ecology and host associations of microorganisms, with a specific focus on antagonists such as pathogens, viruses and pests; (b) to compile available information and provide recommendations about specific protocols and workable primers for bacteria, fungi, oomycetes and insect pests; and (c) to provide examples of novel methods used in other microbiological disciplines that have great potential for testing specific biological hypotheses related to pathology. Finally, we evaluate the overall prospects of state-of-the-art and still-evolving methods for diagnostics and for population- and community-level ecological research on pathogens and pests.
Affiliation(s)
- Leho Tedersoo
- Natural History Museum and Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
- Rein Drenkhan
- Institute of Forestry and Rural Engineering, Estonian University of Life Sciences, Tartu, Estonia
- Sten Anslan
- Natural History Museum and Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
- Michelle Cleary
- Southern Swedish Forest Research Centre, Swedish University of Agricultural Sciences, Alnarp, Sweden
28
Engqvist MKM. Correlating enzyme annotations with a large set of microbial growth temperatures reveals metabolic adaptations to growth at diverse temperatures. BMC Microbiol 2018; 18:177. [PMID: 30400856 PMCID: PMC6219164 DOI: 10.1186/s12866-018-1320-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 06/06/2018] [Accepted: 10/16/2018] [Indexed: 12/15/2022]
Abstract
Background The ambient temperature of all habitats is a key physical property that shapes the biology of the microbes inhabiting them. The optimal growth temperature (OGT) of a microbe is therefore a key piece of data needed to understand evolutionary adaptations manifested in its genome sequence. Unfortunately, there is no growth temperature database or easily downloadable dataset encompassing the majority of cultured microorganisms, which limits our ability to interpret genomic data to identify temperature adaptations in microbes. Results In this work I significantly contribute to closing this gap by mining data from major culture collection centres to obtain growth temperature data for a nonredundant set of 21,498 microbes. The dataset (10.5281/zenodo.1175608) contains mainly bacteria and archaea and spans psychrophiles, mesophiles, thermophiles and hyperthermophiles. Using these data, a full 43% of all protein entries in the UniProt database can be annotated with the growth temperature of the species from which they originate. I validate the dataset by showing a Pearson correlation of up to 0.89 between growth temperature and mean enzyme optima, a physiological property directly influenced by growth temperature. Using the temperature dataset, I correlate the genomic occurrence of enzyme functional annotations with growth temperature and identify 319 enzyme functions that either increase or decrease in occurrence with temperature. Eight metabolic pathways were statistically enriched for these enzyme functions. Furthermore, I establish a correlation between 33 domains of unknown function (DUFs) and growth temperature in microbes, four of which (DUF438, DUF1524, DUF1957 and DUF3458_C) were significant in both archaea and bacteria. Conclusions The growth temperature dataset enables large-scale correlation analysis with enzyme function- and domain-level annotations. Growth-temperature-dependent changes in their occurrence highlight potential evolutionary adaptations. A few of the identified changes are previously known, such as the preference for menaquinone biosynthesis through the futalosine pathway in bacteria growing at high temperatures. Others, such as DUFs whose occurrence changes with temperature, represent important starting points for future studies. The growth temperature dataset should become a valuable community resource and will find additional important uses in correlating genomic, transcriptomic, proteomic, metabolomic, phenotypic or taxonomic properties with temperature in future studies. Electronic supplementary material The online version of this article (10.1186/s12866-018-1320-7) contains supplementary material, which is available to authorized users.
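The validation step above pairs each organism's growth temperature with the mean temperature optimum of its enzymes and reports a Pearson correlation. A minimal sketch of that computation, assuming a mapping from organism to OGT and a flat list of (organism, enzyme optimum) records; the function names, input layout and toy values are hypothetical and do not reflect the Zenodo dataset's actual schema:

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_enzyme_optimum(records: list[tuple[str, float]]) -> dict[str, float]:
    """Average the enzyme temperature optima reported for each organism."""
    by_org = defaultdict(list)
    for organism, t_opt in records:
        by_org[organism].append(t_opt)
    return {org: mean(vals) for org, vals in by_org.items()}


def ogt_vs_enzyme_optima(ogt: dict[str, float], records: list[tuple[str, float]]) -> float:
    """Correlate OGT with mean enzyme optimum over organisms present in both inputs."""
    optima = mean_enzyme_optimum(records)
    shared = sorted(ogt.keys() & optima.keys())
    return pearson_r([ogt[o] for o in shared], [optima[o] for o in shared])


# Hypothetical toy values in degrees C, not taken from the dataset.
ogt = {"psychro": 15.0, "meso": 37.0, "thermo": 70.0, "hyper": 95.0}
records = [("psychro", 30.0), ("psychro", 36.0), ("meso", 50.0),
           ("thermo", 78.0), ("thermo", 84.0), ("hyper", 100.0)]
print(round(ogt_vs_enzyme_optima(ogt, records), 3))
```

Averaging per organism before correlating keeps organisms with many characterized enzymes from dominating the coefficient.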
Affiliation(s)
- Martin K M Engqvist
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
29
Vidulin V, Šmuc T, Džeroski S, Supek F. The evolutionary signal in metagenome phyletic profiles predicts many gene functions. Microbiome 2018; 6:129. [PMID: 29991352 PMCID: PMC6040064 DOI: 10.1186/s40168-018-0506-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/07/2017] [Accepted: 06/19/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND The function of many genes is still unknown, even in model organisms. The increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function in a systematic manner. RESULTS We evaluated whether the evolutionary signal contained in metagenome phyletic profiles (MPPs) is predictive of a broad array of gene functions. MPPs are an encoding of environmental DNA sequencing data that consists of the relative abundances of gene families across metagenomes. We find that such MPPs can accurately predict 826 Gene Ontology functional categories, drawing on human gut microbiomes, ocean metagenomes, and DNA sequences from various other engineered and natural environments. Overall, in this task, the MPPs are highly accurate, and moreover they provide coverage for a set of Gene Ontology terms largely complementary to that of standard phylogenetic profiles derived from fully sequenced genomes. We also find that metagenomes approximated from taxon relative abundances obtained via 16S rRNA gene sequencing may provide surprisingly useful predictive models. Crucially, MPPs derived from different types of environments can infer distinct, non-overlapping sets of gene functions and therefore complement each other. Consistently, simulations on >5,000 metagenomes indicate that the amount of data is not in itself critical for maximizing predictive accuracy, whereas the diversity of sampled environments appears to be the critical factor for obtaining robust models. CONCLUSIONS In past work, metagenomics has provided invaluable insight into the ecology of various habitats, the diversity of microbial life, and the mechanisms of human health and disease. We propose that environmental DNA sequencing additionally constitutes a useful tool for predicting the biological roles of genes, yielding inferences out of reach of existing comparative genomics approaches.
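The abstract defines an MPP as the relative abundances of gene families across metagenomes. A minimal sketch of building such profiles, assuming per-metagenome gene-family counts are already available and that normalization is a simple within-sample proportion (the paper's exact normalization and downstream classifier are not reproduced here; names and IDs are hypothetical):

```python
def metagenome_phyletic_profiles(
    counts: dict[str, dict[str, int]],
) -> tuple[list[str], dict[str, list[float]]]:
    """Turn per-metagenome gene-family counts into one profile per gene family.

    counts maps metagenome ID -> {gene family ID -> count}.
    Returns the metagenome order used and, for each gene family, its relative
    abundance in every metagenome (0.0 where the family was not observed).
    """
    metagenomes = sorted(counts)
    families = sorted({fam for per_mg in counts.values() for fam in per_mg})
    profiles: dict[str, list[float]] = {fam: [] for fam in families}
    for mg in metagenomes:
        total = sum(counts[mg].values())
        for fam in families:
            n = counts[mg].get(fam, 0)
            # Relative abundance within this metagenome; guard against empty samples.
            profiles[fam].append(n / total if total else 0.0)
    return metagenomes, profiles


# Hypothetical toy counts; IDs are illustrative placeholders.
counts = {
    "gut_1": {"fam_A": 3, "fam_B": 1},
    "ocean_1": {"fam_B": 2},
}
order, mpp = metagenome_phyletic_profiles(counts)
print(order)  # ['gut_1', 'ocean_1']
print(mpp)    # {'fam_A': [0.75, 0.0], 'fam_B': [0.25, 1.0]}
```

Each per-family vector could then serve as a feature row for a classifier that predicts Gene Ontology terms; that learning step is outside this sketch.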
Affiliation(s)
- Vedrana Vidulin
- Faculty of Information Studies, 8000 Novo Mesto, Slovenia
- Division of Electronics, Rudjer Boskovic Institute, 10000 Zagreb, Croatia
- Department of Knowledge Technologies, Jozef Stefan Institute, 1000 Ljubljana, Slovenia
- Tomislav Šmuc
- Division of Electronics, Rudjer Boskovic Institute, 10000 Zagreb, Croatia
- Sašo Džeroski
- Department of Knowledge Technologies, Jozef Stefan Institute, 1000 Ljubljana, Slovenia
- Fran Supek
- Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
30
Hockenberry AJ, Stern AJ, Amaral LAN, Jewett MC. Diversity of Translation Initiation Mechanisms across Bacterial Species Is Driven by Environmental Conditions and Growth Demands. Mol Biol Evol 2017; 35:582-592. [PMID: 29220489 PMCID: PMC5850609 DOI: 10.1093/molbev/msx310] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 12/26/2022]
Abstract
The Shine-Dalgarno (SD) sequence motif is frequently found upstream of protein-coding genes and is thought to be the dominant mechanism of translation initiation used by bacteria. Experimental studies have shown that the SD sequence facilitates start codon recognition and enhances translation initiation by directly interacting with the highly conserved anti-SD sequence on the 30S ribosomal subunit. However, the proportion of SD-led genes within a genome varies across species, and the factors governing this variation in translation initiation mechanisms remain largely unknown. Here, we conduct a phylogenetically informed analysis and find that species capable of rapid growth contain a higher proportion of SD-led genes throughout their genomes. We show that SD sequence utilization covaries with a suite of genomic features that are important for efficient translation initiation and elongation. In addition to these endogenous genomic factors, we show that exogenous environmental factors may also influence the evolution of translation initiation mechanisms: thermophilic species contain significantly more SD-led genes than mesophiles. Our results demonstrate that variation in translation initiation mechanisms across bacterial species is predictable and is a consequence of differential life-history strategies related to maximum growth rate and environment-specific constraints.
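The key per-genome quantity above is the proportion of SD-led genes. The abstract describes SD sequences acting by base-pairing with the anti-SD sequence on the 30S subunit; the sketch below swaps in a much cruder motif scan and should not be read as the paper's classification criterion. A gene counts as "SD-led" here if the window upstream of its start codon contains any 4-mer from the assumed core AGGAGG. The motif, the k = 4 threshold, the window length and the function names are all illustrative assumptions.

```python
SD_CORE = "AGGAGG"  # assumed canonical SD core (DNA alphabet)


def sd_kmers(core: str = SD_CORE, k: int = 4) -> set[str]:
    """All length-k substrings of the SD core; any of them counts as a hit."""
    return {core[i:i + k] for i in range(len(core) - k + 1)}


def is_sd_led(upstream: str, kmers: set[str], k: int = 4) -> bool:
    """True if the region upstream of the start codon contains an SD k-mer."""
    seq = upstream.upper().replace("U", "T")  # accept RNA or DNA input
    return any(seq[i:i + k] in kmers for i in range(len(seq) - k + 1))


def fraction_sd_led(upstream_regions: list[str], k: int = 4) -> float:
    """Proportion of genes whose upstream region carries an SD-like motif."""
    if not upstream_regions:
        raise ValueError("no upstream regions given")
    kmers = sd_kmers(k=k)
    hits = sum(is_sd_led(region, kmers, k) for region in upstream_regions)
    return hits / len(upstream_regions)


# Hypothetical upstream windows (e.g. ~20 nt before each start codon).
regions = ["TTTAAGGAGGTTTACATTAC", "TTTTTTTTTTCATCATCATC"]
print(fraction_sd_led(regions))  # 0.5
```

An energy-based criterion (hybridization strength between the upstream window and the anti-SD sequence) would be the more faithful choice, but it needs an RNA folding model and is beyond a short sketch.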
Affiliation(s)
- Adam J Hockenberry
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Interdisciplinary Program in Biological Sciences, Northwestern University, Evanston, IL, USA
- Aaron J Stern
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Luís A N Amaral
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Northwestern Institute for Complex Systems, Northwestern University, Evanston, IL, USA
- Department of Physics and Astronomy, Northwestern University, Evanston, IL, USA
- Corresponding author
- Michael C Jewett
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Northwestern Institute for Complex Systems, Northwestern University, Evanston, IL, USA
- Center for Synthetic Biology, Northwestern University, Evanston, IL, USA
- Simpson Querrey Institute for BioNanotechnology, Northwestern University, Evanston, IL, USA
- Corresponding author