1
Fitness-Conditional Genes for Soil Adaptation in the Bioaugmentation Agent Pseudomonas veronii 1YdBTEX2. mSystems 2023; 8:e0117422. [PMID: 36786610 PMCID: PMC10134887 DOI: 10.1128/msystems.01174-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
Abstract
Strain inoculation (bioaugmentation) is a potentially useful technology to provide microbiomes with new functionalities. However, there is limited understanding of the genetic factors contributing to successful establishment of inoculants. This work aimed to characterize the genes implicated in proliferation of the monoaromatic compound-degrading Pseudomonas veronii 1YdBTEX2 in nonsterile polluted soils. We generated two independent mutant libraries by random minitransposon-delivered marker insertion followed by deep sequencing (Tn-seq), with a total of 5.0 × 10⁵ unique insertions. Libraries were grown in multiple successive cycles for up to 50 generations, either in batch liquid medium or in two types of soil microcosms with different resident microbial content (sand or silt), in the presence of toluene. Analysis of gene insertion abundances at different time points (passed generations of metapopulation growth), compared with the proportions at the start and with in silico generated randomized insertion distributions, allowed us to define ~800 essential genes common to both libraries and ~2,700 genes with conditional fitness effects in either liquid or soil (195 of which resulted in fitness gain). Conditional fitness genes largely overlapped among all growth conditions but affected approximately twice as many functions in liquid as in soil. This indicates that soil is a more permissive environment for mutant growth, probably because of additional nutrient availability. Commonly depleted genes covered a wide range of biological functions and metabolic pathways, such as inorganic ion transport, fatty acid metabolism, amino acid biosynthesis, or nucleotide and cofactor metabolism. Only sparse gene sets were uncovered whose insertion caused a fitness decrease exclusively in soils, and these sets differed between silt and sand.
Despite detectably higher counts of resident bacteria and potential protist predators in silt, we were, therefore, unable to detect any immediately obvious candidate genes affecting P. veronii biological competitiveness. In contrast to liquid growth conditions, mutants with insertions inactivating flagella biosynthesis and motility consistently gained a strong fitness advantage in soils and displayed higher growth rates than the wild type. In conclusion, although many gene functions were found to be important for growth in soils, most of them are not soil-specific, because they also affect growth in liquid minimal medium. This indicates that P. veronii does not need major metabolic reprogramming to proliferate in soil with accessible carbon and generally favorable growth conditions.
IMPORTANCE Restoring damaged microbiomes is still a formidable challenge. Classical, widely adopted approaches consist of augmenting communities with pure or mixed cultures in the hope that these display their intended, selected properties under in situ conditions. Ecological theory, however, dictates that introduction of a nonresident microbe is unlikely to lead to its successful proliferation in a foreign system such as a soil microbiome. In an effort to study this systematically, we used random transposon insertion scanning to identify genes, and possibly metabolic subsystems, that are crucial for growth and survival of a bacterial inoculant (Pseudomonas veronii) for targeted degradation of monoaromatic compounds in contaminated nonsterile soils. Our results indicate that although many gene functions are important for proliferation in soil, they are general factors for growth and not exclusive to soil. In other words, P. veronii is a generalist that is not a priori hindered by the soil in its proliferation and would make a good bioaugmentation candidate.
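The central quantity in the abstract above, the change in a gene's insertion abundance between time points, can be illustrated with a minimal sketch. It assumes read counts per insertion site at two time points and a site-to-gene mapping (all names here are hypothetical); it pools insertions per gene, normalizes by total reads, and reports a log2 fold change with a pseudocount. It is not the authors' pipeline, which additionally compares against in silico randomized insertion distributions.

```python
import math
from collections import defaultdict

def gene_log2_fold_change(counts_start, counts_end, site_to_gene, pseudocount=1.0):
    """Per-gene log2 change in relative insertion abundance between two time points.

    counts_start / counts_end: {insertion_site: read_count} (hypothetical format)
    site_to_gene: {insertion_site: gene_id}; sites outside genes are simply omitted.
    """
    total_start = sum(counts_start.values())
    total_end = sum(counts_end.values())
    pooled_start = defaultdict(float)
    pooled_end = defaultdict(float)
    # Pool reads from all insertion sites that fall within the same gene.
    for site, gene in site_to_gene.items():
        pooled_start[gene] += counts_start.get(site, 0)
        pooled_end[gene] += counts_end.get(site, 0)
    lfc = {}
    for gene in pooled_start:
        # Relative abundance with a pseudocount so genes that drop to zero stay finite.
        frac_start = (pooled_start[gene] + pseudocount) / total_start
        frac_end = (pooled_end[gene] + pseudocount) / total_end
        lfc[gene] = math.log2(frac_end / frac_start)
    return lfc
```

Negative values flag genes whose mutants were depleted (a fitness cost); positive values flag a fitness gain, such as the one reported for flagella and motility mutants in soil.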
2
Liu X, Liu G, Wu Y, Pang X, Wu Y, Qinshu, Niu J, Chen Q, Zhang X. Transposon sequencing: A powerful tool for the functional genomic study of food-borne pathogens. Trends Food Sci Technol 2021. [DOI: 10.1016/j.tifs.2021.06.032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/21/2022]
3
Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Prog Mol Biol Transl Sci 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, classic correlation and association methods are still applied in real studies and used for the development of new methods; at the same time, new methods have been developed to target statistical issues arising from the unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation- and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that the concepts of correlation and association analysis have shifted with the introduction of network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation- and association-based methods, organized into the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on hypothesis testing in univariate and multivariate regression-based association methods, including alpha and beta diversity-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach.
Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods for the analysis of microbiome and omics data, which cover standard, static, and regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on the current state and future directions of association analysis in microbiome and multiomics studies.
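Compositionality, one of the microbiome-data characteristics listed in this abstract, is commonly handled with a log-ratio transform. The sketch below shows the centred log-ratio (CLR) transform for a single sample. The pseudocount used to avoid log(0) is an assumption of this sketch (a frequent but not unique choice), and the review itself covers many alternative methods.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's feature counts.

    Each value becomes log(count + pseudocount) minus the sample's mean log value,
    so the transformed values always sum to zero. With pseudocount=0 (all counts
    positive), multiplying every count by the same factor, e.g. a different
    sequencing depth, leaves the result unchanged.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]
```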
Affiliation(s)
- Yinglin Xia
- Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States.
4
Selection or drift: The population biology underlying transposon insertion sequencing experiments. Comput Struct Biotechnol J 2020; 18:791-804. [PMID: 32280434 PMCID: PMC7138912 DOI: 10.1016/j.csbj.2020.03.021] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/15/2019] [Revised: 03/06/2020] [Accepted: 03/22/2020] [Indexed: 01/23/2023]
Abstract
Transposon insertion sequencing methods such as Tn-seq revolutionized microbiology by allowing the identification of genomic loci that are critical for viability in a specific environment on a genome-wide scale. While powerful, transposon insertion sequencing suffers from limited reproducibility when different analysis methods are compared. From the perspective of population biology, this may be explained by changes in mutant frequency due to chance (drift) rather than differential fitness (selection). Here, we develop a mathematical model of the population biology of transposon insertion sequencing experiments, i.e. the changes in size and composition of the transposon-mutagenized population during the experiment. We use this model to investigate mutagenesis, the growth of the mutant library, and its passage through bottlenecks. Specifically, we study how these processes can lead to extinction of individual mutants depending on their fitness and the distribution of fitness effects (DFE) of the entire mutant population. We find that in typical in vitro experiments few mutants with high fitness go extinct. However, bottlenecks of a size that is common in animal infection models lead to so much random extinction that a large number of viable mutants would be misclassified. While mutants with low fitness are more likely to be lost during the experiment, mutants with intermediate fitness are expected to be much more abundant and can constitute a large proportion of detected hits, i.e. false positives. Thus, incorporating the DFEs of randomly generated mutations in the analysis may improve the reproducibility of transposon insertion experiments, especially when strong bottlenecks are encountered.
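The drift effect described in this abstract can be reproduced with a toy neutral model: a library of equally abundant mutants passes through a bottleneck of N randomly sampled cells, and any mutant that is not sampled is lost regardless of its fitness. This sketch ignores growth and the distribution of fitness effects that the paper's model includes. Sampling with replacement approximates a large population, for which the expected lost fraction is about exp(-N/M) for M mutants.

```python
import math
import random

def fraction_lost_after_bottleneck(n_mutants, bottleneck_size, seed=0):
    """Neutral bottleneck: draw `bottleneck_size` cells (with replacement) from a
    library of `n_mutants` equally abundant mutants and return the fraction of
    mutants that were never drawn, i.e. lost purely by chance."""
    rng = random.Random(seed)
    survivors = {rng.randrange(n_mutants) for _ in range(bottleneck_size)}
    return 1 - len(survivors) / n_mutants

if __name__ == "__main__":
    m = 20000
    for ratio in (0.5, 1, 2, 5):
        lost = fraction_lost_after_bottleneck(m, int(ratio * m))
        print(f"N/M = {ratio}: simulated {lost:.3f}, expected {math.exp(-ratio):.3f}")
```

Even with a bottleneck as large as the library itself (N/M = 1), roughly a third of perfectly fit mutants disappear, which is the kind of random extinction the authors warn can be misread as a fitness defect.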
5
Liu S, Jiang Y, Yu T. Modelling RNA-Seq data with a zero-inflated mixture Poisson linear model. Genet Epidemiol 2019; 43:786-799. [PMID: 31328831 DOI: 10.1002/gepi.22246] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/04/2019] [Revised: 05/16/2019] [Accepted: 06/13/2019] [Indexed: 11/06/2022]
Abstract
RNA sequencing (RNA-Seq) has been frequently used in genomic studies and has generated a vast amount of data. RNA-Seq data are composed of two parts: (a) a sequence of nucleotides of the genome; and (b) a corresponding sequence of counts, giving the number of short reads whose mapped positions start at each position of the genome. One common feature of these count data is that they are typically nonuniform; recent studies have revealed that the nonuniformity is partially owing to a systematic bias resulting from sequencing preference. Existing work in the literature models the nonuniformity with a single-component Poisson linear model that incorporates the effects of the sequencing preference. However, we consistently observe that the short reads mapped to a gene may have a mixture structure and can be zero-inflated. A single-component model may not suffice to capture the complexity of such data. In this paper, we propose a zero-inflated mixture Poisson linear model for RNA-Seq count data and derive a fast expectation-maximisation (EM)-based algorithm for estimating the unknown parameters. Numerical studies illustrate the effectiveness of our method.
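The zero-inflation idea in this abstract can be illustrated with its simplest case: a single zero-inflated Poisson (ZIP) distribution fitted by expectation-maximization. This is a deliberately reduced sketch that assumes no covariates and one Poisson component; the paper's model is a mixture of Poisson linear models that also includes sequencing-preference effects. The simulator uses Knuth's multiplication method because Python's `random` module has no Poisson sampler.

```python
import math
import random

def sample_zip(n, pi, lam, seed=0):
    """Draw n counts: a structural zero with probability pi, otherwise Poisson(lam)."""
    rng = random.Random(seed)
    limit = math.exp(-lam)
    draws = []
    for _ in range(n):
        if rng.random() < pi:
            draws.append(0)  # structural zero
            continue
        # Knuth's method: count extra uniforms multiplied until the product drops to exp(-lam).
        k, prod = 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        draws.append(k)
    return draws

def fit_zip_em(counts, n_iter=200):
    """EM estimates of (pi, lam) for a zero-inflated Poisson sample."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi = 0.5
    lam = total / max(n - n_zero, 1)  # start from the mean of the nonzero counts
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is a structural zero.
        z = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: mixing weight = expected share of structural zeros;
        # Poisson mean = total count / expected number of Poisson-generated observations.
        pi = n_zero * z / n
        lam = total / (n - n_zero * z)
    return pi, lam
```

Only zeros are ambiguous (structural or Poisson), so the E-step reduces to a single posterior probability shared by all observed zeros.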
Affiliation(s)
- Siyun Liu
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
- Yuan Jiang
- Department of Statistics, Oregon State University, Corvallis, Oregon
- Tao Yu
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
6
Zhou Y, Wan X, Zhang B, Tong T. Classifying next-generation sequencing data using a zero-inflated Poisson model. Bioinformatics 2018; 34:1329-1335. [PMID: 29186294 DOI: 10.1093/bioinformatics/btx768] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/18/2017] [Accepted: 11/24/2017] [Indexed: 11/14/2022]
Abstract
Motivation With the development of high-throughput techniques, RNA sequencing (RNA-seq) is becoming increasingly popular as an alternative for gene expression analysis, such as RNA profiling and classification. Identifying which type of disease a new patient has from RNA-seq data has been recognized as a vital problem in medical research. Because RNA-seq data are discrete, statistical methods developed for classifying microarray data cannot be readily applied to RNA-seq data classification. Witten proposed a Poisson linear discriminant analysis (PLDA) to classify RNA-seq data in 2011. Note, however, that count datasets are frequently characterized by excess zeros in real RNA-seq or microRNA sequence data (e.g. when sequencing depth is insufficient, or for small RNAs 18-30 nucleotides long). It is therefore desirable to develop a new model to analyze RNA-seq data with an excess of zeros. Results In this paper, we propose a Zero-Inflated Poisson Logistic Discriminant Analysis (ZIPLDA) for RNA-seq data with an excess of zeros. The new method assumes that the data come from a mixture of two distributions: one is a point mass at zero, and the other follows a Poisson distribution. We then consider a logistic relation between the probability of observing zeros and the mean of the genes and the sequencing depth in the model. Simulation studies show that the proposed method performs better than, or at least as well as, the existing methods in a wide range of settings. Two real datasets, a breast cancer RNA-seq dataset and a microRNA-seq dataset, are also analyzed, and the results agree with the simulations in showing that our proposed method outperforms the existing competitors. Availability and implementation The software is available at http://www.math.hkbu.edu.hk/~tongt. Contact xwan@comp.hkbu.edu.hk or tongt@hkbu.edu.hk. Supplementary information Supplementary data are available at Bioinformatics online.
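Classification with a zero-inflated Poisson model can be sketched as a per-class likelihood comparison. In this simplified version, assumed here purely for illustration, each class has fixed per-gene parameters (pi, lam), genes are treated as independent, and a sample is assigned to the class with the highest log prior plus log-likelihood. ZIPLDA itself instead links the zero probability to the gene mean and sequencing depth through a logistic function and estimates its parameters from training data, both of which this sketch omits.

```python
import math

def zip_logpmf(y, pi, lam):
    """Log-probability of count y under a zero-inflated Poisson with
    structural-zero probability pi and Poisson mean lam (lam > 0)."""
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(-lam))
    return math.log(1 - pi) - lam + y * math.log(lam) - math.lgamma(y + 1)

def classify(sample, class_params, priors):
    """Assign `sample` (per-gene counts) to the class that maximizes
    log prior + sum of per-gene ZIP log-likelihoods.

    class_params: {label: [(pi, lam) per gene]}  (hypothetical format)
    priors: {label: prior probability}
    """
    best_class, best_score = None, -math.inf
    for label, params in class_params.items():
        score = math.log(priors[label])
        for y, (pi, lam) in zip(sample, params):
            score += zip_logpmf(y, pi, lam)
        if score > best_score:
            best_class, best_score = label, score
    return best_class
```

Under a plain Poisson, a zero count for a gene with a high class mean is almost impossible; the zero-inflation term keeps such observations from dominating the class decision.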
Affiliation(s)
- Yan Zhou
- College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen University, Shenzhen 518060, China
- Xiang Wan
- Department of Computer Science, and Institute of Computational and Theoretical Studies, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Baoxue Zhang
- School of Statistics, Capital University of Economics and Business, Beijing 100070, China
- Tiejun Tong
- Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
7
Danchin A, Ouzounis C, Tokuyasu T, Zucker JD. No wisdom in the crowd: genome annotation in the era of big data - current status and future prospects. Microb Biotechnol 2018; 11:588-605. [PMID: 29806194 PMCID: PMC6011933 DOI: 10.1111/1751-7915.13284] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 12/15/2022]
Abstract
Science and engineering rely on the accumulation and dissemination of knowledge to make discoveries and create new designs. Discovery-driven genome research rests on knowledge passed on via gene annotations. In response to the deluge of sequencing big data, standard annotation practice employs automated procedures that rely on majority rules. We argue this hinders progress through the generation and propagation of errors, leading investigators into blind alleys. More subtly, this inductive process discourages the discovery of novelty, which remains essential in biological research and reflects the nature of biology itself. Annotation systems, rather than being repositories of facts, should be tools that support multiple modes of inference. By combining deduction, induction and abduction, investigators can generate hypotheses when accurate knowledge is extracted from model databases. A key stance is to depart from 'the sequence tells the structure tells the function' fallacy, placing function first. We illustrate our approach with examples of critical or unexpected pathways, using MicroScope to demonstrate how tools can be implemented following the principles we advocate. We end with a challenge to the reader.
Affiliation(s)
- Antoine Danchin
- Integromics, Institute of Cardiometabolism and Nutrition, Hôpital de la Pitié-Salpêtrière, 47 Boulevard de l'Hôpital, 75013, Paris, France
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, 21 Sassoon Road, Pokfulam, Hong Kong
- Christos Ouzounis
- Biological Computation and Process Laboratory, Centre for Research and Technology Hellas, Chemical Process and Energy Resources Institute, Thessalonica, 57001, Greece
- Taku Tokuyasu
- Shenzhen Institutes of Advanced Technology, Institute of Synthetic Biology, Shenzhen University Town, 1068 Xueyuan Avenue, Shenzhen, China
- Jean-Daniel Zucker
- Integromics, Institute of Cardiometabolism and Nutrition, Hôpital de la Pitié-Salpêtrière, 47 Boulevard de l'Hôpital, 75013, Paris, France
8
Chen L, Garmaeva S, Zhernakova A, Fu J, Wijmenga C. A system biology perspective on environment–host–microbe interactions. Hum Mol Genet 2018; 27:R187-R194. [DOI: 10.1093/hmg/ddy137] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 04/04/2018] [Accepted: 04/11/2018] [Indexed: 02/07/2023]
Affiliation(s)
- Lianmin Chen
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Department of Pediatrics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Sanzhima Garmaeva
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Alexandra Zhernakova
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Jingyuan Fu
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Department of Pediatrics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Cisca Wijmenga
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Department of Immunology, K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, Oslo, Norway
9
Peng C, Lin Y, Luo H, Gao F. A Comprehensive Overview of Online Resources to Identify and Predict Bacterial Essential Genes. Front Microbiol 2017; 8:2331. [PMID: 29230204 PMCID: PMC5711816 DOI: 10.3389/fmicb.2017.02331] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 07/20/2017] [Accepted: 11/13/2017] [Indexed: 12/15/2022]
Abstract
Genes critical for the survival or reproduction of an organism under certain circumstances are classified as essential genes. Essential genes play a significant role in deciphering the survival mechanisms of life, and they have broad applications in pharmaceutics and synthetic biology. Continuous progress in experimental methods for essential gene identification has accelerated the accumulation of gene essentiality data, which facilitates the in silico study of essential genes. In this article, we present available online resources related to gene essentiality, including bioinformatic software tools for transposon sequencing (Tn-seq) analysis, essential gene databases, and online services to predict bacterial essential genes. We review several computational approaches that have been used to predict essential genes and summarize the features used for gene essentiality prediction. In addition, we evaluate the available online bacterial essential gene prediction servers against the experimentally validated essential gene sets of 30 bacteria from the Database of Essential Genes (DEG). This article is intended to be a quick reference guide for microbiologists interested in essential genes.
Affiliation(s)
- Chong Peng
- Department of Physics, School of Science, Tianjin University, Tianjin, China
- Yan Lin
- Department of Physics, School of Science, Tianjin University, Tianjin, China
- Hao Luo
- Department of Physics, School of Science, Tianjin University, Tianjin, China
- Feng Gao
- Department of Physics, School of Science, Tianjin University, Tianjin, China
- Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, China
- SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin University, Tianjin, China
10
Zhao L, Anderson MT, Wu W, Mobley HLT, Bachman MA. TnseqDiff: identification of conditionally essential genes in transposon sequencing studies. BMC Bioinformatics 2017; 18:326. [PMID: 28683752 PMCID: PMC5500955 DOI: 10.1186/s12859-017-1745-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 02/23/2017] [Accepted: 06/26/2017] [Indexed: 01/05/2023]
Abstract
BACKGROUND Tn-Seq is a high-throughput technique for analysis of transposon mutant libraries to determine the conditional essentiality of a gene under an experimental condition. A special feature of Tn-seq data is that multiple mutants in a gene provide independent evidence for prioritizing that gene as essential. Existing methods either do not account for this feature or rely on a high-density transposon library. Moreover, these methods cannot accommodate complex designs. RESULTS The method proposed here is specifically designed for the analysis of Tn-Seq data. It uses two steps to estimate the conditional essentiality of each gene in the genome. First, it collects evidence of conditional essentiality for each insertion by comparing the read counts of that insertion between conditions. Second, it combines the insertion-level evidence for the corresponding gene. It handles data from both low- and high-density transposon libraries and accommodates complex designs. Moreover, it is very fast. The performance of the proposed method was tested on simulated data and on experimental Tn-Seq data from a Serratia marcescens transposon mutant library used to identify genes that contribute to fitness in a murine model of infection. CONCLUSION We describe a new, efficient method for identifying conditionally essential genes in Tn-Seq experiments with high detection sensitivity and specificity. It is implemented as the TnseqDiff function in the R package Tnseq, which can be installed from the Comprehensive R Archive Network (CRAN).
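The two-step logic described in this abstract (insertion-level evidence, then gene-level combination) can be sketched as follows. This is not the TnseqDiff algorithm: as a stand-in, each insertion gets a depth-normalized natural-log fold change with an approximate Poisson variance of 1/a + 1/b (delta method), and insertions within a gene are combined by inverse-variance weighting into a gene estimate and z-score. The input format, a list of (gene, control count, condition count) tuples, and the pseudocount are hypothetical choices for this sketch.

```python
import math
from collections import defaultdict

def insertion_log_ratio(count_ctrl, count_cond, depth_ctrl, depth_cond, pseudo=0.5):
    """Step 1: depth-normalized log fold change for one insertion, with an
    approximate Poisson (delta-method) variance of 1/a + 1/b."""
    a = count_ctrl + pseudo
    b = count_cond + pseudo
    estimate = math.log((b / depth_cond) / (a / depth_ctrl))
    variance = 1 / a + 1 / b
    return estimate, variance

def gene_level(insertions, depth_ctrl, depth_cond):
    """Step 2: combine insertion-level estimates per gene by inverse-variance
    weighting; returns {gene: (combined log fold change, z-score)}."""
    acc = defaultdict(lambda: [0.0, 0.0])  # gene -> [sum of w * estimate, sum of w]
    for gene, c_ctrl, c_cond in insertions:
        est, var = insertion_log_ratio(c_ctrl, c_cond, depth_ctrl, depth_cond)
        weight = 1 / var
        acc[gene][0] += weight * est
        acc[gene][1] += weight
    # Combined standard error is 1/sqrt(sum of weights), so z = estimate * sqrt(sum of weights).
    return {g: (s / w, (s / w) * math.sqrt(w)) for g, (s, w) in acc.items()}
```

Each additional insertion adds weight and shrinks the combined standard error, which mirrors the abstract's point that multiple independent mutants in one gene strengthen the evidence for that gene.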
Affiliation(s)
- Lili Zhao
- Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, USA.
- Mark T Anderson
- Department of Microbiology and Immunology, School of Medicine, University of Michigan, Ann Arbor, USA
- Weisheng Wu
- BRCF Bioinformatics Core, University of Michigan, Ann Arbor, USA
- Harry L T Mobley
- Department of Microbiology and Immunology, School of Medicine, University of Michigan, Ann Arbor, USA
- Michael A Bachman
- Department of Pathology, School of Medicine, University of Michigan, Ann Arbor, USA