1
Peleg O, Borenstein E. Interpolation of microbiome composition in longitudinal data sets. mBio 2024; 15:e0115024. [PMID: 39162569 PMCID: PMC11389371 DOI: 10.1128/mbio.01150-24] [Received: 04/18/2024] [Accepted: 07/11/2024] [Indexed: 08/21/2024]
Abstract
The human gut microbiome significantly impacts health, prompting a rise in longitudinal studies that capture microbiome samples at multiple time points. Such studies allow researchers to characterize microbiome changes over time, but importantly, also present major analytical challenges due to incomplete or irregular sampling. To address this challenge, longitudinal microbiome studies often employ various interpolation methods, aiming to infer missing microbiome data. However, to date, a comprehensive assessment of such microbiome interpolation techniques, as well as best practice guidelines for interpolating microbiome data, is still lacking. This work aims to fill this gap, rigorously implementing and systematically evaluating a large array of interpolation methods, spanning several different categories, for longitudinal microbiome interpolation. To assess each method and its ability to accurately infer microbiome composition at missing time points, we used three longitudinal microbiome data sets that follow individuals over a long period of time and a leave-one-out approach. Overall, our analysis demonstrated that the K-nearest neighbors algorithm consistently outperforms other methods in interpolation accuracy, yet, accuracy varied widely across data sets, individuals, and time. Factors such as microbiome stability, sample size, and the time gap between interpolated and adjacent samples significantly influenced accuracy, allowing us to develop a model for predicting the expected interpolation accuracy at a missing time point. Our findings, combined, suggest that accurate interpolation in longitudinal microbiome data is feasible, especially in dense cohorts. Furthermore, using our predictive model, future studies can interpolate data only in time points where the expected interpolation accuracy is high. 
IMPORTANCE Since missing samples are common in longitudinal microbiome data sets due to inconsistent collection practices, it is important to evaluate and benchmark different interpolation methods for predicting microbiome composition in such samples and facilitating downstream analysis. Our study rigorously evaluated several such methods and identified the K-nearest neighbors approach as particularly effective for this task. The study also notes significant variability in interpolation accuracy among individuals, influenced by factors such as age, sample size, and sampling frequency. Furthermore, we developed a predictive model for estimating interpolation accuracy at a specific time point, enhancing the reliability of such analyses in future studies. Combined, our study thus provides critical insights and tools that enhance the accuracy and reliability of data interpolation methods in the growing field of longitudinal microbiome research.
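The leave-one-out evaluation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how neighbors are chosen or weighted, so the sketch assumes neighbors are the k samples closest in time, averaged uniformly and renormalized, and it assumes Bray-Curtis dissimilarity as the error measure. The function names `knn_interpolate`, `bray_curtis`, and `leave_one_out_errors` are hypothetical.

```python
from typing import List, Sequence, Tuple

# One sample: (time point, relative abundances per taxon, summing to 1).
Sample = Tuple[float, List[float]]


def knn_interpolate(samples: Sequence[Sample], t: float, k: int = 2) -> List[float]:
    """Estimate composition at time t from the k samples closest in time.

    Assumption: neighbors are picked by time distance and averaged uniformly;
    the cited study may define neighbors and weights differently.
    """
    if not samples:
        raise ValueError("need at least one sample")
    neighbors = sorted(samples, key=lambda s: abs(s[0] - t))[:k]
    n_taxa = len(neighbors[0][1])
    mean = [sum(s[1][j] for s in neighbors) / len(neighbors) for j in range(n_taxa)]
    total = sum(mean)
    # Renormalize so the estimate is again a relative-abundance profile.
    return [x / total for x in mean] if total > 0 else mean


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(a) + sum(b)
    return num / den if den > 0 else 0.0


def leave_one_out_errors(samples: Sequence[Sample], k: int = 2) -> List[float]:
    """Hide each sample in turn, interpolate it from the rest, and score the error."""
    errors = []
    for i, (t, truth) in enumerate(samples):
        rest = list(samples[:i]) + list(samples[i + 1:])
        errors.append(bray_curtis(truth, knn_interpolate(rest, t, k)))
    return errors


if __name__ == "__main__":
    # Toy, made-up series with two taxa; not data from the study.
    series = [(0.0, [0.5, 0.5]), (1.0, [0.6, 0.4]), (2.0, [0.7, 0.3])]
    print(leave_one_out_errors(series, k=2))
```

In this toy series the middle sample is recovered almost exactly because it lies halfway between its two neighbors, while the first and last samples can only be estimated from later or earlier time points, which loosely mirrors the abstract's point that the time gap to adjacent samples affects accuracy.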
Affiliation(s)
- Omri Peleg
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Elhanan Borenstein
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
  - Faculty of Medical & Health Sciences, Tel Aviv University, Tel Aviv, Israel
  - Santa Fe Institute, Santa Fe, New Mexico, USA
2
Sizemore N, Oliphant K, Zheng R, Martin CR, Claud EC, Chattopadhyay I. A digital twin of the infant microbiome to predict neurodevelopmental deficits. Science Advances 2024; 10:eadj0400. [PMID: 38598636 PMCID: PMC11006218 DOI: 10.1126/sciadv.adj0400] [Received: 06/02/2023] [Accepted: 03/06/2024] [Indexed: 04/12/2024]
Abstract
Despite the recognized gut-brain axis link, natural variations in microbial profiles between patients hinder definition of normal abundance ranges, confounding the impact of dysbiosis on infant neurodevelopment. We infer a digital twin of the infant microbiome, forecasting ecosystem trajectories from a few initial observations. Using 16S ribosomal RNA profiles from 88 preterm infants (398 fecal samples and 32,942 abundance estimates for 91 microbial classes), the model (Q-net) predicts abundance dynamics with R² = 0.69. Contrasting the fit to Q-nets of typical versus suboptimal development, we can reliably estimate individual deficit risk (Mδ) and identify infants achieving poor future head circumference growth with ≈76% area under the receiver operator characteristic curve, 95% ± 1.8% positive predictive value at 98% specificity at 30 weeks postmenstrual age. We find that early transplantation might mitigate risk for ≈45.2% of the cohort, with potentially negative effects from incorrect supplementation. Q-nets are generative artificial intelligence models for ecosystem dynamics, with broad potential applications.
Affiliation(s)
- Nicholas Sizemore
  - Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Kaitlyn Oliphant
  - Department of Pediatrics, University of Chicago, Chicago, IL 60637, USA
- Ruolin Zheng
  - Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Camilia R. Martin
  - Division of Neonatology, Weill Cornell Medicine, New York, NY 10021, USA
- Erika C. Claud
  - Department of Pediatrics, University of Chicago, Chicago, IL 60637, USA
  - Neonatology Research, University of Chicago, Chicago, IL 60637, USA
- Ishanu Chattopadhyay
  - Department of Medicine, University of Chicago, Chicago, IL 60637, USA
  - Committee on Quantitative Methods in Social, Behavioral, and Health Sciences, University of Chicago, Chicago, IL 60637, USA
  - Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL 60637, USA
  - Center for Health Statistics, University of Chicago, Chicago, IL 60637, USA
3
Srinivasan S, Jnana A, Murali TS. Modeling Microbial Community Networks: Methods and Tools for Studying Microbial Interactions. Microbial Ecology 2024; 87:56. [PMID: 38587642 PMCID: PMC11001700 DOI: 10.1007/s00248-024-02370-7] [Received: 01/01/2024] [Accepted: 03/28/2024] [Indexed: 04/09/2024]
Abstract
Microbial interactions function as a fundamental unit in complex ecosystems. By characterizing the type of interaction (positive, negative, neutral) occurring in these dynamic systems, one can begin to unravel the role played by the microbial species. Towards this, various methods have been developed to decipher the function of the microbial communities. The current review focuses on the various qualitative and quantitative methods that currently exist to study microbial interactions. Qualitative methods such as co-culturing experiments are visualized using microscopy-based techniques and are combined with data obtained from multi-omics technologies (metagenomics, metabolomics, metatranscriptomics). Quantitative methods include the construction of networks and network inference, computational models, and development of synthetic microbial consortia. These methods provide a valuable clue on various roles played by interacting partners, as well as possible solutions to overcome pathogenic microbes that can cause life-threatening infections in susceptible hosts. Studying the microbial interactions will further our understanding of complex less-studied ecosystems and enable design of effective frameworks for treatment of infectious diseases.
Affiliation(s)
- Shanchana Srinivasan
  - Department of Public Health Genomics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Manipal, 576104, India
- Apoorva Jnana
  - Department of Public Health Genomics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Manipal, 576104, India
- Thokur Sreepathy Murali
  - Department of Public Health Genomics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Manipal, 576104, India
4
Lyu R, Qu Y, Divaris K, Wu D. Methodological Considerations in Longitudinal Analyses of Microbiome Data: A Comprehensive Review. Genes (Basel) 2023; 15:51. [PMID: 38254941 PMCID: PMC11154524 DOI: 10.3390/genes15010051] [Received: 11/28/2023] [Revised: 12/22/2023] [Accepted: 12/26/2023] [Indexed: 01/24/2024]
Abstract
Biological processes underlying health and disease are inherently dynamic and are best understood when characterized in a time-informed manner. In this comprehensive review, we discuss challenges inherent in time-series microbiome data analyses and compare available approaches and methods to overcome them. Appropriate handling of longitudinal microbiome data can shed light on important roles, functions, patterns, and potential interactions between large numbers of microbial taxa or genes in the context of health, disease, or interventions. We present a comprehensive review and comparison of existing microbiome time-series analysis methods, for both preprocessing and downstream analyses, including differential analysis, clustering, network inference, and trait classification. We posit that the careful selection and appropriate utilization of computational tools for longitudinal microbiome analyses can help advance our understanding of the dynamic host-microbiome relationships that underlie health-maintaining homeostases, progressions to disease-promoting dysbioses, as well as phases of physiologic development like those encountered in childhood.
Affiliation(s)
- Ruiqi Lyu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yixiang Qu
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Kimon Divaris
  - Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Di Wu
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Division of Oral and Craniofacial Health Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
5
Shahin M, Ji B, Dixit PD. EMBED: Essential MicroBiomE Dynamics, a dimensionality reduction approach for longitudinal microbiome studies. NPJ Syst Biol Appl 2023; 9:26. [PMID: 37339950 PMCID: PMC10282069 DOI: 10.1038/s41540-023-00285-6] [Received: 02/03/2023] [Accepted: 05/23/2023] [Indexed: 06/22/2023]
Abstract
Dimensionality reduction offers unique insights into high-dimensional microbiome dynamics by leveraging collective abundance fluctuations of multiple bacteria driven by similar ecological perturbations. However, methods providing lower-dimensional representations of microbiome dynamics both at the community and individual taxa levels are not currently available. To that end, we present EMBED: Essential MicroBiomE Dynamics, a probabilistic nonlinear tensor factorization approach. Like normal mode analysis in structural biophysics, EMBED infers ecological normal modes (ECNs), which represent the unique orthogonal modes capturing the collective behavior of microbial communities. Using multiple real and synthetic datasets, we show that a very small number of ECNs can accurately approximate microbiome dynamics. Inferred ECNs reflect specific ecological behaviors, providing natural templates along which the dynamics of individual bacteria may be partitioned. Moreover, the multi-subject treatment in EMBED systematically identifies subject-specific and universal abundance dynamics that are not detected by traditional approaches. Collectively, these results highlight the utility of EMBED as a versatile dimensionality reduction tool for studies of microbiome dynamics.
Affiliation(s)
- Mayar Shahin
  - Department of Physics, University of Florida, Gainesville, FL, 32611, USA
- Brian Ji
  - Physician-Scientist Training Pathway, Department of Medicine, UCSD, San Diego, CA, 92103, USA
- Purushottam D Dixit
  - Department of Physics, University of Florida, Gainesville, FL, 32611, USA
  - Genetics Institute, University of Florida, Gainesville, FL, 32611, USA
  - Department of Chemical Engineering, University of Florida, Gainesville, FL, 32611, USA
  - Department of Biomedical Engineering, Yale University, New Haven, CT, 06511, USA
6
Benincà E, Pinto S, Cazelles B, Fuentes S, Shetty S, Bogaards JA. Wavelet clustering analysis as a tool for characterizing community structure in the human microbiome. Sci Rep 2023; 13:8042. [PMID: 37198426 PMCID: PMC10192422 DOI: 10.1038/s41598-023-34713-8] [Received: 10/18/2022] [Accepted: 05/05/2023] [Indexed: 05/19/2023]
Abstract
Human microbiome research is helped by the characterization of microbial networks, as these may reveal key microbes that can be targeted for beneficial health effects. Prevailing methods of microbial network characterization are based on measures of association, often applied to limited sampling points in time. Here, we demonstrate the potential of wavelet clustering, a technique that clusters time series based on similarities in their spectral characteristics. We illustrate this technique with synthetic time series and apply wavelet clustering to densely sampled human gut microbiome time series. We compare our results with hierarchical clustering based on temporal correlations in abundance, within and across individuals, and show that the cluster trees obtained by using either method are significantly different in terms of elements clustered together, branching structure and total branch length. By capitalizing on the dynamic nature of the human microbiome, wavelet clustering reveals community structures that remain obscured in correlation-based methods.
Affiliation(s)
- Elisa Benincà
  - Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Susanne Pinto
  - Biomedical Data Sciences, Leiden UMC, Leiden, The Netherlands
- Bernard Cazelles
  - CNRS UMR-8197, IBENS, Ecole Normale Supérieure, Paris, France
  - Sorbonne Université, UMMISCO, Paris, France
- Susana Fuentes
  - Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Sudarshan Shetty
  - Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
  - Department of Medical Microbiology and Infection Prevention, UMC Groningen, Groningen, The Netherlands
- Johannes A Bogaards
  - Department of Epidemiology & Data Science, Amsterdam UMC location VUMC, Amsterdam, The Netherlands
  - Amsterdam Institute for Infection and Immunity, Amsterdam UMC, Amsterdam, The Netherlands
7
Roche KE, Bjork JR, Dasari MR, Grieneisen L, Jansen D, Gould TJ, Gesquiere LR, Barreiro LB, Alberts SC, Blekhman R, Gilbert JA, Tung J, Mukherjee S, Archie EA. Universal gut microbial relationships in the gut microbiome of wild baboons. eLife 2023; 12:e83152. [PMID: 37158607 PMCID: PMC10292843 DOI: 10.7554/elife.83152] [Received: 09/01/2022] [Accepted: 05/08/2023] [Indexed: 05/10/2023]
Abstract
Ecological relationships between bacteria mediate the services that gut microbiomes provide to their hosts. Knowing the overall direction and strength of these relationships is essential to learn how ecology scales up to affect microbiome assembly, dynamics, and host health. However, whether bacterial relationships are generalizable across hosts or personalized to individual hosts is debated. Here, we apply a robust, multinomial logistic-normal modeling framework to extensive time series data (5534 samples from 56 baboon hosts over 13 years) to infer thousands of correlations in bacterial abundance in individual baboons and test the degree to which bacterial abundance correlations are 'universal'. We also compare these patterns to two human data sets. We find that most bacterial correlations are weak, negative, and universal across hosts, such that shared correlation patterns dominate over host-specific correlations by almost twofold. Further, taxon pairs that had inconsistent correlation signs (either positive or negative) in different hosts always had weak correlations within hosts. From the host perspective, host pairs with the most similar bacterial correlation patterns also had similar microbiome taxonomic compositions and tended to be genetic relatives. Compared to humans, universality in baboons was similar to that in human infants, and stronger than one data set from human adults. Bacterial families that showed universal correlations in human infants were often universal in baboons. Together, our work contributes new tools for analyzing the universality of bacterial associations across hosts, with implications for microbiome personalization, community assembly, and stability, and for designing microbiome interventions to improve host health.
Affiliation(s)
- Kimberly E Roche
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, United States
- Johannes R Bjork
  - University of Groningen and University Medical Center Groningen, Department of Gastroenterology and Hepatology, Groningen, Netherlands
  - University of Groningen and University Medical Center Groningen, Department of Genetics, Groningen, Netherlands
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, United States
- Mauna R Dasari
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, United States
- Laura Grieneisen
  - Department of Biology, University of British Columbia-Okanagan Campus, Kelowna, Canada
- David Jansen
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, United States
- Trevor J Gould
  - Department of Ecology, Evolution, and Behavior, University of Minnesota, Minneapolis, United States
- Luis B Barreiro
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, United States
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, United States
  - Committee on Immunology, University of Chicago, Chicago, United States
- Susan C Alberts
  - Department of Biology, Duke University, Durham, United States
  - Department of Evolutionary Anthropology, Duke University, Durham, United States
  - Duke University Population Research Institute, Duke University, Durham, United States
- Ran Blekhman
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, United States
- Jack A Gilbert
  - Department of Pediatrics and the Scripps Institution of Oceanography, University of California, San Diego, San Diego, United States
- Jenny Tung
  - Department of Biology, Duke University, Durham, United States
  - Department of Evolutionary Anthropology, Duke University, Durham, United States
  - Duke University Population Research Institute, Duke University, Durham, United States
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Sayan Mukherjee
  - Program in Computational Biology and Bioinformatics, Duke University, Durham, United States
  - Departments of Statistical Science, Mathematics, Computer Science, and Bioinformatics & Biostatistics, Duke University, Durham, United States
  - Center for Scalable Data Analytics and Artificial Intelligence, University of Leipzig, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Natural Sciences, Leipzig, Germany
- Elizabeth A Archie
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, United States
8
Rahman G, Morton JT, Martino C, Sepich-Poore GD, Allaband C, Guccione C, Chen Y, Hakim D, Estaki M, Knight R. BIRDMAn: A Bayesian differential abundance framework that enables robust inference of host-microbe associations. bioRxiv 2023:2023.01.30.526328. [PMID: 36778470 PMCID: PMC9915500 DOI: 10.1101/2023.01.30.526328] [Indexed: 02/05/2023]
Abstract
Quantifying the differential abundance (DA) of specific taxa among experimental groups in microbiome studies is challenging due to data characteristics (e.g., compositionality, sparsity) and specific study designs (e.g., repeated measures, meta-analysis, cross-over). Here we present BIRDMAn (Bayesian Inferential Regression for Differential Microbiome Analysis), a flexible DA method that can account for microbiome data characteristics and diverse experimental designs. Simulations show that BIRDMAn models are robust to uneven sequencing depth and provide a >20-fold improvement in statistical power over existing methods. We then use BIRDMAn to identify antibiotic-mediated perturbations undetected by other DA methods due to subject-level heterogeneity. Finally, we demonstrate how BIRDMAn can construct state-of-the-art cancer-type classifiers using The Cancer Genome Atlas (TCGA) dataset, with substantial accuracy improvements over random forests and existing DA tools across multiple sequencing centers. Collectively, BIRDMAn extracts more informative biological signals while accounting for study-specific experimental conditions than existing approaches.
Affiliation(s)
- Gibraan Rahman
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- James T Morton
  - Biostatistics & Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Cameron Martino
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Celeste Allaband
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Caitlin Guccione
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
  - Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA
- Yang Chen
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Dermatology, University of California San Diego, La Jolla, CA, USA
  - Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA
- Daniel Hakim
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Mehrbod Estaki
  - Department of Physiology & Pharmacology, University of Calgary, Calgary, Canada
- Rob Knight
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, USA
9
MDITRE: Scalable and Interpretable Machine Learning for Predicting Host Status from Temporal Microbiome Dynamics. mSystems 2022; 7:e0013222. [PMID: 36069455 PMCID: PMC9600536 DOI: 10.1128/msystems.00132-22] [Indexed: 12/24/2022]
Abstract
Longitudinal microbiome data sets are being generated with increasing regularity, and there is broad recognition that these studies are critical for unlocking the mechanisms through which the microbiome impacts human health and disease. However, there is a dearth of computational tools for analyzing microbiome time-series data. To address this gap, we developed an open-source software package, Microbiome Differentiable Interpretable Temporal Rule Engine (MDITRE), which implements a new highly efficient method leveraging deep-learning technologies to derive human-interpretable rules that predict host status from longitudinal microbiome data. Using semi-synthetic and a large compendium of publicly available 16S rRNA amplicon and metagenomics sequencing data sets, we demonstrate that in almost all cases, MDITRE performs on par with or better than popular uninterpretable machine learning methods, and orders-of-magnitude faster than the prior interpretable technique. MDITRE also provides a graphical user interface, which we show through case studies can be used to derive biologically meaningful interpretations linking patterns of microbiome changes over time with host phenotypes.
IMPORTANCE The human microbiome, or collection of microbes living on and within us, changes over time. Linking these changes to the status of the human host is crucial to understanding how the microbiome influences a variety of human diseases. Due to the large scale and complexity of microbiome data, computational methods are essential. Existing computational methods for linking changes in the microbiome to the status of the human host are either unable to scale to large and complex microbiome data sets or cannot produce human-interpretable outputs. We present a new computational method and software package that overcomes the limitations of previous methods, allowing researchers to analyze larger and more complex data sets while producing easily interpretable outputs. Our method has the potential to enable new insights into how changes in the microbiome over time maintain health or lead to disease in humans and facilitate the development of diagnostic tests based on the microbiome.
10
Chlenski P, Hsu M, Pe’er I. MiSDEED: a synthetic data engine for microbiome study power analysis and study design. Bioinformatics Advances 2022; 2:vbac043. [PMID: 36699411 PMCID: PMC9710642 DOI: 10.1093/bioadv/vbac043] [Received: 10/28/2021] [Revised: 04/14/2022] [Indexed: 01/28/2023]
Abstract
Summary: MiSDEED (Microbial Synthetic Data Engine for Experimental Design) is a command-line tool for generating synthetic longitudinal multinode data from simulated microbial environments. It generates relative-abundance timecourses under perturbations for an arbitrary number of time points, samples, locations and data types. All simulation parameters are exposed to the user to facilitate rapid power analysis and aid in study design. Users who want additional flexibility may also use MiSDEED as a Python package.
Availability and implementation: MiSDEED is written in Python and is freely available at https://github.com/pchlenski/misdeed.
Affiliation(s)
- Melody Hsu
  - Department of Computer Science, Columbia University, New York, NY 10027, USA
- Itsik Pe’er
  - Department of Computer Science, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University, New York, NY 10027, USA
  - Data Science Institute, Columbia University, New York, NY 10027, USA