251
|
Vangay P, Hillmann BM, Knights D. Microbiome Learning Repo (ML Repo): A public repository of microbiome regression and classification tasks. Gigascience 2019; 8:giz042. [PMID: 31042284 PMCID: PMC6493971 DOI: 10.1093/gigascience/giz042] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 08/21/2018] [Revised: 02/24/2019] [Accepted: 03/26/2019] [Indexed: 01/05/2023] Open
Abstract
The use of machine learning in high-dimensional biological applications, such as the human microbiome, has grown exponentially in recent years, but algorithm developers often lack the domain expertise required for interpretation and curation of the heterogeneous microbiome datasets. We present Microbiome Learning Repo (ML Repo, available at https://knights-lab.github.io/MLRepo/), a public, web-based repository of 33 curated classification and regression tasks from 15 published human microbiome datasets. We highlight the use of ML Repo in several use cases to demonstrate its wide application, and we expect it to be an important resource for algorithm developers.
Affiliation(s)
- Pajau Vangay
- Bioinformatics and Computational Biology, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455
| | - Benjamin M Hillmann
- Department of Computer Science and Engineering, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455
| | - Dan Knights
- Bioinformatics and Computational Biology, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455
- Department of Computer Science and Engineering, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455
|
252
|
Predicting Growth and Carcass Traits in Swine Using Microbiome Data and Machine Learning Algorithms. Sci Rep 2019; 9:6574. [PMID: 31024050 PMCID: PMC6484031 DOI: 10.1038/s41598-019-43031-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 07/19/2018] [Accepted: 04/10/2019] [Indexed: 12/17/2022] Open
Abstract
In this paper, we evaluated the power of microbiome measures taken at three time points over the growth test period (weaning, 15 and 22 weeks) to foretell growth and carcass traits in 1039 individuals of a line of crossbred pigs. We measured prediction accuracy as the correlation between actual and predicted phenotypes in a five-fold cross-validation setting. Phenotypic traits measured included live weight measures and carcass composition obtained during the trial as well as at slaughter. We employed a null model excluding microbiome information as a baseline to assess the increase in prediction accuracy stemming from the inclusion of operational taxonomic units (OTU) as predictors. We further contrasted performance of models from the Bayesian alphabet (Bayesian Lasso) as well as machine learning approaches (Random Forest and Gradient Boosting) and semi-parametric kernel models (Reproducing Kernel Hilbert space). In most cases, prediction accuracy increased significantly with the inclusion of microbiome data. Accuracy was more substantial with the inclusion of microbiome information taken at weeks 15 and 22, with values ranging from approximately 0.30 for loin traits to more than 0.50 for back fat. Conversely, microbiome composition at weaning resulted in most cases in marginal gains of prediction accuracy, suggesting that later measures might be more useful to include in predictive models. Model choice affected predictions marginally with no clear winner for any model/trait/time point. We, therefore, suggest average prediction across models as a robust strategy in fitting microbiome information. In conclusion, microbiome composition can effectively be used as a predictor of growth and composition traits, particularly for fatness traits. The inclusion of OTU predictors could potentially be used to promote fast growth of individuals while limiting fat accumulation.
Early microbiome measures might not be good predictors of growth and OTU information might be best collected at later life stages. Future research should focus on the inclusion of both microbiome as well as host genome information in predictions, as well as the interaction between the two. Furthermore, the influence of the microbiome on feed efficiency as well as carcass and meat quality should be investigated.
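The evaluation scheme described in this abstract (prediction accuracy as the correlation between observed and predicted phenotypes under five-fold cross-validation, compared against a null model without microbiome predictors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, a closed-form ridge regression stands in for the Bayesian Lasso, Random Forest, Gradient Boosting and RKHS models named above, pooling the out-of-fold predictions before computing one correlation is a choice made here, and the names `ridge_fit_predict` and `cv_correlation` are hypothetical.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept (stand-in model)."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    X_centered = X_train - x_mean
    gram = X_centered.T @ X_centered + alpha * np.eye(X_centered.shape[1])
    coef = np.linalg.solve(gram, X_centered.T @ (y_train - y_mean))
    return (X_test - x_mean) @ coef + y_mean


def cv_correlation(X, y, n_folds=5, alpha=1.0, seed=0):
    """Correlation between observed phenotypes and pooled out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    predicted = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False  # hold this fold out of the training set
        predicted[test_idx] = ridge_fit_predict(X[train], y[train], X[test_idx], alpha)
    return float(np.corrcoef(y, predicted)[0, 1])


# Synthetic example (assumption): 2 non-microbiome covariates and 20 OTU features,
# of which the first 5 OTUs carry signal for the phenotype.
rng = np.random.default_rng(42)
n_animals = 300
covariates = rng.normal(size=(n_animals, 2))
otus = rng.normal(size=(n_animals, 20))
phenotype = (
    covariates @ np.array([1.0, -1.0])
    + otus[:, :5].sum(axis=1)
    + rng.normal(size=n_animals)
)

r_null = cv_correlation(covariates, phenotype)                     # null model: no OTUs
r_full = cv_correlation(np.hstack([covariates, otus]), phenotype)  # covariates + OTUs
print(f"null model r = {r_null:.2f}, with OTUs r = {r_full:.2f}")
```

The gain in accuracy attributable to the microbiome is then the difference between the two correlations, which mirrors the null-model comparison the abstract describes.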
|
253
|
Wassan JT, Wang H, Browne F, Zheng H. Phy-PMRFI: Phylogeny-Aware Prediction of Metagenomic Functions Using Random Forest Feature Importance. IEEE Trans Nanobioscience 2019; 18:273-282. [PMID: 31021803 DOI: 10.1109/tnb.2019.2912824] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Abstract
High-throughput sequencing techniques have accelerated functional metagenomics studies through the generation of large volumes of omics data. The integration of these data using computational approaches is potentially useful for predicting metagenomic functions. Machine learning (ML) models can be trained using microbial features which are then used to classify microbial data into different functional classes. For example, ML analyses of human microbiome data have been linked to the prediction of important biological states. For analysing omics data, integrating abundance counts of taxonomical features with their biological relationships is important. These relationships can potentially be uncovered from the phylogenetic tree of microbial taxa. In this paper, we propose a novel integrative framework, Phy-PMRFI. This framework is driven by the phylogeny-based modeling of omics data to predict metagenomic functions using important features selected by a random forest importance (RFI) strategy. The proposed framework integrates the underlying phylogenetic tree information with abundance measures of microbial species (features) by creating a novel phylogeny and abundance aware matrix structure (PAAM). Phy-PMRFI progresses by ranking the microbial features using an RFI measure. This is then used as input for microbiome classification. The resultant feature set enhances the performance of state-of-the-art methods such as support vector machines. Our proposed integrative framework also outperforms the state-of-the-art pipeline of phylogenetic isometric log-ratio transform (PhILR) and MetaPhyl. A prediction accuracy of 90% is obtained with Phy-PMRFI on the human throat microbiome, compared with 53% for PhILR and 71% for MetaPhyl.
|
254
|
Thomas AM, Manghi P, Asnicar F, Pasolli E, Armanini F, Zolfo M, Beghini F, Manara S, Karcher N, Pozzi C, Gandini S, Serrano D, Tarallo S, Francavilla A, Gallo G, Trompetto M, Ferrero G, Mizutani S, Shiroma H, Shiba S, Shibata T, Yachida S, Yamada T, Wirbel J, Schrotz-King P, Ulrich CM, Brenner H, Arumugam M, Bork P, Zeller G, Cordero F, Dias-Neto E, Setubal JC, Tett A, Pardini B, Rescigno M, Waldron L, Naccarati A, Segata N. Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation. Nat Med 2019; 25:667-678. [PMID: 30936548 PMCID: PMC9533319 DOI: 10.1038/s41591-019-0405-7] [Citation(s) in RCA: 479] [Impact Index Per Article: 95.8] [Received: 05/29/2018] [Accepted: 02/20/2019] [Indexed: 02/07/2023]
Abstract
Several studies have investigated links between the gut microbiome and colorectal cancer (CRC), but questions remain about the replicability of biomarkers across cohorts and populations. We performed a meta-analysis of five publicly available datasets and two new cohorts and validated the findings on two additional cohorts, considering in total 969 fecal metagenomes. Unlike microbiome shifts associated with gastrointestinal syndromes, the gut microbiome in CRC showed reproducibly higher richness than controls (P < 0.01), partially due to expansions of species typically derived from the oral cavity. Meta-analysis of the microbiome functional potential identified gluconeogenesis and the putrefaction and fermentation pathways as being associated with CRC, whereas the stachyose and starch degradation pathways were associated with controls. Predictive microbiome signatures for CRC trained on multiple datasets showed consistently high accuracy in datasets not considered for model training and independent validation cohorts (average area under the curve, 0.84). Pooled analysis of raw metagenomes showed that the choline trimethylamine-lyase gene was overabundant in CRC (P = 0.001), identifying a relationship between microbiome choline metabolism and CRC. The combined analysis of heterogeneous CRC cohorts thus identified reproducible microbiome biomarkers and accurate disease-predictive models that can form the basis for clinical prognostic tests and hypothesis-driven mechanistic studies.
Affiliation(s)
- Andrew Maltez Thomas
- Department CIBIO, University of Trento, Trento, Italy
- Biochemistry Department, Chemistry Institute, University of São Paulo, São Paulo, Brazil
- Medical Genomics Laboratory, CIPE/A.C. Camargo Cancer Center, São Paulo, Brazil
| | - Paolo Manghi
- Department CIBIO, University of Trento, Trento, Italy
| | | | | | | | - Moreno Zolfo
- Department CIBIO, University of Trento, Trento, Italy
| | | | - Serena Manara
- Department CIBIO, University of Trento, Trento, Italy
| | | | - Chiara Pozzi
- IEO, European Institute of Oncology IRCCS, Milan, Italy
| | - Sara Gandini
- IEO, European Institute of Oncology IRCCS, Milan, Italy
| | | | - Sonia Tarallo
- Italian Institute for Genomic Medicine, Turin, Italy
| | | | - Gaetano Gallo
- Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, Italy
- Department of Colorectal Surgery, Clinica S. Rita, Vercelli, Italy
| | - Mario Trompetto
- Department of Colorectal Surgery, Clinica S. Rita, Vercelli, Italy
| | - Giulio Ferrero
- Department of Computer Science, University of Turin, Turin, Italy
| | - Sayaka Mizutani
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
- Research Fellow of Japan Society for the Promotion of Science, Tokyo, Japan
| | - Hirotsugu Shiroma
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
| | - Satoshi Shiba
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
| | - Tatsuhiro Shibata
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Shinichi Yachida
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
- Department of Cancer Genome Informatics, Osaka University, Osaka, Japan
| | - Takuji Yamada
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
- PRESTO, Japan Science and Technology Agency, Saitama, Japan
| | - Jakob Wirbel
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Petra Schrotz-King
- Division of Preventive Oncology, National Center for Tumor Diseases and German Cancer Research Center, Heidelberg, Germany
| | - Cornelia M Ulrich
- Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT, USA
| | - Hermann Brenner
- Division of Preventive Oncology, National Center for Tumor Diseases and German Cancer Research Center, Heidelberg, Germany
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center, Heidelberg, Germany
- German Cancer Consortium, German Cancer Research Center, Heidelberg, Germany
| | - Manimozhiyan Arumugam
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Faculty of Healthy Sciences, University of Southern Denmark, Odense, Denmark
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Molecular Medicine Partnership Unit, Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, Berlin, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
| | - Georg Zeller
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | | | - Emmanuel Dias-Neto
- Medical Genomics Laboratory, CIPE/A.C. Camargo Cancer Center, São Paulo, Brazil
- Laboratory of Neurosciences, Institute of Psychiatry, University of São Paulo, São Paulo, Brazil
| | - João Carlos Setubal
- Biochemistry Department, Chemistry Institute, University of São Paulo, São Paulo, Brazil
- Biocomplexity Institute of Virginia Tech, Blacksburg, VA, USA
| | - Adrian Tett
- Department CIBIO, University of Trento, Trento, Italy
| | - Barbara Pardini
- Italian Institute for Genomic Medicine, Turin, Italy
- Department of Medical Sciences, University of Turin, Turin, Italy
| | - Maria Rescigno
- Mucosal Immunology and Microbiota Unit, Humanitas Research Hospital, Milan, Italy
| | - Levi Waldron
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY, USA
- Institute for Implementation Science in Population Health, City University of New York, New York, NY, USA
| | - Alessio Naccarati
- Italian Institute for Genomic Medicine, Turin, Italy
- Department of Molecular Biology of Cancer, Institute of Experimental Medicine, Prague, Czech Republic
| | - Nicola Segata
- Department CIBIO, University of Trento, Trento, Italy.
|
255
|
Wirbel J, Pyl PT, Kartal E, Zych K, Kashani A, Milanese A, Fleck JS, Voigt AY, Palleja A, Ponnudurai R, Sunagawa S, Coelho LP, Schrotz-King P, Vogtmann E, Habermann N, Niméus E, Thomas AM, Manghi P, Gandini S, Serrano D, Mizutani S, Shiroma H, Shiba S, Shibata T, Yachida S, Yamada T, Waldron L, Naccarati A, Segata N, Sinha R, Ulrich CM, Brenner H, Arumugam M, Bork P, Zeller G. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nat Med 2019; 25:679-689. [PMID: 30936547 PMCID: PMC7984229 DOI: 10.1038/s41591-019-0406-6] [Citation(s) in RCA: 620] [Impact Index Per Article: 124.0] [Received: 07/30/2018] [Accepted: 02/20/2019] [Indexed: 02/07/2023]
Abstract
Association studies have linked microbiome alterations with many human diseases. However, they have not always reported consistent results, thereby necessitating cross-study comparisons. Here, a meta-analysis of eight geographically and technically diverse fecal shotgun metagenomic studies of colorectal cancer (CRC, n = 768), which was controlled for several confounders, identified a core set of 29 species significantly enriched in CRC metagenomes (false discovery rate (FDR) < 1 × 10⁻⁵). CRC signatures derived from single studies maintained their accuracy in other studies. By training on multiple studies, we improved detection accuracy and disease specificity for CRC. Functional analysis of CRC metagenomes revealed enriched protein and mucin catabolism genes and depleted carbohydrate degradation genes. Moreover, we inferred elevated production of secondary bile acids from CRC metagenomes, suggesting a metabolic link between cancer-associated gut microbes and a fat- and meat-rich diet. Through extensive validations, this meta-analysis firmly establishes globally generalizable, predictive taxonomic and functional microbiome CRC signatures as a basis for future diagnostics.
Affiliation(s)
- Jakob Wirbel
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Paul Theodor Pyl
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medicine, University of Copenhagen, Copenhagen, Denmark.,Division of Surgery, Oncology and Pathology, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| | - Ece Kartal
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.,Molecular Medicine Partnership Unit, Heidelberg, Germany
| | - Konrad Zych
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Alireza Kashani
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Alessio Milanese
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Jonas S Fleck
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Anita Y Voigt
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.,The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Albert Palleja
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Ruby Ponnudurai
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Shinichi Sunagawa
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.,Department of Biology, ETH Zürich, Zürich, Switzerland
| | - Luis Pedro Coelho
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.,Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
| | - Petra Schrotz-King
- Division of Preventive Oncology, National Center for Tumor Diseases and German Cancer Research Center, Heidelberg, Germany
| | - Emily Vogtmann
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
| | - Nina Habermann
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Emma Niméus
- Division of Surgery, Oncology and Pathology, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden.,Division of Surgery, Department of Clinical Sciences Lund, Faculty of Medicine, Skane University Hospital, Lund, Sweden
| | - Andrew M Thomas
- Department CIBIO, University of Trento, Trento, Italy.,Biochemistry Department, Chemistry Institute, University of São Paulo, São Paulo, Brazil
| | - Paolo Manghi
- Department CIBIO, University of Trento, Trento, Italy
| | - Sara Gandini
- IEO, European Institute of Oncology IRCCS, Milan, Italy
| | | | - Sayaka Mizutani
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan.,Research Fellow of Japan Society for the Promotion of Science, Tokyo, Japan
| | - Hirotsugu Shiroma
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
| | - Satoshi Shiba
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
| | - Tatsuhiro Shibata
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan.,Laboratory of Molecular Medicine, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Shinichi Yachida
- Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan.,Department of Cancer Genome Informatics, Graduate School of Medicine/Faculty of Medicine, Osaka University, Osaka, Japan
| | - Takuji Yamada
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan.,PRESTO, Japan Science and Technology Agency, Saitama, Japan
| | - Levi Waldron
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY, USA.,Institute for Implementation Science in Population Health, City University of New York, New York, NY, USA
| | - Alessio Naccarati
- Italian Institute for Genomic Medicine, Turin, Italy.,Department of Molecular Biology of Cancer, Institute of Experimental Medicine, Prague, Czech Republic
| | - Nicola Segata
- Department CIBIO, University of Trento, Trento, Italy
| | - Rashmi Sinha
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
| | - Cornelia M Ulrich
- Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT, USA
| | - Hermann Brenner
- Division of Preventive Oncology, National Center for Tumor Diseases and German Cancer Research Center, Heidelberg, Germany.,Division of Clinical Epidemiology and Aging Research, German Cancer Research Center, Heidelberg, Germany.,German Cancer Consortium, German Cancer Research Center, Heidelberg, Germany
| | - Manimozhiyan Arumugam
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medicine, University of Copenhagen, Copenhagen, Denmark.,Faculty of Healthy Sciences, University of Southern Denmark, Odense, Denmark
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.,Molecular Medicine Partnership Unit, Heidelberg, Germany.,Max Delbrück Centre for Molecular Medicine, Berlin, Germany.,Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
| | - Georg Zeller
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
|
256
|
Gharaibeh RZ, Jobin C. Microbiota and cancer immunotherapy: in search of microbial signals. Gut 2019; 68:385-388. [PMID: 30530851 PMCID: PMC6580757 DOI: 10.1136/gutjnl-2018-317220] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Received: 10/16/2018] [Revised: 11/21/2018] [Accepted: 11/26/2018] [Indexed: 01/22/2023]
Affiliation(s)
- Raad Z Gharaibeh
- Department of Medicine, University of Florida, Gainesville, Florida, USA
| | - Christian Jobin
- Department of Medicine, University of Florida, Gainesville, Florida, USA,Department of Infectious Diseases and Immunology, University of Florida, Gainesville, Florida, USA,Department of Anatomy and Cell Biology, University of Florida, Gainesville, Florida, USA
|
257
|
Watson RL, de Koff EM, Bogaert D. Characterising the respiratory microbiome. Eur Respir J 2019; 53:13993003.01711-2018. [PMID: 30487204 DOI: 10.1183/13993003.01711-2018] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 09/06/2018] [Accepted: 11/05/2018] [Indexed: 12/27/2022]
Affiliation(s)
- Rebecca L Watson
- Center for Inflammation Research, Queens Medical Research Institute, University of Edinburgh, Edinburgh, UK.,Both authors contributed equally
| | - Emma M de Koff
- Dept of Pediatrics, Wilhelmina Children's Hospital, University Medical Center Utrecht, Utrecht, The Netherlands.,Spaarne Academy, Spaarne Gasthuis, Hoofddorp, The Netherlands.,Both authors contributed equally
| | - Debby Bogaert
- Center for Inflammation Research, Queens Medical Research Institute, University of Edinburgh, Edinburgh, UK.,Dept of Pediatrics, Wilhelmina Children's Hospital, University Medical Center Utrecht, Utrecht, The Netherlands
|
258
|
Abstract
Our understanding of the human gut microbiome continues to evolve at a rapid pace, but practical application of this knowledge is still in its infancy. This review discusses the type of studies that will be essential for translating microbiome research into targeted modulations with dedicated benefits for the human host.
Affiliation(s)
- Thomas S B Schmidt
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117 Heidelberg, Germany
| | - Jeroen Raes
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute, Herestraat 49, 3000 Leuven, Belgium; VIB, Center for Microbiology, Heerestraat 49, 3000 Leuven, Belgium.
| | - Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117 Heidelberg, Germany; Molecular Medicine Partnership Unit, University of Heidelberg and European Molecular Biology Laboratory, 69120 Heidelberg, Germany; Max-Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125 Berlin, Germany; Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany.
|
259
|
Bowyer RCE, Jackson MA, Le Roy CI, Ni Lochlainn M, Spector TD, Dowd JB, Steves CJ. Socioeconomic Status and the Gut Microbiome: A TwinsUK Cohort Study. Microorganisms 2019; 7:E17. [PMID: 30641975 PMCID: PMC6351927 DOI: 10.3390/microorganisms7010017] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Received: 12/19/2018] [Revised: 01/07/2019] [Accepted: 01/08/2019] [Indexed: 12/12/2022] Open
Abstract
Socioeconomic inequalities in health and mortality are well established, but the biological mechanisms underlying these associations are less understood. In parallel, the gut microbiome is emerging as a potentially important determinant of human health, but little is known about its broader environmental and social determinants. We test the association between gut microbiota composition and individual- and area-level socioeconomic factors in a well-characterized twin cohort. In this study, 1672 healthy volunteers from the twin registry TwinsUK had data available for at least one socioeconomic measure, existing fecal 16S rRNA microbiota data, and all considered co-variables. Associations with socioeconomic status (SES) were robust to adjustment for known health correlates of the microbiome; conversely, these health-microbiome associations partially attenuated with adjustment for SES. Twins discordant for IMD (Index of Multiple Deprivation) were shown to differ significantly by measures of compositional dissimilarity, with a suggestion that the greater the difference in twin pair IMD, the greater the dissimilarity of their microbiota. Future research should explore how SES might influence the composition of the gut microbiota and its potential role as a mediator of differences associated with SES.
Affiliation(s)
- Ruth C E Bowyer
- The Department of Twin Research, Kings College London, 3-4th Floor South Wing Block D, St Thomas' Hospital, Westminster Bridge Road, London SE1 7EH, UK.
| | - Matthew A Jackson
- The Department of Twin Research, Kings College London, 3-4th Floor South Wing Block D, St Thomas' Hospital, Westminster Bridge Road, London SE1 7EH, UK.
- Kennedy Institute of Rheumatology, University of Oxford, Oxford OX1 3QR, UK.
| | - Caroline I Le Roy
- The Department of Twin Research, Kings College London, 3-4th Floor South Wing Block D, St Thomas' Hospital, Westminster Bridge Road, London SE1 7EH, UK.
| | - Mary Ni Lochlainn
- The Department of Twin Research, Kings College London, 3-4th Floor South Wing Block D, St Thomas' Hospital, Westminster Bridge Road, London SE1 7EH, UK.
- Clinical Age Research Unit, Kings College Hospital Foundation Trust, London SE5 9RS, UK.
| | - Tim D Spector
- The Department of Twin Research, Kings College London, 3-4th Floor South Wing Block D, St Thomas' Hospital, Westminster Bridge Road, London SE1 7EH, UK.
| | - Jennifer B Dowd
- Department of Global Health & Social Medicine, King's Building, King's College London, Strand, London WC2R 2LS, UK.
- CUNY Graduate School of Public Health and Health Policy, 55 W 125th Street, New York, NY 10027, USA.
| | - Claire J Steves
- The Department of Twin Research, Kings College London, 3-4th Floor South Wing Block D, St Thomas' Hospital, Westminster Bridge Road, London SE1 7EH, UK.
- Department of Ageing and Health, St Thomas' Hospital, 9th floor, North Wing, Westminster Bridge Road, London SE1 7EH, UK.
|
260
|
Xiao J, Chen L, Yu Y, Zhang X, Chen J. A Phylogeny-Regularized Sparse Regression Model for Predictive Modeling of Microbial Community Data. Front Microbiol 2018; 9:3112. [PMID: 30619188 PMCID: PMC6305753 DOI: 10.3389/fmicb.2018.03112] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/31/2018] [Accepted: 12/03/2018] [Indexed: 12/16/2022] Open
Abstract
Fueled by technological advancement, there has been a surge of human microbiome studies surveying the microbial communities associated with the human body and their links with health and disease. As a complement to the human genome, the human microbiome holds great potential for precision medicine. Efficient predictive models based on microbiome data could be potentially used in various clinical applications such as disease diagnosis, patient stratification and drug response prediction. One important characteristic of the microbial community data is the phylogenetic tree that relates all the microbial taxa based on their evolutionary history. The phylogenetic tree is an informative prior for more efficient prediction since the microbial community changes are usually not randomly distributed on the tree but tend to occur in clades at varying phylogenetic depths (clustered signal). Although community-wide changes are possible for some conditions, it is also likely that the community changes are only associated with a small subset of "marker" taxa (sparse signal). Unfortunately, predictive models of microbial community data taking into account both the sparsity and the tree structure remain under-developed. In this paper, we propose a predictive framework to exploit sparse and clustered microbiome signals using a phylogeny-regularized sparse regression model. Our approach is motivated by evolutionary theory, where a natural correlation structure among microbial taxa exists according to the phylogenetic relationship. A novel phylogeny-based smoothness penalty is proposed to smooth the coefficients of the microbial taxa with respect to the phylogenetic tree. Using simulated and real datasets, we show that our method achieves better prediction performance than competing sparse regression methods for sparse and clustered microbiome signals.
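As a rough illustration of the kind of objective this abstract describes, and not necessarily the exact formulation used in the paper, a sparse regression with a phylogeny-based smoothness penalty can be written in the following generic form. All symbols are assumptions introduced here: y is the outcome vector for n samples, X is the n × p matrix of taxa abundances, β is the coefficient vector, λ₁ and λ₂ are non-negative tuning parameters, and w_jk ≥ 0 is a similarity weight that grows as taxa j and k sit closer together on the phylogenetic tree.

```latex
\hat{\beta}
  = \arg\min_{\beta \in \mathbb{R}^{p}}
    \frac{1}{2n} \lVert y - X\beta \rVert_2^2
    + \lambda_1 \lVert \beta \rVert_1
    + \frac{\lambda_2}{2} \sum_{j<k} w_{jk} \, (\beta_j - \beta_k)^2
```

In this sketch the λ₁ term drives most coefficients to zero (the sparse signal), while the λ₂ term pulls the coefficients of closely related taxa toward each other (the clustered signal).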
Affiliation(s)
- Jian Xiao
- Division of Biomedical Statistics and Informatics, Center for Individualized Medicine, Mayo Clinic Rochester, MN, United States.,School of Statistics and Mathematics Zhongnan University of Economics and Law, Wuhan, China
| | - Li Chen
- Department of Health Outcomes Research and Policy, Harrison School of Pharmacy, Auburn University Auburn, AL, United States
| | - Yue Yu
- Division of Biomedical Statistics and Informatics, Center for Individualized Medicine, Mayo Clinic Rochester, MN, United States
| | - Xianyang Zhang
- Department of Statistics, Texas A&M University College Station, TX, United States
| | - Jun Chen
- Division of Biomedical Statistics and Informatics, Center for Individualized Medicine, Mayo Clinic Rochester, MN, United States
|
261
|
Forbes JD, Chen CY, Knox NC, Marrie RA, El-Gabalawy H, de Kievit T, Alfa M, Bernstein CN, Van Domselaar G. A comparative study of the gut microbiota in immune-mediated inflammatory diseases-does a common dysbiosis exist? Microbiome 2018; 6:221. [PMID: 30545401 PMCID: PMC6292067 DOI: 10.1186/s40168-018-0603-4] [Citation(s) in RCA: 256] [Impact Index Per Article: 42.7] [Received: 03/28/2018] [Accepted: 11/25/2018] [Indexed: 05/12/2023]
Abstract
BACKGROUND Immune-mediated inflammatory disease (IMID) represents a substantial health concern. It is widely recognized that IMID patients are at a higher risk for developing secondary inflammation-related conditions. While an ambiguous etiology is common to all IMIDs, in recent years, considerable knowledge has emerged regarding the plausible role of the gut microbiome in IMIDs. This study used 16S rRNA gene amplicon sequencing to compare the gut microbiota of patients with Crohn's disease (CD; N = 20), ulcerative colitis (UC; N = 19), multiple sclerosis (MS; N = 19), and rheumatoid arthritis (RA; N = 21) versus healthy controls (HC; N = 23). Biological replicates were collected from participants within a 2-month interval. This study aimed to identify common (or unique) taxonomic biomarkers of IMIDs using both differential abundance testing and a machine learning approach. RESULTS Significant microbial community differences between cohorts were observed (pseudo F = 4.56; p = 0.01). Richness and diversity were significantly different between cohorts (pFDR < 0.001) and were lowest in CD while highest in HC. Abundances of Actinomyces, Eggerthella, Clostridium III, Faecalicoccus, and Streptococcus (pFDR < 0.001) were significantly higher in all disease cohorts relative to HC, whereas significantly lower abundances were observed for Gemmiger, Lachnospira, and Sporobacter (pFDR < 0.001). Several taxa were found to be differentially abundant in IMIDs versus HC including significantly higher abundances of Intestinibacter in CD, Bifidobacterium in UC, and unclassified Erysipelotrichaceae in MS and significantly lower abundances of Coprococcus in CD, Dialister in MS, and Roseburia in RA. A machine learning approach to classify disease versus HC was highest for CD (AUC = 0.93 and AUC = 0.95 for OTU and genus features, respectively) followed by MS, RA, and UC. Gemmiger and Faecalicoccus were identified as important features for classification of subjects to CD and HC. 
In general, features identified by differential abundance testing were consistent with machine learning feature importance. CONCLUSIONS This study identified several gut microbial taxa with differential abundance patterns common to IMIDs. We also found differentially abundant taxa between IMIDs. These taxa may serve as biomarkers for the detection and diagnosis of IMIDs and suggest there may be a common component to IMID etiology.
Affiliation(s)
- Jessica D. Forbes
- Department of Internal Medicine, University of Manitoba, Winnipeg, MB Canada
- University of Manitoba IBD Clinical and Research Centre, Winnipeg, MB Canada
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, MB R3E 3R2 Canada
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB Canada
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada
| | - Chih-yu Chen
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, MB R3E 3R2 Canada
| | - Natalie C. Knox
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, MB R3E 3R2 Canada
| | - Ruth-Ann Marrie
- Department of Internal Medicine, University of Manitoba, Winnipeg, MB Canada
- Department of Community Health Sciences, University of Manitoba, Winnipeg, MB Canada
| | - Hani El-Gabalawy
- Department of Internal Medicine, University of Manitoba, Winnipeg, MB Canada
- Arthritis Centre, University of Manitoba, Winnipeg, MB Canada
| | - Teresa de Kievit
- Department of Microbiology, University of Manitoba, Winnipeg, MB Canada
| | - Michelle Alfa
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB Canada
| | - Charles N. Bernstein
- Department of Internal Medicine, University of Manitoba, Winnipeg, MB Canada
- University of Manitoba IBD Clinical and Research Centre, Winnipeg, MB Canada
| | - Gary Van Domselaar
- University of Manitoba IBD Clinical and Research Centre, Winnipeg, MB Canada
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, MB R3E 3R2 Canada
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB Canada
| |
|
262
|
Maltez Thomas A, Prata Lima F, Maria Silva Moura L, Maria da Silva A, Dias-Neto E, Setubal JC. Comparative Metagenomics. Methods Mol Biol 2018; 1704:243-260. [PMID: 29277868 DOI: 10.1007/978-1-4939-7463-4_8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
Abstract
Thanks in large part to newer, better, and cheaper DNA sequencing technologies, an enormous number of metagenomic sequence datasets have been and continue to be generated, covering a huge variety of environmental niches, including several different human body sites. Comparing these metagenomes and identifying their commonalities and differences is a challenging task, due not only to the large amounts of data, but also because there are several methodological considerations that need to be taken into account to ensure an appropriate and sound comparison between datasets. In this chapter, we describe current techniques aimed at comparing metagenomes generated by 16S ribosomal RNA and shotgun DNA sequencing, emphasizing methodological issues that arise in these comparative studies. We provide a detailed case study to illustrate some of these techniques using data from the Human Microbiome Project comparing the microbial communities from ten buccal mucosa samples with ten tongue dorsum samples in terms of alpha diversity, beta diversity, and their taxonomic and functional profiles.
Affiliation(s)
- Andrew Maltez Thomas
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, São Paulo, SP, Brazil.,Medical Genomics Laboratory, CIPE/A.C. Camargo Cancer Center, São Paulo, SP, Brazil
| | - Felipe Prata Lima
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, São Paulo, SP, Brazil.,Instituto Federal de Alagoas, Maceió, Alagoas, Brazil
| | - Livia Maria Silva Moura
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, São Paulo, SP, Brazil
| | - Aline Maria da Silva
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, São Paulo, SP, Brazil
| | - Emmanuel Dias-Neto
- Medical Genomics Laboratory, CIPE/A.C. Camargo Cancer Center, São Paulo, SP, Brazil.,Lab. of Neurosciences (LIM-27) Alzira Denise Hertzog Silva, Institute of Psychiatry, Faculdade de Medicina, Universidade de São Paulo (USP), São Paulo, SP, Brazil
| | - João C Setubal
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes, 748 room 909, 05508-000, São Paulo, SP, Brazil.
| |
|
263
|
Sinha R, Ahsan H, Blaser M, Caporaso JG, Carmical JR, Chan AT, Fodor A, Gail MH, Harris CC, Helzlsouer K, Huttenhower C, Knight R, Kong HH, Lai GY, Hutchinson DLS, Le Marchand L, Li H, Orlich MJ, Shi J, Truelove A, Verma M, Vogtmann E, White O, Willett W, Zheng W, Mahabir S, Abnet C. Next steps in studying the human microbiome and health in prospective studies, Bethesda, MD, May 16-17, 2017. MICROBIOME 2018; 6:210. [PMID: 30477563 PMCID: PMC6257978 DOI: 10.1186/s40168-018-0596-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/22/2018] [Accepted: 11/15/2018] [Indexed: 06/09/2023]
Abstract
The National Cancer Institute (NCI) sponsored a 2-day workshop, "Next Steps in Studying the Human Microbiome and Health in Prospective Studies," in Bethesda, Maryland, May 16-17, 2017. The workshop brought together researchers in the field to discuss the challenges of conducting microbiome studies, including study design, collection and processing of samples, bioinformatics and statistical methods, publishing results, and ensuring reproducibility of published results. The presenters emphasized the great potential of microbiome research in understanding the etiology of cancer. This report summarizes the workshop and presents practical suggestions for conducting microbiome studies, from workshop presenters, moderators, and participants.
Affiliation(s)
- Rashmi Sinha
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, 20892, USA.
| | - Habibul Ahsan
- Comprehensive Cancer Center University of Chicago Medicine and Biological Sciences, Chicago, IL, 60615, USA
| | - Martin Blaser
- Departments of Medicine and Microbiology, New York University Langone Medical Center, New York, NY, 10016, USA
| | - J Gregory Caporaso
- Pathogen and Microbiome Institute and Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
| | - Joseph Russell Carmical
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Andrew T Chan
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA, 02114, USA
- Division of Gastroenterology, Massachusetts General Hospital, Boston, MA, 02114, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, and Harvard Medical School, Boston, MA, 02115, USA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, 02142, USA
| | - Anthony Fodor
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
| | - Mitchell H Gail
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, 20892, USA
| | - Curtis C Harris
- Laboratory of Human Carcinogenesis, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Kathy Helzlsouer
- Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Curtis Huttenhower
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, 02142, USA
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Rob Knight
- Center for Microbiome Innovation, and Departments of Pediatrics and Computer Science and Engineering, University of California San Diego, San Diego, CA, 92093, USA
| | - Heidi H Kong
- Dermatology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Gabriel Y Lai
- Environmental Epidemiology Branch, National Cancer Institute, Bethesda, MD, 20892, USA
| | - Diane Leigh Smith Hutchinson
- Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Loic Le Marchand
- Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, 96813, USA
| | - Hongzhe Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
| | - Michael J Orlich
- School of Public Health and Department of Preventive Medicine, School of Medicine, Loma Linda University, Loma Linda, CA, 92350, USA
| | - Jianxin Shi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, 20892, USA
| | | | - Mukesh Verma
- Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Emily Vogtmann
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, 20892, USA
| | - Owen White
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - Walter Willett
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, and Harvard Medical School, Boston, MA, 02115, USA
- Departments of Epidemiology and Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Wei Zheng
- Division of Epidemiology, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Somdat Mahabir
- Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Christian Abnet
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, 20892, USA
| |
|
264
|
Composition Analysis and Feature Selection of the Oral Microbiota Associated with Periodontal Disease. BIOMED RESEARCH INTERNATIONAL 2018; 2018:3130607. [PMID: 30581850 PMCID: PMC6276491 DOI: 10.1155/2018/3130607] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 05/30/2018] [Revised: 10/10/2018] [Accepted: 11/04/2018] [Indexed: 12/15/2022]
Abstract
Periodontitis is an inflammatory disease involving complex interactions between oral microorganisms and the host immune response. Understanding the structure of the microbiota community associated with periodontitis is essential for improving classifications and diagnoses of various types of periodontal diseases and will facilitate clinical decision-making. In this study, we used a 16S rRNA metagenomics approach to investigate and compare the compositions of the microbiota communities from 76 subgingival plaque samples, including 26 from healthy individuals and 50 from patients with periodontitis. Furthermore, we propose a novel feature selection algorithm for selecting the more informative features from many variables; a combination of these features and machine learning methods was used to construct models for predicting the health status of patients with periodontal disease. We identified a total of 12 phyla, 124 genera, and 355 species and observed differences between health- and periodontitis-associated bacterial communities at all phylogenetic levels. We discovered that the genera Porphyromonas, Treponema, Tannerella, Filifactor, and Aggregatibacter were more abundant in patients with periodontal disease, whereas Streptococcus, Haemophilus, Capnocytophaga, Gemella, Campylobacter, and Granulicatella were found at higher levels in healthy controls. Using our feature selection algorithm, random forests performed better in terms of predictive power than other methods and consumed the least amount of computational time.
|
266
|
Franzosa EA, McIver LJ, Rahnavard G, Thompson LR, Schirmer M, Weingart G, Lipson KS, Knight R, Caporaso JG, Segata N, Huttenhower C. Species-level functional profiling of metagenomes and metatranscriptomes. Nat Methods 2018; 15:962-968. [PMID: 30377376 PMCID: PMC6235447 DOI: 10.1038/s41592-018-0176-y] [Citation(s) in RCA: 896] [Impact Index Per Article: 149.3] [Received: 05/23/2017] [Accepted: 07/17/2018] [Indexed: 01/06/2023]
Abstract
Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types.
Affiliation(s)
- Eric A Franzosa
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.,The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Lauren J McIver
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.,The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Gholamali Rahnavard
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.,The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Luke R Thompson
- Department of Pediatrics, University of California San Diego, San Diego, CA, USA
| | - Melanie Schirmer
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.,The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - George Weingart
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | | | - Rob Knight
- Department of Pediatrics, University of California San Diego, San Diego, CA, USA.,Department of Computer Science & Engineering, University of California San Diego, San Diego, CA, USA
| | - J Gregory Caporaso
- Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Nicola Segata
- Centre for Integrative Biology, University of Trento, Trento, Italy
| | - Curtis Huttenhower
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.,The Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
267
|
Tackmann J, Arora N, Schmidt TSB, Rodrigues JFM, von Mering C. Ecologically informed microbial biomarkers and accurate classification of mixed and unmixed samples in an extensive cross-study of human body sites. MICROBIOME 2018; 6:192. [PMID: 30355348 PMCID: PMC6201589 DOI: 10.1186/s40168-018-0565-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 04/20/2018] [Accepted: 09/28/2018] [Indexed: 06/02/2023]
Abstract
BACKGROUND The identification of body site-specific microbial biomarkers and their use for classification tasks have promising applications in medicine, microbial ecology, and forensics. Previous studies have characterized site-specific microbiota and shown that sample origin can be accurately predicted by microbial content. However, these studies were usually restricted to single datasets with consistent experimental methods and conditions, as well as comparatively small sample numbers. The effects of study-specific biases and statistical power on classification performance and biomarker identification thus remain poorly understood. Furthermore, reliable detection in mixtures of different body sites or with noise from environmental contamination has rarely been investigated thus far. Finally, the impact of ecological associations between microbes on biomarker discovery was usually not considered in previous work. RESULTS Here we present the analysis of one of the largest cross-study sequencing datasets of microbial communities from human body sites (15,082 samples from 57 publicly available studies). We show that training a Random Forest Classifier on this aggregated dataset increases prediction performance for body sites by 35% compared to a single-study classifier. Using simulated datasets, we further demonstrate that the source of different microbial contributions in mixtures of different body sites or with soil can be detected starting at 1% of the total microbial community. We apply a biomarker selection method that excludes indirect environmental associations driven by microbe-microbe associations, yielding a parsimonious set of highly predictive taxa including novel biomarkers and excluding many previously reported taxa. We find a considerable fraction of unclassified biomarkers ("microbial dark matter") and observe that negatively associated taxa have a surprisingly high impact on classification performance. 
We further detect a significant enrichment of rod-shaped, motile, and sporulating taxa for feces biomarkers, consistent with a highly competitive environment. CONCLUSIONS Our machine learning model shows strong body site classification performance, both in single-source samples and mixtures, making it promising for tasks requiring high accuracy, such as forensic applications. We report a core set of ecologically informed biomarkers, inferred across a wide range of experimental protocols and conditions, providing the most concise, general, and least biased overview of body site-associated microbes to date.
Affiliation(s)
- Janko Tackmann
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | - Natasha Arora
- Zurich Institute of Forensic Medicine, University of Zurich, Zurich, Switzerland
| | - Thomas Sebastian Benedikt Schmidt
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Present address: European Molecular Biology Laboratory, Heidelberg, Germany
| | | | - Christian von Mering
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland.
| |
|
268
|
Wassan JT, Wang H, Browne F, Zheng H. A Comprehensive Study on Predicting Functional Role of Metagenomes Using Machine Learning Methods. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 16:751-763. [PMID: 30040657 DOI: 10.1109/tcbb.2018.2858808] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/08/2023]
Abstract
"Metagenomics" is the study of genomic sequences obtained directly from environmental microbial communities with the aim of linking their structures with functional roles. The field has advanced at an unprecedented pace thanks to high-throughput omics sequencing. The outcomes of sequencing are biologically rich data sets. Metagenomic data, in which microbial species outnumber microbial samples, lead to the "curse of dimensionality". Hence the focus in metagenomics studies has moved towards developing efficient computational models using Machine Learning (ML), reducing the computational cost. In this paper, we comprehensively assessed various ML approaches to classifying high-dimensional human microbiota effectively into their functional phenotypes. We propose the application of embedded feature selection methods, namely, Extreme Gradient Boosting and Penalized Logistic Regression, to determine important species. The resultant feature set enhanced the performance of one of the most popular state-of-the-art methods, Random Forest (RF), over metagenomic studies. Experimental results indicate that the proposed method achieved the best results in terms of accuracy and area under the Receiver Operating Characteristic curve (ROC-AUC), with a major improvement in processing time. It outperformed other feature selection methods of filters or wrappers over RF and classifiers such as Support Vector Machine (SVM), Extreme Learning Machine (ELM), and k-Nearest Neighbors (k-NN).
|
269
|
Vidulin V, Šmuc T, Džeroski S, Supek F. The evolutionary signal in metagenome phyletic profiles predicts many gene functions. MICROBIOME 2018; 6:129. [PMID: 29991352 PMCID: PMC6040064 DOI: 10.1186/s40168-018-0506-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/07/2017] [Accepted: 06/19/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND The function of many genes is still not known even in model organisms. An increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function in a systematic manner. RESULTS We evaluated if the evolutionary signal contained in metagenome phyletic profiles (MPP) is predictive of a broad array of gene functions. The MPPs are an encoding of environmental DNA sequencing data that consists of relative abundances of gene families across metagenomes. We find that such MPPs can accurately predict 826 Gene Ontology functional categories, while drawing on human gut microbiomes, ocean metagenomes, and DNA sequences from various other engineered and natural environments. Overall, in this task, the MPPs are highly accurate, and moreover they provide coverage for a set of Gene Ontology terms largely complementary to standard phylogenetic profiles, derived from fully sequenced genomes. We also find that metagenomes approximated from taxon relative abundance obtained via 16S rRNA gene sequencing may provide surprisingly useful predictive models. Crucially, the MPPs derived from different types of environments can infer distinct, non-overlapping sets of gene functions and therefore complement each other. Consistently, simulations on > 5000 metagenomes indicate that the amount of data is not in itself critical for maximizing predictive accuracy, while the diversity of sampled environments appears to be the critical factor for obtaining robust models. CONCLUSIONS In past work, metagenomics has provided invaluable insight into ecology of various habitats, into diversity of microbial life and also into human health and disease mechanisms. We propose that environmental DNA sequencing additionally constitutes a useful tool to predict biological roles of genes, yielding inferences out of reach for existing comparative genomics approaches.
Affiliation(s)
- Vedrana Vidulin
- Faculty of Information Studies, 8000 Novo Mesto, Slovenia
- Division of Electronics, Rudjer Boskovic Institute, 10000 Zagreb, Croatia
- Department of Knowledge Technologies, Jozef Stefan Institute, 1000 Ljubljana, Slovenia
| | - Tomislav Šmuc
- Division of Electronics, Rudjer Boskovic Institute, 10000 Zagreb, Croatia
| | - Sašo Džeroski
- Department of Knowledge Technologies, Jozef Stefan Institute, 1000 Ljubljana, Slovenia
| | - Fran Supek
- Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
| |
|
270
|
Macintyre PD, Van Niekerk A, Dobrowolski MP, Tsakalos JL, Mucina L. Impact of ecological redundancy on the performance of machine learning classifiers in vegetation mapping. Ecol Evol 2018; 8:6728-6737. [PMID: 30038769 PMCID: PMC6053567 DOI: 10.1002/ece3.4176] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/18/2017] [Revised: 02/26/2018] [Accepted: 04/18/2018] [Indexed: 11/18/2022] Open
Abstract
Vegetation maps are models of the real vegetation patterns and are considered important tools in conservation and management planning. Maps created through traditional methods can be expensive and time-consuming, thus, new more efficient approaches are needed. The prediction of vegetation patterns using machine learning shows promise, but many factors may impact on its performance. One important factor is the nature of the vegetation-environment relationship assessed and ecological redundancy. We used two datasets with known ecological redundancy levels (strength of the vegetation-environment relationship) to evaluate the performance of four machine learning (ML) classifiers (classification trees, random forests, support vector machines, and nearest neighbor). These models used climatic and soil variables as environmental predictors with pretreatment of the datasets (principal component analysis and feature selection) and involved three spatial scales. We show that the ML classifiers produced more reliable results in regions where the vegetation-environment relationship is stronger as opposed to regions characterized by redundant vegetation patterns. The pretreatment of datasets and reduction in prediction scale had a substantial influence on the predictive performance of the classifiers. The use of ML classifiers to create potential vegetation maps shows promise as a more efficient way of vegetation modeling. The difference in performance between areas with poorly versus well-structured vegetation-environment relationships shows that some level of understanding of the ecology of the target region is required prior to their application. Even in areas with poorly structured vegetation-environment relationships, it is possible to improve classifier performance by either pretreating the dataset or reducing the spatial scale of the predictions.
Affiliation(s)
- Paul D. Macintyre
- School of Biological Sciences, The University of Western Australia, Perth, Crawley, WA, Australia
| | - Adriaan Van Niekerk
- Centre for Geographical Analysis, Stellenbosch University, Matieland, Stellenbosch, South Africa
| | - Mark P. Dobrowolski
- School of Biological Sciences, The University of Western Australia, Perth, Crawley, WA, Australia
- Iluka Resources Limited, Perth, WA, Australia
| | - James L. Tsakalos
- School of Biological Sciences, The University of Western Australia, Perth, Crawley, WA, Australia
| | - Ladislav Mucina
- School of Biological Sciences, The University of Western Australia, Perth, Crawley, WA, Australia
- Centre for Geographical Analysis, Stellenbosch University, Matieland, Stellenbosch, South Africa
| |
|
271
|
Asgari E, Garakani K, McHardy AC, Mofrad MRK. MicroPheno: predicting environments and host phenotypes from 16S rRNA gene sequencing using a k-mer based representation of shallow sub-samples. Bioinformatics 2018; 34:i32-i42. [PMID: 29950008 PMCID: PMC6022683 DOI: 10.1093/bioinformatics/bty296] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Indexed: 02/07/2023] Open
Abstract
Motivation: Microbial communities play important roles in the function and maintenance of various biosystems, ranging from the human body to the environment. A major challenge in microbiome research is the classification of microbial communities of different environments or host phenotypes. The most common and cost-effective approach for such studies to date is 16S rRNA gene sequencing. Recent falls in sequencing costs have increased the demand for simple, efficient and accurate methods for rapid detection or diagnosis with proved applications in medicine, agriculture and forensic science. We describe a reference- and alignment-free approach for predicting environments and host phenotypes from 16S rRNA gene sequencing based on k-mer representations that benefits from a bootstrapping framework for investigating the sufficiency of shallow sub-samples. Deep learning methods as well as classical approaches were explored for predicting environments and host phenotypes. Results: A k-mer distribution of shallow sub-samples outperformed Operational Taxonomic Unit (OTU) features in the tasks of body-site identification and Crohn's disease prediction. Aside from being more accurate, using k-mer features in shallow sub-samples (i) allows skipping the computationally costly sequence alignments required in OTU-picking and (ii) provides a proof of concept for the sufficiency of shallow and short-length 16S rRNA sequencing for phenotype prediction. In addition, k-mer features predicted representative 16S rRNA gene sequences of 18 ecological environments, and 5 organismal environments with high macro-F1 scores of 0.88 and 0.87. For large datasets, deep learning outperformed classical methods such as Random Forest and Support Vector Machine. Availability and implementation: The software and datasets are available at https://llp.berkeley.edu/micropheno. Supplementary information: Supplementary data are available at Bioinformatics online.
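The k-mer representation of a shallow sub-sample mentioned in this abstract can be illustrated with a minimal sketch. This is not the MicroPheno implementation: the function name `kmer_profile`, the toy reads, and the choices of `k=3` and a sub-sample of two reads are assumptions made only for illustration.

```python
import random
from collections import Counter
from itertools import product


def kmer_profile(reads, k=3, sample_size=2, seed=0):
    """Normalized k-mer frequencies from a random sub-sample of reads.

    Hypothetical helper for illustration; parameter values are assumptions.
    """
    rng = random.Random(seed)
    # Shallow sub-sampling: keep only a few reads from the sample.
    sample = rng.sample(reads, min(sample_size, len(reads)))
    counts = Counter()
    for read in sample:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers with ambiguous bases
                counts[kmer] += 1
    total = sum(counts.values())
    # Fixed vocabulary of all 4**k k-mers so every sample gets the same features.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[v] / total if total else 0.0 for v in vocab]


reads = ["ACGTACGTAC", "TTGACCGTAA", "ACGGGTTACA", "CCGTAACGTT"]
profile = kmer_profile(reads, k=3, sample_size=2)
print(len(profile))  # 64 features for k=3
```

The resulting fixed-length vector can be passed to any standard classifier, which is what makes the approach reference- and alignment-free: no OTU picking is needed to build the features.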
Affiliation(s)
- Ehsaneddin Asgari
- Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
| | - Kiavash Garakani
- Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA
| | - Alice C McHardy
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
| | - Mohammad R K Mofrad
- Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA
- Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Lab, Berkeley, CA, USA
| |
|
272
|
Xiao J, Chen L, Johnson S, Yu Y, Zhang X, Chen J. Predictive Modeling of Microbiome Data Using a Phylogeny-Regularized Generalized Linear Mixed Model. Front Microbiol 2018; 9:1391. [PMID: 29997602 PMCID: PMC6030386 DOI: 10.3389/fmicb.2018.01391] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 02/09/2018] [Accepted: 06/06/2018] [Indexed: 12/21/2022] Open
Abstract
Recent human microbiome studies have revealed an essential role of the human microbiome in health and disease, opening up the possibility of building microbiome-based predictive models for individualized medicine. One unique characteristic of microbiome data is the existence of a phylogenetic tree that relates all the microbial species. It has frequently been observed that a cluster or clusters of bacteria at varying phylogenetic depths are associated with some clinical or biological outcome due to shared biological function (clustered signal). Moreover, in many cases, we observe a community-level change, where a large number of functionally interdependent species are associated with the outcome (dense signal). We thus develop "glmmTree," a prediction method based on a generalized linear mixed model framework, for capturing clustered and dense microbiome signals. glmmTree uses the similarity between microbiomes, which is defined based on the microbiome composition and the phylogenetic tree, to predict the outcome. The effects of other predictive variables (e.g., age, sex) can be incorporated readily in the regression framework. Additional tuning parameters enable a data-adaptive approach to capture signals at different phylogenetic depth and abundance level. Simulation studies and real data applications demonstrated that "glmmTree" outperformed existing methods in the dense and clustered signal scenarios.
Affiliation(s)
- Jian Xiao
- Division of Biomedical Statistics and Informatics and Center for Individualized Medicine, Mayo Clinic, Rochester, MN, United States
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Hubei, China
| | - Li Chen
- Department of Health Outcomes Research and Policy, Harrison School of Pharmacy, Auburn University, Auburn, AL, United States
| | - Stephen Johnson
- Division of Biomedical Statistics and Informatics and Center for Individualized Medicine, Mayo Clinic, Rochester, MN, United States
| | - Yue Yu
- Division of Biomedical Statistics and Informatics and Center for Individualized Medicine, Mayo Clinic, Rochester, MN, United States
| | - Xianyang Zhang
- Department of Statistics, Texas A&M University, College Station, TX, United States
| | - Jun Chen
- Division of Biomedical Statistics and Informatics and Center for Individualized Medicine, Mayo Clinic, Rochester, MN, United States
| |
|
273
|
Oudah M, Henschel A. Taxonomy-aware feature engineering for microbiome classification. BMC Bioinformatics 2018; 19:227. [PMID: 29907097 PMCID: PMC6003080 DOI: 10.1186/s12859-018-2205-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 03/12/2017] [Accepted: 05/15/2018] [Indexed: 12/17/2022] Open
Abstract
Background: What is a healthy microbiome? The pursuit of this and many related questions, especially in light of the recently recognized microbial component in a wide range of diseases, has sparked a surge in metagenomic studies. These diseases are often not simply attributable to a single pathogen but rather are the result of complex ecological processes. Relatedly, the increasing DNA sequencing depth and number of samples in metagenomic case-control studies have enabled the application of powerful statistical methods, e.g. Machine Learning approaches. For the latter, the feature space is typically shaped by the relative abundances of operational taxonomic units, as determined by cost-effective phylogenetic marker gene profiles. While a substantial body of microbiome/microbiota research involves unsupervised and supervised Machine Learning, very little attention has been paid to feature selection and engineering. Results: We here propose the first algorithm to exploit phylogenetic hierarchy (i.e. an all-encompassing taxonomy) in feature engineering for microbiota classification. The rationale is to exploit the often mono- or oligophyletic distribution of relevant (but hidden) traits by virtue of taxonomic abstraction. The algorithm is embedded in a comprehensive microbiota classification pipeline, which we applied to a diverse range of datasets, distinguishing healthy from diseased microbiota samples. Conclusion: We demonstrate substantial improvements over the state-of-the-art microbiota classification tools in terms of classification accuracy, regardless of the actual Machine Learning technique while using drastically reduced feature spaces. Moreover, generalized features bear great explanatory value: they provide a concise description of conditions and thus help to provide pathophysiological insights. Indeed, the automatically and reproducibly derived features are consistent with previously published domain expert analyses.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2205-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mai Oudah
- Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
| | - Andreas Henschel
- Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| |
|
274
|
Ruuskanen MO, St Pierre KA, St Louis VL, Aris-Brosou S, Poulain AJ. Physicochemical Drivers of Microbial Community Structure in Sediments of Lake Hazen, Nunavut, Canada. Front Microbiol 2018; 9:1138. [PMID: 29922252 PMCID: PMC5996194 DOI: 10.3389/fmicb.2018.01138] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 01/17/2018] [Accepted: 05/14/2018] [Indexed: 11/13/2022] Open
Abstract
The Arctic is undergoing rapid environmental change, potentially affecting the physicochemical constraints of microbial communities that play a large role in both carbon and nutrient cycling in lacustrine environments. However, the microbial communities in such Arctic environments have seldom been studied, and the drivers of their composition are poorly characterized. To address these gaps, we surveyed the biologically active surface sediments in Lake Hazen, the largest lake by volume north of the Arctic Circle, and a small lake and shoreline pond in its watershed. High-throughput amplicon sequencing of the 16S rRNA gene uncovered a community dominated by Proteobacteria, Bacteroidetes, and Chloroflexi, similar to those found in other cold and oligotrophic lake sediments. We also show that the microbial community structure in this Arctic polar desert is shaped by pH and redox gradients. This study lays the groundwork for predicting how sediment microbial communities in the Arctic could respond as climate change proceeds to alter their physicochemical constraints.
Affiliation(s)
| | - Kyra A St Pierre
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
| | - Vincent L St Louis
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
| | - Stéphane Aris-Brosou
- Department of Biology, University of Ottawa, Ottawa, ON, Canada; Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, Canada
| | | |
|
275
|
Wang Y, Fu L, Ren J, Yu Z, Chen T, Sun F. Identifying Group-Specific Sequences for Microbial Communities Using Long k-mer Sequence Signatures. Front Microbiol 2018; 9:872. [PMID: 29774017 PMCID: PMC5943621 DOI: 10.3389/fmicb.2018.00872] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/15/2017] [Accepted: 04/16/2018] [Indexed: 12/19/2022] Open
Abstract
Comparing metagenomic samples is crucial for understanding microbial communities. For different groups of microbial communities, such as human gut metagenomic samples from patients with a certain disease and healthy controls, identifying group-specific sequences offers essential information for potential biomarker discovery. A sequence that is present, or rich, in one group, but absent, or scarce, in another group is considered "group-specific" in our study. Our main purpose is to discover group-specific sequence regions between control and case groups as disease-associated markers. We developed a long k-mer (k ≥ 30 bps)-based computational pipeline to detect group-specific sequences at strain resolution free from reference sequences, sequence alignments, and metagenome-wide de novo assembly. We called our method MetaGO: Group-specific oligonucleotide analysis for metagenomic samples. An open-source pipeline on Apache Spark was developed with parallel computing. We applied MetaGO to one simulated and three real metagenomic datasets to evaluate the discriminative capability of identified group-specific markers. In the simulated dataset, 99.11% of group-specific logical 40-mers covered 98.89% disease-specific regions from the disease-associated strain. In addition, 97.90% of group-specific numerical 40-mers covered 99.61 and 96.39% of differentially abundant genome and regions between two groups, respectively. For a large-scale metagenomic liver cirrhosis (LC)-associated dataset, we identified 37,647 group-specific 40-mer features. Any one of the features can predict disease status of the training samples with the average of sensitivity and specificity higher than 0.8. The random forests classification using the top 10 group-specific features yielded a higher AUC (from ∼0.8 to ∼0.9) than that of previous studies. All group-specific 40-mers were present in LC patients, but not healthy controls. 
All the assembled 11 LC-specific sequences can be mapped to two strains of Veillonella parvula: UTDB1-3 and DSM2008. The experiments on the other two real datasets related to Inflammatory Bowel Disease and Type 2 Diabetes in Women consistently demonstrated that MetaGO achieved better prediction accuracy with fewer features compared to previous studies. The experiments showed that MetaGO is a powerful tool for identifying group-specific k-mers, which would be clinically applicable for disease prediction. MetaGO is available at https://github.com/VVsmileyx/MetaGO.
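The abstract's definition of a "group-specific" sequence (present in one group, absent in the other) can be illustrated with a toy presence/absence filter. This is only a sketch of the logical-feature idea, not the MetaGO pipeline, which uses long k-mers (k ≥ 30), statistical testing, and Apache Spark. The function names, the toy reads, `k=5`, and the `min_case_fraction` parameter are all assumptions.

```python
def kmers(seq, k):
    """Set of all k-mers occurring in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_specific_kmers(case_samples, control_samples, k=5, min_case_fraction=1.0):
    """k-mers seen in at least min_case_fraction of case samples and in no control.

    Each sample is a list of reads. Hypothetical helper for illustration only.
    """
    case_sets = [set().union(*(kmers(r, k) for r in sample)) for sample in case_samples]
    control_union = set()
    for sample in control_samples:
        for read in sample:
            control_union |= kmers(read, k)
    # Candidates: anything seen in a case sample but never in a control sample.
    candidates = set().union(*case_sets) - control_union
    needed = min_case_fraction * len(case_sets)
    return {km for km in candidates if sum(km in s for s in case_sets) >= needed}


case_samples = [["ACGTTGCA"], ["GGACGTTT"]]
control_samples = [["ACGTAAAA"], ["TTTTGCAA"]]
specific = group_specific_kmers(case_samples, control_samples, k=5)
print(specific)  # {'ACGTT'}
```

Here `ACGTT` is the only 5-mer shared by both case samples and absent from every control; `TTGCA` is excluded because it also occurs in a control read.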
Affiliation(s)
- Ying Wang
- Department of Automation, Xiamen University, Xiamen, China
| | - Lei Fu
- Department of Automation, Xiamen University, Xiamen, China
| | - Jie Ren
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, United States
| | - Zhaoxia Yu
- Department of Statistics, University of California, Irvine, Irvine, CA, United States
| | - Ting Chen
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, United States
- Bioinformatics Division, Tsinghua National Laboratory of Information Science and Technology, Tsinghua University, Beijing, China
- Department of Computer Science and Technology, Tsinghua University, Beijing, China
| | - Fengzhu Sun
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, United States
- Center for Computational Systems Biology, Fudan University, Shanghai, China
| |
|
276
|
Plewniak F, Crognale S, Rossetti S, Bertin PN. A Genomic Outlook on Bioremediation: The Case of Arsenic Removal. Front Microbiol 2018; 9:820. [PMID: 29755441 PMCID: PMC5932151 DOI: 10.3389/fmicb.2018.00820] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 02/02/2018] [Accepted: 04/10/2018] [Indexed: 01/07/2023] Open
Abstract
Microorganisms play a major role in biogeochemical cycles. As such they are attractive candidates for developing new or improving existing biotechnological applications, in order to deal with the accumulation and pollution of organic and inorganic compounds. Their ability to participate in bioremediation processes mainly depends on their capacity to metabolize toxic elements and catalyze reactions resulting in, for example, precipitation, biotransformation, dissolution, or sequestration. The contribution of genomics may be of prime importance to a thorough understanding of these metabolisms and the interactions of microorganisms with pollutants at the level of both single species and microbial communities. Such approaches should pave the way for the utilization of microorganisms to design new, efficient and environmentally sound remediation strategies, as exemplified by the case of arsenic contamination, which has been declared as a major risk for human health in various parts of the world.
Affiliation(s)
- Frédéric Plewniak
- Génétique Moléculaire, Génomique et Microbiologie, UMR7156 CNRS, Université de Strasbourg, Strasbourg, France
| | - Simona Crognale
- Istituto di Ricerca sulle Acque, Consiglio Nazionale delle Ricerche, Rome, Italy
| | - Simona Rossetti
- Istituto di Ricerca sulle Acque, Consiglio Nazionale delle Ricerche, Rome, Italy
| | - Philippe N Bertin
- Génétique Moléculaire, Génomique et Microbiologie, UMR7156 CNRS, Université de Strasbourg, Strasbourg, France
| |
|
277
|
Gibbons SM, Duvallet C, Alm EJ. Correcting for batch effects in case-control microbiome studies. PLoS Comput Biol 2018; 14:e1006102. [PMID: 29684016 PMCID: PMC5940237 DOI: 10.1371/journal.pcbi.1006102] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Received: 08/18/2017] [Revised: 05/08/2018] [Accepted: 03/21/2018] [Indexed: 12/18/2022] Open
Abstract
High-throughput data generation platforms, like mass-spectrometry, microarrays, and second-generation sequencing are susceptible to batch effects due to run-to-run variation in reagents, equipment, protocols, or personnel. Currently, batch correction methods are not commonly applied to microbiome sequencing datasets. In this paper, we compare different batch-correction methods applied to microbiome case-control studies. We introduce a model-free normalization procedure where features (i.e. bacterial taxa) in case samples are converted to percentiles of the equivalent features in control samples within a study prior to pooling data across studies. We look at how this percentile-normalization method compares to traditional meta-analysis methods for combining independent p-values and to limma and ComBat, widely used batch-correction models developed for RNA microarray data. Overall, we show that percentile-normalization is a simple, non-parametric approach for correcting batch effects and improving sensitivity in case-control meta-analyses. Batch effects are obstacles to comparing results across studies. Traditional meta-analysis techniques for combining p-values from independent studies, like Fisher’s method, are effective but statistically conservative. If batch-effects can be corrected, then statistical tests can be performed on data pooled across studies, increasing sensitivity to detect differences between treatment groups. Here, we show how a simple, model-free approach corrects for batch effects in case-control microbiome datasets.
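The percentile-normalization step described in this abstract (each feature in a case sample is replaced by its percentile within the same study's control samples) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy abundance values, and the tie-handling rule (averaging the lower and upper rank) are assumptions.

```python
from bisect import bisect_left, bisect_right


def percentile_of(value, reference):
    """Percentile (0-100) of value within a sorted reference list, ties averaged."""
    lo = bisect_left(reference, value)
    hi = bisect_right(reference, value)
    return 100.0 * (lo + hi) / (2 * len(reference))


def percentile_normalize(cases, controls):
    """Convert case abundances to percentiles of the control distribution.

    cases, controls: lists of samples, each a list of per-feature abundances
    from a single study. Hypothetical helper for illustration only.
    """
    n_features = len(controls[0])
    normalized = [[0.0] * n_features for _ in cases]
    for j in range(n_features):
        ref = sorted(sample[j] for sample in controls)
        for i, sample in enumerate(cases):
            normalized[i][j] = percentile_of(sample[j], ref)
    return normalized


controls = [[0.10, 0.50], [0.20, 0.40], [0.30, 0.30], [0.40, 0.20]]
cases = [[0.35, 0.10], [0.05, 0.60]]
result = percentile_normalize(cases, controls)
print(result)  # [[75.0, 0.0], [0.0, 100.0]]
```

Because each study is normalized against its own controls, the resulting percentiles share a common scale, which is what allows data to be pooled across studies before running a single statistical test.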
Affiliation(s)
- Sean M. Gibbons
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Center for Microbiome Informatics and Therapeutics, Cambridge, MA, United States of America
- The Broad Institute of MIT and Harvard, Cambridge, MA, United States of America
| | - Claire Duvallet
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Center for Microbiome Informatics and Therapeutics, Cambridge, MA, United States of America
| | - Eric J. Alm
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Center for Microbiome Informatics and Therapeutics, Cambridge, MA, United States of America
- The Broad Institute of MIT and Harvard, Cambridge, MA, United States of America
| |
|
278
|
Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 2018; 15:20170387. [PMID: 29618526 PMCID: PMC5938574 DOI: 10.1098/rsif.2017.0387] [Citation(s) in RCA: 806] [Impact Index Per Article: 134.3] [Received: 05/26/2017] [Accepted: 03/07/2018] [Indexed: 11/12/2022] Open
Abstract
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.
Affiliation(s)
- Travers Ching
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
| | - Daniel S Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Brett K Beaulieu-Jones
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Alexandr A Kalinin
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | | | - Gregory P Way
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Enrico Ferrero
- Computational Biology and Stats, Target Sciences, GlaxoSmithKline, Stevenage, UK
| | | | - Michael Zietz
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Michael M Hoffman
- Princess Margaret Cancer Centre, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| | - Wei Xie
- Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
| | - Gail L Rosen
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
| | - Benjamin J Lengerich
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Johnny Israeli
- Biophysics Program, Stanford University, Stanford, CA, USA
| | - Jack Lanchantin
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
| | - Stephen Woloszynek
- Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
| | - Anne E Carpenter
- Imaging Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
| | - Evan M Cofer
- Department of Computer Science, Trinity University, San Antonio, TX, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
| | - Christopher A Lavender
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
| | - Srinivas C Turaga
- Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA
| | - Amr M Alexandari
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - David J Harris
- Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
| | | | - Yanjun Qi
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Yifan Peng
- National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Laura K Wiley
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
| | - Marwin H S Segler
- Institute of Organic Chemistry, Westfälische Wilhelms-Universität Münster, Münster, Germany
| | - Simina M Boca
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
| | - S Joshua Swamidass
- Department of Pathology and Immunology, Washington University in Saint Louis, St Louis, MO, USA
| | - Austin Huang
- Department of Medicine, Brown University, Providence, RI, USA
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Morgridge Institute for Research, Madison, WI, USA
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| |
|
279
|
Maier L, Pruteanu M, Kuhn M, Zeller G, Telzerow A, Anderson EE, Brochado AR, Fernandez KC, Dose H, Mori H, Patil KR, Bork P, Typas A. Extensive impact of non-antibiotic drugs on human gut bacteria. Nature 2018; 555:623-628. [PMID: 29555994 PMCID: PMC6108420 DOI: 10.1038/nature25979] [Citation(s) in RCA: 1121] [Impact Index Per Article: 186.8] [Received: 03/26/2017] [Accepted: 02/08/2018] [Indexed: 12/13/2022]
Abstract
A few commonly used non-antibiotic drugs have recently been associated with changes in gut microbiome composition, but the extent of this phenomenon is unknown. Here, we screened more than 1,000 marketed drugs against 40 representative gut bacterial strains, and found that 24% of the drugs with human targets, including members of all therapeutic classes, inhibited the growth of at least one strain in vitro. Particular classes, such as the chemically diverse antipsychotics, were overrepresented in this group. The effects of human-targeted drugs on gut bacteria are reflected on their antibiotic-like side effects in humans and are concordant with existing human cohort studies. Susceptibility to antibiotics and human-targeted drugs correlates across bacterial species, suggesting common resistance mechanisms, which we verified for some drugs. The potential risk of non-antibiotics promoting antibiotic resistance warrants further exploration. Our results provide a resource for future research on drug-microbiome interactions, opening new paths for side effect control and drug repurposing, and broadening our view of antibiotic resistance.
Affiliation(s)
- Lisa Maier
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Mihaela Pruteanu
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Michael Kuhn
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
| | - Georg Zeller
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
| | - Anja Telzerow
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Exene Erin Anderson
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Ana Rita Brochado
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | | | - Hitomi Dose
- Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Japan
| | - Hirotada Mori
- Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Japan
| | - Kiran Raosaheb Patil
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
| | - Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
- Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
- Molecular Medicine Partnership Unit, Heidelberg, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Germany
| | - Athanasios Typas
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
| |
|
280
|
Exploring Linkages between Taxonomic and Functional Profiles of the Human Microbiome. mSystems 2018; 3:mSystems00163-17. [PMID: 29629420 PMCID: PMC5881027 DOI: 10.1128/msystems.00163-17] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 10/30/2017] [Accepted: 01/16/2018] [Indexed: 02/07/2023] Open
Abstract
Microbiome studies typically focus on characterizing the taxonomic and functional profiles of the microbes within a community. Functional profiling is generally thought to be superior to taxonomic profiling for investigating human-microbe interactions, but there are several limitations and challenges to existing approaches. This Perspective discusses the current sequencing and bioinformatic methods for producing taxonomic and functional profiles, recent studies utilizing and comparing these technologies, and the existing challenges and limitations of these data. In addition, functional versus taxonomic conservation across the population is questioned, while future research that focuses on investigating the taxonomic diversity of microbial functions is proposed.
|
281
|
Data and Statistical Methods To Analyze the Human Microbiome. mSystems 2018; 3:mSystems00194-17. [PMID: 29556541 PMCID: PMC5850081 DOI: 10.1128/msystems.00194-17] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/18/2017] [Accepted: 12/07/2017] [Indexed: 11/20/2022] Open
Abstract
The Waldron lab for computational biostatistics bridges the areas of cancer genomics and microbiome studies for public health, developing methods to exploit publicly available data resources and to integrate -omics studies.
|
282
|
Duvallet C. Meta-analysis generates and prioritizes hypotheses for translational microbiome research. Microb Biotechnol 2018; 11:273-276. [PMID: 29349912 PMCID: PMC5812236 DOI: 10.1111/1751-7915.13047] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 01/07/2023] Open
Affiliation(s)
- Claire Duvallet
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- MIT Center for Microbiome Informatics and Therapeutics, Cambridge, MA, 02139, USA
| |
|
283
|
Gut microbiome populations are associated with structure-specific changes in white matter architecture. Transl Psychiatry 2018; 8:6. [PMID: 29317592 PMCID: PMC5802560 DOI: 10.1038/s41398-017-0022-5] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 04/05/2017] [Revised: 08/01/2017] [Accepted: 08/04/2017] [Indexed: 12/13/2022] Open
Abstract
Altered gut microbiome populations are associated with a broad range of neurodevelopmental disorders including autism spectrum disorder and mood disorders. In animal models, modulation of gut microbiome populations via dietary manipulation influences brain function and behavior and has been shown to ameliorate behavioral symptoms. With striking differences in microbiome-driven behavior, we explored whether these behavioral changes are also accompanied by corresponding changes in neural tissue microstructure. Utilizing diffusion tensor imaging, we identified global changes in white matter structural integrity occurring in a diet-dependent manner. Analysis of 16S ribosomal RNA sequencing of gut bacteria also showed changes in bacterial populations as a function of diet. Changes in brain structure were found to be associated with diet-dependent changes in gut microbiome populations using a machine learning classifier for quantitative assessment of the strength of microbiome-brain region associations. These associations allow us to further test our understanding of the gut-brain-microbiota axis by revealing possible links between altered and dysbiotic gut microbiome populations and changes in brain structure, highlighting the potential impact of diet and metagenomic effects in neuroimaging.
|
284
|
|
285
|
Bokulich NA, Dillon MR, Bolyen E, Kaehler BD, Huttley GA, Caporaso JG. q2-sample-classifier: machine-learning tools for microbiome classification and regression. Journal of Open Source Software 2018; 3:934. [PMID: 31552137 PMCID: PMC6759219 DOI: 10.21105/joss.00934] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Indexed: 05/20/2023]
Abstract
q2-sample-classifier is a plugin for the QIIME 2 microbiome bioinformatics platform that facilitates access, reproducibility, and interpretation of supervised learning (SL) methods for a broad audience of non-bioinformatics specialists.
Affiliation(s)
- Nicholas A Bokulich
- The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Matthew R Dillon
- The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Evan Bolyen
- The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
| | - Benjamin D Kaehler
- Research School of Biology, Australian National University, Canberra, Australia
| | - Gavin A Huttley
- Research School of Biology, Australian National University, Canberra, Australia
| | - J Gregory Caporaso
- The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA
| |
|
286
|
Duvallet C, Gibbons SM, Gurry T, Irizarry RA, Alm EJ. Meta-analysis of gut microbiome studies identifies disease-specific and shared responses. Nat Commun 2017; 8:1784. [PMID: 29209090 PMCID: PMC5716994 DOI: 10.1038/s41467-017-01973-8] [Citation(s) in RCA: 586] [Impact Index Per Article: 83.7] [Received: 05/16/2017] [Accepted: 10/30/2017] [Indexed: 12/16/2022] Open
Abstract
Hundreds of clinical studies have demonstrated associations between the human microbiome and disease, yet fundamental questions remain on how we can generalize this knowledge. Results from individual studies can be inconsistent, and comparing published data is further complicated by a lack of standard processing and analysis methods. Here we introduce the MicrobiomeHD database, which includes 28 published case–control gut microbiome studies spanning ten diseases. We perform a cross-disease meta-analysis of these studies using standardized methods. We find consistent patterns characterizing disease-associated microbiome changes. Some diseases are associated with over 50 genera, while most show only 10–15 genus-level changes. Some diseases are marked by the presence of potentially pathogenic microbes, whereas others are characterized by a depletion of health-associated bacteria. Furthermore, we show that about half of genera associated with individual studies are bacteria that respond to more than one disease. Thus, many associations found in case–control studies are likely not disease-specific but rather part of a non-specific, shared response to health and disease. Reported associations between the human microbiome and disease are often inconsistent. Here, Duvallet et al. perform a meta-analysis of 28 gut microbiome studies spanning ten diseases, and find associations that are likely not disease-specific but potentially part of a shared response to disease.
Affiliation(s)
- Claire Duvallet
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Sean M Gibbons
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA, 02139, USA
| | - Thomas Gurry
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA, 02139, USA
| | - Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Eric J Alm
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA, 02139, USA
| |
|
287
|
Affiliation(s)
- Christian Jobin
- Department of Medicine, University of Florida, Gainesville, Florida
- Department of Infectious Diseases and Pathology, University of Florida, Gainesville, Florida
- Department of Anatomy and Cell Biology, University of Florida, Gainesville, Florida
| |
|
288
|
Mallick H, Ma S, Franzosa EA, Vatanen T, Morgan XC, Huttenhower C. Experimental design and quantitative analysis of microbial community multiomics. Genome Biol 2017; 18:228. [PMID: 29187204 PMCID: PMC5708111 DOI: 10.1186/s13059-017-1359-z] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Indexed: 02/07/2023] Open
Abstract
Studies of the microbiome have become increasingly sophisticated, and multiple sequence-based, molecular methods as well as culture-based methods exist for population-scale microbiome profiles. To link the resulting host and microbial data types to human health, several experimental design considerations, data analysis challenges, and statistical epidemiological approaches must be addressed. Here, we survey current best practices for experimental design in microbiome molecular epidemiology, including technologies for generating, analyzing, and integrating microbiome multiomics data. We highlight studies that have identified molecular bioactives that influence human health, and we suggest steps for scaling translational microbiome research to high-throughput target discovery across large populations.
Affiliation(s)
- Himel Mallick
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Siyuan Ma
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Eric A Franzosa
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Tommi Vatanen
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Xochitl C Morgan
- Department of Microbiology and Immunology, The University of Otago, Dunedin, New Zealand
| | - Curtis Huttenhower
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| |
|
289
|
Xing X, Liu JS, Zhong W. MetaGen: reference-free learning with multiple metagenomic samples. Genome Biol 2017; 18:187. [PMID: 28974263 PMCID: PMC5627425 DOI: 10.1186/s13059-017-1323-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/24/2017] [Accepted: 09/13/2017] [Indexed: 12/20/2022] Open
Abstract
A major goal of metagenomics is to identify and study the entire collection of microbial species in a set of targeted samples. We describe a statistical metagenomic algorithm that simultaneously identifies microbial species and estimates their abundances without using reference genomes. As a trade-off, we require multiple metagenomic samples, usually ≥10 samples, to get highly accurate binning results. Compared to reference-free methods based primarily on k-mer distributions or coverage information, the proposed approach achieves a higher species binning accuracy and is particularly powerful when sequencing coverage is low. We demonstrated the performance of this new method through both simulation and real metagenomic studies. The MetaGen software is available at https://github.com/BioAlgs/MetaGen.
Affiliation(s)
- Xin Xing
- Department of Statistics, University of Georgia, Athens, 30602, GA, USA
| | - Jun S Liu
- Department of Statistics, Harvard University, Cambridge, 02138, MA, USA
- Center for Statistical Science & Department of Industry Entering, Beijing, 100084, China
| | - Wenxuan Zhong
- Department of Statistics, University of Georgia, Athens, 30602, GA, USA
| |
|
290
|
Shotgun metagenomics, from sampling to analysis. Nat Biotechnol 2017; 35:833-844. [PMID: 28898207 DOI: 10.1038/nbt.3935] [Citation(s) in RCA: 844] [Impact Index Per Article: 120.6] [Received: 08/17/2015] [Accepted: 07/12/2017] [Indexed: 02/06/2023]
Abstract
Diverse microbial communities of bacteria, archaea, viruses and single-celled eukaryotes have crucial roles in the environment and in human health. However, microbes are frequently difficult to culture in the laboratory, which can confound cataloging of members and understanding of how communities function. High-throughput sequencing technologies and a suite of computational pipelines have been combined into shotgun metagenomics methods that have transformed microbiology. Still, computational approaches to overcome the challenges that affect both assembly-based and mapping-based metagenomic profiling, particularly of high-complexity samples or environments containing organisms with limited similarity to sequenced genomes, are needed. Understanding the functions and characterizing specific strains of these communities offers biotechnological promise in therapeutic discovery and innovative ways to synthesize products using microbial factories and can pinpoint the contributions of microorganisms to planetary, animal and human health.
|
291
|
Sharpton T, Lyalina S, Luong J, Pham J, Deal EM, Armour C, Gaulke C, Sanjabi S, Pollard KS. Development of Inflammatory Bowel Disease Is Linked to a Longitudinal Restructuring of the Gut Metagenome in Mice. mSystems 2017; 2:e00036-17. [PMID: 28904997 PMCID: PMC5585689 DOI: 10.1128/msystems.00036-17] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 05/04/2017] [Accepted: 08/08/2017] [Indexed: 02/08/2023] Open
Abstract
The gut microbiome is linked to inflammatory bowel disease (IBD) severity and altered in late-stage disease. However, it is unclear how gut microbial communities change over the course of IBD development, especially in regard to function. To investigate microbiome-mediated disease mechanisms and discover early biomarkers of IBD, we conducted a longitudinal metagenomic investigation in an established mouse model of IBD, where damped transforming growth factor β (TGF-β) signaling in T cells leads to peripheral immune activation, weight loss, and severe colitis. IBD development is associated with abnormal gut microbiome temporal dynamics, including damped acquisition of functional diversity and significant differences in abundance trajectories for KEGG modules such as glycosaminoglycan degradation, cellular chemotaxis, and type III and IV secretion systems. Most differences between sick and control mice emerge when mice begin to lose weight and heightened T cell activation is detected in peripheral blood. However, levels of lipooligosaccharide transporter abundance diverge prior to immune activation, indicating that it could be a predisease indicator or microbiome-mediated disease mechanism. Taxonomic structure of the gut microbiome also significantly changes in association with IBD development, and the abundances of particular taxa, including several species of Bacteroides, correlate with immune activation. These discoveries were enabled by our use of generalized linear mixed-effects models to test for differences in longitudinal profiles between healthy and diseased mice while accounting for the distributions of taxon and gene counts in metagenomic data. These findings demonstrate that longitudinal metagenomics is useful for discovering the potential mechanisms through which the gut microbiome becomes altered in IBD.
IMPORTANCE IBD patients harbor distinct microbial communities with functional capabilities different from those seen with healthy people. But is this cause or effect? Answering this question requires data on changes in gut microbial communities leading to disease onset. By performing weekly metagenomic sequencing and mixed-effects modeling on an established mouse model of IBD, we identified several functional pathways encoded by the gut microbiome that covary with host immune status. These pathways are novel early biomarkers that may either enable microbes to live inside an inflamed gut or contribute to immune activation in IBD mice. Future work will validate the potential roles of these microbial pathways in host-microbe interactions and human disease. This study was novel in its longitudinal design and focus on microbial pathways, which provided new mechanistic insights into the role of gut microbes in IBD development.
Affiliation(s)
- Thomas Sharpton
- Department of Microbiology, Oregon State University, Corvallis, Oregon
- Department of Statistics, Oregon State University, Corvallis, Oregon
| | | | - Julie Luong
- Gladstone Institutes, San Francisco, California, USA
| | - Joey Pham
- Gladstone Institutes, San Francisco, California, USA
| | - Emily M. Deal
- Gladstone Institutes, San Francisco, California, USA
| | - Courtney Armour
- Department of Microbiology, Oregon State University, Corvallis, Oregon
| | | | - Shomyseh Sanjabi
- Gladstone Institutes, San Francisco, California, USA
- Department of Microbiology & Immunology, University of California, San Francisco, San Francisco, California, USA
| | - Katherine S. Pollard
- Gladstone Institutes, San Francisco, California, USA
- Department of Epidemiology & Biostatistics, Institute for Human Genetics, and Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, California, USA
| |
|
292
|
Mitra A, MacIntyre DA, Mahajan V, Lee YS, Smith A, Marchesi JR, Lyons D, Bennett PR, Kyrgiou M. Comparison of vaginal microbiota sampling techniques: cytobrush versus swab. Sci Rep 2017; 7:9802. [PMID: 28852043 PMCID: PMC5575119 DOI: 10.1038/s41598-017-09844-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 05/30/2017] [Accepted: 07/31/2017] [Indexed: 12/16/2022] Open
Abstract
Evidence suggests the vaginal microbiota (VM) may influence risk of persistent Human Papillomavirus (HPV) infection and cervical carcinogenesis. Established cytology biobanks, typically collected with a cytobrush, constitute a unique resource to study such associations longitudinally. It is plausible that, compared to rayon swabs, the most commonly used sampling devices, cytobrushes may disrupt biofilms, leading to variation in VM composition. Cervico-vaginal samples were collected with cytobrush and rayon swabs from 30 women with high-grade cervical precancer. Quantitative PCR was used to compare bacterial load, and Illumina MiSeq sequencing of the V1-V3 regions of the 16S rRNA gene was used to compare VM composition. Cytobrushes collected a higher total bacterial load. Relative abundance of bacterial species was highly comparable between sampling devices (R² = 0.993). However, in women with a Lactobacillus-depleted, high-diversity VM, significantly less correlation in relative species abundance was observed between devices when compared to those with a Lactobacillus species-dominant VM (p = 0.0049). Cytobrush and swab sampling provide a comparable VM composition. In a small proportion of cases the cytobrush was able to detect underlying high-diversity community structure, not realized with swab sampling. This study highlights the need to consider sampling devices as potential confounders when comparing multiple studies and datasets.
Affiliation(s)
- Anita Mitra
- Institute of Reproductive and Developmental Biology, Surgery and Cancer, Imperial College London, London, W12 0NN, UK
- Department of Obstetrics & Gynaecology - West London Gynaecological Cancer Centre, Imperial College NHS Trust, London, W2 1NY, UK
| | - David A MacIntyre
- Institute of Reproductive and Developmental Biology, Surgery and Cancer, Imperial College London, London, W12 0NN, UK
| | - Vishakha Mahajan
- Institute of Reproductive and Developmental Biology, Surgery and Cancer, Imperial College London, London, W12 0NN, UK
| | - Yun S Lee
- Institute of Reproductive and Developmental Biology, Surgery and Cancer, Imperial College London, London, W12 0NN, UK
| | - Ann Smith
- Department of Biosciences, Cardiff University, Cardiff, CF10 3AX, UK
| | - Julian R Marchesi
- Department of Biosciences, Cardiff University, Cardiff, CF10 3AX, UK
- Centre for Digestive and Gut Health, Surgery and Cancer, Imperial College London, London, W2 1NY, UK
| | - Deirdre Lyons
- Department of Obstetrics & Gynaecology - West London Gynaecological Cancer Centre, Imperial College NHS Trust, London, W2 1NY, UK
| | - Phillip R Bennett
- Institute of Reproductive and Developmental Biology, Surgery and Cancer, Imperial College London, London, W12 0NN, UK
- Department of Obstetrics & Gynaecology - West London Gynaecological Cancer Centre, Imperial College NHS Trust, London, W2 1NY, UK
| | - Maria Kyrgiou
- Institute of Reproductive and Developmental Biology, Surgery and Cancer, Imperial College London, London, W12 0NN, UK
- Department of Obstetrics & Gynaecology - West London Gynaecological Cancer Centre, Imperial College NHS Trust, London, W2 1NY, UK
| |
|
293
|
Large-scale comparative metagenomics of Blastocystis, a common member of the human gut microbiome. ISME JOURNAL 2017; 11:2848-2863. [PMID: 28837129 PMCID: PMC5702742 DOI: 10.1038/ismej.2017.139] [Citation(s) in RCA: 117] [Impact Index Per Article: 16.7] [Received: 03/08/2017] [Revised: 07/05/2017] [Accepted: 07/14/2017] [Indexed: 12/12/2022]
Abstract
The influence of unicellular eukaryotic microorganisms on human gut health and disease is still largely unexplored. Blastocystis spp. commonly colonize the gut, but its clinical significance and ecological role are currently unsettled. We have developed a high-sensitivity bioinformatic pipeline to detect Blastocystis subtypes (STs) from shotgun metagenomics, and applied it to 12 large data sets, comprising 1689 subjects of different geographic origin, disease status and lifestyle. We confirmed and extended previous observations on the high prevalence of the microorganism in the population (14.9%), its non-random and ST-specific distribution, and its ability to cause persistent (asymptomatic) colonization. These findings, along with the higher prevalence observed in non-westernized individuals, the lack of positive association with any of the diseases considered, and decreased presence in individuals with dysbiosis associated with colorectal cancer and Crohn's disease, strongly suggest that Blastocystis is a component of the healthy gut microbiome. Further, we found an inverse association between body mass index and Blastocystis, and strong co-occurrence with archaeal organisms (Methanobrevibacter smithii) and several bacterial species. The association of specific microbial community structures with Blastocystis was confirmed by the high predictability (up to 0.91 area under the curve) of the microorganism colonization based on the species-level composition of the microbiome. Finally, we reconstructed and functionally profiled 43 new draft Blastocystis genomes and discovered a higher intra-subtype variability of ST1 and ST2 compared with ST3 and ST4. Altogether, we provide an in-depth epidemiologic, ecological, and genomic analysis of Blastocystis, and show how metagenomics can be crucial to advance population genomics of human parasites.
|
294
|
Tett A, Pasolli E, Farina S, Truong DT, Asnicar F, Zolfo M, Beghini F, Armanini F, Jousson O, De Sanctis V, Bertorelli R, Girolomoni G, Cristofolini M, Segata N. Unexplored diversity and strain-level structure of the skin microbiome associated with psoriasis. NPJ Biofilms Microbiomes 2017; 3:14. [PMID: 28649415 PMCID: PMC5481418 DOI: 10.1038/s41522-017-0022-5] [Citation(s) in RCA: 115] [Impact Index Per Article: 16.4] [Received: 12/19/2016] [Revised: 03/22/2017] [Accepted: 05/22/2017] [Indexed: 12/20/2022] Open
Abstract
Psoriasis is an immune-mediated inflammatory skin disease that has been associated with cutaneous microbial dysbiosis by culture-dependent investigations and rRNA community profiling. We applied, for the first time, high-resolution shotgun metagenomics to characterise the microbiome of psoriatic and unaffected skin from 28 individuals. We demonstrate psoriatic ear sites have a decreased diversity and psoriasis is associated with an increase in Staphylococcus, but overall the microbiomes of psoriatic and unaffected sites display few discriminative features at the species level. Finer strain-level analysis reveals strain heterogeneity colonisation and functional variability providing the intriguing hypothesis of psoriatic niche-specific strain adaptation or selection. Furthermore, we accessed the poorly characterised, but abundant, clades with limited sequence information in public databases, including uncharacterised Malassezia spp. These results highlight the skin's hidden diversity and suggest strain-level variations could be key determinants of the psoriatic microbiome. This illustrates the need for high-resolution analyses, particularly when identifying therapeutic targets. This work provides a baseline for microbiome studies in relation to the pathogenesis of psoriasis.
Affiliation(s)
- Adrian Tett
- Centre for Integrative Biology, University of Trento, Trento, Italy
| | - Edoardo Pasolli
- Centre for Integrative Biology, University of Trento, Trento, Italy
| | | | - Duy Tin Truong
- Centre for Integrative Biology, University of Trento, Trento, Italy
| | | | - Moreno Zolfo
- Centre for Integrative Biology, University of Trento, Trento, Italy
| | | | | | - Olivier Jousson
- Centre for Integrative Biology, University of Trento, Trento, Italy
| | - Veronica De Sanctis
- NGS Facility, Laboratory of Biomolecular Sequence and Structure Analysis for Health, Centre for Integrative Biology, University of Trento, Trento, Italy
| | - Roberto Bertorelli
- NGS Facility, Laboratory of Biomolecular Sequence and Structure Analysis for Health, Centre for Integrative Biology, University of Trento, Trento, Italy
| | - Giampiero Girolomoni
- Department of Medicine, Section of Dermatology, University of Verona, Verona, Italy
| | | | - Nicola Segata
- Centre for Integrative Biology, University of Trento, Trento, Italy
| |
|
295
|
Lee STM, Kahn SA, Delmont TO, Shaiber A, Esen ÖC, Hubert NA, Morrison HG, Antonopoulos DA, Rubin DT, Eren AM. Tracking microbial colonization in fecal microbiota transplantation experiments via genome-resolved metagenomics. MICROBIOME 2017; 5:50. [PMID: 28473000 PMCID: PMC5418705 DOI: 10.1186/s40168-017-0270-x] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 12/04/2016] [Accepted: 04/22/2017] [Indexed: 05/11/2023]
Abstract
BACKGROUND Fecal microbiota transplantation (FMT) is an effective treatment for recurrent Clostridium difficile infection and shows promise for treating other medical conditions associated with intestinal dysbioses. However, we lack a sufficient understanding of which microbial populations successfully colonize the recipient gut, and the widely used approaches to study the microbial ecology of FMT experiments fail to provide enough resolution to identify populations that are likely responsible for FMT-derived benefits.
METHODS We used shotgun metagenomics together with assembly and binning strategies to reconstruct metagenome-assembled genomes (MAGs) from fecal samples of a single FMT donor. We then used metagenomic mapping to track the occurrence and distribution patterns of donor MAGs in two FMT recipients.
RESULTS Our analyses revealed that 22% of the 92 highly complete bacterial MAGs that we identified from the donor successfully colonized and remained abundant in two recipients for at least 8 weeks. Most MAGs with a high colonization rate belonged to the order Bacteroidales. The vast majority of those that lacked evidence of colonization belonged to the order Clostridiales, and colonization success was negatively correlated with the number of genes related to sporulation. Our analysis of 151 publicly available gut metagenomes showed that the donor MAGs that colonized both recipients were prevalent, and the ones that colonized neither were rare across the participants of the Human Microbiome Project. Although our dataset showed a link between taxonomy and the colonization ability of a given MAG, we also identified MAGs that belong to the same taxon with different colonization properties, highlighting the importance of an appropriate level of resolution to explore the functional basis of colonization and to identify targets for cultivation, hypothesis generation, and testing in model systems.
CONCLUSIONS The analytical strategy adopted in our study can provide genomic insights into bacterial populations that may be critical to the efficacy of FMT due to their success in gut colonization and metabolic properties, and guide cultivation efforts to investigate mechanistic underpinnings of this procedure beyond associations.
Affiliation(s)
- Sonny T M Lee
  - Section of Gastroenterology, Hepatology and Nutrition, Department of Medicine, University of Chicago Medicine, Chicago, IL, USA
- Stacy A Kahn
  - Section of Gastroenterology, Hepatology and Nutrition, Department of Medicine, University of Chicago Medicine, Chicago, IL, USA
  - Present address: Boston Children's Hospital, Inflammatory Bowel Disease Center, Boston, MA, USA
- Tom O Delmont
  - Section of Gastroenterology, Hepatology and Nutrition, Department of Medicine, University of Chicago Medicine, Chicago, IL, USA
- Alon Shaiber
  - Section of Gastroenterology, Hepatology and Nutrition, Department of Medicine, University of Chicago Medicine, Chicago, IL, USA
- Özcan C Esen
  - Section of Gastroenterology, Hepatology and Nutrition, Department of Medicine, University of Chicago Medicine, Chicago, IL, USA
- Nathaniel A Hubert
  - Section of Gastroenterology, Hepatology and Nutrition, Department of Medicine, University of Chicago Medicine, Chicago, IL, USA
- Hilary G Morrison
  - Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543, USA
- Dionysios A Antonopoulos
  - Section of Gastroenterology, Hepatology and Nutrition, Department of Medicine, University of Chicago Medicine, Chicago, IL, USA
- David T Rubin
  - Section of Gastroenterology, Hepatology and Nutrition, Department of Medicine, University of Chicago Medicine, Chicago, IL, USA
- A Murat Eren
  - Section of Gastroenterology, Hepatology and Nutrition, Department of Medicine, University of Chicago Medicine, Chicago, IL, USA
  - Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543, USA
296
Chang HX, Haudenshield JS, Bowen CR, Hartman GL. Metagenome-Wide Association Study and Machine Learning Prediction of Bulk Soil Microbiome and Crop Productivity. Front Microbiol 2017; 8:519. [PMID: 28421041 PMCID: PMC5378059 DOI: 10.3389/fmicb.2017.00519] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 01/08/2017] [Accepted: 03/13/2017] [Indexed: 11/25/2022] Open
Abstract
Areas within an agricultural field in the same season often differ in crop productivity despite having the same cropping history, crop genotype, and management practices. One hypothesis is that abiotic or biotic factors in the soils differ between areas, resulting in these productivity differences. In this study, bulk soil samples collected from a high and a low productivity area within six agronomic fields in Illinois were quantified for abiotic and biotic characteristics. DNA extracted from these bulk soil samples was shotgun sequenced. While logistic regression analyses resulted in no significant association between crop productivity and the 26 soil characteristics, principal coordinate analysis and constrained correspondence analysis showed crop productivity explained a major proportion of the taxa variance in the bulk soil microbiome. Metagenome-wide association studies (MWAS) identified more Bradyrhizobium and Gammaproteobacteria in higher productivity areas and more Actinobacteria, Ascomycota, Planctomycetales, and Streptophyta in lower productivity areas. Machine learning using a random forest method successfully predicted productivity based on the microbiome composition with the best accuracy of 0.79 at the order level. Our study showed that crop productivity differences were associated with bulk soil microbiome composition and highlighted several nitrogen utility-related taxa. We demonstrated the merit of MWAS and machine learning for the first time in a plant-microbiome study.
Affiliation(s)
- Hao-Xun Chang
  - Department of Crop Sciences, University of Illinois, Urbana, IL, USA
- James S. Haudenshield
  - Department of Crop Sciences, University of Illinois, Urbana, IL, USA
  - USDA-Agricultural Research Service, Urbana, IL, USA
- Charles R. Bowen
  - Department of Crop Sciences, University of Illinois, Urbana, IL, USA
  - USDA-Agricultural Research Service, Urbana, IL, USA
- Glen L. Hartman
  - Department of Crop Sciences, University of Illinois, Urbana, IL, USA
  - USDA-Agricultural Research Service, Urbana, IL, USA
297
Metcalf JL, Xu ZZ, Bouslimani A, Dorrestein P, Carter DO, Knight R. Microbiome Tools for Forensic Science. Trends Biotechnol 2017; 35:814-823. [PMID: 28366290 DOI: 10.1016/j.tibtech.2017.03.006] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Received: 10/04/2016] [Revised: 03/08/2017] [Accepted: 03/09/2017] [Indexed: 01/28/2023]
Abstract
Microbes are present at every crime scene and have been used as physical evidence for over a century. Advances in DNA sequencing and computational approaches have led to recent breakthroughs in the use of microbiome approaches for forensic science, particularly in the areas of estimating postmortem intervals (PMIs), locating clandestine graves, and obtaining soil and skin trace evidence. Low-cost, high-throughput technologies allow us to accumulate molecular data quickly and to apply sophisticated machine-learning algorithms, building generalizable predictive models that will be useful in the criminal justice system. In particular, integrating microbiome and metabolomic data has excellent potential to advance microbial forensics.
Affiliation(s)
- Jessica L Metcalf
  - Department of Animal Sciences, Colorado State University, Fort Collins, CO 80523, USA
- Zhenjiang Z Xu
  - Department of Pediatrics, University of California, San Diego School of Medicine, La Jolla, CA 92093, USA
- Amina Bouslimani
  - Department of Pharmacology, University of California, San Diego, La Jolla, CA 92093, USA
- Pieter Dorrestein
  - Department of Pediatrics, University of California, San Diego School of Medicine, La Jolla, CA 92093, USA; Department of Pharmacology, University of California, San Diego, La Jolla, CA 92093, USA; Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA; Center for Microbiome Innovation, Jacobs School of Engineering, University of California, San Diego, La Jolla, CA 92093, USA
- David O Carter
  - Laboratory of Forensic Taphonomy, Forensic Sciences Unit, Division of Natural Sciences and Mathematics, Chaminade University of Honolulu, Honolulu, HI 96816, USA
- Rob Knight
  - Department of Pediatrics, University of California, San Diego School of Medicine, La Jolla, CA 92093, USA; Center for Microbiome Innovation, Jacobs School of Engineering, University of California, San Diego, La Jolla, CA 92093, USA; Department of Computer Science and Engineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA 92093, USA
298
Tsilimigras MCB, Fodor A, Jobin C. Carcinogenesis and therapeutics: the microbiota perspective. Nat Microbiol 2017; 2:17008. [PMID: 28225000 PMCID: PMC6423540 DOI: 10.1038/nmicrobiol.2017.8] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 09/05/2016] [Accepted: 01/10/2017] [Indexed: 12/18/2022]
Abstract
Cancer arises from the acquisition of multiple genetic and epigenetic changes in host cells over the span of many years, promoting oncogenic traits and carcinogenesis. Most cancers develop following random somatic alterations of key oncogenic genes, which are favoured by a number of risk factors, including lifestyle, diet and inflammation. Importantly, the environment where tumours evolve provides a unique source of signalling cues that affects cancer cell growth, survival, movement and metastasis. Recently, there has been increased interest in how the microbiota, the collection of microorganisms inhabiting the host body surface and cavities, shapes a micro-environment for host cells that can either promote or prevent cancer formation. The microbiota, particularly the intestinal biota, plays a central role in host physiology, and the composition and activity of this consortium of microorganisms is directly influenced by known cancer risk factors such as lifestyle, diet and inflammation. In this Review, we discuss the pro- and anticarcinogenic roles of the microbiota and highlight the therapeutic potential of microorganisms in tumourigenesis. The broad impacts, and, at times, opposing roles of the microbiota in carcinogenesis serve to illustrate the complex and sometimes conflicted relationship between microorganisms and the host, a relationship that could potentially be harnessed for therapeutic benefits.
Affiliation(s)
- Matthew C. B. Tsilimigras
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, North Carolina 28223, USA
- Anthony Fodor
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, North Carolina 28223, USA
- Christian Jobin
  - Department of Medicine, University of Florida, Gainesville, Florida 32611, USA
  - Department of Infectious Diseases and Pathology, University of Florida, Gainesville, Florida 32611, USA
299
Truong DT, Tett A, Pasolli E, Huttenhower C, Segata N. Microbial strain-level population structure and genetic diversity from metagenomes. Genome Res 2017; 27:626-638. [PMID: 28167665 PMCID: PMC5378180 DOI: 10.1101/gr.216242.116] [Citation(s) in RCA: 420] [Impact Index Per Article: 60.0] [Received: 09/19/2016] [Accepted: 02/02/2017] [Indexed: 12/15/2022]
Abstract
Among the human health conditions linked to microbial communities, phenotypes are often associated with only a subset of strains within causal microbial groups. Although it has been critical for decades in microbial physiology to characterize individual strains, this has been challenging when using culture-independent high-throughput metagenomics. We introduce StrainPhlAn, a novel metagenomic strain identification approach, and apply it to characterize the genetic structure of thousands of strains from more than 125 species in more than 1500 gut metagenomes drawn from populations spanning North and South American, European, Asian, and African countries. The method relies on per-sample dominant sequence variant reconstruction within species-specific marker genes. It identified primarily subject-specific strain variants (<5% inter-subject strain sharing), and we determined that a single strain typically dominated each species and was retained over time (for >70% of species). Microbial population structure was correlated in several distinct ways with the geographic structure of the host population. In some cases, discrete subspecies (e.g., for Eubacterium rectale and Prevotella copri) or continuous microbial genetic variations (e.g., for Faecalibacterium prausnitzii) were associated with geographically distinct human populations, whereas few strains occurred in multiple unrelated cohorts. We further estimated the genetic variability of gut microbes, with Bacteroides species appearing remarkably consistent (0.45% median number of nucleotide variants between strains), whereas P. copri was among the most plastic gut colonizers. We thus characterize here the population genetics of previously inaccessible intestinal microbes, providing a comprehensive strain-level genetic overview of the gut microbial diversity.
Affiliation(s)
- Duy Tin Truong
  - Centre for Integrative Biology, University of Trento, 38123 Trento, Italy
- Adrian Tett
  - Centre for Integrative Biology, University of Trento, 38123 Trento, Italy
- Edoardo Pasolli
  - Centre for Integrative Biology, University of Trento, 38123 Trento, Italy
- Curtis Huttenhower
  - Biostatistics Department, Harvard School of Public Health, Boston, Massachusetts 02115, USA; The Broad Institute, Cambridge, Massachusetts 02142, USA
- Nicola Segata
  - Centre for Integrative Biology, University of Trento, 38123 Trento, Italy
300
Dick GJ. Embracing the mantra of modellers and synthesizing omics, experiments and models. Environ Microbiol Rep 2017; 9:18-20. [PMID: 27775862 DOI: 10.1111/1758-2229.12491] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/06/2023]
Affiliation(s)
- Gregory J Dick
  - Department of Earth and Environmental Sciences, University of Michigan, Ann Arbor, MI, USA