1
Boshuizen HC, Te Beest DE. Pitfalls in the statistical analysis of microbiome amplicon sequencing data. Mol Ecol Resour 2023; 23:539-548. [PMID: 36330663 DOI: 10.1111/1755-0998.13730] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/16/2022] [Accepted: 10/27/2022] [Indexed: 11/06/2022]
Abstract
Microbiome data are characterized by several aspects that make them challenging to analyse statistically: they are compositional, high dimensional and rich in zeros. A large array of statistical methods exists to analyse these data. Some are borrowed from other fields, such as ecology or RNA-sequencing, while others are custom-made for microbiome data. The range of available methods is large and continuously expanding, which means that researchers have to invest considerable effort in choosing what method(s) to apply. In this paper we list 14 statistical methods or approaches that we think should be generally avoided. In several cases this is because we believe the assumptions behind the method are unlikely to be met for microbiome data. In other cases we see methods that are used in ways they are not intended to be used. We believe researchers would be helped by more critical evaluations of existing methods, as not all methods in use are suitable or have been sufficiently reviewed. We hope this paper contributes to a critical discussion on what methods are appropriate to use in the analysis of microbiome data.
Affiliation(s)
- Dennis E Te Beest
- Biometris, Wageningen University and Research, Wageningen, The Netherlands
2
Khomich M, Måge I, Rud I, Berget I. Analysing microbiome intervention design studies: Comparison of alternative multivariate statistical methods. PLoS One 2021; 16:e0259973. [PMID: 34793531 PMCID: PMC8601541 DOI: 10.1371/journal.pone.0259973] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/10/2021] [Accepted: 10/30/2021] [Indexed: 12/13/2022] Open
Abstract
The diet plays a major role in shaping gut microbiome composition and function in both humans and animals, and dietary intervention trials are often used to investigate and understand these effects. A plethora of statistical methods for analysing the differential abundance of microbial taxa exists, and new methods are constantly being developed, but there is a lack of benchmarking studies and clear consensus on the best multivariate statistical practices. This makes it hard for a biologist to decide which method to use. We compared the outcomes of generic multivariate ANOVA (ASCA and FFMANOVA) against statistical methods commonly used for community analyses (PERMANOVA and SIMPER) and methods designed for analysis of count data from high-throughput sequencing experiments (ALDEx2, ANCOM and DESeq2). The comparison is based on both simulated data and five published dietary intervention trials representing different subjects and study designs. We found that the methods testing differences at the community level were in agreement regarding both effect size and statistical significance. However, the methods that provided ranking and identification of differentially abundant operational taxonomic units (OTUs) gave incongruent results, implying that the choice of method is likely to influence the biological interpretations. The generic multivariate ANOVA tools have the flexibility needed for analysing multifactorial experiments and provide outputs at both the community and OTU levels; good performance in the simulation studies suggests that these statistical tools are also suitable for microbiome data sets.
Affiliation(s)
- Maryia Khomich
- Division of Food Science, Department of Food Safety and Quality, Nofima – Norwegian Institute of Food, Fisheries and Aquaculture Research, Ås, Norway
- Department of Clinical Science, University of Bergen, Bergen, Norway
- Ingrid Måge
- Division of Food Science, Department of Raw Materials and Process Optimisation, Nofima – Norwegian Institute of Food, Fisheries and Aquaculture Research, Ås, Norway
- Ida Rud
- Division of Food Science, Department of Food Safety and Quality, Nofima – Norwegian Institute of Food, Fisheries and Aquaculture Research, Ås, Norway
- Ingunn Berget
- Division of Food Science, Department of Raw Materials and Process Optimisation, Nofima – Norwegian Institute of Food, Fisheries and Aquaculture Research, Ås, Norway
3
Petersen KS, Kris-Etherton PM, McCabe GP, Raman G, Miller JW, Maki KC. Perspective: Planning and Conducting Statistical Analyses for Human Nutrition Randomized Controlled Trials: Ensuring Data Quality and Integrity. Adv Nutr 2021; 12:1610-1624. [PMID: 33957665 PMCID: PMC8483948 DOI: 10.1093/advances/nmab045] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/24/2020] [Revised: 03/02/2021] [Accepted: 03/17/2021] [Indexed: 12/29/2022] Open
Abstract
Appropriate planning, execution, and reporting of statistical methods and results are critical for research transparency, validity, and reproducibility. This paper provides an overview of best practices for developing a statistical analysis plan a priori, conducting statistical analyses, and reporting statistical methods and results for human nutrition randomized controlled trials (RCTs). Readers are referred to the other NURISH (NUtrition inteRventIon reSearcH) publications for detailed information about the preparation and conduct of human nutrition RCTs. Collectively, the NURISH series outlines best practices for conducting human nutrition research.
Affiliation(s)
- Penny M Kris-Etherton
- Department of Nutritional Sciences, The Pennsylvania State University, University Park, PA, USA
- George P McCabe
- Department of Statistics, Purdue University, West Lafayette, IN, USA
- Gowri Raman
- Institute for Clinical Research and Health Policy Studies, Center for Clinical Evidence Synthesis (CCES), Tufts Medical Center, Boston, MA, USA
- Joshua W Miller
- Department of Nutritional Sciences, Rutgers University, New Brunswick, NJ, USA
4
Cullen CM, Aneja KK, Beyhan S, Cho CE, Woloszynek S, Convertino M, McCoy SJ, Zhang Y, Anderson MZ, Alvarez-Ponce D, Smirnova E, Karstens L, Dorrestein PC, Li H, Sen Gupta A, Cheung K, Powers JG, Zhao Z, Rosen GL. Emerging Priorities for Microbiome Research. Front Microbiol 2020; 11:136. [PMID: 32140140 PMCID: PMC7042322 DOI: 10.3389/fmicb.2020.00136] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 08/13/2019] [Accepted: 01/21/2020] [Indexed: 12/12/2022] Open
Abstract
Microbiome research has increased dramatically in recent years, driven by advances in technology and significant reductions in the cost of analysis. Such research has unlocked a wealth of data, which has yielded tremendous insight into the nature of the microbial communities, including their interactions and effects, both within a host and in an external environment as part of an ecological community. Understanding the role of microbiota, including their dynamic interactions with their hosts and other microbes, can enable the engineering of new diagnostic techniques and interventional strategies that can be used in a diverse spectrum of fields, spanning from ecology and agriculture to medicine and from forensics to exobiology. From June 19 to 23, 2017, the NIH and NSF jointly held an Innovation Lab on Quantitative Approaches to Biomedical Data Science Challenges in our Understanding of the Microbiome. This review is inspired by some of the topics that arose as priority areas from this unique, interactive workshop. The goal of this review is to summarize the Innovation Lab's findings by introducing the reader to emerging challenges, exciting potential, and current directions in microbiome research. The review is broken into five key topic areas: (1) interactions between microbes and the human body, (2) evolution and ecology of microbes, including the role played by the environment and microbe-microbe interactions, (3) analytical and mathematical methods currently used in microbiome research, (4) leveraging knowledge of microbial composition and interactions to develop engineering solutions, and (5) interventional approaches and engineered microbiota that may be enabled by selectively altering microbial composition. As such, this review seeks to arm the reader with a broad understanding of the priorities and challenges in microbiome research today and provide inspiration for future investigation and multi-disciplinary collaboration.
Affiliation(s)
- Chad M. Cullen
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States
- Sinem Beyhan
- Department of Infectious Diseases, J. Craig Venter Institute, La Jolla, CA, United States
- Clara E. Cho
- Department of Nutrition, Dietetics and Food Sciences, Utah State University, Logan, UT, United States
- Stephen Woloszynek
- Ecological and Evolutionary Signal-processing and Informatics Laboratory (EESI), Electrical and Computer Engineering, Drexel University, Philadelphia, PA, United States
- College of Medicine, Drexel University, Philadelphia, PA, United States
- Matteo Convertino
- Nexus Group, Faculty of Information Science and Technology, Gi-CoRE Station for Big Data & Cybersecurity, Hokkaido University, Sapporo, Japan
- Sophie J. McCoy
- Department of Biological Science, Florida State University, Tallahassee, FL, United States
- Yanyan Zhang
- Department of Civil Engineering, New Mexico State University, Las Cruces, NM, United States
- Matthew Z. Anderson
- Department of Microbiology, The Ohio State University, Columbus, OH, United States
- Department of Microbial Infection and Immunity, The Ohio State University, Columbus, OH, United States
- Ekaterina Smirnova
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, United States
- Lisa Karstens
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, United States
- Department of Obstetrics and Gynecology, Oregon Health & Science University, Portland, OR, United States
- Pieter C. Dorrestein
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, San Diego, CA, United States
- Hongzhe Li
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Ananya Sen Gupta
- Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA, United States
- Kevin Cheung
- Department of Dermatology, The University of Iowa, Iowa City, IA, United States
- Zhengqiao Zhao
- Ecological and Evolutionary Signal-processing and Informatics Laboratory (EESI), Electrical and Computer Engineering, Drexel University, Philadelphia, PA, United States
- Gail L. Rosen
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States
- Ecological and Evolutionary Signal-processing and Informatics Laboratory (EESI), Electrical and Computer Engineering, Drexel University, Philadelphia, PA, United States
5
Norouzi-Beirami MH, Marashi SA, Banaei-Moghaddam AM, Kavousi K. Beyond Taxonomic Analysis of Microbiomes: A Functional Approach for Revisiting Microbiome Changes in Colorectal Cancer. Front Microbiol 2020; 10:3117. [PMID: 32038558 PMCID: PMC6990412 DOI: 10.3389/fmicb.2019.03117] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/10/2019] [Accepted: 12/24/2019] [Indexed: 01/16/2023] Open
Abstract
Colorectal cancer (CRC) is one of the most prevalent cancers in the world, especially in developed countries. In different studies, the association between CRC and dysbiosis of the gut microbiome has been reported. However, most of these works focus on the taxonomic variation of the microbiome, which presents little, if any, functional insight about the reason behind and/or consequences of microbiome dysbiosis. In this study, we used a previously reported metagenome dataset which was obtained by sequencing 156 microbiome samples of healthy individuals as the control group (Co), as well as microbiome samples of patients with advanced colorectal adenoma (Ad) and colorectal carcinoma (Ca). Features of the microbiome samples have been analyzed at the level of species, as well as four functional levels, i.e., gene, KEGG orthology (KO) group, Enzyme Commission (EC) number, and reaction. It was shown that, at each of these levels, certain features exist which show significant changing trends during cancer progression. In the next step, a list of these features was extracted, which were shown to be able to predict the category of Co, Ad, and Ca samples with an accuracy of >85%. When only one group of features (species, gene, KO group, EC number, reaction) was used, KO-related features were found to be the most successful features for classifying the three categories of samples. Notably, species-related features showed the least success in sample classification. Furthermore, by applying an independent test set, we showed that these performance trends are not limited to our original dataset. We determined the most important classification features at each of the four functional levels. We propose that these features can be considered as biomarkers of CRC progression. Finally, we show that the intra-diversity of each sample at the levels of bacterial species and genes is much greater than that of the KO groups, EC numbers, and reactions of that sample. Therefore, we conclude that the microbiome diversity at the species level, or gene level, is not necessarily associated with the diversity at the functional level, which again indicates the importance of KO-, EC-, and reaction-based features in metagenome analysis. The source code of the proposed method is freely available from https://www.bioinformatics.org/mamed.
Affiliation(s)
- Mohammad Hossein Norouzi-Beirami
- Laboratory of Complex Biological Systems and Bioinformatics, Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Sayed-Amir Marashi
- Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran
- Ali Mohammad Banaei-Moghaddam
- Laboratory of Genomics and Epigenomics, Department of Biochemistry, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Kaveh Kavousi
- Laboratory of Complex Biological Systems and Bioinformatics, Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
6
LaPierre N, Ju CJT, Zhou G, Wang W. MetaPheno: A critical evaluation of deep learning and machine learning in metagenome-based disease prediction. Methods 2019; 166:74-82. [PMID: 30885720 PMCID: PMC6708502 DOI: 10.1016/j.ymeth.2019.03.003] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 12/01/2018] [Revised: 02/14/2019] [Accepted: 03/04/2019] [Indexed: 01/21/2023] Open
Abstract
The human microbiome plays a number of critical roles, impacting almost every aspect of human health and well-being. Conditions in the microbiome have been linked to a number of significant diseases. Additionally, revolutions in sequencing technology have led to a rapid increase in publicly available sequencing data. Consequently, there have been growing efforts to predict disease status from metagenomic sequencing data, with a proliferation of new approaches in the last few years. Some of these efforts have explored utilizing a powerful form of machine learning called deep learning, which has been applied successfully in several biological domains. Here, we review some of these methods and the algorithms that they are based on, with a particular focus on deep learning methods. We also perform a deeper analysis of Type 2 Diabetes and obesity datasets that have eluded improved results, using a variety of machine learning and feature extraction methods. We conclude by offering perspectives on study design considerations that may impact results and future directions the field can take to improve results and offer more valuable conclusions. The scripts and extracted features for the analyses conducted in this paper are available via GitHub: https://github.com/nlapier2/metapheno.
Affiliation(s)
- Nathan LaPierre
- Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Chelsea J-T Ju
- Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Guangyu Zhou
- Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Wei Wang
- Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095, USA
7
Morales E, Chen J, Greathouse KL. Compositional Analysis of the Human Microbiome in Cancer Research. Methods Mol Biol 2019; 1928:299-335. [PMID: 30725462 DOI: 10.1007/978-1-4939-9027-6_16] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/30/2022]
Abstract
Gut microbial composition has been shown to be associated with obesity, diabetes mellitus, inflammatory bowel disease, colitis, autoimmune disorders, and cancer, among other diseases. Microbiome research has significantly evolved through the years and continues to advance as we develop new and better strategies to more accurately measure its composition and function. Careful selection of study design, inclusion and exclusion criteria of participants, and methodology is paramount to accurately analyze microbial structure. Here we present the most up-to-date available information on methods for gut microbial collection and analysis.
Affiliation(s)
- Elisa Morales
- Robbins College of Health and Human Sciences, Baylor University, Waco, TX, USA
- Jun Chen
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- K Leigh Greathouse
- Robbins College of Health and Human Sciences, Baylor University, Waco, TX, USA
8
Qu K, Guo F, Liu X, Lin Y, Zou Q. Application of Machine Learning in Microbiology. Front Microbiol 2019; 10:827. [PMID: 31057526 PMCID: PMC6482238 DOI: 10.3389/fmicb.2019.00827] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Received: 01/31/2019] [Accepted: 04/01/2019] [Indexed: 02/01/2023] Open
Abstract
Microorganisms are ubiquitous and closely related to people's daily lives. Since they were first discovered in the 17th century, researchers have shown great interest in microorganisms. Microorganisms have traditionally been studied through cultivation, but this method is expensive and time consuming, and it cannot keep pace with the development of high-throughput sequencing technology. To deal with this problem, machine learning (ML) methods have been widely applied to the field of microbiology. Literature reviews have shown that ML can be used in many aspects of microbiology research, especially classification problems, and for exploring the interaction between microorganisms and the surrounding environment. In this study, we summarize the application of ML in microbiology.
Affiliation(s)
- Kaiyang Qu
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Xiangrong Liu
- School of Information Science and Technology, Xiamen University, Xiamen, China
- Yuan Lin
- School of Information Science and Technology, Xiamen University, Xiamen, China
- Department of System Integration, Sparebanken Vest, Bergen, Norway
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
9
Reply to Moossavi and Azad, "Quantifying and Interpreting the Association between Early-Life Gut Microbiota Composition and Childhood Obesity". mBio 2019; 10:mBio.00047-19. [PMID: 30755504 PMCID: PMC6372791 DOI: 10.1128/mbio.00047-19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open