1
Dindhoria K, Manyapu V, Ali A, Kumar R. Unveiling the role of emerging metagenomics for the examination of hypersaline environments. Biotechnol Genet Eng Rev 2024; 40:2090-2128. [PMID: 37017219 DOI: 10.1080/02648725.2023.2197717] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/07/2022] [Accepted: 03/28/2023] [Indexed: 04/06/2023]
Abstract
Hypersaline ecosystems are distributed all over the globe. They are subjected to poly-extreme stresses and are inhabited by halophilic microorganisms possessing multiple adaptations. Halophiles have many biotechnological applications, such as nutrient supplements, antioxidant synthesis, salt-tolerant enzyme production, osmolyte synthesis, biofuel production, and electricity generation. However, compared with the microflora of other niches, halophiles are still underexplored in terms of their complex ecological interactions and functions. The advent of metagenomics and recent advances in next-generation sequencing tools have made it feasible to investigate the microflora of an ecosystem, its interactions, and its functions. Both target-gene and shotgun metagenomic approaches are commonly employed for the taxonomic, phylogenetic, and functional analyses of hypersaline microbial communities. This review discusses different types of hypersaline niches, their resident microflora, and the metagenomic approaches used to investigate them. It also covers the applications, hurdles, and recent advances in metagenomic approaches, to support their better understanding and use in the study of the hypersaline microbiome.
Affiliation(s)
- Kiran Dindhoria
  - Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology Palampur, Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Vivek Manyapu
  - Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology Palampur, Palampur, Himachal Pradesh, India
- Ashif Ali
  - Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology Palampur, Palampur, Himachal Pradesh, India
- Rakshak Kumar
  - Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology Palampur, Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India

2
Wei ZG, Zhang XD, Fan XG, Qian Y, Liu F, Wu FX. pathMap: a path-based mapping tool for long noisy reads with high sensitivity. Brief Bioinform 2024; 25:bbae107. [PMID: 38517696 PMCID: PMC10959152 DOI: 10.1093/bib/bbae107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 12/25/2023] [Accepted: 02/28/2024] [Indexed: 03/24/2024] Open
Abstract
With the rapid development of single-molecule sequencing (SMS) technologies, output read lengths are continuously increasing. Mapping such reads onto a reference genome is one of the most fundamental tasks in sequence analysis. Mapping sensitivity is becoming a major concern, since a more sensitive mapper detects more aligned regions on the reference and obtains more aligned bases, which are useful for downstream analysis. In this study, we present pathMap, a novel k-mer graph-based mapper that is specifically designed for mapping SMS reads with high sensitivity. By viewing the alignment chain as a path containing as many anchors as possible in the matched k-mer graph, pathMap treats chaining as a path selection problem in a directed graph. Because pathMap iteratively searches for the longest path among the remaining nodes, more high-quality candidate chains can be detected and aligned. Compared with other state-of-the-art mapping methods such as minimap2 and Winnowmap2, experimental results on simulated and real-life datasets demonstrate that pathMap maps at least 11.50% more chains than its closest competitor and increases mapping sensitivity by 17.28% and 13.84% of bases over the next-best mapper for Pacific Biosciences and Oxford Nanopore sequencing data, respectively. In addition, pathMap is more robust to sequence errors and more sensitive in species- and strain-specific identification of pathogens using MinION reads.
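The chaining idea in this abstract (pick the path with the most anchors in a directed graph, then repeat on the remaining nodes) can be illustrated with a small sketch. This is not pathMap's implementation: the anchor representation, the `max_gap` edge rule, the `min_len` cutoff, and both function names are assumptions made only for illustration.

```python
from typing import List, Tuple

Anchor = Tuple[int, int]  # (read_pos, ref_pos) of one matched k-mer (assumed representation)


def longest_chain(anchors: List[Anchor], max_gap: int = 50) -> List[Anchor]:
    """Return the path with the most anchors in an implicit DAG.

    Assumed edge rule: a -> b if b lies strictly after a on both the read
    and the reference, and neither gap exceeds max_gap.
    """
    if not anchors:
        return []
    nodes = sorted(anchors)        # sorting by read_pos gives a topological order
    best = [1] * len(nodes)        # best[j] = anchors on the longest path ending at j
    prev = [-1] * len(nodes)       # back-pointers for path recovery
    for j in range(len(nodes)):
        for i in range(j):
            d_read = nodes[j][0] - nodes[i][0]
            d_ref = nodes[j][1] - nodes[i][1]
            if 0 < d_read <= max_gap and 0 < d_ref <= max_gap and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    end = max(range(len(nodes)), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(nodes[end])
        end = prev[end]
    return chain[::-1]


def iterative_chains(anchors: List[Anchor], min_len: int = 2, max_gap: int = 50) -> List[List[Anchor]]:
    """Repeatedly extract the longest chain from the anchors not yet used."""
    remaining = set(anchors)
    chains = []
    while remaining:
        chain = longest_chain(list(remaining), max_gap)
        if len(chain) < min_len:   # stop once only short, low-support chains remain
            break
        chains.append(chain)
        remaining -= set(chain)
    return chains


if __name__ == "__main__":
    demo = [(0, 100), (10, 110), (20, 120), (5, 500), (15, 510)]
    for c in iterative_chains(demo):
        print(c)
```

The quadratic dynamic program is chosen for readability; a production mapper would need a faster chaining scheme and a scoring function that accounts for gap costs.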
Affiliation(s)
- Ze-Gang Wei
  - School of Physics and Opto-Electronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, China
  - Division of Biomedical Engineering, Department of Computer Science and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Xiao-Dan Zhang
  - School of Physics and Opto-Electronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, China
- Xing-Guo Fan
  - School of Physics and Opto-Electronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, China
- Yu Qian
  - School of Physics and Opto-Electronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, China
- Fei Liu
  - School of Physics and Opto-Electronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, China
- Fang-Xiang Wu
  - Division of Biomedical Engineering, Department of Computer Science and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada

3
Wei ZG, Chen X, Zhang XD, Zhang H, Fan XG, Gao HY, Liu F, Qian Y. Comparison of Methods for Biological Sequence Clustering. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2874-2888. [PMID: 37028305 DOI: 10.1109/tcbb.2023.3253138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Recent advances in sequencing technology have considerably promoted genomics research by making high-throughput sequencing economical. This advance has produced a huge amount of sequencing data. Clustering analysis is a powerful way to study and probe large-scale sequence data, and a number of clustering methods have been developed in the last decade. Despite the numerous comparison studies that have been published, we noticed that they have two main limitations: only traditional alignment-based clustering methods are compared, and the evaluation metrics rely heavily on labeled sequence data. In this study, we present a comprehensive benchmark study of sequence clustering methods. Specifically, i) alignment-based clustering algorithms, including classical methods (e.g., CD-HIT, UCLUST, VSEARCH) and recently proposed methods (e.g., MMseqs2, Linclust, edClust), are assessed; ii) two alignment-free methods (LZW-Kernel and Mash) are included for comparison with alignment-based methods; and iii) different evaluation measures based on the true labels (supervised metrics) and on the input data itself (unsupervised metrics) are applied to quantify their clustering results. The aims of this study are to help biological analysts choose a reasonable clustering algorithm for processing their collected sequences and, furthermore, to motivate algorithm designers to develop more efficient sequence clustering approaches.
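To make the notion of a supervised metric concrete, the sketch below computes cluster purity from true labels. The abstract does not name the specific metrics used in the benchmark, so purity is chosen here purely as a common illustrative example, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def purity(cluster_ids: Sequence[Hashable], true_labels: Sequence[Hashable]) -> float:
    """Fraction of sequences whose true label is the majority label of their cluster.

    Illustrative supervised metric; not necessarily one used in the cited study.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        return 0.0
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        members[cid].append(label)
    # For each cluster, count how many members carry its most frequent label.
    majority_total = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return majority_total / len(true_labels)


if __name__ == "__main__":
    print(purity([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"]))
```

Purity rewards over-splitting (one sequence per cluster scores 1.0), which is one reason benchmarks usually combine several supervised and unsupervised measures.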
4
Paul SS, Rama Rao SV, Chatterjee RN, Raju MVLN, Mahato AK, Prakash B, Yadav SP, Kannan A, Reddy GN, Kumar V, Kumar PSP. An Immobilized Form of a Blend of Essential Oils Improves the Density of Beneficial Bacteria, in Addition to Suppressing Pathogens in the Gut and Also Improves the Performance of Chicken Breeding. Microorganisms 2023; 11:1960. [PMID: 37630519 PMCID: PMC10459846 DOI: 10.3390/microorganisms11081960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 07/17/2023] [Accepted: 07/18/2023] [Indexed: 08/27/2023] Open
Abstract
Antimicrobial growth promoters (AGP) are used in chicken production to suppress pathogens in the gut and improve performance, but such products tend to suppress beneficial bacteria while favoring the development and spread of antimicrobial resistance. A green alternative to AGP that suppresses pathogens while sparing beneficial gut bacteria and improving breeding performance is urgently required. We investigated the effect of supplementing broiler chickens with a blend of select essential oils (cinnamon oil, carvacrol, and thyme oil, henceforth referred to as EO; at two doses: 200 g/t and 400 g/t feed) that spared Lactobacillus while strongly inhibiting E. coli in in vitro tests and that was immobilized in a sunflower oil and calcium alginate matrix. We compared its effects with those of a probiotic yeast (Y), an AGP, virginiamycin (V), and a negative control (C). qPCR analysis of metagenomic DNA from the gut content of experimental chickens indicated a significantly (p < 0.05) lower density of E. coli in the EO groups than in the other groups. Amplicon sequence data of the gut microbiome indicated that all the additives had specific significant effects (DESeq2) on the gut microbiome, such as enrichment of uncultured Clostridia in the V and Y groups and uncultured Ruminococcaceae in the EO groups, as compared to the control. LEfSe analysis of the sequence data indicated a high abundance of the beneficial bacteria Ruminococcaceae in the EO groups, Faecalibacterium in the Y group, and Blautia in the V group. Supplementation of the immobilized EO at the dose rate of 400 g/t feed significantly (p < 0.05) improved body weight gain (by 64 g/bird), feed efficiency (by 5 points), and cellular immunity (skin thickness response to phytohaemagglutinin lectin from Phaseolus vulgaris, by 58%), whereas neither yeast nor virginiamycin showed a significant effect on performance parameters. Expression of genes associated with gut barrier and immunity function, such as CLAUDIN1, IL6, IFNG, TLR2A, and NOD1, was significantly higher in the EO groups. This study showed that the encapsulated EO mixture can significantly improve the density of beneficial microbes in the gut, with concomitant suppression of potential pathogens such as E. coli and improved performance and immunity, and hence has a high potential to be used as an effective alternative to AGP in poultry.
Affiliation(s)
- Shyam Sundar Paul
  - Directorate of Poultry Research, Poultry Nutrition, Indian Council of Agricultural Research (ICAR), Hyderabad 500030, India
- Savaram Venkata Rama Rao
  - Directorate of Poultry Research, Poultry Nutrition, Indian Council of Agricultural Research (ICAR), Hyderabad 500030, India
- Rudra Nath Chatterjee
  - Directorate of Poultry Research, Poultry Nutrition, Indian Council of Agricultural Research (ICAR), Hyderabad 500030, India
- Mantena Venkata Lakshmi Narasimha Raju
  - Directorate of Poultry Research, Poultry Nutrition, Indian Council of Agricultural Research (ICAR), Hyderabad 500030, India
- Ajay Kumar Mahato
  - The Centre for DNA Fingerprinting and Diagnostics, Department of Biotechnology, Hyderabad 500039, India
- Bhukya Prakash
  - Directorate of Poultry Research, Poultry Nutrition, Indian Council of Agricultural Research (ICAR), Hyderabad 500030, India
- Satya Pal Yadav
  - Directorate of Poultry Research, Poultry Nutrition, Indian Council of Agricultural Research (ICAR), Hyderabad 500030, India
- Alagarsamy Kannan
  - Directorate of Poultry Research, Poultry Nutrition, Indian Council of Agricultural Research (ICAR), Hyderabad 500030, India
- Godumagadda Narender Reddy
  - Directorate of Poultry Research, Poultry Nutrition, Indian Council of Agricultural Research (ICAR), Hyderabad 500030, India
- Vikas Kumar
  - Directorate of Poultry Research, Poultry Nutrition, Indian Council of Agricultural Research (ICAR), Hyderabad 500030, India
- Prakki Santosh Phani Kumar
  - Directorate of Poultry Research, Poultry Nutrition, Indian Council of Agricultural Research (ICAR), Hyderabad 500030, India

5
Cao M, Peng Q, Wei ZG, Liu F, Hou YF. EdClust: A heuristic sequence clustering method with higher sensitivity. J Bioinform Comput Biol 2021; 20:2150036. [PMID: 34939905 DOI: 10.1142/s0219720021500360] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
The development of high-throughput technologies has produced increasing amounts of sequence data and an increasing need for efficient clustering algorithms that can process massive volumes of sequencing data for downstream analysis. Heuristic clustering methods are widely applied for sequence clustering because of their low computational complexity. Although numerous heuristic clustering methods have been developed, they suffer from two limitations: overestimation of inferred clusters and low clustering sensitivity. To address these issues, we present a new sequence clustering method (edClust) based on Edlib, a C/C++ library for fast, exact semi-global sequence alignment, to group similar sequences. The new method edClust was tested on three large-scale sequence databases, and we compared edClust to several classic heuristic clustering methods, such as UCLUST, CD-HIT, and VSEARCH. Evaluations based on the metrics of cluster number and seed sensitivity (SS) demonstrate that edClust produces fewer clusters than the other methods and that its SS is higher. The source code of edClust is available from https://github.com/zhang134/EdClust.git under the GNU GPL license.
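The general shape of a heuristic (greedy, seed-based) clustering method can be sketched as follows. This is not edClust itself: it assumes a plain global Levenshtein distance written in pure Python as a stand-in for Edlib's semi-global alignment, a longest-first processing order, and an identity defined as one minus distance over the longer length. All names and the default threshold are hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Global Levenshtein distance (stand-in for an Edlib alignment call)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]


def greedy_cluster(seqs: List[str], threshold: float = 0.97) -> List[Tuple[str, List[str]]]:
    """Assign each sequence to the first seed it matches, else make it a new seed."""
    clusters: List[Tuple[str, List[str]]] = []
    for s in sorted(seqs, key=len, reverse=True):       # assumed order: longest first
        for seed, members in clusters:
            identity = 1 - edit_distance(s, seed) / max(len(s), len(seed), 1)
            if identity >= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))                   # no seed matched: s becomes a seed
    return clusters


if __name__ == "__main__":
    for seed, members in greedy_cluster(["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"], 0.85):
        print(seed, members)
```

Because each sequence is compared only with seeds, the result depends on input order and on the threshold, which is the kind of sensitivity trade-off the abstract refers to.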
Affiliation(s)
- Ming Cao
  - Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, P. R. China
  - School of Mathematics and Statistics, Shaanxi Xueqian Normal University, Xi'an, 710100, P. R. China
- Qinke Peng
  - Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, P. R. China
- Ze-Gang Wei
  - Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, P. R. China
- Fei Liu
  - Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, P. R. China
- Yi-Fan Hou
  - Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, P. R. China

6
Wei ZG, Zhang XD, Cao M, Liu F, Qian Y, Zhang SW. Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences. Front Microbiol 2021; 12:644012. [PMID: 33841367 PMCID: PMC8024490 DOI: 10.3389/fmicb.2021.644012] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/19/2020] [Accepted: 02/17/2021] [Indexed: 12/31/2022] Open
Abstract
With the advent of next-generation sequencing technology, it has become convenient and cost-efficient to thoroughly characterize the microbial diversity and taxonomic composition of various environmental samples. Millions of sequencing reads can be generated, and how to utilize this enormous sequence resource has become a critical concern for microbial ecologists. One particular challenge is picking operational taxonomic units (OTUs) in 16S rRNA sequence analysis. Fortunately, this challenge can be addressed directly by sequence clustering, which attempts to group similar sequences. Numerous clustering methods have therefore been proposed to cluster 16S rRNA sequences into OTUs. However, each method has its own clustering mechanism, and different methods produce diverse outputs. Even a slight parameter change for the same method can generate distinct results, and choosing an appropriate method has become a challenge for inexperienced users. A lot of time and resources can be wasted in selecting clustering tools and analyzing the clustering results. In this study, we introduce recent advances in clustering methods for OTU picking, focusing on three aspects: (i) the principles of existing clustering algorithms, (ii) benchmark dataset construction for OTU picking and evaluation metrics, and (iii) the performance of different methods with various distance thresholds on benchmark datasets. This paper aims to assist biological researchers in selecting reasonable clustering methods for analyzing their collected sequences and to help algorithm developers design more efficient sequence clustering methods.
Affiliation(s)
- Ze-Gang Wei
  - Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
- Xiao-Dan Zhang
  - Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Ming Cao
  - Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
  - School of Mathematics and Statistics, Shaanxi Xueqian Normal University, Xi’an, China
- Fei Liu
  - Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Yu Qian
  - Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Shao-Wu Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China

7
Paul SS, Chatterjee RN, Raju MVLN, Prakash B, Rama Rao SV, Yadav SP, Kannan A. Gut Microbial Composition Differs Extensively among Indian Native Chicken Breeds Originated in Different Geographical Locations and a Commercial Broiler Line, but Breed-Specific, as Well as Across-Breed Core Microbiomes, Are Found. Microorganisms 2021; 9:391. [PMID: 33672925 PMCID: PMC7918296 DOI: 10.3390/microorganisms9020391] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/21/2020] [Revised: 01/27/2021] [Accepted: 02/04/2021] [Indexed: 12/14/2022] Open
Abstract
Gut microbiota plays an important role in the health and performance of the host. Characterizations of gut microbiota, core microbiomes, and microbial networks in different chicken breeds are expected to provide clues for pathogen exclusion and for improving performance or feed efficiency. Here, we characterized the gut microbiota of “finishing” chickens (at the end of production life) of the indigenous Indian Nicobari, Ghagus, and Aseel breeds, originating from the Nicobari island, coastal India, and the Indian mainland, respectively, as well as a global commercial broiler line, VenCobb 400, using 16S rDNA amplicon sequencing. We found that both the diversity and the richness of the microbiota were higher in the indigenous breeds than in the broiler line. Beta diversity analysis indicated the highest overlap between the Ghagus and Nicobari breeds and a very low overlap between the broiler line and all indigenous breeds. Linear discriminant analysis effect size (LEfSe) revealed 82 breed- or line-specific phylotype operational taxonomic unit (OTU) level biomarkers. We confirm the presence of breed-specific and across-breed core microbiomes. Additionally, we show the existence of breed-specific complex microbial networks in all groups. This study provides the first comprehensive insight into the gut microbiota of three indigenous breeds and one commercial broiler line of chickens reared without antimicrobials, and underscores the need to study microbial diversity in other indigenous breeds.
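For readers less familiar with the terms, richness and diversity can be computed from an OTU count vector as in the sketch below. The abstract does not state which indices were used, so observed richness and the Shannon index are assumed here only as standard examples, and the counts in the demo are made up.

```python
import math
from typing import Sequence


def richness(counts: Sequence[int]) -> int:
    """Observed richness: number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    sample = [120, 30, 0, 5, 45]   # hypothetical OTU read counts for one bird
    print(richness(sample), round(shannon(sample), 3))
```

Richness counts how many OTUs are present, whereas the Shannon index also reflects how evenly reads are spread across them, so the two can rank samples differently.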
Affiliation(s)
- Shyam Sundar Paul
  - Poultry Nutrition Lab, ICAR-Directorate of Poultry Research, Poultry Nutrition, Hyderabad 500030, India
- Bhukya Prakash
  - Poultry Nutrition Lab, ICAR-Directorate of Poultry Research, Poultry Nutrition, Hyderabad 500030, India
- Savaram Venkata Rama Rao
  - Poultry Nutrition Lab, ICAR-Directorate of Poultry Research, Poultry Nutrition, Hyderabad 500030, India
- Satya Pal Yadav
  - Animal Biotechnology Lab, ICAR-Directorate of Poultry Research, Hyderabad 500030, India
- Alagarsamy Kannan
  - Poultry Nutrition Lab, ICAR-Directorate of Poultry Research, Poultry Nutrition, Hyderabad 500030, India

8
Wei ZG, Zhang SW, Liu F. smsMap: mapping single molecule sequencing reads by locating the alignment starting positions. BMC Bioinformatics 2020; 21:341. [PMID: 32753028 PMCID: PMC7430848 DOI: 10.1186/s12859-020-03698-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/26/2020] [Accepted: 07/23/2020] [Indexed: 01/09/2023] Open
Abstract
Background: Single Molecule Sequencing (SMS) technology can produce longer reads, but with a higher sequencing error rate. Mapping these reads to a reference genome is often the most fundamental and computing-intensive step for downstream analysis. Most existing mapping tools adopt the traditional seed-and-extend strategy, and the candidate aligned regions for each query read are selected either by counting the number of matched seeds or by chaining a group of seeds. However, for all the existing mapping tools, the coverage ratio of the alignment region to the query read is low, and the read alignment quality and efficiency need to be improved. Here, we introduce smsMap, a novel mapping tool that is specifically designed to map the long reads of SMS to a reference genome. Results: smsMap was evaluated against seven other existing SMS mapping tools (e.g., BLASR, minimap2, and BWA-MEM) on both simulated and real-life SMS datasets. The experimental results show that smsMap efficiently achieves a higher aligned read coverage ratio and has higher sensitivity, aligning more sequences and bases to the reference genome. Additionally, smsMap is more robust to sequencing errors. Conclusions: smsMap is computationally efficient at aligning SMS reads, especially for larger reference genomes (e.g., the H. sapiens genome, with over 3 billion base pairs). The source code of smsMap can be freely downloaded from https://github.com/NWPU-903PR/smsMap.
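The seed-counting variant of seed-and-extend mentioned in the Background can be sketched as follows. This illustrates the traditional strategy only, not smsMap's algorithm; the exact-match k-mer index, the diagonal binning with `bin_size`, and the function names are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_index(ref: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the list of positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def candidate_starts(read: str, index: Dict[str, List[int]], k: int,
                     bin_size: int = 10, top: int = 1) -> List[Tuple[int, int]]:
    """Vote for candidate alignment start positions by counting matched seeds.

    Each seed hit votes for the diagonal (ref_pos - read_pos); diagonals are
    binned so that small indels still vote for the same candidate region.
    Returns (bin_start_on_reference, seed_count) pairs, best first.
    """
    votes: Counter = Counter()
    for i in range(len(read) - k + 1):
        for pos in index.get(read[i:i + k], ()):
            votes[(pos - i) // bin_size] += 1
    return [(b * bin_size, n) for b, n in votes.most_common(top)]


if __name__ == "__main__":
    ref = "TTGACCAGTAGGCTAACGTTCAGGATCCAT"
    read = ref[8:20]                      # error-free read taken from position 8
    idx = build_index(ref, 4)
    print(candidate_starts(read, idx, 4, bin_size=1))
```

The extension step (base-level alignment within each candidate region) is omitted; with noisy long reads, shorter seeds and wider bins are typically needed for the votes to concentrate.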
Affiliation(s)
- Ze-Gang Wei
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
  - Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, China
- Shao-Wu Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Fei Liu
  - Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, China

9
Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Prog Mol Biol Transl Sci 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from the unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that the concepts of correlation and association analyses have been shifted by the introduction of network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach. Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in the analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and on future directions for association analysis in microbiome and multiomics studies.
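As one concrete illustration of how compositionality and excess zeros are commonly handled before correlation analysis, the sketch below applies a centered log-ratio (CLR) transform with a pseudocount. CLR is a standard compositional technique chosen here as an example; the abstract does not single it out, and the pseudocount value and function name are assumptions.

```python
import math
from typing import List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Centered log-ratio transform of one sample's taxon counts.

    A pseudocount (assumed value) is added so that zero counts have a finite
    logarithm; each log value is then centered on the sample's mean log value.
    """
    if not counts:
        return []
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


if __name__ == "__main__":
    print([round(v, 3) for v in clr([10, 0, 5, 85])])
```

CLR values sum to zero within a sample and, without a pseudocount, are unchanged when all counts are multiplied by a constant, which removes the dependence on sequencing depth that distorts correlations computed on raw relative abundances.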
Affiliation(s)
- Yinglin Xia
  - Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States