1
Jiang Z, Li X, Guo L. Binning Metagenomic Contigs Using Unsupervised Clustering and Reference Databases. Interdiscip Sci 2022; 14:795-803. [PMID: 35639335; DOI: 10.1007/s12539-022-00526-y] [Received: 12/03/2021] [Revised: 04/23/2022] [Accepted: 04/27/2022] [Indexed: 06/15/2023]
Abstract
Metagenomics directly extracts the genetic material of all microorganisms in an environment, yielding samples that contain a large number of unknown DNA sequences. Binning of metagenomic contigs is a hot topic in metagenomics research. Current unsupervised metagenomic clustering algorithms face two key challenges. First, they rarely use reference databases, leaving available resources underused. Second, they are restricted by the characteristics of the sequences and of the clustering algorithms, which limits binning performance. To address these challenges, we propose a new binning method for metagenomic contigs that combines unsupervised clustering with reference databases constructed by scientists, making full use of the advantages of both to improve overall binning performance. The method uses an integrated SVM classification model to further bin the parts of the unsupervised clustering that perform poorly. We tested the method on simulated datasets and a real dataset and compared it with other state-of-the-art metagenomic clustering methods, including CONCOCT, MaxBin2.0, Autometa, and MetaBAT. The results show that our method achieves a higher precision rate and improves binning performance.
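The abstract reports results as a precision rate but does not define it. One common convention in binning benchmarks, assumed in the minimal Python sketch below and not necessarily the paper's, weights contigs by length, credits each bin with its dominant source genome (precision), and credits each genome with the bin that holds most of it (recall). The function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def binning_precision_recall(assignments):
    """Length-weighted precision and recall of a binning.

    assignments: iterable of (bin_id, genome_id, contig_length) triples,
    where genome_id is the known source genome of the contig.
    Assumed convention: precision sums each bin's dominant-genome length,
    recall sums each genome's largest share in a single bin.
    """
    bin_to_genome = defaultdict(Counter)
    genome_to_bin = defaultdict(Counter)
    total = 0
    for bin_id, genome_id, length in assignments:
        bin_to_genome[bin_id][genome_id] += length
        genome_to_bin[genome_id][bin_id] += length
        total += length
    if total == 0:
        raise ValueError("no binned sequence")
    precision = sum(max(c.values()) for c in bin_to_genome.values()) / total
    recall = sum(max(c.values()) for c in genome_to_bin.values()) / total
    return precision, recall


# Hypothetical toy binning: "b1" is pure, "b2" mixes two genomes,
# and genome "gB" is split between "b2" and "b3".
toy = [
    ("b1", "gA", 5000),
    ("b1", "gA", 3000),
    ("b2", "gB", 4000),
    ("b2", "gA", 2000),
    ("b3", "gB", 1000),
]
p, r = binning_precision_recall(toy)
print(f"precision={p:.3f} recall={r:.3f}")
```

Unbinned contigs are ignored here, so recall is measured only over binned sequence; a benchmark that penalizes unbinned contigs would add their length to the recall denominator.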
Affiliation(s)
- Zhongjun Jiang
- College of Information Science and Technology, Ningbo University, Ningbo, 315211, China
- Xiaobo Li
- College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, 321004, China.
- Lijun Guo
- College of Information Science and Technology, Ningbo University, Ningbo, 315211, China
2
Yao Z, Zhu Y, Wu Q, Xu Y. Challenges and perspectives of quantitative microbiome profiling in food fermentations. Crit Rev Food Sci Nutr 2022; 64:4995-5015. [PMID: 36412251; DOI: 10.1080/10408398.2022.2147899] [Indexed: 11/23/2022]
Abstract
Spontaneously fermented foods have been consumed and appreciated for thousands of years, although they are usually produced with fluctuating productivity and quality, potentially threatening both food safety and food security. To guarantee consistent fermentation productivity and quality, it is essential to control the complex microbiota, the most crucial factor in food fermentations. The prerequisite for such control is a comprehensive understanding of the structure and function of the microbiota, so quantifying the actual microbiota is of paramount importance. Among the various microbial quantification methods that have been developed, quantitative microbiome profiling, which quantifies all microbial taxa by absolute abundance, is the best method for understanding complex microbiota, although it is still at a pioneering stage in food fermentations. Here, we provide an overview of microbial quantification methods, covering the development from conventional methods to advanced quantitative microbiome profiling, together with application examples of these methods. Moreover, we address potential challenges and perspectives of quantitative microbiome profiling methods, as well as future research needs toward the ultimate goal of rational and optimal control of microbiota in spontaneous food fermentations. Our review can serve as a reference for the traditional food fermentation sector in achieving stable fermentation productivity, quality and safety.
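What distinguishes quantitative microbiome profiling from relative profiling is that per-taxon proportions are scaled by an independently measured total microbial load. The Python sketch below shows only that scaling step, under the simplifying assumption that reads are proportional to cells; the function name, taxa and load values are hypothetical, and real workflows also correct for marker copy number and extraction bias.

```python
def absolute_abundances(read_counts, total_load):
    """Scale per-taxon read counts in one sample to absolute abundances.

    read_counts: mapping taxon -> reads assigned to that taxon.
    total_load: total microbial load of the same sample, measured
        independently (e.g. cells per gram by flow cytometry or qPCR);
        the output uses the same units.
    Assumes reads are proportional to cells (no copy-number or
    extraction-bias correction).
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("sample has no reads")
    return {taxon: total_load * n / total_reads for taxon, n in read_counts.items()}


# Two hypothetical samples with identical relative profiles but a tenfold
# difference in load: relative profiling alone cannot tell them apart.
early = absolute_abundances({"Lactobacillus": 600, "Saccharomyces": 400}, 1e7)
late = absolute_abundances({"Lactobacillus": 600, "Saccharomyces": 400}, 1e8)
print(early)
print(late)
```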
Affiliation(s)
- Zhihao Yao
- Lab of Brewing Microbiology and Applied Enzymology, The Key Laboratory of Industrial Biotechnology, Ministry of Education; State Key Laboratory of Food Science and Technology; School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
- Yang Zhu
- Bioprocess Engineering, Wageningen University and Research, Wageningen, The Netherlands
- Qun Wu
- Lab of Brewing Microbiology and Applied Enzymology, The Key Laboratory of Industrial Biotechnology, Ministry of Education; State Key Laboratory of Food Science and Technology; School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
- Yan Xu
- Lab of Brewing Microbiology and Applied Enzymology, The Key Laboratory of Industrial Biotechnology, Ministry of Education; State Key Laboratory of Food Science and Technology; School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
3
Wu Z, Wang Y, Zeng J, Zhou Y. Constructing metagenome-assembled genomes for almost all components in a real bacterial consortium for binning benchmarking. BMC Genomics 2022; 23:746. [PMID: 36352370; PMCID: PMC9647946; DOI: 10.1186/s12864-022-08967-x] [Received: 03/08/2022] [Accepted: 10/25/2022] [Indexed: 11/11/2022]
Abstract
BACKGROUND Many binning approaches have been developed for untangling metagenome-assembled genomes (MAGs), and they are evaluated mainly by two strategies: comparison to known genomes, which prevails, and the use of single-copy genes. However, no dataset with all known genomes yet exists for a real (not simulated) bacterial consortium. RESULTS Here, we continue investigating the real bacterial consortium F1RT, which we previously enriched and sequenced, because its low complexity makes it highly likely that all MAGs can be recovered. The improved F1RT metagenome, reassembled here with metaSPAdes, utilizes about 98.62% of reads, and a series of analyses of the remaining reads suggests that F1RT is very unlikely to contain other low-abundance organisms, indicating that almost all MAGs were successfully assembled. Four isolates were then obtained and individually sequenced. Based on the 4 isolate genomes and the entire metagenome, an elaborate in-house pipeline was developed to construct all F1RT MAGs. A series of assessments extensively confirms the high reliability of this reconstruction. Our findings further show that this dataset has several properties that are challenging for binning and is thus suitable for comparing currently available advanced binning tools or benchmarking novel binners. Using this dataset, 8 advanced binning algorithms were assessed, giving useful insights for developing novel approaches. In addition, compared with our previous study, two novel MAGs, termed FC8 and FC9, were discovered, and 7 MAGs were solidly recovered for species without any available genomes. CONCLUSION To our knowledge, this is the first dataset with almost all known MAGs for a non-simulated consortium. We hope that this dataset will be used as a routine toolkit to complement mock datasets for evaluating binning methods, further facilitating binning and metagenomic studies in the future.
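The abstract contrasts evaluation by comparison to known genomes with evaluation by single-copy genes. A simplified Python sketch of the single-copy-gene idea follows: completeness is the fraction of expected single-copy markers found in a bin, and contamination counts extra copies. This is an illustration under stated assumptions, not the procedure used in the paper or by any specific tool; the function name and marker IDs are hypothetical.

```python
from collections import Counter


def scg_quality(marker_hits, expected_markers):
    """Estimate bin completeness and contamination from single-copy genes.

    marker_hits: marker-gene IDs detected in the bin (repeats allowed).
    expected_markers: set of genes expected exactly once per genome.
    Simplified: real tools such as CheckM use lineage-specific,
    collocated marker sets rather than a flat list.
    """
    counts = Counter(h for h in marker_hits if h in expected_markers)
    n = len(expected_markers)
    completeness = len(counts) / n                     # markers seen at least once
    contamination = sum(c - 1 for c in counts.values()) / n  # surplus copies
    return completeness, contamination


markers = {f"m{i}" for i in range(10)}  # hypothetical 10-gene marker set
hits = ["m0", "m1", "m2", "m3", "m4", "m5", "m6", "m7", "m1", "m2"]
comp, cont = scg_quality(hits, markers)
print(f"completeness={comp:.0%} contamination={cont:.0%}")
```

Because this estimate needs no reference genomes, it can be applied to real samples, but it inherits any bias in the marker set, which is one reason a dataset with known genomes is useful for benchmarking.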
Affiliation(s)
- Ziyao Wu
- Guangxi Key Laboratory of Environmental Exposomics and Entire Lifecycle Health, School of Public Health, Guilin Medical University, Guilin, 541199, Guangxi, China
- Yuxiao Wang
- Guangxi Key Laboratory of Environmental Exposomics and Entire Lifecycle Health, School of Public Health, Guilin Medical University, Guilin, 541199, Guangxi, China
- Jiaqi Zeng
- Guangxi Key Laboratory of Environmental Exposomics and Entire Lifecycle Health, School of Public Health, Guilin Medical University, Guilin, 541199, Guangxi, China
- Institute of Pathogeny Biology, School of Basic Medicine, Guilin Medical University, Guilin, 541199, Guangxi, China
- Yizhuang Zhou
- Guangxi Key Laboratory of Environmental Exposomics and Entire Lifecycle Health, School of Public Health, Guilin Medical University, Guilin, 541199, Guangxi, China.
4
Banerjee G, Agarwal S, Marshall A, Jones DH, Sulaiman IM, Sur S, Banerjee P. Application of advanced genomic tools in food safety rapid diagnostics: challenges and opportunities. Curr Opin Food Sci 2022. [DOI: 10.1016/j.cofs.2022.100886] [Indexed: 11/03/2022]
5
Jiang Z, Li X, Guo L. MetaCRS: unsupervised clustering of contigs with the recursive strategy of reducing metagenomic dataset's complexity. BMC Bioinformatics 2022; 22:315. [PMID: 35045830; PMCID: PMC8772042; DOI: 10.1186/s12859-021-04227-z] [Received: 05/23/2021] [Accepted: 06/01/2021] [Indexed: 01/02/2023]
Abstract
Background Metagenomics technology can directly extract microbial genetic material from environmental samples to obtain sequencing reads, which can then be assembled into contigs with assembly tools. Contig clustering methods are subsequently applied to recover complete genomes from environmental samples. The main problem with current clustering methods is that they cannot recover more high-quality genes from complex environments. First, multiple strains of the same species lead to chimeric assemblies. Second, different strains of the same species are difficult to separate. Third, it is difficult to determine the number of strains during clustering. Results To address these shortcomings, we propose an unsupervised clustering method that improves the ability to recover genes from complex environments, together with a new method for selecting the number of strains in a sample during clustering. Sequence composition (tetranucleotide frequency) and co-abundance are combined to train a probabilistic model for clustering. A new recursive method that continuously reduces sample complexity is proposed to improve gene recovery from complex environments. The new clustering method was tested on both simulated and real metagenomic datasets and compared with five state-of-the-art methods: CONCOCT, MaxBin2.0, MetaBAT, MyCC and COCACOLA. In terms of the number and quality of genes recovered from metagenomic datasets, the results show that our proposed method is more effective. Conclusions A new contig clustering method is proposed that can recover more high-quality genes from complex environmental samples.
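MetaCRS combines tetranucleotide frequency with co-abundance. A common way to compute the composition part, assumed here rather than taken from the paper, is to count each 4-mer together with its reverse complement, giving 136 canonical tetranucleotides, and to normalize with a pseudocount. The Python sketch below covers only that feature vector; coverage features, the probabilistic model and the recursive strategy are not shown, and the names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


# Collapse each 4-mer with its reverse complement:
# 256 k-mers -> 120 pairs + 16 palindromes = 136 canonical k-mers.
CANONICAL = sorted({min(k, revcomp(k)) for k in map("".join, product("ACGT", repeat=4))})
INDEX = {k: i for i, k in enumerate(CANONICAL)}


def tetranucleotide_frequency(contig, pseudocount=1.0):
    """Return a normalized 136-dimensional canonical 4-mer frequency vector."""
    counts = [pseudocount] * len(CANONICAL)
    seq = contig.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= VALID:  # skip windows containing N or other IUPAC codes
            counts[INDEX[min(kmer, revcomp(kmer))]] += 1
    total = sum(counts)
    return [c / total for c in counts]


vec = tetranucleotide_frequency("ACGTACGTTTGCAN" * 10)
print(len(vec), round(sum(vec), 6))
```

Counting canonical k-mers makes the vector independent of which strand a contig was assembled from, which matters because assemblers emit contigs in arbitrary orientation.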
Affiliation(s)
- Zhongjun Jiang
- College of Information Science and Technology, Ningbo University, Ningbo, 315211, China
- Xiaobo Li
- College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, 321004, China; College of Engineering, Lishui University, Lishui, 323000, China.
- Lijun Guo
- College of Information Science and Technology, Ningbo University, Ningbo, 315211, China
6
Ma T, McAllister TA, Guan LL. A review of the resistome within the digestive tract of livestock. J Anim Sci Biotechnol 2021; 12:121. [PMID: 34763729; PMCID: PMC8588621; DOI: 10.1186/s40104-021-00643-6] [Received: 06/01/2021] [Accepted: 10/07/2021] [Indexed: 12/25/2022]
Abstract
Antimicrobials have been widely used to prevent and treat infectious diseases and to promote growth in food-producing animals. However, the occurrence of antimicrobial resistance poses a huge threat to public and animal health, especially in less developed countries where food-producing animals often intermingle with humans. To limit the spread of antimicrobial resistance from food-producing animals to humans and the environment, it is essential to have comprehensive knowledge of the role of the resistome in antimicrobial resistance (AMR). The resistome refers to the collection of all antimicrobial resistance genes associated with the microbiota in a given environment. The dense microbiota in the digestive tract is known to harbour one of the most diverse resistomes in nature. Studies of the resistome in the digestive tract of humans and animals are increasing exponentially as a result of advances in next-generation sequencing and the expansion of bioinformatic resources and tools to identify and describe the resistome. In this review, we outline the various tools and bioinformatic pipelines currently available to characterize and understand the nature of the intestinal resistome of swine, poultry, and ruminants. We then propose future research directions, including analysis of the resistome using long-read sequencing and investigation of the role of mobile genetic elements in the expression, function and transmission of AMR. This review outlines current knowledge and approaches to studying the resistome in food-producing animals and sheds light on future strategies to reduce antimicrobial usage and control the spread of AMR both within and from livestock production systems.
Affiliation(s)
- Tao Ma
- Key Laboratory of Feed Biotechnology of the Ministry of Agriculture, Institute of Feed Research, Chinese Academy of Agricultural Sciences, Beijing, 100081, China; Department of Agricultural, Food and Nutritional Science, University of Alberta, T6G2P5, Edmonton, AB, Canada
- Tim A McAllister
- Lethbridge Research and Development Centre, Lethbridge, AB, T1J 4P4, Canada
- Le Luo Guan
- Department of Agricultural, Food and Nutritional Science, University of Alberta, T6G2P5, Edmonton, AB, Canada.
7
Guo H, Li J. scSorter: assigning cells to known cell types according to marker genes. Genome Biol 2021; 22:69. [PMID: 33618746; PMCID: PMC7898451; DOI: 10.1186/s13059-021-02281-7] [Received: 09/10/2020] [Accepted: 01/27/2021] [Indexed: 12/13/2022]
Abstract
For single-cell RNA-sequencing data, we consider the problem of assigning cells to known cell types, assuming that the identities of cell-type-specific marker genes are given but their exact expression levels are unavailable, that is, without using a reference dataset. Based on the observation that the expected over-expression of marker genes is often absent in a non-negligible proportion of cells, we develop a method called scSorter. scSorter allows marker genes to be expressed at a low level and borrows information from the expression of non-marker genes. On both simulated and real data, scSorter shows much higher power than existing methods.
Affiliation(s)
- Hongyu Guo
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, 102 Crowley Hall, Notre Dame, USA
- Jun Li
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, 102 Crowley Hall, Notre Dame, USA.
8
Siu DMD, Lee KCM, Lo MCK, Stassen SV, Wang M, Zhang IZQ, So HKH, Chan GCF, Cheah KSE, Wong KKY, Hsin MKY, Ho JCM, Tsia KK. Deep-learning-assisted biophysical imaging cytometry at massive throughput delineates cell population heterogeneity. Lab Chip 2020; 20:3696-3708. [PMID: 32935707; DOI: 10.1039/d0lc00542h] [Indexed: 06/11/2023]
Abstract
The association of the intrinsic optical and biophysical properties of cells with homeostasis and pathogenesis has long been acknowledged. Defining these label-free cellular features obviates the need for costly and time-consuming labelling protocols that perturb living cells. However, broad applicability of such label-free cell-based assays requires throughput, statistical power and sensitivity that are unattainable with current technologies. To close this gap, we present a large-scale, integrative imaging flow cytometry platform and strategy that allows hierarchical analysis of intrinsic morphological descriptors of single-cell optical and mass density within a population of millions of cells. The optofluidic cytometry system also enables synchronous single-cell acquisition of, and correlation with, fluorescently labeled biochemical markers. Combined with a deep neural network and transfer learning, this massive single-cell profiling strategy demonstrates the label-free power to delineate the biophysical signatures of cancer subtypes, to detect rare populations of cells in heterogeneous samples (10⁻⁵), and to assess the efficacy of targeted therapeutics. This technique could spearhead the development of optofluidic imaging cell-based assays that stratify the underlying physiological and pathological processes based on information-rich biophysical cellular phenotypes.
Affiliation(s)
- Dickson M D Siu
- Department of Electrical and Electronic Engineering, Choi Yei Ching Building, The University of Hong Kong, Pokfulam Road, Pokfulam, Hong Kong.
9
Choi JH, In Kim H, Woo HG. scTyper: a comprehensive pipeline for the cell typing analysis of single-cell RNA-seq data. BMC Bioinformatics 2020; 21:342. [PMID: 32753029; PMCID: PMC7430822; DOI: 10.1186/s12859-020-03700-5] [Received: 01/21/2020] [Accepted: 07/23/2020] [Indexed: 01/02/2023]
Abstract
Background Recent advances in single-cell RNA sequencing (scRNA-seq) technology have enabled the identification of individual cell types, such as epithelial cells, immune cells, and fibroblasts, in tissue samples containing complex cell populations. Cell typing is one of the key challenges in scRNA-seq data analysis and is usually achieved by estimating the expression of cell marker genes. However, there is no standard practice for cell typing, which often results in variable and inaccurate outcomes. Results We have developed a comprehensive and user-friendly R-based scRNA-seq analysis and cell typing package, scTyper. scTyper also provides a database of cell type markers, scTyper.db, which contains 213 cell marker sets collected from the literature. These marker sets include, but are not limited to, markers for malignant cells, cancer-associated fibroblasts, and tumor-infiltrating T cells. Additionally, scTyper provides three customized methods for estimating cell-type marker expression: nearest template prediction (NTP), gene set enrichment analysis (GSEA), and average expression values. A DNA copy number inference method (inferCNV) has been implemented with an improved modification that can be used for malignant cell typing. The package also supports the data preprocessing pipelines of Cell Ranger (10X Genomics) and the Seurat package. A summary reporting system is also implemented, which may help users perform reproducible analyses. Conclusions scTyper provides a comprehensive and user-friendly analysis pipeline for cell typing of scRNA-seq data with a curated cell marker database, scTyper.db.
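Of scTyper's three marker-expression methods, average expression values is the simplest to illustrate. scTyper itself is an R package, so the Python sketch below is a hypothetical re-expression of the idea, not its API: score each cell by the mean normalized expression of each cell type's markers, then take the highest-scoring type. The function names and abbreviated marker sets are illustrative assumptions.

```python
def average_marker_scores(expression, marker_sets):
    """Score each cell against each cell type by mean marker expression.

    expression: dict cell -> dict gene -> normalized expression value.
    marker_sets: dict cell type -> list of marker genes.
    Genes missing from a cell's profile are treated as 0.
    """
    scores = {}
    for cell, profile in expression.items():
        scores[cell] = {
            ctype: sum(profile.get(g, 0.0) for g in genes) / len(genes)
            for ctype, genes in marker_sets.items()
        }
    return scores


def assign_cell_types(scores):
    """Label each cell with its highest-scoring cell type."""
    return {cell: max(s, key=s.get) for cell, s in scores.items()}


markers = {  # hypothetical, abbreviated marker sets
    "T cell": ["CD3D", "CD3E", "CD2"],
    "fibroblast": ["COL1A1", "COL1A2", "DCN"],
}
cells = {
    "cell1": {"CD3D": 2.1, "CD3E": 1.8, "COL1A1": 0.1},
    "cell2": {"COL1A1": 3.0, "DCN": 2.5, "CD3E": 0.2},
}
print(assign_cell_types(average_marker_scores(cells, markers)))
```

A plain argmax always assigns some label, even to cells that express no markers; a practical pipeline would add a minimum-score threshold or an "unassigned" category.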
Affiliation(s)
- Ji-Hye Choi
- Department of Physiology, Ajou University School of Medicine, 164 Worldcup-ro, Yeongtong-gu, Suwon, 16499, Republic of Korea; Department of Biomedical Science, Graduate School, Ajou University, Suwon, Republic of Korea
- Hye In Kim
- Department of Physiology, Ajou University School of Medicine, 164 Worldcup-ro, Yeongtong-gu, Suwon, 16499, Republic of Korea; Department of Biomedical Science, Graduate School, Ajou University, Suwon, Republic of Korea
- Hyun Goo Woo
- Department of Physiology, Ajou University School of Medicine, 164 Worldcup-ro, Yeongtong-gu, Suwon, 16499, Republic of Korea; Department of Biomedical Science, Graduate School, Ajou University, Suwon, Republic of Korea.
10
Pérez-Cobas AE, Gomez-Valero L, Buchrieser C. Metagenomic approaches in microbial ecology: an update on whole-genome and marker gene sequencing analyses. Microb Genom 2020; 6:mgen000409. [PMID: 32706331; PMCID: PMC7641418; DOI: 10.1099/mgen.0.000409] [Received: 12/18/2019] [Accepted: 06/30/2020] [Indexed: 12/23/2022]
Abstract
Metagenomics and marker gene approaches, coupled with high-throughput sequencing technologies, have revolutionized the field of microbial ecology. Metagenomics is a culture-independent method that allows the identification and characterization of organisms from all kinds of samples. Whole-genome shotgun sequencing analyses the total DNA of a chosen sample to determine the presence of micro-organisms from all domains of life and their genomic content. Importantly, the whole-genome shotgun sequencing approach reveals the genomic diversity present, but can also give insights into the functional potential of the micro-organisms identified. The marker gene approach is based on the sequencing of a specific gene region. It allows one to describe the microbial composition based on the taxonomic groups present in the sample. It is frequently used to analyse the biodiversity of microbial ecosystems. Despite its importance, the analysis of metagenomic sequencing and marker gene data is quite a challenge. Here we review the primary workflows and software used for both approaches and discuss the current challenges in the field.
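Marker gene surveys are often summarized with alpha-diversity indices when analysing biodiversity. As one illustration, chosen here rather than drawn from the review, the Python sketch below computes the Shannon index H' = -Σ p_i ln p_i from per-taxon amplicon counts; the function name and sample counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical 16S rRNA amplicon counts per taxon for two samples
# with the same richness (4 taxa) but different evenness.
even = [250, 250, 250, 250]
skewed = [970, 10, 10, 10]
print(round(shannon_index(even), 3), round(shannon_index(skewed), 3))
```

For k equally abundant taxa the index reaches its maximum, ln k, so the even sample scores higher than the skewed one despite identical richness.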
Affiliation(s)
- Ana Elena Pérez-Cobas
- Institut Pasteur, Biologie des Bactéries Intracellulaires, Paris, France and CNRS UMR 3525, 675724, Paris, France
- Laura Gomez-Valero
- Institut Pasteur, Biologie des Bactéries Intracellulaires, Paris, France and CNRS UMR 3525, 675724, Paris, France
- Carmen Buchrieser
- Institut Pasteur, Biologie des Bactéries Intracellulaires, Paris, France and CNRS UMR 3525, 675724, Paris, France