1
Sadurski J, Polak-Berecka M, Staniszewski A, Waśko A. Step-by-Step Metagenomics for Food Microbiome Analysis: A Detailed Review. Foods 2024; 13:2216. [PMID: 39063300 PMCID: PMC11276190 DOI: 10.3390/foods13142216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Revised: 07/11/2024] [Accepted: 07/12/2024] [Indexed: 07/28/2024]
Abstract
This review article offers a comprehensive overview of the current understanding of using metagenomic tools in food microbiome research. It covers the scientific foundation and practical application of genetic analysis techniques for microbial material from food, including bioinformatic analysis and data interpretation. The method discussed in the article for analyzing microorganisms in food without traditional culture methods is known as food metagenomics. This approach, along with other omics technologies such as nutrigenomics, proteomics, metabolomics, and transcriptomics, collectively forms the field of foodomics. Food metagenomics allows swift and thorough examination of bacteria and potential metabolic pathways by utilizing foodomic databases. Despite its established scientific basis and available bioinformatics resources, the research approach of food metagenomics outlined in the article is not yet widely implemented in industry. The authors believe that the integration of next-generation sequencing (NGS) with rapidly advancing digital technologies such as artificial intelligence (AI), the Internet of Things (IoT), and big data will facilitate the widespread adoption of this research strategy in microbial analysis for the food industry. This adoption is expected to enhance food safety and product quality in the near future.
Affiliation(s)
- Jan Sadurski
- Department of Biotechnology, Microbiology and Human Nutrition, Faculty of Food Science and Biotechnology, University of Life Sciences in Lublin, 20-704 Lublin, Poland
2
Schwartz DC. Biophysics and the Genomic Sciences. Biophys J 2019; 117:2047-2053. [PMID: 31409480 DOI: 10.1016/j.bpj.2019.07.038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/24/2019] [Revised: 06/27/2019] [Accepted: 07/09/2019] [Indexed: 12/20/2022]
Abstract
It is now rare to find biological or genetic investigations that do not rely on the tools, data, and thinking drawn from the genomic sciences. Much of this revolution is powered by contemporary sequencing approaches that readily deliver large, genome-wide data sets. These data sets not only provide genetic insights but also uniquely report molecular outcomes from experiments that biophysicists increasingly use to advance structural and mechanistic investigations. In this perspective, I describe how biophysical thinking contributed to this revolution in ways that parallel advancements in computer science, through discussion of several key inventions described as "foundational devices." These discussions also point to a future in which biophysics and the genomic sciences may become more finely integrated, enabling new measurement paradigms for biological investigations.
Affiliation(s)
- David C Schwartz
- Department of Chemistry, Laboratory of Genetics, Laboratory for Molecular and Computational Genomics, University of Wisconsin-Madison, Madison Wisconsin.
3
Molina JE, Vasquez-Echeverri A, Schwartz DC, Hernández-Ortiz JP. Discrete and Continuum Models for the Salt in Crowded Environments of Suspended Charged Particles. J Chem Theory Comput 2018; 14:4901-4913. [PMID: 30044624 DOI: 10.1021/acs.jctc.8b00221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Electrostatic forces greatly affect the overall dynamics and diffusional activities of suspended charged particles in crowded environments. Accordingly, the concentration of counter- or co-ions in a fluid ("the salt") determines the range, strength, and order of electrostatic interactions between particles. This environment fosters engineering routes for controlling directed assembly of particles at both the micro- and nanoscale. Here, we analyzed two computational modeling schemes that considered salt within suspensions of charged particles, or polyelectrolytes: discrete and continuum. Electrostatic interactions were included through a Green's function formalism, where the confined fundamental solution for Poisson's equation is resolved by the general geometry Ewald-like method. For the discrete model, the salt was considered as regularized point-charges with a specific valence and size, while concentration fields were defined for each ionic species for the continuum model. These considerations were evolved using Brownian dynamics of the suspended charged particles and the discrete salt ions, while a convection-diffusion transport equation, including the Nernst-Planck diffusion mechanism, accounted for the dynamics of the concentration fields. The salt/particle models were considered as suspensions under slit-confinement conditions for creating crowded "macro-ions", where density distributions and radial distribution functions were used to compare and differentiate computational models. Importantly, our analysis shows that disparate length scales or increased system size presented by the salt and suspended particles are best dealt with using concentration fields to model the ions. These findings were then validated by novel simulations of a semipermeable polyelectrolyte membrane, at the mesoscale, from which ionic channels emerged and enabled ion conduction.
Affiliation(s)
- Jarol E Molina
- Departamento de Materiales y Nanotecnología, Universidad Nacional de Colombia-Medellín, Medellín 050034, Colombia
- Alejandro Vasquez-Echeverri
- Departamento de Materiales y Nanotecnología, Universidad Nacional de Colombia-Medellín, Medellín 050034, Colombia
- David C Schwartz
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin 53706-1396, United States; The Biotechnology Center, University of Wisconsin-Madison, Madison, Wisconsin 53706-1396, United States
- Juan P Hernández-Ortiz
- Departamento de Materiales y Nanotecnología, Universidad Nacional de Colombia-Medellín, Medellín 050034, Colombia; The Biotechnology Center, University of Wisconsin-Madison, Madison, Wisconsin 53706-1396, United States; Institute for Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
4
Rezende VB, Congrains C, Lima ALA, Campanini EB, Nakamura AM, Oliveira JLD, Chahad-Ehlers S, Junior IS, Alves de Brito R. Head Transcriptomes of Two Closely Related Species of Fruit Flies of the Anastrepha fraterculus Group Reveals Divergent Genes in Species with Extensive Gene Flow. G3 (BETHESDA, MD.) 2016; 6:3283-3295. [PMID: 27558666 PMCID: PMC5068948 DOI: 10.1534/g3.116.030486] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 05/17/2016] [Accepted: 08/10/2016] [Indexed: 11/18/2022]
Abstract
Several fruit fly species of the Anastrepha fraterculus group are of great economic importance because of the damage they cause to a variety of fleshy fruits. Some species in this group have diverged recently, with evidence of introgression, and show similar morphological attributes that make them difficult to identify, which reinforces the relevance of finding new molecular markers that may differentiate species. We investigated genes expressed in head tissues from two closely related species, A. obliqua and A. fraterculus, aiming to identify fixed single nucleotide polymorphisms (SNPs) and highly differentiated transcripts, which, considering that these species still experience some level of gene flow, could indicate potential candidate genes involved in their differentiation process. We generated multiple libraries from head tissues of these two species, at different reproductive stages, for both sexes. Our analyses indicate that the de novo transcriptome assemblies are fairly complete. We also produced a hybrid assembly to map each species' reads, and identified 67,470 SNPs in A. fraterculus, 39,252 in A. obliqua, and 6386 that were common to both species. We identified 164 highly differentiated unigenes that had a mean interspecific index ([Formula: see text]) of at least 0.94. We selected unigenes that had Ka/Ks higher than 0.5, or had at least three highly differentiated SNPs, as potential candidate genes for species differentiation. Among these candidates, we identified proteases, regulators of redox homeostasis, and an odorant-binding protein (Obp99c), among other genes. The head transcriptomes described here enabled the identification of thousands of genes hitherto unavailable for these species, and generated a set of candidate genes that are potentially important to genetically identify species and understand the speciation process in the presence of gene flow of A. obliqua and A. fraterculus.
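The candidate-selection rule in this abstract (Ka/Ks above 0.5, or at least three highly differentiated SNPs) can be written as a simple filter. The following is only an illustrative sketch: the `Unigene` record, its field names, and the example values are hypothetical and do not come from the paper's data or code.

```python
from dataclasses import dataclass


@dataclass
class Unigene:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    ka_ks: float
    differentiated_snps: int


def is_candidate(u, ka_ks_cutoff=0.5, min_snps=3):
    """Apply the rule stated in the abstract: Ka/Ks > 0.5 OR >= 3 differentiated SNPs."""
    return u.ka_ks > ka_ks_cutoff or u.differentiated_snps >= min_snps


# Made-up example values, not results from the study.
unigenes = [
    Unigene("unigene_A", ka_ks=0.72, differentiated_snps=1),
    Unigene("unigene_B", ka_ks=0.10, differentiated_snps=4),
    Unigene("unigene_C", ka_ks=0.30, differentiated_snps=2),
]
candidates = [u.name for u in unigenes if is_candidate(u)]
print(candidates)
```

Because the two criteria are joined by "or", a unigene qualifies on either evidence of elevated nonsynonymous change or a concentration of differentiated SNPs.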
Affiliation(s)
- Victor Borges Rezende
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Paulo 13565-905, Brazil
- Carlos Congrains
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Paulo 13565-905, Brazil
- André Luís A Lima
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Paulo 13565-905, Brazil
- Emeline Boni Campanini
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Paulo 13565-905, Brazil
- Aline Minali Nakamura
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Paulo 13565-905, Brazil
- Janaína Lima de Oliveira
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Paulo 13565-905, Brazil
- Samira Chahad-Ehlers
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Paulo 13565-905, Brazil
- Iderval Sobrinho Junior
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Paulo 13565-905, Brazil
- Reinaldo Alves de Brito
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Paulo 13565-905, Brazil
5
Kuczynski J, Stombaugh J, Walters WA, González A, Caporaso JG, Knight R. Using QIIME to analyze 16S rRNA gene sequences from microbial communities. Curr Protoc Microbiol 2013. [PMID: 23184592 DOI: 10.1002/9780471729259.mc01e05s27] [Citation(s) in RCA: 312] [Impact Index Per Article: 28.4] [Indexed: 11/10/2022]
Abstract
QIIME (canonically pronounced "chime") is a software application that performs microbial community analysis. It is an acronym for Quantitative Insights Into Microbial Ecology, and has been used to analyze and interpret nucleic acid sequence data from fungal, viral, bacterial, and archaeal communities. The following protocols describe how to install QIIME on a single computer and use it to analyze microbial 16S sequence data from nine distinct microbial communities.
Affiliation(s)
- Justin Kuczynski
- Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, Colorado, USA
6
Kuczynski J, Stombaugh J, Walters WA, González A, Caporaso JG, Knight R. Using QIIME to analyze 16S rRNA gene sequences from microbial communities. Curr Protoc Bioinformatics 2012; Chapter 10:10.7.1-10.7.20. [PMID: 22161565 DOI: 10.1002/0471250953.bi1007s36] [Citation(s) in RCA: 383] [Impact Index Per Article: 31.9] [Indexed: 11/10/2022]
Abstract
QIIME (canonically pronounced "chime") is a software application that performs microbial community analysis. It is an acronym for Quantitative Insights Into Microbial Ecology, and has been used to analyze and interpret nucleic acid sequence data from fungal, viral, bacterial, and archaeal communities. The following protocols describe how to install QIIME on a single computer and use it to analyze microbial 16S sequence data from nine distinct microbial communities.
Affiliation(s)
- Justin Kuczynski
- Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, Colorado, USA
7
Hazelhurst S, Lipták Z. KABOOM! A new suffix array based algorithm for clustering expression data. Bioinformatics 2011; 27:3348-55. [PMID: 21984769 DOI: 10.1093/bioinformatics/btr560] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Second-generation sequencing technology has reinvigorated research using expression data, and clustering such data remains a significant challenge, with much larger datasets and with different error profiles. Algorithms that rely on all-versus-all comparison of sequences are not practical for large datasets.
RESULTS: We introduce a new filter for string similarity which has the potential to eliminate the need for all-versus-all comparison in clustering of expression data and other similar tasks. Our filter is based on multiple long exact matches between the two strings, with the additional constraint that these matches must be sufficiently far apart. We give details of its efficient implementation using modified suffix arrays. We demonstrate its efficiency by presenting our new expression clustering tool, wcd-express, which uses this heuristic. We compare it to other current tools and show that it is very competitive both with respect to quality and run time.
AVAILABILITY: Source code and binaries available under GPL at http://code.google.com/p/wcdest. Runs on Linux and MacOS X.
CONTACT: scott.hazelhurst@wits.ac.za; zsuzsa@cebitec.uni-bielefeld.de
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
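The filter idea described in this abstract (several long exact matches that are sufficiently far apart) can be illustrated with a deliberately naive sketch. This is not the suffix-array implementation used by wcd-express: it uses a plain k-mer set, and the function name and the parameters `k`, `min_matches`, and `min_gap` are assumptions chosen for illustration only.

```python
def shares_spaced_matches(a, b, k=4, min_matches=2, min_gap=6):
    """Return True if `a` contains at least `min_matches` exact k-mers that
    also occur in `b`, with start positions in `a` at least `min_gap` apart.

    Naive set-based sketch; the paper's tool uses modified suffix arrays.
    """
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    count = 0
    last_start = None
    for i in range(len(a) - k + 1):
        if a[i:i + k] in kmers_b:
            # Greedily accept the leftmost hit that is far enough from the last one.
            if last_start is None or i - last_start >= min_gap:
                count += 1
                last_start = i
                if count >= min_matches:
                    return True
    return False


# Two well-separated shared 4-mers (AAAA and TTTT): passes the filter.
print(shares_spaced_matches("AAAACCCCGGGGTTTT", "TTTTGGAAAA"))
# Only one shared 4-mer: rejected without any alignment.
print(shares_spaced_matches("AAAACCCC", "AAAAGGGG"))
```

Pairs that fail such a filter can be skipped before any expensive comparison, which is how a filter of this kind avoids all-versus-all alignment.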
Affiliation(s)
- Scott Hazelhurst
- Wits Bioinformatics, School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, Private Bag 3, 2050 Wits, South Africa.
8
Peng Q, Smith AD. Multiple sequence assembly from reads alignable to a common reference genome. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:1283-1295. [PMID: 21778524 DOI: 10.1109/tcbb.2010.107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
We describe a set of computational problems motivated by certain analysis tasks in genome resequencing. These are assembly problems for which multiple distinct sequences must be assembled, but where the relative positions of reads to be assembled are already known. This information is obtained from a common reference genome and is characteristic of resequencing experiments. The simplest variant of the problem aims at determining a minimum set of superstrings such that each sequenced read matches at least one superstring. We give an algorithm with time complexity O(N), where N is the sum of the lengths of reads, substantially improving on previous algorithms for solving the same problem. We also examine the problem of finding the smallest number of reads to remove such that the remaining reads are consistent with k superstrings. By exploiting a surprising relationship with the minimum cost flow problem, we show that this problem can be solved in polynomial time when nested reads are excluded. If nested reads are permitted, this problem of removing the minimum number of reads becomes NP-hard. We show that permitting mismatches between reads and their nearest superstrings generally renders these problems NP-hard.
Affiliation(s)
- Qian Peng
- Department of Computer Science & Engineering, University of California, San Diego, 9500 Gilman Drive, Mail Code 0404, La Jolla, CA 92093-0114, USA.
9
Kim Y, Kim KS, Kounovsky KL, Chang R, Jung GY, de Pablo JJ, Jo K, Schwartz DC. Nanochannel confinement: DNA stretch approaching full contour length. LAB ON A CHIP 2011; 11:1721-9. [PMID: 21431167 PMCID: PMC3222331 DOI: 10.1039/c0lc00680g] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Indexed: 05/04/2023]
Abstract
Fully stretched DNA molecules are becoming a fundamental component of new systems for comprehensive genome analysis. Among a number of approaches for elongating DNA molecules, nanofluidic molecular confinement has received enormous attention from the physical and biological communities over the last several years. Here we demonstrate a well-optimized condition under which a DNA molecule can stretch almost to its full contour length: the average stretch is 19.1 µm ± 1.1 µm for YOYO-1 stained λ DNA (21.8 µm contour length) in a 250 nm × 400 nm channel, which is the longest stretch value ever reported in nanochannels or nanoslits. In addition, based on Odijk's polymer physics theory, we interpret our experimental findings as a function of channel dimensions and ionic strengths. Furthermore, we develop a Monte Carlo simulation approach using a primitive model for the rigorous understanding of DNA confinement effects. Collectively, we present a more complete understanding of nanochannel-confined DNA stretching via comparisons to computer simulation results and Odijk's polymer physics theory.
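To make "almost to its full contour length" concrete, the reported figures can be turned into a fractional stretch. The short calculation below uses only the numbers quoted in the abstract; the variable names are illustrative, and dividing the reported spread by the contour length is a simple rescaling, not an error analysis from the paper.

```python
# Values quoted in the abstract (micrometres).
contour_um = 21.8        # contour length of YOYO-1 stained lambda DNA
mean_stretch_um = 19.1   # average measured stretch
spread_um = 1.1          # reported +/- value

# Fraction of the contour length reached, and the spread on the same scale.
fraction = mean_stretch_um / contour_um
fraction_spread = spread_um / contour_um

print(f"{fraction:.3f} +/- {fraction_spread:.3f}")
```

On these figures the molecule reaches roughly 88% of its contour length.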
Affiliation(s)
- Yoori Kim
- Department of Chemistry and Interdisciplinary Program of Integrated Biotechnology, Sogang University, Seoul, 121-742, Republic of Korea, Tel: +82 2 705 8881
- Ki Seok Kim
- Department of Material Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju, 500-712, Republic of Korea
- Kristy L. Kounovsky
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, 53706, Tel: +1 608 265-0546
- Rakwoo Chang
- Department of Chemistry, Kwangwoon University, Seoul 139-701, Republic of Korea
- Gun Young Jung
- Department of Material Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju, 500-712, Republic of Korea
- Juan J. de Pablo
- Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, WI 53706
- Kyubong Jo
- Department of Chemistry and Interdisciplinary Program of Integrated Biotechnology, Sogang University, Seoul, 121-742, Republic of Korea, Tel: +82 2 705 8881
- David C. Schwartz
- Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, 53706, Tel: +1 608 265-0546
10
Comparing de novo genome assembly: the long and short of it. PLoS One 2011; 6:e19175. [PMID: 21559467 PMCID: PMC3084767 DOI: 10.1371/journal.pone.0019175] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.5] [Received: 11/22/2010] [Accepted: 03/29/2011] [Indexed: 01/30/2023]
Abstract
Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled interest in the whole-genome sequence assembly (WGSA) problem, thereby inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) and a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-off between contig quality and contig size. For this purpose, most of the publicly available major sequence assemblers, for both low-coverage long (Sanger) and high-coverage short (Illumina) read technologies, are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing "next-generation" assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.
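The abstract's point that size-based metrics ignore accuracy can be seen from how N50 is computed. The sketch below uses the common definition of N50 (the contig length at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size). The contig lengths are made-up illustrative values, and this is not the FRC tool or any code from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (common definition)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical assembly: six contigs, 290 units in total.
correct = [80, 70, 50, 40, 30, 20]
# Same assembly after a hypothetical misjoin of the two longest contigs.
misjoined = [150, 50, 40, 30, 20]

print(n50(correct), n50(misjoined))
```

The misjoined assembly scores a higher N50 even though it contains an assembly error, which is the kind of anomaly that motivates a quality-aware metric such as FRC.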