1
Shen K, Din AU, Sinha B, Zhou Y, Qian F, Shen B. Translational informatics for human microbiota: data resources, models and applications. Brief Bioinform 2023; 24:7152256. [PMID: 37141135 DOI: 10.1093/bib/bbad168] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/26/2022] [Revised: 04/07/2023] [Accepted: 04/11/2023] [Indexed: 05/05/2023] Open
Abstract
With the rapid development of human intestinal microbiology and diverse microbiome-related studies and investigations, a large amount of data has been generated and accumulated. Meanwhile, different computational and bioinformatics models have been developed for pattern recognition and knowledge discovery using these data. Given the heterogeneity of these resources and models, we aimed to provide a landscape of the data resources, a comparison of the computational models and a summary of the translational informatics applied to microbiota data. We first review the existing databases, knowledge bases, knowledge graphs and standardizations of microbiome data. Then, the high-throughput sequencing techniques for the microbiome and the informatics tools for their analyses are compared. Finally, translational informatics for the microbiome, including biomarker discovery, personalized treatment and smart healthcare for complex diseases, is discussed.
Affiliation(s)
- Ke Shen
- Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Ahmad Ud Din
- Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Baivab Sinha
- Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Yi Zhou
- Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Fuliang Qian
- Center for Systems Biology, Suzhou Medical College of Soochow University, Suzhou 215123, China
- Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Suzhou 215123, China
- Bairong Shen
- Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
2
Pillay S, Calderón-Franco D, Urhan A, Abeel T. Metagenomic-based surveillance systems for antibiotic resistance in non-clinical settings. Front Microbiol 2022; 13:1066995. [PMID: 36532424 PMCID: PMC9755710 DOI: 10.3389/fmicb.2022.1066995] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/11/2022] [Accepted: 11/09/2022] [Indexed: 08/12/2023] Open
Abstract
The success of antibiotics as a therapeutic agent has led to their ineffectiveness. The continuous use and misuse in clinical and non-clinical areas have led to the emergence and spread of antibiotic-resistant bacteria and their genetic determinants. This is a multi-dimensional problem that has now become a global health crisis. Antibiotic resistance research has primarily focused on the clinical healthcare sectors while overlooking the non-clinical sectors. The increasing antibiotic usage in the environment, including animals, plants, soil, and water, is a driver of antibiotic resistance, functions as a transmission route for antibiotic-resistant pathogens, and is a source of resistance genes. These natural compartments are interconnected with each other and with humans, allowing the spread of antibiotic resistance via horizontal gene transfer between commensal and pathogenic bacteria. Identifying and understanding genetic exchange within and between natural compartments can provide insight into the transmission, dissemination, and emergence mechanisms. The development of high-throughput DNA sequencing technologies has made antibiotic resistance research more accessible and feasible. In particular, the combination of metagenomics and powerful bioinformatic tools and platforms has facilitated the identification of microbial communities and has allowed access to genomic data by bypassing the need for isolating and culturing microorganisms. This review aimed to reflect on the different sequencing techniques, metagenomic approaches, and bioinformatics tools and pipelines with their respective advantages and limitations for antibiotic resistance research. These approaches can provide insight into resistance mechanisms, the microbial population, emerging pathogens, resistance genes, and their dissemination. This information can inform policies, support the development of preventative measures and help alleviate the burden caused by antibiotic resistance.
Affiliation(s)
- Stephanie Pillay
- Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
- Aysun Urhan
- Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Thomas Abeel
- Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
3
Li Y, Altamia MA, Shipway JR, Brugler MR, Bernardino AF, de Brito TL, Lin Z, da Silva Oliveira FA, Sumida P, Smith CR, Trindade-Silva A, Halanych KM, Distel DL. Contrasting modes of mitochondrial genome evolution in sister taxa of wood-eating marine bivalves (Teredinidae and Xylophagaidae). Genome Biol Evol 2022; 14:evac089. [PMID: 35714221 PMCID: PMC9226539 DOI: 10.1093/gbe/evac089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 05/19/2022] [Accepted: 06/05/2022] [Indexed: 11/14/2022] Open
Abstract
The bivalve families Teredinidae and Xylophagaidae include voracious consumers of wood in shallow and deep-water marine environments, respectively. The taxa are sister clades whose members consume wood as food with the aid of intracellular cellulolytic endosymbionts housed in their gills. This combination of adaptations is found in no other group of animals and was likely present in the common ancestor of both families. Despite these commonalities, the two families have followed dramatically different evolutionary paths with respect to anatomy, life history and distribution. Here we present 42 new mitochondrial genome sequences from Teredinidae and Xylophagaidae and show that distinct trajectories have also occurred in the evolution and organization of their mitochondrial genomes. Teredinidae display significantly greater rates of amino acid substitution but absolute conservation of protein-coding gene order, whereas Xylophagaidae display significantly less amino acid change but have undergone numerous and diverse changes in genome organization since their divergence from a common ancestor. As with many bivalves, these mitochondrial genomes encode two ribosomal RNAs, 12 protein coding genes, and 22 tRNAs; atp8 was not detected. We further show that their phylogeny, as inferred from amino acid sequences of 12 concatenated mitochondrial protein-coding genes, is largely congruent with those inferred from their nuclear genomes based on 18S and 28S ribosomal RNA sequences. Our results provide a robust phylogenetic framework to explore the tempo and mode of mitochondrial genome evolution and offer directions for future phylogenetic and taxonomic studies of wood-boring bivalves.
Affiliation(s)
- Yuanning Li
- Institute of Marine Science and Technology, Shandong University, Qingdao 266237, China
- Marvin A Altamia
- Ocean Genome Legacy Center, Department of Marine and Environmental Science, Northeastern University, Nahant, Massachusetts 01908, USA
- J Reuben Shipway
- Marine Biology and Ecology Research Centre, School of Biological and Marine Sciences, University of Plymouth, Plymouth PL4 8AA, United Kingdom
- Mercer R Brugler
- Department of Natural Sciences, University of South Carolina Beaufort, 801 Carteret Street, Beaufort, South Carolina 29902, USA
- Division of Invertebrate Zoology, American Museum of Natural History, Central Park West at 79th Street, New York, New York 10024, USA
- Thaís Lima de Brito
- Drug Research and Development Center, Department of Physiology and Pharmacology, Federal University of Ceará, Ceará, Brazil
- Zhenjian Lin
- Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah, USA
- Paulo Sumida
- Departamento de Oceanografia Biológica, Instituto Oceanográfico da Universidade de São Paulo, São Paulo, SP, Brazil
- Craig R Smith
- Department of Oceanography, University of Hawai’i at Mānoa, Hawaii, USA
- Amaro Trindade-Silva
- Drug Research and Development Center, Department of Physiology and Pharmacology, Federal University of Ceará, Ceará, Brazil
- Kenneth M Halanych
- Center for Marine Science, University of North Carolina Wilmington, North Carolina, USA
- Daniel L Distel
- Ocean Genome Legacy Center, Department of Marine and Environmental Science, Northeastern University, Nahant, Massachusetts 01908, USA
4
de Medeiros Azevedo T, Aburjaile FF, Ferreira-Neto JRC, Pandolfi V, Benko-Iseppon AM. The endophytome (plant-associated microbiome): methodological approaches, biological aspects, and biotech applications. World J Microbiol Biotechnol 2021; 37:206. [PMID: 34708327 DOI: 10.1007/s11274-021-03168-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/22/2021] [Accepted: 10/05/2021] [Indexed: 11/25/2022]
Abstract
Similar to other organisms, plants establish interactions with a variety of microorganisms in their natural environment. The plant microbiome occupies the host plant's tissues, either internally or on its surfaces, showing interactions that can assist in its growth, development, and adaptation to face environmental stresses. The advance of metagenomics and metatranscriptomics approaches has strongly driven the study and recognition of plant microbiome impacts. Research in this regard provides comprehensive information about the taxonomic and functional aspects of microbial plant communities, contributing to a better understanding of their dynamics. Evidence of the plant microbiome's functional potential has boosted its exploitation to develop more ecological and sustainable agricultural practices that impact human health. Although microbial inoculants' development and use are promising to revolutionize crop production, interdisciplinary studies are needed to identify new candidates and promote effective practical applications. On the other hand, there are challenges in understanding and analyzing complex data generated within a plant microbiome project's scope. This review presents aspects about the complex structuring and assembly of the microbiome in the host plant's tissues, metagenomics, and metatranscriptomics approaches for its understanding, covering descriptions of recent studies concerning metagenomics to characterize the microbiome of non-model plants under different aspects. Studies involving bio-inoculants, isolated from plant microbial communities, capable of assisting in crops' productivity, are also reviewed.
Affiliation(s)
- Thamara de Medeiros Azevedo
- Departamento de Genética, Centro de Biociências, Universidade Federal de Pernambuco (UFPE), Av. Prof. Moraes Rego, 1235 - Cidade Universitária, Recife, PE, CEP: 50670-901, Brazil
- Flávia Figueira Aburjaile
- Departamento de Genética, Centro de Biociências, Universidade Federal de Pernambuco (UFPE), Av. Prof. Moraes Rego, 1235 - Cidade Universitária, Recife, PE, CEP: 50670-901, Brazil
- José Ribamar Costa Ferreira-Neto
- Departamento de Genética, Centro de Biociências, Universidade Federal de Pernambuco (UFPE), Av. Prof. Moraes Rego, 1235 - Cidade Universitária, Recife, PE, CEP: 50670-901, Brazil
- Valesca Pandolfi
- Departamento de Genética, Centro de Biociências, Universidade Federal de Pernambuco (UFPE), Av. Prof. Moraes Rego, 1235 - Cidade Universitária, Recife, PE, CEP: 50670-901, Brazil
- Ana Maria Benko-Iseppon
- Departamento de Genética, Centro de Biociências, Universidade Federal de Pernambuco (UFPE), Av. Prof. Moraes Rego, 1235 - Cidade Universitária, Recife, PE, CEP: 50670-901, Brazil.
5
Music of metagenomics-a review of its applications, analysis pipeline, and associated tools. Funct Integr Genomics 2021; 22:3-26. [PMID: 34657989 DOI: 10.1007/s10142-021-00810-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2021] [Revised: 09/25/2021] [Accepted: 10/03/2021] [Indexed: 10/20/2022]
Abstract
This humble effort highlights the intricate details of metagenomics in a simple, poetic, and rhythmic way. The paper enforces the significance of the research area, provides details about major analytical methods, examines the taxonomy and assembly of genomes, emphasizes some tools, and concludes by celebrating the richness of the ecosystem populated by the "metagenome."
6
Secondary Metabolism in the Gill Microbiota of Shipworms (Teredinidae) as Revealed by Comparison of Metagenomes and Nearly Complete Symbiont Genomes. mSystems 2020; 5:5/3/e00261-20. [PMID: 32606027 PMCID: PMC7329324 DOI: 10.1128/msystems.00261-20] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/21/2022] Open
Abstract
Shipworms play critical roles in recycling wood in the sea. Symbiotic bacteria supply enzymes that the organisms need for nutrition and wood degradation. Some of these bacteria have been grown in pure culture and have the capacity to make many secondary metabolites. However, little is known about whether such secondary metabolite pathways are represented in the symbiont communities within their hosts. In addition, little has been reported about the patterns of host-symbiont co-occurrence. Here, we collected shipworms from the United States, the Philippines, and Brazil and cultivated symbiotic bacteria from their gills. We analyzed sequences from 22 shipworm gill metagenomes from seven shipworm species and from 23 cultivated symbiont isolates. Using (meta)genome sequencing, we demonstrate that the cultivated isolates represent all the major bacterial symbiont species and strains in shipworm gills. We show that the bacterial symbionts are distributed among shipworm hosts in consistent, predictable patterns. The symbiotic bacteria harbor many gene cluster families (GCFs) for biosynthesis of bioactive secondary metabolites, fewer than 5% of which match previously described biosynthetic pathways. Because we were able to cultivate the symbionts and to sequence their genomes, we can definitively enumerate the biosynthetic pathways in these symbiont communities, showing that ∼150 of ∼200 total biosynthetic gene clusters (BGCs) present in the animal gill metagenomes are represented in our culture collection. Shipworm symbionts occur in suites that differ predictably across a wide taxonomic and geographic range of host species and collectively constitute an immense resource for the discovery of new biosynthetic pathways corresponding to bioactive secondary metabolites. IMPORTANCE We define a system in which the major symbionts that are important to host biology and to the production of secondary metabolites can be cultivated. We show that symbiotic bacteria that are critical to host nutrition and lifestyle also have an immense capacity to produce a multitude of diverse and likely novel bioactive secondary metabolites that could lead to the discovery of drugs and that these pathways are found within shipworm gills. We propose that, by shaping associated microbial communities within the host, the compounds support the ability of shipworms to degrade wood in marine environments. Because these symbionts can be cultivated and genetically manipulated, they provide a powerful model for understanding how secondary metabolism impacts microbial symbiosis.
7
Malla MA, Dubey A, Kumar A, Yadav S, Hashem A, Abd_Allah EF. Exploring the Human Microbiome: The Potential Future Role of Next-Generation Sequencing in Disease Diagnosis and Treatment. Front Immunol 2019; 9:2868. [PMID: 30666248 PMCID: PMC6330296 DOI: 10.3389/fimmu.2018.02868] [Citation(s) in RCA: 148] [Impact Index Per Article: 29.6] [Received: 07/04/2018] [Accepted: 11/21/2018] [Indexed: 12/12/2022] Open
Abstract
The interaction between the human microbiome and immune system has an effect on several human metabolic functions and impacts our well-being. Additionally, the interaction between humans and microbes can also play a key role in determining the wellness or disease status of the human body. Dysbiosis is related to a plethora of diseases, including skin, inflammatory, metabolic, and neurological disorders. A better understanding of the host-microbe interaction is essential for determining the diagnosis and appropriate treatment of these ailments. The significance of the microbiome on host health has led to the emergence of new therapeutic approaches focused on the prescribed manipulation of the host microbiome, either by removing harmful taxa or reinstating missing beneficial taxa and the functional roles they perform. Culturing large numbers of microbial taxa in the laboratory is problematic at best, if not impossible. Consequently, this makes it very difficult to comprehensively catalog the individual members comprising a specific microbiome, as well as understanding how microbial communities function and influence host-pathogen interactions. Recent advances in sequencing technologies and computational tools have allowed an increasing number of metagenomic studies to be performed. These studies have provided key insights into the human microbiome and a host of other microbial communities in other environments. In the present review, the role of the microbiome as a therapeutic agent and its significance in human health and disease is discussed. Advances in high-throughput sequencing technologies for surveying host-microbe interactions are also discussed. Additionally, the correlation between the composition of the microbiome and infectious diseases as described in previously reported studies is covered as well. Lastly, recent advances in state-of-the-art bioinformatics software, workflows, and applications for analysing metagenomic data are summarized.
Affiliation(s)
- Muneer Ahmad Malla
- Department of Zoology, Dr. Harisingh Gour Central University, Sagar, India
- Anamika Dubey
- Metagenomics and Secretomics Research Laboratory, Department of Botany, Dr. Harisingh Gour Central University, Sagar, India
- Ashwani Kumar
- Metagenomics and Secretomics Research Laboratory, Department of Botany, Dr. Harisingh Gour Central University, Sagar, India
- Shweta Yadav
- Department of Zoology, Dr. Harisingh Gour Central University, Sagar, India
- Abeer Hashem
- Department of Botany and Microbiology, College of Science, King Saud University, Riyadh, Saudi Arabia
- Mycology and Plant Disease Survey Department, Plant Pathology Research Institute, Agriculture Research Center, Giza, Egypt
- Elsayed Fathi Abd_Allah
- Department of Plant Production, College of Food and Agricultural Sciences, King Saud University, Riyadh, Saudi Arabia
8
Fierst JL, Murdock DA. Decontaminating eukaryotic genome assemblies with machine learning. BMC Bioinformatics 2017; 18:533. [PMID: 29191179 PMCID: PMC5709863 DOI: 10.1186/s12859-017-1941-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 10/04/2017] [Accepted: 11/14/2017] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND High-throughput sequencing has made it theoretically possible to obtain high-quality de novo assembled genome sequences but in practice DNA extracts are often contaminated with sequences from other organisms. Currently, there are few existing methods for rigorously decontaminating eukaryotic assemblies. Those that do exist filter sequences based on nucleotide similarity to contaminants and risk eliminating sequences from the target organism. RESULTS We introduce a novel application of an established machine learning method, a decision tree, that can rigorously classify sequences. The major strength of the decision tree is that it can take any measured feature as input and does not require a priori identification of significant descriptors. We use the decision tree to classify de novo assembled sequences and compare the method to published protocols. CONCLUSIONS A decision tree performs better than existing methods when classifying sequences in eukaryotic de novo assemblies. It is efficient, readily implemented, and accurately identifies target and contaminant sequences. Importantly, a decision tree can be used to classify sequences according to measured descriptors and has potentially many uses in distilling biological datasets.
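The decision-tree idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two features (GC fraction and read coverage), the toy contigs, and all function names are assumptions chosen for illustration, and the tree is a plain Gini-based binary tree written with the standard library only.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature index, threshold) that lowers weighted Gini the most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[f] <= thr]
            right = [l for r, l in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, thr), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively grow a tree; leaves are class labels, inner nodes are tuples."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, thr = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= thr]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > thr]
    return (f, thr,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree

# Hypothetical training contigs: (GC fraction, mean read coverage).
train_rows = [(0.35, 52.0), (0.37, 48.0), (0.36, 55.0),
              (0.64, 6.0), (0.66, 4.0), (0.62, 5.0)]
train_labels = ["target"] * 3 + ["contaminant"] * 3
tree = build_tree(train_rows, train_labels)
```

Because the tree only compares measured values against thresholds, any per-contig descriptor could be appended to the feature tuples without changing the code, which is the property the abstract emphasizes.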
Affiliation(s)
- Janna L Fierst
- Department of Biological Sciences, University of Alabama, Tuscaloosa, 35487, AL, USA.
- Duncan A Murdock
- Department of Biological Sciences, University of Alabama, Tuscaloosa, 35487, AL, USA
9
Abstract
Microbiome analysis involves determining the composition and function of a community of microorganisms in a particular location. For the gastroenterologist, this technology opens up a rapidly evolving set of challenges and opportunities for generating novel insights into the health of patients on the basis of microbiota characterizations from intestinal, hepatic or extraintestinal samples. Alterations in gut microbiota composition correlate with intestinal and extraintestinal disease and, although only a few mechanisms are known, the microbiota are still an attractive target for developing biomarkers for disease detection and management as well as potential therapeutic applications. In this Review, we summarize the major decision points confronting new entrants to the field or for those designing new projects in microbiome research. We provide recommendations based on current technology options and our experience of sequencing platform choices. We also offer perspectives on future applications of microbiome research, which we hope convey the promise of this technology for clinical applications.
10
Kanj S, Brüls T, Gazut S. Shared Nearest Neighbor Clustering in a Locality Sensitive Hashing Framework. J Comput Biol 2017; 25:236-250. [PMID: 28953425 DOI: 10.1089/cmb.2017.0113] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022] Open
Abstract
We present a new algorithm to cluster high-dimensional sequence data and its application to the field of metagenomics, which aims at reconstructing individual genomes from a mixture of genomes sampled from an environmental site, without any prior knowledge of reference data (genomes) or the shape of clusters. Such problems typically cannot be solved directly with classical approaches seeking to estimate the density of clusters, for example, using the shared nearest neighbors (SNN) rule, due to the prohibitive size of contemporary sequence datasets. We explore here a new approach based on combining the SNN rule with the concept of locality sensitive hashing (LSH). The proposed method, called LSH-SNN, works by randomly splitting the input data into smaller-sized subsets (buckets) and employing the SNN rule on each of these buckets. Links can be created among neighbors sharing a sufficient number of elements, hence allowing clusters to be grown from linked elements. LSH-SNN can scale up to larger datasets consisting of millions of sequences, while achieving high accuracy across a variety of sample sizes and complexities.
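The bucket-then-link procedure described above can be sketched roughly as follows. This is a simplified illustration, not the published LSH-SNN code: the random-hyperplane hash family, the Euclidean neighbour search, the `min_shared` threshold, and every function name are assumptions made here, and the abstract does not say which hash functions the authors use.

```python
import random
from collections import defaultdict
from itertools import combinations

def hyperplane_hash(vec, planes):
    """Bucket key: sign pattern of vec against each hyperplane (assumed hash family)."""
    return tuple(sum(v * p for v, p in zip(vec, plane)) >= 0 for plane in planes)

def knn_sets(ids, data, k):
    """k nearest neighbours (squared Euclidean) of each point, searched within one bucket only."""
    result = {}
    for i in ids:
        others = sorted((j for j in ids if j != i),
                        key=lambda j: sum((a - b) ** 2 for a, b in zip(data[i], data[j])))
        result[i] = set(others[:k])
    return result

def lsh_snn(data, planes=None, n_planes=2, k=3, min_shared=2, seed=0):
    """Hash points into buckets, link pairs sharing >= min_shared neighbours, return cluster labels."""
    if planes is None:
        rng = random.Random(seed)
        planes = [[rng.gauss(0.0, 1.0) for _ in range(len(data[0]))]
                  for _ in range(n_planes)]
    buckets = defaultdict(list)
    for idx, vec in enumerate(data):
        buckets[hyperplane_hash(vec, planes)].append(idx)

    parent = list(range(len(data)))  # union-find forest used to grow clusters from links

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for ids in buckets.values():
        nn = knn_sets(ids, data, k)
        for i, j in combinations(ids, 2):
            if len(nn[i] & nn[j]) >= min_shared:  # simplified shared-nearest-neighbour rule
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(data))]

# Toy 2-D points standing in for sequence feature vectors (hypothetical data).
points = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
          (-1.0, -1.0), (-1.1, -1.0), (-1.0, -1.1), (-1.1, -1.1)]
labels = lsh_snn(points, planes=[[1.0, 0.0]], k=3, min_shared=2)
```

Restricting the neighbour search to one bucket at a time is what keeps the quadratic comparison affordable; the cost is that points split across buckets can never be linked in this single-pass sketch.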
Affiliation(s)
- Sawsan Kanj
- CEA, Genoscope, Evry, France
- CEA, LIST, Laboratoire d'Analyse de Données et Intelligence des Systèmes, Gif-sur-Yvette, France
- Université d'Evry, Evry, France
- CNRS-UMR 8030, Evry, France
- Université Paris-Saclay, Evry, France
- Thomas Brüls
- CEA, Genoscope, Evry, France
- Université d'Evry, Evry, France
- CNRS-UMR 8030, Evry, France
- Université Paris-Saclay, Evry, France
- Stéphane Gazut
- CEA, LIST, Laboratoire d'Analyse de Données et Intelligence des Systèmes, Gif-sur-Yvette, France
11
Interpreting Microbial Biosynthesis in the Genomic Age: Biological and Practical Considerations. Mar Drugs 2017; 15:md15060165. [PMID: 28587290 PMCID: PMC5484115 DOI: 10.3390/md15060165] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 04/28/2017] [Revised: 05/22/2017] [Accepted: 05/31/2017] [Indexed: 02/06/2023] Open
Abstract
Genome mining has become an increasingly powerful, scalable, and economically accessible tool for the study of natural product biosynthesis and drug discovery. However, there remain important biological and practical problems that can complicate or obscure biosynthetic analysis in genomic and metagenomic sequencing projects. Here, we focus on limitations of available technology as well as computational and experimental strategies to overcome them. We review the unique challenges and approaches in the study of symbiotic and uncultured systems, as well as those associated with biosynthetic gene cluster (BGC) assembly and product prediction. Finally, to explore sequencing parameters that affect the recovery and contiguity of large and repetitive BGCs assembled de novo, we simulate Illumina and PacBio sequencing of the Salinispora tropica genome focusing on assembly of the salinilactam (slm) BGC.
12
Discovery of chemoautotrophic symbiosis in the giant shipworm Kuphus polythalamia (Bivalvia: Teredinidae) extends wooden-steps theory. Proc Natl Acad Sci U S A 2017; 114:E3652-E3658. [PMID: 28416684 DOI: 10.1073/pnas.1620470114] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Indexed: 11/18/2022] Open
Abstract
The "wooden-steps" hypothesis [Distel DL, et al. (2000) Nature 403:725-726] proposed that large chemosynthetic mussels found at deep-sea hydrothermal vents descend from much smaller species associated with sunken wood and other organic deposits, and that the endosymbionts of these progenitors made use of hydrogen sulfide from biogenic sources (e.g., decaying wood) rather than from vent fluids. Here, we show that wood has served not only as a stepping stone between habitats but also as a bridge between heterotrophic and chemoautotrophic symbiosis for the giant mud-boring bivalve Kuphus polythalamia. This rare and enigmatic species, which achieves the greatest length of any extant bivalve, is the only described member of the wood-boring bivalve family Teredinidae (shipworms) that burrows in marine sediments rather than wood. We show that K. polythalamia harbors sulfur-oxidizing chemoautotrophic (thioautotrophic) bacteria instead of the cellulolytic symbionts that allow other shipworm species to consume wood as food. The characteristics of its symbionts, its phylogenetic position within Teredinidae, the reduction of its digestive system by comparison with other family members, and the loss of morphological features associated with wood digestion indicate that K. polythalamia is a chemoautotrophic bivalve descended from wood-feeding (xylotrophic) ancestors. This is an example in which a chemoautotrophic endosymbiosis arose by displacement of an ancestral heterotrophic symbiosis and a report of pure culture of a thioautotrophic endosymbiont.
13
Noecker C, McNally CP, Eng A, Borenstein E. High-resolution characterization of the human microbiome. Transl Res 2017; 179:7-23. [PMID: 27513210 PMCID: PMC5164958 DOI: 10.1016/j.trsl.2016.07.012] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 04/29/2016] [Revised: 07/12/2016] [Accepted: 07/15/2016] [Indexed: 12/29/2022]
Abstract
The human microbiome plays an important and increasingly recognized role in human health. Studies of the microbiome typically use targeted sequencing of the 16S rRNA gene, whole metagenome shotgun sequencing, or other meta-omic technologies to characterize the microbiome's composition, activity, and dynamics. Processing, analyzing, and interpreting these data involve numerous computational tools that aim to filter, cluster, annotate, and quantify the obtained data and ultimately provide an accurate and interpretable profile of the microbiome's taxonomy, functional capacity, and behavior. These tools, however, are often limited in resolution and accuracy and may fail to capture many biologically and clinically relevant microbiome features, such as strain-level variation or nuanced functional response to perturbation. Over the past few years, extensive efforts have been invested toward addressing these challenges and developing novel computational methods for accurate and high-resolution characterization of microbiome data. These methods aim to quantify strain-level composition and variation, detect and characterize rare microbiome species, link specific genes to individual taxa, and more accurately characterize the functional capacity and dynamics of the microbiome. These methods and the ability to produce detailed and precise microbiome information are clearly essential for informing microbiome-based personalized therapies. In this review, we survey these methods, highlighting the challenges each method sets out to address and briefly describing methodological approaches.
Affiliation(s)
- Cecilia Noecker
  - Department of Genome Sciences, University of Washington, Seattle, WA
- Colin P McNally
  - Department of Genome Sciences, University of Washington, Seattle, WA
- Alexander Eng
  - Department of Genome Sciences, University of Washington, Seattle, WA
- Elhanan Borenstein
  - Department of Genome Sciences, University of Washington, Seattle, WA
  - Department of Computer Science and Engineering, University of Washington, Seattle, WA
  - Santa Fe Institute, Santa Fe, NM
|
14
|
Sedlar K, Kupkova K, Provaznik I. Bioinformatics strategies for taxonomy independent binning and visualization of sequences in shotgun metagenomics. Comput Struct Biotechnol J 2016; 15:48-55. [PMID: 27980708 PMCID: PMC5148923 DOI: 10.1016/j.csbj.2016.11.005] [Citation(s) in RCA: 70] [Impact Index Per Article: 8.8] [Received: 08/30/2016] [Revised: 11/24/2016] [Accepted: 11/26/2016] [Indexed: 12/11/2022] Open
Abstract
One of the main steps in a study of microbial communities is resolving their composition, diversity and function. In the past, these issues were mostly addressed by amplicon sequencing of a target gene because of its reasonable price and easier computational postprocessing of the bioinformatic data. With the advancement of sequencing techniques, the main focus shifted to whole metagenome shotgun sequencing, which allows much more detailed analysis of the metagenomic data, including reconstruction of novel microbial genomes and insight into the genetic potential and metabolic capacities of whole environments. On the other hand, the output of whole metagenome shotgun sequencing is a mixture of short DNA fragments belonging to various genomes; therefore, this approach requires more sophisticated computational algorithms for clustering of related sequences, commonly referred to as sequence binning. There are currently two types of binning methods: taxonomy dependent and taxonomy independent. The first type classifies the DNA fragments by performing a standard homology inference against a reference database, while the latter performs reference-free binning by applying clustering techniques to features extracted from the sequences. In this review, we describe the strategies within the second approach. Although these strategies do not require prior knowledge, they place higher demands on the length of sequences. Besides their basic principle, an overview of particular methods and tools is provided. Furthermore, the review covers the utilization of the methods in the context of sequence length and discusses the need for metagenomic data preprocessing in the form of initial assembly prior to binning.
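As a rough illustration of the taxonomy-independent idea described in this abstract (features extracted from the sequences, then clustered without a reference database), the sketch below computes normalized dinucleotide frequencies and groups sequences with a plain k-means. The function names, the choice of k-means, the seed indices and the toy sequences are all assumptions made for illustration; they are not taken from the review or from any specific binning tool.

```python
from itertools import product


def kmer_profile(seq, k=2):
    """Normalized k-mer frequencies of seq, in fixed lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, seeds, iterations=10):
    """Plain k-means; seeds are indices of points used as initial centroids."""
    centroids = [list(points[i]) for i in seeds]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy sequences (hypothetical): two AT-rich and two GC-rich fragments.
contigs = [
    "ATATATATTATAATAT",
    "TATATAATATATTATA",
    "GCGCGGCCGCGCGGCG",
    "CGCGCCGGCGCGCCGC",
]
labels = kmeans([kmer_profile(s) for s in contigs], seeds=[0, 2])
```

On real data the profiles would normally be computed on longer sequences or assembled contigs, which is consistent with the abstract's remark that reference-free strategies place higher demands on sequence length.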
Affiliation(s)
- Karel Sedlar
  - Department of Biomedical Engineering, Brno University of Technology, Technicka 12, Brno, Czech Republic
|
15
|
Bouhajja E, Agathos SN, George IF. Metagenomics: Probing pollutant fate in natural and engineered ecosystems. Biotechnol Adv 2016; 34:1413-1426. [PMID: 27825829 DOI: 10.1016/j.biotechadv.2016.10.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 03/14/2016] [Revised: 10/01/2016] [Accepted: 10/12/2016] [Indexed: 12/23/2022]
Abstract
Polluted environments are a reservoir of microbial species able to degrade or to convert pollutants to harmless compounds. The proper management of microbial resources requires a comprehensive characterization of their genetic pool to assess the fate of contaminants and increase the efficiency of bioremediation processes. Metagenomics offers appropriate tools to describe microbial communities in their whole complexity without lab-based cultivation of individual strains. After a decade of use of metagenomics to study microbiomes, the scientific community has made significant progress in this field. In this review, we survey the main steps of metagenomics applied to environments contaminated with organic compounds or heavy metals. We emphasize technical solutions proposed to overcome encountered obstacles. We then compare two metagenomic approaches, i.e. library-based targeted metagenomics and direct sequencing of metagenomes. In the former, environmental DNA is cloned inside a host, and then clones of interest are selected based on (i) their expression of biodegradative functions or (ii) sequence homology with probes and primers designed from relevant, already known sequences. The highest score for the discovery of novel genes and degradation pathways has been achieved so far by functional screening of large clone libraries. On the other hand, direct sequencing of metagenomes without a cloning step has been more often applied to polluted environments for characterization of the taxonomic and functional composition of microbial communities and their dynamics. In this case, the analysis has focused on 16S rRNA genes and marker genes of biodegradation. Advances in next generation sequencing and in bioinformatic analysis of sequencing data have opened up new opportunities for assessing the potential of biodegradation by microbes, but annotation of collected genes is still hampered by a limited number of available reference sequences in databases. 
Although metagenomics is still facing technical and computational challenges, our review of the recent literature highlights its value as an aid to efficiently monitor the clean-up of contaminated environments and develop successful strategies to mitigate the impact of pollutants on ecosystems.
Affiliation(s)
- Emna Bouhajja
  - Laboratoire de Génie Biologique, Earth and Life Institute, Université Catholique de Louvain, Place Croix du Sud 2, boite L7.05.19, 1348 Louvain-la-Neuve, Belgium
- Spiros N Agathos
  - Laboratoire de Génie Biologique, Earth and Life Institute, Université Catholique de Louvain, Place Croix du Sud 2, boite L7.05.19, 1348 Louvain-la-Neuve, Belgium; School of Life Sciences and Biotechnology, Yachay Tech University, 100119 San Miguel de Urcuquí, Ecuador
- Isabelle F George
  - Université Libre de Bruxelles, Laboratoire d'Ecologie des Systèmes Aquatiques, Campus de la Plaine CP 221, Boulevard du Triomphe, 1050 Brussels, Belgium
|
16
|
Comin M, Schimd M. Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values. BMC Med Genomics 2016; 9 Suppl 1:36. [PMID: 27535823 PMCID: PMC4989896 DOI: 10.1186/s12920-016-0193-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022] Open
Abstract
Background: Sequencing technologies are generating enormous amounts of read data; however, assembly of genomes and metagenomes remains among the most challenging tasks. In this paper we study the comparison of genomes and metagenomes based only on read data, using word-count statistics, called alignment-free, that do not require reference genomes or assemblies. Quality scores produced by sequencing platforms are fundamental for various analyses; moreover, future-generation sequencing platforms will produce longer reads but with an error rate around 15%. In this context it will be fundamental to exploit quality value information within the framework of alignment-free measures. Results: In this paper we present a family of alignment-free measures, called dq-type, that are based on k-mer counts and quality values. These statistics can be used to compare genomes and metagenomes based on their read sets. Results show that the evolutionary relationship of genomes can be reconstructed based on the direct comparison of their read sets. Conclusion: The use of quality values on average improves the classification accuracy, and its contribution increases when the reads are noisier. Also, the comparison of metagenomic microbial communities can be performed efficiently. Similar metagenomes are quickly detected, just by processing their read data, without the need for costly alignments.
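To make the idea of combining k-mer counts with quality values concrete, the following sketch weights each k-mer occurrence by the probability that all of its bases were called correctly (derived from Phred scores) and compares two read sets by cosine similarity of the weighted count vectors. This is a simplified stand-in, not the dq-type statistic defined in the paper; the weighting scheme, the cosine comparison and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from math import sqrt


def base_correct_prob(q):
    """Probability that a base call with Phred score q is correct."""
    return 1.0 - 10 ** (-q / 10.0)


def weighted_kmer_counts(reads, k):
    """Quality-weighted k-mer counts for (sequence, phred_scores) pairs."""
    counts = defaultdict(float)
    for seq, quals in reads:
        probs = [base_correct_prob(q) for q in quals]
        for i in range(len(seq) - k + 1):
            weight = 1.0
            for p in probs[i:i + k]:
                weight *= p  # chance that every base in this k-mer is correct
            counts[seq[i:i + k]] += weight
    return counts


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(v * b.get(kmer, 0.0) for kmer, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With this weighting, a k-mer that overlaps low-quality bases contributes less to the count vector than one read at high quality, which is the general intuition behind letting quality values down-weight noisy reads.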
Affiliation(s)
- Matteo Comin
  - Department of Information Engineering, University of Padova, Via Gradenigo 6/A, Padova, Italy
- Michele Schimd
  - Department of Information Engineering, University of Padova, Via Gradenigo 6/A, Padova, Italy
|
17
|
Lin Z, Torres JP, Tianero MD, Kwan JC, Schmidt EW. Origin of Chemical Diversity in Prochloron-Tunicate Symbiosis. Appl Environ Microbiol 2016; 82:3450-60. [PMID: 27037119 PMCID: PMC4959158 DOI: 10.1128/aem.00860-16] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 03/21/2016] [Accepted: 03/23/2016] [Indexed: 12/20/2022] Open
Abstract
Diversity-generating metabolism leads to the evolution of many different chemicals in living organisms. Here, by examining a marine symbiosis, we provide a precise evolutionary model of how nature generates a family of novel chemicals, the cyanobactins. We show that tunicates and their symbiotic Prochloron cyanobacteria share congruent phylogenies, indicating that Prochloron phylogeny is related to host phylogeny and not to external habitat or geography. We observe that Prochloron exchanges discrete functional genetic modules for cyanobactin secondary metabolite biosynthesis in an otherwise conserved genetic background. The module exchange leads to gain or loss of discrete chemical functional groups. Because the underlying enzymes exhibit broad substrate tolerance, discrete exchange of substrates and enzymes between Prochloron strains leads to the rapid generation of chemical novelty. These results have implications in choosing biochemical pathways and enzymes for engineered or combinatorial biosynthesis. IMPORTANCE: While most biosynthetic pathways lead to one or a few products, a subset of pathways are diversity generating and are capable of producing thousands to millions of derivatives. This property is highly useful in biotechnology since it enables biochemical or synthetic biological methods to create desired chemicals. A fundamental question has been how nature itself creates this chemical diversity. Here, by examining the symbiosis between coral reef animals and bacteria, we describe the genetic basis of chemical variation with unprecedented precision. New compounds from the cyanobactin family are created by either varying the substrate or importing needed enzymatic functions from other organisms or via both mechanisms. This natural process matches successful laboratory strategies to engineer the biosynthesis of new chemicals and teaches a new strategy to direct biosynthesis.
Affiliation(s)
- Zhenjian Lin
  - Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah, USA; University of Tennessee and Oak Ridge National Laboratory
- Joshua P Torres
  - Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah, USA; University of Tennessee and Oak Ridge National Laboratory
- M Diarey Tianero
  - Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah, USA; University of Tennessee and Oak Ridge National Laboratory
- Jason C Kwan
  - Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah, USA; University of Tennessee and Oak Ridge National Laboratory
- Eric W Schmidt
  - Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah, USA; University of Tennessee and Oak Ridge National Laboratory
|
18
|
Le VV, Tran LV, Tran HV. A novel semi-supervised algorithm for the taxonomic assignment of metagenomic reads. BMC Bioinformatics 2016; 17:22. [PMID: 26740458 PMCID: PMC4702387 DOI: 10.1186/s12859-015-0872-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/30/2015] [Accepted: 12/22/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Taxonomic assignment is a crucial step in a metagenomic project which aims to identify the origin of sequences in an environmental sample. Among the existing methods, since composition-based algorithms are not sufficient for classifying short reads, recent algorithms use only the feature of similarity, or similarity-based combined features. However, those algorithms are computationally expensive because the similarity search is very time-consuming. Besides, the lack of similarity information between reads and reference sequences, due to the short length of the reads, significantly reduces the classification quality. RESULTS: This paper presents a novel taxonomic assignment algorithm, called SeMeta, which is based on semi-supervised learning to produce a fast and highly accurate classification of short-length reads with sufficient mutual overlap. The proposed algorithm first separates reads into clusters using their composition feature. It then labels the clusters with the support of an efficient filtering technique applied to the results of the similarity search between their reads and reference databases. Furthermore, instead of performing the similarity search for all reads in the clusters, SeMeta does so only for reads in their subgroups, by utilizing the information of sequence overlapping. The experimental results demonstrate that SeMeta outperforms two other similarity-based algorithms in different aspects. CONCLUSIONS: By using a semi-supervised method as well as taking advantage of various features, the proposed algorithm is able not only to achieve high classification quality, but also to greatly reduce computational cost. The source code of the algorithm can be downloaded at http://it.hcmute.edu.vn/bioinfo/metapro/SeMeta.html.
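The cluster-then-label pattern described here can be sketched as follows: reads are assumed to be already grouped into clusters, only a subset of reads has a similarity-search hit, and each cluster inherits the taxon that wins a majority vote among its searched reads. The abstract does not spell out SeMeta's filtering technique, so the majority vote with a minimum-support threshold below is an assumed stand-in, and the function and parameter names are hypothetical.

```python
from collections import Counter


def label_clusters(cluster_of, hits, min_support=0.5):
    """Assign one taxon per cluster from the hits of its searched reads.

    cluster_of: read id -> cluster id (every read)
    hits:       read id -> taxon (only the reads that were searched)
    Returns cluster id -> taxon, or None when no taxon reaches min_support.
    """
    votes = {}
    for read, taxon in hits.items():
        votes.setdefault(cluster_of[read], Counter())[taxon] += 1

    labels = {}
    for cluster in set(cluster_of.values()):
        tally = votes.get(cluster)
        if not tally:
            labels[cluster] = None  # no searched read landed in this cluster
            continue
        taxon, count = tally.most_common(1)[0]
        share = count / sum(tally.values())
        labels[cluster] = taxon if share >= min_support else None
    return labels


def assign_reads(cluster_of, cluster_labels):
    """Propagate each cluster's label to all of its reads."""
    return {read: cluster_labels[c] for read, c in cluster_of.items()}
```

The saving comes from the fact that reads without a hit of their own (or never searched at all) still receive a label through their cluster, so the expensive similarity search only has to cover a fraction of the reads.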
Affiliation(s)
- Vinh Van Le
  - Faculty of Computer Science and Engineering, HCMC University of Technology, 268 Ly Thuong Kiet, Q10, HCM City, Vietnam
  - Faculty of Information Technology, HCMC University of Technology and Education, 1 Vo Van Ngan, Thu Duc, HCM City, Vietnam
- Lang Van Tran
  - Institute of Applied Mechanics and Informatics, Vietnam Academy of Science and Technology, 01 Mac Dinh Chi, Q1, HCM City, Vietnam
  - Faculty of Information Technology, Lac Hong University, 10 Huynh Van Nghe, Bien Hoa, Dong Nai, Vietnam
- Hoai Van Tran
  - Faculty of Computer Science and Engineering, HCMC University of Technology, 268 Ly Thuong Kiet, Q10, HCM City, Vietnam
|
19
|
Escobar-Zepeda A, Vera-Ponce de León A, Sanchez-Flores A. The Road to Metagenomics: From Microbiology to DNA Sequencing Technologies and Bioinformatics. Front Genet 2015; 6:348. [PMID: 26734060 PMCID: PMC4681832 DOI: 10.3389/fgene.2015.00348] [Citation(s) in RCA: 145] [Impact Index Per Article: 16.1] [Received: 06/05/2015] [Accepted: 11/27/2015] [Indexed: 12/17/2022] Open
Abstract
The study of microorganisms that pervade each and every part of this planet has encountered many challenges through time such as the discovery of unknown organisms and the understanding of how they interact with their environment. The aim of this review is to take the reader along the timeline and major milestones that led us to modern metagenomics. This new and thriving area is likely to be an important contributor to solve different problems. The transition from classical microbiology to modern metagenomics studies has required the development of new branches of knowledge and specialization. Here, we will review how the availability of high-throughput sequencing technologies has transformed microbiology and bioinformatics and how to tackle the inherent computational challenges that arise from the DNA sequencing revolution. New computational methods are constantly developed to collect, process, and extract useful biological information from a variety of samples and complex datasets, but metagenomics needs the integration of several of these computational methods. Despite the level of specialization needed in bioinformatics, it is important that life-scientists have a good understanding of it for a correct experimental design, which allows them to reveal the information in a metagenome.
Affiliation(s)
- Alejandra Escobar-Zepeda
  - Unidad de Secuenciación Masiva y Bioinformática, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
- Arturo Vera-Ponce de León
  - Programa de Ecología Genómica, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Alejandro Sanchez-Flores
  - Unidad de Secuenciación Masiva y Bioinformática, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
|
20
|
Oulas A, Pavloudi C, Polymenakou P, Pavlopoulos GA, Papanikolaou N, Kotoulas G, Arvanitidis C, Iliopoulos I. Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies. Bioinform Biol Insights 2015; 9:75-88. [PMID: 25983555 PMCID: PMC4426941 DOI: 10.4137/bbi.s12462] [Citation(s) in RCA: 177] [Impact Index Per Article: 19.7] [Received: 12/05/2014] [Revised: 03/09/2015] [Accepted: 03/13/2015] [Indexed: 12/14/2022] Open
Abstract
Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of "metagenomics", often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.
Affiliation(s)
- Anastasis Oulas
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Christina Pavloudi
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
  - Department of Biology, University of Ghent, Ghent, Belgium
  - Department of Microbial Ecophysiology, University of Bremen, Bremen, Germany
- Paraskevi Polymenakou
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Georgios A Pavlopoulos
  - Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
- Nikolas Papanikolaou
  - Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
- Georgios Kotoulas
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Christos Arvanitidis
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Ioannis Iliopoulos
  - Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
|
21
|
Zhang R, Cheng Z, Guan J, Zhou S. Exploiting topic modeling to boost metagenomic reads binning. BMC Bioinformatics 2015; 16 Suppl 5:S2. [PMID: 25859745 PMCID: PMC4402587 DOI: 10.1186/1471-2105-16-s5-s2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: With the rapid development of high-throughput technologies, researchers can sequence the whole metagenome of a microbial community sampled directly from the environment. The assignment of these metagenomic reads into different species or taxonomical classes is a vital step for metagenomic analysis, which is referred to as binning of metagenomic data. RESULTS: In this paper, we propose a new method, TM-MCluster, for binning metagenomic reads. First, we represent each metagenomic read as a set of "k-mers" with their frequencies occurring in the read. Then, we apply a probabilistic topic model, the Latent Dirichlet Allocation (LDA) model, to the reads, which generates a number of hidden "topics" such that each read can be represented by a distribution vector of the generated topics. Finally, as in the MCluster method, we apply SKWIC, a variant of the classical K-means algorithm with an automatic feature weighting mechanism, to cluster these reads represented by topic distributions. CONCLUSIONS: Experiments show that the new method TM-MCluster outperforms major existing methods, including AbundanceBin, MetaCluster 3.0/5.0 and MCluster. This result indicates that the exploitation of topic modeling can effectively improve the binning performance of metagenomic reads.
|
22
|
Vinh LV, Lang TV, Binh LT, Hoai TV. A two-phase binning algorithm using l-mer frequency on groups of non-overlapping reads. Algorithms Mol Biol 2015; 10:2. [PMID: 25648210 PMCID: PMC4304631 DOI: 10.1186/s13015-014-0030-4] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 07/10/2014] [Accepted: 10/20/2014] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND: Metagenomics is the study of genetic materials derived directly from complex microbial samples, instead of from culture. One of the crucial steps in metagenomic analysis, referred to as "binning", is to separate reads into clusters that represent genomes from closely related organisms. Among the existing binning methods, unsupervised methods base the classification on features extracted from reads, which is especially advantageous when the availability of reference databases is limited. However, their performance, under various aspects, is still being investigated by recent theoretical and empirical studies. The one addressed in this paper is among those efforts to enhance the accuracy of the classification. RESULTS: This paper presents an unsupervised algorithm, called BiMeta, for binning of reads from different species in a metagenomic dataset. The algorithm consists of two phases. In the first phase of the algorithm, reads are grouped based on overlap information between the reads. The second phase merges the groups by using an observation on the l-mer frequency distribution of sets of non-overlapping reads. The experimental results on simulated and real datasets showed that BiMeta outperforms three state-of-the-art binning algorithms for both short and long reads (≥700 bp) datasets. CONCLUSIONS: This paper developed a novel and efficient algorithm for binning of metagenomic reads, which does not require any reference database. The software implementing the algorithm and all test datasets mentioned in this paper can be downloaded at http://it.hcmute.edu.vn/bioinfo/bimeta/index.htm.
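A minimal sketch of what an overlap-based first phase could look like is given below: two reads are linked when they share at least a given number of distinct l-mers, and linked reads are merged into groups with a union-find structure. The linking rule, the parameters `l` and `min_shared`, and the function names are assumptions for illustration; the abstract does not give BiMeta's actual grouping criterion, and the second phase (merging groups by l-mer frequency) is not shown.

```python
from collections import defaultdict
from itertools import combinations


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_by_shared_lmers(reads, l=5, min_shared=2):
    """Group read indices whose sequences share >= min_shared distinct l-mers."""
    # Index: l-mer -> set of reads containing it.
    index = defaultdict(set)
    for rid, seq in enumerate(reads):
        for i in range(len(seq) - l + 1):
            index[seq[i:i + l]].add(rid)

    # Count distinct shared l-mers for every pair of reads that co-occur.
    shared = defaultdict(int)
    for rids in index.values():
        for a, b in combinations(sorted(rids), 2):
            shared[(a, b)] += 1

    # Merge pairs that pass the threshold.
    parent = list(range(len(reads)))
    for (a, b), n in shared.items():
        if n >= min_shared:
            parent[find(parent, a)] = find(parent, b)

    groups = defaultdict(list)
    for rid in range(len(reads)):
        groups[find(parent, rid)].append(rid)
    return sorted(groups.values())
```

Enumerating all pairs per l-mer is quadratic for very frequent l-mers, so a practical implementation would need to cap or filter those; the sketch ignores this for brevity.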
Affiliation(s)
- Le Van Vinh
  - Faculty of Computer Science and Engineering, HCMC University of Technology, 268 Ly Thuong Kiet, Q10, Ho Chi Minh City, Vietnam
- Tran Van Lang
  - Institute of Applied Mechanics and Informatics, Vietnam Academy of Science and Technology (VAST), 01 Mac Dinh Chi, Q1, Ho Chi Minh City, Vietnam
  - Faculty of Information Technology, Lac Hong University, 10 Huynh Van Nghe, Bien Hoa, Dong Nai, Vietnam
- Le Thanh Binh
  - Institute of Biotechnology, Vietnam Academy of Science and Technology (VAST), 18 Hoang Quoc Viet, Cau Giay, Ha Noi, Vietnam
- Tran Van Hoai
  - Faculty of Computer Science and Engineering, HCMC University of Technology, 268 Ly Thuong Kiet, Q10, Ho Chi Minh City, Vietnam
|
23
|
Abram F. Systems-based approaches to unravel multi-species microbial community functioning. Comput Struct Biotechnol J 2014; 13:24-32. [PMID: 25750697 PMCID: PMC4348430 DOI: 10.1016/j.csbj.2014.11.009] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 09/30/2014] [Revised: 11/25/2014] [Accepted: 11/26/2014] [Indexed: 01/24/2023] Open
Abstract
Some of the most transformative discoveries promising to enable the resolution of this century's grand societal challenges will most likely arise from environmental science and particularly environmental microbiology and biotechnology. Understanding how microbes interact in situ, and how microbial communities respond to environmental changes remains an enormous challenge for science. Systems biology offers a powerful experimental strategy to tackle the exciting task of deciphering microbial interactions. In this framework, entire microbial communities are considered as metaorganisms and each level of biological information (DNA, RNA, proteins and metabolites) is investigated along with in situ environmental characteristics. In this way, systems biology can help unravel the interactions between the different parts of an ecosystem ultimately responsible for its emergent properties. Indeed each level of biological information provides a different level of characterisation of the microbial communities. Metagenomics, metatranscriptomics, metaproteomics, metabolomics and SIP-omics can be employed to investigate collectively microbial community structure, potential, function, activity and interactions. Omics approaches are enabled by high-throughput 21st century technologies and this review will discuss how their implementation has revolutionised our understanding of microbial communities.
Affiliation(s)
- Florence Abram
  - Functional Environmental Microbiology, School of Natural Sciences, National University of Ireland Galway, University Road, Galway, Ireland
|
24
|
Ladoukakis E, Kolisis FN, Chatziioannou AA. Integrative workflows for metagenomic analysis. Front Cell Dev Biol 2014; 2:70. [PMID: 25478562 PMCID: PMC4237130 DOI: 10.3389/fcell.2014.00070] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 09/01/2014] [Accepted: 11/05/2014] [Indexed: 01/22/2023] Open
Abstract
The rapid evolution of all sequencing technologies, described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. These technologies combine high-throughput analytical protocols with delicate measuring techniques in order to potentially discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields for only a fraction of the cost of traditional processes (i.e., Sanger). From a bioinformatic perspective, this boils down to many GB of data being generated from each single sequencing experiment, rendering the management, or even the storage, of the data critical bottlenecks for the overall analytical endeavor. The enormous complexity is further aggravated by the versatility of the available processing steps, represented by the numerous bioinformatic tools that are essential for each analytical task in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, but nonetheless non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, which require a high level of expertise for their proper application or for the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale requires grand computational resources, making the use of cloud computing infrastructures the sole realistic solution. In this review article we discuss the different integrative bioinformatic solutions available that address the aforementioned issues, by performing a critical assessment of the available automated pipelines for data management, quality control, and annotation of metagenomic data, embracing various major sequencing technologies and applications.
Affiliation(s)
- Efthymios Ladoukakis
  - Laboratory of Biotechnology, Department of Chemical Engineering, School of Chemical Engineering, National Technical University of Athens, Athens, Greece
- Fragiskos N Kolisis
  - Laboratory of Biotechnology, Department of Chemical Engineering, School of Chemical Engineering, National Technical University of Athens, Athens, Greece
- Aristotelis A Chatziioannou
  - Metabolic Engineering and Bioinformatics Program, Institute of Biology, Medicinal Chemistry and Biotechnology, National Hellenic Research Foundation, Athens, Greece
|
25
|
Magasin JD, Gerloff DL. Pooled assembly of marine metagenomic datasets: enriching annotation through chimerism. Bioinformatics 2014; 31:311-7. [PMID: 25306399 DOI: 10.1093/bioinformatics/btu546] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Despite advances in high-throughput sequencing, marine metagenomic samples remain largely opaque. A typical sample contains billions of microbial organisms from thousands of genomes and quadrillions of DNA base pairs. Its derived metagenomic dataset underrepresents this complexity by orders of magnitude because of the sparseness and shortness of sequencing reads. Read shortness and sequencing errors pose a major challenge to accurate species and functional annotation. This includes distinguishing known from novel species. Often the majority of reads cannot be annotated and thus cannot help our interpretation of the sample. RESULTS: Here, we demonstrate quantitatively how careful assembly of marine metagenomic reads within, but also across, datasets can alleviate this problem. For 10 simulated datasets, each with species complexity modeled on a real counterpart, chimerism remained within the same species for most contigs (97%). For 42 real pyrosequencing ('454') datasets, assembly increased the proportion of annotated reads, and even more so when datasets were pooled, by on average 1.6% (max 6.6%) for species, 9.0% (max 28.7%) for Pfam protein domains and 9.4% (max 22.9%) for PANTHER gene families. Our results outline exciting prospects for data sharing in the metagenomics community. While chimeric sequences should be avoided in other areas of metagenomics (e.g. biodiversity analyses), conservative pooled assembly is advantageous for annotation specificity and sensitivity. Intriguingly, our experiment also found potential prospects for (low-cost) discovery of new species in 'old' data. CONTACT: dgerloff@ffame.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jonathan D Magasin
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064 and Foundation for Applied Molecular Evolution (FfAME), Gainesville, FL 32604, USA
- Dietlind L Gerloff
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064 and Foundation for Applied Molecular Evolution (FfAME), Gainesville, FL 32604, USA
|