1
Singh J, Teotia S, Singh AK, Arya M, Rout AK, Behera BK, Majumder S. Whole genome sequence analysis of shallot virus X from India reveals it to be a natural recombinant with positive selection pressure. BMC Genom Data 2024; 25:42. [PMID: 38711021 DOI: 10.1186/s12863-024-01196-z] [Received: 09/01/2023] [Accepted: 01/23/2024] [Indexed: 05/08/2024]
Abstract
BACKGROUND Shallots are infected by various viruses, such as Onion yellow dwarf virus (OYDV), Leek yellow stripe virus (LYSV), Shallot latent virus (SLV) and Shallot virus X (ShVX). In India, shallots have been found to be persistently infected by ShVX. ShVX also infects onion and garlic in combination with other carlaviruses and potyviruses. ShVX is a member of the genus Allexivirus in the family Alphaflexiviridae and has a monopartite, positive-sense single-stranded RNA genome. Globally, only six complete and three nearly complete genome sequences of ShVX have been reported to date, which is insufficient to measure the true molecular diversity of the taxon. Moreover, no complete genome sequence of ShVX from Asia has yet been reported. This study was therefore undertaken to generate a complete genome sequence of ShVX from India.
RESULTS ShVX is one of the significant threats to Allium crop production. In this study, we report the first complete genome sequence of ShVX from India, obtained through next-generation sequencing (NGS). The complete genome of the ShVX isolate from this study (Accession No. OK104171) comprises 8911 nucleotides. In silico analysis of the sequence revealed variability between this isolate and isolates from other countries. The dissimilarities are spread across the genome, particularly in some non-coding intergenic regions. Statistical analysis of individual genes for site-specific selection indicates positive selection in the NABP region. A recombination event was detected in the coat protein region. The sequence similarity percentage and phylogenetic analysis indicate that the Indian ShVX isolate is distinctly different from other isolates. Recombination and site-specific selection may have played a role in the evolution of this isolate. This is the first detailed study of the ShVX complete genome sequence from Southeast Asia.
CONCLUSION This study presents the first report of the entire genome sequence of an Indian isolate of ShVX along with an in-depth exploration of its evolutionary traits. The findings highlight the Indian variant as a naturally occurring recombinant, emphasizing the substantial role of recombination in the evolution of this viral species. This insight into the molecular diversity of strains within a specific geographical region holds immense significance for comprehending and forecasting potential epidemics. Consequently, the insights garnered from this research hold practical value for shaping ShVX management strategies and providing a foundation for forthcoming studies delving into its evolutionary trajectory.
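The "sequence similarity percentage" mentioned above is commonly reported as percent identity over a pairwise alignment. A minimal Python sketch, assuming two sequences that have already been aligned to equal length with "-" as the gap character; the function name `percent_identity` and the choice to skip gap-only columns are illustrative assumptions, not the authors' method (tools differ in which columns they count in the denominator):

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of identical residues over the aligned columns.

    Columns where both sequences have a gap ("-") are skipped; every other
    column counts toward the denominator, including single-sided gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # gap in both sequences: not informative
        columns += 1
        if a == b:  # both are non-gap here, since gap-gap columns were skipped
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no informative columns")
    return 100.0 * matches / columns


# Toy aligned fragments (made up, not ShVX sequence data)
print(round(percent_identity("ATGC-TA", "ATGCGTT"), 2))  # → 71.43
```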
Affiliation(s)
- Jyoti Singh
- Department of Biotechnology, Sharda University, Greater Noida, India
- Sachin Teotia
- Department of Biotechnology, Sharda University, Greater Noida, India
- Ajay Kumar Singh
- Department of Bioinformatics, Central University of South Bihar, Gaya, Bihar, India
- Meenakshi Arya
- Rani Lakshmi Bai Central Agricultural University, 284003, Jhansi, Uttar Pradesh, India
- Ajaya Kumar Rout
- Rani Lakshmi Bai Central Agricultural University, 284003, Jhansi, Uttar Pradesh, India
- Bijay Kumar Behera
- Rani Lakshmi Bai Central Agricultural University, 284003, Jhansi, Uttar Pradesh, India
- Shahana Majumder
- Department of Botany, School of Life Sciences, Mahatma Gandhi Central University, Motihari, Bihar, India
2
Ojala T, Häkkinen AE, Kankuri E, Kankainen M. Current concepts, advances, and challenges in deciphering the human microbiota with metatranscriptomics. Trends Genet 2023; 39:686-702. [PMID: 37365103 DOI: 10.1016/j.tig.2023.05.004] [Received: 12/02/2022] [Revised: 05/24/2023] [Accepted: 05/25/2023] [Indexed: 06/28/2023]
Abstract
Metatranscriptomics refers to the analysis of the collective microbial transcriptome of a sample. Its increased use for characterizing human-associated microbial communities has enabled the discovery of many disease-state-related microbial activities. Here, we review the principles of metatranscriptomics-based analysis of human-associated microbial samples. We describe the strengths and weaknesses of popular sample preparation, sequencing, and bioinformatics approaches and summarize strategies for their use. We then discuss how human-associated microbial communities have recently been examined and how their characterization may change. We conclude that metatranscriptomic insights into human microbiotas in health and disease have not only expanded our knowledge of human health but also opened avenues for rational antimicrobial drug use and disease management.
Affiliation(s)
- Teija Ojala
- Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Esko Kankuri
- Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Matti Kankainen
- Hematology Research Unit, University of Helsinki, Helsinki, Finland; Laboratory of Genetics, HUS Diagnostic Center, Hospital District of Helsinki and Uusimaa (HUS), Helsinki, Finland
3
Couvillion SP, Mostoller KE, Williams JE, Pace RM, Stohel IL, Peterson HK, Nicora CD, Nakayasu ES, Webb-Robertson BJM, McGuire MA, McGuire MK, Metz TO. Interrogating the role of the milk microbiome in mastitis in the multi-omics era. Front Microbiol 2023; 14:1105675. [PMID: 36819069 PMCID: PMC9932517 DOI: 10.3389/fmicb.2023.1105675] [Received: 11/22/2022] [Accepted: 01/16/2023] [Indexed: 02/05/2023]
Abstract
There is growing interest in a functional understanding of milk-associated microbiota, as there is ample evidence that host-associated microbial communities play an active role in host health and phenotype. Mastitis, characterized by painful inflammation of the mammary gland, is prevalent among lactating humans and agricultural animals and is associated with significant clinical and economic consequences. The etiology of mastitis is complex and polymicrobial, and correlative studies have indicated alterations in milk microbial community composition. Recent evidence is beginning to suggest that a causal relationship may exist between the milk microbiota and host phenotype in mastitis. Multi-omic approaches can be leveraged to gain a mechanistic, molecular-level understanding of how the milk microbiome might modulate host physiology, thereby informing strategies to prevent and ameliorate mastitis. In this paper, we review existing studies that have used omics approaches to investigate the role of the milk microbiome in mastitis. We also summarize the strengths and challenges associated with the different omics techniques, including metagenomics, metatranscriptomics, metaproteomics, metabolomics and lipidomics, and provide perspective on the integration of multiple omics technologies for a better functional understanding of the milk microbiome.
Affiliation(s)
- Sneha P. Couvillion
- Pacific Northwest National Laboratory, Earth and Biological Sciences Directorate, Richland, WA, United States (corresponding author)
- Katie E. Mostoller
- Pacific Northwest National Laboratory, Earth and Biological Sciences Directorate, Richland, WA, United States
- Janet E. Williams
- Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, ID, United States
- Ryan M. Pace
- Margaret Ritchie School of Family and Consumer Sciences, University of Idaho, Moscow, ID, United States
- Izabel L. Stohel
- Pacific Northwest National Laboratory, Earth and Biological Sciences Directorate, Richland, WA, United States
- Haley K. Peterson
- Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, ID, United States
- Carrie D. Nicora
- Pacific Northwest National Laboratory, Earth and Biological Sciences Directorate, Richland, WA, United States
- Ernesto S. Nakayasu
- Pacific Northwest National Laboratory, Earth and Biological Sciences Directorate, Richland, WA, United States
- Bobbie-Jo M. Webb-Robertson
- Pacific Northwest National Laboratory, Earth and Biological Sciences Directorate, Richland, WA, United States
- Mark A. McGuire
- Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, ID, United States
- Michelle K. McGuire
- Margaret Ritchie School of Family and Consumer Sciences, University of Idaho, Moscow, ID, United States
- Thomas O. Metz
- Pacific Northwest National Laboratory, Earth and Biological Sciences Directorate, Richland, WA, United States (corresponding author)
4
Chen JW, Shrestha L, Green G, Leier A, Marquez-Lago TT. The hitchhikers' guide to RNA sequencing and functional analysis. Brief Bioinform 2023; 24:bbac529. [PMID: 36617463 PMCID: PMC9851315 DOI: 10.1093/bib/bbac529] [Received: 07/11/2022] [Revised: 10/18/2022] [Accepted: 11/07/2022] [Indexed: 01/10/2023]
Abstract
DNA and RNA sequencing technologies have revolutionized biology and the biomedical sciences, sequencing full genomes and transcriptomes at very high speed and reasonably low cost. RNA sequencing (RNA-Seq) enables transcript identification and quantification, but once sequencing has concluded, researchers can easily be overwhelmed by questions such as how to go from raw data to differential expression (DE), pathway analysis and interpretation. Several pipelines and procedures have been developed for this purpose. Although there is no unique way to perform RNA-Seq analysis, it usually follows these steps: 1) quality check of raw reads, 2) alignment of reads to a reference genome, 3) summarization of aligned reads according to an annotation file, 4) DE analysis and 5) gene set analysis and/or functional enrichment analysis. Each step requires researchers to make decisions, and the wide variety of options and the resulting large volumes of data often lead to interpretation challenges. There also seems to be insufficient guidance on how best to obtain relevant information and derive actionable knowledge from transcription experiments. In this paper, we explain the RNA-Seq steps in detail and outline the differences and similarities among popular options, as well as their advantages and disadvantages. We also discuss non-coding RNA analysis, multi-omics, meta-transcriptomics and the use of artificial intelligence methods that complement the arsenal of tools available to researchers. Lastly, we perform a complete analysis from raw reads to DE and functional enrichment analysis, visually illustrating how results are not absolute truths and how algorithmic decisions can greatly impact results and interpretation.
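Steps 3 and 4 above start from a gene-by-sample table of read counts. As a toy illustration only, and not a substitute for dedicated DE tools (which model count dispersion and test significance), the Python sketch below normalizes hypothetical summarized counts to counts per million (CPM) and computes a log2 fold change with a pseudocount. The gene names, counts, the pseudocount of 1 and the helper names `cpm` and `log2_fold_change` are all made-up assumptions:

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts to counts per million (CPM) of the sample total."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(treated: dict[str, int], control: dict[str, int],
                     pseudocount: float = 1.0) -> dict[str, float]:
    """Per-gene log2((CPM_treated + pseudocount) / (CPM_control + pseudocount))."""
    t, c = cpm(treated), cpm(control)
    return {gene: math.log2((t[gene] + pseudocount) / (c[gene] + pseudocount))
            for gene in t}


# Hypothetical step-3 output: summarized read counts per gene, one sample per condition
control = {"geneA": 500, "geneB": 250, "geneC": 250}
treated = {"geneA": 2000, "geneB": 250, "geneC": 250}

for gene, lfc in log2_fold_change(treated, control).items():
    print(gene, round(lfc, 2))
```

Here geneB and geneC come out at about -1.32 even though their raw counts are identical in both samples: the large increase in geneA inflates the treated library total and shrinks every other gene's CPM. This compositional effect is one reason dedicated DE tools use more robust normalization, such as TMM in edgeR or the median-of-ratios method in DESeq2, and it illustrates how an algorithmic decision can change the results.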
Affiliation(s)
- Jiung-Wen Chen
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Lisa Shrestha
- Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- George Green
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- André Leier
- Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Tatiana T Marquez-Lago
- Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Department of Microbiology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
5
Shafranskaya D, Kale V, Finn R, Lapidus AL, Korobeynikov A, Prjibelski AD. MetaGT: A pipeline for de novo assembly of metatranscriptomes with the aid of metagenomic data. Front Microbiol 2022; 13:981458. [PMID: 36386613 PMCID: PMC9651917 DOI: 10.3389/fmicb.2022.981458] [Received: 06/29/2022] [Accepted: 09/29/2022] [Indexed: 11/25/2022]
Abstract
While metagenome sequencing may provide insights into the genome sequences and composition of microbial communities, metatranscriptome analysis can be useful for studying the functional activity of a microbiome. RNA-Seq data make it possible to determine which genes in the community are active and how their expression levels depend on external conditions. Although the field of metatranscriptomics is relatively young, the number of projects involving metatranscriptome analysis increases every year and the scope of its applications keeps expanding. However, several problems complicate metatranscriptome analysis: the complexity of microbial communities, the wide dynamic range of transcript expression and, importantly, the lack of high-quality computational methods for assembling meta-RNA sequencing data. These factors reduce the contiguity and completeness of metatranscriptome assemblies, thereby affecting downstream analysis. Here we present MetaGT, a pipeline for de novo assembly of metatranscriptomes that combines metatranscriptomic and metagenomic data sequenced from the same sample. MetaGT assembles metatranscriptomic contigs and fills in missing regions based on their alignments to the metagenome assembly. This approach makes it possible to overcome the complexities described above, obtain complete RNA sequences and, additionally, estimate their abundances. Using various publicly available real and simulated datasets, we demonstrate that MetaGT significantly improves the coverage and completeness of metatranscriptome assemblies compared with existing methods that do not exploit metagenomic data. The pipeline is implemented in NextFlow and is freely available from https://github.com/ablab/metaGT.
Affiliation(s)
- Daria Shafranskaya
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Varsha Kale
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Rob Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Alla L. Lapidus
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Anton Korobeynikov
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Andrey D. Prjibelski
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Corresponding author: Andrey D. Prjibelski
6
Deng M, Liao CQ, Chen Q, Huang GH, Wang X. Phylogenetic relationships among Bombycinae (Lepidoptera, Bombycoidea, and Bombycidae) based on mitochondrial genomes. Arch Insect Biochem Physiol 2022; 111:e21889. [PMID: 35349185 DOI: 10.1002/arch.21889] [Received: 12/30/2021] [Revised: 03/09/2022] [Accepted: 03/10/2022] [Indexed: 06/14/2023]
Abstract
The subfamily Bombycinae Latreille, [1802] is an important silk-producing group that includes well-known economically important insects. Although there are many studies on the development of these insects, the relationships among genera and species of this subfamily remain unclear. Two mitochondrial genome data sets, 13 protein-coding genes (13PCGs) and 13PCGs-AA, were used to estimate phylogenetic relationships with maximum likelihood and Bayesian inference methods. The results strongly support the subfamily Bombycinae as a monophyletic group divided into two clades.
Affiliation(s)
- Min Deng
- College of Plant Protection, Hunan Agricultural University, Changsha, Hunan, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, Hunan, China
- Cheng-Qing Liao
- College of Plant Protection, Hunan Agricultural University, Changsha, Hunan, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, Hunan, China
- Qi Chen
- College of Plant Protection, Hunan Agricultural University, Changsha, Hunan, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, Hunan, China
- Guo-Hua Huang
- College of Plant Protection, Hunan Agricultural University, Changsha, Hunan, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, Hunan, China
- Xing Wang
- College of Science, Qiongtai Normal University, Haikou, Hainan, China
- College of Plant Protection, Hunan Agricultural University, Changsha, Hunan, China
7
Lopez MLD, Lin YY, Sato M, Hsieh CH, Shiah FK, Machida RJ. Using metatranscriptomics to estimate the diversity and composition of zooplankton communities. Mol Ecol Resour 2021; 22:638-652. [PMID: 34555254 PMCID: PMC9293175 DOI: 10.1111/1755-0998.13506] [Received: 11/23/2020] [Revised: 08/18/2021] [Accepted: 09/09/2021] [Indexed: 01/04/2023]
Abstract
DNA metabarcoding is a rapid, high‐resolution tool used for biomonitoring complex zooplankton communities. However, diversity estimates derived with this approach can be biased by the co‐detection of sequences from environmental DNA (eDNA), nuclear‐encoded mitochondrial (NUMT) pseudogene contamination, and taxon‐specific PCR primer affinity differences. To avoid these methodological uncertainties, we tested the use of metatranscriptomics as an alternative approach for characterizing zooplankton communities. Specifically, we compared metatranscriptomics with PCR‐based methods using genomic (gDNA) and complementary DNA (cDNA) amplicons, and morphology‐based data for estimating species diversity and composition for both mock communities and field‐collected samples. Mock community analyses showed that the use of gDNA mitochondrial cytochrome c oxidase I (mtCO1) amplicons inflates species richness due to the co‐detection of extra‐organismal eDNA. Significantly more amplicon sequence variants, nucleotide diversity, and indels were observed with gDNA amplicons than with cDNA, indicating the presence of putative NUMT pseudogenes. Moreover, PCR‐based methods failed to detect the most abundant species in mock communities due to priming site mismatch. Overall, metatranscriptomics provided estimates of species richness and composition that closely resembled those derived from morphological data. The use of metatranscriptomics was further tested using field‐collected samples, with the results showing consistent species diversity estimates among biological and technical replicates. Additionally, temporal zooplankton species composition changes could be monitored using different mitochondrial markers. These findings demonstrate the advantages of metatranscriptomics as an effective tool for monitoring diversity in zooplankton research.
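The comparison above rests on estimating species richness and composition from reads assigned to taxa. As a minimal Python sketch, assuming a hypothetical count table of reads already assigned to species (placeholder names, made-up numbers), observed richness and the Shannon index H' = -Σ p_i ln p_i can be computed as below. The Shannon index is one common diversity summary and is not claimed to be the metric used in this study; the helper names are illustrative:

```python
import math


def observed_richness(counts: dict[str, int]) -> int:
    """Number of taxa with at least one assigned read."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_index(counts: dict[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical read counts assigned to three species in one sample
sample = {"species_1": 50, "species_2": 50, "species_3": 0}
print(observed_richness(sample), round(shannon_index(sample), 4))  # → 2 0.6931
```

Taxa with zero reads are excluded from both measures, so a spurious detection (for example from co-detected eDNA, as described above) would raise richness directly.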
Affiliation(s)
- Mark Louie D Lopez
- Biodiversity Program, Taiwan International Graduate Program, Academia Sinica and National Taiwan Normal University, Taipei, Taiwan; Department of Life Science, National Taiwan Normal University, Taipei, Taiwan
- Ya-Ying Lin
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Mitsuhide Sato
- Department of Environment and Fisheries Resources, Nagasaki University, Nagasaki, Japan
- Chih-Hao Hsieh
- Institute of Oceanography, National Taiwan University, Taipei, Taiwan; Environmental Change Research Center, Academia Sinica, Taipei, Taiwan
- Fuh-Kwo Shiah
- Environmental Change Research Center, Academia Sinica, Taipei, Taiwan
- Ryuji J Machida
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
8
Zhang Y, Thompson KN, Branck T, Yan Yan, Nguyen LH, Franzosa EA, Huttenhower C. Metatranscriptomics for the Human Microbiome and Microbial Community Functional Profiling. Annu Rev Biomed Data Sci 2021; 4:279-311. [PMID: 34465175 DOI: 10.1146/annurev-biodatasci-031121-103035] [Indexed: 11/09/2022]
Abstract
Shotgun metatranscriptomics (MTX) is an increasingly practical way to survey microbial community gene function and regulation at scale. This review begins by summarizing the motivations for community transcriptomics and the history of the field. We then explore the principles, best practices, and challenges of contemporary MTX workflows: beginning with laboratory methods for isolation and sequencing of community RNA, followed by informatics methods for quantifying RNA features, and finally statistical methods for detecting differential expression in a community context. In the second half of the review, we survey important biological findings from the MTX literature, drawing examples from the human microbiome, other (nonhuman) host-associated microbiomes, and the environment. Across these examples, MTX methods prove invaluable for probing microbe-microbe and host-microbe interactions, the dynamics of energy harvest and chemical cycling, and responses to environmental stresses. We conclude with a review of open challenges in the MTX field, including making assays and analyses more robust, accessible, and adaptable to new technologies; deciphering roles for millions of uncharacterized microbial transcripts; and solving applied problems such as biomarker discovery and development of microbial therapeutics.
Affiliation(s)
- Yancong Zhang
- Harvard Chan Microbiome in Public Health Center and Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA; Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Kelsey N Thompson
- Harvard Chan Microbiome in Public Health Center and Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA; Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Tobyn Branck
- Harvard Chan Microbiome in Public Health Center and Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA; Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Department of Systems, Synthetic, and Quantitative Biology, Harvard Medical School, Boston, Massachusetts 02115, USA
- Yan Yan
- Harvard Chan Microbiome in Public Health Center and Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA; Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Long H Nguyen
- Harvard Chan Microbiome in Public Health Center and Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA; Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA; Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02108, USA
- Eric A Franzosa
- Harvard Chan Microbiome in Public Health Center and Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA; Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Curtis Huttenhower
- Harvard Chan Microbiome in Public Health Center and Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA; Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA
9
Mehta S, Crane M, Leith E, Batut B, Hiltemann S, Arntzen MØ, Kunath BJ, Pope PB, Delogu F, Sajulga R, Kumar P, Johnson JE, Griffin TJ, Jagtap PD. ASaiM-MT: a validated and optimized ASaiM workflow for metatranscriptomics analysis within Galaxy framework. F1000Res 2021; 10:103. [PMID: 34484688 PMCID: PMC8383124 DOI: 10.12688/f1000research.28608.2] [Accepted: 04/12/2021] [Indexed: 12/13/2022]
Abstract
The Earth Microbiome Project (EMP) has aided in understanding the role of microbial communities, the influence of collective genetic material (the 'microbiome') and microbial diversity patterns across the habitats of our planet. With the evolution of new sequencing technologies, researchers can now investigate the microbiome and map its influence on the environment and human health. Advances in bioinformatics methods for next-generation sequencing (NGS) data analysis have helped researchers gain in-depth knowledge about the taxonomic and genetic composition of microbial communities. Metagenomic methods have been the most commonly used approaches for microbiome analysis; however, they primarily extract information about the taxonomic composition and genetic potential of the microbiome under study and lack quantification of the gene products (RNA and proteins). In contrast, metatranscriptomics, the study of a microbial community's RNA expression, can reveal the dynamic gene expression of individual microbial populations and of the community as a whole, ultimately providing information about the active pathways in the microbiome. To support the analysis of NGS data, the ASaiM analysis framework was previously developed and made available via the Galaxy platform. Although it was developed for both metagenomics and metatranscriptomics, the original publication demonstrated the use of ASaiM only for metagenomics, and thorough testing on metatranscriptomics data was lacking. In the current study, we have focused on validating and optimizing the tools within ASaiM for metatranscriptomics data. As a result, we deliver a robust workflow that will enable researchers to understand the dynamic functional response of the microbiome in a wide variety of metatranscriptomics studies.
This improved and optimized ASaiM-metatranscriptomics (ASaiM-MT) workflow is publicly available via the ASaiM framework, documented and supported with training material so that users can interrogate and characterize metatranscriptomic data, as part of larger meta-omic studies of microbiomes.
Affiliation(s)
- Subina Mehta
- University of Minnesota, Twin Cities, MN, 55455, USA
- Marie Crane
- University of Minnesota, Twin Cities, MN, 55455, USA
- Emma Leith
- University of Minnesota, Twin Cities, MN, 55455, USA
- Bérénice Batut
- Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, Germany
- Saskia Hiltemann
- Department of Pathology, Erasmus Medical Center, Rotterdam, The Netherlands
- Ray Sajulga
- University of Minnesota, Twin Cities, MN, 55455, USA
- Praveen Kumar
- University of Minnesota, Twin Cities, MN, 55455, USA
10
Mehta S, Crane M, Leith E, Batut B, Hiltemann S, Arntzen MØ, Kunath BJ, Pope PB, Delogu F, Sajulga R, Kumar P, Johnson JE, Griffin TJ, Jagtap PD. ASaiM-MT: a validated and optimized ASaiM workflow for metatranscriptomics analysis within Galaxy framework. F1000Res 2021; 10:103. [PMID: 34484688 PMCID: PMC8383124 DOI: 10.12688/f1000research.28608.1] [Accepted: 02/03/2021] [Indexed: 12/13/2022]
Abstract
The Human Microbiome Project (HMP) has aided in understanding the role of microbial communities and the influence of collective genetic material (the 'microbiome') on human health and disease. With the evolution of new sequencing technologies, researchers can now investigate the microbiome and map its influence on human health. Advances in bioinformatics methods for next-generation sequencing (NGS) data analysis have helped researchers gain in-depth knowledge about the taxonomic and genetic composition of microbial communities. Metagenomic methods have been the most commonly used approaches for microbiome analysis; however, they primarily extract information about the taxonomic composition and genetic potential of the microbiome under study and lack quantification of the gene products (RNA and proteins). Conversely, metatranscriptomics, the study of a microbial community's RNA expression, can reveal the dynamic gene expression of individual microbial populations and of the community as a whole, ultimately providing information about the active pathways in the microbiome. To support the analysis of NGS data, the ASaiM analysis framework was previously developed and made available via the Galaxy platform. Although it was developed for both metagenomics and metatranscriptomics, the original publication demonstrated the use of ASaiM only for metagenomics, and thorough testing on metatranscriptomics data was lacking. In the current study, we have focused on validating and optimizing the tools within ASaiM for metatranscriptomics data. As a result, we deliver a robust workflow that will enable researchers to understand the dynamic functional response of the microbiome in a wide variety of metatranscriptomics studies.
This improved and optimized ASaiM-metatranscriptomics (ASaiM-MT) workflow is publicly available via the ASaiM framework, documented and supported with training material so that users can interrogate and characterize metatranscriptomic data, as part of larger meta-omic studies of microbiomes.
Affiliation(s)
- Subina Mehta
- University of Minnesota, Twin Cities, MN, 55455, USA
- Marie Crane
- University of Minnesota, Twin Cities, MN, 55455, USA
- Emma Leith
- University of Minnesota, Twin Cities, MN, 55455, USA
- Bérénice Batut
- Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, Germany
- Saskia Hiltemann
- Department of Pathology, Erasmus Medical Center, Rotterdam, The Netherlands
- Ray Sajulga
- University of Minnesota, Twin Cities, MN, 55455, USA
- Praveen Kumar
- University of Minnesota, Twin Cities, MN, 55455, USA
11
Romanis CS, Pearson LA, Neilan BA. Cyanobacterial blooms in wastewater treatment facilities: Significance and emerging monitoring strategies. J Microbiol Methods 2020; 180:106123. [PMID: 33316292 DOI: 10.1016/j.mimet.2020.106123] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/15/2020] [Revised: 12/06/2020] [Accepted: 12/08/2020] [Indexed: 12/30/2022]
Abstract
Municipal wastewater treatment facilities (WWTFs) are prone to the proliferation of cyanobacterial species which thrive in stable, nutrient-rich environments. Dense cyanobacterial blooms frequently disrupt treatment processes and the supply of recycled water due to their production of extracellular polymeric substances, which hinder microfiltration, and toxins, which pose a health risk to end-users. A variety of methods are employed by water utilities for the identification and monitoring of cyanobacteria and their toxins in WWTFs, including microscopy, flow cytometry, ELISA, chemoanalytical methods, and more recently, molecular methods. Here we review the literature on the occurrence and significance of cyanobacterial blooms in WWTFs and discuss the pros and cons of the various strategies for monitoring these potentially hazardous events. Particular focus is directed towards next-generation metagenomic sequencing technologies for the development of site-specific cyanobacterial bloom management strategies. Long-term multi-omic observations will enable the identification of indicator species and the development of site-specific bloom dynamics models for the mitigation and management of cyanobacterial blooms in WWTFs. While emerging metagenomic tools could potentially provide deep insight into the diversity and flux of problematic cyanobacterial species in these systems, they should be considered a complement to, rather than a replacement of, quantitative chemoanalytical approaches.
Affiliation(s)
- Caitlin S Romanis
- School of Environmental and Life Sciences, University of Newcastle, Newcastle 2308, Australia
- Leanne A Pearson
- School of Environmental and Life Sciences, University of Newcastle, Newcastle 2308, Australia
- Brett A Neilan
- School of Environmental and Life Sciences, University of Newcastle, Newcastle 2308, Australia
12
Chen L, Wahlberg N, Liao CQ, Wang CB, Ma FZ, Huang GH. Fourteen complete mitochondrial genomes of butterflies from the genus Lethe (Lepidoptera, Nymphalidae, Satyrinae) with mitogenome-based phylogenetic analysis. Genomics 2020; 112:4435-4441. [PMID: 32745503 DOI: 10.1016/j.ygeno.2020.07.042] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 12/26/2019] [Revised: 05/16/2020] [Accepted: 07/26/2020] [Indexed: 12/01/2022]
Abstract
The mitochondrial genome (mitogenome) can help us understand the phylogenetic relationships within the genus Lethe and the subfamily Satyrinae. In this study, we sequenced the complete mitogenomes of 14 Lethe species. They range in size from 15,225 to 15,271 bp, and each contains the 37 typical genes (13 PCGs, 22 tRNAs, 2 rRNAs) and a noncoding A + T-rich region. The gene arrangement and orientation are similar to those of typical lepidopteran mitogenomes. The Ka/Ks ratio shows that cox1 has the slowest evolutionary rate. The secondary structure of trnN lacks the pseudouridine loop (TψC loop) in most Lethe species. The inferred phylogenies show that Lethe is a well-supported monophyletic group. They also reveal two major clades within the genus, which is consistent with previous morphological classifications.
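The abstract does not say which estimator was used for the Ka/Ks ratio. Its conventional definition, in the simplest uncorrected form, is given below as a textbook reference rather than as the authors' method:

$$
\omega = \frac{K_a}{K_s}, \qquad K_a \approx \frac{N_d}{N}, \qquad K_s \approx \frac{S_d}{S}
$$

Here $N_d$ and $S_d$ are the observed nonsynonymous and synonymous differences, and $N$ and $S$ are the numbers of nonsynonymous and synonymous sites. Estimators such as Nei-Gojobori additionally apply a Jukes-Cantor correction to each proportion. Values of $\omega < 1$ are usually read as purifying selection, $\omega \approx 1$ as neutral evolution, and $\omega > 1$ as positive selection.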
Affiliation(s)
- Lu Chen
- College of Plant Protection, Hunan Agricultural University, Changsha, Hunan 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road 1, Furong District, Changsha, Hunan 410128, China
- Niklas Wahlberg
- Systematic Biology Group, Department of Biology, Lund University, Lund, Sweden
- Cheng-Qing Liao
- College of Plant Protection, Hunan Agricultural University, Changsha, Hunan 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road 1, Furong District, Changsha, Hunan 410128, China
- Chen-Bin Wang
- State Key Laboratory of Biosafety, Ministry of Environmental Protection, Nanjing Institute of Environmental Sciences, Ministry of Environmental Protection, Nanjing, Jiangsu 210042, China
- Fang-Zhou Ma
- State Key Laboratory of Biosafety, Ministry of Environmental Protection, Nanjing Institute of Environmental Sciences, Ministry of Environmental Protection, Nanjing, Jiangsu 210042, China
- Guo-Hua Huang
- College of Plant Protection, Hunan Agricultural University, Changsha, Hunan 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road 1, Furong District, Changsha, Hunan 410128, China
13
Bushmanova E, Antipov D, Lapidus A, Prjibelski AD. rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. Gigascience 2020; 8:5559527. [PMID: 31494669 PMCID: PMC6736328 DOI: 10.1093/gigascience/giz100] [Citation(s) in RCA: 353] [Impact Index Per Article: 88.3] [Received: 11/16/2018] [Revised: 04/20/2019] [Accepted: 08/01/2019] [Indexed: 12/18/2022]
Abstract
Background The possibility of generating large RNA-sequencing datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. Reference-based tools are widely used in transcriptomic studies, but their application is limited to organisms with finished and well-annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem. It is complicated by varying expression levels across genes, alternative splicing, and paralogous genes. Results Herein we describe the novel transcriptome assembler rnaSPAdes, which has been developed on top of the SPAdes genome assembler and explores computational parallels between the assembly of transcriptomes and that of single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies and compare it with modern transcriptome assembly tools, using several evaluation approaches on various RNA-sequencing datasets. We briefly highlight the strong and weak points of the different assemblers. Conclusions Based on this comparison of assembly methods, we infer that no single assembler is the absolute leader across all quality metrics and all datasets used. However, rnaSPAdes typically outperforms other assemblers on important properties such as the number of assembled genes and isoforms. At the same time, it has higher accuracy statistics on average than its closest competitors.
Affiliation(s)
- Elena Bushmanova
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, 199004, 6 linia V.O. 11d, Russia
- Dmitry Antipov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, 199004, 6 linia V.O. 11d, Russia
- Alla Lapidus
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, 199004, 6 linia V.O. 11d, Russia
- Andrey D Prjibelski
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, 199004, 6 linia V.O. 11d, Russia
14
Shakya M, Lo CC, Chain PSG. Advances and Challenges in Metatranscriptomic Analysis. Front Genet 2019; 10:904. [PMID: 31608125 PMCID: PMC6774269 DOI: 10.3389/fgene.2019.00904] [Citation(s) in RCA: 192] [Impact Index Per Article: 38.4] [Received: 05/15/2019] [Accepted: 08/26/2019] [Indexed: 11/13/2022]
Abstract
Sequencing-based analyses of microbiomes have traditionally focused on the question of community membership, profiling taxonomic abundance through amplicon sequencing of 16S rRNA genes. More recently, shotgun metagenomics, which involves the random sequencing of all genomic content of a microbiome, has dominated this arena. Two factors explain this: advances in sequencing throughput, and its ability to profile genes as well as microbiome membership. These methods have revealed a great number of insights into a wide variety of microbiomes. However, both approaches describe only the presence of organisms or genes, not whether they are active members of the microbiome. To obtain deeper insights into how a microbial community responds over time to changing environmental conditions, microbiome scientists are beginning to employ large-scale metatranscriptomics approaches. Here, we present a comprehensive review of computational metatranscriptomics approaches for studying microbial community transcriptomes. We review the major advancements in this burgeoning field and compare its strengths and weaknesses with those of other microbiome analysis methods. We also list available tools and workflows, and describe use cases and limitations of this method. We envision that this field will continue to grow exponentially, as will the scope of projects (e.g. longitudinal studies of community transcriptional responses to perturbations over time) and the resulting data. This review provides a list of options for computational analysis of these data and highlights areas in need of development.
Affiliation(s)
- Migun Shakya
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, United States
- Chien-Chi Lo
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, United States
- Patrick S G Chain
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, United States
15
Westreich ST, Treiber ML, Mills DA, Korf I, Lemay DG. SAMSA2: a standalone metatranscriptome analysis pipeline. BMC Bioinformatics 2018; 19:175. [PMID: 29783945 PMCID: PMC5963165 DOI: 10.1186/s12859-018-2189-z] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 10/04/2017] [Accepted: 05/04/2018] [Indexed: 01/24/2023]
Abstract
Background Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms. Results SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, and example input and output files along with examples of master scripts for full pipeline execution. Conclusions SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.
Affiliation(s)
- Michelle L Treiber
- Genome Center, University of California, Davis, California, USA
- Department of Food Science and Technology, University of California, Davis, California, USA
- USDA ARS Western Nutrition Research Center, Davis, CA, USA
- David A Mills
- Department of Food Science and Technology, University of California, Davis, California, USA
- USDA ARS Western Nutrition Research Center, Davis, CA, USA
- Ian Korf
- Genome Center, University of California, Davis, California, USA
- Danielle G Lemay
- Genome Center, University of California, Davis, California, USA
- USDA ARS Western Nutrition Research Center, Davis, CA, USA
16
Mallick H, Ma S, Franzosa EA, Vatanen T, Morgan XC, Huttenhower C. Experimental design and quantitative analysis of microbial community multiomics. Genome Biol 2017; 18:228. [PMID: 29187204 PMCID: PMC5708111 DOI: 10.1186/s13059-017-1359-z] [Citation(s) in RCA: 116] [Impact Index Per Article: 16.6] [Indexed: 02/07/2023]
Abstract
Studies of the microbiome have become increasingly sophisticated, and multiple sequence-based, molecular methods as well as culture-based methods exist for population-scale microbiome profiles. To link the resulting host and microbial data types to human health, several experimental design considerations, data analysis challenges, and statistical epidemiological approaches must be addressed. Here, we survey current best practices for experimental design in microbiome molecular epidemiology, including technologies for generating, analyzing, and integrating microbiome multiomics data. We highlight studies that have identified molecular bioactives that influence human health, and we suggest steps for scaling translational microbiome research to high-throughput target discovery across large populations.
Affiliation(s)
- Himel Mallick
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Siyuan Ma
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Eric A Franzosa
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Tommi Vatanen
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Xochitl C Morgan
- Department of Microbiology and Immunology, The University of Otago, Dunedin, New Zealand
- Curtis Huttenhower
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
17
Kim CS, Winn MD, Sachdeva V, Jordan KE. K-mer clustering algorithm using a MapReduce framework: application to the parallelization of the Inchworm module of Trinity. BMC Bioinformatics 2017; 18:467. [PMID: 29100493 PMCID: PMC5670514 DOI: 10.1186/s12859-017-1881-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 06/19/2017] [Accepted: 10/26/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND De novo transcriptome assembly is an important technique for understanding gene expression in non-model organisms. Many de novo assemblers build a de Bruijn graph from a set of RNA sequences and rely on an in-memory representation of this graph. However, current methods analyse the complete set of read-derived k-mer sequences at once, which requires computer hardware with large shared memory. RESULTS We introduce a novel approach that clusters k-mers as the first step. The clusters correspond to small sets of gene products, which can be processed quickly to give candidate transcripts. We implement the clustering step using the MapReduce approach for parallelising the analysis of large datasets, which enables the use of compute clusters. The computational task is distributed across the compute system using the industry-standard MPI protocol, and no specialised hardware is required. Using this approach, we have re-implemented the Inchworm module from the widely used Trinity pipeline and tested the method in the context of the full Trinity pipeline. Validation tests on a range of real datasets show large reductions in runtime and per-node memory requirements when a compute cluster is used. CONCLUSIONS Our study shows that MapReduce-based clustering has great potential for distributing challenging sequencing problems without loss of accuracy. Although we have focussed on the Trinity package, we propose that such clustering is a useful initial step for other assembly pipelines.
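The core idea is to cluster k-mers before assembly so that each cluster can be processed independently. A minimal single-process sketch in Python follows. It is not the authors' MPI/MapReduce re-implementation of Inchworm. The map step emits k-mers per read, and the reduce step sums their counts. The clustering criterion is an assumption made for illustration: k-mers that overlap by k-1 bases are joined into connected components with union-find. Reverse complements and error filtering are ignored, and all function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (what the 'map' step emits)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def map_reduce_kmer_counts(reads, k):
    """Map: each read emits (k-mer, 1). Reduce: sum the counts per k-mer."""
    counts = defaultdict(int)
    for read in reads:
        for km in kmers(read, k):
            counts[km] += 1
    return dict(counts)


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)  # register unseen elements as singletons
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_kmers(counts):
    """Group k-mers whose (k-1)-suffix equals another k-mer's (k-1)-prefix."""
    uf = UnionFind()
    by_prefix = defaultdict(list)
    for km in counts:
        uf.find(km)  # make sure every k-mer is in the forest
        by_prefix[km[:-1]].append(km)
    for km in counts:
        for successor in by_prefix.get(km[1:], []):
            uf.union(km, successor)
    clusters = defaultdict(set)
    for km in counts:
        clusters[uf.find(km)].add(km)
    return list(clusters.values())


reads = ["ACGTAC", "GTACGG", "TTTTTT"]  # hypothetical toy reads
counts = map_reduce_kmer_counts(reads, k=4)
clusters = cluster_kmers(counts)
print(sorted(len(c) for c in clusters))  # [1, 5]
```

In a distributed setting, each cluster could be handed to a separate worker, which is what keeps per-node memory small. Here the clusters are simply returned as Python sets.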
Affiliation(s)
- Chang Sik Kim
- The Hartree Centre and Scientific Computing Department, STFC Daresbury Laboratory, Warrington, WA4 4AD, UK
- Present address: Cancer Research UK Manchester Institute, The University of Manchester, M20 4BX, Manchester, UK
- Martyn D Winn
- The Hartree Centre and Scientific Computing Department, STFC Daresbury Laboratory, Warrington, WA4 4AD, UK
- Vipin Sachdeva
- Computational Science Center, IBM T.J. Watson Research, Cambridge, MA, USA
- Present address: Silicon Therapeutics, 300 A Street, Boston, MA, USA
- Kirk E Jordan
- Computational Science Center, IBM T.J. Watson Research, Cambridge, MA, USA
18
Narayanasamy S, Jarosz Y, Muller EEL, Heintz-Buschart A, Herold M, Kaysen A, Laczny CC, Pinel N, May P, Wilmes P. IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses. Genome Biol 2016; 17:260. [PMID: 27986083 PMCID: PMC5159968 DOI: 10.1186/s13059-016-1116-8] [Citation(s) in RCA: 86] [Impact Index Per Article: 10.8] [Received: 10/18/2016] [Accepted: 11/22/2016] [Indexed: 01/28/2023]
Abstract
Existing workflows for the analysis of multi-omic microbiome datasets are lab-specific and often result in sub-optimal data usage. Here we present IMP, a reproducible and modular pipeline for the integrated and reference-independent analysis of coupled metagenomic and metatranscriptomic data. IMP incorporates robust read preprocessing, iterative co-assembly, analyses of microbial community structure and function, automated binning, as well as genomic signature-based visualizations. The IMP-based data integration strategy enhances data usage, output volume, and output quality as demonstrated using relevant use-cases. Finally, IMP is encapsulated within a user-friendly implementation using Python and Docker. IMP is available at http://r3lab.uni.lu/web/imp/ (MIT license).
Affiliation(s)
- Shaman Narayanasamy
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Yohan Jarosz
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Emilie E. L. Muller
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Present address: Department of Microbiology, Genomics and the Environment, UMR 7156 UNISTRA—CNRS, Université de Strasbourg, Strasbourg, France
- Anna Heintz-Buschart
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Malte Herold
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Anne Kaysen
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Cédric C. Laczny
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Present address: Saarland University, Building E2 1, Saarbrücken, 66123 Germany
- Nicolás Pinel
- Institute of Systems Biology, 401 Terry Avenue North, Seattle, WA 98109 USA
- Present address: Universidad EAFIT, Carrera 49 No 7 sur 50, Medellín, Colombia
- Patrick May
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
- Paul Wilmes
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette, L-4362 Luxembourg
19
Valles-Colomer M, Darzi Y, Vieira-Silva S, Falony G, Raes J, Joossens M. Meta-omics in Inflammatory Bowel Disease Research: Applications, Challenges, and Guidelines. J Crohns Colitis 2016; 10:735-46. [PMID: 26802086 DOI: 10.1093/ecco-jcc/jjw024] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 10/06/2015] [Accepted: 01/15/2016] [Indexed: 12/13/2022]
Abstract
Meta-omics [metagenomics, metatranscriptomics, and metaproteomics] are rapidly expanding our knowledge of the gut microbiota in health and disease. These technologies are increasingly used in inflammatory bowel disease [IBD] research. Yet, meta-omics data analysis, interpretation, and among-study comparison remain challenging. In this review we discuss the role these techniques are playing in IBD research, highlighting their strengths and limitations. We give guidelines on proper sample collection and preparation methods, and on performing the analyses and interpreting the results, reporting available user-friendly tools and pipelines.
Affiliation(s)
- Mireia Valles-Colomer
- KU Leuven, Department of Microbiology and Immunology, Rega Institute, Leuven, Belgium
- VIB, Center for the Biology of Disease, Leuven, Belgium
- Youssef Darzi
- KU Leuven, Department of Microbiology and Immunology, Rega Institute, Leuven, Belgium
- VIB, Center for the Biology of Disease, Leuven, Belgium
- Microbiology Unit, Faculty of Sciences and Bioengineering Sciences, Vrije Universiteit Brussel, Brussels, Belgium
- Sara Vieira-Silva
- KU Leuven, Department of Microbiology and Immunology, Rega Institute, Leuven, Belgium
- VIB, Center for the Biology of Disease, Leuven, Belgium
- Gwen Falony
- KU Leuven, Department of Microbiology and Immunology, Rega Institute, Leuven, Belgium
- VIB, Center for the Biology of Disease, Leuven, Belgium
- Jeroen Raes
- KU Leuven, Department of Microbiology and Immunology, Rega Institute, Leuven, Belgium
- VIB, Center for the Biology of Disease, Leuven, Belgium
- Marie Joossens
- KU Leuven, Department of Microbiology and Immunology, Rega Institute, Leuven, Belgium
- VIB, Center for the Biology of Disease, Leuven, Belgium
- Microbiology Unit, Faculty of Sciences and Bioengineering Sciences, Vrije Universiteit Brussel, Brussels, Belgium
20
Davids M, Hugenholtz F, Martins dos Santos V, Smidt H, Kleerebezem M, Schaap PJ. Functional Profiling of Unfamiliar Microbial Communities Using a Validated De Novo Assembly Metatranscriptome Pipeline. PLoS One 2016; 11:e0146423. [PMID: 26756338 PMCID: PMC4710500 DOI: 10.1371/journal.pone.0146423] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 09/17/2015] [Accepted: 12/15/2015] [Indexed: 11/18/2022]
Abstract
BACKGROUND Metatranscriptomic landscapes can provide insights into functional relationships within natural microbial communities. Analysis of complex metatranscriptome datasets from these communities poses a considerable bioinformatic challenge. These communities are non-restricted, with a varying number of participating strains and species. For RNA-Seq data, a standard approach is to align the generated reads to a set of closely related reference genomes. This works well only for microbial communities with a near-complete catalogue of reference genomes at a small evolutionary distance. In this study, we focus on the design of a validated de novo metatranscriptome assembly pipeline for single-end Illumina RNA-Seq data, used to obtain functional and taxonomic profiles of murine microbial communities. RESULTS The de novo assembly metatranscriptome pipeline developed here combined rRNA removal, the IDBA-UD assembler, functional annotation and taxonomic classification. Different assemblers were tested and validated with two data sources. The first was RNA-Seq data from an in silico generated mock community. The second was in vivo RNA-Seq data from a restricted microbial community taken from a mouse model colonized with Altered Schaedler Flora (ASF). Precision and recall of the resulting gene expression, functional and taxonomic profiles were compared with those obtained with a standard alignment method. The validated pipeline was then used to generate expression profiles from non-restricted cecal communities of four C57BL/6J mice fed a high-fat high-protein diet. These data were spiked with an RNA-Seq dataset from a well-characterized human sample. The spike-in control was used to estimate precision and recall at the assembly, functional and taxonomic levels in non-restricted communities. CONCLUSIONS A generic de novo assembly pipeline for metatranscriptome data analysis was designed for microbial ecosystems; it can be applied to microbial metatranscriptome analysis in any chosen niche.
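The abstract scores profiles against a known reference using precision and recall. A minimal Python sketch of these two metrics follows. For illustration only, it assumes that a profile is reduced to the set of detected features, such as gene or taxon identifiers. The study may have used different or abundance-weighted definitions, and the identifiers below are hypothetical.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) of a predicted feature set against ground truth.

    precision = |predicted & truth| / |predicted|
    recall    = |predicted & truth| / |truth|
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example: features recovered by a pipeline vs. a known mock community.
detected = {"geneA", "geneB", "geneC"}
expected = {"geneA", "geneB", "geneD", "geneE"}
p, r = precision_recall(detected, expected)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

With a spike-in control, `expected` would be the features of the spiked reference sample, so the two numbers estimate how many assembled features are genuine and how much of the reference is recovered.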
Affiliation(s)
- Mark Davids
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, Wageningen, The Netherlands
- Netherlands Consortium for Systems Biology, TI Food and Nutrition, Wageningen, The Netherlands
- Floor Hugenholtz
- Laboratory of Microbiology, Wageningen University, Dreijenplein 10, Wageningen, The Netherlands
- Netherlands Consortium for Systems Biology, TI Food and Nutrition, Wageningen, The Netherlands
- Vitor Martins dos Santos
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, Wageningen, The Netherlands
- Netherlands Consortium for Systems Biology, TI Food and Nutrition, Wageningen, The Netherlands
- Hauke Smidt
- Laboratory of Microbiology, Wageningen University, Dreijenplein 10, Wageningen, The Netherlands
- Netherlands Consortium for Systems Biology, TI Food and Nutrition, Wageningen, The Netherlands
- Michiel Kleerebezem
- Host-Microbe Interactomics Group, Wageningen University, Wageningen, The Netherlands
- Netherlands Consortium for Systems Biology, TI Food and Nutrition, Wageningen, The Netherlands
- Peter J. Schaap
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, Wageningen, The Netherlands
- Netherlands Consortium for Systems Biology, TI Food and Nutrition, Wageningen, The Netherlands
21
Ju F, Zhang T. Experimental Design and Bioinformatics Analysis for the Application of Metagenomics in Environmental Sciences and Biotechnology. Environ Sci Technol 2015; 49:12628-40. [PMID: 26451629 DOI: 10.1021/acs.est.5b03719] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 05/24/2023]
Abstract
Recent advances in DNA sequencing technologies have prompted the widespread application of metagenomics for the investigation of novel bioresources (e.g., industrial enzymes and bioactive molecules) and unknown biohazards (e.g., pathogens and antibiotic resistance genes) in natural and engineered microbial systems across multiple disciplines. This review discusses the rigorous experimental design and sample preparation in the context of applying metagenomics in environmental sciences and biotechnology. Moreover, this review summarizes the principles, methodologies, and state-of-the-art bioinformatics procedures, tools and database resources for metagenomics applications and discusses two popular strategies (analysis of unassembled reads versus assembled contigs/draft genomes) for quantitative or qualitative insights of microbial community structure and functions. Overall, this review aims to facilitate more extensive application of metagenomics in the investigation of uncultured microorganisms, novel enzymes, microbe-environment interactions, and biohazards in biotechnological applications where microbial communities are engineered for bioenergy production, wastewater treatment, and bioremediation.
Affiliation(s)
- Feng Ju
- Environmental Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR, China
- Tong Zhang
- Environmental Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR, China
22
Ye Y, Tang H. Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis. Bioinformatics 2015; 32:1001-8. [PMID: 26319390 PMCID: PMC4896364 DOI: 10.1093/bioinformatics/btv510] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 03/10/2015] [Accepted: 08/24/2015] [Indexed: 11/26/2022]
Abstract
Motivation: Metagenomics research has accelerated the study of microbial organisms, providing insights into the composition and potential functionality of various microbial communities. Metatranscriptomics (the study of transcripts from a mixture of microbial species) and other meta-omics approaches hold even greater promise for revealing the functional and regulatory characteristics of microbial communities. Current metatranscriptomics projects are often carried out without matched metagenomic datasets (of the same microbial communities). Projects that do produce both metatranscriptomic and metagenomic datasets often do not integrate the analyses. Metagenome assemblies are far from perfect, which partially explains why they are not used for the analysis of metatranscriptomic datasets. Results: Here, we report a read-mapping algorithm for mapping short reads onto a de Bruijn graph of assemblies. A hash table of junction k-mers (k-mers spanning branching structures in the de Bruijn graph) is used to facilitate fast mapping of reads to the graph. We developed an application of this mapping algorithm: a reference-based approach to metatranscriptome assembly that uses graphs of metagenome assembly as the reference. Our results show that this new approach (called TAG) helps to assemble substantially more transcripts. Without it, these transcripts would have been missed or truncated because of the fragmented nature of the reference metagenome. Availability and implementation: TAG was implemented in C++ and has been tested extensively on the Linux platform. It is available for download as open source at http://omics.informatics.indiana.edu/TAG. Contact: yye@indiana.edu
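A simplified Python sketch of a junction k-mer hash table follows. It rests on three assumptions made for illustration:

- The graph's nodes are the k-mers of the assembled contigs.
- Two k-mers are linked when they overlap by k-1 bases.
- A "junction" is a k-mer with more than one predecessor or successor.

TAG's actual graph construction and read-extension steps are not reproduced, reverse complements are ignored, and all names are hypothetical.

```python
from collections import defaultdict

BASES = "ACGT"


def kmer_set(contigs, k):
    """All k-mers occurring in the assembled contigs (the graph's nodes)."""
    return {c[i:i + k] for c in contigs for i in range(len(c) - k + 1)}


def junction_kmers(nodes):
    """K-mers with more than one successor or predecessor in the graph."""
    junctions = set()
    for km in nodes:
        successors = sum((km[1:] + b) in nodes for b in BASES)
        predecessors = sum((b + km[:-1]) in nodes for b in BASES)
        if successors > 1 or predecessors > 1:
            junctions.add(km)
    return junctions


def junction_index(contigs, k):
    """Hash table mapping each junction k-mer to its (contig index, offset) list."""
    junctions = junction_kmers(kmer_set(contigs, k))
    index = defaultdict(list)
    for ci, contig in enumerate(contigs):
        for i in range(len(contig) - k + 1):
            km = contig[i:i + k]
            if km in junctions:
                index[km].append((ci, i))
    return dict(index)


def read_junction_hits(read, index, k):
    """Return (read offset, k-mer, graph locations) for each junction k-mer in a read."""
    hits = []
    for i in range(len(read) - k + 1):
        km = read[i:i + k]
        if km in index:
            hits.append((i, km, index[km]))
    return hits


contigs = ["AACGTT", "GCCGTA"]  # hypothetical toy contigs that branch at CGT
index = junction_index(contigs, k=3)
print(index)  # {'CGT': [(0, 2), (1, 2)]}
print(read_junction_hits("ACGTA", index, k=3))  # [(1, 'CGT', [(0, 2), (1, 2)])]
```

Looking up only junction k-mers keeps the hash table small. A read's junction hits anchor it to candidate graph locations, and a full mapper would then extend the alignment along the unbranched paths between junctions.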
Affiliation(s)
- Yuzhen Ye
- School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
- Haixu Tang
- School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
23
Leung HCM, Yiu SM, Chin FYL. IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information. J Comput Biol 2014; 22:367-76. [PMID: 25535824 DOI: 10.1089/cmb.2014.0139] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
Metatranscriptomic analysis provides information on how a microbial community reacts to environmental changes. Using next-generation sequencing (NGS) technology, biologists can study a microbial community by sampling short reads from a mixture of mRNAs (metatranscriptomic data). As most microbial genome sequences are unknown, de novo assembly of the mRNAs would seem to be needed. However, NGS reads are short, and mRNAs share many similar regions and differ tremendously in abundance, making de novo assembly challenging. The existing assembler IDBA-MT is designed specifically for metatranscriptomic data but performs well only on highly expressed mRNAs. This article introduces IDBA-MTP, which adopts a novel approach to metatranscriptomic assembly that exploits a database of millions of known protein sequences associated with mRNAs. Using this protein information effectively is nontrivial, given the size of the database and the fact that different mRNAs might encode proteins with similar functions (because different amino acids might have similar characteristics). IDBA-MTP employs a similarity measure between mRNAs and protein sequences, dynamic programming techniques and seed-and-extend heuristics to tackle the problem effectively and efficiently. Experimental results show that IDBA-MTP outperforms existing assemblers, reconstructing 14% more mRNAs.
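The seed-and-extend heuristic mentioned in this abstract can be sketched generically in Python. The sketch translates an mRNA in its three forward reading frames and looks up exact amino-acid k-mer seeds shared with a protein. It then extends each seed without gaps until the score drops too far below its best (an X-drop rule). This is a hypothetical illustration, not IDBA-MTP's similarity measure or dynamic program. It scores identity only (+1/-1) rather than amino-acid similarity, and the names `seed_and_extend`, `xdrop_extend`, `seed_len` and `drop` are assumptions.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {
    a + b + c: AMINO[16 * i + 4 * j + m]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for m, c in enumerate(BASES)
}


def translate(mrna, frame):
    """Translate one forward reading frame (0, 1 or 2); unknown codons become 'X'."""
    dna = mrna.upper().replace("U", "T")
    return "".join(CODON.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3))


def xdrop_extend(a, b, i, j, step, drop):
    """Ungapped extension from a[i]/b[j] in direction step (+1 or -1).

    Scores +1 per identical residue and -1 otherwise, stopping once the running
    score falls more than `drop` below the best seen; returns (best score, length).
    """
    score = best = best_len = n = 0
    while 0 <= i < len(a) and 0 <= j < len(b):
        score += 1 if a[i] == b[j] else -1
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score > drop:
            break
        i += step
        j += step
    return best, best_len


def seed_and_extend(mrna, protein, seed_len=3, drop=2):
    """Best ungapped hit as (score, frame, peptide start, protein start, length), or None."""
    # Hash every protein k-mer so that seeds can be looked up in constant time.
    seeds = {}
    for j in range(len(protein) - seed_len + 1):
        seeds.setdefault(protein[j:j + seed_len], []).append(j)
    best = None
    for frame in range(3):
        pep = translate(mrna, frame)
        for i in range(len(pep) - seed_len + 1):
            for j in seeds.get(pep[i:i + seed_len], []):
                right, r_len = xdrop_extend(pep, protein, i + seed_len, j + seed_len, 1, drop)
                left, l_len = xdrop_extend(pep, protein, i - 1, j - 1, -1, drop)
                score = seed_len + left + right
                hit = (score, frame, i - l_len, j - l_len, l_len + seed_len + r_len)
                if best is None or score > best[0]:
                    best = hit
    return best
```

For instance, the sequence `GATGAAAACCGCTTATATTGCGAAA` encodes `MKTAYIAK` in frame 1. The call `seed_and_extend` on that sequence and `MKTAYIAK` reports a full-length hit in that frame, while frames 0 and 2 produce no seeds. Replacing the identity score with a substitution matrix would bring the sketch closer to the similarity-aware matching the abstract describes.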
Affiliation(s)
- Henry C M Leung
- Department of Computer Science, The University of Hong Kong, Hong Kong, Hong Kong
24
Abram F. Systems-based approaches to unravel multi-species microbial community functioning. Comput Struct Biotechnol J 2014; 13:24-32. [PMID: 25750697 PMCID: PMC4348430 DOI: 10.1016/j.csbj.2014.11.009] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 09/30/2014] [Revised: 11/25/2014] [Accepted: 11/26/2014] [Indexed: 01/24/2023]
Abstract
Some of the most transformative discoveries promising to enable the resolution of this century's grand societal challenges will most likely arise from environmental science, particularly environmental microbiology and biotechnology. Understanding how microbes interact in situ, and how microbial communities respond to environmental changes, remains an enormous challenge for science. Systems biology offers a powerful experimental strategy for tackling the exciting task of deciphering microbial interactions. In this framework, entire microbial communities are considered as metaorganisms, and each level of biological information (DNA, RNA, proteins and metabolites) is investigated along with in situ environmental characteristics. In this way, systems biology can help unravel the interactions between the different parts of an ecosystem that are ultimately responsible for its emergent properties. Indeed, each level of biological information provides a different level of characterisation of the microbial communities. Metagenomics, metatranscriptomics, metaproteomics, metabolomics and SIP-omics can be employed collectively to investigate microbial community structure, potential, function, activity and interactions. Omics approaches are enabled by high-throughput 21st-century technologies, and this review discusses how their implementation has revolutionised our understanding of microbial communities.
Affiliation(s)
- Florence Abram
- Functional Environmental Microbiology, School of Natural Sciences, National University of Ireland Galway, University Road, Galway, Ireland
25
Sequence assembly using next generation sequencing data--challenges and solutions. SCIENCE CHINA-LIFE SCIENCES 2014; 57:1140-8. [PMID: 25326069 DOI: 10.1007/s11427-014-4752-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 05/29/2014] [Accepted: 08/25/2014] [Indexed: 10/24/2022]
Abstract
Sequence assembly is an important step in bioinformatics studies. With next-generation sequencing (NGS) technology, high-throughput DNA fragments (reads) can be randomly sampled from DNA or RNA molecules. However, because the positions from which reads are sampled are unknown, an assembly process is required to combine overlapping reads and reconstruct the original DNA or RNA sequence. Compared with traditional Sanger sequencing, NGS offers higher throughput, but its reads are shorter and its error rate is higher, which introduces several problems for assembly. Moreover, paired-end reads, which contain more information than single-end reads, can be sampled; existing assemblers cannot fully utilize this information and fail to assemble longer contigs. In this article, we revisit the major problems of assembling NGS reads from genomic, transcriptomic, metagenomic and metatranscriptomic data. We also describe our IDBA package for solving these problems. The IDBA package adopts several novel ideas in assembly, including the use of multiple k values, local assembly and progressive depth removal. Compared with existing assemblers, IDBA performs better on many simulated and real sequencing datasets.
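The Python sketch below illustrates the multiple-k idea. At each k, it builds a de Bruijn graph from the reads plus the contigs of the previous round. It then reports maximal non-branching paths (unitigs) as contigs. A small k tolerates low coverage, while a larger k resolves short repeats. In the example, the repeated dinucleotide `AT` creates a branch at k = 3 that disappears at k = 4. This is a simplified sketch that assumes error-free, single-strand reads. It omits IDBA's local assembly and progressive depth removal, and the function names are hypothetical.

```python
def kmer_set(seqs, k):
    """All k-mers occurring in the input sequences."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def succ(km, ks):
    """Successors: k-mers overlapping km by k-1 bases on the right."""
    return [km[1:] + b for b in "ACGT" if km[1:] + b in ks]


def pred(km, ks):
    """Predecessors: k-mers overlapping km by k-1 bases on the left."""
    return [b + km[:-1] for b in "ACGT" if b + km[:-1] in ks]


def unitigs(ks):
    """Compress maximal non-branching paths of the de Bruijn graph into contigs."""
    seen, contigs = set(), []
    for km in sorted(ks):
        if km in seen:
            continue
        # Walk back while the path is non-branching; stop if a cycle closes on km.
        start = km
        while True:
            p = pred(start, ks)
            if len(p) != 1 or len(succ(p[0], ks)) != 1 or p[0] == km:
                break
            start = p[0]
        # Walk forward from start, appending one base per additional k-mer.
        seen.add(start)
        path, cur = start, start
        while True:
            nxt = succ(cur, ks)
            if len(nxt) != 1 or len(pred(nxt[0], ks)) != 1 or nxt[0] in seen:
                break
            cur = nxt[0]
            seen.add(cur)
            path += cur[-1]
        contigs.append(path)
    return contigs


def multi_k_assemble(reads, k_values):
    """Hypothetical driver: iterate k from small to large, feeding each
    round's contigs into the next round's graph."""
    contigs = []
    for k in k_values:
        contigs = unitigs(kmer_set(reads + contigs, k))
    return contigs
```

With the reads `ATGGCA` and `GGCATT`, k = 3 alone yields the fragmented contigs `ATGGCAT` and `ATT`, because the k-mer `CAT` branches to both `ATG` and `ATT`. Carrying those contigs into a k = 4 round removes the branch and yields the single contig `ATGGCATT`.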
26
Microbial genome-enabled insights into plant–microorganism interactions. Nat Rev Genet 2014; 15:797-813. [DOI: 10.1038/nrg3748] [Citation(s) in RCA: 146] [Impact Index Per Article: 14.6] [Indexed: 02/06/2023]
27
Celaj A, Markle J, Danska J, Parkinson J. Comparison of assembly algorithms for improving rate of metatranscriptomic functional annotation. MICROBIOME 2014; 2:39. [PMID: 25411636 PMCID: PMC4236897 DOI: 10.1186/2049-2618-2-39] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 06/30/2014] [Accepted: 09/17/2014] [Indexed: 05/11/2023]
Abstract
BACKGROUND Microbiome-wide gene expression profiling through high-throughput RNA sequencing ('metatranscriptomics') offers a powerful means to functionally interrogate complex microbial communities. Key to the successful exploitation of these datasets is the ability to confidently match relatively short sequence reads to known bacterial transcripts. In the absence of reference genomes, such annotation efforts may be enhanced by assembling reads into longer contiguous sequences ('contigs') prior to database search. Since reads from homologous transcripts may derive from several species represented at different abundance levels, it is not clear how well current assembly pipelines perform on metatranscriptomic datasets. Here we evaluate the performance of four currently employed assemblers: the de novo transcriptome assemblers Trinity and Oases, the metagenomic assembler MetaVelvet, and the recently developed metatranscriptomic assembler IDBA-MT. RESULTS We evaluated the performance of the assemblers on a previously published dataset of single-end RNA sequence reads derived from the large intestine of an inbred non-obese diabetic mouse model of type 1 diabetes. We found that Trinity performed best as judged by contigs assembled, reads assigned to contigs, and number of reads that could be annotated to a known bacterial transcript. Without assembly, only 15.5% of RNA sequence reads could be annotated to a known transcript, in contrast to 50.3% after Trinity assembly. Paired-end reads generated from the same mouse samples resulted in modest performance gains. A database search estimated that the assemblies are unlikely to erroneously merge multiple unrelated genes sharing a region of similarity (<2% of contigs). A simulated dataset based on ten species confirmed these findings. A more complex simulated dataset based on 72 species revealed more assembly errors than would be expected from sequencing quality alone. 
Through the detailed evaluation of assembly performance, the insights provided by this study will help drive the design of future metatranscriptomic analyses. CONCLUSION Assembly of metatranscriptome datasets greatly improved read annotation. Of the four assemblers evaluated, Trinity provided the best performance. For more complex datasets, reads generated from transcripts sharing considerable sequence similarity can be a source of significant assembly error, suggesting a need to collate reads on the basis of common taxonomic origin prior to assembly.
Affiliation(s)
- Albi Celaj
- Molecular Structure and Function, Hospital for Sick Children, Peter Gilgan Center for Research and Learning, 686 Bay Street, Toronto, Ontario M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Janet Markle
- Department of Immunology, University of Toronto, Medical Sciences Building, 1 King’s College Circle, Room 5207, Toronto, Ontario M5S 1A8, Canada
- Genetics and Genomic Biology, Hospital for Sick Children, Peter Gilgan Center for Research and Learning, 686 Bay Street, Toronto, Ontario M5G 0A4, Canada
- Current address: Laboratory of Human Genetics of Infectious Diseases, Rockefeller University, New York, NY 10065, USA
- Jayne Danska
- Department of Immunology, University of Toronto, Medical Sciences Building, 1 King’s College Circle, Room 5207, Toronto, Ontario M5S 1A8, Canada
- Genetics and Genomic Biology, Hospital for Sick Children, Peter Gilgan Center for Research and Learning, 686 Bay Street, Toronto, Ontario M5G 0A4, Canada
- John Parkinson
- Molecular Structure and Function, Hospital for Sick Children, Peter Gilgan Center for Research and Learning, 686 Bay Street, Toronto, Ontario M5G 0A4, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
28
IDBA-MTP: A Hybrid MetaTranscriptomic Assembler Based on Protein Information. LECTURE NOTES IN COMPUTER SCIENCE 2014. [DOI: 10.1007/978-3-319-05269-4_12] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 02/04/2023]