1
Ramakodi MP. Don't let valuable microbiome data go to waste: combined usage of merging and direct-joining of sequencing reads for low-quality paired-end amplicon data. Biotechnol Lett 2024; 46:791-805. [PMID: 38970710 DOI: 10.1007/s10529-024-03509-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 05/27/2024] [Accepted: 06/24/2024] [Indexed: 07/08/2024]
Abstract
The pernicious nature of low-quality sequencing data warrants improvement in the bioinformatics workflow for profiling microbial diversity. The conventional merging approach drops a large share of sequencing reads when processing low-quality amplicon data, so alternative methods are needed. In this study, a computational workflow that combines merging and direct-joining, in which paired-end reads lacking overlaps are concatenated and pooled with the merged sequences, is proposed to handle low-quality amplicon data. The proposed strategy was compared with two workflows: the merging approach, where the paired-end reads are merged, and the direct-joining approach, where the reads are concatenated. The results showed that the merging approach generates a significantly low number of amplicon sequences, limits the microbiome inference, and obscures some microbial associations. Compared with the other workflows, the combined merging and direct-joining strategy reduces the loss of amplicon data, improves the taxonomy classification, and, importantly, abates the misleading results associated with the merging approach when analysing low-quality amplicon data. The mock community analysis also supports the findings. In summary, researchers are advised to follow the merging and direct-joining workflow to avoid problems associated with low-quality data while profiling microbial community structure.
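The combined strategy described in this abstract can be illustrated with a minimal Python sketch. It is not the author's workflow: the function names, the exact-match overlap test, the `min_overlap` default, and the choice to reverse-complement R2 before joining are all assumptions made for illustration. Real read mergers also tolerate mismatches and use base quality scores.

```python
# Minimal sketch (assumptions: exact-match overlaps, no quality scores,
# R2 is reverse-complemented before merging or joining).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_or_join(r1: str, r2: str, min_overlap: int = 10) -> tuple[str, str]:
    """Merge a read pair if the ends overlap, otherwise direct-join it.

    Returns (sequence, method), where method is "merged" or "joined".
    """
    r2_rc = reverse_complement(r2)
    longest = min(len(r1), len(r2_rc))
    # Try the longest candidate overlap first: suffix of R1 vs prefix of R2 (rc).
    for k in range(longest, min_overlap - 1, -1):
        if r1[-k:] == r2_rc[:k]:
            return r1 + r2_rc[k:], "merged"
    # No usable overlap: concatenate the pair instead of discarding it.
    return r1 + r2_rc, "joined"


def process_pairs(pairs, min_overlap: int = 10):
    """Pool merged and joined sequences, as in the combined strategy."""
    return [merge_or_join(r1, r2, min_overlap) for r1, r2 in pairs]
```

Under these assumptions, pairs that fail the overlap test stay in the pooled output as joined sequences rather than being dropped, which is the property the abstract attributes to the combined workflow.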
Affiliation(s)
- Meganathan P Ramakodi
- CSIR-National Environmental Engineering Research Institute (NEERI), Hyderabad Zonal Centre, IICT Campus, Tarnaka, Hyderabad, Telangana, 500007, India.
2
Comparative analysis of two next-generation sequencing platforms for analysis of antimicrobial resistance genes. J Glob Antimicrob Resist 2022; 31:167-174. [PMID: 36055548 DOI: 10.1016/j.jgar.2022.08.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Revised: 08/10/2022] [Accepted: 08/23/2022] [Indexed: 12/30/2022] Open
Abstract
OBJECTIVES: The use of antibiotics in human medicine and livestock production has contributed to the widespread occurrence of antimicrobial resistance (AMR). Recognizing the relevance of AMR to human and livestock health, it is important to assess the occurrence of genetic determinants of resistance in medical, veterinary, and public health settings in order to understand risks of transmission and treatment failure. Advances in next-generation sequencing technologies have had a significant impact on research in microbial genetics and microbiome analyses. The aim of the present study was to compare the Illumina MiSeq and Ion Torrent S5 Plus sequencing platforms for the analysis of AMR genes in a veterinary/public health setting. METHODS: All samples were processed in parallel for the two sequencing technologies and then passed through a common bioinformatics workflow to define the occurrence and abundance of AMR gene sequences. The Comprehensive Antibiotic Resistance Database (CARD), QIAGEN Microbial Insight - Antimicrobial Resistance, Antimicrobial resistance database, and Comprehensive Antibiotic Resistance Database developed by CLC bio (CARD-CLC) databases were compared for analysis, with the most genes identified using CARD. RESULTS: Drawing on these results, we described an end-to-end workflow for the analysis of AMR genes using advances in next-generation sequencing. No statistically significant differences between the two sequencing platforms were observed for any gene except tet-(40), which may be due to the short amplicon length. CONCLUSIONS: Irrespective of the sequencing chemistry and platform used, comparative analysis of AMR genes and candidate host organisms suggests that the Illumina MiSeq and Ion Torrent platforms performed almost equally, with closely comparable results and only minor differences.
3
Ramakodi MP. A comprehensive evaluation of single-end sequencing data analyses for environmental microbiome research. Arch Microbiol 2021; 203:6295-6302. [PMID: 34654941 DOI: 10.1007/s00203-021-02597-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/04/2021] [Revised: 09/17/2021] [Accepted: 09/28/2021] [Indexed: 01/04/2023]
Abstract
Illumina sequencing platforms have been widely used for amplicon-based environmental microbiome research. Analyses of amplicon data from environmental samples generated on the Illumina MiSeq platform show that the reverse (R2) reads in paired-end (PE) datasets have low quality towards the 3' end, which reduces the sequencing depth of samples, ultimately shrinks the sample size, and may lead to an altered outcome. This study evaluates the usefulness of single-end (SE) sequencing data in microbiome research when the Illumina MiSeq PE dataset contains a significantly high number of low-quality reverse reads. The amplicon data (V1V3, V3V4, V4V5 and V6V8) from 128 environmental (soil) samples, downloaded from SRA, demonstrate the efficiency of SE sequencing data analyses in microbiome research. The SE datasets were found to infer a core microbiome structure comparable to that of the PE dataset. Conspicuously, the forward (R1) datasets inferred a higher number of taxa than the PE datasets for most of the amplicon regions, except V3V4. Thus, analyses of SE sequencing data, especially R1 reads, in environmental microbiome studies could ameliorate the sample-size problems caused by low-quality reverse reads in the dataset. However, care must be taken while interpreting the microbiome structure, as a few taxa observed in the PE datasets were absent in the SE datasets. In conclusion, this study demonstrates that amplicon data can be analysed without removing samples that have low-quality reverse reads.
Affiliation(s)
- Meganathan P Ramakodi
- CSIR-National Environmental Engineering Research Institute (NEERI), Hyderabad Zonal Centre, IICT Campus, Tarnaka, Hyderabad, Telangana, 500007, India.
4
Dacey DP, Chain FJJ. Concatenation of paired-end reads improves taxonomic classification of amplicons for profiling microbial communities. BMC Bioinformatics 2021; 22:493. [PMID: 34641782 PMCID: PMC8507205 DOI: 10.1186/s12859-021-04410-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/14/2021] [Accepted: 09/29/2021] [Indexed: 01/04/2023] Open
Abstract
Background Taxonomic classification of genetic markers for microbiome analysis is affected by the numerous choices made from sample preparation to bioinformatics analysis. Paired-end read merging is routinely used to capture the entire amplicon sequence when the read ends overlap. However, the exclusion of unmerged reads from further analysis can result in underestimating the diversity in the sequenced microbial community and is influenced by bioinformatic processes such as read trimming and the choice of reference database. A potential solution to overcome this is to concatenate (join) reads that do not overlap and keep them for taxonomic classification. The use of concatenated reads can outperform taxonomic recovery from single-end reads, but it remains unclear how their performance compares to merged reads. Using various sequenced mock communities with different amplicons, read length, read depth, taxonomic composition, and sequence quality, we tested how merging and concatenating reads performed for genus recall and precision in bioinformatic pipelines combining different parameters for read trimming and taxonomic classification using different reference databases. Results The addition of concatenated reads to merged reads always increased pipeline performance. The top two performing pipelines both included read concatenation, with variable strengths depending on the mock community. The pipeline that combined merged and concatenated reads that were quality-trimmed performed best for mock communities with larger amplicons and higher average quality sequences. The pipeline that used length-trimmed concatenated reads outperformed quality trimming in mock communities with lower quality sequences but lost a significant amount of input sequences for taxonomic classification during processing. Genus level classification was more accurate using the SILVA reference database compared to Greengenes. 
Conclusions Merged sequences with the addition of concatenated sequences that were unable to be merged increased performance of taxonomic classifications. This was especially beneficial in mock communities with larger amplicons. We have shown for the first time, using an in-depth comparison of pipelines containing merged vs concatenated reads combined with different trimming parameters and reference databases, the potential advantages of concatenating sequences in improving resolution in microbiome investigations. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04410-2.
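Genus recall and precision against a mock community, the two metrics named in this abstract, can be computed from two sets of genus names. The sketch below uses the standard set-based definitions; the function name and the example genera are hypothetical and are not taken from the paper.

```python
# Minimal sketch (assumption: standard set-based recall and precision on
# genus names; the example genera below are hypothetical).
def genus_recall_precision(expected: set[str], observed: set[str]) -> tuple[float, float]:
    """Return (recall, precision) of observed genera against a mock community.

    recall    = correctly recovered genera / genera expected in the mock
    precision = correctly recovered genera / all genera reported
    """
    true_positives = expected & observed
    recall = len(true_positives) / len(expected) if expected else 0.0
    precision = len(true_positives) / len(observed) if observed else 0.0
    return recall, precision


# Hypothetical mock community and pipeline output.
mock = {"Bacillus", "Escherichia", "Listeria", "Pseudomonas"}
called = {"Bacillus", "Escherichia", "Listeria", "Shigella", "Salmonella"}
recall, precision = genus_recall_precision(mock, called)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Comparing these two numbers across pipelines (merged only, concatenated only, merged plus concatenated) is one way to reproduce the style of comparison the abstract describes.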
Affiliation(s)
- Daniel P Dacey
- Department of Biological Sciences, University of Massachusetts Lowell, Lowell, MA, USA.
- Frédéric J J Chain
- Department of Biological Sciences, University of Massachusetts Lowell, Lowell, MA, USA.
5
16S rRNA of Mucosal Colon Microbiome and CCL2 Circulating Levels Are Potential Biomarkers in Colorectal Cancer. Int J Mol Sci 2021; 22:ijms221910747. [PMID: 34639088 PMCID: PMC8509685 DOI: 10.3390/ijms221910747] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 08/05/2021] [Revised: 09/30/2021] [Accepted: 09/30/2021] [Indexed: 12/12/2022] Open
Abstract
Colorectal cancer (CRC) is one of the most common malignancies in the Western world and intestinal dysbiosis might contribute to its pathogenesis. The mucosal colon microbiome and C-C motif chemokine 2 (CCL2) were investigated in 20 healthy controls (HC) and 20 CRC patients using 16S rRNA sequencing and immunoluminescent assay, respectively. A total of 10 HC subjects were classified as overweight/obese (OW/OB_HC) and 10 subjects were normal weight (NW_HC); 15 CRC patients were classified as OW/OB_CRC and 5 patients were NW_CRC. Results: Fusobacterium nucleatum and Escherichia coli were more abundant in OW/OB_HC than in NW_HC microbiomes. Globally, Streptococcus intermedius, Gemella haemolysans, Fusobacterium nucleatum, Bacteroides fragilis and Escherichia coli were significantly increased in CRC patient tumor/lesioned tissue (CRC_LT) and CRC patient unlesioned tissue (CRC_ULT) microbiomes compared to HC microbiomes. CCL2 circulating levels were associated with tumor presence and with the abundance of Fusobacterium nucleatum, Bacteroides fragilis and Gemella haemolysans. Our data suggest that mucosal colon dysbiosis might contribute to CRC pathogenesis by inducing inflammation. Notably, Fusobacterium nucleatum, which was more abundant in the OW/OB_HC than in the NW_HC microbiomes, might represent a putative link between obesity and increased CRC risk.
6
Minardi D, Ryder D, Del Campo J, Garcia Fonseca V, Kerr R, Mortensen S, Pallavicini A, Bass D. Improved high throughput protocol for targeting eukaryotic symbionts in metazoan and eDNA samples. Mol Ecol Resour 2021; 22:664-678. [PMID: 34549891 PMCID: PMC9292944 DOI: 10.1111/1755-0998.13509] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/23/2021] [Revised: 08/23/2021] [Accepted: 09/01/2021] [Indexed: 01/04/2023]
Abstract
Eukaryote symbionts of animals are major drivers of ecosystems, not only because of their diversity and host interactions of variable pathogenicity but also through key roles such as commensalism and different types of interdependence. However, molecular investigations of metazoan eukaryomes require minimising coamplification of homologous host genes. In this study we (1) identified a previously published “antimetazoan” reverse primer to theoretically enable amplification of a wider range of microeukaryotic symbionts, including more evolutionarily divergent sequence types, (2) evaluated in silico several antimetazoan primer combinations, and (3) optimised the application of the best performing primer pair for high throughput sequencing (HTS) by comparing one-step and two-step PCR amplification approaches, testing different annealing temperatures and evaluating the taxonomic profiles produced by HTS and data analysis. The primer combination 574*F – UNonMet_DB tested in silico showed the largest diversity of nonmetazoan sequence types in the SILVA database and was also the shortest available primer combination for broadly-targeting antimetazoan amplification across the 18S rRNA gene V4 region. We demonstrate that the one-step PCR approach used for library preparation produces significantly lower proportions of metazoan reads, and a more comprehensive coverage of host-associated microeukaryote reads, than the two-step approach. Using higher PCR annealing temperatures further increased the proportion of nonmetazoan reads in all sample types tested. The resulting V4 region amplicons were taxonomically informative even when only the forward read was analysed. This region also revealed a diversity of known and putatively parasitic lineages and a wider diversity of host-associated eukaryotes.
Affiliation(s)
- Diana Minardi
- Centre for Environment, Fisheries and Aquaculture Research, Weymouth, Dorset, UK; Cefas, International Centre for Aquatic Animal Health, Weymouth, Dorset, UK
- David Ryder
- Centre for Environment, Fisheries and Aquaculture Research, Weymouth, Dorset, UK; Cefas, International Centre for Aquatic Animal Health, Weymouth, Dorset, UK
- Javier Del Campo
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Catalonia, Spain
- Vera Garcia Fonseca
- Centre for Environment, Fisheries and Aquaculture Research, Weymouth, Dorset, UK; Cefas, International Centre for Aquatic Animal Health, Weymouth, Dorset, UK
- Rose Kerr
- Centre for Environment, Fisheries and Aquaculture Research, Weymouth, Dorset, UK; Cefas, International Centre for Aquatic Animal Health, Weymouth, Dorset, UK
- David Bass
- Centre for Environment, Fisheries and Aquaculture Research, Weymouth, Dorset, UK; Cefas, International Centre for Aquatic Animal Health, Weymouth, Dorset, UK; Department of Life Sciences, The Natural History Museum, London, UK
7
Environmental factors shape the epiphytic bacterial communities of Gracilariopsis lemaneiformis. Sci Rep 2021; 11:8671. [PMID: 33883606 PMCID: PMC8060329 DOI: 10.1038/s41598-021-87977-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/10/2021] [Accepted: 04/06/2021] [Indexed: 02/02/2023] Open
Abstract
Macroalgae host various symbionts on their surface, which play a critical role in their growth and development. However, understanding of these epiphytic bacteria-host alga interactions remains incomplete. This study comprehensively analysed variation in the epiphytic bacterial community (EBC) composition of the red macroalga Gracilariopsis lemaneiformis across geographic locations, and the environmental factors (i.e., nitrogen and phosphorus) that shape it. The composition and structure of the EBC were characterized using high throughput sequencing of the V3-V4 hypervariable region of the 16S rRNA gene. The results revealed that epiphytic bacteria varied significantly among three geographic locations in China, i.e., Nan'ao Island (NA), Lianjiang County (LJ), and Nanri Island (NR). Redundancy analysis (RDA) showed that the relative abundances of Bacteroidetes, Firmicutes, Verrucomicrobia, and Epsilonbacteraeota at NR were strongly positively correlated with total nitrogen (TN), total phosphorus (TP), nitrate nitrogen (NO3-N), and dissolved inorganic nitrogen (DIN), but negatively correlated with nitrite nitrogen (NO2-N). The relative abundance of Cyanobacteria at NA and LJ was strongly positively correlated with NO2-N, but negatively correlated with TN, TP, NO3-N, and DIN. In addition, the Mantel test results indicated that the EBC composition was significantly correlated with these environmental factors, which was also confirmed by Spearman correlation analysis. Thus, environmental factors such as NO3-N and DIN play a key role in the community composition of epiphytic bacteria on G. lemaneiformis. This study provides important baseline knowledge on the community composition of epiphytic bacteria on G. lemaneiformis and shows correlations between different epiphytic bacteria and their surrounding environmental factors.
8
Can We Use Functional Annotation of Prokaryotic Taxa (FAPROTAX) to Assign the Ecological Functions of Soil Bacteria? APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11020688] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Indexed: 12/14/2022]
Abstract
FAPROTAX is a promising tool for predicting ecologically relevant functions of bacterial and archaeal taxa derived from 16S rRNA amplicon sequencing. The database was initially developed to predict the function of marine species using standard microbiological references. This study, however, attempted to assess the application of FAPROTAX in soil environments. We hypothesized that FAPROTAX was compatible with terrestrial ecosystems. The potential use of FAPROTAX to assign ecological functions of soil bacteria was investigated using meta-analysis and our newly designed experiments. Soil samples from two major terrestrial ecosystems, agricultural land and forest, were collected. Bacterial taxonomy was analyzed using Illumina sequencing of the 16S rRNA gene, and ecological functions of the soil bacteria were assigned by FAPROTAX. The presence of all functionally assigned OTUs (Operational Taxonomic Units) in soil was manually checked using peer-reviewed articles as well as standard microbiology books. Overall, we showed that sample source was not a predominant factor limiting the application of FAPROTAX, but poor taxonomic identification was. The proportion of assigned taxa between aquatic and non-aquatic ecosystems was not significantly different (p > 0.05). There were strong and significant correlations (σ = 0.90–0.95, p < 0.01) between the number of OTUs assigned to genus or order level and the number of functionally assigned OTUs. After manual verification, we found that more than 97% of the FAPROTAX-assigned OTUs have previously been detected and potentially performed functions in agricultural and forest soils. We further provided information regarding taxa capable of N-fixation and P and K solubilization, three important elements in soil systems, which can be integrated with FAPROTAX to increase the proportion of functionally assigned OTUs. Consequently, we concluded that FAPROTAX can be used for fast functional screening or grouping of 16S-derived bacterial data from terrestrial ecosystems, and that its performance could be enhanced by improving the taxonomic and functional reference databases.
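The abstract reports a correlation between the number of OTUs assigned to a taxonomic level and the number of functionally assigned OTUs, without naming the coefficient used. As an illustration only, the sketch below computes a Spearman rank correlation in plain Python. The choice of Spearman, the function names, and the example counts are assumptions and are not data or methods taken from the study.

```python
# Minimal sketch (assumptions: Spearman rank correlation is an illustrative
# choice; the per-sample counts below are hypothetical, not study data).
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts per sample.
otus_assigned_to_genus = [120, 95, 210, 160, 80]
otus_with_function = [40, 35, 77, 51, 22]
print(round(spearman(otus_assigned_to_genus, otus_with_function), 3))
```

A p-value is not computed here; a statistics library would normally be used for that step.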
9
16S rRNA Gene Amplicon Sequencing Data of Tailing and Nontailing Rhizosphere Soils of Mimosa pudica from a Heavy Metal-Contaminated Ex-Tin Mining Area. Microbiol Resour Announc 2020; 9:9/42/e00761-20. [PMID: 33060266 PMCID: PMC7561685 DOI: 10.1128/mra.00761-20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
The 16S rRNA gene amplicon sequence data from tailing and nontailing rhizosphere soils of Mimosa pudica from a heavy metal-contaminated area are reported here. Diverse bacterial taxa were represented in the results, and the most dominant phyla were Proteobacteria (41.2%), Acidobacteria (17.1%), and Actinobacteria (14.4%).
10
Singh A, Nylander JAA, Schnürer A, Bongcam-Rudloff E, Müller B. High-Throughput Sequencing and Unsupervised Analysis of Formyltetrahydrofolate Synthetase (FTHFS) Gene Amplicons to Estimate Acetogenic Community Structure. Front Microbiol 2020; 11:2066. [PMID: 32983047 PMCID: PMC7481360 DOI: 10.3389/fmicb.2020.02066] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/18/2020] [Accepted: 08/05/2020] [Indexed: 11/17/2022] Open
Abstract
The formyltetrahydrofolate synthetase (FTHFS) gene is a molecular marker of choice to study the diversity of acetogenic communities. However, current analyses are limited due to lack of a high-throughput sequencing approach for FTHFS gene amplicons and a dedicated bioinformatics pipeline for data analysis, including taxonomic annotation and visualization of the sequence data. In the present study, we combined the barcode approach for multiplexed sequencing with unsupervised data analysis to visualize acetogenic community structure. We used samples from a biogas digester to develop proof-of-principle for our combined approach. We successfully generated high-throughput sequence data for the partial FTHFS gene and performed unsupervised data analysis using the novel bioinformatics pipeline “AcetoScan” presented in this study, which resulted in taxonomically annotated OTUs, phylogenetic tree, abundance plots and diversity indices. The results demonstrated that high-throughput sequencing can be used to sequence the FTHFS amplicons from a pool of samples, while the analysis pipeline AcetoScan can be reliably used to process the raw sequence data and visualize acetogenic community structure. The method and analysis pipeline described in this paper can assist in the identification and quantification of known or potentially new acetogens. The AcetoScan pipeline is freely available at https://github.com/abhijeetsingh1704/AcetoScan.
Affiliation(s)
- Abhijeet Singh
- Anaerobic Microbiology and Biotechnology Group, Department of Molecular Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Johan A A Nylander
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden; National Bioinformatics Infrastructure Sweden, SciLifeLab, Uppsala, Sweden
- Anna Schnürer
- Anaerobic Microbiology and Biotechnology Group, Department of Molecular Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Erik Bongcam-Rudloff
- SLU-Global Bioinformatics Centre, Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Bettina Müller
- Anaerobic Microbiology and Biotechnology Group, Department of Molecular Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden