1
Heo Y, Manikandan G, Ramachandran A, Chen D. Comprehensive Evaluation of Error-Correction Methodologies for Genome Sequencing Data. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
2
Mitchell K, Brito JJ, Mandric I, Wu Q, Knyazev S, Chang S, Martin LS, Karlsberg A, Gerasimov E, Littman R, Hill BL, Wu NC, Yang HT, Hsieh K, Chen L, Littman E, Shabani T, Enik G, Yao D, Sun R, Schroeder J, Eskin E, Zelikovsky A, Skums P, Pop M, Mangul S. Benchmarking of computational error-correction methods for next-generation sequencing data. Genome Biol 2020; 21:71. [PMID: 32183840 PMCID: PMC7079412 DOI: 10.1186/s13059-020-01988-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/18/2019] [Accepted: 03/06/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. RESULTS: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. CONCLUSIONS: In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.
Affiliation(s)
- Keith Mitchell
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Jaqueline J Brito
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
- Igor Mandric
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
  - Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
- Qiaozhen Wu
  - Department of Mathematics, University of California Los Angeles, 520 Portola Plaza, Los Angeles, CA, 90095, USA
- Sergey Knyazev
  - Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
- Sei Chang
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Lana S Martin
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
- Aaron Karlsberg
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
- Ekaterina Gerasimov
  - Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
- Russell Littman
  - UCLA Bioinformatics, 621 Charles E Young Dr S, Los Angeles, CA, 90024, USA
- Brian L Hill
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Nicholas C Wu
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Harry Taegyun Yang
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Kevin Hsieh
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Linus Chen
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Eli Littman
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Taylor Shabani
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- German Enik
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Douglas Yao
  - Department of Molecular, Cell, and Developmental Biology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Ren Sun
  - Department of Molecular and Medical Pharmacology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Jan Schroeder
  - Epigenetics & Reprogramming Laboratory, Monash University, 15 Innovation Walk, Melbourne, VIC, 3800, Australia
- Eleazar Eskin
  - Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
  - The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia, 119991
- Pavel Skums
  - Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
- Mihai Pop
  - Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA
- Serghei Mangul
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
3
Minias P, Pikus E, Whittingham LA, Dunn PO. A global analysis of selection at the avian MHC. Evolution 2018; 72:1278-1293. [PMID: 29665025 DOI: 10.1111/evo.13490] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 11/28/2017] [Revised: 03/29/2018] [Accepted: 04/09/2018] [Indexed: 12/29/2022]
Abstract
Recent advancements in sequencing technology have resulted in rapid progress in the study of the major histocompatibility complex (MHC) in non-model avian species. Here, we analyze a global dataset of avian MHC class I and class II sequences (ca. 11,000 sequences from over 250 species) to gain insight into the processes that govern macroevolution of MHC genes in birds. Analysis of substitution rates revealed striking differences in the patterns of diversifying selection between passerine and non-passerine birds. Non-passerines showed stronger selection at MHC class II, which is primarily involved in recognition of extracellular pathogens, while passerines showed stronger selection at MHC class I, which is involved in recognition of intracellular pathogens. Positions of positively selected amino-acid residues showed marked discrepancies with peptide-binding residues (PBRs) of human MHC molecules, suggesting that using a human classification of PBRs to assess selection patterns at the avian MHC may be unjustified. Finally, our analysis provided evidence that indel mutations can make a substantial contribution to adaptive variation at the avian MHC.
Affiliation(s)
- Piotr Minias
  - Department of Biodiversity Studies and Bioeducation, Faculty of Biology and Environmental Protection, University of Łódź, Łódź, 90-237, Poland
- Ewa Pikus
  - Department of Biodiversity Studies and Bioeducation, Faculty of Biology and Environmental Protection, University of Łódź, Łódź, 90-237, Poland
- Linda A Whittingham
  - Behavioral and Molecular Ecology Group, Department of Biological Sciences, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, 53211
- Peter O Dunn
  - Department of Biodiversity Studies and Bioeducation, Faculty of Biology and Environmental Protection, University of Łódź, Łódź, 90-237, Poland
  - Behavioral and Molecular Ecology Group, Department of Biological Sciences, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, 53211
4
Hathaway NJ, Parobek CM, Juliano JJ, Bailey JA. SeekDeep: single-base resolution de novo clustering for amplicon deep sequencing. Nucleic Acids Res 2018; 46:e21. [PMID: 29202193 PMCID: PMC5829576 DOI: 10.1093/nar/gkx1201] [Citation(s) in RCA: 87] [Impact Index Per Article: 14.5] [Received: 01/19/2017] [Revised: 11/16/2017] [Accepted: 11/20/2017] [Indexed: 01/08/2023]
Abstract
PCR amplicon deep sequencing continues to transform the investigation of genetic diversity in viral, bacterial, and eukaryotic populations. In eukaryotic populations such as Plasmodium falciparum infections, it is important to discriminate sequences differing by a single nucleotide polymorphism. In bacterial populations, single-base resolution can provide improved resolution towards species and strains. Here, we introduce the SeekDeep suite built around the qluster algorithm, which is capable of accurately building de novo clusters representing true, biological local haplotypes differing by just a single base. It outperforms current software, particularly at low frequencies and at low input read depths, whether resolving single-base differences or traditional OTUs. SeekDeep is open source and works with all major sequencing technologies, making it broadly useful in a wide variety of applications of amplicon deep sequencing to extract accurate and maximal biologic information.
Affiliation(s)
- Nicholas J Hathaway
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Christian M Parobek
  - Curriculum in Genetics and Molecular Biology, University of North Carolina School of Medicine, Chapel Hill, NC, USA
- Jonathan J Juliano
  - Curriculum in Genetics and Molecular Biology, University of North Carolina School of Medicine, Chapel Hill, NC, USA
  - Division of Infectious Diseases, Department of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Jeffrey A Bailey
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
  - Division of Transfusion Medicine, Department of Medicine, University of Massachusetts Medical School, Worcester, MA, USA
5
Ivády G, Madar L, Dzsudzsák E, Koczok K, Kappelmayer J, Krulisova V, Macek M, Horváth A, Balogh I. Analytical parameters and validation of homopolymer detection in a pyrosequencing-based next generation sequencing system. BMC Genomics 2018; 19:158. [PMID: 29466940 PMCID: PMC5822529 DOI: 10.1186/s12864-018-4544-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 01/20/2017] [Accepted: 02/13/2018] [Indexed: 01/14/2023]
Abstract
Background: Current technologies in next-generation sequencing offer high-throughput reads at low cost, but still suffer from various sequencing errors. Although pyro- and ion semiconductor sequencing both have the advantage of delivering long and high-quality reads, problems might occur when sequencing homopolymer-containing regions, since the repeated identical bases are incorporated during the same synthesis cycle, which leads to uncertainty in base calling. The aim of this study was to evaluate the analytical performance of a pyrosequencing-based next-generation sequencing system in detecting homopolymer sequences using homopolymer-preintegrated plasmid constructs and human DNA samples originating from patients with cystic fibrosis. Results: In the plasmid system, average correct genotyping was 95.8% in 4-mers, 87.4% in 5-mers and 72.1% in 6-mers. Despite the low genotyping accuracy observed in 5- and 6-mers, it was possible to generate amplicons with more than a 90% adequate detection rate in every homopolymer tract. When homopolymers in the CFTR gene were sequenced, average accuracy was 89.3% but varied widely (52.2–99.1%). In all but one case, an optimal amplicon-sequencing primer combination could be identified. In that single case (7A tract in exon 14 (c.2046_2052)), none of the tested primer sets produced the required analytical performance. Conclusions: Our results show that pyrosequencing is most reliable for 4-mers, and that accuracy deteriorates as homopolymer length increases. With careful primer selection, the NGS system was able to correctly genotype all but one of the homopolymers in the CFTR gene. In conclusion, we configured a plasmid test system that can be used to assess genotyping accuracy of NGS devices and developed an accurate NGS assay for the molecular diagnosis of CF using self-designed primers for amplification and sequencing.
Electronic supplementary material: The online version of this article (10.1186/s12864-018-4544-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Gergely Ivády
  - Department of Laboratory Medicine, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
- László Madar
  - Department of Laboratory Medicine, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
- Erika Dzsudzsák
  - Department of Laboratory Medicine, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
- Katalin Koczok
  - Department of Laboratory Medicine, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
  - Division of Clinical Genetics, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
- János Kappelmayer
  - Department of Laboratory Medicine, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
- Veronika Krulisova
  - Department of Biology and Medical Genetics, Second Faculty of Medicine and University Hospital Motol, Charles University, Prague, Czech Republic
- Milan Macek
  - Department of Biology and Medical Genetics, Second Faculty of Medicine and University Hospital Motol, Charles University, Prague, Czech Republic
- Attila Horváth
  - Genomic Medicine and Bioinformatic Core Facility, University of Debrecen, Debrecen, Hungary
- István Balogh
  - Department of Laboratory Medicine, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
  - Division of Clinical Genetics, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
6
Rastelli E, Corinaldesi C, Dell'Anno A, Tangherlini M, Martorelli E, Ingrassia M, Chiocci FL, Lo Martire M, Danovaro R. High potential for temperate viruses to drive carbon cycling in chemoautotrophy-dominated shallow-water hydrothermal vents. Environ Microbiol 2017; 19:4432-4446. [DOI: 10.1111/1462-2920.13890] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/05/2017] [Revised: 07/06/2017] [Accepted: 08/08/2017] [Indexed: 11/29/2022]
Affiliation(s)
- Eugenio Rastelli
  - Department of Life and Environmental Sciences; Polytechnic University of Marche; Ancona 60131 Italy
  - Stazione Zoologica Anton Dohrn; Villa Comunale; Naples 80121 Italy
- Cinzia Corinaldesi
  - Department of Life and Environmental Sciences; Polytechnic University of Marche; Ancona 60131 Italy
- Antonio Dell'Anno
  - Department of Life and Environmental Sciences; Polytechnic University of Marche; Ancona 60131 Italy
- Michael Tangherlini
  - Department of Life and Environmental Sciences; Polytechnic University of Marche; Ancona 60131 Italy
- Eleonora Martorelli
  - Institute of Environmental Geology and Geoengineering; Italian National Research Council; Rome Italy
- Michela Ingrassia
  - Institute of Environmental Geology and Geoengineering; Italian National Research Council; Rome Italy
  - Department of Earth Science; University of Rome Sapienza; Rome Italy
- Francesco L. Chiocci
  - Institute of Environmental Geology and Geoengineering; Italian National Research Council; Rome Italy
  - Department of Earth Science; University of Rome Sapienza; Rome Italy
- Marco Lo Martire
  - Department of Life and Environmental Sciences; Polytechnic University of Marche; Ancona 60131 Italy
- Roberto Danovaro
  - Department of Life and Environmental Sciences; Polytechnic University of Marche; Ancona 60131 Italy
  - Stazione Zoologica Anton Dohrn; Villa Comunale; Naples 80121 Italy
7
Song L, Huang W, Kang J, Huang Y, Ren H, Ding K. Comparison of error correction algorithms for Ion Torrent PGM data: application to hepatitis B virus. Sci Rep 2017; 7:8106. [PMID: 28808243 PMCID: PMC5556038 DOI: 10.1038/s41598-017-08139-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 01/05/2017] [Accepted: 07/05/2017] [Indexed: 01/26/2023]
Abstract
Ion Torrent Personal Genome Machine (PGM) technology is a mid-length read, low-cost and high-speed next-generation sequencing platform with a relatively high insertion and deletion (indel) error rate. A full systematic assessment of the effectiveness of various error correction algorithms in PGM viral datasets (e.g., hepatitis B virus (HBV)) has not been performed. We examined 19 quality-trimmed PGM datasets for the HBV reverse transcriptase (RT) region and found a total error rate of 0.48% ± 0.12%. Deletion errors were clearly present at the ends of homopolymer runs. Tests using both real and simulated data showed that the algorithms differed in their abilities to detect and correct errors and that the error rate and sequencing depth significantly affected the performance. Of the algorithms tested, Pollux showed a better overall performance but tended to over-correct 'genuine' substitution variants, whereas Fiona proved to be better at distinguishing these variants from sequencing errors. We found that the combined use of Pollux and Fiona gave the best results when error-correcting Ion Torrent PGM viral data.
Affiliation(s)
- Liting Song
  - Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Institute for Viral Hepatitis, Department of Infectious Diseases, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, 400010, P.R. China
- Wenxun Huang
  - Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Institute for Viral Hepatitis, Department of Infectious Diseases, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, 400010, P.R. China
- Juan Kang
  - Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Institute for Viral Hepatitis, Department of Infectious Diseases, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, 400010, P.R. China
- Yuan Huang
  - Center for Hepatobillary and Pancreatic Diseases, Beijing Tsinghua Changgung Hospital, Medical Center, Tsinghua University, Beijing, 100044, P.R. China
- Hong Ren
  - Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Institute for Viral Hepatitis, Department of Infectious Diseases, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, 400010, P.R. China
- Keyue Ding
  - Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Institute for Viral Hepatitis, Department of Infectious Diseases, The Second Affiliated Hospital, Chongqing Medical University, Chongqing, 400010, P.R. China
8
Hou D, Chen C, Seely EJ, Chen S, Song Y. High-Throughput Sequencing-Based Immune Repertoire Study during Infectious Disease. Front Immunol 2016; 7:336. [PMID: 27630639 PMCID: PMC5005336 DOI: 10.3389/fimmu.2016.00336] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 01/06/2016] [Accepted: 08/19/2016] [Indexed: 11/13/2022]
Abstract
The selectivity of the adaptive immune response is based on the enormous diversity of T and B cell antigen-specific receptors. The immune repertoire, the collection of T and B cells with functional diversity in the circulatory system at any given time, is dynamic and reflects the essence of immune selectivity. In this article, we review the recent advances in immune repertoire study of infectious diseases, which were achieved by traditional techniques and high-throughput sequencing (HTS) techniques. HTS techniques enable the determination of complementary regions of lymphocyte receptors with unprecedented efficiency and scale. This progress in methodology enhances the understanding of immunologic changes during pathogen challenge and also provides a basis for further development of novel diagnostic markers, immunotherapies, and vaccines.
Affiliation(s)
- Dongni Hou
  - Department of Pulmonary Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
- Cuicui Chen
  - Department of Pulmonary Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
- Eric John Seely
  - Department of Medicine, Division of Pulmonary and Critical Care Medicine, University of California San Francisco, San Francisco, CA, USA
- Shujing Chen
  - Department of Pulmonary Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
- Yuanlin Song
  - Department of Pulmonary Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
9
Improved Efficiency and Reliability of NGS Amplicon Sequencing Data Analysis for Genetic Diagnostic Procedures Using AGSA Software. BIOMED RESEARCH INTERNATIONAL 2016; 2016:5623089. [PMID: 27656653 PMCID: PMC5021467 DOI: 10.1155/2016/5623089] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/25/2016] [Accepted: 06/28/2016] [Indexed: 11/23/2022]
Abstract
Screening for BRCA mutations in women with familial risk of breast or ovarian cancer is an ideal situation for high-throughput sequencing, providing large amounts of low-cost data. However, the 454 (Roche) and Ion Torrent (Thermo Fisher) technologies produce homopolymer-associated indel errors, complicating their use in routine diagnostics. We developed software, named AGSA, which helps to detect false positive mutations in homopolymeric sequences. Seventy-two familial breast cancer cases were analysed in parallel by amplicon 454 pyrosequencing and Sanger dideoxy sequencing for genetic variations of the BRCA genes. All 565 variants detected by dideoxy sequencing were also detected by pyrosequencing. Furthermore, pyrosequencing detected 42 variants that were missed with the Sanger technique. Six amplicons contained homopolymer tracts in the coding sequence that were systematically misread by the software supplied by Roche. Read data plotted as histograms by AGSA software aided the analysis considerably and allowed validation of the majority of homopolymers. As an optimisation, an additional 250 patients were analysed using microfluidic amplification of regions of interest (Access Array Fluidigm) of the BRCA genes, followed by 454 sequencing and AGSA analysis. AGSA complements a complete line of high-throughput diagnostic sequence analysis, reducing time and costs while increasing reliability, notably for homopolymer tracts.
10
Lavezzo E, Barzon L, Toppo S, Palù G. Third generation sequencing technologies applied to diagnostic microbiology: benefits and challenges in applications and data analysis. Expert Rev Mol Diagn 2016; 16:1011-23. [PMID: 27453996 DOI: 10.1080/14737159.2016.1217158] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 01/07/2023]
Abstract
INTRODUCTION: The diagnosis of infectious diseases is among the most successful areas of application of new generation sequencing technologies. The field has seen the development of numerous experimental and analytical approaches for the detection and the fine description of pathogenic and non-pathogenic microorganisms. AREAS COVERED: Without claiming to be exhaustive with respect to all applications and methods developed over the years, this review focuses on the advantages and the issues brought by the new technologies, with an eye in particular to third generation sequencing methods. Both experimental procedures and algorithmic strategies are presented, following the most relevant publications which have led to progress in our ability of detecting infectious agents. EXPERT COMMENTARY: The technical advance brought by third generation sequencing platforms has the potential to significantly expand the range of diagnostic tools that will be available to clinicians. Nonetheless, the implementation of these technologies in clinical practice is still far from being actionable and will temporally follow the path undertaken by second generation methods, which still require the setup of standardized pipelines in both wet and dry laboratory procedures.
Affiliation(s)
- Enrico Lavezzo
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Luisa Barzon
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Stefano Toppo
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Giorgio Palù
  - Department of Molecular Medicine, University of Padova, Padova, Italy
11
González-Tortuero E, Rusek J, Maayan I, Petrusek A, Piálek L, Laurent S, Wolinska J. Genetic diversity of two Daphnia-infecting microsporidian parasites, based on sequence variation in the internal transcribed spacer region. Parasit Vectors 2016; 9:293. [PMID: 27206473 PMCID: PMC4875737 DOI: 10.1186/s13071-016-1584-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/29/2016] [Accepted: 05/10/2016] [Indexed: 11/12/2022]
Abstract
Background: Microsporidia are spore-forming obligate intracellular parasites that include both emerging pathogens and economically important disease agents. However, little is known about the genetic diversity of microsporidia. Here, we investigated patterns of geographic population structure, intraspecific genetic variation, and recombination in two microsporidian taxa that commonly infect cladocerans of the Daphnia longispina complex in central Europe. Taken together, this information helps elucidate the reproductive mode and life-cycles of these parasite species. Methods: Microsporidia-infected Daphnia were sampled from seven drinking water reservoirs in the Czech Republic. Two microsporidia species (Berwaldia schaefernai and microsporidium lineage MIC1) were sequenced at the internal transcribed spacer (ITS) region, using the 454 pyrosequencing platform. Geographical structure analyses were performed applying Fisher’s exact tests, analyses of molecular variance, and permutational MANOVA. To evaluate the genetic diversity of the ITS region, the number of polymorphic sites and Tajima’s and Watterson’s estimators of theta were calculated. Tajima’s D was also used to determine if the ITS in these taxa evolved neutrally. Finally, neighbour similarity score and pairwise homology index tests were performed to detect recombination events. Results: While there was little variation among Berwaldia parasite strains infecting different host populations, the among-population genetic variation of MIC1 was significant. Likewise, ITS genetic diversity was lower in Berwaldia than in MIC1. Recombination signals were detected only in Berwaldia. Conclusion: Genetic tests showed that parasite populations could have expanded recently after a bottleneck or that the ITS could be under negative selection in both microsporidia species. Recombination analyses might indicate cryptic sex in Berwaldia and pure asexuality in MIC1. The differences observed between the two microsporidian species present an exciting opportunity to study the genetic basis of microsporidia-Daphnia coevolution in natural populations, and to better understand reproduction in these parasites.
Electronic supplementary material: The online version of this article (doi:10.1186/s13071-016-1584-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Enrique González-Tortuero
  - Department of Ecosystem Research, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 301, 12587, Berlin, Germany
  - Berlin Centre for Genomics in Biodiversity Research (BeGenDiv), Königin-Luise-Straße 6-8, 14195, Berlin, Germany
  - Department of Biology II, Ludwig Maximilians University, Großhaderner Straße 2, 82512, Planegg-Martinsried, Germany
- Jakub Rusek
  - Department of Biology II, Ludwig Maximilians University, Großhaderner Straße 2, 82512, Planegg-Martinsried, Germany
- Inbar Maayan
  - Department of Biology II, Ludwig Maximilians University, Großhaderner Straße 2, 82512, Planegg-Martinsried, Germany
- Adam Petrusek
  - Department of Ecology, Faculty of Science, Charles University in Prague, Viničná 7, 128 44, Prague, Czech Republic
- Lubomír Piálek
  - Department of Ecology, Faculty of Science, Charles University in Prague, Viničná 7, 128 44, Prague, Czech Republic
  - Department of Zoology, Faculty of Science, University of South Bohemia, Branišovská 31, 370 05, České Budějovice, Czech Republic
- Stefan Laurent
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- Justyna Wolinska
  - Department of Ecosystem Research, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 301, 12587, Berlin, Germany
  - Department of Biology, Chemistry and Pharmacy, Institute of Biology, Free University of Berlin, Königin-Luise-Straße 1-3, 14195, Berlin, Germany
12
Alic AS, Tomas A, Medina I, Blanquer I. MuffinEc: Error correction for de Novo assembly via greedy partitioning and sequence alignment. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2015.09.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
13
Alic AS, Ruzafa D, Dopazo J, Blanquer I. Objective review of de novo stand-alone error correction methods for NGS data. WILEY INTERDISCIPLINARY REVIEWS: COMPUTATIONAL MOLECULAR SCIENCE 2016. [DOI: 10.1002/wcms.1239] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 12/14/2022]
Affiliation(s)
- Andy S. Alic
- Institute of Instrumentation for Molecular Imaging (I3M); Universitat Politècnica de València; València Spain
| | - David Ruzafa
- Departamento de Química Física e Instituto de Biotecnología, Facultad de Ciencias; Universidad de Granada; Granada Spain
| | - Joaquin Dopazo
- Department of Computational Genomics; Príncipe Felipe Research Centre (CIPF); Valencia Spain
- CIBER de Enfermedades Raras (CIBERER); Valencia Spain
- Functional Genomics Node (INB) at CIPF; Valencia Spain
| | - Ignacio Blanquer
- Institute of Instrumentation for Molecular Imaging (I3M); Universitat Politècnica de València; València Spain
- Biomedical Imaging Research Group GIBI 2; Polytechnic University Hospital La Fe; Valencia Spain
| |
|
14
|
Pathogen Discovery. Mol Microbiol 2016. [DOI: 10.1128/9781555819071.ch7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
15
|
Laehnemann D, Borkhardt A, McHardy AC. Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction. Brief Bioinform 2016; 17:154-79. [PMID: 26026159 PMCID: PMC4719071 DOI: 10.1093/bib/bbv029] [Citation(s) in RCA: 180] [Impact Index Per Article: 22.5] [Received: 03/03/2015] [Revised: 04/09/2015] [Indexed: 12/23/2022] Open
Abstract
Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here.
|
16
|
Thangam M, Gopal RK. CRCDA--Comprehensive resources for cancer NGS data analysis. Database: The Journal of Biological Databases and Curation 2015; 2015:bav092. [PMID: 26450948 PMCID: PMC4597977 DOI: 10.1093/database/bav092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/27/2015] [Accepted: 08/31/2015] [Indexed: 12/24/2022]
Abstract
Next generation sequencing (NGS) innovations put a compelling landmark in life science and changed the direction of research in clinical oncology with its productivity to diagnose and treat cancer. The aim of our portal comprehensive resources for cancer NGS data analysis (CRCDA) is to provide a collection of different NGS tools and pipelines under diverse classes with cancer pathways and databases and furthermore, literature information from PubMed. The literature data was constrained to the 18 most common cancer types such as breast cancer, colon cancer and other cancers that exhibit in worldwide population. NGS-cancer tools for the convenience have been categorized into cancer genomics, cancer transcriptomics, cancer epigenomics, quality control and visualization. Pipelines for variant detection, quality control and data analysis were listed to provide out-of-the-box solution for NGS data analysis, which may help researchers to overcome challenges in selecting and configuring individual tools for analysing exome, whole genome and transcriptome data. An extensive search page was developed that can be queried by using (i) type of data [literature, gene data and sequence read archive (SRA) data] and (ii) type of cancer (selected based on global incidence and accessibility of data). For each category of analysis, a variety of tools is available and the biggest challenge is in searching and using the right tool for the right application. The objective of the work is collecting tools in each category available at various places and arranging the tools and other data in a simple and user-friendly manner for biologists and oncologists to find information more easily. To the best of our knowledge, we have collected and presented a comprehensive package of most of the resources available in cancer for NGS data analysis. Given these factors, we believe that this website will be a useful resource to the NGS research community working on cancer.
Database URL: http://bioinfo.au-kbc.org.in/ngs/ngshome.html.
Affiliation(s)
- Manonanthini Thangam
- AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
| | - Ramesh Kumar Gopal
- AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
| |
|
17
|
Allam A, Kalnis P, Solovyev V. Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data. Bioinformatics 2015; 31:3421-8. [DOI: 10.1093/bioinformatics/btv415] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 11/11/2014] [Accepted: 07/08/2015] [Indexed: 11/12/2022] Open
|
18
|
Ollier M, Radosevic-Robin N, Kwiatkowski F, Ponelle F, Viala S, Privat M, Uhrhammer N, Bernard-Gallon D, Penault-Llorca F, Bignon YJ, Bidet Y. DNA repair genes implicated in triple negative familial non-BRCA1/2 breast cancer predisposition. Am J Cancer Res 2015; 5:2113-2126. [PMID: 26328243 PMCID: PMC4548324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2015] [Accepted: 06/11/2015] [Indexed: 06/04/2023] Open
Abstract
Among breast cancers, 10 to 15% of cases would be due to hereditary risk. In these familial cases, mutations in BRCA1 and BRCA2 are found in only 15% to 20%, meaning that new susceptibility genes remain to be found. Triple-negative breast cancers represent 15% of all breast cancers, and are generally aggressive tumours without targeted therapies available. Our hypothesis is that some patients with triple negative breast cancer could share a genetic susceptibility different from other types of breast cancers. We screened 36 candidate genes, using pyrosequencing, in all the 50 triple negative breast cancer patients with familial history of cancer but no BRCA1 or BRCA2 mutation of a population of 3000 families who had consulted for a familial breast cancer between 2005 and 2013. Any mutations were also sequenced in available relatives of cases. Protein expression and loss of heterozygosity were explored in tumours. Seven deleterious mutations in 6 different genes (RAD51D, MRE11A, CHEK2, MLH1, MSH6, PALB2) were observed in one patient each, except the RAD51D mutation found in two cases. Loss of heterozygosity in the tumour was found for 2 of the 7 mutations. Protein expression was absent in tumour tissue for 5 mutations. Taking into consideration a specific subtype of tumour has revealed susceptibility genes, most of them in the homologous recombination DNA repair pathway. This may provide new possibilities for targeted therapies, along with better screening and care of patients.
Affiliation(s)
- Marie Ollier
- Department of Molecular Oncology, Centre Jean Perrin, Clermont-Ferrand 63000, France
- Université d’Auvergne, EA 4677, ERTICa, BP 10448, Clermont-Ferrand 63000, France
| | - Nina Radosevic-Robin
- Department of Anatomopathology, Centre Jean Perrin, Clermont-Ferrand 63000, France
- Université d’Auvergne, EA 4677, ERTICa, BP 10448, Clermont-Ferrand 63000, France
| | - Fabrice Kwiatkowski
- Department of Molecular Oncology, Centre Jean Perrin, Clermont-Ferrand 63000, France
| | - Flora Ponelle
- Department of Molecular Oncology, Centre Jean Perrin, Clermont-Ferrand 63000, France
| | - Sandrine Viala
- Department of Molecular Oncology, Centre Jean Perrin, Clermont-Ferrand 63000, France
| | - Maud Privat
- Department of Molecular Oncology, Centre Jean Perrin, Clermont-Ferrand 63000, France
| | - Nancy Uhrhammer
- Department of Molecular Oncology, Centre Jean Perrin, Clermont-Ferrand 63000, France
| | | | - Frédérique Penault-Llorca
- Department of Anatomopathology, Centre Jean Perrin, Clermont-Ferrand 63000, France
- Université d’Auvergne, EA 4677, ERTICa, BP 10448, Clermont-Ferrand 63000, France
| | - Yves-Jean Bignon
- Department of Molecular Oncology, Centre Jean Perrin, Clermont-Ferrand 63000, France
- Université d’Auvergne, EA 4677, ERTICa, BP 10448, Clermont-Ferrand 63000, France
| | - Yannick Bidet
- Department of Molecular Oncology, Centre Jean Perrin, Clermont-Ferrand 63000, France
- Université d’Auvergne, EA 4677, ERTICa, BP 10448, Clermont-Ferrand 63000, France
| |
|
19
|
Gaspar JM, Thomas WK. FlowClus: efficiently filtering and denoising pyrosequenced amplicons. BMC Bioinformatics 2015; 16:105. [PMID: 25885646 PMCID: PMC4380255 DOI: 10.1186/s12859-015-0532-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 06/10/2014] [Accepted: 03/10/2015] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Reducing the effects of sequencing errors and PCR artifacts has emerged as an essential component in amplicon-based metagenomic studies. Denoising algorithms have been designed that can reduce error rates in mock community data, but they change the sequence data in a manner that can be inconsistent with the process of removing errors in studies of real communities. In addition, they are limited by the size of the dataset and the sequencing technology used. RESULTS FlowClus uses a systematic approach to filter and denoise reads efficiently. When denoising real datasets, FlowClus provides feedback about the process that can be used as the basis to adjust the parameters of the algorithm to suit the particular dataset. When used to analyze a mock community dataset, FlowClus produced a lower error rate compared to other denoising algorithms, while retaining significantly more sequence information. Among its other attributes, FlowClus can analyze longer reads being generated from all stages of 454 sequencing technology, as well as from Ion Torrent. It has processed a large dataset of 2.2 million GS-FLX Titanium reads in twelve hours; using its more efficient (but less precise) trie analysis option, this time was further reduced, to seven minutes. CONCLUSIONS Many of the amplicon-based metagenomics datasets generated over the last several years have been processed through a denoising pipeline that likely caused deleterious effects on the raw data. By using FlowClus, one can avoid such negative outcomes while maintaining control over the filtering and denoising processes. Because of its efficiency, FlowClus can be used to re-analyze multiple large datasets together, thereby leading to more standardized conclusions. FlowClus is freely available on GitHub (jsh58/FlowClus); it is written in C and supported on Linux.
Affiliation(s)
- John M Gaspar
- Department of Molecular Cellular & Biomedical Sciences, University of New Hampshire, Durham, NH, USA.
| | - W Kelley Thomas
- Department of Molecular Cellular & Biomedical Sciences, University of New Hampshire, Durham, NH, USA.
| |
|