Reference Citation Analysis

For: Hajirasouliha I, Hormozdiari F, Sahinalp SC, Birol I. Optimal pooling for genome re-sequencing with ultra-high-throughput short-read technologies. Bioinformatics 2008;24:i32-40. [PMID: 18586730 PMCID: PMC2718651 DOI: 10.1093/bioinformatics/btn173] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 12/03/2022]

Cited by Other Article(s)

Kerepesi C, Szalkai B, Grolmusz V. Visual analysis of the quantitative composition of metagenomic communities: the AmphoraVizu webserver. Microb Ecol 2015;69:695-697. [PMID: 25296554 DOI: 10.1007/s00248-014-0502-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 07/20/2014] [Accepted: 09/24/2014] [Indexed: 06/04/2023]

Abstract

Low-cost DNA sequencing methods have given rise to an enormous development of metagenomics in the past few years. One basic, and difficult, task is the phylogenetic annotation of the metagenomic samples studied. The difficulty comes from the fact that a typical environmental sample contains hundreds of unknown and still uncharacterized microorganisms. There are several possible methods to assign at least partial phylogenetic information to these uncharacterized data. Originally, 16S ribosomal RNA was used as a phylogenetic marker; then genome sequence alignments and similarity measures between the unknown genome and the reference genomes were applied (e.g., in the MEGAN software); and more recently, phylogeny-based methods applying suitable sets of marker genes were suggested (AMPHORA, AMPHORA2, and the webserver implementation AmphoraNet). Here, we present a visual analysis tool that displays the quantitative relations derived from the output of the AMPHORA2 program or the easy-to-use AmphoraNet webserver. Our web-based tool, the AmphoraVizu webserver, makes the phylogenetic distribution of the metagenomic sample clearly visible by using the native output format of AMPHORA2 or AmphoraNet. The user may set the phylogenetic resolution (i.e., superkingdom, phylum, class, order, family, genus, or species) along with the chart type and will receive the distribution data detailed for all relevant marker genes in the sample. For publication-quality results, the chart labels can be customized by the user. The visualization webserver is available at http://amphoravizu.pitgroup.org. The AmphoraNet webserver is available at http://amphoranet.pitgroup.org. The open-source version of the AmphoraVizu program is available for download at http://pitgroup.org/apps/amphoravizu/AmphoraVizu.pl.
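
The aggregation step described above (choose a phylogenetic resolution, then receive the distribution for the whole sample and for each marker gene) can be sketched in a few lines of Python. This is an illustration only, not the AmphoraVizu.pl code: it assumes a simplified, hypothetical input (a list of `(marker_gene, lineage)` pairs, where `lineage` maps rank names to taxa) and does not parse the native AMPHORA2/AmphoraNet output format. The function names are hypothetical.

```python
from collections import Counter, defaultdict

# Ranks the user may choose from, as listed in the abstract.
RANKS = ("superkingdom", "phylum", "class", "order",
         "family", "genus", "species")


def distribution_at_rank(records, rank):
    """Fraction of marker-gene hits assigned to each taxon at `rank`.

    `records` is a hypothetical simplified input: a list of
    (marker_gene, lineage) pairs, where `lineage` maps rank -> taxon.
    Hits with no assignment at that rank are counted as "unclassified".
    """
    if rank not in RANKS:
        raise ValueError(f"unknown rank: {rank}")
    counts = Counter(lineage.get(rank, "unclassified") for _, lineage in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders taxa from most to least abundant, handy for charts.
    return {taxon: n / total for taxon, n in counts.most_common()}


def distribution_per_marker(records, rank):
    """Same distribution, reported separately for each marker gene."""
    grouped = defaultdict(list)
    for marker, lineage in records:
        grouped[marker].append((marker, lineage))
    return {marker: distribution_at_rank(hits, rank)
            for marker, hits in grouped.items()}


if __name__ == "__main__":
    # Toy data; rplB and rpoB are used only as example marker-gene names.
    records = [
        ("rplB", {"superkingdom": "Bacteria", "phylum": "Firmicutes"}),
        ("rplB", {"superkingdom": "Bacteria", "phylum": "Proteobacteria"}),
        ("rpoB", {"superkingdom": "Bacteria", "phylum": "Firmicutes"}),
        ("rpoB", {"superkingdom": "Archaea"}),
    ]
    print(distribution_at_rank(records, "phylum"))
    print(distribution_per_marker(records, "superkingdom"))
```

Switching the `rank` argument corresponds to changing the phylogenetic resolution; the resulting fractions are what a pie or bar chart would plot.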

Kerepesi C, Bánky D, Grolmusz V. AmphoraNet: the webserver implementation of the AMPHORA2 metagenomic workflow suite. Gene 2013;533:538-40. [PMID: 24144838 DOI: 10.1016/j.gene.2013.10.015] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.5] [Received: 05/24/2013] [Revised: 10/07/2013] [Accepted: 10/08/2013] [Indexed: 02/07/2023]

Kuroshu RM. Nonoverlapping clone pooling for high-throughput sequencing. IEEE/ACM Trans Comput Biol Bioinform 2013;10:1091-1097. [PMID: 24384700 DOI: 10.1109/tcbb.2013.83] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]

Lonardi S, Duma D, Alpert M, Cordero F, Beccuti M, Bhat PR, Wu Y, Ciardo G, Alsaihati B, Ma Y, Wanamaker S, Resnik J, Bozdag S, Luo MC, Close TJ. Combinatorial pooling enables selective sequencing of the barley gene space. PLoS Comput Biol 2013;9:e1003010. [PMID: 23592960 PMCID: PMC3617026 DOI: 10.1371/journal.pcbi.1003010] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 10/29/2012] [Accepted: 02/05/2013] [Indexed: 11/23/2022]

Abstract

For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and are now in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, in this case gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate and the resulting BAC assemblies are of high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies are of good quality. While our method cannot provide the level of completeness that a comprehensive whole-genome sequencing project would achieve, we show that it is quite successful in reconstructing the gene sequences within BACs. For plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.

The problem of obtaining the full genomic sequence of an organism has been solved either via a global brute-force approach (called whole-genome shotgun) or by a divide-and-conquer strategy (called clone-by-clone). Both approaches have advantages and disadvantages in terms of cost, manual labor, and the ability to deal with sequencing errors and highly repetitive regions of the genome. With the advent of second-generation sequencing instruments, the whole-genome shotgun approach has been the preferred choice. The clone-by-clone strategy is, however, still very relevant for large complex genomes. In fact, several research groups and international consortia have produced clone libraries and physical maps for many economically or ecologically important organisms and now are in a position to proceed with sequencing. In this manuscript, we demonstrate the feasibility of this approach on the gene-space of a large, very repetitive plant genome. The novelty of our approach is that, in order to take advantage of the throughput of the current generation of sequencing instruments, we pool hundreds of clones using a special type of “smart” pooling design that allows one to establish with high accuracy the source clone from the sequenced reads in a pool. Extensive simulations and experimental results support our claims.
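
The deconvolution idea can be illustrated with a toy Python sketch. Each clone is given a distinct set of pools (its signature); a read, or a k-mer taken from it, that is observed in a particular set of pools is assigned to the clone whose signature matches best (Jaccard similarity here) and is left unassigned on ties. The pool design below simply enumerates pool subsets and is only a stand-in for the specially constructed "smart" design used in the paper; `naive_design` and `deconvolve` are hypothetical names, not the authors' tools.

```python
from itertools import combinations, islice


def naive_design(n_clones, n_pools, pools_per_clone):
    """Give each clone a distinct set of `pools_per_clone` pools.

    Stand-in design: the first n_clones subsets in lexicographic order.
    Real combinatorial pooling designs are chosen more carefully so that
    reads shared by several clones can still be decoded.
    """
    if pools_per_clone < 1:
        raise ValueError("each clone must go into at least one pool")
    subsets = islice(combinations(range(n_pools), pools_per_clone), n_clones)
    signatures = [frozenset(s) for s in subsets]
    if len(signatures) < n_clones:
        raise ValueError("not enough distinct pool subsets for this many clones")
    return signatures


def deconvolve(observed_pools, signatures):
    """Index of the clone whose pool signature best matches the pools
    in which a read was observed, or None if there is no unique best match."""
    if not signatures:
        return None
    observed = frozenset(observed_pools)
    # Jaccard similarity between the observed pools and each clone's signature.
    scores = [len(observed & sig) / len(observed | sig) for sig in signatures]
    best = max(scores)
    winners = [i for i, score in enumerate(scores) if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


if __name__ == "__main__":
    signatures = naive_design(n_clones=6, n_pools=4, pools_per_clone=2)
    print(deconvolve({1, 2}, signatures))     # exact match with one clone
    print(deconvolve({0, 1, 2}, signatures))  # ambiguous: several clones tie
```

The point of pooling is visible even in this toy: six clones are distinguished with only four libraries, because each clone is identified by a combination of pools rather than by its own barcode.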

Accurate Decoding of Pooled Sequenced Data Using Compressed Sensing. Lect Notes Comput Sci 2013. [DOI: 10.1007/978-3-642-40453-5_7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/07/2023]

Feder AF, Petrov DA, Bergland AO. LDx: estimation of linkage disequilibrium from high-throughput pooled resequencing data. PLoS One 2012;7:e48588. [PMID: 23152785 PMCID: PMC3494690 DOI: 10.1371/journal.pone.0048588] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Received: 07/06/2012] [Accepted: 10/03/2012] [Indexed: 12/14/2022]

Elsharawy A, Forster M, Schracke N, Keller A, Thomsen I, Petersen BS, Stade B, Stähler P, Schreiber S, Rosenstiel P, Franke A. Improving mapping and SNP-calling performance in multiplexed targeted next-generation sequencing. BMC Genomics 2012;13:417. [PMID: 22913592 PMCID: PMC3563481 DOI: 10.1186/1471-2164-13-417] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 11/10/2011] [Accepted: 08/10/2012] [Indexed: 11/10/2022]

Abstract

Background

Compared to classical genotyping, targeted next-generation sequencing (tNGS) can be custom-designed to interrogate entire genomic regions of interest, in order to detect novel as well as known variants. To bring down the per-sample cost, one approach is to pool barcoded NGS libraries before sample enrichment. Still, we lack a complete understanding of how this multiplexed tNGS approach and the varying performance of the ever-evolving analytical tools can affect the quality of variant discovery. Therefore, we evaluated the impact of different software tools and analytical approaches on the discovery of single nucleotide polymorphisms (SNPs) in multiplexed tNGS data. To generate our own test model, we combined a sequence capture method with NGS in three experimental stages of increasing complexity (E. coli genes, multiplexed E. coli, and multiplexed HapMap BRCA1/2 regions).

Results

We successfully enriched barcoded NGS libraries instead of genomic DNA, achieving reproducible coverage profiles (Pearson correlation coefficients of up to 0.99) across multiplexed samples, with <10% strand bias. However, the SNP-calling quality was substantially affected by the choice of tools and mapping strategy. With the aim of reducing computational requirements, we compared conventional whole-genome mapping and SNP calling with a new, faster approach: target-region mapping with subsequent ‘read-backmapping’ to the whole genome to reduce the false detection rate. Consequently, we developed a combined mapping pipeline, which includes standard tools (BWA, SAMtools, etc.), and tested it on public HiSeq2000 exome data from the 1000 Genomes Project. Our pipeline saved 12 hours of run time per HiSeq2000 exome sample and detected ~5% more SNPs than the conventional whole-genome approach. This suggests that more potential novel SNPs may be discovered using both approaches than with the conventional approach alone.

Conclusions

We recommend applying our general ‘two-step’ mapping approach for more efficient SNP discovery in tNGS. Our study has also shown the benefit of computing inter-sample SNP-concordances and inspecting read alignments in order to attain more confident results.
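
Two of the quality checks mentioned in this abstract, reproducibility of coverage profiles (Pearson correlation) and inter-sample SNP concordance, are simple to compute. The Python sketch below is illustrative only: the strand-bias measure |F - R| / (F + R) and the Jaccard-style concordance are assumed definitions, not necessarily those used by the authors, and all function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two per-position coverage profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("constant profile: correlation undefined")
    return cov / math.sqrt(vx * vy)


def strand_bias(forward, reverse):
    """Assumed definition: |F - R| / (F + R); 0 means perfectly balanced."""
    total = forward + reverse
    return abs(forward - reverse) / total if total else 0.0


def concordance(calls_a, calls_b):
    """Assumed definition: SNPs shared by both samples divided by all SNPs
    called in either. SNPs are hashable tuples such as (chrom, pos, ref, alt)."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(samples):
    """Concordance for every pair of samples in {sample_name: snp_calls}."""
    return {(s, t): concordance(samples[s], samples[t])
            for s, t in combinations(sorted(samples), 2)}
```

On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient; the explicit version is kept here to show the formula. Low concordance between samples expected to share variants is a cue to inspect the underlying read alignments.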

Zhu Y, Bergland AO, González J, Petrov DA. Empirical validation of pooled whole genome population re-sequencing in Drosophila melanogaster. PLoS One 2012;7:e41901. [PMID: 22848651 PMCID: PMC3406057 DOI: 10.1371/journal.pone.0041901] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Received: 04/18/2012] [Accepted: 06/28/2012] [Indexed: 11/26/2022]

Yu G. Gnom(Cmp): a quantitative approach for comparative analysis of closely related genomes of bacterial pathogens. Genome 2011;54:402-18. [PMID: 21539441 DOI: 10.1139/g11-005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]

Bansal V. A statistical method for the detection of variants from next-generation resequencing of DNA pools. Bioinformatics 2010;26:i318-24. [PMID: 20529923 PMCID: PMC2881398 DOI: 10.1093/bioinformatics/btq214] [Citation(s) in RCA: 128] [Impact Index Per Article: 9.1] [Indexed: 01/01/2023]

Abstract

Motivation: Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing.

Results: We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing], that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80–85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3–5%). Comparison with previous methods for pooled SNP detection demonstrates significantly lower false-positive and false-negative rates for CRISP.

Availability: Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/

Contact: vbansal@scripps.edu
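
The two ideas in the CRISP Results paragraph can be illustrated with a minimal Python sketch. It is not CRISP's implementation. For (ii) it assumes a single uniform per-base error rate, so the number of non-reference calls produced by errors alone is binomial. For (i) a plain Pearson chi-square statistic on a 2 × P table of reference/alternate counts stands in for whatever contingency-table test CRISP actually uses; a p-value would come from a chi-square distribution with P - 1 degrees of freedom. The strand and pool-size filters mentioned in the abstract are not modeled, and `error_only_tail` and `pool_chi2` are hypothetical names.

```python
from math import exp, lgamma, log, log1p


def error_only_tail(depth, nonref, error_rate):
    """P(X >= nonref) for X ~ Binomial(depth, error_rate).

    Assumption: one uniform per-base error rate, with every error counted
    as a non-reference call (a simplification of real error models).
    """
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must be in (0, 1)")
    if not 0 <= nonref <= depth:
        raise ValueError("need 0 <= nonref <= depth")

    def log_pmf(i):
        # Log-space binomial pmf avoids overflow of the binomial coefficient
        # at high sequencing depth.
        return (lgamma(depth + 1) - lgamma(i + 1) - lgamma(depth - i + 1)
                + i * log(error_rate) + (depth - i) * log1p(-error_rate))

    return sum(exp(log_pmf(i)) for i in range(nonref, depth + 1))


def pool_chi2(ref_counts, alt_counts):
    """Pearson chi-square statistic for a 2 x P table
    (rows: reference / alternate allele counts, columns: pools)."""
    if len(ref_counts) != len(alt_counts):
        raise ValueError("need one ref and one alt count per pool")
    row_ref, row_alt = sum(ref_counts), sum(alt_counts)
    n = row_ref + row_alt
    if n == 0:
        return 0.0
    stat = 0.0
    for ref, alt in zip(ref_counts, alt_counts):
        col = ref + alt
        for observed, row_total in ((ref, row_ref), (alt, row_alt)):
            expected = row_total * col / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Five non-reference calls at depth 100 vs depth 1000, 1% error rate.
    print(error_only_tail(100, 5, 0.01), error_only_tail(1000, 5, 0.01))
    # Alternate allele concentrated in one pool vs spread evenly.
    print(pool_chi2([90, 100], [10, 0]), pool_chi2([50, 50], [5, 5]))
```

With a 1% error rate, five non-reference calls are unlikely to be errors alone at depth 100 (tail probability below 0.01) but entirely expected at depth 1000. Likewise, sequencing errors tend to spread alternate calls evenly across pools, whereas a real variant carried by individuals in only some pools produces a skewed table and a large statistic.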

Knudsen B, Forsberg R, Miyamoto MM. A computer simulator for assessing different challenges and strategies of de novo sequence assembly. Genes (Basel) 2010;1:263-82. [PMID: 24710045 PMCID: PMC3954094 DOI: 10.3390/genes1020263] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 06/01/2010] [Revised: 08/18/2010] [Accepted: 08/31/2010] [Indexed: 11/16/2022]

Prabhu S, Pe'er I. Overlapping pools for high-throughput targeted resequencing. Genome Res 2009;19:1254-61. [PMID: 19447964 DOI: 10.1101/gr.088559.108] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Indexed: 11/24/2022]