Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Dohm JC, Lottaz C, Borodina T, Himmelbauer H. SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res 2007;17:1697-706. [PMID: 17908823 PMCID: PMC2045152 DOI: 10.1101/gr.6435207] [Citation(s) in RCA: 207] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

For:	Dohm JC, Lottaz C, Borodina T, Himmelbauer H. SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res 2007;17:1697-706. [PMID: 17908823 PMCID: PMC2045152 DOI: 10.1101/gr.6435207] [Citation(s) in RCA: 207] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Number

Cited by Other Article(s)

101

Zerbino DR, McEwen GK, Margulies EH, Birney E. Pebble and rock band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler. PLoS One 2009;4:e8407. [PMID: 20027311 PMCID: PMC2793427 DOI: 10.1371/journal.pone.0008407] [Citation(s) in RCA: 150] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2009] [Accepted: 10/21/2009] [Indexed: 11/22/2022] Open

102

Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, Li S, Yang H, Wang J, Wang J. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res 2009;20:265-72. [PMID: 20019144 DOI: 10.1101/gr.097261.109] [Citation(s) in RCA: 2099] [Impact Index Per Article: 139.9] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

103

Zhao SQ, Wang J, Zhang L, Li JT, Gu X, Gao G, Wei L. BOAT: Basic Oligonucleotide Alignment Tool. BMC Genomics 2009;10 Suppl 3:S2. [PMID: 19958483 PMCID: PMC2788372 DOI: 10.1186/1471-2164-10-s3-s2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

104

Imelfort M, Edwards D. De novo sequencing of plant genomes using second-generation technologies. Brief Bioinform 2009;10:609-18. [DOI: 10.1093/bib/bbp039] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open

105

Medvedev P, Brudno M. Maximum likelihood genome assembly. J Comput Biol 2009;16:1101-16. [PMID: 19645596 DOI: 10.1089/cmb.2009.0047] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

106

Marguerat S, Bähler J. RNA-seq: from technology to biology. CELLULAR AND MOLECULAR LIFE SCIENCES : CMLS 2009. [PMID: 19859660 DOI: 10.1007/s00018‐009‐0180‐6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]

107

Horner DS, Pavesi G, Castrignano T, De Meo PD, Liuni S, Sammeth M, Picardi E, Pesole G. Bioinformatics approaches for genomics and post genomics applications of next-generation sequencing. Brief Bioinform 2009;11:181-97. [DOI: 10.1093/bib/bbp046] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open

108

Kerstens HHD, Crooijmans RPMA, Veenendaal A, Dibbits BW, Chin-A-Woeng TFC, den Dunnen JT, Groenen MAM. Large scale single nucleotide polymorphism discovery in unsequenced genomes using second generation high throughput sequencing technology: applied to turkey. BMC Genomics 2009;10:479. [PMID: 19835600 PMCID: PMC2772860 DOI: 10.1186/1471-2164-10-479] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2009] [Accepted: 10/16/2009] [Indexed: 01/18/2023] Open

Abstract

BACKGROUND

The development of second generation sequencing methods has enabled large scale DNA variation studies at moderate cost. For the high throughput discovery of single nucleotide polymorphisms (SNPs) in species lacking a sequenced reference genome, we set-up an analysis pipeline based on a short read de novo sequence assembler and a program designed to identify variation within short reads. To illustrate the potential of this technique, we present the results obtained with a randomly sheared, enzymatically generated, 2-3 kbp genome fraction of six pooled Meleagris gallopavo (turkey) individuals.

RESULTS

A total of 100 million 36 bp reads were generated, representing approximately 5-6% (approximately 62 Mbp) of the turkey genome, with an estimated sequence depth of 58. Reads consisting of bases called with less than 1% error probability were selected and assembled into contigs. Subsequently, high throughput discovery of nucleotide variation was performed using sequences with more than 90% reliability by using the assembled contigs that were 50 bp or longer as the reference sequence. We identified more than 7,500 SNPs with a high probability of representing true nucleotide variation in turkeys. Increasing the reference genome by adding publicly available turkey BAC-end sequences increased the number of SNPs to over 11,000. A comparison with the sequenced chicken genome indicated that the assembled turkey contigs were distributed uniformly across the turkey genome. Genotyping of a representative sample of 340 SNPs resulted in a SNP conversion rate of 95%. The correlation of the minor allele count (MAC) and observed minor allele frequency (MAF) for the validated SNPs was 0.69.

CONCLUSION

We provide an efficient and cost-effective approach for the identification of thousands of high quality SNPs in species currently lacking a sequenced genome and applied this to turkey. The methodology addresses a random fraction of the genome, resulting in an even distribution of SNPs across the targeted genome.

Collapse

109

Sense from sequence reads: methods for alignment and assembly. Nat Methods 2009;6:S6-S12. [DOI: 10.1038/nmeth.1376] [Citation(s) in RCA: 266] [Impact Index Per Article: 17.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]

110

Zhou X, Su Z, Sammons RD, Peng Y, Tranel PJ, Stewart CN, Yuan JS. Novel software package for cross-platform transcriptome analysis (CPTRA). BMC Bioinformatics 2009;10 Suppl 11:S16. [PMID: 19811681 PMCID: PMC3226187 DOI: 10.1186/1471-2105-10-s11-s16] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open

Abstract

Background

Next-generation sequencing techniques enable several novel transcriptome profiling approaches. Recent studies indicated that digital gene expression profiling based on short sequence tags has superior performance as compared to other transcriptome analysis platforms including microarrays. However, the transcriptomic analysis with tag-based methods often depends on available genome sequence. The use of tag-based methods in species without genome sequence should be complemented by other methods such as cDNA library sequencing. The combination of different next generation sequencing techniques like 454 pyrosequencing and Illumina Genome Analyzer (Solexa) will enable high-throughput and accurate global gene expression profiling in species with limited genome information. The combination of transcriptome data acquisition methods requires cross-platform transcriptome data analysis platforms, including a new software package for data processing.

Results

Here we presented a software package, CPTRA: Cross-Platform TRanscriptome Analysis, to analyze transcriptome profiling data from separate methods. The software package is available at http://people.tamu.edu/~syuan/cptra/cptra.html. It was applied to the case study of non-target site glyphosate resistance in horseweed; and the data was mined to discover resistance target gene(s). For the software, the input data included a long-read sequence dataset with proper annotation, and a short-read sequence tag dataset for the quantification of transcripts. By combining the two datasets, the software carries out the unique sequence tag identification, tag counting for transcript quantification, and cross-platform sequence matching functions, whereby the short sequence tags can be annotated with a function, level of expression, and Gene Ontology (GO) classification. Multiple sequence search algorithms were implemented and compared. The analysis highlighted the importance of transport genes in glyphosate resistance and identified several candidate genes for down-stream analysis.

Conclusion

CPTRA is a powerful software package for next generation sequencing-based transcriptome profiling in species with limited genome information. According to our case study, the strategy can greatly broaden the application of the next generation sequencing for transcriptome analysis in species without reference genome sequence.

Collapse

111

Maccallum I, Przybylski D, Gnerre S, Burton J, Shlyakhter I, Gnirke A, Malek J, McKernan K, Ranade S, Shea TP, Williams L, Young S, Nusbaum C, Jaffe DB. ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads. Genome Biol 2009;10:R103. [PMID: 19796385 PMCID: PMC2784318 DOI: 10.1186/gb-2009-10-10-r103] [Citation(s) in RCA: 141] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2009] [Revised: 08/20/2009] [Accepted: 10/01/2009] [Indexed: 11/10/2022] Open

112

Nagarajan N, Pop M. Parametric complexity of sequence assembly: theory and applications to next generation sequencing. J Comput Biol 2009;16:897-908. [PMID: 19580519 DOI: 10.1089/cmb.2009.0005] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

113

Soderlund C, Johnson E, Bomhoff M, Descour A. PAVE: program for assembling and viewing ESTs. BMC Genomics 2009;10:400. [PMID: 19709403 PMCID: PMC2748094 DOI: 10.1186/1471-2164-10-400] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2009] [Accepted: 08/26/2009] [Indexed: 11/10/2022] Open

Abstract

Background

New sequencing technologies are rapidly emerging. Many laboratories are simultaneously working with the traditional Sanger ESTs and experimenting with ESTs generated by the 454 Life Science sequencers. Though Sanger ESTs have been used to generate contigs for many years, no program takes full advantage of the 5' and 3' mate-pair information, hence, many tentative transcripts are assembled into two separate contigs. The new 454 technology has the benefit of high-throughput expression profiling, but introduces time and space problems for assembling large contigs.

Results

The PAVE (Program for Assembling and Viewing ESTs) assembler takes advantage of the 5' and 3' mate-pair information by requiring that the mate-pairs be assembled into the same contig and joined by n's if the two sub-contigs do not overlap. It handles the depth of 454 data sets by "burying" similar ESTs during assembly, which retains the expression level information while circumventing time and space problems. PAVE uses MegaBLAST for the clustering step and CAP3 for assembly, however it assembles incrementally to enforce the mate-pair constraint, bury ESTs, and reduce incorrect joins and splits. The PAVE data management system uses a MySQL database to store multiple libraries of ESTs along with their metadata; the management system allows multiple assemblies with variations on libraries and parameters. Analysis routines provide standard annotation for the contigs including a measure of differentially expressed genes across the libraries. A Java viewer program is provided for display and analysis of the results. Our results clearly show the benefit of using the PAVE assembler to explicitly use mate-pair information and bury ESTs for large contigs.

Conclusion

The PAVE assembler provides a software package for assembling Sanger and/or 454 ESTs. The assembly software, data management software, Java viewer and user's guide are freely available.

Collapse

114

Studholme DJ, Ibanez SG, MacLean D, Dangl JL, Chang JH, Rathjen JP. A draft genome sequence and functional screen reveals the repertoire of type III secreted proteins of Pseudomonas syringae pathovar tabaci 11528. BMC Genomics 2009;10:395. [PMID: 19703286 PMCID: PMC2745422 DOI: 10.1186/1471-2164-10-395] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2009] [Accepted: 08/24/2009] [Indexed: 11/28/2022] Open

Abstract

Background

Pseudomonas syringae is a widespread bacterial pathogen that causes disease on a broad range of economically important plant species. Pathogenicity of P. syringae strains is dependent on the type III secretion system, which secretes a suite of up to about thirty virulence 'effector' proteins into the host cytoplasm where they subvert the eukaryotic cell physiology and disrupt host defences. P. syringae pathovar tabaci naturally causes disease on wild tobacco, the model member of the Solanaceae, a family that includes many crop species as well as on soybean.

Results

We used the 'next-generation' Illumina sequencing platform and the Velvet short-read assembly program to generate a 145X deep 6,077,921 nucleotide draft genome sequence for P. syringae pathovar tabaci strain 11528. From our draft assembly, we predicted 5,300 potential genes encoding proteins of at least 100 amino acids long, of which 303 (5.72%) had no significant sequence similarity to those encoded by the three previously fully sequenced P. syringae genomes. Of the core set of Hrp Outer Proteins that are conserved in three previously fully sequenced P. syringae strains, most were also conserved in strain 11528, including AvrE1, HopAH2, HopAJ2, HopAK1, HopAN1, HopI, HopJ1, HopX1, HrpK1 and HrpW1. However, the hrpZ1 gene is partially deleted and hopAF1 is completely absent in 11528. The draft genome of strain 11528 also encodes close homologues of HopO1, HopT1, HopAH1, HopR1, HopV1, HopAG1, HopAS1, HopAE1, HopAR1, HopF1, and HopW1 and a degenerate HopM1'. Using a functional screen, we confirmed that hopO1, hopT1, hopAH1, hopM1', hopAE1, hopAR1, and hopAI1' are part of the virulence-associated HrpL regulon, though the hopAI1' and hopM1' sequences were degenerate with premature stop codons. We also discovered two additional HrpL-regulated effector candidates and an HrpL-regulated distant homologue of avrPto1.

Conclusion

The draft genome sequence facilitates the continued development of P. syringae pathovar tabaci on wild tobacco as an attractive model system for studying bacterial disease on plants. The catalogue of effectors sheds further light on the evolution of pathogenicity and host-specificity as well as providing a set of molecular tools for the study of plant defence mechanisms. We also discovered several large genomic regions in Pta 11528 that do not share detectable nucleotide sequence similarity with previously sequenced Pseudomonas genomes. These regions may include horizontally acquired islands that possibly contribute to pathogenicity or epiphytic fitness of Pta 11528.

Collapse

115

Varshney RK, Nayak SN, May GD, Jackson SA. Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends Biotechnol 2009;27:522-30. [PMID: 19679362 DOI: 10.1016/j.tibtech.2009.05.006] [Citation(s) in RCA: 396] [Impact Index Per Article: 26.4] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2009] [Revised: 05/21/2009] [Accepted: 05/27/2009] [Indexed: 10/20/2022]

116

Wall PK, Leebens-Mack J, Chanderbali AS, Barakat A, Wolcott E, Liang H, Landherr L, Tomsho LP, Hu Y, Carlson JE, Ma H, Schuster SC, Soltis DE, Soltis PS, Altman N, dePamphilis CW. Comparison of next generation sequencing technologies for transcriptome characterization. BMC Genomics 2009;10:347. [PMID: 19646272 PMCID: PMC2907694 DOI: 10.1186/1471-2164-10-347] [Citation(s) in RCA: 157] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2008] [Accepted: 08/01/2009] [Indexed: 11/10/2022] Open

Abstract

Background

We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis.

Results

The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics.

Conclusion

NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms.

Collapse

117

Du J, Bjornson RD, Zhang ZD, Kong Y, Snyder M, Gerstein MB. Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants. PLoS Comput Biol 2009;5:e1000432. [PMID: 19593373 PMCID: PMC2700963 DOI: 10.1371/journal.pcbi.1000432] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2008] [Accepted: 06/04/2009] [Indexed: 12/02/2022] Open

Abstract

The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbelGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mapability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.

In recent years, the development of high throughput sequencing and array technologies has enabled the accurate re-sequencing of individual genomes, especially in identifying and reconstructing the variants in an individual's genome compared to a “reference”. The costs and sensitivities of these technologies differ considerably from each other, and even more technologies are expected to appear in the near future. To both reduce the total cost of re-sequencing to an affordable point and be adaptive to these constantly evolving bio-technologies, we propose to build a computationally efficient simulation framework that can help us optimize the combination of different technologies to perform low cost comparative genome re-sequencing, especially in reconstructing large structural variants, which is considered in many respects the most challenging step in genome re-sequencing. Our simulation results quantitatively show how much improvement one can gain in reconstructing large structural variants by integrating different technologies in optimal ways. We envision that in the future, more experimental technologies will be incorporated into this simulation framework and its results can provide informative guidelines for the actual experimental design to achieve optimal genome re-sequencing output at low costs.

Collapse

118

Schröder J, Schröder H, Puglisi SJ, Sinha R, Schmidt B. SHREC: a short-read error correction method. Bioinformatics 2009;25:2157-63. [PMID: 19542152 DOI: 10.1093/bioinformatics/btp379] [Citation(s) in RCA: 115] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open

119

Schmidt B, Sinha R, Beresford-Smith B, Puglisi SJ. A fast hybrid short read fragment assembly algorithm. Bioinformatics 2009;25:2279-80. [PMID: 19535537 DOI: 10.1093/bioinformatics/btp374] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

120

Pop M. Genome assembly reborn: recent computational challenges. Brief Bioinform 2009;10:354-66. [PMID: 19482960 DOI: 10.1093/bib/bbp026] [Citation(s) in RCA: 181] [Impact Index Per Article: 12.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open

121

Application of 'next-generation' sequencing technologies to microbial genetics. Nat Rev Microbiol 2009;7:287-96. [PMID: 19287448 DOI: 10.1038/nrmicro2122] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

122

Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I. ABySS: a parallel assembler for short read sequence data. Genome Res 2009;19:1117-23. [PMID: 19251739 DOI: 10.1101/gr.089532.108] [Citation(s) in RCA: 2412] [Impact Index Per Article: 160.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

123

Voelkerding KV, Dames SA, Durtschi JD. Next-generation sequencing: from basic research to diagnostics. Clin Chem 2009;55:641-58. [PMID: 19246620 DOI: 10.1373/clinchem.2008.112789] [Citation(s) in RCA: 433] [Impact Index Per Article: 28.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

124

QSRA: a quality-value guided de novo short read assembler. BMC Bioinformatics 2009;10:69. [PMID: 19239711 PMCID: PMC2653489 DOI: 10.1186/1471-2105-10-69] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2008] [Accepted: 02/24/2009] [Indexed: 12/16/2022] Open

125

MacLean D, Jones JDG, Studholme DJ. Application of 'next-generation' sequencing technologies to microbial genetics. Nat Rev Microbiol 2009. [DOI: 10.1038/nrmicro2088] [Citation(s) in RCA: 243] [Impact Index Per Article: 16.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]

126

Wang W, Zhang P, Liu X. Short read DNA fragment anchoring algorithm. BMC Bioinformatics 2009;10 Suppl 1:S17. [PMID: 19208116 PMCID: PMC2648759 DOI: 10.1186/1471-2105-10-s1-s17] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

127

Jackson BG, Schnable PS, Aluru S. Parallel short sequence assembly of transcriptomes. BMC Bioinformatics 2009;10 Suppl 1:S14. [PMID: 19208113 PMCID: PMC2648799 DOI: 10.1186/1471-2105-10-s1-s14] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

128

Hossain MS, Azimi N, Skiena S. Crystallizing short-read assemblies around seeds. BMC Bioinformatics 2009;10 Suppl 1:S16. [PMID: 19208115 PMCID: PMC2648751 DOI: 10.1186/1471-2105-10-s1-s16] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open

129

Chin FYL, Leung HCM, Li WL, Yiu SM. Finding optimal threshold for correction error reads in DNA assembling. BMC Bioinformatics 2009;10 Suppl 1:S15. [PMID: 19208114 PMCID: PMC2648749 DOI: 10.1186/1471-2105-10-s1-s15] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

130

Warren RL, Nelson BH, Holt RA. Profiling model T-cell metagenomes with short reads. ACTA ACUST UNITED AC 2009;25:458-64. [PMID: 19136549 DOI: 10.1093/bioinformatics/btp010] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

131

Imelfort M. Sequence Comparison Tools. Bioinformatics 2009. [DOI: 10.1007/978-0-387-92738-1_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open

132

Fox S, Filichkin S, Mockler TC. Applications of ultra-high-throughput sequencing. Methods Mol Biol 2009;553:79-108. [PMID: 19588102 DOI: 10.1007/978-1-60327-563-7_5] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/18/2023]

133

Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol 2008;26:1135-45. [PMID: 18846087 DOI: 10.1038/nbt1486] [Citation(s) in RCA: 2395] [Impact Index Per Article: 149.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]

134

Denoeud F, Aury JM, Da Silva C, Noel B, Rogier O, Delledonne M, Morgante M, Valle G, Wincker P, Scarpelli C, Jaillon O, Artiguenave F. Annotating genomes with massive-scale RNA sequencing. Genome Biol 2008;9:R175. [PMID: 19087247 PMCID: PMC2646279 DOI: 10.1186/gb-2008-9-12-r175] [Citation(s) in RCA: 171] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2008] [Revised: 10/30/2008] [Accepted: 12/16/2008] [Indexed: 01/13/2023] Open

135

Farrer RA, Kemen E, Jones JDG, Studholme DJ. De novo assembly of the Pseudomonas syringae pv. syringae B728a genome using Illumina/Solexa short sequence reads. FEMS Microbiol Lett 2008;291:103-11. [PMID: 19077061 DOI: 10.1111/j.1574-6968.2008.01441.x] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open

136

Scheibye-Alsing K, Hoffmann S, Frankel A, Jensen P, Stadler PF, Mang Y, Tommerup N, Gilchrist MJ, Nygård AB, Cirera S, Jørgensen CB, Fredholm M, Gorodkin J. Sequence assembly. Comput Biol Chem 2008;33:121-36. [PMID: 19152793 DOI: 10.1016/j.compbiolchem.2008.11.003] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2008] [Revised: 11/28/2008] [Accepted: 11/28/2008] [Indexed: 01/20/2023]

137

Chaisson MJ, Brinza D, Pevzner PA. De novo fragment assembly with short mate-paired reads: Does the read length matter? Genome Res 2008;19:336-46. [PMID: 19056694 DOI: 10.1101/gr.079053.108] [Citation(s) in RCA: 208] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

138

Reinhardt JA, Baltrus DA, Nishimura MT, Jeck WR, Jones CD, Dangl JL. De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae. Genome Res 2008;19:294-305. [PMID: 19015323 DOI: 10.1101/gr.083311.108] [Citation(s) in RCA: 121] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

139

Noguchi H, Taniguchi T, Itoh T. MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Res 2008;15:387-96. [PMID: 18940874 PMCID: PMC2608843 DOI: 10.1093/dnares/dsn027] [Citation(s) in RCA: 459] [Impact Index Per Article: 28.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

140

Holt RA, Jones SJM. The new paradigm of flow cell sequencing. Genome Res 2008;18:839-46. [PMID: 18519653 DOI: 10.1101/gr.073262.107] [Citation(s) in RCA: 171] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

141

Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res 2008;36:e105. [PMID: 18660515 PMCID: PMC2532726 DOI: 10.1093/nar/gkn425] [Citation(s) in RCA: 748] [Impact Index Per Article: 46.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023] Open

142

Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008;18:821-9. [PMID: 18349386 DOI: 10.1101/gr.074492.107] [Citation(s) in RCA: 7064] [Impact Index Per Article: 441.5] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]

143

Butler J, MacCallum I, Kleber M, Shlyakhter IA, Belmonte MK, Lander ES, Nusbaum C, Jaffe DB. ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res 2008;18:810-20. [PMID: 18340039 DOI: 10.1101/gr.7337908] [Citation(s) in RCA: 533] [Impact Index Per Article: 33.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

144

Hernandez D, François P, Farinelli L, Osterås M, Schrenzel J. De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Genome Res 2008;18:802-9. [PMID: 18332092 DOI: 10.1101/gr.072033.107] [Citation(s) in RCA: 483] [Impact Index Per Article: 30.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

145

Pop M, Salzberg SL. Bioinformatics challenges of new sequencing technology. Trends Genet 2008;24:142-9. [DOI: 10.1016/j.tig.2007.12.006] [Citation(s) in RCA: 234] [Impact Index Per Article: 14.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2007] [Revised: 12/18/2007] [Accepted: 12/19/2007] [Indexed: 12/24/2022]