Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Hossain MS, Azimi N, Skiena S. Crystallizing short-read assemblies around seeds. BMC Bioinformatics 2009;10 Suppl 1:S16. [PMID: 19208115 PMCID: PMC2648751 DOI: 10.1186/1471-2105-10-s1-s16] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open

For:	Hossain MS, Azimi N, Skiena S. Crystallizing short-read assemblies around seeds. BMC Bioinformatics 2009;10 Suppl 1:S16. [PMID: 19208115 PMCID: PMC2648751 DOI: 10.1186/1471-2105-10-s1-s16] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open

Number

Cited by Other Article(s)

Samantray D, Tanwar AS, Murali TS, Brand A, Satyamoorthy K, Paul B. A Comprehensive Bioinformatics Resource Guide for Genome-Based Antimicrobial Resistance Studies. OMICS : A JOURNAL OF INTEGRATIVE BIOLOGY 2023;27:445-460. [PMID: 37861712 DOI: 10.1089/omi.2023.0140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/21/2023]

Li F, Zhao X, Li M, He K, Huang C, Zhou Y, Li Z, Walters JR. Insect genomes: progress and challenges. INSECT MOLECULAR BIOLOGY 2019;28:739-758. [PMID: 31120160 DOI: 10.1111/imb.12599] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/02/2017] [Revised: 03/22/2019] [Accepted: 05/14/2019] [Indexed: 05/24/2023]

Si X, Wang Q, Zhang L, Wu R, Ma J. Survey of gene splicing algorithms based on reads. Bioengineered 2017;8:750-758. [PMID: 28873323 DOI: 10.1080/21655979.2017.1373538] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022] Open

Langenkämper D, Jakobi T, Feld D, Jelonek L, Goesmann A, Nattkemper TW. Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations. Front Genet 2016;7:5. [PMID: 26904094 PMCID: PMC4748744 DOI: 10.3389/fgene.2016.00005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2015] [Accepted: 01/17/2016] [Indexed: 12/27/2022] Open

Abstract

Within the recent years clock rates of modern processors stagnated while the demand for computing power continued to grow. This applied particularly for the fields of life sciences and bioinformatics, where new technologies keep on creating rapidly growing piles of raw data with increasing speed. The number of cores per processor increased in an attempt to compensate for slight increments of clock rates. This technological shift demands changes in software development, especially in the field of high performance computing where parallelization techniques are gaining in importance due to the pressing issue of large sized datasets generated by e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples for automatic acceleration of two use cases to show typical problems and gains of transforming a serial application to a parallel one. The paper should aid the reader in deciding for a certain techniques for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. But performance is only superior at a certain problem size due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures.

Collapse

Nimmy SF, Kamal MS. Next generation sequencing under de novo genome assembly. INT J BIOMATH 2015. [DOI: 10.1142/s1793524515300018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]

Allen JM, Huang DI, Cronk QC, Johnson KP. aTRAM - automated target restricted assembly method: a fast method for assembling loci across divergent taxa from next-generation sequencing data. BMC Bioinformatics 2015;16:98. [PMID: 25887972 PMCID: PMC4380108 DOI: 10.1186/s12859-015-0515-2] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2014] [Accepted: 02/24/2015] [Indexed: 11/10/2022] Open

Bonham-Carter O, Ali H, Bastola D. A base composition analysis of natural patterns for the preprocessing of metagenome sequences. BMC Bioinformatics 2014;14 Suppl 11:S5. [PMID: 24564274 PMCID: PMC3816298 DOI: 10.1186/1471-2105-14-s11-s5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022] Open

Abstract

Background

On the pretext that sequence reads and contigs often exhibit the same kinds of base usage that is also observed in the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs) which are permutations of bacterial restriction sites and the base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase the efficiency during the pre-processing stages of metagenome sequencing and assembly projects.

Results

Our method is able to differentiate organisms and their reads or contigs. The framework shows how to successfully determine the relatedness between these reads or contigs by comparison of base composition. In particular, we show that two types of organismal-sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By the application of one of the four possible spectrum sets, encompassing all known restriction sites, we provide the evidence to claim that each set has a different ability to differentiate sequence data. Furthermore, we show that the spectrum set selection having relevance to one organism, but not to the others of the data set, will greatly improve performance of sequence differentiation even if the fragment size of the read, contig or sequence is not lengthy.

Conclusions

We show the proof of concept of our method by its application to ten trials of two or three freshly selected sequence fragments (reads and contigs) for each experiment across the six organisms of our set. Here we describe a novel and computationally effective pre-processing step for metagenome sequencing and assembly tasks. Furthermore, our base composition method has applications in phylogeny where it can be used to infer evolutionary distances between organisms based on the notion that related organisms often have much conserved code.

Collapse

El-Metwally S, Ouda OM, Helmy M. Approaches and Challenges of Next-Generation Sequence Assembly Stages. NEXT GENERATION SEQUENCING TECHNOLOGIES AND CHALLENGES IN SEQUENCE ASSEMBLY 2014. [DOI: 10.1007/978-1-4939-0715-1_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]

El-Metwally S, Hamza T, Zakaria M, Helmy M. Next-generation sequence assembly: four stages of data processing and computational challenges. PLoS Comput Biol 2013;9:e1003345. [PMID: 24348224 PMCID: PMC3861042 DOI: 10.1371/journal.pcbi.1003345] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open

PRICE: software for the targeted assembly of components of (Meta) genomic sequence data. G3-GENES GENOMES GENETICS 2013;3:865-80. [PMID: 23550143 PMCID: PMC3656733 DOI: 10.1534/g3.113.005967] [Citation(s) in RCA: 188] [Impact Index Per Article: 17.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]

Lu M, Luo Q, Wang B, Wu J, Zhao J. GPU-Accelerated Bidirected De Bruijn Graph Construction for Genome Assembly. WEB TECHNOLOGIES AND APPLICATIONS 2013. [DOI: 10.1007/978-3-642-37401-2_8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]

Mutz KO, Heilkenbrinker A, Lönne M, Walter JG, Stahl F. Transcriptome analysis using next-generation sequencing. Curr Opin Biotechnol 2012;24:22-30. [PMID: 23020966 DOI: 10.1016/j.copbio.2012.09.004] [Citation(s) in RCA: 293] [Impact Index Per Article: 24.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2012] [Revised: 09/03/2012] [Accepted: 09/04/2012] [Indexed: 12/16/2022]

Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 2012. [PMID: 22506599 DOI: 10.1089/cmb.20120021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/30/2023] Open

Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 2012;19:455-77. [PMID: 22506599 DOI: 10.1089/cmb.2012.0021] [Citation(s) in RCA: 16211] [Impact Index Per Article: 1350.9] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023] Open

[Progress on whole genome sequencing in woody plants]. YI CHUAN = HEREDITAS 2012;34:145-56. [PMID: 22382056 DOI: 10.3724/sp.j.1005.2012.00145] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Plantagora: modeling whole genome sequencing and assembly of plant genomes. PLoS One 2011;6:e28436. [PMID: 22174807 PMCID: PMC3236183 DOI: 10.1371/journal.pone.0028436] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2011] [Accepted: 11/08/2011] [Indexed: 01/17/2023] Open

Abstract

Background

Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them.

Methodology/Principal Findings

For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website.

Conclusions/Significance

Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly further.

Collapse

Bao S, Jiang R, Kwan W, Wang B, Ma X, Song YQ. WITHDRAWN: Evaluation of next-generation sequencing software in mapping and assembly. J Hum Genet 2011:jhg201162. [PMID: 21677664 DOI: 10.1038/jhg.2011.62] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Cerdeira LT, Carneiro AR, Ramos RTJ, de Almeida SS, D'Afonseca V, Schneider MPC, Baumbach J, Tauch A, McCulloch JA, Azevedo VAC, Silva A. Rapid hybrid de novo assembly of a microbial genome using only short reads: Corynebacterium pseudotuberculosis I19 as a case study. J Microbiol Methods 2011;86:218-23. [PMID: 21620904 DOI: 10.1016/j.mimet.2011.05.008] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2010] [Revised: 05/05/2011] [Accepted: 05/12/2011] [Indexed: 12/28/2022]

Comparing de novo genome assembly: the long and short of it. PLoS One 2011;6:e19175. [PMID: 21559467 PMCID: PMC3084767 DOI: 10.1371/journal.pone.0019175] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2010] [Accepted: 03/29/2011] [Indexed: 01/30/2023] Open

Abstract

Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby, inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing the contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) as well as a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-offs between contigs' quality against their sizes. For this purpose, most of the publicly available major sequence assemblers--both for low-coverage long (Sanger) and high-coverage short (Illumina) reads technologies--are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing "next-generation" assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.

Collapse

Evaluation of next-generation sequencing software in mapping and assembly. J Hum Genet 2011;56:406-14. [PMID: 21525877 DOI: 10.1038/jhg.2011.43] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]

Localized Genome Assembly from Reads to Scaffolds: Practical Traversal of the Paired String Graph. LECTURE NOTES IN COMPUTER SCIENCE 2011. [DOI: 10.1007/978-3-642-23038-7_4] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]

Chain PS, Xie G, Starkenburg SR, Scholz MB, Beckloff N, Lo CC, Davenport KW, Reitenga KG, Daligault HE, Detter JC, Freitas TA, Gleasner CD, Green LD, Han CS, McMurry KK, Meincke LJ, Shen X, Zeytun A. Genomics for Key Players in the N Cycle. Methods Enzymol 2011;496:289-318. [DOI: 10.1016/b978-0-12-386489-5.00012-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]

Boisvert S, Laviolette F, Corbeil J. Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. J Comput Biol 2010;17:1519-33. [PMID: 20958248 DOI: 10.1089/cmb.2009.0238] [Citation(s) in RCA: 362] [Impact Index Per Article: 25.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Miller JR, Koren S, Sutton G. Assembly algorithms for next-generation sequencing data. Genomics 2010;95:315-27. [PMID: 20211242 DOI: 10.1016/j.ygeno.2010.03.001] [Citation(s) in RCA: 630] [Impact Index Per Article: 45.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2009] [Revised: 02/26/2010] [Accepted: 03/02/2010] [Indexed: 01/08/2023]

Zerbino DR, McEwen GK, Margulies EH, Birney E. Pebble and rock band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler. PLoS One 2009;4:e8407. [PMID: 20027311 PMCID: PMC2793427 DOI: 10.1371/journal.pone.0008407] [Citation(s) in RCA: 150] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2009] [Accepted: 10/21/2009] [Indexed: 11/22/2022] Open