1
Na JC, Lee I, Rhee JK, Shin SY. Fast single individual haplotyping method using GPGPU. Comput Biol Med 2019; 113:103421. [PMID: 31499396 DOI: 10.1016/j.compbiomed.2019.103421] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/04/2019] [Revised: 08/28/2019] [Accepted: 08/28/2019] [Indexed: 11/27/2022]
Abstract
BACKGROUND: Most bioinformatics tools for next-generation sequencing (NGS) data are computationally intensive, requiring a large amount of computational power for processing and analysis. Here, the utility of graphics processing units (GPUs) for NGS data computation is assessed. METHOD: In a previous study, we developed a probabilistic evolutionary algorithm with toggling for haplotyping (PEATH), a method based on the estimation of distribution algorithm and a toggling heuristic. Here, we parallelized the PEATH method (PEATH/G) using general-purpose computing on GPU (GPGPU). RESULTS: PEATH/G runs approximately 46.8 times and 25.4 times faster than PEATH on the NA12878 fosmid-sequencing dataset and the HuRef dataset, respectively, with an NVIDIA GeForce GTX 1660Ti. Moreover, PEATH/G is approximately 13.3 times faster on the fosmid-sequencing dataset, even with an inexpensive conventional GPGPU (NVIDIA GeForce GTX 950). CONCLUSIONS: PEATH/G can be a practical single individual haplotyping tool in terms of both its accuracy and speed. GPGPU can help reduce the running time of NGS analysis tools.
Affiliation(s)
- Joong Chae Na
  - Department of Computer Science and Engineering, Sejong University, Seoul, 05006, South Korea
- Inbok Lee
  - Department of Software, Korea Aerospace University, Goyang, 10540, South Korea
- Je-Keun Rhee
  - School of Systems Biomedical Science, Soongsil University, Seoul, 06978, South Korea
- Soo-Yong Shin
  - Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06351, South Korea
  - Big Data Research Center, Samsung Medical Center, Seoul, 06351, South Korea
2
Warris S, Schijlen E, van de Geest H, Vegesna R, Hesselink T, Te Lintel Hekkert B, Sanchez Perez G, Medvedev P, Makova KD, de Ridder D. Correcting palindromes in long reads after whole-genome amplification. BMC Genomics 2018; 19:798. [PMID: 30400848 PMCID: PMC6218980 DOI: 10.1186/s12864-018-5164-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 08/07/2018] [Accepted: 10/15/2018] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Next-generation sequencing requires sufficient DNA to be available. If DNA is limited, whole-genome amplification is applied to generate additional amounts. Such amplification often results in many chimeric DNA fragments, in particular artificial palindromic sequences, which limit the usefulness of long sequencing reads. RESULTS: Here, we present Pacasus, a tool for correcting such errors. Two datasets show that it markedly improves read mapping and de novo assembly, yielding results similar to those that would be obtained with non-amplified DNA. CONCLUSIONS: With Pacasus, long-read technologies become available for sequencing targets with very small amounts of DNA, such as single cells or even single chromosomes.
Affiliation(s)
- Sven Warris
  - Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
- Elio Schijlen
  - Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
- Henri van de Geest
  - Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
  - Present address: Genetwister Technologies BV, Wageningen, The Netherlands
- Rahulsimham Vegesna
  - Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, State College, PA, 16802, USA
  - Computation, Bioinformatics, Statistics Graduate Training Program, Pennsylvania State University, University Park, State College, PA, 16802, USA
  - The Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, State College, PA, 16802, USA
- Thamara Hesselink
  - Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
- Bas Te Lintel Hekkert
  - Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
- Gabino Sanchez Perez
  - Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
  - Present address: Genetwister Technologies BV, Wageningen, The Netherlands
- Paul Medvedev
  - The Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, State College, PA, 16802, USA
  - Department of Computer Science and Engineering, Pennsylvania State University, University Park, State College, PA, 16802, USA
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, State College, PA, 16802, USA
  - The Center for Medical Genomics, Pennsylvania State University, University Park, State College, PA, 16802, USA
- Kateryna D Makova
  - The Center for Medical Genomics, Pennsylvania State University, University Park, State College, PA, 16802, USA
  - Department of Biology, Pennsylvania State University, University Park, State College, PA, 16802, USA
- Dick de Ridder
  - Bioinformatics Group, Wageningen University and Research, Wageningen, The Netherlands
3
Awan MG, Eslami T, Saeed F. GPU-DAEMON: GPU algorithm design, data management & optimization template for array based big omics data. Comput Biol Med 2018; 101:163-173. [PMID: 30145436 DOI: 10.1016/j.compbiomed.2018.08.015] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/07/2018] [Revised: 08/10/2018] [Accepted: 08/12/2018] [Indexed: 11/29/2022]
Abstract
In the age of ever-increasing data, faster and more efficient data processing algorithms are needed. Graphics Processing Units (GPUs) are emerging as a cost-effective alternative architecture for high-end computing. The optimal design of GPU algorithms is a challenging task which requires a thorough understanding of the high-performance computing architecture as well as the algorithmic design. The steep learning curve of effective GPU-centric algorithm design and implementation demands considerable expertise, time, and resources. In this paper, we present GPU-DAEMON, a GPU Data Management, Algorithm Design and Optimization technique suitable for processing array-based big omics data. Our proposed GPU algorithm design template outlines and provides generic methods to tackle critical bottlenecks, which can be followed to implement high-performance, scalable GPU algorithms for a given big data problem. We study the capability of GPU-DAEMON by reviewing the implementation of GPU-DAEMON based algorithms for three different big data problems. Speedups of as much as 386x (over the sequential version) and 50x (over naive GPU design methods) are observed using the proposed GPU-DAEMON. The GPU-DAEMON template is available at https://github.com/pcdslab/GPU-DAEMON and the source code for GPU-ArraySort, G-MSR and GPU-PCC is available at https://github.com/pcdslab.
Affiliation(s)
- Muaaz Gul Awan
  - Department of Computer Science, Western Michigan University, Kalamazoo, MI, USA
- Taban Eslami
  - Department of Computer Science, Western Michigan University, Kalamazoo, MI, USA
- Fahad Saeed
  - School of Computing and Information Sciences, Florida International University, Miami, FL, USA
4
Warris S, Timal NRN, Kempenaar M, Poortinga AM, van de Geest H, Varbanescu AL, Nap JP. pyPaSWAS: Python-based multi-core CPU and GPU sequence alignment. PLoS One 2018; 13:e0190279. [PMID: 29293576 PMCID: PMC5749749 DOI: 10.1371/journal.pone.0190279] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/22/2017] [Accepted: 12/11/2017] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: Our previously published CUDA-only application PaSWAS, for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs, is platform-specific and therefore less widely adopted than it could be. The OpenCL language is supported more widely and allows use on a variety of hardware platforms. Moreover, there is a need to promote the adoption of parallel computing in bioinformatics by making its use and extension simpler through more and better application of high-level languages commonly used in bioinformatics, such as Python. RESULTS: The novel application pyPaSWAS presents the parallel SW sequence alignment code fully packed in Python. It is a generic SW implementation running on several hardware platforms with multi-core systems and/or GPUs that provides accurate sequence alignments that can also be inspected for alignment details. Additionally, pyPaSWAS supports the affine gap penalty. Python libraries are used for automated system configuration, I/O and logging. This way, the Python environment will stimulate further extension and use of pyPaSWAS. CONCLUSIONS: pyPaSWAS presents an easy Python-based environment for accurate and retrievable parallel SW sequence alignments on GPUs and multi-core systems. The strategy of integrating Python with high-performance parallel compute languages to create a developer- and user-friendly environment should be considered for other computationally intensive bioinformatics algorithms.
Affiliation(s)
- Sven Warris
  - Expertise Centre ALIFE, Institute for Life Science & Technology, Hanze University of Applied Sciences Groningen, Groningen, the Netherlands
  - Applied Bioinformatics, Wageningen University and Research, Wageningen, the Netherlands
- N Roshan N Timal
  - Parallel and Distributed Systems, Delft University of Technology, Delft, the Netherlands
- Marcel Kempenaar
  - Expertise Centre ALIFE, Institute for Life Science & Technology, Hanze University of Applied Sciences Groningen, Groningen, the Netherlands
- Arne M Poortinga
  - Expertise Centre ALIFE, Institute for Life Science & Technology, Hanze University of Applied Sciences Groningen, Groningen, the Netherlands
- Henri van de Geest
  - Applied Bioinformatics, Wageningen University and Research, Wageningen, the Netherlands
- Ana L Varbanescu
  - Parallel and Distributed Systems, Delft University of Technology, Delft, the Netherlands
- Jan-Peter Nap
  - Expertise Centre ALIFE, Institute for Life Science & Technology, Hanze University of Applied Sciences Groningen, Groningen, the Netherlands
  - Applied Bioinformatics, Wageningen University and Research, Wageningen, the Netherlands
5
Vanheule A, Audenaert K, Warris S, van de Geest H, Schijlen E, Höfte M, De Saeger S, Haesaert G, Waalwijk C, van der Lee T. Living apart together: crosstalk between the core and supernumerary genomes in a fungal plant pathogen. BMC Genomics 2016; 17:670. [PMID: 27552804 PMCID: PMC4994206 DOI: 10.1186/s12864-016-2941-6] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 01/30/2016] [Accepted: 07/14/2016] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND: Eukaryotes display remarkable genome plasticity, which can include supernumerary chromosomes that differ markedly from the core chromosomes. Despite the widespread occurrence of supernumerary chromosomes in fungi, their origin, relation to the core genome and the reason for their divergent characteristics are still largely unknown. The complexity of genome assembly due to the presence of repetitive DNA partially accounts for this. RESULTS: Here we use single-molecule real-time (SMRT) sequencing to assemble the genome of a prominent fungal wheat pathogen, Fusarium poae, including at least one supernumerary chromosome. The core genome contains limited transposable elements (TEs) and no gene duplications, while the supernumerary genome holds up to 25% TEs and multiple gene duplications. The core genome shows all hallmarks of repeat-induced point mutation (RIP), a defense mechanism against TEs that is specific to fungi. The absence of RIP on the supernumerary genome accounts for the differences between the two (sub)genomes, and results in a functional crosstalk between them. The supernumerary genome is a reservoir for TEs that migrate to the core genome, and even large blocks of supernumerary sequence (>200 kb) have recently translocated to the core. Vice versa, the supernumerary genome acts as a refuge for genes that are duplicated from the core genome. CONCLUSIONS: For the first time, a mechanism was determined that explains the differences that exist between the core and supernumerary genome in fungi. Different biology rather than origin was shown to be responsible. A "living apart together" crosstalk exists between the core and supernumerary genome, accelerating chromosomal and organismal evolution.
Affiliation(s)
- Adriaan Vanheule
  - Department of Applied Biosciences, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
  - Wageningen UR, Wageningen, The Netherlands
- Kris Audenaert
  - Department of Applied Biosciences, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Monica Höfte
  - Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Sarah De Saeger
  - Department of Bioanalysis, Faculty of Pharmaceutical Sciences, Ghent University, Ghent, Belgium
- Geert Haesaert
  - Department of Applied Biosciences, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium