Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Fischer M, Snajder R, Pabinger S, Dander A, Schossig A, Zschocke J, Trajanoski Z, Stocker G. SIMPLEX: cloud-enabled pipeline for the comprehensive analysis of exome sequencing data. PLoS One 2012;7:e41948. [PMID: 22870267 PMCID: PMC3411592 DOI: 10.1371/journal.pone.0041948] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 02/23/2012] [Accepted: 06/28/2012] [Indexed: 01/24/2023] Open

Cited by Other Article(s)

Harsono IW, Ariani Y, Benyamin B, Fadilah F, Pujianto DA, Hafifah CN. IDeRare: a lightweight and extensible open-source phenotype and exome analysis pipeline for germline rare disease diagnosis. JAMIA Open 2024;7:ooae052. [PMID: 38883202 PMCID: PMC11179852 DOI: 10.1093/jamiaopen/ooae052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Revised: 05/20/2024] [Accepted: 05/27/2024] [Indexed: 06/18/2024] Open

Anilkumar Sithara A, Maripuri D, Moorthy K, Amirtha Ganesh S, Philip P, Banerjee S, Sudhakar M, Raman K. iCOMIC: a graphical interface-driven bioinformatics pipeline for analyzing cancer omics data. NAR Genom Bioinform 2022;4:lqac053. [PMID: 35899080 PMCID: PMC9310080 DOI: 10.1093/nargab/lqac053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2021] [Revised: 06/17/2022] [Accepted: 07/04/2022] [Indexed: 11/13/2022] Open

Affiliation(s)

Anjana Anilkumar Sithara: Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai 600036, India; Centre for Integrative Biology and Systems mEdicine, IIT Madras, India; Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, India
Devi Priyanka Maripuri: Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai 600036, India; Centre for Integrative Biology and Systems mEdicine, IIT Madras, India; Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, India
Keerthika Moorthy: Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai 600036, India; Centre for Integrative Biology and Systems mEdicine, IIT Madras, India; Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, India
Sai Sruthi Amirtha Ganesh: Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai 600036, India; Centre for Integrative Biology and Systems mEdicine, IIT Madras, India; Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, India
Philge Philip: Centre for Integrative Biology and Systems mEdicine, IIT Madras, India; Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, India
Shayantan Banerjee: Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai 600036, India; Centre for Integrative Biology and Systems mEdicine, IIT Madras, India; Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, India
Malvika Sudhakar: Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai 600036, India; Centre for Integrative Biology and Systems mEdicine, IIT Madras, India; Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, India
Karthik Raman: Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai 600036, India; Centre for Integrative Biology and Systems mEdicine, IIT Madras, India; Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, India

Ahmed Z, Renart EG, Mishra D, Zeeshan S. JWES: a new pipeline for whole genome/exome sequence data processing, management, and gene-variant discovery, annotation, prediction, and genotyping. FEBS Open Bio 2021;11:2441-2452. [PMID: 34370400 PMCID: PMC8409305 DOI: 10.1002/2211-5463.13261] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/02/2021] [Revised: 07/18/2021] [Accepted: 08/02/2021] [Indexed: 01/07/2023] Open

Abstract

Whole genome and exome sequencing (WGS/WES) are the most popular next‐generation sequencing (NGS) methodologies and are at present often used to detect rare and common genetic variants of clinical significance. We emphasize that automated sequence data processing, management, and visualization should be an indispensable component of modern WGS and WES data analysis for sequence assembly, variant detection (SNPs, SVs), imputation, and resolution of haplotypes. In this manuscript, we present a newly developed findable, accessible, interoperable, and reusable (FAIR) bioinformatics‐genomics pipeline, the Java‐based Whole Genome/Exome Sequence Data Processing Pipeline (JWES), for efficient variant discovery and interpretation, and big data modeling and visualization. JWES is a cross‐platform, user‐friendly, product line application that comprises three modules: (a) data processing, (b) storage, and (c) visualization. The data processing module performs a series of different tasks for variant calling, the data storage module efficiently manages high‐volume gene‐variant data, and the data visualization module supports variant data interpretation with Circos graphs. The performance of JWES was tested and validated in‐house with different experiments, using Microsoft Windows, macOS Big Sur, and UNIX operating systems. JWES is an open‐source and freely available pipeline, allowing scientists to take full advantage of all the computing resources available, without requiring much computer science knowledge. We have successfully applied JWES for processing, management, and gene‐variant discovery, annotation, prediction, and genotyping of WGS and WES data to analyze variable complex disorders. In summary, we report the performance of JWES with some reproducible case studies, using open access and in‐house generated, high‐quality datasets.

Ahmed Z, Renart EG, Zeeshan S. Genomics pipelines to investigate susceptibility in whole genome and exome sequenced data for variant discovery, annotation, prediction and genotyping. PeerJ 2021;9:e11724. [PMID: 34395068 PMCID: PMC8320519 DOI: 10.7717/peerj.11724] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/29/2021] [Accepted: 06/14/2021] [Indexed: 12/12/2022] Open

Nieroda L, Maas L, Thiebes S, Lang U, Sunyaev A, Achter V, Peifer M. iRODS metadata management for a cancer genome analysis workflow. BMC Bioinformatics 2019;20:29. [PMID: 30646845 PMCID: PMC6334444 DOI: 10.1186/s12859-018-2576-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/31/2018] [Accepted: 12/10/2018] [Indexed: 12/13/2022] Open

Cox KH, Oliveira LMB, Plummer L, Corbin B, Gardella T, Balasubramanian R, Crowley WF. Modeling mutant/wild-type interactions to ascertain pathogenicity of PROKR2 missense variants in patients with isolated GnRH deficiency. Hum Mol Genet 2019;27:338-350. [PMID: 29161432 DOI: 10.1093/hmg/ddx404] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/14/2017] [Accepted: 11/10/2017] [Indexed: 12/30/2022] Open

Abstract

A major challenge in human genetics is the validation of pathogenicity of heterozygous missense variants. This problem is well-illustrated by PROKR2 variants associated with Isolated GnRH Deficiency (IGD). Homozygous, loss-of-function variants in PROKR2 were initially implicated in autosomal recessive IGD; however, most IGD-associated PROKR2 variants are heterozygous. Moreover, while IGD patient cohorts are enriched for PROKR2 missense variants, similar rare variants are also found in normal individuals. To elucidate the pathogenic mechanisms distinguishing IGD-associated PROKR2 variants from rare variants in controls, we assessed 59 variants using three approaches: (i) in silico prediction, (ii) traditional in vitro functional assays across three signaling pathways with mutant-alone transfections, and (iii) modified in vitro assays with mutant and wild-type expression constructs co-transfected to model in vivo heterozygosity. We found that neither in silico analyses nor traditional in vitro assessments of mutants transfected alone could distinguish IGD variants from control variants. However, in vitro co-transfections revealed that 15/34 IGD variants caused loss-of-function (LoF), including 3 novel dominant-negatives, while only 4/25 control variants caused LoF. Surprisingly, 19 IGD-associated variants were benign or exhibited LoF that could be rescued by WT co-transfection. Overall, variants that were LoF in ≥ 2 signaling assays under co-transfection conditions were more likely to be disease-associated than benign or 'rescuable' variants. Our findings suggest that in vitro modeling of WT/Mutant interactions increases the resolution for identifying causal variants, uncovers novel dominant negative mutations, and provides new insights into the pathogenic mechanisms underlying heterozygous PROKR2 variants.

Musacchia F, Ciolfi A, Mutarelli M, Bruselles A, Castello R, Pinelli M, Basu S, Banfi S, Casari G, Tartaglia M, Nigro V. VarGenius executes cohort-level DNA-seq variant calling and annotation and allows to manage the resulting data through a PostgreSQL database. BMC Bioinformatics 2018;19:477. [PMID: 30541431 PMCID: PMC6291943 DOI: 10.1186/s12859-018-2532-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/12/2018] [Accepted: 11/21/2018] [Indexed: 02/06/2023] Open

Abstract

BACKGROUND

Targeted resequencing has become the most used and cost-effective approach for identifying causative mutations of Mendelian diseases both for diagnostics and research purposes. Due to very rapid technological progress, NGS laboratories are expanding their capabilities to address the increasing number of analyses. Several open source tools are available to build a generic variant calling pipeline, but a tool able to simultaneously execute multiple analyses, organize, and categorize the samples is still missing.

RESULTS

Here we describe VarGenius, a Linux-based command-line software able to execute customizable pipelines for the analysis of multiple targeted resequencing data using parallel computing. VarGenius provides a database to store the output of the analysis (calling quality statistics, variant annotations, internal allelic variant frequencies) and sample information (personal data, genotypes, phenotypes). VarGenius can also perform the "joint analysis" of hundreds of samples with a single command, drastically reducing the time for the configuration and execution of the analysis. VarGenius executes the standard pipeline of the Genome Analysis Tool-Kit (GATK) best practices (GBP) for germinal variant calling, annotates the variants using Annovar, and generates a user-friendly output displaying the results through a web page. VarGenius has been tested on a parallel computing cluster with 52 machines with 120 GB of RAM each. Under this configuration, a 50 M whole exome sequencing (WES) analysis for a family (trio or quartet) was executed in about 7 h, a joint analysis of 30 WES in about 24 h, and the parallel analysis of 34 single samples from a 1 M panel in about 2 h.

CONCLUSIONS

We developed VarGenius, a "master" tool that addresses the increasing demand for heterogeneous NGS analyses and allows maximum flexibility for downstream analyses. It paves the way to a different kind of analysis, centered on cohorts rather than on singletons. Patient and variant information is stored in the database and any output file can be accessed programmatically. VarGenius can be used for routine analyses by biomedical researchers with basic Linux skills, while providing additional flexibility for computational biologists to develop their own algorithms for the comparison and analysis of data. The software is freely available at: https://github.com/frankMusacchia/VarGenius.

Meena N, Mathur P, Medicherla K, Suravajhala P. A Bioinformatics Pipeline for Whole Exome Sequencing: Overview of the Processing and Steps from Raw Data to Downstream Analysis. Bio Protoc 2018. [DOI: 10.21769/bioprotoc.2805] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/02/2022] Open

MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants. BMC Bioinformatics 2017;18:49. [PMID: 28107819 PMCID: PMC5248509 DOI: 10.1186/s12859-016-1454-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/20/2016] [Accepted: 12/24/2016] [Indexed: 12/28/2022] Open

Abstract

Background

Next Generation Genome sequencing techniques have become affordable for massive sequencing efforts devoted to clinical characterization of human diseases. However, the cost of providing cloud-based data analysis of the mounting datasets remains a concerning bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the analysis tools used, in order to reduce the overall computational processing time and, concomitantly, the processing cost. Furthermore, it is important to capitalize on recent developments in the cloud computing market, which has witnessed more providers competing in terms of products and prices.

Results

In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. Our package supports Amazon, Google, and Azure clouds, as well as any other cloud platform based on OpenStack. Our package allows different scenarios of execution with different levels of sophistication, up to the one where a workflow can be executed using a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios to exploit the spot instance model of Amazon in combination with the use of other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers.

Conclusions

MC-GenomeKey provides an efficient multicloud based solution to detect and annotate mutations. The package can run in different commercial cloud platforms, which enables the user to seize the best offers. The package also provides a reliable means to make use of the low-cost spot instance model of Amazon, as it provides an efficient solution to the sudden termination of spot machines as a result of a sudden price increase. The package has a web-interface and it is available for free for academic use.

Hintzsche J, Kim J, Yadav V, Amato C, Robinson SE, Seelenfreund E, Shellman Y, Wisell J, Applegate A, McCarter M, Box N, Tentler J, De S, Robinson WA, Tan AC. IMPACT: a whole-exome sequencing analysis pipeline for integrating molecular profiles with actionable therapeutics in clinical samples. J Am Med Inform Assoc 2016;23:721-30. [PMID: 27026619 DOI: 10.1093/jamia/ocw022] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 08/31/2015] [Accepted: 02/01/2016] [Indexed: 11/14/2022] Open

Abstract

OBJECTIVE

Currently, there is a disconnect between finding a patient's relevant molecular profile and predicting actionable therapeutics. Here we develop and implement the Integrating Molecular Profiles with Actionable Therapeutics (IMPACT) analysis pipeline, linking variants detected from whole-exome sequencing (WES) to actionable therapeutics.

METHODS AND MATERIALS

The IMPACT pipeline contains 4 analytical modules: detecting somatic variants, calling copy number alterations, predicting drugs against deleterious variants, and analyzing tumor heterogeneity. We tested the IMPACT pipeline on whole-exome sequencing data in The Cancer Genome Atlas (TCGA) lung adenocarcinoma samples with known EGFR mutations. We also used IMPACT to analyze melanoma patient tumor samples before treatment, after BRAF-inhibitor treatment, and after BRAF- and MEK-inhibitor treatment.

RESULTS

IMPACT correctly identified known EGFR mutations in the TCGA lung adenocarcinoma samples. IMPACT linked these EGFR mutations to the appropriate Food and Drug Administration (FDA)-approved EGFR inhibitors. For the melanoma patient samples, we identified NRAS p.Q61K as an acquired resistance mutation to BRAF-inhibitor treatment. We also identified CDKN2A deletion as a novel acquired resistance mutation to BRAFi/MEKi inhibition. The IMPACT analysis pipeline predicts these somatic variants to actionable therapeutics. We observed the clonal dynamic in the tumor samples after various treatments. We showed that IMPACT not only helped in successful prioritization of clinically relevant variants but also linked these variations to possible targeted therapies.

CONCLUSION

IMPACT provides a new bioinformatics strategy to delineate candidate somatic variants and actionable therapies. This approach can be applied to other patient tumor samples to discover effective drug targets for personalized medicine. IMPACT is publicly available at http://tanlab.ucdenver.edu/IMPACT.

Affiliation(s)

Jennifer Hintzsche: Division of Medical Oncology, Department of Medicine, School of Medicine
Jihye Kim: Division of Medical Oncology, Department of Medicine, School of Medicine; University of Colorado Cancer Center; All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Vinod Yadav: Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, School of Medicine
Carol Amato: Division of Medical Oncology, Department of Medicine, School of Medicine
Steven E Robinson: Division of Medical Oncology, Department of Medicine, School of Medicine
Eric Seelenfreund: Division of Medical Oncology, Department of Medicine, School of Medicine
Yiqun Shellman: Department of Dermatology, School of Medicine; University of Colorado Cancer Center; All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Joshua Wisell: Department of Pathology, School of Medicine; University of Colorado Cancer Center; All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Allison Applegate: Division of Medical Oncology, Department of Medicine, School of Medicine
Martin McCarter: Department of Surgery, School of Medicine; University of Colorado Cancer Center; All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Neil Box: Department of Dermatology, School of Medicine; University of Colorado Cancer Center; All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
John Tentler: Division of Medical Oncology, Department of Medicine, School of Medicine; University of Colorado Cancer Center; All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Subhajyoti De: Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, School of Medicine; Department of Biostatistics and Informatics, Colorado School of Public Health; University of Colorado Cancer Center; All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
William A Robinson: Division of Medical Oncology, Department of Medicine, School of Medicine; University of Colorado Cancer Center; All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Aik Choon Tan: Division of Medical Oncology, Department of Medicine, School of Medicine; Department of Biostatistics and Informatics, Colorado School of Public Health; University of Colorado Cancer Center; All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA

Pandey RV, Pabinger S, Kriegner A, Weinhäusel A. MutAid: Sanger and NGS Based Integrated Pipeline for Mutation Identification, Validation and Annotation in Human Molecular Genetics. PLoS One 2016;11:e0147697. [PMID: 26840129 PMCID: PMC4739551 DOI: 10.1371/journal.pone.0147697] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/17/2015] [Accepted: 01/07/2016] [Indexed: 12/20/2022] Open

Pandey RV, Pabinger S, Kriegner A, Weinhäusel A. ClinQC: a tool for quality control and cleaning of Sanger and NGS data in clinical research. BMC Bioinformatics 2016;17:56. [PMID: 26830926 PMCID: PMC4735967 DOI: 10.1186/s12859-016-0915-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 10/06/2015] [Accepted: 01/28/2016] [Indexed: 01/07/2023] Open

Abstract

Background

Traditional Sanger sequencing has been used as the gold standard method for single-gene genetic testing in the clinic, but it is a cumbersome and expensive way to test several genes in a heterogeneous disease such as cancer. Next Generation Sequencing technologies, which produce data at unprecedented speed in a cost-effective manner, have overcome this limitation of Sanger sequencing. Therefore, for efficient and affordable genetic testing, Next Generation Sequencing has been used as a complementary method with Sanger sequencing for disease-causing mutation identification and confirmation in clinical research. However, in order to identify potential disease-causing mutations with great sensitivity and specificity, it is essential to ensure high quality sequencing data. Integrated software tools that can analyze Sanger and NGS data together, eliminate platform-specific sequencing errors and low quality reads, and support the analysis of several sample/patient data sets in a single run are lacking.

Results

We have developed ClinQC, a flexible and user-friendly pipeline for format conversion, quality control, trimming and filtering of raw sequencing data generated from Sanger sequencing and three NGS sequencing platforms including Illumina, 454 and Ion Torrent. First, ClinQC converts input read files from their native formats to a common FASTQ format and removes adapters and PCR primers. Next, it splits bar-coded samples, filters duplicates, contamination and low quality sequences, and generates a QC report. ClinQC outputs high quality reads in FASTQ format with Sanger quality encoding, which can be directly used in downstream analysis. It can analyze hundreds of sample/patient data sets in a single run and generate unified output files for both Sanger and NGS sequencing data. Our tool is expected to be very useful for quality control and format conversion of Sanger and NGS data to facilitate improved downstream analysis and mutation screening.
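The trimming and filtering steps described above can be illustrated with a minimal Python sketch. This is not ClinQC's code: the function names, the thresholds (`min_q`, `min_len`) and the toy reads are all assumptions chosen for illustration. The only detail taken from the abstract is that qualities use Sanger encoding (Phred score = ASCII code minus 33) and that low quality and duplicate reads are removed.

```python
from io import StringIO

PHRED_OFFSET = 33  # Sanger quality encoding: Phred score = ASCII code - 33


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_3prime(seq, qual, min_q):
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_q:
        end -= 1
    return seq[:end], qual[:end]


def mean_quality(qual):
    """Average Phred score of a quality string (0.0 for an empty string)."""
    if not qual:
        return 0.0
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)


def clean_reads(handle, min_q=20, min_len=5):
    """Trim, then drop reads that are too short, too noisy, or duplicated."""
    seen = set()
    for header, seq, qual in read_fastq(handle):
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) < min_len or mean_quality(qual) < min_q:
            continue
        if seq in seen:  # exact-sequence duplicate of an earlier read
            continue
        seen.add(seq)
        yield header, seq, qual


# Hypothetical toy input: four reads in Sanger-encoded FASTQ.
demo = StringIO(
    "@r1\nACGTACGT\n+\nIIIIII##\n"
    "@r2\nACGTAC\n+\nIIIIII\n"
    "@r3\nACGTACGT\n+\n########\n"
    "@r4\nTTTTGGGG\n+\nIIII5555\n"
)
kept = list(clean_reads(demo))
for header, seq, qual in kept:
    print(header, seq, qual)
```

In this toy run, read r1 loses its two low quality 3' bases, r2 is dropped as a duplicate of the trimmed r1, r3 is trimmed to nothing, and r4 passes unchanged. A real pipeline would also handle adapters, primers, barcodes and contamination screening, which are omitted here.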

Conclusions

ClinQC is a powerful and easy to handle pipeline for quality control and trimming in clinical research. ClinQC is written in Python with multiprocessing capability, runs on all major operating systems and is available at https://sourceforge.net/projects/clinqc.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-0915-y) contains supplementary material, which is available to authorized users.

Souilmi Y, Lancaster AK, Jung JY, Rizzo E, Hawkins JB, Powles R, Amzazi S, Ghazal H, Tonellato PJ, Wall DP. Scalable and cost-effective NGS genotyping in the cloud. BMC Med Genomics 2015;8:64. [PMID: 26470712 PMCID: PMC4608296 DOI: 10.1186/s12920-015-0134-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 04/16/2015] [Accepted: 09/11/2015] [Indexed: 12/20/2022] Open

Pranckevičiene E, Rančelis T, Pranculis A, Kučinskas V. Challenges in exome analysis by LifeScope and its alternative computational pipelines. BMC Res Notes 2015;8:421. [PMID: 26346699 PMCID: PMC4562342 DOI: 10.1186/s13104-015-1385-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/24/2015] [Accepted: 08/24/2015] [Indexed: 12/22/2022] Open

Abstract

Background

Every next generation sequencing (NGS) platform relies on proprietary and open source computational tools to analyze sequencing data. NGS tools for Illumina platforms are well documented, which is not the case with AB SOLiD systems. We applied several computational and variant calling pipelines to analyse targeted exome sequencing data obtained using the AB SOLiD 5500 system. The tools we investigated comprised the proprietary LifeScope’s pipeline in combination with open source color-space competent mapping programs and a variant caller. We present instrumental details of the pipelines that were used and a quantitative comparative analysis of variant lists generated by LifeScope’s pipeline versus open source tools.

Results

Sufficient coverage of targeted regions was achieved by all investigated pipelines. High variability was observed in identities of variants across the mapping programs. We observed less than 50 % concordance of variant lists produced by approaches based on different mapping algorithms. We summarized different approaches with regards to coverage (DP) and quality (QUAL) properties of the variants provided by GATK and found that LifeScope’s computational pipeline is superior. Fusion of information on mapping profiles (pileup) at genomic positions of variants in several different alignments proved to be a useful strategy to assess questionable singleton variants.
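A concordance figure like the one reported above can be computed by comparing the sets of variants called by each pipeline. The sketch below is only an illustration under stated assumptions: the abstract does not define its concordance measure, so intersection over union (Jaccard index) is assumed here; the function names and the two toy call sets are hypothetical; variants are keyed on the CHROM, POS, REF and ALT columns of tab-separated VCF data lines.

```python
from collections import Counter


def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from tab-separated VCF data lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def concordance(a, b):
    """Jaccard index of two variant sets (assumed definition of concordance)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def singletons(callsets):
    """Variants reported by exactly one of the pipelines in `callsets`."""
    counts = Counter(key for keys in callsets.values() for key in keys)
    return {key for key, n in counts.items() if n == 1}


# Hypothetical call sets from two pipelines run on the same sample.
pipeline_a = variant_keys([
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT",
    "chr2\t50\t.\tG\tA",
])
pipeline_b = variant_keys([
    "chr1\t100\t.\tA\tG",
    "chr2\t50\t.\tG\tC",
    "chr3\t10\t.\tT\tC",
])

score = concordance(pipeline_a, pipeline_b)
questionable = singletons({"a": pipeline_a, "b": pipeline_b})
print(f"concordance = {score:.2f}, singleton variants = {len(questionable)}")
```

Here only one of five distinct variants is shared, so the toy concordance is 0.2, and the four singleton calls are the ones that would merit a closer look at the underlying pileups.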

Conclusions

We quantitatively supported the conclusion that LifeScope’s pipeline is superior for processing sequencing data obtained by the AB SOLiD 5500 system. Nevertheless, the use of alternative pipelines is encouraged because aggregation of information from other mapping and variant calling approaches helps to resolve questionable calls and increases the confidence of the call. It was noted that the coverage threshold for a variant to be considered for further analysis has to be chosen in a data-driven way to prevent a loss of important information.

Bao R, Hernandez K, Huang L, Kang W, Bartom E, Onel K, Volchenboum S, Andrade J. ExScalibur: A High-Performance Cloud-Enabled Suite for Whole Exome Germline and Somatic Mutation Identification. PLoS One 2015;10:e0135800. [PMID: 26271043 PMCID: PMC4535852 DOI: 10.1371/journal.pone.0135800] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/02/2015] [Accepted: 07/27/2015] [Indexed: 12/30/2022] Open

Varghese B, Patel I, Barker A. RBioCloud: A Light-Weight Framework for Bioconductor and R-based Jobs on the Cloud. IEEE/ACM Trans Comput Biol Bioinform 2015;12:871-878. [PMID: 26357328 DOI: 10.1109/tcbb.2014.2361327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]

Leveraging the power of high performance computing for next generation sequencing data analysis: tricks and twists from a high throughput exome workflow. PLoS One 2015;10:e0126321. [PMID: 25942438 PMCID: PMC4420499 DOI: 10.1371/journal.pone.0126321] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 11/07/2014] [Accepted: 03/31/2015] [Indexed: 12/26/2022] Open

Next-generation sequencing data analysis on cloud computing. Genes Genomics 2015. [DOI: 10.1007/s13258-015-0280-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]

Gao X, Xu J, Starmer J. Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses. BMC Res Notes 2015;8:72. [PMID: 25889517 PMCID: PMC4376134 DOI: 10.1186/s13104-015-1027-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/09/2014] [Accepted: 02/23/2015] [Indexed: 12/26/2022] Open

Azam S, Rathore A, Shah TM, Telluri M, Amindala B, Ruperao P, Katta MAVSK, Varshney RK. An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data. PLoS One 2014;9:e101754. [PMID: 25003610 PMCID: PMC4086967 DOI: 10.1371/journal.pone.0101754] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 03/06/2014] [Accepted: 06/11/2014] [Indexed: 12/30/2022] Open

Abstract

Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of the command-line interface, massive computational resources and expertise, which is daunting for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline called Integrated SNP Mining and Utilization (ISMU) has been developed by integrating several open source next generation sequencing (NGS) tools with a graphical user interface, for SNP discovery and for the utilization of SNPs in developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction methods (SAMtools/SOAPsnp/CNS2snp and CbCC) and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets, such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data, at a fast speed. The pipeline is very useful for members of the plant genetics and breeding community with no computational expertise who want to discover SNPs and use them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge next generation sequencing datasets. It has been developed in Java and is available at http://hpc.icrisat.cgiar.org/ISMU as standalone free software.
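
To illustrate the polymorphism information content (PIC) value mentioned in the abstract above, the following is a minimal Python sketch. It assumes the widely used Botstein et al. (1980) definition, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j²; the abstract does not say which formula ISMU applies, and the function name `polymorphism_information_content` is hypothetical, not part of ISMU.

```python
def polymorphism_information_content(allele_freqs):
    """Return the PIC of one marker from its allele frequencies.

    Assumes the Botstein et al. (1980) formula; this is an illustration,
    not code taken from the ISMU pipeline.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")

    n = len(allele_freqs)
    # Sum of squared allele frequencies.
    sum_of_squares = sum(p * p for p in allele_freqs)
    # Pairwise term 2 * p_i^2 * p_j^2 over all allele pairs with i < j.
    pairwise = sum(
        2 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - sum_of_squares - pairwise


# A biallelic SNP with equal allele frequencies reaches the biallelic maximum.
print(polymorphism_information_content([0.5, 0.5]))
```

Under this definition a biallelic SNP cannot exceed a PIC of 0.375, which is why SNP markers are usually ranked relative to that ceiling rather than to 1.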


Lee IH, Lee K, Hsing M, Choe Y, Park JH, Kim SH, Bohn JM, Neu MB, Hwang KB, Green RC, Kohane IS, Kong SW. Prioritizing disease-linked variants, genes, and pathways with an interactive whole-genome analysis pipeline. Hum Mutat 2014;35:537-47. [PMID: 24478219 DOI: 10.1002/humu.22520] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2013] [Accepted: 01/23/2014] [Indexed: 01/02/2023]

Dander A, Pabinger S, Sperk M, Fischer M, Stocker G, Trajanoski Z. SeqBench: integrated solution for the management and analysis of exome sequencing data. BMC Res Notes 2014;7:43. [PMID: 24444368 PMCID: PMC3898724 DOI: 10.1186/1756-0500-7-43] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2013] [Accepted: 01/14/2014] [Indexed: 11/21/2022] Open

Karczewski KJ, Fernald GH, Martin AR, Snyder M, Tatonetti NP, Dudley JT. STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud. PLoS One 2014;9:e84860. [PMID: 24454756 PMCID: PMC3893165 DOI: 10.1371/journal.pone.0084860] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2013] [Accepted: 11/27/2013] [Indexed: 12/30/2022] Open

Dorff KC, Chambwe N, Zeno Z, Simi M, Shaknovich R, Campagne F. GobyWeb: simplified management and analysis of gene expression and DNA methylation sequencing data. PLoS One 2013;8:e69666. [PMID: 23936070 PMCID: PMC3720652 DOI: 10.1371/journal.pone.0069666] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2013] [Accepted: 06/11/2013] [Indexed: 01/04/2023] Open

D'Antonio M, D'Onorio De Meo P, Paoletti D, Elmi B, Pallocca M, Sanna N, Picardi E, Pesole G, Castrignanò T. WEP: a high-performance analysis pipeline for whole-exome data. BMC Bioinformatics 2013;14 Suppl 7:S11. [PMID: 23815231 PMCID: PMC3633005 DOI: 10.1186/1471-2105-14-s7-s11] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open

Abstract

Background

The advent of massively parallel sequencing technologies (Next Generation Sequencing, NGS) profoundly modified the landscape of human genetics.

In particular, Whole Exome Sequencing (WES) is the NGS branch that focuses on the exonic regions of eukaryotic genomes; exomes are ideal for helping us understand high-penetrance allelic variation and its relationship to phenotype. A complete WES analysis involves several steps, which need to be suitably designed and arranged into an efficient pipeline.

Managing an NGS analysis pipeline and the huge amount of data it produces requires non-trivial IT skills and computational power.

Results

Our web resource WEP (Whole-Exome sequencing Pipeline web tool) performs a complete WES pipeline and provides easy access to intermediate and final results through its interface. The WEP pipeline is composed of several steps:

1) verification of input integrity and quality checks, read trimming and filtering; 2) gapped alignment; 3) BAM conversion, sorting and indexing; 4) duplicate removal; 5) alignment optimization around insertion/deletion (indel) positions; 6) recalibration of quality scores; 7) single nucleotide and deletion/insertion polymorphism (SNP and DIP) variant calling; 8) variant annotation; 9) result storage in custom databases to allow cross-linking and intersections, statistics and much more. To overcome the challenge of managing large amounts of data and to maximize the biological information extracted from them, our tool restricts the number of final results by filtering data against customizable thresholds, facilitating the identification of functionally significant variants. Default threshold values, tuned according to the most common literature published in recent years, are also provided when the analysis completes.
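
The threshold-based filtering described above can be sketched in Python as follows. This is an illustration only: the `Variant` fields, the `filter_variants` function and the default values (minimum quality 30, minimum depth 10) are assumptions made for the example, not WEP's actual data model or defaults, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical minimal variant record; WEP's real schema is not given.
    chrom: str
    pos: int
    quality: float
    depth: int


# Assumed defaults for illustration; not the values shipped with WEP.
DEFAULT_THRESHOLDS = {"min_quality": 30.0, "min_depth": 10}


def filter_variants(variants, thresholds=None):
    """Keep variants that meet every threshold; user values override defaults."""
    limits = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
    return [
        v
        for v in variants
        if v.quality >= limits["min_quality"] and v.depth >= limits["min_depth"]
    ]


calls = [
    Variant("chr1", 100, 50.0, 20),
    Variant("chr1", 200, 10.0, 20),  # fails the quality threshold
    Variant("chr2", 300, 40.0, 5),   # fails the default depth threshold
]
print([v.pos for v in filter_variants(calls)])
print([v.pos for v in filter_variants(calls, {"min_depth": 1})])
```

Merging user-supplied values over a dictionary of defaults keeps the thresholds customizable while still giving a usable result when the user supplies none.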

Conclusions

With our tool, a user can perform the whole analysis on both paired-end and single-end data without knowing the underlying hardware and software architecture. The tool provides easy and intuitive access for data submission and a user-friendly web interface for visualizing annotated variants.

Through WEP, users without IT expertise can access up-to-date, tested WES algorithms, tuned to maximize the quality of called variants while minimizing artifacts and false positives.

The web tool is available at the following web address: http://www.caspur.it/wep


Hong H, Zhang W, Shen J, Su Z, Ning B, Han T, Perkins R, Shi L, Tong W. Critical role of bioinformatics in translating huge amounts of next-generation sequencing data into personalized medicine. Sci China Life Sci 2013;56:110-8. [PMID: 23393026 DOI: 10.1007/s11427-013-4439-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/08/2012] [Accepted: 11/29/2012] [Indexed: 01/12/2023]

Impacts of massively parallel sequencing for genetic diagnosis of neuromuscular disorders. Acta Neuropathol 2013;125:173-85. [PMID: 23224362 DOI: 10.1007/s00401-012-1072-7] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2012] [Revised: 11/27/2012] [Accepted: 11/28/2012] [Indexed: 12/11/2022]

Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z. A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform 2013;15:256-78. [PMID: 23341494 PMCID: PMC3956068 DOI: 10.1093/bib/bbs086] [Citation(s) in RCA: 335] [Impact Index Per Article: 30.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open

Charoentong P, Angelova M, Efremova M, Gallasch R, Hackl H, Galon J, Trajanoski Z. Bioinformatics for cancer immunology and immunotherapy. Cancer Immunol Immunother 2012;61:1885-903. [PMID: 22986455 PMCID: PMC3493665 DOI: 10.1007/s00262-012-1354-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2012] [Accepted: 09/04/2012] [Indexed: 01/24/2023]