1. Sarwal V, Niehus S, Ayyala R, Kim M, Sarkar A, Chang S, Lu A, Rajkumar N, Darfci-Maher N, Littman R, Chhugani K, Soylev A, Comarova Z, Wesel E, Castellanos J, Chikka R, Distler MG, Eskin E, Flint J, Mangul S. A comprehensive benchmarking of WGS-based deletion structural variant callers. Brief Bioinform 2022; 23:6618239. [PMID: 35753701 DOI: 10.1093/bib/bbac221] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/08/2021] [Revised: 04/30/2022] [Accepted: 05/11/2022] [Indexed: 01/10/2023]
Abstract
Advances in whole-genome sequencing (WGS) promise to enable accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents a substantial number of challenges, and a plethora of SV detection methods have been developed. Currently, evidence that investigators can use to select appropriate SV detection tools is lacking. In this article, we have evaluated the performance of SV detection tools on mouse and human WGS data using a comprehensive polymerase chain reaction-confirmed gold standard set of SVs and the genome-in-a-bottle variant set, respectively. In contrast to previous benchmarking studies, our gold standard dataset included a complete set of SVs, allowing us to report both precision and sensitivity rates of the SV detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance, as the SV detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low- and ultralow-pass sequencing data as well as for different deletion length categories.
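As an illustration of the precision and sensitivity rates mentioned above, the following minimal Python sketch scores a set of deletion calls against a gold standard. The matching rule (same chromosome, both breakpoints within a tolerance `tol`, 100 bp by default), the greedy one-to-one matching, and all function names are assumptions made for this example; the abstract does not state the study's actual matching criteria.

```python
from typing import List, Tuple

# A deletion is represented as (chromosome, start, end); this layout is an assumption.
Deletion = Tuple[str, int, int]


def matches(call: Deletion, truth: Deletion, tol: int = 100) -> bool:
    """Return True if both breakpoints of the call lie within tol bp of the truth."""
    return (
        call[0] == truth[0]
        and abs(call[1] - truth[1]) <= tol
        and abs(call[2] - truth[2]) <= tol
    )


def precision_sensitivity(
    calls: List[Deletion], truths: List[Deletion], tol: int = 100
) -> Tuple[float, float]:
    """Greedy one-to-one matching of calls to gold-standard deletions."""
    matched_truths = set()
    true_positive_calls = 0
    for call in calls:
        for i, truth in enumerate(truths):
            if i not in matched_truths and matches(call, truth, tol):
                matched_truths.add(i)
                true_positive_calls += 1
                break
    # Precision: fraction of calls that are correct.
    precision = true_positive_calls / len(calls) if calls else 0.0
    # Sensitivity: fraction of true deletions that were recovered.
    sensitivity = len(matched_truths) / len(truths) if truths else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    calls = [("chr1", 1000, 2000), ("chr1", 5000, 6000), ("chr2", 100, 900)]
    truths = [("chr1", 1010, 1990), ("chr2", 100, 900),
              ("chr3", 1, 500), ("chr3", 1000, 2000)]
    print(precision_sensitivity(calls, truths))
```

Reporting both numbers requires a gold standard that is complete for the regions evaluated: without it, unmatched calls cannot be counted as false positives and precision is undefined.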
Affiliation(s)
- Varuni Sarwal
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA; Indian Institute of Technology Delhi, Hauz Khas, New Delhi, Delhi 110016, India
- Sebastian Niehus
- Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, 10178 Berlin, Germany; Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, 10117 Berlin, Germany
- Ram Ayyala
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Minyoung Kim
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089
- Aditya Sarkar
- School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Kamand, Mandi, Himachal Pradesh 175001, India
- Sei Chang
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Angela Lu
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Neha Rajkumar
- Department of Bioengineering, University of California Los Angeles, Los Angeles, CA, 90095
- Nicholas Darfci-Maher
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Russell Littman
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Karishma Chhugani
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA 90089-9121
- Arda Soylev
- Department of Computer Engineering, Konya Food and Agriculture University, Konya, Turkey
- Zoia Comarova
- Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, CA, United States
- Emily Wesel
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Jacqueline Castellanos
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Rahul Chikka
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Margaret G Distler
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
- Eleazar Eskin
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine at UCLA, 695 Charles E. Young Drive South, Box 708822, Los Angeles, CA, 90095, USA; Department of Computational Medicine, David Geffen School of Medicine at UCLA, 73-235 CHS, Los Angeles, CA, 90095, USA
- Jonathan Flint
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, 760 Westwood Plaza, Los Angeles, CA 90095, USA
- Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA 90089-9121
2. Identification of Copy Number Alterations from Next-Generation Sequencing Data. Advances in Experimental Medicine and Biology 2022; 1361:55-74. [DOI: 10.1007/978-3-030-91836-1_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
3. Biswas B, Lai Y. A distance-type measure approach to the analysis of copy number variation in DNA sequencing data. BMC Genomics 2019; 20:195. [PMID: 30967117 PMCID: PMC6456939 DOI: 10.1186/s12864-019-5491-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Next-generation sequencing technology allows us to obtain a large number of short DNA sequence (DNA-seq) reads at a genome-wide level. DNA-seq data have been increasingly collected in recent years. Count-type data analysis is a widely used approach for DNA-seq data. However, the related data pre-processing is based on the moving window method, in which a window size needs to be defined in order to obtain count-type data. Furthermore, useful information can be lost after data pre-processing for count-type data. RESULTS: In this study, we propose to analyze DNA-seq data based on the related distance-type measure. Distances are measured in base pairs (bps) between two adjacent alignments of short reads mapped to a reference genome. Our experimental-data-based simulation study confirms the advantages of the distance-type measure approach in both detection power and detection accuracy. Furthermore, we propose artificial censoring for the distance data so that distances larger than a given value are considered potential outliers. Our purpose is to simplify the pre-processing of DNA-seq data. Statistically, we consider a mixture of right-censored geometric distributions to model the distance data. Additionally, to reduce the GC-content bias, we extend the mixture model to a mixture of generalized linear models (GLMs). The model can be estimated by the Newton-Raphson algorithm as well as the Expectation-Maximization (E-M) algorithm. We have conducted simulations to evaluate the performance of our approach. Based on the rank-based inverse normal transformation of distance data, we can obtain the related z-values for a follow-up analysis. For illustration, an application to the DNA-seq data from a pair of normal and tumor cell lines is presented with a change-point analysis of z-values to detect DNA copy number alterations. CONCLUSION: Our distance-type measure approach is novel. It does not require either a fixed or a sliding window procedure for generating count-type data. Its advantages have been demonstrated by our simulation studies and its practical usefulness has been illustrated by an experimental data application.
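The pre-processing steps named in this abstract (distances between adjacent alignments, artificial censoring, and a rank-based inverse normal transformation) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, alignments are reduced to their start positions, and the rank offset `c = 3/8` (Blom's formula) is an assumption, since the abstract does not say which variant of the transformation is used. The mixture model itself is not reproduced here.

```python
from statistics import NormalDist
from typing import List, Tuple


def adjacent_distances(start_positions: List[int]) -> List[int]:
    """Distances in bp between adjacent alignment start positions."""
    ordered = sorted(start_positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def censor(distances: List[int], limit: int) -> List[Tuple[int, bool]]:
    """Artificial right-censoring: cap each distance at limit and flag it."""
    return [(min(d, limit), d > limit) for d in distances]


def rank_inverse_normal(values: List[float], c: float = 0.375) -> List[float]:
    """Rank-based inverse normal transformation with average ranks for ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


if __name__ == "__main__":
    distances = adjacent_distances([100, 130, 105, 400])
    print(distances)               # distances between sorted neighbours
    print(censor(distances, 100))  # values above 100 bp are capped and flagged
    print(rank_inverse_normal(distances))
```

Because the transformation depends only on ranks, the resulting z-values are insensitive to the exact censoring limit as long as the ordering of the uncensored distances is preserved.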
Affiliation(s)
- Bipasa Biswas
- Diagnostics Devices Branch 1, FDA/CDRH/OSB-DBS, White Oak Bldg #66, Room 2222, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Yinglei Lai
- Department of Statistics and Biostatistics Center, The George Washington University, Rome Hall, 7th Floor, 801, 22nd Street NW, Washington, D.C., 20052, USA
4. do Nascimento F, Guimaraes KS. Copy Number Variations Detection: Unravelling the Problem in Tangible Aspects. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:1237-1250. [PMID: 27295681 DOI: 10.1109/tcbb.2016.2576441] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/06/2023]
Abstract
Among the important genomic variants associated with susceptibility and resistance to complex diseases, Copy Number Variations (CNVs) have emerged as a prevalent class of structural variation. Following the flood of next-generation sequencing data, numerous publicly available tools have been developed to provide computational strategies to identify CNVs with improved accuracy. This review goes beyond scrutinizing the main approaches widely used for structural variant detection in general, including Split-Read, Paired-End Mapping, Read-Depth, and Assembly-based. In this paper, (1) we characterize the relevant technical details around the detection of CNVs, which can affect the estimation of breakpoints and number of copies, (2) we pinpoint the most important insights related to GC-content and mappability biases, and (3) we discuss the paramount caveats in the tool evaluation process. The points brought out in this study emphasize common assumptions, a variety of possible limitations, valuable insights, and directions for desirable contributions to the state-of-the-art in CNV detection tools.
5. Chakraborty C, Bandyopadhyay S, Agoramoorthy G. India's Computational Biology Growth and Challenges. Interdiscip Sci 2016; 8:263-76. [PMID: 27465042 DOI: 10.1007/s12539-016-0179-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/04/2015] [Revised: 09/08/2015] [Accepted: 09/08/2015] [Indexed: 11/30/2022]
Abstract
India's computational science is growing swiftly due to the rapid expansion of internet and information technology services. The bioinformatics sector of India has been transforming rapidly by creating a competitive position in the global bioinformatics market. Bioinformatics is widely used across India to address a wide range of biological issues. Recently, computational researchers and biologists have been collaborating in projects such as database development, sequence analysis, genomic prospects and algorithm generation. In this paper, we present the Indian computational biology scenario, highlighting bioinformatics-related educational activities, manpower development, internet boom, service industry, research activities, conferences and trainings undertaken by the corporate and government sectors. Nonetheless, this new field of science faces many challenges.
Affiliation(s)
- Chiranjib Chakraborty
- Department of Bio-informatics, School of Computer and Information Sciences, Galgotias University, Greater Noida, India
6. Yuan X, Zhang J, Yang L. IntSIM: An Integrated Simulator of Next-Generation Sequencing Data. IEEE Trans Biomed Eng 2016; 64:441-451. [PMID: 27164567 DOI: 10.1109/tbme.2016.2560939] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Indexed: 11/08/2022]
Abstract
OBJECTIVE: Next-generation sequencing data has been widely used for DNA variant discovery and tumor study through computational tools. Effective simulation of such data with many realistic features is necessary for testing existing tools and guiding the development of new tools. METHODS: We present an integrated simulation system, IntSIM, to simulate common DNA variants and to generate sequencing reads for mixture genomes. IntSIM has three novel features in comparison with other simulation programs: 1) it is able to simulate both germline and somatic variants in the same sequence, 2) it deals with tumor purity so as to generate reads corresponding to heterogeneous genomes and also produce tumor-normal matched samples, and 3) it simulates correlations among SNPs and among CNVs/CNAs based on HMM models trained from real sequencing genomes, and can simulate broad and focal CNV/CNA events. RESULTS: The simulation data of IntSIM can reflect characteristics observed from real data and are consistent with input parameters. The IntSIM software package is freely available at http://intsim.sourceforge.net/. CONCLUSION: Based on a great number of experiments, IntSIM performs better than other programs for some scenarios, such as simulation of heterozygous SNPs and CNVs/CNAs, and provides some functions that other programs do not. SIGNIFICANCE: Simulation with IntSIM can be expected to evaluate the performance of methods in detecting various types of variants, analyzing tumor samples, and especially providing a realistic assessment of the effect of tumor purity on the identification of somatic mutations.
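The effect of tumor purity on read depth, which a simulator of heterogeneous genomes has to reproduce, can be illustrated with the standard mixing formula for a tumor sample contaminated by normal cells. This is a generic sketch under the assumption of a diploid normal genome; it is not IntSIM's code, and the function name is hypothetical.

```python
def expected_depth_ratio(purity: float, tumor_copy_number: float) -> float:
    """Expected tumor/normal read-depth ratio at a locus.

    purity is the fraction of tumor cells in the sample (0 to 1);
    normal cells are assumed to carry two copies of the locus.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie between 0 and 1")
    normal_copy_number = 2.0
    mixed_copies = purity * tumor_copy_number + (1.0 - purity) * normal_copy_number
    return mixed_copies / normal_copy_number


if __name__ == "__main__":
    # A one-copy loss looks progressively weaker as purity drops.
    for purity in (1.0, 0.6, 0.2):
        print(purity, expected_depth_ratio(purity, tumor_copy_number=1))
```

At 60% purity, a hemizygous deletion in the tumor yields an expected ratio of about 0.7 instead of 0.5, which is why low purity makes somatic copy-number events harder to detect.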
7. Big Data and Cancer Research. Big Data Analytics 2016. [DOI: 10.1007/978-81-322-3628-3_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
8. Varadan V, Singh S, Nosrati A, Ravi L, Lutterbaugh J, Barnholtz-Sloan JS, Markowitz SD, Willis JE, Guda K. ENVE: a novel computational framework characterizes copy-number mutational landscapes in colorectal cancers from African American patients. Genome Med 2015; 7:69. [PMID: 26269717 PMCID: PMC4534088 DOI: 10.1186/s13073-015-0192-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/26/2015] [Accepted: 06/30/2015] [Indexed: 01/16/2023]
Abstract
Reliable detection of somatic copy-number alterations (sCNAs) in tumors using whole-exome sequencing (WES) remains challenging owing to technical (inherent noise) and sample-associated variability in WES data. We present a novel computational framework, ENVE, which models inherent noise in any WES dataset, enabling robust detection of sCNAs across WES platforms. ENVE achieved high concordance with orthogonal sCNA assessments across two colorectal cancer (CRC) WES datasets, and consistently outperformed a best-in-class algorithm, Control-FREEC. We subsequently used ENVE to characterize global sCNA landscapes in African American CRCs, identifying genomic aberrations potentially associated with CRC pathogenesis in this population. ENVE is downloadable at https://github.com/ENVE-Tools/ENVE.
Affiliation(s)
- Vinay Varadan
- Division of General Medical Sciences-Oncology, Case Western Reserve University, Cleveland, OH 44106 USA; Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106 USA; Case Western Reserve University, 2103 Cornell Road, Wolstein Research Building, Cleveland, OH 44106 USA
- Salendra Singh
- Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106 USA
- Arman Nosrati
- Division of Hematology and Oncology, Case Western Reserve University, Cleveland, OH 44106 USA
- Lakshmeswari Ravi
- Division of Hematology and Oncology, Case Western Reserve University, Cleveland, OH 44106 USA
- James Lutterbaugh
- Division of Hematology and Oncology, Case Western Reserve University, Cleveland, OH 44106 USA
- Jill S Barnholtz-Sloan
- Division of General Medical Sciences-Oncology, Case Western Reserve University, Cleveland, OH 44106 USA; Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106 USA
- Sanford D Markowitz
- Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106 USA; Division of Hematology and Oncology, Case Western Reserve University, Cleveland, OH 44106 USA; Department of Medicine, Case Western Reserve University, Cleveland, OH 44106 USA; Case Medical Center, Case Western Reserve University, Cleveland, OH 44106 USA
- Joseph E Willis
- Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106 USA; Department of Medicine, Case Western Reserve University, Cleveland, OH 44106 USA; Case Medical Center, Case Western Reserve University, Cleveland, OH 44106 USA; Department of Pathology, Case Western Reserve University, Cleveland, OH 44106 USA
- Kishore Guda
- Division of General Medical Sciences-Oncology, Case Western Reserve University, Cleveland, OH 44106 USA; Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH 44106 USA; Department of Medicine, Case Western Reserve University, Cleveland, OH 44106 USA; Case Western Reserve University, 2103 Cornell Road, Wolstein Research Building, Cleveland, OH 44106 USA
9. Kim J, Kim S, Nam H, Kim S, Lee D. SoloDel: a probabilistic model for detecting low-frequent somatic deletions from unmatched sequencing data. Bioinformatics 2015; 31:3105-13. [PMID: 26071141 DOI: 10.1093/bioinformatics/btv358] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/18/2014] [Accepted: 06/05/2015] [Indexed: 01/26/2023]
Abstract
MOTIVATION: Finding somatic mutations from massively parallel sequencing data is becoming a standard process in genome-based biomedical studies. There are a number of robust methods developed for detecting somatic single nucleotide variations. However, detection of somatic copy number alteration has been substantially less explored and remains vulnerable to frequently raised sampling issues: low frequency in the cell population and absence of matched control samples. RESULTS: We developed a novel computational method, SoloDel, that accurately distinguishes low-frequent somatic deletions from germline ones with or without matched control samples. We first constructed a probabilistic somatic mutation progression model that describes the occurrence and propagation of the event in the cellular lineage of the sample. We then built a Gaussian mixture model to represent the mixed population of somatic and germline deletions. Parameters of the mixture model could be estimated using the expectation-maximization algorithm with the observed distribution of read-depth ratios at the points of discordant-read based initial deletion calls. Combined with a conventional structural variation caller, SoloDel greatly increased the accuracy in classifying somatic mutations. Even without a control, SoloDel maintained comparable performance across a wide range of mutated subpopulation sizes (10-70%). SoloDel could also successfully recall experimentally validated somatic deletions from previously reported neuropsychiatric whole-genome sequencing data. AVAILABILITY AND IMPLEMENTATION: A Java-based implementation of the method is available at http://sourceforge.net/projects/solodel/. CONTACT: swkim@yuhs.ac or dhlee@biosoft.kaist.ac.kr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
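The idea of fitting a Gaussian mixture to read-depth ratios with the expectation-maximization algorithm can be sketched as follows. This is a generic two-component, one-dimensional EM written for illustration only: it does not reproduce SoloDel's progression model or its parameterization, and the function names, the initialization at the minimum and maximum value, and the variance floor are all assumptions.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(
    xs: List[float], n_iter: int = 100, min_var: float = 1e-6
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a two-component 1D Gaussian mixture by EM.

    Returns (weights, means, variances); component 0 starts at the lowest value.
    """
    overall_mean = sum(xs) / len(xs)
    overall_var = max(sum((x - overall_mean) ** 2 for x in xs) / len(xs), min_var)
    means = [min(xs), max(xs)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, min_var)
    return weights, means, variances


if __name__ == "__main__":
    # Toy read-depth ratios: one cluster near 0.5 and one near 0.9.
    ratios = [0.48, 0.50, 0.52, 0.50, 0.49, 0.51,
              0.88, 0.90, 0.92, 0.90, 0.91, 0.89]
    print(fit_two_gaussians(ratios))
```

In this toy setting, the component with the lower mean would correspond to deletions present in most cells and the component closer to 1 to deletions present in a small subpopulation; how components map to germline and somatic events in the actual method is not stated in the abstract.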
Affiliation(s)
- Junho Kim
- Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul 120-752, Korea; Department of Bio and Brain Engineering, KAIST, Yuseong-Gu, Daejeon 305-701, Korea
- Sanghyeon Kim
- Stanley Brain Research Laboratory, Stanley Medical Research Institute, Rockville, MD 20850, USA
- Hojung Nam
- School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju 500-712, Korea
- Sangwoo Kim
- Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul 120-752, Korea
- Doheon Lee
- Department of Bio and Brain Engineering, KAIST, Yuseong-Gu, Daejeon 305-701, Korea
10. Glusman G, Severson A, Dhankani V, Robinson M, Farrah T, Mauldin DE, Stittrich AB, Ament SA, Roach JC, Brunkow ME, Bodian DL, Vockley JG, Shmulevich I, Niederhuber JE, Hood L. Identification of copy number variants in whole-genome data using Reference Coverage Profiles. Front Genet 2015; 6:45. [PMID: 25741365 PMCID: PMC4330915 DOI: 10.3389/fgene.2015.00045] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 11/30/2014] [Accepted: 01/30/2015] [Indexed: 12/20/2022]
Abstract
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation.
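The two-step idea described above (normalize a sample's coverage to a reference profile, then segment with a hidden Markov model) can be sketched with a small Viterbi decoder. This is an illustrative toy, not the authors' tool: the three states and their expected ratios, the Gaussian emission width `sd`, the self-transition probability `stay`, and all function names are assumptions, and real Reference Coverage Profiles are far richer than the flat list used here.

```python
import math
from typing import Dict, List

# Assumed expected coverage ratio for each copy-number state (diploid baseline = 1.0).
STATES: Dict[str, float] = {"loss": 0.5, "diploid": 1.0, "gain": 1.5}


def normalize(coverage: List[float], reference_profile: List[float]) -> List[float]:
    """Per-bin ratio of sample coverage to the reference profile (must be positive)."""
    return [c / r for c, r in zip(coverage, reference_profile)]


def log_gauss(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def viterbi_segment(ratios: List[float], sd: float = 0.15, stay: float = 0.99) -> List[str]:
    """Most likely state path under a simple HMM with Gaussian emissions."""
    if not ratios:
        return []
    names = list(STATES)
    n = len(names)
    log_stay = math.log(stay)
    log_switch = math.log((1 - stay) / (n - 1))
    # Uniform prior over states for the first bin.
    score = [math.log(1 / n) + log_gauss(ratios[0], STATES[s], sd) for s in names]
    back: List[List[int]] = []
    for x in ratios[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(names):
            # Best predecessor state for state j at this bin.
            best_i = max(
                range(n),
                key=lambda i: score[i] + (log_stay if i == j else log_switch),
            )
            trans = log_stay if best_i == j else log_switch
            new_score.append(score[best_i] + trans + log_gauss(x, STATES[s], sd))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [names[i] for i in path]


if __name__ == "__main__":
    sample = [30, 31, 29, 15, 16, 14, 15, 30, 32]
    profile = [30] * 9  # hypothetical flat reference profile
    print(viterbi_segment(normalize(sample, profile)))
```

The high self-transition probability keeps single noisy bins from being called as events, while a run of bins near half the expected coverage is enough to outweigh the cost of switching into and out of the loss state.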
Affiliation(s)
- Dale L Bodian
- Inova Translational Medicine Institute, Inova Health System Falls Church, VA, USA
- Joseph G Vockley
- Inova Translational Medicine Institute, Inova Health System Falls Church, VA, USA
- John E Niederhuber
- Inova Translational Medicine Institute, Inova Health System Falls Church, VA, USA
- Leroy Hood
- Institute for Systems Biology Seattle, WA, USA
11. Alkodsi A, Louhimo R, Hautaniemi S. Comparative analysis of methods for identifying somatic copy number alterations from deep sequencing data. Brief Bioinform 2014; 16:242-54. [DOI: 10.1093/bib/bbu004] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Indexed: 01/04/2023]
12. SInC: an accurate and fast error-model based simulator for SNPs, Indels and CNVs coupled with a read generator for short-read sequence data. BMC Bioinformatics 2014; 15:40. [PMID: 24495296 PMCID: PMC3926339 DOI: 10.1186/1471-2105-15-40] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 08/27/2013] [Accepted: 02/03/2014] [Indexed: 12/30/2022]
Abstract
Background: The rapid advancements in the field of genome sequencing are aiding our understanding of many biological systems. In the last five years, computational biologists and bioinformatics specialists have come up with newer, better and more efficient tools for the discovery, analysis and interpretation of different genomic variants from high-throughput sequencing data. The availability of reliable simulated datasets is essential and is the first step towards testing any newly developed analytical tools for variant discovery. Although there are tools currently available that can simulate variants, none can simulate all three major types of variation (Single Nucleotide Polymorphisms, Insertions and Deletions, and Copy Number Variations) and generate reads taking a realistic error model into consideration. Therefore, an efficient simulator and read generator is needed that can simulate variants taking the error rates of true biological samples into consideration. Results: We report SInC (Snp, Indel and Cnv), an open-source variant simulator and read generator capable of simulating all three common types of biological variants, taking into account a distribution of base quality scores from a commonly used next-generation sequencing instrument from Illumina. SInC is capable of generating single- and paired-end reads with user-defined insert size and with high efficiency compared to the other existing tools. SInC, due to its multi-threaded capability during read generation, has a low time footprint. SInC is currently optimised to work in a limited infrastructure setup and can efficiently exploit the commonly used quad-core desktop architecture to simulate short sequence reads with deep coverage for large genomes. Conclusions: We have come up with a user-friendly multi-variant simulator and read-generator tool called SInC. SInC can be downloaded from http://sourceforge.net/projects/sincsimulator.