301
Lizardi PM, Forloni M, Wajapeyee N. Genome-wide approaches for cancer gene discovery. Trends Biotechnol 2011; 29:558-68. [PMID: 21757246 DOI: 10.1016/j.tibtech.2011.06.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 01/24/2011] [Revised: 05/20/2011] [Accepted: 06/06/2011] [Indexed: 11/30/2022]
Abstract
One of the central aims of cancer research is to identify and characterize cancer-causing alterations in cancer genomes. In recent years, unprecedented advances in genome-wide sequencing, functional genomics technologies for RNA interference screens and methods for evaluating three-dimensional chromatin organization in vivo have resulted in important discoveries regarding human cancer. The cancer-causing genes identified from these new genome-wide technologies have also provided opportunities for effective and personalized cancer therapy. In this review, we describe some of the most recent technologies for cancer gene discovery. We also provide specific examples in which these technologies have proven remarkably successful in uncovering important cancer-causing alterations.
Affiliation(s)
- Paul M Lizardi
- Department of Pathology, Yale University School of Medicine, New Haven, CT 06520-8023, USA
302
Pamphlett R, Morahan JM. Copy number imbalances in blood and hair in monozygotic twins discordant for amyotrophic lateral sclerosis. J Clin Neurosci 2011; 18:1231-4. [PMID: 21741244 DOI: 10.1016/j.jocn.2010.12.049] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/06/2010] [Revised: 06/25/2010] [Accepted: 12/12/2010] [Indexed: 12/13/2022]
Abstract
Chromosomal copy number association studies in patients with amyotrophic lateral sclerosis (ALS) using blood DNA have so far been inconclusive. We employed genome-wide screening to look for copy number imbalances (CNIs) between blood and hair DNA from three ALS-discordant monozygotic twin pairs and two phenotypically normal monozygotic twin pairs. Genome-wide chromosomal copy number was estimated using Affymetrix 6.0 GeneChips. CNIs were sought both between twin pairs and between blood and hair DNA from the same individuals. Two blood CNIs were found in one ALS-discordant twin pair. In another ALS-discordant twin pair, seven hair CNIs were detected. CNIs were also found between blood and hair in three individuals. Imbalances in blood copy number appear to be rare in monozygotic twin pairs, but hair may harbour more CNIs than blood. Copy number differences between blood and hair from the same individuals appear to be common. Since brain and hair share a common ectodermal origin, hair may be a more suitable tissue than blood to estimate somatic copy number variation in the brain.
Affiliation(s)
- Roger Pamphlett
- The Stacey Motor Neuron Disease Laboratory, Department of Pathology D06, University of Sydney, Sydney, New South Wales 2006, Australia.
303
Reference-guided assembly of four diverse Arabidopsis thaliana genomes. Proc Natl Acad Sci U S A 2011; 108:10249-54. [PMID: 21646520 DOI: 10.1073/pnas.1107739108] [Citation(s) in RCA: 180] [Impact Index Per Article: 13.8] [Indexed: 11/18/2022] Open
Abstract
We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html.
304
Woollard PM, Mehta NA, Vamathevan JJ, Van Horn S, Bonde BK, Dow DJ. The application of next-generation sequencing technologies to drug discovery and development. Drug Discov Today 2011; 16:512-9. [DOI: 10.1016/j.drudis.2011.03.006] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Received: 12/02/2010] [Revised: 02/24/2011] [Accepted: 03/17/2011] [Indexed: 12/17/2022]
305
Abstract
Advances in whole genome amplification and next-generation sequencing methods have enabled genomic analyses of single cells, and these techniques are now beginning to be used to detect genomic lesions in individual cancer cells. Previous approaches have been unable to resolve genomic differences in complex mixtures of cells, such as heterogeneous tumors, despite the importance of characterizing such tumors for cancer treatment. Sequencing of single cells is likely to improve several aspects of medicine, including the early detection of rare tumor cells, monitoring of circulating tumor cells (CTCs), measuring intratumor heterogeneity, and guiding chemotherapy. In this review we discuss the challenges and technical aspects of single-cell sequencing, with a strong focus on genomic copy number, and discuss how this information can be used to diagnose and treat cancer patients.
306
Abstract
High-throughput tools for nucleic acid characterization now provide the means to conduct comprehensive analyses of all somatic alterations in the cancer genomes. Both large-scale and focused efforts have identified new targets of translational potential. The deluge of information that emerges from these genome-scale investigations has stimulated a parallel development of new analytical frameworks and tools. The complexity of somatic genomic alterations in cancer genomes also requires the development of robust methods for the interrogation of the function of genes identified by these genomics efforts. Here we provide an overview of the current state of cancer genomics, appraise the current portals and tools for accessing and analyzing cancer genomic data, and discuss emerging approaches to exploring the functions of somatically altered genes in cancer.
Affiliation(s)
- Lynda Chin
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA.
307
Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, Getz G. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 2011; 12:R41. [PMID: 21527027 PMCID: PMC3218867 DOI: 10.1186/gb-2011-12-4-r41] [Citation(s) in RCA: 2306] [Impact Index Per Article: 177.4] [Received: 08/18/2010] [Revised: 02/14/2011] [Accepted: 04/28/2011] [Indexed: 12/18/2022] Open
Abstract
We describe methods with enhanced power and specificity to identify genes targeted by somatic copy-number alterations (SCNAs) that drive cancer growth. By separating SCNA profiles into underlying arm-level and focal alterations, we improve the estimation of background rates for each category. We additionally describe a probabilistic method for defining the boundaries of selected-for SCNA regions with user-defined confidence. Here we detail this revised computational approach, GISTIC2.0, and validate its performance in real and simulated datasets.
Affiliation(s)
- Craig H Mermel
- Cancer Program, The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA
308
Ritz A, Paris PL, Ittmann MM, Collins C, Raphael BJ. Detection of recurrent rearrangement breakpoints from copy number data. BMC Bioinformatics 2011; 12:114. [PMID: 21510904 PMCID: PMC3112242 DOI: 10.1186/1471-2105-12-114] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 08/30/2010] [Accepted: 04/21/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Copy number variants (CNVs), including deletions, amplifications, and other rearrangements, are common in human and cancer genomes. Copy number data from array comparative genome hybridization (aCGH) and next-generation DNA sequencing is widely used to measure copy number variants. Comparison of copy number data from multiple individuals reveals recurrent variants. Typically, the interior of a recurrent CNV is examined for genes or other loci associated with a phenotype. However, in some cases, such as gene truncations and fusion genes, the target of the variant lies at the boundary of the variant. RESULTS: We introduce Neighborhood Breakpoint Conservation (NBC), an algorithm for identifying rearrangement breakpoints that are highly conserved at the same locus in multiple individuals. NBC detects recurrent breakpoints at varying levels of resolution, including breakpoints whose location is exactly conserved and breakpoints whose location varies within a gene. NBC also identifies pairs of recurrent breakpoints such as those that result from fusion genes. We apply NBC to aCGH data from 36 primary prostate tumors and identify 12 novel rearrangements, one of which is the well-known TMPRSS2-ERG fusion gene. We also apply NBC to 227 glioblastoma tumors and predict 93 novel rearrangements which we further classify as gene truncations, germline structural variants, and fusion genes. A number of these variants involve the protein phosphatase PTPN12, suggesting that deregulation of PTPN12, via a variety of rearrangements, is common in glioblastoma. CONCLUSIONS: We demonstrate that NBC is useful for detection of recurrent breakpoints resulting from copy number variants or other structural variants, and in particular identifies recurrent breakpoints that result in gene truncations or fusion genes. Software is available at http://cs.brown.edu/people/braphael/software.html.
Affiliation(s)
- Anna Ritz
- Department of Computer Science, Brown University, Providence, RI, USA.
309
He D, Hormozdiari F, Furlotte N, Eskin E. Efficient algorithms for tandem copy number variation reconstruction in repeat-rich regions. Bioinformatics 2011; 27:1513-20. [PMID: 21505028 DOI: 10.1093/bioinformatics/btr169] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 01/07/2023] Open
Abstract
MOTIVATION: Structural variations and in particular copy number variations (CNVs) have dramatic effects on disease and traits. Technologies for identifying CNVs have been an active area of research for over 10 years. The current generation of high-throughput sequencing techniques presents new opportunities for identification of CNVs. Methods that utilize these technologies map sequencing reads to a reference genome and look for signatures which might indicate the presence of a CNV. These methods work well when CNVs lie within unique genomic regions. However, the problem of CNV identification and reconstruction becomes much more challenging when CNVs are in repeat-rich regions, due to the multiple mapping positions of the reads. RESULTS: In this study, we propose an efficient algorithm to handle these multi-mapping reads such that the CNVs can be reconstructed with high accuracy even for repeat-rich regions. To our knowledge, this is the first attempt to both identify and reconstruct CNVs in repeat-rich regions. Our experiments show that our method is not only computationally efficient but also accurate.
Affiliation(s)
- Dan He
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA
310
Casbon JA, Osborne RJ, Brenner S, Lichtenstein CP. A method for counting PCR template molecules with application to next-generation sequencing. Nucleic Acids Res 2011; 39:e81. [PMID: 21490082 PMCID: PMC3130290 DOI: 10.1093/nar/gkr217] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.1] [Indexed: 11/12/2022] Open
Abstract
Amplification by polymerase chain reaction is often used in the preparation of template DNA molecules for next-generation sequencing. Amplification increases the number of available molecules for sequencing but changes the representation of the template molecules in the amplified product and introduces random errors. Such changes in representation hinder applications requiring accurate quantification of template molecules, such as allele calling or estimation of microbial diversity. We present a simple method to count the number of template molecules using degenerate bases and show that it improves genotyping accuracy and removes noise from PCR amplification. This method can be easily added to existing DNA library preparation techniques and can improve the accuracy of variant calling.
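The counting idea in this abstract can be illustrated with a minimal sketch. It assumes each read at a locus has already been reduced to a (tag, allele) pair, where the tag is the degenerate-base sequence attached to the template before amplification. The function name, the input layout, and the example reads are hypothetical and are not taken from the paper; sequencing errors inside tags are ignored.

```python
from collections import Counter, defaultdict


def count_templates(reads):
    """Count distinct tags per allele (hypothetical helper, not the paper's code).

    reads: iterable of (tag, allele) pairs for a single locus.
    Reads sharing a tag are assumed to be PCR copies of one template molecule.
    """
    tags_by_allele = defaultdict(set)
    for tag, allele in reads:
        tags_by_allele[allele].add(tag)
    return {allele: len(tags) for allele, tags in tags_by_allele.items()}


# Illustrative input: the tag "ACGT" was amplified three times.
example_reads = [
    ("ACGT", "A"), ("ACGT", "A"), ("ACGT", "A"),
    ("CCAT", "A"),
    ("TTGA", "G"), ("TTGA", "G"),
]

raw_read_counts = Counter(allele for _, allele in example_reads)
template_counts = count_templates(example_reads)
```

Here the raw read counts (4 reads of A, 2 of G) reflect amplification, while the tag-based counts (2 templates of A, 1 of G) estimate how many starting molecules carried each allele.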
Affiliation(s)
- James A Casbon
- Population Genetics Technologies Ltd., Babraham Institute, Babraham, Cambridgeshire CB22 3AT, UK
311
Nord AS, Lee M, King MC, Walsh T. Accurate and exact CNV identification from targeted high-throughput sequence data. BMC Genomics 2011; 12:184. [PMID: 21486468 PMCID: PMC3088570 DOI: 10.1186/1471-2164-12-184] [Citation(s) in RCA: 156] [Impact Index Per Article: 12.0] [Received: 10/19/2010] [Accepted: 04/12/2011] [Indexed: 11/23/2022] Open
Abstract
Background: Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data. Results: Sequencing data is first scanned for gains and losses using a comparison of normalized coverage data between samples. CNV calls are confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate. Conclusions: Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.
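The first step described above, comparing normalized coverage between samples, might look roughly like the following sketch. It is not the authors' implementation: the per-sample normalization by total coverage, the use of the across-sample median as the reference, the 0.7 and 1.3 thresholds, and all names and numbers are assumptions chosen for illustration. The breakpoint-confirmation step is not shown.

```python
from statistics import median


def normalize(coverage):
    """Scale one sample's per-target coverage so it sums to 1."""
    total = sum(coverage)
    return [c / total for c in coverage]


def coverage_ratios(samples):
    """Ratio of each sample's normalized coverage to the across-sample median.

    samples: dict mapping sample name -> list of per-target coverage values.
    Assumes every target has nonzero median coverage.
    """
    norm = {name: normalize(cov) for name, cov in samples.items()}
    n_targets = len(next(iter(norm.values())))
    reference = [median(v[i] for v in norm.values()) for i in range(n_targets)]
    return {
        name: [v[i] / reference[i] for i in range(n_targets)]
        for name, v in norm.items()
    }


def call_targets(ratios, loss=0.7, gain=1.3):
    """Label each target; the thresholds are illustrative assumptions."""
    return ["loss" if r < loss else "gain" if r > gain else "neutral" for r in ratios]


# Hypothetical coverage for three samples over four targets.
example = {
    "s1": [100, 100, 100, 100],
    "s2": [200, 200, 200, 200],  # sequenced twice as deeply, same copy number
    "s3": [100, 50, 100, 100],   # reduced coverage at the second target
}
ratios = coverage_ratios(example)
calls_s3 = call_targets(ratios["s3"])
```

Normalizing first means that s2, which simply has more reads, is not mistaken for a gain; only the second target of s3 falls below the loss threshold.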
Affiliation(s)
- Alex S Nord
- Department of Genome Sciences, University of Washington, Seattle, 98195-7720, USA.
312
Winslow MM, Dayton TL, Verhaak RGW, Kim-Kiselak C, Snyder EL, Feldser DM, Hubbard DD, DuPage MJ, Whittaker CA, Hoersch S, Yoon S, Crowley D, Bronson RT, Chiang DY, Meyerson M, Jacks T. Suppression of lung adenocarcinoma progression by Nkx2-1. Nature 2011; 473:101-4. [PMID: 21471965 PMCID: PMC3088778 DOI: 10.1038/nature09881] [Citation(s) in RCA: 330] [Impact Index Per Article: 25.4] [Received: 06/23/2010] [Accepted: 01/31/2011] [Indexed: 01/17/2023]
Abstract
Despite the high prevalence and poor outcome of patients with metastatic lung cancer, the mechanisms of tumour progression and metastasis remain largely uncharacterized. We modelled human lung adenocarcinoma, which frequently harbours activating point mutations in KRAS [1] and inactivation of the p53-pathway [2], using conditional alleles in mice [3–5]. Lentiviral-mediated somatic activation of oncogenic Kras and deletion of p53 in the lung epithelial cells of KrasLSL-G12D/+;p53flox/flox mice initiates lung adenocarcinoma development [4]. Although tumours are initiated synchronously by defined genetic alterations, only a subset become malignant, suggesting that disease progression requires additional alterations. Identification of the lentiviral integration sites allowed us to distinguish metastatic from non-metastatic tumours and determine the gene expression alterations that distinguish these tumour types. Cross-species analysis identified the NK-2 related homeobox transcription factor Nkx2-1 (Ttf-1/Titf1) as a candidate suppressor of malignant progression. In this mouse model, Nkx2-1-negativity is pathognomonic of high-grade poorly differentiated tumours. Gain- and loss-of-function experiments in cells derived from metastatic and non-metastatic tumours demonstrated that Nkx2-1 controls tumour differentiation and limits metastatic potential in vivo. Interrogation of Nkx2-1 regulated genes, analysis of tumours at defined developmental stages, and functional complementation experiments indicate that Nkx2-1 constrains tumours in part by repressing the embryonically-restricted chromatin regulator Hmga2. While focal amplification of NKX2-1 in a fraction of human lung adenocarcinomas has focused attention on its oncogenic function [6–9], our data specifically link Nkx2-1 downregulation to loss of differentiation, enhanced tumour seeding ability, and increased metastatic proclivity. Thus, the oncogenic and suppressive functions of Nkx2-1 in the same tumour type substantiate its role as a dual function lineage factor.
Affiliation(s)
- Monte M Winslow
- David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
313
Tumour evolution inferred by single-cell sequencing. Nature 2011; 472:90-4. [PMID: 21399628 DOI: 10.1038/nature09807] [Citation(s) in RCA: 1866] [Impact Index Per Article: 143.5] [Received: 05/25/2010] [Accepted: 01/07/2011] [Indexed: 12/13/2022]
Abstract
Genomic analysis provides insights into the role of copy number variation in disease, but most methods are not designed to resolve mixed populations of cells. In tumours, where genetic heterogeneity is common, very important information may be lost that would be useful for reconstructing evolutionary history. Here we show that with flow-sorted nuclei, whole genome amplification and next generation sequencing we can accurately quantify genomic copy number within an individual nucleus. We apply single-nucleus sequencing to investigate tumour population structure and evolution in two human breast cancer cases. Analysis of 100 single cells from a polygenomic tumour revealed three distinct clonal subpopulations that probably represent sequential clonal expansions. Additional analysis of 100 single cells from a monogenomic primary tumour and its liver metastasis indicated that a single clonal expansion formed the primary tumour and seeded the metastasis. In both primary tumours, we also identified an unexpectedly abundant subpopulation of genetically diverse 'pseudodiploid' cells that do not travel to the metastatic site. In contrast to gradual models of tumour progression, our data indicate that tumours grow by punctuated clonal expansions with few persistent intermediates.
314
Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nat Rev Genet 2011; 12:363-76. [PMID: 21358748 DOI: 10.1038/nrg2958] [Citation(s) in RCA: 963] [Impact Index Per Article: 74.1] [Indexed: 12/13/2022]
Abstract
Comparisons of human genomes show that more base pairs are altered as a result of structural variation - including copy number variation - than as a result of point mutations. Here we review advances and challenges in the discovery and genotyping of structural variation. The recent application of massively parallel sequencing methods has complemented microarray-based methods and has led to an exponential increase in the discovery of smaller structural-variation events. Some global discovery biases remain, but the integration of experimental and computational approaches is proving fruitful for accurate characterization of the copy, content and structure of variable regions. We argue that the long-term goal should be routine, cost-effective and high quality de novo assembly of human genomes to comprehensively assess all classes of structural variation.
Affiliation(s)
- Can Alkan
- Department of Genome Sciences, University of Washington School of Medicine, Foege S413C, 3720 15th Ave NE, Seattle, Washington, USA
315
Mapping copy number variation by population-scale genome sequencing. Nature 2011; 470:59-65. [PMID: 21293372 PMCID: PMC3077050 DOI: 10.1038/nature09708] [Citation(s) in RCA: 821] [Impact Index Per Article: 63.2] [Received: 08/19/2010] [Accepted: 11/26/2010] [Indexed: 11/08/2022]
Abstract
Genomic structural variants (SVs) are abundant in humans, differing from other variation classes in extent, origin, and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (i.e., copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analyzing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
316
Aird D, Ross MG, Chen WS, Danielsson M, Fennell T, Russ C, Jaffe DB, Nusbaum C, Gnirke A. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol 2011; 12:R18. [PMID: 21338519 PMCID: PMC3188800 DOI: 10.1186/gb-2011-12-2-r18] [Citation(s) in RCA: 757] [Impact Index Per Article: 58.2] [Received: 10/20/2010] [Revised: 12/23/2010] [Accepted: 02/21/2011] [Indexed: 01/18/2023] Open
Abstract
Despite the ever-increasing output of Illumina sequencing data, loci with extreme base compositions are often under-represented or absent. To evaluate sources of base-composition bias, we traced genomic sequences ranging from 6% to 90% GC through the process by quantitative PCR. We identified PCR during library preparation as a principal source of bias and optimized the conditions. Our improved protocol significantly reduces amplification bias and minimizes the previously severe effects of PCR instrument and temperature ramp rate.
Affiliation(s)
- Daniel Aird
- Genome Sequencing and Analysis Program, Broad Institute of MIT and Harvard, 320 Charles Street, Cambridge, MA 02141, USA
317
Magi A, Benelli M, Yoon S, Roviello F, Torricelli F. Detecting common copy number variants in high-throughput sequencing data by using JointSLM algorithm. Nucleic Acids Res 2011; 39:e65. [PMID: 21321017 PMCID: PMC3105418 DOI: 10.1093/nar/gkr068] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Indexed: 12/02/2022] Open
Abstract
The discovery of genomic structural variants (SVs), such as copy number variants (CNVs), is essential to understand genetic variation of human populations and complex diseases. Over recent years, the advent of new high-throughput sequencing (HTS) platforms has opened many opportunities for SV discovery, and a very promising approach consists in measuring the depth of coverage (DOC) of reads aligned to the human reference genome. At present, few computational methods have been developed for the analysis of DOC data, and all of these methods can analyse only one sample at a time. For these reasons, we developed a novel algorithm (JointSLM) that detects common CNVs among individuals by analysing DOC data from multiple samples simultaneously. We test JointSLM performance on synthetic and real data and we show its unprecedented resolution that enables the detection of recurrent CNV regions as small as 500 bp in size. When we apply JointSLM to analyse chromosome one of eight genomes with different ancestry, we identify 3000 regions with recurrent CNVs of different frequency and size: hierarchical clustering on these regions segregates the eight individuals in two groups that reflect their ancestry, demonstrating the potential utility of JointSLM for population genetics studies.
Affiliation(s)
- Alberto Magi
- Laboratory Department, Diagnostic Genetic Unit, Careggi Hospital, Florence 5014, Italy.
318
Handsaker RE, Korn JM, Nemesh J, McCarroll SA. Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nat Genet 2011; 43:269-76. [PMID: 21317889 PMCID: PMC5094049 DOI: 10.1038/ng.768] [Citation(s) in RCA: 242] [Impact Index Per Article: 18.6] [Received: 08/23/2010] [Accepted: 01/20/2011] [Indexed: 11/09/2022]
Abstract
Accurate and complete analysis of genome variation in large populations will be required to understand the role of genome variation in complex disease. We present an analytical framework for characterizing genome deletion polymorphism in populations using sequence data that are distributed across hundreds or thousands of genomes. Our approach uses population-level concepts to reinterpret the technical features of sequence data that often reflect structural variation. In the 1000 Genomes Project pilot, this approach identified deletion polymorphism across 168 genomes (sequenced at 4 × average coverage) with sensitivity and specificity unmatched by other algorithms. We also describe a way to determine the allelic state or genotype of each deletion polymorphism in each genome; the 1000 Genomes Project used this approach to type 13,826 deletion polymorphisms (48-995,664 bp) at high accuracy in populations. These methods offer a way to relate genome structural polymorphism to complex disease in populations.
Affiliation(s)
- Robert E Handsaker
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
319
Abyzov A, Urban AE, Snyder M, Gerstein M. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res 2011; 21:974-84. [PMID: 21324876 DOI: 10.1101/gr.114876.110] [Citation(s) in RCA: 1089] [Impact Index Per Article: 83.8] [Indexed: 11/24/2022]
Abstract
Copy number variation (CNV) in the genome is a complex phenomenon, and not completely understood. We have developed a method, CNVnator, for CNV discovery and genotyping from read-depth (RD) analysis of personal genome sequencing. Our method is based on combining the established mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction) to broaden the range of discovered CNVs. We calibrated CNVnator using the extensive validation performed by the 1000 Genomes Project. Because of this, we could use CNVnator for CNV discovery and genotyping in a population and characterization of atypical CNVs, such as de novo and multi-allelic events. Overall, for CNVs accessible by RD, CNVnator has high sensitivity (86%-96%), low false-discovery rate (3%-20%), high genotyping accuracy (93%-95%), and high resolution in breakpoint discovery (<200 bp in 90% of cases with high sequencing coverage). Furthermore, CNVnator is complementary in a straightforward way to split-read and read-pair approaches: It misses CNVs created by retrotransposable elements, but more than half of the validated CNVs that it identifies are not detected by split-read or read-pair. By genotyping CNVs in the CEPH, Yoruba, and Chinese-Japanese populations, we estimated that at least 11% of all CNV loci involve complex, multi-allelic events, a considerably higher estimate than reported earlier. Moreover, among these events, we observed cases with allele distribution strongly deviating from Hardy-Weinberg equilibrium, possibly implying selection on certain complex loci. Finally, by combining discovery and genotyping, we identified six potential de novo CNVs in two family trios.
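The GC correction mentioned in this abstract can be illustrated with one common form of the adjustment: each bin's read depth is scaled by the ratio of the global mean depth to the mean depth of bins with the same GC content. Whether this matches CNVnator's exact procedure is an assumption of this sketch, and the function name, bin values, and GC percentages are hypothetical. Mean-shift partitioning is not shown.

```python
from collections import defaultdict
from statistics import mean


def gc_correct(depth, gc_percent):
    """Scale per-bin read depth by global mean / mean depth at the same GC.

    depth: read depth per genomic bin.
    gc_percent: GC content per bin, already rounded to an integer percentage.
    Assumes no GC class has a mean depth of zero.
    """
    global_mean = mean(depth)
    depth_by_gc = defaultdict(list)
    for d, g in zip(depth, gc_percent):
        depth_by_gc[g].append(d)
    gc_mean = {g: mean(values) for g, values in depth_by_gc.items()}
    return [d * global_mean / gc_mean[g] for d, g in zip(depth, gc_percent)]


# Hypothetical bins: 30% GC bins are systematically under-covered,
# and the last bin additionally has about half the depth of its GC peers.
raw_depth = [50, 50, 100, 100, 25]
gc = [30, 30, 50, 50, 30]
corrected = gc_correct(raw_depth, gc)
```

After correction the typical 30% GC bins and the 50% GC bins sit on a comparable scale, while the last bin still stands out as roughly half the depth of its GC peers, which is the kind of signal a read-depth caller would go on to segment.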
Affiliation(s)
- Alexej Abyzov
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA.
320
Perne A, Zhang X, Lehmann L, Groth M, Stuber F, Book M. Comparison of multiplex ligation-dependent probe amplification and real-time PCR accuracy for gene copy number quantification using the beta-defensin locus. Biotechniques 2011; 47:1023-8. [PMID: 20041854 DOI: 10.2144/000113300] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/23/2022] Open
Abstract
The reliable quantification of gene copy number variations is a precondition for future investigations regarding their functional relevance. To date, there is no generally accepted gold standard method for copy number quantification, and methods in current use have given inconsistent results in selected cohorts. In this study, we compare two methods for copy number quantification. Beta-defensin gene copy numbers were determined in parallel in 80 genomic DNA samples by real-time PCR and multiplex ligation-dependent probe amplification (MLPA). The pyrosequencing-based paralog ratio test (PPRT) was used as a standard of comparison in 79 out of 80 samples. Real-time PCR and MLPA results confirmed concordant DEFB4, DEFB103A, and DEFB104A copy numbers within samples. These two methods showed identical results in 32 out of 80 samples; 29 of these 32 samples comprised four or fewer copies. The coefficient of variation of MLPA is lower compared with PCR. In addition, the consistency between MLPA and PPRT is higher than either PCR/MLPA or PCR/PPRT consistency. In summary, these results suggest that MLPA is superior to real-time PCR in beta-defensin copy number quantification.
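The coefficient of variation used for the comparison above is the sample standard deviation divided by the mean. The sketch below shows the arithmetic only; the replicate values are invented for illustration and are not data from the study, and the function name is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (requires >= 2 values)."""
    return stdev(values) / mean(values)


# Invented replicate copy-number estimates for one sample, not study data.
mlpa_replicates = [4.0, 4.1, 3.9]
pcr_replicates = [4.0, 4.6, 3.4]

cv_mlpa = coefficient_of_variation(mlpa_replicates)
cv_pcr = coefficient_of_variation(pcr_replicates)
```

Both invented series have the same mean of 4 copies, so the lower value for the first series reflects tighter agreement between replicates rather than a different copy number estimate.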
Affiliation(s)
- Andrea Perne
- Department of Anaesthesiology and Intensive Care Medicine, University Hospital Bonn, Bonn, Germany
|
321
|
Miller CA, Hampton O, Coarfa C, Milosavljevic A. ReadDepth: a parallel R package for detecting copy number alterations from short sequencing reads. PLoS One 2011; 6:e16327. [PMID: 21305028 PMCID: PMC3031566 DOI: 10.1371/journal.pone.0016327] [Citation(s) in RCA: 149] [Impact Index Per Article: 11.5] [Received: 09/28/2010] [Accepted: 12/10/2010] [Indexed: 11/18/2022]
Abstract
Copy number alterations are important contributors to many genetic diseases, including cancer. We present the readDepth package for R, which can detect these aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. In addition to achieving higher accuracy than existing packages, our tool runs much faster by utilizing multi-core architectures to parallelize the processing of these large data sets. In contrast to other published methods, readDepth does not require the sequencing of a reference sample, and uses a robust statistical model that accounts for overdispersed data. It includes a method for effectively increasing the resolution obtained from low-coverage experiments by utilizing breakpoint information from paired end sequencing to do positional refinement. We also demonstrate a method for inferring copy number using reads generated by whole-genome bisulfite sequencing, thus enabling integrative study of epigenomic and copy number alterations. Finally, we apply this tool to two genomes, showing that it performs well on genomes sequenced to both low and high coverage. The readDepth package runs on Linux and MacOSX, is released under the Apache 2.0 license, and is available at http://code.google.com/p/readdepth/.
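The remark about overdispersed data can be made concrete with the index of dispersion (variance divided by mean) of per-bin read counts: a Poisson process gives a value near 1, and clearly larger values indicate overdispersion. This is only an illustrative check in Python, not the statistical model used by readDepth (which is an R package); the function name and the counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(bin_counts):
    """Sample variance divided by mean for per-bin read counts."""
    m = mean(bin_counts)
    if m == 0:
        raise ValueError("no reads in any bin")
    return variance(bin_counts) / m


# Hypothetical read counts in equally sized genomic bins.
counts = [95, 130, 60, 150, 88, 40, 170, 102]
print(dispersion_index(counts) > 1.0)  # True here: more spread than Poisson
```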
Affiliation(s)
- Christopher A. Miller
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Oliver Hampton
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Cristian Coarfa
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Aleksandar Milosavljevic
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
|
322
|
Xi R, Kim TM, Park PJ. Detecting structural variations in the human genome using next generation sequencing. Brief Funct Genomics 2011; 9:405-15. [PMID: 21216738 DOI: 10.1093/bfgp/elq025] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 02/06/2023]
Abstract
Structural variations are widespread in the human genome and can serve as genetic markers in clinical and evolutionary studies. With the advances in the next-generation sequencing technology, recent methods allow for identification of structural variations with unprecedented resolution and accuracy. They also provide opportunities to discover variants that could not be detected on conventional microarray-based platforms, such as dosage-invariant chromosomal translocations and inversions. In this review, we will describe some of the sequencing-based algorithms for detection of structural variations and discuss the key issues in future development.
Affiliation(s)
- Ruibin Xi
- Center for Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
|
323
|
Aird D, Ross MG, Chen WS, Danielsson M, Fennell T, Russ C, Jaffe DB, Nusbaum C, Gnirke A. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol 2011. [PMID: 21338519 DOI: 10.1186/1465-6906-12-s1-i18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Despite the ever-increasing output of Illumina sequencing data, loci with extreme base compositions are often under-represented or absent. To evaluate sources of base-composition bias, we traced genomic sequences ranging from 6% to 90% GC through the process by quantitative PCR. We identified PCR during library preparation as a principal source of bias and optimized the conditions. Our improved protocol significantly reduces amplification bias and minimizes the previously severe effects of PCR instrument and temperature ramp rate.
Affiliation(s)
- Daniel Aird
- Genome Sequencing and Analysis Program, Broad Institute of MIT and Harvard, 320 Charles Street, Cambridge, MA 02141, USA
|
324
|
He D, Furlotte N, Eskin E. Detection and reconstruction of tandemly organized de novo copy number variations. BMC Bioinformatics 2010; 11 Suppl 11:S12. [PMID: 21172047 PMCID: PMC3024866 DOI: 10.1186/1471-2105-11-s11-s12] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/23/2022]
Abstract
Background: The characterization of structural variations (SV) such as insertions, deletions and copy number variations is a critical step in the process of understanding the full genetic architecture of organisms. Copy number variations (CNV) have attracted much recent attention due to their effects on gene expression and disease status. Results: In this paper, we present a method that utilizes next-generation sequencing technologies (NGS), in order to both detect and reconstruct CNVs. We focus on a special type of CNV, namely tandemly organized de novo CNVs, which have been shown to occur with high frequency in the mouse genome. Conclusions: We apply our method to CNV regions randomly inserted into the reference mouse genome and show that our method achieves good performance for both detection and reconstruction of tandemly organized de novo CNVs.
Affiliation(s)
- Dan He
- Dept. of Comp. Sci., Univ. of California Los Angeles, Los Angeles, CA 90095, USA.
|
325
|
Zhao Q, Kirkness EF, Caballero OL, Galante PA, Parmigiani RB, Edsall L, Kuan S, Ye Z, Levy S, Vasconcelos ATR, Ren B, de Souza SJ, Camargo AA, Simpson AJG, Strausberg RL. Systematic detection of putative tumor suppressor genes through the combined use of exome and transcriptome sequencing. Genome Biol 2010; 11:R114. [PMID: 21108794 PMCID: PMC3156953 DOI: 10.1186/gb-2010-11-11-r114] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 07/06/2010] [Revised: 09/27/2010] [Accepted: 11/25/2010] [Indexed: 01/12/2023]
Abstract
BACKGROUND: To identify potential tumor suppressor genes, genome-wide data from exome and transcriptome sequencing were combined to search for genes with loss of heterozygosity and allele-specific expression. The analysis was conducted on the breast cancer cell line HCC1954, and a lymphoblast cell line from the same individual, HCC1954BL. RESULTS: By comparing exome sequences from the two cell lines, we identified loss of heterozygosity events at 403 genes in HCC1954 and at one gene in HCC1954BL. The combination of exome and transcriptome sequence data also revealed 86 and 50 genes with allele specific expression events in HCC1954 and HCC1954BL, which comprise 5.4% and 2.6% of genes surveyed, respectively. Many of these genes identified by loss of heterozygosity and allele-specific expression are known or putative tumor suppressor genes, such as BRCA1, MSH3 and SETX, which participate in DNA repair pathways. CONCLUSIONS: Our results demonstrate that the combined application of high throughput sequencing to exome and allele-specific transcriptome analysis can reveal genes with known tumor suppressor characteristics, and a shortlist of novel candidates for the study of tumor suppressor activities.
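The comparison of a tumor line with its matched normal line for loss of heterozygosity can be sketched as a site-by-site genotype check: a site qualifies when the normal sample is heterozygous and the tumor sample is homozygous for one of those two alleles. This is a simplified illustration, not the authors' pipeline; the function name, the genotype representation and the example sites are all hypothetical.

```python
def loh_sites(normal_genotypes, tumor_genotypes):
    """Return sites heterozygous in the normal sample but homozygous
    in the tumor sample for one of the normal alleles.

    Both arguments map a site label to a pair of allele characters.
    """
    hits = []
    for site, normal_gt in normal_genotypes.items():
        tumor_gt = tumor_genotypes.get(site)
        if tumor_gt is None:
            continue  # site not called in the tumor sample
        normal_is_het = len(set(normal_gt)) == 2
        tumor_is_hom = len(set(tumor_gt)) == 1
        if normal_is_het and tumor_is_hom and tumor_gt[0] in normal_gt:
            hits.append(site)
    return hits


# Hypothetical genotype calls at three sites.
normal = {"chr1:100": ("A", "G"), "chr1:200": ("C", "C"), "chr1:300": ("T", "G")}
tumor = {"chr1:100": ("A", "A"), "chr1:200": ("C", "C"), "chr1:300": ("T", "G")}
print(loh_sites(normal, tumor))
```

Real analyses work from read counts and must allow for sequencing error and uneven coverage, which this sketch ignores.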
Affiliation(s)
- Qi Zhao
- Ludwig Collaborative Group, Department of Neurosurgery, Johns Hopkins University, 1550 Orleans Street, Baltimore, MD 21231, USA
- Ewen F Kirkness
- J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA
- Otavia L Caballero
- Ludwig Collaborative Group, Department of Neurosurgery, Johns Hopkins University, 1550 Orleans Street, Baltimore, MD 21231, USA
- Pedro A Galante
- Ludwig Institute for Cancer Research, São Paulo Branch at Hospital Alemão Oswaldo Cruz, Rua João Julião 245, 01323-903 São Paulo, Brazil
- Raphael B Parmigiani
- Ludwig Institute for Cancer Research, São Paulo Branch at Hospital Alemão Oswaldo Cruz, Rua João Julião 245, 01323-903 São Paulo, Brazil
- Lee Edsall
- Ludwig Institute for Cancer Research, San Diego Branch, 9500 Gilman Drive, La Jolla, CA 92093-0660, USA
- Samantha Kuan
- Ludwig Institute for Cancer Research, San Diego Branch, 9500 Gilman Drive, La Jolla, CA 92093-0660, USA
- Zhen Ye
- Ludwig Institute for Cancer Research, San Diego Branch, 9500 Gilman Drive, La Jolla, CA 92093-0660, USA
- Samuel Levy
- Scripps Translational Science Institute, 3344 North Torrey Pines Court, La Jolla, CA 92037, USA
- Ana Tereza R Vasconcelos
- Laboratório Nacional de Computação Científica, Laboratório de Bioinformática, Av. Getúlio Vargas 333, Petrópolis, RJ 25651-075, Brazil
- Bing Ren
- Ludwig Institute for Cancer Research, San Diego Branch, 9500 Gilman Drive, La Jolla, CA 92093-0660, USA
- Sandro J de Souza
- Ludwig Institute for Cancer Research, São Paulo Branch at Hospital Alemão Oswaldo Cruz, Rua João Julião 245, 01323-903 São Paulo, Brazil
- Anamaria A Camargo
- Ludwig Institute for Cancer Research, São Paulo Branch at Hospital Alemão Oswaldo Cruz, Rua João Julião 245, 01323-903 São Paulo, Brazil
- Andrew JG Simpson
- Ludwig Collaborative Group, Department of Neurosurgery, Johns Hopkins University, 1550 Orleans Street, Baltimore, MD 21231, USA
- Robert L Strausberg
- Ludwig Collaborative Group, Department of Neurosurgery, Johns Hopkins University, 1550 Orleans Street, Baltimore, MD 21231, USA
|
326
|
Boeva V, Zinovyev A, Bleakley K, Vert JP, Janoueix-Lerosey I, Delattre O, Barillot E. Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics 2010; 27:268-9. [PMID: 21081509 PMCID: PMC3018818 DOI: 10.1093/bioinformatics/btq635] [Citation(s) in RCA: 183] [Impact Index Per Article: 13.1] [Indexed: 12/18/2022]
Abstract
Summary: We present a tool for control-free copy number alteration (CNA) detection using deep-sequencing data, particularly useful for cancer studies. The tool deals with two frequent problems in the analysis of cancer deep-sequencing data: absence of control sample and possible polyploidy of cancer cells. FREEC (control-FREE Copy number caller) automatically normalizes and segments copy number profiles (CNPs) and calls CNAs. If ploidy is known, FREEC assigns absolute copy number to each predicted CNA. To normalize raw CNPs, the user can provide a control dataset if available; otherwise GC content is used. We demonstrate that for Illumina single-end, mate-pair or paired-end sequencing, GC-content normalization provides smooth profiles that can be further segmented and analyzed in order to predict CNAs. Availability: Source code and sample data are available at http://bioinfo-out.curie.fr/projects/freec/. Contact: freec@curie.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
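The idea of normalizing a raw copy number profile by GC content when no control sample exists can be sketched as follows: windows are grouped by GC fraction, and each window's read count is divided by the median count of its GC group. This median-based scheme is a simple stand-in chosen for illustration and is not taken from FREEC; the function name, the `bin_width` parameter and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(read_counts, gc_fractions, bin_width=0.01):
    """Divide each window's read count by the median count of all
    windows with a similar GC fraction (ratio near 1 = typical depth)."""
    groups = defaultdict(list)
    for count, gc in zip(read_counts, gc_fractions):
        groups[round(gc / bin_width)].append(count)
    medians = {key: median(values) for key, values in groups.items()}
    normalized = []
    for count, gc in zip(read_counts, gc_fractions):
        m = medians[round(gc / bin_width)]
        # Windows in a GC group with zero median depth are uninformative.
        normalized.append(count / m if m > 0 else float("nan"))
    return normalized


# Hypothetical windows: the third has twice the typical depth for its GC.
counts = [100, 100, 200, 50, 50]
gc = [0.40, 0.40, 0.40, 0.60, 0.60]
print(gc_normalize(counts, gc))
```

The normalized ratios would then be passed to a segmentation step, which this sketch does not cover.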
|
327
|
Waszak SM, Hasin Y, Zichner T, Olender T, Keydar I, Khen M, Stütz AM, Schlattl A, Lancet D, Korbel JO. Systematic inference of copy-number genotypes from personal genome sequencing data reveals extensive olfactory receptor gene content diversity. PLoS Comput Biol 2010; 6:e1000988. [PMID: 21085617 PMCID: PMC2978733 DOI: 10.1371/journal.pcbi.1000988] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Received: 05/10/2010] [Accepted: 10/05/2010] [Indexed: 12/02/2022]
Abstract
Copy-number variations (CNVs) are widespread in the human genome, but comprehensive assignments of integer locus copy-numbers (i.e., copy-number genotypes) that, for example, enable discrimination of homozygous from heterozygous CNVs, have remained challenging. Here we present CopySeq, a novel computational approach with an underlying statistical framework that analyzes the depth-of-coverage of high-throughput DNA sequencing reads, and can incorporate paired-end and breakpoint junction analysis based CNV-analysis approaches, to infer locus copy-number genotypes. We benchmarked CopySeq by genotyping 500 chromosome 1 CNV regions in 150 personal genomes sequenced at low-coverage. The assessed copy-number genotypes were highly concordant with our performed qPCR experiments (Pearson correlation coefficient 0.94), and with the published results of two microarray platforms (95–99% concordance). We further demonstrated the utility of CopySeq for analyzing gene regions enriched for segmental duplications by comprehensively inferring copy-number genotypes in the CNV-enriched >800 olfactory receptor (OR) human gene and pseudogene loci. CopySeq revealed that OR loci display an extensive range of locus copy-numbers across individuals, with zero to two copies in some OR loci, and two to nine copies in others. Among genetic variants affecting OR loci we identified deleterious variants including CNVs and SNPs affecting ∼15% and ∼20% of the human OR gene repertoire, respectively, implying that genetic variants with a possible impact on smell perception are widespread. Finally, we found that for several OR loci the reference genome appears to represent a minor-frequency variant, implying a necessary revision of the OR repertoire for future functional studies. 
CopySeq can ascertain genomic structural variation in specific gene families as well as at a genome-wide scale, where it may enable the quantitative evaluation of CNVs in genome-wide association studies involving high-throughput sequencing. Human individual genome sequencing has recently become affordable, enabling highly detailed genetic sequence comparisons. While the identification and genotyping of single-nucleotide polymorphisms has already been successfully established for different sequencing platforms, the detection, quantification and genotyping of large-scale copy-number variants (CNVs), i.e., losses or gains of long genomic segments, has remained challenging. We present a computational approach that enables detecting CNVs in sequencing data and accurately identifies the actual copy-number at which DNA segments of interest occur in an individual genome. This approach enabled us to obtain novel insights into the largest human gene family – the olfactory receptors (ORs) – involved in smell perception. While previous studies reported an abundance of CNVs in ORs, our approach enabled us to globally identify absolute differences in OR gene counts that exist between humans. While several OR genes have very high gene counts, other ORs are found only once or are missing entirely in some individuals. The latter have a particularly high probability of influencing individual differences in the perception of smell, a question that future experimental efforts can now address. Furthermore, we observed differences in OR gene counts between populations, pointing at ORs that might contribute to population-specific differences in smell.
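An integer copy-number genotype can be read off depth-of-coverage in its simplest form by scaling a region's depth against the depth of regions assumed to be present in two copies and rounding. The sketch below shows only that arithmetic; CopySeq itself uses a statistical framework that is not reproduced here, and the function name and the depths are hypothetical.

```python
def copy_number_genotype(region_depth, reference_depth, reference_copies=2):
    """Nearest-integer copy number of a region, given its mean read depth
    and the mean depth of regions assumed to carry `reference_copies`."""
    if reference_depth <= 0:
        raise ValueError("reference depth must be positive")
    # Note: Python rounds exact .5 values to the nearest even integer.
    return round(reference_copies * region_depth / reference_depth)


# Hypothetical depths with a 30x diploid baseline.
print(copy_number_genotype(15.0, 30.0))  # heterozygous deletion
print(copy_number_genotype(0.4, 30.0))   # homozygous deletion
print(copy_number_genotype(61.0, 30.0))  # duplication to four copies
```

At low coverage the ratio is noisy, which is why a probabilistic assignment is preferable to plain rounding.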
Affiliation(s)
- Sebastian M. Waszak
- Department of Molecular Genetics, Crown Human Genome Center, Weizmann Institute of Science, Rehovot, Israel
- Department of Biotechnology and Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Freising, Germany
- Genome Biology Research Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Yehudit Hasin
- Department of Molecular Genetics, Crown Human Genome Center, Weizmann Institute of Science, Rehovot, Israel
- Thomas Zichner
- Genome Biology Research Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Tsviya Olender
- Department of Molecular Genetics, Crown Human Genome Center, Weizmann Institute of Science, Rehovot, Israel
- Ifat Keydar
- Department of Molecular Genetics, Crown Human Genome Center, Weizmann Institute of Science, Rehovot, Israel
- Miriam Khen
- Department of Molecular Genetics, Crown Human Genome Center, Weizmann Institute of Science, Rehovot, Israel
- Adrian M. Stütz
- Genome Biology Research Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Andreas Schlattl
- Genome Biology Research Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Doron Lancet
- Department of Molecular Genetics, Crown Human Genome Center, Weizmann Institute of Science, Rehovot, Israel
- Jan O. Korbel
- Genome Biology Research Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- European Bioinformatics Institute, EMBL-EBI, Hinxton, United Kingdom
|
328
|
Hong D, Park SS, Ju YS, Kim S, Shin JY, Kim S, Yu SB, Lee WC, Lee S, Park H, Kim JI, Seo JS. TIARA: a database for accurate analysis of multiple personal genomes based on cross-technology. Nucleic Acids Res 2010; 39:D883-8. [PMID: 21051338 PMCID: PMC3013693 DOI: 10.1093/nar/gkq1101] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 12/20/2022]
Abstract
High-throughput genomic technologies have been used to explore personal human genomes for the past few years. Although the integration of technologies is important for high-accuracy detection of personal genomic variations, no databases have been prepared to systematically archive genomes and to facilitate the comparison of personal genomic data sets prepared using a variety of experimental platforms. We describe here the Total Integrated Archive of Short-Read and Array (TIARA; http://tiara.gmi.ac.kr) database, which contains personal genomic information obtained from next generation sequencing (NGS) techniques and ultra-high-resolution comparative genomic hybridization (CGH) arrays. This database improves the accuracy of detecting personal genomic variations, such as SNPs, short indels and structural variants (SVs). At present, 36 individual genomes have been archived and may be displayed in the database. TIARA supports a user-friendly genome browser, which retrieves read-depths (RDs) and log2 ratios from NGS and CGH arrays, respectively. In addition, this database provides information on all genomic variants and the raw data, including short reads and feature-level CGH data, through anonymous file transfer protocol. More personal genomes will be archived as more individuals are analyzed by NGS or CGH array. TIARA provides a new approach to the accurate interpretation of personal genomes for genome research.
Affiliation(s)
- Dongwan Hong
- Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul 110-799, Korea
|
329
|
Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing. Nat Genet 2010; 42:931-6. [PMID: 20972442 DOI: 10.1038/ng.691] [Citation(s) in RCA: 98] [Impact Index Per Article: 7.0] [Received: 02/18/2010] [Accepted: 09/10/2010] [Indexed: 12/13/2022]
Abstract
We report the analysis of a Japanese male using high-throughput sequencing to 40× coverage. More than 99% of the sequence reads were mapped to the reference human genome. Using a Bayesian decision method, we identified 3,132,608 single nucleotide variations (SNVs). Comparison with six previously reported genomes revealed an excess of singleton nonsense and nonsynonymous SNVs, as well as singleton SNVs in conserved non-coding regions. We also identified 5,319 deletions smaller than 10 kb with high accuracy, in addition to copy number variations and rearrangements. De novo assembly of the unmapped sequence reads generated around 3 Mb of novel sequence, which showed high similarity to non-reference human genomes and the human herpesvirus 4 genome. Our analysis suggests that considerable variation remains undiscovered in the human genome and that whole-genome sequencing is an invaluable tool for obtaining a complete understanding of human genetic variation.
|
330
|
Ivakhno S, Royce T, Cox AJ, Evers DJ, Cheetham RK, Tavaré S. CNAseg—a novel framework for identification of copy number changes in cancer from second-generation sequencing data. Bioinformatics 2010; 26:3051-8. [DOI: 10.1093/bioinformatics/btq587] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Indexed: 01/11/2023]
|
331
|
Marcinkowska M, Wong KK, Kwiatkowski DJ, Kozlowski P. Design and generation of MLPA probe sets for combined copy number and small-mutation analysis of human genes: EGFR as an example. ScientificWorldJournal 2010; 10:2003-18. [PMID: 20953551 PMCID: PMC4004796 DOI: 10.1100/tsw.2010.195] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 01/12/2023]
Abstract
Multiplex ligation-dependent probe amplification (MLPA) is a multiplex copy number analysis method that is routinely used to identify large mutations in many clinical and research labs. One of the most important drawbacks of the standard MLPA setup is a complicated, and therefore expensive, procedure of generating long MLPA probes. This drawback substantially limits the applicability of MLPA to those genomic regions for which ready-to-use commercial kits are available. Here we present a simple protocol for designing MLPA probe sets that are composed entirely of short oligonucleotide half-probes generated through chemical synthesis. As an example, we present the design and generation of an MLPA assay for parallel copy number and small-mutation analysis of the EGFR gene.
Affiliation(s)
- Malgorzata Marcinkowska
- Laboratory of Cancer Genetics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
|
332
|
Yau C, Mouradov D, Jorissen RN, Colella S, Mirza G, Steers G, Harris A, Ragoussis J, Sieber O, Holmes CC. A statistical approach for detecting genomic aberrations in heterogeneous tumor samples from single nucleotide polymorphism genotyping data. Genome Biol 2010; 11:R92. [PMID: 20858232 PMCID: PMC2965384 DOI: 10.1186/gb-2010-11-9-r92] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.6] [Received: 04/25/2010] [Revised: 08/20/2010] [Accepted: 09/21/2010] [Indexed: 11/26/2022]
Abstract
We describe a statistical method for the characterization of genomic aberrations in single nucleotide polymorphism microarray data acquired from cancer genomes. Our approach allows us to model the joint effect of polyploidy, normal DNA contamination and intra-tumour heterogeneity within a single unified Bayesian framework. We demonstrate the efficacy of our method on numerous datasets including laboratory generated mixtures of normal-cancer cell lines and real primary tumours.
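The effect of normal DNA contamination mentioned above can be seen from a simple mixture calculation: the average copy number observed in a bulk sample is the tumor copy number weighted by the tumor cell fraction plus the normal copy number weighted by the remainder. This sketch covers only that one effect and is not the Bayesian model of the paper; the function and parameter names are hypothetical.

```python
def expected_mixture_copy_number(tumor_copy_number, tumor_fraction,
                                 normal_copy_number=2):
    """Average copy number in a bulk sample that mixes tumor cells
    (fraction `tumor_fraction`) with normal cells."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie between 0 and 1")
    return (tumor_fraction * tumor_copy_number
            + (1.0 - tumor_fraction) * normal_copy_number)


# A four-copy gain in a sample that is half normal cells looks like
# three copies on average.
print(expected_mixture_copy_number(4, 0.5))
```

Because different combinations of copy number and tumor fraction can give the same average, the two generally have to be estimated jointly.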
Affiliation(s)
- Christopher Yau
- Department of Statistics, University of Oxford, South Parks Road, Oxford, OX1 3TG, UK.
|
333
|
Meyerson M, Gabriel S, Getz G. Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet 2010; 11:685-96. [PMID: 20847746 DOI: 10.1038/nrg2841] [Citation(s) in RCA: 766] [Impact Index Per Article: 54.7] [Indexed: 02/07/2023]
|
334
|
Ding L, Wendl MC, Koboldt DC, Mardis ER. Analysis of next-generation genomic data in cancer: accomplishments and challenges. Hum Mol Genet 2010; 19:R188-96. [PMID: 20843826 DOI: 10.1093/hmg/ddq391] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.9] [Indexed: 01/05/2023]
Abstract
The application of next-generation sequencing technology has produced a transformation in cancer genomics, generating large data sets that can be analyzed in different ways to answer a multitude of questions about the genomic alterations associated with the disease. Analytical approaches can discover focused mutations such as substitutions and small insertion/deletions, large structural alterations and copy number events. As our capacity to produce such data for multiple cancers of the same type is improving, so are the demands to analyze multiple tumor genomes simultaneously growing. For example, pathway-based analyses that provide the full mutational impact on cellular protein networks and correlation analyses aimed at revealing causal relationships between genomic alterations and clinical presentations are both enabled. As the repertoire of data grows to include mRNA-seq, non-coding RNA-seq and methylation for multiple genomes, our challenge will be to intelligently integrate data types and genomes to produce a coherent picture of the genetic basis of cancer.
Affiliation(s)
- Li Ding
- Department of Genetics, The Genome Center at Washington University School of Medicine, 4444 Forest Park Blvd., St Louis, MO 63108, USA
|
335
|
Magi A, Benelli M, Gozzini A, Girolami F, Torricelli F, Brandi ML. Bioinformatics for next generation sequencing data. Genes (Basel) 2010; 1:294-307. [PMID: 24710047 PMCID: PMC3954090 DOI: 10.3390/genes1020294] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Received: 07/27/2010] [Revised: 08/30/2010] [Accepted: 09/14/2010] [Indexed: 12/31/2022]
Abstract
The emergence of next-generation sequencing (NGS) platforms imposes increasing demands on statistical methods and bioinformatic tools for the analysis and the management of the huge amounts of data generated by these technologies. Even at the early stages of their commercial availability, a large number of software tools already exist for analyzing NGS data. These tools fit into several general categories, including alignment of sequence reads to a reference, base-calling and/or polymorphism detection, de novo assembly from paired or unpaired reads, structural variant detection and genome browsing. This manuscript aims to guide readers in choosing among the available computational tools for each step of the data analysis workflow.
Affiliation(s)
- Alberto Magi
- Diagnostic Genetic Unit, Careggi Hospital, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy.
- Matteo Benelli
- Diagnostic Genetic Unit, Careggi Hospital, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy.
- Alessia Gozzini
- Diagnostic Genetic Unit, Careggi Hospital, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy.
- Francesca Girolami
- Diagnostic Genetic Unit, Careggi Hospital, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy.
- Francesca Torricelli
- Diagnostic Genetic Unit, Careggi Hospital, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy.
- Maria Luisa Brandi
- Department of Internal Medicine, University of Florence Medical School, Florence, Italy.
|
336
|
Nishant KT, Wei W, Mancera E, Argueso JL, Schlattl A, Delhomme N, Ma X, Bustamante CD, Korbel JO, Gu Z, Steinmetz LM, Alani E. The baker's yeast diploid genome is remarkably stable in vegetative growth and meiosis. PLoS Genet 2010; 6:e1001109. [PMID: 20838597 PMCID: PMC2936533 DOI: 10.1371/journal.pgen.1001109] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Received: 05/11/2010] [Accepted: 08/03/2010] [Indexed: 11/18/2022]
Abstract
Accurate estimates of mutation rates provide critical information to analyze genome evolution and organism fitness. We used whole-genome DNA sequencing, pulse-field gel electrophoresis, and comparative genome hybridization to determine mutation rates in diploid vegetative and meiotic mutation accumulation lines of Saccharomyces cerevisiae. The vegetative lines underwent only mitotic divisions while the meiotic lines underwent a meiotic cycle every ∼20 vegetative divisions. Similar base substitution rates were estimated for both lines. Given our experimental design, these measures indicated that the meiotic mutation rate is within the range of being equal to zero to being 55-fold higher than the vegetative rate. Mutations detected in vegetative lines were all heterozygous while those in meiotic lines were homozygous. A quantitative analysis of intra-tetrad mating events in the meiotic lines showed that inter-spore mating is primarily responsible for rapidly fixing mutations to homozygosity as well as for removing mutations. We did not observe 1-2 nt insertion/deletion (in-del) mutations in any of the sequenced lines and only one structural variant in a non-telomeric location was found. However, a large number of structural variations in subtelomeric sequences were seen in both vegetative and meiotic lines that did not affect viability. Our results indicate that the diploid yeast nuclear genome is remarkably stable during the vegetative and meiotic cell cycles and support the hypothesis that peripheral regions of chromosomes are more dynamic than gene-rich central sections where structural rearrangements could be deleterious. This work also provides an improved estimate for the mutational load carried by diploid organisms.
Affiliation(s)
- K. T. Nishant
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
- Wu Wei
- European Molecular Biology Laboratory, Heidelberg, Germany
- Juan Lucas Argueso
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina, United States of America
- Xin Ma
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Carlos D. Bustamante
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Jan O. Korbel
- European Molecular Biology Laboratory, Heidelberg, Germany
- Zhenglong Gu
- Division of Nutritional Sciences, Cornell University, Ithaca, New York, United States of America
- Lars M. Steinmetz
- European Molecular Biology Laboratory, Heidelberg, Germany
- * E-mail: (LMS); (EA)
- Eric Alani
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
- * E-mail: (LMS); (EA)
337
Medvedev P, Fiume M, Dzamba M, Smith T, Brudno M. Detecting copy number variation with mated short reads. Genome Res 2010; 20:1613-22. [PMID: 20805290 DOI: 10.1101/gr.106344.110] [Citation(s) in RCA: 119] [Impact Index Per Article: 8.5] [Indexed: 01/09/2023]
Abstract
The development of high-throughput sequencing (HTS) technologies has opened the door to novel methods for detecting copy number variants (CNVs) in the human genome. While in the past CNVs have been detected based on array CGH data, recent studies have shown that depth-of-coverage information from HTS technologies can also be used for the reliable identification of large copy-variable regions. Such methods, however, are hindered by sequencing biases that lead certain regions of the genome to be over- or undersampled, lowering their resolution and ability to accurately identify the exact breakpoints of the variants. In this work, we develop a method for CNV detection that supplements the depth-of-coverage with paired-end mapping information, where mate pairs mapping discordantly to the reference serve to indicate the presence of variation. Our algorithm, called CNVer, combines this information within a unified computational framework called the donor graph, allowing us to better mitigate the sequencing biases that cause uneven local coverage and accurately predict CNVs. We use CNVer to detect 4879 CNVs in the recently described genome of a Yoruban individual. Most of the calls (77%) coincide with previously known variants within the Database of Genomic Variants, while 81% of deletion copy number variants previously known for this individual coincide with one of our loss calls. Furthermore, we demonstrate that CNVer can reconstruct the absolute copy counts of segments of the donor genome and evaluate the feasibility of using CNVer with low coverage datasets.
Affiliation(s)
- Paul Medvedev
- Department of Computer Science, University of Toronto, Toronto, Ontario M5R 3G4, Canada
338
Ju YS, Hong D, Kim S, Park SS, Kim S, Lee S, Park H, Kim JI, Seo JS. Reference-unbiased copy number variant analysis using CGH microarrays. Nucleic Acids Res 2010; 38:e190. [PMID: 20802225 PMCID: PMC2978381 DOI: 10.1093/nar/gkq730] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 12/31/2022]
Abstract
Comparative genomic hybridization (CGH) microarrays have been used to determine copy number variations (CNVs) and their effects on complex diseases. Detection of absolute CNVs independent of genomic variants of an arbitrary reference sample has been a critical issue in CGH array experiments. Whole genome analysis using massively parallel sequencing with multiple ultra-high resolution CGH arrays provides an opportunity to catalog highly accurate genomic variants of the reference DNA (NA10851). Using information on variants, we developed a new method, the CGH array reference-free algorithm (CARA), which can determine reference-unbiased absolute CNVs from any CGH array platform. The algorithm enables the removal and rescue of false positive and false negative CNVs, respectively, which appear due to the effects of genomic variants of the reference sample in raw CGH array experiments. We found that the CARA remarkably enhanced the accuracy of CGH array in determining absolute CNVs. Our method thus provides a new approach to interpret CGH array data for personalized medicine.
Affiliation(s)
- Young Seok Ju
- Genomic Medicine Institute, Medical Research Center, Seoul National University, Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine, Seoul 110-799, Korea
339
Kim TM, Luquette LJ, Xi R, Park PJ. rSW-seq: algorithm for detection of copy number alterations in deep sequencing data. BMC Bioinformatics 2010; 11:432. [PMID: 20718989 PMCID: PMC2939611 DOI: 10.1186/1471-2105-11-432] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 12/31/2009] [Accepted: 08/18/2010] [Indexed: 02/05/2023]
Abstract
Background Recent advances in sequencing technologies have enabled the generation of large-scale genome sequencing data. These data can be used to characterize a variety of genomic features, including the DNA copy number profile of a cancer genome. A robust and reliable method for screening chromosomal alterations would allow a detailed characterization of the cancer genome with unprecedented accuracy. Results We develop a method for identifying copy number alterations in a tumor genome compared to its matched control, based on applying the Smith-Waterman algorithm to single-end sequencing data. In a performance test with simulated data, our algorithm shows >90% sensitivity and >90% precision in detecting a single copy number change that contains approximately 500 reads for the normal sample. With 100-bp reads, this corresponds to a ~50 kb region for 1X genome coverage of the human genome. We further refine the algorithm to develop rSW-seq (recursive Smith-Waterman-seq) to identify alterations in a complex configuration, which are commonly observed in the human cancer genome. To validate our approach, we compare our algorithm with an existing algorithm using simulated and publicly available datasets. We also compare the sequencing-based profiles to microarray-based results. Conclusion We propose rSW-seq as an efficient method for detecting copy number changes in the tumor genome.
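The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' rSW-seq implementation: the function names, the +1/-1 read weights and the single-pass (non-recursive) search are all assumptions made for illustration. Tumor and normal read positions are merged and sorted, each read is given a weight, and a Smith-Waterman-style local scan returns the run of reads with the highest cumulative score, which is a candidate copy number gain. A second helper reproduces the abstract's arithmetic relating read count, read length and coverage to region size.

```python
def region_size_bp(n_reads, read_len_bp, coverage):
    """Genomic span expected to hold n_reads reads at a given fold coverage."""
    return n_reads * read_len_bp / coverage


def best_gain_segment(tumor_pos, normal_pos, w_tumor=1.0, w_normal=-1.0):
    """Return (score, start, end) of the highest-scoring run of reads.

    Hypothetical scoring: tumor reads add w_tumor, normal reads add w_normal.
    The running score restarts whenever it has dropped to zero or below, as in
    a Smith-Waterman local alignment.
    """
    reads = sorted([(p, w_tumor) for p in tumor_pos] +
                   [(p, w_normal) for p in normal_pos])
    best = (0.0, None, None)
    running, run_start = 0.0, None
    for pos, weight in reads:
        if running <= 0.0:
            # Start a new candidate segment at this read.
            running, run_start = 0.0, pos
        running += weight
        if running > best[0]:
            best = (running, run_start, pos)
    return best


if __name__ == "__main__":
    print(region_size_bp(500, 100, 1.0))
    tumor = [10, 20, 100, 101, 102, 103, 104, 200]
    normal = [15, 25, 105, 150, 210]
    print(best_gain_segment(tumor, normal))
```

Swapping the signs of the two weights would search for losses instead of gains; a recursive variant would re-run the scan inside and outside each reported segment.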
Affiliation(s)
- Tae-Min Kim
- Center for Biomedical Informatics, Harvard Medical School, 10 Shattuck St, Boston, Massachusetts 02115, USA
340
Yin XL, Li J. Detecting copy number variations from array CGH data based on a conditional random field model. J Bioinform Comput Biol 2010; 8:295-314. [PMID: 20401947 DOI: 10.1142/s021972001000480x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 07/20/2009] [Revised: 10/30/2009] [Accepted: 10/30/2009] [Indexed: 11/18/2022]
Abstract
Array comparative genomic hybridization (aCGH) allows identification of copy number alterations across genomes. The key computational challenge in analyzing copy number variations (CNVs) using aCGH data or other similar data generated by a variety of array technologies is the detection of segment boundaries of copy number changes and inference of the copy number state for each segment. We have developed a novel statistical model based on the framework of conditional random fields (CRFs) that can effectively combine data smoothing, segmentation and copy number state decoding into one unified framework. Our approach (termed CRF-CNV) provides great flexibility in defining meaningful feature functions. Therefore, it can effectively integrate local spatial information of arbitrary sizes into the model. For model parameter estimation, we have adopted the conjugate gradient (CG) method for likelihood optimization and developed efficient forward/backward algorithms within the CG framework. The method is evaluated using real data with known copy numbers as well as simulated data with realistic assumptions, and compared with two popular publicly available programs. Experimental results have demonstrated that CRF-CNV outperforms a Bayesian Hidden Markov Model-based approach on both datasets in terms of copy number assignments. Compared with a non-parametric approach, CRF-CNV has achieved much greater precision while maintaining the same level of recall on the real data, and their performance on the simulated data is comparable.
Affiliation(s)
- Xiao-Lin Yin
- Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, Ohio 44106, United States.
341
Bae JS, Cheong HS, Park BL, Kim LH, Han CS, Park TJ, Kim JY, Pasaje CFA, Lee JS, Shin HD. Genome-wide profiling of structural genomic variations in Korean HapMap individuals. PLoS One 2010; 5:e11417. [PMID: 20625389 PMCID: PMC2896390 DOI: 10.1371/journal.pone.0011417] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 12/07/2009] [Accepted: 06/10/2010] [Indexed: 02/05/2023]
Abstract
Background Studies of structural genomic variation, together with advances in microarray technology, have provided many genomic resources on the architecture of the human genome and have shown that human genome structure is far more complicated than previously thought. Methodology/Principal Findings In the International HapMap Project, Epstein-Barr virus immortalized cell lines were used in preference to blood in order to obtain a larger amount of genomic DNA. However, genomic aberrations stemming from the immortalization process, biased representation of the donor tissue, and the culture process may influence the accuracy of SNP genotypes. In order to identify chromosome aberrations, including loss of heterozygosity (LOH) and large-scale and small-scale copy number variations, we used the Illumina HumanHap500 BeadChip (555,352 markers) on Korean HapMap individuals (n = 90) to obtain Log R ratio and B allele frequency information, and then analyzed the data with various programs including Illumina ChromoZone, cnvPartition and PennCNV. As a result, we identified 28 LOHs (>3 Mb) and 35 large-scale CNVs (>1 Mb), with 4 samples having a completely duplicated chromosome. In addition, after checking the sample quality (standard deviation of log R ratio <0.30), we selected 79 samples and used both signal intensity and B allele frequency simultaneously to identify 4,989 small-scale CNVs (<1 Mb). The CNVs identified in this study were successfully validated using visual examination of the genoplot images, overlap analysis with previously reported CNVs in DGV, and quantitative PCR. Conclusion/Significance In this study, we describe the chromosome aberrations identified in Korean HapMap individuals, and expect that these findings will provide more meaningful information on the human genome.
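The sample-quality criterion mentioned in this abstract (standard deviation of log R ratio below 0.30) can be sketched as a simple filter. This is only an illustration under stated assumptions: the function name, the use of the sample (n-1) standard deviation and the toy values are hypothetical, and the abstract does not say how the authors computed the statistic.

```python
from statistics import stdev


def passes_lrr_qc(log_r_ratios, max_sd=0.30):
    """True if the per-sample spread of log R ratio is below the threshold.

    Assumes the sample (n-1) standard deviation; the source does not specify.
    """
    return stdev(log_r_ratios) < max_sd


if __name__ == "__main__":
    # Toy values, not real array data.
    samples = {
        "quiet_sample": [0.0, 0.1, -0.1, 0.05, -0.05],
        "noisy_sample": [0.5, -0.5, 0.6, -0.6],
    }
    kept = [name for name, lrr in samples.items() if passes_lrr_qc(lrr)]
    print(kept)
```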
Affiliation(s)
- Joon Seol Bae
- Laboratory of Genomic Diversity, Department of Life Science, Sogang University, Seoul, Republic of Korea
- Hyun Sub Cheong
- Department of Genetic Epidemiology, SNP Genetics, Inc., Seoul, Republic of Korea
- Byung Lae Park
- Department of Genetic Epidemiology, SNP Genetics, Inc., Seoul, Republic of Korea
- Lyoung Hyo Kim
- Department of Genetic Epidemiology, SNP Genetics, Inc., Seoul, Republic of Korea
- Chang Soo Han
- Department of Genetic Epidemiology, SNP Genetics, Inc., Seoul, Republic of Korea
- Tae Joon Park
- Laboratory of Genomic Diversity, Department of Life Science, Sogang University, Seoul, Republic of Korea
- Jason Yongha Kim
- Laboratory of Genomic Diversity, Department of Life Science, Sogang University, Seoul, Republic of Korea
- Charisse Flerida A. Pasaje
- Laboratory of Genomic Diversity, Department of Life Science, Sogang University, Seoul, Republic of Korea
- Jin Sol Lee
- Laboratory of Genomic Diversity, Department of Life Science, Sogang University, Seoul, Republic of Korea
- Hyoung Doo Shin
- Laboratory of Genomic Diversity, Department of Life Science, Sogang University, Seoul, Republic of Korea
- Department of Genetic Epidemiology, SNP Genetics, Inc., Seoul, Republic of Korea
- * E-mail:
342
Abstract
Integrating results from diverse experiments is an essential process in our effort to understand the logic of complex systems, such as development, homeostasis and responses to the environment. With the advent of high-throughput methods--including genome-wide association (GWA) studies, chromatin immunoprecipitation followed by sequencing (ChIP-seq) and RNA sequencing (RNA-seq)--acquisition of genome-scale data has never been easier. Epigenomics, transcriptomics, proteomics and genomics each provide an insightful, and yet one-dimensional, view of genome function; integrative analysis promises a unified, global view. However, the large amount of information and diverse technology platforms pose multiple challenges for data access and processing. This Review discusses emerging issues and strategies related to data integration in the era of next-generation genomics.
Affiliation(s)
- R. David Hawkins
- Ludwig Institute for Cancer Research, Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, CA 92093-0653
- Gary C. Hon
- Ludwig Institute for Cancer Research, Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, CA 92093-0653
- Bing Ren
- Ludwig Institute for Cancer Research, Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, CA 92093-0653
343
Day DS, Luquette LJ, Park PJ, Kharchenko PV. Estimating enrichment of repetitive elements from high-throughput sequence data. Genome Biol 2010; 11:R69. [PMID: 20584328 PMCID: PMC2911117 DOI: 10.1186/gb-2010-11-6-r69] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.9] [Received: 01/16/2010] [Revised: 06/15/2010] [Accepted: 06/28/2010] [Indexed: 11/13/2022]
Abstract
We describe computational methods for analysis of repetitive elements from short-read sequencing data, and apply them to study histone modifications associated with the repetitive elements in human and mouse cells. Our results demonstrate that while accurate enrichment estimates can be obtained for individual repeat types and small sets of repeat instances, there are distinct combinatorial patterns of chromatin marks associated with major annotated repeat families, including H3K27me3/H3K9me3 differences among the endogenous retroviral element classes.
Affiliation(s)
- Daniel S Day
- Harvard-MIT Health Sciences and Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
344
Fiume M, Williams V, Brook A, Brudno M. Savant: genome browser for high-throughput sequencing data. Bioinformatics 2010; 26:1938-44. [PMID: 20562449 DOI: 10.1093/bioinformatics/btq332] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Indexed: 11/12/2022]
Abstract
MOTIVATION The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets. RESULTS We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations. AVAILABILITY Savant is freely available at http://compbio.cs.toronto.edu/savant.
Affiliation(s)
- Marc Fiume
- Department of Computer Science, University of Toronto, Ontario, Canada
345
Leary RJ, Kinde I, Diehl F, Schmidt K, Clouser C, Duncan C, Antipova A, Lee C, McKernan K, De La Vega FM, Kinzler KW, Vogelstein B, Diaz LA, Velculescu VE. Development of personalized tumor biomarkers using massively parallel sequencing. Sci Transl Med 2010. [PMID: 20371490 DOI: 10.1126/scitranslmed.300070] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/11/2022]
Abstract
Clinical management of human cancer is dependent on the accurate monitoring of residual and recurrent tumors. The evaluation of patient-specific translocations in leukemias and lymphomas has revolutionized diagnostics for these diseases. We have developed a method, called personalized analysis of rearranged ends (PARE), which can identify translocations in solid tumors. Analysis of four colorectal and two breast cancers with massively parallel sequencing revealed an average of nine rearranged sequences (range, 4 to 15) per tumor. Polymerase chain reaction with primers spanning the breakpoints was able to detect mutant DNA molecules present at levels lower than 0.001% and readily identified mutated circulating DNA in patient plasma samples. This method provides an exquisitely sensitive and broadly applicable approach for the development of personalized biomarkers to enhance the clinical management of cancer patients.
Affiliation(s)
- Rebecca J Leary
- Ludwig Center for Cancer Genetics and Therapeutics and Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
346
Wood HM, Belvedere O, Conway C, Daly C, Chalkley R, Bickerdike M, McKinley C, Egan P, Ross L, Hayward B, Morgan J, Davidson L, MacLennan K, Ong TK, Papagiannopoulos K, Cook I, Adams DJ, Taylor GR, Rabbitts P. Using next-generation sequencing for high resolution multiplex analysis of copy number variation from nanogram quantities of DNA from formalin-fixed paraffin-embedded specimens. Nucleic Acids Res 2010; 38:e151. [PMID: 20525786 PMCID: PMC2919738 DOI: 10.1093/nar/gkq510] [Citation(s) in RCA: 95] [Impact Index Per Article: 6.8] [Indexed: 11/22/2022]
Abstract
The use of next-generation sequencing technologies to produce genomic copy number data has recently been described. Most approaches, however, rely on optimal starting DNA, and are therefore unsuitable for the analysis of formalin-fixed paraffin-embedded (FFPE) samples, which largely precludes the analysis of many tumour series. We have sought to challenge the limits of this technique with regard to the quality and quantity of starting material and the depth of sequencing required. We confirm that the technique can be used to interrogate DNA from cell lines, fresh frozen material and FFPE samples to assess copy number variation. We show that as little as 5 ng of DNA is needed to generate a copy number karyogram, and follow this up with data from a series of FFPE biopsies and surgical samples. We have used various levels of sample multiplexing to demonstrate the adjustable resolution of the methodology, depending on the number of samples and available resources. We also demonstrate reproducibility by use of replicate samples and comparison with microarray-based comparative genomic hybridization (aCGH) and digital PCR. This technique can be valuable both in the analysis of routine diagnostic samples and in examining large repositories of fixed archival material.
Affiliation(s)
- Henry M Wood
- Leeds Institute of Molecular Medicine, St James's University Hospital, Leeds, UK.
347
Bueno R, De Rienzo A, Dong L, Gordon GJ, Hercus CF, Richards WG, Jensen RV, Anwar A, Maulik G, Chirieac LR, Ho KF, Taillon BE, Turcotte CL, Hercus RG, Gullans SR, Sugarbaker DJ. Second generation sequencing of the mesothelioma tumor genome. PLoS One 2010; 5:e10612. [PMID: 20485525 PMCID: PMC2869344 DOI: 10.1371/journal.pone.0010612] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Received: 10/30/2009] [Accepted: 04/01/2010] [Indexed: 12/29/2022]
Abstract
The current paradigm for elucidating the molecular etiology of cancers relies on the interrogation of small numbers of genes, which limits the scope of investigation. Emerging second-generation massively parallel DNA sequencing technologies have enabled more precise definition of the cancer genome on a global scale. We examined the genome of a human primary malignant pleural mesothelioma (MPM) tumor and matched normal tissue by using a combination of sequencing-by-synthesis and pyrosequencing methodologies to a 9.6X depth of coverage. Read density analysis uncovered significant aneuploidy and numerous rearrangements. Method-dependent informatics rules, which combined the results of different sequencing platforms, were developed to identify and validate candidate mutations of multiple types. Many more tumor-specific rearrangements than point mutations were uncovered at this depth of sequencing, resulting in novel, large-scale, inter- and intra-chromosomal deletions, inversions, and translocations. Nearly all candidate point mutations appeared to be previously unknown SNPs. Thirty tumor-specific fusions/translocations were independently validated with PCR and Sanger sequencing. Of these, 15 represented disrupted gene-encoding regions, including kinases, transcription factors, and growth factors. One large deletion in DPP10 resulted in altered transcription and expression of DPP10 transcripts in a set of 53 additional MPM tumors correlated with survival. Additionally, three point mutations were observed in the coding regions of NKX6-2, a transcription regulator, and NFRKB, a DNA-binding protein involved in modulating NFKB1. Several regions containing genes such as PCBD2 and DHFR, which are involved in growth factor signaling and nucleotide synthesis, respectively, were selectively amplified in the tumor. Second-generation sequencing uncovered all types of mutations in this MPM tumor, with DNA rearrangements representing the dominant type.
Affiliation(s)
- Raphael Bueno
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Assunta De Rienzo
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Lingsheng Dong
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Gavin J. Gordon
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- William G. Richards
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Roderick V. Jensen
- Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
- Gautam Maulik
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Lucian R. Chirieac
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Bruce E. Taillon
- 454 Life Sciences, Inc., Branford, Connecticut, United States of America
- Steven R. Gullans
- Excel Medical Ventures, Boston, Massachusetts, United States of America
- David J. Sugarbaker
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
348
Fadista J, Thomsen B, Holm LE, Bendixen C. Copy number variation in the bovine genome. BMC Genomics 2010; 11:284. [PMID: 20459598 PMCID: PMC2902221 DOI: 10.1186/1471-2164-11-284] [Citation(s) in RCA: 126] [Impact Index Per Article: 9.0] [Received: 12/11/2009] [Accepted: 05/06/2010] [Indexed: 12/12/2022]
Abstract
Background Copy number variations (CNVs), which represent a significant source of genetic diversity in mammals, have been shown to be associated with phenotypes of clinical relevance and to be causative of disease. Notwithstanding, little is known about the extent to which CNV contributes to genetic variation in cattle. Results We designed and used a set of NimbleGen CGH arrays that tile across the assayable portion of the cattle genome with approximately 6.3 million probes, at a median probe spacing of 301 bp. This study reports the highest resolution map of copy number variation in the cattle genome, with 304 CNV regions (CNVRs) being identified among the genomes of 20 bovine samples from 4 dairy and beef breeds. The CNVRs identified covered 0.68% (22 Mb) of the genome, and ranged in size from 1.7 to 2,031 kb (median size 16.7 kb). About 20% of the CNVs co-localized with segmental duplications, while 30% encompass genes, the majority of which are involved in environmental response. About 10% of the human orthologues of these genes are associated with human disease susceptibility and, hence, may have important phenotypic consequences. Conclusions Together, this analysis provides a useful resource for assessing the impact of CNVs on variation in bovine health and production traits.
Affiliation(s)
- João Fadista
- Group of Molecular Genetics and Systems Biology, Department of Genetics and Biotechnology, Faculty of Agricultural Sciences, Aarhus University, Blichers Allé 20, DK-8830 Tjele, Denmark
349
Navin NE, Hicks J. Tracing the tumor lineage. Mol Oncol 2010; 4:267-83. [PMID: 20537601 DOI: 10.1016/j.molonc.2010.04.010] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.9] [Received: 02/11/2010] [Revised: 04/23/2010] [Accepted: 04/23/2010] [Indexed: 12/13/2022]
Abstract
Defining the pathways through which tumors progress is critical to our understanding and treatment of cancer. We do not routinely sample patients at multiple time points during the progression of their disease, and thus our research is limited to inferring progression a posteriori from the examination of a single tumor sample. Despite this limitation, inferring progression is possible because the tumor genome contains a natural history of the mutations that occur during the formation of the tumor mass. There are two approaches to reconstructing a lineage of progression: (1) inter-tumor comparisons, and (2) intra-tumor comparisons. The inter-tumor approach consists of taking single samples from large collections of tumors and comparing the complexity of the genomes to identify early and late mutations. The intra-tumor approach involves taking multiple samples from individual heterogeneous tumors to compare divergent clones and reconstruct a phylogenetic lineage. Here we discuss how these approaches can be used to interpret the current models for tumor progression. We also compare data from primary and metastatic copy number profiles to shed light on the final steps of breast cancer progression. Finally, we discuss how recent technical advances in single cell genomics will herald a new era in understanding the fundamental basis of tumor heterogeneity and progression.
Affiliation(s)
- Nicholas E Navin
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA.
350
Fan HC, Quake SR. Sensitivity of noninvasive prenatal detection of fetal aneuploidy from maternal plasma using shotgun sequencing is limited only by counting statistics. PLoS One 2010; 5:e10439. [PMID: 20454671 PMCID: PMC2862719 DOI: 10.1371/journal.pone.0010439] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.9] [Received: 01/22/2010] [Accepted: 04/06/2010] [Indexed: 11/17/2022]
Abstract
We recently demonstrated noninvasive detection of fetal aneuploidy by shotgun sequencing cell-free DNA in maternal plasma using a next-generation high-throughput sequencer. However, GC bias introduced by the sequencer placed a practical limit on the sensitivity of aneuploidy detection. In this study, we describe a method to computationally remove GC bias in short-read sequencing data by applying a weight to each sequenced read based on local genomic GC content. We show that sensitivity is limited only by counting statistics and that sensitivity can be increased to arbitrary precision in samples containing an arbitrarily small fraction of fetal DNA simply by sequencing more DNA molecules. High-throughput shotgun sequencing of maternal plasma DNA should therefore enable noninvasive diagnosis of any type of fetal aneuploidy.
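One way to picture the GC weighting described in this abstract is the sketch below. It is an assumption-laden illustration rather than the authors' method: it works on fixed genomic bins instead of individual reads, groups bins into GC strata of a hypothetical width `gc_step`, and sets each stratum's weight to the global mean read count divided by that stratum's mean read count, so that over- and under-represented GC classes are pulled toward the same expected depth.

```python
from collections import defaultdict


def gc_weights(bin_gc, bin_counts, gc_step=0.05):
    """Map each GC stratum to weight = global mean count / stratum mean count."""
    strata = defaultdict(list)
    for gc, count in zip(bin_gc, bin_counts):
        strata[round(gc / gc_step)].append(count)
    global_mean = sum(bin_counts) / len(bin_counts)
    # Strata with no reads are skipped to avoid dividing by zero.
    return {key: global_mean / (sum(vals) / len(vals))
            for key, vals in strata.items() if sum(vals) > 0}


def corrected_counts(bin_gc, bin_counts, gc_step=0.05):
    """Apply the stratum weight to every bin; unweighted strata keep weight 1."""
    weights = gc_weights(bin_gc, bin_counts, gc_step)
    return [count * weights.get(round(gc / gc_step), 1.0)
            for gc, count in zip(bin_gc, bin_counts)]


if __name__ == "__main__":
    # Toy data: low-GC bins are over-sampled relative to high-GC bins.
    gc = [0.40, 0.40, 0.60, 0.60]
    counts = [100, 100, 50, 50]
    print(corrected_counts(gc, counts))
```

After correction, the remaining bin-to-bin variation would ideally reflect sampling noise alone, which is the sense in which the abstract says sensitivity becomes limited by counting statistics.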
Affiliation(s)
- H Christina Fan
- Department of Bioengineering, Stanford University and Howard Hughes Medical Institute, Stanford, California, United States of America