1
Kuhn A, Ong YM, Quake SR, Burkholder WF. Read count-based method for high-throughput allelic genotyping of transposable elements and structural variants. BMC Genomics 2015; 16:508. [PMID: 26153459 PMCID: PMC4494700 DOI: 10.1186/s12864-015-1700-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2015] [Accepted: 06/15/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Like other structural variants, transposable element insertions can be highly polymorphic across individuals. Their functional impact, however, remains poorly understood. Current genome-wide approaches for genotyping insertion-site polymorphisms based on targeted or whole-genome sequencing remain very expensive and can lack accuracy, hence new large-scale genotyping methods are needed. RESULTS We describe a high-throughput method for genotyping transposable element insertions and other types of structural variants that can be assayed by breakpoint PCR. The method relies on next-generation sequencing of multiplex, site-specific PCR amplification products and read count-based genotype calls. We show that this method is flexible, efficient (it does not require rounds of optimization), cost-effective and highly accurate. CONCLUSIONS This method can benefit a wide range of applications from the routine genotyping of animal and plant populations to the functional study of structural variants in humans.
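As a rough illustration of what a read count-based genotype call could look like, the sketch below classifies one site from the number of reads supporting the insertion allele and the number supporting the empty-site allele. The function name, the genotype labels, and every threshold are hypothetical choices made for this example; the abstract does not specify the authors' actual calling rule.

```python
def call_genotype(insertion_reads, empty_site_reads,
                  min_total=20, low=0.2, high=0.8):
    """Call a biallelic genotype from two read counts.

    All thresholds are illustrative assumptions, not values from the paper.
    Returns "0/0" (insertion absent), "0/1" (heterozygous),
    "1/1" (homozygous insertion), or "no_call" when coverage is too low.
    """
    total = insertion_reads + empty_site_reads
    if total < min_total:
        return "no_call"  # too few reads to call reliably
    fraction = insertion_reads / total  # share of reads from the insertion allele
    if fraction <= low:
        return "0/0"
    if fraction >= high:
        return "1/1"
    return "0/1"


if __name__ == "__main__":
    for counts in [(0, 100), (48, 52), (100, 0), (3, 5)]:
        print(counts, call_genotype(*counts))
```

Fixed fraction cut-offs are the simplest possible rule; a real pipeline would likely need to account for allele-specific amplification bias between the two PCR products.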
Affiliation(s)
- Alexandre Kuhn
- Microfluidics Systems Biology Lab, Institute of Molecular and Cell Biology, Agency for Science, Technology and Research (A*STAR), Proteos Building, Room #03-04, 61 Biopolis Drive, Singapore, 138673, Singapore.
- Yao Min Ong
- Microfluidics Systems Biology Lab, Institute of Molecular and Cell Biology, Agency for Science, Technology and Research (A*STAR), Proteos Building, Room #03-04, 61 Biopolis Drive, Singapore, 138673, Singapore.
- Stephen R Quake
- Depts. of Bioengineering and Applied Physics and Howard Hughes Medical Institute, Stanford University, Clark Center, Room E300, 318 Campus Drive, Stanford, CA, 94305, USA; Visiting Investigator, Institute of Molecular and Cell Biology, A*STAR, Singapore, 138673, Singapore.
- William F Burkholder
- Microfluidics Systems Biology Lab, Institute of Molecular and Cell Biology, Agency for Science, Technology and Research (A*STAR), Proteos Building, Room #03-04, 61 Biopolis Drive, Singapore, 138673, Singapore.
2
Cicconardi F, Chillemi G, Tramontano A, Marchitelli C, Valentini A, Ajmone-Marsan P, Nardone A. Massive screening of copy number population-scale variation in Bos taurus genome. BMC Genomics 2013; 14:124. [PMID: 23442185 PMCID: PMC3618309 DOI: 10.1186/1471-2164-14-124] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 07/02/2012] [Accepted: 02/11/2013] [Indexed: 12/13/2022]
Abstract
Background Copy number variations (CNVs) represent a significant source of genomic structural variation. Their length ranges from approximately one hundred to millions of base pairs. Genome-wide screenings have clarified that CNVs are a ubiquitous phenomenon affecting essentially the whole genome. Although Bos taurus is one of the most important domestic animal species worldwide and one of the most studied ruminant models for metabolism, reproduction, and disease, relatively few studies have investigated CNVs in cattle, and little is known about how CNVs contribute to normal phenotypic variation and to disease susceptibility in this species, compared to humans and other model organisms. Results Here we characterize and compare CNV profiles in 2654 animals from five dairy and beef Bos taurus breeds, using the Illumina BovineSNP50 genotyping array (54001 SNP probes). In this study we applied the two most commonly used algorithms for CNV discovery (QuantiSNP and PennCNV) and identified 4830 unique candidate CNVs belonging to 326 regions. These regions overlap with 5789 known genes, 76.7% of which are significantly co-localized with segmental duplications (SD). Conclusions This large-scale screening significantly contributes to the enrichment of the Bos taurus CNV map, demonstrates the ubiquity, great diversity and complexity of this type of genomic variation and sets the basis for testing the influence of CNVs on Bos taurus complex functional and production traits.
Affiliation(s)
- Francesco Cicconardi
- Department for innovation in biological, agro-food and forest systems, University of Tuscia, via de Lellis, Viterbo 01100, Italy.
3
Wu ZJ, Jin W. [Copy-number variation: a new pattern of structural diversity in genome]. YI CHUAN = HEREDITAS 2009; 31:339-47. [PMID: 19586885 DOI: 10.3724/sp.j.1005.2009.00339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Copy number variation (CNV) is increasingly recognized as a source of inter-individual differences in genome sequence and has been proposed as a driving force for genome evolution and phenotypic variation. Many CNVs resulted in different levels of gene expression, which may account for a significant proportion of normal phenotypic variation and human diseases. This review unveiled the research process and study strategy of CNVs. Subsequently, the potential mechanisms of CNV formation and its clinical implications were discussed. In addition, the first-generation copy number variation map of the human genome was introduced, which demonstrated that DNA copy number variation was associated with specific chromosomal rearrangements and genomic disorders.
Affiliation(s)
- Zhi-Jun Wu
- Department of Cardiology, Rui Jin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200025, China.
4
Mefford HC, Cooper GM, Zerr T, Smith JD, Baker C, Shafer N, Thorland EC, Skinner C, Schwartz CE, Nickerson DA, Eichler EE. A method for rapid, targeted CNV genotyping identifies rare variants associated with neurocognitive disease. Genome Res 2009; 19:1579-85. [PMID: 19506092 DOI: 10.1101/gr.094987.109] [Citation(s) in RCA: 109] [Impact Index Per Article: 7.3] [Indexed: 01/05/2023]
Abstract
Copy-number variants (CNVs) are substantial contributors to human disease. A central challenge in CNV-disease association studies is to characterize the pathogenicity of rare and possibly incompletely penetrant events, which requires the accurate detection of rare CNVs in large numbers of individuals. Cost and throughput issues limit our ability to perform these studies. We have adapted the Illumina BeadXpress SNP genotyping assay and developed an algorithm, SNP-Conditional OUTlier detection (SCOUT), to rapidly and accurately detect both rare and common CNVs in large cohorts. This approach is customizable, cost effective, highly parallelized, and largely automated. We applied this method to screen 69 loci in 1105 children with unexplained intellectual disability, identifying pathogenic variants in 3.1% of these individuals and potentially pathogenic variants in an additional 2.3%. We identified seven individuals (0.7%) with a deletion of 16p11.2, which has been previously associated with autism. Our results widen the phenotypic spectrum of these deletions to include intellectual disability without autism. We also detected 1.65-3.4 Mbp duplications at 16p13.11 in 1.1% of affected individuals and 350 kbp deletions at 15q11.2, near the Prader-Willi/Angelman syndrome critical region, in 0.8% of affected individuals. Compared to published CNVs in controls they are significantly (P = 4.7 × 10⁻⁵ and 0.003, respectively) enriched in these children, supporting previously published hypotheses that they are neurocognitive disease risk factors. More generally, this approach offers a previously unavailable balance between customization, cost, and throughput for analysis of CNVs and should prove valuable for targeted CNV detection in both research and diagnostic settings.
Affiliation(s)
- Heather C Mefford
- Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
5
Li W, Lee A, Gregersen PK. Copy-number-variation and copy-number-alteration region detection by cumulative plots. BMC Bioinformatics 2009; 10 Suppl 1:S67. [PMID: 19208171 PMCID: PMC2648736 DOI: 10.1186/1471-2105-10-s1-s67] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/15/2022]
Abstract
Background Regions with copy number variations (in germline cells) or copy number alteration (in somatic cells) are of great interest for human disease gene mapping and cancer studies. They represent a new type of mutation and are larger-scaled than the single nucleotide polymorphisms. Using genotyping microarray for copy number variation detection has become standard, and there is a need for improving analysis methods. Results We apply the cumulative plot to the detection of regions with copy number variation/alteration, on samples taken from a chronic lymphocytic leukemia patient. Two sets of whole-genome genotyping of 317 k single nucleotide polymorphisms, one from the normal cell and another from the cancer cell, are analyzed. We demonstrate the utility of cumulative plot in detecting a 9 Mb (9 × 10⁶ bases) hemizygous deletion and 1 Mb homozygous deletion on chromosome 13. We also show the possibility to detect smaller copy number variation/alteration regions below the 100 kb range. Conclusion As a graphic tool, the cumulative plot is an intuitive and scale-free (window-less) way for detecting copy number variation/alteration regions, especially when such regions are small.
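To make the cumulative-plot idea concrete, here is a minimal sketch under stated assumptions: it takes per-probe signal values ordered by genomic position, subtracts their mean, and accumulates the result. A single stretch of shifted signal then shows up as a sustained slope, bounded by the extremes of the curve. The function names and the toy data are hypothetical, and the abstract does not give the authors' exact statistic, so this should be read as an illustration of a window-less cumulative curve rather than their method.

```python
from itertools import accumulate


def cumulative_curve(values):
    """Cumulative sum of mean-centred values, with a leading 0.0."""
    mean = sum(values) / len(values)
    return [0.0] + list(accumulate(v - mean for v in values))


def find_shifted_region(values):
    """Return (start, end) probe indices of one shifted region.

    Assumes exactly one contiguous region deviates from the baseline;
    its boundaries are the maximum and minimum of the cumulative curve.
    """
    curve = cumulative_curve(values)
    i_max = max(range(len(curve)), key=curve.__getitem__)
    i_min = min(range(len(curve)), key=curve.__getitem__)
    return tuple(sorted((i_max, i_min)))


if __name__ == "__main__":
    # Toy log-ratio track: 10 normal probes, 5 deleted probes, 10 normal probes.
    track = [0.0] * 10 + [-1.0] * 5 + [0.0] * 10
    print(find_shifted_region(track))
```

No window size appears anywhere in the computation, which is the property the abstract calls scale-free.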
Affiliation(s)
- Wentian Li
- The Robert S Boas Center for Genomics and Human Genetics, Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY 11030, USA.
6
Barnes C, Plagnol V, Fitzgerald T, Redon R, Marchini J, Clayton D, Hurles ME. A robust statistical method for case-control association testing with copy number variation. Nat Genet 2008; 40:1245-52. [PMID: 18776912 PMCID: PMC2784596 DOI: 10.1038/ng.206] [Citation(s) in RCA: 143] [Impact Index Per Article: 8.9] [Received: 02/27/2008] [Accepted: 06/30/2008] [Indexed: 11/08/2022]
Abstract
Copy number variation (CNV) is pervasive in the human genome and can play a causal role in genetic diseases. The functional impact of CNV cannot be fully captured through linkage disequilibrium with SNPs. These observations motivate the development of statistical methods for performing direct CNV association studies. We show through simulation that current tests for CNV association are prone to false-positive associations in the presence of differential errors between cases and controls, especially if quantitative CNV measurements are noisy. We present a statistical framework for performing case-control CNV association studies that applies likelihood ratio testing of quantitative CNV measurements in cases and controls. We show that our methods are robust to differential errors and noisy data and can achieve maximal theoretical power. We illustrate the power of these methods for testing for association with binary and quantitative traits, and have made this software available as the R package CNVtools.
Affiliation(s)
- Chris Barnes
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
7
Cooper GM, Zerr T, Kidd JM, Eichler EE, Nickerson DA. Systematic assessment of copy number variant detection via genome-wide SNP genotyping. Nat Genet 2008; 40:1199-203. [PMID: 18776910 DOI: 10.1038/ng.236] [Citation(s) in RCA: 176] [Impact Index Per Article: 11.0] [Received: 01/29/2008] [Accepted: 08/18/2008] [Indexed: 11/09/2022]
Abstract
SNP genotyping has emerged as a technology to incorporate copy number variants (CNVs) into genetic analyses of human traits. However, the extent to which SNP platforms accurately capture CNVs remains unclear. Using independent, sequence-based CNV maps, we find that commonly used SNP platforms have limited or no probe coverage for a large fraction of CNVs. Despite this, in 9 samples we inferred 368 CNVs using Illumina SNP genotyping data and experimentally validated over two-thirds of these. We also developed a method (SNP-Conditional Mixture Modeling, SCIMM) to robustly genotype deletions using as few as two SNP probes. We find that HapMap SNPs are strongly correlated with 82% of common deletions, but the newest SNP platforms effectively tag about 50%. We conclude that currently available genome-wide SNP assays can capture CNVs accurately, but improvements in array designs, particularly in duplicated sequences, are necessary to facilitate more comprehensive analyses of genomic variation.
Affiliation(s)
- Gregory M Cooper
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
8
Flomen RH, Davies AF, Di Forti M, Cascia CL, Mackie-Ogilvie C, Murray R, Makoff AJ. The copy number variant involving part of the α7 nicotinic receptor gene contains a polymorphic inversion. Eur J Hum Genet 2008; 16:1364-71. [DOI: 10.1038/ejhg.2008.112] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/09/2022]
9
Shen F, Huang J, Fitch KR, Truong VB, Kirby A, Chen W, Zhang J, Liu G, McCarroll SA, Jones KW, Shapero MH. Improved detection of global copy number variation using high density, non-polymorphic oligonucleotide probes. BMC Genet 2008; 9:27. [PMID: 18373861 PMCID: PMC2374799 DOI: 10.1186/1471-2156-9-27] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 10/31/2007] [Accepted: 03/28/2008] [Indexed: 11/27/2022]
Abstract
Background DNA sequence diversity within the human genome may be more greatly affected by copy number variations (CNVs) than single nucleotide polymorphisms (SNPs). Although the importance of CNVs in genome wide association studies (GWAS) is becoming widely accepted, the optimal methods for identifying these variants are still under evaluation. We have previously reported a comprehensive view of CNVs in the HapMap DNA collection using high density 500 K EA (Early Access) SNP genotyping arrays which revealed greater than 1,000 CNVs ranging in size from 1 kb to over 3 Mb. Although the arrays used most commonly for GWAS predominantly interrogate SNPs, CNV identification and detection does not necessarily require the use of DNA probes centered on polymorphic nucleotides and may even be hindered by the dependence on a successful SNP genotyping assay. Results In this study, we have designed and evaluated a high density array predicated on the use of non-polymorphic oligonucleotide probes for CNV detection. This approach effectively uncouples copy number detection from SNP genotyping and thus has the potential to significantly improve probe coverage for genome-wide CNV identification. This array, in conjunction with PCR-based, complexity-reduced DNA target, queries over 1.3 M independent NspI restriction enzyme fragments in the 200 bp to 1100 bp size range, which is a several fold increase in marker density as compared to the 500 K EA array. In addition, a novel algorithm was developed and validated to extract CNV regions and boundaries. Conclusion Using a well-characterized pair of DNA samples, close to 200 CNVs were identified, of which nearly 50% appear novel yet were independently validated using quantitative PCR. The results indicate that non-polymorphic probes provide a robust approach for CNV identification, and the increasing precision of CNV boundary delineation should allow a more complete analysis of their genomic organization.
Affiliation(s)
- Fan Shen
- Affymetrix, Inc, 3420 Central Expressway; Santa Clara, CA 95051, USA.
10
Perry GH, Ben-Dor A, Tsalenko A, Sampas N, Rodriguez-Revenga L, Tran CW, Scheffer A, Steinfeld I, Tsang P, Yamada NA, Park HS, Kim JI, Seo JS, Yakhini Z, Laderman S, Bruhn L, Lee C. The fine-scale and complex architecture of human copy-number variation. Am J Hum Genet 2008; 82:685-95. [PMID: 18304495 DOI: 10.1016/j.ajhg.2007.12.010] [Citation(s) in RCA: 253] [Impact Index Per Article: 15.8] [Received: 11/02/2007] [Revised: 12/12/2007] [Accepted: 12/31/2007] [Indexed: 11/27/2022]
Abstract
Despite considerable excitement over the potential functional significance of copy-number variants (CNVs), we still lack knowledge of the fine-scale architecture of the large majority of CNV regions in the human genome. In this study, we used a high-resolution array-based comparative genomic hybridization (aCGH) platform that targeted known CNV regions of the human genome at approximately 1 kb resolution to interrogate the genomic DNAs of 30 individuals from four HapMap populations. Our results revealed that 1020 of 1153 CNV loci (88%) were actually smaller in size than what is recorded in the Database of Genomic Variants based on previously published studies. A reduction in size of more than 50% was observed for 876 CNV regions (76%). We conclude that the total genomic content of currently known common human CNVs is likely smaller than previously thought. In addition, approximately 8% of the CNV regions observed in multiple individuals exhibited genomic architectural complexity in the form of smaller CNVs within larger ones and CNVs with interindividual variation in breakpoints. Future association studies that aim to capture the potential influences of CNVs on disease phenotypes will need to consider how to best ascertain this previously uncharacterized complexity.
Affiliation(s)
- George H Perry
- Department of Pathology, Brigham and Women's Hospital, Boston, MA 02115, USA
11
Closing gaps in the human genome with fosmid resources generated from multiple individuals. Nat Genet 2007; 40:96-101. [PMID: 18157130 DOI: 10.1038/ng.2007.34] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Received: 03/12/2007] [Accepted: 09/19/2007] [Indexed: 01/31/2023]
Abstract
The human genome sequence has been finished to very high standards; however, more than 340 gaps remained when the finished genome was published by the International Human Genome Sequencing Consortium in 2004. Using fosmid resources generated from multiple individuals, we targeted gaps in the euchromatic part of the human genome. Here we report 2,488,842 bp of previously unknown euchromatic sequence, 363,114 bp of which close 26 of 250 euchromatic gaps, or 10%, including two remaining euchromatic gaps on chromosome 19. Eight (30.7%) of the closed gaps were found to be polymorphic. These sequences allow complete annotation of several human genes as well as the assignment of mRNAs. The gap sequences are 2.3-fold enriched in segmentally duplicated sequences compared to the whole genome. Our analysis confirms that not all gaps within 'finished' genomes are recalcitrant to subcloning and suggests that the paired-end-sequenced fosmid libraries could prove to be a rich resource for completion of the human euchromatic genome.
12
Ting JC, Roberson EDO, Miller ND, Lysholm-Bernacchi A, Stephan DA, Capone GT, Ruczinski I, Thomas GH, Pevsner J. Visualization of uniparental inheritance, Mendelian inconsistencies, deletions, and parent of origin effects in single nucleotide polymorphism trio data with SNPtrio. Hum Mutat 2007; 28:1225-35. [PMID: 17661425 DOI: 10.1002/humu.20583] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Indexed: 02/05/2023]
Abstract
A variety of alterations occur in chromosomal DNA, many of which can be detected using high density single nucleotide polymorphism (SNP) microarrays. These include deletions and duplications (assessed by observing changes in copy number) and regions of homozygosity. The analysis of SNP data from trios can provide an additional category of information about the nature and origin of inheritance patterns, including uniparental disomy (UPD), loss of transmitted allele (LTA), and nonparental relationship. The main purpose of SNPtrio is to locate regions of uniparental inheritance (UPI) and Mendelian inconsistency (MI), identify the type (paternal vs. maternal, iso- vs. hetero-), and assess the associated statistical probability of occurrence by chance. SNPtrio's schema permits the identification of hemizygous or homozygous deletions as well as UPD. We validated the performance of SNPtrio on three platforms (Affymetrix 10 K and 100 K arrays and Illumina 550 K arrays) using SNP data obtained from DNA samples of patients known to have UPD and diagnosed with Prader-Willi syndrome, Angelman syndrome, Beckwith-Wiedemann syndrome, pseudohypoparathyroidism, and a complex chromosome 2 abnormality. We further validated SNPtrio using DNA from patients previously shown to have microdeletions that were verified by fluorescence in situ hybridization (FISH). SNPtrio successfully identified previously known UPD and deletion regions, and generated associated probability values. SNPtrio analysis of trisomy 21 (Down syndrome) cases and their parents permitted identification of the parent of origin of the extra chromosomal copy. SNPtrio is freely accessible at http://pevsnerlab.kennedykrieger.org/SNPtrio.htm (Last accessed: 20 June 2007).
Affiliation(s)
- Jason C Ting
- Department of Neurology, Kennedy Krieger Institute, Baltimore, Maryland 21205, USA
13
Abstract
Population genetics is central to our understanding of human variation, and by linking medical and evolutionary themes, it enables us to understand the origins and impacts of our genomic differences. Despite current limitations in our knowledge of the locations, sizes and mutational origins of structural variants, our characterization of their population genetics is developing apace, bringing new insights into recent human adaptation, genome biology and disease. We summarize recent dramatic advances, describe the diverse mutational origins of chromosomal rearrangements and argue that their complexity necessitates a re-evaluation of existing population genetic methods.
Affiliation(s)
- Donald F Conrad
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
14
High-throughput genotyping of a common deletion polymorphism disrupting the TRY6 gene and its association with breast cancer risk. BMC Genet 2007; 8:41. [PMID: 17598925 PMCID: PMC1925117 DOI: 10.1186/1471-2156-8-41] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/03/2007] [Accepted: 06/29/2007] [Indexed: 12/21/2022]
Abstract
BACKGROUND Copy number polymorphisms caused by genomic rearrangements like deletions, make a significant contribution to the genomic differences between two individuals and may add to disease predisposition. Therefore, genotyping of such deletion polymorphisms in case-control studies could give important insights into risk associations. RESULTS We mapped the breakpoints and developed a fluorescent fragment analysis for a deletion disrupting the TRY6 gene to exemplify a quick and cheap genotyping approach for such structural variants. We showed that the deletion is larger than predicted and encompasses also the pseudogene TRY5. We performed a case-control study to test an association of the TRY6 deletion polymorphism with breast cancer using a single nucleotide polymorphism which is in 100% linkage disequilibrium with the deletion. We did not observe an effect of the deletion on breast cancer risk (OR 1.05, 95% CI 0.71-1.56). CONCLUSION Although we did not observe an association between the TRY6 deletion polymorphism and breast cancer risk, the identification and investigation of further deletions using the present approach may help to elucidate their effect on disease susceptibility.
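For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained from case-control counts, the sketch below applies the standard log-based (Woolf) interval to a 2×2 table. The function name and the counts used in the demo are hypothetical; the abstract reports only the final estimate (OR 1.05, 95% CI 0.71-1.56), not the underlying table or the exact interval method used.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    z=1.96 gives an approximate 95% interval. All four cells must be > 0.
    """
    a, b, c, d = (case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(odds_ratio_ci(60, 440, 58, 442))
```

An interval that spans 1, as in the reported result, is what "no observed effect on risk" means in this setting.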
15
Korbel JO, Urban AE, Grubert F, Du J, Royce TE, Starr P, Zhong G, Emanuel BS, Weissman SM, Snyder M, Gerstein MB. Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome. Proc Natl Acad Sci U S A 2007; 104:10110-5. [PMID: 17551006 PMCID: PMC1891248 DOI: 10.1073/pnas.0703834104] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022]
Abstract
Copy-number variants (CNVs) are an abundant form of genetic variation in humans. However, approaches for determining exact CNV breakpoint sequences (physical deletion or duplication boundaries) across individuals, crucial for associating genotype to phenotype, have been lacking so far, and the vast majority of CNVs have been reported with approximate genomic coordinates only. Here, we report an approach, called BreakPtr, for fine-mapping CNVs (available from http://breakptr.gersteinlab.org). We statistically integrate both sequence characteristics and data from high-resolution comparative genome hybridization experiments in a discrete-valued, bivariate hidden Markov model. Incorporation of nucleotide-sequence information allows us to take into account the fact that recently duplicated sequences (e.g., segmental duplications) often coincide with breakpoints. In anticipation of an upcoming increase in CNV data, we developed an iterative, "active" approach to initially scoring with a preliminary model, performing targeted validations, retraining the model, and then rescoring, and a flexible parameterization system that intuitively collapses from a full model of 2,503 parameters to a core one of only 10. Using our approach, we accurately mapped >400 breakpoints on chromosome 22 and a region of chromosome 11, refining the boundaries of many previously approximately mapped CNVs. Four predicted breakpoints flanked known disease-associated deletions. We validated an additional four predicted CNV breakpoints by sequencing. Overall, our results suggest a predictive resolution of approximately 300 bp. This level of resolution enables more precise correlations between CNVs and across individuals than previously possible, allowing the study of CNV population frequencies. Further, it enabled us to demonstrate a clear Mendelian pattern of inheritance for one of the CNVs.
Affiliation(s)
- Jan O. Korbel
- Departments of Molecular Biophysics and Biochemistry and
- European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Alexander Eckehart Urban
- Genetics, Yale University School of Medicine, New Haven, CT 06520
- Departments of Molecular, Cellular, and Developmental Biology and
- Fabian Grubert
- Genetics, Yale University School of Medicine, New Haven, CT 06520
- Jiang Du
- Computer Science, Yale University, New Haven, CT 06520; and
- Peter Starr
- Departments of Molecular Biophysics and Biochemistry and
- Guoneng Zhong
- Departments of Molecular Biophysics and Biochemistry and
- Beverly S. Emanuel
- Department of Pediatrics, University of Pennsylvania School of Medicine, Philadelphia, PA 19104
- Michael Snyder
- Departments of Molecular, Cellular, and Developmental Biology and
- Mark B. Gerstein
- Departments of Molecular Biophysics and Biochemistry and
- Computer Science, Yale University, New Haven, CT 06520; and
16
Abstract
DNA copy number variation (CNV) represents a considerable source of human genetic diversity. Recently,¹ a global map of copy number variation in the human genome has been drawn up which reveals not only the ubiquity but also the complexity of this type of variation. Thus, two human genomes may differ by more than 20 Mb and it is likely that the full extent of CNV still remains to be discovered. Nearly 3000 genes are associated with CNV. This high degree of variability with regard to gene copy number between two individuals challenges definitions of normality. Many CNVs are located in regions of complex genomic structure and this currently limits the extent to which these variants can be genotyped by using tagging SNPs. However, some CNVs are already amenable to genome-wide association studies so that their influence on human phenotypic diversity and disease susceptibility may soon be determined.
17
Carson AR, Feuk L, Mohammed M, Scherer SW. Strategies for the detection of copy number and other structural variants in the human genome. Hum Genomics 2006; 2:403-14. [PMID: 16848978 PMCID: PMC3525157 DOI: 10.1186/1479-7364-2-6-403] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 12/08/2022]
Abstract
Advances in genome scanning technologies are revealing that copy number variants (CNVs) and polymorphisms, ranging from a few kilobases to several megabases in size, are present in genomes at frequencies much greater than previously known. Discoveries of additional forms of genomic variation, including inversions, insertions, deletions and complex rearrangements, are also occurring at an increased rate. Along with CNVs, these sequence alterations are collectively known as structural variants, and their discovery has had an immediate impact on the interpretation of basic research and clinical diagnostic data. This paper discusses different methods, experimental strategies and technologies that are currently available to study copy number variation and other structural variants in the human genome.
Affiliation(s)
- Andrew R Carson
- The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children and Department of Molecular and Medical Genetics, University of Toronto, Toronto, Ontario, Canada
- Lars Feuk
- The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children and Department of Molecular and Medical Genetics, University of Toronto, Toronto, Ontario, Canada
- Stephen W Scherer
- The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children and Department of Molecular and Medical Genetics, University of Toronto, Toronto, Ontario, Canada
|
18
|
Komura D, Shen F, Ishikawa S, Fitch KR, Chen W, Zhang J, Liu G, Ihara S, Nakamura H, Hurles ME, Lee C, Scherer SW, Jones KW, Shapero MH, Huang J, Aburatani H. Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays. Genome Res 2006; 16:1575-84. [PMID: 17122084 PMCID: PMC1665641 DOI: 10.1101/gr.5629106] [Citation(s) in RCA: 143] [Impact Index Per Article: 7.9] [Indexed: 11/24/2022]
Abstract
Recent reports indicate that copy number variations (CNVs) within the human genome contribute to nucleotide diversity to a larger extent than single nucleotide polymorphisms (SNPs). In addition, the contribution of CNVs to human disease susceptibility may be greater than previously expected, although the phenotypic consequences of CNVs are not yet completely understood. We have recently reported a comprehensive view of CNVs among 270 HapMap samples using high-density SNP genotyping arrays and BAC array CGH. In this report, we describe a novel algorithm using Affymetrix GeneChip Human Mapping 500K Early Access (500K EA) arrays that identified 1203 CNVs ranging in size from 960 bp to 3.4 Mb. The algorithm consists of three steps: (1) intensity pre-processing to improve the resolution between pairwise comparisons by directly estimating the allele-specific affinity as well as to reduce signal noise by incorporating probe and target sequence characteristics via an improved version of the Genomic Imbalance Map (GIM) algorithm; (2) CNV extraction using an adapted SW-ARRAY procedure to automatically and robustly detect candidate CNV regions; and (3) copy number inference in which all pairwise comparisons are summarized to more precisely define CNV boundaries and accurately estimate CNV copy number. Independent testing of a subset of CNVs by quantitative PCR and mass spectrometry demonstrated a >90% verification rate. The use of high-resolution oligonucleotide arrays relative to other methods may allow more precise boundary information to be extracted, thereby enabling a more accurate analysis of the relationship between CNVs and other genomic features.
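Step (2) of the abstract above relies on a Smith-Waterman-style segment search. The sketch below illustrates only the general idea behind that family of methods, namely finding the contiguous run of probes whose threshold-adjusted log ratios sum to the highest score. It is not the authors' adapted SW-ARRAY procedure: the function name `max_scoring_segment`, the `threshold` parameter, and the example values are all hypothetical, and the significance testing a real pipeline would need is omitted.

```python
from typing import List, Optional, Tuple


def max_scoring_segment(
    log_ratios: List[float], threshold: float
) -> Optional[Tuple[int, int, float]]:
    """Return (start, end, score) of the best contiguous run, end inclusive.

    Each probe contributes (log ratio - threshold). The running score is
    reset to zero whenever it drops to zero or below, as in a local
    (Smith-Waterman-style) alignment recursion. Returns None when no
    segment reaches a positive score.
    """
    best: Optional[Tuple[int, int, float]] = None
    running = 0.0
    start = 0
    for i, value in enumerate(log_ratios):
        if running <= 0.0:
            # Previous run is exhausted: start a new candidate segment here.
            running = 0.0
            start = i
        running += value - threshold
        if running > 0.0 and (best is None or running > best[2]):
            best = (start, i, running)
    return best


# Hypothetical log2 ratios along one chromosome; probes 3 to 5 look like a gain.
example = [0.0, 0.1, -0.1, 0.6, 0.5, 0.7, 0.0, -0.1]
segment = max_scoring_segment(example, threshold=0.2)
print(segment)  # the reported segment covers indices 3 to 5
```

This form detects gains only; a candidate loss could be searched for by negating the log ratios before the call.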
Affiliation(s)
- Daisuke Komura
- Research Center for Advanced Science and Technology, The University of Tokyo, Meguro, Tokyo 153-8904, Japan
- Department of Advanced Interdisciplinary Studies, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo 113-8656, Japan
- Fan Shen
- Affymetrix, Inc., Santa Clara, California 95051, USA
- Shumpei Ishikawa
- Research Center for Advanced Science and Technology, The University of Tokyo, Meguro, Tokyo 153-8904, Japan
- Wenwei Chen
- Affymetrix, Inc., Santa Clara, California 95051, USA
- Jane Zhang
- Affymetrix, Inc., Santa Clara, California 95051, USA
- Guoying Liu
- Affymetrix, Inc., Santa Clara, California 95051, USA
- Sigeo Ihara
- Research Center for Advanced Science and Technology, The University of Tokyo, Meguro, Tokyo 153-8904, Japan
- Hiroshi Nakamura
- Research Center for Advanced Science and Technology, The University of Tokyo, Meguro, Tokyo 153-8904, Japan
- Department of Advanced Interdisciplinary Studies, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo 113-8656, Japan
- Matthew E. Hurles
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom
- Charles Lee
- Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Stephen W. Scherer
- The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, M5G 1L7, Canada
- Jing Huang
- Affymetrix, Inc., Santa Clara, California 95051, USA
- Corresponding authors. Fax (408) 732-7025; fax 81-3-5452-5355.
- Hiroyuki Aburatani
- Research Center for Advanced Science and Technology, The University of Tokyo, Meguro, Tokyo 153-8904, Japan
- Japan Science and Technology Agency, Kawaguchi, Saitama, 332-0012, Japan
- Corresponding authors. Fax (408) 732-7025; fax 81-3-5452-5355.
|
19
|
Bhangale TR, Stephens M, Nickerson DA. Automating resequencing-based detection of insertion-deletion polymorphisms. Nat Genet 2006; 38:1457-62. [PMID: 17115056 DOI: 10.1038/ng1925] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.7] [Received: 06/05/2006] [Accepted: 10/17/2006] [Indexed: 12/19/2022]
Abstract
Structural and insertion-deletion (indel) variants have received considerable recent attention, partly because of their phenotypic consequences. Among these variants, the most common are small indels (approximately 1-30 bp). Identifying and genotyping indels using sequence traces obtained from diploid samples requires extensive manual review, which makes large-scale studies inconvenient. We report a new algorithm, implemented in available software (PolyPhred version 6.0), to help automate detection and genotyping of indels from sequence traces. The algorithm identifies heterozygous individuals, which permits the discovery of low-frequency indels. It finds 80% of all indel polymorphisms with almost no false positives and finds 97% with a false discovery rate of 10%. Additionally, genotyping accuracy exceeds 99%, and it correctly infers indel length in 96% of the cases. Using this approach, we identify indels in the HapMap ENCODE regions, providing the first report of these polymorphisms in this data set.
Affiliation(s)
- Tushar R Bhangale
- Department of Bioengineering, University of Washington, Seattle, Washington 98195, USA
|
20
|
Goidts V, Cooper DN, Armengol L, Schempp W, Conroy J, Estivill X, Nowak N, Hameister H, Kehrer-Sawatzki H. Complex patterns of copy number variation at sites of segmental duplications: an important category of structural variation in the human genome. Hum Genet 2006; 120:270-84. [PMID: 16838144 DOI: 10.1007/s00439-006-0217-y] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Received: 04/04/2006] [Accepted: 05/26/2006] [Indexed: 10/24/2022]
Abstract
The structural diversity of the human genome is much higher than previously assumed although its full extent remains unknown. To investigate the association between segmental duplications that display constitutive copy number differences (CNDs) between humans and the great apes and those which exhibit polymorphic copy number variations (CNVs) between humans, we analysed a BAC array enriched with segmental duplications displaying such CNDs. This study documents for the first time that in addition to human-specific gains common to all humans, these duplication clusters (DCs) also exhibit polymorphic CNVs > 40 kb. Segmental duplication is known to have been a frequent event during human genome evolution. Importantly, among the CNV-associated genes identified here, those involved in transcriptional regulation were found to be significantly overrepresented. Complex patterns of variation were evident at sites of DCs, manifesting as inter-individual differentially sized copy number alterations at the same genomic loci. Thus, CNVs associated with segmental duplications do not simply represent insertion/deletion polymorphisms, but rather constitute a wide variety of rearrangements involving differential amplification and partial gains and losses with high inter-individual variability. Although the number of CNVs was not found to differ between Africans and Caucasians/Asians, the average number of variant patterns per locus was significantly lower in Africans. Thus, complex variation patterns characterizing segmental duplications result from relatively recent genomic rearrangements. The high number of these rearrangements, some of which are potentially recurrent, together with differences in population size and expansion dynamics, may account for the greater diversity of CNV in Caucasians/Asians as compared with Africans.
Affiliation(s)
- Violaine Goidts
- Department of Human Genetics, University of Ulm, Albert Einstein Allee 11, 89081, Ulm, Germany
|
21
|
Freeman JL, Perry GH, Feuk L, Redon R, McCarroll SA, Altshuler DM, Aburatani H, Jones KW, Tyler-Smith C, Hurles ME, Carter NP, Scherer SW, Lee C. Copy number variation: new insights in genome diversity. Genome Res 2006; 16:949-61. [PMID: 16809666 DOI: 10.1101/gr.3677206] [Citation(s) in RCA: 545] [Impact Index Per Article: 30.3] [Indexed: 12/14/2022]
Abstract
DNA copy number variation has long been associated with specific chromosomal rearrangements and genomic disorders, but its ubiquity in mammalian genomes was not fully realized until recently. Although our understanding of the extent of this variation is still developing, it seems likely that, at least in humans, copy number variants (CNVs) account for a substantial amount of genetic variation. Since many CNVs include genes that result in differential levels of gene expression, CNVs may account for a significant proportion of normal phenotypic variation. Current efforts are directed toward a more comprehensive cataloging and characterization of CNVs that will provide the basis for determining how genomic diversity impacts biological function, evolution, and common human diseases.
Affiliation(s)
- Jennifer L Freeman
- Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
|
22
|
Sharp A. Revealing the hidden structure of our genome. Nat Methods 2006; 3:427-8. [PMID: 16721375 DOI: 10.1038/nmeth0606-427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
23
|
Carlson CS, Smith JD, Stanaway IB, Rieder MJ, Nickerson DA. Direct detection of null alleles in SNP genotyping data. Hum Mol Genet 2006; 15:1931-7. [PMID: 16644863 DOI: 10.1093/hmg/ddl115] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 12/21/2022] Open
Abstract
Pinpointing genetic associations in the human genome relies heavily on the accuracy of the underlying genotype data. Null alleles can generate significant inaccuracies in genotype data and can negatively affect the statistical power of a study. Existing quality control (QC) tests, including tests of Hardy-Weinberg equilibrium, are not sensitive enough to detect the presence of even moderately frequent null alleles in the data. We show that direct analysis of raw data from a quantitative genotyping platform can detect up to 75% of null alleles, even at frequencies below the sensitivity of more traditional methods. Detecting unexpected null alleles not only has benefits in QC of genotype data but may also be valuable in detecting rare, functional null alleles that would otherwise be missed.
Affiliation(s)
- Christopher S Carlson
- Department of Genome Sciences, University of Washington, Seattle, WA 98195-7730, USA
|