1
Balagué-Dobón L, Cáceres A, González JR. Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure. Brief Bioinform 2022; 23:bbac043. [PMID: 35211719 PMCID: PMC8921734 DOI: 10.1093/bib/bbac043]
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant type of genomic variation and the most accessible to genotype in large cohorts. However, they individually explain a small proportion of phenotypic differences between individuals. Ancestry, collective SNP effects, structural variants, somatic mutations or even differences in historic recombination can potentially explain a high percentage of genomic divergence. These genetic differences can be infrequent or laborious to characterize; however, many of them leave distinctive marks on the SNPs across the genome allowing their study in large population samples. Consequently, several methods have been developed over the last decade to detect and analyze different genomic structures using SNP arrays, to complement genome-wide association studies and determine the contribution of these structures to explain the phenotypic differences between individuals. We present an up-to-date collection of available bioinformatics tools that can be used to extract relevant genomic information from SNP array data including population structure and ancestry; polygenic risk scores; identity-by-descent fragments; linkage disequilibrium; heritability and structural variants such as inversions, copy number variants, genetic mosaicisms and recombination histories. From a systematic review of recently published applications of the methods, we describe the main characteristics of R packages, command-line tools and desktop applications, both free and commercial, to help make the most of a large amount of publicly available SNP data.
2
Cosemans N, Claes P, Brison N, Vermeesch JR, Peeters H. Noise-robust assessment of SNP array based CNV calls through local noise estimation of log R ratios. Stat Appl Genet Mol Biol 2018; 17:sagmb-2017-0026. [PMID: 29708886 DOI: 10.1515/sagmb-2017-0026]
Abstract
Arrays based on single nucleotide polymorphisms (SNPs) have been successful for the large-scale discovery of copy number variants (CNVs). However, current CNV calling algorithms still have limitations in detecting CNVs with high specificity and sensitivity, especially for small (<100 kb) CNVs. Therefore, this study presents a simple statistical analysis to evaluate CNV calls from SNP arrays in order to improve the noise-robustness of existing CNV calling algorithms. The proposed approach estimates the local noise of log R ratios and returns the probability that a given observation differs from this log R ratio noise level. This probability can be thresholded at different values to tailor specificity and/or sensitivity flexibly. Moreover, a comparison based on qPCR experiments showed that the proposed noise-robust CNV calls outperformed the original ones for multiple threshold values.
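To make the idea of scoring a call against local log R ratio (LRR) noise concrete, here is a minimal sketch under stated assumptions; it is not the authors' published procedure. It assumes noise is estimated from up to `flank` probes on each side of the call, uses the median absolute deviation (MAD) as a robust noise scale, and treats probes as independent and normally distributed. The function names and the default `flank = 50` are hypothetical choices made for illustration.

```python
import math
from statistics import mean, median
from typing import List


def mad_sigma(values: List[float]) -> float:
    """Robust noise estimate: 1.4826 times the median absolute deviation (MAD)."""
    centre = median(values)
    return 1.4826 * median([abs(v - centre) for v in values])


def call_vs_local_noise(lrr: List[float], start: int, end: int, flank: int = 50) -> float:
    """Score a CNV call spanning probe indices [start, end) against local LRR noise.

    Hypothetical sketch: noise comes from up to `flank` probes on each side of the
    call, and probes are assumed independent and normally distributed.
    Returns P(|Z| < z), so values close to 1 mean the call's mean LRR departs
    strongly from the local noise level.
    """
    background = lrr[max(0, start - flank):start] + lrr[end:end + flank]
    if len(background) < 3 or end <= start:
        raise ValueError("need a non-empty call and at least 3 flanking probes")
    sigma = mad_sigma(background)
    if sigma == 0:
        raise ValueError("flanking probes show no variation; noise is undefined")
    baseline = median(background)
    # Standard error of the mean LRR over the n probes inside the call.
    se = sigma / math.sqrt(end - start)
    z = abs(mean(lrr[start:end]) - baseline) / se
    return math.erf(z / math.sqrt(2))
```

For example, with `noise = [0.1, -0.1] * 25` and `lrr = noise + [-0.5] * 10 + noise`, `call_vs_local_noise(lrr, 50, 60)` is very close to 1, whereas a call over pure noise scores near 0. Calls can then be kept or discarded by comparing the score with a user-chosen threshold, which mirrors the threshold flexibility described in the abstract.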
Affiliation(s)
- Nele Cosemans
- Center for Human Genetics, University Hospital Leuven, KU Leuven, Leuven, Belgium
- Peter Claes
- Medical Image Computing, ESAT/PSI, Department of Electrical Engineering, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, UZ Leuven, Leuven, Belgium
- Nathalie Brison
- Center for Human Genetics, University Hospital Leuven, KU Leuven, Leuven, Belgium
- Hilde Peeters
- Center for Human Genetics, University Hospital Leuven, KU Leuven, Leuven, Belgium
3
Stamoulis C, Betensky RA. Optimization of Signal Decomposition Matched Filtering (SDMF) for Improved Detection of Copy-Number Variations. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:584-591. [PMID: 27295643 PMCID: PMC4905595 DOI: 10.1109/tcbb.2015.2448077]
Abstract
We aim to improve the performance of the previously proposed signal decomposition matched filtering (SDMF) method [26] for the detection of copy-number variations (CNV) in the human genome. Through simulations, we show that the modified SDMF is robust even at high noise levels and outperforms the original SDMF method, which indirectly depends on CNV frequency. Simulations are also used to develop a systematic approach for selecting relevant parameter thresholds in order to optimize sensitivity, specificity and computational efficiency. We apply the modified method to array CGH data from normal samples in the cancer genome atlas (TCGA) and compare detected CNVs to those estimated using circular binary segmentation (CBS) [19], a hidden Markov model (HMM)-based approach [11] and a subset of CNVs in the Database of Genomic Variants. We show that a substantial number of previously identified CNVs are detected by the optimized SDMF, which also outperforms the other two methods.
Affiliation(s)
- Catherine Stamoulis
- Department of Radiology, Harvard Medical School and Boston Children’s Hospital, Boston, MA 02115
- Rebecca A. Betensky
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115
4
Cava C, Bertoli G, Castiglioni I. Integrating genetics and epigenetics in breast cancer: biological insights, experimental, computational methods and therapeutic potential. BMC Syst Biol 2015; 9:62. [PMID: 26391647 PMCID: PMC4578257 DOI: 10.1186/s12918-015-0211-x]
Abstract
BACKGROUND Development of human cancer can proceed through the accumulation of different genetic changes affecting the structure and function of the genome. Combined analyses of molecular data at multiple levels, such as DNA copy-number alteration, mRNA and miRNA expression, can clarify biological functions and pathways deregulated in cancer. The integrative methods that are used to investigate these data involve different fields, including biology, bioinformatics, and statistics. RESULTS These methodologies are presented in this review, and their implementation in breast cancer is discussed with a focus on integration strategies. We report current applications, recent studies and interesting results leading to the identification of candidate biomarkers for diagnosis, prognosis, and therapy in breast cancer by using both individual and combined analyses. CONCLUSION This review presents the state of the art on the role of different technologies in breast cancer based on the integration of genetics and epigenetics, and discusses some issues related to the new opportunities and challenges offered by the application of such integrative approaches.
Affiliation(s)
- Claudia Cava
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
- Gloria Bertoli
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
- Isabella Castiglioni
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
5
Nutsua ME, Fischer A, Nebel A, Hofmann S, Schreiber S, Krawczak M, Nothnagel M. Family-Based Benchmarking of Copy Number Variation Detection Software. PLoS One 2015. [PMID: 26197066 PMCID: PMC4510559 DOI: 10.1371/journal.pone.0133465]
Abstract
The analysis of structural variants, in particular of copy-number variations (CNVs), has proven valuable in unraveling the genetic basis of human diseases. Hence, a large number of algorithms have been developed for the detection of CNVs in SNP array signal intensity data. Using the European and African HapMap trio data, we undertook a comparative evaluation of six commonly used CNV detection software tools, namely Affymetrix Power Tools (APT), QuantiSNP, PennCNV, GLAD, R-gada and VEGA, and assessed their level of pair-wise prediction concordance. The tool-specific CNV prediction accuracy was assessed in silico by way of intra-familial validation. Software tools differed greatly in terms of the number and length of the CNVs predicted as well as the number of markers included in a CNV. All software tools predicted substantially more deletions than duplications. Intra-familial validation revealed consistently low levels of prediction accuracy as measured by the proportion of validated CNVs (34-60%). Moreover, up to 20% of apparent family-based validations were found to be due to chance alone. Software using Hidden Markov models (HMM) showed a trend to predict fewer CNVs than segmentation-based algorithms albeit with greater validity. PennCNV yielded the highest prediction accuracy (60.9%). Finally, the pairwise concordance of CNV prediction was found to vary widely with the software tools involved. We recommend HMM-based software, in particular PennCNV, rather than segmentation-based algorithms when validity is the primary concern of CNV detection. QuantiSNP may be used as an additional tool to detect sets of CNVs not detectable by the other tools. Our study also reemphasizes the need for laboratory-based validation, such as qPCR, of CNVs predicted in silico.
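Intra-familial validation, as described above, counts an offspring CNV as validated when the same event is also called in a parent. The sketch below computes that proportion under an assumed matching rule (same chromosome, same type, any base-pair overlap); the abstract does not state the exact criterion the authors used, and the `CNV` type and function names are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based start, inclusive
    end: int    # end, exclusive
    kind: str   # "del" or "dup"


def same_event(a: CNV, b: CNV) -> bool:
    """Assumed matching rule: same chromosome, same type, any base-pair overlap."""
    return a.chrom == b.chrom and a.kind == b.kind and a.start < b.end and b.start < a.end


def validated_fraction(child: Sequence[CNV], father: Sequence[CNV], mother: Sequence[CNV]) -> float:
    """Fraction of the child's calls that are also seen (as the same event) in either parent."""
    if not child:
        return math.nan
    parental = list(father) + list(mother)
    hits = sum(1 for c in child if any(same_event(c, p) for p in parental))
    return hits / len(child)
```

A permissive rule such as any-overlap also raises the chance of coincidental matches, which the abstract notes can account for up to 20% of apparent validations; a stricter rule (for example, a minimum reciprocal overlap) trades some sensitivity for fewer chance matches.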
Affiliation(s)
- Marcel Elie Nutsua
- Institute of Clinical Molecular Biology, Christian-Albrechts University, Kiel, Germany
- Annegret Fischer
- Institute of Clinical Molecular Biology, Christian-Albrechts University, Kiel, Germany
- Almut Nebel
- Institute of Clinical Molecular Biology, Christian-Albrechts University, Kiel, Germany
- Sylvia Hofmann
- Institute of Clinical Molecular Biology, Christian-Albrechts University, Kiel, Germany
- Stefan Schreiber
- Institute of Clinical Molecular Biology, Christian-Albrechts University, Kiel, Germany
- Michael Krawczak
- Institute of Medical Informatics and Statistics, Christian-Albrechts University, Kiel, Germany
- Michael Nothnagel
- Institute of Medical Informatics and Statistics, Christian-Albrechts University, Kiel, Germany; Cologne Center for Genomics, University of Cologne, Cologne, Germany
6
Locke MEO, Milojevic M, Eitutis ST, Patel N, Wishart AE, Daley M, Hill KA. Genomic copy number variation in Mus musculus. BMC Genomics 2015; 16:497. [PMID: 26141061 PMCID: PMC4490682 DOI: 10.1186/s12864-015-1713-z]
Abstract
BACKGROUND Copy number variation is an important dimension of genetic diversity and has implications in development and disease. As an important model organism, the mouse is a prime candidate for copy number variant (CNV) characterization, but this has yet to be completed for a large sample size. Here we report CNV analysis of publicly available, high-density microarray data files for 351 mouse tail samples, including 290 mice that had not been characterized for CNVs previously. RESULTS We found 9634 putative autosomal CNVs across the samples affecting 6.87% of the mouse reference genome. We find significant differences in the degree of CNV uniqueness (single sample occurrence) and the nature of CNV-gene overlap between wild-caught mice and classical laboratory strains. CNV-gene overlap was associated with lipid metabolism, pheromone response and olfaction compared to immunity, carbohydrate metabolism and amino-acid metabolism for wild-caught mice and classical laboratory strains, respectively. Using two subspecies of wild-caught Mus musculus, we identified putative CNVs unique to those subspecies and show this diversity is better captured by wild-derived laboratory strains than by the classical laboratory strains. A total of 9 genic copy number variable regions (CNVRs) were selected for experimental confirmation by droplet digital PCR (ddPCR). CONCLUSION The analysis we present is a comprehensive, genome-wide analysis of CNVs in Mus musculus, which increases the number of known variants in the species and will accelerate the identification of novel variants in future studies.
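A figure such as "6.87% of the mouse reference genome" is naturally read as the length of the union of all CNV intervals divided by the reference length, so that overlapping CNVs from different mice are counted once. The sketch below assumes that definition (the abstract does not spell it out), uses toy coordinates, and does not reproduce the study's number; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def covered_bases(intervals: Iterable[Interval]) -> int:
    """Number of bases covered by the union of the intervals (overlaps counted once)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    total = 0
    for spans in by_chrom.values():
        spans.sort()
        block_start, block_end = spans[0]
        for start, end in spans[1:]:
            if start <= block_end:  # overlapping or touching: extend the current block
                block_end = max(block_end, end)
            else:                   # gap: close the current block and open a new one
                total += block_end - block_start
                block_start, block_end = start, end
        total += block_end - block_start
    return total


def genome_fraction(intervals: Iterable[Interval], genome_length: int) -> float:
    """Fraction of a reference of length `genome_length` covered by the intervals."""
    return covered_bases(intervals) / genome_length
```

Merging per chromosome before summing is the key design choice: summing raw CNV lengths across 351 samples would count shared regions many times and overstate genome coverage.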
Affiliation(s)
- M Elizabeth O Locke
- Department of Computer Science, The University of Western Ontario, London, ON, N6A 5B7, Canada.
- Maja Milojevic
- Department of Biology, The University of Western Ontario, Biological and Geological Sciences Building 1151 Richmond St. N, London, ON, N6A 5B7, Canada.
- Susan T Eitutis
- Department of Biology, The University of Western Ontario, Biological and Geological Sciences Building 1151 Richmond St. N, London, ON, N6A 5B7, Canada.
- Nisha Patel
- Department of Biology, The University of Western Ontario, Biological and Geological Sciences Building 1151 Richmond St. N, London, ON, N6A 5B7, Canada.
- Andrea E Wishart
- Department of Biology, The University of Western Ontario, Biological and Geological Sciences Building 1151 Richmond St. N, London, ON, N6A 5B7, Canada.
- Mark Daley
- Department of Computer Science, The University of Western Ontario, London, ON, N6A 5B7, Canada.
- Department of Biology, The University of Western Ontario, Biological and Geological Sciences Building 1151 Richmond St. N, London, ON, N6A 5B7, Canada.
- Kathleen A Hill
- Department of Computer Science, The University of Western Ontario, London, ON, N6A 5B7, Canada.
- Department of Biology, The University of Western Ontario, Biological and Geological Sciences Building 1151 Richmond St. N, London, ON, N6A 5B7, Canada.
7
Castellani CA, Melka MG, Wishart AE, Locke MEO, Awamleh Z, O'Reilly RL, Singh SM. Biological relevance of CNV calling methods using familial relatedness including monozygotic twins. BMC Bioinformatics 2014; 15:114. [PMID: 24750645 PMCID: PMC4021055 DOI: 10.1186/1471-2105-15-114]
Abstract
BACKGROUND Studies involving the analysis of structural variation including Copy Number Variation (CNV) have recently exploded in the literature. Furthermore, CNVs have been associated with a number of complex diseases and neurodevelopmental disorders. Common methods for CNV detection use SNP, CNV, or CGH arrays, where the signal intensities of consecutive probes are used to define the number of copies associated with a given genomic region. These practices pose a number of challenges that interfere with the ability of available methods to accurately call CNVs. It has, therefore, become necessary to develop experimental protocols to test the reliability of CNV calling methods from microarray data so that researchers can properly discriminate biologically relevant data from noise. RESULTS We have developed a workflow for the integration of data from multiple CNV calling algorithms using the same array results. It uses four CNV calling programs: PennCNV (PC), Affymetrix® Genotyping Console™ (AGC), Partek® Genomics Suite™ (PGS) and Golden Helix SVS™ (GH) to analyze CEL files from the Affymetrix® Human SNP 6.0 Array™. To assess the relative suitability of each program, we used individuals of known genetic relationships. We found significant differences in CNV calls obtained by different CNV calling programs. CONCLUSIONS Although the programs showed variable patterns of CNVs in the same individuals, their distribution in individuals of different degrees of genetic relatedness has allowed us to offer two suggestions. The first involves the use of multiple algorithms for the detection of the largest possible number of CNVs, and the second suggests the use of PennCNV over all other methods when the use of only one software program is desirable.
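One simple way to integrate calls from several programs run on the same arrays is to count, for every call, how many programs report an overlapping call of the same type. The sketch below is an assumption-laden illustration, not the published workflow: the matching rule (any overlap, same type) and function names are hypothetical, and the program labels PC, AGC, PGS and GH are borrowed from the abstract only as dictionary keys for toy data.

```python
from typing import Dict, List, Tuple

Call = Tuple[str, int, int, str]  # (chromosome, start, end, type), half-open coordinates


def overlaps(a: Call, b: Call) -> bool:
    """Assumed matching rule: same chromosome and type, any overlap."""
    return a[0] == b[0] and a[3] == b[3] and a[1] < b[2] and b[1] < a[2]


def consensus_calls(calls_by_program: Dict[str, List[Call]],
                    min_programs: int = 2) -> List[Tuple[str, Call, int]]:
    """Keep each call supported by at least `min_programs` programs (including its own)."""
    kept = []
    for program, calls in calls_by_program.items():
        for call in calls:
            support = sum(
                1 for other, other_calls in calls_by_program.items()
                if other == program or any(overlaps(call, o) for o in other_calls)
            )
            if support >= min_programs:
                kept.append((program, call, support))
    return kept
```

Setting `min_programs=1` returns the union of all calls, which corresponds to the abstract's first suggestion (use multiple algorithms to detect the largest possible number of CNVs); higher values instead favour calls that several programs agree on.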
Affiliation(s)
- Shiva M Singh
- Department of Biology, The University of Western Ontario, London N6A 5B7, ON, Canada.
8
Feber A, Guilhamon P, Lechner M, Fenton T, Wilson GA, Thirlwell C, Morris TJ, Flanagan AM, Teschendorff AE, Kelly JD, Beck S. Using high-density DNA methylation arrays to profile copy number alterations. Genome Biol 2014; 15:R30. [PMID: 24490765 PMCID: PMC4054098 DOI: 10.1186/gb-2014-15-2-r30]
Abstract
The integration of genomic and epigenomic data is an increasingly popular approach for studying the complex mechanisms driving cancer development. We have developed a method for evaluating both methylation and copy number from high-density DNA methylation arrays. Comparing copy number data from Infinium HumanMethylation450 BeadChips and SNP arrays, we demonstrate that Infinium arrays detect copy number alterations with the sensitivity of SNP platforms. These results show that high-density methylation arrays provide a robust and economic platform for detecting copy number and methylation changes in a single experiment. Our method is available in the ChAMP Bioconductor package: http://www.bioconductor.org/packages/2.13/bioc/html/ChAMP.html.
Affiliation(s)
- Andrew Feber
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- Paul Guilhamon
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- Matthias Lechner
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- Tim Fenton
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- Gareth A Wilson
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- Christina Thirlwell
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- Tiffany J Morris
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- Adrienne M Flanagan
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- Royal National Orthopaedic Hospital, Stanmore, Brockley Hill, Middlesex HA7 4LP, UK
- Andrew E Teschendorff
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- John D Kelly
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- Division of Surgery and Interventional Science, UCL Medical School, University College London, London WC1E 6BT, UK
- Stephan Beck
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
9
Vandeweyer G, Kooy RF. Detection and interpretation of genomic structural variation in health and disease. Expert Rev Mol Diagn 2014; 13:61-82. [DOI: 10.1586/erm.12.119]
10
Bendjilali N, Kim H, Weinsheimer S, Guo DE, Kwok PY, Zaroff JG, Sidney S, Lawton MT, McCulloch CE, Koeleman BPC, Klijn CJM, Young WL, Pawlikowska L. A genome-wide investigation of copy number variation in patients with sporadic brain arteriovenous malformation. PLoS One 2013; 8:e71434. [PMID: 24098321 PMCID: PMC3789669 DOI: 10.1371/journal.pone.0071434]
Abstract
BACKGROUND Brain arteriovenous malformations (BAVM) are clusters of abnormal blood vessels, with shunting of blood from the arterial to venous circulation and a high risk of rupture and intracranial hemorrhage. Most BAVMs are sporadic, but they also occur in patients with Hereditary Hemorrhagic Telangiectasia, a Mendelian disorder caused by mutations in genes in the transforming growth factor beta (TGFβ) signaling pathway. METHODS To investigate whether copy number variations (CNVs) contribute to risk of sporadic BAVM, we performed a genome-wide association study in 371 sporadic BAVM cases and 563 healthy controls, all Caucasian. Cases and controls were genotyped using the Affymetrix 6.0 array. CNVs were called using the PennCNV and Birdsuite algorithms and analyzed via segment-based and gene-based approaches. Common and rare CNVs were evaluated for association with BAVM. RESULTS A CNV region on 1p36.13, containing the neuroblastoma breakpoint family, member 1 gene (NBPF1), was significantly enriched with duplications in BAVM cases compared to controls (P = 2.2 × 10^-9); NBPF1 was also significantly associated with BAVM in gene-based analysis using both PennCNV and Birdsuite. We experimentally validated the 1p36.13 duplication; however, the association did not replicate in an independent cohort of 184 sporadic BAVM cases and 182 controls (OR = 0.81, P = 0.8). Rare CNV analysis did not identify genes significantly associated with BAVM. CONCLUSION We did not identify common CNVs associated with sporadic BAVM that replicated in an independent cohort. Replication in larger cohorts is required to elucidate the possible role of common or rare CNVs in BAVM pathogenesis.
Affiliation(s)
- Nasrine Bendjilali
- Center for Cerebrovascular Research, Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, California, United States of America
- Helen Kim
- Center for Cerebrovascular Research, Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, California, United States of America
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, United States of America
- Shantel Weinsheimer
- Center for Cerebrovascular Research, Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, California, United States of America
- Diana E. Guo
- Center for Cerebrovascular Research, Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, California, United States of America
- Pui-Yan Kwok
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Cardiovascular Research Institute, University of California San Francisco, San Francisco, California, United States of America
- Jonathan G. Zaroff
- Kaiser Northern California Division of Research, San Francisco, California, United States of America
- Stephen Sidney
- Kaiser Northern California Division of Research, San Francisco, California, United States of America
- Michael T. Lawton
- Department of Neurological Surgery, University of California San Francisco, San Francisco, California, United States of America
- Charles E. McCulloch
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, United States of America
- Bobby P. C. Koeleman
- Department of Medical Genetics, University Medical Center, Utrecht, The Netherlands
- Catharina J. M. Klijn
- Department of Neurology and Neurosurgery, Rudolf Magnus Institute of Neuroscience, University Medical Center, Utrecht, The Netherlands
- William L. Young
- Center for Cerebrovascular Research, Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, California, United States of America
- Department of Neurological Surgery, University of California San Francisco, San Francisco, California, United States of America
- Department of Neurology, University of California San Francisco, San Francisco, California, United States of America
- Ludmila Pawlikowska
- Center for Cerebrovascular Research, Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, California, United States of America
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
11
Abstract
MOTIVATION Data quality is a critical issue in the analyses of DNA copy number alterations obtained from microarrays. It is commonly assumed that copy number alteration data can be modeled as piecewise constant and the measurement errors of different probes are independent. However, these assumptions do not always hold in practice. In some published datasets, we find that measurement errors are highly correlated between probes that interrogate nearby genomic loci, and the piecewise-constant model does not fit the data well. The correlated errors cause problems in downstream analysis, leading to a large number of DNA segments falsely identified as having copy number gains and losses. METHOD We developed a simple tool, called autocorrelation scanning profile, to assess the dependence of measurement error between neighboring probes. RESULTS Autocorrelation scanning profile can be used to check data quality and refine the analysis of DNA copy number data, which we demonstrate in some typical datasets. CONTACT lzhangli@mdanderson.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
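If measurement errors between neighbouring probes were independent, as the piecewise-constant model assumes, the residuals left after subtracting the fitted segment levels would show lag-1 autocorrelation near zero; sustained positive values flag the correlated errors described above. The sketch below scans that statistic in sliding windows. It is an illustration under assumptions, not the published autocorrelation scanning profile: the window size, step, use of residuals and function names are all hypothetical choices.

```python
from typing import List, Sequence


def lag1_autocorrelation(x: Sequence[float]) -> float:
    """Sample lag-1 autocorrelation; 0.0 if the window has no variance."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den if den > 0 else 0.0


def autocorrelation_scan(values: Sequence[float], fitted: Sequence[float],
                         window: int = 50, step: int = 25) -> List[float]:
    """Lag-1 autocorrelation of residuals (observed minus fitted piecewise-constant
    level) in sliding windows of probes ordered by genomic position."""
    if len(values) != len(fitted):
        raise ValueError("values and fitted levels must have the same length")
    resid = [v - f for v, f in zip(values, fitted)]
    return [lag1_autocorrelation(resid[i:i + window])
            for i in range(0, len(resid) - window + 1, step)]
```

Plotting the returned values against window position gives a profile along the chromosome; regions where it stays well above zero are candidates for the spurious gains and losses the abstract warns about.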
Affiliation(s)
- Liangcai Zhang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77230, USA and Department of Biophysics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
12
Metzger J, Philipp U, Lopes MS, da Camara Machado A, Felicetti M, Silvestrelli M, Distl O. Analysis of copy number variants by three detection algorithms and their association with body size in horses. BMC Genomics 2013; 14:487. [PMID: 23865711 PMCID: PMC3720552 DOI: 10.1186/1471-2164-14-487]
Abstract
Background Copy number variants (CNVs) have been shown to play an important role in genetic diversity of mammals and in the development of many complex phenotypic traits. The aim of this study was to perform a standard comparative evaluation of CNVs in horses using three different CNV detection programs and to identify genomic regions associated with body size in horses. Results Analysis was performed using the Illumina Equine SNP50 genotyping beadchip for 854 horses. CNVs were detected by three different algorithms, CNVPartition, PennCNV and QuantiSNP. Comparative analysis revealed 50 CNVs that affected 153 different genes mainly involved in sensory perception, signal transduction and cellular components. Genome-wide association analysis for body size showed highly significant deleted regions on ECA1, ECA8 and ECA9. Homologous regions to the detected CNVs on ECA1 and ECA9 have also been shown to be correlated with human height. Conclusions Comparative analysis of CNV detection algorithms was useful to increase the specificity of CNV detection but had certain limitations dependent on the detection tool. GWAS revealed genome-wide associated CNVs for body size in horses.
Affiliation(s)
- Julia Metzger
- Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, 30559, Hannover, Germany
13
Comparative Analysis of CNV Calling Algorithms: Literature Survey and a Case Study Using Bovine High-Density SNP Data. Microarrays 2013; 2:171-85. [PMID: 27605188 PMCID: PMC5003459 DOI: 10.3390/microarrays2030171]
Abstract
Copy number variations (CNVs) are gains and losses of genomic sequence between two individuals of a species when compared to a reference genome. The data from single nucleotide polymorphism (SNP) microarrays are now routinely used for genotyping, but they also can be utilized for copy number detection. Substantial progress has been made in array design and CNV calling algorithms and at least 10 comparison studies in humans have been published to assess them. In this review, we first survey the literature on existing microarray platforms and CNV calling algorithms. We then examine a number of CNV calling tools to evaluate their impacts using bovine high-density SNP data. Large incongruities in the results from different CNV calling tools highlight the need for standardizing array data collection, quality assessment and experimental validation. Only after careful experimental design and rigorous data filtering can the impacts of CNVs on both normal phenotypic variability and disease susceptibility be fully revealed.
14
Chia NL, Bryce M, Hickman PE, Potter JM, Glasgow N, Koerbin G, Danoy P, Brown MA, Cavanaugh J. High-resolution SNP microarray investigation of copy number variations on chromosome 18 in a control cohort. Cytogenet Genome Res 2013; 141:16-25. [PMID: 23635498 DOI: 10.1159/000350767]
Abstract
Copy number variations (CNVs) as described in the healthy population are purported to contribute significantly to genetic heterogeneity. Recent studies have described CNVs using lymphoblastoid cell lines or by application of specifically developed algorithms to interrogate previously described data. However, the full extent of CNVs remains unclear. Using a high-density SNP array, we have undertaken a comprehensive investigation of chromosome 18 for CNV discovery and characterisation of distribution and association with chromosome architecture. We identified 399 CNVs, of which 98% are losses, 58% are less than 2.5 kb in size and 71% are intergenic. Intronic deletions account for the majority of copy number changes with gene involvement. Furthermore, one-third of CNVs do not have putative breakpoints within repetitive sequences. We conclude that replicative processes, mediated either by repetitive elements or microhomology, account for the majority of CNVs in the healthy population. Genomic instability involving the formation of a non-B structure is demonstrated in one region.
Affiliation(s)
- N L Chia
- ANU Medical School, Australian National University, Canberra, A.C.T., Australia.
15
Copy Number Variation detection from 1000 Genomes Project exon capture sequencing data. BMC Bioinformatics 2012; 13:305. [PMID: 23157288 PMCID: PMC3563612 DOI: 10.1186/1471-2105-13-305] [Received: 06/14/2012] [Accepted: 11/07/2012]
Abstract
BACKGROUND DNA capture technologies combined with high-throughput sequencing now enable cost-effective, deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP discovery and genotyping. However, little attention has been devoted to Copy Number Variation (CNV) detection from exome capture datasets, despite the potentially high impact of CNVs in exonic regions on protein function. RESULTS As members of the 1000 Genomes Project analysis effort, we investigated 697 samples in which 931 genes were targeted and sequenced with 454 or Illumina paired-end sequencing. We developed a rigorous Bayesian method to detect CNVs in these genes, based on read depth within target regions. Despite substantial variability in read coverage across samples and targeted exons, we identified 107 heterozygous deletions in the dataset. The experimentally determined false discovery rate (FDR) of the cleanest dataset, from the Wellcome Trust Sanger Institute, is 12.5%. We substantially improved the FDR in a subset of gene deletion candidates that were adjacent to another gene deletion call (17 calls). The estimated sensitivity of our call set was 45%. CONCLUSIONS This study demonstrates that exonic sequencing datasets, collected in both population-based and medical sequencing projects, will be a useful substrate for detecting genic CNV events, particularly deletions. Based on the number of events we found and the sensitivity of the methods in the present dataset, we estimate an average of 16 genic heterozygous deletions per individual genome. Our power analysis informs ongoing and future projects about the sequencing depth and uniformity of read coverage required for efficient detection.
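The core idea of Bayesian read-depth deletion calling can be illustrated with a minimal Python sketch. It assumes a toy two-state model in which the read count at a target region is Poisson distributed and a heterozygous deletion halves the expected depth. The function names, the 1% prior and the example counts are hypothetical; this is not the authors' method, which must also handle the coverage variability across samples and exons noted in the abstract.

```python
import math


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k reads under a Poisson(lam) model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def deletion_posterior(observed_reads: int, expected_reads: float,
                       prior_deletion: float = 0.01) -> float:
    """Posterior probability of a heterozygous deletion at one target region.

    Hypothetical toy model, not the published method:
    copy number 2 -> mean depth expected_reads,
    copy number 1 -> mean depth expected_reads / 2.
    """
    log_del = math.log(prior_deletion) + poisson_logpmf(observed_reads, expected_reads / 2)
    log_two = math.log(1 - prior_deletion) + poisson_logpmf(observed_reads, expected_reads)
    # Normalize in log space to avoid underflow at high read depths
    top = max(log_del, log_two)
    p_del = math.exp(log_del - top)
    p_two = math.exp(log_two - top)
    return p_del / (p_del + p_two)


print(deletion_posterior(52, 100))  # depth close to half of the expected value
print(deletion_posterior(98, 100))  # depth close to the expected value
```

In practice the expected depth per target would itself have to be estimated from other samples, which is where most of the real difficulty lies.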
16
Li W, Olivier M. Current analysis platforms and methods for detecting copy number variation. Physiol Genomics 2012; 45:1-16. [PMID: 23132758 DOI: 10.1152/physiolgenomics.00082.2012]
Abstract
Copy number variation (CNV), generated through duplication or deletion events that affect one or more loci, is widespread in the human genome and is often associated with functional consequences that may include changes in gene expression levels or gene fusions. Genome-wide association studies indicate that some disease phenotypes and physiological pathways might be affected by CNV in a small number of characterized genomic regions. However, the pervasiveness and full impact of such variation remain unclear. Suitable analytic methods are needed to thoroughly mine human genomes for structural variation and to explore the interplay between observed CNV and disease phenotypes, but many medical researchers are unfamiliar with the features and nuances of recently developed technologies for detecting CNV. In this article, we evaluate a suite of commonly used and recently developed approaches to uncovering genome-wide CNVs and discuss the relative merits of each.
Affiliation(s)
- Wenli Li
- Biotechnology and Bioengineering Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
17
Zhang YP, Deng FY, Yang TL, Zhang F, Chen XD, Shen H, Zhu XZ, Tian Q, Deng HW. Genome-wide association study identified CNP12587 region underlying height variation in Chinese females. PLoS One 2012; 7:e44292. [PMID: 22957059 PMCID: PMC3434125 DOI: 10.1371/journal.pone.0044292] [Received: 04/20/2012] [Accepted: 08/01/2012]
Abstract
Introduction Human height is a highly heritable trait considered an important factor for health. There has been limited success in identifying the genetic factors underlying height variation. We aimed to identify sequence variants associated with adult height through a genome-wide association study of copy number variants (CNVs) in Chinese individuals. Methods Genome-wide CNV association analyses for height variation were conducted in 1,625 unrelated Chinese adults and in sex-specific subgroups. Height was measured with a stadiometer. The Affymetrix SNP6.0 genotyping platform was used to identify copy number polymorphisms (CNPs). We constructed a genomic map containing 1,009 CNPs in Chinese individuals and performed a genome-wide association study of CNPs with height. Results We detected 10 significant association signals for height (p<0.05) in the whole population, and 9 and 11 association signals in the Chinese female and male populations, respectively. A copy number polymorphism (CNP12587, chr18:54081842-54086942, p = 2.41×10⁻⁴) was significantly associated with height variation in Chinese females even after strict Bonferroni correction (p = 0.048). Confirmatory real-time PCR experiments further supported the validity of this CNV. Compared with female subjects carrying two copies of the CNP, carriers of three copies had an average 8.1% decrease in height. An important candidate gene at this region, ubiquitin-protein ligase NEDD4-like (NEDD4L), plays important roles in bone metabolism by binding to bone formation regulators. Conclusions Our findings point to important genetic variants underlying height variation in Chinese individuals.
Affiliation(s)
- Yin-Ping Zhang
- Key Laboratory of Environment and Genes Related to Diseases, Ministry of Education, College of Medicine, Xi’an Jiaotong University, Xi’an, P. R. China
- Fei-Yan Deng
- Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, Louisiana, United States of America
- Tie-Lin Yang
- The Key Laboratory of Biomedical Information Engineering, Ministry of Education and Institute of Molecular Genetics, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, P. R. China
- Feng Zhang
- Key Laboratory of Environment and Genes Related to Diseases, Ministry of Education, College of Medicine, Xi’an Jiaotong University, Xi’an, P. R. China
- Xiang-Ding Chen
- Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, P. R. China
- Hui Shen
- Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, Louisiana, United States of America
- Xue-Zheng Zhu
- Center of Systematic Biomedical Research, Shanghai University of Science and Technology, Shanghai, P. R. China
- Qing Tian
- Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, Louisiana, United States of America
- Hong-Wen Deng
- Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, Louisiana, United States of America
- The Key Laboratory of Biomedical Information Engineering, Ministry of Education and Institute of Molecular Genetics, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, P. R. China
- Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, P. R. China
- Center of Systematic Biomedical Research, Shanghai University of Science and Technology, Shanghai, P. R. China
18
Carr IM, Diggle CP, Khan K, Inglehearn C, McKibbin M, Bonthron DT, Markham AF, Anwar R, Dobbie A, Pena SDJ, Ali M. Rapid visualisation of microarray copy number data for the detection of structural variations linked to a disease phenotype. PLoS One 2012; 7:e43466. [PMID: 22912880 PMCID: PMC3422275 DOI: 10.1371/journal.pone.0043466] [Received: 04/05/2012] [Accepted: 07/20/2012]
Abstract
Whilst the majority of inherited diseases have been found to be caused by single base substitutions or small insertions or deletions (<1 kb), a significant proportion of genetic variability is due to copy number variation (CNV). The possible role of CNV in monogenic and complex diseases has recently attracted considerable interest. However, until the development of whole-genome oligonucleotide microarrays designed specifically to detect copy number variation, it was not easy to screen an individual for unknown deletions or duplications smaller than the sensitivity limit of optical microscopy (3-5 Mb). Now that currently available oligonucleotide microarrays carry more than a million probes, the problem of copy number analysis has shifted from data production to data analysis. We have developed CNViewer to identify, from genome-wide oligonucleotide microarray data, copy number variation that co-segregates with a disease phenotype in small nuclear families. This freely available program should be a useful addition to the diagnostic armamentarium of clinical geneticists.
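A minimal sketch of a co-segregation filter, assuming integer copy-number calls per probe are already available for each family member. The dominant-inheritance rule (all affected members share one non-diploid state, all unaffected members are diploid), the function name and the example family are hypothetical illustrations, not CNViewer's actual logic.

```python
from typing import Dict, List, Set


def cosegregating_probes(calls: Dict[str, List[int]], affected: Set[str]) -> List[int]:
    """Indices of probes whose copy-number state tracks the disease phenotype.

    Hypothetical rule: every affected member carries the same non-diploid
    copy number, and every unaffected member is diploid (copy number 2).
    """
    n_probes = len(next(iter(calls.values())))
    hits = []
    for i in range(n_probes):
        affected_states = {cn[i] for name, cn in calls.items() if name in affected}
        unaffected_states = [cn[i] for name, cn in calls.items() if name not in affected]
        if (len(affected_states) == 1 and 2 not in affected_states
                and all(state == 2 for state in unaffected_states)):
            hits.append(i)
    return hits


# Hypothetical nuclear family with copy-number calls at four probes
family = {
    "father": [2, 2, 1, 2],  # affected
    "mother": [2, 2, 2, 2],
    "child1": [2, 2, 1, 2],  # affected
    "child2": [2, 3, 2, 2],
}
print(cosegregating_probes(family, affected={"father", "child1"}))
```

A recessive model would instead require affected members to carry copy number 0 and allow unaffected carriers at copy number 1.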
Affiliation(s)
- Ian M Carr
- School of Medicine, University of Leeds, Leeds, United Kingdom.
19
Wineinger NE, Tiwari HK. The impact of errors in copy number variation detection algorithms on association results. PLoS One 2012; 7:e32396. [PMID: 22523537 PMCID: PMC3327691 DOI: 10.1371/journal.pone.0032396] [Received: 06/18/2011] [Accepted: 01/30/2012]
Abstract
The inaccuracy of copy number variation (CNV) detection on single nucleotide polymorphism (SNP) arrays has recently been brought to attention. Such high error rates will undoubtedly have ramifications for downstream association testing. We examined this effect for a wide range of scenarios and found a noticeable decrease in power for error rates typical of CNV calling algorithms. We compared the power of association tests using CNV calls with that of tests using the log relative ratio, and found the latter to be superior when error rates are moderate to large or when the CNV is small. We recommend that CNV researchers use intensity measurements as an alternative to CNV calls in these scenarios.
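As a hypothetical illustration of testing intensities rather than calls, the sketch below summarizes each sample by its mean log R ratio over a candidate region and compares cases with controls using Welch's t statistic. The choice of test, the function names and the toy values are assumptions, not the simulation design used in the study.

```python
import math
import statistics
from typing import List


def per_sample_mean_lrr(samples: List[List[float]]) -> List[float]:
    """Summarize each sample by its mean log R ratio over the probes of one region."""
    return [statistics.fmean(probes) for probes in samples]


def welch_t(cases: List[float], controls: List[float]) -> float:
    """Welch's t statistic for the difference in mean intensity between groups."""
    mean_diff = statistics.fmean(cases) - statistics.fmean(controls)
    std_err = math.sqrt(statistics.variance(cases) / len(cases)
                        + statistics.variance(controls) / len(controls))
    return mean_diff / std_err


# Hypothetical per-probe log R ratios for three cases and three controls
case_lrr = per_sample_mean_lrr([[-0.5, -0.4, -0.6], [-0.1, 0.0, 0.1], [-0.45, -0.55, -0.5]])
control_lrr = per_sample_mean_lrr([[0.05, 0.1, 0.0], [0.1, -0.1, 0.0], [-0.05, 0.0, -0.1]])
print(welch_t(case_lrr, control_lrr))
```

Turning the statistic into a p-value requires the Welch-Satterthwaite degrees of freedom; `scipy.stats.ttest_ind(..., equal_var=False)` performs the full test if SciPy is available.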
Affiliation(s)
- Nathan E Wineinger
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama, Birmingham, Alabama, United States of America.
20
Filges I, Suda L, Weber P, Datta AN, Fischer D, Dill P, Glanzmann R, Benzing J, Hegi L, Wenzel F, Huber AR, Mori AC, Miny P, Röthlisberger B. High resolution array in the clinical approach to chromosomal phenotypes. Gene 2012; 495:163-9. [PMID: 22240311 DOI: 10.1016/j.gene.2011.12.042] [Received: 08/22/2011] [Revised: 12/19/2011] [Accepted: 12/23/2011]
Abstract
Array genomic hybridization (AGH) has recently been implemented as a diagnostic tool for detecting submicroscopic copy number variants (CNVs) in patients with developmental disorders. However, there is no consensus regarding the choice of platform, the minimal resolution needed and the systematic interpretation of CNVs. We report our experience with the clinical diagnostic use of high-resolution AGH, at a resolution of up to 100 kb, in 131 patients with chromosomal phenotypes but a previously normal karyotype. We evaluated its usefulness in our clinics and laboratories in terms of the detection rate of causal CNVs and of CNVs of unknown clinical significance, and of the extent to which their interpretation would challenge the systematic use of high-resolution arrays in clinical application. Prioritizing phenotype-genotype correlation in our interpretation strategy, in addition to previously described criteria, we identified 33 (25.2%) potentially pathogenic aberrations. Sixteen aberrations were confirmed pathogenic (16.4% of syndromic and 8.5% of non-syndromic patients); 9 were new, individual aberrations, 3 of them were pathogenic although inherited, and one is as small as approximately 200 kb. Of 16 further CNVs of unknown significance, 13 were classified as likely benign; for 3, the significance remained unclear. High-resolution arrays allow the detection of pathogenic aberrations in up to 12.2% of patients in a diagnostic clinical setting. Although the majority of aberrations are larger, the detection of small causal aberrations may be relevant for family counseling. The number of remaining unclear CNVs is limited. Careful phenotype-genotype correlation of individual CNVs with clinical features is challenging but remains a hallmark of CNV interpretation.
21
Stamoulis C, Betensky RA. A novel signal processing approach for the detection of copy number variations in the human genome. Bioinformatics 2011; 27:2338-45. [PMID: 21752800 DOI: 10.1093/bioinformatics/btr402]
Abstract
MOTIVATION Human genomic variability occurs at different scales, from single nucleotide polymorphisms (SNPs) to large DNA segments. Copy number variations (CNVs) represent a significant part of our genetic heterogeneity and have been associated with many diseases and disorders. Short, localized CNVs, which may play an important role in human disease, may be undetectable in noisy genomic data. Therefore, robust methodologies are needed for their detection. Furthermore, for meaningful identification of pathological CNVs, estimation of normal allelic aberrations is necessary. RESULTS We developed a signal processing-based methodology for sequence denoising followed by pattern matching, to increase the signal-to-noise ratio (SNR) in genomic data and improve CNV detection. We applied this signal-decomposition-matched filtering (SDMF) methodology to 429 normal genomic sequences and compared the detected CNVs with those in the Database of Genomic Variants. SDMF successfully detected a significant number of previously identified CNVs with frequencies of occurrence ≥10%, as well as unreported short CNVs. Its performance was also compared with that of circular binary segmentation (CBS) through simulations. SDMF had a significantly lower false detection rate and was significantly faster than CBS, an important advantage for handling large datasets generated with high-resolution arrays. By focusing on improving SNR (instead of the robustness of the detection algorithm), SDMF is a very promising methodology for identifying CNVs at all genomic spatial scales. AVAILABILITY The data are available at http://tcga-data.nci.nih.gov/tcga/ The software and list of analyzed sequence IDs are available at http://www.hsph.harvard.edu/~betensky/ Matlab code for Empirical Mode Decomposition may be found at: http://www.clear.rice.edu/elec301/Projects02/empiricalMode/code.html CONTACT caterina@mit.edu.
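The matched-filtering step can be sketched as correlating an intensity profile with a template and thresholding the response. This is a simplified stand-in, not SDMF: it skips the signal-decomposition (denoising) stage entirely, and the rectangular template, its width and the threshold are hypothetical choices.

```python
import math
from typing import List, Tuple


def matched_filter(signal: List[float], width: int) -> List[float]:
    """Correlate the signal with a unit-energy rectangular template of the given width."""
    scale = 1.0 / math.sqrt(width)
    return [scale * sum(signal[i:i + width]) for i in range(len(signal) - width + 1)]


def detect_segments(signal: List[float], width: int, threshold: float) -> List[Tuple[int, int]]:
    """Half-open [start, end) probe intervals whose filter response reaches the threshold."""
    segments: List[Tuple[int, int]] = []
    for i, score in enumerate(matched_filter(signal, width)):
        if abs(score) >= threshold:
            if segments and i <= segments[-1][1]:
                # Overlapping or adjacent window: extend the current segment
                segments[-1] = (segments[-1][0], i + width)
            else:
                segments.append((i, i + width))
    return segments


# Noiseless toy profile (hypothetical) with a 5-probe loss starting at probe 20
lrr = [0.0] * 20 + [-0.6] * 5 + [0.0] * 20
print(detect_segments(lrr, width=5, threshold=1.2))
```

The template width sets the CNV size the filter is most sensitive to, so detecting CNVs at several scales would mean running a bank of templates of different widths.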
Affiliation(s)
- Catherine Stamoulis
- Department of Radiology, Harvard School of Public Health, Boston, MA 02115, USA.
22
Eckel-Passow JE, Atkinson EJ, Maharjan S, Kardia SLR, de Andrade M. Software comparison for evaluating genomic copy number variation for Affymetrix 6.0 SNP array platform. BMC Bioinformatics 2011; 12:220. [PMID: 21627824 PMCID: PMC3146450 DOI: 10.1186/1471-2105-12-220] [Received: 11/22/2010] [Accepted: 05/31/2011]
Abstract
BACKGROUND Copy number data are routinely being extracted from genome-wide association study chips using a variety of software. We empirically evaluated and compared four freely available software packages designed for Affymetrix SNP chips to estimate copy number: Affymetrix Power Tools (APT), Aroma.Affymetrix, PennCNV and CRLMM. Our evaluation used 1,418 GENOA samples that were genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. We compared bias and variance in the locus-level copy number data, the concordance among regions of copy number gains/deletions and the false-positive rate among deleted segments. RESULTS APT had median locus-level copy numbers closest to a value of two, whereas PennCNV and Aroma.Affymetrix had the smallest variability associated with the median copy number. Of those evaluated, only PennCNV provides copy-number-specific quality-control metrics, and it identified 136 poor CNV samples. Regions of copy number variation (CNV) were detected using the hidden Markov models provided within PennCNV and CRLMM/VanillaIce. PennCNV detected more CNVs than CRLMM/VanillaIce; the median number of CNVs detected per sample was 39 and 30, respectively. PennCNV detected most of the regions that CRLMM/VanillaIce did, as well as additional CNV regions. The median concordance between PennCNV and CRLMM/VanillaIce was 47.9% for duplications and 51.5% for deletions. The estimated false-positive rate associated with deletions was similar for PennCNV and CRLMM/VanillaIce. CONCLUSIONS If the objective is to perform statistical tests on the locus-level copy number data, our empirical results suggest that PennCNV or Aroma.Affymetrix is optimal. If the objective is to perform statistical tests on the summarized segmented data, then PennCNV would be preferred over CRLMM/VanillaIce. Specifically, PennCNV allows the analyst to estimate locus-level copy number, perform segmentation and evaluate CNV-specific quality-control metrics within a single software package. PennCNV has relatively small bias and small variability, and it detects more regions while maintaining an estimated false-positive rate similar to that of CRLMM/VanillaIce. More generally, we advocate that software developers provide guidance on evaluating and choosing optimal settings in order to obtain optimal results for an individual dataset. Until such guidance exists, we recommend trying multiple algorithms, evaluating concordance/discordance and subsequently considering the union of regions for downstream association tests.
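Both PennCNV and CRLMM/VanillaIce segment intensities with hidden Markov models. The toy three-state Viterbi decoder below illustrates the general technique only: it uses log R ratio alone, and its state means, noise level and transition probabilities are hypothetical values. PennCNV's own model is richer (for example, it also uses B allele frequency).

```python
import math
from typing import List

STATES = ["deletion", "normal", "duplication"]
MEANS = [-0.5, 0.0, 0.3]       # hypothetical mean log R ratio for each state
SD = 0.15                      # hypothetical noise standard deviation, shared by all states
STAY = 0.99                    # hypothetical probability of remaining in the same state
START = [0.005, 0.99, 0.005]   # hypothetical initial state probabilities


def log_gauss(x: float, mu: float, sd: float) -> float:
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def viterbi(lrr: List[float]) -> List[str]:
    """Most likely copy-number state path for a sequence of log R ratios."""
    k = len(STATES)
    switch = (1 - STAY) / (k - 1)
    log_trans = [[math.log(STAY if i == j else switch) for j in range(k)] for i in range(k)]
    score = [math.log(START[s]) + log_gauss(lrr[0], MEANS[s], SD) for s in range(k)]
    backpointers = []
    for x in lrr[1:]:
        prev, score, pointers = score, [], []
        for j in range(k):
            # Best predecessor state for state j at this probe
            best = max(range(k), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j] + log_gauss(x, MEANS[j], SD))
        backpointers.append(pointers)
    # Trace the best path backwards from the highest-scoring final state
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]


profile = [0.0] * 10 + [-0.5] * 10 + [0.0] * 10
print(viterbi(profile))
```

The high self-transition probability is what makes the decoder prefer contiguous segments over isolated noisy probes; lowering `STAY` trades specificity for sensitivity to short CNVs.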
Affiliation(s)
- Jeanette E Eckel-Passow
- Division of Biomedical Statistics and Informatics, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA.
23
Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, Getz G. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 2011; 12:R41. [PMID: 21527027 PMCID: PMC3218867 DOI: 10.1186/gb-2011-12-4-r41] [Received: 08/18/2010] [Revised: 02/14/2011] [Accepted: 04/28/2011]
Abstract
We describe methods with enhanced power and specificity to identify genes targeted by somatic copy-number alterations (SCNAs) that drive cancer growth. By separating SCNA profiles into underlying arm-level and focal alterations, we improve the estimation of background rates for each category. We additionally describe a probabilistic method for defining the boundaries of selected-for SCNA regions with user-defined confidence. Here we detail this revised computational approach, GISTIC2.0, and validate its performance in real and simulated datasets.
Affiliation(s)
- Craig H Mermel
- Cancer Program, The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA
24
Clevert DA, Mitterecker A, Mayr A, Klambauer G, Tuefferd M, De Bondt A, Talloen W, Göhlmann H, Hochreiter S. cn.FARMS: a latent variable model to detect copy number variations in microarray data with a low false discovery rate. Nucleic Acids Res 2011; 39:e79. [PMID: 21486749 PMCID: PMC3130288 DOI: 10.1093/nar/gkr197]
Abstract
Cost-effective oligonucleotide genotyping arrays such as the Affymetrix SNP 6.0 are still the predominant technique for measuring DNA copy number variations (CNVs). However, CNV detection methods for microarrays overestimate both the number and the size of CNV regions and, consequently, suffer from a high false discovery rate (FDR). A high FDR means that many CNVs are wrongly detected and therefore cannot be associated with a disease in a clinical study; yet correction for multiple testing still takes them into account and thereby decreases the study's discovery power. To control the FDR, we propose a probabilistic latent variable model, 'cn.FARMS', which is optimized by a Bayesian maximum a posteriori approach. cn.FARMS controls the FDR through the information gain of the posterior over the prior. The prior represents the null hypothesis of copy number 2 for all samples, from which the posterior can deviate only given strong and consistent signals in the data. On HapMap data, cn.FARMS clearly outperformed the two most prevalent methods with respect to sensitivity and FDR. The software cn.FARMS is publicly available as an R package at http://www.bioinf.jku.at/software/cnfarms/cnfarms.html.
Affiliation(s)
- Djork-Arné Clevert
- Institute of Bioinformatics, Johannes Kepler University Linz, Linz, Austria
25
Walter V, Nobel AB, Wright FA. DiNAMIC: a method to identify recurrent DNA copy number aberrations in tumors. Bioinformatics 2011; 27:678-85. [PMID: 21183584 PMCID: PMC3042182 DOI: 10.1093/bioinformatics/btq717] [Received: 09/30/2010] [Revised: 12/06/2010] [Accepted: 12/21/2010]
Abstract
MOTIVATION DNA copy number gains and losses are commonly found in tumor tissue, and some of these aberrations play a role in tumorigenesis and tumor development. Although high-resolution DNA copy number data can be obtained using array-based techniques, no single method is widely used to distinguish between recurrent and sporadic copy number aberrations. RESULTS Here we introduce Discovering Copy Number Aberrations Manifested In Cancer (DiNAMIC), a novel method for assessing the statistical significance of recurrent copy number aberrations. In contrast to competing procedures, the testing procedure underlying DiNAMIC is carefully motivated and employs a novel cyclic permutation scheme. Extensive simulation studies show that DiNAMIC controls false positive discoveries in a variety of realistic scenarios. We use DiNAMIC to analyze two publicly available tumor datasets, and our results show that DiNAMIC detects multiple loci that have biological relevance. AVAILABILITY Source code implemented in R, as well as text files containing examples and sample datasets, are available at http://www.bios.unc.edu/research/genomic_software/DiNAMIC.
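A simplified sketch of a cyclic permutation test for recurrence, assuming each tumor profile is a list of per-marker aberration indicators (1.0 for a gain, 0.0 otherwise) along one chromosome. Each permutation rotates every profile by an independent random offset, which preserves the size of each aberration while breaking its alignment across samples. The maximum across-sample mean used as the summary statistic, the function names and the toy data are assumptions and may differ from DiNAMIC's exact procedure.

```python
import random
from typing import List, Sequence


def max_mean(profiles: Sequence[Sequence[float]]) -> float:
    """Largest across-sample mean at any marker (the summary statistic assumed here)."""
    n_samples = len(profiles)
    n_markers = len(profiles[0])
    return max(sum(p[m] for p in profiles) / n_samples for m in range(n_markers))


def cyclic_shift(profile: Sequence[float], offset: int) -> List[float]:
    """Rotate a profile so that the marker at `offset` becomes the first marker."""
    return list(profile[offset:]) + list(profile[:offset])


def cyclic_permutation_pvalue(profiles: Sequence[Sequence[float]],
                              n_perm: int = 1000, seed: int = 0) -> float:
    """Permutation p-value for the observed maximum across-sample mean."""
    rng = random.Random(seed)
    observed = max_mean(profiles)
    n_markers = len(profiles[0])
    exceed = 0
    for _ in range(n_perm):
        # Rotating each sample independently keeps aberration sizes but breaks alignment
        shifted = [cyclic_shift(p, rng.randrange(n_markers)) for p in profiles]
        if max_mean(shifted) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: 10 samples x 50 markers
recurrent = [[1.0 if 20 <= m < 25 else 0.0 for m in range(50)] for _ in range(10)]
sporadic = [[1.0 if 5 * k <= m < 5 * k + 5 else 0.0 for m in range(50)] for k in range(10)]
print(cyclic_permutation_pvalue(recurrent), cyclic_permutation_pvalue(sporadic))
```

Shared gains at the same markers yield a small p-value, while gains scattered across different markers do not, even though every sample carries an aberration of the same size in both toy datasets.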
Affiliation(s)
- Vonn Walter
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
26
Zhang D, Qian Y, Akula N, Alliey-Rodriguez N, Tang J, Gershon ES, Liu C. Accuracy of CNV Detection from GWAS Data. PLoS One 2011; 6:e14511. [PMID: 21249187 PMCID: PMC3020939 DOI: 10.1371/journal.pone.0014511] [Received: 06/22/2010] [Accepted: 12/19/2010]
Abstract
Several computer programs are available for detecting copy number variants (CNVs) using genome-wide SNP arrays. We evaluated the performance of four CNV detection software suites (Birdsuite, Partek, HelixTree and PennCNV-Affy) in identifying both rare and common CNVs. Each program's performance was assessed in two ways. The first was its recovery rate, i.e., its ability to call 893 CNVs previously identified in eight HapMap samples by paired-end sequencing of whole-genome fosmid clones, and 51,440 CNVs identified in 90 HapMap CEU samples by array comparative genomic hybridization (aCGH) followed by validation procedures. The second was its performance in calling rare and common CNVs in the Bipolar Genome Study (BiGS) data set (1001 bipolar cases and 1033 controls, all of European ancestry), as measured by the Affymetrix SNP 6.0 array. Accuracy in calling rare CNVs was assessed by positive predictive value, based on the proportion of rare CNVs validated by quantitative real-time PCR (qPCR), while accuracy in calling common CNVs was assessed by false positive/false negative rates based on qPCR validation results from a subset of common CNVs. Birdsuite recovered the highest percentages of known HapMap CNVs containing >20 markers in two reference CNV datasets. The recovery rate increased with decreasing CNV frequency. In the tested rare CNV data, Birdsuite and Partek had higher positive predictive values than the other software suites. In a test of three common CNVs in the BiGS dataset, Birdsuite's calls were 98.8% consistent with qPCR quantification in one CNV region, but the other two regions showed unacceptably low accuracy. We found relatively poor consistency between the two “gold standards”, the sequence data of Kidd et al. and the aCGH data of Conrad et al. Algorithms for calling CNVs, especially common ones, need substantial improvement, and a “gold standard” for detection of CNVs remains to be established.
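The two evaluation measures can be sketched as follows. Recovery rate is computed here as the fraction of reference CNVs matched by at least one call at 50% reciprocal overlap; that matching criterion, the function names and the toy intervals are assumptions, since the abstract does not state the criterion used. Positive predictive value is simply the fraction of tested calls that an independent assay such as qPCR confirmed.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def reciprocal_overlap(a: Interval, b: Interval, min_fraction: float = 0.5) -> bool:
    """True if a and b overlap by at least min_fraction of each interval's length."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap / (a[2] - a[1]) >= min_fraction
            and overlap / (b[2] - b[1]) >= min_fraction)


def recovery_rate(reference: List[Interval], calls: List[Interval],
                  min_fraction: float = 0.5) -> float:
    """Fraction of reference CNVs matched by at least one call (assumed criterion)."""
    recovered = sum(any(reciprocal_overlap(r, c, min_fraction) for c in calls)
                    for r in reference)
    return recovered / len(reference)


def positive_predictive_value(n_validated: int, n_tested: int) -> float:
    """Fraction of tested calls confirmed by an independent assay."""
    return n_validated / n_tested


# Hypothetical reference CNVs and calls
reference = [("chr1", 100, 200), ("chr1", 1000, 1100), ("chr2", 500, 700)]
calls = [("chr1", 120, 210), ("chr2", 650, 900)]
print(recovery_rate(reference, calls))
print(positive_predictive_value(8, 10))
```

The third reference CNV overlaps a call but by only a quarter of its length, so it is not counted as recovered; the chosen overlap threshold can change recovery rates substantially.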
Affiliation(s)
- Dandan Zhang
- Department of Pathology, School of Medicine, Zhejiang University, Hangzhou, People's Republic of China
- Department of Psychiatry and Behavioral Neuroscience, The University of Chicago, Chicago, Illinois, United States of America
- Yudong Qian
- Department of Psychiatry and Behavioral Neuroscience, The University of Chicago, Chicago, Illinois, United States of America
- Nirmala Akula
- Intramural Research Programs, The National Institute of Mental Health, Bethesda, Maryland, United States of America
- Ney Alliey-Rodriguez
- Department of Psychiatry and Behavioral Neuroscience, The University of Chicago, Chicago, Illinois, United States of America
- Jinsong Tang
- Institute of Mental Health, The Second Xiangya Hospital, Central South University, Changsha, People's Republic of China
- Elliot S. Gershon
- Department of Psychiatry and Behavioral Neuroscience, The University of Chicago, Chicago, Illinois, United States of America
- Chunyu Liu
- Department of Psychiatry and Behavioral Neuroscience, The University of Chicago, Chicago, Illinois, United States of America
27
Chang PL, Dilkes BP, McMahon M, Comai L, Nuzhdin SV. Homoeolog-specific retention and use in allotetraploid Arabidopsis suecica depends on parent of origin and network partners. Genome Biol 2010; 11:R125. [PMID: 21182768 PMCID: PMC3046485 DOI: 10.1186/gb-2010-11-12-r125] [Received: 07/31/2010] [Revised: 11/06/2010] [Accepted: 12/23/2010]
Abstract
Background Allotetraploids carry pairs of diverged homoeologs for most genes. With the genome doubled in size, the number of putative interactions is enormous. This poses challenges for coordinating the two disparate genomes, and creates opportunities by enhancing phenotypic variation. New combinations of alleles co-adapt and respond to new environmental pressures. Three stages of the allopolyploidization process (parental species divergence, hybridization and genome duplication) have been well analyzed. The last stage, evolutionary adjustment, remains poorly understood. Results Homoeolog-specific retention and use were analyzed in Arabidopsis suecica (As), a species derived from A. thaliana (At) and A. arenosa (Aa) in a single event 12,000 to 300,000 years ago. We used 405,466 diagnostic features on tiling microarrays to recognize At and Aa contributions to the As genome and transcriptome: 324 genes lacked Aa contributions and 614 genes lacked At contributions within As. In leaf tissues, 3,458 genes preferentially expressed At homoeologs while 4,150 favored Aa homoeologs. These patterns were validated by resequencing. Genes with preferential use of Aa homoeologs were enriched for expression functions, consistent with the dominance of Aa transcription. Heterologous networks (mixed from At and Aa transcripts) were underrepresented. Conclusions Thousands of deleted and silenced homoeologs in the genome of As were identified. Since heterologous networks may be compromised by interspecies incompatibilities, these networks evolve co-biases, expressing either only Aa or only At homoeologs. This progressive change towards predominantly pure parental networks might contribute to phenotypic variability and plasticity, and enable the species to exploit a larger range of environments.
Affiliation(s)
- Peter L Chang
- Molecular and Computational Biology, University of Southern California, 1050 Childs Way, RRI 201, Los Angeles, CA 90089-2910, USA.
28
Siggberg L, Ala-Mello S, Jaakkola E, Kuusinen E, Schuit R, Kohlhase J, Böhm D, Ignatius J, Knuutila S. Array CGH in molecular diagnosis of mental retardation - A study of 150 Finnish patients. Am J Med Genet A 2010; 152A:1398-410. [PMID: 20503314 DOI: 10.1002/ajmg.a.33402]
Abstract
We report the results of an array comparative genomic hybridization (array CGH) study of 150 karyotypically normal Finnish patients with idiopathic mental retardation and/or dysmorphic features and/or malformations. Using high-resolution microarray analysis, we sought to identify clinically relevant microdeletions and microduplications in these patients. The results were confirmed using other methods and compared with findings reported in recent publications and internet databases. Small aberrations of potential clinical significance were found in 28 (18.6%) of the 150 patients. Eight of the identified aberrations are known to cause syndromes, 4 affected the X chromosome in males, 4 were familial, and 13 have yet to be associated with a phenotype. This study demonstrates the benefits of array CGH in the clinical diagnosis of developmental disorders. Furthermore, our findings provide evidence of new syndromes.
Affiliation(s)
- Linda Siggberg
- Department of Pathology, Haartman Institute, University of Helsinki, Helsinki, Finland.
29
Wagner JR, Ge B, Pokholok D, Gunderson KL, Pastinen T, Blanchette M. Computational analysis of whole-genome differential allelic expression data in human. PLoS Comput Biol 2010; 6:e1000849. [PMID: 20628616 PMCID: PMC2900287 DOI: 10.1371/journal.pcbi.1000849] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 12/15/2009] [Accepted: 06/02/2010] [Indexed: 12/16/2022]
Abstract
Allelic imbalance (AI) is a phenomenon in which the two alleles of a given gene are expressed at different levels in a given cell, either because of epigenetic inactivation of one of the two alleles or because of genetic variation in regulatory regions. Recently, Ge et al. described the use of genotyping arrays to assay AI at high resolution (∼750,000 SNPs across the autosomes). In this paper, we investigate computational approaches to analyze these data and identify genomic regions with AI in an unbiased and statistically robust manner. We propose two families of approaches: (i) a statistical approach based on z-score computations, and (ii) a family of machine learning approaches based on Hidden Markov Models. Each method is evaluated using previously published experimental data sets as well as permutation testing. When applied to whole-genome data from 53 HapMap samples, our approaches reveal that allelic imbalance is widespread (most expressed genes show evidence of AI in at least one of our 53 samples) and that most AI regions in a given individual are also found in at least a few other individuals. While many AI regions identified in the genome correspond to known protein-coding transcripts, others overlap recently discovered long non-coding RNAs. We also observe that genomic regions with AI include not only complete transcripts with consistent differential expression levels, but also more complex patterns of allelic expression, such as alternative promoters and alternative 3′ ends. The approaches developed shed light on the incidence and mechanisms of allelic expression, and should also help to map the genetic causes of allelic expression and to identify cases where this variation may be linked to disease.
Measuring gene expression, and searching the genome for the regulatory regions responsible for differences in expression levels, is one of the key research paths used to identify disease-causing genes and to explain differences between healthy individuals. Typically, experiments have measured and compared gene expression in multiple individuals and used this information to try to map the responsible regulatory regions. However, differences in environment between individuals can cause differences in gene expression that are unrelated to the underlying regulatory sequence. New genotyping technologies make it possible to measure the expression of both copies of a gene at loci that are heterozygous within an individual. This acts as an internal control: environmental factors should affect the expression of both copies of a gene roughly equally, so differences in expression are more likely to be explained by differences in the regulatory regions specific to each copy of the gene. Differences between regulatory regions are expected to lead to differences in the expression of the two copies (the two alleles) of a gene, a phenomenon known as allelic imbalance. We describe a set of signal processing methods for the reliable detection of differential allelic expression across the genome.
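The z-score family of approaches mentioned above can be illustrated with a minimal per-SNP sketch. It assumes, hypothetically, that each heterozygous SNP has allele-A and allele-B signal intensities from both cDNA and genomic DNA, and that the genomic DNA ratios reflect a balanced 1:1 state. The function names, the toy intensities and the |z| ≥ 3 threshold are illustrative choices, not the authors' published method.

```python
import math
from statistics import mean, stdev


def log_ratio(a: float, b: float) -> float:
    """log2 ratio of allele-A to allele-B signal at one heterozygous SNP."""
    return math.log2(a / b)


def ai_zscores(cdna_pairs, gdna_pairs):
    """Z-score each SNP's cDNA allelic ratio against the spread of
    genomic-DNA ratios, which are assumed to be balanced (1:1)."""
    ref = [log_ratio(a, b) for a, b in gdna_pairs]
    mu, sd = mean(ref), stdev(ref)
    return [(log_ratio(a, b) - mu) / sd for a, b in cdna_pairs]


def call_imbalance(zscores, threshold=3.0):
    """Flag SNPs whose |z| meets a hypothetical significance threshold."""
    return [abs(z) >= threshold for z in zscores]


# Illustrative intensities only (not data from the study).
gdna = [(100, 100), (110, 100), (100, 110), (105, 100), (100, 105)]
cdna = [(400, 100), (100, 100)]
z = ai_zscores(cdna, gdna)
print(call_imbalance(z))  # [True, False]
```

Using genomic DNA as the reference mirrors the internal-control argument: any systematic probe bias shows up in both cDNA and gDNA ratios, so only the excess in cDNA is scored. A region-level method would aggregate such per-SNP evidence, which this sketch does not attempt.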
Affiliation(s)
- James R. Wagner
- School of Computer Science, McGill University, Montreal, Quebec, Canada
- Bing Ge
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- Tomi Pastinen
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- Department of Human and Medical Genetics, McGill University Health Centre, McGill University, Montreal, Quebec, Canada
- Mathieu Blanchette
- School of Computer Science, McGill University, Montreal, Quebec, Canada
30
Li F, Zhou X, Huang W, Chang CC, Wong STC. Conditional random pattern model for copy number aberration detection. BMC Bioinformatics 2010; 11:200. [PMID: 20412592 PMCID: PMC2876128 DOI: 10.1186/1471-2105-11-200] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 05/30/2009] [Accepted: 04/22/2010] [Indexed: 12/12/2022]
Abstract
Background DNA copy number aberrations (CNAs) play an important role in the pathogenesis of tumors and other diseases. For example, CNAs may inactivate tumor suppressor genes (anti-oncogenes) and activate oncogenes, which can cause certain types of cancer. High-density single nucleotide polymorphism (SNP) array data are widely used for CNA detection. However, detecting CNAs automatically is nontrivial because the signals obtained from high-density SNP arrays often have a low signal-to-noise ratio (SNR), which may be caused by whole-genome amplification, mixtures of normal and tumor cells, experimental noise or other technical limitations. As the SNR falls, many false CNA regions are detected and true CNA regions are missed. More sophisticated statistical models are therefore needed to make CNA detection from low-SNR signals more robust and reliable. Results This paper presents a conditional random pattern (CRP) model for CNA detection that exploits many contextual cues to suppress noise and improve detection accuracy. Both simulated and real data are used to evaluate the proposed model, and the validation results show that, compared with a number of widely used software packages, the CRP model is more robust and reliable in the presence of noise for CNA detection from high-density SNP array data. Conclusions The proposed CRP model can effectively detect CNA regions in the presence of noise.
Affiliation(s)
- Fuhai Li
- Center for Bioengineering and Informatics, Department of Radiology, The Methodist Hospital Research Institute, Weill Cornell Medical College, Houston, TX 77030, USA
31
Abstract
High-throughput genotyping technologies have become popular in studies that aim to reveal the genetics behind polygenic traits such as complex diseases and the diverse responses to some drug treatments. These technologies rely on bioinformatics tools to define strategies, analyze data, and estimate the final associations between genetic markers and traits. The strategy chosen for an association study depends on its efficiency and cost. Efficiency depends on the assumed allele frequencies of the polymorphisms and on their linkage disequilibrium with putative causal alleles. Statistically significant markers (single mutations or haplotypes) associated with a human disorder should be validated and their biological function elucidated. The aim of this chapter is to present a subset of bioinformatics tools for haplotype inference, tag SNP selection, and genome-wide association studies using a high-throughput SNP data set.
Affiliation(s)
- Ana M Aransay
- Functional Genomics Unit, Parque Tecnológico de Bizkaia, Derio, Spain
32
Abstract
The analysis of cancer genomes has benefited from technological advances that enable data to be generated on an unprecedented scale, describing a tumour genome's sequence and composition at increasingly high resolution and at decreasing cost. This progress is likely to accelerate over the coming years as next-generation sequencing approaches are applied to the study of cancer genomes, in tandem with large-scale efforts such as the Cancer Genome Atlas and the recently announced International Cancer Genome Consortium, which complement established efforts such as the Sanger Institute Cancer Genome Project. This presents challenges for the cancer researcher and the research community in general, both in analysing the data generated in one's own projects and in coordinating and interrogating publicly available data. This review aims to provide a brief overview of some of the main informatics resources currently available and their use, and of some of the informatics approaches that may be applied in the study of cancer genomes.
Affiliation(s)
- Ian P Barrett
- Cancer Bioscience, AstraZeneca, Macclesfield, Cheshire, UK
33
Friedman J, Adam S, Arbour L, Armstrong L, Baross A, Birch P, Boerkoel C, Chan S, Chai D, Delaney AD, Flibotte S, Gibson WT, Langlois S, Lemyre E, Li HI, MacLeod P, Mathers J, Michaud JL, McGillivray BC, Patel MS, Qian H, Rouleau GA, Van Allen MI, Yong SL, Zahir FR, Eydoux P, Marra MA. Detection of pathogenic copy number variants in children with idiopathic intellectual disability using 500 K SNP array genomic hybridization. BMC Genomics 2009; 10:526. [PMID: 19917086 PMCID: PMC2781027 DOI: 10.1186/1471-2164-10-526] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 04/15/2009] [Accepted: 11/16/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Array genomic hybridization is being used clinically to detect pathogenic copy number variants in children with intellectual disability and other birth defects. However, there is no agreement regarding the kind of array, the distribution of probes across the genome, or the resolution that is most appropriate for clinical use. RESULTS We performed 500 K Affymetrix GeneChip array genomic hybridization in 100 idiopathic intellectual disability trios, each comprising a child with intellectual disability of unknown cause and both unaffected parents. We found pathogenic genomic imbalance in 16 of these 100 individuals with idiopathic intellectual disability. In comparison, we had found pathogenic genomic imbalance in 11 of 100 children with idiopathic intellectual disability in a previous cohort studied by 100 K GeneChip array genomic hybridization. Among 54 intellectual disability trios selected from the previous cohort and re-tested with 500 K GeneChip array genomic hybridization, we identified all 10 previously detected pathogenic genomic alterations and at least one additional pathogenic copy number variant that had not been detected with 100 K GeneChip array genomic hybridization. Many benign copy number variants, including one that was de novo, were also detected with 500 K array genomic hybridization, but it was possible to distinguish benign from pathogenic copy number variants with confidence in all but 3 (1.9%) of the 154 intellectual disability trios studied. CONCLUSION Affymetrix GeneChip 500 K array genomic hybridization detected pathogenic genomic imbalance in 10 of 10 patients with idiopathic developmental disability in whom 100 K GeneChip array genomic hybridization had found genomic imbalance, 1 of 44 patients in whom 100 K GeneChip array genomic hybridization had found no abnormality, and 16 of 100 patients who had not previously been tested.
Effective clinical interpretation of these studies requires considerable skill and experience.
Affiliation(s)
- JM Friedman
- Department of Medical Genetics, University of British Columbia, Vancouver, Canada.
34
Greenman CD, Bignell G, Butler A, Edkins S, Hinton J, Beare D, Swamy S, Santarius T, Chen L, Widaa S, Futreal PA, Stratton MR. PICNIC: an algorithm to predict absolute allelic copy number variation with microarray cancer data. Biostatistics 2009; 11:164-75. [PMID: 19837654 PMCID: PMC2800165 DOI: 10.1093/biostatistics/kxp045] [Citation(s) in RCA: 168] [Impact Index Per Article: 10.5] [Indexed: 02/03/2023]
Abstract
High-throughput oligonucleotide microarrays are commonly used to investigate genetic disease, including cancer. The algorithms used to extract genotypes and copy number variation work best for the diploid genomes usually associated with inherited disease. Cancer genomes, however, are aneuploid, which leads to systematic errors when these techniques are applied. We introduce a preprocessing transformation and a hidden Markov model algorithm tailored to cancer. Together they produce genotype classification, specification of regions of loss of heterozygosity, and absolute allelic copy number segmentation. Accurate prediction is demonstrated with a combination of independent experimental techniques. These methods are exemplified with Affymetrix Genome-Wide SNP 6.0 data from 755 cancer cell lines, enabling inference on a number of features of biological interest. These data and the coded algorithm are freely available for download.
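To illustrate what hidden Markov model segmentation of array data involves, here is a minimal sketch. It is not PICNIC, which models allelic copy number in aneuploid cancer genomes after its own preprocessing; it is a generic three-state Gaussian HMM decoded with the Viterbi algorithm. The state means are the idealised log R ratios log2(CN/2), while the noise level (sd = 0.2), the self-transition probability (0.99) and all names are assumptions chosen for illustration.

```python
import math

STATES = [1, 2, 3]  # hypothetical copy-number states: loss, normal, gain
MEANS = {cn: math.log2(cn / 2) for cn in STATES}  # idealised log R ratio per state


def log_gauss(x, mu, sd):
    """Log density of a normal distribution N(mu, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def viterbi_copy_number(lrr, sd=0.2, p_stay=0.99):
    """Most likely copy-number path for a sequence of log R ratios."""
    n = len(STATES)
    log_stay = math.log(p_stay)
    log_move = math.log((1 - p_stay) / (n - 1))
    # Uniform initial state probabilities.
    score = [math.log(1 / n) + log_gauss(lrr[0], MEANS[s], sd) for s in STATES]
    backptrs = []
    for x in lrr[1:]:
        new_score, ptr = [], []
        for j, s in enumerate(STATES):
            cands = [score[i] + (log_stay if i == j else log_move) for i in range(n)]
            best = max(range(n), key=lambda i: cands[i])
            ptr.append(best)
            new_score.append(cands[best] + log_gauss(x, MEANS[s], sd))
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [STATES[i] for i in path]


# Simulated, noise-free example: a deletion followed by a gain.
lrr = [0.0] * 5 + [-1.0] * 5 + [0.0] * 5 + [0.58] * 5 + [0.0] * 5
print(viterbi_copy_number(lrr))
```

The high self-transition probability is what suppresses isolated noisy probes: a state change is only called when several consecutive probes support it strongly enough to outweigh the transition penalty.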
Affiliation(s)
- Chris D Greenman
- Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
35
Array CGH in patients with learning disability (mental retardation) and congenital anomalies: updated systematic review and meta-analysis of 19 studies and 13,926 subjects. Genet Med 2009; 11:139-46. [PMID: 19367186 DOI: 10.1097/gim.0b013e318194ee8f] [Citation(s) in RCA: 145] [Impact Index Per Article: 9.1] [Indexed: 01/03/2023]
Abstract
Array-based comparative genomic hybridization is being increasingly used in patients with learning disability (mental retardation) and congenital anomalies. In this article, we update our previous meta-analysis evaluating the diagnostic and false-positive yields of this technology. An updated systematic review and meta-analysis was conducted investigating patients with learning disability and congenital anomalies in whom conventional cytogenetic analyses have proven negative. Nineteen studies (13,926 patients) were included of which 12 studies (13,464 patients) were published since our previous analysis. The overall diagnostic yield of causal abnormalities was 10% (95% confidence interval: 8-12%). The overall number needed to test to identify an extra causal abnormality was 10 (95% confidence interval: 8-13). The overall false-positive yield of noncausal abnormalities was 7% (95% confidence interval: 5-10%). This updated meta-analysis provides new evidence to support the use of array-based comparative genomic hybridization in investigating patients with learning disability and congenital anomalies in whom conventional cytogenetic tests have proven negative. However, given that this technology also identifies false positives at a similar rate to causal variants, caution in clinical practice should be advised.
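The number needed to test is, to a first approximation, the reciprocal of the diagnostic yield. The short sketch below applies that relationship to the pooled yield reported above as a consistency check; the meta-analysis may have pooled the number needed to test directly rather than inverting the yield, so this is an approximation, and the function name is hypothetical.

```python
def number_needed_to_test(diagnostic_yield: float) -> float:
    """Patients tested per additional causal finding, approximated as 1 / yield."""
    if not 0 < diagnostic_yield <= 1:
        raise ValueError("diagnostic yield must be in (0, 1]")
    return 1.0 / diagnostic_yield


pooled_yield = 0.10        # overall diagnostic yield reported in the abstract
yield_ci = (0.08, 0.12)    # its 95% confidence interval

nnt = number_needed_to_test(pooled_yield)
# A higher yield means fewer tests, so the interval bounds swap.
nnt_ci = (number_needed_to_test(yield_ci[1]), number_needed_to_test(yield_ci[0]))
print(f"NNT = {nnt:.1f} (approx. {nnt_ci[0]:.1f} to {nnt_ci[1]:.1f})")
# prints "NNT = 10.0 (approx. 8.3 to 12.5)"
```

The inverted interval (about 8.3 to 12.5) is in line with the reported 8 to 13.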
36
McMullan DJ, Bonin M, Hehir-Kwa JY, de Vries BBA, Dufke A, Rattenberry E, Steehouwer M, Moruz L, Pfundt R, de Leeuw N, Riess A, Altug-Teber O, Enders H, Singer S, Grasshoff U, Walter M, Walker JM, Lamb CV, Davison EV, Brueton L, Riess O, Veltman JA. Molecular karyotyping of patients with unexplained mental retardation by SNP arrays: a multicenter study. Hum Mutat 2009; 30:1082-92. [PMID: 19388127 DOI: 10.1002/humu.21015] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Indexed: 12/08/2022]
Abstract
Genomic microarrays have been implemented in the diagnosis of patients with unexplained mental retardation. This method, although revolutionizing cytogenetics, is still limited to the detection of rare de novo copy number variants (CNVs). Genome-wide single nucleotide polymorphism (SNP) microarrays provide high-resolution genotype as well as CNV information in a single experiment. We hypothesize that the widespread use of these microarray platforms can be exploited to greatly improve our understanding of the genetic causes of mental retardation and many other common disorders, while already providing a robust platform for routine diagnostics. Here we report a detailed validation of Affymetrix 500k SNP microarrays for the detection of CNVs associated with mental retardation. After this validation, we applied the same platform in a multicenter study to test a total of 120 patients with unexplained mental retardation and their parents. Rare de novo CNVs were identified in 15% of cases, showing the importance of this approach in daily clinical practice. In addition, much more genomic variation was observed in these patients as well as in their parents. We provide all of these data to the scientific community to jointly enhance our understanding of these genomic variants and their potential role in this common disorder.
Affiliation(s)
- Dominic J McMullan
- West Midlands Regional Genetics Laboratory and Clinical Genetics Unit, Birmingham Women's Hospital, Birmingham, United Kingdom
37
Parisi MA, Zayed H, Slavotinek AM, Rutledge JC. Congenital diaphragmatic hernia and microtia in a newborn with mycophenolate mofetil (MMF) exposure: phenocopy for Fryns syndrome or broad spectrum of teratogenic effects? Am J Med Genet A 2009; 149A:1237-40. [PMID: 19449404 DOI: 10.1002/ajmg.a.32684] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/07/2022]
Abstract
A newborn female infant born to a woman on immunosuppressive medications including mycophenolate mofetil (MMF) for a renal graft secondary to lupus nephritis presented with congenital diaphragmatic hernia (CDH) and additional findings of microtia, esophageal atresia with tracheoesophageal fistula, cleft palate, congenital heart defect, digital anomalies, and dysmorphic facial features. Pulmonary hypoplasia resulted in death at day 2 of life. She was presumed to have Fryns syndrome based on diagnostic criteria established for this recessive disorder with prominent features including CDH, facial anomalies, and nail hypoplasia. In retrospect, this infant's findings are more likely the result of teratogenic exposure to MMF, as more recent data have emerged linking aural atresia, digital anomalies, and dysmorphic features to this drug. To date, this is the only human report of CDH in an infant with prenatal exposure to MMF, although the manufacturer's package insert alludes to animal studies with a broad spectrum of malformations, including CDH. Thus, a teratogenic exposure can mimic a known Mendelian genetic syndrome, and caution is urged in presuming a genetic etiology for infants with potential teratogenic exposure to relatively new drugs with limited published animal data.
Affiliation(s)
- Melissa A Parisi
- Department of Pediatrics, Seattle Children's Hospital, University of Washington, Seattle, Washington 98105, USA
38
Abstract
The last few years have seen major advances in common non-syndromic obesity research, much of it the result of genetic studies. This Review outlines the competing hypotheses about the mechanisms underlying the genetic and physiological basis of obesity, and then examines the recent explosion of genetic association studies that have yielded insights into obesity, both at the candidate gene level and the genome-wide level. With obesity genetics now entering the post-genome-wide association scan era, the obvious question is how to improve the results obtained so far using single nucleotide polymorphism markers and how to move successfully into the other areas of genomic variation that may be associated with common obesity.
39
Cancer gene discovery in mouse and man. Biochim Biophys Acta Rev Cancer 2009; 1796:140-61. [PMID: 19285540 PMCID: PMC2756404 DOI: 10.1016/j.bbcan.2009.03.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 01/02/2009] [Revised: 03/03/2009] [Accepted: 03/05/2009] [Indexed: 12/31/2022]
Abstract
The elucidation of the human and mouse genome sequence and developments in high-throughput genome analysis, and in computational tools, have made it possible to profile entire cancer genomes. In parallel with these advances mouse models of cancer have evolved into a powerful tool for cancer gene discovery. Here we discuss the approaches that may be used for cancer gene identification in both human and mouse and discuss how a cross-species 'oncogenomics' approach to cancer gene discovery represents a powerful strategy for finding genes that drive tumourigenesis.
40
Genome-wide copy-number-variation study identified a susceptibility gene, UGT2B17, for osteoporosis. Am J Hum Genet 2008; 83:663-74. [PMID: 18992858 DOI: 10.1016/j.ajhg.2008.10.006] [Citation(s) in RCA: 159] [Impact Index Per Article: 9.4] [Received: 08/06/2008] [Revised: 10/01/2008] [Accepted: 10/09/2008] [Indexed: 11/24/2022]
Abstract
Osteoporosis, a highly heritable disease, is characterized mainly by low bone-mineral density (BMD), poor bone geometry, and/or osteoporotic fractures (OF). Copy-number variation (CNV) has been shown to be associated with complex human diseases, but its contribution to osteoporosis has not yet been determined. We conducted case-control genome-wide CNV analyses, using the Affymetrix 500K Array Set, in 700 elderly Chinese individuals comprising 350 cases with homogeneous hip OF and 350 matched controls. We constructed a genomic map containing 727 CNV regions in Chinese individuals. We found that CNV 4q13.2 was strongly associated with OF (p = 2.0 × 10^-4, Bonferroni-corrected p = 0.02, odds ratio = 1.73). Validation experiments using PCR and electrophoresis, as well as real-time PCR, further identified a deletion variant of UGT2B17 in CNV 4q13.2. Importantly, the association between CNV of UGT2B17 and OF was successfully replicated in an independent Chinese sample containing 399 cases with hip OF and 400 controls. We further examined this CNV's relevance to major risk factors for OF (i.e., hip BMD and femoral-neck bone geometry) in both Chinese (689 subjects) and white (1000 subjects) samples and found consistently significant results (p = 5.0 × 10^-4 to 0.021). Because UGT2B17 encodes an enzyme that catabolizes steroid hormones, we measured serum testosterone and estradiol concentrations in 236 young Chinese males and assessed their UGT2B17 copy number. Subjects without UGT2B17 had significantly higher concentrations of testosterone and estradiol. Our findings suggest an important contribution of CNV of UGT2B17 to the pathogenesis of osteoporosis.
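The abstract reports a nominal p of 2.0 × 10^-4 and a Bonferroni-corrected p of 0.02, but not the number of tests. Assuming a plain Bonferroni correction, p_adj = min(1, m·p), the sketch below back-calculates the implied number of tests m. The result (about 100) is an inference from the two reported values, not a figure stated in the study, and the helper names are hypothetical.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


def implied_number_of_tests(p_raw: float, p_adjusted: float) -> float:
    """Back-calculate the number of tests implied by a Bonferroni adjustment."""
    return p_adjusted / p_raw


m = implied_number_of_tests(2.0e-4, 0.02)
print(round(m))  # 100
print(bonferroni(2.0e-4, round(m)))
```

Note that this implied count is much smaller than the 727 CNV regions in the map, which suggests (but does not confirm) that only a subset of regions was tested.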
41
Friedman JM. High-resolution array genomic hybridization in prenatal diagnosis. Prenat Diagn 2008; 29:20-8. [DOI: 10.1002/pd.2129] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Indexed: 12/17/2022]
42
Flibotte S, Moerman DG. Experimental analysis of oligonucleotide microarray design criteria to detect deletions by comparative genomic hybridization. BMC Genomics 2008; 9:497. [PMID: 18940006 PMCID: PMC2577661 DOI: 10.1186/1471-2164-9-497] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 07/21/2008] [Accepted: 10/21/2008] [Indexed: 12/31/2022]
Abstract
Background Microarray comparative genomic hybridization (CGH) is currently one of the most powerful techniques for measuring DNA copy number in large genomes. In humans, microarray CGH is widely used to assess copy number variants in healthy individuals and copy number aberrations associated with various diseases, syndromes and disease susceptibility. In model organisms such as Caenorhabditis elegans (C. elegans), the technique has been applied to detect mutations, primarily deletions, in strains of interest. Although various constraints on oligonucleotide properties have been suggested to minimize non-specific hybridization and improve data quality, few have been validated experimentally for CGH. For genomic regions where strict design filters would limit coverage, it would also be useful to quantify the expected loss in data quality associated with relaxed design criteria. Results We have quantified the effects of filtering on various oligonucleotide properties by measuring the resolving power for detecting deletions in the human and C. elegans genomes using NimbleGen microarrays. To detect a deletion in human DNA with the same statistical confidence as a deletion in C. elegans, approximately twice as many oligonucleotides typically need to be affected by the deletion. Surprisingly, the ability to detect deletions strongly depends on the oligonucleotide 15-mer count, defined as the sum of the genomic frequencies of all constituent 15-mers within the oligonucleotide. A similarity level above 80% to non-target sequences over the length of the probe produces significant cross-hybridization. We recommend the use of a fairly large melting temperature window of up to 10°C, the elimination of repeat sequences, the elimination of homopolymers longer than 5 nucleotides, and a threshold of -1 kcal/mol on the oligonucleotide self-folding energy.
We observed very little difference in data quality when varying the oligonucleotide length between 50 and 70 nucleotides, or even when using an isothermal design strategy. Conclusion We have determined experimentally the effects of varying several key oligonucleotide microarray design criteria for the detection of deletions in C. elegans and humans with NimbleGen's CGH technology. Our oligonucleotide design recommendations should be applicable to CGH analysis in most species.
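Two of the design criteria above, the 15-mer count and the homopolymer limit, are simple enough to sketch directly from their definitions. This is a minimal illustration under stated assumptions: it counts k-mers on the forward strand only (a real probe-design pipeline would likely also count reverse complements), the function names are hypothetical, and the demo uses k = 3 on a toy sequence so the numbers can be checked by hand.

```python
from collections import Counter


def kmer_counts(genome: str, k: int = 15) -> Counter:
    """Frequency of every k-mer in a genome sequence (forward strand only)."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def oligo_kmer_count(oligo: str, counts: Counter, k: int = 15) -> int:
    """Sum of genomic frequencies of all k-mers contained in the oligo."""
    return sum(counts[oligo[i:i + k]] for i in range(len(oligo) - k + 1))


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def passes_homopolymer_filter(seq: str, max_run: int = 5) -> bool:
    """True if no homopolymer is longer than max_run nucleotides."""
    return longest_homopolymer(seq) <= max_run


# Toy example with k = 3 so the numbers are easy to check by hand.
counts = kmer_counts("ACGTACGTAC", k=3)
print(oligo_kmer_count("ACGTA", counts, k=3))    # 6
print(passes_homopolymer_filter("AAACCCCCCGT"))  # False (run of 6 Cs)
```

A high k-mer count flags probes whose subsequences recur elsewhere in the genome and are therefore prone to cross-hybridization, which is consistent with the reported link between this measure and deletion detection.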
Affiliation(s)
- Stephane Flibotte
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, Canada.
43
Kang TW, Jeon YJ, Jang E, Kim HJ, Kim JH, Park JL, Lee S, Kim YS, Kim JY, Kim SY. Copy number variations (CNVs) identified in Korean individuals. BMC Genomics 2008; 9:492. [PMID: 18928558 PMCID: PMC2576253 DOI: 10.1186/1471-2164-9-492] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 06/24/2008] [Accepted: 10/18/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Copy number variations (CNVs) are deletions, insertions, duplications, and more complex variations ranging from 1 kb to sub-microscopic sizes. Recent advances in array technologies have enabled researchers to identify a number of CNVs in normal individuals. However, the identification of new CNVs has not yet reached saturation, and more CNVs from diverse populations remain to be discovered. RESULTS We identified 65 copy number variation regions (CNVRs) in 116 normal Korean individuals by analyzing Affymetrix 250 K Nsp whole-genome SNP data. Ten of these CNVRs were novel and not present in the Database of Genomic Variants (DGV). To increase the specificity of CNV detection, three algorithms (CNAG, dChip and GEMCA) were applied to the data set, and only regions recognized by at least two algorithms were identified as CNVs. Most CNVRs identified in the Korean population were rare (<1%), occurring just once among the 116 individuals. When CNVs from the Korean population were compared with CNVs from the three HapMap ethnic groups (African, European, and Asian), the Korean population showed the highest degree of overlap with the Asian population, as expected. However, the overlap was less than 40%, implying that more CNVs remain to be discovered in the Asian population as well as in other populations. The novel CNVRs from the Korean population were enriched for genes involved in regulation and development processes. CONCLUSION CNVs are recently recognized structural variations among individuals, and more CNVs need to be identified from diverse populations. Until now, CNVs from Asian populations have been studied less than those from European or American populations. In this regard, our study of CNVs from the Korean population will contribute to the full cataloguing of structural variation among diverse human populations.
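The "at least two of three algorithms" rule can be implemented in several ways, and the abstract does not state the exact overlap criterion. The sketch below shows one plausible interpretation: a sweep over interval endpoints that keeps only the positions covered by at least two callers. The caller names come from the abstract, but the coordinates are invented, a single chromosome is assumed, and the function name is hypothetical.

```python
from itertools import groupby


def consensus_regions(calls_by_caller, min_support=2):
    """Return intervals supported by at least `min_support` callers.

    Each caller maps to a list of half-open (start, end) intervals on one
    chromosome; intervals from the same caller are assumed not to overlap.
    """
    events = []
    for intervals in calls_by_caller.values():
        for start, end in intervals:
            events.append((start, 1))   # interval opens
            events.append((end, -1))    # interval closes
    events.sort()
    regions, depth, open_at = [], 0, None
    # Apply all events at one position before checking support.
    for pos, group in groupby(events, key=lambda e: e[0]):
        depth += sum(delta for _, delta in group)
        if depth >= min_support and open_at is None:
            open_at = pos
        elif depth < min_support and open_at is not None:
            regions.append((open_at, pos))
            open_at = None
    return regions


# Invented coordinates for illustration only.
calls = {
    "CNAG": [(100, 200), (500, 600)],
    "dChip": [(150, 250)],
    "GEMCA": [(180, 300), (900, 1000)],
}
print(consensus_regions(calls))  # [(150, 250)]
```

An alternative reading would keep each caller's full interval whenever a second caller overlaps it at all; that variant favours sensitivity, whereas the intersection-style rule above favours specificity, which matches the stated aim of the filter.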
Affiliation(s)
- Tae-Wook Kang
- Medical Genomics Research Center, KRIBB, 52 Eoeun-dong, Yuseong-gu, Daejeon 305-806, Republic of Korea.
44
Morozova O, Marra MA. From cytogenetics to next-generation sequencing technologies: advances in the detection of genome rearrangements in tumors. Biochem Cell Biol 2008; 86:81-91. (This paper is one of a selection of papers published in the Special Issue entitled CSBMCB — Systems and Chemical Biology, and has undergone the Journal's usual peer review process.) [PMID: 18443621 DOI: 10.1139/o08-003] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Indexed: 12/21/2022]
Abstract
Genome rearrangements have long been recognized as hallmarks of human tumors and have been used to diagnose cancer. Techniques used to detect genome rearrangements have evolved from microscopic examinations of chromosomes to the more recent microarray-based approaches. The availability of next-generation sequencing technologies may provide a means for scrutinizing entire cancer genomes and transcriptomes at unparalleled resolution. Here we review the methods that have been used to detect genome rearrangements and discuss the scope and limitations of each approach. We end with a discussion of the potential that next-generation sequencing technologies may offer to the field.
Affiliation(s)
- Olena Morozova
- BC Cancer Agency Genome Sciences Centre, Suite 100-570 West 7th Avenue, Vancouver, BC V5Z 4S6, Canada
- Marco A. Marra
- BC Cancer Agency Genome Sciences Centre, Suite 100-570 West 7th Avenue, Vancouver, BC V5Z 4S6, Canada