1
Yu X, Luo X, Cai G, Xiao F. OSCAA: A two-dimensional Gaussian mixture model for copy number variation association analysis. Genet Epidemiol 2024. [PMID: 38533840 DOI: 10.1002/gepi.22558] [Received: 08/15/2023] [Revised: 01/30/2024] [Accepted: 03/05/2024] [Indexed: 03/28/2024]
Abstract
Copy number variants (CNVs) are prevalent in the human genome and have been shown to have a profound effect on genomic organization and human disease. Discovering disease-associated CNVs is critical for understanding disease pathogenesis and for aiding diagnosis and treatment. However, traditional methods for assessing the association between CNVs and disease risk adopt a two-stage strategy, first quantifying CNVs and then testing for association; this may lead to biased association estimates and low statistical power, and is a major barrier to routine genome-wide assessment of such variation. In this article, we developed One-Stage CNV-disease Association Analysis (OSCAA), a flexible algorithm to discover disease-associated CNVs for both quantitative and qualitative traits. OSCAA employs a two-dimensional Gaussian mixture model built on the principal components (PCs) of copy number intensities, accounting for technical biases in CNV detection while simultaneously testing for their effect on outcome traits. In OSCAA, CNVs are identified and their associations with disease risk are evaluated in a single step, so the uncertainty of CNV identification is taken into account in the statistical model. Our simulations demonstrated that OSCAA outperformed the existing one-stage method and traditional two-stage methods, yielding more accurate estimates of CNV-disease association, especially for short CNVs or CNVs with weak signals. In conclusion, OSCAA is a powerful and flexible approach for CNV association testing with high sensitivity and specificity, and it can be readily applied to different traits and to clinical risk prediction.
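To make the one-stage idea concrete, the following Python sketch fits a joint mixture by EM in which a latent copy state drives both a sample's position in the plane of the first two intensity PCs and its quantitative trait, so CNV state assignment and the per-copy trait effect `b1` are estimated together rather than in sequence. It is a simplified illustration under our own assumptions (diagonal covariances, a Gaussian trait with no covariates, PC1 increasing with copy number, and the hypothetical helpers `joint_mixture_em` and `simulate_example`); it is not the published OSCAA likelihood or its test statistic.

```python
import math
import random


def _log_normal(v, mean, var):
    # Log density of a univariate normal distribution.
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)


def joint_mixture_em(points, y, copy_states, n_iter=100):
    """Hypothetical one-stage sketch (not the OSCAA likelihood).

    Latent copy state k drives both the 2D PC scores (diagonal Gaussian,
    an assumption) and a quantitative trait y ~ N(b0 + b1 * copy_states[k], s2).
    """
    n, K = len(points), len(copy_states)
    # Assumption: PC1 increases with copy number, so initialise by sorting on PC1.
    order = sorted(range(n), key=lambda i: points[i][0])
    chunks = [order[k * n // K:(k + 1) * n // K] for k in range(K)]
    mu = [[sum(points[i][d] for i in c) / len(c) for d in range(2)] for c in chunks]
    var = [[1.0, 1.0] for _ in range(K)]
    pi = [1.0 / K] * K
    y_mean = sum(y) / n
    b0, b1 = y_mean, 0.0
    s2 = sum((v - y_mean) ** 2 for v in y) / n
    for _ in range(n_iter):
        # E-step: posterior of each copy state given the PC scores AND the trait.
        resp = []
        for i in range(n):
            logs = [math.log(max(pi[k], 1e-300))
                    + sum(_log_normal(points[i][d], mu[k][d], var[k][d]) for d in range(2))
                    + _log_normal(y[i], b0 + b1 * copy_states[k], s2)
                    for k in range(K)]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            resp.append([v / total for v in w])
        # M-step, mixture part: weighted proportions, means and variances.
        for k in range(K):
            nk = sum(r[k] for r in resp) + 1e-12
            pi[k] = nk / n
            for d in range(2):
                mu[k][d] = sum(resp[i][k] * points[i][d] for i in range(n)) / nk
                var[k][d] = sum(resp[i][k] * (points[i][d] - mu[k][d]) ** 2
                                for i in range(n)) / nk + 1e-6
        # M-step, association part: weighted least squares of y on copy state.
        pairs = [(resp[i][k], copy_states[k], y[i]) for i in range(n) for k in range(K)]
        c_bar = sum(r * c for r, c, _ in pairs) / n
        sxx = sum(r * (c - c_bar) ** 2 for r, c, _ in pairs)
        sxy = sum(r * (c - c_bar) * (yi - y_mean) for r, c, yi in pairs)
        b1 = sxy / sxx if sxx > 1e-12 else 0.0
        b0 = y_mean - b1 * c_bar
        s2 = sum(r * (yi - b0 - b1 * c) ** 2 for r, c, yi in pairs) / n + 1e-12
    return {"pi": pi, "mu": mu, "var": var, "b0": b0, "b1": b1, "s2": s2}


def simulate_example(seed=1):
    # Toy data: copy states 1/2/3 form separated clusters in PC space; true slope 0.8.
    rng = random.Random(seed)
    points, y = [], []
    for state, centre, size in [(1, -3.0, 30), (2, 0.0, 140), (3, 3.0, 30)]:
        for _ in range(size):
            points.append((rng.gauss(centre, 0.3), rng.gauss(0.0, 0.3)))
            y.append(1.0 + 0.8 * state + rng.gauss(0.0, 0.5))
    return points, y, [1, 2, 3]
```

Because the trait term enters the E-step, samples with ambiguous intensities are assigned partly by their phenotype, which is the sense in which calling and association share one likelihood here. A test of `b1 = 0` (for example, a likelihood-ratio test against a refit with `b1` fixed at zero) would be the natural next step but is not shown.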
Affiliation(s)
- Xuanxuan Yu
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina, USA
- Xizhi Luo
- Data and Statistical Sciences, AbbVie Inc., North Chicago, Illinois, USA
- Guoshuai Cai
- Department of Surgery, College of Medicine, University of Florida, Gainesville, Florida, USA
- Feifei Xiao
- Department of Biostatistics, College of Public Health and Health Promotion & College of Medicine, University of Florida, Gainesville, Florida, USA
2
Wu J, Wu T, Xie X, Niu Q, Zhao Z, Zhu B, Chen Y, Zhang L, Gao X, Niu X, Gao H, Li J, Xu L. Genetic Association Analysis of Copy Number Variations for Meat Quality in Beef Cattle. Foods 2023; 12:3986. [PMID: 37959106 PMCID: PMC10647706 DOI: 10.3390/foods12213986] [Received: 09/17/2023] [Revised: 10/24/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023]
Abstract
Meat quality is an economically important trait for global food production. Copy number variations (CNVs) have previously been implicated in the genetic basis of complex traits. In this article, we detected a total of 112,198 CNVs and 10,102 CNV regions (CNVRs) using the Bovine HD SNP array. Next, we performed a CNV-based genome-wide association study (GWAS) of six meat quality traits and identified 12 significant CNV segments corresponding to eight candidate genes, including PCDH15 and CSMD3. Using region-based association analysis, we further identified six CNV segments relevant to meat quality in beef cattle. Among these, TRIM77 and TRIM64 within CNVR4 on BTA29 were detected as candidate genes for backfat thickness (BFT). Notably, we identified a 34 kb duplication associated with meat color (MC) that was supported by read-depth signals; this duplication lies within the keratin gene family, including KRT4, KRT78, and KRT79. Our findings will help to dissect the genetic architecture of meat quality traits from the perspective of CNVs and, subsequently, improve selection in breeding programs.
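CNV regions are commonly built by merging individual CNV calls whose genomic intervals overlap across animals. The abstract does not state the exact merging rule, so the following Python sketch assumes simple overlap merging on inclusive coordinates; `merge_cnvs_to_cnvrs` is a hypothetical helper, and real pipelines may additionally require a minimum reciprocal overlap or keep deletions and duplications apart.

```python
from collections import defaultdict


def merge_cnvs_to_cnvrs(calls):
    """Merge overlapping CNV calls into regions (assumed rule: any overlap merges).

    calls: iterable of (chrom, start, end) with inclusive coordinates.
    Returns (chrom, region_start, region_end, n_calls) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in calls:
        by_chrom[chrom].append((start, end))
    regions = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlaps the region being built: extend it
                cur_end = max(cur_end, end)
                count += 1
            else:  # gap: close the current region and start a new one
                regions.append((chrom, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        regions.append((chrom, cur_start, cur_end, count))
    return regions
```

For example, calls at 100-500 and 450-900 on the same chromosome collapse into one region spanning 100-900 supported by two calls, while a call at 2000-2500 forms its own region. Adjacent but non-overlapping calls are deliberately not merged in this sketch.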
Affiliation(s)
- Jiayuan Wu
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Tianyi Wu
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Xueyuan Xie
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Jinzhong 030801, China
- Qunhao Niu
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Zhida Zhao
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Bo Zhu
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Yan Chen
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Lupei Zhang
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Xue Gao
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Xiaoyan Niu
- College of Animal Science and Veterinary Medicine, Shanxi Agricultural University, Jinzhong 030801, China
- Huijiang Gao
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Junya Li
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Lingyang Xu
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
3
Yu X, Luo X, Cai G, Xiao F. OSCAA: A Two-Dimensional Gaussian Mixture Model for Copy Number Variation Association Analysis. bioRxiv: The Preprint Server for Biology 2023:2023.09.25.559392. [PMID: 37808739 PMCID: PMC10557568 DOI: 10.1101/2023.09.25.559392] [Indexed: 10/10/2023]
Abstract
Copy number variants (CNVs) are prevalent in the human genome and have a profound effect on genomic organization and human disease. Discovering disease-associated CNVs is critical for understanding disease pathogenesis and for aiding diagnosis and treatment. However, traditional methods for assessing the association between CNVs and disease risk adopt a two-stage strategy, first quantifying CNVs and then testing for association, which may lead to biased association estimates and low statistical power and is a major barrier to routine genome-wide assessment of such variation. In this article, we developed OSCAA, a flexible algorithm to discover disease-associated CNVs for both quantitative and qualitative traits. OSCAA employs a two-dimensional Gaussian mixture model built on the principal components of copy number intensities, accounting for technical biases in CNV detection while simultaneously testing for their effect on outcome traits. In OSCAA, CNVs are identified and their associations with disease risk are evaluated in a single step, taking into account the uncertainty of CNV identification in the statistical model. Our simulations demonstrated that OSCAA outperformed the existing one-stage method and traditional two-stage methods, yielding more accurate estimates of CNV-disease association, especially for short CNVs or CNVs with weak signals. In conclusion, OSCAA is a powerful and flexible approach for CNV association testing with high sensitivity and specificity, and it can be readily applied to different traits and to clinical risk prediction.
4
Yuan L, Sun T, Zhao J, Shen Z. A Novel Computational Framework to Predict Disease-Related Copy Number Variations by Integrating Multiple Data Sources. Front Genet 2021; 12:696956. [PMID: 34267783 PMCID: PMC8276077 DOI: 10.3389/fgene.2021.696956] [Received: 04/18/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022]
Abstract
Copy number variation (CNV) may contribute to the development of complex diseases. However, owing to the complex mechanisms of path association and the lack of sufficient samples, understanding the relationship between CNV and cancer remains a major challenge. The unprecedented abundance of CNV, gene, and disease label data offers an opportunity to design a new machine learning framework to predict potential disease-related CNVs. In this paper, we developed a novel machine learning approach, IHI-BMLLR (Integrating Heterogeneous Information sources with Biweight Mid-correlation and L1-regularized Logistic Regression under stability selection), to predict CNV-disease path associations using a data set containing CNV, disease state label, and gene data. CNVs, genes, and diseases are connected through edges to form a biological association network. To construct this network, we first used a self-adaptive biweight mid-correlation (BM) formula to calculate correlation coefficients between CNVs and genes. Then, we used logistic regression with an L1 penalty (LLR) to detect disease-related genes. We added a stability selection strategy, which can effectively reduce false positives, to both the self-adaptive BM and LLR steps. Finally, a weighted path search algorithm was applied to find the top D path associations and important CNVs. Experimental results on both simulated and prostate cancer data show that IHI-BMLLR significantly outperforms two state-of-the-art CNV detection methods (CCRET and DPtest) under false-positive control. Furthermore, applying IHI-BMLLR to prostate cancer data revealed significant path associations. Three new cancer-related genes were discovered in these paths; they require verification by future biological research.
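The biweight mid-correlation step can be illustrated with the standard, non-adaptive formula: each variable is centred on its median, and observations are down-weighted by Tukey's biweight with a scale of 9 × MAD (median absolute deviation), so outlying values contribute little or nothing. The following Python sketch implements that standard definition; the self-adaptive variant used in IHI-BMLLR is not reproduced, and `bicor` is our own helper name.

```python
import math
import statistics


def _biweight_terms(values):
    # Centre on the median and apply Tukey biweight weights with scale 9 * MAD.
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; biweight mid-correlation is undefined")
    terms = []
    for v in values:
        u = (v - med) / (9.0 * mad)
        weight = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0  # points with |u| >= 1 get zero weight
        terms.append((v - med) * weight)
    norm = math.sqrt(sum(t * t for t in terms))
    return [t / norm for t in terms]  # unit-length vector of weighted deviations


def bicor(x, y):
    """Standard biweight mid-correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(a * b for a, b in zip(_biweight_terms(x), _biweight_terms(y)))
```

For example, with x = 1..9 and y = 2x + 1 except that the last y value is replaced by 1000, Pearson correlation is about 0.56, while `bicor` returns about 0.86 because the outlier receives zero weight. This robustness is the reason a biweight measure is attractive for noisy CNV-gene correlations.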
Affiliation(s)
- Lin Yuan
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Tao Sun
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Jing Zhao
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Zhen Shen
- School of Computer and Software, Nanyang Institute of Technology, Nanyang, China
5
Apetrei A, Molin A, Gruchy N, Godin M, Bracquemart C, Resbeut A, Rey G, Nadeau G, Richard N. A novel synonymous variant in exon 1 of GNAS gene results in a cryptic splice site and causes pseudohypoparathyroidism type 1A and pseudo-pseudohypoparathyroidism in a French family. Bone Rep 2021; 14:101073. [PMID: 33997150 PMCID: PMC8100090 DOI: 10.1016/j.bonr.2021.101073] [Received: 02/15/2021] [Revised: 04/17/2021] [Accepted: 04/20/2021] [Indexed: 12/31/2022]
Abstract
INTRODUCTION Pseudohypoparathyroidism type 1A (PHP1A) and pseudopseudohypoparathyroidism (PPHP) (Inactivating PTH/PTHrP Signaling Disorders type 2, IPPSD2) are two rare autosomal disorders caused by loss-of-function mutations on the maternal or paternal allele, respectively, of the imprinted GNAS gene, which encodes the α subunit of the ubiquitously expressed stimulatory G protein (Gαs). CASE PRESENTATION We investigated a synonymous GNAS variant, NM_001077488.2: c.108C>A / p.(Val36=), identified in a family presenting with the IPPSD2 phenotype. In silico splicing prediction algorithms favored a deleterious effect of this variant through the creation of a new donor splice site. GNAS expression studies in blood suggested haploinsufficiency and revealed an alternative splice product, demonstrating the unmasking of a cryptic site that leads to a 34-base-pair deletion and the creation of a probably unstable RNA. We present the first familial case of IPPSD2 caused by a pathogenic synonymous variant in the GNAS gene.
Affiliation(s)
- Andreea Apetrei
- Normandy University, UNICAEN, Caen University Hospital, Department of Genetics, Reference Center of Rare Diseases of Calcium and Phosphorus Metabolism, EA 7450 BioTARGen, Caen, France
- Arnaud Molin
- Normandy University, UNICAEN, Caen University Hospital, Department of Genetics, Reference Center of Rare Diseases of Calcium and Phosphorus Metabolism, EA 7450 BioTARGen, Caen, France
- Nicolas Gruchy
- Normandy University, UNICAEN, Caen University Hospital, Department of Genetics, Reference Center of Rare Diseases of Calcium and Phosphorus Metabolism, EA 7450 BioTARGen, Caen, France
- Manon Godin
- Normandy University, UNICAEN, Caen University Hospital, Department of Genetics, Reference Center of Rare Diseases of Calcium and Phosphorus Metabolism, EA 7450 BioTARGen, Caen, France
- Claire Bracquemart
- Normandy University, UNICAEN, Caen University Hospital, Department of Genetics, Reference Center of Rare Diseases of Calcium and Phosphorus Metabolism, EA 7450 BioTARGen, Caen, France
- Antoine Resbeut
- Normandy University, UNICAEN, Caen University Hospital, Department of Genetics, Reference Center of Rare Diseases of Calcium and Phosphorus Metabolism, EA 7450 BioTARGen, Caen, France
- Gaëlle Rey
- Metropole Savoie Hospital Center, Genetics Department, Chambéry, France
- Gwenaël Nadeau
- Metropole Savoie Hospital Center, Genetics Department, Chambéry, France
- Nicolas Richard
- Normandy University, UNICAEN, Caen University Hospital, Department of Genetics, Reference Center of Rare Diseases of Calcium and Phosphorus Metabolism, EA 7450 BioTARGen, Caen, France
6
Dennis J, Walker L, Tyrer J, Michailidou K, Easton DF. Detecting rare copy number variants from Illumina genotyping arrays with the CamCNV pipeline: Segmentation of z-scores improves detection and reliability. Genet Epidemiol 2021; 45:237-248. [PMID: 33020983 PMCID: PMC8005414 DOI: 10.1002/gepi.22367] [Received: 07/01/2020] [Revised: 09/03/2020] [Accepted: 09/22/2020] [Indexed: 01/21/2023]
Abstract
The intensities from genotyping array data can be used to detect copy number variants (CNVs), but a high level of noise in the data and overlap between the intensity distributions of different copy numbers produce unreliable calls, particularly when the CNV covers only a few probes. We present a novel pipeline (CamCNV) with a series of steps to reduce noise and more reliably detect CNVs covering as few as three probes. The pipeline aims to detect rare CNVs (frequency below 1%) for association tests in large cohorts. The method uses information from all samples to convert intensities to z-scores, thus adjusting for variance between probes. We tested the sensitivity of the pipeline by looking for known CNVs from the 1000 Genomes Project in our genotyping of 1000 Genomes samples. We also compared the CNV calls for 1661 pairs of genotyped replicate samples. At the chosen mean z-score cut-off, sensitivity to detect the 1000 Genomes CNVs was approximately 85% for deletions and 65% for duplications. From the replicates, we estimate that the false discovery rate is controlled at ∼10% for deletions (falling below 3% for deletions spanning more than five probes) and ∼28% for duplications. The pipeline demonstrates improved sensitivity compared with calling by PennCNV, particularly for short deletions covering only a few probes. For each called CNV, the mean z-score is a useful metric for controlling the false discovery rate.
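The z-score idea can be sketched in Python as follows: each probe is standardized across all samples, so a noisy probe gets a wider scale, and then, as a deliberately simplified stand-in for the segmentation that CamCNV actually performs, runs of at least three consecutive probes below a cut-off are reported together with their mean z-score. The data layout, the cut-off of -2, and the function names `probe_zscores` and `call_deletions` are our assumptions, not CamCNV's interface.

```python
import statistics


def probe_zscores(intensity):
    """intensity[s][p]: e.g. log R ratio of sample s at probe p (assumed layout).

    Each probe is standardised across ALL samples.
    """
    n_samples, n_probes = len(intensity), len(intensity[0])
    z = [[0.0] * n_probes for _ in range(n_samples)]
    for p in range(n_probes):
        column = [intensity[s][p] for s in range(n_samples)]
        mean = statistics.mean(column)
        sd = statistics.stdev(column)
        for s in range(n_samples):
            z[s][p] = (column[s] - mean) / sd if sd > 0 else 0.0
    return z


def call_deletions(z_row, z_cut=-2.0, min_probes=3):
    """Simplified stand-in for segmentation: runs of >= min_probes consecutive
    probes with z < z_cut, reported as (first_probe, last_probe, mean_z)."""
    calls, start = [], None
    for p, value in enumerate(list(z_row) + [float("inf")]):  # sentinel closes a trailing run
        if value < z_cut:
            if start is None:
                start = p
        elif start is not None:
            if p - start >= min_probes:
                segment = z_row[start:p]
                calls.append((start, p - 1, sum(segment) / len(segment)))
            start = None
    return calls
```

In a toy panel of 10 samples where one sample has intensity -1.0 at probes 2 to 5 and every other value is 0, `call_deletions` on that sample's z-scores returns a single call spanning probes 2 to 5 with a mean z-score of about -2.85, and no calls for the other samples. The mean z-score of each call is the quantity the abstract proposes thresholding to control the false discovery rate.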
Affiliation(s)
- Joe Dennis
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK
- Logan Walker
- Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand
- Jonathan Tyrer
- Department of Oncology, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK
- Kyriaki Michailidou
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK
- Biostatistics Unit, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Cyprus School of Molecular Medicine, Nicosia, Cyprus
- Douglas F Easton
- Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK
7
González JR, López-Sánchez M, Cáceres A, Puig P, Esko T, Pérez-Jurado LA. MADloy: robust detection of mosaic loss of chromosome Y from genotype-array-intensity data. BMC Bioinformatics 2020; 21:533. [PMID: 33225898 PMCID: PMC7682048 DOI: 10.1186/s12859-020-03768-z] [Received: 10/30/2019] [Accepted: 09/20/2020] [Indexed: 12/19/2022]
Abstract
BACKGROUND Accurate protocols and methods to robustly detect mosaic loss of chromosome Y (mLOY) are needed given its reported role in cancer, several age-related disorders, and overall male mortality. Intensity SNP-array data have been used to infer mLOY status and to establish its prominent role in male disease. However, discrepancies among reported findings may be due to the uncertainty and variability of the methods used for mLOY detection and to differences in the tissue matrix used. RESULTS We created a publicly available software tool called MADloy (Mosaic Alteration Detection for LOY) that incorporates existing methods and includes a new robust approach, allowing efficient calling in large studies and comparisons between methods. MADloy optimizes mLOY calling by correctly modeling the underlying reference population with no-mLOY status and incorporating B-deviation information. Using experimentally validated samples, we observed improved calling accuracy compared with previous methods, and, using simulation studies and real dataset analyses, an increase in statistical power to detect associations with disease and mortality. To understand discrepancies in mLOY detection across tissues, we applied MADloy to detect the increase in mLOY cellularity in blood in 18 individuals after 3 years and to confirm that its detection in saliva was sub-optimal (41%). We additionally applied MADloy to detect down-regulation of chromosome Y genes in kidney and bladder tumors with mLOY, and to perform pathway analyses for the detection of mLOY in blood. CONCLUSIONS MADloy is a new software tool implemented in R for the easy and robust calling of mLOY status across different tissues, aimed at facilitating its study in large epidemiological studies.
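One generic way to use a reference population without mLOY is to summarize each sample's chromosome Y intensities as a single number and score samples robustly against the reference distribution. The following Python sketch does this with a median log R ratio per sample and a median/MAD z-score. It is an illustration under our own assumptions (the summary statistic, the -3 cut-off, and the function names are hypothetical); it ignores the B-deviation information MADloy incorporates and is not MADloy's algorithm or its R interface.

```python
import statistics


def sample_chry_summary(chry_lrr):
    # Assumed per-sample summary: median log R ratio across chromosome Y probes.
    return statistics.median(chry_lrr)


def robust_scores(sample_summaries, reference_summaries):
    """Robust z-scores of samples against a reference set assumed free of mLOY."""
    centre = statistics.median(reference_summaries)
    mad = statistics.median([abs(v - centre) for v in reference_summaries])
    if mad == 0:
        raise ValueError("reference MAD is zero")
    scale = 1.4826 * mad  # rescales MAD to a standard deviation under normality
    return [(v - centre) / scale for v in sample_summaries]


def flag_loy(sample_summaries, reference_summaries, cutoff=-3.0):
    # Hypothetical cut-off: strongly negative scores suggest Y loss in a fraction of cells.
    return [s < cutoff for s in robust_scores(sample_summaries, reference_summaries)]
```

Median and MAD are used instead of mean and standard deviation so that a few undetected mLOY carriers in the reference set do not inflate the scale. For instance, against reference medians of -0.02, 0.0, 0.01, -0.01, and 0.02, a sample whose chromosome Y probes have median -0.3 is flagged, whereas a sample at 0.005 is not.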
Affiliation(s)
- Juan R González
- Barcelona Institute for Global Health (ISGlobal), 08003, Barcelona, Spain
- Centro de Investigación Biomédica en Red en Epidemiología Y Salud Pública (CIBERESP), Madrid, Spain
- Department of Mathematics, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain
- Marcos López-Sánchez
- Genetics Unit, Universitat Pompeu Fabra, Barcelona, Spain
- Institut Hospital del Mar D'Investigacions Mediques (IMIM), Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Madrid, Spain
- Alejandro Cáceres
- Barcelona Institute for Global Health (ISGlobal), 08003, Barcelona, Spain
- Centro de Investigación Biomédica en Red en Epidemiología Y Salud Pública (CIBERESP), Madrid, Spain
- Pere Puig
- Department of Mathematics, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain
- Tonu Esko
- Estonian Genome Centre Science Centre, University of Tartu, Tartu, Estonia
- Luis A Pérez-Jurado
- Genetics Unit, Universitat Pompeu Fabra, Barcelona, Spain
- Institut Hospital del Mar D'Investigacions Mediques (IMIM), Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Madrid, Spain
- SA Clinical Genetics, Women's and Children's Hospital, Adelaide, Australia
- South Australian Health and Medical Research Institute, University of Adelaide, Adelaide, Australia
8
Cristiano S, McKean D, Carey J, Bracci P, Brennan P, Chou M, Du M, Gallinger S, Goggins MG, Hassan MM, Hung RJ, Kurtz RC, Li D, Lu L, Neale R, Olson S, Petersen G, Rabe KG, Fu J, Risch H, Rosner GL, Ruczinski I, Klein AP, Scharpf RB. Bayesian copy number detection and association in large-scale studies. BMC Cancer 2020; 20:856. [PMID: 32894098 PMCID: PMC7487704 DOI: 10.1186/s12885-020-07304-3] [Received: 02/21/2020] [Accepted: 08/17/2020] [Indexed: 11/20/2022]
Abstract
BACKGROUND Germline copy number variants (CNVs) increase the risk of many diseases, yet detecting CNVs and quantifying their contribution to disease risk in large-scale studies is challenging because of biological and technical sources of heterogeneity that vary across the genome within and between samples. METHODS We developed an approach called CNPBayes to identify latent batch effects in genome-wide association studies involving copy number, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate copy number uncertainty into the association model for disease. RESULTS Applying a hidden Markov model (HMM) to identify CNVs in a large multi-site Pancreatic Cancer Case Control study (PanC4) of 7598 participants, we found that CNV inference was highly sensitive to technical noise that varied appreciably among participants. Applying CNPBayes to this dataset, we found that the major sources of technical variation were linked to sample processing by the centralized laboratory and not to the individual study sites. By modeling the latent batch effects at each CNV region hierarchically, we developed probabilistic estimates of copy number that were directly incorporated into a Bayesian regression model for pancreatic cancer risk. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3). CONCLUSIONS Laboratory effects may account for the major sources of technical variation in genome-wide association studies. This study provides a robust Bayesian inferential framework for identifying latent batch effects, estimating copy number, and evaluating the role of copy number in heritable diseases.
Affiliation(s)
- Stephen Cristiano
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- David McKean
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jacob Carey
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Paige Bracci
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA
- Paul Brennan
- Genetics Section, International Agency for Research on Cancer, Lyon, France
- Michael Chou
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Mengmeng Du
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, 10065, NY, USA
- Steven Gallinger
- Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Toronto, M5G 1X5, Ontario, Canada
- Michael G Goggins
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Manal M Hassan
- Department of Epidemiology, Cancer Prevention & Population Sciences, UT MD Anderson Cancer Center, Houston, 77030, TX, USA
- Rayjean J Hung
- Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Toronto, M5G 1X5, Ontario, Canada
- Robert C Kurtz
- Department of Gastroenterology, Hepatology, and Nutrition Service, Memorial Sloan Kettering Cancer Center, New York, 10065, NY, USA
- Donghui Li
- Department of Gastrointestinal Medical Oncology, University of Texas MD Anderson Cancer Center, Houston, 77030, TX, USA
- Lingeng Lu
- Department of Chronic Disease Epidemiology, Yale School of Public Health, Yale Cancer Center, New Haven, CT, USA
- Rachel Neale
- Population Health Department, QIMR Berghofer Medical Research Institute, Brisbane, 4029, Australia
- Sara Olson
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, 10065, NY, USA
- Gloria Petersen
- Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, 55905, MN, USA
- Kari G Rabe
- Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, 55905, MN, USA
- Jack Fu
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Harvey Risch
- Department of Chronic Disease Epidemiology, Yale School of Public Health, Yale Cancer Center, New Haven, CT, USA
- Gary L Rosner
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Epidemiology, Cancer Prevention & Population Sciences, UT MD Anderson Cancer Center, Houston, 77030, TX, USA
- Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Alison P Klein
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Robert B Scharpf
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
9
Shin SJ, Wu Y, Hao N. A backward procedure for change-point detection with applications to copy number variation detection. Can J Stat 2020. [DOI: 10.1002/cjs.11535] [Indexed: 11/08/2022]
Affiliation(s)
- Seung Jun Shin
- Department of Statistics, Korea University, Seoul, South Korea
- Yichao Wu
- Department of Mathematics, Statistics, and Computer Science, The University of Illinois at Chicago, Chicago, IL, USA
- Ning Hao
- Department of Mathematics, The University of Arizona, Tucson, AZ, USA
10
Xiao F, Luo X, Hao N, Niu YS, Xiao X, Cai G, Amos CI, Zhang H. An accurate and powerful method for copy number variation detection. Bioinformatics 2019; 35:2891-2898. [PMID: 30649252 DOI: 10.1093/bioinformatics/bty1041] [Received: 09/06/2018] [Revised: 11/28/2018] [Accepted: 01/09/2019] [Indexed: 12/12/2022]
Abstract
MOTIVATION Integrating multiple genetic sources for copy number variation (CNV) detection is a powerful approach to improve the identification of variants associated with complex traits. Although the widely used change-point-based methods have been shown to increase statistical power to identify variants, it remains challenging to effectively detect CNVs with weak signals because of the noisy nature of genotyping intensity data. We previously developed modSaRa, a normal mean-based model built on a screening and ranking algorithm for CNV identification, which showed desirable sensitivity with high computational efficiency. To boost statistical power for variant identification, here we present a novel improvement, called modSaRa2, that integrates the relative allelic intensity with external information from empirical statistics in the model. RESULTS Simulation studies illustrated that modSaRa2 markedly improved both sensitivity and specificity over existing methods for analyzing array-based data. The improvement is most substantial for weak CNV signals, and the method also improves stability when CNV size varies. Application of the new method to a whole-genome melanoma dataset identified novel candidate melanoma risk-associated deletions on chromosome band 1p22.2 and duplications in the 6p22, 6q25, and 19p13 regions, which may facilitate understanding of the possible roles of germline copy number variants in the etiology of melanoma. AVAILABILITY AND IMPLEMENTATION http://c2s2.yale.edu/software/modSaRa2 or https://github.com/FeifeiXiaoUSC/modSaRa2. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
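The screening-and-ranking idea behind modSaRa can be illustrated with a local diagnostic statistic: at each position, take the mean of the h intensities ending there minus the mean of the next h intensities, keep positions where |D| is a local maximum within a window of width h and exceeds a threshold, and rank the survivors by |D|. The following Python sketch follows that idea under our own conventions (window placement, threshold, and the names `local_diagnostic` and `screen_and_rank` are assumptions); it does not include modSaRa2's integration of allelic intensity or its empirical-statistics modeling.

```python
def local_diagnostic(xs, h):
    """D[x] = mean(xs[x-h+1 .. x]) - mean(xs[x+1 .. x+h]) for every x where both windows fit."""
    prefix = [0.0]
    for v in xs:
        prefix.append(prefix[-1] + v)  # prefix sums make each window mean O(1)
    D = {}
    for x in range(h - 1, len(xs) - h):
        left = (prefix[x + 1] - prefix[x + 1 - h]) / h
        right = (prefix[x + 1 + h] - prefix[x + 1]) / h
        D[x] = left - right
    return D


def screen_and_rank(xs, h, threshold):
    """Keep h-local maximisers of |D| above threshold, ranked by |D| (largest first)."""
    D = local_diagnostic(xs, h)
    candidates = []
    for x, d in D.items():
        neighbours = [abs(D[y]) for y in range(x - h + 1, x + h) if y in D and y != x]
        if abs(d) > threshold and all(abs(d) >= v for v in neighbours):
            candidates.append((x, d))
    return sorted(candidates, key=lambda t: -abs(t[1]))
```

For a toy sequence of 20 zeros, 10 ones, and 20 zeros with h = 5 and a threshold of 0.5, `screen_and_rank` returns positions 19 and 29, the last index before each shift in mean. Screening only local maximisers keeps the candidate set small, which is where the computational efficiency of this family of methods comes from.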
Affiliation(s)
- Feifei Xiao
- Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA
| | - Xizhi Luo
- Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA
| | - Ning Hao
- Department of Mathematics, University of Arizona, Tucson, AZ, USA
| | - Yue S Niu
- Department of Mathematics, University of Arizona, Tucson, AZ, USA
| | - Xiangjun Xiao
- Department of Quantitative Sciences, Baylor College of Medicine, Houston, TX, USA
| | - Guoshuai Cai
- Department of Environmental Health Science, University of South Carolina, Columbia, SC, USA
| | - Christopher I Amos
- Department of Quantitative Sciences, Baylor College of Medicine, Houston, TX, USA
| | - Heping Zhang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA

11
Liu M, Fang L, Liu S, Pan MG, Seroussi E, Cole JB, Ma L, Chen H, Liu GE. Array CGH-based detection of CNV regions and their potential association with reproduction and other economic traits in Holsteins. BMC Genomics 2019; 20:181. [PMID: 30845913 PMCID: PMC6407259 DOI: 10.1186/s12864-019-5552-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/07/2018] [Accepted: 02/21/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Copy number variations (CNVs) are structural variants consisting of large-scale insertions and deletions of genomic fragments. Exploring CNVs and estimating their effects on phenotypes are useful for genome selection but remain challenging in the livestock. RESULTS We identified 1043 CNV regions (CNVRs) from array comparative genomic hybridization (CGH) data of 47 Holstein bulls. Using a probe-based CNV association approach, we detected 87 CNVRs significantly (Bonferroni-corrected P value < 0.05) associated with at least one out of 41 complex traits. Within them, 39 CNVRs were simultaneously associated with at least 2 complex traits. Notably, 24 CNVRs were markedly related to daughter pregnancy rate (DPR). For example, CNVR661 containing CYP4A11 and CNVR213 containing CTR9, respectively, were associated with DPR and other traits related to reproduction, production, and body conformation. CNVR758 was also significantly related to DPR, with a nearby gene CAPZA3, encoding one of F-actin-capping proteins which play a role in determining sperm architecture and male fertility. We corroborated these CNVRs by examining their overlapped quantitative trait loci and comparing with previously published CNV results. CONCLUSION To our knowledge, this is one of the first genome-wide association studies based on CNVs called by array CGH in Holstein cattle. Our results contribute substantial information about the potential CNV impacts on reproduction, health, production, and body conformation traits, which lay the foundation for incorporating CNV into the future dairy cattle breeding program.
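A Bonferroni correction multiplies each raw P value by the number of tests m, caps the result at 1, and compares it with the nominal level (here 0.05). The minimal Python sketch below assumes the raw P values are keyed by a (CNVR, trait) pair and that m is the total number of such tests. The abstract does not say exactly how m was defined in the study, and the keys and values in the usage example are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Bonferroni-adjust a mapping of test -> raw P value.

    Returns (adjusted, significant): adjusted P values for every test and
    the subset whose adjusted P value is below alpha.
    """
    m = len(pvalues)
    adjusted = {test: min(1.0, p * m) for test, p in pvalues.items()}
    significant = {test: q for test, q in adjusted.items() if q < alpha}
    return adjusted, significant


# Hypothetical raw P values for three (CNVR, trait) tests.
raw = {
    ("CNVR_a", "trait_1"): 1e-4,
    ("CNVR_a", "trait_2"): 0.01,
    ("CNVR_b", "trait_1"): 0.2,
}
adjusted, significant = bonferroni(raw)
print(sorted(significant))
```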
Affiliation(s)
- Mei Liu
- College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Agricultural Molecular Biology, Yangling, 712100 Shaanxi China
- Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD 20705 USA
- Lingzhao Fang
- Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD 20705 USA
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD USA
- Shuli Liu
- Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD 20705 USA
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193 China
- Michael G. Pan
- Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD 20705 USA
- Eyal Seroussi
- Agricultural Research Organization (ARO), Volcani Center, Institute of Animal Science, Department of Quantitative and Molecular Genetics, HaMaccabim Road, P.O.B 15159, 7528809 Rishon LeTsiyon, Israel
- John B. Cole
- Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD 20705 USA
- Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD USA
- Hong Chen
- College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Agricultural Molecular Biology, Yangling, 712100 Shaanxi China
- George E. Liu
- Animal Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD 20705 USA

12
Xu L, Yang L, Wang L, Zhu B, Chen Y, Gao H, Gao X, Zhang L, Liu GE, Li J. Probe-based association analysis identifies several deletions associated with average daily gain in beef cattle. BMC Genomics 2019; 20:31. [PMID: 30630414 PMCID: PMC6327516 DOI: 10.1186/s12864-018-5403-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/18/2018] [Accepted: 12/20/2018] [Indexed: 12/20/2022]
Abstract
BACKGROUND Average daily gain (ADG) is an important trait that contributes to the production efficiency and economic benefits in the beef cattle industry. The molecular mechanisms of ADG have not yet been fully explored because most recent association studies for ADG are based on SNPs or haplotypes. We reported a systematic CNV discovery and association analysis for ADG in Chinese Simmental beef cattle. RESULTS Our study identified 4912 nonredundant CNVRs with a total length of ~ 248.7 Mb, corresponding to ~ 8.9% of the cattle genome. Using probe-based CNV association, we identified 24 and 12 significant SNP probes within five deletions and two duplications for ADG, respectively. Among them, we found one common deletion with 89 kb imbedded in LHFPL Tetraspan Subfamily Member 6 (LHFPL6) at 22.9 Mb on BTA12, which has high frequency (12.9%) dispersing across population. CNV selection test using VST statistic suggested this common deletion may be under positive selection in Chinese Simmental cattle. Moreover, this deletion was not overlapped with any candidate SNP for ADG compared with previous SNPs-based association studies, suggesting its important role for ADG. In addition, we identified one rare deletion near gene Growth Factor Receptor-bound Protein 10 (GRB10) at 5.1 Mb on BTA4 for ADG using both probe-based association and region-based approaches. CONCLUSIONS Our results provided some valuable insights to elucidate the genetic basis of ADG in beef cattle, and these findings offer an alternative perspective to understand the genetic mechanism of complex traits in terms of copy number variations in farm animals.
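As it is commonly defined for CNV data, VST compares two quantities. VT is the total variance of a per-sample copy-number measure, such as a log2 intensity ratio, across all samples. VS is the within-population variance averaged with weights proportional to population size. The statistic is VST = (VT − VS) / VT. Values near 0 indicate little differentiation, and values near 1 indicate that most of the variance lies between populations. The Python sketch below assumes population (not sample) variances. The abstract does not state which groups of animals were compared in the study, so the grouping in the usage example is hypothetical.

```python
from statistics import pvariance


def vst(groups):
    """VST = (VT - VS) / VT for a mapping population -> list of values.

    VT is the variance over all samples pooled; VS is the within-population
    variance averaged with weights proportional to population size.
    """
    pooled = [v for values in groups.values() for v in values]
    vt = pvariance(pooled)
    if vt == 0:
        raise ValueError("no variation across samples; VST is undefined")
    vs = sum(pvariance(values) * len(values) for values in groups.values()) / len(pooled)
    return (vt - vs) / vt


# Hypothetical log2 ratios: one group carries a deletion, the other does not.
print(vst({"pop_A": [-1.0, -1.0, -0.9], "pop_B": [0.0, 0.1, 0.0]}))
```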
Affiliation(s)
- Lingyang Xu
- Innovation Team of Cattle Genetic Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Liu Yang
- Innovation Team of Cattle Genetic Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, 611130, China
- Lei Wang
- Beijing Genecast Biotechnology Co., Beijing, 100191, China
- Bo Zhu
- Innovation Team of Cattle Genetic Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Yan Chen
- Innovation Team of Cattle Genetic Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Huijiang Gao
- Innovation Team of Cattle Genetic Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Xue Gao
- Innovation Team of Cattle Genetic Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Lupei Zhang
- Innovation Team of Cattle Genetic Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- George E Liu
- U.S. Department of Agriculture-Agricultural Research Services, Animal Genomics and Improvement Laboratory, Beltsville, MD, 20705, USA
- Junya Li
- Innovation Team of Cattle Genetic Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, China

13
Cheng Y, Dai JY, Wang X, Kooperberg C. Identifying disease-associated copy number variations by a doubly penalized regression model. Biometrics 2018; 74:1341-1350. [PMID: 29894562 PMCID: PMC6663092 DOI: 10.1111/biom.12920] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/01/2017] [Revised: 05/01/2018] [Accepted: 05/01/2018] [Indexed: 11/27/2022]
Abstract
Copy number variation (CNV) of DNA plays an important role in the development of many diseases. However, due to the irregularity and sparsity of the CNVs, studying the association between CNVs and a disease outcome or a trait can be challenging. Up to now, not many methods have been proposed in the literature for this problem. Most current researchers rely on an ad hoc two-stage procedure by first identifying CNVs in each individual genome and then performing an association test using these identified CNVs. This potentially leads to information loss and as a result a lower power to identify disease associated CNVs. In this article, we describe a new method that combines the two steps into a single coherent model to identify the common CNVs across patients that are associated with certain diseases. We use a double penalty model to capture CNVs' association with both the intensities and the disease trait. We validate its performance in simulated datasets and a data example on platinum resistance and CNV in ovarian cancer genome.
Affiliation(s)
- Yichen Cheng
- Institute for Insight, Georgia State University, Atlanta, Georgia, USA
- James Y. Dai
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Xiaoyu Wang
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Charles Kooperberg
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA

14
Adewoye AB, Shrine N, Odenthal-Hesse L, Welsh S, Malarstig A, Jelinsky S, Kilty I, Tobin MD, Hollox EJ, Wain LV. Human CCL3L1 copy number variation, gene expression, and the role of the CCL3L1-CCR5 axis in lung function. Wellcome Open Res 2018; 3:13. [PMID: 29682616 PMCID: PMC5883389 DOI: 10.12688/wellcomeopenres.13902.2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Accepted: 02/09/2018] [Indexed: 01/21/2023]
Abstract
Background: The CCL3L1-CCR5 signaling axis is important in a number of inflammatory responses, including macrophage function, and T-cell-dependent immune responses. Small molecule CCR5 antagonists exist, including the approved antiretroviral drug maraviroc, and therapeutic monoclonal antibodies are in development. Repositioning of drugs and targets into new disease areas can accelerate the availability of new therapies and substantially reduce costs. As it has been shown that drug targets with genetic evidence supporting their involvement in the disease are more likely to be successful in clinical development, using genetic association studies to identify new target repurposing opportunities could be fruitful. Here we investigate the potential of perturbation of the CCL3L1-CCR5 axis as treatment for respiratory disease. Europeans typically carry between 0 and 5 copies of CCL3L1 and this multi-allelic variation is not detected by widely used genome-wide single nucleotide polymorphism studies. Methods: We directly measured the complex structural variation of CCL3L1 using the Paralogue Ratio Test and imputed (with validation) CCR5del32 genotypes in 5,000 individuals from UK Biobank, selected from the extremes of the lung function distribution, and analysed DNA and RNAseq data for CCL3L1 from the 1000 Genomes Project. Results: We confirmed the gene dosage effect of CCL3L1 copy number on CCL3L1 mRNA expression levels. We found no evidence for association of CCL3L1 copy number or CCR5del32 genotype with lung function. Conclusions: These results suggest that repositioning CCR5 antagonists is unlikely to be successful for the treatment of airflow obstruction.
Affiliation(s)
- Adeolu B. Adewoye
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Nick Shrine
- Department of Health Sciences, University of Leicester, Leicester, UK
- Linda Odenthal-Hesse
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Scott Jelinsky
- Pfizer Worldwide Research and Development, Cambridge, MA, USA
- Iain Kilty
- Pfizer Worldwide Research and Development, Cambridge, MA, USA
- Martin D. Tobin
- Department of Health Sciences, University of Leicester, Leicester, UK
- National Institute of Health Research Biomedical Research Centre, University of Leicester, Leicester, UK
- Edward J. Hollox
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Louise V. Wain
- Department of Health Sciences, University of Leicester, Leicester, UK
- National Institute of Health Research Biomedical Research Centre, University of Leicester, Leicester, UK

15
Complement receptor 1 gene (CR1) intragenic duplication and risk of Alzheimer's disease. Hum Genet 2018; 137:305-314. [PMID: 29675612 PMCID: PMC5937907 DOI: 10.1007/s00439-018-1883-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 03/21/2018] [Accepted: 04/13/2018] [Indexed: 11/09/2022]
Abstract
Single nucleotide variants (SNVs) within and surrounding the complement receptor 1 (CR1) gene show some of the strongest genome-wide association signals with late-onset Alzheimer’s disease. Some studies have suggested that this association signal is due to a duplication allele (CR1-B) of a low copy repeat (LCR) within the CR1 gene, which increases the number of complement C3b/C4b-binding sites in the mature receptor. In this study, we develop a triplex paralogue ratio test assay for CR1 LCR copy number allowing large numbers of samples to be typed with a limited amount of DNA. We also develop a CR1-B allele-specific PCR based on the junction generated by an historical non-allelic homologous recombination event between CR1 LCRs. We use these methods to genotype CR1 and measure CR1-B allele frequency in both late-onset and early-onset cases and unaffected controls from the United Kingdom. Our data support an association of late-onset Alzheimer’s disease with the CR1-B allele, and confirm that this allele occurs most frequently on the risk haplotype defined by SNV alleles. Furthermore, regression models incorporating CR1-B genotype provide a better fit to our data compared to incorporating the SNV-defined risk haplotype, supporting the CR1-B allele as the variant underlying the increased risk of late-onset Alzheimer’s disease.

16
Adewoye AB, Shrine N, Odenthal-Hesse L, Welsh S, Malarstig A, Jelinsky S, Kilty I, Tobin MD, Hollox EJ, Wain LV. Human CCL3L1 copy number variation, gene expression, and the role of the CCL3L1-CCR5 axis in lung function. Wellcome Open Res 2018. [DOI: 10.12688/wellcomeopenres.13902.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022]

17
Abstract
Differences between genomes can be due to single nucleotide variants (SNPs), translocations, inversions and copy number variants (CNVs, gain or loss of DNA). The latter can range from sub-microscopic events to complete chromosomal aneuploidies. Small CNVs are often benign but those larger than 250 kb are strongly associated with morbid consequences such as developmental disorders and cancer. Detecting CNVs within and between populations is essential to better understand the plasticity of our genome and to elucidate its possible contribution to disease or phenotypic traits. While the link between SNPs and disease susceptibility has been well studied, to date there are still very few published CNV genome-wide association studies, probably because CNV analysis remains a somewhat more complex task than SNP analysis, both in terms of bioinformatics workflow and of uncertainty in CNV calling, which leads to high false positive rates and unknown false negative rates. This chapter aims to explain computational methods for the analysis of CNVs, ranging from study design, data processing and quality control, up to genome-wide association studies with clinical traits.
Affiliation(s)
- Aurélien Macé
- Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Zoltán Kutalik
- Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland

18
Liu M, Moon S, Wang L, Kim S, Kim YJ, Hwang MY, Kim YJ, Elston RC, Kim BJ, Won S. On the association analysis of CNV data: a fast and robust family-based association method. BMC Bioinformatics 2017; 18:217. [PMID: 28420343 PMCID: PMC5395793 DOI: 10.1186/s12859-017-1622-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/15/2016] [Accepted: 03/31/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Copy number variation (CNV) is known to play an important role in the genetics of complex diseases and several methods have been proposed to detect association of CNV with phenotypes of interest. Statistical methods for CNV association analysis can be categorized into two different strategies. First, the copy number is estimated by maximum likelihood and association of the expected copy number with the phenotype is tested. Second, the observed probe intensity measurements can be directly used to detect association of CNV with the phenotypes of interest. RESULTS For each strategy we provide a statistic that can be applied to extended families. The computational efficiency of the proposed methods enables genome-wide association analysis and we show with simulation studies that the proposed methods outperform other existing approaches. In particular, we found that the first strategy is always more efficient than the second strategy no matter whether copy numbers for each individual are well identified or not. With the proposed methods, we performed genome-wide CNV association analyses of hematological trait, hematocrit, on 521 Korean family samples. CONCLUSIONS We found that statistical analysis with the expected copy number is more powerful than the statistic with the probe intensity measurements regardless of the accuracy of the estimation of copy numbers.
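The first strategy tests association with the expected copy number. One common way to obtain that quantity is a Gaussian mixture over copy-number states, and the sketch assumes this approach rather than taking it from the paper. Each state k has a prior weight and a mean intensity. For an individual's summarized intensity x, the sketch computes the posterior probability of each state and returns the expected copy number, Σ k · P(k | x). The state means, shared standard deviation, and priors below are hypothetical. The sketch does not reproduce the paper's family-based statistic, which accounts for relatedness.

```python
import math


def state_posteriors(x, means, sd, priors):
    """Posterior P(state | x) under a Gaussian mixture with a shared sd.

    means and priors map copy-number state -> value. The Gaussian
    normalizing constant cancels because sd is shared; working in log
    space avoids underflow for intensities far from every mean.
    """
    log_w = {
        k: math.log(priors[k]) - 0.5 * ((x - mu) / sd) ** 2
        for k, mu in means.items()
    }
    top = max(log_w.values())
    w = {k: math.exp(v - top) for k, v in log_w.items()}
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}


def expected_copy_number(x, means, sd, priors):
    """Posterior mean copy number, sum over k of k * P(k | x)."""
    post = state_posteriors(x, means, sd, priors)
    return sum(k * p for k, p in post.items())


# Hypothetical log2-ratio means and priors for copy numbers 1, 2 and 3.
MEANS = {1: -0.5, 2: 0.0, 3: 0.4}
PRIORS = {1: 0.05, 2: 0.9, 3: 0.05}
print(expected_copy_number(0.0, MEANS, 0.1, PRIORS))    # close to 2
print(expected_copy_number(-0.25, MEANS, 0.1, PRIORS))  # between 1 and 2
```

The per-individual expected copy numbers would then serve as the covariate in an association model for the phenotype. That model is not shown here.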
Affiliation(s)
- Meiling Liu
- Department of Applied Statistics, Chung-Ang University, Seoul, 156-756, South Korea
- Department of Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
- Sanghoon Moon
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Cheongju-si, Chungcheongbuk-do, 363-951, South Korea
- Longfei Wang
- Interdisciplinary Program of Bioinformatics, Seoul National University, Seoul, 151-742, South Korea
- Sulgi Kim
- Naver Labs, 235 Pangyoyeok-ro, Bundang-gu, Seongnam-si, Gyeonggi-do, 13494, South Korea
- Yeon-Jung Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Cheongju-si, Chungcheongbuk-do, 363-951, South Korea
- Mi Yeong Hwang
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Cheongju-si, Chungcheongbuk-do, 363-951, South Korea
- Young Jin Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Cheongju-si, Chungcheongbuk-do, 363-951, South Korea
- Robert C Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, 44106, USA
- Bong-Jo Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Cheongju-si, Chungcheongbuk-do, 363-951, South Korea
- Sungho Won
- Interdisciplinary Program of Bioinformatics, Seoul National University, Seoul, 151-742, South Korea
- Department of Public Health Science, Seoul National University, Seoul, 151-742, South Korea
- Institute of Health and Environment, Seoul National University, Seoul, 151-742, South Korea

19
Rahbari R, Zuccherato LW, Tischler G, Chihota B, Ozturk H, Saleem S, Tarazona‐Santos E, Machado LR, Hollox EJ. Understanding the Genomic Structure of Copy-Number Variation of the Low-Affinity Fcγ Receptor Region Allows Confirmation of the Association of FCGR3B Deletion with Rheumatoid Arthritis. Hum Mutat 2017; 38:390-399. [PMID: 27995740 PMCID: PMC5363352 DOI: 10.1002/humu.23159] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 04/14/2016] [Accepted: 12/14/2016] [Indexed: 11/23/2022]
Abstract
Fcγ receptors are a family of cell-surface receptors that are expressed by a host of different innate and adaptive immune cells, and mediate inflammatory responses by binding the Fc portion of immunoglobulin G. In humans, five low-affinity receptors are encoded by the genes FCGR2A, FCGR2B, FCGR2C, FCGR3A, and FCGR3B, which are located in an 82.5-kb segmental tandem duplication on chromosome 1q23.3, which shows extensive copy-number variation (CNV). Deletions of FCGR3B have been suggested to increase the risk of inflammatory diseases such as systemic lupus erythematosus and rheumatoid arthritis (RA). In this study, we identify the deletion breakpoints of FCGR3B deletion alleles in the UK population and endogamous native American population, and show that some but not all alleles are likely to be identical-by-descent. We also localize a duplication breakpoint, confirming that the mechanism of CNV generation is nonallelic homologous recombination, and identify several alleles with gene conversion events using fosmid sequencing data. We use information on the structure of the deletion alleles to distinguish FCGR3B deletions from FCGR3A deletions in whole-genome array comparative genomic hybridization (aCGH) data. Reanalysis of published aCGH data using this approach supports association of FCGR3B deletion with increased risk of RA in a large cohort of 1,982 cases and 3,271 controls (odds ratio 1.61, P = 2.9 × 10⁻³).
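An odds ratio such as the 1.61 reported above summarizes a 2 × 2 table of deletion carriers and non-carriers among cases and controls. The Python sketch below computes a crude odds ratio with a Woolf (log-based) 95% confidence interval. The counts in the usage line are invented for illustration and are not the study's data. The published estimate may also have come from a model with covariates rather than from a raw 2 × 2 table.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2 x 2 table.

    a: cases with the deletion      b: cases without it
    c: controls with the deletion   d: controls without it
    All four counts must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive; consider a continuity correction")
    or_ = (a * d) / (b * c)
    # standard error of log(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


# Invented counts, for illustration only.
print(odds_ratio_woolf(20, 80, 10, 90))  # OR = 2.25 with its 95% CI
```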
Affiliation(s)
- Raheleh Rahbari
- Department of Genetics, University of Leicester, Leicester, United Kingdom
- Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Luciana W Zuccherato
- Department of Genetics, University of Leicester, Leicester, United Kingdom
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Belinda Chihota
- School of Health, University of Northampton, Northampton, United Kingdom
- Hasret Ozturk
- Department of Genetics, University of Leicester, Leicester, United Kingdom
- Sara Saleem
- Department of Genetics, University of Leicester, Leicester, United Kingdom
- Eduardo Tarazona‐Santos
- Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Lee R Machado
- Department of Genetics, University of Leicester, Leicester, United Kingdom
- School of Health, University of Northampton, Northampton, United Kingdom
- Edward J Hollox
- Department of Genetics, University of Leicester, Leicester, United Kingdom

20
Hu XS, Yeh FC, Hu Y, Deng LT, Ennos RA, Chen X. High mutation rates explain low population genetic divergence at copy-number-variable loci in Homo sapiens. Sci Rep 2017; 7:43178. [PMID: 28225073 PMCID: PMC5320550 DOI: 10.1038/srep43178] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/10/2016] [Accepted: 01/19/2017] [Indexed: 11/09/2022]
Abstract
Copy-number-variable (CNV) loci differ from single nucleotide polymorphic (SNP) sites in size, mutation rate, and mechanisms of maintenance in natural populations. It is therefore hypothesized that population genetic divergence at CNV loci will differ from that found at SNP sites. Here, we test this hypothesis by analysing 856 CNV loci from the genomes of 1184 healthy individuals from 11 HapMap populations with a wide range of ancestry. The results show that population genetic divergence at the CNV loci is generally more than three times lower than at genome-wide SNP sites. Populations generally exhibit very small genetic divergence (Gst = 0.05 ± 0.049). The smallest divergence is among African populations (Gst = 0.0081 ± 0.0025), with increased divergence among non-African populations (Gst = 0.0217 ± 0.0109) and then among African and non-African populations (Gst = 0.0324 ± 0.0064). Genetic diversity is high in African populations (~0.13), low in Asian populations (~0.11), and intermediate in the remaining 11 populations. Few significant linkage disequilibria (LDs) occur between the genome-wide CNV loci. Patterns of gametic and zygotic LDs indicate the absence of epistasis among CNV loci. Mutation rate is about twice as large as the migration rate in the non-African populations, suggesting that the high mutation rates play dominant roles in producing the low population genetic divergence at CNV loci.
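Nei's GST measures the share of total gene diversity that lies between populations: GST = (HT − HS) / HT. Gene diversity is H = 1 − Σ p_i², HS is the mean within-population diversity, and HT is the diversity computed from allele frequencies averaged over populations. The Python sketch below assumes equal population weights and treats each copy-number class at a CNV locus as an allele. The abstract does not give the exact definitions of alleles and weights used for CNV loci, and the frequencies in the usage line are hypothetical.

```python
def gene_diversity(freqs):
    """Nei's gene diversity H = 1 - sum(p_i^2) for one population."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs):
    """GST = (HT - HS) / HT with equally weighted populations.

    pop_freqs: one allele-frequency list per population, all in the same
    allele (copy-number class) order.
    """
    k = len(pop_freqs)
    hs = sum(gene_diversity(f) for f in pop_freqs) / k
    mean_freqs = [sum(column) / k for column in zip(*pop_freqs)]
    ht = gene_diversity(mean_freqs)
    if ht == 0:
        raise ValueError("locus is monomorphic; GST is undefined")
    return (ht - hs) / ht


# Hypothetical frequencies of two copy-number classes in two populations.
print(nei_gst([[0.6, 0.4], [0.4, 0.6]]))  # about 0.04
```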
Affiliation(s)
- Xin-Sheng Hu
- Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangdong 510642, China
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangdong 510642, China
- Francis C Yeh
- Department of Renewable Resources, 751 General Service Building, University of Alberta, Edmonton, AB T6G 2H1, Canada
- Yang Hu
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2S4, Canada
- Li-Ting Deng
- Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangdong 510642, China
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangdong 510642, China
- Richard A Ennos
- Institute of Evolutionary Biology, Ashworth Laboratories, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom
- Xiaoyang Chen
- Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangdong 510642, China
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangdong 510642, China

21
Abstract
Copy number variation (CNV), where a segment of DNA differs in copy number between different individuals, is an extensive and often underappreciated source of genetic variation within species. However, reliably determining copy number of a particular DNA sequence for a large number of samples can be challenging. Here, I describe and review the paralogue ratio test (PRT) in detail. PRT was developed to robustly type the CNV of the beta-defensin locus using small amounts of genomic DNA in a high-throughput manner, and has been applied successfully at many other loci. I discuss the strategies for designing successful PRT assays using both manual and bioinformatics methods, how to optimize experimental conditions, and approaches for analyzing the data. I discuss strengths and weaknesses of the approach, and how to troubleshoot results, as well as the range of problems to which PRT can be a potential solution.

22
Fu J, Beaty TH, Scott AF, Hetmanski J, Parker MM, Wilson JEB, Marazita ML, Mangold E, Albacha-Hejazi H, Murray JC, Bureau A, Carey J, Cristiano S, Ruczinski I, Scharpf RB. Whole exome association of rare deletions in multiplex oral cleft families. Genet Epidemiol 2017; 41:61-69. [PMID: 27910131 PMCID: PMC5154821 DOI: 10.1002/gepi.22010] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/22/2016] [Revised: 09/21/2016] [Accepted: 09/21/2016] [Indexed: 11/11/2022]
Abstract
By sequencing the exomes of distantly related individuals in multiplex families, rare mutational and structural changes to coding DNA can be characterized and their relationship to disease risk can be assessed. Recently, several rare single nucleotide variants (SNVs) were associated with an increased risk of nonsyndromic oral cleft, highlighting the importance of rare sequence variants in oral clefts and illustrating the strength of family-based study designs. However, the extent to which rare deletions in coding regions of the genome occur and contribute to risk of nonsyndromic clefts is not well understood. To identify putative structural variants underlying risk, we developed a pipeline for rare hemizygous deletions in families from whole exome sequencing and statistical inference based on rare variant sharing. Among 56 multiplex families with 115 individuals, we identified 53 regions with one or more rare hemizygous deletions. We found 45 of the 53 regions contained rare deletions occurring in only one family member. Members of the same family shared a rare deletion in only eight regions. We also devised a scalable global test for enrichment of shared rare deletions.
Affiliation(s)
- Jack Fu
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Terri H. Beaty
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Alan F. Scott
- Center for Inherited Disease Research and Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore MD, USA
- Jacqueline Hetmanski
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Margaret M. Parker
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston MA, USA
- Joan E. Bailey Wilson
- Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore MD, USA
- Mary L. Marazita
- Department of Oral Biology, Center for Craniofacial and Dental Genetics, School of Dental Medicine, University of Pittsburgh, PA, USA
- Jeffrey C. Murray
- Department of Pediatrics, School of Medicine, University of Iowa, IA, USA
- Alexandre Bureau
- Centre de Recherche de l’Institut Universitaire en Santé Mentale de Québec and Département de Médecine Sociale et Préventive, Université Laval, Québec, Canada
- Jacob Carey
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Stephen Cristiano
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA
- Robert B. Scharpf
- Department of Oncology, Johns Hopkins School of Medicine, Baltimore MD, USA
23
Park TJ, Hwang MY, Moon S, Hwang JY, Go MJ, Kim BJ. Identification of a Copy Number Variation on Chromosome 20q13.12 Associated with Osteoporotic Fractures in the Korean Population. Genomics Inform 2016; 14:216-221. [PMID: 28154514 PMCID: PMC5287127 DOI: 10.5808/gi.2016.14.4.216] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/09/2016] [Revised: 11/12/2016] [Accepted: 11/14/2016] [Indexed: 12/02/2022]
Abstract
Osteoporotic fractures (OFs) are critical hard outcomes of osteoporosis and are characterized by decreased bone strength induced by low bone density and microarchitectural deterioration in bone tissue. Most OFs cause acute pain, hospitalization, immobilization, and slow recovery in patients, and they are associated with increased mortality. A variety of genetic studies have suggested associations of genetic variants with the risk of OF, and genome-wide association studies have reported various single-nucleotide polymorphisms and copy number variations (CNVs) in European and Asian populations. To identify CNV regions associated with OF risk, we conducted a genome-wide CNV study in a Korean population. We performed logistic regression analyses in 1,537 Korean subjects (299 OF cases and 1,238 healthy controls) and identified a total of 8 CNV regions significantly associated with OF (p < 0.05). One CNV region, located on chromosome 20q13.12, was then selected and experimentally validated by quantitative polymerase chain reaction. This region is positioned upstream of a family of long non-coding RNAs, LINC01260. Our findings could provide new information on the genetic factors associated with the risk of OF.
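A per-region logistic regression of case status on CNV status, as described above, can be sketched with Newton-Raphson iterations. This is a minimal pure-Python illustration assuming a single CNV covariate (for example a 0/1 carrier indicator) plus an intercept and no adjustment covariates such as age or sex; the function name is hypothetical, and a real analysis would normally use an established statistics package.

```python
import math


def logistic_fit(x: list[float], y: list[int], iters: int = 50, tol: float = 1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1, se_b1, wald_p), where b1 is the CNV effect on the log-odds scale.
    Assumes x is not constant (otherwise the information matrix is singular).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and observed information entries (h00, h01, h11)
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: inverse of the 2x2 information matrix times the score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    se_b1 = math.sqrt(h00 / det)  # [1, 1] element of the inverse information matrix
    z = b1 / se_b1
    wald_p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return b0, b1, se_b1, wald_p
```

With a single binary carrier indicator, the fitted b1 equals the log of the 2×2 table odds ratio, which offers a quick sanity check on any implementation.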
Affiliation(s)
- Tae-Joon Park
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Cheongju 28159, Korea
- Mi Yeong Hwang
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Cheongju 28159, Korea
- Sanghoon Moon
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Cheongju 28159, Korea
- Joo-Yeon Hwang
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Cheongju 28159, Korea
- Min Jin Go
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Cheongju 28159, Korea
- Bong-Jo Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Cheongju 28159, Korea
24
Copy number variations in 375 patients with oesophageal atresia and/or tracheoesophageal fistula. Eur J Hum Genet 2016; 24:1715-1723. [PMID: 27436264 DOI: 10.1038/ejhg.2016.86] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 01/25/2016] [Revised: 06/08/2016] [Accepted: 06/14/2016] [Indexed: 02/06/2023]
Abstract
Oesophageal atresia (OA) with or without tracheoesophageal fistula (TOF) are rare anatomical congenital malformations whose cause is unknown in over 90% of patients. A genetic background is suggested, and among the reported genetic defects are copy number variations (CNVs). We hypothesized that CNVs contribute to OA/TOF development. Quantifying their prevalence could aid in genetic diagnosis and clinical care strategies. Therefore, we profiled 375 patients in a combined Dutch, American and German cohort via genomic microarray and compared their CNV profiles with those of their unaffected parents and published control cohorts. We identified 167 rare CNVs containing genes (frequency < 0.0005 in our in-house cohort). Eight rare CNVs, in six patients, were de novo, including one CNV previously associated with oesophageal disease (hg19 chr7:g.(143820444_143839360)_(159119486_159138663)del). De novo CNVs were found in 1.55% of isolated OA/TOF patients and 1.62% of patients with additional congenital anomalies. Furthermore, three susceptibility loci (15q13.3, 16p13.3 and 22q11.2) were identified based on their overlap with known OA/TOF-associated CNV syndromes and with loci in published CNV association case-control studies in developmental delay. Our study suggests that CNVs contribute to OA/TOF development. In addition to the likely deleterious de novo CNVs identified, we detected 167 rare CNVs. Although not directly disease-causing, these CNVs might be of interest, as they can act as a modifier in a multiple-hit model or as the second hit in a recessive condition.
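The filtering logic described above (a frequency cut-off in an in-house cohort, then comparison of each patient's rare CNVs with the parents' calls) might be sketched as follows. The 50% reciprocal-overlap rule used to decide whether two calls represent the same CNV, the one-call-per-carrier data layout and the function names are assumptions for illustration, not the study's documented criteria.

```python
from dataclasses import dataclass

RARE_FREQ = 0.0005  # frequency threshold quoted in the abstract


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive
    kind: str   # "del" or "dup"


def reciprocal_overlap(a: CNV, b: CNV, min_frac: float = 0.5) -> bool:
    """True if a and b are the same type and each covers >= min_frac of the other (assumed rule)."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return False
    ov = min(a.end, b.end) - max(a.start, b.start) + 1
    if ov <= 0:
        return False
    return ov >= min_frac * (a.end - a.start + 1) and ov >= min_frac * (b.end - b.start + 1)


def is_rare(cnv: CNV, cohort_calls: list[CNV], cohort_size: int) -> bool:
    """Frequency = matching calls in the reference cohort / cohort size (one call per carrier assumed)."""
    carriers = sum(reciprocal_overlap(cnv, c) for c in cohort_calls)
    return carriers / cohort_size < RARE_FREQ


def is_de_novo(cnv: CNV, mother: list[CNV], father: list[CNV]) -> bool:
    """A child's CNV is called de novo if neither parent carries a matching CNV."""
    return not any(reciprocal_overlap(cnv, p) for p in mother + father)
```

In practice de novo calls from array data are usually confirmed with an independent assay, since a missed parental call would masquerade as a de novo event.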
25
Hu XS, Hu Y, Chen X. Testing neutrality at copy-number-variable loci under the finite-allele and finite-site models. Theor Popul Biol 2016; 112:1-13. [PMID: 27423854 DOI: 10.1016/j.tpb.2016.07.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/23/2016] [Revised: 07/05/2016] [Accepted: 07/06/2016] [Indexed: 02/01/2023]
Abstract
Copy-number variation (CNV) is an important form of DNA structural variation because a certain proportion of the genome in many eukaryotic species can contribute to such variation. Owing to the differences between CNVs and single nucleotide polymorphisms (SNPs) in size, mutation rate and maintenance mechanism, it is more realistic to characterize CNV evolution under the finite-allele and finite-site models. Here, we propose a method to test the neutrality of multiple CNVs under the finite-allele and finite-site models and the assumption of a mutation-drift process. The statistical properties of the method are evaluated through Monte Carlo simulations under the effects of sample size, scaled mutation rates, the number of CNVs, population demographic change, and selection. Unlike with Tajima's D test, a bootstrap or permutation approach is suggested for conducting the neutrality test. Application of the method is illustrated using diploid CNV genotypes measured in discrete copy numbers in 11 HapMap phase III populations. The results show that the mutation-drift process can explain the variation of genome-wide CNVs among 1184 individuals (856 CNVs, ∼0.02 Mb on average in size), irrespective of historical demographic changes. Patterns from allele-frequency-spectrum analysis also support the hypothesis of neutral CNVs. Our results suggest that most human chromosomal changes in healthy individuals via unbalanced rearrangements of segments of certain sizes are neutral.
Affiliation(s)
- Xin-Sheng Hu
- Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangdong 510642, China; State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangdong 510642, China; Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, United Kingdom.
- Yang Hu
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2S4, Canada
- Xiaoyang Chen
- Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangdong 510642, China; State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangdong 510642, China.
26
Polley S, Cipriani V, Khan JC, Shahid H, Moore AT, Yates JRW, Hollox EJ. Analysis of copy number variation at DMBT1 and age-related macular degeneration. BMC Med Genet 2016; 17:44. [PMID: 27416785 PMCID: PMC4946147 DOI: 10.1186/s12881-016-0311-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/19/2015] [Accepted: 07/07/2016] [Indexed: 12/22/2022]
Abstract
BACKGROUND DMBT1 is a gene that shows extensive copy number variation (CNV) that alters the number of bacteria-binding domains in the protein, which has been shown to activate the complement pathway. It lies next to the ARMS2/HTRA1 genes in a region of chromosome 10q26, where single nucleotide variants have been strongly associated with age-related macular degeneration (AMD), the commonest cause of blindness in Western populations. Complement activation is thought to be a key factor in the pathogenesis of this condition. We sought to investigate whether DMBT1 CNV plays any role in susceptibility to AMD. METHODS We analysed long-range linkage disequilibrium of DMBT1 CNV1 and CNV2 with flanking single nucleotide polymorphisms (SNPs) using our previously published CNV and HapMap Phase 3 SNP data in the CEPH Europeans from Utah (CEU). We then typed a large cohort of 860 AMD patients and 419 examined age-matched controls for copy number at DMBT1 CNV1 and CNV2 and combined these data with copy numbers from a further 480 unexamined controls. RESULTS We found weak linkage disequilibrium between DMBT1 CNV1 and CNV2 and the SNPs rs1474526 and rs714816 in the HTRA1/ARMS2 region. By directly analysing copy number variation, we found no evidence of association of CNV1 or CNV2 with AMD. CONCLUSIONS We have shown that copy number variation at DMBT1 does not affect the risk of developing age-related macular degeneration and can therefore be ruled out from future studies investigating the association of structural variation at 10q26 with AMD.
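Linkage disequilibrium between a multiallelic CNV and a flanking SNP is often summarised as the squared Pearson correlation between each individual's diploid copy number and SNP allele dosage (0, 1 or 2). The sketch below assumes that approach; it is not necessarily the exact statistic used in the study, and the function name is hypothetical.

```python
def cnv_snp_r2(copy_numbers: list[float], snp_dosages: list[float]) -> float:
    """Squared Pearson correlation between diploid copy number and SNP allele dosage."""
    n = len(copy_numbers)
    if n != len(snp_dosages) or n < 2:
        raise ValueError("need two equal-length vectors with at least two samples")
    mx = sum(copy_numbers) / n
    my = sum(snp_dosages) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(copy_numbers, snp_dosages))
    sxx = sum((x - mx) ** 2 for x in copy_numbers)
    syy = sum((y - my) ** 2 for y in snp_dosages)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variable carries no LD information
    return sxy * sxy / (sxx * syy)
```

Values near 0, as with the weak LD reported here, mean that SNP association results in the region are a poor proxy for the CNV, which is one reason to type copy number directly.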
Affiliation(s)
- Shamik Polley
- Department of Genetics, University of Leicester, Leicester, UK
- Valentina Cipriani
- UCL Institute of Ophthalmology, University College London, London, UK
- UCL Genetics Institute, University College London, London, UK
- Moorfields Eye Hospital, London, UK
- Jane C Khan
- Department of Medical Genetics, University of Cambridge, Cambridge, UK
- Centre for Ophthalmology and Visual Science, Lions Eye Institute, University of Western Australia, Perth, Australia
- Department of Ophthalmology, Royal Perth Hospital, Perth, Australia
- Humma Shahid
- Department of Medical Genetics, University of Cambridge, Cambridge, UK
- Department of Ophthalmology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Anthony T Moore
- UCL Institute of Ophthalmology, University College London, London, UK
- Moorfields Eye Hospital, London, UK
- Department of Ophthalmology, UCSF Medical School, San Francisco, USA
- John R W Yates
- UCL Institute of Ophthalmology, University College London, London, UK
- Department of Medical Genetics, University of Cambridge, Cambridge, UK
- Edward J Hollox
- Department of Genetics, University of Leicester, Leicester, UK
27
Yong RY, Mustaffa SB, Wasan PS, Sheng L, Marshall CR, Scherer SW, Teo YY, Yap EP. Complex Copy Number Variation of AMY1 does not Associate with Obesity in two East Asian Cohorts. Hum Mutat 2016; 37:669-78. [DOI: 10.1002/humu.22996] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 07/28/2015] [Accepted: 03/08/2016] [Indexed: 11/12/2022]
Affiliation(s)
- Rita Y.Y. Yong
- Defence Medical and Environmental Research Institute; DSO National Laboratories; Singapore
- Saw Swee Hock School of Public Health; National University of Singapore; Singapore
- Su'Aidah B. Mustaffa
- Defence Medical and Environmental Research Institute; DSO National Laboratories; Singapore
- Lee Kong Chian School of Medicine; Nanyang Technological University; Singapore
- Pavandip S. Wasan
- Defence Medical and Environmental Research Institute; DSO National Laboratories; Singapore
- Saw Swee Hock School of Public Health; National University of Singapore; Singapore
- Liang Sheng
- Unit of Biostatistics; Yong Loo Lin School of Medicine; National University of Singapore; Singapore
- Christian R. Marshall
- The Centre for Applied Genomics; Genetics and Genome Biology; The Hospital for Sick Children; Toronto ON Canada
- Stephen W. Scherer
- The Centre for Applied Genomics; Genetics and Genome Biology; The Hospital for Sick Children; Toronto ON Canada
- Department of Molecular Genetics and McLaughlin Centre; University of Toronto; Toronto ON Canada
- Yik-Ying Teo
- Saw Swee Hock School of Public Health; National University of Singapore; Singapore
- Department of Statistics and Applied Probability; Faculty of Science; National University of Singapore; Singapore
- Eric P.H. Yap
- Defence Medical and Environmental Research Institute; DSO National Laboratories; Singapore
- Saw Swee Hock School of Public Health; National University of Singapore; Singapore
- Lee Kong Chian School of Medicine; Nanyang Technological University; Singapore
28
Franke L, el Bannoudi H, Jansen DTSL, Kok K, Trynka G, Diogo D, Swertz M, Fransen K, Knevel R, Gutierrez-Achury J, Ärlestig L, Greenberg JD, Kremer J, Pappas DA, Kanterakis A, Weersma RK, van der Helm-van Mil AHM, Guryev V, Rantapää-Dahlqvist S, Gregersen PK, Plenge RM, Wijmenga C, Huizinga TWJ, Ioan-Facsinay A, Toes REM, Zhernakova A. Association analysis of copy numbers of FC-gamma receptor genes for rheumatoid arthritis and other immune-mediated phenotypes. Eur J Hum Genet 2016; 24:263-70. [PMID: 25966632 PMCID: PMC4717214 DOI: 10.1038/ejhg.2015.95] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 07/31/2014] [Revised: 03/27/2015] [Accepted: 04/15/2015] [Indexed: 12/20/2022]
Abstract
Segmental duplications (SDs) comprise about 5% of the human genome and are enriched for immune genes. SD loci often show copy number variation (CNV), which is difficult to tag with genotyping methods. CNV in the Fcγ receptor region (FCGR) has been suggested to be associated with rheumatic diseases. The objective of this study was to delineate the association of FCGR-CNV with the incidence of rheumatoid arthritis (RA), coeliac disease and inflammatory bowel disease. We developed a method to accurately quantify CNV in SD loci based on the intensity values from the Immunochip platform and applied it to the FCGR locus. We determined the method's validity using three independent assays: segregation analysis in families, arrayCGH, and whole genome sequencing. Our data showed the presence of two separate CNVs in the FCGR locus. The first region encodes FCGR2A, FCGR3A and part of the FCGR2C gene; the second encodes another part of FCGR2C, FCGR3B and FCGR2B. Analysis of CNV status in 4578 individuals with RA and 5457 controls indicated an association of duplications in the FCGR3B gene with antibody-negative RA (P=0.002, OR=1.43). Deletion in FCGR3B was associated with increased risk of antibody-positive RA, consistent with previous reports (P=0.023, OR=1.23). A clear genotype-phenotype relationship was observed: CNV polymorphisms of the FCGR3A gene correlated with CD16A expression (encoded by FCGR3A) on CD8 T-cells. In conclusion, our method allows the CNV status of the FCGR locus to be determined; we identified an association of CNV in FCGR3B with RA and showed a functional relationship between CNV in the FCGR3A gene and CD16A expression.
Affiliation(s)
- Lude Franke
- Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
- Hanane el Bannoudi
- Department of Rheumatology, Leiden University Medical Centre, Leiden, The Netherlands
- Diahann T S L Jansen
- Department of Rheumatology, Leiden University Medical Centre, Leiden, The Netherlands
- Klaas Kok
- Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
- Gosia Trynka
- Division of Rheumatology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Dorothee Diogo
- Division of Rheumatology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Morris Swertz
- Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
- Genomics Coordination Center, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Karin Fransen
- Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
- Department of Gastroenterology and Hepatology, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
- Rachel Knevel
- Department of Rheumatology, Leiden University Medical Centre, Leiden, The Netherlands
- Javier Gutierrez-Achury
- Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
- Lisbeth Ärlestig
- Department of Public Health and Clinical Medicine/Rheumatology, Umeå University, Umeå, Sweden
- Jeffrey D Greenberg
- Department of Medicine, New York University School of Medicine, New York, New York, USA
- Joel Kremer
- Department of Medicine, Albany Medical College, Albany, New York, USA
- Dimitrios A Pappas
- Department of Medicine, Columbia University College of Physicians and Surgeons, New York, New York, USA
- Alexandros Kanterakis
- Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
- Genomics Coordination Center, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Rinse K Weersma
- Department of Gastroenterology and Hepatology, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
- Viktor Guryev
- Laboratory of Genome Structure and Ageing, European Research Institute for the Biology of Ageing, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
- Robert M Plenge
- Division of Rheumatology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Cisca Wijmenga
- Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
- Tom W-J Huizinga
- Department of Rheumatology, Leiden University Medical Centre, Leiden, The Netherlands
- Andreea Ioan-Facsinay
- Department of Rheumatology, Leiden University Medical Centre, Leiden, The Netherlands
- Rene E M Toes
- Department of Rheumatology, Leiden University Medical Centre, Leiden, The Netherlands
- Alexandra Zhernakova
- Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
- Department of Rheumatology, Leiden University Medical Centre, Leiden, The Netherlands
29
Copy number variation of scavenger-receptor cysteine-rich domains within DMBT1 and Crohn's disease. Eur J Hum Genet 2016; 24:1294-300. [PMID: 26813944 PMCID: PMC4851238 DOI: 10.1038/ejhg.2015.280] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/18/2015] [Revised: 12/08/2015] [Accepted: 12/09/2015] [Indexed: 12/21/2022]
Abstract
Previous work has shown that the protein encoded by DMBT1, a large secreted epithelial glycoprotein known as salivary agglutinin, gp340, hensin or muclin, is an innate immune defence protein that binds bacteria. A deletion variant of DMBT1 has been previously associated with Crohn's disease, and a DMBT1−/− knockout mouse shows increased levels of colitis induced by dextran sulphate. DMBT1 has a complex copy number variable structure, with two independent, rapidly mutating copy number variable regions, called CNV1 and CNV2. Because the copy number variable regions are predicted to affect the number of bacteria-binding domains, different alleles may alter host–microbe interactions in the gut. Our aim was to investigate the role of this complex variation in susceptibility to Crohn's disease by assessing the previously reported association. We analysed the association of both copy number variable regions with the presence and severity of Crohn's disease in three case–control cohorts. We also reanalysed array comparative genomic hybridisation (aCGH) data from a large case–control cohort study for both copy number variable regions. We found no association with a linear increase in copy number, nor when CNV1 was coded as the presence or absence of a deletion allele. Taken together, we show that the DMBT1 CNV does not affect susceptibility to Crohn's disease, at least in Northern Europeans.
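Testing for "a linear increase in copy number" can be illustrated with a Cochran-Armitage trend test that uses copy number as the score for each genotype class; the presence/absence coding corresponds to collapsing the classes into two. This is one standard choice sketched under that assumption (asymptotic variance, two-sided normal p-value), not necessarily the test used in the study, and the function name is hypothetical.

```python
import math


def armitage_trend_test(scores: list[float], cases: list[int],
                        controls: list[int]) -> tuple[float, float]:
    """Cochran-Armitage test for a linear trend in case proportion across score classes.

    scores: copy number of each class; cases/controls: counts per class.
    Returns (z, two-sided p).
    """
    n = [a + b for a, b in zip(cases, controls)]
    total = sum(n)
    pbar = sum(cases) / total  # overall case fraction
    t_stat = sum(t * (r - ni * pbar) for t, r, ni in zip(scores, cases, n))
    s1 = sum(ni * t for t, ni in zip(scores, n))
    s2 = sum(ni * t * t for t, ni in zip(scores, n))
    var = pbar * (1.0 - pbar) * (s2 - s1 * s1 / total)
    if var <= 0:
        raise ValueError("trend test undefined: need two or more scored classes and both outcomes")
    z = t_stat / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))
```

For the presence/absence coding, pass two classes, for example scores `[0, 1]` for "no deletion allele" and "carries a deletion allele".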
30
Bourgeois S, Jorgensen A, Zhang EJ, Hanson A, Gillman MS, Bumpstead S, Toh CH, Williamson P, Daly AK, Kamali F, Deloukas P, Pirmohamed M. A multi-factorial analysis of response to warfarin in a UK prospective cohort. Genome Med 2016; 8:2. [PMID: 26739746 PMCID: PMC4702374 DOI: 10.1186/s13073-015-0255-y] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 07/20/2015] [Accepted: 12/10/2015] [Indexed: 01/13/2023]
Abstract
Background Warfarin is the most widely used oral anticoagulant worldwide, but it has a narrow therapeutic index, which necessitates constant monitoring of anticoagulation response. Previous genome-wide studies have focused on identifying factors explaining variance in stable dose, but have not explored the initial patient response to warfarin or a wider range of clinical and biochemical factors affecting both initial and stable dosing with warfarin. Methods A prospective cohort of 711 patients starting warfarin was followed up for 6 months, with analyses focusing on both non-genetic and genetic factors. The outcome measures used were mean weekly warfarin dose (MWD), stable mean weekly dose (SMWD) and international normalised ratio (INR) > 4 during the first week. Samples were genotyped on the Illumina Human610-Quad chip. Statistical analyses were performed using Plink and R. Results VKORC1 and CYP2C9 were the major genetic determinants of warfarin MWD and SMWD, with CYP4F2 having a smaller effect. Age, height, weight, cigarette smoking and interacting medications accounted for less than 20% of the variance. Our multifactorial analysis explained 57.89% and 56.97% of the variation for MWD and SMWD, respectively. Genotypes for VKORC1 and CYP2C9*3, age, height and weight, as well as other clinical factors such as alcohol consumption, loading dose and concomitant drugs, were important for the initial INR response to warfarin. In a small subset of patients for whom data were available, levels of the coagulation factors VII and IX (highly correlated) also played a role. Conclusion Our multifactorial analysis in a prospectively recruited cohort has shown that multiple factors, genetic and clinical, are important in determining the response to warfarin. VKORC1 and CYP2C9 genetic polymorphisms are the most important determinants of warfarin dosing, and it is highly unlikely that other common variants of clinical importance influencing warfarin dosage will be found. Both VKORC1 and CYP2C9*3 are important determinants of the initial INR response to warfarin. Other novel variants, which did not reach genome-wide significance, were identified for the different outcome measures, but need replication.
Affiliation(s)
- Stephane Bourgeois
- Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
- Eunice J Zhang
- University of Liverpool, Liverpool, Merseyside, L69 3GE, UK.
- Anita Hanson
- University of Liverpool, Liverpool, Merseyside, L69 3GE, UK.
- Matthew S Gillman
- Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
- Suzannah Bumpstead
- Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
- Cheng Hock Toh
- University of Liverpool, Liverpool, Merseyside, L69 3GE, UK.
- Ann K Daly
- Newcastle University, Newcastle upon Tyne, UK.
- Panos Deloukas
- Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK; William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, UK.
- Munir Pirmohamed
- University of Liverpool, Liverpool, Merseyside, L69 3GE, UK; Royal Liverpool and Broadgreen University Hospital NHS Trust, Liverpool, L7 8XP, UK; The Wolfson Centre for Personalised Medicine, Institute of Translational Medicine, University of Liverpool, Block A: Waterhouse Building, 1-5 Brownlow Street, Liverpool, L69 3GL, UK.
31
Forni D, Martin D, Abujaber R, Sharp AJ, Sironi M, Hollox EJ. Determining multiallelic complex copy number and sequence variation from high coverage exome sequencing data. BMC Genomics 2015; 16:891. [PMID: 26526070 PMCID: PMC4630827 DOI: 10.1186/s12864-015-2123-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/01/2015] [Accepted: 10/22/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Copy number variation (CNV) is a major component of genomic variation, yet methods to accurately type genomic CNV lag behind methods that type single nucleotide variation. High-throughput sequencing can contribute to these methods through sequence read depth, which uses the number of reads mapping to a given part of the reference genome as a proxy for the copy number of that region and compares it across samples. Furthermore, high-throughput sequencing also provides information on the sequence differences between copies within and between individuals. METHODS In this study we use high-coverage phase 3 exome sequences of the 1000 Genomes Project to infer diploid copy number of the beta-defensin genomic region, a well-studied CNV that carries several beta-defensin genes involved in the antimicrobial response, signalling, and fertility. We also use these data to call sequence variants, a particular challenge given the multicopy nature of the region. RESULTS We confidently call copy number and sequence variation of the beta-defensin genes in 1285 samples from 26 global populations, and validate copy number using Nanostring nCounter and triplex paralogue ratio test data. We use the copy number calls to verify the genomic extent of the CNV and validate sequence calls using analysis of cloned PCR products. We identify novel variation, mostly individually rare, predicted to alter amino-acid sequence in the beta-defensin genes. Such novel variants may alter antimicrobial properties or have off-target receptor interactions, and may contribute to individuality in immunological response and fertility. CONCLUSIONS Given that 81% of identified sequence variants were not previously in dbSNP, we show that sequence variation in multiallelic CNVs represents an unappreciated source of genomic diversity.
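The read-depth principle described above can be sketched as follows: divide each sample's read count in the CNV region by its count in a set of copy-number-stable reference regions, scale by the cohort median of that ratio, and multiply by an assumed modal diploid copy number for the cohort. The modal copy number, the choice of reference regions and the function name are assumptions for illustration; the published pipeline may differ.

```python
from statistics import median


def read_depth_copy_number(region_reads: list[float], reference_reads: list[float],
                           modal_copy_number: float) -> list[float]:
    """Estimate diploid copy number per sample from read depth.

    region_reads / reference_reads: per-sample read counts in the CNV region and in
    copy-number-stable reference regions (same sample order).
    modal_copy_number: assumed copy number of a typical sample in the cohort.
    """
    ratios = [r / ref for r, ref in zip(region_reads, reference_reads)]
    cohort_median = median(ratios)
    # A sample whose ratio equals the cohort median is assigned the modal copy number.
    return [modal_copy_number * x / cohort_median for x in ratios]
```

The output is continuous; rounding, or clustering the values around integers, is needed to turn it into discrete copy number calls, and independent assays such as those named in the abstract are what make those calls trustworthy.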
Affiliation(s)
- Diego Forni
- Department of Genetics, University of Leicester, Leicester, UK; Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Diana Martin
- Department of Genetics, University of Leicester, Leicester, UK
- Razan Abujaber
- Department of Genetics, University of Leicester, Leicester, UK
- Andrew J Sharp
- Department of Genetics and Genome Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Manuela Sironi
- Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Edward J Hollox
- Department of Genetics, University of Leicester, Leicester, UK.
32
Jeng J, Wu Q, Li H. A Statistical Method for Identifying Trait-Associated Copy Number Variants. Hum Hered 2015. [PMID: 26201700 DOI: 10.1159/000381585] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/19/2022]
Abstract
Copy number variants (CNVs), ranging in size from about one kilobase to several megabases, are DNA alterations that result in a cell having fewer or more than two copies of a segment of DNA. Such CNVs have been shown to be associated with many complex phenotypes, ranging from diseases to gene expression. Novel methods have been developed for identifying CNVs at both the individual and the population level. However, methods for testing CNV association are limited. Most available methods employ a two-step approach, in which the CNVs carried by the samples are identified first and then tested for association. However, the results of such tests depend on the threshold used for CNV identification and on the number of CNVs to be tested. We developed a method, CNVtest, to directly identify trait-associated CNVs without the need to identify sample-specific CNVs. We show that CNVtest asymptotically controls the type I error rate and identifies true trait-associated CNVs with high probability. We demonstrate the method using simulations and an application to identify the CNVs that are associated with population differentiation.
Affiliation(s)
- Jessie Jeng
- Department of Statistics, North Carolina State University, Raleigh, N.C., USA
33
Huang N, Wen Y, Guo X, Li Z, Dai J, Ni B, Yu J, Lin Y, Zhou W, Yao B, Jiang Y, Sha J, Conrad DF, Hu Z. A Screen for Genomic Disorders of Infertility Identifies MAST2 Duplications Associated with Nonobstructive Azoospermia in Humans. Biol Reprod 2015. [PMID: 26203179 DOI: 10.1095/biolreprod.115.131185] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/15/2022]
Abstract
Since the cytogenetic identification of azoospermia factor regions 40 years ago, the Y chromosome has dominated research on the genetics of male infertility. We hypothesized that hotspots of structural rearrangement, which are dispersed across the genome, may mediate rare, recurrent copy number variations (CNVs), leading to severe infertility. We tested this hypothesis by contrasting patterns of rare CNVs in 970 Han Chinese men with idiopathic nonobstructive azoospermia and 1661 ethnicity-matched controls. Our results strongly support our previous claim that sperm production is modulated by genetic variation across the entire genome. The X chromosome in particular was enriched for loci modulating spermatogenesis: rare X-linked deletions larger than 100 kb were twice as common in patients as in controls (odds ratio [OR] = 2.05, P = 0.01). At rearrangement hotspots across the genome, we observed a 2.4-fold enrichment of singleton CNVs in patients (P < 0.02), and we identified 117 testis genes, such as SYCE1, contained within 47 hotspots that may plausibly mediate genomic disorders of fertility. In our discovery sample we observed 3 case-specific duplications of the autosomal gene MAST2, and in a replication phase we found another 11 duplications in 1457 patients and 1 duplication in 1590 controls (P < 5 × 10^-5, combined data). Because azoospermia has a large, polygenic genetic basis, new ways of establishing the pathogenicity of rare, large-effect mutations will be needed to fully reap the benefit of genome data in its clinical management.
Affiliation(s)
- Ni Huang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri. Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, Missouri
- Yang Wen
- State Key Lab of Reproductive Medicine, Nanjing Medical University, Nanjing, China. Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Xuejiang Guo
- State Key Lab of Reproductive Medicine, Nanjing Medical University, Nanjing, China. Department of Histology and Embryology, Nanjing Medical University, Nanjing, China
- Zheng Li
- Shanghai Human Sperm Bank, Department of Urology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Juncheng Dai
- State Key Lab of Reproductive Medicine, Nanjing Medical University, Nanjing, China. Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Bixian Ni
- State Key Lab of Reproductive Medicine, Nanjing Medical University, Nanjing, China. Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Jun Yu
- State Key Lab of Reproductive Medicine, Nanjing Medical University, Nanjing, China. Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Yuan Lin
- State Key Lab of Reproductive Medicine, Nanjing Medical University, Nanjing, China. Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Wen Zhou
- State Key Lab of Reproductive Medicine, Nanjing Medical University, Nanjing, China. Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Bing Yao
- Department of Andrology, Nanjing Jinling Hospital, Nanjing, China
- Yue Jiang
- State Key Lab of Reproductive Medicine, Nanjing Medical University, Nanjing, China. Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Jiahao Sha
- State Key Lab of Reproductive Medicine, Nanjing Medical University, Nanjing, China. Department of Histology and Embryology, Nanjing Medical University, Nanjing, China
- Donald F Conrad
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri. Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, Missouri
- Zhibin Hu
- State Key Lab of Reproductive Medicine, Nanjing Medical University, Nanjing, China. Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
34
Abstract
Hundreds of copy number variants are complex and multi-allelic, in that they have many structural alleles and have rearranged multiple times in the ancestors who contributed chromosomes to current humans. Not only are the relationships of these multi-allelic CNVs (mCNVs) to phenotypes generally unknown, but many mCNVs have not yet been described at the basic levels—alleles, allele frequencies, structural features—that support genetic investigation. To date, most reported disease associations to these variants have been ascertained through candidate gene studies. However, only a few associations have reached the level of acceptance defined by durable replications in many cohorts. This likely stems from longstanding challenges in making precise molecular measurements of the alleles individuals have at these loci. However, approaches for mCNV analysis are improving quickly, and some of the unique characteristics of mCNVs may assist future association studies. Their various structural alleles are likely to have different magnitudes of effect, creating a natural allelic series of growing phenotypic impact and giving investigators a set of natural predictions and testable hypotheses about the extent to which each allele of an mCNV predisposes to a phenotype. Also, mCNVs’ low-to-modest correlation to individual single-nucleotide polymorphisms (SNPs) may make it easier to distinguish between mCNVs and nearby SNPs as the drivers of an association signal, and perhaps, make it possible to preliminarily screen candidate loci, or the entire genome, for the many mCNV–disease relationships that remain to be discovered.
35
Structural forms of the human amylase locus and their relationships to SNPs, haplotypes and obesity. Nat Genet 2015; 47:921-5. [PMID: 26098870 PMCID: PMC4712930 DOI: 10.1038/ng.3340] [Citation(s) in RCA: 104] [Impact Index Per Article: 11.6] [Received: 03/06/2015] [Accepted: 05/27/2015] [Indexed: 12/16/2022]
Abstract
Hundreds of genes reside in structurally complex, poorly understood regions of the human genome[1-3]. One such region contains the three amylase genes (AMY2B, AMY2A, and AMY1) responsible for digesting starch into sugar. The copy number of AMY1 is reported to be the genome's largest influence on obesity[4], though genome-wide association studies for obesity have found this locus unremarkable. Using whole genome sequence analysis[3,5], droplet digital PCR[6], and genome mapping[7], we identified eight common structural haplotypes of the amylase locus that suggest its mutational history. We found that AMY1 copy number in individuals' genomes is generally even (rather than odd) and partially correlates to nearby SNPs, which do not associate with BMI. We measured amylase gene copy number in 1,000 obese or lean Estonians and in two other cohorts totaling ~3,500 individuals. We had 99% power to detect the lower bound of the reported effects on BMI[4], yet found no association.
36
Rivas MA, Pirinen M, Conrad DF, Lek M, Tsang EK, Karczewski KJ, Maller JB, Kukurba KR, DeLuca DS, Fromer M, Ferreira PG, Smith KS, Zhang R, Zhao F, Banks E, Poplin R, Ruderfer DM, Purcell SM, Tukiainen T, Minikel EV, Stenson PD, Cooper DN, Huang KH, Sullivan TJ, Nedzel J, Bustamante CD, Li JB, Daly MJ, Guigo R, Donnelly P, Ardlie K, Sammeth M, Dermitzakis ET, McCarthy MI, Montgomery SB, Lappalainen T, MacArthur DG. Human genomics. Effect of predicted protein-truncating genetic variants on the human transcriptome. Science 2015; 348:666-9. [PMID: 25954003 PMCID: PMC4537935 DOI: 10.1126/science.1261877] [Citation(s) in RCA: 196] [Impact Index Per Article: 21.8] [Indexed: 11/02/2022]
Abstract
Accurate prediction of the functional effect of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants, a class of variants expected to have profound effects on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitated tissue-specific and positional effects on nonsense-mediated transcript decay and present an improved predictive model for this decay. We directly measured the effect of variants both proximal and distal to splice junctions. Furthermore, we found that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants.
Affiliation(s)
- Manuel A Rivas
- Wellcome Trust Centre for Human Genetics, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK.
- Matti Pirinen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Monkol Lek
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Emily K Tsang
- Department of Genetics, Stanford University, Stanford, CA, USA. Department of Pathology, Stanford University, Stanford, CA, USA. Biomedical Informatics Program, Stanford University, Stanford, CA, USA
- Konrad J Karczewski
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Julian B Maller
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Kimberly R Kukurba
- Department of Genetics, Stanford University, Stanford, CA, USA. Department of Pathology, Stanford University, Stanford, CA, USA
- Menachem Fromer
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA. Department of Psychiatry, Mt. Sinai Hospital, NY, USA
- Pedro G Ferreira
- Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland
- Kevin S Smith
- Department of Genetics, Stanford University, Stanford, CA, USA. Department of Pathology, Stanford University, Stanford, CA, USA
- Rui Zhang
- Department of Genetics, Stanford University, Stanford, CA, USA
- Fengmei Zhao
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Eric Banks
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ryan Poplin
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Douglas M Ruderfer
- Department of Psychiatry, Mt. Sinai Hospital, NY, USA. Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, NY, USA
- Shaun M Purcell
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA. Department of Psychiatry, Mt. Sinai Hospital, NY, USA. Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, NY, USA
- Taru Tukiainen
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Eric V Minikel
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK
- Jared Nedzel
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jin Billy Li
- Department of Genetics, Stanford University, Stanford, CA, USA
- Mark J Daly
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Roderic Guigo
- Center for Genomic Regulation (CRG), Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
- Peter Donnelly
- Wellcome Trust Centre for Human Genetics, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK. Department of Statistics, University of Oxford, Oxford, UK
- Michael Sammeth
- Center for Genomic Regulation (CRG), Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. National Institute for Scientific Computing (LNCC), Petropolis, Rio de Janeiro, Brazil
- Emmanouil T Dermitzakis
- Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland
- Mark I McCarthy
- Wellcome Trust Centre for Human Genetics, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK. Oxford Center for Diabetes Endocrinology and Metabolism, University of Oxford, Oxford, UK
- Stephen B Montgomery
- Department of Genetics, Stanford University, Stanford, CA, USA. Department of Pathology, Stanford University, Stanford, CA, USA
- Tuuli Lappalainen
- Department of Genetics, Stanford University, Stanford, CA, USA. Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. Swiss Institute of Bioinformatics, Geneva, Switzerland. New York Genome Center, New York, NY, USA. Department of Systems Biology, Columbia University, New York, NY, USA.
- Daniel G MacArthur
- Broad Institute of MIT and Harvard, Cambridge, MA, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA. Department of Medicine, Harvard Medical School, Boston, MA, USA.
37
A method for generating new datasets based on copy number for cancer analysis. Biomed Res Int 2015; 2015:467514. [PMID: 25949998 PMCID: PMC4407403 DOI: 10.1155/2015/467514] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/19/2014] [Accepted: 03/08/2015] [Indexed: 12/04/2022]
Abstract
New data sources for the analysis of cancer data are rapidly supplementing the large number of gene-expression markers used for current methods of analysis. Significant among these new sources are copy number variation (CNV) datasets, which typically enumerate several hundred thousand CNVs distributed throughout the genome. Several useful algorithms allow systems-level analyses of such datasets. However, these rich data sources have not yet been analyzed as deeply as gene-expression data. To address this issue, the extensive toolsets used for analyzing expression data in cancerous and noncancerous tissue (e.g., gene set enrichment analysis and phenotype prediction) could be redirected to extract a great deal of predictive information from CNV data, in particular those derived from cancers. Here we present a software package capable of preprocessing standard Agilent copy number datasets into a form to which essentially all expression analysis tools can be applied. We illustrate the use of this toolset in predicting the survival time of patients with ovarian cancer or glioblastoma multiforme and also provide an analysis of gene- and pathway-level deletions in these two types of cancer.
38
Haplotype phasing and inheritance of copy number variants in nuclear families. PLoS One 2015; 10:e0122713. [PMID: 25853576 PMCID: PMC4390228 DOI: 10.1371/journal.pone.0122713] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/09/2014] [Accepted: 02/12/2015] [Indexed: 11/19/2022]
Abstract
DNA copy number variants (CNVs) that alter the copy number of a particular DNA segment in the genome play an important role in human phenotypic variability and disease susceptibility. A number of CNVs overlapping with genes have been shown to confer risk for a variety of human diseases, thus highlighting the relevance of addressing the variability of CNVs at a higher resolution. So far, it has not been possible to deterministically infer the allelic composition of different haplotypes present within the CNV regions. We have developed a novel computational method, called PiCNV, which makes it possible to resolve the haplotype sequence composition within CNV regions in nuclear families based on SNP genotyping microarray data. The algorithm allows one to (i) phase normal and CNV-carrying haplotypes in the copy number variable regions, (ii) resolve the allelic copies of rearranged DNA sequence within the haplotypes and (iii) infer the heritability of identified haplotypes in trios or larger nuclear families. To our knowledge this is the first program available that can deterministically phase null, mono-, di-, tri- and tetraploid genotypes in CNV loci. We applied our method to study the composition and inheritance of haplotypes in CNV regions of 30 HapMap Yoruban trios and 34 Estonian families. For 93.6% of the CNV loci, PiCNV made it possible to unambiguously phase normal and CNV-carrying haplotypes and follow their transmission in the corresponding families. Furthermore, allelic composition analysis identified the co-occurrence of alternative allelic copies within 66.7% of haplotypes carrying copy number gains. We also observed less frequent transmission of CNV-carrying haplotypes than of normal haplotypes from parents to children, and identified several de novo deletions and duplications emerging in the offspring.
39
Moon S, Keam B, Hwang MY, Lee Y, Park S, Oh JH, Kim YJ, Lee HS, Kim NH, Kim YJ, Kim DH, Han BG, Kim BJ, Lee J. A genome-wide association study of copy-number variation identifies putative loci associated with osteoarthritis in Koreans. BMC Musculoskelet Disord 2015; 16:76. [PMID: 25880085 PMCID: PMC4395893 DOI: 10.1186/s12891-015-0531-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/27/2014] [Accepted: 03/17/2015] [Indexed: 12/31/2022]
Abstract
Background: Osteoarthritis (OA) is a complex disease caused by environmental and genetic risk factors. The purpose of this study was to identify candidate copy number variations (CNVs) associated with OA. Methods: We performed a genome-wide association study of CNVs to identify potential loci that confer susceptibility to or protection from OA. CNV genotyping was conducted using the NimbleGen HD2 3 × 720K comparative hybridization array and included samples from 371 OA patients and 467 healthy controls. The putative CNV regions identified were confirmed with a TaqMan assay. Results: We identified six genomic regions encompassing CNV loci that were associated with OA. None of the six loci had previously been reported in genome-wide association studies of OA, although a genetic analysis suggested that they have functional effects. The protein product of TNKS, a candidate risk gene for obesity, targets Wnt inhibition, and this gene was significantly associated with hand and knee OA. Copy number deletion in TNKS was associated with a 1.37-fold decreased risk of OA. In addition, CA10, which shows a strong association with osteoporosis, was also significant in our study. Copy number deletion in this gene was associated with a 1.69-fold decreased risk of OA. Conclusion: We identified several CNV loci that may contribute to OA susceptibility in Koreans. Further functional investigations of these genes are warranted to fully characterize their putative association.
Affiliation(s)
- Sanghoon Moon
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, 363-951, Chungchengbuk-Do, Republic of Korea.
- Bhumsuk Keam
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, 363-951, Chungchengbuk-Do, Republic of Korea. Department of Internal Medicine, Seoul National University Hospital, 110-744, Seoul, Republic of Korea.
- Mi Yeong Hwang
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, 363-951, Chungchengbuk-Do, Republic of Korea.
- Young Lee
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, 363-951, Chungchengbuk-Do, Republic of Korea.
- Suyeon Park
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, 363-951, Chungchengbuk-Do, Republic of Korea. Department of Biostatistics, Soonchunhyang University, College of Medicine, 140-743, Seoul, Republic of Korea.
- Ji Hee Oh
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, 363-951, Chungchengbuk-Do, Republic of Korea.
- Yeon-Jung Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, 363-951, Chungchengbuk-Do, Republic of Korea.
- Heun-Sik Lee
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, 363-951, Chungchengbuk-Do, Republic of Korea.
- Nam Hee Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, 363-951, Chungchengbuk-Do, Republic of Korea.
- Young Jin Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, 363-951, Chungchengbuk-Do, Republic of Korea.
- Dong-Hyun Kim
- Department of Social and Preventive Medicine, Hallym University College of Medicine, 200-702, Chunchun, Republic of Korea.
- Bok-Ghee Han
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, 363-951, Chungchengbuk-Do, Republic of Korea.
- Bong-Jo Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, 363-951, Chungchengbuk-Do, Republic of Korea.
- Juyoung Lee
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, 363-951, Chungchengbuk-Do, Republic of Korea.
40
Li D, Zhao H, Kranzler HR, Li MD, Jensen KP, Zayats T, Farrer LA, Gelernter J. Genome-wide association study of copy number variations (CNVs) with opioid dependence. Neuropsychopharmacology 2015; 40:1016-26. [PMID: 25345593 PMCID: PMC4330517 DOI: 10.1038/npp.2014.290] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 07/19/2014] [Revised: 08/18/2014] [Accepted: 08/26/2014] [Indexed: 12/20/2022]
Abstract
Single-nucleotide polymorphisms that have been associated with opioid dependence (OD) altogether account for only a small proportion of the known heritability. Most of the genetic risk factors are unknown. Some of the 'missing heritability' might be explained by copy number variations (CNVs) in the human genome. We used Illumina HumanOmni1 arrays to genotype 5152 African-American and European-American OD cases and screened controls and implemented combined CNV calling methods. After quality control measures were applied, a genome-wide association study (GWAS) of CNVs with OD was performed. For common CNVs, two deletions and one duplication were significantly associated with OD genome-wide (e.g., P = 2 × 10^-8 and OR (95% CI) = 0.64 (0.54-0.74) for a chromosome 18q12.3 deletion). Several rare or unique CNVs showed suggestive or marginal significance with large effect sizes. This study is the first GWAS of OD using CNVs. Some identified CNVs harbor genes newly identified here to be of biological importance in addiction, whereas others affect genes previously known to contribute to substance dependence risk. Our findings augment our specific knowledge of the importance of genomic variation in addictive disorders, and provide an addiction CNV pool for further research. These findings require replication.
Affiliation(s)
- Dawei Li
- Department of Psychiatry, School of Medicine, Yale University, New Haven, CT, USA
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT, USA
- Department of Computer Science, University of Vermont, Burlington, VT, USA
- Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, VT, USA
- Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Department of Genetics, School of Medicine, Yale University, New Haven, CT, USA
- Henry R Kranzler
- Department of Psychiatry, University of Pennsylvania School of Medicine and VISN 4 MIRECC, Philadelphia VAMC, Philadelphia, PA, USA
- Ming D Li
- Department of Psychiatry and Neurobehavioral Sciences, University of Virginia, Charlottesville, VA, USA
- Kevin P Jensen
- Department of Psychiatry, School of Medicine, Yale University, New Haven, CT, USA
- Tetyana Zayats
- Department of Psychiatry, School of Medicine, Yale University, New Haven, CT, USA
- Lindsay A Farrer
- Departments of Medicine (Biomedical Genetics), Neurology, Ophthalmology, Genetics and Genomics, Biostatistics, and Epidemiology, Boston University Schools of Medicine and Public Health, Boston, MA, USA
- Joel Gelernter
- Department of Psychiatry, School of Medicine, Yale University, New Haven, CT, USA
- Department of Genetics, School of Medicine, Yale University, New Haven, CT, USA
- VA Connecticut Healthcare Center, Department of Neurobiology, Yale University School of Medicine, New Haven, CT, USA
41
Hu YJ, Lin DY, Sun W, Zeng D. A Likelihood-Based Framework for Association Analysis of Allele-Specific Copy Numbers. J Am Stat Assoc 2015; 109:1533-1545. [PMID: 25663726 DOI: 10.1080/01621459.2014.908777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Abstract
Copy number variants (CNVs) and single nucleotide polymorphisms (SNPs) co-exist throughout the human genome and jointly contribute to phenotypic variations. Thus, it is desirable to consider both types of variants, as characterized by allele-specific copy numbers (ASCNs), in association studies of complex human diseases. Current SNP genotyping technologies capture the CNV and SNP information simultaneously via fluorescent intensity measurements. The common practice of calling ASCNs from the intensity measurements and then using the ASCN calls in downstream association analysis has important limitations. First, the association tests are prone to false-positive findings when differential measurement errors between cases and controls arise from differences in DNA quality or handling. Second, the uncertainties in the ASCN calls are ignored. We present a general framework for the integrated analysis of CNVs and SNPs, including the analysis of total copy numbers as a special case. Our approach combines the ASCN calling and the association analysis into a single step while allowing for differential measurement errors. We construct likelihood functions that properly account for case-control sampling and measurement errors. We establish the asymptotic properties of the maximum likelihood estimators and develop EM algorithms to implement the corresponding inference procedures. The advantages of the proposed methods over the existing ones are demonstrated through realistic simulation studies and an application to a genome-wide association study of schizophrenia. Extensions to next-generation sequencing data are discussed.
42
King DA, Jones WD, Crow YJ, Dominiczak AF, Foster NA, Gaunt TR, Harris J, Hellens SW, Homfray T, Innes J, Jones EA, Joss S, Kulkarni A, Mansour S, Morris AD, Parker MJ, Porteous DJ, Shihab HA, Smith BH, Tatton-Brown K, Tolmie JL, Trzaskowski M, Vasudevan PC, Wakeling E, Wright M, Plomin R, Timpson NJ, Hurles ME. Mosaic structural variation in children with developmental disorders. Hum Mol Genet 2015; 24:2733-45. [PMID: 25634561 PMCID: PMC4406290 DOI: 10.1093/hmg/ddv033] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 11/07/2014] [Accepted: 01/27/2015] [Indexed: 01/01/2023]
Abstract
Delineating the genetic causes of developmental disorders is an area of active investigation. Mosaic structural abnormalities, defined as copy number or loss of heterozygosity events that are large and present in only a subset of cells, have been detected in 0.2–1.0% of children ascertained for clinical genetic testing. However, their frequency among healthy children in the community is not well characterized; knowing it would allow better interpretation of the pathogenic burden of this mutational category in children with developmental disorders. In a case–control analysis, we compared the rate of large-scale mosaicism between 1303 children with developmental disorders and 5094 children lacking developmental disorders, using an analytical pipeline we developed, and identified a substantial enrichment in cases (odds ratio = 39.4, P = 1.073 × 10^-6). A meta-analysis that included frequency estimates among an additional 7000 children with congenital diseases yielded an even stronger statistical enrichment (P = 1.784 × 10^-11). In addition, to maximize the detection of low-clonality events in probands, we applied a trio-based mosaic detection algorithm, which detected two additional events in probands, including an individual with genome-wide suspected chimerism. In total, we detected 12 structural mosaic abnormalities among 1303 children (0.9%). Given the burden of mosaicism detected in cases, we suspected that many of the events detected in probands were pathogenic. Scrutiny of the genotype–phenotype relationship of each detected variant indicated that the majority of events are very likely pathogenic. This work quantifies the burden of structural mosaicism as a cause of developmental disorders.
Affiliation(s)
- Daniel A King
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK
- Wendy D Jones
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK
- Yanick J Crow
- Manchester Centre for Genomic Medicine, Central Manchester University Hospitals, NHS Foundation Trust, Manchester Academic Health Science Centre (MAHSC), Manchester M13 9WL, UK
- Anna F Dominiczak
- College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
- Nicola A Foster
- University Hospitals of Leicester, NHS Trust, Leicester Royal Infirmary, Leicester LE1 5WW, UK
- Tom R Gaunt
- MRC Integrative Epidemiology Unit, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK
- Jade Harris
- Manchester Centre for Genomic Medicine, Central Manchester University Hospitals, NHS Foundation Trust, Manchester Academic Health Science Centre (MAHSC), Manchester M13 9WL, UK
- Stephen W Hellens
- Northern Genetics Service, Newcastle upon Tyne Hospitals NHS Trust, Newcastle upon Tyne NE1 3BZ, UK
- Tessa Homfray
- Southwest Thames Regional Genetics Centre, St George's Healthcare NHS Trust, London SW17 0RE, UK
- Josie Innes
- Manchester Centre for Genomic Medicine, Central Manchester University Hospitals, NHS Foundation Trust, Manchester Academic Health Science Centre (MAHSC), Manchester M13 9WL, UK
- Elizabeth A Jones
- Manchester Centre for Genomic Medicine, Central Manchester University Hospitals, NHS Foundation Trust, Manchester Academic Health Science Centre (MAHSC), Manchester M13 9WL, UK, Manchester Centre for Genomic Medicine, Institute of Human Development, Faculty of Medical and Human Sciences, University of Manchester, MAHSC, Manchester M13 9WL, UK
- Shelagh Joss
- West of Scotland Clinical Genetics Service, Southern General Hospital, Glasgow DD1 9SY, UK
- Abhijit Kulkarni
- Southwest Thames Regional Genetics Centre, St George's Healthcare NHS Trust, London SW17 0RE, UK
- Sahar Mansour
- Southwest Thames Regional Genetics Centre, St George's Healthcare NHS Trust, London SW17 0RE, UK
- Andrew D Morris
- School of Molecular, Genetic and Population Health Sciences, University of Edinburgh Medical School, Teviot Place, Edinburgh EH8 9AG, UK
- Michael J Parker
- Sheffield Clinical Genetics Service, Sheffield Children's Hospital, Western Bank, Sheffield, UK
- David J Porteous
- Medical Genetics Section, Molecular Medicine Centre, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Hashem A Shihab
- MRC Integrative Epidemiology Unit, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK
- Blair H Smith
- School of Medicine, Dundee University, Mackenzie Building, Kirsty Semple Way, Ninewells Hospital and Medical School, Dundee DD2 4RB, UK
- Katrina Tatton-Brown
- Southwest Thames Regional Genetics Centre, St George's Healthcare NHS Trust, London SW17 0RE, UK
- John L Tolmie
- West of Scotland Clinical Genetics Service, Southern General Hospital, Glasgow DD1 9SY, UK
- Maciej Trzaskowski
- King's College London, MRC Social, Genetic and Developmental Psychiatry Research Centre, Institute of Psychiatry, Psychology & Neuroscience, De Crespigny Park, London SE5 8AF, UK and
- Pradeep C Vasudevan
- University Hospitals of Leicester, NHS Trust, Leicester Royal Infirmary, Leicester LE1 5WW, UK
- Emma Wakeling
- North West Thames Regional Genetics Service, North West London Hospitals NHS Trust, Watford Rd, Harrow HA1 3UJ, UK
- Michael Wright
- Northern Genetics Service, Newcastle upon Tyne Hospitals NHS Trust, Newcastle upon Tyne NE1 3BZ, UK
- Robert Plomin
- King's College London, MRC Social, Genetic and Developmental Psychiatry Research Centre, Institute of Psychiatry, Psychology & Neuroscience, De Crespigny Park, London SE5 8AF, UK and
- Nicholas J Timpson
- MRC Integrative Epidemiology Unit, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK
43
Effects of copy number variable regions on local gene expression in white blood cells of Mexican Americans. Eur J Hum Genet 2015; 23:1229-35. [PMID: 25585699 PMCID: PMC4538210 DOI: 10.1038/ejhg.2014.280] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 03/25/2014] [Revised: 09/25/2014] [Accepted: 11/26/2014] [Indexed: 11/23/2022]
Abstract
Only a few systematic studies on the contribution of copy number variation to gene expression variation have been published to date. Here we identify effects of copy number variable regions (CNVRs) on nearby gene expression by investigating 909 CNVRs and the expression levels of 12059 nearby genes in white blood cells from Mexican-American participants of the San Antonio Family Heart Study. We empirically evaluate our ability to detect the contribution of CNVs to proximal gene expression (presumably in cis) at various window sizes (up to a 10 Mb distance) between the gene and the CNV. We found a window size of ~1 Mb to be optimal for capturing cis effects of CNVs. Up to 10% of the CNVs in this study were significantly associated with the expression of at least one gene within their vicinity. As expected, CNVs that directly overlap gene sequences have the largest effects on gene expression (compared with non-overlapping CNVRs located nearby), with a positive correlation (with few exceptions) between estimated genomic dosage and expression level. Genes whose expression level is significantly influenced by nearby CNVRs are enriched for immunity- and autoimmunity-related genes. These findings add to the currently limited catalog of CNVRs that are recognized as expression quantitative trait loci, and have implications for future study designs as well as for prioritizing candidate causal variants in genomic regions associated with disease.
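The window-based pairing of CNVRs with nearby genes described above can be sketched as follows. This is a minimal illustration, assuming half-open genomic coordinates and measuring the window as the edge-to-edge gap between a CNVR and a gene; the study may define distance differently (for example, from the transcription start site). The `Interval` type, the `cis_pairs` function and all region and gene names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic region in half-open coordinates [start, end)."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def gap(a: Interval, b: Interval) -> int:
    """Edge-to-edge distance in bp (0 if the regions overlap or are adjacent)."""
    return max(0, b.start - a.end, a.start - b.end)


def cis_pairs(cnvrs, genes, window=1_000_000):
    """List (cnvr, gene, gap_bp, overlapping) for every gene within `window` bp of a CNVR."""
    pairs = []
    for c in cnvrs:
        for g in genes:
            if c.chrom != g.chrom:
                continue  # cis effects are only considered on the same chromosome
            d = gap(c, g)
            if d <= window:
                pairs.append((c.name, g.name, d, overlaps(c, g)))
    return pairs


cnvrs = [Interval("CNVR_1", "chr1", 5_000_000, 5_050_000)]
genes = [
    Interval("GENE_A", "chr1", 5_040_000, 5_060_000),  # overlaps the CNVR
    Interval("GENE_B", "chr1", 5_800_000, 5_900_000),  # 750 kb away
    Interval("GENE_C", "chr1", 7_000_000, 7_100_000),  # 1.95 Mb away
    Interval("GENE_D", "chr2", 5_000_000, 5_010_000),  # different chromosome
]
print(cis_pairs(cnvrs, genes))
# prints [('CNVR_1', 'GENE_A', 0, True), ('CNVR_1', 'GENE_B', 750000, False)]
```

Re-running `cis_pairs` with different `window` values mimics the kind of window-size scan the abstract describes, and the `overlapping` flag separates gene-overlapping CNVRs from merely nearby ones.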
44
Xiao F, Min X, Zhang H. Modified screening and ranking algorithm for copy number variation detection. Bioinformatics 2014; 31:1341-8. [PMID: 25542927 DOI: 10.1093/bioinformatics/btu850] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 10/15/2014] [Accepted: 12/19/2014] [Indexed: 12/17/2022]
Abstract
MOTIVATION Copy number variation (CNV) is a type of structural variation, usually defined as genomic segments 1 kb or larger that present variable copy numbers when compared with a reference genome. The screening and ranking algorithm (SaRa) was recently proposed as an efficient approach for multiple change-point detection, which can be applied to CNV detection. However, some practical issues arise when SaRa is applied to single nucleotide polymorphism (SNP) genotyping data. RESULTS In this study, we propose a modified SaRa for CNV detection to address these issues. First, we apply quantile normalization to the original intensities so that SaRa, which is based on a normal mean model, is robust. Second, a novel normal mixture model coupled with a modified Bayesian information criterion is proposed for selecting candidate change-points and for clustering the potential CNV segments into copy number states. Simulations revealed that the modified SaRa is a robust method for identifying change-points and achieved better performance than the circular binary segmentation (CBS) method. By applying the modified SaRa to real data from the HapMap project, we illustrated its performance in detecting CNV segments. In conclusion, our modified SaRa method improves SaRa theoretically and numerically for identifying CNVs with high-throughput genotyping data. AVAILABILITY AND IMPLEMENTATION The modSaRa package is implemented in R and freely available at http://c2s2.yale.edu/software/modSaRa. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
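The screening step that SaRa builds on can be sketched as follows. This is a minimal illustration, not the modSaRa package: it computes a local diagnostic statistic (the mean of the h probes to the right of each position minus the mean of the h probes to its left), keeps positions where the absolute statistic is the largest within a neighborhood of width h, and applies a fixed threshold as a simple stand-in for the normal mixture model and modified BIC that modSaRa uses to select change-points. Quantile normalization is omitted, and the function names, the bandwidth `h` and the `threshold` value are hypothetical.

```python
import numpy as np


def local_diagnostic(y, h):
    """D[x] = mean(y[x+1 .. x+h]) - mean(y[x-h+1 .. x]); NaN where either window is incomplete."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    csum = np.concatenate(([0.0], np.cumsum(y)))  # csum[i] == sum(y[:i])
    d = np.full(n, np.nan)
    for x in range(h - 1, n - h):
        right = (csum[x + h + 1] - csum[x + 1]) / h
        left = (csum[x + 1] - csum[x - h + 1]) / h
        d[x] = right - left
    return d


def screen_candidates(y, h, threshold):
    """Positions x whose |D[x]| is an h-local maximum and is at least `threshold`.

    A candidate at x means the mean level changes between y[x] and y[x + 1].
    """
    d = np.abs(local_diagnostic(y, h))
    candidates = []
    for x in range(len(d)):
        if np.isnan(d[x]) or d[x] < threshold:
            continue
        lo, hi = max(0, x - h + 1), min(len(d), x + h)  # open neighborhood (x - h, x + h)
        if d[x] >= np.nanmax(d[lo:hi]):
            candidates.append(x)
    return candidates


# Noise-free toy signal: a 20-probe segment with elevated intensity (e.g. a duplication).
signal = [0.0] * 20 + [1.0] * 20 + [0.0] * 20
print(screen_candidates(signal, h=5, threshold=0.5))  # [19, 39]
```

On real intensity data the signal is noisy, so the bandwidth and the rule for keeping candidates matter; that selection step is where modSaRa substitutes its mixture model and modified BIC for a fixed threshold.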
Affiliation(s)
- Feifei Xiao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Xiaoyi Min
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Heping Zhang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
45
Hwang MY, Moon S, Heo L, Kim YJ, Oh JH, Kim YJ, Kim YK, Lee J, Han BG, Kim BJ. Combinatorial approach to estimate copy number genotype using whole-exome sequencing data. Genomics 2014; 105:145-9. [PMID: 25535679 DOI: 10.1016/j.ygeno.2014.12.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/11/2014] [Revised: 12/08/2014] [Accepted: 12/16/2014] [Indexed: 11/29/2022]
Abstract
Copy number variations (CNVs) are known risk factors in complex diseases. Array-based approaches have been widely used to detect CNVs, but their limitations, such as noisy signal and low resolution, have hindered the detection of small CNVs. Recently, the use of next-generation sequencing has increased rapidly as its cost has declined. In particular, whole-exome sequencing has proved useful for finding causal genes and variants in complex diseases. Because gene copy number may affect expression, CNV genotyping can be very valuable in disease association studies. However, almost all current CNV detection tools consider only two types of CNV genotypes. In this study, we propose a CNV genotype estimation approach that combines existing methods. Our approach was comprehensively compared with customized Agilent array comparative genomic hybridization. We found that our genotyping approach was accurate and reproducible, suggesting that it can complement existing CNV genotyping methods.
Affiliation(s)
- Mi Yeong Hwang
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 361-951, Republic of Korea
- Sanghoon Moon
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 361-951, Republic of Korea
- Lyong Heo
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 361-951, Republic of Korea
- Young Jin Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 361-951, Republic of Korea
- Ji Hee Oh
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 361-951, Republic of Korea
- Yeon-Jung Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 361-951, Republic of Korea
- Yun Kyoung Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 361-951, Republic of Korea
- Juyoung Lee
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 361-951, Republic of Korea
- Bok-Ghee Han
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 361-951, Republic of Korea
- Bong-Jo Kim
- Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 361-951, Republic of Korea.
46
Cassese A, Guindani M, Vannucci M. A bayesian integrative model for genetical genomics with spatially informed variable selection. Cancer Inform 2014; 13:29-37. [PMID: 25288877 PMCID: PMC4179607 DOI: 10.4137/cin.s13784] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/28/2014] [Revised: 04/10/2014] [Accepted: 04/16/2014] [Indexed: 11/05/2022]
Abstract
We consider a Bayesian hierarchical model for the integration of gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. The approach defines a measurement error model that relates the gene expression levels to latent copy number states. In turn, the latent states are related to the observed surrogate CGH measurements via a hidden Markov model. The model further incorporates variable selection with a spatial prior based on a probit link that exploits dependencies across adjacent DNA segments. Posterior inference is carried out via Markov chain Monte Carlo stochastic search techniques. We study the performance of the model in simulations and show better results than those achieved with recently proposed alternative priors. We also show an application to data from a genomic study on lung squamous cell carcinoma, where we identify potential candidates of associations between copy number variants and the transcriptional activity of target genes. Gene ontology (GO) analyses of our findings reveal enrichments in genes that code for proteins involved in cancer. Our model also identifies a number of potential candidate biomarkers for further experimental validation.
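The hidden Markov layer that links latent copy number states to the observed CGH measurements can be illustrated with the standard forward algorithm. This is a minimal sketch with fixed, hypothetical parameters (three states for loss, neutral and gain, Gaussian emissions, and a given transition matrix); the published model is Bayesian, additionally relates the latent states to gene expression, and performs variable selection by MCMC, none of which is shown here.

```python
import numpy as np


def gaussian_pdf(x, mean, sd):
    """Normal density, vectorized over the per-state means and standard deviations."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def forward_loglik(obs, init, trans, means, sds):
    """Log-likelihood of CGH log-ratios under a Gaussian-emission HMM.

    init[k]     : probability that the first probe is in copy number state k
    trans[j, k] : probability of moving from state j to state k between adjacent probes
    means, sds  : emission mean and standard deviation for each state
    Returns (log-likelihood, filtered state probabilities at the last probe).
    The forward variables are rescaled at every step to avoid numerical underflow.
    """
    obs = np.asarray(obs, dtype=float)
    alpha = init * gaussian_pdf(obs[0], means, sds)
    scale = alpha.sum()
    alpha /= scale
    loglik = np.log(scale)
    for x in obs[1:]:
        alpha = (alpha @ trans) * gaussian_pdf(x, means, sds)
        scale = alpha.sum()
        alpha /= scale
        loglik += np.log(scale)
    return loglik, alpha


# Hypothetical parameters: states 0 = loss, 1 = neutral, 2 = gain.
init = np.array([0.1, 0.8, 0.1])
trans = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])
means = np.array([-0.5, 0.0, 0.4])
sds = np.array([0.2, 0.2, 0.2])
obs = [0.02, -0.45, -0.6, -0.52]
loglik, filtered = forward_loglik(obs, init, trans, means, sds)
print(loglik, filtered.argmax())  # the last probe is most likely in the loss state (0)
```

In a full Bayesian treatment these parameters would be sampled rather than fixed, but the forward recursion is the usual way to evaluate the likelihood of the observed measurements given them.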
Affiliation(s)
- Alberto Cassese
- Department of Statistics, Rice University, Houston, TX, USA; Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Michele Guindani
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
47
Genome-wide copy number variation study reveals KCNIP1 as a modulator of insulin secretion. Genomics 2014; 104:113-20. [DOI: 10.1016/j.ygeno.2014.05.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/07/2014] [Revised: 05/19/2014] [Accepted: 05/23/2014] [Indexed: 01/09/2023]
48
Scharpf RB, Mireles L, Yang Q, Köttgen A, Ruczinski I, Susztak K, Halper-Stromberg E, Tin A, Cristiano S, Chakravarti A, Boerwinkle E, Fox CS, Coresh J, Linda Kao WH. Copy number polymorphisms near SLC2A9 are associated with serum uric acid concentrations. BMC Genet 2014; 15:81. [PMID: 25007794 PMCID: PMC4118309 DOI: 10.1186/1471-2156-15-81] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 03/31/2014] [Accepted: 06/30/2014] [Indexed: 11/10/2022]
Abstract
Background Hyperuricemia is associated with multiple diseases, including gout, cardiovascular disease, and renal disease. Serum urate is highly heritable, yet association studies of single nucleotide polymorphisms (SNPs) and serum uric acid explain a small fraction of the heritability. Whether copy number polymorphisms (CNPs) contribute to uric acid levels is unknown. Results We assessed copy number on a genome-wide scale among 8,411 individuals of European ancestry (EA) who participated in the Atherosclerosis Risk in Communities (ARIC) study. CNPs upstream of the urate transporter SLC2A9 on chromosome 4p16.1 are associated with uric acid (χ²(df = 2) = 3545, p = 3.19 × 10⁻²³). Effect sizes, expressed as the percentage change in uric acid per deleted copy, are most pronounced among women (median 4.93; 2.5th and 97.5th percentiles 3.97 and 5.87; p = 4.57 × 10⁻²³) and are independent of previously reported SNPs in SLC2A9, as assessed by SNP and CNP regression models and by phasing SNP and CNP haplotypes (χ²(df = 2) = 3190, p = 7.23 × 10⁻⁸). Our finding is replicated in the Framingham Heart Study (FHS), where the effect size estimated from 4,089 women is comparable to ARIC in direction and magnitude (median 4.70; 2.5th and 97.5th percentiles 1.41 and 7.88; p = 5.46 × 10⁻³). Conclusions This is the first study to characterize CNPs in ARIC and the first genome-wide analysis of CNPs and uric acid. Our findings suggest a novel, non-coding regulatory mechanism for SLC2A9-mediated modulation of serum uric acid, and detail a bioinformatic approach for assessing the contribution of CNPs to heritable traits in large population-based studies where technical sources of variation are substantial.
Affiliation(s)
- Robert B Scharpf
- 550 N. Broadway, Suite 1101, Department of Oncology, Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA.
49
Zanda M, Onengut-Gumuscu S, Walker N, Shtir C, Gallo D, Wallace C, Smyth D, Todd JA, Hurles ME, Plagnol V, Rich SS. A genome-wide assessment of the role of untagged copy number variants in type 1 diabetes. PLoS Genet 2014; 10:e1004367. [PMID: 24875393 PMCID: PMC4038470 DOI: 10.1371/journal.pgen.1004367] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 12/23/2013] [Accepted: 03/26/2014] [Indexed: 01/02/2023]
Abstract
Genome-wide association studies (GWAS) for type 1 diabetes (T1D) have successfully identified more than 40 independent T1D-associated tagging single nucleotide polymorphisms (SNPs). However, owing to technical limitations of copy number variant (CNV) genotyping assays, the assessment of the role of CNVs has been limited to the subset in high linkage disequilibrium with tag SNPs. The contribution of untagged CNVs, often multi-allelic and difficult to genotype using existing assays, to the heritability of T1D remains an open question. To investigate this issue, we designed a custom array comparative genomic hybridization (aCGH) platform specifically to assay untagged CNV loci identified from a variety of sources. To overcome the technical limitations of the case-control design for this class of CNVs, we genotyped the Type 1 Diabetes Genetics Consortium (T1DGC) family resource (representing 3,903 transmissions from parents to affected offspring) and used an association testing strategy that does not require discrete genotypes. Our design targeted 4,309 CNVs, of which 3,410 passed stringent quality control filters. As a positive control, the scan confirmed the known T1D association at the INS locus by direct typing of the 5′ variable number of tandem repeat (VNTR) locus. Our results clarify that the disease association is indistinguishable from that of the two main polymorphic allele classes of the INS VNTR, class I and class III. We also identified novel technical artifacts resulting in spurious associations at the somatically rearranging T cell receptor (TCRA/TCRD and TCRB) and immunoglobulin heavy chain (IGH) loci on chromosomes 14q11.2, 7q34 and 14q32.33, respectively. However, our data did not identify novel T1D loci. Our results do not support a major role of untagged CNVs in T1D heritability.
For many complex traits, and in particular type 1 diabetes (T1D), the genome-wide association study (GWAS) design has been successful at detecting a large number of loci that contribute to disease risk. However, in the case of T1D as well as almost all other traits, the sum of these loci does not fully explain the heritability estimated from familial studies. This observation raises the possibility that additional variants exist but have not yet been found because they have not been effectively targeted by the GWAS design. Here, we focus on a specific class of large deletions/duplications called copy number variants (CNVs), and more precisely on the subset of these loci that mutate rapidly and are therefore highly polymorphic. A consequence of this high level of polymorphism is that these variants have typically not been captured by previous GWAS. We use a family-based design that is optimized to capture these previously untested variants. We then perform a genome-wide scan to assess their contribution to T1D. Our scan was technically successful but did not identify novel associations. This suggests that little was missed by the GWAS strategy, and that the remaining heritability of T1D is most likely driven by a large number of variants, either rare or common, each with a small individual contribution to disease risk.
Affiliation(s)
- Manuela Zanda
- University College London (UCL) Genetics Institute (UGI), London, United Kingdom
- Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Neil Walker
- JDRF/Wellcome Trust Diabetes and Inflammation laboratory, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Corina Shtir
- JDRF/Wellcome Trust Diabetes and Inflammation laboratory, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Daniel Gallo
- University of Virginia, Charlottesville, Virginia, United States of America
- Chris Wallace
- JDRF/Wellcome Trust Diabetes and Inflammation laboratory, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Deborah Smyth
- JDRF/Wellcome Trust Diabetes and Inflammation laboratory, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- John A. Todd
- JDRF/Wellcome Trust Diabetes and Inflammation laboratory, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Vincent Plagnol
- University College London (UCL) Genetics Institute (UGI), London, United Kingdom
- Stephen S. Rich
- University of Virginia, Charlottesville, Virginia, United States of America
50
Cantsilieris S, Western PS, Baird PN, White SJ. Technical considerations for genotyping multi-allelic copy number variation (CNV), in regions of segmental duplication. BMC Genomics 2014; 15:329. [PMID: 24885186 PMCID: PMC4035060 DOI: 10.1186/1471-2164-15-329] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 10/14/2013] [Accepted: 04/22/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Intrachromosomal segmental duplications provide the substrate for non-allelic homologous recombination, facilitating extensive copy number variation in the human genome. Many multi-copy gene families are embedded within genomic regions with high levels of sequence identity (>95%) and therefore pose considerable analytical challenges. In some cases, the complexity involved in analyzing such regions is largely underestimated. Rapid, cost-effective analyses of multi-copy gene regions have typically used quantitative approaches; however, quantitative data do not provide absolute certainty. Therefore, any technique prone to measurement error can produce ambiguous results that may lead to spurious associations with complex disease. RESULTS In this study we focused on testing the accuracy and reproducibility of quantitative analysis techniques. Focusing on the C-C Chemokine Ligand-3-like-1 (CCL3L1) gene, we performed analysis using real-time quantitative PCR (QPCR), Multiplex Ligation-dependent Probe Amplification (MLPA) and the Paralogue Ratio Test (PRT). After controlling for potential outside variables affecting assay performance, including DNA concentration, quality, preparation and storage conditions, we find that real-time QPCR produces data that do not cluster tightly around integer copy number values, with variation substantially greater than that of the MLPA or PRT systems. We find that the method of rounding real-time QPCR measurements can lead to mis-scoring of copy number genotypes, and suggest caution in interpreting QPCR data. CONCLUSIONS We conclude that real-time QPCR is inherently prone to measurement error, even under conditions that would seem favorable for association studies. Our results indicate that variability in the physicochemical properties of the DNA samples cannot by itself explain the poor performance of the real-time QPCR systems.
We recommend that more robust approaches such as PRT or MLPA should be used to genotype multi-allelic copy number variation in disease association studies and suggest several approaches which can be implemented to ensure the quality of the copy number typing using quantitative methods.
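The rounding problem described above can be illustrated with a minimal sketch. It rounds continuous copy number estimates (for example, relative QPCR quantities scaled to copies) to the nearest integer and flags estimates that fall far from any integer as ambiguous instead of silently scoring them. The `call_copy_number` function and the 0.25 tolerance are hypothetical illustrations, not the quality-control approaches proposed in the paper.

```python
import math


def call_copy_number(estimates, tolerance=0.25):
    """Round continuous copy number estimates and flag ambiguous ones.

    Returns (estimate, integer_call, ambiguous) tuples. An estimate is ambiguous
    when it lies more than `tolerance` copies from the nearest integer, i.e. it
    does not cluster tightly around an integer value.
    """
    calls = []
    for est in estimates:
        integer_call = math.floor(est + 0.5)  # round half up (Python's round() rounds half to even)
        ambiguous = abs(est - integer_call) > tolerance
        calls.append((est, integer_call, ambiguous))
    return calls


# Hypothetical QPCR-style estimates: 3.55 could be a true 3 or a true 4 copies,
# yet naive rounding would silently score it as 4.
for est, cn, ambiguous in call_copy_number([2.04, 2.46, 3.55, 4.9]):
    flag = "  (ambiguous, re-test)" if ambiguous else ""
    print(f"{est:.2f} -> {cn}{flag}")
```

With an assay whose measurements cluster tightly around integers, as the abstract reports for MLPA and PRT, few estimates would be expected to land in the ambiguous band; a wide spread, as reported for real-time QPCR, would push many more into it.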
Affiliation(s)
- Stuart Cantsilieris
- Centre for Genetic Diseases, MIMR-PHI Institute of Medical Research, Monash University, 27-31 Wright Street, Clayton 3168, Victoria, Australia.