151
Browning BL, Yu Z. Simultaneous genotype calling and haplotype phasing improves genotype accuracy and reduces false-positive associations for genome-wide association studies. Am J Hum Genet 2009; 85:847-61. [PMID: 19931040 DOI: 10.1016/j.ajhg.2009.11.004] [Citation(s) in RCA: 164] [Impact Index Per Article: 10.9] [Received: 09/07/2009] [Revised: 10/08/2009] [Accepted: 11/03/2009] [Indexed: 12/21/2022]
Abstract
We present a novel method for simultaneous genotype calling and haplotype-phase inference. Our method employs the computationally efficient BEAGLE haplotype-frequency model, which can be applied to large-scale studies with millions of markers and thousands of samples. We compare genotype calls made with our method to genotype calls made with the BIRDSEED, CHIAMO, GenCall, and ILLUMINUS genotype-calling methods, using genotype data from the Illumina 550K and Affymetrix 500K arrays. We show that our method has higher genotype-call accuracy and yields fewer uncalled genotypes than competing methods. We perform single-marker analysis of data from the Wellcome Trust Case Control Consortium bipolar disorder and type 2 diabetes studies. For bipolar disorder, the genotype calls in the original study yield 25 markers with apparent false-positive association with bipolar disorder at a p < 10(-7) significance level, whereas genotype calls made with our method yield no associated markers at this significance threshold. Conversely, for markers with replicated association with type 2 diabetes, there is good concordance between genotype calls used in the original study and calls made by our method. Results from single-marker and haplotypic analysis of our method's genotype calls for the bipolar disorder study indicate that our method is highly effective at eliminating genotyping artifacts that cause false-positive associations in genome-wide association studies. Our new genotype-calling methods are implemented in the BEAGLE and BEAGLECALL software packages.
152
Cooper JD, Walker NM, Smyth DJ, Downes K, Healy BC, Todd JA. Follow-up of 1715 SNPs from the Wellcome Trust Case Control Consortium genome-wide association study in type I diabetes families. Genes Immun 2009; 10 Suppl 1:S85-94. [PMID: 19956107 PMCID: PMC2805462 DOI: 10.1038/gene.2009.97] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Indexed: 11/08/2022]
Abstract
The advent of genome-wide association (GWA) studies has revolutionized the detection of disease loci and provided abundant evidence for previously undetected disease loci that can be pooled together in meta-analysis studies or used to design follow-up studies. A total of 1715 SNPs from the Wellcome Trust Case Control Consortium GWA study of type I diabetes (T1D) were selected and a follow-up study was conducted in 1410 affected sib-pair families assembled by the Type I Diabetes Genetics Consortium. In addition to the support for previously identified loci (PTPN22/1p13; ERBB3/12q13; SH2B3/12q24; CLEC16A/16p13; UBASH3A/21q22), evidence supporting two new and distinct chromosome locations associated with T1D was observed: FHOD3/18q12 (rs2644261, P = 5.9 × 10(-4)) and Xp22 (rs5979785, P = 6.8 × 10(-3); http://www.T1DBase.org). There was independent support for both SNPs in a GWA meta-analysis of 7514 cases and 9045 controls (P values = 5.0 × 10(-3) and 6.7 × 10(-6), respectively). The chromosome 18q12 region contains four genes, none of which are obvious functional candidate genes. In contrast, the Xp22 SNP is located 30 kb centromeric of TLR8 and TLR7. Both are functional candidate genes owing to their key roles as pathogen recognition receptors and, in the case of TLR7, overexpression has been associated directly with murine autoimmune disease.
Affiliation(s)
- J D Cooper
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, UK.
153
Abstract
Genome-wide association studies (GWAS) have become the method of choice for investigating the genetic basis of common diseases and complex traits. The immense scale of these experiments is unprecedented, involving thousands of samples and up to a million variables. The careful execution of exploratory data analysis (EDA) prior to the actual genotype-phenotype association analysis is crucial as this identifies problematic samples and poorly assayed genetic polymorphisms that, if undetected, can compromise the outcome of the experiment. EDA of such large-scale genetic data sets thus requires specialized numerical and graphical strategies, and this article provides a review of the current exploratory tools commonly used in GWAS.
Affiliation(s)
- Yik Y Teo
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.
154
Abstract
The past few years have seen enormous advances in genotyping technology, including chips that accommodate in excess of 1 million SNP assays. In addition, the cost per genotype has been driven down to levels unimagined only a few years ago. These developments have resulted in an explosion of positive whole-genome association studies and the identification of many new genes for common diseases. Here I review high-throughput genotyping platforms as well as other approaches for lower numbers of assays but high sample throughput, which play an important role in genotype validation and study replication. Further, the utility of SNP arrays for detecting structural variation through the development of genotyping algorithms is reviewed and methods for long-range haplotyping are presented. It is anticipated that in the future, sample throughput and cost savings will be increased further through the combination of automation, microfluidics, and nanotechnologies.
Affiliation(s)
- Jiannis Ragoussis
- Genomics Laboratory, Wellcome Trust Centre for Human Genetics, Oxford University, Oxford OX3 7BN, United Kingdom.
155
Harold D, Abraham R, Hollingworth P, Sims R, Gerrish A, Hamshere ML, Pahwa JS, Moskvina V, Dowzell K, Williams A, Jones N, Thomas C, Stretton A, Morgan AR, Lovestone S, Powell J, Proitsi P, Lupton MK, Brayne C, Rubinsztein DC, Gill M, Lawlor B, Lynch A, Morgan K, Brown KS, Passmore PA, Craig D, McGuinness B, Todd S, Holmes C, Mann D, Smith AD, Love S, Kehoe PG, Hardy J, Mead S, Fox N, Rossor M, Collinge J, Maier W, Jessen F, Schürmann B, Heun R, van den Bussche H, Heuser I, Kornhuber J, Wiltfang J, Dichgans M, Frölich L, Hampel H, Hüll M, Rujescu D, Goate AM, Kauwe JSK, Cruchaga C, Nowotny P, Morris JC, Mayo K, Sleegers K, Bettens K, Engelborghs S, De Deyn PP, Van Broeckhoven C, Livingston G, Bass NJ, Gurling H, McQuillin A, Gwilliam R, Deloukas P, Al-Chalabi A, Shaw CE, Tsolaki M, Singleton AB, Guerreiro R, Mühleisen TW, Nöthen MM, Moebus S, Jöckel KH, Klopp N, Wichmann HE, Carrasquillo MM, Pankratz VS, Younkin SG, Holmans PA, O'Donovan M, Owen MJ, Williams J. Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer's disease. Nat Genet 2009; 41:1088-93. [PMID: 19734902 PMCID: PMC2845877 DOI: 10.1038/ng.440] [Citation(s) in RCA: 2123] [Impact Index Per Article: 141.5] [Received: 04/28/2009] [Accepted: 07/31/2009] [Indexed: 12/22/2022]
Abstract
We undertook a two-stage genome-wide association study of Alzheimer's disease involving over 16,000 individuals. In stage 1 (3,941 cases and 7,848 controls), we replicated the established association with the APOE locus (most significant SNP: rs2075650, p = 1.8×10(-157)) and observed genome-wide significant association with SNPs at two novel loci: rs11136000 in the CLU or APOJ gene (p = 1.4×10(-9)) and rs3851179, a SNP 5′ to the PICALM gene (p = 1.9×10(-8)). Both novel associations were supported in stage 2 (2,023 cases and 2,340 controls), producing compelling evidence for association with AD in the combined dataset (rs11136000: p = 8.5×10(-10), odds ratio = 0.86; rs3851179: p = 1.3×10(-9), odds ratio = 0.86). We also observed more variants associated at p < 1×10(-5) than expected by chance (p = 7.5×10(-6)), including polymorphisms at the BIN1, DAB1 and CR1 loci.
Affiliation(s)
- Denise Harold
- Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, School of Medicine, Cardiff University, Cardiff, UK
156
Ritchie ME, Carvalho BS, Hetrick KN, Tavaré S, Irizarry RA. R/Bioconductor software for Illumina's Infinium whole-genome genotyping BeadChips. Bioinformatics 2009; 25:2621-3. [PMID: 19661241 PMCID: PMC2752620 DOI: 10.1093/bioinformatics/btp470] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Indexed: 12/04/2022]
Abstract
Summary: Illumina produces a number of microarray-based technologies for human genotyping. An Infinium BeadChip is a two-color platform that types between 10(5) and 10(6) single nucleotide polymorphisms (SNPs) per sample. Despite being widely used, there is a shortage of open source software to process the raw intensities from this platform into genotype calls. To this end, we have developed the R/Bioconductor package crlmm for analyzing BeadChip data. After careful preprocessing, our software applies the CRLMM algorithm to produce genotype calls, confidence scores and other quality metrics at both the SNP and sample levels. We provide access to the raw summary-level intensity data, allowing users to develop their own methods for genotype calling or copy number analysis if they wish. Availability and Implementation: The crlmm Bioconductor package is available from http://www.bioconductor.org. Data packages and documentation are available from http://rafalab.jhsph.edu/software.html. Contact: mritchie@wehi.edu.au; rafa@jhu.edu
Affiliation(s)
- Matthew E Ritchie
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville Victoria 3052, Australia.
157
Nolte IM, Wallace C, Newhouse SJ, Waggott D, Fu J, Soranzo N, Gwilliam R, Deloukas P, Savelieva I, Zheng D, Dalageorgou C, Farrall M, Samani NJ, Connell J, Brown M, Dominiczak A, Lathrop M, Zeggini E, Wain LV, Newton-Cheh C, Eijgelsheim M, Rice K, de Bakker PIW, Pfeufer A, Sanna S, Arking DE, Asselbergs FW, Spector TD, Carter ND, Jeffery S, Tobin M, Caulfield M, Snieder H, Paterson AD, Munroe PB, Jamshidi Y. Common genetic variation near the phospholamban gene is associated with cardiac repolarisation: meta-analysis of three genome-wide association studies. PLoS One 2009; 4:e6138. [PMID: 19587794 PMCID: PMC2704957 DOI: 10.1371/journal.pone.0006138] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 05/12/2009] [Accepted: 06/04/2009] [Indexed: 12/22/2022]
Abstract
To identify loci affecting the electrocardiographic QT interval, a measure of cardiac repolarisation associated with risk of ventricular arrhythmias and sudden cardiac death, we conducted a meta-analysis of three genome-wide association studies (GWAS) including 3,558 subjects from the TwinsUK and BRIGHT cohorts in the UK and the DCCT/EDIC cohort from North America. Five loci were significantly associated with QT interval at P < 1×10(-6). To validate these findings we performed an in silico comparison with data from two QT consortia: QTSCD (n = 15,842) and QTGEN (n = 13,685). Analysis confirmed the association between common variants near NOS1AP (P = 1.4×10(-83)) and the phospholamban (PLN) gene (P = 1.9×10(-29)). The most associated SNP near NOS1AP (rs12143842) explains 0.82% of the variance in QT interval duration; the SNP near PLN (rs11153730) explains 0.74%. We found no evidence for interaction between these two SNPs (P = 0.99). PLN is a key regulator of cardiac diastolic function and is involved in regulating intracellular calcium cycling; it has only recently been identified as a susceptibility locus for QT interval. These data offer further mechanistic insights into genetic influence on the QT interval, which may predispose to life-threatening arrhythmias and sudden cardiac death.
Affiliation(s)
- Ilja M Nolte
- Unit of Genetic Epidemiology and Bioinformatics, Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
158
Sun W, Wright FA, Tang Z, Nordgard SH, Van Loo P, Yu T, Kristensen VN, Perou CM. Integrated study of copy number states and genotype calls using high-density SNP arrays. Nucleic Acids Res 2009; 37:5365-77. [PMID: 19581427 PMCID: PMC2935461 DOI: 10.1093/nar/gkp493] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.1] [Indexed: 12/15/2022]
Abstract
We propose a statistical framework, named genoCN, to simultaneously dissect copy number states and genotypes using high-density SNP (single nucleotide polymorphism) arrays. There are at least two types of genomic DNA copy number differences: copy number variations (CNVs) and copy number aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations most often observed in tumor tissues only. CNVs tend to be short and more sparsely located in the genome compared with CNAs. GenoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being decided a priori. GenoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are performed for both tumor and normal tissues of one individual, the genotype calls from normal tissue are used to study CNAs in tumor tissue. We evaluated genoCN by applications to 162 HapMap individuals and a brain tumor (glioblastoma) dataset and showed that our method can successfully identify both types of copy number differences and produce high-quality genotype calls.
Affiliation(s)
- Wei Sun
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA.
159
Panoutsopoulou K, Zeggini E. Finding common susceptibility variants for complex disease: past, present and future. Briefings in Functional Genomics and Proteomics 2009; 8:345-52. [PMID: 19571035 PMCID: PMC2758134 DOI: 10.1093/bfgp/elp020] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 12/18/2022]
Abstract
The identification of complex disease susceptibility loci has been accelerated considerably by advances in high-throughput genotyping technologies, improved insight into correlation patterns of common variants and the availability of large-scale sample sets. Linkage scans and small-scale candidate gene studies have now given way to genome-wide association scans. In this review, we summarize insights gained from the past, highlight practical issues relating to the design and analysis of current state-of-the-art GWA studies and look into future trends in the field of human complex trait genetics.
160
Teo YY, Fry AE, Bhattacharya K, Small KS, Kwiatkowski DP, Clark TG. Genome-wide comparisons of variation in linkage disequilibrium. Genome Res 2009; 19:1849-60. [PMID: 19541915 DOI: 10.1101/gr.092189.109] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Indexed: 11/24/2022]
Abstract
Current genome-wide surveys of common diseases and complex traits fundamentally aim to detect indirect associations where the single nucleotide polymorphisms (SNPs) carrying the association signals are not biologically active but are in linkage disequilibrium (LD) with some unknown functional polymorphisms. Reproducing any novel discoveries from these genome-wide scans in independent studies is now a prerequisite for the putative findings to be accepted. Significant differences in patterns of LD between populations can affect the portability of phenotypic associations when the replication effort or meta-analyses are attempted in populations that are distinct from the original population in which the genome-wide study is performed. Here, we introduce a novel method for genome-wide analyses of LD variations between populations that allow the identification of candidate regions with different patterns of LD. The evidence of LD variation provided by the introduced method correlated with the degree of differences in the frequencies of the most common haplotype across the populations. Identified regions also resulted in greater variation in the success of replication attempts compared with random regions in the genome. A separate permutation strategy introduced for assessing LD variation in the absence of genome-wide data also correctly identified the expected variation in LD patterns in two well-established regions undergoing strong population-specific evolutionary pressure. Importantly, this method addresses whether a failure to reproduce a disease association in a disparate population is due to underlying differences in LD structure with an unknown functional polymorphism, which is vital in the current climate of replicating and fine-mapping established findings from genome-wide association studies.
Affiliation(s)
- Yik Y Teo
- Wellcome Trust Centre for Human Genetics, University of Oxford, United Kingdom.
161
Barrett JC, Clayton D, Concannon P, Akolkar B, Cooper JD, Erlich HA, Julier C, Morahan G, Nerup J, Nierras C, Plagnol V, Pociot F, Schuilenburg H, Smyth DJ, Stevens H, Todd JA, Walker NM, Rich SS. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nat Genet 2009; 41:703-7. [PMID: 19430480 PMCID: PMC2889014 DOI: 10.1038/ng.381] [Citation(s) in RCA: 1319] [Impact Index Per Article: 87.9] [Received: 11/14/2008] [Accepted: 04/15/2009] [Indexed: 02/07/2023]
Abstract
Type 1 diabetes (T1D) is a common autoimmune disorder that arises from the action of multiple genetic and environmental risk factors. We report the findings of a genome-wide association study of T1D, combined in a meta-analysis with two previously published studies. The total sample set included 7,514 cases and 9,045 reference samples. Forty-one distinct genomic locations provided evidence for association with T1D in the meta-analysis (P < 10(-6)). After excluding previously reported associations, we further tested 27 regions in an independent set of 4,267 cases, 4,463 controls and 2,319 affected sib-pair (ASP) families. Of these, 18 regions were replicated (P < 0.01; overall P < 5 × 10(-8)) and 4 additional regions provided nominal evidence of replication (P < 0.05). The many new candidate genes suggested by these results include IL10, IL19, IL20, GLIS3, CD69 and IL27.
MESH Headings
- Algorithms
- Antigens, CD/genetics
- CTLA-4 Antigen
- Chromosome Mapping/methods
- Chromosomes, Human, Pair 1/genetics
- Chromosomes, Human, Pair 17/genetics
- Chromosomes, Human, Pair 2/genetics
- DEAD-box RNA Helicases/genetics
- DNA/genetics
- Diabetes Mellitus, Type 1/epidemiology
- Diabetes Mellitus, Type 1/genetics
- Diabetes Mellitus, Type 1/immunology
- Family
- Female
- Genome-Wide Association Study
- Genotype
- HLA Antigens/genetics
- Humans
- Interferon-Induced Helicase, IFIH1
- Male
- Meta-Analysis as Topic
- Polymorphism, Single Nucleotide/genetics
- Protein Tyrosine Phosphatase, Non-Receptor Type 22/genetics
- Risk Assessment
- Siblings
Affiliation(s)
- Jeffrey C. Barrett
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke’s Hospital, Cambridge, CB2 0XY, UK
- David Clayton
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke’s Hospital, Cambridge, CB2 0XY, UK
- Patrick Concannon
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Beena Akolkar
- Division of Diabetes, Endocrinology, and Metabolic Diseases, The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health, Bethesda, MD, USA
- Jason D. Cooper
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke’s Hospital, Cambridge, CB2 0XY, UK
- Cécile Julier
- Inserm U730, Centre National de Génotypage, Evry, FR
- Grant Morahan
- Centre for Diabetes Research, The Western Australian Institute for Medical Research, and Centre for Medical Research, University of Western Australia, Perth, WA, AUSTRALIA
- Vincent Plagnol
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke’s Hospital, Cambridge, CB2 0XY, UK
- Helen Schuilenburg
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke’s Hospital, Cambridge, CB2 0XY, UK
- Deborah J. Smyth
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke’s Hospital, Cambridge, CB2 0XY, UK
- Helen Stevens
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke’s Hospital, Cambridge, CB2 0XY, UK
- John A. Todd
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke’s Hospital, Cambridge, CB2 0XY, UK
- Neil M. Walker
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke’s Hospital, Cambridge, CB2 0XY, UK
- Stephen S. Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Public Health Sciences, Division of Biostatistics and Epidemiology, University of Virginia, Charlottesville, VA, USA
162
Soranzo N, Rivadeneira F, Chinappen-Horsley U, Malkina I, Richards JB, Hammond N, Stolk L, Nica A, Inouye M, Hofman A, Stephens J, Wheeler E, Arp P, Gwilliam R, Jhamai PM, Potter S, Chaney A, Ghori MJR, Ravindrarajah R, Ermakov S, Estrada K, Pols HAP, Williams FM, McArdle WL, van Meurs JB, Loos RJF, Dermitzakis ET, Ahmadi KR, Hart DJ, Ouwehand WH, Wareham NJ, Barroso I, Sandhu MS, Strachan DP, Livshits G, Spector TD, Uitterlinden AG, Deloukas P. Meta-analysis of genome-wide scans for human adult stature identifies novel Loci and associations with measures of skeletal frame size. PLoS Genet 2009; 5:e1000445. [PMID: 19343178 PMCID: PMC2661236 DOI: 10.1371/journal.pgen.1000445] [Citation(s) in RCA: 213] [Impact Index Per Article: 14.2] [Received: 09/19/2008] [Accepted: 03/04/2009] [Indexed: 12/31/2022]
Abstract
Recent genome-wide (GW) scans have identified several independent loci affecting human stature, but their contribution through the different skeletal components of height is still poorly understood. We carried out a genome-wide scan in 12,611 participants, followed by replication in an additional 7,187 individuals, and identified 17 genomic regions with GW-significant association with height. Of these, two are entirely novel (rs11809207 in CATSPER4, combined P-value = 6.1×10(-8) and rs910316 in TMED10, P-value = 1.4×10(-7)) and two had previously been described with weak statistical support (rs10472828 in NPR3, P-value = 3×10(-7) and rs849141 in JAZF1, P-value = 3.2×10(-11)). One locus (rs1182188 at GNA12) identifies the first height eQTL. We also assessed the contribution of height loci to the upper- (trunk) and lower-body (hip axis and femur) skeletal components of height. We find evidence for several loci associated with trunk length (including rs6570507 in GPR126, P-value = 4×10(-5) and rs6817306 in LCORL, P-value = 4×10(-4)), hip axis length (including rs6830062 at LCORL, P-value = 4.8×10(-4) and rs4911494 at UQCC, P-value = 1.9×10(-4)), and femur length (including rs710841 at PRKG2, P-value = 2.4×10(-5) and rs10946808 at HIST1H1D, P-value = 6.4×10(-6)). Finally, we used conditional analyses to explore a possible differential contribution of the height loci to these different skeletal size measurements. In addition to validating four novel loci controlling adult stature, our study represents the first effort to assess the contribution of genetic loci to three skeletal components of height. Further statistical tests in larger numbers of individuals will be required to verify if the height loci affect height preferentially through these subcomponents of height.
The first genetic association studies of adult height have confirmed a role of many common variants in influencing human height, but to date, the genetic basis of differences between different skeletal components of height has not been addressed. Here, we take advantage of recent technical and methodological advances to examine the role of common genetic variants on both height and skeletal components of height. By examining nearly 20,000 individuals from the UK and the Netherlands, we provide statistically significant evidence that 17 genomic regions are associated with height, including four novel regions. We also examine, for the first time, the association of these 17 regions with skeletal size measurements of spine, femur, and hip axis length, a measurement of hip geometry known to influence the risk of osteoporotic fractures. We find that some height loci are also associated with these skeletal components, although further statistical tests will be required to verify if these genetic variants act differentially on the individual skeletal measurements. The knowledge generated by this and other studies will not only inform the genetics of human quantitative variation, but will also lead to the potential discovery of many medically important polymorphisms.
Affiliation(s)
- Nicole Soranzo
- Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- Department of Twin Research and Genetic Epidemiology, St. Thomas' Hospital Campus, King's College London, London, United Kingdom
- Fernando Rivadeneira
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands
- Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands
- Usha Chinappen-Horsley
- Department of Twin Research and Genetic Epidemiology, St. Thomas' Hospital Campus, King's College London, London, United Kingdom
- Ida Malkina
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- J. Brent Richards
- Department of Twin Research and Genetic Epidemiology, St. Thomas' Hospital Campus, King's College London, London, United Kingdom
- Department of Medicine, Jewish General Hospital, McGill University, Montreal, Quebec, Canada
- Naomi Hammond
- Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- Lisette Stolk
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands
- Alexandra Nica
- Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- Michael Inouye
- Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- Albert Hofman
- Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands
- Jonathan Stephens
- Department of Haematology of Cambridge and NHS Blood and Transplant (NHSBT), Cambridge, United Kingdom
- Eleanor Wheeler
- Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- Pascal Arp
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands
- Rhian Gwilliam
- Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- P. Mila Jhamai
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands
- Simon Potter
- Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- Amy Chaney
- Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- Mohammed J. R. Ghori
- Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- Radhi Ravindrarajah
- Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- Sergey Ermakov
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Karol Estrada
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands
- Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands
- Huibert A. P. Pols
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands
- Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands
- Frances M. Williams
- Department of Twin Research and Genetic Epidemiology, St. Thomas' Hospital Campus, King's College London, London, United Kingdom
- Wendy L. McArdle
- ALSPAC Laboratory, Department of Social Medicine, University of Bristol, Bristol, United Kingdom
| | - Joyce B. van Meurs
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands
| | - Ruth J. F. Loos
- Medical Research Council Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, United Kingdom
| | | | - Kourosh R. Ahmadi
- Department of Twin Research and Genetic Epidemiology, St. Thomas' Hospital Campus, King's College London, London, United Kingdom
| | - Deborah J. Hart
- Department of Twin Research and Genetic Epidemiology, St. Thomas' Hospital Campus, King's College London, London, United Kingdom
| | - Willem H. Ouwehand
- Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- Department of Haematology of Cambridge and NHS Blood and Transplant (NHSBT), Cambridge, United Kingdom
| | - Nicholas J. Wareham
- Medical Research Council Epidemiology Unit, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, United Kingdom
| | - Inês Barroso
- Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
| | - Manjinder S. Sandhu
- Department of Public Health and Primary Care, Strangeways Research Laboratory, University of Cambridge, Cambridge, United Kingdom
| | - David P. Strachan
- Division of Community Health Sciences, St. George's, University of London, London, United Kingdom
| | - Gregory Livshits
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Timothy D. Spector
- Department of Twin Research and Genetic Epidemiology, St. Thomas' Hospital Campus, King's College London, London, United Kingdom
- ¶ These authors also contributed equally to this work
| | - André G. Uitterlinden
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands
- Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands
- ¶ These authors also contributed equally to this work
| | - Panos Deloukas
- Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
|
163
|
Sampson JN, Zhao H. Genotyping and inflated type I error rate in genome-wide association case/control studies. BMC Bioinformatics 2009; 10:68. [PMID: 19236714 PMCID: PMC2679732 DOI: 10.1186/1471-2105-10-68] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 12/12/2008] [Accepted: 02/23/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: One common goal of a case/control genome-wide association study (GWAS) is to find SNPs associated with a disease. Traditionally, the first step in such studies is to assign a genotype to each SNP in each subject, based on a statistic summarizing fluorescence measurements. When the distributions of the summary statistics are not well separated by genotype, the act of genotype assignment can lead to more potential problems than acknowledged by the literature. RESULTS: Specifically, we show that the proportions of each called genotype need not equal the true proportions in the population, even as the number of subjects grows infinitely large. The called genotypes for two subjects need not be independent, even when their true genotypes are independent. Consequently, p-values from tests of association can be anti-conservative, even when the distributions of the summary statistic for the cases and controls are identical. To address these problems, we propose two new tests designed to reduce the inflation in the type I error rate caused by these problems. The first algorithm, logiCALL, measures call quality by fully exploring the likelihood profile of intensity measurements, and the second algorithm avoids genotyping by using a likelihood ratio statistic. CONCLUSION: Genotyping can introduce avoidable false positives in GWAS.
Affiliation(s)
- Joshua N Sampson
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT, USA
- Hongyu Zhao
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT, USA
|
164
|
Yu Z, Garner C, Ziogas A, Anton-Culver H, Schaid DJ. Genotype determination for polymorphisms in linkage disequilibrium. BMC Bioinformatics 2009; 10:63. [PMID: 19228433 PMCID: PMC2753842 DOI: 10.1186/1471-2105-10-63] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 09/24/2008] [Accepted: 02/20/2009] [Indexed: 01/28/2023] Open
Abstract
Background: Genome-wide association studies with single nucleotide polymorphisms (SNPs) show great promise to identify genetic determinants of complex human traits. In current analyses, genotype calling and imputation of missing genotypes are usually considered as two separate tasks. The genotypes of SNPs are first determined one at a time from allele signal intensities. Then the missing genotypes, i.e., no-calls caused by imperfectly separated signal clouds, are imputed based on the linkage disequilibrium (LD) between multiple SNPs. Although many statistical methods have been developed to improve either genotype calling or imputation of missing genotypes, treating the two steps independently can lead to loss of genetic information. Results: We propose a novel genotype calling framework. In this framework, we consider the signal intensities and underlying LD structure of SNPs simultaneously by estimating both cluster parameters and haplotype frequencies. As a result, our new method outperforms some existing algorithms in terms of both call rates and genotyping accuracy. Our studies also suggest that jointly analyzing multiple SNPs in LD provides more accurate estimation of haplotypes than haplotype reconstruction methods that only use called genotypes. Conclusion: Our study demonstrates that jointly analyzing signal intensities and LD structure of multiple SNPs is a better way to determine genotypes and estimate LD parameters.
Affiliation(s)
- Zhaoxia Yu
- Department of Statistics, University of California, Irvine, CA, USA.
|
165
|
Howson JMM, Walker NM, Clayton D, Todd JA. Confirmation of HLA class II independent type 1 diabetes associations in the major histocompatibility complex including HLA-B and HLA-A. Diabetes Obes Metab 2009; 11 Suppl 1:31-45. [PMID: 19143813 PMCID: PMC2779837 DOI: 10.1111/j.1463-1326.2008.01001.x] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.1] [Indexed: 11/30/2022]
Abstract
AIM: Until recently, human leucocyte antigen (HLA) class II-independent associations with type 1 diabetes (T1D) in the Major Histocompatibility Complex (MHC) region were not adequately characterized owing to insufficient map coverage, inadequate statistical approaches and strong linkage disequilibrium spanning the entire MHC. Here we test for HLA class II-independent associations in the MHC using fine mapping data generated by the Type 1 Diabetes Genetics Consortium (T1DGC). METHODS: We have applied recursive partitioning to the modelling of the class II loci and used stepwise conditional logistic regression to test approximately 1534 loci between 29 and 34 Mb on chromosome 6p21, typed in 2240 affected sibpair (ASP) families. RESULTS: Preliminary analyses confirm that HLA-B (at 31.4 Mb) and HLA-A (at 30.0 Mb) are associated with T1D independently of the class II genes HLA-DRB1 and HLA-DQB1 (P = 6.0 × 10^-17 and 8.8 × 10^-13, respectively). In addition, a second class II region of association containing the single-nucleotide polymorphism (SNP) rs439121 and the class II locus HLA-DPB1 was identified as a T1D susceptibility effect which is independent of HLA-DRB1, HLA-DQB1 and HLA-B (P = 9.2 × 10^-8). A younger age-at-diagnosis of T1D was found for HLA-B*39 (P = 7.6 × 10^-6), and HLA-B*38 was protective for T1D. CONCLUSIONS: These analyses in the T1DGC families replicate our results obtained previously in approximately 2000 cases and controls and 850 families. Taking both studies together, there is evidence for four T1D-associated regions at 30.0 Mb (HLA-A), 31.4 Mb (HLA-B), 32.5 Mb (rs9268831/HLA-DRA) and 33.2 Mb (rs439121/HLA-DPB1) that are independent of HLA-DRB1/HLA-DQB1. Neither study found evidence of independent associations at the HLA-C or HLA-DQA1 loci, nor in the UBD/MAS1L or ITPR3 gene regions. These studies show that to find true class II-independent effects, large, well-powered sample collections are required and must be genotyped with a dense map of markers. In addition, a robust statistical methodology that fully models the class II effects is necessary. Recursive partitioning is a useful tool for modelling these multiallelic systems.
Affiliation(s)
- J M M Howson
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK.
|
166
|
Lynch AG, Dunning MJ, Iddawela M, Barbosa-Morais NL, Ritchie ME. Considerations for the processing and analysis of GoldenGate-based two-colour Illumina platforms. Stat Methods Med Res 2009; 18:437-52. [PMID: 19153169 DOI: 10.1177/0962280208099451] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Illumina's GoldenGate technology is a two-channel microarray platform that allows for the simultaneous interrogation of about 1,500 locations in the genome. GoldenGate has proved a flexible platform not only in the choice of those 1,500 locations, but also in the choice of the property being measured at them. It retains the desirable properties of Illumina's BeadArrays in that the probes (in this case 'beads') are randomly arranged across the microarray, there are multiple instances of each probe and many samples can be processed simultaneously. As for other Illumina technologies, however, these properties are not exploited as they might be. Here we review the various common adaptations of the GoldenGate platform, review the analysis methods that are associated with each adaptation and then, with the aid of a number of example data sets we illustrate some of the improvements that can be made over the default analysis.
Affiliation(s)
- A G Lynch
- University of Cambridge/Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Cambridge, UK.
|
167
|
Bayjanov JR, Wels M, Starrenburg M, van Hylckama Vlieg JET, Siezen RJ, Molenaar D. PanCGH: a genotype-calling algorithm for pangenome CGH data. Bioinformatics 2009; 25:309-14. [PMID: 19129208 PMCID: PMC2639077 DOI: 10.1093/bioinformatics/btn632] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 01/09/2023]
Abstract
MOTIVATION: Pangenome arrays contain DNA oligomers targeting several sequenced reference genomes from the same species. In microbiology, these can be employed to investigate the often high genetic variability within a species by comparative genome hybridization (CGH). The biological interpretation of pangenome CGH data depends on the ability to compare strains at a functional level, particularly by comparing the presence or absence of orthologous genes. Due to the high genetic variability, available genotype-calling algorithms cannot be applied to pangenome CGH data. RESULTS: We have developed the algorithm PanCGH that incorporates orthology information about genes to predict the presence or absence of orthologous genes in a query organism using CGH arrays that target the genomes of sequenced representatives of a group of microorganisms. PanCGH was tested and applied in the analysis of genetic diversity among 39 Lactococcus lactis strains from three different subspecies (lactis, cremoris, hordniae) and isolated from two different niches (dairy and plant). Clustering of these strains using the presence/absence data of gene orthologs revealed a clear separation between different subspecies and reflected the niche of the strains.
Affiliation(s)
- Jumamurat R Bayjanov
- Center for Molecular and Biomolecular Informatics, Nijmegen Center for Molecular Life Sciences, Radboud University Medical Centre, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands.
|
168
|
Chan EKF, Hawken R, Reverter A. The combined effect of SNP-marker and phenotype attributes in genome-wide association studies. Anim Genet 2008; 40:149-56. [PMID: 19076733 PMCID: PMC2680326 DOI: 10.1111/j.1365-2052.2008.01816.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 12/04/2022]
Abstract
The last decade has seen rapid improvements in high-throughput single nucleotide polymorphism (SNP) genotyping technologies that have consequently made genome-wide association studies (GWAS) possible. With tens to hundreds of thousands of SNP markers being tested simultaneously in GWAS, it is imperative to appropriately pre-process, or filter out, those SNPs that may lead to false associations. This paper explores the relationships between various SNP genotype and phenotype attributes and their effects on false associations. We show that (i) uniformly distributed ordinal data as well as binary data are more easily influenced, though not necessarily negatively, by differences in various SNP attributes compared with normally distributed data; (ii) filtering SNPs on minor allele frequency (MAF) and extent of Hardy–Weinberg equilibrium (HWE) deviation has little effect on the overall false positive rate; (iii) in some cases, filtering on MAF only serves to exclude SNPs from the analysis without reduction of the overall proportion of false associations; and (iv) HWE, MAF and heterozygosity are all dependent on minor genotype frequency, a newly proposed measure for genotype integrity.
Affiliation(s)
- E K F Chan
- Cooperative Research Centre for Beef Genetic Technologies, CSIRO Livestock Industries, Queensland Bioscience Precinct, 306 Carmody Road, St Lucia, Qld 4067, Australia
|
169
|
Lin Y, Tseng GC, Cheong SY, Bean LJH, Sherman SL, Feingold E. Smarter clustering methods for SNP genotype calling. Bioinformatics 2008; 24:2665-71. [PMID: 18826959 DOI: 10.1093/bioinformatics/btn509] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Most genotyping technologies for single nucleotide polymorphism (SNP) markers use standard clustering methods to 'call' the SNP genotypes. These methods are not always optimal in distinguishing the genotype clusters of a SNP because they do not take advantage of specific features of the genotype calling problem. In particular, when family data are available, pedigree information is ignored. Furthermore, prior information about the distribution of the measurements for each cluster can be used to choose an appropriate model-based clustering method and can significantly improve the genotype calls. One special genotyping problem that has never been discussed in the literature is that of genotyping of trisomic individuals, such as individuals with Down syndrome. Calling trisomic genotypes is a more complicated problem, and the addition of external information becomes very important. RESULTS: In this article, we discuss the impact of incorporating external information into clustering algorithms to call the genotypes for both disomic and trisomic data. We also propose two new methods to call genotypes using family data. One is a modification of the K-means method and uses the pedigree information by updating all members of a family together. The other is a likelihood-based method that combines the Gaussian or beta-mixture model with pedigree information. We compare the performance of these two methods and some other existing methods using simulation studies. We also compare the performance of these methods on a real dataset generated by the Illumina platform (www.illumina.com). AVAILABILITY: The R code for the family-based genotype calling methods (SNPCaller) is available to be downloaded from the following website: http://watson.hgen.pitt.edu/register.
Affiliation(s)
- Yan Lin
- Department of Biostatistics, Department of Medicine, Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA.
|
170
|
Giannoulatou E, Yau C, Colella S, Ragoussis J, Holmes CC. GenoSNP: a variational Bayes within-sample SNP genotyping algorithm that does not require a reference population. Bioinformatics 2008; 24:2209-14. [PMID: 18653518 DOI: 10.1093/bioinformatics/btn386] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Indexed: 11/14/2022] Open
Abstract
Current genotyping algorithms typically call genotypes by clustering allele-specific intensity data on a single nucleotide polymorphism (SNP) by SNP basis. This approach assumes the availability of a large number of control samples that have been sampled on the same array and platform. We have developed a SNP genotyping algorithm for the Illumina Infinium SNP genotyping assay that is entirely within-sample and does not require a population of control samples or parameters derived from such a population. Our algorithm exhibits high concordance with current methods and >99% call accuracy on HapMap samples. The ability to call genotypes using only within-sample information makes the method computationally light and practical for studies involving small sample sizes and provides a valuable independent quality control metric for other population-based approaches. AVAILABILITY: http://www.stats.ox.ac.uk/~giannoul/GenoSNP/.
Affiliation(s)
- Eleni Giannoulatou
- Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
|
171
|
Teo YY, Small KS, Clark TG, Kwiatkowski DP. Perturbation analysis: a simple method for filtering SNPs with erroneous genotyping in genome-wide association studies. Ann Hum Genet 2008; 72:368-74. [PMID: 18261185 PMCID: PMC2997476 DOI: 10.1111/j.1469-1809.2007.00422.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
We introduce a simple and yet scientifically objective criterion for identifying SNPs with genotyping errors due to poor clustering. This yields a metric for assessing the stability of the assigned genotypes by evaluating the extent of discordance between the calls made with the unperturbed and perturbed intensities. The efficacy of the metric is evaluated by: (1) estimating the extent of over-dispersion of the Hardy-Weinberg equilibrium chi-square test statistics; (2) an interim case-control study, where we investigated the efficacy of the introduced metric and standard quality control filters in reducing the number of SNPs with evidence of phenotypic association which are attributed to genotyping errors; (3) investigating the call and concordance rates of SNPs identified by perturbation analysis which have been genotyped on both Affymetrix and Illumina platforms. Removing SNPs identified by the extent of discordance can reduce the degree of over-dispersion of the HWE test statistic. Sensible use of perturbation analysis in an association study can correctly identify SNPs with problematic genotyping, reducing the number required for visual inspection. SNPs identified by perturbation analysis had lower call and concordance rates, and removal of these SNPs significantly improved the performance for the remaining SNPs.
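The core idea in this abstract, comparing calls made from unperturbed and perturbed intensities, can be sketched as follows. This is only an illustration under stated assumptions: the threshold-based caller, the Gaussian noise model, and the parameters `cut`, `scale` and `seed` are hypothetical stand-ins and are not the calling algorithm or perturbation scheme used in the paper.

```python
import random


def call_genotype(a, b, cut=0.4):
    """Hypothetical stand-in caller: threshold the contrast (a - b) / (a + b)."""
    total = a + b
    if total <= 0:
        return "NC"  # no call when there is no signal
    contrast = (a - b) / total
    if contrast > cut:
        return "AA"
    if contrast < -cut:
        return "BB"
    return "AB"


def perturb(intensities, scale, rng):
    """Add Gaussian noise to each allele intensity, clipping at zero (assumed noise model)."""
    return [(max(a + rng.gauss(0.0, scale), 0.0),
             max(b + rng.gauss(0.0, scale), 0.0))
            for a, b in intensities]


def discordance(intensities, scale=0.05, seed=0):
    """Fraction of samples whose call changes after perturbing the intensities."""
    rng = random.Random(seed)
    original = [call_genotype(a, b) for a, b in intensities]
    perturbed = [call_genotype(a, b) for a, b in perturb(intensities, scale, rng)]
    changed = sum(o != p for o, p in zip(original, perturbed))
    return changed / len(intensities)
```

Under these assumptions, a SNP whose samples sit close to the cluster boundaries would tend to show a higher discordance than one with well-separated clusters, which is the instability the metric is meant to flag.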
Affiliation(s)
- Y Y Teo
- Wellcome Trust Centre for Human Genetics, University of Oxford, United Kingdom.
|
172
|
Teo YY, Inouye M, Small KS, Fry AE, Potter SC, Dunstan SJ, Seielstad M, Barroso I, Wareham NJ, Rockett KA, Kwiatkowski DP, Deloukas P. Whole genome-amplified DNA: insights and imputation. Nat Methods 2008; 5:279-80. [PMID: 18376389 DOI: 10.1038/nmeth0408-279] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
|
173
|
Teo YY. Common statistical issues in genome-wide association studies: a review on power, data quality control, genotype calling and population structure. Curr Opin Lipidol 2008; 19:133-43. [PMID: 18388693 DOI: 10.1097/mol.0b013e3282f5dd77] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.6] [Indexed: 01/12/2023]
Abstract
PURPOSE OF REVIEW: Genetic association studies which survey the entire genome have become a common design for uncovering the genetic basis of common diseases, including lipid-related traits. Such studies have identified several novel loci which influence blood lipids. The present review highlights the statistical challenges associated with such large-scale genetic studies and discusses the available methodological strategies for handling these issues. RECENT FINDINGS: The successful analysis of genome-wide data assayed on commercial genotyping arrays depends on careful exploration of the data. Unaccounted sample failures, genotyping errors and population structure can introduce misleading signals that mimic genuine association. Careful interpretation of useful summary statistics and graphical data displays can minimize the extent of false associations that need to be followed up in replication or fine-mapping experiments. SUMMARY: Recently published genome-wide studies are beginning to yield valuable insights into the importance of well-designed methodological and statistical techniques for sensible interpretation of the plethora of genetic data generated.
Affiliation(s)
- Yik Y Teo
- Wellcome Trust Centre for Human Genetics, University of Oxford, UK.
|