1
Abstract
Combining statistical significances (P-values) from a set of single-locus association tests in genome-wide association studies is a proof-of-principle method for identifying disease-associated genomic segments, functional genes and biological pathways. We review P-value combinations for genome-wide association studies and introduce an integrated analysis tool, Omnibus P-value Association Tests (OPATs), which provides popular analysis methods of P-value combinations. OPATs, programmed in R with an R graphical user interface, features a user-friendly interface. In addition to analysis modules for data quality control and single-locus association tests, OPATs provides three types of set-based association test: window-, gene- and biopathway-based association tests. P-value combinations with or without threshold and rank truncation are provided. The significance of a set-based association test is evaluated by using resampling procedures. Performance of the set-based association tests in OPATs has been evaluated by simulation studies and real data analyses. These set-based association tests help boost the statistical power, alleviate the multiple-testing problem, reduce the impact of genetic heterogeneity, increase the replication efficiency of association tests and facilitate the interpretation of association signals by streamlining the testing procedures and integrating the genetic effects of multiple variants in genomic regions of biological relevance. In summary, P-value combinations facilitate the identification of marker sets associated with disease susceptibility and help uncover missing heritability in association studies, thereby establishing a foundation for the genetic dissection of complex diseases and traits. OPATs provides an easy-to-use and statistically powerful analysis tool for P-value combinations. OPATs, examples and a user guide can be downloaded from http://www.stat.sinica.edu.tw/hsinchou/genetics/association/OPATs.htm.
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica
- Corresponding author: Hsin-Chou Yang, Institute of Statistical Science, Academia Sinica, No 128, Academia Road, Section 2, Nankang, Taipei 115, Taiwan. Tel.: 886-2-27835611 ext. 113; Fax: 886-2-27831523; E-mail:
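The abstract above mentions P-value combinations with and without threshold and rank truncation, with significance assessed by resampling. The following is a minimal Python sketch of that idea, not the OPATs implementation (which is written in R). The function names, the default threshold `tau` and the rank `k` are hypothetical, and the null distribution is simulated from independent uniform P-values, which is a simplifying assumption; a real analysis would typically resample phenotype labels to preserve correlation between markers.

```python
import math
import random


def fisher_stat(pvals):
    """Fisher-type combination: -2 * sum of log P-values (no truncation)."""
    return -2.0 * sum(math.log(p) for p in pvals)


def truncated_product_stat(pvals, tau=0.05):
    """Threshold truncation: combine only P-values at or below tau."""
    return -2.0 * sum(math.log(p) for p in pvals if p <= tau)


def rank_truncated_stat(pvals, k=2):
    """Rank truncation: combine only the k smallest P-values."""
    return -2.0 * sum(math.log(p) for p in sorted(pvals)[:k])


def resampled_pvalue(stat_fn, pvals, n_resamples=2000, seed=0):
    """Monte Carlo significance of a combined statistic.

    Assumption: markers are independent, so null P-values are drawn
    as independent uniforms on (0, 1].
    """
    rng = random.Random(seed)
    observed = stat_fn(pvals)
    exceed = 0
    for _ in range(n_resamples):
        null = [1.0 - rng.random() for _ in range(len(pvals))]
        if stat_fn(null) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_resamples + 1)
```

For example, `resampled_pvalue(fisher_stat, [0.001, 0.002, 0.01, 0.03])` gives a set-level significance for four single-locus P-values under the independence assumption stated above.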
2
Estimating population haplotype frequencies from pooled DNA samples using PHASE algorithm. Genet Res (Camb) 2009; 90:509-24. [PMID: 19123969 DOI: 10.1017/s0016672308009877] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 01/15/2023]
Abstract
Recent studies show that the PHASE algorithm is a state-of-the-art method for population-based haplotyping from individually genotyped data. We present a modified version of PHASE for estimating population haplotype frequencies from pooled DNA data. The algorithm is compared with (i) a maximum likelihood estimation under the multinomial model and (ii) a deterministic greedy algorithm, on both simulated and real data sets (HapMap data). Our results suggest that the PHASE algorithm is also a method of choice for pooled DNA data. The main reason for the improvement over the other approaches is assumed to be the same as with individually genotyped data: the biologically motivated model of PHASE takes into account correlated genealogical histories of the haplotypes by modelling mutations and recombinations. The important questions of the efficiency of DNA pooling and the influence of the pool size on the accuracy of the estimates are also considered. Our results are in line with earlier findings in that the pool size should be relatively small, only 2-5 individuals in our examples, in order to provide reliable estimates of population haplotype frequencies.
3
Kuk AYC, Zhang H, Yang Y. Computationally feasible estimation of haplotype frequencies from pooled DNA with and without Hardy-Weinberg equilibrium. Bioinformatics 2008; 25:379-86. [PMID: 19050036 DOI: 10.1093/bioinformatics/btn623] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
MOTIVATION Pooling a large number of DNA samples is a common practice in association studies, especially for initial screening. However, the use of expectation-maximization (EM)-type algorithms in estimating haplotype distributions for even moderate pool sizes is hampered by the computational complexity involved. A novel constrained EM algorithm called PoooL has been proposed recently to bypass the difficulty via the use of asymptotic normality of the pooled allele frequencies. The resulting estimates are, however, not maximum likelihood estimates and hence not optimal. Furthermore, the assumption of Hardy-Weinberg equilibrium (HWE) may not be realistic in practice. METHODS Rather than carrying out constrained maximization as in PoooL, we revert to the usual EM algorithm but make it computationally feasible by using normal approximations. The resulting algorithm is much simpler to implement than PoooL because there is no need to invoke sophisticated iterative scaling methods. We also develop an estimating equation analogue of the EM algorithm for the case of Hardy-Weinberg disequilibrium (HWD) by conditioning on the haplotypes of both chromosomes of the same individual. Incorporated into the method is a way of estimating the inbreeding coefficient by relating it to overdispersion. RESULTS A simulation study assuming HWE shows that our simplified implementation of the EM algorithm leads to estimates with substantially smaller SDs than PoooL estimates. Further simulations show that ignoring HWD will induce biases in the estimates. Our extended method, with estimation of the inbreeding coefficient incorporated, is able to reduce the bias, leading to estimates with substantially smaller mean square errors. We also present results to suggest that our method can cope with a certain degree of locus-specific inbreeding as well as additional overdispersion not caused by inbreeding. AVAILABILITY http://staff.ustc.edu.cn/~ynyang/aem-aes
Affiliation(s)
- Anthony Y C Kuk
- Department of Statistics and Applied Probability, National University of Singapore, Singapore 117546.
4
Chen HH, Jou YS, Lee WJ, Pan WH. Applying polynomial standard curve method to correct bias encountered in estimating allele frequencies using DNA pooling strategy. Genomics 2008; 92:429-35. [PMID: 18793711 DOI: 10.1016/j.ygeno.2008.08.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 03/17/2008] [Revised: 08/15/2008] [Accepted: 08/18/2008] [Indexed: 11/25/2022]
Abstract
The DNA pooling approach is a cost-saving strategy that is crucial for multiple-SNP association studies, particularly for laboratories with limited budgets. However, bias in allele frequency estimates cannot be completely eliminated by kappa correction. Using SNaPshot™, we systematically examined the relations between actual minor allele frequency (AMiAF) levels and estimates obtained from the pooling process for all six types of SNPs. We applied the principle of the polynomial standard curve method (PSCM) to produce allele frequency estimates in pooled DNA samples and compared it with the kappa method. The results showed that estimates derived from the PSCM were in general closer to AMiAFs than those from the kappa method, particularly for C/G and G/T polymorphisms in the AMiAF range of 20-40%. We demonstrated that applying PSCM on the SNaPshot™ platform is suitable for multiple-SNP association studies using a pooling strategy, owing to its cost effectiveness and estimation accuracy.
Affiliation(s)
- Hsin-Hung Chen
- Institute of Biomedical Sciences, Academia Sinica, Taiwan, ROC
5
Homer N, Tembe WD, Szelinger S, Redman M, Stephan DA, Pearson JV, Nelson SF, Craig D. Multimarker analysis and imputation of multiple platform pooling-based genome-wide association studies. Bioinformatics 2008; 24:1896-902. [PMID: 18617537 PMCID: PMC2732219 DOI: 10.1093/bioinformatics/btn333] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 01/18/2008] [Revised: 06/26/2008] [Accepted: 06/27/2008] [Indexed: 12/26/2022]
Abstract
For many genome-wide association (GWA) studies, individually genotyping one million or more SNPs provides a marginal increase in coverage at a substantial cost. Much of the information gained is redundant due to the correlation structure inherent in the human genome. Pooling-based GWA studies could benefit significantly by utilizing this redundancy to reduce noise, improve the accuracy of the observations and increase genomic coverage. We introduce a measure of correlation between individual genotyping and pooling, under the same framework in which r² provides a measure of linkage disequilibrium (LD) between pairs of SNPs. We then report a new non-haplotype multimarker multi-loci method that leverages the correlation structure between SNPs in the human genome to increase the efficacy of pooling-based GWA studies. We first give a theoretical framework and derivation of our multimarker method. Next, we evaluate simulations using this multimarker approach in comparison to single marker analysis. Finally, we experimentally evaluate our method using different pools of HapMap individuals on the Illumina 450S Duo, Illumina 550K and Affymetrix 5.0 platforms for a combined total of 1 333 631 SNPs. Our results show that use of multimarker analysis reduces noise specific to pooling-based studies, allows for efficient integration of multiple microarray platforms and provides more accurate measures of significance than single marker analysis. Additionally, this approach can be extended to allow for imputing the association significance for SNPs not directly observed using neighboring SNPs in LD. This multimarker method can now be used to cost-effectively complete pooling-based GWA studies with multiple platforms across over one million SNPs and to impute neighboring SNPs weighted for the loss of information due to pooling.
Affiliation(s)
- Nils Homer
- Translational Genomics Research Institute (TGen), Phoenix, AZ 85004, USA
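For orientation, the abstract above frames its new measure by analogy with the r² statistic for linkage disequilibrium. The sketch below computes the standard pairwise r² from haplotype and allele frequencies; it is not the authors' pooling-versus-genotyping correlation measure, and the function and parameter names are hypothetical.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Standard pairwise LD measure r^2 for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (first SNP) and allele B (second SNP)
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
```

Two SNPs whose alleles always travel together give r² = 1, and SNPs whose haplotype frequency equals the product of the allele frequencies give r² = 0.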
6
Pratap S, Williams SM, Levy SE. Evaluation of pooled allelotyping versus individual genotyping for genome-wide association analysis of complex disease. BMC Bioinformatics 2008. [PMCID: PMC3313174 DOI: 10.1186/1471-2105-9-s7-p11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
7
Zhang H, Yang HC, Yang Y. PoooL: an efficient method for estimating haplotype frequencies from large DNA pools. Bioinformatics 2008; 24:1942-8. [DOI: 10.1093/bioinformatics/btn324] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
8
Yang HC, Huang MC, Li LH, Lin CH, Yu ALT, Diccianni MB, Wu JY, Chen YT, Fann CSJ. MPDA: microarray pooled DNA analyzer. BMC Bioinformatics 2008; 9:196. [PMID: 18412951 PMCID: PMC2387178 DOI: 10.1186/1471-2105-9-196] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 04/19/2007] [Accepted: 04/15/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Microarray-based pooled DNA experiments that combine the merits of DNA pooling and gene chip technology constitute a pivotal advance in biotechnology. This new technique uses pooled DNA, thereby reducing costs associated with the typing of DNA from numerous individuals. Moreover, use of an oligonucleotide gene chip reduces costs related to processing various DNA segments (e.g., primers, reagents). Thus, the technique provides an overall cost-effective solution for large-scale genomic/genetic research. However, few publicly shared tools are available to systematically analyze the rapidly accumulating volume of whole-genome pooled DNA data. RESULTS We propose a generalized concept of pooled DNA and present a user-friendly tool named Microarray Pooled DNA Analyzer (MPDA) that we developed to analyze hybridization intensity data from microarray-based pooled DNA experiments. MPDA enables whole-genome DNA preferential amplification/hybridization analysis, allele frequency estimation, association mapping, allelic imbalance detection, and permits integration with shared data resources online. Graphic and numerical outputs from MPDA support global and detailed inspection of large amounts of genomic data. Four whole-genome data analyses are used to illustrate the major functionalities of MPDA. The first analysis shows that MPDA can characterize genomic patterns of preferential amplification/hybridization and provide calibration information for pooled DNA data analysis. The second analysis demonstrates that MPDA can accurately estimate allele frequencies. The third analysis indicates that MPDA is cost-effective and reliable for association mapping. The final analysis shows that MPDA can identify regions of chromosomal aberration in cancer without paired-normal tissue. CONCLUSION MPDA, the software that integrates pooled DNA association analysis and allelic imbalance analysis, provides a convenient analysis system for extensive whole-genome pooled DNA data analysis. 
The software, user manual and illustrated examples are freely available online at the MPDA website listed in the Availability and requirements section.
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan.
9
Abstract
The genetic dissection of complex disorders via genetic marker data has gained popularity in the postgenome era. Methods for typing genetic markers on human chromosomes continue to improve. Compared with the popular individual genotyping experiment, a pooled-DNA experiment (allelotyping experiment) is more cost effective when carrying out genetic typing. This chapter provides an overview of association mapping using pooled DNA and describes a five-stage study design including the preliminary calibration of peak intensities, estimation of allele frequency, single-locus association mapping, multilocus association mapping, and a confirmation study. Software and an analysis of authentic data are presented. The strengths and weaknesses of pooled-DNA analyses, as well as possible future applications for this method, are discussed.
Affiliation(s)
- Hsin-Chou Yang
- Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, Taiwan
10
Wilkening S, Chen B, Wirtenberger M, Burwinkel B, Försti A, Hemminki K, Canzian F. Allelotyping of pooled DNA with 250 K SNP microarrays. BMC Genomics 2007; 8:77. [PMID: 17367522 PMCID: PMC1839100 DOI: 10.1186/1471-2164-8-77] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 02/13/2007] [Accepted: 03/16/2007] [Indexed: 12/02/2022]
Abstract
Background Genotyping technologies for whole genome association studies are now available. To perform such studies at an affordable price, pooled DNA can be used. Recent studies have shown that GeneChip Human Mapping 10 K and 50 K arrays are suitable for the estimation of the allele frequency in pooled DNA. In the present study, we tested the accuracy of the 250 K Nsp array, which is part of the 500 K array set representing 500,568 SNPs. Furthermore, we compared different algorithms to estimate allele frequencies of pooled DNA. Results We could confirm that the polynomial based probe specific correction (PPC) was the most accurate method for allele frequency estimation. However, a simple k-correction, using the relative allele signal (RAS) of heterozygous individuals, performed only slightly worse and provided results for more SNPs. Using four replicates of the 250 K array and the k-correction using heterozygous RAS values, we obtained results for 104,141 SNPs. The correlation between estimated and real allele frequency was 0.983 and the average error was 0.046, which was comparable to the results obtained with the 10 K array. Furthermore, we could show how the estimation accuracy depended on the SNP type (average error for A/T SNPs: 0.043 and for G/C SNPs: 0.052). Conclusion The combination of DNA pooling and analysis of single nucleotide polymorphisms (SNPs) on high density microarrays is a promising tool for whole genome association studies.
Affiliation(s)
- Stefan Wilkening
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Bowang Chen
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Michael Wirtenberger
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Barbara Burwinkel
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Helmholtz University Group Molecular Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Asta Försti
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Center for Family Medicine, Karolinska Institute, SE-14183 Huddinge, Sweden
- Kari Hemminki
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Center for Family Medicine, Karolinska Institute, SE-14183 Huddinge, Sweden
- Federico Canzian
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
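The k-correction named in the abstract above can be illustrated with a short Python sketch. The abstract does not state the formula, so this uses the commonly cited form of the correction and should be read as an assumption: with RAS defined as A / (A + B) for allele signals A and B, the factor k is the A-to-B signal ratio in heterozygotes, and the pooled frequency of allele A is A / (A + kB). Averaging heterozygote RAS values before forming k, and both function names, are illustrative choices.

```python
def k_from_heterozygote_ras(ras_het_values):
    """Estimate k as the A:B signal ratio implied by the mean heterozygote RAS.

    In a heterozygote both alleles are present in equal amounts, so any
    departure of RAS from 0.5 reflects unequal signal strength.
    """
    mean_ras = sum(ras_het_values) / len(ras_het_values)
    return mean_ras / (1.0 - mean_ras)


def k_corrected_frequency(ras_pool, k):
    """Allele A frequency in a pool: A / (A + k*B), written in terms of RAS."""
    return ras_pool / (ras_pool + k * (1.0 - ras_pool))
```

Under these assumptions, a SNP whose heterozygotes show a RAS of 0.6 has k = 1.5, and a pooled RAS of 0.6 is then corrected to an allele frequency of 0.5.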
11
Pearson JV, Huentelman MJ, Halperin RF, Tembe WD, Melquist S, Homer N, Brun M, Szelinger S, Coon KD, Zismann VL, Webster JA, Beach T, Sando SB, Aasly JO, Heun R, Jessen F, Kolsch H, Tsolaki M, Daniilidou M, Reiman EM, Papassotiropoulos A, Hutton ML, Stephan DA, Craig DW. Identification of the genetic basis for complex disorders by use of pooling-based genomewide single-nucleotide-polymorphism association studies. Am J Hum Genet 2007; 80:126-39. [PMID: 17160900 PMCID: PMC1785308 DOI: 10.1086/510686] [Citation(s) in RCA: 126] [Impact Index Per Article: 7.4] [Received: 08/21/2006] [Accepted: 11/07/2006] [Indexed: 01/06/2023]
Abstract
We report the development and validation of experimental methods, study designs, and analysis software for pooling-based genomewide association (GWA) studies that use high-throughput single-nucleotide-polymorphism (SNP) genotyping microarrays. We first describe a theoretical framework for establishing the effectiveness of pooling genomic DNA as a low-cost alternative to individually genotyping thousands of samples on high-density SNP microarrays. Next, we describe software called "GenePool," which directly analyzes SNP microarray probe intensity data and ranks SNPs by increased likelihood of being genetically associated with a trait or disorder. Finally, we apply these methods to experimental case-control data and demonstrate successful identification of published genetic susceptibility loci for a rare monogenic disease (sudden infant death with dysgenesis of the testes syndrome), a rare complex disease (progressive supranuclear palsy), and a common complex disease (Alzheimer disease) across multiple SNP genotyping platforms. On the basis of these theoretical calculations and their experimental validation, our results suggest that pooling-based GWA studies are a logical first step for determining whether major genetic associations exist in diseases with high heritability.
Affiliation(s)
- John V Pearson
- Translational Genomics Research Institute, Phoenix, AZ, 85004, USA
12
Yang HC, Liang YJ, Huang MC, Li LH, Lin CH, Wu JY, Chen YT, Fann C. A genome-wide study of preferential amplification/hybridization in microarray-based pooled DNA experiments. Nucleic Acids Res 2006; 34:e106. [PMID: 16931491 PMCID: PMC1616968 DOI: 10.1093/nar/gkl446] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 03/29/2006] [Revised: 05/05/2006] [Accepted: 06/09/2006] [Indexed: 01/27/2023]
Abstract
Microarray-based pooled DNA methods overcome the cost bottleneck of simultaneously genotyping more than 100 000 markers for numerous study individuals. The success of such methods relies on the proper adjustment of preferential amplification/hybridization to ensure accurate and reliable allele frequency estimation. We performed a hybridization-based genome-wide single nucleotide polymorphism (SNP) genotyping analysis to dissect preferential amplification/hybridization. The majority of SNPs had less than 2-fold signal amplification or suppression, and the lognormal distributions adequately modeled preferential amplification/hybridization across the human genome. Comparative analyses suggested that the distributions of preferential amplification/hybridization differed among genotypes and with GC content. Patterns among different ethnic populations were similar; nevertheless, there were striking differences for a small proportion of SNPs, and a slight ethnic heterogeneity was observed. To fulfill appropriate and gratuitous adjustments, databases of preferential amplification/hybridization for African Americans, Caucasians and Asians were constructed based on the Affymetrix GeneChip Human Mapping 100 K Set. The robustness of allele frequency estimation using this database was validated by a pooled DNA experiment. This study provides a genome-wide investigation of preferential amplification/hybridization and suggests guidance for the reliable use of the database. Our results constitute an objective foundation for theoretical development of preferential amplification/hybridization and provide important information for future pooled DNA analyses.
Affiliation(s)
- H.-C. Yang
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Y.-J. Liang
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- M.-C. Huang
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- L.-H. Li
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- C.-H. Lin
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- J.-Y. Wu
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Y.-T. Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- C.S.J. Fann
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan