1
Arca M, Mary-Huard T, Gouesnard B, Bérard A, Bauland C, Combes V, Madur D, Charcosset A, Nicolas SD. Deciphering the Genetic Diversity of Landraces With High-Throughput SNP Genotyping of DNA Bulks: Methodology and Application to the Maize 50k Array. Front Plant Sci 2021; 11:568699. [PMID: 33488638 PMCID: PMC7817617 DOI: 10.3389/fpls.2020.568699] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/01/2020] [Accepted: 11/12/2020] [Indexed: 05/13/2023]
Abstract
Genebanks harbor landraces carrying many original favorable alleles for mitigating biotic and abiotic stresses. Their genetic diversity, however, remains poorly characterized because of their large within-landrace genetic diversity. We developed a high-throughput, inexpensive, and labor-saving DNA bulk approach based on the Illumina Infinium HD single-nucleotide polymorphism (SNP) array to genotype landraces. For each landrace, samples were gathered by mixing equal weights of young leaves, from which DNA was extracted. We then estimated allelic frequencies in each DNA bulk from the fluorescent intensity ratio (FIR) between the two alleles at each SNP, using a two-step approach. We first tested whether the DNA bulk was monomorphic or polymorphic by comparing its FIR with the FIR distributions of individuals homozygous for allele A or B, respectively. If the DNA bulk was polymorphic, we estimated its allelic frequency with a predictive equation calibrated on FIR from DNA bulks with known allelic frequencies. Our approach (i) gives accurate allelic frequency estimates that are highly reproducible across laboratories and (ii) protects against false detection of allele fixation within landraces. We estimated allelic frequencies of 23,412 SNPs in 156 landraces representing American and European maize diversity. Modified Rogers' genetic distances between the 156 landraces, estimated from the 23,412 SNPs and from 17 simple sequence repeats using the same DNA bulks, were highly correlated, suggesting that the ascertainment bias is low. Our approach is affordable, easy to implement, and requires no specific bioinformatics support or laboratory equipment; it should therefore be highly relevant for large-scale characterization of genebanks for a wide range of species.
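The two-step estimation described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: FIR is assumed to be oriented so that it increases with the B-allele frequency, a bulk is called fixed when its FIR lies within `z` standard deviations of the mean FIR of AA or BB homozygotes (a normal approximation), and an ordinary least-squares polynomial stands in for the paper's predictive equation. The function names and the `z` and `degree` parameters are hypothetical.

```python
import numpy as np

def classify_bulk(fir, fir_aa, fir_bb, z=3.0):
    """Step 1: compare a bulk's FIR with the FIR distributions of AA and BB
    homozygous individuals (normal approximation, hypothetical threshold z)."""
    mu_a, sd_a = np.mean(fir_aa), np.std(fir_aa, ddof=1)
    mu_b, sd_b = np.mean(fir_bb), np.std(fir_bb, ddof=1)
    if abs(fir - mu_a) <= z * sd_a:
        return "fixed_A"
    if abs(fir - mu_b) <= z * sd_b:
        return "fixed_B"
    return "polymorphic"

def fit_calibration(fir_known, freq_known, degree=2):
    """Step 2 calibration: least-squares polynomial mapping FIR to the
    B-allele frequency, fitted on bulks of known composition."""
    return np.polynomial.Polynomial.fit(fir_known, freq_known, degree)

def estimate_frequency(fir, fir_aa, fir_bb, calib, z=3.0):
    """Return the estimated B-allele frequency of one DNA bulk at one SNP."""
    status = classify_bulk(fir, fir_aa, fir_bb, z)
    if status == "fixed_A":
        return 0.0
    if status == "fixed_B":
        return 1.0
    # Clip so the calibrated value stays a valid frequency.
    return float(np.clip(calib(fir), 0.0, 1.0))

if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the study.
    fir_aa = [0.02, 0.03, 0.01, 0.04, 0.02]
    fir_bb = [0.97, 0.98, 0.96, 0.99, 0.97]
    calib = fit_calibration([0.0, 0.25, 0.5, 0.75, 1.0],
                            [0.0, 0.25, 0.5, 0.75, 1.0])
    print(estimate_frequency(0.025, fir_aa, fir_bb, calib))  # fixed A -> 0.0
    print(estimate_frequency(0.40, fir_aa, fir_bb, calib))
```

Testing monomorphism first, before applying the calibration, is what keeps a noisy FIR near either homozygous cluster from being reported as a small but non-zero frequency.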
Affiliation(s)
- Mariangela Arca
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE – Le Moulon, Gif-sur-Yvette, France
- Tristan Mary-Huard
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE – Le Moulon, Gif-sur-Yvette, France
- Brigitte Gouesnard
- AGAP, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Aurélie Bérard
- Université Paris-Saclay, INRAE, Etude du Polymorphisme des Génomes Végétaux, Evry-Courcouronnes, France
- Cyril Bauland
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE – Le Moulon, Gif-sur-Yvette, France
- Valérie Combes
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE – Le Moulon, Gif-sur-Yvette, France
- Delphine Madur
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE – Le Moulon, Gif-sur-Yvette, France
- Alain Charcosset
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE – Le Moulon, Gif-sur-Yvette, France
- Stéphane D. Nicolas
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE – Le Moulon, Gif-sur-Yvette, France
2
Genomic signatures of parasite-driven natural selection in north European Atlantic salmon (Salmo salar). Mar Genomics 2018; 39:26-38. [DOI: 10.1016/j.margen.2018.01.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/24/2017] [Revised: 12/16/2017] [Accepted: 01/08/2018] [Indexed: 02/06/2023]
3
Pritchard VL, Erkinaro J, Kent MP, Niemelä E, Orell P, Lien S, Primmer CR. Single nucleotide polymorphisms to discriminate different classes of hybrid between wild Atlantic salmon and aquaculture escapees. Evol Appl 2016; 9:1017-31. [PMID: 27606009 PMCID: PMC4999531 DOI: 10.1111/eva.12407] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 11/20/2015] [Accepted: 06/07/2016] [Indexed: 12/14/2022]
Abstract
Many wild Atlantic salmon (Salmo salar) populations are threatened by introgressive hybridization from domesticated fish that have escaped from aquaculture facilities. A detailed understanding of the hybridization dynamics between wild salmon and aquaculture escapees requires discrimination of different hybrid classes; however, markers currently available to discriminate the two types of parental genome have limited power to do this. Using a high‐density Atlantic salmon single nucleotide polymorphism (SNP) array, in combination with pooled‐sample allelotyping and an Fst outlier approach, we identified 200 SNPs that differentiated an important Atlantic salmon stock from the escapees potentially hybridizing with it. By simulating multiple generations of wild–escapee hybridization, involving wild populations in two major phylogeographic lineages and a genetically diverse set of escapees, we showed that both the complete set of SNPs and smaller subsets could reliably assign individuals to different hybrid classes up to the third hybrid (F3) generation. This set of markers will be a useful tool for investigating the genetic interactions between native wild fish and aquaculture escapees in many Atlantic salmon populations.
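Ranking SNPs by differentiation between two pooled allele-frequency estimates, as in an Fst outlier screen, can be sketched as below. This is an illustration under assumptions: it uses a simple two-population Fst, (H_T - H_S) / H_T, computed from pool frequencies with equal weights and no sample-size correction, which is not necessarily the estimator the authors used. The function names and toy frequencies are hypothetical.

```python
import numpy as np

def fst_two_pools(p1, p2):
    """Per-SNP Fst between two pools from allele frequencies p1 and p2,
    using Fst = (H_T - H_S) / H_T with equal pool weights."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # total expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-pool heterozygosity
    with np.errstate(divide="ignore", invalid="ignore"):
        # SNPs fixed for the same allele in both pools (H_T = 0) get Fst = 0.
        return np.where(h_t > 0, (h_t - h_s) / h_t, 0.0)

def top_outliers(p1, p2, n):
    """Indices of the n SNPs with the highest Fst, most differentiated first."""
    fst = fst_two_pools(p1, p2)
    return np.argsort(fst)[::-1][:n]

if __name__ == "__main__":
    wild = [0.10, 0.50, 0.80]      # toy pool frequencies, not study data
    escapee = [0.90, 0.50, 0.60]
    print(fst_two_pools(wild, escapee))
    print(top_outliers(wild, escapee, n=1))   # SNP 0 is the most differentiated
```

In a real screen, the cut-off `n` (or an Fst threshold) would be chosen against the genome-wide Fst distribution rather than fixed in advance.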
Affiliation(s)
- Matthew P Kent
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Aas, Norway
- Eero Niemelä
- Natural Resources Institute Finland (Luke), Utsjoki, Finland
- Panu Orell
- Natural Resources Institute Finland (Luke), Utsjoki, Finland
- Sigbjørn Lien
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Aas, Norway
4
Hellicar AD, Rahman A, Smith DV, Henshall JM. Machine learning approach for pooled DNA sample calibration. BMC Bioinformatics 2015; 16:214. [PMID: 26156142 PMCID: PMC4495942 DOI: 10.1186/s12859-015-0593-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/23/2014] [Accepted: 04/23/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Despite ongoing reductions in genotyping costs, genomic studies involving large numbers of species with low economic value (such as Black Tiger prawns) remain cost-prohibitive. In this scenario, DNA pooling is an attractive option to reduce genotyping costs. However, genotyping pooled samples comprising DNA from many individuals is challenging because of errors that exceed the allele frequency quantisation size and therefore cannot simply be corrected by clustering techniques. The solution to this calibration problem is a correction to the allele frequency that mitigates errors incurred in the measurement process. We highlight the limitations of existing calibration solutions, such as the fact that they impose assumptions on the variation between allele frequencies 0, 0.5, and 1.0 and address only a limited set of error types. We propose a novel machine learning method to address these limitations. RESULTS The approach is tested on SNPs genotyped with the Sequenom iPLEX platform and compared with existing state-of-the-art calibration methods. The new method reduces the mean square error in allele frequency to half that achievable with existing approaches. Furthermore, for the first time, we demonstrate the importance of carefully considering the choice of training data when using calibration approaches built from pooled data. CONCLUSION This paper demonstrates that pooled allele frequency estimates improve if the genotyping platform is characterised at allele frequencies other than the homozygous and heterozygous cases. Techniques capable of incorporating such information are described, along with aspects of implementation.
Affiliation(s)
- Andrew D Hellicar
- CSIRO Computational Informatics, Castray Esplanade, Hobart, Australia.
- Ashfaqur Rahman
- CSIRO Computational Informatics, Castray Esplanade, Hobart, Australia.
- Daniel V Smith
- CSIRO Computational Informatics, Castray Esplanade, Hobart, Australia.
5
Reverter A, Henshall JM, McCulloch R, Sasazaki S, Hawken R, Lehnert SA. Numerical analysis of intensity signals resulting from genotyping pooled DNA samples in beef cattle and broiler chicken. J Anim Sci 2014; 92:1874-85. [PMID: 24663186 DOI: 10.2527/jas.2013-7133] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 01/12/2023]
Abstract
Pooled genomic DNA has been proposed as a cost-effective approach in genome-wide association studies (GWAS). However, genotype-calling algorithms for biallelic SNPs are not adequate for pooled DNA samples, because they assume the presence of 2 fluorescent signals, 1 for each allele, and operate under the expectation that at most 2 copies of the variant allele can be found for any given SNP and DNA sample. We adapt analytical methodology from 2-channel gene expression microarray technology to SNP genotyping of pooled DNA samples. Using 5 datasets from beef cattle and broiler chicken of varying complexity in terms of design and phenotype (continuous and dichotomous), we show that both differential hybridization (M = green minus red intensity signal) and abundance (A = average of red and green intensities) provide useful information for predicting SNP allele frequencies. This is particularly true when making inferences about extreme SNPs that are either nearly fixed or highly polymorphic. We propose model-based clustering via mixtures of bivariate normal distributions as an optimal framework to capture the relationship between hybridization intensity and allele frequency in pooled DNA samples. The ranges of M and A values observed here agree with those reported for gene expression microarrays and with those from SNP array data in the context of analytical methods for identifying copy number variants. In particular, we confirm that highly polymorphic SNPs yield a strong signal from both channels (red and green), whereas weakly polymorphic or nonpolymorphic SNPs yield a strong signal from 1 channel only. We further confirm that when SNP allele frequencies are known, either because the individuals in the pools or individuals from a closely related population are themselves genotyped, a multiple regression model with linear and quadratic components can be developed with high prediction accuracy.
We conclude that when these approaches are applied to the estimation of allele frequencies, the resulting estimates allow for the development of cost-effective and reliable GWAS.
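The M/A decomposition and the regression with linear and quadratic components mentioned in this abstract can be sketched as follows. Assumptions: the two channel intensities are already on a log2 scale (so M is a log-ratio, as in two-channel expression arrays); the model uses an intercept plus M, A, M² and A²; and it is fitted by ordinary least squares on pools with known allele frequencies. The abstract does not give the authors' exact model terms, and all names and the synthetic data here are hypothetical.

```python
import numpy as np

def m_and_a(green, red):
    """M = green - red, A = mean of green and red (inputs assumed log2-scaled)."""
    green = np.asarray(green, dtype=float)
    red = np.asarray(red, dtype=float)
    return green - red, (green + red) / 2

def design_matrix(m, a):
    # Intercept, linear and quadratic terms in M and A.
    return np.column_stack([np.ones_like(m), m, a, m**2, a**2])

def fit_frequency_model(green, red, known_freq):
    """Least-squares coefficients predicting allele frequency from M and A."""
    m, a = m_and_a(green, red)
    coef, *_ = np.linalg.lstsq(design_matrix(m, a),
                               np.asarray(known_freq, dtype=float), rcond=None)
    return coef

def predict_frequency(coef, green, red):
    m, a = m_and_a(green, red)
    # Clip predictions to the valid frequency range [0, 1].
    return np.clip(design_matrix(m, a) @ coef, 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    green = rng.uniform(8, 12, 50)    # synthetic log2 intensities
    red = rng.uniform(8, 12, 50)
    m, a = m_and_a(green, red)
    freq = 0.5 + 0.05 * m - 0.01 * m**2 + 0.001 * a   # synthetic "truth"
    coef = fit_frequency_model(green, red, freq)
    print(np.round(coef, 4))
```

The calibration pools play the role of the "known allele frequencies" in the abstract; the fitted coefficients are then applied to pools whose frequencies are unknown.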
Affiliation(s)
- A Reverter
- CSIRO Food Futures Flagship and CSIRO Animal, Food and Health Sciences, 306 Carmody Road, St. Lucia, Brisbane, Queensland 4067, Australia
6
Teumer A, Ernst FD, Wiechert A, Uhr K, Nauck M, Petersmann A, Völzke H, Völker U, Homuth G. Comparison of genotyping using pooled DNA samples (allelotyping) and individual genotyping using the affymetrix genome-wide human SNP array 6.0. BMC Genomics 2013; 14:506. [PMID: 23885805 PMCID: PMC3727995 DOI: 10.1186/1471-2164-14-506] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 12/19/2012] [Accepted: 07/23/2013] [Indexed: 12/26/2022]
Abstract
Background Genome-wide association studies (GWAS) using array-based genotyping technology are widely used to identify genetic loci associated with complex diseases or other phenotypes. The costs of GWAS projects based on individual genotyping are still comparatively high and increase with the size of study populations. Genotyping using pooled DNA samples, also referred to as the allelotyping approach, offers an affordable alternative. In the present study, data from 100 DNA samples individually genotyped with the Affymetrix Genome-Wide Human SNP Array 6.0 were used to estimate the error of the pooling approach by comparing the results with those obtained using the same array type but with DNA pools each composed of 50 of the same samples. Newly developed and established methods for signal intensity correction were applied. Furthermore, the relative allele intensity signals (RAS) obtained by allelotyping were compared with the corresponding values derived from individual genotyping. Similarly, differences in RAS values between pools were determined and compared. Results Regardless of the intensity correction method applied, the pooling-specific error of the pool intensity values was larger for single pools than for the comparison of the intensity values of two pools, which reflects the scenario of a case–control study. Using 50 pooled samples, analyzing 10,000 SNPs with a minor allele frequency of >1%, and applying the best correction method for the corresponding type of comparison, the 90% quantile (median) of the pooling-specific absolute error was 0.064 (0.026) for the RAS values of single sub-pools and 0.056 (0.021) for the SNP-specific difference in allele frequency between two pools. Conclusions Correction of the RAS values reduced the error of the RAS values when analyzing single pool intensities. We developed a new correction method with high accuracy but low computational costs.
Correction of RAS, however, only marginally reduced the error of true differences between two sample groups and those obtained by allelotyping. Exclusion of SNPs with a minor allele frequency of ≤1% notably reduced the pooling-specific error. Our findings allow for improving the estimation of the pooling-specific error and may help in designing allelotyping studies using the Affymetrix Genome-Wide Human SNP Array 6.0.
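The error summary reported in this abstract (median and 90% quantile of the absolute pooling-specific error, restricted to SNPs with a minor allele frequency above 1%) can be illustrated with the short sketch below. It assumes a simple, uncorrected relative allele signal RAS = A / (A + B) computed from summarized allele-A and allele-B intensities, compared with allele-A frequencies from individual genotyping. The Affymetrix processing pipeline and the correction methods evaluated in the paper are not reproduced, and the function names and toy values are hypothetical.

```python
import numpy as np

def relative_allele_signal(int_a, int_b):
    """Uncorrected RAS = A / (A + B) for each SNP."""
    int_a = np.asarray(int_a, dtype=float)
    int_b = np.asarray(int_b, dtype=float)
    return int_a / (int_a + int_b)

def pooling_error_summary(pool_ras, freq_a, maf_min=0.01):
    """Median and 90% quantile of |RAS - true allele-A frequency| over SNPs
    whose minor allele frequency exceeds maf_min."""
    pool_ras = np.asarray(pool_ras, dtype=float)
    freq_a = np.asarray(freq_a, dtype=float)
    maf = np.minimum(freq_a, 1 - freq_a)
    keep = maf > maf_min
    err = np.abs(pool_ras[keep] - freq_a[keep])
    return float(np.median(err)), float(np.quantile(err, 0.9))

if __name__ == "__main__":
    ras = relative_allele_signal([30, 50, 10], [70, 50, 90])   # toy intensities
    freq = [0.32, 0.45, 0.005]       # third SNP is excluded (MAF <= 1%)
    print(pooling_error_summary(ras, freq))
```

Reporting a high quantile alongside the median, as the authors do, exposes the heavy tail of per-SNP errors that a median alone would hide.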
Affiliation(s)
- Alexander Teumer
- Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, 17487 Greifswald, Germany.
7
Screening of cerebral infarction-related genetic markers using a Cox regression analysis between onset age and heterozygosity at randomly selected short tandem repeat loci. J Thromb Thrombolysis 2012; 33:318-21. [PMID: 22476643 DOI: 10.1007/s11239-012-0724-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/18/2022]
Abstract
The aim of this paper is to explore whether heterozygosity at the 9 CODIS short tandem repeat (STR) loci D3S1358, vWA, FGA, D8S1179, D21S11, D18S51, D5S818, D13S317 and D7S820 is associated with the risk of atherosclerotic cerebral infarction (CI). DNA samples were collected from patients with CI (n = 72) and from people over the age of 90 years without CI (n = 59). Alleles at the STR loci were determined using the STR Profiler Plus PCR amplification kit. The relationship between age of onset and heterozygosity was assessed by Cox regression. A correlation between age of onset and heterozygosity was observed for the D8S1179 locus (p < 0.05), implying that regions in the vicinity of D8S1179 may harbor susceptibility genes for CI. Analyzing heterozygosity at particular loci as genetic markers with this new study design may be an efficient and reliable approach to estimating genetic predispositions.
8
Tong F, Yu W, Liu H. Efficient association analysis between colorectal cancer and allelic polymorphisms of HLA-DQB1 by comparison of age of onset. Oncol Lett 2011; 3:517-519. [PMID: 22740942 DOI: 10.3892/ol.2011.540] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/16/2011] [Accepted: 12/19/2011] [Indexed: 11/05/2022]
Abstract
Common methods for identifying cancer-related genes rely solely on differences in gene frequencies between the disease and control groups and do not take into account the age of onset in gene carriers. In the present investigation, we developed a new study design based on the age of onset of cancer to identify colorectal cancer-related genes. Samples from patients with colorectal cancer were typed using an HLA-DQB1 polymerase chain reaction with sequence-specific primers (PCR-SSP) typing kit. The mean age of subjects with and without each allele was calculated. The mean age of subjects with the HLA-DQB1*02 allele was significantly lower than that of subjects without this allele (p<0.05). We found that the HLA-DQB1*02 allele was associated with colorectal cancer susceptibility. This new method of analysis may therefore be an efficient and reliable approach for identifying cancer-causing genes.
Affiliation(s)
- Fengzhi Tong
- Dalian Fifth Hospital, Dalian Medical University, Dalian 116044, P.R. China
9
Anantharaman R, Andiappan AK, Nilkanth PP, Suri BK, Wang DY, Chew FT. Genome-wide association study identifies PERLD1 as asthma candidate gene. BMC Med Genet 2011; 12:170. [PMID: 22188591 PMCID: PMC3268734 DOI: 10.1186/1471-2350-12-170] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 09/23/2011] [Accepted: 12/21/2011] [Indexed: 11/10/2022]
Abstract
Background Recent genome-wide association studies (GWAS) for asthma have identified novel associations that have been well replicated. The aim of this study is to identify genetic variants that influence predisposition towards asthma in an ethnic Chinese population in Singapore using a GWAS approach. Methods A two-stage GWAS was performed in case samples with allergic asthma and in control samples without asthma or atopy. In the discovery stage, 490 case and 490 control samples were analysed by pooled genotyping. Significant associations from the first stage were evaluated in the second stage in a replication cohort of 521 case and 524 control samples. The same 980 samples used in the discovery stage were also individually genotyped for a combined analysis. An additional 1445 non-asthmatic atopic control samples were also genotyped. Results Nineteen promising SNPs that passed our genome-wide P value threshold of 5.52 × 10^-8 were individually genotyped. In the combined analysis of 1011 case and 1014 control samples, SNP rs2941504 in PERLD1 on chromosome 17q12 was significantly associated with asthma at the genotypic level (P = 1.48 × 10^-6, OR(AG) = 0.526 (0.369-0.700), OR(AA) = 0.480 (0.361-0.639)) and at the allelic level (P = 9.56 × 10^-6, OR = 0.745 (0.654-0.848)). These findings were replicated in 3 other asthma GWAS, validating our results. Analysis against the atopy control samples suggested that the SNP was associated with allergic asthma rather than with either the asthma or the allergy component alone. Genotyping of additional SNPs within 100 kb flanking rs2941504 further confirmed that the association was indeed with PERLD1. PERLD1 is involved in modifying the glycosylphosphatidylinositol anchors of cell surface markers such as CD48 and CD59, which are known to play multiple roles in T-cell activation and proliferation.
Conclusions These findings reveal PERLD1 as a novel asthma candidate gene and reinforce the involvement of genes in the 17q12-21 chromosomal region in the etiology of asthma.
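Allelic odds ratios such as the OR = 0.745 (0.654-0.848) quoted in this abstract are conventionally computed from a 2×2 table of allele counts in cases and controls. The sketch below uses the standard Woolf (log-scale) method for the 95% confidence interval. The abstract does not state which interval method the authors used, the counts in the example are invented, and the function name is hypothetical.

```python
import math

def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    Arguments are allele counts (two per individual) in cases and controls.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

if __name__ == "__main__":
    # Invented counts for illustration only.
    print(allelic_odds_ratio(case_alt=600, case_ref=1400, ctrl_alt=700, ctrl_ref=1300))
```

An OR below 1, as reported for rs2941504, means the counted allele is less frequent among cases than among controls.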
Affiliation(s)
- Ramani Anantharaman
- Department of Biological Sciences, National University of Singapore, Singapore
10
Distefano JK, Taverna DM. Technological issues and experimental design of gene association studies. Methods Mol Biol 2011; 700:3-16. [PMID: 21204023 DOI: 10.1007/978-1-61737-954-3_1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 01/22/2023]
Abstract
Genome-wide association studies (GWAS), in which thousands of single-nucleotide polymorphisms (SNPs) spanning the genome are genotyped in phenotypically well-characterized individuals, currently represent the most popular strategy for identifying gene regions associated with common diseases and related quantitative traits. Improvements in technology and throughput, the development of powerful statistical tools, and wider acceptance of pooling-based genotyping approaches have led to greater use of GWAS in human genetics research. However, careful attention to experimental design, including selection of the most appropriate genotyping platform, can enhance the utility of the approach even further. This chapter reviews experimental and technological issues that may affect the success of GWAS and proposes strategies for developing the most comprehensive, logical, and cost-effective genotyping approach for a given population of interest.
Affiliation(s)
- Johanna K Distefano
- Diabetes, Cardiovascular, and Metabolic Diseases Division, Translational Genomics Research Institute, Phoenix, AZ, USA.
11
Andiappan AK, Anantharaman R, Nilkanth PP, Wang DY, Chew FT. Evaluating the transferability of Hapmap SNPs to a Singapore Chinese population. BMC Genet 2010; 11:36. [PMID: 20459637 PMCID: PMC2877651 DOI: 10.1186/1471-2156-11-36] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 12/09/2009] [Accepted: 05/07/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND The International Hapmap project serves as a valuable resource for human genome variation data; however, its applicability to other populations has yet to be exhaustively investigated. In this paper, we use high-density genotyping chips and resequencing strategies to compare the Singapore Chinese population with the Hapmap populations. First, we compared 1028 and 114 unrelated Singapore Chinese samples, genotyped using the Illumina Human Hapmap 550 k chip and the Affymetrix 500 k array respectively, against the 270 samples from Hapmap. Second, data from 20 candidate genes on 5q31-33, resequenced for an asthma candidate gene study, were also used for the analysis. RESULTS A total of 237 SNPs were identified through resequencing, of which only 95 (40%) were in Hapmap; an additional 56 SNPs (24%) were not genotyped directly but had a proxy SNP in Hapmap. At the genome-wide level, Singapore Chinese were highly correlated with Hapmap Han Chinese (correlation of 0.954 and 0.947 for the Illumina and Affymetrix platforms, respectively), with deviant SNPs randomly distributed within and across all chromosomes. CONCLUSIONS The high correlation between our population and Hapmap Han Chinese reaffirms the applicability of Hapmap-based genome-wide chips for GWA studies. There is a clear population signature for the Singapore Chinese samples, which predominantly resemble the southern Han Chinese population; however, when new migrants, particularly those with a northern Han Chinese background, are included, population stratification issues may arise. Future studies need to address population stratification within the sample collection when designing and interpreting GWAS in the Chinese population.
Affiliation(s)
- Anand Kumar Andiappan
- Department of Biological Sciences, National University of Singapore, Science Drive 4, Singapore 117543
- Ramani Anantharaman
- Department of Biological Sciences, National University of Singapore, Science Drive 4, Singapore 117543
- Pallavi Parate Nilkanth
- Department of Biological Sciences, National University of Singapore, Science Drive 4, Singapore 117543
- De Yun Wang
- Department of Otolaryngology, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119260
- Fook Tim Chew
- Department of Biological Sciences, National University of Singapore, Science Drive 4, Singapore 117543
12
Chi XF, Lou XY, Shu QY. Combining DNA pooling with selective recombinant genotyping for increased efficiency in fine mapping. Theor Appl Genet 2010; 120:775-783. [PMID: 19898814 PMCID: PMC2829194 DOI: 10.1007/s00122-009-1198-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2009] [Accepted: 10/17/2009] [Indexed: 05/28/2023]
Abstract
One of the key steps in positional cloning and marker-aided selection is to identify marker(s) tightly linked to the target gene (i.e., fine mapping). Selective genotyping, such as selective recombinant genotyping (SRG), is commonly used in fine mapping to save costs. To further decrease genotyping effort and rapidly screen for tightly linked markers, we propose a combined DNA pooling and SRG strategy. Two-stage pooled genotyping can identify recombinants between a pair of flanking markers more efficiently, and joint use of bulked DNA analysis and two-stage pooling can also save cost when genotyping recombinants. The combined DNA pooling and SRG strategy can further be extended to fine mapping of polygenic traits. Numerical results based on hypothetical scenarios and an illustrative application to fine mapping of a mutant gene, called xl(t), in rice suggest that the proposed strategy can substantially reduce the amount of genotyping compared with conventional SRG.
Affiliation(s)
- Xiao-Fei Chi
- Institute of Nuclear Agricultural Sciences, Zhejiang University, Hangzhou, People’s Republic of China
- Xiang-Yang Lou
- Institute of Bioinformatics, Zhejiang University, Hangzhou, People’s Republic of China. Department of Biostatistics, University of Alabama at Birmingham, RPHB 420B, 1665 University Boulevard, Birmingham, AL 35294, USA
- Qing-Yao Shu
- Institute of Nuclear Agricultural Sciences, Zhejiang University, Hangzhou, People’s Republic of China
13
Anantharaman R, Chew FT. Validation of pooled genotyping on the Affymetrix 500 k and SNP6.0 genotyping platforms using the polynomial-based probe-specific correction. BMC Genet 2009; 10:82. [PMID: 20003400 PMCID: PMC2806376 DOI: 10.1186/1471-2156-10-82] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 11/04/2008] [Accepted: 12/14/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND The use of pooled DNA on SNP microarrays (SNP-MaP) has been shown to be a cost-effective and rapid way to perform whole-genome association evaluations. While the accuracy of SNP-MaP was extensively evaluated on the early Affymetrix 10 k and 100 k platforms, there have been fewer comparably comprehensive studies on more recent platforms. In the present study, we used data generated from the full Affymetrix 500 k SNP set together with the polynomial-based probe-specific correction (PPC) to derive allele frequency estimates. These estimates were compared with genotyping results of the same individuals on the same platform to evaluate the reliability and accuracy of pooled genotyping on these high-throughput platforms. We subsequently extended this comparison to the new SNP6.0 platform, capable of genotyping 1.8 million genetic variants. RESULTS We showed that pooled genotyping on the 500 k platform performed as well as previously shown on the lower-throughput 10 k and 100 k array sets, with high accuracy (correlation coefficient: 0.988) and low median error (0.036) in allele frequency estimates. Similar results were obtained from the SNP6.0 array set. A novel pooling strategy of overlapping sub-pools was attempted, and comparison of estimated allele frequencies showed this strategy to be as reliable as replicate pools. The importance of an appropriate reference genotyping data set for the application of the PPC algorithm was also evaluated; reference samples with an ethnic background similar to that of the pooled samples improved the estimation of allele frequencies. CONCLUSION We conclude that using the PPC algorithm to estimate allele frequencies from pooled genotyping on the high-throughput 500 k and SNP6.0 platforms is highly accurate and reproducible, especially when a suitable reference sample set is used to estimate the beta values for PPC.
Affiliation(s)
- Ramani Anantharaman
- Department of Biological Sciences, National University of Singapore, Science Drive 4, Singapore 117543
- Fook Tim Chew
- Department of Biological Sciences, National University of Singapore, Science Drive 4, Singapore 117543
14
Ronald A, Butcher LM, Docherty S, Davis OSP, Schalkwyk LC, Craig IW, Plomin R. A genome-wide association study of social and non-social autistic-like traits in the general population using pooled DNA, 500 K SNP microarrays and both community and diagnosed autism replication samples. Behav Genet 2009; 40:31-45. [PMID: 20012890 PMCID: PMC2797846 DOI: 10.1007/s10519-009-9308-6] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Received: 09/21/2009] [Accepted: 10/14/2009] [Indexed: 10/28/2022]
Abstract
Two separate genome-wide association studies were conducted to identify single nucleotide polymorphisms (SNPs) associated with social and nonsocial autistic-like traits. We predicted that we would find SNPs associated with social and non-social autistic-like traits and that different SNPs would be associated with social and nonsocial traits. In Stage 1, each study screened for allele frequency differences in approximately 430,000 autosomal SNPs using pooled DNA on microarrays in high-scoring versus low-scoring boys from a general population sample (N = approximately 400/group). In Stage 2, 22 and 20 SNPs in the social and non-social studies, respectively, were tested for QTL association by individually genotyping an independent community sample of 1,400 boys. One SNP (rs11894053) was nominally associated (P < .05, uncorrected for multiple testing) with social autistic-like traits. When the sample was increased by adding females, 2 additional SNPs were nominally significant (P < .05). These 3 SNPs, however, showed no significant association in transmission disequilibrium analyses of diagnosed ASD families.
Affiliation(s)
- Angelica Ronald
- Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, De Crespigny Park, London SE5 8AF, UK.
15
Hui L, Liping G. Statistical estimation of diagnosis with genetic markers based on decision tree analysis of complex disease. Comput Biol Med 2009; 39:989-92. [PMID: 19712931 DOI: 10.1016/j.compbiomed.2009.07.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/25/2006] [Revised: 07/19/2009] [Accepted: 07/27/2009] [Indexed: 10/20/2022]
Abstract
To explore combinations of genetic markers and to estimate their joint action, decision trees are built on the basis of marker frequencies in both disease and control groups. Youden's index (0.1-0.9 for a single marker) is calculated for genetic markers with different diagnostic capacities. When 23 single genetic markers with diagnostic power 0.10 are combined, the resulting diagnostic power is 0.5. Medium diagnostic power (Youden's index 0.7) can be obtained by combining four low-effect diagnostic items. High diagnostic power (Youden's index 0.9) can be obtained by combining either eight low-power items or four medium-power ones. This implies that selecting about 100 genetic markers, each differing between the disease and control groups by (say) 10%, will meet the requirements for clinical diagnosis. Thus, diagnosis of complex diseases by genetic markers is possible through the discovery and characterization of markers throughout the human genome and the development of genotyping technology.
Affiliation(s)
- Liu Hui
- College of Medical Laboratory, Dalian Medical University, Dalian 116027, China.
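Youden's index, the measure of diagnostic power used in this abstract, is sensitivity + specificity - 1. The short sketch below computes it from a 2x2 table. It also shows that, for a single binary marker scored as "carrier = positive test", the index reduces to the carrier-frequency difference between cases and controls, consistent with reading a marker that differs by 10% as having an index of about 0.10. The function names are illustrative, and the paper's decision-tree combination step is not reproduced.

```python
def youden_index(tp: int, fn: int, tn: int, fp: int) -> float:
    """Youden's J = sensitivity + specificity - 1 from a 2x2 diagnostic table."""
    sensitivity = tp / (tp + fn)  # fraction of cases that test positive
    specificity = tn / (tn + fp)  # fraction of controls that test negative
    return sensitivity + specificity - 1


def youden_single_marker(carrier_freq_cases: float, carrier_freq_controls: float) -> float:
    """J for one binary marker where carrying it counts as a positive test.

    sensitivity = carrier_freq_cases and specificity = 1 - carrier_freq_controls,
    so J is simply the carrier-frequency difference between the two groups.
    """
    sensitivity = carrier_freq_cases
    specificity = 1 - carrier_freq_controls
    return sensitivity + specificity - 1


print(youden_index(tp=80, fn=20, tn=90, fp=10))  # sensitivity 0.8, specificity 0.9
print(youden_single_marker(0.35, 0.25))          # marker 10 points more frequent in cases
```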
16
Yin BC, Wang XF, Ye BC. Multiplex genotyping and allele frequency estimation in pooled DNAs using non-gel capillary electrophoresis. Anal Biochem 2009; 387:221-9. [DOI: 10.1016/j.ab.2009.01.021] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 11/06/2008] [Revised: 01/14/2009] [Accepted: 01/15/2009] [Indexed: 10/21/2022]
17
Yin BC, Li H, Ye BC. Microarray-based estimation of SNP allele-frequency in pooled DNA using the Langmuir kinetic model. BMC Genomics 2008; 9:605. [PMID: 19087310 PMCID: PMC2640397 DOI: 10.1186/1471-2164-9-605] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 07/26/2008] [Accepted: 12/16/2008] [Indexed: 11/20/2022]
Abstract
Background: High-throughput genotyping of single nucleotide polymorphisms (SNPs) for genome-wide association requires technologies that generate millions of genotypes with relative ease, at a reasonable cost and with high accuracy. In this work, we have developed a theoretical approach to estimate allele frequency in pooled DNA samples, based on the physical principles of DNA immobilization and hybridization on a solid surface using the Langmuir kinetic model and quantitative analysis of the allelic signals. Results: This method can successfully distinguish allele frequencies differing by 0.01 in an actual pool of clinical samples, and detect alleles with a frequency as low as 2%. The accuracy of measuring known allele frequencies is very high, with the correlation between measured and actual frequencies giving r² = 0.9992. These results demonstrated that this method could allow accurate estimation of absolute allele frequencies in pooled DNA samples in a feasible and inexpensive way. Conclusion: We conclude that this novel strategy for quantitative analysis of the ratio of SNP allelic sequences in DNA pools is an inexpensive and feasible alternative for detecting polymorphic differences in candidate gene association studies and genome-wide linkage disequilibrium scans.
Affiliation(s)
- Bin-Cheng Yin
- Laboratory of Biosystems and Microanalysis, State Key Laboratory of Bioreactor Engineering, East China University of Science & Technology, Shanghai, PR China.
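The idea of modeling allele signals with a Langmuir model can be illustrated with the equilibrium form of the Langmuir isotherm, S = Smax·Kc/(1 + Kc). The sketch below inverts that isotherm for each allele and takes the ratio of the recovered concentrations. It is a simplification under stated assumptions: per-allele saturation signal (s_max) and affinity constant (k) already known from calibration, and no cross-hybridization. It is not the kinetic model or the parameter-estimation procedure of Yin et al.

```python
def langmuir_signal(conc: float, s_max: float, k: float) -> float:
    """Equilibrium Langmuir isotherm: signal = s_max * k*c / (1 + k*c)."""
    return s_max * k * conc / (1 + k * conc)


def langmuir_concentration(signal: float, s_max: float, k: float) -> float:
    """Invert the isotherm for the target concentration c."""
    if not 0 <= signal < s_max:
        raise ValueError("signal must lie in [0, s_max)")
    return signal / (k * (s_max - signal))


def allele_freq_langmuir(sig_a, sig_b, s_max_a, s_max_b, k_a, k_b):
    """Allele-A frequency from two allele signals, given hypothetical
    per-allele calibration constants s_max_* and k_*."""
    c_a = langmuir_concentration(sig_a, s_max_a, k_a)
    c_b = langmuir_concentration(sig_b, s_max_b, k_b)
    return c_a / (c_a + c_b)


# Simulate a pool with 30% allele A and unequal probe behaviour, then recover it
sig_a = langmuir_signal(0.3, s_max=1000.0, k=2.0)
sig_b = langmuir_signal(0.7, s_max=800.0, k=1.5)
naive = sig_a / (sig_a + sig_b)  # raw signal ratio, distorted by saturation and affinity
corrected = allele_freq_langmuir(sig_a, sig_b, 1000.0, 800.0, 2.0, 1.5)
print(round(naive, 3), round(corrected, 3))
```

In this simulated case the naive signal ratio is far from 0.3, while inverting the isotherm recovers the true frequency; with real arrays the calibration constants would themselves have to be estimated.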
18
Chen HH, Jou YS, Lee WJ, Pan WH. Applying polynomial standard curve method to correct bias encountered in estimating allele frequencies using DNA pooling strategy. Genomics 2008; 92:429-35. [PMID: 18793711 DOI: 10.1016/j.ygeno.2008.08.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 03/17/2008] [Revised: 08/15/2008] [Accepted: 08/18/2008] [Indexed: 11/25/2022]
Abstract
The DNA pooling approach is a cost-saving strategy that is crucial for multiple-SNP association studies, particularly for laboratories with limited budgets. However, bias in allele frequency estimates cannot be completely abolished by kappa correction. Using the SNaPshot™ assay, we systematically examined the relations between actual minor allele frequencies (AMiAFs) and the estimates obtained from the pooling process for all six types of SNPs. We applied the polynomial standard curve method (PSCM) to produce allele frequency estimates in pooled DNA samples and compared it with the kappa method. Estimates derived from the PSCM were in general closer to the AMiAFs than those from the kappa method, particularly for C/G and G/T polymorphisms with AMiAFs between 20% and 40%. We demonstrated that applying PSCM on the SNaPshot™ platform is suitable for multiple-SNP association studies using a pooling strategy, owing to its cost-effectiveness and estimation accuracy.
Affiliation(s)
- Hsin-Hung Chen
- Institute of Biomedical Sciences, Academia Sinica, Taiwan, ROC
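A polynomial standard curve can be sketched as follows. Pools with known allele frequencies are used to fit a low-degree polynomial that maps the observed allele signal ratio to the true frequency, and the curve is then applied to pools of unknown composition. The least-squares fit below uses only the standard library. The polynomial degree, the function names and the calibration values in the usage example are assumptions for illustration, not the protocol or data of Chen et al.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest order first (normal equations)."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T y for the Vandermonde design matrix X
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back-substitution on the upper-triangular system
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, n))) / a[r][r]
    return coef


def polyval(coef, x):
    """Evaluate a polynomial whose coefficients are listed lowest order first."""
    return sum(c * x ** i for i, c in enumerate(coef))


def estimate_freq(coef, observed_ratio):
    """Apply the standard curve and clamp the estimate to [0, 1]."""
    return min(1.0, max(0.0, polyval(coef, observed_ratio)))


# Hypothetical calibration pools: known frequencies and the signal ratios observed for them
known = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
observed = [0.02, 0.16, 0.29, 0.40, 0.50, 0.58]
curve = polyfit(observed, known, degree=2)
print(estimate_freq(curve, 0.35))
```

Fitting frequency as a function of the observed ratio, rather than the reverse, lets the curve be applied directly to new pools. The SNP-type-specific differences reported in the abstract suggest that separate curves per SNP or SNP type may be needed.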
19
Yang HC, Huang MC, Li LH, Lin CH, Yu ALT, Diccianni MB, Wu JY, Chen YT, Fann CSJ. MPDA: microarray pooled DNA analyzer. BMC Bioinformatics 2008; 9:196. [PMID: 18412951 PMCID: PMC2387178 DOI: 10.1186/1471-2105-9-196] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 04/19/2007] [Accepted: 04/15/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Microarray-based pooled DNA experiments that combine the merits of DNA pooling and gene chip technology constitute a pivotal advance in biotechnology. This new technique uses pooled DNA, thereby reducing costs associated with the typing of DNA from numerous individuals. Moreover, use of an oligonucleotide gene chip reduces costs related to processing various DNA segments (e.g., primers, reagents). Thus, the technique provides an overall cost-effective solution for large-scale genomic/genetic research. However, few publicly shared tools are available to systematically analyze the rapidly accumulating volume of whole-genome pooled DNA data. RESULTS: We propose a generalized concept of pooled DNA and present a user-friendly tool named Microarray Pooled DNA Analyzer (MPDA) that we developed to analyze hybridization intensity data from microarray-based pooled DNA experiments. MPDA enables whole-genome DNA preferential amplification/hybridization analysis, allele frequency estimation, association mapping, allelic imbalance detection, and integration with shared online data resources. Graphic and numerical outputs from MPDA support global and detailed inspection of large amounts of genomic data. Four whole-genome data analyses are used to illustrate the major functionalities of MPDA. The first analysis shows that MPDA can characterize genomic patterns of preferential amplification/hybridization and provide calibration information for pooled DNA data analysis. The second analysis demonstrates that MPDA can accurately estimate allele frequencies. The third analysis indicates that MPDA is cost-effective and reliable for association mapping. The final analysis shows that MPDA can identify regions of chromosomal aberration in cancer without paired-normal tissue. CONCLUSION: MPDA, the software that integrates pooled DNA association analysis and allelic imbalance analysis, provides a convenient analysis system for extensive whole-genome pooled DNA data analysis.
The software, user manual and illustrated examples are freely available online at the MPDA website listed in the Availability and requirements section.
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan.
20
Butcher LM, Plomin R. The nature of nurture: a genomewide association scan for family chaos. Behav Genet 2008; 38:361-71. [PMID: 18360741 PMCID: PMC2480594 DOI: 10.1007/s10519-008-9198-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 10/20/2007] [Accepted: 02/25/2008] [Indexed: 11/25/2022]
Abstract
Widely used measures of the environment, especially the family environment of children, show genetic influence in dozens of twin and adoption studies. This phenomenon is known as gene-environment correlation, in which genetically driven influences of individuals affect their environments. We conducted the first genome-wide association (GWA) analysis of an environmental measure. We used a measure called CHAOS, which assesses 'environmental confusion' in the home and is more strongly associated with cognitive development in childhood than any other environmental measure. CHAOS was assessed by parental report when the children were 3 years old and again when they were 4 years old; a composite CHAOS measure was constructed across the 2 years. We screened 490,041 autosomal single-nucleotide polymorphisms (SNPs) in a two-stage design in which children in low-chaos families (N = 469) versus high-chaos families (N = 369) from 3,000 families of 4-year-old twins were screened in Stage 1 using pooled DNA. In Stage 2, following SNP quality control procedures, 41 nominated SNPs were tested for association with family chaos by individually genotyping an independent representative sample of 3,529. Despite having 99% power to detect associations that account for more than 0.5% of the variance, none of the 41 nominated SNPs met conservative criteria for replication. As with GWA analyses of other complex traits, it is likely that most of the heritable variation in environmental measures such as family chaos is due to many genes of very small effect size.
Affiliation(s)
- Lee M Butcher
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Box Number P082, De Crespigny Park, London, UK.
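The power statement above can be checked roughly with a standard normal-approximation power calculation for a single-SNP quantitative association test, in which a SNP explaining a fraction r² of trait variance in n individuals gives a non-centrality of n·r²/(1 - r²). The sketch assumes a two-sided test at α = 0.05 without multiple-testing correction; the abstract does not state which test or α the authors used, so this is only a consistency check.

```python
import math
from statistics import NormalDist


def qtl_power(n: int, r2: float, alpha: float = 0.05) -> float:
    """Approximate power to detect a SNP explaining r2 of trait variance in n people.

    Uses a normal approximation with non-centrality sqrt(n * r2 / (1 - r2)).
    """
    std_normal = NormalDist()
    ncp = math.sqrt(n * r2 / (1 - r2))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in either tail
    return (1 - std_normal.cdf(z_crit - ncp)) + std_normal.cdf(-z_crit - ncp)


# Stage 2 sample of 3,529 and an effect explaining 0.5% of the variance
print(round(qtl_power(3529, 0.005), 3))
# The same effect with a Bonferroni threshold for the 41 tested SNPs
print(round(qtl_power(3529, 0.005, alpha=0.05 / 41), 3))
```

Under these assumptions the first call gives a power of about 0.99, close to the quoted figure; the stricter Bonferroni threshold lowers it noticeably.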
21
Macgregor S, Zhao ZZ, Henders A, Nicholas MG, Montgomery GW, Visscher PM. Highly cost-efficient genome-wide association studies using DNA pools and dense SNP arrays. Nucleic Acids Res 2008; 36:e35. [PMID: 18276640 PMCID: PMC2346606 DOI: 10.1093/nar/gkm1060] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.4] [Indexed: 12/23/2022]
Abstract
Genome-wide association (GWA) studies to map genes for complex traits are powerful yet costly. DNA-pooling strategies have the potential to dramatically reduce the cost of GWA studies. Pooling using Affymetrix arrays has been proposed and used, but the efficiency of these arrays has not been quantified. We compared Affymetrix GeneChip HindIII and Illumina HumanHap300 arrays on the same DNA pools and showed that the HumanHap300 arrays are substantially more efficient. In terms of effective sample size, HumanHap300-based pooling extracts >80% of the information available with individual genotyping (IG). In contrast, GeneChip HindIII-based pooling extracts only approximately 30% of the available information. With HumanHap300 arrays, concordance with IG data is excellent. Guidance is given on the best study design, and it is shown that, even after taking pooling error into account, one-stage scans can be performed at >100-fold lower cost than IG. With appropriately designed two-stage studies, IG can provide confirmation of pooling results whilst still providing an approximately 20-fold reduction in total cost compared with IG-based alternatives. The large cost savings with Illumina HumanHap300-based pooling imply that future studies need only be limited by the availability of samples and not by cost.
Affiliation(s)
- Stuart Macgregor
- Genetic Epidemiology, Queensland Institute of Medical Research, Brisbane, Australia.
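Effective sample size, the efficiency measure used above, can be sketched by treating pooling error as extra variance added to the binomial sampling variance of an allele-frequency estimate, and asking how many individually genotyped people would give the same total variance. This simple variance model and the function names are assumptions for illustration; Macgregor et al. may define and estimate the quantity differently.

```python
def effective_sample_size(n: int, p: float, pool_error_var: float) -> float:
    """Individually genotyped sample size with the same variance as a pool of n.

    Sampling variance of a frequency from n diploid individuals is p(1 - p) / (2n);
    pool_error_var is the additional (array + construction) error of the pool estimate.
    """
    sampling_var = p * (1 - p) / (2 * n)
    total_var = sampling_var + pool_error_var
    return p * (1 - p) / (2 * total_var)


def information_fraction(n: int, p: float, pool_error_var: float) -> float:
    """Share of the information from individual genotyping retained by pooling."""
    return effective_sample_size(n, p, pool_error_var) / n


# A pooling error equal to a quarter of the sampling variance retains 80% of the information
n, p = 1000, 0.5
err = 0.25 * p * (1 - p) / (2 * n)
print(information_fraction(n, p, err))
```

Under this model the retained fraction equals sampling_var / (sampling_var + pool_error_var), so for a fixed array error the fraction falls as pools grow, even though the effective sample size itself keeps rising.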
22
Butcher LM, Davis OSP, Craig IW, Plomin R. Genome-wide quantitative trait locus association scan of general cognitive ability using pooled DNA and 500K single nucleotide polymorphism microarrays. Genes Brain Behav 2008; 7:435-46. [PMID: 18067574 PMCID: PMC2408663 DOI: 10.1111/j.1601-183x.2007.00368.x] [Citation(s) in RCA: 113] [Impact Index Per Article: 7.1] [Indexed: 01/01/2023]
Abstract
General cognitive ability (g), which refers to what cognitive abilities have in common, is an important target for molecular genetic research because multivariate quantitative genetic analyses have shown that the same set of genes affects diverse cognitive abilities as well as learning disabilities. In this first autosomal genome-wide association scan of g, we used a two-stage quantitative trait locus (QTL) design with pooled DNA to screen more than 500 000 single nucleotide polymorphisms (SNPs) on microarrays, selecting from a sample of 7000 7-year-old children. In stage 1, we screened for allele frequency differences between groups pooled for low and high g. In stage 2, 47 SNPs nominated in stage 1 were tested by individually genotyping an independent sample of 3195 individuals, representative of the entire distribution of g scores in the full 7000 7-year-old children. Six SNPs yielded significant associations across the normal distribution of g, although only one SNP remained significant after a false discovery rate of 0.05 was imposed. However, none of these SNPs accounted for more than 0.4% of the variance of g, despite 95% power to detect associations of that size. It is likely that QTL effect sizes, even for highly heritable traits such as cognitive abilities and disabilities, are much smaller than previously assumed. Nonetheless, an aggregated ‘SNP set’ of the six SNPs correlated 0.11 (P < 0.00000003) with g. This shows that future SNP sets that will incorporate many more SNPs could be useful for predicting genetic risk and for investigating functional systems of effects from genes to brain to behavior.
Affiliation(s)
- L M Butcher
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, King's College London, London, UK
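The aggregated "SNP set" mentioned above can be sketched as a simple unweighted score: for each child, count the trait-associated alleles carried across the selected SNPs (0, 1 or 2 per SNP), then correlate that count with g. The unweighted coding, the function names and the toy genotypes below are assumptions for illustration; the abstract does not specify how the six SNPs were combined.

```python
import math


def snp_set_scores(genotypes):
    """Unweighted SNP-set score per individual: total count of associated alleles.

    genotypes: one row per individual, one 0/1/2 allele count per SNP.
    """
    return [sum(row) for row in genotypes]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data: four individuals genotyped at six SNPs, with made-up g scores
toy_genotypes = [
    [0, 1, 0, 2, 1, 0],
    [1, 1, 2, 1, 0, 1],
    [2, 2, 1, 1, 2, 1],
    [0, 0, 1, 0, 1, 0],
]
toy_g = [98.0, 103.0, 110.0, 95.0]
scores = snp_set_scores(toy_genotypes)
print(scores, round(pearson_r(scores, toy_g), 3))
```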
23
Liu H, Yu W, Wang X, Fang F, Yang G, Zhou J, Liang X, An W. Number of STR repeats as a potential new quantitative genetic marker for complex diseases, illustrated by schizophrenia. Biochem Genet 2007; 45:683-9. [PMID: 17690978 DOI: 10.1007/s10528-007-9105-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/18/2006] [Accepted: 02/24/2007] [Indexed: 11/29/2022]
Abstract
It has proved difficult to find strong and replicable genetic linkages for complex diseases, since each susceptibility gene makes only a modest contribution to onset. This is partly because high-efficacy genetic markers are not usually available. The aim of this article is to explore the possibility that the total number of tandem repeats at one short tandem repeat (STR) locus, rather than the frequencies of different alleles, is a higher-efficacy quantitative genetic marker. DNA samples were collected from schizophrenic patients and from a control population. Alleles of the STR loci D3S1358, vWA, and FGA were determined using the STR Profiler Plus PCR amplification kit. The two groups did not differ statistically in allele frequencies at the D3S1358, vWA, or FGA loci. However, a significant difference was found at the vWA locus when the total number of core unit repeats was compared between the schizophrenia and control groups (33.28 ± 2.61 vs. 32.35 ± 2.58, P < 0.05). This suggests that the number of STR repeats may be a new, quantitative, and higher-efficacy genetic marker for directly indicating genetic predisposition to complex hereditary diseases such as schizophrenia.
Affiliation(s)
- Hui Liu
- College of Medical Laboratory, Dalian Medical University, Dalian, 116027, P.R. China.
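The "total number of repeats" marker described above replaces a categorical allele call with a single number per person: the sum of the repeat counts of the two alleles at the locus. A minimal sketch follows, using Welch's t statistic as one possible group comparison. The abstract does not name the test used, so that choice, the function names and the genotypes in the example are assumptions.

```python
import math
from statistics import mean, stdev


def total_repeats(genotype):
    """Total core-unit repeats at one STR locus for a diploid genotype (a1, a2)."""
    allele_1, allele_2 = genotype
    return allele_1 + allele_2


def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    vx = stdev(x) ** 2 / len(x)
    vy = stdev(y) ** 2 / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical vWA genotypes (repeat counts of the two alleles) for two small groups
cases = [total_repeats(g) for g in [(16, 18), (17, 17), (15, 19), (18, 18)]]
controls = [total_repeats(g) for g in [(15, 16), (16, 17), (14, 18), (16, 16)]]
print(cases, controls, welch_t(cases, controls))
```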
24
Docherty SJ, Butcher LM, Schalkwyk LC, Plomin R. Applicability of DNA pools on 500 K SNP microarrays for cost-effective initial screens in genomewide association studies. BMC Genomics 2007; 8:214. [PMID: 17610740 PMCID: PMC1925094 DOI: 10.1186/1471-2164-8-214] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Received: 01/31/2007] [Accepted: 07/04/2007] [Indexed: 01/02/2023]
Abstract
Background: Genetic influences underpinning complex traits are thought to involve multiple quantitative trait loci (QTLs) of small effect size. Detecting such QTL associations requires systematic screening of large numbers of DNA markers in large samples. Using pooled DNA on SNP microarrays to screen for allelic frequency differences between groups such as cases and controls (called SNP Microarray and Pooling, or SNP-MaP) has been validated as an efficient solution on both 10 k and 100 k platforms. We demonstrate that this approach can be effectively applied to the truly genomewide Affymetrix GeneChip® Mapping 500 K Array. Results: In comparisons between five independent DNA pools (N ≈ 200 per pool) on separate Affymetrix GeneChip® Mapping 500 K Array sets, we show that, for SNPs with minor allele frequencies > 0.05, the reliability of the rank order of estimated allele frequencies, assessed as the average correlation between allele frequency estimates across the DNA pools, was 0.948 (average mean difference across the five pools = 0.069). Similarly, the validity of the SNP-MaP approach was demonstrated by a rank-order correlation of 0.937 (average mean difference = 0.095) between the average DNA pool allele frequency estimates and the allele frequencies of an independent (CEPH) sample of 60 unrelated, individually genotyped subjects. Conclusion: We conclude that SNP-MaP can be extended for use on the Affymetrix GeneChip® Mapping 500 K Array, providing a cost-effective, reliable and valid initial screen of 500 K SNP microarrays in genomewide association scans.
Affiliation(s)
- Sophia J Docherty
- Social, Genetic and Developmental Psychiatry Centre, Box Number P082, Institute of Psychiatry, De Crespigny Park, London, SE5 8AF, UK
- Lee M Butcher
- Social, Genetic and Developmental Psychiatry Centre, Box Number P082, Institute of Psychiatry, De Crespigny Park, London, SE5 8AF, UK
- Leonard C Schalkwyk
- Social, Genetic and Developmental Psychiatry Centre, Box Number P082, Institute of Psychiatry, De Crespigny Park, London, SE5 8AF, UK
- Robert Plomin
- Social, Genetic and Developmental Psychiatry Centre, Box Number P082, Institute of Psychiatry, De Crespigny Park, London, SE5 8AF, UK
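The reliability figure above is an average correlation between the allele-frequency rank orders of independent pools. A sketch of that calculation follows, using Spearman's rank correlation (the Pearson correlation of tie-averaged ranks) averaged over all pairs of pools. Treating "rank-order reliability" as pairwise Spearman correlation is an assumption about the exact statistic the authors computed, and the pool values in the example are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))


def mean_pairwise_spearman(pools):
    """Average Spearman correlation over all pairs of pools.

    pools: one list of allele-frequency estimates per pool, same SNP order.
    """
    pairs = list(combinations(pools, 2))
    return sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)


# Hypothetical estimates for five SNPs in three pools
pools = [
    [0.10, 0.32, 0.55, 0.71, 0.20],
    [0.12, 0.30, 0.58, 0.69, 0.22],
    [0.09, 0.35, 0.52, 0.74, 0.18],
]
print(mean_pairwise_spearman(pools))
```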
25
Korol A, Frenkel Z, Cohen L, Lipkin E, Soller M. Fractioned DNA pooling: a new cost-effective strategy for fine mapping of quantitative trait loci. Genetics 2007; 176:2611-23. [PMID: 17603122 PMCID: PMC1950659 DOI: 10.1534/genetics.106.070011] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022]
Abstract
Selective DNA pooling (SDP) is a cost-effective means for an initial scan for linkage between marker and quantitative trait loci (QTL) in suitable populations. The method is based on scoring marker allele frequencies in DNA pools from the tails of the population trait distribution. Various analytical approaches have been proposed for QTL detection using data on multiple families with SDP analysis. This article presents a new experimental procedure, fractioned-pool design (FPD), aimed at increasing the reliability of SDP mapping results by "fractioning" the tails of the population distribution into independent subpools. FPD is a conceptual and structural modification of SDP that allows, for the first time, the use of permutation tests for QTL detection rather than relying on presumed asymptotic distributions of the test statistics. For family and cross mapping designs, we propose a spectrum of new tools for QTL mapping in FPD that were previously possible only with individual genotyping. These include: joint analysis of multiple families and multiple markers across a chromosome, even when the marker loci are only partly shared among families; detection of families segregating (heterozygous) for the QTL; estimation of confidence intervals for the QTL position; and analysis of multiple linked QTL. These advantages are of special importance for pooling analysis with SNP chips. Combining SNP microarray analysis with DNA pooling can dramatically reduce the cost of screening large numbers of SNPs on large samples, making chip technology readily applicable for genomewide association mapping in humans and farm animals. This extension, however, will require additional, nontrivial development of FPD analytical tools.
Affiliation(s)
- A Korol
- Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel.
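Because fractioned pooling yields several independent subpools per tail, their allele-frequency estimates can be compared by permutation rather than by an asymptotic test. The sketch below computes an exact two-sided permutation p-value for the difference in mean subpool frequency between the two tails at one marker. The statistic, the function name and the example values are illustrative assumptions, and the sketch covers only this simplest single-marker case, not the multi-family and multi-marker tools described by Korol et al.

```python
from itertools import combinations


def permutation_p_value(high, low):
    """Exact two-sided permutation p-value for the difference in mean subpool
    allele frequency between the high and low tails of the trait distribution."""
    values = list(high) + list(low)
    n_high = len(high)
    n_low = len(low)

    def mean_difference(indices):
        chosen = set(indices)
        group_1 = [values[i] for i in indices]
        group_2 = [values[i] for i in range(len(values)) if i not in chosen]
        return abs(sum(group_1) / n_high - sum(group_2) / n_low)

    observed = mean_difference(range(n_high))
    extreme = 0
    total = 0
    # Enumerate every way of labelling n_high of the subpools as "high tail"
    for indices in combinations(range(len(values)), n_high):
        # Small tolerance so relabellings exactly as extreme as the observed one count
        if mean_difference(indices) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical subpool frequencies for one marker, four subpools per tail
print(permutation_p_value([0.60, 0.62, 0.64, 0.66], [0.40, 0.42, 0.44, 0.46]))
```

With four subpools per tail there are only 70 relabellings, so the smallest attainable two-sided p-value with this statistic is 2/70 (about 0.029); more subpools give finer resolution.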
26
Hanson RL, Craig DW, Millis MP, Yeatts KA, Kobes S, Pearson JV, Lee AM, Knowler WC, Nelson RG, Wolford JK. Identification of PVT1 as a candidate gene for end-stage renal disease in type 2 diabetes using a pooling-based genome-wide single nucleotide polymorphism association study. Diabetes 2007; 56:975-83. [PMID: 17395743 DOI: 10.2337/db06-1072] [Citation(s) in RCA: 147] [Impact Index Per Article: 8.6] [Indexed: 12/21/2022]
Abstract
To identify genetic variants contributing to end-stage renal disease (ESRD) in type 2 diabetes, we performed a genome-wide analysis of 115,352 single nucleotide polymorphisms (SNPs) in pools of 105 unrelated case subjects with ESRD and 102 unrelated control subjects who have had type 2 diabetes for ≥10 years without macroalbuminuria. Using a sliding window statistic of ranked SNPs, we identified a 200-kb region on 8q24 harboring three SNPs showing substantial differences in allelic frequency between case and control pools. These SNPs were genotyped in the individuals comprising each pool, and strong evidence for association was found with rs2720709 (P = 0.000021; odds ratio 2.57 [95% CI 1.66-3.96]), which is located in the plasmacytoma variant translocation gene PVT1. We sequenced all exons, exon-intron boundaries, and the promoter of PVT1 and identified 47 variants, 11 of which represented nonredundant markers with minor allele frequency ≥0.05. We subsequently genotyped these 11 variants and an additional 87 SNPs identified through public databases in a 319-kb region flanking rs2720709 (approximately 1 SNP/3.5 kb); 23 markers were associated with ESRD at P < 0.01. The strongest evidence for association was found for rs2648875 (P = 0.0000018; odds ratio 2.97 [95% CI 1.90-4.65]), which maps to intron 8 of PVT1. Together, these results suggest that PVT1 may contribute to ESRD susceptibility in diabetes.
Affiliation(s)
- Robert L Hanson
- Translational Genomics Research Institute, 445 North Fifth St., Phoenix, AZ 85004, USA
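The sliding-window idea above can be sketched as follows: rank SNPs by the absolute case-control difference in pool allele frequency, flag the top-ranked fraction, and find the genomic window that contains the most flagged SNPs. The 200-kb default echoes the region size in the abstract, but the top-fraction threshold, the PooledSnp structure and the exact statistic are hypothetical choices, not the published method.

```python
from dataclasses import dataclass


@dataclass
class PooledSnp:
    name: str
    position: int         # base-pair coordinate; a single chromosome is assumed
    freq_cases: float     # allele frequency estimated in the case pool
    freq_controls: float  # allele frequency estimated in the control pool


def top_ranked(snps, top_fraction):
    """Names of SNPs in the top fraction by absolute case-control pool difference."""
    ranked = sorted(snps, key=lambda s: abs(s.freq_cases - s.freq_controls), reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return {s.name for s in ranked[:k]}


def best_window(snps, window_bp=200_000, top_fraction=0.01):
    """Window [start, start + window_bp) beginning at a SNP that holds the most
    top-ranked SNPs. Returns (start, end, count); the earliest window wins ties."""
    top = top_ranked(snps, top_fraction)
    by_pos = sorted(snps, key=lambda s: s.position)
    best = None
    for anchor in by_pos:
        end = anchor.position + window_bp
        hits = sum(1 for s in by_pos
                   if anchor.position <= s.position < end and s.name in top)
        if best is None or hits > best[2]:
            best = (anchor.position, end, hits)
    return best


# Hypothetical pooled data: three strongly differing SNPs cluster near 1 Mb
snps = [PooledSnp(f"snp{i}", pos, 0.30 + d, 0.30)
        for i, (pos, d) in enumerate([
            (0, 0.01), (300_000, 0.02), (600_000, 0.03), (1_000_000, 0.20),
            (1_050_000, 0.25), (1_100_000, 0.30), (2_000_000, 0.04),
            (2_500_000, 0.05), (3_000_000, 0.06), (3_500_000, 0.07)])]
print(best_window(snps, window_bp=200_000, top_fraction=0.35))
```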
27
Wilkening S, Chen B, Wirtenberger M, Burwinkel B, Försti A, Hemminki K, Canzian F. Allelotyping of pooled DNA with 250 K SNP microarrays. BMC Genomics 2007; 8:77. [PMID: 17367522 PMCID: PMC1839100 DOI: 10.1186/1471-2164-8-77] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 02/13/2007] [Accepted: 03/16/2007] [Indexed: 12/02/2022]
Abstract
Background: Genotyping technologies for whole-genome association studies are now available. To perform such studies at an affordable price, pooled DNA can be used. Recent studies have shown that GeneChip Human Mapping 10 K and 50 K arrays are suitable for estimating allele frequencies in pooled DNA. In the present study, we tested the accuracy of the 250 K Nsp array, which is part of the 500 K array set representing 500,568 SNPs. Furthermore, we compared different algorithms to estimate allele frequencies of pooled DNA. Results: We confirmed that polynomial-based probe-specific correction (PPC) was the most accurate method for allele frequency estimation. However, a simple k-correction, using the relative allele signal (RAS) of heterozygous individuals, performed only slightly worse and provided results for more SNPs. Using four replicates of the 250 K array and the k-correction using heterozygous RAS values, we obtained results for 104,141 SNPs. The correlation between estimated and real allele frequency was 0.983 and the average error was 0.046, which was comparable to the results obtained with the 10 K array. Furthermore, we showed how the estimation accuracy depended on the SNP type (average error for A/T SNPs: 0.043 and for G/C SNPs: 0.052). Conclusion: The combination of DNA pooling and analysis of single nucleotide polymorphisms (SNPs) on high-density microarrays is a promising tool for whole-genome association studies.
Affiliation(s)
- Stefan Wilkening
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Bowang Chen
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Michael Wirtenberger
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Barbara Burwinkel
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Helmholtz University Group Molecular Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Asta Försti
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Center for Family Medicine, Karolinska Institute, SE-14183 Huddinge, Sweden
- Kari Hemminki
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Center for Family Medicine, Karolinska Institute, SE-14183 Huddinge, Sweden
- Federico Canzian
- Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
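The heterozygote-based k-correction mentioned above can be written compactly. If heterozygotes (a 1:1 template) give a mean relative allele signal RAS_het = A/(A + B), then k = RAS_het/(1 - RAS_het) estimates the A:B signal ratio for equal template, and a pool's allele-A frequency is estimated as A/(A + kB), which in terms of RAS is RAS/(RAS + k(1 - RAS)). The function names are illustrative, and averaging RAS across heterozygotes and across replicate arrays before correcting is an assumption.

```python
from statistics import mean


def relative_allele_signal(signal_a: float, signal_b: float) -> float:
    """RAS = A / (A + B) for one probe set."""
    return signal_a / (signal_a + signal_b)


def k_from_heterozygotes(het_ras_values):
    """k = mean heterozygote RAS / (1 - mean heterozygote RAS)."""
    r = mean(het_ras_values)
    return r / (1 - r)


def k_corrected_freq(pool_ras: float, k: float) -> float:
    """Allele-A frequency A / (A + kB), rewritten in terms of RAS = A / (A + B)."""
    return pool_ras / (pool_ras + k * (1 - pool_ras))


# Made-up example: allele A hybridizes 1.5x more strongly than allele B
k = k_from_heterozygotes([relative_allele_signal(150.0, 100.0)])
# A pool with true frequency 0.3, measured on four replicate arrays
replicates = [relative_allele_signal(1.5 * 0.3, 0.7)] * 4
print(k, k_corrected_freq(mean(replicates), k))
```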
28
Macgregor S. Most pooling variation in array-based DNA pooling is attributable to array error rather than pool construction error. Eur J Hum Genet 2007; 15:501-4. [PMID: 17264871 DOI: 10.1038/sj.ejhg.5201768] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Indexed: 11/08/2022]
Abstract
Genome-wide association (GWA) approaches are important in complex disease gene mapping studies but are often prohibitively expensive. Array-based DNA pooling has been shown to offer substantial cost savings compared with individual genotyping. This reduced cost potentially brings well-powered GWA studies within the reach of most laboratories. The main factor that affects the efficiency of pooling compared with individual genotyping is the magnitude of the pooling error variance. By examining variation between and within pools, it is shown that most of the error associated with pooling is attributable to array variation rather than pool construction variation (assuming the pools are not small and are accurately constructed). With the Affymetrix HindIII 50K arrays used here, the array-specific variance is seven times the pool construction variance. This has important implications for optimal study design for array-based pooling. Given carefully constructed pools, resources should be allocated to increasing the number of arrays per sample rather than to constructing multiple pools.
Affiliation(s)
- Stuart Macgregor
- Genetic Epidemiology, Queensland Institute of Medical Research, Brisbane, Australia.
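The design implication above follows from a simple variance-components model. If a sample's estimate averages n_pools independently constructed pools, each hybridized to arrays_per_pool arrays, and construction and array errors are independent, then Var = σ²_construction/n_pools + σ²_array/(n_pools × arrays_per_pool). The sketch below encodes that model, which is an assumption rather than the paper's exact formulation, and uses the 7:1 array-to-construction variance ratio reported in the abstract.

```python
def pool_estimate_variance(var_construction: float, var_array: float,
                           n_pools: int = 1, arrays_per_pool: int = 1) -> float:
    """Error variance of an allele-frequency estimate averaged over pools and arrays.

    Assumes independent, additive pool-construction and array errors.
    """
    return var_construction / n_pools + var_array / (n_pools * arrays_per_pool)


def array_share(var_construction, var_array, n_pools=1, arrays_per_pool=1):
    """Fraction of the total error variance contributed by array noise."""
    total = pool_estimate_variance(var_construction, var_array, n_pools, arrays_per_pool)
    return (var_array / (n_pools * arrays_per_pool)) / total


# Construction variance set to 1 unit, array variance 7 units (ratio from the abstract)
for arrays in (1, 2, 4, 8):
    print(arrays, pool_estimate_variance(1.0, 7.0, 1, arrays), array_share(1.0, 7.0, 1, arrays))
```

Under this model, going from one to four arrays on the same pool cuts the variance from 8 to 2.75 units, whereas the construction term of 1 unit sets a floor that only additional pools can lower.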
29
Pearson JV, Huentelman MJ, Halperin RF, Tembe WD, Melquist S, Homer N, Brun M, Szelinger S, Coon KD, Zismann VL, Webster JA, Beach T, Sando SB, Aasly JO, Heun R, Jessen F, Kolsch H, Tsolaki M, Daniilidou M, Reiman EM, Papassotiropoulos A, Hutton ML, Stephan DA, Craig DW. Identification of the genetic basis for complex disorders by use of pooling-based genomewide single-nucleotide-polymorphism association studies. Am J Hum Genet 2007; 80:126-39. [PMID: 17160900 PMCID: PMC1785308 DOI: 10.1086/510686] [Citation(s) in RCA: 126] [Impact Index Per Article: 7.4] [Received: 08/21/2006] [Accepted: 11/07/2006] [Indexed: 01/06/2023]
Abstract
We report the development and validation of experimental methods, study designs, and analysis software for pooling-based genomewide association (GWA) studies that use high-throughput single-nucleotide-polymorphism (SNP) genotyping microarrays. We first describe a theoretical framework for establishing the effectiveness of pooling genomic DNA as a low-cost alternative to individually genotyping thousands of samples on high-density SNP microarrays. Next, we describe software called "GenePool," which directly analyzes SNP microarray probe intensity data and ranks SNPs by increased likelihood of being genetically associated with a trait or disorder. Finally, we apply these methods to experimental case-control data and demonstrate successful identification of published genetic susceptibility loci for a rare monogenic disease (sudden infant death with dysgenesis of the testes syndrome), a rare complex disease (progressive supranuclear palsy), and a common complex disease (Alzheimer disease) across multiple SNP genotyping platforms. On the basis of these theoretical calculations and their experimental validation, our results suggest that pooling-based GWA studies are a logical first step for determining whether major genetic associations exist in diseases with high heritability.
Affiliation(s)
- John V Pearson
- Translational Genomics Research Institute, Phoenix, AZ, 85004, USA
30
Macgregor S, Visscher PM, Montgomery G. Analysis of pooled DNA samples on high density arrays without prior knowledge of differential hybridization rates. Nucleic Acids Res 2006; 34:e55. [PMID: 16627870 PMCID: PMC1440945 DOI: 10.1093/nar/gkl136] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Indexed: 11/12/2022]
Abstract
Array based DNA pooling techniques facilitate genome-wide scale genotyping of large samples. We describe a structured analysis method for pooled data using internal replication information in large scale genotyping sets. The method takes advantage of information from single nucleotide polymorphisms (SNPs) typed in parallel on a high density array to construct a test statistic with desirable statistical properties. We utilize a general linear model to appropriately account for the structured multiple measurements available with array data. The method does not require the use of additional arrays for the estimation of unequal hybridization rates and hence scales readily to accommodate arrays with several hundred thousand SNPs. Tests for differences between cases and controls can be conducted with very few arrays. We demonstrate the method on 384 endometriosis cases and controls, typed using Affymetrix Genechip© HindIII 50 K arrays. For a subset of this data there were accurate measures of hybridization rates available. Assuming equal hybridization rates is shown to have a negligible effect upon the results. With a total of only six arrays, the method extracted one-third of the information (in terms of equivalent sample size) available with individual genotyping (requiring 768 arrays). With 20 arrays (10 for cases, 10 for controls), over half of the information could be extracted from this sample.
Affiliation(s)
- Stuart Macgregor
- Genetic Epidemiology, Queensland Institute of Medical Research, Brisbane, Australia.