1
Taslima K, Wehner S, Taggart JB, de Verdal H, Benzie JAH, Bekaert M, McAndrew BJ, Penman DJ. Sex determination in the GIFT strain of tilapia is controlled by a locus in linkage group 23. BMC Genet 2020; 21:49. [PMID: 32349678 PMCID: PMC7189693 DOI: 10.1186/s12863-020-00853-3] [Received: 01/21/2020] [Accepted: 04/15/2020]
Abstract
Background Tilapias (Family Cichlidae) are the second most important group of aquaculture species in the world. They have been the subject of much research on sex determination due to problems caused by early maturation in culture and their complex sex-determining systems. Different sex-determining loci (linkage groups 1, 20 and 23) have been detected in various tilapia stocks. The ‘genetically improved farmed tilapia’ (GIFT) stock, founded from multiple Nile tilapia (Oreochromis niloticus) populations, with some likely to have been introgressed with O. mossambicus, is a key resource for tilapia aquaculture. The sex-determining mechanism in the GIFT stock was unknown, but potentially complicated due to its multiple origins. Results A bulk segregant analysis (BSA) version of double-digest restriction-site associated DNA sequencing (BSA-ddRADseq) was developed and used to detect and position sex-linked single nucleotide polymorphism (SNP) markers in 19 families from the GIFT strain breeding nucleus and two Stirling families as controls (a single XY locus had been previously mapped to LG1 in the latter). About 1500 SNPs per family were detected across the genome. Phenotypic sex in Stirling families showed strong association with LG1, whereas only SNPs located in LG23 showed clear association with sex in the majority of the GIFT families. No other genomic regions linked to sex determination were apparent. This region was validated using a series of LG23-specific DNA markers (five SNPs with highest association to sex from this study, the LG23 sex-associated microsatellites UNH898 and ARO172, and the recently isolated amhy marker) for individual fish (n = 284). Conclusions Perhaps surprisingly given its multiple origins, sex determination in the GIFT strain breeding nucleus was associated only with a locus in LG23. BSA-ddRADseq allowed cost-effective analysis of multiple families, strengthening this conclusion. This technique has potential to be applied to other complex traits. The sex-linked SNP markers identified will be useful for potential marker-assisted selection (MAS) to control sex-ratio in GIFT tilapia to suppress unwanted reproduction during growout.
Affiliation(s)
- Khanam Taslima
- Institute of Aquaculture, Faculty of Natural Sciences, University of Stirling, Stirling, Scotland, UK; Department of Fisheries Biology and Genetics, Bangladesh Agricultural University, Mymensingh, 2202, Bangladesh
- Stefanie Wehner
- Institute of Aquaculture, Faculty of Natural Sciences, University of Stirling, Stirling, Scotland, UK
- John B Taggart
- Institute of Aquaculture, Faculty of Natural Sciences, University of Stirling, Stirling, Scotland, UK
- Hugues de Verdal
- WorldFish Centre, Jalan Batu Maung, Bayan Lepas, Penang, Malaysia; CIRAD, UMR ISEM, F-34398 Montpellier, France; ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- John A H Benzie
- WorldFish Centre, Jalan Batu Maung, Bayan Lepas, Penang, Malaysia; School of Biological Earth and Environmental Sciences, University College Cork, Cork, Ireland
- Michaël Bekaert
- Institute of Aquaculture, Faculty of Natural Sciences, University of Stirling, Stirling, Scotland, UK
- Brendan J McAndrew
- Institute of Aquaculture, Faculty of Natural Sciences, University of Stirling, Stirling, Scotland, UK
- David J Penman
- Institute of Aquaculture, Faculty of Natural Sciences, University of Stirling, Stirling, Scotland, UK
2
Alexandre PA, Porto-Neto LR, Karaman E, Lehnert SA, Reverter A. Pooled genotyping strategies for the rapid construction of genomic reference populations. J Anim Sci 2019; 97:4761-4769. [PMID: 31710679 DOI: 10.1093/jas/skz344] [Received: 07/26/2019] [Accepted: 11/06/2019]
Abstract
The growing concern with the environment is making it important for livestock producers to focus on selection for efficiency-related traits, which is a challenge for commercial cattle herds due to the lack of pedigree information. To explore a cost-effective opportunity for genomic evaluations of commercial herds, this study compared the accuracy of bulls' genomic estimated breeding values (GEBV) using different pooled genotype strategies. We used ten replicates of previously simulated genomic and phenotypic data for one low (t1) and one moderate (t2) heritability trait of 200 sires and 2,200 progeny. Sires' GEBV were calculated using a univariate mixed model, with a hybrid genomic relationship matrix (h-GRM) relating sires to: 1) 1,100 pools of 2 animals; 2) 440 pools of 5 animals; 3) 220 pools of 10 animals; 4) 110 pools of 20 animals; 5) 88 pools of 25 animals; 6) 44 pools of 50 animals; and 7) 22 pools of 100 animals. Pooling criteria were: at random, grouped sorting by t1, grouped sorting by t2, and grouped sorting by a combination of t1 and t2. The same criteria were used to select 110, 220, 440, and 1,100 individual genotypes for GEBV calculation to compare GEBV accuracy using the same number of individual genotypes and pools. Although the best accuracy was achieved for a given trait when pools were grouped based on that same trait (t1: 0.50-0.56, t2: 0.66-0.77), pooling by one trait impacted negatively on the accuracy of GEBV for the other trait (t1: 0.25-0.46, t2: 0.29-0.71). Therefore, the combined measure may be a feasible alternative to use the same pools to calculate GEBVs for both traits (t1: 0.45-0.57, t2: 0.62-0.76). Pools of 10 individuals were identified as representing a good compromise between loss of accuracy (~10%-15%) and cost savings (~90%) from genotype assays. In addition, we demonstrated that in more than 90% of the simulations, pools present higher sires' GEBV accuracy than individual genotypes when the number of genotype assays is limited (i.e., 110 or 220) and animals are assigned to pools based on phenotype. Pools assigned at random presented the poorest results (t1: 0.07-0.45, t2: 0.14-0.70). In conclusion, pooling by phenotype is the best approach to implementing genomic evaluation using commercial herd data, particularly when pools of 10 individuals are evaluated. While combining phenotypes seems a promising strategy to allow more flexibility to the estimates made using pools, more studies are necessary in this regard.
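The pool counts and the roughly 90% assay saving quoted in this abstract follow from simple arithmetic on the 2,200 progeny. A minimal sketch, assuming every pool is full and counting only progeny assays (the 200 sires are left out); the function name `pool_design` is hypothetical and does not come from the paper:

```python
# Numbers taken from the abstract: 2,200 progeny and seven pool sizes.
N_PROGENY = 2200
POOL_SIZES = [2, 5, 10, 20, 25, 50, 100]


def pool_design(n_animals, pool_size):
    """Return (number of pools, fraction of genotype assays saved).

    Assumes one assay per pool versus one assay per animal.
    """
    if n_animals % pool_size != 0:
        raise ValueError("pool size must divide the number of animals")
    n_pools = n_animals // pool_size
    saving = 1 - n_pools / n_animals
    return n_pools, saving


designs = {size: pool_design(N_PROGENY, size) for size in POOL_SIZES}
for size, (n_pools, saving) in designs.items():
    print(f"{n_pools:>5} pools of {size:>3}: {saving:.0%} fewer assays")
```

Under these assumptions, pools of 10 need 220 assays instead of 2,200, which is the ~90% saving the abstract weighs against the ~10%-15% loss of accuracy.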
Affiliation(s)
- Pâmela A Alexandre
- Agriculture & Food, Commonwealth Scientific and Industrial Research Organization, Brisbane, QLD, Australia
- Laercio R Porto-Neto
- Agriculture & Food, Commonwealth Scientific and Industrial Research Organization, Brisbane, QLD, Australia
- Emre Karaman
- Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Sigrid A Lehnert
- Agriculture & Food, Commonwealth Scientific and Industrial Research Organization, Brisbane, QLD, Australia
- Antonio Reverter
- Agriculture & Food, Commonwealth Scientific and Industrial Research Organization, Brisbane, QLD, Australia
3
Kindt AS, Fuerst RW, Knoop J, Laimighofer M, Telieps T, Hippich M, Woerheide MA, Wahl S, Wilson R, Sedlmeier EM, Hommel A, Todd JA, Krumsiek J, Ziegler AG, Bonifacio E. Allele-specific methylation of type 1 diabetes susceptibility genes. J Autoimmun 2018; 89:63-74. [DOI: 10.1016/j.jaut.2017.11.008] [Received: 09/05/2017] [Revised: 11/23/2017] [Accepted: 11/25/2017]
4
Hellicar AD, Rahman A, Smith DV, Henshall JM. Machine learning approach for pooled DNA sample calibration. BMC Bioinformatics 2015; 16:214. [PMID: 26156142 PMCID: PMC4495942 DOI: 10.1186/s12859-015-0593-1] [Received: 07/23/2014] [Accepted: 04/23/2015]
Abstract
BACKGROUND Despite ongoing reduction in genotyping costs, genomic studies involving large numbers of species with low economic value (such as Black Tiger prawns) remain cost prohibitive. In this scenario DNA pooling is an attractive option to reduce genotyping costs. However, genotyping of pooled samples comprising DNA from many individuals is challenging due to the presence of errors that exceed the allele frequency quantisation size and therefore cannot be simply corrected by clustering techniques. The solution to the calibration problem is a correction to the allele frequency to mitigate errors incurred in the measurement process. We highlight the limitations of the existing calibration solutions such as the fact they impose assumptions on the variation between allele frequencies 0, 0.5, and 1.0, and address a limited set of error types. We propose a novel machine learning method to address the limitations identified. RESULTS The approach is tested on SNPs genotyped with the Sequenom iPLEX platform and compared to existing state of the art calibration methods. The new method is capable of reducing the mean square error in allele frequency to half that achievable with existing approaches. Furthermore for the first time we demonstrate the importance of carefully considering the choice of training data when using calibration approaches built from pooled data. CONCLUSION This paper demonstrates that improvements in pooled allele frequency estimates result if the genotyping platform is characterised at allele frequencies other than the homozygous and heterozygous cases. Techniques capable of incorporating such information are described along with aspects of implementation.
Affiliation(s)
- Andrew D Hellicar
- CSIRO Computational Informatics, Castray Esplanade, Hobart, Australia.
- Ashfaqur Rahman
- CSIRO Computational Informatics, Castray Esplanade, Hobart, Australia.
- Daniel V Smith
- CSIRO Computational Informatics, Castray Esplanade, Hobart, Australia.
5
Chevalier FD, Valentim CLL, LoVerde PT, Anderson TJC. Efficient linkage mapping using exome capture and extreme QTL in schistosome parasites. BMC Genomics 2014; 15:617. [PMID: 25048426 PMCID: PMC4117968 DOI: 10.1186/1471-2164-15-617] [Received: 03/10/2014] [Accepted: 07/14/2014]
Abstract
Background Identification of parasite genes that underlie traits such as drug resistance and host specificity is challenging using classical linkage mapping approaches. Extreme QTL (X-QTL) methods, originally developed by rodent malaria and yeast researchers, promise to increase the power and simplify logistics of linkage mapping in experimental crosses of schistosomes (or other helminth parasites), because many 1000s of progeny can be analysed, phenotyping is not required, and progeny pools rather than individuals are genotyped. We explored the utility of this method for mapping a drug resistance gene in the human parasitic fluke Schistosoma mansoni. Results We staged a genetic cross between oxamniquine sensitive and resistant parasites, then between two F1 progeny, to generate multiple F2 progeny. One group of F2s infecting hamsters was treated with oxamniquine, while a second group was left untreated. We used exome capture to reduce the size of the genome (from 363 Mb to 15 Mb) and exomes from pooled F2 progeny (treated males, untreated males, treated females, untreated females) and the two parent parasites were sequenced to high read depth (mean = 95-366×) and allele frequencies at 14,489 variants compared. We observed dramatic enrichment of alleles from the resistant parent in a small region of chromosome 6 in drug-treated male and female pools (combined analysis: = 11.07, p = 8.74 × 10⁻²⁹). This region contains Smp_089320, a gene encoding a sulfotransferase recently implicated in oxamniquine resistance using classical linkage mapping methods. Conclusions These results (a) demonstrate the utility of exome capture for generating reduced representation libraries in Schistosoma mansoni, and (b) provide proof-of-principle that X-QTL methods can be successfully applied to an important human helminth. The combination of these methods will simplify linkage analysis of biomedically or biologically important traits in this parasite. 
Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-617) contains supplementary material, which is available to authorized users.
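Comparing allele frequencies between a treated and an untreated pool, as this abstract describes, can be illustrated with a two-proportion Z-test on read counts. This is a generic sketch and not necessarily the statistic the authors used; the function name and the example counts are hypothetical, and treating every read as an independent draw is an assumption that tends to overstate significance when depth exceeds the number of chromosomes in the pool.

```python
import math


def pooled_allele_z(alt_treated, depth_treated, alt_control, depth_control):
    """Two-proportion Z-test on read counts from two DNA pools.

    Returns (z, two-sided p-value). Assumes independent reads.
    """
    p_treated = alt_treated / depth_treated
    p_control = alt_control / depth_control
    # Allele frequency under the null hypothesis of no difference.
    p_null = (alt_treated + alt_control) / (depth_treated + depth_control)
    se = math.sqrt(p_null * (1 - p_null) * (1 / depth_treated + 1 / depth_control))
    if se == 0:
        return 0.0, 1.0  # allele fixed or absent in both pools
    z = (p_treated - p_control) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Made-up counts: resistant-parent allele on 180/200 reads in the treated
# pool versus 100/200 reads in the untreated pool.
z, p = pooled_allele_z(180, 200, 100, 200)
print(f"z = {z:.2f}, p = {p:.2e}")
```

In a genome scan the same test would be applied at every variant, and a linked region shows up as a run of adjacent variants with large statistics.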
Affiliation(s)
- Timothy J C Anderson
- Department of Genetics, Texas Biomedical Research Institute, P.O. Box 760549, 78245 San Antonio, Texas, USA.
6
Analysis and optimization of bulk DNA sampling with binary scoring for germplasm characterization. PLoS One 2013; 8:e79936. [PMID: 24260321 PMCID: PMC3833943 DOI: 10.1371/journal.pone.0079936] [Received: 01/05/2013] [Accepted: 10/05/2013]
Abstract
The strategy of bulk DNA sampling has been a valuable method for studying large numbers of individuals through genetic markers. The application of this strategy for discrimination among germplasm sources was analyzed through information theory, considering the case of polymorphic alleles scored binarily for their presence or absence in DNA pools. We defined the informativeness of a set of marker loci in bulks as the mutual information between genotype and population identity, composed by two terms: diversity and noise. The first term is the entropy of bulk genotypes, whereas the noise term is measured through the conditional entropy of bulk genotypes given germplasm sources. Thus, optimizing marker information implies increasing diversity and reducing noise. Simple formulas were devised to estimate marker information per allele from a set of estimated allele frequencies across populations. As an example, they allowed optimization of bulk size for SSR genotyping in maize, from allele frequencies estimated in a sample of 56 maize populations. It was found that a sample of 30 plants from a random mating population is adequate for maize germplasm SSR characterization. We analyzed the use of divided bulks to overcome the allele dilution problem in DNA pools, and concluded that samples of 30 plants divided into three bulks of 10 plants are efficient to characterize maize germplasm sources through SSR with a good control of the dilution problem. We estimated the informativeness of 30 SSR loci from the estimated allele frequencies in maize populations, and found a wide variation of marker informativeness, which positively correlated with the number of alleles per locus.
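The "diversity minus noise" decomposition described in this abstract can be sketched for a single allele scored as present or absent in a bulk. The sketch below is not the paper's formula: it assumes equally likely populations, random mating, diploid plants, and perfect detection (no dilution effect), so an allele with frequency p is present in a bulk of n plants with probability 1 - (1 - p)^(2n). All function names are hypothetical.

```python
import math


def binary_entropy(q):
    """Entropy in bits of a presence/absence score with P(present) = q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def presence_probability(p, bulk_size):
    """P(allele appears in a bulk of `bulk_size` diploid plants)."""
    return 1 - (1 - p) ** (2 * bulk_size)


def allele_information(freqs, bulk_size):
    """Mutual information (bits) between bulk score and population identity.

    `freqs` holds the allele frequency in each population. Diversity is the
    entropy of the score across populations; noise is the average entropy
    of the score within a population.
    """
    q = [presence_probability(p, bulk_size) for p in freqs]
    diversity = binary_entropy(sum(q) / len(q))
    noise = sum(binary_entropy(x) for x in q) / len(q)
    return diversity - noise


# An allele fixed in one population and absent in the other is fully
# informative; an allele with the same frequency everywhere carries none.
print(allele_information([0.0, 1.0], bulk_size=10))
print(allele_information([0.3, 0.3], bulk_size=10))
```

Evaluating `allele_information` over a range of `bulk_size` values for a set of estimated allele frequencies is one way to explore the bulk-size trade-off the abstract discusses.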
7
Guo Y, Cai Q, Li C, Li J, Courtney R, Zheng W, Long J. An evaluation of allele frequency estimation accuracy using pooled sequencing data. Int J Comput Biol Drug Des 2013; 6:279-93. [PMID: 24088264 DOI: 10.1504/ijcbdd.2013.056709]
Abstract
Next generation sequencing technology has matured, and with its current affordability, will replace the SNP chip as the genotyping tool of choice. Even with the current affordability of NGS, large scale studies will require careful study design to reduce cost. In this study, we designed an experiment to assess the accuracy of allele frequency estimated from pooled sequencing data. We compared the allele frequency estimated from sequencing data with the allele frequency estimated from individual SNP chip data and observed high correlations between them. However, by calculating error rate, we found that many SNPs had their allele frequency estimated from sequencing data significantly different from allele frequency estimated from SNP chip data. In conclusion, we found correlation is not an ideal measurement for comparing allele frequencies. And for the purpose of estimating allele frequency, we do not recommend using pooling with NGS as a cheaper alternative to genotype each sample individually.
Affiliation(s)
- Yan Guo
- Department of Cancer Biology, Vanderbilt University, Nashville TN 37232, USA
8
Evaluation of allele frequency estimation using pooled sequencing data simulation. ScientificWorldJournal 2013; 2013:895496. [PMID: 23476151 PMCID: PMC3582166 DOI: 10.1155/2013/895496] [Received: 11/15/2012] [Accepted: 12/30/2012]
Abstract
Next-generation sequencing (NGS) technology has provided researchers with opportunities to study the genome in unprecedented detail. In particular, NGS is applied to disease association studies. Unlike genotyping chips, NGS is not limited to a fixed set of SNPs. Prices for NGS are now comparable to the SNP chip, although for large studies the cost can be substantial. Pooling techniques are often used to reduce the overall cost of large-scale studies. In this study, we designed a rigorous simulation model to test the practicability of estimating allele frequency from pooled sequencing data. We took crucial factors into consideration, including pool size, overall depth, average depth per sample, pooling variation, and sampling variation. We used real data to demonstrate and measure reference allele preference in DNAseq data and implemented this bias in our simulation model. We found that pooled sequencing data can introduce high levels of relative error rate (defined as error rate divided by targeted allele frequency) and that the error rate is more severe for low minor allele frequency SNPs than for high minor allele frequency SNPs. In order to overcome the error introduced by pooling, we recommend a large pool size and high average depth per sample.
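The two findings in this abstract (larger relative error at low allele frequency, and improvement with pool size and depth) can be reproduced with an idealized two-stage sampling model. This is a simplified sketch and not the authors' simulation: it assumes diploid samples contributing equal DNA, independent reads, and no reference-allele bias, and it uses the standard deviation of the estimate as the error measure. The function name is hypothetical.

```python
import math


def relative_error(p, pool_size, depth):
    """Standard deviation of the pooled estimate divided by the target p.

    Stage 1: 2 * pool_size chromosomes are sampled from a population with
    allele frequency p. Stage 2: `depth` reads are sampled from the pool.
    The variance of the read-based frequency is then
    p(1-p) * [1/(2N) + (1 - 1/(2N)) / depth].
    """
    chromosomes = 2 * pool_size
    variance = p * (1 - p) * (1 / chromosomes + (1 - 1 / chromosomes) / depth)
    return math.sqrt(variance) / p


for maf in (0.05, 0.4):
    small = relative_error(maf, pool_size=50, depth=1000)
    large = relative_error(maf, pool_size=500, depth=10000)
    print(f"MAF {maf}: small design {small:.3f}, large design {large:.3f}")
```

The pooling variation and reference-allele preference that the study also models would add further error on top of this sampling-only baseline.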
9
Danaher MR, Schisterman EF, Roy A, Albert PS. Estimation of gene-environment interaction by pooling biospecimens. Stat Med 2012; 31:3241-52. [PMID: 22859290 DOI: 10.1002/sim.5357] [Received: 06/21/2011] [Accepted: 02/08/2012]
Abstract
Case-control studies are prone to low power for testing gene-environment interactions (GXE) given the need for a sufficient number of individuals in each stratum of disease, gene, and environment. We propose a new study design to increase power by strategically pooling biospecimens. Pooling biospecimens allows us to increase the number of subjects significantly, thereby providing substantial increase in power. We focus on a special, although realistic case, where disease and environmental statuses are binary, and gene status is ordinal with each individual having 0, 1, or 2 minor alleles. Through pooling, we obtain an allele frequency for each level of disease and environmental status. Using the allele frequencies, we develop a new methodology for estimating and testing GXE that is comparable to the situation when we have complete data on gene status for each individual. We also explore the measurement process and its effect on the GXE estimator. Using an illustration, we show the effectiveness of pooling with an epidemiologic study, which tests an interaction for fiber and paraoxonase on anovulation. Through simulation, we show that taking 12 pooled measurements from 1000 individuals achieves more power than individually genotyping 500 individuals. Our findings suggest that strategic pooling should be considered when an investigator designs a pilot study to test for a GXE.
Affiliation(s)
- M R Danaher
- Division of Epidemiology, Statistics and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Rockville, MD, U.S.A
10
Upadhyaya HD, Wang YH, Sharma S, Singh S. Association mapping of height and maturity across five environments using the sorghum mini core collection. Genome 2012; 55:471-9. [PMID: 22680231 DOI: 10.1139/g2012-034]
Abstract
Sorghum is a potential energy crop thanks to its high biomass productivity and low input. Biomass yield in sorghum is defined by height and maturity. To develop molecular breeding tools for genetic improvement of these two traits, we have identified simple sequence repeat markers linked to height and maturity using a pool-based association mapping technique. The sorghum mini core collection was evaluated across five environments for height and maturity. Seven tall and seven short accessions were selected based on their height in all environments. Likewise, six early- and 10 late-maturing accessions were selected mostly based on their maturity in two post-rainy seasons. Two additional height pools were constructed based on phenotypes in one environment. The three pairs of pools were screened with 703 SSR markers and 39 polymorphic markers were confirmed by individual genotyping. Association mapping of the 39 markers with 242 accessions from the mini core collection identified five markers associated with maturity or height. All were clustered on chromosomes 6, 9, and 10 with previously mapped height and maturity markers or QTLs. One marker associated with both height and maturity was 84 kb from recently cloned Ma1. These markers will lay a foundation for identifying additional height and maturity genes in sorghum.
Affiliation(s)
- Hari D Upadhyaya
- Gene Bank, International Crops Research Institute for the Semi Arid Tropics, Patancheru 502 324, Andhra Pradesh, India
11
Oxidative stress survival in a clinical Saccharomyces cerevisiae isolate is influenced by a major quantitative trait nucleotide. Genetics 2011; 188:709-22. [PMID: 21515583 DOI: 10.1534/genetics.111.128256]
Abstract
One of the major challenges in characterizing eukaryotic genetic diversity is the mapping of phenotypes that are the cumulative effect of multiple alleles. We have investigated tolerance of oxidative stress in the yeast Saccharomyces cerevisiae, a trait showing phenotypic variation in the population. Initial crosses identified that this is a quantitative trait. Microorganisms experience oxidative stress in many environments, including during infection of higher eukaryotes. Natural variation in oxidative stress tolerance is an important aspect of response to oxidative stress exerted by the human immune system and an important trait in microbial pathogens. A clinical isolate of the usually benign yeast S. cerevisiae was found to survive oxidative stress significantly better than the laboratory strain. We investigated the genetic basis of increased peroxide survival by crossing those strains, phenotyping 1500 segregants, and genotyping of high-survival segregants by hybridization of bulk and single segregant DNA to microarrays. This effort has led to the identification of an allele of the transcription factor Rds2 as contributing to stress response. Rds2 has not previously been associated with the survival of oxidative stress. The identification of its role in the oxidative stress response here is an example of a specific trait that appears to be beneficial to Saccharomyces cerevisiae when growing as a pathogen. Understanding the role of this fungal-specific transcription factor in pathogenicity will be important in deciphering how fungi infect and colonize the human host and could eventually lead to a novel drug target.
12
Bercovici S, Geiger D. Admixture Aberration Analysis: Application to Mapping in Admixed Population Using Pooled DNA. J Comput Biol 2011; 18:237-49. [DOI: 10.1089/cmb.2010.0250]
Affiliation(s)
- Sivan Bercovici
- Computer Science Department, Technion–Israel Institute of Technology, Haifa, Israel
- Dan Geiger
- Computer Science Department, Technion–Israel Institute of Technology, Haifa, Israel
13
Yang HC, Lin HC, Huang MC, Li LH, Pan WH, Wu JY, Chen YT. A new analysis tool for individual-level allele frequency for genomic studies. BMC Genomics 2010; 11:415. [PMID: 20602748 PMCID: PMC2996943 DOI: 10.1186/1471-2164-11-415] [Received: 03/01/2010] [Accepted: 07/05/2010]
Abstract
Background Allele frequency is one of the most important population indices and has been broadly applied to genetic/genomic studies. Estimation of allele frequency using genotypes is convenient but may lose data information and be sensitive to genotyping errors. Results This study utilizes a unified intensity-measuring approach to estimating individual-level allele frequencies for 1,104 and 1,270 samples genotyped with the single-nucleotide-polymorphism arrays of the Affymetrix Human Mapping 100K and 500K Sets, respectively. Allele frequencies of all samples are estimated and adjusted by coefficients of preferential amplification/hybridization (CPA), and large ethnicity-specific and cross-ethnicity databases of CPA and allele frequency are established. The results show that using the CPA significantly improves the accuracy of allele frequency estimates; moreover, this paramount factor is insensitive to the time of data acquisition, effect of laboratory site, type of gene chip, and phenotypic status. Based on accurate allele frequency estimates, analytic methods based on individual-level allele frequencies are developed and successfully applied to discover genomic patterns of allele frequencies, detect chromosomal abnormalities, classify sample groups, identify outlier samples, and estimate the purity of tumor samples. The methods are packaged into a new analysis tool, ALOHA (Allele-frequency/Loss-of-heterozygosity/Allele-imbalance). Conclusions This is the first time that these important genetic/genomic applications have been simultaneously conducted by the analyses of individual-level allele frequencies estimated by a unified intensity-measuring approach. We expect that additional practical applications for allele frequency analysis will be found. The developed databases and tools provide useful resources for human genome analysis via high-throughput single-nucleotide-polymorphism arrays. 
The ALOHA software was written in R and R GUI and can be downloaded at http://www.stat.sinica.edu.tw/hsinchou/genetics/aloha/ALOHA.htm.
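The abstract does not give the form of the CPA adjustment. A correction commonly used in the pooled-DNA literature estimates the coefficient as the mean A/B intensity ratio in heterozygotes, where the true ratio is 1:1, and rescales the B signal accordingly. The sketch below illustrates that generic form under this assumption; it is not a description of what ALOHA implements, and the function names and intensity values are hypothetical.

```python
def estimate_cpa(het_a, het_b):
    """Mean A/B intensity ratio across known heterozygous samples."""
    ratios = [a / b for a, b in zip(het_a, het_b)]
    return sum(ratios) / len(ratios)


def adjusted_allele_frequency(intensity_a, intensity_b, cpa=1.0):
    """Frequency of allele A after rescaling the B signal by the CPA.

    With cpa = 1.0 this reduces to the raw intensity fraction A / (A + B).
    """
    return intensity_a / (intensity_a + cpa * intensity_b)


# Made-up intensities: allele A hybridizes 1.5 times more strongly than B.
cpa = estimate_cpa(het_a=[1200.0, 1500.0], het_b=[800.0, 1000.0])
raw = adjusted_allele_frequency(1200.0, 800.0)
corrected = adjusted_allele_frequency(1200.0, 800.0, cpa)
print(f"CPA = {cpa:.2f}, raw = {raw:.2f}, corrected = {corrected:.2f}")
```

For a true heterozygote the raw fraction is pulled away from 0.5 by the stronger A signal, and the rescaled estimate returns to 0.5.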
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan.
14
Knight J, Saccone SF, Zhang Z, Ballinger DG, Rice JP. A comparison of association statistics between pooled and individual genotypes. Hum Hered 2009; 67:219-25. [PMID: 19172081 DOI: 10.1159/000194975] [Received: 03/18/2008] [Accepted: 07/25/2008]
Abstract
BACKGROUND Markers for individual genotyping can be selected using quantitative genotyping of pooled DNA. This strategy saves time and money. METHODS To determine the efficacy of this approach, we investigated the bivariate distribution of association test statistics from pooled and individual genotypes. We used a sample of approximately 1,000 samples with individual and pooled genotyping on 40,000 SNPs. RESULTS AND CONCLUSIONS We found that the distribution of the joint test statistics can be modelled as a mixture of two bivariate normal distributions. One distribution has a correlation of zero, and is probably due to SNPs whose pooled genotyping was unsuccessful. The other distribution has a correlation of approximately 0.65 in our data. This latter distribution is probably accounted for by SNPs whose pooled genotyping accurately predicts the underlying allele frequency. Approximately 87% of the data belongs to this distribution. We also derived a method to investigate the effect of both the correlation and selection cut-off on the relative power of pooling studies. We demonstrate that pooled genotyping has good power to detect SNPs that are truly associated with disease-causing variants for SNPs showing good correlation between pooled and individual genotyping. Therefore, this approach is a cost effective tool for association studies.
Affiliation(s)
- Jo Knight
- Social Genetic & Developmental Psychiatry MRC Centre, Institute of Psychiatry, Kings College London, London, UK.
15
Chi XF, Lou XY, Yang MCK, Shu QY. An optimal DNA pooling strategy for progressive fine mapping. Genetica 2008; 135:267-81. [PMID: 18506582 DOI: 10.1007/s10709-008-9275-5] [Received: 12/07/2007] [Accepted: 05/08/2008]
Abstract
We present a cost-effective DNA pooling strategy for fine mapping of a single Mendelian gene in controlled crosses. The theoretical argument suggests that it is potentially possible for a single-stage pooling approach to reduce the overall experimental expense considerably by balancing costs for genotyping and sample collection. Further, the genotyping burden can be reduced through multi-stage pooling. Numerical results are provided for practical guidelines. For example, the genotyping effort can be reduced to only a small fraction of that needed for individual genotyping at a small loss of estimation accuracy or at a cost of increasing sample sizes slightly when recombination rates are 0.5% or less. An optimal two-stage pooling scheme can reduce the amount of genotyping to 19.5%, 14.5% and 6.4% of individual genotyping efforts for identifying a gene within 1, 0.5, and 0.1 cM, respectively. Finally, we use a genetic data set for mapping the rice xl(t) gene to demonstrate the feasibility and efficiency of the DNA pooling strategy. Taken together, the results demonstrate that this DNA pooling strategy can greatly reduce the genotyping burden and the overall cost in fine mapping experiments.
Affiliation(s)
- Xiao-Fei Chi
- IAEA-Zhejiang University Collaborating Center and National Key Laboratory of Rice Biology, Institute of Nuclear Agricultural Sciences, Zhejiang University, 268 Kaixuan Road, Huajia Pool Campus, Hangzhou, 310029, People's Republic of China
16
Yang HC, Huang MC, Li LH, Lin CH, Yu ALT, Diccianni MB, Wu JY, Chen YT, Fann CSJ. MPDA: microarray pooled DNA analyzer. BMC Bioinformatics 2008; 9:196. [PMID: 18412951 PMCID: PMC2387178 DOI: 10.1186/1471-2105-9-196] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 04/19/2007] [Accepted: 04/15/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Microarray-based pooled DNA experiments that combine the merits of DNA pooling and gene chip technology constitute a pivotal advance in biotechnology. This new technique uses pooled DNA, thereby reducing costs associated with the typing of DNA from numerous individuals. Moreover, use of an oligonucleotide gene chip reduces costs related to processing various DNA segments (e.g., primers, reagents). Thus, the technique provides an overall cost-effective solution for large-scale genomic/genetic research. However, few publicly shared tools are available to systematically analyze the rapidly accumulating volume of whole-genome pooled DNA data. RESULTS We propose a generalized concept of pooled DNA and present a user-friendly tool named Microarray Pooled DNA Analyzer (MPDA) that we developed to analyze hybridization intensity data from microarray-based pooled DNA experiments. MPDA enables whole-genome DNA preferential amplification/hybridization analysis, allele frequency estimation, association mapping, allelic imbalance detection, and permits integration with shared data resources online. Graphic and numerical outputs from MPDA support global and detailed inspection of large amounts of genomic data. Four whole-genome data analyses are used to illustrate the major functionalities of MPDA. The first analysis shows that MPDA can characterize genomic patterns of preferential amplification/hybridization and provide calibration information for pooled DNA data analysis. The second analysis demonstrates that MPDA can accurately estimate allele frequencies. The third analysis indicates that MPDA is cost-effective and reliable for association mapping. The final analysis shows that MPDA can identify regions of chromosomal aberration in cancer without paired-normal tissue. CONCLUSION MPDA, the software that integrates pooled DNA association analysis and allelic imbalance analysis, provides a convenient analysis system for extensive whole-genome pooled DNA data analysis. 
The software, user manual and illustrated examples are freely available online at the MPDA website listed in the Availability and requirements section.
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan.
17
Hwang SH, Oh HB, Choi SE, Hong SP, Yoo W. Effective screening of informative single nucleotide polymorphisms using the novel method of restriction fragment mass polymorphism. J Int Med Res 2008; 35:827-35. [PMID: 18034996 DOI: 10.1177/147323000703500611] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022] Open
Abstract
Restriction fragment mass polymorphism (RFMP) was applied to pooled DNA for selecting informative single nucleotide polymorphisms (SNPs). A total of 225 coding non-synonymous SNPs (cnSNPs) from immunomodulating genes known to be involved in the pathogenesis of asthma were selected from the National Center for Biotechnology Information's (NCBI) SNP database (dbSNP). DNA samples from 200 healthy Koreans were pooled, amplified by polymerase chain reaction, digested with restriction enzymes and the fragments analysed by mass spectrometry. Only 30 of the 225 cnSNPs (13.3%) were informative, i.e. had a minor allele frequency > 10%. The percentage of informative cnSNPs varied according to the validation status in dbSNP, being 42.3% (22/52) when validated by multiple submissions and frequency data, 8.7% (2/23) when validated by multiple submissions alone and 9.1% (3/33) when validated by frequency data alone. Most of the 112 unvalidated cnSNPs were not informative. In conclusion, the RFMP method using pooled DNA is useful in selecting informative SNPs, as is the validation status in dbSNP.
Affiliation(s)
- S-H Hwang
- Department of Laboratory Medicine, Pusan National University Hospital, Busan, Republic of Korea
18
Abstract
The genetic dissection of complex disorders via genetic marker data has gained popularity in the postgenome era. Methods for typing genetic markers on human chromosomes continue to improve. Compared with the popular individual genotyping experiment, a pooled-DNA experiment (allelotyping experiment) is more cost-effective for genetic typing. This chapter provides an overview of association mapping using pooled DNA and describes a five-stage study design including the preliminary calibration of peak intensities, estimation of allele frequency, single-locus association mapping, multilocus association mapping, and a confirmation study. Software and an analysis of authentic data are presented. The strengths and weaknesses of pooled-DNA analyses, as well as possible future applications for this method, are discussed.
Affiliation(s)
- Hsin-Chou Yang
- Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, Taiwan
19
Papa R, Bellucci E, Rossi M, Leonardi S, Rau D, Gepts P, Nanni L, Attene G. Tagging the signatures of domestication in common bean (Phaseolus vulgaris) by means of pooled DNA samples. Ann Bot 2007; 100:1039-51. [PMID: 17673468 PMCID: PMC2759209 DOI: 10.1093/aob/mcm151] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Indexed: 05/16/2023]
Abstract
BACKGROUND AND AIMS The main aim of this study was to use an amplified fragment length polymorphism (AFLP)-based, large-scale screening of the whole genome of Phaseolus vulgaris to determine the effects of selection on the structure of the genetic diversity in wild and domesticated populations. METHODS Using pooled DNA samples, seven each of wild and domesticated populations of P. vulgaris were studied using 2506 AFLP markers (on average, one every 250 kb). About 10% of the markers were also analysed on individual genotypes and were used to infer allelic frequencies empirically from bulk data. In both data sets, tests were made to determine the departure from neutral expectation for each marker using an F(ST)-based method. KEY RESULTS The most important outcome is that a large fraction of the genome of the common bean (16%; P < 0.01) appears to have been subjected to effects of selection during domestication. Markers obtained in individual genotypes were also mapped and classified according to their proximities to known genes and quantitative trait loci (QTLs) of the domestication syndrome. Most of the markers that were found to be potentially under the effects of selection were located in the proximity of previously mapped genes and QTLs related to the domestication syndrome. CONCLUSIONS Overall, the results indicate that in P. vulgaris a large portion of the genome appears to have been subjected to the effects of selection, probably because of linkage to the loci selected during domestication. As most of the markers that are under the effects of selection are linked to known loci related to the domestication syndrome, it is concluded that population genomics approaches are very efficient in detecting QTLs. A method based on bulk DNA samples is presented that is effective in pre-screening for a large number of markers to determine selection signatures.
Affiliation(s)
- Roberto Papa
- Dipartimento di Scienze degli Alimenti, Università Politecnica delle Marche, Via Brecce Bianche, 60131 Ancona, Italy.
20
Johnson T. Bayesian method for gene detection and mapping, using a case and control design and DNA pooling. Biostatistics 2006; 8:546-65. [PMID: 16984977 DOI: 10.1093/biostatistics/kxl028] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022] Open
Abstract
Association mapping studies aim to determine the genetic basis of a trait. A common experimental design uses a sample of unrelated individuals classified into 2 groups, for example cases and controls. If the trait has a complex genetic basis, consisting of many quantitative trait loci (QTLs), each group needs to be large. Each group must be genotyped at marker loci covering the region of interest; for dense coverage of a large candidate region, or a whole-genome scan, the number of markers will be very large. The total amount of genotyping required for such a study is formidable. DNA pooling, a technique that economizes on laboratory effort, could reduce the amount of genotyping required, but the data generated are less informative and require novel methods for efficient analysis. In this paper, a Bayesian statistical analysis of the classic model of McPeek and Strahs is proposed. In contrast to previous work on this model, I assume that data are collected using DNA pooling, so individual genotypes are not directly observed, and also account for experimental errors. A complete analysis can be performed using analytical integration, a propagation algorithm for a hidden Markov model, and quadrature. The method developed here is both statistically and computationally efficient. It allows simultaneous detection and mapping of a QTL, in a large-scale association mapping study, using data from pooled DNA. The method is shown to perform well on data sets simulated under a realistic coalescent-with-recombination model, and is shown to outperform classical single-point methods. The method is illustrated on data consisting of 27 markers in an 880-kb region around the CYP2D6 gene.
Affiliation(s)
- Toby Johnson
- School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3JT, UK.
21
Yang HC, Liang YJ, Huang MC, Li LH, Lin CH, Wu JY, Chen YT, Fann C. A genome-wide study of preferential amplification/hybridization in microarray-based pooled DNA experiments. Nucleic Acids Res 2006; 34:e106. [PMID: 16931491 PMCID: PMC1616968 DOI: 10.1093/nar/gkl446] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 03/29/2006] [Revised: 05/05/2006] [Accepted: 06/09/2006] [Indexed: 01/27/2023] Open
Abstract
Microarray-based pooled DNA methods overcome the cost bottleneck of simultaneously genotyping more than 100 000 markers for numerous study individuals. The success of such methods relies on the proper adjustment of preferential amplification/hybridization to ensure accurate and reliable allele frequency estimation. We performed a hybridization-based genome-wide single nucleotide polymorphism (SNP) genotyping analysis to dissect preferential amplification/hybridization. The majority of SNPs had less than 2-fold signal amplification or suppression, and lognormal distributions adequately modeled preferential amplification/hybridization across the human genome. Comparative analyses suggested that the distributions of preferential amplification/hybridization differed by genotype and GC content. Patterns among different ethnic populations were similar; nevertheless, there were striking differences for a small proportion of SNPs, and a slight ethnic heterogeneity was observed. To fulfill appropriate and gratuitous adjustments, databases of preferential amplification/hybridization for African Americans, Caucasians and Asians were constructed based on the Affymetrix GeneChip Human Mapping 100 K Set. The robustness of allele frequency estimation using this database was validated by a pooled DNA experiment. This study provides a genome-wide investigation of preferential amplification/hybridization and suggests guidance for the reliable use of the database. Our results constitute an objective foundation for theoretical development of preferential amplification/hybridization and provide important information for future pooled DNA analyses.
Affiliation(s)
- H.-C. Yang
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Y.-J. Liang
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- M.-C. Huang
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- L.-H. Li
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- C.-H. Lin
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- J.-Y. Wu
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Y.-T. Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- C.S.J. Fann
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
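To make the adjustment for preferential amplification/hybridization in the abstract above concrete, here is a minimal sketch of a commonly used correction: estimate a coefficient k from the allele signal ratio in known heterozygotes, then rescale the pooled signals. It is an assumption that this simple ratio form matches the estimator used in the paper; the function names and the numbers in the usage lines are hypothetical.

```python
def amplification_coefficient(het_signal_pairs):
    """Mean A/B signal ratio in known heterozygotes.

    A heterozygote carries both alleles in equal amounts, so a ratio other
    than 1.0 reflects preferential amplification/hybridization of one allele.
    """
    ratios = [a / b for a, b in het_signal_pairs]
    return sum(ratios) / len(ratios)


def pooled_allele_frequency(signal_a, signal_b, k):
    """Estimated frequency of allele A in a pool, corrected by k.

    If allele A yields k times more signal per copy than allele B, scaling
    the B signal by k puts both alleles on the same footing.
    """
    return signal_a / (signal_a + k * signal_b)


if __name__ == "__main__":
    # Hypothetical heterozygote signals: allele A reads twice as strongly as B.
    k = amplification_coefficient([(2.0, 1.0), (4.0, 2.0)])
    # Hypothetical pool signals from a true A frequency of 0.3.
    print(k, pooled_allele_frequency(0.6, 0.7, k))
```

Without the correction (k = 1), the same pool signals would give 0.6 / 1.3, an overestimate of the frequency of allele A.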
22
Goldstein O, Zangerl B, Pearce-Kelling S, Sidjanin DJ, Kijas JW, Felix J, Acland GM, Aguirre GD. Linkage disequilibrium mapping in domestic dog breeds narrows the progressive rod-cone degeneration interval and identifies ancestral disease-transmitting chromosome. Genomics 2006; 88:541-50. [PMID: 16859891 PMCID: PMC4006154 DOI: 10.1016/j.ygeno.2006.05.013] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Received: 02/02/2006] [Revised: 05/30/2006] [Accepted: 05/31/2006] [Indexed: 11/16/2022]
Abstract
Canine progressive rod-cone degeneration (prcd) is a retinal disease previously mapped to a broad, gene-rich centromeric region of canine chromosome 9. As allelic disorders are present in multiple breeds, we used linkage disequilibrium (LD) to narrow the approximately 6.4-Mb candidate interval. Multiple dog breeds, each representing a genetically isolated population, were typed for SNPs and other polymorphisms identified from BACs. The candidate region was initially localized to a 1.5-Mb zero-recombination interval between growth factor receptor-bound protein 2 (GRB2) and SEC14-like 1 (SEC14L). A fine-scale haplotype of the region was developed, which reduced the LD interval to 106 kb and identified a conserved haplotype of 98 polymorphisms present in all prcd-affected chromosomes from 14 different dog breeds. The findings strongly suggest that a common ancestor transmitted the prcd disease allele to many of the modern dog breeds and demonstrate the power of the LD approach in the canine model.
Affiliation(s)
- Orly Goldstein
- James A. Baker Institute, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Barbara Zangerl
- Section of Medical Genetics, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Sue Pearce-Kelling
- James A. Baker Institute, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Duska J. Sidjanin
- Department of Ophthalmology, Medical College of Wisconsin, Milwaukee, WI, USA
- James W. Kijas
- CSIRO Livestock Industries, Brisbane, Queensland, Australia
- Jeanette Felix
- OptiGen, LLC, Cornell Business & Technology Park, Ithaca, NY, USA
- Gregory M Acland
- James A. Baker Institute, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
- Gustavo D. Aguirre
- Department of Ophthalmology, Medical College of Wisconsin, Milwaukee, WI, USA
- corresponding author: Gustavo D. Aguirre, Section of Medical Genetics, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA; phone: 215-898-4667; fax: 215-573-2162
23
Chowdari KV, Northup A, Pless L, Wood J, Joo YH, Mirnics K, Lewis DA, Levitt PR, Bacanu SA, Nimgaonkar VL. DNA pooling: a comprehensive, multi-stage association analysis of ACSL6 and SIRT5 polymorphisms in schizophrenia. Genes Brain Behav 2006; 6:229-39. [PMID: 16827919 DOI: 10.1111/j.1601-183x.2006.00251.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Abstract
Many candidate gene association studies have evaluated incomplete, unrepresentative sets of single nucleotide polymorphisms (SNPs), producing non-significant results that are difficult to interpret. Using a rapid, efficient strategy designed to investigate all common SNPs, we tested associations between schizophrenia and two positional candidate genes: ACSL6 (Acyl-Coenzyme A synthetase long-chain family member 6) and SIRT5 (silent mating type information regulation 2 homologue 5). We initially evaluated the utility of DNA sequencing traces to estimate SNP allele frequencies in pooled DNA samples. The mean variances for the DNA sequencing estimates were acceptable and were comparable to other published methods (mean variance: 0.0008, range 0-0.0119). Using pooled DNA samples from cases with schizophrenia/schizoaffective disorder (Diagnostic and Statistical Manual of Mental Disorders edition IV criteria) and controls (n=200, each group), we next sequenced all exons, introns and flanking upstream/downstream sequences for ACSL6 and SIRT5. Among 69 identified SNPs, case-control allele frequency comparisons revealed nine suggestive associations (P<0.2). Each of these SNPs was next genotyped in the individual samples composing the pools. A suggestive association with rs11743803 at ACSL6 remained (allele-wise P=0.02), with diminished evidence in an extended sample (448 cases, 554 controls, P=0.062). In conclusion, we propose a multi-stage method for comprehensive, rapid, efficient and economical genetic association analysis that enables simultaneous SNP detection and allele frequency estimation in large samples. This strategy may be particularly useful for research groups lacking access to high-throughput genotyping facilities. Our analyses did not yield convincing evidence for associations of schizophrenia with ACSL6 or SIRT5.
Affiliation(s)
- K V Chowdari
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, USA
24
Yu A, Geng H, Zhou X. Quantify single nucleotide polymorphism (SNP) ratio in pooled DNA based on normalized fluorescence real-time PCR. BMC Genomics 2006; 7:143. [PMID: 16764712 PMCID: PMC1552069 DOI: 10.1186/1471-2164-7-143] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 10/27/2005] [Accepted: 06/09/2006] [Indexed: 12/02/2022] Open
Abstract
Background Conventional real-time PCR quantification of the allele ratio in pooled DNA depends mainly on determining the PCR amplification efficiency and the Ct value, defined as the PCR cycle number at which the fluorescence emission exceeds a fixed threshold. Because the calculation is exponential, slight errors are multiplied and the results vary widely. We have developed a new PCR data point analysis strategy for allele ratio quantification based on the normalized fluorescence ratio. Results In our method, the initial reaction background fluorescence was determined by fitting the raw fluorescence data to a four-parameter sigmoid function. The respective background fluorescence was then subtracted from each fluorescence data point, and each background-subtracted value was divided by that background fluorescence to obtain the normalized fluorescence. By relating the normalized fluorescence ratio to the known premixed ratio of the two alleles in standard samples, a standard linear regression equation was generated, from which the allele ratios of unknown specimens were extrapolated using the measured normalized fluorescence ratio. In this article, we compare the results of the proposed method with those of the baseline-subtracted fluorescence ratio method and the conventional Ct method. Conclusion The results demonstrate that the proposed method can improve the reliability, precision, and repeatability of allele ratio quantification. It also has the potential for fully automatic allelic ratio quantification.
Affiliation(s)
- Airong Yu
- Department of Biology, Huaiyin Teachers College, 71 Jiao tong Road, Huai'an, Jiangsu Province, 223001, P.R. China
- Haifeng Geng
- Center of Marine Biotechnology, University of Maryland Biotechnology Institute, MD, 21202, USA
- Xuerui Zhou
- Department of Biology, Huaiyin Teachers College, 71 Jiao tong Road, Huai'an, Jiangsu Province, 223001, P.R. China
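The normalization and calibration steps described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the background values are taken as given (the four-parameter sigmoid fit is omitted), the regression direction (known allele ratio on normalized fluorescence ratio) is a guess, and all function names and numbers are hypothetical.

```python
def normalized_fluorescence(fluorescence, background):
    """Subtract the background, then divide by the same background."""
    return (fluorescence - background) / background


def normalized_ratio(fluor_a, background_a, fluor_b, background_b):
    """Ratio of normalized fluorescence for allele A over allele B."""
    return (normalized_fluorescence(fluor_a, background_a)
            / normalized_fluorescence(fluor_b, background_b))


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_allele_ratio(measured_ratio, slope, intercept):
    """Read an unknown sample's allele ratio off the standard curve."""
    return slope * measured_ratio + intercept


if __name__ == "__main__":
    # Hypothetical standards: measured fluorescence ratios vs known allele ratios.
    standard_fluor_ratios = [0.6, 1.1, 2.1, 4.1]
    known_allele_ratios = [0.25, 0.5, 1.0, 2.0]
    slope, intercept = fit_line(standard_fluor_ratios, known_allele_ratios)
    unknown = normalized_ratio(720.0, 100.0, 300.0, 100.0)
    print(predict_allele_ratio(unknown, slope, intercept))
```

Working with background-normalized values rather than Ct values avoids the exponential calculation that the abstract identifies as the source of amplified error.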
25
Yang HC, Pan CC, Lin CY, Fann CSJ. PDA: Pooled DNA analyzer. BMC Bioinformatics 2006; 7:233. [PMID: 16643673 PMCID: PMC1539032 DOI: 10.1186/1471-2105-7-233] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 11/03/2005] [Accepted: 04/28/2006] [Indexed: 11/19/2022] Open
Abstract
Background Association mapping using abundant single nucleotide polymorphisms is a powerful tool for identifying disease susceptibility genes for complex traits and exploring possible genetic diversity. Genotyping large numbers of SNPs individually is performed routinely but is cost-prohibitive for large-scale genetic studies. DNA pooling is a reliable and cost-saving alternative genotyping method. However, no software has been developed for complete pooled-DNA analyses, including data standardization, allele frequency estimation, and single/multipoint DNA pooling association tests. This motivated the development of the software 'PDA' (Pooled DNA Analyzer) to analyze pooled DNA data. Results We developed the software PDA for the analysis of pooled-DNA data. PDA was originally implemented in the MATLAB® language, but it can also be executed on a Windows system without installing MATLAB®. PDA provides estimates of the coefficient of preferential amplification and allele frequency. PDA includes an extended single-point association test, which can compare allele frequencies between two DNA pools constructed under different experimental conditions. Moreover, PDA also provides novel chromosome-wide multipoint association tests based on p-value combinations and a sliding-window concept. This new multipoint testing procedure overcomes a computational bottleneck of conventional haplotype-oriented multipoint methods in DNA pooling analyses and can handle data sets having a large pool size and/or large numbers of polymorphic markers. All of the PDA functions are illustrated in four bona fide examples. Conclusion PDA is simple to operate and does not require that users have a strong statistical background. The software is available at .
Affiliation(s)
- Hsin-Chou Yang
- Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, 115, Taiwan
- Chia-Ching Pan
- Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, 115, Taiwan
- Chin-Yu Lin
- Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, 115, Taiwan
- Cathy SJ Fann
- Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, 115, Taiwan
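The abstract above mentions multipoint tests built from p-value combinations over a sliding window but does not say which combination rule PDA uses. The sketch below uses Fisher's method purely as an assumed stand-in; it treats the single-marker p-values in a window as independent, which neighbouring markers in linkage disequilibrium generally are not, and it requires every p-value to be greater than zero. The function names are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's combined p-value for k independent tests.

    The statistic X = -2 * sum(ln p) is chi-square with 2k degrees of
    freedom; for even degrees of freedom the upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def sliding_window_p(p_values, window=3):
    """Combined p-value for each run of `window` consecutive markers."""
    last_start = len(p_values) - window
    return [fisher_combined_p(p_values[i:i + window])
            for i in range(last_start + 1)]


if __name__ == "__main__":
    # Hypothetical single-marker p-values along a chromosome.
    single_point = [0.5, 0.01, 0.02, 0.03, 0.6]
    for start, p in enumerate(sliding_window_p(single_point, window=3)):
        print(start, p)
```

Because each window needs only the single-marker p-values, the cost grows linearly with the number of markers, in contrast to haplotype-based multipoint methods.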
26
Brohede J, Dunne R, McKay JD, Hannan GN. PPC: an algorithm for accurate estimation of SNP allele frequencies in small equimolar pools of DNA using data from high density microarrays. Nucleic Acids Res 2005; 33:e142. [PMID: 16199750 PMCID: PMC1240117 DOI: 10.1093/nar/gni142] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 02/02/2023] Open
Abstract
Robust estimation of allele frequencies in pools of DNA has the potential to reduce genotyping costs and/or increase the number of individuals contributing to a study where hundreds of thousands of genetic markers need to be genotyped in very large population sample sets, such as genome-wide association studies. In order to make accurate allele frequency estimations from pooled samples, a correction for unequal allele representation must be applied. We have developed the polynomial-based probe-specific correction (PPC), a novel correction algorithm for accurate estimation of allele frequencies in data from high-density microarrays. This algorithm was validated through comparison of allele frequencies from a set of 10 individually genotyped DNAs and frequencies estimated from pools of these 10 DNAs using GeneChip 10K Mapping Xba 131 arrays. Our results demonstrate that when using the PPC to correct for allelic biases the accuracy of the allele frequency estimates increases dramatically.
Affiliation(s)
- Jesper Brohede
- CSIRO Preventative Health National Research Flagship, Sydney, Australia
- CSIRO Molecular and Health Technologies, Sydney, Australia
- Rob Dunne
- CSIRO Preventative Health National Research Flagship, Sydney, Australia
- CSIRO Mathematical and Information Sciences, Sydney, Australia
- James D. McKay
- Menzies Research Institute, University of Tasmania, Hobart, Australia
- International Agency for Research on Cancer, Lyon, France
- Garry N. Hannan
- CSIRO Preventative Health National Research Flagship, Sydney, Australia
- CSIRO Molecular and Health Technologies, Sydney, Australia
- To whom correspondence should be addressed. Tel. +61 2 9490 5054; Fax +61 2 9490 5010
27
Abstract
The genetic dissection of complex human diseases requires large-scale association studies which explore the population associations between genetic variants and disease phenotypes. DNA pooling can substantially reduce the cost of genotyping assays in these studies, and thus enables one to examine a large number of genetic variants on a large number of subjects. The availability of pooled genotype data instead of individual data poses considerable challenges in the statistical inference, especially in the haplotype-based analysis because of increased phase uncertainty. Here we present a general likelihood-based approach to making inferences about haplotype-disease associations based on possibly pooled DNA data. We consider cohort and case-control studies of unrelated subjects, and allow arbitrary and unequal pool sizes. The phenotype can be discrete or continuous, univariate or multivariate. The effects of haplotypes on disease phenotypes are formulated through flexible regression models, which allow a variety of genetic hypotheses and gene-environment interactions. We construct appropriate likelihood functions for various designs and phenotypes, accommodating Hardy-Weinberg disequilibrium. The corresponding maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. We develop simple and efficient numerical algorithms for calculating the maximum likelihood estimators and their variances, and implement these algorithms in a freely available computer program. We assess the performance of the proposed methods through simulation studies, and provide an application to the Finland-United States Investigation of NIDDM Genetics Study. The results show that DNA pooling is highly efficient in studying haplotype-disease associations. As a by-product, this work provides valid and efficient methods for estimating haplotype-disease associations with unpooled DNA samples.
Affiliation(s)
- D Zeng
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599-7420, USA
28
Buchholz R, Jones Dukes MD, Hecht S, Findley AM. Investigating the turkey's 'snood' as a morphological marker of heritable disease resistance. J Anim Breed Genet 2004. [DOI: 10.1111/j.1439-0388.2004.00449.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
29
Fingerlin TE, Boehnke M, Abecasis GR. Increasing the power and efficiency of disease-marker case-control association studies through use of allele-sharing information. Am J Hum Genet 2004; 74:432-43. [PMID: 14752704 PMCID: PMC1182257 DOI: 10.1086/381652] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Received: 06/24/2003] [Accepted: 11/20/2003] [Indexed: 12/20/2022] Open
Abstract
Case-control disease-marker association studies are often used in the search for variants that predispose to complex diseases. One approach to increasing the power of these studies is to enrich the case sample for individuals likely to be affected because of genetic factors. In this article, we compare three case-selection strategies that use allele-sharing information with the standard strategy that selects a single individual from each family at random. In affected sibship samples, we show that, by carefully selecting sibships and/or individuals on the basis of allele sharing, we can increase the frequency of disease-associated alleles in the case sample. When these cases are compared with unrelated controls, the difference in the frequency of the disease-associated allele is therefore also increased. We find that, by choosing the affected sib who shows the most evidence for pairwise allele sharing with the other affected sibs in families, the test statistic is increased by >20%, on average, for additive models with modest genotype relative risks. In addition, we find that the per-genotype information associated with the allele sharing-based strategies is increased compared with that associated with random selection of a sib for genotyping. Even though we select sibs on the basis of a nonparametric statistic, the additional gain for selection based on the unknown underlying mode of inheritance is minimal. We show that these properties hold even when the power to detect linkage to a region in the entire sample is negligible. This approach can be extended to more-general pedigree structures and quantitative traits.
Affiliation(s)
- Tasha E Fingerlin
- Department of Epidemiology, School of Public Health, and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.
30
Shi M, Caprau D, Dagle J, Christiansen L, Christensen K, Murray JC. Application of kinetic polymerase chain reaction and molecular beacon assays to pooled analyses and high-throughput genotyping for candidate genes. Birth Defects Res A Clin Mol Teratol 2004; 70:65-74. [PMID: 14991913 DOI: 10.1002/bdra.10153] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/06/2022]
Abstract
BACKGROUND The addition of DNA analysis to epidemiologic studies that have traditionally incorporated demographic and interview data can provide additional power and open new avenues for investigation. DNA can be obtained from a variety of tissues, but each has attendant variation in sample quantity, quality, and cost of acquisition. Analytic approaches for DNA genotyping are under constant development, but current applications allow small amounts (less than 2 ng per assay) of DNA to be used for genotyping. METHODS In this report, we designed effective assays for a spectrum of genes using either kinetic polymerase chain reaction (PCR) or molecular beacon applications. We also investigated the extent to which DNA use and reagent cost could be minimized. Kinetic PCR assays were also applied to investigate the potential of pooled sample analysis. RESULTS Our results show that small amounts of DNA can be successfully amplified in a high-throughput fashion using both kinetic PCR and molecular beacon methods. Greater than 97% of the genotype results from these two methods are consistent. In addition, error rates in allele frequency measurements using DNA pools of 100 or more samples were often less than 1% and usually less than 3%, which provides another option for substantially minimizing the costs of genotyping in studies involving large numbers of individuals. CONCLUSIONS Effective assays have been designed for a spectrum of genes widely studied in birth defects, including: MTHFR, NAT1, TGFA, RFC1, PAX9, EPHX1, and SKI. An efficient assay has been designed for the detection of the presence of X and Y chromosomes, which can be applied to the studies of sex chromosome abnormalities or sample quality control.
Affiliation(s)
- Min Shi
- Department of Pediatrics, University of Iowa, Iowa City, Iowa 52242, USA
|
31
|
Abstract
Systematic analysis of the genetic background of complex diseases using single nucleotide polymorphisms (SNPs) requires a tremendous amount of genotyping. To reduce the amount of genotyping necessary and hence the overall cost of a case-control study with SNPs, the genotyping is often performed in two stages. In the first, the DNA of all cases and all controls is mixed into two pools and genotyped for each SNP. The frequency of both alleles is determined in both pooled DNA samples. If different frequencies are observed in the pools of cases and controls, genotyping is performed individually in the second stage and analyzed conventionally. However, so far no well-founded algorithm is available to guide the decision on whether to genotype a SNP individually. In this report, an approach is introduced for the decision on individual genotyping based on the results from pooled DNA. The analysis is modeled as a decision process with the specific goal of deciding whether to genotype a specific SNP individually. For a given situation, the resulting decision criteria are intended to be optimal for those conducting the study. Different loss functions and decision rules are presented. Using Monte-Carlo simulations, we show that for a given situation, the genotyping rates and hence the costs can be reduced remarkably while maintaining acceptable overall error rates.
Affiliation(s)
- Inke R König
- Centre for Genetic Epidemiological Methods, Institute of Medical Biometry and Statistics, University at Lübeck, Lübeck, Germany
|
32
|
Hoh J, Matsuda F, Peng X, Markovic D, Lathrop MG, Ott J. SNP haplotype tagging from DNA pools of two individuals. BMC Bioinformatics 2003; 4:14. [PMID: 12709267 PMCID: PMC156884 DOI: 10.1186/1471-2105-4-14] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 11/23/2002] [Accepted: 04/22/2003] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND DNA pooling is a technique to reduce genotyping effort while incurring only minor losses in accuracy of allele frequency estimates for single nucleotide polymorphism (SNP) markers. RESULTS We present an algorithm for reconstructing haplotypes (alleles for multiple SNPs on the same chromosome) from pools of two individual DNAs, in which Hardy-Weinberg equilibrium conditions or other assumptions are not required. The program outputs, in addition to inferred haplotypes, a minimal number of haplotype-tagging SNPs that are identified after an exhaustive search procedure. CONCLUSION Our method and algorithms lead to a significant reduction in genotyping effort, for example, in case-control disease association studies, while maintaining the possibility of reconstructing haplotypes under very general conditions.
Affiliation(s)
- Josephine Hoh
- Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10021, USA
- Xu Peng
- Centre National de Génotypage, 91057 Evry, France
- Daniela Markovic
- Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10021, USA
- Jurg Ott
- Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10021, USA
|
33
|
Abstract
Genome-wide association studies may be necessary to identify genes underlying certain complex diseases. Because such studies can be extremely expensive, DNA pooling has been introduced, as it may greatly reduce the genotyping burden. Parallel to DNA pooling developments, the importance of haplotypes in genetic studies has been amply demonstrated in the literature. However, DNA pooling of a large number of samples may lose haplotype information among tightly linked genetic markers. Here, we examine the cost-effectiveness of DNA pooling in the estimation of haplotype frequencies from population data. When the maximum likelihood estimates of haplotype frequencies are obtained from pooled samples, we compare the overall cost of the study, including both DNA collection and marker genotyping, between the individual genotyping strategy and the DNA pooling strategy. We find that the DNA pooling of two individuals can be more cost-effective than individual genotypings, especially when a large number of haplotype systems are studied.
Affiliation(s)
- Shuang Wang
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut 06520-8034, USA
|
34
|
Mohlke KL, Erdos MR, Scott LJ, Fingerlin TE, Jackson AU, Silander K, Hollstein P, Boehnke M, Collins FS. High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools. Proc Natl Acad Sci U S A 2002; 99:16928-33. [PMID: 12482934 PMCID: PMC139246 DOI: 10.1073/pnas.262661399] [Citation(s) in RCA: 92] [Impact Index Per Article: 4.2] [Indexed: 11/18/2022] Open
Abstract
To facilitate positional cloning of complex trait susceptibility loci, we are investigating methods to reduce the effort required to identify trait-associated alleles. We examined primer extension analysis by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry to screen single-nucleotide polymorphisms (SNPs) for association by using DNA pools. We tested whether this method can accurately estimate allele frequency differences between pools while maintaining the high-throughput nature of assay design, sample handling, and scoring. We follow up interesting allele frequency differences in pools by genotyping individuals. We tested DNA pools of 182, 228, and 499 individuals using 16 SNPs with minor allele frequencies 0.026-0.486 and allele frequency differences 0.001-0.108 that we had genotyped previously on individuals and 381 SNPs that we had not. Precision, as measured by the average standard deviation among 16 semidependent replicates, was 0.021 ± 0.011 for the 16 SNPs and 0.018 ± 0.008 for the 291 of 381 SNPs used in further analysis. For the 16 SNPs, the average absolute error in predicting allele frequency differences between pools was 0.009; the largest errors were 0.031, 0.028, and 0.027. We determined that compensating for unequal peak heights in heterozygotes improved precision of allele frequency estimates but had only a very minor effect on accuracy of allele frequency differences between pools. Based on these data and assuming pools of 500 individuals, we conclude that at significance level 0.05 we would have 95% (82%) power to detect population allele frequency differences of 0.07 for control allele frequencies of 0.10 (0.50).
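The closing power statement can be approximated with a normal-theory calculation for the difference between two pooled allele frequencies. The sketch below is an illustration only: the function name `pooled_power`, the `measurement_sd` parameter, and the way measurement error is added to the binomial sampling variance are assumptions rather than the authors' model, so it should not be expected to reproduce the 95% (82%) figures exactly.

```python
from math import sqrt
from statistics import NormalDist


def pooled_power(p_control, diff, n_individuals, alpha=0.05, measurement_sd=0.0):
    """Approximate power to detect an allele frequency difference between two pools.

    p_control      -- allele frequency in the control pool
    diff           -- true frequency difference (case minus control)
    n_individuals  -- diploid individuals per pool (2 * n alleles are sampled)
    measurement_sd -- hypothetical SD of the pooled measurement in each pool
    """
    n_alleles = 2 * n_individuals
    p_case = p_control + diff
    # Binomial sampling variance of each pool's allele frequency.
    sampling_var = (p_control * (1 - p_control) + p_case * (1 - p_case)) / n_alleles
    # Assumption: independent measurement error of equal size in both pools.
    total_sd = sqrt(sampling_var + 2 * measurement_sd ** 2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Two-sided test; the negligible opposite-tail rejection probability is ignored.
    return NormalDist().cdf(abs(diff) / total_sd - z_crit)


if __name__ == "__main__":
    for p in (0.10, 0.50):
        print(p, round(pooled_power(p, 0.07, 500, measurement_sd=0.01), 3))
```

With `measurement_sd=0.0` the function gives the sampling-only upper bound on power; any positive value lowers it, which is the qualitative effect the abstract describes for intermediate control allele frequencies.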
Affiliation(s)
- Karen L Mohlke
- Genome Technology Branch, National Human Genome Research Institute, Bethesda, MD 20892, USA
|
35
|
Sham P, Bader JS, Craig I, O'Donovan M, Owen M. DNA Pooling: a tool for large-scale association studies. Nat Rev Genet 2002; 3:862-71. [PMID: 12415316 DOI: 10.1038/nrg930] [Citation(s) in RCA: 404] [Impact Index Per Article: 18.4] [Indexed: 02/02/2023]
Abstract
DNA pooling is a practical way to reduce the cost of large-scale association studies to identify susceptibility loci for common diseases. Pooling allows allele frequencies in groups of individuals to be measured using far fewer PCR reactions and genotyping assays than are used when genotyping individuals. Here, we discuss recent developments in quantitative genotyping assays and in the design and analysis of pooling studies. Sophisticated pooling designs are being developed that can take account of hidden population stratification, confounders and inter-loci interactions, and that allow the analysis of haplotypes.
Affiliation(s)
- Pak Sham
- P080, Institute of Psychiatry, King's College, Denmark Hill, London SE5 8AF, UK.
|
36
|
Xu K, Lipsky RH, Mangal W, Ferro E, Goldman D. Single-Nucleotide Polymorphism Allele Frequencies Determined by Quantitative Kinetic Assay of Pooled DNA. Clin Chem 2002. [DOI: 10.1093/clinchem/48.9.1605] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Affiliation(s)
- Ke Xu
- Laboratory of Neurogenetics, National Institute on Alcohol Abuse and Alcoholism, Rockville, MD 20852
- Robert H Lipsky
- Laboratory of Neurogenetics, National Institute on Alcohol Abuse and Alcoholism, Rockville, MD 20852
- Walid Mangal
- Laboratory of Neurogenetics, National Institute on Alcohol Abuse and Alcoholism, Rockville, MD 20852
- Erica Ferro
- Laboratory of Neurogenetics, National Institute on Alcohol Abuse and Alcoholism, Rockville, MD 20852
- David Goldman
- Laboratory of Neurogenetics, National Institute on Alcohol Abuse and Alcoholism, Rockville, MD 20852
|
37
|
Werner M, Sych M, Herbon N, Illig T, König IR, Wjst M. Large-scale determination of SNP allele frequencies in DNA pools using MALDI-TOF mass spectrometry. Hum Mutat 2002; 20:57-64. [PMID: 12112658 DOI: 10.1002/humu.10094] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.4] [Indexed: 01/17/2023]
Abstract
One of the major challenges in the near future is the identification of genes that contribute to complex disorders. Large scale association studies that utilize a dense map of single nucleotide polymorphisms (SNPs) have been considered as a valuable tool for this purpose. However, genome-wide screens are limited by costs of genotyping thousands of SNPs in a large number of individuals. Here we present a pooling strategy that enables high-throughput SNP validation and determination of allele frequencies in case and control populations. Quantitative analysis of allele frequencies of SNPs in DNA pools is based on matrix-assisted laser desorption/ionization time of flight (MALDI-TOF) mass spectrometry of primer extension assays. We demonstrate the accuracy and reliability of this approach on pools of eight previously genotyped individuals with an allele frequency representation in the range of 0.1 to 0.9. The accuracy of measured allele frequencies was shown in DNA pools of 142 to 186 individuals using additional markers. Allele frequencies determined from the pooled samples deviate from the real frequencies by about 3%. The described method reduces costs and time and enables genotyping of up to thousands of samples by taking advantage of the high-throughput MALDI-TOF technology.
Affiliation(s)
- Monika Werner
- GSF-National Research Center for Environment and Health, Institute of Epidemiology, Neuherberg, Germany
|
38
|
Sheffield VC. Homozygosity mapping using pooled DNA. Curr Protoc Hum Genet 2001; Chapter 1:Unit 1.11. [PMID: 18428233 DOI: 10.1002/0471142905.hg0111s13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
Genomic screening is a common and popular method for localizing human disease genes. Genotyping each individual sample can be a labor-intensive and expensive process. This unit describes the technique of pooling DNA from many related affected individuals, genotyping the resulting pools, and scoring the results. This technique can greatly reduce the effort required to screen for genetic linkage and is particularly useful when applied to inbred populations. The unit includes a table showing a pool of human STRP markers developed by the Cooperative Human Linkage Center.
|
39
|
Abstract
Assessing the association between DNA variants and disease has been used widely to identify regions of the genome and candidate genes that contribute to disease. However, there are numerous examples of associations that cannot be replicated, which has led to skepticism about the utility of the approach for common conditions. With the discovery of massive numbers of genetic markers and the development of better tools for genotyping, association studies will inevitably proliferate. Now is the time to consider critically the design of such studies, to avoid the mistakes of the past and to maximize their potential to identify new components of disease.
Affiliation(s)
- L R Cardon
- University of Oxford, Nuffield Department of Clinical Medicine, Headington, Oxford OX3 9DU, UK.
|
40
|
Barhoumi C, Amouri R, Ben Hamida C, Ben Hamida M, Machghoul S, Gueddiche M, Hentati F. Linkage of a new locus for autosomal recessive axonal form of Charcot-Marie-Tooth disease to chromosome 8q21.3. Neuromuscul Disord 2001; 11:27-34. [PMID: 11166163 DOI: 10.1016/s0960-8966(00)00162-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.7] [Indexed: 01/30/2023]
Abstract
We report the clinical and genetic linkage analysis of a large Tunisian family with thirteen affected patients suffering from Charcot-Marie-Tooth disease with pyramidal involvement. The inheritance is autosomal recessive. The clinical phenotype is consistent in all patients. It is characterized by onset during the first decade, a progressive course and distal atrophy in all four limbs, associated with a mild pyramidal syndrome. Nerve biopsy in two patients showed severe axonal neuropathy. Genetic linkage excluded known loci of different genetic forms of Charcot-Marie-Tooth disease, familial spastic paraplegia and familial amyotrophic lateral sclerosis. A significant lod score was obtained with marker D8S286, confirming linkage to chromosome 8q21.3. The clinical syndrome observed in this family seems to correspond to a new genetic form of autosomal recessive Charcot-Marie-Tooth disease.
Affiliation(s)
- C Barhoumi
- Hôpital Militaire Principal d'Instruction, Tunis, Tunisia.
|
41
|
Affiliation(s)
- L B Jorde
- Eccles Institute of Human Genetics, University of Utah Health Sciences Center, Salt Lake City, Utah 84112, USA.
|
42
|
Abstract
Human genetics is now at a critical juncture. The molecular methods used successfully to identify the genes underlying rare mendelian syndromes are failing to find the numerous genes causing more common, familial, non-mendelian diseases. With the human genome sequence nearing completion, new opportunities are being presented for unravelling the complex genetic basis of non-mendelian disorders based on large-scale genome-wide studies. Considerable debate has arisen regarding the best approach to take. In this review I discuss these issues, together with suggestions for optimal post-genome strategies.
Affiliation(s)
- N J Risch
- Department of Genetics, Stanford University School of Medicine, California 94305-5120, USA
|
43
|
Germer S, Holland MJ, Higuchi R. High-throughput SNP allele-frequency determination in pooled DNA samples by kinetic PCR. Genome Res 2000; 10:258-66. [PMID: 10673283 PMCID: PMC310828 DOI: 10.1101/gr.10.2.258] [Citation(s) in RCA: 322] [Impact Index Per Article: 13.4] [Received: 08/09/1999] [Accepted: 12/07/1999] [Indexed: 01/30/2023]
Abstract
We have developed an accurate, yet inexpensive and high-throughput, method for determining the allele frequency of biallelic polymorphisms in pools of DNA samples. The assay combines kinetic (real-time quantitative) PCR with allele-specific amplification and requires no post-PCR processing. The relative amounts of each allele in a sample are quantified. This is performed by dividing equal aliquots of the pooled DNA between two separate PCR reactions, each of which contains a primer pair specific to one or the other allelic SNP variant. For pools with equal amounts of the two alleles, the two amplifications should reach a detectable level of fluorescence at the same cycle number. For pools that contain unequal ratios of the two alleles, the difference in cycle number between the two amplification reactions can be used to calculate the relative allele amounts. We demonstrate the accuracy and reliability of the assay on samples with known predetermined SNP allele frequencies from 5% to 95%, including pools of both human and mouse DNAs using eight different SNPs altogether. The accuracy of measuring known allele frequencies is very high, with the strength of correlation between measured and known frequencies having an r² = 0.997. The loss of sensitivity as a result of measurement error is typically minimal, compared with that due to sampling error alone, for population samples up to 1000. We believe that by providing a means for SNP genotyping up to thousands of samples simultaneously, inexpensively, and reproducibly, this method is a powerful strategy for detecting meaningful polymorphic differences in candidate gene association studies and genome-wide linkage disequilibrium scans.
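The relation between the cycle-number difference and the relative allele amounts can be sketched as follows. This is a minimal illustration that assumes both allele-specific reactions amplify with the same per-cycle efficiency (perfect doubling by default); the function name and the `efficiency` parameter are hypothetical and are not taken from the paper.

```python
def allele_frequency_from_delta_ct(ct_allele1, ct_allele2, efficiency=2.0):
    """Estimate the pool frequency of allele 1 from two allele-specific reactions.

    ct_allele1, ct_allele2 -- cycle numbers at which each reaction reaches the
                              fluorescence threshold
    efficiency             -- assumed fold-amplification per cycle (2.0 = doubling)
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    # A reaction that starts with less template crosses the threshold later,
    # so a positive delta means allele 1 is the rarer allele.
    delta_ct = ct_allele1 - ct_allele2
    # Ratio of starting amounts: allele2 / allele1 = efficiency ** delta_ct.
    return 1.0 / (efficiency ** delta_ct + 1.0)


if __name__ == "__main__":
    # Equal threshold cycles imply equal amounts of the two alleles.
    print(allele_frequency_from_delta_ct(25.0, 25.0))
    # Allele 1 crossing one cycle later implies it is half as abundant.
    print(allele_frequency_from_delta_ct(26.0, 25.0))
```

In practice the two primer pairs rarely amplify with identical efficiency, so a calibration against a known heterozygote would normally be needed before applying a formula of this kind.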
Affiliation(s)
- S Germer
- Roche Molecular Systems, Alameda, California 94501 USA.
|
44
|
Sertié AL, Sousa AV, Steman S, Pavanello RC, Passos-Bueno MR. Linkage analysis in a large Brazilian family with van der Woude syndrome suggests the existence of a susceptibility locus for cleft palate at 17p11.2-11.1. Am J Hum Genet 1999; 65:433-40. [PMID: 10417286 PMCID: PMC1377942 DOI: 10.1086/302491] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.8] [Indexed: 11/03/2022] Open
Abstract
van der Woude syndrome (VWS), which has been mapped to 1q32-41, is characterized by pits and/or sinuses of the lower lip, cleft lip/palate (CL/P), cleft palate (CP), bifid uvula, and hypodontia (H). The expression of VWS, which has incomplete penetrance, is highly variable. Both the occurrence of CL/P and CP within the same genealogy and a recurrence risk <40% for CP among descendants with VWS have suggested that the development of clefts in this syndrome is influenced by modifying genes at other loci. To test this hypothesis, we have conducted linkage analysis in a large Brazilian kindred with VWS, considering as affected the individuals with CP, regardless of whether it is associated with other clinical signs of VWS. Our results suggest that a gene at 17p11.2-11.1, together with the VWS gene at 1q32-41, enhances the probability of CP in an individual carrying the two at-risk genes. If this hypothesis is confirmed in other VWS pedigrees, it will represent one of the first examples of a gene, mapped through linkage analysis, which modifies the expression of a major gene. It will also have important implications for genetic counseling, particularly for more accurately predicting recurrence risks of clefts among the offspring of patients with VWS.
Affiliation(s)
- A L Sertié
- Departamento de Biologia, Instituto de Biociências, Universidade de São Paulo, Rua do Matão, 277, 05508-900, São Paulo, Brazil
|
45
|
Risch N, Teng J. The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases I. DNA pooling. Genome Res 1998; 8:1273-88. [PMID: 9872982 DOI: 10.1101/gr.8.12.1273] [Citation(s) in RCA: 248] [Impact Index Per Article: 9.5] [Indexed: 01/26/2023]
Abstract
We consider statistics for analyzing a variety of family-based and nonfamily-based designs for detecting linkage disequilibrium of a marker with a disease susceptibility locus. These designs include sibships with parents, sibships without parents, and use of unrelated controls. We also provide formulas for and evaluate the relative power of different study designs using these statistics. In this first paper in the series, we derive statistical tests based on data derived from DNA pooling experiments and describe their characteristics. Although designs based on affected and unaffected sibs without parents are usually robust to population stratification, they suffer a loss of power compared with designs using parents or unrelateds as controls. Although increasing the number of unaffected sibs improves power, the increase is generally not substantial. Designs including sibships with multiple affected sibs are typically the most powerful, with any of these control groups, when the disease allele frequency is low. When the allele frequency is high, however, designs with unaffected sibs as controls do not retain this advantage. In designs with parents, having an affected parent has little impact on the power, except for rare dominant alleles, where the power is increased compared with families with no affected parents. Finally, we also demonstrate that for sibships with parents, only the parents require individual genotyping to derive the TDT statistic, whereas all the offspring can be pooled. This can potentially lead to considerable savings in genotyping, especially for multiplex sibships. The formulas and tables we derive should provide some guidance to investigators designing nuclear family-based linkage disequilibrium studies for complex diseases.
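For reference, the TDT statistic mentioned above is conventionally computed from counts of heterozygous parents who transmit, or do not transmit, a given allele to affected offspring. The sketch below shows that standard McNemar-type form from known transmission counts. It does not implement the pooled-offspring derivation of the paper, and the function and argument names are illustrative.

```python
from math import erfc, sqrt


def tdt(transmitted, untransmitted):
    """Transmission/disequilibrium test from heterozygous-parent counts.

    transmitted   -- times the candidate allele was passed to an affected child
    untransmitted -- times the other allele was passed instead
    Returns (chi_square, p_value), using a chi-square distribution with 1 df.
    """
    b, c = transmitted, untransmitted
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts with at least one transmission")
    chi_square = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi_square / 2.0))
    return chi_square, p_value


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(tdt(60, 40))
```

Only heterozygous parents contribute to the counts, which is why the statistic is robust to population stratification.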
Affiliation(s)
- N Risch
- Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA.
|
46
|
Tomita-Mitchell A, Muniappan BP, Herrero-Jimenez P, Zarbl H, Thilly WG. Single nucleotide polymorphism spectra in newborns and centenarians: identification of genes coding for risk of mortal disease. Gene 1998; 223:381-91. [PMID: 9858772 DOI: 10.1016/s0378-1119(98)00408-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.8] [Indexed: 10/18/2022] Open
Abstract
Some single-nucleotide polymorphisms (SNPs) increase the risk of mortal disease. Identifying these SNPs and the genes in which they reside is an important area in human genomics. Such qualitative observations are important in themselves. However, an accurate assessment of the numerical distribution and age-dependent decline of SNPs in the population would permit calculation of the risks represented by each SNP. Such analyses have not been attempted because of a lack of an efficient and cost-effective method to detect multiple SNPs in a large number of individuals and a large number of genes. Here, we suggest the use of an analytical procedure that can scan for SNPs in 100-bp DNA sequences from as many as 10000 donors' blood cell samples, or 20000 alleles, simultaneously. Our suggestion is based on technology developed for studies of somatic mutations in human tissue DNA for point mutations at frequencies equal to or greater than 10⁻⁶. In a simplified version of this technology, any SNP arising at frequencies at or above 5 × 10⁻⁴ would be identified with useful precision. A gene would be represented by 10 or more sections of 100 bp. This strategy includes splice-site mutations that represent a significant fraction of gene inactivating point mutations and would not be observed in strategies using cDNA. To illustrate the logic of the suggested approach, we use American mortality records to calculate the expected decrease in SNPs coding for premature mortality in newborns and centenarians. We consider several elementary cases: SNPs in one gene only, any of several genes, or all of several genes that create a risk of death by pancreatic cancer. The fraction of expressed polymorphisms affecting mortality should be simultaneously increased in probands and decreased in the aged relative to newborns. Silent polymorphisms in the same gene would remain unchanged in all three groups and serve as internal standards.
A key point is that scanning a gene in which loss of gene function creates the risk of mortality is expected to reveal not one, but multiple SNPs, which decline with age, as carriers die earlier in life than non-carriers. Several SNPs in a scanned gene would suggest that the decreasing SNP was genetically linked to a different polymorphism that creates the disease risk.
Affiliation(s)
- A Tomita-Mitchell
- Division of Bioengineering and Environmental Health, Center for Environmental Health Sciences, Massachusetts Institute of Technology, 21 Ames St. Rm. 16-743, Cambridge, MA 02139, USA
|
47
|
Sheffield VC, Stone EM, Carmi R. Use of isolated inbred human populations for identification of disease genes. Trends Genet 1998; 14:391-6. [PMID: 9820027 DOI: 10.1016/s0168-9525(98)01556-x] [Citation(s) in RCA: 92] [Impact Index Per Article: 3.5] [Indexed: 12/24/2022]
Abstract
The genetic mapping of disease loci involves the use of patient phenotype and genotype data in the search for genetic markers that segregate, or are associated with, a trait or disorder. Genetically isolated populations offer many advantages for such studies. The high degree of inbreeding and/or founder effects in some small population isolates result in an increased incidence of recessive disorders. Monogenic disorders are less likely to show non-allelic heterogeneity in isolated populations than in more diverse populations. The use of isolated populations also reduces the complexity of polygenic disorders by reducing the number of loci probably involved in the disorder. Finally, a variety of strategies can be used with particular efficacy for the mapping of disease genes in isolated populations.
Affiliation(s)
- V C Sheffield
- Howard Hughes Medical Institute, Iowa City 52242, USA.
|
48
|
Corrette-Bennett J, Rosenberg M, Przybylska M, Ananiev E, Straus D. Positional cloning without a genome map: using 'Targeted RFLP Subtraction' to isolate dense markers tightly linked to the regA locus of Volvox carteri. Nucleic Acids Res 1998; 26:1812-8. [PMID: 9512557 PMCID: PMC147462 DOI: 10.1093/nar/26.7.1812] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 02/06/2023] Open
Abstract
The ability to isolate genes defined by mutant phenotypes has fueled the rapid progress in understanding basic biological mechanisms and the causes of inherited diseases. Positional cloning, a commonly used method for isolating genes corresponding to mutations, is most efficiently applied to the small number of model organisms for which high resolution genetic maps exist. We demonstrate a new and generally applicable positional cloning method that obviates the need for a genetic map. The technique is based on Restriction Fragment Length Polymorphism (RFLP) Subtraction, a method that isolates RFLP markers spanning an entire genome. The new method, Targeted RFLP Subtraction (TRS), isolates markers from a specific region by combining RFLP Subtraction with a phenotypic pooling strategy. We used TRS to directly isolate dense markers tightly linked to the regA gene of the eukaryotic green alga Volvox. As a generally applicable method for saturating a small targeted region with DNA markers, TRS should facilitate gene isolation from diverse organisms and accelerate the process of physically mapping specific regions in preparation for sequence analysis.
|
49
|
Shaw SH, Carrasquillo MM, Kashuk C, Puffenberger EG, Chakravarti A. Allele frequency distributions in pooled DNA samples: applications to mapping complex disease genes. Genome Res 1998; 8:111-23. [PMID: 9477339 DOI: 10.1101/gr.8.2.111] [Citation(s) in RCA: 84] [Impact Index Per Article: 3.2] [Indexed: 02/06/2023]
Abstract
Genetic studies of complex hereditary disorders require for their mapping the determination of genotypes at several hundred polymorphic loci in several hundred families. Because only a minority of markers are expected to show linkage and association in family data, a simple screen of genetic markers to identify those showing linkage in pooled DNA samples can greatly facilitate gene identification. All studies involving pooled DNA samples require the comparison of allele frequencies in appropriate family samples and subsamples. We have tested the accuracy of allele frequency estimates, in various DNA samples, by pooling DNA from multiple individuals prior to PCR amplification. We have used the ABI 377 automated DNA sequencer and GENESCAN software for quantifying total amplification using a 5' fluorescently labeled forward PCR primer and relative peak heights to estimate allele frequencies in pooled DNA samples. In these studies, we have genotyped 11 microsatellite markers in two separate DNA pools, and an additional four markers in a third DNA pool, and compared the estimated allele frequencies with those determined by direct genotyping. In addition, we have evaluated whether pooled DNA samples can be used to accurately assess allele frequencies on transmitted and untransmitted chromosomes, in a collection of families for fine-structure gene mapping using allelic association. Our studies show that accurate, quantitative data on allele frequencies, suitable for identifying markers for complex disorders, can be identified from pooled DNA samples. This approach, being independent of the number of samples comprising a pool, promises to drastically reduce the labor and cost of genotyping in the initial identification of disease loci. Additional applications of DNA pooling are discussed. These developments suggest that new statistical methods for analyzing pooled DNA data are required.
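The use of relative peak heights to estimate allele frequencies can be illustrated with a simple normalization. This is a sketch under strong assumptions: it ignores stutter bands and preferential amplification of shorter alleles, both of which typically need correction for microsatellites, and the function name and example values are hypothetical.

```python
from typing import Dict


def allele_frequencies_from_peaks(peak_heights: Dict[str, float]) -> Dict[str, float]:
    """Convert per-allele peak heights from a pooled trace into frequency estimates.

    peak_heights -- mapping of allele label to measured peak height
    Each allele's estimate is its share of the summed peak height.
    """
    if any(height < 0 for height in peak_heights.values()):
        raise ValueError("peak heights must be non-negative")
    total = sum(peak_heights.values())
    if total <= 0:
        raise ValueError("at least one peak must have a positive height")
    return {allele: height / total for allele, height in peak_heights.items()}


if __name__ == "__main__":
    # Hypothetical peak heights for a three-allele microsatellite in one pool.
    print(allele_frequencies_from_peaks({"152": 300.0, "154": 500.0, "156": 200.0}))
```

Comparing such estimates with frequencies from direct genotyping of the same individuals is the kind of accuracy check the abstract describes.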
Affiliation(s)
- S H Shaw
- Department of Genetics and Center for Human Genetics, Case Western Reserve University School of Medicine and University Hospitals of Cleveland, Cleveland, Ohio 44106 USA
|
50
|
Esposito L, Lampasona V, Bonifacio E, Bosi E, Ferrari M. Lack of association of DMB polymorphism with insulin-dependent diabetes. J Autoimmun 1997; 10:395-400. [PMID: 9237803 DOI: 10.1006/jaut.1997.0144] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 02/04/2023]
Abstract
Considerable evidence exists that the genes coding for the HLA class II DQ molecules in the MHC region are major contributors to genetic susceptibility in insulin-dependent diabetes. Located centromeric to the DQ loci are the genes encoding DMA and DMB, two class II-like molecules which play an essential role in the pathway leading to antigen presentation by HLA class II. In this study we have examined the distribution of the DMB allele and studied HLA DQA1-DQB1-TAP2-DMB haplotypes in 52 IDDM families and 65 unrelated controls. DMB allele frequencies in IDDM and control subjects were not significantly different. DMB*0101 was present in 85% of patients vs. 76% of controls, DMB*0102 in 12 vs. 17%, DMB*0103 in 3 vs. 5%, and DMB*0104 in 0 vs. 2%. The IDDM-susceptible MHC DQA1-DQB1 haplotypes found by analysis of IDDM families were not associated with specific DMB alleles. We conclude that the described DMB polymorphisms are not associated with IDDM susceptibility and DMB genotyping is unlikely to improve the assessment of genetic risk for IDDM.
Affiliation(s)
- L Esposito
- Department of Laboratory Medicine of the Istituto Scientifico San Raffaele, Milan, Italy
|