101
|
Hirota Y, Ohara T, Zenibayashi M, Kuno SI, Fukuyama K, Teranishi T, Kouyama K, Miyake K, Maeda E, Kasuga M. Lack of association of CPT1A polymorphisms or haplotypes on hepatic lipid content or insulin resistance in Japanese individuals with type 2 diabetes mellitus. Metabolism 2007; 56:656-61. [PMID: 17445541 DOI: 10.1016/j.metabol.2006.12.014] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 10/26/2006] [Accepted: 12/18/2006] [Indexed: 10/23/2022]
Abstract
Accumulation of fat in the liver is associated with insulin resistance and type 2 diabetes mellitus. The carnitine palmitoyltransferase (CPT) enzyme system facilitates the transport of long-chain fatty acids into mitochondria, and the gene for the hepatic isoform of CPT1 (CPT1A) is a candidate gene for metabolic disorders such as insulin resistance associated with fatty liver. We have now investigated the contribution of the CPT1A locus to hepatic lipid content (HLC), insulin resistance, and susceptibility to type 2 diabetes mellitus. A total of 324 type 2 diabetic patients and 300 nondiabetic individuals were enrolled in the study. Eighty-seven of the type 2 diabetic patients who had not been treated with insulin or lipid-lowering drugs were evaluated by homeostasis model assessment for insulin resistance and were subjected to nuclear magnetic resonance for determination of HLC. A total of 19 single nucleotide polymorphisms (SNPs) were identified at the CPT1A locus, and linkage disequilibrium analysis revealed a strong linkage disequilibrium block between SNP8 (intron 5) and SNP17 (intron 14). Neither haplotypes nor SNPs of CPT1A were found to be associated either with susceptibility to type 2 diabetes mellitus or with HLC or insulin resistance in type 2 diabetic patients.
Affiliation(s)
- Yushi Hirota
- Division of Diabetes and Digestive and Kidney Diseases, Department of Clinical Molecular Medicine, Kobe University Graduate School of Medicine, Kobe, Japan.
|
102
|
Rakovski CS, Xu X, Lazarus R, Blacker D, Laird NM. A new multimarker test for family-based association studies. Genet Epidemiol 2007; 31:9-17. [PMID: 17086514 DOI: 10.1002/gepi.20186] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Indexed: 11/11/2022]
Abstract
We propose a new multimarker test for family-based studies in candidate genes. We use simulations under different genetic models to assess the performance of competing testing strategies, characterized in this study as combinations of the following important factors: genes, statistical tests, tag single nucleotide polymorphism (SNP) methods, number of tag SNPs and family designs. An ANOVA model is employed to provide descriptive summaries of the effects on power of the above-mentioned factors. We find that tag SNP methods, gene characteristics and family designs have minimal impact on the best testing strategy. The familywise error rate (FWER) controlling multiple comparison procedure and the new multimarker test offer the highest power followed by the asymptotic global haplotype test. Both the FWER and the multimarker test are invariant to family designs and gain power as we increase the number of tag SNPs. However, the performance of the global haplotype test is slightly degraded when analyzing larger numbers of tag SNPs. Within the framework of our study, the best strategy for family-based studies in candidate genes that emerged from our analysis is to use the FWER or the multimarker test and select 6-10 tag SNPs using any of the tag SNP methods considered. We confirm the conclusions of our study with an application to Alzheimer's disease data.
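The abstract does not say which FWER-controlling procedure was used. As an illustration only, the sketch below applies Holm's step-down procedure, one common way to control the familywise error rate across per-SNP tests. The function name `holm_reject` and the default `alpha` are assumptions made for this example, not taken from the paper.

```python
def holm_reject(pvalues, alpha=0.05):
    """Holm step-down procedure: flag which hypotheses are rejected at FWER alpha.

    Illustrative stand-in; the cited study does not name its FWER procedure.
    """
    m = len(pvalues)
    # Visit p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared with alpha / (m - rank).
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject
```

With three tag SNPs and p-values 0.01, 0.04 and 0.03, only the first is rejected at alpha = 0.05, because 0.03 exceeds the second threshold of 0.025.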
Affiliation(s)
- Cyril S Rakovski
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.
|
103
|
Mao W, He J, Brinza D, Zelikovsky A. A combinatorial method for predicting genetic susceptibility to complex diseases. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2007; 2006:224-7. [PMID: 17282153 DOI: 10.1109/iembs.2005.1616384] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
Recent improvements in the accessibility of high-throughput genotyping have brought a great deal of attention to disease association and susceptibility studies. This paper explores the possibility of applying combinatorial methods to disease susceptibility prediction. The proposed combinatorial methods as well as standard statistical methods are applied to publicly available genotype data on Crohn's disease and autoimmune disorders for predicting susceptibility to these diseases. The quality of the susceptibility prediction algorithms is assessed using leave-one-out and leave-many-out tests: the disease status of one or several individuals is predicted and compared to their actual disease status, which is initially made unknown to the algorithm. The best prediction rate achieved by the proposed algorithms is 77.78% for Crohn's disease and 64.99% for autoimmune disorders.
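The leave-one-out test described above can be sketched as follows. The predictor here is a plain nearest-neighbour rule on Hamming distance, chosen only to keep the example short; it is a stand-in and not the combinatorial method proposed in the paper. The names `hamming` and `leave_one_out_accuracy` are hypothetical, and the sketch assumes at least two individuals.

```python
def hamming(a, b):
    """Number of SNP positions at which two genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def leave_one_out_accuracy(genotypes, status):
    """Hide each individual's status in turn and predict it from the others.

    Stand-in predictor: copy the status of the nearest remaining individual.
    """
    correct = 0
    for i, g in enumerate(genotypes):
        others = [j for j in range(len(genotypes)) if j != i]
        nearest = min(others, key=lambda j: hamming(g, genotypes[j]))
        correct += status[nearest] == status[i]
    return correct / len(genotypes)
```

A leave-many-out variant would hide several individuals at once before predicting; the bookkeeping is the same.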
Affiliation(s)
- Weidong Mao
- Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA.
|
104
|
Sun YV, Levin AM, Boerwinkle E, Robertson H, Kardia SLR. A scan statistic for identifying chromosomal patterns of SNP association. Genet Epidemiol 2007; 30:627-35. [PMID: 16858698 DOI: 10.1002/gepi.20173] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/09/2022]
Abstract
We have developed a single nucleotide polymorphism (SNP) association scan statistic that takes into account the complex distribution of the human genome variation in the identification of chromosomal regions with significant SNP associations. This scan statistic has wide applicability for genetic analysis, whether to identify important chromosomal regions associated with common diseases based on whole-genome SNP association studies or to identify disease susceptibility genes based on dense SNP positional candidate studies. To illustrate this method, we analyzed patterns of SNP associations on chromosome 19 in a large cohort study. Among 2,944 SNPs, we found seven regions that contained clusters of significantly associated SNPs. The average width of these regions was 35 kb with a range of 10-72 kb. We compared the scan statistic results to Fisher's product method using a sliding window approach, and detected 22 regions with significant clusters of SNP associations. The average width of these regions was 131 kb with a range of 10.1-615 kb. Given that the distances between SNPs are not taken into consideration in the sliding window approach, it is likely that a large fraction of these regions represents false positives. However, all seven regions detected by the scan statistic were also detected by the sliding window approach. The linkage disequilibrium (LD) patterns within the seven regions were highly variable indicating that the clusters of SNP associations were not due to LD alone. The scan statistic developed here can be used to make gene-based or region-based SNP inferences about disease association.
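For orientation, the sliding-window comparison mentioned above can be sketched with Fisher's product method, which combines k p-values through the statistic -2 Σ ln p, referred to a chi-square distribution with 2k degrees of freedom. The window size, the function names, and the treatment of p-values as independent are assumptions of this sketch. As the abstract notes, a window defined by SNP count ignores the physical distances between SNPs.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine p-values (all > 0) with Fisher's product method."""
    k = len(pvalues)
    # half_x is X/2, where X = -2 * sum(ln p) ~ chi-square with 2k df under the null.
    half_x = -sum(math.log(p) for p in pvalues)
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

def sliding_window_scan(pvalues, window=3):
    """Return (start_index, combined_p) for every window of consecutive SNPs."""
    return [(i, fisher_combined_pvalue(pvalues[i:i + window]))
            for i in range(len(pvalues) - window + 1)]
```

Because SNPs in linkage disequilibrium are correlated, the independence assumption behind the chi-square reference distribution is optimistic, which is one reason windowed results of this kind need cautious interpretation.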
Affiliation(s)
- Yan V Sun
- Department of Epidemiology, University of Michigan, 611 Church Street, Ann Arbor, MI 48104, USA
|
105
|
Zheng M, McPeek MS. Multipoint linkage-disequilibrium mapping with haplotype-block structure. Am J Hum Genet 2007; 80:112-25. [PMID: 17160899 PMCID: PMC1785316 DOI: 10.1086/510685] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 07/19/2006] [Accepted: 11/07/2006] [Indexed: 01/17/2023] Open
Abstract
The HapMap Project is providing a great deal of new information on high-resolution haplotype structure in various human populations. This information has the potential to greatly increase the power of association mapping for a fixed amount of genotyping. A number of methods have been proposed for the identification of haplotype blocks, common haplotypes, and tagging single-nucleotide polymorphisms. Here, we build on this work by developing novel methods for case-control multipoint linkage-disequilibrium (LD) mapping that gain power and speed by making explicit use of the inferred block structure. Specifically, we developed a virtual-variant approach that uses the haplotype-block information to greatly increase power for detection of untyped common variants associated with a trait. Because full multipoint LD mapping can be slow, we exploited the haplotype-block information to develop a fast single-block multipoint mapping method. Our methods are appropriate for genotype data and take into account the uncertainty in phase. We describe the methods in the context of case-parents trios, although they are also applicable to unrelated cases and controls. Our simulations indicate that the most important gains from taking into account the haplotype-block structure at the analysis stage of multipoint LD mapping come from (1) greatly increased power to detect association with untyped variants and (2) greatly improved localization of untyped variants associated with the trait. More-modest gains are obtained in improving power to detect association with a variant that is typed with a moderate amount of missing data. The methods are applied to a Crohn disease data set.
Affiliation(s)
- Maoxia Zheng
- Department of Statistics, University of Chicago, Chicago, IL, 60637, USA
|
106
|
Yoshikawa Y, Nakayama T, Saito K, Hui P, Morita A, Sato N, Takahashi T, Tamura M, Sato I, Aoi N, Doba N, Hinohara S, Soma M, Usami R. Haplotype-Based Case-Control Study of the Association between the Guanylate Cyclase Activator 2B (GUCA2B, Uroguanylin) Gene and Essential Hypertension. Hypertens Res 2007; 30:789-96. [DOI: 10.1291/hypres.30.789] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]
|
107
|
Ding K, Kullo IJ. Methods for the selection of tagging SNPs: a comparison of tagging efficiency and performance. Eur J Hum Genet 2006; 15:228-36. [PMID: 17164795 DOI: 10.1038/sj.ejhg.5201755] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/08/2022] Open
Abstract
There is great interest in the use of tagging single nucleotide polymorphisms (tSNPs) to facilitate association studies of complex diseases. This is based on the premise that a minimum set of tSNPs may be sufficient to capture most of the variation in certain regions of the human genome. Several methods have been described to select tSNPs, based on either haplotype-block structure or independent of the underlying block structure. In this paper, we compare eight methods for choosing tSNPs in 10 representative resequenced candidate genes (a total of 194.2 kb) with different levels of linkage disequilibrium (LD) in a sample of European-Americans. We compared tagging efficiency (TE) and prediction accuracy of tSNPs identified by these methods, as a function of several factors, including LD level, minor allele frequency, and tagging criteria. We also assessed tagging consistency between each method. We found that tSNPs selected based on the methods Haplotype Diversity and Haplotype r2 provided the highest TE, whereas the prediction accuracy was comparable among different methods. Tagging consistency between different methods of tSNPs selection was poor. This work demonstrates that when tSNPs-based association studies are undertaken, the choice of method for selecting tSNPs requires careful consideration.
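To make the idea of block-independent tSNP selection concrete, here is a minimal greedy sketch based on pairwise r² computed from phased haplotypes coded as 0/1 tuples. It is a generic illustration and is not claimed to match any of the eight methods compared in the paper. The 0.8 threshold, the function names, and the restriction of candidates to still-untagged SNPs are assumptions of this example.

```python
def r_squared(haps, i, j):
    """Pairwise r^2 between SNPs i and j from phased 0/1 haplotypes."""
    n = len(haps)
    p_i = sum(h[i] for h in haps) / n
    p_j = sum(h[j] for h in haps) / n
    p_ij = sum(h[i] * h[j] for h in haps) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    # Monomorphic SNPs have no defined r^2; treat them as uncorrelated.
    return (p_ij - p_i * p_j) ** 2 / denom if denom > 0 else 0.0

def greedy_tags(haps, threshold=0.8):
    """Repeatedly pick the untagged SNP that captures the most untagged SNPs."""
    untagged = set(range(len(haps[0])))
    tags = []
    while untagged:
        best = max(sorted(untagged),
                   key=lambda s: sum(r_squared(haps, s, t) >= threshold
                                     for t in untagged))
        tags.append(best)
        # A tag always covers itself, even if it is monomorphic.
        untagged -= {t for t in untagged
                     if t == best or r_squared(haps, best, t) >= threshold}
    return tags
```

Different tie-breaking rules or thresholds can yield different tag sets from the same data, which is in keeping with the poor between-method consistency reported above.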
Affiliation(s)
- Keyue Ding
- Division of Cardiovascular Diseases, Mayo Clinic and Foundation, Rochester, MN 55905, USA
|
108
|
Ding J, Nicklas BJ, Fallin MD, de Rekeneire N, Kritchevsky SB, Pahor M, Rodondi N, Li R, Zmuda JM, Harris TB. Plasminogen activator inhibitor type 1 gene polymorphisms and haplotypes are associated with plasma plasminogen activator inhibitor type 1 levels but not with myocardial infarction or stroke. Am Heart J 2006; 152:1109-15. [PMID: 17161063 DOI: 10.1016/j.ahj.2006.06.021] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Received: 10/06/2005] [Accepted: 06/07/2006] [Indexed: 11/29/2022]
Abstract
BACKGROUND The 4G allele in the promoter region of the plasminogen activator inhibitor type 1 (PAI-1) gene is associated with higher plasma PAI-1 levels and activity, but its association with cardiovascular diseases is unclear. We investigated the association of polymorphisms and common haplotypes of the PAI-1 gene with plasma PAI-1 levels, as well as the risk of myocardial infarction and stroke. METHODS AND RESULTS This study is a prospective analysis of 2995 community-based participants (41% blacks and 51% women) aged 70 to 79 years in the Health, Aging, and Body Composition Study. From 1997/1998 to 2001, 177 myocardial infarction events and 101 stroke events were identified. In addition to the 4G/5G polymorphism, 2 potential functional variants and 4 other haplotype-tagging variants were genotyped. In general linear models, the 4G allele was associated with higher PAI-1 levels after adjusting for age, sex, race, and site (26, 29, and 32 ng/mL for 5G/5G, 4G/5G, and 4G/4G, respectively; P for trend < .0001), but none of the other 6 polymorphisms was associated with PAI-1 levels. Haplotype analysis produced similar results. However, in Cox proportional hazard models, neither the polymorphisms nor the common haplotypes of the PAI-1 gene were associated with the risk of either myocardial infarction or stroke. CONCLUSIONS The 4G allele is associated with higher PAI-1 levels, but this study does not support an association of the PAI-1 gene polymorphisms with the risk of either myocardial infarction or stroke.
Affiliation(s)
- Jingzhong Ding
- Department of Internal Medicine/Geriatrics, Wake Forest University Baptist Medical Center, Winston-Salem, NC 27157, USA.
|
109
|
Menon R, Fortunato SJ, Thorsen P, Williams S. Genetic associations in preterm birth: a primer of marker selection, study design, and data analysis. J Soc Gynecol Investig 2006; 13:531-41. [PMID: 17088082 DOI: 10.1016/j.jsgi.2006.09.006] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Received: 02/21/2006] [Indexed: 01/16/2023]
Abstract
Spontaneous preterm birth (PTB; delivery before 37 weeks gestation) is a primary risk factor for infant morbidity and mortality. The etiology is unclear, but there is evidence that there is a genetic predisposition to PTB. Armed with the suggestion of genetic risk factors and the failure to identify useful biomarkers, investigators are starting to actively pursue the role of genetic predisposition in PTB. Several studies have been done to date assessing the role of single gene variants. However, positive findings have failed to replicate. We argue that heterogeneity in study designs, definition of phenotype, single-nucleotide polymorphism (SNP) selection, population selection, and sample size makes data interpretation difficult in complex phenotypes such as PTB. In this review, we introduce general concepts of study designs in genetic epidemiology, selection of candidate genes and markers for analysis, and analytical methodologies. We also introduce how the concept of gene-gene interactions (biologic epistasis) and gene-environment interactions may affect the predisposition to PTB.
|
110
|
Moskvina V, Schmidt KM. Individual SNP allele reconstruction from informative markers selected by a non-linear Gauss-type algorithm. Hum Hered 2006; 62:97-106. [PMID: 17047339 DOI: 10.1159/000096097] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/19/2006] [Accepted: 08/02/2006] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVES In view of the linkage disequilibrium structure of the genome, the selection of maximally informative SNP markers is a fundamental issue in the design of association studies. Currently used selection methods rely on pairwise marker correlation or informativity measures for subsets of markers. Nevertheless, the selected markers do not provide a completely satisfactory description of the individual remaining markers. The number of tag markers can be further reduced by using haplotypic information, but then the results of association analysis are difficult to interpret. METHODS AND RESULTS We propose a non-linear Gauss-type algorithm selecting a subset of markers which is optimal with respect to the informativity measures and allows an explicit reconstruction of all other known markers, thus permitting direct inference of allelic association. The selection is based on the haplotype distribution in the population, but can be adapted to work with unphased genotypes directly. CONCLUSIONS The proposed algorithm provides a rational methodology of informative marker selection, allowing for control and optimisation of information content and full marker reconstruction. Moreover, the reconstruction step can also be applied to tag markers selected using a different method at the stage of study design, identifying those markers which cannot be uniquely recovered from the chosen tags.
Affiliation(s)
- Valentina Moskvina
- Department of Psychological Medicine, College of Medicine, Cardiff University, Cardiff, UK.
|
111
|
Li J, Zhang MQ, Zhang X. A new method for detecting human recombination hotspots and its applications to the HapMap ENCODE data. Am J Hum Genet 2006; 79:628-39. [PMID: 16960799 PMCID: PMC1592557 DOI: 10.1086/508066] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 05/09/2006] [Accepted: 07/25/2006] [Indexed: 11/03/2022] Open
Abstract
Computational detection of recombination hotspots from population polymorphism data is important both for understanding the nature of recombination and for applications such as association studies. We propose a new method for this task based on a multiple-hotspot model and an (approximate) log-likelihood ratio test. A truncated, weighted pairwise log-likelihood is introduced and applied to the calculation of the log-likelihood ratio, and a forward-selection procedure is adopted to search for the optimal hotspot predictions. The method shows a relatively high power with a low false-positive rate in detecting multiple hotspots in simulation data and has a performance comparable to the best results of leading computational methods in experimental data for which recombination hotspots have been characterized by sperm-typing experiments. The method can be applied to both phased and unphased data directly, with a very fast computational speed. We applied the method to the 10 500-kb regions of the HapMap ENCODE data and found 172 hotspots among the three populations, with average hotspot width of 2.4 kb. By comparisons with the simulation data, we found some evidence that hotspots are not all identical across populations. The correlations between detected hotspots and several genomic characteristics were examined. In particular, we observed that DNaseI-hypersensitive sites are enriched in hotspots, suggesting the existence of human beta hotspots similar to those found in yeast.
Affiliation(s)
- Jun Li
- Bioinformatics Division, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
|
112
|
Bardel C, Darlu P, Génin E. Clustering of haplotypes based on phylogeny: how good a strategy for association testing? Eur J Hum Genet 2006; 14:202-6. [PMID: 16306882 DOI: 10.1038/sj.ejhg.5201501] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022] Open
Abstract
Haplotypes are now widely used in association studies between markers and a disease susceptibility locus. However, when a large number of markers are considered, the number of possible haplotypes increases, leading to two problems: an increased number of degrees of freedom that may result in a lack of power and the existence of rare haplotypes that may be difficult to take into account in the statistical analysis. In a recent paper, Durrant et al proposed a method, CLADHC, to group haplotypes based on distance matrices and showed that this could considerably increase the power of the association test as compared to either single-locus analysis or haplotype analysis without prior grouping. Although the authors considered different one-disease-locus susceptibility models in their simulations, they did not study the impact of the linkage disequilibrium (LD) pattern and of the susceptibility allele frequency on their conclusions. Here, we show, using haplotype data from five regions of the genome of different lengths and with different LD patterns, that, when a single disease susceptibility locus is simulated, the prior grouping of haplotypes based on the algorithm of Durrant et al does not increase the power of association testing except in very particular situations of LD patterns and allele frequencies.
Affiliation(s)
- Claire Bardel
- INSERM U535, Hôpital Paul Brousse, Villejuif, France.
|
113
|
Kimmel G, Shamir R. A fast method for computing high-significance disease association in large population-based studies. Am J Hum Genet 2006; 79:481-92. [PMID: 16909386 PMCID: PMC1559554 DOI: 10.1086/507317] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Received: 04/27/2006] [Accepted: 06/27/2006] [Indexed: 11/03/2022] Open
Abstract
Because of rapid progress in genotyping techniques, many large-scale, genomewide disease-association studies are now under way. Typically, the disorders examined are multifactorial, and, therefore, researchers seeking association must consider interactions among loci and between loci and other factors. One of the challenges of large disease-association studies is obtaining accurate estimates of the significance of discovered associations. The linkage disequilibrium between SNPs makes the tests highly dependent, and dependency worsens when interactions are tested. The standard way of assigning significance (P value) is by a permutation test. Unfortunately, in large studies, it is prohibitively slow to compute low P values by this method. We present here a faster algorithm for accurately calculating low P values in case-control association studies. Unlike with several previous methods, we do not assume a specific distribution of the traits, given the genotypes. Our method is based on importance sampling and on accounting for the decay in linkage disequilibrium along the chromosome. The algorithm is dramatically faster than the standard permutation test. On data sets mimicking medium-to-large association studies, it speeds up computation by a factor of 5,000-100,000, sometimes reducing running times from years to minutes. Thus, our method significantly increases the problem-size range for which accurate, meaningful association results are attainable.
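The standard permutation test that the paper sets out to accelerate can be sketched as below. This is the slow baseline, not the importance-sampling algorithm described in the abstract. The test statistic (absolute difference in mean allele count between cases and controls), the function name, and the default parameters are assumptions chosen for brevity; the sketch also assumes both cases and controls are present.

```python
import random

def permutation_pvalue(case_control, genotypes, n_perm=1000, seed=0):
    """Permutation P value for association between status (1/0) and allele counts."""
    def statistic(labels):
        cases = [g for g, y in zip(genotypes, labels) if y == 1]
        controls = [g for g, y in zip(genotypes, labels) if y == 0]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    rng = random.Random(seed)
    observed = statistic(case_control)
    labels = list(case_control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype link
        if statistic(labels) >= observed:
            hits += 1
    # Add-one correction: the estimate can never fall below 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)
```

The last line shows why the approach is costly for small P values: resolving a P value near 1e-8 would need on the order of 1e8 permutations or more, each requiring a full pass over the data.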
Affiliation(s)
- Gad Kimmel
- School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.
|
114
|
Ward K. Microarray technology in obstetrics and gynecology: a guide for clinicians. Am J Obstet Gynecol 2006; 195:364-72. [PMID: 16615920 PMCID: PMC7093878 DOI: 10.1016/j.ajog.2005.12.014] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 10/03/2005] [Revised: 11/29/2005] [Accepted: 12/05/2005] [Indexed: 11/28/2022]
Abstract
Microarrays can be constructed with dozens to millions of probes on their surface to allow high-throughput analyses of many biologic processes to be performed simultaneously on the same sample. Microarrays are now widely used for gene expression analysis, deoxyribonucleic acid resequencing, single-nucleotide polymorphism genotyping, and comparative genomic hybridization. Microarray technology is accelerating research in many fields and now microarrays are moving into clinical application. This review discusses the emerging role of microarrays in molecular diagnostics, pathogen detection, oncology, and pharmacogenomics.
Affiliation(s)
- Kenneth Ward
- Department of Obstetrics and Gynecology and Women's Health and the Pacific Research Center for Early Human Development, University of Hawaii, John A. Burns School of Medicine, Honolulu, HI 96826, USA.
|
115
|
Abstract
We propose a dictionary model for haplotypes. According to the model, a haplotype is constructed by randomly concatenating haplotype segments from a given dictionary of segments. A haplotype block is defined as a set of haplotype segments that begin and end with the same pair of markers. In this framework, haplotype blocks can overlap, and the model provides a setting for testing the accuracy of simpler models invoking only nonoverlapping blocks. Each haplotype segment in a dictionary has an assigned probability and alternate spellings that account for genotyping errors and mutation. The model also allows for missing data, unphased genotypes, and prior distribution of parameters. Likelihood evaluations rely on forward and backward recurrences similar to the ones encountered in hidden Markov models. Parameter estimation is carried out with an EM algorithm. The search for the optimal dictionary is particularly difficult because of the variable dimension of the model space. We define a minimum description length criterion to evaluate each dictionary and use a combination of greedy search and careful initialization to select a best dictionary for a given dataset. Application of the model to simulated data gives encouraging results. In a real dataset, we are able to reconstruct a parsimonious dictionary that captures patterns of linkage disequilibrium well.
Affiliation(s)
- Kristin L Ayers
- Department of Biomathematics, University of California, Los Angeles, CA 90095-1766, USA
|
116
|
Hanson RL, Looker HC, Ma L, Muller YL, Baier LJ, Knowler WC. Design and analysis of genetic association studies to finely map a locus identified by linkage analysis: sample size and power calculations. Ann Hum Genet 2006; 70:332-49. [PMID: 16674556 DOI: 10.1111/j.1529-8817.2005.00230.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/26/2022]
Abstract
Association (e.g. case-control) studies are often used to finely map loci identified by linkage analysis. We investigated the influence of various parameters on power and sample size requirements for such a study. Calculations were performed for various values of the frequency of a high-risk functional allele (fA), frequency of a marker allele associated with the high-risk allele (f1), degree of linkage disequilibrium between functional and marker alleles (D') and trait heritability attributable to the functional locus (h2). The calculations show that if cases and controls are selected from equal but opposite extreme quantiles of a quantitative trait, the primary determinants of power are h2 and the specific quantiles selected. For a dichotomous trait, power also depends on population prevalence. Power is optimal if functional alleles are studied (fA = f1 and D' = 1.0) and can decrease substantially as D' diverges from 1.0 or as f1 diverges from fA. These analyses suggest that association studies to finely map loci are most powerful if potential functional polymorphisms are identified a priori or if markers are typed to maximize haplotypic diversity. In the absence of such information, expected minimum power at a given location for a given sample size can be calculated by specifying a range of potential frequencies for fA (e.g. 0.1-0.9) and determining power for all markers within the region with specification of the expected D' between the markers and the functional locus. This method is illustrated for a fine-mapping project with 662 single nucleotide polymorphisms in 24 Mb. Regions differed by marker density and allele frequencies. Thus, in some, power was near its theoretical maximum and little additional information is expected from additional markers, while in others, additional markers appear to be necessary. These methods may be useful in the analysis and interpretation of fine-mapping studies.
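The quantity D' used throughout this abstract can be computed from the haplotype frequency and the two allele frequencies. The sketch below uses Lewontin's standard normalization and returns the absolute value; the function and argument names are hypothetical and introduced only for this example.

```python
def d_prime(p_ab, p_a, p_b):
    """Lewontin's |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying both alleles of interest
    p_a, p_b: frequencies of the functional allele and the marker allele
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0
```

When the marker is the functional allele itself, every haplotype carrying one carries the other, so `d_prime(0.3, 0.3, 0.3)` evaluates to 1.0 up to floating-point rounding. This is the fA = f1, D' = 1.0 case that the abstract identifies as optimal for power.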
Affiliation(s)
- R L Hanson
- Diabetes Epidemiology and Clinical Research Section, National Institute of Diabetes and Digestive and Kidney Diseases, 1550 East Indian School Road, Phoenix, Arizona, 85014, USA.
|
117
|
Liu PY, Lu Y, Deng HW. Accurate haplotype inference for multiple linked single-nucleotide polymorphisms using sibship data. Genetics 2006; 174:499-509. [PMID: 16783022 PMCID: PMC1569787 DOI: 10.1534/genetics.105.054213] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022] Open
Abstract
Sibships are commonly used in genetic dissection of complex diseases, particularly for late-onset diseases. Haplotype-based association studies have been advocated as powerful tools for fine mapping and positional cloning of complex disease genes. Existing methods for haplotype inference using data from relatives were originally developed for pedigree data. In this study, we proposed a new statistical method for haplotype inference for multiple tightly linked single-nucleotide polymorphisms (SNPs), which is tailored for extensively accumulated sibship data. This new method was implemented via an expectation-maximization (EM) algorithm without the usual assumption of linkage equilibrium among markers. Our EM algorithm does not incur extra computational burden for haplotype inference using sibship data when compared with using unrelated parental data. Furthermore, its computational efficiency is not affected by increasing sibship size. We examined the robustness and statistical performance of our new method in simulated data created from an empirical haplotype data set of human growth hormone gene 1. The utility of our method was illustrated with an application to the analyses of haplotypes of three candidate genes for osteoporosis.
Affiliation(s)
- Peng-Yuan Liu
- Osteoporosis Research Center, Creighton University, Omaha, Nebraska 68131, USA
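The abstract above describes haplotype inference by an expectation-maximization (EM) algorithm. As a rough illustration of the EM idea only, the sketch below estimates haplotype frequencies for two biallelic SNPs from unphased genotypes of unrelated individuals under Hardy-Weinberg proportions. This is the simpler textbook setting, not the sibship method of the paper; the function name and genotype coding are assumptions made for the example.

```python
def em_haplotype_freqs(genotypes, n_iter=100):
    """Estimate frequencies of haplotypes 00, 01, 10, 11 for two SNPs.

    genotypes: list of (g1, g2), where each value is the count (0, 1 or 2)
    of allele "1" at that SNP. Assumes unrelated individuals and random mating.
    """
    freqs = [0.25, 0.25, 0.25, 0.25]  # order: h00, h01, h10, h11
    n_chromosomes = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = [0.0, 0.0, 0.0, 0.0]
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: the double heterozygote is the only ambiguous case,
                # either 00/11 ("cis") or 01/10 ("trans").
                w_cis = freqs[0] * freqs[3]
                w_trans = freqs[1] * freqs[2]
                total = w_cis + w_trans
                p_cis = w_cis / total if total > 0 else 0.5
                counts[0] += p_cis
                counts[3] += p_cis
                counts[1] += 1 - p_cis
                counts[2] += 1 - p_cis
            else:
                # Phase is unambiguous when at most one SNP is heterozygous.
                a = [0, 1] if g1 == 1 else [g1 // 2, g1 // 2]
                b = [0, 1] if g2 == 1 else [g2 // 2, g2 // 2]
                for x, y in zip(a, b):
                    counts[2 * x + y] += 1
        # M-step: new frequencies are the expected haplotype proportions.
        freqs = [c / n_chromosomes for c in counts]
    return freqs
```

With a sample made mostly of 00/00 and 11/11 individuals, the ambiguous double heterozygotes are resolved almost entirely as 00/11, which is how the surrounding unambiguous genotypes inform phase in this simplified setting.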
|
118
|
Nicolas P, Sun F, Li LM. A model-based approach to selection of tag SNPs. BMC Bioinformatics 2006; 7:303. [PMID: 16776821 PMCID: PMC1525207 DOI: 10.1186/1471-2105-7-303] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 01/13/2006] [Accepted: 06/15/2006] [Indexed: 11/23/2022]
Abstract
Background Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphisms found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. According to Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to constraints on the number of SNPs. This approach requires an appropriate probabilistic model. Compared to simple measures of Linkage Disequilibrium (LD), a good model of haplotype sequences can more accurately account for LD structure. It also provides machinery for predicting tagged SNPs and thereby for assessing the performance of tag sets through their ability to predict larger SNP sets. Results Here, we compute the description code-lengths of SNP data for an array of models and we develop tag SNP selection methods based on these models and the strategy of entropy maximization. Using data sets from the HapMap and ENCODE projects, we show that the hidden Markov model introduced by Li and Stephens outperforms the other models in several aspects: description code-length of SNP data, information content of tag sets, and prediction of tagged SNPs. This is the first use of this model in the context of tag SNP selection. Conclusion Our study provides strong evidence that the tag sets selected by our best method, based on the Li and Stephens model, outperform those chosen by several existing methods. The results also suggest that information content evaluated with a good model is more sensitive for assessing the quality of a tagging set than the correct prediction rate of tagged SNPs. In addition, we show that haplotype phase uncertainty has an almost negligible impact on the ability of good tag sets to predict tagged SNPs. This justifies the selection of tag SNPs on the basis of haplotype informativeness, although genotyping studies do not directly assess haplotypes. Software that implements our approach is available.
Affiliation(s)
- Pierre Nicolas
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, USA
- Mathématique, Informatique et Génome, INRA, Jouy-en-Josas, France
- Fengzhu Sun
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, USA
- Lei M Li
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, USA
- Department of Mathematics, University of Southern California, Los Angeles, USA
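The entropy-maximization strategy described in the abstract above can be sketched with a deliberately simple stand-in model. The example below greedily picks the SNPs whose joint empirical distribution over a set of phased haplotypes has the highest Shannon entropy. It assumes plain empirical frequencies rather than the Li and Stephens hidden Markov model the authors favour, and the function names and toy haplotypes are hypothetical.

```python
from collections import Counter
from math import log2


def joint_entropy(haplotypes, snps):
    """Shannon entropy (bits) of the haplotypes restricted to the given SNP indices."""
    counts = Counter(tuple(h[i] for i in snps) for h in haplotypes)
    n = len(haplotypes)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def greedy_tag_snps(haplotypes, k):
    """Greedily choose up to k SNP indices that maximize joint empirical entropy."""
    chosen = []
    remaining = list(range(len(haplotypes[0])))
    for _ in range(min(k, len(remaining))):
        # Add the SNP that yields the most informative tag set so far;
        # ties are broken in favour of the lowest index.
        best = max(remaining, key=lambda i: joint_entropy(haplotypes, chosen + [i]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data: SNPs 0 and 1 are perfectly correlated, as are SNPs 2 and 3.
toy_haplotypes = ["0000", "0011", "1100", "1111"]
tags = greedy_tag_snps(toy_haplotypes, 2)
```

On the toy data the second pick skips the SNP that duplicates the first, because a redundant SNP adds no entropy; that is the sense in which tag selection resembles data compression.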
|
119
|
Maekawa K, Itoda M, Sai K, Saito Y, Kaniwa N, Shirao K, Hamaguchi T, Kunitoh H, Yamamoto N, Tamura T, Minami H, Kubota K, Ohtsu A, Yoshida T, Saijo N, Kamatani N, Ozawa S, Sawada JI. Genetic variation and haplotype structure of the ABC transporter gene ABCG2 in a Japanese population. Drug Metab Pharmacokinet 2006; 21:109-21. [PMID: 16702730 DOI: 10.2133/dmpk.21.109] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 11/30/2022]
Abstract
The ATP-binding cassette transporter, ABCG2, which is expressed at high levels in the intestine and liver, functions as an efflux transporter for many drugs, including clinically used anticancer agents such as topotecan and the active metabolite of irinotecan (SN-38). In this study, to elucidate the linkage disequilibrium (LD) profiles and haplotype structures of ABCG2, we have comprehensively searched for genetic variations in the putative promoter region, all the exons, and their flanking introns of ABCG2 from 177 Japanese cancer patients treated with irinotecan. Forty-three genetic variations, including 11 novel ones, were found: 5 in the 5'-flanking region, 13 in the coding exons, and 25 in the introns. In addition to 9 previously reported nonsynonymous single nucleotide polymorphisms (SNPs), 2 novel nonsynonymous SNPs, 38C>T (Ser13Leu) and 1060G>A (Gly354Arg), were found with minor allele frequencies of 0.3%. Based on the LD profiles between the SNPs and the estimated past recombination events, the region analyzed was divided into three blocks (Block -1, 1, and 2), each of which spans at least 0.2 kb, 46 kb, and 13 kb and contains 2, 24, and 17 variations, respectively. The two, eight, and five common haplotypes detected in 10 or more patients accounted for most (>90%) of the haplotypes inferred in Block -1, Block 1, and Block 2, respectively. The SNP and haplotype distributions in Japanese were different from those reported previously in Caucasians. This study provides fundamental information for the pharmacogenetic studies investigating the relationship between the genetic variations in ABCG2 and pharmacokinetic/pharmacodynamic parameters.
Affiliation(s)
- Keiko Maekawa
- Project Team for Pharmacogenetics, Division of Biochemistry and Immunochemistry, National Institute of Health Sciences, Kamiyoga, Tokyo, Japan.
|
120
|
Kukita Y, Miyatake K, Stokowski R, Hinds D, Higasa K, Wake N, Hirakawa T, Kato H, Matsuda T, Pant K, Cox D, Tahira T, Hayashi K. Genome-wide definitive haplotypes determined using a collection of complete hydatidiform moles. Genome Res 2006; 15:1511-8. [PMID: 16251461 PMCID: PMC1310639 DOI: 10.1101/gr.4371105] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 12/22/2022]
Abstract
We present genome-wide definitive haplotypes, determined using a collection of 74 Japanese complete hydatidiform moles, each carrying a genome derived from a single sperm. The haplotypes incorporate 281,439 common SNPs, genotyped with a high throughput array-based oligonucleotide hybridization technique. Comparison of haplotypes inferred from pseudoindividuals (constructed from randomized mole pairs) with those of moles showed some switch errors in resolution of phases by the computational inference method. The effects of these errors on local haplotype structure and selection of tag SNPs are discussed. We also show that definitive haplotypes of moles may be useful for elucidation of long-range haplotype structure, and should be more effective for detecting extended haplotype homozygosity indicative of positive selection.
Affiliation(s)
- Yoji Kukita
- Division of Genome Analysis, Research Center for Genetic Information, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Fukuoka 812-8582, Japan
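The abstract above compares computationally inferred haplotypes with the known haplotypes of mole pairs and reports switch errors. A minimal sketch of how such errors might be counted for one pseudoindividual follows. It assumes the inferred pair is consistent with the true genotype at every site; the function name and example strings are hypothetical and this is not the authors' evaluation code.

```python
def count_switch_errors(true_pair, inferred_pair):
    """Count phase switches between a true and an inferred haplotype pair.

    Each pair is two equal-length allele strings. Only heterozygous sites
    carry phase information, so homozygous sites are ignored.
    """
    t1, t2 = true_pair
    i1, _ = inferred_pair
    het_sites = [s for s in range(len(t1)) if t1[s] != t2[s]]
    # True where the first inferred haplotype follows the first true haplotype.
    orientation = [i1[s] == t1[s] for s in het_sites]
    # A switch error is a change of orientation between consecutive het sites.
    return sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)


# One switch: the inferred phase flips between the second and third site.
errors = count_switch_errors(("0000", "1111"), ("0011", "1100"))
```

Because only changes of orientation are counted, reporting the two inferred haplotypes in the opposite order does not register as an error.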
|
121
|
Maksymowych WP, Rahman P, Reeve JP, Gladman DD, Peddle L, Inman RD. Association of the IL1 gene cluster with susceptibility to ankylosing spondylitis: an analysis of three Canadian populations. Arthritis Rheum 2006; 54:974-85. [PMID: 16508980 DOI: 10.1002/art.21642] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Indexed: 01/22/2023]
Abstract
OBJECTIVE To examine the association between the IL1 gene cluster and susceptibility to ankylosing spondylitis (AS) in 3 independent case-control cohorts. METHODS We analyzed 394 patients and 446 controls from Alberta, Newfoundland, and Toronto, Canada. Samples were genotyped using a panel of 38 single-nucleotide polymorphism (SNP) markers within the IL1 gene cluster. Data from 20 informative and nonredundant SNP markers were analyzed using several association test strategies. First, we used the program WHAP to identify single-marker associations. Second, we used WHAP to analyze "sliding windows" of 3 contiguous markers along the entire extent of the IL1 gene cluster in order to identify haplotypic associations. Third, we used the linkage disequilibrium mapping program DMLE to estimate the posterior probability distribution of a disease locus. RESULTS A total of 14 SNP markers showed significant single-locus disease associations, the most significant being rs3783526 (IL1A) (P = 0.0009 in the Alberta cohort, P = 0.04 in the Newfoundland cohort) and rs1143627 (IL1B) (P = 0.0005 in the Alberta cohort, P = 0.02 in the Newfoundland cohort). Analysis of 3-marker sliding windows revealed significant and consistent associations with all of the haplotypes in the IL1A and IL1B loci in the Alberta cohort and with IL1B in the Newfoundland cohort, especially haplotypes rs1143634/rs1143630/rs3917356 and rs1143630/rs3917356/rs3917354 (P = 0.006-0.0001). With DMLE, a strong peak in the probability distribution was estimated near IL1A in both the Alberta and the Newfoundland populations. CONCLUSION These results indicate that the IL1 locus, or a locus close to IL1, is associated with susceptibility to AS.
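The "sliding windows" of 3 contiguous markers mentioned in the abstract above can be enumerated as in the sketch below. The helper name is hypothetical, and the marker list contains only the four SNPs named in the two reported haplotypes, in the order those haplotypes imply; the haplotype association test itself (performed with WHAP in the study) is not reproduced.

```python
def sliding_windows(markers, size=3):
    """Return every run of `size` contiguous markers, in map order."""
    return [tuple(markers[i:i + size]) for i in range(len(markers) - size + 1)]


# The four IL1B markers named in the two haplotypes reported in the abstract.
il1b_markers = ["rs1143634", "rs1143630", "rs3917356", "rs3917354"]
windows = sliding_windows(il1b_markers)
```

Consecutive windows share two of their three markers, which is why the two haplotypes highlighted in the abstract overlap at rs1143630 and rs3917356.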
|
122
|
Liu PY, Zhang YY, Lu Y, Long JR, Shen H, Zhao LJ, Xu FH, Xiao P, Xiong DH, Liu YJ, Recker RR, Deng HW. A survey of haplotype variants at several disease candidate genes: the importance of rare variants for complex diseases. J Med Genet 2006; 42:221-7. [PMID: 15744035 PMCID: PMC1736011 DOI: 10.1136/jmg.2004.024752] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Indexed: 11/04/2022]
Abstract
BACKGROUND The haplotype based association method offers a powerful approach to complex disease gene mapping. In this method, a few common haplotypes that account for the vast majority of chromosomes in the populations are usually examined for association with disease phenotypes. This brings us to a critical question of whether rare haplotypes play an important role in influencing disease susceptibility and thus should not be ignored in the design and execution of association studies. METHODS To address this question we surveyed, in a large sample of 1873 white subjects, six candidate genes for osteoporosis (a common late onset bone disorder), which had 29 SNPs, an average marker density of 13 kb, and covered a total of 377 kb of the DNA sequence. RESULTS Our empirical data demonstrated that two rare haplotypes of the parathyroid hormone (PTH)/PTH related peptide receptor type 1 and vitamin D receptor genes (PTHR1 and VDR) with frequencies of 1.1% and 2.9%, respectively, had significant effects on osteoporosis phenotypes (p = 4.2 × 10^-6 and p = 1.6 × 10^-4, respectively). Large phenotypic differences (4.0-5.0%) were observed between carriers of these rare haplotypes and non-carriers. Carriers of the two rare haplotypes showed quantitatively continuous variation in the population and were derived from a wide spectrum rather than from one extreme tail of the population phenotype distribution. CONCLUSIONS These findings indicate that rare haplotypes/variants are important for disease susceptibility and cannot be ignored in genetics studies of complex diseases. The study has profound implications for association studies and applications of the HapMap project.
Affiliation(s)
- P-Y Liu
- Osteoporosis Research Center, Creighton University, Omaha, NE 68131, USA
|
123
|
Kimmel G, Shamir R. A block-free hidden Markov model for genotypes and its application to disease association. J Comput Biol 2006; 12:1243-60. [PMID: 16379532 DOI: 10.1089/cmb.2005.12.1243] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Indexed: 12/29/2022]
Abstract
We present a new stochastic model for genotype generation. The model offers a compromise between rigid block structure and no structure altogether: It reflects a general blocky structure of haplotypes, but also allows for "exchange" of haplotypes at nonboundary SNP sites; it also accommodates rare haplotypes and mutations. We use a hidden Markov model and infer its parameters by an expectation-maximization algorithm. The algorithm was implemented in a software package called HINT (haplotype inference tool) and tested on 58 datasets of genotypes. To evaluate the utility of the model in association studies, we used biological human data to create a simple disease association search scenario. When comparing HINT to three other models, HINT predicted association most accurately.
Affiliation(s)
- Gad Kimmel
- School of Computer Science, Tel-Aviv University, Israel.
|
124
|
Sabbagh A, Darlu P. SNP selection at the NAT2 locus for an accurate prediction of the acetylation phenotype. Genet Med 2006; 8:76-85. [PMID: 16481889 DOI: 10.1097/01.gim.0000200951.54346.d6] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/26/2022]
Abstract
PURPOSE Genetic polymorphisms in the N-acetyltransferase 2 gene determine the individual acetylator status, which influences both the toxicity and efficacy profile of acetylated drugs. Determination of an individual's acetylation phenotype prior to initiation of therapy, through DNA-based tests, should make it possible to improve therapy response and reduce adverse events. However, due to extensive linkage disequilibrium between markers within NAT2, the genotyping of closely spaced markers yields highly redundant data: testing them all is expensive and often unnecessary. The objective of this study is to establish the optimal strategy to define, in the genetic context of a given ethnic group, the most informative set of single-nucleotide polymorphisms that best enables accurate prediction of acetylation phenotype. METHODS Three classification methods have been investigated (classification trees, artificial neural networks and multifactor dimensionality reduction method) in order to find the optimal set of single-nucleotide polymorphisms enabling the most efficient classification of individuals into rapid and slow acetylators. RESULTS Our results show that, in almost all population samples, only one or two single-nucleotide polymorphisms would be enough to obtain a good predictive capacity with no or only a modest reduction in power relative to direct assays of all common markers. In contrast, in Black African populations, where lower levels of linkage disequilibrium are observed at NAT2, a larger number of single-nucleotide polymorphisms are required to predict acetylation phenotype. CONCLUSION The results of this study will be helpful for the design of time- and cost-effective pharmacogenetic tests (adapted to specific populations) that could be used as routine tools in clinical practice.
Affiliation(s)
- Audrey Sabbagh
- Unité de Recherche en Génétique Epidémiologique et Structure des Populations Humaines, INSERM U535, Villejuif, France
|
125
|
Yuan A, Chen G, Rotimi C, Bonney GE. A statistical framework for haplotype block inference. J Bioinform Comput Biol 2006; 3:1021-38. [PMID: 16278945 DOI: 10.1142/s021972000500151x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/13/2004] [Revised: 03/29/2005] [Accepted: 03/30/2005] [Indexed: 11/18/2022]
Abstract
The existence of haplotype blocks transmitted from parents to offspring has been suggested recently. This has created an interest in the inference of the block structure and length. The motivation is that well-characterized haplotype blocks will make it easier to quickly map all the genes carrying human diseases. To study the inference of haplotype blocks systematically, we propose a statistical framework. In this framework, the optimal haplotype block partitioning is formulated as the problem of statistical model selection; missing data can be handled in a standard statistical way; population strata can be implemented; block structure inference/hypothesis testing can be performed; prior knowledge, if present, can be incorporated to perform a Bayesian inference. The algorithm is linear in the number of loci, whereas many such approaches lead to NP-hard problems. We illustrate the applications of our method to both simulated and real data sets.
Affiliation(s)
- Ao Yuan
- Statistical Genetics and Bioinformatics Unit, Howard University, Washington, DC 20059, USA.
|
126
|
Abstract
Models of background variation in genomic regions form the basis of linkage disequilibrium mapping methods. In this work we analyze a background model that groups SNPs into haplotype blocks and represents the dependencies between blocks by a Markov chain. We develop an error measure to compare the performance of this model against the common model that assumes that blocks are independent. By examining data from the International Haplotype Mapping project, we show how the Markov model over haplotype blocks is most accurate when representing blocks in strong linkage disequilibrium. This contrasts with the independent model, which is rendered less accurate by linkage disequilibrium. We provide a theoretical explanation for this surprising property of the Markov model and relate its behavior to allele diversity.
Affiliation(s)
- G Greenspan
- Computer Science Department, Technion, Haifa 32000, Israel.
|
127
|
Nothnagel M, Rohde K. The effect of single-nucleotide polymorphism marker selection on patterns of haplotype blocks and haplotype frequency estimates. Am J Hum Genet 2005; 77:988-98. [PMID: 16380910 PMCID: PMC1285181 DOI: 10.1086/498175] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.4] [Received: 06/13/2004] [Accepted: 09/16/2004] [Indexed: 11/03/2022]
Abstract
The definition of haplotype blocks of single-nucleotide polymorphisms (SNPs) has been proposed so that the haplotypes can be used as markers in association studies and to efficiently describe human genetic variation. The International Haplotype Map (HapMap) project to construct a comprehensive catalog of haplotypic variation in humans is underway. However, a number of factors have already been shown to influence the definition of blocks, including the population studied and the sample SNP density. Here, we examine the effect that marker selection has on the definition of blocks and the pattern of haplotypes by using comparable but complementary SNP sets and a number of block definition methods in various genomic regions and populations that were provided by the Encyclopedia of DNA Elements (ENCODE) project. We find that the chosen SNP set has a profound effect on the block-covered sequence and block borders, even at high marker densities. Our results question the very concept of discrete haplotype blocks and the possibility of generalizing block findings from the HapMap project. We comparatively apply the block-free tagging-SNP approach and discuss both the haplotype approach and the tagging-SNP approach as means to efficiently catalog genetic variation.
Affiliation(s)
- Michael Nothnagel
- Department of Bioinformatics, Max Delbrück Center for Molecular Medicine, Berlin, Germany.
|
128
|
|
129
|
Sutherland AM, Russell JA. Issues with Polymorphism Analysis in Sepsis. Clin Infect Dis 2005; 41 Suppl 7:S396-402. [PMID: 16237637 DOI: 10.1086/431989] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/03/2022]
Abstract
Genetic variation has been shown to play a large role in determining susceptibility to and outcome of such complex diseases as sepsis. There is a much higher heritability of death due to infection than death due to cancer or heart disease. More than 8 million single nucleotide polymorphisms (SNPs) have been detected in the human genome, and there is very little understanding of their effect on gene expression and protein function. The use of haplotypes, which are inherited sets of linked SNPs, as the unit of genetic variation in association studies and the marking of these haplotypes with unique "tag SNPs" may help to narrow down the search for causal SNPs. Future studies must be large (thousands of patients) and must be carefully designed to avoid false associations resulting from ethnic differences in genotype frequencies and disease prevalence in order to find true, reproducible associations between genotype and phenotype. Functional studies and careful characterization of intermediate phenotypes must be done to lend biological plausibility to genotype-phenotype associations. Examination of the association between genetic polymorphisms and sepsis promises to provide clinicians with new tools to evaluate prognosis, to intervene early and aggressively in treating high-risk persons, and to avoid the use of therapies with adverse effects in treating low-risk persons.
Affiliation(s)
- Ainsley M Sutherland
- University of British Columbia, The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research, St. Paul's Hospital, Vancouver, British Columbia, Canada
|
130
|
Hüffmeier U, Lascorz J, Traupe H, Böhm B, Schürmeier-Horst F, Ständer M, Kelsch R, Baumann C, Küster W, Burkhardt H, Reis A. Systematic Linkage Disequilibrium Analysis of SLC12A8 at PSORS5 Confirms a Role in Susceptibility to Psoriasis Vulgaris. J Invest Dermatol 2005; 125:906-12. [PMID: 16297188 DOI: 10.1111/j.0022-202x.2005.23847.x] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 01/07/2023]
Abstract
The gene for solute carrier family 12 member A8 has recently been proposed as a candidate gene for psoriasis susceptibility (PSORS5) on chromosome 3q based on association of five single nucleotide polymorphisms (SNP) in Swedish patients. To investigate whether this locus is relevant for German psoriasis vulgaris (PsV) patients, we analyzed a group of 210 trios and a case-control group including 375 patients. Based on our investigation of the linkage disequilibrium (LD) structure of SLC12A8, we assayed 35 haplotype tag SNP and grouped them into nine LD-blocks. In the case-control study, we detected an association for six SNP and three LD-based haplotypes. Association was strongest for ss35527511 (χ² = 11.224, p = 0.0008) and haplotype E-2 (χ² = 11.788, p = 0.00059) and independent of the presence of an HLA-associated PSORS1 risk allele. Through extended haplotype analysis, we could show that two independent association signals exist in SLC12A8, suggesting allelic heterogeneity. None of the SNP showed association in trios, apart from a weak association of rs2228674 (transmission disequilibrium test statistics p = 0.048), probably due to insufficient power. We conclude that SLC12A8 is a susceptibility locus for PsV. In order to establish the exact nature of this association, efforts to identify the disease-causing variants are ongoing.
Affiliation(s)
- Ulrike Hüffmeier
- Institute of Human Genetics, University Erlangen-Nuremberg, Erlangen, Germany
|
131
|
Zhang K, Sun F. Assessing the power of tag SNPs in the mapping of quantitative trait loci (QTL) with extremal and random samples. BMC Genet 2005; 6:51. [PMID: 16236175 PMCID: PMC1274312 DOI: 10.1186/1471-2156-6-51] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 06/08/2005] [Accepted: 10/19/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND Recent studies have indicated that the human genome could be divided into regions with low haplotype diversity interspersed with regions of high haplotype diversity. In regions of low haplotype diversity, a small fraction of SNPs (tag SNPs) are sufficient to account for most of the haplotype diversity of the human genome. These tag SNPs can be extremely useful for testing the association of a marker locus with a qualitative or quantitative trait locus in that it may not be necessary to genotype all the SNPs. When tag SNPs are used to reduce the genotyping effort in association studies, it is important to know how much power is lost. It is also important to know how much power is gained when tag SNPs instead of the same number of randomly chosen SNPs are used. RESULTS We design a simulation study to tackle these problems for a variety of quantitative association tests using either case-parent samples or unrelated population samples. First, the samples are generated based on the quantitative trait model with the assumption of either an extremal sampling scheme or a random sampling scheme. Second, a small number of samples are selected to determine the haplotype blocks and the tag SNPs. Third, the statistical power of the tests is evaluated using four kinds of data: (1) all the SNPs and the corresponding haplotypes, (2) the tag SNPs and the corresponding haplotypes, (3) the same number of evenly spaced SNPs with minor allele frequency greater than a threshold and the corresponding haplotypes, (4) the same number of randomly chosen SNPs and their corresponding haplotypes. CONCLUSION Our results suggest that in most situations genotyping efforts can be significantly reduced by using tag SNPs for mapping the QTL in association studies without much loss of power, which is consistent with previous studies on association mapping of qualitative traits. For all situations considered, two-locus haplotype analyses using tag SNPs are more powerful than those using the same number of randomly selected SNPs, but the degree of such power differences depends upon the sampling scheme and the population history.
Affiliation(s)
- Kui Zhang
- Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Fengzhu Sun
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
|
132
|
Abstract
Much effort and expense are being spent internationally to detect genetic polymorphisms contributing to susceptibility to complex human disease. Concomitantly, the technology for detecting and genotyping single nucleotide polymorphisms (SNPs) has undergone rapid development, yielding extensive catalogues of these polymorphisms across the genome. Population-based maps of the correlations amongst SNPs (linkage disequilibrium) are now being developed to accelerate the discovery of genes for complex human diseases. These genomic advances coincide with an increasing recognition of the importance of very large sample sizes for studying genetic effects. Together, these new genetic and epidemiological data hold renewed promise for the identification of susceptibility genes for complex traits. We review the state of knowledge about the structure of the human genome as related to SNPs and linkage disequilibrium, discuss the potential applications of this knowledge to mapping complex disease genes, and consider the issues facing whole genome association scanning using SNPs.
Affiliation(s)
- Lyle J Palmer
- Western Australian Institute for Medical Research and University of Western Australia Centre for Medical Research, University of Western Australia.
|
133
|
De La Vega FM, Gordon D, Su X, Scafe C, Isaac H, Gilbert DA, Spier EG. Power and Sample Size Calculations for Genetic Case/Control Studies Using Gene-Centric SNP Maps: Application to Human Chromosomes 6, 21, and 22 in Three Populations. Hum Hered 2005; 60:43-60. [PMID: 16137993 DOI: 10.1159/000087918] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 11/29/2004] [Accepted: 07/12/2005] [Indexed: 01/29/2023]
Abstract
Power and sample size calculations are critical parts of any research design for genetic association. We present a method that utilizes haplotype frequency information and average marker-marker linkage disequilibrium on SNPs typed in and around all genes on a chromosome. The test statistic used is the classic likelihood ratio test applied to haplotypes in case/control populations. Haplotype frequencies are computed through specification of genetic model parameters. Power is determined by computation of the test's non-centrality parameter. Power per gene is computed as a weighted average of the power assuming each haplotype is associated with the trait. We apply our method to genotype data from dense SNP maps across three entire chromosomes (6, 21, and 22) for three different human populations (African-American, Caucasian, Chinese), three different models of disease (additive, dominant, and multiplicative) and two trait allele frequencies (rare, common). We perform a regression analysis using these factors, average marker-marker disequilibrium, and the haplotype diversity across the gene region to determine which factors most significantly affect average power for a gene in our data. Also, as a 'proof of principle' calculation, we perform power and sample size calculations for all genes within 100 kb of the PSORS1 locus (chromosome 6) for a previously published association study of psoriasis. Results of our regression analysis indicate that four highly significant factors that determine average power to detect association are: disease model, average marker-marker disequilibrium, haplotype diversity, and the trait allele frequency. These findings may have important implications for the design of well-powered candidate gene association studies. Our power and sample size calculations for the PSORS1 gene appear consistent with published findings, namely that there is substantial power (>0.99) for most genes within 100 kb of the PSORS1 locus at the 0.01 significance level.
|
134
|
Hamblin MT, Salas Fernandez MG, Casa AM, Mitchell SE, Paterson AH, Kresovich S. Equilibrium processes cannot explain high levels of short- and medium-range linkage disequilibrium in the domesticated grass Sorghum bicolor. Genetics 2005; 171:1247-56. [PMID: 16157678 PMCID: PMC1456844 DOI: 10.1534/genetics.105.041566] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.3] [Indexed: 01/23/2023]
Abstract
Patterns of linkage disequilibrium (LD) are of interest because they provide evidence of both equilibrium (e.g., mating system or long-term population structure) and nonequilibrium (e.g., demographic or selective) processes, as well as because of their importance in strategies for identifying the genetic basis of complex phenotypes. We report patterns of short and medium range (up to 100 kb) LD in six unlinked genomic regions in the partially selfing domesticated grass, Sorghum bicolor. The extent of allelic associations in S. bicolor, as assessed by pairwise measures of LD, is higher than in maize but lower than in Arabidopsis, in qualitative agreement with expectations based on mating system. Quantitative analyses of the population recombination parameter, rho, however, based on empirical estimates of rates of recombination, mutation, and self-pollination, show that LD is more extensive than expected under a neutral equilibrium model. The disparity between rho and the population mutation parameter, theta, is similar to that observed in other species whose population history appears to be complex. From a practical standpoint, these results suggest that S. bicolor is well suited for association studies using reasonable numbers of markers, since LD typically extends at least several kilobases but has largely decayed by 15 kb.
Affiliation(s)
- Martha T Hamblin
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA
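The abstract above refers to pairwise measures of LD without naming them. As a rough illustration only, the sketch below computes three commonly used pairwise measures (D, D' and r²) for two biallelic loci from haplotype and allele frequencies. The function name and the input frequencies are hypothetical and are not taken from the study.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B
    All inputs are illustrative; nothing here comes from the cited data.
    """
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between the two allele indicators.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    print(pairwise_ld(0.5, 0.5, 0.5))   # complete association
    print(pairwise_ld(0.25, 0.5, 0.5))  # independence
```

Plotting r² against physical distance between marker pairs is one usual way to visualise how quickly LD decays, which is the kind of summary the abstract describes.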
|
135
|
Terry KL, De Vivo I, Titus-Ernstoff L, Shih MC, Cramer DW. Androgen receptor cytosine, adenine, guanine repeats, and haplotypes in relation to ovarian cancer risk. Cancer Res 2005; 65:5974-81. [PMID: 15994977 PMCID: PMC1364476 DOI: 10.1158/0008-5472.can-04-3885] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.8] [Indexed: 11/16/2022]
Abstract
Biological and epidemiologic evidence suggests that androgen or its receptor may play a role in ovarian cancer pathogenesis. The most notable genetic factor influencing androgen receptor (AR) activity is the functional cytosine, adenine, guanine (CAG) repeat in which length is inversely proportional to its transactivational activity. Additional genetic variation due to single nucleotide polymorphisms in the AR gene may be captured through haplotypes. We genotyped the CAG microsatellite and six haplotype-tagging single nucleotide polymorphisms (rs962458, rs6152, rs1204038, rs2361634, rs1337080, rs1337082) of the androgen receptor gene in 987 ovarian cancer cases and 1,034 controls from a study conducted in New Hampshire and eastern Massachusetts between May 1992 and July 2003. We estimated haplotype frequencies and calculated odds ratios with 95% confidence intervals to evaluate the association between the haplotypes and the AR CAG microsatellite with ovarian cancer risk. We observed that carriage of two alleles with ≥22 CAG repeats was associated with an increased risk of ovarian cancer compared with carriage of two alleles with <22 CAG repeats (covariate-adjusted odds ratios, 1.31; 95% confidence intervals, 1.01-1.69). Five common haplotypes in the AR gene were identified, but no association between these and ovarian cancer risk was observed. Our results suggest that possession of two long AR alleles (≥22 CAG repeats) may be associated with increased risk of ovarian cancer compared with women with two short AR alleles (<22 CAG repeats).
Affiliation(s)
- Kathryn L Terry
- Obstetrics and Gynecology Epidemiology Center, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA.
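The abstract above reports odds ratios with 95% confidence intervals. The sketch below shows how an unadjusted odds ratio and a Wald-type interval can be computed from a 2x2 table. It is a simplification: the study reports covariate-adjusted estimates, which would normally come from a regression model, and the counts used here are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls (all counts must be > 0).
    z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not taken from the cited study.
    print(odds_ratio_ci(20, 10, 10, 20))
```

An interval that excludes 1 corresponds to a nominally significant association at the matching level, which is how intervals such as 1.01-1.69 are usually read.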
|
136
|
Jones R, Pembrey M, Golding J, Herrick D. The search for genenotype/phenotype associations and the phenome scan. Paediatr Perinat Epidemiol 2005; 19:264-75. [PMID: 15958149 DOI: 10.1111/j.1365-3016.2005.00664.x] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
All the approaches to the search for genotype/phenotype associations have their share of problems. Comparing the genome scan and candidate gene approaches, the former makes fewer assumptions at the genetic level or about mechanism but has greater statistical difficulties while the latter partially solves the statistical problem but makes more assumptions at both genetic and mechanistic levels. Among current difficulties is a lack of information about the nature of gene variant/phenotype associations: the frequency with which different classes of gene or sequence are involved; the type of genetic variation most commonly involved; the appropriate genetic models to apply to analysis. The overarching problem is that of multiple testing, one solution to which is to integrate genetic information to create a smaller number of compound variables. At the other end of the scale, decisions about the level of complexity at which to pitch the identification of phenotypes also affect the multiple testing problem: whether to pitch them at the level of disease outcomes, or at any of the multiple levels of intermediate phenotypes or traits. The third issue is how best to deal with gene/gene or gene/environment interactions, or whether to ignore them. Only as more genotype/phenotype associations emerge, by whatever means, will the numbers of results allow these questions to be answered. We describe here a new approach to genotype/phenotype association studies, the phenome scan, in which dense phenotypic information in human cohorts is scanned for associations with individual genetic variants. We believe that this approach can generate data that will be useful in answering generic questions about genotype/phenotype associations as well as in discovering novel ones.
Affiliation(s)
- Richard Jones
- ALSPAC, Department of Community-Based Medicine, University of Bristol, Bristol, UK.
|
137
|
Zhao J, Boerwinkle E, Xiong M. An entropy-based statistic for genomewide association studies. Am J Hum Genet 2005; 77:27-40. [PMID: 15931594 PMCID: PMC1226192 DOI: 10.1086/431243] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Received: 12/01/2004] [Accepted: 04/19/2005] [Indexed: 11/04/2022] Open
Abstract
Efficient genotyping methods and the availability of a large collection of single-nucleotide polymorphisms provide valuable tools for genetic studies of human disease. The standard χ² statistic for case-control studies, which uses a linear function of allele frequencies, has limited power when the number of marker loci is large. We introduce a novel test statistic for genetic association studies that uses Shannon entropy and a nonlinear function of allele frequencies to amplify the differences in allele and haplotype frequencies to maintain statistical power with large numbers of marker loci. We investigate the relationship between the entropy-based test statistic and the standard χ² statistic and show that, in most cases, the power of the entropy-based statistic is greater than that of the standard χ² statistic. The distribution of the entropy-based statistic and the type I error rates are validated using simulation studies. Finally, we apply the new entropy-based test statistic to two real data sets, one for the COMT gene and schizophrenia and one for the MMP-2 gene and esophageal carcinoma, to evaluate the performance of the new method for genetic association studies. The results show that the entropy-based statistic obtained smaller P values than did the standard χ² statistic.
Affiliation(s)
- Jinying Zhao
- Human Genetic Center, University of Texas, Health Science Center at Houston, Houston, TX 77225, USA
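The abstract above contrasts a standard chi-square statistic with an entropy-based one. The sketch below covers only the two textbook building blocks it names: the Pearson chi-square for a 2x2 allele-count table and the Shannon entropy of a frequency vector. It does not reproduce the authors' entropy-based test statistic, whose exact form is not given in the abstract; all names and counts are hypothetical.

```python
import math


def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table
    of allele counts: alleles a/b in cases and in controls."""
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = ((case_a + case_b) * (ctrl_a + ctrl_b)
                   * (case_a + ctrl_a) * (case_b + ctrl_b))
    return numerator / denominator if denominator else 0.0


def shannon_entropy(freqs):
    """Shannon entropy (natural log) of a vector of allele or haplotype
    frequencies; zero-frequency entries contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


if __name__ == "__main__":
    # Hypothetical allele counts: 60/40 in cases versus 40/60 in controls.
    print(allelic_chi2(60, 40, 40, 60))
    print(shannon_entropy([0.6, 0.4]), shannon_entropy([0.4, 0.6]))
```

Note that entropy alone is symmetric in the two alleles (the two printed entropies are equal), so a usable test needs more than a plain entropy difference between cases and controls.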
|
138
|
Rinaldo A, Bacanu SA, Devlin B, Sonpar V, Wasserman L, Roeder K. Characterization of multilocus linkage disequilibrium. Genet Epidemiol 2005; 28:193-206. [PMID: 15637716 DOI: 10.1002/gepi.20056] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.9] [Indexed: 12/31/2022]
Abstract
Linkage disequilibrium (LD) in the human genome, often measured as pairwise correlation between adjacent markers, shows substantial spatial heterogeneity. Congruent with these results, studies have found that certain regions of the genome have far less haplotype diversity than expected if the alleles at multiple markers were independent, while other sets of adjacent markers behave almost independently. Regions with limited haplotype diversity have been described as "blocked" or "haplotype blocks." In this article, we propose a new method that aims to distinguish between blocked and unblocked regions in the genome. Like some other approaches, the method analyses haplotype diversity. Unlike other methods, it allows for adjacent, distinct blocks and also multiple, independent single nucleotide polymorphisms (SNPs) separating blocks. Based on an approximate likelihood model and a parsimony criterion to penalize for model complexity, the method partitions a genomic region into blocks relatively quickly, and simulations suggest that its partitions are accurate. We also propose a new, efficient method to select SNPs for association analysis, namely tag SNPs. These methods compare favorably to similar blocking and tagging methods using simulations.
Affiliation(s)
- Alessandro Rinaldo
- Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
|
139
|
Cheng R, Ma JZ, Elston RC, Li MD. Fine mapping functional sites or regions from case-control data using haplotypes of multiple linked SNPs. Ann Hum Genet 2005; 69:102-12. [PMID: 15638831 DOI: 10.1046/j.1529-8817.2004.00140.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/20/2022]
Abstract
Previously, we reported an algorithm for scanning a large number of tightly linked single nucleotide polymorphisms (SNPs) for LD mapping of functional sites or regions from a family-based association design. In the present study, we extend our method to a case-control design. We first use the expectation maximization (EM) algorithm to estimate haplotype frequencies of multiple linked SNPs, and follow this by constructing a contingency table statistic S for LD analysis, based on the estimated haplotype frequencies. An empirical p-value is obtained based on the null distribution of the maximum of S (S*) from a large number (e.g., 1,000 or more) of randomized permutations. The proposed algorithm has been implemented in a computer program in which window searching for functional SNP sites can cover any number of loci without limitation, except that of computer storage. Unlike other programs for a case-control design, which always conduct tests at a fixed window width, our program takes a user-set maximum haplotype window width and uses all possible widths up to that maximum to find the maximum statistic S* for each locus under investigation. The sensitivity of the proposed algorithm has been examined with simulated and real genotyping datasets. Association analyses indicate that our program is powerful enough to detect most, if not all, functional SNPs simulated in the original model or identified in the original report. Moreover, the program is very flexible and can be used in either regional or genome-wide scanning for association analysis with SNP markers.
Affiliation(s)
- Rong Cheng
- Program in Genomics and Bioinformatics on Drug Addiction, Department of Psychiatry, The University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
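The abstract above obtains an empirical p-value from the null distribution of a maximum statistic under permutation. The sketch below illustrates that general idea in a deliberately simplified form: it swaps in a single-locus 2x2 chi-square on carrier indicators instead of the authors' EM-estimated haplotype statistic S and window search, so it is not their program. All function names and the toy data are hypothetical.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table; 0.0 if any margin is empty."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / den if den else 0.0


def max_stat(genotypes, labels):
    """Maximum single-locus chi-square across loci.

    genotypes: one list per individual of 0/1 carrier indicators per locus
    labels: 1 for case, 0 for control
    """
    best = 0.0
    for j in range(len(genotypes[0])):
        a = sum(1 for g, y in zip(genotypes, labels) if y == 1 and g[j] == 1)
        b = sum(1 for g, y in zip(genotypes, labels) if y == 1 and g[j] == 0)
        c = sum(1 for g, y in zip(genotypes, labels) if y == 0 and g[j] == 1)
        d = sum(1 for g, y in zip(genotypes, labels) if y == 0 and g[j] == 0)
        best = max(best, chi2_2x2(a, b, c, d))
    return best


def permutation_p(genotypes, labels, n_perm=1000, seed=0):
    """Empirical p-value for the maximum statistic: shuffle case-control
    labels and count how often the permuted maximum reaches the observed one."""
    rng = random.Random(seed)
    observed = max_stat(genotypes, labels)
    permuted = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)
        if max_stat(genotypes, permuted) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: locus 0 separates cases from controls, locus 1 does not.
    labels = [1] * 20 + [0] * 20
    genotypes = [[y, i % 2] for i, y in enumerate(labels)]
    print(permutation_p(genotypes, labels, n_perm=200))
```

Because the maximum is recomputed over all loci within each permutation, the resulting p-value already accounts for testing several loci, which is the main reason for permuting the maximum instead of each statistic separately.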
|
140
|
Zeggini E, Barton A, Eyre S, Ward D, Ollier W, Worthington J, John S. Characterisation of the genomic architecture of human chromosome 17q and evaluation of different methods for haplotype block definition. BMC Genet 2005; 6:21. [PMID: 15850495 PMCID: PMC1090572 DOI: 10.1186/1471-2156-6-21] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 11/17/2004] [Accepted: 04/25/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The selection of markers in association studies can be informed through the use of haplotype blocks. Recent reports have determined the genomic architecture of chromosomal segments through different haplotype block definitions based on linkage disequilibrium (LD) measures or haplotype diversity criteria. The relative applicability of distinct block definitions to association studies, however, remains unclear. We compared different block definitions in 6.1 Mb of chromosome 17q in 189 unrelated healthy individuals. Using 137 single nucleotide polymorphisms (SNPs), at a median spacing of 15.5 kb, we constructed haplotype block maps using published methods and additional methods we have developed. Haplotype tagging SNPs (htSNPs) were identified for each map. RESULTS Blocks were found to be shorter and coverage of the region limited with methods based on LD measures, compared to the method based on haplotype diversity. Although the distribution of blocks was highly variable, the number of SNPs that needed to be typed in order to capture the maximum number of haplotypes was consistent. CONCLUSION For the marker spacing used in this study, choice of block definition is not important when used as an initial screen of the region to identify htSNPs. However, choice of block definition has consequences for the downstream interpretation of association study results.
Affiliation(s)
- Eleftheria Zeggini
- Centre for Integrated Genomic Medical Research, University of Manchester, Manchester, UK
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Anne Barton
- arc Epidemiology Unit, University of Manchester, Manchester, UK
- Stephen Eyre
- arc Epidemiology Unit, University of Manchester, Manchester, UK
- Daniel Ward
- arc Epidemiology Unit, University of Manchester, Manchester, UK
- William Ollier
- arc Epidemiology Unit, University of Manchester, Manchester, UK
- Sally John
- Centre for Integrated Genomic Medical Research, University of Manchester, Manchester, UK
|
141
|
Ding K, Zhang J, Zhou K, Shen Y, Zhang X. htSNPer1.0: software for haplotype block partition and htSNPs selection. BMC Bioinformatics 2005; 6:38. [PMID: 15740612 PMCID: PMC1274247 DOI: 10.1186/1471-2105-6-38] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 06/25/2004] [Accepted: 03/01/2005] [Indexed: 01/17/2023] Open
Abstract
Background There has recently been great interest in haplotype block structure and haplotype tagging SNPs (htSNPs) in the human genome because of their implications for htSNP-based association mapping strategies for complex disease. Different definitions have been used to characterize the haplotype block structure in the human genome, and several different performance criteria and algorithms have been suggested for htSNP selection. Results A heuristic algorithm, a generalized branch-and-bound algorithm, is applied to the search for a minimal set of haplotype tagging SNPs (htSNPs) according to different htSNP performance criteria. We developed the software htSNPer1.0 to implement the algorithm, integrating three htSNP performance criteria and four haplotype block definitions for haplotype block partitioning. The software provides a powerful graphical user interface (GUI) and can be used to characterize the haplotype block structure and select htSNPs in a candidate gene or genomic region of interest. It can find the global optimum in only a fraction of the computing time consumed by an exhaustive search algorithm. Conclusion htSNPer1.0 allows molecular geneticists to perform haplotype block analysis and htSNP selection using different definitions and performance criteria. The software is a powerful tool for those focusing on association mapping based on haplotype blocks and htSNPs.
Affiliation(s)
- Keyue Ding
- National Laboratory of Medical Molecular Biology; Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences (CAMS) and Peking Union Medical College (PUMC), Beijing 100005, China
- Chinese National Human Genome Center, Beijing 100176, China
- Jing Zhang
- MOE Key Laboratory of Bioinformatics/Department of Automation, Tsinghua University, Beijing 100084, China
- Chinese National Human Genome Center, Beijing 100176, China
- Kaixin Zhou
- National Laboratory of Medical Molecular Biology; Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences (CAMS) and Peking Union Medical College (PUMC), Beijing 100005, China
- Yan Shen
- National Laboratory of Medical Molecular Biology; Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences (CAMS) and Peking Union Medical College (PUMC), Beijing 100005, China
- Chinese National Human Genome Center, Beijing 100176, China
- Xuegong Zhang
- MOE Key Laboratory of Bioinformatics/Department of Automation, Tsinghua University, Beijing 100084, China
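The htSNPer1.0 abstract above compares its branch-and-bound search with exhaustive searching for a minimal htSNP set. For orientation, the sketch below shows the exhaustive baseline only, under one simple assumed criterion: the chosen SNPs must distinguish every distinct haplotype in a block. It is not the htSNPer1.0 algorithm, and it does not implement any of the software's three performance criteria; the function name and example haplotypes are hypothetical.

```python
from itertools import combinations


def minimal_tag_snps(haplotypes):
    """Smallest set of SNP indices that distinguishes all distinct haplotypes.

    haplotypes: equal-length strings (or tuples) of alleles, one per haplotype.
    Exhaustive search over subsets by increasing size, so cost grows
    exponentially with the number of SNPs in the block.
    """
    distinct = sorted(set(haplotypes))
    n_snps = len(distinct[0])
    for size in range(n_snps + 1):
        for subset in combinations(range(n_snps), size):
            # Project each haplotype onto the candidate SNP subset.
            projected = {tuple(h[i] for i in subset) for h in distinct}
            if len(projected) == len(distinct):
                return subset
    return tuple(range(n_snps))


if __name__ == "__main__":
    # SNPs 0 and 1 are redundant with each other, as are SNPs 2 and 3.
    print(minimal_tag_snps(["0000", "0011", "1100", "1111"]))
```

A branch-and-bound search aims at the same optimum while pruning subsets that cannot beat the best solution found so far, which is where the reported time savings would come from.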
|
142
|
|
143
|
Terry KL, De Vivo I, Titus-Ernstoff L, Sluss PM, Cramer DW. Genetic variation in the progesterone receptor gene and ovarian cancer risk. Am J Epidemiol 2005; 161:442-51. [PMID: 15718480 PMCID: PMC1380205 DOI: 10.1093/aje/kwi064] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022] Open
Abstract
Evidence suggests a role for progesterone in ovarian cancer development. Progesterone exerts its effect on target cells by interacting with its receptor. Thus, genetic variations that may cause alterations in the biologic functions of the progesterone receptor can potentially contribute to individual susceptibility to ovarian cancer. Using a population-based, case-control study, the authors genotyped four polymorphisms in the progesterone receptor gene (+44C/T, +331G/A, G393G, V660L) and inferred haplotypes in 987 ovarian cancer cases and 1,034 controls living in New Hampshire and eastern Massachusetts (May 1992-November 2002). Odds ratios and 95% confidence intervals were calculated to evaluate associations with ovarian cancer. No associations were observed between the +44C/T, +331G/A, and G393G polymorphisms and ovarian cancer. However, an inverse association was observed between the V660L variant and ovarian cancer (odds ratio = 0.70, 95% confidence interval: 0.57, 0.85). Associations remained after adjustment for potential confounders. Five haplotypes occurred with greater than 5% frequency, and the haplotype carrying the V660L variant had a significant association with ovarian cancer (odds ratio = 0.76, 95% confidence interval: 0.62, 0.92). Associations were similar after stratifying by ovarian cancer histologies and risk factors.
Affiliation(s)
- Kathryn L Terry
- Obstetrics and Gynecology Epidemiology Center, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
|
144
|
Hu X, Schrodi SJ, Ross DA, Cargill M. Selecting tagging SNPs for association studies using power calculations from genotype data. Hum Hered 2005; 57:156-70. [PMID: 15297809 DOI: 10.1159/000079246] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 12/17/2003] [Accepted: 04/13/2004] [Indexed: 11/19/2022] Open
Abstract
Recent studies have indicated that linkage disequilibrium (LD) between single nucleotide polymorphism (SNP) markers can be used to derive a reduced set of tagging SNPs (tSNPs) for genetic association studies. Previous strategies for identifying tSNPs have focused on LD measures or haplotype diversity, but the statistical power to detect disease-associated variants using tSNPs in genetic studies has not been fully characterized. We propose a new approach of selecting tSNPs based on determining the set of SNPs with the highest power to detect association. Two-locus genotype frequencies are used in the power calculations. To show utility, we applied this power method to a large number of SNPs that had been genotyped in Caucasian samples. We demonstrate that a significant reduction in genotyping efforts can be achieved although the reduction depends on genotypic relative risk, inheritance mode and the prevalence of disease in the human population. The tSNP sets identified by our method are remarkably robust to changes in the disease model when small relative risk and additive mode of inheritance are employed. We have also evaluated the ability of the method to detect unidentified SNPs. Our findings have important implications in applying tSNPs from different data sources in association studies.
Affiliation(s)
- Xiaolan Hu
- Celera Diagnostics, Harbor Bay Pkwy, Alameda, CA 94502, USA.
|
145
|
Nakajima Y, Saito Y, Shiseki K, Fukushima-Uesaka H, Hasegawa R, Ozawa S, Sugai K, Katoh M, Saitoh O, Ohnuma T, Kawai M, Ohtsuki T, Suzuki C, Minami N, Kimura H, Goto YI, Kamatani N, Kaniwa N, Sawada JI. Haplotype structures of EPHX1 and their effects on the metabolism of carbamazepine-10,11-epoxide in Japanese epileptic patients. Eur J Clin Pharmacol 2005; 61:25-34. [PMID: 15692831 DOI: 10.1007/s00228-004-0878-1] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.3] [Received: 05/06/2004] [Accepted: 11/26/2004] [Indexed: 10/25/2022]
Abstract
OBJECTIVE Microsomal epoxide hydrolase (mEH) is an enzyme that detoxifies reactive epoxides and catalyzes the biotransformation of carbamazepine-10,11-epoxide (CBZ-epoxide) to carbamazepine-10,11-diol (CBZ-diol). Utilizing single nucleotide polymorphisms (SNPs) of the EPHX1 gene encoding mEH, we identified the haplotypes of EPHX1 blocks and investigated the association between the block haplotypes and CBZ-epoxide metabolism. METHODS SNPs of EPHX1 were analyzed by means of polymerase chain reaction amplification and DNA sequencing using DNA extracted from the blood leukocytes of 96 Japanese epileptic patients, including 58 carbamazepine-administered patients. The plasma concentrations of CBZ and its four metabolites were determined using high-performance liquid chromatography. RESULTS From sequencing all 9 exons and their surrounding introns, 29 SNPs were found in EPHX1. The SNPs were separated into three blocks on the basis of linkage disequilibrium, and the block haplotype combinations (diplotypes) were assigned. Using plasma CBZ-diol/CBZ-epoxide ratios (diol/epoxide ratios) indicative of the mEH activity, the effects of the diplotypes in each EPHX1 block were analyzed on CBZ-epoxide metabolism. In block 2, the diol/epoxide ratios increased significantly depending on the number of haplotype *2 bearing Y113H (P=0.0241). In block 3, the ratios decreased depending on the number of haplotype *2 bearing H139R (P=0.0351). Also, an increasing effect of a *1 subtype, *1c, was observed on the ratio. CONCLUSION These results show that some EPHX1 haplotypes are associated with altered CBZ-epoxide metabolism. This is the first report on the haplotype structures of EPHX1 and their potential in vivo effects.
Affiliation(s)
- Yukiko Nakajima
- Project team for Pharmacogenetics, National Institute of Health Sciences, 1-18-1 Kamiyoga, Setagaya-ku, Tokyo, 158-8501, Japan
|
146
|
Sun X, Stephens JC, Zhao H. The impact of sample size and marker selection on the study of haplotype structures. Hum Genomics 2005; 1:179-93. [PMID: 15588478 PMCID: PMC3525083 DOI: 10.1186/1479-7364-1-3-179] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 12/02/2022] Open
Abstract
Several studies of haplotype structures in the human genome in various populations have found that the human chromosomes are structured such that each chromosome can be divided into many blocks, within which there is limited haplotype diversity. In addition, only a few genetic markers in a putative block are needed to capture most of the diversity within a block. There has been no systematic empirical study of the effects of sample size and marker set on the identified block structures and representative marker sets, however. The purpose of this study was to conduct a detailed empirical study to examine such impacts. Towards this goal, we have analysed three representative autosomal regions from a large genome-wide study of haplotypes with samples consisting of African-Americans and samples consisting of Japanese and Chinese individuals. For both populations, we have found that the sample size and marker set have significant impact on the number of blocks and the total number of representative markers identified. The marker set in particular has very strong impacts, and our results indicate that the marker density in the original datasets may not be adequate to allow a meaningful characterisation of haplotype structures. In general, we conclude that we need a relatively large sample size and a very dense marker panel in the study of haplotype structures in human populations.
Affiliation(s)
- Xiao Sun
- Genaissance Pharmaceuticals, 5 Science Park, New Haven, CT 06511, USA
- Yale University School of Medicine, 60 College Street, New Haven, CT 06520, USA
- Hongyu Zhao
- Yale University School of Medicine, 60 College Street, New Haven, CT 06520, USA
|
147
|
Zhang X, Roeder K, Wallstrom G, Devlin B. Integration of association statistics over genomic regions using Bayesian adaptive regression splines. Hum Genomics 2005; 1:20-9. [PMID: 15601530 PMCID: PMC3525002 DOI: 10.1186/1479-7364-1-1-20] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/29/2022] Open
Abstract
In the search for genetic determinants of complex disease, two approaches to association analysis are most often employed, testing single loci or testing a small group of loci jointly via haplotypes for their relationship to disease status. It is still debatable which of these approaches is more favourable, and under what conditions. The former has the advantage of simplicity but suffers severely when alleles at the tested loci are not in linkage disequilibrium (LD) with liability alleles; the latter should capture more of the signal encoded in LD, but is far from simple. The complexity of haplotype analysis could be especially troublesome for association scans over large genomic regions, which, in fact, is becoming the standard design. For these reasons, the authors have been evaluating statistical methods that bridge the gap between single-locus and haplotype-based tests. In this article, they present one such method, which uses non-parametric regression techniques embodied by Bayesian adaptive regression splines (BARS). For a set of markers falling within a common genomic region and a corresponding set of single-locus association statistics, the BARS procedure integrates these results into a single test by examining the class of smooth curves consistent with the data. The non-parametric BARS procedure generally finds no signal when no liability allele exists in the tested region (ie it achieves the specified size of the test) and it is sensitive enough to pick up signals when a liability allele is present. The BARS procedure provides a robust and potentially powerful alternative to classical tests of association, diminishes the multiple testing problem inherent in those tests and can be applied to a wide range of data types, including genotype frequencies estimated from pooled samples.
Affiliation(s)
- Xiaohua Zhang
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Kathryn Roeder
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Garrick Wallstrom
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Bernie Devlin
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
|
148
|
Roeder K, Bacanu SA, Sonpar V, Zhang X, Devlin B. Analysis of single-locus tests to detect gene/disease associations. Genet Epidemiol 2005; 28:207-19. [PMID: 15637715 DOI: 10.1002/gepi.20050] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.4] [Indexed: 11/06/2022]
Abstract
A goal of association analysis is to determine whether variation in a particular candidate region or gene is associated with liability to complex disease. To evaluate such candidates, ubiquitous Single Nucleotide Polymorphisms (SNPs) are useful. It is critical, however, to select a set of SNPs that are in substantial linkage disequilibrium (LD) with all other polymorphisms in the region. Whether there is an ideal statistical framework to test such a set of 'tag SNPs' for association is unknown. Compared to tests for association based on frequencies of haplotypes, recent evidence suggests tests for association based on linear combinations of the tag SNPs (Hotelling T² test) are more powerful. Following this logical progression, we wondered if single-locus tests would prove generally more powerful than the regression-based tests. We answer this question by investigating four inferential procedures: the maximum of a series of test statistics corrected for multiple testing by the Bonferroni procedure, T(B), or by permutation of case-control status, T(P); a procedure that tests the maximum of a smoothed curve fitted to the series of test statistics, T(S); and the Hotelling T² procedure, which we call T(R). These procedures are evaluated by simulating data like that from human populations, including realistic levels of LD and realistic effects of alleles conferring liability to disease. We find that power depends on the correlation structure of SNPs within a gene, the density of tag SNPs, and the placement of the liability allele. The clearest pattern emerges between power and the number of SNPs selected. When a large fraction of the SNPs within a gene are tested, and multiple SNPs are highly correlated with the liability allele, T(S) has better power. Using a SNP selection scheme that optimizes power but also requires a substantial number of SNPs to be genotyped (roughly 10-20 SNPs per gene), power of T(P) is generally superior to that for the other procedures, including T(R). Finally, when a SNP selection procedure that targets a minimal number of SNPs per gene is applied, the average performances of T(P) and T(R) are indistinguishable.
Affiliation(s)
- Kathryn Roeder
- Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
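Of the four procedures named in the abstract above, T(B) is the simplest to illustrate: take the most extreme single-locus result and correct it by the Bonferroni procedure. The sketch below shows that correction on a list of per-SNP p-values. It is a generic illustration with hypothetical names and numbers, not the authors' simulation code, and it does not cover T(P), T(S) or T(R).

```python
def bonferroni_min_p(p_values):
    """Gene-level p-value from the smallest single-locus p-value,
    Bonferroni-corrected for the number of SNPs tested and capped at 1."""
    if not p_values:
        raise ValueError("at least one p-value is required")
    return min(1.0, len(p_values) * min(p_values))


if __name__ == "__main__":
    # Hypothetical per-SNP p-values for one gene.
    print(bonferroni_min_p([0.01, 0.2, 0.5]))
```

Bonferroni treats the tests as if they were independent, so it tends to be conservative when tag SNPs are correlated through LD. That is a common motivation for permutation-based corrections.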
|
149
|
Flanders WD, Khoury MJ, Yang QH, Austin H. Tests of trait-haplotype association when linkage phase is ambiguous, appropriate for matched case-control and cohort studies with competing risks. Stat Med 2005; 24:2299-316. [PMID: 16015677 DOI: 10.1002/sim.2156] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Abstract
The impact of competing risks on tests of association between disease and haplotypes has been largely ignored. We consider situations in which linkage phase is ambiguous and show that tests for disease-haplotype association can lead to rejection of the null hypothesis, even when true, with more than the nominal 5 per cent frequency. This problem tends to occur if a haplotype is associated with overall mortality, even if the haplotype is not associated with disease risk. A small simulation study illustrates the magnitude of bias (high type I error rate) in the context of a cohort study in which a modest number of disease cases (about 350) occur over time. The bias remains even if the score test is based on a logistic model that includes age as a covariate. For cohort studies, we propose a new test based on a modification of the proportional hazards model and for case-control studies, a test based on a conditional likelihood that have the correct size under the null even in the presence of competing risks, and that can be used when haplotype is ambiguous.
Affiliation(s)
- W D Flanders
- Department of Epidemiology, Rollins School of Public Health, Emory University, 1599 Clifton Road, Atlanta, GA 30322, USA.
|
150
|
Harrap SB. Blood Pressure Genetics. Hypertension 2005. [DOI: 10.1016/b978-0-7216-0258-5.50095-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|