1
|
Chapman J, Whittaker J. Analysis of multiple SNPs in a candidate gene or region. Genet Epidemiol 2008; 32:560-6. [PMID: 18428428 DOI: 10.1002/gepi.20330] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.2] [Indexed: 11/06/2022]
Abstract
We consider the analysis of multiple single nucleotide polymorphisms (SNPs) within a gene or region. The simplest analysis of such data is based on a series of single SNP hypothesis tests, followed by correction for multiple testing, but it is intuitively plausible that a joint analysis of the SNPs will have higher power, particularly when the causal locus may not have been observed. However, standard tests, such as a likelihood ratio test based on an unrestricted alternative hypothesis, tend to have large numbers of degrees of freedom and hence low power. This has motivated a number of alternative test statistics. Here we compare several of the competing methods, including the multivariate score test (Hotelling's test) of Chapman et al. ([2003] Hum. Hered. 56:18-31), Fisher's method for combining P-values, the minimum P-value approach, a Fourier-transform-based approach recently suggested by Wang and Elston ([2007] Am. J. Human Genet. 80:353-360) and a Bayesian score statistic proposed for microarray data by Goeman et al. ([2005] J. R. Stat. Soc. B 68:477-493). Some relationships between these methods are pointed out, and simulation results given to show that the minimum P-value and the Goeman et al. ([2005] J. R. Stat. Soc. B 68:477-493) approaches work well over a range of scenarios. The Wang and Elston approach often performs poorly; we explain why, and show how its performance can be substantially improved.
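Two of the combination rules named in this abstract, Fisher's method and the minimum P-value approach, can be sketched in a few lines. This is a minimal illustration that assumes independent tests; SNPs in linkage disequilibrium violate that assumption, so in practice a permutation-based null would typically replace the closed-form reference distributions used here. The function names are hypothetical and are not taken from the paper.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k df if the k tests are independent."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # half of the Fisher statistic
    # Closed-form chi-square survival function for even degrees of freedom (2k):
    # exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

def min_p_sidak(p_values):
    """Minimum P-value with a Sidak correction (assumes k independent tests)."""
    k = len(p_values)
    return 1.0 - (1.0 - min(p_values)) ** k
```

Both functions expect P-values in the interval (0, 1]; a P-value of exactly 0 would make the logarithm undefined.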
Affiliation(s)
- Juliet Chapman
- London School of Hygiene and Tropical Medicine, London, United Kingdom.
|
2
|
Han B, Kang HM, Seo MS, Zaitlen N, Eskin E. Efficient association study design via power-optimized tag SNP selection. Ann Hum Genet 2008; 72:834-47. [PMID: 18702637 DOI: 10.1111/j.1469-1809.2008.00469.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/27/2022]
Abstract
Discovering statistical correlation between causal genetic variation and clinical traits through association studies is an important method for identifying the genetic basis of human diseases. Since fully resequencing a cohort is prohibitively costly, genetic association studies take advantage of local correlation structure (or linkage disequilibrium) between single nucleotide polymorphisms (SNPs) by selecting a subset of SNPs to be genotyped (tag SNPs). While many current association studies are performed using commercially available high-throughput genotyping products that define a set of tag SNPs, choosing tag SNPs remains an important problem for both custom follow-up studies as well as designing the high-throughput genotyping products themselves. The most widely used tag SNP selection method optimizes the correlation between SNPs (r²). However, tag SNPs chosen based on an r² criterion do not necessarily maximize the statistical power of an association study. We propose a study design framework that chooses SNPs to maximize power and efficiently measures the power through empirical simulation. Empirical results based on the HapMap data show that our method gains considerable power over a widely used r²-based method, or equivalently reduces the number of tag SNPs required to attain the desired power of a study. Our power-optimized 100k whole genome tag set provides equivalent power to the Affymetrix 500k chip for the CEU population. For the design of custom follow-up studies, our method provides up to twice the power increase using the same number of tag SNPs as r²-based methods. Our method is publicly available via web server at http://design.cs.ucla.edu.
Affiliation(s)
- B Han
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
|
3
|
Cole SM, Long JC. A coalescent simulation of marker selection strategy for candidate gene association studies. Am J Med Genet B Neuropsychiatr Genet 2008; 147B:86-93. [PMID: 17722024 DOI: 10.1002/ajmg.b.30564] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Abstract
Recent efforts have focused on the challenges of finding alleles that contribute to health-related phenotypes in genome-wide association studies. However, in candidate gene studies, where the genomic region of interest is small and recombination is limited, factors that affect the ability to detect disease-susceptibility alleles remain poorly understood. In particular, it is unclear how varying the number of markers on a haplotype, the type of marker (e.g., single nucleotide polymorphism (SNP), short tandem repeat (STR)), including the causative site (cs) as a genetic marker, or population demographics influences the power to detect a candidate gene. We evaluated the power of association tests using coalescent-modeled computer simulations. Results show that an effective number of markers on a haplotype is dependent on whether the cs is included as a marker. When the analyses include the cs, highest power is achieved with a single-marker association test. However, when the cs is excluded from analyses, the addition of more nonfunctional SNPs on the haplotype increases power to a certain point under most scenarios. We find a rapidly expanding population always has lower power compared to a population of constant size; although utilizing markers with a frequency of at least 5% improves the chance of detecting an association. Comparing the mutational properties of a nonfunctional SNP versus an STR, multi-allelic STRs provide more or comparable power than a bi-allelic SNP unless SNP frequencies are constrained to 10% or more. Similarly, including an STR with SNPs on a haplotype improves power unless SNP frequencies are 5% or more.
Affiliation(s)
- Suzanne M Cole
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109-0618, USA
|
4
|
Baksh MF, Kelly PJ. Statistical methods for examining genetic influences of resistance to anti-epileptic drugs. Expert Rev Clin Pharmacol 2008; 1:137-44. [DOI: 10.1586/17512433.1.1.137] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
|
5
|
|
6
|
Kullo IJ, Greene MT, Boerwinkle E, Chu J, Turner ST, Kardia SLR. Association of polymorphisms in NOS3 with the ankle-brachial index in hypertensive adults. Atherosclerosis 2007; 196:905-12. [PMID: 17367796 PMCID: PMC2858046 DOI: 10.1016/j.atherosclerosis.2007.02.008] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 09/13/2006] [Revised: 12/23/2006] [Accepted: 02/08/2007] [Indexed: 10/23/2022]
Abstract
We investigated the association of 14 polymorphisms in the endothelial nitric oxide synthase gene (NOS3) with ankle brachial index (ABI) in non-Hispanic white hypertensives belonging to hypertensive sibships. Subjects (n=659, mean age 61±9 years, 54% women) underwent measurement of ABI using a standard protocol, and the lowest of 4 ABI values was used in the analyses. Non-synonymous SNPs with a minor allele frequency >0.02 and tag SNPs selected based on a measure of linkage disequilibrium (r²) were genotyped. We reduced the chance of false positives by testing for replication, randomly selecting 1 hypertensive sib from each sibship to create Subset 1 (n=330) and Subset 2 (n=329). Multivariable linear regression models were used to assess the associations of single NOS3 polymorphisms and haplotypes with ABI after adjustment for covariates (age, sex, body mass index, smoking, total cholesterol, HDL cholesterol, and diabetes). Two specific SNPs in significant LD with each other (rs891512 and rs1808593) were significantly associated with ABI in both subsets. Based on a sliding window approach with a window size of 2, estimated haplotypes from 2 SNP pairs (rs2070744-rs3918226 and rs1808593-rs7830) were also significantly associated with ABI in both subsets. In conclusion, specific NOS3 SNPs and haplotypes were associated with inter-individual variation in ABI, a non-invasive marker of peripheral arterial disease, in replicate subsets of hypertensive subjects. These findings motivate further investigation of the role of NOS3 variants in determining susceptibility to peripheral arterial disease.
Affiliation(s)
- Iftikhar J Kullo
- Divisions of Cardiovascular Diseases, Department of Internal Medicine, Mayo Clinic and Foundation, Rochester, MN 55905, USA.
|
7
|
De La Vega FM. Selecting single-nucleotide polymorphisms for association studies with SNPbrowser software. Methods Mol Biol 2007; 376:177-93. [PMID: 17984546 DOI: 10.1007/978-1-59745-389-9_13] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 01/25/2023]
Abstract
The design of genetic association studies using single-nucleotide polymorphisms (SNPs) requires the selection of subsets of the variants providing high statistical power at a reasonable cost. SNPs must be selected to maximize the probability that a causative mutation is in linkage disequilibrium (LD) with at least one marker genotyped in the study. The HapMap Project performed a genome-wide survey of genetic variation with over 3 million SNPs typed in four populations, providing a rich resource to inform the design of association studies. A number of strategies have been proposed for the selection of SNPs based on observed LD, including construction of metric LD maps and the selection of haplotype-tagging SNPs. Power calculations are important at the study design stage to ensure successful results. Integrating these methods and annotations can be challenging: the algorithms required to implement these methods are complex to deploy, and all the necessary data and annotations are deposited in disparate databases. Here, we review the typical workflows for the selection of markers for association studies utilizing the SNPbrowser software, a freely available, stand-alone application that incorporates the HapMap database together with gene and SNP annotations. Selected SNPs are screened for their conversion potential to genotyping platforms, expediting the set up of genetic studies with an increased probability of success.
|
8
|
Menon R, Fortunato SJ, Thorsen P, Williams S. Genetic associations in preterm birth: a primer of marker selection, study design, and data analysis. J Soc Gynecol Investig 2006; 13:531-41. [PMID: 17088082 DOI: 10.1016/j.jsgi.2006.09.006] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Received: 02/21/2006] [Indexed: 01/16/2023]
Abstract
Spontaneous preterm birth (PTB; delivery before 37 weeks gestation) is a primary risk factor for infant morbidity and mortality. The etiology is unclear, but there is evidence that there is a genetic predisposition to PTB. Armed with the suggestion of genetic risk factors and the failure to identify useful biomarkers, investigators are starting to actively pursue the role of genetic predisposition in PTB. Several studies have been done to date assessing the role of single gene variants. However, positive findings have failed to replicate. We argue that heterogeneity in study designs, definition of phenotype, single-nucleotide polymorphism (SNP) selection, population selection, and sample size makes data interpretation difficult in complex phenotypes such as PTB. In this review, we introduce general concepts of study designs in genetic epidemiology, selection of candidate genes and markers for analysis, and analytical methodologies. We also introduce how the concept of gene-gene interactions (biologic epistasis) and gene-environment interactions may affect the predisposition to PTB.
|
9
|
Zhao J, Jin L, Xiong M. Test for interaction between two unlinked loci. Am J Hum Genet 2006; 79:831-45. [PMID: 17033960 PMCID: PMC1698572 DOI: 10.1086/508571] [Citation(s) in RCA: 98] [Impact Index Per Article: 5.4] [Received: 04/07/2006] [Accepted: 08/14/2006] [Indexed: 11/04/2022]
Abstract
Despite the growing consensus on the importance of testing gene-gene interactions in genetic studies of complex diseases, the effect of gene-gene interactions has often been defined as a deviance from genetic additive effects, which is essentially treated as a residual term in genetic analysis and leads to low power in detecting the presence of interacting effects. To what extent the definition of gene-gene interaction at population level reflects the genes' biochemical or physiological interaction remains a mystery. In this article, we introduce a novel definition and a new measure of gene-gene interaction between two unlinked loci (or genes). We developed a general theory for studying linkage disequilibrium (LD) patterns in disease population under two-locus disease models. The properties of using the LD measure in a disease population as a function of the measure of gene-gene interaction between two unlinked loci were also investigated. We examined how interaction between two loci creates LD in a disease population and showed that the mathematical formulation of the new definition for gene-gene interaction between two loci was similar to that of the LD between two loci. This finding motivated us to develop an LD-based statistic to detect gene-gene interaction between two unlinked loci. The null distribution and type I error rates of the LD-based statistic for testing gene-gene interaction were validated using extensive simulation studies. We found that the new test statistic was more powerful than the traditional logistic regression under three two-locus disease models and demonstrated that the power of the test statistic depends on the measure of gene-gene interaction. We also investigated the impact of using tagging SNPs for testing interaction on the power to detect interaction between two unlinked loci. Finally, to evaluate the performance of our new method, we applied the LD-based statistic to two published data sets. Our results showed that the P values of the LD-based statistic were smaller than those obtained by other approaches, including logistic regression models.
Affiliation(s)
- Jinying Zhao
- Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77225, USA
|
10
|
Abstract
The number of common single nucleotide polymorphisms (SNPs) in the human genome is estimated to be around 3-6 million. It is highly anticipated that the study of SNPs will help provide a means for elucidating the genetic component of complex diseases and variable drug responses. High-throughput technologies such as oligonucleotide arrays have produced an enormous amount of SNP data, which creates great challenges in genome-wide disease linkage and association studies. In this paper, we present an adaptation of the cross entropy (CE) method and propose an iterative CE Monte Carlo (CEMC) algorithm for tagging SNP selection. This differs from most SNP selection algorithms in the literature in that our method is independent of the notion of haplotype block. Thus, the method is applicable to whole genome SNP selection without prior knowledge of block boundaries. We applied this block-free algorithm to three large datasets (two simulated and one real), each on the order of thousands of SNPs. The successful applications to these large scale datasets demonstrate that CEMC is computationally feasible for whole genome SNP selection. Furthermore, the results show that CEMC is significantly better than random selection, and it also outperformed another block-free selection algorithm for the dataset considered.
Affiliation(s)
- Zhenqiu Liu
- Division of Biostatistics, University of Maryland Greenebaum Cancer Center, Baltimore, Maryland, USA
|
11
|
Sham PC, Ao SI, Kwan JSH, Kao P, Cheung F, Fong PY, Ng MK. Combining functional and linkage disequilibrium information in the selection of tag SNPs. Bioinformatics 2006; 23:129-31. [PMID: 17060359 DOI: 10.1093/bioinformatics/btl532] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Abstract
We have developed an online program, WCLUSTAG, for tag SNP selection that allows the user to specify variable tagging thresholds for different SNPs. Tag SNPs are selected such that a SNP with user-specified tagging threshold C will have a minimum R² of C with at least one tag SNP. This flexible feature is useful for researchers who wish to prioritize genomic regions or SNPs in an association study. AVAILABILITY The online WCLUSTAG program is available at http://bioinfo.hku.hk/wclustag/
Affiliation(s)
- P C Sham
- Department of Psychiatry, Institute of Psychiatry, King's College London, UK
|
12
|
Tang NLS, Pharoah PDP, Ma SL, Easton DF. Evaluation of an algorithm of tagging SNPs selection by linkage disequilibrium. Clin Biochem 2006; 39:240-3. [PMID: 16427037 DOI: 10.1016/j.clinbiochem.2005.11.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 07/27/2005] [Revised: 10/30/2005] [Accepted: 11/25/2005] [Indexed: 11/25/2022]
Abstract
BACKGROUND Single nucleotide polymorphisms (SNPs) are the most abundant kind of genetic polymorphism in the human genome. They are important in both genetic research and genetic testing in a clinical setting, such as in the area of pharmacogenetics. In order to improve efficiency, tagging SNPs (tagSNPs) are selected in genes of interest to represent other correlated SNPs in linkage disequilibrium (LD) with the tagSNPs. Various algorithms have been proposed to identify a subset of SNPs as tagSNPs. Most tagSNP selection algorithms are haplotype-based, in which the spatial relationship between SNPs is considered. More recently, a more efficient cluster-based algorithm has been proposed, which clusters SNPs solely by an LD parameter, such as r². Here, we evaluated the sample distribution of r² and its effect on cluster-based tagSNP selection. DESIGN AND METHODS Genotype data from 198 individuals within a 500-kb region on 5q31 were used to evaluate the sample distribution of r² and its effect on cluster-based tagSNP selection. RESULTS It was found that the degree of variation of LD depends on the LD structure of genes. CONCLUSION As a cluster-based tagSNP selection algorithm does not take into account the spatial position of SNPs, a more stringent r² threshold is required to achieve more reliable tagSNP selection.
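For readers unfamiliar with the LD parameter discussed in this abstract, the following sketch computes the standard pairwise r² between two biallelic SNPs, using D = p_AB - p_A p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). It assumes phased haplotypes coded as 0/1 alleles, which is an assumption made for illustration (the study itself used genotype data), and the function name is hypothetical. Because the estimate is computed from a finite sample, it varies from sample to sample, which is the effect the abstract evaluates.

```python
def ld_r2(haplotypes, i, j):
    """Sample r^2 between biallelic SNPs i and j from phased haplotypes (sequences of 0/1 alleles)."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n   # frequency of allele 1 at SNP i
    p_b = sum(h[j] for h in haplotypes) / n   # frequency of allele 1 at SNP j
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n  # joint frequency
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP leaves r^2 undefined; returning 0 is a convention chosen here
    d = p_ab - p_a * p_b  # the LD coefficient D
    return d * d / denom
```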
Affiliation(s)
- Nelson L S Tang
- Department of Chemical Pathology, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong.
|
13
|
Kristensen VN, Tsalenko A, Geisler J, Faldaas A, Grenaker GI, Lingjærde OC, Fjeldstad S, Yakhini Z, Lønning PE, Børresen-Dale AL. Multilocus analysis of SNP and metabolic data within a given pathway. BMC Genomics 2006; 7:5. [PMID: 16412218 PMCID: PMC1382210 DOI: 10.1186/1471-2164-7-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 12/08/2004] [Accepted: 01/13/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND Complex traits, which are under the influence of multiple and possibly interacting genes, have become a subject of new statistical methodological research. One of the greatest challenges facing human geneticists is the identification and characterization of susceptibility genes for common multifactorial diseases and their association to different quantitative phenotypic traits. RESULTS Two types of data from the same metabolic pathway were used in the analysis: categorical measurements of 18 SNPs; and quantitative measurements of plasma levels of several steroids and their precursors. Using the combinatorial partitioning method we tested various thresholds for each metabolic trait and each individual SNP locus. One SNP in the CYP19 3'UTR, two SNPs in CYP1B1 (R48G and A119S), and one in CYP1A1 (T461N) were significantly differently distributed between the high and low level metabolic groups. Leave-one-out cross-validation showed that 6 SNPs in concert predict phenotype correctly in 65% of cases. Further, we used pattern recognition, computing the p-value by Monte Carlo simulation, to identify sets of SNPs and physiological characteristics such as age and weight that contribute to a given metabolic level. Since the SNPs detected by both methods reside either in the same gene (CYP1B1) or in 3 different genes in immediate vicinity on chromosome 15 (CYP19, CYP11 and CYP1A1), we investigated the possibility that they form intragenic and intergenic haplotypes, which may jointly account for a higher activity in the pathway. We identified such haplotypes associated with metabolic levels. CONCLUSION The methods reported here may enable the study of multiple low-penetrance genetic factors that together determine various quantitative phenotypic traits. Our preliminary data suggest that several genes coding for proteins involved in a common pathway, that happen to be located on common chromosomal areas and may form intragenic haplotypes, together account for a higher activity of the whole pathway.
Affiliation(s)
- Vessela N Kristensen
- Department of Genetics, Institute of Cancer Research, the Norwegian Radium Hospital, 0310 Oslo, Norway
- Jurgen Geisler
- Department of Oncology, Haukeland Hospital, Bergen, Norway
- Anne Faldaas
- Department of Genetics, Institute of Cancer Research, the Norwegian Radium Hospital, 0310 Oslo, Norway
- Grethe Irene Grenaker
- Department of Genetics, Institute of Cancer Research, the Norwegian Radium Hospital, 0310 Oslo, Norway
- Anne-Lise Børresen-Dale
- Department of Genetics, Institute of Cancer Research, the Norwegian Radium Hospital, 0310 Oslo, Norway
- University in Oslo, Faculty Division Radiumhospitalet, Oslo, Norway
|
14
|
Pardi F, Lewis CM, Whittaker JC. SNP Selection for Association Studies: Maximizing Power across SNP Choice and Study Size. Ann Hum Genet 2005; 69:733-46. [PMID: 16266411 DOI: 10.1111/j.1529-8817.2005.00202.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
Selection of single nucleotide polymorphisms (SNPs) is a problem of primary importance in association studies and several approaches have been proposed. However, none provides a satisfying answer to the problem of how many SNPs should be selected, and how this should depend on the pattern of linkage disequilibrium (LD) in the region under consideration. Moreover, SNP selection is usually considered as independent from deciding the sample size of the study. However, when resources are limited there is a tradeoff between the study size and the number of SNPs to genotype. We show that tuning the SNP density to the LD pattern can be achieved by looking for the best solution to this tradeoff. Our approach consists of formulating SNP selection as an optimization problem: the objective is to maximize the power of the final association study, whilst keeping the total costs below a given budget. We also propose two alternative algorithms for the solution of this optimization problem: a genetic algorithm and a hill climbing search. These standard techniques efficiently find good solutions, even when the number of possible SNPs to choose from is large. We compare the performance of these two algorithms on different chromosomal regions and show that, as expected, the selected SNPs reflect the LD pattern: the optimal SNP density varies dramatically between chromosomal regions.
Affiliation(s)
- F Pardi
- Department of Medical and Molecular Genetics, Guy's, King's and St. Thomas' School of Medicine, King's College London, London, UK
|
15
|
Abstract
We review the rationale behind and discuss methods of design and analysis of genetic association studies. There are similarities between genetic association studies and classic epidemiological studies of environmental risk factors but there are also issues that are specific to studies of genetic risk factors such as the use of particular family-based designs, the need to account for different underlying genetic mechanisms, and the effect of population history. Association differs from linkage (covered elsewhere in this series) in that the alleles of interest will be the same across the whole population. As with other types of genetic epidemiological study, issues of design, statistical analysis, and interpretation are very important.
Affiliation(s)
- Heather J Cordell
- University of Cambridge, Department of Medical Genetics, Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, Addenbrookes Hospital, UK.
|
16
|
Kathiresan S, Larson MG, Vasan RS, Guo CY, Vita JA, Mitchell GF, Keyes MJ, Newton-Cheh C, Musone SL, Lochner AL, Drake JA, Levy D, O'Donnell CJ, Hirschhorn JN, Benjamin EJ. Common Genetic Variation at the Endothelial Nitric Oxide Synthase Locus and Relations to Brachial Artery Vasodilator Function in the Community. Circulation 2005; 112:1419-27. [PMID: 16129794 DOI: 10.1161/circulationaha.105.544619] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022]
Abstract
BACKGROUND Sequence variants at the endothelial nitric oxide synthase (NOS3) locus have been associated with endothelial function measures, but replication has been limited. METHODS AND RESULTS In reference pedigrees, we characterized linkage disequilibrium structure at the NOS3 locus using 33 common single nucleotide polymorphisms (SNPs). Eighteen SNPs that capture underlying common variation were genotyped in unrelated Framingham Heart Study participants (49.5% women; mean age, 62 years) with measured brachial artery flow-mediated dilation (n=1446) or hyperemic flow velocity (n=1043). Within 3 defined blocks of strong linkage disequilibrium that spanned NOS3, 11 SNPs captured >80% of common haplotypic variation. Among men, there were nominally significant associations between 8 NOS3 SNPs (minimum P=0.002) and between haplotypes (minimum P=0.002) and either flow-mediated dilation or hyperemic flow velocity. In women, we did not observe significant associations between NOS3 SNPs or haplotypes and endothelial function measures. To correct for multiple testing, we constructed 1000 bootstrapped null data sets and found that empirical probability values exceeded 0.05 for both phenotypes. CONCLUSIONS A parsimonious set of SNPs captures common genetic variation at the NOS3 locus. A conservative interpretation of our results is that, accounting for multiple testing, we did not observe statistically significant relations between NOS3 sequence variants and endothelial function measures in either sex. The nominal associations of select NOS3 variants with endothelial function in men (unadjusted for multiple testing) should be viewed as hypothesis-generating observations and may merit testing in other cohorts and experimental designs.
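The multiple-testing correction mentioned in this abstract compares the observed minimum P value with the minimum P values obtained from null data sets. The sketch below shows only that final comparison step; how the null data sets are generated is not shown, the add-one adjustment is a common convention rather than something the abstract states, and the function name is hypothetical.

```python
def empirical_p(observed_min_p, null_min_ps):
    """Empirical P: share of null data sets whose minimum P is at least as small as the observed one."""
    hits = sum(1 for p in null_min_ps if p <= observed_min_p)
    # Add-one adjustment (an assumption here) keeps the estimate strictly above zero.
    return (hits + 1) / (len(null_min_ps) + 1)
```

With 1000 null data sets, an observed minimum P that is matched or beaten in more than about 5% of them would give an empirical P above 0.05, which is the situation the abstract reports.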
Affiliation(s)
- Sekar Kathiresan
- Framingham Heart Study, National Heart, Lung, and Blood Institute, Framingham, MA, USA
|
17
|
Halldórsson BV, Istrail S, De La Vega FM. Optimal Selection of SNP Markers for Disease Association Studies. Hum Hered 2005; 58:190-202. [PMID: 15812176 DOI: 10.1159/000083546] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.5] [Indexed: 01/28/2023]
Abstract
Genetic association studies with population samples hold the promise of uncovering the susceptibility genes underlying the heritability of complex or common disease. Most association studies rely on the use of surrogate markers, single-nucleotide polymorphism (SNP) being the most suitable due to their abundance and ease of scoring. SNP marker selection is aimed to increase the chances that at least one typed SNP would be in linkage disequilibrium (LD) with the disease causative variant, while at the same time controlling the cost of the study in terms of the number of markers genotyped and samples. Empirical studies reporting block-like segments in the genome with high LD and low haplotype diversity have motivated a marker selection strategy whereby subsets of SNPs that 'tag' the common haplotypes of a region are picked for genotyping, avoiding typing redundant SNPs. Based on these initial observations, a plethora of 'tagging' algorithms for selecting minimum informative subsets of SNPs has recently appeared in the literature. These differ mostly in two major aspects: the quality or correlation measure used to define tagging and the algorithm used for the minimization of the final number of tagging SNPs. In this review we describe the available tagging algorithms utilizing a 3-step unifying framework, point out their methodological and conceptual differences, and make an assessment of their assumptions, performance, and scalability.
|
18
|
Ao SI, Yip K, Ng M, Cheung D, Fong PY, Melhado I, Sham PC. CLUSTAG: hierarchical clustering and graph methods for selecting tag SNPs. Bioinformatics 2004; 21:1735-6. [PMID: 15585525 DOI: 10.1093/bioinformatics/bti201] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.0] [Indexed: 11/13/2022]
Abstract
Cluster and set-cover algorithms are developed to obtain a set of tag single nucleotide polymorphisms (SNPs) that can represent all the known SNPs in a chromosomal region, subject to the constraint that all SNPs must have a squared correlation R² > C with at least one tag SNP, where C is specified by the user. AVAILABILITY http://hkumath.hku.hk/web/link/CLUSTAG/CLUSTAG.html CONTACT mng@maths.hku.hk.
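The constraint described in this abstract, that every SNP must exceed the threshold C in squared correlation with at least one tag, can be illustrated with a generic greedy set-cover heuristic. This is a sketch of the general technique under the assumption that a symmetric pairwise matrix is already available; it is not CLUSTAG's own implementation, and the function and variable names are hypothetical.

```python
def greedy_tag_snps(r2, threshold):
    """Greedy set cover: add tags until every SNP has r^2 > threshold with some tag (or is a tag)."""
    n = len(r2)
    # covers[i] holds the SNPs that SNP i would tag; each SNP always tags itself.
    covers = [{j for j in range(n) if j == i or r2[i][j] > threshold} for i in range(n)]
    uncovered = set(range(n))
    tags = []
    while uncovered:
        # Pick the SNP that tags the most still-uncovered SNPs; ties go to the lowest index.
        best = max(range(n), key=lambda i: (len(covers[i] & uncovered), -i))
        tags.append(best)
        uncovered -= covers[best]
    return tags
```

For example, with three mutually correlated SNPs and one isolated SNP, the heuristic picks one tag for the cluster and the isolated SNP itself:

```python
r2 = [
    [1.0, 0.9, 0.85, 0.1],
    [0.9, 1.0, 0.5, 0.1],
    [0.85, 0.5, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
tags = greedy_tag_snps(r2, 0.8)  # SNP 0 tags SNPs 1 and 2; SNP 3 must tag itself
```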
Affiliation(s)
- S I Ao
- Department of Mathematics, The University of Hong Kong, Pokfulam, Hong Kong
|
19
|
Neale BM, Sham PC. The future of association studies: gene-based analysis and replication. Am J Hum Genet 2004; 75:353-62. [PMID: 15272419 PMCID: PMC1182015 DOI: 10.1086/423901] [Citation(s) in RCA: 473] [Impact Index Per Article: 23.7] [Received: 04/19/2004] [Accepted: 06/21/2004] [Indexed: 11/03/2022]
Abstract
Historically, association tests were limited to single variants, so that the allele was considered the basic unit for association testing. As marker density increases and indirect approaches are used to assess association through linkage disequilibrium, association is now frequently considered at the haplotypic level. We suggest that there are difficulties in replicating association findings at the single-nucleotide-polymorphism (SNP) or the haplotype level, and we propose a shift toward a gene-based approach in which all common variation within a candidate gene is considered jointly. Inconsistencies arising from population differences are more readily resolved by use of a gene-based approach rather than either a SNP-based or a haplotype-based approach. A gene-based approach captures all of the potential risk-conferring variations; thus, negative findings are subject only to the issue of power. In addition, chance findings due to multiple testing can be readily accounted for by use of a genewide-significance level. Meta-analysis procedures can be formalized for gene-based methods through the combination of P values. It is only a matter of time before all variation within genes is mapped, at which point the gene-based approach will become the natural end point for association analysis and will inform our search for functional variants relevant to disease etiology.
Affiliation(s)
- Benjamin M Neale
- Social, Genetic, and Developmental Psychiatry Centre, Institute of Psychiatry, King's College London, London, United Kingdom
|