1101
|
Kathiresan S, Larson MG, Vasan RS, Guo CY, Gona P, Keaney JF, Wilson PWF, Newton-Cheh C, Musone SL, Camargo AL, Drake JA, Levy D, O'Donnell CJ, Hirschhorn JN, Benjamin EJ. Contribution of clinical correlates and 13 C-reactive protein gene polymorphisms to interindividual variability in serum C-reactive protein level. Circulation 2006; 113:1415-23. [PMID: 16534007 DOI: 10.1161/circulationaha.105.591271] [Citation(s) in RCA: 171] [Impact Index Per Article: 9.5] [Indexed: 01/26/2023]
Abstract
BACKGROUND: Serum C-reactive protein (CRP) level is a heritable complex trait that predicts incident cardiovascular disease. We investigated the clinical and genetic sources of interindividual variability in serum CRP. METHODS AND RESULTS: We studied serum CRP in 3301 Framingham Heart Study (FHS) participants (mean age 61 years, 53% women). Twelve clinical covariates explained 26% of the variability in CRP level, with body mass index alone explaining 15% (P<0.0001) of the variance. To investigate the influence of genetic variation at the CRP gene on CRP levels, we first constructed a dense linkage disequilibrium map for common single-nucleotide polymorphisms (SNPs) spanning the CRP locus (1 SNP every 850 bases, 26 kilobase [kb] genomic region). Thirteen CRP SNPs were genotyped in 1640 unrelated FHS participants with measured CRP levels. After adjustment for clinical covariates, 9 of 13 SNPs were associated with CRP level (P<0.05). To account for correlation among SNPs, we conducted forward stepwise selection among all 13 SNPs; a triallelic SNP (rs3091244) remained associated with CRP level (stepwise P<0.0001). The triallelic SNP (C→T→A; allele frequencies 62%, 31%, and 7%), located in the promoter sequence, explained 1.4% of total serum CRP variation; haplotypes harboring the minor T and A alleles of this SNP were associated with higher CRP level (haplotype P=0.0002 and 0.004). CONCLUSIONS: In our community-based sample, clinical variables explained 26% of the interindividual variation in CRP, whereas a common triallelic CRP SNP contributed modestly. Studies of larger samples are warranted to assess the association of genetic variation in CRP and risk of cardiovascular disease.
|
1102
|
Montpetit A, Nelis M, Laflamme P, Magi R, Ke X, Remm M, Cardon L, Hudson TJ, Metspalu A. An evaluation of the performance of tag SNPs derived from HapMap in a Caucasian population. PLoS Genet 2006; 2:e27. [PMID: 16532062 PMCID: PMC1391920 DOI: 10.1371/journal.pgen.0020027] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.2] [Received: 10/06/2005] [Accepted: 01/23/2006] [Indexed: 11/18/2022] Open
Abstract
The Haplotype Map (HapMap) project recently generated genotype data for more than 1 million single-nucleotide polymorphisms (SNPs) in four population samples. The main application of the data is in the selection of tag single-nucleotide polymorphisms (tSNPs) to use in association studies. The usefulness of this selection process needs to be verified in populations outside those used for the HapMap project. In addition, it is not known how well the data represent the general population, as only 90–120 chromosomes were used for each population and since the genotyped SNPs were selected so as to have high frequencies. In this study, we analyzed more than 1,000 individuals from Estonia. The population of this northern European country has been influenced by many different waves of migrations from Europe and Russia. We genotyped 1,536 randomly selected SNPs from two 500-kbp ENCODE regions on Chromosome 2. We observed that the tSNPs selected from the CEPH (Centre d'Etude du Polymorphisme Humain) from Utah (CEU) HapMap samples (derived from US residents with northern and western European ancestry) captured most of the variation in the Estonia sample. (Between 90% and 95% of the SNPs with a minor allele frequency of more than 5% have an r² of at least 0.8 with one of the CEU tSNPs.) Using the reverse approach, tags selected from the Estonia sample could almost equally well describe the CEU sample. Finally, we observed that the sample size, the allelic frequency, and the SNP density in the dataset used to select the tags each have important effects on the tagging performance. Overall, our study supports the use of HapMap data in other Caucasian populations, but the SNP density and the bias towards high-frequency SNPs have to be taken into account when designing association studies.
The recent completion of the Haplotype Map (HapMap) project of the human genome provides considerable information on the patterns of variation in the genome of four populations. One of the applications is a description of a set of tags that act as proxies for many other surrounding variants. This will greatly help researchers in their quest to find complex disease genes by reducing the number of genetic variants to test in association studies. To evaluate its usefulness, several aspects of the map, including its transferability to other populations, still needed to be verified experimentally. Using genomic regions where variants had been thoroughly documented in Caucasian samples from Estonia, the researchers found that the transferability of tags is extremely good. The researchers also found that variants with low frequency in the general population (i.e., less than 5%) could not be accurately captured with tags, and that the regional density of variants in the HapMap project had a major impact on the performance of the tags. This research indicates that the HapMap project will be useful, but that careful consideration of hypotheses and study design will be essential for the success of association studies.
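The tagging criterion quoted in this abstract (a SNP counts as captured when its r² with at least one tag is 0.8 or higher) can be illustrated with a small sketch. This is not the authors' pipeline: the greedy selection rule, the function names `greedy_tags` and `coverage`, and the two toy r² matrices are all assumptions made for illustration only.

```python
def greedy_tags(r2, threshold=0.8):
    """Pick tag SNPs greedily from a square matrix of pairwise r^2 values.

    At each step the untagged SNP that captures the most still-untagged
    SNPs (itself included) becomes a tag. Ties go to the lowest index.
    """
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        best = max(
            sorted(untagged),
            key=lambda i: sum(1 for j in untagged if i == j or r2[i][j] >= threshold),
        )
        tags.append(best)
        untagged -= {j for j in untagged if j == best or r2[best][j] >= threshold}
    return tags


def coverage(r2_target, tags, threshold=0.8):
    """Fraction of SNPs in a second sample captured by the given tags."""
    n = len(r2_target)
    captured = sum(
        1
        for j in range(n)
        if any(j == t or r2_target[t][j] >= threshold for t in tags)
    )
    return captured / n


# Hypothetical pairwise r^2 values for four SNPs in a reference sample ...
ref_r2 = [
    [1.0, 0.9, 0.85, 0.1],
    [0.9, 1.0, 0.7, 0.1],
    [0.85, 0.7, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]
# ... and for the same SNPs in a second sample with somewhat weaker LD.
other_r2 = [
    [1.0, 0.82, 0.6, 0.1],
    [0.82, 1.0, 0.7, 0.1],
    [0.6, 0.7, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]

tags = greedy_tags(ref_r2)
print(tags, coverage(other_r2, tags))  # [0, 3] 0.75
```

In this toy case the tags chosen in the reference sample capture three of the four SNPs in the second sample, which is the kind of transferability figure the abstract reports as a percentage.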
Affiliation(s)
- Alexandre Montpetit
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- Mari Nelis
- Institute of Molecular and Cell Biology of the University of Tartu, Tartu, Estonia
- Estonian Biocentre, Tartu, Estonia
- Philippe Laflamme
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- Reedik Magi
- Institute of Molecular and Cell Biology of the University of Tartu, Tartu, Estonia
- Xiayi Ke
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Maido Remm
- Institute of Molecular and Cell Biology of the University of Tartu, Tartu, Estonia
- Lon Cardon
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Thomas J Hudson
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- Andres Metspalu
- Institute of Molecular and Cell Biology of the University of Tartu, Tartu, Estonia
- Estonian Biocentre, Tartu, Estonia
- The Estonian Genome Project Foundation, Tartu, Estonia
- * To whom correspondence should be addressed. E-mail:
|
1103
|
Kathiresan S, Gabriel SB, Yang Q, Lochner AL, Larson MG, Levy D, Tofler GH, Hirschhorn JN, O'Donnell CJ. Comprehensive survey of common genetic variation at the plasminogen activator inhibitor-1 locus and relations to circulating plasminogen activator inhibitor-1 levels. Circulation 2006; 112:1728-35. [PMID: 16172282 DOI: 10.1161/circulationaha.105.547836] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Indexed: 11/16/2022]
Abstract
BACKGROUND: Using a linkage disequilibrium (LD)-based approach, we sought to comprehensively define common genetic variation at the plasminogen activator inhibitor-1 (PAI-1) locus and relate common single nucleotide polymorphisms (SNPs) and haplotypes to plasma PAI-1 levels. METHODS AND RESULTS: In reference pedigrees, we defined LD structure across a 50-kb genomic segment spanning the PAI-1 locus via a dense SNP map (1 SNP every 2 kb). Eighteen sequence variants that capture underlying common genetic variation were genotyped in 1328 unrelated Framingham Heart Study participants who had plasma PAI-1 antigen levels measured. Regression analyses were used to examine associations of individual SNPs and of inferred haplotypes with multivariable-adjusted PAI-1 levels. Two genetic variants, SNP rs2227631 and the 4G/5G polymorphism, were strongly associated (P<0.0001) with PAI-1 levels. SNP rs2227631 is in tight LD (D'=0.97, r²=0.78) with the 4G/5G polymorphism, which makes it difficult to distinguish which of these 2 polymorphisms is responsible for the association with PAI-1 levels. In stepwise analysis considering all polymorphisms tested, 3 SNPs, rs2227631 (or the correlated 4G/5G polymorphism), rs6465787, and rs2227674, each explained 2.5%, 1%, and 1%, respectively, of the residual variance in multivariable-adjusted PAI-1 levels (stepwise P<0.0001, P=0.04, and P=0.03, respectively). A single common haplotype, at 50% frequency among Framingham Heart Study participants, was strongly associated with higher PAI-1 levels (haplotype-specific P=0.00001). The susceptibility haplotype harbors the minor alleles of SNP rs2227631 and the 4G/5G polymorphism. CONCLUSIONS: Three sequence variants at the PAI-1 locus, in sum, explain approximately 5% of the residual variance in multivariable-adjusted PAI-1 levels. For quantitative cardiovascular traits such as circulating biomarkers, defining LD structure in a candidate gene followed by association analyses with both SNPs and haplotypes is an effective approach to localize common susceptibility alleles.
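The two LD measures reported in this abstract, D' and r², are both derived from the same disequilibrium coefficient D but are normalized differently, which is why they can diverge. The sketch below uses the standard textbook definitions for two biallelic loci; the function name `ld_stats` and the example frequencies are illustrative assumptions, not values from the study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2
    p_a  : frequency of allele A (assumed strictly between 0 and 1)
    p_b  : frequency of allele B (assumed strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it could take given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two allele indicators.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: allele B only ever occurs on A-carrying
# haplotypes, but A is more common than B.
d, d_prime, r2 = ld_stats(p_ab=0.4, p_a=0.5, p_b=0.4)
print(round(d_prime, 3), round(r2, 3))
```

With these made-up inputs D' reaches its maximum while r² stays well below 1, because r² also penalizes the mismatch in allele frequencies. The same effect, in milder form, is consistent with a pair of variants showing a D' close to 1 alongside a lower r².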
Affiliation(s)
- Sekar Kathiresan
- National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, Massachusetts 01702-5827, USA
|
1104
|
Hu Z, Shao M, Yuan J, Xu L, Wang F, Wang Y, Yuan W, Qian J, Ma H, Wang Y, Liu H, Chen W, Yang L, Jin G, Huo X, Chen F, Jin L, Wei Q, Huang W, Lu D, Wu T, Shen H. Polymorphisms in DNA damage binding protein 2 (DDB2) and susceptibility of primary lung cancer in the Chinese: a case-control study. Carcinogenesis 2006; 27:1475-80. [PMID: 16522664 DOI: 10.1093/carcin/bgi350] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
Abstract
DNA damage binding protein 2 (DDB2) is one of the major DNA repair proteins involved in the nucleotide excision repair (NER) pathway. Mutations in the DDB2 gene can cause the repair-deficiency syndrome xeroderma pigmentosum group E. Because tobacco carcinogens can cause DNA damage that is repaired by NER and suboptimal NER capacity is reported to be associated with lung cancer risk, we hypothesized that common variants in the DDB2 gene are associated with lung cancer risk. To test this hypothesis, we conducted a case-control study of 1010 patients with incident lung cancer and 1011 cancer-free controls and genotyped two DDB2 single nucleotide polymorphisms (SNPs) (rs830083 and rs3781620) that are in linkage disequilibrium with other untyped SNPs. We found that, compared with subjects carrying the rs830083CC genotype, those carrying the heterozygous rs830083CG genotype had a significant 1.31-fold increased risk of lung cancer [95% confidence interval (CI) 1.08-1.60] and those carrying the homozygous rs830083GG genotype had a non-significant 1.22-fold elevated risk (95% CI 0.89-1.67). In addition, effects of the combined rs830083CG/GG variant genotypes were more evident in young subjects, heavy smokers and subjects with a positive family history of cancer. These findings indicate, for the first time, that the DDB2 rs830083 polymorphism may contribute to the etiology of lung cancer. Further functional studies on this SNP and/or related variants are warranted to elucidate the underlying molecular mechanisms of the association.
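For readers unfamiliar with how a fold-risk and its 95% CI are obtained in a case-control design, the following sketch computes a crude odds ratio with a Wald interval from a 2x2 table. It is an unadjusted calculation, whereas estimates in studies of this kind are typically adjusted (for example by logistic regression), and the genotype counts below are hypothetical since the abstract does not report them. The function name `odds_ratio_ci` is likewise an assumption.

```python
import math


def odds_ratio_ci(exp_cases, unexp_cases, exp_controls, unexp_controls, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    "Exposed" here means carrying the variant genotype; "unexposed"
    means carrying the reference genotype. All counts must be > 0.
    """
    odds_ratio = (exp_cases * unexp_controls) / (unexp_cases * exp_controls)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(
        1 / exp_cases + 1 / unexp_cases + 1 / exp_controls + 1 / unexp_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts (NOT from the study): heterozygotes vs. reference
# homozygotes among cases and controls.
or_, lo, hi = odds_ratio_ci(
    exp_cases=450, unexp_cases=460, exp_controls=400, unexp_controls=520
)
print(f"OR={or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, as in this made-up example, corresponds to what the abstract calls a significant increase, while an interval spanning 1 (such as 0.89-1.67 for the GG genotype) does not.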
Affiliation(s)
- Zhibin Hu
- Department of Epidemiology and Biostatistics, Cancer Research Center of Nanjing Medical University, Nanjing 210029, China
|
1105
|
Huang W, Li C, Labu, Zhou Y, Li P, Hu B, Pubuzhuoma, Gesangzhuogab, Fang J, Wang Y. High resolution linkage disequilibrium and haplotype maps for the genes in the centromeric region of chromosome 15 in Tibetans and comparisons with Han population. CHINESE SCIENCE BULLETIN-CHINESE 2006. [DOI: 10.1007/s11434-006-0542-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
1106
|
Marchini J, Cutler D, Patterson N, Stephens M, Eskin E, Halperin E, Lin S, Qin ZS, Munro HM, Abecasis GR, Donnelly P. A comparison of phasing algorithms for trios and unrelated individuals. Am J Hum Genet 2006; 78:437-50. [PMID: 16465620 PMCID: PMC1380287 DOI: 10.1086/500808] [Citation(s) in RCA: 218] [Impact Index Per Article: 12.1] [Received: 09/02/2005] [Accepted: 12/29/2005] [Indexed: 11/03/2022] Open
Abstract
Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they have focused principally on inference from unrelated individuals, and comparisons between methods have been rather limited. Here, we describe the extension of five leading algorithms for phase inference for handling father-mother-child trios. We performed a comprehensive assessment of the methods applied to both trios and to unrelated individuals, with a focus on genomic-scale problems, using both simulated data and data from the HapMap project. The most accurate algorithm was PHASE (v2.1). For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this work had comparable but slightly worse error rates. The error rates for trios are similar to the levels of genotyping error and missing data expected. We thus conclude that all the methods considered will provide highly accurate estimates of haplotypes when applied to trio data sets. Running times differ substantially between methods. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million-SNP HapMap data set. Finally, we evaluated methods of estimating the value of r² between a pair of SNPs and concluded that all methods estimated r² well when the estimated value was ≥0.8.
Affiliation(s)
- Jonathan Marchini
- Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom.
|
1107
|
Newman TL, Rieder MJ, Morrison VA, Sharp AJ, Smith JD, Sprague LJ, Kaul R, Carlson CS, Olson MV, Nickerson DA, Eichler EE. High-throughput genotyping of intermediate-size structural variation. Hum Mol Genet 2006; 15:1159-67. [PMID: 16497726 DOI: 10.1093/hmg/ddl031] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 01/25/2023] Open
Abstract
The contribution of large-scale and intermediate-size structural variation (ISV) to human genetic disease and disease susceptibility is only beginning to be understood. The development of high-throughput genotyping technologies is one of the most critical aspects for future studies of linkage disequilibrium (LD) and disease association. Using a simple PCR-based method designed to assay the junctions of the breakpoints, we genotyped seven simple insertion and deletion polymorphisms ranging in size from 6.3 to 24.7 kb among 90 CEPH individuals. We then extended this analysis to a larger collection of samples (n=460) by application of an oligonucleotide extension-ligation genotyping assay. The analysis showed a high level of concordance (approximately 99%) when compared with PCR/sequence-validated genotypes. Using the available HapMap data, we observed significant LD (r²=0.74-0.95) between each ISV and flanking single nucleotide polymorphisms, but this observation is likely to hold only for similar simple insertion/deletion events. The approach we describe may be used to characterize a large number of individuals in a cost-effective manner once the sequence organization of ISVs is known.
Affiliation(s)
- Tera L Newman
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
|
1108
|
Morris GAJ, Lowe CE, Cooper JD, Payne F, Vella A, Godfrey L, Hulme JS, Walker NM, Healy BC, Lam AC, Lyons PA, Todd JA. Polymorphism discovery and association analyses of the interferon genes in type 1 diabetes. BMC Genet 2006; 7:12. [PMID: 16504056 PMCID: PMC1402321 DOI: 10.1186/1471-2156-7-12] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 11/25/2005] [Accepted: 02/22/2006] [Indexed: 11/28/2022] Open
Abstract
Background: The aetiology of the autoimmune disease type 1 diabetes (T1D) involves many genetic and environmental factors. Evidence suggests that innate immune responses, including the action of interferons, may also play a role in the initiation and/or pathogenic process of autoimmunity. In the present report, we have adopted a linkage disequilibrium (LD) mapping approach to test for an association between T1D and three regions encompassing 13 interferon alpha (IFNA) genes, interferon omega-1 (IFNW1), interferon beta-1 (IFNB1), interferon gamma (IFNG) and the interferon consensus-sequence binding protein 1 (ICSBP1). Results: We identified 238 variants, most of them single nucleotide polymorphisms (SNPs), by sequencing IFNA, IFNB1, IFNW1 and ICSBP1, 98 of which were novel when compared to dbSNP build 124. We used polymorphisms identified in the SeattleSNP database for IFNG. A set of tag SNPs was selected for each of the interferon and interferon-related genes to test for an association between T1D and this complex gene family. A total of 45 tag SNPs were selected and genotyped in a collection of 472 multiplex families. Conclusion: We have developed informative sets of SNPs for the interferon and interferon-related genes. No statistical evidence of a major association between T1D and any of the interferon and interferon-related genes tested was found.
Affiliation(s)
- Gerard AJ Morris
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 2XY, UK
- Christopher E Lowe
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 2XY, UK
- Jason D Cooper
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 2XY, UK
- Felicity Payne
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 2XY, UK
- Adrian Vella
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 2XY, UK
- Lisa Godfrey
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 2XY, UK
- John S Hulme
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 2XY, UK
- Neil M Walker
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 2XY, UK
- Barry C Healy
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 2XY, UK
- Alex C Lam
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 2XY, UK
- Paul A Lyons
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 2XY, UK
- John A Todd
- Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge, CB2 2XY, UK
|
1109
|
He JQ, Burkett K, Connett JE, Anthonisen NR, Paré PD, Sandford AJ. Interferon gamma polymorphisms and their interaction with smoking are associated with lung function. Hum Genet 2006; 119:365-75. [PMID: 16474934 DOI: 10.1007/s00439-006-0143-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 11/30/2005] [Accepted: 01/11/2006] [Indexed: 10/25/2022]
Abstract
Interactions between genetic and environmental determinants are likely to be important in the pathogenesis of chronic obstructive pulmonary disease. We hypothesized that interferon gamma (IFNG) single nucleotide polymorphisms (SNPs) and their interaction with smoking are associated with the rate of decline or level of lung function in smokers. We studied four SNPs in IFNG in 585 non-Hispanic whites (NHW) who had the fastest (n=280) or the slowest (n=305) decline of FEV₁ % predicted selected from among continuous smokers followed for 5 years in the NHLBI Lung Health Study (LHS). We also studied 1061 NHW with the lowest (n=530) or the highest (n=531) baseline lung function at the beginning of the LHS. Two SNPs were associated with baseline levels of lung function and the p values were 0.008 for +2197T/C in a dominant model and 0.002 for +5171A/G in a recessive model. However, after adjusting for confounding factors, only +5171A/G was still significant (p=0.001 for the recessive model). In addition, there was a significant genotype and smoking interaction with p=0.006 for the +5171A/G (GG vs. GA + AA) for the baseline lung function. When comparing individuals with GG versus individuals with AG + AA for low lung function, the adjusted odds ratios decreased significantly as pack-years increased. No association was found in the rate of decline study. There was an association between IFNG genotype and baseline of lung function and this association was modified by cigarette smoking.
Affiliation(s)
- Jian-Qing He
- The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research, St. Paul's Hospital, University of British Columbia, Vancouver, BC, Canada
|
1110
|
Carlson CS, Heagerty PJ, Hatsukami TS, Richter RJ, Ranchalis J, Lewis J, Bacus TJ, McKinstry LA, Schellenberg GD, Rieder M, Nickerson D, Furlong CE, Chait A, Jarvik GP. TagSNP analyses of the PON gene cluster: effects on PON1 activity, LDL oxidative susceptibility, and vascular disease. J Lipid Res 2006; 47:1014-24. [PMID: 16474172 DOI: 10.1194/jlr.m500517-jlr200] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/20/2022] Open
Abstract
Paraoxonase 1 (PON1) activity is consistently predictive of vascular disease, although the genotype at four functional PON1 polymorphisms is not. To address this inconsistency, we investigated the role of all common PON1 genetic variability, as measured by tagging single-nucleotide polymorphisms (tagSNPs), in predicting PON1 activity for phenylacetate hydrolysis, LDL susceptibility to oxidation ex vivo, plasma homocysteine (Hcy) levels, and carotid artery disease (CAAD) status. The biological goal was to establish whether additional common genetic variation beyond consideration of the four known functional SNPs improves prediction of these phenotypes. PON2 and PON3 tagSNPs were secondarily evaluated. Expanded analysis of an additional 26 tagSNPs found evidence of previously undescribed common PON1 polymorphisms that affect PON1 activity independently of the four known functional SNPs. PON1 activity was not significantly correlated with LDL oxidative susceptibility, but genotypes at the PON1(-108) promoter polymorphism and several other PON1 SNPs were. Neither PON1 activity nor PON1 genotype was significantly correlated with plasma Hcy levels. This study revealed previously undetected common functional PON1 polymorphisms that explain 4% of PON1 activity and a high rate of recombination in PON1, but the sum of the common PON1 locus variation does not explain the relationship between PON1 activity and CAAD.
Affiliation(s)
- Christopher S Carlson
- The Fred Hutchinson Cancer Research Center, Division of Public Health Sciences, The University of Washington, Seattle, USA
|
1111
|
González-Neira A, Ke X, Lao O, Calafell F, Navarro A, Comas D, Cann H, Bumpstead S, Ghori J, Hunt S, Deloukas P, Dunham I, Cardon LR, Bertranpetit J. The portability of tagSNPs across populations: a worldwide survey. Genome Res 2006; 16:323-30. [PMID: 16467560 PMCID: PMC1415211 DOI: 10.1101/gr.4138406] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.1] [Indexed: 02/06/2023]
Abstract
In the search for common genetic variants that contribute to prevalent human diseases, patterns of linkage disequilibrium (LD) among linked markers should be considered when selecting SNPs. Genotyping efficiency can be increased by choosing tagging SNPs (tagSNPs) in LD with other SNPs. However, it remains to be seen whether tagSNPs defined in one population efficiently capture LD in other populations; that is, how portable tagSNPs are. Indeed, tagSNP portability is a challenge for the applicability of HapMap results. We analyzed 144 SNPs in a 1-Mb region of chromosome 22 in 1055 individuals from 38 worldwide populations, classified into seven continental groups. We measured tagSNP portability by choosing three reference populations (to approximate the three HapMap populations), defining tagSNPs, and applying them to other populations independently of the availability of information on the tagSNPs in the compared population. We found that tagSNPs are highly informative in other populations within each continental group. Moreover, tagSNPs defined in Europeans are often efficient for Middle Eastern and Central/South Asian populations. TagSNPs defined in the three reference populations are also efficient for more distant and differentiated populations (Oceania, Americas), in which the impact of their special demographic history on the genetic structure does not interfere with successfully detecting the most common haplotype variation. This high degree of portability lends promise to the search for disease association in different populations, once tagSNPs are defined in a few reference populations like those analyzed in the HapMap initiative.
Affiliation(s)
- Anna González-Neira
- Unitat de Biologia Evolutiva, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain
|
1112
|
Huang W, He Y, Wang H, Wang Y, Liu Y, Wang Y, Chu X, Wang Y, Xu L, Shen Y, Xiong X, Li H, Wen B, Qian J, Yuan W, Zhang C, Wang Y, Jiang H, Zhao G, Chen Z, Jin L. Linkage disequilibrium sharing and haplotype-tagged SNP portability between populations. Proc Natl Acad Sci U S A 2006; 103:1418-21. [PMID: 16432195 PMCID: PMC1360575 DOI: 10.1073/pnas.0510360103] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 02/01/2023] Open
Abstract
The discovery of the block-like structure of linkage disequilibrium (LD) in human populations holds the promise of delineating the etiology of common diseases. However, understanding the magnitude, mechanism, and utility of between-population LD sharing is critical for future genome-wide association studies. In this study, substantial LD sharing between six non-African populations was observed, although much less between African-American and non-African, based on 20,000 SNPs of chromosome 21. We also demonstrated the respective roles of recombination and demographic events in shaping LD sharing. Furthermore, we showed that the haplotype-tagged SNPs chosen from one population are portable to the others in East Asia. Therefore, we concluded that the magnitude of LD sharing between human populations justifies the use of representative populations for selecting haplotype-tagged SNPs in genome-wide association studies of complex diseases.
Affiliation(s)
- Wei Huang
- Chinese National Human Genome Center, Shanghai 201203, China.
|
1113
|
Hao K, Liu S, Niu T. A sparse marker extension tree algorithm for selecting the best set of haplotype tagging single nucleotide polymorphisms. Genet Epidemiol 2006; 29:336-52. [PMID: 16294299 PMCID: PMC2712933 DOI: 10.1002/gepi.20095] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 01/18/2023]
Abstract
Single nucleotide polymorphisms (SNPs) play a central role in the identification of susceptibility genes for common diseases. Recent empirical studies of the human genome have revealed block-like structures, and each block contains a set of haplotype tagging SNPs (htSNPs) that capture a large fraction of the haplotype diversity. Herein, we present an innovative sparse marker extension tree (SMET) algorithm to select optimal htSNP set(s). SMET reduces the search space considerably (compared to a full enumeration strategy), and therefore improves computing efficiency. We tested this algorithm on several datasets at three different genomic scales: (1) gene-wide (NOS3, CRP, IL6, PPARA, and TNF), (2) region-wide (a Whitehead Institute inflammatory bowel disease dataset and a UK Graves' disease dataset), and (3) chromosome-wide (chromosome 22) levels. With its adjustable haplotype diversity coverage (phi), SMET offers geneticists greater flexibility in SNP tagging than lossless methods. In simulation studies, we found that (1) an initial sample size of 50 individuals (100 chromosomes) or more is needed for htSNP selection; (2) the SNP tagging strategy is considerably more efficient when the underlying block structure is taken into account; and (3) htSNP sets at 80-90% phi are more cost-effective than the lossless sets in terms of relative power, relative risk ratio estimation, and genotyping efforts. Our study suggests that the novel SMET algorithm is a valuable tool for association tests.
Affiliation(s)
- Ke Hao
- Department of Biostatistics, Harvard School of Public Health, Boston, MA
- Simin Liu
- Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Department of Epidemiology, Harvard School of Public Health, Boston, MA
- Tianhua Niu
- Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Department of Epidemiology, Harvard School of Public Health, Boston, MA
|
1114
|
Mackelprang R, Livingston RJ, Eberle MA, Carlson CS, Yi Q, Akey JM, Nickerson DA. Sequence diversity, natural selection and linkage disequilibrium in the human T cell receptor alpha/delta locus. Hum Genet 2006; 119:255-66. [PMID: 16425038 DOI: 10.1007/s00439-005-0111-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2005] [Accepted: 11/16/2005] [Indexed: 12/22/2022]
Abstract
T cell receptors (TR), through their interaction with the major histocompatibility complex, play a central role in immune responsiveness and potentially immune-related disorders. We resequenced all 57 variable (V) genes in the human T cell receptor alpha and delta (TRA/TRD) locus in 40 individuals of Northern European, Mexican, African-American and Chinese descent. Two hundred and eighty-four single nucleotide polymorphisms (SNPs) were identified. The distribution of SNPs between V genes was heterogeneous, with an average of five SNPs per gene and a range of zero to 15. We describe the patterns of linkage disequilibrium for these newly discovered SNPs and compare these patterns with other emerging large-scale datasets (e.g. Perlegen and HapMap projects) to place our findings into a framework for future analysis of genotype-phenotype associations across this locus. Furthermore, we explore signatures of natural selection across V genes. We find evidence of strong directional selection at this locus as evidenced by unusually high values of Fst.
Affiliation(s)
- Rachel Mackelprang
- Department of Genome Sciences, University of Washington, 357730, Seattle, WA, 98195-7730, USA.
|
1115
|
Tang NLS, Pharoah PDP, Ma SL, Easton DF. Evaluation of an algorithm of tagging SNPs selection by linkage disequilibrium. Clin Biochem 2006; 39:240-3. [PMID: 16427037 DOI: 10.1016/j.clinbiochem.2005.11.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2005] [Revised: 10/30/2005] [Accepted: 11/25/2005] [Indexed: 11/25/2022]
Abstract
BACKGROUND Single nucleotide polymorphisms (SNPs) are the most abundant kind of genetic polymorphism in the human genome. They are important in both genetic research and genetic testing in a clinical setting, such as in the area of pharmacogenetics. In order to improve efficiency, tagging SNPs (tagSNPs) are selected in genes of interest to represent other correlated SNPs in linkage disequilibrium (LD) with the tagSNPs. Various algorithms have been proposed to identify a subset of single nucleotide polymorphisms as tagSNPs. Most algorithms for tagSNP selection are haplotype-based, in which the spatial relationship between SNPs is considered. A more efficient cluster-based algorithm has recently been proposed that clusters SNPs solely by an LD parameter, such as r². Here, we evaluated the sample distribution of r² and its effect on cluster-based tagSNP selection. DESIGN AND METHODS The genotype data of 198 individuals within a 500-kb region on 5q31 were used to evaluate the sample distribution of r² and its effect on cluster-based tagSNP selection. RESULTS It was found that the degree of variation of LD depends on the LD structure of genes. CONCLUSION As a cluster-based tagSNP selection algorithm does not take into account the spatial position of SNPs, a more stringent r² threshold is required to achieve more reliable tagSNP selection.
Affiliation(s)
- Nelson L S Tang
- Department of Chemical Pathology, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong.
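The cluster-based selection evaluated in the abstract above can be illustrated with a minimal sketch. This is not the algorithm assessed by Tang et al.; it assumes unphased genotypes coded as 0/1/2 allele counts, uses the squared Pearson correlation of those counts as a stand-in for r², and picks tags greedily. The function names `r_squared` and `greedy_tag_snps`, the SNP identifiers, and the toy data are all hypothetical.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as not linked
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def greedy_tag_snps(genotypes, threshold=0.8):
    """Pick tag SNPs so that every SNP has r^2 >= threshold with at least one tag.

    genotypes: dict mapping a SNP id to its list of allele counts per individual.
    SNP positions are ignored, as in a purely cluster-based approach.
    """
    snps = list(genotypes)
    linked = {s: {s} for s in snps}  # each SNP trivially tags itself
    for a, b in combinations(snps, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            linked[a].add(b)
            linked[b].add(a)

    untagged = set(snps)
    tags = []
    while untagged:
        # Greedy step: take the SNP that covers the most still-untagged SNPs.
        best = max(snps, key=lambda s: len(linked[s] & untagged))
        tags.append(best)
        untagged -= linked[best]
    return tags


# Hypothetical toy data: six individuals, four SNPs.
toy = {
    "snpA": [0, 1, 2, 0, 1, 2],
    "snpB": [0, 1, 2, 0, 1, 2],  # identical to snpA
    "snpC": [2, 1, 0, 2, 1, 0],  # mirror image of snpA, so r^2 is still 1
    "snpD": [0, 0, 1, 2, 2, 1],  # uncorrelated with snpA
}
print(greedy_tag_snps(toy, threshold=0.8))
```

Raising `threshold` generally yields more tags, each standing in for fewer SNPs, which is the trade-off the abstract's conclusion points to when it recommends a more stringent r² cut-off.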
|
1116
|
Kozlowski P, Miller DT, Zee RYL, Danik JS, Chasman DI, Lazarus R, Cook NR, Ridker PM, Kwiatkowski DJ. Lack of Association Between Genetic Variation in 9 Innate Immunity Genes and Baseline CRP Levels. Ann Hum Genet 2006. [DOI: 10.1111/j.1529-8817.2005.00256.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
1117
|
Lim J, Kim YJ, Yoon Y, Kim SO, Kang H, Park J, Han AR, Han B, Oh B, Kimm K, Yoon B, Song K. Comparative study of the linkage disequilibrium of an ENCODE region, chromosome 7p15, in Korean, Japanese, and Han Chinese samples. Genomics 2006; 87:392-8. [PMID: 16376517 DOI: 10.1016/j.ygeno.2005.11.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2005] [Revised: 10/19/2005] [Accepted: 11/12/2005] [Indexed: 10/25/2022]
Abstract
The extent and pattern of linkage disequilibrium (LD) in the human genome provide important information for disease gene mapping. Previous studies have shown that LDs vary depending on chromosomal regions and populations. As the Asian samples of the International HapMap Project consisted of Japanese and Chinese populations, it was of interest whether we could use the HapMap data as a reference to carry out association studies of common complex diseases in a closely related population, such as Koreans. We have compared the LD and recombination patterns defined by single-nucleotide polymorphisms (SNPs) in ENCODE region ENm010, chromosome 7p15.2, in Korean, Japanese, and Chinese samples and further tested the robustness of tagSNPs among the Asian samples. We genotyped 792 SNPs in 500 kb (chromosome 7: 26699793-27199792, NCBI build 34) from 90 unrelated Koreans by fluorescence polarization detection and compared the data with Asian data from the HapMap project. Despite some differences in the position of high LD region boundaries, the overall patterns of LD were remarkably similar across the three samples, reflecting strong genetic affinities among them. Furthermore, the haplotype tag SNP transferability across the three samples was greater than 90%. Our results support the initial suggestion that the populations genotyped in the HapMap project might serve as reference populations for the selection of tagSNPs in association studies.
Affiliation(s)
- Jiyoung Lim
- Department of Biochemistry and Molecular Biology, University of Ulsan College of Medicine, 388-1 Poongnap-Dong, Songpa-Gu, Seoul 138-736, Korea
|
1118
|
Abstract
MOTIVATION Recent studies have shown that a small subset of Single Nucleotide Polymorphisms (SNPs) (called tag SNPs) is sufficient to capture the haplotype patterns in a high linkage disequilibrium region. To find the minimum set of tag SNPs, exact algorithms for finding the optimal solution could take exponential time. On the other hand, approximation algorithms are more efficient but may fail to find the optimal solution. RESULTS We propose a hybrid method that combines the ideas of the branch-and-bound method and the greedy algorithm. This method explores larger solution space to obtain a better solution than a traditional greedy algorithm. It also allows the user to adjust the efficiency of the program and quality of solutions. This algorithm has been implemented and tested on a variety of simulated and biological data. The experimental results indicate that our program can find better solutions than previous methods. This approach is quite general since it can be used to adapt other greedy algorithms to solve their corresponding problems. AVAILABILITY The program is available upon request.
Affiliation(s)
- Chia-Jung Chang
- Department of Computer Science and Information Engineering, National Taiwan University Taipei, Taiwan
|
1119
|
Abstract
Evaluation of the association of haplotypes with either quantitative traits or disease status is common practice, and under some situations provides greater power than the evaluation of individual marker loci. The focus on haplotype analyses will increase as more single nucleotide polymorphisms (SNPs) are discovered, either because of interest in candidate gene regions, or because of interest in genome-wide association studies. However, there is little guidance on the determination of the sample size needed to achieve the desired power for a study, particularly when linkage phase of the haplotypes is unknown, and when a subset of tag-SNP markers is measured. There is a growing wealth of information on the distribution of haplotypes in different populations, and it is not unusual for investigators to measure genetic markers in pilot studies in order to gain knowledge of the distribution of haplotypes in the target population. Starting with this basic information on the distribution of haplotypes, we derive analytic methods to determine sample size or power to test the association of haplotypes with either a quantitative trait or disease status (e.g., a case-control study design), assuming that all subjects are unrelated. Our derivations cover both phase-known and phase-unknown haplotypes, allowing evaluation of the loss of efficiency due to unknown phase. We also extend our methods to when a subset of tag-SNPs is chosen, allowing investigators to explore the impact of tag-SNPs on power. Simulations illustrate that the theoretical power predictions are quite accurate over a broad range of conditions. Our theoretical formulae should provide useful guidance when planning haplotype association studies.
Affiliation(s)
- Daniel J Schaid
- Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN 55905, USA.
|
1120
|
Gunderson KL, Steemers FJ, Ren H, Ng P, Zhou L, Tsan C, Chang W, Bullis D, Musmacker J, King C, Lebruska LL, Barker D, Oliphant A, Kuhn KM, Shen R. Whole-genome genotyping. Methods Enzymol 2006; 410:359-76. [PMID: 16938560 DOI: 10.1016/s0076-6879(06)10017-8] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
We have developed an array-based whole-genome genotyping (WGG) assay (Infinium) using our BeadChip platform that effectively enables unlimited multiplexing and unconstrained single nucleotide polymorphism (SNP) selection. A single tube whole-genome amplification reaction is used to amplify the genome, and loci of interest are captured by specific hybridization of amplified gDNA to 50-mer probe arrays. After target capture, SNPs are genotyped on the array by a primer extension reaction in the presence of hapten-labeled nucleotides. The resultant signal is amplified during staining and the array is read out on a high-resolution confocal scanner. We have employed our high-density BeadChips supporting up to 288,000 bead types to create an array that can query over 100,000 SNPs using the Infinium assay. In addition, we have developed an automated BeadChip processing platform using Tecan's GenePaint slide processing system. Hybridization, washing, array-based primer extension, and staining are performed directly in Tecan's capillary gap Te-Flow chambers. This automation process increases assay robustness and throughput greatly while enabling laboratory information management system control of sample tracking.
|
1121
|
Packer BR, Yeager M, Burdett L, Welch R, Beerman M, Qi L, Sicotte H, Staats B, Acharya M, Crenshaw A, Eckert A, Puri V, Gerhard DS, Chanock SJ. SNP500Cancer: a public resource for sequence validation, assay development, and frequency analysis for genetic variation in candidate genes. Nucleic Acids Res 2006; 34:D617-21. [PMID: 16381944 PMCID: PMC1347513 DOI: 10.1093/nar/gkj151] [Citation(s) in RCA: 220] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2005] [Revised: 10/28/2005] [Accepted: 10/28/2005] [Indexed: 11/12/2022] Open
Abstract
The SNP500Cancer database provides sequence and genotype assay information for candidate SNPs useful in mapping complex diseases, such as cancer. The database is an integral component of the NCI Cancer Genome Anatomy Project (http://cgap.nci.nih.gov). SNP500Cancer reports sequence analysis of anonymized control DNA samples (n = 102 Coriell samples representing four self-described ethnic groups: African/African-American, Caucasian, Hispanic and Pacific Rim). The website is searchable by gene, chromosome, gene ontology pathway, dbSNP ID and SNP500Cancer SNP ID. As of October 2005, the database contains >13 400 SNPs, 9124 of which have been sequenced in the SNP500Cancer population. For each analysed SNP, gene location and >200 bp of surrounding annotated sequence (including nearby SNPs) are provided, with frequency information in total and per subpopulation as well as calculation of Hardy-Weinberg equilibrium for each subpopulation. The website provides the conditions for validated sequencing and genotyping assays, as well as genotype results for the 102 samples, in both viewable and downloadable formats. A subset of sequence validated SNPs with minor allele frequency >5% are entered into a high-throughput pipeline for genotyping analysis to determine concordance for the same 102 samples. In addition, the results of genotype analysis for select validated SNP assays (defined as 100% concordance between sequence analysis and genotype results) are posted for an additional 280 samples drawn from the Human Diversity Panel (HDP). SNP500Cancer provides an invaluable resource for investigators to select SNPs for analysis, design genotyping assays using validated sequence data, choose selected assays already validated on one or more genotyping platforms, and select reference standards for genotyping assays. The SNP500Cancer database is freely accessible via the web page at http://snp500cancer.nci.nih.gov.
Affiliation(s)
- Bernice R Packer
- Intramural Research Support Program, SAIC-Frederick, NCI-FCRDC, Frederick, MD, USA.
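The abstract above mentions a Hardy-Weinberg equilibrium calculation for each subpopulation. As a minimal sketch of what such a check can look like for one biallelic SNP, the following computes expected genotype counts and a one-degree-of-freedom chi-square statistic from observed counts. It is not the SNP500Cancer implementation; the function name `hwe_chi_square` and the example counts are hypothetical, and a chi-square approximation may be unreliable for small samples or rare alleles, where an exact test would usually be preferred.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for departure from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb: observed counts of the three genotypes at a biallelic SNP.
    Returns (frequency of allele A, expected genotype counts, chi-square statistic).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # allele A frequency from genotype counts
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNP) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, expected, chi2


# Hypothetical counts for one subpopulation.
freq_a, expected_counts, statistic = hwe_chi_square(25, 50, 25)
print(freq_a, expected_counts, statistic)
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to P < 0.05.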
|
1122
|
He J, Zelikovsky A. Multiple linear regression for index SNP selection on unphased genotypes. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2006; 2006:5759-5762. [PMID: 17946329 DOI: 10.1109/iembs.2006.259408] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]
Abstract
The search for the association between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. Recent successes in high throughput genotyping technologies drastically increase the length of available SNP sequences. This elevates the importance of using a small subset of informative SNPs, called index SNPs, that accurately represent the rest of the SNPs (i.e., the rest of the SNPs can be predicted from the index SNPs with high accuracy). Index SNP selection achieves the compaction of huge unphased genotype data (obtained, e.g., from Affymetrix Map Array) in order to make fine genotype analysis feasible. In this paper we propose a novel index SNP selection on unphased genotypes based on multiple linear regression (MLR) SNP prediction. We measure the quality of our index SNP selection algorithm by comparing actual SNPs with the SNPs computationally predicted from chosen index SNPs. We obtain extremely good prediction rates and compression. For example, for region ENm010 (123 SNPs), we can use 2% of SNPs for representing all SNPs with 93.5% accuracy. An experimental study on 4 ENCODE regions from HapMap shows that our method uses significantly fewer index SNPs (e.g., up to two times fewer index SNPs to reach 90% prediction accuracy) than the state-of-the-art method of Halperin et al. for genotypes.
Affiliation(s)
- Jingwu He
- Fac. Comput. Sci., Georgia State Univ., Atlanta, GA 30318, USA.
|
1123
|
Chen Q, Kamboh MI. Complete DNA Sequence Variation in the Apolipoprotein H (beta2-glycoprotein I) Gene and Identification of Informative SNPs. Ann Hum Genet 2006; 70:1-11. [PMID: 16441253 DOI: 10.1111/j.1529-8817.2005.00211.x] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Apolipoprotein H (APOH), also known as beta2-glycoprotein I, is a major antigen for the production of antiphospholipid antibodies in autoimmune diseases. Previously we have examined DNA variation in the coding region of the APOH gene and determined the molecular basis of the common protein polymorphism. Here we report the results of DNA sequence variation in the entire APOH gene encompassing a 20.3 kb region in 46 Caucasian Americans and 48 African American chromosomes. A total of 150 single nucleotide polymorphisms (SNPs) and one tri-allelic polymorphism were identified, including 8 in the coding region, 14 in the 5'-region and 2 in the 3'- region; the remainder were observed in introns. The observed number of SNPs was higher in the African American sample than in the Caucasian sample (130 vs. 84). We examined the race-specific linkage disequilibrium pattern among SNPs and identified maximally informative SNPs for future association studies. Altogether, we have identified 17 informative SNPs among Caucasians and 35 in blacks. The discovery of a full range of sequence variation and identification of race-specific informative SNPs in the APOH gene may facilitate the rapid evaluation of this variation in relation to autoimmune diseases.
Affiliation(s)
- Qi Chen
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA
|
1124
|
|
1125
|
Abstract
BACKGROUND Genome-wide association will soon be available to use as an adjunct to traditional linkage analysis. We studied alcoholism in 119 families collected by the Collaborative Study on the Genetics of Alcoholism and made available in Genetic Analysis Workshop 14, using genome-wide linkage and association analyses. METHODS Genome-wide linkage analysis was first performed using microsatellite markers, and the region with the strongest linkage evidence was further analyzed using single-nucleotide polymorphisms (SNPs). A family-based genome-wide association test was also conducted using the SNPs. RESULTS Nonparametric linkage analysis revealed weak linkage evidence on chromosome 7, and association analysis identified SNP tsc0515272 on chromosome 3 as significantly associated with alcoholism. CONCLUSION Linkage analysis may require large sample sizes and high-quality genotyping and marker maps to adequately improve power, while association analysis could hold more promise in efforts to identify variants responsible for complex traits.
Affiliation(s)
- Xiaofeng Zhu
- Department of Preventive Medicine and Epidemiology, Loyola University Medical Center, Maywood, IL 60153
- Richard Cooper
- Department of Preventive Medicine and Epidemiology, Loyola University Medical Center, Maywood, IL 60153
- Donghui Kan
- Department of Preventive Medicine and Epidemiology, Loyola University Medical Center, Maywood, IL 60153
- Guichan Cao
- Department of Preventive Medicine and Epidemiology, Loyola University Medical Center, Maywood, IL 60153
- Xiaodong Wu
- Department of Preventive Medicine and Epidemiology, Loyola University Medical Center, Maywood, IL 60153
|
1126
|
Indap AR, Marth GT, Struble CA, Tonellato P, Olivier M. Analysis of concordance of different haplotype block partitioning algorithms. BMC Bioinformatics 2005; 6:303. [PMID: 16356172 PMCID: PMC1343594 DOI: 10.1186/1471-2105-6-303] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2005] [Accepted: 12/15/2005] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND Different classes of haplotype block algorithms exist and the ideal dataset to assess their performance would be to comprehensively re-sequence a large genomic region in a large population. Such data sets are expensive to collect. Alternatively, we performed coalescent simulations to generate haplotypes with a high marker density and compared block partitioning results from diversity based, LD based, and information theoretic algorithms under different values of SNP density and allele frequency. RESULTS We simulated 1000 haplotypes using the standard coalescent for three world populations--European, African American, and East Asian--and applied three classes of block partitioning algorithms--diversity based, LD based, and information theoretic. We assessed algorithm differences in number, size, and coverage of blocks inferred under different conditions of SNP density, allele frequency, and sample size. Each algorithm inferred blocks differing in number, size, and coverage under different density and allele frequency conditions. Different partitions had few if any matching block boundaries. However they still overlapped and a high percentage of total chromosomal region was common to all methods. This percentage was generally higher with a higher density of SNPs and when rarer markers were included. CONCLUSION A gold standard definition of a haplotype block is difficult to achieve, but collecting haplotypes covered with a high density of SNPs, partitioning them with a variety of block algorithms, and identifying regions common to all methods may be the best way to identify genomic regions that harbor SNP variants that cause disease.
Affiliation(s)
- Amit R Indap
- Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, USA
- Gabor T Marth
- Department of Biology, Boston College, Chestnut Hill, USA
- Craig A Struble
- Department of Mathematics, Statistics, and Computer Science, Marquette University, Milwaukee, USA
- Peter Tonellato
- Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, USA
- Michael Olivier
- Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, USA
|
1127
|
Abstract
Currently, more than 10 million DNA sequence variations have been uncovered in the human genome. The most detailed variation discovery efforts have focused on candidate genes involved in cardiovascular disease or in susceptibilities associated with exposure to environmental agents. Here we provide an overview of natural genetic variation from the literature and in 510 human candidate genes resequenced for variation discovery. The average human gene contains 126 biallelic polymorphisms, 46 of which are common (≥5% minor allele frequency) and 5 of which are found in coding regions. Using this complete picture of genetic diversity, we explore conservation, signatures of selection, and historical recombination to mine information useful for candidate gene association studies. In general, we find that the patterns of human gene variation suggest that no one approach will be appropriate for genetic association studies across all genes. Therefore, many different approaches may be required to identify the elusive genotypes associated with common human phenotypes.
Affiliation(s)
- Dana C Crawford
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
|
1128
|
Ribas G, González-Neira A, Salas A, Milne RL, Vega A, Carracedo B, González E, Barroso E, Fernández LP, Yankilevich P, Robledo M, Carracedo A, Benítez J. Evaluating HapMap SNP data transferability in a large-scale genotyping project involving 175 cancer-associated genes. Hum Genet 2005; 118:669-79. [PMID: 16323010 DOI: 10.1007/s00439-005-0094-9] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2005] [Accepted: 10/11/2005] [Indexed: 11/26/2022]
Abstract
One of the many potential uses of the HapMap project is its application to the investigation of complex disease aetiology among a wide range of populations. This study aims to assess the transferability of HapMap SNP data to the Spanish population in the context of cancer research. We have carried out a genotyping study in Spanish subjects involving 175 candidate cancer genes using an indirect gene-based approach and compared results with those for HapMap CEU subjects. Allele frequencies were very consistent between the two samples, with a high positive correlation (R) of 0.91 (P ≪ 1×10⁻⁶). Linkage disequilibrium patterns and block structures across each gene were also very similar, with the disequilibrium coefficient (r²) highly correlated (R=0.95, P ≪ 1×10⁻⁶). We found that of the 21 genes that contained at least one block larger than 60 kb, nine (ATM, ATR, BRCA1, ERCC6, FANCC, RAD17, RAD50, RAD54B and XRCC4) belonged to the GO category "DNA repair". Haplotype frequencies per gene were also highly correlated (mean R=0.93), as was haplotype diversity (R=0.91, P ≪ 1×10⁻⁶). "Yin yang" haplotypes were observed for 43% of the genes analysed and 18% of those were identical to the ancestral haplotype (identified in chimpanzee). Finally, the portability of tagSNPs identified in the HapMap CEU data using pairwise r² thresholds of 0.8 and 0.5 was assessed by applying these to the Spanish and current HapMap data for 66 genes. In general, the HapMap tagSNPs performed very well. Our results show generally high concordance with HapMap data in allele frequencies and haplotype distributions and confirm the applicability of HapMap SNP data to the study of complex diseases among the Spanish population.
Affiliation(s)
- Gloria Ribas
- Grupo de Genética Humana Programa de Patología Molecular, Centro Nacional de Investigaciones Oncológicas (CNIO), C/Melchor Fdz Almagro 3, E-28029 Madrid, Spain.
|
1129
|
|
1130
|
Biskup S, Mueller JC, Sharma M, Lichtner P, Zimprich A, Berg D, Wüllner U, Illig T, Meitinger T, Gasser T. Common variants of LRRK2 are not associated with sporadic Parkinson's disease. Ann Neurol 2005; 58:905-8. [PMID: 16254973 DOI: 10.1002/ana.20664] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
Multiple mutations in the gene for the leucine-rich repeat kinase (LRRK2) cause autosomal dominant late-onset parkinsonism (PARK8). The Gly2019Ser mutation appears to be common in different populations. To investigate whether this novel gene influences the non-Mendelian sporadic form of Parkinson's disease, we genotyped 121 single nucleotide polymorphisms comprehensively covering the entire LRRK2 gene region in a set of 340 Parkinson's disease patients and 680 matched control subjects from Germany. No association could be demonstrated. We have therefore no evidence for the existence of a common variant in LRRK2 that has a strong influence on Parkinson's disease risk.
Affiliation(s)
- Saskia Biskup
- Institute of Human Genetics, GSF National Research Center for Environment and Health, Neuherberg, Germany
|
1131
|
Reiner AP, Carlson CS, Rieder MJ, Schwartz SM, Siscovick DS. Common genomic sequence variation of the prothrombin gene and risk of non-fatal myocardial infarction in white women. J Thromb Haemost 2005; 3:2809-11. [PMID: 16359521 DOI: 10.1111/j.1538-7836.2005.01641.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
1132
|
Nothnagel M, Rohde K. The effect of single-nucleotide polymorphism marker selection on patterns of haplotype blocks and haplotype frequency estimates. Am J Hum Genet 2005; 77:988-98. [PMID: 16380910 PMCID: PMC1285181 DOI: 10.1086/498175] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2004] [Accepted: 09/16/2004] [Indexed: 11/03/2022] Open
Abstract
The definition of haplotype blocks of single-nucleotide polymorphisms (SNPs) has been proposed so that the haplotypes can be used as markers in association studies and to efficiently describe human genetic variation. The International Haplotype Map (HapMap) project to construct a comprehensive catalog of haplotypic variation in humans is underway. However, a number of factors have already been shown to influence the definition of blocks, including the population studied and the sample SNP density. Here, we examine the effect that marker selection has on the definition of blocks and the pattern of haplotypes by using comparable but complementary SNP sets and a number of block definition methods in various genomic regions and populations that were provided by the Encyclopedia of DNA Elements (ENCODE) project. We find that the chosen SNP set has a profound effect on the block-covered sequence and block borders, even at high marker densities. Our results question the very concept of discrete haplotype blocks and the possibility of generalizing block findings from the HapMap project. We comparatively apply the block-free tagging-SNP approach and discuss both the haplotype approach and the tagging-SNP approach as means to efficiently catalog genetic variation.
Affiliation(s)
- Michael Nothnagel
- Department of Bioinformatics, Max Delbrück Center for Molecular Medicine, Berlin, Germany.
|
1133
|
Hunter DJ, Riboli E, Haiman CA, Albanes D, Altshuler D, Chanock SJ, Haynes RB, Henderson BE, Kaaks R, Stram DO, Thomas G, Thun MJ, Blanché H, Buring JE, Burtt NP, Calle EE, Cann H, Canzian F, Chen YC, Colditz GA, Cox DG, Dunning AM, Feigelson HS, Freedman ML, Gaziano JM, Giovannucci E, Hankinson SE, Hirschhorn JN, Hoover RN, Key T, Kolonel LN, Kraft P, Le Marchand L, Liu S, Ma J, Melnick S, Pharaoh P, Pike MC, Rodriguez C, Setiawan VW, Stampfer MJ, Trapido E, Travis R, Virtamo J, Wacholder S, Willett WC. A candidate gene approach to searching for low-penetrance breast and prostate cancer genes. Nat Rev Cancer 2005; 5:977-85. [PMID: 16341085 DOI: 10.1038/nrc1754] [Citation(s) in RCA: 134] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
Abstract
Most cases of breast and prostate cancer are not associated with mutations in known high-penetrance genes, indicating the involvement of multiple low-penetrance risk alleles. Studies that have attempted to identify these genes have met with limited success. The National Cancer Institute Breast and Prostate Cancer Cohort Consortium--a pooled analysis of multiple large cohort studies with a total of more than 5,000 cases of breast cancer and 8,000 cases of prostate cancer--was therefore initiated. The goal of this consortium is to characterize variations in approximately 50 genes that mediate two pathways that are associated with these cancers--the steroid-hormone metabolism pathway and the insulin-like growth factor signalling pathway--and to associate these variations with cancer risk.
|
1134
|
Wagenleiter SEN, Jagiello P, Akkad DA, Arning L, Griga T, Klein W, Epplen JT. On the genetic involvement of apoptosis-related genes in Crohn's disease as revealed by an extended association screen using 245 markers: no evidence for new predisposing factors. J Negat Results Biomed 2005; 4:8. [PMID: 16318629 PMCID: PMC1315346 DOI: 10.1186/1477-5751-4-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/07/2005] [Accepted: 11/30/2005] [Indexed: 12/29/2022]
Abstract
Crohn's disease (CD) presents as an inflammatory barrier disease with characteristic destructive processes in the intestinal wall. Although the pathomechanisms of CD are still not exactly understood, there is evidence that, in addition to e.g. bacterial colonisation, genetic predisposition contributes to the development of CD. In order to search for predisposing genetic factors we scrutinised 245 microsatellite markers in a population-based linkage mapping study. These microsatellites cover gene loci whose encoded proteins take part in the regulation of apoptosis and (innate) immune processes. The respective loci contribute to the activation or suppression of apoptosis, are involved in signal transduction and cell cycle regulation, or belong to the tumor necrosis factor superfamily, the caspase-related genes or the BCL2 family. Furthermore, several cytokines as well as chemokines were included. The approach is based on three steps: analysing pooled DNAs of patients and controls; verifying significantly differing microsatellite markers by genotyping individual DNA samples; and, finally, reinvestigating the respective gene in the region covered by the associated microsatellite by analysing single-nucleotide polymorphisms (SNPs). Using this step-wise process we were unable to demonstrate evidence for genetic predisposition of the chosen apoptosis- and immunity-related genes with respect to susceptibility for CD.
Affiliation(s)
- Peter Jagiello
- Institute for Clinical Molecular Biology, University Schleswig-Holstein, Kiel, Germany
- Denis A Akkad
- Department of Human Genetics, Ruhr-University, Bochum, Germany
- Larissa Arning
- Department of Human Genetics, Ruhr-University, Bochum, Germany
- Thomas Griga
- Department of Gastroenterology, University Hospital Bergmannsheil, Bochum, Germany
- Wolfram Klein
- Department of Human Genetics, Ruhr-University, Bochum, Germany
- Jörg T Epplen
- Department of Human Genetics, Ruhr-University, Bochum, Germany
|
1135
|
Thompson EE, Kuttab-Boulos H, Yang L, Roe BA, Di Rienzo A. Sequence diversity and haplotype structure at the human CYP3A cluster. The Pharmacogenomics Journal 2005; 6:105-14. [PMID: 16314882 DOI: 10.1038/sj.tpj.6500347] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Indexed: 11/08/2022]
Abstract
The four members of the human CYP3A subfamily play important roles in the clearance of xenobiotics, hormones, and environmental compounds. Many SNPs at the CYP3A locus have been characterized, with several showing large allele frequency differences across populations. In addition to the effects of CYP3A SNPs on drug metabolism, recent studies have highlighted the potential for CYP3A variation in susceptibility to several common phenotypes, including hypertension and cancer. We previously showed that the CYP3A4 and CYP3A5 genes have a strong haplotype structure at varying frequencies across ethnic groups. Here, we extend our re-sequencing survey to the remaining CYP3A genes in the same cluster, CYP3A7 and CYP3A43. Our study identified a large number of SNPs in coding and conserved noncoding sequences, several of which are common. The combined data set allows us to investigate patterns of sequence variation and linkage disequilibrium at the entire CYP3A locus for use in future association studies.
|
1136
|
Abstract
Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.
|
1137
|
Chien JW, Zhao LP, Hansen JA, Fan WH, Parimon T, Clark JG. Genetic variation in bactericidal/permeability-increasing protein influences the risk of developing rapid airflow decline after hematopoietic cell transplantation. Blood 2005; 107:2200-7. [PMID: 16304058 PMCID: PMC1895720 DOI: 10.1182/blood-2005-06-2338] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Indexed: 12/22/2022]
Abstract
Innate immunity is involved in the biology of graft versus host disease and common airway diseases. We screened 15 genes in this pathway using a linkage disequilibrium-based approach to identify potential candidate genes that may be involved in the development of airflow obstruction after hematopoietic cell transplantation. Sixty-nine single-nucleotide polymorphisms were selected for assessment in a discovery cohort (n = 363). Significant associations were validated in a validation cohort (n = 209). Expression of the candidate gene was demonstrated by detecting gene transcript and protein in malignant and normal small airway epithelial cells. In the discovery cohort, 133 patients developed significant airflow decline. Four patient and donor bactericidal/permeability-increasing (BPI) haplotypes were associated with a 2-fold to 3-fold increased risk of developing significant airflow decline (P values, .004-.038). This association was confirmed in the validation cohort, which had 66 patients with significant airflow decline, with 9 significant haplotypes (P values, .013-.043). BPI gene transcript and protein were detected in airway epithelial cells. These results suggest mutations in the BPI gene significantly influence the risk of developing rapid airflow decline after hematopoietic cell transplantation and may represent a novel therapeutic target for this form of airway disease.
Affiliation(s)
- Jason W Chien
- Pulmonary and Critical Care Section, Clinical Research Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave North, D5-280, Seattle, WA 98109-1024, USA.
|
1138
|
Qin ZS, Gopalakrishnan S, Abecasis GR. An efficient comprehensive search algorithm for tagSNP selection using linkage disequilibrium criteria. Bioinformatics 2005; 22:220-5. [PMID: 16269414 DOI: 10.1093/bioinformatics/bti762] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION Selecting SNP markers for genome-wide association studies is an important and challenging task. The goal is to minimize the number of markers selected for genotyping in a particular platform and therefore reduce genotyping cost while simultaneously maximizing the information content provided by selected markers. RESULTS We devised an improved algorithm for tagSNP selection using the pairwise r² criterion. We first break down large marker sets into disjoint pieces, where more exhaustive searches can replace the greedy algorithm for tagSNP selection. These exhaustive searches lead to smaller tagSNP sets being generated. In addition, our method evaluates multiple solutions that are equivalent according to the linkage disequilibrium criteria to accommodate additional constraints. Its performance was assessed using HapMap data. AVAILABILITY A computer program named FESTA has been developed based on this algorithm. The program is freely available and can be downloaded at http://www.sph.umich.edu/csg/qin/FESTA/
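For orientation, the greedy pairwise-r² baseline that the abstract above contrasts with exhaustive search can be sketched as follows. This is a minimal illustration, not the FESTA implementation: the function names, the 0/1 phased-haplotype input format and the default threshold of 0.8 are all assumptions made for the example.

```python
from itertools import combinations


def r_squared(a, b):
    """Pairwise r^2 between two biallelic SNPs coded as 0/1 on the same haplotypes."""
    n = len(a)
    pa = sum(a) / n                      # frequency of allele 1 at the first SNP
    pb = sum(b) / n                      # frequency of allele 1 at the second SNP
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:                       # a monomorphic SNP carries no LD information
        return 0.0
    d = pab - pa * pb                    # classical disequilibrium coefficient D
    return d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedy tag selection (hypothetical helper, not FESTA).

    haplotypes: dict mapping SNP name -> list of 0/1 alleles.
    Repeatedly picks the SNP that captures the most still-untagged SNPs
    at r^2 >= threshold, until every SNP is tagged.
    """
    names = list(haplotypes)
    captures = {s: {s} for s in names}   # every SNP captures itself
    for s, t in combinations(names, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            captures[s].add(t)
            captures[t].add(s)
    untagged = set(names)
    tags = []
    while untagged:
        best = max(names, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags
```

A greedy pass like this is fast but can return more tags than necessary, which is the gap the abstract says exhaustive search within disjoint pieces is meant to close.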
Affiliation(s)
- Zhaohui S Qin
- Center for Statistical Genetics, Department of Biostatistics, School of Public Health, University of Michigan 1420 Washington Heights, Ann Arbor, MI 48109-2029, USA.
|
1139
|
Carlson CS, Thomas DJ, Eberle MA, Swanson JE, Livingston RJ, Rieder MJ, Nickerson DA. Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res 2005; 15:1553-65. [PMID: 16251465 PMCID: PMC1310643 DOI: 10.1101/gr.4326505] [Citation(s) in RCA: 201] [Impact Index Per Article: 10.6] [Received: 06/21/2005] [Accepted: 09/06/2005] [Indexed: 01/14/2023]
Abstract
The allele frequency spectrum of polymorphisms in DNA sequences can be used to test for signatures of natural selection that depart from the expected frequency spectrum under the neutral theory. We observed a significant (P = 0.001) correlation between the Tajima's D test statistic in full resequencing data and Tajima's D in a dense, genome-wide data set of genotyped polymorphisms for a set of 179 genes. Based on this, we used a sliding window analysis of Tajima's D across the human genome to identify regions putatively subject to strong, recent, selective sweeps. This survey identified seven Contiguous Regions of Tajima's D Reduction (CRTRs) in an African-descent population (AD), 23 in a European-descent population (ED), and 29 in a Chinese-descent population (XD). Only four CRTRs overlapped between populations: three between ED and XD and one between AD and ED. Full resequencing of eight genes within six CRTRs demonstrated frequency spectra inconsistent with neutral expectations for at least one gene within each CRTR. Identification of the functional polymorphism (and/or haplotype) responsible for the selective sweeps within each CRTR may provide interesting insights into the strongest selective pressures experienced by the human genome over recent evolutionary history.
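As background for the statistic used in the abstract above, the following is a minimal sketch of the standard Tajima's D calculation for a small alignment. It is not the authors' pipeline: the function name and the string-based input format are assumptions for illustration, and the sliding-window genome scan described in the abstract is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences (strings)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least four sequences for a non-zero variance")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) sites
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise differences
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)
```

An excess of rare variants, as expected after a recent selective sweep, pushes D negative, while an excess of intermediate-frequency variants pushes it positive.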
Affiliation(s)
- Christopher S Carlson
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195-7730, USA.
|
1140
|
Huang YT, Zhang K, Chen T, Chao KM. Selecting additional tag SNPs for tolerating missing data in genotyping. BMC Bioinformatics 2005; 6:263. [PMID: 16259642 PMCID: PMC1316880 DOI: 10.1186/1471-2105-6-263] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 05/26/2005] [Accepted: 11/01/2005] [Indexed: 11/13/2022]
Abstract
BACKGROUND Recent studies have shown that the patterns of linkage disequilibrium observed in human populations have a block-like structure, and a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of haplotype patterns in the block. In reality, some tag SNPs may be missing, and we may fail to distinguish two distinct haplotypes due to the ambiguity caused by missing data. RESULTS We show there exists a subset of SNPs (referred to as robust tag SNPs) which can still distinguish all distinct haplotypes even when some SNPs are missing. The problem of finding minimum robust tag SNPs is shown to be NP-hard. To find robust tag SNPs efficiently, we propose two greedy algorithms and one linear programming relaxation algorithm. The experimental results indicate that (1) the solutions found by these algorithms are quite close to the optimal solution; (2) the genotyping cost saved by using tag SNPs can be as high as 80%; and (3) genotyping additional tag SNPs for tolerating missing data is still cost-effective. CONCLUSION Genotyping robust tag SNPs is more practical than just genotyping the minimum tag SNPs if we cannot avoid the occurrence of missing data. Our theoretical analysis and experimental results show that our algorithms are efficient and that the solutions they find are close to the optimal solution.
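One way to picture the robust-tagging idea in the abstract above is as a covering problem: if every pair of haplotype patterns is distinguished by at least m + 1 chosen SNPs, then losing any m of them still leaves each pair distinguishable. The sketch below is a simple greedy heuristic under that reading. It is an illustrative assumption, not a reproduction of the paper's algorithms, and the function name and string-based pattern format are hypothetical.

```python
from itertools import combinations


def greedy_robust_tags(patterns, m=1):
    """Greedily pick SNP indices so each pair of patterns differs at >= m + 1 of them.

    patterns: list of equal-length strings, one per distinct haplotype pattern.
    m: number of missing tag SNPs to tolerate (m=0 gives ordinary tag SNPs).
    """
    n_snps = len(patterns[0])
    pairs = list(combinations(range(len(patterns)), 2))
    need = {pair: m + 1 for pair in pairs}   # distinctions still required per pair
    remaining = list(range(n_snps))
    chosen = []

    def gain(snp):
        # Number of still-unsatisfied pairs that this SNP would help distinguish
        return sum(1 for i, j in pairs
                   if need[(i, j)] > 0 and patterns[i][snp] != patterns[j][snp])

    while any(count > 0 for count in need.values()):
        if not remaining:
            raise ValueError("not enough SNPs to tolerate m missing tags")
        best = max(remaining, key=gain)
        if gain(best) == 0:
            raise ValueError("not enough SNPs to tolerate m missing tags")
        chosen.append(best)
        remaining.remove(best)
        for i, j in pairs:
            if need[(i, j)] > 0 and patterns[i][best] != patterns[j][best]:
                need[(i, j)] -= 1
    return sorted(chosen)
```

Comparing the result for m=0 with the result for m=1 on the same patterns shows how many extra SNPs must be genotyped to buy tolerance to one missing tag.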
Affiliation(s)
- Yao-Ting Huang
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Kui Zhang
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, USA
- Ting Chen
- Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Kun-Mao Chao
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan
|
1141
|
Hüffmeier U, Lascorz J, Traupe H, Böhm B, Schürmeier-Horst F, Ständer M, Kelsch R, Baumann C, Küster W, Burkhardt H, Reis A. Systematic Linkage Disequilibrium Analysis of SLC12A8 at PSORS5 Confirms a Role in Susceptibility to Psoriasis Vulgaris. J Invest Dermatol 2005; 125:906-12. [PMID: 16297188 DOI: 10.1111/j.0022-202x.2005.23847.x] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 01/07/2023]
Abstract
The gene for solute carrier family 12 member A8 has recently been proposed as a candidate gene for psoriasis susceptibility (PSORS5) on chromosome 3q based on association of five single nucleotide polymorphisms (SNP) in Swedish patients. To investigate whether this locus is relevant for German psoriasis vulgaris (PsV) patients, we analyzed a group of 210 trios and a case-control group including 375 patients. Based on our investigation of the linkage disequilibrium (LD) structure of SLC12A8, we assayed 35 haplotype tag SNP and grouped them into nine LD-blocks. In the case-control study, we detected an association for six SNP and three LD-based haplotypes. Association was strongest for ss35527511 (χ² = 11.224, p = 0.0008) and haplotype E-2 (χ² = 11.788, p = 0.00059) and independent of the presence of an HLA-associated PSORS1 risk allele. Through extended haplotype analysis, we could show that two independent association signals exist in SLC12A8, suggesting allelic heterogeneity. None of the SNP showed association in trios, apart from a weak association of rs2228674 (transmission disequilibrium test statistics p = 0.048), probably due to insufficient power. We conclude that SLC12A8 is a susceptibility locus for PsV. In order to establish the exact nature of this association, efforts to identify the disease-causing variants are ongoing.
Affiliation(s)
- Ulrike Hüffmeier
- Institute of Human Genetics, University Erlangen-Nuremberg, Erlangen, Germany
|
1142
|
Wang H, Chu W, Wang X, Zhang Z, Elbein SC. Evaluation of sequence variants in the pre-B cell leukemia transcription factor 1 gene: a positional and functional candidate for type 2 diabetes and impaired insulin secretion. Mol Genet Metab 2005; 86:384-91. [PMID: 16140554 DOI: 10.1016/j.ymgme.2005.07.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 04/26/2005] [Revised: 06/29/2005] [Accepted: 07/06/2005] [Indexed: 10/25/2022]
Abstract
Pre-B cell leukemia transcription factor 1 (PBX1) encodes a homeodomain-containing protein that is essential for pancreatic development and interacts with insulin promoter factor 1 to regulate insulin secretion. PBX1 maps to chromosome 1q22, a region with replicated linkage to type 2 diabetes (T2DM). We screened for sequence variation in nine exons, intronic regions flanking the exons, the 3' untranslated region (3' UTR), as well as 1 kb upstream of exon 1 in 16 Caucasian and 16 African American individuals with T2DM. We evaluated 18 variants including the nonsynonymous substitution G21S in exon 1, one 4 bp insertion/deletion, and one 7 bp insertion/deletion. We typed 10 variants, chosen on the basis of frequency and linkage disequilibrium patterns, in unrelated Caucasian subjects with T2DM and controls, and nine common variants in 129 Caucasian individuals for whom we had detailed assessments of insulin action and insulin secretion. We typed four common variants in African American individuals and additional SNPs in pooled DNA samples from both populations. No coding variant was associated with diabetes and no association was found among African American subjects. However, three variants in Caucasians (78287, 91227, and 252050 bp) were associated with T2DM (p<0.05), as were four marker haplotypes that included intron 2 variants. Additionally, three variants including G21S (61 bp) and the diabetes-associated SNP at 78287 were significant determinants of insulin sensitivity (S_I) in interaction with body mass index (p<0.02). Sequence variants in different locations of the PBX1 gene may have modest pleiotropic effects on T2DM susceptibility in Caucasians.
Affiliation(s)
- Hua Wang
- Division of Endocrinology and Metabolism, Department of Medicine, College of Medicine, University of Arkansas for Medical Sciences, USA
|
1143
|
Pardi F, Lewis CM, Whittaker JC. SNP Selection for Association Studies: Maximizing Power across SNP Choice and Study Size. Ann Hum Genet 2005; 69:733-46. [PMID: 16266411 DOI: 10.1111/j.1529-8817.2005.00202.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
Selection of single nucleotide polymorphisms (SNPs) is a problem of primary importance in association studies and several approaches have been proposed. However, none provides a satisfying answer to the problem of how many SNPs should be selected, and how this should depend on the pattern of linkage disequilibrium (LD) in the region under consideration. Moreover, SNP selection is usually considered as independent from deciding the sample size of the study. However, when resources are limited there is a tradeoff between the study size and the number of SNPs to genotype. We show that tuning the SNP density to the LD pattern can be achieved by looking for the best solution to this tradeoff. Our approach consists of formulating SNP selection as an optimization problem: the objective is to maximize the power of the final association study, whilst keeping the total costs below a given budget. We also propose two alternative algorithms for the solution of this optimization problem: a genetic algorithm and a hill climbing search. These standard techniques efficiently find good solutions, even when the number of possible SNPs to choose from is large. We compare the performance of these two algorithms on different chromosomal regions and show that, as expected, the selected SNPs reflect the LD pattern: the optimal SNP density varies dramatically between chromosomal regions.
Affiliation(s)
- F Pardi
- Department of Medical and Molecular Genetics, Guy's, King's and St. Thomas' School of Medicine, King's College London, London, UK
|
1144
|
Maraganore DM, de Andrade M, Lesnick TG, Strain KJ, Farrer MJ, Rocca WA, Pant PVK, Frazer KA, Cox DR, Ballinger DG. High-resolution whole-genome association study of Parkinson disease. Am J Hum Genet 2005; 77:685-93. [PMID: 16252231 PMCID: PMC1271381 DOI: 10.1086/496902] [Citation(s) in RCA: 368] [Impact Index Per Article: 19.4] [Received: 06/03/2005] [Accepted: 07/28/2005] [Indexed: 01/13/2023]
Abstract
We performed a two-tiered, whole-genome association study of Parkinson disease (PD). For tier 1, we individually genotyped 198,345 uniformly spaced and informative single-nucleotide polymorphisms (SNPs) in 443 sibling pairs discordant for PD. For tier 2a, we individually genotyped 1,793 PD-associated SNPs (P<.01 in tier 1) and 300 genomic control SNPs in 332 matched case-unrelated control pairs. We identified 11 SNPs that were associated with PD (P<.01) in both tier 1 and tier 2 samples and had the same direction of effect. For these SNPs, we combined data from the case-unaffected sibling pair (tier 1) and case-unrelated control pair (tier 2) samples and employed a liberalization of the sibling transmission/disequilibrium test to calculate odds ratios, 95% confidence intervals, and P values. A SNP within the semaphorin 5A gene (SEMA5A) had the lowest combined P value (P=7.62 × 10⁻⁶). The protein encoded by this gene plays an important role in neurogenesis and in neuronal apoptosis, which is consistent with existing hypotheses regarding PD pathogenesis. A second SNP tagged the PARK11 late-onset PD susceptibility locus (P=1.70 × 10⁻⁵). In tier 2b, we also selected for genotyping additional SNPs that were borderline significant (P<.05) in tier 1 but that tested a priori biological and genetic hypotheses regarding susceptibility to PD (n=941 SNPs). In analysis of the combined tier 1 and tier 2b data, the two SNPs with the lowest P values (P=9.07 × 10⁻⁶; P=2.96 × 10⁻⁵) tagged the PARK10 late-onset PD susceptibility locus. Independent replication across populations will clarify the role of the genomic loci tagged by these SNPs in conferring PD susceptibility.
|
1145
|
Benayed R, Gharani N, Rossman I, Mancuso V, Lazar G, Kamdar S, Bruse SE, Tischfield S, Smith BJ, Zimmerman RA, Dicicco-Bloom E, Brzustowicz LM, Millonig JH. Support for the homeobox transcription factor gene ENGRAILED 2 as an autism spectrum disorder susceptibility locus. Am J Hum Genet 2005; 77:851-68. [PMID: 16252243 PMCID: PMC1271392 DOI: 10.1086/497705] [Citation(s) in RCA: 139] [Impact Index Per Article: 7.3] [Received: 06/17/2005] [Accepted: 08/26/2005] [Indexed: 11/03/2022]
Abstract
Our previous research involving 167 nuclear families from the Autism Genetic Resource Exchange (AGRE) demonstrated that two intronic SNPs, rs1861972 and rs1861973, in the homeodomain transcription factor gene ENGRAILED 2 (EN2) are significantly associated with autism spectrum disorder (ASD). In this study, significant replication of association for rs1861972 and rs1861973 is reported for two additional data sets: an independent set of 222 AGRE families (rs1861972-rs1861973 haplotype, P=.0016) and a separate sample of 129 National Institutes of Mental Health families (rs1861972-rs1861973 haplotype, P=.0431). Association analysis of the haplotype in the combined sample of both AGRE data sets (389 families) produced a P value of .0000033, whereas combining all three data sets (518 families) produced a P value of .00000035. Population-attributable risk calculations for the associated haplotype, performed using the entire sample of 518 families, determined that the risk allele contributes to as many as 40% of ASD cases in the general population. Linkage disequilibrium (LD) mapping with the use of polymorphisms distributed throughout the gene has shown that only intronic SNPs are in strong LD with rs1861972 and rs1861973. Resequencing and association analysis of all intronic SNPs have identified alleles associated with ASD, which makes them candidates for future functional analysis. Finally, to begin defining the function of EN2 during development, mouse En2 was ectopically expressed in cortical precursors. Fewer En2-transfected cells than controls displayed a differentiated phenotype. Together, these data provide further genetic evidence that EN2 might act as an ASD susceptibility locus, and they suggest that a risk allele that perturbs the spatial/temporal expression of EN2 could significantly alter normal brain development.
Affiliation(s)
- Rym Benayed
- Center for Advanced Biotechnology and Medicine, Piscataway, NJ, 08854-5638, USA
|
1146
|
Zeggini E, Rayner W, Morris AP, Hattersley AT, Walker M, Hitman GA, Deloukas P, Cardon LR, McCarthy MI. An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets. Nat Genet 2005; 37:1320-2. [PMID: 16258542 DOI: 10.1038/ng1670] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.3] [Received: 06/30/2005] [Accepted: 10/04/2005] [Indexed: 11/09/2022]
Abstract
A substantial investment has been made in the generation of large public resources designed to enable the identification of tag SNP sets, but data establishing the adequacy of the sample sizes used are limited. Using large-scale empirical and simulated data sets, we found that the sample sizes used in the HapMap project are sufficient to capture common variation, but that performance declines substantially for variants with minor allele frequencies of <5%.
Affiliation(s)
- Eleftheria Zeggini
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
|
1147
|
|
1148
|
Trowsdale J. HLA genomics in the third millennium. Curr Opin Immunol 2005; 17:498-504. [PMID: 16085407 DOI: 10.1016/j.coi.2005.07.015] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Received: 05/07/2005] [Accepted: 07/21/2005] [Indexed: 12/20/2022]
Abstract
The MHC region contains several unique characteristics that set it apart as the most important region in the vertebrate genome in relation to disease. Recent data fit with the long-held view that the polymorphism of this region is driven by resistance to infection, although this is not yet proven. Interestingly, the MHC gene complex is associated with most, if not all, of the common autoimmune conditions. It has been difficult to identify the precise MHC genes associated with infection and autoimmunity, mainly because of the strong linkage disequilibrium over the region. Over the past few years, tools have been developed in an attempt to overcome these problems, including multiple fully sequenced MHC haplotypes, which have led to high-density hapmaps. In conjunction with large well-documented patient/control groups and sophisticated statistical methods these advances are starting to provide a comprehensive view of the genetics of the HLA region and disease susceptibility.
Affiliation(s)
- John Trowsdale
- Department of Pathology, Cambridge Institute for Medical Research, Addenbrookes Hospital, Cambridge, UK.
|
1149
|
de Bakker PIW, Yelensky R, Pe'er I, Gabriel SB, Daly MJ, Altshuler D. Efficiency and power in genetic association studies. Nat Genet 2005; 37:1217-23. [PMID: 16244653 DOI: 10.1038/ng1669] [Citation(s) in RCA: 1376] [Impact Index Per Article: 72.4] [Received: 07/14/2005] [Accepted: 09/27/2005] [Indexed: 02/06/2023]
Abstract
We investigated selection and analysis of tag SNPs for genome-wide association studies by specifically examining the relationship between investment in genotyping and statistical power. Do pairwise or multimarker methods maximize efficiency and power? To what extent is power compromised when tags are selected from an incomplete resource such as HapMap? We addressed these questions using genotype data from the HapMap ENCODE project, association studies simulated under a realistic disease model, and empirical correction for multiple hypothesis testing. We demonstrate a haplotype-based tagging method that uniformly outperforms single-marker tests and methods for prioritization that markedly increase tagging efficiency. Examining all observed haplotypes for association, rather than just those that are proxies for known SNPs, increases power to detect rare causal alleles, at the cost of reduced power to detect common causal alleles. Power is robust to the completeness of the reference panel from which tags are selected. These findings have implications for prioritizing tag SNPs and interpreting association studies.
Affiliation(s)
- Paul I W de Bakker
- Center for Human Genetic Research, Massachusetts General Hospital, 185 Cambridge Street, CPZN-6818, Boston, Massachusetts 02114-2790, USA
|
1150
|
Zhang K, Sun F. Assessing the power of tag SNPs in the mapping of quantitative trait loci (QTL) with extremal and random samples. BMC Genet 2005; 6:51. [PMID: 16236175 PMCID: PMC1274312 DOI: 10.1186/1471-2156-6-51] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 06/08/2005] [Accepted: 10/19/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND Recent studies have indicated that the human genome could be divided into regions with low haplotype diversity interspersed with regions of high haplotype diversity. In regions of low haplotype diversity, a small fraction of SNPs (tag SNPs) are sufficient to account for most of the haplotype diversity of the human genome. These tag SNPs can be extremely useful for testing the association of a marker locus with a qualitative or quantitative trait locus in that it may not be necessary to genotype all the SNPs. When tag SNPs are used to reduce the genotyping effort in association studies, it is important to know how much power is lost. It is also important to know how much power is gained when tag SNPs instead of the same number of randomly chosen SNPs are used. RESULTS We design a simulation study to tackle these problems for a variety of quantitative association tests using either case-parent samples or unrelated population samples. First, the samples are generated based on the quantitative trait model with the assumption of either an extremal sampling scheme or a random sampling scheme. Second, a small number of samples are selected to determine the haplotype blocks and the tag SNPs. Third, the statistical power of the tests is evaluated using four kinds of data: (1) all the SNPs and the corresponding haplotypes, (2) the tag SNPs and the corresponding haplotypes, (3) the same number of evenly spaced SNPs with minor allele frequency greater than a threshold and the corresponding haplotypes, (4) the same number of randomly chosen SNPs and their corresponding haplotypes. CONCLUSION Our results suggest that in most situations genotyping efforts can be significantly reduced by using tag SNPs for mapping the QTL in association studies without much loss of power, which is consistent with previous studies on association mapping of qualitative traits. 
For all situations considered, two-locus haplotype analyses using tag SNPs are more powerful than those using the same number of randomly selected SNPs, but the degree of such power differences depends upon the sampling scheme and the population history.
Affiliation(s)
- Kui Zhang
- Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Fengzhu Sun
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
|