1
Pettit RW, Amos CI. Linkage Disequilibrium Score Statistic Regression for Identifying Novel Trait Associations. CURR EPIDEMIOL REP 2022. [DOI: 10.1007/s40471-022-00297-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
2
A SINE-VNTR-Alu in the LRIG2 Promoter Is Associated with Gene Expression at the Locus. Int J Mol Sci 2020; 21:ijms21228486. [PMID: 33187279 PMCID: PMC7697779 DOI: 10.3390/ijms21228486] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/02/2020] [Revised: 11/06/2020] [Accepted: 11/09/2020] [Indexed: 12/12/2022]
Abstract
The hominid SINE-VNTR-Alu (SVA) retrotransposons represent a repertoire of genomic variation which could have significant effects on genome function. A human-specific SVA in the promoter region of the gene leucine-rich repeats and immunoglobulin-like domains 2 (LRIG2), which we termed SVA_LRIG2, is a common retrotransposon insertion polymorphism (RIP), defined as an element which is polymorphic for its presence or absence in the genome. We hypothesised that this RIP might be associated with differential levels of expression of LRIG2. The RIP genotype of SVA_LRIG2 was determined in a subset of frontal cortex DNA samples from the North American Brain Expression Consortium (NABEC) cohort and was imputed for a larger set of that cohort. Utilising available frontal cortex total RNA-seq and CpG methylation data for this cohort, we observed that increased allele dosage of SVA_LRIG2 was non-significantly associated with a decrease in transcription from the region and significantly associated with increased methylation of the CpG probe nearest to SVA_LRIG2, i.e., SVA_LRIG2 is a significant methylation quantitative trait locus (mQTL) at the LRIG2 locus. These data are consistent with SVA_LRIG2 being a transcriptional regulator, which in part may involve epigenetic modulation.
3
Yuan Y, Ma Y, Zhang X, Han R, Hu X, Yang J, Wang M, Guan SY, Pan G, Xu SQ, Jiang S, Pan F. Genetic polymorphisms of G protein-coupled receptor 65 gene are associated with ankylosing spondylitis in a Chinese Han population: A case-control study. Hum Immunol 2018; 80:146-150. [PMID: 30529363 DOI: 10.1016/j.humimm.2018.12.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/08/2018] [Revised: 11/20/2018] [Accepted: 12/03/2018] [Indexed: 01/13/2023]
Abstract
OBJECTIVE This study aimed to assess the association between two tag single nucleotide polymorphisms (SNPs) (rs68177277 and rs11624293) of the G protein-coupled receptor 65 (GPR65) gene and ankylosing spondylitis (AS) susceptibility in a Chinese Han population. METHODS 673 patients with AS diagnosed according to the modified New York criteria and 679 age- and gender-matched healthy controls were recruited. SNP genotyping for the rs68177277 and rs11624293 polymorphisms was performed using the SNPscan technique. Disease activity was assessed by the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI). RESULTS Genotype and allele distributions of rs11624293 but not rs68177277 were significantly different between AS patients and controls (p = 0.004 and p = 0.002). Compared to the wild-type T/T genotype and T allele at rs11624293, the frequencies of the C/T genotype and C allele were significantly higher in AS patients than in controls after adjusting for age and gender (OR = 1.527, 95%CIs: 1.190-1.958; OR = 1.515, 95%CIs: 1.183-1.942, respectively). Dominant and co-dominant models of rs11624293 were predictive of AS susceptibility. In AS patients, the genotype of rs11624293 was significantly associated with BASFI scores in those with low disease activity (BASDAI < 4, p = 0.007). CONCLUSIONS The results of our study suggest that the rs11624293 polymorphism of the GPR65 gene is associated with the susceptibility and severity of AS in a Chinese Han population.
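For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the following is a minimal sketch using the unadjusted Woolf (log) method on a 2x2 table. It is not the authors' analysis: the abstract reports ORs adjusted for age and gender, and the function name `odds_ratio_ci` and the counts in the usage line are hypothetical, chosen only for illustration.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    All four cells must be non-zero for this simple formula.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
```

An interval that excludes 1, as both intervals reported in the abstract do, corresponds to significance at roughly the 5% level.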
Affiliation(s)
- Yaping Yuan
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China; The Key Laboratory of Major Autoimmune Diseases, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China
- Yubo Ma
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China; The Key Laboratory of Major Autoimmune Diseases, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China
- Xu Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China; The Key Laboratory of Major Autoimmune Diseases, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China
- Renfang Han
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China; The Key Laboratory of Major Autoimmune Diseases, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China
- Xingxing Hu
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China; The Key Laboratory of Major Autoimmune Diseases, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China
- Jiajia Yang
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China; The Key Laboratory of Major Autoimmune Diseases, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China
- Mengmeng Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China; The Key Laboratory of Major Autoimmune Diseases, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China
- Shi-Yang Guan
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China; The Key Laboratory of Major Autoimmune Diseases, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China
- Guixia Pan
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China; The Key Laboratory of Major Autoimmune Diseases, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China
- Sheng-Qian Xu
- Department of Rheumatism and Immunity, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui 230022, China
- Shanqun Jiang
- School of Life Sciences, Anhui University, Hefei, Anhui 230022, China
- Faming Pan
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China; The Key Laboratory of Major Autoimmune Diseases, Anhui Medical University, 81 Meishan Road, Hefei, Anhui 230032, China.
4
Autophagy-related IRGM genes confer susceptibility to ankylosing spondylitis in a Chinese female population: a case-control study. Genes Immun 2016; 18:42-47. [PMID: 28031552 DOI: 10.1038/gene.2016.48] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 07/29/2016] [Revised: 10/15/2016] [Accepted: 11/08/2016] [Indexed: 12/16/2022]
Abstract
It is known that ankylosing spondylitis (AS) and inflammatory bowel disease (IBD) share a common genetic component. The aim of the current study was to assess the role of the IBD-associated autophagy gene IRGM in AS susceptibility in a Chinese Han population. A total of 1270 unrelated subjects (643 AS and 627 controls) were enrolled. Two tag single-nucleotide polymorphisms (SNPs) (rs10065172 and rs4958846) were selected and genotyped by iMLDR Assay technology. Genotype and haplotype analyses were conducted using SPSS 16.0 and haploview 4.2 software. Of the two tag SNPs of IRGM, no correlation was observed between rs10065172 and AS susceptibility. For rs4958846, genotype and allelic frequencies differed marginally between female cases and controls before, but not after, Bonferroni correction (P=0.049; P=0.031). Logistic regression analysis revealed that carriers of the CT+TT or CT genotype had a significantly decreased risk of developing AS among female subjects when compared with the CC genotype (OR=0.514, 95% CI=0.301-0.876, P=0.014; OR=0.518, 95% CI=0.297-0.902, P=0.020, respectively). Additionally, a risk haplotype rs4958846C-rs10065172C (OR=2.093, 95% CI=1.301-3.368) and a protective haplotype rs4958846T-rs10065172C (OR=0.652, 95% CI=0.441-0.964) were also identified as associated with female AS. The IBD-associated IRGM gene is also associated with AS susceptibility in the Chinese female population, indicating that the autophagy pathway may be involved in AS genetic predisposition.
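The remark that the rs4958846 differences were significant "before, but not after" Bonferroni correction can be illustrated with a short sketch. The two raw P values are the ones quoted in the abstract; the number of tests (two, one per tag SNP) is an assumption, since the abstract does not state how many comparisons were corrected for, and `bonferroni` is a hypothetical helper name.

```python
from typing import List, Tuple


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[Tuple[float, bool]]:
    """Return (adjusted P, significant?) for each raw P value.

    Each P value is multiplied by the number of tests and capped at 1.
    """
    n_tests = len(p_values)
    adjusted = [min(1.0, p * n_tests) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]


# Raw P values from the abstract; treating them as a family of two tests is an assumption.
results = bonferroni([0.049, 0.031])
```

Under that assumption both adjusted values exceed 0.05, which is consistent with the abstract's description.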
5
Henning JA, Coggins J, Peterson M. Simple SNP-based minimal marker genotyping for Humulus lupulus L. identification and variety validation. BMC Res Notes 2015; 8:542. [PMID: 26438052 PMCID: PMC4595125 DOI: 10.1186/s13104-015-1492-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 09/26/2014] [Accepted: 09/21/2015] [Indexed: 12/31/2022]
Abstract
Background Hop is an economically important crop for the Pacific Northwest USA as well as other regions of the world. It is a perennial crop with rhizomatous or clonal propagation system for varietal distribution. A big concern for growers as well as brewers is variety purity and questions are regularly posed to public agencies concerning the availability of genotype testing. Current means for genotyping are based upon 25 microsatellites that provides relatively accurate genotyping but cannot always differentiate sister-lines. In addition, numerous PCR runs (25) are required to complete this process and only a few laboratories exist that perform this service. A genotyping protocol based upon SNPs would enable rapid accurate genotyping that can be assayed at any laboratory facility set up for SNP-based genotyping. The results of this study arose from a larger project designed for whole genome association studies upon the USDA-ARS hop germplasm collection consisting of approximately 116 distinct hop varieties and germplasm (female lines) from around the world. Results The original dataset that arose from partial sequencing of 121 genotypes resulted in the identification of 374,829 SNPs using TASSEL-UNEAK pipeline. After filtering out genotypes with more than 50 % missing data (5 genotypes) and SNP markers with more than 20 % missing data, 32,206 highly filtered SNP markers across 116 genotypes were identified and considered for this study. Minor allele frequency (MAF) was calculated for each SNP and ranked according to the most informative to least informative. Only those markers without missing data across genotypes as well as 60 % or less heterozygous gamete calls were considered for further analysis. Genetic distances among individuals in the study were calculated using the marker with the highest MAF value, then by using a combination of the two markers with highest MAF values and so on. 
This process was reiterated until a set of markers was identified that allowed for all genotypes in the study to be genetically differentiated from each other. Next, we compared genetic matrices calculated from the minimal marker sets (Table 2; 6-, 7-, 8-, 10- and 12-marker set matrices) and that of a matrix calculated from a set of markers with no missing data across all 116 samples (1006 SNP markers). The minimum number of markers required to meet both specifications was a set of 7 markers (Table 3). These seven SNPs were then aligned with a genome assembly, and DNA sequence both upstream and downstream was used to identify primer sequences that can be used to develop seven amplicons for high resolution melting curve PCR detection or other SNP-based PCR detection methods. Conclusions This study identifies a set of 7 SNP markers that may prove useful for the identification and validation of hop varieties and accessions. Variety validation of unknown samples assumes that the variety under question has been included a priori in a discovery panel. These results are based upon in silico studies and markers need to be validated using different SNP marker technology upon a differential set of hop genotypes. The marker sequence data and suggested primer sets provide potential means to fingerprint hop varieties in most genetic laboratories utilizing SNP-marker technology. Electronic supplementary material The online version of this article (doi:10.1186/s13104-015-1492-2) contains supplementary material, which is available to authorized users.
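The marker-selection loop described in this abstract (rank SNPs by minor allele frequency, then add them one at a time until every genotype is distinguishable) can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline: it treats two samples as differentiated when their call profiles differ, rather than computing a genetic distance matrix, and the 0/1/2 encoding, function names and toy data are all assumptions.

```python
from typing import List, Sequence


def minor_allele_frequency(calls: Sequence[int]) -> float:
    """MAF of one biallelic marker, given 0/1/2 alternate-allele counts per sample."""
    alt_freq = sum(calls) / (2 * len(calls))
    return min(alt_freq, 1 - alt_freq)


def minimal_marker_set(genotypes: List[List[int]]) -> List[int]:
    """Greedy selection: genotypes[s][m] is the call of sample s at marker m.

    Markers are added in order of decreasing MAF until all samples
    have distinct call profiles on the chosen markers.
    """
    n_markers = len(genotypes[0])
    ranked = sorted(
        range(n_markers),
        key=lambda m: minor_allele_frequency([g[m] for g in genotypes]),
        reverse=True,
    )
    chosen: List[int] = []
    for marker in ranked:
        chosen.append(marker)
        profiles = {tuple(g[i] for i in chosen) for g in genotypes}
        if len(profiles) == len(genotypes):
            return chosen
    raise ValueError("the available markers cannot distinguish all samples")


# Toy panel: 4 samples x 3 markers (hypothetical data).
toy_panel = [
    [0, 0, 0],
    [2, 0, 0],
    [0, 2, 0],
    [2, 1, 0],
]
selected = minimal_marker_set(toy_panel)
```

In the toy panel the third marker is monomorphic (MAF 0) and is never needed, which mirrors the abstract's point that high-MAF markers are the most informative.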
Affiliation(s)
- John A Henning
- USDA-ARS, 3450 SW Campus Way, Corvallis, OR, 97331, USA.
- Jamie Coggins
- ROY FARMS, INC., 401 Walters Road, Moxee, WA, 98936, USA.
- Matthew Peterson
- CGRB, ALS Building, Oregon State University, Corvallis, OR, 97331, USA.
6
Corbin LJ, Kranis A, Blott SC, Swinburne JE, Vaudin M, Bishop SC, Woolliams JA. The utility of low-density genotyping for imputation in the Thoroughbred horse. Genet Sel Evol 2014; 46:9. [PMID: 24495673 PMCID: PMC3930001 DOI: 10.1186/1297-9686-46-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 03/15/2013] [Accepted: 12/20/2013] [Indexed: 12/21/2022]
Abstract
Background Despite the dramatic reduction in the cost of high-density genotyping that has occurred over the last decade, it remains one of the limiting factors for obtaining the large datasets required for genomic studies of disease in the horse. In this study, we investigated the potential for low-density genotyping and subsequent imputation to address this problem. Results Using the haplotype phasing and imputation program, BEAGLE, it is possible to impute genotypes from low- to high-density (50K) in the Thoroughbred horse with reasonable to high accuracy. Analysis of the sources of variation in imputation accuracy revealed dependence both on the minor allele frequency of the single nucleotide polymorphisms (SNPs) being imputed and on the underlying linkage disequilibrium structure. Whereas equidistant spacing of the SNPs on the low-density panel worked well, optimising SNP selection to increase their minor allele frequency was advantageous, even when the panel was subsequently used in a population of different geographical origin. Replacing base pair position with linkage disequilibrium map distance reduced the variation in imputation accuracy across SNPs. Whereas a 1K SNP panel was generally sufficient to ensure that more than 80% of genotypes were correctly imputed, other studies suggest that a 2K to 3K panel is more efficient to minimize the subsequent loss of accuracy in genomic prediction analyses. The relationship between accuracy and genotyping costs for the different low-density panels, suggests that a 2K SNP panel would represent good value for money. Conclusions Low-density genotyping with a 2K SNP panel followed by imputation provides a compromise between cost and accuracy that could promote more widespread genotyping, and hence the use of genomic information in horses. 
In addition to offering a low cost alternative to high-density genotyping, imputation provides a means to combine datasets from different genotyping platforms, which is becoming necessary since researchers are starting to use the recently developed equine 70K SNP chip. However, more work is needed to evaluate the impact of between-breed differences on imputation accuracy.
Affiliation(s)
- John A Woolliams
- Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK.
7
Wu C, Li S, Cui Y. Genetic association studies: an information content perspective. Curr Genomics 2012; 13:566-73. [PMID: 23633916 PMCID: PMC3468889 DOI: 10.2174/138920212803251382] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 05/14/2012] [Revised: 06/04/2012] [Accepted: 06/18/2012] [Indexed: 01/02/2023]
Abstract
The availability of high-density single nucleotide polymorphism (SNP) data has made it possible for human genetic association studies to identify common and rare variants underlying complex diseases on a genome-wide scale. A handful of novel genetic variants have been identified, which gives much hope for the future of genetic association studies. In this process, statistical and computational methods play key roles, among which information-based association tests have gained great popularity. This paper is intended to give a comprehensive review of the current literature on genetic association analysis cast in the framework of information theory. We focus our review on the following topics: (1) information theoretic approaches in genetic linkage and association studies; (2) entropy-based strategies for optimal SNP subset selection; and (3) the usage of information-theoretic criteria in gene clustering and gene regulatory network construction.
Affiliation(s)
- Cen Wu
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan 48824
- Shaoyu Li
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105
- Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, Michigan 48824
- Center for Computational Biology, Beijing Forestry University, Beijing, China 100083
8
Genetic variation in cholesterol ester transfer protein, serum CETP activity, and coronary artery disease risk in Asian Indian diabetic cohort. Pharmacogenet Genomics 2012; 22:95-104. [PMID: 22143414 DOI: 10.1097/fpc.0b013e32834dc9ef] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Indexed: 12/06/2022]
Abstract
BACKGROUND The role of cholesteryl ester transfer protein (CETP) in the metabolism of high-density lipoprotein cholesterol (HDL-C) is well studied but still controversial. More recently, genome-wide association studies and meta-analyses reported the association of a promoter variant (rs3764261) with HDL-C in Caucasians and other ethnic groups. In this study, we have examined the role of genetic variation in the promoter region of CETP with HDL-C, CETP activity, coronary artery disease (CAD), CAD risk factors, and the interaction of genetic factors with environment in a unique diabetic cohort of Asian Indian Sikhs. METHODS AND RESULTS We genotyped four variants; three tagging single nucleotide polymorphisms from promoter (rs3764261, rs12447924, rs4783961) and one intronic variant (rs708272 Taq1B) on 2431 individuals from the Sikh Diabetes study. Two variants (rs3764261 and rs708272) exhibited a strong association with HDL-C in both normoglycemic controls (β=0.12; P=9.35×10 for rs3764261; β=0.10, P=0.002 for rs708272) and diabetic cases (β=0.07, P=0.016 for rs3764261; β=0.08, P=0.005 for rs708272) with increased levels among minor homozygous 'AA' carriers. In addition, the same 'A' allele carriers in rs3764261 showed a significant decrease in systolic blood pressure (β=-0.08, P=0.002) in normoglycemic controls. Haplotype analysis of rs3764261, rs12447924, rs4783961, and rs708272 further revealed a significant association of 'ATAA' haplotype with an increased HDL-C (β=2.71, P=6.38×10) and 'CTAG' haplotype with decreased HDL-C levels (β=-1.78, P=2.5×10). Although there was no direct association of CETP activity and CETP polymorphisms, low CETP activity was associated with an increased risk to CAD (age, BMI, and sex-adjusted odds ratio=2.2; 95% confidence interval: 1.4-3.4; P=0.001) in this study. Our data revealed a strong interaction of rs3764261 and rs708272 for affecting the association between CETP activity and HDL-C levels (P=2.2×10 and P=4.4×10, respectively). 
CONCLUSION Our results, in conjunction with earlier reports confirm low CETP activity to be associated with higher CAD risk. Although there was no direct association of CETP activity with CETP polymorphisms, our findings revealed a significant interaction between CETP variants and CETP activity for affecting HDL-C levels. These results urge a deeper evaluation of the individual genetic variation in the CETP before implementing pharmaceutical intervention of blocking CETP for preventing CAD events.
9
Javed A, Drineas P, Mahoney MW, Paschou P. Efficient genomewide selection of PCA-correlated tSNPs for genotype imputation. Ann Hum Genet 2011; 75:707-22. [PMID: 21902678 DOI: 10.1111/j.1469-1809.2011.00673.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Abstract
The linkage disequilibrium structure of the human genome allows identification of small sets of single nucleotide polymorphisms (SNPs) (tSNPs) that efficiently represent dense sets of markers. This structure can be translated into linear algebraic terms as evidenced by the well documented principal components analysis (PCA)-based methods. Here we apply, for the first time, PCA-based methodology for efficient genomewide tSNP selection; and explore the linear algebraic structure of the human genome. Our algorithm divides the genome into contiguous nonoverlapping windows of high linear structure. Coupling this novel window definition with a PCA-based tSNP selection method, we analyze 2.5 million SNPs from the HapMap phase 2 dataset. We show that 10-25% of these SNPs suffice to predict the remaining genotypes with over 95% accuracy. A comparison with other popular methods in the ENCODE regions indicates significant genotyping savings. We evaluate the portability of genome-wide tSNPs across a diverse set of populations (HapMap phase 3 dataset). Interestingly, African populations are good reference populations for the rest of the world. Finally, we demonstrate the applicability of our approach in a real genome-wide disease association study. The chosen tSNP panels can be used toward genotype imputation using either a simple regression-based algorithm or more sophisticated genotype imputation methods.
Affiliation(s)
- Asif Javed
- Computational Biology Center, IBM TJ Watson Research, Yorktown Heights, NY 10598, USA
10
Abstract
In this chapter, mutation (specifically single-nucleotide polymorphisms, SNPs) and recombination will be covered in more detail, and the concepts of genotype and haplotype will be reviewed. Linkage disequilibrium (LD) describes the strength of a relationship between alleles at different loci. The definition for LD, its visual representation, and the calculation of statistics that measure LD will be presented. The power of genetic association studies to identify disease susceptibility alleles fundamentally relies on the genetic variants studied. A standard approach is to determine a set of tagging-SNPs (tSNPs) that capture the majority of genomic variation in regions of interest by exploiting local correlation structures. The concept of LD and how it is used to select tSNPs will be addressed, as well as specific procedures and algorithms that are practiced by researchers to determine these variants.
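As a rough illustration of the LD statistics this chapter abstract refers to, the sketch below computes the standard pairwise measures D, D' and r² for two biallelic loci from one haplotype frequency and two allele frequencies. The function name `ld_stats`, its argument names and the example frequencies are assumptions made for illustration; the chapter itself may present the calculation differently.

```python
from typing import Dict


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Dict[str, float]:
    """Pairwise LD between allele A at locus 1 and allele B at locus 2.

    p_ab: frequency of the A-B haplotype
    p_a, p_b: frequencies of allele A and allele B
    """
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = (d * d) / denom if denom > 0 else 0.0
    return {"D": d, "D_prime": d_prime, "r2": r2}


# Hypothetical frequencies: the two loci are perfectly correlated.
perfect_ld = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)
```

Tag SNP selection methods commonly work from r², choosing a subset of SNPs such that every untyped SNP exceeds some r² threshold with at least one selected SNP; the threshold itself is a study-design choice.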
Affiliation(s)
- Karen Curtin
- Genetic Epidemiology Division, Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
11
Brunel H, Gallardo-Chacón JJ, Buil A, Vallverdú M, Soria JM, Caminal P, Perera A. MISS: a non-linear methodology based on mutual information for genetic association studies in both population and sib-pairs analysis. Bioinformatics 2010; 26:1811-8. [PMID: 20562420 DOI: 10.1093/bioinformatics/btq273] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION Finding association between genetic variants and phenotypes related to disease has become an important vehicle for the study of complex disorders. In this context, multi-loci genetic association might unravel additional information when compared with single loci search. The main goal of this work is to propose a non-linear methodology based on information theory for finding combinatorial association between multi-SNPs and a given phenotype. RESULTS The proposed methodology, called MISS (mutual information statistical significance), has been integrated jointly with a feature selection algorithm and has been tested on a synthetic dataset with a controlled phenotype and in the particular case of the F7 gene. The MISS methodology has been contrasted with a multiple linear regression (MLR) method used for genetic association in both, a population-based study and a sib-pairs analysis and with the maximum entropy conditional probability modelling (MECPM) method, which searches for predictive multi-locus interactions. Several sets of SNPs within the F7 gene region have been found to show a significant correlation with the FVII levels in blood. The proposed multi-site approach unveils combinations of SNPs that explain more significant information of the phenotype than their individual polymorphisms. MISS is able to find more correlations between SNPs and the phenotype than MLR and MECPM. Most of the marked SNPs appear in the literature as functional variants with real effect on the protein FVII levels in blood. AVAILABILITY The code is available at http://sisbio.recerca.upc.edu/R/MISS_0.2.tar.gz
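The quantity underlying this approach, mutual information between a genotype and a phenotype, can be sketched with a simple plug-in estimate on discrete data. This is only the basic estimator: it does not reproduce the MISS significance testing or feature selection, and the function name, the discretised phenotype and the toy vectors are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x = Counter(x)
    count_y = Counter(y)
    mi = 0.0
    for (xi, yi), c in joint.items():
        # p(x, y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
        mi += (c / n) * math.log2(c * n / (count_x[xi] * count_y[yi]))
    return mi


# Hypothetical data: genotype (0/1/2 coded) against a binary phenotype class.
genotype = [0, 0, 1, 1]
phenotype_linked = [0, 0, 1, 1]
phenotype_unlinked = [0, 1, 0, 1]
mi_linked = mutual_information(genotype, phenotype_linked)
mi_unlinked = mutual_information(genotype, phenotype_unlinked)
```

To score a combination of SNPs, as in the multi-locus setting the abstract describes, the `x` sequence can hold tuples of genotypes, one tuple per individual.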
Affiliation(s)
- Helena Brunel
- Institut de Bioenginyeria de Catalunya, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, Pau Gargallo 5, 08028 Barcelona, Spain.
12
Sanghera DK, Demirci FY, Been L, Ortega L, Ralhan S, Wander GS, Mehra NK, Singh J, Aston CE, Mulvihill JJ, Kamboh IM. PPARG and ADIPOQ gene polymorphisms increase type 2 diabetes mellitus risk in Asian Indian Sikhs: Pro12Ala still remains as the strongest predictor. Metabolism 2010; 59:492-501. [PMID: 19846176 PMCID: PMC2843807 DOI: 10.1016/j.metabol.2009.07.043] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Received: 12/26/2008] [Accepted: 07/29/2009] [Indexed: 01/22/2023]
Abstract
We have examined the association of 14 tagging single nucleotide polymorphisms (tagSNPs) in peroxisome proliferator activated receptor-gamma transcripts 1 and 2 (PPARG1 and 2) and 5 tagSNPs in adiponectin (ADIPOQ) genes for their effect on type 2 diabetes mellitus (T2D) risk in Asian Indian Sikhs. A total of 554 T2D cases and 527 normoglycemic controls were examined for association with T2D and other subphenotypes of T2D. With the exception of a strong association of PPARG2/Pro12Ala with T2D (odds ratio, 0.13; 95% confidence interval, 0.03-0.56; P = .0007), no other tagSNP in the PPARG locus revealed any significant association with T2D in this population. Similarly, none of the tagSNPs in the ADIPOQ gene was associated with T2D susceptibility in single-site analysis. However, haplotype analysis provided strong evidence of association of these loci with T2D. Three-site haplotype analysis in the PPARG locus using the 2 marginally associated SNPs (P/rs11715073 and P/rs3892175) in combination with Pro12 Ala (P/rs1801282) revealed a strong association of 1 "risk" (CGC) (P = .003, permutation P = .015) and 1 "protective" (CAC) (P = .001, permutation P = .005) haplotype associated with T2D. However, the major effect still appears to be driven by Pro12Ala, as the association of these haplotypes did not remain significant when analyzed conditional upon Pro12Ala (P = .262). In addition, 2-site haplotype analysis in the ADIPOQ locus using only 2 marginally associated SNPs (AD/rs182052 and AD/rs7649121) revealed a significant protective association of the GA haplotype with T2D (P = .009, permutation P = .026). Multiple linear regression analysis also revealed significant association of an ADIPOQ variant (AD/rs12495941) with total body weight (P = .010), waist (P = .024), and hip (P = .021), although these associations were not significant after adjusting for multiple testing. 
Our new findings strongly suggest that the genetic variation in PPARG and ADIPOQ loci could contribute to the risk for the development of T2D in Indian Sikhs. Identification of causal SNPs in these important biological and positional candidate genes would help determine the true physiologic significance of these loci in T2D and obesity.
Affiliation(s)
- Dharambir Kaur Sanghera
- Department of Pediatrics, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA.
13
Zhang L, Prather D, Vanden Eng J, Crawford S, Kariuki S, ter Kuile F, Terlouw D, Nahlen B, Lal AA, Slutsker L, Udhayakumar V, Shi YP. Polymorphisms in genes of interleukin 12 and its receptors and their association with protection against severe malarial anaemia in children in western Kenya. Malar J 2010; 9:87. [PMID: 20350312 PMCID: PMC2858737 DOI: 10.1186/1475-2875-9-87] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 11/25/2009] [Accepted: 03/29/2010] [Indexed: 01/21/2023]
Abstract
Background Malarial anaemia is characterized by destruction of malaria infected red blood cells and suppression of erythropoiesis. Interleukin 12 (IL12) significantly boosts erythropoietic responses in murine models of malarial anaemia and decreased IL12 levels are associated with severe malarial anaemia (SMA) in children. Based on the biological relevance of IL12 in malaria anaemia, the relationship between genetic polymorphisms of IL12 and its receptors and SMA was examined. Methods Fifty-five tagging single nucleotide polymorphisms covering genes encoding two IL12 subunits, IL12A and IL12B, and its receptors, IL12RB1 and IL12RB2, were examined in a cohort of 913 children residing in Asembo Bay region of western Kenya. Results An increasing copy number of minor variant (C) in IL12A (rs2243140) was significantly associated with a decreased risk of SMA (P = 0.006; risk ratio, 0.52 for carrying one copy of allele C and 0.28 for two copies). Individuals possessing two copies of a rare variant (C) in IL12RB1 (rs429774) also appeared to be strongly protective against SMA (P = 0.00005; risk ratio, 0.18). In addition, children homozygous for another rare allele (T) in IL12A (rs22431348) were associated with reduced risk of severe anaemia (SA) (P = 0.004; risk ratio, 0.69) and of severe anaemia with any parasitaemia (SAP) (P = 0.004; risk ratio, 0.66). In contrast, AG genotype for another variant in IL12RB1 (rs383483) was associated with susceptibility to high-density parasitaemia (HDP) (P = 0.003; risk ratio, 1.21). Conclusions This study has shown strong associations between polymorphisms in the genes of IL12A and IL12RB1 and protection from SMA in Kenyan children, suggesting that human genetic variants of IL12 related genes may significantly contribute to the development of anaemia in malaria patients.
Affiliation(s)
- Lyna Zhang
- Malaria Branch, Division of Parasitic Diseases, National Center for Zoonotic, Vector-Borne & Enteric Diseases, Coordinating Center for Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA 30341, USA.
|
14
|
Tag SNP selection based on clustering according to dominant sets found using replicator dynamics. ADV DATA ANAL CLASSI 2010. [DOI: 10.1007/s11634-010-0059-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 12/23/2022]
|
15
|
Polymorphisms in toll-like receptor 4 and toll-like receptor 9 influence viral load in a seroincident cohort of HIV-1-infected individuals. AIDS 2009; 23:2387-95. [PMID: 19855253 DOI: 10.1097/qad.0b013e328330b489] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Indexed: 12/18/2022]
Abstract
OBJECTIVES Toll-like receptors (TLRs) are innate immune sensors that are integral to resisting chronic and opportunistic infections. Mounting evidence implicates TLR polymorphisms in susceptibilities to various infectious diseases, including HIV-1. We investigated the impact of TLR single nucleotide polymorphisms (SNPs) on clinical outcome in a seroincident cohort of HIV-1-infected volunteers. DESIGN We analyzed TLR SNPs in 201 antiretroviral treatment-naive HIV-1-infected volunteers from a longitudinal seroincident cohort with regular follow-up intervals (median follow-up 4.2 years, interquartile range 4.4). Participants were stratified into two groups according to either disease progression, defined as peripheral blood CD4(+) T-cell decline over time, or peak and setpoint viral load. METHODS Haplotype tagging SNPs from TLR2, TLR3, TLR4, and TLR9 were detected by mass array genotyping, and CD4(+) T-cell counts and viral load measurements were determined prior to antiretroviral therapy initiation. The association of TLR haplotypes with viral load and rapid progression was assessed by multivariate regression models using age and sex as covariates. RESULTS Two TLR4 SNPs in strong linkage disequilibrium [1063 A/G (D299G) and 1363 C/T (T399I)] were more frequent among individuals with high peak viral load compared with low/moderate peak viral load (odds ratio 6.65, 95% confidence interval 2.19-20.46, P < 0.001; adjusted P = 0.002 for 1063 A/G). In addition, a TLR9 SNP previously associated with slow progression was found less frequently among individuals with high viral setpoint compared with low/moderate setpoint (odds ratio 0.29, 95% confidence interval 0.13-0.65, P = 0.003, adjusted P = 0.04). CONCLUSION This study suggests a potentially new role for TLR4 polymorphisms in HIV-1 peak viral load and confirms a role for TLR9 polymorphisms in disease progression.
|
16
|
Israel S, Lerer E, Shalev I, Uzefovsky F, Riebold M, Laiba E, Bachner-Melman R, Maril A, Bornstein G, Knafo A, Ebstein RP. The oxytocin receptor (OXTR) contributes to prosocial fund allocations in the dictator game and the social value orientations task. PLoS One 2009; 4:e5535. [PMID: 19461999 PMCID: PMC2680041 DOI: 10.1371/journal.pone.0005535] [Citation(s) in RCA: 168] [Impact Index Per Article: 11.2] [Received: 06/24/2008] [Accepted: 04/10/2009] [Indexed: 11/17/2022]
Abstract
BACKGROUND Economic games observe social decision making in the laboratory that involves real money payoffs. Previously we have shown that allocation of funds in the Dictator Game (DG), a paradigm that illustrates costly altruistic behavior, is partially determined by promoter-region repeat variants in the arginine vasopressin 1a receptor gene (AVPR1a). In the current investigation, the gene encoding the related oxytocin receptor (OXTR) was tested for association with the DG and a related paradigm, the Social Values Orientation (SVO) task. METHODOLOGY/PRINCIPAL FINDINGS Association (101 male and 102 female students) using a robust family-based test between 15 single tagging SNPs (htSNPs) across the OXTR was demonstrated with both the DG and SVO. Three htSNPs across the gene region showed significant association with both games. The most significant association was observed with rs1042778 (p = 0.001). Haplotype analysis also showed significant associations for both DG and SVO. Following permutation test adjustment, significance was observed for 2-5 locus haplotypes (p<0.05). A second sample of 98 female subjects was subsequently and independently recruited to play the dictator game and was genotyped for the three significant SNPs found in the first sample. The rs1042778 SNP was shown to be significant for the second sample as well (p = 0.004, Fisher's exact test). CONCLUSIONS The demonstration that genetic polymorphisms for the OXTR are associated with human prosocial decision making converges with a large body of animal research showing that oxytocin is an important social hormone across vertebrates including Homo sapiens. Individual differences in prosocial behavior have been shown by twin studies to have a substantial genetic basis and the current investigation demonstrates that common variants in the oxytocin receptor gene, an important element of mammalian social circuitry, underlie such individual differences.
Affiliation(s)
- Salomon Israel
- Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Elad Lerer
- Department of Human Genetics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Idan Shalev
- Brain and Behavior Science, The Hebrew University of Jerusalem, Jerusalem, Israel
- Florina Uzefovsky
- Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Mathias Riebold
- Department of Human Genetics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Efrat Laiba
- Department of Human Genetics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Anat Maril
- Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Gary Bornstein
- Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Center for the Study of Rationality and Interactive Decision Theory, Jerusalem, Israel
- Ariel Knafo
- Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Richard P. Ebstein
- Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel
- S. Herzog Memorial Hospital, Jerusalem, Israel
|
17
|
Brunel H, Perera A, Buil A, Sabater-Lleal M, Souto JC, Fontcuberta J, Vallverdu M, Soria JM, Caminal P. SNP sets selection under mutual information criterion, application to F7/FVII dataset. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2009; 2008:3783-6. [PMID: 19163535 DOI: 10.1109/iembs.2008.4650032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
One of the main goals of human genetics is to find genetic markers related to complex diseases. In the blood coagulation process, it is known that genetic variability in the F7 gene is chiefly responsible for observed variations in FVII levels in blood. In this work, we propose a method for selecting sets of Single Nucleotide Polymorphisms (SNPs) significantly correlated with a phenotype (FVII levels). This method employs a feature selection algorithm (a variant of Sequential Forward Selection, SFS) based on a criterion of statistical significance of a mutual information functional. This algorithm is applied to a sample of independent individuals from the GAIT project. The main SNPs found by the algorithm correspond with previous results published using family-based techniques.
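As a rough illustration of the mutual information criterion described in this abstract, the sketch below scores a single SNP against a phenotype that has already been discretized into classes. It is a minimal plug-in estimator and not the authors' algorithm: the function name `mutual_information`, the discretization, and the toy data are assumptions, and the significance test and Sequential Forward Selection steps are omitted.

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Map arbitrary labels to 0..k-1 so they can index a contingency table.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)          # count co-occurrences
    joint /= joint.sum()                   # joint probabilities
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (a, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, b)
    nz = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

# Hypothetical toy data: genotype classes and a two-level phenotype.
genotype = [0, 0, 1, 1]
phenotype_same = [0, 0, 1, 1]    # fully determined by genotype
phenotype_indep = [0, 1, 0, 1]   # unrelated to genotype
print(mutual_information(genotype, phenotype_same))
print(mutual_information(genotype, phenotype_indep))
```

In a forward-selection setting, a score like this would be recomputed for each candidate SNP set at every step; how significance is assessed is left open here.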
Affiliation(s)
- H Brunel
- Institut de Bioenginyeria de Catalunya, Centre de Recerca en Enginyeria Biomèdica, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, Pau Gargallo 5, Barcelona, Spain.
|
18
|
|
19
|
Snagger: a user-friendly program for incorporating additional information for tagSNP selection. BMC Bioinformatics 2008; 9:174. [PMID: 18371222 PMCID: PMC2375134 DOI: 10.1186/1471-2105-9-174] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 11/27/2007] [Accepted: 03/27/2008] [Indexed: 11/10/2022]
Abstract
Background There has been considerable effort focused on developing efficient programs for tagging single-nucleotide polymorphisms (SNPs). Many of these programs do not account for potential reduced genomic coverage resulting from genotyping failures, nor do they preferentially select SNPs based on functionality, which may be more likely to be biologically important. Results We have developed a user-friendly and efficient software program, Snagger, as an extension to the existing open-source software, Haploview, which uses pairwise r² linkage disequilibrium between SNPs to select tagSNPs. Snagger distinguishes itself from existing SNP selection algorithms, including Tagger, by providing user options that allow for: (1) prioritization of tagSNPs based on certain characteristics, including platform-specific design scores, functionality (i.e., coding status), and chromosomal position, (2) efficient selection of SNPs across multiple populations, (3) selection of tagSNPs outside defined genomic regions to improve coverage and genotyping success, and (4) picking of surrogate tagSNPs that serve as backups for tagSNPs whose failure would result in a significant loss of data. Using HapMap genotype data from ten ENCODE regions and design scores for the Illumina platform, we show similar coverage and design score distribution and fewer total tagSNPs selected by Snagger compared to the web server Tagger. Conclusion Snagger improves upon currently available tagSNP software packages by providing a means for researchers to select tagSNPs that reliably capture genetic variation across multiple populations while accounting for significant genotyping failure risk and prioritizing on SNP-specific characteristics.
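To make the pairwise r² idea concrete, here is a minimal greedy tagging sketch. It is not Snagger's or Tagger's implementation: the function names, the 0.8 threshold, and the toy genotype matrix are assumptions, and none of the prioritization, multi-population, or surrogate-tag options described above are modeled.

```python
import numpy as np

def pairwise_r2(genotypes):
    """Squared Pearson correlation between SNP columns (samples x SNPs)."""
    return np.corrcoef(genotypes, rowvar=False) ** 2

def greedy_tags(r2, threshold=0.8):
    """Greedy cover: repeatedly pick the SNP that captures the most untagged SNPs."""
    captures = r2 >= threshold          # captures[i, j]: SNP i tags SNP j
    np.fill_diagonal(captures, True)    # every SNP tags itself
    untagged = np.ones(r2.shape[0], dtype=bool)
    tags = []
    while untagged.any():
        counts = (captures & untagged).sum(axis=1)
        best = int(np.argmax(counts))   # ties go to the lowest index
        tags.append(best)
        untagged &= ~captures[best]
    return tags

# Hypothetical toy data: 6 individuals x 4 SNPs, genotypes coded 0/1/2.
# SNP 1 duplicates SNP 0, SNP 3 is its mirror image, SNP 2 is uncorrelated.
G = np.array([
    [0, 0, 0, 2],
    [1, 1, 0, 1],
    [2, 2, 0, 0],
    [0, 0, 2, 2],
    [1, 1, 2, 1],
    [2, 2, 2, 0],
])
print(greedy_tags(pairwise_r2(G)))  # [0, 2]
```

Using genotype correlation rather than phased haplotype r² is a simplification chosen to keep the sketch short.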
|
20
|
|
21
|
Windelinckx A, Vlietinck R, Aerssens J, Beunen G, Thomis MAI. Selection of genes and single nucleotide polymorphisms for fine mapping starting from a broad linkage region. Twin Res Hum Genet 2008; 10:871-85. [PMID: 18179400 DOI: 10.1375/twin.10.6.871] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022]
Abstract
Fine mapping of linkage peaks is one of the great challenges facing researchers who try to identify genes and genetic variants responsible for the variation in a certain trait or complex disease. Once the trait is linked to a certain chromosomal region, most studies use a candidate gene approach followed by a selection of polymorphisms within these genes, based either on their likelihood of being functional or on the linkage disequilibrium between adjacent markers. For both candidate gene selection and SNP selection, several approaches have been described, and different software tools are available. However, mastering all these information sources and choosing between the different approaches can be difficult and time-consuming. Therefore, this article lists several of these in silico procedures, and the authors describe an empirical two-step fine mapping approach, in which candidate genes are prioritized using a bioinformatics approach (ENDEAVOUR), and the top genes are chosen for further SNP selection with a linkage disequilibrium based method (Tagger). The authors present the different actions that were applied within this approach on two previously identified linkage regions for muscle strength. This resulted in the selection of 331 polymorphisms located in 112 different candidate genes out of an initial set of 23,300 SNPs.
Affiliation(s)
- An Windelinckx
- Research Center for Exercise and Health, Department of Biomedical Kinesiology, Faculty of Kinesiology and Rehabilitation Sciences, Katholieke Universiteit Leuven, Leuven, Belgium
|
22
|
Sabbagh A, Génin E, Darlu P. Selecting Predictive Markers for Pharmacogenetic Traits: Tagging vs. Data-Mining Approaches. Hum Hered 2008; 66:10-8. [DOI: 10.1159/000114161] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/26/2006] [Accepted: 08/16/2007] [Indexed: 11/19/2022]
|
23
|
Nannya Y, Taura K, Kurokawa M, Chiba S, Ogawa S. Evaluation of genome-wide power of genetic association studies based on empirical data from the HapMap project. Hum Mol Genet 2007; 16:2494-505. [PMID: 17666406 DOI: 10.1093/hmg/ddm205] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 12/16/2022]
Abstract
With recent advances in high-throughput single nucleotide polymorphism (SNP) typing technologies, genome-wide association studies have become a realistic approach to identify the causative genes that are responsible for common diseases of complex genetic traits. In this strategy, the trade-off between increased genome coverage and the chance of finding SNPs that incidentally show a large test statistic becomes serious due to extreme multiple-hypothesis testing. We investigated the extent to which this trade-off limits the genome-wide power of this approach by simulating a large number of case-control panels based on the empirical data from the HapMap Project. In our simulations, statistical costs of multiple hypothesis testing were evaluated by empirically calculating distributions of the maximum value of the χ² statistic for a series of marker sets having increasing numbers of SNPs, which were used to determine a genome-wide threshold in the following power simulations. With a practical study size, the cost of multiple testing largely offsets the potential benefits from increased genome coverage given modest genetic effects and/or low frequencies of causal alleles. In most realistic scenarios, increasing genome coverage becomes less influential on the power, while sample size is the predominant determinant of the feasibility of genome-wide association tests. Increasing genome coverage without a corresponding increase in sample size will only consume resources with little gain in power. For common causal alleles with relatively large effect sizes (genotype relative risk ≥ 1.7), we can expect satisfactory power with currently available large-scale genotyping platforms using a realistic sample size (approximately 1000 per arm).
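The idea of deriving a genome-wide threshold from the distribution of the maximum χ² statistic can be sketched as follows. This is a simplified illustration, not the study's procedure: it assumes independent markers with 1-degree-of-freedom null statistics instead of case-control panels resampled from HapMap data, and the function name, replicate count, and seed are hypothetical.

```python
import numpy as np

def max_chi2_threshold(n_markers, n_reps=2000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum null chi-square statistic.

    Assumes independent markers; linkage disequilibrium between real SNPs
    would make the effective number of tests smaller than n_markers.
    """
    rng = np.random.default_rng(seed)
    # One row per simulated null study, one column per marker.
    stats = rng.chisquare(df=1, size=(n_reps, n_markers))
    return float(np.quantile(stats.max(axis=1), 1 - alpha))

# The threshold rises as the marker set grows, which is the multiple-testing
# cost that has to be weighed against the gain in coverage.
for m in (1, 10, 100, 1000):
    print(m, round(max_chi2_threshold(m), 2))
```

With one marker the threshold should sit near the familiar single-test value of about 3.84; larger marker sets push it upward.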
Affiliation(s)
- Yasuhito Nannya
- Department of Hematology/Oncology, Graduate School of Medicine, University of Tokyo, Tokyo 113-8655, Japan
|
24
|
Angius A, Hyland FCL, Persico I, Pirastu N, Woodage T, Pirastu M, De la Vega FM. Patterns of linkage disequilibrium between SNPs in a Sardinian population isolate and the selection of markers for association studies. Hum Hered 2007; 65:9-22. [PMID: 17652959 DOI: 10.1159/000106058] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 09/27/2006] [Accepted: 04/30/2007] [Indexed: 11/19/2022]
Abstract
OBJECTIVE In isolated populations, 'background' linkage disequilibrium (LD) has been shown to extend over large genetic distances. This and their reduced environmental and genetic heterogeneity have stimulated interest in their potential for association mapping. We compared LD unit map distances with pair-wise measurements of LD in a dense single nucleotide polymorphism (SNP) set. METHODS We genotyped 771 SNPs in an 8 Mb segment of chromosome 22 on 101 individuals from the isolated village of Talana, Sardinia, and compared them with outbred European populations. RESULTS Heterozygosity was remarkably similar in both populations. In contrast, the extent of LD observed was quite different. The decay of LD with distance is slower in the isolate. The differences in LD map lengths suggest that useful LD extends up to three times farther in the Sardinian population; smaller differences are seen with pairwise LD metrics. While LD map length slightly decreases with average relatedness, cryptic relatedness does not explain the decrease in LD map length. Haplotypes, block boundaries, and patterns of LD are similar in both populations, suggesting a shared distribution of recombination hotspots. CONCLUSIONS About 15% fewer haplotype tagging SNPs need to be genotyped in the isolate, and possibly 70% fewer if selecting SNPs evenly spaced on the metric LD map.
|
25
|
Axelsson J, Devuyst O, Nordfors L, Heimbürger O, Stenvinkel P, Lindholm B. Place of genotyping and phenotyping in understanding and potentially modifying outcomes in peritoneal dialysis patients. Kidney Int 2007:S138-45. [PMID: 17080106 DOI: 10.1038/sj.ki.5001931] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 01/28/2023]
Abstract
With the landmark publication of the human genome sequence and its subsequent division into haplotype blocks, the characterization of genetic variations is becoming a feasible approach to study both the pathophysiology and risk factors of complex traits. A number of strategies are available today for identifying candidate genes or polymorphisms associated with pertinent phenotypes. For Mendelian diseases with high penetrance owing to mutations in a single gene, such as polycystic kidney disease, linkage studies have been very successful in mapping the disease loci owing to the availability of families with multiple affected members. In contrast to monogenic conditions, complex diseases such as end-stage renal disease (ESRD) and complex traits such as individual variations in membrane transport and complications during the course of peritoneal dialysis (PD) therapy have a number of competing determinants and inhibitors, both genetic and environmental. Current results reflect this complexity, with few studies showing a large effect of any single risk factor on survival or outcome on PD. However, these studies have so far been small (less than 500 patients) and have not utilized bioinformatics or novel technologies (e.g., multiplex genotyping equipment). In the following review, we outline current approaches for using genetic data in clinical studies as well as highlight some of the most promising results in ESRD patients, particularly those on PD.
Affiliation(s)
- J Axelsson
- Division of Renal Medicine, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
|
26
|
Moskvina V, O'Donovan MC. Detailed analysis of the relative power of direct and indirect association studies and the implications for their interpretation. Hum Hered 2007; 64:63-73. [PMID: 17483598 DOI: 10.1159/000101424] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 11/19/2022]
Abstract
OBJECTIVES Genetic association studies are usually based upon restricted sets of 'tag' markers selected to represent the total sequence variation. Tag selection is often determined by some threshold for the r² coefficients of linkage disequilibrium (LD) between tag and untyped markers, it being widely assumed that power to detect an effect at the untyped sites is retained by typing the tag marker in a sample scaled by the inverse of the selected threshold (1/r²). However, unless only a single causal variant occurs at a locus, it has been shown [Eur J Hum Genet 2006;14:426-437] that significant power loss can occur if this principle is applied. We sought to investigate whether unexpected loss of power might be an exceptional case or a more general concern. In the absence of detailed knowledge about the genetic architecture at complex disease loci, we developed a mathematical approach to test all possible situations. METHODS We derived mathematical formulae allowing the calculation of all possible odds ratios (OR) at a tag marker locus given the effect size that would be observed by typing a second locus and the r² between the two loci. For a range of allele frequencies, r² between loci, and strengths of association at the causal locus (OR from 0.5 to 2) that we consider realistic for complex disease loci, we next determined the sample sizes that would be necessary to give equivalent power to detect association by genotyping tag and causal loci and compared these with the sample sizes predicted by applying 1/r². RESULTS Under most of the hypothetical scenarios we examined, the calculated sample sizes required to maintain power by typing markers that tag the causal locus at even moderately high r² (0.8) were greater than those calculated by applying 1/r². Even in populations with apparently similar measurements of allele frequency, LD structure, and effect size at the susceptibility allele, the required sample size to detect association with a tag marker can vary substantially. We also show that in apparently similar populations, associations to either allele at the tag site are possible. CONCLUSIONS Indirect tests of association have less power than predicted by applying 1/r² in the majority of hypothetical scenarios we examined. Our findings pertain even for what we consider likely to be larger than average effect sizes in complex diseases (OR = 1.5-2) and even for moderately high r² values between the markers. Until a substantial number of disease genes have been identified through methods that are not based on tagging, and therefore biased towards those situations most favourable to tagging, it is impossible to know how the true scenarios are distributed across the range of possible scenarios. Nevertheless, while association designs based upon tag marker selection by necessity are the tool of choice for de novo gene discovery, our data suggest power to initially detect association may often be less than assumed. Moreover, our data suggest that to avoid genuine findings being subsequently discarded by unpredictable losses of power, follow up studies in other samples should be based upon more detailed analyses of the gene rather than simply on the tag SNPs showing association in the discovery study.
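For reference, the nominal 1/r² scaling that this abstract evaluates can be written as a one-line calculation. The sketch below shows only that conventional rule, which the authors report is often too optimistic; it does not reproduce their odds-ratio formulae, and the function name and example numbers are assumptions.

```python
import math

def tag_sample_size(n_causal, r2):
    """Sample size suggested by the 1/r^2 rule when typing a tag marker.

    n_causal: sample size that would suffice if the causal locus were typed.
    r2: squared LD correlation between tag and causal locus, in (0, 1].
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return math.ceil(n_causal / r2)

# Hypothetical example: 1000 samples suffice at the causal locus.
for r2 in (1.0, 0.8, 0.5):
    print(r2, tag_sample_size(1000, r2))
```

At r² = 0.8 the rule inflates 1000 samples to 1000/0.8 = 1250; the abstract's point is that the sample size actually required was greater than this figure in most scenarios examined.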
Affiliation(s)
- V Moskvina
- Department of Psychological Medicine, Wales College of Medicine, Cardiff University, Cardiff, UK.
|
27
|
Abstract
Many genetic analyses are done with incomplete information; for example, unknown phase in haplotype-based association studies. Measures of the amount of available information can be used for efficient planning of studies and/or analyses. In particular, the linkage disequilibrium (LD) between two sets of markers can be interpreted as the amount of information one set of markers contains for testing allele frequency differences in the second set, and measuring LD can be viewed as quantifying information in a missing data problem. We introduce a framework for measuring the association between two sets of variables; for example, genotype data for two distinct groups of markers, or haplotype and genotype data for a given set of polymorphisms. The goal is to quantify how much information is in one data set, e.g. genotype data for a set of SNPs, for estimating parameters that are functions of frequencies in the second data set, e.g. haplotype frequencies, relative to the ideal case of actually observing the complete data, e.g. haplotypes. In the case of genotype data on two mutually exclusive sets of markers, the measure determines the amount of multi-locus LD, and is equal to the classical measure r² if each set consists of one bi-allelic marker. In general, the measures are interpreted as the asymptotic ratio of sample sizes necessary to achieve the same power in case-control testing. The focus of this paper is on case-control allele/haplotype tests, but the framework can be extended easily to other settings like regressing quantitative traits on allele/haplotype counts, or tests on genotypes or diplotypes. We highlight applications of the approach, including tools for navigating the HapMap database [The International HapMap Consortium, 2003], and genotyping strategies for positional cloning studies.
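The classical two-marker r² that the multi-locus measure reduces to can be computed directly from haplotype and allele frequencies. The sketch below covers only that special case, using the standard definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B; the function name and example frequencies are illustrative assumptions, and the paper's general framework is not implemented.

```python
def r_squared(p_ab, p_a, p_b):
    """Classical r^2 between two bi-allelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1).
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical frequencies.
print(r_squared(0.5, 0.5, 0.5))   # alleles always co-occur: complete LD
print(r_squared(0.25, 0.5, 0.5))  # haplotype frequency equals product: no LD
```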
Affiliation(s)
- Dan L Nicolae
- Departments of Medicine and Statistics, The University of Chicago, Chicago, Illinois 60637, USA.
|
28
|
Chi PB, Duggal P, Kao WHL, Mathias RA, Grant AV, Stockton ML, Garcia JGN, Ingersoll RG, Scott AF, Beaty TH, Barnes KC, Fallin MD. Comparison of SNP tagging methods using empirical data: association study of 713 SNPs on chromosome 12q14.3-12q24.21 for asthma and total serum IgE in an African Caribbean population. Genet Epidemiol 2007; 30:609-19. [PMID: 16830339 DOI: 10.1002/gepi.20172] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Indexed: 11/07/2022]
Abstract
Few comparison studies have been performed on single nucleotide polymorphism (SNP) tagging methods to examine their consistency and effectiveness in terms of inferences about association with disease. We applied several SNP tagging methods to SNPs on chromosome 12q (n=713) and compared the utility of these methods to detect association for asthma and serum IgE levels among a sample of African Caribbean families from Barbados selected through asthmatic probands. We found that a high level of information regarding association is retained in Clayton's htSNP, Stram's TagSNP, and de Bakker's Tagger. We also found a high degree of consistency between TagSNP and Tagger. Using this set of 713 SNPs on chromosome 12q, our study provides insight towards analytic strategies for future studies of complex traits.
Affiliation(s)
- Peter B Chi
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
|
29
|
De La Vega FM. Selecting single-nucleotide polymorphisms for association studies with SNPbrowser software. Methods Mol Biol 2007; 376:177-93. [PMID: 17984546 DOI: 10.1007/978-1-59745-389-9_13] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 01/25/2023]
Abstract
The design of genetic association studies using single-nucleotide polymorphisms (SNPs) requires the selection of subsets of the variants providing high statistical power at a reasonable cost. SNPs must be selected to maximize the probability that a causative mutation is in linkage disequilibrium (LD) with at least one marker genotyped in the study. The HapMap Project performed a genome-wide survey of genetic variation with over 3 million SNPs typed in four populations, providing a rich resource to inform the design of association studies. A number of strategies have been proposed for the selection of SNPs based on observed LD, including construction of metric LD maps and the selection of haplotype-tagging SNPs. Power calculations are important at the study design stage to ensure successful results. Integrating these methods and annotations can be challenging: the algorithms required to implement these methods are complex to deploy, and all the necessary data and annotations are deposited in disparate databases. Here, we review the typical workflows for the selection of markers for association studies utilizing the SNPbrowser software, a freely available, stand-alone application that incorporates the HapMap database together with gene and SNP annotations. Selected SNPs are screened for their conversion potential to genotyping platforms, expediting the set up of genetic studies with an increased probability of success.
|
30
|
Thomson PA, Christoforou A, Morris SW, Adie E, Pickard BS, Porteous DJ, Muir WJ, Blackwood DHR, Evans KL. Association of Neuregulin 1 with schizophrenia and bipolar disorder in a second cohort from the Scottish population. Mol Psychiatry 2007; 12:94-104. [PMID: 16940976 DOI: 10.1038/sj.mp.4001889] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.6] [Indexed: 02/05/2023]
Abstract
Neuregulin 1 (NRG1) is a strong candidate for involvement in the aetiology of schizophrenia. A haplotype, initially identified as showing association in the Icelandic and Scottish populations, has shown a consistent effect size in multiple European populations. Additionally, NRG1 has been implicated in susceptibility to bipolar disorder. In this first study to select markers systematically on the basis of linkage disequilibrium across the entire NRG1 gene, we used haplotype-tagging single-nucleotide polymorphisms to identify single markers and haplotypes associated with schizophrenia and bipolar disorder in an independently ascertained Scottish population. Haplotypes in two regions met an experiment-wide significance threshold of P=0.0016 (Nyholt's SpD) and were permuted to correct for multiple testing. Region A overlaps with the Icelandic haplotype and shows nominal association with schizophrenia (P=0.00032), bipolar disorder (P=0.0011), and the combined case group (P=0.0017). This region includes the 5' exon of the NRG1 GGF2 isoform and overlaps the expressed sequence tag (EST) cluster Hs.97362. However, no haplotype in Region A remains significant after permutation analysis (P>0.05). Region B contains a haplotype associated with both schizophrenia (P=0.00014) and the combined case group (P=0.000062), although it does not meet Nyholt's threshold in bipolar disorder alone (P=0.0022). This haplotype remained significant after permutation analysis in both the schizophrenia and combined case groups (P=0.024 and P=0.016, respectively). It spans an approximately 136 kb region that includes the coding sequence of the sensory and motor neuron derived factor (SMDF) isoform and 3' exons of all other known NRG1 isoforms. Our study identifies a new region of NRG1 involved in schizophrenia and bipolar disorder in the Scottish population.
Affiliation(s)
- P A Thomson
- Department of Medical Sciences, Medical Genetics Section, Molecular Medicine Centre, University of Edinburgh, Western General Hospital, Edinburgh, UK.
|
31
|
Ding K, Kullo IJ. Methods for the selection of tagging SNPs: a comparison of tagging efficiency and performance. Eur J Hum Genet 2006; 15:228-36. [PMID: 17164795 DOI: 10.1038/sj.ejhg.5201755] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/08/2022]
Abstract
There is great interest in the use of tagging single nucleotide polymorphisms (tSNPs) to facilitate association studies of complex diseases. This is based on the premise that a minimum set of tSNPs may be sufficient to capture most of the variation in certain regions of the human genome. Several methods have been described to select tSNPs, either based on haplotype-block structure or independent of the underlying block structure. In this paper, we compare eight methods for choosing tSNPs in 10 representative resequenced candidate genes (a total of 194.2 kb) with different levels of linkage disequilibrium (LD) in a sample of European-Americans. We compared tagging efficiency (TE) and prediction accuracy of tSNPs identified by these methods, as a function of several factors, including LD level, minor allele frequency, and tagging criteria. We also assessed tagging consistency between methods. We found that tSNPs selected based on the methods Haplotype Diversity and Haplotype r² provided the highest TE, whereas the prediction accuracy was comparable among different methods. Tagging consistency between different methods of tSNP selection was poor. This work demonstrates that when tSNP-based association studies are undertaken, the choice of method for selecting tSNPs requires careful consideration.
Affiliation(s)
- Keyue Ding
- Division of Cardiovascular Diseases, Mayo Clinic and Foundation, Rochester, MN 55905, USA
|
32
|
Paschou P, Mahoney MW, Javed A, Kidd JR, Pakstis AJ, Gu S, Kidd KK, Drineas P. Intra- and interpopulation genotype reconstruction from tagging SNPs. Genome Res 2006; 17:96-107. [PMID: 17151345 PMCID: PMC1716273 DOI: 10.1101/gr.5741407] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/24/2022]
Abstract
The optimal method to be used for tSNP selection, the applicability of a reference LD map to unassayed populations, and the scalability of these methods to genome-wide analysis, all remain subjects of debate. We propose novel, scalable matrix algorithms that address these issues and we evaluate them on genotypic data from 38 populations and four genomic regions (248 SNPs typed for approximately 2000 individuals). We also evaluate these algorithms on a second data set consisting of genotypes available from the HapMap database (1336 SNPs for four populations) over the same genomic regions. Furthermore, we test these methods in the setting of a real association study using a publicly available family data set. The algorithms we use for tSNP selection and unassayed SNP reconstruction do not require haplotype inference and they are, in principle, scalable even to genome-wide analysis. Moreover, they are greedy variants of recently developed matrix algorithms with provable performance guarantees. Using a small set of carefully selected tSNPs, we achieve very good reconstruction accuracy of "untyped" genotypes for most of the populations studied. Additionally, we demonstrate in a quantitative manner that the chosen tSNPs exhibit substantial transferability, both within and across different geographic regions. Finally, we show that reconstruction can be applied to retrieve significant SNP associations with disease, with important genotyping savings.
Affiliation(s)
- Peristera Paschou
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06511, USA.
|
33
|
Sham PC, Ao SI, Kwan JSH, Kao P, Cheung F, Fong PY, Ng MK. Combining functional and linkage disequilibrium information in the selection of tag SNPs. Bioinformatics 2006; 23:129-31. [PMID: 17060359 DOI: 10.1093/bioinformatics/btl532] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022] Open
Abstract
We have developed an online program, WCLUSTAG, for tag SNP selection that allows the user to specify variable tagging thresholds for different SNPs. Tag SNPs are selected such that a SNP with user-specified tagging threshold C will have a minimum R² of C with at least one tag SNP. This flexible feature is useful for researchers who wish to prioritize genomic regions or SNPs in an association study. AVAILABILITY: The online WCLUSTAG program is available at http://bioinfo.hku.hk/wclustag/
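The selection criterion described in this abstract can be illustrated with a small sketch. It is not the WCLUSTAG implementation: the function name `select_tags`, the precomputed `r2` matrix, and the greedy set-cover strategy are all assumptions chosen for illustration. The only property taken from the abstract is that each SNP with threshold C ends up with R² of at least C to some tag.

```python
def select_tags(r2, thresholds):
    """Greedy tag selection with a separate threshold per SNP (illustrative only).

    r2[i][j]      -- pairwise R^2 between SNPs i and j, with r2[i][i] == 1.0
    thresholds[i] -- minimum R^2 (at most 1.0) that SNP i must reach with some tag
    """
    n = len(r2)
    uncovered = set(range(n))
    tags = []
    while uncovered:
        # Choose the SNP that covers the most still-uncovered SNPs;
        # ties go to the lowest index so the result is deterministic.
        best = max(
            range(n),
            key=lambda t: (sum(r2[t][s] >= thresholds[s] for s in uncovered), -t),
        )
        tags.append(best)
        uncovered -= {s for s in uncovered if r2[best][s] >= thresholds[s]}
    return sorted(tags)


# Hypothetical three-SNP example.
r2 = [
    [1.0, 0.9, 0.3],
    [0.9, 1.0, 0.6],
    [0.3, 0.6, 1.0],
]
print(select_tags(r2, [0.8, 0.8, 0.5]))  # SNP 2 is given a relaxed threshold
print(select_tags(r2, [0.8, 0.8, 0.8]))  # uniform threshold
```

Relaxing the threshold for SNP 2 lets a single tag cover all three SNPs, whereas the uniform threshold needs two tags, which is the kind of prioritisation the abstract describes.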
Affiliation(s)
- P C Sham
- Department of Psychiatry, Institute of Psychiatry, King's College London, UK
|
34
|
Moskvina V, Schmidt KM. Individual SNP allele reconstruction from informative markers selected by a non-linear Gauss-type algorithm. Hum Hered 2006; 62:97-106. [PMID: 17047339 DOI: 10.1159/000096097] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/19/2006] [Accepted: 08/02/2006] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVES: In view of the linkage disequilibrium structure of the genome, the selection of maximally informative SNP markers is a fundamental issue in the design of association studies. Currently used selection methods rely on pairwise marker correlation or informativity measures for subsets of markers. Nevertheless, the selected markers do not provide a completely satisfactory description of the individual remaining markers. The number of tag markers can be further reduced by using haplotypic information, but then the results of association analysis are difficult to interpret. METHODS AND RESULTS: We propose a non-linear Gauss-type algorithm selecting a subset of markers which is optimal with respect to the informativity measures and allows an explicit reconstruction of all other known markers, thus permitting direct inference of allelic association. The selection is based on the haplotype distribution in the population, but can be adapted to work with unphased genotypes directly. CONCLUSIONS: The proposed algorithm provides a rational methodology of informative marker selection, allowing for control and optimisation of information content and full marker reconstruction. Moreover, the reconstruction step can also be applied to tag markers selected using a different method at the stage of study design, identifying those markers which cannot be uniquely recovered from the chosen tags.
Affiliation(s)
- Valentina Moskvina
- Department of Psychological Medicine, College of Medicine, Cardiff University, Cardiff, UK.
|
35
|
Abstract
The goal of case-control association studies is to find genetic variants in the human genome that influence common traits. The Human Genome and HapMap projects have added fresh impetus to this goal by cataloguing the raw genetic data behind human DNA variation. Studies that associate these genetic variants with phenotype improve both molecular diagnostics and drug discovery and offer clinicians important opportunities to improve care of patients. In this review I focus on case-control studies, which are the most widely used design and expected to be the most powerful. I also address the problem of case-control non-replication, which is widespread despite enormous effort and use of resources. Important causes of non-replication include inadequate statistical power to detect small and moderate effects, phenotype heterogeneity, population stratification, publication bias, and multiple comparison testing.
Affiliation(s)
- Daniel G Healy
- Institute of Neurology, Queen Square hospital, Lambert palace road, London, UK.
|
36
|
Nicolas P, Sun F, Li LM. A model-based approach to selection of tag SNPs. BMC Bioinformatics 2006; 7:303. [PMID: 16776821 PMCID: PMC1525207 DOI: 10.1186/1471-2105-7-303] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 01/13/2006] [Accepted: 06/15/2006] [Indexed: 11/23/2022] Open
Abstract
Background: Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphisms found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. According to Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to constraints on the number of SNPs. This approach requires an appropriate probabilistic model. Compared to simple measures of Linkage Disequilibrium (LD), a good model of haplotype sequences can more accurately account for LD structure. It also provides machinery for predicting tagged SNPs and thereby for assessing the performance of tag sets through their ability to predict larger SNP sets. Results: Here, we compute the description code-lengths of SNP data for an array of models and we develop tag SNP selection methods based on these models and the strategy of entropy maximization. Using data sets from the HapMap and ENCODE projects, we show that the hidden Markov model introduced by Li and Stephens outperforms the other models in several aspects: description code-length of SNP data, information content of tag sets, and prediction of tagged SNPs. This is the first use of this model in the context of tag SNP selection. Conclusion: Our study provides strong evidence that the tag sets selected by our best method, based on the Li and Stephens model, outperform those chosen by several existing methods. The results also suggest that information content evaluated with a good model is more sensitive for assessing the quality of a tagging set than the correct prediction rate of tagged SNPs. In addition, we show that haplotype phase uncertainty has an almost negligible impact on the ability of good tag sets to predict tagged SNPs. This justifies the selection of tag SNPs on the basis of haplotype informativeness, although genotyping studies do not directly assess haplotypes. Software that implements our approach is available.
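The entropy-maximization idea in this abstract can be made concrete with a toy sketch. It deliberately swaps the paper's probabilistic models (such as the Li and Stephens hidden Markov model) for plain empirical haplotype frequencies, and it searches exhaustively, so it only suits very small examples. The function names `tag_entropy` and `best_tag_set` and the sample haplotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def tag_entropy(haplotypes, tags):
    """Empirical Shannon entropy (in bits) of the haplotypes restricted to `tags`."""
    counts = Counter(tuple(h[i] for i in tags) for h in haplotypes)
    n = len(haplotypes)
    return -sum(c / n * log2(c / n) for c in counts.values())


def best_tag_set(haplotypes, k):
    """Exhaustively pick the k SNP positions whose joint empirical entropy is largest.

    Assumption: empirical frequencies stand in for a fitted haplotype model.
    """
    n_snps = len(haplotypes[0])
    return max(
        combinations(range(n_snps), k),
        key=lambda tags: tag_entropy(haplotypes, tags),
    )


# Hypothetical phased haplotypes: SNPs 0 and 1 are redundant, as are SNPs 2 and 3.
haplotypes = ["0000", "0011", "1100", "1111"]
print(tag_entropy(haplotypes, (0, 1)))  # redundant pair carries 1 bit
print(best_tag_set(haplotypes, 2))      # one SNP from each redundant pair
```

In this example a pair of redundant SNPs carries one bit, while one SNP from each redundant pair carries two bits, so the entropy criterion prefers the non-redundant pair.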
Affiliation(s)
- Pierre Nicolas
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, USA
- Mathématique, Informatique et Génome, INRA, Jouy-en-Josas, France
- Fengzhu Sun
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, USA
- Lei M Li
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, USA
- Department of Mathematics, University of Southern California, Los Angeles, USA
|
37
|
Howie BN, Carlson CS, Rieder MJ, Nickerson DA. Efficient selection of tagging single-nucleotide polymorphisms in multiple populations. Hum Genet 2006; 120:58-68. [PMID: 16680432 DOI: 10.1007/s00439-006-0182-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Received: 02/20/2006] [Accepted: 03/30/2006] [Indexed: 10/24/2022]
Abstract
Common genetic polymorphism may explain a portion of the heritable risk for common diseases, so considerable effort has been devoted to finding and typing common single-nucleotide polymorphisms (SNPs) in the human genome. Many SNPs show correlated genotypes, or linkage disequilibrium (LD), suggesting that only a subset of all SNPs (known as tagging SNPs, or tagSNPs) need to be genotyped for disease association studies. Based on the genetic differences that exist among human populations, most tagSNP sets are defined in a single population and applied only in populations that are closely related. To improve the efficiency of multi-population analyses, we have developed an algorithm called MultiPop-TagSelect that finds a near-minimal union of population-specific tagSNP sets across an arbitrary number of populations. We present this approach as an extension of LD-select, a tagSNP selection method that uses a greedy algorithm to group SNPs into bins based on their pairwise association patterns, although the MultiPop-TagSelect algorithm could be used with any SNP tagging approach that allows choices between nearly equivalent SNPs. We evaluate the algorithm by considering tagSNP selection in candidate-gene resequencing data and lower density whole-chromosome data. Our analysis reveals that an exhaustive search is often intractable, while the developed algorithm can quickly and reliably find near-optimal solutions even for difficult tagSNP selection problems. Using populations of African, Asian, and European ancestry, we also show that an optimal multi-population set of tagSNPs can be substantially smaller (up to 44%) than a typical set obtained through independent or sequential selection.
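The greedy binning step that the abstract attributes to LD-select can be sketched as follows for a single population. This is a simplified illustration, not the published LD-select or MultiPop-TagSelect code: the seed SNP of each bin is reported as its tag, the multi-population union step is omitted, and the names `r_squared`, `greedy_bins`, the 0/1/2 genotype coding, and the 0.8 default threshold are assumptions.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as no LD
    return sxy * sxy / (sxx * syy)


def greedy_bins(genotypes, threshold=0.8):
    """Greedily group SNPs into bins around the SNP with the most LD partners.

    genotypes -- dict mapping SNP id to genotypes for the same ordered individuals
    Returns a list of (seed_snp, sorted_bin_members) in the order bins were formed.
    """
    remaining = set(genotypes)
    bins = []
    while remaining:
        partners = {
            s: {
                t for t in remaining
                if t == s or r_squared(genotypes[s], genotypes[t]) >= threshold
            }
            for s in remaining
        }
        # Seed the next bin with the SNP that has the most partners
        # (the alphabetically first id wins ties).
        seed = max(sorted(remaining), key=lambda s: len(partners[s]))
        bins.append((seed, sorted(partners[seed])))
        remaining -= partners[seed]
    return bins


# Hypothetical genotypes; rs2 is rs1 with the counted allele flipped.
genotypes = {
    "rs1": [0, 1, 2, 0, 1, 2],
    "rs2": [2, 1, 0, 2, 1, 0],
    "rs3": [0, 0, 1, 1, 2, 2],
}
print(greedy_bins(genotypes))
```

Because r² is unaffected by which allele is counted, rs1 and rs2 fall into the same bin, while rs3 forms a bin of its own.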
Affiliation(s)
- Bryan N Howie
- Department of Genome Sciences, University of Washington, Box 357730, Seattle, WA 98195, USA
|
38
|
Lawrence R, Evans DM, Morris AP, Ke X, Hunt S, Paolucci M, Ragoussis J, Deloukas P, Bentley D, Cardon LR. Genetically indistinguishable SNPs and their influence on inferring the location of disease-associated variants. Genome Res 2006; 15:1503-10. [PMID: 16251460 PMCID: PMC1310638 DOI: 10.1101/gr.4217605] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 01/25/2023]
Abstract
As part of a recent high-density linkage disequilibrium (LD) study of chromosome 20, we obtained genotypes for approximately 30,000 SNPs at a density of 1 SNP/2 kb on four different population samples (47 CEPH founders; 91 UK unrelateds [unrelated white individuals of western European ancestry]; 97 African Americans; 42 East Asians). We observed that approximately 50% of SNPs had at least one genetically indistinguishable partner; i.e., for every individual considered, their genotype at the first locus was identical to their genotype at the second locus, or in LD terms, the SNPs were in "perfect" LD (r² = 1.0). These "genetically indistinguishable SNPs" (giSNPs) formed into clusters of varying size. The larger the cluster, the greater the tendency to be located within genes and to overlap with giSNP clusters in other population samples. As might be expected for this map density, many giSNPs were located close to one another, thus reflecting local regions of undetected recombination or haplotype blocks. However, approximately 1/3 of giSNP clusters had intermingled, non-indistinguishable SNPs with incomplete LD (D' and r² < 1), sometimes spanning hundreds of kilobases, comprising up to 70 indistinguishable markers and overlapping multiple haplotype blocks. These long-range, nonconsecutive giSNPs have implications for disease gene localization by allelic association as evidence for association at one locus will be indistinguishable from that at another locus, even though both loci may be situated far apart. We describe the distribution of giSNPs on this map of chromosome 20 and illustrate the potential impact they can have on association mapping.
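The definition of giSNPs given in this abstract (identical genotypes in every individual) translates directly into a grouping step. The sketch below is a minimal illustration under stated assumptions: the function name `gi_clusters`, the 0/1/2 genotype coding, and the sample data are hypothetical, and SNP pairs that are in perfect LD only after swapping the counted allele are not merged.

```python
from collections import defaultdict


def gi_clusters(genotypes):
    """Group SNPs whose genotype vectors are identical across all individuals.

    genotypes -- dict mapping SNP id to genotypes for the same ordered individuals
    Returns clusters of two or more SNPs, each sorted, in sorted order.
    """
    groups = defaultdict(list)
    for snp, calls in genotypes.items():
        groups[tuple(calls)].append(snp)
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Hypothetical map order snpA..snpE: snpD matches snpA and snpB even though
# snpC lies between them, mirroring the "nonconsecutive" clusters in the abstract.
genotypes = {
    "snpA": [0, 1, 2, 1],
    "snpB": [0, 1, 2, 1],
    "snpC": [0, 0, 2, 1],
    "snpD": [0, 1, 2, 1],
    "snpE": [0, 0, 2, 1],
}
print(gi_clusters(genotypes))
```

Any association signal at one member of a cluster is numerically identical at the others, which is why such clusters limit how finely a disease variant can be localised.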
Affiliation(s)
- Robert Lawrence
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
|
39
|
Sabbagh A, Darlu P. Data-Mining Methods as Useful Tools for Predicting Individual Drug Response: Application to CYP2D6 Data. Hum Hered 2006; 62:119-34. [PMID: 17057402 DOI: 10.1159/000096416] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 06/08/2006] [Accepted: 08/22/2006] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVES: Selecting a maximally informative subset of polymorphisms to predict a clinical outcome, such as drug response, requires appropriate search methods due to the increased dimensionality associated with looking at multiple genotypes. In this study, we investigated the ability of several pattern recognition methods to identify the most informative markers in the CYP2D6 gene for the prediction of CYP2D6 metabolizer status. METHODS: Four data-mining tools were explored: decision trees, random forests, artificial neural networks, and the multifactor dimensionality reduction (MDR) method. Marker selection was performed separately in eight population samples of different ethnic origin to evaluate to what extent the most informative markers differ across ethnic groups. RESULTS: Our results show that the number of polymorphisms required to predict CYP2D6 metabolic phenotype with a high accuracy can be dramatically reduced owing to the strong haplotype block structure observed at CYP2D6. MDR and neural networks provided nearly identical results and performed the best. CONCLUSION: Data-mining methods, such as MDR and neural networks, appear as promising tools to improve the efficiency of genotyping tests in pharmacogenetics with the ultimate goal of pre-screening patients for individual therapy selection with minimum genotyping effort.
Affiliation(s)
- Audrey Sabbagh
- Unité de Recherche en Génétique Epidémiologique et Structure des Populations Humaines, INSERM U535, Villejuif, France.
|
40
|
Zeggini E, Rayner W, Morris AP, Hattersley AT, Walker M, Hitman GA, Deloukas P, Cardon LR, McCarthy MI. An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets. Nat Genet 2005; 37:1320-2. [PMID: 16258542 DOI: 10.1038/ng1670] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.3] [Received: 06/30/2005] [Accepted: 10/04/2005] [Indexed: 11/09/2022]
Abstract
A substantial investment has been made in the generation of large public resources designed to enable the identification of tag SNP sets, but data establishing the adequacy of the sample sizes used are limited. Using large-scale empirical and simulated data sets, we found that the sample sizes used in the HapMap project are sufficient to capture common variation, but that performance declines substantially for variants with minor allele frequencies of <5%.
Affiliation(s)
- Eleftheria Zeggini
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
|