1
Lin CY, Xing G, Ku HC, Elston RC, Xing C. Enhancing the power to detect low-frequency variants in genome-wide screens. Genetics 2014; 196:1293-302. [PMID: 24496013 PMCID: PMC3982702 DOI: 10.1534/genetics.113.160739] [Received: 12/17/2013] [Accepted: 01/26/2014]
Abstract
In genetic association studies a conventional test statistic is proportional to the correlation coefficient between the trait and the variant, with the result that it lacks power to detect association for low-frequency variants. Considering the link between the conventional association test statistics and the linkage disequilibrium measure r², we propose a test statistic analogous to the standardized linkage disequilibrium D' to increase the power of detecting association for low-frequency variants. By both simulation and real data analysis we show that the proposed D' test is more powerful than the conventional methods for detecting association for low-frequency variants in a genome-wide setting. The optimal coding strategy for the D' test and its asymptotic properties are also investigated. In summary, we advocate using the D' test in a dominant model as a complementary approach to enhancing the power of detecting association for low-frequency variants with moderate to large effect sizes in case-control genome-wide association studies.
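The link between the conventional statistic, r² and D' can be made concrete on a 2×2 table of case/control status against carrier status (dominant coding). The sketch below assumes that coding and uses the textbook LD definitions of D, r² and D'; the names `CaseControlTable` and `ld_style_measures` are hypothetical, and the paper's actual D' test statistic and its null distribution are not reproduced here. It shows that the Pearson χ² equals N·r², and that for a rare variant r² stays small even when D' is sizable.

```python
from dataclasses import dataclass


@dataclass
class CaseControlTable:
    """Counts of a 2x2 table: trait status (rows) by carrier status (columns)."""
    case_carrier: int
    case_noncarrier: int
    control_carrier: int
    control_noncarrier: int


def ld_style_measures(t: CaseControlTable) -> dict:
    """Treat trait and variant as two diallelic 'loci'; assumes no empty margin."""
    n = t.case_carrier + t.case_noncarrier + t.control_carrier + t.control_noncarrier
    p_case = (t.case_carrier + t.case_noncarrier) / n
    p_carrier = (t.case_carrier + t.control_carrier) / n
    p_joint = t.case_carrier / n  # frequency of (case, carrier)
    d = p_joint - p_case * p_carrier  # analogue of the LD coefficient D
    r2 = d * d / (p_case * (1 - p_case) * p_carrier * (1 - p_carrier))
    # D' rescales D by its largest attainable magnitude given the margins
    if d >= 0:
        d_max = min(p_case * (1 - p_carrier), (1 - p_case) * p_carrier)
    else:
        d_max = min(p_case * p_carrier, (1 - p_case) * (1 - p_carrier))
    d_prime = d / d_max if d_max > 0 else 0.0
    # Pearson chi-square for a 2x2 table (no continuity correction) equals n * r^2
    return {"D": d, "r2": r2, "D_prime": d_prime, "chi2": n * r2}


if __name__ == "__main__":
    table = CaseControlTable(case_carrier=30, case_noncarrier=970,
                             control_carrier=10, control_noncarrier=990)
    print(ld_style_measures(table))
```

With 30 of 1,000 cases and 10 of 1,000 controls carrying the variant, r² is about 0.005 (χ² ≈ 10.2) while D' is 0.5, because D' divides by the largest D attainable at that low carrier frequency.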
Affiliation(s)
- Chang-Yun Lin
- McDermott Center of Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas 75390
- Department of Applied Mathematics and Institute of Statistics, National Chung Hsing University, Taichung, Taiwan
- Guan Xing
- Bristol-Myers Squibb Company, Pennington, New Jersey 08534
- Hung-Chih Ku
- McDermott Center of Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas 75390
- Robert C. Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106
- Chao Xing
- McDermott Center of Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas 75390
2
Politopoulos I, Gibson J, Tapper W, Ennis S, Eccles D, Collins A. Genome-wide association of breast cancer: composite likelihood with imputed genotypes. Eur J Hum Genet 2010; 19:194-9. [PMID: 20959865 DOI: 10.1038/ejhg.2010.157]
Abstract
We describe composite likelihood-based analysis of a genome-wide breast cancer case-control sample from the Cancer Genetic Markers of Susceptibility project. We determine 14 380 genome regions of fixed size on a linkage disequilibrium (LD) map, which delimit comparable levels of LD. Although the numbers of single-nucleotide polymorphisms (SNPs) are highly variable, each region contains an average of ∼35 SNPs and an average of ∼69 after imputation of missing genotypes. Composite likelihood association mapping yields a single P-value for each region, established by a permutation test, along with a maximum likelihood disease location, SE and information weight. For single SNP analysis, the nominal P-value for the most significant SNP (msSNP) requires substantial correction given the number of SNPs in the region. Therefore, imputing genotypes may not always be advantageous for the msSNP test, in contrast to composite likelihood. For the region containing FGFR2 (a known breast cancer gene) the largest χ² is obtained under composite likelihood with imputed genotypes (χ²₂ increases from 20.6 to 22.7), which compares with a single SNP-based χ²₂ of 19.9 after correction. Imputation of additional genotypes in this region reduces the size of the 95% confidence interval for location of the disease gene by ∼40%. Among the highest ranked regions, SNPs in the NTSR1 gene would be worthy of examination in additional samples. Meta-analysis, which combines weighted evidence from composite likelihood in different samples, and refines putative disease locations, is facilitated through defining fixed regions on an underlying LD map.
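Why the msSNP P-value needs heavier correction as SNPs are added can be sketched with the two usual adjustments for the smallest of k P-values. This assumes Bonferroni and Šidák corrections, which treat SNPs as independent and so are conservative under LD; the abstract does not state which correction the authors applied, and the function names are illustrative only.

```python
import math


def bonferroni(p_min: float, n_tests: int) -> float:
    """Bonferroni-adjusted P for the smallest of n_tests P-values (capped at 1)."""
    return min(1.0, n_tests * p_min)


def sidak(p_min: float, n_tests: int) -> float:
    """Sidak-adjusted P: chance that at least one of n independent null tests has P <= p_min."""
    # -expm1(n * log1p(-p)) equals 1 - (1 - p)**n, computed stably for small p
    return -math.expm1(n_tests * math.log1p(-p_min))


if __name__ == "__main__":
    for k in (35, 69):  # average SNPs per region before and after imputation
        print(k, bonferroni(1e-4, k), round(sidak(1e-4, k), 6))
```

For a nominal P of 1e-4, going from about 35 to about 69 SNPs per region roughly doubles the Bonferroni-corrected value (0.0035 to 0.0069), so extra imputed SNPs help the msSNP test only if they raise its statistic enough to offset that penalty.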
Affiliation(s)
- Ioannis Politopoulos
- Genetic Epidemiology and Bioinformatics Research Group, Human Genetics Research Division, University of Southampton, School of Medicine, Southampton General Hospital, Hants, UK
3
Scapoli C, Borzani I, Guarnelli M, Mamolini E, Annunziata M, Guida L, Trombelli L. IL-1 Gene Cluster is Not Linked to Aggressive Periodontitis. J Dent Res 2010; 89:457-61. [DOI: 10.1177/0022034510363232]
Abstract
The interleukin-1 (IL-1) gene family has been associated with susceptibility to periodontal diseases, including aggressive periodontitis (AgP); however, the results are still conflicting. The present study investigated the association between IL-1 genes and AgP using 70 markers spanning the 1.1-Mb region, where the IL-1 gene family maps, and exploring both the linkage disequilibrium (LD) and the haplotype structure in a case-control study including 95 patients and 121 control individuals. No association between AgP and IL1A, IL1B, and IL1RN genes was found in either single-point or haplotype analyses. Also, the LD map of the region 2q13–14 under the Malécot model for multiple markers showed no causal association between AgP and polymorphisms within the region (p = 0.207). In conclusion, our findings failed to support the existence of a causative variant for generalized AgP within the 2q13–14 region in an Italian Caucasian population.
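The Malécot model mentioned here describes how marker-disease association decays with distance on an LD map. A parameterization used in this composite-likelihood literature, assumed here, is zhat_i = (1 - L) M exp(-eps |S_i - S|) + L, with S_i the marker location, S the putative disease location, M the association at zero distance, eps the decline rate and L the background level. The sketch fixes M, eps and L and locates S by a grid search minimizing an information-weighted sum of squares; real analyses also estimate M (and possibly eps and L) and assess significance by permutation. All names are hypothetical.

```python
import math


def malecot_association(distance: float, M: float, eps: float, L: float) -> float:
    """Expected marker-disease association at a given LD-map distance (Malecot form)."""
    return (1.0 - L) * M * math.exp(-eps * abs(distance)) + L


def fit_location(positions, z, weights, M, eps, L, grid):
    """Grid search for S minimizing sum_i K_i * (z_i - zhat_i)^2."""
    best_s, best_loss = None, math.inf
    for s in grid:
        loss = sum(k * (zi - malecot_association(p - s, M, eps, L)) ** 2
                   for p, zi, k in zip(positions, z, weights))
        if loss < best_loss:
            best_s, best_loss = s, loss
    return best_s, best_loss


if __name__ == "__main__":
    positions = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # marker locations in LD units
    z = [malecot_association(p - 1.2, 0.8, 1.0, 0.02) for p in positions]  # noise-free toy data
    grid = [round(i * 0.1, 10) for i in range(31)]
    print(fit_location(positions, z, [1.0] * len(z), 0.8, 1.0, 0.02, grid))
```

On noise-free toy data generated with S = 1.2, the grid search recovers S = 1.2; with real association estimates the minimum is shifted by sampling error, which is why a standard error for S is reported alongside it.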
Affiliation(s)
- C. Scapoli
- Department of Biology and Evolution, University of Ferrara, Corso Ercole I d’Este 32, I-44100 Ferrara, Italy
- Research Centre for the Study of Periodontal Diseases, University of Ferrara, Italy
- I. Borzani
- Department of Biology and Evolution, University of Ferrara, Corso Ercole I d’Este 32, I-44100 Ferrara, Italy
- M.E. Guarnelli
- Research Centre for the Study of Periodontal Diseases, University of Ferrara, Italy
- E. Mamolini
- Department of Biology and Evolution, University of Ferrara, Corso Ercole I d’Este 32, I-44100 Ferrara, Italy
- M. Annunziata
- Department of Odontostomatological, Orthodontic and Surgical Disciplines, Second University of Naples, Italy
- L. Guida
- Department of Odontostomatological, Orthodontic and Surgical Disciplines, Second University of Naples, Italy
- L. Trombelli
- Research Centre for the Study of Periodontal Diseases, University of Ferrara, Italy
4
Biernacka JM, Cordell HJ. A composite-likelihood approach for identifying polymorphisms that are potentially directly associated with disease. Eur J Hum Genet 2008; 17:644-50. [PMID: 19092770 DOI: 10.1038/ejhg.2008.242]
Abstract
If a linkage signal can be fully accounted for by the association of a particular polymorphism with the disease, this polymorphism may be the sole causal variant in the region. On the other hand, if the linkage signal exceeds that explained by the association, different or additional directly associated loci must exist in the region. Several methods have been proposed for testing the hypothesis that association with a particular candidate single-nucleotide polymorphism (SNP) can explain an observed linkage signal. When several candidate SNPs exist, all of the existing methods test the hypothesis for each candidate SNP separately, by fitting the appropriate model for each individual candidate SNP. Here we propose a method that combines analyses of two or more candidate SNPs using a composite-likelihood approach. We use simulations to demonstrate that the proposed method can lead to substantial power increases over the earlier single SNP analyses.
Affiliation(s)
- Joanna M Biernacka
- Department of Health Sciences Research, Division of Biostatistics, Mayo Clinic, Rochester, MN 55905, USA.
5
Won S, Elston RC. The power of independent types of genetic information to detect association in a case-control study design. Genet Epidemiol 2008; 32:731-56. [DOI: 10.1002/gepi.20341]
6
Individual disease risk and multimetric analysis of Crohn disease. Proc Natl Acad Sci U S A 2008; 105:15843-7. [PMID: 18843111 DOI: 10.1073/pnas.0808009105]
Abstract
Rare dominant genes with high penetrance can be identified by linkage without inbreeding, whereas rare recessive genes with high penetrance are most efficiently recognized by autozygosity mapping of homozygotes in pedigrees with preferential inbreeding. On the contrary, complex inheritance is characterized by common genes with low penetrance, for which family studies and inbreeding are inefficient. Here, we develop the Fisherian theory for diallelic cases and controls, show that it compares favorably with Bayesian estimates, and evaluate their currently low power for discriminating cases and controls in Crohn disease (CD). Significance is enhanced by inclusion of composite likelihood, but identification of causal loci is delayed by low recognition of gene function. Clearly, association mapping is not yet optimal, and so strenuous effort is justified to develop a more inclusive gene map and association tests more powerful than single markers and the current use of composite likelihood. Because of its relatively high heritability and the correspondingly large number of detected causal loci, CD presents an ideal test system to determine the power and flaws of competing methods of whole-genome case/control association analysis in publicly available data. Until such a test is exploited by competing statisticians, their Herculean efforts will be inconclusive, and the costly advances from increased sample size will be suboptimal and disappointing.
7
Abstract
The HapMap Project has shifted genetic epidemiology of complex inheritance away from linkage into association mapping of genes affecting disease and response to therapy. Starting with a physical map produced by the Human Genome Project and recent investigation of structural polymorphisms in HapMap samples, population-specific linkage disequilibrium (LD) maps that accurately reflect the fine structure of blocks and steps have been created for use in association mapping, and by interpolation to increase the resolution of linkage maps. All this evidence can be integrated by meta-analysis if expressed as an estimated location and its standard error, a property apparently unique to composite likelihood, recently freed from autocorrelation by permutation of affection status. Methods that do not estimate a standard error are easier to apply, but may be misleading if a causal marker has not been typed. The month of June 2007 saw advances in genome-wide association scans (GWAS) for several diseases. Many questions remain to be answered if genetic epidemiology is to continue the significant contribution to medicine that its definition promises and its history illustrates.
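Combining location estimates and their standard errors across studies can be sketched with standard fixed-effect inverse-variance weighting. This is an assumption for illustration: the composite-likelihood papers weight by an information measure that may differ in detail, and `inverse_variance_pool` is a hypothetical helper.

```python
import math


def inverse_variance_pool(estimates, standard_errors):
    """Fixed-effect pooling with weights 1 / SE^2; returns (pooled estimate, pooled SE)."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * x for w, x in zip(weights, estimates)) / total
    return pooled, 1.0 / math.sqrt(total)


if __name__ == "__main__":
    # Two samples place the locus at 10.0 and 12.0 (arbitrary map units)
    print(inverse_variance_pool([10.0, 12.0], [1.0, 2.0]))
```

The more precise sample dominates: the pooled location is 10.4 with SE ≈ 0.894, smaller than either input SE. A method that returns only a P-value gives nothing to weight in this way, which is the contrast the abstract draws.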
Affiliation(s)
- Newton E Morton
- Human Genetics Division, University of Southampton, Southampton, United Kingdom
8
Abstract
We studied the impact of marker density on the accuracy of association mapping using Genetic Analysis Workshop 15 simulated dense single-nucleotide polymorphism (SNP) data on chromosome 6. A total of 1500 cases and 2000 unaffected controls genotyped for 17,820 SNPs were analyzed. We applied the approach that combines information from multiple SNPs under the framework of the Malecot model and composite likelihood to non-overlapping regions of the chromosome. We successfully detected the associations with disease Loci C and D and estimated their locations to within zero distance of Locus C when it was "typed" and to within 112 kb of the untyped rare Locus D. Reducing marker density decreased the accuracy of location estimates. However, the predicted locations were robust to variations in the number of SNPs. Generally, the linkage disequilibrium (LD) map reflecting distances between markers in relation to LD produced higher accuracy than the physical map. We also demonstrated that SNP selection based on equal LD distance outperforms that based on equal physical distance or SNP tagging. Furthermore, ignoring rare SNPs diminished the ability to detect rare causal variants.
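Selecting SNPs at equal LD distance rather than equal physical distance can be sketched as picking the SNP nearest each of a set of evenly spaced targets on whichever map is used. This nearest-target rule is an assumption for illustration, not necessarily the study's procedure, and `select_evenly` is a hypothetical helper.

```python
from bisect import bisect_left


def select_evenly(positions, n_select):
    """Indices of SNPs nearest to n_select evenly spaced targets on a sorted map."""
    if n_select < 2:
        raise ValueError("need at least two targets")
    lo, hi = positions[0], positions[-1]
    step = (hi - lo) / (n_select - 1)
    chosen = []
    for j in range(n_select):
        target = lo + j * step
        i = bisect_left(positions, target)
        # Compare the SNPs on either side of the insertion point
        candidates = [c for c in (i - 1, i) if 0 <= c < len(positions)]
        best = min(candidates, key=lambda c: abs(positions[c] - target))
        if best not in chosen:  # two targets may map to the same SNP
            chosen.append(best)
    return chosen


if __name__ == "__main__":
    kb = [0, 10, 20, 30, 40, 50, 60, 70, 80]                    # physical map
    ldu = [0.0, 0.0, 0.0, 0.0, 0.1, 1.0, 2.0, 3.0, 3.1]         # toy LD map: block, then steps
    print("equal kb :", select_evenly(kb, 5))
    print("equal LDU:", select_evenly(ldu, 5))
```

On this toy region the first four SNPs lie in one LD block (zero LDU apart). Equal-kb selection returns SNPs 0, 2, 4, 6, 8 and so spends two of them inside the block, whereas equal-LDU selection returns 0, 5, 6, 8, moving coverage to the steps where LD breaks down.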
9
A multimetric approach to analysis of genome-wide association by single markers and composite likelihood. Proc Natl Acad Sci U S A 2008; 105:2592-7. [PMID: 18268331 DOI: 10.1073/pnas.0711903105]
Abstract
Two case/control studies with different phenotypes, marker densities, and microarrays were examined for the most significant single markers in defined regions. They show a pronounced bias toward exaggerated significance that increases with the number of observed markers and would increase further with imputed markers. This bias is eliminated by Bonferroni adjustment, thereby allowing combination by principal component analysis with a Malecot model composite likelihood evaluated by a permutation procedure to allow for multiple dependent markers. This intermediate value identifies the only demonstrated causal locus as most significant even in the preliminary analysis and clearly recognizes the strongest candidate in the other sample. Because the three metrics (most significant single marker, composite likelihood, and their principal component) are correlated, choice of the n smallest P values by each test gives <3n regions for follow-up in the next stage. In this way, methods with different response to marker selection and density are given approximately equal weight and economically compared, without expressing an untested prejudice or sacrificing the most significant results for any of them. Large numbers of cases, controls, and markers are by themselves insufficient to control type 1 and 2 errors, and so efficient use of multiple metrics with Bonferroni adjustment promises to be valuable in identifying causal variants and optimal design simultaneously.
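The follow-up rule described here (take the n smallest P-values by each metric, giving at most 3n regions because the metrics are correlated) can be sketched as a union of per-metric top-n sets. The sketch assumes Bonferroni adjustment of each region's most significant single-marker P by that region's SNP count; the principal-component metric is represented only by precomputed P-values, and all names and numbers are hypothetical.

```python
def bonferroni_by_region(p_min, n_snps):
    """Adjust each region's smallest single-marker P by that region's SNP count."""
    return {region: min(1.0, p * n_snps[region]) for region, p in p_min.items()}


def follow_up_regions(p_by_metric, n):
    """Union of the n regions with the smallest P-values under each metric."""
    chosen = set()
    for pvals in p_by_metric.values():
        chosen.update(sorted(pvals, key=pvals.get)[:n])
    return chosen


if __name__ == "__main__":
    ms_raw = {"A": 0.001, "B": 0.0005, "C": 0.02, "D": 0.003}
    n_snps = {"A": 10, "B": 100, "C": 2, "D": 5}
    metrics = {
        "msSNP (Bonferroni)": bonferroni_by_region(ms_raw, n_snps),
        "composite likelihood": {"A": 0.004, "B": 0.2, "C": 0.01, "D": 0.5},
        "principal component": {"A": 0.002, "B": 0.3, "C": 0.02, "D": 0.03},
    }
    print(sorted(follow_up_regions(metrics, n=2)))
```

Region B has the smallest raw single-marker P but 100 SNPs, so after adjustment it drops out; the three metrics with n = 2 select only regions A, C and D, fewer than 3n = 6, because they largely agree.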
10
Li N. The promise of composite likelihood methods for addressing computationally intensive challenges. Adv Genet 2008; 60:637-54. [PMID: 18358335 DOI: 10.1016/s0065-2660(07)00422-1]
Abstract
High-dimensional genetic data, due to its complex correlation structure, poses an enormous challenge to standard likelihood-based methods for making statistical inference. As an approximation, composite likelihood has proved to be a successful strategy for some genetic applications. It has the potential to see even wider application and much research is needed. We first give a brief description of composite likelihood. The advantage of this method and potential challenges in inference are noted. Next, its applications in genetic studies are reviewed, specifically in estimating population genetics parameters such as recombination rate, and in multi-locus linkage disequilibrium mapping of disease genes with some discussion about future research directions.
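The basic idea of composite likelihood (multiply low-dimensional likelihoods as if they were independent, then maximize) can be shown on a generic, non-genetic toy problem: estimating a common correlation ρ from equicorrelated Gaussian vectors using only the bivariate densities of every pair of components. This example is an illustration chosen here, not one from the review, and the helper names are hypothetical. Because the pairs are not independent, the curvature of this function understates uncertainty; valid standard errors need a sandwich (Godambe) adjustment or resampling, which is one of the inferential challenges the abstract alludes to.

```python
import math
import random


def simulate_equicorrelated(n, dim, rho, seed=0):
    """Draw n vectors of dim standard normals with common pairwise correlation rho (0 <= rho < 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        data.append([math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
                     for _ in range(dim)])
    return data


def pairwise_composite_loglik(rho, s_sq, s_cross, n_pairs):
    """Sum over all pairs of standard bivariate-normal log-densities (constants dropped)."""
    one_minus = 1.0 - rho * rho
    return -0.5 * n_pairs * math.log(one_minus) - (s_sq - 2.0 * rho * s_cross) / (2.0 * one_minus)


def fit_rho(data):
    # Sufficient statistics summed over every observation and every pair (j, k), j < k
    s_sq = s_cross = 0.0
    n_pairs = 0
    for x in data:
        for j in range(len(x)):
            for k in range(j + 1, len(x)):
                s_sq += x[j] ** 2 + x[k] ** 2
                s_cross += x[j] * x[k]
                n_pairs += 1
    grid = [i / 100 for i in range(-95, 96)]
    return max(grid, key=lambda r: pairwise_composite_loglik(r, s_sq, s_cross, n_pairs))


if __name__ == "__main__":
    print(fit_rho(simulate_equicorrelated(5000, 4, 0.5)))
```

The maximizer should land close to the simulated ρ = 0.5, even though the full four-dimensional likelihood was never written down.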
Affiliation(s)
- Na Li
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
11
Kuo TY, Lau W, Hu C, Zhang W. Association mapping of susceptibility loci for rheumatoid arthritis. BMC Proc 2007; 1 Suppl 1:S15. [PMID: 18466494 PMCID: PMC2367513 DOI: 10.1186/1753-6561-1-s1-s15]
Abstract
We analyzed a case-control data set for chromosome 18q from the Genetic Analysis Workshop 15 to detect susceptibility loci for rheumatoid arthritis (RA). A total of 460 cases and 460 unaffected controls were genotyped on 2300 single-nucleotide polymorphisms (SNPs) by the North American Rheumatoid Arthritis Consortium. Using a multimarker approach for association mapping under the framework of the Malecot model and composite likelihood, we identified a region showing significant association with RA (p < 0.002) and the predicted disease locus was at a genomic location of 53,306 kb with a 95% confidence interval (CI) of 53,295–53,331 kb. A common haplotype in this region was protective against RA (p = 0.002). In another region showing nominally significant association (51,585 kb, 95% CI: 51,541–51,628 kb, p = 0.037), a haplotype was also protective (p = 0.002). We further demonstrated that reducing SNP density decreased power and accuracy of association mapping. SNP selection based on equal linkage disequilibrium (LD) distance generally produced higher accuracy than that based on equal kilobase distance or tagging.
Affiliation(s)
- Tai-Yue Kuo
- Human Genetics Division, Duthie Building (Mailpoint 808), Southampton General Hospital, Tremona Road, Southampton SO16 6YD, UK.
12
Abstract
Although single chi-square analysis of the North American Rheumatoid Arthritis Consortium (NARAC) data identifies many single-nucleotide polymorphisms (SNPs) with p-values less than 0.05, none remain significant after Bonferroni correction. In contrast, CHROMSCAN evades heavy Bonferroni correction and auto-correlation between SNPs by using composite likelihood to model association across all markers in a region and permutation to assess significance. Analysis by CHROMSCAN identifies a 36-kb interval that includes the most significant SNP (msSNP) observed in a 10-Mb target suggested by linkage. Unexpectedly, stratification by gender and age of onset shows that association evidence comes almost entirely from females with age of onset less than 40. Combining evidence from a meta-analysis of linkage studies and three subsets of the NARAC data provides significant evidence for a determinant of rheumatoid arthritis in a 36-kb interval and illustrates the principle that estimates of location and its information are more powerful than estimates of p-values alone.
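Permuting affection status while leaving genotypes untouched preserves the correlation among SNPs, so the null distribution of a region-wide statistic already reflects LD and no Bonferroni factor is needed. The sketch applies that idea to the largest single-SNP χ² in a region (carrier coding); it is a stand-in for illustration, not CHROMSCAN's composite-likelihood statistic, and all names and the toy data are hypothetical.

```python
import random


def carrier_chi2(carrier, status):
    """Pearson chi-square for the 2x2 table of carrier (0/1) by affection status (1 = case)."""
    n = len(status)
    n_case = sum(status)
    n_carrier = sum(carrier)
    a = sum(1 for c, s in zip(carrier, status) if c and s)  # case carriers
    b = n_case - a                                           # case non-carriers
    c = n_carrier - a                                        # control carriers
    d = n - n_case - n_carrier + a                           # control non-carriers
    denom = n_case * (n - n_case) * n_carrier * (n - n_carrier)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_statistic(genotypes, status):
    return max(carrier_chi2(g, status) for g in genotypes)


def permutation_p_value(genotypes, status, n_perm=999, seed=1):
    """P-value for the largest single-SNP statistic, permuting affection status only."""
    rng = random.Random(seed)
    observed = max_statistic(genotypes, status)
    shuffled = list(status)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # genotypes (and their LD) stay fixed; only labels move
        if max_statistic(genotypes, shuffled) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_perm + 1)


if __name__ == "__main__":
    status = [1] * 100 + [0] * 100
    causal = [1 if i < 80 else 0 for i in range(100)] + [1 if i < 10 else 0 for i in range(100)]
    rng = random.Random(42)
    noise = [[1 if rng.random() < 0.3 else 0 for _ in range(200)] for _ in range(20)]
    print(permutation_p_value([causal] + noise, status, n_perm=199))
```

The smallest attainable P is 1/(n_perm + 1), so the number of permutations must grow with the significance level one wants to resolve.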
Affiliation(s)
- William Tapper
- Human Genetics Division, University of Southampton, Southampton General Hospital, Tremona Road, Southampton, Hampshire SO16 6YD, UK.
13
CHROMSCAN: genome-wide association using a linkage disequilibrium map. J Hum Genet 2007; 53:121-6. [DOI: 10.1007/s10038-007-0226-2] [Received: 09/04/2007] [Accepted: 11/07/2007]
14
Ennis S, Goverdhan S, Cree A, Hoh J, Collins A, Lotery A. Fine-scale linkage disequilibrium mapping of age-related macular degeneration in the complement factor H gene region. Br J Ophthalmol 2007; 91:966-70. [PMID: 17314151 PMCID: PMC1955647 DOI: 10.1136/bjo.2007.114090]
Abstract
AIM To present results from a nested association study of the complement factor H (CFH) gene region using a novel methodology that uses a high-resolution genetic linkage disequilibrium map to estimate a point location for a causal mutation. METHOD Age-related macular degeneration (AMD) case-control data from a genomewide single-nucleotide polymorphism (SNP) panel were used to identify the target interval to be genotyped at higher density in a second independent panel. The pattern of linkage disequilibrium (LD) and segmental duplications across this region are described in detail. RESULT Data were consistent with other studies in that strong association between the Y402H variant and AMD is observed. However, composite likelihood analysis, which combines association data from all SNPs in the region, and uses genetic locations on a high-resolution LD map, gave a point location for a causal variant between exons 1 and 2 of the CFH gene. CONCLUSION The findings are consistent with evidence that, in addition to the widely described Y402H variant, there is at least one and, most probably, several other mutations in the CFH gene which determine disease manifestation in AMD. A genetic model in which multiple mutations contribute to a varying degree to disease aetiology has been previously well described in ophthalmic genetics, and is typified by the COL2A1 and ABCA4 genes.
Affiliation(s)
- Sarah Ennis
- Genetic Epidemiology and Bioinformatics Group, Human Genetics Division (MP 808), Southampton General Hospital, Southampton, UK
15
Dupuis J. Effect of linkage disequilibrium between markers in linkage and association analyses. Genet Epidemiol 2007; 31 Suppl 1:S139-48. [DOI: 10.1002/gepi.20291]
16
Wilcox MA, Li Z, Tapper W. Genetic association with rheumatoid arthritis—Genetic Analysis Workshop 15: summary of contributions from Group 2. Genet Epidemiol 2007; 31 Suppl 1:S12-21. [DOI: 10.1002/gepi.20276]