1. Wei C, Li M, Wen Y, Ye C, Lu Q. A multi-locus predictiveness curve and its summary assessment for genetic risk prediction. Stat Methods Med Res 2020; 29:44-56. [PMID: 30612522 PMCID: PMC6612460 DOI: 10.1177/0962280218819202]
Abstract
Genetic association studies using high-throughput genotyping and sequencing technologies have identified a large number of genetic variants associated with complex human diseases. These findings have provided an unprecedented opportunity to identify individuals in the population at high risk for disease who carry causal genetic mutations and hold great promise for early intervention and individualized medicine. While interest is high in building risk prediction models based on recent genetic findings, it is crucial to have appropriate statistical measurements to assess the performance of a genetic risk prediction model. Predictiveness curves were recently proposed as a graphic tool for evaluating a risk prediction model on the basis of a single continuous biomarker. The curve evaluates a risk prediction model for classification performance as well as its usefulness when applied to a population. In this article, we extend the predictiveness curve to measure the collective contribution of multiple genetic variants. We further propose a nonparametric, U-statistics-based measurement, referred to as the U-Index, to quantify the performance of a multi-locus predictiveness curve. In particular, a global U-Index and a partial U-Index can be used in the general population and a subpopulation of particular clinical interest, respectively. Through simulation studies, we demonstrate that the proposed U-Index has advantages over several existing summary statistics under various disease models. We also show that the partial U-Index can have its own uniqueness when rare variants have a substantial contribution to disease risk. Finally, we use the proposed predictiveness curve and its corresponding U-Index to evaluate the performance of a genetic risk prediction model for nicotine dependence.
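To make the idea of a predictiveness curve concrete, the following is a minimal sketch of the generic construction only: predicted risks are sorted and each risk quantile is paired with the fraction of the population at or below it. It is not the multi-locus curve or the U-Index proposed in the article, whose estimators are not given in the abstract; the function names `predictiveness_curve` and `fraction_above` are hypothetical and chosen for illustration.

```python
def predictiveness_curve(risks):
    """Return (population fraction, risk quantile) pairs.

    Generic construction: sort the predicted risks and pair the i-th
    smallest risk with the fraction i/n of the population at or below it.
    This is an illustrative assumption, not the article's estimator.
    """
    ordered = sorted(risks)
    n = len(ordered)
    return [((i + 1) / n, r) for i, r in enumerate(ordered)]


def fraction_above(risks, threshold):
    """Fraction of the population whose predicted risk exceeds a threshold."""
    return sum(r > threshold for r in risks) / len(risks)


if __name__ == "__main__":
    example_risks = [0.3, 0.1, 0.2, 0.4]  # made-up predicted risks
    for fraction, risk in predictiveness_curve(example_risks):
        print(f"{fraction:.2f} of the population has risk <= {risk}")
    print(fraction_above(example_risks, 0.25))
```

A steeper curve indicates that the model spreads people across a wider range of risks, which is what makes a risk threshold useful in a population.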
Affiliation(s)
- Changshuai Wei
  - Core Artificial Intelligence, Amazon.com Inc, Seattle, WA, USA
- Ming Li
  - Department of Epidemiology and Biostatistics, Indiana University at Bloomington, Bloomington, IN, USA
- Yalu Wen
  - Department of Statistics, University of Auckland, Auckland, New Zealand
- Chengyin Ye
  - Department of Health Management, Hangzhou Normal University, Hangzhou, China
- Qing Lu
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA

2. Patron J, Serra-Cayuela A, Han B, Li C, Wishart DS. Assessing the performance of genome-wide association studies for predicting disease risk. PLoS One 2019; 14:e0220215. [PMID: 31805043 PMCID: PMC6894795 DOI: 10.1371/journal.pone.0220215] [Received: 07/06/2019] [Accepted: 11/01/2019]
Abstract
To date more than 3700 genome-wide association studies (GWAS) have been published that look at the genetic contributions of single nucleotide polymorphisms (SNPs) to human conditions or human phenotypes. Through these studies many highly significant SNPs have been identified for hundreds of diseases or medical conditions. However, the extent to which GWAS-identified SNPs or combinations of SNP biomarkers can predict disease risk is not well known. One of the most commonly used approaches to assess the performance of predictive biomarkers is to determine the area under the receiver-operator characteristic curve (AUROC). We have developed an R package called G-WIZ to generate ROC curves and calculate the AUROC using summary-level GWAS data. We first tested the performance of G-WIZ by using AUROC values derived from patient-level SNP data, as well as literature-reported AUROC values. We found that G-WIZ predicts the AUROC with <3% error. Next, we used the summary level GWAS data from GWAS Central to determine the ROC curves and AUROC values for 569 different GWA studies spanning 219 different conditions. Using these data we found a small number of GWA studies with SNP-derived risk predictors that have very high AUROCs (>0.75). On the other hand, the average GWA study produces a multi-SNP risk predictor with an AUROC of 0.55. Detailed AUROC comparisons indicate that most SNP-derived risk predictions are not as good as clinically based disease risk predictors. All our calculations (ROC curves, AUROCs, explained heritability) are in a publicly accessible database called GWAS-ROCS (http://gwasrocs.ca). The G-WIZ code is freely available for download at https://github.com/jonaspatronjp/GWIZ-Rscript/.
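As a reminder of what the AUROC measures, here is a minimal sketch that computes it from individual-level scores as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. This is the standard Mann-Whitney interpretation of the AUROC; it is not the G-WIZ procedure, which works from summary-level GWAS data and whose details are not described in the abstract. The function name `auroc` and the example scores are assumptions.

```python
def auroc(case_scores, control_scores):
    """AUROC as the fraction of case-control pairs ranked correctly.

    Each pair contributes 1 if the case scores higher, 0.5 on a tie,
    and 0 otherwise (Mann-Whitney formulation).
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Made-up risk scores for two cases and two controls.
    print(auroc([0.6, 0.4], [0.5, 0.3]))
```

An AUROC of 0.5 corresponds to chance-level ranking, which gives context for the average value of 0.55 reported above.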
Affiliation(s)
- Jonas Patron
  - Department of Biological Sciences, University of Alberta, Edmonton, Canada
- Beomsoo Han
  - Department of Biological Sciences, University of Alberta, Edmonton, Canada
- Carin Li
  - Department of Biological Sciences, University of Alberta, Edmonton, Canada
- David Scott Wishart
  - Department of Biological Sciences, University of Alberta, Edmonton, Canada
  - Department of Computing Science, University of Alberta, Edmonton, Canada

3. Oliynyk RT. Future Preventive Gene Therapy of Polygenic Diseases from a Population Genetics Perspective. Int J Mol Sci 2019; 20:E5013. [PMID: 31658652 PMCID: PMC6834143 DOI: 10.3390/ijms20205013] [Received: 09/13/2019] [Revised: 10/01/2019] [Accepted: 10/08/2019]
Abstract
With the accumulation of scientific knowledge of the genetic causes of common diseases and continuous advancement of gene-editing technologies, gene therapies to prevent polygenic diseases may soon become possible. This study endeavored to assess population genetics consequences of such therapies. Computer simulations were used to evaluate the heterogeneity in causal alleles for polygenic diseases that could exist among geographically distinct populations. The results show that although heterogeneity would not be easily detectable by epidemiological studies following population admixture, even significant heterogeneity would not impede the outcomes of preventive gene therapies. Preventive gene therapies designed to correct causal alleles to a naturally-occurring neutral state of nucleotides would lower the prevalence of polygenic early- to middle-age-onset diseases in proportion to the decreased population relative risk attributable to the edited alleles. The outcome would manifest differently for late-onset diseases, for which the therapies would result in a delayed disease onset and decreased lifetime risk; however, the lifetime risk would increase again with prolonging population life expectancy, which is a likely consequence of such therapies. If the preventive heritable gene therapies were to be applied on a large scale, the decreasing frequency of risk alleles in populations would reduce the disease risk or delay the age of onset, even with a fraction of the population receiving such therapies. With ongoing population admixture, all groups would benefit over generations.
Affiliation(s)
- Roman Teo Oliynyk
  - Centre for Computational Evolution, University of Auckland, Auckland 1010, New Zealand
  - Department of Computer Science, University of Auckland, Auckland 1010, New Zealand

4. Wessell AP, Kole MJ, Cannarsa G, Oliver J, Jindal G, Miller T, Gandhi D, Parikh G, Badjatia N, Aldrich EF, Simard JM. A sustained systemic inflammatory response syndrome is associated with shunt-dependent hydrocephalus after aneurysmal subarachnoid hemorrhage. J Neurosurg 2019; 130:1984-1991. [PMID: 29957109 DOI: 10.3171/2018.1.jns172925] [Received: 11/19/2017] [Accepted: 01/26/2018]
Abstract
OBJECTIVE: The authors sought to evaluate whether a sustained systemic inflammatory response was associated with shunt-dependent hydrocephalus following aneurysmal subarachnoid hemorrhage.
METHODS: A retrospective analysis of 193 consecutive patients with aneurysmal subarachnoid hemorrhage was performed. Management of hydrocephalus followed a stepwise algorithm to determine the need for external CSF drainage and subsequent shunt placement. Systemic inflammatory response syndrome (SIRS) data were collected for all patients during the first 7 days of hospitalization. Patients who met the SIRS criteria every day for the first 7 days of hospitalization were considered as having a sustained SIRS. Univariate and multivariate regression analyses were used to determine predictors of shunt dependence.
RESULTS: Sixteen percent of patients required shunt placement. Sustained SIRS was observed in 35% of shunt-dependent patients compared to 14% in non-shunt-dependent patients (p = 0.004). On multivariate logistic regression, female sex (OR 0.35, 95% CI 0.142-0.885), moderate to severe vasospasm (OR 3.78, 95% CI 1.333-10.745), acute hydrocephalus (OR 21.39, 95% CI 2.260-202.417), and sustained SIRS (OR 2.94, 95% CI 1.125-7.689) were significantly associated with shunt dependence after aneurysmal subarachnoid hemorrhage. Receiver operating characteristic analysis revealed an area under the curve of 0.83 for the final regression model.
CONCLUSIONS: Sustained SIRS was a predictor of shunt-dependent hydrocephalus following aneurysmal subarachnoid hemorrhage even after adjustment for potential confounding variables in a multivariate logistic regression model.
Affiliation(s)
- Gaurav Jindal
  - Departments of Neurosurgery, Neurology, and Radiology, University of Maryland School of Medicine, Baltimore, Maryland
- Timothy Miller
  - Department of Radiology, University of Maryland School of Medicine, Baltimore, Maryland
- Dheeraj Gandhi
  - Departments of Neurosurgery, Neurology, and Radiology, University of Maryland School of Medicine, Baltimore, Maryland

5. Ziv E, Tice JA, Sprague B, Vachon CM, Cummings SR, Kerlikowske K. Using Breast Cancer Risk Associated Polymorphisms to Identify Women for Breast Cancer Chemoprevention. PLoS One 2017; 12:e0168601. [PMID: 28107349 PMCID: PMC5249071 DOI: 10.1371/journal.pone.0168601] [Received: 07/15/2016] [Accepted: 12/02/2016]
Abstract
BACKGROUND: Breast cancer can be prevented with selective estrogen receptor modifiers (SERMs) and aromatase inhibitors (AIs). The US Preventive Services Task Force recommends that women with a 5-year breast cancer risk ≥3% consider chemoprevention for breast cancer. More than 70 single nucleotide polymorphisms (SNPs) have been associated with breast cancer. We sought to determine how to best integrate risk information from SNPs with other risk factors to risk stratify women for chemoprevention.
METHODS: We used the risk distribution among women ages 35-69 estimated by the Breast Cancer Surveillance Consortium (BCSC) risk model. We modeled the effect of adding 70 SNPs to the BCSC model and examined how this would affect how many women are reclassified above and below the threshold for chemoprevention.
RESULTS: We found that most of the benefit of SNP testing a population is achieved by testing a modest fraction of the population. For example, if women with a 5-year BCSC risk of >2.0% are tested (~21% of all women), ~75% of the benefit of testing all women (shifting women above or below 3% 5-year risk) would be derived. If women with a 5-year risk of >1.5% are tested (~36% of all women), ~90% of the benefit of testing all women would be derived.
CONCLUSION: SNP testing is effective for reclassification of women for chemoprevention, but is unlikely to reclassify women with <1.5% 5-year risk. These results can be used to implement an efficient two-step testing approach to identify high risk women who may benefit from chemoprevention.
Affiliation(s)
- Elad Ziv
  - Department of Medicine, University of California, San Francisco, California, United States of America
  - Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, California, United States of America
  - Institute for Human Genetics, University of California, San Francisco, California, United States of America
- Jeffrey A. Tice
  - Department of Medicine, University of California, San Francisco, California, United States of America
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, California, United States of America
- Brian Sprague
  - Department of Surgery and University of Vermont Cancer Center, University of Vermont, Burlington, Vermont, United States of America
- Celine M. Vachon
  - Department of Health Sciences Research, Division of Epidemiology, Mayo Clinic College of Medicine, Rochester, Minnesota, United States of America
- Steven R. Cummings
  - San Francisco Coordinating Center, California Pacific Medical Center Research Institute, San Francisco, California, United States of America
- Karla Kerlikowske
  - Department of Medicine, University of California, San Francisco, California, United States of America
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, California, United States of America
  - General Internal Medicine Section, Department of Veterans Affairs, University of California, San Francisco, California, United States of America

6. Breast cancer risk prediction using a clinical risk model and polygenic risk score. Breast Cancer Res Treat 2016; 159:513-25. [PMID: 27565998 DOI: 10.1007/s10549-016-3953-2] [Received: 06/16/2016] [Accepted: 08/18/2016]
Abstract
Breast cancer risk assessment can inform the use of screening and prevention modalities. We investigated the performance of the Breast Cancer Surveillance Consortium (BCSC) risk model in combination with a polygenic risk score (PRS) comprised of 83 single nucleotide polymorphisms identified from genome-wide association studies. We conducted a nested case-control study of 486 cases and 495 matched controls within a screening cohort. The PRS was calculated using a Bayesian approach. The contributions of the PRS and variables in the BCSC model to breast cancer risk were tested using conditional logistic regression. Discriminatory accuracy of the models was compared using the area under the receiver operating characteristic curve (AUROC). Increasing quartiles of the PRS were positively associated with breast cancer risk, with OR 2.54 (95 % CI 1.69-3.82) for breast cancer in the highest versus lowest quartile. In a multivariable model, the PRS, family history, and breast density remained strong risk factors. The AUROC of the PRS was 0.60 (95 % CI 0.57-0.64), and an Asian-specific PRS had AUROC 0.64 (95 % CI 0.53-0.74). A combined model including the BCSC risk factors and PRS had better discrimination than the BCSC model (AUROC 0.65 versus 0.62, p = 0.01). The BCSC-PRS model classified 18 % of cases as high-risk (5-year risk ≥3 %), compared with 7 % using the BCSC model. The PRS improved discrimination of the BCSC risk model and classified more cases as high-risk. Further consideration of the PRS's role in decision-making around screening and prevention strategies is merited.
7. Kundu S, Kers JG, Janssens ACJW. Constructing Hypothetical Risk Data from the Area under the ROC Curve: Modelling Distributions of Polygenic Risk. PLoS One 2016; 11:e0152359. [PMID: 27023073 PMCID: PMC4811433 DOI: 10.1371/journal.pone.0152359] [Received: 12/13/2015] [Accepted: 03/14/2016]
Abstract
Background: Modeling studies using hypothetical polygenic risk data can be an efficient tool for investigating the effectiveness of downstream applications, such as targeting interventions to risk groups, to justify whether empirical investigation is warranted. We investigated the assumptions underlying a method that simulates risk data for specific values of the area under the receiver operating characteristic curve (AUC).
Methods: The simulation method constructs risk data for a hypothetical population based on the population disease risk, and the odds ratios and frequencies of genetic variants. By systematically varying the parameters, we investigated under what conditions AUC values represent unique ROC curves with unique risk distributions for patients and nonpatients, and to what extent risk data can be simulated for precise values of the AUC.
Results: Using a larger number of genetic variants, each with a modest effect, we observed that the distributions of estimated risks of patients and nonpatients were similar for various combinations of the odds ratios and frequencies of the risk alleles. Simulated ROC curves overlapped empirical curves with the same AUC.
Conclusions: Polygenic risk data can be effectively and efficiently created using a simulation method. This allows further investigation of the potential applications of stratifying interventions on the basis of polygenic risk.
Affiliation(s)
- Suman Kundu
  - Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
  - Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Jannigje G. Kers
  - Department of Clinical Genetics/EMGO Institute for Health and Care Research, Section Community Genetics, VU University Medical Center, Amsterdam, The Netherlands
- A. Cecile J. W. Janssens
  - Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
  - Department of Clinical Genetics/EMGO Institute for Health and Care Research, Section Community Genetics, VU University Medical Center, Amsterdam, The Netherlands

8. Li M, Li J, Wei C, Lu Q, Tang X, Erickson SW, Macleod SL, Hobbs CA. A Three-Way Interaction among Maternal and Fetal Variants Contributing to Congenital Heart Defects. Ann Hum Genet 2016; 80:20-31. [PMID: 26612412 PMCID: PMC4839294 DOI: 10.1111/ahg.12139] [Received: 06/16/2015] [Accepted: 09/11/2015]
Abstract
Congenital heart defects (CHDs) develop through a complex interplay between genetic variants, epigenetic modifications, and maternal environmental exposures. Genetic studies of CHDs have commonly tested single genetic variants for association with CHDs. Less attention has been given to complex gene-by-gene and gene-by-environment interactions. In this study, we applied a recently developed likelihood-ratio Mann-Whitney (LRMW) method to detect joint actions among maternal variants, fetal variants, and maternal environmental exposures, allowing for high-order statistical interactions. All subjects are participants from the National Birth Defect Prevention Study, including 623 mother-offspring pairs with CHD-affected pregnancies and 875 mother-offspring pairs with unaffected pregnancies. Each individual has 872 single nucleotide polymorphisms encoding for critical enzymes in the homocysteine, folate, and trans-sulfuration pathways. By using the LRMW method, three variants (fetal rs625879, maternal rs2169650, and maternal rs8177441) were identified with a joint association to CHD risk (nominal P-value = 1.13e-07). These three variants are located within genes BHMT2, GSTP1, and GPX3, respectively. Further examination indicated that maternal SNP rs2169650 may interact with both fetal SNP rs625879 and maternal SNP rs8177441. Our findings suggest that the risk of CHD may be influenced by both the intragenerational interaction within the maternal genome and the intergenerational interaction between maternal and fetal genomes.
Affiliation(s)
- Ming Li
  - Department of Epidemiology and Biostatistics, Indiana University at Bloomington, Bloomington, IN 47405
- Jingyun Li
  - Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, AR 72211
- Changshuai Wei
  - Department of Epidemiology and Biostatistics, University of North Texas Health Science Center, Fort Worth, TX 76107
- Qing Lu
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824
- Xinyu Tang
  - Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, AR 72211
- Stephen W. Erickson
  - Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, AR 72211
- Stewart L. Macleod
  - Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, AR 72211
- Charlotte A. Hobbs
  - Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, AR 72211

9. Wei C, Lu Q. GWGGI: software for genome-wide gene-gene interaction analysis. BMC Genet 2014; 15:101. [PMID: 25318532 PMCID: PMC4201693 DOI: 10.1186/s12863-014-0101-z] [Received: 04/11/2014] [Accepted: 09/11/2014]
Abstract
BACKGROUND: While the importance of gene-gene interactions in human diseases has been well recognized, identifying them has been a great challenge, especially through association studies with millions of genetic markers and thousands of individuals. Computationally efficient and powerful tools are in great need for the identification of new gene-gene interactions in high-dimensional association studies.
RESULT: We develop C++ software for genome-wide gene-gene interaction analyses (GWGGI). GWGGI utilizes tree-based algorithms to search a large number of genetic markers for a disease-associated joint association with the consideration of high-order interactions, and then uses non-parametric statistics to test the joint association. The package includes two functions, likelihood ratio Mann-Whitney (LRMW) and Tree Assembling Mann-Whitney (TAMW). We optimize the data storage and computational efficiency of the software, making it feasible to run the genome-wide analysis on a personal computer. The use of GWGGI was demonstrated by using two real data-sets with nearly 500 k genetic markers.
CONCLUSION: Through the empirical study, we demonstrated that the genome-wide gene-gene interaction analysis using GWGGI could be accomplished within a reasonable time on a personal computer (i.e., ~3.5 hours for LRMW and ~10 hours for TAMW). We also showed that LRMW was suitable to detect interaction among a small number of genetic variants with moderate-to-strong marginal effect, while TAMW was useful to detect interaction among a larger number of low-marginal-effect genetic variants.
Affiliation(s)
- Changshuai Wei
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824 USA
  - Department of Biostatistics and Epidemiology, University of North Texas Health Science Center, Fort Worth, TX 76107 USA
- Qing Lu
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824 USA

10. Kundu S, Mihaescu R, Meijer CMC, Bakker R, Janssens ACJW. Estimating the predictive ability of genetic risk models in simulated data based on published results from genome-wide association studies. Front Genet 2014; 5:179. [PMID: 24982668 PMCID: PMC4056181 DOI: 10.3389/fgene.2014.00179] [Received: 03/26/2014] [Accepted: 05/27/2014]
Abstract
Background: There is increasing interest in investigating genetic risk models in empirical studies, but such studies are premature when the expected predictive ability of the risk model is low. We assessed how accurately the predictive ability of genetic risk models can be estimated in simulated data that are created based on the odds ratios (ORs) and frequencies of single-nucleotide polymorphisms (SNPs) obtained from genome-wide association studies (GWASs). Methods: We aimed to replicate published prediction studies that reported the area under the receiver operating characteristic curve (AUC) as a measure of predictive ability. We searched GWAS articles for all SNPs included in these models and extracted ORs and risk allele frequencies to construct genotypes and disease status for a hypothetical population. Using these hypothetical data, we reconstructed the published genetic risk models and compared their AUC values to those reported in the original articles. Results: The accuracy of the AUC values varied with the method used for the construction of the risk models. When logistic regression analysis was used to construct the genetic risk model, AUC values estimated by the simulation method were similar to the published values with a median absolute difference of 0.02 [range: 0.00, 0.04]. This difference was 0.03 [range: 0.01, 0.06] and 0.05 [range: 0.01, 0.08] for unweighted and weighted risk scores. Conclusions: The predictive ability of genetic risk models can be estimated using simulated data based on results from GWASs. Simulation methods can be useful to estimate the predictive ability in the absence of empirical data and to decide whether empirical investigation of genetic risk models is warranted.
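The general idea of building a hypothetical population from published odds ratios and risk allele frequencies can be sketched as follows. This is an illustrative simplification under stated assumptions, not the authors' simulation method: genotypes are drawn independently under Hardy-Weinberg proportions, per-allele effects are multiplicative on the odds scale, and `baseline_risk` is taken as the risk for a person carrying no risk alleles rather than the population disease risk. All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_population(n, snps, baseline_risk, seed=0):
    """Simulate predicted risks and disease status for n individuals.

    snps is a list of (odds_ratio, risk_allele_frequency) pairs.
    Assumptions (not from the article): independent SNPs, Hardy-Weinberg
    genotype proportions, and a logistic model with per-allele log odds.
    """
    rng = random.Random(seed)
    intercept = math.log(baseline_risk / (1 - baseline_risk))
    risks, statuses = [], []
    for _ in range(n):
        log_odds = intercept
        for odds_ratio, freq in snps:
            # Two independent allele draws give 0, 1, or 2 risk alleles.
            alleles = (rng.random() < freq) + (rng.random() < freq)
            log_odds += alleles * math.log(odds_ratio)
        risk = 1 / (1 + math.exp(-log_odds))
        risks.append(risk)
        statuses.append(1 if rng.random() < risk else 0)
    return risks, statuses


def auc_from_labels(risks, statuses):
    """AUC as the fraction of case-control pairs ranked correctly (ties = 0.5)."""
    cases = [r for r, s in zip(risks, statuses) if s == 1]
    controls = [r for r, s in zip(risks, statuses) if s == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Made-up odds ratios and allele frequencies, for illustration only.
    example_snps = [(1.5, 0.3), (1.2, 0.5), (1.1, 0.2)]
    risks, statuses = simulate_population(2000, example_snps, baseline_risk=0.1)
    print(round(auc_from_labels(risks, statuses), 3))
```

Because the simulated AUC depends on the random seed and sample size, repeated runs with different seeds give a sense of the sampling variability around any single estimate.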
Affiliation(s)
- Suman Kundu
  - Department of Epidemiology, Erasmus University Medical Center Rotterdam, Netherlands
- Raluca Mihaescu
  - Department of Epidemiology, Erasmus University Medical Center Rotterdam, Netherlands
- Catherina M C Meijer
  - Department of Epidemiology, Erasmus University Medical Center Rotterdam, Netherlands
- Rachel Bakker
  - Department of Epidemiology, Erasmus University Medical Center Rotterdam, Netherlands
- A Cecile J W Janssens
  - Department of Epidemiology, Erasmus University Medical Center Rotterdam, Netherlands
  - Department of Epidemiology, Rollins School of Public Health, Emory University Atlanta, GA, USA

11. Bridge: a GUI package for genetic risk prediction. BMC Genet 2013; 14:122. [PMID: 24359333 PMCID: PMC3878190 DOI: 10.1186/1471-2156-14-122] [Received: 05/24/2013] [Accepted: 12/11/2013]
Abstract
Background: Risk prediction models capitalizing on genetic and environmental information hold great promise for individualized disease prediction and prevention. Nevertheless, linking the genetic and environmental risk predictors into a useful risk prediction model remains a great challenge. To facilitate risk prediction analyses, we have developed a graphical user interface package, Bridge.
Results: The package is built for both designing and analyzing a risk prediction model. In the design stage, it provides an estimated classification accuracy of the model using essential genetic and environmental information gained from public resources and/or previous studies, and determines the sample size required to verify this accuracy. In the analysis stage, it adopts a robust and powerful algorithm to form the risk prediction model.
Conclusions: The package is developed based on the optimality theory of the likelihood ratio and therefore theoretically could form a model with high performance. It can be used to handle a relatively large number of genetic and environmental predictors, with consideration of their possible interactions, and so is particularly useful for studying risk prediction models for common complex diseases.
12. Scott IC, Seegobin SD, Steer S, Tan R, Forabosco P, Hinks A, Eyre S, Morgan AW, Wilson AG, Hocking LJ, Wordsworth P, Barton A, Worthington J, Cope AP, Lewis CM. Predicting the risk of rheumatoid arthritis and its age of onset through modelling genetic risk variants with smoking. PLoS Genet 2013; 9:e1003808. [PMID: 24068971 PMCID: PMC3778023 DOI: 10.1371/journal.pgen.1003808] [Received: 03/27/2013] [Accepted: 08/05/2013]
Abstract
The improved characterisation of risk factors for rheumatoid arthritis (RA) suggests they could be combined to identify individuals at increased disease risks in whom preventive strategies may be evaluated. We aimed to develop an RA prediction model capable of generating clinically relevant predictive data and to determine if it better predicted younger onset RA (YORA). Our novel modelling approach combined odds ratios for 15 four-digit/10 two-digit HLA-DRB1 alleles, 31 single nucleotide polymorphisms (SNPs) and ever-smoking status in males to determine risk using computer simulation and confidence interval based risk categorisation. Only males were evaluated in our models incorporating smoking as ever-smoking is a significant risk factor for RA in men but not women. We developed multiple models to evaluate each risk factor's impact on prediction. Each model's ability to discriminate anti-citrullinated protein antibody (ACPA)-positive RA from controls was evaluated in two cohorts: Wellcome Trust Case Control Consortium (WTCCC: 1,516 cases; 1,647 controls); UK RA Genetics Group Consortium (UKRAGG: 2,623 cases; 1,500 controls). HLA and smoking provided strongest prediction with good discrimination evidenced by an HLA-smoking model area under the curve (AUC) value of 0.813 in both WTCCC and UKRAGG. SNPs provided minimal prediction (AUC 0.660 WTCCC/0.617 UKRAGG). Whilst high individual risks were identified, with some cases having estimated lifetime risks of 86%, only a minority overall had substantially increased odds for RA. High risks from the HLA model were associated with YORA (P<0.0001); ever-smoking associated with older onset disease. This latter finding suggests smoking's impact on RA risk manifests later in life. Our modelling demonstrates that combining risk factors provides clinically informative RA prediction; additionally HLA and smoking status can be used to predict the risk of younger and older onset RA, respectively. 
Rheumatoid arthritis (RA) is a common, incurable disease with major individual and health service costs. Preventing its development is therefore an important goal. Being able to predict who will develop RA would allow researchers to look at ways to prevent it. Many factors have been found that increase someone's risk of RA. These are divided into genetic and environmental (such as smoking) factors. The risk of RA associated with each factor has previously been reported. Here, we demonstrate a method that combines these risk factors in a process called “prediction modelling” to estimate someone's lifetime risk of RA. We show, first, that our prediction models can identify people at very high risk of RA and, second, that they can be used to identify people at risk of developing RA at a younger age. Although these findings are an important first step towards preventing RA, our models could not be used to screen the general population, because only a minority of people tested had substantially increased disease risks. Instead they need testing in people already at risk of RA, such as relatives of affected patients. In this context they could identify sufficient numbers of high-risk people to allow preventive methods to be evaluated.
Affiliation(s)
- Ian C. Scott
- Academic Department of Rheumatology, Centre for Molecular and Cellular Biology of Inflammation, King's College London, London, United Kingdom
- Department of Medical and Molecular Genetics, King's College London, London, United Kingdom
- Seth D. Seegobin
- Department of Medical and Molecular Genetics, King's College London, London, United Kingdom
- Sophia Steer
- Department of Rheumatology, King's College Hospital, London, United Kingdom
- Rachael Tan
- Academic Department of Rheumatology, Centre for Molecular and Cellular Biology of Inflammation, King's College London, London, United Kingdom
- Paola Forabosco
- Istituto di Genetica delle Popolazioni, Consiglio Nazionale delle Ricerche, Sassari, Italy
- Anne Hinks
- Arthritis Research UK Epidemiology Unit, Centre for Musculoskeletal Research, Institute of Inflammation and Repair, The University of Manchester, Manchester, United Kingdom
- Stephen Eyre
- Arthritis Research UK Epidemiology Unit, Centre for Musculoskeletal Research, Institute of Inflammation and Repair, The University of Manchester, Manchester, United Kingdom
- Ann W. Morgan
- Division of Musculoskeletal Disease, Leeds Institute of Molecular Medicine, University of Leeds and National Institute for Health Research – Leeds Musculoskeletal Biomedical Research Unit, Leeds, United Kingdom
- Anthony G. Wilson
- Academic Unit of Rheumatology, Department of Infection and Immunity, University of Sheffield Medical School, Sheffield, United Kingdom
- Lynne J. Hocking
- Musculoskeletal Research Programme, Division of Applied Medicine, University of Aberdeen, Institute of Medical Sciences, Foresterhill, Aberdeen, United Kingdom
- Paul Wordsworth
- NIHR Oxford Musculoskeletal BRU, Nuffield Orthopaedic Centre, Oxford, United Kingdom
- Anne Barton
- Arthritis Research UK Epidemiology Unit, Centre for Musculoskeletal Research, Institute of Inflammation and Repair, The University of Manchester, Manchester, United Kingdom
- Jane Worthington
- Arthritis Research UK Epidemiology Unit, Centre for Musculoskeletal Research, Institute of Inflammation and Repair, The University of Manchester, Manchester, United Kingdom
- Andrew P. Cope
- Academic Department of Rheumatology, Centre for Molecular and Cellular Biology of Inflammation, King's College London, London, United Kingdom
- Cathryn M. Lewis
- Department of Medical and Molecular Genetics, King's College London, London, United Kingdom
- Social, Genetic and Developmental Psychiatry Centre (MRC), Institute of Psychiatry, London, United Kingdom
|
13
|
Echouffo-Tcheugui JB, Dieffenbach SD, Kengne AP. Added value of novel circulating and genetic biomarkers in type 2 diabetes prediction: a systematic review. Diabetes Res Clin Pract 2013; 101:255-69. [PMID: 23647943 DOI: 10.1016/j.diabres.2013.03.023] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 07/19/2012] [Revised: 10/13/2012] [Accepted: 03/15/2013] [Indexed: 02/02/2023]
Abstract
AIMS To provide a systematic overview of the added value of novel circulating and genetic biomarkers in predicting type 2 diabetes (T2DM). METHODS We searched MEDLINE and EMBASE (January 2000 to September 2012) for studies that reported a measure of improvement in the performance of T2DM risk prediction models subsequent to adding novel biomarkers to traditional risk factors. We extracted data on study methods and metrics of incremental predictive value of novel biomarkers. RESULTS We included 34 publications from 30 studies. All studies reported a change in the area under the receiver-operating characteristic curve, which was modest, ranging from -0.004 to 0.1, with claims of statistically significant improvements in eleven studies. The net reclassification index was evaluated in 11 studies, and ranged from -2.2% to 10.2% after inclusion of genetic markers in six studies (statistically significant in two cases), and from -0.5% to 27.5% after inclusion of non-genetic markers in five studies (non-significant in two studies). The integrated discrimination index (0-2.04) was reported in eight studies, being statistically significant in five of these. CONCLUSIONS Currently known novel circulating and genetic biomarkers do not substantially improve T2DM risk prediction above and beyond the ability of traditional risk factors.
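The net reclassification index mentioned in this abstract can be illustrated with a minimal sketch. It assumes a two-category version with a single risk threshold; the function name, the `threshold` default of 0.1, and the toy inputs are illustrative assumptions, not values taken from the review.

```python
def net_reclassification_index(old_risk, new_risk, outcome, threshold=0.1):
    """Two-category NRI (illustrative sketch, hypothetical threshold).

    NRI = (P(up | event) - P(down | event))
        + (P(down | non-event) - P(up | non-event)),
    where "up"/"down" mean crossing the risk threshold after new markers are added.
    """
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = n_nonevent = 0
    for old, new, y in zip(old_risk, new_risk, outcome):
        old_high = old >= threshold
        new_high = new >= threshold
        moved_up = new_high and not old_high
        moved_down = old_high and not new_high
        if y == 1:
            n_event += 1
            up_event += moved_up
            down_event += moved_down
        else:
            n_nonevent += 1
            up_nonevent += moved_up
            down_nonevent += moved_down
    return ((up_event - down_event) / n_event
            + (down_nonevent - up_nonevent) / n_nonevent)


# Toy data: one of two cases moves up, one of two non-cases moves down.
old = [0.05, 0.20, 0.05, 0.20]
new = [0.15, 0.20, 0.05, 0.05]
status = [1, 1, 0, 0]
nri = net_reclassification_index(old, new, status)
```

A positive value means the added markers move cases towards the high-risk category and non-cases towards the low-risk category more often than the reverse.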
Affiliation(s)
- Justin B Echouffo-Tcheugui
- Hubert Department of Global Health, Rollins School of Public Health, Emory University, 1518 Clifton Road, Northeast Atlanta, GA 30322, USA.
|
14
|
Lu Q, Wei C, Ye C, Li M, Elston RC. A likelihood ratio-based Mann-Whitney approach finds novel replicable joint gene action for type 2 diabetes. Genet Epidemiol 2012; 36:583-93. [PMID: 22760990 PMCID: PMC3634342 DOI: 10.1002/gepi.21651] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 01/12/2012] [Revised: 04/09/2012] [Accepted: 05/09/2012] [Indexed: 12/29/2022]
Abstract
The potential importance of the joint action of genes, whether modeled with or without a statistical interaction term, has long been recognized. However, identifying such action has been a great challenge, especially when millions of genetic markers are involved. We propose a likelihood ratio-based Mann-Whitney test to search for joint gene action either among candidate genes or genome-wide. It extends the traditional univariate Mann-Whitney test to assess the joint association of genotypes at multiple loci with disease, allowing for high-order statistical interactions. Because only one overall significance test is conducted for the entire analysis, it avoids the issue of multiple testing. Moreover, the approach adopts a computationally efficient algorithm, making a genome-wide search feasible in a reasonable amount of time on a high performance personal computer. We evaluated the approach using both theoretical and real data. By applying the approach to 40 type 2 diabetes (T2D) susceptibility single-nucleotide polymorphisms (SNPs), we identified a four-locus model strongly associated with T2D in the Wellcome Trust (WT) study (permutation P-value < 0.001), and replicated the same finding in the Nurses' Health Study/Health Professionals Follow-Up Study (NHS/HPFS) (P-value = 3.03×10⁻¹¹). We also conducted a genome-wide search on 385,598 SNPs in the WT study. The analysis took approximately 55 hr on a personal computer, identifying the same first two loci, but overall a different set of four SNPs, jointly associated with T2D (P-value = 1.29×10⁻⁵). The nominal significance of this same association reached 4.01×10⁻⁶ in the NHS/HPFS.
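For orientation, the traditional univariate Mann-Whitney statistic that this approach extends can be sketched as follows. This is not the authors' likelihood ratio-based multi-locus method; it only shows the standard pairwise comparison of case and control scores, whose normalised value equals the AUC. The function name and toy scores are illustrative assumptions.

```python
def auc_mann_whitney(case_scores, control_scores):
    """Normalised Mann-Whitney U: the share of case/control pairs in which
    the case has the higher score, with ties counted as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy scores: three of four pairs favour the case, one pair is tied.
auc = auc_mann_whitney([3, 2], [1, 2])
```

The double loop is quadratic in sample size; it is kept here for readability rather than efficiency.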
Affiliation(s)
- Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan
- Changshuai Wei
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan
- Chengyin Ye
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan
- Ming Li
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan
- Robert C. Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio
|
15
|
Westhoff CL, Petrie KA, Cremers S. Using changes in binding globulins to assess oral contraceptive compliance. Contraception 2012; 87:176-81. [PMID: 22795088 DOI: 10.1016/j.contraception.2012.06.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 04/23/2012] [Revised: 06/05/2012] [Accepted: 06/06/2012] [Indexed: 01/29/2023]
Abstract
BACKGROUND Validity of oral contraceptive pill (OCP) clinical trial results depends on participant compliance. Ethinyl estradiol (EE2) induces increases in hepatic binding globulin (BG) levels. Measuring these BG increases may provide an effective and convenient approach to distinguish noncompliant from compliant OCP users in research settings. This analysis evaluated the usefulness of measuring increases in corticosteroid-, sex-hormone- and thyroxine-binding globulins (CBG, SHBG and TBG, respectively) as measures of OCP compliance. METHODS We used frozen serum from a trial that compared ovarian suppression between normal-weight and obese women randomized to one of two OCPs containing EE2 and levonorgestrel (LNG). Based on serial LNG measurements during the trial, 17% of participants were noncompliant. We matched noncompliant participants with compliant participants by age, body mass index, ethnicity and OCP formulation. We measured CBG, SHBG and TBG levels and compared change from baseline to 3-month follow-up between the noncompliant and compliant participants. Construction of receiver operator characteristic (ROC) curves allowed comparison of various BG measures. RESULTS Changes in CBG and TBG distinguished OCP noncompliant users from compliant users [area under the ROC curve (AUROC), 0.86 and 0.89, p<.01]. Changes in SHBG were less discriminating (AUROC 0.69). CONCLUSIONS EE2-induced increases in CBG and TBG provide a sensitive integrated marker of compliance with an LNG-containing OCP.
Affiliation(s)
- Carolyn L Westhoff
- Department of Obstetrics and Gynecology, Columbia University Medical Center, New York, NY 10032, USA.
|
16
|
Analytical and simulation methods for estimating the potential predictive ability of genetic profiling: a comparison of methods and results. Eur J Hum Genet 2012; 20:1270-4. [PMID: 22643180 DOI: 10.1038/ejhg.2012.89] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/06/2023]
Abstract
Various modeling methods have been proposed to estimate the potential predictive ability of polygenic risk variants that predispose to various common diseases. However, it is unknown whether differences between them affect their conclusions on predictive ability. We reviewed input parameters, assumptions and output of the five most common methods and compared their estimates of the area under the receiver operating characteristic (ROC) curve (AUC) using hypothetical data representing effect sizes and frequencies of genetic variants, population disease risk and number of variants. To assess the accuracy of the estimated AUCs, we aimed to reproduce the AUCs of published empirical studies. All methods assumed that the combined effect of genetic variants on disease risk followed a multiplicative risk model of independent genetic effects, but they either assumed per allele, per genotype or dominant/recessive effects for the genetic variants. Modeling strategy and input parameters differed. Methods used simulation analysis or analytical formulas with effect sizes quantified by odds ratios (ORs) or relative risks. Estimated AUC values were similar for lower ORs (<1.2). When AUCs were larger (>0.7) due to variants with strong effects, differences in estimated AUCs between methods increased. The simulation methods accurately reproduced the AUC values of empirical studies, but the analytical methods did not. We conclude that despite differences in input parameters, the modeling methods estimate similar AUC for realistic values of the ORs. When one or more variants have stronger effects and AUC values are higher, the simulation methods tend to be more accurate.
|
17
|
Abstract
People vary genetically in their susceptibility to the effects of environmental risk factors for many diseases. Genetic variation also underlies the extent to which people respond appropriately to clinical therapies. Defining the basis of the interactions between the genome and the environment may help elucidate the biologic basis of diseases such as type 2 diabetes, as well as help target preventive therapies and treatments. This review 1) examines some of the most current evidence on gene × environment interactions in relation to type 2 diabetes; 2) outlines how the availability of information on gene × environment interactions might help improve the prevention and treatment of type 2 diabetes; and 3) discusses existing and emerging strategies that might enhance our ability to detect and exploit gene × environment interactions in complex disease traits.
Affiliation(s)
- Paul W Franks
- Department of Clinical Sciences, Genetic & Molecular Epidemiology Unit, Skåne University Hospital Malmö, 205 02 Malmö, Sweden.
|
18
|
Wei C, Lu Q. Collapsing ROC approach for risk prediction research on both common and rare variants. BMC Proc 2011; 5 Suppl 9:S42. [PMID: 22373267 PMCID: PMC3287879 DOI: 10.1186/1753-6561-5-s9-s42] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Risk prediction that capitalizes on emerging genetic findings holds great promise for improving public health and clinical care. However, recent risk prediction research has shown that predictive tests formed on existing common genetic loci, including those from genome-wide association studies, have lacked sufficient accuracy for clinical use. Because most rare variants on the genome have not yet been studied for their role in risk prediction, future disease prediction discoveries should shift toward a more comprehensive risk prediction strategy that takes into account both common and rare variants. We are proposing a collapsing receiver operating characteristic (CROC) approach for risk prediction research on both common and rare variants. The new approach is an extension of a previously developed forward ROC (FROC) approach, with additional procedures for handling rare variants. The approach was evaluated through the use of 533 single-nucleotide polymorphisms (SNPs) in 37 candidate genes from the Genetic Analysis Workshop 17 mini-exome data set. We found that a prediction model built on all SNPs gained more accuracy (AUC = 0.605) than one built on common variants alone (AUC = 0.585). We further evaluated the performance of two approaches by gradually reducing the number of common variants in the analysis. We found that the CROC method attained more accuracy than the FROC method when the number of common variants in the data decreased. In an extreme scenario, when there are only rare variants in the data, the CROC reached an AUC value of 0.603, whereas the FROC had an AUC value of 0.524.
Affiliation(s)
- Changshuai Wei
- Department of Epidemiology, Michigan State University, East Lansing, MI 48824, USA.
|
19
|
Calle ML, Urrea V, Boulesteix AL, Malats N. AUC-RF: a new strategy for genomic profiling with random forest. Hum Hered 2011; 72:121-32. [PMID: 21996641 DOI: 10.1159/000330778] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.3] [Received: 02/18/2011] [Accepted: 07/11/2011] [Indexed: 12/23/2022]
Abstract
OBJECTIVE Genomic profiling, the use of genetic variants at multiple loci simultaneously for the prediction of disease risk, requires the selection of a set of genetic variants that best predicts disease status. The goal of this work was to provide a new selection algorithm for genomic profiling. METHODS We propose a new algorithm for genomic profiling based on optimizing the area under the receiver operating characteristic curve (AUC) of the random forest (RF). The proposed strategy implements a backward elimination process based on the initial ranking of variables. RESULTS AND CONCLUSIONS We demonstrate the advantage of using the AUC instead of the classification error as a measure of predictive accuracy of RF. In particular, we show that the use of the classification error is especially inappropriate when dealing with unbalanced data sets. The new procedure for variable selection and prediction, namely AUC-RF, is illustrated with data from a bladder cancer study and also with simulated data. The algorithm is publicly available as an R package, named AUCRF, at http://cran.r-project.org/.
Affiliation(s)
- M Luz Calle
- Systems Biology Department, University of Vic, Spain. malu.calle @ uvic.cat
|
20
|
Nakaoka H, Cui T, Tajima A, Oka A, Mitsunaga S, Kashiwase K, Homma Y, Sato S, Suzuki Y, Inoko H, Inoue I. A systems genetics approach provides a bridge from discovered genetic variants to biological pathways in rheumatoid arthritis. PLoS One 2011; 6:e25389. [PMID: 21980439 PMCID: PMC3182219 DOI: 10.1371/journal.pone.0025389] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 07/11/2011] [Accepted: 09/02/2011] [Indexed: 11/18/2022]
Abstract
Genome-wide association studies (GWAS) have yielded novel genetic loci underlying common diseases. We propose a systems genetics approach to utilize these discoveries for better understanding of the genetic architecture of rheumatoid arthritis (RA). Current evidence of genetic associations with RA was sought through PubMed and the NHGRI GWAS catalog. The associations of 15 single nucleotide polymorphisms and HLA-DRB1 alleles were confirmed in 1,287 cases and 1,500 controls of Japanese subjects. Among these, HLA-DRB1 alleles and eight SNPs showed significant associations and all but one of the variants had the same direction of effect as identified in the previous studies, indicating that the genetic risk factors underlying RA are shared across populations. By receiver operating characteristic curve analysis, the area under the curve (AUC) for the genetic risk score based on the selected variants was 68.4%. For seropositive RA patients only, the AUC improved to 70.9%, indicating good but suboptimal predictive ability. A simulation study shows that more than 200 additional loci with similar effect size as recent GWAS findings or 20 rare variants with intermediate effects are needed to achieve AUC = 80.0%. We performed the random walk with restart (RWR) algorithm to prioritize genes for future mapping studies. The performance of the algorithm was confirmed by leave-one-out cross-validation. The RWR algorithm pointed to ZAP70 in the first rank, in which mutation causes RA-like autoimmune arthritis in mice. By applying the hierarchical clustering method to a subnetwork comprising RA-associated genes and top-ranked genes by the RWR, we found three functional modules relevant to RA etiology: "leukocyte activation and differentiation", "pattern-recognition receptor signaling pathway", and "chemokines and their receptors". These results suggest that the systems genetics approach is useful to find directions of future mapping strategies to illuminate biological pathways.
Affiliation(s)
- Hirofumi Nakaoka
- Division of Human Genetics, Department of Integrated Genetics, National Institute of Genetics, Mishima, Shizuoka, Japan
- Tailin Cui
- Division of Molecular Life Science, School of Medicine, Tokai University, Isehara, Kanagawa, Japan
- Atsushi Tajima
- Division of Molecular Life Science, School of Medicine, Tokai University, Isehara, Kanagawa, Japan
- Department of Human Genetics and Public Health, Institute of Health Biosciences, The University of Tokusima Graduate School, Tokushima, Tokushima, Japan
- Akira Oka
- Division of Molecular Life Science, School of Medicine, Tokai University, Isehara, Kanagawa, Japan
- Shigeki Mitsunaga
- Division of Molecular Life Science, School of Medicine, Tokai University, Isehara, Kanagawa, Japan
- Koichi Kashiwase
- Department of Laboratory, Japanese Red Cross Tokyo Blood Center, Koto-ku, Tokyo, Japan
- Yasuhiko Homma
- Department of Clinical Health Science, Tokai University School of Medicine, Isehara, Kanagawa, Japan
- Shinji Sato
- Department of Internal Medicine, Division of Rheumatology, Tokai University School of Medicine, Isehara, Kanagawa, Japan
- Yasuo Suzuki
- Department of Internal Medicine, Division of Rheumatology, Tokai University School of Medicine, Isehara, Kanagawa, Japan
- Hidetoshi Inoko
- Division of Molecular Life Science, School of Medicine, Tokai University, Isehara, Kanagawa, Japan
- Ituro Inoue
- Division of Human Genetics, Department of Integrated Genetics, National Institute of Genetics, Mishima, Shizuoka, Japan
- Division of Molecular Life Science, School of Medicine, Tokai University, Isehara, Kanagawa, Japan
|
21
|
Torkamani A, Scott-Van Zeeland AA, Topol EJ, Schork NJ. Annotating individual human genomes. Genomics 2011; 98:233-41. [PMID: 21839162 DOI: 10.1016/j.ygeno.2011.07.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 05/25/2011] [Accepted: 07/26/2011] [Indexed: 02/03/2023]
Abstract
Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants.
|
22
|
Ye C, Cui Y, Wei C, Elston RC, Zhu J, Lu Q. A Non-Parametric Method for Building Predictive Genetic Tests on High-Dimensional Data. Hum Hered 2011; 71:161-70. [DOI: 10.1159/000327299] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 10/27/2010] [Accepted: 03/10/2011] [Indexed: 12/12/2022]
|
23
|
Kang J, Kugathasan S, Georges M, Zhao H, Cho JH. Improved risk prediction for Crohn's disease with a multi-locus approach. Hum Mol Genet 2011; 20:2435-42. [PMID: 21427131 PMCID: PMC3298027 DOI: 10.1093/hmg/ddr116] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 11/01/2010] [Revised: 01/28/2011] [Accepted: 03/16/2011] [Indexed: 12/14/2022]
Abstract
Genome-wide association studies have identified numerous loci demonstrating genome-wide significant association with Crohn's disease. However, when many single nucleotide polymorphisms (SNPs) have weak-to-moderate disease risks, genetic risk prediction models based only on those markers that pass the most stringent statistical significance testing threshold may be suboptimal. Haplotype-based predictive models may provide advantages over single-SNP approaches by facilitating detection of associations driven by cis-interactions among nearby SNPs. In addition, these approaches may be helpful in assaying non-genotyped, rare causal variants. In this study, we investigated the use of two-marker haplotypes for risk prediction in Crohn's disease and show that it leads to improved prediction accuracy compared with single-point analyses. With large numbers of predictors, traditional classification methods such as logistic regression and support vector machine approaches may be suboptimal. An alternative approach is to apply the risk-score method calculated as the number of risk haplotypes an individual carries, both within and across loci. We used the area under the curve (AUC) of the receiver operating curve to assess the performance of prediction models in large-scale genetic data, and observed that the prediction performance in the validation cohort continues to improve as thousands of haplotypes are included in the model, with the AUC reaching its plateau at 0.72 at ∼7000 haplotypes, and begins to gradually decline after that point. In contrast, using the SNP as predictors, we only obtained maximum AUC of 0.65. Validation studies in independent cohorts further support improved prediction capacity with multi-marker, as opposed to single marker analyses.
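The risk-score method described in this abstract, a count of the risk haplotypes an individual carries within and across loci, might look roughly like the sketch below. The data layout (a dictionary from locus name to the pair of haplotypes carried), the function name, and the haplotype labels are assumptions made for illustration; the study's actual encoding is not given here.

```python
def haplotype_risk_score(individual, risk_haplotypes):
    """Count the risk haplotypes an individual carries, summed over loci.

    individual: dict mapping locus name -> tuple of the two haplotypes carried
    risk_haplotypes: dict mapping locus name -> set of haplotypes deemed risky
    (Both layouts are hypothetical, chosen for this illustration.)
    """
    score = 0
    for locus, carried in individual.items():
        risky = risk_haplotypes.get(locus, set())
        # Within a locus, each carried copy of a risk haplotype counts once.
        score += sum(1 for hap in carried if hap in risky)
    return score


person = {"locus1": ("AG", "AG"), "locus2": ("CT", "TT")}
risk = {"locus1": {"AG"}, "locus2": {"TT"}}
score = haplotype_risk_score(person, risk)
```

Because the score is an unweighted count, it needs no per-haplotype effect estimates, which is one plausible reason such a score remains usable when thousands of predictors are included.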
Affiliation(s)
- Jia Kang
- Department of Epidemiology and Public Health and
- Subra Kugathasan
- Pediatrics and Human Genetics, Emory University, Atlanta, GA, USA and
- Hongyu Zhao
- Department of Epidemiology and Public Health and
- Judy H. Cho
- Department of Medicine and Genetics, Yale University, New Haven, CT, USA
|
24
|
Li M, Ye C, Fu W, Elston RC, Lu Q. Detecting genetic interactions for quantitative traits with U-statistics. Genet Epidemiol 2011; 35:457-68. [PMID: 21618602 DOI: 10.1002/gepi.20594] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 01/25/2011] [Revised: 03/09/2011] [Accepted: 04/19/2011] [Indexed: 11/08/2022]
Abstract
The genetic etiology of complex human diseases has been commonly viewed as a process that involves multiple genetic variants, environmental factors, as well as their interactions. Statistical approaches, such as the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR), have recently been proposed to test the joint association of multiple genetic variants with either dichotomous or continuous traits. In this study, we propose a novel Forward U-Test to evaluate the combined effect of multiple loci on quantitative traits with consideration of gene-gene/gene-environment interactions. In this new approach, a U-Statistic-based forward algorithm is first used to select potential disease-susceptibility loci and then a weighted U-statistic is used to test the joint association of the selected loci with the disease. Through a simulation study, we found the Forward U-Test outperformed GMDR in terms of greater power. Aside from that, our approach is less computationally intensive, making it feasible for high-dimensional gene-gene/gene-environment research. We illustrate our method with a real data application to nicotine dependence (ND), using three independent datasets from the Study of Addiction: Genetics and Environment. Our gene-gene interaction analysis of 155 SNPs in 67 candidate genes identified two SNPs, rs16969968 within gene CHRNA5 and rs1122530 within gene NTRK2, jointly associated with the level of ND (P-value = 5.31e-7). The association, which involves essential interaction, is replicated in two independent datasets with P-values of 1.08e-5 and 0.02, respectively. Our finding suggests that joint action may exist between the two gene products.
Affiliation(s)
- Ming Li
- Department of Epidemiology, Michigan State University, East Lansing, MI 48824, USA
|
25
|
Fang S, Fang X, Xiong M. Psoriasis prediction from genome-wide SNP profiles. BMC Dermatology 2011; 11:1. [PMID: 21214922 PMCID: PMC3022824 DOI: 10.1186/1471-5945-11-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 04/06/2010] [Accepted: 01/07/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND With the availability of large-scale genome-wide association study (GWAS) data, choosing an optimal set of SNPs for disease susceptibility prediction is a challenging task. This study aimed to use single nucleotide polymorphisms (SNPs), selected by searching GWAS data, to predict psoriasis. METHODS In total, we had 2,798 samples and 451,724 SNPs. The search for a set of SNPs to predict susceptibility to psoriasis consisted of two steps. The first was to search the GWAS dataset for the top 1,000 SNPs with high accuracy for predicting psoriasis. The second was to search for an optimal SNP subset for predicting psoriasis. The sequential information bottleneck (sIB) method was compared with classical linear discriminant analysis (LDA) for classification performance. RESULTS The best test harmonic mean of sensitivity and specificity for predicting psoriasis by sIB was 0.674 (95% CI: 0.650-0.698), while only 0.520 (95% CI: 0.472-0.524) was reported for predicting disease by LDA. Our results indicate that the new classifier sIB performs better than LDA in the study. CONCLUSIONS The fact that a small set of SNPs can predict disease status with average accuracy of 68% makes it possible to use SNP data for psoriasis prediction.
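The evaluation metric named in the results, the harmonic mean of sensitivity and specificity, can be computed as in the sketch below. The function name and the toy labels are illustrative assumptions and are unrelated to the study's data.

```python
def harmonic_mean_sens_spec(y_true, y_pred):
    """Harmonic mean of sensitivity and specificity for binary labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


# Toy labels: sensitivity 3/4, specificity 2/4.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 1]
hm = harmonic_mean_sens_spec(truth, predicted)
```

Unlike overall accuracy, this quantity is pulled towards the weaker of the two rates, so a classifier cannot score well by favouring one class.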
Affiliation(s)
- Shenying Fang
- Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
- Xiangzhong Fang
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
- School of Mathematical Sciences, Peking University, Beijing 100871, P.R. China
- Momiao Xiong
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
|
26
|
Padhukasahasram B, Halperin E, Wessel J, Thomas DJ, Silver E, Trumbower H, Cargill M, Stephan DA. Presymptomatic risk assessment for chronic non-communicable diseases. PLoS One 2010; 5:e14338. [PMID: 21217814 PMCID: PMC3013091 DOI: 10.1371/journal.pone.0014338] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 06/17/2010] [Accepted: 11/15/2010] [Indexed: 11/19/2022]
Abstract
The prevalence of common chronic non-communicable diseases (CNCDs) far overshadows the prevalence of both monogenic and infectious diseases combined. All CNCDs, also called complex genetic diseases, have a heritable genetic component that can be used for pre-symptomatic risk assessment. Common single nucleotide polymorphisms (SNPs) that tag risk haplotypes across the genome currently account for a non-trivial portion of the germ-line genetic risk and we will likely continue to identify the remaining missing heritability in the form of rare variants, copy number variants and epigenetic modifications. Here, we describe a novel measure for calculating the lifetime risk of a disease, called the genetic composite index (GCI), and demonstrate its predictive value as a clinical classifier. The GCI only considers summary statistics of the effects of genetic variation and hence does not require the results of large-scale studies simultaneously assessing multiple risk factors. Combining GCI scores with environmental risk information provides an additional tool for clinical decision-making. The GCI can be populated with heritable risk information of any type, and thus represents a framework for CNCD pre-symptomatic risk assessment that can be populated as additional risk information is identified through next-generation technologies.
27
So HC, Sham PC. A unifying framework for evaluating the predictive power of genetic variants based on the level of heritability explained. PLoS Genet 2010; 6:e1001230. [PMID: 21151957 PMCID: PMC2996330 DOI: 10.1371/journal.pgen.1001230] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 06/29/2010] [Accepted: 10/31/2010] [Indexed: 01/31/2023]
Abstract
An increasing number of genetic variants have been identified for many complex diseases. However, it is controversial whether risk prediction based on genomic profiles will be useful clinically. Appropriate statistical measures to evaluate the performance of genetic risk prediction models are required. Previous studies have mainly focused on the use of the area under the receiver operating characteristic (ROC) curve, or AUC, to judge the predictive value of genetic tests. However, AUC has its limitations and should be complemented by other measures. In this study, we develop a novel unifying statistical framework that connects a large variety of predictive indices together. We showed that, given the overall disease probability and the level of variance in total liability (or heritability) explained by the genetic variants, we can estimate analytically a large variety of prediction metrics, for example the AUC, the mean risk difference between cases and non-cases, the net reclassification improvement (ability to reclassify people into high- and low-risk categories), the proportion of cases explained by a specific percentile of population at the highest risk, the variance of predicted risks, and the risk at any percentile. We also demonstrate how to construct graphs to visualize the performance of risk models, such as the ROC curve, the density of risks, and the predictiveness curve (disease risk plotted against risk percentile). The results from simulations match very well with our theoretical estimates. Finally we apply the methodology to nine complex diseases, evaluating the predictive power of genetic tests based on known susceptibility variants for each trait.
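The predictiveness curve mentioned in this abstract (disease risk plotted against risk percentile) and the "proportion of cases explained by a specific percentile of population at the highest risk" can be illustrated with a minimal Python sketch. This is not the authors' analytic framework: it is an empirical version computed from a list of predicted risks, and the function names and the ten risk values are hypothetical, chosen only for illustration.

```python
def predictiveness_curve(risks):
    """Return (percentile, risk) pairs, with individuals ordered from lowest to highest risk."""
    ordered = sorted(risks)
    n = len(ordered)
    return [((i + 1) / n, r) for i, r in enumerate(ordered)]


def share_of_cases_in_top(risks, top_fraction):
    """Expected share of all cases found in the highest-risk `top_fraction` of the population."""
    ordered = sorted(risks, reverse=True)
    k = round(top_fraction * len(ordered))
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical predicted risks for ten individuals (illustrative values only).
risks = [0.01, 0.02, 0.02, 0.05, 0.05, 0.10, 0.10, 0.20, 0.40, 0.55]
curve = predictiveness_curve(risks)
top_share = share_of_cases_in_top(risks, 0.2)
```

A steep curve (most people near zero risk, a few near one) indicates a model that separates the population well; a flat curve at the disease prevalence indicates an uninformative model.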
Affiliation(s)
- Hon-Cheong So
- Department of Psychiatry, University of Hong Kong, Hong Kong, China
- Pak C. Sham
- Department of Psychiatry, University of Hong Kong, Hong Kong, China
- Genome Research Centre, University of Hong Kong, Hong Kong, China
- State Key Laboratory of Brain and Cognitive Sciences, University of Hong Kong, Hong Kong, China
28
Haritunians T, Taylor KD, Targan SR, Dubinsky M, Ippoliti A, Kwon S, Guo X, Melmed GY, Berel D, Mengesha E, Psaty BM, Glazer NL, Vasiliauskas EA, Rotter JI, Fleshner PR, McGovern DPB. Genetic predictors of medically refractory ulcerative colitis. Inflamm Bowel Dis 2010; 16:1830-40. [PMID: 20848476 PMCID: PMC2959149 DOI: 10.1002/ibd.21293] [Citation(s) in RCA: 119] [Impact Index Per Article: 8.5] [Indexed: 02/06/2023]
Abstract
BACKGROUND Acute severe ulcerative colitis (UC) remains a significant clinical challenge and the ability to predict, at an early stage, those individuals at risk of colectomy for medically refractory UC (MR-UC) would be a major clinical advance. The aim of this study was to use a genome-wide association study (GWAS) in a well-characterized cohort of UC patients to identify genetic variation that contributes to MR-UC. METHODS A GWAS comparing 324 MR-UC patients with 537 non-MR-UC patients was analyzed using logistic regression and Cox proportional hazards methods. In addition, the MR-UC patients were compared with 2601 healthy controls. RESULTS MR-UC was associated with more extensive disease (P = 2.7 × 10^-6) and a positive family history of UC (P = 0.004). A risk score based on the combination of 46 single nucleotide polymorphisms (SNPs) associated with MR-UC explained 48% of the variance for colectomy risk in our cohort. Risk scores divided into quarters showed the risk of colectomy to be 0%, 17%, 74%, and 100% in the four groups. Comparison of the MR-UC subjects with healthy controls confirmed the contribution of the major histocompatibility complex to severe UC (peak association: rs17207986, P = 1.4 × 10^-16) and provided genome-wide suggestive association at the TNFSF15 (TL1A) locus (peak association: rs11554257, P = 1.4 × 10^-6). CONCLUSIONS A SNP-based risk scoring system, identified here by GWAS analyses, may provide a useful adjunct to clinical parameters for predicting the natural history of UC. Furthermore, discovery of genetic processes underlying disease severity may help to identify pathways for novel therapeutic intervention in severe UC.
Affiliation(s)
- Talin Haritunians
- Medical Genetics Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA.
29
Lu Q, Obuchowski N, Won S, Zhu X, Elston RC. Using the optimal robust receiver operating characteristic (ROC) curve for predictive genetic tests. Biometrics 2010; 66:586-93. [PMID: 19508241 PMCID: PMC3039874 DOI: 10.1111/j.1541-0420.2009.01278.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
Abstract
Current ongoing genome-wide association (GWA) studies represent a powerful approach to uncover common unknown genetic variants causing common complex diseases. The discovery of these genetic variants offers an important opportunity for early disease prediction, prevention, and individualized treatment. We describe here a method of combining multiple genetic variants for early disease prediction, based on the optimality theory of the likelihood ratio (LR). Such theory simply shows that the receiver operating characteristic (ROC) curve based on the LR has maximum performance at each cutoff point and that the area under the ROC curve so obtained is highest among that of all approaches. Through simulations and a real data application, we compared it with the commonly used logistic regression and classification tree approaches. The three approaches show similar performance if we know the underlying disease model. However, for most common diseases we have little prior knowledge of the disease model and in this situation the new method has an advantage over logistic regression and classification tree approaches. We applied the new method to the type 1 diabetes GWA data from the Wellcome Trust Case Control Consortium. Based on five single nucleotide polymorphisms, the test reaches medium level classification accuracy. With more genetic findings to be discovered in the future, we believe a predictive genetic test for type 1 diabetes can be successfully constructed and eventually implemented for clinical use.
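The likelihood-ratio idea behind this abstract can be sketched in Python as follows: each genotype is scored by LR = P(genotype | case) / P(genotype | control), genotypes are ordered from highest to lowest LR, and ROC points are accumulated in that order. This is only an illustration of LR ranking under simplifying assumptions, not the authors' full "optimal robust" procedure; the function names and the genotype counts are hypothetical.

```python
def lr_roc_points(case_counts, control_counts):
    """ROC points obtained by thresholding genotypes in decreasing likelihood-ratio order."""
    n_case = sum(case_counts.values())
    n_ctrl = sum(control_counts.values())

    def likelihood_ratio(genotype):
        p_case = case_counts.get(genotype, 0) / n_case
        p_ctrl = control_counts.get(genotype, 0) / n_ctrl
        return float("inf") if p_ctrl == 0 else p_case / p_ctrl

    genotypes = sorted(set(case_counts) | set(control_counts),
                       key=likelihood_ratio, reverse=True)
    points = [(0.0, 0.0)]  # (false positive rate, true positive rate)
    tp = fp = 0
    for g in genotypes:
        tp += case_counts.get(g, 0)
        fp += control_counts.get(g, 0)
        points.append((fp / n_ctrl, tp / n_case))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve given as (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical genotype counts at a single locus (illustrative values only).
cases = {"AA": 10, "Aa": 40, "aa": 50}
controls = {"AA": 40, "Aa": 40, "aa": 20}
points = lr_roc_points(cases, controls)
auc = trapezoid_auc(points)
```

With several loci, the same ordering would be applied to multi-locus genotype combinations rather than single-locus genotypes.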
Affiliation(s)
- Qing Lu
- Department of Epidemiology, Michigan State University, East Lansing, Michigan 48823, U.S.A
- Nancy Obuchowski
- Department of Quantitative Health Sciences, Cleveland Clinic, 9500 Euclid Ave., Cleveland, OH, 44195, U.S.A
- Sungho Won
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio 44106, U.S.A
- Xiaofeng Zhu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio 44106, U.S.A
- Robert C. Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio 44106, U.S.A
30
Ruchat SM, Vohl MC, Weisnagel SJ, Rankinen T, Bouchard C, Pérusse L. Combining genetic markers and clinical risk factors improves the risk assessment of impaired glucose metabolism. Ann Med 2010; 42:196-206. [PMID: 20384434 DOI: 10.3109/07853890903559716] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
BACKGROUND Although several candidate gene polymorphisms (SNPs) have been associated with increased risk of type 2 diabetes mellitus (T2DM), relatively few studies have assessed the ability of T2DM candidate genes to assess the risk of impaired fasting glucose (IFG), impaired glucose tolerance (IGT), and T2DM beyond the information provided by clinical risk factors. OBJECTIVE To test whether the inclusion of genetic markers in a regression model provides a better assessment of the risk of IFG, IGT, and T2DM than a model based only on non-genetic risk factors commonly assessed in clinical settings. METHODS Subjects (n = 485; 213 parents, 272 offspring) from the Quebec Family Study, not known to have T2DM, were measured for several risk factors and underwent an oral glucose tolerance test. Thirty-eight SNPs in 25 susceptibility/candidate genes previously reported to be associated with T2DM were genotyped. In order to identify risk factors associated with IFG/IGT/T2DM, two logistic regression models were tested: a full model (FM) including age, sex, body mass index (BMI), systolic and diastolic blood pressure, smoking status, and the 38 SNPs; and a reduced model (RM), in which the SNPs were dropped, which allowed us to test the null hypothesis that the markers are not associated with the risk of IFG/IGT/T2DM. Performances of the models were compared by using a likelihood ratio test and the receiver-operating characteristic (ROC) curves. The area under the curve (AUC) was calculated from the ROC curve. RESULTS The analyses showed that age (P < 0.0001), BMI (P < 0.0001), and six variants (IGF2BP2 rs4402960, P = 0.002; ADIPOQ +276 G>T, P = 0.004; UCP2 Ala55Val, P = 0.01; CDKN2A/2B rs3731201, P = 0.02; rs495490, P = 0.02, and rs10811661, P = 0.03) were significantly associated with the risk of IFG/IGT/T2DM. 
Dropping genetic markers from the analysis significantly reduced the fit of the model to the data (chi-square = 38.98, P < 0.00001 contrasting RM to FM), suggesting that the genetic markers are significantly associated with the risk of IFG/IGT/T2DM. Furthermore, the AUC was higher for FM than for RM (0.85 (95% CI 0.81-0.89) versus 0.81 (95% CI 0.76-0.85), P = 0.004). CONCLUSION Our results suggest that combining genetic markers with traditional clinical risk factors has the potential to improve our ability to assess the risk of complex diseases such as T2DM.
Affiliation(s)
- Stephanie-May Ruchat
- Department of Preventive Medicine, Laval University, 2300 rue de la Terrasse, Quebec, Canada
31
Development of a Pharmacogenetic Predictive Test in asthma: proof of concept. Pharmacogenet Genomics 2010; 20:86-93. [PMID: 20032818 DOI: 10.1097/fpc.0b013e32833428d0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]
Abstract
OBJECTIVE To assess the feasibility of developing a Combined Clinical and Pharmacogenetic Predictive Test, comprised of multiple single nucleotide polymorphisms (SNPs) that is associated with poor bronchodilator response (BDR). METHODS We genotyped SNPs that tagged the whole genome of the parents and children in the Childhood Asthma Management Program (CAMP) and implemented an algorithm using a family-based association test that ranked SNPs by statistical power. The top eight SNPs that were associated with BDR comprised the Pharmacogenetic Predictive Test. The Clinical Predictive Test was comprised of baseline forced expiratory volume in 1 s (FEV1). We evaluated these predictive tests and a Combined Clinical and Pharmacogenetic Predictive Test in three distinct populations: the children of the CAMP trial and two additional clinical trial populations of asthma. Our outcome measure was poor BDR, defined as BDR of less than 20th percentile in each population. BDR was calculated as the percent difference between the prebronchodilator and postbronchodilator (two puffs of albuterol at 180 microg/puff) FEV1 value. To assess the predictive ability of the test, the corresponding area under the receiver operating characteristic curves (AUROCs) were calculated for each population. RESULTS The AUROC values for the Clinical Predictive Test alone were not significantly different from 0.50, the AUROC of a random classifier. Our Combined Clinical and Pharmacogenetic Predictive Test comprised of genetic polymorphisms in addition to FEV1 predicted poor BDR with an AUROC of 0.65 in the CAMP children (n = 422) and 0.60 (n = 475) and 0.63 (n = 235) in the two independent populations. Both the Combined Clinical and Pharmacogenetic Predictive Test and the Pharmacogenetic Predictive Test were significantly more accurate than the Clinical Predictive Test (AUROC between 0.44 and 0.55) in each of the populations. 
CONCLUSION Our finding that genetic polymorphisms with a clinical trait are associated with BDR suggests that there is promise in using multiple genetic polymorphisms simultaneously to predict which asthmatics are likely to respond poorly to bronchodilators.
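The AUROC values compared in this abstract, with 0.50 as the value for a random classifier, can be read as the probability that a randomly chosen positive subject scores higher than a randomly chosen negative one. A minimal Python sketch of that Mann-Whitney estimate follows; the abstract does not say how the AUROCs were computed, so this is only one standard estimator, and the function name and scores are hypothetical.

```python
def auroc(scores_pos, scores_neg):
    """Mann-Whitney estimate of AUROC: P(positive score > negative score), ties counted as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical test scores (illustrative values only).
poor_responders = [0.8, 0.6, 0.55, 0.3]
other_subjects = [0.7, 0.5, 0.4, 0.3, 0.2]
example_auroc = auroc(poor_responders, other_subjects)   # 14.5 / 20 = 0.725
uninformative = auroc([1, 1], [1, 1])                    # all ties give 0.5
```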
32
Ziegler A. Genome-wide association studies: quality control and population-based measures. Genet Epidemiol 2010; 33 Suppl 1:S45-50. [PMID: 19924716 DOI: 10.1002/gepi.20472] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]
Abstract
Genome-wide association studies, using hundreds of thousands of single-nucleotide polymorphism (SNP) markers, have become a standard approach for identifying disease susceptibility genes. The change in the technology poses substantial computational and statistical challenges that have been addressed in the quality control, imputation, and population-based measure groups of the Genetic Analysis Workshop 16. The computational challenges pertain to efficient memory management and computational speed of the statistical procedures, and we discuss an approach for efficient SNP storage. Accuracy and computational speed are relevant for genotype calling, and the results from a comparison of three calling algorithms are discussed. The first statistical challenge is related to statistical quality control, and we discuss two novel quality control procedures. These low-level analyses have an effect on subsequent preparatory steps for high-level analyses, e.g., the quality of genotype imputation approaches. After the conduct of a genome-wide association study with successful replication and/or validation, measures of diagnostic accuracy, including the area under the curve, are investigated. The area under the curve can be constructed from summary data in some situations. Finally, we discuss how the population-attributable risk of a genetic variant that is only measured in a reference data set can be determined.
Affiliation(s)
- Andreas Ziegler
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Germany.
33
Wilcox MA, Paterson AD. Phenotype definition and development--contributions from Group 7. Genet Epidemiol 2010; 33 Suppl 1:S40-4. [PMID: 19924715 DOI: 10.1002/gepi.20471] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
The papers in Genetic Analysis Workshop 16 Group 7 covered a wide range of topics. The effects of confounder misclassification and selection bias on association results were examined by one group. Another focused on bias introduced by various methods of accounting for treatment effects. Two groups used related methods to derive phenotypic traits. They used different analytic strategies for genetic associations with non-overlapping results (but because they used different sets of single-nucleotide polymorphisms (SNPs) and significance criteria, this is not surprising). Another group relied on the well-characterized definition of type 2 diabetes to show benefits of a novel predictive test. Transmission-ratio distortion was the focus of another paper. The results were extended to show a potential secondary benefit of the test to identify potentially mis-called SNPs.
Affiliation(s)
- Marsha A Wilcox
- Epidemiology, Johnson & Johnson Pharmaceutical Research and Development, LLC, Titusville, New Jersey 08560, USA.
34
Abstract
Recent genome-wide association studies have identified many genetic variants affecting complex human diseases. It is of great interest to build disease risk prediction models based on these data. In this article, we first discuss statistical challenges in using genome-wide association data for risk predictions, and then review the findings from the literature on this topic. We also demonstrate the performance of different methods through both simulation studies and application to real-world data.
Affiliation(s)
- Jia Kang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Judy Cho
- Section of Digestive Diseases, Department of Medicine, Yale University, New Haven, Connecticut, USA
- Department of Genetics, Yale University, New Haven, Connecticut, USA
- Hongyu Zhao
- Department of Genetics, Yale University, New Haven, Connecticut, USA
- Department of Epidemiology and Public Health, Yale University, New Haven, Connecticut, USA
35
Lu Q, Cui Y, Ye C, Wei C, Elston RC. Bagging optimal ROC curve method for predictive genetic tests, with an application for rheumatoid arthritis. J Biopharm Stat 2010; 20:401-14. [PMID: 20309765 PMCID: PMC3823239 DOI: 10.1080/10543400903572811] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/19/2022]
Abstract
Translation studies have been initiated to assess the combined effect of genetic loci from recently accomplished genome-wide association studies and the existing risk factors for early disease prediction. We propose a bagging optimal receiver operating characteristic (ROC) curve method to facilitate this research. Through simulation and real data application, we compared the new method with the commonly used allele counting method and logistic regression, and found that the new method yields a better performance. The new method was applied on the Wellcome Trust data set to form a predictive genetic test for rheumatoid arthritis. The formed test reached an area under the curve (AUC) value of 0.7.
Affiliation(s)
- Qing Lu
- Department of Epidemiology, Michigan State University, East Lansing, Michigan, USA.
36
Wray NR, Yang J, Goddard ME, Visscher PM. The genetic interpretation of area under the ROC curve in genomic profiling. PLoS Genet 2010; 6:e1000864. [PMID: 20195508 PMCID: PMC2829056 DOI: 10.1371/journal.pgen.1000864] [Citation(s) in RCA: 238] [Impact Index Per Article: 17.0] [Received: 05/28/2009] [Accepted: 01/28/2010] [Indexed: 12/11/2022]
Abstract
Genome-wide association studies in human populations have facilitated the creation of genomic profiles which combine the effects of many associated genetic variants to predict risk of disease. The area under the receiver operator characteristic (ROC) curve is a well established measure for determining the efficacy of tests in correctly classifying diseased and non-diseased individuals. We use quantitative genetics theory to provide insight into the genetic interpretation of the area under the ROC curve (AUC) when the test classifier is a predictor of genetic risk. Even when the proportion of genetic variance explained by the test is 100%, there is a maximum value for AUC that depends on the genetic epidemiology of the disease, i.e. either the sibling recurrence risk or heritability and disease prevalence. We derive an equation relating maximum AUC to heritability and disease prevalence. The expression can be reversed to calculate the proportion of genetic variance explained given AUC, disease prevalence, and heritability. We use published estimates of disease prevalence and sibling recurrence risk for 17 complex genetic diseases to calculate the proportion of genetic variance that a test must explain to achieve AUC = 0.75; this varied from 0.10 to 0.74. We provide a genetic interpretation of AUC for use with predictors of genetic risk based on genomic profiles. We provide a strategy to estimate proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability (or sibling recurrence risk) available as an online calculator. Genome-wide association studies in human populations have facilitated the creation of genomic profiles that combine the effects of many associated genetic variants to predict risk of disease. However, genomic profiles are inherently constrained in their ability to classify diseased from non-diseased individuals dictated by the genetic epidemiology of the disease. 
In this paper, we use a genetic interpretation to provide insight into the constraints on genomic profiles for risk prediction. We provide a strategy to estimate proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability available as an online calculator.
Affiliation(s)
- Naomi R Wray
- Genetic Epidemiology and Queensland Statistical Genetics, Queensland Institute of Medical Research, Brisbane, Australia.
37
Carayol J, Schellenberg GD, Tores F, Hager J, Ziegler A, Dawson G. Assessing the impact of a combined analysis of four common low-risk genetic variants on autism risk. Mol Autism 2010; 1:4. [PMID: 20678243 PMCID: PMC2907567 DOI: 10.1186/2040-2392-1-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 08/11/2009] [Accepted: 02/22/2010] [Indexed: 02/02/2023]
Abstract
Background Autism is a complex disorder characterized by deficits involving communication, social interaction, and repetitive and restrictive patterns of behavior. Twin studies have shown that autism is strongly heritable, suggesting a strong genetic component. In other disease states with a complex etiology, such as type 2 diabetes, cancer and cardiovascular disease, combined analysis of multiple genetic variants in a genetic score has helped to identify individuals at high risk of disease. Genetic scores are designed to test for association of genetic markers with disease. Method The accumulation of multiple risk alleles markedly increases the risk of being affected, and compared with studying polymorphisms individually, it improves the identification of subgroups of individuals at greater risk. In the present study, we show that this approach can be applied to autism by specifically looking at a high-risk population of children who have siblings with autism. A two-sample study design and the generation of a genetic score using multiple independent genes were used to assess the risk of autism in a high-risk population. Results In both samples, odds ratios (ORs) increased significantly as a function of the number of risk alleles, with a genetic score of 8 being associated with an OR of 5.54 (95% confidence interval [CI] 2.45 to 12.49). The sensitivities and specificities for each genetic score were similar in both analyses, and the resultant area under the receiver operating characteristic curves were identical (0.59). Conclusions These results suggest that the accumulation of multiple risk alleles in a genetic score is a useful strategy for assessing the risk of autism in siblings of affected individuals, and may be better than studying single polymorphisms for identifying subgroups of individuals with significantly greater risk.
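A genetic score of the kind described in this abstract (four variants, so a risk-allele count from 0 to 8) and an odds ratio with a 95% confidence interval can be sketched in Python as below. The abstract does not state how its interval was computed; the Woolf (log-scale) interval used here is an assumption, and the function names, genotypes and 2x2 counts are hypothetical rather than the study's data.

```python
import math


def genetic_score(risk_allele_counts):
    """Sum of risk-allele counts (0, 1 or 2 per variant) across variants."""
    return sum(risk_allele_counts)


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a, b: cases and controls in the score group of interest
    c, d: cases and controls in the reference group
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical individual: risk-allele counts at four variants.
score = genetic_score([2, 1, 0, 2])

# Hypothetical counts: high-score group vs reference group.
odds_ratio, lower, upper = odds_ratio_with_ci(30, 10, 20, 40)
```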
38
Wray NR, Goddard ME. Multi-locus models of genetic risk of disease. Genome Med 2010; 2:10. [PMID: 20181060 PMCID: PMC2847701 DOI: 10.1186/gm131] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Received: 09/13/2009] [Revised: 01/22/2010] [Accepted: 02/02/2010] [Indexed: 03/18/2023]
Abstract
Background Evidence for genetic contribution to complex diseases is described by recurrence risks to relatives of diseased individuals. Genome-wide association studies allow a description of the genetics of the same diseases in terms of risk loci, their effects and allele frequencies. To reconcile the two descriptions requires a model of how risks from individual loci combine to determine an individual's overall risk. Methods We derive predictions of risk to relatives from risks at individual loci under a number of models and compare them with published data on disease risk. Results The model in which risks are multiplicative on the risk scale implies equality between the recurrence risk to monozygotic twins and the square of the recurrence risk to sibs, a relationship often not observed, especially for low prevalence diseases. We show that this theoretical equality is achieved by allowing impossible probabilities of disease. Other models, in which probabilities of disease are constrained to a maximum of one, generate results more consistent with empirical estimates for a range of diseases. Conclusions The unconstrained multiplicative model, often used in theoretical studies because of its mathematical tractability, is not a realistic model. We find three models, the constrained multiplicative, Odds (or Logit) and Probit (or liability threshold) models, all fit the data on risk to relatives. Currently, in practice it would be difficult to differentiate between these models, but this may become possible if genetic variants that explain the majority of the genetic variance are identified.
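The contrast drawn in this abstract between the unconstrained multiplicative model, which can yield "impossible probabilities of disease", and the constrained multiplicative, Odds (Logit) and Probit models can be made concrete with a simplified Python sketch. The parameterizations below are assumptions made for illustration (a baseline risk combined with per-locus relative risks, odds ratios or liability shifts) and are not taken from the paper; all numbers are hypothetical.

```python
from statistics import NormalDist


def multiplicative_risk(baseline, relative_risks):
    """Unconstrained: multiply the baseline risk by each locus's relative risk."""
    risk = baseline
    for rr in relative_risks:
        risk *= rr
    return risk


def constrained_multiplicative_risk(baseline, relative_risks):
    """Same product, but capped at a probability of one."""
    return min(1.0, multiplicative_risk(baseline, relative_risks))


def logit_risk(baseline, odds_ratios):
    """Effects multiply on the odds scale, so the risk stays below one."""
    odds = baseline / (1 - baseline)
    for odds_ratio in odds_ratios:
        odds *= odds_ratio
    return odds / (1 + odds)


def probit_risk(baseline, liability_shifts):
    """Effects add on a standard-normal liability scale (liability threshold model)."""
    normal = NormalDist()
    return normal.cdf(normal.inv_cdf(baseline) + sum(liability_shifts))


# Hypothetical individual carrying five risk genotypes, each with effect 3.0.
baseline = 0.01
effects = [3.0] * 5
unconstrained = multiplicative_risk(baseline, effects)            # about 2.43, not a probability
constrained = constrained_multiplicative_risk(baseline, effects)  # capped at 1.0
on_odds_scale = logit_risk(baseline, effects)                     # about 0.71
on_liability_scale = probit_risk(baseline, [0.5] * 5)
```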
Affiliation(s)
- Naomi R Wray
- Genetic Epidemiology and, Queensland Institute of Medical Research, Herston Road, Brisbane, Queensland 4006, Australia.
39
Ruderfer DM, Korn J, Purcell SM. Family-based genetic risk prediction of multifactorial disease. Genome Med 2010; 2:2. [PMID: 20193047 PMCID: PMC2829927 DOI: 10.1186/gm123] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/22/2009] [Revised: 10/02/2009] [Accepted: 01/15/2010] [Indexed: 01/24/2023]
Abstract
Genome-wide association studies have detected dozens of variants underlying complex diseases, although it is uncertain how often these discoveries will translate into clinically useful predictors. Here, to improve genetic risk prediction, we consider including phenotypic and genotypic information from related individuals. We develop and evaluate a family-based liability-threshold prediction model and apply it to a simulation of known Crohn's disease risk variants. We show that genotypes of a relative of known phenotype can be informative for an individual's disease risk, over and above the same locus genotyped in the individual. This approach can lead to better-calibrated estimates of disease risk, although the overall benefit for prediction is typically only very modest.
Affiliation(s)
- Douglas M Ruderfer
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Mass General Hospital, Boston, MA, USA.
40
Abstract
While recently performed genome-wide association studies have advanced the identification of genetic variants predisposing to type 2 diabetes (T2D), the potential application of these novel findings for disease prediction and prevention has not been well studied. Diabetes prediction and prevention have become urgent issues owing to the rapidly increasing prevalence of diabetes and its associated mortality, morbidity, and health care cost. New prediction approaches using genetic markers could facilitate early identification of high risk sub-groups of the population so that appropriate prevention methods could be effectively applied to delay, or even prevent, disease onset. This paper assessed 18 recently identified T2D loci for their potential role in diabetes prediction. We built a new predictive genetic test for T2D using the Framingham Heart Study dataset. Using logistic regression and 15 additional loci, the new test was slightly improved over the existing test using just three loci. A formal comparison between the two tests suggests no significant improvement. We further formed a predictive genetic test for identifying early onset T2D and found higher classification accuracy for this test, not only indicating that these 18 loci have great potential for predicting early onset T2D, but also suggesting that they may play important roles in causing early-onset T2D. To further improve the test's accuracy, we applied a newly developed nonparametric method capable of capturing high order interactions to the data, but it did not outperform a logistic regression that only considers single-locus effects. This could be explained by the absence of gene-gene interactions among the 18 loci.
41
Abstract
Lu and Elston have recently proposed a procedure for developing optimal receiver operating characteristic curves that maximize the area under a receiver operating characteristic curve in the setting of a predictive genetic test. The method requires only summary data, not individual level genetic data. In an era of increased data sharing, we investigate the performance of this algorithm when individual level genetic data are available and compare this approach to more standard receiver operating characteristic curve-building methods. Conclusion Though the Lu-Elston method can produce an optimal area under the curve under some assumptions, the method typically has little advantage over standard multivariable logistic methods when data are available. Also, the standard approach easily allows comparison of nested models via likelihood ratio tests and incorporation of covariates - the Lu-Elston approach is shown to have some difficulties with such analyses. These conclusions are based on evaluations using the Genetic Analysis Workshop 16 rheumatoid arthritis data set.
Affiliation(s)
- Neal Jeffries
- Office of Biostatistics Research, Division of Population and Prevention Studies, National Heart, Lung, and Blood Institute, Bethesda, Maryland 20892, USA.
42
Brito EC, Lyssenko V, Renström F, Berglund G, Nilsson PM, Groop L, Franks PW. Previously associated type 2 diabetes variants may interact with physical activity to modify the risk of impaired glucose regulation and type 2 diabetes: a study of 16,003 Swedish adults. Diabetes 2009; 58:1411-8. [PMID: 19324937 PMCID: PMC2682680 DOI: 10.2337/db08-1623] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Indexed: 12/22/2022]
Abstract
OBJECTIVE Recent advances in type 2 diabetes genetics have culminated in the discovery and confirmation of multiple risk variants. Two important and largely unanswered questions are whether this information can be used to identify individuals most susceptible to the adverse consequences of sedentary behavior and to predict their response to lifestyle intervention; such evidence would be mechanistically informative and provide a rationale for targeting genetically susceptible subgroups of the population. RESEARCH DESIGN AND METHODS Gene x physical activity interactions were assessed for 17 polymorphisms in a prospective population-based cohort of initially nondiabetic middle-aged adults. Outcomes were 1) impaired glucose regulation (IGR) versus normal glucose regulation determined with either fasting or 2-h plasma glucose concentrations (n = 16,003), 2) glucose intolerance (in mmol/l, n = 8,860), or 3) incident type 2 diabetes (n = 2,063 events). RESULTS Tests of gene x physical activity interactions on IGR risk for 3 of the 17 polymorphisms were nominally statistically significant: CDKN2A/B rs10811661 (P(interaction) = 0.015), HNF1B rs4430796 (P(interaction) = 0.026), and PPARG rs1801282 (P(interaction) = 0.04). Consistent interactions were observed for the CDKN2A/B (P(interaction) = 0.013) and HNF1B (P(interaction) = 0.0009) variants on 2-h glucose concentrations. Where type 2 diabetes was the outcome, only one statistically significant interaction effect was observed, and this was for the HNF1B rs4430796 variant (P(interaction) = 0.0004). The interaction effects for HNF1B on IGR risk and incident diabetes remained significant after correction for multiple testing (P(interaction) = 0.015 and 0.0068, respectively). CONCLUSIONS Our observations suggest that the genetic predisposition to hyperglycemia is partially dependent on a person's lifestyle.
Affiliation(s)
- Ema C. Brito
- Genetic Epidemiology and Clinical Research Group, Department of Public Health and Clinical Medicine, Section for Medicine, Umeå University Hospital, Umeå, Sweden
- Valeriya Lyssenko
- Department of Clinical Sciences-Diabetes and Endocrinology, Clinical Research Center, Malmö University Hospital, Lund University, Malmö, Sweden
- Frida Renström
- Genetic Epidemiology and Clinical Research Group, Department of Public Health and Clinical Medicine, Section for Medicine, Umeå University Hospital, Umeå, Sweden
- Göran Berglund
- Department of Medicine, Malmö University Hospital, Lund University, Malmö, Sweden
- Peter M. Nilsson
- Department of Medicine, Malmö University Hospital, Lund University, Malmö, Sweden
- Leif Groop
- Department of Clinical Sciences-Diabetes and Endocrinology, Clinical Research Center, Malmö University Hospital, Lund University, Malmö, Sweden
- Paul W. Franks
- Genetic Epidemiology and Clinical Research Group, Department of Public Health and Clinical Medicine, Section for Medicine, Umeå University Hospital, Umeå, Sweden
- Department of Clinical Sciences-Diabetes and Endocrinology, Clinical Research Center, Malmö University Hospital, Lund University, Malmö, Sweden
- Corresponding author: Paul W. Franks,
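The abstract above does not state the regression model behind its gene × physical activity tests. As a sketch only, one common way to test such an interaction on a binary outcome (for example IGR versus normal glucose regulation) is a logistic model with a product term. The symbols here are assumptions for illustration, not taken from the paper: $Y$ is the outcome, $G$ the risk-allele count at one polymorphism, and $A$ the physical activity level.

```latex
\operatorname{logit} \Pr(Y = 1 \mid G, A)
  = \beta_0 + \beta_G\, G + \beta_A\, A + \beta_{GA}\,(G \times A),
\qquad H_0 : \beta_{GA} = 0
```

Under this formulation, a P(interaction) value would correspond to the test of $\beta_{GA} = 0$, and the per-allele log odds ratio becomes $\beta_G + \beta_{GA} A$, so it varies with activity level.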
|
43
|
Abstract
The brisk discovery of novel inherited disease markers by genome-wide association (GWA) studies has raised expectations for predicting disease risk by analysing multiple common alleles. However, the statistics used during the discovery phase of research (such as odds ratios or p values for association) are not the most appropriate measures for evaluating the predictive value of genetic profiles. We argue that other measures--such as sensitivity, specificity, and positive and negative predictive values--are more useful when proposing a genetic profile for risk prediction.
|
44
|
Lin X, Song K, Lim N, Yuan X, Johnson T, Abderrahmani A, Vollenweider P, Stirnadel H, Sundseth SS, Lai E, Burns DK, Middleton LT, Roses AD, Matthews PM, Waeber G, Cardon L, Waterworth DM, Mooser V. Risk prediction of prevalent diabetes in a Swiss population using a weighted genetic score--the CoLaus Study. Diabetologia 2009; 52:600-8. [PMID: 19139842 DOI: 10.1007/s00125-008-1254-y] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Received: 07/17/2008] [Accepted: 12/03/2008] [Indexed: 02/07/2023]
Abstract
AIMS/HYPOTHESIS Several susceptibility genes for type 2 diabetes have been discovered recently. Individually, these genes increase the disease risk only minimally. The goals of the present study were to determine, at the population level, the risk of diabetes in individuals who carry risk alleles within several susceptibility genes for the disease and the added value of this genetic information over the clinical predictors. METHODS We constructed an additive genetic score using the most replicated single-nucleotide polymorphisms (SNPs) within 15 type 2 diabetes-susceptibility genes, weighting each SNP with its reported effect. We tested this score in the extensively phenotyped population-based cross-sectional CoLaus Study in Lausanne, Switzerland (n = 5,360), involving 356 diabetic individuals. RESULTS The clinical predictors of prevalent diabetes were age, BMI, family history of diabetes, WHR, and triacylglycerol/HDL-cholesterol ratio. After adjustment for these variables, the risk of diabetes was 2.7 (95% CI 1.8-4.0, p = 0.000006) for individuals with a genetic score within the top quintile, compared with the bottom quintile. Adding the genetic score to the clinical covariates improved the area under the receiver operating characteristic curve slightly (from 0.86 to 0.87), yet significantly (p = 0.002). BMI was similar in these two extreme quintiles. CONCLUSIONS/INTERPRETATION In this population, a simple weighted 15 SNP-based genetic score provides additional information over clinical predictors of prevalent diabetes. At this stage, however, the clinical benefit of this genetic information is limited.
Affiliation(s)
- X Lin
- Discovery Analytics, GlaxoSmithKline, Collegeville, PA, USA
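The additive weighted genetic score described in the abstract above can be sketched as follows. This is a minimal illustration, not the CoLaus implementation: the function name, the three-SNP example, and the odds ratios used as weights are all hypothetical, and using the log odds ratio as each SNP's weight is an assumption about what "its reported effect" means.

```python
import math


def weighted_genetic_score(risk_allele_counts, weights):
    """Sum each SNP's risk-allele count (0, 1, or 2) multiplied by its weight."""
    if len(risk_allele_counts) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    for count in risk_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("risk-allele counts must be 0, 1, or 2")
    return sum(c * w for c, w in zip(risk_allele_counts, weights))


# Hypothetical per-allele odds ratios for three SNPs (illustrative values only),
# converted to log odds ratios so that effects add on the logit scale.
weights = [math.log(1.4), math.log(1.2), math.log(1.1)]

# One hypothetical individual: two risk alleles at SNP 1, one at SNP 2, none at SNP 3.
person = [2, 1, 0]
score = weighted_genetic_score(person, weights)
print(f"weighted genetic score: {score:.3f}")
```

Setting every weight to 1 reduces the score to a plain risk-allele count; individuals can then be ranked by score and grouped into quintiles for comparison.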
|
45
|
Jakobsdottir J, Gorin MB, Conley YP, Ferrell RE, Weeks DE. Interpretation of genetic association studies: markers with replicated highly significant odds ratios may be poor classifiers. PLoS Genet 2009; 5:e1000337. [PMID: 19197355 PMCID: PMC2629574 DOI: 10.1371/journal.pgen.1000337] [Citation(s) in RCA: 201] [Impact Index Per Article: 13.4] [Indexed: 12/30/2022]
Abstract
Recent successful discoveries of potentially causal single nucleotide polymorphisms (SNPs) for complex diseases hold great promise, and commercialization of genomics in personalized medicine has already begun. The hope is that genetic testing will benefit patients and their families, and encourage positive lifestyle changes and guide clinical decisions. However, for many complex diseases, it is arguable whether the era of genomics in personalized medicine is here yet. We focus on the clinical validity of genetic testing with an emphasis on two popular statistical methods for evaluating markers. The two methods, logistic regression and receiver operating characteristic (ROC) curve analysis, are applied to our age-related macular degeneration dataset. By using an additive model of the CFH, LOC387715, and C2 variants, the odds ratios are 2.9, 3.4, and 0.4, with p-values of 10⁻¹³, 10⁻¹³, and 10⁻³, respectively. The area under the ROC curve (AUC) is 0.79, but assuming prevalences of 15%, 5.5%, and 1.5% (which are realistic for age groups 80 y, 65 y, and 40 y and older, respectively), only 30%, 12%, and 3% of the group classified as high risk are cases. Additionally, we present examples for four other diseases for which strongly associated variants have been discovered. In type 2 diabetes, our classification model of 12 SNPs has an AUC of only 0.64, and two SNPs achieve an AUC of only 0.56 for prostate cancer. Nine SNPs were not sufficient to improve the discrimination power over that of nongenetic predictors for risk of cardiovascular events. Finally, in Crohn's disease, a model of five SNPs, one with a quite low odds ratio of 0.26, has an AUC of only 0.66. Our analyses and examples show that strong association, although very valuable for establishing etiological hypotheses, does not guarantee effective discrimination between cases and controls. The scientific community should be cautious to avoid overstating the value of association findings in terms of personalized medicine before their time.
Affiliation(s)
- Johanna Jakobsdottir
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.
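The abstract above reports that the share of true cases among those classified as high risk falls sharply as prevalence falls. The sketch below illustrates that dependence with Bayes' rule. The sensitivity and specificity values are hypothetical (the abstract does not give the classifier's operating point), so the printed figures are not expected to match the paper's 30%, 12%, and 3%. Only the three prevalences come from the abstract.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of test-positive individuals who are true cases (Bayes' rule)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    denominator = true_positive_rate + false_positive_rate
    if denominator == 0:
        raise ValueError("no individuals test positive under these inputs")
    return true_positive_rate / denominator


# Hypothetical operating point, chosen for illustration only.
SENSITIVITY = 0.70
SPECIFICITY = 0.75

# Prevalences quoted in the abstract for the three age groups.
for prevalence in (0.15, 0.055, 0.015):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

Because sensitivity and specificity are held fixed, the only thing changing across the loop is prevalence, which isolates why a classifier with a respectable AUC can still label mostly non-cases as high risk in a low-prevalence group.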
|
46
|
Identifying genes for primary hypertension: methodological limitations and gene-environment interactions. J Hum Hypertens 2008; 23:227-37. [PMID: 19005475 DOI: 10.1038/jhh.2008.134] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 01/17/2023]
Abstract
Hypertension segregates within families, indicating that genetic factors explain some of the variance in the risk of developing the disease; however, even with major advances in genotyping technologies facilitating the discovery of multiple genetic risk markers for cardiovascular and metabolic diseases, little progress has been made in defining the genetic defects that cause elevations in blood pressure. Several plausible explanations exist for this apparent paradox, one of which is that the risk conveyed by genes involved in the development of hypertension is context dependent. This notion is supported by a growing number of published animal and human studies, although none has yet provided unequivocal evidence that genetic and environmental factors interact to influence the risk of primary hypertension in humans. In this review, an assumption is made that common genetic variation contributes meaningfully to the development of primary hypertension. The review focuses on (i) several methodological limitations of genetic association studies and (ii) the roles that gene-environment interactions might play in the development of primary hypertension. The proceeding sections of the review examine the design features necessary for future studies to adequately test the hypothesis that genes for primary hypertension act in a context-dependent manner. Finally, an outline of how knowledge of gene-environment interactions might be used to optimize the prevention or treatment of primary hypertension is provided.
|
47
|
Lango H, Palmer CNA, Morris AD, Zeggini E, Hattersley AT, McCarthy MI, Frayling TM, Weedon MN. Assessing the combined impact of 18 common genetic variants of modest effect sizes on type 2 diabetes risk. Diabetes 2008; 57:3129-35. [PMID: 18591388 PMCID: PMC2570411 DOI: 10.2337/db08-0504] [Citation(s) in RCA: 236] [Impact Index Per Article: 14.8] [Received: 04/16/2008] [Accepted: 06/24/2008] [Indexed: 01/17/2023]
Abstract
OBJECTIVES Genome-wide association studies have dramatically increased the number of common genetic variants that are robustly associated with type 2 diabetes. A possible clinical use of this information is to identify individuals at high risk of developing the disease, so that preventative measures may be more effectively targeted. Here, we assess the ability of 18 confirmed type 2 diabetes variants to differentiate between type 2 diabetic case and control subjects. RESEARCH DESIGN AND METHODS We assessed index single nucleotide polymorphisms (SNPs) for the 18 independent loci in 2,598 control subjects and 2,309 case subjects from the Genetics of Diabetes Audit and Research Tayside Study. The discriminatory ability of the combined SNP information was assessed by grouping individuals based on number of risk alleles carried and determining relative odds of type 2 diabetes and by calculating the area under the receiver-operator characteristic curve (AUC). RESULTS Individuals carrying more risk alleles had a higher risk of type 2 diabetes. For example, 1.2% of individuals with >24 risk alleles had an odds ratio of 4.2 (95% CI 2.11-8.56) against the 1.8% with 10-12 risk alleles. The AUC (a measure of discriminative accuracy) for these variants was 0.60. The AUC for age, BMI, and sex was 0.78, and adding the genetic risk variants only marginally increased this to 0.80. CONCLUSIONS Currently, common risk variants for type 2 diabetes do not provide strong predictive value at a population level. However, the joint effect of risk variants identified subgroups of the population at substantially different risk of disease. Further studies are needed to assess whether individuals with extreme numbers of risk alleles may benefit from genetic testing.
Affiliation(s)
- Hana Lango
- Genetics of Complex Traits, Institute of Biomedical and Clinical Science, Peninsula Medical School, Exeter, UK
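The AUC reported in the abstract above summarizes how well the risk-allele count separates case from control subjects. A minimal sketch of that calculation, assuming the score is simply the number of risk alleles carried, uses the Mann-Whitney interpretation of the AUC: the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The function name and the eight allele counts are hypothetical and are not study data.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the share of case-control pairs the case wins (ties count 0.5)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk-allele counts (illustrative only).
cases = [22, 19, 18, 20]
controls = [17, 19, 16, 18]

auc = auc_from_scores(cases, controls)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in sample size, which is fine for a sketch; for cohorts of several thousand subjects a rank-based computation would be the usual choice.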
|