1
Sassano M, Mariani M, Quaranta G, Pastorino R, Boccia S. Polygenic risk prediction models for colorectal cancer: a systematic review. BMC Cancer 2022; 22:65. [PMID: 35030997 PMCID: PMC8760647 DOI: 10.1186/s12885-021-09143-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/30/2021] [Accepted: 12/02/2021] [Indexed: 12/24/2022]
Abstract
BACKGROUND Risk prediction models incorporating single nucleotide polymorphisms (SNPs) could lead to individualized prevention of colorectal cancer (CRC). However, the added value of incorporating SNPs into models with only traditional risk factors is still not clear. Hence, our primary aim was to summarize literature on risk prediction models including genetic variants for CRC, while our secondary aim was to evaluate the improvement of discriminatory accuracy when adding SNPs to a prediction model with only traditional risk factors. METHODS We conducted a systematic review on prediction models incorporating multiple SNPs for CRC risk prediction. We tested whether a significant trend in the increase of the area under the curve (AUC) according to the number of SNPs could be observed, and estimated the correlation between AUC improvement and number of SNPs. We estimated pooled AUC improvement for SNP-enhanced models compared with non-SNP-enhanced models using random effects meta-analysis, and conducted meta-regression to investigate the association of specific factors with AUC improvement. RESULTS We included 33 studies, 78.79% using genetic risk scores to combine genetic data. We found no significant trend in AUC improvement according to the number of SNPs (p for trend = 0.774), and no correlation between the number of SNPs and AUC improvement (p = 0.695). Pooled AUC improvement was 0.040 (95% CI: 0.035, 0.045), and the number of cases in the study and the AUC of the starting model were inversely associated with AUC improvement obtained when adding SNPs to a prediction model. In addition, models constructed in Asian individuals achieved better AUC improvement with the incorporation of SNPs compared with those developed among individuals of European ancestry. CONCLUSIONS Though not conclusive, our results provide insights on factors influencing discriminatory accuracy of SNP-enhanced models. Genetic variants might be useful to inform stratified CRC screening in the future, but further research is needed.
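The random effects pooling mentioned in this abstract can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator, which the abstract does not name, so it should not be read as the authors' procedure. The function name `dersimonian_laird` and the example numbers are hypothetical and are not taken from the review.

```python
import math


def dersimonian_laird(effects, std_errors):
    """Pool per-study effects (e.g. AUC improvements) with a random-effects model.

    Assumption: the DerSimonian-Laird moment estimator is used for the
    between-study variance tau^2. Returns (pooled, (ci_low, ci_high), tau2).
    """
    k = len(effects)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching standard errors")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q and the moment estimate of tau^2, truncated at zero.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    se_pooled = math.sqrt(1.0 / sum_re)
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, ci, tau2


if __name__ == "__main__":
    # Hypothetical AUC improvements and standard errors, for illustration only.
    pooled, ci, tau2 = dersimonian_laird([0.02, 0.06, 0.04], [0.01, 0.01, 0.02])
    print(f"pooled={pooled:.3f} ci=({ci[0]:.3f}, {ci[1]:.3f}) tau2={tau2:.5f}")
```

When the studies agree exactly, the estimated between-study variance is zero and the result reduces to the fixed-effect inverse-variance average.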
Affiliation(s)
- Michele Sassano
- Section of Hygiene, University Department of Life Sciences and Public Health, Università Cattolica del Sacro Cuore, 00168, Roma, Italy
- Marco Mariani
- Section of Hygiene, University Department of Life Sciences and Public Health, Università Cattolica del Sacro Cuore, 00168, Roma, Italy
- Gianluigi Quaranta
- Section of Hygiene, University Department of Life Sciences and Public Health, Università Cattolica del Sacro Cuore, 00168, Roma, Italy
- Department of Woman and Child Health and Public Health - Public Health Area, Fondazione Policlinico Universitario A. Gemelli IRCCS, Roma, Italy
- Roberta Pastorino
- Department of Woman and Child Health and Public Health - Public Health Area, Fondazione Policlinico Universitario A. Gemelli IRCCS, Roma, Italy
- Stefania Boccia
- Section of Hygiene, University Department of Life Sciences and Public Health, Università Cattolica del Sacro Cuore, 00168, Roma, Italy
- Department of Woman and Child Health and Public Health - Public Health Area, Fondazione Policlinico Universitario A. Gemelli IRCCS, Roma, Italy
2
Patron J, Serra-Cayuela A, Han B, Li C, Wishart DS. Assessing the performance of genome-wide association studies for predicting disease risk. PLoS One 2019; 14:e0220215. [PMID: 31805043 PMCID: PMC6894795 DOI: 10.1371/journal.pone.0220215] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 07/06/2019] [Accepted: 11/01/2019] [Indexed: 12/24/2022]
Abstract
To date more than 3700 genome-wide association studies (GWAS) have been published that look at the genetic contributions of single nucleotide polymorphisms (SNPs) to human conditions or human phenotypes. Through these studies many highly significant SNPs have been identified for hundreds of diseases or medical conditions. However, the extent to which GWAS-identified SNPs or combinations of SNP biomarkers can predict disease risk is not well known. One of the most commonly used approaches to assess the performance of predictive biomarkers is to determine the area under the receiver-operator characteristic curve (AUROC). We have developed an R package called G-WIZ to generate ROC curves and calculate the AUROC using summary-level GWAS data. We first tested the performance of G-WIZ by using AUROC values derived from patient-level SNP data, as well as literature-reported AUROC values. We found that G-WIZ predicts the AUROC with <3% error. Next, we used the summary level GWAS data from GWAS Central to determine the ROC curves and AUROC values for 569 different GWA studies spanning 219 different conditions. Using these data we found a small number of GWA studies with SNP-derived risk predictors that have very high AUROCs (>0.75). On the other hand, the average GWA study produces a multi-SNP risk predictor with an AUROC of 0.55. Detailed AUROC comparisons indicate that most SNP-derived risk predictions are not as good as clinically based disease risk predictors. All our calculations (ROC curves, AUROCs, explained heritability) are in a publicly accessible database called GWAS-ROCS (http://gwasrocs.ca). The G-WIZ code is freely available for download at https://github.com/jonaspatronjp/GWIZ-Rscript/.
Affiliation(s)
- Jonas Patron
- Department of Biological Sciences, University of Alberta, Edmonton, Canada
- Beomsoo Han
- Department of Biological Sciences, University of Alberta, Edmonton, Canada
- Carin Li
- Department of Biological Sciences, University of Alberta, Edmonton, Canada
- David Scott Wishart
- Department of Biological Sciences, University of Alberta, Edmonton, Canada
- Department of Computing Science, University of Alberta, Edmonton, Canada
3
Janssens ACJW. Proprietary Algorithms for Polygenic Risk: Protecting Scientific Innovation or Hiding the Lack of It? Genes (Basel) 2019; 10:E448. [PMID: 31200546 PMCID: PMC6627729 DOI: 10.3390/genes10060448] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/22/2019] [Revised: 06/08/2019] [Accepted: 06/11/2019] [Indexed: 12/17/2022]
Abstract
Direct-to-consumer genetic testing companies aim to predict the risks of complex diseases using proprietary algorithms. Companies keep algorithms as trade secrets for competitive advantage, but a market that thrives on the premise that customers can make their own decisions about genetic testing should respect customer autonomy and informed decision making and maximize opportunities for transparency. The algorithm itself is only one piece of the information that is deemed essential for understanding how prediction algorithms are developed and evaluated. Companies should be encouraged to disclose everything else, including the expected risk distribution of the algorithm when applied in the population, using a benchmark DNA dataset. A standardized presentation of information and risk distributions allows customers to compare test offers and scientists to verify whether the undisclosed algorithms could be valid. A new model of oversight in which stakeholders collaboratively keep a check on the commercial market is needed.
Affiliation(s)
- A Cecile J W Janssens
- Department of Epidemiology, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, GA 30322, USA
4
Choi S, Bae S, Park T. Risk Prediction Using Genome-Wide Association Studies on Type 2 Diabetes. Genomics Inform 2016; 14:138-148. [PMID: 28154504 PMCID: PMC5287117 DOI: 10.5808/gi.2016.14.4.138] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/10/2016] [Revised: 12/05/2016] [Accepted: 12/05/2016] [Indexed: 12/31/2022]
Abstract
The success of genome-wide association studies (GWASs) has enabled us to improve risk assessment and provide novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a big hurdle in building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to solve the “large p and small n” problem. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built a prediction model by combining variable selection and prediction methods for type 2 diabetes using Affymetrix Genome-Wide Human SNP Array 5.0 from the Korean Association Resource project. We assessed the risk prediction performance using area under the receiver operating characteristic curve (AUC) for the internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than other combinations. During the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC of 0.726. We propose these combinations as a potentially powerful risk prediction model for type 2 diabetes.
Affiliation(s)
- Sungkyoung Choi
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Sunghwan Bae
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Department of Statistics, Seoul National University, Seoul 08826, Korea
5
McGeachie MJ, Clemmer GL, Croteau-Chonka DC, Castaldi PJ, Cho MH, Sordillo JE, Lasky-Su JA, Raby BA, Tantisira KG, Weiss ST. Whole genome prediction and heritability of childhood asthma phenotypes. Immun Inflamm Dis 2016; 4:487-496. [PMID: 27980782 PMCID: PMC5134727 DOI: 10.1002/iid3.133] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/13/2016] [Revised: 09/01/2016] [Accepted: 09/04/2016] [Indexed: 01/19/2023]
Abstract
Introduction While whole genome prediction (WGP) methods have recently demonstrated successes in the prediction of complex genetic diseases, they have not yet been applied to asthma and related phenotypes. Longitudinal patterns of lung function differ between asthmatics, but these phenotypes have not been assessed for heritability or predictive ability. Herein, we assess the heritability and genetic predictability of asthma‐related phenotypes. Methods We applied several WGP methods to a well‐phenotyped cohort of 832 children with mild‐to‐moderate asthma from CAMP. We assessed narrow‐sense heritability and predictability for airway hyperresponsiveness, serum immunoglobulin E, blood eosinophil count, pre‐ and post‐bronchodilator forced expiratory volume in 1 sec (FEV1), bronchodilator response, steroid responsiveness, and longitudinal patterns of lung function (normal growth, reduced growth, early decline, and their combinations). Prediction accuracy was evaluated using a training/testing set split of the cohort. Results We found that longitudinal lung function phenotypes demonstrated significant narrow‐sense heritability (reduced growth, 95%; normal growth with early decline, 55%). These same phenotypes also showed significant polygenic prediction (areas under the curve [AUCs] 56% to 62%). Including additional demographic covariates in the models increased prediction 4–8%, with reduced growth increasing from 62% to 66% AUC. We found that prediction with a genomic relatedness matrix was improved by filtering available SNPs based on chromatin evidence, and this result extended across cohorts. Conclusions Longitudinal reduced lung function growth displayed extremely high heritability. All phenotypes with significant heritability showed significant polygenic prediction. Using SNP‐prioritization increased prediction across cohorts. WGP methods show promise in predicting asthma‐related heritable traits.
Affiliation(s)
- Michael J McGeachie
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- George L Clemmer
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Damien C Croteau-Chonka
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Peter J Castaldi
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Michael H Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Joanne E Sordillo
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Jessica A Lasky-Su
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Benjamin A Raby
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Kelan G Tantisira
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Scott T Weiss
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
6
Kundu S, Kers JG, Janssens ACJW. Constructing Hypothetical Risk Data from the Area under the ROC Curve: Modelling Distributions of Polygenic Risk. PLoS One 2016; 11:e0152359. [PMID: 27023073 PMCID: PMC4811433 DOI: 10.1371/journal.pone.0152359] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 12/13/2015] [Accepted: 03/14/2016] [Indexed: 02/07/2023]
Abstract
Background Modeling studies using hypothetical polygenic risk data can be an efficient tool for investigating the effectiveness of downstream applications, such as targeting interventions to risk groups, to justify whether empirical investigation is warranted. We investigated the assumptions underlying a method that simulates risk data for specific values of the area under the receiver operating characteristic curve (AUC). Methods The simulation method constructs risk data for a hypothetical population based on the population disease risk, and the odds ratios and frequencies of genetic variants. By systematically varying the parameters, we investigated under what conditions AUC values represent unique ROC curves with unique risk distributions for patients and nonpatients, and to what extent risk data can be simulated for precise values of the AUC. Results Using a larger number of genetic variants, each with a modest effect, we observed that the distributions of estimated risks of patients and nonpatients were similar for various combinations of the odds ratios and frequencies of the risk alleles. Simulated ROC curves overlapped empirical curves with the same AUC. Conclusions Polygenic risk data can be effectively and efficiently created using a simulation method. This allows further investigation of the potential applications of stratifying interventions on the basis of polygenic risk.
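The kind of simulation described in this abstract can be sketched roughly as follows. This is an illustrative simplification, not a reproduction of the authors' method: it assumes independent variants, genotypes drawn under Hardy-Weinberg proportions, and a multiplicative odds model centred on the population odds. The function names `simulate_risks` and `auc_from_scores` and all parameter values are hypothetical.

```python
import math
import random


def simulate_risks(n, population_risk, odds_ratios, frequencies, seed=0):
    """Simulate individual disease risks and disease status for n people.

    Assumptions (not from the paper): independent variants, two alleles per
    variant drawn with the given risk-allele frequency, and log-odds centred
    so that the average log-odds stays near the population log-odds.
    """
    rng = random.Random(seed)
    base_log_odds = math.log(population_risk / (1.0 - population_risk))
    risks, labels = [], []
    for _ in range(n):
        log_odds = base_log_odds
        for odds_ratio, freq in zip(odds_ratios, frequencies):
            alleles = (rng.random() < freq) + (rng.random() < freq)  # 0, 1 or 2
            log_odds += (alleles - 2.0 * freq) * math.log(odds_ratio)
        risk = 1.0 / (1.0 + math.exp(-log_odds))
        risks.append(risk)
        labels.append(1 if rng.random() < risk else 0)
    return risks, labels


def auc_from_scores(scores, labels):
    """AUC via the rank-sum (Mann-Whitney) formula, with average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both patients and nonpatients are required")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical scenario: 20 variants, each with odds ratio 1.5 and frequency 0.3.
    risks, labels = simulate_risks(5000, 0.10, [1.5] * 20, [0.3] * 20)
    print(f"simulated AUC: {auc_from_scores(risks, labels):.3f}")
```

Varying the odds ratios and frequencies while tracking the resulting AUC is one way to explore how different parameter combinations can produce similar discrimination.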
Affiliation(s)
- Suman Kundu
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Jannigje G. Kers
- Department of Clinical Genetics/EMGO Institute for Health and Care Research, Section Community Genetics, VU University Medical Center, Amsterdam, The Netherlands
- A. Cecile J. W. Janssens
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Department of Clinical Genetics/EMGO Institute for Health and Care Research, Section Community Genetics, VU University Medical Center, Amsterdam, The Netherlands
7
Martens FK, Kers JG, Janssens ACJW. Risk Analysis of Prostate Cancer in PRACTICAL Consortium-Letter. Cancer Epidemiol Biomarkers Prev 2016; 25:222. [PMID: 26762807 DOI: 10.1158/1055-9965.epi-15-0904] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022]
Affiliation(s)
- Forike K Martens
- Department of Clinical Genetics, EMGO Institute for Health and Care Research, Section Community Genetics, VU University Medical Center, Amsterdam, the Netherlands
- Jannigje G Kers
- Department of Clinical Genetics, EMGO Institute for Health and Care Research, Section Community Genetics, VU University Medical Center, Amsterdam, the Netherlands
- A Cecile J W Janssens
- Department of Clinical Genetics, EMGO Institute for Health and Care Research, Section Community Genetics, VU University Medical Center, Amsterdam, the Netherlands
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia
8
Short E, Thomas LE, Hurley J, Jose S, Sampson JR. Inherited predisposition to colorectal cancer: towards a more complete picture. J Med Genet 2015; 52:791-6. [PMID: 26297796 DOI: 10.1136/jmedgenet-2015-103298] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/03/2015] [Accepted: 07/28/2015] [Indexed: 12/14/2022]
Abstract
Colorectal carcinoma (CRC) is the third most common cancer worldwide. Hereditary factors are important in 15%-35% of affected patients. This review provides an update on the genetic basis of inherited predisposition to CRC. Currently known genetic factors include a group of highly penetrant mutant genes associated with rare Mendelian cancer syndromes and a group of common low-penetrance alleles that have been identified through genetic association studies. Additional mechanisms that may underlie a predisposition to CRC will be outlined, for example, variants in intermediate-penetrance alleles. Recent findings, including mutations in POLE, POLD1 and NTHL1, will be highlighted, and we identify gaps in present knowledge and consider how these may be addressed through current and emerging genomic approaches. It is expected that the missing heritable component of CRC will be identified through ever more comprehensive cataloguing and phenotypic annotation of CRC-associated variants identified through sequencing approaches. This will have important clinical implications, particularly in areas such as risk stratification, public health and CRC prevention.
Affiliation(s)
- Emma Short
- Institute of Cancer and Genetics, Cardiff University, Heath Park Campus, Cardiff, UK
- Laura E Thomas
- Institute of Cancer and Genetics, Cardiff University, Heath Park Campus, Cardiff, UK
- Joanna Hurley
- Department of Gastroenterology, Cwm Taf University Health Board, Prince Charles Hospital, Merthyr Tydfil, UK
- Sian Jose
- Institute of Medical Genetics, Cardiff and Vale Health Board, Cardiff, UK
- Julian R Sampson
- Institute of Cancer and Genetics, Cardiff University, Heath Park Campus, Cardiff, UK
9
Milton JN, Steinberg MH, Sebastiani P. Evaluation of an ensemble of genetic models for prediction of a quantitative trait. Front Genet 2015; 5:474. [PMID: 25628649 PMCID: PMC4292739 DOI: 10.3389/fgene.2014.00474] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/26/2014] [Accepted: 12/20/2014] [Indexed: 01/09/2023]
Abstract
Many genetic markers have been shown to be associated with common quantitative traits in genome-wide association studies. Typically these associated genetic markers have small to modest effect sizes and individually they explain only a small amount of the variability of the phenotype. In order to build a genetic prediction model without fitting a multiple linear regression model with possibly hundreds of genetic markers as predictors, researchers often summarize the joint effect of risk alleles into a genetic score that is used as a covariate in the genetic prediction model. However, the prediction accuracy can be highly variable and selecting the optimal number of markers to be included in the genetic score is challenging. In this manuscript we present a strategy to build an ensemble of genetic prediction models from data and we show that the ensemble-based method makes the challenge of choosing the number of genetic markers more amenable. Using simulated data with varying heritability and number of genetic markers, we compare the predictive accuracy and inclusion of true positive and false positive markers of a single genetic prediction model and our proposed ensemble method. The results show that the ensemble of genetic models tends to include a larger number of genetic variants than a single genetic model and it is more likely to include all of the true genetic markers. This increased sensitivity is obtained at the price of a lower specificity that appears to minimally affect the predictive accuracy of the ensemble.
Affiliation(s)
- Jacqueline N Milton
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
- Martin H Steinberg
- Department of Medicine, School of Medicine, Boston University, Boston, MA, USA
- Paola Sebastiani
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA