1
Lin HY, Mazumder H, Sarkar I, Huang PY, Eeles RA, Kote-Jarai Z, Muir KR, Schleutker J, Pashayan N, Batra J, Neal DE, Nielsen SF, Nordestgaard BG, Grönberg H, Wiklund F, MacInnis RJ, Haiman CA, Travis RC, Stanford JL, Kibel AS, Cybulski C, Khaw KT, Maier C, Thibodeau SN, Teixeira MR, Cannon-Albright L, Brenner H, Kaneva R, Pandha H, Park JY. Cluster effect for SNP-SNP interaction pairs for predicting complex traits. Sci Rep 2024; 14:18677. [PMID: 39134575 PMCID: PMC11319716 DOI: 10.1038/s41598-024-66311-7] [Received: 02/08/2024] [Accepted: 07/01/2024] [Indexed: 08/15/2024]
Abstract
Single nucleotide polymorphism (SNP) interactions are key to improving polygenic risk scores. Previous studies reported several significant SNP-SNP interaction pairs that shared a common SNP and thus formed a cluster, but some of the identified pairs might be false positives. This study aimed to identify factors associated with the cluster effect on false positivity and to develop strategies to enhance the accuracy of SNP-SNP interaction detection. The results showed that the cluster effect is a major cause of false-positive SNP-SNP interaction findings. This cluster effect is due to high correlations between a causal pair and the null pairs in the same cluster. Clusters whose hub SNP had a significant main effect and a large minor allele frequency (MAF) tended to have a higher false-positive rate. In addition, peripheral null SNPs with a small MAF tended to increase false positivity. We also demonstrated that a modified significance criterion combining the 3 p-value rules with a bootstrap approach (3pRule + bootstrap) can reduce false positivity while maintaining high true positivity. Our results also showed that a pair without a significant main effect tends to have weak or no interaction. This study identified the cluster effect and suggests using the 3pRule + bootstrap approach to enhance the accuracy of SNP-SNP interaction detection.
Affiliation(s)
- Hui-Yi Lin
- Biostatistics and Data Science Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
- Harun Mazumder
- Biostatistics and Data Science Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
- Indrani Sarkar
- Biostatistics and Data Science Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
- Po-Yu Huang
- Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan
- Rosalind A Eeles
- The Institute of Cancer Research, London, SM2 5NG, UK
- Royal Marsden NHS Foundation Trust, London, SW3 6JJ, UK
- Kenneth R Muir
- Division of Population Health, Health Services Research and Primary Care, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Johanna Schleutker
- Institute of Biomedicine, University of Turku, Turku, Finland
- Department of Medical Genetics, Genomics, Laboratory Division, Turku University Hospital, PO Box 52, 20521, Turku, Finland
- Nora Pashayan
- Department of Applied Health Research, University College London, London, WC1E 7HB, UK
- Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Strangeways Laboratory, Worts Causeway, Cambridge, CB1 8RN, UK
- Jyotsna Batra
- Australian Prostate Cancer Research Centre-Qld, Institute of Health and Biomedical Innovation and School of Biomedical Science, Queensland University of Technology, Brisbane, QLD, 4059, Australia
- Translational Research Institute, Brisbane, QLD, 4102, Australia
- David E Neal
- Nuffield Department of Surgical Sciences, University of Oxford, John Radcliffe Hospital, Room 6603, Level 6, Headley Way, Headington, Oxford, OX3 9DU, UK
- Department of Oncology, University of Cambridge, Addenbrooke's Hospital, Hills Road, Box 279, Cambridge, CB2 0QQ, UK
- Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Cambridge, CB2 0RE, UK
- Sune F Nielsen
- Faculty of Health and Medical Sciences, University of Copenhagen, 2200, Copenhagen, Denmark
- Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Herlev, 2200, Copenhagen, Denmark
- Børge G Nordestgaard
- Faculty of Health and Medical Sciences, University of Copenhagen, 2200, Copenhagen, Denmark
- Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Herlev, 2200, Copenhagen, Denmark
- Henrik Grönberg
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, 171 77, Stockholm, Sweden
- Fredrik Wiklund
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, 171 77, Stockholm, Sweden
- Robert J MacInnis
- Cancer Epidemiology Division, Cancer Council Victoria, 200 Victoria Parade, East Melbourne, 3002, Australia
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Grattan Street, Parkville, VIC, 3010, Australia
- Christopher A Haiman
- Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, CA, 90015, USA
- Ruth C Travis
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, OX3 7LF, UK
- Janet L Stanford
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, 98109-1024, USA
- Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA, 98195, USA
- Adam S Kibel
- Division of Urologic Surgery, Brigham and Women's Hospital, 75 Francis Street, Boston, MA, 02115, USA
- Cezary Cybulski
- International Hereditary Cancer Center, Department of Genetics and Pathology, Pomeranian Medical University, 70-115, Szczecin, Poland
- Kay-Tee Khaw
- Clinical Gerontology Unit, University of Cambridge, Cambridge, CB2 2QQ, UK
- Christiane Maier
- Humangenetik Tuebingen, Paul-Ehrlich-Str 23, 72076, Tuebingen, Germany
- Stephen N Thibodeau
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, 55905, USA
- Manuel R Teixeira
- Department of Laboratory Genetics, Portuguese Oncology Institute of Porto (IPO Porto)/Porto Comprehensive Cancer Center, Porto, Portugal
- Cancer Genetics Group, IPO Porto Research Center (CI-IPOP)/RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto)/Porto Comprehensive Cancer Center, Porto, Portugal
- School of Medicine and Biomedical Sciences (ICBAS), University of Porto, Porto, Portugal
- Lisa Cannon-Albright
- Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT, 84132, USA
- George E. Wahlen Department of Veterans Affairs Medical Center, Salt Lake City, UT, 84148, USA
- Hermann Brenner
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
- Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
- Radka Kaneva
- Molecular Medicine Center, Department of Medical Chemistry and Biochemistry, Medical University of Sofia, Sofia, 2 Zdrave Str., 1431, Sofia, Bulgaria
- Hardev Pandha
- The University of Surrey, Guildford, Surrey, GU2 7XH, UK
- Jong Y Park
- Department of Cancer Epidemiology, Moffitt Cancer Center, 12902 Magnolia Drive, Tampa, FL, 33612, USA

2
Calbet‐Llopart N, Combalia M, Kiroglu A, Potrony M, Tell‐Martí G, Combalia A, Brugues A, Podlipnik S, Carrera C, Puig S, Malvehy J, Puig‐Butillé JA. Common genetic variants associated with melanoma risk or naevus count in patients with wildtype MC1R melanoma. Br J Dermatol 2022; 187:753-764. [PMID: 35701387 PMCID: PMC9804579 DOI: 10.1111/bjd.21707] [Received: 03/07/2022] [Revised: 06/07/2022] [Accepted: 06/11/2022] [Indexed: 01/05/2023]
Abstract
BACKGROUND Hypomorphic MC1R variants are the most prevalent genetic determinants of melanoma risk in the white population. However, the genetic background of patients with wildtype (WT) MC1R melanoma is poorly studied. OBJECTIVES To analyse the role of candidate common genetic variants in melanoma risk and naevus count in Spanish patients with WT MC1R melanoma. METHODS We examined 753 individuals with WT MC1R from Spain (497 patients and 256 controls). We used OpenArray reverse-transcriptase polymerase chain reaction to genotype a panel of 221 common genetic variants involved in melanoma, naevogenesis, hormonal pathways and proinflammatory pathways. Genetic models were tested using multivariate logistic regression models. Nonparametric multifactor dimensionality reduction (MDR) was used to detect gene-gene interactions within each biological subgroup of variants. RESULTS We found that variant rs12913832 in the HERC2 gene, which is associated with blue eye colour, increased melanoma risk in individuals with WT MC1R [odds ratio (OR) 1·97, 95% confidence interval (CI) 1·48-2·63; adjusted P < 0·001; corrected P < 0·001]. We also observed a trend towards an association between the rs3798577 variant in the oestrogen receptor alpha gene (ESR1) and a lower naevus count, restricted to female patients with WT MC1R (OR 0·51, 95% CI 0·33-0·79; adjusted P = 0·002; corrected P = 0·11). This sex-dependent association was statistically significant in a larger cohort of patients with melanoma regardless of their MC1R status (n = 1497; OR 0·71, 95% CI 0·57-0·88; adjusted P = 0·002), reinforcing the hypothesis of an association between hormonal pathways and susceptibility to melanocytic proliferation. Finally, the MDR analysis revealed four genetic combinations associated with melanoma risk or naevus count in patients with WT MC1R.
CONCLUSIONS Our data suggest that epistatic interaction among common variants related to melanocyte biology or proinflammatory pathways might influence melanocytic proliferation in individuals with WT MC1R.

What is already known about this topic? Genetic variants in the MC1R gene are the most prevalent melanoma genetic risk factor in the white population. Still, 20-40% of cases of melanoma occur in individuals with wildtype MC1R. Multiple genetic variants have a pleiotropic effect in melanoma and naevogenesis. Additional variants in unexplored pathways might also have a role in melanocytic proliferation in these patients. Epidemiological evidence suggests an association of melanocytic proliferation with hormonal pathways and proinflammatory pathways.

What does this study add? Variant rs12913832 in the HERC2 gene, which is associated with blue eye colour, increases the melanoma risk in individuals with wildtype MC1R. Variant rs3798577 in the oestrogen receptor gene is associated with naevus count regardless of the MC1R status in female patients with melanoma. We report epistatic interactions among common genetic variants with a role in modulating the risk of melanoma or the number of naevi in individuals with wildtype MC1R.

What is the translational message? We report a potential role of hormonal signalling pathways in melanocytic proliferation, providing a basis for better understanding of sex-based differences observed at the epidemiological level. We show that gene-gene interactions among common genetic variants might be responsible for an increased risk for melanoma development in individuals with a low-risk phenotype, such as darkly pigmented hair and skin.
Affiliation(s)
- Neus Calbet‐Llopart
- Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Marc Combalia
- Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Anil Kiroglu
- Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Miriam Potrony
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Biochemistry and Molecular Genetics Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Gemma Tell‐Martí
- Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Andrea Combalia
- Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Albert Brugues
- Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Sebastian Podlipnik
- Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Cristina Carrera
- Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Susana Puig
- Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Josep Malvehy
- Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Joan Anton Puig‐Butillé
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Molecular Biology CORE, Biochemistry and Molecular Genetics Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain

3
Curtis A, Yu Y, Carey M, Parfrey P, Yilmaz YE, Savas S. Examining SNP-SNP interactions and risk of clinical outcomes in colorectal cancer using multifactor dimensionality reduction based methods. Front Genet 2022; 13:902217. [PMID: 35991579 PMCID: PMC9385108 DOI: 10.3389/fgene.2022.902217] [Received: 03/22/2022] [Accepted: 06/30/2022] [Indexed: 12/24/2022]
Abstract
Background: SNP interactions may explain the variable outcome risk among colorectal cancer patients. Examining SNP interactions is challenging, especially with large datasets. Multifactor Dimensionality Reduction (MDR)-based programs may address this problem. Objectives: 1) To compare two MDR-based programs for their utility; and 2) to apply these programs to sets of MMP and VEGF-family gene SNPs in order to examine their interactions in relation to colorectal cancer survival outcomes. Methods: This study applied two data reduction methods, Cox-MDR and GMDR 0.9, to study one- to three-way SNP interactions. Both programs were run with 5-fold cross-validation, and the top models were verified by permutation testing. Prognostic associations of the SNP interactions were verified using multivariable regression methods. Eight datasets, including SNPs from MMP family genes (n = 201) and seven sets of VEGF-family interaction networks (n = 1,517 SNPs), were examined. Results: Approximately 90 million potential interactions were examined. Analyses in the MMP and VEGF gene family datasets found several novel 1- to 3-way SNP interactions. These interactions distinguished between patients with different outcome risks (regression p-values 0.03–2.2E-09). The strongest association was detected for a 3-way interaction including the CHRM3.rs665159_EPN1.rs6509955_PTGER3.rs1327460 variants. Conclusion: Our work demonstrates the utility of data reduction methods while identifying potential prognostic markers in colorectal cancer.
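The abstract notes that the top models were verified by permutation testing. A minimal Python sketch of that idea for a single, fixed SNP pair follows. It assumes a quantitative outcome and a simplified GMDR-style rule in which a genotype cell is labeled high-risk when the residuals of its members from the overall mean sum to a positive value. This is neither Cox-MDR (which handles survival outcomes) nor the GMDR 0.9 software, and all function names are hypothetical. Because the cell labeling is redone on every permutation, the null distribution reflects the data-driven choice of high-risk cells; a full analysis would also repeat the model search over all SNP combinations within each permutation.

```python
import random


def high_risk_cells(snp_a, snp_b, y):
    """Simplified GMDR-style labeling under an intercept-only null model:
    a two-locus genotype cell is high-risk when the summed residuals
    (y minus the overall mean) of its members are positive."""
    y_bar = sum(y) / len(y)
    score = {}
    for a, b, v in zip(snp_a, snp_b, y):
        score[(a, b)] = score.get((a, b), 0.0) + (v - y_bar)
    return {cell for cell, s in score.items() if s > 0}


def group_difference(snp_a, snp_b, y):
    """Mean outcome in high-risk cells minus mean outcome in low-risk cells."""
    high = high_risk_cells(snp_a, snp_b, y)
    hi = [v for a, b, v in zip(snp_a, snp_b, y) if (a, b) in high]
    lo = [v for a, b, v in zip(snp_a, snp_b, y) if (a, b) not in high]
    if not hi or not lo:
        return 0.0
    return sum(hi) / len(hi) - sum(lo) / len(lo)


def permutation_p_value(snp_a, snp_b, y, n_perm=999, seed=0):
    """Shuffle the outcomes, redo the cell labeling each time, and count how
    often the permuted statistic is at least as large as the observed one."""
    observed = group_difference(snp_a, snp_b, y)
    rng = random.Random(seed)
    shuffled = list(y)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if group_difference(snp_a, snp_b, shuffled) >= observed:
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_perm + 1)


# Hypothetical simulated data: genotypes coded 0/1/2; the outcome is shifted
# by 2 units when the two-locus genotype sum is odd, plus Gaussian noise.
data_rng = random.Random(2)
n = 200
snp_a = [data_rng.choice([0, 1, 2]) for _ in range(n)]
snp_b = [data_rng.choice([0, 1, 2]) for _ in range(n)]
y = [2.0 * ((a + b) % 2) + data_rng.gauss(0.0, 1.0) for a, b in zip(snp_a, snp_b)]

observed, p = permutation_p_value(snp_a, snp_b, y)
print(round(observed, 2), p)
```

The `(n_extreme + 1) / (n_perm + 1)` form counts the observed data as one of the permutations, so the estimated p-value can never be exactly zero.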
Affiliation(s)
- Aaron Curtis
- Discipline of Genetics, Faculty of Medicine, Memorial University, St. John’s, NL, Canada
- Division of Biomedical Sciences, Faculty of Medicine, Memorial University, St. John’s, NL, Canada
- Yajun Yu
- Discipline of Genetics, Faculty of Medicine, Memorial University, St. John’s, NL, Canada
- Division of Biomedical Sciences, Faculty of Medicine, Memorial University, St. John’s, NL, Canada
- Megan Carey
- Discipline of Genetics, Faculty of Medicine, Memorial University, St. John’s, NL, Canada
- Patrick Parfrey
- Discipline of Medicine, Faculty of Medicine, Memorial University, St. John’s, NL, Canada
- Yildiz E. Yilmaz
- Discipline of Genetics, Faculty of Medicine, Memorial University, St. John’s, NL, Canada
- Discipline of Medicine, Faculty of Medicine, Memorial University, St. John’s, NL, Canada
- Department of Mathematics and Statistics, Faculty of Science, Memorial University, St. John’s, NL, Canada
- Sevtap Savas
- Discipline of Genetics, Faculty of Medicine, Memorial University, St. John’s, NL, Canada
- Division of Biomedical Sciences, Faculty of Medicine, Memorial University, St. John’s, NL, Canada
- Discipline of Oncology, Faculty of Medicine, Memorial University, St. John’s, NL, Canada
- *Correspondence: Sevtap Savas,

4
Yang CH, Lin YD, Chuang LY. Class Balanced Multifactor Dimensionality Reduction to Detect Gene-Gene Interactions. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:71-81. [PMID: 30040653 DOI: 10.1109/tcbb.2018.2858776] [Indexed: 06/08/2023]
Abstract
Detecting gene-gene interactions in single-nucleotide polymorphism data is vital for understanding disease susceptibility. However, existing approaches may be limited by the sample size in case-control studies. Herein, we propose a class-balanced approach for the multifactor dimensionality reduction method (BMDR) to increase the accuracy of prediction error rate estimates in small samples. BMDR explicitly selects the best model by evaluating the average prediction error rate over k-fold cross-validation, without cross-validation consistency selection. In this study, we used several epistatic models with and without marginal effects under different parameter settings (heritability and minor allele frequencies) to evaluate the performance of existing approaches. Using simulated data sets, BMDR successfully detected gene-gene interactions, particularly for data sets with small sample sizes. BMDR was also applied to a large data set from the Wellcome Trust Case Control Consortium, and the results indicated that it could effectively detect significant gene-gene interactions.
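The abstract describes the model-selection step only at a high level. A minimal Python sketch of the general idea follows, assuming two-locus models, genotypes coded 0/1/2, binary case-control status, the usual MDR rule that labels a genotype cell high-risk when its case:control ratio exceeds the overall ratio, and balanced error (1 minus the mean of sensitivity and specificity) as the class-balanced criterion. The best pair is chosen purely by the lowest average cross-validated error, with no cross-validation consistency count. The function names and the simulated data are hypothetical, and this is not the authors' BMDR implementation.

```python
import random
from itertools import combinations


def fit_cells(snp_a, snp_b, status):
    """Label each two-locus genotype cell high-risk (1) or low-risk (0).

    A cell is high-risk when its case:control ratio exceeds the overall
    case:control ratio of the training data; cross-multiplying avoids
    dividing by zero for cells that contain no controls.
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    counts = {}
    for a, b, y in zip(snp_a, snp_b, status):
        cases, controls = counts.get((a, b), (0, 0))
        counts[(a, b)] = (cases + y, controls + 1 - y)
    return {
        cell: int(cases * n_controls > controls * n_cases)
        for cell, (cases, controls) in counts.items()
    }


def balanced_error(pred, truth):
    """1 - (sensitivity + specificity) / 2, weighting both classes equally."""
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))
    tn = sum(p == 0 and t == 0 for p, t in zip(pred, truth))
    pos = sum(truth)
    neg = len(truth) - pos
    sens = tp / pos if pos else 0.0
    spec = tn / neg if neg else 0.0
    return 1.0 - (sens + spec) / 2.0


def cv_balanced_error(snp_a, snp_b, status, k=5, seed=0):
    """Average balanced test error of a two-SNP MDR model over k folds."""
    idx = list(range(len(status)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        labels = fit_cells([snp_a[i] for i in train],
                           [snp_b[i] for i in train],
                           [status[i] for i in train])
        # Genotype cells never seen in training default to low-risk.
        pred = [labels.get((snp_a[i], snp_b[i]), 0) for i in test]
        errors.append(balanced_error(pred, [status[i] for i in test]))
    return sum(errors) / k


def best_pair(genotypes, status, k=5, seed=0):
    """Return (mean CV balanced error, (j, m)) for the best SNP pair."""
    return min(
        (cv_balanced_error(genotypes[j], genotypes[m], status, k, seed), (j, m))
        for j, m in combinations(range(len(genotypes)), 2)
    )


# Hypothetical simulated data: 4 SNPs, 300 subjects; a subject is a case
# exactly when the genotype sum of SNPs 0 and 1 is odd (pure epistasis).
rng = random.Random(1)
n = 300
genotypes = [[rng.choice([0, 1, 2]) for _ in range(n)] for _ in range(4)]
status = [(genotypes[0][i] + genotypes[1][i]) % 2 for i in range(n)]

err, pair = best_pair(genotypes, status)
print(pair, round(err, 3))
```

Balanced error is used here because plain misclassification error can look deceptively low when cases and controls are unequal in number; whether BMDR's "balance approach" is defined exactly this way is an assumption of this sketch.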
5
Tessier F, Fontaine-Bisson B, Lefebvre JF, El-Sohemy A, Roy-Gagnon MH. Investigating Gene-Gene and Gene-Environment Interactions in the Association Between Overnutrition and Obesity-Related Phenotypes. Front Genet 2019; 10:151. [PMID: 30886629 PMCID: PMC6409307 DOI: 10.3389/fgene.2019.00151] [Received: 08/22/2018] [Accepted: 02/12/2019] [Indexed: 01/12/2023]
Abstract
Introduction: Animal studies have suggested that the NFKB1, IKBKB, and SOCS3 genes could be involved in the association between overnutrition and obesity. This study aimed to investigate interactions involving these genes and macronutrient intakes affecting obesity-related phenotypes. Methods: We used a traditional statistical method, logistic regression, and compared it with two alternative statistical methods, multifactor dimensionality reduction (MDR) and penalized logistic regression (PLR), to better detect gene-gene and gene-environment interactions in the Toronto Nutrigenomics and Health Study (n = 1639), using dichotomized body mass index (BMI) and waist circumference (WC) as obesity-related phenotypes. Exposure variables included genotypes at 54 single nucleotide polymorphisms (NFKB1: 18, IKBKB: 9, SOCS3: 27), macronutrient (carbohydrates, protein, fat) and alcohol intakes, and ethno-cultural background. Results: After correction for multiple testing, no interaction was found using logistic regression. MDR identified interactions between SOCS3 rs6501199 and rs4969172, and IKBKB rs3747811 affecting BMI in the Caucasian population; SOCS3 rs6501199 and NFKB1 rs1609798 affecting WC in the Caucasian population; and SOCS3 rs4436839 and IKBKB rs3747811 affecting WC in the South Asian population. PLR found a main effect of SOCS3 rs12944581 on BMI in the South Asian population. Conclusion: Although MDR and PLR gave discordant results, some models support results from previous studies. These results emphasize the need to use alternative statistical methods to investigate high-order interactions and suggest that variants in the nutrient-responsive hypothalamic IKKB/NF-kB signaling pathway may be involved in obesity pathogenesis.
Affiliation(s)
- François Tessier
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
- Ahmed El-Sohemy
- Department of Nutritional Sciences, University of Toronto, Toronto, ON, Canada

6
Amosco MD, Tavera GR, Villar VAM, Naniong JMA, David-Bustamante LMG, Williams SM, Jose PA, Palmes-Saloma CP. Non-additive effects of ACVR2A in preeclampsia in a Philippine population. BMC Pregnancy Childbirth 2019; 19:11. [PMID: 30621627 PMCID: PMC6323705 DOI: 10.1186/s12884-018-2152-z] [Received: 05/15/2018] [Accepted: 12/17/2018] [Indexed: 12/12/2022]
Abstract
BACKGROUND Multiple interrelated pathways contribute to the pathogenesis of preeclampsia, and variants in susceptibility genes may play a role among Filipinos, an ethnically distinct group with a high prevalence of the disease. The objective of this study was to examine the association between variants in maternal candidate genes and the development of preeclampsia in a Philippine population. METHODS A case-control study involving 29 single nucleotide polymorphisms (SNPs) in 21 candidate genes was conducted in 150 patients with preeclampsia (cases) and 175 women with uncomplicated normal pregnancies (controls). Genotyping of the GRK4 and DRD1 gene variants was carried out using the TaqMan Assay, and all other variants were assayed using the Sequenom MassARRAY iPLEX Platform. PLINK was used for SNP association testing. Multilocus association analysis was performed using multifactor dimensionality reduction (MDR). RESULTS Among the clinical factors, older age (P < 1 × 10⁻⁴), higher BMI (P < 1 × 10⁻⁴), having a new partner (P = 0.006), and a longer interval since the previous pregnancy (P = 0.018) were associated with preeclampsia. The MDR algorithm identified the genetic variant ACVR2A rs1014064 as interacting with age and BMI in association with preeclampsia among Filipino women. CONCLUSIONS The MDR algorithm identified an interaction between age, BMI and ACVR2A rs1014064, indicating that the context in which genetic variants act, including demographic and clinical factors, may be crucial to understanding the pathogenesis of preeclampsia among Filipino women.
Affiliation(s)
- Melissa D. Amosco
- National Institute of Molecular Biology and Biotechnology, National Science Complex, University of the Philippines, Diliman, 1101 Quezon City, Philippines
- Department of Obstetrics and Gynecology, Philippine General Hospital - University of the Philippines, Taft Avenue, 1000 Manila, Philippines
- Gloria R. Tavera
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, School of Medicine, Cleveland, OH 44106 USA
- Van Anthony M. Villar
- Division of Renal Diseases & Hypertension, Department of Medicine, The George Washington University School of Medicine & Health Sciences, Washington, DC, 20037 USA
- Justin Michael A. Naniong
- National Institute of Molecular Biology and Biotechnology, National Science Complex, University of the Philippines, Diliman, 1101 Quezon City, Philippines
- Lara Marie G. David-Bustamante
- Department of Obstetrics and Gynecology, Philippine General Hospital - University of the Philippines, Taft Avenue, 1000 Manila, Philippines
- Scott M. Williams
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, School of Medicine, Cleveland, OH 44106 USA
- Pedro A. Jose
- Division of Renal Diseases & Hypertension, Department of Medicine, The George Washington University School of Medicine & Health Sciences, Washington, DC, 20037 USA
- Department of Pharmacology and Physiology, The George Washington University School of Medicine & Health Sciences, Washington, DC, 20037 USA
- Cynthia P. Palmes-Saloma
- National Institute of Molecular Biology and Biotechnology, National Science Complex, University of the Philippines, Diliman, 1101 Quezon City, Philippines
- Philippine Genome Center, National Science Complex, University of the Philippines, Diliman, 1101 Quezon City, Philippines

7
Solini A, Simeon V, Derosa L, Orlandi P, Rossi C, Fontana A, Galli L, Di Desidero T, Fioravanti A, Lucchesi S, Coltelli L, Ginocchi L, Allegrini G, Danesi R, Falcone A, Bocci G. Genetic interaction of P2X7 receptor and VEGFR-2 polymorphisms identifies a favorable prognostic profile in prostate cancer patients. Oncotarget 2016; 6:28743-54. [PMID: 26337470 PMCID: PMC4745689 DOI: 10.18632/oncotarget.4926] [Received: 04/07/2015] [Accepted: 08/10/2015] [Indexed: 12/12/2022]
Abstract
VEGFR-2 and the P2X7 receptor (P2X7R) have been reported to stimulate angiogenesis and inflammatory processes in prostate cancer. The present study was performed to investigate the genetic interactions between VEGFR-2 and P2X7R SNPs and their correlation with overall survival (OS) in a population of patients with metastatic prostate cancer. Analyses were performed on germline DNA obtained from blood samples, and SNPs were investigated by real-time PCR. The survival dimensionality reduction (SDR) methodology was applied to investigate the genetic interaction between SNPs. One hundred patients were enrolled. The SDR software provided two genetic interaction profiles consisting of combinations of specific VEGFR-2 (rs2071559, rs11133360) and P2X7R (rs3751143, rs208294) genotypes. The median OS was 126 months (95% CI, 115.94–152.96) and 65.65 months (95% CI, 52.95–76.53) for the favorable and the unfavorable genetic profile, respectively (p < 0.0001). The genetic statistical interaction between VEGFR-2 (rs2071559, rs11133360) and P2X7R (rs3751143, rs208294) genotypes may identify a population of prostate cancer patients with a better prognosis.
Affiliation(s)
- Anna Solini
- Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
- Vittorio Simeon
- Laboratory of Pre-Clinical and Translational Research, IRCCS - CROB Referral Cancer Center of Basilicata, Rionero in Vulture, Potenza, Italy
- Lisa Derosa
- Oncology Unit 2, University Hospital of Pisa, Pisa, Italy
- Paola Orlandi
- Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
- Chiara Rossi
- Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
- Andrea Fontana
- Oncology Unit 2, University Hospital of Pisa, Pisa, Italy
- Luca Galli
- Oncology Unit 2, University Hospital of Pisa, Pisa, Italy
- Teresa Di Desidero
- Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
- Anna Fioravanti
- Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
- Sara Lucchesi
- Division of Medical Oncology, Pontedera Hospital, Azienda USL of Pisa, Pontedera, Italy
- Luigi Coltelli
- Division of Medical Oncology, Pontedera Hospital, Azienda USL of Pisa, Pontedera, Italy
- Laura Ginocchi
- Division of Medical Oncology, Pontedera Hospital, Azienda USL of Pisa, Pontedera, Italy
- Giacomo Allegrini
- Division of Medical Oncology, Pontedera Hospital, Azienda USL of Pisa, Pontedera, Italy
- Romano Danesi
- Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
- Guido Bocci
- Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy

8
Abstract
INTRODUCTION Long-acting β2-agonists, when combined with inhaled corticosteroids, are an effective class of drugs for reducing symptoms and exacerbations in patients whose asthma is not adequately controlled by inhaled corticosteroids alone. However, because this class of drugs has been associated with severe adverse events, including hospitalization and death in small numbers of patients, efforts to identify a pharmacogenetic profile for at-risk patients have been pursued diligently. AREAS COVERED The PubMed search engine of the National Library of Medicine was used to identify English-language and non-English-language articles published from 1947 to March 2015 pertinent to asthma, pharmacogenomics, and long-acting β2-agonists. Keywords and topics included: asthma, asthma control, long-acting β2-agonists, salmeterol, formoterol, pharmacogenetics, and pharmacogenomics. The same strategy was used for the Cochrane Library Database and CINAHL. Reference types were randomized controlled trials, reviews, and editorials. Additional publications were culled from reference lists. The publications were reviewed by the authors, and the most relevant were used to support the topics covered in this review. EXPERT OPINION Children who carry the ADRB2 Arg16Arg genotype may be at greater risk than adults for severe adverse events. Rare ADRB2 variants appear to provide better clues for identifying the at-risk population of asthmatics.
Affiliation(s)
- Kathryn Blake
- Center for Pharmacogenomics and Translational Research, Nemours Children's Specialty Care, 807 Children's Way, Jacksonville, FL, USA
- John Lima
- Center for Pharmacogenomics and Translational Research, Nemours Children's Specialty Care, 807 Children's Way, Jacksonville, FL, USA

9
Bridging the gap between statistical and biological epistasis in Alzheimer's disease. Biomed Res Int 2015; 2015:870123. [PMID: 26075270 PMCID: PMC4449899 DOI: 10.1155/2015/870123] [Received: 03/06/2015] [Accepted: 05/05/2015] [Indexed: 12/17/2022]
Abstract
Alzheimer's disease affects millions of people worldwide, and its incidence is expected to rise as the population ages, yet no effective therapies exist despite decades of research and more than 20 known disease markers. Research has shown that the missing heritability of Alzheimer's disease remains extensive, with an estimated 25% of phenotypic variance unexplained by known variants. The missing heritability may be explained by as-yet-unidentified variants or by epistasis. Researchers often focus on individual loci rather than epistatic interactions, which is likely an oversimplification of the underlying biology, since most phenotypes are affected by multiple genes. Focusing research efforts on epistasis will be critical to resolving Alzheimer's disease etiology, and bridging the gap between statistical and biological epistasis will be central to identifying and properly interpreting key epistatic interactions. This review covers the current state of epistasis research in Alzheimer's disease and how researchers can bridge the gap between statistical and biological epistasis to help resolve Alzheimer's disease etiology.
10
Schulte PA, Whittaker C, Curran CP. Considerations for Using Genetic and Epigenetic Information in Occupational Health Risk Assessment and Standard Setting. JOURNAL OF OCCUPATIONAL AND ENVIRONMENTAL HYGIENE 2015; 12 Suppl 1:S69-S81. [PMID: 26583908 PMCID: PMC4685594 DOI: 10.1080/15459624.2015.1060323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/11/2023]
Abstract
Risk assessment forms the basis for both occupational health decision-making and the development of occupational exposure limits (OELs). Although genetic and epigenetic data have not been widely used in risk assessment and ultimately, standard setting, it is possible to envision such uses. A growing body of literature demonstrates that genetic and epigenetic factors condition biological responses to occupational and environmental hazards or serve as targets of them. This presentation addresses the considerations for using genetic and epigenetic information in risk assessments, provides guidance on using this information within the classic risk assessment paradigm, and describes a framework to organize thinking about such uses. The framework is a 4 × 4 matrix involving the risk assessment functions (hazard identification, dose-response modeling, exposure assessment, and risk characterization) on one axis and inherited and acquired genetic and epigenetic data on the other axis. The cells in the matrix identify how genetic and epigenetic data can be used for each risk assessment function. Generally, genetic and epigenetic data might be used as endpoints in hazard identification, as indicators of exposure, as effect modifiers in exposure assessment and dose-response modeling, as descriptors of mode of action, and to characterize toxicity pathways. Vast amounts of genetic and epigenetic data may be generated by high-throughput technologies. These data can be useful for assessing variability and reducing uncertainty in extrapolations, and they may serve as the foundation upon which identification of biological perturbations would lead to a new paradigm of toxicity pathway-based risk assessments.
Affiliation(s)
- P. A. Schulte
- Centers for Disease Control and Prevention (CDC), National Institute for Occupational Safety and Health (NIOSH), Education and Information Division, Cincinnati, Ohio
- C. Whittaker
- Centers for Disease Control and Prevention (CDC), National Institute for Occupational Safety and Health (NIOSH), Education and Information Division, Cincinnati, Ohio
- C. P. Curran
- Northern Kentucky University, Department of Biological Sciences, Highland Heights, Kentucky
11
Schulte PA, Whittaker C, Curran CP. Considerations for Using Genetic and Epigenetic Information in Occupational Health Risk Assessment and Standard Setting. JOURNAL OF OCCUPATIONAL AND ENVIRONMENTAL HYGIENE 2015; 12 Suppl 1:S69-81. [PMID: 26583908 PMCID: PMC4685594 DOI: 10.1080/15459624.2015.1060323] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 05/04/2023]
Affiliation(s)
- P. A. Schulte
- Centers for Disease Control and Prevention (CDC), National Institute for Occupational Safety and Health (NIOSH), Education and Information Division, Cincinnati, Ohio
- Address correspondence to Paul A. Schulte, Centers for Disease Control and Prevention (CDC), National Institute for Occupational Safety and Health (NIOSH), Education and Information Division, 4676 Columbia Parkway, MS-C14, Cincinnati, OH 45226.
- C. Whittaker
- Centers for Disease Control and Prevention (CDC), National Institute for Occupational Safety and Health (NIOSH), Education and Information Division, Cincinnati, Ohio
- C. P. Curran
- Northern Kentucky University, Department of Biological Sciences, Highland Heights, Kentucky
12
Allegrini G, Coltelli L, Orlandi P, Fontana A, Camerini A, Ferro A, Cazzaniga M, Casadei V, Lucchesi S, Bona E, Di Lieto M, Pazzagli I, Villa F, Amoroso D, Scalese M, Arrighi G, Molinaro S, Fioravanti A, Finale C, Triolo R, Di Desidero T, Donati S, Marcucci L, Goletti O, Del Re M, Salvadori B, Ferrarini I, Danesi R, Falcone A, Bocci G. Pharmacogenetic interaction analysis of VEGFR-2 and IL-8 polymorphisms in advanced breast cancer patients treated with paclitaxel and bevacizumab. Pharmacogenomics 2014; 15:1985-99. [DOI: 10.2217/pgs.14.140] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 01/08/2023]
Abstract
Aim: To investigate pharmacogenetic interactions among VEGF-A, VEGFR-2, IL-8, HIF-1α, EPAS-1 and TSP-1 SNPs and their role in progression-free survival in a population of metastatic breast cancer patients treated with bevacizumab in combination with first-line paclitaxel. Patients & methods: Analyses were performed on germline DNA obtained from blood samples, and SNPs were investigated by real-time polymerase chain reaction. The multifactor dimensionality reduction (MDR) methodology was applied to investigate interactions between SNPs. Results: One hundred and thirteen patients were enrolled from eight Italian Oncology Units (ClinicalTrials.gov: NCT01935102). The MDR software provided two pharmacogenetic interaction profiles based on combinations of specific VEGFR-2 rs11133360 and IL-8 rs4073 genotypes. The median progression-free survival was 14.1 months (95% CI: 11.4–16.8) and 10.2 months (95% CI: 8.8–11.5) for the favorable and the unfavorable genetic profile, respectively (HR: 0.44, 95% CI: 0.29–0.66, p < 0.0001). Conclusion: The pharmacogenetic statistical interaction between VEGFR-2 rs11133360 and IL-8 rs4073 genotypes may identify a population of patients with a better outcome.
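The MDR analysis mentioned in this abstract rests on one core step: each two-locus genotype combination is labeled high-risk or low-risk by comparing its case:control ratio with the overall ratio, which collapses two SNPs into a single binary attribute. A minimal Python sketch of that step follows, assuming genotypes coded 0/1/2 and status coded 1 (case) or 0 (control). The function names are hypothetical; cross-validation, permutation testing, and the way the study related profiles to progression-free survival are not modeled, and this is not the MDR software the authors used.

```python
from collections import Counter


def mdr_label_cells(geno_a, geno_b, status):
    """Label each two-locus genotype cell high-risk (1) or low-risk (0).

    A cell is high-risk when its case:control ratio meets or exceeds the
    overall case:control ratio of the sample.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(geno_a, geno_b, status):
        if y == 1:
            cases[(a, b)] += 1
        else:
            controls[(a, b)] += 1
    threshold = sum(cases.values()) / sum(controls.values())
    labels = {}
    for cell in set(cases) | set(controls):
        # cases/controls >= threshold, rearranged to avoid dividing by zero
        labels[cell] = 1 if cases[cell] >= threshold * controls[cell] else 0
    return labels


def balanced_accuracy(geno_a, geno_b, status, labels):
    """Score the pooled high/low-risk attribute as a classifier of case status."""
    tp = tn = fp = fn = 0
    for a, b, y in zip(geno_a, geno_b, status):
        pred = labels.get((a, b), 0)  # unseen cells default to low-risk
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Toy data: cases carry (0,1) or (1,0); controls carry (0,0) or (1,1).
geno_a = [0, 1, 0, 1, 0, 1, 0, 1]
geno_b = [1, 0, 1, 0, 0, 1, 0, 1]
status = [1, 1, 1, 1, 0, 0, 0, 0]
labels = mdr_label_cells(geno_a, geno_b, status)
print(sorted(labels.items()))
print(balanced_accuracy(geno_a, geno_b, status, labels))
```

In this toy data each SNP alone is uninformative (every single-locus genotype appears equally often in cases and controls), so the signal only becomes visible through the pooled two-locus attribute.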
Affiliation(s)
- Luigi Coltelli
- Division of Medical Oncology, Pontedera Hospital, Pisa, Italy
- Paola Orlandi
- Department of Clinical & Experimental Medicine, University of Pisa, Pisa, Italy
- Andrea Fontana
- Division of Medical Oncology II, Azienda Ospedaliero-Universitaria Pisana, S. Chiara Hospital, Pisa, Italy
- Andrea Camerini
- Division of Medical Oncology, Versilia Hospital, Lucca, Italy
- Antonella Ferro
- Division of Medical Oncology, S. Chiara Hospital, Trento, Italy
- Virginia Casadei
- Division of Medical Oncology, S. Salvatore Hospital, Pesaro, Italy
- Sara Lucchesi
- Division of Medical Oncology, Pontedera Hospital, Pisa, Italy
- Eleonora Bona
- Division of Medical Oncology II, Azienda Ospedaliero-Universitaria Pisana, S. Chiara Hospital, Pisa, Italy
- Marco Di Lieto
- Division of Medical Oncology, Azienda USL 3, Pistoia, Italy
- Ilaria Pazzagli
- Division of Medical Oncology, S. Cosma & Damiano Hospital, Pescia, Pistoia, Italy
- Federica Villa
- Division of Medical Oncology, AO S. Gerardo, Monza, Italy
- Marco Scalese
- Institute of Clinical Physiology, Italian National Research Council – CNR, Pisa, Italy
- Giada Arrighi
- Division of Medical Oncology, Pontedera Hospital, Pisa, Italy
- Sabrina Molinaro
- Institute of Clinical Physiology, Italian National Research Council – CNR, Pisa, Italy
- Anna Fioravanti
- Department of Clinical & Experimental Medicine, University of Pisa, Pisa, Italy
- Chiara Finale
- Division of Medical Oncology, Pontedera Hospital, Pisa, Italy
- Renza Triolo
- Division of Medical Oncology, S. Chiara Hospital, Trento, Italy
- Teresa Di Desidero
- Department of Clinical & Experimental Medicine, University of Pisa, Pisa, Italy
- Sara Donati
- Division of Medical Oncology, Versilia Hospital, Lucca, Italy
- Orlando Goletti
- Department of Translational Research & New Technology in Medicine & Surgery, University of Pisa, Italy
- Marzia Del Re
- Department of Clinical & Experimental Medicine, University of Pisa, Pisa, Italy
- Barbara Salvadori
- Division of Medical Oncology II, Azienda Ospedaliero-Universitaria Pisana, S. Chiara Hospital, Pisa, Italy
- Ilaria Ferrarini
- Division of Medical Oncology II, Azienda Ospedaliero-Universitaria Pisana, S. Chiara Hospital, Pisa, Italy
- Romano Danesi
- Department of Clinical & Experimental Medicine, University of Pisa, Pisa, Italy
- Alfredo Falcone
- Division of Medical Oncology II, Azienda Ospedaliero-Universitaria Pisana, S. Chiara Hospital, Pisa, Italy
- Division of Medical Oncology, Department of Translational Research & New Technology in Medicine & Surgery, University of Pisa, Italy
- Guido Bocci
- Department of Clinical & Experimental Medicine, University of Pisa, Pisa, Italy
- Istituto Toscano Tumori, Firenze, Italy
13
Roy R, De Sarkar N, Ghose S, Paul RR, Ray A, Mukhopadhyay I, Roy B. Association between risk of oral precancer and genetic variations in microRNA and related processing genes. J Biomed Sci 2014; 21:48. [PMID: 24885463 PMCID: PMC4035900 DOI: 10.1186/1423-0127-21-48] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 03/28/2014] [Accepted: 05/10/2014] [Indexed: 12/29/2022]
Abstract
Background MicroRNAs have been implicated in cancer, but studies on their role in precancer, such as leukoplakia, are limited. Sequence variations in eight miRNA genes and four miRNA processing genes were studied in 452 healthy controls and 299 leukoplakia patients to estimate risk of disease. Results Genotyping by TaqMan assay followed by statistical analyses showed that variant genotypes at Gemin3 and mir-34b reduced risk of disease [OR = 0.5 (0.3–0.9) and OR = 0.7 (0.5–0.9), respectively] in patients overall as well as in smokers [OR = 0.58 (0.3–1) and OR = 0.68 (0.5–0.9), respectively]. Among chewers, only mir29a significantly increased risk of disease [OR = 1.8 (1–3)]. Gene-environment interaction analysis using the MDR-pt program revealed that mir29a, mir34b, mir423 and Xpo5 modulated risk of disease (p < 0.002), which may be related to changes in expression of these genes, as observed by real-time PCR assays. However, no association between polymorphisms and gene expression was found in our sample set or in larger datasets from open-access platforms such as Genevar and the 1000 Genomes database. Conclusion Variations in microRNAs and their processing genes modulated risk of precancer, but further in-depth study is needed to understand the mechanism of the disease process.
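The odds ratios above are reported with 95% confidence intervals. To illustrate how such an interval can be obtained from a 2×2 genotype-by-status table, here is a sketch of the Woolf (log-odds) method. The counts are hypothetical, chosen only to match the group sizes of 299 patients and 452 controls; the study's own estimates may come from different or adjusted models, and this is not a reanalysis of its data.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a: cases with the variant genotype    b: controls with the variant genotype
    c: cases without it                   d: controls without it
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell count; apply a continuity correction first")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical counts: 30 of 299 patients and 60 of 452 controls carry the variant.
or_, lo, hi = odds_ratio_woolf(30, 60, 269, 392)
print(f"OR = {or_:.2f} ({lo:.2f}-{hi:.2f})")
```

An interval that excludes 1 is the usual reading of a "significant" protective (OR < 1) or risk (OR > 1) genotype.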
Affiliation(s)
- Bidyut Roy
- Human Genetics Unit, Indian Statistical Institute, 203, B.T. Road, Kolkata 700108, India.
14
Genetics of Alzheimer's disease. BIOMED RESEARCH INTERNATIONAL 2013; 2013:254954. [PMID: 23984328 PMCID: PMC3741956 DOI: 10.1155/2013/254954] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Received: 04/16/2013] [Revised: 07/08/2013] [Accepted: 07/08/2013] [Indexed: 01/18/2023]
Abstract
Alzheimer's disease is the most common form of dementia and is the only one of the top 10 causes of death in the United States that lacks disease-altering treatments. It is a complex disorder with environmental and genetic components. There are two major types of Alzheimer's disease: early onset and the more common late onset. The genetics of early-onset Alzheimer's disease are largely understood, with variants in three different genes leading to disease. In contrast, although several common alleles associated with late-onset Alzheimer's disease, including APOE, have been identified using association studies, the genetics of late-onset Alzheimer's disease are not fully understood. Here we review the known genetics of early- and late-onset Alzheimer's disease.
15
Urbanowicz RJ, Andrew AS, Karagas MR, Moore JH. Role of genetic heterogeneity and epistasis in bladder cancer susceptibility and outcome: a learning classifier system approach. J Am Med Inform Assoc 2013; 20:603-12. [PMID: 23444013 PMCID: PMC3721175 DOI: 10.1136/amiajnl-2012-001574] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 12/15/2012] [Revised: 01/28/2013] [Accepted: 01/31/2013] [Indexed: 11/18/2022]
Abstract
BACKGROUND AND OBJECTIVE Detecting complex patterns of association between genetic or environmental risk factors and disease risk has become an important target for epidemiological research. In particular, strategies that can detect multifactor interactions or heterogeneous patterns of association can offer new insights into association studies for which traditional analytic tools have had limited success. MATERIALS AND METHODS To concurrently examine these phenomena, previous work has successfully considered the application of learning classifier systems (LCSs), a flexible class of evolutionary algorithms that distributes learned associations over a population of rules. Subsequent work dealt with the inherent problems of knowledge discovery and interpretation within these algorithms, allowing for the characterization of heterogeneous patterns of association. Whereas these previous advancements were evaluated using complex simulation studies, this study applied these collective works to a 'real-world' genetic epidemiology study of bladder cancer susceptibility. RESULTS AND DISCUSSION We replicated the identification of previously characterized factors that modify bladder cancer risk, namely single nucleotide polymorphisms from a DNA repair gene, and smoking. Furthermore, we identified potentially heterogeneous groups of subjects characterized by distinct patterns of association. Cox proportional hazards models comparing clinical outcome variables between the cases of the two largest groups yielded a significant, meaningful difference in survival time in years (survivorship). A marginally significant difference in recurrence time was also noted. These results support the hypothesis that an LCS approach can offer greater insight into complex patterns of association. CONCLUSIONS This methodology appears to be well suited to the dissection of disease heterogeneity, a key component in the advancement of personalized medicine.
Affiliation(s)
- Ryan John Urbanowicz
- Department of Genetics, Geisel School of Medicine, Dartmouth College, Lebanon, New Hampshire 03756, USA.
16
Setsirichok D, Tienboon P, Jaroonruang N, Kittichaijaroen S, Wongseree W, Piroonratana T, Usavanarong T, Limwongse C, Aporntewan C, Phadoongsidhi M, Chaiyaratana N. An omnibus permutation test on ensembles of two-locus analyses can detect pure epistasis and genetic heterogeneity in genome-wide association studies. SPRINGERPLUS 2013; 2:230. [PMID: 24804170 PMCID: PMC4006521 DOI: 10.1186/2193-1801-2-230] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/20/2012] [Accepted: 04/24/2013] [Indexed: 01/20/2023]
Abstract
This article examines the ability of an omnibus permutation test on ensembles of two-locus analyses (2LOmb) to detect pure epistasis in the presence of genetic heterogeneity. The performance of 2LOmb is evaluated in various simulation scenarios covering two independent causes of complex disease, each governed by a purely epistatic interaction. Different scenarios are set up by varying the number of available single nucleotide polymorphisms (SNPs) in the data, the number of causative SNPs, and the ratio of case samples from the two affected groups. The simulation results indicate that 2LOmb outperforms multifactor dimensionality reduction (MDR) and random forest (RF) techniques, yielding fewer output SNPs and more correctly identified causative SNPs. Moreover, 2LOmb is capable of identifying the number of independent interactions in tractable computational time and can be used in genome-wide association studies. 2LOmb is subsequently applied to a type 1 diabetes mellitus (T1D) data set collected from a UK population by the Wellcome Trust Case Control Consortium (WTCCC). After screening for SNPs that are located within or near genes and exhibit no marginal single-locus effects, the T1D data set is reduced to 95,991 SNPs from 12,146 genes. The 2LOmb search in the reduced T1D data set reveals that 12 SNPs, which can be divided into two independent sets, are associated with the disease. The first SNP set consists of three SNPs from MUC21 (mucin 21, cell surface associated), three SNPs from MUC22 (mucin 22), two SNPs from PSORS1C1 (psoriasis susceptibility 1 candidate 1) and one SNP from TCF19 (transcription factor 19). A four-locus interaction between these four genes is also detected. The second SNP set consists of three SNPs from ATAD1 (ATPase family, AAA domain containing 1).
Overall, the findings indicate the detection of pure epistasis in the presence of genetic heterogeneity and provide an alternative explanation for the aetiology of T1D in the UK population.
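2LOmb's exact procedure, which combines an ensemble of two-locus analyses into one omnibus permutation test, is not reproduced here. As a hedged illustration of the permutation idea it builds on, the sketch below computes a Pearson chi-square statistic for one SNP pair against case status and estimates its p-value by repeatedly shuffling the case/control labels. The 0/1/2 genotype coding, the number of permutations, and the function names are assumptions for illustration.

```python
import random
from collections import Counter


def two_locus_chi2(geno_a, geno_b, status):
    """Pearson chi-square for the (two-locus genotype cell) x (case/control) table."""
    counts = Counter(zip(zip(geno_a, geno_b), status))
    cells = set(zip(geno_a, geno_b))
    n = len(status)
    n_case = sum(status)
    totals = {1: n_case, 0: n - n_case}
    chi2 = 0.0
    for cell in cells:
        row = counts[(cell, 1)] + counts[(cell, 0)]
        for y, col in totals.items():
            expected = row * col / n
            if expected > 0:
                chi2 += (counts[(cell, y)] - expected) ** 2 / expected
    return chi2


def permutation_p_value(geno_a, geno_b, status, n_perm=999, seed=1):
    """Fraction of label shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = two_locus_chi2(geno_a, geno_b, status)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if two_locus_chi2(geno_a, geno_b, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps the estimate above zero


# Toy pure-epistasis data: cases sit in cells (0,1) and (1,0), controls in (0,0)
# and (1,1), so neither SNP shows a marginal effect on its own.
geno_a = [0] * 50 + [1] * 50 + [0] * 50 + [1] * 50
geno_b = [1] * 50 + [0] * 50 + [0] * 50 + [1] * 50
status = [1] * 100 + [0] * 100
print(permutation_p_value(geno_a, geno_b, status, n_perm=199))
```

Shuffling labels preserves the genotype structure and the number of cases while breaking any genotype-status link, which is what makes the shuffled statistics a valid null reference for this single pair.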
Affiliation(s)
- Damrongrit Setsirichok
- Department of Electrical and Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Pracharat Sai 1 Road, Bangsue, Bangkok 10800, Thailand
- Phuwadej Tienboon
- Department of Electrical and Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Pracharat Sai 1 Road, Bangsue, Bangkok 10800, Thailand
- Nattapong Jaroonruang
- Department of Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Pracha-utid Road, Bangmod, Toongkru, Bangkok 10140, Thailand
- Somkit Kittichaijaroen
- Department of Electrical and Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Pracharat Sai 1 Road, Bangsue, Bangkok 10800, Thailand
- Waranyu Wongseree
- Division of Technology of Information System Management, Faculty of Engineering, Mahidol University, 25/25 Phuttamonthon 4 Road, Nakhon Pathom 73170, Salaya, Thailand
- Theera Piroonratana
- Department of Electrical and Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Pracharat Sai 1 Road, Bangsue, Bangkok 10800, Thailand
- Touchpong Usavanarong
- Department of Electrical and Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Pracharat Sai 1 Road, Bangsue, Bangkok 10800, Thailand
- Chanin Limwongse
- Division of Molecular Genetics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Prannok Road, Bangkok 10700, Bangkoknoi, Thailand
- Chatchawit Aporntewan
- Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, 254 Phayathai Road, Pathumwan, Bangkok 10330, Thailand
- Marong Phadoongsidhi
- Department of Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Pracha-utid Road, Bangmod, Toongkru, Bangkok 10140, Thailand
- Nachol Chaiyaratana
- Department of Electrical and Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Pracharat Sai 1 Road, Bangsue, Bangkok 10800, Thailand ; Division of Molecular Genetics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Prannok Road, Bangkok 10700, Bangkoknoi, Thailand
17
Beretta L, Santaniello A. Extension of the survival dimensionality reduction algorithm to detect epistasis in competing risks models (SDR-CR). J Biomed Inform 2012; 46:174-80. [PMID: 23153648 DOI: 10.1016/j.jbi.2012.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2011] [Revised: 09/19/2012] [Accepted: 11/04/2012] [Indexed: 11/17/2022]
Abstract
BACKGROUND The discovery and description of the genetic background of common human diseases is hampered by their complexity and dynamic behavior. Appropriate bioinformatic tools are needed to account for all the facets of complex diseases, and to this end we recently described the survival dimensionality reduction (SDR) algorithm in an effort to model gene-gene interactions in the context of survival analysis. When one event precludes the occurrence of another event under investigation, as in the 'competing risk model', survival algorithms require particular adjustment to avoid reporting wrong or biased conclusions. METHODS The SDR algorithm was modified to incorporate the cumulative incidence function as well as an adapted version of the Brier score for mutually exclusive outcomes, to better search for epistatic models in the competing risk setting. The applicability of the new SDR algorithm (SDR-CR) was evaluated using synthetic lifetime epistatic datasets with competing risks and on a dataset of scleroderma patients. RESULTS/CONCLUSIONS The SDR-CR algorithm retains satisfactory power to detect the causative variants in simulated datasets under different scenarios of sample size and degrees of type I or type II censoring. In the real-world dataset, SDR-CR was capable of detecting a significant interaction between the IL-1α C-889T and the IL-1β C-511T single-nucleotide polymorphisms to predict the occurrence of restrictive lung disease vs. isolated pulmonary hypertension. We provide a useful extension of the SDR algorithm to analyze epistatic interactions in competing risk settings that may help unveil the genetic background of complex human diseases. AVAILABILITY http://sourceforge.net/projects/sdrproject/files/.
Affiliation(s)
- Lorenzo Beretta
- Referral Center for Systemic Autoimmune Diseases, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy.
18
Gory JJ, Sweeney HC, Reif DM, Motsinger-Reif AA. A comparison of internal model validation methods for multifactor dimensionality reduction in the case of genetic heterogeneity. BMC Res Notes 2012; 5:623. [PMID: 23126544 PMCID: PMC3599301 DOI: 10.1186/1756-0500-5-623] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/07/2012] [Accepted: 10/29/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Determining the genes responsible for certain human traits can be challenging when the underlying genetic model takes a complicated form such as heterogeneity (in which different genetic models can result in the same trait) or epistasis (in which genes interact with other genes and the environment). Multifactor Dimensionality Reduction (MDR) is a widely used method that effectively detects epistasis; however, it does not perform well in the presence of heterogeneity partly due to its reliance on cross-validation for internal model validation. Cross-validation allows for only one "best" model and is therefore inadequate when more than one model could cause the same trait. We hypothesize that another internal model validation method known as a three-way split will be better at detecting heterogeneity models. RESULTS In this study, we test this hypothesis by performing a simulation study to compare the performance of MDR to detect models of heterogeneity with the two different internal model validation techniques. We simulated a range of disease models with both main effects and gene-gene interactions with a range of effect sizes. We assessed the performance of each method using a range of definitions of power. CONCLUSIONS Overall, the power of MDR to detect heterogeneity models was relatively poor, especially under more conservative (strict) definitions of power. While the overall power was low, our results show that the cross-validation approach greatly outperformed the three-way split approach in detecting heterogeneity. This would motivate using cross-validation with MDR in studies where heterogeneity might be present. These results also emphasize the challenge of detecting heterogeneity models and the need for further methods development.
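The two internal validation schemes compared in this study differ in how samples are partitioned. The sketch below shows only that index bookkeeping: k-fold cross-validation rotates every sample through a held-out fold, while a three-way split partitions the data once into three disjoint sets (here assumed to serve model building, model selection, and final validation). The equal-thirds proportions, the fold count, and the function names are assumptions for illustration, not the settings used by Gory et al.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists; each sample is held out exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def three_way_split(n, fractions=(1 / 3, 1 / 3), seed=0):
    """Partition indices once into (build, select, validate) sets.

    fractions gives the shares of the first two sets; the rest goes to the third.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_build = int(n * fractions[0])
    n_select = int(n * fractions[1])
    return (idx[:n_build],
            idx[n_build:n_build + n_select],
            idx[n_build + n_select:])


for train, test in k_fold_indices(10, k=5):
    print(len(train), len(test))
build, select, validate = three_way_split(100)
print(len(build), len(select), len(validate))
```

Cross-validation reuses every sample for both training and testing across folds, whereas the three-way split gives each stage its own data at the cost of smaller sets per stage.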
Affiliation(s)
- Jeffrey J Gory
- Bioinformatics Research Center, Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
19
Urbanowicz RJ, Kiralis J, Fisher JM, Moore JH. Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection. BioData Min 2012; 5:15. [PMID: 23014095 PMCID: PMC3549792 DOI: 10.1186/1756-0381-5-15] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 04/24/2012] [Accepted: 09/14/2012] [Indexed: 11/30/2022]
Abstract
Background Algorithms designed to detect complex genetic disease associations are initially evaluated using simulated datasets. Typical evaluations vary constraints that influence the correct detection of underlying models (i.e. number of loci, heritability, and minor allele frequency). Such studies neglect to account for model architecture (i.e. the unique specification and arrangement of penetrance values comprising the genetic model), which alone can influence the detectability of a model. In order to design a simulation study which efficiently takes architecture into account, a reliable metric is needed for model selection. Results We evaluate three metrics as predictors of relative model detection difficulty derived from previous works: (1) Penetrance table variance (PTV), (2) customized odds ratio (COR), and (3) our own Ease of Detection Measure (EDM), calculated from the penetrance values and respective genotype frequencies of each simulated genetic model. We evaluate the reliability of these metrics across three very different data search algorithms, each with the capacity to detect epistatic interactions. We find that a model’s EDM and COR are each stronger predictors of model detection success than heritability. Conclusions This study formally identifies and evaluates metrics which quantify model detection difficulty. We utilize these metrics to intelligently select models from a population of potential architectures. This allows for an improved simulation study design which accounts for differences in detection difficulty attributed to model architecture. We implement the calculation and utilization of EDM and COR into GAMETES, an algorithm which rapidly and precisely generates pure, strict, n-locus epistatic models.
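The metrics discussed here are computed from penetrance values and genotype frequencies. The EDM and COR formulas are not reproduced below; instead, as a hedged sketch of the inputs they share, the code derives genotype frequencies under Hardy-Weinberg equilibrium from each locus's MAF, then the prevalence K = Σ f·P and a standard heritability expression h² = Σ f·(P − K)² / (K(1 − K)) for a two-locus 3×3 penetrance table. Treating this h² as the heritability the abstract compares against is an assumption. The code also checks the "pure, strict" property, i.e. that every single-locus marginal penetrance equals K.

```python
def hwe_genotype_freqs(maf):
    """Genotype frequencies (AA, Aa, aa) under Hardy-Weinberg equilibrium."""
    p = 1.0 - maf
    return [p * p, 2.0 * p * maf, maf * maf]


def prevalence_and_heritability(pen, maf_a, maf_b):
    """pen[i][j]: disease probability for genotype i at locus A and j at locus B."""
    fa, fb = hwe_genotype_freqs(maf_a), hwe_genotype_freqs(maf_b)
    cells = [(fa[i] * fb[j], pen[i][j]) for i in range(3) for j in range(3)]
    k = sum(f * p for f, p in cells)  # population prevalence
    h2 = sum(f * (p - k) ** 2 for f, p in cells) / (k * (1.0 - k))
    return k, h2


def marginal_penetrances(pen, maf_a, maf_b):
    """Single-locus penetrances, averaging over the other locus's genotypes."""
    fa, fb = hwe_genotype_freqs(maf_a), hwe_genotype_freqs(maf_b)
    locus_a = [sum(fb[j] * pen[i][j] for j in range(3)) for i in range(3)]
    locus_b = [sum(fa[i] * pen[i][j] for i in range(3)) for j in range(3)]
    return locus_a, locus_b


# Hypothetical checkerboard model with MAF 0.5 at both loci.
pen = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
k, h2 = prevalence_and_heritability(pen, 0.5, 0.5)
print(k, h2)
print(marginal_penetrances(pen, 0.5, 0.5))
```

Because every penetrance in this toy model is 0 or 1, disease status is fully determined by the two-locus genotype (h² = 1) even though each locus alone carries no information; simulation studies would use far less extreme penetrance values.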
Affiliation(s)
- Ryan J Urbanowicz
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Dartmouth Medical School, Lebanon, NH, USA.
20
Oki NO, Motsinger-Reif AA. Multifactor dimensionality reduction as a filter-based approach for genome wide association studies. Front Genet 2011; 2:80. [PMID: 22303374 PMCID: PMC3268633 DOI: 10.3389/fgene.2011.00080] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 06/26/2011] [Accepted: 10/26/2011] [Indexed: 11/13/2022]
Abstract
Advances in genotyping technology and the multitude of genetic data now available provide a vast amount of data that is proving useful in the quest for a better understanding of human genetic diseases through the study of genetic variation. This has led to the development of approaches such as genome wide association studies (GWAS), designed specifically for interrogating variants across the genome for association with disease, typically by testing single-locus, univariate associations. More recently it has been accepted that epistatic (interaction) effects may also be great contributors to these genetic effects, and GWAS methods are now being applied to find epistatic effects. The challenge for these methods still remains in prioritization and interpretation of results, as it has also become standard for initial findings to be independently investigated in replication cohorts or functional studies. This is motivating the development and implementation of filter-based approaches to prioritize variants found to be significant in a discovery stage for follow-up in replication. Such filters must be able to detect both univariate and interactive effects. In the current study we present and evaluate the use of multifactor dimensionality reduction (MDR) as such a filter, with simulated data and a wide range of effect sizes. Additionally, we compare the performance of the MDR filter to a similar filter approach using logistic regression (LR), the more traditional approach used in GWAS analysis, as well as to evaporative cooling (EC), another prominent machine learning filtering method. The results of our simulation study show that MDR is an effective method for such prioritization, and that it can detect main effects and interactions with or without marginal effects. Importantly, it performed as well as EC and LR for main effect models. It also significantly outperformed LR for various two-locus epistatic models, while performing equivalently to EC for the epistatic models. The results of this study demonstrate the potential of MDR as a filter to detect gene-gene interactions in GWAS.
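The core MDR step that such a filter relies on can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: genotypes are assumed to be coded 0/1/2 (minor-allele counts), status is 1 for cases and 0 for controls, each observed two-locus genotype cell is labeled high-risk when its case:control ratio exceeds the overall ratio, and each pair is scored by balanced accuracy. The cross-validation MDR normally uses is omitted, and the names `mdr_pair_score` and `mdr_filter` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mdr_pair_score(snp_a, snp_b, status):
    """Balanced accuracy of the MDR high/low-risk rule for one SNP pair.

    Assumes genotypes coded 0/1/2 and status coded 1 (case) / 0 (control),
    with at least one case and one control. No cross-validation is done.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(snp_a, snp_b, status):
        (cases if y == 1 else controls)[(a, b)] += 1
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())
    threshold = n_case / n_ctrl  # overall case:control ratio
    # Pool the observed genotype cells into one binary "high-risk" attribute.
    high_risk = {cell for cell in set(cases) | set(controls)
                 if cases[cell] > threshold * controls[cell]}
    true_pos = sum(cases[c] for c in high_risk)
    true_neg = sum(n for c, n in controls.items() if c not in high_risk)
    return 0.5 * (true_pos / n_case + true_neg / n_ctrl)


def mdr_filter(genotypes, status, top_k):
    """Score every SNP pair and keep the top_k as (score, i, j) tuples."""
    scored = [(mdr_pair_score(genotypes[i], genotypes[j], status), i, j)
              for i, j in combinations(range(len(genotypes)), 2)]
    scored.sort(reverse=True)
    return scored[:top_k]
```

Pairs that pass such a filter would then move on to the replication stage. Because the score depends only on the pooled high/low-risk attribute, a pair can rank highly whether or not either SNP has a marginal effect on its own.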
Affiliation(s)
- Noffisat O. Oki
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Alison A. Motsinger-Reif
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
21
Butler MW, Burt A, Edwards TL, Zuchner S, Scott WK, Martin ER, Vance JM, Wang L. Vitamin D receptor gene as a candidate gene for Parkinson disease. Ann Hum Genet 2011; 75:201-10. [PMID: 21309754 DOI: 10.1111/j.1469-1809.2010.00631.x] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Indexed: 01/01/2023]
Abstract
Vitamin D and vitamin D receptor (VDR) have been postulated as environmental and genetic factors in neurodegenerative disorders including multiple sclerosis (MS), Alzheimer disease (AD), and recently Parkinson disease (PD). Given the sparse data on PD, we conducted a two-stage study to evaluate the genetic effects of VDR in PD. In the discovery stage, 30 tagSNPs in VDR were tested for association with risk as a discrete trait and age-at-onset (AAO) as a quantitative trait in 770 Caucasian PD families. In the validation stage, 18 VDR SNPs were tested in an independent Caucasian cohort (267 cases and 267 controls) constructed from a genome-wide association study (GWAS). In the discovery dataset, SNPs in the 5' end of VDR were associated with both risk and AAO, with more significant evidence of association with AAO (P = 0.0008-0.02). These 5' SNPs were also associated with AD in another study. In the validation dataset, SNPs in the 3' end of VDR were associated with AAO (P = 0.003) but not risk. The 3' end SNP has been associated with both MS and AD in previous studies. Our findings suggest VDR as a potential susceptibility gene and support an essential role of vitamin D in PD.
Affiliation(s)
- Megan W Butler
- Department of Pediatrics, Duke University Medical Center, Duke University School of Medicine, Durham, NC, USA
22
Survival dimensionality reduction (SDR): development and clinical application of an innovative approach to detect epistasis in presence of right-censored data. BMC Bioinformatics 2010; 11:416. [PMID: 20691091 PMCID: PMC2928804 DOI: 10.1186/1471-2105-11-416] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/30/2010] [Accepted: 08/06/2010] [Indexed: 01/26/2023]
Abstract
BACKGROUND Epistasis is recognized as a fundamental part of the genetic architecture of individuals. Several computational approaches have been developed to model gene-gene interactions in case-control studies; however, none of them is suitable for time-dependent analysis. Herein we introduce the Survival Dimensionality Reduction (SDR) algorithm, a non-parametric method specifically designed to detect epistasis in lifetime datasets. RESULTS The algorithm requires no specification of either the underlying survival distribution or the underlying interaction model, and proved satisfactorily powerful in detecting a set of causative genes in synthetic epistatic lifetime datasets with a limited number of samples and a high degree of right-censoring (up to 70%). The SDR method was then applied to a series of 386 Dutch patients with active rheumatoid arthritis who were treated with anti-TNF biological agents. Among a set of 39 candidate genes, none of which showed a detectable marginal effect on anti-TNF responses, the SDR algorithm found that the rs1801274 SNP in the Fc gamma RIIa gene and the rs10954213 SNP in the IRF5 gene interact non-linearly to predict clinical remission after anti-TNF biologicals. CONCLUSIONS Simulation studies and application in a real-world setting support the capability of the SDR algorithm to model epistatic interactions in candidate-gene studies in the presence of right-censored data. AVAILABILITY http://sourceforge.net/projects/sdrproject/.
23
Thomas D. Methods for investigating gene-environment interactions in candidate pathway and genome-wide association studies. Annu Rev Public Health 2010; 31:21-36. [PMID: 20070199 DOI: 10.1146/annurev.publhealth.012809.103619] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.4] [Indexed: 01/11/2023]
Abstract
Despite the considerable enthusiasm about the yield of novel and replicated discoveries of genetic associations from the new generation of genome-wide association studies (GWAS), the proportion of the heritability of most complex diseases that have been studied to date remains small. Some of this "dark matter" could be due to gene-environment (G x E) interactions or more complex pathways involving multiple genes and exposures. We review the basic epidemiologic study design and statistical analysis approaches to studying G x E interactions individually and then consider more comprehensive approaches to studying entire pathways or GWAS data. In addition to the usual issues in genetic association studies, particular care is needed in exposure assessment, and very large sample sizes are required. Although hypothesis-driven, pathway-based and agnostic GWA study approaches are generally viewed as opposite poles, we suggest that the two can be usefully married using hierarchical modeling strategies that exploit external pathway knowledge in mining genome-wide data.
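As a worked illustration of a basic quantity behind the study designs reviewed above, the multiplicative G x E interaction for a binary genotype G and a binary exposure E can be read off a case-control table as a ratio of odds ratios, ROR = OR_GE / (OR_G * OR_E), with each odds ratio taken relative to unexposed non-carriers; in a saturated logistic model with G, E and a G x E product term, this equals the exponentiated product-term coefficient. The sketch assumes counts are supplied as (cases, controls) per (g, e) cell with no zero cells, and `interaction_odds_ratio` is a hypothetical helper rather than a method taken from the review.

```python
def interaction_odds_ratio(counts):
    """Ratio of odds ratios for a binary gene (g) x binary exposure (e) interaction.

    counts maps (g, e) -> (cases, controls); all four cells must be nonzero.
    A value of 1.0 means the joint effect is exactly multiplicative.
    """
    def odds(g, e):
        cases, controls = counts[(g, e)]
        return cases / controls

    reference = odds(0, 0)            # non-carriers, unexposed
    or_g = odds(1, 0) / reference     # genotype effect without exposure
    or_e = odds(0, 1) / reference     # exposure effect in non-carriers
    or_ge = odds(1, 1) / reference    # joint effect
    return or_ge / (or_g * or_e)


# Hypothetical counts: OR_G = 2, OR_E = 3, joint OR = 12, so ROR = 12 / (2 * 3).
table = {(0, 0): (100, 100), (1, 0): (200, 100),
         (0, 1): (300, 100), (1, 1): (1200, 100)}
print(interaction_odds_ratio(table))  # 2.0
```

A ratio above 1 indicates a joint effect larger than the product of the separate effects; the very large sample sizes the review mentions are needed because this ratio is estimated far less precisely than either main effect.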
Affiliation(s)
- Duncan Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, California, 90089-9011, USA.
24
A screening methodology based on Random Forests to improve the detection of gene-gene interactions. Eur J Hum Genet 2010; 18:1127-32. [PMID: 20461113 DOI: 10.1038/ejhg.2010.48] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Indexed: 11/08/2022]
Abstract
The search for susceptibility loci in gene-gene interactions imposes a methodological and computational challenge for statisticians because of the large dimensionality inherent to the modelling of gene-gene interactions or epistasis. In an era in which genome-wide scans have become relatively common, new powerful methods are required to handle the huge number of feasible gene-gene interactions and to weed out false positives and negatives from these results. One solution to the dimensionality problem is to reduce data by preliminary screening of markers to select the best candidates for further analysis. Ideally, this screening step is statistically independent of the testing phase. Initially developed for small numbers of markers, the Multifactor Dimensionality Reduction (MDR) method is a nonparametric, model-free data reduction technique that associates sets of markers having optimal predictive properties with disease. In this study, we examine the power of MDR in larger data sets and compare it with other approaches that are able to identify gene-gene interactions. Under various interaction models (purely and not purely epistatic), we use a Random Forest (RF)-based prescreening method, before executing MDR, to improve its performance. We find that the power of MDR increases when noisy SNPs are first removed, by creating a collection of candidate markers with RFs. We validate our technique by extensive simulation studies and by application to asthma data from the European Committee of Respiratory Health Study II.
25
Edwards TL, Turner SD, Torstenson ES, Dudek SM, Martin ER, Ritchie MD. A general framework for formal tests of interaction after exhaustive search methods with applications to MDR and MDR-PDT. PLoS One 2010; 5:e9363. [PMID: 20186329 PMCID: PMC2826406 DOI: 10.1371/journal.pone.0009363] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 06/15/2009] [Accepted: 12/16/2009] [Indexed: 02/03/2023]
Abstract
The initial presentation of multifactor dimensionality reduction (MDR) featured cross-validation to mitigate over-fitting, computationally efficient searches of the epistatic model space, and variable construction with constructive induction to alleviate the curse of dimensionality. However, the method was unable to differentiate association signals arising from true interactions from those due to independent main effects at individual loci. This issue leads to problems in inference and interpretability for the results from MDR and its family-based complement, the MDR-pedigree disequilibrium test (MDR-PDT). A suggestion from previous work was to fit regression models post hoc to specifically evaluate the null hypothesis of no interaction for MDR or MDR-PDT models. We demonstrate with simulation that fitting a regression model on the same data as that analyzed by MDR or MDR-PDT is not a valid test of interaction. This is likely to be true for any other procedure that searches for models and then performs an uncorrected test for interaction. We also show with simulation that when strong main effects are present and the null hypothesis of no interaction is true, MDR and MDR-PDT reject at far greater than the nominal rate. We also provide a valid regression-based permutation test procedure that specifically tests the null hypothesis of no interaction and does not reject the null when only main effects are present. The regression-based permutation test implemented here conducts a valid test of interaction after a search for multilocus models, and can be applied to any method that conducts a search to find a multilocus model representing an interaction.
Affiliation(s)
- Todd L. Edwards
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Center for Genetic Epidemiology and Statistical Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, United States of America
- Stephen D. Turner
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Eric S. Torstenson
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Scott M. Dudek
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Eden R. Martin
- Center for Genetic Epidemiology and Statistical Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, United States of America
- Marylyn D. Ritchie
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
26
Ritchie MD, Bush WS. Genome simulation approaches for synthesizing in silico datasets for human genomics. Adv Genet 2010; 72:1-24. [PMID: 21029846 DOI: 10.1016/b978-0-12-380862-2.00001-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 12/22/2022]
Abstract
Simulated data is a necessary first step in the evaluation of new analytic methods because in simulated data the true effects are known. To successfully develop novel statistical and computational methods for genetic analysis, it is vital to simulate datasets consisting of single nucleotide polymorphisms (SNPs) spread throughout the genome at a density similar to that observed by new high-throughput molecular genomics studies. In addition, the simulation of environmental data and effects will be essential to properly formulate risk models for complex disorders. Data simulations are often criticized because they are much less noisy than natural biological data, as it is nearly impossible to simulate the multitude of possible sources of natural and experimental variability. However, simulating data in silico is the most straightforward way to test the true potential of new methods during development. Thus, advances that increase the complexity of data simulations will permit investigators to better assess new analytical methods. In this work, we will briefly describe some of the current approaches for the simulation of human genomics data describing the advantages and disadvantages of the various approaches. We will also include details on software packages available for data simulation. Finally, we will expand upon one particular approach for the creation of complex, human genomic datasets that uses a forward-time population simulation algorithm: genomeSIMLA. Many of the hallmark features of biological datasets can be synthesized in silico; still much research is needed to enhance our capabilities to create datasets that capture the natural complexity of biological datasets.
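For contrast with the forward-time approach the chapter expands on, the most basic kind of in silico dataset can be sketched directly: genotypes at two unlinked SNPs drawn under Hardy-Weinberg equilibrium from assumed minor allele frequencies, and disease status drawn from an assumed 3 x 3 penetrance table. This is a hypothetical minimal example, not genomeSIMLA: it models no linkage disequilibrium or population history, and it samples a population rather than ascertaining cases and controls. The function name and penetrance values are illustrative only.

```python
import random


def simulate_two_locus(n, maf_a, maf_b, penetrance, seed=0):
    """Simulate n unrelated individuals as (genotype_a, genotype_b, status) tuples.

    Genotypes count minor alleles (0/1/2) drawn independently under
    Hardy-Weinberg equilibrium; penetrance[a][b] is P(affected | a, b).
    """
    rng = random.Random(seed)

    def draw_genotype(maf):
        # Two independent allele draws, each a minor allele with probability maf.
        return int(rng.random() < maf) + int(rng.random() < maf)

    individuals = []
    for _ in range(n):
        a, b = draw_genotype(maf_a), draw_genotype(maf_b)
        status = 1 if rng.random() < penetrance[a][b] else 0
        individuals.append((a, b, status))
    return individuals


# Hypothetical penetrance table: risk rises only when both SNPs carry a minor allele.
PENETRANCE = [[0.05, 0.05, 0.05],
              [0.05, 0.30, 0.30],
              [0.05, 0.30, 0.30]]
sample = simulate_two_locus(1000, maf_a=0.3, maf_b=0.2, penetrance=PENETRANCE, seed=1)
```

Because the true causal pair and its penetrance model are fixed by construction, a method's output on `sample` can be checked directly against the known answer, which is the property the chapter identifies as the main value of simulated data.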
Affiliation(s)
- Marylyn D Ritchie
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, USA
27
Wongseree W, Assawamakin A, Piroonratana T, Sinsomros S, Limwongse C, Chaiyaratana N. Detecting purely epistatic multi-locus interactions by an omnibus permutation test on ensembles of two-locus analyses. BMC Bioinformatics 2009; 10:294. [PMID: 19761607 PMCID: PMC2759961 DOI: 10.1186/1471-2105-10-294] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 03/05/2009] [Accepted: 09/17/2009] [Indexed: 01/17/2023]
Abstract
BACKGROUND Purely epistatic multi-locus interactions cannot generally be detected via single-locus analysis in case-control studies of complex diseases. Recently, many two-locus and multi-locus analysis techniques have been shown to be promising for epistasis detection. However, exhaustive multi-locus analysis requires prohibitively large computational effort when problems involve large-scale or genome-wide data. Furthermore, there is no explicit proof that a combination of multiple two-locus analyses can lead to the correct identification of multi-locus interactions. RESULTS The proposed 2LOmb algorithm performs an omnibus permutation test on ensembles of two-locus analyses. The algorithm consists of four main steps: two-locus analysis, a permutation test, global p-value determination and a progressive search for the best ensemble. 2LOmb is benchmarked against an exhaustive two-locus analysis technique, a set association approach, a correlation-based feature selection (CFS) technique and a tuned ReliefF (TuRF) technique. The simulation results indicate that 2LOmb produces a low false-positive error. Moreover, 2LOmb has the best performance in terms of its ability to identify all causative single nucleotide polymorphisms (SNPs) and a low number of output SNPs in purely epistatic two-, three- and four-locus interaction problems. The interaction models constructed from the 2LOmb outputs via a multifactor dimensionality reduction (MDR) method are also included for the confirmation of epistasis detection. 2LOmb is subsequently applied to a type 2 diabetes mellitus (T2D) data set, which is obtained as a part of the UK genome-wide genetic epidemiology study by the Wellcome Trust Case Control Consortium (WTCCC). After primary screening for SNPs that are located within or near 372 candidate genes and exhibit no marginal single-locus effects, the T2D data set is reduced to 7,065 SNPs from 370 genes. The 2LOmb search in the reduced T2D data reveals that four intronic SNPs in PGM1 (phosphoglucomutase 1), two intronic SNPs in LMX1A (LIM homeobox transcription factor 1, alpha), two intronic SNPs in PARK2 (Parkinson disease (autosomal recessive, juvenile) 2, parkin) and three intronic SNPs in GYS2 (glycogen synthase 2 (liver)) are associated with the disease. The 2LOmb result suggests that there is no interaction between any pair of the identified genes that can be described by purely epistatic two-locus interaction models. Moreover, there are no interactions between these four genes that can be described by purely epistatic multi-locus interaction models with marginal two-locus effects. The findings provide an alternative explanation for the aetiology of T2D in a UK population. CONCLUSION An omnibus permutation test on ensembles of two-locus analyses can detect purely epistatic multi-locus interactions with marginal two-locus effects. The study also reveals that SNPs from large-scale or genome-wide case-control data that are discarded after single-locus analysis detects no association can still be useful for genetic epidemiology studies.
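The first two 2LOmb steps, a two-locus analysis followed by a permutation test, can be sketched for a single SNP pair. This is an illustration under assumptions, not the published algorithm: the two-locus statistic is assumed here to be a Pearson chi-square on the 2 x 9 table of case/control status by joint genotype, the permutation shuffles case/control labels, and the global p-value determination and progressive ensemble search are omitted. Function names are hypothetical.

```python
import random
from collections import Counter


def two_locus_chi2(snp_a, snp_b, status):
    """Pearson chi-square for case/control status by joint two-locus genotype.

    Only observed genotype cells are used; status is 1 (case) or 0 (control),
    and both groups must be present.
    """
    n = len(status)
    n_case = sum(status)
    totals = Counter(zip(snp_a, snp_b))
    case_counts = Counter(cell for cell, y in zip(zip(snp_a, snp_b), status) if y == 1)
    chi2 = 0.0
    for cell, total in totals.items():
        observed = (case_counts[cell], total - case_counts[cell])
        group_sizes = (n_case, n - n_case)
        for obs, group in zip(observed, group_sizes):
            expected = total * group / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2


def permutation_p_value(snp_a, snp_b, status, n_perm=999, seed=0):
    """Monte Carlo p-value: share of label shuffles at least as extreme as observed."""
    rng = random.Random(seed)
    observed = two_locus_chi2(snp_a, snp_b, status)
    shuffled = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if two_locus_chi2(snp_a, snp_b, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Shuffling case/control labels breaks every genotype-phenotype link, so this p-value tests the pair's joint association rather than interaction as such; that is enough to flag purely epistatic pairs whose individual SNPs show no single-locus effect.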
Affiliation(s)
- Waranyu Wongseree
- Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand
- Anunchai Assawamakin
- Division of Molecular Genetics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Prannok Road, Bangkoknoi, Bangkok 10700, Thailand
- Theera Piroonratana
- Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand
- Saravudh Sinsomros
- Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand
- Chanin Limwongse
- Division of Molecular Genetics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Prannok Road, Bangkoknoi, Bangkok 10700, Thailand
- Nachol Chaiyaratana
- Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand
- Division of Molecular Genetics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Prannok Road, Bangkoknoi, Bangkok 10700, Thailand