1
Sun N, Chu J, He Q, Wang Y, Han Q, Yi N, Zhang R, Shen Y. BHAFT: Bayesian heredity-constrained accelerated failure time models for detecting gene-environment interactions in survival analysis. Stat Med 2024; 43:4013-4026. [PMID: 38963094 DOI: 10.1002/sim.10145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 05/06/2024] [Accepted: 06/06/2024] [Indexed: 07/05/2024]
Abstract
In addition to considering the main effects, understanding gene-environment (G × E) interactions is imperative for determining the etiology of diseases and the factors that affect their prognosis. In the existing statistical framework for censored survival outcomes, there are several challenges in detecting G × E interactions, such as handling high-dimensional omics data, diverse environmental factors, and algorithmic complications in survival analysis. The effect heredity principle has been widely used in studies involving interaction identification because it incorporates the dependence of the main and interaction effects. However, Bayesian survival models that incorporate this principle have not been developed. Therefore, we propose Bayesian heredity-constrained accelerated failure time (BHAFT) models for identifying main and interaction (M-I) effects with novel spike-and-slab or regularized horseshoe priors that incorporate the effect heredity principle. The R package rstan was used to fit the proposed models. Extensive simulations demonstrated that BHAFT models outperformed existing models in terms of signal identification, coefficient estimation, and prognosis prediction. Biologically plausible G × E interactions associated with the prognosis of lung adenocarcinoma were identified using our proposed model. Notably, BHAFT models incorporating the effect heredity principle could identify both main and interaction effects, which is highly useful in exploring G × E interactions in high-dimensional survival analysis. The code and data used in our paper are available at https://github.com/SunNa-bayesian/BHAFT.
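For orientation, a generic accelerated failure time model with main and G × E interaction effects can be written as below. This is a textbook-style sketch, not the paper's parameterization: the symbols (x for genes, e for environmental factors, β, α, γ for coefficients) are assumptions introduced here, and the abstract does not state whether the strong or the weak form of heredity is used.

```latex
% Generic AFT model with main and interaction effects (illustrative notation)
\log T_i = \beta_0
  + \sum_{j} x_{ij}\,\beta_j
  + \sum_{k} e_{ik}\,\alpha_k
  + \sum_{j,k} x_{ij}\, e_{ik}\,\gamma_{jk}
  + \sigma\,\varepsilon_i

% Effect heredity constraints on the interaction coefficients
\text{strong heredity: } \gamma_{jk} \neq 0 \;\Rightarrow\; \beta_j \neq 0 \ \text{and}\ \alpha_k \neq 0
\qquad
\text{weak heredity: } \gamma_{jk} \neq 0 \;\Rightarrow\; \beta_j \neq 0 \ \text{or}\ \alpha_k \neq 0
```

In a Bayesian formulation, such constraints are typically encoded by letting the prior scale or inclusion probability of each interaction coefficient depend on the corresponding main effects.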
Affiliation(s)
- Na Sun
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
- Jiadong Chu
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
- Qida He
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
- Yu Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
- Qiang Han
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
- Nengjun Yi
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Ruyang Zhang
  - Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, China
- Yueping Shen
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
2
Sun NA, Wang YU, Chu J, Han Q, Shen Y. Bayesian Approaches in Exploring Gene-environment and Gene-gene Interactions: A Comprehensive Review. Cancer Genomics Proteomics 2023; 20:669-678. [PMID: 38035701 PMCID: PMC10687732 DOI: 10.21873/cgp.20414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 12/02/2023] Open
Abstract
Rapid advancements in high-throughput biological techniques have facilitated the generation of high-dimensional omics datasets, which have provided a solid foundation for precision medicine and prognosis prediction. Nonetheless, the problem of missing heritability persists. To solve this problem, it is essential to explain the genetic structure of disease incidence risk and prognosis by incorporating interactions. The development of Bayesian theory has provided new approaches for developing models for interaction identification and estimation. Several Bayesian models have been developed to improve model accuracy and to identify main effects as well as gene-environment (G×E) and gene-gene (G×G) interactions. Studies based on single-nucleotide polymorphisms (SNPs) are significant for the exploration of rare and common variants. Models based on the effect heredity principle and group-based models are relatively flexible and do not require strict constraints when dealing with the hierarchical structure between main effects and interactions (M-I). These models offer good interpretability of biological mechanisms. Machine learning-based Bayesian approaches are highly competitive in improving prediction accuracy. These models provide insights into the mechanisms underlying the occurrence and progression of complex diseases, identify more reliable biomarkers, and achieve higher predictive accuracy. In this paper, we provide a comprehensive review of these Bayesian approaches.
Affiliation(s)
- N A Sun
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Y U Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Jiadong Chu
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Qiang Han
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Yueping Shen
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
3
Ni X, Zhou M, Wang H, He KY, Broeckel U, Hanis C, Kardia S, Redline S, Cooper RS, Tang H, Zhu X. Detecting fitness epistasis in recently admixed populations with genome-wide data. BMC Genomics 2020; 21:476. [PMID: 32652930 PMCID: PMC7353720 DOI: 10.1186/s12864-020-06874-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/12/2020] [Accepted: 06/30/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Fitness epistasis, the interaction effect of genes at different loci on fitness, makes an important contribution to adaptive evolution. Although fitness interaction evidence has been observed in model organisms, it is more difficult to detect and remains poorly understood in human populations as a result of limited statistical power and experimental constraints. Fitness epistasis is inferred from non-independence between unlinked loci. We previously observed ancestral block correlation between chromosomes 4 and 6 in African Americans. The same approach fails when examining ancestral blocks on the same chromosome due to the strong confounding effect observed in a recently admixed population.
RESULTS: We developed a novel approach to eliminate the bias caused by admixture linkage disequilibrium when searching for fitness epistasis on the same chromosome. We applied this approach in 16,252 unrelated African Americans and identified significant ancestral correlations in two pairs of genomic regions (P-value < 8.11 × 10⁻⁷) on chromosomes 1 and 10. The ancestral correlations were not explained by population admixture. Historical African-European crossover events are reduced between pairs of epistatic regions. We observed multiple pairs of co-expressed genes shared by the two regions on each chromosome, including ADAR being co-expressed with IFI44 in almost all tissues and DARC being co-expressed with VCAM1, S1PR1 and ELTD1 in multiple tissues in the Genotype-Tissue Expression (GTEx) data. Moreover, the co-expressed gene pairs are associated with the same diseases/traits in the GWAS Catalog, such as white blood cell count, blood pressure, lung function, inflammatory bowel disease and educational attainment.
CONCLUSIONS: Our analyses revealed two instances of fitness epistasis on chromosomes 1 and 10, and the findings suggest a potential approach to improving our understanding of adaptive evolution.
Affiliation(s)
- Xumin Ni
  - Department of Mathematics, School of Science, Beijing Jiaotong University, Beijing, 100044, China
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA
- Mengshi Zhou
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA
- Heming Wang
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Karen Y He
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA
- Uli Broeckel
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI, USA
- Craig Hanis
  - Department of Epidemiology, Human Genetics and Environmental Sciences, University of Texas Health Science Center at Houston, Houston, TX, USA
- Sharon Kardia
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
- Susan Redline
  - Division of Sleep and Circadian Disorders, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Richard S Cooper
  - Department of Public Health Science, Loyola University Medical Center, Maywood, IL, USA
- Hua Tang
  - Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Xiaofeng Zhu
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA
4
Bhattacharya D, Bhattacharya S. A Bayesian semiparametric approach to learning about gene–gene interactions in case-control studies. J Appl Stat 2018. [DOI: 10.1080/02664763.2018.1444741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Durba Bhattacharya
  - St. Xavier's College, Kolkata, India
  - Interdisciplinary Statistical Research Unit, Indian Statistical Institute, Kolkata, India
- Sourabh Bhattacharya
  - Interdisciplinary Statistical Research Unit, Indian Statistical Institute, Kolkata, India
5
Distinguishing real from fake ivory products by elemental analyses: A Bayesian hybrid classification method. Forensic Sci Int 2017; 272:142-149. [PMID: 28157639 DOI: 10.1016/j.forsciint.2017.01.016] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/04/2016] [Revised: 10/23/2016] [Accepted: 01/15/2017] [Indexed: 01/24/2023]
Abstract
As laws tighten to limit commercial ivory trading and protect threatened species like whales and elephants, sales of fake ivory products have become widespread. This study describes a method that uses handheld X-ray fluorescence (XRF), a noninvasive technique for elemental analysis, to differentiate quickly between ivory (Asian and African elephant, mammoth) and non-ivory (bones, teeth, antler, horn, wood, synthetic resin, rock) materials. An equation consisting of 20 elements and light elements from a stepwise discriminant analysis was used to classify samples, followed by Bayesian binary regression to determine the probability of a sample being 'ivory', with complementary log-log analysis to identify the best-fit model for this purpose. This Bayesian hybrid classification model was 93% accurate with 92% precision in discriminating ivory from non-ivory materials. The method was then validated by scanning additional ivory and non-ivory samples, correctly identifying bone as not ivory with >95% accuracy, except elephant bone, which was 72%. It was less accurate for wood and rock (25-85%); however, a preliminary screening to determine if samples are not Ca-dominant could eliminate inorganic materials. In conclusion, elemental analyses by XRF can be used to identify several forms of fake ivory samples, which could have forensic application.
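The complementary log-log link mentioned in the abstract maps a linear predictor to a probability asymmetrically, unlike the logit. A minimal sketch of the link and its inverse follows; the function names are illustrative, and how the study combined this link with its discriminant score is not stated in the abstract.

```python
from math import exp, log


def cloglog(p):
    """Complementary log-log link: maps a probability in (0, 1) to the real line."""
    return log(-log(1.0 - p))


def cloglog_inverse(eta):
    """Inverse link: maps a linear predictor eta to a probability in (0, 1)."""
    return 1.0 - exp(-exp(eta))


# A linear predictor of 0 does not correspond to probability 0.5,
# which is the asymmetry that distinguishes cloglog from the logit link.
print(cloglog_inverse(0.0))
```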
6
Ye L, Wang G, Tang Y, Bai J. A population-specific correlation between ADIPOQ rs2241766 and rs 1501299 and colorectal cancer risk: a meta-analysis for debate. Int J Clin Oncol 2016; 22:307-315. [PMID: 27704292 DOI: 10.1007/s10147-016-1044-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/31/2016] [Accepted: 09/24/2016] [Indexed: 01/31/2023]
Abstract
AIMS: Many epidemiological studies have investigated the correlation between adiponectin, C1Q and collagen domain containing (ADIPOQ) single nucleotide polymorphisms (SNPs) and risk of colorectal cancer (CRC). Conflicting results have been reported, with particular dispute regarding two SNPs (rs2241766 T/G and rs1501299 G/T). Therefore, we conducted a meta-analysis to systematically assess the associations and try to find the reasons for the dispute.
METHODS: We searched PubMed, the Cochrane Library, Elsevier, Wiley Online Library, China National Knowledge Infrastructure, WanFang data and Chongqing VIP to identify all eligible case-control studies published up to January 2015. Effect sizes of odds ratios (OR) and 95% confidence intervals (95% CI) were calculated using a fixed- or random-effect model.
RESULTS: Ten case-control studies including 4377 cases and 5584 controls were selected. A significant difference was observed in Chinese (OR 0.76; 95% CI 0.68, 0.85; P < 0.001) and Ashkenazi Jewish populations (OR 0.79; 95% CI 0.63, 0.99; P = 0.04) for rs2241766 with dominant model (TT vs TG + GG). A significant difference was observed in the Chinese population (OR 1.23; 95% CI 1.11, 1.37; P < 0.001) for rs1501299 with dominant model (TT vs TG + GG). In addition, red meat intake showed a synergistic effect with the ADIPOQ gene on CRC risk.
CONCLUSIONS: ADIPOQ SNPs rs2241766 T/G and rs1501299 G/T have a population-specific correlation with risk of CRC. However, small sample studies may increase reporting bias, particularly if the total number of studies included in the analysis is small.
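The fixed-effect pooling named in the methods is commonly done by inverse-variance weighting on the log odds ratio scale. The sketch below assumes that approach (the abstract does not say which estimator was used) and recovers each study's standard error from its reported 95% CI. The input studies are hypothetical, not taken from the meta-analysis.

```python
from math import exp, log, sqrt

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio, lower, upper):
    """Return the log odds ratio and its standard error from a reported 95% CI."""
    return log(odds_ratio), (log(upper) - log(lower)) / (2 * Z95)


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling of (OR, CI lower, CI upper) tuples."""
    weights, weighted_effects = [], []
    for odds_ratio, lower, upper in studies:
        theta, se = log_or_and_se(odds_ratio, lower, upper)
        weight = 1.0 / se**2          # precise studies get larger weights
        weights.append(weight)
        weighted_effects.append(weight * theta)
    pooled = sum(weighted_effects) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return (exp(pooled),
            exp(pooled - Z95 * pooled_se),
            exp(pooled + Z95 * pooled_se))


# Hypothetical inputs: two identical studies, each with OR 0.80 (95% CI 0.64, 1.00)
hypothetical_studies = [(0.80, 0.64, 1.00), (0.80, 0.64, 1.00)]
print(pool_fixed_effect(hypothetical_studies))
```

Pooling two identical studies leaves the point estimate unchanged and narrows the interval, which is the expected behaviour of a fixed-effect model. A random-effect model would add a between-study variance term to each weight.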
Affiliation(s)
- Lin Ye
  - Department of Gastrointestinal Surgery, Union Hospital, Tongji Medical College, HuaZhong University of Science and Technology, 1277 JieFang Avenue, Wuhan, 430022, China
- Guobin Wang
  - Department of Gastrointestinal Surgery, Union Hospital, Tongji Medical College, HuaZhong University of Science and Technology, 1277 JieFang Avenue, Wuhan, 430022, China
- Yong Tang
  - Department of Gastrointestinal Surgery, Union Hospital, Tongji Medical College, HuaZhong University of Science and Technology, 1277 JieFang Avenue, Wuhan, 430022, China
- Jie Bai
  - Department of Gastrointestinal Surgery, Union Hospital, Tongji Medical College, HuaZhong University of Science and Technology, 1277 JieFang Avenue, Wuhan, 430022, China
7
Mallick H, Tiwari HK. EM Adaptive LASSO-A Multilocus Modeling Strategy for Detecting SNPs Associated with Zero-inflated Count Phenotypes. Front Genet 2016; 7:32. [PMID: 27066062 PMCID: PMC4811966 DOI: 10.3389/fgene.2016.00032] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/22/2015] [Accepted: 02/22/2016] [Indexed: 11/13/2022] Open
Abstract
Count data are increasingly ubiquitous in genetic association studies, where it is possible to observe excess zero counts as compared to what is expected based on standard assumptions. For instance, in rheumatology, data are usually collected in multiple joints within a person or multiple sub-regions of a joint, and it is not uncommon that the phenotypes contain an enormous number of zeros due to the presence of excessive zero counts in the majority of patients. Most existing statistical methods assume that the count phenotypes follow one of these four distributions with appropriate dispersion-handling mechanisms: Poisson, Zero-inflated Poisson (ZIP), Negative Binomial, and Zero-inflated Negative Binomial (ZINB). However, little is known about their implications in genetic association studies. Also, there is a relative paucity of literature on their usefulness with respect to model misspecification and variable selection. In this article, we have investigated the performance of several state-of-the-art approaches for handling zero-inflated count data along with a novel penalized regression approach with an adaptive LASSO penalty, by simulating data under a variety of disease models and linkage disequilibrium patterns. By taking into account data-adaptive weights in the estimation procedure, the proposed method provides greater flexibility in multi-SNP modeling of zero-inflated count phenotypes. A fast coordinate descent algorithm nested within an EM (expectation-maximization) algorithm is implemented for estimating the model parameters and conducting variable selection simultaneously. Results show that the proposed method has optimal performance in the presence of multicollinearity, as measured by both prediction accuracy and empirical power, which is especially apparent as the sample size increases. Moreover, the Type I error rates become more or less uncontrollable for the competing methods when a model is misspecified, a phenomenon routinely encountered in practice.
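To make the zero-inflation idea concrete, the sketch below evaluates the standard zero-inflated Poisson (ZIP) probability mass function, in which a structural zero occurs with probability `pi` and an ordinary Poisson count with rate `lam` otherwise. The parameter names and values are illustrative only; the paper's penalized EM estimation is not reproduced here.

```python
from math import exp, factorial


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson with rate lam and zero-inflation pi."""
    poisson = exp(-lam) * lam**k / factorial(k)
    if k == 0:
        # A zero can be structural (probability pi) or an ordinary Poisson zero.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Illustrative values: the zero-inflated model puts more mass on zero
# than a plain Poisson with the same rate.
print(zip_pmf(0, lam=2.0, pi=0.3), zip_pmf(0, lam=2.0, pi=0.0))
```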
Affiliation(s)
- Himel Mallick
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA, USA
  - Program of Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Hemant K Tiwari
  - Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
8
Park J, Kim I, Jung KJ, Kim S, Jee SH, Yoon SK. Gene-gene interaction analysis identifies a new genetic risk factor for colorectal cancer. J Biomed Sci 2015; 22:73. [PMID: 26362652 PMCID: PMC4566297 DOI: 10.1186/s12929-015-0180-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 01/27/2015] [Accepted: 08/23/2015] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND: Adiponectin levels have been shown to be associated with colorectal cancer (CRC). Furthermore, a newly identified adiponectin receptor, T-cadherin, has been associated with plasma adiponectin levels. Therefore, we investigated the potential for a genetic association between T-cadherin and CRC risk.
RESULTS: We conducted a case-control study using the Korean Cancer Prevention study-II cohort, which is composed of 325 CRC patients and 977 normal individuals. Study results revealed that rs3865188 in the 5' flanking region of the T-cadherin gene (CDH13) was significantly associated with CRC (p = 0.0474). The odds ratio (OR) for the TT genotype as compared to the TA + AA genotype was 1.577 (p = 0.0144). In addition, the interaction between CDH13 and the adiponectin gene (APN) for CRC risk was investigated using a logistic regression analysis. Among six APN single nucleotide polymorphisms (rs182052, rs17366568, rs2241767, rs3821799, rs3774261, and rs6773957), an interaction with rs3865188 was found for four (rs2241767, rs3821799, rs3774261, and rs6773957). The group with combined genotypes of TT for rs3865188 and GG for rs3774261 displayed the highest risk for CRC development as compared to those with the other genotype combinations. The OR for the TT/GG genotype as compared to the AA/AA genotype was 4.108 (p = 0.004). Furthermore, the plasma adiponectin level showed a correlation with the gene-gene interaction, and the group with the highest risk for CRC had the lowest adiponectin level (median, 4.8 μg/mL for the TT/GG genotype vs. 7.835 μg/mL for the AA/AA genotype, p = 0.0017).
CONCLUSIONS: The present study identified a new genetic factor for CRC risk and an interaction between CDH13 and APN in CRC risk. These genetic factors may be useful for predicting CRC risk.
Affiliation(s)
- Jongkeun Park
  - Department of Medical Lifesciences, The Catholic University of Korea, 505 Banpo-dong, Seocho-gu, Seoul, 137-701, Republic of Korea
- Injung Kim
  - Department of Medical Lifesciences, The Catholic University of Korea, 505 Banpo-dong, Seocho-gu, Seoul, 137-701, Republic of Korea
- Keum Ji Jung
  - Department of Epidemiology and Health Promotion, Institute for Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, South Korea
- Soriul Kim
  - Department of Epidemiology and Health Promotion, Institute for Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, South Korea
- Sun Ha Jee
  - Department of Epidemiology and Health Promotion, Institute for Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, South Korea
- Sungjoo Kim Yoon
  - Department of Medical Lifesciences, The Catholic University of Korea, 505 Banpo-dong, Seocho-gu, Seoul, 137-701, Republic of Korea
9
Upton A, Trelles O, Cornejo-García JA, Perkins JR. Review: High-performance computing to detect epistasis in genome scale data sets. Brief Bioinform 2015; 17:368-79. [PMID: 26272945 DOI: 10.1093/bib/bbv058] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 03/31/2015] [Indexed: 11/14/2022] Open
Abstract
It is becoming clear that most human diseases have a complex etiology that cannot be explained by single nucleotide polymorphisms (SNPs) or simple additive combinations; the general consensus is that they are caused by combinations of multiple genetic variations. The limited success of some genome-wide association studies is partly a result of this focus on single genetic markers. A more promising approach is to take into account epistasis, by considering the association of multiple SNP interactions with disease. However, as genomic data continues to grow in resolution, and genome and exome sequencing become more established, the number of combinations of variants to consider increases rapidly. Two potential solutions should be considered: the use of high-performance computing, which allows us to consider a larger number of variables, and heuristics to make the solution more tractable, essential in the case of genome sequencing. In this review, we look at different computational methods to analyse epistatic interactions within disease-related genetic data sets created by microarray technology. We also review efforts to use epistatic analysis results to produce biomarkers for diagnostic tests and give our views on future directions in this field in light of advances in sequencing technology and variants in non-coding regions.
10
Three ADIPOR1 Polymorphisms and Cancer Risk: A Meta-Analysis of Case-Control Studies. PLoS One 2015; 10:e0127253. [PMID: 26047008 PMCID: PMC4457489 DOI: 10.1371/journal.pone.0127253] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 11/09/2014] [Accepted: 04/11/2015] [Indexed: 12/31/2022] Open
Abstract
Background: Studies have come to conflicting conclusions about whether polymorphisms in the adiponectin receptor 1 gene (ADIPOR1) are associated with cancer risk. To help resolve this question, we meta-analyzed case-control studies in the literature.
Methods: PubMed, EMBASE, Cochrane Library, the Chinese Biological Medical Database and the Chinese National Knowledge Infrastructure Database were systematically searched to identify all case-control studies published through February 2015 examining any ADIPOR1 polymorphisms and risk of any type of cancer. Pooled odds ratios (ORs) and corresponding 95% confidence intervals (CIs) were calculated.
Results: A total of 13 case-control studies involving 5,750 cases and 6,762 controls were analyzed. Analysis of the entire study population revealed a significant association between rs1342387(G/A) and overall cancer risk using a homozygous model (OR 0.82, 95% CI 0.72 to 0.94), heterozygous model (OR 0.84, 95% CI 0.76 to 0.93), dominant model (OR 0.85, 95% CI 0.75 to 0.97) and allele contrast model (OR 0.88, 95% CI 0.80 to 0.97). However, subgroup analysis showed that this association was significant only for Asians in the case of colorectal cancer. No significant associations were found between rs12733285(C/T) or rs7539542(C/G) and cancer risk, either in analyses of the entire study population or in analyses of subgroups.
Conclusions: Our meta-analysis suggests that the ADIPOR1 rs1342387(G/A) polymorphism, but not rs12733285(C/T) or rs7539542(C/G), may be associated with cancer risk, especially risk of colorectal cancer in Asians. Large, well-designed studies are needed to verify our findings.
11
Talluri R, Shete S. Evaluating methods for modeling epistasis networks with application to head and neck cancer. Cancer Inform 2015; 14:17-23. [PMID: 25733798 PMCID: PMC4332043 DOI: 10.4137/cin.s17289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/16/2014] [Revised: 01/05/2015] [Accepted: 01/06/2015] [Indexed: 11/23/2022] Open
Abstract
Epistasis helps to explain how multiple single-nucleotide polymorphisms (SNPs) interact to cause disease. A variety of tools have been developed to detect epistasis. In this article, we explore the strengths and weaknesses of an information theory approach for detecting epistasis and compare it to the logistic regression approach through simulations. We consider several scenarios to simulate the involvement of SNPs in an epistasis network with respect to linkage disequilibrium patterns among them and the presence or absence of main and interaction effects. We conclude that the information theory approach more efficiently detects interaction effects when main effects are absent, whereas, in general, the logistic regression approach is appropriate in all scenarios but results in higher false positives. We compute epistasis networks for SNPs in the FSD1L gene using a two-phase head and neck cancer genome-wide association study involving 2,185 cases and 4,507 controls to demonstrate the practical application of the methods.
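One common information-theoretic measure of epistasis is the information gain of a SNP pair beyond its two single-SNP contributions. The sketch below uses that measure as an assumption, since the abstract does not name the exact statistic. It uses a synthetic XOR phenotype, where neither SNP is informative alone, to show why such a measure can detect interactions in the absence of main effects.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable observations."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_gain(snp_a, snp_b, phenotype):
    """Information the SNP pair carries about the phenotype beyond each SNP alone."""
    joint = list(zip(snp_a, snp_b))
    return (mutual_information(joint, phenotype)
            - mutual_information(snp_a, phenotype)
            - mutual_information(snp_b, phenotype))


# Synthetic data: the phenotype is the XOR of two binary genotypes,
# so each SNP has no marginal (main) effect.
snp_a = [0, 0, 1, 1] * 25
snp_b = [0, 1, 0, 1] * 25
phenotype = [a ^ b for a, b in zip(snp_a, snp_b)]
print(mutual_information(snp_a, phenotype), interaction_gain(snp_a, snp_b, phenotype))
```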
Affiliation(s)
- Rajesh Talluri
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Sanjay Shete
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
12
Grange L, Bureau JF, Nikolayeva I, Paul R, Van Steen K, Schwikowski B, Sakuntabhai A. Filter-free exhaustive odds ratio-based genome-wide interaction approach pinpoints evidence for interaction in the HLA region in psoriasis. BMC Genet 2015; 16:11. [PMID: 25655172 PMCID: PMC4341885 DOI: 10.1186/s12863-015-0174-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/29/2014] [Accepted: 01/23/2015] [Indexed: 12/02/2022] Open
Abstract
Background: Deciphering the genetic architecture of complex traits is still a major challenge for human genetics. In most cases, genome-wide association studies have only partially explained the heritability of traits and diseases. Epistasis, one potentially important cause of this missing heritability, is difficult to explore at the genome-wide level. Here, we develop and assess a tool based on interactive odds ratios (IOR), Fast Odds Ratio-based sCan for Epistasis (FORCE), as a novel approach for exhaustive genome-wide epistasis search. The IOR is the ratio of the product of the odds ratios (ORs) for carrying each variant alone to the OR for carrying both variants. By definition, an IOR that significantly deviates from 1 suggests the occurrence of an interaction (epistasis). As the IOR is fast to calculate, we used the IOR to rank and select pairs of interacting polymorphisms for P value estimation, which is more time consuming.
Results: FORCE displayed power and accuracy similar to existing parametric and non-parametric methods, and is fast enough to complete a filter-free genome-wide epistasis search in a few days on a standard computer. Analysis of psoriasis data uncovered novel epistatic interactions in the HLA region, corroborating the known major and complex role of the HLA region in psoriasis susceptibility.
Conclusions: Our systematic study revealed the ability of FORCE to uncover novel interactions, highlighted the importance of exhaustiveness, as well as its specificity for certain types of interactions that were not detected by existing approaches. We therefore believe that FORCE is a valuable new tool for decoding the genetic basis of complex diseases.
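The IOR definition in the abstract can be illustrated with a small sketch. It assumes each OR is computed against the group carrying neither variant, which the abstract does not spell out; the group labels and the counts are hypothetical, and this is not the FORCE implementation.

```python
def odds_ratio(cases, controls, group, reference="none"):
    """Odds ratio of a carrier group relative to the reference (non-carrier) group."""
    odds_group = cases[group] / controls[group]
    odds_reference = cases[reference] / controls[reference]
    return odds_group / odds_reference


def interactive_odds_ratio(cases, controls):
    """IOR = (OR_A * OR_B) / OR_AB; a value of 1 means purely multiplicative effects."""
    or_a = odds_ratio(cases, controls, "A")
    or_b = odds_ratio(cases, controls, "B")
    or_ab = odds_ratio(cases, controls, "AB")
    return (or_a * or_b) / or_ab


# Hypothetical counts per carrier group: neither variant, A only, B only, both.
controls = {"none": 100, "A": 100, "B": 100, "AB": 100}
cases_multiplicative = {"none": 100, "A": 200, "B": 300, "AB": 600}
cases_interacting = {"none": 100, "A": 200, "B": 300, "AB": 1200}

print(interactive_odds_ratio(cases_multiplicative, controls))  # no interaction
print(interactive_odds_ratio(cases_interacting, controls))     # deviates from 1
```

Because the statistic needs only a handful of counts per SNP pair, it is cheap enough to serve as a ranking step before the slower P value estimation the abstract describes.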
Affiliation(s)
- Laura Grange
  - Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France
  - CNRS URA3012, Paris, 75015, France
  - Université Paris Diderot, Paris, 75013, France
- Jean-François Bureau
  - Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France
  - CNRS URA3012, Paris, 75015, France
- Iryna Nikolayeva
  - Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France
  - Department of Genomes and Genetics, Institut Pasteur, Systems Biology Lab, Paris, 75015, France
  - Université Paris-Descartes, Sorbonne Paris Cité, Paris, France
- Richard Paul
  - Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France
  - CNRS URA3012, Paris, 75015, France
- Kristel Van Steen
  - Systems and Modeling Unit, Montefiore Institute, University of Liège, Liège, Belgium
  - Bioinformatics and Modeling, GiGA-R, University of Liège, Liège, Belgium
- Benno Schwikowski
  - Department of Genomes and Genetics, Institut Pasteur, Systems Biology Lab, Paris, 75015, France
- Anavaj Sakuntabhai
  - Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France
  - CNRS URA3012, Paris, 75015, France
13
Stingo FC, Swartz MD, Vannucci M. A Bayesian approach to identify genes and gene-level SNP aggregates in a genetic analysis of cancer data. Statistics and Its Interface 2015; 8:137-151. [PMID: 28989562 PMCID: PMC5630184 DOI: 10.4310/sii.2015.v8.n2.a2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
Complex diseases, such as cancer, arise from complex etiologies consisting of multiple single-nucleotide polymorphisms (SNPs), each contributing a small amount to the overall risk of disease. Thus, many researchers have gone beyond single-SNP analysis methods, focusing instead on groups of SNPs, for example by analysing haplotypes. More recently, pathway-based methods have been proposed that use prior biological knowledge on gene function to achieve a more powerful analysis of genome-wide association studies (GWAS) data. In this paper we propose a novel Bayesian modeling framework to identify molecular biomarkers for disease prediction. Our method combines pathway-based approaches with multiple SNP analyses of a specified region of interest. The model's development is motivated by SNP data from a lung cancer study. In our approach we define gene-level scores based on SNP allele frequencies and use a linear modeling setting to study the scores' association with the observed phenotype. The basic idea behind the definition of gene-level scores is to weigh the SNPs within the gene according to their rarity, based on genotype frequencies expected under the Hardy-Weinberg equilibrium law. This results in scores giving more importance to the unusually low frequencies, i.e. to SNPs that might indicate peculiar genetic differences between subjects belonging to different groups. An additional feature of our approach is that we incorporate information on SNP-to-SNP associations into the model. In particular, we use network priors that model the linkage disequilibrium between SNPs. For posterior inference, we design a stochastic search method that identifies significant biomarkers (genes and SNPs) for disease prediction. We assess performance on simulated data and compare results to existing approaches. We then show the ability of the proposed methodology to detect relevant genes and associated SNPs in a lung cancer dataset.
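The Hardy-Weinberg expectation that the gene-level scores rely on can be illustrated with a small sketch. It computes only the expected genotype frequencies of a biallelic SNP from its minor allele frequency. The function name is hypothetical, and the rarity weights and score definition themselves are those of the paper, which the abstract does not spell out.

```python
def hwe_genotype_frequencies(maf):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    Returns (major homozygote, heterozygote, minor homozygote) for a
    biallelic SNP whose minor allele frequency is `maf`.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    p = 1.0 - maf
    return p * p, 2.0 * p * maf, maf * maf


# Rarer minor alleles give sharply smaller minor-homozygote frequencies.
for maf in (0.05, 0.20, 0.50):
    major_hom, het, minor_hom = hwe_genotype_frequencies(maf)
    print(f"MAF={maf:.2f}: {major_hom:.4f} {het:.4f} {minor_hom:.4f}")
```

A rarity-based weight would presumably grow as these expected frequencies shrink, but the exact form is left to the original article.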
Affiliation(s)
- Francesco C Stingo: Department of Biostatistics, MD Anderson Cancer Center, 1400 Pressler St., Houston, TX 77030, USA
- Michael D Swartz: Department of Biostatistics, UT School of Public Health, 1200 Pressler St., Houston, TX 77030, USA
- Marina Vannucci: Department of Statistics, MS 138, Rice University, 6100 Main St., Houston, TX 77251-1892, USA
|
14
|
Ou Y, Chen P, Zhou Z, Li C, Liu J, Tajima K, Guo J, Cao J, Wang H. Associations between variants on ADIPOQ and ADIPOR1 with colorectal cancer risk: a Chinese case-control study and updated meta-analysis. BMC Medical Genetics 2014; 15:137. [PMID: 25516230 PMCID: PMC4411774 DOI: 10.1186/s12881-014-0137-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/25/2014] [Accepted: 12/11/2014] [Indexed: 12/11/2022]
Abstract
Background: Epidemiological studies have suggested that variants on adiponectin (ADIPOQ) and its receptor ADIPOR1 (adiponectin receptor 1) are associated with colorectal cancer (CRC) risk; however, the results were inconclusive. The aim of the study was to evaluate the associations between the variants on ADIPOQ and ADIPOR1 and the CRC risk with a hospital-based case-control study in the Chinese population along with meta-analysis of available epidemiological studies. Methods: With a hospital-based case-control study of 341 cases and 727 controls, the associations between the common variants on ADIPOQ (rs266729, rs822395, rs2241766 and rs1501299) and ADIPOR1 (rs1342387 and rs12733285) and CRC susceptibility were evaluated. Meta-analysis of the published epidemiological studies was performed to investigate the associations between the variants and CRC risk. Results: For the population study, we found that variant rs1342387 of ADIPOR1 was associated with a reduced risk for CRC [adjusted odds ratio (OR) = 0.74, 95% confidence interval (95% CI) = 0.57-0.97; CT/TT vs. CC]. The meta-analysis also suggested a significant association for rs1342387 and CRC risk; the pooled OR was 0.79 (95% CI = 0.66-0.95) for the CT/TT carriers compared to CC homozygotes under the random-effects model (Q = 8.06, df = 4, P = 0.089; I² = 50.4%). The case-control study found no significant association for variants rs266729, rs822395, rs2241766, and rs1501299 on ADIPOQ or variant rs12733285 on ADIPOR1 and CRC susceptibility, which was consistent with results from the meta-analysis. Conclusions: These data suggested that variant rs1342387 on ADIPOR1 may be a novel CRC susceptibility factor. Electronic supplementary material: The online version of this article (doi:10.1186/s12881-014-0137-y) contains supplementary material, which is available to authorized users.
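The heterogeneity figures quoted in this abstract are linked by the standard Higgins formula, I² = 100% × (Q − df) / Q, truncated at zero. The short check below recomputes I² from the reported Q and df; the function name is illustrative only.

```python
def i_squared(q, df):
    """Higgins I-squared (in percent) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Values reported in the abstract: Q = 8.06, df = 4.
print(round(i_squared(8.06, 4), 1))  # 50.4
```

The result matches the reported I² of 50.4%, which is conventionally read as moderate heterogeneity.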
Affiliation(s)
- Yiyi Ou: Key Laboratory of Food Safety Research, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, 200031, P. R. China; Medical Department, The General Hospital of Navy, Beijing, 100037, P. R. China.
- Peizhan Chen: Key Laboratory of Food Safety Research, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, 200031, P. R. China.
- Ziyuan Zhou: Toxicology Institute, Key Lab of Medical Protection for Electromagnetic Radiation, Ministry of Education of China, College of Preventive Medicine; Third Military Medical University, Chongqing, 400038, P. R. China; Department of Environment Health, College of Preventive Medicine; Third Military Medical University, Chongqing, 400038, P. R. China.
- Chenglin Li: Key Laboratory of Food Safety Research, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, 200031, P. R. China.
- Jinyi Liu: Toxicology Institute, Key Lab of Medical Protection for Electromagnetic Radiation, Ministry of Education of China, College of Preventive Medicine; Third Military Medical University, Chongqing, 400038, P. R. China.
- Kazuo Tajima: Division of Epidemiology and Prevention, Aichi Cancer Center Research Institute, Nagoya, Japan.
- Junsheng Guo: Department of Military Hygiene, Faculty of Naval Medicine, Second Military Medical University, Shanghai, 200433, P. R. China.
- Jia Cao: Toxicology Institute, Key Lab of Medical Protection for Electromagnetic Radiation, Ministry of Education of China, College of Preventive Medicine; Third Military Medical University, Chongqing, 400038, P. R. China.
- Hui Wang: Key Laboratory of Food Safety Research, Institute for Nutritional Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, 200031, P. R. China; Key Laboratory of Food Safety Risk Assessment, Ministry of Health, Beijing, 100021, P. R. China; School of Life Science and Technology, ShanghaiTech University, Shanghai, 200031, P. R. China.
|
15
|
Li J, Zhong W, Li R, Wu R. A fast algorithm for detecting gene-gene interactions in genome-wide association studies. Ann Appl Stat 2014; 8:2292-2318. [PMID: 26457126 DOI: 10.1214/14-aoas771] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 12/22/2022]
Abstract
With the recent advent of high-throughput genotyping techniques, genetic data for genome-wide association studies (GWAS) have become increasingly available, which entails the development of efficient and effective statistical approaches. Although many such approaches have been developed and used to identify single-nucleotide polymorphisms (SNPs) that are associated with complex traits or diseases, few are able to detect gene-gene interactions among different SNPs. Genetic interactions, also known as epistasis, have been recognized to play a pivotal role in contributing to the genetic variation of phenotypic traits. However, because of an extremely large number of SNP-SNP combinations in GWAS, the model dimensionality can quickly become so overwhelming that no prevailing variable selection methods are capable of handling this problem. In this paper, we present a statistical framework for characterizing main genetic effects and epistatic interactions in a GWAS study. Specifically, we first propose a two-stage sure independence screening (TS-SIS) procedure and generate a pool of candidate SNPs and interactions, which serve as predictors to explain and predict the phenotypes of a complex trait. We also propose a rates adjusted thresholding estimation (RATE) approach to determine the size of the reduced model selected by an independence screening. Regularization regression methods, such as LASSO or SCAD, are then applied to further identify important genetic effects. Simulation studies show that the TS-SIS procedure is computationally efficient and has an outstanding finite sample performance in selecting potential SNPs as well as gene-gene interactions. We apply the proposed framework to analyze an ultrahigh-dimensional GWAS data set from the Framingham Heart Study, and select 23 active SNPs and 24 active epistatic interactions for the body mass index variation. It shows the capability of our procedure to resolve the complexity of genetic control.
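The screening idea behind this abstract can be sketched generically. The code below is not the authors' TS-SIS or RATE procedure: it is a plain marginal-correlation screen applied twice, first to main effects and then to pairwise products of the retained predictors. Restricting candidate interactions to retained main effects, the fixed `keep_main` and `keep_inter` sizes, and all function names are assumptions made for illustration.

```python
import numpy as np


def top_by_marginal_correlation(X, y, keep):
    """Indices of the `keep` columns of X most correlated (in absolute value) with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom[denom == 0] = np.inf  # constant columns get correlation 0
    corr = np.abs(Xc.T @ yc) / denom
    return np.argsort(corr)[::-1][:keep]


def two_stage_screen(X, y, keep_main, keep_inter):
    """Stage 1: screen main effects. Stage 2: screen products of retained columns."""
    main_idx = top_by_marginal_correlation(X, y, keep_main)
    pairs = [(int(i), int(j)) for a, i in enumerate(main_idx) for j in main_idx[a + 1:]]
    if not pairs:
        return main_idx, []
    products = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    inter_idx = top_by_marginal_correlation(products, y, min(keep_inter, len(pairs)))
    return main_idx, [pairs[k] for k in inter_idx]


# Simulated data (made up): two main effects and their interaction.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))
y = 2 * X[:, 3] + 2 * X[:, 7] + 3 * X[:, 3] * X[:, 7] + 0.1 * rng.standard_normal(500)

main_idx, inter_pairs = two_stage_screen(X, y, keep_main=5, keep_inter=1)
print(sorted(int(i) for i in main_idx), inter_pairs)
```

In the workflow the abstract describes, the retained main effects and interactions would then be passed to a penalized regression such as LASSO or SCAD for the final selection; that step is omitted here.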
Affiliation(s)
- Jiahan Li: Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, Indiana 46556, USA
- Wei Zhong: Institute for Studies in Economics, Department of Statistics, School of Economics, Fujian Key Laboratory of Statistical Science, Xiamen University, Xiamen, Fujian 361005, China
- Runze Li: The Methodology Center, Department of Statistics, Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Rongling Wu: Center for Statistical Genetics, Pennsylvania State University, Hershey, Pennsylvania 17033, USA
|
16
|
Abstract
Genome-wide association studies (GWASs) have become the focus of the statistical analysis of complex traits in humans, successfully shedding light on several aspects of genetic architecture and biological aetiology. Single-nucleotide polymorphisms (SNPs) are usually modelled as having additive, cumulative and independent effects on the phenotype. Although evidently a useful approach, it is often argued that this is not a realistic biological model and that epistasis (that is, the statistical interaction between SNPs) should be included. The purpose of this Review is to summarize recent directions in methodology for detecting epistasis and to discuss evidence of the role of epistasis in human complex trait variation. We also discuss the relevance of epistasis in the context of GWASs and potential hazards in the interpretation of statistical interaction terms.
|
17
|
Li Q, Ma Y, Sang W, Cui W, Li X, Liu X, Zhang W. Five common haplotype-tagging variants of adiponectin (ADIPOQ) and cancer susceptibility: a meta-analysis. Genet Test Mol Biomarkers 2014; 18:417-24. [PMID: 24720830 DOI: 10.1089/gtmb.2013.0493] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022] Open
Abstract
AIMS The relationship between common haplotype-tagging polymorphisms (rs266729 [11365C>G], rs822395 [-4034A>C], rs822396 [-3964A>G], rs2241766 [45T>G], and rs1501299 [276G>T]) in the ADIPOQ gene and cancer risk has been investigated in different ethnic groups; however, these studies have yielded contradictory results. With this in mind, this meta-analysis was performed in an attempt to draw a more precise conclusion regarding the association between ADIPOQ polymorphisms and cancer risk. RESULTS In this study, with a total of 19 eligible articles consisting of 52 studies, the pooled odds ratios (ORs) for the association between ADIPOQ rs1501299 and cancer risk were statistically significant (dominant model, TT/GT vs. GG, OR=0.84, 95% confidence interval [CI]: 0.77-0.92; homozygous model, TT vs. GG, OR=0.80, 95% CI: 0.68-0.94). These results suggested that ADIPOQ rs1501299 might be a protection-associated polymorphism in cancer. The stratified analyses indicated that the variant T allele of ADIPOQ rs1501299 was associated with decreased risk of cancer in both Caucasian and Asian populations when compared with the G allele. No significant association for the rest of the polymorphisms was observed under any genetic model. CONCLUSIONS This meta-analysis suggests that the ADIPOQ rs1501299 may be a protective factor for carcinogenesis.
Affiliation(s)
- Qiaoxin Li: Department of Pathology, First Affiliated Hospital, Xinjiang Medical University, Urumqi, China
|
18
|
Yi N, Xu S, Lou XY, Mallick H. Multiple comparisons in genetic association studies: a hierarchical modeling approach. Stat Appl Genet Mol Biol 2014; 13:35-48. [PMID: 24259248 PMCID: PMC5003626 DOI: 10.1515/sagmb-2012-0040] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 01/23/2023]
Abstract
Multiple comparisons or multiple testing has been viewed as a thorny issue in genetic association studies aiming to detect disease-associated genetic variants from a large number of genotyped variants. We alleviate the problem of multiple comparisons by proposing a hierarchical modeling approach that is fundamentally different from the existing methods. The proposed hierarchical models simultaneously fit as many variables as possible and shrink unimportant effects towards zero. Thus, the hierarchical models yield more efficient estimates of parameters than the traditional methods that analyze genetic variants separately, and also coherently address the multiple comparisons problem due to largely reducing the effective number of genetic effects and the number of statistically "significant" effects. We develop a method for computing the effective number of genetic effects in hierarchical generalized linear models, and propose a new adjustment for multiple comparisons, the hierarchical Bonferroni correction, based on the effective number of genetic effects. Our approach not only increases the power to detect disease-associated variants but also controls the Type I error. We illustrate and evaluate our method with real and simulated data sets from genetic association studies. The method has been implemented in our freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
Affiliation(s)
- Nengjun Yi: Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, AL 35294
- Shizhong Xu: Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
- Xiang-Yang Lou: Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, AL 35294
- Himel Mallick: Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, AL 35294
|
19
|
Hutter CM, Mechanic LE, Chatterjee N, Kraft P, Gillanders EM. Gene-environment interactions in cancer epidemiology: a National Cancer Institute Think Tank report. Genet Epidemiol 2013; 37:643-57. [PMID: 24123198 PMCID: PMC4143122 DOI: 10.1002/gepi.21756] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.1] [Received: 06/03/2013] [Revised: 08/06/2013] [Accepted: 08/14/2013] [Indexed: 01/04/2023]
Abstract
Cancer risk is determined by a complex interplay of genetic and environmental factors. Genome-wide association studies (GWAS) have identified hundreds of common (minor allele frequency [MAF] > 0.05) and less common (0.01 < MAF < 0.05) genetic variants associated with cancer. The marginal effects of most of these variants have been small (odds ratios: 1.1-1.4). There remain unanswered questions on how best to incorporate the joint effects of genes and environment, including gene-environment (G × E) interactions, into epidemiologic studies of cancer. To help address these questions, and to better inform research priorities and allocation of resources, the National Cancer Institute sponsored a "Gene-Environment Think Tank" on January 10-11, 2012. The objective of the Think Tank was to facilitate discussions on (1) the state of the science, (2) the goals of G × E interaction studies in cancer epidemiology, and (3) opportunities for developing novel study designs and analysis tools. This report summarizes the Think Tank discussion, with a focus on contemporary approaches to the analysis of G × E interactions. Selecting the appropriate methods requires first identifying the relevant scientific question and rationale, with an important distinction made between analyses aiming to characterize the joint effects of putative or established genetic and environmental factors and analyses aiming to discover novel risk factors or novel interaction effects. Other discussion items include measurement error, statistical power, significance, and replication. Additional designs, exposure assessments, and analytical approaches need to be considered as we move from the current small number of success stories to a fuller understanding of the interplay of genetic and environmental factors.
Affiliation(s)
- Carolyn M Hutter: Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
|
20
|
Yi N, Ma S. Hierarchical shrinkage priors and model fitting for high-dimensional generalized linear models. Stat Appl Genet Mol Biol 2012; 11(6). [PMID: 23192052 PMCID: PMC3658361 DOI: 10.1515/1544-6115.1803] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/15/2022]
Abstract
Genetic and other scientific studies routinely generate very many predictor variables, which can be naturally grouped, with predictors in the same groups being highly correlated. It is desirable to incorporate the hierarchical structure of the predictor variables into generalized linear models for simultaneous variable selection and coefficient estimation. We propose two prior distributions, hierarchical Cauchy and double-exponential distributions, on coefficients in generalized linear models. The hierarchical priors include both variable-specific and group-specific tuning parameters, thereby not only adopting different shrinkage for different coefficients and different groups but also providing a way to pool the information within groups. We fit generalized linear models with the proposed hierarchical priors by incorporating flexible expectation-maximization (EM) algorithms into the standard iteratively weighted least squares as implemented in the general statistical package R. The methods are illustrated with data from an experiment to identify genetic polymorphisms for survival of mice following infection with Listeria monocytogenes. The performance of the proposed procedures is further assessed via simulation studies. The methods are implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
Affiliation(s)
- Nengjun Yi: University of Alabama, Birmingham, AL, USA
|
21
|
Xu Y, He B, Pan Y, Gu L, Nie Z, Chen L, Li R, Gao T, Wang S. The roles of ADIPOQ genetic variations in cancer risk: evidence from published studies. Mol Biol Rep 2012; 40:1135-44. [PMID: 23065236 DOI: 10.1007/s11033-012-2154-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 03/31/2012] [Accepted: 10/04/2012] [Indexed: 12/19/2022]
Abstract
Adiponectin, which is produced by adipose tissue, is involved in complex diseases related to obesity, such as cancer. Genetic variations in ADIPOQ are thought to influence the activity of adiponectin and thereby relate to cancer occurrence. However, epidemiological results have been inconsistent. To examine this controversy, we assessed reported studies of association between ADIPOQ polymorphisms and cancer risk. Relevant studies were identified through PubMed and EMBASE searches updated to January 12, 2012. According to the inclusion and exclusion criteria, 15 studies involving three polymorphisms (rs266729, rs2241766, rs1501299) of ADIPOQ were included. Summary odds ratios (ORs) and 95% confidence intervals (CIs) were calculated using random-effect or fixed-effect models based on the heterogeneity of included studies. A total of 15 case-control studies related to rs266729 (5,615 cases and 6,425 controls), rs2241766 (5,318 cases and 6,118 controls) and rs1501299 (3,751 cases and 5,104 controls) were included to analyze the ADIPOQ polymorphisms and cancer risk. For rs1501299, the T allele was associated with decreased cancer risk. In addition, cancer type subgroup analysis revealed that the T allele was associated with decreased colorectal and prostate cancer risk. Ethnicity subgroup analysis showed a decreased risk in both Asian and Caucasian descendants. As to rs2241766, a borderline decreased cancer risk was observed. This meta-analysis indicated that the T allele of rs1501299 was a protective factor for cancer risk, and the G allele of rs2241766 was a potential protective factor for cancer risk, especially in Caucasian descendants. Further studies should be performed to clarify the roles of ADIPOQ polymorphisms in cancer risk.
Affiliation(s)
- Yeqiong Xu: Central Laboratory of Nanjing First Hospital, Nanjing Medical University, 68 Changle Road, Nanjing, 210006, China
|
22
|
Yi N, Liu N, Zhi D, Li J. Hierarchical generalized linear models for multiple groups of rare and common variants: jointly estimating group and individual-variant effects. PLoS Genet 2011; 7:e1002382. [PMID: 22144906 PMCID: PMC3228815 DOI: 10.1371/journal.pgen.1002382] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Received: 06/23/2011] [Accepted: 09/29/2011] [Indexed: 12/19/2022] Open
Abstract
Complex diseases and traits are likely influenced by many common and rare genetic variants and environmental factors. Detecting disease susceptibility variants is a challenging task, especially when their frequencies are low and/or their effects are small or moderate. We propose here a comprehensive hierarchical generalized linear model framework for simultaneously analyzing multiple groups of rare and common variants and relevant covariates. The proposed hierarchical generalized linear models introduce a group effect and a genetic score (i.e., a linear combination of main-effect predictors for genetic variants) for each group of variants, and jointly they estimate the group effects and the weights of the genetic scores. This framework includes various previous methods as special cases, and it can effectively deal with both risk and protective variants in a group and can simultaneously estimate the cumulative contribution of multiple variants and their relative importance. Our computational strategy is based on extending the standard procedure for fitting generalized linear models in the statistical software R to the proposed hierarchical models, leading to the development of stable and flexible tools. The methods are illustrated with sequence data in gene ANGPTL4 from the Dallas Heart Study. The performance of the proposed procedures is further assessed via simulation studies. The methods are implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
Affiliation(s)
- Nengjun Yi: Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, Alabama, USA.
|
23
|
Kaklamani V, Yi N, Zhang K, Sadim M, Offit K, Oddoux C, Ostrer H, Mantzoros C, Pasche B. Polymorphisms of ADIPOQ and ADIPOR1 and prostate cancer risk. Metabolism 2011; 60:1234-43. [PMID: 21397927 PMCID: PMC3134585 DOI: 10.1016/j.metabol.2011.01.005] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 11/12/2010] [Revised: 01/09/2011] [Accepted: 01/17/2011] [Indexed: 11/20/2022]
Abstract
Studies have linked prostate cancer risk with insulin resistance and obesity. Circulating levels of adiponectin, a protein involved in insulin resistance and obesity, have been associated with prostate cancer risk. We studied the association of prostate cancer risk with haplotype-tagging single nucleotide polymorphisms (SNPs) of the adiponectin (ADIPOQ) and adiponectin receptor 1 (ADIPOR1) genes, chosen based on their functional relevance or association with other types of cancer. DNA samples from 465 cases and 441 healthy volunteers from New York City were genotyped for ADIPOQ rs266729, rs822395, rs822396, rs1501299, and rs2241766 SNPs and ADIPOR1 rs12733285, rs1342387, rs7539542, rs2232853, and rs10920531 SNPs. We performed both single- and multiple-SNP analyses. We found that rs12733285, rs7539542, rs266729, rs822395, rs822396, and rs1501299 were significantly associated with prostate cancer risk. Haplotype analysis confirmed these results and identified 5 ADIPOQ 4-SNP haplotypes and 1 ADIPOR1 2-SNP haplotype tightly associated with prostate cancer risk. Importantly, 2 ADIPOQ SNPs, rs266729 and rs1501299, have been previously associated with colon and breast cancer risk, respectively, in the same direction as in this study. These findings suggest that variants of the adiponectin pathway may be associated with susceptibility to various forms of common cancers and warrant validation studies.
Affiliation(s)
- Virginia Kaklamani: Cancer Genetics Program, Division of Hematology/Oncology, Department of Medicine and Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611
- Nengjun Yi: Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294
- Kui Zhang: Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294
- Maureen Sadim: Cancer Genetics Program, Division of Hematology/Oncology, Department of Medicine and Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611
- Kenneth Offit: Clinical Genetics Service, Memorial Sloan-Kettering Cancer Center, 1275 York Ave, New York, NY 10021
- Carole Oddoux: Human Genetics Program, Department of Pediatrics, New York University Medical Center, New York, NY 10016
- Harry Ostrer: Human Genetics Program, Department of Pediatrics, New York University Medical Center, New York, NY 10016
- Christos Mantzoros: Division of Endocrinology and Metabolism, Department of Medicine, Beth Israel Deaconess Medical Center (BIDMC), Harvard Medical School, 330 Brookline Avenue, Stoneman 816, Boston, MA 02215
- Boris Pasche: Division of Hematology/Oncology and Comprehensive Cancer Center, University of Alabama, Birmingham, AL 35294
|
24
|
Abstract
Many common human diseases and complex traits are highly heritable and influenced by multiple genetic and environmental factors. Although genome-wide association studies (GWAS) have successfully identified many disease-associated variants, these genetic variants explain only a small proportion of the heritability of most complex diseases. Genetic interactions (gene-gene and gene-environment) substantially contribute to complex traits and diseases and could be one of the main sources of the missing heritability. This paper provides an overview of the available statistical methods and related computer software for identifying genetic interactions in animal and plant experimental crosses and human genetic association studies. The main discussion falls under the three broad issues in statistical analysis of genetic interactions: the definition, detection and interpretation of genetic interactions. Recently developed methods based on modern techniques for high-dimensional data are reviewed, including penalized likelihood approaches and hierarchical models; the relationships between these methods are also discussed. I conclude this review by highlighting some areas of future research.
|
25
|
Li J, Zhang K, Yi N. A Bayesian hierarchical model for detecting haplotype-haplotype and haplotype-environment interactions in genetic association studies. Hum Hered 2011; 71:148-60. [PMID: 21778734 DOI: 10.1159/000324841] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 09/27/2010] [Accepted: 02/03/2011] [Indexed: 12/27/2022] Open
Abstract
OBJECTIVE Genetic association studies based on haplotypes are powerful in the discovery and characterization of the genetic basis of complex human diseases. However, statistical methods for detecting haplotype-haplotype and haplotype-environment interactions have not yet been fully developed owing to the difficulties encountered: large numbers of potential haplotypes and unknown haplotype pairs. Furthermore, methods for detecting the association between rare haplotypes and disease have not kept pace with their counterparts for common haplotypes. METHODS/RESULTS We herein propose an efficient and robust method to tackle these problems based on a Bayesian hierarchical generalized linear model. Our model simultaneously fits environmental effects, main effects of numerous common and rare haplotypes, and haplotype-haplotype and haplotype-environment interactions. The key to the approach is the use of a continuous prior distribution on coefficients that favors sparseness in the fitted model and facilitates computation. We develop a fast expectation-maximization algorithm to fit models by estimating posterior modes of coefficients. We incorporate our algorithm into the iteratively weighted least squares for classical generalized linear models as implemented in the R function glm. We evaluate the proposed method and compare its performance to existing methods on extensive simulated data. CONCLUSION The results show that the proposed method performs well under all situations and is more powerful than existing approaches.
Affiliation(s)
- Jun Li: Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, AL 35294-0022, USA
|
26
|
Arcaroli JJ, Liu N, Yi N, Abraham E. Association between IL-32 genotypes and outcome in infection-associated acute lung injury. Critical Care: The Official Journal of the Critical Care Forum 2011; 15:R138. [PMID: 21649914 PMCID: PMC3219007 DOI: 10.1186/cc10258] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 01/21/2011] [Revised: 04/21/2011] [Accepted: 06/07/2011] [Indexed: 01/11/2023]
Abstract
INTRODUCTION Our purpose was to investigate the relationship between variation within the IL-32 promoter and gene and susceptibility to, and outcomes from, infection-associated acute lung injury (ALI). METHODS This was a retrospective case-control study involving healthy individuals (controls) and patients (cases) with infection-associated ALI. Two hundred fifty-eight healthy controls and 251 patients with infection-associated ALI were used for comparison. The IL-32 promoter/gene was sequenced in 52 healthy Caucasian individuals to identify single nucleotide polymorphisms (SNPs). Allelic discrimination was performed on 11 SNPs to determine differences between cases and controls and in outcomes among patients with infection-associated ALI. RESULTS Logistic and normal regression models were used to evaluate the associations of SNPs with case-control status and with outcomes in patients with infection-associated ALI. rs12934561, an intronic SNP, was found to be associated with risk for ALI in the case-control study and with a more severe clinical course, as shown by increased time on the ventilator and the presence of fluid-unresponsive hypotension. Further, rs12934561 was found to have gender-specific effects and to interact strongly with other SNPs. CONCLUSIONS A common IL-32 genotype, rs12934561, is associated with the risk of ALI as well as the need for prolonged mechanical ventilatory support. This finding suggests that IL-32 is not only involved in the initiating inflammatory and cellular events that result in ALI, but also participates in determining the severity of pulmonary dysfunction associated with ALI.
Affiliation(s)
- John J Arcaroli
- Department of Medicine, University of Alabama at Birmingham, 1530 3rd Avenue South, Birmingham, AL 35294-0012, USA
|
27
|
Kaklamani V, Yi N, Sadim M, Siziopikou K, Zhang K, Xu Y, Tofilon S, Agarwal S, Pasche B, Mantzoros C. The role of the fat mass and obesity associated gene (FTO) in breast cancer risk. BMC Med Genet 2011; 12:52. [PMID: 21489227 PMCID: PMC3089782 DOI: 10.1186/1471-2350-12-52] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.2] [Received: 01/31/2011] [Accepted: 04/13/2011] [Indexed: 12/29/2022]
Abstract
BACKGROUND Obesity has been shown to increase breast cancer risk. FTO is a novel gene that has been identified through genome-wide association studies (GWAS) to be related to obesity. Our objective was to evaluate tissue expression of FTO in breast tissue and the role of FTO SNPs in predicting breast cancer risk. METHODS We performed a case-control study of 354 breast cancer cases and 364 controls. This study was conducted at Northwestern University. We examined the role of single nucleotide polymorphisms (SNPs) of intron 1 of FTO in breast cancer risk. We genotyped cases and controls for four SNPs: rs7206790, rs8047395, rs9939609 and rs1477196. We also evaluated tissue expression of FTO in normal and malignant breast tissue. RESULTS We found that all SNPs were significantly associated with breast cancer risk, with rs1477196 showing the strongest association. We showed that FTO is expressed in both normal and malignant breast tissue. We found that FTO genotypes provided powerful classifiers to predict breast cancer risk, and a model with epistatic interactions further improved the prediction accuracy, with an area under the receiver operating characteristic (ROC) curve of 0.68. CONCLUSION We have shown significant expression of FTO in malignant and normal breast tissue and that FTO SNPs in intron 1 are significantly associated with breast cancer risk. Furthermore, these FTO SNPs are powerful classifiers in predicting breast cancer risk.
Affiliation(s)
- Virginia Kaklamani
- Cancer Genetics Program, Division of Hematology/Oncology, Department of Medicine and Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, 676 N St Clair St, Suite 850, Chicago, IL 60611, USA
- Nengjun Yi
- Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, 1665 University Blvd, Ryals Bldg. 317F, Birmingham, AL 35294, USA
- Maureen Sadim
- Cancer Genetics Program, Division of Hematology/Oncology, Department of Medicine and Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, 303 E Superior Street, Chicago, IL 60611, USA
- Kalliopi Siziopikou
- Department of Pathology, Northwestern Memorial Hospital, 675 N St Clair St, Chicago, IL 60611, USA
- Kui Zhang
- Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, 1665 University Blvd, Ryals Bldg. 327H, Birmingham, AL 35294, USA
- Yanfei Xu
- Division of Hematology/Oncology, Department of Medicine and Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, 303 E Superior Street, Chicago, IL 60611, USA
- Sarah Tofilon
- Department of Medicine and Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, 303 E Superior Street, Chicago, IL 60611, USA
- Surbhi Agarwal
- Department of Medicine and Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, 303 E Superior Street, Chicago, IL 60611, USA
- Boris Pasche
- Division of Hematology/Oncology and Comprehensive Cancer Center, University of Alabama, 1802 6th Avenue South, NP 2566, Birmingham, AL 35294, USA
- Christos Mantzoros
- Division of Endocrinology and Metabolism, Department of Medicine, Beth Israel Deaconess Medical Center (BIDMC), Harvard Medical School, 330 Brookline Avenue FD-876, Boston, MA 02215, USA
|
28
|
Yi N, Zhi D. Bayesian analysis of rare variants in genetic association studies. Genet Epidemiol 2011; 35:57-69. [PMID: 21181897 DOI: 10.1002/gepi.20554] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Indexed: 11/09/2022]
Abstract
Recent advances in next-generation sequencing technologies facilitate the detection of rare variants, making it possible to uncover the roles of rare variants in complex diseases. As any single rare variant contains little variation, association analysis of rare variants requires statistical methods that can effectively combine the information across variants and estimate their overall effect. In this study, we propose a novel Bayesian generalized linear model for analyzing multiple rare variants within a gene or genomic region in genetic association studies. Our model can deal with complicated situations that have not been fully addressed by existing methods, including issues of disparate effects and nonfunctional variants. Our method jointly models the overall effect and the weights of multiple rare variants and estimates them from the data. This approach assigns different weights to different variants based on their contributions to the phenotype, yielding an effective summary of the information across variants. We evaluate the proposed method and compare its performance to existing methods on extensive simulated data. The results show that the proposed method performs well under all situations and is more powerful than existing approaches.
Affiliation(s)
- Nengjun Yi
- Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, AL 35294-0022, USA
|