1
Sha Z, Freda PJ, Bhandary P, Ghosh A, Matsumoto N, Moore JH, Hu T. Distinct Network Patterns Emerge from Cartesian and XOR Epistasis Models: A Comparative Network Science Analysis. RESEARCH SQUARE 2024:rs.3.rs-4392123. [PMID: 38826481 PMCID: PMC11142370 DOI: 10.21203/rs.3.rs-4392123/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Background Epistasis, the phenomenon where the effect of one gene (or variant) is masked or modified by one or more other genes, can significantly contribute to the observed phenotypic variance of complex traits. To date, it has been generally assumed that genetic interactions can be detected using a Cartesian, or multiplicative, interaction model commonly utilized in standard regression approaches. However, a recent study investigating epistasis in obesity-related traits in rats and mice has identified potential limitations of the Cartesian model, revealing that it only detects some of the genetic interactions occurring in these systems. By applying an alternative approach, the exclusive-or (XOR) model, the researchers detected a greater number of epistatic interactions and identified more biologically relevant ontological terms associated with the interacting loci. This suggests that the XOR model may provide a more comprehensive understanding of epistasis in these species and phenotypes. To further explore these findings and determine if different interaction models also make up distinct epistatic networks, we leverage network science to provide a more comprehensive view into the genetic interactions underlying BMI in this system. Results Our comparative analysis of networks derived from Cartesian and XOR interaction models in rats (Rattus norvegicus) uncovers distinct topological characteristics for each model-derived network. Notably, we discover that networks based on the XOR model exhibit an enhanced sensitivity to epistatic interactions. This sensitivity enables the identification of network communities, revealing novel trait-related biological functions through enrichment analysis. Furthermore, we identify triangle network motifs in the XOR epistatic network, suggestive of higher-order epistasis, based on the topology of lower-order epistasis. 
Conclusions These findings highlight the XOR model's ability to uncover meaningful biological associations as well as higher-order epistasis from lower-order epistatic networks. Additionally, our results demonstrate that network approaches not only enhance epistasis detection capabilities but also provide more nuanced understandings of genetic architectures underlying complex traits. The identification of community structures and motifs within these distinct networks, especially in XOR, points to the potential for network science to aid in the discovery of novel genetic pathways and regulatory networks. Such insights are important for advancing our understanding of phenotype-genotype relationships.
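The triangle motifs mentioned in this abstract can be illustrated with a minimal sketch. It assumes the epistatic network is available as an undirected edge list of significant locus pairs; the locus names and the `find_triangles` helper are hypothetical and are not taken from the paper, whose actual motif analysis may differ.

```python
def find_triangles(edges):
    """Return all triangles (3-cliques) in an undirected edge list.

    Each edge is a pair of locus identifiers that showed pairwise epistasis.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    triangles = set()
    for u, v in edges:
        # Any common neighbour w of u and v closes the triangle u-v-w.
        for w in adjacency[u] & adjacency[v]:
            triangles.add(tuple(sorted((u, v, w))))
    return sorted(triangles)


# Hypothetical toy network: three mutually interacting loci plus one pendant locus.
edges = [("snpA", "snpB"), ("snpB", "snpC"), ("snpA", "snpC"), ("snpC", "snpD")]
triangles = find_triangles(edges)
```

A triangle found this way only suggests a candidate three-way interaction; it would still need a direct higher-order test.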
Affiliation(s)
- Zhendong Sha
  - School of Computing, Queen’s University, 557 Goodwin Hall, 21-25 Union St, Kingston, Ontario, K7L 2N8, Canada
- Philip J. Freda
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, U.S.A
- Priyanka Bhandary
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, U.S.A
- Attri Ghosh
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, U.S.A
- Nicholas Matsumoto
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, U.S.A
- Jason H. Moore
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, U.S.A
- Ting Hu
  - School of Computing, Queen’s University, 557 Goodwin Hall, 21-25 Union St, Kingston, Ontario, K7L 2N8, Canada
2
Ma J, Li J, Chen Y, Yang Z, He Y. Poor statistical power in population-based association study of gene interaction. BMC Med Genomics 2024; 17:111. [PMID: 38678264 PMCID: PMC11055307 DOI: 10.1186/s12920-024-01884-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 04/19/2024] [Indexed: 04/29/2024]
Abstract
BACKGROUND Statistical epistasis, or "gene-gene interaction" in genetic association studies, refers to nonadditive effects between polymorphic sites on two different genes that affect the same phenotype. In genetic association analyses of complex traits, however, researchers have so far found little evidence of statistical epistasis. METHODS We developed a statistical model in which statistical epistasis is represented as extra linkage disequilibrium between the polymorphic sites of different risk genes. The power of the statistical test for identifying gene-gene interaction was calculated and compared across different hypothesis scenarios. RESULTS Our results show that statistical power increases with the interaction coefficient, relative risk, and linkage disequilibrium with genetic markers. However, the power of interaction discovery is much lower than that of a regular single-site association test. When rigorous criteria were employed in statistical tests, identifying gene-gene interaction became a very difficult task. When the significance criterion was set to p-value ≤ 5.0 × 10⁻⁸, the same as in many genome-wide association studies, there was little chance of identifying gene-gene interaction under any circumstances. CONCLUSIONS The lack of detected epistasis tends to be an inevitable result of the statistical principles underlying the methods of genetic association studies and is therefore an inherent characteristic of the research itself.
Affiliation(s)
- Jiarui Ma
  - Shanghai Key Laboratory of Medical Epigenetics, International Co-Laboratory of Medical Epigenetics and Metabolism (Ministry of Science and Technology), Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, China
- Jian Li
  - Shanghai Key Laboratory of Medical Epigenetics, International Co-Laboratory of Medical Epigenetics and Metabolism (Ministry of Science and Technology), Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, China
- Yuqi Chen
  - Shanghai Key Laboratory of Medical Epigenetics, International Co-Laboratory of Medical Epigenetics and Metabolism (Ministry of Science and Technology), Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, China
- Zhen Yang
  - Center for Medical Research and Innovation of Pudong Hospital, Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China
- Yungang He
  - Shanghai Fifth People's Hospital, Intelligent Medicine Institute, Fudan University, Shanghai, 200032, PR China
3
Batista S, Madar VS, Freda PJ, Bhandary P, Ghosh A, Matsumoto N, Chitre AS, Palmer AA, Moore JH. Interaction models matter: an efficient, flexible computational framework for model-specific investigation of epistasis. BioData Min 2024; 17:7. [PMID: 38419006 PMCID: PMC10900690 DOI: 10.1186/s13040-024-00358-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 02/20/2024] [Indexed: 03/02/2024]
Abstract
PURPOSE Epistasis, the interaction between two or more genes, is integral to the study of genetics and is present throughout nature. Yet, it is seldom fully explored as most approaches primarily focus on single-locus effects, partly because analyzing all pairwise and higher-order interactions requires significant computational resources. Furthermore, existing methods for epistasis detection only consider a Cartesian (multiplicative) model for interaction terms. This is likely limiting as epistatic interactions can evolve to produce varied relationships between genetic loci, some complex and not linearly separable. METHODS We present new algorithms for the interaction coefficients for standard regression models for epistasis that permit many varied models for the interaction terms for loci and efficient memory usage. The algorithms are given for two-way and three-way epistasis and may be generalized to higher order epistasis. Statistical tests for the interaction coefficients are also provided. We also present an efficient matrix based algorithm for permutation testing for two-way epistasis. We offer a proof and experimental evidence that methods that look for epistasis only at loci that have main effects may not be justified. Given the computational efficiency of the algorithm, we applied the method to a rat data set and mouse data set, with at least 10,000 loci and 1,000 samples each, using the standard Cartesian model and the XOR model to explore body mass index. RESULTS This study reveals that although many of the loci found to exhibit significant statistical epistasis overlap between models in rats, the pairs are mostly distinct. Further, the XOR model found greater evidence for statistical epistasis in many more pairs of loci in both data sets with almost all significant epistasis in mice identified using XOR. In the rat data set, loci involved in epistasis under the XOR model are enriched for biologically relevant pathways. 
CONCLUSION Our results in both species show that many biologically relevant epistatic relationships would have been undetected if only one interaction model was applied, providing evidence that varied interaction models should be implemented to explore epistatic interactions that occur in living systems.
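The difference between the two interaction models named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: genotypes are assumed to be coded as minor-allele counts 0/1/2, the Cartesian term is their product, and the XOR term is taken on the parity of each count, which is one plausible encoding that may differ from the paper's. The function names are hypothetical.

```python
def cartesian_term(g1, g2):
    """Multiplicative (Cartesian) interaction term for two genotypes coded 0/1/2."""
    return g1 * g2


def xor_term(g1, g2):
    """XOR interaction term, assuming XOR is applied to the parity of each genotype."""
    return (g1 % 2) ^ (g2 % 2)


def interaction_table(term):
    """Tabulate an interaction term over all 3 x 3 genotype combinations."""
    return [[term(a, b) for b in (0, 1, 2)] for a in (0, 1, 2)]


cartesian = interaction_table(cartesian_term)
xor = interaction_table(xor_term)
```

Under these assumptions the Cartesian table grows monotonically with both genotypes, while the XOR table alternates, so a regression that includes only a product term may miss an XOR-like pattern.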
Affiliation(s)
- Sandra Batista
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Philip J Freda
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Priyanka Bhandary
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Attri Ghosh
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Nicholas Matsumoto
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Apurva S Chitre
  - Department of Psychiatry, University of California, San Diego, 9500 Gilman Dr., Mailcode: 0667, La Jolla, CA, 92093-0667, USA
- Abraham A Palmer
  - Department of Psychiatry, University of California, San Diego, 9500 Gilman Dr., Mailcode: 0667, La Jolla, CA, 92093-0667, USA
  - Institute for Genomic Medicine, University of California, San Diego, 9500 Gilman Dr., Mailcode: 0667, La Jolla, CA, 92093-0667, USA
- Jason H Moore
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
4
Yang CH, Hou MF, Chuang LY, Yang CS, Lin YD. Dimensionality reduction approach for many-objective epistasis analysis. Brief Bioinform 2023; 24:6858949. [PMID: 36458451 DOI: 10.1093/bib/bbac512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Revised: 10/07/2022] [Accepted: 10/26/2022] [Indexed: 12/04/2022]
Abstract
In epistasis analysis, single-nucleotide polymorphism-single-nucleotide polymorphism interactions (SSIs) among genes may, alongside environmental factors, influence the risk of multifactorial diseases. When identifying SSIs between cases and controls (i.e. binary traits), the score for model quality is affected by the choice of objective function (i.e. measurement) because of potential disease model preferences and disease complexities. Our previous study proposed a multiobjective approach-based multifactor dimensionality reduction (MOMDR), with the results indicating that two objective functions could enhance the identification of SSIs with weak marginal effects. However, SSI identification using MOMDR remains a challenge because the optimal measure combination of objective functions has yet to be investigated. This study extended MOMDR to a many-objective version (i.e. many-objective MDR, MaODR) by integrating various disease probability measures based on a two-way contingency table to improve the identification of SSIs between cases and controls. We introduced an objective function selection approach to determine the optimal measure combination in MaODR among 10 well-known measures. In total, 6 disease models with and 40 disease models without marginal effects were used to evaluate the general algorithms, namely those based on multifactor dimensionality reduction (MDR), MOMDR and MaODR. Our results revealed that, through the objective function selection approach, the MaODR-based three-objective-function model combining correct classification rate, likelihood ratio and normalized mutual information (MaODR-CLN) exhibited detection success rates (accuracy) 6.47% higher than MOMDR and 17.23% higher than MDR. In a Wellcome Trust Case Control Consortium dataset, MaODR-CLN successfully identified significant SSIs (P < 0.001) associated with coronary artery disease.
We performed a systematic analysis to identify the optimal measure combination in MaODR among 10 objective functions. Our combination detected SSIs for binary traits with weak marginal effects and thus reduced spurious variables in the score model. MaODR is freely available at https://sites.google.com/view/maodr/home.
Affiliation(s)
- Cheng-Hong Yang
  - Department of Information Management, Tainan University of Technology, and Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Taiwan
  - Biomedical Engineering, Kaohsiung Medical University, Taiwan
- Ming-Feng Hou
  - Kaohsiung Medical University Hospital, and Department of Surgery, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Li-Yeh Chuang
  - Department of Chemical Engineering & Institute of Biotechnology and Chemical Engineering, I-Shou University, Taiwan
- Cheng-San Yang
  - Department of Plastic Surgery, Chia-Yi Christian Hospital, Taiwan
- Yu-Da Lin
  - Department of Computer Science and Information Engineering, National Penghu University of Science and Technology, Taiwan
5
Pudjihartono N, Fadason T, Kempa-Liehr AW, O'Sullivan JM. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. FRONTIERS IN BIOINFORMATICS 2022; 2:927312. [PMID: 36304293 PMCID: PMC9580915 DOI: 10.3389/fbinf.2022.927312] [Citation(s) in RCA: 75] [Impact Index Per Article: 37.5] [Received: 04/24/2022] [Accepted: 06/03/2022] [Indexed: 01/14/2023]
Abstract
Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called “curse of dimensionality” (i.e., extensively larger number of features compared to the number of samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most “informative” features and remove noisy “non-informative,” irrelevant and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.
Affiliation(s)
- Tayaza Fadason
  - Liggins Institute, University of Auckland, Auckland, New Zealand
  - Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
- Andreas W. Kempa-Liehr
  - Department of Engineering Science, The University of Auckland, Auckland, New Zealand
- Justin M. O'Sullivan
  - Liggins Institute, University of Auckland, Auckland, New Zealand
  - Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
  - MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, United Kingdom
  - Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Australian Parkinson’s Mission, Garvan Institute of Medical Research, Sydney, NSW, Australia
*Correspondence: Andreas W. Kempa-Liehr; Justin M. O'Sullivan
6
Martins J, Yusupov N, Binder EB, Brückl TM, Czamara D. Early adversity as the prototype gene × environment interaction in mental disorders? Pharmacol Biochem Behav 2022; 215:173371. [PMID: 35271857 DOI: 10.1016/j.pbb.2022.173371] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/07/2021] [Revised: 02/03/2022] [Accepted: 02/28/2022] [Indexed: 10/18/2022]
Abstract
Childhood adversity (CA) as a significant stressor has consistently been associated with the development of mental disorders. The interaction between CA and genetic variants has been proposed to play a substantial role in disease etiology. In this review, we focus on the gene by environment (GxE) paradigm, its background and interpretation and stress the necessity of its implementation in psychiatric research. Further, we discuss the findings supporting GxCA interactions, ranging from candidate gene studies to polygenic and genome-wide approaches, their strengths and limitations. To illustrate potential underlying epigenetic mechanisms by which GxE effects are translated, we focus on results from FKBP5 × CA studies and discuss how molecular evidence can supplement previous GxE findings. In conclusion, while GxE studies constitute a valuable line of investigation, more harmonized GxE studies in large, deep-phenotyped, longitudinal cohorts, and across different developmental stages are necessary to further substantiate and understand reported GxE findings.
Affiliation(s)
- Jade Martins
  - Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich 80804, Germany
- Natan Yusupov
  - Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich 80804, Germany
  - International Max Planck Research School for Translational Psychiatry (IMPRS-TP), Munich, Germany
- Elisabeth B Binder
  - Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich 80804, Germany
  - Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, GA 30329, USA
- Tanja M Brückl
  - Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich 80804, Germany
- Darina Czamara
  - Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich 80804, Germany
7
Yilmaz S, Fakhouri M, Koyutürk M, Çiçek AE, Tastan O. Uncovering complementary sets of variants for predicting quantitative phenotypes. Bioinformatics 2022; 38:908-917. [PMID: 34864867 DOI: 10.1093/bioinformatics/btab803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2021] [Revised: 09/21/2021] [Accepted: 11/24/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Genome-wide association studies show that variants in individual genomic loci alone are not sufficient to explain the heritability of complex, quantitative phenotypes. Many computational methods have been developed to address this issue by considering subsets of loci that can collectively predict the phenotype. This problem can be considered a challenging instance of feature selection in which the number of dimensions (loci that are screened) is much larger than the number of samples. While currently available methods can achieve decent phenotype prediction performance, they either do not scale to large datasets or have parameters that require extensive tuning. RESULTS We propose a fast and simple algorithm, Macarons, to select a small, complementary subset of variants by avoiding redundant pairs that are likely to be in linkage disequilibrium. Our method features two interpretable parameters that control the time/performance trade-off without requiring parameter tuning. In our computational experiments, we show that Macarons consistently achieves similar or better prediction performance than state-of-the-art selection methods while having a simpler premise and being at least two orders of magnitude faster. Overall, Macarons can seamlessly scale to the human genome with ∼10⁷ variants in a matter of minutes while taking the dependencies between the variants into account. AVAILABILITY AND IMPLEMENTATION Macarons is available in Matlab and Python at https://github.com/serhan-yilmaz/macarons. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Serhan Yilmaz
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
- Mohamad Fakhouri
  - Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey
- Mehmet Koyutürk
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
  - Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA
- A Ercüment Çiçek
  - Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Oznur Tastan
  - Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
8
Lin YD, Lee YC, Chiang CP, Moi SH, Kan JY. MOAI: a multi-outcome interaction identification approach reveals an interaction between vaspin and carcinoembryonic antigen on colorectal cancer prognosis. Brief Bioinform 2021; 23:6398687. [PMID: 34661627 DOI: 10.1093/bib/bbab427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2021] [Revised: 09/14/2021] [Accepted: 09/18/2021] [Indexed: 11/12/2022]
Abstract
Identifying and characterizing the interaction between risk factors for multiple outcomes (multi-outcome interaction) has been one of the greatest challenges in the study of complex multifactorial diseases. However, existing approaches have several limitations in identifying multi-outcome interactions. To address this issue, we propose a multi-outcome interaction identification approach called MOAI. MOAI was motivated by the difficulty of estimating interactions that occur simultaneously across multiple outcomes and by the success of the Pareto set filter operator in identifying multi-outcome interactions. MOAI permits the identification of interactions across multiple outcomes and is applicable to population-based study designs. Our experimental results showed that existing approaches cannot effectively identify multi-outcome interactions, whereas MOAI exhibited superior performance. We applied MOAI to identify the interaction between risk factors for colorectal cancer (CRC) in both metastasis and mortality prognostic outcomes. An interaction between vaspin and carcinoembryonic antigen (CEA) was found, indicating that patients with CRC characterized by higher vaspin (≥30%) and CEA (≥5) levels had simultaneously increased risks of both metastasis and mortality. Immunostaining evidence revealed that the identified multi-outcome interaction could effectively distinguish non-metastasis/survived patients from metastasis/deceased patients, which offers multi-prognostic outcome risk estimation for CRC. To our knowledge, this is the first report of a multi-outcome interaction associated with a complex multifactorial disease. MOAI is freely available at https://sites.google.com/view/moaitool/home.
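The idea of a Pareto set filter over multiple outcomes can be sketched in a few lines. This is a generic non-dominated filter, not the MOAI implementation: the candidate names, the two scores per candidate (one per outcome, higher is better), and the helper names are all hypothetical.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical interaction candidates scored on two outcomes.
candidates = {
    "pair_1": (0.8, 0.7),
    "pair_2": (0.6, 0.9),
    "pair_3": (0.5, 0.5),
    "pair_4": (0.8, 0.6),
}
front = pareto_front(candidates)
```

Candidates kept by the filter are those for which no alternative is better on every outcome at once, which is why the filter suits settings where outcomes cannot be merged into a single score.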
Affiliation(s)
- Yu-Da Lin
  - Department of Computer Science and Information Engineering, National Penghu University of Science and Technology, Magong, Penghu, 880011, Taiwan
- Yi-Chen Lee
  - Department of Anatomy, Kaohsiung Medical University, Taiwan
- Chih-Po Chiang
  - Division of Breast Oncology and Surgery, Department of Surgery, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung 80756, Taiwan
- Sin-Hua Moi
  - Center of Cancer Program Development, E-Da Cancer Hospital, I-Shou University, Kaohsiung 824, Taiwan
- Jung-Yu Kan
  - Division of Breast Oncology and Surgery, Department of Surgery, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung 80756, Taiwan
9
Dyson G. An Application of the Patient Rule-Induction Method to Detect Clinically Meaningful Subgroups from Failed Phase III Clinical Trials. INTERNATIONAL JOURNAL OF CLINICAL BIOSTATISTICS AND BIOMETRICS 2021; 7. [PMID: 34632463 PMCID: PMC8496893 DOI: 10.23937/2469-5831/1510038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
Background Phase III superiority clinical trials have negative results (new treatment is not statistically better than standard of care) due to a number of factors, including patient and disease heterogeneity. However, even a treatment regime that fails to show population-level clinical improvement will have a subgroup of patients that attain a measurable clinical benefit. Objective The goal of this paper is to modify the Patient Rule-Induction Method to identify statistically significant subgroups, defined by clinical and/or demographic factors, of the clinical trial population where the experimental treatment performs better than the standard of care and better than observed in the entire clinical trial sample. Results We illustrate this method using part A of the SUCCESS clinical trial, which showed no overall difference between treatment arms: HR (95% CI) = 0.97 (0.78, 1.20). Using PRIM, we identified one subgroup defined by the mutational profile in BRCA1 which resulted in a significant benefit for adding Gemcitabine to the standard treatment: HR (95% CI) = 0.59 (0.40, 0.87). Conclusion This result demonstrates that useful information can be extracted from existing databases that could provide insight into why a phase III trial failed and assist in the design of future clinical trials involving the experimental treatment.
Affiliation(s)
- Greg Dyson
  - Department of Oncology, Karmanos Cancer Institute, Wayne State University, Detroit MI, USA
10
MIDESP: Mutual Information-Based Detection of Epistatic SNP Pairs for Qualitative and Quantitative Phenotypes. BIOLOGY 2021; 10:biology10090921. [PMID: 34571798 PMCID: PMC8469369 DOI: 10.3390/biology10090921] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/02/2021] [Revised: 09/09/2021] [Accepted: 09/13/2021] [Indexed: 11/17/2022]
Abstract
Simple Summary The interactions between SNPs, which are known as epistasis, can strongly influence the phenotype. Their detection is still a challenge, which is made even more difficult through the existence of background associations that can hide correct epistatic interactions. To address the limitations of existing methods, we present in this study our novel method MIDESP for the detection of epistatic SNP pairs. It is the first mutual information-based method that can be applied to both qualitative and quantitative phenotypes and which explicitly accounts for background associations in the dataset. Abstract The interactions between SNPs result in a complex interplay with the phenotype, known as epistasis. The knowledge of epistasis is a crucial part of understanding genetic causes of complex traits. However, due to the enormous number of SNP pairs and their complex relationship to the phenotype, identification still remains a challenging problem. Many approaches for the detection of epistasis have been developed using mutual information (MI) as an association measure. However, these methods have mainly been restricted to case–control phenotypes and are therefore of limited applicability for quantitative traits. To overcome this limitation of MI-based methods, here, we present an MI-based novel algorithm, MIDESP, to detect epistasis between SNPs for qualitative as well as quantitative phenotypes. Moreover, by incorporating a dataset-dependent correction technique, we deal with the effect of background associations in a genotypic dataset to separate correct epistatic interaction signals from those of false positive interactions resulting from the effect of single SNP×phenotype associations. To demonstrate the effectiveness of MIDESP, we apply it on two real datasets with qualitative and quantitative phenotypes, respectively. 
Our results suggest that by eliminating the background associations, MIDESP can identify important genes, which play essential roles for bovine tuberculosis or the egg weight of chickens.
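Why mutual information (MI) is a natural association measure for epistatic SNP pairs can be shown with a small sketch. It uses the plain plug-in MI estimator on discrete data and a made-up four-sample dataset; it does not reproduce MIDESP's handling of quantitative phenotypes or its background-association correction, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of the mutual information (in bits) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def pair_mutual_information(snp1, snp2, phenotype):
    """MI between the joint genotype of a SNP pair and the phenotype."""
    return mutual_information(list(zip(snp1, snp2)), phenotype)


# Toy data (illustrative only): the phenotype is the XOR of two binary-coded SNPs.
snp1 = [0, 0, 1, 1]
snp2 = [0, 1, 0, 1]
phenotype = [0, 1, 1, 0]

single_mi = mutual_information(snp1, phenotype)
joint_mi = pair_mutual_information(snp1, snp2, phenotype)
```

In this toy case each SNP alone carries no information about the phenotype, while the pair determines it completely, which is the pure-epistasis situation that single-SNP scans cannot see.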
11
Okazaki A, Horpaopan S, Zhang Q, Randesi M, Ott J. Genotype Pattern Mining for Pairs of Interacting Variants Underlying Digenic Traits. Genes (Basel) 2021; 12:1160. [PMID: 34440333 PMCID: PMC8391494 DOI: 10.3390/genes12081160] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/24/2021] [Revised: 07/23/2021] [Accepted: 07/27/2021] [Indexed: 12/15/2022]
Abstract
Some genetic diseases ("digenic traits") are due to the interaction between two DNA variants, which presumably reflects biochemical interactions. For example, certain forms of Retinitis Pigmentosa, a type of blindness, occur in the presence of two mutant variants, one each in the ROM1 and RDS genes, while the occurrence of only one such variant results in a normal phenotype. Detecting variant pairs underlying digenic traits by standard genetic methods is difficult and is downright impossible when individual variants alone have minimal effects. Frequent pattern mining (FPM) methods are known to detect patterns of items. We make use of FPM approaches to find pairs of genotypes (from different variants) that can discriminate between cases and controls. Our method is based on genotype patterns of length two, and permutation testing allows assigning p-values to genotype patterns, where the null hypothesis refers to equal pattern frequencies in cases and controls. We compare different interaction search approaches and their properties on the basis of published datasets. Our implementation of FPM to case-control studies is freely available.
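The permutation test described in this abstract, with a null hypothesis of equal pattern frequencies in cases and controls, can be sketched as follows. This is a simplified illustration for a single genotype pattern, not the authors' software: the function names, the default of 2000 permutations, and the fixed seed are assumptions made for the example.

```python
import random


def frequency_difference(pattern_present, is_case):
    """Pattern frequency in cases minus pattern frequency in controls."""
    cases = [p for p, c in zip(pattern_present, is_case) if c]
    controls = [p for p, c in zip(pattern_present, is_case) if not c]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_p_value(pattern_present, is_case, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in pattern frequency."""
    rng = random.Random(seed)
    observed = abs(frequency_difference(pattern_present, is_case))
    labels = list(is_case)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # break any link between pattern and disease status
        permuted = abs(frequency_difference(pattern_present, labels))
        if permuted >= observed - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Toy data (illustrative only): the pattern occurs in every case and in no control.
is_case = [True] * 20 + [False] * 20
pattern_present = [True] * 20 + [False] * 20
p_value = permutation_p_value(pattern_present, is_case)
```

When many genotype patterns are tested, the per-pattern p-values would still need a correction for multiple testing, which this sketch leaves out.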
Affiliation(s)
- Atsuko Okazaki
  - Department of Diagnostics and Therapeutics of Intractable Diseases, Juntendo University, Bunkyo-ku, Tokyo 113-8421, Japan
  - Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10065, USA
- Sukanya Horpaopan
  - Department of Anatomy, Faculty of Medical Science, Naresuan University, Phitsanulok 65000, Thailand
- Qingrun Zhang
  - Department of Mathematics and Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada
- Matthew Randesi
  - Laboratory of the Biology of Addictive Diseases, Rockefeller University, New York, NY 10065, USA
- Jurg Ott
  - Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10065, USA
|
12
|
Yilmaz S, Tastan O, Cicek AE. SPADIS: An Algorithm for Selecting Predictive and Diverse SNPs in GWAS. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1208-1216. [PMID: 31443041 DOI: 10.1109/tcbb.2019.2935437] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]
Abstract
Phenotypic heritability of complex traits and diseases is seldom explained by individual genetic variants identified in genome-wide association studies (GWAS). Many methods have been developed to select a subset of variant loci, which are associated with or predictive of the phenotype. Selecting connected SNPs on SNP-SNP networks has proven successful in finding biologically interpretable and predictive SNPs. However, we argue that the connectedness constraint favors selecting redundant features that affect similar biological processes and therefore does not necessarily yield better predictive performance. In this paper, we propose a novel method called SPADIS that favors the selection of remotely located SNPs in order to account for their complementary effects in explaining a phenotype. SPADIS selects a diverse set of loci on a SNP-SNP network. This is achieved by maximizing a submodular set function with a greedy algorithm that ensures a constant factor approximation to the optimal solution. We compare SPADIS to the state-of-the-art method SConES on a dataset of Arabidopsis thaliana with continuous flowering time phenotypes. SPADIS has better average phenotype prediction performance in 15 out of 17 phenotypes when the same number of SNPs is selected and provides consistent improvements across multiple networks and settings on average. Moreover, it identifies more candidate genes and runs faster.
|
13
|
Wu Q, Nasoz F, Jung J, Bhattarai B, Han MV, Greenes RA, Saag KG. Machine learning approaches for the prediction of bone mineral density by using genomic and phenotypic data of 5130 older men. Sci Rep 2021; 11:4482. [PMID: 33627720 PMCID: PMC7904941 DOI: 10.1038/s41598-021-83828-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/12/2020] [Accepted: 02/09/2021] [Indexed: 02/07/2023]
Abstract
The study aimed to utilize machine learning (ML) approaches and genomic data to develop a prediction model for bone mineral density (BMD) and identify the best modeling approach for BMD prediction. The genomic and phenotypic data of the Osteoporotic Fractures in Men Study (n = 5130) were analyzed. A genetic risk score (GRS) was calculated from 1103 associated SNPs for each participant after a comprehensive genotype imputation. Data were normalized and divided into a training set (80%) and a validation set (20%) for analysis. Random forest, gradient boosting, neural network, and linear regression were used to develop BMD prediction models separately. Ten-fold cross-validation was used for hyperparameter optimization. Mean square error and mean absolute error were used to assess model performance. When using GRS and phenotypic covariates as the predictors, the ML models and linear regression performed similarly in BMD prediction. However, when replacing GRS with the 1103 individual SNPs in the model, ML models performed significantly better than linear regression (with lasso regularization), and the gradient boosting model performed the best. Our study suggested that ML models, especially gradient boosting, can improve BMD prediction in genomic data.
Affiliation(s)
- Qing Wu
  - Nevada Institute of Personalized Medicine, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV, 89154-4009, USA
  - Department of Epidemiology and Biostatistics, School of Public Health, University of Nevada, Las Vegas, NV, USA
- Fatma Nasoz
  - Department of Computer Science, University of Nevada, Las Vegas, NV, USA
  - The Lincy Institute, University of Nevada, Las Vegas, NV, USA
- Jongyun Jung
  - Nevada Institute of Personalized Medicine, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV, 89154-4009, USA
  - Department of Epidemiology and Biostatistics, School of Public Health, University of Nevada, Las Vegas, NV, USA
- Bibek Bhattarai
  - Department of Computer Science, University of Nevada, Las Vegas, NV, USA
- Mira V Han
  - Nevada Institute of Personalized Medicine, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV, 89154-4009, USA
  - School of Life Sciences, University of Nevada, Las Vegas, NV, USA
- Robert A Greenes
  - College of Health Solutions, Arizona State University, Phoenix, AZ, USA
  - Department of Health Science Research, Mayo Clinic, Scottsdale, AZ, USA
- Kenneth G Saag
  - Department of Medicine, Division of Clinical Immunology and Rheumatology, the University of Alabama at Birmingham, Birmingham, AL, USA
|
14
|
Guo X. JS-MA: A Jensen-Shannon Divergence Based Method for Mapping Genome-Wide Associations on Multiple Diseases. Front Genet 2020; 11:507038. [PMID: 33193597 PMCID: PMC7662082 DOI: 10.3389/fgene.2020.507038] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/24/2019] [Accepted: 09/21/2020] [Indexed: 12/14/2022]
Abstract
Taking advantage of the high-throughput genotyping technology of Single Nucleotide Polymorphism (SNP), Genome-Wide Association Studies (GWASs) have been successfully implemented for defining the relative role of genes and the environment in disease risk, assisting in enabling preventative and precision medicine. However, current multi-locus-based methods are insufficient in terms of computational cost and discrimination power to detect statistically significant interactions with different genetic effects on multifarious diseases. Statistical tests for multi-locus interactions (≥2 SNPs) raise huge analytical challenges because the computational cost increases exponentially with the number of SNPs in an interaction module. In this paper, we develop a simple, fast, and powerful method, named JS-MA, based on Jensen-Shannon divergence and agglomerative hierarchical clustering, to detect the genome-wide multi-locus interactions associated with multiple diseases. In systematic simulations, JS-MA is more powerful and efficient than state-of-the-art association mapping tools. JS-MA was applied to the real GWAS datasets for two common diseases, i.e., Rheumatoid Arthritis and Type 1 Diabetes. The results showed that JS-MA not only confirmed recently reported, biologically meaningful associations, but also identified novel multi-locus interactions. Therefore, we believe that JS-MA is suitable and efficient for a full-scale analysis of multi-disease-related interactions in large GWASs.
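For readers unfamiliar with the measure JS-MA builds on, the following is a minimal sketch of the Jensen-Shannon divergence between two discrete distributions, for instance genotype frequencies in two groups. It shows only the textbook divergence in bits. The abstract does not specify the JS-MA statistic or the clustering step, so neither is reproduced here, and the function names and toy frequencies are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: the mean KL divergence of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical genotype frequencies (AA, Aa, aa) in two groups.
group_1 = [0.50, 0.40, 0.10]
group_2 = [0.20, 0.50, 0.30]
print(f"JS divergence: {js_divergence(group_1, group_2):.4f} bits")
```

Unlike the Kullback-Leibler divergence, this quantity is symmetric and, with base-2 logarithms, bounded between 0 and 1, which makes values comparable across SNP combinations.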
Affiliation(s)
- Xuan Guo
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX, United States
|
15
|
Zhou X, Chan KCC, Huang Z, Wang J. Determining dependency and redundancy for identifying gene-gene interaction associated with complex disease. J Bioinform Comput Biol 2020; 18:2050035. [PMID: 33064052 DOI: 10.1142/s0219720020500353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
As interactions among genetic variants in different genes can be an important factor for predicting complex diseases, many computational methods have been proposed to detect if a particular set of genes has interaction with a particular complex disease. However, even though many such methods have been shown to be useful, they can be made more effective if the properties of gene-gene interactions can be better understood. Towards this goal, we have attempted to uncover patterns in gene-gene interactions and the patterns reveal an interesting property that can be reflected in an inequality that describes the relationship between two genotype variables and a disease-status variable. We show, in this paper, that this inequality can be generalized to [Formula: see text] genotype variables. Based on this inequality, we establish a conditional independence and redundancy (CIR)-based definition of gene-gene interaction and the concept of an interaction group. From these new definitions, a novel measure of gene-gene interaction is then derived. We discuss the properties of these concepts and explain how they can be used in a novel algorithm to detect high-order gene-gene interactions. Experimental results using both simulated and real datasets show that the proposed method can be very promising.
Affiliation(s)
- Xiangdong Zhou
  - College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350108, P. R. China
- Keith C C Chan
  - Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong, P. R. China
- Zhihua Huang
  - College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350108, P. R. China
- Jingbin Wang
  - College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350108, P. R. China
|
16
|
Wen J, Ford CT, Janies D, Shi X. A parallelized strategy for epistasis analysis based on Empirical Bayesian Elastic Net models. Bioinformatics 2020; 36:3803-3810. [PMID: 32227194 DOI: 10.1093/bioinformatics/btaa216] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/19/2019] [Revised: 03/05/2020] [Accepted: 03/26/2020] [Indexed: 11/14/2022]
Abstract
MOTIVATION Epistasis reflects the distortion on a particular trait or phenotype resulting from the combinatorial effect of two or more genes or genetic variants. Epistasis is an important genetic foundation underlying quantitative traits in many organisms as well as in complex human diseases. However, there are two major barriers in identifying epistasis using large genomic datasets. One is that epistasis analysis will induce over-fitting of an over-saturated model with the high-dimensionality of a genomic dataset. Therefore, the problem of identifying epistasis demands efficient statistical methods. The second barrier comes from the intensive computing time for epistasis analysis, even when the appropriate model and data are specified. RESULTS In this study, we combine statistical techniques and computational techniques to scale up epistasis analysis using Empirical Bayesian Elastic Net (EBEN) models. Specifically, we first apply a matrix manipulation strategy for pre-computing the correlation matrix and pre-filter to narrow down the search space for epistasis analysis. We then develop a parallelized approach to further accelerate the modeling process. Our experiments on synthetic and empirical genomic data demonstrate that our parallelized methods offer tens of fold speed up in comparison with the classical EBEN method which runs in a sequential manner. We applied our parallelized approach to a yeast dataset, and we were able to identify both main and epistatic effects of genetic variants associated with traits such as fitness. AVAILABILITY AND IMPLEMENTATION The software is available at github.com/shilab/parEBEN.
Affiliation(s)
- Jia Wen
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Colby T Ford
  - Department of Bioinformatics and Genomics, College of Computing and Informatics
  - School of Data Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Daniel Janies
  - Department of Bioinformatics and Genomics, College of Computing and Informatics
- Xinghua Shi
  - Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
|
17
|
Testing the Significance of Interactions in Genetic Studies Using Interaction Information and Resampling Technique. Lecture Notes in Computer Science 2020. [PMCID: PMC7304020 DOI: 10.1007/978-3-030-50420-5_38] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
Interaction information is a model-free, non-parametric measure used for detection of interaction among variables. It frequently finds interactions which remain undetected by standard model-based methods. However, in previous studies the application of interaction information was limited by a lack of appropriate statistical tests. We study the challenging problem of testing the positiveness of interaction information, which makes it possible to confirm the statistical significance of the investigated interactions. It turns out that the commonly used chi-squared test detects too many spurious interactions when the dependence between the variables (e.g. between two genetic markers) is strong. To overcome this problem we consider a permutation test and also propose a novel HYBRID method that combines permutation and chi-squared tests and takes into account dependence between the studied variables. We show in numerical experiments that, in contrast to the chi-squared based test, the proposed method controls the actual significance level well and in many situations detects interactions which are undetected by standard methods. Moreover, the HYBRID method outperforms the permutation test with respect to power and computational efficiency. The method is applied to find interactions among Single Nucleotide Polymorphisms as well as among gene expression levels of human immune cells.
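As a concrete illustration of the measure discussed in this abstract, the sketch below computes a plug-in estimate of interaction information for two discrete markers and a discrete outcome, using the convention II(X1; X2; Y) = I((X1, X2); Y) - I(X1; Y) - I(X2; Y), under which a positive value indicates synergy. This sign convention, the function names, and the toy data are assumptions made for illustration; the significance tests studied in the paper (chi-squared, permutation, HYBRID) are not reproduced.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired samples of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # p(x, y) * log2(p(x, y) / (p(x) * p(y))) written with raw counts.
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def interaction_information(x1, x2, y):
    """II(X1; X2; Y) = I((X1, X2); Y) - I(X1; Y) - I(X2; Y)."""
    pair = list(zip(x1, x2))
    return (mutual_information(pair, y)
            - mutual_information(x1, y)
            - mutual_information(x2, y))


# Toy XOR example: neither marker alone is informative, but the pair determines y.
marker_1 = [0, 0, 1, 1]
marker_2 = [0, 1, 0, 1]
y_xor = [a ^ b for a, b in zip(marker_1, marker_2)]
y_main = list(marker_1)  # outcome driven by marker_1 alone, no interaction

print("XOR outcome:        ", interaction_information(marker_1, marker_2, y_xor))
print("Main-effect outcome:", interaction_information(marker_1, marker_2, y_main))
```

In the XOR case each marker has zero marginal mutual information with the outcome while the pair carries one full bit, so the estimate is positive; when the outcome depends on a single marker the estimate is zero. On real data the plug-in estimate is biased upward for small samples, which is one reason a formal test of positiveness is needed.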
|
18
|
Application of simulation-based CYP26 SNP-environment barcodes for evaluating the occurrence of oral malignant disorders by odds ratio-based binary particle swarm optimization: A case-control study in the Taiwanese population. PLoS One 2019; 14:e0220719. [PMID: 31465460 PMCID: PMC6715230 DOI: 10.1371/journal.pone.0220719] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/15/2019] [Accepted: 07/22/2019] [Indexed: 12/15/2022]
Abstract
Introduction Genetic polymorphisms and social factors (alcohol consumption, betel quid (BQ) usage, and cigarette consumption), both separately or jointly, play a crucial role in the occurrence of oral malignant disorders such as oral and pharyngeal cancers and oral potentially malignant disorders (OPMD). Material and methods Simultaneous analyses of multiple single nucleotide polymorphisms (SNPs) and environmental effects on oral malignant disorders are essential to examine, albeit challenging. Thus, we conducted a case-control study (N = 576) to analyze the risk of occurrence of oral malignant disorders by using binary particle swarm optimization (BPSO) with an odds ratio (OR)-based method. Results We demonstrated that a combination of SNPs (CYP26B1 rs887844 and CYP26C1 rs12256889) and socio-demographic factors (age, ethnicity, and BQ chewing), referred to as the combined effects of SNP-environment, correlated with maximal risk diversity of occurrence observed between the oral malignant disorder group and the control group. The risks were more prominent in the oral and pharyngeal cancers group (OR = 10.30; 95% confidence interval (CI) = 4.58–23.15) than in the OPMD group (OR = 5.42; 95% CI = 1.94–15.12). Conclusions Simulation-based “SNP-environment barcodes” may be used to predict the risk of occurrence of oral malignant disorders. Applying simulation-based “SNP-environment barcodes” may provide insight into the importance of screening tests in preventing oral and pharyngeal cancers and OPMD.
|
19
|
Lawania S, Singh A, Sharma S, Singh N, Behera D. The multi-faceted high order polymorphic synergistic interactions among nucleotide excision repair genes increase the risk of lung cancer in North Indians. Mutat Res 2019; 816-818:111673. [PMID: 31195348 DOI: 10.1016/j.mrfmmm.2019.111673] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/08/2019] [Revised: 05/08/2019] [Accepted: 06/04/2019] [Indexed: 11/25/2022]
Abstract
It is evident that gene-gene interactions are pervasive in the determination of the susceptibility of human diseases. Polymorphisms in nucleotide excision repair pathway (NER) genes can cause variations in the repair capacity and therefore might lead to an increase in susceptibility towards lung cancer through complex gene-gene and gene-smoking interactions. Logistic regression analysis was performed, and high-order genetic interactions were analyzed using data mining tools such as multifactor dimensionality reduction (MDR) and classification and regression tree analysis (CART). Overall, a protective effect was reported when a combinatorial effect of SNPs was studied by applying logistic regression analysis. MDR analysis revealed that the four-factor model, i.e., XPC K939Q, XPA 5'UTR, XPG F670W and XPG D1104H, had the best ability to predict lung cancer risk (CVC = 100, p < 0.0001). A two-factor model including smoking and XPG F670W suggested that smoking was associated with the risk of developing lung cancer (CVC = 100, p < 0.0001). Individually, XPG F670W was identified as the primary risk factor. In CART analysis, we observed a 6-fold risk for SCLC patients carrying XPA 5'UTR (M), XPD K751Q (W) (OR: 6.20; 95%CI: 2.40-16.01, p = 0.0001). Polymorphic NER genes might jointly modulate lung cancer risk through gene-gene and gene-smoking interaction.
Affiliation(s)
- Shweta Lawania
  - Department of Biotechnology, Thapar University, Punjab, 147002, India
- Amrita Singh
  - Department of Biotechnology, Thapar University, Punjab, 147002, India
- Siddharth Sharma
  - Department of Biotechnology, Thapar University, Punjab, 147002, India
- Navneet Singh
  - Department of Pulmonary Medicine, Post Graduate Institute of Medical Education & Research (PGIMER), Sector 14, Chandigarh, India
- Digamber Behera
  - Department of Pulmonary Medicine, Post Graduate Institute of Medical Education & Research (PGIMER), Sector 14, Chandigarh, India
|
20
|
Guan B, Zhao Y, Sun W. Ant colony optimization with an automatic adjustment mechanism for detecting epistatic interactions. Comput Biol Chem 2018; 77:354-362. [DOI: 10.1016/j.compbiolchem.2018.11.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/26/2018] [Revised: 10/01/2018] [Accepted: 11/05/2018] [Indexed: 12/13/2022]
|
21
|
Hou TT, Lin F, Bai S, Cleves MA, Xu HM, Lou XY. Generalized multifactor dimensionality reduction approaches to identification of genetic interactions underlying ordinal traits. Genet Epidemiol 2018; 43:24-36. [PMID: 30387901 DOI: 10.1002/gepi.22169] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/18/2018] [Revised: 08/31/2018] [Accepted: 09/21/2018] [Indexed: 12/11/2022]
Abstract
The manifestation of complex traits is influenced by gene-gene and gene-environment interactions, and the identification of multifactor interactions is an important but challenging undertaking for genetic studies. Many complex phenotypes such as disease severity are measured on an ordinal scale with more than two categories. A proportional odds model can improve statistical power for these outcomes, when compared to a logit model either collapsing the categories into two mutually exclusive groups or limiting the analysis to pairs of categories. In this study, we propose a proportional odds model-based generalized multifactor dimensionality reduction (GMDR) method for detection of interactions underlying polytomous ordinal phenotypes. Computer simulations demonstrated that this new GMDR method has a higher power and more accurate predictive ability than the GMDR methods based on a logit model and a multinomial logit model. We applied this new method to the genetic analysis of low-density lipoprotein (LDL) cholesterol, a causal risk factor for coronary artery disease, in the Multi-Ethnic Study of Atherosclerosis, and identified a significant joint action of the CELSR2, SERPINA12, HPGD, and APOB genes. This finding provides new information to advance the limited knowledge about genetic regulation and gene interactions in metabolic pathways of LDL cholesterol. In conclusion, the proportional odds model-based GMDR is a useful tool that can boost statistical power and prediction accuracy in studying multifactor interactions underlying ordinal traits.
Affiliation(s)
- Ting-Ting Hou
  - Biostatistics Program, Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas
  - Arkansas Children's Research Institute, Little Rock, Arkansas
  - Institute of Bioinformatics and Institute of Crop Science, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Feng Lin
  - Institute of Bioinformatics and Institute of Crop Science, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Shasha Bai
  - Biostatistics Program, Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas
  - Arkansas Children's Research Institute, Little Rock, Arkansas
- Mario A Cleves
  - Biostatistics Program, Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas
  - Arkansas Children's Research Institute, Little Rock, Arkansas
- Hai-Ming Xu
  - Biostatistics Program, Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas
  - Arkansas Children's Research Institute, Little Rock, Arkansas
  - Institute of Bioinformatics and Institute of Crop Science, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Xiang-Yang Lou
  - Biostatistics Program, Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas
  - Arkansas Children's Research Institute, Little Rock, Arkansas
  - Arkansas Children's Nutrition Center, Little Rock, Arkansas
|
22
|
Zhou X, Chan KCC. Detecting gene-gene interactions for complex quantitative traits using generalized fuzzy classification. BMC Bioinformatics 2018; 19:329. [PMID: 30227829 PMCID: PMC6145205 DOI: 10.1186/s12859-018-2361-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/27/2017] [Accepted: 09/09/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Quantitative traits or continuous outcomes related to complex diseases can provide more information and therefore more accurate analysis for identifying gene-gene and gene-environment interactions associated with complex diseases. Multifactor Dimensionality Reduction (MDR) was originally proposed to identify gene-gene and gene-environment interactions associated with the binary status of complex diseases. Some efforts have been made to extend it to quantitative traits (QTs) and ordinal traits. However, these and other methods are still not computationally efficient or effective. RESULTS Generalized Fuzzy Quantitative trait MDR (GFQMDR) is proposed in this paper to strengthen identification of gene-gene interactions associated with a quantitative trait by first transforming it to an ordinal trait and then selecting the best sets of genetic markers, mainly single nucleotide polymorphisms (SNPs) or simple sequence length polymorphic markers (SSLPs), as having strong association with the trait through generalized fuzzy classification using extended member functions. Experimental results on simulated datasets and real datasets show that our algorithm has a better success rate, classification accuracy and consistency in identifying gene-gene interactions associated with QTs. CONCLUSION The proposed algorithm provides a more effective way to identify gene-gene interactions associated with quantitative traits.
Affiliation(s)
- Xiangdong Zhou
  - College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian, China
- Keith C. C. Chan
  - Department of Computing, the Hong Kong Polytechnic University, Kowloon, Hong Kong, China
|
23
|
Cole BS, Hall MA, Urbanowicz RJ, Gilbert‐Diamond D, Moore JH. Analysis of Gene‐Gene Interactions. Curr Protoc Hum Genet 2018; 95:1.14.1-1.14.10. [DOI: 10.1002/cphg.45] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/12/2022]
Affiliation(s)
- Brian S. Cole
  - Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania Philadelphia Pennsylvania
- Molly A. Hall
  - Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania Philadelphia Pennsylvania
  - The Center for Systems Genomics, The Pennsylvania State University, University Park Pennsylvania
- Ryan J. Urbanowicz
  - Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania Philadelphia Pennsylvania
- Diane Gilbert‐Diamond
  - Institute for Quantitative Biomedical Sciences at Dartmouth Hanover New Hampshire
  - Department of Epidemiology, Geisel School of Medicine at Dartmouth Hanover New Hampshire
- Jason H. Moore
  - Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania Philadelphia Pennsylvania
|
24
|
Xu Y, Wu Y, Wu J. Capturing pair-wise epistatic effects associated with three agronomic traits in barley. Genetica 2018; 146:161-170. [PMID: 29349538 DOI: 10.1007/s10709-018-0008-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/10/2017] [Accepted: 01/11/2018] [Indexed: 11/25/2022]
Abstract
Genetic association mapping has been widely applied to determine genetic markers favorably associated with a trait of interest and provide information for marker-assisted selection. Many association mapping studies focus on main effects because of the prohibitive computational cost. This study aims to select several sets of DNA markers with potential epistasis to maximize genetic variations of some key agronomic traits in barley. To do so, we integrated an MDR (multifactor dimensionality reduction) method with a forward variable selection approach. This integrated approach was used to determine single nucleotide polymorphism pairs with epistasis effects associated with three agronomic traits: heading date, plant height, and grain yield in barley from the barley Coordinated Agricultural Project. Our results showed that four, seven, and five SNP pairs accounted for 51.06, 45.66, and 40.42% for heading date, plant height, and grain yield, respectively, with epistasis being considered, while the corresponding contributions to these three traits were 45.32, 31.39, and 31.31%, respectively, without epistasis being included. The results suggest that the epistasis model was more effective than the non-epistasis model in this study and may be preferable for other applications.
Affiliation(s)
- Yi Xu
  - Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Box 2140C, Brookings, SD, 57007, USA
- Yajun Wu
  - Department of Biology and Microbiology, South Dakota State University, Brookings, SD, 57007, USA
- Jixiang Wu
  - Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Box 2140C, Brookings, SD, 57007, USA
|
25
|
Mielniczuk J, Teisseyre P. A deeper look at two concepts of measuring gene-gene interactions: logistic regression and interaction information revisited. Genet Epidemiol 2017; 42:187-200. [PMID: 29265411 DOI: 10.1002/gepi.22108] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/11/2017] [Revised: 10/23/2017] [Accepted: 11/15/2017] [Indexed: 11/09/2022]
Abstract
Detection of gene-gene interactions is one of the most important challenges in genome-wide case-control studies. Besides traditional logistic regression analysis, entropy-based methods have recently attracted significant attention. Among entropy-based methods, interaction information is one of the most promising measures, having many desirable properties. Although both logistic regression and interaction information have been used in several genome-wide association studies, the relationship between them has not been thoroughly investigated theoretically. The present paper attempts to fill this gap. We show that although certain connections between the two methods exist, in general they refer to two different concepts of dependence, and looking for interactions in those two senses leads to different approaches to interaction detection. We introduce an ordering between interaction measures and specify conditions for independent and dependent genes under which interaction information is a more discriminative measure than logistic regression. Moreover, we show that for so-called perfect distributions those measures are equivalent. The numerical experiments illustrate the theoretical findings, indicating that interaction information and its modified version are more universal tools for detecting various types of interaction than logistic regression and linkage disequilibrium measures.
Affiliation(s)
- Jan Mielniczuk
  - Institute of Computer Science, Polish Academy of Sciences, Poland
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Poland
- Paweł Teisseyre
  - Institute of Computer Science, Polish Academy of Sciences, Poland
|
26
|
Hall MA, Moore JH, Ritchie MD. Embracing Complex Associations in Common Traits: Critical Considerations for Precision Medicine. Trends Genet 2017; 32:470-484. [PMID: 27392675 DOI: 10.1016/j.tig.2016.06.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 04/04/2016] [Revised: 06/01/2016] [Accepted: 06/02/2016] [Indexed: 10/21/2022]
Abstract
Genome-wide association studies (GWAS) have identified numerous loci associated with human phenotypes. This approach, however, does not consider the richly diverse and complex environment with which humans interact throughout the life course, nor does it allow for interrelationships between genetic loci and across traits. As we move toward making precision medicine a reality, whereby we make predictions about disease risk based on genomic profiles, we need to identify improved predictive models of the relationship between genome and phenome. Methods that embrace pleiotropy (the effect of one locus on more than one trait), and gene-environment (G×E) and gene-gene (G×G) interactions, will further unveil the impact of alterations in biological pathways and identify genes that are only involved with disease in the context of the environment. This valuable information can be used to assess personal risk and choose the most appropriate medical interventions based on the genotype and environment of an individual, the whole premise of precision medicine.
Affiliation(s)
- Molly A Hall
- Institute for Biomedical Informatics, Departments of Genetics and Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, 3535 Market Street, Philadelphia, PA 19104, USA
| | - Jason H Moore
- Institute for Biomedical Informatics, Departments of Genetics and Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, 3535 Market Street, Philadelphia, PA 19104, USA
| | - Marylyn D Ritchie
- Biomedical and Translational Informatics, Geisinger Health System, Danville, PA, USA; Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Pennsylvania State University, University Park, PA, USA.
| |
|
27
|
Wen J, Quitadamo A, Hall B, Shi X. Epistasis analysis of microRNAs on pathological stages in colon cancer based on an Empirical Bayesian Elastic Net method. BMC Genomics 2017. [PMID: 29513198 PMCID: PMC5657052 DOI: 10.1186/s12864-017-4130-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 02/06/2023] Open
Abstract
Background Colon cancer is a leading cause of worldwide cancer death. It has become clear that microRNAs (miRNAs) play a role in the progress of colon cancer, and understanding the effect of miRNAs on tumorigenesis could lead to better prognosis and improved treatment. However, most studies have focused on differentially expressed miRNAs between tumor and non-tumor samples or between stages in tumor tissue. Limited work has been conducted on the interactions, or epistasis, between miRNAs and on how epistasis affects tumor progression. In this study, we investigate the main and pair-wise epistatic effects of miRNAs on the pathological stages of colon cancer using datasets from The Cancer Genome Atlas. Results We develop a workflow composed of multiple steps for feature selection based on the Empirical Bayesian Elastic Net (EBEN) method. First, we identify the main effects using a model with only main effects on the phenotype. Second, a corrected phenotype is calculated by removing the significant main effects from the original phenotype. Third, we select features with epistatic effects on the corrected phenotype. Finally, we run the full model with main and epistatic effects on the previously selected main and epistatic features. Using the multi-step workflow, we identify a set of miRNAs with main and epistatic effects on the pathological stages of colon cancer. Many of the miRNAs with main effects on colon cancer have been previously reported to be associated with colon cancer, and the majority of the epistatic miRNAs share common target genes that could explain their epistatic effect on the pathological stages of colon cancer. We also find that many of the target genes of the detected miRNAs are associated with colon cancer. Gene Ontology enrichment analysis of the experimentally validated targets of main and epistatic miRNAs shows that these target genes are enriched for biological processes associated with cancer progression.
Conclusion Our results provide a set of candidate miRNAs associated with colon cancer progression that could have potential translational and therapeutic utility. Our analysis workflow offers a new opportunity to efficiently explore epistatic interactions among genetic and epigenetic factors that could be associated with human diseases. Furthermore, our workflow is flexible and can be applied to analyze the main and epistatic effect of various genetic and epigenetic factors on a wide range of phenotypes.
Affiliation(s)
- Jia Wen
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
| | - Andrew Quitadamo
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
| | - Benika Hall
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
| | - Xinghua Shi
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA.
| |
|
28
|
Kim G, Lai CQ, Arnett DK, Parnell LD, Ordovas JM, Kim Y, Kim J. Detection of gene-environment interactions in a family-based population using SCAD. Stat Med 2017; 36:3547-3559. [PMID: 28707299 DOI: 10.1002/sim.7382] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/04/2016] [Revised: 05/19/2017] [Accepted: 06/02/2017] [Indexed: 11/07/2022]
Abstract
Gene-environment interaction (GxE) is emphasized as one potential source of missing genetic variation on disease traits, and the ultimate goal of GxE research is prediction of individual risk and prevention of complex diseases. However, there are various challenges in statistical analysis of GxE. In this paper, we focus on three methodological challenges: (i) the high dimensions of genes; (ii) the hierarchical structure between interaction effects and their corresponding main effects; and (iii) the correlation among subjects from family-based population studies. We propose an algorithm that addresses all three challenges simultaneously. This is the first penalized method focusing on an interaction search based on a linear mixed effect model. For verification, we compare the empirical performance of our new method with other existing methods in a simulation study. The results demonstrate the superiority of our method across the simulation setups. In particular, its advantage grows as the correlation among subjects increases. In addition, the new method provides a robust estimate for the correlation among subjects. We also apply the new method to the Genetics of Lipid Lowering Drugs and Diet Network study data. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- Gwangsu Kim
- Data Science for Knowledge Creation Research Center, Seoul National University, Seoul, Korea
| | - Chao-Qiang Lai
- Jean Mayer US Department of Agriculture Human Nutrition Research Center on Aging, Tufts University, Boston, MA, U.S.A
| | - Donna K Arnett
- University of Kentucky College of Public Health, Lexington, KY, U.S.A
| | - Laurence D Parnell
- Jean Mayer US Department of Agriculture Human Nutrition Research Center on Aging, Tufts University, Boston, MA, U.S.A
| | - Jose M Ordovas
- Jean Mayer US Department of Agriculture Human Nutrition Research Center on Aging, Tufts University, Boston, MA, U.S.A.,Department of Epidemiology and Population Genetics, Centro Nacional Investigacion Cardiovasculares (CNIC) Madrid, Madrid, Spain
| | - Yongdai Kim
- Department of Statistics, Seoul National University, Seoul, Korea
| | - Joungyoun Kim
- Department of Information Statistics, Chungbuk National University, Cheongju, Chungbuk, Korea
| |
|
29
|
Moore JH, Andrews PC, Olson RS, Carlson SE, Larock CR, Bulhoes MJ, O'Connor JP, Greytak EM, Armentrout SL. Grid-based stochastic search for hierarchical gene-gene interactions in population-based genetic studies of common human diseases. BioData Min 2017; 10:19. [PMID: 28572842 PMCID: PMC5450417 DOI: 10.1186/s13040-017-0139-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 12/15/2016] [Accepted: 05/18/2017] [Indexed: 11/18/2022] Open
Abstract
Background Large-scale genetic studies of common human diseases have focused almost exclusively on the independent main effects of single-nucleotide polymorphisms (SNPs) on disease susceptibility. These studies have had some success, but much of the genetic architecture of common disease remains unexplained. Attention is now turning to detecting SNPs that impact disease susceptibility in the context of other genetic factors and environmental exposures. These context-dependent genetic effects can manifest themselves as non-additive interactions, which are more challenging to model using parametric statistical approaches. The dimensionality that results from a multitude of genotype combinations, which results from considering many SNPs simultaneously, renders these approaches underpowered. We previously developed the multifactor dimensionality reduction (MDR) approach as a nonparametric and genetic model-free machine learning alternative. Approaches such as MDR can improve the power to detect gene-gene interactions but are limited in their ability to exhaustively consider SNP combinations in genome-wide association studies (GWAS), due to the combinatorial explosion of the search space. We introduce here a stochastic search algorithm called Crush for the application of MDR to modeling high-order gene-gene interactions in genome-wide data. The Crush-MDR approach uses expert knowledge to guide probabilistic searches within a framework that capitalizes on the use of biological knowledge to filter gene sets prior to analysis. Here we evaluated the ability of Crush-MDR to detect hierarchical sets of interacting SNPs using a biology-based simulation strategy that assumes non-additive interactions within genes and additivity in genetic effects between sets of genes within a biochemical pathway. 
Results We show that Crush-MDR is able to identify genetic effects at the gene or pathway level significantly better than a baseline random search with the same number of model evaluations. We then applied the same methodology to a GWAS for Alzheimer’s disease and showed base level validation that Crush-MDR was able to identify a set of interacting genes with biological ties to Alzheimer’s disease. Conclusions We discuss the role of stochastic search and cloud computing for detecting complex genetic effects in genome-wide data.
Affiliation(s)
- Jason H Moore
- Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, 19104 PA USA
| | - Peter C Andrews
- Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, 19104 PA USA
| | - Randal S Olson
- Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, 19104 PA USA
| |
|
30
|
|
31
|
Guo X, Zhang J, Cai Z, Du DZ, Pan Y. Searching Genome-Wide Multi-Locus Associations for Multiple Diseases Based on Bayesian Inference. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:600-610. [PMID: 26887006 DOI: 10.1109/tcbb.2016.2527648] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/05/2023]
Abstract
Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) have been considered to hold promise for unraveling complex relationships between genotypes and phenotypes. Current multi-locus-based methods are insufficient to detect interactions with diverse genetic effects on multifarious diseases. Also, statistical tests for high-order epistasis (≥ 2 SNPs) raise huge computational and analytical challenges because the computation increases exponentially with the cardinality of the SNP combinations. In this paper, we provide a simple, fast and powerful method, named DAM, using Bayesian inference to detect genome-wide multi-locus epistatic interactions in multiple diseases. Experimental results on simulated data demonstrate that our method is powerful and efficient. We also apply DAM to two GWAS datasets from WTCCC, i.e., Rheumatoid Arthritis and Type 1 Diabetes, and identify some novel findings. Therefore, we believe that our method is suitable and efficient for the full-scale analysis of multi-disease-related interactions in GWASs.
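The computational burden mentioned in the abstract above can be illustrated by counting how many SNP combinations an exhaustive search would have to test. The sketch below is only an illustration: the panel size of 500,000 SNPs is an assumed, hypothetical figure, not one taken from the paper.

```python
from math import comb


def n_combinations(n_snps, order):
    """Number of distinct SNP sets of the given interaction order."""
    return comb(n_snps, order)


n_snps = 500_000  # hypothetical genome-wide panel size (assumption)
for order in (2, 3, 4):
    print(f"order {order}: {n_combinations(n_snps, order):.3e} combinations")
```

Each additional interaction order multiplies the number of candidate sets by a factor on the order of the panel size, which is why exhaustive testing beyond pairs quickly becomes impractical.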
|
32
|
Abstract
BACKGROUND Detection of gene-gene interaction (GGI) is a key challenge towards solving the problem of missing heritability in genetics. The multifactor dimensionality reduction (MDR) method has been widely studied for detecting GGIs. MDR reduces the dimensionality of multi-factor data by means of binary classification into high-risk (H) or low-risk (L) groups. Unfortunately, this simple binary classification does not reflect the uncertainty of H/L classification. Thus, we proposed Fuzzy MDR to overcome the limitations of binary classification by introducing the degree of membership of two fuzzy sets H/L. While Fuzzy MDR demonstrated higher power than MDR, its performance is highly dependent on several tuning parameters. In real applications, it is not easy to choose appropriate tuning parameter values. RESULT In this work, we propose an empirical fuzzy MDR (EF-MDR) which does not require specifying tuning parameter values. Here, we propose an empirical approach in which the membership degree is estimated directly from the data. In EF-MDR, the membership degree is estimated by the maximum likelihood estimator of the proportion of cases (controls) in each genotype combination. We also show that the balanced accuracy measure derived from this new membership function is a linear function of the standard chi-square statistic. This relationship allows us to perform the standard significance test using p-values in the MDR framework without permutation. Through two simulation studies, the power of the proposed EF-MDR is shown to be higher than those of MDR and Fuzzy MDR. We illustrate the proposed EF-MDR by analyzing Crohn's disease (CD) and bipolar disorder (BD) in the Wellcome Trust Case Control Consortium (WTCCC) dataset. CONCLUSION We propose an empirical Fuzzy MDR for detecting GGI using the maximum likelihood estimate of the proportion of cases (controls) as the membership degree of the genotype combination.
The program written in R for EF-MDR is available at http://statgen.snu.ac.kr/software/EF-MDR.
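The binary high-risk/low-risk step of classic MDR described in the abstract above can be sketched as follows. This is an illustrative toy, not the EF-MDR program: the cell counts are hypothetical, the function names are made up for this example, and the empirical membership shown is simply the per-cell proportion of cases. The fuzzy balanced accuracy and the chi-square relationship from the paper are not reproduced here.

```python
from collections import defaultdict


def mdr_cells(genotypes, labels):
    """Count [controls, cases] for each multi-locus genotype combination."""
    counts = defaultdict(lambda: [0, 0])
    for combo, label in zip(genotypes, labels):
        counts[tuple(combo)][label] += 1  # label 0 = control, 1 = case
    return dict(counts)


def mdr_high_risk(counts, threshold):
    """A cell is high-risk (H) if its case:control ratio exceeds the threshold."""
    return {combo: cases > threshold * controls
            for combo, (controls, cases) in counts.items()}


def balanced_accuracy(counts, high_risk):
    """Mean of sensitivity and specificity when H cells predict 'case'."""
    n_cases = sum(cases for _, cases in counts.values())
    n_controls = sum(controls for controls, _ in counts.values())
    tp = sum(cases for combo, (_, cases) in counts.items() if high_risk[combo])
    tn = sum(controls for combo, (controls, _) in counts.items()
             if not high_risk[combo])
    return 0.5 * (tp / n_cases + tn / n_controls)


# Hypothetical two-locus data: (controls, cases) per genotype combination.
cell_counts = {(0, 0): (8, 2), (0, 1): (2, 8), (1, 0): (3, 7), (1, 1): (9, 1)}
genotypes, labels = [], []
for combo, (n_controls, n_cases) in cell_counts.items():
    genotypes += [combo] * (n_controls + n_cases)
    labels += [0] * n_controls + [1] * n_cases

counts = mdr_cells(genotypes, labels)
threshold = sum(labels) / (len(labels) - sum(labels))  # overall case:control ratio
high_risk = mdr_high_risk(counts, threshold)
ba = balanced_accuracy(counts, high_risk)

# Empirical membership degree of "high risk": proportion of cases in each cell.
membership = {combo: cases / (controls + cases)
              for combo, (controls, cases) in counts.items()}

print(high_risk)
print(f"balanced accuracy = {ba:.3f}")
print(membership)
```

The hard H/L labels discard how strongly each cell leans toward cases, while the membership values keep that information, which is the uncertainty the fuzzy variants aim to capture.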
Affiliation(s)
- Sangseob Leem
- Department of Statistics, Seoul National University, Seoul, 08826 South Korea
| | - Taesung Park
- Department of Statistics, Seoul National University, Seoul, 08826 South Korea
| |
|
33
|
Lin H, Mueller-Nurasyid M, Smith AV, Arking DE, Barnard J, Bartz TM, Lunetta KL, Lohman K, Kleber ME, Lubitz SA, Geelhoed B, Trompet S, Niemeijer MN, Kacprowski T, Chasman DI, Klarin D, Sinner MF, Waldenberger M, Meitinger T, Harris TB, Launer LJ, Soliman EZ, Chen LY, Smith JD, Van Wagoner DR, Rotter JI, Psaty BM, Xie Z, Hendricks AE, Ding J, Delgado GE, Verweij N, van der Harst P, Macfarlane PW, Ford I, Hofman A, Uitterlinden A, Heeringa J, Franco OH, Kors JA, Weiss S, Völzke H, Rose LM, Natarajan P, Kathiresan S, Kääb S, Gudnason V, Alonso A, Chung MK, Heckbert SR, Benjamin EJ, Liu Y, März W, Rienstra M, Jukema JW, Stricker BH, Dörr M, Albert CM, Ellinor PT. Gene-gene Interaction Analyses for Atrial Fibrillation. Sci Rep 2016; 6:35371. [PMID: 27824142 PMCID: PMC5099695 DOI: 10.1038/srep35371] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 03/17/2016] [Accepted: 09/28/2016] [Indexed: 11/29/2022] Open
Abstract
Atrial fibrillation (AF) is a heritable disease that affects more than thirty million individuals worldwide. Extensive efforts have been devoted to the study of genetic determinants of AF. The objective of our study is to examine the effect of gene-gene interaction on AF susceptibility. We performed a large-scale association analysis of gene-gene interactions with AF in 8,173 AF cases, and 65,237 AF-free referents collected from 15 studies for discovery. We examined putative interactions between genome-wide SNPs and 17 known AF-related SNPs. The top interactions were then tested for association in an independent cohort for replication, which included more than 2,363 AF cases and 114,746 AF-free referents. One interaction, between rs7164883 at the HCN4 locus and rs4980345 at the SLC28A1 locus, was found to be significantly associated with AF in the discovery cohorts (interaction OR = 1.44, 95% CI: 1.27–1.65, P = 4.3 × 10⁻⁸). Eight additional gene-gene interactions were also marginally significant (P < 5 × 10⁻⁷). However, none of the top interactions were replicated. In summary, we did not find significant interactions that were associated with AF susceptibility. Future increases in sample size and denser genotyping might facilitate the identification of gene-gene interactions associated with AF.
Affiliation(s)
- Honghuang Lin
- National Heart Lung and Blood Institute's and Boston University's Framingham Heart Study, Framingham, MA, USA.,Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA, USA
| | - Martina Mueller-Nurasyid
- Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.,Department of Medicine I, Ludwig-Maximilians-University Munich, Munich, Germany.,DZHK (German Centre for Cardiovascular Research), partner site Munich Heart Alliance, Munich, Germany
| | - Albert V Smith
- Icelandic Heart Association, Kopavogur, Iceland.,Faculty of Medicine, University of Iceland, Reykjavik, Iceland
| | - Dan E Arking
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | | | - Traci M Bartz
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Kathryn L Lunetta
- National Heart Lung and Blood Institute's and Boston University's Framingham Heart Study, Framingham, MA, USA.,Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Kurt Lohman
- Department of Biostatistical Sciences, Public Health Sciences, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Marcus E Kleber
- Vth Department of Medicine, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany
| | - Steven A Lubitz
- Cardiac Arrhythmia Service, Massachusetts General Hospital, Boston, MA, USA.,Harvard Medical School, Boston, MA, USA
| | - Bastiaan Geelhoed
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | - Stella Trompet
- Department of Cardiology, Leiden University Medical Center, the Netherlands.,Department of Gerontology and Geriatrics, Leiden University Medical Center, Leiden, the Netherlands
| | - Maartje N Niemeijer
- Department of Epidemiology, Erasmus MC - University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - Tim Kacprowski
- Department of Functional Genomics, Interfaculty Institute for Genetics and Functional Genomics, University Medicine and Ernst-Moritz-Arndt University Greifswald, Greifswald, Germany.,DZHK (German Centre for Cardiovascular Research), partner site Greifswald, Greifswald, Germany
| | - Daniel I Chasman
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston MA, USA
| | - Derek Klarin
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA.,Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA.,Department of Surgery, Massachusetts General Hospital, Boston, MA, USA.,Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
| | - Moritz F Sinner
- Department of Medicine I, Ludwig-Maximilians-University Munich, Munich, Germany
| | - Melanie Waldenberger
- DZHK (German Centre for Cardiovascular Research), partner site Munich Heart Alliance, Munich, Germany.,Vth Department of Medicine, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany.,Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
| | - Thomas Meitinger
- DZHK (German Centre for Cardiovascular Research), partner site Munich Heart Alliance, Munich, Germany.,Institute of Human Genetics, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany.,Institute of Human Genetics, Technische Universität München, Munich, Germany
| | - Tamara B Harris
- National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Lenore J Launer
- National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Elsayed Z Soliman
- Epidemiological Cardiology Research Center, Wake Forest School of Medicine, Winston Salem, NC, USA
| | - Lin Y Chen
- Cardiovascular Division, Department of Medicine, University of Minnesota Medical School, Minneapolis, MN, USA
| | | | | | - Jerome I Rotter
- Institute for Translational Genomics and Population Sciences (J.I.R.), Departments of Pediatrics and Medicine, LABioMed at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Bruce M Psaty
- Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology and Health Services, University of Washington, Seattle, WA, USA.,Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA
| | - Zhijun Xie
- Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA, USA
| | - Audrey E Hendricks
- National Heart Lung and Blood Institute's and Boston University's Framingham Heart Study, Framingham, MA, USA.,Mathematical and Statistical Sciences, University of Colorado, Denver, Denver, CO, USA
| | - Jingzhong Ding
- Department of Gerontology and Geriatric Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Graciela E Delgado
- Vth Department of Medicine, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany
| | - Niek Verweij
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | - Pim van der Harst
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | - Peter W Macfarlane
- Institute of Health and Wellbeing, College of Veterinary, Medical and Life Sciences, University of Glasgow, United Kingdom
| | - Ian Ford
- Robertson Center for Biostatistics, University of Glasgow, United Kingdom
| | - Albert Hofman
- Department of Epidemiology, Erasmus MC - University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - André Uitterlinden
- Department of Epidemiology & Internal Medicine, Erasmus MC - University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - Jan Heeringa
- Department of Epidemiology, Erasmus MC - University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - Oscar H Franco
- Department of Epidemiology, Erasmus MC - University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - Jan A Kors
- Department of Medical Informatics, Erasmus MC - University Medical Center Rotterdam, the Netherlands
| | - Stefan Weiss
- Department of Functional Genomics, Interfaculty Institute for Genetics and Functional Genomics, University Medicine and Ernst-Moritz-Arndt University Greifswald, Greifswald, Germany.,DZHK (German Centre for Cardiovascular Research), partner site Greifswald, Greifswald, Germany
| | - Henry Völzke
- DZHK (German Centre for Cardiovascular Research), partner site Greifswald, Greifswald, Germany.,Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
| | - Lynda M Rose
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston MA, USA
| | - Pradeep Natarajan
- Harvard Medical School, Boston, MA, USA.,Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA.,Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA.,Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
| | - Sekar Kathiresan
- Harvard Medical School, Boston, MA, USA.,Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA.,Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA.,Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
| | - Stefan Kääb
- Department of Medicine I, Ludwig-Maximilians-University Munich, Munich, Germany.,DZHK (German Centre for Cardiovascular Research), partner site Munich Heart Alliance, Munich, Germany
| | - Vilmundur Gudnason
- Icelandic Heart Association, Kopavogur, Iceland.,Faculty of Medicine, University of Iceland, Reykjavik, Iceland
| | - Alvaro Alonso
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | | | - Susan R Heckbert
- Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA.,Department of Epidemiology, Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
| | - Emelia J Benjamin
- National Heart Lung and Blood Institute's and Boston University's Framingham Heart Study, Framingham, MA, USA.,Section of Cardiovascular Medicine and Preventive Medicine, Department of Medicine, Boston University School of Medicine, Boston, MA, USA.,Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
| | - Yongmei Liu
- Department of Epidemiology & Prevention, Public Health Sciences, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Winfried März
- Synlab Academy, Synlab Services, GmbH P5,7, 68161 Mannheim, Germany.,Clinical Institute of Medical and Chemical Laboratory Diagnostics, Medical University of Graz, Graz, Austria.,Medical Clinic V (Nephrology, Hypertensiology, Rheumatology, Endocrinology, Diabetology), Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
| | - Michiel Rienstra
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | - J Wouter Jukema
- Department of Cardiology, Leiden University Medical Center, the Netherlands
| | - Bruno H Stricker
- Department of Epidemiology & Internal Medicine, Erasmus MC - University Medical Center Rotterdam, Rotterdam, the Netherlands.,Inspectorate of Health Care, Utrecht, the Netherlands
| | - Marcus Dörr
- DZHK (German Centre for Cardiovascular Research), partner site Greifswald, Greifswald, Germany.,Department of Internal Medicine B, University Medicine Greifswald, Greifswald, Germany
| | - Christine M Albert
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston MA, USA
| | | |
|
34
|
Kodama K, Saigo H. KDSNP: A kernel-based approach to detecting high-order SNP interactions. J Bioinform Comput Biol 2016; 14:1644003. [DOI: 10.1142/s0219720016440030] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Despite the accumulation of quantitative trait loci (QTL) data in many complex human diseases, most current approaches that have attempted to relate genotype to phenotype have achieved limited success, and the genetic factors of many common diseases remain to be elucidated. One of the reasons that makes this problem complex is the existence of single nucleotide polymorphism (SNP) interaction, or epistasis. Due to the excessive amount of computation needed to search the combinatorial space, existing approaches cannot fully incorporate high-order SNP interactions into their models, but limit themselves to detecting only lower-order SNP interactions. We present an empirical approach based on ridge regression with polynomial kernels and a model selection technique for determining the true degree of epistasis among SNPs. Computer experiments on simulated data show the ability of the proposed method to correctly predict the number of interacting SNPs, provided that the number of samples is large enough relative to the number of SNPs. For cases in which the number of available samples is limited, we propose a sliding window approach to ensure a sufficiently large sample/SNP ratio in each window. In computational experiments using heterogeneous stock mice data, our approach has successfully detected subregions that harbor known causal SNPs. Our analysis further suggests the existence of additional candidate causal SNPs interacting with each other in the neighborhood of the known causal gene. Software is available from https://github.com/HirotoSaigo/KDSNP.
Affiliation(s)
- Kento Kodama
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka 820-8502, Fukuoka, Japan
| | - Hiroto Saigo
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka 820-8502, Fukuoka, Japan
| |
|
35
|
Li M, Wei C, Wen Y, Wang T, Lu Q. Detecting Gene-Gene Interactions Associated with Multiple Complex Traits with U-Statistics. Curr Genomics 2016; 17:403-415. [PMID: 28479869 PMCID: PMC5320542 DOI: 10.2174/1389202917666160513100946] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/11/2015] [Revised: 05/26/2015] [Accepted: 06/06/2015] [Indexed: 12/02/2022] Open
Abstract
Many complex diseases, such as psychiatric and behavioral disorders, are commonly characterized through various measurements that reflect physical, behavioral and psychological aspects of diseases. While it remains a great challenge to find a unified measurement to characterize a disease, the available multiple phenotypes can be analyzed jointly in the genetic association study. Simultaneously testing these phenotypes has many advantages, including considering different aspects of the disease in the analysis, and utilizing correlated phenotypes to improve the power of detecting disease-associated variants. Furthermore, complex diseases are likely caused by the interplay of multiple genetic variants through complicated mechanisms. Considering gene-gene interactions in the joint association analysis of complex diseases could further increase our ability to discover genetic variants involved in complex disease pathways. In this article, we propose a stepwise U-test for joint association analysis of multiple loci and multiple phenotypes. Through simulations, we demonstrated that testing multiple phenotypes simultaneously could attain higher power than testing a single phenotype at a time, especially when there are shared genes contributing to multiple phenotypes. We also illustrated the proposed method with an application to Nicotine Dependence (ND), using datasets from the Study of Addiction: Genetics and Environment (SAGE). The joint analysis of three ND phenotypes identified two SNPs, rs10508649 and rs2491397, and reached a nominal P-value of 3.79e-13. The association was further replicated in two independent datasets with P-values of 2.37e-05 and 7.46e-05.
Affiliation(s)
| | | | | | | | - Qing Lu
- Address correspondence to this author at the Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi 030001, P.R. China; Tel: 517.353.8623 x137; Fax: 517.432.1130;, E-mail:
| |
|
36
|
Chen Q, Mao X, Zhang Z, Zhu R, Yin Z, Leng Y, Yu H, Jia H, Jiang S, Ni Z, Jiang H, Han X, Liu C, Hu Z, Wu X, Hu G, Xin D, Qi Z. SNP-SNP Interaction Analysis on Soybean Oil Content under Multi-Environments. PLoS One 2016; 11:e0163692. [PMID: 27668866 PMCID: PMC5036806 DOI: 10.1371/journal.pone.0163692] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/08/2016] [Accepted: 09/13/2016] [Indexed: 11/22/2022]
Abstract
Soybean oil content is one of the main quality traits. In this study, we used the multifactor dimensionality reduction (MDR) method and a soybean high-density genetic map including 5,308 markers to identify stable single nucleotide polymorphism (SNP)-SNP interactions controlling oil content in soybean across 23 environments. In total, 36,442,756 SNP-SNP interaction pairs were detected, of which 1,865 were associated with soybean oil content under multiple environments after Bonferroni correction (p < 3.55 × 10⁻¹¹). Two and 1,863 SNP-SNP interaction pairs were stably detected across 12 and 11 environments, respectively, each accounting for around 50% of the total environments. Epistasis values and contribution rates of stable interaction pairs (those detected in more than 2 environments) were estimated by two-way ANOVA and ranged from 0.01 to 0.89 and from 0.01 to 0.85, respectively. For some interaction pairs, one SNP of the pair had been identified in previous research as a major QTL without epistatic effects. The results of this study provide insights into the genetic architecture of soybean oil content and can serve as a basis for marker-assisted selection breeding.
Affiliation(s)
- Qingshan Chen
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
| | - Xinrui Mao
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
| | - Zhanguo Zhang
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
| | - Rongsheng Zhu
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
| | - Zhengong Yin
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
- Crop Breeding Institute, Heilongjiang Academy of Agricultural Sciences, Harbin, 150086, Heilongjiang, People’s Republic of China
| | - Yue Leng
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
| | - Hongxiao Yu
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
| | - Huiying Jia
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
| | - Shanshan Jiang
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
| | - Zhongqiu Ni
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
| | - Hongwei Jiang
- The Crop Research and Breeding Center of Land-Reclamation of Heilongjiang Province, Harbin, 150090, Heilongjiang, People’s Republic of China
| | - Xue Han
- The Crop Research and Breeding Center of Land-Reclamation of Heilongjiang Province, Harbin, 150090, Heilongjiang, People’s Republic of China
| | - Chunyan Liu
- The Crop Research and Breeding Center of Land-Reclamation of Heilongjiang Province, Harbin, 150090, Heilongjiang, People’s Republic of China
| | - Zhenbang Hu
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
| | - Xiaoxia Wu
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
| | - Guohua Hu
- The Crop Research and Breeding Center of Land-Reclamation of Heilongjiang Province, Harbin, 150090, Heilongjiang, People’s Republic of China
| | - Dawei Xin
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
- * E-mail: (DX); (ZQ)
| | - Zhaoming Qi
- College of Agriculture, Soybean biology Key Laboratory of the Ministry of Education, Northeast Agricultural University, Harbin, 150030, Heilongjiang, People’s Republic of China
- * E-mail: (DX); (ZQ)
| |
|
37
|
Simon PHG, Sylvestre MP, Tremblay J, Hamet P. Key Considerations and Methods in the Study of Gene-Environment Interactions. Am J Hypertens 2016; 29:891-9. [PMID: 27037711 DOI: 10.1093/ajh/hpw021] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 12/18/2015] [Accepted: 02/08/2016] [Indexed: 12/16/2022]
Abstract
With increased involvement of genetic data in most epidemiological investigations, gene-environment (G × E) interactions now stand as a topic, which must be meticulously assessed and thoroughly understood. The level, mode, and outcomes of interactions between environmental factors and genetic traits have the capacity to modulate disease risk. These must, therefore, be carefully evaluated as they have the potential to offer novel insights on the "missing heritability problem", reaching beyond our current limitations. First, we review a definition of G × E interactions. We then explore how concepts such as the early manifestation of the genetic components of a disease, the heterogeneity of complex traits, the clear definition of epidemiological strata, and the effect of varying physiological conditions can affect our capacity to detect (or miss) G × E interactions. Lastly, we discuss the shortfalls of regression models to study G × E interactions and how other methods such as the ReliefF algorithm, pattern recognition methods, or the LASSO (Least Absolute Shrinkage and Selection Operator) method can enable us to more adequately model G × E interactions. Overall, we present the elements to consider and a path to follow when studying genetic determinants of disease in order to uncover potential G × E interactions.
Affiliation(s)
- Paul H G Simon
- CHUM Research Center, Centre hospitalier de l'Université de Montréal, Montréal, Québec, Canada
| | - Marie-Pierre Sylvestre
- CHUM Research Center, Centre hospitalier de l'Université de Montréal, Montréal, Québec, Canada
| | - Johanne Tremblay
- CHUM Research Center, Centre hospitalier de l'Université de Montréal, Montréal, Québec, Canada
| | - Pavel Hamet
- CHUM Research Center, Centre hospitalier de l'Université de Montréal, Montréal, Québec, Canada.
| |
|
38
|
Evaluation of associative classification-based multifactor dimensionality reduction in the presence of noise. ACTA ACUST UNITED AC 2016. [DOI: 10.1007/s13721-016-0114-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022]
|
39
|
A forest-based feature screening approach for large-scale genome data with complex structures. BMC Genet 2015; 16:148. [PMID: 26698561 PMCID: PMC4690313 DOI: 10.1186/s12863-015-0294-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/06/2015] [Accepted: 11/13/2015] [Indexed: 01/06/2023]
Abstract
Background Genome-wide association studies (GWAS) interrogate the whole genome at large scale to characterize the complex genetic architecture of biomedical traits. When the number of SNPs dramatically increases to half a million but the sample size is still limited to thousands, the traditional p-value based statistical approaches suffer from unprecedented limitations. Feature screening has proved to be an effective and powerful approach to handle ultrahigh dimensional data statistically, yet it has not received much attention in GWAS. Feature screening reduces the feature space from millions to hundreds by removing non-informative noise. However, the univariate measures used to rank features are mainly based on individual effects without considering the mutual interactions with other features. In this article, we explore the performance of a random forest (RF) based feature screening procedure to emphasize the SNPs that have complex effects for a continuous phenotype. Results Both simulation and real data analysis are conducted to examine the power of the forest-based feature screening. We compare it with five other popular feature screening approaches via simulation and conclude that RF can serve as a decent feature screening tool to accommodate complex genetic effects such as nonlinear, interactive, correlative, and joint effects. Unlike the traditional p-value based Manhattan plot, we use the Permutation Variable Importance Measure (PVIM) to display the relative significance and believe that it will provide as much useful information as the traditional plot. Conclusion Most complex traits are found to be regulated by epistatic and polygenic variants. The forest-based feature screening is proven to be an efficient, easily implemented, and accurate approach to cope with whole-genome data with complex structures. Our explorations should add to the growing body of work on feature screening that better serves the demands of contemporary genome data.
|
40
|
Sapin E, Keedwell E, Frayling T. Ant colony optimisation of decision tree and contingency table models for the discovery of gene-gene interactions. IET Syst Biol 2015; 9:218-25. [PMID: 26577156 PMCID: PMC8687348 DOI: 10.1049/iet-syb.2015.0017] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/16/2015] [Revised: 05/15/2015] [Accepted: 05/31/2015] [Indexed: 11/20/2022]
Abstract
In this study, an ant colony optimisation (ACO) algorithm is used to derive near-optimal interactions between a number of single nucleotide polymorphisms (SNPs). This approach is used to discover small numbers of SNPs that are combined into a decision tree or contingency table model. The ACO algorithm is shown to be very robust: it finds statistically discriminatory results with logical interaction, decision tree and contingency table models for various numbers of SNPs considered in the interaction. A large number of the SNPs discovered here have already been identified in large genome-wide association studies as related to type II diabetes, lending additional confidence to the results.
Affiliation(s)
- Emmanuel Sapin
- College of Engineering, Mathematics and Physical Sciences, University of Exeter, UK.
| | - Ed Keedwell
- College of Engineering, Mathematics and Physical Sciences, University of Exeter, UK
| | | |
|
41
|
Kullo IJ, Leeper NJ. The genetic basis of peripheral arterial disease: current knowledge, challenges, and future directions. Circ Res 2015; 116:1551-60. [PMID: 25908728 DOI: 10.1161/circresaha.116.303518] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Indexed: 12/28/2022]
Abstract
Several risk factors for atherosclerotic peripheral arterial disease (PAD), such as dyslipidemia, diabetes mellitus, and hypertension, are heritable. However, predisposition to PAD may be influenced by genetic variants acting independently of these risk factors. Identification of such genetic variants will provide insights into underlying pathophysiologic mechanisms and facilitate the development of novel diagnostic and therapeutic approaches. In contrast to coronary heart disease, relatively few genetic variants that influence susceptibility to PAD have been discovered. This may be, in part, because of greater clinical and genetic heterogeneity in PAD. In this review, we (1) provide an update on the current state of knowledge about the genetic basis of PAD, including results of family studies and candidate gene, linkage as well as genome-wide association studies; (2) highlight the challenges in investigating the genetic basis of PAD and possible strategies to overcome these challenges; and (3) discuss the potential of genome sequencing, RNA sequencing, differential gene expression, epigenetic profiling, and systems biology in increasing our understanding of the molecular genetics of PAD.
Affiliation(s)
- Iftikhar J Kullo
- From the Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN (I.J.K.); and Department of Vascular Surgery, Stanford, Stanford, CA (N.J.L.).
| | - Nicholas J Leeper
- From the Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN (I.J.K.); and Department of Vascular Surgery, Stanford, Stanford, CA (N.J.L.)
| |
|
42
|
Rule-based analysis for detecting epistasis using associative classification mining. ACTA ACUST UNITED AC 2015. [DOI: 10.1007/s13721-015-0084-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]
|
43
|
Gao H, Wu Y, Li J, Li H, Li J, Yang R. Forward LASSO analysis for high-order interactions in genome-wide association study. Brief Bioinform 2015; 15:552-61. [PMID: 23775311 DOI: 10.1093/bib/bbt037] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]
Abstract
Previous genome-wide association studies (GWAS) focused on low-order interactions between pairs of single-nucleotide polymorphisms (SNPs) with significant main effects. Little is known about how high-order interaction effects, especially those among SNPs without main effects, regulate quantitative traits. Within the frameworks of the linear model and the generalized linear model, the LASSO with a coordinate descent step can be used to simultaneously analyze many thousands of SNPs for normal and discrete traits. When high-order interactions among SNPs are considered, the huge number of genetic effects makes the LASSO fail under current computational conditions. Forward LASSO analysis is, therefore, proposed to shrink most of the genetic effects to zero stage by stage. Simulation demonstrates that our proposed method can be used instead of the LASSO method for the full model in mapping high-order interactions. The forward LASSO method is applied to GWAS for carcass traits and meat quality traits in beef cattle.
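To illustrate the LASSO with a coordinate descent step mentioned in this abstract, here is a minimal sketch of plain coordinate descent for the objective (1/2n)·||y − Xβ||² + λ·||β||₁, with a pairwise interaction entered as an ordinary extra column. It is not the authors' forward (stage-by-stage) procedure, which the abstract does not specify; the function names, the ±1 genotype coding and the toy data are all assumptions made for illustration.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows, y a list of responses (hypothetical toy interface).
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, starts at y because beta = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of column j with the partial residual (beta_j removed)
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / norm
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new_beta
    return beta


# Toy data (assumed): two SNPs coded -1/+1, trait driven only by their product.
genotypes = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
X = [[a, b, a * b] for a, b in genotypes]   # columns: SNP1, SNP2, SNP1*SNP2
y = [2.0 * a * b for a, b in genotypes]     # pure interaction, no main effects
beta = lasso_coordinate_descent(X, y, lam=0.5)
```

On this toy design the two main-effect coefficients are shrunk to zero and only the interaction column keeps a nonzero (shrunken) coefficient, which is the behaviour that makes LASSO attractive for SNPs without main effects.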
|
44
|
Ding X, Wang J, Zelikovsky A, Guo X, Xie M, Pan Y. Searching High-Order SNP Combinations for Complex Diseases Based on Energy Distribution Difference. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015; 12:695-704. [PMID: 26357280 DOI: 10.1109/tcbb.2014.2363459] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
Single nucleotide polymorphisms, a dominant type of genetic variant, have been used successfully to identify defective genes causing human single-gene diseases. However, most common human diseases are complex diseases caused by gene-gene and gene-environment interactions. Many SNP-SNP interaction analysis methods have been introduced, but they are not powerful enough to discover interactions among more than three SNPs. The paper proposes a novel method that analyzes all SNPs simultaneously. Unlike existing methods, the method regards an individual's genotype data on a list of SNPs as a point with a unit of energy in a multi-dimensional space, and tries to find a new coordinate system where the energy distribution difference between cases and controls reaches the maximum. The method then finds multi-SNP combinatorial patterns that differ between cases and controls based on the new coordinate system. The experiment on simulated data shows that the method is efficient. The tests on real data for age-related macular degeneration (AMD) show that it can find more significant multi-SNP combinatorial patterns than existing methods.
|
45
|
Su L, Liu G, Wang H, Tian Y, Zhou Z, Han L, Yan L. Research on single nucleotide polymorphisms interaction detection from network perspective. PLoS One 2015; 10:e0119146. [PMID: 25763929 PMCID: PMC4357495 DOI: 10.1371/journal.pone.0119146] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/18/2014] [Accepted: 01/09/2015] [Indexed: 12/02/2022]
Abstract
Single nucleotide polymorphisms (SNPs) found in genome-wide association studies (GWAS) mainly influence susceptibility to complex diseases, but they still cannot comprehensively explain the relationships between mutations and diseases. Interactions between SNPs are considered so important for a deep understanding of those relationships that several strategies have been proposed to explore them. However, some of those methods perform poorly when marginal effects of disease loci are weak or absent, others do not consider high-order SNP interactions, and few meet the requirements for both performance and accuracy. For these reasons, detection methods should take into account not only low-order but also high-order SNP interactions, as well as main-effect SNPs, at an acceptable computational complexity. In this paper, a new pairwise (or low-order) interaction detection method, IG (Interaction Gain), is introduced, which requires no disease model and uses parallel computing. Furthermore, high-order SNP interactions are detected by finding closely connected function modules of the network constructed from IG detection results. Tested on a wide range of simulated datasets and four WTCCC real datasets, the proposed methods accurately detected both low-order and high-order SNP interactions as well as disease-associated main-effect SNPs, and surpassed all competitors in performance. The research will advance complex disease research by providing more reliable SNP interactions.
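The abstract does not say how IG is computed. As a rough sketch, assuming it resembles the standard information-theoretic interaction gain I(A,B;C) − I(A;C) − I(B;C) for two SNPs A, B and a disease label C, a pairwise score could be estimated from genotype counts as follows. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of a sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired observations."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_gain(snp_a, snp_b, phenotype):
    """Information the SNP pair gives about the phenotype beyond each SNP alone.

    Assumed definition: I(A,B;C) - I(A;C) - I(B;C).
    """
    pair = list(zip(snp_a, snp_b))
    return (mutual_information(pair, phenotype)
            - mutual_information(snp_a, phenotype)
            - mutual_information(snp_b, phenotype))


# Toy example (assumed): phenotype is the XOR of two binary-coded SNPs,
# so neither SNP has a marginal effect but the pair is fully informative.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
phenotype = [a ^ b for a, b in zip(snp_a, snp_b)]
ig = interaction_gain(snp_a, snp_b, phenotype)
```

In this toy case each SNP alone carries zero information about the phenotype while the pair carries one full bit, so the score isolates a pure interaction. Pairs with high scores could then serve as weighted edges of the SNP network from which the abstract's high-order modules are extracted.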
Affiliation(s)
- Lingtao Su
- College of Computer Science and Technology, Jilin University, Changchun, People’s Republic of China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, People’s Republic of China
| | - Guixia Liu
- College of Computer Science and Technology, Jilin University, Changchun, People’s Republic of China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, People’s Republic of China
- * E-mail:
| | - Han Wang
- College of Computer Science and Information Technology, Northeast Normal University, Changchun, People’s Republic of China
| | - Yuan Tian
- College of Computer Science and Technology, Jilin University, Changchun, People’s Republic of China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, People’s Republic of China
| | - Zhihui Zhou
- College of Computer Science and Technology, Jilin University, Changchun, People’s Republic of China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, People’s Republic of China
| | - Liang Han
- College of Computer Science and Technology, Jilin University, Changchun, People’s Republic of China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, People’s Republic of China
| | - Lun Yan
- College of Computer Science and Technology, Jilin University, Changchun, People’s Republic of China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, People’s Republic of China
| |
|
46
|
Broer L, Buchman AS, Deelen J, Evans DS, Faul JD, Lunetta KL, Sebastiani P, Smith JA, Smith AV, Tanaka T, Yu L, Arnold AM, Aspelund T, Benjamin EJ, De Jager PL, Eirkisdottir G, Evans DA, Garcia ME, Hofman A, Kaplan RC, Kardia SLR, Kiel DP, Oostra BA, Orwoll ES, Parimi N, Psaty BM, Rivadeneira F, Rotter JI, Seshadri S, Singleton A, Tiemeier H, Uitterlinden AG, Zhao W, Bandinelli S, Bennett DA, Ferrucci L, Gudnason V, Harris TB, Karasik D, Launer LJ, Perls TT, Slagboom PE, Tranah GJ, Weir DR, Newman AB, van Duijn CM, Murabito JM. GWAS of longevity in CHARGE consortium confirms APOE and FOXO3 candidacy. J Gerontol A Biol Sci Med Sci 2015; 70:110-8. [PMID: 25199915 PMCID: PMC4296168 DOI: 10.1093/gerona/glu166] [Citation(s) in RCA: 204] [Impact Index Per Article: 22.7] [Received: 03/28/2014] [Accepted: 08/07/2014] [Indexed: 01/08/2023]
Abstract
BACKGROUND The genetic contribution to longevity in humans has been estimated to range from 15% to 25%. Only two genes, APOE and FOXO3, have shown association with longevity in multiple independent studies. METHODS We conducted a meta-analysis of genome-wide association studies including 6,036 longevity cases, age ≥90 years, and 3,757 controls that died between ages 55 and 80 years. We additionally attempted to replicate earlier identified single nucleotide polymorphism (SNP) associations with longevity. RESULTS In our meta-analysis, we found suggestive evidence for the association of SNPs near CADM2 (odds ratio [OR] = 0.81; p value = 9.66 × 10⁻⁷) and GRIK2 (odds ratio = 1.24; p value = 5.09 × 10⁻⁸) with longevity. When attempting to replicate findings earlier identified in genome-wide association studies, only the APOE locus consistently replicated. In an additional look-up of the candidate gene FOXO3, we found that an earlier identified variant shows a highly significant association with longevity when including published data with our meta-analysis (odds ratio = 1.17; p value = 1.85 × 10⁻¹⁰). CONCLUSIONS We did not identify new genome-wide significant associations with longevity and did not replicate earlier findings except for APOE and FOXO3. Our inability to find new associations with survival to ages ≥90 years may be because longevity represents multiple complex traits with heterogeneous genetic underpinnings, or alternatively, because longevity may be regulated by rare variants that are not captured by standard genome-wide genotyping and imputation of common variants.
Affiliation(s)
- Linda Broer
- Department of Epidemiology, Erasmus MC, Rotterdam, The Netherlands. Netherlands Consortium for Healthy Ageing, Leiden University Medical Center, The Netherlands. Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands
| | - Aron S Buchman
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, Illinois
| | - Joris Deelen
- Netherlands Consortium for Healthy Ageing, Leiden University Medical Center, The Netherlands. Department of Molecular Epidemiology, Leiden University Medical Center, The Netherlands
| | - Daniel S Evans
- California Pacific Medical Center Research Institute, San Francisco
| | - Jessica D Faul
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor
| | - Kathryn L Lunetta
- Department of Biostatistics, Boston University School of Public Health, Massachusetts. NHLBI's and Boston University's Framingham Heart Study, Massachusetts
| | - Paola Sebastiani
- Department of Biostatistics, Boston University School of Public Health, Massachusetts
| | | | - Albert V Smith
- Icelandic Heart Association, Kopavogur, Iceland. Department of Medicine, University of Iceland, Reykjavik
| | - Toshiko Tanaka
- Translational Gerontology Branch, National Institute on Aging, Baltimore, Maryland
| | - Lei Yu
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, Illinois
| | - Alice M Arnold
- Department of Biostatistics, University of Washington, Seattle
| | - Thor Aspelund
- Icelandic Heart Association, Kopavogur, Iceland. Department of Medicine, University of Iceland, Reykjavik
| | - Emelia J Benjamin
- NHLBI's and Boston University's Framingham Heart Study, Massachusetts. Department of Medicine, Sections of Preventive Medicine and Cardiology, Boston University School of Medicine, Massachusetts. Department of Epidemiology, Boston University School of Public Health, Massachusetts
| | - Philip L De Jager
- Program in Translational NeuroPsychiatric Genomics, Institute for the Neurosciences, Departments of Neurology and Psychiatry, Brigham and Women's Hospital, Boston, Massachusetts. Harvard Medical School, Boston, Massachusetts. Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts
| | | | - Denis A Evans
- Rush Institute for Healthy Aging and Department of Internal Medicine, Rush University Medical Center, Chicago, Illinois
| | - Melissa E Garcia
- Laboratory of Epidemiology and Population Sciences, National Institute on Aging, Bethesda, Maryland
| | - Albert Hofman
- Department of Epidemiology, Erasmus MC, Rotterdam, The Netherlands. Netherlands Consortium for Healthy Ageing, Leiden University Medical Center, The Netherlands
| | - Robert C Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College, Bronx, New York
| | | | - Douglas P Kiel
- Harvard Medical School, Boston, Massachusetts. Institute for Aging Research, Hebrew SeniorLife, Harvard Medical School Department of Medicine, Boston, Massachusetts. Division of Gerontology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
| | - Ben A Oostra
- Department of Epidemiology, Erasmus MC, Rotterdam, The Netherlands. Netherlands Consortium for Healthy Ageing, Leiden University Medical Center, The Netherlands
| | - Eric S Orwoll
- School of Medicine, Oregon Health and Science University, Portland
| | - Neeta Parimi
- California Pacific Medical Center Research Institute, San Francisco
| | - Bruce M Psaty
- Department of Medicine, University of Washington, Seattle. Department of Epidemiology, University of Washington, Seattle. Department of Health Services, University of Washington, Seattle. Group Health Research Institute, Group Health Cooperative, Seattle, Washington
| | - Fernando Rivadeneira
- Department of Epidemiology, Erasmus MC, Rotterdam, The Netherlands. Institute for Aging Research, Hebrew SeniorLife, Harvard Medical School Department of Medicine, Boston, Massachusetts
| | - Jerome I Rotter
- Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute and Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, California
| | - Sudha Seshadri
- Department of Biostatistics, Boston University School of Public Health, Massachusetts. Department of Neurology, Boston University School of Medicine, Massachusetts
| | - Andrew Singleton
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, Maryland
| | - Henning Tiemeier
- Department of Epidemiology, Erasmus MC, Rotterdam, The Netherlands. Netherlands Consortium for Healthy Ageing, Leiden University Medical Center, The Netherlands. Department of Child and Adolescent Psychiatry, Erasmus MC and Sophia Children's Hospital, Rotterdam, The Netherlands
| | - André G Uitterlinden
- Department of Epidemiology, Erasmus MC, Rotterdam, The Netherlands. Netherlands Consortium for Healthy Ageing, Leiden University Medical Center, The Netherlands. Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands
| | - Wei Zhao
- Department of Epidemiology, University of Michigan, Ann Arbor
| | | | - David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, Illinois
| | - Luigi Ferrucci
- Translational Gerontology Branch, National Institute on Aging, Baltimore, Maryland
| | - Vilmundur Gudnason
- Icelandic Heart Association, Kopavogur, Iceland. Department of Medicine, University of Iceland, Reykjavik
| | - Tamara B Harris
- Laboratory of Epidemiology and Population Sciences, National Institute on Aging, Bethesda, Maryland
| | - David Karasik
- Institute for Aging Research, Hebrew SeniorLife, Harvard Medical School Department of Medicine, Boston, Massachusetts. Faculty of Medicine in The Galilee, Bar-Ilan University, Safed, Israel
| | - Lenore J Launer
- Laboratory of Epidemiology and Population Sciences, National Institute on Aging, Bethesda, Maryland
| | - Thomas T Perls
- Section of Geriatrics, Boston University School of Medicine and Boston Medical Center, Massachusetts
| | - P Eline Slagboom
- Netherlands Consortium for Healthy Ageing, Leiden University Medical Center, The Netherlands. Department of Molecular Epidemiology, Leiden University Medical Center, The Netherlands
| | - Gregory J Tranah
- California Pacific Medical Center Research Institute, San Francisco. Department of Epidemiology and Biostatistics, University of California, San Francisco
| | - David R Weir
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor
| | - Anne B Newman
- Department of Epidemiology, University of Pittsburgh, Pennsylvania. *These authors contributed equally to this work
| | - Cornelia M van Duijn
- Department of Epidemiology, Erasmus MC, Rotterdam, The Netherlands. Netherlands Consortium for Healthy Ageing, Leiden University Medical Center, The Netherlands. *These authors contributed equally to this work
| | - Joanne M Murabito
- NHLBI's and Boston University's Framingham Heart Study, Massachusetts. Department of Medicine, Section of General Internal Medicine, Boston University School of Medicine, Massachusetts. *These authors contributed equally to this work.
| |
|
47
|
Xu HM, Sun XW, Qi T, Lin WY, Liu N, Lou XY. Multivariate dimensionality reduction approaches to identify gene-gene and gene-environment interactions underlying multiple complex traits. PLoS One 2014; 9:e108103. [PMID: 25259584 PMCID: PMC4178067 DOI: 10.1371/journal.pone.0108103] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 06/24/2014] [Accepted: 08/18/2014] [Indexed: 11/30/2022]
Abstract
The elusive but ubiquitous multifactor interactions represent a stumbling block that urgently needs to be removed in the search for determinants involved in human complex diseases. Dimensionality reduction approaches are a promising tool for this task. Many complex diseases exhibit composite syndromes that must be measured as a cluster of clinical traits with varying correlations and/or are inherently longitudinal in nature (changing over time and measured dynamically at multiple time points). A multivariate approach for detecting interactions is thus greatly needed for handling multifaceted phenotypes and longitudinal data, as well as for improving statistical power for multiple significance testing via a two-stage testing procedure that involves a multivariate analysis for grouped phenotypes followed by univariate analysis for the phenotypes in the significant group(s). In this article, we propose a multivariate extension of generalized multifactor dimensionality reduction (GMDR) based on multivariate generalized linear, multivariate quasi-likelihood and generalized estimating equations models. Simulations and real data analysis for the cohort from the Study of Addiction: Genetics and Environment are performed to investigate the properties and performance of the proposed method, as compared with the univariate method. The results suggest that the proposed multivariate GMDR substantially boosts statistical power.
Affiliation(s)
- Hai-Ming Xu
- Institute of Bioinformatics, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, P.R. China
- Research Center for Air Pollution and Health, Zhejiang University, Hangzhou, P.R. China
| | - Xi-Wei Sun
- Institute of Bioinformatics, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, P.R. China
| | - Ting Qi
- Institute of Bioinformatics, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, P.R. China
| | - Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
| | - Nianjun Liu
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Xiang-Yang Lou
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- * Corresponding author
|
48
|
Gusareva ES, Van Steen K. Practical aspects of genome-wide association interaction analysis. Hum Genet 2014; 133:1343-58. [DOI: 10.1007/s00439-014-1480-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 05/21/2014] [Accepted: 08/18/2014] [Indexed: 12/31/2022]
|
49
|
Detecting epistatic interactions in metagenome-wide association studies by metaBOOST. BIOMED RESEARCH INTERNATIONAL 2014; 2014:398147. [PMID: 25165702 PMCID: PMC4131565 DOI: 10.1155/2014/398147] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/14/2014] [Accepted: 07/14/2014] [Indexed: 01/27/2023]
Abstract
Material and Methods. We recall the definition of epistasis and extend it to metagenomic biomarkers. We then give an overview of our method, metaBOOST, and describe each of its steps in detail. Results. We describe the data sources for both the simulation studies and the real metagenomic datasets. We then describe the simulation procedure and report its results, followed by the studies on real datasets and their results. Conclusions and Discussion. Finally, we summarize our method and discuss some possible improvements for the future.
|
50
|
Zhang Q, Long Q, Ott J. AprioriGWAS, a new pattern mining strategy for detecting genetic variants associated with disease through interaction effects. PLoS Comput Biol 2014; 10:e1003627. [PMID: 24901472 PMCID: PMC4046917 DOI: 10.1371/journal.pcbi.1003627] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 04/30/2013] [Accepted: 04/01/2014] [Indexed: 12/11/2022]
Abstract
Identifying gene-gene interactions is a hot topic in genome-wide association studies. Two fundamental challenges are: (1) how to smartly identify combinations of variants that may be associated with the trait from the astronomical number of all possible combinations; and (2) how to test for epistatic interaction when all potential combinations are available. We developed AprioriGWAS, which brings two innovations. (1) Based on Apriori, a successful method in the field of Frequent Itemset Mining (FIM) in which a pattern growth strategy is leveraged to effectively and accurately reduce the search space, AprioriGWAS can efficiently identify genetically associated genotype patterns. (2) To test the hypotheses of epistasis, we adopt a new conditional permutation procedure to obtain reliable statistical inference of Pearson's chi-square test for the contingency table generated by associated variants. By applying AprioriGWAS to age-related macular degeneration (AMD) data, we found that: (1) angiopoietin 1 (ANGPT1) and four retinal genes interact with Complement Factor H (CFH); and (2) the GO term “glycosaminoglycan biosynthetic process” was enriched in AMD interacting genes. The epistatic interactions newly found by AprioriGWAS in the AMD data are likely true interactions, since the genes interacting with CFH are retinal genes, and the GO term enrichment also verified that the interaction between glycosaminoglycans (GAGs) and CFH plays an important role in the disease pathology of AMD. By applying AprioriGWAS to bipolar disorder in the WTCCC data, we found that variants without marginal effects show significant interactions, for example, multiple-SNP genotype patterns inside the genes GABRB2 and GRIA1 (AMPA subunit 1 receptor gene). AMPARs are found in many parts of the brain and are the most commonly found receptor in the nervous system. GABRB2 mediates the fastest inhibitory synaptic transmission in the central nervous system. Multiple lines of evidence support the relevance of GRIA1 and GABRB2 to mental disorders.
Genes do not operate in a vacuum. They interact with each other in many ways. Therefore, to figure out the genetic causes of disease through case-control association studies, it is important to take interactions into account. There are two fundamental challenges in interaction-focused analysis. The first is that the number of possible combinations of genetic variants easily becomes astronomical, beyond current computational capacity; this is referred to as “the curse of dimensionality” in computer science. The other is that, even if all potential combinations could be exhaustively checked, genuine signals are likely to be buried by false positives composed of a single variant with a large main effect and some other irrelevant variant. In this work, we propose AprioriGWAS, which employs Apriori, an algorithm that pioneered the branch of “Frequent Itemset Mining” in computer science, to cope with the daunting number of combinations, and conditional permutation to make real signals stand out. By applying AprioriGWAS to age-related macular degeneration (AMD) data and bipolar disorder (BD) data from the WTCCC, we found interesting interactions between genes that are plausible in terms of the disease. Consequently, AprioriGWAS could be a good tool for finding epistatic interactions in GWA data.
Affiliation(s)
- Qingrun Zhang
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multi-scale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- * Corresponding author (QZ)
| | - Quan Long
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multi-scale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- * Corresponding author (QL)
| | - Jurg Ott
- Institute of Psychology, Chinese Academy of Sciences, Chaoyang District, Beijing, PR China
- Laboratory of Statistical Genetics, The Rockefeller University, New York, New York, United States of America
|