1. Wang D, Perera D, He J, Cao C, Kossinna P, Li Q, Zhang W, Guo X, Platt A, Wu J, Zhang Q. cLD: Rare-variant linkage disequilibrium between genomic regions identifies novel genomic interactions. PLoS Genet 2023; 19:e1011074. [PMID: 38109434 PMCID: PMC10758262 DOI: 10.1371/journal.pgen.1011074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 01/01/2024] [Accepted: 11/20/2023] [Indexed: 12/20/2023] Open
Abstract
Linkage disequilibrium (LD) is a fundamental concept in genetics, critical for studying genetic associations and molecular evolution. However, LD measurements are only reliable for common genetic variants, leaving low-frequency variants unanalyzed. In this work, we introduce cumulative LD (cLD), a stable statistic that captures the rare-variant LD between genetic regions, which reflects more biological interactions between variants, in addition to lack of recombination. We derived the theoretical variance of cLD using the delta method to demonstrate its higher stability than LD for rare variants. This property is also verified by bootstrapped simulations using real data. In application, we find that cLD reveals an increased genetic association between genes in 3D chromatin interactions, a phenomenon for which negative results were recently reported when standard LD between common variants was calculated. Additionally, we show that cLD is higher between gene pairs reported in interaction databases, identifies unreported protein-protein interactions, and reveals interacting genes distinguishing case/control samples in association studies.
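As a rough illustration of the quantities this abstract refers to, the sketch below computes the standard LD statistic r² between two biallelic sites from phased 0/1 haplotype indicators. The abstract does not state how cLD is defined, so `collapse_region` is only a hypothetical collapsing rule (a haplotype is marked 1 if it carries any rare allele in the region) and should not be read as the authors' definition.

```python
from typing import List, Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Standard LD r^2 between two 0/1 allele indicators over the same haplotypes."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("inputs must be non-empty and of equal length")
    p_a = sum(a) / n                                # allele frequency at site A
    p_b = sum(b) / n                                # allele frequency at site B
    p_ab = sum(x * y for x, y in zip(a, b)) / n     # frequency of the A-B haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                         # undefined for a monomorphic site
    d = p_ab - p_a * p_b                            # LD coefficient D
    return d * d / denom


def collapse_region(region: Sequence[Sequence[int]]) -> List[int]:
    """Hypothetical collapsing rule (an assumption, not the paper's definition):
    one indicator per haplotype, 1 if any rare allele is present in the region."""
    return [1 if any(hap) else 0 for hap in region]
```

Under that assumption, a region-level statistic could be obtained as `r_squared(collapse_region(region_1), collapse_region(region_2))`, where each region is a list of per-haplotype 0/1 rows.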
Affiliation(s)
- Dinghao Wang
  - Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
- Deshan Perera
  - Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Jingni He
  - Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Chen Cao
  - Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Pathum Kossinna
  - Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Qing Li
  - Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- William Zhang
  - The Harker School, San Jose, California, United States of America
- Xingyi Guo
  - Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Alexander Platt
  - Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Jingjing Wu
  - Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
- Qingrun Zhang
  - Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
  - Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
  - Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
2. Zhang Q, Bhatia M, Park T, Ott J. A multi-threaded approach to genotype pattern mining for detecting digenic disease genes. Front Genet 2023; 14:1222517. [PMID: 37693313 PMCID: PMC10483394 DOI: 10.3389/fgene.2023.1222517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Accepted: 07/31/2023] [Indexed: 09/12/2023] Open
Abstract
To locate disease-causing DNA variants on the human gene map, the customary approach has been to carry out a genome-wide association study for one variant after another by testing for genotype frequency differences between individuals affected and unaffected with disease. So-called digenic traits are due to the combined effects of two variants, often on different chromosomes, while individual variants may have little or no effect on disease. Machine learning approaches have been developed to find variant pairs underlying digenic traits. However, many of these methods have large memory requirements so that only small datasets can be analyzed. The increasing availability of desktop computers with large numbers of processors and suitable programming to distribute the workload evenly over all processors in a machine make a new and relatively straightforward approach possible, that is, to evaluate all existing variant and genotype pairs for disease association. We present a prototype of such a method with two components, Vpairs and Gpairs, and demonstrate its advantages over existing implementations of such well-known algorithms as Apriori and FP-growth. We apply these methods to published case-control datasets on age-related macular degeneration and Parkinson disease and construct an ROC curve for a large set of genotype patterns.
Affiliation(s)
- Qingrun Zhang
  - Department of Mathematics and Statistics, University of Calgary, Calgary, AB, Canada
  - Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada
- Muskan Bhatia
  - Amity Institute of Biotechnology, Amity University Madhya Pradesh, Gwalior, India
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul, Republic of Korea
- Jurg Ott
  - Laboratory of Statistical Genetics, Rockefeller University, New York, NY, United States
3. Adam Y, Sadeeq S, Kumuthini J, Ajayi O, Wells G, Solomon R, Ogunlana O, Adetiba E, Iweala E, Brors B, Adebiyi E. Polygenic Risk Score in African populations: progress and challenges. F1000Res 2023; 11:175. [PMID: 37273966 PMCID: PMC10233318 DOI: 10.12688/f1000research.76218.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Accepted: 02/10/2023] [Indexed: 06/06/2023] Open
Abstract
Polygenic Risk Score (PRS) analysis is a method that predicts the genetic risk of an individual towards targeted traits. Even when there are no significant markers, it gives evidence of a genetic effect beyond the results of Genome-Wide Association Studies (GWAS). Moreover, it selects single nucleotide polymorphisms (SNPs) that contribute to the disease with low effect size, making it more precise at individual-level risk prediction. PRS analysis addresses the shortfall of GWAS by taking into account the SNPs/alleles that have low effect sizes but play an indispensable role in the observed phenotypic/trait variance. PRS analysis has applications that investigate the genetic basis of several traits, including rare diseases. However, the accuracy of PRS analysis depends on the genomic data of the underlying population. For instance, several studies show that obtaining higher prediction power from PRS analysis is challenging for non-Europeans. In this manuscript, we review the conventional PRS methods and their application to sub-Saharan African communities. We conclude that the lack of sufficient GWAS data and tools is the limiting factor in applying PRS analysis to sub-Saharan populations. We recommend developing Africa-specific PRS methods and tools for estimating and analyzing African population data for clinical evaluation of PRSs of interest and predicting rare diseases.
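For readers unfamiliar with how a PRS is formed, the conventional additive score is a weighted sum of an individual's risk-allele counts, with weights taken from GWAS effect-size estimates. The sketch below shows only that basic form; the function name and inputs are illustrative assumptions, and real pipelines add SNP selection, LD handling, and population-specific calibration, which this review discusses.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], effect_sizes: Sequence[float]) -> float:
    """Additive PRS: sum over SNPs of (effect size) x (risk-allele dosage, 0 to 2).

    `dosages` and `effect_sizes` are hypothetical inputs that must list the
    same SNPs in the same order.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(beta * g for beta, g in zip(effect_sizes, dosages))
```

Because each SNP contributes in proportion to its weight, many small-effect SNPs can jointly shift the score even when none is individually significant.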
Affiliation(s)
- Yagoub Adam
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Suraju Sadeeq
  - Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
- Judit Kumuthini
  - South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
  - Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Olabode Ajayi
  - South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
  - Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Gordon Wells
  - South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
  - Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Rotimi Solomon
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
- Olubanke Ogunlana
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
- Emmanuel Adetiba
  - Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Dept of Electrical & Information Engineering (EIE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - HRA, Institute for Systems Science, Durban University of Technology, Durban, South Africa
- Emeka Iweala
  - Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
- Benedikt Brors
  - Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
  - German Cancer Consortium (DKTK), Heidelberg, Germany
- Ezekiel Adebiyi
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
  - Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
4. Adam Y, Sadeeq S, Kumuthini J, Ajayi O, Wells G, Solomon R, Ogunlana O, Adetiba E, Iweala E, Brors B, Adebiyi E. Polygenic Risk Score in African populations: progress and challenges. F1000Res 2023; 11:175. [PMID: 37273966 PMCID: PMC10233318 DOI: 10.12688/f1000research.76218.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 02/10/2023] [Indexed: 11/23/2023] Open
5. Ott J, Park T. Overview of frequent pattern mining. Genomics Inform 2022; 20:e39. [PMID: 36617647 PMCID: PMC9847378 DOI: 10.5808/gi.22074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 12/22/2022] [Indexed: 12/31/2022] Open
Abstract
Various methods of frequent pattern mining have been applied to genetic problems, specifically, to the combined association of two genotypes (a genotype pattern, or diplotype) at different DNA variants with disease. These methods have the ability to come up with a selection of genotype patterns that are more common in affected than unaffected individuals, and the assessment of statistical significance for these selected patterns poses some unique problems, which are briefly outlined here.
Affiliation(s)
- Jurg Ott
  - Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10065, USA
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
6. Machine learning approaches to explore digenic inheritance. Trends Genet 2022; 38:1013-1018. [DOI: 10.1016/j.tig.2022.04.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Revised: 04/16/2022] [Accepted: 04/25/2022] [Indexed: 11/22/2022]
7. Barki M, Xue H. GABRB2, a key player in neuropsychiatric disorders and beyond. Gene 2022; 809:146021. [PMID: 34673206 DOI: 10.1016/j.gene.2021.146021] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/26/2020] [Revised: 08/05/2021] [Accepted: 09/14/2021] [Indexed: 01/11/2023]
Abstract
The GABA receptors represent the main inhibitory system in the central nervous system, ensuring synaptogenesis, neurogenesis, and the regulation of neuronal plasticity and learning. GABAA receptors are pentameric in structure and belong to the Cys-loop superfamily. The GABRB2 gene, located on chromosome 5q34, encodes the β2 subunit that combines with the α and γ subunits to form the major subtype of GABAA receptors, which accounts for 43% of all GABAA receptors in the mammalian brain. Each subunit probably consists of an extracellular N-terminal domain, four membrane-spanning segments, a large intracellular loop between TM3 and TM4, and an extracellular C-terminal domain. Alternative splicing of the RNA transcript of the GABRB2 gene gives rise to at least four long and short isoforms with dissimilar electrophysiological properties. Furthermore, GABRB2 is imprinted and subject to epigenetic regulation and positive selection. It was first associated with schizophrenia in Han Chinese, and the association was subsequently validated in other populations. Gabrb2 knockout mice also exhibited schizophrenia-like behavior and neuroinflammation that were ameliorated by the antipsychotic drug risperidone. GABRB2 has also been associated with other neuropsychiatric disorders including bipolar disorder, epilepsy, autism spectrum disorder, Alzheimer's disease, frontotemporal dementia, substance dependence, depression, internet gaming disorder, and premenstrual dysphoric disorder. Recently, it has been postulated that GABRB2 might be a potential marker for different cancer types. As GABRB2 has a pivotal role in the central nervous system and is increasingly recognized to contribute to human diseases, further understanding of its structure and function may expedite the generation of new therapeutic approaches.
Affiliation(s)
- Manel Barki
  - Center for Cancer Genomics, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, Jiangsu, China
- Hong Xue
  - Center for Cancer Genomics, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, Jiangsu, China
  - Division of Life Science and Applied Genomics Center, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
8. Adam Y, Samtal C, Brandenburg JT, Falola O, Adebiyi E. Performing post-genome-wide association study analysis: overview, challenges and recommendations. F1000Res 2021; 10:1002. [PMID: 35222990 PMCID: PMC8847724 DOI: 10.12688/f1000research.53962.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 09/22/2021] [Indexed: 12/17/2022] Open
Abstract
Genome-wide association studies (GWAS) provide a wealth of information on statistically significant single-nucleotide polymorphisms (SNPs) associated with various human complex traits and diseases. By performing GWAS, scientists have successfully identified the association of hundreds of thousands to millions of SNPs with a single phenotype. Moreover, the association of some SNPs with rare diseases has been intensively tested. However, classic GWAS have not yet provided solid insight into the functional and biological mechanisms underlying phenotypes or diseases. Therefore, several post-GWAS (pGWAS) methods have been recommended. Currently, there is no simple scientific document that provides a quick guide for performing pGWAS analysis. pGWAS is a crucial step towards a better understanding of the biological machinery beyond the SNPs. Here, we provide an overview of performing pGWAS analysis and demonstrate the challenges behind each method. Furthermore, we direct readers to key articles for each pGWAS method and present the overall issues in pGWAS analysis. Finally, we include a custom pGWAS pipeline to guide new users when performing their research.
Affiliation(s)
- Yagoub Adam
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
- Chaimae Samtal
  - Laboratory of Biotechnology, Environment, Agri-food and Health, Sidi Mohammed Ben Abdellah University, Fez, Fez-Meknes, 30000, Morocco
- Jean-tristan Brandenburg
  - Sydney Brenner Institute for Molecular Bioscience (SBIMB), University of the Witwatersrand, Johannesburg, South Africa
- Oluwadamilare Falola
  - Laboratory of Biotechnology, Environment, Agri-food and Health, Sidi Mohammed Ben Abdellah University, Fez, Fez-Meknes, 30000, Morocco
- Ezekiel Adebiyi
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun, 112233, Nigeria
  - Computer & Information Sciences, Covenant University, Ota, Ogun, 112233, Nigeria
  - Covenant Applied Informatics and Communication Africa Centre of Excellence, Covenant University, Ota, Ogun, 112233, Nigeria
  - Applied Bioinformatics Division, German Cancer Center DKFZ - Heidelberg University, Heidelberg, Baden-Württemberg, 69120, Germany
9. Okazaki A, Horpaopan S, Zhang Q, Randesi M, Ott J. Genotype Pattern Mining for Pairs of Interacting Variants Underlying Digenic Traits. Genes (Basel) 2021; 12:1160. [PMID: 34440333 PMCID: PMC8391494 DOI: 10.3390/genes12081160] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/24/2021] [Revised: 07/23/2021] [Accepted: 07/27/2021] [Indexed: 12/15/2022] Open
Abstract
Some genetic diseases ("digenic traits") are due to the interaction between two DNA variants, which presumably reflects biochemical interactions. For example, certain forms of Retinitis Pigmentosa, a type of blindness, occur in the presence of two mutant variants, one each in the ROM1 and RDS genes, while the occurrence of only one such variant results in a normal phenotype. Detecting variant pairs underlying digenic traits by standard genetic methods is difficult and is downright impossible when individual variants alone have minimal effects. Frequent pattern mining (FPM) methods are known to detect patterns of items. We make use of FPM approaches to find pairs of genotypes (from different variants) that can discriminate between cases and controls. Our method is based on genotype patterns of length two, and permutation testing allows assigning p-values to genotype patterns, where the null hypothesis refers to equal pattern frequencies in cases and controls. We compare different interaction search approaches and their properties on the basis of published datasets. Our implementation of FPM to case-control studies is freely available.
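The permutation idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the test statistic (difference in pattern frequency between cases and controls), the one-sided comparison, and all function and parameter names are choices made here for the example.

```python
import random
from typing import Sequence, Tuple

Genotypes = Sequence[Tuple[int, ...]]  # one tuple of genotype codes (0/1/2) per individual


def pattern_frequency(people: Genotypes, i: int, j: int, pattern: Tuple[int, int]) -> float:
    """Fraction of individuals whose genotypes at variants i and j match the pattern."""
    if not people:
        return 0.0
    hits = sum(1 for g in people if (g[i], g[j]) == pattern)
    return hits / len(people)


def permutation_p_value(cases: Genotypes, controls: Genotypes, i: int, j: int,
                        pattern: Tuple[int, int], n_perm: int = 1000, seed: int = 0) -> float:
    """One-sided permutation p-value for a pattern being more common in cases.

    Null hypothesis: equal pattern frequency in cases and controls.
    """
    observed = (pattern_frequency(cases, i, j, pattern)
                - pattern_frequency(controls, i, j, pattern))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    rng = random.Random(seed)           # fixed seed so the sketch is reproducible
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)             # reassign case/control labels at random
        diff = (pattern_frequency(pooled[:n_cases], i, j, pattern)
                - pattern_frequency(pooled[n_cases:], i, j, pattern))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

When many patterns are screened, the selected ones are the most extreme by construction, so a per-pattern p-value like this one would still need a correction for that selection.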
Affiliation(s)
- Atsuko Okazaki
  - Department of Diagnostics and Therapeutics of Intractable Diseases, Juntendo University, Bunkyo-ku, Tokyo 113-8421, Japan
  - Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10065, USA
- Sukanya Horpaopan
  - Department of Anatomy, Faculty of Medical Science, Naresuan University, Phitsanulok 65000, Thailand
- Qingrun Zhang
  - Department of Mathematics and Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada
- Matthew Randesi
  - Laboratory of the Biology of Addictive Diseases, Rockefeller University, New York, NY 10065, USA
- Jurg Ott
  - Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10065, USA
10. Cao C, Kwok D, Edie S, Li Q, Ding B, Kossinna P, Campbell S, Wu J, Greenberg M, Long Q. kTWAS: integrating kernel machine with transcriptome-wide association studies improves statistical power and reveals novel genes. Brief Bioinform 2021; 22:5985285. [PMID: 33200776 DOI: 10.1093/bib/bbaa270] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 07/01/2020] [Revised: 09/17/2020] [Accepted: 09/18/2020] [Indexed: 12/31/2022] Open
Abstract
The power of genotype-phenotype association mapping studies increases greatly when contributions from multiple variants in a focal region are meaningfully aggregated. Currently, there are two popular categories of variant aggregation methods. Transcriptome-wide association studies (TWAS) represent a set of emerging methods that select variants based on their effect on gene expression, providing pretrained linear combinations of variants for downstream association mapping. In contrast, kernel methods such as the sequence kernel association test (SKAT) model genotypic and phenotypic variance using various kernel functions that capture genetic similarity between subjects, allowing nonlinear effects to be included. From the perspective of machine learning, these two methods cover two complementary aspects of feature engineering: feature selection/pruning and feature aggregation. Thus far, no thorough comparison has been made between these categories, and no methods exist that incorporate the advantages of both TWAS- and kernel-based methods. In this work, we developed a novel method called kernel-based TWAS (kTWAS) that applies TWAS-like feature selection to a SKAT-like kernel association test, combining the strengths of both approaches. Through extensive simulations, we demonstrate that kTWAS has higher power than TWAS and multiple SKAT-based protocols, and we identify novel disease-associated genes in Wellcome Trust Case Control Consortium genotyping array data and MSSNG (Autism) sequence data. The source code for kTWAS and our simulations is available in our GitHub repository (https://github.com/theLongLab/kTWAS).
Affiliation(s)
- Chen Cao
  - Department of Biochemistry & Molecular Biology, University of Calgary
- Devin Kwok
  - Department of Mathematics & Statistics, University of Calgary
- Qing Li
  - Department of Biochemistry & Molecular Biology, University of Calgary
- Bowei Ding
  - Department of Mathematics & Statistics, University of Calgary
- Pathum Kossinna
  - Department of Biochemistry & Molecular Biology, University of Calgary
- Jingjing Wu
  - Department of Mathematics & Statistics, University of Calgary
- Quan Long
  - Departments of Biochemistry & Molecular Biology, Medical Genetics and Mathematics & Statistics
11. Kosnik MB, Enroth S, Karlsson O. Distinct genetic regions are associated with differential population susceptibility to chemical exposures. Environ Int 2021; 152:106488. [PMID: 33714141 DOI: 10.1016/j.envint.2021.106488] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/04/2021] [Revised: 02/22/2021] [Accepted: 02/24/2021] [Indexed: 06/12/2023]
Abstract
Interactions between environmental factors and genetics underlie the majority of chronic human diseases. Chemical exposures are likely an underestimated contributor, yet gene-environment (GxE) interaction studies rarely assess their modifying effects. Here, we describe a novel method to profile the human genome and identify regions associated with differential population susceptibility to chemical exposures. Single nucleotide polymorphisms (SNPs) implicated in enriched chemical-disease intersections were identified and validated for three chemical classes with expected GxE interaction potential (neuroactive, hepatoactive, and cardioactive compounds). The same approach was then used to characterize consumer product classes with unknown risk for GxE interactions (washing products, cosmetics, and adhesives). Additionally, high-risk variant sets that may confer differential population susceptibility were identified for these consumer product groups through frequent itemset mining and pathway analysis. A dataset of 2454 consumer product chemical-disease linkages, with risk values, SNPs, and pathways for each association was developed, describing the interplay between environmental factors and genetics in human disease progression. We found that genetic hotspots implicated in GxE interactions differ across chemical classes (e.g., washing products had high-risk SNPs implicated in nervous system disease) and illustrate how this approach can discover new associations (e.g., washing product n-butoxyethanol implicated SNPs in the PI3K-Akt signaling pathway for Alzheimer's disease). Hence, our approach can predict high-risk genetic regions for differential population susceptibility to chemical exposures and characterize chemical modifying factors in specific diseases. These methods show promise for describing how chemical exposures can lead to varied health outcomes in a population and for incorporating inter-individual variability into chemical risk assessment.
Affiliation(s)
- Marissa B Kosnik
  - Science for Life Laboratory, Department of Environmental Science, Stockholm University, 114 18 Stockholm, Sweden
- Stefan Enroth
  - Department of Immunology, Genetics, and Pathology, Biomedical Center, Science for Life Laboratory Uppsala, Uppsala University, 751 85 Uppsala, Sweden
- Oskar Karlsson
  - Science for Life Laboratory, Department of Environmental Science, Stockholm University, 114 18 Stockholm, Sweden
12. EpiMOGA: An Epistasis Detection Method Based on a Multi-Objective Genetic Algorithm. Genes (Basel) 2021; 12:genes12020191. [PMID: 33525573 PMCID: PMC7911965 DOI: 10.3390/genes12020191] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/26/2020] [Revised: 01/12/2021] [Accepted: 01/25/2021] [Indexed: 12/28/2022] Open
Abstract
In genome-wide association studies, detecting high-order epistasis is important for analyzing the occurrence of complex human diseases and explaining missing heritability. However, there are various challenges in the actual high-order epistasis detection process due to the large amount of data, “small sample size problem”, diversity of disease models, etc. This paper proposes a multi-objective genetic algorithm (EpiMOGA) for single nucleotide polymorphism (SNP) epistasis detection. The K2 score based on the Bayesian network criterion and the Gini index of the diversity of the binary classification problem were used to guide the search process of the genetic algorithm. Experiments were performed on 26 simulated datasets of different models and a real Alzheimer’s disease dataset. The results indicated that EpiMOGA was obviously superior to other related and competitive methods in both detection efficiency and accuracy, especially for small-sample-size datasets, and the performance of EpiMOGA remained stable across datasets of different disease models. At the same time, a number of SNP loci and 2-order epistasis associated with Alzheimer’s disease were identified by the EpiMOGA method, indicating that this method is capable of identifying high-order epistasis from genome-wide data and can be applied in the study of complex diseases.
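To make the Gini criterion mentioned in this abstract concrete, the sketch below computes a weighted Gini impurity for a candidate SNP combination: samples are grouped by their joint genotype, and each group's case/control impurity is weighted by group size. This is the textbook definition and is offered as an assumption about the general idea; the abstract does not give EpiMOGA's exact objective, and the names used here are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def gini_score(genotype_combos: Sequence[Hashable], labels: Sequence[int]) -> float:
    """Weighted Gini impurity of case/control labels within each genotype combination.

    Lower values mean the SNP combination separates cases from controls better.
    """
    if len(genotype_combos) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    cells: Dict[Hashable, Counter] = defaultdict(Counter)
    for combo, label in zip(genotype_combos, labels):
        cells[combo][label] += 1                    # class counts per genotype cell
    n = len(labels)
    total = 0.0
    for counts in cells.values():
        size = sum(counts.values())
        impurity = 1.0 - sum((c / size) ** 2 for c in counts.values())
        total += (size / n) * impurity              # weight each cell by its share of samples
    return total
```

A combination that perfectly splits cases from controls scores 0, while one carrying no information about a balanced case/control sample scores 0.5.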
|
13
|
Chen X, Wong KC. Epistasis Detection Based on Epi-GTBN. Methods Mol Biol 2021; 2212:325-335. [PMID: 33733365 DOI: 10.1007/978-1-0716-0947-7_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Epistasis detection is a hot topic in bioinformatics due to its relevance to the detection of specific phenotypic traits and gene-gene interactions. Here, we present a step-by-step protocol for applying Epi-GTBN, a machine learning method that combines a genetic algorithm with a Bayesian network to mine epistatic loci effectively. Epi-GTBN exploits the genetic algorithm's ability to search globally and avoid local optima, incorporating it into Bayesian network learning to obtain the best model structure. In this chapter, we describe an example of Epi-GTBN to help researchers analyze epistasis and gene-gene interactions in their own datasets and build the corresponding SNP-SNP network.
Affiliation(s)
- Xingjian Chen
- City University of Hong Kong, Kowloon Tong, Hong Kong
| | - Ka-Chun Wong
- Department of Computer Science, College of Engineering, City University of Hong Kong, Kowloon Tong, Hong Kong.
| |
|
14
|
Epi-GTBN: an approach of epistasis mining based on genetic Tabu algorithm and Bayesian network. BMC Bioinformatics 2019; 20:444. [PMID: 31455207 PMCID: PMC6712799 DOI: 10.1186/s12859-019-3022-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 10/04/2018] [Accepted: 08/07/2019] [Indexed: 12/31/2022] Open
Abstract
Background Mining epistatic loci that affect specific phenotypic traits is an important research issue in biology. A Bayesian network (BN) is a graphical model that can express the relationship between genetic loci and phenotype, and it has been widely used for epistasis mining. However, this method has two disadvantages: low learning efficiency and a tendency to fall into local optima. A genetic algorithm offers rapid global search and avoids falling into local optima; it is also scalable and easy to integrate with other algorithms. This work proposes an epistasis mining approach based on a genetic tabu algorithm and a Bayesian network (Epi-GTBN). It uses a genetic algorithm as the heuristic search strategy for the Bayesian network. Individual structures evolve through the genetic operations of selection, crossover and mutation, which helps to find the optimal network structure and, in turn, to mine epistatic loci effectively. To enhance the diversity of the population and obtain a more effective global optimal solution, we apply a tabu search strategy within the crossover and mutation operations of the genetic algorithm, which helps to accelerate convergence. Results We compared Epi-GTBN with other recent algorithms using both simulated and real datasets. The experimental results demonstrate that our method achieves much better epistasis detection accuracy without sacrificing efficiency across different datasets. Conclusions The presented methodology (Epi-GTBN) is an effective method for epistasis detection, and it can be seen as an interesting addition to the arsenal used in complex trait analyses.
|
15
|
Kao PYP, Leung KH, Chan LWC, Yip SP, Yap MKH. Pathway analysis of complex diseases for GWAS, extending to consider rare variants, multi-omics and interactions. Biochim Biophys Acta Gen Subj 2016; 1861:335-353. [PMID: 27888147 DOI: 10.1016/j.bbagen.2016.11.030] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 06/22/2016] [Revised: 10/17/2016] [Accepted: 11/19/2016] [Indexed: 12/20/2022]
Abstract
BACKGROUND Genome-wide association studies (GWAS) are a major method for studying the genetics of complex diseases. Finding all sequence variants to explain fully the aetiology of a disease is difficult because of their small effect sizes. To better explain disease mechanisms, pathway analysis is used to consolidate the effects of multiple variants, and hence increase the power of the study. While pathway analysis has previously been performed within GWAS only, it can now be extended to examining rare variants, other "-omics" and interaction data. SCOPE OF REVIEW 1. Factors to consider in the choice of software for GWAS pathway analysis. 2. Examples of how pathway analysis is used to analyse rare variants, other "-omics" and interaction data. MAJOR CONCLUSIONS To choose appropriate software tools, factors for consideration include covariate compatibility, null hypothesis, whether one- or two-step analysis is required, curation method of gene sets, size of pathways, and size of flanking regions to define gene boundaries. For rare variants, analysis performance depends on consistency between the assumed and actual effect distribution of variants. Integration of other "-omics" data and interaction data can better explain gene functions. GENERAL SIGNIFICANCE Pathway analysis methods will be more readily used for integration of multiple sources of data, and enable more accurate prediction of phenotypes.
Affiliation(s)
- Patrick Y P Kao
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Kim Hung Leung
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Lawrence W C Chan
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Shea Ping Yip
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China.
| | - Maurice K H Yap
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hong Kong SAR, China
| |
|
16
|
Woo HJ, Yu C, Kumar K, Gold B, Reifman J. Genotype distribution-based inference of collective effects in genome-wide association studies: insights to age-related macular degeneration disease mechanism. BMC Genomics 2016; 17:695. [PMID: 27576376 PMCID: PMC5006276 DOI: 10.1186/s12864-016-2871-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/18/2015] [Accepted: 07/01/2016] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Genome-wide association studies provide important insights to the genetic component of disease risks. However, an existing challenge is how to incorporate collective effects of interactions beyond the level of independent single nucleotide polymorphism (SNP) tests. While methods considering each SNP pair separately have provided insights, a large portion of expected heritability may reside in higher-order interaction effects. RESULTS We describe an inference approach (discrete discriminant analysis; DDA) designed to probe collective interactions while treating both genotypes and phenotypes as random variables. The genotype distributions in case and control groups are modeled separately based on empirical allele frequency and covariance data, whose differences yield disease risk parameters. We compared pairwise tests and collective inference methods, the latter based both on DDA and logistic regression. Analyses using simulated data demonstrated that significantly higher sensitivity and specificity can be achieved with collective inference in comparison to pairwise tests, and with DDA in comparison to logistic regression. Using age-related macular degeneration (AMD) data, we demonstrated two possible applications of DDA. In the first application, a genome-wide SNP set is reduced into a small number (∼100) of variants via filtering and SNP pairs with significant interactions are identified. We found that interactions between SNPs with highest AMD association were epigenetically active in the liver, adipocytes, and mesenchymal stem cells. In the other application, multiple groups of SNPs were formed from the genome-wide data and their relative strengths of association were compared using cross-validation. This analysis allowed us to discover novel collections of loci for which interactions between SNPs play significant roles in their disease association. 
In particular, we considered pathway-based groups of SNPs containing up to ∼10,000 variants in each group. In addition to pathways related to complement activation, our collective inference pointed to pathway groups involved in phospholipid synthesis, oxidative stress, and apoptosis, consistent with the AMD pathogenesis mechanism where the dysfunction of retinal pigment epithelium cells plays central roles. CONCLUSIONS The simultaneous inference of collective interaction effects within a set of SNPs has the potential to reveal novel aspects of disease association.
Affiliation(s)
- Hyung Jun Woo
- Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland, USA
| | - Chenggang Yu
- Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland, USA
| | - Kamal Kumar
- Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland, USA
| | - Bert Gold
- Laboratory of Genomic Diversity, National Cancer Institute, Frederick, Maryland, USA
| | - Jaques Reifman
- Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland, USA.
| |
|
17
|
Simon PHG, Sylvestre MP, Tremblay J, Hamet P. Key Considerations and Methods in the Study of Gene-Environment Interactions. Am J Hypertens 2016; 29:891-9. [PMID: 27037711 DOI: 10.1093/ajh/hpw021] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 12/18/2015] [Accepted: 02/08/2016] [Indexed: 12/16/2022] Open
Abstract
With increased involvement of genetic data in most epidemiological investigations, gene-environment (G × E) interactions now stand as a topic that must be meticulously assessed and thoroughly understood. The level, mode, and outcomes of interactions between environmental factors and genetic traits have the capacity to modulate disease risk. These must, therefore, be carefully evaluated as they have the potential to offer novel insights on the "missing heritability problem", reaching beyond our current limitations. First, we review a definition of G × E interactions. We then explore how concepts such as the early manifestation of the genetic components of a disease, the heterogeneity of complex traits, the clear definition of epidemiological strata, and the effect of varying physiological conditions can affect our capacity to detect (or miss) G × E interactions. Lastly, we discuss the shortfalls of regression models to study G × E interactions and how other methods such as the ReliefF algorithm, pattern recognition methods, or the LASSO (Least Absolute Shrinkage and Selection Operator) method can enable us to more adequately model G × E interactions. Overall, we present the elements to consider and a path to follow when studying genetic determinants of disease in order to uncover potential G × E interactions.
Affiliation(s)
- Paul H G Simon
- CHUM Research Center, Centre hospitalier de l'Université de Montréal, Montréal, Québec, Canada
| | - Marie-Pierre Sylvestre
- CHUM Research Center, Centre hospitalier de l'Université de Montréal, Montréal, Québec, Canada
| | - Johanne Tremblay
- CHUM Research Center, Centre hospitalier de l'Université de Montréal, Montréal, Québec, Canada
| | - Pavel Hamet
- CHUM Research Center, Centre hospitalier de l'Université de Montréal, Montréal, Québec, Canada.
| |
|
18
|
FHSA-SED: Two-Locus Model Detection for Genome-Wide Association Study with Harmony Search Algorithm. PLoS One 2016; 11:e0150669. [PMID: 27014873 PMCID: PMC4807955 DOI: 10.1371/journal.pone.0150669] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 09/01/2015] [Accepted: 02/16/2016] [Indexed: 12/24/2022] Open
Abstract
Motivation The two-locus model is a typical significant disease model to be identified in genome-wide association studies (GWAS). Owing to the intensive computational burden and the diversity of disease models, existing methods suffer from low detection power, high computation cost, and a preference for some types of disease models. Method In this study, two scoring functions (the Bayesian network-based K2-score and the Gini-score) are used to characterize two SNP loci as a candidate model; the two criteria are adopted simultaneously to improve identification power and to tackle the preference for particular disease models. The harmony search algorithm (HSA) is improved to quickly find the most likely candidate models among all two-locus models, and a local search algorithm with a two-dimensional tabu table is presented to avoid repeatedly evaluating disease models that have strong marginal effects. Finally, the G-test statistic is used to further test the candidate models. Results We investigate our method, named FHSA-SED, on 82 simulated datasets and a real AMD dataset, and compare it with two typical methods (MACOED and CSE) that were developed recently based on swarm intelligence search algorithms. The results of the simulation experiments indicate that our method outperforms the two compared algorithms in terms of detection power, computation time, evaluation times, sensitivity (TPR), specificity (SPC), positive predictive value (PPV) and accuracy (ACC). Our method identified two SNPs (rs3775652 and rs10511467) that may also be associated with disease in the AMD dataset.
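The final G-test step mentioned in the abstract can be illustrated with a short sketch. It assumes the standard likelihood-ratio form G = 2 Σ O ln(O/E) applied to a genotype-combination by case/control contingency table; the function name, the table layout, and the toy counts are hypothetical and do not come from the FHSA-SED code.

```python
from math import log


def g_test(table):
    """Return (G statistic, degrees of freedom) for a contingency table.

    Rows are two-locus genotype combinations, columns are [controls, cases].
    Cells with an observed count of zero contribute nothing to the sum.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g_sum = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue
            expected = row_totals[i] * col_totals[j] / total
            g_sum += observed * log(observed / expected)
    # Empty rows or columns carry no information, so exclude them from df.
    used_rows = sum(1 for r in row_totals if r > 0)
    used_cols = sum(1 for c in col_totals if c > 0)
    return 2.0 * g_sum, (used_rows - 1) * (used_cols - 1)


# Hypothetical counts for three observed genotype combinations.
toy_table = [[30, 10], [20, 20], [10, 30]]
g_value, dof = g_test(toy_table)
```

Under the null hypothesis of no association, G is approximately chi-square distributed with the returned degrees of freedom, so a p-value can be obtained from any chi-square survival function.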
|
19
|
Chimusa ER, Mbiyavanga M, Mazandu GK, Mulder NJ. ancGWAS: a post genome-wide association study method for interaction, pathway and ancestry analysis in homogeneous and admixed populations. Bioinformatics 2015; 32:549-56. [PMID: 26508762 DOI: 10.1093/bioinformatics/btv619] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 12/31/2014] [Accepted: 10/16/2015] [Indexed: 12/17/2022] Open
Abstract
MOTIVATION Despite numerous successful Genome-wide Association Studies (GWAS), detecting variants that have low disease risk still poses a challenge. GWAS may miss disease genes with weak genetic effects or strong epistatic effects due to the single-marker testing approach commonly used. GWAS may thus generate false negative or inconclusive results, suggesting the need for novel methods to combine effects of single nucleotide polymorphisms within a gene to increase the likelihood of fully characterizing the susceptibility gene. RESULTS We developed ancGWAS, an algebraic graph-based centrality measure that accounts for linkage disequilibrium in identifying significant disease sub-networks by integrating the association signal from GWAS data sets into the human protein-protein interaction (PPI) network. We validated ancGWAS using an association study result from a breast cancer data set, simulated interactive disease loci in a simulated complex admixed population, and a pathway-based GWAS simulation. This new approach holds promise for deconvoluting the interactions between genes underlying the pathogenesis of complex diseases. The results yield a novel central breast cancer sub-network of the human interactome implicated in the proteoglycan syndecan-mediated signaling events pathway, which is known to play a major role in mesenchymal tumor cell proliferation, thus providing further insights into breast cancer pathogenesis. AVAILABILITY AND IMPLEMENTATION The ancGWAS package and documents are available at http://www.cbio.uct.ac.za/~emile/software.html.
Affiliation(s)
- Emile R Chimusa
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, 7925, Observatory, South Africa
| | - Mamana Mbiyavanga
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, 7925, Observatory, South Africa and African Institute for Mathematical Sciences, 7945 Muizenberg, Cape Town, South Africa
| | - Gaston K Mazandu
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, 7925, Observatory, South Africa and African Institute for Mathematical Sciences, 7945 Muizenberg, Cape Town, South Africa
| | - Nicola J Mulder
- Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, 7925, Observatory, South Africa
| |
|
20
|
Huh I, Kwon MS, Park T. An Efficient Stepwise Statistical Test to Identify Multiple Linked Human Genetic Variants Associated with Specific Phenotypic Traits. PLoS One 2015; 10:e0138700. [PMID: 26406920 PMCID: PMC4583484 DOI: 10.1371/journal.pone.0138700] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/04/2015] [Accepted: 09/02/2015] [Indexed: 11/19/2022] Open
Abstract
Recent advances in genotyping methodologies have allowed genome-wide association studies (GWAS) to accurately identify genetic variants that associate with common or pathological complex traits. Although most GWAS have focused on associations with single genetic variants, joint identification of multiple genetic variants, and how they interact, is essential for understanding the genetic architecture of complex phenotypic traits. Here, we propose an efficient stepwise method based on the Cochran-Mantel-Haenszel test (for stratified categorical data) to identify causal joint multiple genetic variants in GWAS. This method combines the CMH statistic with a stepwise procedure to detect multiple genetic variants associated with specific categorical traits, using a series of associated I × J contingency tables and a null hypothesis of no phenotype association. Through a new stratification scheme based on the sum of minor allele count criteria, we make the method more feasible for GWAS data having sample sizes of several thousands. We also examine the properties of the proposed stepwise method via simulation studies, and show that the stepwise CMH test performs better than other existing methods (e.g., logistic regression and detection of associations by Markov blanket) for identifying multiple genetic variants. Finally, we apply the proposed approach to two genomic sequencing datasets to detect linked genetic variants associated with bipolar disorder and obesity, respectively.
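To make the building block of this method concrete, the sketch below computes the classic Cochran-Mantel-Haenszel statistic for a series of 2 × 2 tables. This is a deliberate simplification: the abstract describes a generalized version over I × J tables combined with a stepwise procedure and a stratification by summed minor allele counts, none of which is reproduced here. The function name, the table layout, and the toy strata are hypothetical.

```python
def cmh_statistic(strata, continuity=False):
    """Cochran-Mantel-Haenszel chi-square statistic for 2 x 2 x K data.

    Each stratum is ((a, b), (c, d)), for example rows = variant carrier
    or non-carrier and columns = case or control. Strata with fewer than
    two samples are skipped because their variance term is undefined.
    """
    diff_sum = 0.0
    var_sum = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:
            continue
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n  # observed minus expected count
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var_sum == 0:
        return 0.0
    numerator = abs(diff_sum)
    if continuity:
        numerator = max(numerator - 0.5, 0.0)
    return numerator * numerator / var_sum


# Hypothetical strata, e.g. groups of samples sharing a minor allele count sum.
toy_strata = [((15, 5), (5, 15)), ((8, 12), (6, 14))]
toy_cmh = cmh_statistic(toy_strata)
```

For 2 × 2 tables the statistic is compared against a chi-square distribution with one degree of freedom; pooling evidence across strata in this way is what lets the test adjust for the stratifying variable.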
Affiliation(s)
- Iksoo Huh
- Department of Statistics, Seoul National University, Gwanak-gu, Seoul, Korea
| | - Min-Seok Kwon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-gu, Seoul, Korea
| | - Taesung Park
- Department of Statistics, Seoul National University, Gwanak-gu, Seoul, Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-gu, Seoul, Korea
| |
|