1
Ghamlouch H, Boyle EM, Blaney P, Wang Y, Choi J, Williams L, Bauer M, Auclair D, Bruno B, Walker BA, Davies FE, Morgan GJ. Insights into high-risk multiple myeloma from an analysis of the role of PHF19 in cancer. J Exp Clin Cancer Res 2021; 40:380. [PMID: 34857028 PMCID: PMC8638425 DOI: 10.1186/s13046-021-02185-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/12/2021] [Accepted: 11/13/2021] [Indexed: 02/07/2023]
Abstract
Despite improvements in outcome, 15-25% of patients with newly diagnosed multiple myeloma (MM) have treatment-resistant, high-risk (HR) disease with poor survival. The lack of a genetic basis for HR has focused attention on the role played by epigenetic changes. Aberrant expression and somatic mutations affecting genes involved in regulating tri-methylation of lysine 27 on histone H3 (H3K27me3) are common in cancer. H3K27me3 is catalyzed by EZH2, the catalytic subunit of Polycomb Repressive Complex 2 (PRC2). Deregulation of H3K27me3 has been shown to contribute to oncogenic transformation and tumor progression in a variety of hematological malignancies, including MM. We recently showed that aberrant overexpression of the PRC2 subunit PHD Finger Protein 19 (PHF19) is the most significant overall contributor to HR status, further focusing attention on the role of epigenetic change in MM. By modulating both the catalytic activity and the recruitment of PRC2/EZH2, PHF19 regulates the expression of key genes involved in cell growth and differentiation. Here we review the expression, regulation and function of PHF19 in both normal contexts and the pathological contexts of solid cancers and MM. We present evidence that strongly implicates PHF19 in regulating genes important for the cell cycle and the genetic stability of MM cells, making it highly relevant to HR MM behavior. A detailed understanding of the normal and pathological functions of PHF19 will allow us to design therapeutic strategies able to target aggressive subsets of MM.
Affiliation(s)
- Hussein Ghamlouch
- Myeloma Research Program, NYU Langone Medical Center, Perlmutter Cancer Center, 522 1st Avenue, Manhattan, New York City, NY, 10016, USA.
- Eileen M Boyle
- Myeloma Research Program, NYU Langone Medical Center, Perlmutter Cancer Center, 522 1st Avenue, Manhattan, New York City, NY, 10016, USA
- Patrick Blaney
- Myeloma Research Program, NYU Langone Medical Center, Perlmutter Cancer Center, 522 1st Avenue, Manhattan, New York City, NY, 10016, USA
- Applied Bioinformatics Laboratories (ABL), NYU Langone Medical Center, New York, NY, USA
- Yubao Wang
- Myeloma Research Program, NYU Langone Medical Center, Perlmutter Cancer Center, 522 1st Avenue, Manhattan, New York City, NY, 10016, USA
- Jinyoung Choi
- Myeloma Research Program, NYU Langone Medical Center, Perlmutter Cancer Center, 522 1st Avenue, Manhattan, New York City, NY, 10016, USA
- Louis Williams
- Myeloma Research Program, NYU Langone Medical Center, Perlmutter Cancer Center, 522 1st Avenue, Manhattan, New York City, NY, 10016, USA
- Michael Bauer
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Daniel Auclair
- The Multiple Myeloma Research Foundation (MMRF), Norwalk, CT, USA
- Benedetto Bruno
- Myeloma Research Program, NYU Langone Medical Center, Perlmutter Cancer Center, 522 1st Avenue, Manhattan, New York City, NY, 10016, USA
- Brian A Walker
- Division of Hematology Oncology, Indiana University, Indianapolis, IN, USA
- Faith E Davies
- Myeloma Research Program, NYU Langone Medical Center, Perlmutter Cancer Center, 522 1st Avenue, Manhattan, New York City, NY, 10016, USA
- Gareth J Morgan
- Myeloma Research Program, NYU Langone Medical Center, Perlmutter Cancer Center, 522 1st Avenue, Manhattan, New York City, NY, 10016, USA.
2
Zhang W, Li J, Guo Y, Zhang L, Xu L, Gao X, Zhu B, Gao H, Ni H, Chen Y. Multi-strategy genome-wide association studies identify the DCAF16-NCAPG region as a susceptibility locus for average daily gain in cattle. Sci Rep 2016; 6:38073. [PMID: 27892541 PMCID: PMC5125095 DOI: 10.1038/srep38073] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 04/22/2016] [Accepted: 11/04/2016] [Indexed: 01/16/2023]
Abstract
Average daily gain (ADG) is the most economically important trait in the beef cattle industry. Using genome-wide association study (GWAS) approaches, previous studies have identified several causal variants within the PLAG1, NCAPG and LCORL genes for ADG in cattle. In this study, we implemented multi-strategy GWASs based on the genotypes of 1,173 Simmental cattle to improve detection and to explore the causal genes and regions. In the SNP-based GWAS, the most significant SNPs (rs109303784 and rs110058857, P = 1.78 × 10⁻⁷) were located in an NCAPG intron on BTA6 and explained 4.01% of the phenotypic variance, and an independent significant SNP (rs110406669, P = 5.18 × 10⁻⁶) explained 3.32%. Similarly, in the haplotype-based GWAS, the most significant haplotype block, Hap-6-N1416 (P = 2.56 × 10⁻⁸), spanned 12.7 kb on BTA6 and explained 4.85% of the phenotypic variance. In the gene-based GWAS, seven significant genes were identified, including DCAF16 and NCAPG. Moreover, analysis of transcript levels confirmed that the transcript abundances of NCAPG (P = 0.046) and DCAF16 (P = 0.046) were significantly correlated with ADG. Overall, our multi-strategy GWAS results reveal the DCAF16-NCAPG region to be a susceptibility locus for ADG in cattle.
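For a single marker, the "percentage of phenotypic variance explained" figures quoted above correspond to the R² of regressing the trait on genotype. A minimal Python sketch of that textbook calculation follows, assuming additive 0/1/2 genotype coding and no covariates; the function name `variance_explained` is hypothetical, and the study's own models (including its haplotype- and gene-based statistics) are not reproduced.

```python
# Share of phenotypic variance explained (PVE) by a single SNP.
# Assumptions (not from the study): additive 0/1/2 genotype coding,
# no covariates, simple least-squares regression of trait on genotype.
from statistics import fmean


def variance_explained(genotypes: list[int], phenotypes: list[float]) -> float:
    """Return R^2 of the regression phenotype ~ genotype (0.0 to 1.0)."""
    if len(genotypes) != len(phenotypes) or len(genotypes) < 2:
        raise ValueError("need two equal-length lists with >= 2 subjects")
    g_mean, p_mean = fmean(genotypes), fmean(phenotypes)
    s_gp = sum((g - g_mean) * (p - p_mean) for g, p in zip(genotypes, phenotypes))
    s_gg = sum((g - g_mean) ** 2 for g in genotypes)
    s_pp = sum((p - p_mean) ** 2 for p in phenotypes)
    if s_gg == 0 or s_pp == 0:
        # A monomorphic SNP or a constant trait explains no variance.
        return 0.0
    # With one predictor, R^2 equals the squared Pearson correlation.
    return s_gp * s_gp / (s_gg * s_pp)


# Usage: a perfectly additive toy example explains all of the variance.
print(variance_explained([0, 1, 2], [1.0, 2.0, 3.0]))  # 1.0
```

A reported 4.01% would correspond to a return value of about 0.0401 under this simplified model.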
Affiliation(s)
- Wengang Zhang
- Cattle Genetics and Breeding Group, Institute of Animal Science (IAS), Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
- Junya Li
- Cattle Genetics and Breeding Group, Institute of Animal Science (IAS), Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
- Yong Guo
- Animal Science and Technology College, Beijing University of Agriculture (BUA), Beijing 102206, China
- Lupei Zhang
- Cattle Genetics and Breeding Group, Institute of Animal Science (IAS), Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
- Lingyang Xu
- Cattle Genetics and Breeding Group, Institute of Animal Science (IAS), Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
- Xue Gao
- Cattle Genetics and Breeding Group, Institute of Animal Science (IAS), Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
- Bo Zhu
- Cattle Genetics and Breeding Group, Institute of Animal Science (IAS), Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
- Huijiang Gao
- Cattle Genetics and Breeding Group, Institute of Animal Science (IAS), Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
- Hemin Ni
- Animal Science and Technology College, Beijing University of Agriculture (BUA), Beijing 102206, China
- Yan Chen
- Cattle Genetics and Breeding Group, Institute of Animal Science (IAS), Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China
3
Wen Y, He Z, Li M, Lu Q. Risk Prediction Modeling of Sequencing Data Using a Forward Random Field Method. Sci Rep 2016; 6:21120. [PMID: 26892725 PMCID: PMC4759688 DOI: 10.1038/srep21120] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/26/2015] [Accepted: 01/18/2016] [Indexed: 11/09/2022]
Abstract
With advances in high-throughput sequencing technology, it is now feasible to investigate the role of common and rare variants in disease risk prediction. While the new technology holds great promise for improving disease prediction, the massive amount of data and the low frequency of rare variants pose great analytical challenges for risk prediction modeling. In this paper, we develop a forward random field (FRF) method for risk prediction modeling using sequencing data. In FRF, subjects' phenotypes are treated as stochastic realizations of a random field on a genetic space formed by subjects' genotypes, so that an individual's phenotype can be predicted from neighboring subjects with similar genotypes. The FRF method allows for multiple similarity measures and candidate genes in the model, and adaptively chooses the optimal similarity measure and the disease-associated genes to reflect the underlying disease model. It also avoids specifying a threshold for rare variants and allows for genetic effects of different directions and magnitudes. Through simulations, we demonstrate that the FRF method attains higher or comparable accuracy relative to commonly used support vector machine-based methods under various disease models. We further illustrate the FRF method with an application to sequencing data from the Dallas Heart Study.
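The core FRF idea, predicting a subject's phenotype from genetically similar subjects, can be illustrated with a similarity-weighted average. The Python sketch below is hypothetical: it uses identity-by-state (IBS) similarity and a plain weighted mean as stand-ins, the function names are illustrative, and it omits FRF's forward selection of genes and similarity measures as well as its actual random-field estimator.

```python
# Similarity-weighted phenotype prediction, loosely in the spirit of FRF.
# Hypothetical stand-ins: IBS similarity and a plain weighted mean. FRF's
# forward selection of genes/similarity measures is NOT implemented here.


def ibs_similarity(a: list[int], b: list[int]) -> float:
    """Identity-by-state similarity of two 0/1/2 genotype vectors, in [0, 1]."""
    if len(a) != len(b) or not a:
        raise ValueError("genotype vectors must be non-empty and equal length")
    # Each SNP contributes 2 - |a - b| shared alleles out of a possible 2.
    return sum(2 - abs(x - y) for x, y in zip(a, b)) / (2 * len(a))


def predict_phenotype(target: list[int],
                      train_genotypes: list[list[int]],
                      train_phenotypes: list[float]) -> float:
    """Average the training phenotypes, weighted by similarity to the target."""
    weights = [ibs_similarity(target, g) for g in train_genotypes]
    total = sum(weights)
    if total == 0:
        # The target shares no alleles with anyone: fall back to the mean.
        return sum(train_phenotypes) / len(train_phenotypes)
    return sum(w * y for w, y in zip(weights, train_phenotypes)) / total


# Usage: a heterozygote sits halfway between the two homozygous subjects.
print(predict_phenotype([1, 1], [[0, 0], [2, 2]], [1.0, 5.0]))  # 3.0
```

Swapping `ibs_similarity` for another measure is where a method like FRF would choose among candidate similarity measures.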
Affiliation(s)
- Yalu Wen
- Department of Statistics, University of Auckland, Auckland 1010, New Zealand
- Zihuai He
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A
- Ming Li
- Department of Epidemiology and Biostatistics, Indiana University at Bloomington, Bloomington, IN 47405, U.S.A
- Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, U.S.A
4
Lin X, Deng FY, Lu X, Lei SF. Susceptibility Genes for Multiple Sclerosis Identified in a Gene-Based Genome-Wide Association Study. J Clin Neurol 2015; 11:311-8. [PMID: 26320842 PMCID: PMC4596110 DOI: 10.3988/jcn.2015.11.4.311] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 10/13/2014] [Revised: 11/22/2014] [Accepted: 11/26/2014] [Indexed: 01/18/2023]
Abstract
Background and Purpose: Multiple sclerosis (MS) is a demyelinating and inflammatory disease of the central nervous system. The aim of this study was to identify additional genes associated with MS. Methods: Using publicly available single-nucleotide polymorphism (SNP)-based genome-wide association study (GWAS) data from the database of Genotypes and Phenotypes (dbGaP), we conducted a gene-based GWAS in an initial sample of 931 family trios and a replication sample of 978 cases and 883 controls. For genes of interest, gene expression in MS-related cells was compared between MS cases and controls using publicly available datasets. Results: A total of 58 genes were identified, including 20 "novel" genes significantly associated with MS (p < 1.40 × 10⁻⁴). In the replication study, 44 of the 58 identified genes had been genotyped, and the association replicated for 35 of them. In the gene-expression study, 21 of the 58 identified genes were differentially expressed in MS-related cells. Thus, 15 novel genes were supported by replicated association and/or differential expression. In particular, four of the novel genes, encoding myelin oligodendrocyte glycoprotein (MOG), coiled-coil alpha-helical rod protein 1 (CCHCR1), human leukocyte antigen complex group 22 (HCG22), and major histocompatibility complex, class II, DM alpha (HLA-DMA), were supported by both lines of evidence. Conclusions: These results emphasize the high power of gene-based GWAS for detecting MS susceptibility genes. The novel genes identified here may provide new insights into the molecular genetic mechanisms underlying MS.
Affiliation(s)
- Xiang Lin
- Center for Genetic Epidemiology and Genomics, School of Public Health, Soochow University, Suzhou, Jiangsu, People's Republic of China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, School of Public Health, Soochow University, Suzhou, Jiangsu, People's Republic of China; Department of Tuberculosis Control, Ningbo Municipal Center for Disease Control & Prevention, Ningbo, Zhejiang, People's Republic of China
- Fei Yan Deng
- Center for Genetic Epidemiology and Genomics, School of Public Health, Soochow University, Suzhou, Jiangsu, People's Republic of China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, School of Public Health, Soochow University, Suzhou, Jiangsu, People's Republic of China
- Xin Lu
- Center for Genetic Epidemiology and Genomics, School of Public Health, Soochow University, Suzhou, Jiangsu, People's Republic of China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, School of Public Health, Soochow University, Suzhou, Jiangsu, People's Republic of China
- Shu Feng Lei
- Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, School of Public Health, Soochow University, Suzhou, Jiangsu, People's Republic of China; Center for Genetic Epidemiology and Genomics, School of Public Health, Soochow University, Suzhou, Jiangsu, People's Republic of China
5
Yuan Z, Zhang X, Li F, Zhao J, Xue F. Comparing partial least square approaches in a gene- or region-based association study for multiple quantitative phenotypes. Hum Biol 2014; 86:51-8. [PMID: 25401986 DOI: 10.3378/027.086.0106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 10/31/2013] [Indexed: 11/05/2022]
Abstract
When complex diseases are considered quantitatively, there are at least three statistical strategies for association studies: a single single-nucleotide polymorphism (SNP) on a single trait, a gene or region (with multiple SNPs) on a single trait, and a gene or region on multiple traits. The third approach is the most general for dissecting the genetic mechanisms of complex diseases that involve multiple quantitative traits. Gene- or region-based association methods built on partial least squares (PLS) have been shown to have an apparent power advantage. However, few approaches have been developed for the multiple quantitative phenotypes or traits underlying a condition or disease, and the performance of the various PLS approaches used in association studies of multiple quantitative traits has not been assessed. Here we explore the association between multiple SNPs and multiple phenotypes or traits from a regression perspective, through exhaustive scan statistics (sliding windows) based on PLS and sparse PLS regression. Simulations were conducted to assess the performance of the proposed scan statistics and to compare them with existing methods. The proposed methods were applied to 12 regions of genome-wide association study data from the European Prospective Investigation into Cancer (EPIC)-Norfolk study.
Affiliation(s)
- Zhongshang Yuan
- Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Shandong, China
- Xiaoshuai Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Shandong, China
- Fangyu Li
- Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Shandong, China
- Jinghua Zhao
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
- Fuzhong Xue
- Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Shandong, China
6
Yang HC, Lin CW, Chen CW, Chen JJ. Applying genome-wide gene-based expression quantitative trait locus mapping to study population ancestry and pharmacogenetics. BMC Genomics 2014; 15:319. [PMID: 24779372 PMCID: PMC4236814 DOI: 10.1186/1471-2164-15-319] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/19/2013] [Accepted: 04/15/2014] [Indexed: 01/13/2023]
Abstract
BACKGROUND: Gene-based analysis has become popular in genomic research because of its appealing biological and statistical properties compared with single-locus analysis. However, few, if any, studies have discussed mapping expression quantitative trait loci (eQTL) in a gene-based framework. None has discussed ancestry-informative eQTL or investigated their roles in pharmacogenetics by integrating single nucleotide polymorphism (SNP)-based eQTL (s-eQTL) and gene-based eQTL (g-eQTL). RESULTS: In this g-eQTL mapping study, the transcript expression levels of genes (transcript-level genes; T-genes) were correlated with the SNPs of genes (sequence-level genes; S-genes) using a gene-based partial least squares (PLS) method. Ancestry-informative transcripts were identified using a rank-score-based multivariate association test, and ancestry-informative eQTL were identified using Fisher's exact test. Furthermore, key ancestry-predictive eQTL were selected by flexible discriminant analysis. We analyzed SNPs and gene expression in 210 independent individuals of African, Asian and European descent. Using PLS, we identified numerous cis- and trans-acting g-eQTL and s-eQTL in each population. We observed that ancestry information was enriched in eQTL. Furthermore, we identified two ancestry-informative eQTL associated with adverse drug reactions and/or drug response. Rs1045642, located in MDR1, is an ancestry-informative eQTL (P = 2.13 × 10⁻¹³, Fisher's exact test) associated with adverse reactions to amitriptyline and nortriptyline and with response to morphine. Rs20455, located in KIF6, is an ancestry-informative eQTL (P = 2.76 × 10⁻²³, Fisher's exact test) associated with response to statins (e.g., pravastatin and atorvastatin). Ancestry-informative eQTL of drug biotransformation genes were also observed; cross-population cis-acting expression regulators included SPG7, TAP2, SLC7A7, and CYP4F2. Finally, we identified key ancestry-predictive eQTL and established classification models with promising training and testing accuracies for separating samples from closely related populations. CONCLUSIONS: We developed a gene-based PLS procedure and a SAS macro for identifying g-eQTL and s-eQTL, and established eQTL data archives for global populations. The program and data archives are accessible at http://www.stat.sinica.edu.tw/hsinchou/genetics/eQTL/HapMapII.htm. Our findings on the interrelationships among eQTL, ancestry information, and pharmacodynamics provide rich resources for future eQTL studies and for practical applications in population genetics and medical genetics.
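Fisher's exact test, used above to flag ancestry-informative eQTL, conditions on the margins of a 2×2 table and sums hypergeometric probabilities. A minimal standard-library Python sketch of the two-sided version follows; the function name is hypothetical, and the table in the usage line (e.g. allele counts in two populations) holds toy values, not the study's data.

```python
# Two-sided Fisher's exact test for a 2x2 table, standard library only.
# Margins are held fixed; the p-value sums the probabilities of every
# table that is no more likely than the observed one.
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(table: list[list[int]]) -> Fraction:
    """Return the exact two-sided p-value as a Fraction."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    # Feasible values of the top-left cell given the fixed margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric numerators; all share the denominator C(n, col1),
    # so comparing integers avoids floating-point ties.
    weights = [comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)]
    observed = weights[a - lo]
    tail = sum(w for w in weights if w <= observed)
    return Fraction(tail, comb(n, col1))


# Usage with toy counts (hypothetical, e.g. allele counts in two populations).
print(fisher_exact_two_sided([[8, 2], [1, 5]]))  # 5/143
```

Returning a `Fraction` keeps the result exact; call `float()` on it for display.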
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica, No 128, Academia Road, Section 2, Nankang, Taipei, Taiwan.
7
Qiu YH, Deng FY, Li MJ, Lei SF. Identification of novel risk genes associated with type 1 diabetes mellitus using a genome-wide gene-based association analysis. J Diabetes Investig 2014; 5:649-56. [PMID: 25422764 PMCID: PMC4234227 DOI: 10.1111/jdi.12228] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 11/18/2013] [Revised: 01/23/2014] [Accepted: 02/21/2014] [Indexed: 01/05/2023]
Abstract
Aims/Introduction: Type 1 diabetes mellitus is a serious disorder characterized by destruction of pancreatic β-cells, culminating in absolute insulin deficiency. Genetic factors contribute to susceptibility to type 1 diabetes mellitus. The aim of the present study was to identify additional susceptibility genes for type 1 diabetes mellitus. Materials and Methods: We carried out an initial gene-based genome-wide association study in a total of 4,075 type 1 diabetes mellitus cases and 2,604 controls using the Gene-based Association Test using Extended Simes procedure (GATES). Furthermore, we carried out replication studies, differential expression analysis and functional annotation clustering analysis to support the significance of the identified susceptibility genes. Results: We identified 452 genes associated with type 1 diabetes mellitus, even after applying a genome-wide significance threshold (P < 9.05 × 10⁻⁴). Of these genes, 171 were newly identified for type 1 diabetes mellitus; they had been missed by single-nucleotide polymorphism-based association analysis and had not been previously reported. We found that 53 genes had supporting evidence from replication studies and/or differential expression studies. In particular, seven genes, including four non-human leukocyte antigen (HLA) genes (RASIP1, STRN4, BCAR1 and MYL2), were replicated in at least one independent population and were also differentially expressed in peripheral blood mononuclear cells or monocytes. Furthermore, the associated genes tended to be enriched in immune-related pathways or Gene Ontology terms. Conclusions: These results suggest that gene-based association analysis has high power to detect disease-susceptibility genes. Our findings provide further insights into the genetic basis of type 1 diabetes mellitus.
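The GATES procedure named above combines SNP p-values within a gene as P_G = min_j (m_e · p_(j) / m_e(j)), where p_(j) is the j-th smallest SNP p-value, m_e is the effective number of independent tests in the gene, and m_e(j) is the effective number among the top j SNPs. The Python sketch below assumes the caller supplies those effective numbers (GATES estimates them from the SNP LD correlation matrix, a step omitted here); by default it treats SNPs as independent, in which case the procedure reduces to the Simes test. The function and parameter names are hypothetical.

```python
# Gene-level p-value via the extended Simes procedure (GATES).
# Assumption: the caller supplies the effective numbers of independent
# tests; GATES derives them from the SNP LD matrix, omitted here.
from typing import Optional


def gates_p_value(p_values: list[float],
                  effective_top: Optional[list[float]] = None) -> float:
    """Return min over j of m_e * p_(j) / m_e(j) for ascending SNP p-values.

    effective_top[j] is the effective number of independent tests among the
    j + 1 most significant SNPs; its last entry is m_e for the whole gene.
    """
    ordered = sorted(p_values)
    m = len(ordered)
    if m == 0:
        raise ValueError("need at least one SNP p-value")
    if effective_top is None:
        # Independent SNPs: m_e(j) = j, which reduces GATES to Simes.
        effective_top = [float(j + 1) for j in range(m)]
    if len(effective_top) != m:
        raise ValueError("effective_top needs one entry per SNP")
    m_e = effective_top[-1]
    return min(m_e * p / m_e_j for p, m_e_j in zip(ordered, effective_top))


# Usage: three independent SNPs.
print(gates_p_value([0.5, 0.01, 0.04]))  # approximately 0.03
```

At the other extreme, SNPs in perfect LD (all effective numbers equal to 1) make the gene p-value equal to the smallest SNP p-value.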
Affiliation(s)
- Ying-Hua Qiu
- Center for Genetic Epidemiology and Genomics, School of Public Health, Soochow University, Suzhou, Jiangsu, China; Department of Epidemiology, School of Public Health, Soochow University, Suzhou, Jiangsu, China
- Fei-Yan Deng
- Center for Genetic Epidemiology and Genomics, School of Public Health, Soochow University, Suzhou, Jiangsu, China; Department of Epidemiology, School of Public Health, Soochow University, Suzhou, Jiangsu, China
- Min-Jing Li
- Center for Genetic Epidemiology and Genomics, School of Public Health, Soochow University, Suzhou, Jiangsu, China; Department of Epidemiology, School of Public Health, Soochow University, Suzhou, Jiangsu, China
- Shu-Feng Lei
- Center for Genetic Epidemiology and Genomics, School of Public Health, Soochow University, Suzhou, Jiangsu, China; Department of Epidemiology, School of Public Health, Soochow University, Suzhou, Jiangsu, China
8
Li F, Zhao J, Yuan Z, Zhang X, Ji J, Xue F. A powerful latent variable method for detecting and characterizing gene-based gene-gene interaction on multiple quantitative traits. BMC Genet 2013; 14:89. [PMID: 24059907 PMCID: PMC3848962 DOI: 10.1186/1471-2156-14-89] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 05/22/2013] [Accepted: 09/17/2013] [Indexed: 01/10/2023]
Abstract
Background: When complex diseases are considered quantitatively, there are at least three statistical strategies for analyzing gene-gene interaction: SNP-by-SNP interaction on a single trait, gene-gene interaction (each gene possibly involving multiple SNPs) on a single trait, and gene-gene interaction on multiple traits. The third is the most general for dissecting the genetic mechanisms of complex diseases that involve multiple quantitative traits. In this paper, we developed a novel statistic for this strategy, called the mPLSPM statistic, by modifying Partial Least Squares Path Modeling (PLSPM). Results: Simulation studies indicated that the mPLSPM statistic was powerful and outperformed a linear regression method based on principal component analysis (PCA). Application to real data from the EPIC-Norfolk GWAS sub-cohort showed suggestive interaction (γ) between the TMEM18 and BDNF genes on two composite body shape scores (γ = 0.047, P = 0.021 and γ = 0.058, P = 0.005, respectively) and on BMI (γ = 0.043, P = 0.034). This suggests that these scores (synthetic latent traits) are better suited than a single trait to capturing obesity-related genetic interaction effects between genes. Conclusions: The proposed mPLSPM statistic is a valid and powerful gene-based method for detecting gene-gene interaction on multiple quantitative phenotypes.
Affiliation(s)
- Fangyu Li
- Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Jinan 250012, China.
9
SNP set association analysis for genome-wide association studies. PLoS One 2013; 8:e62495. [PMID: 23658731 PMCID: PMC3643925 DOI: 10.1371/journal.pone.0062495] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 11/23/2012] [Accepted: 03/22/2013] [Indexed: 11/29/2022]
Abstract
Genome-wide association studies (GWAS) are a promising approach for identifying common genetic variants underlying diseases on the basis of millions of single nucleotide polymorphisms (SNPs). To avoid the low power caused by excessive correction for multiple comparisons in single-locus association studies, several methods have been proposed that group SNPs into a SNP set based on genomic features and then test the joint effect of the set. We compare the performance of principal component analysis (PCA), supervised principal component analysis (SPCA), kernel principal component analysis (KPCA), and sliced inverse regression (SIR). Simulated SNP sets were generated under models with 0, 1 and ≥2 causal SNPs. Our simulation results show that all of these methods control the type I error at the nominal significance level. SPCA was consistently more powerful than the other methods across different linkage disequilibrium structures and minor allele frequencies in the simulated datasets. We also applied these four methods to a real GWAS of non-small cell lung cancer (NSCLC) in a Han Chinese population.
10
Li S, Cui Y. Gene-centric gene–gene interaction: A model-based kernel machine method. Ann Appl Stat 2012. [DOI: 10.1214/12-aoas545] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
11
Yang HC, Liang YJ, Chen JW, Chiang KM, Chung CM, Ho HY, Ting CT, Lin TH, Sheu SH, Tsai WC, Chen JH, Leu HB, Yin WH, Chiu TY, Chern CI, Lin SJ, Tomlinson B, Guo Y, Sham PC, Cherny SS, Lam TH, Thomas GN, Pan WH. Identification of IGF1, SLC4A4, WWOX, and SFMBT1 as hypertension susceptibility genes in Han Chinese with a genome-wide gene-based association study. PLoS One 2012; 7:e32907. [PMID: 22479346 PMCID: PMC3315540 DOI: 10.1371/journal.pone.0032907] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Received: 02/10/2011] [Accepted: 02/07/2012] [Indexed: 01/11/2023]
Abstract
Hypertension is a complex disorder with high prevalence worldwide. We conducted the first genome-wide gene-based association scan for hypertension in a Han Chinese population. By analyzing genome-wide single-nucleotide polymorphism data from 400 matched pairs of young-onset hypertensive patients and normotensive controls genotyped with the Illumina HumanHap550-Duo BeadChip, we identified 100 hypertension susceptibility genes, which were also validated with permutation tests. Seventeen of the 100 genes exhibited differential allelic and expression distributions between patient and control groups. These genes provided a good molecular signature for classifying hypertensive patients and normotensive controls. Among the 17 genes, IGF1, SLC4A4, WWOX, and SFMBT1 were not only identified by our gene-based association scan and gene expression analysis but were also replicated by a gene-based association analysis of the Hong Kong Hypertension Study. Moreover, cis-acting expression quantitative trait loci associated with the differentially expressed genes were found and linked to hypertension. IGF1, which encodes insulin-like growth factor 1, is associated with cardiovascular disorders, metabolic syndrome, decreased body weight/size, and changes in insulin levels in mice. SLC4A4, which encodes electrogenic sodium bicarbonate cotransporter 1, is associated with decreased body weight/size and abnormal ion homeostasis in mice. WWOX, which encodes WW domain-containing oxidoreductase, is related to hypoglycemia and hyperphosphatemia. SFMBT1, which encodes scm-like with four MBT domains protein 1, is a novel hypertension gene. GRB14, TMEM56 and KIAA1797 exhibited highly significant differences in allelic and expression distributions between hypertensive patients and normotensive controls. GRB14 was also found to be relevant to blood pressure in a previous genetic association study in East Asian populations. TMEM56 and KIAA1797 may be specific to Taiwanese populations, because they were not validated by the two replication studies. Identification of these genes enriches the collection of hypertension susceptibility genes, thereby shedding light on the etiology of hypertension in Han Chinese populations.
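The permutation validation mentioned above can be sketched as follows: compute a gene-level statistic, then recompute it after shuffling case/control labels, which keeps the SNP columns (and hence the LD among them) intact. The statistic used here (the largest squared genotype-status correlation across the gene's SNPs), the function names and the add-one p-value estimator are assumptions for illustration, not the study's published procedure.

```python
# Permutation p-value for a gene-level association statistic.
# Hypothetical statistic: the largest squared correlation between any SNP
# (0/1/2 coded) in the gene and 0/1 disease status.
import random
from statistics import fmean


def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation; 0.0 if either input is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def gene_statistic(snps: list[list[int]], status: list[int]) -> float:
    """Largest r^2 between any SNP in the gene and disease status."""
    return max(r_squared(snp, status) for snp in snps)


def permutation_p_value(snps: list[list[int]], status: list[int],
                        n_perm: int = 1000, seed: int = 0) -> float:
    """Shuffle case/control labels, keeping the SNPs (and their LD) intact."""
    rng = random.Random(seed)
    observed = gene_statistic(snps, status)
    labels = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if gene_statistic(snps, labels) >= observed:
            exceed += 1
    # Add-one estimator: never returns exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Usage with toy data: two SNPs, four controls followed by four cases.
snps = [[0, 0, 0, 0, 2, 2, 2, 2], [1, 0, 1, 0, 1, 0, 1, 0]]
status = [0, 0, 0, 0, 1, 1, 1, 1]
print(permutation_p_value(snps, status, n_perm=500, seed=1))
```

Taking the maximum over SNPs inside each permutation is what accounts for the number of SNPs tested within the gene.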
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
| | - Yu-Jen Liang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
| | - Jaw-Wen Chen
- National Yang-Ming University School of Medicine and Taipei Veterans General Hospital, Taipei, Taiwan
| | - Kuang-Mao Chiang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- School of Public Health, National Medical Defense Center, Taipei, Taiwan
- Chia-Min Chung
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Hung-Yun Ho
- Cardiovascular Center, Taichung Veterans General Hospital, Taichung, Taiwan
- Chih-Tai Ting
- Cardiovascular Center, Taichung Veterans General Hospital, Taichung, Taiwan
- Tsung-Hsien Lin
- Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Sheng-Hsiung Sheu
- Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Wei-Chuan Tsai
- Department of Internal Medicine, College of Medicine, National Cheng Kung University, Tainan, Taiwan
- Jyh-Hong Chen
- Department of Internal Medicine, College of Medicine, National Cheng Kung University, Tainan, Taiwan
- Hsin-Bang Leu
- National Yang-Ming University School of Medicine and Taipei Veterans General Hospital, Taipei, Taiwan
- Wei-Hsian Yin
- Division of Cardiology, Cheng-Hsin Rehabilitation Medical Center, Taipei, Taiwan
- Ting-Yu Chiu
- Division of Cardiology, Min-Sheng General Hospital, Taoyuan, Taiwan
- Ching-Iuan Chern
- Division of Cardiology, Min-Sheng General Hospital, Taoyuan, Taiwan
- Shing-Jong Lin
- National Yang-Ming University School of Medicine and Taipei Veterans General Hospital, Taipei, Taiwan
- Brian Tomlinson
- Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong, China
- Youling Guo
- Department of Psychiatry, The University of Hong Kong, Hong Kong, China
- Pak C. Sham
- Department of Psychiatry, The University of Hong Kong, Hong Kong, China
- The State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong, China
- Stacey S. Cherny
- Department of Psychiatry, The University of Hong Kong, Hong Kong, China
- The State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong, China
- Tai Hing Lam
- School of Public Health, The University of Hong Kong, Hong Kong, China
- G. Neil Thomas
- Public Health, Epidemiology and Biostatistics, School of Health and Population Sciences, University of Birmingham, Birmingham, United Kingdom
- Wen-Harn Pan
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Division of Preventive Medicine and Health Services Research, Institute of Population Health Sciences, National Health Research Institutes, Miaoli, Taiwan
12
Gao Q, He Y, Yuan Z, Zhao J, Zhang B, Xue F. Gene- or region-based association study via kernel principal component analysis. BMC Genet 2011; 12:75. [PMID: 21871061 PMCID: PMC3176196 DOI: 10.1186/1471-2156-12-75] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 05/08/2011] [Accepted: 08/26/2011] [Indexed: 11/12/2022]
Abstract
Background: In genetic association studies, especially GWAS, gene- or region-based methods have become increasingly popular for detecting association between multiple SNPs and diseases (or traits). Kernel principal component analysis combined with a logistic regression test (KPCA-LRT) has been used successfully to classify gene expression data. However, the purpose of an association study is to detect correlation between genetic variation and disease rather than to classify samples, and genomic data are categorical rather than numerical. A kernel-based logistic regression model for association studies, which projects the nonlinear original SNP data into a linear feature space, has recently been proposed, but it is still affected by multicollinearity between the projections, which may lead to a loss of power. We therefore proposed a KPCA-LRT model that avoids this multicollinearity. Results: Simulation results showed that KPCA-LRT was always more powerful than principal component analysis combined with a logistic regression test (PCA-LRT) across different sample sizes, significance levels and relative risks, especially at the gene-wide level (1E-5) and at lower relative risks (RR = 1.2, 1.3). Application to four gene regions of the rheumatoid arthritis (RA) data from Genetic Analysis Workshop 16 (GAW16) indicated that KPCA-LRT performed better than the single-locus test and PCA-LRT. Conclusions: KPCA-LRT is a valid and powerful gene- or region-based method for the analysis of GWAS data sets, especially at lower relative risks and lower significance levels.
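The kernel principal component step can be illustrated with a small, dependency-free sketch. The abstract does not say which kernel the authors use, so the identity-by-state (IBS) kernel below is an assumption, as are the function names `ibs_kernel`, `center_kernel` and `top_kernel_pc`. Only the first kernel principal component is extracted (by power iteration); the logistic regression test that KPCA-LRT applies to the component scores is not reproduced.

```python
import math


def ibs_kernel(genotypes):
    """IBS similarity (assumed kernel) for genotypes coded as 0/1/2 minor-allele counts."""
    n, loci = len(genotypes), len(genotypes[0])
    return [
        [sum(2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[j])) / (2 * loci)
         for j in range(n)]
        for i in range(n)
    ]


def center_kernel(k):
    """Double-centre a symmetric kernel matrix (zero-mean samples in feature space)."""
    n = len(k)
    row_mean = [sum(row) / n for row in k]
    grand_mean = sum(row_mean) / n
    return [[k[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def top_kernel_pc(k_centered, iterations=500):
    """Leading eigenpair via power iteration; returns (eigenvalue, per-sample scores)."""
    n = len(k_centered)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(k_centered[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        if eigenvalue == 0.0:
            break
        v = [x / eigenvalue for x in w]
    # The projection of sample i onto the first kernel PC is sqrt(eigenvalue) * v[i].
    return eigenvalue, [math.sqrt(eigenvalue) * x for x in v]


# Two clearly separated groups of individuals genotyped at three SNPs (toy data).
genotypes = [[0, 0, 0], [0, 0, 1], [0, 0, 0], [2, 2, 2], [2, 2, 1], [2, 2, 2]]
eigenvalue, scores = top_kernel_pc(center_kernel(ibs_kernel(genotypes)))
print(eigenvalue, scores)
```

In a KPCA-LRT-style analysis the leading component scores would then enter a logistic regression against case-control status; because principal component scores are mutually uncorrelated, that regression does not suffer from the multicollinearity described in the abstract.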
Affiliation(s)
- Qingsong Gao
- Department of Epidemiology and Health Statistics, School of Public Health, Shandong University, Jinan 250012, China
13
Lehne B, Lewis CM, Schlitt T. From SNPs to genes: disease association at the gene level. PLoS One 2011; 6:e20133. [PMID: 21738570 PMCID: PMC3128073 DOI: 10.1371/journal.pone.0020133] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Received: 11/22/2010] [Accepted: 04/26/2011] [Indexed: 01/16/2023]
Abstract
Interpreting genome-wide association studies (GWAS) at the gene level is an important step towards understanding the molecular processes that lead to disease. To incorporate prior biological knowledge, such as pathways and protein interactions, into the analysis of GWAS data, it is necessary to derive one measure of association for each gene. We compare three methods of obtaining gene-wide test statistics from single nucleotide polymorphism (SNP)-based association data: taking the test statistic of the most significant SNP; the mean test statistic of all SNPs; and the mean of the top quartile of all test statistics. We demonstrate that the gene-wide test statistics can be controlled for the number of SNPs within each gene and show that all three methods perform considerably better than expected by chance at identifying genes with confirmed associations. By applying each method to GWAS data for Crohn's disease and type 1 diabetes, we identified new potential disease genes.
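The three gene-wide summaries compared in the abstract are simple to compute from per-SNP test statistics. The sketch below assumes the statistics are already on a scale where larger means more significant (for example, 1-df chi-square values) and rounds the size of the top quartile up so that every gene keeps at least one SNP; both choices, and the function name `gene_wide_statistics`, are illustrative assumptions. The correction for the number of SNPs per gene that the authors describe is not reproduced.

```python
import math


def gene_wide_statistics(snp_stats):
    """Summarise per-SNP association statistics for one gene in three ways."""
    ordered = sorted(snp_stats, reverse=True)  # strongest association first
    top_n = math.ceil(len(ordered) / 4)        # top quartile, rounded up (assumption)
    return {
        "most_significant": ordered[0],
        "mean_all": sum(ordered) / len(ordered),
        "mean_top_quartile": sum(ordered[:top_n]) / top_n,
    }


print(gene_wide_statistics([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]))
```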
Affiliation(s)
- Benjamin Lehne
- Department of Medical and Molecular Genetics, King's College London, London, United Kingdom
- Cathryn M. Lewis
- Department of Medical and Molecular Genetics, King's College London, London, United Kingdom
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, King's College London, London, United Kingdom
- Thomas Schlitt
- Department of Medical and Molecular Genetics, King's College London, London, United Kingdom
14
Li MX, Gui HS, Kwan JSH, Sham PC. GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am J Hum Genet 2011; 88:283-93. [PMID: 21397060 DOI: 10.1016/j.ajhg.2011.01.019] [Citation(s) in RCA: 300] [Impact Index Per Article: 23.1] [Received: 10/31/2010] [Revised: 01/26/2011] [Accepted: 01/31/2011] [Indexed: 01/01/2023]
Abstract
The gene has been proposed as an attractive unit of analysis for association studies, but a simple yet valid, powerful, and sufficiently fast method of evaluating the statistical significance of all genes in large, genome-wide datasets has been lacking. Here we propose the use of an extended Simes test that integrates functional information and association evidence to combine the p values of the single nucleotide polymorphisms within a gene to obtain an overall p value for the association of the entire gene. Our computer simulations demonstrate that this test is more powerful than the SNP-based test, offers effective control of the type 1 error rate regardless of gene size and linkage-disequilibrium pattern among markers, and does not need permutation or simulation to evaluate empirical significance. Its statistical power in simulated data is at least comparable, and often superior, to that of several alternative gene-based tests. When applied to real genome-wide association study (GWAS) datasets on Crohn disease, the test detected more significant genes than SNP-based tests and alternative gene-based tests. The proposed test, implemented in an open-source package, has the potential to identify additional novel disease-susceptibility genes for complex diseases from large GWAS datasets.
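The combination step can be sketched as follows, assuming the GATES form p_G = min over j of (m_e · p_(j) / m_e(j)), where p_(j) is the j-th smallest SNP p-value, m_e is the effective number of independent SNPs in the gene and m_e(j) is the effective number among the top j SNPs. How GATES estimates these effective numbers from linkage disequilibrium is not reproduced here: the hypothetical function `extended_simes` takes them as inputs and, when they are omitted, treats the SNPs as independent, in which case the test reduces to the classical Simes procedure.

```python
def extended_simes(pvalues, m_eff=None, m_eff_top=None):
    """Gene-level p-value from SNP p-values via an (extended) Simes combination.

    m_eff: effective number of independent SNPs in the gene (default: len(pvalues)).
    m_eff_top: m_eff_top[j] is the effective number among the j+1 smallest p-values
               (default: j+1, i.e. independent SNPs, giving the classical Simes test).
    """
    ordered = sorted(pvalues)
    m = len(ordered)
    if m_eff is None:
        m_eff = float(m)
    if m_eff_top is None:
        m_eff_top = [float(j + 1) for j in range(m)]
    gene_p = min(m_eff * p / m_eff_top[j] for j, p in enumerate(ordered))
    return min(1.0, gene_p)  # cap at 1 in case user-supplied effective numbers push it over


# Classical Simes case (independent SNPs).
print(extended_simes([0.01, 0.04, 0.5, 0.8]))
# Perfect LD between two SNPs: they count as a single effective test.
print(extended_simes([0.01, 0.02], m_eff=1.0, m_eff_top=[1.0, 1.0]))
```

No permutation or simulation is involved, which is consistent with the abstract's point that the test is fast enough to apply to every gene in a genome-wide data set.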
Affiliation(s)
- Miao-Xin Li
- Department of Psychiatry and State Key Laboratory for Cognitive and Brain Sciences, the University of Hong Kong, Pokfulam, Hong Kong
15
Liu JZ, McRae AF, Nyholt DR, Medland SE, Wray NR, Brown KM, Hayward NK, Montgomery GW, Visscher PM, Martin NG, Macgregor S. A versatile gene-based test for genome-wide association studies. Am J Hum Genet 2010; 87:139-45. [PMID: 20598278 DOI: 10.1016/j.ajhg.2010.06.009] [Citation(s) in RCA: 591] [Impact Index Per Article: 42.2] [Received: 04/29/2010] [Revised: 06/07/2010] [Accepted: 06/11/2010] [Indexed: 12/14/2022]
Abstract
We have derived a versatile gene-based test for genome-wide association studies (GWAS). Our approach, called VEGAS (versatile gene-based association study), is applicable to all GWAS designs, including family-based GWAS, meta-analyses of GWAS on the basis of summary data, and DNA-pooling-based GWAS, where existing approaches based on permutation are not possible, as well as singleton data, where they are. The test incorporates information from a full set of markers (or a defined subset) within a gene and accounts for linkage disequilibrium between markers by using simulations from the multivariate normal distribution. We show that for an association study using singletons, our approach produces results equivalent to those obtained via permutation in a fraction of the computation time. We demonstrate proof-of-principle by using the gene-based test to replicate several genes known to be associated on the basis of results from a family-based GWAS for height in 11,536 individuals and a DNA-pooling-based GWAS for melanoma in approximately 1300 cases and controls. Our method has the potential to identify novel associated genes; provide a basis for selecting SNPs for replication; and be directly used in network (pathway) approaches that require per-gene association test statistics. We have implemented the approach in both an easy-to-use web interface, which only requires the uploading of markers with their association p-values, and a separate downloadable application.
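The simulation idea can be sketched without external libraries. The sketch assumes the gene statistic is the sum of the SNPs' 1-df chi-square statistics (each recovered from its two-sided p-value) and that a positive-definite LD correlation matrix between the SNPs is available. It draws correlated normal vectors through a Cholesky factor of that matrix and reports the share of simulated sums at least as large as the observed one. The function names, the fixed number of simulations and the add-one empirical p-value estimator are assumptions for illustration; VEGAS's own implementation details are not reproduced.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular L with L @ L.T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return low


def simulated_gene_p(snp_pvalues, ld_corr, n_sims=10000, seed=1):
    """Empirical gene p-value: observed sum of 1-df chi-squares vs. MVN(0, LD) simulations."""
    normal = NormalDist()
    # Two-sided p -> |z| -> chi-square; p-values below ~1e-16 would need a tail-accurate
    # conversion, because 1 - p/2 then rounds to 1.0.
    observed = sum(normal.inv_cdf(1 - p / 2) ** 2 for p in snp_pvalues)
    low = cholesky(ld_corr)
    rng = random.Random(seed)
    m = len(snp_pvalues)
    exceed = 0
    for _ in range(n_sims):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        correlated = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
        if sum(x * x for x in correlated) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sims + 1)


ld = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]  # toy LD correlation matrix
print(simulated_gene_p([1e-6, 1e-6, 1e-6], ld, n_sims=2000))
print(simulated_gene_p([0.9, 0.9, 0.9], ld, n_sims=2000))
```

Because only SNP p-values and an LD matrix enter the calculation, the same procedure applies to summary-statistic and DNA-pooling designs, which is the versatility the abstract emphasises.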
16
Beyene J, Tritchler D, Asimit JL, Hamid JS. Gene- or region-based analysis of genome-wide association studies. Genet Epidemiol 2010; 33 Suppl 1:S105-10. [PMID: 19924708 DOI: 10.1002/gepi.20481] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
Abstract
With rapid advances in genotyping technologies in recent years and the growing number of available markers, genome-wide association studies are emerging as promising approaches for the study of complex diseases and traits. However, the analysis and interpretation of such data pose several challenges. First, there is a massive multiple-testing problem due to the large number of markers that must be analyzed, which increases the risk of false positives and decreases the ability of association studies to detect truly associated markers. In particular, the ability to detect modest genetic effects can be severely compromised. Second, an association of a given single-nucleotide polymorphism, as determined by univariate statistical analyses, does not typically explain biologically interesting features and often requires subsequent interpretation at a higher unit, such as a gene or a region (for example, one defined by haplotype blocks). Third, missing genotypes and other data-quality issues can pose challenges when comparisons across platforms and replications are planned. Finally, depending on the type of univariate analysis, the computational burden can grow as the number of markers reaches into the millions. One way to deal with these and related challenges is to use higher units, such as genes or regions, for the analysis. This article summarizes analytical methods and strategies that Group 16 proposed and applied to two genome-wide association data sets made available through Genetic Analysis Workshop 16.
Affiliation(s)
- Joseph Beyene
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.
17
Abstract
Genome-wide association studies, which analyze hundreds of thousands of single-nucleotide polymorphisms to identify disease-susceptibility genes, are challenging because they involve intensive computation and complex modeling. We propose a two-stage genome-wide association scanning procedure, consisting of a single-locus association scan in the first stage and a gene-based association scan in the second stage. Marginal effects of single-nucleotide polymorphisms are examined with the exact Armitage trend test or logistic regression, and gene effects are examined with a p-value combination method. Compared with some existing single-locus and multilocus methods, the proposed method has the following merits: 1) it makes biologically meaningful regions easy to define, 2) it is powerful for detecting minor-effect genes, 3) it helps alleviate the multiple-testing problem, and 4) it makes results easy to interpret. The method was applied to the Genetic Analysis Workshop 16 Problem 1 rheumatoid arthritis data, and strong association signals were found. The results show that the human major histocompatibility complex region is the most important genomic region associated with rheumatoid arthritis. Moreover, previously reported genes, including PTPN22, C5, and IL2RB, were confirmed, and novel genes, including HLA-DRA, BTNL2, C6orf10, NOTCH4, TAP2, and TNXB, were identified by our analysis.
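The two stages can be sketched with the standard library alone. Two substitutions are made and should be read as assumptions: stage 1 uses the asymptotic (large-sample) Cochran-Armitage trend test with additive weights 0, 1, 2 rather than the exact test named in the abstract, and stage 2 uses Fisher's method as a stand-in for the unnamed p-value combination method. Fisher's method assumes independent p-values, which linkage disequilibrium between SNPs in the same gene violates, so a correlation-aware combination would be needed in practice. The function names and toy tables are hypothetical.

```python
import math


def armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Asymptotic Cochran-Armitage trend test on a 2x3 genotype table.

    cases/controls: genotype counts (e.g. AA, Aa, aa). Returns (chi-square, p-value).
    """
    n = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    swr = sum(w * r for w, r in zip(weights, cases))
    swn = sum(w * k for w, k in zip(weights, n))
    sw2n = sum(w * w * k for w, k in zip(weights, n))
    chi2 = (total * (total * swr - n_cases * swn) ** 2
            / (n_cases * n_controls * (total * sw2n - swn ** 2)))
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of a 1-df chi-square


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(ln p) follows a chi-square with 2k df if independent."""
    half = -sum(math.log(p) for p in pvalues)  # half of the Fisher statistic
    term, tail = 1.0, 1.0
    for i in range(1, len(pvalues)):  # closed-form tail for even degrees of freedom
        term *= half / i
        tail += term
    return math.exp(-half) * tail


# Stage 1: per-SNP trend tests; stage 2: combine the SNP p-values within one gene.
snp_tables = [([10, 40, 50], [50, 40, 10]), ([25, 50, 25], [25, 50, 25])]
snp_p = [armitage_trend(cases, controls)[1] for cases, controls in snp_tables]
print(snp_p, fisher_combine(snp_p))
```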