101
Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges. Entropy 2020; 22:e22040427. [PMID: 33286201 PMCID: PMC7516904 DOI: 10.3390/e22040427] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 02/24/2020] [Revised: 03/18/2020] [Accepted: 04/03/2020] [Indexed: 12/22/2022]
Abstract
Over the last decade, gene set analysis has become the first choice for gaining insights into the complex biology underlying diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Although gene set analysis approaches are extensively used in gene expression and genome-wide association data analysis, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. In this article, we provide a comprehensive overview of the statistical structure and steps of gene set analysis approaches used for microarray, RNA-sequencing and genome-wide association data analysis. Further, we classify gene set analysis approaches and tools by type of genomic study, null hypothesis, sampling model, nature of the test statistic and other factors. Rather than reviewing the approaches individually, we trace their generation-wise evolution for microarrays, RNA-sequencing and genome-wide association studies and discuss their relative merits and limitations. We also identify the key biological and statistical challenges in current gene set analysis that statisticians and biologists will need to address collectively in order to develop the next generation of gene set analysis approaches. Finally, this study is intended to serve as a catalog and to guide genome researchers and experimental biologists in choosing a proper gene set analysis approach based on several factors.
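Over-representation analysis is one of the simplest gene set tests of the kind this review classifies by null hypothesis and test statistic. It is usually described as a competitive test. The minimal sketch below is not taken from the review. It computes the one-sided hypergeometric P-value using only the Python standard library. The function name and the counts are hypothetical.

```python
# Hypothetical sketch of over-representation analysis; not code from the review.
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_in_set: int,
                           n_selected: int, n_overlap: int) -> float:
    """One-sided over-representation P-value, P(X >= n_overlap).

    X ~ Hypergeometric: draw n_selected genes (e.g. the differentially
    expressed ones) from a universe of n_universe genes, of which n_in_set
    belong to the gene set, and count how many land in the set.
    """
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, x) * comb(n_universe - n_in_set, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    # Python's exact integer division keeps this accurate for large universes.
    return tail / comb(n_universe, n_selected)
```

Consider hypothetical counts of 20,000 genes, a 100-gene set, 500 selected genes and 12 overlapping genes. Chance alone would give about 2.5 overlapping genes, so `hypergeom_enrichment_p(20_000, 100, 500, 12)` returns a very small P-value. A self-contained test would instead permute sample labels and ask whether the genes in the set are associated with the phenotype at all.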
102
Xie J, Ma A, Fennell A, Ma Q, Zhao J. It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data. Brief Bioinform 2020; 20:1449-1464. [PMID: 29490019 DOI: 10.1093/bib/bby014] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/15/2017] [Revised: 01/16/2018] [Indexed: 12/12/2022]
Abstract
Biclustering is a powerful data mining technique that allows rows and columns of a matrix-format data set to be clustered simultaneously. It was first applied to gene expression data in 2000, aiming to identify co-expressed genes under a subset of all the conditions/samples. During the past 17 years, tens of biclustering algorithms and tools have been developed to enhance the ability to make sense of the large data sets generated in the wake of high-throughput omics technologies. These algorithms and tools have been applied to a wide variety of data types, including but not limited to genomes, transcriptomes, exomes, epigenomes, phenomes and pharmacogenomes. However, there is still a considerable gap between biclustering methodology development and comprehensive data interpretation, mainly because of a lack of guidance on selecting appropriate biclustering tools and supporting computational techniques for specific studies. Here, we first give a brief introduction to the existing biclustering algorithms and tools in the public domain, and then systematically summarize the basic applications of biclustering to biological data and more advanced applications to biomedical data. This review will help researchers analyze their big data effectively and generate valuable biological knowledge and novel insights with higher efficiency.
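The abstract dates the first application of biclustering to gene expression to 2000. That is commonly attributed to Cheng and Church's algorithm, although the abstract does not name it. As a minimal sketch, assuming the Cheng and Church definition, the code below computes the mean squared residue that their greedy search minimizes. It is not code from the review. A residue of zero means every row of the submatrix is a shifted copy of every other row. The function name and matrix are hypothetical.

```python
# Hypothetical sketch of the Cheng-Church mean squared residue (assumed definition).


def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue H(I, J) of the submatrix matrix[rows][cols].

    residue(i, j) = a_ij - a_iJ - a_Ij + a_IJ, where a_iJ, a_Ij and a_IJ are
    the row mean, column mean and overall mean of the submatrix.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy expression matrix: rows 0-2 over columns 0-2 form an additive bicluster
# (each gene is a shifted copy of the others); column 3 breaks the pattern.
expr = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [5.0, 6.0, 7.0, 1.0],
]
print(mean_squared_residue(expr, [0, 1, 2], [0, 1, 2]))     # ~0.0
print(mean_squared_residue(expr, [0, 1, 2], [0, 1, 2, 3]))  # clearly > 0
```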
103
Shahamatdar S, He MX, Reyna MA, Gusev A, AlDubayan SH, Van Allen EM, Ramachandran S. Germline Features Associated with Immune Infiltration in Solid Tumors. Cell Rep 2020; 30:2900-2908.e4. [PMID: 32130895 PMCID: PMC7082123 DOI: 10.1016/j.celrep.2020.02.039] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 03/11/2019] [Revised: 08/12/2019] [Accepted: 02/07/2020] [Indexed: 12/13/2022]
Abstract
The immune composition of the tumor microenvironment influences response and resistance to immunotherapies. While numerous studies have identified somatic correlates of immune infiltration, germline features that associate with immune infiltrates in cancers remain incompletely characterized. We analyze seven million autosomal germline variants in the TCGA cohort and test for association with established immune-related phenotypes that describe the tumor immune microenvironment. We identify one SNP associated with the amount of infiltrating follicular helper T cells; 23 candidate genes, some of which are involved in cytokine-mediated signaling and others containing cancer-risk SNPs; and networks with genes that are part of the DNA repair and transcription elongation pathways. In addition, we find a positive association between polygenic risk for rheumatoid arthritis and amount of infiltrating CD8+ T cells. Overall, we identify multiple germline genetic features associated with tumor-immune phenotypes and develop a framework for probing inherited features that contribute to differences in immune infiltration.
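The abstract reports a positive association between polygenic risk for rheumatoid arthritis and CD8+ T cell infiltration. A polygenic score is conventionally a weighted sum of risk-allele dosages, with weights taken from GWAS effect sizes for the trait of interest. The sketch below computes such a score and its Pearson correlation with an infiltration estimate. Its names and numbers are hypothetical, and it omits the covariate adjustment and modelling choices of the actual study.

```python
# Hypothetical sketch: polygenic score and its correlation with an immune phenotype.


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP)."""
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    return sum(d * w for d, w in zip(dosages, weights))


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Toy data: per-SNP weights (e.g. log odds ratios from an external GWAS),
# dosages for four tumours, and a hypothetical CD8+ infiltration estimate.
weights = [0.30, 0.10, 0.25]
genotypes = [[0, 1, 0], [1, 1, 0], [2, 0, 1], [2, 2, 2]]
cd8_infiltration = [0.05, 0.10, 0.20, 0.30]

scores = [polygenic_score(g, weights) for g in genotypes]
print(scores)
print(pearson_r(scores, cd8_infiltration))
```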
Affiliation(s)
- Sahar Shahamatdar
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology and Evolutionary Biology, Brown University, Providence, RI 02912, USA
- Meng Xiao He
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Harvard Graduate Program in Biophysics, Boston, MA 02115, USA
- Matthew A Reyna
- Department of Biomedical Informatics, Emory University, Atlanta, GA 30322, USA; Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Alexander Gusev
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA
- Saud H AlDubayan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA
- Eliezer M Van Allen
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Sohini Ramachandran
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology and Evolutionary Biology, Brown University, Providence, RI 02912, USA
104
Guo B, Wu B. Powerful and efficient SNP-set association tests across multiple phenotypes using GWAS summary data. Bioinformatics 2020; 35:1366-1372. [PMID: 30239606 DOI: 10.1093/bioinformatics/bty811] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/03/2018] [Revised: 08/29/2018] [Accepted: 09/18/2018] [Indexed: 01/09/2023]
Abstract
MOTIVATION Many GWAS conducted in the past decade have identified tens of thousands of disease-related variants, which in total explain only part of the heritability for most traits. Many more genetic variants with small effect sizes remain to be discovered. This has motivated the development of sequencing studies with larger sample sizes and increased resolution of genotyped variants, e.g., the ongoing NHLBI Trans-Omics for Precision Medicine (TOPMed) whole genome sequencing project. An alternative approach is the development of novel and more powerful statistical methods. The current dominant approach in GWAS analysis is the "single trait single variant" association test, even though most GWAS are conducted in deeply phenotyped cohorts with many correlated traits measured. In this paper, we aim to develop rigorous methods that integrate multiple correlated traits and multiple variants to improve the power to detect novel variants. Because raw genotype and phenotype data are difficult to access due to privacy and logistical concerns, we develop methods that are applicable to publicly available GWAS summary data. RESULTS We build rigorous statistical models for GWAS summary statistics to motivate novel multi-trait SNP-set association tests, including a variance component test, a burden test and their adaptive test, and develop efficient numerical algorithms to quickly compute their analytical P-values. We implement the proposed methods in an open source R package. We conduct thorough simulation studies to verify that the proposed methods rigorously control type I errors at the genome-wide significance level, and further demonstrate their utility via comprehensive analysis of GWAS summary data for multiple lipid traits and glycemic traits. We identified many novel loci that were not detected by individual trait-based GWAS analysis.
AVAILABILITY AND IMPLEMENTATION We have implemented the proposed methods in an R package freely available at http://www.github.com/baolinwu/MSKAT. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
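The burden and variance-component tests named in the abstract both start from per-SNP summary Z-scores and the SNP correlation (LD) matrix. Below is a minimal single-trait sketch of the burden component. It assumes Z ~ N(0, R) under the null. It does not reproduce the multi-trait extension, the variance-component and adaptive tests, or the interface of the authors' MSKAT package. The function and argument names are hypothetical.

```python
# Hypothetical single-trait sketch; not the MSKAT package or its multi-trait tests.
import math


def burden_test_from_z(z, ld, weights=None):
    """Burden-type SNP-set test from GWAS summary Z-scores.

    Under H0, z ~ N(0, R) with R = ld (the SNP correlation matrix), so the
    weighted sum S = w'z is N(0, w'Rw) and T = S^2 / (w'Rw) is chi-square(1).
    Returns (T, P).
    """
    m = len(z)
    w = list(weights) if weights is not None else [1.0] * m
    s = sum(wi * zi for wi, zi in zip(w, z))
    var_s = sum(w[i] * ld[i][j] * w[j] for i in range(m) for j in range(m))
    t = s * s / var_s
    # Upper tail of chi-square(1): P(T > t) = P(|N(0,1)| > sqrt(t)).
    p = math.erfc(math.sqrt(t / 2.0))
    return t, p


# The same two Z-scores, first for independent SNPs, then for SNPs in moderate LD.
print(burden_test_from_z([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))  # T = 2.0
print(burden_test_from_z([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]]))  # T ~ 1.33, larger P
```

The LD matrix enters the variance of S. Treating correlated SNPs as independent would therefore overstate the evidence, as the smaller statistic in the second call shows.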
Affiliation(s)
- Bin Guo
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
105
Uncovering Tumour Heterogeneity through PKR and nc886 Analysis in Metastatic Colon Cancer Patients Treated with 5-FU-Based Chemotherapy. Cancers (Basel) 2020; 12:cancers12020379. [PMID: 32045987 PMCID: PMC7072376 DOI: 10.3390/cancers12020379] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/15/2020] [Revised: 02/03/2020] [Accepted: 02/04/2020] [Indexed: 12/18/2022]
Abstract
Colorectal cancer treatment has advanced over the past decade. The drug 5-fluorouracil is still widely used, although a large percentage of patients do not respond. Therefore, a key challenge is the identification of predictive biomarkers. The protein kinase R (PKR, also called EIF2AK2) and its regulator, the non-coding pre-mir-nc886, have multiple effects on cells in response to numerous types of stress, including chemotherapy. In this work, we performed an ambispective study of 197 metastatic colon cancer patients with unresectable metastases. We determined the relative expression levels of both nc886 and PKR by qPCR, and the location of PKR by immunohistochemistry, in tumour samples and healthy tissues (plasma and colon epithelium). As the primary end point, expression levels were related to the objective response to first-line chemotherapy according to the Response Evaluation Criteria in Solid Tumours (RECIST); as the secondary end point, they were related to survival at 18 and 36 months. Hierarchical agglomerative clustering was performed to accommodate the heterogeneity and complexity of the oncological patients' data. High expression levels of nc886 were related to the response to treatment and allowed clusters of patients to be identified. Although PKR mRNA expression was not associated with chemotherapy response, the absence of PKR in the nucleolus was correlated with first-line chemotherapy response. Moreover, survival was related to the expression of both PKR and nc886 in healthy tissues. Therefore, this work evaluated how best to analyse the potential biomarkers PKR and nc886 in order to group patients by cancer outcome using algorithms for complex and heterogeneous data.
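The abstract groups patients with hierarchical agglomerative clustering but does not state the distance or linkage. The plain-Python sketch below assumes Euclidean distance and average linkage, a common default that is not necessarily the authors' choice. The patient feature vectors are hypothetical (nc886, PKR) expression pairs.

```python
# Hypothetical sketch: Euclidean distance and average linkage are assumptions.


def average_linkage_clusters(points, n_clusters):
    """Agglomerative clustering with Euclidean distance and average linkage.

    Every point starts in its own cluster; the two clusters with the smallest
    mean pairwise distance are merged until n_clusters remain.
    Returns a list of clusters, each a sorted list of point indices.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical (nc886, PKR) relative expression values for six patients.
patients = [(0.2, 1.1), (0.3, 1.0), (2.5, 0.9), (2.7, 1.2), (0.25, 0.95), (2.6, 1.0)]
print(average_linkage_clusters(patients, 2))  # [[0, 1, 4], [2, 3, 5]]
```

Stopping at a fixed number of clusters is equivalent to cutting the dendrogram at one height. In practice the cut would be chosen from the full merge history.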
106
Solis-Lemus CR, Fischer ST, Todor A, Liu C, Leslie EJ, Cutler DJ, Ghosh D, Epstein MP. Leveraging Family History in Case-Control Analyses of Rare Variation. Genetics 2020; 214:295-303. [PMID: 31843756 PMCID: PMC7017020 DOI: 10.1534/genetics.119.302846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2019] [Accepted: 12/10/2019] [Indexed: 11/18/2022]
Abstract
Standard methods for case-control association studies of rare variation often treat disease outcome as a dichotomous phenotype. However, both theoretical and experimental studies have demonstrated that subjects with a family history of disease can be enriched for risk variation relative to subjects without such history. When family history information is available, this observation motivates replacing the standard dichotomous outcome with a more informative ordinal outcome that distinguishes controls (0), sporadic cases (1) and cases with a family history (2). The number of risk variants is then expected to increase with the category of the ordinal variable. To leverage this expectation, we propose a novel rare-variant association test that incorporates family history information, based on our previous GAMuT framework for rare-variant association testing of multivariate phenotypes. We use simulated data to show that, when family history information is available, our new method outperforms standard rare-variant association methods that ignore family history, such as burden and SKAT tests. We further illustrate our method using a rare-variant study of cleft lip and palate.
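The ordinal coding described in the abstract is easy to state in code. GAMuT compares a phenotype similarity matrix with a genotype similarity matrix. With linear kernels on both, its statistic reduces (up to a factor of 1/n) to a weighted sum of squared phenotype-genotype cross-products. The sketch below makes that simplification. Instead of GAMuT's analytic null distribution, it estimates a P-value by permuting the outcome, which is appropriate only without covariates. All names, weights and data are hypothetical.

```python
# Hypothetical GAMuT-style sketch with linear kernels and a permutation null;
# not the authors' implementation.
import random


def ordinal_outcome(is_case, has_family_history):
    """0 = control, 1 = sporadic case, 2 = case with a family history."""
    if not is_case:
        return 0
    return 2 if has_family_history else 1


def linear_kernel_stat(y, genotypes, weights):
    """sum_j w_j * (sum_i (y_i - ybar) * (g_ij - gbar_j))^2.

    With K_Y = y y' and K_G = G W G' (W = diag(weights)) this equals
    trace(H K_Y H H K_G H), H being the centering matrix, i.e. n times a
    GAMuT-style statistic.
    """
    n = len(y)
    ybar = sum(y) / n
    stat = 0.0
    for j, w in enumerate(weights):
        col = [row[j] for row in genotypes]
        gbar = sum(col) / n
        cross = sum((yi - ybar) * (gi - gbar) for yi, gi in zip(y, col))
        stat += w * cross * cross
    return stat


def permutation_p(y, genotypes, weights, n_perm=2000, seed=1):
    """Permutation P-value for linear_kernel_stat (no covariates)."""
    rng = random.Random(seed)
    observed = linear_kernel_stat(y, genotypes, weights)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if linear_kernel_stat(y_perm, genotypes, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 10 controls, 10 sporadic cases, 10 familial cases and one variant
# whose carriers are concentrated among the familial cases.
y = ([ordinal_outcome(False, False)] * 10
     + [ordinal_outcome(True, False)] * 10
     + [ordinal_outcome(True, True)] * 10)
g = [[0]] * 10 + [[0], [1]] * 5 + [[1]] * 10
print(permutation_p(y, g, weights=[1.0]))
```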
Affiliation(s)
- S Taylor Fischer
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, 30329 Georgia
- Andrei Todor
- Department of Human Genetics, Emory University, Atlanta, 30030 Georgia
- Cuining Liu
- Department of Biostatistics and Informatics, University of Colorado, Aurora, 80045 Colorado
- David J Cutler
- Department of Human Genetics, Emory University, Atlanta, 30030 Georgia
- Debashis Ghosh
- Department of Biostatistics and Informatics, University of Colorado, Aurora, 80045 Colorado
- Michael P Epstein
- Department of Human Genetics, Emory University, Atlanta, 30030 Georgia
107
Renaux C, Buzdugan L, Kalisch M, Bühlmann P. Hierarchical inference for genome-wide association studies: a view on methodology with software. Comput Stat 2020. [DOI: 10.1007/s00180-019-00939-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/28/2022]
108
Xue Y, Ding J, Wang J, Zhang S, Pan D. Two-phase SSU and SKAT in genetic association studies. J Genet 2020; 99:9. [PMID: 32089528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
The sum of squared score (SSU) test and the sequence kernel association test (SKAT) are two useful alternative tests for genetic association studies with case-control data. Both SSU and SKAT are derived by assuming a dose-response model between the risk of disease and genotypes. However, in practice, the true genetic mode of inheritance is impossible to know. Thus, as our simulation results show, these two tests can lose power substantially when the genetic model is misspecified. To make both tests suitable for a broad range of situations, we propose two-phase SSU (tpSSU) and two-phase SKAT (tpSKAT). In the first phase, the Hardy-Weinberg equilibrium test is used to choose the genetic model; in the second phase, SSU and SKAT are constructed according to the selected model. Both tpSSU and tpSKAT outperformed the original SSU and SKAT in most of our simulation scenarios. By applying tpSSU and tpSKAT to type 2 diabetes data, we successfully identified some genes that have direct effects on obesity. We also detected a significant chromosomal region, 10q21.22, in the GAW16 rheumatoid arthritis dataset, with P < 10⁻⁶. These findings suggest that tpSSU and tpSKAT can be effective in identifying genetic variants for complex diseases in case-control association studies.
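Two building blocks named in the abstract can be sketched separately. The first is the Pearson chi-square test for Hardy-Weinberg equilibrium used in the first phase. The second is the SSU statistic computed from genotypes coded under the selected model. The abstract does not give the rule that maps the HWE result to a genetic model, the thresholds, or the null distribution of SSU, so none of these is reproduced. Function names and data are hypothetical. For a linear kernel with per-variant weights w_j, the SKAT statistic has the same form with each squared score multiplied by w_j^2.

```python
# Hypothetical sketch of the HWE test and SSU statistic; the two-phase
# model-selection rule from the paper is NOT reproduced here.


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic (1 df) for departure from Hardy-Weinberg
    equilibrium, given genotype counts AA, AB and BB (both alleles observed)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def code_genotype(risk_alleles, model):
    """Score 0, 1 or 2 copies of the risk allele under a genetic model."""
    if model == "additive":
        return float(risk_alleles)
    if model == "dominant":
        return 1.0 if risk_alleles >= 1 else 0.0
    if model == "recessive":
        return 1.0 if risk_alleles == 2 else 0.0
    raise ValueError(f"unknown genetic model: {model}")


def ssu_statistic(y, genotypes, model="additive"):
    """SSU = U'U with score U_j = sum_i (y_i - ybar) * x_ij for SNP j
    (logistic-regression score without covariates; y = 1 case, 0 control)."""
    n = len(y)
    ybar = sum(y) / n
    n_snps = len(genotypes[0])
    u = [sum((y[i] - ybar) * code_genotype(genotypes[i][j], model)
             for i in range(n))
         for j in range(n_snps)]
    return sum(uj * uj for uj in u)


print(hwe_chi_square(25, 50, 25))  # 0.0: counts match HWE exactly
print(hwe_chi_square(50, 0, 50))   # 100.0: no heterozygotes at all
y = [1, 1, 0, 0]
genos = [[2], [1], [0], [0]]
print(ssu_statistic(y, genos, "additive"), ssu_statistic(y, genos, "recessive"))  # 2.25 0.25
```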
Affiliation(s)
- Yuan Xue
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China.
109
Abstract
Human personality is 30-60% heritable according to twin and adoption studies. Hundreds of genetic variants are expected to influence its complex development, but few have been identified. We used a machine learning method for genome-wide association studies (GWAS) to uncover complex genotypic-phenotypic networks and environmental interactions. The Temperament and Character Inventory (TCI) measured the self-regulatory components of personality critical for health (i.e., the character traits of self-directedness, cooperativeness, and self-transcendence). In a discovery sample of 2149 healthy Finns, we identified sets of single-nucleotide polymorphisms (SNPs) that cluster within particular individuals (i.e., SNP sets) regardless of phenotype. Second, we identified five clusters of people with distinct profiles of character traits regardless of genotype. Third, we found 42 SNP sets that identified 727 gene loci and were significantly associated with one or more of the character profiles. Each character profile was related to different SNP sets with distinct molecular processes and neuronal functions. Environmental influences measured in childhood and adulthood had small but significant effects. We confirmed the replicability of 95% of the 42 SNP sets in healthy Korean and German samples, as well as their associations with character. The identified SNPs explained nearly all the heritability expected for character in each sample (50 to 58%). We conclude that self-regulatory personality traits are strongly influenced by organized interactions among more than 700 genes despite variable cultures and environments. These gene sets modulate specific molecular processes in brain for intentional goal-setting, self-reflection, empathy, and episodic learning and memory.
110
Zwir I, Mishra P, Del-Val C, Gu CC, de Erausquin GA, Lehtimäki T, Cloninger CR. Uncovering the complex genetics of human personality: response from authors on the PGMRA Model. Mol Psychiatry 2020; 25:2210-2213. [PMID: 30886336 PMCID: PMC7515846 DOI: 10.1038/s41380-019-0399-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/10/2019] [Accepted: 02/14/2019] [Indexed: 12/01/2022]
Affiliation(s)
- Igor Zwir
- Washington University School of Medicine, Department of Psychiatry, St. Louis, MO, USA; University of Granada, Department of Computer Science, Granada, Spain
- Pashupati Mishra
- Department of Clinical Chemistry, Fimlab Laboratories, and Finnish Cardiovascular Research Center - Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Coral Del-Val
- University of Granada, Department of Computer Science, Granada, Spain
- C. Charles Gu
- Washington University, School of Medicine, Division of Biostatistics, St. Louis, MO, USA
- Gabriel A. de Erausquin
- University of Texas Rio-Grande Valley, School of Medicine, Department of Psychiatry and Neurology, and Institute of Neurosciences, Harlingen, TX, USA
- Terho Lehtimäki
- Department of Clinical Chemistry, Fimlab Laboratories, and Finnish Cardiovascular Research Center - Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- C. Robert Cloninger
- Washington University School of Medicine, Department of Psychiatry, St. Louis, MO, USA; Washington University, School of Arts and Sciences, Department of Psychological and Brain Sciences, and School of Medicine, Department of Genetics, St. Louis, MO, USA
111
Fan KH, Feingold E, Rosenthal SL, Demirci FY, Ganguli M, Lopez OL, Kamboh MI. Whole-Exome Sequencing Analysis of Alzheimer's Disease in Non-APOE*4 Carriers. J Alzheimers Dis 2020; 76:1553-1565. [PMID: 32651314 PMCID: PMC7484092 DOI: 10.3233/jad-200037] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/20/2022]
Abstract
The genetics of late-onset Alzheimer's disease (AD) is complex due to the heterogeneous nature of the disorder. APOE*4 is the strongest genetic risk factor for AD. Genome-wide association studies have identified more than 30 additional loci, each having relatively small effect size. Known AD loci explain only about 30% of the genetic variance, and thus much of the genetic variance remains unexplained. To identify some of the missing heritability of AD, we analyzed whole-exome sequencing (WES) data focusing on non-APOE*4 carriers from two WES datasets: 720 cases and controls from the University of Pittsburgh and 7,252 cases and controls from the Alzheimer's Disease Sequencing Project. Following separate WES analyses in each dataset, we performed meta-analysis for overlapping markers present in both datasets. Among the four variants reaching the exome-wide significance threshold, three were from known AD loci: APOE/rs7412 (odds ratio (OR) = 0.40; p = 5.46E-24), TOMM40/rs157581 (OR = 1.49; p = 4.04E-07), and TREM2/rs75932628 (OR = 4.00; p = 1.15E-07). The fourth significant variant, rs199533, was from a novel locus on chromosome 17 in the NSF gene (OR = 0.78; p = 2.88E-07). NSF was also significant in the gene-based analysis (p = 1.20E-05). In the GTEx data, NSF/rs199533 is a cis-eQTL for multiple genes in the brain and blood, including NSF that is highly expressed across all brain tissues, including regions that typically show amyloid-β accumulation. Further characterization of genes that are affected by NSF/rs199533 may help to shed light on the roles of these genes in AD etiology.
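The abstract does not say how the two WES datasets were combined in the meta-analysis. This sketch assumes a fixed-effect, inverse-variance-weighted meta-analysis of per-study log odds ratios, which is a common choice. The function name and the example inputs are hypothetical and are not values from the paper.

```python
# Hypothetical sketch: fixed-effect inverse-variance meta-analysis is an assumed
# scheme, not necessarily the one used in the study.
import math


def ivw_meta(log_ors, std_errs):
    """Fixed-effect inverse-variance meta-analysis of per-study log odds ratios.

    Returns (pooled odds ratio, standard error of the pooled log OR,
    two-sided P-value from a normal approximation).
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return math.exp(pooled), pooled_se, p


# Two hypothetical studies of the same variant (log OR and its standard error).
print(ivw_meta([math.log(0.75), math.log(0.82)], [0.08, 0.06]))
```

Each study is weighted by the inverse of its sampling variance, so the larger dataset dominates the pooled estimate.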
Affiliation(s)
- Kang-Hsien Fan
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Eleanor Feingold
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Samantha L. Rosenthal
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- F. Yesim Demirci
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Mary Ganguli
- Department of Psychiatry, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Oscar L. Lopez
- Department of Neurology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- M. Ilyas Kamboh
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
112
Zwir I, Arnedo J, Del-Val C, Pulkki-Råback L, Konte B, Yang SS, Romero-Zaliz R, Hintsanen M, Cloninger KM, Garcia D, Svrakic DM, Rozsa S, Martinez M, Lyytikäinen LP, Giegling I, Kähönen M, Hernandez-Cuervo H, Seppälä I, Raitoharju E, de Erausquin GA, Raitakari O, Rujescu D, Postolache TT, Sung J, Keltikangas-Järvinen L, Lehtimäki T, Cloninger CR. Uncovering the complex genetics of human temperament. Mol Psychiatry 2020; 25:2275-2294. [PMID: 30279457 PMCID: PMC7515831 DOI: 10.1038/s41380-018-0264-5] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 02/12/2018] [Revised: 07/21/2018] [Accepted: 08/15/2018] [Indexed: 11/11/2022]
Abstract
Experimental studies of learning suggest that human temperament may depend on the molecular mechanisms for associative conditioning, which are highly conserved in animals. The main genetic pathways for associative conditioning are known in experimental animals, but have not been identified in prior genome-wide association studies (GWAS) of human temperament. We used a data-driven machine learning method for GWAS to uncover the complex genotypic-phenotypic networks and environmental interactions related to human temperament. In a discovery sample of 2149 healthy Finns, we identified sets of single-nucleotide polymorphisms (SNPs) that cluster within particular individuals (i.e., SNP sets) regardless of phenotype. Second, we identified 3 clusters of people with distinct temperament profiles measured by the Temperament and Character Inventory regardless of genotype. Third, we found 51 SNP sets that identified 736 gene loci and were significantly associated with temperament. The identified genes were enriched in pathways activated by associative conditioning in animals, including the ERK, PI3K, and PKC pathways. 74% of the identified genes were unique to a specific temperament profile. Environmental influences measured in childhood and adulthood had small but significant effects. We confirmed the replicability of the 51 Finnish SNP sets in healthy Korean (90%) and German samples (89%), as well as their associations with temperament. The identified SNPs explained nearly all the heritability expected in each sample (37-53%) despite variable cultures and environments. We conclude that human temperament is strongly influenced by more than 700 genes that modulate associative conditioning by molecular processes for synaptic plasticity and long-term memory.
Affiliation(s)
- Igor Zwir
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA; Department of Computer Science, University of Granada, Granada, Spain
- Javier Arnedo
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA; Department of Computer Science, University of Granada, Granada, Spain
- Coral Del-Val
- Department of Computer Science, University of Granada, Granada, Spain
- Laura Pulkki-Råback
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Bettina Konte
- Department of Psychiatry, Martin-Luther-University Halle-Wittenberg, Halle, Germany
- Sarah S. Yang
- Department of Epidemiology, School of Public Health, Institute of Health and Environment, Seoul National University, Seoul, Korea
- Rocio Romero-Zaliz
- Department of Computer Science, University of Granada, Granada, Spain
- Mirka Hintsanen
- Unit of Psychology, Faculty of Education, University of Oulu, Oulu, Finland
- Danilo Garcia
- Department of Psychology, University of Gothenburg, Gothenburg, Sweden; Blekinge Centre of Competence, Blekinge County Council, Karlskrona, Sweden
- Dragan M. Svrakic
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
- Sandor Rozsa
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
- Maribel Martinez
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
- Leo-Pekka Lyytikäinen
- Fimlab Laboratories, Department of Clinical Chemistry, Faculty of Medicine and Life Sciences, Finnish Cardiovascular Research Center-Tampere, University of Tampere, Tampere, Finland
- Ina Giegling
- Department of Psychiatry, Martin-Luther-University Halle-Wittenberg, Halle, Germany; University Clinic, Ludwig-Maximilian University, Munich, Germany
- Mika Kähönen
- Department of Clinical Physiology, Faculty of Medicine and Life Sciences, Tampere University Hospital, University of Tampere, Tampere, Finland
- Helena Hernandez-Cuervo
- Department of Psychiatry and Neurosurgery, University of South Florida, Tampa, FL, USA
- Ilkka Seppälä
- Fimlab Laboratories, Department of Clinical Chemistry, Faculty of Medicine and Life Sciences, Finnish Cardiovascular Research Center-Tampere, University of Tampere, Tampere, Finland
- Emma Raitoharju
- Fimlab Laboratories, Department of Clinical Chemistry, Faculty of Medicine and Life Sciences, Finnish Cardiovascular Research Center-Tampere, University of Tampere, Tampere, Finland
- Gabriel A. de Erausquin
- Department of Psychiatry and Neurology, Institute of Neurosciences, University of Texas Rio-Grande Valley School of Medicine, Harlingen, TX, USA
- Olli Raitakari
- Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku, Finland
- Dan Rujescu
- Department of Psychiatry, Martin-Luther-University Halle-Wittenberg, Halle, Germany
- Teodor T. Postolache
- Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA; Rocky Mountain Mental Illness, Research, Education and Clinical Center for Veteran Suicide Prevention, Denver, CO, USA
- Joohon Sung
- Department of Epidemiology, School of Public Health, Institute of Health and Environment, Seoul National University, Seoul, Korea
- Liisa Keltikangas-Järvinen
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Terho Lehtimäki
- Fimlab Laboratories, Department of Clinical Chemistry, Faculty of Medicine and Life Sciences, Finnish Cardiovascular Research Center-Tampere, University of Tampere, Tampere, Finland
- C. Robert Cloninger
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA; Department of Psychological and Brain Sciences, School of Arts and Sciences, and Department of Genetics, School of Medicine, Washington University School of Medicine, St. Louis, MO, USA
113
Kim Y, Chi YY, Zou F. An efficient integrative resampling method for gene-trait association analysis. Genet Epidemiol 2019; 44:197-207. [PMID: 31820489 DOI: 10.1002/gepi.22271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2019] [Revised: 10/27/2019] [Accepted: 11/25/2019] [Indexed: 11/07/2022]
Abstract
Genetic association studies are popular for identifying genetic variants, such as single nucleotide polymorphisms (SNPs), that are associated with complex traits. Statistical tests are commonly performed one SNP at a time under an assumed mode of inheritance, such as a recessive, additive, or dominant genetic model. Such analyses can have inadequate power when the assumed model deviates from the true underlying genetic model. We propose an integrative association test procedure under a generalized linear model framework that flexibly models data arising from these three common genetic models and beyond. A computationally efficient resampling procedure is used to estimate the null distribution of the proposed test statistic. Simulation results show that our methods maintain the Type I error rate irrespective of the presence of confounding covariates and achieve adequate power relative to methods that use the true genetic model. The new methods are applied to two genetic studies, of resistance to severe malaria and of sarcoidosis.
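The motivation for not committing to a single inheritance model can be illustrated with the classic MAX3 idea: compute a trend statistic under recessive, additive, and dominant genotype codings, take the maximum, and calibrate that maximum by permutation. This is a simplified sketch, not the authors' GLM-based integrative test or their resampling scheme. It assumes a case-control trait with no covariates and uses the trend statistic N·r², where r is the Pearson correlation between genotype score and case status; the names `SCORES`, `trend_stat`, and `max3_test` are hypothetical.

```python
import numpy as np

# Genotype scores indexed by minor-allele count (0, 1, 2) under three inheritance models.
SCORES = {
    "recessive": np.array([0.0, 0.0, 1.0]),
    "additive": np.array([0.0, 1.0, 2.0]),
    "dominant": np.array([0.0, 1.0, 1.0]),
}

def trend_stat(genotypes, status, scores):
    """Trend chi-square statistic N * r^2 for one genotype coding (r = Pearson correlation)."""
    x = scores[genotypes]
    if x.std() == 0 or status.std() == 0:
        return 0.0  # a constant coding carries no information
    r = np.corrcoef(x, status)[0, 1]
    return len(status) * r ** 2

def max3_test(genotypes, status, n_perm=1000, seed=0):
    """Maximum trend statistic over the three codings, with a permutation p-value."""
    genotypes = np.asarray(genotypes, dtype=int)
    status = np.asarray(status, dtype=float)

    def max_stat(y):
        return max(trend_stat(genotypes, y, s) for s in SCORES.values())

    observed = max_stat(status)
    rng = np.random.default_rng(seed)
    exceed = sum(max_stat(rng.permutation(status)) >= observed for _ in range(n_perm))
    # The +1 terms count the observed data as one permutation, so p is never exactly 0.
    return observed, (exceed + 1) / (n_perm + 1)
```

Permuting case status is only valid here because the sketch has no covariates; adjusting for confounders, as the abstract describes, requires a different resampling scheme.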
Affiliation(s)
- Yeonil Kim
- Early Development Statistics, Merck & Co., Inc., Rahway, New Jersey
- Yueh-Yun Chi
- Department of Biostatistics, University of Florida, Gainesville, Florida
- Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
114
Sanyal N, Lo MT, Kauppi K, Djurovic S, Andreassen OA, Johnson VE, Chen CH. GWASinlps: non-local prior based iterative SNP selection tool for genome-wide association studies. Bioinformatics 2019; 35:1-11. [PMID: 29931045 DOI: 10.1093/bioinformatics/bty472] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/09/2017] [Accepted: 06/12/2018] [Indexed: 01/29/2023]
Abstract
Motivation: Multiple-marker analysis of genome-wide association study (GWAS) data has gained considerable attention in recent years. However, such analysis is challenging because of the ultra-high dimensionality of GWAS data. Frequently used penalized regression methods often produce a large number of false positives, whereas Bayesian methods are computationally very expensive. To address both issues simultaneously, we consider the novel approach of using non-local priors in an iterative variable selection framework. Results: We develop a variable selection method, iterative non-local prior based selection for GWAS (GWASinlps), that combines, within an iterative variable selection framework, the computational efficiency of a screen-and-select approach based on association learning with the parsimonious uncertainty quantification provided by non-local priors. The hallmark of our method is a 'structured screen-and-select' strategy: hierarchical screening based not only on response-predictor associations but also on response-response associations, with variable selection carried out within that hierarchy. Extensive simulation studies with single nucleotide polymorphisms having realistic linkage disequilibrium structures demonstrate the advantages of our computationally efficient method over several frequentist and Bayesian variable selection methods in terms of true positive rate, false discovery rate, mean squared error and effect size estimation error. We also provide an empirical power analysis useful for study design. Finally, we apply the method to real GWAS data with human height as the phenotype. Availability and implementation: An R package implementing the GWASinlps method is available at https://cran.r-project.org/web/packages/GWASinlps/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.
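A non-local prior places zero density at the null value of a regression coefficient, so small spurious effects are actively penalized. As a hedged illustration, assuming the first-order product moment (pMOM) prior of Johnson and Rossell (the abstract does not say which non-local prior family GWASinlps uses), the density with scale τ and error variance σ² is (β²/(τσ²))·N(β; 0, τσ²). The function name `pmom_density` is hypothetical.

```python
import math

def pmom_density(beta, tau=1.0, sigma2=1.0):
    """First-order pMOM non-local prior: (beta^2 / v) * Normal(beta; 0, v), with v = tau * sigma2."""
    v = tau * sigma2
    normal_pdf = math.exp(-beta ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    return (beta ** 2 / v) * normal_pdf

# The density vanishes at the null value beta = 0, unlike a normal (local) prior.
print(pmom_density(0.0))  # 0.0
```

The factor β²/v integrates to one against N(0, v) because E[β²] = v, and the resulting density peaks at β = ±√(2v) rather than at zero.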
Affiliation(s)
- Nilotpal Sanyal
- Department of Radiology, University of California, San Diego, La Jolla, CA, USA
- Min-Tzu Lo
- Department of Radiology, University of California, San Diego, La Jolla, CA, USA
- Karolina Kauppi
- Department of Radiation Sciences, Umeå University, Umeå, Sweden
- Srdjan Djurovic
- Department of Medical Genetics, NORMENT, KG Jebsen Centre, University of Bergen, Bergen, Oslo University Hospital, Oslo, Norway
- Ole A Andreassen
- Division of Mental Health and Addiction, NORMENT, KG Jebsen Centre, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Valen E Johnson
- Department of Statistics, Texas A&M University, College Station, TX, USA
- Chi-Hua Chen
- Department of Radiology, University of California, San Diego, La Jolla, CA, USA
115
Sun R, Lin X. Genetic Variant Set-Based Tests Using the Generalized Berk-Jones Statistic with Application to a Genome-Wide Association Study of Breast Cancer. J Am Stat Assoc 2019; 115:1079-1091. [PMID: 33041403 DOI: 10.1080/01621459.2019.1660170] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 10/26/2022]
Abstract
Studying the effects of groups of single nucleotide polymorphisms (SNPs), as in a gene, genetic pathway, or network, can provide novel insight into complex diseases like breast cancer, uncovering new genetic associations and augmenting the information that can be gleaned from studying SNPs individually. Common challenges in set-based genetic association testing include weak effect sizes, correlation between SNPs in a SNP-set, and scarcity of signals, with the number of nonzero SNP effects often ranging from extremely sparse to moderately sparse. Motivated by these challenges, we propose the Generalized Berk-Jones (GBJ) test for the association between a SNP-set and an outcome. The GBJ extends the Berk-Jones statistic by accounting for correlation among SNPs, and it provides advantages over the Generalized Higher Criticism test when signals in a SNP-set are moderately sparse. We also provide an analytic p-value calculation for SNP-sets of any finite size, and we develop an omnibus statistic that is robust to the degree of signal sparsity. An additional advantage of our work is the ability to conduct inference using individual SNP summary statistics from a genome-wide association study (GWAS). We evaluate the finite-sample performance of the GBJ through simulation and apply the method to identify breast cancer risk genes in a GWAS conducted by the Cancer Genetic Markers of Susceptibility Consortium. Our results suggest evidence of association between FGFR2 and breast cancer and also identify other potential susceptibility genes, complementing conventional SNP-level analysis.
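For intuition, the original Berk-Jones statistic that GBJ generalizes can be computed from d sorted p-values as BJ = max over 1 ≤ i ≤ d/2 of d·K(i/d, p₍ᵢ₎), counting only terms with p₍ᵢ₎ < i/d, where K(x, y) = x log(x/y) + (1 − x) log((1 − x)/(1 − y)) is the binary Kullback-Leibler divergence. The sketch below assumes independent p-values in (0, 1) and one common convention for the index range; it does not implement the correlation adjustment or the analytic p-value that define GBJ, and the function names are hypothetical.

```python
import math

def kl_bernoulli(x, y):
    """Binary Kullback-Leibler divergence K(x, y) between Bernoulli(x) and Bernoulli(y)."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

def berk_jones(p_values):
    """Berk-Jones statistic for d independent p-values (no correlation adjustment)."""
    p_sorted = sorted(p_values)
    d = len(p_sorted)
    best = 0.0
    for i in range(1, d // 2 + 1):
        x, y = i / d, p_sorted[i - 1]
        # Only order statistics smaller than their null expectation count as signal.
        if y < x:
            best = max(best, d * kl_bernoulli(x, y))
    return best
```

Larger values indicate that the smallest p-values depart more strongly from the uniform null; turning the statistic into a p-value requires its null distribution, which is what the paper's analytic calculation provides.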
Affiliation(s)
- Ryan Sun
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
- Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115; Department of Statistics, Harvard University, Cambridge, MA 02138
116
Li Y, Giorgi EE, Beckman KB, Caberto C, Kazma R, Lum-Jones A, Haiman CA, Marchand LL, Stram DO, Saxena R, Cheng I. Association between mitochondrial genetic variation and breast cancer risk: The Multiethnic Cohort. PLoS One 2019; 14:e0222284. [PMID: 31577800 PMCID: PMC6774509 DOI: 10.1371/journal.pone.0222284] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/01/2019] [Accepted: 08/26/2019] [Indexed: 01/17/2023]
Abstract
Background: The mitochondrial genome encodes thirty-seven genes, thirteen of which encode proteins essential for the oxidative phosphorylation (OXPHOS) system. Inherited variation in mitochondrial genes may influence cancer development through changes in mitochondrial proteins, altering the OXPHOS process and promoting the production of reactive oxygen species. Methods: To investigate the association between mitochondrial genetic variation and breast cancer risk, we tested 314 mitochondrial SNPs (mtSNPs), capturing four complexes of the mitochondrial OXPHOS pathway and mtSNP groupings for rRNA and tRNA, in 2,723 breast cancer cases and 3,260 controls from the Multiethnic Cohort Study. Results: We examined the collective set of 314 mtSNPs as well as subsets of mtSNPs grouped by mitochondrial OXPHOS pathway, complex, and gene, using the sequence kernel association test and adjusting for age, sex, and principal components of global ancestry. We also tested haplogroup associations using unconditional logistic regression, adjusting for the same covariates. Stratified analyses were conducted by self-reported maternal race/ethnicity. No significant mitochondrial OXPHOS pathway, gene, or haplogroup associations were observed in African Americans, Asian Americans, Latinos, or Native Hawaiians. In European Americans, a global test of all genetic variants of the mitochondrial genome identified an association with breast cancer risk (P = 0.017, q = 0.102). In mtSNP-subset analysis, MT-CO2 (P = 0.001, q = 0.09) in Complex IV (cytochrome c oxidase) and MT-ND2 (P = 0.004, q = 0.19) in Complex I (NADH dehydrogenase (ubiquinone)) were significantly associated with breast cancer risk. Conclusions: In summary, our findings suggest that collective mitochondrial genetic variation, particularly in MT-CO2 and MT-ND2, may play a role in breast cancer risk among European Americans. Further replication is warranted in larger populations, and future studies should evaluate the contribution of mitochondrial proteins encoded by both the nuclear and mitochondrial genomes to breast cancer risk.
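The sequence kernel association test scores a whole SNP set at once. With a weighted linear kernel K = GWGᵀ (G is the n × p genotype matrix, W a diagonal weight matrix), the statistic is Q = (y − ŷ₀)ᵀ K (y − ŷ₀), where ŷ₀ is the fitted null model. The sketch below is a simplification: it assumes an intercept-only null model (no age, sex, or ancestry covariates, unlike the analysis described), equal weights by default, and a permutation p-value in place of SKAT's analytic mixture-of-chi-squares p-value. The function names are hypothetical.

```python
import numpy as np

def skat_linear_q(G, y, weights=None):
    """SKAT-style statistic Q = r' G W G' r with r = y - mean(y) (intercept-only null model)."""
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    resid = y - y.mean()
    w = np.ones(G.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    score = G.T @ resid                   # one score per variant
    return float(np.sum(w * score ** 2))  # equals resid' G W G' resid for diagonal W

def skat_perm_pvalue(G, y, n_perm=1000, seed=0):
    """Permutation p-value for Q; acceptable here only because the null model has no covariates."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    q_obs = skat_linear_q(G, y)
    exceed = sum(skat_linear_q(G, rng.permutation(y)) >= q_obs for _ in range(n_perm))
    return q_obs, (exceed + 1) / (n_perm + 1)
```

Writing Q as a weighted sum of squared per-variant scores avoids forming the n × n kernel matrix, which matters when the number of individuals is large.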
Affiliation(s)
- Yuqing Li
- Department of Epidemiology and Biostatistics, School of Medicine, University of California, San Francisco, California, United States of America
- Elena E. Giorgi
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, New Mexico
- Kenneth B. Beckman
- University of Minnesota Genomics Center, Minneapolis, Minnesota, United States of America
- Christian Caberto
- Epidemiology Program, University of Hawaii Cancer Center, University of Hawaii, Honolulu, Hawaii, United States of America
- Remi Kazma
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Switzerland
- Annette Lum-Jones
- Epidemiology Program, University of Hawaii Cancer Center, University of Hawaii, Honolulu, Hawaii, United States of America
- Christopher A. Haiman
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
- Loïc Le Marchand
- Epidemiology Program, University of Hawaii Cancer Center, University of Hawaii, Honolulu, Hawaii, United States of America
- Daniel O. Stram
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
- Richa Saxena
- Center for Human Genetic Research, Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program of Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Iona Cheng
- Department of Epidemiology and Biostatistics, School of Medicine, University of California, San Francisco, California, United States of America
117
Xu Y, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Sci Rep 2019; 9:13686. [PMID: 31548641 PMCID: PMC6757104 DOI: 10.1038/s41598-019-50229-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/04/2019] [Accepted: 09/09/2019] [Indexed: 12/18/2022]
Abstract
Genome-wide association studies (GWASs) aim to detect genetic risk factors for complex human diseases by identifying disease-associated single-nucleotide polymorphisms (SNPs). The traditional SNP-wise approach with multiple testing adjustment is overly conservative and lacks power in many GWASs. In this article, we propose a model-based clustering method that transforms the challenging high-dimension-small-sample-size problem into a low-dimension-large-sample-size problem and borrows information across SNPs by grouping them into three clusters. We pre-specify the cluster patterns by the minor allele frequencies of SNPs in cases and controls, and enforce these patterns with prior distributions. In simulation studies, the proposed model outperforms the traditional SNP-wise approach, showing better control of the false discovery rate (FDR) and higher sensitivity. We re-analyzed two real studies to identify SNPs associated with severe bortezomib-induced peripheral neuropathy (BiPN) in patients with multiple myeloma (MM). The original analysis in the literature failed to identify any SNPs after FDR adjustment. Our proposed method not only detected the reported SNPs after FDR adjustment but also discovered a novel BiPN-associated SNP, rs4351714, which has been reported to be related to MM in another study.
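The SNP-wise "FDR adjustment" that this abstract compares against is often carried out with the Benjamini-Hochberg step-up procedure; the abstract does not say which FDR method the original analysis used, so treat this as a generic sketch rather than a reproduction of it. With m tests, the adjusted value of the SNP at rank i is the smallest m·p₍ⱼ₎/j over all ranks j ≥ i, capped at 1. The function name `bh_adjust` is hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

SNPs whose adjusted value falls below the target FDR level (for example 0.05) are declared discoveries; with hundreds of thousands of SNPs, m·p₍ⱼ₎/j stays large unless signals are strong, which is the conservativeness the abstract refers to.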
Affiliation(s)
- Yan Xu
- Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
- Li Xing
- Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, SK, Canada
- Jessica Su
- Channing Division of Network Medicine, Brigham and Women's Hospital/Harvard Medical School, Boston, MA, USA
- Xuekui Zhang
- Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
- Weiliang Qiu
- Channing Division of Network Medicine, Brigham and Women's Hospital/Harvard Medical School, Boston, MA, USA
118
Roy Sarkar T, Maity AK, Niu Y, Mallick BK. Multiple Omics Data Integration to Identify Long Noncoding RNA Responsible for Breast Cancer-Related Mortality. Cancer Inform 2019; 18:1176935119871933. [PMID: 31488946 PMCID: PMC6710679 DOI: 10.1177/1176935119871933] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/16/2019] [Accepted: 07/21/2019] [Indexed: 12/30/2022]
Abstract
Long non-coding RNAs (lncRNAs) are a large and diverse class of transcribed RNAs that have been shown to play a significant role in cancer development. In this study, we apply an integrative modeling framework that combines DNA copy number variation (CNV), lncRNA expression, and downstream target protein expression to predict patient survival in breast cancer. We develop a 3-stage model combining a mechanistic model (lncRNA regressed on CNV, and target proteins regressed on lncRNA) and a clinical model (survival regressed on estimated effects from the mechanistic models). Using lncRNAs (such as HOTAIR and MALAT1) together with their CNV, target protein expression, and survival outcomes from The Cancer Genome Atlas (TCGA) database, we show that both the predicted mean squared error and the integrated Brier score (IBS) are lower for the proposed 3-stage integrated model than for a 2-stage model. Therefore, the integrative model has better predictive ability than the 2-stage model, which does not consider target protein information.
Affiliation(s)
- Tapasree Roy Sarkar
- Department of Biology, Texas A&M University, College Station, TX, USA; Department of Statistics, Texas A&M University, College Station, TX, USA
- Arnab Kumar Maity
- Early Clinical Development Oncology Statistics, Pfizer Inc, San Diego, CA, USA
- Yabo Niu
- Department of Statistics, Texas A&M University, College Station, TX, USA
- Bani K Mallick
- Department of Statistics, Texas A&M University, College Station, TX, USA
119
Schaid DJ, Tong X, Batzler A, Sinnwell JP, Qing J, Biernacka JM. Multivariate generalized linear model for genetic pleiotropy. Biostatistics 2019; 20:111-128. [PMID: 29267957 DOI: 10.1093/biostatistics/kxx067] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/27/2017] [Accepted: 11/05/2017] [Indexed: 02/07/2023]
Abstract
When a single gene influences more than one trait, a phenomenon known as pleiotropy, detecting this pleiotropy can improve biological understanding of the gene, which in turn can lead to improved screening, diagnosis, and treatment of diseases. Yet most current multivariate methods for evaluating pleiotropy test the null hypothesis that none of the traits are associated with a variant, so departures from the null could be driven by just one associated trait. A formal test of pleiotropy should instead assume a null hypothesis that one or fewer traits are associated with a genetic variant. We recently developed statistical methods to analyze pleiotropy for quantitative traits having a multivariate normal distribution. We now extend this approach to traits that can be modeled by generalized linear models, such as binary, ordinal, or quantitative traits, or a mixture of these trait types. Based on estimating-equation methods, we developed a new test for pleiotropy. We then extended the testing framework to a sequential approach that tests the null hypothesis that $k+1$ traits are associated, given that the null of $k$ associated traits was rejected. This provides a testing framework to determine the number of traits associated with a genetic variant, as well as which traits, while accounting for correlations among the traits. Through simulations, we illustrate the Type I error rate and power of our new methods, describe how they are influenced by sample size, the number of traits, and the trait correlations, and apply the new methods to a genome-wide association study of multivariate traits measuring symptoms of major depression. Our new approach provides a quantitative assessment of pleiotropy, enhancing current analytic practice.
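The sequential procedure can be summarized as a stopping rule. Assuming a hypothetical list of p-values in which entry j tests the null that start_k + j traits are associated (computing those p-values is the substance of the paper and is not reproduced here), the estimated number of associated traits is the first k whose null is not rejected. The names `estimate_num_associated` and `start_k` are hypothetical.

```python
def estimate_num_associated(p_values, alpha=0.05, start_k=0):
    """Sequential stopping rule over nulls 'k traits are associated', k = start_k, start_k + 1, ...

    Each null is examined only if the previous one was rejected; the estimate is the
    first k whose null is not rejected, or one past the last k if every null is rejected.
    """
    for j, p in enumerate(p_values):
        if p >= alpha:
            return start_k + j
    return start_k + len(p_values)

print(estimate_num_associated([1e-5, 1e-3, 0.2, 0.01]))  # 2
```

Because testing stops at the first non-rejection, p-values for larger k are never consulted once a null is retained.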
Affiliation(s)
- Daniel J Schaid
- Department of Health Sciences Research, Mayo Clinic, Harwick 775, 200 First ST SW, Rochester, MN, USA
- Xingwei Tong
- School of Statistics, Beijing Normal University, Beijing, China
- Anthony Batzler
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Jason P Sinnwell
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Jiang Qing
- School of Statistics, Beijing Normal University, Beijing, China
120
Zhang J, Zhao Z, Guo X, Guo B, Wu B. Powerful statistical method to detect disease-associated genes using publicly available genome-wide association studies summary data. Genet Epidemiol 2019; 43:941-951. [PMID: 31392781 DOI: 10.1002/gepi.22251] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/27/2018] [Revised: 07/14/2019] [Accepted: 07/16/2019] [Indexed: 12/11/2022]
Abstract
Genome-wide association studies (GWAS) have thus far achieved substantial success: in the last decade, a large number of common variants underlying complex diseases have been identified through GWAS. In most existing GWAS, the identified common variants are obtained by single-marker tests, that is, by testing one single-nucleotide polymorphism (SNP) at a time. However, the basic functional unit of inheritance is generally a gene rather than a SNP, so results from gene-level association tests can be more readily integrated with downstream functional and pathogenic investigations. In this paper, we propose a general gene-based p-value adaptive combination approach (GPA) that integrates association evidence from multiple genetic variants using only GWAS summary statistics (either p-values or other test statistics). The proposed method can test genetic association for both continuous and binary traits, using data from either a single study or multiple studies, which helps overcome the limitation of existing methods that apply only to a specific type of data. We conducted thorough simulation studies to verify that the proposed method controls type I error well and performs favorably compared with single-marker analysis and other existing methods. We demonstrated the utility of the proposed method by analyzing GWAS meta-analysis results for fasting glucose and lipids from the international MAGIC consortium and the Global Lipids Consortium, respectively. The proposed method identified several novel trait-associated genes that can improve our understanding of the mechanisms involved in β-cell function, glucose homeostasis, and lipid traits.
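The abstract does not spell out how GPA combines SNP-level p-values. As a hedged illustration of adaptive p-value combination in general, the sketch below implements the adaptive rank truncated product (ARTP) idea: for several truncation points k it forms W_k = −Σ_{j≤k} log p₍ⱼ₎, converts each W_k to an empirical p-value, keeps the smallest, and calibrates that minimum with the same Monte Carlo draws. It assumes independent SNPs with uniform null p-values, which real GWAS summary data violate because of linkage disequilibrium; the truncation points and function names are hypothetical choices, not GPA's.

```python
import numpy as np

def _upper_tail(null_col, values):
    """Empirical P(null >= value) for each value, using one column of null statistics."""
    sorted_col = np.sort(null_col)
    n_ge = sorted_col.size - np.searchsorted(sorted_col, values, side="left")
    return n_ge / sorted_col.size

def adaptive_rtp(p_values, truncation_points=(1, 2, 5), n_sim=5000, seed=0):
    """Adaptive rank truncated product p-value, assuming independent uniform null p-values."""
    p = np.sort(np.asarray(p_values, dtype=float))
    m = p.size
    idx = [k - 1 for k in truncation_points if k <= m]
    rng = np.random.default_rng(seed)
    # Monte Carlo null: each row is a gene with m independent Uniform(0, 1) p-values.
    null = np.sort(rng.uniform(size=(n_sim, m)), axis=1)
    # W_k = -sum_{j<=k} log p_(j); larger values mean stronger evidence.
    obs_w = -np.cumsum(np.log(p))[idx]
    null_w = -np.cumsum(np.log(null), axis=1)[:, idx]
    # Empirical p-value of each W_k, for the observed gene and for every null replicate.
    obs_pk = np.array([_upper_tail(null_w[:, j], obs_w[j]) for j in range(len(idx))])
    null_pk = np.column_stack([_upper_tail(null_w[:, j], null_w[:, j]) for j in range(len(idx))])
    # Adaptive step: keep the best truncation point, then calibrate that minimum.
    min_obs = obs_pk.min()
    min_null = null_pk.min(axis=1)
    return (np.sum(min_null <= min_obs) + 1) / (n_sim + 1)
```

Reusing one set of null draws for both the per-k p-values and the calibration of their minimum is what keeps the adaptive choice of k from inflating the type I error.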
Affiliation(s)
- Jianjun Zhang
- Department of Mathematics, University of North Texas, Denton, Texas
- Zihan Zhao
- Texas Academy of Mathematics & Science, University of North Texas, Denton, Texas
- Xuan Guo
- Department of Computer Science and Engineering, University of North Texas, Denton, Texas
- Bin Guo
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
- Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
121
Yang C, Chen M, Huang H, Li X, Qian D, Hong X, Zheng L, Hong J, Hong J, Zhu Z, Zheng X, Sheng Y, Zhang X. Exome-Wide Rare Loss-of-Function Variant Enrichment Study of 21,347 Han Chinese Individuals Identifies Four Susceptibility Genes for Psoriasis. J Invest Dermatol 2019; 140:799-805.e1. [PMID: 31376382 DOI: 10.1016/j.jid.2019.07.692] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/09/2019] [Revised: 06/25/2019] [Accepted: 07/10/2019] [Indexed: 11/20/2022]
Abstract
Most psoriasis-related genes or loci identified by GWAS represent common variants located in noncoding regions of the human genome, providing only limited evidence for the roles of rare coding variants in psoriasis. Two exome-wide case-control genotyping data sets (11,245 cases and 11,177 controls) were obtained from our previous study. Quality control was performed for each data set, and the markers remaining in each set were annotated using ANNOVAR. Gene-based analysis was performed on the annotation results. A total of 250 and 35 genes in the Exome_Fine and Exome_Asian array cohorts, respectively, exceeded the significance threshold (P < 4.43 × 10⁻⁶). Merged gene-based analysis was then conducted on the same set of SNPs from seven genes common to both arrays, and the chi-square test was used to confirm all gene-based results. Ultimately, four susceptibility genes were identified: BBS7 (P_combine = 1.38 × 10⁻²⁹), GSTCD (P_combine = 8.35 × 10⁻⁴⁷), LIPK (P_combine = 1.02 × 10⁻¹⁹), and PPP4R3B (P_combine = 1.79 × 10⁻³³). This study identified four susceptibility genes for psoriasis via a gene-based method using rare variants, contributing to our understanding of the pathogenesis of psoriasis.
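The abstract does not specify its gene-based test, but the simplest rare-variant collapsing scheme conveys the idea: call an individual a carrier if they have at least one qualifying rare loss-of-function allele in the gene, then compare carrier counts between cases and controls in a 2 × 2 table with a Pearson chi-square test (1 degree of freedom, no continuity correction). This is a hedged sketch, not the authors' pipeline; the function name and the counts in the example call are hypothetical.

```python
import math

def carrier_chi_square(case_carriers, n_cases, control_carriers, n_controls):
    """Pearson chi-square (1 df, no continuity correction) comparing carrier counts."""
    table = [
        [case_carriers, n_cases - case_carriers],
        [control_carriers, n_controls - control_carriers],
    ]
    n = n_cases + n_controls
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a 1-df chi-square: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: 50 carriers among 1,000 cases versus 5 among 1,000 controls.
stat, p = carrier_chi_square(50, 1000, 5, 1000)
```

Collapsing many rare alleles into a single carrier indicator boosts power when most qualifying variants push risk in the same direction, which is the usual rationale for burden-style gene-based tests.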
Affiliation(s)
- Chao Yang
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Dermatology, Anhui Medical University, Hefei, China; Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Mengyun Chen
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Dermatology, Anhui Medical University, Hefei, China; Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- He Huang
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Dermatology, Anhui Medical University, Hefei, China; Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Xueying Li
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Dermatology, Anhui Medical University, Hefei, China; Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Danfeng Qian
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Dermatology, Anhui Medical University, Hefei, China; Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Xiaojie Hong
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Dermatology, Anhui Medical University, Hefei, China; Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Lijun Zheng
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Dermatology, Anhui Medical University, Hefei, China; Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Jiaqi Hong
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Dermatology, Anhui Medical University, Hefei, China; Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Jiaqi Hong
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Dermatology, Anhui Medical University, Hefei, China; Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Zhengwei Zhu
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Dermatology, Anhui Medical University, Hefei, China; Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Xiaodong Zheng
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Dermatology, Anhui Medical University, Hefei, China; Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Yujun Sheng
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Dermatology, Anhui Medical University, Hefei, China; Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
- Xuejun Zhang
- Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Dermatology, Anhui Medical University, Hefei, China; Key Laboratory of Dermatology, Anhui Medical University, Ministry of Education, Hefei, China
122
Hishida A, Ugai T, Fujii R, Nakatochi M, Wu MC, Ito H, Oze I, Tajika M, Niwa Y, Nishiyama T, Nakagawa-Senda H, Suzuki S, Koyama T, Matsui D, Watanabe Y, Kawaguchi T, Matsuda F, Momozawa Y, Kubo M, Naito M, Matsuo K, Wakai K. GWAS analysis reveals a significant contribution of PSCA to the risk of Helicobacter pylori-induced gastric atrophy. Carcinogenesis 2019; 40:661-668. [DOI: 10.1093/carcin/bgz016] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 08/30/2023]
Abstract
Although recent genome-wide association studies (GWASs) have identified genetic variants associated with Helicobacter pylori (HP)-induced gastric cancer, few studies have examined genetic factors associated with the risk of HP-induced gastric precancerous conditions. This study aimed to identify genetic variants associated with these conditions using a genome-wide approach. Data from four sites of the Japan Multi-Institutional Collaborative Cohort (J-MICC) Study were used in the discovery phase (Stage I), and two datasets from the Hospital-based Epidemiologic Research Program at Aichi Cancer Center 2 (HERPACC2) study were used in the replication phases (Stages II and III). SKAT (SNP-set Kernel Association Test) and single-variant GWASs were conducted for the risks of gastric atrophy (GA) and severe GA, defined by serum pepsinogen (PG) levels, and for PG1 levels and PG1/2 ratios. In the gene-based SKAT in Stage I, prostate stem cell antigen (PSCA) was significantly associated with the risks of GA and severe GA and with serum PG1/2 level under a linear kernel [false discovery rate (FDR) = 0.011, 0.230 and 7.2 × 10⁻⁷, respectively]. The single-variant GWAS revealed that nine PSCA single nucleotide polymorphisms (SNPs) reached the genome-wide significance level (P < 5 × 10⁻⁸) for the risks of both GA and severe GA in the combined study, although most of these associations did not reach genome-wide significance in the discovery or validation cohort on its own. GWAS for serum PG1 levels and PG1/2 ratios revealed that the PSCA SNP rs2920283 had a striking P-value of 4.31 × 10⁻²⁷ for PG1/2 ratios. The present GWAS identified the PSCA locus as the most significant locus for the risk of HP-induced GA, confirming the association recently reported in Europeans.
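The genome-wide significance level P < 5 × 10⁻⁸ used here is a convention rather than a study-specific calculation. It is commonly motivated as a Bonferroni correction that holds the family-wise error rate at 0.05 across roughly one million independent common-variant tests, where M_eff denotes that assumed effective number of tests:

```latex
\alpha_{\mathrm{GW}} = \frac{\alpha_{\mathrm{FWER}}}{M_{\mathrm{eff}}} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```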
Affiliation(s)
- Asahi Hishida
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Tomotaka Ugai
- Division of Cancer Epidemiology and Prevention, Aichi Cancer Center Research Institute, Nagoya, Japan
- Ryosuke Fujii
- Department of Preventive Medical Sciences, Fujita Medical University School of Health Sciences, Toyoake, Japan
- Masahiro Nakatochi
- Data Coordinating Center, Department of Advanced Medicine, Nagoya University Hospital, Nagoya, Japan
- Michael C Wu
- Biostatistics and Biomathematics Program, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Hidemi Ito
- Division of Cancer Information and Control, Aichi Cancer Center Research Institute, Nagoya, Japan
- Isao Oze
- Division of Cancer Epidemiology and Prevention, Aichi Cancer Center Research Institute, Nagoya, Japan
- Takeshi Nishiyama
- Department of Public Health, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan
- Hiroko Nakagawa-Senda
- Department of Public Health, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan
- Sadao Suzuki
- Department of Public Health, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan
- Teruhide Koyama
- Department of Epidemiology for Community Health and Medicine, Kyoto Prefectural University of Medicine, Kyoto, Japan
- Daisuke Matsui
- Department of Epidemiology for Community Health and Medicine, Kyoto Prefectural University of Medicine, Kyoto, Japan
- Yoshiyuki Watanabe
- Department of Epidemiology for Community Health and Medicine, Kyoto Prefectural University of Medicine, Kyoto, Japan
- Takahisa Kawaguchi
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Fumihiko Matsuda
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Michiaki Kubo
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Mariko Naito
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Department of Oral Epidemiology, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
- Keitaro Matsuo
- Division of Cancer Epidemiology and Prevention, Aichi Cancer Center Research Institute, Nagoya, Japan
- Department of Epidemiology, Nagoya University Graduate School of Medicine, Nagoya, Japan
- Kenji Wakai
- Department of Preventive Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
123
Aterido A, Cañete JD, Tornero J, Blanco F, Fernández-Gutierrez B, Pérez C, Alperi-López M, Olivè A, Corominas H, Martínez-Taboada V, González I, Fernández-Nebro A, Erra A, López-Lasanta M, López Corbeto M, Palau N, Marsal S, Julià A. A Combined Transcriptomic and Genomic Analysis Identifies a Gene Signature Associated With the Response to Anti-TNF Therapy in Rheumatoid Arthritis. Front Immunol 2019; 10:1459. [PMID: 31312201 PMCID: PMC6614444 DOI: 10.3389/fimmu.2019.01459] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 01/31/2019] [Accepted: 06/10/2019] [Indexed: 12/14/2022]
Abstract
Background: Rheumatoid arthritis (RA) is the most frequent autoimmune disease involving the joints. Although anti-TNF therapies have proven effective in the management of RA, approximately one third of patients do not show a significant clinical response. The objective of this study was to identify new genetic variation associated with the clinical response to anti-TNF therapy in RA. Methods: We performed a sequential multi-omic analysis integrating different sources of molecular information. First, we extracted RNA from synovial biopsies of 11 RA patients starting anti-TNF therapy to identify gene coexpression modules (GCMs) in the RA synovium. Second, we analyzed the transcriptomic association between each GCM and the clinical response to anti-TNF therapy. The clinical response was determined at week 14 using the EULAR criteria. Third, we analyzed the association between the GCMs and anti-TNF response at the genetic level. For this objective, we used genome-wide data from a cohort of 348 anti-TNF-treated patients from Spain. The GCMs that were significantly associated with anti-TNF response were then tested for validation in an independent cohort of 2,706 anti-TNF-treated patients. Finally, the functional implication of the validated GCMs was evaluated via pathway and cell-type epigenetic enrichment analyses. Results: A total of 149 GCMs were identified in the RA synovium. Of these, 13 GCMs were significantly associated with anti-TNF response (P < 0.05). At the genetic level, two of the 13 GCMs were significantly associated with the response to adalimumab (P = 0.0015) and infliximab (P = 0.021) in the Spanish cohort. Using the independent cohort of RA patients, we replicated the association of the adalimumab-response GCM (P = 0.0019).
The validated module was significantly enriched for genes involved in nucleotide metabolism (P = 2.41e-5) and for epigenetic marks from immune cells, including CD4+ regulatory T cells (P = 0.041). Conclusions: These findings show the existence of a drug-specific genetic basis for anti-TNF response, thereby supporting treatment stratification in the search for response biomarkers in RA.
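The abstract does not say how the 149 coexpression modules were built. As a hedged illustration only, the sketch below groups genes whose expression profiles are strongly correlated. It treats modules as connected components of a thresholded correlation graph. This is a simplified stand-in, not the authors' procedure; weighted network methods such as WGCNA are common in practice. The helper name `coexpression_modules`, the 0.8 cut-off and the synthetic data are all assumptions.

```python
import numpy as np


def coexpression_modules(expr, threshold=0.8):
    """Hypothetical helper: label genes by coexpression module.

    expr is a samples x genes matrix. Two genes are linked when the absolute
    Pearson correlation of their profiles is >= threshold; each connected
    component of that graph is reported as one module (labels 0, 1, 2, ...).
    """
    corr = np.corrcoef(expr, rowvar=False)  # genes x genes correlation matrix
    n_genes = corr.shape[0]
    labels = [-1] * n_genes
    current = 0
    for seed in range(n_genes):
        if labels[seed] != -1:
            continue  # gene already assigned to a module
        labels[seed] = current
        stack = [seed]
        while stack:  # depth-first search over strongly correlated genes
            g = stack.pop()
            for h in np.flatnonzero(np.abs(corr[g]) >= threshold):
                if labels[h] == -1:
                    labels[h] = current
                    stack.append(h)
        current += 1
    return labels


# Synthetic example (assumed data): two latent programs, three genes each.
rng = np.random.default_rng(0)
programs = rng.normal(size=(50, 2))
expr = np.column_stack([
    programs[:, [0]] + 0.1 * rng.normal(size=(50, 3)),
    programs[:, [1]] + 0.1 * rng.normal(size=(50, 3)),
])
labels = coexpression_modules(expr)
print(labels)  # genes 0-2 share one label, genes 3-5 another
```

A single hard threshold keeps the example short. In practice the cut-off strongly affects module size, which is one reason soft-thresholded networks are often preferred.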
Affiliation(s)
- Adrià Aterido
- Rheumatology Research Group, Vall d'Hebron Research Institute, Barcelona, Spain
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Juan D Cañete
- Rheumatology Department, Hospital Clínic de Barcelona and Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Jesús Tornero
- Rheumatology Department, Hospital Universitario De Guadalajara, Guadalajara, Spain
- Francisco Blanco
- Rheumatology Department, INIBIC-Hospital Universitario A Coruña, A Coruña, Spain
- Carolina Pérez
- Rheumatology Department, Parc de Salut Mar, Barcelona, Spain
- Alex Olivè
- Rheumatology Department, Hospital Universitari Germans Trias i Pujol, Barcelona, Spain
- Héctor Corominas
- Rheumatology Department, Hospital Moisès Broggi, Barcelona, Spain
- Isidoro González
- Rheumatology Department, Hospital Universitario La Princesa, IIS La Princesa, Madrid, Spain
- Antonio Fernández-Nebro
- UGC Reumatología, Instituto Investigación Biomédica Málaga, Hospital Regional Universitario, Universidad de Málaga, Málaga, Spain
- Alba Erra
- Rheumatology Department, Hospital Sant Rafael, Barcelona, Spain
- María López-Lasanta
- Rheumatology Research Group, Vall d'Hebron Research Institute, Barcelona, Spain
- Núria Palau
- Rheumatology Research Group, Vall d'Hebron Research Institute, Barcelona, Spain
- Sara Marsal
- Rheumatology Research Group, Vall d'Hebron Research Institute, Barcelona, Spain
- Antonio Julià
- Rheumatology Research Group, Vall d'Hebron Research Institute, Barcelona, Spain
124
Huang YT. Variance component tests of multivariate mediation effects under composite null hypotheses. Biometrics 2019; 75:1191-1204. [PMID: 31009061 DOI: 10.1111/biom.13073] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 01/25/2018] [Accepted: 03/28/2019] [Indexed: 11/27/2022]
Abstract
Mediation effects of multiple mediators are determined by two associations: one between an exposure and the mediators (S-M) and the other between the mediators and an outcome conditional on the exposure (M-Y). The test for mediation effects is conducted under a composite null hypothesis, that is, either one of the S-M and M-Y associations is zero or both are zero. Without accounting for the composite null, the type 1 error rate within a study containing a large number of multimediator tests may be much lower than expected. We propose a novel test to address this issue. For each mediation test j, j = 1, …, J, we examine the S-M and M-Y associations using two separate variance component tests. Assuming a zero-mean working distribution with a common variance for the element-wise S-M (and M-Y) associations, score tests for the variance components are constructed. We transform the test statistics into two normally distributed statistics under the null. Using a recently developed result, we conduct J hypothesis tests accounting for the composite null hypothesis by adjusting for the variances of the normally distributed statistics for the S-M and M-Y associations. Advantages of the proposed test over other methods are illustrated in simulation studies and in a data application in which we analyze lung cancer data from The Cancer Genome Atlas to investigate the effect of smoking on gene expression through DNA methylation in 15,114 genes.
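A small simulation shows the conservativeness described above. The sketch does not implement the proposed variance component test. It uses the simpler joint-significance (max-P) rule, which declares mediation only when both the S-M and M-Y p-values fall below alpha. It also assumes independent Uniform(0, 1) p-values in the configuration where both associations are zero. The rejection rate is then about alpha squared rather than alpha. The helper name `max_p_test` is hypothetical.

```python
import numpy as np


def max_p_test(p_sm, p_my):
    """Joint-significance (max-P) p-value for mediation: the larger of the
    exposure-mediator and mediator-outcome p-values."""
    return np.maximum(p_sm, p_my)


rng = np.random.default_rng(1)
J, alpha = 100_000, 0.05

# Both S-M and M-Y associations are zero: each p-value is Uniform(0, 1).
p_sm = rng.uniform(size=J)
p_my = rng.uniform(size=J)

type1 = np.mean(max_p_test(p_sm, p_my) <= alpha)
print(f"empirical type I error: {type1:.4f} (nominal {alpha})")
# Roughly alpha**2 = 0.0025, well below the nominal 0.05.
```

When only one association is zero and the other is strong, the same rule's error rate approaches alpha. So no single fixed threshold is calibrated for every configuration of the composite null.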
Affiliation(s)
- Yen-Tsung Huang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
125
Petrykey K, Lippé S, Robaey P, Sultan S, Laniel J, Drouin S, Bertout L, Beaulieu P, St-Onge P, Boulet-Craig A, Rezgui A, Yasui Y, Sapkota Y, Krull KR, Hudson MM, Laverdière C, Sinnett D, Krajinovic M. Influence of genetic factors on long-term treatment related neurocognitive complications, and on anxiety and depression in survivors of childhood acute lymphoblastic leukemia: The Petale study. PLoS One 2019; 14:e0217314. [PMID: 31181069 PMCID: PMC6557490 DOI: 10.1371/journal.pone.0217314] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/04/2019] [Accepted: 05/08/2019] [Indexed: 01/04/2023]
Abstract
BACKGROUND A substantial number of survivors of childhood acute lymphoblastic leukemia suffer from treatment-related late adverse effects, including neurocognitive impairment. While multiple studies have described neurocognitive outcomes in childhood acute lymphoblastic leukemia (ALL) survivors, relatively few have investigated their association with individual genetic constitution. METHODS To further address this issue, genetic variants predicted to affect protein function were pooled from whole-exome sequencing data of childhood ALL survivors (PETALE cohort). The variants lay in 99 genes relevant to the effects of anticancer drugs and in 360 genes implicated in nervous system function. They were analyzed for association with neurocognitive complications, as well as with anxiety and depression. Variants that survived correction for multiple testing were genotyped in the entire cohort (n = 236) and analyzed with the same outcomes. RESULTS Common variants in the MTR, PPARA, ABCC3, CALML5, CACNB2 and PCDHB10 genes were associated with deficits in neurocognitive test performance, whereas variants in the SLCO1B1 and EPHA5 genes were associated with anxiety and depression. The majority of associations were modulated by treatment intensity. Associated variants were further analyzed in an independent SJLIFE cohort of 545 ALL survivors. Two variants are of particular interest: rs1805087 in the methionine synthase gene MTR, and rs58225473 in CACNB2, which encodes a voltage-dependent calcium channel protein. Associations of borderline significance were found for both in the replication cohort, and both remained significant in the combined discovery and replication groups (OR = 1.5, 95% CI 1-2.3, p = 0.04; and OR = 3.7, 95% CI 1.25-11, p = 0.01, respectively). Variant rs4149056 in the SLCO1B1 gene also deserves further attention, since it was previously shown to affect methotrexate clearance and short-term toxicity in ALL patients.
CONCLUSIONS The current findings can help in understanding the influence of the genetic component on long-term neurocognitive impairment. Further studies are needed to confirm whether the identified variants may be useful in identifying survivors at increased risk of these complications.
Affiliation(s)
- Kateryna Petrykey
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Department of Pharmacology and Physiology, Université de Montréal, Montreal, Quebec, Canada
- Sarah Lippé
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Department of Psychology, Université de Montréal, Montreal, Quebec, Canada
- Philippe Robaey
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Children’s Hospital of Eastern Ontario, Ottawa, Ontario, Canada
- Department of Psychiatry, Université de Montréal, Montreal, Quebec, Canada
- Department of Psychiatry, University of Ottawa, Ottawa, Ontario, Canada
- Serge Sultan
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Department of Psychology, Université de Montréal, Montreal, Quebec, Canada
- Julie Laniel
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Department of Psychology, Université de Montréal, Montreal, Quebec, Canada
- Simon Drouin
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Laurence Bertout
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Patrick Beaulieu
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Pascal St-Onge
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Aubrée Boulet-Craig
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Department of Psychology, Université de Montréal, Montreal, Quebec, Canada
- Aziz Rezgui
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Yutaka Yasui
- Epidemiology and Cancer Control Department, St. Jude Children’s Research Hospital, Memphis, TN, United States of America
- Yadav Sapkota
- Epidemiology and Cancer Control Department, St. Jude Children’s Research Hospital, Memphis, TN, United States of America
- Kevin R. Krull
- Epidemiology and Cancer Control Department, St. Jude Children’s Research Hospital, Memphis, TN, United States of America
- Melissa M. Hudson
- Epidemiology and Cancer Control Department, St. Jude Children’s Research Hospital, Memphis, TN, United States of America
- Oncology Department, St. Jude Children’s Research Hospital, Memphis, TN, United States of America
- Caroline Laverdière
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Department of Pediatrics, Université de Montréal, Montreal, Quebec, Canada
- Daniel Sinnett
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Department of Pediatrics, Université de Montréal, Montreal, Quebec, Canada
- Maja Krajinovic
- Sainte-Justine University Health Center (SJUHC), Montreal, Quebec, Canada
- Department of Pharmacology and Physiology, Université de Montréal, Montreal, Quebec, Canada
- Department of Pediatrics, Université de Montréal, Montreal, Quebec, Canada
126
Zhao Y, Zhu H, Lu Z, Knickmeyer RC, Zou F. Structured Genome-Wide Association Studies with Bayesian Hierarchical Variable Selection. Genetics 2019; 212:397-415. [PMID: 31010934 PMCID: PMC6553832 DOI: 10.1534/genetics.119.301906] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/07/2019] [Accepted: 04/08/2019] [Indexed: 02/04/2023]
Abstract
It is becoming increasingly important to use genome-wide association studies (GWAS) to select genetic information associated with qualitative or quantitative traits. The discovery of biological associations among SNPs motivates various strategies to construct SNP-sets along the genome. Such set information can then be incorporated into the selection procedure to achieve higher selection power and more biologically meaningful results. The aim of this paper is to propose a novel Bayesian framework for hierarchical variable selection at both the SNP-set (group) level and the SNP (within-group) level. We overcome a key limitation of the posterior updating scheme used in most existing Bayesian variable selection methods by proposing a novel sampling scheme that explicitly accommodates the ultrahigh dimensionality of genetic data. Specifically, by constructing an auxiliary variable selection model at the SNP-set level, the new procedure uses the posterior samples of the auxiliary model to guide the posterior inference for the targeted hierarchical selection model. We apply the proposed method to a variety of simulation studies and show that it is computationally efficient and performs substantially better than competing approaches in both SNP-set and SNP selection. Applying the method to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data, we identify biologically meaningful genetic factors for several neuroimaging volumetric phenotypes. Our method is general and can be readily applied to a wide range of biomedical studies.
Affiliation(s)
- Yize Zhao
- Department of Healthcare Policy and Research, Cornell University Weill Cornell, New York, New York 10065
- Hongtu Zhu
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599
- Zhaohua Lu
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105
- Rebecca C Knickmeyer
- Department of Pediatrics and Human Development, Michigan State University, East Lansing, Michigan 48824
- Fei Zou
- Department of Biostatistics, University of Florida, Gainesville, Florida 32611
127
Evans KL, Wirtz HS, Li J, She R, Maya J, Gui H, Hamer A, Depre C, Lanfear DE. Genetics of heart rate in heart failure patients (GenHRate). Hum Genomics 2019; 13:22. [PMID: 31113495 PMCID: PMC6528282 DOI: 10.1186/s40246-019-0206-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/27/2018] [Accepted: 04/22/2019] [Indexed: 11/20/2022]
Abstract
BACKGROUND Elevated resting heart rate (HR) is a risk factor and therapeutic target in patients with heart failure (HF) and reduced ejection fraction (HFrEF). Previous studies indicate a genetic contribution to HR in population samples, but there are few data in patients with HFrEF. METHODS Patients who met Framingham criteria for HF and had an ejection fraction < 50% were prospectively enrolled in a genetic HF registry (2007-2015, n = 1060). All participants donated blood for DNA and underwent genome-wide genotyping, with additional variants called via imputation. We tested previously identified variant "hits" (43 loci) and performed a genome-wide association study (GWAS) of HR, adjusted for race, using Efficient Mixed-Model Association Expedited (EMMAX). RESULTS The cohort was 35% female and 51% African American (AA), with a mean age of 68 years. HR differed by 2 beats per minute (bpm) by race, with AA patients slightly higher. Among the 43 candidate variants, 4 single nucleotide polymorphisms (SNPs) in one gene (GJA1) were significantly associated with HR. In genome-wide testing, one statistically significant association peak was identified on chromosome 22q13, with strongest SNP rs535263906 (p = 3.3 × 10^-8). The peak is located within the gene Cadherin EGF LAG Seven-Pass G-Type Receptor 1 (CELSR1), which encodes a cadherin super-family cell surface protein identified in GWAS of other phenotypes (e.g., stroke). The most strongly associated SNP was specific to the African American population. CONCLUSIONS These data confirm the association of GJA1 with HR in the setting of HFrEF and identify novel candidate genes for HR in HFrEF patients, particularly CELSR1. These associations should be tested in additional cohorts.
Affiliation(s)
- Kaleigh L. Evans
- Department of Internal Medicine, Henry Ford Hospital, 2799 West Grand Blvd. K-14, Detroit, MI 48202 USA
- Heidi S. Wirtz
- Amgen Inc, Thousand Oaks, CA USA
- Jia Li
- Department of Public Health Sciences, Henry Ford Health System, Detroit, MI USA
- Ruicong She
- Department of Public Health Sciences, Henry Ford Health System, Detroit, MI USA
- Juan Maya
- Amgen Inc, Thousand Oaks, CA USA
- Hongsheng Gui
- Center for Individualized and Genomic Medicine Research, Henry Ford Hospital, Detroit, MI USA
- Andrew Hamer
- Amgen Inc, Thousand Oaks, CA USA
- Christophe Depre
- Amgen Inc, Thousand Oaks, CA USA
- David E. Lanfear
- Department of Internal Medicine, Henry Ford Hospital, 2799 West Grand Blvd. K-14, Detroit, MI 48202 USA
- Center for Individualized and Genomic Medicine Research, Henry Ford Hospital, Detroit, MI USA
- Heart and Vascular Institute, Henry Ford Hospital, Detroit, MI USA
128
Djordjilović V, Page CM, Gran JM, Nøst TH, Sandanger TM, Veierød MB, Thoresen M. Global test for high-dimensional mediation: Testing groups of potential mediators. Stat Med 2019; 38:3346-3360. [PMID: 31074092 DOI: 10.1002/sim.8199] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 12/03/2018] [Revised: 04/18/2019] [Accepted: 04/22/2019] [Indexed: 11/08/2022]
Abstract
We address the problem of testing whether a possibly high-dimensional vector may act as a mediator between an exposure variable and the outcome of interest. We propose a global test for mediation, which combines a global test with the intersection-union principle. We discuss theoretical properties of our approach and conduct simulation studies demonstrating that it performs as well as or better than its competitor. We also propose a multiple testing procedure, ScreenMin, that provides asymptotic control of either the familywise error rate or the false discovery rate when multiple groups of potential mediators are tested simultaneously. We apply our approach to data from a large Norwegian cohort study, where we examine the hypothesis that smoking increases the risk of lung cancer by modifying the level of DNA methylation.
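Under the intersection-union principle, a group counts as a mediator only if both the exposure-mediator null and the mediator-outcome null are rejected. A valid p-value for the group is therefore the larger of the two component p-values. The sketch below combines that rule with the standard Benjamini-Hochberg step-up procedure across groups. It is an illustrative simplification: the global-test p-values are placeholder numbers, the function names are hypothetical, and the authors' ScreenMin procedure is not reproduced.

```python
import numpy as np


def intersection_union_p(p_exposure_mediator, p_mediator_outcome):
    """IU p-value per group: mediation requires rejecting both nulls."""
    return np.maximum(p_exposure_mediator, p_mediator_outcome)


def benjamini_hochberg(pvals, q=0.05):
    """Standard BH step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank i with p_(i) <= q*i/m
        reject[order[: k + 1]] = True
    return reject


# Placeholder global-test p-values for four groups of potential mediators.
p_sm = np.array([0.001, 0.200, 0.004, 0.020])
p_my = np.array([0.002, 0.001, 0.600, 0.010])
p_iu = intersection_union_p(p_sm, p_my)
print(benjamini_hochberg(p_iu, q=0.05))  # groups 0 and 3 are rejected
```

Groups 1 and 2 each have one very small p-value. Both are still retained, because the intersection-union rule is driven by the weaker of the two associations.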
Affiliation(s)
- Vera Djordjilović
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
- Christian M Page
- Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway
- Center for Fertility and Health, Division of Mental and Physical Health, Norwegian Institute of Public Health, Oslo, Norway
- Jon Michael Gran
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
- Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway
- Therese H Nøst
- Department of Community Medicine, The Arctic University of Norway, Tromsø, Norway
- Torkjel M Sandanger
- Department of Community Medicine, The Arctic University of Norway, Tromsø, Norway
- Marit B Veierød
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
- Magne Thoresen
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
129
Yan Q, Liu N, Forno E, Canino G, Celedón JC, Chen W. An integrative association method for omics data based on a modified Fisher's method with application to childhood asthma. PLoS Genet 2019; 15:e1008142. [PMID: 31063461 PMCID: PMC6524814 DOI: 10.1371/journal.pgen.1008142] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/18/2019] [Revised: 05/17/2019] [Accepted: 04/16/2019] [Indexed: 02/07/2023]
Abstract
The development of high-throughput biotechnologies allows the collection of omics data to study the biological mechanisms underlying complex diseases at different levels, such as genomics, epigenomics, and transcriptomics. However, each technology is designed to collect a specific type of omics data. Thus, the association between a disease and one type of omics data is usually tested individually, but this strategy is suboptimal. To better articulate biological processes and increase the consistency of variant identification, omics data from various platforms need to be integrated. In this report, we introduce an approach that uses a modified Fisher's method (denoted Omnibus-Fisher) to combine separate p-values into an overall gene-level p-value that accounts for correlation between omics data. The p-values are calculated by kernel machine regression and come from association tests between a trait and SNPs, DNA methylation markers, and RNA sequencing data. To consider all possible disease models, we extend Omnibus-Fisher to an optimal test by using perturbations. In our simulations, the standard Fisher's method has inflated type I error rates when directly applied to correlated omics data. In contrast, Omnibus-Fisher preserves the expected type I error rates. Moreover, Omnibus-Fisher has greater power than its optimal version when the true disease model involves all types of omics data. On the other hand, the optimal Omnibus-Fisher is more powerful than its regular version when only one type of data is causal. Finally, we illustrate our proposed method by analyzing whole-genome genotyping, DNA methylation, and RNA sequencing data from a study of childhood asthma in Puerto Ricans.
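For reference, below is the standard Fisher's method that the abstract says inflates type I error on correlated omics data. The sketch assumes independent p-values, which is exactly the assumption Omnibus-Fisher relaxes. It uses the closed-form chi-square tail for even degrees of freedom. The correlation adjustment and the perturbation-based optimal version are not shown. The function name and example p-values are made up.

```python
import math


def fisher_combined_p(pvals):
    """Standard Fisher's method for k independent p-values.

    T = -2 * sum(log p_i) follows a chi-square distribution with 2k degrees
    of freedom under the global null. For even degrees of freedom the upper
    tail has a closed form: P(T > t) = exp(-t/2) * sum_{i<k} (t/2)**i / i!.
    """
    k = len(pvals)
    half_t = -sum(math.log(p) for p in pvals)  # t / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_t / i  # (t/2)**i / i!
        total += term
    return math.exp(-half_t) * total


# Hypothetical gene-level p-values from SNP, methylation and RNA-seq tests.
print(round(fisher_combined_p([0.01, 0.04, 0.20]), 4))  # 0.0044
```

If the three p-values are positively correlated, the chi-square reference distribution is too narrow. The combined p-value is then too small, which is the inflation the authors report.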
Affiliation(s)
- Qi Yan
- Division of Pediatric Pulmonary Medicine, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA
- Nianjun Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, Bloomington, IN
- Erick Forno
- Division of Pediatric Pulmonary Medicine, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA
- Glorisa Canino
- Behavioral Sciences Research Institute, University of Puerto Rico, San Juan, PR
- Juan C. Celedón
- Division of Pediatric Pulmonary Medicine, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA
- Wei Chen
- Division of Pediatric Pulmonary Medicine, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, PA
130
Li Z, Li X, Liu Y, Shen J, Chen H, Zhou H, Morrison AC, Boerwinkle E, Lin X. Dynamic Scan Procedure for Detecting Rare-Variant Association Regions in Whole-Genome Sequencing Studies. Am J Hum Genet 2019; 104:802-814. [PMID: 30982610 PMCID: PMC6507043 DOI: 10.1016/j.ajhg.2019.03.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 11/25/2018] [Accepted: 03/01/2019] [Indexed: 11/19/2022]
Abstract
Whole-genome sequencing (WGS) studies are being widely conducted to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses for rare variants have limited power, so researchers commonly analyze rare variants with variant-set-based methods. However, existing variant-set-based approaches need to pre-specify genetic regions for analysis. Hence they are not directly applicable to WGS data, because of the large number of intergenic and intronic regions that contain a massive number of non-coding variants. The commonly used sliding-window method requires pre-specified, fixed window sizes. These are often unknown a priori and difficult to specify in practice. They are also limiting, given that the sizes of genetic-association regions are likely to vary across the genome and phenotypes. We propose a computationally efficient and dynamic scan-statistic method (Scan the Genome [SCANG]) for analyzing WGS data. This method flexibly detects the sizes and the locations of rare-variant association regions without the need to specify a prior, fixed window size. The proposed method controls the genome-wide type I error rate and accounts for the linkage disequilibrium among genetic variants. It allows the detected sizes of rare-variant association regions to vary across the genome. Through extensive simulation studies that consider a wide variety of scenarios, we show that SCANG substantially outperforms several alternative methods for detecting rare-variant associations while controlling the genome-wide type I error rate. We illustrate SCANG by analyzing the WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.
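The idea of scanning windows of varying size, rather than one fixed sliding window, can be sketched as follows. This is a deliberately simplified illustration, not SCANG. Each window is scored with a burden-type statistic |sum z| / sqrt(L), which is standard normal under the null only if the per-variant z-scores are independent. The LD-aware, genome-wide significance threshold that the abstract describes is omitted. The function name and the toy z-scores are hypothetical.

```python
import math


def dynamic_window_scan(z, min_size, max_size):
    """Hypothetical helper: scan all windows of consecutive variants whose
    length L lies in [min_size, max_size] and return the window maximizing
    |sum(z)| / sqrt(L) as (statistic, start, end), with end exclusive."""
    n = len(z)
    prefix = [0.0]  # prefix sums make each window sum O(1)
    for value in z:
        prefix.append(prefix[-1] + value)
    best = (-math.inf, None, None)
    for size in range(min_size, min(max_size, n) + 1):
        for start in range(n - size + 1):
            stat = abs(prefix[start + size] - prefix[start]) / math.sqrt(size)
            if stat > best[0]:
                best = (stat, start, start + size)
    return best


# Toy per-variant z-scores with a cluster of signal at positions 2-4.
z = [0.1, -0.3, 2.5, 2.8, 3.1, 0.2, -0.5, 0.4]
stat, start, end = dynamic_window_scan(z, min_size=1, max_size=5)
print(start, end, round(stat, 2))  # 2 5 4.85
```

The window length is chosen by the data rather than fixed in advance. Because the maximum is taken over many overlapping windows, however, its null distribution is not standard normal. A real analysis needs a calibrated threshold, which this sketch does not provide.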
Affiliation(s)
- Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Yaowu Liu
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Jincheng Shen
- Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84108, USA
- Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, the University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Center for Precision Health, School of Public Health and School of Biomedical Informatics, the University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Hufeng Zhou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, the University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, the University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Department of Statistics, Harvard University, Cambridge, MA 02138, USA
131
Skalkidou A, Poromaa IS, Iliadis SI, Huizink AC, Hellgren C, Freyhult E, Comasco E. Stress-related genetic polymorphisms in association with peripartum depression symptoms and stress hormones: A longitudinal population-based study. Psychoneuroendocrinology 2019; 103:296-305. [PMID: 30776573 DOI: 10.1016/j.psyneuen.2019.02.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 10/08/2018] [Revised: 02/02/2019] [Accepted: 02/04/2019] [Indexed: 02/06/2023]
Abstract
Individual differences in the response of the stress system to hormonal changes during pregnancy and the postpartum period render some women susceptible to developing depression. The present study sought to investigate peripartum depression and stress hormones in relation to stress-related genotypes. The Edinburgh Postnatal Depression Scale was used to assess peripartum depressive symptoms in a sample of 1629 women, followed from pregnancy week seventeen to six months postpartum. Genotypes of ninety-four haplotype-tag single nucleotide polymorphisms (SNPs) in sixteen genes of the hypothalamus-pituitary-adrenal axis pathway were analyzed, and data on psychosocial and demographic factors were collected. Sub-studies analyzed four hormone measures: salivary cortisol awakening response in gestational weeks 35-39; salivary evening cortisol levels in gestational week 36 and postpartum week 6; and blood cortisol and cortisone levels in gestational weeks 35-39. SNP-set kernel association tests were performed at the gene level, considering psychosocial and demographic factors, followed by post-hoc analyses of SNPs in significant genes. Findings significant at the 0.05 level included SNPs in the hydroxysteroid 11-beta dehydrogenase 1 (HSD11B1) gene, associated with self-rated depression scores in postpartum week six among all participants. They also included SNPs in the serpin family A member 6 (SERPINA6) gene, associated with scores at the same time-point among women with de novo onset of postpartum depression. SNPs in these genes were also associated with stress hormone levels during pregnancy. The present study adds knowledge to the neurobiological basis of peripartum depression by systematically assessing SNPs in stress-regulatory genes and stress-hormone levels in a population-based sample of women.
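A SNP-set kernel association test summarizes a whole gene with one variance-component score statistic. Below is a minimal sketch of that statistic for a continuous outcome. It assumes a weighted linear kernel K = G W G^T and a linear null model with covariates, which stand in for the psychosocial and demographic factors mentioned above. Up to a scaling constant that differs between implementations, Q = r^T K r = sum_j w_j (G_j^T r)^2, where r are the null-model residuals. The p-value is not computed, because it needs a mixture-of-chi-squares approximation such as Davies' method. The function and argument names are hypothetical.

```python
import numpy as np


def skat_q(y, G, covariates=None, weights=None):
    """Hypothetical helper: SKAT-style score statistic for one SNP set.

    y: (n,) continuous outcome; G: (n, m) genotype dosages (0/1/2);
    covariates: optional (n, p) adjustment variables; weights: (m,) SNP weights.
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    n, m = G.shape
    X = np.ones((n, 1))  # null model always includes an intercept
    if covariates is not None:
        X = np.column_stack([X, np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta  # residuals under the null (no SNP effects)
    w = np.ones(m) if weights is None else np.asarray(weights, dtype=float)
    score = G.T @ resid  # per-SNP score G_j^T r
    return float(np.sum(w * score**2))  # equals r^T (G W G^T) r


y = np.array([1.0, 2.0, 3.0, 4.0])
G = np.array([[0], [1], [1], [2]])
print(skat_q(y, G))  # approximately 9.0 with the intercept-only null model
```

Squaring each per-SNP score before summing lets SNPs with opposite effect directions add up rather than cancel. This is the main motivation for kernel tests over simple burden sums.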
Affiliation(s)
- Alkistis Skalkidou
- Department of Women's and Children's Health, Uppsala University, Uppsala, Sweden
- Stavros I Iliadis
- Department of Women's and Children's Health, Uppsala University, Uppsala, Sweden
- Anja C Huizink
- Section of Clinical Developmental Psychology, Vrije Universiteit Amsterdam, the Netherlands
- School of Health and Education, University of Skövde, Sweden
- Charlotte Hellgren
- Department of Women's and Children's Health, Uppsala University, Uppsala, Sweden
- Eva Freyhult
- Department of Medical Science, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Erika Comasco
- Department of Neuroscience, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| |
Collapse
|
132
|
Xiang B, Wang Q, Lei W, Li M, Li Y, Zhao L, Ma X, Wang Y, Yu H, Li X, Meng Y, Guo W, Deng W, Ren H, Li T. Genes in immune pathways associated with abnormal white matter integrity in first-episode and treatment-naïve patients with schizophrenia. Br J Psychiatry 2019; 214:281-287. [PMID: 30722794 DOI: 10.1192/bjp.2018.297] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/27/2023]
Abstract
BACKGROUND Previous studies have inferred a strong genetic component in schizophrenia. However, the genetic variants involved in susceptibility to schizophrenia remain unclear. AIMS To detect potential gene pathways and networks associated with schizophrenia, and to explore the relationship between common and rare variants in these pathways and abnormal white matter integrity in schizophrenia. METHOD The analysis included 100 first-episode treatment-naïve patients with schizophrenia and 140 healthy controls. A network-based analysis was carried out on data from the Psychiatric Genomics Consortium Phase I (PGC-I). Based on our genome-wide association study and whole-exome sequencing data-sets, we performed a gene-set analysis to detect associations between the combined effects of common and rare genetic variants and abnormal white matter integrity in schizophrenia. RESULTS Compared with controls, patients had significantly reduced fractional anisotropy in the left and right anterior cingulate cortex, the left and right precuneus, and extra-nuclear (t = 4.61-5.10, PFDR < 0.01). Co-expression network analysis of the PGC-I summary statistics for schizophrenia identified a subnetwork of 207 genes associated with schizophrenia (P < 0.01), of which 176 genes were co-expressed in four gene modules. Functional enrichment analysis of the genes in each module revealed that the yellow module was enriched for highly co-expressed innate immune response genes. Furthermore, rare variants in the enriched genes of the yellow module were associated with reduced fractional anisotropy in the left anterior cingulate cortex (P = 0.006; Padjusted = 0.024) in patients only. CONCLUSIONS The pathogenesis of schizophrenia may be substantially influenced by genes involved in the immune system, at both the pathway and the network level. DECLARATION OF INTERESTS None.
Affiliation(s)
- Bo Xiang
- Assistant Professor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University; and Department of Psychiatry, Affiliated Hospital of Southwest Medical University, China
- Qiang Wang
- Professor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University, China
- Wei Lei
- Assistant Professor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University; and Department of Psychiatry, Affiliated Hospital of Southwest Medical University, China
- Mingli Li
- Associate Professor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University, China
- Yinfei Li
- Attending Doctor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University, China
- Liansheng Zhao
- Assistant Professor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University, China
- Xiaohong Ma
- Professor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University, China
- Yingcheng Wang
- Assistant Professor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University, China
- Hua Yu
- Attending Doctor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University, China
- Xiaojing Li
- Attending Doctor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University, China
- Yajing Meng
- Attending Doctor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University, China
- Wanjun Guo
- Associate Professor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University, China
- Wei Deng
- Associate Professor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University, China
- Hongyan Ren
- Attending Doctor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University, China
- Tao Li
- Professor, Mental Health Center and Psychiatric Laboratory, State Key Laboratory of Biotherapy, West China Brain Research Center, West China Hospital of Sichuan University, China

133
Liu Y, Xie J. Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. J Am Stat Assoc 2019; 115:393-402. [PMID: 33012899 DOI: 10.1080/01621459.2018.1554485] [Citation(s) in RCA: 183] [Impact Index Per Article: 30.5] [Indexed: 10/27/2022]
Abstract
Combining individual p-values to aggregate multiple small effects has long been of interest in statistics, dating back to Fisher's classic combination test. In modern large-scale data analysis, correlation and sparsity are common features, and efficient computation is essential for handling massive data. To overcome these challenges, we propose a new test that takes advantage of the Cauchy distribution. Our test statistic has a simple form and is defined as a weighted sum of Cauchy transformations of individual p-values. We prove a non-asymptotic result that the tail of the null distribution of the proposed test statistic can be well approximated by a Cauchy distribution under arbitrary dependency structures. Based on this theoretical result, the p-value calculation of the proposed test is not only accurate but also as simple as that of the classic z-test or t-test, making the test well suited to analyzing massive data. We further show that the power of the proposed test is asymptotically optimal in a strong sparsity setting. Extensive simulations demonstrate that the proposed test has both strong power against sparse alternatives and good accuracy in p-value calculation, especially for very small p-values. The proposed test has also been applied to a genome-wide association study of Crohn's disease and compared with several existing tests.
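The statistic described in this abstract can be sketched in a few lines. This is a minimal illustration, assuming equal weights by default and p-values strictly between 0 and 1; the function name `cauchy_combination` and the tiny-p cut-off of 1e-16 are our own choices, not taken from the paper. For T > 0 it uses the identity 1/2 - arctan(T)/π = arctan(1/T)/π, which keeps precision when T is large (that is, when the combined p-value is very small).

```python
import math

def cauchy_combination(pvalues, weights=None):
    """Cauchy combination test: T = sum_i w_i * tan((0.5 - p_i) * pi).

    With nonnegative weights summing to 1, the tail of T under the null is
    approximately standard Cauchy, so the combined p-value is
    0.5 - arctan(T) / pi. Equal weights are an illustrative default.
    """
    n = len(pvalues)
    if weights is None:
        weights = [1.0 / n] * n
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    t = 0.0
    for p, w in zip(pvalues, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
        if p < 1e-16:
            # tan((0.5 - p) * pi) is approximately 1 / (p * pi) for tiny p
            t += w / (p * math.pi)
        else:
            t += w * math.tan((0.5 - p) * math.pi)
    if t > 0:
        # Equal to 0.5 - atan(t) / pi, but avoids cancellation for large t
        return math.atan(1.0 / t) / math.pi
    return 0.5 - math.atan(t) / math.pi

# Three SNP-level p-values in one gene, one of them small
print(cauchy_combination([0.001, 0.5, 0.9]))
```

Because the null tail is approximately Cauchy regardless of the correlation between the inputs, no correlation matrix is needed; that is the practical appeal the abstract describes.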
Affiliation(s)
- Yaowu Liu
- Department of Biostatistics, Harvard School of Public Health
- Jun Xie
- Department of Statistics, Purdue University

134
Svishcheva GR. A generalized model for combining dependent SNP-level summary statistics and its extensions to statistics of other levels. Sci Rep 2019; 9:5461. [PMID: 30940856 PMCID: PMC6445108 DOI: 10.1038/s41598-019-41827-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/16/2018] [Accepted: 03/06/2019] [Indexed: 11/12/2022]
Abstract
Here I propose a fundamentally new, flexible model for revealing the association between a trait and a set of genetic variants in a genomic region or gene. The model was developed for the situation in which the original individual-level phenotype and genotype data are not available, but the researcher has the results of statistical analyses conducted on these data (namely, SNP-level summary Z score statistics and SNP-by-SNP correlations). The new model was derived analytically from the classical multiple linear regression model used for region-based association analysis of individual-level phenotype and genotype data, by linear compression of the data; in the new model, the SNP-by-SNP correlations are among the explanatory variables and the summary Z score statistics serve as the response variables. I show analytically that the regional association analysis methods developed within the framework of the classical multiple linear regression model with additive effects of genetic variants can be reformulated in terms of the new model without loss of information. The results of regional association analysis using the classical model and those obtained using the proposed model are identical when the SNP-by-SNP correlations and SNP-level statistics are estimated from the same genetic data.
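As a concrete point of reference for region-level testing from SNP-level Z scores and SNP-by-SNP correlations, the sketch below computes the familiar quadratic-form statistic Z^T R^{-1} Z, which under the null is approximately chi-square with m degrees of freedom when R is the (invertible) correlation matrix of the m Z scores. This is a standard large-sample summary-statistic analogue of the joint multiple-regression test, shown only for illustration; it is not claimed to be Svishcheva's exact parameterisation. The function names are hypothetical, and the linear solve and chi-square tail are written out by hand so that only the Python standard library is needed.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix; prune SNPs in perfect LD")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def chi2_sf(x, k):
    """Upper-tail probability of a chi-square with integer df k (closed forms)."""
    y = x / 2.0
    if k % 2 == 0:
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(k // 2))
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and step up by 1 each time
    total = math.erfc(math.sqrt(y))
    for j in range(1, (k + 1) // 2):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total

def region_test(z, r):
    """Joint test of m SNP z-scores with SNP-by-SNP correlation matrix r.

    Returns (Z^T R^{-1} Z, p-value from a chi-square with m df)."""
    stat = sum(zi * wi for zi, wi in zip(z, solve(r, z)))
    return stat, chi2_sf(stat, len(z))

# Two SNPs in moderate LD, each with a marginal z-score of 1
print(region_test([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]]))
```

In this example the correlated pair yields a statistic of 4/3 rather than the value 2 that independent SNPs would give, which is how the correlation matrix prevents double-counting of shared signal.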
Affiliation(s)
- Gulnara R Svishcheva
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, 630090, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia

135
Zhao N, Zhang H, Clark JJ, Maity A, Wu MC. Composite kernel machine regression based on likelihood ratio test for joint testing of genetic and gene–environment interaction effect. Biometrics 2019; 75:625-637. [DOI: 10.1111/biom.13003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/08/2018] [Accepted: 10/09/2018] [Indexed: 12/17/2022]
Affiliation(s)
- Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland
- Haoyu Zhang
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland
- Jennifer J. Clark
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Arnab Maity
- Department of Statistics, North Carolina State University, Raleigh, North Carolina
- Michael C. Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington

136
Multivariate association test for rare variants controlling for cryptic and family relatedness. Can J Stat 2019. [DOI: 10.1002/cjs.11475] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]

137
Marceau West R, Lu W, Rotroff DM, Kuenemann MA, Chang SM, Wu MC, Wagner MJ, Buse JB, Motsinger-Reif AA, Fourches D, Tzeng JY. Identifying individual risk rare variants using protein structure guided local tests (POINT). PLoS Comput Biol 2019; 15:e1006722. [PMID: 30779729 PMCID: PMC6396946 DOI: 10.1371/journal.pcbi.1006722] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/28/2018] [Revised: 03/01/2019] [Accepted: 12/17/2018] [Indexed: 01/08/2023]
Abstract
Rare variants are of increasing interest to genetic association studies because of their etiological contributions to human complex diseases. Due to the rarity of the mutant events, rare variants are routinely analyzed on an aggregate level. While aggregation analyses improve the detection of global-level signal, they are not able to pinpoint causal variants within a variant set. To perform inference on a localized level, additional information, e.g., biological annotation, is often needed to boost the information content of a rare variant. Following the observation that important variants are likely to cluster together on functional domains, we propose a protein structure guided local test (POINT) to provide variant-specific association information using structure-guided aggregation of signal. Constructed under a kernel machine framework, POINT performs local association testing by borrowing information from neighboring variants in the 3-dimensional protein space in a data-adaptive fashion. Besides merely providing a list of promising variants, POINT assigns each variant a p-value to permit variant ranking and prioritization. We assess the selection performance of POINT using simulations and illustrate how it can be used to prioritize individual rare variants in PCSK9, ANGPTL4 and CETP in the Action to Control Cardiovascular Risk in Diabetes (ACCORD) clinical trial data.
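To make the idea of borrowing information from spatial neighbours concrete, the sketch below weights each variant's marginal score by a Gaussian function of its 3D distance to a focal variant and sums the weighted squared scores. It is a simplified illustration under stated assumptions: a continuous trait, an intercept-only null model, a fixed hand-picked bandwidth (in angstroms), and hypothetical function names. It is not POINT's actual statistic, which the authors build within a kernel machine framework and tune in a data-adaptive fashion, and it computes no p-values.

```python
import math

def local_weights(coords, focal, bandwidth):
    """Gaussian weights exp(-d^2 / (2 h^2)) of every variant relative to a
    focal variant, where d is the 3D distance between residue coordinates."""
    fx, fy, fz = coords[focal]
    weights = []
    for x, y, z in coords:
        d2 = (x - fx) ** 2 + (y - fy) ** 2 + (z - fz) ** 2
        weights.append(math.exp(-d2 / (2.0 * bandwidth ** 2)))
    return weights

def variant_scores(genotypes, trait):
    """Marginal score U_k = sum_i g_ik * (y_i - mean(y)) for each variant k,
    i.e. the score under an intercept-only null model."""
    ybar = sum(trait) / len(trait)
    resid = [y - ybar for y in trait]
    n_variants = len(genotypes[0])
    return [sum(row[k] * r for row, r in zip(genotypes, resid))
            for k in range(n_variants)]

def local_statistics(genotypes, trait, coords, bandwidth=5.0):
    """Local statistic Q_j = sum_k w_jk * U_k**2 for each focal variant j."""
    u = variant_scores(genotypes, trait)
    stats = []
    for j in range(len(coords)):
        w = local_weights(coords, j, bandwidth)
        stats.append(sum(wk * uk ** 2 for wk, uk in zip(w, u)))
    return stats

# Toy data: 4 individuals x 3 rare variants; variants 0 and 1 sit close together
genotypes = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]]
trait = [2.0, 2.0, 0.0, 0.0]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
print(local_statistics(genotypes, trait, coords))
```

Variant 0 gains from its close neighbour (variant 1), while the distant variant 2 receives almost no borrowed signal; this is the clustering intuition the abstract relies on.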
Affiliation(s)
- Rachel Marceau West
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Wenbin Lu
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Daniel M. Rotroff
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Melaine A. Kuenemann
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Sheng-Mao Chang
- Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
- Michael C. Wu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Michael J. Wagner
- Center for Pharmacogenomics and Individualized Therapy, University of North Carolina, Chapel Hill, North Carolina, United States of America
- John B. Buse
- Department of Medicine, University of North Carolina School of Medicine, Chapel Hill, North Carolina, United States of America
- Alison A. Motsinger-Reif
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Denis Fourches
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Department of Chemistry, North Carolina State University, Raleigh, North Carolina, United States of America
- Jung-Ying Tzeng
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
- Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan

138
Larson NB, Chen J, Schaid DJ. A review of kernel methods for genetic association studies. Genet Epidemiol 2019; 43:122-136. [PMID: 30604442 DOI: 10.1002/gepi.22180] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/27/2018] [Revised: 11/09/2018] [Accepted: 11/26/2018] [Indexed: 12/17/2022]
Abstract
Evaluating the association of multiple genetic variants with a trait of interest by use of kernel-based methods has made a significant impact on how genetic association analyses are conducted. An advantage of kernel methods is that they tend to be robust when the genetic variants have effects that are a mixture of positive and negative effects, as well as when there is a small fraction of causal variants. Another advantage is that kernel methods fit within the framework of mixed models, providing flexible ways to adjust for additional covariates that influence traits. Herein, we review the basic ideas behind the use of kernel methods for genetic association analysis as well as recent methodological advancements for different types of traits, multivariate traits, pedigree data, and longitudinal data. Finally, we discuss opportunities for future research.
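The basic idea reviewed here, testing a set of variants through a kernel that measures genetic similarity between individuals, can be sketched for a continuous trait with no covariates. The sketch computes the variance-component score statistic Q = r^T K r, with residuals r = y - mean(y) and a weighted linear kernel K = G W G^T, as in SKAT-type tests. In practice, p-values are usually taken from the mixture-of-chi-square null distribution (for example, via Davies' method); to stay self-contained we use a permutation p-value instead, which is reasonable here because the null model has no covariates. The variant weights, permutation count, seed and function names are illustrative assumptions.

```python
import random

def linear_kernel(genotypes, weights):
    """Weighted linear kernel K[a][b] = sum_k w_k * g_ak * g_bk (K = G W G^T)."""
    n = len(genotypes)
    return [[sum(w * ga * gb for w, ga, gb in zip(weights, genotypes[a], genotypes[b]))
             for b in range(n)] for a in range(n)]

def score_statistic(kernel, trait):
    """Q = r^T K r with residuals r = y - mean(y) from an intercept-only null model."""
    ybar = sum(trait) / len(trait)
    r = [y - ybar for y in trait]
    n = len(r)
    return sum(r[a] * kernel[a][b] * r[b] for a in range(n) for b in range(n))

def kernel_association_test(genotypes, trait, weights, n_perm=999, seed=1):
    """Return (Q, permutation p-value); permuting y is valid without covariates."""
    kernel = linear_kernel(genotypes, weights)
    observed = score_statistic(kernel, trait)
    rng = random.Random(seed)
    shuffled = list(trait)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score_statistic(kernel, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Toy example: 20 individuals, 2 variants; carriers of variant 0 have higher trait values
genotypes = [[1, 0]] * 10 + [[0, 1]] * 10
trait = [1.0] * 10 + [0.0] * 10
q, p = kernel_association_test(genotypes, trait, weights=[1.0, 1.0])
print(q, p)
```

Since Q equals the weighted sum of squared per-variant scores, effects of opposite sign do not cancel, which is the robustness to mixed positive and negative effects that the review highlights.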
Affiliation(s)
- Nicholas B Larson
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Jun Chen
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Daniel J Schaid
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota

139
Xiang B, Yang BZ, Zhou H, Kranzler HR, Gelernter J. GWAS and network analysis of co-occurring nicotine and alcohol dependence identifies significantly associated alleles and network. Am J Med Genet B Neuropsychiatr Genet 2019; 180:3-11. [PMID: 30488612 PMCID: PMC6918694 DOI: 10.1002/ajmg.b.32692] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/13/2017] [Revised: 08/02/2018] [Accepted: 09/26/2018] [Indexed: 12/11/2022]
Abstract
Alcohol dependence (AD) and nicotine dependence (ND) frequently co-occur (AD+ND). We integrated SNP-based, gene-based, and protein-protein interaction network analyses to identify shared risk genes or gene subnetworks for AD+ND in African Americans (AAs, N = 2,094) and European Americans (EAs, N = 1,207). The DSM-IV criterion counts for AD and ND were modeled as two dependent variables in a multivariate linear mixed model and analyzed separately for the two populations. The most significant SNP was rs6579845 in EAs (p < 1.29 × 10⁻⁸) in GM2A, which encodes GM2 ganglioside activator; this SNP is a cis-expression quantitative trait locus that affects GM2A expression in blood and brain tissues. However, this association did not replicate in another, smaller sample of ours (N = 678). We identified a subnetwork of 24 genes that contributed to the AD+ND criterion counts. In a gene-set analysis of this subnetwork in an independent sample, the Study of Addiction: Genetics and Environment project (predominantly EAs), these 24 genes as a set differed between AD+ND and control subjects in EAs (p = .041). Functional enrichment analysis showed that this subnetwork was primarily enriched for nerve growth factor pathways and for cocaine and amphetamine addiction. In conclusion, we identified a genome-wide significant variant at GM2A and a gene subnetwork underlying the genetic trait of shared AD+ND. These results increase our understanding of the shared (pleiotropic) genetic risk that underlies AD+ND.
Affiliation(s)
- Bo Xiang
- Department of Psychiatry, Yale University School of Medicine, New Haven, and VA CT Healthcare Center, West Haven, CT, USA; Department of Psychiatry, Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan Province, China
- Bao-Zhu Yang
- Department of Psychiatry, Yale University School of Medicine, New Haven, and VA CT Healthcare Center, West Haven, CT, USA
- Hang Zhou
- Department of Psychiatry, Yale University School of Medicine, New Haven, and VA CT Healthcare Center, West Haven, CT, USA
- Henry R. Kranzler
- Department of Psychiatry, Center for Studies of Addiction, University of Pennsylvania and VISN 4 MIRECC, Crescenz VAMC, Philadelphia, PA, USA
- Joel Gelernter
- Department of Psychiatry, Yale University School of Medicine, New Haven, and VA CT Healthcare Center, West Haven, CT, USA; Departments of Genetics and Neuroscience, Yale University School of Medicine, New Haven, CT, USA

140
Mariette J, Villa-Vialaneix N. Unsupervised multiple kernel learning for heterogeneous data integration. Bioinformatics 2019; 34:1009-1015. [PMID: 29077792 DOI: 10.1093/bioinformatics/btx682] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 05/18/2017] [Accepted: 10/24/2017] [Indexed: 11/14/2022]
Abstract
Motivation Recent advances in high-throughput sequencing have expanded the breadth of available omics datasets, and the integrated analysis of multiple datasets obtained on the same samples has yielded important insights in a wide range of applications. However, integrating various sources of information remains a challenge for systems biology, because the datasets produced are often of heterogeneous types, and generic methods are needed that take their different specificities into account. Results We propose a multiple kernel framework for integrating multiple datasets of various types into a single exploratory analysis. Several solutions are provided to learn either a consensus meta-kernel or a meta-kernel that preserves the original topology of the datasets. We applied our framework to analyse two public multi-omics datasets. First, the multiple metagenomic datasets collected during the TARA Oceans expedition were explored to demonstrate that our method is able to retrieve previous findings in a single kernel PCA, as well as to provide a new image of the sample structures when a larger number of datasets are included in the analysis. To perform this analysis, a generic procedure is also proposed to improve the interpretability of the kernel PCA with regard to the original data. Second, the multi-omics breast cancer datasets provided by The Cancer Genome Atlas are analysed using kernel self-organizing maps with both single- and multi-omics strategies. The comparison of these two approaches demonstrates the benefit of our integration method for improving the representation of the studied biological system. Availability and implementation Proposed methods are available in the R package mixKernel, released on CRAN. It is fully compatible with the mixOmics package, and a tutorial describing the approach can be found on the mixOmics web site http://mixomics.org/mixkernel/. Contact jerome.mariette@inra.fr or nathalie.villa-vialaneix@inra.fr.
Supplementary information Supplementary data are available at Bioinformatics online.
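A consensus meta-kernel of the kind described in this abstract can be illustrated by normalising each kernel and averaging them with data-driven weights. The sketch below uses a STATIS-like idea: the weights come from the leading eigenvector of the matrix of cosine similarities between kernels, so kernels that agree with the others count more. It is a simplified illustration, not the mixKernel implementation. The cosine normalisation, the power-iteration solver, rescaling the weights to sum to 1, and the function names are our assumptions, and the input kernels are assumed positive semi-definite with positive diagonals.

```python
import math

def cosine_normalize(k):
    """Rescale K_ij to K_ij / sqrt(K_ii * K_jj) so each sample has self-similarity 1."""
    n = len(k)
    return [[k[i][j] / math.sqrt(k[i][i] * k[j][j]) for j in range(n)] for i in range(n)]

def frobenius(a, b):
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def consensus_kernel(kernels, n_iter=200):
    """Return (weights, meta_kernel) for a list of n x n kernels on the same samples."""
    normed = [cosine_normalize(k) for k in kernels]
    m = len(normed)
    norms = [math.sqrt(frobenius(k, k)) for k in normed]
    # Cosine similarity between kernels; nonnegative when all kernels are PSD
    sim = [[frobenius(normed[a], normed[b]) / (norms[a] * norms[b]) for b in range(m)]
           for a in range(m)]
    # Power iteration for the leading eigenvector, rescaled to sum to 1
    v = [1.0 / m] * m
    for _ in range(n_iter):
        w = [sum(sim[a][b] * v[b] for b in range(m)) for a in range(m)]
        total = sum(w)
        v = [x / total for x in w]
    n = len(normed[0])
    meta = [[sum(v[t] * normed[t][i][j] for t in range(m)) for j in range(n)]
            for i in range(n)]
    return v, meta

# Two identical kernels and one that disagrees with them
k1 = [[2.0, 1.0], [1.0, 2.0]]
k3 = [[1.0, 0.0], [0.0, 1.0]]
weights, meta = consensus_kernel([k1, k1, k3])
print(weights)
```

Because every normalised kernel has a unit diagonal and the weights sum to 1, the meta-kernel also has a unit diagonal, so it can be passed directly to a kernel PCA.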
Affiliation(s)
- Jérôme Mariette
- MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan, France

141
Wallace HJ, Cadby G, Melton PE, Wood FM, Falder S, Crowe MM, Martin LJ, Marlow K, Ward SV, Fear MW. Genetic influence on scar height and pliability after burn injury in individuals of European ancestry: A prospective cohort study. Burns 2018; 45:567-578. [PMID: 30595539 DOI: 10.1016/j.burns.2018.10.027] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/20/2018] [Revised: 08/15/2018] [Accepted: 10/04/2018] [Indexed: 12/26/2022]
Abstract
Even after injuries of similar extent, scarring varies considerably between individuals, in part because of genetic factors. This study aimed to identify genetic variants associated with scar height and pliability after burn injury. An exome-wide array association study and gene pathway analysis were performed on a prospective cohort of 665 patients treated for burn injury. Outcomes were the scar height (SH) and scar pliability (SP) sub-scores of the modified Vancouver Scar Scale (mVSS). DNA was genotyped using the Infinium® HumanCoreExome-24 BeadChip. Associations between genetic variants (single nucleotide polymorphisms) and SH and SP were estimated using an additive genetic model adjusting for age, sex, number of surgical procedures and % total body surface area of burn, in subjects of European ancestry. No individual genetic variant reached the threshold for significance. Gene regions were analysed for spatially correlated single nucleotide polymorphisms, and significant regions were identified using the comb-p software. The resulting gene list was subjected to gene pathway analysis to find over-represented biological process terms. Using this approach, biological processes related to the nervous system and cell adhesion were the predominant gene pathways associated with both SH and SP. This study suggests that genes associated with innervation may be important in scar fibrosis. Further studies using similar and larger datasets will be essential to validate these findings.
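Tools such as comb-p combine p-values from neighbouring, correlated SNPs. A correlation-aware weighted-Z (Stouffer-Liptak) combination of this kind can be sketched as follows: convert each p-value to a z-score, sum them, and divide by the standard deviation of the sum, sqrt(sum_i sum_j r_ij). This is a simplified illustration assuming one-sided p-values, unit weights, and a correlation matrix supplied by the caller; as we understand it, comb-p instead estimates correlation from the autocorrelation of p-values as a function of genomic distance and adds region finding and multiple-testing steps that are not reproduced here. The function name is hypothetical.

```python
from statistics import NormalDist

def stouffer_liptak(pvalues, corr):
    """Combine one-sided p-values from correlated tests.

    z_i = Phi^{-1}(1 - p_i); Z = sum(z_i) / sqrt(sum_ij r_ij); p = 1 - Phi(Z).
    The diagonal of corr must be 1.
    """
    nd = NormalDist()
    # -Phi^{-1}(p) equals Phi^{-1}(1 - p) but keeps precision for tiny p
    z = [-nd.inv_cdf(p) for p in pvalues]
    var_sum = sum(corr[i][j] for i in range(len(z)) for j in range(len(z)))
    combined = sum(z) / var_sum ** 0.5
    # Phi(-Z) equals 1 - Phi(Z) but avoids cancellation for large Z
    return nd.cdf(-combined)

independent = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.6], [0.6, 1.0]]
print(stouffer_liptak([0.05, 0.05], independent))
print(stouffer_liptak([0.05, 0.05], correlated))
```

The correlated case gives a larger combined p-value than the independent case: two correlated SNPs carry less independent evidence, which is why correlation must be accounted for when combining spatially adjacent signals.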
Affiliation(s)
- Hilary J Wallace
- Burn Injury Research Unit, School of Biomedical Sciences, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Australia; School of Medicine, The University of Notre Dame Australia, Fremantle, Australia
- Gemma Cadby
- Centre for Genetic Origins of Health and Disease, Faculty of Health and Medical Sciences, The University of Western Australia and Faculty of Health Science, Curtin University, Perth, Australia
- Phillip E Melton
- Centre for Genetic Origins of Health and Disease, Faculty of Health and Medical Sciences, The University of Western Australia and Faculty of Health Science, Curtin University, Perth, Australia; School of Pharmacy and Biomedical Sciences, Faculty of Health Science, Curtin University, Perth, Australia
- Fiona M Wood
- Burn Injury Research Unit, School of Biomedical Sciences, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Australia; Burns Service of Western Australia, Princess Margaret Hospital for Children and Fiona Stanley Hospital, Perth, Australia
- Sian Falder
- Alder Hey Children's NHS Foundation Trust, Liverpool, UK
- Margaret M Crowe
- Burns Service of Western Australia, Princess Margaret Hospital for Children and Fiona Stanley Hospital, Perth, Australia
- Lisa J Martin
- Burns Service of Western Australia, Princess Margaret Hospital for Children and Fiona Stanley Hospital, Perth, Australia
- Karen Marlow
- Alder Hey Children's NHS Foundation Trust, Liverpool, UK
- Sarah V Ward
- Centre for Genetic Origins of Health and Disease, Faculty of Health and Medical Sciences, The University of Western Australia and Faculty of Health Science, Curtin University, Perth, Australia
- Mark W Fear
- Burn Injury Research Unit, School of Biomedical Sciences, Faculty of Health and Medical Sciences, The University of Western Australia, Perth, Australia

142
Aterido A, Cañete JD, Tornero J, Ferrándiz C, Pinto JA, Gratacós J, Queiró R, Montilla C, Torre-Alonso JC, Pérez-Venegas JJ, Fernández Nebro A, Muñoz-Fernández S, González CM, Roig D, Zarco P, Erra A, Rodríguez J, Castañeda S, Rubio E, Salvador G, Díaz-Torné C, Blanco R, Willisch Domínguez A, Mosquera JA, Vela P, Sánchez-Fernández SA, Corominas H, Ramírez J, de la Cueva P, Fonseca E, Fernández E, Puig L, Dauden E, Sánchez-Carazo JL, López-Estebaranz JL, Moreno D, Vanaclocha F, Herrera E, Blanco F, Fernández-Gutiérrez B, González A, Pérez-García C, Alperi-López M, Olivé Marques A, Martínez-Taboada V, González-Álvaro I, Sanmartí R, Tomás Roura C, García-Montero AC, Bonàs-Guarch S, Mercader JM, Torrents D, Codó L, Gelpí JL, López-Corbeto M, Pluma A, López-Lasanta M, Tortosa R, Palau N, Absher D, Myers R, Marsal S, Julià A. Genetic variation at the glycosaminoglycan metabolism pathway contributes to the risk of psoriatic arthritis but not psoriasis. Ann Rheum Dis 2018; 78:annrheumdis-2018-214158. [PMID: 30552173 DOI: 10.1136/annrheumdis-2018-214158] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 07/24/2018] [Revised: 11/16/2018] [Accepted: 11/16/2018] [Indexed: 12/31/2022]
Abstract
OBJECTIVE Psoriatic arthritis (PsA) is a chronic inflammatory arthritis affecting up to 30% of patients with psoriasis (Ps). To date, most of the known risk loci for PsA are shared with Ps, and identifying disease-specific variation has proven very challenging. The objective of the present study was to identify genetic variation specific to PsA. METHODS We performed a genome-wide association study in a cohort of 835 patients with PsA and 1558 controls from Spain. Genetic association was tested at the single-marker level and at the pathway level. Meta-analysis was performed with a case-control cohort of 2847 individuals from North America. To confirm the specificity of the genetic associations with PsA, we tested the associated variation in a purely cutaneous psoriasis cohort (PsC, n=614) and a rheumatoid arthritis cohort (RA, n=1191). Using network and drug-repurposing analyses, we further investigated the potential of the PsA-specific associations to guide the development of new drugs for PsA. RESULTS We identified a new PsA risk single-nucleotide polymorphism at the B3GNT2 locus (p=1.10e-08). At the pathway level, we found 14 genetic pathways significantly associated with PsA (pFDR<0.05). Of these, the glycosaminoglycan (GAG) metabolism pathway was confirmed to be disease-specific after comparing the PsA cohort with the cohorts of patients with PsC and RA. Finally, we identified candidate drug targets in the GAG metabolism pathway as well as new PsA indications for approved drugs. CONCLUSION These findings provide insights into the biological mechanisms that are specific to PsA and could contribute to the development of more effective therapies.
Collapse
Affiliation(s)
- Adrià Aterido
- Rheumatology Research Group, Vall d'Hebron Research Institute, Barcelona, Spain; Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Juan D Cañete
- Rheumatology Department, Hospital Clínic de Barcelona and IDIBAPS, Barcelona, Spain
- Jesús Tornero
- Rheumatology Department, Hospital Universitario Guadalajara, Guadalajara, Spain
- Carlos Ferrándiz
- Dermatology Department, Hospital Universitari Germans Trias i Pujol, Badalona, Spain
- José Antonio Pinto
- Rheumatology Department, Complejo Hospitalario Juan Canalejo, A Coruña, Spain
- Jordi Gratacós
- Rheumatology Department, Hospital Parc Taulí, Sabadell, Spain
- Rubén Queiró
- Rheumatology Department, Hospital Universitario Central de Asturias, Oviedo, Spain
- Carlos Montilla
- Rheumatology Department, Hospital Virgen de la Vega, Salamanca, Spain
- Antonio Fernández Nebro
- Rheumatology Department, Instituto de Investigación Biomédica de Málaga, Hospital Regional Universitario de Málaga, Málaga, Spain
- Santiago Muñoz-Fernández
- Rheumatology Department, Hospital Universitario Infanta Sofía, Universidad Europea, Madrid, Spain
- Carlos M González
- Rheumatology Department, Hospital Universitario Gregorio Marañón, Madrid, Spain
- Daniel Roig
- Rheumatology Department, Hospital Moisès Broggi, Barcelona, Spain
- Pedro Zarco
- Rheumatology Department, Hospital Universitario Fundación Alcorcón, Madrid, Spain
- Alba Erra
- Rheumatology Department, Hospital Sant Rafael, Barcelona, Spain
- Jesús Rodríguez
- Rheumatology Department, Hospital Universitari de Bellvitge, Barcelona, Spain
- Santos Castañeda
- Rheumatology Department, Hospital Universitario La Princesa, IIS La Princesa, Madrid, Spain
- Esteban Rubio
- Rheumatology Department, Centro de Salud Virgen de los Reyes, Sevilla, Spain
- Georgina Salvador
- Rheumatology Department, Hospital Universitario Mútua de Terrassa, Terrassa, Spain
- Cesar Díaz-Torné
- Rheumatology Department, Hospital de la Santa Creu i Sant Pau, Barcelona, Spain
- Ricardo Blanco
- Rheumatology Department, Hospital Universitario Marqués de Valdecilla, Santander, Spain
- José Antonio Mosquera
- Rheumatology Department, Complejo Hospitalario Hospital Provincial de Pontevedra, Pontevedra, Spain
- Paloma Vela
- Rheumatology Department, Hospital General Universitario de Alicante, Alicante, Spain
- Héctor Corominas
- Rheumatology Department, Hospital de la Santa Creu i Sant Pau, Barcelona, Spain; Rheumatology Department, Hospital Dos de Maig, Barcelona, Spain
- Julio Ramírez
- Rheumatology Department, Hospital Clínic de Barcelona and IDIBAPS, Barcelona, Spain
- Pablo de la Cueva
- Dermatology Department, Hospital Universitario Infanta Leonor, Madrid, Spain
- Eduardo Fonseca
- Dermatology Department, Complejo Hospitalario Universitario de A Coruña, A Coruña, Spain
- Emilia Fernández
- Dermatology Department, Hospital Universitario de Salamanca, Salamanca, Spain
- Lluis Puig
- Dermatology Department, Hospital de la Santa Creu i Sant Pau, Barcelona, Spain
- Esteban Dauden
- Dermatology Department, Hospital Universitario La Princesa, IIS La Princesa, Madrid, Spain
- David Moreno
- Dermatology Department, Hospital Virgen Macarena, Sevilla, Spain
- Enrique Herrera
- Dermatology Department, Hospital Universitario Virgen de la Victoria, Málaga, Spain
- Francisco Blanco
- Rheumatology Department, INIBIC-Hospital Universitario A Coruña, A Coruña, Spain
- Antonio González
- Instituto de Investigación Sanitaria Hospital Clínico Universitario de Santiago, Santiago de Compostela, Spain
- Raimon Sanmartí
- Rheumatology Department, Hospital Clínic de Barcelona and IDIBAPS, Barcelona, Spain
- Sílvia Bonàs-Guarch
- Barcelona Supercomputing Centre (BSC), Joint BSC-CRG-IRB Research Program in Computational Biology, Barcelona, Spain
- Josep Maria Mercader
- Barcelona Supercomputing Centre (BSC), Joint BSC-CRG-IRB Research Program in Computational Biology, Barcelona, Spain
- David Torrents
- Barcelona Supercomputing Centre (BSC), Joint BSC-CRG-IRB Research Program in Computational Biology, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Laia Codó
- Life Sciences Department, Barcelona Supercomputing Centre, Barcelona, Spain
- Josep Lluís Gelpí
- Life Sciences Department, Barcelona Supercomputing Centre, Barcelona, Spain
- Andrea Pluma
- Rheumatology Research Group, Vall d'Hebron Research Institute, Barcelona, Spain
- Maria López-Lasanta
- Rheumatology Research Group, Vall d'Hebron Research Institute, Barcelona, Spain
- Raül Tortosa
- Rheumatology Research Group, Vall d'Hebron Research Institute, Barcelona, Spain
- Nuria Palau
- Rheumatology Research Group, Vall d'Hebron Research Institute, Barcelona, Spain
- Devin Absher
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA
- Richard Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA
- Sara Marsal
- Rheumatology Research Group, Vall d'Hebron Research Institute, Barcelona, Spain
- Antonio Julià
- Rheumatology Research Group, Vall d'Hebron Research Institute, Barcelona, Spain
143
Guinot F, Szafranski M, Ambroise C, Samson F. Learning the optimal scale for GWAS through hierarchical SNP aggregation. BMC Bioinformatics 2018; 19:459. [PMID: 30497371 PMCID: PMC6267789 DOI: 10.1186/s12859-018-2475-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 09/29/2017] [Accepted: 11/09/2018] [Indexed: 11/16/2022]
Abstract
Background Genome-Wide Association Studies (GWAS) seek to identify causal genomic variants associated with rare human diseases. The classical statistical approach for detecting these variants is based on univariate hypothesis testing, with healthy individuals being tested against affected individuals at each locus. Given that an individual’s genotype is characterized by up to one million SNPs, this approach lacks precision, since it may yield a large number of false positives that can lead to erroneous conclusions about genetic associations with the disease. One way to improve the detection of true genetic associations is to reduce the number of hypotheses to be tested by grouping SNPs. Results We propose a dimension-reduction approach which can be applied in the context of GWAS by making use of the haplotype structure of the human genome. We compare our method with standard univariate and group-based approaches on both synthetic and real GWAS data. Conclusion We show that reducing the dimension of the predictor matrix by aggregating SNPs gives a greater precision in the detection of associations between the phenotype and genomic regions.
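The abstract's core idea is to reduce the number of hypotheses by testing groups of neighbouring, correlated SNPs instead of single SNPs. The authors aggregate SNPs hierarchically using the haplotype structure of the genome; the sketch below is a deliberately simplified stand-in, not their algorithm. It assumes greedy merging of adjacent SNPs whose absolute genotype correlation exceeds an arbitrary threshold (0.8 here) and a mean-genotype summary per group, and it only illustrates how aggregation shrinks the multiple-testing burden. The genotype matrix is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length genotype vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def aggregate_adjacent(snps, threshold=0.8):
    """Greedily merge neighbouring SNP columns while |r| with the previous SNP exceeds threshold.

    Simplified stand-in for hierarchical aggregation; the threshold is an arbitrary choice.
    """
    groups = [[0]]
    for j in range(1, len(snps)):
        if abs(pearson(snps[j - 1], snps[j])) > threshold:
            groups[-1].append(j)
        else:
            groups.append([j])
    return groups


def group_means(snps, groups):
    """Summarise each group by its per-individual mean genotype (one predictor per group)."""
    n = len(snps[0])
    return [[sum(snps[j][i] for j in g) / len(g) for i in range(n)] for g in groups]


# Four hypothetical SNPs (one list per SNP) genotyped in five individuals, coded 0/1/2.
snps = [[0, 1, 2, 1, 0], [0, 1, 2, 1, 0], [1, 0, 0, 2, 1], [1, 0, 0, 2, 1]]
groups = aggregate_adjacent(snps)  # [[0, 1], [2, 3]]
predictors = group_means(snps, groups)
alpha = 0.05
print(alpha / len(snps), alpha / len(groups))  # Bonferroni threshold per SNP vs per group
```

With four SNPs collapsed into two groups, the Bonferroni threshold at α = 0.05 relaxes from 0.0125 to 0.025, which is the kind of gain in detection power the abstract argues for.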
Affiliation(s)
- Florent Guinot
- UMR 8071 LaMME - UEVE, CNRS, ENSIIE, USC INRA, 23 bd de France, Evry, 91000, France; BIOptimize, Reims, 51000, France
- Marie Szafranski
- UMR 8071 LaMME - UEVE, CNRS, ENSIIE, USC INRA, 23 bd de France, Evry, 91000, France
- Christophe Ambroise
- UMR 8071 LaMME - UEVE, CNRS, ENSIIE, USC INRA, 23 bd de France, Evry, 91000, France; UMR MIA-Paris - AgroParisTech, INRA, Université Paris-Saclay, Paris, 75005, France
- Franck Samson
- UMR 8071 LaMME - UEVE, CNRS, ENSIIE, USC INRA, 23 bd de France, Evry, 91000, France
144
Schweiger R, Fisher E, Weissbrod O, Rahmani E, Müller-Nurasyid M, Kunze S, Gieger C, Waldenberger M, Rosset S, Halperin E. Detecting heritable phenotypes without a model using fast permutation testing for heritability and set-tests. Nat Commun 2018; 9:4919. [PMID: 30464216 PMCID: PMC6249264 DOI: 10.1038/s41467-018-07276-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/07/2017] [Accepted: 10/26/2018] [Indexed: 01/08/2023]
Abstract
Testing for association between a set of genetic markers and a phenotype is a fundamental task in genetic studies. Standard approaches for heritability and set testing strongly rely on parametric models that make specific assumptions regarding phenotypic variability. Here, we show that the resulting p-values may be inflated by up to 15 orders of magnitude in a heritability study of methylation measurements and in a heritability and expression quantitative trait loci analysis of gene expression profiles. We propose FEATHER, a method for fast permutation-based testing of marker sets and of heritability, which properly controls false-positive results. FEATHER eliminated 47% of methylation sites found to be heritable by the parametric test, suggesting a substantial inflation of false-positive findings by alternative methods. Our approach can rapidly identify heritable phenotypes out of millions of phenotypes acquired via high-throughput technologies, does not suffer from model misspecification and is highly efficient. Standard approaches for heritability and set testing in statistical genetics rely on parametric models that might not hold in reality and give inflated p-values. Here, the authors develop a fast method for permutation-based testing of marker sets and of heritability that does not suffer from model misspecification.
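FEATHER's contribution is making permutation testing fast enough for millions of phenotypes; the naive procedure it speeds up is easy to state. The sketch below assumes a simple SKAT-like set statistic (the sum over markers of the squared score between the centred phenotype and each marker) and unrestricted shuffling of phenotype values. It ignores covariates and relatedness and is not FEATHER's algorithm; the data are hypothetical.

```python
import random


def set_statistic(y, genotypes):
    """Sum over markers of the squared score between the centred phenotype and each marker."""
    ybar = sum(y) / len(y)
    yc = [v - ybar for v in y]
    total = 0.0
    for g in genotypes:
        score = sum(a * b for a, b in zip(yc, g))
        total += score * score
    return total


def permutation_pvalue(y, genotypes, n_perm=999, seed=0):
    """Monte Carlo permutation p-value for the marker-set statistic (naive, no speed-ups)."""
    rng = random.Random(seed)
    observed = set_statistic(y, genotypes)
    perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # break any genotype-phenotype link, as under the null
        if set_statistic(perm, genotypes) >= observed:
            exceed += 1
    # Counting the observed labelling as one permutation keeps the p-value above zero.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: 20 individuals, a two-marker set, phenotype driven by the first marker.
g1 = [0] * 10 + [1] * 10
g2 = [0, 1] * 10
y = [float(v) for v in g1]
p_assoc = permutation_pvalue(y, [g1, g2])
```

Because no distributional model is assumed, the p-value is valid whatever the phenotype's distribution; the price is n_perm evaluations of the statistic per phenotype, which is the cost FEATHER is designed to reduce.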
Affiliation(s)
- Regev Schweiger
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 6997801, Israel
- Eyal Fisher
- School of Mathematical Sciences, Department of Statistics, Tel Aviv University, Tel Aviv, 69978, Israel
- Omer Weissbrod
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA
- Elior Rahmani
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 6997801, Israel
- Martina Müller-Nurasyid
- Institute of Genetic Epidemiology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, 85764, Germany; Department of Medicine I, Ludwig-Maximilians-Universität, Munich, 80539, Germany; DZHK (German Centre for Cardiovascular Research), partner site Munich Heart Alliance, Munich, 80636, Germany
- Sonja Kunze
- Institute of Epidemiology II, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany; Research Unit of Molecular Epidemiology, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764, Neuherberg, Germany
- Christian Gieger
- Institute of Epidemiology II, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany; Research Unit of Molecular Epidemiology, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764, Neuherberg, Germany
- Melanie Waldenberger
- DZHK (German Centre for Cardiovascular Research), partner site Munich Heart Alliance, Munich, 80636, Germany; Institute of Epidemiology II, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany; Research Unit of Molecular Epidemiology, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764, Neuherberg, Germany
- Saharon Rosset
- School of Mathematical Sciences, Department of Statistics, Tel Aviv University, Tel Aviv, 69978, Israel
- Eran Halperin
- Los Angeles, University of California Los Angeles, Los Angeles, 90095, CA, USA; Department of Anesthesiology and Perioperative Medicine, University of California, Los Angeles, 90095, CA, USA
145
He T, Li S, Zhong PS, Cui Y. An optimal kernel-based U-statistic method for quantitative gene-set association analysis. Genet Epidemiol 2018; 43:137-149. [DOI: 10.1002/gepi.22170] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 05/04/2018] [Revised: 08/19/2018] [Accepted: 09/26/2018] [Indexed: 11/09/2022]
Affiliation(s)
- Tao He
- Department of Mathematics, San Francisco State University, San Francisco, California
- Shaoyu Li
- Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, North Carolina
- Ping-Shou Zhong
- Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, Illinois
- Yuehua Cui
- Department of Statistics & Probability, Michigan State University, East Lansing, Michigan
- School of Public Health, Zhengzhou University, Zhengzhou, China
146
Goodman MO, Chibnik L, Cai T. Variance components genetic association test for zero-inflated count outcomes. Genet Epidemiol 2018; 43:82-101. [PMID: 30353568 DOI: 10.1002/gepi.22162] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/15/2018] [Revised: 05/15/2018] [Accepted: 06/12/2018] [Indexed: 01/27/2023]
Abstract
Commonly in biomedical research, studies collect data in which an outcome measure contains informative excess zeros; for example, when observing the burden of neuritic plaques (NPs) in brain pathology studies, those who show none contribute to our understanding of neurodegenerative disease. The outcome may be characterized by a mixture distribution with one component being the "structural zero" and the other component being a Poisson distribution. We propose a novel variance components score test of genetic association between a set of genetic markers and a zero-inflated count outcome from a mixture distribution. This test shares advantageous properties with single-nucleotide polymorphism (SNP)-set tests which have been previously devised for standard continuous or binary outcomes, such as the sequence kernel association test. In particular, our method has superior statistical power compared to competing methods, especially when there is correlation within the group of markers, and when the SNPs are associated with both the mixing proportion and the rate of the Poisson distribution. We apply the method to Alzheimer's data from the Rush University Religious Orders Study and Memory and Aging Project, where as proof of principle we find highly significant associations with the APOE gene, in both the "structural zero" and "count" parameters, when applied to a zero-inflated NPs count outcome.
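The two-component mixture described above is a zero-inflated Poisson (ZIP) model. As a sketch of that outcome distribution only (not of the variance-components score test itself), with `pi` the mixing proportion of structural zeros and `lam` the Poisson rate (names chosen here, not taken from the paper):

```python
import math


def zip_pmf(k, pi, lam):
    """P(Y = k) under a zero-inflated Poisson with structural-zero proportion pi and rate lam."""
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise from the structural-zero component or from the Poisson component.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


# Compare P(Y = 0) with and without a 30% structural-zero component (illustrative values).
print(zip_pmf(0, 0.3, 2.0), zip_pmf(0, 0.0, 2.0))
```

Zeros receive probability from both components, so a genetic effect can show up through `pi`, through `lam`, or through both; this is why the abstract distinguishes associations with the mixing proportion from those with the Poisson rate. The ZIP mean is (1 - pi) * lam.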
Affiliation(s)
- Matthew O Goodman
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Lori Chibnik
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Tianxi Cai
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
147
Wang X, Wang S, Meng X. A novel SNP-set analytical method without distinguishing common variants or rare variants in genome-wide association study. INT J BIOMATH 2018. [DOI: 10.1142/s1793524518500948] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Single nucleotide polymorphism (SNP)-set analysis in genome-wide association studies (GWASs) has become a hot topic. Most existing SNP-set analytic methods are designed for, and work well under, the distinct characteristics of either common or rare variants and their associated diseases. However, whether the disease-associated variants are common or rare cannot be known in advance. Therefore, in this research, we proposed a new and powerful weighting-function method for selecting a tagging SNP-set without distinguishing between common and rare variants. We applied our selection method to the sequence kernel association test (SKAT) and compared its power with that of some existing methods. The simulation results showed that our method has higher power not only than unweighted SKAT but also than SKAT with other weighting functions. Moreover, the power is improved significantly when the minor allele frequency (MAF) of the causal SNP is relatively small.
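The comparison above is between SKAT runs that differ only in their per-variant weights. For context, here is a minimal sketch of the SKAT-style score statistic for a continuous trait without covariates, Q = Σ_j (w_j S_j)², where S_j = Σ_i (y_i - ȳ) g_ij, using the Beta(1, 25) density of the minor allele frequency, the weight commonly used as SKAT's default. The authors' own weighting function is not described in the abstract and is not reproduced here, and the p-value step (a mixture of chi-square distributions) is omitted.

```python
def beta_1_25_weight(maf):
    """Beta(1, 25) density at maf; it simplifies to 25 * (1 - maf)**24 and upweights rare variants."""
    return 25.0 * (1.0 - maf) ** 24


def skat_q(y, genotypes, weighted=True):
    """SKAT-style score statistic Q = sum_j (w_j * S_j)**2 for a continuous trait, no covariates.

    genotypes: one list per variant of allele counts (0/1/2) across individuals.
    """
    n = len(y)
    ybar = sum(y) / n
    resid = [v - ybar for v in y]  # residuals under the intercept-only null model
    q = 0.0
    for g in genotypes:
        p = sum(g) / (2 * n)  # allele frequency estimated from the sample
        maf = min(p, 1 - p)
        w = beta_1_25_weight(maf) if weighted else 1.0
        score = sum(r * x for r, x in zip(resid, g))
        q += (w * score) ** 2
    return q


y = [1.0, 2.0, 3.0, 4.0]
print(skat_q(y, [[0, 0, 1, 1]], weighted=False))  # 4.0
```

Setting `weighted=False` gives the unweighted case mentioned in the abstract; swapping `beta_1_25_weight` for another function of the MAF is where an alternative weighting scheme would plug in.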
Affiliation(s)
- Xinzeng Wang
- State Key Laboratory of Mining Disaster Prevention and Control Co-founded by Shandong Province and the Ministry of Science and Technology, Shandong University of Science and Technology, Qingdao 266590, P. R. China
- College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao 266510, P. R. China
- Shudong Wang
- College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao, Shandong 266580, P. R. China
- Xinzhu Meng
- State Key Laboratory of Mining Disaster Prevention and Control Co-founded by Shandong Province and the Ministry of Science and Technology, Shandong University of Science and Technology, Qingdao 266590, P. R. China
- College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao 266510, P. R. China
148
Absence of Mutation Enrichment for Genes Phylogenetically Conserved in the Olivocerebellar Motor Circuitry in a Cohort of Canadian Essential Tremor Cases. Mol Neurobiol 2018; 56:4317-4321. [PMID: 30315477 DOI: 10.1007/s12035-018-1369-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/09/2018] [Accepted: 09/27/2018] [Indexed: 10/28/2022]
Abstract
Essential Tremor is a prevalent neurological disorder of unknown etiology. Studies suggest that genetic factors contribute to this pathology. To date, no causative gene mutations have been reproducibly reported. All three structures of the olivocerebellar motor circuitry have been linked to Essential Tremor. We postulated that genes enriched for their expression in the olivocerebellar circuitry would be more likely to harbor mutations in Essential Tremor patients. A list of 11 candidate genes, enriched for their expression in the olivocerebellar circuitry, was assessed for variation spectrum and frequency in a cohort of Canadian Essential Tremor cases. Our results for these 11 candidate genes do not support an association with Essential Tremor in our cohort of Canadian cases. The heterogeneous nature of Essential Tremor and the modest size of the cohort used in this study are two confounding factors that could explain these results.
149
Heller R, Chatterjee N, Krieger A, Shi J. Post-Selection Inference Following Aggregate Level Hypothesis Testing in Large-Scale Genomic Data. J Am Stat Assoc 2018. [DOI: 10.1080/01621459.2017.1375933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Ruth Heller
- Department of Statistics and Operations Research, Tel-Aviv University, Tel-Aviv, Israel
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD
- Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, and Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD
- Abba Krieger
- Department of Statistics, University of Pennsylvania, Philadelphia, PA
- Jianxin Shi
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD
150
Gao TH, Zhang J, Miguelangel DM, Wang X. Methods to evaluate rare variants gene-age interaction for triglycerides. BMC Proc 2018; 12:49. [PMID: 30263050 PMCID: PMC6156913 DOI: 10.1186/s12919-018-0136-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/13/2023]
Abstract
Triglycerides are an important measure of heart health. Although more than 90 genes have been found to be associated with lipids, they explain only 12 to 15% of the variance in lipid levels. Evidence suggests that age may interact with the genetic effect on lipid levels. Existing methods to detect the main effect of rare variants cannot be readily applied to test the gene-environment interaction effect of rare variants, as those methods either give unstable results or have inflated Type I error rates when the main effect exists. To overcome these difficulties, we developed two statistical methods, testing of optimally weighted combination of single-nucleotide polymorphism (SNP) environment interaction (TOW-SE) and a variable weight TOW-SE (VW-TOW-SE), which test the gene-environment interaction effect of rare variants by grouping SNPs into biologically meaningful SNP-sets (SNPs in a gene or pathway) to improve power and interpretability. The proposed methods can be applied to either continuous or binary environmental variables, and to either continuous or binary outcomes. Simulation studies show that the Type I error rates of the proposed methods are under control. Compared with the existing interaction sequence kernel association test (iSKAT), the VW-TOW-SE is the most powerful test and the TOW-SE the second most powerful when a gene-environment interaction effect exists for both rare and common variants. The three tests were applied to the GAW20 simulated data in the five regions in which the main effect of common SNPs was simulated but no gene–age interaction effect was included. As expected, none of the tests gave positive results.
Affiliation(s)
- Tony Huayang Gao
- Texas Academy of Mathematics & Science, University of North Texas, 1155 Union Circle #311430, Denton, TX 76203 USA
- Jianjun Zhang
- Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, TX 76203 USA
- Xuexia Wang
- Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, TX 76203 USA