1
Dong Z, Zhao H, DeWan AT. A mediation analysis framework based on variance component to remove genetic confounding effect. J Hum Genet 2024; 69:301-309. [PMID: 38528049 DOI: 10.1038/s10038-024-01232-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 02/15/2024] [Accepted: 02/16/2024] [Indexed: 03/27/2024]
Abstract
Identification of pleiotropy at the single nucleotide polymorphism (SNP) level provides valuable insights into shared genetic signals among phenotypes. One approach to study these signals is through mediation analysis, which dissects the total effect of a SNP on the outcome into a direct effect and an indirect effect through a mediator. However, estimated effects from mediation analysis can be confounded by the genetic correlation between phenotypes, leading to inaccurate results. To address this confounding effect in the context of genetic mediation analysis, we propose a restricted-maximum-likelihood (REML)-based mediation analysis framework called REML-mediation, which can be applied to either individual-level or summary statistics data. Simulations demonstrated that REML-mediation provides unbiased estimates of the true cross-trait causal effect under certain assumptions, albeit with a slightly inflated standard error compared to traditional linear regression. To validate the effectiveness of REML-mediation, we applied it to UK Biobank data and analyzed several mediator-outcome trait pairs along with their corresponding sets of pleiotropic SNPs. REML-mediation successfully identified and corrected for genetic confounding effects in these trait pairs, with correction magnitudes ranging from 7% to 39%. These findings highlight the presence of genetic confounding effects in cross-trait epidemiological studies and underscore the importance of accounting for them in data analysis.
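For orientation, the decomposition of a total effect into direct and indirect parts described in this abstract can be written in the standard single-mediator linear form. This is the textbook formulation, not the REML-mediation model itself; the symbols $G$ (SNP genotype), $M$ (mediator), $Y$ (outcome), $a$, $b$, $c$, and $c'$ are notation assumed here for illustration only.

$$
M = aG + \varepsilon_M, \qquad Y = c'G + bM + \varepsilon_Y, \qquad c = c' + ab
$$

Here $c$ is the total effect of the SNP on the outcome, $c'$ the direct effect, and $ab$ the indirect effect through the mediator. The identity $c = c' + ab$ holds for linear models without exposure-mediator interaction. The confounding the abstract refers to arises when $\varepsilon_M$ and $\varepsilon_Y$ share a genetic component, so that an ordinary regression estimate of $b$ no longer isolates the mediator's effect.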
Affiliation(s)
- Zihan Dong
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
  - Center for Perinatal, Pediatric and Environmental Epidemiology, Yale School of Public Health, New Haven, CT, USA
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Andrew T DeWan
  - Center for Perinatal, Pediatric and Environmental Epidemiology, Yale School of Public Health, New Haven, CT, USA
  - Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, CT, USA
2
Du J, Zhou X, Clark-Boucher D, Hao W, Liu Y, Smith JA, Mukherjee B. Methods for large-scale single mediator hypothesis testing: Possible choices and comparisons. Genet Epidemiol 2023; 47:167-184. [PMID: 36465006 PMCID: PMC10329872 DOI: 10.1002/gepi.22510] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/02/2022] [Revised: 09/30/2022] [Accepted: 11/11/2022] [Indexed: 12/12/2022]
Abstract
Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, $H_0: \alpha\beta = 0$ ($\alpha$: effect of the exposure on the mediator after adjusting for confounders; $\beta$: effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we reviewed three classes of methods for large-scale one-at-a-time mediation hypothesis testing. These methods are commonly used for continuous outcomes and continuous mediators assuming there is no exposure-mediator interaction so that the product $\alpha\beta$ has a causal interpretation as the indirect effect. The first class of methods ignores the impact of different structures under the composite null hypothesis, namely, (1) $\alpha = 0, \beta \ne 0$; (2) $\alpha \ne 0, \beta = 0$; and (3) $\alpha = \beta = 0$. The second class of methods weights the reference distribution under each case of the null to form a mixture reference distribution. The third class constructs a composite test statistic using the three p values obtained under each case of the null so that the reference distribution of the composite statistic is approximately $U(0,1)$. In addition to these existing methods, we developed the Sobel-comp method belonging to the second class, which uses a corrected mixture reference distribution for Sobel's test statistic. We performed extensive simulation studies to compare all six methods belonging to these three classes in terms of the false positive rates (FPRs) under the null hypothesis and the true positive rates under the alternative hypothesis. We found that the second class of methods, which uses a mixture reference distribution, could best maintain the FPRs at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level using data from the Multi-Ethnic Study of Atherosclerosis (MESA). We provide guidelines for choosing the optimal mediation hypothesis testing method in practice and develop an R package, medScan, available on CRAN for implementing all six methods.
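To make the quantity under test concrete, the sketch below computes Sobel's statistic for the product $\alpha\beta$ using the textbook first-order standard error and a standard normal reference distribution. It is an illustration only, not the medScan implementation or the Sobel-comp correction; the function name `sobel_test` and the numeric inputs are hypothetical.

```python
import math


def sobel_test(alpha_hat, se_alpha, beta_hat, se_beta):
    """Return Sobel's z statistic and a two-sided normal p-value for alpha*beta.

    Assumes the first-order (delta-method) standard error of the product.
    The denominator is zero only if both estimates are exactly zero, which
    this sketch does not guard against.
    """
    indirect = alpha_hat * beta_hat
    se_indirect = math.sqrt(alpha_hat**2 * se_beta**2 + beta_hat**2 * se_alpha**2)
    z = indirect / se_indirect
    # Two-sided p-value under N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical estimates for one mediator.
z, p = sobel_test(alpha_hat=0.30, se_alpha=0.10, beta_hat=0.50, se_beta=0.10)
print(f"z = {z:.3f}, p = {p:.4f}")
```

The normal reference used here takes no account of which case of the composite null holds, which is the limitation the abstract attributes to the first class of methods.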
Affiliation(s)
- Jiacong Du
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Dylan Clark-Boucher
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Wei Hao
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Yongmei Liu
  - Department of Medicine, Divisions of Cardiology and Neurology, Duke University Medical Center, Durham, North Carolina, USA
- Jennifer A Smith
  - Department of Epidemiology, University of Michigan, Ann Arbor, Michigan, USA
- Bhramar Mukherjee
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
3
Jin C, Lee B, Shen L, Long Q. Integrating multi-omics summary data using a Mendelian randomization framework. Brief Bioinform 2022; 23:bbac376. [PMID: 36094096 PMCID: PMC9677504 DOI: 10.1093/bib/bbac376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 07/29/2022] [Accepted: 08/09/2022] [Indexed: 12/14/2022] Open
Abstract
Mendelian randomization is a versatile tool to identify the possible causal relationship between an omics biomarker and disease outcome using genetic variants as instrumental variables. A key theme is the prioritization of genes whose omics readouts can be used as predictors of the disease outcome through analyzing GWAS and QTL summary data. However, there is a dearth of studies on best practice for probing the effects of multiple omics biomarkers annotated to the same gene of interest. To bridge this gap, we propose powerful combination tests that integrate multiple correlated $P$-values without assuming the dependence structure between the exposures. Our extensive simulation experiments demonstrate the superiority of our proposed approach compared with existing methods that are adapted to the setting of our interest. The top hits of the analyses of multi-omics Alzheimer's disease datasets include genes ABCA7 and ATP1B1.
Affiliation(s)
- Chong Jin
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Brian Lee
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Li Shen
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Qi Long
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
4
Wang Z, Wiggs JL, Aung T, Khawaja AP, Khor CC. The genetic basis for adult onset glaucoma: Recent advances and future directions. Prog Retin Eye Res 2022; 90:101066. [PMID: 35589495 DOI: 10.1016/j.preteyeres.2022.101066] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/04/2022] [Revised: 04/19/2022] [Accepted: 04/23/2022] [Indexed: 11/26/2022]
Abstract
Glaucoma, a diverse group of eye disorders that results in the degeneration of retinal ganglion cells, is the world's leading cause of irreversible blindness. Apart from age and ancestry, the major risk factor for glaucoma is increased intraocular pressure (IOP). In primary open-angle glaucoma (POAG), the anterior chamber angle is open but there is resistance to aqueous outflow. In primary angle-closure glaucoma (PACG), crowding of the anterior chamber angle due to anatomical alterations impedes aqueous drainage through the angle. In exfoliation syndrome and exfoliation glaucoma, deposition of white flaky material throughout the anterior chamber directly interferes with aqueous outflow. Observational studies have established that there is a strong heritable component for glaucoma onset and progression. Indeed, a succession of genome-wide association studies (GWAS) centered upon single nucleotide polymorphisms (SNPs) has yielded more than a hundred genetic markers associated with glaucoma risk. However, a shortcoming of GWAS is the difficulty in identifying the actual effector genes responsible for disease pathogenesis. Building on the foundation laid by GWAS, research groups have recently begun to perform whole-exome sequencing to evaluate the contribution of protein-changing, coding sequence genetic variants to glaucoma risk. The adoption of this technology in both large population-based studies and family studies is revealing the presence of novel, protein-changing genetic variants that could enrich our understanding of the pathogenesis of glaucoma. This review will cover recent advances in the genetics of primary open-angle glaucoma, primary angle-closure glaucoma and exfoliation glaucoma, which collectively make up the vast majority of all glaucoma cases in the world today. We will discuss how recent advances in research methodology have uncovered new risk genes, and how follow-up biological investigations could be undertaken in order to define how the risk encoded by a genetic sequence variant comes into play in patients. We will also hypothesise how data arising from characterising these genetic variants could be utilized to predict glaucoma risk and the manner in which new therapeutic strategies might be informed.
Affiliation(s)
- Zhenxun Wang
  - Duke-NUS Medical School, Singapore; Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
- Janey L Wiggs
  - Department of Ophthalmology, Harvard Medical School, Boston, MA, USA
- Tin Aung
  - Duke-NUS Medical School, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
- Anthony P Khawaja
  - NIHR Biomedical Research Centre, Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, United Kingdom
- Chiea Chuen Khor
  - Duke-NUS Medical School, Singapore; Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
5
Wang T, Qiao J, Zhang S, Wei Y, Zeng P. Simultaneous test and estimation of total genetic effect in eQTL integrative analysis through mixed models. Brief Bioinform 2022; 23:6535679. [PMID: 35212359 DOI: 10.1093/bib/bbac038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/28/2021] [Revised: 01/22/2022] [Accepted: 02/07/2021] [Indexed: 11/14/2022] Open
Abstract
Integration of expression quantitative trait loci (eQTL) into genome-wide association studies (GWASs) is a promising way to reveal functional roles of associated single-nucleotide polymorphisms (SNPs) in complex phenotypes and has become an active research field in the post-GWAS era. However, how to efficiently incorporate eQTL mapping studies into GWAS for prioritization of causal genes remains elusive. We herein propose a novel method termed Mixed transcriptome-wide association studies (TWAS) and mediated Variance estimation (MTV), which models the effects of cis-SNPs of a gene as a function of eQTL. MTV formulates the integrative method and TWAS within a unified framework via mixed models and therefore includes many prior methods/tests as special cases. We further justify MTV from two other statistical perspectives: mediation analysis and two-stage Mendelian randomization. Relative to existing methods, MTV has several advantages, including the handling of direct effects of cis-SNPs on phenotypes, a powerful likelihood ratio test for assessing joint effects of cis-SNPs and genetically regulated gene expression (GReX), two useful quantities that measure the relative genetic contributions of GReX and cis-SNPs to phenotypic variance, and a computationally efficient parameter-expansion expectation-maximization algorithm. With extensive simulations, we found that MTV correctly controlled the type I error in joint evaluation of the total genetic effect and proved more powerful in discovering true association signals across various scenarios compared to existing methods. We finally applied MTV to 41 complex traits/diseases available from three GWASs and discovered many new associated genes that had otherwise been missed by existing methods. We also revealed that a small but substantial fraction of phenotypic variation was mediated by GReX. Overall, MTV constructs a robust and realistic modeling foundation for integrative omics analysis and has the advantage of offering more attractive biological interpretations of GWAS results.
Affiliation(s)
- Ting Wang
  - Department of Biostatistics at Xuzhou Medical University, China
- Jiahao Qiao
  - Department of Biostatistics at Xuzhou Medical University, China
- Shuo Zhang
  - Department of Biostatistics at Xuzhou Medical University, China
- Yongyue Wei
  - Department of Biostatistics at Nanjing Medical University, China
- Ping Zeng
  - Department of Biostatistics, Center for Medical Statistics and Data Analysis and Key Laboratory of Human Genetics and Environmental Medicine at Xuzhou Medical University, China
6
Ngwa JS, Yanek LR, Kammers K, Kanchan K, Taub MA, Scharpf RB, Faraday N, Becker LC, Mathias RA, Ruczinski I. Secondary analyses for genome-wide association studies using expression quantitative trait loci. Genet Epidemiol 2022; 46:170-181. [PMID: 35312098 PMCID: PMC9086181 DOI: 10.1002/gepi.22448] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/22/2021] [Revised: 11/19/2021] [Accepted: 01/20/2022] [Indexed: 01/01/2023]
Abstract
Genome-wide association studies (GWAS) have successfully identified thousands of single nucleotide polymorphisms (SNPs) associated with complex traits; however, the identified SNPs account for a fraction of trait heritability, and identifying the functional elements through which genetic variants exert their effects remains a challenge. Recent evidence suggests that SNPs associated with complex traits are more likely to be expression quantitative trait loci (eQTL). Thus, incorporating eQTL information can potentially improve power to detect causal variants missed by traditional GWAS approaches. Using genomic, transcriptomic, and platelet phenotype data from the Genetic Study of Atherosclerosis Risk family-based study, we investigated the potential to detect novel genomic risk loci by incorporating information from eQTL in the relevant target tissues (i.e., platelets and megakaryocytes) using established statistical principles in a novel way. Permutation analyses were performed to obtain family-wise error rates for eQTL associations, substantially lowering the genome-wide significance threshold for SNP-phenotype associations. In addition to confirming the well known association between PEAR1 and platelet aggregation, our eQTL-focused approach identified a novel locus (rs1354034) and gene (ARHGEF3) not previously identified in a GWAS of platelet aggregation phenotypes. A colocalization analysis showed strong evidence for a functional role of this eQTL.
Affiliation(s)
- Julius S. Ngwa
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Lisa R. Yanek
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Kai Kammers
  - Department of Oncology, Johns Hopkins University, School of Medicine, Baltimore, Maryland, USA
- Kanika Kanchan
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Margaret A. Taub
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Robert B. Scharpf
  - Department of Oncology, Johns Hopkins University, School of Medicine, Baltimore, Maryland, USA
- Nauder Faraday
  - Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Lewis C. Becker
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Rasika A. Mathias
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Ingo Ruczinski
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
7
Zhu M, Yin P, Hu F, Jiang J, Yin L, Li Y, Wang S. Integrating genome-wide association and transcriptome prediction model identifies novel target genes for osteoporosis. Osteoporos Int 2021; 32:2493-2503. [PMID: 34142171 PMCID: PMC8608767 DOI: 10.1007/s00198-021-06024-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/15/2020] [Accepted: 05/31/2021] [Indexed: 12/12/2022]
Abstract
In this study, we integrated large-scale GWAS summary data and used the predicted transcriptome-wide association study method to discover novel genes associated with osteoporosis. We identified 204 candidate genes, which provide novel clues for understanding the genetic mechanism of osteoporosis and indicate potential therapeutic targets.
INTRODUCTION: Osteoporosis is a highly polygenic disease characterized by low bone mass and deterioration of the bone microarchitecture. Our objective was to discover novel candidate genes associated with osteoporosis.
METHODS: To identify potential causal genes of the associated loci, we investigated trait-gene expression associations using the transcriptome-wide association study (TWAS) method. This method directly imputes gene expression effects from genome-wide association study (GWAS) data using a statistical prediction model trained on GTEx reference transcriptome data. We then performed a colocalization analysis to evaluate the posterior probability of biological patterns: associations characterized by a single causal variant or multiple distinct causal variants. Finally, a functional enrichment analysis of gene sets was performed using the VarElect and CluePedia tools, which assess the causal relationships between genes and a disease and search for the genes' potential functional pathways. The osteoporosis-associated genes were further confirmed based on the differentially expressed genes profiled from mRNA expression data of bone tissue.
RESULTS: Our analysis identified 204 candidate genes, including 154 genes that have been previously associated with osteoporosis and 50 genes that have not been previously discovered. A biological function analysis found that 20 of the candidate genes were directly associated with osteoporosis. Further analysis of multiple gene expression profiles showed that 15 genes were differentially expressed in patients with osteoporosis. Among these, SLC11A2, MAP2K5, NFATC4, and HSP90B1 were enriched in four pathways, namely, mineral absorption pathway, MAPK signaling pathway, Wnt signaling pathway, and PI3K-Akt signaling pathway, which indicates a causal relationship with the occurrence of osteoporosis.
CONCLUSIONS: We demonstrated that transcriptome fine-mapping identifies more osteoporosis-related genes and provides key insight into the development of novel targeted therapeutics for the treatment of osteoporosis.
Affiliation(s)
- M Zhu
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- P Yin
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- F Hu
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- J Jiang
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- L Yin
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Y Li
  - AnLan AI, Shenzhen, China
- S Wang
  - Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
8
Liao Y, Liu J, Coffman DL, Li R. Varying Coefficient Mediation Model and Application to Analysis of Behavioral Economics Data. J Bus Econ Stat 2021; 40:1759-1771. [PMID: 36330150 PMCID: PMC9624463 DOI: 10.1080/07350015.2021.1971089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
This article is concerned with causal mediation analysis with varying indirect and direct effects. We propose a varying coefficient mediation model, which can also be viewed as an extension of moderation analysis on a causal diagram. We develop a new estimation procedure for the direct and indirect effects based on B-splines. Under mild conditions, rates of convergence and asymptotic distributions of the resulting estimates are established. We further propose an F-type test for the direct effect. We conduct a simulation study to examine the finite sample performance of the proposed methodology, and apply the new procedures to an empirical analysis of behavioral economics data.
Affiliation(s)
- Yujie Liao
  - Department of Statistics, Pennsylvania State University, University Park, PA
- Jingyuan Liu
  - MOE Key Laboratory of Econometrics, Department of Statistics and Data Science, School of Economics, Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen, China
  - Fujian Key Lab of Statistics, Xiamen University, Xiamen, China
- Donna L. Coffman
  - Department of Epidemiology and Biostatistics, Temple University, Philadelphia, PA
- Runze Li
  - Department of Statistics, Pennsylvania State University, University Park, PA
9
Kulkarni O, Sugier PE, Guibon J, Boland-Augé A, Lonjou C, Bacq-Daian D, Olaso R, Rubino C, Souchard V, Rachedi F, Lence-Anta JJ, Ortiz RM, Xhaard C, Laurent-Puig P, Mulot C, Guizard AV, Schvartz C, Boutron-Ruault MC, Ostroumova E, Kesminiene A, Deleuze JF, Guénel P, De Vathaire F, Truong T, Lesueur F. Gene network and biological pathways associated with susceptibility to differentiated thyroid carcinoma. Sci Rep 2021; 11:8932. [PMID: 33903625 PMCID: PMC8076215 DOI: 10.1038/s41598-021-88253-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/18/2021] [Accepted: 04/09/2021] [Indexed: 12/11/2022] Open
Abstract
Variants identified in earlier genome-wide association studies (GWAS) on differentiated thyroid carcinoma (DTC) explain about 10% of the overall estimated genetic contribution and could not provide complete insights into biological mechanisms involved in DTC susceptibility. Integrating systems biology information from model organisms, genome-wide expression data from tumor and matched normal tissue and GWAS data could help identify DTC-associated genes, and pathways or functional networks in which they are involved. We performed data mining of GWAS data of the EPITHYR consortium (1551 cases and 1957 controls) using various pathways and protein-protein interaction (PPI) annotation databases and gene expression data from The Cancer Genome Atlas. We identified eight DTC-associated genes at known loci 2q35 (DIRC3), 8p12 (NRG1), 9q22 (FOXE1, TRMO, HEMGN, ANP32B, NANS) and 14q13 (MBIP). Using the EW_dmGWAS approach we found that gene networks related to glycogenolysis, glycogen metabolism, insulin metabolism and signal transduction pathways associated with muscle contraction were overrepresented with association signals (false discovery rate adjusted p-value < 0.05). Additionally, suggestive association of 21 KEGG and 75 REACTOME pathways with DTC indicates a link between DTC susceptibility and functions related to metabolism of cholesterol, amino sugar and nucleotide sugar metabolism, steroid biosynthesis, and downregulation of ERBB2 signaling pathways. Together, our results provide novel insights into biological mechanisms contributing to DTC risk.
Affiliation(s)
- Om Kulkarni
  - Inserm, U900, Institut Curie, PSL University, Mines ParisTech, 75248, Paris, France
- Julie Guibon
  - Inserm, U900, Institut Curie, PSL University, Mines ParisTech, 75248, Paris, France
  - Université Paris-Saclay, UVSQ, Gustave Roussy, Inserm, CESP, 94807, Villejuif, France
- Anne Boland-Augé
  - Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine, 91057, Evry, France
- Christine Lonjou
  - Inserm, U900, Institut Curie, PSL University, Mines ParisTech, 75248, Paris, France
- Delphine Bacq-Daian
  - Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine, 91057, Evry, France
- Robert Olaso
  - Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine, 91057, Evry, France
- Carole Rubino
  - Université Paris-Saclay, UVSQ, Gustave Roussy, Inserm, CESP, 94807, Villejuif, France
- Vincent Souchard
  - Université Paris-Saclay, UVSQ, Gustave Roussy, Inserm, CESP, 94807, Villejuif, France
- Frédérique Rachedi
  - Centre Hospitalier Territorial de Polynésie Française, CHTPF, Pirae, Tahiti, 98713, Papeete, French Polynesia
- Rosa Maria Ortiz
  - Instituto Nacional de Oncologia y de Radiobiologia, INOR, La Havana, Cuba
- Constance Xhaard
  - Université Paris-Saclay, UVSQ, Gustave Roussy, Inserm, CESP, 94807, Villejuif, France
  - University of Lorraine, INSERM CIC 1433, Nancy CHRU, Inserm U1116, FCRIN, INI-CRCT, 54000, Nancy, France
- Pierre Laurent-Puig
  - Centre de Recherche des Cordeliers, INSERM, Sorbonne Université, USPC, Université Paris Descartes, Université Paris Diderot, EPIGENETEC, 75006, Paris, France
- Claire Mulot
  - Centre de Recherche des Cordeliers, INSERM, Sorbonne Université, USPC, Université Paris Descartes, Université Paris Diderot, EPIGENETEC, 75006, Paris, France
- Anne-Valérie Guizard
  - Registre Général des Tumeurs du Calvados, Centre François Baclesse, 14000, Caen, France
  - Inserm U1086-UCNB, Cancers and Prevention, 14000, Caen, France
- Claire Schvartz
  - Registre des Cancers Thyroïdiens, Institut Jean Godinot, 51100, Reims, France
- Evgenia Ostroumova
  - Environment and Radiation Section, International Agency for Research on Cancer, 69008, Lyon, France
- Ausrele Kesminiene
  - Environment and Radiation Section, International Agency for Research on Cancer, 69008, Lyon, France
- Jean-François Deleuze
  - Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine, 91057, Evry, France
- Pascal Guénel
  - Université Paris-Saclay, UVSQ, Gustave Roussy, Inserm, CESP, 94807, Villejuif, France
- Florent De Vathaire
  - Université Paris-Saclay, UVSQ, Gustave Roussy, Inserm, CESP, 94807, Villejuif, France
- Thérèse Truong
  - Université Paris-Saclay, UVSQ, Gustave Roussy, Inserm, CESP, 94807, Villejuif, France
- Fabienne Lesueur
  - Inserm, U900, Institut Curie, PSL University, Mines ParisTech, 75248, Paris, France
10
Zhong W, Darville T, Zheng X, Fine J, Li Y. Generalized multi-SNP mediation intersection-union test. Biometrics 2020; 78:364-375. [PMID: 33316078 DOI: 10.1111/biom.13418] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/24/2020] [Revised: 11/23/2020] [Accepted: 12/04/2020] [Indexed: 12/15/2022]
Abstract
To elucidate the molecular mechanisms underlying genetic variants identified from genome-wide association studies (GWAS) for a variety of phenotypic traits encompassing binary, continuous, count, and survival outcomes, we propose a novel and flexible method to test for mediation that can simultaneously accommodate multiple genetic variants and different types of outcome variables. Specifically, we employ the intersection-union test approach combined with the likelihood ratio test to detect the mediation effect of multiple genetic variants via some mediator (e.g., the expression of a neighboring gene) on the outcome. We fit high-dimensional generalized linear mixed models under the mediation framework, separately under the null and alternative hypotheses. We leverage the Laplace approximation to compute the marginal likelihood of the outcome and use a coordinate descent algorithm to estimate the corresponding parameters. Our extensive simulations demonstrate the validity of our proposed methods and substantial power gains, up to 97%, over alternative methods. Applications to real data for the study of Chlamydia trachomatis infection further showcase advantages of our methods. We believe our proposed methods will be of value and general interest in this post-GWAS era to disentangle the potential causal mechanism from DNA to phenotype for new drug discovery and personalized medicine.
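The intersection-union principle named in this abstract can be illustrated with a minimal sketch: mediation is declared only if both component nulls (no effect of the variants on the mediator, no effect of the mediator on the outcome) are rejected, which is equivalent to comparing the larger of the two component p-values with the significance level. The function name `iut_pvalue` and the inputs are hypothetical, and this omits the likelihood ratio tests and mixed models the authors use to obtain the component p-values.

```python
def iut_pvalue(p_variants_to_mediator, p_mediator_to_outcome):
    """Intersection-union p-value: the maximum of the two component p-values."""
    for p in (p_variants_to_mediator, p_mediator_to_outcome):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    return max(p_variants_to_mediator, p_mediator_to_outcome)


# Hypothetical component p-values: strong first path, weak second path.
p_iut = iut_pvalue(0.001, 0.20)
print(p_iut)
```

Because the larger p-value decides the outcome, a strong signal on one path cannot compensate for a weak signal on the other.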
Affiliation(s)
- Wujuan Zhong
  - Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Toni Darville
  - Department of Pediatrics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Xiaojing Zheng
  - Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Department of Pediatrics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Jason Fine
  - Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Department of Statistics and Operations Research, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Yun Li
  - Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
11
|
Discover novel disease-associated genes based on regulatory networks of long-range chromatin interactions. Methods 2020; 189:22-33. [PMID: 33096239 DOI: 10.1016/j.ymeth.2020.10.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/25/2020] [Revised: 08/29/2020] [Accepted: 10/18/2020] [Indexed: 02/01/2023] Open
Abstract
Identifying genes and non-coding genetic variants that are genetically associated with complex diseases and the underlying mechanisms is one of the most important questions in functional genomics. Due to limited statistical power and the lack of mechanistic modeling, traditional genome-wide association studies (GWAS) cannot fully address this question. Based on multi-omics data integration, cell-type specific regulatory networks can be built to improve GWAS analysis. In this study, we developed a new computational infrastructure, APRIL, to incorporate 3D chromatin interactions into regulatory network construction, which can extend the networks to include long-range cis-regulatory links between non-coding GWAS SNPs and target genes. Combinatorial transcription factors that co-regulate groups of genes are also inferred to further expand the networks with trans-regulation. A suite of machine learning predictions and statistical tests are incorporated in APRIL to predict novel disease-associated genes based on the expanded regulatory networks. Important features of non-coding regulatory elements and genetic variants are prioritized in network-based predictions, providing systems-level insights on the mechanisms of transcriptional dysregulation associated with complex diseases.
|
12
|
Zhu Y, Ji J, Lin W, Li M, Liu L, Zhu H, Xue F, Li X, Zhou X, Yuan Z. MCC-SP: a powerful integration method for identification of causal pathways from genetic variants to complex disease. BMC Genet 2020; 21:90. [PMID: 32847502 PMCID: PMC7477886 DOI: 10.1186/s12863-020-00899-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/22/2020] [Accepted: 08/13/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Genome-wide association studies (GWAS) have successfully identified genetic susceptibility variants for complex diseases. However, the underlying mechanism of such associations remains largely unknown. Most disease-associated genetic variants have been shown to reside in noncoding regions, leading to the hypothesis that regulation of gene expression may be the primary biological mechanism. Current methods to characterize gene expression mediating the effect of a genetic variant on disease often analyze one gene at a time and ignore the network structure. The impact of a genetic variant can propagate to other genes along the links in the network, and then to the final disease. There could be multiple pathways from the genetic variant to the final disease, each having a chain structure, since the first node is one specific SNP (single nucleotide polymorphism) variant and the end is the disease outcome. One key but inadequately addressed question is how to measure the between-node connection strength and rank the effects of such chain-type pathways, which can provide statistical evidence for prioritizing pathways for potential drug development in a cost-effective manner. RESULTS We first introduce the maximal correlation coefficient (MCC) to represent the between-node connection, and then integrate MCC with the K shortest paths algorithm to rank and identify the potential pathways from genetic variant to disease. A pathway importance score (PIS) is further provided to quantify the importance of each pathway. We term this method "MCC-SP". Various simulations illustrate that MCC is a better measure of the between-node connection strength than other quantities, including Pearson correlation, Spearman correlation, distance correlation, mutual information, and the maximal information coefficient. Finally, we applied MCC-SP to analyze one real dataset from the Religious Orders Study and the Memory and Aging Project, and successfully detected 2 typical pathways from APOE genotype to Alzheimer's disease (AD) through gene expression enriched in the Alzheimer's disease pathway. CONCLUSIONS MCC-SP has powerful and robust performance in identifying the pathway(s) from the genetic variant to the disease. The source code of MCC-SP is freely available at GitHub (https://github.com/zhuyuchen95/ADnet).
Affiliation(s)
- Yuchen Zhu
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
- Jiadong Ji
- Department of Data Science, School of Statistics, Shandong University of Finance and Economics, Jinan, 250014 China
- Weiqiang Lin
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
- Mingzhuo Li
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
- Lu Liu
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
- Huanhuan Zhu
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109 USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109 USA
- Fuzhong Xue
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
- Xiujun Li
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109 USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109 USA
- Zhongshang Yuan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
|
13
|
The Gene scb-1 Underlies Variation in Caenorhabditis elegans Chemotherapeutic Responses. G3-GENES GENOMES GENETICS 2020; 10:2353-2364. [PMID: 32385045 PMCID: PMC7341127 DOI: 10.1534/g3.120.401310] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/14/2022]
Abstract
Pleiotropy, the concept that a single gene controls multiple distinct traits, is prevalent in most organisms and has broad implications for medicine and agriculture. The identification of the molecular mechanisms underlying pleiotropy has the power to reveal previously unknown biological connections between seemingly unrelated traits. Additionally, the discovery of pleiotropic genes increases our understanding of both genetic and phenotypic complexity by characterizing novel gene functions. Quantitative trait locus (QTL) mapping has been used to identify several pleiotropic regions in many organisms. However, gene knockout studies are needed to eliminate the possibility of tightly linked, non-pleiotropic loci. Here, we use a panel of 296 recombinant inbred advanced intercross lines of Caenorhabditis elegans and a high-throughput fitness assay to identify a single large-effect QTL on the center of chromosome V associated with variation in responses to eight chemotherapeutics. We validate this QTL with near-isogenic lines and pair genome-wide gene expression data with drug response traits to perform mediation analysis, leading to the identification of a pleiotropic candidate gene, scb-1, for some of the eight chemotherapeutics. Using deletion strains created by genome editing, we show that scb-1, which was previously implicated in response to bleomycin, also underlies responses to other double-strand DNA break-inducing chemotherapeutics. This finding provides new evidence for the role of scb-1 in the nematode drug response and highlights the power of mediation analysis to identify causal genes.
|
14
|
Zhou RR, Wang L, Zhao SD. Estimation and inference for the indirect effect in high-dimensional linear mediation models. Biometrika 2020; 107:573-589. [PMID: 32831353 DOI: 10.1093/biomet/asaa016] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/14/2017] [Indexed: 12/19/2022] Open
Abstract
Mediation analysis is difficult when the number of potential mediators is larger than the sample size. In this paper we propose new inference procedures for the indirect effect in the presence of high-dimensional mediators for linear mediation models. We develop methods for both incomplete mediation, where a direct effect may exist, and complete mediation, where the direct effect is known to be absent. We prove consistency and asymptotic normality of our indirect effect estimators. Under complete mediation, where the indirect effect is equivalent to the total effect, we further prove that our approach gives a more powerful test compared to directly testing for the total effect. We confirm our theoretical results in simulations, as well as in an integrative analysis of gene expression and genotype data from a pharmacogenomic study of drug response. We present a novel analysis of gene sets to understand the molecular mechanisms of drug response, and also identify a genome-wide significant noncoding genetic variant that cannot be detected using standard analysis methods.
Affiliation(s)
- Ruixuan Rachel Zhou
- Department of Statistics, University of Illinois at Urbana-Champaign, 725 S. Wright Street, Champaign, Illinois 61820, U.S.A
- Liewei Wang
- Division of Clinical Pharmacology, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, 200 First St. SW, Rochester, Minnesota 55905, U.S.A
- Sihai Dave Zhao
- Department of Statistics, University of Illinois at Urbana-Champaign, 725 S. Wright Street, Champaign, Illinois 61820, U.S.A
|
15
|
Carter KM, Lu M, Jiang H, An L. An Information-Based Approach for Mediation Analysis on High-Dimensional Metagenomic Data. Front Genet 2020; 11:148. [PMID: 32231681 PMCID: PMC7083016 DOI: 10.3389/fgene.2020.00148] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/23/2019] [Accepted: 02/10/2020] [Indexed: 12/13/2022] Open
Abstract
The human microbiome plays a critical role in the development of gut-related illnesses such as inflammatory bowel disease and clinical pouchitis. A mediation model can be used to describe the interaction between host gene expression, the gut microbiome, and clinical/health situation (e.g., diseased or not, inflammation level) and may provide insights into underlying disease mechanisms. Current mediation regression methodology cannot adequately model high-dimensional exposures and mediators or mixed data types. Additionally, regression based mediation models require some assumptions for the model parameters, and the relationships are usually assumed to be linear and additive. With the microbiome being the mediators, these assumptions are violated. We propose two novel nonparametric procedures utilizing information theory to detect significant mediation effects with high-dimensional exposures and mediators and varying data types while avoiding standard regression assumptions. Compared with available methods through comprehensive simulation studies, the proposed method shows higher power and lower error. The innovative method is applied to clinical pouchitis data as well and interesting results are obtained.
Affiliation(s)
- Kyle M Carter
- Interdisciplinary Program in Statistics and Data Science, The University of Arizona, Tucson, AZ, United States
- Meng Lu
- Interdisciplinary Program in Statistics and Data Science, The University of Arizona, Tucson, AZ, United States
- Hongmei Jiang
- Department of Statistics, Northwestern University, Evanston, IL, United States
- Lingling An
- Interdisciplinary Program in Statistics and Data Science, The University of Arizona, Tucson, AZ, United States; Department of Epidemiology and Biostatistics, The University of Arizona, Tucson, AZ, United States; Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
|
16
|
Gao Y, Yang H, Fang R, Zhang Y, Goode EL, Cui Y. Testing Mediation Effects in High-Dimensional Epigenetic Studies. Front Genet 2019; 10:1195. [PMID: 31824577 PMCID: PMC6883258 DOI: 10.3389/fgene.2019.01195] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 04/10/2019] [Accepted: 10/29/2019] [Indexed: 12/24/2022] Open
Abstract
Mediation analysis has been a powerful tool to identify factors mediating the association between exposure variables and outcomes. It has been applied to various genomic applications with the hope of gaining novel insights into the underlying mechanisms of various diseases. Given the high-dimensional nature of epigenetic data, recent efforts in epigenetic mediation analysis first reduce the data dimension by applying high-dimensional variable selection techniques and then conduct testing in a low-dimensional setup. In this paper, we propose to assess the mediation effect by adopting a high-dimensional testing procedure which can produce unbiased estimates of the regression coefficients and can properly handle correlations between variables. When the data dimension is ultra-high, we first reduce the data dimension from ultra-high to high by adopting a sure independence screening (SIS) method. We apply the method to two high-dimensional epigenetic studies: one assesses how DNA methylation mediates the association between alcohol consumption and epithelial ovarian cancer (EOC) status; the other assesses how methylation signatures mediate the association between childhood maltreatment and post-traumatic stress disorder (PTSD) in adulthood. We compare the performance of the method with its counterpart via simulation studies. Our method can be applied to other high-dimensional mediation studies where high-dimensional mediation variables are collected.
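The screening step named in this abstract can be sketched roughly as follows. This is a generic illustration of sure independence screening, assuming candidate mediators are ranked by the absolute marginal Pearson correlation with a response and only the top few are kept; the exact screening statistic used by the authors is not given here, and the names `pearson` and `sis_screen` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sis_screen(columns, response, keep):
    """Return indices of the `keep` columns with the largest |marginal correlation|.

    `columns` is a list of candidate variables (one list of values per
    variable); `response` is the variable they are screened against.
    """
    scores = [abs(pearson(col, response)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    response = [1, 2, 3, 4, 5]
    candidates = [
        [1, 0, 1, 0, 1],   # uncorrelated with the response
        [2, 4, 6, 8, 10],  # perfectly correlated
        [5, 4, 3, 2, 2],   # strongly negatively correlated
        [3, 3, 3, 3, 3],   # constant, carries no information
    ]
    print(sis_screen(candidates, response, keep=2))
```

Because each variable is scored on its own, the cost is linear in the number of candidates, which is what makes the step usable before a more expensive joint test.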
Affiliation(s)
- Yuzhao Gao
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Haitao Yang
- Division of Health Statistics, School of Public Health, Hebei Medical University, Shijiazhuang, China
- Ruiling Fang
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yanbo Zhang
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Ellen L Goode
- Department of Health Sciences Research, College of Medicine, Mayo Clinic, Rochester, MN, United States
- Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, United States
|
17
|
Peng C, Wang J, Asante I, Louie S, Jin R, Chatzi L, Casey G, Thomas DC, Conti DV. A latent unknown clustering integrating multi-omics data (LUCID) with phenotypic traits. Bioinformatics 2019; 36:842-850. [PMID: 31504184 PMCID: PMC7986585 DOI: 10.1093/bioinformatics/btz667] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/14/2019] [Revised: 08/04/2019] [Accepted: 08/21/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION Epidemiologic, clinical and translational studies are increasingly generating multiplatform omics data. Methods that can integrate across multiple high-dimensional data types while accounting for differential patterns are critical for uncovering novel associations and underlying relevant subgroups. RESULTS We propose an integrative model to estimate latent unknown clusters (LUCID) aiming to both distinguish unique genomic, exposure and informative biomarkers/omic effects while jointly estimating subgroups relevant to the outcome of interest. Simulation studies indicate that we can obtain consistent estimates reflective of the true simulated values, accurately estimate subgroups and recapitulate subgroup-specific effects. We also demonstrate the use of the integrated model for future prediction of risk subgroups and phenotypes. We apply this approach to two real data applications to highlight the integration of genomic, exposure and metabolomic data. AVAILABILITY AND IMPLEMENTATION The LUCID method is implemented through the LUCIDus R package available on CRAN (https://CRAN.R-project.org/package=LUCIDus). SUPPLEMENTARY INFORMATION Supplementary materials are available at Bioinformatics online.
Affiliation(s)
- Cheng Peng
- Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA 90089, USA
- Jun Wang
- Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA 90089, USA
- Isaac Asante
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA 90089, USA
- Stan Louie
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA 90089, USA
- Ran Jin
- Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA 90089, USA
- Lida Chatzi
- Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA 90089, USA
- Graham Casey
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Duncan C Thomas
- Department of Preventive Medicine, Keck School of Medicine, Los Angeles, CA 90089, USA
|
18
|
Rojo C, Zhang Q, Keleş S. iFunMed: Integrative functional mediation analysis of GWAS and eQTL studies. Genet Epidemiol 2019; 43:742-760. [PMID: 31328826 DOI: 10.1002/gepi.22217] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/21/2019] [Revised: 04/17/2019] [Accepted: 05/07/2019] [Indexed: 11/08/2022]
Abstract
Genome-wide association studies (GWAS) have successfully identified thousands of genetic variants contributing to disease and other phenotypes. However, significant obstacles hamper our ability to elucidate causal variants, identify genes affected by causal variants, and characterize the mechanisms by which genotypes influence phenotypes. The increasing availability of genome-wide functional annotation data is providing unique opportunities to incorporate prior information into the analysis of GWAS to better understand the impact of variants on disease etiology. Although there have been many advances in incorporating prior information into prioritization of trait-associated variants in GWAS, functional annotation data have played a secondary role in the joint analysis of GWAS and molecular (i.e., expression) quantitative trait loci (eQTL) data in assessing evidence for association. To address this, we develop a novel mediation framework, iFunMed, to integrate GWAS and eQTL data with the utilization of publicly available functional annotation data. iFunMed extends the scope of standard mediation analysis by incorporating information from multiple genetic variants at a time and leveraging variant-level summary statistics. Data-driven computational experiments convey how informative annotations improve single-nucleotide polymorphism (SNP) selection performance while emphasizing robustness of iFunMed to noninformative annotations. Application to Framingham Heart Study data indicates that iFunMed is able to boost detection of SNPs with mediation effects that can be attributed to regulatory mechanisms.
Affiliation(s)
- Constanza Rojo
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin
- Qi Zhang
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, Nebraska
- Sündüz Keleş
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin
|
19
|
Badsha MB, Fu AQ. Learning Causal Biological Networks With the Principle of Mendelian Randomization. Front Genet 2019; 10:460. [PMID: 31164902 PMCID: PMC6536645 DOI: 10.3389/fgene.2019.00460] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 02/13/2019] [Accepted: 04/30/2019] [Indexed: 01/09/2023] Open
Abstract
Although large amounts of genomic data are available, it remains a challenge to reliably infer causal (i.e., regulatory) relationships among molecular phenotypes (such as gene expression), especially when multiple phenotypes are involved. We extend the interpretation of the Principle of Mendelian randomization (PMR) and present MRPC, a novel machine learning algorithm that incorporates the PMR in the PC algorithm, a classical algorithm for learning causal graphs in computer science. MRPC learns a causal biological network efficiently and robustly by integrating individual-level genotype and molecular phenotype data, in which directed edges indicate causal directions. We demonstrate through simulation that MRPC outperforms several popular general-purpose network inference methods and PMR-based methods. We apply MRPC to distinguish direct and indirect targets among multiple genes associated with expression quantitative trait loci. Our method is implemented in the R package MRPC, available on CRAN (https://cran.r-project.org/web/packages/MRPC/index.html).
Affiliation(s)
- Md. Bahadur Badsha
- Department of Statistical Science, Center for Modeling Complex Interactions, Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, United States
- Audrey Qiuyan Fu
- Department of Statistical Science, Center for Modeling Complex Interactions, Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID, United States
|
20
|
Song W, Zheng S, Li M, Zhang X, Cao R, Ye C, Shao R, Li G, Li J, Liu S, Li H, Li L. Linking endotypes to omics profiles in difficult-to-control asthma using the diagnostic Chinese medicine syndrome differentiation algorithm. J Asthma 2019; 57:532-542. [PMID: 30915875 DOI: 10.1080/02770903.2019.1590589] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 02/08/2023]
Abstract
Objective: Patients with difficult-to-control asthma have difficulty breathing almost all of the time, which can even lead to life-threatening asthma attacks. However, only a few diagnostic markers for this disease have been identified. We aimed to take advantage of unique Chinese medicine theories for phenotypic classification and to explore molecular signatures in difficult-to-control asthma. Methods: The Chinese medicine syndrome differentiation algorithm (CMSDA) is a syndrome-scoring classification method based on the Chinese medicine overall observation theory. Patients with difficult-to-control asthma were classified into Cold- and Hot-pattern groups according to the CMSDA. DNA methylation and metabolomic profiles were obtained using the Infinium Human Methylation 450 BeadChip and a gas chromatography-mass spectrometer. Subsequently, an integrated bioinformatics analysis was performed to compare the two patterns and identify Cold/Hot-associated candidates, followed by functional validation studies. Results: A total of 20 patients with difficult-to-control asthma were enrolled in the study. Ten were grouped as Cold and 10 as Hot according to the CMSDA. We identified distinct whole-genome DNA methylation and metabolomic profiles between Cold- and Hot-pattern groups. The ALDH3A1 gene exhibited variations in the DNA methylation probe cg10791966, while two metabolic pathways were associated with the two patterns. Conclusions: Our study introduced a novel diagnostic classification approach, the CMSDA, for difficult-to-control asthma. This is an alternative way to categorize diverse syndromes and link endotypes with omics profiles of this disease. ALDH3A1 might be a potential biomarker for precision diagnosis of difficult-to-control asthma.
Affiliation(s)
- Wenping Song
- Key Laboratory of Antibiotic Bioengineering of National Health and Family Planning Commission (NHFPC), Institute of Medicinal Biotechnology (IMB), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Si Zheng
- Institute of Medical Information (IMI) and Library, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Meng Li
- Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Xia Zhang
- Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Rui Cao
- Key Laboratory of Antibiotic Bioengineering of National Health and Family Planning Commission (NHFPC), Institute of Medicinal Biotechnology (IMB), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Cheng Ye
- Key Laboratory of Antibiotic Bioengineering of National Health and Family Planning Commission (NHFPC), Institute of Medicinal Biotechnology (IMB), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Rongguang Shao
- Key Laboratory of Antibiotic Bioengineering of National Health and Family Planning Commission (NHFPC), Institute of Medicinal Biotechnology (IMB), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Guangxi Li
- Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Jiao Li
- Institute of Medical Information (IMI) and Library, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
- Shigang Liu
- Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Hui Li
- Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Liang Li
- Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
|
21
|
Shao F, Wang Y, Zhao Y, Yang S. Identifying and exploiting gene-pathway interactions from RNA-seq data for binary phenotype. BMC Genet 2019; 20:36. [PMID: 30890140 PMCID: PMC6423879 DOI: 10.1186/s12863-019-0739-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/24/2018] [Accepted: 03/12/2019] [Indexed: 11/29/2022] Open
Abstract
Background: RNA sequencing (RNA-seq) technology has identified multiple differentially expressed (DE) genes associated with complex disease; however, these genes explain only a modest part of the variance. The omnigenic model assumes that disease may be driven by genes with indirect relevance to disease and be propagated by functional pathways. Here, we focus on identifying the interactions between the external genes and functional pathways, referred to as gene-pathway interactions (GPIs). Specifically, relying on the relationship between the garrote kernel machine (GKM) and the variance component test, and on permutations for the empirical distributions of score statistics, we propose an efficient analysis procedure, Permutation based gEne-pAthway interaction identification in binary phenotype (PEA). Results: Various simulations show that PEA has well-calibrated type I error rates and higher power than the traditional likelihood ratio test (LRT). In addition, we apply gene set enrichment algorithms and PEA to identify the GPIs from pan-cancer data (GES68086). These GPIs and genes may further illustrate the potential etiology of cancers, and some of the external genes and significant pathways are consistent with previous studies. Conclusions: PEA is an efficient tool for identifying the GPIs from RNA-seq data. It can be further extended to identify the interactions between one variable and one functional set of other omics data for binary phenotypes. Electronic supplementary material: The online version of this article (10.1186/s12863-019-0739-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Fang Shao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, Jiangsu, People's Republic of China
- Yaqi Wang
- Department of Pharmacy Informatics, School of Science, China Pharmaceutical University, 24 Tongjia Xiang, Nanjing, Jiangsu, People's Republic of China
- Yang Zhao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, Jiangsu, People's Republic of China
- Sheng Yang
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, Jiangsu, People's Republic of China
|
22
|
Gaynor SM, Schwartz J, Lin X. Mediation analysis for common binary outcomes. Stat Med 2018; 38:512-529. [PMID: 30256434 DOI: 10.1002/sim.7945] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 01/24/2018] [Revised: 07/26/2018] [Accepted: 07/26/2018] [Indexed: 11/06/2022]
Abstract
Mediation analysis provides an attractive causal inference framework to decompose the total effect of an exposure on an outcome into natural direct effects and natural indirect effects acting through a mediator. For binary outcomes, mediation analysis methods have been developed using logistic regression when the binary outcome is rare. These methods will not hold in practice when a disease is common. In this paper, we develop mediation analysis methods that relax the rare disease assumption when using logistic regression. We calculate the natural direct and indirect effects for common diseases by exploiting the relationship between logit and probit models. Specifically, we derive closed-form expressions for the natural direct and indirect effects on the odds ratio scale. Mediation models for both continuous and binary mediators are considered. We demonstrate through simulation that the proposed method performs well for common binary outcomes. We apply the proposed methods to analyze the Normative Aging Study to identify DNA methylation sites that are mediators of smoking behavior on the outcome of obstructed airway function.
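The logit-probit relationship this abstract relies on can be checked numerically with a short sketch. It assumes the widely used approximation that the logistic function is close to a standard normal CDF with a rescaled argument; the scaling constant 1.702 below is a common textbook choice and an assumption here, not necessarily the constant used in the paper. The helper names are hypothetical.

```python
import math

# Assumed scaling constant for the approximation expit(x) ~ Phi(x / SCALE).
SCALE = 1.702


def expit(x: float) -> float:
    """Logistic function, the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x: float) -> float:
    """Standard normal CDF, the inverse of the probit link."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def max_gap(grid) -> float:
    """Largest absolute difference between the two curves over `grid`."""
    return max(abs(expit(x) - normal_cdf(x / SCALE)) for x in grid)


if __name__ == "__main__":
    grid = [i / 100.0 for i in range(-800, 801)]
    # The two curves differ by about 0.01 at most on this grid.
    print(max_gap(grid))
```

A gap this small on the probability scale is what makes it reasonable to carry probit-based closed-form expressions back to a logistic model without requiring the outcome to be rare.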
Affiliation(s)
- Sheila M Gaynor
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
| | - Joel Schwartz
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
| | - Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts; Department of Statistics, Harvard University, Cambridge, Massachusetts
|
23
|
GWAS with Heterogeneous Data: Estimating the Fraction of Phenotypic Variation Mediated by Gene Expression Data. G3-GENES GENOMES GENETICS 2018; 8:3059-3068. [PMID: 30068524 PMCID: PMC6118313 DOI: 10.1534/g3.118.200571] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 12/23/2022]
Abstract
Intermediate phenotypes such as gene expression values can be used to elucidate the mechanisms by which genetic variation causes phenotypic variation, but jointly analyzing such heterogeneous data is far from trivial. Here we extend a so-called mediation model to handle the confounding effects of genetic background, and use it to analyze flowering time variation in Arabidopsis thaliana, focusing in particular on the central role played by the key regulator FLOWERING TIME LOCUS C (FLC). FLC polymorphism and FLC expression are both strongly correlated with flowering time variation, but the effect of the former is only partly mediated through the latter. Furthermore, the latter also reflects genetic background effects. We demonstrate that it is possible to partition these effects, shedding light on the complex regulatory network that underlies flowering time variation.
|
24
|
Mason VC, Schaefer RJ, McCue ME, Leeb T, Gerber V. eQTL discovery and their association with severe equine asthma in European Warmblood horses. BMC Genomics 2018; 19:581. [PMID: 30071827 PMCID: PMC6090848 DOI: 10.1186/s12864-018-4938-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 02/09/2018] [Accepted: 07/11/2018] [Indexed: 01/22/2023]
Abstract
BACKGROUND Severe equine asthma, also known as recurrent airway obstruction (RAO), is a debilitating, performance limiting, obstructive respiratory condition in horses that is phenotypically similar to human asthma. Past genome wide association studies (GWAS) have not discovered coding variants associated with RAO, leading to the hypothesis that causative variant(s) underlying the signals are likely non-coding, regulatory variant(s). Regions of the genome containing variants that influence the number of expressed RNA molecules are expression quantitative trait loci (eQTLs). Variation associated with RAO that also regulates a gene's expression in a disease relevant tissue could help identify candidate genes that influence RAO if that gene's expression is also associated with RAO disease status. RESULTS We searched for eQTLs by analyzing peripheral blood mononuclear cells (PBMCs) from two half-sib families and one unrelated cohort of 82 European Warmblood horses that were previously treated in vitro with: no stimulation (MCK), lipopolysaccharides (LPS), recombinant cyathostomin antigen (RCA), and hay-dust extract (HDE). We identified high confidence eQTLs that did not violate linear modeling assumptions and were not significant due to single outlier individuals. We identified a mean of 4347 high confidence eQTLs in four treatments of PBMCs, and discovered two trans regulatory hotspots regulating genes involved in related biological pathways. We corroborated previous RAO associated single nucleotide polymorphisms (SNPs), and increased the resolution of past GWAS by analyzing 1,056,195 SNPs in 361 individuals. We identified four RAO-associated SNPs that only regulate gene expression of dexamethasone-induced protein (DEXI); however, we found no significant association between DEXI gene expression and presence of RAO. CONCLUSIONS Thousands of genetic variants regulate gene expression in PBMCs of European Warmblood horses in cis and trans. Most high confidence eSNPs are significantly enriched near the transcription start sites of their target genes. Two trans regulatory hotspots on chromosome 11 and 13 regulate many genes involved in transmembrane cell signaling and neurological development respectively when PBMCs are treated with HDE. None of the top fifteen RAO associated SNPs strongly influence disease status through gene expression regulation.
Affiliation(s)
- Victor C. Mason
- Department of Clinical Veterinary Medicine, Swiss Institute of Equine Medicine, Vetsuisse Faculty, University of Bern, and Agroscope, Länggassstrasse 124, 3012 Bern, Switzerland
| | - Robert J. Schaefer
- Department of Veterinary Population Medicine, University of Minnesota, 1365 Gortner Ave, Saint Paul, MN 55108 USA
| | - Molly E. McCue
- Department of Veterinary Population Medicine, University of Minnesota, 1365 Gortner Ave, Saint Paul, MN 55108 USA
| | - Tosso Leeb
- Department of Clinical Research and Veterinary Public Health, Institute of Genetics, Vetsuisse Faculty, University of Bern, Bremgartenstrasse 109A, 3012 Bern, Switzerland
| | - Vinzenz Gerber
- Department of Clinical Veterinary Medicine, Swiss Institute of Equine Medicine, Vetsuisse Faculty, University of Bern, and Agroscope, Länggassstrasse 124, 3012 Bern, Switzerland
|
25
|
Abstract
Background Glioma accounts for 80% of malignant brain tumors, but its etiologic determinants remain elusive. Despite genetic susceptibility loci identified by genome-wide association study (GWAS), the agnostic approach leaves open the possibility that other susceptibility genes remain to be discovered. Here we conduct a gene-centric integrative GWAS (iGWAS) of glioma risk that combines transcriptomics and genetics. Methods We synthesized a brain transcriptomics dataset (n = 354), a GWAS dataset (n = 4203), and an advanced glioma tumor transcriptomic dataset (n = 483) to conduct an iGWAS. Using the expression quantitative trait loci (eQTL) dataset, we built models to predict gene expression for the GWAS data, based on eQTL genotypes. With the predicted gene expression, iGWAS analyses were performed using a novel statistical method. Gene signature risk score was constructed using a penalized logistic regression model. Results A total of 30527 transcripts were analyzed using the iGWAS approach. Four novel glioma susceptibility genes were identified with internal and external validation, including DRD5 (P = 3.0 × 10^-79), WDR1 (P = 8.4 × 10^-77), NOMO1 (P = 1.3 × 10^-25), and PDXDC1 (P = 8.3 × 10^-24). The genotype-predicted transcription pattern between cases and controls is consistent with that between tumor and its matched normal tissue. The genotype-based 4-gene signature improved the classification between glioma cases and controls based on age, gender, and population stratification, with area under the receiver operating characteristic curve increasing from 0.77 to 0.85 (P = 8.1 × 10^-23). Conclusion A new genotype-based gene signature of glioma was identified using a novel iGWAS approach, which integrates multiplatform genomic data as well as different genetic association studies.
Affiliation(s)
- Yen-Tsung Huang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan; Department of Epidemiology; Department of Biostatistics, Brown University, Providence, Rhode Island; Department of Public Health and Community Medicine, Tufts University, Boston, Massachusetts
| | - Yi Zhang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan; Department of Epidemiology; Department of Biostatistics, Brown University, Providence, Rhode Island; Department of Public Health and Community Medicine, Tufts University, Boston, Massachusetts
| | - Zhijin Wu
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan; Department of Epidemiology; Department of Biostatistics, Brown University, Providence, Rhode Island; Department of Public Health and Community Medicine, Tufts University, Boston, Massachusetts
| | - Dominique S Michaud
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan; Department of Epidemiology; Department of Biostatistics, Brown University, Providence, Rhode Island; Department of Public Health and Community Medicine, Tufts University, Boston, Massachusetts
|
26
|
Barfield R, Shen J, Just AC, Vokonas PS, Schwartz J, Baccarelli AA, VanderWeele TJ, Lin X. Testing for the indirect effect under the null for genome-wide mediation analyses. Genet Epidemiol 2017; 41:824-833. [PMID: 29082545 DOI: 10.1002/gepi.22084] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 02/28/2017] [Revised: 08/02/2017] [Accepted: 08/30/2017] [Indexed: 01/07/2023]
Abstract
Mediation analysis helps researchers assess whether part or all of an exposure's effect on an outcome is due to an intermediate variable. The indirect effect can help in designing interventions on the mediator as opposed to the exposure and better understanding the outcome's mechanisms. Mediation analysis has seen increased use in genome-wide epidemiological studies to test for an exposure of interest being mediated through a genomic measure such as gene expression or DNA methylation (DNAm). Testing for the indirect effect is challenged by the fact that the null hypothesis is composite. We examined the performance of commonly used mediation testing methods for the indirect effect in genome-wide mediation studies. When there is no association between the exposure and the mediator and no association between the mediator and the outcome, we show that these common tests are overly conservative. This is a case that will arise frequently in genome-wide mediation studies. Caution is hence needed when applying the commonly used mediation tests in genome-wide mediation studies. We evaluated the performance of these methods using simulation studies, and performed an epigenome-wide mediation association study in the Normative Aging Study, analyzing DNAm as a mediator of the effect of pack-years on FEV1.
Affiliation(s)
- Richard Barfield
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
| | - Jincheng Shen
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
| | - Allan C Just
- Department of Environmental Medicine & Public Health, Icahn School of Medicine at Mount Sinai, New York City, New York, United States of America
| | - Pantel S Vokonas
- VA Normative Aging Study, Veterans Affairs Boston Healthcare System, The Department of Medicine, Boston University School of Medicine, Boston, Massachusetts, United States of America
| | - Joel Schwartz
- Departments of Environmental Health and Program in Quantitative Genomics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
| | - Andrea A Baccarelli
- Departments of Environmental Health Sciences and Epidemiology, Columbia University Mailman School of Public Health, New York City, New York, United States of America
| | - Tyler J VanderWeele
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
| | - Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America; Department of Statistics, Harvard University, Cambridge, Massachusetts, United States of America
|
27
|
Chu SH, Huang YT. Integrated genomic analysis of biological gene sets with applications in lung cancer prognosis. BMC Bioinformatics 2017; 18:336. [PMID: 28697753 PMCID: PMC5505153 DOI: 10.1186/s12859-017-1737-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 02/07/2017] [Accepted: 06/22/2017] [Indexed: 01/22/2023]
Abstract
Background Burgeoning interest in integrative analyses has produced a rise in studies which incorporate data from multiple genomic platforms. Literature for conducting formal hypothesis testing on an integrative gene set level is considerably sparse. This paper is biologically motivated by our interest in the joint effects of epigenetic methylation loci and their associated mRNA gene expressions on lung cancer survival status. Results We provide an efficient screening approach across multiplatform genomic data on the level of biologically related sets of genes, and our methods are applicable to various disease models regardless of whether the underlying true model is known (iTEGS) or unknown (iNOTE). Our proposed testing procedure dominated two competing methods. Using our methods, we identified a total of 28 gene sets with significant joint epigenomic and transcriptomic effects on one-year lung cancer survival. Conclusions We propose efficient variance component-based testing procedures to facilitate the joint testing of multiplatform genomic data across an entire gene set. The testing procedure for the gene set is self-contained, and can easily be extended to include more or different genetic platforms. iTEGS and iNOTE implemented in R are freely available through the inote package at https://cran.r-project.org//. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1737-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Su Hee Chu
- Department of Epidemiology, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA; Channing Division of Network Medicine, Brigham and Women's Hospital Harvard Medical School, 181 Longwood Ave, Boston, MA, USA
| | - Yen-Tsung Huang
- Department of Epidemiology, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA; Department of Biostatistics, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA; Institute of Statistical Science, Academia Sinica, No. 128, Section 2, Academia Rd, Taipei City, Taiwan.
|
28
|
Pasaniuc B, Price AL. Dissecting the genetics of complex traits using summary association statistics. Nat Rev Genet 2017; 18:117-127. [PMID: 27840428 PMCID: PMC5449190 DOI: 10.1038/nrg.2016.142] [Citation(s) in RCA: 268] [Impact Index Per Article: 33.5] [Indexed: 02/07/2023]
Abstract
During the past decade, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.
Affiliation(s)
- Bogdan Pasaniuc
- Departments of Human Genetics, and Pathology and Laboratory Medicine, University of California, Los Angeles, California 90095, USA
| | - Alkes L Price
- Departments of Epidemiology and Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts 02142, USA
|
29
|
Hormozdiari F, van de Bunt M, Segrè AV, Li X, Joo JWJ, Bilow M, Sul JH, Sankararaman S, Pasaniuc B, Eskin E. Colocalization of GWAS and eQTL Signals Detects Target Genes. Am J Hum Genet 2016; 99:1245-1260. [PMID: 27866706 DOI: 10.1016/j.ajhg.2016.10.003] [Citation(s) in RCA: 442] [Impact Index Per Article: 49.1] [Received: 06/24/2016] [Accepted: 10/03/2016] [Indexed: 01/01/2023]
Abstract
The vast majority of genome-wide association study (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter the individual's disease risk through their effect on gene expression in different tissues. In order to understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue could play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether or not the same variant is causal in both GWASs and expression quantitative trait locus (eQTL) studies is challenging because of the uncertainty induced by linkage disequilibrium and the fact that some loci harbor multiple causal variants. However, current methods that address this problem assume that each locus contains a single causal variant. In this paper, we present eCAVIAR, a probabilistic method that has several key advantages over existing methods. First, our method can account for more than one causal variant in any given locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Using publicly available eQTL data on 45 different tissues, we demonstrate that eCAVIAR can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci.
Affiliation(s)
- Farhad Hormozdiari
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Martijn van de Bunt
- Oxford Centre for Diabetes, Endocrinology, & Metabolism, University of Oxford, Oxford OX3 7LJ, UK; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
| | - Ayellet V Segrè
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Xiao Li
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Jong Wha J Joo
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Michael Bilow
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Jae Hoon Sul
- Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095, USA; Semel Center for Informatics and Personalized Genomics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Sriram Sankararaman
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA.
|
30
|
Kao PYP, Leung KH, Chan LWC, Yip SP, Yap MKH. Pathway analysis of complex diseases for GWAS, extending to consider rare variants, multi-omics and interactions. Biochim Biophys Acta Gen Subj 2016; 1861:335-353. [PMID: 27888147 DOI: 10.1016/j.bbagen.2016.11.030] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 06/22/2016] [Revised: 10/17/2016] [Accepted: 11/19/2016] [Indexed: 12/20/2022]
Abstract
BACKGROUND The genome-wide association study (GWAS) is a major method for studying the genetics of complex diseases. Finding all sequence variants to explain fully the aetiology of a disease is difficult because of their small effect sizes. To better explain disease mechanisms, pathway analysis is used to consolidate the effects of multiple variants, and hence increase the power of the study. While pathway analysis has previously been performed within GWAS only, it can now be extended to examining rare variants, other "-omics" and interaction data. SCOPE OF REVIEW 1. Factors to consider in the choice of software for GWAS pathway analysis. 2. Examples of how pathway analysis is used to analyse rare variants, other "-omics" and interaction data. MAJOR CONCLUSIONS To choose appropriate software tools, factors for consideration include covariate compatibility, null hypothesis, one- or two-step analysis required, curation method of gene sets, size of pathways, and size of flanking regions to define gene boundaries. For rare variants, analysis performance depends on consistency between assumed and actual effect distribution of variants. Integration of other "-omics" data and interaction can better explain gene functions. GENERAL SIGNIFICANCE Pathway analysis methods will be more readily used for integration of multiple sources of data, and enable more accurate prediction of phenotypes.
Affiliation(s)
- Patrick Y P Kao
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Kim Hung Leung
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Lawrence W C Chan
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Shea Ping Yip
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China.
| | - Maurice K H Yap
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hong Kong SAR, China
|
31
|
Thingholm LB, Andersen L, Makalic E, Southey MC, Thomassen M, Hansen LL. Strategies for Integrated Analysis of Genetic, Epigenetic, and Gene Expression Variation in Cancer: Addressing the Challenges. Front Genet 2016; 7:2. [PMID: 26870081 PMCID: PMC4740898 DOI: 10.3389/fgene.2016.00002] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 06/07/2015] [Accepted: 01/11/2016] [Indexed: 12/15/2022]
Abstract
The development and progression of cancer, a collection of diseases with complex genetic architectures, is facilitated by the interplay of multiple etiological factors. This complexity challenges the traditional single-platform study design and calls for an integrated approach to data analysis. However, integration of heterogeneous measurements of biological variation is a non-trivial exercise due to the diversity of the human genome and the variety of output data formats and genome coverage obtained from the commonly used molecular platforms. This review article will provide an introduction to integration strategies used for analyzing genetic risk factors for cancer. We critically examine the ability of these strategies to handle the complexity of the human genome and also accommodate information about the biological and functional interactions between the elements that have been measured, making the assessment of disease risk against a composite genomic factor possible. The focus of this review is to provide an overview and introduction to the main strategies and to discuss where there is a need for further development.
Affiliation(s)
- Louise B Thingholm
- Department of Pathology, The University of Melbourne, Melbourne, VIC, Australia; Department of Biomedicine, The University of Aarhus, Aarhus, Denmark
| | - Lars Andersen
- Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
| | - Enes Makalic
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Melbourne, VIC, Australia
| | - Melissa C Southey
- Department of Pathology, The University of Melbourne, Melbourne, VIC, Australia
| | - Mads Thomassen
- Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
|