1
Wang T, Qiao J, Zhang S, Wei Y, Zeng P. Simultaneous test and estimation of total genetic effect in eQTL integrative analysis through mixed models. Brief Bioinform 2022; 23:6535679. [PMID: 35212359 DOI: 10.1093/bib/bbac038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/28/2021] [Revised: 01/22/2022] [Accepted: 02/07/2022] [Indexed: 11/14/2022]
Abstract
Integration of expression quantitative trait loci (eQTL) into genome-wide association studies (GWASs) is a promising way to reveal the functional roles of associated single-nucleotide polymorphisms (SNPs) in complex phenotypes and has become an active research field in the post-GWAS era. However, how to efficiently incorporate eQTL mapping studies into GWASs for the prioritization of causal genes remains elusive. We herein proposed a novel method, termed Mixed transcriptome-wide association studies (TWAS) and mediated Variance estimation (MTV), which models the effects of the cis-SNPs of a gene as a function of eQTL. MTV formulates the integrative method and TWAS within a unified framework via mixed models and therefore includes many prior methods/tests as special cases. We further justified MTV from two other statistical perspectives: mediation analysis and two-stage Mendelian randomization. Relative to existing methods, MTV offers several pronounced advantages, including its handling of the direct effects of cis-SNPs on phenotypes, a powerful likelihood ratio test for joint assessment of the effects of cis-SNPs and genetically regulated gene expression (GReX), two useful quantities that measure the relative genetic contributions of GReX and cis-SNPs to phenotypic variance, and a computationally efficient parameter-expanded expectation-maximization algorithm. With extensive simulations, we showed that MTV correctly controlled the type I error in the joint evaluation of the total genetic effect and was more powerful than existing methods in discovering true association signals across various scenarios. We finally applied MTV to 41 complex traits/diseases available from three GWASs and discovered many new associated genes that had been missed by existing methods. We also revealed that a small but substantial fraction of phenotypic variation was mediated by GReX.
Overall, MTV constructs a robust and realistic modeling foundation for integrative omics analysis and has the advantage of offering more attractive biological interpretations of GWAS results.
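The two-stage (TWAS / Mendelian randomization) logic that MTV generalizes can be illustrated with a deliberately minimal sketch. It assumes a single cis-SNP, ordinary least squares in both stages, and hypothetical function names (`ols_slope`, `two_stage_grex_effect`). It is not MTV's mixed-model likelihood ratio test or its expectation-maximization fit; it only shows how an eQTL weight turns genotypes into GReX and how the phenotype is then regressed on GReX.

```python
from statistics import fmean


def ols_slope(x, y):
    """Simple-regression slope of y on x (model includes an intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    return sxy / sxx


def two_stage_grex_effect(g_eqtl, expr, g_gwas, pheno):
    """Hypothetical single-SNP two-stage TWAS sketch (not MTV itself).

    Stage 1: estimate the eQTL weight w in the expression sample.
    Stage 2: impute GReX = w * genotype in the GWAS sample and regress
    the phenotype on it. With one SNP this equals the Wald ratio
    beta_GWAS / w used in two-stage Mendelian randomization.
    """
    w = ols_slope(g_eqtl, expr)
    grex = [w * g for g in g_gwas]
    return ols_slope(grex, pheno)


# Toy data: expression = 0.5 * genotype + 1, phenotype = 0.8 * GReX + 3.
g_eqtl = [0, 1, 2, 1, 0, 2]
expr = [0.5 * g + 1 for g in g_eqtl]
g_gwas = [0, 0, 1, 1, 2, 2]
pheno = [0.4 * g + 3 for g in g_gwas]
print(two_stage_grex_effect(g_eqtl, expr, g_gwas, pheno))  # approximately 0.8
```

Real TWAS weights use many cis-SNPs and penalized or mixed-model fits; the single-SNP case is used here only because it makes the equivalence with the Wald ratio easy to verify by hand.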
Affiliation(s)
- Ting Wang
- Department of Biostatistics at Xuzhou Medical University, China
- Jiahao Qiao
- Department of Biostatistics at Xuzhou Medical University, China
- Shuo Zhang
- Department of Biostatistics at Xuzhou Medical University, China
- Yongyue Wei
- Department of Biostatistics at Nanjing Medical University, China
- Ping Zeng
- Department of Biostatistics, Center for Medical Statistics and Data Analysis and Key Laboratory of Human Genetics and Environmental Medicine at Xuzhou Medical University, China
2
Wang X, Wen Y. A U-statistics for integrative analysis of multilayer omics data. Bioinformatics 2020; 36:2365-2374. [PMID: 31913435 DOI: 10.1093/bioinformatics/btaa004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2019] [Revised: 12/09/2019] [Accepted: 01/02/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION Emerging multilayer omics data provide unprecedented opportunities for detecting biomarkers that are associated with complex diseases at various molecular levels. However, the high dimensionality of multiomics data and complex disease etiologies bring tremendous analytical challenges. RESULTS We developed a U-statistics-based non-parametric framework for the association analysis of multilayer omics data, in which consensus and permutation-based weighting schemes are developed to account for various types of disease models. The proposed method is flexible for analyzing different types of outcomes because it makes no assumptions about their distributions. Moreover, it explicitly accounts for various types of underlying disease models through its weighting schemes and thus performs robustly across them. Through extensive simulations and an application to a dataset obtained from the Alzheimer's Disease Neuroimaging Initiative, we demonstrated that our method outperformed commonly used kernel regression-based methods. AVAILABILITY AND IMPLEMENTATION The R package is available at https://github.com/YaluWen/Uomic. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
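The general idea behind a pairwise, similarity-based U-statistic for omics association can be sketched as follows. This is a generic illustration, not the authors' Uomic implementation: it assumes a linear (inner-product) omics similarity, a phenotype similarity equal to the product of centered phenotypes, an equal-weight average over all pairs i < j, and a one-sided permutation p-value. The function names are hypothetical, and the consensus and permutation-based weighting schemes described in the abstract are not reproduced.

```python
import random
from itertools import combinations
from statistics import fmean


def pairwise_u(omics, pheno):
    """Average over pairs i < j of omics similarity times phenotype similarity."""
    mu = fmean(pheno)
    yc = [y - mu for y in pheno]
    pairs = list(combinations(range(len(pheno)), 2))
    total = 0.0
    for i, j in pairs:
        s_omics = sum(a * b for a, b in zip(omics[i], omics[j]))  # linear kernel
        s_pheno = yc[i] * yc[j]  # large when both phenotypes deviate in the same direction
        total += s_omics * s_pheno
    return total / len(pairs)


def permutation_p(omics, pheno, n_perm=999, seed=1):
    """Shuffle phenotypes to break any association and compare with the observed U."""
    rng = random.Random(seed)
    observed = pairwise_u(omics, pheno)
    shuffled = list(pheno)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pairwise_u(omics, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed data as one permutation, so p is never 0.
    return observed, (hits + 1) / (n_perm + 1)
```

Because the statistic only uses pairwise similarities, the same code would run for continuous or binary phenotypes, which mirrors the distribution-free flexibility the abstract emphasizes; swapping in a different omics kernel changes only the `s_omics` line.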
Affiliation(s)
- Xiaqiong Wang
- Department of Statistics, University of Auckland, Auckland, New Zealand
- Yalu Wen
- Department of Statistics, University of Auckland, Auckland, New Zealand
3
Geng P, Tong X, Lu Q. An integrative U method for joint analysis of multi-level omic data. BMC Genet 2019; 20:40. [PMID: 30967125 PMCID: PMC6457037 DOI: 10.1186/s12863-019-0742-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/19/2018] [Accepted: 03/20/2019] [Indexed: 11/30/2022]
Abstract
Background Advances in high-throughput technologies have made it cost-effective to collect diverse types of omic data in large-scale clinical and biological studies. While the vast amounts of multi-level omic data collected in these studies provide a great opportunity for genetic research, the high dimensionality of omic data and the complex relationships among multi-level omic data bring tremendous analytic challenges. Results To address these challenges, we develop an integrative U (IU) method for the design and analysis of multi-level omic data. Although non-parametric methods make fewer modeling assumptions and are flexible for analyzing different types of phenotypes and omic data, they have been less developed for the association analysis of omic data. The IU method is a nonparametric method that can accommodate various types of omic and phenotype data and can consider interactive relationships among different levels of omic data. Through simulations and a real data application, we compare the IU test with commonly used variance component tests. Conclusions Results show that the proposed test attains more robust type I error performance and higher empirical power than variance component tests under various types of phenotypes and different underlying interaction effects. Electronic supplementary material The online version of this article (10.1186/s12863-019-0742-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Pei Geng
- Department of Mathematics, Illinois State University, Normal, IL, 61761, USA
- Xiaoran Tong
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA
- Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA
4
Shao F, Wang Y, Zhao Y, Yang S. Identifying and exploiting gene-pathway interactions from RNA-seq data for binary phenotype. BMC Genet 2019; 20:36. [PMID: 30890140 PMCID: PMC6423879 DOI: 10.1186/s12863-019-0739-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/24/2018] [Accepted: 03/12/2019] [Indexed: 11/29/2022]
Abstract
Background RNA sequencing (RNA-seq) technology has identified multiple differentially expressed (DE) genes associated with complex diseases; however, these genes explain only a modest part of the variance. The omnigenic model assumes that disease may be driven by genes with indirect relevance to disease and be propagated by functional pathways. Here, we focus on identifying the interactions between external genes and functional pathways, referred to as gene-pathway interactions (GPIs). Specifically, relying on the relationship between the garrote kernel machine (GKM) and the variance component test, and on permutations for the empirical distributions of score statistics, we propose an efficient analysis procedure, Permutation-based gEne-pAthway interaction identification in binary phenotype (PEA). Results Various simulations show that PEA has well-calibrated type I error rates and higher power than the traditional likelihood ratio test (LRT). In addition, we apply gene set enrichment algorithms and PEA to identify GPIs in a pan-cancer dataset (GSE68086). These GPIs and genes may further illustrate the potential etiology of cancers; some of the identified external genes and significant pathways are consistent with previous studies. Conclusions PEA is an efficient tool for identifying GPIs from RNA-seq data. It can be further extended to identify the interactions between one variable and one functional set of other omics data for binary phenotypes. Electronic supplementary material The online version of this article (10.1186/s12863-019-0739-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Fang Shao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, Jiangsu, People's Republic of China
- Yaqi Wang
- Department of Pharmacy Informatics, School of Science, China Pharmaceutical University, 24 Tongjia Xiang, Nanjing, Jiangsu, People's Republic of China
- Yang Zhao
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, Jiangsu, People's Republic of China
- Sheng Yang
- Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, Jiangsu, People's Republic of China
5
Feldman K, Johnson RA, Chawla NV. The State of Data in Healthcare: Path Towards Standardization. Journal of Healthcare Informatics Research 2018; 2:248-271. [PMID: 35415409 PMCID: PMC8982788 DOI: 10.1007/s41666-018-0019-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/09/2017] [Revised: 03/21/2018] [Accepted: 03/29/2018] [Indexed: 12/23/2022]
Abstract
Coupled with the rise of data science and machine learning, the increasing availability of digitized health and wellness data has provided an exciting opportunity for complex analyses of problems throughout the healthcare domain. Whereas many early works focused on a particular aspect of patient care, often drawing on data from a specific clinical or administrative source, it has become clear that such a single-source approach is insufficient to capture the complexity of the human condition. Instead, adequately modeling health and wellness problems requires the ability to draw upon data spanning multiple facets of an individual's biology, their care, and the social aspects of their life. Although such awareness has greatly expanded the breadth of health and wellness data collected, the diverse array of data sources and intended uses often leaves researchers and practitioners with a scattered and fragmented view of any particular patient. As a result, there is a clear need to catalogue and organize the range of healthcare data available for analysis. This work represents an effort to develop such an organization, presenting a patient-centric framework termed the Healthcare Data Spectrum (HDS). Composed of six layers, the HDS begins with the innermost micro-level omics and macro-level demographic data that directly characterize a patient, and extends at its outermost to aggregate population-level data derived from attributes of care for each individual patient. For each level of the HDS, this manuscript examines the specific types of constituent data, provides examples of how the data aid a broad set of research problems, and identifies the primary terminology and standards used to describe the data.
Affiliation(s)
- Keith Feldman
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46656 USA
- iCeNSA, University of Notre Dame, Notre Dame, IN 46656 USA
- Reid A. Johnson
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46656 USA
- iCeNSA, University of Notre Dame, Notre Dame, IN 46656 USA
- Nitesh V. Chawla
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46656 USA
- iCeNSA, University of Notre Dame, Notre Dame, IN 46656 USA
6
Abstract
Background Glioma accounts for 80% of malignant brain tumors, but its etiologic determinants remain elusive. Despite the genetic susceptibility loci identified by genome-wide association studies (GWAS), the agnostic approach leaves open the possibility that other susceptibility genes remain to be discovered. Here we conduct a gene-centric integrative GWAS (iGWAS) of glioma risk that combines transcriptomics and genetics. Methods We synthesized a brain transcriptomics dataset (n = 354), a GWAS dataset (n = 4203), and an advanced glioma tumor transcriptomic dataset (n = 483) to conduct an iGWAS. Using the expression quantitative trait loci (eQTL) dataset, we built models to predict gene expression for the GWAS data based on eQTL genotypes. With the predicted gene expression, iGWAS analyses were performed using a novel statistical method. A gene signature risk score was constructed using a penalized logistic regression model. Results A total of 30527 transcripts were analyzed using the iGWAS approach. Four novel glioma susceptibility genes were identified with internal and external validation: DRD5 (P = 3.0 × 10^-79), WDR1 (P = 8.4 × 10^-77), NOMO1 (P = 1.3 × 10^-25), and PDXDC1 (P = 8.3 × 10^-24). The genotype-predicted transcription pattern between cases and controls is consistent with that between tumor and its matched normal tissue. The genotype-based 4-gene signature improved the classification of glioma cases and controls over that based on age, gender, and population stratification, with the area under the receiver operating characteristic curve increasing from 0.77 to 0.85 (P = 8.1 × 10^-23). Conclusion A new genotype-based gene signature of glioma was identified using a novel iGWAS approach, which integrates multiplatform genomic data as well as different genetic association studies.
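The AUC values quoted above (0.77 versus 0.85) summarize how well a risk score ranks cases above controls. As a hedged illustration of that quantity only, and not of the authors' penalized logistic regression or of their test comparing the two AUCs, the empirical AUC can be computed as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The helper name `auc` is hypothetical.

```python
def auc(case_scores, control_scores):
    """Empirical AUC = P(case score > control score) + 0.5 * P(tie)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties contribute half a win
    return wins / (len(case_scores) * len(control_scores))


print(auc([0.9, 0.8], [0.1, 0.85]))  # 3 of 4 case-control pairs ordered correctly -> 0.75
```

The double loop is O(n_cases × n_controls), which is fine for a sketch; a rank-based formula gives the same value in O(n log n) for large cohorts.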
Affiliation(s)
- Yen-Tsung Huang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan; Department of Epidemiology; Department of Biostatistics, Brown University, Providence, Rhode Island; Department of Public Health and Community Medicine, Tufts University, Boston, Massachusetts
- Yi Zhang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan; Department of Epidemiology; Department of Biostatistics, Brown University, Providence, Rhode Island; Department of Public Health and Community Medicine, Tufts University, Boston, Massachusetts
- Zhijin Wu
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan; Department of Epidemiology; Department of Biostatistics, Brown University, Providence, Rhode Island; Department of Public Health and Community Medicine, Tufts University, Boston, Massachusetts
- Dominique S Michaud
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan; Department of Epidemiology; Department of Biostatistics, Brown University, Providence, Rhode Island; Department of Public Health and Community Medicine, Tufts University, Boston, Massachusetts
7
Zeng P, Zhou X, Huang S. Prediction of gene expression with cis-SNPs using mixed models and regularization methods. BMC Genomics 2017; 18:368. [PMID: 28490319 PMCID: PMC5425981 DOI: 10.1186/s12864-017-3759-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/20/2016] [Accepted: 05/03/2017] [Indexed: 12/25/2022]
Abstract
Background It has been shown that gene expression in human tissues is heritable, so predicting gene expression using only SNPs becomes possible. The prediction of gene expression can offer important implications for the genetic architecture of individual functionally associated SNPs and for further interpretation of the molecular basis underlying human diseases. Methods We compared three types of methods for predicting gene expression using only cis-SNPs: the polygenic model, i.e. the linear mixed model (LMM); two sparse models, i.e. Lasso and elastic net (ENET); and a hybrid of LMM and sparse models, i.e. the Bayesian sparse linear mixed model (BSLMM). The three kinds of prediction methods make very different assumptions about the underlying genetic architecture. These methods were evaluated using simulations under various scenarios and were applied to the Geuvadis gene expression data. Results The simulations showed that these four prediction methods (i.e. Lasso, ENET, LMM and BSLMM) behaved best when their respective modeling assumptions were satisfied, but BSLMM performed robustly across a range of scenarios. According to the R² of these models in the Geuvadis data, the four methods performed quite similarly. We did not observe any clustering or enrichment of predictive genes (defined as genes with R² ≥ 0.05) across the chromosomes, nor any clear relationship between the proportion of predictive genes and the proportion of genes on each chromosome. However, an interesting finding in the Geuvadis data was that highly predictive genes (e.g. R² ≥ 0.30) may have sparse genetic architectures, since Lasso, ENET and BSLMM outperformed LMM for these genes; this observation was validated in another gene expression dataset. We further showed that the predictive genes were enriched in approximately independent LD blocks.
Conclusions Gene expression can be predicted with only cis-SNPs using well-developed prediction models and these predictive genes were enriched in some approximately independent LD blocks. The prediction of gene expression can shed some light on the functional interpretation for identified SNPs in GWASs.
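To make the R² cut-offs above concrete, the sketch below scores predicted expression values against observed ones and labels a gene as "predictive" (R² ≥ 0.05) or "highly predictive" (R² ≥ 0.30), using the thresholds stated in the abstract. It assumes R² is the squared Pearson correlation between observed and predicted expression in held-out samples, which is one common convention; the paper's exact definition may differ. The prediction model itself (Lasso, ENET, LMM or BSLMM) is out of scope, and the function names are hypothetical.

```python
from statistics import fmean


def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted expression."""
    mo, mp = fmean(observed), fmean(predicted)
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    if soo == 0 or spp == 0:
        return 0.0  # no variation: treat as no predictive ability
    return sop ** 2 / (soo * spp)


def classify_gene(r2, predictive=0.05, highly=0.30):
    """Label a gene using the R² thresholds quoted in the abstract."""
    if r2 >= highly:
        return "highly predictive"
    if r2 >= predictive:
        return "predictive"
    return "not predictive"


print(classify_gene(squared_correlation([1.2, 0.4, 2.0, 1.1], [1.0, 0.6, 1.7, 1.3])))
```

Keeping the thresholds as keyword arguments makes it easy to reproduce other cut-offs used in the prediction literature without touching the scoring function.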
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, 209 Tongshan Rd, Xuzhou, Jiangsu, 221004, China; Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48104, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48104, USA
- Shuiping Huang
- Department of Epidemiology and Biostatistics, Xuzhou Medical University, 209 Tongshan Rd, Xuzhou, Jiangsu, 221004, China
8
Zhao SD, Cai TT, Cappola TP, Margulies KB, Li H. Sparse simultaneous signal detection for identifying genetically controlled disease genes. J Am Stat Assoc 2017; 112:1032-1046. [PMID: 29375169 PMCID: PMC5784841 DOI: 10.1080/01621459.2016.1270825] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 12/01/2015] [Accepted: 12/01/2016] [Indexed: 10/20/2022]
Abstract
Genome-wide association studies (GWAS) and differential expression analyses have had limited success in finding genes that cause complex diseases such as heart failure (HF), a leading cause of death in the United States. This paper proposes a new statistical approach that integrates GWAS and expression quantitative trait loci (eQTL) data to identify important HF genes. For such genes, genetic variants that perturb their expression are also likely to influence disease risk. The proposed method therefore tests for the presence of simultaneous signals: SNPs that are associated with the gene's expression as well as with disease. An analytic expression for the p-value is obtained, and the method is shown to be asymptotically adaptively optimal under certain conditions. It also allows the GWAS and eQTL data to be collected from different groups of subjects, enabling investigators to integrate public resources with their own data. Simulation experiments show that it can be more powerful than standard approaches and is robust to linkage disequilibrium between variants. The method is applied to an extensive analysis of HF genomics and identifies several genes with biological evidence for being functionally relevant in the etiology of HF. It is implemented in the R package ssa.
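As a point of reference for "simultaneous signals", the simplest (and generally conservative) way to flag SNPs associated with both expression and disease is the intersection-union rule: call a SNP a simultaneous signal only if both its eQTL p-value and its GWAS p-value fall below a threshold, i.e. max(p_eQTL, p_GWAS) < alpha. This is explicitly not the sparse simultaneous signal detection method of the paper (implemented in the R package ssa), which aggregates evidence across SNPs and derives an analytic p-value; the function name and the default alpha (the conventional genome-wide GWAS threshold) are illustrative assumptions.

```python
def simultaneous_snps(p_eqtl, p_gwas, alpha=5e-8):
    """Return indices of SNPs whose eQTL and GWAS p-values are both below alpha.

    Naive max-p (intersection-union) baseline, not the ssa procedure.
    """
    if len(p_eqtl) != len(p_gwas):
        raise ValueError("p-value lists must be aligned SNP by SNP")
    return [i for i, (pe, pg) in enumerate(zip(p_eqtl, p_gwas)) if max(pe, pg) < alpha]


# SNP 0 passes both thresholds; SNP 1 fails the eQTL one; SNP 2 fails the GWAS one.
print(simultaneous_snps([1e-9, 1e-3, 2e-10], [3e-9, 1e-12, 0.2]))  # -> [0]
```

Per-SNP rules like this lose power when many SNPs carry weak simultaneous signals, which is the sparse regime the paper's set-level test is designed for.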
Affiliation(s)
- Sihai Dave Zhao
- Department of Statistics, University of Illinois at Urbana-Champaign
- T Tony Cai
- Department of Statistics, The Wharton School, University of Pennsylvania
- Thomas P Cappola
- Penn Cardiovascular Institute and Department of Medicine, Perelman School of Medicine, University of Pennsylvania
- Kenneth B Margulies
- Penn Cardiovascular Institute and Department of Medicine, Perelman School of Medicine, University of Pennsylvania
- Hongzhe Li
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania
9
Thingholm LB, Andersen L, Makalic E, Southey MC, Thomassen M, Hansen LL. Strategies for Integrated Analysis of Genetic, Epigenetic, and Gene Expression Variation in Cancer: Addressing the Challenges. Front Genet 2016; 7:2. [PMID: 26870081 PMCID: PMC4740898 DOI: 10.3389/fgene.2016.00002] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/07/2015] [Accepted: 01/11/2016] [Indexed: 12/15/2022]
Abstract
The development and progression of cancer, a collection of diseases with complex genetic architectures, is facilitated by the interplay of multiple etiological factors. This complexity challenges the traditional single-platform study design and calls for an integrated approach to data analysis. However, integrating heterogeneous measurements of biological variation is a non-trivial exercise due to the diversity of the human genome and the variety of output data formats and genome coverage obtained from the commonly used molecular platforms. This review article provides an introduction to integration strategies used for analyzing genetic risk factors for cancer. We critically examine the ability of these strategies to handle the complexity of the human genome and to accommodate information about the biological and functional interactions between the measured elements, making it possible to assess disease risk against a composite genomic factor. The focus of this review is to provide an overview and introduction to the main strategies and to discuss where further development is needed.
Affiliation(s)
- Louise B Thingholm
- Department of Pathology, The University of Melbourne, Melbourne, VIC, Australia; Department of Biomedicine, The University of Aarhus, Aarhus, Denmark
- Lars Andersen
- Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
- Enes Makalic
- Centre for Epidemiology and Biostatistics, The University of Melbourne, Melbourne, VIC, Australia
- Melissa C Southey
- Department of Pathology, The University of Melbourne, Melbourne, VIC, Australia
- Mads Thomassen
- Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
10
Sun YV, Hu YJ. Integrative Analysis of Multi-omics Data for Discovery and Functional Studies of Complex Human Diseases. Advances in Genetics 2016; 93:147-90. [PMID: 26915271 DOI: 10.1016/bs.adgen.2015.11.004] [Citation(s) in RCA: 239] [Impact Index Per Article: 29.9] [Indexed: 02/06/2023]
Abstract
Complex and dynamic networks of molecules are involved in human diseases. High-throughput technologies enable omics studies interrogating thousands to millions of markers with similar biochemical properties (e.g., transcriptomics for RNA transcripts). However, a single layer of "omics" can provide only limited insights into the biological mechanisms of a disease. In the case of genome-wide association studies, although thousands of single nucleotide polymorphisms have been identified for complex diseases and traits, the functional implications and mechanisms of the associated loci are largely unknown. Additionally, genomic variants alone cannot explain the changing disease risk across the life span. DNA, RNA, protein, and metabolites often have complementary roles in jointly performing a certain biological function. Such complementary effects and synergistic interactions between omic layers over the life course can only be captured by integrative study of multiple molecular layers. Building upon the success of single-omics discovery research, population studies have started adopting the multi-omics approach to better understand molecular function and disease etiology. Multi-omics approaches integrate data obtained from different omic levels to understand their interrelation and combined influence on disease processes. Here, we summarize the major omics approaches available in population research and review integrative approaches and methodologies interrogating multiple omic layers, which enhance gene discovery and functional analysis of human diseases. We seek to provide analytical recommendations for different types of multi-omics data and study designs to guide emerging multi-omic research, and to suggest improvements to existing analytical methods.
Affiliation(s)
- Yan V Sun
- Department of Epidemiology, Rollins School of Public Health, Atlanta, GA, United States; Department of Biomedical Informatics, School of Medicine, Atlanta, GA, United States
- Yi-Juan Hu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, United States
11
Barr CL, Misener VL. Decoding the non-coding genome: elucidating genetic risk outside the coding genome. Genes, Brain, and Behavior 2016; 15:187-204. [PMID: 26515765 PMCID: PMC4833497 DOI: 10.1111/gbb.12269] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 08/07/2015] [Revised: 10/19/2015] [Accepted: 10/28/2015] [Indexed: 12/11/2022]
Abstract
Current evidence emerging from genome-wide association studies indicates that the genetic underpinnings of complex traits are likely attributable to genetic variation that changes gene expression, rather than (or in combination with) variation that changes protein-coding sequences. This is particularly compelling with respect to psychiatric disorders, as genetic changes in regulatory regions may result in differential transcriptional responses to developmental cues and environmental/psychosocial stressors. Until recently, however, the link between transcriptional regulation and psychiatric genetic risk has been understudied. Multiple obstacles have contributed to the paucity of research in this area, including challenges in identifying the positions of remote (distal from the promoter) regulatory elements (e.g. enhancers) and their target genes, and the underrepresentation of neural cell types and brain tissues in epigenome projects; the availability of high-quality brain tissues for epigenetic and transcriptome profiling, particularly for the adolescent and developing brain, has been limited. Further challenges have arisen in predicting and testing the functional impact of DNA variation with respect to multiple aspects of transcriptional control, including regulatory-element interaction (e.g. between enhancers and promoters), transcription factor binding and DNA methylation. In addition, the brain has uncommon DNA-methylation marks with unique genomic distributions not found in other tissues; current evidence suggests the involvement of non-CG methylation and 5-hydroxymethylation in neurodevelopmental processes, but much remains unknown. We review here knowledge gaps as well as the technological and resource obstacles that will need to be overcome in order to elucidate the involvement of brain-relevant gene-regulatory variants in genetic risk for psychiatric disorders.
Affiliation(s)
- C. L. Barr
- Toronto Western Research Institute, University Health Network, Toronto, ON, Canada
- Program in Neurosciences and Mental Health, The Hospital for Sick Children, Toronto, ON, Canada
- V. L. Misener
- Toronto Western Research Institute, University Health Network, Toronto, ON, Canada
12
Huen K, Yousefi P, Street K, Eskenazi B, Holland N. PON1 as a model for integration of genetic, epigenetic, and expression data on candidate susceptibility genes. Environmental Epigenetics 2015; 1:dvv003. [PMID: 26913202 PMCID: PMC4762373 DOI: 10.1093/eep/dvv003] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 05/15/2015] [Revised: 06/30/2015] [Accepted: 07/14/2015] [Indexed: 05/27/2023]
Abstract
Recent genome- and epigenome-wide studies demonstrate that DNA methylation is controlled in part by genetics, highlighting the importance of integrating genetic and epigenetic data. To better understand the molecular mechanisms affecting gene expression, we used the candidate susceptibility gene paraoxonase 1 (PON1) as a model to assess associations of PON1 genetic polymorphisms with DNA methylation and arylesterase activity, a marker of PON1 expression. PON1 has been associated with susceptibility to obesity, cardiovascular disease, and pesticide exposure. In this study, we assessed DNA methylation at 18 CpG sites located along PON1 shores, shelves, and its CpG island in blood specimens collected from newborns and 9-year-old children (n = 449) participating in the CHAMACOS birth cohort study. The promoter polymorphism PON1-108 was strongly associated with methylation, particularly for CpG sites located near the CpG island (P << 0.0005). Among newborns, these relationships were even more pronounced after adjusting for blood cell composition. We also observed significant decreases in arylesterase activity with increased methylation at the same nine CpG sites at both ages. Using causal mediation analysis, we found statistically significant indirect effects of methylation (β (95% confidence interval): 6.9 (1.5, 12.4)), providing evidence that DNA methylation mediates the relationship between PON1-108 genotype and PON1 expression. Our findings show that integration of genetic, epigenetic, and expression data can shed light on the functional mechanisms involving genetic and epigenetic regulation of candidate susceptibility genes like PON1.
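Under strong simplifying assumptions, an indirect (mediated) effect like the one reported above can be illustrated with the classic product-of-coefficients estimator for a single mediator: fit M = a·X + e and Y = c'·X + b·M + e by least squares and take a·b. The sketch assumes linear models, a continuous mediator, no exposure-mediator interaction and no unmeasured confounding; it is not necessarily the causal mediation procedure the authors used, gives no confidence interval, and the function names are hypothetical. In this setting X would be the PON1-108 genotype, M methylation and Y arylesterase activity.

```python
from statistics import fmean


def _centered(v):
    m = fmean(v)
    return [x - m for x in v]


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate for X -> M -> Y.

    Returns (a * b, c'), where a is the slope of M on X, b is the
    coefficient of M in Y ~ X + M, and c' is the direct effect of X.
    """
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    s_xx = sum(a * a for a in xc)
    s_mm = sum(a * a for a in mc)
    s_xm = sum(a * b for a, b in zip(xc, mc))
    s_xy = sum(a * b for a, b in zip(xc, yc))
    s_my = sum(a * b for a, b in zip(mc, yc))
    if s_xx == 0:
        raise ValueError("exposure has no variance")
    a_path = s_xm / s_xx  # slope of M on X
    det = s_xx * s_mm - s_xm ** 2  # zero when M is an exact linear function of X
    if det == 0:
        raise ValueError("mediator is collinear with exposure")
    b_path = (s_xx * s_my - s_xm * s_xy) / det  # coefficient of M in Y ~ X + M
    direct = (s_mm * s_xy - s_xm * s_my) / det  # coefficient of X in Y ~ X + M
    return a_path * b_path, direct


# Toy data built so that a = 2, b = 3 and c' = 1.5 hold exactly.
x = [0, 0, 1, 1, 2, 2]
m = [2 * xi + e for xi, e in zip(x, [1, -1, 1, -1, 1, -1])]
y = [1.5 * xi + 3 * mi for xi, mi in zip(x, m)]
print(indirect_effect(x, m, y))  # approximately (6.0, 1.5)
```

In practice a bootstrap over individuals is a common way to attach an interval to a·b; that step is omitted here to keep the sketch short.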
Affiliation(s)
- Karen Huen
- School of Public Health, University of California, Berkeley, 50 University Hall #7360, Berkeley, CA 94720-7360, USA
| | - Paul Yousefi
- School of Public Health, University of California, Berkeley, 50 University Hall #7360, Berkeley, CA 94720-7360, USA
| | - Kelly Street
- School of Public Health, University of California, Berkeley, 50 University Hall #7360, Berkeley, CA 94720-7360, USA
| | - Brenda Eskenazi
- School of Public Health, University of California, Berkeley, 50 University Hall #7360, Berkeley, CA 94720-7360, USA
- Nina Holland
- School of Public Health, University of California, Berkeley, 50 University Hall #7360, Berkeley, CA 94720-7360, USA
13
Measuring epigenetics as the mediator of gene/environment interactions in DOHaD. J Dev Orig Health Dis 2014; 6:10-6. [PMID: 25315715 DOI: 10.1017/s2040174414000506] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 01/01/2023]
Abstract
Analysis of DNA methylation data in epigenome-wide association studies presents many bioinformatic and statistical challenges. Not least of these is the non-independence of individual DNA methylation marks from each other, from genotype, and from technical sources of variation. In this review we discuss DNA methylation data from the Infinium 450K array and processing methodologies that reduce technical variation. We describe recent approaches that harness the concordance of neighbouring DNA methylation values to improve power in association studies. We also describe how the non-independence of genotype and DNA methylation has been used to infer causality (as in Mendelian randomization approaches), to suggest that DNA methylation mediates the link between intergenic single nucleotide polymorphisms identified in genome-wide association studies and phenotype, and to uncover the widespread influence of gene-environment interactions on methylation levels.
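Using genotype to infer causality, as in Mendelian randomization, can be illustrated with a minimal sketch of the Wald ratio, the simplest instrumental-variable estimator. This code is not taken from the review. It assumes a single genotype that affects the outcome only through methylation (the exclusion restriction) and is independent of the confounder; both hold by construction in the simulation. The helper names `slope` and `wald_ratio`, the simulated data, and the effect sizes are all hypothetical.

```python
import numpy as np

def slope(x, y):
    """Simple-regression slope of y on x: cov(x, y) / var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def wald_ratio(z, x, y):
    """Wald ratio estimate of the causal effect of x on y using instrument z:
    slope(z -> y) / slope(z -> x)."""
    return slope(z, y) / slope(z, x)

# Hypothetical simulation; the true causal effect of methylation on phenotype is 0.4
rng = np.random.default_rng(1)
n = 5000
z = rng.binomial(2, 0.3, size=n).astype(float)  # genotype (instrument)
u = rng.normal(size=n)                          # unmeasured confounder
x = 1.0 * z + u + rng.normal(size=n)            # methylation (exposure)
y = 0.4 * x + 2.0 * u + rng.normal(size=n)      # phenotype (outcome), confounded by u

naive = slope(x, y)        # biased upward because u affects both x and y
iv = wald_ratio(z, x, y)   # consistent under the instrument assumptions
print(f"naive regression: {naive:.2f}, Wald ratio: {iv:.2f}")
```

The naive regression of phenotype on methylation absorbs the confounder's effect, whereas the genotype-based ratio recovers an estimate close to the simulated causal effect. With weak instruments or pleiotropic SNPs, this simple ratio can be badly biased.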
14
Montague E, Stanberry L, Higdon R, Janko I, Lee E, Anderson N, Choiniere J, Stewart E, Yandl G, Broomall W, Kolker N, Kolker E. MOPED 2.5--an integrated multi-omics resource: multi-omics profiling expression database now includes transcriptomics data. OMICS: A Journal of Integrative Biology 2014; 18:335-43. [PMID: 24910945 PMCID: PMC4048574 DOI: 10.1089/omi.2014.0061] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Indexed: 12/31/2022]
Abstract
Multi-omics data-driven scientific discovery rests crucially on high-throughput technologies and data sharing. Currently, data are scattered across single-omics repositories, stored in varying raw and processed formats, and often accompanied by limited or no metadata. The Multi-Omics Profiling Expression Database (MOPED, http://moped.proteinspire.org ) version 2.5 is a freely accessible multi-omics expression database. Continual improvement and expansion of MOPED are driven by feedback from the life sciences community. To meet the emerging need for an integrated multi-omics data resource, MOPED 2.5 now includes gene relative expression data in addition to protein absolute and relative expression data from over 250 large-scale experiments. To facilitate accurate integration of experiments and increase reproducibility, MOPED provides extensive metadata through the Data-Enabled Life Sciences Alliance (DELSA Global, http://delsaglobal.org ) metadata checklist. MOPED 2.5 has increased the number of proteomics absolute and relative expression records to over 500,000 and has added more than four million transcriptomics relative expression records. MOPED has an intuitive user interface with tabs for querying different types of omics expression data, as well as new tools for data visualization. Summary information, including expression data, pathway mappings, and direct connections between proteins and genes, can be viewed on the Protein and Gene Details pages. These connections provide a context for exploring multi-omics expression data. Researchers are encouraged to submit omics data, which will be processed consistently into expression summaries. As a multi-omics data resource, MOPED is a pivotal public database, an interdisciplinary knowledge resource, and a platform for multi-omics understanding.
Affiliation(s)
- Elizabeth Montague
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Larissa Stanberry
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Roger Higdon
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Imre Janko
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Elaine Lee
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Nathaniel Anderson
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- John Choiniere
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Elizabeth Stewart
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Gregory Yandl
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- William Broomall
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Natali Kolker
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Eugene Kolker
- Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, Washington
- High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, Washington
- Predictive Analytics, Seattle Children's, Seattle, Washington
- Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, Washington
- Departments of Biomedical Informatics and Medical Education and Pediatrics, University of Washington, Seattle, Washington