1
Xu H, Shao Z, Zhang S, Liu X, Zeng P. How can childhood maltreatment affect post-traumatic stress disorder in adult: Results from a composite null hypothesis perspective of mediation analysis. Front Psychiatry 2023; 14:1102811. [PMID: 36970281 PMCID: PMC10033829 DOI: 10.3389/fpsyt.2023.1102811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Accepted: 02/20/2023] [Indexed: 03/11/2023] Open
Abstract
Background: A rapidly growing body of literature has revealed the mediating role of DNA methylation in the pathway from childhood maltreatment to psychiatric disorders such as post-traumatic stress disorder (PTSD) in adults. However, the statistical methodology is challenging, and powerful mediation analyses of this issue are lacking. Methods: To study how maltreatment in childhood induces long-lasting DNA methylation changes that further affect PTSD in adults, we carried out a gene-based mediation analysis from the perspective of composite null hypothesis testing in the Grady Trauma Project (352 participants and 16,565 genes), with childhood maltreatment as the exposure, multiple DNA methylation sites as mediators, and PTSD or its relevant scores as the outcome. We addressed the challenging issue of gene-based mediation analysis by taking its composite null hypothesis testing nature into consideration and fitting a weighted test statistic. Results: We found that childhood maltreatment could substantially affect PTSD or PTSD-related scores, and that childhood maltreatment was associated with DNA methylation, which in turn had significant roles in PTSD and these scores. Furthermore, using the proposed mediation method, we identified multiple genes within which DNA methylation sites exhibited mediating roles in the pathway from childhood maltreatment to PTSD-relevant scores in adults, with 13 for the Beck Depression Inventory and 6 for the modified PTSD Symptom Scale. Conclusion: Our results have the potential to confer meaningful insights into the biological mechanism underlying the impact of early adverse experience on adult diseases, and our proposed mediation methods can be applied to other similar analysis settings.
Affiliation(s)
- Haibo Xu
  - Center for Mental Health Education and Research, Xuzhou Medical University, Xuzhou, China
  - School of Management, Xuzhou Medical University, Xuzhou, China
  - *Correspondence: Haibo Xu
- Zhonghe Shao
  - Department of Epidemiology and Biostatistics, Ministry of Education Key Laboratory of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Shuo Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Xin Liu
  - Center for Mental Health Education and Research, Xuzhou Medical University, Xuzhou, China
  - School of Management, Xuzhou Medical University, Xuzhou, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
  - Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
  - *Correspondence: Ping Zeng
2
A Two-Part Mixed Model for Differential Expression Analysis in Single-Cell High-Throughput Gene Expression Data. Genes (Basel) 2022; 13:genes13020377. [PMID: 35205420 PMCID: PMC8872627 DOI: 10.3390/genes13020377] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/22/2021] [Revised: 02/07/2022] [Accepted: 02/15/2022] [Indexed: 02/04/2023] Open
Abstract
The high-throughput gene expression data generated by recent single-cell RNA sequencing (scRNA-seq) and parallel single-cell reverse transcription quantitative real-time PCR (scRT-qPCR) technologies enable biologists to study the function of the transcriptome at the level of individual cells. Compared with bulk RNA-seq and RT-qPCR gene expression data, single-cell data show notably distinct features, including excessive zero expression values, high variability, and clustered design. We propose to model single-cell high-throughput gene expression data using a two-part mixed model, which not only adequately accounts for the aforementioned features of single-cell expression data but also provides the flexibility of adjusting for covariates. An efficient computational algorithm, automatic differentiation, is used for estimating the model parameters. Compared with existing methods, our approach shows improved power for detecting differentially expressed genes in single-cell high-throughput gene expression data.
3
Pluta D, Shen T, Xue G, Chen C, Ombao H, Yu Z. Ridge-penalized adaptive Mantel test and its application in imaging genetics. Stat Med 2021; 40:5313-5332. [PMID: 34216035 DOI: 10.1002/sim.9127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/20/2020] [Revised: 06/01/2021] [Accepted: 06/16/2021] [Indexed: 01/23/2023]
Abstract
We propose a ridge-penalized adaptive Mantel test (AdaMant) for evaluating the association of two high-dimensional sets of features. By introducing a ridge penalty, AdaMant tests the association across many metrics simultaneously. We demonstrate how ridge penalization bridges Euclidean and Mahalanobis distances and their corresponding linear models from the perspective of association measurement and testing. This result is not only theoretically interesting but also has important implications in penalized hypothesis testing, especially in high-dimensional settings such as imaging genetics. Applying the proposed method to an imaging genetic study of visual working memory in healthy adults, we identified interesting associations of brain connectivity (measured by electroencephalogram coherence) with selected genetic features.
Affiliation(s)
- Dustin Pluta
  - Department of Statistics, University of California, Irvine, Irvine, California, USA
- Tong Shen
  - Department of Statistics, University of California, Irvine, Irvine, California, USA
- Gui Xue
  - Center for Brain and Learning Science, Beijing Normal University, Beijing, China
- Chuansheng Chen
  - Department of Psychology and Social Behavior, University of California, Irvine, Irvine, California, USA
- Hernando Ombao
  - Statistics Program, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Zhaoxia Yu
  - Department of Statistics, University of California, Irvine, Irvine, California, USA
4
Lu H, Zhang J, Jiang Z, Zhang M, Wang T, Zhao H, Zeng P. Detection of Genetic Overlap Between Rheumatoid Arthritis and Systemic Lupus Erythematosus Using GWAS Summary Statistics. Front Genet 2021; 12:656545. [PMID: 33815486 PMCID: PMC8012913 DOI: 10.3389/fgene.2021.656545] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/22/2021] [Accepted: 03/01/2021] [Indexed: 01/04/2023] Open
Abstract
Background: Clinical and epidemiological studies have suggested that systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) are comorbid and that common genetic etiologies can partly explain this coexistence. However, the shared genetic determinants underlying the two diseases remain largely unknown. Methods: Our analysis relied on summary statistics available from genome-wide association studies of SLE (N = 23,210) and RA (N = 58,284). We first evaluated the genetic correlation between RA and SLE through linkage disequilibrium score regression (LDSC). Then, we performed a multiple-tissue eQTL (expression quantitative trait loci) weighted integrative analysis for each of the two diseases and aggregated association evidence across these tissues via the recently proposed harmonic mean P-value (HMP) combination strategy, which can produce a single well-calibrated P-value for correlated test statistics. Afterwards, we conducted a pleiotropy-informed association analysis using the conjunction conditional FDR (ccFDR) to identify potential pleiotropic genes associated with both RA and SLE. Results: We found a significant positive genetic correlation (rg = 0.404, P = 6.01E-10) between RA and SLE via LDSC. Based on the multiple-tissue eQTL weighted integrative analysis and the HMP combination across various tissues, we discovered 14 potential pleiotropic genes by ccFDR, among which four were likely novel genes (i.e., INPP5B, OR5K2, RP11-2C24.5, and CTD-3105H18.4). The SNP effect sizes of these pleiotropic genes were typically positively dependent, with an average correlation of 0.579. Functionally, these genes were implicated in multiple autoimmune-relevant pathways such as inositol phosphate metabolic process, membrane, and glucagon signaling pathway. Conclusion: This study reveals common genetic components between RA and SLE and provides candidate associated loci for understanding the molecular mechanism underlying the comorbidity of the two diseases.
Affiliation(s)
- Haojie Lu
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Jinhui Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Zhou Jiang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Meng Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Ting Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Huashuo Zhao
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Ping Zeng
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
5
Zhuo B, Jiang D, Di Y. Test-statistic correlation and data-row correlation. Stat Probab Lett 2020; 167. [DOI: 10.1016/j.spl.2020.108903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
6
Gauthier M, Agniel D, Thiébaut R, Hejblum BP. dearseq: a variance component score test for RNA-seq differential analysis that effectively controls the false discovery rate. NAR Genom Bioinform 2020; 2:lqaa093. [PMID: 33575637 PMCID: PMC7676475 DOI: 10.1093/nargab/lqaa093] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/27/2019] [Revised: 10/14/2020] [Accepted: 10/23/2020] [Indexed: 12/20/2022] Open
Abstract
RNA-seq studies are growing in size and popularity. We provide evidence that the most commonly used methods for differential expression analysis (DEA) may yield too many false positive results in some situations. We present dearseq, a new method for DEA that controls the false discovery rate (FDR) without making any assumption about the true distribution of RNA-seq data. We show that dearseq controls the FDR while maintaining strong statistical power compared to the most popular methods. We demonstrate this behavior with mathematical proofs, simulations and a real data set from a study of tuberculosis, where our method produces fewer apparent false positives.
Affiliation(s)
- Marine Gauthier
  - INRIA SISTM, INSERM Bordeaux Population Health Research Center, University of Bordeaux, F-33000 Bordeaux, France
- Rodolphe Thiébaut
  - INRIA SISTM, INSERM Bordeaux Population Health Research Center, University of Bordeaux, F-33000 Bordeaux, France
- Boris P Hejblum
  - INRIA SISTM, INSERM Bordeaux Population Health Research Center, University of Bordeaux, F-33000 Bordeaux, France
7
Liu Z, Barnett I, Lin X. A comparison of principal component methods between multiple phenotype regression and multiple SNP regression in genetic association studies. Ann Appl Stat 2020; 14:433-451. [PMID: 37398898 PMCID: PMC10313330 DOI: 10.1214/19-aoas1312] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/13/2023]
Abstract
Principal component analysis (PCA) is a popular method for dimension reduction in unsupervised multivariate analysis. However, existing ad hoc uses of PCA in both multivariate regression (multiple outcomes) and multiple regression (multiple predictors) lack theoretical justification. The differences in the statistical properties of PCAs in these two regression settings are not well understood. In this paper we provide theoretical results on the power of PCA in genetic association testing in both multiple phenotype and SNP-set settings. The multiple phenotype setting refers to the case when one is interested in studying the association between a single SNP and multiple phenotypes as outcomes. The SNP-set setting refers to the case when one is interested in studying the association between multiple SNPs in a SNP set and a single phenotype as the outcome. We demonstrate analytically that the properties of the PC-based analysis in these two regression settings are substantially different. We show that the lower-order PCs, that is, PCs with large eigenvalues, are generally preferred and lead to a higher power in the SNP-set setting, while the higher-order PCs, that is, PCs with small eigenvalues, are generally preferred in the multiple phenotype setting. We also investigate the power of three other popular statistical methods, the Wald test, the variance component test and the minimum p-value test, in both multiple phenotype and SNP-set settings. We use theoretical power, simulation studies, and two real data analyses to validate our findings.
Affiliation(s)
- Zhonghua Liu
  - Department of Statistics and Actuarial Science, The University of Hong Kong
- Ian Barnett
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania
- Xihong Lin
  - Department of Biostatistics and Statistics, Harvard University
8
Salviato E, Djordjilović V, Chiogna M, Romualdi C. SourceSet: A graphical model approach to identify primary genes in perturbed biological pathways. PLoS Comput Biol 2019; 15:e1007357. [PMID: 31652275 PMCID: PMC6834292 DOI: 10.1371/journal.pcbi.1007357] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/17/2018] [Revised: 11/06/2019] [Accepted: 08/23/2019] [Indexed: 11/24/2022] Open
Abstract
Topological gene-set analysis has emerged as a powerful means for omic data interpretation. Although numerous methods for identifying dysregulated genes have been proposed, few of them aim to distinguish genes that are the real source of perturbation from those that merely respond to the signal dysregulation. Here, we propose a new method, called SourceSet, able to distinguish between the primary and the secondary dysregulation within a Gaussian graphical model context. The proposed method compares gene expression profiles in the control and in the perturbed condition and detects the differences in both the mean and the covariance parameters with a series of likelihood ratio tests. The resulting evidence is used to infer the primary and the secondary set, i.e. the genes responsible for the primary dysregulation, and the genes affected by the perturbation through network propagation. The proposed method demonstrates high specificity and sensitivity in different simulated scenarios and on several real biological case studies. In order to fit into the more traditional pathway analysis framework, SourceSet R package also extends the analysis from a single to multiple pathways and provides several graphical outputs, including Cytoscape visualization to browse the results.
The rapid increase in omic studies has created a need to understand the biological implications of their results. Gene-set analysis has emerged as a powerful means for gaining such understanding, evolving in the last decade from the classical enrichment analysis to the more powerful topological approaches. Although numerous methods for identifying dysregulated genes have been proposed, few of them aim to distinguish genes that are the real source of perturbation from those that merely respond to the signal dysregulation. This distinction is crucial for network medicine, where the prioritization of the effect of biological perturbations may help in the molecular understanding of drug treatments and diseases. Here we propose a new method, called SourceSet, able to distinguish between primary and secondary dysregulation within a graphical model context, demonstrating a high specificity and sensitivity in different simulated scenarios and on real biological case studies.
Affiliation(s)
- Elisa Salviato
  - IFOM - The FIRC Institute of Molecular Oncology, Milan, Italy
- Monica Chiogna
  - Department of Statistical Sciences, University of Bologna, Bologna, Italy
- Chiara Romualdi
  - Department of Biology, University of Padova, Padova, Italy
9
Liu Z, Lin X. A Geometric Perspective on the Power of Principal Component Association Tests in Multiple Phenotype Studies. J Am Stat Assoc 2019; 114:975-990. [PMID: 31564761 DOI: 10.1080/01621459.2018.1513363] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/17/2022]
Abstract
Joint analysis of multiple phenotypes can increase statistical power in genetic association studies. Principal component analysis, as a popular dimension reduction method, especially when the number of phenotypes is high-dimensional, has been proposed to analyze multiple correlated phenotypes. It has been empirically observed that the first PC, which summarizes the largest amount of variance, can be less powerful than higher order PCs and other commonly used methods in detecting genetic association signals. In this paper, we investigate the properties of PCA-based multiple phenotype analysis from a geometric perspective by introducing a novel concept called principal angle. A particular PC is powerful if its principal angle is 0° and is powerless if its principal angle is 90°. Without prior knowledge about the true principal angle, each PC can be powerless. We propose linear, non-linear and data-adaptive omnibus tests by combining PCs. We demonstrate that the Wald test is a special quadratic PC-based test. We show that the omnibus PC test is robust and powerful in a wide range of scenarios. We study the properties of the proposed methods using power analysis and eigen-analysis. The subtle differences and close connections between these combined PC methods are illustrated graphically in terms of their rejection boundaries. Our proposed tests have convex acceptance regions and hence are admissible. The p-values for the proposed tests can be efficiently calculated analytically and the proposed tests have been implemented in a publicly available R package MPAT. We conduct simulation studies in both low and high dimensional settings with various signal vectors and correlation structures. We apply the proposed tests to the joint analysis of metabolic syndrome related phenotypes with data sets collected from four international consortia to demonstrate the effectiveness of the proposed combined PC testing procedures.
Affiliation(s)
- Zhonghua Liu
  - Department of Statistics and Actuarial Science, University of Hong Kong, Pokfulam Road, Hong Kong, China
- Xihong Lin
  - Chair and Henry Pickering Walcott Professor of Biostatistics, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
10
Agniel D, Hejblum BP. Variance component score test for time-course gene set analysis of longitudinal RNA-seq data. Biostatistics 2018; 18:589-604. [PMID: 28334305 DOI: 10.1093/biostatistics/kxx005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/18/2016] [Accepted: 01/04/2017] [Indexed: 01/28/2023] Open
Abstract
As gene expression measurement technology is shifting from microarrays to sequencing, the statistical tools available for their analysis must be adapted since RNA-seq data are measured as counts. It has been proposed to model RNA-seq counts as continuous variables using nonparametric regression to account for their inherent heteroscedasticity. In this vein, we propose tcgsaseq, a principled, model-free, and efficient method for detecting longitudinal changes in RNA-seq gene sets defined a priori. The method identifies those gene sets whose expression varies over time, based on an original variance component score test accounting for both covariates and heteroscedasticity without assuming any specific parametric distribution for the (transformed) counts. We demonstrate that despite the presence of a nonparametric component, our test statistic has a simple form and limiting distribution, and both may be computed quickly. A permutation version of the test is additionally proposed for very small sample sizes. Applied to both simulated data and two real datasets, tcgsaseq is shown to exhibit very good statistical properties, with an increase in stability and power when compared to state-of-the-art methods ROAST (rotation gene set testing), edgeR, and DESeq2, which can fail to control the type I error under certain realistic settings. We have made the method available for the community in the R package tcgsaseq.
Affiliation(s)
- Denis Agniel
  - Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck St, Boston, MA 02115, USA
- Boris P Hejblum
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - University of Bordeaux, ISPED, INSERM U1219, INRIA SISTM, 146 rue Léo Saignat, 33076 Bordeaux, France
  - Vaccine Research Institute, Créteil, France
11
Yang S, Shao F, Duan W, Zhao Y, Chen F. Variance component testing for identifying differentially expressed genes in RNA-seq data. PeerJ 2017; 5:e3797. [PMID: 28929020 PMCID: PMC5592911 DOI: 10.7717/peerj.3797] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/13/2017] [Accepted: 08/21/2017] [Indexed: 01/28/2023] Open
Abstract
RNA sequencing (RNA-Seq) enables the measurement and comparison of gene expression with isoform-level quantification. Differences in the effect of each isoform may make traditional methods, which aggregate isoforms, ineffective. Here, we introduce a variance component-based test that can jointly test multiple isoforms of one gene to identify differentially expressed (DE) genes, especially those with isoforms that have differential effects. We model isoform-level expression data from RNA-Seq using a negative binomial distribution and consider the baseline abundance of isoforms and their effects as two random terms. Our approach tests the global null hypothesis of no difference in any of the isoforms. The null distribution of the derived score statistic is investigated using empirical and theoretical methods. The results of simulations suggest that the performance of the proposed set test is superior to that of traditional algorithms and almost reaches optimal power when the variance of covariates is large. This method is also applied to analyze real data. Our algorithm, as a supplement to traditional algorithms, is superior at selecting DE genes with sparse or opposite effects for isoforms.
Affiliation(s)
- Sheng Yang
  - Department of Biostatistics, School of Public Health, Nanjing Medical University, China
- Fang Shao
  - Department of Biostatistics, School of Public Health, Nanjing Medical University, China
- Weiwei Duan
  - Department of Biostatistics, School of Public Health, Nanjing Medical University, China
- Yang Zhao
  - Department of Biostatistics, School of Public Health, Nanjing Medical University, China
- Feng Chen
  - Department of Biostatistics, School of Public Health, Nanjing Medical University, China
12
Chu SH, Huang YT. Integrated genomic analysis of biological gene sets with applications in lung cancer prognosis. BMC Bioinformatics 2017; 18:336. [PMID: 28697753 PMCID: PMC5505153 DOI: 10.1186/s12859-017-1737-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/07/2017] [Accepted: 06/22/2017] [Indexed: 01/22/2023] Open
Abstract
Background: Burgeoning interest in integrative analyses has produced a rise in studies which incorporate data from multiple genomic platforms. Literature on conducting formal hypothesis testing at an integrative gene set level is considerably sparse. This paper is biologically motivated by our interest in the joint effects of epigenetic methylation loci and their associated mRNA gene expressions on lung cancer survival status. Results: We provide an efficient screening approach across multiplatform genomic data at the level of biologically related sets of genes, and our methods are applicable to various disease models regardless of whether the underlying true model is known (iTEGS) or unknown (iNOTE). Our proposed testing procedure dominated two competing methods. Using our methods, we identified a total of 28 gene sets with significant joint epigenomic and transcriptomic effects on one-year lung cancer survival. Conclusions: We propose efficient variance component-based testing procedures to facilitate the joint testing of multiplatform genomic data across an entire gene set. The testing procedure for the gene set is self-contained, and can easily be extended to include more or different genetic platforms. iTEGS and iNOTE implemented in R are freely available through the inote package at https://cran.r-project.org//.
Affiliation(s)
- Su Hee Chu
  - Department of Epidemiology, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA
  - Channing Division of Network Medicine, Brigham and Women's Hospital Harvard Medical School, 181 Longwood Ave, Boston, MA, USA
- Yen-Tsung Huang
  - Department of Epidemiology, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA
  - Department of Biostatistics, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA
  - Institute of Statistical Science, Academia Sinica, No. 128, Section 2, Academia Rd, Taipei City, Taiwan
13
Sørensen IF, Edwards SM, Rohde PD, Sørensen P. Multiple Trait Covariance Association Test Identifies Gene Ontology Categories Associated with Chill Coma Recovery Time in Drosophila melanogaster. Sci Rep 2017; 7:2413. [PMID: 28546557 PMCID: PMC5445101 DOI: 10.1038/s41598-017-02281-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 01/17/2017] [Accepted: 04/10/2017] [Indexed: 12/29/2022] Open
Abstract
The genomic best linear unbiased prediction (GBLUP) model has proven to be useful for prediction of complex traits as well as estimation of population genetic parameters. Improved inference and prediction accuracy of GBLUP may be achieved by identifying genomic regions enriched for causal genetic variants. We aimed at searching for patterns in GBLUP-derived single-marker statistics, by including them in genetic marker set tests, that could reveal associations between a set of genetic markers (genomic feature) and a complex trait. GBLUP-derived set tests proved to be powerful for detecting genomic features, here defined by gene ontology (GO) terms, enriched for causal variants affecting a quantitative trait in a population with low degree of relatedness. Different set test approaches were compared using simulated data illustrating the impact of trait- and genomic feature-specific factors on detection power. We extended the most powerful single trait set test, covariance association test (CVAT), to a multiple trait setting. The multiple trait CVAT (MT-CVAT) identified functionally relevant GO categories associated with the quantitative trait, chill coma recovery time, in the unrelated, sequenced inbred lines of the Drosophila melanogaster Genetic Reference Panel.
Affiliation(s)
- Izel Fourie Sørensen
  - Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830, Tjele, Denmark
- Stefan M Edwards
  - Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830, Tjele, Denmark
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Palle Duun Rohde
  - Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830, Tjele, Denmark
  - Centre for Integrative Sequencing, iSEQ, Aarhus University, 8000, Aarhus, Denmark
  - iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, 8000, Aarhus, Denmark
- Peter Sørensen
  - Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830, Tjele, Denmark
14
Zhuo B, Jiang D. MEACA: efficient gene-set interpretation of expression data using mixed models. [DOI: 10.1101/106781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Competitive gene-set analysis, or enrichment analysis, is widely used for functional interpretation of gene expression data. It tests a known category (e.g. pathway) of genes for enriched differential expression signals. Current methods do not properly capture inter-gene correlations and heterogeneity, resulting in mis-calibration and power loss. We propose MEACA, a new gene-set method based on mixed-effects models. MEACA flexibly incorporates unknown heterogeneity and correlations across genes, and does not need time-consuming permutations. Compared to existing methods, MEACA substantially improves type 1 error control and power in widely ranging scenarios. Real data applications demonstrate MEACA’s ability to recover biologically meaningful relationships.
|
15
|
Covariance Association Test (CVAT) Identifies Genetic Markers Associated with Schizophrenia in Functionally Associated Biological Processes. Genetics 2016; 203:1901-13. [PMID: 27317683 DOI: 10.1534/genetics.116.189498] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 03/19/2016] [Accepted: 06/09/2016] [Indexed: 12/12/2022] Open
Abstract
Schizophrenia is a psychiatric disorder with large personal and social costs, and understanding the genetic etiology is important. Such knowledge can be obtained by testing the association between a disease phenotype and individual genetic markers; however, such single-marker methods have limited power to detect genetic markers with small effects. Instead, aggregating genetic markers based on biological information might increase the power to identify sets of genetic markers of etiological significance. Several set test methods have been proposed; here we propose a new set test derived from genomic best linear unbiased prediction (GBLUP), the covariance association test (CVAT). We compared the performance of CVAT to other commonly used set tests. The comparison was conducted using a simulated study population having the same genetic parameters as for schizophrenia. We found that CVAT was among the top performers. When extending CVAT to utilize a mixture of SNP effects, we found an increase in power to detect the causal sets. Applying the methods to a Danish schizophrenia case-control data set, we found genomic evidence for association of schizophrenia with vitamin A metabolism and immunological responses, which previously have been implicated in schizophrenia based on experimental and observational studies.
|
16
|
Epigenetic patterns in successful weight loss maintainers: a pilot study. Int J Obes (Lond) 2014; 39:865-868. [PMID: 25520250 PMCID: PMC4422763 DOI: 10.1038/ijo.2014.213] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 09/10/2014] [Revised: 11/20/2014] [Accepted: 11/26/2014] [Indexed: 11/08/2022]
Abstract
DNA methylation changes occur in animal models of calorie restriction, simulating human dieting, and in human subjects undergoing behavioral weight loss interventions. This suggests that obese (OB) individuals may possess unique epigenetic patterns that may vary with weight loss. Here, we examine whether methylation patterns in leukocytes differ in individuals who lost sufficient weight to go from OB to normal weight (NW; successful weight loss maintainers; SWLMs) vs currently OB or NW individuals. This study examined peripheral blood mononuclear cell (PBMC) methylation patterns in NW (n=16, current/lifetime BMI 18.5-24.9) and OB individuals (n=16, current body mass index (BMI)⩾30), and SWLM (n=16, current BMI 18.5-24.9, lifetime maximum BMI ⩾30, average weight loss 57.4 lbs) using an Illumina Infinium HumanMethylation450 BeadArray. No leukocyte population-adjusted epigenome-wide analyses were significant; however, potentially differentially methylated loci across the groups were observed in ryanodine receptor-1 (RYR1; P=1.54E-6), myelin protein zero-like 3 (MPZL3; P=4.70E-6) and alpha 3c tubulin (TUBA3C; P=4.78E-6). In 32 obesity-related candidate genes, differential methylation patterns were found in brain-derived neurotrophic factor (BDNF; gene-wide P=0.00018). In RYR1, TUBA3C and BDNF, SWLM differed from OB but not NW. In this preliminary investigation, leukocyte SWLM DNA methylation patterns more closely resembled NW than OB individuals in three gene regions. These results suggest that PBMC methylation is associated with weight status.
|
17
|
Huang YT, Hsu T, Christiani DC. TEGS-CN: A Statistical Method for Pathway Analysis of Genome-wide Copy Number Profile. Cancer Inform 2014; 13:15-23. [PMID: 25452685 PMCID: PMC4218657 DOI: 10.4137/cin.s13978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2014] [Revised: 06/05/2014] [Accepted: 06/06/2014] [Indexed: 11/30/2022] Open
Abstract
The effects of copy number alterations make up a significant part of the tumor genome profile, but pathway analyses of these alterations are still not well established. We proposed a novel method to analyze multiple copy numbers of genes within a pathway, termed Test for the Effect of a Gene Set with Copy Number data (TEGS-CN). TEGS-CN was adapted from TEGS, a method that we previously developed for gene expression data using a variance component score test. With additional development, we extend the method to analyze DNA copy number data, accounting for different sizes and thus various numbers of copy number probes in genes. The test statistic follows a mixture of χ² distributions that can be obtained using permutation with scaled χ² approximation. We conducted simulation studies to evaluate the size and the power of TEGS-CN and to compare its performance with TEGS. We analyzed genome-wide copy number data from 264 patients with non-small-cell lung cancer. With the Molecular Signatures Database (MSigDB) pathway database, the genome-wide copy number data can be classified into 1814 biological pathways or gene sets. We investigated associations of the copy number profile of the 1814 gene sets with pack-years of cigarette smoking. Our analysis revealed five pathways with significant P values after Bonferroni adjustment (<2.8 × 10⁻⁵), including the PTEN pathway (7.8 × 10⁻⁷), the gene set up-regulated under heat shock (3.6 × 10⁻⁶), the gene sets involved in the immune profile for rejection of kidney transplantation (9.2 × 10⁻⁶) and for transcriptional control of leukocytes (2.2 × 10⁻⁵), and the ganglioside biosynthesis pathway (2.7 × 10⁻⁵). In conclusion, we present a new method for pathway analyses of copy number data, and causal mechanisms of the five pathways require further study.
Affiliation(s)
- Yen-Tsung Huang
- Department of Epidemiology, Brown University, Providence, RI
- Thomas Hsu
- Program in Biology, Brown University, Providence, RI
- David C Christiani
- Departments of Environmental Health and Epidemiology, Harvard School of Public Health, Boston, MA; Massachusetts General Hospital/Harvard Medical School, Boston, MA
|