2
Agniel D, Hejblum BP. Variance component score test for time-course gene set analysis of longitudinal RNA-seq data. Biostatistics 2018; 18:589-604. [PMID: 28334305 DOI: 10.1093/biostatistics/kxx005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/18/2016] [Accepted: 01/04/2017] [Indexed: 01/28/2023]
Abstract
As gene expression measurement technology is shifting from microarrays to sequencing, the statistical tools available for their analysis must be adapted since RNA-seq data are measured as counts. It has been proposed to model RNA-seq counts as continuous variables using nonparametric regression to account for their inherent heteroscedasticity. In this vein, we propose tcgsaseq, a principled, model-free, and efficient method for detecting longitudinal changes in RNA-seq gene sets defined a priori. The method identifies those gene sets whose expression varies over time, based on an original variance component score test accounting for both covariates and heteroscedasticity without assuming any specific parametric distribution for the (transformed) counts. We demonstrate that despite the presence of a nonparametric component, our test statistic has a simple form and limiting distribution, and both may be computed quickly. A permutation version of the test is additionally proposed for very small sample sizes. Applied to both simulated data and two real datasets, tcgsaseq is shown to exhibit very good statistical properties, with an increase in stability and power when compared to state-of-the-art methods ROAST (rotation gene set testing), edgeR, and DESeq2, which can fail to control the type I error under certain realistic settings. We have made the method available for the community in the R package tcgsaseq.
Affiliation(s)
- Denis Agniel
  - Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck St, Boston, MA 02115, USA
- Boris P Hejblum
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - University of Bordeaux, ISPED, INSERM U1219, INRIA SISTM, 146 rue Léo Saignat, 33076 Bordeaux, France
  - Vaccine Research Institute, Créteil, France
3
Palluzzi F, Ferrari R, Graziano F, Novelli V, Rossi G, Galimberti D, Rainero I, Benussi L, Nacmias B, Bruni AC, Cusi D, Salvi E, Borroni B, Grassi M. A novel network analysis approach reveals DNA damage, oxidative stress and calcium/cAMP homeostasis-associated biomarkers in frontotemporal dementia. PLoS One 2017; 12:e0185797. [PMID: 29020091 PMCID: PMC5636111 DOI: 10.1371/journal.pone.0185797] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/26/2017] [Accepted: 09/19/2017] [Indexed: 01/04/2023]
Abstract
Frontotemporal Dementia (FTD) is the form of neurodegenerative dementia with the highest prevalence after Alzheimer’s disease, equally distributed in men and women. It includes several variants, generally characterized by behavioural instability and language impairments. Although a few Mendelian genes (MAPT, GRN, and C9orf72) have been associated with the FTD phenotype, in most cases there is only evidence of multiple risk loci with relatively small effect size. To date, there are no comprehensive studies describing FTD at the molecular level, highlighting possible genetic interactions and signalling pathways at the origin of FTD-associated neurodegeneration. In this study, we designed a broad FTD genetic interaction map of the Italian population, through a novel network-based approach modelled on the concepts of disease-relevance and interaction perturbation, combining Steiner tree search and Structural Equation Model (SEM) analysis. Our results show a strong connection between Calcium/cAMP metabolism, oxidative stress-induced Serine/Threonine kinase activation, and postsynaptic membrane potentiation, suggesting a possible combination of neuronal damage and loss of neuroprotection, leading to cell death. In our model, Calcium/cAMP homeostasis and energetic metabolism impairments are primary causes of loss of neuroprotection and neural cell damage, respectively. Secondly, the altered postsynaptic membrane potentiation, due to the activation of stress-induced Serine/Threonine kinases, leads to neurodegeneration. Our study investigates the molecular underpinnings of these processes, evidencing key genes and gene interactions that may account for a significant fraction of unexplained FTD aetiology. We emphasized the key molecular actors in these processes, proposing them as novel FTD biomarkers that could be crucial for further epidemiological and molecular studies.
Affiliation(s)
- Fernando Palluzzi
  - Department of Brain and Behavioural Sciences, Medical and Genomic Statistics Unit, University of Pavia, Pavia, Italy
  - * E-mail:
- Raffaele Ferrari
  - Department of Molecular Neuroscience, Institute of Neurology, University College London (UCL), London, United Kingdom
- Francesca Graziano
  - Department of Brain and Behavioural Sciences, Medical and Genomic Statistics Unit, University of Pavia, Pavia, Italy
- Valeria Novelli
  - Department of Genetics, Fondazione Policlinico A. Gemelli, Roma, Italy
- Giacomina Rossi
  - Division of Neurology V and Neuropathology, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milano, Italy
- Daniela Galimberti
  - Department of Neurological Sciences, Dino Ferrari Institute, University of Milan, Milano, Italy
- Innocenzo Rainero
  - Department of Neuroscience, Neurology I, University of Torino and Città della Salute e della Scienza di Torino, Torino, Italy
- Luisa Benussi
  - Molecular Markers Laboratory, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
- Benedetta Nacmias
  - Department of Neuroscience, Psychology, Drug Research and Child Health, University of Florence, Firenze, Italy
- Amalia C. Bruni
  - Neurogenetic Regional Centre ASPCZ Lamezia Terme, Lamezia Terme (CZ), Italy
- Daniele Cusi
  - Department of Health Sciences, University of Milan at San Paolo Hospital, Milano, Italy
  - Institute of Biomedical Technologies, Italian National Research Council, Milano, Italy
- Erika Salvi
  - Institute of Biomedical Technologies, Italian National Research Council, Milano, Italy
- Barbara Borroni
  - Department of Medical Sciences, Neurology Clinic, University of Brescia, Brescia, Italy
- Mario Grassi
  - Department of Brain and Behavioural Sciences, Medical and Genomic Statistics Unit, University of Pavia, Pavia, Italy
4
Hejblum BP, Skinner J, Thiébaut R. Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLoS Comput Biol 2015; 11:e1004310. [PMID: 26111374 PMCID: PMC4482329 DOI: 10.1371/journal.pcbi.1004310] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 11/18/2014] [Accepted: 04/30/2015] [Indexed: 01/13/2023]
Abstract
Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates if the time patterns of gene sets significantly differ according to these conditions. The interest of the method is illustrated by its application to two real-life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial, TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing as well as a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA identified 4 gene sets that were also linked with the influenza vaccine, although previous analyses had found them to be associated with the pneumococcal vaccine only. In our simulation study, TcGSA exhibits good statistical properties, and an increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available for the community through an R package.
Gene set analysis methods use prior biological knowledge to analyze gene expression data. This prior knowledge takes the form of predefined groups of genes, linked through their biological function. Gene set analysis methods have been successfully applied in cross-sectional studies, their results being more sensitive and interpretable than those of methods investigating genomic data one gene at a time. The time-course gene set analysis (TcGSA) introduced here is an extension of such gene set analysis to longitudinal data. This method identifies a priori defined groups of genes whose expression is not stable over time, taking into account the potential heterogeneity between patients and between genes. When biological conditions are compared, it identifies the gene sets that have different expression dynamics according to these conditions. Data from 2 studies are analyzed: data from an HIV therapeutic vaccine trial, and data from a recent study on influenza and pneumococcal vaccines. In both cases, TcGSA provided new insights thanks to its increased sensitivity compared to standard approaches. Those results highlight the benefits of the TcGSA method for analyzing gene expression dynamics.
Affiliation(s)
- Boris P. Hejblum
  - Univ. Bordeaux, ISPED, Centre INSERM U897-Epidemiologie-Biostatistique, F-33000 Bordeaux, France
  - INSERM, ISPED, Centre INSERM U897-Epidemiologie-Biostatistique, F-33000 Bordeaux, France
  - INRIA, Team SISTM, F-33000 Bordeaux, France
  - Vaccine Research Institute-VRI, Hôpital Henri Mondor, Créteil, France
  - Baylor Institute for Immunology Research, Dallas, Texas, United States of America
- Jason Skinner
  - Vaccine Research Institute-VRI, Hôpital Henri Mondor, Créteil, France
  - Baylor Institute for Immunology Research, Dallas, Texas, United States of America
- Rodolphe Thiébaut
  - Univ. Bordeaux, ISPED, Centre INSERM U897-Epidemiologie-Biostatistique, F-33000 Bordeaux, France
  - INSERM, ISPED, Centre INSERM U897-Epidemiologie-Biostatistique, F-33000 Bordeaux, France
  - INRIA, Team SISTM, F-33000 Bordeaux, France
  - Vaccine Research Institute-VRI, Hôpital Henri Mondor, Créteil, France
  - Baylor Institute for Immunology Research, Dallas, Texas, United States of America
  - * E-mail: