1
Liehrmann A, Delannoy E, Launay-Avon A, Gilbault E, Loudet O, Castandet B, Rigaill G. DiffSegR: an RNA-seq data driven method for differential expression analysis using changepoint detection. NAR Genom Bioinform 2023; 5:lqad098. [PMID: 37954572 PMCID: PMC10632193 DOI: 10.1093/nargab/lqad098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 09/27/2023] [Accepted: 10/23/2023] [Indexed: 11/14/2023]
Abstract
To fully understand gene regulation, it is necessary to have a thorough understanding of both the transcriptome and the enzymatic and RNA-binding activities that shape it. While many RNA-Seq-based tools have been developed to analyze the transcriptome, most only consider the abundance of sequencing reads along annotated patterns (such as genes). These annotations are typically incomplete, leading to errors in the differential expression analysis. To address this issue, we present DiffSegR, an R package that enables the discovery of transcriptome-wide expression differences between two biological conditions using RNA-Seq data. DiffSegR does not require prior annotation and uses a multiple changepoint detection algorithm to identify the boundaries of differentially expressed regions in the per-base log2 fold change. In a few minutes of computation, DiffSegR correctly predicted the role of chloroplast ribonuclease Mini-III in rRNA maturation and of chloroplast ribonuclease PNPase in (3'/5')-degradation of rRNA, mRNA and tRNA precursors as well as in intron accumulation. We believe DiffSegR will benefit biologists working on transcriptomics, as it gives access to information from a layer of the transcriptome overlooked by the classical differential expression analysis pipelines widely used today. DiffSegR is available at https://aliehrmann.github.io/DiffSegR/index.html.
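The segmentation idea described in this abstract can be illustrated with a minimal sketch. It is not DiffSegR's implementation: it assumes a squared-error cost, a hand-picked penalty, a pseudo-count of 1 in the fold change, and a quadratic-time optimal-partitioning recursion, and the function names are hypothetical.

```python
import math


def log2_fold_change(coverage_a, coverage_b, pseudo_count=1.0):
    """Per-base log2 fold change between two coverage profiles.

    The pseudo-count is an assumption made here to avoid log(0).
    """
    return [
        math.log2((b + pseudo_count) / (a + pseudo_count))
        for a, b in zip(coverage_a, coverage_b)
    ]


def detect_changepoints(signal, penalty):
    """Return changepoints minimising squared error plus a penalty per change.

    A returned position p means that a new segment starts at index p.
    """
    n = len(signal)
    # Prefix sums of x and x^2 give the cost of any segment in O(1).
    sum_x = [0.0] * (n + 1)
    sum_x2 = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        sum_x[i + 1] = sum_x[i] + x
        sum_x2[i + 1] = sum_x2[i] + x * x

    def segment_cost(start, end):
        # Sum of squared deviations from the mean over signal[start:end].
        total = sum_x[end] - sum_x[start]
        return (sum_x2[end] - sum_x2[start]) - total * total / (end - start)

    # best[t] is the minimal penalised cost of signal[:t]. Starting from
    # -penalty means that k segments pay for k - 1 changepoints.
    best = [-penalty] + [0.0] * n
    last_start = [0] * (n + 1)
    for t in range(1, n + 1):
        best[t], last_start[t] = min(
            (best[s] + segment_cost(s, t) + penalty, s) for s in range(t)
        )

    # Walk back through the stored segment starts to recover the changepoints.
    changepoints = []
    t = n
    while t > 0:
        t = last_start[t]
        if t > 0:
            changepoints.append(t)
    return changepoints[::-1]
```

On a noise-free toy profile of twenty zeros, twenty twos, and twenty zeros, `detect_changepoints(profile, penalty=1.0)` returns `[20, 40]`, the two boundaries of the shifted region. On real data the penalty controls how many boundaries are reported and would need to be calibrated.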
Affiliation(s)
- Arnaud Liehrmann
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
  - Laboratoire de Mathématiques et de Modélisation d’Evry (LaMME), Université d’Evry-Val-d’Essonne, UMR CNRS 8071, ENSIIE, USC INRAE, Evry, 91037, France
- Etienne Delannoy
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
- Alexandra Launay-Avon
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
- Elodie Gilbault
  - Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin (IJPB), 78000, Versailles, France
- Olivier Loudet
  - Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin (IJPB), 78000, Versailles, France
- Benoît Castandet
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
- Guillem Rigaill
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
  - Laboratoire de Mathématiques et de Modélisation d’Evry (LaMME), Université d’Evry-Val-d’Essonne, UMR CNRS 8071, ENSIIE, USC INRAE, Evry, 91037, France
2
Liu H, Ma T, Liu C, Liu S. Causal Responsibility Division of Chronological Continuous Treatment Based on Change-Point Detection. Entropy (Basel, Switzerland) 2023; 25:1164. [PMID: 37628194 PMCID: PMC10453889 DOI: 10.3390/e25081164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 07/18/2023] [Accepted: 07/31/2023] [Indexed: 08/27/2023]
Abstract
This paper introduces a novel approach, called causal relation quantification, based on change-point detection to address the issue of harmonic responsibility division in power systems. The proposed method focuses on determining the causal effect of chronological continuous treatment, enabling the identification of crucial treatment intervals. Within each interval, three propensity-score-based algorithms are executed to assess their respective causal effects. By integrating the results from each interval, the overall causal effect of a chronological continuous treatment variable can be calculated. This calculated overall causal effect represents the causal responsibility of each harmonic customer. The effectiveness of the proposed method is evaluated through a simulation study and demonstrated in an empirical harmonic application. The results of the simulation study indicate that our method provides accurate and robust estimates, while the calculated results in the harmonic application align closely with the real-world scenario as verified by on-site investigations.
Affiliation(s)
- Hang Liu
  - School of Statistics, Southwestern University of Finance and Economics, Chengdu 611130, China
- Tiefeng Ma
  - School of Statistics, Southwestern University of Finance and Economics, Chengdu 611130, China
- Conan Liu
  - Business School, University of New South Wales, Sydney, NSW 2052, Australia
- Shuangzhe Liu
  - Faculty of Science and Technology, University of Canberra, Bruce, ACT 2617, Australia
4
Cappello L, Madrid Padilla OH, Palacios JA. Bayesian change point detection with spike and slab priors. J Comput Graph Stat 2023. [DOI: 10.1080/10618600.2023.2182312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Affiliation(s)
- Julia A. Palacios
  - Departments of Statistics and Biomedical Data Science, Stanford University
6
Quinn M, Chung A, Glass K. Automated selection of changepoints using empirical P-values and trimming. JAMIA Open 2022; 5:ooac090. [PMID: 36325307 PMCID: PMC9617685 DOI: 10.1093/jamiaopen/ooac090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 09/27/2022] [Accepted: 10/14/2022] [Indexed: 11/25/2022]
Abstract
OBJECTIVES: One challenge that arises when analyzing mobile health (mHealth) data is that updates to the proprietary algorithms that process these data can change apparent patterns. Since the timings of these updates are not publicized, an analytic approach is necessary to determine whether changes in mHealth data are due to lifestyle behaviors or algorithmic updates. Existing methods for identifying changepoints do not consider multiple types of changepoints, may require prespecifying the number of changepoints, and often involve nonintuitive parameters. We propose a novel approach, Automated Selection of Changepoints using Empirical P-values and Trimming (ASCEPT), to select an optimal set of changepoints in mHealth data. MATERIALS AND METHODS: ASCEPT involves 2 stages: (1) identification of a statistically significant set of changepoints from sequential iterations of a changepoint detection algorithm; and (2) trimming changepoints within linear and seasonal trends. ASCEPT is available at https://github.com/matthewquinn1/changepointSelect. RESULTS: We demonstrate ASCEPT's utility using real-world mHealth data collected through the Precision VISSTA study. We also demonstrate that ASCEPT outperforms a comparable method, circular binary segmentation, and illustrate the impact when adjusting for changepoints in downstream analysis. DISCUSSION: ASCEPT offers a practical approach for identifying changepoints in mHealth data that result from algorithmic updates. ASCEPT's only required parameters are a significance level and goodness-of-fit threshold, offering a more intuitive option compared to other approaches. CONCLUSION: ASCEPT provides an intuitive and useful way to identify which changepoints in mHealth data are likely the result of updates to the underlying algorithms that process the data.
Affiliation(s)
- Matthew Quinn
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Arlene Chung
  - Department of Biostatistics & Bioinformatics, Duke School of Medicine, Durham, North Carolina, USA
- Kimberly Glass
  - Corresponding Author: Kimberly Glass, Channing Division of Network Medicine, Brigham and Women’s Hospital, 181 Longwood Ave., Boston, MA, USA
8
Sen N. Investigation of Regression-Based Effect Size Methods Developed in Single-Subject Studies. Behav Modif 2021; 46:1346-1382. [PMID: 34727705 DOI: 10.1177/01454455211054018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
The purpose of this study is to provide a brief introduction to effect size calculation in single-subject design studies, including a description of nonparametric and regression-based effect sizes. We then focus the rest of the tutorial on common regression-based methods used to calculate effect size in single-subject experimental studies. We start by describing the differences among five regression-based methods (Gorsuch, White et al., Center et al., Allison and Gorman, Huitema and McKean). This is followed by an example that applies the five regression-based effect size methods to a sample data set. In this way, the question of how the values obtained from different effect size methods differ was answered. The specific regression models used in these five methods, and how these models can be obtained from the SPSS program, were shown. R² values obtained from these five methods were converted to Cohen's d values and compared in this study. The d values obtained from the same data set were estimated as 0.003, 0.357, 2.180, 3.470, and 2.108 for the Allison and Gorman, Gorsuch, White et al., Center et al., and Huitema and McKean methods, respectively. A brief description of selected statistical programs available to conduct regression-based methods was given.
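The abstract does not state which formula was used to convert R² to Cohen's d. A minimal sketch of one widely used conversion, d = 2r / sqrt(1 - r²) with r = sqrt(R²), is given here; treating it as the study's formula is an assumption, and the function name is hypothetical.

```python
import math


def r_squared_to_cohens_d(r_squared):
    """Convert R^2 to Cohen's d via d = 2r / sqrt(1 - r^2), with r = sqrt(R^2).

    Assumption: this is a common textbook conversion, not necessarily the
    one applied in the study summarised above.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    r = math.sqrt(r_squared)
    return 2.0 * r / math.sqrt(1.0 - r_squared)
```

Under this formula an R² of 0.5 corresponds to d = 2, and d grows without bound as R² approaches 1, which is one reason regression-based effect sizes from the same data set can differ so widely.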
Affiliation(s)
- Nihal Sen
  - Bolu Abant Izzet Baysal University, Turkey
9
Liehrmann A, Rigaill G, Hocking TD. Increased peak detection accuracy in over-dispersed ChIP-seq data with supervised segmentation models. BMC Bioinformatics 2021; 22:323. [PMID: 34126932 PMCID: PMC8201703 DOI: 10.1186/s12859-021-04221-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2021] [Accepted: 05/19/2021] [Indexed: 11/29/2022]
Abstract
BACKGROUND: Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. In order to realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions such as the Poisson distribution to model the noise in the count data. In this work we start from these natural assumptions and show that it is possible to improve upon them. RESULTS: Our comparisons on seven reference datasets of histone modifications (H3K36me3 & H3K4me3) suggest that natural assumptions are not always realistic under application conditions. We show that the unconstrained multiple changepoint detection model with alternative noise assumptions and supervised learning of the penalty parameter reduces the over-dispersion exhibited by count data. These models, implemented in the R package CROCS (https://github.com/aLiehrmann/CROCS), detect the peaks more accurately than algorithms which rely on natural assumptions. CONCLUSION: The segmentation models we propose can benefit researchers in the field of epigenetics by providing new high-quality peak prediction tracks for H3K36me3 and H3K4me3 histone modifications.
Affiliation(s)
- Arnaud Liehrmann
  - Institut des Sciences des Plantes de Paris-Saclay (IPS2), Université Paris-Saclay, Université Evry, CNRS, INRAE, 91405 Orsay, France
  - Laboratoire de Mathématiques et Modélisation d’Evry (LAMME), Université Paris-Saclay, Université Evry, CNRS, 91037 Evry, France
- Guillem Rigaill
  - Institut des Sciences des Plantes de Paris-Saclay (IPS2), Université Paris-Saclay, Université Evry, CNRS, INRAE, 91405 Orsay, France
  - Laboratoire de Mathématiques et Modélisation d’Evry (LAMME), Université Paris-Saclay, Université Evry, CNRS, 91037 Evry, France
- Toby Dylan Hocking
  - School of Informatics, Computing, and Cyber Systems (SICCS), Northern Arizona University, 86011 Flagstaff, AZ USA