1
|
Liang M, Koslovsky MD, Hébert ET, Kendzor DE, Businelle MS, Vannucci M. Bayesian continuous-time hidden Markov models with covariate selection for intensive longitudinal data with measurement error. Psychol Methods 2023; 28:880-894. [PMID: 34928674 PMCID: PMC9207158 DOI: 10.1037/met0000433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Intensive longitudinal data collected with ecological momentary assessment methods capture information on participants' behaviors, feelings, and environment in near real-time. While these methods can reduce recall biases typically present in survey data, they may still suffer from other biases commonly found in self-reported data (e.g., measurement error and social desirability bias). To accommodate potential biases, we develop a Bayesian hidden Markov model to simultaneously identify risk factors for subjects transitioning between discrete latent states as well as risk factors potentially associated with them misreporting their true behaviors. We use simulated data to demonstrate how ignoring potential measurement error can negatively affect variable selection performance and estimation accuracy. We apply our proposed model to smartphone-based ecological momentary assessment data collected within a randomized controlled trial that evaluated the impact of incentivizing abstinence from cigarette smoking among socioeconomically disadvantaged adults. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
Collapse
Affiliation(s)
| | | | - Emily T. Hébert
- Department of Health Promotion and Behavioral Sciences, University of Texas Health Science Center at Austin (UTHealth) School of Public Health
| | - Darla E. Kendzor
- Department of Family and Preventive Medicine, University of Oklahoma Health Sciences Center
| | - Michael S. Businelle
- Department of Family and Preventive Medicine, University of Oklahoma Health Sciences Center
| | | |
Collapse
|
2
|
Baldoni PL, Rashid NU, Ibrahim JG. Improved detection of epigenomic marks with mixed-effects hidden Markov models. Biometrics 2019; 75:1401-1413. [PMID: 31081192 PMCID: PMC6851437 DOI: 10.1111/biom.13083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2018] [Accepted: 05/03/2019] [Indexed: 11/30/2022]
Abstract
Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is a technique to detect genomic regions containing protein-DNA interaction, such as transcription factor binding sites or regions containing histone modifications. One goal of the analysis of ChIP-seq experiments is to identify genomic loci enriched for sequencing reads pertaining to DNA bound to the factor of interest. The accurate identification of such regions aids in the understanding of epigenomic marks and gene regulatory mechanisms. Given the reduction of massively parallel sequencing costs, methods to detect consensus regions of enrichment across multiple samples are of interest. Here, we present a statistical model to detect broad consensus regions of enrichment from ChIP-seq technical or biological replicates through a class of zero-inflated mixed-effects hidden Markov models. We show that the proposed model outperforms existing methods for consensus peak calling in common epigenomic marks by accounting for the excess zeros and sample-specific biases. We apply our method to data from the Encyclopedia of DNA Elements and Roadmap Epigenomics projects and also from an extensive simulation study.
Collapse
Affiliation(s)
- Pedro L. Baldoni
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
| | - Naim U. Rashid
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
| | - Joseph G. Ibrahim
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
| |
Collapse
|
3
|
Rashid NU, Li Q, Yeh JJ, Ibrahim JG. Modeling Between-Study Heterogeneity for Improved Replicability in Gene Signature Selection and Clinical Prediction. J Am Stat Assoc 2019; 115:1125-1138. [PMID: 33012902 DOI: 10.1080/01621459.2019.1671197] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Abstract
In the genomic era, the identification of gene signatures associated with disease is of significant interest. Such signatures are often used to predict clinical outcomes in new patients and aid clinical decision-making. However, recent studies have shown that gene signatures are often not replicable. This occurrence has practical implications regarding the generalizability and clinical applicability of such signatures. To improve replicability, we introduce a novel approach to select gene signatures from multiple datasets whose effects are consistently non-zero and account for between-study heterogeneity. We build our model upon some rank-based quantities, facilitating integration over different genomic datasets. A high dimensional penalized Generalized Linear Mixed Model (pGLMM) is used to select gene signatures and address data heterogeneity. We compare our method to some commonly used strategies that select gene signatures ignoring between-study heterogeneity. We provide asymptotic results justifying the performance of our method and demonstrate its advantage in the presence of heterogeneity through thorough simulation studies. Lastly, we motivate our method through a case study subtyping pancreatic cancer patients from four gene expression studies.
Collapse
Affiliation(s)
- Naim U Rashid
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, U.S.A.,Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, U.S.A
| | - Quefeng Li
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, U.S.A
| | - Jen Jen Yeh
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, U.S.A.,Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC, U.S.A.,Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC, U.S.A
| | - Joseph G Ibrahim
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, U.S.A
| |
Collapse
|
4
|
Shi X, Ma S, Huang Y. Promoting sign consistency in the cure model estimation and selection. Stat Methods Med Res 2019; 29:15-28. [PMID: 30600776 DOI: 10.1177/0962280218820356] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
In survival analysis, when a subset of subjects has extremely long survival, the two-part cure rate model has been commonly adopted. In the two-part model, the first part is for a binary response and describes the probability of cure. The second part is for a survival response and describes the probability of survival. Despite their intuitive interconnections, most of the existing works estimate the two parts without any constraint. The existing works on proportionality promote similarity in magnitudes (i.e. quantitative similarity) and can be too restrictive. In this study, for the two-part cure rate model, we propose imposing a sign-based penalty to promote similarity in signs (i.e. qualitative similarity). The proposed strategy can be more informative than those that neglect the two-part interconnections and be less restrictive than the existing proportionality works. Penalty is also imposed to select relevant variables and accommodate high-dimensional data. Numerical studies, including simulation and two data analyses, demonstrate the advantageous performance of the proposed approach.
Collapse
Affiliation(s)
- Xingjie Shi
- Department of Statistics, Nanjing University of Finance and Economics, Nanjing, Jiangsu, China
| | - Shuangge Ma
- Department of Biostatistics, Yale University, New Haven, CT, USA
| | - Yuan Huang
- Department of Biostatistics, University of Iowa, Iowa City, IA, USA
| |
Collapse
|
5
|
Choi H, Ghosh D, Qin Z. Computationally Tractable Multivariate HMM in Genome-Wide Mapping Studies. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2018; 1552:135-148. [PMID: 28224496 DOI: 10.1007/978-1-4939-6753-7_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Hidden Markov model (HMM) is widely used for modeling spatially correlated genomic data (series data). In genomics, datasets of this kind are generated from genome-wide mapping studies through high-throughput methods such as chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq). When multiple regulatory protein binding sites or related epigenetic modifications are mapped simultaneously, the correlation between data series can be incorporated into the latent variable inference in a multivariate form of HMM, potentially increasing the statistical power of signal detection. In this chapter, we review the challenges of multivariate HMMs and propose a computationally tractable method called sparsely correlated HMMs (scHMM). We illustrate the method and the scHMM package using an example mouse ChIP-seq dataset.
Collapse
Affiliation(s)
- Hyungwon Choi
- Saw Swee Hock School of Public Health, Tahir Foundation Building, National University of Singapore, Singapore, 117549, Singapore.
| | - Debashis Ghosh
- Department of Biostatstics & Informatics, University of Colorado Anschutz Medical Campus, University Park, State College, PA, USA
| | - Zhaohui Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA, USA
| |
Collapse
|
6
|
Richardson S, Tseng GC, Sun W. Statistical Methods in Integrative Genomics. ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION 2016; 3:181-209. [PMID: 27482531 PMCID: PMC4963036 DOI: 10.1146/annurev-statistics-041715-033506] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions.
Collapse
Affiliation(s)
- Sylvia Richardson
- MRC Biostatistics Unit, Cambridge Institute of Public Health, University of Cambridge, CB2 0SR, United Kingdom
| | - George C. Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261
| | - Wei Sun
- Department of Biostatistics, Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 27516
| |
Collapse
|