1
Miranda MF. A canonical polyadic tensor basis for fast Bayesian estimation of multi-subject brain activation patterns. Front Neuroinform 2024; 18:1399391. [PMID: 39188665 PMCID: PMC11345152 DOI: 10.3389/fninf.2024.1399391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 07/29/2024] [Indexed: 08/28/2024]
Abstract
Task-evoked functional magnetic resonance imaging studies, such as the Human Connectome Project (HCP), are a powerful tool for exploring how brain activity is influenced by cognitive tasks like memory retention, decision-making, and language processing. A fast Bayesian function-on-scalar model is proposed for estimating population-level activation maps linked to the working memory task. The model is based on the canonical polyadic (CP) tensor decomposition of coefficient maps obtained for each subject. This decomposition effectively yields a tensor basis capable of extracting both common features and subject-specific features from the coefficient maps. These subject-specific features, in turn, are modeled as a function of covariates of interest using a Bayesian model that accounts for the correlation of the CP-extracted features. The dimensionality reduction achieved with the tensor basis allows for a fast MCMC estimation of population-level activation maps. This model is applied to one hundred unrelated subjects from the HCP dataset, yielding significant insights into brain signatures associated with working memory.
Affiliation(s)
- Michelle F. Miranda
- Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
2
Zhang S, Morrison J, Sun T, Kowal DR, Greene E. Evaluating integration of letter fragments through contrast and spatially targeted masking. J Vis 2024; 24:9. [PMID: 38856981 PMCID: PMC11174100 DOI: 10.1167/jov.24.6.9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 04/02/2024] [Indexed: 06/11/2024]
Abstract
Four experiments were conducted to gain a better understanding of the visual mechanisms related to how integration of partial shape cues provides for recognition of the full shape. In each experiment, letters formed as outline contours were displayed as a sequence of adjacent segments (fragments), each visible during a 17-ms time frame. The first experiment varied the contrast of the fragments. There were substantial individual differences in contrast sensitivity, so stimulus displays in the masking experiments that followed were calibrated to the sensitivity of each participant. Masks were displayed either as patterns that filled the entire screen (full field) or as successive strips that were sliced from the pattern, each strip lying across the location of the letter fragment that had been shown a moment before. Contrast of masks was varied to be lighter or darker than the letter fragments. Full-field masks, whether light or dark, provided relatively little impairment of recognition, as was the case for mask strips that were lighter than the letter fragments. However, dark strip masks proved to be very effective, with the degree of recognition impairment becoming larger as mask contrast was increased. A final experiment found the strip masks to be most effective when they overlapped the location where the letter fragments had been shown a moment before. They became progressively less effective with increased spatial separation from that location. Results are discussed with extensive reference to potential brain mechanisms for integrating shape cues.
Affiliation(s)
- Sherry Zhang
- Department of Psychology, University of Southern California, Los Angeles, CA, USA
- Thomas Sun
- Department of Statistics, Rice University, Houston, TX, USA
- Daniel R Kowal
- Department of Statistics, Rice University, Houston, TX, USA
- Ernest Greene
- Department of Psychology, University of Southern California, Los Angeles, CA, USA
3
Shokoohi F, Stephens DA, Greenwood CMT. Identifying Differential Methylation in Cancer Epigenetics via a Bayesian Functional Regression Model. Biomolecules 2024; 14:639. [PMID: 38927043 PMCID: PMC11201607 DOI: 10.3390/biom14060639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 05/20/2024] [Accepted: 05/20/2024] [Indexed: 06/28/2024]
Abstract
DNA methylation plays an essential role in regulating gene activity, modulating disease risk, and determining treatment response. We can obtain insight into methylation patterns at a single-nucleotide level via next-generation sequencing technologies. However, complex features inherent in the data obtained via these technologies pose challenges beyond the typical big data problems. Identifying differentially methylated cytosines (dmc) or regions is one such challenge. We have developed DMCFB, an efficient dmc identification method based on Bayesian functional regression, to tackle these challenges. Using simulations, we establish that DMCFB outperforms current methods and results in better smoothing and efficient imputation. We analyzed a dataset of patients with acute promyelocytic leukemia and control samples. With DMCFB, we discovered many new dmcs and, more importantly, exhibited enhanced consistency of differential methylation within islands and their adjacent shores. Additionally, we detected differential methylation at more of the binding sites of the fused gene involved in this cancer.
Affiliation(s)
- Farhad Shokoohi
- Department of Mathematical Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
- David A. Stephens
- Department of Mathematics and Statistics, McGill University, Montreal, QC H3A 0B9, Canada
- Celia M. T. Greenwood
- Lady Davis Institute for Medical Research, Montreal, QC H3T 1E2, Canada
- Gerald Bronfman Department of Oncology, McGill University, Montreal, QC H4A 3T2, Canada
- Department of Epidemiology, Biostatistics & Occupational Health, McGill University, Montreal, QC H3A 1G1, Canada
4
Sergazinov R, Leroux A, Cui E, Crainiceanu C, Aurora RN, Punjabi NM, Gaynanova I. A case study of glucose levels during sleep using multilevel fast function on scalar regression inference. Biometrics 2023; 79:3873-3882. [PMID: 37189239 DOI: 10.1111/biom.13878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 04/26/2023] [Indexed: 05/17/2023]
Abstract
Continuous glucose monitors (CGMs) are increasingly used to measure blood glucose levels and provide information about the treatment and management of diabetes. Our motivating study contains CGM data during sleep for 174 study participants with type II diabetes mellitus measured at a 5-min frequency for an average of 10 nights. We aim to quantify the effects of diabetes medications and sleep apnea severity on glucose levels. Statistically, this is an inference question about the association between scalar covariates and functional responses observed at multiple visits (sleep periods). However, many characteristics of the data make analyses difficult, including (1) nonstationary within-period patterns; (2) substantial between-period heterogeneity, non-Gaussianity, and outliers; and (3) large dimensionality due to the number of study participants, sleep periods, and time points. For our analyses, we evaluate and compare two methods: fast univariate inference (FUI) and functional additive mixed models (FAMMs). We extend FUI and introduce a new approach for testing the hypotheses of no effect and time invariance of the covariates. We also highlight areas for further methodological development for FAMM. Our study reveals that (1) biguanide medication and sleep apnea severity significantly affect glucose trajectories during sleep and (2) the estimated effects are time invariant.
Affiliation(s)
- Renat Sergazinov
- Department of Statistics, Texas A&M University, College Station, Texas, USA
- Andrew Leroux
- Department of Biostatistics & Informatics, University of Colorado Anschutz Medical Campus, Colorado, USA
- Erjia Cui
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
- Ciprian Crainiceanu
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
- R Nisha Aurora
- New York University Grossman School of Medicine, New York, New York, USA
- Naresh M Punjabi
- Miller School of Medicine, University of Miami, Coral Gables, Florida, USA
- Irina Gaynanova
- Department of Statistics, Texas A&M University, College Station, Texas, USA
5
Shokoohi F, Khaniki SH. Uncovering Alterations in Cancer Epigenetics via Trans-Dimensional Markov Chain Monte Carlo and Hidden Markov Models. bioRxiv 2023:2023.06.15.545168. [PMID: 37398181 PMCID: PMC10312753 DOI: 10.1101/2023.06.15.545168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Epigenetic alterations are key drivers in the development and progression of cancer. Identifying differentially methylated cytosines (DMCs) in cancer samples is a crucial step toward understanding these changes. In this paper, we propose a trans-dimensional Markov chain Monte Carlo (TMCMC) approach that uses hidden Markov models (HMMs) with binomial emission, and bisulfite sequencing (BS-Seq) data, called DMCTHM, to identify DMCs in cancer epigenetic studies. We introduce the Expander-Collider penalty to tackle under- and over-estimation in TMCMC-HMMs. We address all known challenges inherent in BS-Seq data by introducing novel approaches for capturing functional patterns and autocorrelation structure of the data, as well as for handling missing values, multiple covariates, multiple comparisons, and family-wise errors. We demonstrate the effectiveness of DMCTHM through comprehensive simulation studies. The results show that our proposed method outperforms other competing methods in identifying DMCs. Notably, with DMCTHM, we uncovered new DMCs and genes in colorectal cancer that were significantly enriched in the Tp53 pathway.
Affiliation(s)
- Farhad Shokoohi
- Department of Mathematical Sciences, University of Nevada-Las Vegas, Las Vegas, NV 89154, USA
- Saeedeh Hajebi Khaniki
- Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
6
Stöcker A, Steyer L, Greven S. Functional additive models on manifolds of planar shapes and forms. J Comput Graph Stat 2023. [DOI: 10.1080/10618600.2023.2175687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Affiliation(s)
- Almond Stöcker
- School of Business and Economics, Humboldt-Universität zu Berlin
- Lisa Steyer
- School of Business and Economics, Humboldt-Universität zu Berlin
- Sonja Greven
- School of Business and Economics, Humboldt-Universität zu Berlin
7
Huo S, Morris JS, Zhu H. Ultra-Fast Approximate Inference Using Variational Functional Mixed Models. J Comput Graph Stat 2022; 32:353-365. [PMID: 37608921 PMCID: PMC10441618 DOI: 10.1080/10618600.2022.2107532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2021] [Accepted: 07/23/2022] [Indexed: 10/16/2022]
Abstract
While Bayesian functional mixed models have been shown to be effective for modeling functional data with various complex structures, their application to extremely high-dimensional data is limited due to computational challenges involved in posterior sampling. We introduce a new computational framework that enables ultra-fast approximate inference for high-dimensional data in functional form. This framework adopts a parsimonious basis to represent functional observations, which facilitates efficient compression and parallel computing in basis space. Instead of performing expensive Markov chain Monte Carlo sampling, we approximate the posterior distribution using variational Bayes and adopt a fast iterative algorithm to estimate parameters of the approximate distribution. Our approach facilitates a fast multiple testing procedure in basis space, which can be used to identify significant local regions that reflect differences across groups of samples. We perform two simulation studies to assess the performance of approximate inference, and demonstrate applications of the proposed approach by using a proteomic mass spectrometry dataset and a brain imaging dataset. Supplementary materials are available online.
Affiliation(s)
- Jeffrey S Morris
- Department of Biostatistics, Epidemiology and Informatics, Department of Statistics, University of Pennsylvania
8
Meyer MJ, Morris JS, Gazes RP, Coull BA. Ordinal probit functional outcome regression with application to computer-use behavior in rhesus monkeys. Ann Appl Stat 2022; 16:537-550. [DOI: 10.1214/21-aoas1513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Mark J. Meyer
- Department of Mathematics and Statistics, Georgetown University
- Jeffrey S. Morris
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania
- Regina Paxton Gazes
- Department of Psychology and Program in Animal Behavior, Bucknell University
- Brent A. Coull
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
9
Roy A, Ghosal S. Optimal Bayesian smoothing of functional observations over a large graph. J Multivariate Anal 2021. [DOI: 10.1016/j.jmva.2021.104876] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022]
10
Kowal DR, Bravo M, Leong H, Bui A, Griffin RJ, Ensor KB, Miranda ML. Bayesian variable selection for understanding mixtures in environmental exposures. Stat Med 2021; 40:4850-4871. [PMID: 34132416 PMCID: PMC8440371 DOI: 10.1002/sim.9099] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/16/2020] [Revised: 05/26/2021] [Accepted: 05/26/2021] [Indexed: 11/10/2022]
Abstract
Social and environmental stressors are crucial factors in child development. However, there exists a multitude of measurable social and environmental factors-the effects of which may be cumulative, interactive, or null. Using a comprehensive cohort of children in North Carolina, we study the impact of social and environmental variables on 4th end-of-grade exam scores in reading and mathematics. To identify the essential factors that predict these educational outcomes, we design new tools for Bayesian linear variable selection using decision analysis. We extract a predictive optimal subset of explanatory variables by coupling a loss function with a novel model-based penalization scheme, which leads to coherent Bayesian decision analysis and empirically improves variable selection, estimation, and prediction on simulated data. The Bayesian linear model propagates uncertainty quantification to all predictive evaluations, which is important for interpretable and robust model comparisons. These predictive comparisons are conducted out-of-sample with a customized approximation algorithm that avoids computationally intensive model refitting. We apply our variable selection techniques to identify the joint collection of social and environmental stressors-and their interactions-that offer clear and quantifiable improvements in prediction of reading and mathematics exam scores.
Affiliation(s)
- Mercedes Bravo
- Biostatistics and Epidemiology Division, RTI International, North Carolina, U.S.A
- Children’s Environmental Health Initiative, University of Notre Dame, Indiana, U.S.A
- Henry Leong
- Children’s Environmental Health Initiative, University of Notre Dame, Indiana, U.S.A
- Alexander Bui
- Department of Civil and Environmental Engineering, Rice University, Texas, U.S.A
- Robert J. Griffin
- Department of Civil and Environmental Engineering, Rice University, Texas, U.S.A
- Marie Lynn Miranda
- Children’s Environmental Health Initiative, University of Notre Dame, Indiana, U.S.A
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Indiana, U.S.A
11
Zemplenyi M, Meyer MJ, Cardenas A, Hivert MF, Rifas-Shiman SL, Gibson H, Kloog I, Schwartz J, Oken E, DeMeo DL, Gold DR, Coull BA. Function-on-function regression for the identification of epigenetic regions exhibiting windows of susceptibility to environmental exposures. Ann Appl Stat 2021; 15:1366-1385. [DOI: 10.1214/20-aoas1425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Michele Zemplenyi
- Department of Biostatistics, Harvard T. H. Chan School of Public Health
- Mark J. Meyer
- Department of Mathematics and Statistics, Georgetown University
- Andres Cardenas
- Division of Environmental Health Sciences, University of California, Berkeley
- Heike Gibson
- Department of Environmental Health, Harvard T. H. Chan School of Public Health
- Itai Kloog
- Department of Geography and Environmental Development, Ben-Gurion University
- Joel Schwartz
- Department of Environmental Health, Harvard T. H. Chan School of Public Health
- Emily Oken
- Department of Population Medicine, Harvard Medical School
- Dawn L. DeMeo
- Center for Chest Diseases, Brigham and Women’s Hospital
- Diane R. Gold
- Department of Environmental Health, Harvard T. H. Chan School of Public Health
- Brent A. Coull
- Department of Biostatistics, Harvard T. H. Chan School of Public Health
12
Moran KR, Dunson D, Wheeler MW, Herring AH. Bayesian joint modeling of chemical structure and dose response curves. Ann Appl Stat 2021; 15:1405-1430. [DOI: 10.1214/21-aoas1461] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Affiliation(s)
- David Dunson
- Department of Statistical Science, Duke University
- Matthew W. Wheeler
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences
13
Meyer MJ, Malloy EJ, Coull BA. Bayesian Wavelet-packet Historical Functional Linear Models. Statistics and Computing 2021; 31:14. [PMID: 36324372 PMCID: PMC9624484 DOI: 10.1007/s11222-020-09981-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2019] [Accepted: 10/21/2020] [Indexed: 06/16/2023]
Abstract
Historical Functional Linear Models (HFLM) quantify associations between a functional predictor and functional outcome where the predictor is an exposure variable that occurs before, or at least concurrently with, the outcome. Prior work on the HFLM has largely focused on estimation of a surface that represents a time-varying association between the functional outcome and the functional exposure. This existing work has employed frequentist and spline-based estimation methods, with little attention paid to formal inference or adjustment for multiple testing and no approaches that implement wavelet bases. In this work, we propose a new functional regression model that estimates the time-varying, lagged association between a functional outcome and a functional exposure. Building off of recently developed function-on-function regression methods, the model employs a novel use of the wavelet-packet decomposition of the exposure and outcome functions that allows us to strictly enforce the temporal ordering of exposure and outcome, which is not possible with existing wavelet-based functional models. Using a fully Bayesian approach, we conduct formal inference on the time-varying lagged association, while adjusting for multiple testing. We investigate the operating characteristics of our wavelet-packet HFLM and compare them to those of two existing estimation procedures in simulation. We also assess several inference techniques and use the model to analyze data on the impact of lagged exposure to particulate matter finer than 2.5 μm, or PM2.5, on heart rate variability in a cohort of journeyman boilermakers during the morning of a typical day's shift.
Affiliation(s)
- Mark J Meyer
- Department of Mathematics and Statistics, Georgetown University
- Brent A Coull
- Department of Biostatistics, Harvard T. H. Chan School of Public Health
14
Luo R, Qi X. Functional Regression for Densely Observed Data With Novel Regularization. J Comput Graph Stat 2020. [DOI: 10.1080/10618600.2020.1807994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Ruiyan Luo
- Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA
- Xin Qi
- Department of Mathematics and Statistics, Georgia State University, Atlanta, GA
15
Cao G, Wang S, Wang L. Estimation and inference for functional linear regression models with partially varying regression coefficients. Stat (Int Stat Inst) 2020. [DOI: 10.1002/sta4.286] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Affiliation(s)
- Guanqun Cao
- Department of Mathematics and Statistics, Auburn University, Auburn, AL 36849, USA
- Shuoyang Wang
- Department of Mathematics and Statistics, Auburn University, Auburn, AL 36849, USA
- Lily Wang
- Department of Statistics, Iowa State University, Ames, IA 50011, USA
16
Liu Y, Li M, Morris JS. Function-on-scalar quantile regression with application to mass spectrometry proteomics data. Ann Appl Stat 2020; 14:521-541. [PMID: 37981999 PMCID: PMC10655915 DOI: 10.1214/19-aoas1319] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2023]
Abstract
Mass spectrometry proteomics, characterized by spiky, spatially heterogeneous functional data, can be used to identify potential cancer biomarkers. Existing mass spectrometry analyses utilize mean regression to detect spectral regions that are differentially expressed across groups. However, given the inter-patient heterogeneity that is a key hallmark of cancer, many biomarkers are only present at aberrant levels for a subset of, not all, cancer samples. Differences in these biomarkers can easily be missed by mean regression, but might be more easily detected by quantile-based approaches. Thus, we propose a unified Bayesian framework to perform quantile regression on functional responses. Our approach utilizes an asymmetric Laplace working likelihood, represents the functional coefficients with basis representations which enable borrowing of strength from nearby locations, and places a global-local shrinkage prior on the basis coefficients to achieve adaptive regularization. Different types of basis transform and continuous shrinkage priors can be used in our framework. A scalable Gibbs sampler is developed to generate posterior samples that can be used to perform Bayesian estimation and inference while accounting for multiple testing. Our framework performs quantile regression and coefficient regularization in a unified manner, allowing them to inform each other and leading to improvement in performance over competing methods as demonstrated by simulation studies. We also introduce an adjustment procedure to the model to improve its frequentist properties of posterior inference. We apply our model to identify proteomic biomarkers of pancreatic cancer that are differentially expressed for a subset of cancer patients compared to the normal controls, which were missed by previous mean-regression based approaches. Supplementary materials for this article are available online.
17
Kowal DR, Bourgeois DC. Bayesian Function-on-Scalars Regression for High-Dimensional Data. J Comput Graph Stat 2020. [DOI: 10.1080/10618600.2019.1710837] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/25/2022]
18
Noh H, Choi T, Park J, Chung Y. Bayesian latent factor regression for multivariate functional data with variable selection. J Korean Stat Soc 2020. [DOI: 10.1007/s42952-019-00044-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
19
Zhu H, Chen K, Luo X, Yuan Y, Wang JL. FMEM: Functional Mixed Effects Models for Longitudinal Functional Responses. Stat Sin 2019; 29:2007-2033. [PMID: 31745381 DOI: 10.5705/ss.202017.0505] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
The aim of this paper is to conduct a systematic and theoretical analysis of estimation and inference for a class of functional mixed effects models (FMEM). Such FMEMs consist of fixed effects that characterize the association between longitudinal functional responses and covariates of interest and random effects that capture the spatial-temporal correlations of longitudinal functional responses. We propose local linear estimates of refined fixed effect functions and establish their weak convergence along with a simultaneous confidence band for each fixed-effect function. We propose a global test for the linear hypotheses of varying coefficient functions and derive the associated asymptotic distribution under the null hypothesis and the asymptotic power under the alternative hypothesis. We also establish the convergence rates of the estimated spatial-temporal covariance operators and their associated eigenvalues and eigenfunctions. We conduct extensive simulations and apply our method to a white-matter fiber data set from a national database for autism research to examine the finite-sample performance of the proposed estimation and inference procedures.
Affiliation(s)
- Hongtu Zhu
- The University of Texas MD Anderson Cancer Center
- Ying Yuan
- The University of Texas MD Anderson Cancer Center; University of Pittsburgh; Statistics & Decision Sciences; University of California at Davis
20
Wu R, Wang B. Coherent mortality forecasting by the weighted multilevel functional principal component approach. J Appl Stat 2019. [DOI: 10.1080/02664763.2019.1572718] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
Affiliation(s)
- Ruhao Wu
- Department of Mathematics, University of Leicester, Leicester, UK
- Bo Wang
- Department of Mathematics, University of Leicester, Leicester, UK
21
Yang H, Baladandayuthapani V, Rao AUK, Morris JS. Quantile Function on Scalar Regression Analysis for Distributional Data. J Am Stat Assoc 2019; 115:90-106. [PMID: 32981991 PMCID: PMC7517594 DOI: 10.1080/01621459.2019.1609969] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/07/2018] [Revised: 03/08/2019] [Accepted: 04/07/2019] [Indexed: 02/05/2023]
Abstract
Radiomics involves the study of tumor images to identify quantitative markers explaining cancer heterogeneity. The predominant approach is to extract hundreds to thousands of image features, including histogram features comprised of summaries of the marginal distribution of pixel intensities, which leads to multiple testing problems and can miss out on insights not contained in the selected features. In this paper, we present methods to model the entire marginal distribution of pixel intensities via the quantile function as functional data, regressed on a set of demographic, clinical, and genetic predictors to investigate their effects on imaging-based cancer heterogeneity. We call this approach quantile functional regression, regressing subject-specific marginal distributions across repeated measurements on a set of covariates, allowing us to assess which covariates are associated with the distribution in a global sense, as well as to identify distributional features characterizing these differences, including mean, variance, skewness, heavy-tailedness, and various upper and lower quantiles. To account for smoothness in the quantile functions, account for intrafunctional correlation, and gain statistical power, we introduce custom basis functions we call quantlets that are sparse, regularized, near-lossless, and empirically defined, adapting to the features of a given data set and containing a Gaussian subspace so non-Gaussianness can be assessed. We fit this model using a Bayesian framework that uses nonlinear shrinkage of quantlet coefficients to regularize the functional regression coefficients and provides fully Bayesian inference after fitting a Markov chain Monte Carlo.
We demonstrate the benefit of the basis space modeling through simulation studies, and apply the method to a magnetic resonance imaging (MRI)-based radiomic dataset from glioblastoma multiforme to relate imaging-based quantile functions to various demographic, clinical, and genetic predictors, finding specific differences in tumor pixel intensity distribution between males and females and between tumors with and without DDIT3 mutations.
Affiliation(s)
- Hojin Yang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
- Arvind U K Rao
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
- Jeffrey S Morris
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
22
Kowal DR, Matteson DS, Ruppert D. Dynamic shrinkage processes. J R Stat Soc Series B Stat Methodol 2019. [DOI: 10.1111/rssb.12325] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 11/27/2022]
23
|
Affiliation(s)
- Ruiyan Luo
- Division of Epidemiology and Biostatistics, School of Public Health, Georgia State University, Atlanta, GA
- Xin Qi
- Department of Mathematics and Statistics, Georgia State University, Atlanta, GA
|
24
|
Zhu H, Versace F, Cinciripini PM, Rausch P, Morris JS. Robust and Gaussian spatial functional regression models for analysis of event-related potentials. Neuroimage 2018; 181:501-512. [PMID: 30057352 DOI: 10.1016/j.neuroimage.2018.07.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2018] [Revised: 06/01/2018] [Accepted: 07/03/2018] [Indexed: 10/28/2022] Open
Abstract
Event-related potentials (ERPs) summarize electrophysiological brain response to specific stimuli. They can be considered as correlated functions of time with both spatial correlation across electrodes and nested correlations within subjects. Commonly used analytical methods for ERPs often focus on pre-determined extracted components and/or ignore the correlation among electrodes or subjects, which can miss important insights, and tend to be sensitive to outlying subjects, time points or electrodes. Motivated by ERP data in a smoking cessation study, we introduce a Bayesian spatial functional regression framework that models the entire ERPs as spatially correlated functional responses and the stimulus types as covariates. This novel framework relies on mixed models to characterize the effects of stimuli while simultaneously accounting for the multilevel correlation structure. The spatial correlation among the ERP profiles is captured through basis-space Matérn assumptions that allow either separable or nonseparable spatial correlations over time. We induce both adaptive regularization over time and spatial smoothness across electrodes via a correlated normal-exponential-gamma (CNEG) prior on the fixed effect coefficient functions. Our proposed framework includes both Gaussian models as well as robust models using heavier-tailed distributions to make the regression automatically robust to outliers. We introduce predictive methods to select among Gaussian vs. robust models and models with separable vs. non-separable spatiotemporal correlation structures. Our proposed analysis produces global tests for stimuli effects across entire time (or time-frequency) and electrode domains, plus multiplicity-adjusted pointwise inference based on experiment-wise error rate or false discovery rate to flag spatiotemporal (or spatio-temporal-frequency) regions that characterize stimuli differences, and can also produce inference for any prespecified waveform components. 
Our analysis of the smoking cessation ERP data set reveals numerous effects across different types of visual stimuli.
Affiliation(s)
- Hongxiao Zhu
- Department of Statistics, Virginia Tech, Blacksburg, VA, USA
- Francesco Versace
- Department of Behavioral Science, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
- Paul M Cinciripini
- Department of Behavioral Science, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
- Philip Rausch
- Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
- Jeffrey S Morris
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
|
25
|
Lee W, Miranda MF, Rausch P, Baladandayuthapani V, Fazio M, Downs JC, Morris JS. Bayesian Semiparametric Functional Mixed Models for Serially Correlated Functional Data, with Application to Glaucoma Data. J Am Stat Assoc 2018; 114:495-513. [PMID: 31235987 PMCID: PMC6590079 DOI: 10.1080/01621459.2018.1476242] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2016] [Revised: 12/01/2017] [Indexed: 10/14/2022]
Abstract
Glaucoma, a leading cause of blindness, is characterized by optic nerve damage related to intraocular pressure (IOP), but its full etiology is unknown. Researchers at UAB have devised a custom device to measure scleral strain continuously around the eye under fixed levels of IOP, which here is used to assess how strain varies around the posterior pole, with IOP, and across glaucoma risk factors such as age. The hypothesis is that scleral strain decreases with age, which could alter biomechanics of the optic nerve head and cause damage that could eventually lead to glaucoma. To evaluate this hypothesis, we adapted Bayesian Functional Mixed Models to model these complex data consisting of correlated functions on spherical scleral surface, with nonparametric age effects allowed to vary in magnitude and smoothness across the scleral surface, multi-level random effect functions to capture within-subject correlation, and functional growth curve terms to capture serial correlation across IOPs that can vary around the scleral surface. Our method yields fully Bayesian inference on the scleral surface or any aggregation or transformation thereof, and reveals interesting insights into the biomechanical etiology of glaucoma. The general modeling framework described is very flexible and applicable to many complex, high-dimensional functional data.
Affiliation(s)
- Wonyul Lee
- Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, TX 77230
- Michelle F Miranda
- Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, TX 77230
- Philip Rausch
- Department of Psychology, Institut für Psychologie, Humboldt-Universität zu Berlin, Germany
- Massimo Fazio
- Department of Ophthalmology, University of Alabama at Birmingham, Birmingham, AL 35294
- J Crawford Downs
- Department of Ophthalmology, University of Alabama at Birmingham, Birmingham, AL 35294
- Jeffrey S Morris
- Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, TX 77230
|
26
|
Sun X, Du P, Wang X, Ma P. Optimal Penalized Function-on-Function Regression under a Reproducing Kernel Hilbert Space Framework. J Am Stat Assoc 2018; 113:1601-1611. [PMID: 30799886 DOI: 10.1080/01621459.2017.1356320] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
Many scientific studies collect data where the response and predictor variables are both functions of time, location, or some other covariate. Understanding the relationship between these functional variables is a common goal in these studies. Motivated by two real-life examples, we present in this paper a function-on-function regression model that can be used to analyze this kind of functional data. Our estimator of the 2D coefficient function is the optimizer of a form of penalized least squares where the penalty enforces a certain level of smoothness on the estimator. Our first result is the Representer Theorem, which states that the exact optimizer of the penalized least squares actually resides in a data-adaptive finite-dimensional subspace although the optimization problem is defined on a function space of infinite dimensions. This theorem then allows easy incorporation of Gaussian quadrature into the optimization of the penalized least squares, which can be carried out through standard numerical procedures. We also show that our estimator achieves the minimax convergence rate in mean prediction under the framework of function-on-function regression. Extensive simulation studies demonstrate the numerical advantages of our method over the existing ones, where a sparse functional data extension is also introduced. The proposed method is then applied to our motivating examples of the benchmark Canadian weather data and a histone regulation study.
Affiliation(s)
- Pang Du
- Department of Statistics, Virginia Tech
- Xiao Wang
- Department of Statistics, Purdue University
- Ping Ma
- Department of Statistics, University of Georgia
|
27
|
Zhu H, Caspers P, Morris JS, Wu X, Müller R. A Unified Analysis of Structured Sonar-terrain Data using Bayesian Functional Mixed Models. Technometrics 2018; 60:112-123. [PMID: 29749977 DOI: 10.1080/00401706.2016.1274681] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
Sonar emits pulses of sound and uses the reflected echoes to gain information about target objects. It offers a low cost, complementary sensing modality for small robotic platforms. While existing analytical approaches often assume independence across echoes, real sonar data can have more complicated structures due to device setup or experimental design. In this paper, we consider sonar echo data collected from multiple terrain substrates with a dual-channel sonar head. Our goals are to identify the differential sonar responses to terrains and study the effectiveness of this dual-channel design in discriminating targets. We describe a unified analytical framework that achieves these goals rigorously, simultaneously, and automatically. The analysis was done by treating the echo envelope signals as functional responses and the terrain/channel information as covariates in a functional regression setting. We adopt functional mixed models that facilitate the estimation of terrain and channel effects while capturing the complex hierarchical structure in data. This unified analytical framework incorporates both Gaussian models and robust models. We fit the models using a full Bayesian approach, which enables us to perform multiple inferential tasks under the same modeling framework, including selecting models, estimating the effects of interest, identifying significant local regions, discriminating terrain types, and describing the discriminatory power of local regions. Our analysis of the sonar-terrain data identifies time regions that reflect differential sonar responses to terrains. The discriminant analysis suggests that a multi- or dual-channel design achieves target identification performance comparable with or better than a single-channel design.
Affiliation(s)
- Hongxiao Zhu
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061
- Philip Caspers
- Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061
- Jeffrey S Morris
- The University of Texas M.D. Anderson Cancer Center, Houston, TX 77230
- Xiaowei Wu
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061
- Rolf Müller
- Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061
|
28
|
|
29
|
Zhu H, Morris JS, Wei F, Cox DD. Multivariate functional response regression, with application to fluorescence spectroscopy in a cervical pre-cancer study. Comput Stat Data Anal 2017; 111:88-101. [PMID: 29051679 PMCID: PMC5642121 DOI: 10.1016/j.csda.2017.02.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
Many scientific studies measure different types of high-dimensional signals or images from the same subject, producing multivariate functional data. These functional measurements carry different types of information about the scientific process, and a joint analysis that integrates information across them may provide new insights into the underlying mechanism for the phenomenon under study. Motivated by fluorescence spectroscopy data in a cervical pre-cancer study, a multivariate functional response regression model is proposed, which treats multivariate functional observations as responses and a common set of covariates as predictors. This novel modeling framework simultaneously accounts for correlations between functional variables and potential multi-level structures in data that are induced by experimental design. The model is fitted by performing a two-stage linear transformation: a basis expansion of each functional variable followed by principal component analysis of the concatenated basis coefficients. This transformation effectively reduces the intra- and inter-function correlations and facilitates fast and convenient calculation. A fully Bayesian approach is adopted to sample the model parameters in the transformed space, and posterior inference is performed after inverse-transforming the regression coefficients back to the original data domain. The proposed approach produces functional tests that flag local regions on the functional effects, while controlling the overall experiment-wise error rate or false discovery rate. It also enables functional discriminant analysis through posterior predictive calculation. Analysis of the fluorescence spectroscopy data reveals local regions with differential expressions across the pre-cancer and normal samples. These regions may serve as biomarkers for prognosis and disease assessment.
Affiliation(s)
- Hongxiao Zhu
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061
- Jeffrey S Morris
- The University of Texas MD Anderson Cancer Center, Houston, TX 77230
- Fengrong Wei
- Department of Mathematics, University of West Georgia, Carrollton, GA 30118
- Dennis D Cox
- Department of Statistics, Rice University, Houston, TX 77005
|
30
|
Morris JS, Baladandayuthapani V. Statistical Contributions to Bioinformatics: Design, Modeling, Structure Learning, and Integration. STAT MODEL 2017; 17:245-289. [PMID: 29129969 PMCID: PMC5679480 DOI: 10.1177/1471082x17698255] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
The advent of high-throughput multi-platform genomics technologies providing whole-genome molecular summaries of biological samples has revolutionized biomedical research. These technologies yield highly structured big data, whose analysis poses significant quantitative challenges. The field of Bioinformatics has emerged to deal with these challenges, and is comprised of many quantitative and biological scientists working together to effectively process these data and extract the treasure trove of information they contain. Statisticians, with their deep understanding of variability and uncertainty quantification, play a key role in these efforts. In this article, we attempt to summarize some of the key contributions of statisticians to bioinformatics, focusing on four areas: (1) experimental design and reproducibility, (2) preprocessing and feature extraction, (3) unified modeling, and (4) structure learning and integration. In each of these areas, we highlight some key contributions and try to elucidate the key statistical principles underlying these methods and approaches. Our goals are to demonstrate major ways in which statisticians have contributed to bioinformatics, to encourage statisticians to get involved early in methods development as new technologies emerge, and to stimulate future methodological work based on the statistical principles elucidated in this article and utilizing all available information to uncover new biological insights.
Affiliation(s)
- Jeffrey S Morris
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
|
31
|
Abstract
Researchers are increasingly interested in regression models for functional data. This article discusses a comprehensive framework for additive (mixed) models for functional responses and/or functional covariates based on the guiding principle of reframing functional regression in terms of corresponding models for scalar data, allowing the adaptation of a large body of existing methods for these novel tasks. The framework encompasses many existing as well as new models. It includes regression for ‘generalized’ functional data, mean regression, quantile regression as well as generalized additive models for location, shape and scale (GAMLSS) for functional data. It admits many flexible linear, smooth or interaction terms of scalar and functional covariates as well as (functional) random effects and allows flexible choices of bases (particularly splines and functional principal components) and corresponding penalties for each term. It covers functional data observed on common (dense) or curve-specific (sparse) grids. Penalized-likelihood-based and gradient-boosting-based inference for these models are implemented in R packages refund and FDboost, respectively. We also discuss identifiability and computational complexity for the functional regression models covered. A running example on a longitudinal multiple sclerosis imaging study serves to illustrate the flexibility and utility of the proposed model class. Reproducible code for this case study is made available online.
Affiliation(s)
- Sonja Greven
- Department of Statistics, Ludwig-Maximilians-Universität München, Germany
- Fabian Scheipl
- Department of Statistics, Ludwig-Maximilians-Universität München, Germany
|
32
|
Affiliation(s)
- Sonja Greven
- Department of Statistics, Ludwig-Maximilians-Universität München, Germany
- Fabian Scheipl
- Department of Statistics, Ludwig-Maximilians-Universität München, Germany
|
33
|
Abstract
In this article, Greven and Scheipl describe an impressively general framework for performing functional regression that builds upon the generalized additive modeling framework. Over the past number of years, my collaborators and I have also been developing a general framework for functional regression, functional mixed models, which shares many similarities with this framework, but has many differences as well. In this discussion, I compare and contrast these two frameworks in the hope of illuminating characteristics of each, highlighting their respective strengths and weaknesses, and providing recommendations regarding the settings in which each approach might be preferable.
Affiliation(s)
- Jeffrey S Morris
- The University of Texas, MD Anderson Cancer Center, Unit 1411, PO Box 301402, Houston, TX 77230-1402
|
34
|
Pomann GM, Staicu AM, Lobaton EJ, Mejia AF, Dewey BE, Reich DS, Sweeney EM, Shinohara RT. A lag functional linear model for prediction of magnetization transfer ratio in multiple sclerosis lesions. Ann Appl Stat 2016; 10:2325-2348. [PMID: 35791328 PMCID: PMC9252322 DOI: 10.1214/16-aoas981] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/14/2023]
Abstract
We propose a lag functional linear model to predict a response using multiple functional predictors observed at discrete grids with noise. Two procedures are proposed to estimate the regression parameter functions: (1) an approach that ensures smoothness for each value of time using generalized cross-validation; and (2) a global smoothing approach using a restricted maximum likelihood framework. Numerical studies are presented to analyze predictive accuracy in many realistic scenarios. The methods are employed to estimate a magnetic resonance imaging (MRI)-based measure of tissue damage (the magnetization transfer ratio, or MTR) in multiple sclerosis (MS) lesions, a disease that causes damage to the myelin sheaths around axons in the central nervous system. Our method of estimation of MTR within lesions is useful retrospectively in research applications where MTR was not acquired, as well as in clinical practice settings where acquiring MTR is not currently part of the standard of care. The model facilitates the use of commonly acquired imaging modalities to estimate MTR within lesions, and outperforms cross-sectional models that do not account for temporal patterns of lesion development and repair.
Affiliation(s)
- Gina-Maria Pomann
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina 27710, USA
- Ana-Maria Staicu
- Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, USA
- Edgar J Lobaton
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, North Carolina 27695, USA
- Amanda F Mejia
- Department of Statistics, Indiana University Bloomington, Bloomington, Indiana 47405, USA
- Blake E Dewey
- National Institute of Neurological Disorders and Stroke NIH, Bethesda, Maryland 20892, USA
- Daniel S Reich
- National Institute of Neurological Disorders and Stroke NIH, Bethesda, Maryland 20892, USA
- Russell T Shinohara
- Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatisti Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
|
35
|
Luo R, Qi X, Wang Y. Functional wavelet regression for linear function-on-function models. Electron J Stat 2016. [DOI: 10.1214/16-ejs1204] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
36
|
Lee W, Morris JS. Identification of differentially methylated loci using wavelet-based functional mixed models. Bioinformatics 2015; 32:664-72. [PMID: 26559505 DOI: 10.1093/bioinformatics/btv659] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2014] [Accepted: 11/05/2015] [Indexed: 12/26/2022] Open
Abstract
MOTIVATION: DNA methylation is a key epigenetic modification that can modulate gene expression. Over the past decade, many studies have focused on profiling DNA methylation and investigating its alterations in complex diseases such as cancer. While early studies were mostly restricted to CpG islands or promoter regions, recent findings indicate that many important DNA methylation changes can occur in other regions and that DNA methylation needs to be examined on a genome-wide scale. In this article, we apply the wavelet-based functional mixed model methodology to analyze high-throughput methylation data for identifying differentially methylated loci across the genome. Contrary to many commonly used methods that model probes independently, this framework accommodates spatial correlations across the genome through basis function modeling as well as correlations between samples through functional random effects, which allows it to be applied to many different settings and potentially leads to more power in detection of differential methylation.
RESULTS: We applied this framework to three different high-dimensional methylation data sets (CpG Shore data, THREE data and NIH Roadmap Epigenomics data), studied previously in other works. A simulation study based on CpG Shore data suggested that in terms of detection of differentially methylated loci, this modeling approach using wavelets outperforms analogous approaches modeling the loci as independent. For the THREE data, the method suggests newly detected regions of differential methylation, which were not reported in the original study.
AVAILABILITY AND IMPLEMENTATION: Automated software called WFMM is available at https://biostatistics.mdanderson.org/SoftwareDownload. CpG Shore data is available at http://rafalab.dfci.harvard.edu. NIH Roadmap Epigenomics data is available at http://compbio.mit.edu/roadmap.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
CONTACT: jefmorris@mdanderson.org.
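To make the idea of basis-space modeling concrete, the following minimal sketch transforms methylation-like profiles with an orthonormal Haar wavelet transform, computes a group effect coefficient by coefficient, and maps it back to the locus domain. It does not use the WFMM software or reproduce the functional mixed model: the Haar basis, the simple difference of group means, the function names, and the toy profiles are all assumptions made for illustration.

```python
import math

def haar_forward(signal):
    """Orthonormal Haar transform of a length-2^k sequence.

    Returns [approximation, coarsest details, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    s = math.sqrt(2.0)
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / s for a, b in pairs] + details
        approx = [(a + b) / s for a, b in pairs]
    return approx + details

def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding the signal from coarse to fine."""
    s = math.sqrt(2.0)
    approx = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        finer = []
        for a, d in zip(approx, detail):
            finer.extend([(a + d) / s, (a - d) / s])
        approx = finer
    return approx

def mean_curve(curves):
    """Pointwise mean of equally long sequences."""
    return [sum(col) / len(col) for col in zip(*curves)]

# Hypothetical profiles over 8 adjacent loci; group B is elevated at loci 4-5.
group_a = [[0.2, 0.2, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1],
           [0.3, 0.2, 0.2, 0.3, 0.3, 0.2, 0.2, 0.1]]
group_b = [[0.2, 0.3, 0.3, 0.2, 0.7, 0.8, 0.1, 0.2],
           [0.3, 0.2, 0.2, 0.3, 0.8, 0.7, 0.2, 0.1]]

# Estimate the group effect in the wavelet domain, one coefficient at a time.
coef_a = [haar_forward(c) for c in group_a]
coef_b = [haar_forward(c) for c in group_b]
coef_diff = [b - a for a, b in zip(mean_curve(coef_a), mean_curve(coef_b))]

# Map the coefficient-space effect back to the locus domain.
effect = haar_inverse(coef_diff)
```

Because the transform is linear and orthonormal, `effect` equals the difference of group mean profiles computed directly on the loci. The practical appeal of working in coefficient space is that shrinkage or independence assumptions can be applied per coefficient while the implied effect on the original domain still reflects correlation between neighboring loci.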
Affiliation(s)
- Wonyul Lee
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
- Jeffrey S Morris
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
|