1
Miranda MF. A canonical polyadic tensor basis for fast Bayesian estimation of multi-subject brain activation patterns. Front Neuroinform 2024; 18:1399391. [PMID: 39188665 PMCID: PMC11345152 DOI: 10.3389/fninf.2024.1399391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 07/29/2024] [Indexed: 08/28/2024]
Abstract
Task-evoked functional magnetic resonance imaging studies, such as the Human Connectome Project (HCP), are a powerful tool for exploring how brain activity is influenced by cognitive tasks like memory retention, decision-making, and language processing. A fast Bayesian function-on-scalar model is proposed for estimating population-level activation maps linked to the working memory task. The model is based on the canonical polyadic (CP) tensor decomposition of coefficient maps obtained for each subject. This decomposition effectively yields a tensor basis capable of extracting both common features and subject-specific features from the coefficient maps. These subject-specific features, in turn, are modeled as a function of covariates of interest using a Bayesian model that accounts for the correlation of the CP-extracted features. The dimensionality reduction achieved with the tensor basis allows for a fast MCMC estimation of population-level activation maps. This model is applied to one hundred unrelated subjects from the HCP dataset, yielding significant insights into brain signatures associated with working memory.
Affiliation(s)
- Michelle F. Miranda
- Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
2
Zhou F, He K, Wang K, Xu Y, Ni Y. Functional Bayesian networks for discovering causality from multivariate functional data. Biometrics 2023; 79:3279-3293. [PMID: 37635676 PMCID: PMC10840881 DOI: 10.1111/biom.13922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Accepted: 08/10/2023] [Indexed: 08/29/2023]
Abstract
Multivariate functional data arise in a wide range of applications. One fundamental task is to understand the causal relationships among these functional objects of interest. In this paper, we develop a novel Bayesian network (BN) model for multivariate functional data where conditional independencies and causal structure are encoded by a directed acyclic graph. Specifically, we allow the functional objects to deviate from Gaussian processes, which is the key to unique causal structure identification even when the functions are measured with noise. A fully Bayesian framework is designed to infer the functional BN model with natural uncertainty quantification through posterior summaries. Simulation studies and real data examples demonstrate the practical utility of the proposed model.
Affiliation(s)
- Fangting Zhou
- Department of Statistics, Texas A&M University, College Station, Texas, USA
- Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China, Beijing, China
- Kejun He
- Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China, Beijing, China
- Kunbo Wang
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, USA
- Yanxun Xu
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, USA
- Yang Ni
- Department of Statistics, Texas A&M University, College Station, Texas, USA
3
Huo S, Morris JS, Zhu H. Ultra-Fast Approximate Inference Using Variational Functional Mixed Models. J Comput Graph Stat 2022; 32:353-365. [PMID: 37608921 PMCID: PMC10441618 DOI: 10.1080/10618600.2022.2107532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2021] [Accepted: 07/23/2022] [Indexed: 10/16/2022]
Abstract
While Bayesian functional mixed models have been shown to be effective for modeling functional data with various complex structures, their application to extremely high-dimensional data is limited due to computational challenges involved in posterior sampling. We introduce a new computational framework that enables ultra-fast approximate inference for high-dimensional data in functional form. This framework adopts a parsimonious basis to represent functional observations, which facilitates efficient compression and parallel computing in basis space. Instead of performing expensive Markov chain Monte Carlo sampling, we approximate the posterior distribution using variational Bayes and adopt a fast iterative algorithm to estimate parameters of the approximate distribution. Our approach facilitates a fast multiple testing procedure in basis space, which can be used to identify significant local regions that reflect differences across groups of samples. We perform two simulation studies to assess the performance of approximate inference, and demonstrate applications of the proposed approach by using a proteomic mass spectrometry dataset and a brain imaging dataset. Supplementary materials are available online.
Affiliation(s)
- Jeffrey S Morris
- Department of Biostatistics, Epidemiology and Informatics, Department of Statistics, University of Pennsylvania
4
Meyer MJ, Morris JS, Gazes RP, Coull BA. Ordinal probit functional outcome regression with application to computer-use behavior in rhesus monkeys. Ann Appl Stat 2022; 16:537-550. [DOI: 10.1214/21-aoas1513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Mark J. Meyer
- Department of Mathematics and Statistics, Georgetown University
- Jeffrey S. Morris
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania
- Regina Paxton Gazes
- Department of Psychology and Program in Animal Behavior, Bucknell University
- Brent A. Coull
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
5
Cui E, Leroux A, Smirnova E, Crainiceanu CM. Fast Univariate Inference for Longitudinal Functional Models. J Comput Graph Stat 2022; 31:219-230. [PMID: 35712524 PMCID: PMC9197085 DOI: 10.1080/10618600.2021.1950006] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/03/2023]
Abstract
We propose fast univariate inferential approaches for longitudinal Gaussian and non-Gaussian functional data. The approach consists of three steps: (1) fit massively univariate pointwise mixed effects models; (2) apply any smoother along the functional domain; and (3) obtain joint confidence bands using analytic approaches for Gaussian data or a bootstrap of study participants for non-Gaussian data. Methods are motivated by two applications: (1) Diffusion Tensor Imaging (DTI) measured at multiple visits along the corpus callosum of multiple sclerosis (MS) patients; and (2) physical activity data measured by body-worn accelerometers for multiple days. An extensive simulation study indicates that model fitting and inference are accurate and much faster than existing approaches. Moreover, the proposed approach was the only one that was computationally feasible for the physical activity data application. Methods are accompanied by R software, though the method is "read-and-use", as it can be implemented by any analyst who is familiar with mixed effects model software.
Affiliation(s)
- Erjia Cui
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, USA
- Andrew Leroux
- Department of Biostatistics and Informatics, University of Colorado, USA
6
Meyer MJ, Malloy EJ, Coull BA. Bayesian Wavelet-packet Historical Functional Linear Models. Stat Comput 2021; 31:14. [PMID: 36324372 PMCID: PMC9624484 DOI: 10.1007/s11222-020-09981-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2019] [Accepted: 10/21/2020] [Indexed: 06/16/2023]
Abstract
Historical Functional Linear Models (HFLM) quantify associations between a functional predictor and functional outcome where the predictor is an exposure variable that occurs before, or at least concurrently with, the outcome. Prior work on the HFLM has largely focused on estimation of a surface that represents a time-varying association between the functional outcome and the functional exposure. This existing work has employed frequentist and spline-based estimation methods, with little attention paid to formal inference or adjustment for multiple testing and no approaches that implement wavelet bases. In this work, we propose a new functional regression model that estimates the time-varying, lagged association between a functional outcome and a functional exposure. Building off of recently developed function-on-function regression methods, the model employs a novel use of the wavelet-packet decomposition of the exposure and outcome functions that allows us to strictly enforce the temporal ordering of exposure and outcome, which is not possible with existing wavelet-based functional models. Using a fully Bayesian approach, we conduct formal inference on the time-varying lagged association, while adjusting for multiple testing. We investigate the operating characteristics of our wavelet-packet HFLM and compare them to those of two existing estimation procedures in simulation. We also assess several inference techniques and use the model to analyze data on the impact of lagged exposure to particulate matter finer than 2.5 μm, or PM2.5, on heart rate variability in a cohort of journeyman boilermakers during the morning of a typical day's shift.
Affiliation(s)
- Mark J Meyer
- Department of Mathematics and Statistics, Georgetown University
- Brent A Coull
- Department of Biostatistics, Harvard T. H. Chan School of Public Health
7
Maiti T, Safikhani A, Zhong P. On uncertainty estimation in functional linear mixed models. Can J Stat 2020. [DOI: 10.1002/cjs.11585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Tapabrata Maiti
- Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, U.S.A.
- Abolfazl Safikhani
- Department of Statistics, University of Florida, Gainesville, FL 32611, U.S.A.
- Ping‐Shou Zhong
- Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL 60607, U.S.A.
8
Shan G, Hou Y, Liu B. Bayesian robust estimation of partially functional linear regression models using heavy-tailed distributions. Comput Stat 2020. [DOI: 10.1007/s00180-020-00975-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
9
Cao C, Wang Y, Jin S, Chen Y. Improved likelihood ratio tests in a measurement error model for multivariate replicated data. Commun Stat Theory Methods 2020. [DOI: 10.1080/03610926.2018.1554125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Chunzheng Cao
- School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, China
- Yahui Wang
- School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, China
- Shaobo Jin
- Department of Statistics, Uppsala University, Uppsala, Sweden
- Yunjie Chen
- School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, China
10
Kowal DR, Bourgeois DC. Bayesian Function-on-Scalars Regression for High-Dimensional Data. J Comput Graph Stat 2020. [DOI: 10.1080/10618600.2019.1710837] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/25/2022]
11
Noh H, Choi T, Park J, Chung Y. Bayesian latent factor regression for multivariate functional data with variable selection. J Korean Stat Soc 2020. [DOI: 10.1007/s42952-019-00044-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
12
Zhu H, Chen K, Luo X, Yuan Y, Wang JL. FMEM: Functional Mixed Effects Models for Longitudinal Functional Responses. Stat Sin 2019; 29:2007-2033. [PMID: 31745381 DOI: 10.5705/ss.202017.0505] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
The aim of this paper is to conduct a systematic and theoretical analysis of estimation and inference for a class of functional mixed effects models (FMEM). Such FMEMs consist of fixed effects that characterize the association between longitudinal functional responses and covariates of interest and random effects that capture the spatial-temporal correlations of longitudinal functional responses. We propose local linear estimates of refined fixed effect functions and establish their weak convergence along with a simultaneous confidence band for each fixed-effect function. We propose a global test for the linear hypotheses of varying coefficient functions and derive the associated asymptotic distribution under the null hypothesis and the asymptotic power under the alternative hypothesis. We also establish the convergence rates of the estimated spatial-temporal covariance operators and their associated eigenvalues and eigenfunctions. We conduct extensive simulations and apply our method to a white-matter fiber data set from a national database for autism research to examine the finite-sample performance of the proposed estimation and inference procedures.
Affiliation(s)
- Hongtu Zhu
- The University of Texas MD Anderson Cancer Center
- Ying Yuan
- The University of Texas MD Anderson Cancer Center; University of Pittsburgh; Statistics & Decision Sciences; University of California at Davis
13
Yang H, Baladandayuthapani V, Rao AUK, Morris JS. Quantile Function on Scalar Regression Analysis for Distributional Data. J Am Stat Assoc 2019; 115:90-106. [PMID: 32981991 PMCID: PMC7517594 DOI: 10.1080/01621459.2019.1609969] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/07/2018] [Revised: 03/08/2019] [Accepted: 04/07/2019] [Indexed: 02/05/2023]
Abstract
Radiomics involves the study of tumor images to identify quantitative markers explaining cancer heterogeneity. The predominant approach is to extract hundreds to thousands of image features, including histogram features comprised of summaries of the marginal distribution of pixel intensities, which leads to multiple testing problems and can miss out on insights not contained in the selected features. In this paper, we present methods to model the entire marginal distribution of pixel intensities via the quantile function as functional data, regressed on a set of demographic, clinical, and genetic predictors to investigate their effects on imaging-based cancer heterogeneity. We call this approach quantile functional regression, regressing subject-specific marginal distributions across repeated measurements on a set of covariates, allowing us to assess which covariates are associated with the distribution in a global sense, as well as to identify distributional features characterizing these differences, including mean, variance, skewness, heavy-tailedness, and various upper and lower quantiles. To account for smoothness in the quantile functions, account for intrafunctional correlation, and gain statistical power, we introduce custom basis functions we call quantlets that are sparse, regularized, near-lossless, and empirically defined, adapting to the features of a given data set and containing a Gaussian subspace so non-Gaussianness can be assessed. We fit this model using a Bayesian framework that uses nonlinear shrinkage of quantlet coefficients to regularize the functional regression coefficients and provides fully Bayesian inference after fitting a Markov chain Monte Carlo. We demonstrate the benefit of the basis space modeling through simulation studies, and apply the method to a magnetic resonance imaging (MRI)-based radiomic dataset from Glioblastoma Multiforme to relate imaging-based quantile functions to various demographic, clinical, and genetic predictors, finding specific differences in tumor pixel intensity distribution between males and females and between tumors with and without DDIT3 mutations.
Affiliation(s)
- Hojin Yang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
- Arvind U K Rao
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
- Jeffrey S Morris
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
14
Zhu H, Versace F, Cinciripini PM, Rausch P, Morris JS. Robust and Gaussian spatial functional regression models for analysis of event-related potentials. Neuroimage 2018; 181:501-512. [PMID: 30057352 DOI: 10.1016/j.neuroimage.2018.07.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/17/2018] [Revised: 06/01/2018] [Accepted: 07/03/2018] [Indexed: 10/28/2022]
Abstract
Event-related potentials (ERPs) summarize electrophysiological brain response to specific stimuli. They can be considered as correlated functions of time with both spatial correlation across electrodes and nested correlations within subjects. Commonly used analytical methods for ERPs often focus on pre-determined extracted components and/or ignore the correlation among electrodes or subjects, which can miss important insights, and tend to be sensitive to outlying subjects, time points or electrodes. Motivated by ERP data in a smoking cessation study, we introduce a Bayesian spatial functional regression framework that models the entire ERPs as spatially correlated functional responses and the stimulus types as covariates. This novel framework relies on mixed models to characterize the effects of stimuli while simultaneously accounting for the multilevel correlation structure. The spatial correlation among the ERP profiles is captured through basis-space Matérn assumptions that allow either separable or nonseparable spatial correlations over time. We induce both adaptive regularization over time and spatial smoothness across electrodes via a correlated normal-exponential-gamma (CNEG) prior on the fixed effect coefficient functions. Our proposed framework includes both Gaussian models as well as robust models using heavier-tailed distributions to make the regression automatically robust to outliers. We introduce predictive methods to select among Gaussian vs. robust models and models with separable vs. non-separable spatiotemporal correlation structures. Our proposed analysis produces global tests for stimuli effects across entire time (or time-frequency) and electrode domains, plus multiplicity-adjusted pointwise inference based on experiment-wise error rate or false discovery rate to flag spatiotemporal (or spatio-temporal-frequency) regions that characterize stimuli differences, and can also produce inference for any prespecified waveform components. 
Our analysis of the smoking cessation ERP data set reveals numerous effects across different types of visual stimuli.
Affiliation(s)
- Hongxiao Zhu
- Department of Statistics, Virginia Tech, Blacksburg, VA, USA.
- Francesco Versace
- Department of Behavioral Science, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
- Paul M Cinciripini
- Department of Behavioral Science, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
- Philip Rausch
- Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
- Jeffrey S Morris
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
15
Cao C, Shi JQ, Lee Y. Robust functional regression model for marginal mean and subject-specific inferences. Stat Methods Med Res 2018; 27:3236-3254. [PMID: 29298601 DOI: 10.1177/0962280217695346] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]
Abstract
We introduce flexible robust functional regression models, using various heavy-tailed processes, including a Student t-process. We propose efficient algorithms in estimating parameters for the marginal mean inferences and in predicting conditional means as well as interpolation and extrapolation for the subject-specific inferences. We develop bootstrap prediction intervals (PIs) for conditional mean curves. Numerical studies show that the proposed model provides a robust approach against data contamination or distribution misspecification, and the proposed PIs maintain the nominal confidence levels. A real data application is presented as an illustrative example.
Affiliation(s)
- Chunzheng Cao
- 1 School of Mathematics and Statistics, Nanjing University of Information Science and Technology, China
- 2 Department of Statistics, Seoul National University, Korea
- Jian Qing Shi
- 3 School of Mathematics and Statistics, Newcastle University, UK
- Youngjo Lee
- 2 Department of Statistics, Seoul National University, Korea
16
Lee W, Miranda MF, Rausch P, Baladandayuthapani V, Fazio M, Downs JC, Morris JS. Bayesian Semiparametric Functional Mixed Models for Serially Correlated Functional Data, with Application to Glaucoma Data. J Am Stat Assoc 2018; 114:495-513. [PMID: 31235987 PMCID: PMC6590079 DOI: 10.1080/01621459.2018.1476242] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/01/2016] [Revised: 12/01/2017] [Indexed: 10/14/2022]
Abstract
Glaucoma, a leading cause of blindness, is characterized by optic nerve damage related to intraocular pressure (IOP), but its full etiology is unknown. Researchers at UAB have devised a custom device to measure scleral strain continuously around the eye under fixed levels of IOP, which here is used to assess how strain varies around the posterior pole, with IOP, and across glaucoma risk factors such as age. The hypothesis is that scleral strain decreases with age, which could alter biomechanics of the optic nerve head and cause damage that could eventually lead to glaucoma. To evaluate this hypothesis, we adapted Bayesian Functional Mixed Models to model these complex data consisting of correlated functions on spherical scleral surface, with nonparametric age effects allowed to vary in magnitude and smoothness across the scleral surface, multi-level random effect functions to capture within-subject correlation, and functional growth curve terms to capture serial correlation across IOPs that can vary around the scleral surface. Our method yields fully Bayesian inference on the scleral surface or any aggregation or transformation thereof, and reveals interesting insights into the biomechanical etiology of glaucoma. The general modeling framework described is very flexible and applicable to many complex, high-dimensional functional data.
Affiliation(s)
- Wonyul Lee
- Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, TX 77230
- Michelle F Miranda
- Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, TX 77230
- Philip Rausch
- Department of Psychology, Institut für Psychologie, Humboldt-Universität zu Berlin, Germany
- Massimo Fazio
- Department of Ophthalmology, University of Alabama at Birmingham, Birmingham, AL 35294
- J Crawford Downs
- Department of Ophthalmology, University of Alabama at Birmingham, Birmingham, AL 35294
- Jeffrey S Morris
- Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, TX 77230
17
Luo S, Song R, Styner M, Gilmore JH, Zhu H. FSEM: Functional Structural Equation Models for Twin Functional Data. J Am Stat Assoc 2018; 114:344-357. [PMID: 31057192 DOI: 10.1080/01621459.2017.1407773] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
Abstract
The aim of this paper is to develop a novel class of functional structural equation models (FSEMs) for dissecting functional genetic and environmental effects on twin functional data, while characterizing the varying association between functional data and covariates of interest. We propose a three-stage estimation procedure to estimate varying coefficient functions for various covariates (e.g., gender) as well as three covariance operators for the genetic and environmental effects. We develop an inference procedure based on weighted likelihood ratio statistics to test the genetic/environmental effect at either a fixed location or a compact region. We also systematically carry out the theoretical analysis of the estimated varying functions, the weighted likelihood ratio statistics, and the estimated covariance operators. We conduct extensive Monte Carlo simulations to examine the finite-sample performance of the estimation and inference procedures. We apply the proposed FSEM to quantify the degree of genetic and environmental effects on twin white-matter tracts obtained from the UNC early brain development study.
Affiliation(s)
- S Luo
- Departments of Statistics, North Carolina State University, Cary, North Carolina, USA
- R Song
- Departments of Statistics, North Carolina State University, Cary, North Carolina, USA
- M Styner
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- J H Gilmore
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- H Zhu
- Department of Biostatistics, and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA; Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
18
Zhu H, Caspers P, Morris JS, Wu X, Müller R. A Unified Analysis of Structured Sonar-terrain Data using Bayesian Functional Mixed Models. Technometrics 2018; 60:112-123. [PMID: 29749977 DOI: 10.1080/00401706.2016.1274681] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/19/2022]
Abstract
Sonar emits pulses of sound and uses the reflected echoes to gain information about target objects. It offers a low cost, complementary sensing modality for small robotic platforms. While existing analytical approaches often assume independence across echoes, real sonar data can have more complicated structures due to device setup or experimental design. In this paper, we consider sonar echo data collected from multiple terrain substrates with a dual-channel sonar head. Our goals are to identify the differential sonar responses to terrains and study the effectiveness of this dual-channel design in discriminating targets. We describe a unified analytical framework that achieves these goals rigorously, simultaneously, and automatically. The analysis was done by treating the echo envelope signals as functional responses and the terrain/channel information as covariates in a functional regression setting. We adopt functional mixed models that facilitate the estimation of terrain and channel effects while capturing the complex hierarchical structure in data. This unified analytical framework incorporates both Gaussian models and robust models. We fit the models using a full Bayesian approach, which enables us to perform multiple inferential tasks under the same modeling framework, including selecting models, estimating the effects of interest, identifying significant local regions, discriminating terrain types, and describing the discriminatory power of local regions. Our analysis of the sonar-terrain data identifies time regions that reflect differential sonar responses to terrains. The discriminant analysis suggests that a multi- or dual-channel design achieves target identification performance comparable with or better than a single-channel design.
Affiliation(s)
- Hongxiao Zhu
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061
- Philip Caspers
- Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061
- Jeffrey S Morris
- The University of Texas M.D. Anderson Cancer Center, Houston, TX 77230
- Xiaowei Wu
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061
- Rolf Müller
- Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061
19
Leroux A, Xiao L, Crainiceanu C, Checkley W. Dynamic prediction in functional concurrent regression with an application to child growth. Stat Med 2018; 37:1376-1388. [PMID: 29230836 PMCID: PMC5847461 DOI: 10.1002/sim.7582] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/12/2016] [Revised: 11/07/2017] [Accepted: 11/12/2017] [Indexed: 12/24/2022]
Abstract
In many studies, it is of interest to predict the future trajectory of subjects based on their historical data, referred to as dynamic prediction. Mixed effects models have traditionally been used for dynamic prediction. However, the commonly used random intercept and slope model is often not sufficiently flexible for modeling subject-specific trajectories. In addition, there may be useful exposures/predictors of interest that are measured concurrently with the outcome, complicating dynamic prediction. To address these problems, we propose a dynamic functional concurrent regression model to handle the case where both the functional response and the functional predictors are irregularly measured. Currently, such a model cannot be fit by existing software. We apply the model to dynamically predict children's length conditional on prior length, weight, and baseline covariates. Inference on model parameters and subject-specific trajectories is conducted using the mixed effects representation of the proposed model. An extensive simulation study shows that the dynamic functional regression model provides more accurate estimation and inference than existing methods. Methods are supported by fast, flexible, open source software that uses heavily tested smoothing techniques.
Affiliation(s)
- Andrew Leroux
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Luo Xiao
- Department of Statistics, North Carolina State University, Raleigh, NC 27606, USA
| | | | | |
|
20
|
Park SY, Staicu AM, Xiao L, Crainiceanu CM. Simple fixed-effects inference for complex functional models. Biostatistics 2018; 19:137-152. [PMID: 29036541 PMCID: PMC5862370 DOI: 10.1093/biostatistics/kxx026] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/16/2016] [Revised: 04/09/2017] [Accepted: 05/07/2017] [Indexed: 11/14/2022] Open
Abstract
We propose simple inferential approaches for the fixed effects in complex functional mixed effects models. We estimate the fixed effects under the independence of functional residuals assumption and then bootstrap independent units (e.g. subjects) to conduct inference on the fixed effects parameters. Simulations show excellent coverage probability of the confidence intervals and size of tests for the fixed effects model parameters. Methods are motivated by and applied to the Baltimore Longitudinal Study of Aging, though they are applicable to other studies that collect correlated functional data.
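The two-step recipe in this abstract (estimate the fixed effects under working independence, then bootstrap whole subjects) can be sketched as follows. This is a minimal illustration, not the authors' software: the pointwise least-squares estimator, the percentile intervals, the function names `fit_fixed_effects` and `subject_bootstrap`, and the toy data are all assumptions made for the example.

```python
import numpy as np

def fit_fixed_effects(Y, X):
    """Pointwise least squares under working independence.

    Y: (N, T) curves on a common grid; X: (N, p) design matrix.
    Returns B: (p, T), one coefficient function per covariate.
    """
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B

def subject_bootstrap(Y, X, subj, n_boot=200, seed=0):
    """Resample subjects (not individual curves) with replacement and refit."""
    rng = np.random.default_rng(seed)
    ids = np.unique(subj)
    rows_by_id = {i: np.flatnonzero(subj == i) for i in ids}
    draws = np.empty((n_boot, X.shape[1], Y.shape[1]))
    for b in range(n_boot):
        sampled = rng.choice(ids, size=ids.size, replace=True)
        rows = np.concatenate([rows_by_id[i] for i in sampled])
        draws[b] = fit_fixed_effects(Y[rows], X[rows])
    return draws

# Toy data (hypothetical): 30 subjects, 3 correlated curves each, 25 grid points.
rng = np.random.default_rng(1)
n_subj, n_rep, T = 30, 3, 25
subj = np.repeat(np.arange(n_subj), n_rep)
t = np.linspace(0, 1, T)
X = np.column_stack([np.ones(subj.size), rng.normal(size=subj.size)])
beta = np.vstack([np.sin(2 * np.pi * t), t])
u = rng.normal(size=n_subj)[subj][:, None]  # subject effect induces within-subject correlation
Y = X @ beta + u + 0.3 * rng.normal(size=(subj.size, T))

B_hat = fit_fixed_effects(Y, X)
draws = subject_bootstrap(Y, X, subj)
lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)  # pointwise 95% intervals
```

Resampling at the subject level keeps all curves from one subject together, which is what lets the intervals reflect within-subject correlation even though the point estimate ignores it.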
Affiliation(s)
- So Young Park
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
| | - Ana-Maria Staicu
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
| | - Luo Xiao
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
| | | |
|
21
|
Grigsby MR, Di J, Leroux A, Zipunnikov V, Xiao L, Crainiceanu C, Checkley W. Novel metrics for growth model selection. Emerg Themes Epidemiol 2018; 15:4. [PMID: 29483933 PMCID: PMC5824542 DOI: 10.1186/s12982-018-0072-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/16/2017] [Accepted: 02/14/2018] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND: Literature surrounding the statistical modeling of childhood growth data involves a diverse set of potential models from which investigators can choose. However, the lack of a comprehensive framework for comparing non-nested models leads to difficulty in assessing model performance. This paper proposes a framework for comparing non-nested growth models using novel metrics of predictive accuracy based on modifications of the mean squared error criterion. METHODS: Three metrics were created: normalized, age-adjusted, and weighted mean squared error (MSE). Predictive performance metrics were used to compare linear mixed effects models and functional regression models. Prediction accuracy was assessed by partitioning the observed data into training and test datasets. This partitioning was constructed to assess prediction accuracy for backward (i.e., early growth), forward (i.e., late growth), in-range, and new-individual prediction. Analyses were done with height measurements from 215 Peruvian children with data spanning from near birth to 2 years of age. RESULTS: Functional models outperformed linear mixed effects models in all scenarios tested. In particular, prediction errors for functional concurrent regression (FCR) and functional principal component analysis models were approximately 6% lower when compared to linear mixed effects models. When we weighted subject-specific MSEs according to subject-specific growth rates during infancy, we found that FCR was the best performer in all scenarios. CONCLUSION: With this novel approach, we can quantitatively compare non-nested models and weight subgroups of interest to select the best performing growth model for a particular application or problem at hand.
Affiliation(s)
- Matthew R. Grigsby
- Division of Pulmonary and Critical Care, School of Medicine, Johns Hopkins University, 1830 E. Monument Street, 5th Floor, Baltimore, MD 21287 USA
| | - Junrui Di
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD USA
| | - Andrew Leroux
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD USA
| | - Vadim Zipunnikov
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD USA
| | - Luo Xiao
- Department of Statistics, North Carolina State University, Raleigh, NC USA
| | - Ciprian Crainiceanu
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD USA
| | - William Checkley
- Division of Pulmonary and Critical Care, School of Medicine, Johns Hopkins University, 1830 E. Monument Street, 5th Floor, Baltimore, MD 21287 USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD USA
| |
|
22
|
Tran H, Zhu H, Wu X, Kim G, Clarke CR, Larose H, Haak DC, Askew SD, Barney JN, Westwood JH, Zhang L. Identification of Differentially Methylated Sites with Weak Methylation Effects. Genes (Basel) 2018; 9:E75. [PMID: 29419727 PMCID: PMC5852571 DOI: 10.3390/genes9020075] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/02/2017] [Revised: 01/17/2018] [Accepted: 01/25/2018] [Indexed: 12/28/2022] Open
Abstract
Deoxyribonucleic acid (DNA) methylation is an epigenetic alteration crucial for regulating stress responses. Identifying large-scale DNA methylation at single nucleotide resolution is made possible by whole genome bisulfite sequencing. An essential task following the generation of bisulfite sequencing data is to detect differentially methylated cytosines (DMCs) among treatments. Most statistical methods for DMC detection do not consider the dependency of methylation patterns across the genome, thus possibly inflating type I error. Furthermore, small sample sizes and weak methylation effects among different phenotype categories make it difficult for these statistical methods to accurately detect DMCs. To address these issues, the wavelet-based functional mixed model (WFMM) was introduced to detect DMCs. To further examine the performance of WFMM in detecting weak differential methylation events, we used both simulated and empirical data and compared WFMM performance to a popular DMC detection tool, methylKit. Analyses of simulated data that replicated the effects of the herbicide glyphosate on DNA methylation in Arabidopsis thaliana show that WFMM results in higher sensitivity and specificity in detecting DMCs compared to methylKit, especially when the methylation differences among phenotype groups are small. Moreover, the performance of WFMM is robust with respect to small sample sizes, making it particularly attractive considering the current high costs of bisulfite sequencing. Analysis of empirical Arabidopsis thaliana data under varying glyphosate dosages, and the analysis of monozygotic (MZ) twins who have different pain sensitivities (both datasets have weak methylation effects of <1%), show that WFMM can identify more relevant DMCs related to the phenotype of interest than methylKit. Differentially methylated regions (DMRs) are genomic regions with different DNA methylation status across biological samples. DMRs and DMCs are essentially the same concepts, with the only difference being how methylation information across the genome is summarized. If methylation levels are determined by grouping neighboring cytosine sites, then they are DMRs; if methylation levels are calculated based on single cytosines, they are DMCs.
Affiliation(s)
- Hong Tran
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA.
| | - Hongxiao Zhu
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA.
| | - Xiaowei Wu
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA.
| | - Gunjune Kim
- Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, VA 24061, USA.
| | - Christopher R Clarke
- Genetic Improvement of Fruits and Vegetables Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA.
| | - Hailey Larose
- Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, VA 24061, USA.
| | - David C Haak
- Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, VA 24061, USA.
| | - Shawn D Askew
- Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, VA 24061, USA.
| | - Jacob N Barney
- Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, VA 24061, USA.
| | - James H Westwood
- Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, VA 24061, USA.
| | - Liqing Zhang
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA.
| |
|
23
|
Yang J, Cox DD, Lee JS, Ren P, Choi T. Efficient Bayesian hierarchical functional data analysis with basis function approximations using Gaussian-Wishart processes. Biometrics 2017; 73:1082-1091. [PMID: 28395117 PMCID: PMC5634932 DOI: 10.1111/biom.12705] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/01/2016] [Revised: 03/01/2017] [Accepted: 03/01/2017] [Indexed: 11/28/2022]
Abstract
Functional data are defined as realizations of random functions (mostly smooth functions) varying over a continuum, which are usually collected on discretized grids with measurement errors. In order to accurately smooth noisy functional observations and deal with the issue of high-dimensional observation grids, we propose a novel Bayesian method based on the Bayesian hierarchical model with a Gaussian-Wishart process prior and basis function representations. We first derive an induced model for the basis-function coefficients of the functional data, and then use this model to conduct posterior inference through Markov chain Monte Carlo methods. Compared to the standard Bayesian inference that suffers serious computational burden and instability in analyzing high-dimensional functional data, our method greatly improves the computational scalability and stability, while inheriting the advantage of simultaneously smoothing raw observations and estimating the mean-covariance functions in a nonparametric way. In addition, our method can naturally handle functional data observed on random or uncommon grids. Simulation and real studies demonstrate that our method produces similar results to those obtainable by the standard Bayesian inference with low-dimensional common grids, while efficiently smoothing and estimating functional data with random and high-dimensional observation grids when the standard Bayesian inference fails. In conclusion, our method can efficiently smooth and estimate high-dimensional functional data, providing one way to resolve the curse of dimensionality for Bayesian functional data analysis with Gaussian-Wishart processes.
Affiliation(s)
- Jingjing Yang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A
| | - Dennis D Cox
- Department of Statistics, Rice University, Houston, Texas 77005, U.S.A
| | - Jong Soo Lee
- Department of Mathematical Sciences, University of Massachusetts Lowell, Lowell, Massachusetts 01854, U.S.A
| | - Peng Ren
- Suntrust Banks Inc, Atlanta, Georgia 30308, U.S.A
| | - Taeryon Choi
- Department of Statistics, Korea University, Seoul 136-701, Republic of Korea
| |
|
24
|
Testing Gait with Ankle-Foot Orthoses in Children with Cerebral Palsy by Using Functional Mixed-Effects Analysis of Variance. Sci Rep 2017; 7:11081. [PMID: 28894132 PMCID: PMC5594035 DOI: 10.1038/s41598-017-11282-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/26/2017] [Accepted: 08/17/2017] [Indexed: 11/25/2022] Open
Abstract
Existing statistical methods extract insufficient information from 3-dimensional gait data, rendering clinical interpretation of impaired movement patterns sub-optimal. We propose an alternative approach based on functional data analysis that may be worthy of exploration. We apply this to gait data analysis using repeated-measurements data from children with cerebral palsy who had been prescribed fixed ankle-foot orthoses as an example. We analyze entire gait curves by means of a new functional F test with comparison to multiple pointwise F tests and also to the traditional method - univariate repeated-measurements analysis of variance of joint angle minima and maxima. The new test maintains the nominal significance level and can be adapted to test hypotheses for specific phases of the gait cycle. The main findings indicate that ankle-foot orthoses exert significant effects on coronal and sagittal plane ankle rotation; and both sagittal and horizontal plane foot rotation. The functional F test provided further information for the stance and swing phases. Differences between the results of the different statistical approaches are discussed, concluding that the novel method has potential utility and is worthy of validation through larger scale patient and clinician engagement to determine whether it is preferable to the traditional approach.
|
25
|
Zhu H, Morris JS, Wei F, Cox DD. Multivariate functional response regression, with application to fluorescence spectroscopy in a cervical pre-cancer study. Comput Stat Data Anal 2017; 111:88-101. [PMID: 29051679 PMCID: PMC5642121 DOI: 10.1016/j.csda.2017.02.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 10/20/2022]
Abstract
Many scientific studies measure different types of high-dimensional signals or images from the same subject, producing multivariate functional data. These functional measurements carry different types of information about the scientific process, and a joint analysis that integrates information across them may provide new insights into the underlying mechanism for the phenomenon under study. Motivated by fluorescence spectroscopy data in a cervical pre-cancer study, a multivariate functional response regression model is proposed, which treats multivariate functional observations as responses and a common set of covariates as predictors. This novel modeling framework simultaneously accounts for correlations between functional variables and potential multi-level structures in data that are induced by experimental design. The model is fitted by performing a two-stage linear transformation: a basis expansion to each functional variable followed by principal component analysis for the concatenated basis coefficients. This transformation effectively reduces the intra- and inter-function correlations and facilitates fast and convenient calculation. A fully Bayesian approach is adopted to sample the model parameters in the transformed space, and posterior inference is performed after inverse-transforming the regression coefficients back to the original data domain. The proposed approach produces functional tests that flag local regions on the functional effects, while controlling the overall experiment-wise error rate or false discovery rate. It also enables functional discriminant analysis through posterior predictive calculation. Analysis of the fluorescence spectroscopy data reveals local regions with differential expressions across the pre-cancer and normal samples. These regions may serve as biomarkers for prognosis and disease assessment.
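The two-stage linear transformation described in this abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the cosine basis, the least-squares projection, the helper names `cosine_basis` and `two_stage_transform`, and the toy data are all hypothetical choices, and the abstract does not say which basis the paper uses.

```python
import numpy as np

def cosine_basis(T, K):
    """(T, K) matrix of cosine basis functions on a regular grid (assumed basis)."""
    t = (np.arange(T) + 0.5) / T
    return np.column_stack([np.cos(np.pi * k * t) for k in range(K)])

def two_stage_transform(curves, K, n_pc):
    """Stage 1: basis expansion per functional variable. Stage 2: PCA on the
    concatenated basis coefficients.

    curves: list of (n, T_j) arrays, one per functional variable.
    Returns PC scores (n, n_pc), loadings (n_pc, K * J), and the coefficient mean.
    """
    coefs = []
    for Y in curves:
        Phi = cosine_basis(Y.shape[1], K)
        C, *_ = np.linalg.lstsq(Phi, Y.T, rcond=None)  # (K, n) coefficients
        coefs.append(C.T)
    Z = np.hstack(coefs)                               # (n, K * J)
    mean = Z.mean(axis=0)
    U, s, Vt = np.linalg.svd(Z - mean, full_matrices=False)
    scores = U[:, :n_pc] * s[:n_pc]
    return scores, Vt[:n_pc], mean

# Toy data (hypothetical): two functional variables sharing one latent factor.
rng = np.random.default_rng(0)
n = 50
latent = rng.normal(size=(n, 1))
t1, t2 = np.linspace(0, 1, 40), np.linspace(0, 1, 30)
Y1 = latent * np.sin(np.pi * t1) + 0.1 * rng.normal(size=(n, 40))
Y2 = latent * np.cos(np.pi * t2) + 0.1 * rng.normal(size=(n, 30))

scores, loadings, mean = two_stage_transform([Y1, Y2], K=6, n_pc=4)
gram = scores.T @ scores  # diagonal up to rounding: the scores are uncorrelated
```

A regression fitted on `scores` can be mapped back to the coefficient domain through `loadings` and `mean`, and then to the data domain through the basis matrices, which mirrors the inverse-transform step the abstract mentions.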
Affiliation(s)
- Hongxiao Zhu
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061
| | - Jeffrey S Morris
- The University of Texas MD Anderson Cancer Center, Houston, TX 77230
| | - Fengrong Wei
- Department of Mathematics, University of West Georgia, Carrollton, GA 30118
| | - Dennis D Cox
- Department of Statistics, Rice University, Houston, TX 77005
| |
|
26
|
Abstract
In this article, Greven and Scheipl describe an impressively general framework for performing functional regression that builds upon the generalized additive modeling framework. Over the past number of years, my collaborators and I have also been developing a general framework for functional regression, functional mixed models, which shares many similarities with this framework, but has many differences as well. In this discussion, I compare and contrast these two frameworks, to hopefully illuminate characteristics of each, highlighting their respective strengths and weaknesses, and providing recommendations regarding the settings in which each approach might be preferable.
Affiliation(s)
- Jeffrey S Morris
- The University of Texas, MD Anderson Cancer Center, Unit 1411, PO Box 301402, Houston, TX 77230-1402
| |
|
27
|
Zhang L, Baladandayuthapani V, Zhu H, Baggerly KA, Majewski T, Czerniak BA, Morris JS. Functional CAR models for large spatially correlated functional datasets. J Am Stat Assoc 2016; 111:772-786. [PMID: 28018013 DOI: 10.1080/01621459.2015.1042581] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 10/23/2022]
Abstract
We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on functions defined on higher dimensional domains such as images. Through simulation studies, we demonstrate that accounting for the spatial correlation in our modeling leads to improved functional regression performance. Applied to a high-throughput spatially correlated copy number dataset, the model identifies genetic markers not identified by comparable methods that ignore spatial correlations.
Affiliation(s)
- Lin Zhang
- The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
| | | | | | - Keith A Baggerly
- The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
| | - Tadeusz Majewski
- The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
| | - Bogdan A Czerniak
- The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
| | - Jeffrey S Morris
- The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
| |
|
28
|
|
29
|
Lee W, Morris JS. Identification of differentially methylated loci using wavelet-based functional mixed models. Bioinformatics 2015; 32:664-72. [PMID: 26559505 DOI: 10.1093/bioinformatics/btv659] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 12/04/2014] [Accepted: 11/05/2015] [Indexed: 12/26/2022] Open
Abstract
MOTIVATION: DNA methylation is a key epigenetic modification that can modulate gene expression. Over the past decade, a lot of studies have focused on profiling DNA methylation and investigating its alterations in complex diseases such as cancer. While early studies were mostly restricted to CpG islands or promoter regions, recent findings indicate that many important DNA methylation changes can occur in other regions and DNA methylation needs to be examined on a genome-wide scale. In this article, we apply the wavelet-based functional mixed model methodology to analyze the high-throughput methylation data for identifying differentially methylated loci across the genome. Contrary to many commonly-used methods that model probes independently, this framework accommodates spatial correlations across the genome through basis function modeling as well as correlations between samples through functional random effects, which allows it to be applied to many different settings and potentially leads to more power in detection of differential methylation. RESULTS: We applied this framework to three different high-dimensional methylation data sets (CpG Shore data, THREE data and NIH Roadmap Epigenomics data), studied previously in other works. A simulation study based on CpG Shore data suggested that in terms of detection of differentially methylated loci, this modeling approach using wavelets outperforms analogous approaches modeling the loci as independent. For the THREE data, the method suggests newly detected regions of differential methylation, which were not reported in the original study. AVAILABILITY AND IMPLEMENTATION: Automated software called WFMM is available at https://biostatistics.mdanderson.org/SoftwareDownload. CpG Shore data is available at http://rafalab.dfci.harvard.edu. NIH Roadmap Epigenomics data is available at http://compbio.mit.edu/roadmap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: jefmorris@mdanderson.org.
Affiliation(s)
- Wonyul Lee
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
| | - Jeffrey S Morris
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
| |
|
30
|
Meyer MJ, Coull BA, Versace F, Cinciripini P, Morris JS. Bayesian function-on-function regression for multilevel functional data. Biometrics 2015; 71:563-74. [PMID: 25787146 PMCID: PMC4575250 DOI: 10.1111/biom.12299] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 12/01/2013] [Revised: 12/01/2014] [Accepted: 01/01/2015] [Indexed: 11/30/2022]
Abstract
Medical and public health research increasingly involves the collection of complex and high dimensional data. In particular, functional data, where the unit of observation is a curve or set of curves that are finely sampled over a grid, are frequently obtained. Moreover, researchers often sample multiple curves per person resulting in repeated functional measures. A common question is how to analyze the relationship between two functional variables. We propose a general function-on-function regression model for repeatedly sampled functional data on a fine grid, presenting a simple model as well as a more extensive mixed model framework, and introducing various functional Bayesian inferential procedures that account for multiple testing. We examine these models via simulation and a data analysis with data from a study that used event-related potentials to examine how the brain processes various types of images.
Affiliation(s)
- Mark J. Meyer
- Department of Mathematics, Bucknell University, Lewisburg, Pennsylvania, U.S.A
| | - Brent A. Coull
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, U.S.A
| | - Francesco Versace
- The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
| | - Paul Cinciripini
- The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
| | - Jeffrey S. Morris
- The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
| |
|
31
|
Abstract
We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework are based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R-package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well and also scales to larger data sets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach.
|
32
|
|
33
|
Zipunnikov V, Greven S, Shou H, Caffo B, Reich DS, Crainiceanu C. Longitudinal High-Dimensional Principal Components Analysis with Application to Diffusion Tensor Imaging of Multiple Sclerosis. Ann Appl Stat 2015; 8:2175-2202. [PMID: 25663955 PMCID: PMC4316386 DOI: 10.1214/14-aoas748] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
Abstract
We develop a flexible framework for modeling high-dimensional imaging data observed longitudinally. The approach decomposes the observed variability of repeatedly measured high-dimensional observations into three additive components: a subject-specific imaging random intercept that quantifies the cross-sectional variability, a subject-specific imaging slope that quantifies the dynamic irreversible deformation over multiple realizations, and a subject-visit specific imaging deviation that quantifies exchangeable effects between visits. The proposed method is very fast, scalable to studies including ultra-high dimensional data, and can easily be adapted to and executed on modest computing infrastructures. The method is applied to the longitudinal analysis of diffusion tensor imaging (DTI) data of the corpus callosum of multiple sclerosis (MS) subjects. The study includes 176 subjects observed at 466 visits. For each subject and visit the study contains a registered DTI scan of the corpus callosum at roughly 30,000 voxels.
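The three-component decomposition described in this abstract can be written compactly as below. The notation is an assumption made for illustration (the abstract gives no symbols), and the fixed mean term \eta is added only so the random terms can be read as mean-zero deviations.

```latex
Y_{ij}(v) = \eta(v, T_{ij}) + X_{i,0}(v) + X_{i,1}(v)\, T_{ij} + W_{ij}(v),
\qquad i = 1, \dots, I, \quad j = 1, \dots, J_i .
```

Here Y_{ij}(v) is the image of subject i at visit j and voxel v, T_{ij} is the visit time, X_{i,0} is the subject-specific imaging random intercept, X_{i,1} is the subject-specific imaging slope, and W_{ij} is the subject-visit specific deviation.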
Affiliation(s)
- Vadim Zipunnikov
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, 21205
| | - Sonja Greven
- Department of Statistics, Ludwig-Maximilians-Universität München, 80539 Munich, Germany
| | | | - Brian Caffo
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, 21205
| | - Daniel S. Reich
- Translational Neurology Unit, Neuroimmunology Branch, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
| | | |
|
34
|
Shim H, Stephens M. Wavelet-based genetic association analysis of functional phenotypes arising from high-throughput sequencing assays. Ann Appl Stat 2015; 9:655-686. [PMID: 29399242 PMCID: PMC5795621 DOI: 10.1214/14-aoas776] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 01/02/2023]
Abstract
Understanding how genetic variants influence cellular-level processes is an important step toward understanding how they influence important organismal-level traits, or "phenotypes," including human disease susceptibility. To this end, scientists are undertaking large-scale genetic association studies that aim to identify genetic variants associated with molecular and cellular phenotypes, such as gene expression, transcription factor binding, or chromatin accessibility. These studies use high-throughput sequencing assays (e.g., RNA-seq, ChIP-seq, DNase-seq) to obtain high-resolution data on how the traits vary along the genome in each sample. However, typical association analyses fail to exploit these high-resolution measurements, instead aggregating the data at coarser resolutions, such as genes, or windows of fixed length. Here we develop and apply statistical methods that better exploit the high-resolution data. The key idea is to treat the sequence data as measuring an underlying "function" that varies along the genome, and then, building on wavelet-based methods for functional data analysis, test for association between genetic variants and the underlying function. Applying these methods to identify genetic variants associated with chromatin accessibility (dsQTLs), we find that they identify substantially more associations than a simpler window-based analysis, and in total we identify 772 novel dsQTLs not identified by the original analysis.
|
35
|
Luo X, Zhu L, Kong L, Zhu H. Functional Nonlinear Mixed Effects Models for Longitudinal Image Data. INFORMATION PROCESSING IN MEDICAL IMAGING : PROCEEDINGS OF THE ... CONFERENCE 2015. [PMID: 26213453 DOI: 10.1007/978-3-319-19992-4_63] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022]
Abstract
Motivated by studying large-scale longitudinal image data, we propose a novel functional nonlinear mixed effects modeling (FNMEM) framework to model the nonlinear spatial-temporal growth patterns of brain structure and function and their association with covariates of interest (e.g., time or diagnostic status). Our FNMEM explicitly quantifies a random nonlinear association map of individual trajectories. We develop an efficient estimation method to estimate the nonlinear growth function and the covariance operator of the spatial-temporal process. We propose a global test and a simultaneous confidence band for some specific growth patterns. We conduct Monte Carlo simulation to examine the finite-sample performance of the proposed procedures. We apply FNMEM to investigate the spatial-temporal dynamics of white-matter fiber skeletons in a national database for autism research. Our FNMEM may provide a valuable tool for charting the developmental trajectories of various neuropsychiatric and neurodegenerative disorders.
|
36
|
Lin JA, Zhu H, Mihye A, Sun W, Ibrahim JG. Functional-mixed effects models for candidate genetic mapping in imaging genetic studies. Genet Epidemiol 2014; 38:680-91. [PMID: 25270690 PMCID: PMC4236266 DOI: 10.1002/gepi.21854] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 05/26/2014] [Revised: 07/29/2014] [Accepted: 08/13/2014] [Indexed: 01/09/2023]
Abstract
The aim of this paper is to develop a functional-mixed effects modeling (FMEM) framework for the joint analysis of high-dimensional imaging data in a large number of locations (called voxels) of a three-dimensional volume with a set of genetic markers and clinical covariates. Our FMEM is extremely useful for efficiently carrying out the candidate gene approaches in imaging genetic studies. FMEM consists of two novel components including a mixed effects model for modeling nonlinear genetic effects on imaging phenotypes by introducing the genetic random effects at each voxel and a jumping surface model for modeling the variance components of the genetic random effects and fixed effects as piecewise smooth functions of the voxels. Moreover, FMEM naturally accommodates the correlation structure of the genetic markers at each voxel, while the jumping surface model explicitly incorporates the intrinsic spatial smoothness of the imaging data. We propose a novel two-stage adaptive smoothing procedure to spatially estimate the piecewise smooth functions, particularly the irregular functional genetic variance components, while preserving their edges among different piecewise-smooth regions. We develop weighted likelihood ratio tests and derive their exact approximations to test the effect of the genetic markers across voxels. Simulation studies show that FMEM significantly outperforms voxel-wise approaches in terms of higher sensitivity and specificity to identify regions of interest for carrying out candidate genetic mapping in imaging genetic studies. Finally, FMEM is used to identify brain regions affected by three candidate genes including CR1, CD2AP, and PICALM, thereby hoping to shed light on the pathological interactions between these candidate genes and brain structure and function.
Affiliation(s)
- Ja-An Lin
  - Departments of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hongtu Zhu
  - Departments of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Biomedical Research Imaging Center, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Ahn Mihye
  - Departments of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Wei Sun
  - Departments of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Departments of Biostatistics Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Joseph G Ibrahim
  - Departments of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
37
|
Martinez JG, Bohn KM, Carroll RJ, Morris JS. A Study of Mexican Free-Tailed Bat Chirp Syllables: Bayesian Functional Mixed Models for Nonstationary Acoustic Time Series. J Am Stat Assoc 2013; 108:514-526. [PMID: 23997376 DOI: 10.1080/01621459.2013.793118] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 10/26/2022]
Abstract
We describe a new approach to analyze chirp syllables of free-tailed bats from two regions of Texas in which they are predominant: Austin and College Station. Our goal is to characterize any systematic regional differences in the mating chirps and assess whether individual bats have signature chirps. The data are analyzed by modeling spectrograms of the chirps as responses in a Bayesian functional mixed model. Given the variable chirp lengths, we compute the spectrograms on a relative time scale interpretable as the relative chirp position, using a variable window overlap based on chirp length. We use 2D wavelet transforms to capture correlation within the spectrogram in our modeling and obtain adaptive regularization of the estimates and inference for the region-specific spectrograms. Our model includes random effect spectrograms at the bat level to account for correlation among chirps from the same bat, and to assess relative variability in chirp spectrograms within and between bats. The modeling of spectrograms using functional mixed models is a general approach for the analysis of replicated nonstationary time series, such as our acoustical signals, to relate aspects of the signals to various predictors, while accounting for between-signal structure. This can be done on raw spectrograms when all signals are of the same length, and can be done using spectrograms defined on a relative time scale for signals of variable length in settings where the idea of defining correspondence across signals based on relative position is sensible.
Affiliation(s)
- Josue G Martinez
  - (Deceased) was recently at the Department of Radiation Oncology, The University of Texas M D Anderson Cancer Center, PO Box 301402, Houston, TX 77230-1402, USA
|
39
|
Zhu H, Brown PJ, Morris JS. Robust classification of functional and quantitative image data using functional mixed models. Biometrics 2012; 68:1260-8. [PMID: 22670567 PMCID: PMC3443537 DOI: 10.1111/j.1541-0420.2012.01765.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 11/27/2022]
Abstract
This article introduces new methods for performing classification of complex, high-dimensional functional data using the functional mixed model (FMM) framework. The FMM relates a functional response to a set of predictors through functional fixed and random effects, which allows it to account for various factors and between-function correlations. The methods include training and prediction steps. In the training steps we train the FMM model by treating class designation as one of the fixed effects, and in the prediction steps we classify the new objects using posterior predictive probabilities of class. Through a Bayesian scheme, we are able to adjust for factors affecting both the functions and the class designations. While the methods can be used in any FMM framework, we provide details for two specific Bayesian approaches: the Gaussian, wavelet-based FMM (G-WFMM) and the robust, wavelet-based FMM (R-WFMM). Both methods perform modeling in the wavelet space, which yields parsimonious representations for the functions, and can naturally adapt to local features and complex nonstationarities in the functions. The R-WFMM allows potentially heavier tails for features of the functions indexed by particular wavelet coefficients, leading to a down-weighting of outliers that makes the method robust to outlying functions or regions of functions. The models are applied to a pancreatic cancer mass spectroscopy data set and compared with other recently developed functional classification methods.
Affiliation(s)
- Hongxiao Zhu
  - Department of Statistical Science, Duke University, Durham, NC 27708, U.S.A.
- Philip J. Brown
  - School of Mathematics, Statistics and Actuarial Science, University of Kent, U.K.
- Jeffrey S. Morris
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX 77230, U.S.A.
|
40
|
Scheipl F, Fahrmeir L, Kneib T. Spike-and-Slab Priors for Function Selection in Structured Additive Regression Models. J Am Stat Assoc 2012. [DOI: 10.1080/01621459.2012.737742] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 10/27/2022]
|
42
|
Morris JS. Statistical Methods for Proteomic Biomarker Discovery based on Feature Extraction or Functional Modeling Approaches. Statistics and Its Interface 2012; 5:117-135. [PMID: 23814640 PMCID: PMC3693398 DOI: 10.4310/sii.2012.v5.n1.a11] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/02/2023]
Abstract
In recent years, developments in molecular biotechnology have led to the increased promise of detecting and validating biomarkers, or molecular markers that relate to various biological or medical outcomes. Proteomics, the direct study of proteins in biological samples, plays an important role in the biomarker discovery process. These technologies produce complex, high dimensional functional and image data that present many analytical challenges that must be addressed properly for effective comparative proteomics studies that can yield potential biomarkers. Specific challenges include experimental design, preprocessing, feature extraction, and statistical analysis accounting for the inherent multiple testing issues. This paper reviews various computational aspects of comparative proteomic studies, and summarizes contributions I along with numerous collaborators have made. First, there is an overview of comparative proteomics technologies, followed by a discussion of important experimental design and preprocessing issues that must be considered before statistical analysis can be done. Next, the two key approaches to analyzing proteomics data, feature extraction and functional modeling, are described. Feature extraction involves detection and quantification of discrete features like peaks or spots that theoretically correspond to different proteins in the sample. After an overview of the feature extraction approach, specific methods for mass spectrometry (Cromwell) and 2D gel electrophoresis (Pinnacle) are described. The functional modeling approach involves modeling the proteomic data in their entirety as functions or images. A general discussion of the approach is followed by the presentation of a specific method that can be applied, wavelet-based functional mixed models, and its extensions. All methods are illustrated by application to two example proteomic data sets, one from mass spectrometry and one from 2D gel electrophoresis. 
While the specific methods presented are applied to two specific proteomic technologies, MALDI-TOF and 2D gel electrophoresis, these methods and the other principles discussed in the paper apply much more broadly to other expression proteomics technologies.
|
43
|
Least Absolute Deviation Estimate for Functional Coefficient Partially Linear Regression Models. Journal of Probability and Statistics 2012. [DOI: 10.1155/2012/131085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
|