1
Liu X, Jiang J. Classified functional mixed effects model prediction. Stat Med 2024; 43:1329-1340. [PMID: 38279656 DOI: 10.1002/sim.10007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Revised: 11/15/2023] [Accepted: 12/15/2023] [Indexed: 01/28/2024]
Abstract
In contemporary biomedical research, there is a growing demand for accurate prediction at the subject level. In many of these situations, data are collected as longitudinal curves and display distinct individual characteristics. Thus, prediction mechanisms built on functional mixed effects models (FMEM) are useful. In this paper, we develop a classified functional mixed model prediction (CFMMP) method, which adapts classified mixed model prediction (CMMP) to the framework of FMEM. We compare the performance of CFMMP against functional regression prediction in simulation studies and explore the consistency property of CFMMP estimators. Applications of CFMMP are illustrated using real-world examples, including data from hormone research on menstrual cycles and from diffusion tensor imaging.
Affiliation(s)
- Xiaoyan Liu
- Statistics Department, University of California, Davis, California, USA
- Jiming Jiang
- Statistics Department, University of California, Davis, California, USA
2
Crook OM, Lilley KS, Gatto L, Kirk PD. Semi-Supervised Non-Parametric Bayesian Modelling of Spatial Proteomics. Ann Appl Stat 2022; 16:22-aoas1603. [PMID: 36507469 PMCID: PMC7613899 DOI: 10.1214/22-aoas1603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
Understanding sub-cellular protein localisation is an essential component in the analysis of context specific protein function. Recent advances in quantitative mass-spectrometry (MS) have led to high resolution mapping of thousands of proteins to sub-cellular locations within the cell. Novel modelling considerations to capture the complex nature of these data are thus necessary. We approach analysis of spatial proteomics data in a non-parametric Bayesian framework, using K-component mixtures of Gaussian process regression models. The Gaussian process regression model accounts for correlation structure within a sub-cellular niche, with each mixture component capturing the distinct correlation structure observed within each niche. The availability of marker proteins (i.e. proteins with a priori known labelled locations) motivates a semi-supervised learning approach to inform the Gaussian process hyperparameters. We moreover provide an efficient Hamiltonian-within-Gibbs sampler for our model. Furthermore, we reduce the computational burden associated with inversion of covariance matrices by exploiting the structure in the covariance matrix. A tensor decomposition of our covariance matrices allows extended Trench and Durbin algorithms to be applied to reduce the computational complexity of inversion and hence accelerate computation. We provide detailed case-studies on Drosophila embryos and mouse pluripotent embryonic stem cells to illustrate the benefit of semi-supervised functional Bayesian modelling of the data.
3
Huo S, Morris JS, Zhu H. Ultra-Fast Approximate Inference Using Variational Functional Mixed Models. J Comput Graph Stat 2022; 32:353-365. [PMID: 37608921 PMCID: PMC10441618 DOI: 10.1080/10618600.2022.2107532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2021] [Accepted: 07/23/2022] [Indexed: 10/16/2022]
Abstract
While Bayesian functional mixed models have been shown to be effective for modeling functional data with various complex structures, their application to extremely high-dimensional data is limited by the computational challenges involved in posterior sampling. We introduce a new computational framework that enables ultra-fast approximate inference for high-dimensional data in functional form. This framework adopts a parsimonious basis to represent functional observations, which facilitates efficient compression and parallel computing in basis space. Instead of performing expensive Markov chain Monte Carlo sampling, we approximate the posterior distribution using variational Bayes and adopt a fast iterative algorithm to estimate the parameters of the approximate distribution. Our approach facilitates a fast multiple testing procedure in basis space, which can be used to identify significant local regions that reflect differences across groups of samples. We perform two simulation studies to assess the performance of approximate inference, and demonstrate applications of the proposed approach using a proteomic mass spectrometry dataset and a brain imaging dataset. Supplementary materials are available online.
Affiliation(s)
- Jeffrey S Morris
- Department of Biostatistics, Epidemiology and Informatics, Department of Statistics, University of Pennsylvania
4
Meyer MJ, Morris JS, Gazes RP, Coull BA. Ordinal probit functional outcome regression with application to computer-use behavior in rhesus monkeys. Ann Appl Stat 2022; 16:537-550. [DOI: 10.1214/21-aoas1513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Mark J. Meyer
- Department of Mathematics and Statistics, Georgetown University
- Jeffrey S. Morris
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania
- Regina Paxton Gazes
- Department of Psychology and Program in Animal Behavior, Bucknell University
- Brent A. Coull
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
5
Tang L, Zeng P, Shi JQ, Kim WS. Model-based joint curve registration and classification. J Appl Stat 2022; 50:1178-1198. [PMID: 37009594 PMCID: PMC10062228 DOI: 10.1080/02664763.2021.2023118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2021] [Accepted: 12/20/2021] [Indexed: 10/19/2022]
Abstract
In this paper, we consider the problem of classification of misaligned multivariate functional data. We propose to use a model-based approach for the joint registration and classification of such data. The observed functional inputs are modeled as a functional nonlinear mixed effects model containing a nonlinear functional fixed effect constructed upon warping functions to account for curve alignment, and a nonlinear functional random effects component to address the variability among subjects. The warping functions are also modeled to accommodate common effects within groups and the variability between subjects. Then, a functional logistic regression model defined upon the representation of the aligned curves and scalar inputs is used to account for curve classification. EM-based algorithms are developed to perform maximum likelihood inference of the proposed models. The identifiability of the registration model and the asymptotic properties of the proposed method are established. The performance of the proposed procedure is illustrated via simulation studies and an analysis of hyoid bone movement data. Although the statistical developments proposed in this paper were motivated by the hyoid bone movement study, the methodology is designed and presented in generality and can be applied to numerous areas of scientific research.
Affiliation(s)
- Lin Tang
- Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, Kunming, Yunnan, People's Republic of China
- Pengcheng Zeng
- Institute of Mathematical Sciences, ShanghaiTech University, Shanghai, People's Republic of China
- Jian Qing Shi
- Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen, People's Republic of China
- National Center for Applied Mathematics, Shenzhen, People's Republic of China
- Won-Seok Kim
- Department of Rehabilitation Medicine, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam, South Korea
6
Meng S, Huang Z, Zhang J, Jiang Z. Estimation on functional partially linear single index measurement error model. Commun Stat Theory Methods 2021. [DOI: 10.1080/03610926.2021.1999979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Shuyu Meng
- School of Science, Nanjing University of Science and Technology, Nanjing, Jiangsu, P. R. China
- Zhensheng Huang
- School of Science, Nanjing University of Science and Technology, Nanjing, Jiangsu, P. R. China
- Jing Zhang
- School of Science, Nanjing University of Science and Technology, Nanjing, Jiangsu, P. R. China
- Zhiqiang Jiang
- School of Science, Nanjing University of Science and Technology, Nanjing, Jiangsu, P. R. China
7
Abstract
Covariance estimation is essential yet underdeveloped for analyzing multivariate functional data. We propose a fast covariance estimation method for multivariate sparse functional data using bivariate penalized splines. The tensor-product B-spline formulation of the proposed method enables a simple spectral decomposition of the associated covariance operator and explicit expressions of the resulting eigenfunctions as linear combinations of B-spline bases, thereby dramatically facilitating subsequent principal component analysis. We derive a fast algorithm for selecting the smoothing parameters in covariance smoothing using leave-one-subject-out cross-validation. The method is evaluated with extensive numerical studies and applied to an Alzheimer's disease study with multiple longitudinal outcomes.
Affiliation(s)
- Cai Li
- Department of Statistics, North Carolina State University, NC, USA
- Luo Xiao
- Department of Statistics, North Carolina State University, NC, USA
- Sheng Luo
- Department of Biostatistics and Bioinformatics, Duke University, NC, USA
8
Noh H, Choi T, Park J, Chung Y. Bayesian latent factor regression for multivariate functional data with variable selection. J Korean Stat Soc 2020. [DOI: 10.1007/s42952-019-00044-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
9
Yang H, Baladandayuthapani V, Rao AUK, Morris JS. Quantile Function on Scalar Regression Analysis for Distributional Data. J Am Stat Assoc 2019; 115:90-106. [PMID: 32981991 PMCID: PMC7517594 DOI: 10.1080/01621459.2019.1609969] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/07/2018] [Revised: 03/08/2019] [Accepted: 04/07/2019] [Indexed: 02/05/2023]
Abstract
Radiomics involves the study of tumor images to identify quantitative markers explaining cancer heterogeneity. The predominant approach is to extract hundreds to thousands of image features, including histogram features comprised of summaries of the marginal distribution of pixel intensities, which leads to multiple testing problems and can miss out on insights not contained in the selected features. In this paper, we present methods to model the entire marginal distribution of pixel intensities via the quantile function as functional data, regressed on a set of demographic, clinical, and genetic predictors to investigate their effects on imaging-based cancer heterogeneity. We call this approach quantile functional regression, regressing subject-specific marginal distributions across repeated measurements on a set of covariates, allowing us to assess which covariates are associated with the distribution in a global sense, as well as to identify distributional features characterizing these differences, including mean, variance, skewness, heavy-tailedness, and various upper and lower quantiles. To account for smoothness in the quantile functions, account for intrafunctional correlation, and gain statistical power, we introduce custom basis functions we call quantlets that are sparse, regularized, near-lossless, and empirically defined, adapting to the features of a given data set and containing a Gaussian subspace so non-Gaussianness can be assessed. We fit this model using a Bayesian framework that uses nonlinear shrinkage of quantlet coefficients to regularize the functional regression coefficients and provides fully Bayesian inference after fitting via Markov chain Monte Carlo. We demonstrate the benefit of the basis space modeling through simulation studies, and apply the method to a magnetic resonance imaging (MRI)-based radiomic dataset from glioblastoma multiforme to relate imaging-based quantile functions to various demographic, clinical, and genetic predictors, finding specific differences in tumor pixel intensity distribution between males and females and between tumors with and without DDIT3 mutations.
Affiliation(s)
- Hojin Yang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
- Arvind U K Rao
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
- Jeffrey S Morris
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
10
Park Y, Simpson DG. Robust probabilistic classification applicable to irregularly sampled functional data. Comput Stat Data Anal 2019; 131:37-49. [PMID: 31086427 PMCID: PMC6510497 DOI: 10.1016/j.csda.2018.08.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/19/2022]
Abstract
A robust probabilistic classifier for functional data is developed to predict class membership based on functional input measurements and to provide reliable probability estimates for class membership. The method combines a Bayes classifier and a semi-parametric mixed effects model with a robust tuning parameter to make the method robust to outlying curves, and to improve the accuracy of the risk or uncertainty estimates, which is crucial in medical diagnostic applications. The approach applies to functional data with varying ranges and irregular sampling without making parametric assumptions on the within-curve covariance. Simulation studies evaluate the proposed method and competitors in terms of sensitivity to heavy-tailed functional distributions and outlying curves. Classification performance is evaluated by both error rate and logloss, the latter of which imposes heavier penalties on highly confident errors than on less confident errors. Runtime experiments on the R implementation indicate that the proposed method scales well computationally. Illustrative applications include data from quantitative ultrasound analysis and phoneme recognition.
Affiliation(s)
- Yeonjoo Park
- Department of Statistics, University of Illinois at Urbana-Champaign, 725 S Wright St., Champaign, IL 61820, USA
- Douglas G. Simpson
- Department of Statistics, University of Illinois at Urbana-Champaign, 725 S Wright St., Champaign, IL 61820, USA
11
Zhu H, Zhang R, Yu Z, Lian H, Liu Y. Estimation and testing for partially functional linear errors-in-variables models. J Multivar Anal 2019. [DOI: 10.1016/j.jmva.2018.11.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 10/27/2022]
12
Wang Y, Hu J, Ng CS, Hobbs BP. A functional model for classifying metastatic lesions integrating scans and biomarkers. Stat Methods Med Res 2019; 29:137-150. [PMID: 30672395 DOI: 10.1177/0962280218823795] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
Abstract
Perfusion computed tomography is an emerging functional imaging modality that uses physiological models to quantify characteristics pertaining to the passage of fluid through blood vessels. Perfusion characteristics provide physiological correlates for neovascularization induced by tumor angiogenesis and thus a quantitative basis for cancer detection, prognostication, and treatment monitoring. We consider a liver cancer study where patients underwent a dynamic computed tomography protocol to enable evaluation of multiple perfusion characteristics derived from interrogating the time-attenuation of the concentration of the intravenously administered contrast medium. The objective is to determine the effectiveness of using perfusion characteristics to identify regions of liver that contain malignant tissue and to discriminate them from regions of normal tissue. Each patient contributes multiple regions of interest, which are spatially correlated due to the shared vasculature. We propose a multivariate functional data model to disclose the correlation over time and space as well as the correlation among multiple perfusion characteristics. We further propose a simultaneous classification approach that utilizes all the correlation information to predict class assignments for collections of regions. The proposed method outperforms conventional classification approaches in the presence of strong spatial correlation. The method offers maximal relative improvement in the presence of temporal sparsity, wherein measurements are obtainable at only a few time points.
Affiliation(s)
- Yuan Wang
- Department of Mathematics and Statistics, Washington State University, Pullman, WA, USA
- Jianhua Hu
- Department of Biostatistics, Columbia University, New York, NY, USA
- Chaan S Ng
- Department of Diagnostic Radiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Brian P Hobbs
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH, USA
13
Zhu H, Caspers P, Morris JS, Wu X, Müller R. A Unified Analysis of Structured Sonar-terrain Data using Bayesian Functional Mixed Models. Technometrics 2018; 60:112-123. [PMID: 29749977 DOI: 10.1080/00401706.2016.1274681] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/19/2022]
Abstract
Sonar emits pulses of sound and uses the reflected echoes to gain information about target objects. It offers a low cost, complementary sensing modality for small robotic platforms. While existing analytical approaches often assume independence across echoes, real sonar data can have more complicated structures due to device setup or experimental design. In this paper, we consider sonar echo data collected from multiple terrain substrates with a dual-channel sonar head. Our goals are to identify the differential sonar responses to terrains and study the effectiveness of this dual-channel design in discriminating targets. We describe a unified analytical framework that achieves these goals rigorously, simultaneously, and automatically. The analysis was done by treating the echo envelope signals as functional responses and the terrain/channel information as covariates in a functional regression setting. We adopt functional mixed models that facilitate the estimation of terrain and channel effects while capturing the complex hierarchical structure in data. This unified analytical framework incorporates both Gaussian models and robust models. We fit the models using a full Bayesian approach, which enables us to perform multiple inferential tasks under the same modeling framework, including selecting models, estimating the effects of interest, identifying significant local regions, discriminating terrain types, and describing the discriminatory power of local regions. Our analysis of the sonar-terrain data identifies time regions that reflect differential sonar responses to terrains. The discriminant analysis suggests that a multi- or dual-channel design achieves target identification performance comparable with or better than a single-channel design.
Affiliation(s)
- Hongxiao Zhu
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061
- Philip Caspers
- Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061
- Jeffrey S Morris
- The University of Texas M.D. Anderson Cancer Center, Houston, TX 77230
- Xiaowei Wu
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061
- Rolf Müller
- Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061
14
15
Testing Gait with Ankle-Foot Orthoses in Children with Cerebral Palsy by Using Functional Mixed-Effects Analysis of Variance. Sci Rep 2017; 7:11081. [PMID: 28894132 PMCID: PMC5594035 DOI: 10.1038/s41598-017-11282-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/26/2017] [Accepted: 08/17/2017] [Indexed: 11/25/2022]
Abstract
Existing statistical methods extract insufficient information from 3-dimensional gait data, rendering clinical interpretation of impaired movement patterns sub-optimal. We propose an alternative approach based on functional data analysis that may be worthy of exploration. We apply this to gait data analysis using repeated-measurements data from children with cerebral palsy who had been prescribed fixed ankle-foot orthoses as an example. We analyze entire gait curves by means of a new functional F test with comparison to multiple pointwise F tests and also to the traditional method - univariate repeated-measurements analysis of variance of joint angle minima and maxima. The new test maintains the nominal significance level and can be adapted to test hypotheses for specific phases of the gait cycle. The main findings indicate that ankle-foot orthoses exert significant effects on coronal and sagittal plane ankle rotation; and both sagittal and horizontal plane foot rotation. The functional F test provided further information for the stance and swing phases. Differences between the results of the different statistical approaches are discussed, concluding that the novel method has potential utility and is worthy of validation through larger scale patient and clinician engagement to determine whether it is preferable to the traditional approach.
16
Morris JS, Baladandayuthapani V. Statistical Contributions to Bioinformatics: Design, Modeling, Structure Learning, and Integration. Stat Modelling 2017; 17:245-289. [PMID: 29129969 PMCID: PMC5679480 DOI: 10.1177/1471082x17698255] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 12/24/2022]
Abstract
The advent of high-throughput multi-platform genomics technologies providing whole-genome molecular summaries of biological samples has revolutionized biomedical research. These technologies yield highly structured big data, whose analysis poses significant quantitative challenges. The field of bioinformatics has emerged to deal with these challenges, and is comprised of many quantitative and biological scientists working together to effectively process these data and extract the treasure trove of information they contain. Statisticians, with their deep understanding of variability and uncertainty quantification, play a key role in these efforts. In this article, we attempt to summarize some of the key contributions of statisticians to bioinformatics, focusing on four areas: (1) experimental design and reproducibility, (2) preprocessing and feature extraction, (3) unified modeling, and (4) structure learning and integration. In each of these areas, we highlight some key contributions and try to elucidate the key statistical principles underlying these methods and approaches. Our goals are to demonstrate major ways in which statisticians have contributed to bioinformatics, to encourage statisticians to get involved early in methods development as new technologies emerge, and to stimulate future methodological work based on the statistical principles elucidated in this article and utilizing all available information to uncover new biological insights.
Affiliation(s)
- Jeffrey S Morris
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
17
Abstract
In this article, Greven and Scheipl describe an impressively general framework for performing functional regression that builds upon the generalized additive modeling framework. Over the past number of years, my collaborators and I have also been developing a general framework for functional regression, functional mixed models, which shares many similarities with this framework, but has many differences as well. In this discussion, I compare and contrast these two frameworks, to hopefully illuminate characteristics of each, highlighting their respective strengths and weaknesses, and providing recommendations regarding the settings in which each approach might be preferable.
Affiliation(s)
- Jeffrey S Morris
- The University of Texas, MD Anderson Cancer Center, Unit 1411, PO Box 301402, Houston, TX 77230-1402
18
19
Kao Y, Reich B, Storlie C, Anderson B. Malware Detection Using Nonparametric Bayesian Clustering and Classification Techniques. Technometrics 2015. [DOI: 10.1080/00401706.2014.958916] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/24/2022]
20
Lee W, Morris JS. Identification of differentially methylated loci using wavelet-based functional mixed models. Bioinformatics 2015; 32:664-72. [PMID: 26559505 DOI: 10.1093/bioinformatics/btv659] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 12/04/2014] [Accepted: 11/05/2015] [Indexed: 12/26/2022]
Abstract
MOTIVATION: DNA methylation is a key epigenetic modification that can modulate gene expression. Over the past decade, many studies have focused on profiling DNA methylation and investigating its alterations in complex diseases such as cancer. While early studies were mostly restricted to CpG islands or promoter regions, recent findings indicate that many important DNA methylation changes can occur in other regions and that DNA methylation needs to be examined on a genome-wide scale. In this article, we apply the wavelet-based functional mixed model methodology to analyze high-throughput methylation data for identifying differentially methylated loci across the genome. Contrary to many commonly used methods that model probes independently, this framework accommodates spatial correlations across the genome through basis function modeling as well as correlations between samples through functional random effects, which allows it to be applied to many different settings and potentially leads to more power in detection of differential methylation.
RESULTS: We applied this framework to three different high-dimensional methylation data sets (CpG Shore data, THREE data and NIH Roadmap Epigenomics data), studied previously in other works. A simulation study based on CpG Shore data suggested that, in terms of detection of differentially methylated loci, this modeling approach using wavelets outperforms analogous approaches modeling the loci as independent. For the THREE data, the method suggests newly detected regions of differential methylation, which were not reported in the original study.
AVAILABILITY AND IMPLEMENTATION: Automated software called WFMM is available at https://biostatistics.mdanderson.org/SoftwareDownload. CpG Shore data are available at http://rafalab.dfci.harvard.edu. NIH Roadmap Epigenomics data are available at http://compbio.mit.edu/roadmap.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
CONTACT: jefmorris@mdanderson.org
Affiliation(s)
- Wonyul Lee
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
- Jeffrey S Morris
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
21
Zhao Y, Chen H, Ogden RT. Wavelet-Based Weighted LASSO and Screening Approaches in Functional Linear Regression. J Comput Graph Stat 2015. [DOI: 10.1080/10618600.2014.925458] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 10/25/2022]
22
Meyer MJ, Coull BA, Versace F, Cinciripini P, Morris JS. Bayesian function-on-function regression for multilevel functional data. Biometrics 2015; 71:563-74. [PMID: 25787146 PMCID: PMC4575250 DOI: 10.1111/biom.12299] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 12/01/2013] [Revised: 12/01/2014] [Accepted: 01/01/2015] [Indexed: 11/30/2022]
Abstract
Medical and public health research increasingly involves the collection of complex and high dimensional data. In particular, functional data (where the unit of observation is a curve or set of curves that are finely sampled over a grid) is frequently obtained. Moreover, researchers often sample multiple curves per person, resulting in repeated functional measures. A common question is how to analyze the relationship between two functional variables. We propose a general function-on-function regression model for repeatedly sampled functional data on a fine grid, presenting a simple model as well as a more extensive mixed model framework, and introducing various functional Bayesian inferential procedures that account for multiple testing. We examine these models via simulation and a data analysis with data from a study that used event-related potentials to examine how the brain processes various types of images.
Affiliation(s)
- Mark J. Meyer
- Department of Mathematics, Bucknell University, Lewisburg, Pennsylvania, U.S.A
- Brent A. Coull
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, U.S.A
- Francesco Versace
- The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
- Paul Cinciripini
- The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
- Jeffrey S. Morris
- The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
23
Reiss PT, Huo L, Zhao Y, Kelly C, Ogden RT. Wavelet-domain regression and predictive inference in psychiatric neuroimaging. Ann Appl Stat 2015; 9:1076-1101. [PMID: 27330652 PMCID: PMC4912166 DOI: 10.1214/15-aoas829] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/19/2022]
Abstract
An increasingly important goal of psychiatry is the use of brain imaging data to develop predictive models. Here we present two contributions to statistical methodology for this purpose. First, we propose and compare a set of wavelet-domain procedures for fitting generalized linear models with scalar responses and image predictors: sparse variants of principal component regression and of partial least squares, and the elastic net. Second, we consider assessing the contribution of image predictors over and above available scalar predictors, in particular via permutation tests and an extension of the idea of confounding to the case of functional or image predictors. Using the proposed methods, we assess whether maps of a spontaneous brain activity measure, derived from functional magnetic resonance imaging, can meaningfully predict presence or absence of attention deficit/hyperactivity disorder (ADHD). Our results shed light on the role of confounding in the surprising outcome of the recent ADHD-200 Global Competition, which challenged researchers to develop algorithms for automated image-based diagnosis of the disorder.
|
24
|
Wang Y, Hobbs BP, Hu J, Ng CS, Do KA. Predictive classification of correlated targets with application to detection of metastatic cancer using functional CT imaging. Biometrics 2015; 71:792-802. [PMID: 25851056 DOI: 10.1111/biom.12304] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 07/01/2014] [Revised: 01/01/2015] [Accepted: 02/01/2015] [Indexed: 11/28/2022]
Abstract
Perfusion computed tomography (CTp) is an emerging functional imaging modality that uses physiological models to quantify characteristics pertaining to the passage of fluid through blood vessels. Perfusion characteristics provide physiological correlates for neovascularization induced by tumor angiogenesis. Thus CTp offers promise as a non-invasive quantitative functional imaging tool for cancer detection, prognostication, and treatment monitoring. In this article, we develop a Bayesian probabilistic framework for simultaneous supervised classification of multivariate correlated objects using separable covariance. The classification approach is applied to discriminate regions of liver that contain pathologically verified metastases from normal liver tissue using five perfusion characteristics. The hepatic regions tend to be highly correlated due to common vasculature. We demonstrate that simultaneous Bayesian classification yields dramatic improvements in performance in the presence of strong correlation among intra-subject units, yet remains competitive with classical methods in the presence of weak or no correlation.
Affiliation(s)
- Yuan Wang
- Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
| | - Brian P Hobbs
- Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
| | - Jianhua Hu
- Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
| | - Chaan S Ng
- Department of Diagnostic Radiology, University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
| | - Kim-Anh Do
- Department of Biostatistics, University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A
| |
|
25
|
Abstract
In recent years, several methods have been proposed to deal with functional data classification problems (e.g., one-dimensional curves or two- or three-dimensional images). One popular general approach is the kernel-based method proposed by Ferraty and Vieu (2003). The performance of this general method depends heavily on the choice of the semi-metric. Motivated by Fan and Lin (1998) and our image data, we propose a new semi-metric, based on wavelet thresholding, for classifying functional data. This wavelet-thresholding semi-metric is able to adapt to the smoothness of the data and provides particularly good classification when data features are localized and/or sparse. We conduct simulation studies to compare our proposed method with several functional classification methods and study the relative performance of the methods for classifying positron emission tomography (PET) images.
Affiliation(s)
- Chung Chang
- Department of Applied Mathematics, National Sun Yat-sen University, Taiwan
| | - R. Todd Ogden
- Department of Biostatistics, Columbia University, New York, NY, USA
| | - Yakuan Chen
- Department of Biostatistics, Columbia University, New York, NY, USA
| |
|
26
|
Martinez JG, Bohn KM, Carroll RJ, Morris JS. A Study of Mexican Free-Tailed Bat Chirp Syllables: Bayesian Functional Mixed Models for Nonstationary Acoustic Time Series. J Am Stat Assoc 2013; 108:514-526. [PMID: 23997376 DOI: 10.1080/01621459.2013.793118] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 10/26/2022]
Abstract
We describe a new approach to analyze chirp syllables of free-tailed bats from two regions of Texas in which they are predominant: Austin and College Station. Our goal is to characterize any systematic regional differences in the mating chirps and assess whether individual bats have signature chirps. The data are analyzed by modeling spectrograms of the chirps as responses in a Bayesian functional mixed model. Given the variable chirp lengths, we compute the spectrograms on a relative time scale interpretable as the relative chirp position, using a variable window overlap based on chirp length. We use 2D wavelet transforms to capture correlation within the spectrogram in our modeling and obtain adaptive regularization of the estimates and inference for the region-specific spectrograms. Our model includes random effect spectrograms at the bat level to account for correlation among chirps from the same bat, and to assess relative variability in chirp spectrograms within and between bats. The modeling of spectrograms using functional mixed models is a general approach for the analysis of replicated nonstationary time series, such as our acoustical signals, to relate aspects of the signals to various predictors, while accounting for between-signal structure. This can be done on raw spectrograms when all signals are of the same length, and can be done using spectrograms defined on a relative time scale for signals of variable length in settings where the idea of defining correspondence across signals based on relative position is sensible.
Affiliation(s)
- Josue G Martinez
- (Deceased) was recently at the Department of Radiation Oncology, The University of Texas M D Anderson Cancer Center, PO Box 301402, Houston, TX 77230-1402, USA
| |
|
27
|
Morris JS. Statistical Methods for Proteomic Biomarker Discovery based on Feature Extraction or Functional Modeling Approaches. Statistics and Its Interface 2012; 5:117-135. [PMID: 23814640 PMCID: PMC3693398 DOI: 10.4310/sii.2012.v5.n1.a11] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/02/2023]
Abstract
In recent years, developments in molecular biotechnology have led to the increased promise of detecting and validating biomarkers, or molecular markers that relate to various biological or medical outcomes. Proteomics, the direct study of proteins in biological samples, plays an important role in the biomarker discovery process. These technologies produce complex, high dimensional functional and image data that present many analytical challenges that must be addressed properly for effective comparative proteomics studies that can yield potential biomarkers. Specific challenges include experimental design, preprocessing, feature extraction, and statistical analysis accounting for the inherent multiple testing issues. This paper reviews various computational aspects of comparative proteomic studies, and summarizes contributions I along with numerous collaborators have made. First, there is an overview of comparative proteomics technologies, followed by a discussion of important experimental design and preprocessing issues that must be considered before statistical analysis can be done. Next, the two key approaches to analyzing proteomics data, feature extraction and functional modeling, are described. Feature extraction involves detection and quantification of discrete features like peaks or spots that theoretically correspond to different proteins in the sample. After an overview of the feature extraction approach, specific methods for mass spectrometry (Cromwell) and 2D gel electrophoresis (Pinnacle) are described. The functional modeling approach involves modeling the proteomic data in their entirety as functions or images. A general discussion of the approach is followed by the presentation of a specific method that can be applied, wavelet-based functional mixed models, and its extensions. All methods are illustrated by application to two example proteomic data sets, one from mass spectrometry and one from 2D gel electrophoresis. 
While the specific methods presented are applied to two specific proteomic technologies, MALDI-TOF and 2D gel electrophoresis, these methods and the other principles discussed in the paper apply much more broadly to other expression proteomics technologies.
|