1
Gertheiss J, Rügamer D, Liew BXW, Greven S. Functional Data Analysis: An Introduction and Recent Developments. Biom J 2024; 66:e202300363. [PMID: 39330918 DOI: 10.1002/bimj.202300363] [Received: 12/01/2023] [Revised: 05/17/2024] [Accepted: 05/27/2024] [Indexed: 09/28/2024]
Abstract
Functional data analysis (FDA) is a statistical framework that allows for the analysis of curves, images, or functions on higher dimensional domains. The goals of FDA, such as descriptive analyses, classification, and regression, are generally the same as for statistical analyses of scalar-valued or multivariate data, but FDA brings additional challenges due to the high- and infinite dimensionality of observations and parameters, respectively. This paper provides an introduction to FDA, including a description of the most common statistical analysis techniques, their respective software implementations, and some recent developments in the field. The paper covers fundamental concepts such as descriptives and outliers, smoothing, amplitude and phase variation, and functional principal component analysis. It also discusses functional regression, statistical inference with functional data, functional classification and clustering, and machine learning approaches for functional data analysis. The methods discussed in this paper are widely applicable in fields such as medicine, biophysics, neuroscience, and chemistry and are increasingly relevant due to the widespread use of technologies that allow for the collection of functional data. Sparse functional data methods are also relevant for longitudinal data analysis. All presented methods are demonstrated using available software in R by analyzing a dataset on human motion and motor control. To facilitate the understanding of the methods, their implementation, and hands-on application, the code for these practical examples is made available through a code and data supplement and on GitHub.
Affiliation(s)
- Jan Gertheiss
  - Department of Mathematics and Statistics, School of Economics and Social Sciences, Helmut Schmidt University, Hamburg, Germany
- David Rügamer
  - Department of Statistics, LMU Munich, Munich, Germany
  - Munich Center for Machine Learning, Munich, Germany
- Bernard X W Liew
  - School of Sport, Rehabilitation and Exercise Sciences, University of Essex, Essex, UK
- Sonja Greven
  - Chair of Statistics, School of Business and Economics, Humboldt-Universität zu Berlin, Berlin, Germany
2
Sun J, Lee KY. Generalized functional linear model with a point process predictor. Stat Med 2024; 43:1564-1576. [PMID: 38332307 DOI: 10.1002/sim.10023] [Received: 09/03/2023] [Revised: 12/17/2023] [Accepted: 01/15/2024] [Indexed: 02/10/2024]
Abstract
Point process data have become increasingly popular these days. For example, many of the data captured in electronic health records (EHR) are in the format of point process data. It is of great interest to study the association between a point process predictor and a scalar response using generalized functional linear regression models. Various generalized functional linear regression models have been developed under different settings in the past decades. However, existing methods can only deal with functional or longitudinal predictors, not point process predictors. In this article, we propose a novel generalized functional linear regression model for a point process predictor. Our proposed model is based on the joint modeling framework, where we adopt a log-Gaussian Cox process model for the point process predictor and a generalized linear regression model for the outcome. We also develop a new algorithm for fast model estimation based on the Gaussian variational approximation method. We conduct extensive simulation studies to evaluate the performance of our proposed method and compare it to competing methods. The performance of our proposed method is further demonstrated on an EHR dataset of patients admitted into the intensive care units of the Beth Israel Deaconess Medical Center between 2001 and 2008.
Affiliation(s)
- Jiehuan Sun
  - Division of Epidemiology and Biostatistics, School of Public Health, University of Illinois Chicago, Chicago, Illinois, USA
- Kuang-Yao Lee
  - Department of Statistics, Operations, and Data Science, Temple University, Philadelphia, Pennsylvania, USA
3
Liu X, Jiang J. Classified functional mixed effects model prediction. Stat Med 2024; 43:1329-1340. [PMID: 38279656 DOI: 10.1002/sim.10007] [Received: 07/02/2023] [Revised: 11/15/2023] [Accepted: 12/15/2023] [Indexed: 01/28/2024]
Abstract
In biomedical research today, there is a growing demand for accurate predictions at the subject level. In many of these situations, data are collected as longitudinal curves and display distinct individual characteristics. Thus, prediction mechanisms based on functional mixed effects models (FMEM) are useful. In this paper, we developed a classified functional mixed model prediction (CFMMP) method, which adapts classified mixed model prediction (CMMP) to the framework of FMEM. The performance of CFMMP against functional regression prediction is explored in simulation studies, along with the consistency property of CFMMP estimators. Applications of CFMMP are illustrated using real-world examples, including data from hormone research on menstrual cycles and from diffusion tensor imaging.
Affiliation(s)
- Xiaoyan Liu
  - Statistics Department, University of California, Davis, California, USA
- Jiming Jiang
  - Statistics Department, University of California, Davis, California, USA
4
Yang Q, Jiang M, Li C, Luo S, Crowley MJ, Shaw RJ. Predicting health outcomes with intensive longitudinal data collected by mobile health devices: a functional principal component regression approach. BMC Med Res Methodol 2024; 24:69. [PMID: 38494505 PMCID: PMC10944610 DOI: 10.1186/s12874-024-02193-7] [Received: 09/12/2023] [Accepted: 03/01/2024] [Indexed: 03/19/2024] Open
Abstract
BACKGROUND Intensive longitudinal data (ILD) collected in near real time by mobile health devices provide a new opportunity for monitoring chronic diseases, early disease risk prediction, and disease prevention in health research. Functional data analysis, specifically functional principal component analysis, has great potential to abstract trends in ILD but has not been used extensively in mobile health research. OBJECTIVE To introduce functional principal component analysis (fPCA) and demonstrate its potential applicability in estimating trends in ILD collected by mobile health devices, assessing longitudinal association between ILD and health outcomes, and predicting health outcomes. METHODS fPCA and scalar-to-function regression models were reviewed. A case study was used to illustrate the process of abstracting trends in intensively self-measured blood glucose using functional principal component analysis and then predicting future HbA1c values in patients with type 2 diabetes using a scalar-to-function regression model. RESULTS Based on the scalar-to-function regression model results, there was a slightly increasing trend between daily blood glucose measures and HbA1c. 61% of variation in HbA1c could be predicted by the three preceding months' blood glucose values measured before breakfast (P < 0.0001, [Formula: see text]). CONCLUSIONS Functional data analysis, specifically fPCA, offers a unique tool to capture patterns in ILD collected by mobile health devices. It is particularly useful in assessing longitudinal dynamic association between repeated measures and outcomes, and can be easily integrated in prediction models to improve prediction precision.
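The general idea of regressing a scalar outcome on functional principal component scores can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes curves observed on a common dense grid, keeps only the first component, finds that component by power iteration, and runs on simulated data. All function names, the grid size, and the simulation settings are hypothetical choices made for illustration.

```python
import math
import random

def center(curves):
    """Subtract the pointwise mean curve from every curve."""
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    return [[c[j] - mean[j] for j in range(m)] for c in curves]

def leading_component(centered, iters=200):
    """Power iteration for the leading eigenvector of the sample covariance."""
    n, m = len(centered), len(centered[0])
    v = [1.0 + 0.01 * j for j in range(m)]  # arbitrary non-zero starting vector
    for _ in range(iters):
        # Apply the covariance operator as (1/n) * X^T (X v) without forming it.
        proj = [sum(x[j] * v[j] for j in range(m)) for x in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) / n for j in range(m)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return v

def simple_ols(x, y):
    """Least-squares fit of y on a single predictor x; returns (intercept, slope, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1.0 - ss_res / ss_tot

# Simulated data (assumption): each curve is a random amplitude times sin(pi * t)
# plus noise, and the scalar outcome depends linearly on that amplitude.
random.seed(0)
grid = [j / 19 for j in range(20)]
amplitudes = [random.gauss(0, 1) for _ in range(60)]
curves = [[a * math.sin(math.pi * t) + random.gauss(0, 0.05) for t in grid]
          for a in amplitudes]
outcome = [2 + 3 * a + random.gauss(0, 0.1) for a in amplitudes]

centered = center(curves)
component = leading_component(centered)
scores = [sum(x[j] * component[j] for j in range(len(grid))) for x in centered]
intercept, slope, r_squared = simple_ols(scores, outcome)
print(round(r_squared, 3))
```

The sign of the estimated component is arbitrary, so the sign of the slope carries no meaning on its own; the fitted values and R² are unaffected. Sparse or irregularly sampled measurements, as are common with mobile health devices, would need a smoothing step before the decomposition.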
Affiliation(s)
- Qing Yang
  - School of Nursing, Duke University, Durham, USA
- Cai Li
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Sheng Luo
  - Biostatistics & Bioinformatics, Duke University, Durham, USA
- Matthew J Crowley
  - Center of Innovation to Accelerate Discovery and Practice Transformation, Durham Veterans Affairs Medical Center, Durham, NC, USA
  - Division of Endocrinology, Diabetes and Metabolism, Duke University School of Medicine, Durham, NC, USA
- Ryan J Shaw
  - School of Nursing, Duke University, Durham, USA
  - Center of Innovation to Accelerate Discovery and Practice Transformation, Durham Veterans Affairs Medical Center, Durham, NC, USA
  - Center for Applied Genomics & Precision Medicine, School of Medicine, Duke University, Durham, NC, USA
5
Dempsey W. Recurrent event analysis in the presence of real-time high frequency data via random subsampling. J Comput Graph Stat 2023; 33:525-537. [PMID: 38868625 PMCID: PMC11165938 DOI: 10.1080/10618600.2023.2276114] [Received: 08/02/2022] [Accepted: 10/17/2023] [Indexed: 06/14/2024]
Abstract
Digital monitoring studies collect real-time high frequency data via mobile sensors in the subjects' natural environment. This data can be used to model the impact of changes in physiology on recurrent event outcomes such as smoking, drug use, alcohol use, or self-identified moments of suicide ideation. Likelihood calculations for the recurrent event analysis, however, become computationally prohibitive in this setting. Motivated by this, a random subsampling framework is proposed for computationally efficient, approximate likelihood-based estimation. A subsampling-unbiased estimator for the derivative of the cumulative hazard enters into an approximation of log-likelihood. The estimator has two sources of variation: the first due to the recurrent event model and the second due to subsampling. The latter can be reduced by increasing the sampling rate; however, this leads to increased computational costs. The approximate score equations are equivalent to logistic regression score equations, allowing for standard, "off-the-shelf" software to be used in fitting these models. Simulations demonstrate the method and efficiency-computation trade-off. We end by illustrating our approach using data from a digital monitoring study of suicidal ideation.
Affiliation(s)
- Walter Dempsey
  - Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA
6
Li R, Xiao L. Latent factor model for multivariate functional data. Biometrics 2023; 79:3307-3318. [PMID: 37661821 PMCID: PMC10840703 DOI: 10.1111/biom.13924] [Received: 08/29/2022] [Accepted: 08/17/2023] [Indexed: 09/05/2023]
Abstract
For multivariate functional data, a functional latent factor model is proposed, extending the traditional latent factor model for multivariate data. The proposed model uses unobserved stochastic processes to induce the dependence among the different functions, and thus, for a large number of functions, may provide a more parsimonious and interpretable characterization of the otherwise complex dependencies between the functions. Sufficient conditions are provided to establish the identifiability of the proposed model. The performance of the proposed model is assessed through simulation studies and an application to electroencephalography data.
Affiliation(s)
- Ruonan Li
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, U.S.A
- Luo Xiao
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, U.S.A
7
Zou H, Zeng D, Xiao L, Luo S. Bayesian inference and dynamic prediction for multivariate longitudinal and survival data. Ann Appl Stat 2023; 17:2574-2595. [PMID: 37719893 PMCID: PMC10500582 DOI: 10.1214/23-aoas1733] [Indexed: 09/19/2023]
Abstract
Alzheimer's disease (AD) is a complex neurological disorder impairing multiple domains such as cognition and daily functions. To better understand the disease and its progression, many AD research studies collect multiple longitudinal outcomes that are strongly predictive of the onset of AD dementia. We propose a joint model based on a multivariate functional mixed model framework (referred to as MFMM-JM) that simultaneously models the multiple longitudinal outcomes and the time to dementia onset. We develop six functional forms to fully investigate the complex association between longitudinal outcomes and dementia onset. Moreover, we use the Bayesian methods for statistical inference and develop a dynamic prediction framework that provides accurate personalized predictions of disease progressions based on new subject-specific data. We apply the proposed MFMM-JM to two large ongoing AD studies: the Alzheimer's Disease Neuroimaging Initiative (ADNI) and National Alzheimer's Coordinating Center (NACC), and identify the functional forms with the best predictive performance. Our method is also validated by extensive simulation studies with five settings.
Affiliation(s)
- Haotian Zou
  - Department of Biostatistics, University of North Carolina at Chapel Hill
- Donglin Zeng
  - Department of Biostatistics, University of North Carolina at Chapel Hill
- Luo Xiao
  - Department of Statistics, North Carolina State University
- Sheng Luo
  - Department of Biostatistics and Bioinformatics, Duke University
8
Zou H, Xiao L, Zeng D, Luo S. Multivariate functional mixed model with MRI data: An application to Alzheimer's disease. Stat Med 2023; 42:1492-1511. [PMID: 36805635 PMCID: PMC10133011 DOI: 10.1002/sim.9683] [Received: 04/12/2022] [Revised: 11/09/2022] [Accepted: 01/26/2023] [Indexed: 02/22/2023]
Abstract
Alzheimer's Disease (AD) is the leading cause of dementia and impairment in various domains. Recent AD studies, (ie, Alzheimer's Disease Neuroimaging Initiative (ADNI) study), collect multimodal data, including longitudinal neurological assessments and magnetic resonance imaging (MRI) data, to better study the disease progression. Adopting early interventions is essential to slow AD progression for subjects with mild cognitive impairment (MCI). It is of particular interest to develop an AD predictive model that leverages multimodal data and provides accurate personalized predictions. In this article, we propose a multivariate functional mixed model with MRI data (MFMM-MRI) that simultaneously models longitudinal neurological assessments, baseline MRI data, and the survival outcome (ie, dementia onset) for subjects with MCI at baseline. Two functional forms (the random-effects model and instantaneous model) linking the longitudinal and survival process are investigated. We use Markov Chain Monte Carlo (MCMC) method based on No-U-Turn Sampling (NUTS) algorithm to obtain posterior samples. We develop a dynamic prediction framework that provides accurate personalized predictions of longitudinal trajectories and survival probability. We apply MFMM-MRI to the ADNI study and identify significant associations among longitudinal outcomes, MRI data, and the risk of dementia onset. The instantaneous model with voxels from the whole brain has the best prediction performance among all candidate models. The simulation study supports the validity of the estimation and dynamic prediction method.
Affiliation(s)
- Haotian Zou
  - Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, United States
- Luo Xiao
  - Department of Statistics, North Carolina State University, North Carolina, United States
- Donglin Zeng
  - Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, United States
- Sheng Luo
  - Department of Biostatistics and Bioinformatics, Duke University, North Carolina, United States
9
Zhang Z, Charalambous C, Foster P. A Gaussian copula joint model for longitudinal and time-to-event data with random effects. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2022.107685] [Indexed: 01/05/2023]
10
Stallard E, Kociolek A, Jin Z, Ryu H, Lee S, Cosentino S, Zhu C, Gu Y, Fernandez K, Hernandez M, Kinosian B, Stern Y. Validation of a Multivariate Prediction Model of the Clinical Progression of Alzheimer's Disease in a Community-Dwelling Multiethnic Cohort. J Alzheimers Dis 2023; 95:93-117. [PMID: 37482990 PMCID: PMC10528912 DOI: 10.3233/jad-220811] [Indexed: 07/25/2023]
Abstract
BACKGROUND The major aims of the three Predictors Studies have been to further our understanding of Alzheimer's disease (AD) progression sufficiently to predict the length of time from disease onset to major disease outcomes in individual patients with AD. OBJECTIVES To validate a longitudinal Grade of Membership (L-GoM) prediction algorithm developed using clinic-based, mainly white patients from the Predictors 2 Study in a statistically representative community-based sample of Hispanic (N = 211) and non-Hispanic (N = 62) older adults (with 60 males and 213 females) from the Predictors 3 Study and extend the algorithm to mild cognitive impairment (MCI). METHODS The L-GoM model was applied to data collected at the initial Predictors 3 visit for 150 subjects with AD and 123 with MCI. Participants were followed annually for up to seven years. Observed rates of survival and need for full-time care (FTC) were compared to those predicted by the algorithm. RESULTS Initial MCI/AD severity in Predictors 3 was substantially higher than among clinic-based AD patients enrolled at the specialized Alzheimer's centers in Predictors 2. The observed survival and need for FTC followed the L-GoM model trajectories in individuals with MCI or AD, except for N = 32 subjects who were initially diagnosed with AD but reverted to a non-AD diagnosis on follow-up. CONCLUSION These findings indicate that the L-GoM model is applicable to community-dwelling, multiethnic older adults with AD. They extend the use of the model to the prediction of outcomes for MCI. They also justify release of our L-GoM calculator at this time.
Affiliation(s)
- Eric Stallard
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Anton Kociolek
  - Cognitive Neuroscience Division of the Department of Neurology and Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Zhezhen Jin
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Hyunnam Ryu
  - Cognitive Neuroscience Division of the Department of Neurology and Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Seonjoo Lee
  - Division of Biostatistics, New York State Psychiatric Institute, New York, NY, USA
  - Department of Psychiatry, Columbia University, New York, NY, USA
- Stephanie Cosentino
  - Cognitive Neuroscience Division of the Department of Neurology and Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Carolyn Zhu
  - Brookdale Department of Geriatrics and Palliative Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - James J. Peters VA Medical Center, Bronx, NY, USA
- Yian Gu
  - Cognitive Neuroscience Division of the Department of Neurology and Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Kayri Fernandez
  - Cognitive Neuroscience Division of the Department of Neurology and Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Michelle Hernandez
  - Cognitive Neuroscience Division of the Department of Neurology and Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Bruce Kinosian
  - Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yaakov Stern
  - Cognitive Neuroscience Division of the Department of Neurology and Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
11
Agogo GO, Mwambi H, Shi X, Liu Z. Modeling of correlated cognitive function and functional disability outcomes with bounded and missing data in a longitudinal aging study. Behav Res Methods 2022; 54:2949-2961. [PMID: 35132587 DOI: 10.3758/s13428-022-01796-6] [Accepted: 01/10/2022] [Indexed: 12/16/2022]
Abstract
Longitudinal studies of correlated cognitive and disability outcomes among older adults are characterized by missing data due to death or loss to follow-up from deteriorating health conditions. The Mini-Mental State Examination (MMSE) score for assessing cognitive function ranges from a minimum of 0 (floor) to a maximum of 30 (ceiling). To study the risk factors of cognitive function and functional disability, we propose a shared parameter model to handle missingness, correlation between outcomes, and the floor and ceiling effects of the MMSE measurements. The shared random effects in the proposed model handle missingness (either missing at random or missing not at random) and correlation between these outcomes, while the Tobit distribution handles the floor and ceiling effects of the MMSE measurements. We used data from the Chinese Longitudinal Healthy Longevity Survey (CLHLS) and a simulation study. By ignoring the MMSE floor and ceiling effects in the analyses of the CLHLS, the association of systolic blood pressure with cognitive function was not significant and the association of age with cognitive function was lower by 16.6% (from -6.237 to -5.201). By ignoring the MMSE floor and ceiling effects in the simulation study, the relative bias in the estimated association of female gender with cognitive function was 43 times higher (from -0.01 to -0.44). The estimated associations obtained with data missing at random were smaller than those with data missing not at random, demonstrating how the missing data mechanism affects the analytic results. Our work underscores the importance of proper model specification in longitudinal analysis of correlated outcomes subject to missingness and bounded values.
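How a Tobit-type likelihood treats floor and ceiling values can be sketched for a single observation. This is a minimal illustration under stated assumptions, not the shared parameter model proposed in the paper: it assumes a normal latent score with known mean `mu` and standard deviation `sigma`, censoring at the MMSE bounds of 0 and 30, and no random effects. The function names are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(y, mu, sigma, floor=0.0, ceiling=30.0):
    """Log-likelihood contribution of one score under two-sided censoring."""
    if y <= floor:
        # At the floor we only know the latent score is at or below the floor.
        return math.log(norm_cdf((floor - mu) / sigma))
    if y >= ceiling:
        # At the ceiling we only know the latent score is at or above the ceiling.
        return math.log(norm_cdf((mu - ceiling) / sigma))
    # Interior scores contribute the usual normal density.
    return math.log(norm_pdf((y - mu) / sigma) / sigma)

# Example: a ceiling score of 30 with latent mean 28 and standard deviation 3.
print(tobit_loglik(30, 28.0, 3.0))
```

Treating a score of 30 as an exact observation would use the density branch instead, which is the kind of misspecification whose consequences the abstract reports.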
Affiliation(s)
- George O Agogo
  - StatsDecide Analytics and Consulting Ltd, P.O. Box 17438-20100, Nakuru, Kenya
- Henry Mwambi
  - School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg Campus, Pietermaritzburg, South Africa
- Xiaoming Shi
  - National Institute of Environmental Health, Chinese Center for Disease Control and Prevention, Beijing, 100021, China
- Zuyun Liu
  - Department of Big Data in Health Science and Center for Clinical Big Data and Analytics, School of Public Health and the Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
12
Li Q, Vehik K, Li C, Triplett E, Roesch L, Hu YJ, Krischer J. A robust and transformation-free joint model with matching and regularization for metagenomic trajectory and disease onset. BMC Genomics 2022; 23:661. [PMID: 36123651 PMCID: PMC9484160 DOI: 10.1186/s12864-022-08890-1] [Received: 06/19/2022] [Accepted: 09/14/2022] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND To identify operational taxonomic units (OTUs) signaling disease onset in an observational study, a powerful strategy was selecting participants by matched sets and profiling temporal metagenomes, followed by trajectory analysis. Existing trajectory analyses modeled individual OTU or microbial community without adjusting for the within-community correlation and matched-set-specific latent factors. RESULTS We proposed a joint model with matching and regularization (JMR) to detect OTU-specific trajectory predictive of host disease status. The between- and within-matched-sets heterogeneity in OTU relative abundance and disease risk were modeled by nested random effects. The inherent negative correlation in microbiota composition was adjusted by incorporating and regularizing the top-correlated taxa as longitudinal covariate, pre-selected by Bray-Curtis distance and elastic net regression. We designed a simulation pipeline to generate true biomarkers for disease onset and the pseudo biomarkers caused by compositionality. We demonstrated that JMR effectively controlled the false discovery and pseudo biomarkers in a simulation study generating temporal high-dimensional metagenomic counts with random intercept or slope. Application of the competing methods in the simulated data and the TEDDY cohort showed that JMR outperformed the other methods and identified important taxa in infants' fecal samples with dynamics preceding host disease status. CONCLUSION Our method JMR is a robust framework that models taxon-specific trajectory and host disease status for matched participants without transformation of relative abundance, improving the power of detecting disease-associated microbial features in certain scenarios. JMR is available in R package mtradeR at https://github.com/qianli10000/mtradeR.
Affiliation(s)
- Qian Li
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, TN, USA
- Kendra Vehik
  - Health Informatics Institute, University of South Florida, Tampa, 33620, FL, USA
- Cai Li
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, TN, USA
- Eric Triplett
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, 32611, FL, USA
- Luiz Roesch
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, 32611, FL, USA
- Yi-Juan Hu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, 30322, GA, USA
- Jeffrey Krischer
  - Health Informatics Institute, University of South Florida, Tampa, 33620, FL, USA