1
Trecarten S, Fongang B, Liss M. Current Trends and Challenges of Microbiome Research in Prostate Cancer. Curr Oncol Rep 2024; 26:477-487. [PMID: 38573440 DOI: 10.1007/s11912-024-01520-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/18/2024] [Indexed: 04/05/2024]
Abstract
PURPOSE OF REVIEW The role of the gut microbiome in prostate cancer is an emerging area of research interest. However, no single causative organism has yet been identified. The goal of this paper is to examine the role of the microbiome in prostate cancer and summarize the challenges relating to methodology in specimen collection, sequencing technology, and interpretation of results. RECENT FINDINGS Significant heterogeneity still exists in methodology for stool sampling/storage, preservative options, DNA extraction, and sequencing database selection/in silico processing. Debate persists over primer choice in amplicon sequencing as well as optimal methods for data normalization. Statistical methods for longitudinal microbiome analysis continue to undergo refinement. While standardization of methodology may help yield more consistent results for organism identification in prostate cancer, this is a difficult task due to considerable procedural variation at each step in the process. Further reproducibility and methodology research is required.
Affiliation(s)
- Shaun Trecarten
  - Department of Urology, UT Health San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229, USA
- Bernard Fongang
  - Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, UT Health San Antonio, San Antonio, TX, USA
  - Department of Biochemistry and Structural Biology, UT Health San Antonio, San Antonio, TX, USA
  - Department of Population Health Sciences, UT Health San Antonio, San Antonio, TX, USA
- Michael Liss
  - Department of Urology, UT Health San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229, USA
2
Lim S. Exposures to select risk factors can be estimated from a continuous stream of inertial sensor measurements during a variety of lifting-lowering tasks. ERGONOMICS 2024:1-16. [PMID: 38646871 DOI: 10.1080/00140139.2024.2343949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 02/28/2024] [Indexed: 04/23/2024]
Abstract
Wearable inertial measurement units (IMUs) are used increasingly to estimate biomechanical exposures in lifting-lowering tasks. The objective of the study was to develop and evaluate predictive models for estimating relative hand loads and two other critical biomechanical exposures to gain a comprehensive understanding of work-related musculoskeletal disorders in lifting. We collected 12,480 lifting-lowering phases from 26 subjects (15 men and 11 women) performing manual lifting-lowering tasks with hand loads (0-22.7 kg) at varied workstation heights and handling modes. We implemented a hierarchical model that sequentially classified risk factors, including workstation height, handling mode, and relative hand load. Our algorithm detected lifting-lowering phases (>97.8%) with mean onset errors of 0.12 and 0.2 seconds for lifting and lowering phases. It estimated workstation height (>98.5%), handling mode (>87.1%), and relative hand load (mean absolute errors of 5.6-5.8%) across conditions, highlighting the benefits of data-driven models in deriving lifting-lowering occurrences, timing, and critical risk factors from continuous IMU-based kinematics.
Affiliation(s)
- Sol Lim
  - Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, USA
3
Stephenson BJK, Wu SM, Dominici F. Identifying dietary consumption patterns from survey data: a Bayesian nonparametric latent class model. JOURNAL OF THE ROYAL STATISTICAL SOCIETY. SERIES A, (STATISTICS IN SOCIETY) 2024; 187:496-512. [PMID: 38617597 PMCID: PMC11009925 DOI: 10.1093/jrsssa/qnad135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Revised: 11/06/2023] [Accepted: 11/09/2023] [Indexed: 04/16/2024]
Abstract
Dietary assessments provide snapshots of population-based dietary habits. Questions remain about how generalisable those snapshots are in national survey data, where certain subgroups are sampled disproportionately. We propose a Bayesian overfitted latent class model to derive dietary patterns, accounting for survey design and sampling variability. Compared to standard approaches, our model showed improved identifiability of the true population pattern and prevalence in simulation. We apply this model to identify the intake patterns of adults living at or below the 130% poverty income level. Five dietary patterns were identified and characterised. Reproducible code/data are made available to encourage further research.
Affiliation(s)
- Briana J K Stephenson
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Stephanie M Wu
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Francesca Dominici
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
4
Bui Q, Kumar A, Chen Y, Hamzehloo A, Heitsch L, Slowik A, Strbian D, Lee JM, Dhar R. CSF-Based Volumetric Imaging Biomarkers Highlight Incidence and Risk Factors for Cerebral Edema After Ischemic Stroke. Neurocrit Care 2024; 40:303-313. [PMID: 37188885 PMCID: PMC11025464 DOI: 10.1007/s12028-023-01742-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 04/19/2023] [Indexed: 05/17/2023]
Abstract
BACKGROUND Cerebral edema has primarily been studied using midline shift or clinical deterioration as end points, which only captures the severe and delayed manifestations of a process affecting many patients with stroke. Quantitative imaging biomarkers that measure edema severity across the entire spectrum could improve its early detection, as well as identify relevant mediators of this important stroke complication. METHODS We applied an automated image analysis pipeline to measure the displacement of cerebrospinal fluid (ΔCSF) and the ratio of lesional versus contralateral hemispheric cerebrospinal fluid (CSF) volume (CSF ratio) in a cohort of 935 patients with hemispheric stroke with follow-up computed tomography scans taken a median of 26 h (interquartile range 24-31) after stroke onset. We determined diagnostic thresholds based on comparison to those without any visible edema. We modeled baseline clinical and radiographic variables against each edema biomarker and assessed how each biomarker was associated with stroke outcome (modified Rankin Scale at 90 days). RESULTS The displacement of CSF and CSF ratio were correlated with midline shift (r = 0.52 and −0.74, p < 0.0001) but exhibited broader ranges. A ΔCSF of greater than 14% or a CSF ratio below 0.90 identified those with visible edema: more than half of the patients with stroke met these criteria, compared with only 14% who had midline shift at 24 h. Predictors of edema across all biomarkers included a higher National Institutes of Health Stroke Scale score, a lower Alberta Stroke Program Early CT score, and lower baseline CSF volume. A history of hypertension and diabetes (but not acute hyperglycemia) predicted greater ΔCSF but not midline shift. Both ΔCSF and a lower CSF ratio were associated with worse outcome, adjusting for age, National Institutes of Health Stroke Scale score, and Alberta Stroke Program Early CT score (odds ratio 1.7, 95% confidence interval 1.3-2.2 per 21% ΔCSF).
CONCLUSIONS Cerebral edema can be measured in a majority of patients with stroke on follow-up computed tomography using volumetric biomarkers evaluating CSF shifts, including in many without visible midline shift. Edema formation is influenced by clinical and radiographic stroke severity but also by chronic vascular risk factors and contributes to worse stroke outcomes.
Affiliation(s)
- Quoc Bui
  - Department of Neurology, Washington University School of Medicine, 660 S Euclid Avenue, Campus Box 8111, St. Louis, MO, USA
- Atul Kumar
  - Department of Neurology, Washington University School of Medicine, 660 S Euclid Avenue, Campus Box 8111, St. Louis, MO, USA
- Yasheng Chen
  - Department of Neurology, Washington University School of Medicine, 660 S Euclid Avenue, Campus Box 8111, St. Louis, MO, USA
- Ali Hamzehloo
  - Department of Neurology, Washington University School of Medicine, 660 S Euclid Avenue, Campus Box 8111, St. Louis, MO, USA
- Laura Heitsch
  - Department of Emergency Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Agnieszka Slowik
  - Department of Neurology, Jagiellonian University Medical College, Krakow, Poland
- Daniel Strbian
  - Department of Neurology, Helsinki University Hospital, Helsinki, Finland
- Jin-Moo Lee
  - Department of Neurology, Washington University School of Medicine, 660 S Euclid Avenue, Campus Box 8111, St. Louis, MO, USA
- Rajat Dhar
  - Department of Neurology, Washington University School of Medicine, 660 S Euclid Avenue, Campus Box 8111, St. Louis, MO, USA
5
Yang L. Diagnostics for regression models with semicontinuous outcomes. Biometrics 2024; 80:ujae007. [PMID: 38470256 DOI: 10.1093/biomtc/ujae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 11/16/2023] [Accepted: 01/16/2024] [Indexed: 03/13/2024]
Abstract
Semicontinuous outcomes commonly arise in a wide variety of fields, such as insurance claims, healthcare expenditures, rainfall amounts, and alcohol consumption. Regression models, including Tobit, Tweedie, and two-part models, are widely employed to understand the relationship between semicontinuous outcomes and covariates. Given the potential detrimental consequences of model misspecification, after fitting a regression model, it is of prime importance to check the adequacy of the model. However, due to the point mass at zero, standard diagnostic tools for regression models (eg, deviance and Pearson residuals) are not informative for semicontinuous data. To bridge this gap, we propose a new type of residuals for semicontinuous outcomes that is applicable to general regression models. Under the correctly specified model, the proposed residuals converge to being uniformly distributed, and when the model is misspecified, they significantly depart from this pattern. In addition to in-sample validation, the proposed methodology can also be employed to evaluate predictive distributions. We demonstrate the effectiveness of the proposed tool using health expenditure data from the US Medical Expenditure Panel Survey.
Affiliation(s)
- Lu Yang
  - School of Statistics, University of Minnesota, Minneapolis, MN 55455, United States
6
D’Angelo A, Trenholm N, Loose B, Glastra L, Strock J, Kim J. Microplastics Distribution within Western Arctic Seawater and Sea Ice. TOXICS 2023; 11:792. [PMID: 37755802 PMCID: PMC10534329 DOI: 10.3390/toxics11090792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 09/12/2023] [Accepted: 09/18/2023] [Indexed: 09/28/2023]
Abstract
Microplastic pollution has emerged as a global environmental concern, exhibiting wide distribution within marine ecosystems, including the Arctic Ocean. Limited Arctic microplastic data exist from beached plastics, seabed sediments, floating plastics, and sea ice. However, no studies have examined microplastics in the sea ice of the Canadian Arctic Archipelago and Tallurutiup Imanga National Marine Conservation Area, and few have explored Arctic marginal seas' water column. The majority of the microplastic data originates from the Eurasian Arctic, with limited data available from other regions of the Arctic Ocean. This study presents data from two distinct campaigns in the Canadian Arctic Archipelago and Western Arctic marginal seas in 2019 and 2020. These campaigns involved sampling from different regions and matrices, making direct comparisons inappropriate. The study's primary objective is to provide insights into the spatial and vertical distribution of microplastics. The results reveal elevated microplastic concentrations within the upper 50 m of the water column and significant accumulation in the sea ice, providing evidence to support the designation of sea ice as a microplastic sink. Surface seawater exhibits a gradient of microplastic counts, decreasing from the Chukchi Sea towards the Beaufort Sea. Polyvinyl chloride polymer (~60%) dominated microplastic composition in both sea ice and seawater. This study highlights the need for further investigations in this region to enhance our understanding of microplastic sources, distribution, and transport.
Affiliation(s)
- Alessandra D’Angelo
  - Graduate School of Oceanography, University of Rhode Island, Narragansett, RI 02882, USA
- Nicole Trenholm
  - Center for Environmental Science, University of Maryland, Cambridge, MD 21613, USA
- Brice Loose
  - Graduate School of Oceanography, University of Rhode Island, Narragansett, RI 02882, USA
- Laura Glastra
  - Graduate School of Oceanography, University of Rhode Island, Narragansett, RI 02882, USA
- Jacob Strock
  - Graduate School of Oceanography, University of Rhode Island, Narragansett, RI 02882, USA
- Jongsun Kim
  - School of Earth, Environmental and Marine Sciences, The University of Texas Rio Grande Valley, Brownsville, TX 78520, USA
7
Rene L, Linero AR, Slate E. Causal mediation and sensitivity analysis for mixed-scale data. Stat Methods Med Res 2023; 32:1249-1266. [PMID: 37194551 PMCID: PMC10500957 DOI: 10.1177/09622802231173491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/18/2023]
Abstract
The goal of causal mediation analysis, often described within the potential outcomes framework, is to decompose the effect of an exposure on an outcome of interest along different causal pathways. Using the assumption of sequential ignorability to attain non-parametric identification, Imai et al. (2010) proposed a flexible approach to measuring mediation effects, focusing on parametric and semiparametric normal/Bernoulli models for the outcome and mediator. Less attention has been paid to the case where the outcome and/or mediator model are mixed-scale, ordinal, or otherwise fall outside the normal/Bernoulli setting. We develop a simple but flexible parametric modeling framework to accommodate the common situation where the responses are mixed continuous and binary, and apply it to a zero-one inflated beta model for the outcome and mediator. Applying our proposed methods to the publicly available JOBS II dataset, we (i) argue for the need for non-normal models, (ii) show how to estimate both average and quantile mediation effects for boundary-censored data, and (iii) show how to conduct a meaningful sensitivity analysis by introducing unidentified, scientifically meaningful, sensitivity parameters.
Affiliation(s)
- Lexi Rene
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Antonio R Linero
  - Department of Statistics and Data Sciences, University of Texas at Austin, Austin, TX, USA
- Elizabeth Slate
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
8
Stephenson BJK, Willett WC. Racial and ethnic heterogeneity in diets of low-income adult females in the United States: results from National Health and Nutrition Examination Surveys from 2011 to 2018. Am J Clin Nutr 2023; 117:625-634. [PMID: 36872021 PMCID: PMC10315405 DOI: 10.1016/j.ajcnut.2023.01.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/15/2022] [Revised: 01/03/2023] [Accepted: 01/09/2023] [Indexed: 03/06/2023] Open
Abstract
BACKGROUND Poor diet is a major risk factor of cardiovascular and chronic diseases, particularly for low-income female adults. However, the pathways by which race and ethnicity play a role in this risk factor have not been fully explored. OBJECTIVES This observational study aimed to identify dietary consumption differences by race and ethnicity of US female adults living at or below the 130% poverty income level from 2011 to 2018. METHODS A total of 2917 adult females aged 20 to 80 years from the National Health and Nutrition Examination Survey (2011-2018) living at or below the 130% poverty income level with at least one complete 24-hour dietary recall were classified into 5 self-identified racial and ethnic subgroups (Mexican, other Hispanic, non-Hispanic [NH]-White, NH-Black, and NH-Asian). Dietary consumption patterns were defined by 28 major food groups summarized from the Food Pattern Equivalents Database and derived via a robust profile clustering model, which identifies foods that share consumption patterns across all low-income female adults and foods that differ in consumption patterns based on the racial and ethnic subgroups. RESULTS All food consumption patterns were identified at the local level, defined by racial and ethnic subgroups. Legumes and cured meats were the most differentiating foods identified across all racial and ethnic subgroups. Higher consumption levels of legumes were observed among Mexican-American and other Hispanic females. Higher consumption levels of cured meat were observed among NH-White and Black females. NH-Asian females had the most uniquely characterized patterns with a higher consumption of prudent foods (fruits, vegetables, and whole grains). CONCLUSIONS Differences among the consumption behaviors of low-income female adults were found along racial and ethnic lines. Efforts to improve the nutritional health of low-income female adults should consider racial and ethnic differences in diets to appropriately focus interventions.
Affiliation(s)
- Walter C Willett
  - Departments of Nutrition and Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
9
Jana N, Gautam M. Confidence intervals of difference and ratio of means for zero-adjusted inverse Gaussian distributions. COMMUN STAT-SIMUL C 2022. [DOI: 10.1080/03610918.2022.2102652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- Nabakumar Jana
  - Department of Mathematics and Computing, Indian Institute of Technology (ISM) Dhanbad, Dhanbad, India
- Meenakshi Gautam
  - Department of Mathematics and Computing, Indian Institute of Technology (ISM) Dhanbad, Dhanbad, India
10
Kodikara S, Ellul S, Lê Cao KA. Statistical challenges in longitudinal microbiome data analysis. Brief Bioinform 2022; 23:bbac273. [PMID: 35830875 PMCID: PMC9294433 DOI: 10.1093/bib/bbac273] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 04/02/2022] [Revised: 05/28/2022] [Accepted: 06/12/2022] [Indexed: 11/13/2022] Open
Abstract
The microbiome is a complex and dynamic community of microorganisms that co-exist interdependently within an ecosystem, and interact with its host or environment. Longitudinal studies can capture temporal variation within the microbiome to gain mechanistic insights into microbial systems; however, current statistical methods are limited due to the complex and inherent features of the data. We have identified three analytical objectives in longitudinal microbial studies: (1) differential abundance over time and between sample groups, demographic factors or clinical variables of interest; (2) clustering of microorganisms evolving concomitantly across time and (3) network modelling to identify temporal relationships between microorganisms. This review explores the strengths and limitations of current methods to fulfill these objectives, compares different methods in simulation and case studies for objectives (1) and (2), and highlights opportunities for further methodological developments. R tutorials are provided to reproduce the analyses conducted in this review.
Affiliation(s)
- Saritha Kodikara
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Royal Parade, 3052, Victoria, Australia
- Susan Ellul
  - Murdoch Children’s Research Institute and Department of Paediatrics, University of Melbourne, Bouverie Street, 3052, Victoria, Australia
- Kim-Anh Lê Cao
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Royal Parade, 3052, Victoria, Australia
11
Hsu WW, Mawella NR, Todem D. On testing for homogeneity with zero-inflated models through the lens of model misspecification. Int Stat Rev 2022; 90:62-77. [PMID: 35601991 PMCID: PMC9122237 DOI: 10.1111/insr.12462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2020] [Accepted: 06/13/2021] [Indexed: 10/20/2022]
Abstract
In many applications of two-component mixture models such as the popular zero-inflated model for discrete-valued data, it is customary for the data analyst to evaluate the inherent heterogeneity in view of observed data. To this end, the score test, acclaimed for its simplicity, is routinely performed. It has long been recognized that this test may behave erratically under model misspecification, but the implications of this behavior remain poorly understood for popular two-component mixture models. For the special case of zero-inflated count models, we use data simulations and theoretical arguments to evaluate this behavior and discuss its implications in settings where the working model is restrictive with regard to the true data generating mechanism. We enrich this discussion with an analysis of count data in HIV research, where a one-component model is shown to fit the data reasonably well despite apparent extra zeros. These results suggest that a rejection of homogeneity does not imply that the underlying mixture model is appropriate. Rather, such a rejection simply implies that the mixture model should be carefully interpreted in the light of potential model misspecifications, and further evaluated against other competing models.
Affiliation(s)
- Wei-Wen Hsu
  - Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
- Nadeesha R Mawella
  - Department of Mathematics and Statistics, University of Missouri-Kansas City, Kansas City, MO 64110, USA
- David Todem
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
12
Huang Z, Wang C. A Review on Differential Abundance Analysis Methods for Mass Spectrometry-Based Metabolomic Data. Metabolites 2022; 12:305. [PMID: 35448492 PMCID: PMC9032534 DOI: 10.3390/metabo12040305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 03/26/2022] [Accepted: 03/27/2022] [Indexed: 12/04/2022] Open
Abstract
This review presents an overview of the statistical methods on differential abundance (DA) analysis for mass spectrometry (MS)-based metabolomic data. MS has been widely used for metabolomic abundance profiling in biological samples. The high-throughput data produced by MS often contain a large fraction of zero values caused by the absence of certain metabolites and the technical detection limits of MS. Various statistical methods have been developed to characterize the zero-inflated metabolomic data and perform DA analysis, ranging from simple tests to more complex models including parametric, semi-parametric, and non-parametric approaches. In this article, we discuss and compare DA analysis methods regarding their assumptions and statistical modeling techniques.
Affiliation(s)
- Zhengyan Huang
  - Everest Clinical Research Corporation, Little Falls, NJ 07424, USA
- Chi Wang
  - Markey Cancer Center, Department of Internal Medicine, University of Kentucky, Lexington, KY 40536, USA
13
Ren J, Tapert S, Fan CC, Thompson WK. A semi-parametric Bayesian model for semi-continuous longitudinal data. Stat Med 2022; 41:2354-2374. [PMID: 35274335 DOI: 10.1002/sim.9359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2021] [Revised: 01/21/2022] [Accepted: 02/03/2022] [Indexed: 11/11/2022]
Abstract
Semi-continuous data present challenges in both model fitting and interpretation. Parametric distributions may be inappropriate for extreme long right tails of the data. Mean effects of covariates, susceptible to extreme values, may fail to capture relevant information for most of the sample. We propose a two-component semi-parametric Bayesian mixture model, with the discrete component captured by a probability mass (typically at zero) and the continuous component of the density modeled by a mixture of B-spline densities that can be flexibly fit to any data distribution. The model includes random effects of subjects to allow for application to longitudinal data. We specify prior distributions on parameters and perform model inference using a Markov chain Monte Carlo (MCMC) Gibbs-sampling algorithm programmed in R. Statistical inference can be made for multiple quantiles of the covariate effects simultaneously providing a comprehensive view. Various MCMC sampling techniques are used to facilitate convergence. We demonstrate the performance and the interpretability of the model via simulations and analyses on the National Consortium on Alcohol and Neurodevelopment in Adolescence study (NCANDA) data on alcohol binge drinking.
Affiliation(s)
- Junting Ren
  - Division of Biostatistics, Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, California, USA
  - Population Neuroscience and Genetics Lab, University of California San Diego, La Jolla, California, USA
- Susan Tapert
  - Department of Psychiatry, University of California San Diego, La Jolla, California, USA
- Chun Chieh Fan
  - Population Neuroscience and Genetics Lab, University of California San Diego, La Jolla, California, USA
  - Center for Human Development, University of California San Diego, La Jolla, California, USA
- Wesley K Thompson
  - Population Neuroscience and Genetics Lab, University of California San Diego, La Jolla, California, USA
  - Department of Radiology, University of California San Diego, La Jolla, California, USA
14
Rustand D, Briollais L, Tournigand C, Rondeau V. Two-part joint model for a longitudinal semicontinuous marker and a terminal event with application to metastatic colorectal cancer data. Biostatistics 2022; 23:50-68. [PMID: 32282877 PMCID: PMC9116390 DOI: 10.1093/biostatistics/kxaa012] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/03/2019] [Revised: 01/19/2020] [Accepted: 02/19/2020] [Indexed: 11/12/2022] Open
Abstract
Joint models for a longitudinal biomarker and a terminal event have gained interest for evaluating cancer clinical trials because the tumor evolution directly reflects the state of the disease. A biomarker characterizing the tumor size evolution over time can be highly informative for assessing treatment options and could be taken into account in addition to the survival time. The biomarker often has a semicontinuous distribution, i.e., it is zero inflated and right skewed. An appropriate model is needed for the longitudinal biomarker as well as an association structure with the survival outcome. In this article, we propose a joint model for a longitudinal semicontinuous biomarker and a survival time. The semicontinuous nature of the longitudinal biomarker is specified by a two-part model, which splits its distribution into a binary outcome (first part) represented by the positive versus zero values and a continuous outcome (second part) with the positive values only. Survival times are modeled with a proportional hazards model for which we propose three association structures with the biomarker. Our simulation studies show that some bias can arise in the parameter estimates when the semicontinuous nature of the biomarker is ignored, assuming the true model is a two-part model. An application to advanced metastatic colorectal cancer data from the GERCOR study is performed where our two-part model is compared to one-part joint models. Our results show that treatment arm B (FOLFOX6/FOLFIRI) is associated with higher SLD values over time and its positive association with the terminal event leads to an increased risk of death compared to treatment arm A (FOLFIRI/FOLFOX6).
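The two-part split described in this abstract can be illustrated with a minimal Python sketch. It covers only the marginal decomposition into a binary part (zero versus positive) and a continuous part (positive values only); it has no covariates, random effects, or survival component, so it is not the authors' joint model. The log-normal choice for the positive part, the function name `fit_two_part`, and the simulated data are assumptions made purely for illustration.

```python
import numpy as np


def fit_two_part(y):
    """Fit a marginal two-part model to a semicontinuous sample.

    Part 1: probability that the value is positive (binary part).
    Part 2: log-normal fit to the positive values only (an assumption;
    any right-skewed positive distribution could be used instead).
    """
    y = np.asarray(y, dtype=float)
    positive = y > 0
    p_positive = positive.mean()            # part 1: P(Y > 0)
    log_pos = np.log(y[positive])           # part 2 uses positives only
    mu = log_pos.mean()
    sigma = log_pos.std(ddof=1)
    # Overall mean combines both parts: E[Y] = P(Y > 0) * E[Y | Y > 0]
    mean_y = p_positive * np.exp(mu + sigma ** 2 / 2)
    return {"p_positive": p_positive, "mu": mu, "sigma": sigma, "mean": mean_y}


# Simulated semicontinuous data: about 30% exact zeros, log-normal otherwise
rng = np.random.default_rng(0)
n = 5000
is_positive = rng.random(n) < 0.7
y = np.where(is_positive, rng.lognormal(mean=1.0, sigma=0.5, size=n), 0.0)

fit = fit_two_part(y)
print(fit)
```

Estimating the two parts separately keeps the point mass at zero out of the continuous fit, which is the reason a single right-skewed distribution fitted to all values tends to be biased for this kind of data.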
Affiliation(s)
- Denis Rustand
  - Department of Biostatistics, Bordeaux Population Health Research Center, INSERM U1219, 146 Rue Léo Saignat, 33076 Bordeaux, France
- Laurent Briollais
  - Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital and Dalla Lana School of Public Health (Biostatistics), University of Toronto, 600 University Ave., Ontario M5G 1X5, Canada
- Christophe Tournigand
  - Hôpital Henri Mondor, 51 Avenue du Maréchal de Lattre de Tassigny, 94010 Créteil, France
- Virginie Rondeau
  - Department of Biostatistics, Bordeaux Population Health Research Center, INSERM U1219, 146 Rue Léo Saignat, 33076 Bordeaux, France
15
Simulated Flock-Level Shedding Characteristics of Turkeys in Ten Thousand Bird Houses Infected with H7 Low Pathogenicity Avian Influenza Virus Strains. Viruses 2021; 13:v13122509. [PMID: 34960777 PMCID: PMC8706675 DOI: 10.3390/v13122509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2021] [Revised: 12/09/2021] [Accepted: 12/10/2021] [Indexed: 12/03/2022] Open
Abstract
Understanding the amount of virus shed at the flock level by birds infected with low pathogenicity avian influenza virus (LPAIV) over time can help inform the type and timing of activities performed in response to a confirmed LPAIV-positive premises. To this end, we developed a mathematical model which allows us to estimate viral shedding by 10,000 turkey toms raised in commercial turkey production in the United States, and infected by H7 LPAIV strains. We simulated the amount of virus shed orally and from the cloaca over time, as well as the amount of virus in manure. In addition, we simulated the threshold cycle value (Ct) of pooled oropharyngeal swabs from birds in the infected flock tested by real-time reverse transcription polymerase chain reaction. The simulation model predicted that little to no shedding would occur once the highest threshold of seroconversion was reached. Substantial amounts of virus in manure (median 1.5×10⁸ and 5.8×10⁹; 50% egg infectious dose) were predicted at the peak. Lastly, the model results suggested that higher Ct values, indicating less viral shedding, are more likely to be observed later in the infection process as the flock approaches recovery.
|
16
|
Liu T, Xu P, Du Y, Lu H, Zhao H, Wang T. MZINBVA: variational approximation for multilevel zero-inflated negative-binomial models for association analysis in microbiome surveys. Brief Bioinform 2021; 23:6409694. [PMID: 34718406 DOI: 10.1093/bib/bbab443] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/21/2021] [Revised: 09/11/2021] [Accepted: 09/28/2021] [Indexed: 01/02/2023] Open
Abstract
As our understanding of the microbiome has expanded, so has the recognition of its critical role in human health and disease, thereby emphasizing the importance of testing whether microbes are associated with environmental factors or clinical outcomes. However, many of the fundamental challenges that concern microbiome surveys arise from statistical and experimental design issues, such as the sparse and overdispersed nature of microbiome count data and the complex correlation structure among samples. For example, in the human microbiome project (HMP) dataset, the repeated observations across time points (level 1) are nested within body sites (level 2), which are further nested within subjects (level 3). Therefore, there is a great need for the development of specialized and sophisticated statistical tests. In this paper, we propose multilevel zero-inflated negative-binomial models for association analysis in microbiome surveys. We develop a variational approximation method for maximum likelihood estimation and inference. It uses optimization, rather than sampling, to approximate the log-likelihood and compute parameter estimates, provides a robust estimate of the covariance of parameter estimates and constructs a Wald-type test statistic for association testing. We evaluate and demonstrate the performance of our method using extensive simulation studies and an application to the HMP dataset. We have developed an R package MZINBVA to implement the proposed method, which is available from the GitHub repository https://github.com/liudoubletian/MZINBVA.
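As a rough illustration of the zero-inflated negative-binomial building block named in this abstract, the following sketch evaluates the probability mass function of a single count. It is a single-level toy only: it does not implement the multilevel structure, the variational approximation, or the MZINBVA package itself. The function names and the parameterization (mean `mu`, dispersion `size`, zero-inflation probability `pi`) are assumptions chosen for this example.

```python
import math


def nb_pmf(y, mu, size):
    """Negative-binomial pmf with mean mu and dispersion size.

    Assumed parameterization: variance = mu + mu**2 / size, with mu > 0, size > 0.
    """
    log_p = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, pi, mu, size):
    """Zero-inflated NB: a structural zero with probability pi, otherwise an NB count."""
    base = (1.0 - pi) * nb_pmf(y, mu, size)
    return pi + base if y == 0 else base


if __name__ == "__main__":
    # Zero inflation adds extra mass at zero relative to the plain NB model.
    print(nb_pmf(0, mu=5.0, size=2.0), zinb_pmf(0, pi=0.3, mu=5.0, size=2.0))
```

The extra mass at zero is what lets such a model accommodate the sparsity the abstract describes, while the dispersion parameter handles overdispersion among the nonzero counts.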
Affiliation(s)
- Tiantian Liu
- SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, 800 Dongchuan RD, 200240, Shanghai, China
| | - Peirong Xu
- Department of Breast Surgery, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China
| | - Yueyao Du
- Department of Biostatistics, Yale University, 60 College Street, New Haven, CT 06520, USA; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, 800 Dongchuan RD, 200240, Shanghai, China
| | - Hui Lu
- SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, 800 Dongchuan RD, 200240, Shanghai, China
| | - Hongyu Zhao
- Department of Biostatistics, Yale University, 60 College Street, New Haven, CT 06520, USA
| | - Tao Wang
- SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, 800 Dongchuan RD, 200240, Shanghai, China
|
17
|
Ruf A, Neubauer AB, Ebner-Priemer U, Reif A, Matura S. Studying dietary intake in daily life through multilevel two-part modelling: a novel analytical approach and its practical application. Int J Behav Nutr Phys Act 2021; 18:130. [PMID: 34579744 PMCID: PMC8477527 DOI: 10.1186/s12966-021-01187-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/17/2021] [Accepted: 08/10/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Understanding which factors influence dietary intake, particularly in daily life, is crucial given the impact diet has on physical as well as mental health. However, a factor might influence whether but not how much an individual eats and vice versa, or a factor's importance may differ across these two facets. Distinguishing between these two facets, and hence studying dietary intake as a dual process, is conceptually promising and not only allows further insights but also solves a statistical issue. When assessing the association between a predictor (e.g. momentary affect) and subsequent dietary intake in daily life through ecological momentary assessment (EMA), the outcome variable (e.g. energy intake within a predefined time-interval) is semicontinuous. That is, one part is equal to zero (i.e. no dietary intake occurred) and the other contains right-skewed positive values (i.e. dietary intake occurred, but often only small amounts are consumed). However, linear multilevel modelling, which is commonly used for EMA data to account for repeated measures within individuals, cannot be applied to semicontinuous outcomes. A highly informative statistical approach for semicontinuous outcomes is multilevel two-part modelling, which treats the outcome as generated by a dual process, combining a multilevel logistic/probit regression for zeros and a multilevel (generalized) linear regression for nonzero values. METHODS A multilevel two-part model combining a multilevel logistic regression to predict whether an individual eats and a multilevel gamma regression to predict how much is eaten, if an individual eats, is proposed. Its general implementation in R, a widely used and freely available statistical software, using the R-package brms is described. To illustrate its practical application, the analytical approach is applied, by way of example, to data from the Eat2beNICE-APPetite-study.
RESULTS Results highlight that the proposed multilevel two-part model reveals process-specific associations which cannot be detected through traditional multilevel modelling. CONCLUSIONS This paper is the first to introduce multilevel two-part modelling as a novel analytical approach to study dietary intake in daily life. Studying dietary intake through multilevel two-part modelling is conceptually as well as methodologically promising. Findings can be translated to tailored nutritional interventions targeting either the occurrence or the amount of dietary intake.
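To make the dual-process idea in this abstract concrete, the following sketch computes the log-likelihood contribution of one semicontinuous observation under a two-part model: a logistic part for whether intake occurs and a gamma part for how much. It is a simplified illustration that omits the multilevel random effects and is not the brms implementation the authors describe. The function name, the log link for the amount, and the `shape` parameter are assumptions made for this example.

```python
import math


def two_part_loglik(y, eta_occurrence, eta_amount, shape):
    """Log-likelihood of one semicontinuous observation y >= 0.

    eta_occurrence: linear predictor on the logit scale for P(y > 0)
    eta_amount:     linear predictor on the log scale for E[y | y > 0]
    shape:          gamma shape parameter (hypothetical name for this sketch)
    """
    p_nonzero = 1.0 / (1.0 + math.exp(-eta_occurrence))
    if y == 0:
        # Part 1 only: no intake occurred.
        return math.log(1.0 - p_nonzero)
    # Part 2: intake occurred, and the amount follows a gamma density.
    mean = math.exp(eta_amount)
    rate = shape / mean
    log_gamma_density = (
        shape * math.log(rate) - math.lgamma(shape)
        + (shape - 1.0) * math.log(y) - rate * y
    )
    return math.log(p_nonzero) + log_gamma_density


if __name__ == "__main__":
    print(two_part_loglik(0.0, eta_occurrence=0.0, eta_amount=0.0, shape=1.0))
    print(two_part_loglik(1.0, eta_occurrence=0.0, eta_amount=0.0, shape=1.0))
```

Because the two linear predictors are separate, a covariate can enter one part, the other, or both with different coefficients, which is how process-specific associations become estimable.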
Affiliation(s)
- Alea Ruf
- Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital, Goethe University, Heinrich-Hoffmann-Straße 10, 60528 Frankfurt am Main, Germany
| | - Andreas B. Neubauer
- DIPF | Leibniz Institute for Research and Information in Education, Frankfurt am Main, Germany
- Center for Research on Individual Development and Adaptive Education of Children at Risk (IDeA), Frankfurt am Main, Germany
| | - Ulrich Ebner-Priemer
- Mental mHealth Lab, Institute of Sports and Sports Science, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Andreas Reif
- Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital, Goethe University, Heinrich-Hoffmann-Straße 10, 60528 Frankfurt am Main, Germany
| | - Silke Matura
- Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital, Goethe University, Heinrich-Hoffmann-Straße 10, 60528 Frankfurt am Main, Germany
|
18
|
CSF and serum inflammatory response and association with outcomes in spontaneous intracerebral hemorrhage with intraventricular extension: an analysis of the CLEAR-III Trial. J Neuroinflammation 2021; 18:179. [PMID: 34419101 PMCID: PMC8380363 DOI: 10.1186/s12974-021-02224-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/23/2021] [Accepted: 07/22/2021] [Indexed: 12/13/2022] Open
Abstract
Background Intracerebral hemorrhage (ICH) results in a cascade of inflammatory cell activation with recruitment of peripheral leukocytes to the brain parenchyma and surrounding the hematoma. We hypothesized that in patients with ICH and intraventricular hemorrhage (IVH), a robust cerebrospinal fluid (CSF) inflammatory response occurs with leukocyte subtypes being affected by alteplase treatment and contributing to outcomes. Methods Serum and CSF cell counts from patients in the phase 3 Clot Lysis: Evaluating Accelerated Resolution of Intraventricular Hemorrhage (CLEAR III) trial were analyzed. CSF leukocytes were corrected for the presence of red blood cells. Trends in cell counts were plotted chronologically. Associations were evaluated between serum and CSF leukocyte subtypes and adjudicated functional outcome (modified Rankin Scale; mRS) at 30 and 180 days and bacterial infection according to treatment with intraventricular alteplase versus saline. Results A total of 279 and 292 patients had ≥3 differential cell counts from serum and CSF, respectively. CSF leukocyte subtypes evolved during IVH resolution with a significantly augmented inflammatory response for all subtypes in alteplase- compared to saline-treated patients. CSF leukocyte subtypes were not associated with detrimental effect on functional outcomes in the full cohort, but all were associated with poor 30-day outcome in saline-treated patients with IVH volume ≥20 mL. Higher serum lymphocytes were associated with good functional outcomes (mRS 0–3) in the entire cohort and saline-treated but not alteplase-treated group. Conversely, increased serum neutrophil-to-lymphocyte ratio (NLR) in the entire cohort and saline group was associated with worse functional outcomes. Higher median serum lymphocytes were associated with the absence of infection at 7 days. Conclusions Aseptic CSF inflammation after IVH involves all leukocyte subtypes. 
Serum lymphocytes may be associated with better outcomes by mitigating infection. Alteplase augments the inflammatory response without affecting outcomes.
|
19
|
Hui FK, Bondell HD. A shared parameter mixture model for longitudinal income data with missing responses and zero rounding. AUST NZ J STAT 2021. [DOI: 10.1111/anzs.12323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Affiliation(s)
- Francis K.C. Hui
- Research School of Finance, Actuarial Studies & Statistics, Australian National University, Acton, ACT 2601, Australia
| | - Howard D. Bondell
- School of Mathematics and Statistics, The University of Melbourne, Melbourne, VIC 3010, Australia
|
20
|
Feng T, Boyle LN. Sparse group regularization for semi-continuous transportation data. Stat Med 2021; 40:3267-3285. [PMID: 33843070 DOI: 10.1002/sim.8942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2020] [Revised: 01/17/2021] [Accepted: 02/16/2021] [Indexed: 11/08/2022]
Abstract
Motor vehicle crashes are a global public health concern. Most analyses have used zero-inflated count models for examining crash counts. However, few methods are available to account for safety metrics that have semi-continuous observations. This article considers the problem of variable selection for the semi-continuous zero-inflated (SCZI) models. These models include two parts: a zero-inflated part and a nonzero continuous part. A special group regularization is designed to accommodate the unique structure of two-part SCZI models, and a type of Bayesian information criterion is proposed to select tuning parameters. We illustrate the variable selection process of the proposed model using lane position data from a driving simulator study. In the study, drivers stay in the intended lane for the majority of their drive (zero-inflated part). On occasion, some drivers do drift out of their intended driving lane (nonzero continuous part). Our findings show that individual differences can be captured with the proposed model, which has implications for driving safety and the design of in-vehicle alerting systems.
Affiliation(s)
- Tianshu Feng
- Industrial and Systems Engineering, University of Washington, Seattle, Washington, USA
| | - Linda Ng Boyle
- Industrial and Systems Engineering, University of Washington, Seattle, Washington, USA
|
21
|
Cavenague de Souza HC, Louzada F, de Oliveira MR, Fawole B, Akintan A, Oyeneyin L, Sanni W, Silva Castro Perdoná GD. The Log-Normal zero-inflated cure regression model for labor time in an African obstetric population. J Appl Stat 2021; 49:2416-2429. [DOI: 10.1080/02664763.2021.1896684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
| | - Francisco Louzada
- Institute of Mathematical Science and Computing, University of São Paulo, São Carlos, Brazil
| | | | - Bukola Fawole
- Department of Obstetrics and Gynaecology, College of Medicine, University of Ibadan, Ibadan, Nigeria
| | - Adesina Akintan
- Department of Obstetrics and Gynaecology, Mother and Child Hospital, Akure, Ondo State, Nigeria
| | - Lawal Oyeneyin
- Department of Obstetrics and Gynaecology, Mother and Child Hospital, Ondo, Ondo State, Nigeria
| | | | - Gleici da Silva Castro Perdoná
- Department of Social Medicine, Ribeirão Preto School of Medicine, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
|
22
|
Zhou F, He K, Li Q, Chapkin RS, Ni Y. Bayesian biclustering for microbial metagenomic sequencing data via multinomial matrix factorization. Biostatistics 2021; 23:891-909. [PMID: 33634824 DOI: 10.1093/biostatistics/kxab002] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/17/2020] [Revised: 10/08/2020] [Accepted: 01/10/2021] [Indexed: 12/26/2022] Open
Abstract
High-throughput sequencing technology provides unprecedented opportunities to quantitatively explore the human gut microbiome and its relation to diseases. Microbiome data are compositional, sparse, noisy, and heterogeneous, which poses serious challenges for statistical modeling. We propose an identifiable Bayesian multinomial matrix factorization model to infer overlapping clusters on both microbes and hosts. The proposed method represents the observed over-dispersed zero-inflated count matrix as Dirichlet-multinomial mixtures on which latent cluster structures are built hierarchically. Under the Bayesian framework, the number of clusters is automatically determined and available information from a taxonomic rank tree of microbes is naturally incorporated, which greatly improves the interpretability of our findings. We demonstrate the utility of the proposed approach by comparing it to alternative methods in simulations. An application to a human gut microbiome data set involving patients with inflammatory bowel disease reveals interesting clusters, which contain the bacterial families Bacteroidaceae, Bifidobacteriaceae, Enterobacteriaceae, Fusobacteriaceae, Lachnospiraceae, Ruminococcaceae, Pasteurellaceae, and Porphyromonadaceae that are known to be related to inflammatory bowel disease and its subtypes according to the biological literature. Our findings can help generate potential hypotheses for future investigation of the heterogeneity of the human gut microbiome.
Affiliation(s)
- Fangting Zhou
- Department of Statistics, Texas A&M University, College Station, TX, USA and Institute of Statistics and Big Data, Renmin University of China, Beijing, China
| | - Kejun He
- Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China, Beijing, China
| | - Qiwei Li
- Department of Mathematical Sciences, The University of Texas at Dallas, Dallas, TX, USA
| | - Robert S Chapkin
- Department of Nutrition and Food Science, Texas A&M University, College Station, TX, USA
| | | |
|
23
|
Zhang H, Chen J, Feng Y, Wang C, Li H, Liu L. Mediation effect selection in high-dimensional and compositional microbiome data. Stat Med 2021; 40:885-896. [PMID: 33205470 PMCID: PMC7855955 DOI: 10.1002/sim.8808] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/25/2020] [Revised: 08/31/2020] [Accepted: 10/16/2020] [Indexed: 01/08/2023]
Abstract
The microbiome plays an important role in human health by mediating the path from environmental exposures to health outcomes. The relative abundances of the high-dimensional microbiome data have a unit-sum restriction, rendering standard statistical methods in the Euclidean space invalid. To address this problem, we use the isometric log-ratio transformations of the relative abundances as the mediator variables. To select significant mediators, we consider a closed testing-based selection procedure with desirable confidence. Simulations are provided to verify the effectiveness of our method. As an illustrative example, we apply the proposed method to study the mediation effects of murine gut microbiome between subtherapeutic antibiotic treatment and body weight gain, and identify Coprobacillus and Adlercreutzia as two significant mediators.
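The isometric log-ratio (ILR) transformation mentioned in this abstract can be sketched as follows. This example uses pivot coordinates, which are one common choice of ILR basis; the abstract does not say which basis the authors use, so that choice is an assumption, as are the function names. The input must be strictly positive, so zero abundances would need a replacement step (for example a pseudo-count) that is not shown here.

```python
import math


def ilr(x):
    """Pivot-coordinate ILR of a strictly positive composition x of length D.

    Returns D - 1 unconstrained real coordinates.
    """
    log_x = [math.log(v) for v in x]
    d = len(x)
    coords = []
    for i in range(d - 1):
        remaining = d - i - 1  # number of parts after position i
        mean_log_rest = sum(log_x[i + 1:]) / remaining
        scale = math.sqrt(remaining / (remaining + 1))
        coords.append(scale * (log_x[i] - mean_log_rest))
    return coords


def clr(x):
    """Centered log-ratio transform, used here only to check the isometry."""
    log_x = [math.log(v) for v in x]
    mean_log = sum(log_x) / len(x)
    return [v - mean_log for v in log_x]


if __name__ == "__main__":
    composition = [0.5, 0.3, 0.15, 0.05]  # made-up relative abundances
    print(ilr(composition))
```

The transform maps D relative abundances that sum to one onto D - 1 unconstrained coordinates, so standard Euclidean methods such as linear regression can be applied to the transformed mediators.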
Affiliation(s)
- Haixiang Zhang
- Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
| | - Jun Chen
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN 55905, USA
| | - Yang Feng
- Department of Biostatistics, College of Global Public Health, New York University, New York, NY 10003, USA
| | - Chan Wang
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016, USA
| | - Huilin Li
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016, USA
| | - Lei Liu
- Division of Biostatistics, Washington University in St. Louis, St. Louis, MO 63110, USA
|
24
|
Huling JD, Smith MA, Chen G. A Two-Part Framework for Estimating Individualized Treatment Rules From Semicontinuous Outcomes. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1801449] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
Affiliation(s)
- Jared D. Huling
- Division of Biostatistics, University of Minnesota, Minneapolis, MN
| | - Maureen A. Smith
- Departments of Population Health Sciences and Family Medicine, University of Wisconsin-Madison, Madison, WI
| | - Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
|
25
|
Chen B, Xu W. Generalized estimating equation modeling on correlated microbiome sequencing data with longitudinal measures. PLoS Comput Biol 2020; 16:e1008108. [PMID: 32898133 PMCID: PMC7500673 DOI: 10.1371/journal.pcbi.1008108] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/27/2020] [Revised: 09/18/2020] [Accepted: 06/30/2020] [Indexed: 11/19/2022] Open
Abstract
Existing models for assessing microbiome sequencing such as operational taxonomic units (OTUs) can only test predictors' effects on OTUs. There is limited work on how to estimate the correlations between multiple OTUs and incorporate such relationships into models to evaluate longitudinal OTU measures. We propose a novel approach to estimate OTU correlations based on their taxonomic structure, and apply such correlation structure in Generalized Estimating Equations (GEE) models to estimate both predictors' effects and OTU correlations. We develop a two-part Microbiome Taxonomic Longitudinal Correlation (MTLC) model for multivariate zero-inflated OTU outcomes based on the GEE framework. In addition, longitudinal and other types of repeated OTU measures are integrated in the MTLC model. Extensive simulations have been conducted to evaluate the performance of the MTLC method. Compared with the existing methods, the MTLC method shows robust and consistent estimation, and improved statistical power for testing predictors' effects. Lastly, we demonstrate our proposed method by applying it to a real human microbiome study to evaluate obesity in twins.
Affiliation(s)
- Bo Chen
- Princess Margaret Hospital, Toronto, Ontario, Canada
| | - Wei Xu
- Princess Margaret Hospital, Toronto, Ontario, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
|