1
Zhang Z, Nishimura A, Trovão NS, Cherry JL, Holbrook AJ, Ji X, Lemey P, Suchard MA. Accelerating Bayesian inference of dependency between mixed-type biological traits. PLoS Comput Biol 2023; 19:e1011419. [PMID: 37639445 PMCID: PMC10491301 DOI: 10.1371/journal.pcbi.1011419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 09/08/2023] [Accepted: 08/09/2023] [Indexed: 08/31/2023] Open
Abstract
Inferring dependencies between mixed-type biological traits while accounting for evolutionary relationships between specimens is of great scientific interest yet remains infeasible when trait and specimen counts grow large. The state-of-the-art approach uses a phylogenetic multivariate probit model to accommodate binary and continuous traits via a latent variable framework, and utilizes an efficient bouncy particle sampler (BPS) to tackle the computational bottleneck: integrating many latent variables from a high-dimensional truncated normal distribution. This approach breaks down as the number of specimens grows and fails to reliably characterize conditional dependencies between traits. Here, we propose an inference pipeline for phylogenetic probit models that greatly outperforms BPS. The novelty lies in 1) a combination of the recent Zigzag Hamiltonian Monte Carlo (Zigzag-HMC) with linear-time gradient evaluations and 2) a joint sampling scheme for highly correlated latent variables and correlation matrix elements. In an application exploring HIV-1 evolution from 535 viruses, the inference requires joint sampling from an 11,235-dimensional truncated normal and a 24-dimensional covariance matrix. Our method yields a 5-fold speedup compared to BPS and makes it possible to learn partial correlations between candidate viral mutations and virulence. Computational speedup now enables us to tackle even larger problems: we study the evolution of influenza H1N1 glycosylations on around 900 viruses. For broader applicability, we extend the phylogenetic probit model to incorporate categorical traits, and demonstrate its use to study Aquilegia flower and pollinator co-evolution.
Affiliation(s)
- Zhenyu Zhang
  - Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, California, United States of America
- Akihiko Nishimura
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
- Nídia S. Trovão
  - Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
- Joshua L. Cherry
  - Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Andrew J. Holbrook
  - Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, California, United States of America
- Xiang Ji
  - Department of Mathematics, Tulane University, New Orleans, Louisiana, United States of America
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Marc A. Suchard
  - Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Biomathematics, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America
2
Kang T, Gaskins J, Levy S, Datta S. Analyzing dental fluorosis data using a novel Bayesian model for clustered longitudinal ordinal outcomes with an inflated category. Stat Med 2023; 42:745-760. [PMID: 36574753 PMCID: PMC11180454 DOI: 10.1002/sim.9641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 10/26/2022] [Accepted: 12/18/2022] [Indexed: 12/28/2022]
Abstract
We propose a Bayesian hurdle mixed-effects model to analyze longitudinal ordinal data under a complex multilevel structure. This research was motivated by the dataset gathered from the Iowa Fluoride Study (IFS) in order to establish the relationships between fluorosis status and potential risk/protective factors. Dental fluorosis is characterized by spots on tooth enamel and is caused by excessive fluoride ingestion during enamel formation. Observations are collected from multiple surface zones on each tooth and on all available teeth of children from the studied cohort, who are longitudinally observed at ages 9, 13, and 17. The data not only exhibit a complex hierarchical structure, but also have a large proportion of zero values that are likely to follow different statistical patterns from non-zero categories. Therefore, we develop a hurdle model to consider the zero category separately, while a proportional odds model is used for the positive categories. The estimated parameters are obtained from a Gibbs sampler implemented by the OpenBUGS software. Our model is compared with two popular methods for ordinal data: the proportional odds model and the partial proportional odds model. We perform a comprehensive analysis of the IFS data and evaluate the accuracy and effectiveness of our methodology through simulation studies. Our discoveries provide novel insights to statisticians and dental practitioners about the associations between patient and clinical characteristics and dental fluorosis.
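The hurdle construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' OpenBUGS model: it omits the random effects and the multilevel structure, and the names `eta_zero`, `eta_pos`, and `cutpoints` are hypothetical. It only shows how a logistic hurdle for the zero category can be combined with a proportional odds model over the positive categories.

```python
import math


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def hurdle_ordinal_probs(eta_zero, eta_pos, cutpoints):
    """Probabilities of scores 0, 1, ..., K under a hurdle ordinal model.

    eta_zero  -- linear predictor for logit P(Y > 0) (assumed name)
    eta_pos   -- linear predictor shared by all cumulative logits of the
                 positive categories (the proportional odds part)
    cutpoints -- increasing thresholds; K - 1 of them for K positive categories
    """
    p_pos = logistic(eta_zero)  # probability of clearing the hurdle
    # Cumulative probabilities P(Y <= k | Y > 0) = logistic(c_k - eta_pos).
    cum = [logistic(c - eta_pos) for c in cutpoints] + [1.0]
    cond = [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]
    return [1.0 - p_pos] + [p_pos * q for q in cond]


probs = hurdle_ordinal_probs(eta_zero=0.0, eta_pos=0.0, cutpoints=[0.0])
```

With both linear predictors at zero and a single cutpoint at zero, the zero category receives probability 0.5 and the two positive categories 0.25 each, which makes the two-stage structure easy to check by hand.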
Affiliation(s)
- Tong Kang
  - Global Biometric Data Sciences, Oncology, Bristol Myers Squibb, Lawrenceville, New Jersey
- Jeremy Gaskins
  - Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky
- Steven Levy
  - Department of Preventive and Community Dentistry, University of Iowa, Iowa City, Iowa
- Somnath Datta
  - Department of Biostatistics, University of Florida, Gainesville, Florida
3
Abstract
The so-called proportional odds assumption is popular in cumulative, ordinal regression. In practice, however, such an assumption is sometimes too restrictive. For instance, when modeling the perception of boar taint on an individual level, it turns out that, at least for some subjects, the effects of predictors (androstenone and skatole) vary between response categories. For more flexible modeling, we consider the use of a ‘smooth-effects-on-response penalty’ (SERP) as a connecting link between proportional and fully non-proportional odds models, assuming that parameters of the latter vary smoothly over response categories. The usefulness of SERP is further demonstrated through a simulation study. Besides flexible and accurate modeling, SERP also enables fitting of parameters in cases where the pure, unpenalized non-proportional odds model fails to converge.
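The idea of a penalty that links the proportional and fully non-proportional odds models can be sketched as follows. The abstract does not give the penalty's formula, so the quadratic form on differences between adjacent category-specific coefficients, the factor of one half, and the names `beta` and `lam` are all assumptions made for illustration.

```python
def serp_penalty(beta, lam):
    """Smoothness penalty on category-specific coefficients (assumed form).

    beta -- list of rows; beta[j][r] is the coefficient of predictor j in
            the r-th cumulative logit
    lam  -- non-negative tuning parameter
    """
    total = 0.0
    for row in beta:
        for r in range(len(row) - 1):
            # Squared difference between coefficients of adjacent categories.
            total += (row[r + 1] - row[r]) ** 2
    return 0.5 * lam * total


# Constant rows correspond to proportional odds, so the penalty vanishes.
flat = serp_penalty([[0.7, 0.7, 0.7], [-1.2, -1.2, -1.2]], lam=3.0)
# Coefficients that vary over categories are penalised.
rough = serp_penalty([[1.0, 2.0, 4.0]], lam=2.0)
```

Under this assumed form, a very large `lam` pushes each predictor's coefficients toward a common value (proportional odds), while `lam = 0` leaves the non-proportional model unpenalized.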
4
Zhang Z, Nishimura A, Bastide P, Ji X, Payne RP, Goulder P, Lemey P, Suchard MA. Large-scale inference of correlation among mixed-type biological traits with phylogenetic multivariate probit models. Ann Appl Stat 2021. [DOI: 10.1214/20-aoas1394] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
Affiliation(s)
- Zhenyu Zhang
  - Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles
- Akihiko Nishimura
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University
- Xiang Ji
  - Department of Mathematics, School of Science & Engineering, Tulane University
- Rebecca P. Payne
  - Translational and Clinical Research Institute, Newcastle University
- Philip Goulder
  - Department of Paediatrics, University of Oxford, HIV Pathogenesis Programme, Doris Duke Medical Research Institute, University of KwaZulu-Natal, Ragon Institute of MGH, MIT and Harvard University
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven
- Marc A. Suchard
  - Departments of Biomathematics, Biostatistics and Human Genetics, University of California, Los Angeles
5
Rodhouse TJ, Irvine KM, Bowersock L. Post-Fire Vegetation Response in a Repeatedly Burned Low-Elevation Sagebrush Steppe Protected Area Provides Insights About Resilience and Invasion Resistance. Front Ecol Evol 2020. [DOI: 10.3389/fevo.2020.584726] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
Sagebrush steppe ecosystems are threatened by human land-use legacies, biological invasions, and altered fire and climate dynamics. Steppe protected areas are therefore of heightened conservation importance but are few and vulnerable to the same impacts broadly affecting sagebrush steppe. To address this problem, sagebrush steppe conservation science is increasingly emphasizing a focus on resilience to fire and resistance to non-native annual grass invasion as a decision framework. It is well-established that the positive feedback loop between fire and annual grass invasion is the driving process of most contemporary steppe degradation. We use a newly developed ordinal zero-augmented beta regression model fit to large-sample vegetation monitoring data from John Day Fossil Beds National Monument, USA, spanning 7 years to evaluate fire responses of two native perennial foundation bunchgrasses and two non-native invasive annual grasses in a repeatedly burned, historically grazed, and inherently low-resilient protected area. We structured our model hierarchically to support inferences about variation among ecological site types and over time after also accounting for growing-season water deficit, fine-scale topographic variation, and burn severity. We use a state-and-transition conceptual diagram and abundances of plants listed in ecological site reference conditions to formalize our hypothesis of fire-accelerated transition to ecologically novel annual grassland. Notably, big sagebrush (Artemisia tridentata) and other woody species were entirely removed by fire. The two perennial grasses, bluebunch wheatgrass (Pseudoroegneria spicata) and Thurber's needlegrass (Achnatherum thurberianum) exhibited fire resiliency, with no apparent trend after fire. The two annual grasses, cheatgrass (Bromus tectorum) and medusahead (Taeniatherum caput-medusae), increased in response to burn severity, most notably medusahead. 
Surprisingly, we found no variation in grass cover among ecological sites, suggesting fire-driven homogenization as shrubs were removed and annual grasses became dominant. We found contrasting responses among all four grass species along gradients of topography and water deficit, informative to protected-area conservation strategies. The fine-grained influence of topography was particularly important to variation in cover among species and provides a foothold for conservation in low-resilient, aridic steppe. Broadly, our study demonstrates how to operationalize resilience and resistance concepts for protected areas by integrating empirical data with conceptual and statistical models.
6
Itô H. State-space modeling of the dynamics of temporal plant cover using visually determined class data. PeerJ 2020; 8:e9383. [PMID: 32587805 PMCID: PMC7304429 DOI: 10.7717/peerj.9383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2020] [Accepted: 05/28/2020] [Indexed: 11/20/2022] Open
Abstract
A lot of vegetation-related data have been collected as an ordered plant cover class that can be determined visually. However, they are difficult to analyze numerically as they are in an ordinal scale and have uncertainty in their classification. Here, I constructed a state-space model to estimate unobserved plant cover proportions (ranging from zero to one) from such cover class data. The model assumed that the data were measured longitudinally, so that the autocorrelations in the time-series could be utilized to estimate the unobserved cover proportion. The model also assumed that the quadrats where the data were collected were arranged sequentially, so that the spatial autocorrelations also could be utilized to estimate the proportion. Assuming a beta distribution as the probability distribution of the cover proportion, the model was implemented with a regularized incomplete beta function, which is the cumulative distribution function of the beta distribution. A simulated dataset and real datasets, with one-dimensional spatial structure and longitudinal survey, were fit to the model, and the parameters were estimated using the Markov chain Monte Carlo method. Then, the validity was examined using posterior predictive checks. As a result of the fitting, the Markov chain successfully converged to the stationary distribution, and the posterior predictive checks did not show large discrepancies. For the simulated dataset, the estimated values were close to the values used for the data generation. The estimated values for the real datasets also seemed to be reasonable. These results suggest that the proposed state-space model was able to successfully estimate the unobserved cover proportion. The present model is applicable to similar types of plant cover class data, and has the possibility to be expanded, for example, to incorporate a two-dimensional spatial structure and/or zero-inflation.
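The link between a latent beta-distributed cover proportion and observed cover classes can be sketched with the regularized incomplete beta function. This is only an illustration of that one step, not the state-space model itself: the class boundaries and the shape parameters `a` and `b` are hypothetical, and the function is evaluated with a standard continued-fraction expansion written in plain Python.

```python
import math


def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the recurrence.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b), i.e. the beta CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def cover_class_probs(a, b, boundaries):
    """P(class k) = I_{c_k}(a, b) - I_{c_{k-1}}(a, b) for interior boundaries."""
    edges = [0.0] + list(boundaries) + [1.0]
    cdf = [beta_cdf(e, a, b) for e in edges]
    return [cdf[k + 1] - cdf[k] for k in range(len(edges) - 1)]


# Hypothetical four-class scheme with boundaries at 25%, 50% and 75% cover.
uniform_probs = cover_class_probs(1.0, 1.0, [0.25, 0.5, 0.75])
```

With `a = b = 1` the latent proportion is uniform, so each of the four equal-width classes has probability 0.25, which gives a simple sanity check on the implementation.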
Affiliation(s)
- Hiroki Itô
  - Hokkaido Research Center, Forestry and Forest Products Research Institute, Toyohira-ku, Sapporo, Japan
7
Tiwari J, Yu B, Fentie B, Ellis R. Probability distribution of groundcover for runoff prediction in rangeland in the Burnett–Mary Region, Queensland. RANGELAND JOURNAL 2020. [DOI: 10.1071/rj19082] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
Considering the degree of spatial and temporal variation of groundcover in grazing land, it is desirable to use a simple and robust model to represent the spatial variation in cover in order to quantify its effect on runoff and soil loss. The purpose of the study was to test whether a two-parameter beta (β) distribution could be used to characterise cover variation in space at the sub-catchment scale. Twenty sub-catchments (area range 35.8–231 km²) in the Burnett–Mary region, Queensland, were randomly selected. Thirty raster layers of groundcover at 30-m resolution were prepared for these 20 sub-catchments with the average cover for the 30 layers ranging from 24% to 91%. Three methods were used to test the appropriateness of the β distribution for characterising the cover variation in space: (i) visual goodness-of-fit assessment and Kolmogorov–Smirnov (K-S) test; (ii) the fractional area with cover ≤53%; and (iii) estimated runoff amount for a given rainfall amount for the area with cover ≤53%. The K-S test on 30×100 samples of groundcover showed that the hypothesis of β distribution for groundcover could not be rejected at P = 0.05 for 97.5% of the cases. A comparison of the observed and β distributions in terms of the fractional area with cover ≤53% showed that the discrepancy was ≤8% for the 30 layers considered. A comparison in terms of the estimated runoff showed that results using the observed cover distribution and the β distribution were highly correlated (R² range 0.91–0.98; Nash–Sutcliffe efficiency measure range 0.88–0.99). The mean absolute error of estimated runoff ranged from 0.98 to 8.10 mm and the error relative to the mean was 4–16%. The results indicated that the two-parameter β distribution can be adequately used to characterise the spatial variation of cover and to evaluate the effect of cover on runoff for these predominantly grazing catchments.
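The second comparison in this abstract, the fractional area with cover at or below 53%, can be sketched for a fitted two-parameter beta distribution. The abstract does not say how the parameters were estimated, so the method-of-moments fit below is an assumption, as are the function names. The CDF is computed with Simpson's rule and is only valid when both shape parameters exceed 1, where the density is zero at the endpoints.

```python
import math


def fit_beta_moments(mean, var):
    """Method-of-moments estimates (a, b); requires var < mean * (1 - mean)."""
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance incompatible with a beta distribution")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


def beta_pdf(t, a, b):
    """Beta density; assumes a > 1 and b > 1 so the endpoints evaluate to 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(t)
                    + (b - 1.0) * math.log(1.0 - t))


def beta_cdf_simpson(x, a, b, n=2000):
    """P(cover <= x) by composite Simpson's rule with n (even) intervals."""
    h = x / n
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * beta_pdf(i * h, a, b)
    return total * h / 3.0


# Illustrative moments only (not data from the study): mean 0.5, variance 0.05.
a_hat, b_hat = fit_beta_moments(0.5, 0.05)
frac_low_cover = beta_cdf_simpson(0.53, a_hat, b_hat)
```

For these illustrative moments the fit is Beta(2, 2), whose CDF is 3x² − 2x³, so the fraction of area with cover ≤53% is about 0.545. Layers with a shape parameter at or below 1 would need a proper incomplete beta routine instead of this quadrature.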
8
Requena-Mullor JM, Maguire KC, Shinneman DJ, Caughlin TT. Integrating anthropogenic factors into regional-scale species distribution models-A novel application in the imperiled sagebrush biome. GLOBAL CHANGE BIOLOGY 2019; 25:3844-3858. [PMID: 31180605 DOI: 10.1111/gcb.14728] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/02/2019] [Accepted: 05/10/2019] [Indexed: 06/09/2023]
Abstract
Species distribution models (SDMs) that rely on regional-scale environmental variables will play a key role in forecasting species occurrence in the face of climate change. However, in the Anthropocene, a number of local-scale anthropogenic variables, including wildfire history, land-use change, invasive species, and ecological restoration practices can override regional-scale variables to drive patterns of species distribution. Incorporating these human-induced factors into SDMs remains a major research challenge, in part because spatial variability in these factors occurs at fine scales, rendering prediction over regional extents problematic. Here, we used big sagebrush (Artemisia tridentata Nutt.) as a model species to explore whether including human-induced factors improves the fit of the SDM. We applied a Bayesian hurdle spatial approach using 21,753 data points of field-sampled vegetation obtained from the LANDFIRE program to model sagebrush occurrence and cover by incorporating fire history metrics and restoration treatments from 1980 to 2015 throughout the Great Basin of North America. Models including fire attributes and restoration treatments performed better than those including only climate and topographic variables. Number of fires and fire occurrence had the strongest relative effects on big sagebrush occurrence and cover, respectively. The models predicted that the probability of big sagebrush occurrence decreases by 1.2% (95% CI: -6.9%, 0.6%) when one fire occurs and cover decreases by 44.7% (95% CI: -47.9%, -41.3%) if at least one fire occurred over the 36 year period of record. Restoration practices increased the probability of big sagebrush occurrence but had minimal effect on cover. Our results demonstrate the potential value of including disturbance and land management along with climate in models to predict species distributions. 
As an increasing number of datasets representing land-use history become available, we anticipate that our modeling framework will have broad relevance across a range of biomes and species.
Affiliation(s)
- Kaitlin C Maguire
  - Forest and Rangeland Ecosystem Science Center, U.S. Geological Survey, Boise, Idaho
- Douglas J Shinneman
  - Forest and Rangeland Ecosystem Science Center, U.S. Geological Survey, Boise, Idaho
9
Irvine KM, Wright WJ, Shanahan EK, Rodhouse TJ. Cohesive framework for modelling plant cover class data. Methods Ecol Evol 2019. [DOI: 10.1111/2041-210x.13262] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/29/2022]
Affiliation(s)
- Kathryn M. Irvine
  - U.S. Geological Survey Northern Rocky Mountain Science Center Bozeman MT USA
- Erin K. Shanahan
  - Greater Yellowstone Network U.S. National Park Service Bozeman MT USA
10
Wright WJ, Irvine KM, Warren JM, Barnett JK. Statistical design and analysis for plant cover studies with multiple sources of observation errors. Methods Ecol Evol 2017. [DOI: 10.1111/2041-210x.12825] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Affiliation(s)
- Wilson J. Wright
  - U.S. Geological Survey Northern Rocky Mountain Science Center Bozeman MT USA
- Kathryn M. Irvine
  - U.S. Geological Survey Northern Rocky Mountain Science Center Bozeman MT USA
- Jeffrey M. Warren
  - U.S. Fish and Wildlife Service Red Rock Lakes National Wildlife Refuge Lima MT USA
- Jenny K. Barnett
  - U.S. Fish and Wildlife Service Mid‐Columbia River National Wildlife Refuge Complex Burbank WA USA