1.
Yu T, Zhang Y, Yuan J, Zhang Y, Li J, Huang Z. Cholesterol mediates the effects of single and multiple environmental phenols in urine on obesity. Lipids Health Dis 2024; 23:126. [PMID: 38685082 PMCID: PMC11057097 DOI: 10.1186/s12944-024-02113-0]
Abstract
BACKGROUND Overweight and obesity are among the leading chronic diseases worldwide. Environmental phenols are recognized as endocrine disruptors that contribute to weight change; however, the effects of exposure to mixed phenols on obesity are not well established. METHODS Using data from adults in the National Health and Nutrition Examination Survey, this study examined the individual and combined effects of four phenols on obesity. Traditional logistic regression and two mixture models (weighted quantile sum (WQS) regression and Bayesian kernel machine regression (BKMR)) were used together to assess the role of phenols in the development of obesity. The potential mediation of these effects by cholesterol was analyzed through a parallel mediation model. RESULTS Each phenol except triclosan was inversely associated with obesity (P-value < 0.05). The WQS index was also negatively associated with general obesity (β: 0.770, 95% CI: 0.644-0.919, P-value = 0.004) and abdominal obesity (β: 0.781, 95% CI: 0.658-0.928, P-value = 0.004). Consistently, the BKMR model demonstrated significant joint negative effects of the phenols on obesity. The parallel mediation analysis revealed that high-density lipoprotein mediated the effects of each of the four phenols on obesity, whereas low-density lipoprotein mediated only the association between benzophenone-3 and obesity. Cholesterol also mediated the association between the mixed phenols and obesity. In short, exposure to single and mixed phenols was significantly and negatively associated with obesity, and cholesterol mediated these associations. CONCLUSIONS Assessing the potential public health risks of mixed phenols will help incorporate this information into practical health advice and guidance.
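The parallel mediation step can be illustrated with a minimal product-of-coefficients sketch. All data, effect sizes, and variable names below are invented for illustration; the study's parallel mediation model is more elaborate than this single-mediator regression pair.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
exposure = rng.normal(size=n)                           # e.g. a phenol exposure score (invented)
mediator = 0.5 * exposure + rng.normal(size=n)          # e.g. a cholesterol fraction (invented)
outcome = 0.3 * mediator + 0.1 * exposure + rng.normal(size=n)  # e.g. an obesity index (invented)

def ols_coefs(y, cols):
    """OLS coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

a = ols_coefs(mediator, [exposure])[1]            # exposure -> mediator path
b = ols_coefs(outcome, [exposure, mediator])[2]   # mediator -> outcome path, adjusted for exposure
indirect = a * b                                  # mediated (indirect) effect
print(round(indirect, 2))  # approximately 0.15 (= 0.5 * 0.3)
```

In practice the indirect effect's uncertainty is usually assessed by bootstrap rather than read off point estimates as here.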
Affiliation(s)
- Ting Yu
- School of Public Health, Xuzhou Medical University, Xuzhou, 221004, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, China
- Yuqing Zhang
- Department of Obstetrics and Gynecology, Nanjing Maternity and Child Health Care Hospital, Women's Hospital of Nanjing Medical University, Nanjing, China
- Jiali Yuan
- School of Public Health, Xuzhou Medical University, Xuzhou, 221004, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, China
- Yue Zhang
- School of Public Health, Xuzhou Medical University, Xuzhou, 221004, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, China
- Jing Li
- School of Public Health, Xuzhou Medical University, Xuzhou, 221004, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, China
- Zhenyao Huang
- School of Public Health, Xuzhou Medical University, Xuzhou, 221004, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, China.
2.
Fotso CT, Girel S, Anjuère F, Braud VM, Hubert F, Goudon T. A mixture-like model for tumor-immune system interactions. J Theor Biol 2024; 581:111738. [PMID: 38278343 DOI: 10.1016/j.jtbi.2024.111738]
Abstract
We introduce a mathematical model based on mixture theory intended to describe the tumor-immune system interactions within the tumor microenvironment. The equations account for the geometry of the tumor expansion, and the displacement of the immune cells, driven by diffusion and chemotactic mechanisms. They also take into account the constraints in terms of nutrient and oxygen supply. The numerical investigations analyze the impact of the different modeling assumptions and parameters. Depending on the parameters, the model can reproduce elimination, equilibrium or escape phases and it identifies a critical role of oxygen/nutrient supply in shaping the tumor growth. In addition, antitumor immune cells are key factors in controlling tumor growth, maintaining an equilibrium while protumor cells favor escape and tumor expansion.
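The elimination/equilibrium/escape behaviour described above can be reproduced qualitatively with a toy ODE caricature of tumor-effector dynamics. This is not the mixture PDE model of the paper; the functional forms and all parameter values are invented for illustration.

```python
def simulate(kill_rate, steps=20000, dt=0.01):
    """Euler integration of a toy tumor (T) vs effector immune cell (E) system."""
    T, E = 1.0, 1.0
    r, K = 0.5, 100.0   # tumor growth rate and carrying capacity
    s, m = 0.1, 0.1     # baseline immune influx and death rates
    for _ in range(steps):
        dT = r * T * (1 - T / K) - kill_rate * E * T          # logistic growth minus immune kill
        dE = s + 0.1 * T * E / (10.0 + T) - m * E             # influx + tumor-driven recruitment
        T = max(T + dt * dT, 0.0)
        E = max(E + dt * dE, 0.0)
    return T

print(simulate(kill_rate=1.0))    # strong killing: tumor eliminated (near 0)
print(simulate(kill_rate=0.01))   # weak killing: escape toward carrying capacity
```

Varying the kill rate between these extremes produces intermediate, equilibrium-like tumor burdens, mirroring the three regimes the paper describes.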
Affiliation(s)
- Simon Girel
- Université Côte d'Azur, Inria, CNRS, LJAD, Parc Valrose, F-06108, Nice, France
- Fabienne Anjuère
- Université Côte d'Azur, CNRS, Institut de Pharmacologie Moléculaire et Cellulaire UMR 7275, 660 Route des Lucioles, F-06560, Valbonne, France
- Véronique M Braud
- Université Côte d'Azur, CNRS, Institut de Pharmacologie Moléculaire et Cellulaire UMR 7275, 660 Route des Lucioles, F-06560, Valbonne, France
- Florence Hubert
- I2M, Aix Marseille Université, CNRS, 39 rue F. Joliot-Curie, F-13453, Marseille, France
- Thierry Goudon
- Université Côte d'Azur, Inria, CNRS, LJAD, Parc Valrose, F-06108, Nice, France.
3.
Jones G, Johnson WO, Heuer C. Modelling variation in test sensitivity for monitoring leptospirosis in beef cattle. Prev Vet Med 2023; 221:106074. [PMID: 37976969 DOI: 10.1016/j.prevetmed.2023.106074]
Abstract
When Bayesian latent class analysis is used for diagnostic test data in the absence of a gold standard test, it is common to assume that any unknown test sensitivities and specificities are constant across different populations. Indeed, this assumption is often necessary for model identifiability. However, there are a number of practical situations, depending on the type of test and the nature of the disease, where this assumption may not be true. We present a case study of using a microscopic agglutination test to diagnose leptospirosis infection in beef cattle, which strongly suggests that sensitivity in particular varies among herds. We develop and fit an alternative model in which sensitivity is related to within-herd prevalence, and discuss the statistical and epidemiological implications.
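The core relation behind this entry is the standard link between true prevalence and the apparent (test-positive) rate. The prevalence-dependent sensitivity below is a hypothetical illustration, not the form fitted in the paper:

```python
def apparent_prevalence(pi, se, sp):
    """Probability of a positive test given true prevalence pi, sensitivity se, specificity sp."""
    return pi * se + (1 - pi) * (1 - sp)

def herd_positive_rate(pi, sp=0.95):
    """Positive-test rate when sensitivity itself rises with within-herd prevalence.
    The linear form se = 0.5 + 0.4 * pi is invented for illustration."""
    se = 0.5 + 0.4 * pi
    return apparent_prevalence(pi, se, sp)

print(round(herd_positive_rate(0.1), 4))  # 0.099
print(round(herd_positive_rate(0.5), 4))  # 0.375
```

With constant sensitivity the positive-test rate is linear in prevalence; letting sensitivity vary with prevalence bends that relationship, which is what makes the constant-sensitivity assumption testable against multi-herd data.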
Affiliation(s)
- Geoff Jones
- School of Mathematical and Computational Sciences, Massey University, Palmerston North 4442, New Zealand.
- Cord Heuer
- EpiCentre, Massey University, Palmerston North 4442, New Zealand
4.
Wang PY, Bartel DP. A statistical approach for identifying primary substrates of ZSWIM8-mediated microRNA degradation in small-RNA sequencing data. BMC Bioinformatics 2023; 24:195. [PMID: 37170259 PMCID: PMC10176919 DOI: 10.1186/s12859-023-05306-z]
Abstract
BACKGROUND One strategy for identifying targets of a regulatory factor is to perturb the factor and use high-throughput RNA sequencing to examine the consequences. However, distinguishing direct targets from secondary effects and experimental noise can be challenging when confounding signal is present in the background at varying levels. RESULTS Here, we present a statistical modeling strategy to identify microRNAs that are primary substrates of target-directed miRNA degradation (TDMD) mediated by ZSWIM8. This method uses a bi-beta-uniform mixture (BBUM) model to separate primary from background signal components, leveraging the expectation that primary signal is restricted to upregulation and not downregulation upon loss of ZSWIM8. The BBUM model strategy retained the apparent sensitivity and specificity of the previous ad hoc approach but was more robust against outliers, achieved a more consistent stringency, and could be performed using a single cutoff of false discovery rate (FDR). CONCLUSIONS We developed the BBUM model, a robust statistical modeling strategy to account for background secondary signal in differential expression data. It performed well for identifying primary substrates of TDMD and should be useful for other applications in which the primary regulatory targets are only upregulated or only downregulated. The BBUM model, FDR-correction algorithm, and significance-testing methods are available as an R package at https://github.com/wyppeter/bbum.
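The idea of separating a uniform null component from a signal component concentrated near zero can be sketched as follows. This is a deliberately stripped-down illustration, not the actual BBUM likelihood (which uses two beta components plus an FDR-correction step); the mixture weight and shape parameter are invented.

```python
def posterior_primary(p, w_null=0.8, a=0.2):
    """Posterior probability that a p-value comes from the signal component of a
    two-part mixture: Uniform(0,1) null plus a Beta(a, 1) spike near 0 (a < 1)."""
    f_null = 1.0                      # Uniform(0, 1) density
    f_primary = a * p ** (a - 1.0)    # Beta(a, 1) density, concentrated near 0
    num = (1 - w_null) * f_primary
    return num / (w_null * f_null + num)

print(posterior_primary(1e-6) > 0.99)   # tiny p-values are almost surely signal
print(posterior_primary(0.5) < 0.2)     # mid-range p-values are mostly null
```

BBUM's extra beta component lets "background" secondary effects absorb moderately small p-values, so that only the upregulation-restricted primary signal is called significant.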
Affiliation(s)
- Peter Y Wang
- Whitehead Institute for Biomedical Research, 455 Main Street, Cambridge, MA, 02142, USA
- Howard Hughes Medical Institute, Cambridge, MA, 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- David P Bartel
- Whitehead Institute for Biomedical Research, 455 Main Street, Cambridge, MA, 02142, USA.
- Howard Hughes Medical Institute, Cambridge, MA, 02142, USA.
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
5.
Di Mari R, Ingrassia S, Punzo A. Local and Overall Deviance R-Squared Measures for Mixtures of Generalized Linear Models. J Classif 2023; 40:1-34. [PMID: 37359509 PMCID: PMC10071261 DOI: 10.1007/s00357-023-09432-4]
Abstract
In generalized linear models (GLMs), measures of lack of fit are typically defined as the deviance between two nested models, and a deviance-based R2 is commonly used to evaluate the fit. In this paper, we extend deviance measures to mixtures of GLMs, whose parameters are estimated by maximum likelihood (ML) via the EM algorithm. Such measures are defined both locally, i.e., at the cluster level, and globally, i.e., with reference to the whole sample. At the cluster level, we propose a normalized two-term decomposition of the local deviance into explained and unexplained local deviances. At the sample level, we introduce an additive normalized decomposition of the total deviance into three terms, where each evaluates a different aspect of the fitted model: (1) the cluster separation on the dependent variable, (2) the proportion of the total deviance explained by the fitted model, and (3) the proportion of the total deviance which remains unexplained. We use both local and global decompositions to define, respectively, local and overall deviance R2 measures for mixtures of GLMs, which we illustrate, for Gaussian, Poisson, and binomial responses, by means of a simulation study. The proposed fit measures are then used to assess and interpret clusters of COVID-19 spread in Italy at two time points.
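The deviance-based R2 that this entry generalizes can be sketched for a single (non-mixture) Poisson GLM, where it is one minus the ratio of the fitted deviance to the null deviance:

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance; the y*log(y/mu) term is taken as 0 when y == 0."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    term = np.zeros_like(y)
    pos = y > 0
    term[pos] = y[pos] * np.log(y[pos] / mu[pos])
    return 2.0 * np.sum(term - (y - mu))

def deviance_r2(y, mu_fitted):
    """1 - D(fitted)/D(null), where the null model fits only the overall mean."""
    mu_null = np.full(len(y), np.mean(y))
    return 1.0 - poisson_deviance(y, mu_fitted) / poisson_deviance(y, mu_null)

y = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
print(deviance_r2(y, y))                      # 1.0 for a perfect fit
print(deviance_r2(y, np.full(5, y.mean())))   # 0.0 for the null fit
```

The paper's contribution is to decompose such quantities across mixture components, weighting each cluster's local deviance by the EM responsibilities.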
Affiliation(s)
- Roberto Di Mari
- Dipartimento di Economia e Impresa, Università di Catania, Catania, Italy
- Antonio Punzo
- Dipartimento di Economia e Impresa, Università di Catania, Catania, Italy
6.
Neilson AR, Jones GT, Macfarlane GJ, Pathan EM, McNamee P. Generating EQ-5D-5L health utility scores from BASDAI and BASFI: a mapping study in patients with axial spondyloarthritis using longitudinal UK registry data. Eur J Health Econ 2022; 23:1357-1369. [PMID: 35113270 PMCID: PMC9550731 DOI: 10.1007/s10198-022-01429-x]
Abstract
BACKGROUND Preference-based health-state utility values (HSUVs), such as those from the EuroQol five-dimensional questionnaire (EQ-5D-5L), are needed to calculate quality-adjusted life-years (QALYs) for cost-effectiveness analyses. However, these are rarely used in clinical trials of interventions in axial spondyloarthritis (axSpA). In such cases, mapping can be used to predict HSUVs. OBJECTIVE To develop mapping algorithms that estimate EQ-5D-5L HSUVs from the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) and the Bath Ankylosing Spondylitis Functional Index (BASFI). METHODS Data from the British Society for Rheumatology Biologics Register in Ankylosing Spondylitis (BSRBR-AS) provided 5122 observations with complete BASDAI, BASFI, and EQ-5D-5L responses covering the full range of disease severity. We compared direct mapping using adjusted limited dependent variable mixture models (ALDVMMs), with optional inclusion of the gap between full health and the next feasible value, against indirect response mapping using ordered probit (OPROBIT) and generalised ordered probit (GOPROBIT) models. Explanatory variables included BASDAI, BASFI, and age. Model goodness-of-fit and predictive accuracy were assessed using the Akaike and Bayesian information criteria (AIC/BIC), mean absolute error (MAE), and root mean square error (RMSE), by plotting predicted vs. observed estimates across the range of BASDAI/BASFI, and by comparing data simulated from the preferred/best model with the original data set. RESULTS Overall, the ALDVMM models that did not formally include the gap between full health and the next feasible value outperformed those that did. The four-component mixture models (with squared terms included) performed better than the three-component models. Response mapping using GOPROBIT (no squared terms included) or OPROBIT (with squared terms included) offered the next best performing models after the three-component ALDVMM (with squared terms). Data simulated from the preferred model (the four-component ALDVMM) did not significantly underestimate uncertainty across most of the range of EQ-5D-5L values; however, the proportion of observations at full health was underrepresented, likely due in part to the model being fitted on a small number of observations at this point in the actual data (4%). CONCLUSIONS The mapping algorithms developed in this study enable the generation of EQ-5D-5L utilities from BASDAI/BASFI. The indirect mapping equations reported for the EQ-5D-5L facilitate the calculation of EQ-5D-5L utility scores using other UK and country-specific value sets.
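The MAE and RMSE performance metrics used to compare mapping models are straightforward to compute; a minimal sketch with invented observed and mapped utility values:

```python
import numpy as np

def mapping_metrics(observed, predicted):
    """Mean absolute error and root mean square error between observed and
    mapped (predicted) utility values."""
    err = np.asarray(observed) - np.asarray(predicted)
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))

obs = np.array([1.0, 0.85, 0.60, 0.20, -0.10])   # invented EQ-5D-5L utilities
pred = np.array([0.95, 0.80, 0.65, 0.25, 0.00])  # invented mapped values
mae, rmse = mapping_metrics(obs, pred)
print(round(mae, 3), round(rmse, 3))  # 0.06 0.063
```

RMSE penalizes the single large error (the last pair) more heavily than MAE, which is why the two metrics can rank candidate mapping models differently.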
Affiliation(s)
- Aileen R Neilson
- Edinburgh Clinical Trials Unit (ECTU), Usher Institute, University of Edinburgh, Edinburgh, UK.
- Gareth T Jones
- Epidemiology Group, Institute of Applied Health Sciences, School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen, UK
- Gary J Macfarlane
- Epidemiology Group, Institute of Applied Health Sciences, School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen, UK
- Ejaz Mi Pathan
- Rheumatology Department, Freeman Hospital, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
- Paul McNamee
- Health Economics Research Unit, Institute of Applied Health Sciences, School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen, UK
7.
Marrotte RR, Howe EJ, Beauclerc KB, Potter D, Northrup JM. Explaining detection heterogeneity with finite mixture and non-Euclidean movement in spatially explicit capture-recapture models. PeerJ 2022; 10:e13490. [PMID: 35694380 PMCID: PMC9186326 DOI: 10.7717/peerj.13490]
Abstract
Landscape structure affects animal movement. Differences between landscapes may induce heterogeneity in home range size and movement rates among individuals within a population. These types of heterogeneity can cause bias when estimating population size or density and are seldom considered during analyses. Individual heterogeneity, attributable to unknown or unobserved covariates, is often modelled using latent mixture distributions, but these are demanding of data, and abundance estimates are sensitive to the parameters of the mixture distribution. A recent extension of spatially explicit capture-recapture models allows landscape structure to be modelled explicitly by incorporating landscape connectivity using non-Euclidean least-cost paths, improving inference, especially in highly structured (riparian & mountainous) landscapes. Our objective was to investigate whether these novel models could improve inference about black bear (Ursus americanus) density. We fit spatially explicit capture-recapture models with standard and complex structures to black bear data from 51 separate study areas. We found that non-Euclidean models were supported in over half of our study areas. Associated density estimates were higher and less precise than those from simple models and only slightly more precise than those from finite mixture models. Estimates were sensitive to the scale (pixel resolution) at which least-cost paths were calculated, but there was no consistent pattern across covariates or resolutions. Our results indicate that negative bias associated with ignoring heterogeneity is potentially severe. However, the most popular method for dealing with this heterogeneity (finite mixtures) yielded potentially unreliable point estimates of abundance that may not be comparable across surveys, even in data sets with 136-350 total detections, 3-5 detections per individual, 97-283 recaptures, and 80-254 spatial recaptures. 
In these same study areas with high sample sizes, we expected that landscape features would not severely constrain animal movements and that modelling non-Euclidean distance would not consistently improve inference. Our results suggest caution in applying non-Euclidean SCR models when there is no clear landscape covariate known to strongly influence the movement of the focal species, and in applying finite mixture models except when abundant data are available.
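The least-cost-path distances underlying non-Euclidean SCR can be sketched with Dijkstra's algorithm on a small cost grid. This is a simplified stand-in (real implementations work on landscape-scale resistance surfaces, and the cost values here are invented):

```python
import heapq

def least_cost_distance(cost, start, goal):
    """Dijkstra over a 4-connected grid; moving into a cell adds that cell's cost.
    The start cell's own cost is included."""
    rows, cols = len(cost), len(cost[0])
    dist = {start: cost[start[0]][start[1]]}
    heap = [(dist[start], start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return d
        if d > dist.get((r, c), float("inf")):
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return float("inf")

# A high-cost 'ridge' in the middle column forces a detour relative to straight-line travel.
grid = [[1, 9, 1],
        [1, 9, 1],
        [1, 1, 1]]
print(least_cost_distance(grid, (0, 0), (0, 2)))  # 7 (around the ridge), not 11 (through it)
```

In an SCR model, these ecological distances replace Euclidean distances in the detection function, so the cell (pixel) resolution of the cost surface directly affects the estimates, as the abstract notes.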
Affiliation(s)
- Robby R. Marrotte
- Wildlife Research & Monitoring Section, Ministry of Northern Development, Mines, Natural Resources and Forestry, Peterborough, Ontario, Canada
- Eric J. Howe
- Wildlife Research & Monitoring Section, Ministry of Northern Development, Mines, Natural Resources and Forestry, Peterborough, Ontario, Canada
- Kaela B. Beauclerc
- Wildlife Research & Monitoring Section, Ministry of Northern Development, Mines, Natural Resources and Forestry, Peterborough, Ontario, Canada
- Derek Potter
- Wildlife Research & Monitoring Section, Ministry of Northern Development, Mines, Natural Resources and Forestry, Peterborough, Ontario, Canada
- Joseph M. Northrup
- Wildlife Research & Monitoring Section, Ministry of Northern Development, Mines, Natural Resources and Forestry, Peterborough, Ontario, Canada
- Environmental and Life Sciences Graduate Program, Trent University, Peterborough, Ontario, Canada
8.
Debnath R, Bardhan R, Misra A, Hong T, Rozite V, Ramage MH. Lockdown impacts on residential electricity demand in India: A data-driven and non-intrusive load monitoring study using Gaussian mixture models. Energy Policy 2022; 164:112886. [PMID: 35620237 PMCID: PMC9022708 DOI: 10.1016/j.enpol.2022.112886]
Abstract
This study evaluates the effect of the complete nationwide lockdown in 2020 on residential electricity demand across 13 Indian cities, and the role of digitalisation, using a public smart meter dataset. We undertake a data-driven approach to explore the energy impacts of work-from-home norms across five dwelling typologies. Our methodology includes climate correction, dimensionality reduction, and machine learning-based clustering of daily load curves using Gaussian mixture models. Results show that during the lockdown, maximum daily peak demand increased by 150-200% relative to 2018 and 2019 levels for one-room units (RM1), one-bedroom units (BR1) and two-bedroom units (BR2), which are typical for low- and middle-income families, while upper-middle- and higher-income dwelling units (i.e., three-bedroom (3BR) and more-than-three-bedroom (M3BR) units) saw night-time demand rise by almost 44% in 2020 compared with 2018 and 2019 levels. New peak demand also emerged during the lockdown period for the RM1, BR1 and BR2 dwelling typologies. We found that the lack of supporting socioeconomic and climatic data can restrict a comprehensive analysis of demand shocks using similar public datasets, which informed the policy implications for India's digitalisation. We further emphasise improving data quality and reliability for effective data-centric policymaking.
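The clustering step can be sketched with a small hand-rolled EM fit of a two-component one-dimensional Gaussian mixture. The study clusters full daily load curves after dimensionality reduction; the one-dimensional data and all values below are invented for illustration.

```python
import numpy as np

def fit_gmm_1d(x, iters=100):
    """EM for a two-component 1-D Gaussian mixture, e.g. to separate baseline
    from lockdown-elevated daily peak-demand values. Returns the sorted means."""
    mu = np.array([x.min(), x.max()], dtype=float)   # spread-out initial means
    var = np.full(2, np.var(x)) + 1e-9
    w = np.full(2, 0.5)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = w * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return np.sort(mu)

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(2.0, 0.2, 300),   # baseline daily peaks (kW, invented)
                    rng.normal(5.0, 0.4, 300)])  # elevated lockdown peaks (invented)
print(np.round(fit_gmm_1d(x), 1))  # approximately [2. 5.]
```

Library implementations (e.g. a multivariate GMM with model selection over the number of components) would be used in practice; this sketch only shows the EM mechanics.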
Key Words
- AI, Artificial Intelligence
- BR1, 1-bedroom unit
- BR2, 2-bedroom unit
- BR3, 3-bedroom unit
- CDD, Cooling Degree Day
- COVID-19
- EM, Expectation–Maximisation algorithm
- GMM, Gaussian Mixture Models
- HDD, Heating Degree Day
- India
- M3BR, More than 3-bedroom unit
- MDS, Multidimensional Scaling
- Machine learning
- Mixture models
- NEEM, National Energy End-use Monitoring
- NILM
- NILM, Non-intrusive Load Monitoring
- RM1, 1-room unit
- WFH, Work-from-Home
- Work-from-home
Affiliation(s)
- Ramit Debnath
- Energy Policy Research Group, Cambridge Judge Business School, University of Cambridge, Cambridge, CB2 1AG, UK
- Centre for Natural Material Innovation, Department of Architecture, University of Cambridge, Cambridge, CB2 1PX, UK
- Division of Humanities and Social Science, California Institute of Technology, Pasadena, CA, 91125, USA
- Ronita Bardhan
- Sustainable Design Group, Department of Architecture, University of Cambridge, Cambridge, CB2 1PX, UK
- Ashwin Misra
- The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, 15213-3890, USA
- Tianzhen Hong
- Building Technology and Urban Systems Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Vida Rozite
- Energy Efficiency Division, International Energy Agency, Paris, 75015, France
- Michael H. Ramage
- Centre for Natural Material Innovation, Department of Architecture, University of Cambridge, Cambridge, CB2 1PX, UK
9.
Fasel NJ, Vullioud C, Genoud M. Assigning metabolic rate measurements to torpor and euthermy in heterothermic endotherms: "torpor", a new package for R. Biol Open 2022; 11:274272. [PMID: 35128558 PMCID: PMC9002798 DOI: 10.1242/bio.059064]
Abstract
Torpor is a state of controlled reduction of metabolic rate (M) in endotherms. Assigning measurements of M to torpor or euthermy can be challenging, especially when the difference between euthermic M and torpid M is small, in species defending a high minimal body temperature in torpor, in thermolabile species, and slightly below the thermoneutral zone (TNZ). Here, we propose a novel method for distinguishing torpor from euthermy. We use the variation in M measured during euthermic rest and torpor at varying ambient temperatures (Ta) to objectively estimate the lower critical temperature (Tlc) of the TNZ and to assign measurements to torpor, euthermic rest or rest within TNZ. In addition, this method allows the prediction of M during euthermic rest and torpor at varying Ta, including resting M within the TNZ. The present method has shown highly satisfactory results using 28 published sets of metabolic data obtained by respirometry on 26 species of mammals. Ultimately, this novel method aims to facilitate analysis of respirometry data in heterothermic endotherms. Finally, the development of the associated R-package (torpor) will enable widespread use of the method amongst biologists. Summary: The presented method and its associated R-package (torpor) enable the assignment of metabolic rate measurements to torpor or euthermy, ultimately improving the standardization of respirometry analyses in heterotherms.
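The assignment logic can be sketched by comparing a measurement against predicted euthermic-rest and torpor curves and assigning it to the nearer one on a log scale. The package itself fits a Bayesian mixture model to estimate these curves and Tlc from the data; the functional forms and parameter values below are invented for illustration.

```python
import math

def predicted_rest(Ta, Tlc=30.0, M_tnz=1.0, cond=0.05):
    """Euthermic resting metabolic rate: flat inside the TNZ, rising linearly below Tlc."""
    return M_tnz if Ta >= Tlc else M_tnz + cond * (Tlc - Ta)

def predicted_torpor(Ta, M_ref=0.05, q10=2.5):
    """Torpid metabolic rate: falls with Ta following a Q10-style temperature effect."""
    return M_ref * q10 ** ((Ta - 10.0) / 10.0)

def assign(M, Ta):
    """Assign a measurement to the state whose predicted rate it is closer to (log scale)."""
    d_rest = abs(math.log(M) - math.log(predicted_rest(Ta)))
    d_torpor = abs(math.log(M) - math.log(predicted_torpor(Ta)))
    return "torpor" if d_torpor < d_rest else "rest"

print(assign(M=0.08, Ta=15.0))  # torpor
print(assign(M=1.9, Ta=15.0))   # rest
```

The hard cases the abstract mentions (small torpor-euthermy contrast, thermolabile species, Ta just below the TNZ) are exactly those where the two predicted curves approach each other and a probabilistic assignment becomes preferable to this nearest-curve rule.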
Affiliation(s)
- Nicolas J Fasel
- Department of Ecology and Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland
- Colin Vullioud
- Department of Evolutionary Genetics, Leibniz Institute for Zoo and Wildlife Research, D-10315 Berlin, Germany
- Michel Genoud
- Department of Ecology and Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland
10.
Abstract
In long-term follow-up studies, data are often collected on repeated measures of multivariate response variables as well as on time to the occurrence of a certain event. To jointly analyze such longitudinal data and survival time, we propose a general class of semiparametric latent-class models that accommodates a heterogeneous study population with flexible dependence structures between the longitudinal and survival outcomes. We combine nonparametric maximum likelihood estimation with sieve estimation and devise an efficient EM algorithm to implement the proposed approach. We establish the asymptotic properties of the proposed estimators through novel use of modern empirical process theory, sieve estimation theory, and semiparametric efficiency theory. Finally, we demonstrate the advantages of the proposed methods through extensive simulation studies and provide an application to the Atherosclerosis Risk in Communities study.
Affiliation(s)
- Kin Yau Wong
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
- Donglin Zeng
- Department of Biostatistics, University of North Carolina at Chapel Hill, USA
- D. Y. Lin
- Department of Biostatistics, University of North Carolina at Chapel Hill, USA
11.
Abstract
Background The types of outcomes collected in clinical studies and those required for cost-effectiveness analysis often differ. Decision makers routinely use quality-adjusted life-years (QALYs) to compare the benefits and costs of treatments across different diseases and treatments using a common metric. QALYs can be calculated using preference-based measures (PBMs) such as the EQ-5D-3L, but clinical studies often focus on objective clinician- or laboratory-measured outcomes and non-preference-based patient outcomes, such as the QLQ-C30. We model the relationship between the generic, preference-based EQ-5D-3L and the cancer-specific quality of life questionnaire QLQ-C30 in patients with breast cancer. The resulting mapping allows users to convert QLQ-C30 scores into EQ-5D-3L scores for the purposes of cost-effectiveness analysis or economic evaluation. Methods Data from a randomized trial of 602 patients with HER2-positive advanced breast cancer provided 3766 EQ-5D-3L observations. Direct mapping using adjusted limited dependent variable mixture models (ALDVMMs) is compared to a random-effects linear regression and to indirect mapping using seemingly unrelated ordered probit models. EQ-5D-3L was estimated as a function of the summary scales of the QLQ-C30 and other patient characteristics. Results A four-component mixture model outperformed the other models in terms of summary fit statistics. A close fit to the observed data was observed across the range of disease severity. Data simulated from the model closely aligned with the original data and showed that the mapping did not significantly underestimate uncertainty. In the simulated data, 22.15% of values were equal to 1, compared to 21.93% in the original data; the variance was 0.0628 in the simulated data versus 0.0693 in the original data. The preferred mapping is provided in Excel and Stata files for ease of use. Conclusion A four-component adjusted mixture model provides reliable, unbiased estimates of EQ-5D-3L from the QLQ-C30, linking clinical studies to economic evaluation of health technologies for breast cancer. This work adds to a growing body of literature demonstrating the appropriateness of mixture-model-based approaches in mapping. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-021-08964-5.
Affiliation(s)
- Laura A Gray
- Health Economics and Decision Science, School of Health and Related Research, University of Sheffield, Sheffield, UK.
- Monica Hernandez Alava
- Health Economics and Decision Science, School of Health and Related Research, University of Sheffield, Sheffield, UK
- Allan J Wailoo
- Health Economics and Decision Science, School of Health and Related Research, University of Sheffield, Sheffield, UK
12.
Moorman SM, Greenfield EA, Carr K. Using Mixture Modeling to Construct Subgroups of Cognitive Aging in the Wisconsin Longitudinal Study. J Gerontol B Psychol Sci Soc Sci 2021; 76:1512-1522. [PMID: 33152080 DOI: 10.1093/geronb/gbaa191]
Abstract
OBJECTIVES Longitudinal surveys of older adults increasingly incorporate assessments of cognitive performance. However, very few studies have used mixture modeling techniques to describe cognitive aging by identifying subgroups of people who display similar patterns of performance across discrete cognitive functions. We employ this approach to advance empirical evidence concerning interindividual variability and intraindividual change in patterns of cognitive aging. METHOD We drew upon data from 3,713 participants in the Wisconsin Longitudinal Study (WLS). We used latent class analysis to generate subgroups of cognitive aging based on assessments of verbal fluency and episodic memory at ages 65 and 72. We also employed latent transition analysis to identify how individual participants moved between subgroups over the 7-year period. RESULTS There were four subgroups at each point in time. Approximately three-quarters of the sample demonstrated continuity in the qualitative type of profile between ages 65 and 72, with 17.9% of the sample in a profile with sustained overall low performance at both ages. An additional 18.7% of participants made subgroup transitions indicating marked decline in episodic memory. DISCUSSION Results demonstrate the utility of using mixture modeling to identify qualitatively and quantitatively distinct subgroups of cognitive aging among older adults. We discuss the implications of these results for the continued use of population health data to advance research on cognitive aging.
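The latent transition step can be sketched as a row-normalized matrix of moves between class assignments at the two waves (the class labels below are invented; latent transition analysis estimates these probabilities jointly with the measurement model rather than from hard assignments):

```python
from collections import Counter

def transition_matrix(class_t1, class_t2, k):
    """Row-normalized counts of moves between k latent subgroups at two waves."""
    counts = Counter(zip(class_t1, class_t2))
    matrix = []
    for i in range(k):
        row = [counts.get((i, j), 0) for j in range(k)]
        total = sum(row) or 1   # avoid division by zero for an empty class
        matrix.append([c / total for c in row])
    return matrix

# Hypothetical subgroup assignments for 8 participants at ages 65 and 72
t1 = [0, 0, 0, 1, 1, 1, 1, 1]
t2 = [0, 0, 1, 1, 1, 1, 0, 1]
tm = transition_matrix(t1, t2, 2)
print([[round(p, 2) for p in row] for row in tm])  # [[0.67, 0.33], [0.2, 0.8]]
```

The diagonal entries correspond to the "continuity" the abstract reports (about three-quarters of the sample), and the off-diagonal entries to transitions such as the marked episodic-memory decline.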
Affiliation(s)
- Kyle Carr
- Boston College, Chestnut Hill, Massachusetts
13
Burgette LF, Cabreros I, Han B, Paddock SM. Appropriate analyses of bimodal substance use frequency outcomes: a mixture model approach. Am J Drug Alcohol Abuse 2021; 47:559-568. [PMID: 34372719 DOI: 10.1080/00952990.2021.1946070]
Abstract
Background: In addiction research, outcome measures are often characterized by bimodal distributions. One mode can be for individuals with low substance use and the other mode for individuals with high substance use. Applying standard statistical procedures to bimodal data may result in invalid inference. Mixture models are appropriate for bimodal data because they assume that the sampled population is composed of several underlying subpopulations. Objectives: To introduce a novel mixture modeling approach to analyze bimodal substance use frequency data. Methods: We reviewed existing models used to analyze substance use frequency outcomes and developed multiple alternative variants of a finite mixture model. We applied all methods to data from a randomized controlled study in which 30-day alcohol abstinence was the primary outcome. Study data included 73 individuals (38 men and 35 women). Models were implemented in the software packages SAS, Stata, and Stan. Results: Shortcomings of existing approaches include: 1) inability to model outcomes with multiple modes, 2) invalid statistical inferences, including anti-conservative p-values, 3) sensitivity of results to the arbitrary choice to model days of substance use versus days of substance abstention, and 4) generation of predictions outside the range of common substance use frequency outcomes. Our mixture model variants avoided all of these shortcomings. Conclusions: Standard models of substance use frequency outcomes can be problematic, sometimes overstating treatment effectiveness. The mixture models developed improve the analysis of bimodal substance use frequency.
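As a minimal illustration of the mixture idea this abstract describes, the sketch below fits a two-component Gaussian mixture to synthetic bimodal "days of use" data with a hand-rolled EM loop. This is not the authors' model variants (which they implement in SAS, Stata, and Stan); the data, the initialization, and the Gaussian components are all illustrative assumptions.

```python
# Minimal EM for a two-component Gaussian mixture on bimodal data:
# one mode near 0 days of use, one near 30. Synthetic data only.
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(2, 1.0, 150),    # low-use mode
                    rng.normal(27, 1.5, 150)])  # high-use mode

# Initialize the two components at the data's extremes.
mu = np.array([x.min(), x.max()])
sigma = np.array([x.std(), x.std()])
pi = np.array([0.5, 0.5])               # mixing weights

for _ in range(100):
    # E-step: posterior responsibility of each component for each point.
    dens = (pi / (sigma * np.sqrt(2 * np.pi))
            * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2))
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and standard deviations.
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print(sorted(mu.round(1)))  # means should land near the two modes
```

The two fitted means recover the two latent subpopulations; a single-component model would instead place its mean between the modes, which is the kind of invalid summary the abstract warns about.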
Affiliation(s)
- Bing Han
- Economics, Sociology, and Statistics Department, RAND Corporation, Santa Monica, CA, USA
14
Ensari I, Caceres BA, Jackman KB, Suero-Tejeda N, Shechter A, Odlum ML, Bakken S. Digital phenotyping of sleep patterns among heterogenous samples of Latinx adults using unsupervised learning. Sleep Med 2021; 85:211-220. [PMID: 34364092 DOI: 10.1016/j.sleep.2021.07.023]
Abstract
OBJECTIVE This study aimed to identify sleep disturbance subtypes ("phenotypes") among Latinx adults based on objective sleep data using a flexible unsupervised machine learning technique. METHODS This study was an analysis of sleep data from three cross-sectional studies of the Precision in Symptom Self-Management Center at Columbia University. All studies focused on sleep health in Latinx adults at increased risk for sleep disturbance. Data on total sleep time (TST), time in bed (TIB), wake after sleep onset (WASO), sleep efficiency (SE), number of awakenings (NOA) and the mean length of nightly awakenings were collected using wrist-mounted accelerometers. Cluster analysis of the sleep data was conducted using an unsupervised machine learning approach that relies on mixtures of multivariate generalized linear mixed models. RESULTS The analytic sample included 494 days of data from 118 adults (ages 19-77). A 3-cluster model provided the best fit based on deviance indices (i.e., ΔD ≈ -75 and -17 relative to the 1- and 2-cluster models, respectively) and likelihood ratio (P_diff ≈ 0.93). Phenotype 1 (n = 64) was associated with greater likelihood of overall adequate SE and less variability in SE and WASO. Phenotype 2 (n = 11) was characterized by higher NOAs, and greater WASO and TIB than the other phenotypes. Phenotype 3 (n = 43) was characterized by greater variability in SE, bed times and awakening times. CONCLUSION Robust digital data-driven modeling approaches can be useful for detecting sleep phenotypes from heterogeneous patient populations, and have implications for designing precision sleep health strategies for management and early detection of sleep problems.
Affiliation(s)
- Ipek Ensari
- Columbia University Data Science Institute, New York, NY, 10025, USA.
- Billy A Caceres
- Columbia University Data Science Institute, New York, NY, 10025, USA; Columbia University School of Nursing, New York, NY, 10032, USA
- Kasey B Jackman
- Columbia University School of Nursing, New York, NY, 10032, USA; New York-Presbyterian Hospital, New York, 10032, USA
- Ari Shechter
- Columbia University Irving Medical Center, New York, NY, 10032, USA
- Suzanne Bakken
- Columbia University Data Science Institute, New York, NY, 10025, USA; Columbia University School of Nursing, New York, NY, 10032, USA
15
Goodin DA, Frieboes HB. Simulation of 3D centimeter-scale continuum tumor growth at sub-millimeter resolution via distributed computing. Comput Biol Med 2021; 134:104507. [PMID: 34157612 DOI: 10.1016/j.compbiomed.2021.104507]
Abstract
Simulation of cm-scale tumor growth has generally been constrained by the computational cost to numerically solve the associated equations, with models limited to representing mm-scale or smaller tumors. While this work has proven useful to the study of small tumors and micro-metastases, a biologically relevant simulation of cm-scale masses, as would typically be detected and treated in patients, has remained an elusive goal. This study presents a distributed computing (parallelized) implementation of a mixture model of tumor growth to simulate 3D cm-scale vascularized tissue at sub-mm resolution. The numerical solving scheme utilizes a two-stage parallelization framework: the Message Passing Interface (MPI) distributes information across multiple processes, freeing the program from the RAM and processing limitations of a single system, while on each system Nvidia's CUDA library performs the Multigrid-related computations on the GPU. The results show that a combined MPI-CUDA implementation enables the continuum modeling of cm-scale tumors at reasonable computational cost. Further work to calibrate model parameters to particular tumor conditions could enable simulation of patient-specific tumors for clinical application.
Affiliation(s)
- Dylan A Goodin
- Department of Bioengineering, University of Louisville, KY, USA
- Hermann B Frieboes
- Department of Bioengineering, University of Louisville, KY, USA; James Graham Brown Cancer Center, University of Louisville, KY, USA; Center for Predictive Medicine, University of Louisville, KY, USA.
16
Kundu S, Ming J, Nocera J, McGregor KM. Integrative learning for population of dynamic networks with covariates. Neuroimage 2021; 236:118181. [PMID: 34022384 PMCID: PMC8851385 DOI: 10.1016/j.neuroimage.2021.118181]
Abstract
Although there is a rapidly growing literature on dynamic connectivity methods, the primary focus has been on separate network estimation for each individual, which fails to leverage common patterns of information. We propose novel graph-theoretic approaches for estimating a population of dynamic networks that are able to borrow information across multiple heterogeneous samples in an unsupervised manner and guided by covariate information. Specifically, we develop a Bayesian product mixture model that imposes independent mixture priors at each time scan and uses covariates to model the mixture weights, which results in time-varying clusters of samples designed to pool information. The computation is carried out using an efficient Expectation-Maximization algorithm. Extensive simulation studies illustrate sharp gains in recovering the true dynamic network over existing dynamic connectivity methods. An analysis of fMRI block task data with behavioral interventions reveals subgroups of individuals having similar dynamic connectivity, and identifies intervention-related dynamic network changes that are concentrated in biologically interpretable brain regions. In contrast, existing dynamic connectivity approaches detect minimal or no changes in connectivity over time, which seems biologically unrealistic and highlights the challenges resulting from the inability to systematically borrow information across samples.
Affiliation(s)
- Suprateek Kundu
- Biostatistics Emory University, 1518 Clifton Road, Atlanta, GA 30322, USA.
- Jin Ming
- Biostatistics Emory University, 1518 Clifton Road, Atlanta, GA 30322, USA
- Joe Nocera
- Biostatistics Emory University, 1518 Clifton Road, Atlanta, GA 30322, USA
- Keith M McGregor
- Biostatistics Emory University, 1518 Clifton Road, Atlanta, GA 30322, USA
17
Panza KE, Kline AC, Norman GJ, Pitts M, Norman SB. Subgroups of comorbid PTSD and AUD in U.S. military veterans predict differential responsiveness to two integrated treatments: A latent class analysis. J Psychiatr Res 2021; 137:342-350. [PMID: 33756376 DOI: 10.1016/j.jpsychires.2021.02.061]
Abstract
Posttraumatic stress disorder (PTSD) and alcohol use disorder (AUD) frequently co-occur. Integrated treatments are effective, but not all patients respond and predicting outcome remains difficult. In this study, latent class analysis (LCA) identified symptom-based subgroups of comorbid PTSD/AUD among 119 veterans with PTSD/AUD from a randomized controlled trial of integrated exposure therapy (I-PE) versus integrated coping skills therapy (I-CS). Multilevel models compared subgroups on PTSD severity and percentage of heavy drinking days at post-treatment and 3- and 6-month follow-up. LCA revealed that three subgroups best fit the data: Moderate PTSD/Low AUD Impairment (21%), High PTSD/High AUD Impairment (48%), and Low PTSD/High AUD Impairment (31%). There was a three-way interaction between time, treatment condition, and subgroup in predicting PTSD outcomes (p < .05). For the Moderate PTSD/Low AUD Impairment class, outcomes at post-treatment and 3 months were similar (ds = 0.17, 0.55); however, I-PE showed greater reductions at 6 months (d = 1.36). For the High PTSD/High AUD Impairment class, I-PE demonstrated better post-treatment (d = 0.83) but comparable follow-up (ds = -0.18, 0.49) outcomes. For the Low PTSD/High AUD Impairment class, I-PE demonstrated stronger outcomes at every timepoint (ds = 0.82-1.15). Heavy drinking days declined significantly through follow-up, with an effect of subgroup, but not treatment, on timing of response. This was the first study modeling how PTSD and AUD symptoms might cluster together in a treatment sample of veterans with PTSD/AUD. Symptom-based subgroups show promise in helping understand variability in treatment response among patients with PTSD/AUD and deserve further study.
Affiliation(s)
- Kaitlyn E Panza
- VA San Diego Healthcare System, 3350 La Jolla Village Drive, San Diego, CA, 92161, USA; Department of Psychiatry, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA.
- Alexander C Kline
- VA San Diego Healthcare System, 3350 La Jolla Village Drive, San Diego, CA, 92161, USA; Department of Psychiatry, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Gregory J Norman
- Department of Family Medicine and Public Health, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Michelle Pitts
- VA San Diego Healthcare System, 3350 La Jolla Village Drive, San Diego, CA, 92161, USA
- Sonya B Norman
- VA San Diego Healthcare System, 3350 La Jolla Village Drive, San Diego, CA, 92161, USA; Department of Psychiatry, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA; National Center for Posttraumatic Stress Disorder, 163 Veterans Drive, White River Junction, VT, 05009, USA
18
Abstract
BACKGROUND Tissues are often heterogeneous in their single-cell molecular expression, and this can govern the regulation of cell fate. For the understanding of development and disease, it is important to quantify heterogeneity in a given tissue. RESULTS We present the R package stochprofML, which uses the maximum likelihood principle to parameterize heterogeneity from the cumulative expression of small random pools of cells. We evaluate the algorithm's performance in simulation studies and present further application opportunities. CONCLUSION The savings in experimental cost and effort, and the reduced measurement error, outweigh the demixing of pooled samples that stochastic profiling requires. The approach offers possibilities for parameterizing heterogeneity, estimating underlying pool compositions, and detecting differences in cell populations between samples.
Affiliation(s)
- Lisa Amrhein
- Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
- Department of Mathematics, Technical University Munich, Boltzmannstrasse 3, 85748 Garching, Germany
- Christiane Fuchs
- Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
- Department of Mathematics, Technical University Munich, Boltzmannstrasse 3, 85748 Garching, Germany
- Faculty of Business Administration and Economics, Bielefeld University, Universitätsstrasse 25, 33615 Bielefeld, Germany
19
Race JA, Pennell ML. Semi-parametric survival analysis via Dirichlet process mixtures of the First Hitting Time model. Lifetime Data Anal 2021; 27:177-194. [PMID: 33420544 DOI: 10.1007/s10985-020-09514-0]
Abstract
Time-to-event data often violate the proportional hazards assumption inherent in the popular Cox regression model. Such violations are especially common in biological and medical data, where latent heterogeneity due to unmeasured covariates or time-varying effects is common. A variety of parametric survival models have been proposed in the literature which make more appropriate assumptions on the hazard function, at least for certain applications. One such model is derived from the First Hitting Time (FHT) paradigm, which assumes that a subject's event time is determined by a latent stochastic process reaching a threshold value. Several random effects specifications of the FHT model have also been proposed which allow for better modeling of data with unmeasured covariates. While often appropriate, these methods can display limited flexibility due to their inability to model a wide range of heterogeneities. To address this issue, we propose a Bayesian model which loosens assumptions on the mixing distribution inherent in the random effects FHT models currently in use. We demonstrate via simulation study that the proposed model greatly improves both survival and parameter estimation in the presence of latent heterogeneity. We also apply the proposed methodology to data from a toxicology/carcinogenicity study which exhibits nonproportional hazards and contrast the results with both the Cox model and two popular FHT models.
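Dirichlet process mixture priors of the kind named in the title are commonly built by stick-breaking: beta-distributed "breaks" of a unit stick yield a countable sequence of component weights. The sketch below is that generic construction, truncated at K terms with an assumed concentration parameter; it is not the authors' survival model.

```python
# Stick-breaking construction of (truncated) Dirichlet process weights.
import numpy as np

rng = np.random.default_rng(5)
alpha, K = 1.0, 50                    # concentration, truncation level
breaks = rng.beta(1.0, alpha, K)      # fraction of the remaining stick
# Length of stick remaining before each break: 1, (1-b1), (1-b1)(1-b2), ...
remaining = np.concatenate([[1.0], np.cumprod(1.0 - breaks[:-1])])
weights = breaks * remaining          # component weights, sum to just under 1

print(round(float(weights.sum()), 3))
```

Smaller `alpha` concentrates mass on a few components, larger `alpha` spreads it out; a mixture over such weights can approximate a wide range of mixing distributions, which is the flexibility the abstract appeals to.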
Affiliation(s)
- Jonathan A Race
- Division of Biostatistics, College of Public Health, The Ohio State University, 1841 Neil Ave., Columbus, OH, 43210, USA
- Michael L Pennell
- Division of Biostatistics, College of Public Health, The Ohio State University, 1841 Neil Ave., Columbus, OH, 43210, USA.
20
Shirinkam S, Alaeddini A, Gross E. Identifying the number of components in Gaussian mixture models using numerical algebraic geometry. J Algebra Appl 2020; 19:2050204. [PMID: 33867617 PMCID: PMC8048412 DOI: 10.1142/s0219498820502047]
Abstract
Using Gaussian mixture models for clustering is a statistically mature approach with numerous successful applications in science and engineering. The parameters of a Gaussian mixture model are typically estimated from training data using the iterative expectation-maximization algorithm, which requires the number of Gaussian components a priori. In this study, we propose two algorithms rooted in numerical algebraic geometry, namely an area-based algorithm and a local maxima algorithm, to identify the optimal number of components. The area-based algorithm transforms several Gaussian mixture models with varying numbers of components into sets of equivalent polynomial regression splines. Next, it uses homotopy continuation methods for evaluating the resulting splines to identify the number of components that results in the best fit. The local maxima algorithm forms a set of polynomials by fitting a smoothing spline to a kernel density estimate of the data. Next, it uses numerical algebraic geometry to solve the system of first derivatives and find the local maxima of the resulting smoothing spline, which estimates the number of mixture components. The local maxima algorithm also identifies the location of the centers of the Gaussian components. Using a real-world case study in automotive manufacturing and multiple simulations, we compare the performance of the proposed algorithms with that of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are popular methods in the literature. We show the proposed algorithms are more robust than AIC and BIC when the Gaussian assumption is violated.
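A rough numerical stand-in for the local-maxima idea: estimate the number of components by counting modes of a kernel density estimate. The paper finds the maxima of a fitted spline exactly via numerical algebraic geometry; here, as a crude approximation, we simply scan a dense grid, and the three-component data are synthetic.

```python
# Count modes of a KDE as an estimate of the number of mixture components.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(-4, 1, 300),
                       rng.normal(0, 1, 300),
                       rng.normal(4, 1, 300)])  # three well-separated components

kde = gaussian_kde(data)                 # Gaussian KDE, Scott's-rule bandwidth
grid = np.linspace(data.min(), data.max(), 2000)
density = kde(grid)

# A grid point is a local maximum if it exceeds both of its neighbors.
is_max = (density[1:-1] > density[:-2]) & (density[1:-1] > density[2:])
n_components = int(is_max.sum())
print(n_components)
```

Grid scanning is cheap but can miss closely spaced or tail modes, which is precisely the kind of situation where solving the derivative system exactly, as the paper does, pays off.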
Affiliation(s)
- Sara Shirinkam
- Department of Mathematics and Statistics, University of the Incarnate Word, 4301 Broadway, CPO 311, San Antonio, TX 78209, USA
- Adel Alaeddini
- Department of Mechanical Engineering, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA
- Elizabeth Gross
- Department of Mathematics, University of Hawai'i at Mānoa, 2565 McCarthy Mall, Honolulu, Hawaii 96822, USA
21
Sakurai Y, Kurosaki T. How has the relationship between oil and the US stock market changed after the Covid-19 crisis? Financ Res Lett 2020; 37:101773. [PMID: 33046963 PMCID: PMC7540196 DOI: 10.1016/j.frl.2020.101773]
Abstract
In this paper, we investigate how the relationship between oil and the US stock market has changed after the onset of the Covid-19 crisis. To do so, we compute upside and downside correlations between the two markets. Our findings are as follows. First, we document a correlation asymmetry: the downside correlation is higher than the upside correlation. Second, we find that both upside and downside correlations increased after the crisis. This indicates that after the start of the Covid-19 crisis, a positive (negative) oil shock is even better (worse) news for the stock market than an equivalent shock before the crisis.
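Upside and downside (exceedance) correlations can be sketched as ordinary correlations computed only on observations where both series are above, or both below, a threshold. The snippet below uses synthetic returns rather than the oil and equity series, and thresholding at the median is an assumed convention, not necessarily the authors' exact definition.

```python
# Exceedance correlations on synthetic daily returns.
import numpy as np

rng = np.random.default_rng(3)
n = 2000
common = rng.normal(0, 1, n)               # shared market factor
oil = 0.6 * common + rng.normal(0, 1, n)
stock = 0.6 * common + rng.normal(0, 1, n)

# Days when both series are above (or both below) their medians.
up = (oil > np.median(oil)) & (stock > np.median(stock))
down = (oil < np.median(oil)) & (stock < np.median(stock))

up_corr = np.corrcoef(oil[up], stock[up])[0, 1]
down_corr = np.corrcoef(oil[down], stock[down])[0, 1]
print(round(up_corr, 2), round(down_corr, 2))
```

For symmetric synthetic data the two conditional correlations are similar; the paper's finding is that real oil-equity data break that symmetry, with the downside correlation exceeding the upside one.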
Affiliation(s)
- Yuji Sakurai
- Federal Reserve Bank of Richmond, Charlotte Office, 530 E Trade St, Charlotte, NC, 28202, USA
22
Pavlović DM, Guillaume BRL, Towlson EK, Kuek NMY, Afyouni S, Vértes PE, Yeo BTT, Bullmore ET, Nichols TE. Multi-subject Stochastic Blockmodels for adaptive analysis of individual differences in human brain network cluster structure. Neuroimage 2020; 220:116611. [PMID: 32058004 DOI: 10.1016/j.neuroimage.2020.116611]
Abstract
There is considerable interest in elucidating the cluster structure of brain networks in terms of modules, blocks or clusters of similar nodes. However, it is currently challenging to handle data on multiple subjects since most of the existing methods are applicable only on a subject-by-subject basis or for analysis of an average group network. The main limitation of per-subject models is that there is no obvious way to combine the results for group comparisons; the main limitation of group-averaged models is that they do not reflect the variability between subjects. Here, we propose two new extensions of the classical Stochastic Blockmodel (SBM) that use a mixture model to estimate blocks or clusters of connected nodes, combined with a regression model to capture the effects of subject-level covariates on individual differences in cluster structure. The proposed Multi-Subject Stochastic Blockmodels (MS-SBMs) can flexibly account for between-subject variability in terms of homogeneous or heterogeneous covariate effects on connectivity using subject demographics such as age or diagnostic status. Using synthetic data representing a range of block sizes and cluster structures, we investigate the accuracy of the estimated MS-SBM parameters as well as the validity of inference procedures based on the Wald, likelihood ratio and permutation tests. We show that the proposed multi-subject SBMs recover the true cluster structure of synthetic networks more accurately and adaptively than standard methods for modular decomposition (i.e. the Fast Louvain and Newman Spectral algorithms). Permutation tests of MS-SBM parameters were more robustly valid for statistical inference and Type I error control than tests based on standard asymptotic assumptions. Applied to analysis of multi-subject resting-state fMRI networks (13 healthy volunteers; 12 people with schizophrenia; n=268 brain regions), we show that the Heterogeneous Stochastic Blockmodel (Het-SBM) identifies a range of network topologies simultaneously, including modular and core structures.
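A toy version of the blockmodel setting: generate a single two-block SBM adjacency matrix and recover the blocks with a basic spectral step (the sign pattern of the second eigenvector). This is a one-network sketch under assumed block sizes and edge rates; the paper's MS-SBMs additionally pool across subjects and model covariate effects.

```python
# Generate a two-block Stochastic Blockmodel network and recover the blocks.
import numpy as np

rng = np.random.default_rng(4)
n = 60                       # 30 nodes per block (assumed sizes)
z = np.repeat([0, 1], 30)    # true block labels
p_in, p_out = 0.5, 0.05      # within- vs between-block edge probabilities

# Edge probabilities depend only on the pair of block labels (the SBM core idea).
prob = np.where(z[:, None] == z[None, :], p_in, p_out)
A = (rng.random((n, n)) < prob).astype(float)
A = np.triu(A, 1)
A = A + A.T                  # symmetric adjacency, no self-loops

# Second-largest eigenvector of A separates the two planted blocks.
vals, vecs = np.linalg.eigh(A)
v2 = vecs[:, -2]
labels = (v2 > 0).astype(int)

# Recovery accuracy up to an arbitrary label swap.
acc = max((labels == z).mean(), (labels != z).mean())
print(acc)
```

With `p_in` well above `p_out` the split is recovered almost perfectly; fitting the SBM parameters (rather than this spectral shortcut) is what lets the paper attach covariates and compare subjects.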
23
Kabeshova A, Yu Y, Lukacs B, Bacry E, Gaïffas S. ZiMM: A deep learning model for long term and blurry relapses with non-clinical claims data. J Biomed Inform 2020; 110:103531. [PMID: 32818667 DOI: 10.1016/j.jbi.2020.103531]
Abstract
This paper considers the problems of modeling and predicting a long-term and "blurry" relapse that occurs after a medical act, such as a surgery. We do not consider a short-term complication related to the act itself, but a long-term relapse that clinicians cannot explain easily, since it depends on unknown sets or sequences of past events that occurred before the act. The relapse is observed only indirectly, in a "blurry" fashion, through longitudinal prescriptions of drugs over a long period of time after the medical act. We introduce a new model, called ZiMM (Zero-inflated Mixture of Multinomial distributions), in order to capture long-term and blurry relapses. On top of it, we build an end-to-end deep-learning architecture called ZiMM Encoder-Decoder (ZiMM ED) that can learn from the complex, irregular, highly heterogeneous and sparse patterns of health events that are observed through a claims-only database. ZiMM ED is applied to a "non-clinical" claims database that contains only timestamped reimbursement codes for drug purchases, medical procedures and hospital diagnoses, the only available clinical feature being the age of the patient. This setting is more challenging than one where bedside clinical signals are available. Our motivation for using such a non-clinical claims database is its population-wide exhaustiveness, compared to clinical electronic health records coming from a single or a small set of hospitals. Indeed, we consider a dataset containing the claims of almost all French citizens who had surgery for prostatic problems, with a history between 1.5 and 5 years. We consider a long-term (18-month) relapse (urination problems still occur despite surgery), which is blurry since it is observed only through the reimbursement of a specific set of drugs for urination problems. Our experiments show that ZiMM ED improves on several baselines, including non-deep-learning and deep-learning approaches, and that it allows working on such a dataset with minimal preprocessing work.
Affiliation(s)
- Stéphane Gaïffas
- LPSM, Université de Paris, France; DMA, Ecole normale supérieure, Paris, France.
24
Lebeaux RM, Doherty BT, Gallagher LG, Zoeller RT, Hoofnagle AN, Calafat AM, Karagas MR, Yolton K, Chen A, Lanphear BP, Braun JM, Romano ME. Maternal serum perfluoroalkyl substance mixtures and thyroid hormone concentrations in maternal and cord sera: The HOME Study. Environ Res 2020; 185:109395. [PMID: 32222633 PMCID: PMC7657649 DOI: 10.1016/j.envres.2020.109395]
Abstract
BACKGROUND Per- and polyfluoroalkyl substances (PFAS) are ubiquitous. Previous studies have found associations between PFAS and thyroid hormones in maternal and cord sera, but the results are inconsistent. To further address this research question, we used mixture modeling to assess the associations with individual PFAS, interactions among PFAS chemicals, and the overall mixture. METHODS We collected data through the Health Outcomes and Measures of the Environment (HOME) Study, a prospective cohort study that between 2003 and 2006 enrolled 468 pregnant women and their children in the greater Cincinnati, Ohio region. We assessed the associations of maternal serum PFAS concentrations measured during pregnancy with maternal (n = 185) and cord (n = 256) sera thyroid stimulating hormone (TSH), total thyroxine (TT4), total triiodothyronine (TT3), free thyroxine (FT4), and free triiodothyronine (FT3) using two mixture modeling approaches (Bayesian kernel machine regression (BKMR) and quantile g-computation) and multivariable linear regression. Additional models considered thyroid autoantibodies, other non-PFAS chemicals, and iodine deficiency as potential confounders or effect measure modifiers. RESULTS PFAS, considered individually or as mixtures, were generally not associated with any thyroid hormones. A doubling of perfluorooctanesulfonic acid (PFOS) had a positive association with cord serum TSH in BKMR models, but the 95% credible interval included the null (β = 0.09; 95% CrI: -0.08, 0.27). Using BKMR and multivariable models, we found that among children born to mothers with higher thyroid peroxidase antibody (TPOAb), perfluorooctanoic acid (PFOA), PFOS, and perfluorohexanesulfonic acid (PFHxS) were associated with decreased cord FT4, suggesting modification by maternal TPOAb status. CONCLUSIONS These findings suggest that maternal serum PFAS concentrations measured in the second trimester of pregnancy are not strongly associated with thyroid hormones in maternal and cord sera. Further analyses using robust mixture models in other cohorts are required to corroborate these findings.
Affiliation(s)
- Rebecca M Lebeaux
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Brett T Doherty
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- R Thomas Zoeller
- Department of Biology, University of Massachusetts, Amherst, MA, USA
- Andrew N Hoofnagle
- Department of Laboratory Medicine, University of Washington, Seattle, WA, USA
- Antonia M Calafat
- Division of Laboratory Sciences, National Center for Environmental Health, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Margaret R Karagas
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Kimberly Yolton
- Division of General and Community Pediatrics, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Aimin Chen
- Division of Epidemiology, Department of Environmental Health, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Bruce P Lanphear
- Child and Family Research Institute, BC Children's and Women's Hospital and Faculty of Health Sciences, Simon Fraser University, Vancouver, British Columbia, Canada
- Joseph M Braun
- Department of Epidemiology, Brown University School of Public Health, Providence, RI, USA
- Megan E Romano
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA.
25
Wu J, Gupta M, Hussein AI, Gerstenfeld L. Bayesian modeling of factorial time-course data with applications to a bone aging gene expression study. J Appl Stat 2020; 48:1730-1754. [PMID: 34295011 DOI: 10.1080/02664763.2020.1772733]
Abstract
Many scientific studies, especially in the biomedical sciences, generate data measured simultaneously over a multitude of units, over a period of time, and under different conditions or combinations of factors. Often, an important question of interest is which units behave similarly under different conditions, but the variation over time complicates the analysis significantly. In this article we address such a problem arising from a gene expression study relating to bone aging, and develop a Bayesian statistical method that can simultaneously detect and uncover signals on three levels within such data: factorial, longitudinal, and transcriptional. Our model framework considers both cluster- and time-point-specific parameters, and these parameters uniquely determine the shapes of the temporal gene expression profiles, allowing the discovery and characterization of latent gene clusters based on similar underlying biological mechanisms. Our methodology was successfully applied to discover transcriptional networks in a microarray data set comparing the transcriptomic changes that occurred during bone aging in male and female mice expressing one or both copies of the bromodomain (Brd2) gene, a transcriptional regulator which exhibits an age-dependent sex-linked bone loss phenotype.
Affiliation(s)
- Joseph Wu
- Boston University School of Public Health, Boston, MA, USA
- Pfizer, Inc., Groton, CT, USA
26. Stein E, Witkiewitz K. Trait self-control predicts drinking patterns during treatment for alcohol use disorder and recovery up to three years following treatment. Addict Behav 2019; 99:106083. PMID: 31430618. DOI: 10.1016/j.addbeh.2019.106083.
Abstract
To more fully understand recovery from alcohol use disorder, we must consider several ways in which reductions in drinking and improvements in psychosocial functioning may occur. Previous research has demonstrated various patterns of drinking and functioning during and after behavioral treatment for alcohol use disorder, including groups of individuals who consume alcohol at low-risk levels and those who report occasional heavy drinking yet good psychosocial functioning. This study aimed to identify whether trait self-control, which has previously been associated with alcohol treatment outcomes, predicted drinking patterns during treatment as well as up to three years following treatment. Latent variable mixture modeling was used to identify seven classes of drinking patterns during treatment and four profiles of drinking and psychosocial functioning after treatment. We found that membership in the low-risk drinking class was predicted by greater trait self-control than several of the other classes, including the consistent abstinence class. Furthermore, we found that greater trait self-control predicted membership in two high-functioning recovery profiles at three years following treatment, including a high-functioning occasional heavy drinking profile. These findings suggest that self-control is an important predictor of recovery, particularly of non-abstinent recovery.
27.
Abstract
A prevailing notion in experimental psychology is that individuals' performance in a task varies gradually in a continuous fashion. In a Stroop task, for example, the true average effect may be 50 ms with a standard deviation of, say, 30 ms. In this case, some individuals will have effects greater than 50 ms, some will have smaller ones, and some are forecast to have effects that are negative in sign: they respond faster to incongruent items than to congruent ones! But are there people who have a true negative effect in the Stroop task, or in any other task? We highlight three qualitatively different effects: negative effects, null effects, and positive effects. The main goal of this paper is to develop models that allow researchers to explore whether all three are present in a task: Do all individuals show a positive effect? Are there individuals with truly no effect? Are there any individuals with negative effects? We develop a family of Bayesian hierarchical models that capture a variety of these constraints. We apply this approach to Stroop interference experiments and a near-liminal priming experiment where the prime may be below or above threshold for different people. We show that in most tasks people are quite alike: for example, everyone has a positive Stroop effect, and nobody fails to Stroop or Stroops negatively. We also show a case in which, under very specific circumstances, we could entice some people not to Stroop at all.
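The appeal of hierarchical modeling here can be seen in a much simpler empirical-Bayes sketch (illustrative only, not the authors' model): raw per-person effect estimates scatter more widely than the true effects, so some can appear negative even when every true effect is positive, and partial pooling corrects for this. All numbers below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sub, n_trial = 40, 50
mu, tau, sigma = 50.0, 20.0, 150.0  # ms: group mean, between-person sd, trial noise sd

theta = rng.normal(mu, tau, n_sub)                        # true individual effects
obs = theta[:, None] + rng.normal(0, sigma, (n_sub, n_trial))
ybar = obs.mean(axis=1)                                   # raw per-person means

# Normal-normal partial pooling, with variances treated as known for simplicity:
# posterior mean = w * raw mean + (1 - w) * grand mean
w = tau**2 / (tau**2 + sigma**2 / n_trial)
pooled = w * ybar + (1 - w) * ybar.mean()

print("raw means below zero:   ", int((ybar < 0).sum()))
print("pooled means below zero:", int((pooled < 0).sum()))
```

The shrunken estimates have strictly smaller spread than the raw means, which is why deciding between "truly negative" and "noisy positive" requires the model-based approach the paper develops.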
28.
Abstract
Various mixture modeling approaches have been proposed to identify within-subjects differences in the psychological processes underlying responses to psychometric tests. Although valuable, the existing mixture models are associated with at least one of the following three challenges: (1) A parametric distribution is assumed for the response times that—if violated—may bias the results; (2) the response processes are assumed to result in equal variances (homoscedasticity) in the response times, whereas some processes may produce more variability than others (heteroscedasticity); and (3) the different response processes are modeled as independent latent variables, whereas they may be related. Although each of these challenges has been addressed separately, in practice they may occur simultaneously. Therefore, we propose a heteroscedastic hidden Markov mixture model for responses and categorized response times that addresses all the challenges above in a single model. In a simulation study, we demonstrated that the model is associated with acceptable parameter recovery and acceptable resolution to distinguish between various special cases. In addition, the model was applied to the responses and response times of the WAIS-IV block design subtest, to demonstrate its use in practice.
29. Arshad U, Chasseloup E, Nordgren R, Karlsson MO. Development of visual predictive checks accounting for multimodal parameter distributions in mixture models. J Pharmacokinet Pharmacodyn 2019; 46:241-250. PMID: 30968312. PMCID: PMC6560505. DOI: 10.1007/s10928-019-09632-9.
Abstract
The assumption that interindividual variability is unimodally distributed in nonlinear mixed effects models does not hold when the population under study displays multimodal parameter distributions. Mixture models allow the identification of parameters characteristic of a subpopulation by describing these multimodalities. The visual predictive check (VPC) is a standard simulation-based diagnostic tool, but it has not yet been adapted to account for multimodal parameter distributions. Mixture model analysis provides the probability of an individual belonging to a subpopulation (IPmix) and the most likely subpopulation for an individual (MIXEST). Using simulated data examples, two implementation strategies were followed to split the data into subpopulations for the development of mixture-model-specific VPCs. The first strategy splits the observed and simulated data according to the MIXEST assignment. A shortcoming of the MIXEST-based allocation strategy was a biased allocation towards the dominating subpopulation. This shortcoming was avoided by splitting observed and simulated data according to the IPmix assignment. For illustration purposes, the approaches were also applied to an irinotecan mixture model demonstrating 36% lower clearance of the irinotecan metabolite (SN-38) in individuals with UGT1A1 homo-/heterozygote versus wild-type genotype. VPCs with segregated subpopulations were helpful in identifying model misspecifications that were not evident with standard VPCs. The new tool enhances the power of mixture model evaluation.
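The allocation bias can be reproduced in a toy setting. This sketch (illustrative only, not the authors' implementation) draws from a two-subpopulation mixture with a dominant group and compares a MIXEST-style hard assignment with an IPmix-style probability-weighted allocation:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, frac_b = 10_000, 0.2                      # subpopulation B is the minority
in_b = rng.random(n) < frac_b
x = np.where(in_b,
             rng.normal(2.0, 1.0, n),        # e.g. a log-parameter in group B
             rng.normal(0.0, 1.0, n))        # dominant group A

# Posterior probability of membership in B given x (Bayes' rule); this plays
# the role of IPmix, and thresholding it plays the role of MIXEST.
p_b = frac_b * norm.pdf(x, 2.0, 1.0)
p_b /= p_b + (1 - frac_b) * norm.pdf(x, 0.0, 1.0)

mixest_frac = np.mean(p_b > 0.5)  # hard assignment to the most likely group
ipmix_frac = np.mean(p_b)         # probability-weighted allocation
```

Hard assignment pushes borderline individuals toward the dominant group, so the minority fraction is underestimated, whereas the probability-weighted allocation is unbiased on average.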
Affiliation(s)
- Usman Arshad
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Faculty of Medicine and University Hospital Cologne, Center for Pharmacology, Department I of Pharmacology, University of Cologne, Gleueler Str 24, 50931, Cologne, Germany
- Estelle Chasseloup
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Rikard Nordgren
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Mats O Karlsson
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
30. Kazmi SO, Rodrigue N. Detecting amino acid preference shifts with codon-level mutation-selection mixture models. BMC Evol Biol 2019; 19:62. PMID: 30808289. PMCID: PMC6390532. DOI: 10.1186/s12862-019-1358-7.
Abstract
BACKGROUND In recent years, increasing attention has been placed on the development of phylogeny-based statistical methodologies for uncovering site-specific changes in amino acid fitness profiles over time. The few available random-effects approaches, modelling across-site variation in amino acid profiles as random variables drawn from a statistical law, either lack a mechanistic codon-level formulation, or pose significant computational challenges. RESULTS Here, we bring together a few existing ideas to explore a simple and fast method based on a predefined finite mixture of amino acid profiles within a codon-level substitution model following the mutation-selection formulation. Our study is focused on the detection of site-specific shifts in amino acid profiles over a known sub-clade of a tree, using simulations with and without shifts over the sub-clade to study the properties of the method. Through modifications of the values of the amino acid profiles, our simulations show different levels of reliability under different forms of finite mixture models. Sites identified by our method in a real data set show obvious overlap with those identified using previous methods, with some notable differences. CONCLUSION Overall, our results show that when a site-specific shift in amino acid profile is strongly pronounced, involving two clearly different sets of profiles, the method performs very well; but shifts between profiles that share many features are difficult to correctly identify, highlighting the challenging nature of the problem.
Affiliation(s)
- S Omar Kazmi
- Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, K1S 5B6, Canada
- Nicolas Rodrigue
- Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, K1S 5B6, Canada
- Institute of Biochemistry and School of Mathematics and Statistics, Carleton University, 1125 Colonel By Drive, Ottawa, K1S 5B6, Canada
31. Drewe-Boss P, Wessels HH, Ohler U. omniCLIP: probabilistic identification of protein-RNA interactions from CLIP-seq data. Genome Biol 2018; 19:183. PMID: 30384847. DOI: 10.1186/s13059-018-1521-2.
Abstract
CLIP-seq methods allow the generation of genome-wide maps of RNA-binding protein-RNA interaction sites. However, due to differences among CLIP-seq assays, existing computational approaches for analyzing the data can only be applied to a subset of assays. Here, we present a probabilistic model called omniCLIP that can detect regulatory elements in RNAs from the data of all CLIP-seq assays. omniCLIP jointly models data across replicates and can integrate background information. omniCLIP therefore greatly simplifies data analysis, increases the reliability of results, and paves the way for integrative studies based on data from different assays.
32.
Abstract
Pooled CRISPR screens allow researchers to interrogate genetic causes of complex phenotypes at the genome-wide scale and promise higher specificity and sensitivity compared to competing technologies. Unfortunately, two problems exist, particularly for CRISPRi/a screens: variability in guide efficiency and large rare off-target effects. We present a method, CRISPhieRmix, that resolves these issues by using a hierarchical mixture model with a broad-tailed null distribution. We show that CRISPhieRmix allows for more accurate and powerful inferences in large-scale pooled CRISPRi/a screens. We discuss key issues in the analysis and design of screens, particularly the number of guides needed for faithful full discovery.
Affiliation(s)
- Timothy P. Daley
- Department of Statistics, Stanford University, 450 Serra Mall, Stanford, 94305 USA
- Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, 94305 USA
- Zhixiang Lin
- Department of Statistics, Stanford University, 450 Serra Mall, Stanford, 94305 USA
- Present address: Department of Statistics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
- Xueqiu Lin
- Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, 94305 USA
- Yanxia Liu
- Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, 94305 USA
- Wing Hung Wong
- Department of Statistics, Stanford University, 450 Serra Mall, Stanford, 94305 USA
- Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford, 94305 USA
- Lei S. Qi
- Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, 94305 USA
- Department of Chemical and Systems Biology, Stanford University, 443 Via Ortega, Stanford, 94305 USA
- ChEM-H Institute, Stanford University, 443 Via Ortega, Stanford, 94305 USA
33.
Abstract
We consider the problem of multivariate density deconvolution, where interest lies in estimating the distribution of a vector-valued random variable X but precise measurements of X are not available, the observations being contaminated by measurement errors U. The existing sparse literature on the problem assumes the density of the measurement errors to be completely known. We propose robust Bayesian semiparametric multivariate deconvolution approaches for when the measurement error density of U is unknown but replicated proxies are available for at least some individuals. Additionally, we allow the variability of U to depend on the associated unobserved values of X through unknown relationships, which automatically includes the case of multivariate multiplicative measurement errors. Basic properties of finite mixture models, multivariate normal kernels, and exchangeable priors are exploited in novel ways to meet modeling and computational challenges. Theoretical results showing the flexibility of the proposed methods in capturing a wide variety of data-generating processes are provided. We illustrate the efficiency of the proposed methods in recovering the density of X through simulation experiments. The methodology is applied to estimate the joint consumption pattern of different dietary components from contaminated 24-hour recalls. The Supplementary Material presents substantive additional details.
Affiliation(s)
- Abhra Sarkar
- Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA
- Debdeep Pati
- Department of Statistics, Florida State University, Tallahassee, FL 32306-4330, USA
- Antik Chakraborty
- Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143, USA
- Bani K Mallick
- Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143, USA
- Raymond J Carroll
- Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143, USA, and School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway NSW 2007, Australia
34. Ravazzi C, Magli E. Improved iterative shrinkage-thresholding for sparse signal recovery via Laplace mixtures models. EURASIP J Adv Signal Process 2018; 2018:46. PMID: 30996728. PMCID: PMC6434991. DOI: 10.1186/s13634-018-0565-5.
Abstract
In this paper, we propose a new method for support detection and estimation of sparse and approximately sparse signals from compressed measurements. Using a double Laplace mixture model as the parametric representation of the signal coefficients, the problem is formulated as a weighted ℓ1 minimization. We then introduce a new family of iterative shrinkage-thresholding algorithms based on double Laplace mixture models. They preserve the computational simplicity of classical algorithms and improve iterative estimation by incorporating soft support detection. In particular, at each iteration, the algorithm learns the components that are likely to be nonzero from the current MAP signal estimate and adaptively tunes and optimizes the shrinkage-thresholding step. Unlike other adaptive methods, we are able to prove, under suitable conditions, the convergence of the proposed methods to a local minimum of the weighted ℓ1 minimization. Moreover, we also provide an upper bound on the reconstruction error. Finally, we show through numerical experiments that the proposed methods outperform classical shrinkage-thresholding in terms of rate of convergence, accuracy, and sparsity-undersampling trade-off.
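For orientation, the classical (unweighted) iterative shrinkage-thresholding baseline that the proposed family builds on can be sketched in a few lines; the Laplace-mixture reweighting itself is not reproduced here, and all problem sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 64, 256, 5                         # measurements, signal length, sparsity
A = rng.normal(size=(n, m)) / np.sqrt(n)     # random sensing matrix
x_true = np.zeros(m)
support = rng.choice(m, k, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], k) * rng.uniform(1.0, 3.0, k)
y = A @ x_true + 0.01 * rng.normal(size=n)

lam = 0.05                                   # l1 regularization weight
L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the gradient
x = np.zeros(m)
for _ in range(500):
    z = x - A.T @ (A @ x - y) / L            # gradient step on 0.5*||Ax - y||^2
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft thresholding
```

The paper's contribution is, roughly, to replace the fixed threshold `lam / L` with thresholds tuned per component from the mixture posterior at each iteration.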
Affiliation(s)
- Chiara Ravazzi
- National Research Council of Italy, IEIIT-CNR, c/o Politecnico di Torino, Corso Duca degli Abruzzi 24, Torino, 10129 Italy
- Enrico Magli
- National Research Council of Italy, IEIIT-CNR, c/o Politecnico di Torino, Corso Duca degli Abruzzi 24, Torino, 10129 Italy
- Politecnico di Torino, DET, Corso Duca degli Abruzzi 24, Torino, 10129 Italy
35. Hemmati H, Kamali-Asl A, Ghafarian P, Ay MR. Reconstruction/segmentation of attenuation map in TOF-PET based on mixture models. Ann Nucl Med 2018; 32:474-484. PMID: 29931622. DOI: 10.1007/s12149-018-1270-z.
Abstract
Attenuation correction is a necessary step in positron emission tomography (PET) to obtain accurate, quantitative activity images. The emission-based method is a promising approach for attenuation map estimation on TOF-PET scanners. The method proposed in this study imposes additional histogram-based information as a mixture-model prior on the emission-based approach, using a maximum a posteriori (MAP) framework, to improve its performance and produce a nearly segmented attenuation map. To reduce misclassification in the histogram modeling, a median root prior is incorporated to suppress noise between neighboring voxels and encourage spatial smoothness in the reconstructed attenuation map. The joint-MAP optimization is carried out iteratively, alternating activity and attenuation updates with a mixture decomposition of the attenuation map histogram. The proposed method can also segment the attenuation map during reconstruction. Evaluations in numerical, simulation, and real settings indicate that the method has the potential to be used stand-alone or combined with other methods for attenuation correction on PET/MR systems.
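The histogram-prior ingredient can be illustrated in isolation: attenuation values cluster around tissue classes, so a mixture fitted to the voxel histogram gives a soft segmentation. The sketch below uses scikit-learn on synthetic values; the class means (roughly air, soft tissue, and bone at 511 keV, in cm^-1) are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic voxel attenuation values (cm^-1): roughly air / soft tissue / bone
vals = np.concatenate([
    rng.normal(0.000, 0.002, 4000),
    rng.normal(0.096, 0.004, 5000),
    rng.normal(0.172, 0.006, 1000),
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=3, random_state=0).fit(vals)
order = np.argsort(gmm.means_.ravel())
labels = order.argsort()[gmm.predict(vals)]  # relabel: 0=air, 1=tissue, 2=bone
```

Here `gmm.predict_proba` provides the soft class memberships that a MAP reconstruction could use as a prior, while `labels` is the hard segmentation.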
36.
Abstract
Variance in longevity among individuals may arise as an effect of heterogeneity (differences in mortality rates experienced at the same age or stage) or as an effect of individual stochasticity (the outcome of random demographic events during the life cycle). Decomposing the variance into components due to heterogeneity and stochasticity is crucial for evolutionary analyses. In this study, we analyze longevity from ten laboratory studies of invertebrates and use the results to partition the variance in longevity into its components. To do so, we fit finite mixtures of Weibull survival functions to each data set by maximum likelihood, using the EM algorithm, and use the Bayesian Information Criterion to select the best-supported model. The results of the mixture analysis were used to construct an age × stage-classified matrix model, with heterogeneity groups as stages, from which we calculated the variance in longevity and its components. Almost all data sets revealed evidence of some degree of heterogeneity. The median contribution of unobserved heterogeneity to the total variance was 35%, with the remaining 65% due to stochasticity. The differences among groups in mean longevity were typically on the order of 30% of the overall life expectancy. There was considerable variation among data sets in both the magnitude of heterogeneity and the proportion of variance due to heterogeneity, but no clear patterns were apparent in relation to sex, taxon, or environmental conditions.
Affiliation(s)
- Nienke Hartemink
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, P.O. Box 94248, 1090 GE Amsterdam, The Netherlands
- Hal Caswell
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, P.O. Box 94248, 1090 GE Amsterdam, The Netherlands
37. Klukowski P, Augoff M, Zamorski M, Gonczarek A, Walczak MJ. Application of Dirichlet process mixture model to the identification of spin systems in protein NMR spectra. J Biomol NMR 2018; 71:11-18. PMID: 29777498. DOI: 10.1007/s10858-018-0185-2.
Abstract
Analysis of the structure, function, and interactions of proteins by NMR spectroscopy usually requires the assignment of resonances to the corresponding nuclei in the protein. This task, although automated by methods such as FLYA or PINE, is still frequently performed manually. To facilitate the manual sequence-specific chemical shift assignment of complex proteins, we propose a method based on a Dirichlet process mixture model (DPMM) that performs automated matching of groups of signals observed in NMR spectra to the corresponding nuclei in the protein sequence. The model has been extensively tested on 80 proteins retrieved from the BMRB database and has shown superior performance to the reference method.
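The core idea, letting a Dirichlet process prior infer the number of signal groups rather than fixing it, can be sketched with scikit-learn's truncated variational DPMM on hypothetical 2D peak coordinates; the paper's actual model and features are more elaborate.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# hypothetical (1H, 15N) peak coordinates from three well-separated spin systems
centers = np.array([[7.4, 115.0], [8.1, 120.0], [8.9, 125.0]])
peaks = np.vstack([c + rng.normal(0, [0.02, 0.2], size=(30, 2)) for c in centers])

# truncated Dirichlet process mixture: surplus components receive ~zero weight
dpmm = BayesianGaussianMixture(
    n_components=10,                                  # truncation level
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    max_iter=500,
    random_state=0,
).fit(peaks)

n_groups = len(set(dpmm.predict(peaks).tolist()))     # clusters actually used
```

Although ten components are allowed, the DP prior concentrates nearly all the mixture weight on as many components as the data support.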
Affiliation(s)
- Piotr Klukowski
- Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland
- Michał Augoff
- Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland
- Maciej Zamorski
- Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland
- Adam Gonczarek
- Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland
- Alphamoon Ltd., ul. Włodkowica 21/3, 50-072, Wrocław, Poland
- Michał J Walczak
- Captor Therapeutics Ltd., ul. Duńska 11, 54-427, Wrocław, Poland
- Alphamoon Ltd., ul. Włodkowica 21/3, 50-072, Wrocław, Poland
38.
Abstract
In statistical modeling, it is often of interest to evaluate non-negative quantities that capture heterogeneity in the population such as variances, mixing proportions and dispersion parameters. In instances of covariate-dependent heterogeneity, the implied homogeneity hypotheses are nonstandard and existing inferential techniques are not applicable. In this paper, we develop a quasi-score test statistic to evaluate homogeneity against heterogeneity that varies with a covariate profile through a regression model. We establish the limiting null distribution of the proposed test as a functional of mixtures of chi-square processes. The methodology does not require the full distribution of the data to be entirely specified. Instead, a general estimating function for a finite dimensional component of the model that is of interest is assumed but other characteristics of the population are left completely unspecified. We apply the methodology to evaluate the excess zero proportion in zero-inflated models for count data. Our numerical simulations show that the proposed test can greatly improve efficiency over tests of homogeneity that neglect covariate information under the alternative hypothesis. An empirical application to dental caries indices demonstrates the importance and practical utility of the methodology in detecting excess zeros in the data.
Affiliation(s)
- David Todem
- Michigan State University, East Lansing, USA
39. Hernández-Alava M, Pudney S. Econometric modelling of multiple self-reports of health states: The switch from EQ-5D-3L to EQ-5D-5L in evaluating drug therapies for rheumatoid arthritis. J Health Econ 2017; 55:139-152. PMID: 28778350. PMCID: PMC5597047. DOI: 10.1016/j.jhealeco.2017.06.013.
Abstract
EQ-5D is used in cost-effectiveness studies underlying many important health policy decisions. It comprises a survey instrument describing health states across five domains, and a system of utility values for each state. The original 3-level version of EQ-5D is being replaced with a more sensitive 5-level version but the consequences of this change are uncertain. We develop a multi-equation ordinal response model incorporating a copula specification with normal mixture marginals to analyse joint responses to EQ-5D-3L and EQ-5D-5L in a survey of people with rheumatic disease, and use it to generate mappings between the alternative descriptive systems. We revisit a major cost-effectiveness study of drug therapies for rheumatoid arthritis, mapping the original EQ-5D-3L measure onto a 5L valuation basis. Working within a comprehensive, flexible econometric framework, we find that use of simpler restricted specifications can make very large changes to cost-effectiveness estimates with serious implications for decision-making.
Affiliation(s)
- Stephen Pudney
- School of Health and Related Research, University of Sheffield, UK
40. Zuanetti DA, Milan LA. A generalized mixture model applied to diabetes incidence data. Biom J 2017; 59:826-842. PMID: 28321898. DOI: 10.1002/bimj.201600086.
Abstract
We present a generalization of the usual (independent) mixture model that accommodates a Markovian first-order mixing distribution. We propose the data-driven reversible jump, a Markov chain Monte Carlo (MCMC) procedure, for estimating the a posteriori probability of each model in a model selection procedure and estimating the corresponding parameters. Simulated datasets show excellent performance of the proposed method in convergence, model selection, and precision of parameter estimates. Finally, we apply the proposed method to analyze USA diabetes incidence datasets.
Affiliation(s)
- Daiane Aparecida Zuanetti
- Departamento de Estatística, Universidade Federal de São Carlos, Rod. Washington Luís, Km 235, SP 310, São Carlos, São Paulo, 13565-905, Brazil
- Luis Aparecido Milan
- Departamento de Estatística, Universidade Federal de São Carlos, Rod. Washington Luís, Km 235, SP 310, São Carlos, São Paulo, 13565-905, Brazil
41. Farcomeni A. Penalized estimation in latent Markov models, with application to monitoring serum calcium levels in end-stage kidney insufficiency. Biom J 2017; 59:1035-1046. PMID: 28593719. DOI: 10.1002/bimj.201700007.
Abstract
We introduce a penalized likelihood form for latent Markov models. We motivate its use for biomedical applications where the sample size is on the order of tens, or at most hundreds, and there are only a few repeated measures. The resulting estimates never break down, whereas spurious solutions are often obtained by maximizing the likelihood itself. We discuss model choice based on the Takeuchi Information Criterion. Simulations and a real-data application to monitoring serum calcium levels in end-stage kidney disease are used for illustration.
Affiliation(s)
- Alessio Farcomeni
- Department of Public Health and Infectious Diseases, Sapienza University of Rome, Piazzale Aldo Moro 5, 00185, Roma, Italy
42. Tayob N, Stingo F, Do KA, Lok ASF, Feng Z. A Bayesian screening approach for hepatocellular carcinoma using multiple longitudinal biomarkers. Biometrics 2017; 74:249-259. PMID: 28482112. DOI: 10.1111/biom.12717.
Abstract
Advanced hepatocellular carcinoma (HCC) has limited treatment options and poor survival, so early detection is critical to improving the survival of patients with HCC. Current guidelines for high-risk patients include ultrasound screening every six months, but ultrasound is operator dependent and not sensitive for early HCC. Serum α-fetoprotein (AFP) is a widely used diagnostic biomarker, but it has limited sensitivity and is not elevated in all HCC cases, so we incorporate a second blood-based biomarker, des-γ-carboxy prothrombin (DCP), which has shown potential as a screening marker for HCC. The data from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial are a valuable source for studying biomarker screening for HCC. We assume the trajectories of AFP and DCP follow a joint hierarchical mixture model with random changepoints that allows for distinct changepoint times and subsequent trajectories for each biomarker. The changepoint indicators are jointly modeled with a Markov random field distribution to help detect borderline changepoints. Markov chain Monte Carlo methods are used to calculate posterior distributions, which are used in risk calculations for future patients and to determine whether a patient has a positive screen. The screening algorithm was compared to alternatives in simulation studies under a range of possible scenarios and in the HALT-C Trial using cross-validation.
Affiliation(s)
- Nabihah Tayob
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Francesco Stingo
- Department of Statistics, Computer Science, Applications "G. Parenti", University of Florence, Florence, Italy
- Kim-Anh Do
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Anna S F Lok
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Ziding Feng
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
43
|
Papastamoulis P, Rattray M. A Bayesian model selection approach for identifying differentially expressed transcripts from RNA sequencing data. J R Stat Soc Ser C Appl Stat 2017; 67:3-23. [PMID: 29353941 PMCID: PMC5763373 DOI: 10.1111/rssc.12213] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Abstract
Recent advances in molecular biology allow the quantification of the transcriptome and the scoring of transcripts as differentially or equally expressed between two biological conditions. Although these two tasks are closely linked, the available inference methods treat them separately: a primary model is used to estimate expression, and its output is post-processed using a differential expression model. In this paper, both issues are addressed simultaneously by proposing the joint estimation of expression levels and differential expression: the unknown relative abundance of each transcript can either be equal or not between the two conditions. A hierarchical Bayesian model builds on the BitSeq framework, and the posterior distribution of transcript expression and differential expression is inferred using Markov chain Monte Carlo sampling. It is shown that the proposed model enjoys conjugacy for fixed-dimension variables; thus the full conditional distributions are derived analytically. Two samplers are constructed, a reversible jump Markov chain Monte Carlo sampler and a collapsed Gibbs sampler, and the latter is found to perform better. A cluster representation of the reads aligned to the transcriptome is introduced, allowing parallel estimation of the marginal posterior distribution of subsets of transcripts in reasonable computing time. Under a fixed prior probability of differential expression, the clusterwise sampler has the same marginal posterior distributions as the raw sampler, but a more general prior structure is also employed. The proposed algorithm is benchmarked against alternative methods using synthetic data sets and applied to real RNA sequencing data. Source code is available online from https://github.com/mqbssppe/cjBitSeq.
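The core model-selection idea, that each transcript's relative abundance is either shared or different between conditions and the two hypotheses are compared via conjugate marginal likelihoods, can be sketched with a toy Beta-Binomial analogue. This is an illustrative stand-in, not the paper's BitSeq-based sampler; the read counts, the Beta(1, 1) prior, and the 0.5 prior probability of differential expression are all hypothetical:

```python
import numpy as np
from scipy.special import betaln, gammaln

def log_binom(n, k):
    """Log binomial coefficient via log-gamma."""
    return gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)

def log_marginal(ks, ns, a=1.0, b=1.0):
    """Log marginal likelihood of Binomial counts under one shared
    Beta(a, b)-distributed success probability (conjugate, closed form)."""
    k, n = sum(ks), sum(ns)
    return (sum(log_binom(ni, ki) for ni, ki in zip(ns, ks))
            + betaln(a + k, b + n - k) - betaln(a, b))

def prob_de(k1, n1, k2, n2, prior_de=0.5):
    """Posterior probability that the two conditions have different proportions."""
    log_m_eq = log_marginal([k1, k2], [n1, n2])                     # shared abundance
    log_m_de = log_marginal([k1], [n1]) + log_marginal([k2], [n2])  # separate abundances
    log_bf = log_m_de - log_m_eq                                    # Bayes factor for DE
    odds = np.exp(log_bf) * prior_de / (1 - prior_de)
    return odds / (1 + odds)
```

With clearly different counts the posterior probability of differential expression approaches one; with near-identical counts the shared-abundance model wins and the probability falls below one half.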
Collapse
|
44
|
Abstract
BACKGROUND With the current microarray and RNA-seq technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies, and the related differential expression analysis and gene set enrichment analysis are frequently conducted. Integrative analysis can be conducted when multiple data sets are available. In practice, discordant molecular behaviors among a series of data sets can be of biological and clinical interest. METHODS In this study, a statistical method is proposed for detecting discordance gene set enrichment. Our method is based on a two-level multivariate normal mixture model. It is statistically efficient, with a parameter space that grows only linearly as the number of data sets increases. The model-based probability of discordance enrichment can be calculated for gene set detection. RESULTS We apply our method to a microarray expression data set collected from forty-five matched tumor/non-tumor pairs of tissues for studying pancreatic cancer. We divided the data set into a series of non-overlapping subsets according to the tumor/non-tumor paired expression ratio of gene PNLIP (pancreatic lipase, recently shown to be associated with pancreatic cancer). The log-ratio ranges from negative values (more expressed in non-tumor tissue) to positive values (more expressed in tumor tissue). Our purpose is to understand whether any gene sets are enriched in discordant behaviors among these subsets as the log-ratio increases from negative to positive. We focus on KEGG pathways. The detected pathways will be useful for further understanding the role of gene PNLIP in pancreatic cancer research. Among the top detected pathways, the neuroactive ligand receptor interaction and olfactory transduction pathways are the two most significant. We then consider gene TP53, which is well known for its role as a tumor suppressor in cancer research, and divided the microarray data set again according to the expression ratio of gene TP53. After the discordance enrichment analysis, we observed overall similar results, and the above two pathways remain the most significant detections. More interestingly, only these two pathways have been identified as associated with pancreatic cancer in a pathway analysis of genome-wide association study (GWAS) data. CONCLUSIONS This study illustrates that some disease-related pathways can be enriched in discordant molecular behaviors when an important disease-related gene changes its expression. Our proposed statistical method is useful in the detection of these pathways. Furthermore, our method can also be applied to genome-wide expression data collected by the recent RNA-seq technology.
Collapse
Affiliation(s)
- Yinglei Lai
- Department of Statistics, The George Washington University, 801 22nd St. N.W., Rome Hall, 7th Floor, Washington, D.C. 20052, USA.
| | - Fanni Zhang
- Department of Statistics, The George Washington University, 801 22nd St. N.W., Rome Hall, 7th Floor, Washington, D.C. 20052, USA
| | - Tapan K Nayak
- Department of Statistics, The George Washington University, 801 22nd St. N.W., Rome Hall, 7th Floor, Washington, D.C. 20052, USA
| | - Reza Modarres
- Department of Statistics, The George Washington University, 801 22nd St. N.W., Rome Hall, 7th Floor, Washington, D.C. 20052, USA
| | - Norman H Lee
- Department of Pharmacology and Physiology, The George Washington University Medical Center, Washington, D.C. 20037, USA
| | - Timothy A McCaffrey
- Department of Medicine, Division of Genomic Medicine, The George Washington University Medical Center, Washington, D.C. 20037, USA
| |
Collapse
|
45
|
Laporte F, Charcosset A, Mary-Huard T. Estimation of the relatedness coefficients from biallelic markers, application in plant mating designs. Biometrics 2017; 73:885-894. [PMID: 28084017 DOI: 10.1111/biom.12634] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2015] [Revised: 10/01/2016] [Accepted: 10/01/2016] [Indexed: 01/22/2023]
Abstract
The problem of inferring the relatedness distribution between two individuals from biallelic marker data is considered. This problem can be cast as an estimation task in a mixture model: at each marker the latent variable is the relatedness state, and the observed variable is the genotype of the two individuals. In this model, only the prior proportions are unknown, and can be obtained via ML estimation using the EM algorithm. When the markers are biallelic and the data unphased, the identifiability of the model is known not to be guaranteed. In this article, model identifiability is investigated in the case of phased data generated from a crossing design, a classical situation in plant genetics. It is shown that identifiability can be guaranteed under some conditions on the crossing design. The adapted ML estimator is implemented in an R package called Relatedness. The performance of the ML estimator is evaluated and compared to that of the benchmark moment estimator, both on simulated and real data. Compared to its competitor, the ML estimator is shown to be more robust and to provide more realistic estimates.
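The estimation setting described above, in which the component densities are known at each marker and only the prior proportions are unknown, makes the EM algorithm particularly simple: the E-step computes posterior state responsibilities and the M-step averages them. A minimal numpy sketch of that generic structure (illustrative only; the Relatedness R package implements the actual crossing-design method):

```python
import numpy as np

def em_mixing_proportions(lik, n_iter=200, tol=1e-10):
    """ML estimation of mixture proportions when only they are unknown.

    lik: (n_markers, n_states) array of known component likelihoods,
         P(observed genotype at marker i | relatedness state j).
    Returns the estimated state proportions (summing to one).
    """
    n, k = lik.shape
    pi = np.full(k, 1.0 / k)                 # uniform start
    for _ in range(n_iter):
        w = lik * pi                         # E-step: unnormalised posteriors
        w /= w.sum(axis=1, keepdims=True)    # normalise per marker
        pi_new = w.mean(axis=0)              # M-step: average responsibility
        if np.max(np.abs(pi_new - pi)) < tol:
            return pi_new
        pi = pi_new
    return pi
```

Because the likelihood is concave in the proportions for fixed component densities, this iteration converges reliably from any interior starting point.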
Collapse
Affiliation(s)
- Fabien Laporte
- INRA, UMR 0320 / UMR 8120 Génétique Quantitative et Évolution-Le Moulon F-91190 Gif-sur-Yvette, France
| | - Alain Charcosset
- INRA, UMR 0320 / UMR 8120 Génétique Quantitative et Évolution-Le Moulon F-91190 Gif-sur-Yvette, France
| | - Tristan Mary-Huard
- INRA, UMR 0320 / UMR 8120 Génétique Quantitative et Évolution-Le Moulon, F-91190 Gif-sur-Yvette, France; AgroParisTech/INRA, UMR 518 MIA-Paris, F-75231 Paris Cedex 05, France
| |
Collapse
|
46
|
Smith DK, Smith LE, Billings FT, Blume JD. A general approach to risk modeling using partial surrogate markers with application to perioperative acute kidney injury. Diagn Progn Res 2017; 1:21. [PMID: 31093550 PMCID: PMC6460789 DOI: 10.1186/s41512-017-0022-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/11/2017] [Accepted: 12/05/2017] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Surrogate outcomes are often utilized when disease outcomes are difficult to measure directly. When a biological threshold effect exists, surrogate outcomes may represent disease only in specific subpopulations. We refer to these outcomes as "partial surrogate outcomes." We hypothesized that risk models of partial surrogate outcomes would perform poorly if they fail to account for this population heterogeneity. We developed criteria for predictive model development using partial surrogate outcomes and demonstrate their importance in model selection and evaluation within the clinical example of serum creatinine, a partial surrogate outcome for acute kidney injury. METHODS Data from 4737 patients who underwent cardiac surgery at a major academic center were obtained. Linear and mixture models were fit on maximum 2-day serum creatinine change as a surrogate for estimated glomerular filtration rate at 90 days after surgery (eGFR90), adjusted for known AKI risk factors. The AUC for eGFR90 decline and Spearman's rho were calculated to compare model discrimination between the linear model and a single component of the mixture model deemed to represent the informative subpopulation. Simulation studies based on the clinical data were conducted to further demonstrate the consistency and limitations of the procedure. RESULTS The mixture model was highly favored over the linear model, with BICs of 2131.3 and 5034.3, respectively. When model discrimination was evaluated with respect to the partial surrogate, the linear model displayed superior performance (p < 0.001); however, when it was evaluated with respect to the target outcome, the mixture model approach displayed superior performance (AUC difference p = 0.002; Spearman's difference p = 0.020). Simulation studies demonstrate that the nature of the heterogeneity determines the magnitude of any advantage of the mixture model.
CONCLUSIONS Partial surrogate outcomes add complexity and limitations to risk score modeling, including the potential for the usual metrics of discrimination to be misleading. Partial surrogacy can be potentially uncovered and appropriately accounted for using a mixture model approach. Serum creatinine behaved as a partial surrogate outcome consistent with two patient subpopulations, one representing patients whose injury did not exceed their renal functional reserve and a second population representing patients whose injury did exceed renal functional reserve.
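The BIC comparison between a single homogeneous model and a two-component mixture can be illustrated on simulated data. All numbers below are hypothetical, and scikit-learn's GaussianMixture stands in for the authors' models, which additionally adjust for AKI risk factors:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Simulated stand-in for a partial surrogate: an uninformative subpopulation
# (little creatinine change) mixed with an informative, shifted one.
uninformative = rng.normal(0.0, 0.3, size=(700, 1))
informative = rng.normal(1.5, 0.5, size=(300, 1))
y = np.vstack([uninformative, informative])

# Fit one- and two-component Gaussian models and compare by BIC (lower is better).
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(y).bic(y)
        for k in (1, 2)}
best_k = min(bics, key=bics.get)
```

With genuinely heterogeneous data, BIC favors the two-component model, echoing the large BIC gap the authors report between their mixture and linear fits.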
Collapse
Affiliation(s)
- Derek K. Smith
- Department of Biostatistics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 11000, Nashville, TN 37212, USA
| | - Loren E. Smith
- Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Frederic T. Billings
- Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Jeffrey D. Blume
- Department of Biostatistics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 11000, Nashville, TN 37212, USA
| |
Collapse
|
47
|
Niklitschek EJ, Darnaude AM. Performance of maximum likelihood mixture models to estimate nursery habitat contributions to fish stocks: a case study on sea bream Sparus aurata. PeerJ 2016; 4:e2415. [PMID: 27761305 PMCID: PMC5068389 DOI: 10.7717/peerj.2415] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2016] [Accepted: 08/05/2016] [Indexed: 11/21/2022] Open
Abstract
Background Mixture models (MM) can be used to describe mixed stocks considering three sets of parameters: the total number of contributing sources, their chemical baseline signatures, and their mixing proportions. When all nursery sources have been previously identified and sampled for juvenile fish to produce baseline nursery-signatures, mixing proportions are the only unknown set of parameters to be estimated from the mixed-stock data. Otherwise, the number of sources, as well as some or all nursery-signatures, may also need to be estimated from the mixed-stock data. Our goal was to assess bias and uncertainty in these MM parameters when estimated using unconditional maximum likelihood approaches (ML-MM), under several incomplete sampling and nursery-signature separation scenarios. Methods We used a comprehensive dataset containing otolith elemental signatures of 301 juvenile Sparus aurata, sampled in three contrasting years (2008, 2010, 2011), from four distinct nursery habitats (Mediterranean lagoons). Artificial nursery-source and mixed-stock datasets were produced considering five different sampling scenarios, where 0–4 lagoons were excluded from the nursery-source dataset, and six nursery-signature separation scenarios that simulated data separated by 0.5, 1.5, 2.5, 3.5, 4.5 and 5.5 standard deviations among nursery-signature centroids. Bias (BI) and uncertainty (SE) were computed to assess reliability for each of the three sets of MM parameters. Results Both bias and uncertainty in mixing proportion estimates were low (BI ≤ 0.14, SE ≤ 0.06) when all nursery-sources were sampled, but they exhibited large variability among cohorts and increased with the number of non-sampled sources, up to BI = 0.24 and SE = 0.11. Bias and variability in baseline signature estimates also increased with the number of non-sampled sources, but these estimates tended to be less biased, and more uncertain, than mixing proportion estimates across all sampling scenarios (BI < 0.13, SE < 0.29). Increasing separation among nursery signatures improved the reliability of mixing proportion estimates but led to non-linear responses in baseline signature parameters. Low uncertainty, but a consistent underestimation bias, affected the estimated number of nursery sources across all incomplete sampling scenarios. Discussion ML-MM produced reliable estimates of mixing proportions and nursery-signatures under an important range of incomplete sampling and nursery-signature separation scenarios. The method failed, however, to estimate the true number of nursery sources, reflecting a pervasive issue affecting mixture models within and beyond the ML framework. Large differences in bias and uncertainty found among cohorts were linked to differences in the separation of chemical signatures among nursery habitats. Simulation approaches such as those presented here could be useful for evaluating the sensitivity of MM results to separation and variability in nursery-signatures for other species, habitats, or cohorts.
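The effect of centroid separation on mixing-proportion estimates can be explored with a small simulation in the same spirit: a one-dimensional, two-source sketch with hypothetical parameters, not the otolith data, using scikit-learn's GaussianMixture as an unconditional ML fitter:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def proportion_bias(separation_sd, p=0.7, n=2000, seed=0):
    """Absolute bias in the estimated mixing proportion of source 1 when the
    two source centroids are `separation_sd` standard deviations apart."""
    rng = np.random.default_rng(seed)
    n1 = rng.binomial(n, p)                      # true source memberships
    x = np.concatenate([rng.normal(0.0, 1.0, n1),
                        rng.normal(separation_sd, 1.0, n - n1)])[:, None]
    gm = GaussianMixture(n_components=2, random_state=seed).fit(x)
    # Match components to sources by mean ordering, then read off the weight.
    p_hat = gm.weights_[np.argsort(gm.means_.ravel())][0]
    return abs(p_hat - p)
```

At wide separations the estimated proportion is close to the truth; as the centroids approach within a standard deviation or two, bias and instability grow, mirroring the cohort differences reported above.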
Collapse
Affiliation(s)
| | - Audrey M Darnaude
- Center for Marine Biodiversity, Exploitation & Conservation, Centre National de la Recherche Scientifique , Montpellier , France
| |
Collapse
|
48
|
Ruhi S, Karim MR. Selecting statistical model and optimum maintenance policy: a case study of hydraulic pump. Springerplus 2016; 5:969. [PMID: 27429879 PMCID: PMC4932024 DOI: 10.1186/s40064-016-2619-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/01/2015] [Accepted: 06/17/2016] [Indexed: 11/18/2022]
Abstract
Introduction A proper maintenance policy can play a vital role in the effective investigation of product reliability. Every engineered object, such as a product, plant, or piece of infrastructure, needs preventive and corrective maintenance. Case description This paper looks at a real case study concerning the maintenance of hydraulic pumps used in excavators by a mining company. We obtained the data the owner had collected and carried out an analysis, building models for pump failures. The data consist of both failure and censored lifetimes of the hydraulic pump. Discussion and evaluation Different competing mixture models are applied to analyze a set of maintenance data for a hydraulic pump. Various characteristics of the mixture models, such as the cumulative distribution function, reliability function, and mean time to failure, are estimated to assess the reliability of the pump. The Akaike information criterion, adjusted Anderson–Darling test statistic, Kolmogorov–Smirnov test statistic, and root mean square error are used to select suitable models from a set of competing models. The maximum likelihood estimation method, via the EM algorithm, is applied to estimate the parameters of the models and reliability-related quantities. Conclusions In this study, a threefold mixture model (Weibull–Normal–Exponential) is found to fit the hydraulic pump failure data set well. The paper also illustrates how a suitable statistical model can be applied to estimate the optimum maintenance period at minimum cost for a hydraulic pump.
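The AIC-based selection step can be sketched with scipy for two of the simpler candidate lifetime models. This sketch handles complete, uncensored data only; the paper's EM-based approach additionally handles censoring and full Weibull–Normal–Exponential mixtures:

```python
from scipy import stats

def aic(loglik, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * loglik

def select_lifetime_model(times):
    """Choose between Weibull and Exponential lifetime models by AIC."""
    # Weibull: shape and scale fitted, location fixed at zero (2 parameters).
    c, loc_w, scale_w = stats.weibull_min.fit(times, floc=0)
    ll_w = stats.weibull_min.logpdf(times, c, loc_w, scale_w).sum()
    # Exponential: scale fitted, location fixed at zero (1 parameter).
    loc_e, scale_e = stats.expon.fit(times, floc=0)
    ll_e = stats.expon.logpdf(times, loc_e, scale_e).sum()
    scores = {"weibull": aic(ll_w, 2), "exponential": aic(ll_e, 1)}
    return min(scores, key=scores.get), scores
```

For wear-out failures (increasing hazard), the Weibull fit typically wins this comparison; the exponential's constant hazard costs it substantial likelihood despite its smaller parameter penalty.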
Collapse
Affiliation(s)
- S Ruhi
- Department of Statistics, Pabna University of Science and Technology, Pabna, Bangladesh
| | - M R Karim
- Department of Statistics, University of Rajshahi, Rajshahi, 6205 Bangladesh
| |
Collapse
|
49
|
Torabi M. Hierarchical multivariate mixture generalized linear models for the analysis of spatial data: An application to disease mapping. Biom J 2016; 58:1138-50. [PMID: 27374632 DOI: 10.1002/bimj.201500248] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2015] [Revised: 04/25/2016] [Accepted: 05/03/2016] [Indexed: 11/09/2022]
Abstract
Disease mapping of a single disease has been widely studied in the public health setting. Simultaneous modeling of related diseases can also be a valuable tool, both from the epidemiological and from the statistical point of view. In particular, when several measurements are recorded at each spatial location, we need to consider multivariate models in order to handle the dependence among the multivariate components as well as the spatial dependence between locations. It is then customary to use multivariate spatial models that assume the same distribution across the entire population density. In many circumstances, however, assuming the same distribution for all areas of the population density is a very strong assumption. To overcome this issue, we propose a hierarchical multivariate mixture generalized linear model to simultaneously analyze spatial Normal and non-Normal outcomes. As an application of the proposed approach, esophageal and lung cancer deaths in Minnesota are used to show that assuming different distributions for different counties of Minnesota outperforms assuming a single distribution for the population density. Performance of the proposed approach is also evaluated through a simulation study.
Collapse
Affiliation(s)
- Mahmoud Torabi
- Department of Community Health Sciences, University of Manitoba, S113 Medical Services Building, 750 Bannatyne Ave., Winnipeg, MB, Canada, R3E 0W3.
| |
Collapse
|
50
|
Gajewski BJ, Reese CS, Colombo J, Carlson SE. Commensurate Priors on a Finite Mixture Model for Incorporating Repository Data in Clinical Trials. Stat Biopharm Res 2016; 8:151-160. [PMID: 27347357 DOI: 10.1080/19466315.2015.1133453] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Abstract
Docosahexaenoic acid (DHA) is a beneficial fat that can be obtained through food, such as fish, or taken as a supplement. Evidence is building that DHA provides a high-yield, low-risk strategy to reduce preterm birth and/or low birth weight, outcomes that impose great costs on society. A recently completed phase III trial revealed that higher birth weight and gestational age were associated with DHA dosed at 600 mg/day. In this paper we take a posterior predictive approach to assess the impact of these findings on public health. Simple statistical models are not adequate for accurate estimation of the posterior predictive distribution. Of particular interest is a paper by Schwartz et al. (2010), who discovered that the joint distribution of birth weight and gestational age is well modeled by a finite mixture of three normal distributions; data from our own clinical trial exhibit similar features. Using the mean and variance-covariance matrices from Schwartz et al. (2010) and flexible commensurate priors (Hobbs et al., 2012) for the mixing parameters, we estimate the effect of DHA supplementation on the more than 20,000 infants born in hospitals demographically similar to the hospital where the clinical trial was conducted.
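A posterior predictive calculation from a fitted finite mixture of normals can be sketched by direct simulation. All mixture parameters and the DHA shift below are hypothetical placeholders, not the Schwartz et al. or trial estimates, and the sketch is univariate (birth weight only) rather than the joint weight/gestational-age model:

```python
import numpy as np

def sample_mixture(rng, n, weights, means, sds):
    """Draw n birth weights (grams) from a finite mixture of normals."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.asarray(means)[comp], np.asarray(sds)[comp])

rng = np.random.default_rng(42)
weights, sds = [0.1, 0.3, 0.6], [450.0, 400.0, 350.0]  # hypothetical
control = sample_mixture(rng, 100_000, weights, [2400, 3000, 3400], sds)
# Hypothetical +100 g supplementation effect applied to every component mean.
treated = sample_mixture(rng, 100_000, weights, [2500, 3100, 3500], sds)
p_lbw_control = np.mean(control < 2500)   # P(low birth weight), control
p_lbw_treated = np.mean(treated < 2500)   # P(low birth weight), treated
```

The same machinery, with posterior draws of the mixing parameters instead of fixed values, yields a full posterior predictive distribution for population-level outcomes such as the low-birth-weight rate.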
Collapse
Affiliation(s)
- Byron J Gajewski
- Department of Biostatistics, University of Kansas Medical Center
| | | | - John Colombo
- Schiefelbusch Institute for Life Span Studies and Department of Psychology, University of Kansas
| | - Susan E Carlson
- Department of Dietetics and Nutrition, University of Kansas Medical Center
| |
Collapse
|