1
|
Mota A, Milani EA, Leão J, Ramos PL, Ferreira PH, Junior OG, Tomazella VLD, Louzada F. A new cure rate frailty regression model based on a weighted Lindley distribution applied to stomach cancer data. STAT METHOD APPL-GER 2022. [DOI: 10.1007/s10260-022-00673-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
2
|
Large-scale changes in marine and terrestrial environments drive the population dynamics of long-tailed ducks breeding in Siberia. Sci Rep 2022; 12:12355. [PMID: 35853919 PMCID: PMC9296647 DOI: 10.1038/s41598-022-16166-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2021] [Accepted: 07/05/2022] [Indexed: 11/08/2022]
Abstract
Migratory animals experience very different environmental conditions at different times of the year, i.e., at the breeding grounds, during migration, and in winter. The long-tailed duck Clangula hyemalis breeds in the Arctic regions of the northern hemisphere and migrates to temperate climate zones, where it winters in marine environments. The breeding success of the long-tailed duck is affected by the abundances of predators and their main prey species, lemmings Lemmus sibiricus and Dicrostonyx torquatus, whose population fluctuation is subject to climate change. In the winter quarters, long-tailed ducks mainly eat the blue mussel Mytilus edulis. We examined how North-west Siberian lemming dynamics, assumed as a proxy for predation pressure, affect long-tailed duck breeding success and how nutrient availability in the Baltic Sea influences long-tailed duck population size via mussel biomass and quality. Evidence suggests that the long-tailed duck population dynamics was predator-driven on the breeding grounds and resource-driven on the wintering grounds. Nutrients from fertilizer runoff from farmland stimulate mussel stocks and quality, supporting high long-tailed duck population sizes. The applied hierarchical analysis combining several trophic levels can be used for evaluating large-scale environmental factors that affect the population dynamics and abundance of migrants from one environment to another.
|
3
|
An Overview of Modern Applications of Negative Binomial Modelling in Ecology and Biodiversity. DIVERSITY 2022. [DOI: 10.3390/d14050320] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Negative binomial modelling is one of the most commonly used statistical tools for analysing count data in ecology and biodiversity research. This is not surprising given the prevalence of overdispersion (i.e., evidence that the variance is greater than the mean) in many biological and ecological studies. Indeed, overdispersion is often indicative of some form of biological aggregation process (e.g., when species or communities cluster in groups). If overdispersion is ignored, the precision of model parameters can be severely overestimated and can result in misleading statistical inference. In this article, we offer some insight as to why the negative binomial distribution is becoming, and arguably should become, the default starting distribution (as opposed to assuming Poisson counts) for analysing count data in ecology and biodiversity research. We begin with an overview of traditional uses of negative binomial modelling, before examining several modern applications and opportunities in modern ecology/biodiversity where negative binomial modelling is playing a critical role, from generalisations based on exploiting its Poisson-gamma mixture formulation in species distribution models and occurrence data analysis, to estimating animal abundance in negative binomial N-mixture models, and biodiversity measures via rank abundance distributions. Comparisons to other common models for handling overdispersion on real data are provided. We also address the important issue of software, and conclude with a discussion of future directions for analysing ecological and biological data with negative binomial models. In summary, we hope this overview will stimulate the use of negative binomial modelling as a starting point for the analysis of count data in ecology and biodiversity studies.
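The Poisson-gamma mixture formulation mentioned in this abstract can be illustrated with a short simulation. The sketch below is not taken from the article. It assumes the common parameterization with mean `mu` and dispersion `k`, for which the variance is `mu + mu**2 / k`, and the values `mu = 4` and `k = 2` in the usage note are arbitrary illustrative choices.

```python
import math
import random


def nb_variance(mu, k):
    """Variance of a negative binomial with mean mu and dispersion k."""
    return mu + mu * mu / k


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def nb_draw(mu, k, rng):
    """Draw a negative binomial count as a gamma-mixed Poisson."""
    # Gamma rate with shape k and scale mu / k: mean mu, variance mu^2 / k.
    rate = rng.gammavariate(k, mu / k)
    return poisson_draw(rate, rng)


def sample_mean_var(mu, k, n, seed=0):
    """Sample mean and unbiased sample variance of n simulated counts."""
    rng = random.Random(seed)
    draws = [nb_draw(mu, k, rng) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    return mean, var
```

Calling `sample_mean_var(4.0, 2.0, 20000)` should give a sample variance close to `nb_variance(4.0, 2.0)`, well above the sample mean, which is the overdispersion that a Poisson model with variance equal to the mean cannot reproduce.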
|
4
|
Kenne Pagui EC, Salvan A, Sartori N. Improved estimation in negative binomial regression. Stat Med 2022; 41:2403-2416. [PMID: 35277866 PMCID: PMC9314673 DOI: 10.1002/sim.9361] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/20/2020] [Revised: 01/18/2022] [Accepted: 02/11/2022] [Indexed: 11/11/2022]
Abstract
Negative binomial regression is commonly employed to analyze overdispersed count data. With small to moderate sample sizes, the maximum likelihood estimator of the dispersion parameter may be subject to a significant bias, that in turn affects inference on mean parameters. This article proposes inference for negative binomial regression based on adjustments of the score function aimed at mean or median bias reduction. The resulting estimating equations generalize those available for improved inference in generalized linear models and can be solved using a suitable extension of iterative weighted least squares. Simulation studies confirm the good properties of the new methods, which are also found to solve in many cases numerical problems of maximum likelihood estimation. The methods are illustrated and evaluated using two case studies: an Ames salmonella assay data set and data on epileptic seizures. Inference based on adjusted scores turns out to generally improve on maximum likelihood, and even on explicit bias correction, with median bias reduction being overall preferable.
Affiliation(s)
- Alessandra Salvan: Department of Statistical Sciences, University of Padova, Padova, Italy
- Nicola Sartori: Department of Statistical Sciences, University of Padova, Padova, Italy
|
5
|
Menezes AFB, Mazucheli J, de Oliveira RP, Chakraborty S. Improved maximum likelihood estimation of the parameters of the Gamma-Uniform distribution with bias-corrections. COMMUN STAT-SIMUL C 2021. [DOI: 10.1080/03610918.2021.1951760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- A. F. B. Menezes: Department of Statistics, Universidade Estadual de Maringá, DEs, Maringá, Paraná, Brazil
- J. Mazucheli: Department of Statistics, Universidade Estadual de Maringá, DEs, Maringá, Paraná, Brazil
- R. P. de Oliveira: Medical School, Universidade de São Paulo, Ribeirão Preto, São Paulo, Brazil
- S. Chakraborty: Department of Statistics, Dibrugarh University, Dibrugarh, Assam, India
|
6
|
|
7
|
Gedik Balay İ. Estimation of the generalized process capability index Cpyk based on bias-corrected maximum-likelihood estimators for the generalized inverse Lindley distribution and bootstrap confidence intervals. J STAT COMPUT SIM 2021. [DOI: 10.1080/00949655.2021.1879081] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Affiliation(s)
- İklim Gedik Balay: Department of Banking and Finance, Ankara Yıldırım Beyazıt University, Ankara, Turkey
|
8
|
Choe S, Kim HS, Lee S. Exploration of Superspreading Events in 2015 MERS-CoV Outbreak in Korea by Branching Process Models. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17176137. [PMID: 32846960 PMCID: PMC7504499 DOI: 10.3390/ijerph17176137] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/02/2020] [Revised: 08/15/2020] [Accepted: 08/19/2020] [Indexed: 11/23/2022]
Abstract
South Korea has learned a valuable lesson from the Middle East respiratory syndrome (MERS) coronavirus outbreak in 2015. The 2015 MERS-CoV outbreak in Korea was the largest outbreak outside the Middle Eastern countries and was characterized as a nosocomial infection and a superspreading event. To assess the characteristics of a super spreading event, we specifically analyze the behaviors and epidemiological features of superspreaders. Furthermore, we employ a branching process model to understand a significantly high level of heterogeneity in generating secondary cases. The existing model of the branching process (Lloyd-Smith model) is used to incorporate individual heterogeneity into the model, and the key epidemiological components (the reproduction number and the dispersive parameter) are estimated through the empirical transmission tree of the MERS-CoV data. We also investigate the impact of control intervention strategies on the MERS-CoV dynamics of the Lloyd-Smith model. Our results highlight the roles of superspreaders in a high level of heterogeneity. This indicates that the conditions within hospitals as well as multiple hospital visits were the crucial factors for superspreading events of the 2015 MERS-CoV outbreak.
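The branching process with individual heterogeneity referred to in this abstract is commonly written with a negative binomial offspring distribution with mean R0 and dispersion k. The sketch below is an illustration under that assumption, not the authors' code. The function names are hypothetical, and the parameter values in the usage note are arbitrary rather than the MERS-CoV estimates, which the abstract does not report.

```python
def prob_no_secondary_cases(r0, k):
    """P(a case infects nobody) under a negative binomial offspring law."""
    return (1.0 + r0 / k) ** (-k)


def offspring_pgf(s, r0, k):
    """Probability generating function of the offspring distribution."""
    return (1.0 + r0 * (1.0 - s) / k) ** (-k)


def extinction_probability(r0, k, tol=1e-12, max_iter=100000):
    """Smallest fixed point q = g(q), found by iterating the PGF from 0."""
    q = 0.0
    for _ in range(max_iter):
        q_next = offspring_pgf(q, r0, k)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q
```

For example, `extinction_probability(3.0, 0.2)` returns the probability that an outbreak started by one case dies out on its own, while any `r0` below 1 gives a value of 1. A small `k` concentrates transmission in a few individuals, which raises both `prob_no_secondary_cases` and the extinction probability for a fixed `r0`.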
Affiliation(s)
- Seoyun Choe: Department of Mathematics, University of Central Florida, Orlando, FL 32816, USA
- Hee-Sung Kim: Department of Internal Medicine, Chungbuk National University Hospital, Chungbuk National University College of Medicine, Cheongju 28644, Korea
- Sunmi Lee: Department of Applied Mathematics, Kyung Hee University, Yongin 17104, Korea (correspondence; Tel.: +82-031-201-2409)
|
9
|
Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nat Commun 2020; 11:3274. [PMID: 32612268 PMCID: PMC7330047 DOI: 10.1038/s41467-020-16905-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/08/2019] [Accepted: 05/29/2020] [Indexed: 01/22/2023]
Abstract
Despite their widespread applications, single-cell RNA-sequencing (scRNA-seq) experiments are still plagued by batch effects and dropout events. Although the completely randomized experimental design has frequently been advocated to control for batch effects, it is rarely implemented in real applications due to time and budget constraints. Here, we mathematically prove that under two more flexible and realistic experimental designs—the reference panel and the chain-type designs—true biological variability can also be separated from batch effects. We develop Batch effects correction with Unknown Subtypes for scRNA-seq data (BUSseq), which is an interpretable Bayesian hierarchical model that closely follows the data-generating mechanism of scRNA-seq experiments. BUSseq can simultaneously correct batch effects, cluster cell types, impute missing data caused by dropout events, and detect differentially expressed genes without requiring a preliminary normalization step. We demonstrate that BUSseq outperforms existing methods with simulated and real data. It is not clear which designs, other than completely randomized ones, are valid for scRNA-seq experiments so that batch effects can be adjusted. Here the authors show that under flexible reference panel and chain-type designs, biological variability can also be separated from batch effects, at least by BUSseq.
|
10
|
Corrected Maximum Likelihood Estimations of the Lognormal Distribution Parameters. Symmetry (Basel) 2020. [DOI: 10.3390/sym12060968] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
Abstract
Because practical data are often asymmetric, the lognormal distribution is better suited than the normal distribution to data modeling in biological and economic fields. However, the biases of its maximum likelihood estimators are of order $O(n^{-1})$, which matters especially in small samples. It is therefore necessary to derive analytical expressions for the first-order biases and to obtain nearly unbiased estimators through bias-correction techniques. Two methods are adopted in this article: the Cox-Snell method and the resampling method known as the parametric bootstrap. Both can improve the performance of the maximum likelihood estimators and correct the biases of the lognormal distribution parameters. Through Monte Carlo simulations, we obtain the average root mean squared error and bias, two important indexes for comparing the methods. The numerical results reveal that for small and medium-sized samples, analytical bias correction is superior to both bootstrap estimation and classical maximum likelihood estimation. Finally, an example based on real data is given.
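The parametric bootstrap correction named in this abstract can be sketched in a few lines. The code below is a minimal illustration, not the article's procedure: it corrects only the maximum likelihood estimate of the log-scale variance, uses the standard bootstrap rule "corrected = 2 × estimate − mean of bootstrap estimates", and all function names and the default of 2000 replicates are assumptions made for the example.

```python
import math
import random


def lognormal_mle(sample):
    """Maximum likelihood estimates (mu, sigma^2) on the log scale."""
    logs = [math.log(x) for x in sample]
    n = len(logs)
    mu_hat = sum(logs) / n
    sigma2_hat = sum((v - mu_hat) ** 2 for v in logs) / n
    return mu_hat, sigma2_hat


def bootstrap_corrected_sigma2(sample, n_boot=2000, seed=0):
    """Parametric-bootstrap bias-corrected estimate of sigma^2."""
    rng = random.Random(seed)
    n = len(sample)
    mu_hat, sigma2_hat = lognormal_mle(sample)
    sigma_hat = math.sqrt(sigma2_hat)
    boot_mean = 0.0
    for _ in range(n_boot):
        # Simulate from the fitted model and re-estimate sigma^2.
        resample = [rng.lognormvariate(mu_hat, sigma_hat) for _ in range(n)]
        boot_mean += lognormal_mle(resample)[1] / n_boot
    # Estimated bias is boot_mean - sigma2_hat; subtract it from the MLE.
    return 2.0 * sigma2_hat - boot_mean
```

Because the maximum likelihood estimate of the variance is biased downward in small samples, the corrected value is typically larger than the raw estimate, by roughly a factor of 1 + 1/n for a sample of size n.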
|
11
|
Menezes AFB, Mazucheli J. Improved maximum likelihood estimators for the parameters of the Johnson SB distribution. COMMUN STAT-SIMUL C 2020. [DOI: 10.1080/03610918.2018.1498892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Affiliation(s)
- Josmar Mazucheli: Department of Statistics, Universidade Estadual de Maringá, Maringá, PR, Brazil
|
12
|
Mao H, Deng X, Lord D, Flintsch G, Guo F. Adjusting finite sample bias in traffic safety modeling. ACCIDENT; ANALYSIS AND PREVENTION 2019; 131:112-121. [PMID: 31252329 DOI: 10.1016/j.aap.2019.05.026] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/27/2018] [Revised: 02/22/2019] [Accepted: 05/29/2019] [Indexed: 06/09/2023]
Abstract
Poisson and negative binomial regression models are fundamental statistical analysis tools for traffic safety evaluation. The regression parameter estimation could suffer from the finite sample bias when event frequency is low, which is commonly observed in safety research as crashes are rare events. In this study, we apply a bias-correction procedure to the parameter estimation of Poisson and NB regression models. We provide a general bias-correction formulation and illustrate the finite sample bias through a special scenario with a single binary explanatory variable. Several factors affecting the magnitude of bias are identified, including the number of crashes and the balance of the crash counts within strata of a categorical explanatory variable. Simulations are conducted to examine the properties of the bias-corrected coefficient estimators. The results show that the bias-corrected estimators generally provide less bias and smaller variance. The effect is especially pronounced when the crash count in one stratum is between 5 and 50. We apply the proposed method to a case study of infrastructure safety evaluation. Three scenarios were evaluated, all crashes collected in three years, and two hypothetical situations, where crash information was collected for "half-year" and "quarter-year" periods. The case-study results confirm that the magnitude of bias correction is larger for smaller crash counts. This paper demonstrates the finite sample bias associated with the small number of crashes and suggests bias adjustment can provide more accurate estimation when evaluating the impacts of crash risk factors.
Affiliation(s)
- Huiying Mao: Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA
- Xinwei Deng: Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA
- Dominique Lord: Zachry Department of Civil Engineering, Texas A&M University, College Station, TX 77843-3136, USA
- Gerardo Flintsch: Virginia Tech Transportation Institute, Virginia Tech, Blacksburg, VA 24061, USA; Charles E. Via, Jr. Department of Civil and Environmental Engineering, Virginia Tech, Blacksburg, VA 24061, USA
- Feng Guo: Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA; Virginia Tech Transportation Institute, Virginia Tech, Blacksburg, VA 24061, USA
|
13
|
Ganyani T, Faes C, Hens N. Inference of the generalized-growth model via maximum likelihood estimation: A reflection on the impact of overdispersion. J Theor Biol 2019; 484:110029. [PMID: 31568788 DOI: 10.1016/j.jtbi.2019.110029] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/29/2019] [Revised: 07/15/2019] [Accepted: 09/26/2019] [Indexed: 01/17/2023]
Abstract
Recently, the generalized-growth model was introduced as a flexible approach to characterize growth dynamics of disease outbreaks during the early ascending phase. In this work, by using classical maximum likelihood estimation to obtain parameter estimates, we evaluate the impact of varying levels of overdispersion on the inference of the growth scaling parameter through comparing Poisson and Negative binomial models. In particular, under exponential and sub-exponential growth scenarios, we evaluate, via simulations, the error rate of making an incorrect characterization of early outbreak growth patterns. Simulation results show that the ability to correctly identify early outbreak growth patterns can be affected by overdispersion even when accounted for using the Negative binomial model. We exemplify our findings using data on five different outbreaks. Overall, our results show that estimates should be interpreted with caution when data are overdispersed.
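The abstract does not state the model equation. In the generalized-growth literature it is usually written as dC/dt = r·C^p, where C is the cumulative case count, r a growth rate, and p in [0, 1] the growth scaling parameter (p = 1 is exponential growth, p < 1 sub-exponential). Assuming that form, the closed-form solution can be sketched as follows; the function name and argument names are hypothetical.

```python
import math


def ggm_cumulative_cases(t, r, p, c0):
    """Solution of dC/dt = r * C**p with C(0) = c0, for 0 <= p <= 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 1.0:
        # Exponential growth.
        return c0 * math.exp(r * t)
    # Sub-exponential (polynomial) growth of degree m = 1 / (1 - p).
    m = 1.0 / (1.0 - p)
    return (r * t / m + c0 ** (1.0 / m)) ** m
```

With p = 0.5 the curve grows quadratically in t and with p = 0 it grows linearly, so distinguishing p = 1 from p slightly below 1 on noisy early counts is exactly the identification problem the abstract describes.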
Affiliation(s)
- Tapiwa Ganyani: Interuniversity Institute for Biostatistics and statistical Bioinformatics, UHasselt (Hasselt University), Diepenbeek, Belgium
- Christel Faes: Interuniversity Institute for Biostatistics and statistical Bioinformatics, UHasselt (Hasselt University), Diepenbeek, Belgium
- Niel Hens: Interuniversity Institute for Biostatistics and statistical Bioinformatics, UHasselt (Hasselt University), Diepenbeek, Belgium; Centre for Health Economics Research and Modelling Infectious Diseases, Vaccine and Infectious Disease Institute, University of Antwerp, Antwerp, Belgium
|
14
|
A Bayesian Cure Rate Model Based on the Power Piecewise Exponential Distribution. Methodol Comput Appl Probab 2019. [DOI: 10.1007/s11009-019-09728-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]
|
15
|
Qian J, Ray E, Brecha RL, Reilly MP, Foulkes AS. A likelihood-based approach to transcriptome association analysis. Stat Med 2019; 38:1357-1373. [PMID: 30515859 DOI: 10.1002/sim.8040] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/20/2018] [Revised: 08/27/2018] [Accepted: 10/24/2018] [Indexed: 12/31/2022]
Abstract
Elucidating the mechanistic underpinnings of genetic associations with complex traits requires formally characterizing and testing associated cell and tissue-specific expression profiles. New opportunities exist to bolster this investigation with the growing numbers of large publicly available omics level data resources. Herein, we describe a fully likelihood-based strategy to leveraging external resources in the setting that expression profiles are partially or fully unobserved in a genetic association study. A general framework is presented to accommodate multiple data types, and strategies for implementation using existing software packages are described. The method is applied to an investigation of the genetics of evoked inflammatory response in cardiovascular disease research. Simulation studies suggest appropriate type-1 error control and power gains compared to single regression imputation, the most commonly applied practice in this setting.
Affiliation(s)
- Jing Qian: Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts Amherst, Amherst, Massachusetts
- Evan Ray: Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, Massachusetts
- Regina L Brecha: Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, Massachusetts
- Muredach P Reilly: Department of Medicine, Columbia University, College of Physicians and Surgeons, New York, New York
- Andrea S Foulkes: Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, Massachusetts
|
16
|
Affiliation(s)
- Xuemao Zhang: Department of Mathematics, East Stroudsburg University, East Stroudsburg, Pennsylvania, USA
- Sudhir Paul: Department of Mathematics and Statistics, University of Windsor, Windsor, Ontario, Canada
- You-Gan Wang: School of Mathematical Sciences, Queensland University of Technology, Brisbane City, Australia
|
17
|
Konietschke F, Friede T, Pauly M. Semi-parametric analysis of overdispersed count and metric data with varying follow-up times: Asymptotic theory and small sample approximations. Biom J 2018; 61:616-629. [PMID: 30515878 PMCID: PMC6587510 DOI: 10.1002/bimj.201800027] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/30/2018] [Revised: 11/09/2018] [Accepted: 11/09/2018] [Indexed: 11/09/2022]
Abstract
Count data are common endpoints in clinical trials, for example magnetic resonance imaging lesion counts in multiple sclerosis. They often exhibit high levels of overdispersion, that is variances are larger than the means. Inference is regularly based on negative binomial regression along with maximum-likelihood estimators. Although this approach can account for heterogeneity it postulates a common overdispersion parameter across groups. Such parametric assumptions are usually difficult to verify, especially in small trials. Therefore, novel procedures that are based on asymptotic results for newly developed rate and variance estimators are proposed in a general framework. Moreover, in case of small samples the procedures are carried out using permutation techniques. Here, the usual assumption of exchangeability under the null hypothesis is not met due to varying follow-up times and unequal overdispersion parameters. This problem is solved by the use of studentized permutations leading to valid inference methods for situations with (i) varying follow-up times, (ii) different overdispersion parameters, and (iii) small sample sizes.
Affiliation(s)
- Frank Konietschke: Department of Mathematical Sciences, University of Texas at Dallas, Dallas, TX, USA
- Tim Friede: Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Markus Pauly: Institute of Statistics, Ulm University, Ulm, Germany
|
18
|
Baghestani AR, Shahmirzalou P, Sayad S, Akbari ME, Zayeri F. Comparison Cure Rate Models by DIC Criteria in Breast Cancer Data. Asian Pac J Cancer Prev 2018; 19:1601-1606. [PMID: 29936785 PMCID: PMC6103589 DOI: 10.22034/apjcp.2018.19.6.1601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2017] [Accepted: 05/22/2018] [Indexed: 11/27/2022]
Abstract
Background: Breast cancer (BC) is a malignant tumor that starts in the cells of the breast. There are many models for the survival analysis of patients, such as the Cox PH model and parametric models. In some diseases, however, not all patients will experience the main event, and the usual survival models are then inappropriate. In addition, in the presence of cured patients, if the researcher can specify the distribution of survival time, cure rate models are usually preferable to parametric models. The distribution of survival time can be Weibull, log-normal, logistic, gamma, and so on. The purpose of this study is to compare the Weibull, log-normal, and logistic distributions to find the best distribution of survival time. Material and Methods: A total of 787 patients with BC were identified by the Cancer Research Center and followed from 1985 until 2013. The variables stage of cancer, age at diagnosis, tumor size, and number of removed positive lymph nodes (NRPLN) were selected for fitting the cure rate model. The best model was selected using the DIC criterion. All analyses were performed using SAS 9.2. Results: The mean (SD) age was 48.47 (11.49) years, and the mean survival time and maximum follow-up time were 55.12 and 326 months, respectively. During follow-up, 145 (18.4%) patients died from BC and the others survived (censored). The 1-year, 5-year, and 10-year survival rates were 94, 77, and 56 percent, respectively. The log-normal model, which had the smallest DIC, was selected and fitted. All of the variables in the model had a significant effect on the cure rate. Conclusion: This study indicated that the log-normal distribution best described the survival time of BC patients.
Affiliation(s)
- Ahmad Reza Baghestani: Physiotherapy Research Centre, Department of Biostatistics, Faculty of Paramedical Sciences
- Parviz Shahmirzalou: Social Determinants of Health Research Center, Yasuj University of Medical Sciences, Yasuj, Iran; Cancer Research Center, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran
- Soheila Sayad: Cancer Research Center, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran
- Mohammad Esmaeil Akbari: Cancer Research Center, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran
- Farid Zayeri: Department of Biostatistics, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran
|
19
|
Mazucheli J, Menezes AFB, Dey S. Bias-corrected maximum likelihood estimators of the parameters of the inverse Weibull distribution. COMMUN STAT-SIMUL C 2018. [DOI: 10.1080/03610918.2018.1433838] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 10/17/2022]
Affiliation(s)
- Josmar Mazucheli: Department of Statistics, Universidade Estadual de Maringá, Maringá, PR, Brazil
- Sanku Dey: Department of Statistics, St. Anthony’s College, Shillong, Meghalaya, India
|
20
|
Mazucheli J, Menezes AFB, Dey S. Improved maximum-likelihood estimators for the parameters of the unit-gamma distribution. COMMUN STAT-THEOR M 2017. [DOI: 10.1080/03610926.2017.1361993] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 10/19/2022]
Affiliation(s)
- Josmar Mazucheli: Department of Statistics, Universidade Estadual de Maringá, Maringá, PR, Brazil
- Sanku Dey: Department of Statistics, St. Anthony’s College, Shillong, Meghalaya, India
|
21
|
Tang Y. Sample size for comparing negative binomial rates in noninferiority and equivalence trials with unequal follow-up times. J Biopharm Stat 2017; 28:475-491. [DOI: 10.1080/10543406.2017.1333998] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 10/19/2022]
Affiliation(s)
- Yongqiang Tang: Department of Biostatistics and Programming, Shire, MA, USA
|
22
|
Zhang X, Mallick H, Tang Z, Zhang L, Cui X, Benson AK, Yi N. Negative binomial mixed models for analyzing microbiome count data. BMC Bioinformatics 2017; 18:4. [PMID: 28049409 PMCID: PMC5209949 DOI: 10.1186/s12859-016-1441-7] [Citation(s) in RCA: 86] [Impact Index Per Article: 12.3] [Received: 08/26/2016] [Accepted: 12/21/2016] [Indexed: 12/21/2022]
Abstract
Background: Recent advances in next-generation sequencing (NGS) technology enable researchers to collect a large volume of metagenomic sequencing data. These data provide valuable resources for investigating interactions between the microbiome and host environmental/clinical factors. In addition to the well-known properties of microbiome count measurements, for example, varied total sequence reads across samples, over-dispersion and zero-inflation, microbiome studies usually collect samples with hierarchical structures, which introduce correlation among the samples and thus further complicate the analysis and interpretation of microbiome count data. Results: In this article, we propose negative binomial mixed models (NBMMs) for detecting the association between the microbiome and host environmental/clinical factors for correlated microbiome count data. Although they do not deal with zero-inflation, the proposed mixed-effects models account for correlation among the samples by incorporating random effects into the commonly used fixed-effects negative binomial model, and can efficiently handle over-dispersion and varying total reads. We have developed a flexible and efficient IWLS (Iterative Weighted Least Squares) algorithm to fit the proposed NBMMs by taking advantage of the standard procedure for fitting linear mixed models. Conclusions: We evaluate and demonstrate the proposed method via extensive simulation studies and an application to mouse gut microbiome data. The results show that the proposed method has desirable properties and outperforms the previously used methods in terms of both empirical power and Type I error. The method has been incorporated into the freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/ and http://github.com/abbyyan3/BhGLM), providing a useful tool for analyzing microbiome data.
Affiliation(s)
- Xinyan Zhang: Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, 35294-0022, USA
- Himel Mallick: Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Program in Medical and Population Genetics, the Broad Institute, Cambridge, MA, 02142, USA
- Zaixiang Tang: Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China
- Lei Zhang: Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China
- Xiangqin Cui: Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, 35294-0022, USA
- Andrew K Benson: Department of Food Science and Technology and Core for Applied Genomics and Ecology, University of Nebraska, Lincoln, NE, 68583, USA
- Nengjun Yi: Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, 35294-0022, USA
|
23
|
Calsavara VF, Rodrigues AS, Tomazella VLD, de Castro M. Frailty models power variance function with cure fraction and latent risk factors negative binomial. COMMUN STAT-THEOR M 2016. [DOI: 10.1080/03610926.2016.1218029] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 10/21/2022]
Affiliation(s)
- Vinicius Fernando Calsavara
  - Departamento de Epidemiologia e Estatística, Centro Internacional de Pesquisa, A.C. Camargo Cancer Center, São Paulo-SP, Brazil
  - Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo-SP, Brazil
- Agatha Sacramento Rodrigues
  - Departamento de Epidemiologia e Estatística, Centro Internacional de Pesquisa, A.C. Camargo Cancer Center, São Paulo-SP, Brazil
  - Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo-SP, Brazil
- Mário de Castro
  - Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos-SP, Brazil
|
24
|
Lu P, Tolliver D. Accident prediction model for public highway-rail grade crossings. ACCIDENT; ANALYSIS AND PREVENTION 2016; 90:73-81. [PMID: 26922288 DOI: 10.1016/j.aap.2016.02.012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/25/2015] [Revised: 01/15/2016] [Accepted: 02/18/2016] [Indexed: 06/05/2023]
Abstract
Considerable research has focused on roadway accident frequency analysis, but relatively little research has examined safety evaluation at highway-rail grade crossings. Highway-rail grade crossings are critical spatial locations of utmost importance for transportation safety because traffic crashes at highway-rail grade crossings are often catastrophic with serious consequences. For many years, the Poisson regression model has been employed as a good starting point for analyzing vehicle accident frequency. The most commonly applied variations of the Poisson model include the negative binomial and zero-inflated Poisson models. These models are used to deal with common crash data issues such as over-dispersion (sample variance is larger than the sample mean) and preponderance of zeros (low sample mean and small sample size). On rare occasions traffic crash data have been shown to be under-dispersed (sample variance is smaller than the sample mean), and traditional distributions such as Poisson or negative binomial cannot handle under-dispersion well. The objective of this study is to investigate and compare various alternate highway-rail grade crossing accident frequency models that can handle the under-dispersion issue. The contributions of the paper are two-fold: (1) applying probability models to deal with under-dispersion issues and (2) obtaining insights into vehicle crashes at public highway-rail grade crossings.
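The over- and under-dispersion definitions in this abstract (sample variance larger or smaller than the sample mean) can be illustrated with a short Python sketch. The function names, the tolerance band, and the count data below are all hypothetical and are not taken from the paper; the ratio is a descriptive diagnostic, not a formal test.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance divided by the sample mean of the counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


def classify_dispersion(counts, tolerance=0.1):
    """Label counts relative to the Poisson benchmark (variance equal to mean).

    The tolerance band around 1 is an arbitrary illustrative choice.
    """
    ratio = dispersion_index(counts)
    if ratio > 1 + tolerance:
        return "over-dispersed"
    if ratio < 1 - tolerance:
        return "under-dispersed"
    return "approximately equi-dispersed"


# Made-up counts, for illustration only.
crossing_counts = [1, 1, 2, 1, 2, 1, 1, 2, 1, 2]
roadway_counts = [0, 0, 0, 7, 0, 1, 0, 12, 0, 0]
print(classify_dispersion(crossing_counts))
print(classify_dispersion(roadway_counts))
```

In practice dispersion is usually assessed conditional on covariates, after fitting a regression model, so a raw ratio like this is only a first look.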
Affiliation(s)
- Pan Lu
  - Assistant Professor of Transportation, Upper Great Plains Transportation Institute, Dept. 2880, North Dakota State University, Fargo, ND 58108-6050, USA
- Denver Tolliver
  - Director and Professor of Transportation, Upper Great Plains Transportation Institute, Dept. 2880, North Dakota State University, Fargo, ND 58108-6050, USA
|
25
|
Zhao G, Li Q, Wang IM, Liu X, Fang X, Zhang XD. An effective analytic method for detecting tissue-specific genes in RNA-seq experiments. Pharmacogenomics 2015; 16:1769-79. [PMID: 26554622 DOI: 10.2217/pgs.15.118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022] Open
Abstract
AIM To develop an analytic method for identifying tissue-specific (TS) genes from RNA-seq data. MATERIALS & METHODS Based on a negative binomial distribution, we develop a statistical method containing consecutive procedures incorporating data variability from replicates in each tissue. RESULTS Simulations show that our approach can effectively identify at least 94% of the truly TS genes if the sample size is 3 and at least 84% of the TS genes detected by our method are truly TS genes. We illustrated the utility of our method in an in-house RNA-seq project and produced sensible results. CONCLUSION Our approach not only directly works on discrete data but also naturally incorporates data variability. It works effectively for detecting TS genes.
Affiliation(s)
- Guoqing Zhao
  - School of Mathematical Sciences, Peking University, Beijing 100871, China; BARDS, MSD R&D (China), Beijing 100015, China
- Qiao Li
  - BARDS, MSD R&D (China), Beijing 100015, China
- I-Ming Wang
  - Discovery Pharmacogenomics, Merck Research Laboratories, West Point, PA 19486, USA
- Xiaoqiao Liu
  - Scientific Informatics, MSD R&D (China), Beijing 100015, China
- Xiangzhong Fang
  - School of Mathematical Sciences, Peking University, Beijing 100871, China
|
26
|
A New Long-Term Survival Model with Interval-Censored Data. SANKHYA-SERIES B-APPLIED AND INTERDISCIPLINARY STATISTICS 2015. [DOI: 10.1007/s13571-015-0102-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 10/22/2022]
|
27
|
Khazraee SH, Sáez-Castillo AJ, Geedipally SR, Lord D. Application of the Hyper-Poisson Generalized Linear Model for Analyzing Motor Vehicle Crashes. RISK ANALYSIS: AN OFFICIAL PUBLICATION OF THE SOCIETY FOR RISK ANALYSIS 2015; 35:919-930. [PMID: 25385093 DOI: 10.1111/risa.12296] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/04/2023]
Abstract
The hyper-Poisson distribution can handle both over- and underdispersion, and its generalized linear model formulation allows the dispersion of the distribution to be observation-specific and dependent on model covariates. This study's objective is to examine the potential applicability of a newly proposed generalized linear model framework for the hyper-Poisson distribution in analyzing motor vehicle crash count data. The hyper-Poisson generalized linear model was first fitted to intersection crash data from Toronto, characterized by overdispersion, and then to crash data from railway-highway crossings in Korea, characterized by underdispersion. The results of this study are promising. When fitted to the Toronto data set, the goodness-of-fit measures indicated that the hyper-Poisson model with a variable dispersion parameter provided a statistical fit as good as the traditional negative binomial model. The hyper-Poisson model was also successful in handling the underdispersed data from Korea; the model performed as well as the gamma probability model and the Conway-Maxwell-Poisson model previously developed for the same data set. The advantages of the hyper-Poisson model studied in this article are noteworthy. Unlike the negative binomial model, which has difficulties in handling underdispersed data, the hyper-Poisson model can handle both over- and underdispersed crash data. Although not a major issue for the Conway-Maxwell-Poisson model, the effect of each variable on the expected mean of crashes is easily interpretable in the case of this new model.
Affiliation(s)
- S Hadi Khazraee
  - Zachry Department of Civil Engineering, Texas A&M University, College Station, TX, USA
- Dominique Lord
  - Zachry Department of Civil Engineering, Texas A&M University, College Station, TX, USA
|
28
|
Zhou M, Carin L. Negative Binomial Process Count and Mixture Modeling. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2015; 37:307-320. [PMID: 26353243 DOI: 10.1109/tpami.2013.211] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Indexed: 06/05/2023]
Abstract
The seemingly disjoint problems of count and mixture modeling are united under the negative binomial (NB) process. A gamma process is employed to model the rate measure of a Poisson process, whose normalization provides a random probability measure for mixture modeling and whose marginalization leads to an NB process for count modeling. A draw from the NB process consists of a Poisson distributed finite number of distinct atoms, each of which is associated with a logarithmic distributed number of data samples. We reveal relationships between various count- and mixture-modeling distributions and construct a Poisson-logarithmic bivariate distribution that connects the NB and Chinese restaurant table distributions. Fundamental properties of the models are developed, and we derive efficient Bayesian inference. It is shown that with augmentation and normalization, the NB process and gamma-NB process can be reduced to the Dirichlet process and hierarchical Dirichlet process, respectively. These relationships highlight theoretical, structural, and computational advantages of the NB process. A variety of NB processes, including the beta-geometric, beta-NB, marked-beta-NB, marked-gamma-NB and zero-inflated-NB processes, with distinct sharing mechanisms, are also constructed. These models are applied to topic modeling, with connections made to existing algorithms under Poisson factor analysis. Example results show the importance of inferring both the NB dispersion and probability parameters.
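The marginalization step mentioned in this abstract rests on a standard fact: a Poisson count whose rate is gamma distributed has a negative binomial marginal. The sketch below checks this numerically for a single count rather than for the full process construction. It assumes the parameterization with dispersion r and probability p (NB mean r·p/(1−p), gamma shape r and scale p/(1−p)); the function names, the integration grid, and the parameter values are illustrative choices, not the authors' code.

```python
import math


def nb_pmf(k, r, p):
    """NB pmf with dispersion r > 0 and probability p in (0, 1)."""
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + k * math.log(p) + r * math.log(1 - p))


def gamma_poisson_pmf(k, r, p, steps=20000, upper=80.0):
    """Integrate Poisson(k | lam) against a Gamma(shape=r, scale=p/(1-p)) density.

    Uses the midpoint rule on (0, upper); the grid is an illustrative choice.
    """
    scale = p / (1 - p)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_pois = -lam + k * math.log(lam) - math.lgamma(k + 1)
        log_gam = ((r - 1) * math.log(lam) - lam / scale
                   - math.lgamma(r) - r * math.log(scale))
        total += math.exp(log_pois + log_gam)
    return total * h


# Compare the closed-form NB pmf with the numerically marginalized mixture.
for k in range(6):
    print(k, nb_pmf(k, 2.5, 0.4), gamma_poisson_pmf(k, 2.5, 0.4))
```

The two columns should agree up to the quadrature error of the midpoint rule.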
|
29
|
Cairns J, Lynch AG, Tavaré S. Quantifying the impact of inter-site heterogeneity on the distribution of ChIP-seq data. Front Genet 2014; 5:399. [PMID: 25452765 PMCID: PMC4231950 DOI: 10.3389/fgene.2014.00399] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/09/2014] [Accepted: 10/29/2014] [Indexed: 12/13/2022] Open
Abstract
Chromatin Immunoprecipitation followed by sequencing (ChIP-seq) is a valuable tool for epigenetic studies. Analysis of the data arising from ChIP-seq experiments often requires implicit or explicit statistical modeling of the read counts. The simple Poisson model is attractive, but does not provide a good fit to observed ChIP-seq data. Researchers therefore often either extend to a more general model (e.g., the Negative Binomial), and/or exclude regions of the genome that do not conform to the model. Since many modeling strategies employed for ChIP-seq data reduce to fitting a mixture of Poisson distributions, we explore the problem of inferring the optimal mixing distribution. We apply the Constrained Newton Method (CNM), which suggests the Negative Binomial - Negative Binomial (NB-NB) mixture model as a candidate for modeling ChIP-seq data. We illustrate fitting the NB-NB model with an accelerated EM algorithm on four data sets from three species. Zero-inflated models have been suggested as an approach to improve model fit for ChIP-seq data. We show that the NB-NB mixture model requires no zero-inflation and suggest that in some cases the need for zero inflation is driven by the model's inability to cope with both artifactual large read counts and the frequently observed very low read counts. We see that the CNM-based approach is a useful diagnostic for the assessment of model fit and inference in ChIP-seq data and beyond. Use of the suggested NB-NB mixture model will be of value not only when calling peaks or otherwise modeling ChIP-seq data, but also when simulating data or constructing blacklists de novo.
Affiliation(s)
- Jonathan Cairns
  - Nuclear Dynamics Group, The Babraham Institute, Cambridge, UK; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Andy G Lynch
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Simon Tavaré
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
|
30
|
Xu J, Kuk A. On Pooling of Data and Its Relative Efficiency. Int Stat Rev 2014. [DOI: 10.1111/insr.12070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
31
|
Paul S, Alam K. Testing equality of two negative binomial means in presence of unequal over-dispersion parameters: a Behrens–Fisher problem analog. J STAT COMPUT SIM 2014. [DOI: 10.1080/00949655.2014.955025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
32
|
A stepwise likelihood ratio test procedure for rare variant selection in case-control studies. J Hum Genet 2014; 59:198-205. [PMID: 24451226 DOI: 10.1038/jhg.2014.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/30/2013] [Revised: 12/26/2013] [Accepted: 12/26/2013] [Indexed: 01/12/2023]
Abstract
There is much recent interest in finding rare genetic variants associated with various diseases. Owing to the scarcity of rare mutations, single-variant analyses often lack power. To enable pooling of information across variants, we use a random effect formulation within a retrospective modeling framework that respects the retrospective data collecting mechanism of case-control studies. More concretely, we model the control allele frequencies of the variants as random effects, and the systematic differences between the case and control frequencies as fixed effects, resulting in a mixed model. The use of Poisson approximation and gamma-distributed random effects results in a generalized negative binomial distribution for the joint distribution of the control and case frequencies. Variants are selected by conducting stepwise likelihood ratio tests. The superiority of the proposed method over two existing variant selection methods is demonstrated in a simulation study. The effects of non-gamma random effects and correlated variants are also found to be not too detrimental in the simulation study. When the proposed procedure is applied to identify rare variants associated with obesity, it identifies one additional variant not picked up by existing methods.
|
33
|
Saha KK. Inference concerning a common dispersion of several treatment groups in the analysis of over/underdispersed count data. Biom J 2014; 56:441-60. [DOI: 10.1002/bimj.201300105] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/10/2013] [Revised: 09/24/2013] [Accepted: 10/25/2013] [Indexed: 11/10/2022]
Affiliation(s)
- Krishna K. Saha
  - Department of Mathematical Sciences, Central Connecticut State University, New Britain, CT 06050, USA
|
34
|
Saha KK, Bilisoly R, Dziuda DM. Hybrid-based confidence intervals for the ratio of two treatment means in the over-dispersed Poisson data. J Appl Stat 2013. [DOI: 10.1080/02664763.2013.840273] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/26/2022]
|
35
|
Bellos E, Johnson MR, Coin LJM. cnvHiTSeq: integrative models for high-resolution copy number variation detection and genotyping using population sequencing data. Genome Biol 2012; 13:R120. [PMID: 23259578 PMCID: PMC4056371 DOI: 10.1186/gb-2012-13-12-r120] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 06/25/2012] [Accepted: 12/22/2012] [Indexed: 02/08/2023] Open
Abstract
Recent advances in sequencing technologies provide the means for identifying copy number variation (CNV) at an unprecedented resolution. A single next-generation sequencing experiment offers several features that can be used to detect CNV, yet current methods do not incorporate all available signatures into a unified model. cnvHiTSeq is an integrative probabilistic method for CNV discovery and genotyping that jointly analyzes multiple features at the population level. By combining evidence from complementary sources, cnvHiTSeq achieves high genotyping accuracy and a substantial improvement in CNV detection sensitivity over existing methods, while maintaining a low false discovery rate. cnvHiTSeq is available at http://sourceforge.net/projects/cnvhitseq.
|
36
|
Borges P, Rodrigues J, Louzada F, Balakrishnan N. A cure rate survival model under a hybrid latent activation scheme. Stat Methods Med Res 2012; 25:838-56. [DOI: 10.1177/0962280212469682] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
In lifetime studies, the occurrence of an event (such as tumor detection or death) might be caused by one of many competing causes. Moreover, both the number of causes and the time-to-event associated with each cause are not usually observable. The number of causes can be zero, corresponding to a cure fraction. In this article, we propose a method of estimating the numerical characteristics of unobservable stages (such as initiation, promotion and progression) of carcinogenesis from data on tumor size at detection in the presence of latent competing causes. To this end, a general survival model for spontaneous carcinogenesis under a hybrid latent activation scheme has been developed to allow for a simple pattern of the dynamics of tumor growth. It is assumed that a tumor becomes detectable when its size attains some threshold level (proliferation of tumor cells (or descendants) generated by the malignant cell), which is treated as a random variable. We assume the number of initiated cells and the number of malignant cells (competing causes) both to follow weighted Poisson distributions. The advantage of this model is that it incorporates into the analysis characteristics of the stage of tumor progression as well as the proportion of initiated cells that had been ‘promoted’ to the malignant ones and the proportion of malignant cells that die before tumor induction. The lifetimes corresponding to each competing cause are assumed to follow a Weibull distribution. Parameter estimation of the proposed model is discussed through the maximum likelihood estimation method. A simulation study has been carried out in order to examine the coverage probabilities of the confidence intervals. Finally, we illustrate the usefulness of the proposed model by applying it to a real data set involving malignant melanoma.
Affiliation(s)
- Patrick Borges
  - Department of Statistics, Universidade Federal do Espírito Santo, Vitória, Brazil
- Josemar Rodrigues
  - Department of Statistics, Universidade Federal de São Carlos, São Paulo, Brazil
- Francisco Louzada
  - Department of Mathematics and Statistics, Universidade de São Paulo, São Paulo, Brazil
|
37
|
Saha KK. Interval estimation of the mean difference in the analysis of over-dispersed count data. Biom J 2012; 55:114-33. [DOI: 10.1002/bimj.201200032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/06/2012] [Revised: 09/07/2012] [Accepted: 10/02/2012] [Indexed: 11/08/2022]
Affiliation(s)
- Krishna K. Saha
  - Department of Mathematical Sciences, Central Connecticut State University, 1615 Stanley Street, New Britain, CT 06050, USA
|
38
|
Ortega EM, Cordeiro GM, Kattan MW. The negative binomial–beta Weibull regression model to predict the cure of prostate cancer. J Appl Stat 2012. [DOI: 10.1080/02664763.2011.644525] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 10/14/2022]
|
39
|
Saha KK, Sen D, Jin C. Profile likelihood-based confidence interval for the dispersion parameter in count data. J Appl Stat 2012. [DOI: 10.1080/02664763.2011.616581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
|
43
|
Cancho VG, Rodrigues J, de Castro M. A flexible model for survival data with a cure rate: a Bayesian approach. J Appl Stat 2011. [DOI: 10.1080/02664760903254052] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 10/19/2022]
|
44
|
Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol 2010; 11:R106. [PMID: 20979621 PMCID: PMC3218662 DOI: 10.1186/gb-2010-11-10-r106] [Citation(s) in RCA: 11063] [Impact Index Per Article: 790.2] [Received: 04/20/2010] [Revised: 07/22/2010] [Accepted: 10/27/2010] [Indexed: 02/07/2023] Open
Abstract
High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression and present an implementation, DESeq, as an R/Bioconductor package.
Affiliation(s)
- Simon Anders
  - European Molecular Biology Laboratory, Mayerhofstraße 1, 69117 Heidelberg, Germany
|
45
|
Saha KK. Interval estimation of the over-dispersion parameter in the analysis of one-way layout of count data. Stat Med 2010; 30:39-51. [DOI: 10.1002/sim.4061] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/04/2009] [Accepted: 07/15/2010] [Indexed: 11/07/2022]
|
46
|
Lord D, Geedipally SR, Guikema SD. Extension of the application of Conway-Maxwell-Poisson models: analyzing traffic crash data exhibiting underdispersion. RISK ANALYSIS: AN OFFICIAL PUBLICATION OF THE SOCIETY FOR RISK ANALYSIS 2010; 30:1268-1276. [PMID: 20412518 DOI: 10.1111/j.1539-6924.2010.01417.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 05/29/2023]
Abstract
The objective of this article is to evaluate the performance of the COM-Poisson GLM for analyzing crash data exhibiting underdispersion (when conditional on the mean). The COM-Poisson distribution, originally developed in 1962, has recently been reintroduced by statisticians for analyzing count data subjected to either over- or underdispersion. Over the last year, the COM-Poisson GLM has been evaluated in the context of crash data analysis and it has been shown that the model performs as well as the Poisson-gamma model for crash data exhibiting overdispersion. To accomplish the objective of this study, several COM-Poisson models were estimated using crash data collected at 162 railway-highway crossings in South Korea between 1998 and 2002. This data set has been shown to exhibit underdispersion when models linking crash data to various explanatory variables are estimated. The modeling results were compared to those produced from the Poisson and gamma probability models documented in a previously published study. The results of this research show that the COM-Poisson GLM can handle crash data when the modeling output shows signs of underdispersion. Finally, they also show that the model proposed in this study provides better statistical performance than the gamma probability and the traditional Poisson models, at least for this data set.
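To illustrate why the COM-Poisson distribution can accommodate both over- and underdispersion, the following Python sketch evaluates its usual probability mass function, P(X = k) proportional to lam^k / (k!)^nu, and reports the variance-to-mean ratio for several values of nu. This is the basic distribution only, not the GLM estimated in the paper; the function names, the truncation point of the normalizing sum, and the parameter values are assumptions made for illustration.

```python
import math


def com_poisson_pmf(lam, nu, max_count=200):
    """Return [P(X=0), ..., P(X=max_count)] with P(X=k) proportional to lam**k / (k!)**nu.

    The infinite normalizing sum is truncated at max_count, which is an
    assumption that is reasonable only for moderate lam.
    """
    log_weights = [k * math.log(lam) - nu * math.lgamma(k + 1)
                   for k in range(max_count + 1)]
    shift = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(w - shift) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_variance(pmf):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var


for nu in (0.5, 1.0, 2.0):
    m, v = mean_and_variance(com_poisson_pmf(lam=3.0, nu=nu))
    print(f"nu={nu}: mean={m:.3f}, variance={v:.3f}, ratio={v / m:.3f}")
```

With nu = 1 the distribution reduces to the ordinary Poisson (ratio 1); nu > 1 gives underdispersion and nu < 1 gives overdispersion.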
Affiliation(s)
- Dominique Lord
  - Zachry Department of Civil Engineering, Texas A&M University, 3136 TAMU, College Station, TX 77843-3136, USA
|
47
|
Estimating the negative binomial dispersion parameter with highly stratified surveys. J Stat Plan Inference 2010. [DOI: 10.1016/j.jspi.2010.02.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
|
48
|
de Castro M, Cancho VG, Rodrigues J. A note on a unified approach for cure rate models. BRAZ J PROBAB STAT 2010. [DOI: 10.1214/08-bjps015] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]
|
49
|
Haleem K, Abdel-Aty M, Mackie K. Using a reliability process to reduce uncertainty in predicting crashes at unsignalized intersections. ACCIDENT; ANALYSIS AND PREVENTION 2010; 42:654-666. [PMID: 20159091 DOI: 10.1016/j.aap.2009.10.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/25/2009] [Revised: 09/19/2009] [Accepted: 10/07/2009] [Indexed: 05/28/2023]
Abstract
The negative binomial (NB) model has been used extensively by traffic safety analysts as a crash prediction model, because it can accommodate the over-dispersion criterion usually exhibited in crash count data. However, the NB model is still a probabilistic model that may benefit from updating the parameters of the covariates to better predict crash frequencies at intersections. The objective of this paper is to examine the effect of updating the parameters of the covariates in the fitted NB model using a Bayesian updating reliability method to more accurately predict crash frequencies at 3-legged and 4-legged unsignalized intersections. For this purpose, data from 433 unsignalized intersections in Orange County, Florida were collected and used in the analysis. Four Bayesian-structure models were examined: (1) a non-informative prior with a log-gamma likelihood function, (2) a non-informative prior with an NB likelihood function, (3) an informative prior with an NB likelihood function, and (4) an informative prior with a log-gamma likelihood function. Standard measures of model effectiveness, such as the Akaike information criterion (AIC), mean absolute deviance (MAD), mean square prediction error (MSPE) and overall prediction accuracy, were used to compare the NB and Bayesian model predictions. Considering only the best estimates of the model parameters (ignoring uncertainty), both the NB and Bayesian models yielded favorable results. However, when considering the standard errors for the fitted parameters as a surrogate measure for measuring uncertainty, the Bayesian methods yielded more promising results. The full Bayesian updating framework using the log-gamma likelihood function for updating parameter estimates of the NB probabilistic models resulted in the least standard error values.
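The comparison measures named in this abstract (AIC, MAD, MSPE) have simple standard definitions, sketched below in Python. The function names and the observed and predicted values are hypothetical and are not the paper's data or code; MSPE is normally computed on data held out from model fitting, which this toy example does not distinguish.

```python
def mean_absolute_deviance(observed, predicted):
    """Average absolute difference between predicted and observed counts."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def mean_square_prediction_error(observed, predicted):
    """Average squared difference between predicted and observed counts."""
    return sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def akaike_information_criterion(log_likelihood, n_parameters):
    """AIC = 2k - 2 ln(L); lower values indicate a better trade-off."""
    return 2 * n_parameters - 2 * log_likelihood


# Made-up crash counts and model predictions, for illustration only.
observed = [0, 2, 1, 3]
predicted = [0.5, 1.5, 1.0, 2.0]
print(mean_absolute_deviance(observed, predicted))
print(mean_square_prediction_error(observed, predicted))
print(akaike_information_criterion(log_likelihood=-120.0, n_parameters=5))
```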
Affiliation(s)
- Kirolos Haleem
  - Department of Civil, Environmental & Construction Engineering, 4000 Central Florida Blvd, University of Central Florida, Orlando, FL 32816, United States
|
50
|
de Castro M, Cancho VG, Rodrigues J. A hands-on approach for fitting long-term survival models under the GAMLSS framework. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2010; 97:168-177. [PMID: 19758722 DOI: 10.1016/j.cmpb.2009.08.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 10/19/2008] [Revised: 08/05/2009] [Accepted: 08/11/2009] [Indexed: 05/28/2023]
Abstract
In many data sets from clinical studies there are patients insusceptible to the occurrence of the event of interest. Survival models which ignore this fact are generally inadequate. The main goal of this paper is to describe an application of the generalized additive models for location, scale, and shape (GAMLSS) framework to the fitting of long-term survival models. In this work the number of competing causes of the event of interest follows the negative binomial distribution. In this way, some well known models found in the literature are characterized as particular cases of our proposal. The model is conveniently parameterized in terms of the cured fraction, which is then linked to covariates. We explore the use of the gamlss package in R as a powerful tool for inference in long-term survival models. The procedure is illustrated with a numerical example.
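A minimal Python sketch of how a negative binomial number of competing causes leads to a cured fraction is given below. It assumes the common parameterization in which the count N has mean theta and variance theta + eta·theta², that the event occurs at the first activated cause, and that latent times follow a Weibull distribution. The Weibull choice, the function names, and the parameter values are assumptions for illustration; the paper itself works with the gamlss package in R.

```python
import math


def cured_fraction(theta, eta):
    """P(N = 0) for an NB count with mean theta and variance theta + eta * theta**2."""
    return (1.0 + eta * theta) ** (-1.0 / eta)


def population_survival(t, theta, eta, shape, scale):
    """Long-term survival S_pop(t) = [1 + eta * theta * F(t)]**(-1/eta).

    F is the Weibull CDF of the latent event times (an assumed choice here).
    """
    cdf = 1.0 - math.exp(-((t / scale) ** shape))
    return (1.0 + eta * theta * cdf) ** (-1.0 / eta)


# Illustrative parameter values only.
theta, eta, shape, scale = 2.0, 0.5, 1.5, 1.0
for t in (0.0, 0.5, 1.0, 2.0, 10.0):
    print(t, population_survival(t, theta, eta, shape, scale))
print("cured fraction:", cured_fraction(theta, eta))
```

The survival curve starts at 1 and levels off at the cured fraction instead of decaying to 0. As eta approaches 0, the expression tends to exp(-theta·F(t)), the Poisson-based promotion time form.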
Affiliation(s)
- Mário de Castro
  - Universidade de São Paulo, Instituto de Ciências Matemáticas e de Computação, Caixa Postal 668, 13560-970, São Carlos-SP, Brazil
|