26
Zou Y, Ash JE, Park BJ, Lord D, Wu L. Empirical Bayes estimates of finite mixture of negative binomial regression models and its application to highway safety. J Appl Stat 2017. DOI: 10.1080/02664763.2017.1389863
27
Shirazi M, Dhavala SS, Lord D, Geedipally SR. A methodology to design heuristics for model selection based on the characteristics of data: Application to investigate when the Negative Binomial Lindley (NB-L) is preferred over the Negative Binomial (NB). Accid Anal Prev 2017; 107:186-194. PMID: 28886410. DOI: 10.1016/j.aap.2017.07.002
Abstract
Safety analysts usually use post-modeling methods, such as Goodness-of-Fit statistics or the Likelihood Ratio Test, to decide between two or more competing distributions or models. Such metrics require all competing distributions to be fitted to the data before any comparison can be made. Given the continuous growth in newly introduced statistical distributions, choosing the best one with such post-modeling methods is not a trivial task, in addition to all the theoretical or numerical issues the analyst may face during the analysis. Furthermore, and most importantly, these measures or tests do not provide any intuition about why a specific distribution (or model) is preferred over another (Goodness-of-Logic). This paper addresses these issues by proposing a methodology to design heuristics for model selection based on the characteristics of data, in terms of descriptive summary statistics, before fitting the models. The proposed methodology employs two analytic tools, (1) Monte-Carlo simulations and (2) machine-learning classifiers, to design simple heuristics that predict the label of the 'most-likely-true' distribution for analyzing data. The proposed methodology was applied to investigate when the recently introduced Negative Binomial Lindley (NB-L) distribution is preferred over the Negative Binomial (NB) distribution. Heuristics were designed to select the 'most-likely-true' distribution between these two, given a set of prescribed summary statistics of the data. The proposed heuristics were successfully compared against classical tests for several observed datasets. Not only are they easy to use and free of any post-modeling inputs, but with these heuristics the analyst can also gain useful information about why the NB-L is preferred over the NB, or vice versa, when modeling data.
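As an illustration of the abstract's two-step methodology, the sketch below simulates labeled datasets from two candidate count models and learns a one-rule heuristic from a pre-modeling summary statistic. Everything here is an assumption for illustration: it uses Poisson vs. NB as stand-in candidates (the paper's NB-L sampling scheme is not reproduced), a decision stump instead of the paper's classifiers, and arbitrary parameter values.

```python
import math
import random
import statistics

random.seed(42)

def poisson_sample(lam):
    """Knuth's algorithm for Poisson random variates."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def nb_sample(mu, alpha):
    """NB2 variate via the gamma-Poisson mixture (var = mu + alpha*mu^2)."""
    shape = 1.0 / alpha
    return poisson_sample(random.gammavariate(shape, mu / shape))

def var_to_mean(xs):
    """Summary statistic computed before any model fitting."""
    m = statistics.mean(xs)
    return statistics.pvariance(xs) / m if m > 0 else 1.0

# Step 1 (Monte Carlo): simulate labeled datasets from both candidates.
labeled = []
for _ in range(200):
    mu = random.uniform(1.0, 5.0)
    labeled.append((var_to_mean([poisson_sample(mu) for _ in range(100)]), 0))
    labeled.append((var_to_mean([nb_sample(mu, alpha=1.0) for _ in range(100)]), 1))

# Step 2 (classifier): a one-rule decision stump on the summary statistic.
best_t, best_acc = 1.0, 0.0
for i in range(40):
    t = 1.0 + 0.05 * i
    acc = sum((s > t) == bool(lab) for s, lab in labeled) / len(labeled)
    if acc > best_acc:
        best_t, best_acc = t, acc

print(f"heuristic: prefer the NB when var/mean > {best_t:.2f} "
      f"(training accuracy {best_acc:.2f})")
```

The learned rule has the same flavor as the paper's heuristics: a cutoff on a descriptive statistic of the data, usable before any model is fitted.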
28
Wu L, Lord D. Examining the influence of link function misspecification in conventional regression models for developing crash modification factors. Accid Anal Prev 2017; 102:123-135. PMID: 28282580. DOI: 10.1016/j.aap.2017.02.012
Abstract
This study further examined the use of regression models for developing crash modification factors (CMFs), specifically focusing on misspecification of the link function. The primary objectives were to validate the accuracy of CMFs derived from the commonly used regression models (i.e., generalized linear models, or GLMs, with additive linear link functions) when some of the variables have nonlinear relationships, and to quantify the amount of bias as a function of the nonlinearity. Using the concept of artificial realistic data, various linear and nonlinear crash modification functions (CM-Functions) were assumed for three variables. Crash counts were randomly generated based on these CM-Functions. CMFs were then derived from regression models for three different scenarios, and the results were compared with the assumed true values. The main findings are summarized as follows: (1) when some variables have nonlinear relationships with crash risk, the CMFs for these variables derived from the commonly used GLMs are all biased, especially in areas away from the baseline conditions (e.g., boundary areas); (2) as the nonlinear relationship becomes stronger, the bias becomes more significant; (3) the quality of CMFs for variables with linear relationships can be influenced when they are mixed with variables having nonlinear relationships, although the accuracy may still be acceptable; and (4) misuse of the link function for one or more variables can also lead to biased estimates of other parameters. This study highlights the importance of the link function when using regression models to develop CMFs.
29
Hamza AV, Nikroo A, Alger E, Antipa N, Atherton LJ, Barker D, Baxamusa S, Bhandarkar S, Biesiada T, Buice E, Carr E, Castro C, Choate C, Conder A, Crippen J, Dylla-Spears R, Dzenitis E, Eddinger S, Emerich M, Fair J, Farrell M, Felker S, Florio J, Forsman A, Giraldez E, Hein N, Hoover D, Horner J, Huang H, Kozioziemski B, Kroll J, Lawson B, Letts SA, Lord D, Mapoles E, Mauldin M, Miller P, Montesanti R, Moreno K, Parham T, Nathan B, Reynolds J, Sater J, Segraves K, Seugling R, Stadermann M, Strauser R, Stephens R, Suratwala TI, Swisher M, Taylor JS, Wallace R, Wegner P, Wilkens H, Yoxalla B. Target Development for the National Ignition Campaign. Fusion Sci Technol 2017. DOI: 10.13182/fst15-163
30
Shirazi M, Reddy Geedipally S, Lord D. A Monte-Carlo simulation analysis for evaluating the severity distribution functions (SDFs) calibration methodology and determining the minimum sample-size requirements. Accid Anal Prev 2017; 98:303-311. PMID: 27810672. DOI: 10.1016/j.aap.2016.10.004
Abstract
Severity distribution functions (SDFs) are used in highway safety to estimate the severity of crashes and conduct different types of safety evaluations and analyses. Developing a new SDF is a difficult task and demands significant time and resources. To simplify the process, the Highway Safety Manual (HSM) has started to document SDF models for different types of facilities. As such, SDF models have recently been introduced for freeways and ramps in the HSM addendum. However, since these models are fitted and validated using data from a few selected states, they need to be calibrated to local conditions when applied to a new jurisdiction. The HSM provides a methodology to calibrate the models through a scalar calibration factor, but this methodology has never been validated through research, and there are no concrete guidelines for selecting a reliable sample size. Using extensive simulation, this paper documents an analysis that examined the bias between the 'true' and 'estimated' calibration factors. The results show that as the true calibration factor deviates further from 1, more bias is observed between the 'true' and 'estimated' calibration factors. In addition, simulation studies were performed to determine the calibration sample size for various conditions. It was found that, as the average coefficient of variation (CV) of the 'KAB' and 'C' crashes increases, the analyst needs to collect a larger sample size to calibrate SDF models. Taking this observation into account, sample-size guidelines are proposed based on the average CV of the crash severities used for the calibration process.
31
Park BJ, Lord D, Wu L. Finite mixture modeling approach for developing crash modification factors in highway safety analysis. Accid Anal Prev 2016; 97:274-287. PMID: 27974277. DOI: 10.1016/j.aap.2016.10.023
Abstract
This study aimed to investigate the relative performance of two models, the negative binomial (NB) model and the two-component finite mixture of negative binomial models (FMNB-2), in terms of developing crash modification factors (CMFs). Crash data on rural multilane divided highways in California and Texas were modeled with the two models, and crash modification functions (CMFunctions) were derived. The CMFunction estimated from the FMNB-2 model showed several advantages over that from the NB model. First, the safety effect of a covariate was better reflected by the CMFunction developed using the FMNB-2 model, since the model takes into account the differential responsiveness of crash frequency to the covariate. Second, the CMFunction derived from the FMNB-2 model is able to capture nonlinear relationships between a covariate and safety. Finally, following the same concept as for NB models, the combined CMFs of multiple treatments were estimated using the FMNB-2 model. The results indicated that they are not simply the product of the individual CMFs (i.e., their safety effects are not independent under FMNB-2 models). Adjustment Factors (AFs) were then developed. The results reveal that the current Highway Safety Manual method could over- or under-estimate combined CMFs under particular combinations of covariates. Safety analysts are encouraged to consider using FMNB-2 models for developing CMFs and AFs.
32
Shirazi M, Lord D, Geedipally SR. Sample-size guidelines for recalibrating crash prediction models: Recommendations for the highway safety manual. Accid Anal Prev 2016; 93:160-168. PMID: 27183517. DOI: 10.1016/j.aap.2016.04.011
Abstract
The Highway Safety Manual (HSM) prediction models are fitted and validated based on crash data collected from a selected number of states in the United States. Therefore, for a jurisdiction to fully benefit from applying these models, it is necessary to calibrate or recalibrate them to local conditions. The first edition of the HSM recommends calibrating the models using a one-size-fits-all sample size of 30-50 locations with a total of at least 100 crashes per year. However, the HSM recommendation is not fully supported by documented studies. The objectives of this paper are consequently: (1) to examine the required sample size based on the characteristics of the data that will be used for the calibration or recalibration process; and (2) to propose revised guidelines. The objectives were accomplished using simulation runs for different scenarios characterizing the sample mean and variance of the data. The simulation results indicate that as the ratio of the standard deviation to the mean (i.e., the coefficient of variation) of the crash data increases, a larger sample size is warranted to fulfill certain levels of accuracy. Taking this observation into account, sample-size guidelines were prepared based on the coefficient of variation of the crash data used for the calibration process. The guidelines were then successfully applied to two observed datasets. The proposed guidelines can be used for all facility types and for both segment and intersection prediction models.
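The HSM-style calibration factor divides total observed crashes by total predicted crashes. The sketch below (a minimal simulation with illustrative parameter values, not the paper's actual simulation design) shows the abstract's key point: as the dispersion of the crash counts grows, so does the spread of the estimated calibration factor for a fixed sample size.

```python
import math
import random
import statistics

random.seed(7)

def poisson_sample(lam):
    """Knuth's algorithm for Poisson random variates."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def nb_sample(mu, alpha):
    """Gamma-Poisson mixture: var = mu + alpha * mu^2."""
    shape = 1.0 / alpha
    return poisson_sample(random.gammavariate(shape, mu / shape))

def estimate_c(n_sites, true_c, alpha):
    """HSM-style calibration factor: sum(observed) / sum(predicted)."""
    preds = [random.uniform(0.5, 3.0) for _ in range(n_sites)]
    obs = [nb_sample(true_c * p, alpha) for p in preds]
    return sum(obs) / sum(preds)

spread = {}
for alpha in (0.5, 2.0):   # larger alpha -> larger coefficient of variation
    chats = [estimate_c(n_sites=50, true_c=1.3, alpha=alpha) for _ in range(500)]
    spread[alpha] = statistics.stdev(chats)
    print(f"alpha={alpha}: sd of estimated calibration factor = {spread[alpha]:.3f}")
```

With a fixed 50-site sample, the higher-CV data yield a noticeably wider spread of the estimate, which is why the guidelines scale the required sample size with the coefficient of variation.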
33
Shirazi M, Lord D, Dhavala SS, Geedipally SR. A semiparametric negative binomial generalized linear model for modeling over-dispersed count data with a heavy tail: Characteristics and applications to crash data. Accid Anal Prev 2016; 91:10-18. PMID: 26945472. DOI: 10.1016/j.aap.2016.02.020
Abstract
Crash data can often be characterized by over-dispersion, a heavy (long) tail, and many observations with the value zero. Over the last few years, a small number of researchers have started developing and applying novel and innovative multi-parameter models to analyze such data. These multi-parameter models have been proposed to overcome the limitations of the traditional negative binomial (NB) model, which cannot handle this kind of data efficiently. The research documented in this paper continues the work on multi-parameter models. The objective of this paper is to document the development and application of a flexible NB generalized linear model with randomly distributed mixed effects characterized by the Dirichlet process (NB-DP) for modeling crash data. The objective of the study was accomplished using two datasets. The new model was compared to the NB and the recently introduced model based on the mixture of the NB and Lindley (NB-L) distributions. Overall, the research shows that the NB-DP model offers a better performance than the NB model when data are over-dispersed and have a heavy tail. The NB-DP performed better than the NB-L when the dataset has a heavy tail but a smaller percentage of zeros; both models performed similarly when the dataset contained a large number of zeros. In addition to greater flexibility, the NB-DP provides a clustering by-product that allows the safety analyst to better understand the characteristics of the data, such as the identification of outliers and sources of dispersion.
34
Imprialou MIM, Quddus M, Pitfield DE, Lord D. Re-visiting crash-speed relationships: A new perspective in crash modelling. Accid Anal Prev 2016; 86:173-185. PMID: 26571206. DOI: 10.1016/j.aap.2015.10.001
Abstract
Although speed is considered to be one of the main crash contributory factors, research findings are inconsistent. Independent of the robustness of their statistical approaches, crash frequency models typically employ crash data aggregated using spatial criteria (e.g., crash counts by link, termed a link-based approach). In this approach, the variability in crashes between links is explained by highly aggregated average measures that may be inappropriate, especially for time-varying variables such as speed and volume. This paper re-examines crash-speed relationships by creating a new crash data aggregation approach that better represents the road conditions just before crash occurrences. Crashes are aggregated according to the similarity of their pre-crash traffic and geometric conditions, forming an alternative crash count dataset termed a condition-based approach. Crash-speed relationships are separately developed and compared for both approaches using the crashes that occurred on the Strategic Road Network of England in 2012. The datasets are modelled by injury severity using multivariate Poisson lognormal regression, with multivariate spatial effects for the link-based model, under a full Bayesian inference approach. The results of the condition-based approach show that higher speeds increase crash frequency. The outcome of the link-based model is the opposite, suggesting that the speed-crash relationship is negative regardless of crash severity. The differences between the results imply that data aggregation is a crucial, yet so far overlooked, methodological element of crash data analyses that may have a direct impact on the modelling outcomes.
35
Khazraee SH, Sáez-Castillo AJ, Geedipally SR, Lord D. Application of the Hyper-Poisson Generalized Linear Model for Analyzing Motor Vehicle Crashes. Risk Anal 2015; 35:919-930. PMID: 25385093. DOI: 10.1111/risa.12296
Abstract
The hyper-Poisson distribution can handle both over- and underdispersion, and its generalized linear model formulation allows the dispersion of the distribution to be observation-specific and dependent on model covariates. This study's objective is to examine the potential applicability of a newly proposed generalized linear model framework for the hyper-Poisson distribution in analyzing motor vehicle crash count data. The hyper-Poisson generalized linear model was first fitted to intersection crash data from Toronto, characterized by overdispersion, and then to crash data from railway-highway crossings in Korea, characterized by underdispersion. The results of this study are promising. When fitted to the Toronto data set, the goodness-of-fit measures indicated that the hyper-Poisson model with a variable dispersion parameter provided a statistical fit as good as the traditional negative binomial model. The hyper-Poisson model was also successful in handling the underdispersed data from Korea; the model performed as well as the gamma probability model and the Conway-Maxwell-Poisson model previously developed for the same data set. The advantages of the hyper-Poisson model studied in this article are noteworthy. Unlike the negative binomial model, which has difficulties in handling underdispersed data, the hyper-Poisson model can handle both over- and underdispersed crash data. In addition, the effect of each variable on the expected mean of crashes is easily interpretable in this new model, which, although not a major issue, is not the case for the Conway-Maxwell-Poisson model.
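For readers unfamiliar with the distribution, the hyper-Poisson assigns probabilities proportional to lam^x / (gamma)_x, where (gamma)_x is the Pochhammer symbol and the normalizer is the confluent hypergeometric series 1F1(1; gamma; lam); gamma = 1 recovers the Poisson. The sketch below is a minimal numerical check of the over-/underdispersion claim with illustrative parameter values, not the authors' GLM code.

```python
def hp_pmf(lam, gamma, max_x=200):
    """Hyper-Poisson probabilities: P(x) proportional to lam^x / (gamma)_x."""
    terms, t = [], 1.0
    for x in range(max_x):
        terms.append(t)
        t *= lam / (gamma + x)   # (gamma)_{x+1} = (gamma)_x * (gamma + x)
    z = sum(terms)               # truncated 1F1(1; gamma; lam) normalizer
    return [u / z for u in terms]

def dispersion_ratio(lam, gamma):
    """Variance-to-mean ratio of the (truncated) hyper-Poisson pmf."""
    p = hp_pmf(lam, gamma)
    mean = sum(x * px for x, px in enumerate(p))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(p))
    return var / mean

r_poisson = dispersion_ratio(3.0, 1.0)   # gamma = 1 recovers the Poisson
r_over = dispersion_ratio(3.0, 5.0)      # gamma > 1: over-dispersion
r_under = dispersion_ratio(3.0, 0.3)     # gamma < 1: under-dispersion
print(r_poisson, r_over, r_under)
```

In the GLM formulation described in the abstract, gamma (and hence the dispersion) is allowed to depend on covariates for each observation.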
36
Peng Y, Lord D, Zou Y. Applying the Generalized Waring model for investigating sources of variance in motor vehicle crash analysis. Accid Anal Prev 2014; 73:20-26. PMID: 25173723. DOI: 10.1016/j.aap.2014.07.031
Abstract
As one of the major analysis methods, statistical models play an important role in traffic safety analysis. They can be used for a wide variety of purposes, including establishing relationships between variables and understanding the characteristics of a system. The purpose of this paper is to document a new type of model that can help with the latter. This model is based on the Generalized Waring (GW) distribution. The GW model yields more information about the sources of the variance observed in datasets than other traditional models, such as the negative binomial (NB) model. In this regard, the GW model can separate the observed variability into three parts: (1) the randomness, which explains the model's uncertainty; (2) the proneness, which refers to the internal differences between entities or observations; and (3) the liability, which is defined as the variance caused by other external factors that are difficult to identify and have not been included as explanatory variables in the model. The study analyses were accomplished using two observed datasets to explore potential sources of variation. The results show that the GW model can provide meaningful information about sources of variance in crash data and also performs better than the NB model.
37
Park BJ, Lord D, Lee C. Finite mixture modeling for vehicle crash data with application to hotspot identification. Accid Anal Prev 2014; 71:319-326. PMID: 24992301. DOI: 10.1016/j.aap.2014.05.030
Abstract
The application of finite mixture regression models has recently gained interest among highway safety researchers because of its considerable potential for addressing unobserved heterogeneity. Finite mixture models assume that the observations of a sample arise from two or more unobserved components with unknown proportions. Both fixed and varying weight parameter models have been shown to be useful for explaining the heterogeneity and the nature of the dispersion in crash data. Given the superior performance of the finite mixture model, this study, using observed and simulated data, investigated the relative performance of the finite mixture model and the traditional negative binomial (NB) model in terms of hotspot identification. For the observed data, rural multilane segment crash data for divided highways in California and Texas were used. The results showed that the difference, measured by the percentage deviation in ranking orders, was relatively small for this dataset. Nevertheless, the ranking results from the finite mixture model were considered more reliable than those from the NB model because of the better model specification. This finding was also supported by the simulation study, which produced a high number of false positives and negatives when a mis-specified model was used for hotspot identification. Regarding an optimal threshold value for identifying hotspots, another simulation analysis indicated a trade-off between the false discovery rate (increasing) and the false negative rate (decreasing). Since the costs associated with false positives and false negatives are different, it is suggested that the optimal threshold value be chosen by considering the trade-off between these two costs so that unnecessary expenses are minimized.
38
Heydari S, Miranda-Moreno LF, Lord D, Fu L. Bayesian methodology to estimate and update safety performance functions under limited data conditions: a sensitivity analysis. Accid Anal Prev 2014; 64:41-51. PMID: 24316506. DOI: 10.1016/j.aap.2013.11.001
Abstract
In road safety studies, decision makers must often cope with limited data conditions. In such circumstances, maximum likelihood estimation (MLE), which relies on asymptotic theory, is unreliable and prone to bias. Moreover, it has been reported in the literature that (a) Bayesian estimates might be significantly biased when using non-informative prior distributions under limited data conditions, and that (b) the calibration of limited data is plausible when existing evidence in the form of proper priors is introduced into the analysis. Although the Highway Safety Manual (2010) (HSM) and other research studies provide calibration and updating procedures, the data requirements can be very taxing. This paper presents a practical and sound Bayesian method to estimate and/or update safety performance function (SPF) parameters by combining the information available from limited data with the SPF parameters reported in the HSM. The proposed Bayesian updating approach has the advantage of requiring fewer observations to obtain reliable estimates. The adopted technique is validated by conducting a sensitivity analysis through an extensive simulation study with 15 different models, which include various prior combinations. This sensitivity analysis contributes to our understanding of the comparative aspects of a large number of prior distributions, and the proposed method contributes to the unification of the Bayesian updating process for SPFs. The results demonstrate the accuracy of the developed methodology. Therefore, the suggested approach offers considerable promise as a methodological tool to estimate and/or update baseline SPFs and to evaluate the efficacy of road safety countermeasures under limited data conditions.
39
Ye Z, Zhang Y, Lord D. Goodness-of-fit testing for accident models with low means. Accid Anal Prev 2013; 61:78-86. PMID: 23219076. DOI: 10.1016/j.aap.2012.11.007
Abstract
The modeling of relationships between motor vehicle crashes and underlying factors has been investigated for more than three decades. Recently, many highway safety studies have documented the use of negative binomial (NB) regression models. On rare occasions, the Poisson model may be the only alternative, especially when the crash sample mean is low. Pearson's X² and the scaled deviance (G²) are two common test statistics that have been proposed as measures of goodness-of-fit (GOF) for Poisson or NB models. Unfortunately, transportation safety analysts often deal with crash data characterized by low sample mean values, and under such conditions the traditional test statistics may not perform very well. This study has three objectives. The first objective is to examine the traditional test statistics and compare their performance for the GOF of accident models subject to low sample means. Second, this study proposes a new test statistic for the Poisson regression model that does not depend on the sample size, as opposed to the grouped G² method. The proposed method is easy to use and does not require grouping the data, which is time consuming and may not be feasible if the sample size is small. Moreover, the proposed method can be used for lower sample means than documented in previous studies. Third, this study provides guidance on how and when to use appropriate test statistics for both Poisson and NB regression models.
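The two classical statistics named above are straightforward to compute for a Poisson model. The sketch below evaluates both in the low-sample-mean regime the paper targets; for simplicity the fitted means are assumed known, and the paper's proposed new statistic is not reproduced here.

```python
import math
import random

random.seed(1)

def poisson_sample(lam):
    """Knuth's algorithm for Poisson random variates."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def pearson_x2(ys, mus):
    """Pearson's X^2 for a Poisson model with means mus."""
    return sum((y - m) ** 2 / m for y, m in zip(ys, mus))

def scaled_deviance(ys, mus):
    """G^2 = 2 * sum[y*ln(y/mu) - (y - mu)]; the y = 0 term reduces to mu."""
    total = 0.0
    for y, m in zip(ys, mus):
        total += (y * math.log(y / m) - (y - m)) if y > 0 else m
    return 2.0 * total

mus = [0.5] * 200                  # the low-sample-mean regime the paper studies
ys = [poisson_sample(m) for m in mus]
x2 = pearson_x2(ys, mus)
g2 = scaled_deviance(ys, mus)
print(f"X2 = {x2:.1f}, G2 = {g2:.1f}  (nominal df = 200)")
```

Both statistics are nominally compared against a chi-squared distribution with n - p degrees of freedom; the paper's point is that this asymptotic reference breaks down precisely in low-mean settings like this one.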
40
Zou Y, Geedipally SR, Lord D. Evaluating the double Poisson generalized linear model. Accid Anal Prev 2013; 59:497-505. PMID: 23954684. DOI: 10.1016/j.aap.2013.07.017
Abstract
The objectives of this study are to: (1) examine the applicability of the double Poisson (DP) generalized linear model (GLM) for analyzing motor vehicle crash data characterized by over- and under-dispersion, and (2) compare the performance of the DP GLM with the Conway-Maxwell-Poisson (COM-Poisson) GLM in terms of goodness-of-fit and theoretical soundness. The DP distribution has seldom been investigated and applied since its introduction two decades ago. The hurdle to applying the DP is its normalizing constant (or multiplicative constant), which is not available in closed form. This study proposes a new method to approximate the normalizing constant of the DP with high accuracy and reliability. The DP GLM and COM-Poisson GLM were developed using two observed over-dispersed datasets and one observed under-dispersed dataset. The modeling results indicate that the DP GLM, with its normalizing constant approximated by the new method, can handle crash data characterized by over- and under-dispersion. Its performance is comparable to the COM-Poisson GLM in terms of goodness-of-fit (GOF), although the COM-Poisson GLM provides a slightly better fit. For the over-dispersed data, the DP GLM performs similarly to the negative binomial (NB) GLM. Considering that the DP GLM can be estimated with inexpensive computation and that its coefficients are simpler to interpret, it offers a flexible and efficient alternative for modeling count data.
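The normalizing-constant hurdle can be made concrete. The constant can always be obtained by brute-force truncated summation of the unnormalized double-Poisson mass (Efron's 1986 form), and compared against Efron's classical closed-form approximation. This is only an illustration with assumed parameter values; the paper's own, more refined approximation method is not reproduced here.

```python
import math

def dp_term(x, mu, theta):
    """Unnormalized double-Poisson mass at x (Efron's 1986 form)."""
    if x == 0:
        return math.sqrt(theta) * math.exp(-theta * mu)
    log_t = (0.5 * math.log(theta) - theta * mu
             - x + x * math.log(x) - math.lgamma(x + 1)
             + theta * x * (1.0 + math.log(mu / x)))
    return math.exp(log_t)

def dp_norm_const(mu, theta, max_x=500):
    """Normalizing constant by brute-force truncated summation."""
    return 1.0 / sum(dp_term(x, mu, theta) for x in range(max_x + 1))

def efron_approx(mu, theta):
    """Efron's closed-form approximation to the same constant."""
    return 1.0 / (1.0 + (1.0 - theta) / (12.0 * mu * theta)
                  * (1.0 + 1.0 / (mu * theta)))

mu, theta = 4.0, 0.6       # theta < 1 corresponds to over-dispersion
c_exact = dp_norm_const(mu, theta)
c_approx = efron_approx(mu, theta)
print(c_exact, c_approx)
```

Truncated summation is exact up to tail error but must be recomputed per observation inside a GLM fit, which is why closed-form or precomputed approximations matter in practice.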
41
Miranda-Moreno LF, Heydari S, Lord D, Fu L. Bayesian road safety analysis: incorporation of past evidence and effect of hyper-prior choice. J Safety Res 2013; 46:31-40. PMID: 23932683. DOI: 10.1016/j.jsr.2013.03.003
Abstract
PROBLEM: This paper aims to address two related issues in applying hierarchical Bayesian models for road safety analysis, namely: (a) how to incorporate information available from previous studies or past experience in the (hyper) prior distributions for model parameters; and (b) what the potential benefits are of incorporating past evidence on the results of a road safety analysis when working with scarce accident data (i.e., when calibrating models with crash datasets characterized by a very low average number of accidents and a small number of sites).
METHOD: A simulation framework was developed to evaluate the performance of alternative hyper-priors, including informative and non-informative Gamma, Pareto, and Uniform distributions. Based on this framework, different data scenarios (i.e., number of observations and years of data) were defined and tested using crash data collected at 3-legged rural intersections in California and crash data collected for rural 4-lane highway segments in Texas.
RESULTS: This study shows how the accuracy of model parameter estimates (the inverse dispersion parameter) is considerably improved by incorporating past evidence, in particular when working with a small number of observations and crash data with a low mean. The results also illustrate that when the sample size (more than 100 sites) and the number of years of crash data are relatively large, neither the incorporation of past experience nor the choice of the hyper-prior distribution affects the final results of a traffic safety analysis.
CONCLUSIONS: As a potential solution to the problem of a low sample mean and a small sample size, this paper suggests practical guidance on how to incorporate past evidence into informative hyper-priors. By combining evidence from past studies with the data available, model parameter estimates can be significantly improved. The effect of the prior choice seems to be less important for hotspot identification.
IMPACT ON INDUSTRY: The results show the benefits of incorporating prior information when working with limited crash data in road safety studies.
42
Zou Y, Zhang Y, Lord D. Application of finite mixture of negative binomial regression models with varying weight parameters for vehicle crash data analysis. Accid Anal Prev 2013; 50:1042-1051. PMID: 23022076. DOI: 10.1016/j.aap.2012.08.004
Abstract
Recently, a finite mixture of negative binomial (NB) regression models has been proposed to address the unobserved heterogeneity problem in vehicle crash data. This approach can provide useful information about features of the population under study. For a standard finite mixture of regression models, previous studies have used a fixed weight parameter that is applied to the entire dataset. However, various studies suggest modeling the weight parameter as a function of the explanatory variables in the data. The objective of this study is to investigate the differences in the modeling and fitting results between the two-component finite mixture of NB regression models with fixed weight parameters (FMNB-2) and the two-component finite mixture of NB regression models with varying weight parameters (GFMNB-2), and to compare the group classification from both models. To accomplish the objective of this study, the FMNB-2 and GFMNB-2 models are applied to two crash datasets. The important findings can be summarized as follows: first, the GFMNB-2 models can provide more reasonable classification results, as well as better statistical fitting performance, than the FMNB-2 models; second, the GFMNB-2 models can be used to better reveal the source of dispersion observed in the crash data than the FMNB-2 models. Therefore, it is concluded that in many cases the GFMNB-2 models may be a better alternative to the FMNB-2 models for explaining the heterogeneity and the nature of the dispersion in the crash data.
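The distinction the abstract draws can be made concrete with the mixture pmf itself: in a GFMNB-2-style model the component weight varies with a site covariate through a logistic link, while a constant weight (FMNB-2) is the special case of a zero slope. The function and all parameter values below are illustrative, not taken from the paper.

```python
import numpy as np
from scipy import stats

def gfmnb2_pmf(y, x, beta_w, mu1, phi1, mu2, phi2):
    """Two-component NB mixture pmf where the weight of component 1 varies
    with a site covariate x via a logistic link (GFMNB-2-style).
    A fixed weight (FMNB-2) is the special case beta_w[1] == 0."""
    w = 1.0 / (1.0 + np.exp(-(beta_w[0] + beta_w[1] * x)))   # varying weight
    pmf1 = stats.nbinom.pmf(y, phi1, phi1 / (phi1 + mu1))    # low-mean component
    pmf2 = stats.nbinom.pmf(y, phi2, phi2 / (phi2 + mu2))    # high-mean component
    return w * pmf1 + (1 - w) * pmf2

# P(Y = 0) at a low-covariate vs. a high-covariate site (hypothetical numbers):
print(gfmnb2_pmf(0, x=-1.0, beta_w=(0.2, 1.5), mu1=0.4, phi1=2.0, mu2=3.0, phi2=1.0))
print(gfmnb2_pmf(0, x=+1.0, beta_w=(0.2, 1.5), mu1=0.4, phi1=2.0, mu2=3.0, phi2=1.0))
```

Because the weight responds to the covariate, the probability that a site belongs to the low-mean component shifts across sites, which is what allows the GFMNB-2 to attribute dispersion to observable site characteristics rather than a single global mixing proportion.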
|
43
|
Lord D, Kuo PF. Examining the effects of site selection criteria for evaluating the effectiveness of traffic safety countermeasures. ACCIDENT; ANALYSIS AND PREVENTION 2012; 47:52-63. [PMID: 22405239 DOI: 10.1016/j.aap.2011.12.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/04/2011] [Revised: 11/20/2011] [Accepted: 12/21/2011] [Indexed: 05/31/2023]
Abstract
The primary objective of this paper is to describe how site selection effects can influence the safety effectiveness of treatments. More specifically, the goal is to quantify the bias in the safety effectiveness of a treatment as a function of different entry criteria, as well as other factors associated with crash data, and to propose a new method to minimize this bias when a control group is not available. The study objective was accomplished using simulated data. The proposed method documented in this paper was compared to the four most common types of before-after studies: the Naïve method, the control group (CG) method, the empirical Bayes (EB) method based on the method of moments (EB(MM)), and the EB method based on a control group (EB(CG)). Five scenarios were examined: a direct comparison of the methods, different dispersion parameter values of the Negative Binomial model, different sample sizes, different values of the index of safety effectiveness (θ), and different levels of uncertainty associated with the index. Based on the simulated scenarios (also supported theoretically), the study results showed that higher entry criteria, larger values of the safety effectiveness, and smaller dispersion parameter values will cause a larger selection bias. Furthermore, among all methods evaluated, the Naïve and the EB(MM) methods are both significantly affected by the selection bias. Using a control group, or the EB(CG) method, can eliminate the site selection bias, as long as the characteristics of the control group (truncated data for the CG method or the non-truncated sample population for the EB(CG) method) are exactly the same as for the treatment group. In practice, finding datasets for the control group with the exact same characteristics as for the treatment group may not always be feasible.
To overcome this problem, the method proposed in this study can be used to adjust the Naïve estimator of the index of safety effectiveness, even when the mean and dispersion parameter are not properly estimated.
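The selection bias the abstract quantifies arises because sites are chosen for treatment precisely when their before-period counts are high, so counts regress toward the mean even without treatment. A generic Hauer-style EB adjustment can be sketched as follows; the SPF prediction `mu_spf`, the inverse dispersion `phi`, and the counts are hypothetical, and this is not the paper's specific EB(MM)/EB(CG) procedure.

```python
def eb_expected(y_before, mu_spf, phi):
    """Empirical Bayes estimate of a site's expected crash count: shrink the
    observed count toward the SPF prediction mu_spf using the NB weight
    w = phi / (phi + mu_spf), which follows from var = mu + mu^2/phi."""
    w = phi / (phi + mu_spf)
    return w * mu_spf + (1 - w) * y_before

def naive_theta(after, before):
    """Naive index of safety effectiveness: simple after/before ratio,
    which overstates the treatment benefit when sites are selected for
    having high before-period counts."""
    return sum(after) / sum(before)

# Sites selected because they met a high entry criterion (>= 5 crashes):
before = [6, 7, 5, 9]
after = [4, 5, 3, 6]
mu_spf, phi = 3.0, 2.0                       # hypothetical SPF mean and dispersion
expected = [eb_expected(b, mu_spf, phi) for b in before]

print("naive theta:", round(naive_theta(after, before), 3))          # 0.667
print("EB-adjusted theta:", round(sum(after) / sum(expected), 3))    # 0.857
```

The naive ratio (0.667) suggests a 33% crash reduction, but after shrinking the inflated before-period counts toward the SPF prediction, the adjusted index (0.857) attributes much of that apparent reduction to regression-to-the-mean rather than the treatment.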
|
44
|
Geedipally SR, Lord D, Dhavala SS. The negative binomial-Lindley generalized linear model: characteristics and application using crash data. ACCIDENT; ANALYSIS AND PREVENTION 2012; 45:258-265. [PMID: 22269508 DOI: 10.1016/j.aap.2011.07.012] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/28/2011] [Revised: 07/18/2011] [Accepted: 07/18/2011] [Indexed: 05/31/2023]
Abstract
There has been a considerable amount of work devoted by transportation safety analysts to the development and application of new and innovative models for analyzing crash data. One important characteristic of crash data that has been documented in the literature relates to datasets that contain a large number of zeros and a long or heavy tail (which creates highly dispersed data). For such datasets, the number of sites where no crash is observed is so large that traditional distributions and regression models, such as the Poisson and the Poisson-gamma (or negative binomial, NB) models, cannot be used efficiently. To overcome this problem, the NB-Lindley (NB-L) distribution has recently been introduced for analyzing count data that are characterized by excess zeros. The objective of this paper is to document the application of an NB generalized linear model with Lindley mixed effects (NB-L GLM) for analyzing traffic crash data. The study objective was accomplished using simulated and observed datasets. The simulated dataset was used to show the general performance of the model. The model was then applied to two datasets based on observed data. One of the datasets was characterized by a large number of zeros. The NB-L GLM was compared with the NB and zero-inflated models. Overall, the research study shows that the NB-L GLM offers superior performance over the NB and zero-inflated models not only when datasets are characterized by a large number of zeros and a long tail, but also when the crash dataset is highly dispersed.
|
45
|
Patil S, Geedipally SR, Lord D. Analysis of crash severities using nested logit model--accounting for the underreporting of crashes. ACCIDENT; ANALYSIS AND PREVENTION 2012; 45:646-653. [PMID: 22269553 DOI: 10.1016/j.aap.2011.09.034] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/16/2011] [Revised: 08/22/2011] [Accepted: 09/18/2011] [Indexed: 05/31/2023]
Abstract
Recent studies in the area of highway safety have demonstrated the usefulness of logit models for modeling crash injury severities. Use of these models enables one to identify and quantify the effects of factors that contribute to certain levels of severity. Most often, these models are estimated assuming equal probability of occurrence for each injury severity level in the data. However, traffic crash data are generally characterized by underreporting, especially when crashes result in lower injury severity. Thus, the sample used for an analysis is often outcome-based, which can result in a biased estimation of model parameters. This is more of a problem when a nested logit model specification is used instead of a multinomial logit model and when the true shares of the outcomes (injury severity levels) in the population are not known (which is almost always the case). This study demonstrates an application of a recently proposed weighted conditional maximum likelihood estimator in tackling the problem of underreporting of crashes when using a nested logit model for crash severity analyses.
|
46
|
Francis RA, Geedipally SR, Guikema SD, Dhavala SS, Lord D, LaRocca S. Characterizing the performance of the Conway-Maxwell Poisson generalized linear model. RISK ANALYSIS : AN OFFICIAL PUBLICATION OF THE SOCIETY FOR RISK ANALYSIS 2012; 32:167-183. [PMID: 21801191 DOI: 10.1111/j.1539-6924.2011.01659.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
Count data are pervasive in many areas of risk analysis; deaths, adverse health outcomes, infrastructure system failures, and traffic accidents are all recorded as count events, for example. Risk analysts often wish to estimate the probability distribution for the number of discrete events as part of doing a risk assessment. Traditional count data regression models of the type often used in risk assessment for this problem suffer from limitations due to the assumed variance structure. A more flexible model based on the Conway-Maxwell Poisson (COM-Poisson) distribution was recently proposed, a model that has the potential to overcome the limitations of the traditional model. However, the statistical performance of this new model has not yet been fully characterized. This article assesses the performance of a maximum likelihood estimation method for fitting the COM-Poisson generalized linear model (GLM). The objectives of this article are to (1) characterize the parameter estimation accuracy of the MLE implementation of the COM-Poisson GLM, and (2) estimate the prediction accuracy of the COM-Poisson GLM using simulated data sets. The results of the study indicate that the COM-Poisson GLM is flexible enough to model under-, equi-, and overdispersed data sets with different sample mean values. The results also show that the COM-Poisson GLM yields accurate parameter estimates. The COM-Poisson GLM provides a promising and flexible approach for performing count data regression.
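The flexibility claim in the abstract — one distribution covering under-, equi-, and overdispersion — follows directly from the COM-Poisson pmf, P(Y = y) ∝ λ^y / (y!)^ν, where ν = 1 recovers the Poisson. A minimal sketch of the pmf with a truncated normalizing constant is below; it shows the distribution itself, not the paper's GLM or its MLE fitting procedure, and the parameter values are illustrative.

```python
import math

def com_poisson_pmf(y, lam, nu, jmax=200):
    """COM-Poisson pmf: P(Y=y) = lam**y / (y!)**nu / Z(lam, nu), with the
    normalizing constant Z truncated at jmax terms (computed in log space).
    nu < 1 gives overdispersion, nu = 1 recovers Poisson(lam), nu > 1
    gives underdispersion."""
    logterms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(jmax)]
    m = max(logterms)
    logZ = m + math.log(sum(math.exp(t - m) for t in logterms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - logZ)

# Sanity check: nu = 1 reduces to the ordinary Poisson(3) pmf at y = 2.
print(com_poisson_pmf(2, lam=3.0, nu=1.0))

# First two moments for an underdispersed case (nu = 2):
mean = sum(y * com_poisson_pmf(y, 3.0, 2.0) for y in range(50))
var = sum(y * y * com_poisson_pmf(y, 3.0, 2.0) for y in range(50)) - mean ** 2
print(mean, var)   # var < mean: underdispersion, which Poisson and NB cannot model
```

The log-space computation of Z avoids overflow for large λ; in a GLM setting, λ would be linked to covariates and these pmf evaluations would sit inside the likelihood being maximized.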
|
47
|
Lord D, Geedipally SR. The negative binomial-Lindley distribution as a tool for analyzing crash data characterized by a large amount of zeros. ACCIDENT; ANALYSIS AND PREVENTION 2011; 43:1738-1742. [PMID: 21658501 DOI: 10.1016/j.aap.2011.04.004] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/17/2011] [Revised: 04/01/2011] [Accepted: 04/02/2011] [Indexed: 05/30/2023]
Abstract
The modeling of crash count data is a very important topic in highway safety. As documented in the literature, given the characteristics associated with crash data, transportation safety analysts have proposed a significant number of analysis tools, statistical methods and models for analyzing such data. One important data issue relates to crash data that contain a large number of zeros and a long or heavy tail. It has been found that analyzing this kind of dataset with the wrong statistical tools or methods could lead to erroneous results or conclusions. Thus, the purpose of this paper is to introduce the negative binomial-Lindley (NB-L), a distribution very recently proposed for analyzing data characterized by a large number of zeros. The NB-L offers the advantage of being able to handle this kind of dataset, while still maintaining characteristics similar to the traditional negative binomial (NB). In other words, the NB-L is a two-parameter distribution and the long-term mean is never equal to zero. To examine this distribution, simulated and observed data were used. The results show that the NB-L can provide a better statistical fit than the traditional NB for datasets that contain a large number of zeros.
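The zero-inflation behavior the abstract describes can be illustrated by simulation, exploiting the fact that the Lindley(θ) distribution is a two-component gamma mixture. The frailty-style construction below (Y | ε ~ NB with mean μ·ε, ε ~ Lindley rescaled to unit mean) is a simplified sketch for intuition, not the exact NB-L parameterization used in the paper, and all numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def lindley_rvs(theta, size, rng):
    """Sample Lindley(theta) via its gamma-mixture representation:
    Gamma(shape=1, rate=theta) w.p. theta/(1+theta), else Gamma(shape=2, rate=theta)."""
    shape = np.where(rng.random(size) < theta / (1 + theta), 1.0, 2.0)
    return rng.gamma(shape, 1.0 / theta)

theta, mu, phi, n = 1.0, 2.0, 1.0, 200_000
eps = lindley_rvs(theta, n, rng)
eps /= (theta + 2) / (theta * (theta + 1))       # rescale Lindley draws to unit mean

# NB with Lindley frailty vs. a plain NB with the same marginal mean mu:
nb_l = rng.negative_binomial(phi, phi / (phi + mu * eps))
nb = rng.negative_binomial(phi, phi / (phi + mu), size=n)

print("P(Y=0), plain NB:", np.mean(nb == 0))
print("P(Y=0), NB-Lindley-style:", np.mean(nb_l == 0))   # more zeros, heavier tail
```

Even though both samples share the same mean, the Lindley frailty concentrates extra mass at zero and stretches the tail, which is exactly the shape of crash datasets the NB-L was proposed to handle.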
|
48
|
Savolainen PT, Mannering FL, Lord D, Quddus MA. The statistical analysis of highway crash-injury severities: a review and assessment of methodological alternatives. ACCIDENT; ANALYSIS AND PREVENTION 2011; 43:1666-1676. [PMID: 21658493 DOI: 10.1016/j.aap.2011.03.025] [Citation(s) in RCA: 314] [Impact Index Per Article: 24.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/27/2010] [Revised: 03/24/2011] [Accepted: 03/27/2011] [Indexed: 05/30/2023]
Abstract
Reducing the severity of injuries resulting from motor-vehicle crashes has long been a primary emphasis of highway agencies and motor-vehicle manufacturers. While progress can be simply measured by the reduction in injury levels over time, insights into the effectiveness of injury-reduction technologies, policies, and regulations require a more detailed empirical assessment of the complex interactions that vehicle, roadway, and human factors have on resulting crash-injury severities. Over the years, researchers have used a wide range of methodological tools to assess the impact of such factors on disaggregate-level injury-severity data, and recent methodological advances have enabled the development of sophisticated models capable of more precisely determining the influence of these factors. This paper summarizes the evolution of research and current thinking as it relates to the statistical analysis of motor-vehicle injury severities, and provides a discussion of future methodological directions.
|
49
|
Lord D, Page R. Elucidating the functions of key regulators in biofilm formation and dispersal. Acta Crystallogr A 2011. [DOI: 10.1107/s010876731108785x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
50
|
Ye Z, Veneziano D, Lord D. Safety impact of Gateway Monuments. ACCIDENT; ANALYSIS AND PREVENTION 2011; 43:290-300. [PMID: 21094327 DOI: 10.1016/j.aap.2010.08.027] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/20/2009] [Revised: 05/09/2010] [Accepted: 08/29/2010] [Indexed: 05/30/2023]
Abstract
Gateway Monuments are free-standing roadside structures or signage that communicate the name of a city, county, or township to motorists. The placement of such monuments within state-controlled right-of-way is a relatively recent occurrence in California. As a result, the California Department of Transportation (Caltrans) initiated research to quantify the impacts that this type of signage may or may not have on crashes in its vicinity. To date, no specific research has examined the impact such features have on crashes. To determine whether these features affected safety, the before-after study method using the Empirical Bayes (EB) technique was used, with reference groups and Safety Performance Functions adapted from existing studies, eliminating the need to calibrate new models. Results indicated that, on an individual basis, no deterioration in safety was observed at any monument site. When all sites were examined collectively (using two different scenarios), the calculated index of effectiveness values were 0.978 and 0.680, respectively, corresponding to 2.2% and 32.0% reductions in crashes. In addition to the EB method, naïve study methods (with and without AADT taken into account) were applied to the study data. Results (crash reductions) from these methods also showed that the presence of Gateway Monuments did not have a negative impact on traffic safety. However, the EB technique should be employed very carefully when adopting reference groups from different jurisdictions, as these may affect the validity of the EB results. In light of these results, Caltrans may continue to participate in the Gateway Monument Program at its discretion with the knowledge that roadway safety is not impacted by monuments.
|