1
Gil-Marin JK, Shirazi M, Ivan JN. Assessing the Negative Binomial-Lindley model for crash hotspot identification: Insights from Monte Carlo simulation analysis. Accid Anal Prev 2024;199:107478. [PMID: 38458009] [DOI: 10.1016/j.aap.2024.107478] [Received: 05/26/2023; Revised: 12/27/2023; Accepted: 01/13/2024; Indexed: 03/10/2024]
Abstract
Identifying hazardous crash sites (or hotspots) is a crucial step in highway safety management. The Negative Binomial (NB) model is the most common model used in safety analyses and evaluations, including hotspot identification. The NB model, however, is not without limitations: it does not perform well when data are highly dispersed, include excess zero observations, or have a long tail. Recently, the Negative Binomial-Lindley (NB-L) model has been proposed as an alternative to the NB. The NB-L model overcomes several limitations of the NB, such as the issue of excess zero observations in highly dispersed data. However, it is not clear how the NB-L model performs for hotspot identification. In this paper, an innovative Monte Carlo simulation protocol was designed to generate a wide range of simulated data characterized by different means, dispersions, and percentages of zeros. Next, the NB-L model was written as a Full-Bayes hierarchical model and compared with the Full-Bayes NB model for hotspot identification using extensive simulation scenarios. Most previous studies focused on statistical fit and showed that the NB-L model fits the data better than the NB. In this research, however, we investigated the performance of the NB-L model in identifying hazardous sites. We showed that there is a trade-off between the NB-L and NB when it comes to hotspot identification. Multiple performance metrics were used for the assessment. Among those, the results show that the NB-L model provides better specificity in identifying hotspots, while the NB model provides better sensitivity, especially for highly dispersed data. In other words, while the NB model identifies more of the truly hazardous sites, the NB-L model is better at not flagging non-hazardous sites as hazardous, which is valuable when budgets are limited.
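The sensitivity/specificity trade-off described above can be illustrated with a toy simulation (not the paper's Full-Bayes protocol): sites receive latent NB means via a gamma-mixed Poisson, true hotspots are the sites with the largest latent means, and a naive rule flags the top-ranked observed counts. All function names and parameter values below are illustrative assumptions.

```python
import math
import random

random.seed(42)

def poisson(lam, rng=random):
    # Knuth's method; adequate for the small means used here
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p < L:
            return k
        k += 1

def simulate_sites(n_sites=2000, mu=1.0, dispersion=0.5):
    """Each site gets a latent NB mean (gamma-mixed Poisson) and an observed count."""
    sites = []
    for _ in range(n_sites):
        lam = random.gammavariate(dispersion, mu / dispersion)  # NB as Poisson-gamma
        sites.append((lam, poisson(lam)))
    return sites

def sensitivity_specificity(sites, top_frac=0.05):
    """Flag the top 5% of sites by observed count; score against the latent means."""
    n_flag = int(top_frac * len(sites))
    by_truth = sorted(sites, key=lambda s: s[0], reverse=True)
    true_hot = set(id(s) for s in by_truth[:n_flag])
    by_obs = sorted(sites, key=lambda s: s[1], reverse=True)
    flagged = set(id(s) for s in by_obs[:n_flag])
    tp = len(true_hot & flagged)
    fp = n_flag - tp          # flagged but not truly hazardous
    fn = n_flag - tp          # truly hazardous but missed
    tn = len(sites) - n_flag - fp
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = sensitivity_specificity(simulate_sites())
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

With highly dispersed data, specificity stays high almost by construction (few sites are flagged), while sensitivity degrades — the direction of the trade-off the abstract reports.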
Affiliation(s)
- Jhan Kevin Gil-Marin
- Department of Civil and Environmental Engineering, University of Maine, Orono, ME, 04469, USA.
- Mohammadali Shirazi
- Department of Civil and Environmental Engineering, University of Maine, Orono, ME, 04469, USA.
- John N Ivan
- Department of Civil and Environmental Engineering, University of Connecticut, Storrs, CT, 06269, USA.
2
Pelizzola M, Laursen R, Hobolth A. Model selection and robust inference of mutational signatures using Negative Binomial non-negative matrix factorization. BMC Bioinformatics 2023;24:187. [PMID: 37158829] [PMCID: PMC10165836] [DOI: 10.1186/s12859-023-05304-1] [Received: 01/11/2023; Accepted: 04/25/2023; Indexed: 05/10/2023] [Open Access]
Abstract
BACKGROUND The spectrum of mutations in a collection of cancer genomes can be described by a mixture of a few mutational signatures, which can be found using non-negative matrix factorization (NMF). To extract the mutational signatures we have to assume a distribution for the observed mutational counts and a number of mutational signatures (the rank of the factorization). In most applications, the mutational counts are assumed to be Poisson distributed, and the rank is chosen by comparing the fit of several models with the same underlying distribution and different ranks using classical model selection procedures. However, the counts are often overdispersed, and thus the Negative Binomial distribution is more appropriate. RESULTS We propose a Negative Binomial NMF with a patient-specific dispersion parameter to capture the variation across patients, and derive the corresponding update rules for parameter estimation. We also introduce a novel model selection procedure inspired by cross-validation to determine the number of signatures. Using simulations, we study the influence of the distributional assumption on our method together with other classical model selection procedures. We also present a simulation study with a method comparison in which we show that state-of-the-art methods substantially overestimate the number of signatures when overdispersion is present. We apply our proposed analysis to a wide range of simulated data and to two real data sets from breast and prostate cancer patients. For the real data we describe a residual analysis to investigate and validate the model choice. CONCLUSIONS With our results on simulated and real data we show that our model selection procedure is more robust at determining the correct number of signatures under model misspecification, and more accurate than the available methods in the literature at finding the true number of signatures. Lastly, the residual analysis clearly emphasizes the overdispersion in the mutational count data. The code for our model selection procedure and Negative Binomial NMF is available in the R package SigMoS at https://github.com/MartaPelizzola/SigMoS.
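The overdispersion that motivates the Negative Binomial assumption can be checked directly from summary moments. A minimal sketch with hypothetical counts (not the paper's mutational catalogues), using the moment relation var = m + m²/k of the usual NB size parameterization:

```python
import statistics

# Hypothetical mutational counts for one patient across contexts (toy numbers).
counts = [0, 0, 1, 0, 3, 12, 0, 5, 0, 22, 1, 0, 7, 0, 2, 15, 0, 1, 4, 0]

m = statistics.mean(counts)
v = statistics.variance(counts)  # sample variance

# Under a Poisson model the variance should roughly equal the mean.
print(f"mean={m:.2f} variance={v:.2f} (variance/mean={v / m:.1f})")

# Method-of-moments Negative Binomial: var = m + m^2/k  =>  k = m^2 / (v - m)
if v > m:
    k_hat = m * m / (v - m)
    print(f"overdispersed; moment estimate of NB size k = {k_hat:.2f}")
```

A variance-to-mean ratio far above 1 is exactly the residual pattern the abstract says rules out the Poisson model.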
Affiliation(s)
- Marta Pelizzola
- Department of Mathematics, Aarhus University, Aarhus, Denmark.
- Asger Hobolth
- Department of Mathematics, Aarhus University, Aarhus, Denmark.
3
Tan YL, Yiew TH, Lau LS, Tan AL. Environmental Kuznets curve for biodiversity loss: evidence from South and Southeast Asian countries. Environ Sci Pollut Res Int 2022;29:64004-64021. [PMID: 35467185] [DOI: 10.1007/s11356-022-20090-8] [Received: 11/23/2021; Accepted: 04/01/2022; Indexed: 06/14/2023]
Abstract
This study aims to explore the income-biodiversity loss nexus in South and Southeast Asian countries over the period 2013 to 2019. Negative Binomial regression models are used to deal with the count nature of the dependent variable, with specific emphasis on different taxonomic groups of threatened species, namely mammal, bird, reptile, amphibian, fish, mollusk, other invertebrate, plant, and total threatened species. We find strong support for an inverted U-shaped relationship between income and biodiversity loss in all taxonomic groups of threatened species examined. Additionally, agricultural land has a significant and positive effect on biodiversity loss. Control of corruption and biodiversity loss are found to be negatively associated. The inverted U-shaped EKC suggests that South and Southeast Asian countries should identify policy priority areas that can sustain robust economic growth while reducing biodiversity loss. Our findings also provide valuable policy insights to help policymakers better cope with the problem of biodiversity loss via corruption control and agricultural land use.
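Under the quadratic-in-log-income specification commonly used in EKC count regressions, the turning point of the inverted U falls where the derivative with respect to log income vanishes. A small sketch with illustrative coefficients (assumed values, not the paper's estimates):

```python
import math

# Hypothetical NB regression coefficients on log-income and its square
# (illustrative values only; an inverted U requires b2 < 0).
b1, b2 = 2.4, -0.14

# Expected count: exp(b0 + b1*ln(y) + b2*ln(y)^2); setting the derivative
# with respect to ln(y) to zero gives ln(y*) = -b1 / (2*b2).
log_y_star = -b1 / (2 * b2)
y_star = math.exp(log_y_star)
print(f"EKC turning point at income level y* = exp({log_y_star:.2f}) = {y_star:,.0f}")
```

The sign test on b2 and the location of y* relative to the sample's income range are what determine whether the inverted-U interpretation holds.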
Affiliation(s)
- Yan-Ling Tan
- Faculty of Business and Management, Universiti Teknologi MARA Cawangan Johor Kampus Segamat, Segamat, Malaysia.
- Thian-Hee Yiew
- Faculty of Business and Finance, Universiti Tunku Abdul Rahman, Kampar, Malaysia.
- Lin-Sea Lau
- Faculty of Business and Finance, Universiti Tunku Abdul Rahman, Kampar, Malaysia.
- Ai-Lian Tan
- Faculty of Business and Finance, Universiti Tunku Abdul Rahman, Kampar, Malaysia.
4
Delgado R. Detecting target species: with how many samples? R Soc Open Sci 2022;9:220046. [PMID: 35958088] [PMCID: PMC9364006] [DOI: 10.1098/rsos.220046] [Received: 01/12/2022; Accepted: 07/20/2022; Indexed: 06/15/2023]
Abstract
The detection of target species is of paramount importance in ecological studies, with implications for environmental management and natural resource conservation planning. Detection is usually done by sampling the area: the species is detected if at least one individual is found in the samples. Green & Young (Green & Young 1993 Sampling to detect rare species. Ecol. Appl. 3, 351-356. (doi:10.2307/1941837)) introduced two models to determine the minimum number of samples n that ensures that the probability of failing to detect the species, if it is actually present in the area, does not exceed a fixed threshold: one based on the Poisson distribution and the other on the Negative Binomial. We generalize them to two scenarios, one considering the area size N to be finite, and the other allowing detectability errors with probability δ. The results in Green & Young are recovered by taking N → ∞ and δ = 0. Ignoring the finite size of the area, if known, leads to an overestimation of n, which is vital to avoid when sampling is expensive or difficult, while assuming that there are no detectability errors, when they really exist, produces an undesirable bias. Our generalization avoids both problems, for both the Poisson and the Negative Binomial models.
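Restricted to the Green & Young baseline recovered at N → ∞ and δ = 0, the minimum sample size follows from the per-sample probability of observing zero individuals. A sketch (function names are ours; the NB parameterization with mean m per sample and clumping parameter k is an assumption):

```python
import math

def n_poisson(mean_per_sample, alpha):
    """Smallest n with P(no detection in n samples) = exp(-n*m) <= alpha."""
    return math.ceil(-math.log(alpha) / mean_per_sample)

def n_negbin(mean_per_sample, k, alpha):
    """NB per-sample zero probability is (1 + m/k)**(-k); solve p0**n <= alpha."""
    p0 = (1 + mean_per_sample / k) ** (-k)
    return math.ceil(math.log(alpha) / math.log(p0))

# Example: mean 0.1 individuals per sample, at most 5% risk of missing the species
print(n_poisson(0.1, 0.05))        # Poisson (random spatial pattern)
print(n_negbin(0.1, 0.5, 0.05))    # clumped pattern (NB, k = 0.5) needs more samples
```

The clumped (NB) case always requires at least as many samples as the Poisson case for the same mean, which is the practical reason the choice of distribution matters.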
Affiliation(s)
- Rosario Delgado
- Department of Mathematics, Universitat Autònoma de Barcelona, Campus de la UAB, Cerdanyola del Vallès 08193, Spain.
5
Goksuluk D, Zararsiz G, Korkmaz S, Eldem V, Zararsiz GE, Ozcetin E, Ozturk A, Karaagaoglu AE. MLSeq: Machine learning interface for RNA-sequencing data. Comput Methods Programs Biomed 2019;175:223-231. [PMID: 31104710] [DOI: 10.1016/j.cmpb.2019.04.007] [Received: 12/23/2018; Revised: 03/21/2019; Accepted: 04/08/2019; Indexed: 06/09/2023]
Abstract
BACKGROUND AND OBJECTIVE In the last decade, RNA-sequencing technology has become the method of choice, preferred over microarray technology for gene-expression-based classification and differential expression analysis since it produces less noisy data. Although many algorithms have been proposed for microarray data, the number of algorithms and programs available for the classification of RNA-sequencing data is limited. For this reason, we developed MLSeq to bring together not only frequently used classification algorithms but also novel approaches, and to make them available for the classification of RNA-sequencing data. The package is developed in the R language environment and distributed through the BIOCONDUCTOR network. METHODS Classification of RNA-sequencing data is not straightforward since raw data should be preprocessed before downstream analysis. With the MLSeq package, researchers can easily preprocess (normalization, filtering, transformation, etc.) and classify raw RNA-sequencing data using two strategies: (i) perform algorithms directly proposed for the RNA-sequencing data structure, or (ii) transform the RNA-sequencing data to bring it distributionally closer to the microarray data structure, and perform algorithms developed for microarray data. Moreover, we proposed novel algorithms through MLSeq, such as voom (an acronym for variance modelling at the observational level) based nearest shrunken centroids (voomNSC), diagonal linear discriminant analysis (voomDLDA), etc. MATERIALS Three real RNA-sequencing datasets (cervical cancer, lung cancer, and aging) were used to evaluate model performances. Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA) were selected as algorithms based on discrete distributions, and voomNSC, nearest shrunken centroids (NSC), and support vector machines (SVM) were selected as algorithms based on continuous distributions for model comparisons. Each algorithm is compared using classification accuracies and sparsities on an independent test set. RESULTS The algorithms based on discrete distributions performed better on the cervical cancer and aging data, with accuracies above 0.92. On the lung cancer data, most algorithms performed similarly, with accuracies around 0.88, except that SVM achieved an accuracy of 0.94. Our voomNSC algorithm was the sparsest, selecting 2.2% and 6.6% of all features for the cervical cancer and lung cancer datasets, respectively. However, on the aging data, sparse classifiers were not able to select an optimal subset of the features. CONCLUSION MLSeq is a comprehensive and easy-to-use interface for the classification of gene expression data. It allows researchers to perform both preprocessing and classification tasks through a single platform. With this property, MLSeq can be considered a pipeline for the classification of RNA-sequencing data.
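MLSeq itself is an R/Bioconductor package; as a language-neutral illustration of the discrete-distribution idea behind PLDA, here is a toy Poisson discriminant that scores a sample's counts under class-specific rates and picks the higher log-likelihood (all rates, counts, and class names are hypothetical):

```python
import math

def poisson_loglik(counts, rates):
    # Sum of Poisson log-pmfs: x*log(r) - r - log(x!)
    return sum(x * math.log(r) - r - math.lgamma(x + 1)
               for x, r in zip(counts, rates))

class_rates = {
    "tumor":  [12.0, 3.0, 8.0, 0.5],   # hypothetical per-gene mean counts
    "normal": [4.0, 6.0, 2.0, 2.5],
}

sample = [10, 2, 9, 1]  # a new sample's counts for the same four genes
pred = max(class_rates, key=lambda c: poisson_loglik(sample, class_rates[c]))
print(pred)
```

Real PLDA/NBLDA additionally estimate the per-class rates from training data with normalization and shrinkage; this sketch only shows the likelihood-comparison step.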
Affiliation(s)
- Dincer Goksuluk
- Department of Biostatistics, School of Medicine, Hacettepe University, 06100, Ankara, Turkey; Turcosa Analytics Solutions Ltd. Co., Erciyes Teknopark 5, 38030, Kayseri, Turkey.
- Gokmen Zararsiz
- Department of Biostatistics, School of Medicine, Erciyes University, 38030, Kayseri, Turkey; Turcosa Analytics Solutions Ltd. Co., Erciyes Teknopark 5, 38030, Kayseri, Turkey.
- Selcuk Korkmaz
- Department of Biostatistics, School of Medicine, Trakya University, 22030, Edirne, Turkey; Turcosa Analytics Solutions Ltd. Co., Erciyes Teknopark 5, 38030, Kayseri, Turkey.
- Vahap Eldem
- Department of Biology, Faculty of Science, Istanbul University, 34452, Istanbul, Turkey.
- Gozde Erturk Zararsiz
- Department of Biostatistics, School of Medicine, Erciyes University, 38030, Kayseri, Turkey.
- Erdener Ozcetin
- Department of Industrial Engineering, Faculty of Engineering, Hitit University, 19030, Corum, Turkey.
- Ahmet Ozturk
- Department of Biostatistics, School of Medicine, Erciyes University, 38030, Kayseri, Turkey; Turcosa Analytics Solutions Ltd. Co., Erciyes Teknopark 5, 38030, Kayseri, Turkey.
- Ahmet Ergun Karaagaoglu
- Department of Biostatistics, School of Medicine, Hacettepe University, 06100, Ankara, Turkey.
6
Largajolli A, Beerahee M, Yang S. Bayesian approach to investigate a two-state mixed model of COPD exacerbations. J Pharmacokinet Pharmacodyn 2019;46:371-384. [PMID: 31197640] [PMCID: PMC6848253] [DOI: 10.1007/s10928-019-09643-6] [Received: 01/11/2019; Accepted: 06/05/2019; Indexed: 11/29/2022]
Abstract
Chronic obstructive pulmonary disease (COPD) is a chronic obstructive disease of the airways. An exacerbation of COPD is defined by shortness of breath, cough, and sputum production. New therapies for COPD exacerbations are frequently examined in clinical trials based on the number of exacerbations, which implies long-term studies due to the high variability in the occurrence and duration of the events. In this work, we expanded the two-state model developed by Cook et al., in which the patient transits from an asymptomatic (state 1) to a symptomatic state (state 2) and vice versa, by investigating different semi-Markov models in a Bayesian context using data from actual clinical trials. Of the four models tested, the log-logistic model was shown to adequately characterize the duration and number of COPD exacerbations. The patient's disease stage was found to be a significant covariate, with the effect of accelerating the transition from the asymptomatic to the symptomatic state. In addition, the best dropout model (log-logistic) was incorporated in the final two-state model to describe the dropout mechanism. Simulation-based diagnostics such as the posterior predictive check (PPC) and visual predictive check (VPC) were used to assess the behaviour of the model. The final model was applied to three clinical trial datasets to investigate its ability to detect the drug effect: the drug effect was captured in all three datasets and in both directions (from state 1 to state 2 and vice versa). A practical design investigation was also carried out and showed the limits of reducing the number of subjects and study length on drug-effect identification. Finally, clinical trial simulation confirmed that the model can potentially be used to predict medium-term (6-12 months) clinical trial outcomes using the first 3 months of data, but at the expense of showing a non-significant drug effect.
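The two-state alternating structure with log-logistic sojourn times can be sketched as a simple simulation (all parameters are hypothetical, not the fitted trial values; the inverse-CDF draw assumes the standard scale/shape log-logistic form):

```python
import random

random.seed(7)

def loglogistic(scale, shape, rng=random):
    """Inverse-CDF draw from F(t) = 1 / (1 + (t/scale)**(-shape))."""
    u = rng.random()
    return scale * (u / (1 - u)) ** (1 / shape)

def simulate_patient(days=365, scale_12=60.0, shape_12=1.5,
                     scale_21=10.0, shape_21=2.0):
    """Alternate asymptomatic (state 1) -> symptomatic (state 2) -> state 1 ...
    with log-logistic sojourn times; return the number of exacerbation onsets
    within the follow-up window. Hypothetical parameter values."""
    t, state, n_exac = 0.0, 1, 0
    while t < days:
        if state == 1:
            t += loglogistic(scale_12, shape_12)   # time until next exacerbation
            if t < days:
                n_exac += 1
            state = 2
        else:
            t += loglogistic(scale_21, shape_21)   # exacerbation duration
            state = 1
    return n_exac

counts = [simulate_patient() for _ in range(500)]
print(f"mean exacerbations per patient-year = {sum(counts) / len(counts):.1f}")
```

The heavy right tail of the log-logistic sojourns is what produces the high between-patient variability in exacerbation counts that the abstract cites as the reason trials must run long.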
Affiliation(s)
- Anna Largajolli
- GlaxoSmithKline, Research and Development, Uxbridge, UK; Certara Strategic Consulting, Via G.B. Pirelli 27, 20124, Milano, Italy.
- Shuying Yang
- GlaxoSmithKline, Research and Development, Uxbridge, UK; Clinical Pharmacology Modelling and Simulation, Quantitative Sciences, GlaxoSmithKline, Stockley Park West, 1-3 Ironbridge Road, Uxbridge, Middlesex, UB11 1BT, UK.
7
Shirazi M, Dhavala SS, Lord D, Geedipally SR. A methodology to design heuristics for model selection based on the characteristics of data: Application to investigate when the Negative Binomial Lindley (NB-L) is preferred over the Negative Binomial (NB). Accid Anal Prev 2017;107:186-194. [PMID: 28886410] [DOI: 10.1016/j.aap.2017.07.002] [Received: 02/23/2017; Revised: 05/25/2017; Accepted: 07/04/2017; Indexed: 06/07/2023]
Abstract
Safety analysts usually use post-modeling methods, such as Goodness-of-Fit statistics or the Likelihood Ratio Test, to decide between two or more competing distributions or models. Such metrics require all competing distributions to be fitted to the data before any comparisons can be made. Given the continuous growth in the number of new statistical distributions, choosing the best one using such post-modeling methods is not a trivial task, in addition to all the theoretical or numerical issues the analyst may face during the analysis. Furthermore, and most importantly, these measures or tests do not provide any intuition about why a specific distribution (or model) is preferred over another (Goodness-of-Logic). This paper addresses these issues by proposing a methodology to design heuristics for model selection based on the characteristics of the data, in terms of descriptive summary statistics, before fitting the models. The proposed methodology employs two analytic tools, (1) Monte Carlo simulations and (2) machine learning classifiers, to design easy heuristics to predict the label of the 'most-likely-true' distribution for analyzing the data. The proposed methodology was applied to investigate when the recently introduced Negative Binomial Lindley (NB-L) distribution is preferred over the Negative Binomial (NB) distribution. Heuristics were designed to select the 'most-likely-true' distribution between these two distributions, given a set of prescribed summary statistics of the data. The proposed heuristics were successfully compared against classical tests on several real or observed datasets. Not only are they easy to use and free of any post-modeling inputs, but, using these heuristics, the analyst can also gain useful insight into why the NB-L is preferred over the NB, or vice versa, when modeling data.
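The heuristics operate on pre-modeling descriptive summary statistics of the counts. A sketch of such a feature vector on toy data (the paper's actual feature set and decision rules come from its simulations and trained classifiers; these statistic names are ours):

```python
import statistics

# Toy crash-count data with many zeros and a long tail.
counts = [0] * 40 + [1] * 8 + [2] * 5 + [3, 4, 4, 6, 9, 15, 31]

m = statistics.mean(counts)
v = statistics.variance(counts)
features = {
    "mean": m,
    "variance-to-mean": v / m,                       # overdispersion
    "pct-zeros": counts.count(0) / len(counts),      # excess zeros
    "skewness": sum((x - m) ** 3 for x in counts) / (len(counts) * v ** 1.5),
}
for name, val in features.items():
    print(f"{name}: {val:.2f}")
```

A heuristic of the kind the paper designs would map such a feature vector to an "NB" or "NB-L" label before any model is fitted.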
Affiliation(s)
- Mohammadali Shirazi
- Zachry Department of Civil Engineering, Texas A&M University, College Station, TX 77843, United States.
- Dominique Lord
- Zachry Department of Civil Engineering, Texas A&M University, College Station, TX 77843, United States.
8
Cairns J, Lynch AG, Tavaré S. Quantifying the impact of inter-site heterogeneity on the distribution of ChIP-seq data. Front Genet 2014;5:399. [PMID: 25452765] [PMCID: PMC4231950] [DOI: 10.3389/fgene.2014.00399] [Received: 06/09/2014; Accepted: 10/29/2014; Indexed: 12/13/2022] [Open Access]
Abstract
Chromatin Immunoprecipitation followed by sequencing (ChIP-seq) is a valuable tool for epigenetic studies. Analysis of the data arising from ChIP-seq experiments often requires implicit or explicit statistical modeling of the read counts. The simple Poisson model is attractive but does not provide a good fit to observed ChIP-seq data. Researchers therefore often either extend to a more general model (e.g., the Negative Binomial) and/or exclude regions of the genome that do not conform to the model. Since many modeling strategies employed for ChIP-seq data reduce to fitting a mixture of Poisson distributions, we explore the problem of inferring the optimal mixing distribution. We apply the Constrained Newton Method (CNM), which suggests the Negative Binomial-Negative Binomial (NB-NB) mixture model as a candidate for modeling ChIP-seq data. We illustrate fitting the NB-NB model with an accelerated EM algorithm on four data sets from three species. Zero-inflated models have been suggested as an approach to improve model fit for ChIP-seq data. We show that the NB-NB mixture model requires no zero-inflation and suggest that in some cases the need for zero-inflation is driven by the model's inability to cope with both artifactual large read counts and the frequently observed very low read counts. We see that the CNM-based approach is a useful diagnostic for the assessment of model fit and inference in ChIP-seq data and beyond. Use of the suggested NB-NB mixture model will be of value not only when calling peaks or otherwise modeling ChIP-seq data, but also when simulating data or constructing blacklists de novo.
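Since many of the modeling strategies reduce to Poisson mixtures, a minimal EM fit of a two-component Poisson mixture illustrates the building block (toy counts; this is not the Constrained Newton Method or the NB-NB model itself):

```python
import math

# Toy read counts: a low-signal background component mixed with an enriched one.
counts = [0, 1, 0, 2, 1, 0, 9, 11, 8, 12, 0, 1, 10, 2, 0, 13, 1, 9]

def pois_pmf(x, lam):
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

w, lam1, lam2 = 0.5, 1.0, 8.0   # initial guesses: weight and component means
for _ in range(200):
    # E-step: responsibility of component 1 for each count
    r = [w * pois_pmf(x, lam1) /
         (w * pois_pmf(x, lam1) + (1 - w) * pois_pmf(x, lam2))
         for x in counts]
    # M-step: update the mixture weight and the component means
    w = sum(r) / len(counts)
    lam1 = sum(ri * x for ri, x in zip(r, counts)) / sum(r)
    lam2 = sum((1 - ri) * x for ri, x in zip(r, counts)) / (len(counts) - sum(r))

print(f"w={w:.2f} lam1={lam1:.2f} lam2={lam2:.2f}")
```

Replacing each Poisson component with a Negative Binomial (i.e., letting each component's rate itself vary) gives a model of the NB-NB type; the mixing distribution is what the CNM infers nonparametrically.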
Affiliation(s)
- Jonathan Cairns
- Nuclear Dynamics Group, The Babraham Institute, Cambridge, UK; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
- Andy G Lynch
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
- Simon Tavaré
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
9
Goh KCK, Currie G, Sarvi M, Logan D. Bus accident analysis of routes with/without bus priority. Accid Anal Prev 2014;65:18-27. [PMID: 24406378] [DOI: 10.1016/j.aap.2013.12.002] [Received: 10/15/2013; Revised: 11/21/2013; Accepted: 12/05/2013; Indexed: 06/03/2023]
Abstract
This paper summarises findings on road safety performance and bus-involved accidents in Melbourne along roads where bus priority measures had been applied. Results from an empirical analysis of the accident types revealed a significant reduction in the proportion of accidents involving buses hitting stationary objects and vehicles, suggesting that bus priority helps address manoeuvrability issues for buses. Mixed-effects negative binomial (MENB) regression and back-propagation neural network (BPNN) modelling of bus accidents, considering wider influences on accident rates at the route-section level, also revealed significant safety benefits when bus priority is provided. Sensitivity analyses on the BPNN model showed general agreement in the predicted accident frequency between the two models. The slightly better performance recorded by the MENB model suggests merit in adopting a mixed-effects modelling approach for accident-count prediction in practice, given its capability to account for unobserved location- and time-specific factors. A major implication of this research is that bus priority in Melbourne's context acts to improve road safety and should be a major consideration for road management agencies when implementing bus priority and road schemes.
Affiliation(s)
- Kelvin Chun Keong Goh
- Department of Civil Engineering, Building 60, Monash University, Clayton, VIC 3800, Australia.
- Graham Currie
- Department of Civil Engineering, Building 60, Monash University, Clayton, VIC 3800, Australia.
- Majid Sarvi
- Department of Civil Engineering, Building 60, Monash University, Clayton, VIC 3800, Australia.
- David Logan
- Monash University Accident Research Centre, Building 70, Monash University, Clayton, VIC 3800, Australia.