76. Ramirez ED, Hagen SJ. The quantitative measure and statistical distribution of fame. PLoS One 2018; 13:e0200196. [PMID: 29979792] [PMCID: PMC6034871] [DOI: 10.1371/journal.pone.0200196]
Abstract
Fame and celebrity play an ever-increasing role in our culture. However, despite the cultural and economic importance of fame and its gradations, there exists no consensus method for quantifying the fame of an individual, or of comparing that of two individuals. We argue that, even if fame is difficult to measure with precision, one may develop useful metrics for fame that correlate well with intuition and that remain reasonably stable over time. Using datasets of recently deceased individuals who were highly renowned, we have evaluated several internet-based methods for quantifying fame. We find that some widely-used internet-derived metrics, such as search engine results, correlate poorly with human subject judgments of fame. However other metrics exist that agree well with human judgments and appear to offer workable, easily accessible measures of fame. Using such a metric we perform a preliminary investigation of the statistical distribution of fame, which has some of the power law character seen in other natural and social phenomena such as landslides and market crashes. In order to demonstrate how such findings can generate quantitative insight into celebrity culture, we assess some folk ideas regarding the frequency distribution and apparent clustering of celebrity deaths.
77. Bhadra A, Rao A, Baladandayuthapani V. Inferring network structure in non-normal and mixed discrete-continuous genomic data. Biometrics 2018; 74:185-195. [PMID: 28437848] [PMCID: PMC5654714] [DOI: 10.1111/biom.12711]
Abstract
Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations when these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach.
78. Baggio S, Iglesias K, Rousson V. Modeling count data in the addiction field: Some simple recommendations. Int J Methods Psychiatr Res 2018; 27:e1585. [PMID: 29027305] [PMCID: PMC6877188] [DOI: 10.1002/mpr.1585]
Abstract
Analyzing count data is frequent in addiction studies but may be cumbersome, time-consuming, and a source of misleading inference if models are not correctly specified. We compared different statistical models in a simulation study to provide simple, yet valid, recommendations for analyzing count data. We used two simulation studies to test the performance of seven statistical models (classical or quasi-Poisson regression, classical or zero-inflated negative binomial regression, classical or heteroskedasticity-consistent linear regression, and the Mann-Whitney test) for estimating the differences between population means for nine different population distributions (Poisson; negative binomial; zero- and one-inflated Poisson and negative binomial; uniform; left-skewed; and bimodal). We considered a large number of scenarios likely to occur in addiction research: presence of outliers, unbalanced design, and presence of confounding factors. In unadjusted models, the Mann-Whitney test performed best, followed closely by heteroskedasticity-consistent linear regression and quasi-Poisson regression; Poisson regression was by far the worst model. In adjusted models, quasi-Poisson regression was the best model. If the goal is to compare two groups with respect to count data, a simple recommendation is to use quasi-Poisson regression, which was the most generally valid model in our extensive simulations.
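The quasi-Poisson approach recommended above keeps the Poisson mean structure but estimates a dispersion parameter from the Pearson chi-square statistic and inflates standard errors by its square root. As a minimal sketch of that idea for a groups-only model (our own illustration, not the authors' code; function names and the two-group setup are ours):

```python
from math import sqrt

def pearson_dispersion(groups):
    """Quasi-Poisson dispersion estimate: Pearson chi-square divided by
    residual degrees of freedom, with one fitted rate per group (the
    Poisson MLE of a group's rate is its sample mean, assumed positive)."""
    n = sum(len(g) for g in groups)
    p = len(groups)  # number of fitted rate parameters
    chi2 = sum((y - sum(g) / len(g)) ** 2 / (sum(g) / len(g))
               for g in groups for y in g)
    return chi2 / (n - p)

def quasi_poisson_se(group, phi):
    """Standard error of a group's rate estimate, inflated by sqrt(phi);
    with phi = 1 this is the ordinary Poisson standard error."""
    mu = sum(group) / len(group)
    return sqrt(phi * mu / len(group))
```

When the data are overdispersed, the estimated phi exceeds 1 and the inflated standard errors protect against the anti-conservative inference that plain Poisson regression produces.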
79. Harder S, Graff J, Klinkhardt U, von Hentig N, Walenga JM, Watanabe H, Osakabe M, Breddin HK. Transition from argatroban to oral anticoagulation with phenprocoumon or acenocoumarol: effects on prothrombin time, activated partial thromboplastin time, and Ecarin Clotting Time. Thromb Haemost 2017; 91:1137-45. [PMID: 15175800] [DOI: 10.1160/th03-12-0794]
Abstract
Treatment with the direct thrombin inhibitor argatroban (ARG) is often followed by vitamin K antagonist (VKA) treatment. Phenprocoumon (PC) and acenocoumarol (AC) are frequently used in Europe. The standard monitoring test for VKA, prothrombin time (PT), is prolonged by direct thrombin inhibitors; therefore, the International Normalized Ratio (INR) obtained during combined treatment does not reflect the true effect of the VKA. A similar interference of the VKA with the activated partial thromboplastin time (aPTT), a monitoring assay for direct thrombin inhibitors, can occur. In 39 healthy volunteers, the effect of ARG alone or combined with PC or AC on PT, INR, aPTT, and Ecarin Clotting Time (ECT) was investigated. Six groups, each of 6-8 volunteers, received a 5-hour infusion of 1.0, 2.0, or 3.0 µg/kg/min ARG (days 1, 3, 4, and 5) before initiation of either PC or AC (day 1) and during continued VKA dosing (target INR 2-3). A linear relationship (INR ARG+VKA = intercept + slope × INR VKA alone) was observed between the INR measured "on" and "off" ARG. The slope depended on the argatroban dose and on the International Sensitivity Index (ISI) of the PT reagent; the steepest slope (i.e., the largest difference between INR ARG+VKA and INR VKA alone) was seen with the highest ARG dose and the PT reagent with an ISI of 2.13. There was a close correlation between plasma levels of ARG and aPTT or ECT. Under VKA, the ARG-aPTT relationship indicated an increased sensitivity of the aPTT to ARG, whereas VKA treatment had no effect on the prolongation of the ECT induced by argatroban. In conclusion, ARG at doses up to 2 µg/kg/min can be discontinued at an INR of 4.0 on combined therapy with VKA, as this would correspond to an INR between 2.2 and 3.7 for the VKA alone. If it is necessary to monitor ARG in the critical transition period, the ECT, which is not influenced by VKA, can be used as an alternative to the aPTT.
80. Sampaio C, Wang RF. The cause of category-based distortions in spatial memory: A distribution analysis. J Exp Psychol Learn Mem Cogn 2017; 43:1988-1992. [PMID: 28504529] [DOI: 10.1037/xlm0000424]
81. Ultsch A, Lötsch J. A data science based standardized Gini index as a Lorenz dominance preserving measure of the inequality of distributions. PLoS One 2017; 12:e0181572. [PMID: 28796778] [PMCID: PMC5552103] [DOI: 10.1371/journal.pone.0181572]
Abstract
The Gini index is a measure of the inequality of a distribution that can be derived from Lorenz curves. While commonly used in, e.g., economic research, it is ambiguous because it does not preserve Lorenz dominance. Here, investigation of large sets of empirical distributions of the incomes of the world's countries over several years indicated, firstly, that the Gini indices are centered on a value of 33.33%, corresponding to the Gini index of the uniform distribution, and secondly, that the Lorenz curves of these distributions are consistent with Lorenz curves of log-normal distributions. This can be employed to provide a Lorenz dominance preserving equivalent of the Gini index. Therefore, a modified measure based on log-normal approximation and standardization of Lorenz curves is proposed. The so-called UGini index provides a meaningful and intuitive standardization on the uniform distribution, as this characterizes societies that provide equal chances, and the novel index preserves Lorenz dominance. Analysis of the probability density distributions of the UGini index of the world's countries' income data indicated multimodality in two independent data sets. Applying Bayesian statistics provided a data-based classification of the world's countries' income distributions. The UGini index can be re-transformed into the classical index to preserve comparability with previous research.
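For a concrete sense of the classical index the paper sets out to repair, the sample Gini index can be computed directly from the sorted values via the standard Lorenz-curve formula (this is the textbook estimator, not the authors' UGini construction):

```python
def gini(values):
    """Classical sample Gini index via the sorted-values formula
    G = sum_i (2i - n - 1) * x_(i) / (n * sum_i x_i); 0 for perfect
    equality, approaching 1 as one unit holds everything."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, 1)) / (n * total)
```

The paper's standardization centres this index on 33.33%, the Gini index of the uniform distribution; the UGini transformation itself relies on the log-normal approximation of Lorenz curves described in the abstract.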
82. Tomitaka S, Kawasaki Y, Ide K, Akutagawa M, Yamada H, Ono Y, Furukawa TA. Characteristic distribution of the total and individual item scores on the Kessler Screening Scale for Psychological Distress (K6) in US adults. BMC Psychiatry 2017; 17:290. [PMID: 28784101] [PMCID: PMC5545851] [DOI: 10.1186/s12888-017-1449-1]
Abstract
BACKGROUND The distributional pattern of total scores on depression screening scales in the general population has not been well studied. Recent studies suggest that total scores on depression screening scales follow an exponential pattern, except at the lower end of the distribution. To investigate these findings further, we determined the distributions of the total and individual item scores on the Kessler Screening Scale for Psychological Distress (K6). METHODS Data were obtained from the National Survey of Midlife Development in the United States. Participants comprised 6,223 individuals between the ages of 25 and 74. The distributions of the total and individual item scores in various combinations were investigated with histograms and regression analysis. RESULTS Irrespective of the combination of items, the total and individual item scores followed an exponential pattern except at the lower scores. The estimated rate parameters from the regression analysis were similar among distributions with the same number of chosen items. At the lower scores, the distributional patterns of the total scores varied according to the ratio of "a little" to "none" for each item response. CONCLUSIONS The present results may enable estimation of the distribution of depressive symptoms in the general population: while the degree of depressive symptoms varies from individual to individual, an entire population may show a certain mathematical distribution.
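The exponential pattern reported above can be checked on a score histogram by regressing log-frequency on score and reading off the rate parameter. A minimal sketch of that check (our own illustration, not the study's analysis code; the `skip` argument drops the lowest scores, where the abstract notes the pattern breaks down):

```python
from math import log

def exponential_rate(freqs, skip=1):
    """Least-squares slope of log(frequency) versus score, skipping the
    lowest `skip` scores; returns the implied exponential rate parameter
    (positive when frequencies decay geometrically with the score)."""
    pts = [(s, log(f)) for s, f in enumerate(freqs) if s >= skip and f > 0]
    n = len(pts)
    mx = sum(s for s, _ in pts) / n
    my = sum(v for _, v in pts) / n
    sxx = sum((s - mx) ** 2 for s, _ in pts)
    sxy = sum((s - mx) * (v - my) for s, v in pts)
    return -sxy / sxx
```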
83. Wu SH, Schwartz RS, Winter DJ, Conrad DF, Cartwright RA. Estimating error models for whole genome sequencing using mixtures of Dirichlet-multinomial distributions. Bioinformatics 2017; 33:2322-2329. [PMID: 28334373] [PMCID: PMC5860108] [DOI: 10.1093/bioinformatics/btx133]
Abstract
MOTIVATION Accurate identification of genotypes is an essential part of the analysis of genomic data, including the identification of sequence polymorphisms, the linking of mutations with disease, and the determination of mutation rates. Biological and technical processes that adversely affect genotyping include copy-number variation, paralogous sequences, library preparation, sequencing error, and reference-mapping biases, among others. RESULTS We modeled the read depth for all data as a mixture of Dirichlet-multinomial distributions, resulting in significant improvements over previously used models. In most cases the best model comprised two distributions. The major-component distribution is similar to a binomial distribution with low error and low reference bias. The minor-component distribution is overdispersed, with higher error and reference bias. We also found that sites fitting the minor component are enriched for copy number variants and low-complexity regions, which can produce erroneous genotype calls. By removing sites that do not fit the major component, we can improve the accuracy of genotype calls. AVAILABILITY AND IMPLEMENTATION Methods and data files are available at https://github.com/CartwrightLab/WuEtAl2017/ (doi:10.5281/zenodo.256858). CONTACT cartwright@asu.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
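The building block of the error model described above is the Dirichlet-multinomial, which overdisperses multinomial read counts at a site. A hedged stdlib sketch of its log-density and a generic two-component mixture (illustrative only; the paper's actual parameterization and fitting procedure are in its repository):

```python
from math import lgamma, exp, log

def dm_logpmf(counts, alphas):
    """Log-density of Dirichlet-multinomial read counts at one site."""
    n, a0 = sum(counts), sum(alphas)
    out = lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
    for k, a in zip(counts, alphas):
        out += lgamma(k + a) - lgamma(k + 1) - lgamma(a)
    return out

def mixture_logpmf(counts, components):
    """components: (weight, alphas) pairs; log-density of the mixture,
    e.g. a concentrated major component plus an overdispersed minor one."""
    return log(sum(w * exp(dm_logpmf(counts, a)) for w, a in components))
```

Larger total concentration sum(alphas) makes the component behave more like a plain multinomial, while small concentrations yield the overdispersion attributed above to the minor component.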
84. Warton DI, Thibaut L, Wang YA. The PIT-trap: a "model-free" bootstrap procedure for inference about regression models with discrete, multivariate responses. PLoS One 2017; 12:e0181790. [PMID: 28738071] [PMCID: PMC5524334] [DOI: 10.1371/journal.pone.0181790]
Abstract
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered in extending residual resampling to regression settings where residuals are not identically distributed, and thus not directly amenable to bootstrapping; common examples include logistic or Poisson regression and generalizations that handle clustered or multivariate data, such as generalized estimating equations. We propose a bootstrap method based on probability integral transform (PIT) residuals, which we call the PIT-trap; it assumes the data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap" adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits a property not afforded by any other residual resampling approach: the marginal distribution of the data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals preserves the correlation in the data without the need for it to be modelled, a key point of difference compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology and demonstrated via simulation to have improved properties compared to competing resampling methods.
85. Souza Vilela Podestá T, Venzel Rosembach T, Aparecida dos Santos A, Lobato Martins M. Anomalous diffusion and q-Weibull velocity distributions in epithelial cell migration. PLoS One 2017; 12:e0180777. [PMID: 28700652] [PMCID: PMC5507264] [DOI: 10.1371/journal.pone.0180777]
Abstract
In multicellular organisms, cell motility is central to all morphogenetic processes, tissue maintenance, wound healing, and immune surveillance. Hence, the control of cell motion is a major demand in the creation of artificial tissues and organs. Here, cell migration assays on plastic 2D surfaces involving normal (MDCK) and tumoral (B16F10) epithelial cell lines were performed, varying the initial density of plated cells. Through time-lapse microscopy, quantities such as speed distributions, velocity autocorrelations, and spatial correlations, as well as the scaling of mean-squared displacements, were determined. We find that these cells exhibit anomalous diffusion with q-Weibull speed distributions that evolve non-monotonically toward a Maxwellian distribution as the initial density of plated cells increases. Although short-ranged spatial velocity correlations mark the formation of small cell clusters, the emergence of collective motion was not observed. Finally, simulation results from a correlated random walk and the Vicsek model of collective dynamics indicate that fluctuations in cell velocity orientations are sufficient to produce the q-Weibull speed distributions seen in our migration assays.
86. Gragg J, Klose E, Yang J. Modelling the stochastic nature of the available coefficient of friction at footwear-floor interfaces. Ergonomics 2017; 60:977-984. [PMID: 27592564] [DOI: 10.1080/00140139.2016.1231346]
Abstract
The available coefficient of friction (ACOF) is a measure of the friction available between two surfaces, which for human gait would be the footwear-floor interface. It is often compared to the required coefficient of friction (RCOF) to determine the likelihood of a slip in gait. Both the ACOF and RCOF are stochastic by nature, meaning that neither should be represented by a deterministic value such as the sample mean. Previous research has determined that the RCOF can be modelled well by either the normal or the lognormal distribution, but previous research aimed at determining an appropriate distribution for the ACOF was inconclusive. This study focuses on modelling the stochastic nature of the ACOF by fitting eight continuous probability distributions to ACOF data for six scenarios. In addition, the data were used to study the effect that a simple housekeeping action such as sweeping could have on the ACOF. Practitioner Summary: Previous research aimed at determining an appropriate distribution for the ACOF was inconclusive. The study addresses this issue as well as examining the effect that an action such as sweeping has on the ACOF.
87. Gerlovina I, van der Laan MJ, Hubbard A. Big Data, Small Sample. Int J Biostat 2017; 13:/j/ijb.2017.13.issue-1/ijb-2017-0012/ijb-2017-0012.xml. [PMID: 28599385] [DOI: 10.1515/ijb-2017-0012]
Abstract
Multiple comparisons and small sample size, common characteristics of many types of "Big Data" including those produced by genomic studies, present specific challenges that affect the reliability of inference. Use of multiple testing procedures necessitates calculation of very small tail probabilities of a test statistic distribution. Results based on large deviation theory provide a formal condition that is necessary to guarantee error rate control given practical sample sizes, linking the number of tests and the sample size; this condition, however, is rarely satisfied. Using methods based on Edgeworth expansions (relying especially on the work of Peter Hall), we explore the impact of departures of sampling distributions from typical assumptions on actual error rates. Our investigation illustrates how far the actual error rates can be from the declared nominal levels, suggesting potentially widespread problems with error rate control, specifically excessive false positives. This is an important factor contributing to the "reproducibility crisis". We also review some other commonly used methods (such as permutation and methods based on finite sampling inequalities) in their application to multiple testing with small samples. We point out that Edgeworth expansions, providing higher-order approximations to the sampling distribution, offer a promising direction for data analysis that could improve the reliability of studies relying on large numbers of comparisons with modest sample sizes.
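The Edgeworth machinery referred to here adds cumulant correction terms to the normal approximation of a standardized mean. A one-term sketch showing the leading skewness correction (our illustration of the general idea; the paper works with higher-order expansions):

```python
from math import erf, exp, pi, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def edgeworth_cdf(x, skew, n):
    """First-order Edgeworth approximation to the CDF of a standardized
    sample mean: Phi(x) - phi(x) * skew * (x^2 - 1) / (6 * sqrt(n)).
    With skew = 0 it reduces to the normal approximation."""
    return norm_cdf(x) - norm_pdf(x) * skew * (x * x - 1.0) / (6.0 * sqrt(n))
```

For a positively skewed population the correction fattens the upper tail relative to the normal, which is exactly the kind of departure that makes tiny nominal p-values unreliable at modest n.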
88. Liu J, Wu Z, Wu J, Dong J, Zhao Y, Wen D. A Weibull distribution accrual failure detector for cloud computing. PLoS One 2017; 12:e0173666. [PMID: 28278229] [PMCID: PMC5344516] [DOI: 10.1371/journal.pone.0173666]
Abstract
Failure detectors are a fundamental component for building high-availability distributed systems. To meet the requirements of complicated large-scale distributed systems, accrual failure detectors that can adapt to multiple applications have been studied extensively. However, several implementations of accrual failure detectors do not adapt well to the cloud service environment. To solve this problem, a new accrual failure detector based on the Weibull distribution, called the Weibull Distribution Failure Detector, is proposed specifically for cloud computing. It can adapt to the dynamic and unexpected network conditions of cloud computing. The performance of the Weibull Distribution Failure Detector is evaluated and compared using public classical experiment data and cloud computing experiment data. The results show that the Weibull Distribution Failure Detector has better performance in terms of speed and accuracy in unstable scenarios, especially in cloud computing.
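An accrual failure detector outputs a continuously growing suspicion level rather than a binary alive/failed verdict; with a Weibull model of heartbeat inter-arrival times, the suspicion after t seconds of silence is typically phi(t) = -log10 P(delay > t). A hedged sketch of that output stage (our own; how the paper estimates the shape and scale from observed heartbeats is its contribution):

```python
from math import exp, log10

def weibull_survival(t, shape, scale):
    """P(inter-heartbeat delay > t) under a Weibull(shape, scale) model."""
    return exp(-((t / scale) ** shape))

def suspicion(t_silent, shape, scale):
    """Accrual suspicion phi = -log10 P(delay > t_silent): it rises
    continuously the longer the node stays silent, and each application
    picks its own threshold to trade detection speed against accuracy."""
    p = weibull_survival(t_silent, shape, scale)
    return -log10(p) if p > 0.0 else float("inf")
```

With shape = 1 the model reduces to the exponential case; shape > 1 captures heartbeat delays whose hazard grows with time, which is one reason a Weibull fit can track unstable cloud networks better than a fixed exponential.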
89. Huang X, Zhang Y, Meng L, Qian M, Wong KKL, Abbott D, Zheng R, Zheng H, Niu L. Identification of Ultrasonic Echolucent Carotid Plaques Using Discrete Fréchet Distance Between Bimodal Gamma Distributions. IEEE Trans Biomed Eng 2017; 65:949-955. [PMID: 28278452] [DOI: 10.1109/tbme.2017.2676129]
Abstract
OBJECTIVE Echolucent carotid plaques are associated with acute cardiovascular and cerebrovascular events (ACCEs) in atherosclerotic patients. The aim of this study was to develop a computer-aided method for identifying echolucent plaques. METHODS A total of 315 ultrasound images of carotid plaques (105 echo-rich, 105 intermediate, and 105 echolucent) collected from 153 patients were included in this study. A bimodal gamma distribution was proposed to model the pixel statistics in the grayscale images of plaques. The discrete Fréchet distance features (DFDFs) of each plaque were extracted based on the statistical model. The most discriminative features (MDFs) were obtained from the DFDFs by linear discriminant analysis, and a k-nearest-neighbor classifier was implemented for classification of the different types of plaques. RESULTS The classification accuracy for the three types of plaques using MDFs reached 77.46%. When a receiver operating characteristic curve was produced to identify echolucent plaques, the area under the curve was 0.831. CONCLUSION Our results indicate the potential feasibility of the method for identifying echolucent plaques based on DFDFs. SIGNIFICANCE Our method may improve the ability of noninvasive ultrasonic examination to predict the risk of ACCEs in patients with plaques.
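The discrete Fréchet distance underlying the DFDFs can be computed with the standard dynamic program over two point sequences (a generic sketch; how the plaque distribution curves are built from the bimodal gamma fit is specific to the paper):

```python
from math import hypot

def discrete_frechet(p, q):
    """Discrete Fréchet distance between two curves given as lists of
    (x, y) points, via the standard O(len(p) * len(q)) dynamic program."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = hypot(p[i][0] - q[j][0], p[i][1] - q[j][1])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i][j - 1],
                                   ca[i - 1][j - 1]), d)
    return ca[-1][-1]
```

Unlike a pointwise distance, the Fréchet distance respects the ordering along the two curves, which makes it a natural way to compare fitted distribution curves as shapes.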
91. Rights JD, Sterba SK. The relationship between multilevel models and non-parametric multilevel mixture models: Discrete approximation of intraclass correlation, random coefficient distributions, and residual heteroscedasticity. Br J Math Stat Psychol 2016; 69:316-343. [PMID: 27458827] [DOI: 10.1111/bmsp.12073]
Abstract
Multilevel data structures are common in the social sciences. Often, such nested data are analysed with multilevel models (MLMs) in which heterogeneity between clusters is modelled by continuously distributed random intercepts and/or slopes. Alternatively, the non-parametric multilevel regression mixture model (NPMM) can accommodate the same nested data structures through discrete latent class variation. The purpose of this article is to delineate analytic relationships between NPMM and MLM parameters that are useful for understanding the indirect interpretation of the NPMM as a non-parametric approximation of the MLM, with relaxed distributional assumptions. We define how seven standard and non-standard MLM specifications can be indirectly approximated by particular NPMM specifications. We provide formulas showing how the NPMM can serve as an approximation of the MLM in terms of intraclass correlation, random coefficient means and (co)variances, heteroscedasticity of residuals at level 1, and heteroscedasticity of residuals at level 2. Further, we discuss how these relationships can be useful in practice. The specific relationships are illustrated with simulated graphical demonstrations, and direct and indirect interpretations of NPMM classes are contrasted. We provide an R function to aid in implementing and visualizing an indirect interpretation of NPMM classes. An empirical example is presented and future directions are discussed.
92. Wilcox RR. Comparing dependent robust correlations. Br J Math Stat Psychol 2016; 69:215-224. [PMID: 27114391] [DOI: 10.1111/bmsp.12069]
Abstract
Let r1 and r2 be two dependent estimates of Pearson's correlation. There is a substantial literature on testing H0: ρ1 = ρ2, the hypothesis that the population correlation coefficients are equal. However, it is well known that Pearson's correlation is not robust: even a single outlier can have a substantial impact on it, resulting in a misleading understanding of the strength of the association among the bulk of the points. A way of mitigating this concern is to use a correlation coefficient that guards against outliers, many of which have been proposed. But apparently there are no results on how to compare dependent robust correlation coefficients when there is heteroscedasticity. Extant results suggest that a basic percentile bootstrap will perform reasonably well. This paper reports simulation results indicating the extent to which this is true when using Spearman's rho, a Winsorized correlation, or a skipped correlation.
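Of the robust coefficients compared here, the Winsorized correlation is the simplest to state: clamp each margin's extreme gamma fraction to the nearest retained order statistic, then take Pearson's correlation of the result. A stdlib sketch (our own, with the conventional gamma = 0.2; the paper's contribution is pairing such estimators with a percentile bootstrap to compare two dependent correlations):

```python
def winsorize(xs, gamma=0.2):
    """Clamp the lowest and highest floor(gamma * n) values to the
    nearest retained order statistic."""
    n = len(xs)
    g = int(gamma * n)
    s = sorted(xs)
    lo, hi = s[g], s[n - g - 1]
    return [min(max(x, lo), hi) for x in xs]

def winsorized_correlation(x, y, gamma=0.2):
    """Pearson correlation of the marginally Winsorized samples; a single
    wild outlier is clamped instead of dominating the estimate."""
    wx, wy = winsorize(x, gamma), winsorize(y, gamma)
    n = len(wx)
    mx, my = sum(wx) / n, sum(wy) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(wx, wy))
    sxx = sum((a - mx) ** 2 for a in wx)
    syy = sum((b - my) ** 2 for b in wy)
    return sxy / (sxx * syy) ** 0.5
```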
93. Cheng Y, Liu H. A short note on the maximal point-biserial correlation under non-normality. Br J Math Stat Psychol 2016; 69:344-351. [PMID: 27458986] [DOI: 10.1111/bmsp.12075]
Abstract
The aim of this paper is to derive the maximal point-biserial correlation under non-normality. Several widely used non-normal distributions are considered, namely the uniform distribution, the t-distribution, the exponential distribution, and a mixture of two normal distributions. Results show that the maximal point-biserial correlation, depending on the non-normal continuous variable underlying the binary manifest variable, may not be a function of p (the probability that the dichotomous variable takes the value 1), can be symmetric or non-symmetric around p = .5, and may still lie in the range from -1.0 to 1.0. Therefore, researchers should exercise caution when interpreting their sample point-biserial correlation coefficients on the basis of the popular beliefs that the maximal point-biserial correlation is always smaller than 1, and that the size of the correlation is always further restricted as p deviates from .5.
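As a baseline for the non-normal results derived in the paper: when the underlying continuous variable is normal, the maximal point-biserial correlation is the classical phi(z_p)/sqrt(p(1-p)), peaking at about .798 for p = .5 and shrinking symmetrically as p departs from .5. A sketch of that normal-theory bound (our illustration; the paper's point is precisely that non-normal latents break these properties):

```python
from math import erf, exp, pi, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_quantile(p):
    """Standard normal quantile by bisection (adequate for illustration)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def max_point_biserial_normal(p):
    """Normal-theory maximum of the point-biserial correlation:
    phi(z_p) / sqrt(p * (1 - p)), where z_p is the p-quantile of the
    standard normal and phi is its density."""
    z = norm_quantile(p)
    return exp(-0.5 * z * z) / sqrt(2.0 * pi) / sqrt(p * (1.0 - p))
```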
94. Amiri S, Dinov ID. Comparison of genomic data via statistical distribution. J Theor Biol 2016; 407:318-327. [PMID: 27460589] [PMCID: PMC5361063] [DOI: 10.1016/j.jtbi.2016.07.032]
Abstract
Sequence comparison has become an essential tool in bioinformatics, because highly homologous sequences usually imply significant functional or structural similarity. Traditional sequence analysis techniques are based on preprocessing and alignment, which facilitate measuring and quantitatively characterizing genetic differences, variability, and complexity. However, recent developments in next-generation and whole-genome sequencing technologies give rise to new challenges related to measuring similarity and capturing rearrangements of large segments contained in the genome. This work is devoted to illustrating different methods recently introduced for quantifying sequence distances and variability. Most of the alignment-free methods rely on counting words, which are small contiguous fragments of the genome. Our approach considers the locations of nucleotides in the sequences and relies more on appropriate statistical distributions. The results of this technique, which compares sequences by extracting and matching fidelity and location-regularization information, are very encouraging, specifically for classifying mutation sequences.
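The word-counting baseline the abstract contrasts itself with can be illustrated in a few lines: count k-mers in each sequence and compare the count vectors (our sketch of the generic alignment-free baseline, not the location-based statistics proposed in the paper):

```python
from collections import Counter
from math import sqrt

def kmer_counts(seq, k):
    """Counts of all overlapping k-mers ("words") in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine_distance(seq1, seq2, k=3):
    """Alignment-free distance: 1 - cosine similarity of k-mer count
    vectors; 0 for identical composition, 1 for no shared k-mers."""
    c1, c2 = kmer_counts(seq1, k), kmer_counts(seq2, k)
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = sqrt(sum(v * v for v in c1.values()))
    n2 = sqrt(sum(v * v for v in c2.values()))
    return 1.0 - dot / (n1 * n2)
```

Because it discards word positions entirely, such a baseline is blind to large-segment rearrangements, which is the gap the location-aware approach above is aimed at.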
|
95
|
Epley N, Dunning D. The Mixed Blessings of Self-Knowledge in Behavioral Prediction: Enhanced Discrimination but Exacerbated Bias. PERSONALITY AND SOCIAL PSYCHOLOGY BULLETIN 2016; 32:641-55. [PMID: 16702157 DOI: 10.1177/0146167205284007] [Citation(s) in RCA: 84] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Four experiments demonstrate that self-knowledge provides a mixed blessing in behavioral prediction, depending on how accuracy is measured. Compared with predictions of others, self-knowledge tends to decrease overall accuracy by increasing bias (the mean difference between predicted behavior and reality) but tends to increase overall accuracy by also enhancing discrimination (the correlation between predicted behavior and reality). Overall, participants' self-predictions overestimated the likelihood that they would engage in desirable behaviors (bias), whereas peer predictions were relatively unbiased. However, self-predictions also were more strongly correlated with individual differences in actual behavior (discrimination) than were peer predictions. Discussion addresses the costs and benefits of self-knowledge in behavioral prediction and the broader implications of measuring accuracy of judgment in terms of bias versus discrimination.
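The bias/discrimination distinction can be sketched on hypothetical numbers (not the study's data): bias is the mean signed error of predictions, discrimination their correlation with actual behavior; self-predictions can be worse on one measure and better on the other.

```python
import numpy as np

rng = np.random.default_rng(1)
actual = rng.binomial(1, 0.5, 500).astype(float)  # whether each person acted (toy data)

# Self-predictions: track individual differences well but are optimistic.
self_pred = np.clip(actual * 0.6 + 0.35 + rng.normal(0, 0.1, 500), 0, 1)
# Peer predictions: unbiased on average but weakly tied to individuals.
peer_pred = np.clip(0.5 + rng.normal(0, 0.15, 500), 0, 1)

def bias(pred, actual):
    """Mean signed difference between prediction and reality."""
    return (pred - actual).mean()

def discrimination(pred, actual):
    """Correlation between prediction and reality."""
    return np.corrcoef(pred, actual)[0, 1]
```

Here the self-predictions show larger bias yet far higher discrimination, reproducing the abstract's trade-off.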
|
96
|
Abstract
The spatial distribution of income shapes the structure and organisation of cities, and understanding it has broad societal implications. Despite an abundant literature, many issues remain unclear. In particular, definitions of segregation are implicitly tied to a single indicator and usually rely on an ambiguous definition of income classes, with no consensus on how to define neighbourhoods or how to deal with the polycentric organisation of large cities. In this paper, we address all these questions within a single conceptual framework. We avoid the challenge of providing a direct definition of segregation and instead start from a definition of what segregation is not. This naturally leads to a measure of representation that identifies locations where categories are over- or underrepresented. From there, we provide a new measure of exposure that discriminates between situations where categories co-locate or repel one another. We then use this feature to provide an unambiguous, parameter-free method to find meaningful breaks in the income distribution, thus defining classes. Applied to the 2014 American Community Survey, we find three emerging classes (low, middle and higher income) out of the original 16 income categories. The higher-income households are proportionally more present in larger cities, while lower-income households are not, invalidating the idea of an increased social polarisation. Finally, using density (rather than distance to a center, which is meaningless in polycentric cities) we find that the richer class is overrepresented in high-density zones, especially in larger cities. This suggests that density is a relevant factor for understanding the income structure of cities and might explain some of the differences observed between US and European cities.
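A representation measure of the kind described can be sketched as the ratio of a category's local share to its citywide share (toy counts; the paper's exact estimator may differ): values above 1 mean the category is overrepresented in that location.

```python
# Counts of low/high income households in three zones (hypothetical numbers).
counts = {
    "downtown": {"low": 100, "high": 300},
    "suburb":   {"low": 250, "high": 150},
    "edge":     {"low": 150, "high":  50},
}

totals_cat = {c: sum(z[c] for z in counts.values()) for c in ("low", "high")}
total = sum(totals_cat.values())

def representation(zone, cat):
    """Local share of `cat` divided by its citywide share.
    > 1: overrepresented in `zone`; < 1: underrepresented."""
    n_zone = sum(counts[zone].values())
    return (counts[zone][cat] / n_zone) / (totals_cat[cat] / total)

r_high_downtown = representation("downtown", "high")  # 0.75 / 0.50 = 1.5
```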
|
97
|
|
98
|
Smits IAM, Timmerman ME, Stegeman A. Modelling non-normal data: The relationship between the skew-normal factor model and the quadratic factor model. THE BRITISH JOURNAL OF MATHEMATICAL AND STATISTICAL PSYCHOLOGY 2016; 69:105-121. [PMID: 26566696 DOI: 10.1111/bmsp.12062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/10/2013] [Revised: 09/14/2015] [Indexed: 06/05/2023]
Abstract
Maximum likelihood estimation of the linear factor model for continuous items assumes normally distributed item scores. We consider deviations from normality by means of a skew-normally distributed factor model or a quadratic factor model. We show that the item distributions under a skew-normal factor model are equivalent to those under a quadratic model up to third-order moments. The reverse only holds if the quadratic loadings are equal to each other and within certain bounds. We illustrate that observed data which follow any skew-normal factor model can be so well approximated with the quadratic factor model that the models are empirically indistinguishable, and that the reverse does not hold in general. The choice between the two models to account for deviations from normality is illustrated by an empirical example from clinical psychology.
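The skewing effect of a quadratic loading can be checked by simulation (illustrative loadings, not taken from the paper): an item generated as a linear-plus-quadratic function of a normal factor has clearly positive skewness.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
f = rng.standard_normal(n)  # common factor, standard normal

# Quadratic factor model: item = l1*f + l2*f^2 + error.
# A positive quadratic loading l2 skews the item distribution to the right.
item = 1.0 * f + 0.5 * f**2 + rng.normal(0, 0.5, n)

def skewness(x):
    """Standardized third moment."""
    z = (x - x.mean()) / x.std()
    return (z**3).mean()

s = skewness(item)  # theory gives about 1.7 for these loadings
```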
|
99
|
Godfrey K. Fresh stirrings among statisticians: statistical commentary. AUSTRALIAN ORTHODONTIC JOURNAL 2016; 32:109-112. [PMID: 27468598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
For some years there has been unrest in the statistical world regarding the use of the p-value. It has been argued that the interpretation of p-values is open to question, which in turn weakens their use as a measure of the strength of evidence. This paper examines the use and misuse of the p-value and recommends careful consideration in its application.
|
100
|
d'Acremont M, Bossaerts P. Neural Mechanisms Behind Identification of Leptokurtic Noise and Adaptive Behavioral Response. Cereb Cortex 2016; 26:1818-1830. [PMID: 26850528 PMCID: PMC4785960 DOI: 10.1093/cercor/bhw013] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
Abstract
Large-scale human interaction through, for example, financial markets causes ceaseless random changes in outcome variability, producing frequent and salient outliers that render the outcome distribution more peaked than the Gaussian distribution, and with longer tails. Here, we study how humans cope with this evolutionarily novel leptokurtic noise, focusing on the neurobiological mechanisms that allow the brain 1) to recognize the outliers as noise and 2) to regulate the control necessary for adaptive response. We used functional magnetic resonance imaging while participants tracked a target whose movements were affected by leptokurtic noise. After initial overreaction and insufficient subsequent correction, participants improved performance significantly. Yet, persistently long reaction times pointed to a continued need for vigilance and control. We ran a contrasting treatment where outliers reflected permanent moves of the target, as in traditional mean-shift paradigms. Importantly, outliers were equally frequent and salient. There, control was superior and reaction time was faster. We present a novel reinforcement learning model that fits observed choices better than the Bayes-optimal model. Only anterior insula discriminated between the 2 types of outliers. In both treatments, outliers initially activated an extensive bottom-up attention and belief network, followed by sustained engagement of the fronto-parietal control network.
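Leptokurtic ("peaked, long-tailed") noise can be illustrated by comparing the excess kurtosis of Gaussian and Student-t samples (a sketch; the experiment's actual noise process is not specified here).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

def excess_kurtosis(x):
    """Standardized fourth moment minus 3 (zero for a Gaussian)."""
    z = (x - x.mean()) / x.std()
    return (z**4).mean() - 3.0

k_gauss = excess_kurtosis(rng.standard_normal(n))         # ~0
k_heavy = excess_kurtosis(rng.standard_t(10, size=n))     # theory: 6/(df-4) = 1
```

The t-distributed noise produces the same frequent, salient outliers the abstract describes, even though both samples have the same mean and comparable scale.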
|