526
Abstract
The aim of this review was to survey research addressing the relationship between population drinking and health, particularly mortality. The review is based primarily on articles published in international journals from 1995 to February 2005, identified via Medline. The method used in most studies is time-series analysis based on autoregressive integrated moving average (ARIMA) modelling. The outcome measures covered included the following mortality indicators: mortality from liver cirrhosis and other alcohol-related diseases, accident mortality, suicide, homicide, ischaemic heart disease (IHD) mortality and all-cause mortality. The study countries included most of the EU member states as of 1995 (14 countries), Canada and the United States. For Eastern Europe there was only scanty evidence. The study period was in most cases the post-war period. There was a statistically significant relationship between per capita consumption and mortality from liver cirrhosis and other alcohol-related diseases in all countries. In about half the countries, there was a significant relationship between consumption, on the one hand, and mortality from accidents and homicide as well as all-cause mortality on the other. A link between alcohol and suicide was found in all regions except for mid- and southern Europe. There was no systematic link between consumption and IHD mortality. Overall, a 1-litre increase in per capita consumption was associated with a stronger effect in northern Europe and Canada than in mid- and southern Europe. Research during the past decade has strengthened the notion of a relationship between population drinking and alcohol-related harm. At the same time, the marked regional variation in the magnitude of this relationship suggests that drinking patterns are important in modifying the impact of alcohol. By and large, there was little evidence for any cardioprotective effect at the population level.
It is a challenge for future research to reconcile this outcome with the findings from observational studies, most of which suggest a protective effect of moderate drinking.
527
McClelland RL, Chung H, Detrano R, Post W, Kronmal RA. Distribution of coronary artery calcium by race, gender, and age: results from the Multi-Ethnic Study of Atherosclerosis (MESA). Circulation 2005; 113:30-7. [PMID: 16365194] [DOI: 10.1161/circulationaha.105.580696]
Abstract
BACKGROUND Coronary artery calcium (CAC) has been demonstrated to be associated with the risk of coronary heart disease. The Multi-Ethnic Study of Atherosclerosis (MESA) provides a unique opportunity to examine the distribution of CAC on the basis of age, gender, and race/ethnicity in a cohort free of clinical cardiovascular disease and treated diabetes. METHODS AND RESULTS MESA is a prospective cohort study designed to investigate subclinical cardiovascular disease in a multiethnic cohort free of clinical cardiovascular disease. The percentiles of the CAC distribution were estimated with nonparametric techniques. Treated diabetics were excluded from analysis. There were 6110 participants included in the analysis; 53% were female, and the average age was 62 years. Men had greater calcium levels than women, and calcium amount and prevalence were steadily higher with increasing age. There were significant differences in calcium by race, and these associations differed across age and gender. For women, whites had the highest percentiles and Hispanics generally had the lowest; in the oldest age group, however, Chinese women had the lowest values. Overall, Chinese and black women were intermediate, with their order dependent on age. For men, whites consistently had the highest percentiles, and Hispanics had the second highest. Blacks were lowest at the younger ages, and Chinese were lowest at the older ages. At the MESA public website (http://www.mesa-nhlbi.org), an interactive form allows one to enter an age, gender, race/ethnicity, and CAC score to obtain a corresponding estimated percentile. CONCLUSIONS The information provided here can be used to examine whether a patient has a high CAC score relative to others with the same age, gender, and race/ethnicity who do not have clinical cardiovascular disease or treated diabetes.
528
Foucher Y, Mathieu E, Saint-Pierre P, Durand JF, Daurès JP. A Semi-Markov Model Based on Generalized Weibull Distribution with an Illustration for HIV Disease. Biom J 2005; 47:825-33. [PMID: 16450855] [DOI: 10.1002/bimj.200410170]
Abstract
Multi-state stochastic models are useful tools for studying complex dynamics such as chronic diseases. Semi-Markov models explicitly define distributions of waiting times, extending continuous-time homogeneous Markov models, which are implicitly based on exponential waiting times. This paper develops a parametric model adapted to complex medical processes: (i) we introduce a hazard function of waiting times with a U or inverse-U shape; (ii) these distributions are selected specifically for each transition; and (iii) the vector of covariates is also selected for each transition. We applied this method to the evolution of HIV-infected patients, using a sample of 1244 patients followed up at the hospital in Nice, France.
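The defining feature of a semi-Markov model, a transition-specific waiting-time distribution, can be sketched in a few lines. Everything below is a hypothetical illustration (a made-up three-state illness-death scheme with stdlib Weibull sojourn times standing in for the paper's generalized Weibull):

```python
import random

# Hypothetical transition table: next state, embedded-chain probability,
# and (scale, shape) of a Weibull sojourn time tied to that transition.
TRANSITIONS = {
    "well": [("ill", 0.7, (12.0, 1.5)), ("dead", 0.3, (30.0, 1.2))],
    "ill":  [("well", 0.4, (6.0, 0.8)), ("dead", 0.6, (10.0, 2.0))],
}

def step(rng, state):
    """Draw the next state from the embedded Markov chain, then a waiting
    time from the Weibull attached to that particular transition."""
    u, acc = rng.random(), 0.0
    for nxt, prob, (scale, shape) in TRANSITIONS[state]:
        acc += prob
        if u <= acc:
            return nxt, rng.weibullvariate(scale, shape)
    nxt, _, (scale, shape) = TRANSITIONS[state][-1]
    return nxt, rng.weibullvariate(scale, shape)

rng = random.Random(0)
moves = [step(rng, "ill") for _ in range(20000)]
frac_dead = sum(1 for s, _ in moves if s == "dead") / len(moves)
```

Because the shape parameter differs by transition, each sojourn-time hazard can be monotone, increasing, or (with a richer family such as the generalized Weibull) U-shaped, which is precisely the flexibility the paper argues for.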
529
Stern HS. Model inference or model selection: Discussion of Klugkist, Laudy, and Hoijtink (2005). Psychol Methods 2005; 10:494-9. [PMID: 16393002] [DOI: 10.1037/1082-989x.10.4.494]
Abstract
I. Klugkist, O. Laudy, and H. Hoijtink (2005) presented a Bayesian approach to analysis of variance models with inequality constraints. Constraints may play 2 distinct roles in data analysis. They may represent prior information that allows more precise inferences regarding parameter values, or they may describe a theory to be judged against the data. In the latter case, the authors emphasized the use of Bayes factors and posterior model probabilities to select the best theory. One difficulty is that interpretation of the posterior model probabilities depends on which other theories are included in the comparison. The posterior distribution of the parameters under an unconstrained model allows one to quantify the support provided by the data for inequality constraints without requiring the model selection framework.
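The closing point can be made concrete: draw from the unconstrained posterior and report the fraction of draws satisfying the inequality. This is a hedged toy sketch with hypothetical normal posteriors for two group means, not the authors' ANOVA setup:

```python
import random

def constraint_support(rng, post_a, post_b, n_draws=20000):
    """Fraction of unconstrained posterior draws with mu_a < mu_b: the
    support the data give the inequality constraint, with no
    model-selection machinery required."""
    hits = 0
    for _ in range(n_draws):
        mu_a = rng.gauss(*post_a)  # (posterior mean, posterior sd)
        mu_b = rng.gauss(*post_b)
        hits += mu_a < mu_b
    return hits / n_draws

rng = random.Random(42)
support = constraint_support(rng, post_a=(1.0, 0.5), post_b=(2.0, 0.5))
```

Unlike a posterior model probability, this quantity does not depend on which competing theories happen to be in the comparison set.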
530
Andersen PK, Ekstrøm CT, Klein JP, Shu Y, Zhang MJ. A Class of Goodness of Fit Tests for a Copula Based on Bivariate Right-Censored Data. Biom J 2005; 47:815-24. [PMID: 16450854] [DOI: 10.1002/bimj.200410163]
Abstract
The copula of a bivariate distribution, constructed by making marginal transformations of each component, captures all the information in the bivariate distribution about the dependence between two variables. For frailty models for bivariate data, the choice of a family of distributions for the random frailty corresponds to the choice of a parametric family for the copula. A class of tests of the hypothesis that the copula is in a given parametric family, with unspecified association parameter, based on bivariate right-censored data is proposed. These tests are based on first making marginal Kaplan-Meier transformations of the data and then comparing a non-parametric estimate of the copula to an estimate based on the assumed family of models. A number of options are available for choosing the scale and the distance measure for this comparison. Significance levels of the test are found by a modified bootstrap procedure. The procedure is used to check the appropriateness of a gamma or a positive stable frailty model in a set of survival data on Danish twins.
531
Chu H, Nie L. A Note on Comparing Exposure Data to a Regulatory Limit in the Presence of Unexposed and a Limit of Detection. Biom J 2005; 47:880-7. [PMID: 16450859] [DOI: 10.1002/bimj.200510174]
Abstract
In some occupational health studies, observations occur in both exposed and unexposed individuals. If the levels of all exposed individuals have been detected, a two-part zero-inflated log-normal model is usually recommended, which assumes that the data have a probability mass at zero for unexposed individuals and a continuous response for values greater than zero for exposed individuals. However, many quantitative exposure measurements are subject to left censoring due to values falling below assay detection limits. A zero-inflated log-normal mixture model is suggested in this situation, since unexposed zeros are not distinguishable from exposed values below the detection limit. In the context of this mixture distribution, the information contributed by values falling below a fixed detection limit is used only to estimate the probability of being unexposed. We consider sample size and statistical power calculation when comparing the median of exposed measurements to a regulatory limit. We calculate the required sample size for the data presented in a recent paper comparing benzene time-weighted average (TWA) exposure data to a regulatory occupational exposure limit. A simulation study is conducted to investigate the performance of the proposed sample size calculation methods.
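The mixture likelihood behind such calculations can be sketched directly. This is an illustrative reconstruction under assumed notation (mixing weight `p0` for unexposed zeros, log-normal(mu, sigma) for exposed values, detection limit `lod`), not the authors' exact formulation:

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def zi_lognormal_loglik(data, lod, p0, mu, sigma):
    """Zero-inflated log-normal with a left detection limit: a non-detect
    is either an unexposed zero (probability p0) or an exposed value
    censored below `lod`; detected values contribute the log-normal density."""
    z_lod = (math.log(lod) - mu) / sigma
    ll = 0.0
    for x in data:
        if x < lod:  # non-detect: mixture of the two explanations
            ll += math.log(p0 + (1 - p0) * norm_cdf(z_lod))
        else:        # detect: exposed, log-normal density on the raw scale
            z = (math.log(x) - mu) / sigma
            ll += (math.log(1 - p0)
                   - math.log(x * sigma * math.sqrt(2 * math.pi))
                   - 0.5 * z * z)
    return ll

# Simulated check: the true parameters should out-score a misspecified set.
rng = random.Random(7)
sample = [0.0 if rng.random() < 0.2 else rng.lognormvariate(1.0, 0.5)
          for _ in range(2000)]
ll_true = zi_lognormal_loglik(sample, lod=0.5, p0=0.2, mu=1.0, sigma=0.5)
ll_bad = zi_lognormal_loglik(sample, lod=0.5, p0=0.6, mu=2.0, sigma=0.5)
```

Power and sample-size calculations then compare the median exp(mu) against the regulatory limit under this likelihood.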
532
Shmueli A. The Visual Analog rating Scale of health-related quality of life: an examination of end-digit preferences. Health Qual Life Outcomes 2005; 3:71. [PMID: 16285884] [PMCID: PMC1308843] [DOI: 10.1186/1477-7525-3-71]
Abstract
Background The Visual Analog Scale (VAS) has been extensively used in the valuation of health-related quality of life (HRQL). The objective of this paper is to examine the measurement-error (rounding) explanation for the higher prevalence of VAS scores ending with a zero, and to provide an alternative interpretation. Methods The analysis is based on more than 4,500 reported VAS valuations of own HRQL, included in two Israeli health surveys (1993 and 2000). Bivariate and logistic regression analyses are used. Results The results show that reporting VAS scores ending with a 0 (..., -20, -10, 0, 10, 20, ...) decreases, and scores ending with a 5 (..., -15, -5, 5, 15, 25, ...) or with any other integer (..., -12, -11, ..., 1, 2, ..., 92, ..., 99) increases, as VAS scores depart from 50, particularly when increasing up to 100. This pattern remains after controlling for personal characteristics determining the level of VAS. Discussion Rounding true HRQL to the nearest 10 or 5 cannot explain the specific pattern found. It is suggested that this pattern corresponds to an S-shaped value function, where individuals tend to evaluate their HRQL as "gains" or "losses" relative to a reference point at 50. This particular reference score originates from being a traditional "passing threshold" and the scale's midpoint. Several implications of this interpretation for the measurement of HRQL are discussed.
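The bookkeeping the analysis rests on, classifying each score by its end digit, is simple to state. A small hypothetical example (made-up scores, not survey data):

```python
from collections import Counter

def end_digit_shares(scores):
    """Classify each integer VAS score as ending in 0, ending in another
    multiple of 5, or ending in any other digit, and return the shares."""
    counts = Counter(
        "zero" if s % 10 == 0 else "five" if s % 5 == 0 else "other"
        for s in scores
    )
    n = len(scores)
    return {k: counts[k] / n for k in ("zero", "five", "other")}

shares = end_digit_shares([50] * 5 + [55] * 3 + [52] * 2)
```

The paper's finding is that these shares shift systematically with distance from 50, which pure rounding to the nearest 5 or 10 cannot produce.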
533
Mentré F, Escolano S. Prediction discrepancies for the evaluation of nonlinear mixed-effects models. J Pharmacokinet Pharmacodyn 2005; 33:345-67. [PMID: 16284919] [PMCID: PMC1989778] [DOI: 10.1007/s10928-005-0016-4]
Abstract
Reliable estimation methods for nonlinear mixed-effects models are now available and, although these models are increasingly used, only a limited number of statistical developments for their evaluation have been reported. We develop a criterion and a test to evaluate nonlinear mixed-effects models based on the whole predictive distribution. For each observation, we define the prediction discrepancy (pd) as the percentile of the observation in the whole marginal predictive distribution under H0. We propose to compute prediction discrepancies using Monte Carlo integration, which does not require model approximation. If the model is valid, these pd should be uniformly distributed over (0, 1), which can be tested by a Kolmogorov-Smirnov test. In a simulation study based on a standard population pharmacokinetic model, we compare this criterion with the one most frequently used to evaluate nonlinear mixed-effects models, standardized prediction errors (spe), which are computed using a first-order approximation of the model, and demonstrate its advantages. Trends in pd can also be evaluated via several plots to check for specific departures from the model.
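The pd computation generalizes to any model you can simulate from. A hedged sketch with a generic, hypothetical simulator (here plain normals rather than a pharmacokinetic model), followed by the KS distance that would be compared with the usual critical value:

```python
import math
import random

def prediction_discrepancy(obs, simulate, rng, n_mc=1000):
    """pd = percentile of the observation within its own Monte Carlo
    predictive sample; no linearization of the model is needed."""
    sims = [simulate(rng) for _ in range(n_mc)]
    return sum(1 for s in sims if s <= obs) / n_mc

def ks_uniform_stat(pds):
    """One-sample Kolmogorov-Smirnov distance from Uniform(0, 1)."""
    xs = sorted(pds)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(3)
observations = [rng.gauss(0, 1) for _ in range(200)]
# Correct model: pd values should look uniform on (0, 1).
good = [prediction_discrepancy(y, lambda r: r.gauss(0, 1), rng)
        for y in observations]
# Misspecified model (shifted mean): pd values pile up in the tail.
bad = [prediction_discrepancy(y, lambda r: r.gauss(1, 1), rng)
       for y in observations]
```

A valid model yields a small KS distance, while the shifted simulator is flagged by a large one.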
534
Xiang L, Tse SK. Maximum likelihood estimation in survival studies under progressive interval censoring with random removals. J Biopharm Stat 2005; 15:981-91. [PMID: 16279356] [DOI: 10.1080/10543400500266643]
Abstract
Censoring occurs commonly in clinical trials. This article investigates a new censoring scheme, namely, Type II progressive interval censoring with random removals to cope with the setting that patients are examined at fixed regular intervals and dropouts may occur during the study period. We discuss the maximum likelihood estimation of the model parameters and derive the corresponding asymptotic variances when survival times are assumed to be Weibull distributed. An example is discussed to illustrate the application of the results under this censoring scheme.
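The key likelihood ingredient for interval-censored Weibull data can be sketched under simplifying assumptions (a fixed inspection grid, no removals modelled; the paper's progressive Type II scheme with random removals is richer than this):

```python
import math
import random

def weibull_surv(t, scale, shape):
    """Weibull survival function S(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))

def interval_censored_loglik(intervals, scale, shape):
    """Each subject contributes log[S(L) - S(R)] for the inspection
    interval (L, R] containing its event; R = inf would encode a
    right-censored dropout, contributing log S(L)."""
    ll = 0.0
    for left, right in intervals:
        s_l = weibull_surv(left, scale, shape) if left > 0 else 1.0
        s_r = weibull_surv(right, scale, shape) if math.isfinite(right) else 0.0
        ll += math.log(s_l - s_r)
    return ll

# Recover the scale by grid search on simulated interval data (shape fixed).
rng = random.Random(5)
events = [rng.weibullvariate(10.0, 1.5) for _ in range(2000)]
ivals = [(2.0 * int(t // 2), 2.0 * int(t // 2) + 2.0) for t in events]
best = max((s / 2 for s in range(12, 29)),
           key=lambda s: interval_censored_loglik(ivals, s, 1.5))
```

The grid-search maximizer should land near the true scale of 10 despite the events being observed only to within 2-unit inspection windows.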
535
Vickers AJ. Parametric versus non-parametric statistics in the analysis of randomized trials with non-normally distributed data. BMC Med Res Methodol 2005; 5:35. [PMID: 16269081] [PMCID: PMC1310536] [DOI: 10.1186/1471-2288-5-35]
Abstract
Background It has generally been argued that parametric statistics should not be applied to data with non-normal distributions. Empirical research has demonstrated that the Mann-Whitney test generally has greater power than the t-test unless data are sampled from the normal distribution. In the case of randomized trials, we are typically interested in how an endpoint, such as blood pressure or pain, changes following treatment. Such trials should be analyzed using ANCOVA, rather than the t-test. The objectives of this study were: a) to compare the relative power of Mann-Whitney and ANCOVA; b) to determine whether ANCOVA provides an unbiased estimate for the difference between groups; c) to investigate the distribution of change scores between repeat assessments of a non-normally distributed variable. Methods Polynomials were developed to simulate five archetypal non-normal distributions for baseline and post-treatment scores in a randomized trial. Simulation studies compared the power of Mann-Whitney and ANCOVA for analyzing each distribution, varying sample size, correlation and type of treatment effect (ratio or shift). Results Change between skewed baseline and post-treatment data tended towards a normal distribution. ANCOVA was superior to Mann-Whitney in most situations, especially where log-transformed data were entered into the model. The estimate of the treatment effect from ANCOVA was not importantly biased. Conclusion ANCOVA is the preferred method of analyzing randomized trials with baseline and post-treatment measures. In certain extreme cases, ANCOVA is less powerful than Mann-Whitney. Notably, in these cases, the estimate of treatment effect provided by ANCOVA is of questionable interpretability.
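The first result, that change scores from skewed assessments tend toward normality, is easy to reproduce with a toy simulation. Hypothetical data-generating choices throughout (a shared exponential component plus independent exponential noise at each assessment; the paper used fitted polynomials):

```python
import random

def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2^(3/2)."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    s3 = sum((x - m) ** 3 for x in xs) / n
    return s3 / s2 ** 1.5

rng = random.Random(11)
latent = [rng.expovariate(1.0) for _ in range(5000)]
baseline = [l + 0.3 * rng.expovariate(1.0) for l in latent]
post = [l + 0.3 * rng.expovariate(1.0) for l in latent]
change = [p - b for p, b in zip(post, baseline)]
```

Both assessments are strongly right-skewed, yet their difference cancels the shared skewed component and is close to symmetric, which is why ANCOVA on change-adjusted data behaves well even for skewed endpoints.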
536
Jeong J, Becker ER, Mauldin PD, Weintraub WS. A comparison of self-selectivity corrections in economic evaluations and outcomes research. Value Health 2005; 8:656-66. [PMID: 16283866] [DOI: 10.1111/j.1524-4733.2005.00054.x]
Abstract
OBJECTIVE Two alternative selectivity correction methods have been widely applied in the health economics literature: the sample selection model (SSM) and the multipart model (MPM). The difference between these two approaches results from their initial assumptions about the distribution of error terms. Because the distributional assumptions cannot be theoretically verified, the usefulness of the methods can only be evaluated by real-world comparison. This article reviews and empirically tests the two alternative selectivity correction methods to give a reality-based evaluation. METHODS Using a randomized sample of patients as the "gold standard," the SSM and MPM are applied to a nonrandomized sample of patients with an identical set of dependent and independent variables. By comparing the actual estimates of the two methods, we evaluate the robustness of the two approaches. RESULTS The results show that neither method is empirically robust in replicating the results of the randomized trial. There is no consistent pattern in the coefficients from either selectivity correction method for replicating the coefficients in the randomized sample. CONCLUSIONS Researchers should be cautious in applying these correction methods, and any conclusions based on these approaches may need to be qualified.
537
Juang KW, Lee DY, Teng YL. Adaptive sampling based on the cumulative distribution function of order statistics to delineate heavy-metal contaminated soils using kriging. Environ Pollut 2005; 138:268-77. [PMID: 15936860] [DOI: 10.1016/j.envpol.2005.04.003]
Abstract
Correctly classifying "contaminated" areas in soils, based on the threshold for a contaminated site, is important for determining effective clean-up actions. Pollutant mapping by means of kriging is increasingly being used for the delineation of contaminated soils. However, those areas where the kriged pollutant concentrations are close to the threshold have a high possibility for being misclassified. In order to reduce the misclassification due to the over- or under-estimation from kriging, an adaptive sampling using the cumulative distribution function of order statistics (CDFOS) was developed to draw additional samples for delineating contaminated soils, while kriging. A heavy-metal contaminated site in Hsinchu, Taiwan was used to illustrate this approach. The results showed that compared with random sampling, adaptive sampling using CDFOS reduced the kriging estimation errors and misclassification rates, and thus would appear to be a better choice than random sampling, as additional sampling is required for delineating the "contaminated" areas.
538
Abstract
AIMS One aim was to disentangle how the shape and location of the BMI distribution changed among Swedish children over a 12 y period. Another aim was to identify the age during childhood when changes occurred or became manifest. METHODS Two population-based cohorts-2,591 children from Stockholm born 1985-1987 and 3,650 from Gothenburg born 1973-1975-were compared with respect to BMI distributions from 2 to 15 y of age. RESULTS Differences between the BMI distributions of the two cohorts were present from 5-6 y of age. From age 7, the children born in 1985-1987 and belonging to the upper parts of the BMI distribution, e.g. those above the 90th or 95th BMI percentiles, had much higher BMI mean values compared to their counterparts born 12 y earlier. Comparisons with respect to the 5th, 10th, 25th, 50th, 75th, 90th and 95th BMI percentiles showed that changes appeared above the 25th percentile and became increasingly pronounced in the upper parts of the BMI distributions. CONCLUSION School-aged children in the rightmost parts of the BMI distributions may be more susceptible to "obesogenic" environmental exposures than those in the middle or leftmost parts. The results support the suggestion that the period of BMI rebound is critical for the development of obesity.
539
Pigolotti S, Flammini A, Marsili M, Maritan A. Species lifetime distribution for simple models of ecologies. Proc Natl Acad Sci U S A 2005; 102:15747-51. [PMID: 16236730] [PMCID: PMC1276042] [DOI: 10.1073/pnas.0502648102]
Abstract
Interpretation of empirical results based on taxon lifetime distributions shows apparently conflicting results. Species' lifetimes are reported to be exponentially distributed, whereas higher-order taxa, such as families or genera, follow a broader distribution, compatible with power-law decay. We show that both forms of evidence are consistent with a simple evolutionary model that does not require specific assumptions on species interaction. The model provides a zero-order description of the dynamics of ecological communities, and its species lifetime distribution can be computed exactly. Different behaviors are found: an initial t^(-3/2) power law, emerging from a random-walk type of dynamics, which crosses over to a steeper t^(-2) branching-process-like regime and finally is cut off by an exponential decay that becomes weaker and weaker as the total population increases. Sampling effects also can be taken into account and shown to be relevant. If species in the fossil record were sampled according to the Fisher log-series distribution, lifetimes should be distributed according to a t^(-1) power law. Such variability of behaviors in a simple model, combined with the scarcity of data available, casts serious doubt on the possibility of validating theories of evolution on the basis of species lifetime data.
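The branching-process regime can be probed with a toy simulation. Hypothetical setup: a critical binary branching process (0 or 2 offspring with probability 1/2 each), for which theory gives a survival probability past generation n decaying roughly like 2/n, the origin of the power-law lifetime tail:

```python
import random

def lifetime(rng, max_gen=10):
    """Generations until extinction of a critical binary branching
    process, truncated at max_gen."""
    pop = 1
    for gen in range(max_gen):
        if pop == 0:
            return gen
        # each individual leaves 0 or 2 offspring with equal probability
        pop = sum(2 * (rng.random() < 0.5) for _ in range(pop))
    return max_gen

rng = random.Random(2)
runs = [lifetime(rng) for _ in range(20000)]
frac_long = sum(1 for t in runs if t == 10) / len(runs)
```

A non-negligible fraction of lineages survive to the truncation horizon, far more than an exponential lifetime law would allow, which is the qualitative signature the model predicts for higher taxa.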
540
Chen YC, Luo MB. Dynamic Monte Carlo study on the probability distribution functions of tail-like polymer chain. J Zhejiang Univ Sci B 2005; 6:1130-4. [PMID: 16252349] [PMCID: PMC1390662] [DOI: 10.1631/jzus.2005.b1130]
Abstract
The configurational properties of tail-like polymer chains with one end attached to a flat surface are studied using the dynamic Monte Carlo technique. We find that the probability distribution of the free end in the z direction, P(R_z), and the density profile rho(z) can be scaled approximately by a factor beta to a length-independent function for both random-walking (RW) and self-avoiding-walking (SAW) tail-like chains, where the factor beta is related to the mean square end-to-end distance <R^2>. The scaled P(R_z) of the SAW chain roughly overlaps that of the RW chain, but the scaled rho(z) of the SAW chain lies at smaller beta*z than that of the RW chain.
541
Victorov A, Radke C, Prausnitz J. Molecular thermodynamics for swelling of a mesoscopic ionomer gel in 1 : 1 salt solutions. Phys Chem Chem Phys 2005; 8:264-78. [PMID: 16482269] [DOI: 10.1039/b512748c]
Abstract
For a microphase-separated diblock copolymer ionic gel swollen in salt solution, a molecular-thermodynamic model is based on the self-consistent field theory in the limit of strongly segregated copolymer subchains. The geometry of microdomains is described using the Milner generic wedge construction neglecting the packing frustration. A geometry-dependent generalized analytical solution for the linearized Poisson-Boltzmann equation is obtained. This generalized solution not only reduces to those known previously for planar, cylindrical and spherical geometries, but is also applicable to saddle-like structures. Thermodynamic functions are expressed analytically for gels of lamellar, bicontinuous, cylindrical and spherical morphologies. Molecules are characterized by chain composition, length, rigidity, degree of ionization, and by effective polymer-polymer and polymer-solvent interaction parameters. The model predicts equilibrium solvent uptakes and the equilibrium microdomain spacing for gels swollen in salt solutions. Results are given for details of the gel structure: distribution of mobile ions and polymer segments, and the electric potential across microdomains. Apart from effects obtained by coupling the classical Flory-Rehner theory with Donnan equilibria, viz. increased swelling with polyelectrolyte charge and shrinking of gel upon addition of salt, the model predicts the effects of microphase morphology on swelling.
542
Abstract
We apply methods from statistical physics (histograms, correlation functions, fractal dimensions, and singularity spectra) to characterize large-scale structure of the distribution of nucleotides along genomic sequences. We discuss the role of the extension of noncoding segments ("junk DNA") for the genomic organization, and the connection between the coding segment distribution and the high-eukaryotic chromatin condensation. The following sequences taken from GenBank were analyzed: complete genome of Xanthomonas campestris, complete genome of yeast, chromosome V of Caenorhabditis elegans, and human chromosome XVII around gene BRCA1. The results are compared with the random and periodic sequences and those generated by simple and generalized fractal Cantor sets.
543
Solberg HE, Lahti A. Detection of outliers in reference distributions: performance of Horn's algorithm. Clin Chem 2005; 51:2326-32. [PMID: 16223885] [DOI: 10.1373/clinchem.2005.058339]
Abstract
BACKGROUND Medical laboratory reference data may be contaminated with outliers that should be eliminated before estimation of the reference interval. A statistical test for outliers has been proposed by Paul S. Horn and coworkers (Clin Chem 2001;47:2137-45). The algorithm operates in 2 steps: (a) mathematically transform the original data to approximate a gaussian distribution; and (b) establish detection limits (Tukey fences) based on the central part of the transformed distribution. METHODS We studied the specificity of Horn's test algorithm (probability of false detection of outliers), using Monte Carlo computer simulations performed on 13 types of probability distributions covering a wide range of positive and negative skewness. Distributions with 3% of the original observations replaced by random outliers were used to also examine the sensitivity of the test (probability of detection of true outliers). Three data transformations were used: the Box and Cox function (used in the original Horn's test), the Manly exponential function, and the John and Draper modulus function. RESULTS For many of the probability distributions, the specificity of Horn's algorithm was rather poor compared with the theoretical expectation. The cause for such poor performance was at least partially related to remaining nongaussian kurtosis (peakedness). The sensitivity showed great variation, dependent on both the type of underlying distribution and the location of the outliers (upper and/or lower tail). CONCLUSION Although Horn's algorithm undoubtedly is an improvement compared with older methods for outlier detection, reliable statistical identification of outliers in reference data remains a challenge.
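The two steps of the tested algorithm can be sketched compactly. A hedged toy version (user-supplied Box-Cox lambda rather than an estimated one, and classic 1.5 IQR Tukey fences; the original algorithm estimates the transform from the data):

```python
import math
import statistics

def box_cox(x, lam):
    """Box-Cox power transform toward gaussianity (x must be positive)."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def horn_outliers(data, lam):
    """Horn's two-step scheme: transform the data, then flag values whose
    transform falls outside the Tukey fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    t = sorted(box_cox(x, lam) for x in data)
    q1, _q2, q3 = statistics.quantiles(t, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if not lo <= box_cox(x, lam) <= hi]

# Illustration: a gross outlier among 100 well-behaved reference values.
flagged = horn_outliers([float(i) for i in range(1, 101)] + [1000.0], lam=1.0)
```

The paper's point is that when the transformed data retain non-gaussian kurtosis, these fences flag too many legitimate reference values, so the fences alone do not guarantee the nominal specificity.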
544
Valle F, Favre M, De Los Rios P, Rosa A, Dietler G. Scaling exponents and probability distributions of DNA end-to-end distance. Phys Rev Lett 2005; 95:158105. [PMID: 16241768] [DOI: 10.1103/physrevlett.95.158105]
Abstract
The scaling of the average gyration radius of polymers as a function of their length can be experimentally determined from ensemble measurements, such as light scattering, and agrees with analytical estimates. Ensemble techniques, however, do not give access to the full probability distributions. Single-molecule techniques, in contrast, can deliver information on both average quantities and distribution functions. Here we exploit the high resolution of atomic force microscopy over long DNA molecules adsorbed on a surface to measure the average end-to-end distance as a function of the DNA length, and its full distribution function. We find that all the scaling exponents are close to the predicted 3D values (upsilon=0.589+/-0.006 and delta=2.58+/-0.77). These results suggest that the adsorption process is akin to a geometric projection from 3D to 2D, known to preserve the scaling properties of fractal objects of dimension df<2.
545
Hansen JP. CAN'T MISS: conquer any number task by making important statistics simple. Part 7. Statistical process control: x-s control charts. J Healthc Qual 2005; 27:32-43. [PMID: 16201489] [DOI: 10.1111/j.1945-1474.2005.tb00566.x]
Abstract
Statistical process control (SPC) can be thought of as the frequent monitoring of processes using inferential statistics. The feature that distinguishes SPC from the typical use of inferential statistics for analyzing populations is that in the former frequent samples are taken over time, whereas in the latter a single sample is generally taken before and after some intervention or treatment. An x-s control chart is used to monitor a continuous variable that reflects the output of a process. The x-s control chart is a graph that includes serial sample means (x-bar) as the variables of interest, a centerline that represents the grand mean of the sample means, and an upper control limit (UCL) and lower control limit (LCL) that represent three standard errors above and below the centerline. An x-s control chart is used to estimate with 99.7% confidence that the population mean of a continuous output variable was within the interval defined by the UCL and LCL during a period of baseline monitoring. It is further assumed that if the process remains stable, future population means will remain between the control limits for additional process outputs. Control charts allow the evaluation of both common- and special-cause variation. Analysis of the common-cause variation allows an assessment of the current process performance. Special-cause variation is identified when a sample mean falls beyond the UCL or LCL.
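The limit computation described above can be sketched in a few lines. A simplified illustration (equal sample sizes assumed, pooled within-sample standard deviation used as the sigma estimate, and the usual c4 bias correction of textbook x-s charts glossed over):

```python
import random
import statistics

def xbar_limits(samples):
    """Centerline = grand mean of the sample means; UCL and LCL sit three
    standard errors above and below it, with the standard error taken as
    the mean within-sample s divided by sqrt(sample size)."""
    means = [statistics.fmean(s) for s in samples]
    grand = statistics.fmean(means)
    n = len(samples[0])
    s_bar = statistics.fmean(statistics.stdev(s) for s in samples)
    se = s_bar / n ** 0.5
    return grand - 3.0 * se, grand, grand + 3.0 * se

# Baseline monitoring: 30 samples of size 5 from a stable process.
rng = random.Random(9)
samples = [[rng.gauss(50.0, 5.0) for _ in range(5)] for _ in range(30)]
lcl, center, ucl = xbar_limits(samples)
```

With the process stable, roughly 99.7% of future sample means should fall inside (LCL, UCL); a mean outside the limits signals special-cause variation.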
|
546
|
Mota M, del Puerto I, Ramos A. The bisexual branching process with population-size dependent mating as a mathematical model to describe phenomena concerning to inhabit or re-inhabit environments with animal species. Math Biosci 2005; 206:120-7. [PMID: 16197966 DOI: 10.1016/j.mbs.2005.01.007]
Abstract
We consider the bisexual Galton-Watson branching process with population-size dependent mating as a mathematical model suited to the description of some natural phenomena. More specifically, we are interested in questions that arise when populating an environment with new animal species, or re-populating it with species that had previously disappeared.
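A toy simulation conveys the flavour of such a process. The mating function below (form min(F, M) couples, but only once the total population exceeds a threshold) and the offspring law are hypothetical choices for illustration only; the paper studies a general class of population-size dependent mating functions:

```python
import random

# Toy bisexual Galton-Watson process: each generation, females and males
# form mating units via a population-size dependent rule, and each unit
# produces a random number of offspring, each female with probability 1/2.
def step(females, males, rng, threshold=2, max_offspring=6):
    total = females + males
    units = min(females, males) if total >= threshold else 0
    f = m = 0
    for _ in range(units):
        for _ in range(rng.randint(0, max_offspring)):  # offspring of one unit
            if rng.random() < 0.5:
                f += 1
            else:
                m += 1
    return f, m

rng = random.Random(42)
f, m = 5, 5                      # founders introduced into the environment
for gen in range(10):
    f, m = step(f, m, rng)
    if f + m == 0:               # extinction: the re-population attempt failed
        break
print(f"generation {gen + 1}: {f} females, {m} males")
```

Whether the founding population establishes itself or dies out depends on the mating function and the offspring mean, which is the kind of question the paper addresses analytically.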
|
547
|
Abstract
MOTIVATION: Phylogenetic networks are becoming an important tool in molecular evolution, as the evolutionary role of reticulate events, such as hybridization, horizontal gene transfer and recombination, is becoming more evident, and as the available data are dramatically increasing in quantity and quality.
RESULTS: This paper addresses the problem of computing a most parsimonious recombination network for an alignment of binary sequences that are assumed to have arisen under the 'infinite sites' model of evolution with recombinations. Using the concept of a splits network as the underlying data structure, this paper shows how a recent method designed for the computation of hybridization networks can be extended to also compute recombination networks. A robust implementation of the approach is provided and is illustrated using a number of real biological datasets.
AVAILABILITY: Our implementation of this approach is freely available as part of the SplitsTree4 software, downloadable from www.splitstree.org.
|
548
|
Begelfor E, Werman M. How to put probabilities on homographies. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2005; 27:1666-70. [PMID: 16238000 DOI: 10.1109/tpami.2005.200]
Abstract
We present a family of "normal" distributions over a matrix group together with a simple method for estimating its parameters. In particular, the mean of a set of elements can be calculated. The approach is applied to planar projective homographies, showing that using priors defined in this way improves object recognition.
|
549
|
Fridman T, Razumovskaya J, Verberkmoes N, Hurst G, Protopopescu V, Xu Y. The probability distribution for a random match between an experimental-theoretical spectral pair in tandem mass spectrometry. J Bioinform Comput Biol 2005; 3:455-76. [PMID: 15852515 DOI: 10.1142/s0219720005001120]
Abstract
Proteomic techniques are fast becoming the main method for qualitative and quantitative determination of the protein content in biological systems. Despite notable advances, efficient and accurate analysis of high throughput proteomic data generated by mass spectrometers remains one of the major stumbling blocks in the protein identification problem. We present a model for the number of random matches between an experimental MS-MS spectrum and a theoretical spectrum of a peptide. The shape of the probability distribution is a function of the experimental accuracy, the number of peaks in the experimental spectrum, the length of the interval over which the peaks are distributed, and the number of theoretical spectral peaks in this interval. Based on this probability distribution, a goodness-of-fit tool can be used to yield fast and accurate scoring schemes for peptide identification through database search. In this paper, we describe one possible implementation of such a method and compare the performance of the resulting scoring function with that of SEQUEST. In terms of speed, our algorithm is roughly two orders of magnitude faster than the SEQUEST program, and its accuracy of peptide identification compares favorably to that of SEQUEST. Moreover, our algorithm does not use information related to the intensities of the peaks.
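A back-of-the-envelope version of such a random-match model can be sketched as follows. If N experimental peaks fall uniformly in an interval of length L, and the theoretical spectrum has n peaks each matched within ±eps, then (ignoring window overlaps) a single peak matches by chance with probability p ≈ 2·eps·n/L, and the number of random matches is roughly Binomial(N, p). This binomial approximation and the parameter values are illustrative assumptions, not the exact distribution derived in the paper:

```python
from math import comb

# Approximate probability of exactly k random peak matches under the
# simplified binomial model described in the lead-in (hypothetical numbers).
def random_match_pmf(k, N, n, eps, L):
    p = min(1.0, 2 * eps * n / L)   # chance one experimental peak matches
    return comb(N, k) * p**k * (1 - p)**(N - k)

N, n, eps, L = 50, 200, 0.5, 2000.0   # illustrative spectrum parameters
pmf = [random_match_pmf(k, N, n, eps, L) for k in range(N + 1)]
print(f"P(>= 10 random matches) = {sum(pmf[10:]):.3g}")
```

Comparing an observed match count against such a null distribution is what turns matching into a goodness-of-fit score.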
|
550
|
Collin D, Ritort F, Jarzynski C, Smith SB, Tinoco I, Bustamante C. Verification of the Crooks fluctuation theorem and recovery of RNA folding free energies. Nature 2005; 437:231-4. [PMID: 16148928 PMCID: PMC1752236 DOI: 10.1038/nature04061]
Abstract
Atomic force microscopes and optical tweezers are widely used to probe the mechanical properties of individual molecules and molecular interactions, by exerting mechanical forces that induce transitions such as unfolding or dissociation. These transitions often occur under nonequilibrium conditions and are associated with hysteresis effects, features usually taken to preclude the extraction of equilibrium information from the experimental data. But fluctuation theorems allow us to relate the work along nonequilibrium trajectories to thermodynamic free-energy differences. They have been shown to be applicable to single-molecule force measurements and have already provided information on the folding free energy of an RNA hairpin. Here we show that the Crooks fluctuation theorem can be used to determine folding free energies for folding and unfolding processes occurring in weak as well as strong nonequilibrium regimes, thereby providing a test of its validity under such conditions. We use optical tweezers to measure repeatedly the mechanical work associated with the unfolding and refolding of a small RNA hairpin and an RNA three-helix junction. The resultant work distributions are then analysed according to the theorem and allow us to determine the difference in folding free energy between an RNA molecule and a mutant differing only by one base pair, and the thermodynamic stabilizing effect of magnesium ions on the RNA structure.
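The Crooks relation P_F(W)/P_R(-W) = exp((W - ΔG)/kT) implies that the forward and sign-reversed reverse work distributions cross at W = ΔG, which is how the free energy is read off from the data. A minimal sketch with synthetic Gaussian work values (all numbers are arbitrary choices, not the experimental RNA data):

```python
import numpy as np

# For Gaussian work distributions the Crooks theorem fixes the mean
# dissipated work at sigma^2 / (2 kT), making the forward and reversed
# distributions mirror images about dG; equal-variance Gaussians then
# cross midway between their means, which estimates dG.
kT = 4.1          # pN*nm at room temperature
dG = 100.0        # "true" free-energy difference (arbitrary choice)
sigma = 10.0      # spread of the work values
diss = sigma**2 / (2 * kT)   # mean dissipated work implied by the theorem

rng = np.random.default_rng(3)
w_fwd = rng.normal(dG + diss, sigma, 5000)    # unfolding work samples
w_rev = rng.normal(dG - diss, sigma, 5000)    # minus the refolding work

dG_est = 0.5 * (w_fwd.mean() + w_rev.mean())  # crossing point of the two fits
print(f"estimated dG = {dG_est:.1f} (true {dG})")
```

The appeal of the method is that this estimate remains valid however far from equilibrium the pulling is done, which is exactly what the experiment tests.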
|