1
|
Modak S. A new interpoint distance-based clustering algorithm using kernel density estimation. Commun Stat Simul Comput 2023. [DOI: 10.1080/03610918.2023.2179071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Affiliation(s)
- Soumita Modak
- Faculty of Statistics, Department of Statistics, University of Calcutta, Basanti Devi College, Kolkata, India
|
2
|
Analysis of distance matrices. Stat Probab Lett 2023. [DOI: 10.1016/j.spl.2022.109720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
3
|
Mukherjee A, Murakami H. Multivariate Kruskal–Wallis tests based on principal component score and latent source of independent component analysis. Aust N Z J Stat 2022. [DOI: 10.1111/anzs.12371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Amitava Mukherjee
- Production, Operations and Decision Sciences Area, XLRI‐Xavier School of Management, XLRI Jamshedpur, Jamshedpur 831001, Jharkhand, India
- Hidetoshi Murakami
- Department of Applied Mathematics, Tokyo University of Science, Shinjuku‐ku 162‐8601, Tokyo, Japan
|
4
|
Modarres R. Nonparametric tests for detection of high dimensional outliers. J Nonparametr Stat 2022. [DOI: 10.1080/10485252.2022.2026945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Reza Modarres
- Department of Statistics, George Washington University, Washington, DC, USA
|
5
|
Kropf S, Antweiler K, Glimm E. Use of multivariate distance measures for high-dimensional data in tests for difference, superiority, equivalence and non-inferiority. Biom J 2021; 64:577-597. [PMID: 34862646 DOI: 10.1002/bimj.202000367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2020] [Revised: 08/09/2021] [Accepted: 09/13/2021] [Indexed: 11/10/2022]
Abstract
Tests based on pairwise distance measures for multivariate sample vectors are common in ecological studies but are usually restricted to two-sided tests for differences. In this paper, we investigate extensions to tests for superiority, equivalence and non-inferiority.
Affiliation(s)
- Siegfried Kropf
- Institute of Biometry and Medical Informatics, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Kai Antweiler
- Department of Medical Statistics, Universitätsmedizin Göttingen, Germany
- Ekkehard Glimm
- Institute of Biometry and Medical Informatics, Otto-von-Guericke University Magdeburg, Magdeburg, Germany; Novartis Pharma AG, Basel, Switzerland
|
6
|
Chen H, Xia Y. A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1953507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Hao Chen
- Department of Statistics, University of California at Davis, CA
- Yin Xia
- Department of Statistics, School of Management, Fudan University
|
7
|
|
8
|
Graphical tests of independence for general distributions. Comput Stat 2021. [DOI: 10.1007/s00180-021-01134-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
9
|
|
10
|
Improving covariance-regularized discriminant analysis for EHR-based predictive analytics of diseases. Appl Intell 2021. [DOI: 10.1007/s10489-020-01810-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022]
|
11
|
Multivariate power series interpoint distances. Stat Methods Appl 2020. [DOI: 10.1007/s10260-020-00508-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
|
12
|
Konietschke F, Schwab K, Pauly M. Small sample sizes: A big data problem in high-dimensional data analysis. Stat Methods Med Res 2020; 30:687-701. [PMID: 33228480 PMCID: PMC8008424 DOI: 10.1177/0962280220970228] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022]
Abstract
In many experiments and especially in translational and preclinical research, sample sizes are (very) small. In addition, data designs are often high dimensional, i.e. more dependent than independent replications of the trial are observed. The present paper discusses the applicability of max t-test-type statistics (multiple contrast tests) in high-dimensional designs (repeated measures or multivariate) with small sample sizes. A randomization-based approach is developed to approximate the distribution of the maximum statistic. Extensive simulation studies confirm that the new method is particularly suitable for analyzing data sets with small sample sizes. A real data set illustrates the application of the methods.
Affiliation(s)
- Frank Konietschke
- Charité-Universitätsmedizin Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, Berlin, Germany; Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, Berlin, Germany
- Karima Schwab
- Institute of Pharmacology, Charité-Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany
- Markus Pauly
- Department of Statistics, TU Dortmund University, Dortmund, Germany
|
13
|
Yaeger R, Paroder V, Bates DDB, Capanu M, Chou J, Tang L, Chatila W, Schultz N, Hersch J, Kelsen D. Systemic Chemotherapy for Metastatic Colitis-Associated Cancer Has a Worse Outcome Than Sporadic Colorectal Cancer: Matched Case Cohort Analysis. Clin Colorectal Cancer 2020; 19:e151-e156. [PMID: 32798155 DOI: 10.1016/j.clcc.2020.02.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/11/2019] [Revised: 11/29/2019] [Accepted: 02/02/2020] [Indexed: 01/09/2023]
Abstract
BACKGROUND: Colitis-associated cancers (CAC) are a catastrophic complication of inflammatory bowel disease; at diagnosis, CAC is frequently at an advanced stage. Although the genomic alterations (GA) in CAC are different from sporadic colorectal cancer (CRC), the same systemic therapies are used. We compared clinically relevant outcomes using standard care systemic chemotherapy of stage IV CAC versus a matched patient control cohort of stage IV CRC patients. PATIENTS AND METHODS: A retrospective matched cohort design was used. Eighteen cases of stage IV CAC (7 ulcerative colitis, 11 Crohn disease) and 18 CRC were identified. GA analysis was available for all patients. Outcome endpoints included response rate and response duration, progression-free survival, and OS. RESULTS: Although the response rates were similar (CAC 35.7% vs. CRC 57.1%, P = .45), the median duration of response for CAC was significantly shorter (1.4 months, vs. CRC 11.8 months, P = .006). There was no difference in dose density of first-line therapy between cohorts, suggesting that shorter response duration was due to more rapid development of chemotherapy resistance. Median OS was significantly shorter for CAC patients (13 vs. 27.6 months, P = .034). As expected, there was a difference in the spectrum of GA between CAC and CRC cohorts. However, GA associated with poor prognosis (eg, B-Raf) were no more frequent in the CAC cohort. CONCLUSION: Clinically meaningful outcomes of duration of response and OS are worse for CAC versus sporadic CRC patients treated with FOLFOX or FOLFIRI as first therapy for metastatic disease.
Affiliation(s)
- Joanne Chou
- Department of Epidemiology and Biostatistics
- Walid Chatila
- Department of Computational Oncology, Memorial Sloan-Kettering Cancer Center and Weil-Cornell Medical College, New York, NY
- Nikolaus Schultz
- Department of Computational Oncology, Memorial Sloan-Kettering Cancer Center and Weil-Cornell Medical College, New York, NY
|
14
|
Affiliation(s)
- Reza Modarres
- Department of Statistics, George Washington University, Washington, DC, USA
|
15
|
Song Y, Modarres R. Interpoint Distance Test of Homogeneity for Multivariate Mixture Models. Int Stat Rev 2019. [DOI: 10.1111/insr.12332] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Affiliation(s)
- Yu Song
- Department of Statistics, George Washington University, Washington, DC 20052, USA
- Reza Modarres
- Department of Statistics, George Washington University, Washington, DC 20052, USA
|
16
|
Affiliation(s)
- Yu Song
- Department of Statistics, George Washington University, Washington, DC, USA
- Reza Modarres
- Department of Statistics, George Washington University, Washington, DC, USA
|
17
|
Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med 2019; 38:2074-2102. [PMID: 30652356 PMCID: PMC6492164 DOI: 10.1002/sim.8086] [Citation(s) in RCA: 483] [Impact Index Per Article: 96.6] [Received: 11/29/2017] [Revised: 08/23/2018] [Accepted: 11/02/2018] [Indexed: 12/11/2022]
Abstract
Simulation studies are computer experiments that involve creating data by pseudo-random sampling. A key strength of simulation studies is the ability to understand the behavior of statistical methods because some "truth" (usually some parameter/s of interest) is known from the process of generating the data. This allows us to consider properties of methods, such as bias. While widely used, simulation studies are often poorly designed, analyzed, and reported. This tutorial outlines the rationale for using simulation studies and offers guidance for design, execution, analysis, reporting, and presentation. In particular, this tutorial provides a structured approach for planning and reporting simulation studies, which involves defining aims, data-generating mechanisms, estimands, methods, and performance measures ("ADEMP"); coherent terminology for simulation studies; guidance on coding simulation studies; a critical discussion of key performance measures and their estimation; guidance on structuring tabular and graphical presentation of results; and new graphical presentations. With a view to describing recent practice, we review 100 articles taken from Volume 34 of Statistics in Medicine, which included at least one simulation study and identify areas for improvement.
Affiliation(s)
- Tim P. Morris
- London Hub for Trials Methodology Research, MRC Clinical Trials Unit at UCL, London, United Kingdom
- Ian R. White
- London Hub for Trials Methodology Research, MRC Clinical Trials Unit at UCL, London, United Kingdom
- Michael J. Crowther
- Biostatistics Research Group, Department of Health Sciences, University of Leicester, Leicester, United Kingdom
|
18
|
|
19
|
Herzlinger G, Grosman L. AGMT3-D: A software for 3-D landmarks-based geometric morphometric shape analysis of archaeological artifacts. PLoS One 2018; 13:e0207890. [PMID: 30458049 PMCID: PMC6245792 DOI: 10.1371/journal.pone.0207890] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 08/26/2018] [Accepted: 11/07/2018] [Indexed: 11/29/2022]
Abstract
We present here a newly developed software package named Artifact GeoMorph Toolbox 3-D (AGMT3-D). It is intended to provide archaeologists with a simple and easy-to-use tool for performing 3-D landmarks-based geometric morphometric shape analysis on 3-D digital models of archaeological artifacts. It requires no prior knowledge of programming or proficiency in statistics. AGMT3-D consists of a data-acquisition procedure for automatically positioning 3-D models in space and fitting them with grids of 3-D semi-landmarks. It also provides a number of analytical tools and procedures that allow the processing and statistical analysis of the data, including generalized Procrustes analysis, principal component analysis, a warp tool, automatic calculation of shape variabilities and statistical tests. It provides an output of quantitative, objective and reproducible results in numerical, textual and graphic formats. These can be used to answer archaeologically significant questions relating to morphologies and morphological variabilities in artifact assemblages. Following the presentation of the software and its functions, we apply it to a case study addressing the effects of different types of raw material on the morphologies and morphological variabilities present in an experimentally produced Acheulian handaxe assemblage. The results show that there are statistically significant differences between the mean shapes and shape variabilities of handaxes produced on flint and those produced on basalt. With AGMT3-D, users can analyze artifact assemblages and address questions that are deducible from the morphologies and morphological variabilities of material culture assemblages. These questions can relate to issues of, among others, relative chronology, cultural affinities, tool function and production technology. 
AGMT3-D is aimed at making 3-D landmarks-based geometric morphometric shape analysis more accessible to archaeologists, in the hope that this method will become a tool commonly used by archaeologists.
Affiliation(s)
- Gadi Herzlinger
- Institute of Archaeology, Mount Scopus, The Hebrew University of Jerusalem, Jerusalem, Israel
- The Jack, Joseph and Morton Mandel School for Advanced Studies in the Humanities, Mount Scopus, The Hebrew University of Jerusalem, Jerusalem, Israel
- Leore Grosman
- Institute of Archaeology, Mount Scopus, The Hebrew University of Jerusalem, Jerusalem, Israel
|
20
|
Dai X, Niu C, Guo X. Testing for central symmetry and inference of the unknown center. Comput Stat Data Anal 2018. [DOI: 10.1016/j.csda.2018.05.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
|
21
|
Multinomial interpoint distances. Stat Pap (Berl) 2018. [DOI: 10.1007/s00362-016-0766-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
22
|
Marozzi M. Tests for comparison of multiple endpoints with application to omics data. Stat Appl Genet Mol Biol 2018; 17:sagmb-2017-0033. [PMID: 29381476 DOI: 10.1515/sagmb-2017-0033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In biomedical research, multiple endpoints are commonly analyzed in "omics" fields like genomics, proteomics and metabolomics. Traditional methods designed for low-dimensional data either perform poorly or are not applicable when analyzing high-dimensional data whose dimension is generally similar to, or even much larger than, the number of subjects. The complex biochemical interplay between hundreds (or thousands) of endpoints is reflected by complex dependence relations. The aim of the paper is to propose tests that are very suitable for analyzing omics data because they do not require the normality assumption, are powerful also for small sample sizes, in the presence of complex dependence relations among endpoints, and when the number of endpoints is much larger than the number of subjects. Unbiasedness and consistency of the tests are proved and their size and power are assessed numerically. It is shown that the proposed approach based on the nonparametric combination of dependent interpoint distance tests is very effective. Applications to genomics and metabolomics are discussed.
Affiliation(s)
- Marco Marozzi
- University of Venice, Via Torino 155, 30172 Venezia, Italy
|
23
|
Lee K, Song H, Yoo JK. Dimension test approach of heteroscedasticity in the linear model. Commun Stat Simul Comput 2017. [DOI: 10.1080/03610918.2015.1117636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Keunbaik Lee
- Department of Statistics, Sungkyunkwan University, Seoul, Republic of Korea
- Hyejin Song
- Department of Statistics, Ewha Womans University, Seoul, Republic of Korea
- Jae Keun Yoo
- Department of Statistics, Ewha Womans University, Seoul, Republic of Korea
|
24
|
Cai F, Liu S, Dijke PT, Verbeek FJ. Image Analysis and Pattern Extraction of Proteins Classes from One-Dimensional Gels Electrophoresis. International Journal of Bioscience, Biochemistry and Bioinformatics 2017. [DOI: 10.17706/ijbbb.2017.7.4.201-212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/18/2022]
|
25
|
Grogan TR, Elashoff DA. A simulation based method for assessing the statistical significance of logistic regression models after common variable selection procedures. Commun Stat Simul Comput 2016; 46:7180-7193. [PMID: 29225408 PMCID: PMC5722241 DOI: 10.1080/03610918.2016.1230216] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/26/2016] [Accepted: 08/22/2016] [Indexed: 10/20/2022]
Abstract
Classification models can demonstrate apparent prediction accuracy even when there is no underlying relationship between the predictors and the response. Variable selection procedures can lead to false positive variable selections and overestimation of true model performance. A simulation study was conducted using logistic regression with forward stepwise, best subsets, and LASSO variable selection methods with varying total sample sizes (20, 50, 100, 200) and numbers of random noise predictor variables (3, 5, 10, 15, 20, 50). Using our critical values can help reduce needless follow-up on variables having no true association with the outcome.
Affiliation(s)
- Tristan R. Grogan
- Department of Medicine Statistics Core, University of California, Los Angeles, CA
- David A. Elashoff
- Department of Medicine Statistics Core, University of California, Los Angeles, CA
|
26
|
Arboretti Giancristofaro R, Bonnini S, Corain L, Salmaso L. Dependency and truncated forms of combinations in multivariate combination-based permutation tests and ordered categorical variables. J Stat Comput Simul 2016. [DOI: 10.1080/00949655.2016.1177826] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 10/21/2022]
|
27
|
Marozzi M. Does bad inference drive out good? Clin Exp Pharmacol Physiol 2016; 42:727-33. [PMID: 25974387 DOI: 10.1111/1440-1681.12422] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/28/2014] [Revised: 03/09/2015] [Accepted: 05/07/2015] [Indexed: 11/26/2022]
Abstract
The (mis)use of statistics in practice is widely debated, and a field where the debate is particularly active is medicine. Many scholars emphasize that a large proportion of published medical research contains statistical errors. It has been noted that top class journals like Nature Medicine and The New England Journal of Medicine publish a considerable proportion of papers that contain statistical errors and poorly document the application of statistical methods. This paper joins the debate on the (mis)use of statistics in the medical literature. Even though the validation process of a statistical result may be quite elusive, a careful assessment of underlying assumptions is central in medicine as well as in other fields where a statistical method is applied. Unfortunately, a careful assessment of underlying assumptions is missing in many papers, including those published in top class journals. In this paper, it is shown that nonparametric methods are good alternatives to parametric methods when the assumptions for the latter ones are not satisfied. A key point to solve the problem of the misuse of statistics in the medical literature is that all journals have their own statisticians to review the statistical method/analysis section in each submitted paper.
|