1
Munko M, Ditzhaus M, Dobler D, Genuneit J. RMST-based multiple contrast tests in general factorial designs. Stat Med 2024; 43:1849-1866. PMID: 38402907. DOI: 10.1002/sim.10017.
Abstract
Several methods in survival analysis are based on the proportional hazards assumption. However, this assumption is very restrictive and often not justifiable in practice. Therefore, effect estimands that do not rely on the proportional hazards assumption are highly desirable in practical applications. One popular example is the restricted mean survival time (RMST). It is defined as the area under the survival curve up to a prespecified time point and thus summarizes the survival curve into a meaningful estimand. For two-sample comparisons based on the RMST, previous research found that the asymptotic test inflates the type I error in small samples; a two-sample permutation test has therefore already been developed. The first goal of the present paper is to extend the permutation test to general factorial designs and general contrast hypotheses by considering a Wald-type test statistic and its asymptotic behavior. Additionally, a groupwise bootstrap approach is considered. Moreover, when a global test detects a significant difference by comparing the RMSTs of more than two groups, it is of interest which specific RMST differences cause the result; global tests, however, do not provide this information. Therefore, multiple tests for the RMST are developed in a second step to infer several null hypotheses simultaneously. Here, the asymptotically exact dependence structure between the local test statistics is incorporated to gain more power. Finally, the small-sample performance of the proposed global and multiple testing procedures is analyzed in simulations and illustrated in a real data example.
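The RMST is simply the area under a (Kaplan-Meier) step survival curve up to the horizon tau, so the estimand itself fits in a few lines of numpy. A minimal sketch (only the point estimate, not the paper's Wald-type tests; the event times and survival probabilities below are invented):

```python
import numpy as np

def rmst(event_times, survival_probs, tau):
    """Restricted mean survival time: area under a right-continuous
    step survival curve S(t) up to the horizon tau.

    event_times    -- sorted times at which S(t) drops
    survival_probs -- S(t) just after each event time
    tau            -- prespecified truncation time
    """
    t = np.concatenate(([0.0], np.asarray(event_times, float)))
    s = np.concatenate(([1.0], np.asarray(survival_probs, float)))
    keep = t < tau
    t, s = t[keep], s[keep]
    # width of each step, with the last step truncated at tau
    widths = np.diff(np.concatenate((t, [tau])))
    return float(np.sum(widths * s))
```

For example, with drops to 0.8, 0.5, 0.2 at times 1, 2, 3 and tau = 2.5, the areas of the three steps are 1.0, 0.8, and 0.25.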
Affiliation(s)
- Merle Munko, Department of Mathematics, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Marc Ditzhaus, Department of Mathematics, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Dennis Dobler, Department of Mathematics, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Jon Genuneit, Department of Pediatrics, Leipzig University, Leipzig, Germany
2
Schreck N, Slynko A, Saadati M, Benner A. Statistical plasmode simulations-Potentials, challenges and recommendations. Stat Med 2024; 43:1804-1825. PMID: 38356231. DOI: 10.1002/sim.10012.
Abstract
Statistical data simulation is essential in the development of statistical models and methods as well as in their performance evaluation. To capture complex data structures, in particular for high-dimensional data, a variety of simulation approaches have been introduced, including parametric and so-called plasmode simulations. While there are concerns about the realism of parametrically simulated data, it is widely claimed that plasmodes come very close to reality, with some aspects of the "truth" known. However, there are no explicit guidelines or established state of the art for performing plasmode data simulations. In the present paper, we first review the existing literature and introduce the concept of statistical plasmode simulation. We then discuss advantages and challenges of statistical plasmodes and provide a step-wise procedure for their generation, including key steps for their implementation and reporting. Finally, we illustrate the concept of statistical plasmodes and the proposed generation procedure by means of a public real RNA data set on breast carcinoma patients.
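The core idea of a statistical plasmode (keep the real covariates, simulate the outcome from a known model so that aspects of the "truth" are available) can be sketched as follows. This is a minimal illustration rather than the paper's full step-wise procedure; the logistic outcome model and all data here are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

def plasmode_dataset(X, beta, n, rng):
    """Generate one plasmode dataset: covariate rows are resampled with
    replacement from the real data X (preserving their joint distribution),
    and a binary outcome is simulated from a known logistic model, so the
    'true' effect sizes beta are available for method evaluation."""
    idx = rng.integers(0, X.shape[0], size=n)   # bootstrap real rows
    Xb = X[idx]
    logits = Xb @ beta
    y = rng.random(n) < 1.0 / (1.0 + np.exp(-logits))
    return Xb, y.astype(int)

# stand-in for a real covariate matrix, plus a known "truth"
X_real = rng.normal(size=(200, 3))
beta_true = np.array([1.0, -0.5, 0.0])
Xs, ys = plasmode_dataset(X_real, beta_true, 500, rng)
```

A method under evaluation would now be run on `(Xs, ys)` and judged against `beta_true`, repeating the generation step to estimate operating characteristics.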
Affiliation(s)
- Nicholas Schreck, Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Alla Slynko, Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Maral Saadati, Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Axel Benner, Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
3
Chiu LW, Ku YE, Chao HJ, Lie WN, Chan FY, Wang SY, Shen WC, Chen HY. Machine Learning Algorithms to Predict Colistin-Induced Nephrotoxicity from Electronic Health Records in Patients with Multidrug-Resistant Gram-Negative Infection. Int J Antimicrob Agents 2024:107175. PMID: 38642812. DOI: 10.1016/j.ijantimicag.2024.107175.
Abstract
OBJECTIVES Colistin-induced nephrotoxicity prolongs hospitalization and increases mortality. This study aimed to construct machine learning models to predict colistin-induced nephrotoxicity in patients with multidrug-resistant gram-negative infection. METHODS Patients receiving colistin from three hospitals in the Clinical Research Database were included. Data were divided into a derivation cohort (2011-2017) and a temporal validation cohort (2018-2020). Fifteen machine learning models were established by categorical boosting, light gradient boosting machine, and random forest. Classifier performances were compared by sensitivity, F1 score, Matthews correlation coefficient (MCC), area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). SHapley Additive exPlanations (SHAP) plots were drawn to understand feature importance and interactions. RESULTS The study included 1392 patients, with 360 (36.4%) and 165 (40.9%) experiencing nephrotoxicity in the derivation and temporal validation cohorts, respectively. Categorical boosting with oversampling achieved the highest performance, with a sensitivity of 0.860, an F1 score of 0.740, an MCC of 0.533, an AUROC of 0.823, and an AUPRC of 0.737. The feature importance analysis showed that days of colistin use, cumulative dose, daily dose, latest C-reactive protein, and baseline hemoglobin were the most important risk factors, especially for vulnerable patients. A cutoff colistin dose of 4.0 mg/kg body weight/day was identified for patients at higher risk of nephrotoxicity. CONCLUSIONS Machine learning techniques can serve as an early identification tool to predict colistin-induced nephrotoxicity. The observed interactions suggest a modification in dose adjustment guidelines. Future geographic and prospective validation studies are warranted to strengthen real-world applicability.
Affiliation(s)
- Ling-Wan Chiu, Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan; Department of Pharmacy, Shuang Ho Hospital, Taipei Medical University, New Taipei City, Taiwan
- Yi-En Ku, Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan
- Horng-Jiun Chao, Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan
- Wen-Nung Lie, Department of Electrical Engineering, National Chung Cheng University, Chiayi, Taiwan
- Fan Ying Chan, Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan
- San-Yuan Wang, Pharmacogenomics and Pharmacoproteomics, College of Pharmacy, Taipei Medical University, Taipei, Taiwan
- Wan-Chen Shen, Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan; Department of Pharmacy, Shuang Ho Hospital, Taipei Medical University, New Taipei City, Taiwan
- Hsiang-Yin Chen, Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan; Department of Pharmacy, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
4
Ancillotto L, Amori G, Capizzi D, Cignini B, Zapparoli M, Mori E. No city for wetland species: habitat associations affect mammal persistence in urban areas. Proc Biol Sci 2024; 291:20240079. PMID: 38471547. DOI: 10.1098/rspb.2024.0079.
Abstract
The fast rate of replacement of natural areas by expanding cities is a key threat to wildlife worldwide. Many wild species occur in cities, yet little is known about the dynamics of urban wildlife assemblages arising from the extinctions and colonizations that may occur in response to the rapidly evolving conditions within urban areas. In particular, a species' ability to spread within urban areas, besides its habitat preferences, is likely to shape its fate once it occurs in a city. Here we use a long-term dataset on mammals occurring in one of the largest and most ancient cities in Europe to assess whether and how spatial spread and association with specific habitats drive the probability of local extinction within cities. Our analysis included mammalian records dating from 1832 to 2023 and revealed that local extinctions in urban areas are biased towards species that are associated with wetlands and that were naturally rare within the city. Besides highlighting the role of wetlands within urban areas for conserving wildlife, our work underscores the importance of long-term biodiversity monitoring in highly dynamic habitats such as cities as a key asset for understanding wildlife trends and thus fostering more sustainable and biodiversity-friendly cities.
Affiliation(s)
- Leonardo Ancillotto, National Research Council (CNR), Institute for the Research on Terrestrial Ecosystems (IRET), via della Madonna del Piano 10, 50019 Sesto Fiorentino, Italy; National Biodiversity Future Center (NBFC), Palermo, Italy
- Giovanni Amori, National Research Council (CNR), Institute for the Research on Terrestrial Ecosystems (IRET), via della Madonna del Piano 10, 50019 Sesto Fiorentino, Italy
- Dario Capizzi, Latium Region Directorate for Environment, Via di Campo Romano 65, 00173 Rome, Italy
- Bruno Cignini, Department of Biology, University of Rome Tor Vergata, Rome, Italy
- Marzio Zapparoli, Department for Innovation in Biological, Agro-food and Forest systems (DIBAF), Università degli Studi della Tuscia, via San Camillo de Lellis snc, 01100 Viterbo, Italy
- Emiliano Mori, National Research Council (CNR), Institute for the Research on Terrestrial Ecosystems (IRET), via della Madonna del Piano 10, 50019 Sesto Fiorentino, Italy; National Biodiversity Future Center (NBFC), Palermo, Italy
5
Gurarie D, Mondal A, Ndeffo-Mbah ML. Improved Assessment of Schistosoma Community Infection Through Data Resampling Method. Open Forum Infect Dis 2024; 11:ofad659. PMID: 38328495. PMCID: PMC10847808. DOI: 10.1093/ofid/ofad659.
Abstract
Background The conventional diagnostic for Schistosoma mansoni infection is stool microscopy with the Kato-Katz (KK) technique to detect eggs. Its outcomes are highly variable on a day-to-day basis and may lead to biased estimates of community infection used to inform public health programs. Our goal is to develop a resampling method that leverages data from a large-scale randomized trial to accurately predict community infection. Methods We developed a resampling method that provides unbiased community estimates of prevalence, intensity, and other statistics for S mansoni infection when a community survey is conducted using KK stool microscopy with a single sample per host. It leverages a large-scale data set collected in the Schistosomiasis Consortium for Operational Research and Evaluation (SCORE) project and allows linking single-stool-specimen community screening to its putative multiday "true statistics." Results SCORE data analysis reveals the limited sensitivity of KK stool microscopy and the systematic bias of single-day versus multiday community testing; the prevalence estimate can fall up to 50% below the true value. The proposed SCORE cluster method reduces this systematic bias and brings the estimated prevalence within 5%-10% of the true value. This holds for a broad swath of transmission settings, including SCORE communities and other data sets. Conclusions Our SCORE cluster method can markedly improve S mansoni prevalence estimates in settings using stool microscopy.
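The single-day bias described here is easy to reproduce in a toy simulation: give each host a latent egg-excretion intensity, draw noisy daily Kato-Katz counts around it, and compare single-day with multiday positivity. This is only an illustration of the bias, not the SCORE cluster method; the gamma-Poisson model and all parameter values are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_prevalence(n_hosts, mean_count, dispersion, n_days, rng):
    """Each host has a latent egg-excretion intensity drawn from a gamma
    distribution; each day's slide count is Poisson around it. A host is
    classified as infected if any sampled day is egg-positive."""
    lam = rng.gamma(dispersion, mean_count / dispersion, size=n_hosts)
    counts = rng.poisson(lam[:, None], size=(n_hosts, n_days))
    single_day = np.mean(counts[:, 0] > 0)         # one stool sample
    multi_day = np.mean(counts.max(axis=1) > 0)    # n_days samples
    return single_day, multi_day

one_day, three_day = simulate_prevalence(5000, 2.0, 0.3, 3, rng)
```

With a highly overdispersed intensity distribution (many light infections), the single-day estimate falls well below the three-day estimate, mirroring the underestimation the paper reports.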
Affiliation(s)
- David Gurarie, Department of Mathematics, Applied Mathematics, and Statistics, Case Western Reserve University, Cleveland, Ohio, USA; Center for Global Health and Diseases, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Anirban Mondal, Department of Mathematics, Applied Mathematics, and Statistics, Case Western Reserve University, Cleveland, Ohio, USA
- Martial L Ndeffo-Mbah, Department of Veterinary and Integrative Biosciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas, USA; Department of Epidemiology and Biostatistics, School of Public Health, Texas A&M University, College Station, Texas, USA
6
Zhou T, Jiao H. Exploration of the Stacking Ensemble Machine Learning Algorithm for Cheating Detection in Large-Scale Assessment. Educ Psychol Meas 2023; 83:831-854. PMID: 37398846. PMCID: PMC10311957. DOI: 10.1177/00131644221117193.
Abstract
Cheating detection in large-scale assessment has received considerable attention in the extant literature. However, none of the previous studies in this line of research investigated the stacking ensemble machine learning algorithm for cheating detection, and no study addressed the issue of class imbalance using resampling. This study explored the application of the stacking ensemble machine learning algorithm to analyze the item responses, response times, and augmented data of test-takers to detect cheating behaviors. The performance of the stacking method was compared with that of two other ensemble methods (bagging and boosting) as well as six non-ensemble base machine learning algorithms. Issues related to class imbalance and input features were addressed. The results indicated that stacking, resampling, and feature sets including augmented summary data generally performed better than their counterparts in cheating detection. Among all the study conditions, the meta-model from stacking using discriminant analysis based on the top two base models, Gradient Boosting and Random Forest, generally performed best when item responses and the augmented summary statistics were used as the input features with an under-sampling ratio of 10:1.
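The two-stage logic of stacking (fit base learners, collect their out-of-fold predictions as meta-features, then train a meta-model on those) can be sketched with toy components. The base learners and data below are invented for illustration; the study's base models were Gradient Boosting and Random Forest with a discriminant-analysis meta-model:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: two informative features, binary label
X = rng.normal(size=(400, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=400) > 0).astype(int)

def centroid_score(X_tr, y_tr, X_te, col):
    """Toy base learner: score a test point by which class centroid
    (on a single feature) it is closer to."""
    m1 = X_tr[y_tr == 1, col].mean()
    m0 = X_tr[y_tr == 0, col].mean()
    return (X_te[:, col] - m0) ** 2 - (X_te[:, col] - m1) ** 2

def out_of_fold_scores(X, y, k=5):
    """Stage 1 of stacking: build meta-features by cross-fitting, so the
    meta-model never sees a base prediction made on its own training row."""
    n = len(y)
    Z = np.zeros((n, 2))
    folds = np.arange(n) % k
    for f in range(k):
        tr, te = folds != f, folds == f
        for col in (0, 1):
            Z[te, col] = centroid_score(X[tr], y[tr], X[te], col)
    return Z

def fit_logistic(Z, y, steps=500, lr=0.1):
    """Stage 2: the meta-model (plain logistic regression trained by
    gradient descent here; the study used discriminant analysis)."""
    A = np.column_stack((np.ones(len(y)), Z))
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-A @ w))
        w -= lr * A.T @ (p - y) / len(y)
    return w

Z = out_of_fold_scores(X, y)
w = fit_logistic(Z, y)
p_meta = 1.0 / (1.0 + np.exp(-np.column_stack((np.ones(len(y)), Z)) @ w))
accuracy = float(np.mean((p_meta > 0.5) == y))
```

The cross-fitting in stage 1 is the essential detail: without it, the meta-model would be trained on optimistically biased base predictions.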
Affiliation(s)
- Todd Zhou, Winston Churchill High School, Potomac, MD, USA
- Hong Jiao, University of Maryland, College Park, USA
7
Welvaars K, Oosterhoff JHF, van den Bekerom MPJ, Doornberg JN, van Haarst EP. Implications of resampling data to address the class imbalance problem (IRCIP): an evaluation of impact on performance between classification algorithms in medical data. JAMIA Open 2023; 6:ooad033. PMID: 37266187. PMCID: PMC10232287. DOI: 10.1093/jamiaopen/ooad033.
Abstract
Objective When correcting for the "class imbalance" problem in medical data, the effects of resampling on classifier algorithms remain unclear. We examined the effect on performance over several combinations of classifiers and resampling ratios. Materials and Methods Multiple classification algorithms were trained on 7 resampled datasets: no correction, random undersampling, 4 ratios of the Synthetic Minority Oversampling Technique (SMOTE), and random oversampling with the Adaptive Synthetic algorithm (ADASYN). Performance was evaluated by area under the curve (AUC), precision, recall, Brier score, and calibration metrics. A case study on prediction modeling for 30-day unplanned readmissions in previously admitted urology patients is presented. Results For most algorithms, using resampled data showed a significant increase in AUC and precision, ranging from 0.74 (CI: 0.69-0.79) to 0.93 (CI: 0.92-0.94) and from 0.35 (CI: 0.12-0.58) to 0.86 (CI: 0.81-0.92), respectively. All classification algorithms showed significant increases in recall and significant decreases in Brier score, with distorted calibration overestimating positives. Discussion Imbalance correction resulted in overall improved performance yet poorly calibrated models. There can still be clinical utility due to strong discriminating performance, specifically when predicting only low- and high-risk cases is clinically more relevant. Conclusion Resampling data increased the performance of classification algorithms yet produced an overestimation of positive predictions. Based on the findings from our case study, a thoughtful predefinition of the clinical prediction task may guide the use of resampling techniques in future studies aiming to improve clinical decision support tools.
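The calibration distortion reported here follows directly from what resampling does to the class prior. A minimal sketch with random oversampling (the simplest correction; the study used SMOTE and ADASYN, which interpolate synthetic neighbours, but the effect on the base rate is the same). All data are invented:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_oversample(X, y, rng):
    """Duplicate minority-class rows (with replacement) until the two
    classes are balanced."""
    minority = int(y.sum() < len(y) / 2)
    idx_min = np.flatnonzero(y == minority)
    idx_maj = np.flatnonzero(y != minority)
    extra = rng.choice(idx_min, size=len(idx_maj) - len(idx_min))
    idx = np.concatenate((idx_maj, idx_min, extra))
    return X[idx], y[idx]

# imbalanced toy data: roughly 10% positives
X = rng.normal(size=(1000, 1))
y = (rng.random(1000) < 0.1).astype(int)
Xr, yr = random_oversample(X, y, rng)

base_rate, resampled_rate = y.mean(), yr.mean()
```

A model fit on the resampled data sees a 50% base rate, so its predicted probabilities systematically overestimate the true ~10% risk unless recalibrated, which matches the overestimation of positives observed in the study.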
Affiliation(s)
- Koen Welvaars (corresponding author), Data Science Team, OLVG, Jan Tooropstraat 164, 1061 AE Amsterdam, the Netherlands
8
Héberger K. Selection of optimal validation methods for quantitative structure-activity relationships and applicability domain. SAR QSAR Environ Res 2023:1-20. PMID: 37227317. DOI: 10.1080/1062936X.2023.2214871.
Abstract
This brief literature survey groups the (numerical) validation methods and emphasizes the contradictions and confusion surrounding bias, variance, and predictive performance. A multicriteria decision-making analysis has been made using the sum of absolute ranking differences (SRD), illustrated with five case studies (seven examples). SRD was applied to compare external and cross-validation techniques and indicators of predictive performance, and to select optimal methods for determining the applicability domain (AD). The orderings of model validation methods were in accordance with the claims of the original authors, but they contradict one another, suggesting that any variant of cross-validation can be superior or inferior to other variants depending on the algorithm, data structure, and circumstances. A simple fivefold cross-validation proved to be superior to the Bayesian Information Criterion in the vast majority of situations. It is simply not sufficient to test a numerical validation method in one situation only, even a well-defined one. SRD, as a preferable multicriteria decision-making algorithm, is suitable for tailoring validation techniques and for optimally determining the applicability domain according to the dataset in question.
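The SRD statistic itself is simple: rank the cases under each method and under a reference (often the row-wise mean over all methods), then sum the absolute rank differences; smaller SRD means closer to the reference ordering. A small numpy sketch with invented numbers (tie handling and the permutation-based validation of SRD, both part of the published procedure, are omitted):

```python
import numpy as np

def srd(values, reference):
    """Sum of absolute ranking differences between the ordering induced
    by a candidate method and the ordering induced by the reference."""
    r_val = np.argsort(np.argsort(values))   # ranks, assuming no ties
    r_ref = np.argsort(np.argsort(reference))
    return int(np.abs(r_val - r_ref).sum())

# rows = cases (e.g., models being validated), columns = validation methods
M = np.array([[0.90, 0.88, 0.50],
              [0.80, 0.82, 0.90],
              [0.70, 0.71, 0.60],
              [0.60, 0.59, 0.95]])
ref = M.mean(axis=1)                         # consensus reference
scores = [srd(M[:, j], ref) for j in range(M.shape[1])]
```

Here the first two methods agree with each other and score SRD = 4, while the third, which orders the cases differently, scores 6 and is flagged as furthest from the consensus.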
Affiliation(s)
- K Héberger, Plasma Chemistry Research Group, Institute of Materials and Environmental Chemistry, Research Centre for Natural Sciences, Institute of Excellence of the Hungarian Academy of Sciences, Budapest, Hungary
9
Eysenbach G, Chao HJ, Chiang YC, Chen HY. Explainable Machine Learning Techniques To Predict Amiodarone-Induced Thyroid Dysfunction Risk: Multicenter, Retrospective Study With External Validation. J Med Internet Res 2023; 25:e43734. PMID: 36749620. PMCID: PMC9944157. DOI: 10.2196/43734.
Abstract
BACKGROUND Machine learning offers new solutions for predicting life-threatening, unpredictable amiodarone-induced thyroid dysfunction. Traditional regression approaches for adverse-effect prediction without time-series consideration of features have yielded suboptimal predictions. Machine learning algorithms with multiple data sets at different time points may generate better performance in predicting adverse effects. OBJECTIVE We aimed to develop and validate machine learning models for forecasting individualized amiodarone-induced thyroid dysfunction risk and to optimize a machine learning-based risk stratification scheme with a resampling method and readjustment of the clinically derived decision thresholds. METHODS This study developed machine learning models using multicenter, delinked electronic health records. It included patients receiving amiodarone from January 2013 to December 2017. The training set was composed of data from Taipei Medical University Hospital and Wan Fang Hospital, while data from Taipei Medical University Shuang Ho Hospital were used as the external test set. The study collected stationary features at baseline and dynamic features at the first, second, third, sixth, ninth, 12th, 15th, 18th, and 21st months after amiodarone initiation. We used 16 machine learning models, including extreme gradient boosting, adaptive boosting, k-nearest neighbor, and logistic regression models, along with an original resampling method and 3 other resampling methods: oversampling with the borderline-synthesized minority oversampling technique, undersampling with the edited nearest neighbor rule, and an over- and undersampling hybrid. Model performance was compared based on accuracy, precision, recall, F1-score, geometric mean (G-mean), area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). Feature importance was determined by the best model. The decision threshold was readjusted to identify the best cutoff value, and a Kaplan-Meier survival analysis was performed. RESULTS The training set contained 4075 patients from Taipei Medical University Hospital and Wan Fang Hospital, of whom 583 (14.3%) developed amiodarone-induced thyroid dysfunction, while the external test set included 2422 patients from Taipei Medical University Shuang Ho Hospital, of whom 275 (11.4%) developed amiodarone-induced thyroid dysfunction. The extreme gradient boosting model with oversampling demonstrated the best predictive outcomes among all 16 models: the accuracy, precision, recall, F1-score, G-mean, AUPRC, and AUROC were 0.923, 0.632, 0.756, 0.688, 0.845, 0.751, and 0.934, respectively. After readjusting the cutoff, the best value was 0.627, and the F1-score reached 0.699. The best threshold classified 286 of 2422 patients (11.8%) as high-risk subjects, among whom 275 were true positives in the test set. A shorter treatment duration; higher levels of thyroid-stimulating hormone and high-density lipoprotein cholesterol; and lower levels of free thyroxine, alkaline phosphatase, and low-density lipoprotein were the most important features. CONCLUSIONS Machine learning models combined with resampling methods can predict amiodarone-induced thyroid dysfunction and serve as a support tool for individualized risk prediction and clinical decision support.
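The threshold-readjustment step can be sketched independently of the model: scan candidate cutoffs over the predicted probabilities and keep the one maximizing F1 instead of defaulting to 0.5, which is useful when the model was trained on resampled, artificially balanced data. A minimal sketch with invented toy labels and scores:

```python
import numpy as np

def best_f1_threshold(y_true, p_pred):
    """Return the decision threshold on predicted probabilities that
    maximizes F1 on the given data, and the F1 achieved there."""
    best_t, best_f1 = 0.5, -1.0
    for t in np.unique(p_pred):
        pred = p_pred >= t
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        fn = np.sum(~pred & (y_true == 1))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

t_best, f1_best = best_f1_threshold(np.array([0, 0, 1, 1]),
                                    np.array([0.10, 0.40, 0.35, 0.80]))
```

In practice the threshold should be tuned on a validation split, not on the test set, to avoid optimistic bias.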
Affiliation(s)
- Horng-Jiun Chao, Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan
- Yi-Chun Chiang, Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan; Department of Pharmacy, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
- Hsiang-Yin Chen, Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan; Department of Pharmacy, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
10
Jablonski A. Rational Resampling Ratio as Enhancement to Shaft Imbalance Detection. Sensors (Basel) 2023; 23:1719. PMID: 36772754. PMCID: PMC9920726. DOI: 10.3390/s23031719.
Abstract
Trend analysis is one of the most powerful techniques for monitoring the technical condition of individual mechanical components of rotating machinery. It is based on the extraction of characteristic signal components according to the kinetostatic configuration of the machine drivetrain. It has been used for decades and is well understood. However, classical trend analysis rests on assumptions that resulted from the limited computational power of embedded systems years ago. This paper addresses the question of whether the assumption of a single signal resampling path for calculating signal components generated by shafts with a rational transmission ratio is valid. The study was conducted using an extensive imbalance test on a medium-power test rig. The paper demonstrates that the application of an advanced resampling algorithm does not significantly influence the overall trend increase, but it is of utmost importance when trend variance is of interest.
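The resampling at issue is angular resampling (order tracking): interpolating a uniformly time-sampled vibration signal onto a uniform shaft-angle grid, so the once-per-revolution imbalance component becomes a fixed spectral line regardless of speed. A minimal sketch with a synthetic speed ramp; the paper's rational-ratio algorithm and test-rig data are not reproduced here, and the phase is assumed to be known exactly rather than estimated from a tachometer:

```python
import numpy as np

# time-domain signal sampled uniformly while the shaft speed ramps up
fs = 1000.0
t = np.arange(0, 2.0, 1 / fs)
phase = 2 * np.pi * (10 * t + 2 * t**2)     # shaft angle (rad); speed 10 -> 18 Hz
signal = np.sin(phase) + 0.1 * np.sin(3 * phase)   # 1x imbalance + 3x harmonic

# order tracking: resample from uniform time to uniform shaft angle, so each
# revolution contributes the same number of samples
samples_per_rev = 64
angle_uniform = np.arange(phase[0], phase[-1], 2 * np.pi / samples_per_rev)
resampled = np.interp(angle_uniform, phase, signal)

# in the angle domain, the 1x component sits at one cycle per revolution
spectrum = np.abs(np.fft.rfft(resampled))
dominant_order_bin = int(np.argmax(spectrum[1:])) + 1
```

After resampling, the dominant spectral bin corresponds to order 1 (one cycle per 64 samples), which is exactly the line a shaft-imbalance trend would track.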
Affiliation(s)
- Adam Jablonski, Department of Robotics and Mechatronics, Faculty of Mechanical Engineering and Robotics, AGH University of Science and Technology, 30-059 Krakow, Poland
11
Hong F, Tian L, Devanarayan V. Improving the Robustness of Variable Selection and Predictive Performance of Regularized Generalized Linear Models and Cox Proportional Hazard Models. Mathematics (Basel) 2023; 11:557. PMID: 37990696. PMCID: PMC10660556. DOI: 10.3390/math11030557.
Abstract
High-dimensional data applications often entail the use of various statistical and machine-learning algorithms to identify an optimal signature, based on biomarkers and other patient characteristics, that predicts the desired clinical outcome in biomedical research. Both the composition and the predictive performance of such biomarker signatures are critical in various biomedical research applications. In the presence of a large number of features, however, a conventional regression analysis approach fails to yield a good prediction model. A widely used remedy is to introduce regularization in fitting the relevant regression model. In particular, an L1 penalty on the regression coefficients is extremely useful, and very efficient numerical algorithms have been developed for fitting such models with different types of responses. This L1-based regularization tends to generate a parsimonious prediction model with promising prediction performance; that is, feature selection is achieved along with construction of the prediction model. The variable selection, and hence the composition of the signature, as well as the prediction performance of the model depend on the choice of the penalty parameter used in the L1 regularization. The penalty parameter is often chosen by K-fold cross-validation. However, such an algorithm tends to be unstable and may yield very different choices of the penalty parameter across multiple runs on the same dataset. In addition, the predictive performance estimates from the internal cross-validation procedure in this algorithm tend to be inflated. In this paper, we propose a Monte Carlo approach to improve the robustness of regularization parameter selection, along with an additional cross-validation wrapper for objectively evaluating the predictive performance of the final model. We demonstrate the improvements via simulations and illustrate the application via a real dataset.
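The Monte Carlo idea replaces a single K-fold partition with many random train/test splits and averages the error over splits before picking the penalty, which damps the split-to-split instability described above. The sketch below uses ridge regression purely because it has a closed-form fit; the paper's setting is L1-regularized models, and the data and penalty grid are invented:

```python
import numpy as np

rng = np.random.default_rng(7)

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (stand-in for an L1 fit)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def monte_carlo_cv(X, y, lambdas, n_splits=50, test_frac=0.25, rng=rng):
    """Average test error over many random train/test splits, instead of
    over the folds of a single K-fold partition."""
    n = len(y)
    errs = np.zeros(len(lambdas))
    for _ in range(n_splits):
        perm = rng.permutation(n)
        n_te = int(test_frac * n)
        te, tr = perm[:n_te], perm[n_te:]
        for j, lam in enumerate(lambdas):
            b = ridge_fit(X[tr], y[tr], lam)
            errs[j] += np.mean((y[te] - X[te] @ b) ** 2)
    return errs / n_splits

# sparse truth with noisy features
X = rng.normal(size=(120, 10))
beta = np.zeros(10); beta[:2] = [2.0, -1.5]
y = X @ beta + rng.normal(scale=1.0, size=120)

lambdas = np.array([0.01, 1.0, 100.0, 10000.0])
cv_err = monte_carlo_cv(X, y, lambdas)
best_lam = lambdas[int(np.argmin(cv_err))]
```

Because the selection averages over many splits, rerunning with a different seed changes `cv_err` much less than a single K-fold run would, which is the stability property the paper targets.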
Affiliation(s)
- Feng Hong, Takeda Pharmaceuticals, Cambridge, MA 02139, USA
- Lu Tian, Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Viswanath Devanarayan, Eisai Inc., Nutley, NJ 07110, USA; Department of Mathematics, Statistics, and Computer Science, University of Illinois Chicago, Chicago, IL 60607, USA
12
Galyean ML, Tedeschi LO. Predicting microbial crude protein synthesis in cattle from intakes of dietary energy and crude protein. J Anim Sci 2023; 101:skad359. PMID: 37843507. PMCID: PMC10601907. DOI: 10.1093/jas/skad359.
Abstract
Accurate predictions of microbial crude protein (MCP) synthesis are needed to predict metabolizable protein supply in ruminants. Since 1996, the National Academies of Sciences, Engineering, and Medicine series on beef cattle nutrient requirements has used the intake of total digestible nutrients (TDN) to predict ruminal MCP synthesis. Because various tabular energy values for feeds are highly correlated, our objective was to determine whether intakes of digestible energy (DE), metabolizable energy (ME), and net energy for maintenance (NEm) could be used as predictors of MCP synthesis in beef cattle. A published database of 285 treatment means from experiments that evaluated MCP synthesis was updated with 50 additional treatment mean observations. When intakes of TDN, fat-free TDN, DE, ME, NEm, dry matter, organic matter, crude protein (CP), ether extract, neutral detergent fiber, and starch were used in a stepwise regression analysis to predict MCP, only intakes of DE and CP met the P < 0.10 criterion for entry into the model. Mixed-model regression analyses were used to adjust for random intercept and slope effects of citations to evaluate intake of DE alone or in combination with CP intake as predictors of MCP synthesis, and the intakes of TDN, ME, and NEm as alternatives to DE intake. Similar precisions in predicting MCP synthesis were obtained with all measures of energy intake (CV = root mean square error [RMSE] as a percentage of the overall mean MCP varied from 9% to 9.67%), and adding CP intake to statistical models increased precision (CV ranged from 8.43% to 9.39%). Resampling analyses were used to evaluate observed vs. predicted values for the various energy intake models with or without CP intake, as well as the TDN-based equation used in the current beef cattle nutrient requirements calculations. 
The coefficient of determination, concordance correlation coefficient, and RMSE of prediction as a percentage of the mean averaged 0.595, 0.730, and 28.6%, respectively, for the four measures of energy intake, and 0.630, 0.757, and 27.4%, respectively, for equations that included CP intake. The TDN equation adopted by the 2016 beef cattle nutrient requirements system yielded results similar to those of the newly developed equations but had a slightly greater mean bias. We concluded that any of the measures of energy intake we evaluated can be used to predict MCP synthesis by beef cattle and that adding CP intake improves model precision.
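The two agreement statistics named in the abstract are straightforward to compute. The sketch below, with entirely made-up observed and predicted MCP values, shows the RMSE-as-percentage-of-mean criterion (the "CV") and Lin's concordance correlation coefficient; it illustrates the metrics only, not the authors' analysis.

```python
import math

def rmse_pct_of_mean(obs, pred):
    """Root mean square error of prediction, expressed as a percentage
    of the mean observed value (the 'CV' used in the abstract)."""
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return 100.0 * rmse / (sum(obs) / len(obs))

def concordance_cc(obs, pred):
    """Lin's concordance correlation coefficient: agreement of predicted
    with observed values around the 45-degree line."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    so2 = sum((o - mo) ** 2 for o in obs) / n
    sp2 = sum((p - mp) ** 2 for p in pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    return 2 * cov / (so2 + sp2 + (mo - mp) ** 2)

# Hypothetical MCP values (g/d), invented for illustration only
observed = [520.0, 610.0, 705.0, 480.0, 650.0]
predicted = [500.0, 640.0, 690.0, 510.0, 630.0]
cv = rmse_pct_of_mean(observed, predicted)
ccc = concordance_cc(observed, predicted)
```

Perfect agreement gives a CCC of exactly 1; the (mo − mp)² term in the denominator penalizes systematic bias that an ordinary correlation would ignore.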
Affiliation(s)
- M L Galyean
- Department of Veterinary Sciences, Texas Tech University, Lubbock, TX 79409-2123, USA
- L O Tedeschi
- Department of Animal Science, Texas A&M University, College Station, TX 77843-2471, USA

13
Shang J, Cai X, Zhang T, Sun Y, Zhang Y, Liu J, Guan B. EpiReSIM: A Resampling Method of Epistatic Model without Marginal Effects Using Under-Determined System of Equations. Genes (Basel) 2022; 13:genes13122286. [PMID: 36553553 PMCID: PMC9777644 DOI: 10.3390/genes13122286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2022] [Revised: 11/30/2022] [Accepted: 12/01/2022] [Indexed: 12/12/2022] Open
Abstract
Simulation experiments are essential for evaluating epistasis detection methods; they are the main way to demonstrate effectiveness and move toward practical applications. However, for lack of effective simulators, especially for models without marginal effects (eNME models), epistasis detection methods can hardly be validated through simulation. In this study, we propose a resampling simulation method (EpiReSIM) for generating eNME models. First, EpiReSIM provides two strategies for solving eNME models: one uses prevalence constraints alone, and the other uses joint constraints of prevalence and heritability. We transform the computation of the model into the problem of solving an underdetermined system of equations. Using the complete orthogonal decomposition method and Newton's method, EpiReSIM computes solutions of the underdetermined system to obtain the eNME model; its handling of high-order models in particular is the highlight of EpiReSIM. Second, based on the computed eNME model, EpiReSIM generates simulation data by a resampling method. Experimental results show that EpiReSIM preserves the biological properties of minor allele frequencies, handles high-order models well, and is a convenient and effective alternative to current simulation software.
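The core numerical step, solving an under-determined linear constraint system for a penetrance table, can be sketched in a few lines. Everything below (the constraint matrix, the targets, and the use of NumPy's minimum-norm least-squares solver in place of the paper's complete orthogonal decomposition plus Newton iterations) is an illustrative assumption, not EpiReSIM's code.

```python
import numpy as np

# Toy stand-in for the core step: a penetrance vector f (3 unknowns)
# constrained by fewer linear equations than unknowns, e.g. a prevalence
# constraint sum_g P(g) * f_g = prevalence plus one anchoring equation.
# All numbers are invented for illustration.
P = np.array([[0.25, 0.50, 0.25],   # genotype frequencies (prevalence row)
              [1.00, 0.00, 0.00]])  # anchor f_0 directly
b = np.array([0.10, 0.05])          # target prevalence; anchored value

# Minimum-norm solution of the under-determined system P f = b.
# For a full-row-rank rectangular system, np.linalg.lstsq returns it.
f, *_ = np.linalg.lstsq(P, b, rcond=None)
```

Because the system has more unknowns than equations, infinitely many penetrance tables satisfy the constraints; picking a well-defined solution (here the minimum-norm one) is what makes the construction reproducible.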
Affiliation(s)
- Junliang Shang
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Xinrui Cai
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Tongdui Zhang
- Science and Technology Innovation Service Institution of Rizhao, Rizhao 276827, China
- Yan Sun
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Yuanyuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China
- Jinxing Liu
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Boxin Guan
- School of Computer Science, Qufu Normal University, Rizhao 276826, China

14
Király B, Hangya B. Navigating the Statistical Minefield of Model Selection and Clustering in Neuroscience. eNeuro 2022; 9:ENEURO.0066-22.2022. [PMID: 35835556 DOI: 10.1523/ENEURO.0066-22.2022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2022] [Revised: 06/16/2022] [Accepted: 06/22/2022] [Indexed: 11/21/2022] Open
Abstract
Model selection is often implicit: when performing an ANOVA, one assumes that the normal distribution is a good model of the data; fitting a tuning curve implies that an additive and a multiplicative scaling factor describe the behavior of the neuron; even calculating an average implicitly assumes that the data were sampled from a distribution that has a finite first statistical moment: the mean. Model selection may also be explicit, when the aim is to test whether one model provides a better description of the data than a competing one. As a special case, clustering algorithms identify groups with similar properties within the data; they are widely used in applications from spike sorting to cell type identification to gene expression analysis. We discuss model selection and clustering techniques from a statistician's point of view, revealing the assumptions behind, and the logic governing, the various approaches. We also showcase important neuroscience applications and provide suggestions on how neuroscientists could put model selection algorithms to best use, as well as what mistakes to avoid.
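Explicit model selection of the kind described above is often done with an information criterion. The sketch below compares polynomial models of a noisy linear trend by Gaussian AIC; the data, candidate degrees, and parameter counts are invented for illustration and are not tied to any neuroscience data set.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
y = 2.0 + 3.0 * x + rng.normal(scale=0.3, size=x.size)  # truly linear data

def gaussian_aic(deg):
    """AIC for a degree-`deg` polynomial fit with Gaussian errors:
    n * log(RSS / n) + 2k, where k counts the coefficients plus the
    error variance. Lower is better."""
    resid = y - np.polyval(np.polyfit(x, y, deg), x)
    rss = float(resid @ resid)
    n = x.size
    return n * np.log(rss / n) + 2 * (deg + 2)

# Compare a linear, a cubic, and a degree-7 model of the same data.
aics = {deg: gaussian_aic(deg) for deg in (1, 3, 7)}
best = min(aics, key=aics.get)
```

The penalty term 2k is what keeps the richer models from winning automatically: every extra coefficient must buy enough reduction in residual sum of squares to pay for itself.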
15
Vogel F, Vahle NM, Gertheiss J, Tomasik MJ. Supervised learning for analysing movement patterns in a virtual reality experiment. R Soc Open Sci 2022; 9:211594. [PMID: 35601447 PMCID: PMC9039785 DOI: 10.1098/rsos.211594] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/06/2021] [Accepted: 03/22/2022] [Indexed: 05/03/2023]
Abstract
The projection into a virtual character and the concomitant illusory body ownership can lead to transformations of one's entity. Both during and after the exposure, behavioural and attitudinal changes may occur, depending on the characteristics or stereotypes associated with the embodied avatar. In the present study, we investigated the effects on physical activity when young students experience being old. After random assignment to a young or an older avatar, the participants' body movements were tracked while performing upper body exercises. We propose and discuss the use of supervised learning procedures to assign these movement patterns to the underlying avatar class in order to detect behavioural differences. This approach can be seen as an alternative to classical feature-wise testing. We found that the classification accuracy was remarkably good for support vector machines with a linear kernel and for deep learning with convolutional neural networks when time sub-sequences, extracted repeatedly at random from the original data, were used as input. For hand movements, the associated decision boundaries revealed a higher level of local, vertical positions for the young avatar group, indicating increased agility in their performances. This held for both guided movements and achievement-oriented exercises.
Affiliation(s)
- Frederike Vogel
- Department of Mathematics and Statistics, School of Economics and Social Sciences, Helmut Schmidt University, Hamburg, Germany
- Nils M. Vahle
- Department of Psychology and Psychotherapy, University of Witten/Herdecke, Witten, Nordrhein-Westfalen, Germany
- Jan Gertheiss
- Department of Mathematics and Statistics, School of Economics and Social Sciences, Helmut Schmidt University, Hamburg, Germany
- Martin J. Tomasik
- Department of Psychology and Psychotherapy, University of Witten/Herdecke, Witten, Nordrhein-Westfalen, Germany

16
Liu T, Yu H, Blair RH. Stability estimation for unsupervised clustering: A review. Wiley Interdiscip Rev Comput Stat 2022; 14:e1575. [PMID: 36583207 PMCID: PMC9787023 DOI: 10.1002/wics.1575] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/01/2021] [Revised: 11/24/2021] [Accepted: 12/08/2021] [Indexed: 01/01/2023]
Abstract
Cluster analysis remains one of the most challenging yet fundamental tasks in unsupervised learning. This is due in part to the fact that there are no labels or gold standards against which performance can be measured. Moreover, the wide range of available clustering methods is governed by different objective functions, parameters, and dissimilarity measures. The purpose of clustering is versatile, often playing critical roles in the early stages of exploratory data analysis and as an endpoint for knowledge and discovery. Thus, understanding the quality of a clustering is of critical importance. The concept of stability has emerged as a strategy for assessing the performance and reproducibility of data clustering. The key idea is to produce perturbed data sets that are very close to the original, and cluster them. If the clustering is stable, then the clusters from the original data will be preserved in the perturbed data clustering. The nature of the perturbation and the methods for quantifying similarity between clusterings are nontrivial, and ultimately they are what set many of the stability estimation methods apart. In this review, we provide an overview of the very active research area of cluster stability estimation and discuss some of the open questions and challenges that remain in the field. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification.
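The perturb-and-recluster idea can be sketched in a few lines. The toy below (a tiny 1-D 2-means, additive jitter as the perturbation, and the plain Rand index as the similarity measure) is one illustrative choice among the many perturbation and comparison schemes the review surveys.

```python
import random

def two_means_1d(xs, iters=20):
    """Tiny 1-D 2-means (Lloyd's algorithm); returns a label per point."""
    c1, c2 = min(xs), max(xs)
    for _ in range(iters):
        labels = [0 if abs(x - c1) <= abs(x - c2) else 1 for x in xs]
        g1 = [x for x, l in zip(xs, labels) if l == 0]
        g2 = [x for x, l in zip(xs, labels) if l == 1]
        if g1: c1 = sum(g1) / len(g1)
        if g2: c2 = sum(g2) / len(g2)
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two clusterings agree
    (same label vs. different label); invariant to label swaps."""
    n, agree, pairs = len(a), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            agree += (a[i] == a[j]) == (b[i] == b[j])
    return agree / pairs

random.seed(1)
data = [random.gauss(0.0, 0.2) for _ in range(30)] + \
       [random.gauss(5.0, 0.2) for _ in range(30)]
base = two_means_1d(data)

# Perturb with small noise, re-cluster, compare to the original partition;
# for well-separated structure the Rand index stays near 1.
scores = []
for _ in range(20):
    noisy = [x + random.gauss(0.0, 0.1) for x in data]
    scores.append(rand_index(base, two_means_1d(noisy)))
stability = sum(scores) / len(scores)
```

With overlapping clusters or a wrongly chosen number of clusters, the same loop would produce visibly lower stability scores, which is the diagnostic the stability literature exploits.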
Affiliation(s)
- Tianmou Liu
- Institute for Artificial Intelligence and Data Science, State University of New York at Buffalo, Buffalo, New York, USA
- Han Yu
- Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA
- Rachael Hageman Blair
- Department of Biostatistics, Institute for Artificial Intelligence and Data Science, State University of New York at Buffalo, Buffalo, New York, USA

17
Dai Q, Wang Z, Liu Z, Duan X, Song J, Guo M. Predicting miRNA-disease associations using an ensemble learning framework with resampling method. Brief Bioinform 2021; 23:6470964. [PMID: 34929742 DOI: 10.1093/bib/bbab543] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2021] [Revised: 11/05/2021] [Accepted: 11/25/2021] [Indexed: 12/11/2022] Open
Abstract
MOTIVATION Accumulating evidence indicates that microRNAs (miRNAs) play a crucial role in the pathogenesis and progression of various complex diseases. Inferring disease-associated miRNAs is significant for exploring the etiology, diagnosis and treatment of human diseases. As biological experiments are time-consuming and labor-intensive, developing effective computational methods has become indispensable for identifying associations between miRNAs and diseases. RESULTS We present an Ensemble learning framework with a Resampling method for MiRNA-Disease Association (ERMDA) prediction to discover potential disease-related miRNAs. Firstly, a resampling strategy is proposed for building multiple different balanced training subsets to address the challenge of sample imbalance within the database. Then, ERMDA extracts miRNA and disease feature representations by integrating miRNA-miRNA similarities, disease-disease similarities and experimentally verified miRNA-disease association information. Next, a feature selection approach is applied to reduce redundant information and increase the diversity among these subsets. Lastly, ERMDA constructs an individual learner on each subset to yield primitive outcomes, and soft voting is used to make the final decision based on the predictions of the individual learners. A series of experiments demonstrates that ERMDA outperforms other state-of-the-art methods on both balanced and unbalanced testing sets. In addition, case studies conducted on three human diseases further confirm ERMDA's ability to identify potential disease-related miRNAs. In conclusion, these results demonstrate that our method can serve as an effective and reliable tool for researchers exploring the regulatory role of miRNAs in complex diseases.
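The resampling-plus-soft-voting skeleton of such an ensemble can be sketched independently of the miRNA-specific features. Everything below (the 1-D toy data, the threshold base learner, the subset count) is invented for illustration; ERMDA itself uses richer similarity-based features and learners.

```python
import random

random.seed(0)
# Imbalanced toy data: positives are rare and sit higher on a 1-D feature.
pos = [(random.gauss(2.0, 1.0), 1) for _ in range(20)]
neg = [(random.gauss(0.0, 1.0), 0) for _ in range(200)]

def balanced_subsets(pos, neg, k):
    """Each subset keeps all positives and an equally sized random sample
    of negatives -- the resampling idea used to fight class imbalance."""
    return [pos + random.sample(neg, len(pos)) for _ in range(k)]

def train_threshold(subset):
    """Deliberately simple base learner: midpoint-of-class-means threshold."""
    m1 = sum(x for x, y in subset if y == 1) / sum(1 for _, y in subset if y == 1)
    m0 = sum(x for x, y in subset if y == 0) / sum(1 for _, y in subset if y == 0)
    t = (m0 + m1) / 2
    return lambda x: 1.0 if x > t else 0.0

learners = [train_threshold(s) for s in balanced_subsets(pos, neg, k=7)]

def soft_vote(x):
    """Average the individual scores; >0.5 means a positive call."""
    return sum(f(x) for f in learners) / len(learners)

hi, lo = soft_vote(4.0), soft_vote(-2.0)  # clearly positive / clearly negative
```

Because each learner sees a different random sample of negatives, the ensemble members disagree near the class boundary, and averaging their scores yields a smoother decision than any single threshold.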
Affiliation(s)
- Qiguo Dai
- School of Computer Science and Engineering, Dalian Minzu University, 116600, Dalian, China; SEAC Key Laboratory of Big Data Applied Technology, Dalian Minzu University, 116600, Dalian, China
- Zhaowei Wang
- School of Computer Science and Engineering, Dalian Minzu University, 116600, Dalian, China; SEAC Key Laboratory of Big Data Applied Technology, Dalian Minzu University, 116600, Dalian, China
- Ziqiang Liu
- School of Computer Science and Engineering, Dalian Minzu University, 116600, Dalian, China; SEAC Key Laboratory of Big Data Applied Technology, Dalian Minzu University, 116600, Dalian, China
- Xiaodong Duan
- SEAC Key Laboratory of Big Data Applied Technology, Dalian Minzu University, 116600, Dalian, China
- Jinmiao Song
- SEAC Key Laboratory of Big Data Applied Technology, Dalian Minzu University, 116600, Dalian, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, 100044, Beijing, China

18
Wang X, Zheng Y, Jensen MK, He Z, Cai T. Biomarker evaluation under imperfect nested case-control design. Stat Med 2021; 40:4035-4052. [PMID: 33915597 PMCID: PMC8286316 DOI: 10.1002/sim.9012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2019] [Revised: 04/06/2021] [Accepted: 04/12/2021] [Indexed: 12/24/2022]
Abstract
The nested case-control (NCC) design has been widely adopted as a cost-effective sampling design for biomarker research. Under the NCC design, markers are only measured for the NCC subcohort consisting of all cases and a fraction of the controls selected randomly from the matched risk sets of the cases. Robust methods for evaluating the prediction performance of risk models have been derived under the inverse probability weighting framework. The probabilities of samples being included in the NCC cohort can be calculated based on the study design or estimated non-parametrically, as proposed in previous work. Neither strategy works well, due to model mis-specification and the curse of dimensionality, in practical settings where the sampling does not entirely follow the study design or depends on many factors. In this paper, we propose an alternative strategy to estimate the sampling probabilities based on a varying coefficient model, which attains a balance between robustness and the curse of dimensionality. The complex correlation structure induced by repeated finite risk set sampling makes the standard resampling procedure for variance estimation fail. We propose a perturbation resampling procedure that provides valid interval estimation for the proposed estimators. Simulation studies show that the proposed method performs well in finite samples. We apply the proposed method to the Nurses' Health Study II to develop and evaluate prediction models using clinical biomarkers for cardiovascular risk.
Affiliation(s)
- Xuan Wang
- Department of Biostatistics, Harvard University, Boston, MA, USA
- Yingye Zheng
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Zeling He
- Department of Biostatistics, Harvard University, Boston, MA, USA
- Tianxi Cai
- Department of Biostatistics, Harvard University, Boston, MA, USA; Department of Biomedical Informatics, Harvard University, Boston, MA, USA

19
AlQabbany AO, Azmi AM. Measuring the Effectiveness of Adaptive Random Forest for Handling Concept Drift in Big Data Streams. Entropy (Basel) 2021; 23:859. [PMID: 34356400 DOI: 10.3390/e23070859] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/06/2021] [Revised: 06/23/2021] [Accepted: 06/30/2021] [Indexed: 01/14/2023]
Abstract
We are living in the age of big data, the majority of which is stream data. Real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data's underlying distribution, is a significant issue, especially when learning from data streams; it requires learners to adapt to dynamic changes. Random forest is an ensemble approach widely used in classical, non-streaming machine learning applications, while the Adaptive Random Forest (ARF) is a stream learning algorithm that has shown promising results in terms of accuracy and the ability to deal with various types of drift. The continuity of the incoming instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase the efficiency of such streaming algorithms by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each with a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value of ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed enhancement yielded considerable improvement in most situations.
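Online bagging is what the λ tuning acts on: each ensemble member trains on each incoming instance a Poisson-distributed number of times, emulating a bootstrap on a stream. A minimal stdlib sketch of drawing those weights (not the ARF implementation):

```python
import math, random

random.seed(42)

def poisson_weight(lam):
    """Sample Poisson(lam) by Knuth's product-of-uniforms method.
    In online bagging, the draw says how many times an ensemble member
    trains on the incoming instance (0 = the instance is skipped)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

# One weight per instance of a 10,000-instance stream, for one member.
weights = [poisson_weight(1.0) for _ in range(10_000)]
mean_weight = sum(weights) / len(weights)                 # close to lam
skip_rate = sum(w == 0 for w in weights) / len(weights)   # close to e**-lam
```

Raising λ above 1, as explored in the paper, increases the average training weight per instance at the cost of extra computation, which is exactly the accuracy-versus-execution-time trade-off that ρ is designed to capture.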
20
Kolari THM, Korpelainen P, Kumpula T, Tahvanainen T. Accelerated vegetation succession but no hydrological change in a boreal fen during 20 years of recent climate change. Ecol Evol 2021; 11:7602-7621. [PMID: 34188838 PMCID: PMC8216969 DOI: 10.1002/ece3.7592] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2020] [Revised: 03/26/2021] [Accepted: 04/01/2021] [Indexed: 11/17/2022] Open
Abstract
Northern mires (fens and bogs) have significant climate feedbacks and contribute to biodiversity, providing habitats for specialized biota. Many studies have found drying and degradation of bogs in response to climate change, while northern fens have received less attention. Rich fens are particularly important to biodiversity, but under global climate change, fen ecosystems may change via the direct response of vegetation or indirectly through hydrological changes. With repeated sampling over the past 20 years, we aim to reveal trends in hydrology and vegetation in a pristine boreal fen with a gradient from rich to poor fen and bog vegetation. We resampled 203 semi-permanent plots and compared water-table depth (WTD), pH, concentrations of mineral elements and dissolved organic carbon (DOC), plant species occurrences, community structure, and vegetation types between 1998 and 2018. In the study area, the annual mean temperature rose by 1.0°C and precipitation by 46 mm over the 20-year periods prior to the sampling occasions. We found that wet fen vegetation decreased, while bog and poor fen vegetation increased significantly. This reflected a trend of increasing abundance of common, generalist hummock species at the expense of fen specialist species. Changes were most pronounced in high-pH plots, where Sphagnum mosses had significantly increased in plot frequency, cover, and species richness. Changes in water chemistry were mainly insignificant in both concentration levels and spatial patterns. Although vegetation gave indications of drier conditions, WTD had not consistently increased; instead, our results revealed complex dynamics of WTD depending on vegetation changes. Overall, we found a significant trend in vegetation, conforming to the common succession pattern from rich to poor fen and bog vegetation. Our results suggest that responses intrinsic to vegetation, such as increased productivity or altered species interactions, may contribute more to the ecosystem response to climate warming than indirect effects via local hydrology.
Affiliation(s)
- Tiina H. M. Kolari
- Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, Finland
- Pasi Korpelainen
- Department of Geographical and Historical Studies, University of Eastern Finland, Joensuu, Finland
- Timo Kumpula
- Department of Geographical and Historical Studies, University of Eastern Finland, Joensuu, Finland
- Teemu Tahvanainen
- Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, Finland

21
Herrmann C, Kluge C, Pilz M, Kieser M, Rauch G. Improving sample size recalculation in adaptive clinical trials by resampling. Pharm Stat 2021; 20:1035-1050. [PMID: 33792167 DOI: 10.1002/pst.2122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2020] [Revised: 12/16/2020] [Accepted: 03/08/2021] [Indexed: 11/11/2022]
Abstract
Sample size calculations in clinical trials need to be based on sound parameter assumptions. Wrong parameter choices may lead to sample sizes that are too small or too large and can have severe ethical and economic consequences. Adaptive group sequential study designs are one solution for dealing with planning uncertainties: the sample size can be updated during an ongoing trial based on the observed interim effect. However, the observed interim effect is a random variable and thus does not necessarily correspond to the true effect. One way of dealing with the uncertainty related to this random variable is to include resampling elements in the recalculation strategy. In this paper, we focus on clinical trials with a normally distributed endpoint. We consider resampling of the observed interim test statistic and apply this principle to several established sample size recalculation approaches. The resulting recalculation rules are smoother than the original ones, and thus the variability in sample size is lower. In particular, we found that some resampling approaches mimic a group sequential design. In general, incorporating resampling of the interim test statistic into existing sample size recalculation rules results in a substantial performance improvement with respect to a recently published conditional performance score.
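The principle can be sketched simply: instead of plugging the single observed interim statistic into a recalculation formula, one averages the formula over resampled copies of that statistic. The formula, the effect floor, the sample size cap, and all numbers below are illustrative assumptions, not the specific rules studied in the paper.

```python
import math, random

random.seed(7)
Z_A, Z_B = 1.96, 0.84  # two-sided alpha = 0.05, target power = 0.80

def n_per_group(delta):
    """Standard per-group sample size for a normal endpoint with unit
    variance; floored effect and capped size keep the toy well-behaved."""
    delta = max(delta, 0.1)
    return min(2 * (Z_A + Z_B) ** 2 / delta ** 2, 500.0)

def recalc_plain(t_interim, n1):
    """Plug the observed interim effect straight into the formula."""
    return n_per_group(t_interim * math.sqrt(2 / n1))

def recalc_resampled(t_interim, n1, B=2000):
    """Resample the interim test statistic around its observed value and
    average the recalculated sizes; this smooths the recalculation rule."""
    draws = [random.gauss(t_interim, 1.0) for _ in range(B)]
    return sum(recalc_plain(t, n1) for t in draws) / B

n_plain = recalc_plain(1.5, n1=50)
n_smooth = recalc_resampled(1.5, n1=50)
```

Averaging over the resampled statistic turns a jumpy function of the interim data into a smooth one, which is why the variability of the recalculated sample size drops.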
Affiliation(s)
- Carolin Herrmann
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117 Berlin, Germany
- Corinna Kluge
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117 Berlin, Germany
- Maximilian Pilz
- Institute of Medical Biometry and Informatics, University Medical Center Ruprechts-Karls University Heidelberg, Heidelberg, Germany
- Meinhard Kieser
- Institute of Medical Biometry and Informatics, University Medical Center Ruprechts-Karls University Heidelberg, Heidelberg, Germany
- Geraldine Rauch
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117 Berlin, Germany

22
Jiang S, Liu B, Wang S. A Dispersion Compensation Method Based on Resampling of Modulated Signal for FMCW Lidar. Sensors (Basel) 2021; 21:s21010249. [PMID: 33401670 PMCID: PMC7795196 DOI: 10.3390/s21010249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/16/2020] [Revised: 12/14/2020] [Accepted: 12/19/2020] [Indexed: 11/22/2022]
Abstract
In order to eliminate the nonlinearity of the laser modulation process, a dual-interferometer system is often adopted in frequency-modulated continuous-wave (FMCW) laser ranging. However, dispersion mismatch between the fiber reference interferometer and the measurement interferometer leads to decreased ranging accuracy and resolution. In this paper, a dispersion compensation method based on resampling with a modulated signal is proposed. Since the beat signal from the end face of the delay fiber is not affected by dispersion mismatch, it can be modulated to generate a signal whose phase is proportional to that of the target spatial signal. The modulated signal is then used as the reference clock to sample the target spatial signal, thereby eliminating the influence of the dispersion mismatch between the two optical interferometers. Simulation is performed to verify the method, and an experiment is carried out on a target at a distance of 2.4 m. Experiments show that the full width at half maximum (FWHM) of the distance spectrum after dispersion compensation is consistent with that of the reflected signal from the end face of the delay fiber, and the standard deviation of repeated measurements reaches 10.12 μm.
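The equal-phase resampling idea (sampling the target signal wherever a nonlinearity-free reference phase crosses uniform increments) can be illustrated numerically: a nonlinear sweep smears the beat spectrum, and resampling against the reference phase collapses it back to a single line. All waveforms and numbers below are invented; this is a sketch of the principle, not the authors' processing chain.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 4000)
# Nonlinear sweep: the reference-interferometer phase is quadratic in time,
# mimicking modulation nonlinearity; numbers are illustrative only.
phase_ref = 2 * np.pi * (80.0 * t + 15.0 * t ** 2)
target = np.cos(3.0 * phase_ref)  # target beat at 3x the reference delay

# Equal-phase resampling: take samples where the reference phase crosses
# uniform increments, which turns the chirped beat into a single tone.
uniform_phase = np.linspace(phase_ref[0], phase_ref[-1], t.size)
t_resampled = np.interp(uniform_phase, phase_ref, t)
resampled = np.interp(t_resampled, t, target)

# After resampling, the spectrum concentrates into (nearly) one line at
# 3x the reference cycle count (3 * 95 = 285 cycles over the record).
spectrum = np.abs(np.fft.rfft(resampled * np.hanning(t.size)))
peak_bin = int(np.argmax(spectrum))
```

Sampling uniformly in reference phase rather than in time is what removes the sweep nonlinearity, and it is the same trick the paper's modulated-signal clock performs in hardware against dispersion mismatch.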
Affiliation(s)
- Shuo Jiang
- Key Laboratory of Science and Technology on Space Optoelectronic Precision Measurement, CAS, Chengdu 610200, China
- Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610200, China
- University of Chinese Academy of Sciences, Beijing 101400, China
- Bo Liu
- Key Laboratory of Science and Technology on Space Optoelectronic Precision Measurement, CAS, Chengdu 610200, China
- Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610200, China
- University of Chinese Academy of Sciences, Beijing 101400, China
- Shengjie Wang
- Key Laboratory of Science and Technology on Space Optoelectronic Precision Measurement, CAS, Chengdu 610200, China
- Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610200, China
- Civil Aviation Flight University of China, Deyang 618300, China

23
Abstract
Factorial survival designs with right-censored observations are commonly inferred by Cox regression and explained by means of hazard ratios. However, in case of non-proportional hazards, their interpretation can become cumbersome; especially for clinicians. We therefore offer an alternative: median survival times are used to estimate treatment and interaction effects and null hypotheses are formulated in contrasts of their population versions. Permutation-based tests and confidence regions are proposed and shown to be asymptotically valid. Their type-1 error control and power behavior are investigated in extensive simulations, showing the new methods' wide applicability. The latter is complemented by an illustrative data analysis.
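The estimand here is the median survival time read off a Kaplan-Meier curve. A stdlib sketch of estimating it, and of a contrast between two hypothetical groups, follows; the permutation inference itself is beyond this illustration, and all data are invented.

```python
def km_survival(times, events):
    """Kaplan-Meier estimator: (time, S(t)) steps for right-censored data
    (events: 1 = event observed, 0 = censored)."""
    pts = sorted(zip(times, events))
    at_risk, s, steps = len(pts), 1.0, []
    i = 0
    while i < len(pts):
        t = pts[i][0]
        d = sum(1 for tt, e in pts if tt == t and e == 1)  # events at t
        c = sum(1 for tt, e in pts if tt == t)             # all leaving at t
        if d:
            s *= 1 - d / at_risk
            steps.append((t, s))
        at_risk -= c
        i += c
    return steps

def km_median(times, events):
    """Smallest time at which the KM curve drops to 0.5 or below."""
    for t, s in km_survival(times, events):
        if s <= 0.5:
            return t
    return float("inf")

# Hypothetical data: group B clearly survives longer than group A.
a = km_median([2, 3, 4, 5, 6, 7, 8, 9], [1, 1, 1, 0, 1, 1, 1, 1])
b = km_median([6, 7, 8, 9, 10, 11, 12, 13], [1, 1, 0, 1, 1, 1, 1, 1])
effect = b - a  # a median-difference contrast between two groups
```

Unlike a hazard ratio, a difference of medians like `effect` keeps a direct time-scale interpretation regardless of whether the hazards are proportional.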
Affiliation(s)
- Marc Ditzhaus
- Faculty of Statistics, TU Dortmund University, Dortmund, Germany
- Dennis Dobler
- Department of Mathematics, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Markus Pauly
- Faculty of Statistics, TU Dortmund University, Dortmund, Germany

24
Cao Y, Yu J. A class of goodness-of-fit test for the additive hazards model with case-cohort data. Pharm Stat 2020; 20:451-461. [PMID: 33305424 DOI: 10.1002/pst.2087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2020] [Revised: 10/13/2020] [Accepted: 11/16/2020] [Indexed: 11/12/2022]
Abstract
The case-cohort design is commonly used in epidemiological studies due to its cost-effectiveness. The additive hazards model is widely used in survival analysis when the hazards difference is constant. In this article, we propose a class of goodness-of-fit test statistics for the assumption of the additive hazards model with case-cohort data through a class of asymptotically mean-zero multiparameter stochastic processes. We also establish the asymptotic theory of the proposed test statistics, and a resampling scheme is adopted to approximate their asymptotic distribution. The performance of the proposed test statistics is evaluated through simulation studies, and a real dataset is analyzed to illustrate the proposed method.
Affiliation(s)
- Yongxiu Cao
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Jichang Yu
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China

25
Abstract
In many experiments and especially in translational and preclinical research, sample sizes are (very) small. In addition, data designs are often high dimensional, i.e. more dependent than independent replications of the trial are observed. The present paper discusses the applicability of max t-test-type statistics (multiple contrast tests) in high-dimensional designs (repeated measures or multivariate) with small sample sizes. A randomization-based approach is developed to approximate the distribution of the maximum statistic. Extensive simulation studies confirm that the new method is particularly suitable for analyzing data sets with small sample sizes. A real data set illustrates the application of the methods.
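The max-t idea, in its simplest form, takes the largest of several contrast-wise t statistics and calibrates it by re-randomization. The sketch below uses pairwise group contrasts and label re-randomization on invented small-sample data; the paper's method additionally handles high-dimensional repeated measures, which this toy does not.

```python
import math, random

def t_stat(x, y):
    """Welch-type t statistic for two independent samples."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

def max_t(groups):
    """Maximum absolute t statistic over all pairwise group contrasts."""
    return max(abs(t_stat(groups[i], groups[j]))
               for i in range(len(groups)) for j in range(i + 1, len(groups)))

def randomization_pvalue(groups, B=2000, seed=3):
    """Approximate the null distribution of the max statistic by
    re-randomizing the pooled observations to the groups."""
    rng = random.Random(seed)
    observed = max_t(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(B):
        rng.shuffle(pooled)
        shuffled, idx = [], 0
        for s in sizes:
            shuffled.append(pooled[idx:idx + s])
            idx += s
        hits += max_t(shuffled) >= observed
    return (hits + 1) / (B + 1)

# Hypothetical small-sample data: the third group is shifted upward.
g = [[1.1, 0.8, 1.3, 0.9, 1.2],
     [1.0, 1.2, 0.7, 1.1, 0.9],
     [2.1, 2.4, 1.9, 2.2, 2.6]]
p = randomization_pvalue(g)
```

Calibrating the maximum jointly, rather than each contrast separately, is what controls the familywise error rate while still letting the rejected contrasts be identified.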
Affiliation(s)
- Frank Konietschke
- Charité-Universitätsmedizin Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, Berlin, Germany; Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, Berlin, Germany
- Karima Schwab
- Institute of Pharmacology, Charité-Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany
- Markus Pauly
- Department of Statistics, TU Dortmund University, Dortmund, Germany

26
Yamashita S, Okuda K, Nakaichi T, Yamamoto H, Yokoyama K. Texture Feature Comparison Between Step-and-Shoot and Continuous-Bed-Motion 18F-FDG PET. J Nucl Med Technol 2020; 49:58-64. [PMID: 33020230 DOI: 10.2967/jnmt.120.246157] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2020] [Accepted: 08/11/2020] [Indexed: 11/16/2022] Open
Abstract
Our objective was to investigate the differences in texture features between step-and-shoot (SS) and continuous-bed-motion (CBM) imaging in phantom and clinical studies. Methods: A National Electrical Manufacturers Association body phantom was filled with 18F-FDG solution at a sphere-to-background ratio of 4:1. SS and CBM were performed using the same acquisition duration, and the data were reconstructed using 3-dimensional ordered-subset expectation maximization with time-of-flight algorithms. Texture features were extracted using the software LIFEx. A volume of interest was delineated on the 22-, 28-, and 37-mm spheres with a threshold of 42% of the maximum SUV. The voxel intensities were discretized using 2 resampling methods, namely a fixed bin size and a fixed bin number discretization. The discrete resampling values were set to 64 and 128. In total, 31 texture features were calculated with gray-level cooccurrence matrix (GLCM), gray-level run length matrix, neighborhood gray-level different matrix, and gray-level zone length matrix. The texture features of the SS and CBM images were compared for all settings using the paired t test and the coefficient of variation. In a clinical study, 27 lesions from 20 patients were examined using the same acquisition and image processing as were used during the phantom study. The percentage difference (%Diff) and correlation between the texture features from SS and CBM images were calculated to evaluate agreement between the 2 scanning techniques. Results: In the phantom study, the 11 features exhibited no significant difference between SS and CBM images, and the coefficient of variation was no more than 10%, depending on resampling conditions, whereas entropy and dissimilarity from GLCM fulfilled the criteria for all settings. In the clinical study, the entropy and dissimilarity from GLCM exhibited a low %Diff and excellent correlation in all resampling conditions. 
The %Diff of entropy was lower than that of dissimilarity. Conclusion: Differences between the texture features of SS and CBM images varied depending on the type of feature. Because GLCM entropy exhibits minimal differences between SS and CBM images irrespective of resampling conditions, it may be the optimal feature for reducing the differences between the 2 scanning techniques.
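As a rough illustration of the GLCM entropy feature compared above, the sketch below discretizes a toy image with a fixed bin number and computes the Shannon entropy of the horizontal co-occurrence counts; the image, bin count, and offset are illustrative assumptions, not values from the study (which used LIFEx).

```python
import math

def glcm_entropy(image, n_bins=8, offset=(0, 1)):
    """Entropy of a gray-level co-occurrence matrix (GLCM).

    Voxel intensities are discretized with a fixed bin number, then
    co-occurrences of bin pairs at the given offset are counted and the
    Shannon entropy of the normalized counts is returned.
    """
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    width = (hi - lo) / n_bins or 1.0   # guard against a constant image
    binned = [[min(int((v - lo) / width), n_bins - 1) for v in row]
              for row in image]
    dr, dc = offset
    rows, cols = len(binned), len(binned[0])
    counts = {}
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                pair = (binned[r][c], binned[r2][c2])
                counts[pair] = counts.get(pair, 0) + 1
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

toy = [[1.0, 2.0, 3.0, 4.0],
       [2.0, 3.0, 4.0, 5.0],
       [3.0, 4.0, 5.0, 6.0]]
h = glcm_entropy(toy, n_bins=4)
```

A uniform image yields zero entropy, while structured images yield positive values, which is why entropy tracks textural complexity.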
Affiliation(s)
- Shozo Yamashita
- Division of Radiology, Public Central Hospital of Matto Ishikawa, Ishikawa, Japan
- Koichi Okuda
- Department of Physics, Kanazawa Medical University, Kahoku, Japan
- Tetsu Nakaichi
- Division of Radiology, Public Central Hospital of Matto Ishikawa, Ishikawa, Japan
- Haruki Yamamoto
- Division of Radiology, Public Central Hospital of Matto Ishikawa, Ishikawa, Japan
- Kunihiko Yokoyama
- PET Imaging Center, Public Central Hospital of Matto Ishikawa, Ishikawa, Japan

27
Banerjee A, Choi H, DeVogel N, Xu Y, Yoganandan N. Uncertainty Evaluations for Risk Assessment in Impact Injuries and Implications for Clinical Practice. Front Bioeng Biotechnol 2020; 8:877. [PMID: 32850734 PMCID: PMC7426360 DOI: 10.3389/fbioe.2020.00877] [Received: 10/01/2019] [Accepted: 07/08/2020] [Indexed: 11/25/2022]
Abstract
Injury risk curves (IRCs) quantify the risk of an adverse outcome, such as a bone fracture, as a function of a biomechanical metric such as force or deflection. From a biomechanical perspective, they are crucial in crashworthiness studies to advance human safety. In clinical settings, they can be used as an assistive tool to aid in the decision-making process for surgical or conservative treatment. The estimation of risk corresponding to a level of the biomechanical metric is done using a regression technique, such as a parametric survival regression model. As with any statistical procedure, error measures are computed for the IRC, representing the quality of the estimated risk. For example, confidence intervals (CIs) are recommended by the International Standards Organization, and the normalized confidence interval width (NCIW), computed from the width of the CI, serves as a surrogate for the quality of the risk curve. A 95% CI means that if the same experiment were hypothetically repeated 100 times, approximately 95 of the computed CIs should contain the true risk curve. Such an interpretation is problematic in most biomechanical contexts, as the same experiment is rarely repeated. The notion that a wider confidence interval implies a poorer-quality risk curve can be misleading. This article considers the evaluation of CIs and their implications in biomechanical settings for safety engineering and clinical practice. Alternatives are suggested for future studies.
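To make the CI-width surrogate concrete, here is a minimal sketch under assumed simplifications: a logistic risk curve stands in for the parametric survival model, pointwise percentile-bootstrap CIs are computed on a stimulus grid, and the mean CI width (already on the 0-1 risk scale) is taken as a normalized width. The data, the model, and this reading of the NCIW are illustrative assumptions, not the ISO definition.

```python
import math
import random

random.seed(0)

def risk(b0, b1, x):
    # Logistic risk curve: probability of injury at stimulus level x.
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def fit(xs, ys, steps=1500, lr=0.1):
    # Plain gradient ascent on the Bernoulli log-likelihood.
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        resid = [y - risk(b0, b1, x) for x, y in zip(xs, ys)]
        b0 += lr * sum(resid) / n
        b1 += lr * sum(r * x for r, x in zip(resid, xs)) / n
    return b0, b1

# Synthetic impact tests: injury more likely at higher force.
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [1 if random.random() < risk(-5.0, 1.0, x) else 0 for x in xs]
b0, b1 = fit(xs, ys)

# Pointwise percentile-bootstrap CIs for the risk curve on a grid.
grid = [0.5 * i for i in range(21)]
boot = []
for _ in range(25):
    idx = [random.randrange(len(xs)) for _ in xs]
    bb0, bb1 = fit([xs[i] for i in idx], [ys[i] for i in idx], steps=300)
    boot.append([risk(bb0, bb1, g) for g in grid])

widths = []
for j in range(len(grid)):
    col = sorted(b[j] for b in boot)
    widths.append(col[int(0.975 * len(col)) - 1] - col[int(0.025 * len(col))])
nciw = sum(widths) / len(widths)  # risk scale is already 0-1
```

The point of the sketch is that a single scalar like `nciw` compresses the whole band into one number, which is exactly the kind of summary the article argues can mislead.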
Affiliation(s)
- Anjishnu Banerjee
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, United States
- Hoon Choi
- Center for NeuroTrauma Research, Department of Neurosurgery, Medical College of Wisconsin, Milwaukee, WI, United States
- Nicholas DeVogel
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, United States
- Yayun Xu
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, United States
- Narayan Yoganandan
- Center for NeuroTrauma Research, Department of Neurosurgery, Medical College of Wisconsin, Milwaukee, WI, United States

28
Abstract
We propose parameter optimization techniques for weighted ensemble sampling of Markov chains in the steady-state regime. Weighted ensemble consists of replicas of a Markov chain, each carrying a weight, that are periodically resampled according to their weights inside of each of a number of bins that partition state space. We derive, from first principles, strategies for optimizing the choices of weighted ensemble parameters, in particular the choice of bins and the number of replicas to maintain in each bin. In a simple numerical example, we compare our new strategies with more traditional ones and with direct Monte Carlo.
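A toy sketch of the splitting/merging step whose parameters (the bins and the per-bin replica counts) the abstract optimizes; the states, weights, bin boundary, and fixed target of three replicas per bin are all illustrative assumptions.

```python
import random

random.seed(1)

def resample_bin(replicas, target):
    """Resample (state, weight) replicas within one bin to `target` copies.

    Survivors are drawn proportionally to weight and each inherits weight
    W/target, where W is the bin's total weight, so the resampling step
    leaves the weighted ensemble unbiased.
    """
    total = sum(w for _, w in replicas)
    states = [s for s, _ in replicas]
    weights = [w for _, w in replicas]
    chosen = random.choices(states, weights=weights, k=target)
    return [(s, total / target) for s in chosen]

# Two bins partitioning a 1-D state space at x = 0.5.
ensemble = [(0.1, 0.4), (0.3, 0.1), (0.7, 0.3), (0.9, 0.2)]
bins = [[(s, w) for s, w in ensemble if s < 0.5],
        [(s, w) for s, w in ensemble if s >= 0.5]]
new_ensemble = []
for reps in bins:
    new_ensemble.extend(resample_bin(reps, target=3))
```

After the step, every bin holds exactly `target` replicas and the total weight is conserved; in a full simulation each replica would now be propagated by the Markov chain before the next resampling.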
29
Abstract
Publication bias frequently appears in meta-analyses when the included studies' results (e.g., p-values) influence the studies' publication processes. Some unfavorable studies may be suppressed from publication, so the meta-analytic results may be biased toward an artificially favorable direction. Many statistical tests have been proposed over the past two decades to detect publication bias. However, they often make dramatically different assumptions about the cause of publication bias; therefore, they are usually powerful only in certain cases that support their particular assumptions, while their powers may be fairly low in many other cases. Although several simulation studies have been carried out to compare different tests' powers under various situations, it is typically infeasible to justify the exact mechanism of publication bias in a real-world meta-analysis and thus select the corresponding optimal publication bias test. We introduce a hybrid test for publication bias by synthesizing various tests and incorporating their benefits, so that it maintains relatively high powers across various mechanisms of publication bias. The superior performance of the proposed hybrid test is illustrated using simulation studies and three real-world meta-analyses with different effect sizes. It is compared with many existing methods, including the commonly used regression and rank tests, and the trim-and-fill method.
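For intuition, the sketch below implements two classical detectors (Egger's regression and the Begg-Mazumdar rank correlation, both with normal approximations) and combines them with a crude Bonferroni minimum; the actual hybrid test in the abstract instead calibrates the minimum p-value against the tests' joint null distribution, so this combination rule is only an illustrative stand-in, and the meta-analysis data are invented.

```python
import math

def norm_sf(z):
    # P(Z > z) for a standard normal.
    return 0.5 * math.erfc(z / math.sqrt(2))

def egger_p(effects, ses):
    # Egger's regression: standardized effect vs. precision; a nonzero
    # intercept suggests small-study (publication) bias.
    y = [e / s for e, s in zip(effects, ses)]
    x = [1.0 / s for s in ses]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)
    se_int = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    return 2 * norm_sf(abs(intercept) / se_int)  # normal approx. to the t test

def begg_p(effects, ses):
    # Begg-Mazumdar rank correlation between effects and their variances.
    pairs = list(zip(effects, [s ** 2 for s in ses]))
    n = len(pairs)
    conc = disc = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            conc += s > 0
            disc += s < 0
    tau = (conc - disc) / (n * (n - 1) / 2)
    z = abs(tau) / math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
    return 2 * norm_sf(z)

def hybrid_p(effects, ses):
    # Crude synthesis: Bonferroni-adjusted minimum of the two p-values.
    return min(1.0, 2 * min(egger_p(effects, ses), begg_p(effects, ses)))

effects = [0.9, 0.65, 0.55, 0.45, 0.28, 0.18]  # larger effects, smaller studies
ses = [0.40, 0.35, 0.30, 0.20, 0.15, 0.10]
p = hybrid_p(effects, ses)
```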
Affiliation(s)
- Lifeng Lin
- Department of Statistics, Florida State University, Tallahassee, FL, USA

30
D’Arco M, Napoli E, Zacharelos E. Digital Circuit for Seamless Resampling ADC Output Streams. Sensors (Basel) 2020; 20:s20061619. [PMID: 32183269 PMCID: PMC7146112 DOI: 10.3390/s20061619] [Received: 02/06/2020] [Revised: 03/06/2020] [Accepted: 03/11/2020] [Indexed: 11/16/2022]
Abstract
Fine resolution selection of the sample rate is not available in digital storage oscilloscopes (DSOs), so the user has to rely on offline processing to meet this need. The paper first discusses methods based on digital signal processing that allow changing the sampling rate by means of digital resampling approaches. Then, it proposes a digital circuit that, if included in the acquisition channel of a digital storage oscilloscope, between the internal analog-to-digital converter (ADC) and the acquisition memory, allows the user to select any sampling rate lower than the maximum one with fine resolution. The circuit relies both on the use of a short digital filter with dynamically generated coefficients and on a suitable memory management strategy. The output samples produced by the digital circuit are characterized by a sampling rate that can be incoherent with the clock frequency regulating the memory access. Both a field programmable gate array (FPGA) implementation and an application specific integrated circuit (ASIC) design of the proposed circuit are evaluated.
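The idea of a short filter with dynamically generated coefficients can be sketched in its simplest 2-tap (linear-interpolation) form; the function and rate ratios below are illustrative, not the paper's FPGA/ASIC design.

```python
def resample(samples, ratio):
    """Resample a stream to `ratio` times the original rate.

    Uses a 2-tap filter whose coefficients (1 - frac, frac) are generated
    on the fly from the fractional position of each output sample, as a
    minimal stand-in for a longer dynamic-coefficient filter.
    """
    out = []
    t = 0.0
    step = 1.0 / ratio
    while t <= len(samples) - 1:
        i = int(t)
        frac = t - i
        if i + 1 < len(samples):
            out.append((1.0 - frac) * samples[i] + frac * samples[i + 1])
        else:
            out.append(samples[i])   # last sample: no right neighbor
        t += step
    return out

ramp = [0.0, 1.0, 2.0, 3.0]
down = resample(ramp, 0.5)   # half the original rate
up = resample(ramp, 2.0)     # twice the original rate
```

Because the output step `1/ratio` need not divide the input spacing, the output rate can be chosen with fine resolution, independent of (incoherent with) the input clock.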
31
Sies A, Van Mechelen I. Estimating the quality of optimal treatment regimes. Stat Med 2019; 38:4925-4938. [PMID: 31424128 DOI: 10.1002/sim.8342] [Received: 12/22/2017] [Revised: 07/13/2019] [Accepted: 07/18/2019] [Indexed: 11/08/2022]
Abstract
When multiple treatment alternatives are available for a disease, an obvious question is which alternative is most effective for which patient. One may address this question by searching for optimal treatment regimes that specify for each individual the preferable treatment alternative based on that individual's baseline characteristics. When such a regime has been estimated, its quality (in terms of the expected outcome if it were used for treatment assignment of all patients in the population under study) is of obvious interest. Obtaining a good and reliable estimate of this quantity is a key challenge for which so far no satisfactory solution is available. In this paper, we consider for this purpose several estimators of the expected outcome in conjunction with several resampling methods. The latter have been evaluated before within the context of statistical learning to estimate the prediction error of estimated prediction rules. Yet, the results of these evaluations were equivocal, with different best-performing methods in different studies, and with near-zero and even negative correlations between true and estimated prediction errors. Moreover, for different reasons, it is not straightforward to extrapolate the findings of these studies to the context of optimal treatment regimes. To address these issues, we set up a new and comprehensive simulation study. In this study, combinations of different estimators with .632+ and out-of-bag bootstrap resampling methods performed best. In addition, the study shed a surprising new light on the previously reported problematic correlations between true and estimated prediction errors in the area of statistical learning.
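A minimal sketch of the out-of-bag idea in this setting, under assumed simplifications: a randomized binary treatment, a toy threshold regime class, and an inverse-probability-weighted value estimator. All of these are illustrative choices; the paper compares several estimators and resampling schemes, including the .632+ bootstrap.

```python
import random

random.seed(2)

def ipw_value(data, regime, p_arm=0.5):
    # IPW estimate of the expected outcome if everyone followed `regime`
    # (valid here because treatment was randomized with probability 0.5).
    return sum(y / p_arm for x, a, y in data if a == regime(x)) / len(data)

def fit_regime(data):
    # Toy regime class "treat iff x > c": pick c maximizing the IPW value.
    cands = sorted({x for x, _, _ in data})
    best = max(cands, key=lambda c: ipw_value(data, lambda x: int(x > c)))
    return lambda x: int(x > best)

# Synthetic trial: treatment helps iff the baseline covariate x is positive.
n = 200
data = []
for _ in range(n):
    x = random.uniform(-1, 1)
    a = random.randrange(2)
    y = (x if a == 1 else -x) + random.gauss(0, 0.2)
    data.append((x, a, y))

# Out-of-bag estimate of the estimated regime's quality: fit on each
# bootstrap sample, evaluate only on the observations left out of it.
oob_vals = []
for _ in range(20):
    idx = [random.randrange(n) for _ in range(n)]
    oob = [data[i] for i in set(range(n)) - set(idx)]
    regime = fit_regime([data[i] for i in idx])
    oob_vals.append(ipw_value(oob, regime))
oob_value = sum(oob_vals) / len(oob_vals)
```

Evaluating on held-out observations avoids the optimism of reusing the same data that selected the regime, which is the core problem the abstract describes.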
Affiliation(s)
- Aniek Sies
- Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
- Iven Van Mechelen
- Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium

32
Gienger CM, Dochtermann NA, Tracy CR. Detecting trends in body size: empirical and statistical requirements for intraspecific analyses. Curr Zool 2019; 65:493-497. [PMID: 31616479 PMCID: PMC6784499 DOI: 10.1093/cz/zoy079] [Received: 08/02/2018] [Accepted: 10/09/2018] [Indexed: 11/14/2022]
Abstract
Attributing biological explanations to observed ecogeographical and ecological patterns requires eliminating potential statistical and sampling artifacts as alternative explanations of the observed patterns. Here, we assess the role of sample size, statistical power, and geographic inclusivity on the general validity and statistical significance of relationships between body size and latitude for 3 well-studied species of turtles. We extend those analyses to emphasize the importance of using statistically robust data in determining macroecological patterns. We examined intraspecific trends in body size with latitude in Chelydra serpentina, Chrysemys picta, and Trachemys scripta using Pearson’s correlations, diagnostic tests for influential points, and resampling. Existing data were insufficient to ascertain a latitudinal trend in body size for C. serpentina or T. scripta. There was a significant relationship for C. picta; however, resampling analyses showed that, on average, 16 of the 23 available independent populations were needed to demonstrate a significant relationship and that at least 20 of 23 populations were required to obtain a statistically powerful correlation between body size and latitude. Furthermore, restricting the latitudes of populations resampled showed that body size trends of C. picta were largely due to leveraging effects of populations at the edge of the species range. Our results suggest that broad inferences regarding ecological trends in body size should be made with caution until underlying (intraspecific) patterns in body size can be statistically and conclusively demonstrated.
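The population-resampling logic can be sketched as follows, with synthetic latitude/body-size data and a normal approximation to the correlation test standing in for the authors' analyses; all numbers are illustrative, not the turtle data.

```python
import math
import random

random.seed(3)

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def significant(xs, ys, z_crit=1.96):
    # Normal approximation to the t test for a correlation coefficient.
    r = pearson_r(xs, ys)
    t = r * math.sqrt((len(xs) - 2) / (1 - r ** 2))
    return abs(t) > z_crit

# Synthetic latitude/body-size data for 23 populations with a real trend.
lat = [30 + i for i in range(23)]
size = [100 + 1.0 * l + random.gauss(0, 6) for l in lat]

def detection_rate(k, reps=200):
    # Fraction of random k-population subsets that detect the trend.
    hits = 0
    for _ in range(reps):
        idx = random.sample(range(23), k)
        hits += significant([lat[i] for i in idx], [size[i] for i in idx])
    return hits / reps

rate_8 = detection_rate(8)
rate_20 = detection_rate(20)
```

Even with a genuine trend built in, small subsets of populations frequently fail to detect it, which mirrors the paper's point about minimum sampling requirements.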
Affiliation(s)
- C M Gienger
- Department of Biology, Center of Excellence for Field Biology, Austin Peay State University, Clarksville, TN, USA
- Address correspondence to C. M. Gienger. E-mail:
- Ned A Dochtermann
- Department of Biological Sciences, North Dakota State University, Fargo, ND, USA
- C Richard Tracy
- Department of Biology, Program in Ecology, Evolution, and Conservation Biology, University of Nevada, Reno, NV, USA

33
Klau S, Martin-Magniette ML, Boulesteix AL, Hoffmann S. Sampling uncertainty versus method uncertainty: A general framework with applications to omics biomarker selection. Biom J 2019; 62:670-687. [PMID: 31099917 DOI: 10.1002/bimj.201800309] [Received: 09/28/2018] [Revised: 04/12/2019] [Accepted: 04/23/2019] [Indexed: 12/19/2022]
Abstract
Uncertainty is a crucial issue in statistics which can be considered from different points of view. One type of uncertainty, typically referred to as sampling uncertainty, arises through the variability of results obtained when the same analysis strategy is applied to different samples. Another type of uncertainty arises through the variability of results obtained when using the same sample but different analysis strategies addressing the same research question. We denote this latter type of uncertainty as method uncertainty. It results from all the choices to be made for an analysis, for example, decisions related to data preparation, method choice, or model selection. In medical sciences, a large part of omics research is focused on the identification of molecular biomarkers, which can either be performed through ranking or by selection from among a large number of candidates. In this paper, we introduce a general resampling-based framework to quantify and compare sampling and method uncertainty. For illustration, we apply this framework to different scenarios related to the selection and ranking of omics biomarkers in the context of acute myeloid leukemia: variable selection in multivariable regression using different types of omics markers, the ranking of biomarkers according to their predictive performance, and the identification of differentially expressed genes from RNA-seq data. For all three scenarios, our findings suggest highly unstable results when the same analysis strategy is applied to two independent samples, indicating high sampling uncertainty and a comparatively smaller, but non-negligible method uncertainty, which strongly depends on the methods being compared.
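The resampling framework can be caricatured in a few lines: two toy selection strategies are applied across bootstrap samples, and set agreement (Jaccard) is computed across samples within one strategy (sampling uncertainty) versus across strategies within one sample (method uncertainty). The data, the two strategies, and the agreement measure are illustrative assumptions, not the paper's scenarios.

```python
import math
import random

random.seed(4)

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def top_k(scores, k=3):
    return set(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])

def select_corr(X, y, k=3):
    # Strategy A: markers with the largest absolute correlation with y.
    return top_k([abs(corr([r[j] for r in X], y)) for j in range(len(X[0]))], k)

def select_diff(X, y, k=3):
    # Strategy B: markers with the largest absolute mean group difference.
    scores = []
    for j in range(len(X[0])):
        g1 = [r[j] for r, yi in zip(X, y) if yi == 1]
        g0 = [r[j] for r, yi in zip(X, y) if yi == 0]
        scores.append(abs(sum(g1) / len(g1) - sum(g0) / len(g0)))
    return top_k(scores, k)

def jaccard(a, b):
    return len(a & b) / len(a | b)

# Synthetic omics data: 10 markers, only the first 3 carry signal.
n, p = 80, 10
y = [i % 2 for i in range(n)]
X = [[(1.5 * yi if j < 3 else 0.0) + random.gauss(0, 1) for j in range(p)]
     for yi in y]

sel_a, sel_b = [], []
for _ in range(30):
    idx = [random.randrange(n) for _ in range(n)]
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    sel_a.append(select_corr(Xb, yb))
    sel_b.append(select_diff(Xb, yb))

# Sampling uncertainty: one strategy, different resamples.
sampling = sum(jaccard(sel_a[i], sel_a[i + 1]) for i in range(29)) / 29
# Method uncertainty: two strategies, the same resample.
method = sum(jaccard(a, b) for a, b in zip(sel_a, sel_b)) / 30
```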
Affiliation(s)
- Simon Klau
- Institute for Medical Information Processing, Biometry and Epidemiology (IBE), Munich, Germany
- Marie-Laure Martin-Magniette
- Institute of Plant Sciences Paris Saclay IPS2, CNRS, INRA, Université Paris-Sud, Université Evry, Université Paris-Saclay, Orsay, France; Institute of Plant Sciences Paris-Saclay IPS2, Paris Diderot, Sorbonne Paris-Cité, Orsay, France; UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, Paris, France
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology (IBE), Munich, Germany
- Sabine Hoffmann
- Institute for Medical Information Processing, Biometry and Epidemiology (IBE), Munich, Germany

34
Parast L, Cai T, Tian L. Using a surrogate marker for early testing of a treatment effect. Biometrics 2019; 75:1253-1263. [PMID: 31009073 DOI: 10.1111/biom.13067] [Received: 08/11/2017] [Accepted: 03/25/2019] [Indexed: 02/01/2023]
Abstract
The development of methods to identify, validate, and use surrogate markers to test for a treatment effect has been an area of intense research interest given the potential for valid surrogate markers to reduce the required costs and follow-up times of future studies. Several quantities and procedures have been proposed to assess the utility of a surrogate marker. However, few methods have been proposed to address how one might use the surrogate marker information to test for a treatment effect at an earlier time point, especially in settings where the primary outcome and the surrogate marker are subject to censoring. In this paper, we propose a novel test statistic to test for a treatment effect using surrogate marker information measured prior to the end of the study in a time-to-event outcome setting. We propose a robust nonparametric estimation procedure and develop corresponding inference procedures. In addition, we evaluate the power for the design of a future study based on surrogate marker information. We illustrate the proposed procedure and relative power of the proposed test compared to a test performed at the end of the study using simulation studies and an application to data from the Diabetes Prevention Program.
Affiliation(s)
- Layla Parast
- Statistics Group, RAND Corporation, Santa Monica, California
- Tianxi Cai
- Department of Biostatistics, Harvard University, Boston, Massachusetts
- Lu Tian
- Department of Biomedical Data Science, Stanford University, Stanford, California

35
Wen W, Kajínek O, Khatibi S, Chadzitaskos G. A Common Assessment Space for Different Sensor Structures. Sensors (Basel) 2019; 19:s19030568. [PMID: 30700053 PMCID: PMC6387182 DOI: 10.3390/s19030568] [Received: 11/21/2018] [Revised: 01/21/2019] [Accepted: 01/22/2019] [Indexed: 12/04/2022]
Abstract
The study of the evolution process of our visual system indicates the existence of variational spatial arrangement: from densely hexagonal in the fovea to a sparse circular structure in the peripheral retina. Today’s sensor spatial arrangements are inspired by our visual system. However, we have not come further than rigid rectangular and, on a minor scale, hexagonal sensor arrangements. Even in this situation, there is a need for directly assessing differences between the rectangular and hexagonal sensor arrangements, i.e., without the conversion of one arrangement to another. In this paper, we propose a method to create a common space for addressing any spatial arrangements and assessing the differences among them, e.g., between the rectangular and hexagonal. Such a space is created by implementing a continuous extension of the discrete Weyl Group orbit function transform, which extends a discrete arrangement to a continuous one. The implementation of the space is demonstrated by comparing two types of hexagonal images generated from each rectangular image with two different methods, the half-pixel shifting method and the virtual hexagonal method. In the experiment, a group of ten texture images was generated with varying curviness content using ten different Perlin noise patterns, each added to an initial 2D Gaussian distribution pattern image. Then, the common space was obtained from each of the discrete images to assess the differences between the original rectangular image and its corresponding hexagonal image. The results show that the space provides a user-friendly tool to address an arrangement and assess the changes between different spatial arrangements, by which, in the experiment, the hexagonal images show richer intensity variation, nonlinear behavior, and a larger dynamic range in comparison to the rectangular images.
Affiliation(s)
- Wei Wen
- Department of Technology and Aesthetics, Blekinge Institute of Technology, 37179 Karlskrona, Sweden
- Ondřej Kajínek
- Department of Physics, Czech Technical University, 11519 Prague 1, Czech Republic
- Siamak Khatibi
- Department of Technology and Aesthetics, Blekinge Institute of Technology, 37179 Karlskrona, Sweden
- Goce Chadzitaskos
- Department of Physics, Czech Technical University, 11519 Prague 1, Czech Republic

36
Zimmermann G, Pauly M, Bathke AC. Small-sample performance and underlying assumptions of a bootstrap-based inference method for a general analysis of covariance model with possibly heteroskedastic and nonnormal errors. Stat Methods Med Res 2019; 28:3808-3821. [PMID: 30600769 DOI: 10.1177/0962280218817796] [Indexed: 11/16/2022]
Abstract
It is well known that the standard F test is severely affected by heteroskedasticity in unbalanced analysis of covariance models. Currently available potential remedies for such a scenario are based on heteroskedasticity-consistent covariance matrix estimation (HCCME). However, the HCCME approach tends to be liberal in small samples. Therefore, in the present paper, we propose a combination of HCCME and a wild bootstrap technique, with the aim of improving the small-sample performance. We precisely state a set of assumptions for the general analysis of covariance model and discuss their practical interpretation in detail, since this issue may have been somewhat neglected in applied research so far. We prove that these assumptions are sufficient to ensure the asymptotic validity of the combined HCCME-wild bootstrap analysis of covariance. The results of our simulation study indicate that our proposed test remedies the problems of the analysis of covariance F test and its heteroskedasticity-consistent alternatives in small to moderate sample size scenarios. Our test only requires very mild conditions, thus being applicable in a broad range of real-life settings, as illustrated by the detailed discussion of a dataset from preclinical research on spinal cord injury. Our proposed method is ready-to-use and allows for valid hypothesis testing in frequently encountered settings (e.g., comparing group means while adjusting for baseline measurements in a randomized controlled clinical trial).
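A stripped-down sketch of a wild bootstrap test for the group effect in a heteroskedastic ANCOVA model; for brevity it resamples an unstudentized coefficient under the restricted (null) fit rather than the HCCME-studentized statistic the paper combines it with, and all data are simulated.

```python
import random

random.seed(5)

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting (tiny systems only).
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(XtX, Xty)

# ANCOVA data: intercept, group indicator g, baseline covariate z;
# errors are heteroskedastic (larger variance in group 1), group effect is 0.
n = 60
g = [i % 2 for i in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 0.5 * zi + random.gauss(0, 1.0 + gi) for gi, zi in zip(g, z)]
X = [[1.0, gi, zi] for gi, zi in zip(g, z)]

beta = ols(X, y)                            # full fit; beta[1] = group effect
b_null = ols([[1.0, zi] for zi in z], y)    # restricted fit under H0
fit0 = [b_null[0] + b_null[1] * zi for zi in z]
r0 = [yi - fi for yi, fi in zip(y, fit0)]

# Wild bootstrap: flip restricted residuals with Rademacher signs, refit.
obs, B, extreme = abs(beta[1]), 200, 0
for _ in range(B):
    v = [random.choice((-1.0, 1.0)) for _ in range(n)]
    ystar = [fi + ri * vi for fi, ri, vi in zip(fit0, r0, v)]
    extreme += abs(ols(X, ystar)[1]) >= obs
p_value = (extreme + 1) / (B + 1)
```

Multiplying each residual by its own random sign preserves the per-observation error variance, which is what lets the wild bootstrap cope with heteroskedasticity.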
Affiliation(s)
- Georg Zimmermann
- Department of Mathematics, Paris Lodron University, Salzburg, Austria; Spinal Cord Injury and Tissue Regeneration Centre Salzburg, Paracelsus Medical University, Salzburg, Austria; Department of Neurology, Christian Doppler Medical Centre and Centre for Cognitive Neuroscience, Salzburg, Austria
- Markus Pauly
- Institute of Statistics, University of Ulm, Ulm, Germany
- Arne C Bathke
- Department of Mathematics, Paris Lodron University, Salzburg, Austria; Department of Statistics, University of Kentucky, Lexington, KY, USA

37
Konietschke F, Friede T, Pauly M. Semi-parametric analysis of overdispersed count and metric data with varying follow-up times: Asymptotic theory and small sample approximations. Biom J 2018; 61:616-629. [PMID: 30515878 PMCID: PMC6587510 DOI: 10.1002/bimj.201800027] [Received: 01/30/2018] [Revised: 11/09/2018] [Accepted: 11/09/2018] [Indexed: 11/09/2022]
Abstract
Count data are common endpoints in clinical trials, for example magnetic resonance imaging lesion counts in multiple sclerosis. They often exhibit high levels of overdispersion, that is, variances are larger than the means. Inference is regularly based on negative binomial regression along with maximum-likelihood estimators. Although this approach can account for heterogeneity, it postulates a common overdispersion parameter across groups. Such parametric assumptions are usually difficult to verify, especially in small trials. Therefore, novel procedures that are based on asymptotic results for newly developed rate and variance estimators are proposed in a general framework. Moreover, in case of small samples the procedures are carried out using permutation techniques. Here, the usual assumption of exchangeability under the null hypothesis is not met due to varying follow-up times and unequal overdispersion parameters. This problem is solved by the use of studentized permutations leading to valid inference methods for situations with (i) varying follow-up times, (ii) different overdispersion parameters, and (iii) small sample sizes.
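The studentized-permutation idea can be sketched for a rate difference with overdispersed counts and unequal follow-up times; the variance formula and data-generating choices below are crude illustrative stand-ins for the estimators developed in the paper.

```python
import math
import random

random.seed(6)

def poisson(lam):
    # Knuth's method (adequate for the small rates used here).
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def rate_stat(g1, g2):
    # Studentized (Wald-type) statistic for a rate difference.
    def rate_var(g):
        r = sum(c for c, t in g) / sum(t for c, t in g)
        v = sum((c - r * t) ** 2 for c, t in g) / sum(t for c, t in g) ** 2
        return r, v
    r1, v1 = rate_var(g1)
    r2, v2 = rate_var(g2)
    denom = math.sqrt(v1 + v2)
    return (r1 - r2) / denom if denom else 0.0

def perm_test(g1, g2, B=500):
    # Permute (count, follow-up) pairs between groups; the studentization
    # is what keeps the test valid although plain exchangeability fails
    # under unequal overdispersion and follow-up times.
    obs = abs(rate_stat(g1, g2))
    pooled = list(g1) + list(g2)
    n1, hits = len(g1), 0
    for _ in range(B):
        random.shuffle(pooled)
        hits += abs(rate_stat(pooled[:n1], pooled[n1:])) >= obs
    return (hits + 1) / (B + 1)

def simulate(n, rate):
    # Overdispersed counts: gamma-mixed Poisson with varying follow-up.
    out = []
    for _ in range(n):
        t = random.uniform(0.5, 2.0)
        out.append((poisson(rate * t * random.gammavariate(2.0, 0.5)), t))
    return out

p_null = perm_test(simulate(15, 1.0), simulate(15, 1.0))
```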
Affiliation(s)
- Frank Konietschke
- Department of Mathematical Sciences, University of Texas at Dallas, Dallas, TX, USA
- Tim Friede
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Markus Pauly
- Institute of Statistics, Ulm University, Ulm, Germany

38
Schomaker M, Heumann C. Bootstrap inference when using multiple imputation. Stat Med 2018; 37:2252-2266. [PMID: 29682776 DOI: 10.1002/sim.7654] [Received: 01/23/2017] [Revised: 01/18/2018] [Accepted: 02/10/2018] [Indexed: 01/21/2023]
Abstract
Many modern estimators require bootstrapping to calculate confidence intervals because either no analytic standard error is available or the distribution of the parameter of interest is nonsymmetric. It remains, however, unclear how to obtain valid bootstrap inference when dealing with multiple imputation to address missing data. We present 4 methods that are intuitively appealing and easy to implement, and that combine bootstrap estimation with multiple imputation. We show that 3 of the 4 approaches yield valid inference, but that the performance of the methods varies with respect to the number of imputed data sets and the extent of missingness. Simulation studies reveal the behavior of our approaches in finite samples. A topical analysis from HIV treatment research, which determines the optimal timing of antiretroviral treatment initiation in young children, demonstrates the practical implications of the 4 methods in a sophisticated and realistic setting. This analysis suffers from missing data and uses the g-formula for inference, a method for which no standard errors are available.
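One of the intuitive combinations ("bootstrap the incomplete data, then impute each bootstrap sample") can be sketched as follows; the hot-deck imputation and the mean estimand are simplistic stand-ins for a real imputation model and analysis, so this illustrates the resampling structure only, not any specific method from the paper.

```python
import random

random.seed(7)

def impute(sample):
    # Crude hot-deck imputation: fill each missing value with a random
    # observed value (a stand-in for a proper imputation model).
    obs = [v for v in sample if v is not None]
    return [v if v is not None else random.choice(obs) for v in sample]

# Incomplete data: roughly 30% missing completely at random.
data = [random.gauss(10, 2) if random.random() > 0.3 else None
        for _ in range(100)]

# "Bootstrap, then impute": resample the incomplete rows, impute each
# bootstrap sample M times, and average the estimates within each sample.
B, M = 200, 5
boot_means = []
for _ in range(B):
    bs = [random.choice(data) for _ in data]
    ests = [sum(imp) / len(imp) for imp in (impute(bs) for _ in range(M))]
    boot_means.append(sum(ests) / M)

boot_means.sort()
ci = (boot_means[int(0.025 * B)], boot_means[int(0.975 * B) - 1])
```

The resulting percentile interval reflects both the sampling variability (outer bootstrap) and the imputation variability (inner repetitions).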
Affiliation(s)
- Michael Schomaker
- Centre for Infectious Disease Epidemiology & Research, University of Cape Town, Falmouth Building, Observatory, Cape Town, 7925, South Africa
- Christian Heumann
- Institut für Statistik, Ludwig-Maximilians Universität München, München, Germany

39
Dinov ID, Palanimalai S, Khare A, Christou N. Randomization-Based Statistical Inference: A Resampling and Simulation Infrastructure. Teach Stat 2018; 40:64-73. [PMID: 30270947 PMCID: PMC6155997 DOI: 10.1111/test.12156] [Indexed: 06/08/2023]
Abstract
Statistical inference involves drawing scientifically-based conclusions describing natural processes or observable phenomena from datasets with intrinsic random variation. There are parametric and non-parametric approaches for studying the data or sampling distributions, yet few resources are available to provide integrated views of data (observed or simulated), theoretical concepts, computational mechanisms and hands-on utilization via flexible graphical user interfaces. We designed, implemented and validated a new portable randomization-based statistical inference infrastructure (http://socr.umich.edu/HTML5/Resampling_Webapp) that blends research-driven data analytics and interactive learning, and provides a backend computational library for managing large amounts of simulated or user-provided data. The core of this framework is a modern randomization webapp, which may be invoked on any device supporting a JavaScript-enabled web-browser. We demonstrate the use of these resources to analyze proportion, mean, and other statistics using simulated (virtual experiments) and observed (e.g., Acute Myocardial Infarction, Job Rankings) data. Finally, we draw parallels between parametric inference methods and their distribution-free alternatives. The Randomization and Resampling webapp can be used for data analytics, as well as for formal, in-class and informal, out-of-the-classroom learning and teaching of different scientific concepts. Such concepts include sampling, random variation, computational statistical inference and data-driven analytics. The entire scientific community may utilize, test, expand, modify or embed these resources (data, source-code, learning activity, webapp) without any restrictions.
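The core randomization-based inference loop behind such tools fits in a few lines; the data below are invented, and the shuffle-based Monte Carlo p-value is the standard construction, not code from the webapp.

```python
import random

random.seed(8)

def randomization_p(a, b, B=2000):
    """Two-sided Monte Carlo randomization test for a difference in means."""
    obs = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    n = len(a)
    hits = 0
    for _ in range(B):
        random.shuffle(pooled)
        d = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / len(b))
        hits += d >= obs
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (hits + 1) / (B + 1)

treated = [12.1, 9.8, 11.5, 10.9, 12.4, 11.8]
control = [9.1, 8.7, 10.2, 9.5, 8.9, 10.0]
p = randomization_p(treated, control)
```

Because the null distribution is built by reshuffling group labels, the test makes no normality assumption, which is the distribution-free alternative the abstract draws parallels to.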
Affiliation(s)
- Ivo D. Dinov
- Statistics Online Computational Resource, University of California, Los Angeles, Los Angeles, CA 90095
- Statistics Online Computational Resource, University of Michigan, UMSN, Ann Arbor, Michigan 48109-5482
- Michigan Institute for Data Science, University of Michigan, Ann Arbor, Michigan 48109
- Selvam Palanimalai
- Statistics Online Computational Resource, University of California, Los Angeles, Los Angeles, CA 90095
- Ashwini Khare
- Statistics Online Computational Resource, University of California, Los Angeles, Los Angeles, CA 90095
- Nicolas Christou
- Statistics Online Computational Resource, University of California, Los Angeles, Los Angeles, CA 90095

40
Olmos V, Marro M, Loza-Alvarez P, Raldúa D, Prats E, Padrós F, Piña B, Tauler R, de Juan A. Combining hyperspectral imaging and chemometrics to assess and interpret the effects of environmental stressors on zebrafish eye images at tissue level. J Biophotonics 2018; 11:e201700089. [PMID: 28766927 DOI: 10.1002/jbio.201700089] [Received: 04/11/2017] [Revised: 07/27/2017] [Accepted: 07/31/2017] [Indexed: 06/07/2023]
Abstract
Changes induced in an organism by exposure to environmental stressors may be characterized by hyperspectral images (HSI), which preserve the morphology of biological samples, and by suitable chemometric tools. The proposed approach allows assessing and interpreting the effect of contaminant exposure on heterogeneous biological samples monitored by HSI at specific tissue levels. In this work, the model example consists of studying the effect of chlorpyrifos-oxon exposure on zebrafish tissues. To assess this effect, unmixing of the biological sample images followed by tissue-specific classification models based on the unmixed spectral signatures is proposed. Unmixing and classification are performed by multivariate curve resolution-alternating least squares (MCR-ALS) and partial least squares-discriminant analysis (PLS-DA), respectively. Crucial aspects of the approach are: (1) the simultaneous MCR-ALS analysis of all images from 1 population to take into account biological variability and provide reliable tissue spectral signatures, and (2) the use of resolved spectral signatures from control and exposed populations obtained from resampling of pixel subsets analyzed by MCR-ALS multiset analysis as information for the tissue-specific PLS-DA classification models. Classification results diagnose the presence of a significant effect and identify the spectral regions at a tissue level responsible for the biological change.
Affiliation(s)
- Víctor Olmos
- Department of Chemical Engineering and Analytical Chemistry, University of Barcelona, Barcelona, Spain
- Mònica Marro
- Institut de Ciencies Fotòniques (ICFO), The Barcelona Institute of Science and Technology, Castelldefels, Spain
- Pablo Loza-Alvarez
- Institut de Ciencies Fotòniques (ICFO), The Barcelona Institute of Science and Technology, Castelldefels, Spain
- Demetrio Raldúa
- Department of Environmental Chemistry, Institute of Environmental Assessment and Water Diagnostic (IDAEA-CSIC), Barcelona, Spain
- Eva Prats
- Research and Development Centre (CID-CSIC), Barcelona, Spain
- Francesc Padrós
- Pathological Diagnostic Service in Fish, Universitat Autònoma de Barcelona, Bellaterra, Spain
- Benjamin Piña
- Department of Environmental Chemistry, Institute of Environmental Assessment and Water Diagnostic (IDAEA-CSIC), Barcelona, Spain
- Romà Tauler
- Department of Environmental Chemistry, Institute of Environmental Assessment and Water Diagnostic (IDAEA-CSIC), Barcelona, Spain
- Anna de Juan
- Department of Chemical Engineering and Analytical Chemistry, University of Barcelona, Barcelona, Spain
|
41
|
Cho Y, Hu C, Ghosh D. Covariate adjustment using propensity scores for dependent censoring problems in the accelerated failure time model. Stat Med 2018; 37:390-404. [PMID: 29023972 DOI: 10.1002/sim.7513] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2016] [Revised: 07/13/2017] [Accepted: 08/29/2017] [Indexed: 11/10/2022]
Abstract
In many medical studies, estimating the association between treatment and an outcome of interest is the primary scientific goal. Standard methods for its evaluation in survival analysis typically require the assumption of independent censoring. This assumption may be invalid in many medical studies, where the presence of dependent censoring complicates the analysis of covariate effects on disease outcomes. This data structure is called "semicompeting risks data," for which many authors have proposed an artificial censoring technique. However, confounders with large variability may lead to excessive artificial censoring, which in turn results in numerically unstable estimation. In this paper, we propose a strategy for weighted estimation of the associations in the accelerated failure time model. The weights are based on a propensity score model for treatment conditional on the confounder variables. This novel application of propensity scores avoids the excess artificial censoring caused by the confounders and simplifies computation. Monte Carlo simulation studies and applications to AIDS and cancer research illustrate the methodology.
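A minimal sketch of the weighting idea, under strong simplifying assumptions: no censoring, a single confounder, and a toy log-linear (AFT-style) outcome. This is not the authors' estimator, only an illustration of how propensity-score weights remove confounding from a treatment-only regression:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)                        # a single confounder
ps_true = 1 / (1 + np.exp(-0.8 * x))          # treatment assignment depends on x
a = rng.binomial(1, ps_true)
# log survival time under a toy AFT model (censoring ignored in this sketch)
log_t = 1.0 + 0.5 * a + 0.7 * x + rng.normal(scale=0.3, size=n)

# Fit the propensity model P(A=1 | x) by Newton-Raphson logistic regression
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-(X @ beta)))
    W = mu * (1 - mu)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (a - mu))
ps = 1 / (1 + np.exp(-(X @ beta)))

# Inverse-probability weights, then weighted regression of log T on treatment alone
w = a / ps + (1 - a) / (1 - ps)
Z = np.column_stack([np.ones(n), a])
coef = np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * log_t))
print(coef[1])   # close to the true treatment effect 0.5
```

An unweighted regression of `log_t` on `a` alone would be badly biased here, since `x` drives both treatment and outcome; the weights restore the marginal effect.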
Affiliation(s)
- Youngjoo Cho
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
- Chen Hu
- Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA; NRG Oncology, Statistics and Data Management Center, Philadelphia, PA 19103, USA
- Debashis Ghosh
- Department of Biostatistics and Informatics, University of Colorado, Aurora, CO 80045, USA
|
42
|
Saffari A, Silver MJ, Zavattari P, Moi L, Columbano A, Meaburn EL, Dudbridge F. Estimation of a significance threshold for epigenome-wide association studies. Genet Epidemiol 2018; 42:20-33. [PMID: 29034560 PMCID: PMC5813244 DOI: 10.1002/gepi.22086] [Citation(s) in RCA: 103] [Impact Index Per Article: 17.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2016] [Revised: 05/30/2017] [Accepted: 07/24/2017] [Indexed: 12/17/2022]
Abstract
Epigenome-wide association studies (EWAS) are designed to characterise population-level epigenetic differences across the genome and link them to disease. Most commonly, they assess DNA-methylation status at cytosine-guanine dinucleotide (CpG) sites, using platforms such as the Illumina 450k array that profile a subset of CpGs genome-wide. An important challenge in EWAS is determining a significance threshold for declaring a CpG site differentially methylated while accounting for multiple testing. We used a permutation method to estimate a significance threshold specific to the 450k array and a simulation extrapolation approach to estimate a genome-wide threshold. These methods were applied to five EWAS datasets derived from a variety of populations and tissue types. We obtained an estimate of α = 2.4 × 10^-7 for the 450k array and a genome-wide estimate of α = 3.6 × 10^-8. We further demonstrate the importance of these results by showing that previously recommended sample sizes for EWAS should be adjusted upwards, with samples between ∼10% and ∼20% larger needed to maintain type-1 error at the desired level.
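The permutation idea can be sketched as follows. The toy data, the normal approximation to the t reference distribution, and the scale (2,000 sites, 100 permutations) are illustrative assumptions, far smaller than a real 450k analysis:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(2)
n, m = 100, 2000                      # samples, CpG sites (toy scale)
pheno = rng.normal(size=n)
meth = rng.normal(size=(n, m))        # methylation matrix under the global null

def min_p(y, X):
    """Smallest two-sided p-value across all sites (normal approximation)."""
    yc = (y - y.mean()) / y.std()
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    r = yc @ Xc / len(y)              # per-site correlation with the phenotype
    t = r * np.sqrt((len(y) - 2) / (1 - r**2))
    p = 2 * (1 - 0.5 * (1 + np.vectorize(erf)(np.abs(t) / np.sqrt(2))))
    return p.min()

# Permuting the phenotype preserves correlation among sites but breaks association
minps = np.array([min_p(rng.permutation(pheno), meth) for _ in range(100)])
alpha_star = np.quantile(minps, 0.05)   # estimated family-wise 5% threshold
print(alpha_star)                       # far below the nominal 0.05
```

Because the sites here are simulated independently, the estimate lands near the Šidák value; on real arrays the correlation among CpGs makes the permutation threshold less stringent, which is the point of estimating it empirically.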
Affiliation(s)
- Ayden Saffari
- Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, United Kingdom
- MRC Unit, The Gambia and MRC International Nutrition Group, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Department of Psychological Sciences, Birkbeck, University of London, London, United Kingdom
- Matt J. Silver
- MRC Unit, The Gambia and MRC International Nutrition Group, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Patrizia Zavattari
- Department of Biomedical Sciences, University of Cagliari, Cagliari, Sardinia, Italy
- Loredana Moi
- Department of Biomedical Sciences, University of Cagliari, Cagliari, Sardinia, Italy
- Amedeo Columbano
- Department of Biomedical Sciences, University of Cagliari, Cagliari, Sardinia, Italy
- Emma L. Meaburn
- Department of Psychological Sciences, Birkbeck, University of London, London, United Kingdom
- Frank Dudbridge
- Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Department of Health Sciences, University of Leicester, Leicester, United Kingdom
|
43
|
Mao Q, Liu S, Wang S, Ma X. Surface Fitting for Quasi Scattered Data from Coordinate Measuring Systems. Sensors (Basel) 2018; 18:E214. [PMID: 29342869 DOI: 10.3390/s18010214] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/21/2017] [Revised: 01/10/2018] [Accepted: 01/10/2018] [Indexed: 11/17/2022]
Abstract
Non-uniform rational B-spline (NURBS) surface fitting from data points is widely used in computer-aided design (CAD), medical imaging, cultural relic representation, and object-shape detection. Usually, the measured data acquired from coordinate measuring systems are neither gridded nor completely scattered: their distribution is scattered in physical space, but the points are stored in the order of measurement, so we call them quasi scattered data in this paper. They can therefore be organized into rows easily, but the number of points in each row is random. To overcome the difficulty of fitting a surface to this kind of data, a new method based on resampling is proposed. It consists of three major steps: (1) NURBS curve fitting for each row, (2) resampling on the fitted curves, and (3) surface fitting from the resampled data. An iterative projection optimization scheme is applied in the first and third steps to yield a suitable parameterization and reduce the time cost of projection. A resampling approach based on parameters, local peaks, and contour curvature is proposed to overcome node redundancy and the high time consumption of fitting such scattered data. Numerical experiments with both simulated and practical data show that the proposed method is fast, effective, and robust. Moreover, analysis of fitting results obtained from data with different degrees of scatter demonstrates that the error introduced by resampling is negligible, so the approach is feasible.
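The three-step pipeline can be sketched with simplified stand-ins: linear interpolation in place of the NURBS curve fit of step 1, and a low-order polynomial in place of the NURBS surface of step 3. All data and names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def resample_row(pts, k=20):
    """Chord-length parameterise one measured row and resample it at k evenly
    spaced parameter values (linear interpolation stands in for the NURBS
    curve fit of step 1)."""
    d = np.r_[0, np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))]
    u = d / d[-1]
    uu = np.linspace(0, 1, k)
    return np.column_stack([np.interp(uu, u, pts[:, j]) for j in range(3)])

# Quasi-scattered rows: each row lies on z = x^2 + y but has a random length
rows = []
for yv in np.linspace(0, 1, 15):
    nr = rng.integers(8, 40)                 # points per row vary
    xs = np.sort(rng.uniform(0, 1, nr))
    rows.append(np.column_stack([xs, np.full(nr, yv), xs**2 + yv]))

grid = np.stack([resample_row(r) for r in rows])   # 15 x 20 x 3 regular grid
P = grid.reshape(-1, 3)
# Step 3: surface fit from the resampled grid (polynomial instead of NURBS)
A = np.column_stack([np.ones(len(P)), P[:, 0], P[:, 1], P[:, 0] ** 2])
coef, *_ = np.linalg.lstsq(A, P[:, 2], rcond=None)
print(coef)   # approximately [0, 0, 1, 1]
```

The key point mirrors the paper's: once each irregular row is resampled to a common count, the data become a regular grid and any standard surface-fitting routine applies, and the resampling error stays small relative to the surface.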
|
44
|
Heinze G, Wallisch C, Dunkler D. Variable selection - A review and recommendations for the practicing statistician. Biom J 2018; 60:431-449. [PMID: 29292533 PMCID: PMC5969114 DOI: 10.1002/bimj.201700067] [Citation(s) in RCA: 654] [Impact Index Per Article: 109.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2017] [Revised: 11/13/2017] [Accepted: 11/17/2017] [Indexed: 12/12/2022]
Abstract
Statistical models support medical research by facilitating individualized outcome prognostication conditional on independent variables or by estimating effects of risk factors adjusted for covariates. The theory of statistical models is well established if the set of independent variables to consider is fixed and small; then we can assume that effect estimates are unbiased and that the usual methods for confidence interval estimation are valid. In routine work, however, it is not known a priori which covariates should be included in a model, and we are often confronted with 10 to 30 candidate variables, a number frequently too large to include in a single statistical model. We provide an overview of the available variable selection methods, which are based on significance or information criteria, penalized likelihood, the change-in-estimate criterion, background knowledge, or combinations thereof. These methods were usually developed in the context of the linear regression model and then transferred to generalized linear models and models for censored survival data. Variable selection, in particular when used in explanatory modeling where effect estimates are of central interest, can compromise the stability of a final model, the unbiasedness of regression coefficients, and the validity of p-values or confidence intervals. We therefore give pragmatic recommendations for the practicing statistician on applying variable selection methods in general (low-dimensional) modeling problems and on performing stability investigations and inference. We also propose some quantities, based on resampling the entire variable selection process, that software packages offering automated variable selection algorithms should routinely report.
Affiliation(s)
- Georg Heinze
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, 1090, Austria
- Christine Wallisch
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, 1090, Austria
- Daniela Dunkler
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, 1090, Austria
|
45
|
Abstract
Identification of treatment selection biomarkers has become very important in cancer drug development. Adaptive enrichment designs have been developed for situations where a unique treatment selection biomarker is not apparent from the mechanism of action of the drug. With such designs, the eligibility rules may be adaptively modified at interim analysis times to exclude patients who are unlikely to benefit from the test treatment. We consider a recently proposed, particularly flexible approach that permits development of model-based multifeature predictive classifiers as well as optimized cut-points for continuous biomarkers. A single significance test, including all randomized patients, is performed at the end of the trial of the strong null hypothesis that the expected outcome on the test treatment is no better than control for any of the subset populations of patients accrued in the K stages of the clinical trial. In this paper, we address two issues involving inference following such an adaptive enrichment design. The first is specification of the intended use population and estimation of the treatment effect for that population following rejection of the strong null hypothesis. The second is defining conditions under which rejection of the strong null hypothesis implies rejection of the null hypothesis for the intended use population.
Affiliation(s)
- Richard Simon
- Biometric Research Program, National Cancer Institute, Rockville, MD 20850, USA
- Noah Simon
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
|
46
|
Hopton ME, Karunanithi AT, Garmestani AS, White D, Choate JR, Cabezas H. A supplementary tool to existing approaches for assessing ecosystem community structure. Ecol Modell 2017; 355:64-69. [PMID: 30220776 DOI: 10.1016/j.ecolmodel.2017.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
Measures of biological or species diversity are central to ecology and conservation biology. Although there are several commonly used indices, each has shortcomings, and all vary in the relative emphasis they place on the number of species versus their relative abundance. We propose utilizing Fisher Information, not as a replacement for existing indices, but as a supplement, because it is sensitive to community structure. We demonstrate how Shannon's and Simpson's diversity indices quantify the diversity of two different systems, and how Fisher Information can enhance the analyses by comparing, for example, the body-size and phylogenetic diversity of the communities. Fisher Information is sensitive to the order in which species are entered into the analysis and can therefore detect differences in community structure. Thus, the Fisher Information index can help in understanding and analyzing the biodiversity of ecosystems and in comparing ecological communities.
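For the two classical indices the abstract compares against, a minimal sketch; the Fisher Information index itself depends on the paper's ordering construction and is not reproduced here:

```python
import numpy as np

def shannon(counts):
    """Shannon diversity H' = -sum p_i ln p_i."""
    p = np.asarray(counts, float) / np.sum(counts)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def simpson(counts):
    """Gini-Simpson diversity 1 - sum p_i^2."""
    p = np.asarray(counts, float) / np.sum(counts)
    return 1 - np.sum(p ** 2)

even = [25, 25, 25, 25]      # four equally abundant species
skewed = [85, 5, 5, 5]       # same richness, one dominant species
print(shannon(even), shannon(skewed))   # ~1.386 vs ~0.588
print(simpson(even), simpson(skewed))   # 0.75 vs 0.27
```

Both indices depend only on the abundance vector, so reordering the species leaves them unchanged; the paper's motivation for Fisher Information is precisely that it does respond to such ordering (e.g., by body size or phylogeny).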
Affiliation(s)
- Matthew E Hopton
- United States Environmental Protection Agency, Office of Research and Development, National Risk Management Research Laboratory, 26 West Martin Luther King Drive, MS 443, Cincinnati, OH 45268, USA
- Arunprakash T Karunanithi
- Center for Sustainable Infrastructure Systems, University of Colorado Denver, 1200 Larimer Street, Denver, CO 80217, USA
- Ahjond S Garmestani
- United States Environmental Protection Agency, Office of Research and Development, National Risk Management Research Laboratory, 26 West Martin Luther King Drive, MS 443, Cincinnati, OH 45268, USA
- Denis White
- United States Environmental Protection Agency, Office of Research and Development, National Health and Environmental Effects Research Laboratory, 200 SW 35th Street, Corvallis, Oregon 97333, USA; present address: Geography Program, College of Earth, Ocean, and Atmospheric Sciences, Oregon State University, Corvallis, OR 97331, USA
- Jerry R Choate
- Posthumously; Sternberg Museum of Natural History, Fort Hays State University, 3000 Sternberg Drive, Hays, KS 67601-2006, USA
- Heriberto Cabezas
- United States Environmental Protection Agency, Office of Research and Development, National Risk Management Research Laboratory, 26 West Martin Luther King Drive, MS 443, Cincinnati, OH 45268, USA
|
47
|
Zhao J. Reducing Bias for Maximum Approximate Conditional Likelihood Estimator with General Missing Data Mechanism. J Nonparametr Stat 2017; 29:577-593. [PMID: 31551650 PMCID: PMC6759332 DOI: 10.1080/10485252.2017.1339306] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2016] [Accepted: 04/20/2017] [Indexed: 10/19/2022]
Abstract
In missing data analysis, the assumed missing data mechanism is crucial: different assumptions require different statistical methods, yet in practice such assumptions are usually unverifiable. A less stringent, and hence more flexible, assumption is therefore preferred. In this paper, we consider a generally applicable missing data mechanism that includes instances of all three standard scenarios: missing completely at random, missing at random, and missing not at random. Under this general mechanism, we introduce the conditional likelihood and its approximate version as the basis for estimating the unknown parameter of interest. Since the approximate conditional likelihood uses only the completely observed samples, it may incur a large estimation bias, which could deteriorate the statistical inference and jeopardize subsequent statistical procedures. To tackle this problem, we propose resampling techniques to reduce the estimation bias, considering both the jackknife and the bootstrap. We compare their asymptotic biases through a higher-order expansion up to order 1/n, and we derive results on the mean squared error in terms of estimation accuracy. Comprehensive simulation studies under different situations illustrate the proposed method, which we also apply to a prostate cancer data analysis.
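The jackknife bias-correction idea can be sketched on a textbook example; the variance MLE is used here because its corrected value coincides exactly with the unbiased sample variance (this illustrates the mechanism only, not the paper's conditional-likelihood estimator):

```python
import numpy as np

rng = np.random.default_rng(5)

def jackknife_corrected(x, stat):
    """Jackknife bias correction: n * theta_hat - (n-1) * mean(leave-one-out)."""
    n = len(x)
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    return n * stat(x) - (n - 1) * loo.mean()

# The variance MLE (divisor n) is biased by -sigma^2/n; the jackknife
# correction recovers the unbiased (divisor n-1) estimator exactly
x = rng.normal(scale=2.0, size=30)
naive = np.var(x)
corrected = jackknife_corrected(x, np.var)
print(naive, corrected, np.var(x, ddof=1))
```

The correction removes the O(1/n) bias term, which is precisely the order at which the paper compares the jackknife and the bootstrap.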
Affiliation(s)
- Jiwei Zhao
- Department of Biostatistics, State University of New York at Buffalo, Buffalo, NY, USA
|
48
|
Kim HJ, Luo J, Chen HS, Green D, Buckman D, Byrne J, Feuer EJ. Improved confidence interval for average annual percent change in trend analysis. Stat Med 2017; 36:3059-3074. [PMID: 28585245 DOI: 10.1002/sim.7344] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2016] [Revised: 04/25/2017] [Accepted: 04/28/2017] [Indexed: 02/02/2023]
Abstract
This paper considers an improved confidence interval for the average annual percent change in trend analysis, based on a weighted average of the regression slopes in a segmented line regression model with unknown change points. The performance of the improved confidence interval proposed by Muggeo is examined under various distributional settings, and two new methods are proposed for further improvement. The first method is practically equivalent to Muggeo's, but its construction is simpler and it is modified to use the t-distribution instead of the standard normal distribution. The second method is based on the empirical distribution of the residuals and resampling using a uniform random sample; a simulation study indicates its satisfactory performance.
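The point estimate these confidence intervals target, a segment-length-weighted average of log-scale slopes back-transformed to a percent change, can be sketched as follows. In practice the slopes come from a fitted segmented regression; here they are simply assumed known:

```python
import numpy as np

def aapc(slopes, seg_lengths):
    """Average annual percent change: segment-length-weighted average of
    log-scale slopes, back-transformed to a percent change per year."""
    w = np.asarray(seg_lengths, float) / np.sum(seg_lengths)
    return 100 * (np.exp(np.sum(w * np.asarray(slopes, float))) - 1)

# Two joined segments: +3% per year for 6 years, then -1% per year for 4 years
b = [np.log(1.03), np.log(0.99)]
print(aapc(b, [6, 4]))   # about +1.38% per year
```

The interval methods compared in the paper differ in how they propagate the uncertainty of the estimated slopes and change points into this weighted average.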
Affiliation(s)
- Hyune-Ju Kim
- Department of Mathematics, Syracuse University, Syracuse, 13244, New York, U.S.A
- Jun Luo
- Digit Compass LLC, Mason, 45040, Ohio, U.S.A
- Huann-Sheng Chen
- Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, 20892-9765, Maryland, U.S.A
- Don Green
- Information Management Services, Inc., Calverton, 20705, Maryland, U.S.A
- Dennis Buckman
- Information Management Services, Inc., Calverton, 20705, Maryland, U.S.A
- Jeffrey Byrne
- Information Management Services, Inc., Calverton, 20705, Maryland, U.S.A
- Eric J Feuer
- Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, 20892-9765, Maryland, U.S.A
|
49
|
Abstract
Variable selection plays an essential role in regression analysis: it identifies important variables associated with outcomes and is known to improve the predictive accuracy of the resulting models. Variable selection methods have been widely investigated for fully observed data. In the presence of missing data, however, variable selection must be carefully designed to account for the missing data mechanism and for the statistical techniques used to handle missing data. Since imputation is arguably the most popular way of handling missing data, owing to its ease of use, variable selection methods combined with imputation are of particular interest. These methods, valid under the assumptions of missing at random (MAR) and missing completely at random (MCAR), largely fall into three general strategies. The first applies existing variable selection methods to each imputed dataset and then combines the selection results across all imputed datasets. The second applies existing variable selection methods to stacked imputed datasets. The third combines resampling techniques, such as the bootstrap, with imputation. Despite recent advances, this area remains underdeveloped and offers fertile ground for further research.
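The first strategy, select on each imputed dataset and then combine, can be sketched as follows. The crude stochastic mean imputation, the |t|-threshold selector, and the majority-vote rule are illustrative assumptions, not a recommended imputation model:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, M = 300, 6, 20
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)   # x0 and x2 matter
miss = rng.random((n, p)) < 0.15                          # 15% of values MCAR
Xobs = np.where(miss, np.nan, X)

def select(Xc, y, thresh=2.0):
    """Keep variables whose OLS |t|-statistic exceeds thresh."""
    A = np.column_stack([np.ones(len(y)), Xc])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    s2 = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
    return np.abs(beta[1:] / se[1:]) > thresh

# Strategy 1: select on each imputed dataset, then combine by majority vote
mu, sd = np.nanmean(Xobs, axis=0), np.nanstd(Xobs, axis=0)
votes = np.zeros(p)
for _ in range(M):
    Ximp = np.where(miss, rng.normal(mu, sd, (n, p)), Xobs)  # crude stochastic imputation
    votes += select(Ximp, y)
selected = votes / M >= 0.5
print(selected)   # x0 and x2 survive the vote; the noise variables do not
```

The stacking strategy would instead concatenate the M imputed datasets and run one weighted selection; the resampling strategy would wrap the imputation-plus-selection pipeline inside a bootstrap loop.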
Affiliation(s)
- Yize Zhao
- Department of Healthcare Policy and Research, Weill Cornell Medical College, Cornell University
- Qi Long
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania
|
50
|
Cheng R, Doerge RW, Borevitz J. Novel Resampling Improves Statistical Power for Multiple-Trait QTL Mapping. G3 (Bethesda) 2017; 7:813-822. [PMID: 28064191 PMCID: PMC5345711 DOI: 10.1534/g3.116.037531] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/16/2016] [Accepted: 12/29/2016] [Indexed: 01/13/2023]
Abstract
Multiple-trait analysis typically employs models that associate a quantitative trait locus (QTL) with all of the traits. As a result, statistical power for QTL detection may not be optimal if the QTL contributes to the phenotypic variation of only a small proportion of the traits. Excluding QTL effects that contribute little to the test statistic can improve statistical power. In this article, we show that optimal power is achieved when the number of QTL effects is estimated well, and that a stringent criterion for QTL effect selection may improve power when the number of QTL effects is small but can reduce power otherwise. We investigate strategies for excluding trivial QTL effects and propose a method that improves statistical power when the number of QTL effects is relatively small and largely maintains power when it is large. The proposed method first uses resampling techniques to determine the number of nontrivial QTL effects and then selects QTL effects by backward elimination for significance testing. We also propose a method for testing the QTL-trait associations that are desired for biological interpretation in applications. We validate our methods using simulations and Arabidopsis thaliana transcript data.
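The idea of using resampling to estimate the number of nontrivial QTL effects before forming the test statistic can be sketched as follows. The bootstrap-stability rule, the thresholds, and the toy genotype model are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

rng = np.random.default_rng(7)
n, T = 200, 20                        # individuals, traits
g = rng.binomial(1, 0.5, n)           # biallelic marker genotype
B = np.zeros(T); B[:3] = 0.8          # the QTL affects only 3 of the 20 traits
Y = np.outer(g, B) + rng.normal(size=(n, T))

def per_trait_t(g, Y):
    """Two-sample t-statistic of each trait between the genotype classes."""
    m1, m0 = Y[g == 1].mean(axis=0), Y[g == 0].mean(axis=0)
    v1, v0 = Y[g == 1].var(axis=0, ddof=1), Y[g == 0].var(axis=0, ddof=1)
    return (m1 - m0) / np.sqrt(v1 / (g == 1).sum() + v0 / (g == 0).sum())

# Resampling step: call an effect "nontrivial" if its bootstrap |t| stays
# above 2 in at least 80% of resamples
stable = np.zeros(T)
for _ in range(100):
    idx = rng.integers(0, n, n)
    stable += np.abs(per_trait_t(g[idx], Y[idx])) > 2
k = int((stable / 100 >= 0.8).sum())          # estimated number of QTL effects

# Test statistic restricted to the k largest effects vs. one using all traits
t = per_trait_t(g, Y)
top = np.argsort(-np.abs(t))[:k]
print(k, np.sum(t[top] ** 2), np.sum(t ** 2))
```

Restricting the statistic to the estimated nontrivial effects keeps nearly all of the signal while dropping the 17 noise degrees of freedom, which is the power gain the abstract describes.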
Affiliation(s)
- Riyan Cheng
- Research School of Biology, The Australian National University, Acton, Australian Capital Territory 2601, Australia; ARC Center of Excellence in Plant Energy Biology, The Australian National University, Acton, ACT 2601, Australia
- R W Doerge
- Department of Statistics, Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213
- Justin Borevitz
- Research School of Biology, The Australian National University, Acton, Australian Capital Territory 2601, Australia; ARC Center of Excellence in Plant Energy Biology, The Australian National University, Acton, ACT 2601, Australia
|