1
Wang JA, Wang HF, Cao B, Lei X, Long C. Cultural Dimensions Moderate the Association between Loneliness and Mental Health during Adolescence and Younger Adulthood: A Systematic Review and Meta-Analysis. J Youth Adolesc 2024; 53:1774-1819. PMID: 38662185; DOI: 10.1007/s10964-024-01977-w.
Abstract
Cultural factors, such as country or continent, influence the relationship between loneliness and mental health. However, less is known about how cultural dimensions moderate this relationship during adolescence and younger adulthood, even though these dimensions underlie country- and continent-level differences. This study aims to examine the potential influence of Hofstede's cultural dimensions on this relationship using a three-level meta-analysis approach. A total of 292 studies with 291,946 participants aged 10 to 24 were included. The results indicate that cultural dimensions, such as individualism vs. collectivism, indulgence vs. restraint, power distance, and long-term vs. short-term orientation, moderated the associations between loneliness and social anxiety, stress, Internet overuse, and negative affect. The association between loneliness and mental health was not moderated by the dimensions of masculinity and uncertainty avoidance. These findings suggest that culture's influence on the association between loneliness and mental health operates through a domain-specific mechanism.
Affiliation(s)
- Jing-Ai Wang
- School of Psychology and Key Laboratory of Cognition and Personality of the Ministry of Education, Southwest University, Chongqing, 400715, China
- Hai-Fan Wang
- School of Psychology and Key Laboratory of Cognition and Personality of the Ministry of Education, Southwest University, Chongqing, 400715, China
- Bing Cao
- School of Psychology and Key Laboratory of Cognition and Personality of the Ministry of Education, Southwest University, Chongqing, 400715, China
- Xu Lei
- School of Psychology and Key Laboratory of Cognition and Personality of the Ministry of Education, Southwest University, Chongqing, 400715, China
- Changquan Long
- School of Psychology and Key Laboratory of Cognition and Personality of the Ministry of Education, Southwest University, Chongqing, 400715, China.

2
Kneipp J, Seifert S, Gärber F. SERS microscopy as a tool for comprehensive biochemical characterization in complex samples. Chem Soc Rev 2024. PMID: 38934892; DOI: 10.1039/d4cs00460d.
Abstract
Surface-enhanced Raman scattering (SERS) spectra of biomaterials such as cells or tissues can be used to obtain biochemical information from nanoscopic volumes in these heterogeneous samples. This tutorial review discusses the factors that determine the outcome of a SERS experiment in complex bioorganic samples. They are related to the SERS process itself, the possibility to selectively probe certain regions or constituents of a sample, and the retrieval of the vibrational information in order to identify molecules and their interactions. After introducing basic aspects of SERS experiments in the context of biocompatible environments, spectroscopy in typical microscopic settings is exemplified, including the possibilities to combine SERS with other linear and non-linear microscopic tools and to exploit approaches that improve lateral and temporal resolution. In particular, the great variation of data in a SERS experiment calls for robust data analysis tools. Approaches are introduced that were originally developed in the field of bioinformatics for application to omics data and that show specific potential for the analysis of SERS data. They include the use of simulated data and machine learning tools that can yield chemical information beyond spectral classification.
Affiliation(s)
- Janina Kneipp
- Department of Chemistry, Humboldt-Universität zu Berlin, Brook-Taylor-Str. 2, 12489 Berlin, Germany.
- Stephan Seifert
- Hamburg School of Food Science, Department of Chemistry, Universität Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Florian Gärber
- Hamburg School of Food Science, Department of Chemistry, Universität Hamburg, Grindelallee 117, 20146 Hamburg, Germany

3
Shi H, Lin R, Lin X. Comparative review of novel model-assisted designs for phase I/II clinical trials. Biom J 2024; 66:e2300398. PMID: 38738318; DOI: 10.1002/bimj.202300398.
Abstract
In recent years, both model-based and model-assisted designs have emerged to efficiently determine the optimal biological dose (OBD) in phase I/II trials for immunotherapy and targeted cellular agents. Model-based designs necessitate repeated model fitting and computationally intensive posterior sampling for each dose-assignment decision, limiting their practical application in real trials. On the other hand, model-assisted designs employ simple statistical models and facilitate the precalculation of a decision table for use throughout the trial, eliminating the need for repeated model fitting. Due to their simplicity and transparency, model-assisted designs are often preferred in phase I/II trials. In this paper, we systematically evaluate and compare the operating characteristics of several recent model-assisted phase I/II designs, including TEPI, PRINTE, Joint i3+3, BOIN-ET, STEIN, uTPI, and BOIN12, in addition to the well-known model-based EffTox design, using comprehensive numerical simulations. To ensure an unbiased comparison, we generated 10,000 dosing scenarios using a random scenario generation algorithm for each predetermined OBD location. We thoroughly assess various performance metrics, such as the selection percentages, average patient allocation to OBD, and overdose percentages across the eight designs. Based on these assessments, we offer design recommendations tailored to different objectives, sample sizes, and starting dose locations.
Affiliation(s)
- Haolun Shi
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Ruitao Lin
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Xiaolei Lin
- School of Data Science, Fudan University, Shanghai, China

4
Stolte M, Schreck N, Slynko A, Saadati M, Benner A, Rahnenführer J, Bommert A. Simulation study to evaluate when Plasmode simulation is superior to parametric simulation in estimating the mean squared error of the least squares estimator in linear regression. PLoS One 2024; 19:e0299989. PMID: 38748677; PMCID: PMC11095703; DOI: 10.1371/journal.pone.0299989.
Abstract
Simulation is a crucial tool for the evaluation and comparison of statistical methods. How to design fair and neutral simulation studies is therefore of great interest for both researchers developing new methods and practitioners confronted with the choice of the most suitable method. The term simulation usually refers to parametric simulation, that is, computer experiments using artificial data made up of pseudo-random numbers. Plasmode simulation, that is, computer experiments using the combination of resampling feature data from a real-life dataset and generating the target variable with a known user-selected outcome-generating model, is an alternative that is often claimed to produce more realistic data. We compare parametric and Plasmode simulation for the example of estimating the mean squared error (MSE) of the least squares estimator (LSE) in linear regression. If the true underlying data-generating process (DGP) and the outcome-generating model (OGM) were known, parametric simulation would obviously be the best choice in terms of estimating the MSE well. However, in reality, both are usually unknown, so researchers have to make assumptions: in Plasmode simulation studies for the OGM, in parametric simulation for both DGP and OGM. Most likely, these assumptions do not exactly reflect the truth. Here, we aim to find out how assumptions deviating from the true DGP and the true OGM affect the performance of parametric and Plasmode simulations in the context of MSE estimation for the LSE and in which situations which simulation type is preferable. Our results suggest that the preferable simulation method depends on many factors, including the number of features, and on how and to what extent the assumptions of a parametric simulation differ from the true DGP. Also, the resampling strategy used for Plasmode influences the results. In particular, subsampling with a small sampling proportion can be recommended.
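To make the contrast concrete, here is a minimal sketch of the two simulation types for estimating the MSE of the LSE (illustrative only; the standard-normal feature distributions, the linear OGM, and all names are assumptions made for this sketch, not details taken from the study):

    import numpy as np

    rng = np.random.default_rng(1)
    beta, sigma, n = np.array([1.0, -0.5, 0.25]), 1.0, 100

    def mse_of_lse(X, n_rep=500):
        """Monte Carlo estimate of the MSE of the least squares estimator for a fixed design X."""
        errs = []
        for _ in range(n_rep):
            y = X @ beta + rng.normal(scale=sigma, size=X.shape[0])  # outcome-generating model (OGM)
            beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
            errs.append(np.sum((beta_hat - beta) ** 2))
        return np.mean(errs)

    # Parametric simulation: features are drawn from an assumed parametric DGP.
    X_parametric = rng.standard_normal((n, 3))

    # Plasmode simulation: features are resampled (here: subsampled without replacement)
    # from a real-life dataset; only the outcome is generated from the user-selected OGM.
    X_real = rng.standard_normal((1000, 3))  # placeholder for the real-life feature matrix
    X_plasmode = X_real[rng.choice(X_real.shape[0], size=n, replace=False)]

    print(mse_of_lse(X_parametric), mse_of_lse(X_plasmode))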
Affiliation(s)
- Marieke Stolte
- Department of Statistics, TU Dortmund University, Dortmund, North Rhine-Westphalia, Germany
- Nicholas Schreck
- Division of Biostatistics, German Cancer Research Center, Heidelberg, Baden-Wuerttemberg, Germany
- Alla Slynko
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Maral Saadati
- Division of Biostatistics, German Cancer Research Center, Heidelberg, Baden-Wuerttemberg, Germany
- Axel Benner
- Division of Biostatistics, German Cancer Research Center, Heidelberg, Baden-Wuerttemberg, Germany
- Jörg Rahnenführer
- Department of Statistics, TU Dortmund University, Dortmund, North Rhine-Westphalia, Germany
- Andrea Bommert
- Department of Statistics, TU Dortmund University, Dortmund, North Rhine-Westphalia, Germany

5
El Emam K, Mosquera L, Fang X, El-Hussuna A. An evaluation of the replicability of analyses using synthetic health data. Sci Rep 2024; 14:6978. PMID: 38521806; PMCID: PMC10960851; DOI: 10.1038/s41598-024-57207-7.
Abstract
Synthetic data generation is being increasingly used as a privacy-preserving approach for sharing health data. In addition to protecting privacy, it is important to ensure that generated data has high utility. A common way to assess utility is the ability of synthetic data to replicate results from the real data. Replicability has been defined using two criteria: (a) replicate the results of the analyses on real data, and (b) ensure valid population inferences from the synthetic data. A simulation study using three heterogeneous real-world datasets evaluated the replicability of logistic regression workloads. Eight replicability metrics were evaluated: decision agreement, estimate agreement, standardized difference, confidence interval overlap, bias, confidence interval coverage, statistical power, and precision (empirical SE). The analysis of synthetic data used a multiple imputation approach whereby up to 20 datasets were generated and the fitted logistic regression models were combined using combining rules for fully synthetic datasets. The effects of synthetic data amplification were evaluated, and two types of generative models were used: sequential synthesis using boosted decision trees and a generative adversarial network (GAN). Privacy risk was evaluated using a membership disclosure metric. For sequential synthesis, adjusted model parameters after combining at least ten synthetic datasets gave high decision and estimate agreement, low standardized difference, high confidence interval overlap, low bias, nominal confidence interval coverage, and power close to the nominal level. Amplification had only a marginal benefit. Confidence interval coverage from a single synthetic dataset without applying combining rules was erroneous, and statistical power, as expected, was artificially inflated when amplification was used. Sequential synthesis performed considerably better than the GAN across multiple datasets. Membership disclosure risk was low for all datasets and models. For replicable results, the statistical analysis of fully synthetic data should be based on at least ten generated datasets of the same size as the original whose analysis results are combined. Analysis results from synthetic data without applying combining rules can be misleading. Replicability results are dependent on the type of generative model used, with our study suggesting that sequential synthesis has good replicability characteristics for common health research workloads.
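For orientation, one widely used set of combining rules for m fully synthetic datasets (Raghunathan, Reiter, and Rubin, 2003) takes the form below; the abstract does not state which exact variant the study applies, so this is background rather than a description of the paper's method:

    q_bar = (1/m) * sum_i q_i                      (combined point estimate)
    b_m   = (1/(m-1)) * sum_i (q_i - q_bar)^2      (between-dataset variance)
    u_bar = (1/m) * sum_i u_i                      (average within-dataset variance)
    T_f   = (1 + 1/m) * b_m - u_bar                (variance of q_bar; adjusted if negative)

where q_i and u_i are the point estimate and its estimated variance from the i-th synthetic dataset.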
Affiliation(s)
- Khaled El Emam
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada.
- Replica Analytics, Ottawa, ON, Canada.
- Children's Hospital of Eastern Ontario (CHEO) Research Institute, 401 Smyth Road, Ottawa, ON, K1H 8L1, Canada.
- Lucy Mosquera
- Replica Analytics, Ottawa, ON, Canada
- Children's Hospital of Eastern Ontario (CHEO) Research Institute, 401 Smyth Road, Ottawa, ON, K1H 8L1, Canada
- Xi Fang
- Replica Analytics, Ottawa, ON, Canada

6
Buch G, Schulz A, Schmidtmann I, Strauch K, Wild PS. Interpretability of bi-level variable selection methods. Biom J 2024; 66:e2300063. PMID: 38519877; DOI: 10.1002/bimj.202300063.
Abstract
Variable selection is usually performed to increase interpretability, as sparser models are easier to understand than full models. However, a focus on sparsity is not always suitable, for example, when features are related due to contextual similarities or high correlations. Here, it may be more appropriate to identify groups and their predictive members, a task that can be accomplished with bi-level selection procedures. To investigate whether such techniques lead to increased interpretability, group exponential LASSO (GEL), sparse group LASSO (SGL), composite minimax concave penalty (cMCP), and least absolute shrinkage and selection operator (LASSO) as reference methods were used to select predictors in time-to-event, regression, and classification tasks in bootstrap samples from a cohort of 1001 patients. Different groupings based on prior knowledge, correlation structure, and random assignment were compared in terms of selection relevance, group consistency, and collinearity tolerance. The results show that bi-level selection methods are superior to LASSO in all criteria. The cMCP demonstrated superiority in selection relevance, while SGL was convincing in group consistency. An all-round capacity was achieved by GEL: the approach jointly selected correlated and content-related predictors while maintaining high selection relevance. This method seems recommendable when variables are grouped, and interpretation is of primary interest.
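For readers unfamiliar with bi-level selection, the sparse group LASSO penalty illustrates the idea of selecting both groups and individual members within groups (standard notation; the specific tuning and implementations used in the study are described in the paper):

    P(beta) = (1 - alpha) * lambda * sum_g sqrt(p_g) * ||beta_g||_2  +  alpha * lambda * ||beta||_1

where the group-wise L2 term switches whole groups of size p_g in or out, the L1 term additionally zeroes individual coefficients within selected groups, and alpha balances the two levels of selection.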
Affiliation(s)
- Gregor Buch
- Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- German Center for Cardiovascular Research (DZHK), Mainz, Germany
- Andreas Schulz
- Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Irene Schmidtmann
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Konstantin Strauch
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Philipp S Wild
- Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- German Center for Cardiovascular Research (DZHK), Mainz, Germany
- Clinical Epidemiology and Systems Medicine, Center for Thrombosis and Hemostasis, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Institute of Molecular Biology (IMB), Mainz, Germany

7
Fairchild AJ, Yin Y, Baraldi AN, Astivia OLO, Shi D. Many nonnormalities, one simulation: Do different data generation algorithms affect study results? Behav Res Methods 2024. PMID: 38389030; DOI: 10.3758/s13428-024-02364-w.
Abstract
Monte Carlo simulation studies are among the primary scientific outputs contributed by methodologists, guiding application of various statistical tools in practice. Although methodological researchers routinely extend simulation study findings through follow-up work, few studies are ever replicated. Simulation studies are susceptible to factors that can contribute to replicability failures, however. This paper sought to conduct a meta-scientific study by replicating one highly cited simulation study (Curran et al., Psychological Methods, 1, 16-29, 1996) that investigated the robustness of normal theory maximum likelihood (ML)-based chi-square fit statistics under multivariate nonnormality. We further examined the generalizability of the original study findings across different nonnormal data generation algorithms. Our replication results were generally consistent with original findings, but we discerned several differences. Our generalizability results were more mixed. Only two results observed under the original data generation algorithm held completely across other algorithms examined. One of the most striking findings we observed was that results associated with the independent generator (IG) data generation algorithm vastly differed from other procedures examined and suggested that ML was robust to nonnormality for the particular factor model used in the simulation. Findings point to the reality that extant methodological recommendations may not be universally valid in contexts where multiple data generation algorithms exist for a given data characteristic. We recommend that researchers consider multiple approaches to generating a specific data or model characteristic (when more than one is available) to optimize the generalizability of simulation results.
Affiliation(s)
- Amanda J Fairchild
- Department of Psychology, University of South Carolina, Columbia, SC, USA.
- Yunhang Yin
- Department of Psychology, University of South Carolina, Columbia, SC, USA
- Amanda N Baraldi
- Department of Psychology, Oklahoma State University, Stillwater, OK, USA
- Dexin Shi
- Department of Psychology, University of South Carolina, Columbia, SC, USA

8
Turner AJ, Sammon C, Latimer N, Adamson B, Beal B, Subbiah V, Abrams KR, Ray J. Transporting Comparative Effectiveness Evidence Between Countries: Considerations for Health Technology Assessments. PharmacoEconomics 2024; 42:165-176. PMID: 37891433; PMCID: PMC10811184; DOI: 10.1007/s40273-023-01323-1.
Abstract
Internal validity is often the primary concern for health technology assessment agencies when assessing comparative effectiveness evidence. However, the increasing use of real-world data from countries other than a health technology assessment agency's target population in effectiveness research has increased concerns over the external validity, or "transportability", of this evidence, and has led to a preference for local data. Methods have been developed to enable a lack of transportability to be addressed, for example by accounting for cross-country differences in disease characteristics, but their consideration in health technology assessments is limited. This may be because of limited knowledge of the methods and/or uncertainties in how best to utilise them within existing health technology assessment frameworks. This article aims to provide an introduction to transportability, including a summary of its assumptions and the methods available for identifying and adjusting for a lack of transportability. It then discusses important considerations relating to their use in health technology assessment settings, including guidance on the identification of effect modifiers; guidance on the choice of target population, estimand, study sample, and methods; and how evaluations of transportability can be integrated into health technology assessment submission and decision processes.
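As one concrete example of the adjustment methods referred to above, transportability analyses often reweight the study sample by the inverse odds of selection (generic notation, not taken from the article):

    w_i = P(S = 0 | X_i) / P(S = 1 | X_i)    for participants in the study sample (S = 1)

where X contains the effect modifiers and S indicates membership of the study sample (S = 1) versus the target population (S = 0). Weighting the comparative analysis by w_i re-standardizes the effect estimate to the covariate distribution of the target population, provided the relevant effect modifiers are captured in X.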
Affiliation(s)
- Nick Latimer
- School of Health and Related Research, University of Sheffield, Sheffield, UK
- Delta Hat, Nottingham, UK
- Keith R Abrams
- Department of Statistics, University of Warwick, Coventry, UK
- Centre for Health Economics, University of York, York, UK
- Joshua Ray
- Global Access, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070 Basel, Switzerland.

9
Nießl C, Hoffmann S, Ullmann T, Boulesteix AL. Explaining the optimistic performance evaluation of newly proposed methods: A cross-design validation experiment. Biom J 2024; 66:e2200238. PMID: 36999395; DOI: 10.1002/bimj.202200238.
Abstract
The constant development of new data analysis methods in many fields of research is accompanied by an increasing awareness that these new methods often perform better in their introductory paper than in subsequent comparison studies conducted by other researchers. We attempt to explain this discrepancy by conducting a systematic experiment that we call "cross-design validation of methods". In the experiment, we select two methods designed for the same data analysis task, reproduce the results shown in each paper, and then reevaluate each method based on the study design (i.e., datasets, competing methods, and evaluation criteria) that was used to show the abilities of the other method. We conduct the experiment for two data analysis tasks, namely cancer subtyping using multiomic data and differential gene expression analysis. Three of the four methods included in the experiment indeed perform worse when they are evaluated on the new study design, which is mainly caused by the different datasets. Apart from illustrating the many degrees of freedom existing in the assessment of a method and their effect on its performance, our experiment suggests that the performance discrepancies between original and subsequent papers may not only be caused by the nonneutrality of the authors proposing the new method but also by differences regarding the level of expertise and field of application. Authors of new methods should thus focus not only on a transparent and extensive evaluation but also on comprehensive method documentation that enables the correct use of their methods in subsequent studies.
Affiliation(s)
- Christina Nießl
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany
- Munich Center for Machine Learning (MCML), Munich, Germany
- Sabine Hoffmann
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany
- Department of Statistics, LMU Munich, Munich, Germany
- Theresa Ullmann
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany

10
Friedrich S, Friede T. On the role of benchmarking data sets and simulations in method comparison studies. Biom J 2024; 66:e2200212. PMID: 36810737; DOI: 10.1002/bimj.202200212.
Abstract
Method comparisons are essential to provide recommendations and guidance for applied researchers, who often have to choose from a plethora of available approaches. While many comparisons exist in the literature, these are often not neutral but favor a novel method. Apart from the choice of design and a proper reporting of the findings, there are different approaches concerning the underlying data for such method comparison studies. Most manuscripts on statistical methodology rely on simulation studies and provide a single real-world data set as an example to motivate and illustrate the methodology investigated. In the context of supervised learning, in contrast, methods are often evaluated using so-called benchmarking data sets, that is, real-world data that serve as gold standard in the community. Simulation studies, on the other hand, are much less common in this context. The aim of this paper is to investigate differences and similarities between these approaches, to discuss their advantages and disadvantages, and ultimately to develop new approaches to the evaluation of methods picking the best of both worlds. To this aim, we borrow ideas from different contexts such as mixed methods research and Clinical Scenario Evaluation.
Affiliation(s)
- Sarah Friedrich
- Institute of Mathematics, University of Augsburg, Augsburg, Germany
- Centre for Advanced Analytics and Predictive Sciences, University of Augsburg, Augsburg, Germany
- Tim Friede
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee, Göttingen, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Göttingen, Göttingen, Germany

11
Leha A, Huber C, Friede T, Bauer T, Beckmann A, Bekeredjian R, Bleiziffer S, Herrmann E, Möllmann H, Walther T, Beyersdorf F, Hamm C, Künzi A, Windecker S, Stortecky S, Kutschka I, Hasenfuß G, Ensminger S, Frerker C, Seidler T. Challenges in developing and validating machine learning models for TAVI mortality risk prediction: reply. Eur Heart J Digit Health 2024; 5:3-5. PMID: 38264698; PMCID: PMC10802823; DOI: 10.1093/ehjdh/ztad065.
Affiliation(s)
- Andreas Leha
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, 37073 Göttingen, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, Robert-Koch str. 40, 37075 Göttingen, Germany
- Cynthia Huber
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, 37073 Göttingen, Germany
- Tim Friede
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, 37073 Göttingen, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, Robert-Koch str. 40, 37075 Göttingen, Germany
- Timm Bauer
- Department of Cardiology, Sana Klinikum Offenbach, Starkenburgring 66, 63069 Offenbach am Main, Germany
- Andreas Beckmann
- German Society for Thoracic and Cardiovascular Surgery, Langenbeck-Virchow-Haus, Luisenstraße 58/59, 10117 Berlin, Germany
- Department for Cardiac and Pediatric Cardiac Surgery, Heart Center Duisburg, EVKLN, Gerrickstr. 21, 47137 Duisburg, Germany
- Raffi Bekeredjian
- Department of Cardiology, Robert-Bosch-Krankenhaus, Auerbachstraße 110, 70376 Stuttgart, Germany
- Sabine Bleiziffer
- Clinic for Thoracic and Cardiovascular Surgery, Heart and Diabetes Center Northrhine-Westphalia, Georgstr 11, 32545 Bad Oeynhausen, Germany
- Eva Herrmann
- Goethe University Frankfurt, Department of Medicine, Institute of Biostatistics and Mathematical Modelling, Theodor-Stern-Kai 7, 60590 Frankfurt Main, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Rhine/Main, Theodor-Stern-Kai 7, 60590 Frankfurt Main, Germany
- Helge Möllmann
- Department of Cardiology, St.-Johannes-Hospital Dortmund, Johannesstrasse 9-17, 44137 Dortmund, Germany
- Thomas Walther
- Department of Cardiothoracic Surgery, University Hospital Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt, Germany
- Friedhelm Beyersdorf
- Medical Faculty of the Albert-Ludwigs-University Freiburg, University Hospital Freiburg, Hugstetterstr. 55, 79106 Freiburg, Germany
- Department of Cardiovascular Surgery, Heart Centre Freiburg University, Freiburg, Germany
- Christian Hamm
- Department of Cardiology and Angiology, University Hospital Gießen, Klinikstr. 33, 35392 Gießen, Germany
- Department of Cardiology, Kerckhoff Heart and Thorax Center, Benekestraße 2-8, D-61231 Bad Nauheim, Germany
- Arnaud Künzi
- CTU Bern, University of Bern, Mittelstrasse 43, 3012 Bern, Switzerland
- Stephan Windecker
- Department of Cardiology, Inselspital, Bern University Hospital, University of Bern, 3010 Bern, Switzerland
- Stefan Stortecky
- Department of Cardiology, Inselspital, Bern University Hospital, University of Bern, 3010 Bern, Switzerland
- Ingo Kutschka
- Clinic for Cardiothoracic and Vascular Surgery/Heart Center, University Medical Center Göttingen, Robert-Koch Str. 40, 37075 Göttingen, Germany
- Gerd Hasenfuß
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, Robert-Koch str. 40, 37075 Göttingen, Germany
- Clinic for Cardiology and Pulmonology, Heart Center, University Medical Center Göttingen, Robert-Koch Str. 40, 37075 Göttingen, Germany
- Stephan Ensminger
- Department of Cardiac and Thoracic Vascular Surgery, University Heart Center Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Lübeck, Germany
- Christian Frerker
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Lübeck, Germany
- Department of Cardiology, University Heart Center Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
- Tim Seidler
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, Robert-Koch str. 40, 37075 Göttingen, Germany
- Clinic for Cardiology and Pulmonology, Heart Center, University Medical Center Göttingen, Robert-Koch Str. 40, 37075 Göttingen, Germany

12
Geroldinger M, Verbeeck J, Thiel KE, Molenberghs G, Bathke AC, Laimer M, Zimmermann G. A neutral comparison of statistical methods for analyzing longitudinally measured ordinal outcomes in rare diseases. Biom J 2024; 66:e2200236. PMID: 36890631; DOI: 10.1002/bimj.202200236.
Abstract
Ordinal data in a repeated measures design of a crossover study for rare diseases usually do not allow for the use of standard parametric methods, and hence, nonparametric methods should be considered instead. However, only limited simulation studies in settings with small sample sizes exist. Therefore, starting from an Epidermolysis Bullosa simplex trial with the above-mentioned design, a rank-based approach using the R package nparLD and different generalized pairwise comparisons (GPC) methods were compared impartially in a simulation study. The results revealed that there was not one single best method for this particular design, because a trade-off exists between achieving high power, accounting for period effects, and for missing data. Specifically, nparLD as well as the unmatched GPC approaches do not address crossover aspects, and the univariate GPC variants partly ignore the longitudinal information. The matched GPC approaches, on the other hand, take the crossover effect into account in the sense of incorporating the within-subject association. Overall, the prioritized unmatched GPC method achieved the highest power in the simulation scenarios, although this may be due to the specified prioritization. The rank-based approach yielded good power even at a sample size of N = 6, whereas the matched GPC method could not control the type I error.
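For readers unfamiliar with generalized pairwise comparisons, the basic unmatched GPC statistic is the net treatment benefit over all between-group pairs (a generic form; the prioritized, matched, and longitudinal variants compared in the study build on this):

    NTB = P(X > Y) - P(X < Y)  ~  (pairs won by treatment - pairs lost) / (all treatment-control pairs)

where X and Y denote the outcomes of a treated and a control subject, larger values are taken as favorable, and ties count as neither win nor loss.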
Affiliation(s)
- Martin Geroldinger
- Team Biostatistics and Big Medical Data, IDA Lab Salzburg, Paracelsus Medical University, Salzburg, Austria
- Department of Research and Innovation, Paracelsus Medical University, Salzburg, Austria
- Johan Verbeeck
- Data Science Institute (DSI), Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Hasselt University, Hasselt, Belgium
- Konstantin E Thiel
- Team Biostatistics and Big Medical Data, IDA Lab Salzburg, Paracelsus Medical University, Salzburg, Austria
- Department of Research and Innovation, Paracelsus Medical University, Salzburg, Austria
- Geert Molenberghs
- Data Science Institute (DSI), Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Hasselt University, Hasselt, Belgium
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), KULeuven, Leuven, Belgium
- Arne C Bathke
- Intelligent Data Analytics (IDA) Lab Salzburg, Department of Artificial Intelligence and Human Interfaces, Faculty of Digital and Analytical Sciences, Paris Lodron University of Salzburg, Salzburg, Austria
- Martin Laimer
- Department of Dermatology and Allergology, Paracelsus Medical University, Salzburg, Austria
- Georg Zimmermann
- Team Biostatistics and Big Medical Data, IDA Lab Salzburg, Paracelsus Medical University, Salzburg, Austria
- Department of Research and Innovation, Paracelsus Medical University, Salzburg, Austria

13
Strobl C, Leisch F. Against the "one method fits all data sets" philosophy for comparison studies in methodological research. Biom J 2024; 66:e2200104. PMID: 36053253; DOI: 10.1002/bimj.202200104.
Abstract
Many methodological comparison studies aim at identifying a single or a few "best performing" methods over a certain range of data sets. In this paper we take a different viewpoint by asking whether the research question of identifying the best performing method is what we should be striving for in the first place. We will argue that this research question implies assumptions which we do not consider warranted in methodological research, that a different research question would be more informative, and how this research question can be fruitfully investigated.
Affiliation(s)
- Carolin Strobl
- Department of Psychology, University of Zurich, Zurich, Switzerland
- Friedrich Leisch
- Institute of Statistics, University of Natural Resources and Life Sciences, Vienna, Austria

14
Heinz P, Wendel-Garcia PD, Held U. Impact of the matching algorithm on the treatment effect estimate: A neutral comparison study. Biom J 2024; 66:e2100292. PMID: 35385172; DOI: 10.1002/bimj.202100292.
Abstract
Propensity score matching is increasingly being used in the medical literature. Choice of matching algorithms, reporting quality, and estimands are oftentimes not discussed. We evaluated the impact of propensity score matching algorithms, based on a recent clinical dataset, with three commonly used outcomes. The resulting estimands for different strengths of treatment effects were compared in a neutral comparison study and based on a thoroughly designed simulation study. Different algorithms yielded different levels of balance after matching. Along with full matching and genetic matching with replacement, good balance was achieved with nearest neighbor matching with caliper, although this discarded more than one fifth of the treated units. Average marginal treatment effect estimates were least biased with genetic or nearest neighbor matching, both with replacement, and full matching. Double adjustment yielded conditional treatment effects that were closer to the true values, throughout. The choice of the matching algorithm had an impact on covariate balance after matching as well as treatment effect estimates. In comparison, genetic matching with replacement yielded better covariate balance than all other matching algorithms. A literature review of the British Medical Journal and its subjournals revealed frequent use of propensity score matching; however, the use of different matching algorithms before treatment effect estimation was only reported in one out of 21 studies. Propensity score matching is a methodology for causal treatment effect estimation from observational data; however, the methodological difficulties and low reporting quality in applied medical research need to be addressed.
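As an illustration of one of the algorithms in this comparison, here is a minimal sketch of 1:1 nearest-neighbor matching without replacement, with a caliper of 0.2 standard deviations of the logit propensity score (the function name, the greedy matching order, and the logistic propensity model are illustrative assumptions, not the exact implementation evaluated in the paper):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def nn_caliper_match(X, treat, caliper_sd=0.2, seed=0):
        """1:1 nearest-neighbor matching without replacement on logit(PS), caliper = 0.2 SD."""
        ps = LogisticRegression(max_iter=1000).fit(X, treat).predict_proba(X)[:, 1]
        lps = np.log(ps / (1 - ps))                    # logit of the propensity score
        caliper = caliper_sd * lps.std()
        treated = np.flatnonzero(treat == 1)
        controls = list(np.flatnonzero(treat == 0))
        np.random.default_rng(seed).shuffle(treated)   # match treated units in random order
        pairs = []
        for t in treated:
            if not controls:
                break
            d = np.abs(lps[controls] - lps[t])
            j = int(np.argmin(d))
            if d[j] <= caliper:                        # drop treated unit if no control within caliper
                pairs.append((t, controls.pop(j)))
        return pairs                                   # list of (treated index, matched control index)

    # usage: X = covariate matrix (n x p), treat = 0/1 array of length n
    # pairs = nn_caliper_match(X, treat)

The treatment effect is then estimated on the matched pairs, optionally with double adjustment for the covariates as discussed in the abstract.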
Affiliation(s)
- Priska Heinz
- Epidemiology, Biostatistics and Prevention Institute, Department of Biostatistics, University of Zurich, Zurich, Switzerland
- Ulrike Held
- Epidemiology, Biostatistics and Prevention Institute, Department of Biostatistics, University of Zurich, Zurich, Switzerland

15
Sun S, Sechidis K, Chen Y, Lu J, Ma C, Mirshani A, Ohlssen D, Vandemeulebroecke M, Bornkamp B. Comparing algorithms for characterizing treatment effect heterogeneity in randomized trials. Biom J 2024; 66:e2100337. PMID: 36437036; DOI: 10.1002/bimj.202100337.
Abstract
The identification and estimation of heterogeneous treatment effects in biomedical clinical trials are challenging, because trials are typically planned to assess the treatment effect in the overall trial population. Nevertheless, the identification of how the treatment effect may vary across subgroups is of major importance for drug development. In this work, we review some existing simulation work and perform a simulation study to evaluate recent methods for identifying and estimating heterogeneous treatment effects using various metrics and scenarios relevant for drug development. Our focus is not only on a comparison of the methods in general, but on how well these methods perform in simulation scenarios that reflect real clinical trials. We provide the R package benchtm that can be used to simulate synthetic biomarker distributions based on real clinical trial data and to create interpretable scenarios to benchmark methods for identification and estimation of treatment effect heterogeneity.
Affiliation(s)
- Sophie Sun
- Advanced Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- Yao Chen
- Advanced Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- Jiarui Lu
- Advanced Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- Chong Ma
- Early Development Analytics, Novartis Pharmaceuticals Corporation, Cambridge, Massachusetts, USA
- Ardalan Mirshani
- Advanced Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- David Ohlssen
- Advanced Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- Björn Bornkamp
- Advanced Methodology and Data Science, Novartis Pharma AG, Basel, Switzerland

16
Huang Q, Trinquart L. Relative likelihood ratios for neutral comparisons of statistical tests in simulation studies. Biom J 2024; 66:e2200102. PMID: 36642800; DOI: 10.1002/bimj.202200102.
Abstract
When comparing the performance of two or more competing tests, simulation studies commonly focus on statistical power. However, if the sizes of the tests being compared differ either from one another or from the nominal size, comparing tests based on power alone may be misleading. By analogy with diagnostic accuracy studies, we introduce relative positive and negative likelihood ratios to factor in both power and size in the comparison of multiple tests. We derive sample size formulas for a comparative simulation study. As an example, we compared the performance of six statistical tests for small-study effects in meta-analyses of randomized controlled trials: Begg's rank correlation, Egger's regression, Schwarzer's method for sparse data, the trim-and-fill method, the arcsine-Thompson test, and Lin and Chu's combined test. We illustrate that comparing power alone, or power adjusted or penalized for size, can be misleading, and how the proposed likelihood ratio approach enables accurate comparison of the trade-off between power and size between competing tests.
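Using the diagnostic analogy from the abstract (power in the role of sensitivity, size in the role of the false-positive rate), a plausible formalization of the compared quantities is the following; this is our notation, and the exact definitions are given in the paper itself:

    LR+ = power / size,            LR- = (1 - power) / (1 - size)
    relative LR+ (A vs B) = LR+_A / LR+_B,    relative LR- (A vs B) = LR-_A / LR-_B

so that test A is favored over test B when its relative LR+ exceeds 1 and its relative LR- falls below 1.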
Affiliation(s)
- Qiuxi Huang
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
- Ludovic Trinquart
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
- Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA
- Tufts Clinical and Translational Science Institute, Tufts University, Boston, Massachusetts, USA

17
Pawel S, Kook L, Reeve K. Pitfalls and potentials in simulation studies: Questionable research practices in comparative simulation studies allow for spurious claims of superiority of any method. Biom J 2024; 66:e2200091. PMID: 36890629; DOI: 10.1002/bimj.202200091.
Abstract
Comparative simulation studies are workhorse tools for benchmarking statistical methods. As with other empirical studies, the success of simulation studies hinges on the quality of their design, execution, and reporting. If not conducted carefully and transparently, their conclusions may be misleading. In this paper, we discuss various questionable research practices, which may impact the validity of simulation studies, some of which cannot be detected or prevented by the current publication process in statistics journals. To illustrate our point, we invent a novel prediction method with no expected performance gain and benchmark it in a preregistered comparative simulation study. We show how easy it is to make the method appear superior over well-established competitor methods if questionable research practices are employed. Finally, we provide concrete suggestions for researchers, reviewers, and other academic stakeholders for improving the methodological quality of comparative simulation studies, such as preregistering simulation protocols, incentivizing neutral simulation studies, and code and data sharing.
Affiliation(s)
- Samuel Pawel
- Epidemiology, Biostatistics and Prevention Institute, Center for Reproducible Science, University of Zurich, Zurich, Switzerland
- Lucas Kook
- Epidemiology, Biostatistics and Prevention Institute, Center for Reproducible Science, University of Zurich, Zurich, Switzerland
- Kelly Reeve
- Epidemiology, Biostatistics and Prevention Institute, Center for Reproducible Science, University of Zurich, Zurich, Switzerland

18
Heinze G, Boulesteix AL, Kammer M, Morris TP, White IR. Phases of methodological research in biostatistics-Building the evidence base for new methods. Biom J 2024; 66:e2200222. PMID: 36737675; PMCID: PMC7615508; DOI: 10.1002/bimj.202200222.
Abstract
Although new biostatistical methods are published at a very high rate, many of these developments are not trustworthy enough to be adopted by the scientific community. We propose a framework to think about how a piece of methodological work contributes to the evidence base for a method. Similar to the well-known phases of clinical research in drug development, we propose to define four phases of methodological research. These four phases cover (I) proposing a new methodological idea while providing, for example, logical reasoning or proofs, (II) providing empirical evidence, first in a narrow target setting, then (III) in an extended range of settings and for various outcomes, accompanied by appropriate application examples, and (IV) investigations that establish a method as sufficiently well-understood to know when it is preferred over others and when it is not; that is, its pitfalls. We suggest basic definitions of the four phases to provoke thought and discussion rather than devising an unambiguous classification of studies into phases. Too many methodological developments finish before phase III/IV, but we give two examples with references. Our concept rebalances the emphasis to studies in phases III and IV, that is, carefully planned method comparison studies and studies that explore the empirical properties of existing methods in a wider range of problems.
Affiliation(s)
- Georg Heinze
- Center for Medical Data Science, Institute of Clinical Biometrics, Medical University of Vienna, Vienna, Austria
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians University of Munich, Munich, Germany
- Michael Kammer
- Center for Medical Data Science, Institute of Clinical Biometrics, Medical University of Vienna, Vienna, Austria
- Department of Medicine III, Division of Nephrology, Medical University of Vienna, Vienna, Austria
- Tim P. Morris
- MRC Clinical Trials Unit, Institute of Clinical Trials & Methodology, University College London, London, UK
- Ian R. White
- MRC Clinical Trials Unit, Institute of Clinical Trials & Methodology, University College London, London, UK

19
Kodalci L, Thas O. Neutralise: An open science initiative for neutral comparison of two-sample tests. Biom J 2024; 66:e2200237. PMID: 38285404; DOI: 10.1002/bimj.202200237.
Abstract
The two-sample problem is one of the earliest problems in statistics: given two samples, the question is whether or not the observations were sampled from the same distribution. Many statistical tests have been developed for this problem, and many tests have been evaluated in simulation studies, but hardly any study has tried to set up a neutral comparison study. In this paper, we introduce an open science initiative that potentially allows for neutral comparisons of two-sample tests. It is designed as an open-source R package, a repository, and an online R Shiny app. This paper describes the principles, the design of the system and illustrates the use of the system.
Affiliation(s)
- Leyla Kodalci
- I-BioStat, Data Science Institute, Hasselt University, Diepenbeek, Belgium
- Olivier Thas
- I-BioStat, Data Science Institute, Hasselt University, Diepenbeek, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- National Institute of Applied Statistics Research Australia (NIASRA), University of Wollongong, Wollongong, Australia

20
Infante G, Miceli R, Ambrogi F. Sample size and predictive performance of machine learning methods with survival data: A simulation study. Stat Med 2023; 42:5657-5675. PMID: 37947168; DOI: 10.1002/sim.9931.
Abstract
Prediction models are increasingly developed and used in diagnostic and prognostic studies, where the use of machine learning (ML) methods is becoming more and more popular over traditional regression techniques. For survival outcomes the Cox proportional hazards model is generally used, and it has been proven to achieve good prediction performances with few strong covariates. The possibility to improve the model performance by including nonlinearities, covariate interactions, and time-varying effects while controlling for overfitting must be carefully considered during the model building phase. On the other hand, ML techniques are able to learn complexities from data at the cost of hyper-parameter tuning and interpretability. One aspect of special interest is the sample size needed for developing a survival prediction model. While there is guidance when using traditional statistical models, the same does not apply when using ML techniques. This work develops a time-to-event simulation framework to evaluate the performances of Cox regression compared, among others, to tuned random survival forest, gradient boosting, and neural networks at varying sample sizes. Simulations were based on replications of subjects from publicly available databases, with event times simulated according to a Cox model with nonlinearities on continuous variables and time-varying effects, as well as on the SEER registry data.
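The standard recipe for simulating event times from a Cox model with linear predictor x*beta (Bender et al., Statistics in Medicine, 2005) inverts the cumulative baseline hazard; extensions with nonlinear and time-varying effects, as used in the study, build on the same idea:

    T = H0^{-1}( -log(U) * exp(-x*beta) ),    U ~ Uniform(0, 1)

which for a Weibull baseline hazard H0(t) = lambda * t^nu reduces to T = ( -log(U) / (lambda * exp(x*beta)) )^(1/nu).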
Affiliation(s)
- Gabriele Infante
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
- Unit of Biostatistics for Clinical Research, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Rosalba Miceli
- Unit of Biostatistics for Clinical Research, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Federico Ambrogi
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
- Scientific Directorate, IRCCS Policlinico San Donato, San Donato Milanese, Italy

21
Abell L, Maher F, Jennings AC, Gray LJ. A systematic review of simulation studies which compare existing statistical methods to account for non-compliance in randomised controlled trials. BMC Med Res Methodol 2023; 23:300. PMID: 38104108; PMCID: PMC10724933; DOI: 10.1186/s12874-023-02126-w.
Abstract
INTRODUCTION: Non-compliance is a common challenge for researchers and may reduce the power of an intention-to-treat analysis. Whilst a per protocol approach attempts to deal with this issue, it can result in biased estimates. Several methods to resolve this issue have been identified in previous reviews, but there is limited evidence supporting their use. This review aimed to identify simulation studies which compare such methods, assess the extent to which certain methods have been investigated and determine their performance under various scenarios.
METHODS: A systematic search of several electronic databases including MEDLINE and Scopus was carried out from conception to 30th November 2022. Included papers were published in a peer-reviewed journal, readily available in the English language and focused on comparing relevant methods in a superiority randomised controlled trial under a simulation study. Articles were screened using these criteria and a predetermined extraction form used to identify relevant information. A quality assessment appraised the risk of bias in individual studies. Extracted data was synthesised using tables, figures and a narrative summary. Both screening and data extraction were performed by two independent reviewers with disagreements resolved by consensus.
RESULTS: Of 2325 papers identified, 267 full texts were screened and 17 studies finally included. Twelve methods were identified across papers. Instrumental variable methods were commonly considered, but many authors found them to be biased in some settings. Non-compliance was generally assumed to be all-or-nothing and only occurring in the intervention group, although some methods considered it as time-varying. Simulation studies commonly varied the level and type of non-compliance and factors such as effect size and strength of confounding. The quality of papers was generally good, although some lacked detail and justification. Therefore, their conclusions were deemed to be less reliable.
CONCLUSIONS: It is common for papers to consider instrumental variable methods but more studies are needed that consider G-methods and compare a wide range of methods in realistic scenarios. It is difficult to make conclusions about the best method to deal with non-compliance due to a limited body of evidence and the difficulty in combining results from independent simulation studies.
PROSPERO REGISTRATION NUMBER: CRD42022370910.
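For context on the instrumental variable methods that dominate the reviewed literature, the simplest such estimator under all-or-nothing non-compliance is the Wald (complier average causal effect) estimator; this is textbook background rather than a result of the review:

    CACE = ( E[Y | Z = 1] - E[Y | Z = 0] ) / ( E[A | Z = 1] - E[A | Z = 0] )

i.e., the intention-to-treat effect on the outcome Y divided by the effect of randomization Z on treatment received A, valid under randomization, the exclusion restriction, and monotonicity of compliance.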
Affiliation(s)
- Lucy Abell
- Department of Population Health Sciences, University of Leicester, Leicester, UK
- Francesca Maher
- Department of Population Health Sciences, University of Leicester, Leicester, UK
- Angus C Jennings
- Department of Population Health Sciences, University of Leicester, Leicester, UK
- Laura J Gray
- Department of Population Health Sciences, University of Leicester, Leicester, UK.

22
Young NT, Matz RL, Bell EF, Hayward C. How researchers calculate students' grade point average in other courses has minimal impact. PLoS One 2023; 18:e0290109. PMID: 37594958; PMCID: PMC10437965; DOI: 10.1371/journal.pone.0290109.
Abstract
Grade point average in "other" courses (GPAO) is an increasingly common measure used to control for prior academic performance and to predict future academic performance. In previous work, there are two distinct approaches to calculating GPAO, one based on only courses taken concurrently (term GPAO) and one based on all previous courses taken (cumulative GPAO). To our knowledge, no one has studied whether these methods for calculating the GPAO result in equivalent analyses and conclusions. As researchers often use one definition or the other without comment on why that choice was made, if the two calculations of GPAO are different, researchers might be inducing systematic error into their results and publishing potentially inaccurate conclusions. We looked at more than 3,700 courses at a public, research-intensive university over a decade and found limited evidence that the choice of GPAO calculation affects the conclusions. At most, one in seven courses could be affected. Further analysis suggests that there may be situations where one form of GPAO may be preferred over the other when it comes to examining inequity in courses or predicting student grades. However, we did not find sufficient evidence to universally recommend one form of GPAO over the other.
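The two GPAO definitions compared in this study can be stated compactly in code. The sketch below is illustrative only, with a hypothetical grade table and column names; it simply shows that term GPAO averages grades in courses taken concurrently with the focal course, while cumulative GPAO averages grades from all earlier terms.

```python
import pandas as pd

# Hypothetical grade records: two students, two terms, 4.0-scale grades.
records = pd.DataFrame({
    "student": ["A"] * 5 + ["B"] * 4,
    "term":    [1, 1, 2, 2, 2, 1, 1, 2, 2],
    "course":  ["MATH", "CHEM", "PHYS", "BIO", "STAT",
                "MATH", "HIST", "PHYS", "ENGL"],
    "grade":   [3.7, 3.0, 2.7, 3.3, 4.0, 2.3, 3.0, 3.7, 3.3],
})

def gpao(df: pd.DataFrame, student: str, focal_course: str, focal_term: int):
    """Return (term GPAO, cumulative GPAO) for one student's focal course."""
    own = df[df.student == student]
    other = own[own.course != focal_course]          # exclude the focal course
    term_gpao = other.loc[other.term == focal_term, "grade"].mean()
    cumulative_gpao = other.loc[other.term < focal_term, "grade"].mean()
    return term_gpao, cumulative_gpao

print(gpao(records, "A", "PHYS", focal_term=2))      # e.g. (3.65, 3.35)
```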
Collapse
Affiliation(s)
- Nicholas T. Young
- Center for Academic Innovation, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Rebecca L. Matz
- Center for Academic Innovation, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Eric F. Bell
- Department of Astronomy, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Caitlin Hayward
- Center for Academic Innovation, University of Michigan, Ann Arbor, Michigan, United States of America
| |
Collapse
|
23
|
Kernfeld E, Yang Y, Weinstock JS, Battle A, Cahan P. A systematic comparison of computational methods for expression forecasting. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.28.551039. [PMID: 37577640 PMCID: PMC10418073 DOI: 10.1101/2023.07.28.551039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/15/2023]
Abstract
Due to the abundance of single cell RNA-seq data, a number of methods for predicting expression after perturbation have recently been published. Expression prediction methods are enticing because they promise to answer pressing questions in fields ranging from developmental genetics to cell fate engineering and because they are faster, cheaper, and higher-throughput than their experimental counterparts. However, the absolute and relative accuracy of these methods is poorly characterized, limiting their informed use, their improvement, and the interpretation of their predictions. To address these issues, we created a benchmarking platform that combines a panel of large-scale perturbation datasets with an expression forecasting software engine that encompasses or interfaces to current methods. We used our platform to systematically assess methods, parameters, and sources of auxiliary data. We found that uninformed baseline predictions, which were not always included in prior evaluations, yielded the same or better mean absolute error than benchmarked methods in all test cases. These results cast doubt on the ability of current expression forecasting methods to provide mechanistic insights or to rank hypotheses for experimental follow-up. However, given the rapid pace of innovation in the field, new approaches may yield more accurate expression predictions. Our platform will serve as a neutral benchmark to improve methods and to identify contexts in which expression prediction can succeed.
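The baseline comparison highlighted above is easy to reproduce in outline. The sketch below uses simulated numbers rather than the authors' benchmark data and stands a generic "method" against the uninformed prediction that post-perturbation expression equals the control mean, scoring both by mean absolute error.

```python
import numpy as np

rng = np.random.default_rng(0)
genes, perturbations = 2_000, 50

control_mean = rng.normal(5.0, 1.0, size=genes)

# True post-perturbation profiles: mostly unchanged, a few responsive genes.
true = np.tile(control_mean, (perturbations, 1))
responsive = rng.choice(genes, size=(perturbations, 20))
for i in range(perturbations):
    true[i, responsive[i]] += rng.normal(0.0, 2.0, size=20)
true += rng.normal(0.0, 0.3, size=true.shape)        # measurement noise

# A hypothetical forecasting method: here just the control mean plus noise,
# standing in for whatever model is being benchmarked.
method_prediction = np.tile(control_mean, (perturbations, 1)) \
    + rng.normal(0.0, 0.5, size=true.shape)
baseline_prediction = np.tile(control_mean, (perturbations, 1))

mae = lambda pred: np.abs(pred - true).mean()
print(f"MAE, benchmarked method : {mae(method_prediction):.3f}")
print(f"MAE, uninformed baseline: {mae(baseline_prediction):.3f}")
```

If a proposed method cannot beat this kind of no-change baseline on held-out perturbations, its predictions carry little usable signal, which is the central observation of the benchmark described above.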
Collapse
|
24
|
Dhiman P, Ma J, Andaur Navarro CL, Speich B, Bullock G, Damen JAA, Hooft L, Kirtley S, Riley RD, Van Calster B, Moons KGM, Collins GS. Overinterpretation of findings in machine learning prediction model studies in oncology: a systematic review. J Clin Epidemiol 2023; 157:120-133. [PMID: 36935090 DOI: 10.1016/j.jclinepi.2023.03.012] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2022] [Revised: 03/03/2023] [Accepted: 03/14/2023] [Indexed: 03/19/2023]
Abstract
OBJECTIVES In biomedical research, spin is the overinterpretation of findings, and it is a growing concern. To date, the presence of spin has not been evaluated in prognostic model research in oncology, including studies developing and validating models for individualized risk prediction. STUDY DESIGN AND SETTING We conducted a systematic review, searching MEDLINE and EMBASE for oncology-related studies that developed and validated a prognostic model using machine learning published between 1st January, 2019, and 5th September, 2019. We used existing spin frameworks and described areas of highly suggestive spin practices. RESULTS We included 62 publications (including 152 developed models; 37 validated models). Reporting was inconsistent between methods and the results in 27% of studies due to additional analysis and selective reporting. Thirty-two studies (out of 36 applicable studies) reported comparisons between developed models in their discussion and predominantly used discrimination measures to support their claims (78%). Thirty-five studies (56%) used an overly strong or leading word in their title, abstract, results, discussion, or conclusion. CONCLUSION The potential for spin needs to be considered when reading, interpreting, and using studies that developed and validated prognostic models in oncology. Researchers should carefully report their prognostic model research using words that reflect their actual results and strength of evidence.
Collapse
Affiliation(s)
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK.
| | - Jie Ma
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
| | - Constanza L Andaur Navarro
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Benjamin Speich
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; Meta-Research Centre, Department of Clinical Research, University Hospital Basel, University of Basel, Basel, Switzerland
| | - Garrett Bullock
- Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Johanna A A Damen
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Lotty Hooft
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Shona Kirtley
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
| | - Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire, UK, ST5 5BG
| | - Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; EPI-centre, KU Leuven, Leuven, Belgium
| | - Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| |
Collapse
|
25
|
Hu J, Szymczak S. A review on longitudinal data analysis with random forest. Brief Bioinform 2023; 24:6991123. [PMID: 36653905 PMCID: PMC10025446 DOI: 10.1093/bib/bbad002] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2022] [Revised: 12/12/2022] [Accepted: 12/31/2022] [Indexed: 01/20/2023] Open
Abstract
In longitudinal studies variables are measured repeatedly over time, leading to clustered and correlated observations. If the goal of the study is to develop prediction models, machine learning approaches such as the powerful random forest (RF) are often promising alternatives to standard statistical methods, especially in the context of high-dimensional data. In this paper, we review extensions of the standard RF method for the purpose of longitudinal data analysis. Extension methods are categorized according to the data structures for which they are designed. We consider both univariate and multivariate response longitudinal data and further categorize the repeated measurements according to whether the time effect is relevant. Even though most extensions are proposed for low-dimensional data, some can be applied to high-dimensional data. Information of available software implementations of the reviewed extensions is also given. We conclude with discussions on the limitations of our review and some future research directions.
Collapse
Affiliation(s)
- Jianchang Hu
- Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Silke Szymczak
- Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
| |
Collapse
|
26
|
Ullmann T, Peschel S, Finger P, Müller CL, Boulesteix AL. Over-optimism in unsupervised microbiome analysis: Insights from network learning and clustering. PLoS Comput Biol 2023; 19:e1010820. [PMID: 36608142 PMCID: PMC9873197 DOI: 10.1371/journal.pcbi.1010820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2022] [Revised: 01/24/2023] [Accepted: 12/15/2022] [Indexed: 01/07/2023] Open
Abstract
In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity. Many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and might be tempted to try different methods on their dataset to look for the "best" ones. However, if only the best results are selectively reported, this may cause over-optimism: the "best" method is overly fitted to the specific dataset, and the results might be non-replicable on validation data. Such effects will ultimately hinder research progress. Yet so far, these topics have been given little attention in the context of unsupervised microbiome analysis. In our illustrative study, we aim to quantify over-optimism effects in this context. We model the approach of a hypothetical microbiome researcher who undertakes four unsupervised research tasks: clustering of bacterial genera, hub detection in microbial networks, differential microbial network analysis, and clustering of samples. While these tasks are unsupervised, the researcher might still have certain expectations as to what constitutes interesting results. We translate these expectations into concrete evaluation criteria that the hypothetical researcher might want to optimize. We then randomly split an exemplary dataset from the American Gut Project into discovery and validation sets multiple times. For each research task, multiple method combinations (e.g., methods for data normalization, network generation, and/or clustering) are tried on the discovery data, and the combination that yields the best result according to the evaluation criterion is chosen. While the hypothetical researcher might only report this result, we also apply the "best" method combination to the validation dataset. The results are then compared between discovery and validation data. In all four research tasks, there are notable over-optimism effects; the results on the validation data set are worse compared to the discovery data, averaged over multiple random splits into discovery/validation data. Our study thus highlights the importance of validation and replication in microbiome analysis to obtain reliable results and demonstrates that the issue of over-optimism goes beyond the context of statistical testing and fishing for significance.
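The over-optimism mechanism described above can be re-enacted schematically. The sketch below uses synthetic data rather than the American Gut Project data: several clustering configurations are tried on a discovery split, the best silhouette score is selected, and the chosen configuration is then re-scored on a validation split, where its apparent advantage typically shrinks.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 30))            # weakly structured "samples x taxa"

discovery, validation = train_test_split(X, test_size=0.5, random_state=3)

candidates = {
    f"{algo}-k{k}": (algo, k)
    for algo in ("kmeans", "ward")
    for k in (2, 3, 4, 5, 6)
}

def cluster_score(data, algo, k):
    model = (KMeans(n_clusters=k, n_init=10, random_state=0)
             if algo == "kmeans"
             else AgglomerativeClustering(n_clusters=k, linkage="ward"))
    labels = model.fit_predict(data)
    return silhouette_score(data, labels)

disc_scores = {name: cluster_score(discovery, *cfg)
               for name, cfg in candidates.items()}
best = max(disc_scores, key=disc_scores.get)       # selected "best" pipeline
print(f"best on discovery : {best} -> {disc_scores[best]:.3f}")
print(f"same on validation: {cluster_score(validation, *candidates[best]):.3f}")
```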
Collapse
Affiliation(s)
- Theresa Ullmann
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-Universität München, München, Germany
- Munich Center for Machine Learning (MCML), München, Germany
| | - Stefanie Peschel
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, Neuherberg, Germany
- Department of Statistics, Ludwig-Maximilians-Universität München, München, Germany
| | - Philipp Finger
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-Universität München, München, Germany
| | - Christian L. Müller
- Department of Statistics, Ludwig-Maximilians-Universität München, München, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Center for Computational Mathematics, Flatiron Institute, New York, New York, United States of America
| | - Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-Universität München, München, Germany
- Munich Center for Machine Learning (MCML), München, Germany
| |
Collapse
|
27
|
Chowdhury MZI, Leung AA, Walker RL, Sikdar KC, O’Beirne M, Quan H, Turin TC. A comparison of machine learning algorithms and traditional regression-based statistical modeling for predicting hypertension incidence in a Canadian population. Sci Rep 2023; 13:13. [PMID: 36593280 PMCID: PMC9807553 DOI: 10.1038/s41598-022-27264-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2022] [Accepted: 12/29/2022] [Indexed: 01/03/2023] Open
Abstract
Risk prediction models are frequently used to identify individuals at risk of developing hypertension. This study evaluates different machine learning algorithms and compares their predictive performance with the conventional Cox proportional hazards (PH) model to predict hypertension incidence using survival data. This study analyzed 18,322 participants on 24 candidate features from the large Alberta's Tomorrow Project (ATP) to develop different prediction models. To select the top features, we applied five feature selection methods, including two filter-based: a univariate Cox p-value and C-index; two embedded-based: random survival forest and least absolute shrinkage and selection operator (Lasso); and one constraint-based: the statistically equivalent signature (SES). Five machine learning algorithms were developed to predict hypertension incidence: penalized regression Ridge, Lasso, Elastic Net (EN), random survival forest (RSF), and gradient boosting (GB), along with the conventional Cox PH model. The predictive performance of the models was assessed using C-index. The performance of machine learning algorithms was observed, similar to the conventional Cox PH model. Average C-indexes were 0.78, 0.78, 0.78, 0.76, 0.76, and 0.77 for Ridge, Lasso, EN, RSF, GB and Cox PH, respectively. Important features associated with each model were also presented. Our study findings demonstrate little predictive performance difference between machine learning algorithms and the conventional Cox PH regression model in predicting hypertension incidence. In a moderate dataset with a reasonable number of features, conventional regression-based models perform similar to machine learning algorithms with good predictive accuracy.
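A compressed version of the comparison reported above is sketched below, using simulated survival data rather than the ATP cohort and the scikit-survival package. It fits a conventional Cox proportional hazards model and a random survival forest and reports Harrell's C-index on held-out data; the data-generating process and settings are illustrative assumptions.

```python
import numpy as np
from sksurv.util import Surv
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import concordance_index_censored

rng = np.random.default_rng(7)
n, p = 2_000, 10
X = rng.normal(size=(n, p))
linpred = X[:, 0] + 0.5 * X[:, 1]                   # two informative features
time = rng.exponential(scale=np.exp(-linpred))      # proportional hazards DGP
censor = rng.exponential(scale=2.0, size=n)
event = time <= censor
observed = np.minimum(time, censor)
y = Surv.from_arrays(event=event, time=observed)

train, test = slice(0, 1_500), slice(1_500, None)

for name, model in [("Cox PH", CoxPHSurvivalAnalysis()),
                    ("Random survival forest",
                     RandomSurvivalForest(n_estimators=200, random_state=0))]:
    model.fit(X[train], y[train])
    risk = model.predict(X[test])                   # higher = higher risk
    cindex = concordance_index_censored(event[test], observed[test], risk)[0]
    print(f"{name:24s} C-index = {cindex:.3f}")
```

With a data-generating process this simple and linear, the two approaches give similar discrimination, echoing the study's finding that flexible learners offer little gain over the Cox model in moderate, well-behaved datasets.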
Collapse
Affiliation(s)
- Mohammad Ziaul Islam Chowdhury
- Department of Community Health Sciences, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada; Department of Family Medicine, University of Calgary, 3330 Hospital Drive NW, Calgary, AB T2N 4N1, Canada; Present Address: Department of Psychiatry, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada
| | - Alexander A. Leung
- Department of Community Health Sciences, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada; Department of Medicine, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada
| | - Robin L. Walker
- Department of Community Health Sciences, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada; Primary Health Care Integration Network, Primary Health Care, Alberta Health Services, Calgary, AB, Canada
| | - Khokan C. Sikdar
- Health Status Assessment, Surveillance and Reporting, Public Health Surveillance and Infrastructure, Provincial Population and Public Health, Alberta Health Services, 10101 Southport Rd. SW, Calgary, AB T2W 3N2, Canada
| | - Maeve O’Beirne
- Department of Family Medicine, University of Calgary, 3330 Hospital Drive NW, Calgary, AB T2N 4N1, Canada
| | - Hude Quan
- Department of Community Health Sciences, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada
| | - Tanvir C. Turin
- Department of Community Health Sciences, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada; Department of Family Medicine, University of Calgary, 3330 Hospital Drive NW, Calgary, AB T2N 4N1, Canada
| |
Collapse
|
28
|
White IR. The importance of plausible data-generating mechanisms in simulation studies: A response to 'Comparing methods for handling missing covariates in meta-regression' by Lee and Beretvas (doi: 10.1002/jrsm.1585). Res Synth Methods 2023; 14:137-139. [PMID: 36181469 DOI: 10.1002/jrsm.1605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2022] [Accepted: 07/05/2022] [Indexed: 01/18/2023]
Abstract
The paper by Lee and Beretvas (doi:10.1002/jrsm.1585) described a well-executed simulation study comparing 'modern' with 'ad hoc' methods for performing meta-regression when some covariates are incomplete. However, they drew practical conclusions after simulating data under a single missing data mechanism which favoured the 'modern' methods, while other missing data mechanisms would have favoured the 'ad hoc' methods. Broad recommendations about methods to use in practice should instead be based on simulation studies using a range of plausible data-generating mechanisms. This range must represent what is believed likely to occur in practice, and not what is convenient for statistical analysis.
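The point about data-generating mechanisms can be made concrete with a small example. The sketch below is not the design used by Lee and Beretvas; it simply generates a covariate that is missing under three textbook mechanisms (MCAR, MAR, MNAR) at a similar overall rate and shows that the complete-case mean is biased only under some of them, illustrating why the choice of simulated mechanism can determine which method appears to perform best.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 5_000
x = rng.normal(size=n)            # covariate that may be missing (true mean 0)
z = rng.normal(size=n)            # fully observed companion covariate

def missing_mask(mechanism: str) -> np.ndarray:
    if mechanism == "MCAR":       # missingness unrelated to x or z
        prob = np.full(n, 0.3)
    elif mechanism == "MAR":      # depends only on the fully observed z
        prob = 1 / (1 + np.exp(-(z - 0.8)))
    elif mechanism == "MNAR":     # depends on the unobserved value of x itself
        prob = 1 / (1 + np.exp(-(x - 0.8)))
    else:
        raise ValueError(mechanism)
    return rng.uniform(size=n) < prob

for mech in ("MCAR", "MAR", "MNAR"):
    mask = missing_mask(mech)
    print(f"{mech}: {mask.mean():.0%} missing, "
          f"complete-case mean of x = {x[~mask].mean():+.3f} (true 0)")
```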
Collapse
|
29
|
Kanduri C, Scheffer L, Pavlović M, Rand KD, Chernigovskaya M, Pirvandy O, Yaari G, Greiff V, Sandve GK. simAIRR: simulation of adaptive immune repertoires with realistic receptor sequence sharing for benchmarking of immune state prediction methods. Gigascience 2022; 12:giad074. [PMID: 37848619 PMCID: PMC10580376 DOI: 10.1093/gigascience/giad074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2023] [Revised: 07/20/2023] [Accepted: 08/29/2023] [Indexed: 10/19/2023] Open
Abstract
BACKGROUND Machine learning (ML) has gained significant attention for classifying immune states in adaptive immune receptor repertoires (AIRRs) to support the advancement of immunodiagnostics and therapeutics. Simulated data are crucial for the rigorous benchmarking of AIRR-ML methods. Existing approaches to generating synthetic benchmarking datasets result in the generation of naive repertoires missing the key feature of many shared receptor sequences (selected for common antigens) found in antigen-experienced repertoires. RESULTS We demonstrate that a common approach to generating simulated AIRR benchmark datasets can introduce biases, which may be exploited for undesired shortcut learning by certain ML methods. To mitigate undesirable access to true signals in simulated AIRR datasets, we devised a simulation strategy (simAIRR) that constructs antigen-experienced-like repertoires with a realistic overlap of receptor sequences. simAIRR can be used for constructing AIRR-level benchmarks based on a range of assumptions (or experimental data sources) for what constitutes receptor-level immune signals. This includes the possibility of making or not making any prior assumptions regarding the similarity or commonality of immune state-associated sequences that will be used as true signals. We demonstrate the real-world realism of our proposed simulation approach by showing that basic ML strategies perform similarly on simAIRR-generated and real-world experimental AIRR datasets. CONCLUSIONS This study sheds light on the potential shortcut learning opportunities for ML methods that can arise with the state-of-the-art way of simulating AIRR datasets. simAIRR is available as a Python package: https://github.com/KanduriC/simAIRR.
Collapse
Affiliation(s)
- Chakravarthi Kanduri
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
- UiORealArt Convergence Environment, University of Oslo, 0373 Oslo, Norway
| | - Lonneke Scheffer
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
| | - Milena Pavlović
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
- UiORealArt Convergence Environment, University of Oslo, 0373 Oslo, Norway
| | - Knut Dagestad Rand
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
| | - Maria Chernigovskaya
- Department of Immunology and Oslo University Hospital, University of Oslo, 0373 Oslo, Norway
| | - Oz Pirvandy
- Faculty of Engineering, Bar-Ilan University, 5290002, Israel
| | - Gur Yaari
- Faculty of Engineering, Bar-Ilan University, 5290002, Israel
| | - Victor Greiff
- Department of Immunology and Oslo University Hospital, University of Oslo, 0373 Oslo, Norway
| | - Geir K Sandve
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
- UiORealArt Convergence Environment, University of Oslo, 0373 Oslo, Norway
| |
Collapse
|
30
|
Lohmann A, Astivia OLO, Morris TP, Groenwold RHH. It's time! Ten reasons to start replicating simulation studies. FRONTIERS IN EPIDEMIOLOGY 2022; 2:973470. [PMID: 38455335 PMCID: PMC10911016 DOI: 10.3389/fepid.2022.973470] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/20/2022] [Accepted: 08/17/2022] [Indexed: 03/09/2024]
Abstract
The quantitative analysis of research data is a core element of empirical research. The performance of statistical methods that are used for analyzing empirical data can be evaluated and compared using computer simulations. A single simulation study can influence the analyses of thousands of empirical studies to follow. With great power comes great responsibility. Here, we argue that this responsibility includes replication of simulation studies to ensure a sound foundation for data analytical decisions. Furthermore, being designed, run, and reported by humans, simulation studies face challenges similar to other experimental empirical research and hence should not be exempt from replication attempts. We highlight that the potential replicability of simulation studies is an opportunity quantitative methodology as a field should pay more attention to.
Collapse
Affiliation(s)
- Anna Lohmann
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, Netherlands
| | | | - Tim P. Morris
- MRC Clinical Trials Unit at UCL, Institute of Clinical Trials and Methodology, University College London, London, United Kingdom
| | - Rolf H. H. Groenwold
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, Netherlands
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
| |
Collapse
|
31
|
Bracher-Smith M, Rees E, Menzies G, Walters JTR, O'Donovan MC, Owen MJ, Kirov G, Escott-Price V. Machine learning for prediction of schizophrenia using genetic and demographic factors in the UK biobank. Schizophr Res 2022; 246:156-164. [PMID: 35779327 PMCID: PMC9399753 DOI: 10.1016/j.schres.2022.06.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/01/2022] [Revised: 06/01/2022] [Accepted: 06/11/2022] [Indexed: 01/29/2023]
Abstract
Machine learning (ML) holds promise for precision psychiatry, but its predictive performance is unclear. We assessed whether ML provided added value over logistic regression for prediction of schizophrenia, and compared models built using polygenic risk scores (PRS) or clinical/demographic factors. LASSO and ridge-penalised logistic regression, support vector machines (SVM), random forests, boosting, neural networks and stacked models were trained to predict schizophrenia, using PRS for schizophrenia (PRSSZ), sex, parental depression, educational attainment, winter birth, handedness and number of siblings as predictors. Models were evaluated for discrimination using area under the receiver operator characteristic curve (AUROC) and relative importance of predictors using permutation feature importance (PFI). In a secondary analysis, fitted models were tested for association with schizophrenia-related traits which had not been used in model development. Following learning curve analysis, 738 cases and 3690 randomly sampled controls were selected from the UK Biobank. ML models combining all predictors showed the highest discrimination (linear SVM, AUROC = 0.71), but did not significantly outperform logistic regression. AUROC was robust over 100 random resamples of controls. PFI identified PRSSZ as the most important predictor. Highest variance in fitted models was explained by schizophrenia-related traits including fluid intelligence (most associated: linear SVM), digit symbol substitution (RBF SVM), BMI (XGBoost), smoking status (XGBoost) and deprivation (linear SVM). In conclusion, ML approaches did not provide substantial added value for prediction of schizophrenia over logistic regression, as indexed by AUROC; however, risk scores derived with different ML approaches differ with respect to association with schizophrenia-related traits.
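A schematic of the evaluation pipeline described above is sketched below on synthetic data rather than UK Biobank records: a penalised logistic regression and a linear support vector machine are compared by AUROC, and permutation feature importance ranks the predictors. The feature names are hypothetical stand-ins for the PRS and demographic variables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=4_000, n_features=7, n_informative=3,
                           weights=[0.8, 0.2], random_state=0)
feature_names = ["PRS", "sex", "parental_depression", "education",
                 "winter_birth", "handedness", "n_siblings"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "ridge logistic regression": LogisticRegression(penalty="l2", C=1.0,
                                                    max_iter=1_000),
    "linear SVM": SVC(kernel="linear", probability=True, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auroc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC = {auroc:.3f}")

# Permutation feature importance for the last fitted model.
pfi = permutation_importance(model, X_te, y_te, scoring="roc_auc",
                             n_repeats=20, random_state=0)
for idx in pfi.importances_mean.argsort()[::-1]:
    print(f"{feature_names[idx]:22s} {pfi.importances_mean[idx]:+.4f}")
```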
Collapse
Affiliation(s)
- Matthew Bracher-Smith
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK; Dementia Research Institute, Cardiff University, UK
| | - Elliott Rees
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK
| | | | - James T R Walters
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK
| | - Michael C O'Donovan
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK
| | - Michael J Owen
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK
| | - George Kirov
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK
| | - Valentina Escott-Price
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK.
| |
Collapse
|
32
|
Nabirotchkin S, Bouaziz J, Glibert F, Mandel J, Foucquier J, Hajj R, Callizot N, Cholet N, Guedj M, Cohen D. Combinational Drug Repurposing from Genetic Networks Applied to Alzheimer’s Disease. J Alzheimers Dis 2022; 88:1585-1603. [DOI: 10.3233/jad-220120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Background: Human diseases are multi-factorial biological phenomena resulting from perturbations of numerous functional networks. The complex nature of human diseases explains frequently observed marginal or transitory efficacy of mono-therapeutic interventions. For this reason, combination therapy is being increasingly evaluated as a biologically plausible strategy for reversing disease state, fostering the development of dedicated methodological and experimental approaches. In parallel, genome-wide association studies (GWAS) provide a prominent opportunity for disclosing human-specific therapeutic targets and rational drug repurposing. Objective: In this context, our objective was to elaborate an integrated computational platform to accelerate discovery and experimental validation of synergistic combinations of repurposed drugs for treatment of common human diseases. Methods: The proposed approach combines adapted statistical analysis of GWAS data, pathway-based functional annotation of genetic findings using gene set enrichment technique, computational reconstruction of signaling networks enriched in disease-associated genes, selection of candidate repurposed drugs and proof-of-concept combinational experimental screening. Results: It enables robust identification of signaling pathways enriched in disease susceptibility loci. Therapeutic targeting of the disease-associated signaling networks provides a reliable way for rational drug repurposing and rapid development of synergistic drug combinations for common human diseases. Conclusion: Here we demonstrate the feasibility and efficacy of the proposed approach with an experiment application to Alzheimer’s disease.
Collapse
|
33
|
Austin PC, Harrell FE, Lee DS, Steyerberg EW. Empirical analyses and simulations showed that different machine and statistical learning methods had differing performance for predicting blood pressure. Sci Rep 2022; 12:9312. [PMID: 35660759 PMCID: PMC9166797 DOI: 10.1038/s41598-022-13015-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2022] [Accepted: 05/19/2022] [Indexed: 12/20/2022] Open
Abstract
Machine learning is increasingly being used to predict clinical outcomes. Most comparisons of different methods have been based on empirical analyses in specific datasets. We used Monte Carlo simulations to determine when machine learning methods perform better than statistical learning methods in a specific setting. We evaluated six learning methods: stochastic gradient boosting machines using trees as the base learners, random forests, artificial neural networks, the lasso, ridge regression, and linear regression estimated using ordinary least squares (OLS). Our simulations were informed by empirical analyses in patients with acute myocardial infarction (AMI) and congestive heart failure (CHF) and used six data-generating processes, each based on one of the six learning methods, to simulate continuous outcomes in the derivation and validation samples. The outcome was systolic blood pressure at hospital discharge, a continuous outcome. We applied the six learning methods in each of the simulated derivation samples and evaluated performance in the simulated validation samples. The primary observation was that neural networks tended to result in estimates with worse predictive accuracy than the other five methods in both disease samples and across all six data-generating processes. Boosted trees and OLS regression tended to perform well across a range of scenarios.
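The simulation logic summarised above follows a standard pattern that the sketch below condenses, using an assumed linear data-generating process rather than the authors' AMI/CHF-informed processes: derivation and validation samples are drawn repeatedly, each learner is fitted on the derivation sample, and validation root mean squared error is averaged over repetitions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)

def simulate(n, coefs, noise_sd=10.0):
    """Linear data-generating process for a continuous outcome (e.g. SBP)."""
    X = rng.normal(size=(n, len(coefs)))
    y = 130 + X @ coefs + rng.normal(scale=noise_sd, size=n)
    return X, y

coefs = np.array([5.0, -3.0, 2.0, 0.0, 0.0])
learners = {
    "OLS": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "boosted trees": GradientBoostingRegressor(random_state=0),
}

n_reps, results = 20, {name: [] for name in learners}
for _ in range(n_reps):
    X_dev, y_dev = simulate(1_000, coefs)   # derivation sample
    X_val, y_val = simulate(1_000, coefs)   # validation sample
    for name, model in learners.items():
        model.fit(X_dev, y_dev)
        rmse = mean_squared_error(y_val, model.predict(X_val)) ** 0.5
        results[name].append(rmse)

for name, rmses in results.items():
    print(f"{name:14s} mean validation RMSE = {np.mean(rmses):.2f}")
```

In the full study the same design is repeated with each of the six methods serving in turn as the data-generating process, which is what allows the authors to say when a given learner is favoured.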
Collapse
Affiliation(s)
- Peter C Austin
- ICES, G106, 2075 Bayview Avenue, Toronto, ON, M4N 3M5, Canada; Department of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada; Schulich Heart Research Program, Sunnybrook Research Institute, Toronto, ON, Canada.
| | - Frank E Harrell
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Douglas S Lee
- ICES, G106, 2075 Bayview Avenue, Toronto, ON, M4N 3M5, Canada; Department of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada; Faculty of Medicine, University of Toronto, Toronto, ON, Canada
| | - Ewout W Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, The Netherlands
| |
Collapse
|
34
|
Smith H, Sweeting M, Morris T, Crowther MJ. A scoping methodological review of simulation studies comparing statistical and machine learning approaches to risk prediction for time-to-event data. Diagn Progn Res 2022; 6:10. [PMID: 35650647 PMCID: PMC9161606 DOI: 10.1186/s41512-022-00124-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/01/2021] [Accepted: 03/01/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND There is substantial interest in the adaptation and application of so-called machine learning approaches to prognostic modelling of censored time-to-event data. These methods must be compared and evaluated against existing methods in a variety of scenarios to determine their predictive performance. A scoping review of how machine learning methods have been compared to traditional survival models is important to identify the comparisons that have been made and issues where they are lacking, biased towards one approach or misleading. METHODS We conducted a scoping review of research articles published between 1 January 2000 and 2 December 2020 using PubMed. Eligible articles were those that used simulation studies to compare statistical and machine learning methods for risk prediction with a time-to-event outcome in a medical/healthcare setting. We focus on data-generating mechanisms (DGMs), the methods that have been compared, the estimands of the simulation studies, and the performance measures used to evaluate them. RESULTS A total of ten articles were identified as eligible for the review. Six of the articles evaluated a method that was developed by the authors, four of which were machine learning methods, and the results almost always stated that this developed method's performance was equivalent to or better than the other methods compared. Comparisons were often biased towards the novel approach, with the majority only comparing against a basic Cox proportional hazards model, and in scenarios where it is clear it would not perform well. In many of the articles reviewed, key information was unclear, such as the number of simulation repetitions and how performance measures were calculated. CONCLUSION It is vital that method comparisons are unbiased and comprehensive, and this should be the goal even if realising it is difficult. Fully assessing how newly developed methods perform and how they compare to a variety of traditional statistical methods for prognostic modelling is imperative as these methods are already being applied in clinical contexts. Evaluations of the performance and usefulness of recently developed methods for risk prediction should be continued and reporting standards improved as these methods become increasingly popular.
Collapse
Affiliation(s)
- Hayley Smith
- Department of Health Sciences, University of Leicester, Leicester, LE1 7RH, UK
| | - Michael Sweeting
- Department of Health Sciences, University of Leicester, Leicester, LE1 7RH, UK
- Statistical Innovation, Oncology Biometrics, Oncology R&D, AstraZeneca, Cambridge, UK
| | - Tim Morris
- MRC Clinical Trials Unit at UCL, 90 High Holborn, London, WC1V 6LJ, UK
| | - Michael J. Crowther
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| |
Collapse
|
35
|
Richter J, Friede T, Rahnenführer J. Improving adaptive seamless designs through Bayesian optimization. Biom J 2022; 64:948-963. [PMID: 35212423 DOI: 10.1002/bimj.202000389] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Revised: 08/29/2021] [Accepted: 10/01/2021] [Indexed: 11/07/2022]
Abstract
We propose to use Bayesian optimization (BO) to improve the efficiency of the design selection process in clinical trials. BO is a method to optimize expensive black-box functions, by using a regression as a surrogate to guide the search. In clinical trials, planning test procedures and sample sizes is a crucial task. A common goal is to maximize the test power, given a set of treatments, corresponding effect sizes, and a total number of samples. From a wide range of possible designs, we aim to select the best one in a short time to allow quick decisions. The standard approach to simulate the power for each single design can become too time consuming. When the number of possible designs becomes very large, either large computational resources are required or an exhaustive exploration of all possible designs takes too long. Here, we propose to use BO to quickly find a clinical trial design with high power from a large number of candidate designs. We demonstrate the effectiveness of our approach by optimizing the power of adaptive seamless designs for different sets of treatment effect sizes. Comparing BO with an exhaustive evaluation of all candidate designs shows that BO finds competitive designs in a fraction of the time.
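A toy version of this idea is sketched below; it is not the authors' adaptive seamless design. The "design" being optimised is simply the fraction of a fixed budget allocated to the treatment arm when the two arms have unequal variances, the objective is Monte Carlo simulated power, and scikit-optimize's Gaussian-process optimiser searches over the allocation without simulating every candidate.

```python
import numpy as np
from scipy import stats
from skopt import gp_minimize
from skopt.space import Real

rng = np.random.default_rng(5)
N_TOTAL, EFFECT, SD_CONTROL, SD_TREAT = 200, 0.5, 1.0, 2.0

def simulated_power(frac_treat: float, n_sim: int = 400) -> float:
    """Monte Carlo power of a Welch t-test for a given allocation fraction."""
    n_t = max(2, int(round(frac_treat * N_TOTAL)))
    n_c = max(2, N_TOTAL - n_t)
    rejections = 0
    for _ in range(n_sim):
        treat = rng.normal(EFFECT, SD_TREAT, size=n_t)
        control = rng.normal(0.0, SD_CONTROL, size=n_c)
        p = stats.ttest_ind(treat, control, equal_var=False).pvalue
        rejections += p < 0.05
    return rejections / n_sim

# gp_minimize minimises, so the negative simulated power is the objective.
result = gp_minimize(lambda x: -simulated_power(x[0]),
                     dimensions=[Real(0.2, 0.8, name="frac_treat")],
                     n_calls=25, random_state=0)
print(f"selected allocation to treatment: {result.x[0]:.2f}, "
      f"estimated power: {-result.fun:.2f}")
```

The surrogate model smooths over the Monte Carlo noise in the power estimates, which is exactly why Bayesian optimisation suits noisy, expensive design evaluations.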
Collapse
Affiliation(s)
- Jakob Richter
- Fakultät Statistik, Technische Universität Dortmund, Dortmund, Germany
| | - Tim Friede
- Institut für Medizinische Statistik, Universitätsmedizin Göttingen, Göttingen, Germany; Deutsches Zentrum für Herz-Kreislauf-Forschung (DZHK), Standort Göttingen, Göttingen, Germany
| | - Jörg Rahnenführer
- Fakultät Statistik, Technische Universität Dortmund, Dortmund, Germany
| |
Collapse
|
36
|
A systematic survey of methods guidance suggests areas for improvement regarding access, development, and transparency. J Clin Epidemiol 2022; 149:217-226. [DOI: 10.1016/j.jclinepi.2022.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2022] [Revised: 05/01/2022] [Accepted: 05/15/2022] [Indexed: 11/17/2022]
|
37
|
Ullmann T, Beer A, Hünemörder M, Seidl T, Boulesteix AL. Over-optimistic evaluation and reporting of novel cluster algorithms: an illustrative study. ADV DATA ANAL CLASSI 2022. [DOI: 10.1007/s11634-022-00496-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
When researchers publish new cluster algorithms, they usually demonstrate the strengths of their novel approaches by comparing the algorithms’ performance with existing competitors. However, such studies are likely to be optimistically biased towards the new algorithms, as the authors have a vested interest in presenting their method as favorably as possible in order to increase their chances of getting published. Therefore, the superior performance of newly introduced cluster algorithms is over-optimistic and might not be confirmed in independent benchmark studies performed by neutral and unbiased authors. This problem is known among many researchers, but so far, the different mechanisms leading to over-optimism in cluster algorithm evaluation have never been systematically studied and discussed. Researchers are thus often not aware of the full extent of the problem. We present an illustrative study to illuminate the mechanisms by which authors—consciously or unconsciously—paint their cluster algorithm’s performance in an over-optimistic light. Using the recently published cluster algorithm Rock as an example, we demonstrate how optimization of the used datasets or data characteristics, of the algorithm’s parameters and of the choice of the competing cluster algorithms leads to Rock’s performance appearing better than it actually is. Our study is thus a cautionary tale that illustrates how easy it can be for researchers to claim apparent “superiority” of a new cluster algorithm. This illuminates the vital importance of strategies for avoiding the problems of over-optimism (such as, e.g., neutral benchmark studies), which we also discuss in the article.
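One of the mechanisms listed above, tuning the new method on the comparison datasets while running competitors at their defaults, is illustrated in the sketch below. KMeans stands in for the "novel" algorithm rather than Rock, and all datasets are synthetic.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

datasets = [make_blobs(n_samples=300, centers=4, cluster_std=std, random_state=i)
            for i, std in enumerate([1.0, 2.0, 3.0])]

def best_tuned_kmeans(X, y_true):
    """'Novel' method: report the best ARI over a grid of its own settings."""
    scores = [adjusted_rand_score(
                  y_true,
                  KMeans(n_clusters=k, n_init=n_init, random_state=0).fit_predict(X))
              for k in (2, 3, 4, 5, 6) for n_init in (1, 10)]
    return max(scores)

def default_competitor(X, y_true):
    """Competitor: a single run with an arbitrary default number of clusters."""
    labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
    return adjusted_rand_score(y_true, labels)

for i, (X, y_true) in enumerate(datasets):
    print(f"dataset {i}: tuned 'novel' ARI = {best_tuned_kmeans(X, y_true):.2f}, "
          f"default competitor ARI = {default_competitor(X, y_true):.2f}")
```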
Collapse
|
38
|
Gardner PP, Paterson JM, McGimpsey S, Ashari-Ghomi F, Umu SU, Pawlik A, Gavryushkin A, Black MA. Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software. Genome Biol 2022; 23:56. [PMID: 35172880 PMCID: PMC8851831 DOI: 10.1186/s13059-022-02625-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2021] [Accepted: 02/06/2022] [Indexed: 11/29/2022] Open
Abstract
Background Computational biology provides software tools for testing and making inferences about biological data. In the face of increasing volumes of data, heuristic methods that trade software speed for accuracy may be employed. We have studied these trade-offs using the results of a large number of independent software benchmarks, and evaluated whether external factors, including speed, author reputation, journal impact, recency and developer efforts, are indicative of accurate software. Results We find that software speed, author reputation, journal impact, number of citations and age are unreliable predictors of software accuracy. This is unfortunate because these are frequently cited reasons for selecting software tools. However, GitHub-derived statistics and high version numbers show that accurate bioinformatic software tools are generally the product of many improvements over time. We also find an excess of slow and inaccurate bioinformatic software tools, and this is consistent across many sub-disciplines. There are few tools that are middle-of-road in terms of accuracy and speed trade-offs. Conclusions Our findings indicate that accurate bioinformatic software is primarily the product of long-term commitments to software development. In addition, we hypothesise that bioinformatics software suffers from publication bias. Software that is intermediate in terms of both speed and accuracy may be difficult to publish—possibly due to author, editor and reviewer practises. This leaves an unfortunate hole in the literature, as ideal tools may fall into this gap. High accuracy tools are not always useful if they are slow, while high speed is not useful if the results are also inaccurate. Supplementary Information The online version contains supplementary material available at (10.1186/s13059-022-02625-x).
Collapse
Affiliation(s)
- Paul P Gardner
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand.
| | - James M Paterson
- Department of Civil and Natural Resources Engineering, University of Canterbury, Christchurch, New Zealand
| | | | - Fatemeh Ashari-Ghomi
- Research Group for Genomic Epidemiology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Sinan U Umu
- Department of Research, Cancer Registry of Norway, Oslo, Norway
| | | | - Alex Gavryushkin
- Department of Computer Science, University of Otago, Dunedin, New Zealand; School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
| | - Michael A Black
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
| |
Collapse
|
39
|
Westphal M, Zapf A, Brannath W. A multiple testing framework for diagnostic accuracy studies with co-primary endpoints. Stat Med 2022; 41:891-909. [PMID: 35075684 DOI: 10.1002/sim.9308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Revised: 12/12/2021] [Accepted: 12/17/2021] [Indexed: 11/08/2022]
Abstract
Major advances have been made regarding the utilization of machine learning techniques for disease diagnosis and prognosis based on complex and high-dimensional data. Despite all justified enthusiasm, overoptimistic assessments of predictive performance are still common in this area. However, predictive models and medical devices based on such models should undergo a throughout evaluation before being implemented into clinical practice. In this work, we propose a multiple testing framework for (comparative) phase III diagnostic accuracy studies with sensitivity and specificity as co-primary endpoints. Our approach challenges the frequent recommendation to strictly separate model selection and evaluation, that is, to only assess a single diagnostic model in the evaluation study. We show that our parametric simultaneous test procedure asymptotically allows strong control of the family-wise error rate. A multiplicity correction is also available for point and interval estimates. Moreover, we demonstrate in an extensive simulation study that our multiple testing strategy on average leads to a better final diagnostic model and increased statistical power. To plan such studies, we propose a Bayesian approach to determine the optimal number of models to evaluate simultaneously. For this purpose, our algorithm optimizes the expected final model performance given previous (hold-out) data from the model development phase. We conclude that an assessment of multiple promising diagnostic models in the same evaluation study has several advantages when suitable adjustments for multiple comparisons are employed.
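A deliberately simplified sketch of this setting follows. It evaluates several candidate diagnostic models on the same hypothetical study, requires each to beat benchmarks on both co-primary endpoints, and adjusts only with a Bonferroni correction across models, which is a cruder stand-in for the parametric simultaneous procedure proposed in the paper; all counts and performance values are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
SENS_BENCHMARK, SPEC_BENCHMARK, ALPHA = 0.75, 0.75, 0.025

# Hypothetical evaluation study: 200 diseased and 300 healthy subjects,
# three candidate models with assumed true sensitivities/specificities.
n_pos, n_neg = 200, 300
models = {"model A": (0.84, 0.78),
          "model B": (0.80, 0.83),
          "model C": (0.77, 0.76)}

adjusted_alpha = ALPHA / len(models)        # Bonferroni over candidate models
for name, (true_sens, true_spec) in models.items():
    tp = rng.binomial(n_pos, true_sens)     # true positives among diseased
    tn = rng.binomial(n_neg, true_spec)     # true negatives among healthy
    p_sens = stats.binomtest(tp, n_pos, SENS_BENCHMARK,
                             alternative="greater").pvalue
    p_spec = stats.binomtest(tn, n_neg, SPEC_BENCHMARK,
                             alternative="greater").pvalue
    # Co-primary endpoints: both must clear the (adjusted) significance level.
    success = (p_sens < adjusted_alpha) and (p_spec < adjusted_alpha)
    print(f"{name}: sens {tp/n_pos:.2f}, spec {tn/n_neg:.2f}, "
          f"both endpoints significant after adjustment: {success}")
```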
Collapse
Affiliation(s)
- Max Westphal
- Institute for Statistics, University of Bremen, Bremen, Germany; Fraunhofer Institute for Digital Medicine MEVIS, Max-Von-Laue-Straße 2, 28359, Bremen, Germany
| | - Antonia Zapf
- Institute of Medical Biometry and Epidemiology, UKE Hamburg, Hamburg, Germany
| | - Werner Brannath
- Institute for Statistics, University of Bremen, Bremen, Germany; Competence Center for Clinical Trials Bremen, University of Bremen, Bremen, Germany
| |
Collapse
|
40
|
|
41
|
Mejia D, Diaz M, Charry A, Enciso K, Ramírez O, Burkart S. "Stay at Home": The Effects of the COVID-19 Lockdown on Household Food Waste in Colombia. Front Psychol 2021; 12:764715. [PMID: 34777172 PMCID: PMC8581448 DOI: 10.3389/fpsyg.2021.764715] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2021] [Accepted: 09/30/2021] [Indexed: 11/13/2022] Open
Abstract
Household food waste represents one of the main challenges for sustainable development as this directly affects the economy of food consumers, the loss of natural resources and generates additional greenhouse gas emissions. The COVID-19 pandemic and its mitigation strategies caused one of the most serious economic crises in recent decades and could become the worst economic crisis that Latin America has had in its history. The objective of this study is to analyze changes in food waste behavior during the COVID-19 lockdown in Colombia in 2020, applying the Theory of Planned Behavior (TPB). For this purpose, we conducted a survey with 581 Colombian food consumers, which examined the influence of intentions to not waste food, subjective norms, some situational predictors, questions related to the COVID-19 pandemic, and the control of perceived behavior on food waste. The results suggest that the TPB can predict the intention to not waste food and, through it, the actual household food waste behavior, considering the lockdown in Colombia as an external shock. We observe that regarding the intention to not waste food, the most relevant variables are attitudes, subjective norms, control of the perceived behavior, and concerns regarding the Covid-19 pandemic. These variables increase the probability on average by a 0.8 Odds Ratio that the intention not to waste food increases, too. Regarding food waste behavior, whether it is considered ordinal or nominal, we see that the most relevant variables are intention, financial attitudes, and control of perceived behavior, doubling the probability that food waste behavior will improve. Based on the results, we provide recommendations for interested stakeholders that can help in the design of instruments for household food waste reduction.
Collapse
Affiliation(s)
| | - Manuel Diaz
- Alliance Bioversity International and CIAT, Cali, Colombia
| | - Andres Charry
- Alliance Bioversity International and CIAT, Cali, Colombia
| | - Karen Enciso
- Alliance Bioversity International and CIAT, Cali, Colombia
| | | | - Stefan Burkart
- Alliance Bioversity International and CIAT, Cali, Colombia
| |
Collapse
|
42
|
Peschel S, Müller CL, von Mutius E, Boulesteix AL, Depner M. NetCoMi: network construction and comparison for microbiome data in R. Brief Bioinform 2021; 22:bbaa290. [PMID: 33264391 PMCID: PMC8293835 DOI: 10.1093/bib/bbaa290] [Citation(s) in RCA: 143] [Impact Index Per Article: 47.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2020] [Revised: 09/24/2020] [Accepted: 10/07/2020] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Estimating microbial association networks from high-throughput sequencing data is a common exploratory data analysis approach aiming at understanding the complex interplay of microbial communities in their natural habitat. Statistical network estimation workflows comprise several analysis steps, including methods for zero handling, data normalization and computing microbial associations. Since microbial interactions are likely to change between conditions, e.g. between healthy individuals and patients, identifying network differences between groups is often an integral secondary analysis step. Thus far, however, no unifying computational tool is available that facilitates the whole analysis workflow of constructing, analysing and comparing microbial association networks from high-throughput sequencing data. RESULTS Here, we introduce NetCoMi (Network Construction and comparison for Microbiome data), an R package that integrates existing methods for each analysis step in a single reproducible computational workflow. The package offers functionality for constructing and analysing single microbial association networks as well as quantifying network differences. This enables insights into whether single taxa, groups of taxa or the overall network structure change between groups. NetCoMi also contains functionality for constructing differential networks, thus allowing to assess whether single pairs of taxa are differentially associated between two groups. Furthermore, NetCoMi facilitates the construction and analysis of dissimilarity networks of microbiome samples, enabling a high-level graphical summary of the heterogeneity of an entire microbiome sample collection. We illustrate NetCoMi's wide applicability using data sets from the GABRIELA study to compare microbial associations in settled dust from children's rooms between samples from two study centers (Ulm and Munich). AVAILABILITY R scripts used for producing the examples shown in this manuscript are provided as supplementary data. The NetCoMi package, together with a tutorial, is available at https://github.com/stefpeschel/NetCoMi. CONTACT Tel:+49 89 3187 43258; stefanie.peschel@mail.de. SUPPLEMENTARY INFORMATION Supplementary data are available at Briefings in Bioinformatics online.
Collapse
Affiliation(s)
- Stefanie Peschel
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
| | - Christian L Müller
- Department of Statistics, LMU München, Munich, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
- Center for Computational Mathematics, Flatiron Institute, New York, USA
| | - Erika von Mutius
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
- Dr von Hauner Children’s Hospital, LMU München, Munich, Germany
- Comprehensive Pneumology Center Munich (CPC-M), Member of the German Center for Lung Research, Munich, Germany
| | - Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU München, Munich, Germany
| | - Martin Depner
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
| |
Collapse
|
43
|
Buchka S, Hapfelmeier A, Gardner PP, Wilson R, Boulesteix AL. On the optimistic performance evaluation of newly introduced bioinformatic methods. Genome Biol 2021; 22:152. [PMID: 33975646 PMCID: PMC8111726 DOI: 10.1186/s13059-021-02365-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2020] [Accepted: 04/23/2021] [Indexed: 12/03/2022] Open
Abstract
Most research articles presenting new data analysis methods claim that "the new method performs better than existing methods," but the veracity of such statements is questionable. Our manuscript discusses and illustrates consequences of the optimistic bias occurring during the evaluation of novel data analysis methods, that is, all biases resulting from, for example, selection of datasets or competing methods, better ability to fix bugs in a preferred method, and selective reporting of method variants. We quantitatively investigate this bias using an example from epigenetic analysis: normalization methods for data generated by the Illumina HumanMethylation450K BeadChip microarray.
Collapse
Affiliation(s)
- Stefan Buchka
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU, Munich, Germany
- Alexander Hapfelmeier
- Institute of Medical Informatics, Statistics and Epidemiology, School of Medicine, TUM, Munich, Germany
- Institute of General Practice and Health Services Research, School of Medicine, TUM, Munich, Germany
- Paul P. Gardner
- Department of Biochemistry, University of Otago, Otago, New Zealand
- Rory Wilson
- Research Unit Molecular Epidemiology, Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU, Munich, Germany
44
Hleap JS, Littlefair JE, Steinke D, Hebert PDN, Cristescu ME. Assessment of current taxonomic assignment strategies for metabarcoding eukaryotes. Mol Ecol Resour 2021; 21:2190-2203. [PMID: 33905615 DOI: 10.1111/1755-0998.13407] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2020] [Revised: 03/08/2021] [Accepted: 04/19/2021] [Indexed: 01/04/2023]
Abstract
The effective use of metabarcoding in biodiversity science has brought important analytical challenges due to the need to generate accurate taxonomic assignments. The assignment of sequences to genus or species level is critical for biodiversity surveys and biomonitoring, but it is particularly challenging as researchers must select the approach that best recovers information on species composition. This study evaluates the performance and accuracy of seven methods in recovering the species composition of mock communities by using COI barcode fragments. The mock communities varied in species number and specimen abundance, while upstream molecular and bioinformatic variables were held constant. We evaluated the impact of parameter optimization on the quality of the predictions. Our results indicate that BLAST top hit competes well with more complex approaches if optimized for the mock community under study. For example, the two machine learning methods that were benchmarked proved more sensitive to reference database heterogeneity and completeness than methods based on sequence similarity. The accuracy of assignments was impacted by both species and specimen counts (query compositional heterogeneity), which ultimately influence the selection of appropriate software. We urge researchers to: (i) use realistic mock communities to allow optimization of parameters, regardless of the taxonomic assignment method employed; (ii) carefully choose and curate the reference databases, paying particular attention to their completeness; and (iii) use QIIME, BLAST or LCA methods in conjunction with parameter tuning to better assign taxonomy to diverse communities, especially when information on species diversity is lacking for the area under study.
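As a sketch of how a benchmark of this kind scores predictions against a mock community, the snippet below computes per-read precision and recall plus species-level recovery for a handful of made-up reads; the species names, the unassigned read and the particular metrics are illustrative assumptions, not the evaluation protocol of this study.

```python
# Hypothetical mock community (truth) and predicted assignments (names invented)
mock_truth = {"read1": "Daphnia pulex", "read2": "Daphnia pulex",
              "read3": "Gammarus fossarum", "read4": "Bosmina longirostris"}
predicted  = {"read1": "Daphnia pulex", "read2": "Daphnia magna",
              "read3": "Gammarus fossarum", "read4": None}  # None = unassigned

def assignment_metrics(truth, pred):
    """Per-read accuracy plus species-level recovery of the mock composition."""
    assigned = {r: s for r, s in pred.items() if s is not None}
    correct = sum(1 for r, s in assigned.items() if truth[r] == s)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(truth)
    true_species = set(truth.values())
    found_species = set(assigned.values())
    return {"read_precision": precision,
            "read_recall": recall,
            "species_recovered": len(true_species & found_species) / len(true_species),
            "false_species": len(found_species - true_species)}

print(assignment_metrics(mock_truth, predicted))
```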
Affiliation(s)
- Jose S Hleap
- Department of Biology, McGill University, Montreal, QC, Canada
- SHARCNET, University of Guelph, Guelph, ON, Canada
- Fundacion SQUALUS, Cali, Colombia
- Joanne E Littlefair
- Department of Biology, McGill University, Montreal, QC, Canada
- Queen Mary University of London, London, UK
- Dirk Steinke
- Centre for Biodiversity Genomics & Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
- Paul D N Hebert
- Centre for Biodiversity Genomics & Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
45
Quaak M, van de Mortel L, Thomas RM, van Wingen G. Deep learning applications for the classification of psychiatric disorders using neuroimaging data: Systematic review and meta-analysis. Neuroimage Clin 2021; 30:102584. [PMID: 33677240 PMCID: PMC8209481 DOI: 10.1016/j.nicl.2021.102584] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2020] [Revised: 01/18/2021] [Accepted: 01/29/2021] [Indexed: 12/20/2022]
Abstract
Deep learning (DL) methods have been increasingly applied to neuroimaging data to identify patients with psychiatric and neurological disorders. This review provides an overview of the different DL applications within psychiatry and compares DL model accuracy to standard machine learning (SML). Fifty-three articles were included for qualitative analysis, primarily investigating autism spectrum disorder (ASD; n = 22), schizophrenia (SZ; n = 22) and attention-deficit/hyperactivity disorder (ADHD; n = 9). Thirty-two of the thirty-five studies that directly compared DL to SML reported a higher accuracy for DL. Only sixteen studies could be included in a meta-regression to quantitatively compare DL and SML performance. This showed a higher odds ratio for DL models, though the comparison attained significance only for ASD. Our results suggest that deep learning of neuroimaging data is a promising tool for the classification of individual psychiatric patients. However, it is not yet used to its full potential: most studies use pre-engineered features, whereas one of the main advantages of DL is its ability to learn representations of minimally processed data. Our current evaluation is limited by the minimal reporting of performance measures needed for quantitative comparisons, and by the restriction to ADHD, SZ and ASD, as current research focuses on large publicly available datasets. To truly uncover the added value of DL, we need carefully designed comparisons of SML and DL models, which are still rarely performed.
Affiliation(s)
- Mirjam Quaak
- Amsterdam UMC, University of Amsterdam, Department of Psychiatry, Meibergdreef 5, 1105 AZ Amsterdam, The Netherlands
- Laurens van de Mortel
- Amsterdam UMC, University of Amsterdam, Department of Psychiatry, Meibergdreef 5, 1105 AZ Amsterdam, The Netherlands
- Rajat Mani Thomas
- Amsterdam UMC, University of Amsterdam, Department of Psychiatry, Meibergdreef 5, 1105 AZ Amsterdam, The Netherlands
- Guido van Wingen
- Amsterdam UMC, University of Amsterdam, Department of Psychiatry, Meibergdreef 5, 1105 AZ Amsterdam, The Netherlands
46
Baali I, Erten C, Kazan H. DriveWays: a method for identifying possibly overlapping driver pathways in cancer. Sci Rep 2020; 10:21971. [PMID: 33319839 PMCID: PMC7738685 DOI: 10.1038/s41598-020-78852-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2020] [Accepted: 11/19/2020] [Indexed: 11/22/2022] Open
Abstract
The majority of previous methods for identifying cancer driver modules output nonoverlapping modules. This assumption is biologically inaccurate as genes can participate in multiple molecular pathways. This is particularly true for cancer-associated genes, as many of them are network hubs connecting functionally distinct sets of genes. It is important to provide combinatorial optimization problem definitions modeling this biological phenomenon and to suggest efficient algorithms for its solution. We provide a formal definition of the Overlapping Driver Module Identification in Cancer (ODMIC) problem. We show that the problem is NP-hard. We propose a seed-and-extend based heuristic named DriveWays that identifies overlapping cancer driver modules from the graph built from the IntAct PPI network. DriveWays incorporates mutual exclusivity, coverage, and the network connectivity information of the genes. We show that DriveWays outperforms state-of-the-art methods in recovering well-known cancer driver genes when applied to TCGA pan-cancer data. Additionally, DriveWays' output modules show a stronger enrichment for the reference pathways in almost all cases. Overall, we show that enabling modules to overlap improves the recovery of functional pathways filtered with known cancer drivers, which essentially constitute the reference set of cancer-related pathways.
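The following toy sketch illustrates a generic seed-and-extend heuristic with a simple coverage-minus-overlap score (rewarding mutual exclusivity) on a miniature interaction graph. It is not the DriveWays algorithm; the gene names, mutation table and module size limit are invented for the example.

```python
# Toy inputs (hypothetical gene names): a small interaction graph and a
# gene -> set-of-mutated-patients table.
edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")}
mutations = {"A": {1, 2}, "B": {3, 4}, "C": {2, 5}, "D": {5, 6}, "E": {1, 6, 7}}

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def score(module):
    """Coverage minus coverage overlap: high when mutations rarely co-occur in a patient."""
    covered = set().union(*(mutations[g] for g in module))
    total = sum(len(mutations[g]) for g in module)
    return len(covered) - (total - len(covered))

def seed_and_extend(seed, max_size=3):
    """Greedily extend a single seed gene along graph edges; modules may overlap."""
    module = {seed}
    while len(module) < max_size:
        candidates = set().union(*(neighbors[g] for g in module)) - module
        best = max(candidates, key=lambda g: score(module | {g}), default=None)
        if best is None or score(module | {best}) <= score(module):
            break
        module.add(best)
    return frozenset(module)

modules = {seed_and_extend(g) for g in mutations}
for m in sorted(modules, key=score, reverse=True):
    print(sorted(m), "score =", score(m))
```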
Affiliation(s)
- Ilyes Baali
- Electrical and Computer Engineering Graduate Program, Antalya Bilim University, 07190, Antalya, Turkey
- Cesim Erten
- Department of Computer Engineering, Antalya Bilim University, 07190, Antalya, Turkey
- Hilal Kazan
- Department of Computer Engineering, Antalya Bilim University, 07190, Antalya, Turkey
47
Bokulich NA, Ziemski M, Robeson MS, Kaehler BD. Measuring the microbiome: Best practices for developing and benchmarking microbiomics methods. Comput Struct Biotechnol J 2020; 18:4048-4062. [PMID: 33363701 PMCID: PMC7744638 DOI: 10.1016/j.csbj.2020.11.049] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2020] [Revised: 11/27/2020] [Accepted: 11/28/2020] [Indexed: 12/12/2022] Open
Abstract
Microbiomes are integral components of diverse ecosystems, and increasingly recognized for their roles in the health of humans, animals, plants, and other hosts. Given their complexity (both in composition and function), the effective study of microbiomes (microbiomics) relies on the development, optimization, and validation of computational methods for analyzing microbial datasets, such as from marker-gene (e.g., 16S rRNA gene) and metagenome data. This review describes best practices for benchmarking and implementing computational methods (and software) for studying microbiomes, with particular focus on unique characteristics of microbiomes and microbiomics data that should be taken into account when designing and testing microbiomics methods.
Affiliation(s)
- Nicholas A. Bokulich
- Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zurich, Switzerland
- Michal Ziemski
- Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zurich, Switzerland
- Michael S. Robeson
- University of Arkansas for Medical Sciences, Department of Biomedical Informatics, Little Rock, AR, USA
48
Kreutz C, Can NS, Bruening RS, Meyberg R, Mérai Z, Fernandez-Pozo N, Rensing SA. A blind and independent benchmark study for detecting differentially methylated regions in plants. Bioinformatics 2020; 36:3314-3321. [PMID: 32181821 DOI: 10.1093/bioinformatics/btaa191] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2019] [Revised: 01/31/2020] [Accepted: 03/13/2020] [Indexed: 01/03/2023] Open
Abstract
MOTIVATION Bisulfite sequencing (BS-seq) is a state-of-the-art technique for investigating DNA methylation to gain insights into epigenetic regulation. Several algorithms have been published for the identification of differentially methylated regions (DMRs). However, the relative performance of the individual methods remains unclear, and it is difficult to optimally select an algorithm in application settings. RESULTS We analyzed BS-seq data from four plants covering three taxonomic groups. We first characterized the data using multiple summary statistics describing methylation levels, coverage, noise, as well as frequencies, magnitudes and lengths of methylated regions. Then, simulated datasets with characteristics closely matching the real experimental data were created. Seven different algorithms (metilene, methylKit, MOABS, DMRcate, Defiant, BSmooth, MethylSig) for DMR identification were applied and their performance was assessed. A blind and independent study design was chosen to reduce bias and to derive practical method selection guidelines. Overall, metilene had superior performance in most settings. Data attributes, such as coverage and the spread of DMR lengths, were found to be useful for selecting the best method for DMR detection. A decision tree to select the optimal approach based on these data attributes is provided. The presented procedure might serve as a general strategy for deriving algorithm selection rules tailored to the demands of specific application settings. AVAILABILITY AND IMPLEMENTATION Scripts that were used for the analyses and that can be used for prediction of the optimal algorithm are provided at https://github.com/kreutz-lab/DMR-DecisionTree. Simulated and experimental data are available at https://doi.org/10.6084/m9.figshare.11619045. CONTACT ckreutz@imbi.uni-freiburg.de. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Clemens Kreutz
- Faculty of Medicine and Medical Center, Institute of Medical Biometry and Statistics, University of Freiburg, 79104 Freiburg, Germany
- Centre for Integrative Biological Signalling Studies (CIBSS), University of Freiburg, 79104 Freiburg, Germany
- Nilay S Can
- Plant Cell Biology, Faculty of Biology, University of Marburg, 35043 Marburg, Germany
- Ralf Schulze Bruening
- Plant Cell Biology, Faculty of Biology, University of Marburg, 35043 Marburg, Germany
- Rabea Meyberg
- Plant Cell Biology, Faculty of Biology, University of Marburg, 35043 Marburg, Germany
- Zsuzsanna Mérai
- Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), 1030 Vienna, Austria
- Noe Fernandez-Pozo
- Plant Cell Biology, Faculty of Biology, University of Marburg, 35043 Marburg, Germany
- Stefan A Rensing
- Plant Cell Biology, Faculty of Biology, University of Marburg, 35043 Marburg, Germany
- Centre for Biological Signaling Studies (BIOSS), University of Freiburg, 79104 Freiburg, Germany
49
Herrmann M, Probst P, Hornung R, Jurinovic V, Boulesteix AL. Large-scale benchmark study of survival prediction methods using multi-omics data. Brief Bioinform 2020; 22:5895463. [PMID: 32823283 PMCID: PMC8138887 DOI: 10.1093/bib/bbaa167] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2020] [Revised: 06/25/2020] [Accepted: 07/03/2020] [Indexed: 12/18/2022] Open
Abstract
Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are increasingly often generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions through a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied to 18 multi-omics cancer datasets (35 to 1000 observations, up to 100 000 variables) from the database 'The Cancer Genome Atlas' (TCGA). The considered outcome was the (censored) survival time. Eleven methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan-Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno's C-index and the integrated Brier score served as performance metrics. The results indicate that methods taking into account the multi-omics structure have a slightly better prediction performance. Taking this structure into account can prevent the predictive information in low-dimensional groups, especially clinical variables, from being overlooked during prediction. Moreover, only the block forest method outperformed the Cox model on average, and only slightly. This indicates, as a by-product of our study, that in the considered TCGA studies the utility of multi-omics data for prediction purposes was limited. Contact: moritz.herrmann@stat.uni-muenchen.de, +49 89 2180 3198. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online. All analyses are reproducible using R code freely available on GitHub.
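To make the evaluation scheme concrete, the sketch below runs a repeated 5-fold cross-validation loop on simulated survival data and scores a deliberately trivial risk score with Harrell's concordance index. The benchmark itself used Uno's C-index and the integrated Brier score on real TCGA data, so this is only an illustration of the loop structure; the simulated data, the censoring scheme and the "model" (the first feature used directly as risk score) are assumptions for the example.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's concordance index: fraction of comparable pairs ordered correctly."""
    num, den = 0.0, 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] == 1 and time[i] < time[j]:   # pair is comparable
                den += 1
                num += 1.0 if risk[i] > risk[j] else (0.5 if risk[i] == risk[j] else 0.0)
    return num / den if den else 0.5

rng = np.random.default_rng(2)
n, p = 200, 5
X = rng.normal(size=(n, p))                       # stand-in for omics/clinical features
true_risk = X[:, 0]                               # only the first feature matters
time = rng.exponential(scale=np.exp(-true_risk))  # higher risk -> shorter survival
event = rng.integers(0, 2, size=n)                # crude random censoring indicator

# repeated 5-fold cross-validation of the placeholder risk score
scores = []
for repeat in range(3):
    idx = rng.permutation(n)
    for test_idx in np.array_split(idx, 5):
        risk_pred = X[test_idx, 0]
        scores.append(harrell_c_index(time[test_idx], event[test_idx], risk_pred))

print(f"mean cross-validated C-index: {np.mean(scores):.3f}")
```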
Affiliation(s)
- Moritz Herrmann
- Department of Statistics, Ludwig Maximilian University, Munich, 80539, Germany
- Philipp Probst
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
- Roman Hornung
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
- Vindi Jurinovic
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
50
Wright ES. RNAconTest: comparing tools for noncoding RNA multiple sequence alignment based on structural consistency. RNA 2020; 26:531-540. [PMID: 32005745 PMCID: PMC7161358 DOI: 10.1261/rna.073015.119] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/20/2019] [Accepted: 01/28/2020] [Indexed: 05/05/2023]
Abstract
The importance of noncoding RNA sequences has become increasingly clear over the past decade. New RNA families are often detected and analyzed using comparative methods based on multiple sequence alignments. Accordingly, a number of programs have been developed for aligning and deriving secondary structures from sets of RNA sequences. Yet, the best tools for these tasks remain unclear because existing benchmarks contain too few sequences belonging to only a small number of RNA families. RNAconTest (RNA consistency test) is a new benchmarking approach relying on the observation that secondary structure is often conserved across highly divergent RNA sequences from the same family. RNAconTest scores multiple sequence alignments based on the level of consistency among known secondary structures belonging to reference sequences in their output alignment. Similarly, consensus secondary structure predictions are scored according to their agreement with one or more known structures in a family. Comparing the performance of 10 popular alignment programs using RNAconTest revealed that DAFS, DECIPHER, LocARNA, and MAFFT created the most structurally consistent alignments. The best consensus secondary structure predictions were generated by DAFS and LocARNA (via RNAalifold). Many of the methods specific to noncoding RNAs exhibited poor scalability as the number or length of input sequences increased, and several programs displayed substantial declines in score as more sequences were aligned. Overall, RNAconTest provides a means of testing and improving tools for comparative RNA analysis, as well as highlighting the best available approaches. RNAconTest is available from the DECIPHER website (http://DECIPHER.codes/Downloads.html).
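A minimal sketch of the consistency idea (not the RNAconTest implementation): base pairs known for one reference sequence are mapped through the alignment columns and checked against the pairs known for a second reference. The sequences, dot-bracket structures and alignment below are fabricated for illustration.

```python
def pairs_from_dotbracket(structure):
    """Return the set of (i, j) base pairs encoded in a dot-bracket string."""
    stack, pairs = [], set()
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.add((stack.pop(), i))
    return pairs

def seq_to_column(aligned):
    """Map each ungapped sequence position to its alignment column."""
    return [col for col, c in enumerate(aligned) if c != "-"]

def consistency(aligned1, struct1, aligned2, struct2):
    """Fraction of pairs in struct1 that map onto pairs of struct2 via the alignment."""
    cols1, cols2 = seq_to_column(aligned1), seq_to_column(aligned2)
    col_to_pos2 = {col: pos for pos, col in enumerate(cols2)}
    pairs1, pairs2 = pairs_from_dotbracket(struct1), pairs_from_dotbracket(struct2)
    shared = 0
    for i, j in pairs1:
        ci, cj = cols1[i], cols1[j]
        if ci in col_to_pos2 and cj in col_to_pos2 \
                and (col_to_pos2[ci], col_to_pos2[cj]) in pairs2:
            shared += 1
    return shared / len(pairs1) if pairs1 else 0.0

# hypothetical hairpins: ungapped structures, gapped (aligned) sequences
struct_a, aligned_a = "((((....))))", "GGGC-AUAA-GCCC"
struct_b, aligned_b = "(((......)))", "GGG-CAUAAAC-CC"
print(f"structural consistency: {consistency(aligned_a, struct_a, aligned_b, struct_b):.2f}")
```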
Affiliation(s)
- Erik S Wright
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania 15219, USA