1
Wang JA, Wang HF, Cao B, Lei X, Long C. Cultural Dimensions Moderate the Association between Loneliness and Mental Health during Adolescence and Younger Adulthood: A Systematic Review and Meta-Analysis. J Youth Adolesc 2024; 53:1774-1819. PMID: 38662185; DOI: 10.1007/s10964-024-01977-w.
Abstract
Cultural factors, such as country or continent, influence the relationship between loneliness and mental health. However, less is known about how cultural dimensions moderate this relationship during adolescence and younger adulthood, even though these dimensions underlie country- and continent-level differences. This study aims to examine the potential influence of Hofstede's cultural dimensions on this relationship using a three-level meta-analysis approach. A total of 292 studies with 291,946 participants aged 10 to 24 were included. The results indicate that cultural dimensions, such as individualism vs. collectivism, indulgence vs. restraint, power distance, and long-term vs. short-term orientation, moderated the associations between loneliness and social anxiety, stress, Internet overuse, and negative affect. The association between loneliness and mental health was not moderated by the dimensions of masculinity and uncertainty avoidance. These findings suggest that culture's influence on the association between loneliness and mental health operates through a domain-specific mechanism.
Affiliation(s)
- Jing-Ai Wang
- School of Psychology and Key Laboratory of Cognition and Personality of the Ministry of Education, Southwest University, Chongqing, 400715, China
- Hai-Fan Wang
- School of Psychology and Key Laboratory of Cognition and Personality of the Ministry of Education, Southwest University, Chongqing, 400715, China
- Bing Cao
- School of Psychology and Key Laboratory of Cognition and Personality of the Ministry of Education, Southwest University, Chongqing, 400715, China
- Xu Lei
- School of Psychology and Key Laboratory of Cognition and Personality of the Ministry of Education, Southwest University, Chongqing, 400715, China
- Changquan Long
- School of Psychology and Key Laboratory of Cognition and Personality of the Ministry of Education, Southwest University, Chongqing, 400715, China.

2
Kneipp J, Seifert S, Gärber F. SERS microscopy as a tool for comprehensive biochemical characterization in complex samples. Chem Soc Rev 2024. PMID: 38934892; DOI: 10.1039/d4cs00460d.
Abstract
Surface-enhanced Raman scattering (SERS) spectra of biomaterials such as cells or tissues can be used to obtain biochemical information from nanoscopic volumes in these heterogeneous samples. This tutorial review discusses the factors that determine the outcome of a SERS experiment in complex bioorganic samples. They are related to the SERS process itself, the possibility to selectively probe certain regions or constituents of a sample, and the retrieval of the vibrational information in order to identify molecules and their interactions. After introducing basic aspects of SERS experiments in the context of biocompatible environments, spectroscopy in typical microscopic settings is exemplified, including the possibilities to combine SERS with other linear and non-linear microscopic tools and to exploit approaches that improve lateral and temporal resolution. In particular, the great variation of data in a SERS experiment calls for robust data analysis tools. Approaches are introduced that were originally developed in the field of bioinformatics for application to omics data and that show specific potential for the analysis of SERS data. They include the use of simulated data and machine learning tools that can yield chemical information beyond spectral classification.
Affiliation(s)
- Janina Kneipp
- Department of Chemistry, Humboldt-Universität zu Berlin, Brook-Taylor-Str. 2, 12489 Berlin, Germany.
- Stephan Seifert
- Hamburg School of Food Science, Department of Chemistry, Universität Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Florian Gärber
- Hamburg School of Food Science, Department of Chemistry, Universität Hamburg, Grindelallee 117, 20146 Hamburg, Germany

3
Shi H, Lin R, Lin X. Comparative review of novel model-assisted designs for phase I/II clinical trials. Biom J 2024; 66:e2300398. PMID: 38738318; DOI: 10.1002/bimj.202300398.
Abstract
In recent years, both model-based and model-assisted designs have emerged to efficiently determine the optimal biological dose (OBD) in phase I/II trials for immunotherapy and targeted cellular agents. Model-based designs necessitate repeated model fitting and computationally intensive posterior sampling for each dose-assignment decision, limiting their practical application in real trials. On the other hand, model-assisted designs employ simple statistical models and facilitate the precalculation of a decision table for use throughout the trial, eliminating the need for repeated model fitting. Due to their simplicity and transparency, model-assisted designs are often preferred in phase I/II trials. In this paper, we systematically evaluate and compare the operating characteristics of several recent model-assisted phase I/II designs, including TEPI, PRINTE, Joint i3+3, BOIN-ET, STEIN, uTPI, and BOIN12, in addition to the well-known model-based EffTox design, using comprehensive numerical simulations. To ensure an unbiased comparison, we generated 10,000 dosing scenarios using a random scenario generation algorithm for each predetermined OBD location. We thoroughly assess various performance metrics, such as the selection percentages, average patient allocation to OBD, and overdose percentages across the eight designs. Based on these assessments, we offer design recommendations tailored to different objectives, sample sizes, and starting dose locations.
Affiliation(s)
- Haolun Shi
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Ruitao Lin
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Xiaolei Lin
- School of Data Science, Fudan University, Shanghai, China

4
Stolte M, Schreck N, Slynko A, Saadati M, Benner A, Rahnenführer J, Bommert A. Simulation study to evaluate when Plasmode simulation is superior to parametric simulation in estimating the mean squared error of the least squares estimator in linear regression. PLoS One 2024; 19:e0299989. PMID: 38748677; PMCID: PMC11095703; DOI: 10.1371/journal.pone.0299989.
Abstract
Simulation is a crucial tool for the evaluation and comparison of statistical methods. How to design fair and neutral simulation studies is therefore of great interest for both researchers developing new methods and practitioners confronted with the choice of the most suitable method. The term simulation usually refers to parametric simulation, that is, computer experiments using artificial data made up of pseudo-random numbers. Plasmode simulation, that is, computer experiments using the combination of resampling feature data from a real-life dataset and generating the target variable with a known user-selected outcome-generating model, is an alternative that is often claimed to produce more realistic data. We compare parametric and Plasmode simulation for the example of estimating the mean squared error (MSE) of the least squares estimator (LSE) in linear regression. If the true underlying data-generating process (DGP) and the outcome-generating model (OGM) were known, parametric simulation would obviously be the best choice in terms of estimating the MSE well. However, in reality, both are usually unknown, so researchers have to make assumptions: in Plasmode simulation studies for the OGM, in parametric simulation for both DGP and OGM. Most likely, these assumptions do not exactly reflect the truth. Here, we aim to find out how assumptions deviating from the true DGP and the true OGM affect the performance of parametric and Plasmode simulations in the context of MSE estimation for the LSE and in which situations which simulation type is preferable. Our results suggest that the preferable simulation method depends on many factors, including the number of features, and on how and to what extent the assumptions of a parametric simulation differ from the true DGP. Also, the resampling strategy used for Plasmode influences the results. In particular, subsampling with a small sampling proportion can be recommended.
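To make the contrast concrete, here is a minimal sketch of the two simulation types for estimating the MSE of the LSE (illustrative only; the standard-normal feature distributions, the linear OGM, and all names are assumptions made for this sketch, not details taken from the study):

    import numpy as np

    rng = np.random.default_rng(1)
    beta, sigma, n = np.array([1.0, -0.5, 0.25]), 1.0, 100

    def mse_of_lse(X, n_rep=500):
        """Monte Carlo estimate of the MSE of the least squares estimator for a fixed design X."""
        errs = []
        for _ in range(n_rep):
            y = X @ beta + rng.normal(scale=sigma, size=X.shape[0])  # outcome-generating model (OGM)
            beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
            errs.append(np.sum((beta_hat - beta) ** 2))
        return np.mean(errs)

    # Parametric simulation: features are drawn from an assumed parametric DGP.
    X_parametric = rng.standard_normal((n, 3))

    # Plasmode simulation: features are resampled (here: subsampled without replacement)
    # from a real-life dataset; only the outcome is generated from the user-selected OGM.
    X_real = rng.standard_normal((1000, 3))  # placeholder for the real-life feature matrix
    X_plasmode = X_real[rng.choice(X_real.shape[0], size=n, replace=False)]

    print(mse_of_lse(X_parametric), mse_of_lse(X_plasmode))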
Affiliation(s)
- Marieke Stolte
- Department of Statistics, TU Dortmund University, Dortmund, North Rhine-Westphalia, Germany
- Nicholas Schreck
- Division of Biostatistics, German Cancer Research Center, Heidelberg, Baden-Wuerttemberg, Germany
- Alla Slynko
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Maral Saadati
- Division of Biostatistics, German Cancer Research Center, Heidelberg, Baden-Wuerttemberg, Germany
- Axel Benner
- Division of Biostatistics, German Cancer Research Center, Heidelberg, Baden-Wuerttemberg, Germany
- Jörg Rahnenführer
- Department of Statistics, TU Dortmund University, Dortmund, North Rhine-Westphalia, Germany
- Andrea Bommert
- Department of Statistics, TU Dortmund University, Dortmund, North Rhine-Westphalia, Germany

5
El Emam K, Mosquera L, Fang X, El-Hussuna A. An evaluation of the replicability of analyses using synthetic health data. Sci Rep 2024; 14:6978. PMID: 38521806; PMCID: PMC10960851; DOI: 10.1038/s41598-024-57207-7.
Abstract
Synthetic data generation is being increasingly used as a privacy-preserving approach for sharing health data. In addition to protecting privacy, it is important to ensure that generated data has high utility. A common way to assess utility is the ability of synthetic data to replicate results from the real data. Replicability has been defined using two criteria: (a) replicate the results of the analyses on real data, and (b) ensure valid population inferences from the synthetic data. A simulation study using three heterogeneous real-world datasets evaluated the replicability of logistic regression workloads. Eight replicability metrics were evaluated: decision agreement, estimate agreement, standardized difference, confidence interval overlap, bias, confidence interval coverage, statistical power, and precision (empirical SE). The analysis of synthetic data used a multiple imputation approach whereby up to 20 datasets were generated and the fitted logistic regression models were combined using combining rules for fully synthetic datasets. The effects of synthetic data amplification were evaluated, and two types of generative models were used: sequential synthesis using boosted decision trees and a generative adversarial network (GAN). Privacy risk was evaluated using a membership disclosure metric. For sequential synthesis, adjusted model parameters after combining at least ten synthetic datasets gave high decision and estimate agreement, low standardized difference, high confidence interval overlap, low bias, nominal confidence interval coverage, and power close to the nominal level. Amplification had only a marginal benefit. Confidence interval coverage from a single synthetic dataset without applying combining rules was erroneous, and statistical power, as expected, was artificially inflated when amplification was used. Sequential synthesis performed considerably better than the GAN across multiple datasets. Membership disclosure risk was low for all datasets and models. For replicable results, the statistical analysis of fully synthetic data should be based on at least ten generated datasets of the same size as the original whose analysis results are combined. Analysis results from synthetic data without applying combining rules can be misleading. Replicability results are dependent on the type of generative model used, with our study suggesting that sequential synthesis has good replicability characteristics for common health research workloads.
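For orientation, one widely used set of combining rules for m fully synthetic datasets (Raghunathan, Reiter, and Rubin, 2003) takes the form below; the abstract does not state which exact variant the study applies, so this is background rather than a description of the paper's method:

    q_bar = (1/m) * sum_i q_i                      (combined point estimate)
    b_m   = (1/(m-1)) * sum_i (q_i - q_bar)^2      (between-dataset variance)
    u_bar = (1/m) * sum_i u_i                      (average within-dataset variance)
    T_f   = (1 + 1/m) * b_m - u_bar                (variance of q_bar; adjusted if negative)

where q_i and u_i are the point estimate and its estimated variance from the i-th synthetic dataset.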
Affiliation(s)
- Khaled El Emam
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada.
- Replica Analytics, Ottawa, ON, Canada.
- Children's Hospital of Eastern Ontario (CHEO) Research Institute, 401 Smyth Road, Ottawa, ON, K1H 8L1, Canada.
- Lucy Mosquera
- Replica Analytics, Ottawa, ON, Canada
- Children's Hospital of Eastern Ontario (CHEO) Research Institute, 401 Smyth Road, Ottawa, ON, K1H 8L1, Canada
- Xi Fang
- Replica Analytics, Ottawa, ON, Canada

6
Buch G, Schulz A, Schmidtmann I, Strauch K, Wild PS. Interpretability of bi-level variable selection methods. Biom J 2024; 66:e2300063. PMID: 38519877; DOI: 10.1002/bimj.202300063.
Abstract
Variable selection is usually performed to increase interpretability, as sparser models are easier to understand than full models. However, a focus on sparsity is not always suitable, for example, when features are related due to contextual similarities or high correlations. Here, it may be more appropriate to identify groups and their predictive members, a task that can be accomplished with bi-level selection procedures. To investigate whether such techniques lead to increased interpretability, group exponential LASSO (GEL), sparse group LASSO (SGL), composite minimax concave penalty (cMCP), and least absolute shrinkage and selection operator (LASSO) as reference methods were used to select predictors in time-to-event, regression, and classification tasks in bootstrap samples from a cohort of 1001 patients. Different groupings based on prior knowledge, correlation structure, and random assignment were compared in terms of selection relevance, group consistency, and collinearity tolerance. The results show that bi-level selection methods are superior to LASSO in all criteria. The cMCP demonstrated superiority in selection relevance, while SGL was convincing in group consistency. An all-round capacity was achieved by GEL: the approach jointly selected correlated and content-related predictors while maintaining high selection relevance. This method seems recommendable when variables are grouped, and interpretation is of primary interest.
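For readers unfamiliar with bi-level selection, the sparse group LASSO penalty illustrates the idea of selecting both groups and individual members within groups (standard notation; the specific tuning and implementations used in the study are described in the paper):

    P(beta) = (1 - alpha) * lambda * sum_g sqrt(p_g) * ||beta_g||_2  +  alpha * lambda * ||beta||_1

where the group-wise L2 term switches whole groups of size p_g in or out, the L1 term additionally zeroes individual coefficients within selected groups, and alpha balances the two levels of selection.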
Affiliation(s)
- Gregor Buch
- Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- German Center for Cardiovascular Research (DZHK), Mainz, Germany
- Andreas Schulz
- Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Irene Schmidtmann
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Konstantin Strauch
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Philipp S Wild
- Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- German Center for Cardiovascular Research (DZHK), Mainz, Germany
- Clinical Epidemiology and Systems Medicine, Center for Thrombosis and Hemostasis, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Institute of Molecular Biology (IMB), Mainz, Germany

7
Fairchild AJ, Yin Y, Baraldi AN, Astivia OLO, Shi D. Many nonnormalities, one simulation: Do different data generation algorithms affect study results? Behav Res Methods 2024. PMID: 38389030; DOI: 10.3758/s13428-024-02364-w.
Abstract
Monte Carlo simulation studies are among the primary scientific outputs contributed by methodologists, guiding application of various statistical tools in practice. Although methodological researchers routinely extend simulation study findings through follow-up work, few studies are ever replicated. Simulation studies are susceptible to factors that can contribute to replicability failures, however. This paper sought to conduct a meta-scientific study by replicating one highly cited simulation study (Curran et al., Psychological Methods, 1, 16-29, 1996) that investigated the robustness of normal theory maximum likelihood (ML)-based chi-square fit statistics under multivariate nonnormality. We further examined the generalizability of the original study findings across different nonnormal data generation algorithms. Our replication results were generally consistent with original findings, but we discerned several differences. Our generalizability results were more mixed. Only two results observed under the original data generation algorithm held completely across other algorithms examined. One of the most striking findings we observed was that results associated with the independent generator (IG) data generation algorithm vastly differed from other procedures examined and suggested that ML was robust to nonnormality for the particular factor model used in the simulation. Findings point to the reality that extant methodological recommendations may not be universally valid in contexts where multiple data generation algorithms exist for a given data characteristic. We recommend that researchers consider multiple approaches to generating a specific data or model characteristic (when more than one is available) to optimize the generalizability of simulation results.
Affiliation(s)
- Amanda J Fairchild
- Department of Psychology, University of South Carolina, Columbia, SC, USA.
- Yunhang Yin
- Department of Psychology, University of South Carolina, Columbia, SC, USA
- Amanda N Baraldi
- Department of Psychology, Oklahoma State University, Stillwater, OK, USA
- Dexin Shi
- Department of Psychology, University of South Carolina, Columbia, SC, USA

8
Turner AJ, Sammon C, Latimer N, Adamson B, Beal B, Subbiah V, Abrams KR, Ray J. Transporting Comparative Effectiveness Evidence Between Countries: Considerations for Health Technology Assessments. PharmacoEconomics 2024; 42:165-176. PMID: 37891433; PMCID: PMC10811184; DOI: 10.1007/s40273-023-01323-1.
Abstract
Internal validity is often the primary concern for health technology assessment agencies when assessing comparative effectiveness evidence. However, the increasing use of real-world data from countries other than a health technology assessment agency's target population in effectiveness research has increased concerns over the external validity, or "transportability", of this evidence, and has led to a preference for local data. Methods have been developed to enable a lack of transportability to be addressed, for example by accounting for cross-country differences in disease characteristics, but their consideration in health technology assessments is limited. This may be because of limited knowledge of the methods and/or uncertainties in how best to utilise them within existing health technology assessment frameworks. This article aims to provide an introduction to transportability, including a summary of its assumptions and the methods available for identifying and adjusting for a lack of transportability. It then discusses important considerations relating to their use in health technology assessment settings, including guidance on the identification of effect modifiers; guidance on the choice of target population, estimand, study sample, and methods; and how evaluations of transportability can be integrated into health technology assessment submission and decision processes.
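As one concrete example of the adjustment methods referred to above, transportability analyses often reweight the study sample by the inverse odds of selection (generic notation, not taken from the article):

    w_i = P(S = 0 | X_i) / P(S = 1 | X_i)    for participants in the study sample (S = 1)

where X contains the effect modifiers and S indicates membership of the study sample (S = 1) versus the target population (S = 0). Weighting the comparative analysis by w_i re-standardizes the effect estimate to the covariate distribution of the target population, provided the relevant effect modifiers are captured in X.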
Affiliation(s)
- Nick Latimer
- School of Health and Related Research, University of Sheffield, Sheffield, UK
- Delta Hat, Nottingham, UK
- Keith R Abrams
- Department of Statistics, University of Warwick, Coventry, UK
- Centre for Health Economics, University of York, York, UK
- Joshua Ray
- Global Access, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070 Basel, Switzerland.

9
Nießl C, Hoffmann S, Ullmann T, Boulesteix AL. Explaining the optimistic performance evaluation of newly proposed methods: A cross-design validation experiment. Biom J 2024; 66:e2200238. PMID: 36999395; DOI: 10.1002/bimj.202200238.
Abstract
The constant development of new data analysis methods in many fields of research is accompanied by an increasing awareness that these new methods often perform better in their introductory paper than in subsequent comparison studies conducted by other researchers. We attempt to explain this discrepancy by conducting a systematic experiment that we call "cross-design validation of methods". In the experiment, we select two methods designed for the same data analysis task, reproduce the results shown in each paper, and then reevaluate each method based on the study design (i.e., datasets, competing methods, and evaluation criteria) that was used to show the abilities of the other method. We conduct the experiment for two data analysis tasks, namely cancer subtyping using multiomic data and differential gene expression analysis. Three of the four methods included in the experiment indeed perform worse when they are evaluated on the new study design, which is mainly caused by the different datasets. Apart from illustrating the many degrees of freedom existing in the assessment of a method and their effect on its performance, our experiment suggests that the performance discrepancies between original and subsequent papers may not only be caused by the nonneutrality of the authors proposing the new method but also by differences regarding the level of expertise and field of application. Authors of new methods should thus focus not only on a transparent and extensive evaluation but also on comprehensive method documentation that enables the correct use of their methods in subsequent studies.
Affiliation(s)
- Christina Nießl
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany
- Munich Center for Machine Learning (MCML), Munich, Germany
- Sabine Hoffmann
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany
- Department of Statistics, LMU Munich, Munich, Germany
- Theresa Ullmann
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany

10
Friedrich S, Friede T. On the role of benchmarking data sets and simulations in method comparison studies. Biom J 2024; 66:e2200212. PMID: 36810737; DOI: 10.1002/bimj.202200212.
Abstract
Method comparisons are essential to provide recommendations and guidance for applied researchers, who often have to choose from a plethora of available approaches. While many comparisons exist in the literature, these are often not neutral but favor a novel method. Apart from the choice of design and a proper reporting of the findings, there are different approaches concerning the underlying data for such method comparison studies. Most manuscripts on statistical methodology rely on simulation studies and provide a single real-world data set as an example to motivate and illustrate the methodology investigated. In the context of supervised learning, in contrast, methods are often evaluated using so-called benchmarking data sets, that is, real-world data that serve as gold standard in the community. Simulation studies, on the other hand, are much less common in this context. The aim of this paper is to investigate differences and similarities between these approaches, to discuss their advantages and disadvantages, and ultimately to develop new approaches to the evaluation of methods picking the best of both worlds. To this aim, we borrow ideas from different contexts such as mixed methods research and Clinical Scenario Evaluation.
Affiliation(s)
- Sarah Friedrich
- Institute of Mathematics, University of Augsburg, Augsburg, Germany
- Centre for Advanced Analytics and Predictive Sciences, University of Augsburg, Augsburg, Germany
- Tim Friede
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee, Göttingen, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Göttingen, Göttingen, Germany

11
Leha A, Huber C, Friede T, Bauer T, Beckmann A, Bekeredjian R, Bleiziffer S, Herrmann E, Möllmann H, Walther T, Beyersdorf F, Hamm C, Künzi A, Windecker S, Stortecky S, Kutschka I, Hasenfuß G, Ensminger S, Frerker C, Seidler T. Challenges in developing and validating machine learning models for TAVI mortality risk prediction: reply. Eur Heart J Digit Health 2024; 5:3-5. PMID: 38264698; PMCID: PMC10802823; DOI: 10.1093/ehjdh/ztad065.
Affiliation(s)
- Andreas Leha
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, 37073 Göttingen, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, Robert-Koch str. 40, 37075 Göttingen, Germany
- Cynthia Huber
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, 37073 Göttingen, Germany
- Tim Friede
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, 37073 Göttingen, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, Robert-Koch str. 40, 37075 Göttingen, Germany
- Timm Bauer
- Department of Cardiology, Sana Klinikum Offenbach, Starkenburgring 66, 63069 Offenbach am Main, Germany
- Andreas Beckmann
- German Society for Thoracic and Cardiovascular Surgery, Langenbeck-Virchow-Haus, Luisenstraße 58/59, 10117 Berlin, Germany
- Department for Cardiac and Pediatric Cardiac Surgery, Heart Center Duisburg, EVKLN, Gerrickstr. 21, 47137 Duisburg, Germany
- Raffi Bekeredjian
- Department of Cardiology, Robert-Bosch-Krankenhaus, Auerbachstraße 110, 70376 Stuttgart, Germany
- Sabine Bleiziffer
- Clinic for Thoracic and Cardiovascular Surgery, Heart and Diabetes Center Northrhine-Westphalia, Georgstr 11, 32545 Bad Oeynhausen, Germany
- Eva Herrmann
- Goethe University Frankfurt, Department of Medicine, Institute of Biostatistics and Mathematical Modelling, Theodor-Stern-Kai 7, 60590 Frankfurt Main, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Rhine/Main, Theodor-Stern-Kai 7, 60590 Frankfurt Main, Germany
- Helge Möllmann
- Department of Cardiology, St.-Johannes-Hospital Dortmund, Johannesstrasse 9-17, 44137 Dortmund, Germany
- Thomas Walther
- Department of Cardiothoracic Surgery, University Hospital Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt, Germany
- Friedhelm Beyersdorf
- Medical Faculty of the Albert-Ludwigs-University Freiburg, University Hospital Freiburg, Hugstetterstr. 55, 79106 Freiburg, Germany
- Department of Cardiovascular Surgery, Heart Centre Freiburg University, Freiburg, Germany
- Christian Hamm
- Department of Cardiology and Angiology, University Hospital Gießen, Klinikstr. 33, 35392 Gießen, Germany
- Department of Cardiology, Kerckhoff Heart and Thorax Center, Benekestraße 2-8, D-61231 Bad Nauheim, Germany
- Arnaud Künzi
- CTU Bern, University of Bern, Mittelstrasse 43, 3012 Bern, Switzerland
- Stephan Windecker
- Department of Cardiology, Inselspital, Bern University Hospital, University of Bern, 3010 Bern, Switzerland
- Stefan Stortecky
- Department of Cardiology, Inselspital, Bern University Hospital, University of Bern, 3010 Bern, Switzerland
- Ingo Kutschka
- Clinic for Cardiothoracic and Vascular Surgery/Heart Center, University Medical Center Göttingen, Robert-Koch Str. 40, 37075 Göttingen, Germany
- Gerd Hasenfuß
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, Robert-Koch str. 40, 37075 Göttingen, Germany
- Clinic for Cardiology and Pulmonology, Heart Center, University Medical Center Göttingen, Robert-Koch Str. 40, 37075 Göttingen, Germany
- Stephan Ensminger
- Department of Cardiac and Thoracic Vascular Surgery, University Heart Center Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Lübeck, Germany
- Christian Frerker
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Lübeck, Germany
- Department of Cardiology, University Heart Center Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
- Tim Seidler
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, Robert-Koch str. 40, 37075 Göttingen, Germany
- Clinic for Cardiology and Pulmonology, Heart Center, University Medical Center Göttingen, Robert-Koch Str. 40, 37075 Göttingen, Germany

12
Geroldinger M, Verbeeck J, Thiel KE, Molenberghs G, Bathke AC, Laimer M, Zimmermann G. A neutral comparison of statistical methods for analyzing longitudinally measured ordinal outcomes in rare diseases. Biom J 2024; 66:e2200236. PMID: 36890631; DOI: 10.1002/bimj.202200236.
Abstract
Ordinal data in a repeated measures design of a crossover study for rare diseases usually do not allow for the use of standard parametric methods, and hence, nonparametric methods should be considered instead. However, only limited simulation studies in settings with small sample sizes exist. Therefore, starting from an Epidermolysis Bullosa simplex trial with the above-mentioned design, a rank-based approach using the R package nparLD and different generalized pairwise comparisons (GPC) methods were compared impartially in a simulation study. The results revealed that there was not one single best method for this particular design, because a trade-off exists between achieving high power, accounting for period effects, and for missing data. Specifically, nparLD as well as the unmatched GPC approaches do not address crossover aspects, and the univariate GPC variants partly ignore the longitudinal information. The matched GPC approaches, on the other hand, take the crossover effect into account in the sense of incorporating the within-subject association. Overall, the prioritized unmatched GPC method achieved the highest power in the simulation scenarios, although this may be due to the specified prioritization. The rank-based approach yielded good power even at a sample size of N = 6, whereas the matched GPC method could not control the type I error.
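For readers unfamiliar with generalized pairwise comparisons, the basic unmatched GPC statistic is the net treatment benefit over all between-group pairs (a generic form; the prioritized, matched, and longitudinal variants compared in the study build on this):

    NTB = P(X > Y) - P(X < Y)  ~  (pairs won by treatment - pairs lost) / (all treatment-control pairs)

where X and Y denote the outcomes of a treated and a control subject, larger values are taken as favorable, and ties count as neither win nor loss.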
Affiliation(s)
- Martin Geroldinger
- Team Biostatistics and Big Medical Data, IDA Lab Salzburg, Paracelsus Medical University, Salzburg, Austria
- Department of Research and Innovation, Paracelsus Medical University, Salzburg, Austria
- Johan Verbeeck
- Data Science Institute (DSI), Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Hasselt University, Hasselt, Belgium
- Konstantin E Thiel
- Team Biostatistics and Big Medical Data, IDA Lab Salzburg, Paracelsus Medical University, Salzburg, Austria
- Department of Research and Innovation, Paracelsus Medical University, Salzburg, Austria
- Geert Molenberghs
- Data Science Institute (DSI), Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Hasselt University, Hasselt, Belgium
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), KULeuven, Leuven, Belgium
- Arne C Bathke
- Intelligent Data Analytics (IDA) Lab Salzburg, Department of Artificial Intelligence and Human Interfaces, Faculty of Digital and Analytical Sciences, Paris Lodron University of Salzburg, Salzburg, Austria
- Martin Laimer
- Department of Dermatology and Allergology, Paracelsus Medical University, Salzburg, Austria
- Georg Zimmermann
- Team Biostatistics and Big Medical Data, IDA Lab Salzburg, Paracelsus Medical University, Salzburg, Austria
- Department of Research and Innovation, Paracelsus Medical University, Salzburg, Austria

13
Strobl C, Leisch F. Against the "one method fits all data sets" philosophy for comparison studies in methodological research. Biom J 2024; 66:e2200104. PMID: 36053253; DOI: 10.1002/bimj.202200104.
Abstract
Many methodological comparison studies aim at identifying a single or a few "best performing" methods over a certain range of data sets. In this paper we take a different viewpoint by asking whether the research question of identifying the best performing method is what we should be striving for in the first place. We will argue that this research question implies assumptions which we do not consider warranted in methodological research, that a different research question would be more informative, and how this research question can be fruitfully investigated.
Affiliation(s)
- Carolin Strobl
- Department of Psychology, University of Zurich, Zurich, Switzerland
- Friedrich Leisch
- Institute of Statistics, University of Natural Resources and Life Sciences, Vienna, Austria

14
Heinz P, Wendel-Garcia PD, Held U. Impact of the matching algorithm on the treatment effect estimate: A neutral comparison study. Biom J 2024; 66:e2100292. PMID: 35385172; DOI: 10.1002/bimj.202100292.
Abstract
Propensity score matching is increasingly being used in the medical literature. Choice of matching algorithms, reporting quality, and estimands are oftentimes not discussed. We evaluated the impact of propensity score matching algorithms, based on a recent clinical dataset, with three commonly used outcomes. The resulting estimands for different strengths of treatment effects were compared in a neutral comparison study and based on a thoroughly designed simulation study. Different algorithms yielded different levels of balance after matching. Along with full matching and genetic matching with replacement, good balance was achieved with nearest neighbor matching with caliper, although this discarded more than one fifth of the treated units. Average marginal treatment effect estimates were least biased with genetic or nearest neighbor matching, both with replacement, and full matching. Double adjustment yielded conditional treatment effects that were closer to the true values, throughout. The choice of the matching algorithm had an impact on covariate balance after matching as well as treatment effect estimates. In comparison, genetic matching with replacement yielded better covariate balance than all other matching algorithms. A literature review of the British Medical Journal and its subjournals revealed frequent use of propensity score matching; however, the use of different matching algorithms before treatment effect estimation was only reported in one out of 21 studies. Propensity score matching is a methodology for causal treatment effect estimation from observational data; however, the methodological difficulties and low reporting quality in applied medical research need to be addressed.
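As an illustration of one of the algorithms in this comparison, here is a minimal sketch of 1:1 nearest-neighbor matching without replacement, with a caliper of 0.2 standard deviations of the logit propensity score (the function name, the greedy matching order, and the logistic propensity model are illustrative assumptions, not the exact implementation evaluated in the paper):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def nn_caliper_match(X, treat, caliper_sd=0.2, seed=0):
        """1:1 nearest-neighbor matching without replacement on logit(PS), caliper = 0.2 SD."""
        ps = LogisticRegression(max_iter=1000).fit(X, treat).predict_proba(X)[:, 1]
        lps = np.log(ps / (1 - ps))                    # logit of the propensity score
        caliper = caliper_sd * lps.std()
        treated = np.flatnonzero(treat == 1)
        controls = list(np.flatnonzero(treat == 0))
        np.random.default_rng(seed).shuffle(treated)   # match treated units in random order
        pairs = []
        for t in treated:
            if not controls:
                break
            d = np.abs(lps[controls] - lps[t])
            j = int(np.argmin(d))
            if d[j] <= caliper:                        # drop treated unit if no control within caliper
                pairs.append((t, controls.pop(j)))
        return pairs                                   # list of (treated index, matched control index)

    # usage: X = covariate matrix (n x p), treat = 0/1 array of length n
    # pairs = nn_caliper_match(X, treat)

The treatment effect is then estimated on the matched pairs, optionally with double adjustment for the covariates as discussed in the abstract.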
Affiliation(s)
- Priska Heinz
- Epidemiology, Biostatistics and Prevention Institute, Department of Biostatistics, University of Zurich, Zurich, Switzerland
- Ulrike Held
- Epidemiology, Biostatistics and Prevention Institute, Department of Biostatistics, University of Zurich, Zurich, Switzerland

15
Sun S, Sechidis K, Chen Y, Lu J, Ma C, Mirshani A, Ohlssen D, Vandemeulebroecke M, Bornkamp B. Comparing algorithms for characterizing treatment effect heterogeneity in randomized trials. Biom J 2024; 66:e2100337. PMID: 36437036; DOI: 10.1002/bimj.202100337.
Abstract
The identification and estimation of heterogeneous treatment effects in biomedical clinical trials are challenging, because trials are typically planned to assess the treatment effect in the overall trial population. Nevertheless, the identification of how the treatment effect may vary across subgroups is of major importance for drug development. In this work, we review some existing simulation work and perform a simulation study to evaluate recent methods for identifying and estimating heterogeneous treatment effects using various metrics and scenarios relevant for drug development. Our focus is not only on a comparison of the methods in general, but on how well these methods perform in simulation scenarios that reflect real clinical trials. We provide the R package benchtm that can be used to simulate synthetic biomarker distributions based on real clinical trial data and to create interpretable scenarios to benchmark methods for identification and estimation of treatment effect heterogeneity.
Affiliation(s)
- Sophie Sun
- Advanced Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- Yao Chen
- Advanced Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- Jiarui Lu
- Advanced Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- Chong Ma
- Early Development Analytics, Novartis Pharmaceuticals Corporation, Cambridge, Massachusetts, USA
- Ardalan Mirshani
- Advanced Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- David Ohlssen
- Advanced Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- Björn Bornkamp
- Advanced Methodology and Data Science, Novartis Pharma AG, Basel, Switzerland

16
Huang Q, Trinquart L. Relative likelihood ratios for neutral comparisons of statistical tests in simulation studies. Biom J 2024; 66:e2200102. PMID: 36642800; DOI: 10.1002/bimj.202200102.
Abstract
When comparing the performance of two or more competing tests, simulation studies commonly focus on statistical power. However, if the sizes of the tests being compared differ either from one another or from the nominal size, comparing tests based on power alone may be misleading. By analogy with diagnostic accuracy studies, we introduce relative positive and negative likelihood ratios to factor in both power and size in the comparison of multiple tests. We derive sample size formulas for a comparative simulation study. As an example, we compared the performance of six statistical tests for small-study effects in meta-analyses of randomized controlled trials: Begg's rank correlation, Egger's regression, Schwarzer's method for sparse data, the trim-and-fill method, the arcsine-Thompson test, and Lin and Chu's combined test. We illustrate that comparing power alone, or power adjusted or penalized for size, can be misleading, and how the proposed likelihood ratio approach enables accurate comparison of the trade-off between power and size between competing tests.
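Using the diagnostic analogy from the abstract (power in the role of sensitivity, size in the role of the false-positive rate), a plausible formalization of the compared quantities is the following; this is our notation, and the exact definitions are given in the paper itself:

    LR+ = power / size,            LR- = (1 - power) / (1 - size)
    relative LR+ (A vs B) = LR+_A / LR+_B,    relative LR- (A vs B) = LR-_A / LR-_B

so that test A is favored over test B when its relative LR+ exceeds 1 and its relative LR- falls below 1.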
Affiliation(s)
- Qiuxi Huang
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
- Ludovic Trinquart
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
- Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA
- Tufts Clinical and Translational Science Institute, Tufts University, Boston, Massachusetts, USA

17
Pawel S, Kook L, Reeve K. Pitfalls and potentials in simulation studies: Questionable research practices in comparative simulation studies allow for spurious claims of superiority of any method. Biom J 2024; 66:e2200091. PMID: 36890629; DOI: 10.1002/bimj.202200091.
Abstract
Comparative simulation studies are workhorse tools for benchmarking statistical methods. As with other empirical studies, the success of simulation studies hinges on the quality of their design, execution, and reporting. If not conducted carefully and transparently, their conclusions may be misleading. In this paper, we discuss various questionable research practices, which may impact the validity of simulation studies, some of which cannot be detected or prevented by the current publication process in statistics journals. To illustrate our point, we invent a novel prediction method with no expected performance gain and benchmark it in a preregistered comparative simulation study. We show how easy it is to make the method appear superior over well-established competitor methods if questionable research practices are employed. Finally, we provide concrete suggestions for researchers, reviewers, and other academic stakeholders for improving the methodological quality of comparative simulation studies, such as preregistering simulation protocols, incentivizing neutral simulation studies, and code and data sharing.
Affiliation(s)
- Samuel Pawel
- Epidemiology, Biostatistics and Prevention Institute, Center for Reproducible Science, University of Zurich, Zurich, Switzerland
- Lucas Kook
- Epidemiology, Biostatistics and Prevention Institute, Center for Reproducible Science, University of Zurich, Zurich, Switzerland
- Kelly Reeve
- Epidemiology, Biostatistics and Prevention Institute, Center for Reproducible Science, University of Zurich, Zurich, Switzerland

18
Heinze G, Boulesteix AL, Kammer M, Morris TP, White IR. Phases of methodological research in biostatistics-Building the evidence base for new methods. Biom J 2024; 66:e2200222. PMID: 36737675; PMCID: PMC7615508; DOI: 10.1002/bimj.202200222.
Abstract
Although new biostatistical methods are published at a very high rate, many of these developments are not trustworthy enough to be adopted by the scientific community. We propose a framework to think about how a piece of methodological work contributes to the evidence base for a method. Similar to the well-known phases of clinical research in drug development, we propose to define four phases of methodological research. These four phases cover (I) proposing a new methodological idea while providing, for example, logical reasoning or proofs, (II) providing empirical evidence, first in a narrow target setting, then (III) in an extended range of settings and for various outcomes, accompanied by appropriate application examples, and (IV) investigations that establish a method as sufficiently well-understood to know when it is preferred over others and when it is not; that is, its pitfalls. We suggest basic definitions of the four phases to provoke thought and discussion rather than devising an unambiguous classification of studies into phases. Too many methodological developments finish before phase III/IV, but we give two examples with references. Our concept rebalances the emphasis to studies in phases III and IV, that is, carefully planned method comparison studies and studies that explore the empirical properties of existing methods in a wider range of problems.
Affiliation(s)
- Georg Heinze
- Center for Medical Data Science, Institute of Clinical Biometrics, Medical University of Vienna, Vienna, Austria
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians University of Munich, Munich, Germany
- Michael Kammer
- Center for Medical Data Science, Institute of Clinical Biometrics, Medical University of Vienna, Vienna, Austria
- Department of Medicine III, Division of Nephrology, Medical University of Vienna, Vienna, Austria
- Tim P. Morris
- MRC Clinical Trials Unit, Institute of Clinical Trials & Methodology, University College London, London, UK
- Ian R. White
- MRC Clinical Trials Unit, Institute of Clinical Trials & Methodology, University College London, London, UK

19
Kodalci L, Thas O. Neutralise: An open science initiative for neutral comparison of two-sample tests. Biom J 2024; 66:e2200237. PMID: 38285404; DOI: 10.1002/bimj.202200237.
Abstract
The two-sample problem is one of the earliest problems in statistics: given two samples, the question is whether or not the observations were sampled from the same distribution. Many statistical tests have been developed for this problem, and many tests have been evaluated in simulation studies, but hardly any study has tried to set up a neutral comparison study. In this paper, we introduce an open science initiative that potentially allows for neutral comparisons of two-sample tests. It is designed as an open-source R package, a repository, and an online R Shiny app. This paper describes the principles, the design of the system and illustrates the use of the system.
Affiliation(s)
- Leyla Kodalci
- I-BioStat, Data Science Institute, Hasselt University, Diepenbeek, Belgium
- Olivier Thas
- I-BioStat, Data Science Institute, Hasselt University, Diepenbeek, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- National Institute of Applied Statistics Research Australia (NIASRA), University of Wollongong, Wollongong, Australia

20
Infante G, Miceli R, Ambrogi F. Sample size and predictive performance of machine learning methods with survival data: A simulation study. Stat Med 2023; 42:5657-5675. PMID: 37947168; DOI: 10.1002/sim.9931.
Abstract
Prediction models are increasingly developed and used in diagnostic and prognostic studies, where the use of machine learning (ML) methods is becoming more and more popular over traditional regression techniques. For survival outcomes the Cox proportional hazards model is generally used, and it has been proven to achieve good prediction performances with few strong covariates. The possibility to improve the model performance by including nonlinearities, covariate interactions, and time-varying effects while controlling for overfitting must be carefully considered during the model building phase. On the other hand, ML techniques are able to learn complexities from data at the cost of hyper-parameter tuning and interpretability. One aspect of special interest is the sample size needed for developing a survival prediction model. While there is guidance when using traditional statistical models, the same does not apply when using ML techniques. This work develops a time-to-event simulation framework to evaluate the performances of Cox regression compared, among others, to tuned random survival forest, gradient boosting, and neural networks at varying sample sizes. Simulations were based on replications of subjects from publicly available databases, with event times simulated according to a Cox model with nonlinearities on continuous variables and time-varying effects, as well as on the SEER registry data.
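The standard recipe for simulating event times from a Cox model with linear predictor x*beta (Bender et al., Statistics in Medicine, 2005) inverts the cumulative baseline hazard; extensions with nonlinear and time-varying effects, as used in the study, build on the same idea:

    T = H0^{-1}( -log(U) * exp(-x*beta) ),    U ~ Uniform(0, 1)

which for a Weibull baseline hazard H0(t) = lambda * t^nu reduces to T = ( -log(U) / (lambda * exp(x*beta)) )^(1/nu).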
Affiliation(s)
- Gabriele Infante
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
- Unit of Biostatistics for Clinical Research, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Rosalba Miceli
- Unit of Biostatistics for Clinical Research, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Federico Ambrogi
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
- Scientific Directorate, IRCCS Policlinico San Donato, San Donato Milanese, Italy

21
Abell L, Maher F, Jennings AC, Gray LJ. A systematic review of simulation studies which compare existing statistical methods to account for non-compliance in randomised controlled trials. BMC Med Res Methodol 2023; 23:300. PMID: 38104108; PMCID: PMC10724933; DOI: 10.1186/s12874-023-02126-w.
Abstract
INTRODUCTION: Non-compliance is a common challenge for researchers and may reduce the power of an intention-to-treat analysis. Whilst a per protocol approach attempts to deal with this issue, it can result in biased estimates. Several methods to resolve this issue have been identified in previous reviews, but there is limited evidence supporting their use. This review aimed to identify simulation studies which compare such methods, assess the extent to which certain methods have been investigated and determine their performance under various scenarios.
METHODS: A systematic search of several electronic databases including MEDLINE and Scopus was carried out from conception to 30th November 2022. Included papers were published in a peer-reviewed journal, readily available in the English language and focused on comparing relevant methods in a superiority randomised controlled trial under a simulation study. Articles were screened using these criteria and a predetermined extraction form used to identify relevant information. A quality assessment appraised the risk of bias in individual studies. Extracted data was synthesised using tables, figures and a narrative summary. Both screening and data extraction were performed by two independent reviewers with disagreements resolved by consensus.
RESULTS: Of 2325 papers identified, 267 full texts were screened and 17 studies finally included. Twelve methods were identified across papers. Instrumental variable methods were commonly considered, but many authors found them to be biased in some settings. Non-compliance was generally assumed to be all-or-nothing and only occurring in the intervention group, although some methods considered it as time-varying. Simulation studies commonly varied the level and type of non-compliance and factors such as effect size and strength of confounding. The quality of papers was generally good, although some lacked detail and justification. Therefore, their conclusions were deemed to be less reliable.
CONCLUSIONS: It is common for papers to consider instrumental variable methods but more studies are needed that consider G-methods and compare a wide range of methods in realistic scenarios. It is difficult to make conclusions about the best method to deal with non-compliance due to a limited body of evidence and the difficulty in combining results from independent simulation studies.
PROSPERO REGISTRATION NUMBER: CRD42022370910.
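For context on the instrumental variable methods that dominate the reviewed literature, the simplest such estimator under all-or-nothing non-compliance is the Wald (complier average causal effect) estimator; this is textbook background rather than a result of the review:

    CACE = ( E[Y | Z = 1] - E[Y | Z = 0] ) / ( E[A | Z = 1] - E[A | Z = 0] )

i.e., the intention-to-treat effect on the outcome Y divided by the effect of randomization Z on treatment received A, valid under randomization, the exclusion restriction, and monotonicity of compliance.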
Affiliation(s)
- Lucy Abell
- Department of Population Health Sciences, University of Leicester, Leicester, UK
- Francesca Maher
- Department of Population Health Sciences, University of Leicester, Leicester, UK
- Angus C Jennings
- Department of Population Health Sciences, University of Leicester, Leicester, UK
- Laura J Gray
- Department of Population Health Sciences, University of Leicester, Leicester, UK.

22
Young NT, Matz RL, Bell EF, Hayward C. How researchers calculate students' grade point average in other courses has minimal impact. PLoS One 2023; 18:e0290109. PMID: 37594958; PMCID: PMC10437965; DOI: 10.1371/journal.pone.0290109.
Abstract
Grade point average in "other" courses (GPAO) is an increasingly common measure used to control for prior academic performance and to predict future academic performance. In previous work, there are two distinct approaches to calculating GPAO, one based on only courses taken concurrently (term GPAO) and one based on all previous courses taken (cumulative GPAO). To our knowledge, no one has studied whether these methods for calculating the GPAO result in equivalent analyses and conclusions. As researchers often use one definition or the other without comment on why that choice was made, if the two calculations of GPAO are different, researchers might be inducing systematic error into their results and publishing potentially inaccurate conclusions. We looked at more than 3,700 courses at a public, research-intensive university over a decade and found limited evidence that the choice of GPAO calculation affects the conclusions. At most, one in seven courses could be affected. Further analysis suggests that there may be situations where one form of GPAO may be preferred over the other when it comes to examining inequity in courses or predicting student grades. However, we did not find sufficient evidence to universally recommend one form of GPAO over the other.
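The two GPAO definitions compared in this study can be stated compactly in code. The sketch below is illustrative only, with a hypothetical grade table and column names; it simply shows that term GPAO averages grades in courses taken concurrently with the focal course, while cumulative GPAO averages grades from all earlier terms.

```python
import pandas as pd

# Hypothetical grade records: two students, two terms, 4.0-scale grades.
records = pd.DataFrame({
    "student": ["A"] * 5 + ["B"] * 4,
    "term":    [1, 1, 2, 2, 2, 1, 1, 2, 2],
    "course":  ["MATH", "CHEM", "PHYS", "BIO", "STAT",
                "MATH", "HIST", "PHYS", "ENGL"],
    "grade":   [3.7, 3.0, 2.7, 3.3, 4.0, 2.3, 3.0, 3.7, 3.3],
})

def gpao(df: pd.DataFrame, student: str, focal_course: str, focal_term: int):
    """Return (term GPAO, cumulative GPAO) for one student's focal course."""
    own = df[df.student == student]
    other = own[own.course != focal_course]          # exclude the focal course
    term_gpao = other.loc[other.term == focal_term, "grade"].mean()
    cumulative_gpao = other.loc[other.term < focal_term, "grade"].mean()
    return term_gpao, cumulative_gpao

print(gpao(records, "A", "PHYS", focal_term=2))      # e.g. (3.65, 3.35)
```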
Collapse
Affiliation(s)
- Nicholas T. Young
- Center for Academic Innovation, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Rebecca L. Matz
- Center for Academic Innovation, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Eric F. Bell
- Department of Astronomy, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Caitlin Hayward
- Center for Academic Innovation, University of Michigan, Ann Arbor, Michigan, United States of America
| |
Collapse
|
23
|
Kernfeld E, Yang Y, Weinstock JS, Battle A, Cahan P. A systematic comparison of computational methods for expression forecasting. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.28.551039. [PMID: 37577640 PMCID: PMC10418073 DOI: 10.1101/2023.07.28.551039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/15/2023]
Abstract
Due to the abundance of single cell RNA-seq data, a number of methods for predicting expression after perturbation have recently been published. Expression prediction methods are enticing because they promise to answer pressing questions in fields ranging from developmental genetics to cell fate engineering and because they are faster, cheaper, and higher-throughput than their experimental counterparts. However, the absolute and relative accuracy of these methods is poorly characterized, limiting their informed use, their improvement, and the interpretation of their predictions. To address these issues, we created a benchmarking platform that combines a panel of large-scale perturbation datasets with an expression forecasting software engine that encompasses or interfaces to current methods. We used our platform to systematically assess methods, parameters, and sources of auxiliary data. We found that uninformed baseline predictions, which were not always included in prior evaluations, yielded the same or better mean absolute error than benchmarked methods in all test cases. These results cast doubt on the ability of current expression forecasting methods to provide mechanistic insights or to rank hypotheses for experimental follow-up. However, given the rapid pace of innovation in the field, new approaches may yield more accurate expression predictions. Our platform will serve as a neutral benchmark to improve methods and to identify contexts in which expression prediction can succeed.
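The baseline comparison highlighted above is easy to reproduce in outline. The sketch below uses simulated numbers rather than the authors' benchmark data and stands a generic "method" against the uninformed prediction that post-perturbation expression equals the control mean, scoring both by mean absolute error.

```python
import numpy as np

rng = np.random.default_rng(0)
genes, perturbations = 2_000, 50

control_mean = rng.normal(5.0, 1.0, size=genes)

# True post-perturbation profiles: mostly unchanged, a few responsive genes.
true = np.tile(control_mean, (perturbations, 1))
responsive = rng.choice(genes, size=(perturbations, 20))
for i in range(perturbations):
    true[i, responsive[i]] += rng.normal(0.0, 2.0, size=20)
true += rng.normal(0.0, 0.3, size=true.shape)        # measurement noise

# A hypothetical forecasting method: here just the control mean plus noise,
# standing in for whatever model is being benchmarked.
method_prediction = np.tile(control_mean, (perturbations, 1)) \
    + rng.normal(0.0, 0.5, size=true.shape)
baseline_prediction = np.tile(control_mean, (perturbations, 1))

mae = lambda pred: np.abs(pred - true).mean()
print(f"MAE, benchmarked method : {mae(method_prediction):.3f}")
print(f"MAE, uninformed baseline: {mae(baseline_prediction):.3f}")
```

If a proposed method cannot beat this kind of no-change baseline on held-out perturbations, its predictions carry little usable signal, which is the central observation of the benchmark described above.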
Collapse
|
24
|
Dhiman P, Ma J, Andaur Navarro CL, Speich B, Bullock G, Damen JAA, Hooft L, Kirtley S, Riley RD, Van Calster B, Moons KGM, Collins GS. Overinterpretation of findings in machine learning prediction model studies in oncology: a systematic review. J Clin Epidemiol 2023; 157:120-133. [PMID: 36935090 DOI: 10.1016/j.jclinepi.2023.03.012] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2022] [Revised: 03/03/2023] [Accepted: 03/14/2023] [Indexed: 03/19/2023]
Abstract
OBJECTIVES In biomedical research, spin is the overinterpretation of findings, and it is a growing concern. To date, the presence of spin has not been evaluated in prognostic model research in oncology, including studies developing and validating models for individualized risk prediction. STUDY DESIGN AND SETTING We conducted a systematic review, searching MEDLINE and EMBASE for oncology-related studies that developed and validated a prognostic model using machine learning published between 1st January, 2019, and 5th September, 2019. We used existing spin frameworks and described areas of highly suggestive spin practices. RESULTS We included 62 publications (including 152 developed models; 37 validated models). Reporting was inconsistent between methods and the results in 27% of studies due to additional analysis and selective reporting. Thirty-two studies (out of 36 applicable studies) reported comparisons between developed models in their discussion and predominantly used discrimination measures to support their claims (78%). Thirty-five studies (56%) used an overly strong or leading word in their title, abstract, results, discussion, or conclusion. CONCLUSION The potential for spin needs to be considered when reading, interpreting, and using studies that developed and validated prognostic models in oncology. Researchers should carefully report their prognostic model research using words that reflect their actual results and strength of evidence.
Collapse
Affiliation(s)
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK.
| | - Jie Ma
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
| | - Constanza L Andaur Navarro
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Benjamin Speich
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; Meta-Research Centre, Department of Clinical Research, University Hospital Basel, University of Basel, Basel, Switzerland
| | - Garrett Bullock
- Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Johanna A A Damen
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Lotty Hooft
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Shona Kirtley
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
| | - Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire, UK, ST5 5BG
| | - Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; EPI-centre, KU Leuven, Leuven, Belgium
| | - Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| |
Collapse
|
25
|
Hu J, Szymczak S. A review on longitudinal data analysis with random forest. Brief Bioinform 2023; 24:6991123. [PMID: 36653905 PMCID: PMC10025446 DOI: 10.1093/bib/bbad002] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2022] [Revised: 12/12/2022] [Accepted: 12/31/2022] [Indexed: 01/20/2023] Open
Abstract
In longitudinal studies variables are measured repeatedly over time, leading to clustered and correlated observations. If the goal of the study is to develop prediction models, machine learning approaches such as the powerful random forest (RF) are often promising alternatives to standard statistical methods, especially in the context of high-dimensional data. In this paper, we review extensions of the standard RF method for the purpose of longitudinal data analysis. Extension methods are categorized according to the data structures for which they are designed. We consider both univariate and multivariate response longitudinal data and further categorize the repeated measurements according to whether the time effect is relevant. Even though most extensions are proposed for low-dimensional data, some can be applied to high-dimensional data. Information of available software implementations of the reviewed extensions is also given. We conclude with discussions on the limitations of our review and some future research directions.
Collapse
Affiliation(s)
- Jianchang Hu
- Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
| | - Silke Szymczak
- Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany
| |
Collapse
|
26
|
Ullmann T, Peschel S, Finger P, Müller CL, Boulesteix AL. Over-optimism in unsupervised microbiome analysis: Insights from network learning and clustering. PLoS Comput Biol 2023; 19:e1010820. [PMID: 36608142 PMCID: PMC9873197 DOI: 10.1371/journal.pcbi.1010820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2022] [Revised: 01/24/2023] [Accepted: 12/15/2022] [Indexed: 01/07/2023] Open
Abstract
In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity. Many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and might be tempted to try different methods on their dataset to look for the "best" ones. However, if only the best results are selectively reported, this may cause over-optimism: the "best" method is overly fitted to the specific dataset, and the results might be non-replicable on validation data. Such effects will ultimately hinder research progress. Yet so far, these topics have been given little attention in the context of unsupervised microbiome analysis. In our illustrative study, we aim to quantify over-optimism effects in this context. We model the approach of a hypothetical microbiome researcher who undertakes four unsupervised research tasks: clustering of bacterial genera, hub detection in microbial networks, differential microbial network analysis, and clustering of samples. While these tasks are unsupervised, the researcher might still have certain expectations as to what constitutes interesting results. We translate these expectations into concrete evaluation criteria that the hypothetical researcher might want to optimize. We then randomly split an exemplary dataset from the American Gut Project into discovery and validation sets multiple times. For each research task, multiple method combinations (e.g., methods for data normalization, network generation, and/or clustering) are tried on the discovery data, and the combination that yields the best result according to the evaluation criterion is chosen. While the hypothetical researcher might only report this result, we also apply the "best" method combination to the validation dataset. The results are then compared between discovery and validation data. In all four research tasks, there are notable over-optimism effects; the results on the validation data set are worse compared to the discovery data, averaged over multiple random splits into discovery/validation data. Our study thus highlights the importance of validation and replication in microbiome analysis to obtain reliable results and demonstrates that the issue of over-optimism goes beyond the context of statistical testing and fishing for significance.
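The over-optimism mechanism described above can be re-enacted schematically. The sketch below uses synthetic data rather than the American Gut Project data: several clustering configurations are tried on a discovery split, the best silhouette score is selected, and the chosen configuration is then re-scored on a validation split, where its apparent advantage typically shrinks.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 30))            # weakly structured "samples x taxa"

discovery, validation = train_test_split(X, test_size=0.5, random_state=3)

candidates = {
    f"{algo}-k{k}": (algo, k)
    for algo in ("kmeans", "ward")
    for k in (2, 3, 4, 5, 6)
}

def cluster_score(data, algo, k):
    model = (KMeans(n_clusters=k, n_init=10, random_state=0)
             if algo == "kmeans"
             else AgglomerativeClustering(n_clusters=k, linkage="ward"))
    labels = model.fit_predict(data)
    return silhouette_score(data, labels)

disc_scores = {name: cluster_score(discovery, *cfg)
               for name, cfg in candidates.items()}
best = max(disc_scores, key=disc_scores.get)       # selected "best" pipeline
print(f"best on discovery : {best} -> {disc_scores[best]:.3f}")
print(f"same on validation: {cluster_score(validation, *candidates[best]):.3f}")
```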
Collapse
Affiliation(s)
- Theresa Ullmann
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-Universität München, München, Germany
- Munich Center for Machine Learning (MCML), München, Germany
| | - Stefanie Peschel
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, Neuherberg, Germany
- Department of Statistics, Ludwig-Maximilians-Universität München, München, Germany
| | - Philipp Finger
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-Universität München, München, Germany
| | - Christian L. Müller
- Department of Statistics, Ludwig-Maximilians-Universität München, München, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Center for Computational Mathematics, Flatiron Institute, New York, New York, United States of America
| | - Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-Universität München, München, Germany
- Munich Center for Machine Learning (MCML), München, Germany
| |
Collapse
|
27
|
Chowdhury MZI, Leung AA, Walker RL, Sikdar KC, O’Beirne M, Quan H, Turin TC. A comparison of machine learning algorithms and traditional regression-based statistical modeling for predicting hypertension incidence in a Canadian population. Sci Rep 2023; 13:13. [PMID: 36593280 PMCID: PMC9807553 DOI: 10.1038/s41598-022-27264-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2022] [Accepted: 12/29/2022] [Indexed: 01/03/2023] Open
Abstract
Risk prediction models are frequently used to identify individuals at risk of developing hypertension. This study evaluates different machine learning algorithms and compares their predictive performance with the conventional Cox proportional hazards (PH) model to predict hypertension incidence using survival data. This study analyzed 18,322 participants on 24 candidate features from the large Alberta's Tomorrow Project (ATP) to develop different prediction models. To select the top features, we applied five feature selection methods, including two filter-based: a univariate Cox p-value and C-index; two embedded-based: random survival forest and least absolute shrinkage and selection operator (Lasso); and one constraint-based: the statistically equivalent signature (SES). Five machine learning algorithms were developed to predict hypertension incidence: penalized regression Ridge, Lasso, Elastic Net (EN), random survival forest (RSF), and gradient boosting (GB), along with the conventional Cox PH model. The predictive performance of the models was assessed using C-index. The performance of machine learning algorithms was observed, similar to the conventional Cox PH model. Average C-indexes were 0.78, 0.78, 0.78, 0.76, 0.76, and 0.77 for Ridge, Lasso, EN, RSF, GB and Cox PH, respectively. Important features associated with each model were also presented. Our study findings demonstrate little predictive performance difference between machine learning algorithms and the conventional Cox PH regression model in predicting hypertension incidence. In a moderate dataset with a reasonable number of features, conventional regression-based models perform similar to machine learning algorithms with good predictive accuracy.
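A compressed version of the comparison reported above is sketched below, using simulated survival data rather than the ATP cohort and the scikit-survival package. It fits a conventional Cox proportional hazards model and a random survival forest and reports Harrell's C-index on held-out data; the data-generating process and settings are illustrative assumptions.

```python
import numpy as np
from sksurv.util import Surv
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import concordance_index_censored

rng = np.random.default_rng(7)
n, p = 2_000, 10
X = rng.normal(size=(n, p))
linpred = X[:, 0] + 0.5 * X[:, 1]                   # two informative features
time = rng.exponential(scale=np.exp(-linpred))      # proportional hazards DGP
censor = rng.exponential(scale=2.0, size=n)
event = time <= censor
observed = np.minimum(time, censor)
y = Surv.from_arrays(event=event, time=observed)

train, test = slice(0, 1_500), slice(1_500, None)

for name, model in [("Cox PH", CoxPHSurvivalAnalysis()),
                    ("Random survival forest",
                     RandomSurvivalForest(n_estimators=200, random_state=0))]:
    model.fit(X[train], y[train])
    risk = model.predict(X[test])                   # higher = higher risk
    cindex = concordance_index_censored(event[test], observed[test], risk)[0]
    print(f"{name:24s} C-index = {cindex:.3f}")
```

With a data-generating process this simple and linear, the two approaches give similar discrimination, echoing the study's finding that flexible learners offer little gain over the Cox model in moderate, well-behaved datasets.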
Collapse
Affiliation(s)
- Mohammad Ziaul Islam Chowdhury
- Department of Community Health Sciences, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada; Department of Family Medicine, University of Calgary, 3330 Hospital Drive NW, Calgary, AB T2N 4N1, Canada; Present Address: Department of Psychiatry, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada
| | - Alexander A. Leung
- Department of Community Health Sciences, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada; Department of Medicine, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada
| | - Robin L. Walker
- Department of Community Health Sciences, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada; Primary Health Care Integration Network, Primary Health Care, Alberta Health Services, Calgary, AB, Canada
| | - Khokan C. Sikdar
- Health Status Assessment, Surveillance and Reporting, Public Health Surveillance and Infrastructure, Provincial Population and Public Health, Alberta Health Services, 10101 Southport Rd. SW, Calgary, AB T2W 3N2, Canada
| | - Maeve O’Beirne
- Department of Family Medicine, University of Calgary, 3330 Hospital Drive NW, Calgary, AB T2N 4N1, Canada
| | - Hude Quan
- Department of Community Health Sciences, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada
| | - Tanvir C. Turin
- Department of Community Health Sciences, University of Calgary, 3280 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada; Department of Family Medicine, University of Calgary, 3330 Hospital Drive NW, Calgary, AB T2N 4N1, Canada
| |
Collapse
|
28
|
White IR. The importance of plausible data-generating mechanisms in simulation studies: A response to 'Comparing methods for handling missing covariates in meta-regression' by Lee and Beretvas (doi: 10.1002/jrsm.1585). Res Synth Methods 2023; 14:137-139. [PMID: 36181469 DOI: 10.1002/jrsm.1605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2022] [Accepted: 07/05/2022] [Indexed: 01/18/2023]
Abstract
The paper by Lee and Beretvas (doi:10.1002/jrsm.1585) described a well-executed simulation study comparing 'modern' with 'ad hoc' methods for performing meta-regression when some covariates are incomplete. However, they drew practical conclusions after simulating data under a single missing data mechanism which favoured the 'modern' methods, while other missing data mechanisms would have favoured the 'ad hoc' methods. Broad recommendations about methods to use in practice should instead be based on simulation studies using a range of plausible data-generating mechanisms. This range must represent what is believed likely to occur in practice, and not what is convenient for statistical analysis.
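The point about data-generating mechanisms can be made concrete with a small example. The sketch below is not the design used by Lee and Beretvas; it simply generates a covariate that is missing under three textbook mechanisms (MCAR, MAR, MNAR) at a similar overall rate and shows that the complete-case mean is biased only under some of them, illustrating why the choice of simulated mechanism can determine which method appears to perform best.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 5_000
x = rng.normal(size=n)            # covariate that may be missing (true mean 0)
z = rng.normal(size=n)            # fully observed companion covariate

def missing_mask(mechanism: str) -> np.ndarray:
    if mechanism == "MCAR":       # missingness unrelated to x or z
        prob = np.full(n, 0.3)
    elif mechanism == "MAR":      # depends only on the fully observed z
        prob = 1 / (1 + np.exp(-(z - 0.8)))
    elif mechanism == "MNAR":     # depends on the unobserved value of x itself
        prob = 1 / (1 + np.exp(-(x - 0.8)))
    else:
        raise ValueError(mechanism)
    return rng.uniform(size=n) < prob

for mech in ("MCAR", "MAR", "MNAR"):
    mask = missing_mask(mech)
    print(f"{mech}: {mask.mean():.0%} missing, "
          f"complete-case mean of x = {x[~mask].mean():+.3f} (true 0)")
```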
Collapse
|
29
|
Kanduri C, Scheffer L, Pavlović M, Rand KD, Chernigovskaya M, Pirvandy O, Yaari G, Greiff V, Sandve GK. simAIRR: simulation of adaptive immune repertoires with realistic receptor sequence sharing for benchmarking of immune state prediction methods. Gigascience 2022; 12:giad074. [PMID: 37848619 PMCID: PMC10580376 DOI: 10.1093/gigascience/giad074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2023] [Revised: 07/20/2023] [Accepted: 08/29/2023] [Indexed: 10/19/2023] Open
Abstract
BACKGROUND Machine learning (ML) has gained significant attention for classifying immune states in adaptive immune receptor repertoires (AIRRs) to support the advancement of immunodiagnostics and therapeutics. Simulated data are crucial for the rigorous benchmarking of AIRR-ML methods. Existing approaches to generating synthetic benchmarking datasets result in the generation of naive repertoires missing the key feature of many shared receptor sequences (selected for common antigens) found in antigen-experienced repertoires. RESULTS We demonstrate that a common approach to generating simulated AIRR benchmark datasets can introduce biases, which may be exploited for undesired shortcut learning by certain ML methods. To mitigate undesirable access to true signals in simulated AIRR datasets, we devised a simulation strategy (simAIRR) that constructs antigen-experienced-like repertoires with a realistic overlap of receptor sequences. simAIRR can be used for constructing AIRR-level benchmarks based on a range of assumptions (or experimental data sources) for what constitutes receptor-level immune signals. This includes the possibility of making or not making any prior assumptions regarding the similarity or commonality of immune state-associated sequences that will be used as true signals. We demonstrate the real-world realism of our proposed simulation approach by showing that basic ML strategies perform similarly on simAIRR-generated and real-world experimental AIRR datasets. CONCLUSIONS This study sheds light on the potential shortcut learning opportunities for ML methods that can arise with the state-of-the-art way of simulating AIRR datasets. simAIRR is available as a Python package: https://github.com/KanduriC/simAIRR.
Collapse
Affiliation(s)
- Chakravarthi Kanduri
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
- UiORealArt Convergence Environment, University of Oslo, 0373 Oslo, Norway
| | - Lonneke Scheffer
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
| | - Milena Pavlović
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
- UiORealArt Convergence Environment, University of Oslo, 0373 Oslo, Norway
| | - Knut Dagestad Rand
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
| | - Maria Chernigovskaya
- Department of Immunology and Oslo University Hospital, University of Oslo, 0373 Oslo, Norway
| | - Oz Pirvandy
- Faculty of Engineering, Bar-Ilan University, 5290002, Israel
| | - Gur Yaari
- Faculty of Engineering, Bar-Ilan University, 5290002, Israel
| | - Victor Greiff
- Department of Immunology and Oslo University Hospital, University of Oslo, 0373 Oslo, Norway
| | - Geir K Sandve
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0373 Oslo, Norway
- UiORealArt Convergence Environment, University of Oslo, 0373 Oslo, Norway
| |
Collapse
|
30
|
Lohmann A, Astivia OLO, Morris TP, Groenwold RHH. It's time! Ten reasons to start replicating simulation studies. FRONTIERS IN EPIDEMIOLOGY 2022; 2:973470. [PMID: 38455335 PMCID: PMC10911016 DOI: 10.3389/fepid.2022.973470] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/20/2022] [Accepted: 08/17/2022] [Indexed: 03/09/2024]
Abstract
The quantitative analysis of research data is a core element of empirical research. The performance of statistical methods that are used for analyzing empirical data can be evaluated and compared using computer simulations. A single simulation study can influence the analyses of thousands of empirical studies to follow. With great power comes great responsibility. Here, we argue that this responsibility includes replication of simulation studies to ensure a sound foundation for data analytical decisions. Furthermore, being designed, run, and reported by humans, simulation studies face challenges similar to other experimental empirical research and hence should not be exempt from replication attempts. We highlight that the potential replicability of simulation studies is an opportunity quantitative methodology as a field should pay more attention to.
Collapse
Affiliation(s)
- Anna Lohmann
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, Netherlands
| | | | - Tim P. Morris
- MRC Clinical Trials Unit at UCL, Institute of Clinical Trials and Methodology, University College London, London, United Kingdom
| | - Rolf H. H. Groenwold
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, Netherlands
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
| |
Collapse
|
31
|
Bracher-Smith M, Rees E, Menzies G, Walters JTR, O'Donovan MC, Owen MJ, Kirov G, Escott-Price V. Machine learning for prediction of schizophrenia using genetic and demographic factors in the UK biobank. Schizophr Res 2022; 246:156-164. [PMID: 35779327 PMCID: PMC9399753 DOI: 10.1016/j.schres.2022.06.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/01/2022] [Revised: 06/01/2022] [Accepted: 06/11/2022] [Indexed: 01/29/2023]
Abstract
Machine learning (ML) holds promise for precision psychiatry, but its predictive performance is unclear. We assessed whether ML provided added value over logistic regression for prediction of schizophrenia, and compared models built using polygenic risk scores (PRS) or clinical/demographic factors. LASSO and ridge-penalised logistic regression, support vector machines (SVM), random forests, boosting, neural networks and stacked models were trained to predict schizophrenia, using PRS for schizophrenia (PRSSZ), sex, parental depression, educational attainment, winter birth, handedness and number of siblings as predictors. Models were evaluated for discrimination using area under the receiver operator characteristic curve (AUROC) and relative importance of predictors using permutation feature importance (PFI). In a secondary analysis, fitted models were tested for association with schizophrenia-related traits which had not been used in model development. Following learning curve analysis, 738 cases and 3690 randomly sampled controls were selected from the UK Biobank. ML models combining all predictors showed the highest discrimination (linear SVM, AUROC = 0.71), but did not significantly outperform logistic regression. AUROC was robust over 100 random resamples of controls. PFI identified PRSSZ as the most important predictor. Highest variance in fitted models was explained by schizophrenia-related traits including fluid intelligence (most associated: linear SVM), digit symbol substitution (RBF SVM), BMI (XGBoost), smoking status (XGBoost) and deprivation (linear SVM). In conclusion, ML approaches did not provide substantial added value for prediction of schizophrenia over logistic regression, as indexed by AUROC; however, risk scores derived with different ML approaches differ with respect to association with schizophrenia-related traits.
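A schematic of the evaluation pipeline described above is sketched below on synthetic data rather than UK Biobank records: a penalised logistic regression and a linear support vector machine are compared by AUROC, and permutation feature importance ranks the predictors. The feature names are hypothetical stand-ins for the PRS and demographic variables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=4_000, n_features=7, n_informative=3,
                           weights=[0.8, 0.2], random_state=0)
feature_names = ["PRS", "sex", "parental_depression", "education",
                 "winter_birth", "handedness", "n_siblings"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "ridge logistic regression": LogisticRegression(penalty="l2", C=1.0,
                                                    max_iter=1_000),
    "linear SVM": SVC(kernel="linear", probability=True, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auroc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC = {auroc:.3f}")

# Permutation feature importance for the last fitted model.
pfi = permutation_importance(model, X_te, y_te, scoring="roc_auc",
                             n_repeats=20, random_state=0)
for idx in pfi.importances_mean.argsort()[::-1]:
    print(f"{feature_names[idx]:22s} {pfi.importances_mean[idx]:+.4f}")
```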
Collapse
Affiliation(s)
- Matthew Bracher-Smith
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK; Dementia Research Institute, Cardiff University, UK
| | - Elliott Rees
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK
| | | | - James T R Walters
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK
| | - Michael C O'Donovan
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK
| | - Michael J Owen
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK
| | - George Kirov
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK
| | - Valentina Escott-Price
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University, UK.
| |
Collapse
|
32
|
Nabirotchkin S, Bouaziz J, Glibert F, Mandel J, Foucquier J, Hajj R, Callizot N, Cholet N, Guedj M, Cohen D. Combinational Drug Repurposing from Genetic Networks Applied to Alzheimer’s Disease. J Alzheimers Dis 2022; 88:1585-1603. [DOI: 10.3233/jad-220120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Background: Human diseases are multi-factorial biological phenomena resulting from perturbations of numerous functional networks. The complex nature of human diseases explains frequently observed marginal or transitory efficacy of mono-therapeutic interventions. For this reason, combination therapy is being increasingly evaluated as a biologically plausible strategy for reversing disease state, fostering the development of dedicated methodological and experimental approaches. In parallel, genome-wide association studies (GWAS) provide a prominent opportunity for disclosing human-specific therapeutic targets and rational drug repurposing. Objective: In this context, our objective was to elaborate an integrated computational platform to accelerate discovery and experimental validation of synergistic combinations of repurposed drugs for treatment of common human diseases. Methods: The proposed approach combines adapted statistical analysis of GWAS data, pathway-based functional annotation of genetic findings using gene set enrichment technique, computational reconstruction of signaling networks enriched in disease-associated genes, selection of candidate repurposed drugs and proof-of-concept combinational experimental screening. Results: It enables robust identification of signaling pathways enriched in disease susceptibility loci. Therapeutic targeting of the disease-associated signaling networks provides a reliable way for rational drug repurposing and rapid development of synergistic drug combinations for common human diseases. Conclusion: Here we demonstrate the feasibility and efficacy of the proposed approach with an experiment application to Alzheimer’s disease.
Collapse
|
33
|
Austin PC, Harrell FE, Lee DS, Steyerberg EW. Empirical analyses and simulations showed that different machine and statistical learning methods had differing performance for predicting blood pressure. Sci Rep 2022; 12:9312. [PMID: 35660759 PMCID: PMC9166797 DOI: 10.1038/s41598-022-13015-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2022] [Accepted: 05/19/2022] [Indexed: 12/20/2022] Open
Abstract
Machine learning is increasingly being used to predict clinical outcomes. Most comparisons of different methods have been based on empirical analyses in specific datasets. We used Monte Carlo simulations to determine when machine learning methods perform better than statistical learning methods in a specific setting. We evaluated six learning methods: stochastic gradient boosting machines using trees as the base learners, random forests, artificial neural networks, the lasso, ridge regression, and linear regression estimated using ordinary least squares (OLS). Our simulations were informed by empirical analyses in patients with acute myocardial infarction (AMI) and congestive heart failure (CHF) and used six data-generating processes, each based on one of the six learning methods, to simulate continuous outcomes in the derivation and validation samples. The outcome was systolic blood pressure at hospital discharge, a continuous outcome. We applied the six learning methods in each of the simulated derivation samples and evaluated performance in the simulated validation samples. The primary observation was that neural networks tended to result in estimates with worse predictive accuracy than the other five methods in both disease samples and across all six data-generating processes. Boosted trees and OLS regression tended to perform well across a range of scenarios.
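The simulation logic summarised above follows a standard pattern that the sketch below condenses, using an assumed linear data-generating process rather than the authors' AMI/CHF-informed processes: derivation and validation samples are drawn repeatedly, each learner is fitted on the derivation sample, and validation root mean squared error is averaged over repetitions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)

def simulate(n, coefs, noise_sd=10.0):
    """Linear data-generating process for a continuous outcome (e.g. SBP)."""
    X = rng.normal(size=(n, len(coefs)))
    y = 130 + X @ coefs + rng.normal(scale=noise_sd, size=n)
    return X, y

coefs = np.array([5.0, -3.0, 2.0, 0.0, 0.0])
learners = {
    "OLS": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "boosted trees": GradientBoostingRegressor(random_state=0),
}

n_reps, results = 20, {name: [] for name in learners}
for _ in range(n_reps):
    X_dev, y_dev = simulate(1_000, coefs)   # derivation sample
    X_val, y_val = simulate(1_000, coefs)   # validation sample
    for name, model in learners.items():
        model.fit(X_dev, y_dev)
        rmse = mean_squared_error(y_val, model.predict(X_val)) ** 0.5
        results[name].append(rmse)

for name, rmses in results.items():
    print(f"{name:14s} mean validation RMSE = {np.mean(rmses):.2f}")
```

In the full study the same design is repeated with each of the six methods serving in turn as the data-generating process, which is what allows the authors to say when a given learner is favoured.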
Collapse
Affiliation(s)
- Peter C Austin
- ICES, G106, 2075 Bayview Avenue, Toronto, ON, M4N 3M5, Canada; Department of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada; Schulich Heart Research Program, Sunnybrook Research Institute, Toronto, ON, Canada.
| | - Frank E Harrell
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Douglas S Lee
- ICES, G106, 2075 Bayview Avenue, Toronto, ON, M4N 3M5, Canada; Department of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada; Faculty of Medicine, University of Toronto, Toronto, ON, Canada
| | - Ewout W Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, The Netherlands
| |
Collapse
|
34
|
Smith H, Sweeting M, Morris T, Crowther MJ. A scoping methodological review of simulation studies comparing statistical and machine learning approaches to risk prediction for time-to-event data. Diagn Progn Res 2022; 6:10. [PMID: 35650647 PMCID: PMC9161606 DOI: 10.1186/s41512-022-00124-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/01/2021] [Accepted: 03/01/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND There is substantial interest in the adaptation and application of so-called machine learning approaches to prognostic modelling of censored time-to-event data. These methods must be compared and evaluated against existing methods in a variety of scenarios to determine their predictive performance. A scoping review of how machine learning methods have been compared to traditional survival models is important to identify the comparisons that have been made and issues where they are lacking, biased towards one approach or misleading. METHODS We conducted a scoping review of research articles published between 1 January 2000 and 2 December 2020 using PubMed. Eligible articles were those that used simulation studies to compare statistical and machine learning methods for risk prediction with a time-to-event outcome in a medical/healthcare setting. We focus on data-generating mechanisms (DGMs), the methods that have been compared, the estimands of the simulation studies, and the performance measures used to evaluate them. RESULTS A total of ten articles were identified as eligible for the review. Six of the articles evaluated a method that was developed by the authors, four of which were machine learning methods, and the results almost always stated that this developed method's performance was equivalent to or better than the other methods compared. Comparisons were often biased towards the novel approach, with the majority only comparing against a basic Cox proportional hazards model, and in scenarios where it is clear it would not perform well. In many of the articles reviewed, key information was unclear, such as the number of simulation repetitions and how performance measures were calculated. CONCLUSION It is vital that method comparisons are unbiased and comprehensive, and this should be the goal even if realising it is difficult. Fully assessing how newly developed methods perform and how they compare to a variety of traditional statistical methods for prognostic modelling is imperative as these methods are already being applied in clinical contexts. Evaluations of the performance and usefulness of recently developed methods for risk prediction should be continued and reporting standards improved as these methods become increasingly popular.
Collapse
Affiliation(s)
- Hayley Smith
- Department of Health Sciences, University of Leicester, Leicester, LE1 7RH, UK
| | - Michael Sweeting
- Department of Health Sciences, University of Leicester, Leicester, LE1 7RH, UK
- Statistical Innovation, Oncology Biometrics, Oncology R&D, AstraZeneca, Cambridge, UK
| | - Tim Morris
- MRC Clinical Trials Unit at UCL, 90 High Holborn, London, WC1V 6LJ, UK
| | - Michael J. Crowther
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| |
Collapse
|
35
|
Richter J, Friede T, Rahnenführer J. Improving adaptive seamless designs through Bayesian optimization. Biom J 2022; 64:948-963. [PMID: 35212423 DOI: 10.1002/bimj.202000389] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Revised: 08/29/2021] [Accepted: 10/01/2021] [Indexed: 11/07/2022]
Abstract
We propose to use Bayesian optimization (BO) to improve the efficiency of the design selection process in clinical trials. BO is a method to optimize expensive black-box functions, by using a regression as a surrogate to guide the search. In clinical trials, planning test procedures and sample sizes is a crucial task. A common goal is to maximize the test power, given a set of treatments, corresponding effect sizes, and a total number of samples. From a wide range of possible designs, we aim to select the best one in a short time to allow quick decisions. The standard approach to simulate the power for each single design can become too time consuming. When the number of possible designs becomes very large, either large computational resources are required or an exhaustive exploration of all possible designs takes too long. Here, we propose to use BO to quickly find a clinical trial design with high power from a large number of candidate designs. We demonstrate the effectiveness of our approach by optimizing the power of adaptive seamless designs for different sets of treatment effect sizes. Comparing BO with an exhaustive evaluation of all candidate designs shows that BO finds competitive designs in a fraction of the time.
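A toy version of this idea is sketched below; it is not the authors' adaptive seamless design. The "design" being optimised is simply the fraction of a fixed budget allocated to the treatment arm when the two arms have unequal variances, the objective is Monte Carlo simulated power, and scikit-optimize's Gaussian-process optimiser searches over the allocation without simulating every candidate.

```python
import numpy as np
from scipy import stats
from skopt import gp_minimize
from skopt.space import Real

rng = np.random.default_rng(5)
N_TOTAL, EFFECT, SD_CONTROL, SD_TREAT = 200, 0.5, 1.0, 2.0

def simulated_power(frac_treat: float, n_sim: int = 400) -> float:
    """Monte Carlo power of a Welch t-test for a given allocation fraction."""
    n_t = max(2, int(round(frac_treat * N_TOTAL)))
    n_c = max(2, N_TOTAL - n_t)
    rejections = 0
    for _ in range(n_sim):
        treat = rng.normal(EFFECT, SD_TREAT, size=n_t)
        control = rng.normal(0.0, SD_CONTROL, size=n_c)
        p = stats.ttest_ind(treat, control, equal_var=False).pvalue
        rejections += p < 0.05
    return rejections / n_sim

# gp_minimize minimises, so the negative simulated power is the objective.
result = gp_minimize(lambda x: -simulated_power(x[0]),
                     dimensions=[Real(0.2, 0.8, name="frac_treat")],
                     n_calls=25, random_state=0)
print(f"selected allocation to treatment: {result.x[0]:.2f}, "
      f"estimated power: {-result.fun:.2f}")
```

The surrogate model smooths over the Monte Carlo noise in the power estimates, which is exactly why Bayesian optimisation suits noisy, expensive design evaluations.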
Collapse
Affiliation(s)
- Jakob Richter
- Fakultät Statistik, Technische Universität Dortmund, Dortmund, Germany
| | - Tim Friede
- Institut für Medizinische Statistik, Universitätsmedizin Göttingen, Göttingen, Germany; Deutsches Zentrum für Herz-Kreislauf-Forschung (DZHK), Standort Göttingen, Göttingen, Germany
| | - Jörg Rahnenführer
- Fakultät Statistik, Technische Universität Dortmund, Dortmund, Germany
| |
Collapse
|
36
|
A systematic survey of methods guidance suggests areas for improvement regarding access, development, and transparency. J Clin Epidemiol 2022; 149:217-226. [DOI: 10.1016/j.jclinepi.2022.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2022] [Revised: 05/01/2022] [Accepted: 05/15/2022] [Indexed: 11/17/2022]
|
37
|
Ullmann T, Beer A, Hünemörder M, Seidl T, Boulesteix AL. Over-optimistic evaluation and reporting of novel cluster algorithms: an illustrative study. ADV DATA ANAL CLASSI 2022. [DOI: 10.1007/s11634-022-00496-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
When researchers publish new cluster algorithms, they usually demonstrate the strengths of their novel approaches by comparing the algorithms’ performance with existing competitors. However, such studies are likely to be optimistically biased towards the new algorithms, as the authors have a vested interest in presenting their method as favorably as possible in order to increase their chances of getting published. Therefore, the superior performance of newly introduced cluster algorithms is over-optimistic and might not be confirmed in independent benchmark studies performed by neutral and unbiased authors. This problem is known among many researchers, but so far, the different mechanisms leading to over-optimism in cluster algorithm evaluation have never been systematically studied and discussed. Researchers are thus often not aware of the full extent of the problem. We present an illustrative study to illuminate the mechanisms by which authors—consciously or unconsciously—paint their cluster algorithm’s performance in an over-optimistic light. Using the recently published cluster algorithm Rock as an example, we demonstrate how optimization of the used datasets or data characteristics, of the algorithm’s parameters and of the choice of the competing cluster algorithms leads to Rock’s performance appearing better than it actually is. Our study is thus a cautionary tale that illustrates how easy it can be for researchers to claim apparent “superiority” of a new cluster algorithm. This illuminates the vital importance of strategies for avoiding the problems of over-optimism (such as, e.g., neutral benchmark studies), which we also discuss in the article.
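One of the mechanisms listed above, tuning the new method on the comparison datasets while running competitors at their defaults, is illustrated in the sketch below. KMeans stands in for the "novel" algorithm rather than Rock, and all datasets are synthetic.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

datasets = [make_blobs(n_samples=300, centers=4, cluster_std=std, random_state=i)
            for i, std in enumerate([1.0, 2.0, 3.0])]

def best_tuned_kmeans(X, y_true):
    """'Novel' method: report the best ARI over a grid of its own settings."""
    scores = [adjusted_rand_score(
                  y_true,
                  KMeans(n_clusters=k, n_init=n_init, random_state=0).fit_predict(X))
              for k in (2, 3, 4, 5, 6) for n_init in (1, 10)]
    return max(scores)

def default_competitor(X, y_true):
    """Competitor: a single run with an arbitrary default number of clusters."""
    labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
    return adjusted_rand_score(y_true, labels)

for i, (X, y_true) in enumerate(datasets):
    print(f"dataset {i}: tuned 'novel' ARI = {best_tuned_kmeans(X, y_true):.2f}, "
          f"default competitor ARI = {default_competitor(X, y_true):.2f}")
```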
Collapse
|
38
|
Gardner PP, Paterson JM, McGimpsey S, Ashari-Ghomi F, Umu SU, Pawlik A, Gavryushkin A, Black MA. Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software. Genome Biol 2022; 23:56. [PMID: 35172880 PMCID: PMC8851831 DOI: 10.1186/s13059-022-02625-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2021] [Accepted: 02/06/2022] [Indexed: 11/29/2022] Open
Abstract
Background Computational biology provides software tools for testing and making inferences about biological data. In the face of increasing volumes of data, heuristic methods that trade software speed for accuracy may be employed. We have studied these trade-offs using the results of a large number of independent software benchmarks, and evaluated whether external factors, including speed, author reputation, journal impact, recency and developer efforts, are indicative of accurate software. Results We find that software speed, author reputation, journal impact, number of citations and age are unreliable predictors of software accuracy. This is unfortunate because these are frequently cited reasons for selecting software tools. However, GitHub-derived statistics and high version numbers show that accurate bioinformatic software tools are generally the product of many improvements over time. We also find an excess of slow and inaccurate bioinformatic software tools, and this is consistent across many sub-disciplines. There are few tools that are middle-of-road in terms of accuracy and speed trade-offs. Conclusions Our findings indicate that accurate bioinformatic software is primarily the product of long-term commitments to software development. In addition, we hypothesise that bioinformatics software suffers from publication bias. Software that is intermediate in terms of both speed and accuracy may be difficult to publish—possibly due to author, editor and reviewer practises. This leaves an unfortunate hole in the literature, as ideal tools may fall into this gap. High accuracy tools are not always useful if they are slow, while high speed is not useful if the results are also inaccurate. Supplementary Information The online version contains supplementary material available at (10.1186/s13059-022-02625-x).
Collapse
Affiliation(s)
- Paul P Gardner
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand.
| | - James M Paterson
- Department of Civil and Natural Resources Engineering, University of Canterbury, Christchurch, New Zealand
| | | | - Fatemeh Ashari-Ghomi
- Research Group for Genomic Epidemiology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Sinan U Umu
- Department of Research, Cancer Registry of Norway, Oslo, Norway
| | | | - Alex Gavryushkin
- Department of Computer Science, University of Otago, Dunedin, New Zealand; School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
| | - Michael A Black
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
| |
Collapse
|
39
|
Westphal M, Zapf A, Brannath W. A multiple testing framework for diagnostic accuracy studies with co-primary endpoints. Stat Med 2022; 41:891-909. [PMID: 35075684 DOI: 10.1002/sim.9308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Revised: 12/12/2021] [Accepted: 12/17/2021] [Indexed: 11/08/2022]
Abstract
Major advances have been made regarding the utilization of machine learning techniques for disease diagnosis and prognosis based on complex and high-dimensional data. Despite all justified enthusiasm, overoptimistic assessments of predictive performance are still common in this area. However, predictive models and medical devices based on such models should undergo a throughout evaluation before being implemented into clinical practice. In this work, we propose a multiple testing framework for (comparative) phase III diagnostic accuracy studies with sensitivity and specificity as co-primary endpoints. Our approach challenges the frequent recommendation to strictly separate model selection and evaluation, that is, to only assess a single diagnostic model in the evaluation study. We show that our parametric simultaneous test procedure asymptotically allows strong control of the family-wise error rate. A multiplicity correction is also available for point and interval estimates. Moreover, we demonstrate in an extensive simulation study that our multiple testing strategy on average leads to a better final diagnostic model and increased statistical power. To plan such studies, we propose a Bayesian approach to determine the optimal number of models to evaluate simultaneously. For this purpose, our algorithm optimizes the expected final model performance given previous (hold-out) data from the model development phase. We conclude that an assessment of multiple promising diagnostic models in the same evaluation study has several advantages when suitable adjustments for multiple comparisons are employed.
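A deliberately simplified sketch of this setting follows. It evaluates several candidate diagnostic models on the same hypothetical study, requires each to beat benchmarks on both co-primary endpoints, and adjusts only with a Bonferroni correction across models, which is a cruder stand-in for the parametric simultaneous procedure proposed in the paper; all counts and performance values are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
SENS_BENCHMARK, SPEC_BENCHMARK, ALPHA = 0.75, 0.75, 0.025

# Hypothetical evaluation study: 200 diseased and 300 healthy subjects,
# three candidate models with assumed true sensitivities/specificities.
n_pos, n_neg = 200, 300
models = {"model A": (0.84, 0.78),
          "model B": (0.80, 0.83),
          "model C": (0.77, 0.76)}

adjusted_alpha = ALPHA / len(models)        # Bonferroni over candidate models
for name, (true_sens, true_spec) in models.items():
    tp = rng.binomial(n_pos, true_sens)     # true positives among diseased
    tn = rng.binomial(n_neg, true_spec)     # true negatives among healthy
    p_sens = stats.binomtest(tp, n_pos, SENS_BENCHMARK,
                             alternative="greater").pvalue
    p_spec = stats.binomtest(tn, n_neg, SPEC_BENCHMARK,
                             alternative="greater").pvalue
    # Co-primary endpoints: both must clear the (adjusted) significance level.
    success = (p_sens < adjusted_alpha) and (p_spec < adjusted_alpha)
    print(f"{name}: sens {tp/n_pos:.2f}, spec {tn/n_neg:.2f}, "
          f"both endpoints significant after adjustment: {success}")
```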
Collapse
Affiliation(s)
- Max Westphal
- Institute for Statistics, University of Bremen, Bremen, Germany; Fraunhofer Institute for Digital Medicine MEVIS, Max-Von-Laue-Straße 2, 28359, Bremen, Germany
| | - Antonia Zapf
- Institute of Medical Biometry and Epidemiology, UKE Hamburg, Hamburg, Germany
| | - Werner Brannath
- Institute for Statistics, University of Bremen, Bremen, Germany; Competence Center for Clinical Trials Bremen, University of Bremen, Bremen, Germany
| |
Collapse
|
40
|
|
41
|
Mejia D, Diaz M, Charry A, Enciso K, Ramírez O, Burkart S. "Stay at Home": The Effects of the COVID-19 Lockdown on Household Food Waste in Colombia. Front Psychol 2021; 12:764715. [PMID: 34777172 PMCID: PMC8581448 DOI: 10.3389/fpsyg.2021.764715] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2021] [Accepted: 09/30/2021] [Indexed: 11/13/2022] Open
Abstract
Household food waste represents one of the main challenges for sustainable development as this directly affects the economy of food consumers, the loss of natural resources and generates additional greenhouse gas emissions. The COVID-19 pandemic and its mitigation strategies caused one of the most serious economic crises in recent decades and could become the worst economic crisis that Latin America has had in its history. The objective of this study is to analyze changes in food waste behavior during the COVID-19 lockdown in Colombia in 2020, applying the Theory of Planned Behavior (TPB). For this purpose, we conducted a survey with 581 Colombian food consumers, which examined the influence of intentions to not waste food, subjective norms, some situational predictors, questions related to the COVID-19 pandemic, and the control of perceived behavior on food waste. The results suggest that the TPB can predict the intention to not waste food and, through it, the actual household food waste behavior, considering the lockdown in Colombia as an external shock. We observe that regarding the intention to not waste food, the most relevant variables are attitudes, subjective norms, control of the perceived behavior, and concerns regarding the Covid-19 pandemic. These variables increase the probability on average by a 0.8 Odds Ratio that the intention not to waste food increases, too. Regarding food waste behavior, whether it is considered ordinal or nominal, we see that the most relevant variables are intention, financial attitudes, and control of perceived behavior, doubling the probability that food waste behavior will improve. Based on the results, we provide recommendations for interested stakeholders that can help in the design of instruments for household food waste reduction.
Collapse
Affiliation(s)
| | - Manuel Diaz
- Alliance Bioversity International and CIAT, Cali, Colombia
| | - Andres Charry
- Alliance Bioversity International and CIAT, Cali, Colombia
| | - Karen Enciso
- Alliance Bioversity International and CIAT, Cali, Colombia
| | | | - Stefan Burkart
- Alliance Bioversity International and CIAT, Cali, Colombia
| |
Collapse
|
42
|
Peschel S, Müller CL, von Mutius E, Boulesteix AL, Depner M. NetCoMi: network construction and comparison for microbiome data in R. Brief Bioinform 2021; 22:bbaa290. [PMID: 33264391 PMCID: PMC8293835 DOI: 10.1093/bib/bbaa290] [Citation(s) in RCA: 143] [Impact Index Per Article: 47.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2020] [Revised: 09/24/2020] [Accepted: 10/07/2020] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Estimating microbial association networks from high-throughput sequencing data is a common exploratory data analysis approach aiming at understanding the complex interplay of microbial communities in their natural habitat. Statistical network estimation workflows comprise several analysis steps, including methods for zero handling, data normalization and computing microbial associations. Since microbial interactions are likely to change between conditions, e.g. between healthy individuals and patients, identifying network differences between groups is often an integral secondary analysis step. Thus far, however, no unifying computational tool is available that facilitates the whole analysis workflow of constructing, analysing and comparing microbial association networks from high-throughput sequencing data. RESULTS Here, we introduce NetCoMi (Network Construction and comparison for Microbiome data), an R package that integrates existing methods for each analysis step in a single reproducible computational workflow. The package offers functionality for constructing and analysing single microbial association networks as well as quantifying network differences. This enables insights into whether single taxa, groups of taxa or the overall network structure change between groups. NetCoMi also contains functionality for constructing differential networks, thus allowing to assess whether single pairs of taxa are differentially associated between two groups. Furthermore, NetCoMi facilitates the construction and analysis of dissimilarity networks of microbiome samples, enabling a high-level graphical summary of the heterogeneity of an entire microbiome sample collection. We illustrate NetCoMi's wide applicability using data sets from the GABRIELA study to compare microbial associations in settled dust from children's rooms between samples from two study centers (Ulm and Munich). AVAILABILITY R scripts used for producing the examples shown in this manuscript are provided as supplementary data. The NetCoMi package, together with a tutorial, is available at https://github.com/stefpeschel/NetCoMi. CONTACT Tel:+49 89 3187 43258; stefanie.peschel@mail.de. SUPPLEMENTARY INFORMATION Supplementary data are available at Briefings in Bioinformatics online.
Collapse
Affiliation(s)
- Stefanie Peschel
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
| | - Christian L Müller
- Department of Statistics, LMU München, Munich, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
- Center for Computational Mathematics, Flatiron Institute, New York, USA
| | - Erika von Mutius
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
- Dr von Hauner Children’s Hospital, LMU München, Munich, Germany
- Comprehensive Pneumology Center Munich (CPC-M), Member of the German Center for Lung Research, Munich, Germany
| | - Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU München, Munich, Germany
| | - Martin Depner
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
| |
Collapse
|
43
|
Buchka S, Hapfelmeier A, Gardner PP, Wilson R, Boulesteix AL. On the optimistic performance evaluation of newly introduced bioinformatic methods. Genome Biol 2021; 22:152. [PMID: 33975646 PMCID: PMC8111726 DOI: 10.1186/s13059-021-02365-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2020] [Accepted: 04/23/2021] [Indexed: 12/03/2022] Open
Abstract
Most research articles presenting new data analysis methods claim that "the new method performs better than existing methods," but the veracity of such statements is questionable. Our manuscript discusses and illustrates consequences of the optimistic bias occurring during the evaluation of novel data analysis methods, that is, all biases resulting from, for example, selection of datasets or competing methods, better ability to fix bugs in a preferred method, and selective reporting of method variants. We quantitatively investigate this bias using an example from epigenetic analysis: normalization methods for data generated by the Illumina HumanMethylation450K BeadChip microarray.
Collapse
Affiliation(s)
- Stefan Buchka
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU, Munich, Germany
- Alexander Hapfelmeier
- Institute of Medical Informatics, Statistics and Epidemiology, School of Medicine, TUM, Munich, Germany
- Institute of General Practice and Health Services Research, School of Medicine, TUM, Munich, Germany
- Paul P. Gardner
- Department of Biochemistry, University of Otago, Otago, New Zealand
- Rory Wilson
- Research Unit Molecular Epidemiology, Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU, Munich, Germany
44
Hleap JS, Littlefair JE, Steinke D, Hebert PDN, Cristescu ME. Assessment of current taxonomic assignment strategies for metabarcoding eukaryotes. Mol Ecol Resour 2021; 21:2190-2203. [PMID: 33905615 DOI: 10.1111/1755-0998.13407] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2020] [Revised: 03/08/2021] [Accepted: 04/19/2021] [Indexed: 01/04/2023]
Abstract
The effective use of metabarcoding in biodiversity science has brought important analytical challenges due to the need to generate accurate taxonomic assignments. The assignment of sequences to genus or species level is critical for biodiversity surveys and biomonitoring, but it is particularly challenging as researchers must select the approach that best recovers information on species composition. This study evaluates the performance and accuracy of seven methods in recovering the species composition of mock communities by using COI barcode fragments. The mock communities varied in species number and specimen abundance, while upstream molecular and bioinformatic variables were held constant. We evaluated the impact of parameter optimization on the quality of the predictions. Our results indicate that BLAST top hit competes well with more complex approaches if optimized for the mock community under study. For example, the two machine learning methods that were benchmarked proved more sensitive to reference database heterogeneity and completeness than methods based on sequence similarity. The accuracy of assignments was impacted by both species and specimen counts (query compositional heterogeneity), which ultimately influence the selection of appropriate software. We urge researchers to: (i) use realistic mock communities to allow optimization of parameters, regardless of the taxonomic assignment method employed; (ii) carefully choose and curate the reference databases, paying particular attention to their completeness; and (iii) use QIIME, BLAST or LCA methods in conjunction with parameter tuning to better assign taxonomy to diverse communities, especially when information on species diversity is lacking for the area under study.
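As a sketch of how a benchmark of this kind scores predictions against a mock community, the snippet below computes per-read precision and recall plus species-level recovery for a handful of made-up reads; the species names, the unassigned read and the particular metrics are illustrative assumptions, not the evaluation protocol of this study.

```python
# Hypothetical mock community (truth) and predicted assignments (names invented)
mock_truth = {"read1": "Daphnia pulex", "read2": "Daphnia pulex",
              "read3": "Gammarus fossarum", "read4": "Bosmina longirostris"}
predicted  = {"read1": "Daphnia pulex", "read2": "Daphnia magna",
              "read3": "Gammarus fossarum", "read4": None}  # None = unassigned

def assignment_metrics(truth, pred):
    """Per-read accuracy plus species-level recovery of the mock composition."""
    assigned = {r: s for r, s in pred.items() if s is not None}
    correct = sum(1 for r, s in assigned.items() if truth[r] == s)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(truth)
    true_species = set(truth.values())
    found_species = set(assigned.values())
    return {"read_precision": precision,
            "read_recall": recall,
            "species_recovered": len(true_species & found_species) / len(true_species),
            "false_species": len(found_species - true_species)}

print(assignment_metrics(mock_truth, predicted))
```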
Affiliation(s)
- Jose S Hleap
- Department of Biology, McGill University, Montreal, QC, Canada
- SHARCNET, University of Guelph, Guelph, ON, Canada
- Fundacion SQUALUS, Cali, Colombia
- Joanne E Littlefair
- Department of Biology, McGill University, Montreal, QC, Canada
- Queen Mary University of London, London, UK
- Dirk Steinke
- Centre for Biodiversity Genomics & Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
- Paul D N Hebert
- Centre for Biodiversity Genomics & Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
45
Quaak M, van de Mortel L, Thomas RM, van Wingen G. Deep learning applications for the classification of psychiatric disorders using neuroimaging data: Systematic review and meta-analysis. Neuroimage Clin 2021; 30:102584. [PMID: 33677240 PMCID: PMC8209481 DOI: 10.1016/j.nicl.2021.102584] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2020] [Revised: 01/18/2021] [Accepted: 01/29/2021] [Indexed: 12/20/2022]
Abstract
Deep learning (DL) methods have been increasingly applied to neuroimaging data to identify patients with psychiatric and neurological disorders. This review provides an overview of the different DL applications within psychiatry and compares DL model accuracy to standard machine learning (SML). Fifty-three articles were included for qualitative analysis, primarily investigating autism spectrum disorder (ASD; n = 22), schizophrenia (SZ; n = 22) and attention-deficit/hyperactivity disorder (ADHD; n = 9). Thirty-two of the thirty-five studies that directly compared DL to SML reported a higher accuracy for DL. Only sixteen studies could be included in a meta-regression to quantitatively compare DL and SML performance. This showed a higher odds ratio for DL models, though the comparison attained significance only for ASD. Our results suggest that deep learning of neuroimaging data is a promising tool for the classification of individual psychiatric patients. However, it is not yet used to its full potential: most studies use pre-engineered features, whereas one of the main advantages of DL is its ability to learn representations of minimally processed data. Our current evaluation is limited by the minimal reporting of performance measures needed for quantitative comparisons, and by the restriction to ADHD, SZ and ASD, as current research focuses on large publicly available datasets. To truly uncover the added value of DL, we need carefully designed comparisons of SML and DL models, which are still rarely performed.
Affiliation(s)
- Mirjam Quaak
- Amsterdam UMC, University of Amsterdam, Department of Psychiatry, Meibergdreef 5, 1105 AZ Amsterdam, The Netherlands
- Laurens van de Mortel
- Amsterdam UMC, University of Amsterdam, Department of Psychiatry, Meibergdreef 5, 1105 AZ Amsterdam, The Netherlands
- Rajat Mani Thomas
- Amsterdam UMC, University of Amsterdam, Department of Psychiatry, Meibergdreef 5, 1105 AZ Amsterdam, The Netherlands
- Guido van Wingen
- Amsterdam UMC, University of Amsterdam, Department of Psychiatry, Meibergdreef 5, 1105 AZ Amsterdam, The Netherlands
46
Baali I, Erten C, Kazan H. DriveWays: a method for identifying possibly overlapping driver pathways in cancer. Sci Rep 2020; 10:21971. [PMID: 33319839 PMCID: PMC7738685 DOI: 10.1038/s41598-020-78852-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2020] [Accepted: 11/19/2020] [Indexed: 11/22/2022] Open
Abstract
The majority of previous methods for identifying cancer driver modules output nonoverlapping modules. This assumption is biologically inaccurate as genes can participate in multiple molecular pathways. This is particularly true for cancer-associated genes, as many of them are network hubs connecting functionally distinct sets of genes. It is important to provide combinatorial optimization problem definitions modeling this biological phenomenon and to suggest efficient algorithms for its solution. We provide a formal definition of the Overlapping Driver Module Identification in Cancer (ODMIC) problem. We show that the problem is NP-hard. We propose a seed-and-extend based heuristic named DriveWays that identifies overlapping cancer driver modules from the graph built from the IntAct PPI network. DriveWays incorporates mutual exclusivity, coverage, and the network connectivity information of the genes. We show that DriveWays outperforms state-of-the-art methods in recovering well-known cancer driver genes when applied to TCGA pan-cancer data. Additionally, DriveWays' output modules show a stronger enrichment for the reference pathways in almost all cases. Overall, we show that enabling modules to overlap improves the recovery of functional pathways filtered with known cancer drivers, which essentially constitute the reference set of cancer-related pathways.
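The following toy sketch illustrates a generic seed-and-extend heuristic with a simple coverage-minus-overlap score (rewarding mutual exclusivity) on a miniature interaction graph. It is not the DriveWays algorithm; the gene names, mutation table and module size limit are invented for the example.

```python
# Toy inputs (hypothetical gene names): a small interaction graph and a
# gene -> set-of-mutated-patients table.
edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")}
mutations = {"A": {1, 2}, "B": {3, 4}, "C": {2, 5}, "D": {5, 6}, "E": {1, 6, 7}}

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def score(module):
    """Coverage minus coverage overlap: high when mutations rarely co-occur in a patient."""
    covered = set().union(*(mutations[g] for g in module))
    total = sum(len(mutations[g]) for g in module)
    return len(covered) - (total - len(covered))

def seed_and_extend(seed, max_size=3):
    """Greedily extend a single seed gene along graph edges; modules may overlap."""
    module = {seed}
    while len(module) < max_size:
        candidates = set().union(*(neighbors[g] for g in module)) - module
        best = max(candidates, key=lambda g: score(module | {g}), default=None)
        if best is None or score(module | {best}) <= score(module):
            break
        module.add(best)
    return frozenset(module)

modules = {seed_and_extend(g) for g in mutations}
for m in sorted(modules, key=score, reverse=True):
    print(sorted(m), "score =", score(m))
```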
Affiliation(s)
- Ilyes Baali
- Electrical and Computer Engineering Graduate Program, Antalya Bilim University, 07190, Antalya, Turkey
- Cesim Erten
- Department of Computer Engineering, Antalya Bilim University, 07190, Antalya, Turkey
- Hilal Kazan
- Department of Computer Engineering, Antalya Bilim University, 07190, Antalya, Turkey
47
Bokulich NA, Ziemski M, Robeson MS, Kaehler BD. Measuring the microbiome: Best practices for developing and benchmarking microbiomics methods. Comput Struct Biotechnol J 2020; 18:4048-4062. [PMID: 33363701 PMCID: PMC7744638 DOI: 10.1016/j.csbj.2020.11.049] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2020] [Revised: 11/27/2020] [Accepted: 11/28/2020] [Indexed: 12/12/2022] Open
Abstract
Microbiomes are integral components of diverse ecosystems, and increasingly recognized for their roles in the health of humans, animals, plants, and other hosts. Given their complexity (both in composition and function), the effective study of microbiomes (microbiomics) relies on the development, optimization, and validation of computational methods for analyzing microbial datasets, such as from marker-gene (e.g., 16S rRNA gene) and metagenome data. This review describes best practices for benchmarking and implementing computational methods (and software) for studying microbiomes, with particular focus on unique characteristics of microbiomes and microbiomics data that should be taken into account when designing and testing microbiomics methods.
Affiliation(s)
- Nicholas A. Bokulich
- Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zurich, Switzerland
- Michal Ziemski
- Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zurich, Switzerland
- Michael S. Robeson
- University of Arkansas for Medical Sciences, Department of Biomedical Informatics, Little Rock, AR, USA
48
Kreutz C, Can NS, Bruening RS, Meyberg R, Mérai Z, Fernandez-Pozo N, Rensing SA. A blind and independent benchmark study for detecting differentially methylated regions in plants. Bioinformatics 2020; 36:3314-3321. [PMID: 32181821 DOI: 10.1093/bioinformatics/btaa191] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2019] [Revised: 01/31/2020] [Accepted: 03/13/2020] [Indexed: 01/03/2023] Open
Abstract
MOTIVATION Bisulfite sequencing (BS-seq) is a state-of-the-art technique for investigating DNA methylation to gain insights into epigenetic regulation. Several algorithms have been published for the identification of differentially methylated regions (DMRs). However, the relative performance of the individual methods remains unclear, and it is difficult to optimally select an algorithm in application settings. RESULTS We analyzed BS-seq data from four plants covering three taxonomic groups. We first characterized the data using multiple summary statistics describing methylation levels, coverage, noise, as well as frequencies, magnitudes and lengths of methylated regions. Then, simulated datasets with characteristics closely matching the real experimental data were created. Seven different algorithms (metilene, methylKit, MOABS, DMRcate, Defiant, BSmooth, MethylSig) for DMR identification were applied and their performance was assessed. A blind and independent study design was chosen to reduce bias and to derive practical method selection guidelines. Overall, metilene had superior performance in most settings. Data attributes, such as coverage and the spread of DMR lengths, were found to be useful for selecting the best method for DMR detection. A decision tree to select the optimal approach based on these data attributes is provided. The presented procedure might serve as a general strategy for deriving algorithm selection rules tailored to the demands of specific application settings. AVAILABILITY AND IMPLEMENTATION Scripts that were used for the analyses and that can be used for prediction of the optimal algorithm are provided at https://github.com/kreutz-lab/DMR-DecisionTree. Simulated and experimental data are available at https://doi.org/10.6084/m9.figshare.11619045. CONTACT ckreutz@imbi.uni-freiburg.de. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Clemens Kreutz
- Faculty of Medicine and Medical Center, Institute of Medical Biometry and Statistics, University of Freiburg, 79104 Freiburg, Germany
- Centre for Integrative Biological Signalling Studies (CIBSS), University of Freiburg, 79104 Freiburg, Germany
- Nilay S Can
- Plant Cell Biology, Faculty of Biology, University of Marburg, 35043 Marburg, Germany
- Ralf Schulze Bruening
- Plant Cell Biology, Faculty of Biology, University of Marburg, 35043 Marburg, Germany
- Rabea Meyberg
- Plant Cell Biology, Faculty of Biology, University of Marburg, 35043 Marburg, Germany
- Zsuzsanna Mérai
- Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), 1030 Vienna, Austria
- Noe Fernandez-Pozo
- Plant Cell Biology, Faculty of Biology, University of Marburg, 35043 Marburg, Germany
- Stefan A Rensing
- Plant Cell Biology, Faculty of Biology, University of Marburg, 35043 Marburg, Germany
- Centre for Biological Signaling Studies (BIOSS), University of Freiburg, 79104 Freiburg, Germany
49
Herrmann M, Probst P, Hornung R, Jurinovic V, Boulesteix AL. Large-scale benchmark study of survival prediction methods using multi-omics data. Brief Bioinform 2020; 22:5895463. [PMID: 32823283 PMCID: PMC8138887 DOI: 10.1093/bib/bbaa167] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2020] [Revised: 06/25/2020] [Accepted: 07/03/2020] [Indexed: 12/18/2022] Open
Abstract
Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are increasingly often generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions through a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied to 18 multi-omics cancer datasets (35 to 1000 observations, up to 100 000 variables) from the database 'The Cancer Genome Atlas' (TCGA). The considered outcome was the (censored) survival time. Eleven methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan-Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno's C-index and the integrated Brier score served as performance metrics. The results indicate that methods taking into account the multi-omics structure have a slightly better prediction performance. Taking this structure into account can prevent the predictive information in low-dimensional groups, especially clinical variables, from being overlooked during prediction. Moreover, only the block forest method outperformed the Cox model on average, and only slightly. This indicates, as a by-product of our study, that in the considered TCGA studies the utility of multi-omics data for prediction purposes was limited. Contact: moritz.herrmann@stat.uni-muenchen.de, +49 89 2180 3198. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online. All analyses are reproducible using R code freely available on GitHub.
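To make the evaluation scheme concrete, the sketch below runs a repeated 5-fold cross-validation loop on simulated survival data and scores a deliberately trivial risk score with Harrell's concordance index. The benchmark itself used Uno's C-index and the integrated Brier score on real TCGA data, so this is only an illustration of the loop structure; the simulated data, the censoring scheme and the "model" (the first feature used directly as risk score) are assumptions for the example.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's concordance index: fraction of comparable pairs ordered correctly."""
    num, den = 0.0, 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] == 1 and time[i] < time[j]:   # pair is comparable
                den += 1
                num += 1.0 if risk[i] > risk[j] else (0.5 if risk[i] == risk[j] else 0.0)
    return num / den if den else 0.5

rng = np.random.default_rng(2)
n, p = 200, 5
X = rng.normal(size=(n, p))                       # stand-in for omics/clinical features
true_risk = X[:, 0]                               # only the first feature matters
time = rng.exponential(scale=np.exp(-true_risk))  # higher risk -> shorter survival
event = rng.integers(0, 2, size=n)                # crude random censoring indicator

# repeated 5-fold cross-validation of the placeholder risk score
scores = []
for repeat in range(3):
    idx = rng.permutation(n)
    for test_idx in np.array_split(idx, 5):
        risk_pred = X[test_idx, 0]
        scores.append(harrell_c_index(time[test_idx], event[test_idx], risk_pred))

print(f"mean cross-validated C-index: {np.mean(scores):.3f}")
```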
Affiliation(s)
- Moritz Herrmann
- Department of Statistics, Ludwig Maximilian University, Munich, 80539, Germany
- Philipp Probst
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
- Roman Hornung
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
- Vindi Jurinovic
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany
50
Wright ES. RNAconTest: comparing tools for noncoding RNA multiple sequence alignment based on structural consistency. RNA 2020; 26:531-540. [PMID: 32005745 PMCID: PMC7161358 DOI: 10.1261/rna.073015.119] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/20/2019] [Accepted: 01/28/2020] [Indexed: 05/05/2023]
Abstract
The importance of noncoding RNA sequences has become increasingly clear over the past decade. New RNA families are often detected and analyzed using comparative methods based on multiple sequence alignments. Accordingly, a number of programs have been developed for aligning and deriving secondary structures from sets of RNA sequences. Yet, the best tools for these tasks remain unclear because existing benchmarks contain too few sequences belonging to only a small number of RNA families. RNAconTest (RNA consistency test) is a new benchmarking approach relying on the observation that secondary structure is often conserved across highly divergent RNA sequences from the same family. RNAconTest scores multiple sequence alignments based on the level of consistency among known secondary structures belonging to reference sequences in their output alignment. Similarly, consensus secondary structure predictions are scored according to their agreement with one or more known structures in a family. Comparing the performance of 10 popular alignment programs using RNAconTest revealed that DAFS, DECIPHER, LocARNA, and MAFFT created the most structurally consistent alignments. The best consensus secondary structure predictions were generated by DAFS and LocARNA (via RNAalifold). Many of the methods specific to noncoding RNAs exhibited poor scalability as the number or length of input sequences increased, and several programs displayed substantial declines in score as more sequences were aligned. Overall, RNAconTest provides a means of testing and improving tools for comparative RNA analysis, as well as highlighting the best available approaches. RNAconTest is available from the DECIPHER website (http://DECIPHER.codes/Downloads.html).
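A minimal sketch of the consistency idea (not the RNAconTest implementation): base pairs known for one reference sequence are mapped through the alignment columns and checked against the pairs known for a second reference. The sequences, dot-bracket structures and alignment below are fabricated for illustration.

```python
def pairs_from_dotbracket(structure):
    """Return the set of (i, j) base pairs encoded in a dot-bracket string."""
    stack, pairs = [], set()
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.add((stack.pop(), i))
    return pairs

def seq_to_column(aligned):
    """Map each ungapped sequence position to its alignment column."""
    return [col for col, c in enumerate(aligned) if c != "-"]

def consistency(aligned1, struct1, aligned2, struct2):
    """Fraction of pairs in struct1 that map onto pairs of struct2 via the alignment."""
    cols1, cols2 = seq_to_column(aligned1), seq_to_column(aligned2)
    col_to_pos2 = {col: pos for pos, col in enumerate(cols2)}
    pairs1, pairs2 = pairs_from_dotbracket(struct1), pairs_from_dotbracket(struct2)
    shared = 0
    for i, j in pairs1:
        ci, cj = cols1[i], cols1[j]
        if ci in col_to_pos2 and cj in col_to_pos2 \
                and (col_to_pos2[ci], col_to_pos2[cj]) in pairs2:
            shared += 1
    return shared / len(pairs1) if pairs1 else 0.0

# hypothetical hairpins: ungapped structures, gapped (aligned) sequences
struct_a, aligned_a = "((((....))))", "GGGC-AUAA-GCCC"
struct_b, aligned_b = "(((......)))", "GGG-CAUAAAC-CC"
print(f"structural consistency: {consistency(aligned_a, struct_a, aligned_b, struct_b):.2f}")
```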
Affiliation(s)
- Erik S Wright
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania 15219, USA