1
Scorza LC, Zieliński T, Kalita I, Lepore A, El Karoui M, Millar AJ. Daily life in the Open Biologist's second job, as a Data Curator. Wellcome Open Res 2024; 9:523. [PMID: 39360219 PMCID: PMC11445645 DOI: 10.12688/wellcomeopenres.22899.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/29/2024] [Indexed: 10/04/2024] [Open access]
Abstract
Background Data reusability is the driving force of the research data life cycle. However, implementing strategies to generate reusable data from the data creation to the sharing stages is still a significant challenge. Even when datasets supporting a study are publicly shared, the outputs are often incomplete and/or not reusable. The FAIR (Findable, Accessible, Interoperable, Reusable) principles were published as general guidance to promote data reusability in research, but their practical implementation in research groups still lags behind. In biology, the lack of standard practices for a large diversity of data types, data storage and preservation issues, and a lack of familiarity among researchers are some of the main factors impeding FAIR data. Past literature describes biological curation from the perspective of data resources that aggregate data, often from publications. Methods Our team works alongside data-generating, experimental researchers, so our perspective aligns with that of publication authors rather than aggregators. We detail the processes for organizing datasets for publication, showcasing practical examples from data curation to data sharing. We also recommend strategies, tools and web resources to maximize data reusability while maintaining research productivity. Conclusion We propose a simple approach to address research data management challenges for experimentalists, designed to promote FAIR data sharing. This strategy not only simplifies data management but also enhances data visibility, recognition and impact, ultimately benefiting the entire scientific community.
Affiliation(s)
- Livia C.T. Scorza
- Centre for Engineering Biology and School of Biological Sciences, University of Edinburgh, Edinburgh, Scotland, EH9 3BF, UK
- Tomasz Zieliński
- Centre for Engineering Biology and School of Biological Sciences, University of Edinburgh, Edinburgh, Scotland, EH9 3BF, UK
- Irina Kalita
- Centre for Engineering Biology and School of Biological Sciences, University of Edinburgh, Edinburgh, Scotland, EH9 3BF, UK
- Institute of Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, Scotland, EH9 3JD, UK
- Center for Synthetic Microbiology (SYNMIKRO), Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- Alessia Lepore
- Centre for Engineering Biology and School of Biological Sciences, University of Edinburgh, Edinburgh, Scotland, EH9 3BF, UK
- Institute of Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, Scotland, EH9 3JD, UK
- Laboratory for Optics and Biosciences, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, Île-de-France, France
- Meriem El Karoui
- Centre for Engineering Biology and School of Biological Sciences, University of Edinburgh, Edinburgh, Scotland, EH9 3BF, UK
- Institute of Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, Scotland, EH9 3JD, UK
- Laboratoire de Biologie et Pharmacologie Appliquée (LBPA), ENS Paris-Saclay, CNRS UMR 8113, Paris, Gif-sur-Yvette, France
- Andrew J. Millar
- Centre for Engineering Biology and School of Biological Sciences, University of Edinburgh, Edinburgh, Scotland, EH9 3BF, UK
2
Tazare J, Wang SV, Gini R, Prieto-Alhambra D, Arlett P, Morales Leaver DR, Morton C, Logie J, Popovic J, Donegan K, Schneeweiss S, Douglas I, Schultze A. Sharing Is Caring? International Society for Pharmacoepidemiology Review and Recommendations for Sharing Programming Code. Pharmacoepidemiol Drug Saf 2024; 33:e5856. [PMID: 39233394 DOI: 10.1002/pds.5856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 05/06/2024] [Accepted: 06/06/2024] [Indexed: 09/06/2024]
Abstract
PURPOSE There is increasing recognition of the importance of transparency and reproducibility in scientific research. This study aimed to quantify the extent to which programming code is publicly shared in pharmacoepidemiology, and to develop a set of recommendations on this topic. METHODS We conducted a literature review identifying all studies published in Pharmacoepidemiology and Drug Safety (PDS) between 2017 and 2022. Data were extracted on the frequency and types of programming code shared, and other key open science practices (clinical codelist sharing, data sharing, study preregistration, and stated use of reporting guidelines and preprinting). We developed six recommendations for investigators who choose to share code and gathered feedback from members of the International Society for Pharmacoepidemiology (ISPE). RESULTS Programming code sharing by articles published in PDS ranged from 1.8% in 2017 to 9.5% in 2022. It was more prevalent among articles with a methodological focus, simulation studies, and papers which also shared record-level data. CONCLUSION Programming code sharing is rare but increasing in pharmacoepidemiology studies published in PDS. We recommend improved reporting of whether code is shared and how available code can be accessed. When sharing programming code, we recommend the use of permanent digital identifiers, appropriate licenses, and, where possible, adherence to good software practices around the provision of metadata and documentation, computational reproducibility, and data privacy.
Affiliation(s)
- John Tazare
- Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, London, UK
- Shirley V Wang
- Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Rosa Gini
- Agenzia Regionale di Sanità della Toscana, Florence, Italy
- Daniel Prieto-Alhambra
- Pharmaco- and Device Epidemiology, Botnar Research Centre, NDORMS, University of Oxford, Oxford, UK
- Data Analytics and Methods Taskforce, Department of Medical Informatics, Erasmus MC, Rotterdam, Netherlands
- Peter Arlett
- European Medicines Agency, Amsterdam, Netherlands
- Daniel R Morales Leaver
- European Medicines Agency, Amsterdam, Netherlands
- Division of Population Health and Genomics, University of Dundee, Dundee, UK
- Ian Douglas
- Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, London, UK
- Anna Schultze
- Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, London, UK
3
Sprang M, Möllmann J, Andrade-Navarro MA, Fontaine JF. Overlooked poor-quality patient samples in sequencing data impair reproducibility of published clinically relevant datasets. Genome Biol 2024; 25:222. [PMID: 39152483 PMCID: PMC11328481 DOI: 10.1186/s13059-024-03331-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 07/08/2024] [Indexed: 08/19/2024] [Open access]
Abstract
BACKGROUND Reproducibility is a major concern in biomedical studies, and existing publication guidelines do not solve the problem. Batch effects and quality imbalances between groups of biological samples are major factors hampering reproducibility. Yet, the latter is rarely considered in the scientific literature. RESULTS Our analysis uses 40 clinically relevant RNA-seq datasets to quantify the impact of quality imbalance between groups of samples on the reproducibility of gene expression studies. A high degree of quality imbalance is frequent (14 datasets; 35%), and hundreds of quality markers are present in more than 50% of the datasets. Enrichment analysis suggests common stress-driven effects among the low-quality samples and highlights a complementary role of transcription factors and miRNAs in regulating the stress response. Preliminary ChIP-seq results show similar trends. Quality imbalance has an impact on the number of differential genes derived by comparing control to disease samples (the higher the imbalance, the higher the number of genes), on the proportion of quality markers in top differential genes (the higher the imbalance, the higher the proportion; up to 22%) and on the proportion of known disease genes in top differential genes (the higher the imbalance, the lower the proportion). We show that removing outliers based on their quality score improves the resulting downstream analysis. CONCLUSIONS Thanks to a stringent selection of well-designed datasets, we demonstrate that quality imbalance between groups of samples can significantly reduce the relevance of differential genes, consequently reducing reproducibility between studies. Appropriate experimental design and analysis methods can substantially reduce the problem.
Affiliation(s)
- Maximilian Sprang
- Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, Mainz, 55128, Germany
- Jannik Möllmann
- Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, Mainz, 55128, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, Mainz, 55128, Germany
- Jean-Fred Fontaine
- Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, Mainz, 55128, Germany
- Central Institute for Decision Support Systems in Crop Protection (ZEPP), Rüdesheimer Str. 60-68, Bad Kreuznach, 55545, Germany
4
Zhang J, Liu Y, Thabane L, Li J, Bai X, Li L, Lip GYH, Sun X, Xia M, Van Spall HGC, Li G. Journal requirement for data sharing statements in clinical trials: a cross-sectional study. J Clin Epidemiol 2024; 172:111405. [PMID: 38838963 DOI: 10.1016/j.jclinepi.2024.111405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 05/24/2024] [Accepted: 05/27/2024] [Indexed: 06/07/2024]
Abstract
OBJECTIVES Data sharing statements are considered routine in clinical trial reporting and represent a step toward data transparency. The International Committee of Medical Journal Editors (ICMJE) requires reports of clinical trials to include data sharing statements. We aimed to assess biomedical journals' requirements for data sharing statements on individual participant data and to explore associations between journal characteristics and these requirements. STUDY DESIGN AND SETTING In this cross-sectional study, we included all biomedical journals that published clinical trials from January 1, 2019, to December 31, 2022, and that were indexed by the Journal Citation Reports. The study outcome was the journal requirement for data sharing statements. Multivariable logistic regression analysis was used to assess the relationship between journal characteristics and requirement for data sharing statements. RESULTS Of the 3229 biomedical journals included in the analysis, 2345 (72.6%) required authors to include data sharing statements. Journals published in the UK (OR, 3.19 [95% CI, 2.43-4.22]) and endorsing the Consolidated Standards of Reporting Trials (OR, 3.30 [95% CI, 2.78-3.92]) had greater odds of requiring data sharing statements. Journals that were open access, non-English language, in the Journal Citation Reports group of clinical medicine, and on the ICMJE list had lower odds of requiring data sharing statements, with ORs ranging from 0.18 to 0.81. CONCLUSION Despite ICMJE recommendations, more than 27% of the biomedical journals that published clinical trials did not require clinical trials to include data sharing statements, highlighting room for improved transparency.
Affiliation(s)
- Jingyi Zhang
- Center for Clinical Epidemiology and Methodology (CCEM), The Affiliated Guangdong Second Provincial General Hospital of Jinan University, Guangzhou, China
- Yingxin Liu
- Center for Clinical Epidemiology and Methodology (CCEM), The Affiliated Guangdong Second Provincial General Hospital of Jinan University, Guangzhou, China
- Lehana Thabane
- Department of Health Research Methods, Evidence, and Impact (HEI), McMaster University, Hamilton, ON, Canada
- Father Sean O'Sullivan Research Centre, St Joseph's Healthcare Hamilton, Hamilton, ON, Canada
- Faculty of Health Sciences, University of Johannesburg, Johannesburg, South Africa
- Jianfeng Li
- Department of Epidemiology and Health Statistics, School of Public Health, Guangdong Medical University, Dongguan, China
- Xuerui Bai
- Department of Epidemiology, School of Medicine, Jinan University, Guangzhou, China
- Likang Li
- Center for Clinical Epidemiology and Methodology (CCEM), The Affiliated Guangdong Second Provincial General Hospital of Jinan University, Guangzhou, China
- Gregory Y H Lip
- Liverpool Centre for Cardiovascular Sciences at University of Liverpool, Liverpool John Moores University, Liverpool Heart & Chest Hospital, Liverpool, United Kingdom
- Department of Clinical Medicine, Aalborg University, Aalborg, Denmark
- Xin Sun
- Institute of Integrated Traditional Chinese and Western Medicine and Chinese Evidence-Based Medicine Center and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, Sichuan, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, Sichuan, China
- Min Xia
- Guangdong Provincial Key Laboratory of Food, Nutrition and Health, and Department of Nutrition, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Harriette G C Van Spall
- Department of Health Research Methods, Evidence, and Impact (HEI), McMaster University, Hamilton, ON, Canada
- Department of Medicine, McMaster University, Hamilton, ON, Canada
- Guowei Li
- Center for Clinical Epidemiology and Methodology (CCEM), The Affiliated Guangdong Second Provincial General Hospital of Jinan University, Guangzhou, China
- Father Sean O'Sullivan Research Centre, St Joseph's Healthcare Hamilton, Hamilton, ON, Canada
5
Topham JT, Lawlor RT, Lemaire D, Casolino R, Biankin AV. Data sharing in cancer research: perceived risks and the consequences of not sharing. Lancet Oncol 2024; 25:275-276. [PMID: 38423044 DOI: 10.1016/s1470-2045(24)00021-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 01/10/2024] [Indexed: 03/02/2024]
Affiliation(s)
- Rita T Lawlor
- ARC-Net Research Centre and Department of Engineering for Innovative Medicine, University of Verona, Verona, Italy
- Diana Lemaire
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Raffaella Casolino
- Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1BD, UK
- Andrew V Biankin
- Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, G61 1BD, UK
6
Hamilton DG, Page MJ, Everitt S, Fraser H, Fidler F. Cancer researchers' experiences with and perceptions of research data sharing: Results of a cross-sectional survey. Account Res 2024:1-28. [PMID: 38299475 DOI: 10.1080/08989621.2024.2308606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 01/18/2024] [Indexed: 02/02/2024]
Abstract
BACKGROUND Despite wide recognition of the benefits of sharing research data, public availability rates have not increased substantially in oncology or medicine more broadly over the last decade. METHODS We surveyed 285 cancer researchers to determine their prior experience with sharing data and their views on known drivers and inhibitors. RESULTS We found that 45% of respondents had shared some data from their most recent empirical publication, with respondents who typically studied non-human research participants, or routinely worked with human genomic data, more likely to share than those who did not. A third of respondents added that they had previously shared data privately, with 74% indicating that doing so had also led to authorship opportunities or future collaborations for them. Journal and funder policies were reported to be the biggest general drivers toward sharing, whereas commercial interests, agreements with industrial sponsors and institutional policies were the biggest inhibitors. We show that researchers' decisions about whether to share data are also likely to be influenced by participants' desires. CONCLUSIONS Our survey suggests that increased promotion and support by research institutions, alongside greater championing of data sharing by journals and funders, may motivate more researchers in oncology to share their data.
Affiliation(s)
- Daniel G Hamilton
- MetaMelb Research Group, School of BioSciences, University of Melbourne, Melbourne, Australia
- Melbourne Medical School, Faculty of Medicine, Dentistry & Health Sciences, University of Melbourne, Melbourne, Australia
- Matthew J Page
- Methods in Evidence Synthesis Unit, School of Public Health & Preventive Medicine, Monash University, Melbourne, Australia
- Sarah Everitt
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Australia
- Hannah Fraser
- MetaMelb Research Group, School of BioSciences, University of Melbourne, Melbourne, Australia
- Fiona Fidler
- MetaMelb Research Group, School of BioSciences, University of Melbourne, Melbourne, Australia
- School of History & Philosophy of Sciences, University of Melbourne, Melbourne, Australia
7
Alberti P, Argyriou AA, Bruna J, Damaj MI, Faithfull S, Harding A, Hoke A, Knoerl R, Kolb N, Li T, Park SB, Staff NP, Tamburin S, Thomas S, Smith EL. Considerations for establishing and maintaining international research collaboration: the example of chemotherapy-induced peripheral neurotoxicity (CIPN)-a white paper. Support Care Cancer 2024; 32:117. [PMID: 38244122 PMCID: PMC10799817 DOI: 10.1007/s00520-023-08301-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 12/28/2023] [Indexed: 01/22/2024]
Abstract
PURPOSE This white paper provides guidance regarding the process for establishing and maintaining international collaborations to conduct oncology/neurology-focused chemotherapy-induced peripheral neurotoxicity (CIPN) research. METHODS An international multidisciplinary group of CIPN scientists, clinicians, research administrators, and legal experts have pooled their collective knowledge regarding recommendations for establishing and maintaining international collaboration to foster advancement of CIPN science. RESULTS Experts provide recommendations in 10 categories: (1) preclinical and (2) clinical research collaboration; (3) collaborators and consortiums; (4) communication; (5) funding; (6) international regulatory standards; (7) staff training; (8) data management, quality control, and data sharing; (9) dissemination across disciplines and countries; and (10) additional recommendations about feasibility, policy, and mentorship. CONCLUSION Recommendations to establish and maintain international CIPN research collaboration will promote the inclusion of more diverse research participants, increasing consideration of cultural and genetic factors that are essential to inform innovative precision medicine interventions and propel scientific discovery to benefit cancer survivors worldwide. RELEVANCE TO INFORM RESEARCH POLICY Our suggested guidelines for establishing and maintaining international collaborations to conduct oncology/neurology-focused chemotherapy-induced peripheral neurotoxicity (CIPN) research set forth a challenge to multinational science, clinical, and policy leaders to (1) develop simple, streamlined research designs; (2) address logistical barriers; (3) simplify and standardize regulatory requirements across countries; (4) increase funding to support international collaboration; and (5) foster faculty mentorship.
Affiliation(s)
- Paola Alberti
- University of Milano-Bicocca, School of Medicine and Surgery, Monza, Italy
- Fondazione IRCCS San Gerardo dei Tintori, Monza, Italy
- Jordi Bruna
- Hospital Universitari de Bellvitge, Neuro-Oncology Unit, Institut Català d'Oncologia (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Spain
- M Imad Damaj
- Department of Pharmacology and Toxicology and Translational Research Initiative for Pain and Neuropathy, Virginia Commonwealth University, Richmond, VA, USA
- Sara Faithfull
- Trinity College Dublin, School of Medicine, Dublin, Ireland
- University of Dublin, Trinity Centre for Health Sciences St. James's Hospital Campus, Dublin, Ireland
- Alice Harding
- University of Alabama at Birmingham, Office of Sponsored Programs, Birmingham, AL, USA
- Ahmet Hoke
- Department of Neurology, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Robert Knoerl
- Department of Health Behavior and Biological Sciences, University of Michigan School of Nursing, Ann Arbor, MI, USA
- Noah Kolb
- Department of Neurological Sciences, University of Vermont Robert Larner College of Medicine, Burlington, VT, USA
- Tiffany Li
- Faculty of Medicine and Health, University of Sydney, Brain and Mind Centre and School of Medical Sciences, Sydney, Australia
- Susanna B Park
- Faculty of Medicine and Health, University of Sydney, Brain and Mind Centre and School of Medical Sciences, Sydney, Australia
- Nathan P Staff
- Department of Neurology, Mayo Clinic, Rochester, MN, USA
- Stefano Tamburin
- Department of Neurosciences, Biomedicine and Movement Sciences, University of Verona, Verona, Italy
- Simone Thomas
- Department of Neurology, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Ellen Lavoie Smith
- Department of Acute, Chronic & Continuing Care, University of Alabama at Birmingham School of Nursing, Birmingham, AL, USA
8
Collins GS, Whittle R, Bullock GS, Logullo P, Dhiman P, de Beyer JA, Riley RD, Schlussel MM. Open science practices need substantial improvement in prognostic model studies in oncology using machine learning. J Clin Epidemiol 2024; 165:111199. [PMID: 37898461 DOI: 10.1016/j.jclinepi.2023.10.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/02/2023] [Revised: 10/06/2023] [Accepted: 10/20/2023] [Indexed: 10/30/2023]
Abstract
OBJECTIVE To describe the frequency of open science practices in a contemporary sample of studies developing prognostic models using machine learning methods in the field of oncology. STUDY DESIGN AND SETTING We conducted a systematic review, searching the MEDLINE database between December 1, 2022, and December 31, 2022, for studies developing a multivariable prognostic model using machine learning methods (as defined by the authors) in oncology. Two authors independently screened records and extracted open science practices. RESULTS We identified 46 publications describing the development of a multivariable prognostic model. The adoption of open science principles was poor. Only one study reported availability of a study protocol, and only one study was registered. Funding statements and conflicts of interest statements were common. Thirty-five studies (76%) provided data sharing statements, with 21 (46%) indicating data were available on request to the authors and seven declaring data sharing was not applicable. Two studies (4%) shared data. Only 12 studies (26%) provided code sharing statements, including 2 (4%) that indicated the code was available on request to the authors. Only 11 studies (24%) provided sufficient information to allow their model to be used in practice. The use of reporting guidelines was rare: eight studies (18%) mentioned using a reporting guideline, with 4 (10%) using the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis statement, 1 (2%) using Minimum Information About Clinical Artificial Intelligence Modeling and Consolidated Standards Of Reporting Trials-Artificial Intelligence, 1 (2%) using Strengthening The Reporting Of Observational Studies In Epidemiology, 1 (2%) using Standards for Reporting Diagnostic Accuracy Studies, and 1 (2%) using Transparent Reporting of Evaluations with Nonrandomized Designs.
CONCLUSION The adoption of open science principles in oncology studies developing prognostic models using machine learning methods is poor. Guidance and an increased awareness of benefits and best practices of open science are needed for prediction research in oncology.
Affiliation(s)
- Gary S Collins
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Rebecca Whittle
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Garrett S Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, United Kingdom
- Patricia Logullo
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Paula Dhiman
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Jennifer A de Beyer
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
- Michael M Schlussel
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
9
Hamilton DG, Hong K, Fraser H, Rowhani-Farid A, Fidler F, Page MJ. Prevalence and predictors of data and code sharing in the medical and health sciences: systematic review with meta-analysis of individual participant data. BMJ 2023; 382:e075767. [PMID: 37433624 PMCID: PMC10334349 DOI: 10.1136/bmj-2023-075767] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Accepted: 06/07/2023] [Indexed: 07/13/2023]
Abstract
OBJECTIVES To synthesise research investigating data and code sharing in medicine and health to establish an accurate representation of the prevalence of sharing, how this frequency has changed over time, and what factors influence availability. DESIGN Systematic review with meta-analysis of individual participant data. DATA SOURCES Ovid Medline, Ovid Embase, and the preprint servers medRxiv, bioRxiv, and MetaArXiv were searched from inception to 1 July 2021. Forward citation searches were also performed on 30 August 2022. REVIEW METHODS Meta-research studies that investigated data or code sharing across a sample of scientific articles presenting original medical and health research were identified. Two authors screened records, assessed the risk of bias, and extracted summary data from study reports when individual participant data could not be retrieved. Key outcomes of interest were the prevalence of statements that declared that data or code were publicly or privately available (declared availability) and the success rates of retrieving these products (actual availability). The associations between data and code availability and several factors (eg, journal policy, type of data, trial design, and human participants) were also examined. A two stage approach to meta-analysis of individual participant data was performed, with proportions and risk ratios pooled with the Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis. RESULTS The review included 105 meta-research studies examining 2 121 580 articles across 31 specialties. Eligible studies examined a median of 195 primary articles (interquartile range 113-475), with a median publication year of 2015 (interquartile range 2012-2018). Only eight studies (8%) were classified as having a low risk of bias. Meta-analyses showed a prevalence of declared and actual public data availability of 8% (95% confidence interval 5% to 11%) and 2% (1% to 3%), respectively, between 2016 and 2021. 
For public code sharing, both the prevalence of declared and actual availability were estimated to be <0.5% since 2016. Meta-regressions indicated that only declared public data sharing prevalence estimates have increased over time. Compliance with mandatory data sharing policies ranged from 0% to 100% across journals and varied by type of data. In contrast, success in privately obtaining data and code from authors historically ranged between 0% and 37% and 0% and 23%, respectively. CONCLUSIONS The review found that public code sharing was persistently low across medical research. Declarations of data sharing were also low, increasing over time, but did not always correspond to actual sharing of data. The effectiveness of mandatory data sharing policies varied substantially by journal and type of data, a finding that might be informative for policy makers when designing policies and allocating resources to audit compliance. SYSTEMATIC REVIEW REGISTRATION Open Science Framework doi:10.17605/OSF.IO/7SX8U.
Affiliation(s)
- Daniel G Hamilton
- MetaMelb Research Group, School of BioSciences, University of Melbourne, Melbourne, VIC, Australia
- Melbourne Medical School, Faculty of Medicine, Dentistry, and Health Sciences, University of Melbourne, Melbourne, VIC, Australia
- Kyungwan Hong
- Department of Practice, Sciences, and Health Outcomes Research, University of Maryland School of Pharmacy, Baltimore, MD, USA
- Hannah Fraser
- MetaMelb Research Group, School of BioSciences, University of Melbourne, Melbourne, VIC, Australia
- Anisa Rowhani-Farid
- Department of Practice, Sciences, and Health Outcomes Research, University of Maryland School of Pharmacy, Baltimore, MD, USA
- Fiona Fidler
- MetaMelb Research Group, School of BioSciences, University of Melbourne, Melbourne, VIC, Australia
- School of Historical and Philosophical Studies, University of Melbourne, Melbourne, VIC, Australia
- Matthew J Page
- Methods in Evidence Synthesis Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia