1. Bai I, Doshi P, Herder M. How to use the regulatory data from Health Canada for secondary analyses on new drugs, biologics and vaccines. BMJ Evid Based Med 2024; 29:187-193. [PMID: 37898504 PMCID: PMC11137451 DOI: 10.1136/bmjebm-2023-112475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/11/2023] [Indexed: 10/30/2023]
Abstract
Incorporating clinical data held by national health product regulatory authorities into secondary analyses such as systematic reviews can help combat publication bias and selective outcome reporting, in turn supporting more evidence-based decisions regarding the prescribing of drugs, biologics and vaccines. Owing to recent changes in Canadian law, Health Canada has begun to make clinical information, whether it has been previously published or not, publicly available through its 'Public Release of Clinical Information' (PRCI) online database. We provide guidance about how to access and use regulatory data obtained through the PRCI database for the purpose of conducting drug and biologic secondary analyses.
Affiliation(s)
- Isaac Bai
  - Faculty of Medicine, Dalhousie University, Halifax, Nova Scotia, Canada
- Peter Doshi
  - Department of Practice, Sciences, and Health Outcomes Research, School of Pharmacy, University of Maryland, Baltimore, Maryland, USA
- Matthew Herder
  - Health Law Institute, Schulich School of Law, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Pharmacology, Faculty of Medicine, Dalhousie University, Halifax, Nova Scotia, Canada

2. Hopkins AM, Modi ND, Rockhold FW, Hoffmann T, Menz BD, Veroniki AA, McKinnon RA, Rowland A, Swain SM, Ross JS, Sorich MJ. Accessibility of clinical study reports supporting medicine approvals: a cross-sectional evaluation. J Clin Epidemiol 2024; 167:111263. [PMID: 38219810 DOI: 10.1016/j.jclinepi.2024.111263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 01/07/2024] [Accepted: 01/09/2024] [Indexed: 01/16/2024]
Abstract
OBJECTIVES Clinical study reports (CSRs) are highly detailed documents that play a pivotal role in medicine approval processes. Though not historically publicly available, in recent years, major entities including the European Medicines Agency (EMA), Health Canada, and the US Food and Drug Administration (FDA) have highlighted the importance of CSR accessibility. The primary objective herein was to determine the proportion of CSRs that support medicine approvals available for public download as well as the proportion eligible for independent researcher request via the study sponsor. STUDY DESIGN AND SETTING This cross-sectional study examined the accessibility of CSRs from industry-sponsored clinical trials whose results were reported in the FDA-authorized drug labels of the top 30 highest-revenue medicines of 2021. We determined (1) whether the CSRs were available for download from a public repository, and (2) whether the CSRs were eligible for request by independent researchers based on trial sponsors' data sharing policies. RESULTS There were 316 industry-sponsored clinical trials with results presented in the FDA-authorized drug labels of the 30 sampled medicines. Of these trials, CSRs were available for public download from 70 (22%), with 37 available at EMA and 40 at Health Canada repositories. While pharmaceutical company platforms offered no direct downloads of CSRs, sponsors confirmed that CSRs from 183 (58%) of the 316 clinical trials were eligible for independent researcher request via the submission of a research proposal. Overall, 218 (69%) of the sampled clinical trials had CSRs available for public download and/or were eligible for request from the trial sponsor. CONCLUSION CSRs were available from 69% of the clinical trials supporting regulatory approval of the 30 medicines sampled. However, only 22% of the CSRs were directly downloadable from regulatory agencies; the remainder required a formal application to the study sponsor to request access to the CSR.
Affiliation(s)
- Ashley M Hopkins
  - College of Medicine and Public Health, Flinders University, Adelaide, South Australia, Australia
- Natansh D Modi
  - College of Medicine and Public Health, Flinders University, Adelaide, South Australia, Australia
- Frank W Rockhold
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Tammy Hoffmann
  - Institute for Evidence-Based Healthcare, Faculty of Health Sciences and Medicine, Bond University, Gold Coast, Queensland, Australia
- Bradley D Menz
  - College of Medicine and Public Health, Flinders University, Adelaide, South Australia, Australia
- Areti-Angeliki Veroniki
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada; Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, Ontario, Canada
- Ross A McKinnon
  - College of Medicine and Public Health, Flinders University, Adelaide, South Australia, Australia
- Andrew Rowland
  - College of Medicine and Public Health, Flinders University, Adelaide, South Australia, Australia
- Sandra M Swain
  - Georgetown Lombardi Comprehensive Cancer Center, MedStar Health, Washington DC, USA
- Joseph S Ross
  - Section of General Medicine, Department of Medicine, Yale School of Medicine, New Haven, CT, USA
- Michael J Sorich
  - College of Medicine and Public Health, Flinders University, Adelaide, South Australia, Australia

3. El Kababji S, Mitsakakis N, Fang X, Beltran-Bless AA, Pond G, Vandermeer L, Radhakrishnan D, Mosquera L, Paterson A, Shepherd L, Chen B, Barlow WE, Gralow J, Savard MF, Clemons M, El Emam K. Evaluating the Utility and Privacy of Synthetic Breast Cancer Clinical Trial Data Sets. JCO Clin Cancer Inform 2023; 7:e2300116. [PMID: 38011617 PMCID: PMC10703127 DOI: 10.1200/cci.23.00116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 08/24/2023] [Accepted: 09/19/2023] [Indexed: 11/29/2023] Open
Abstract
PURPOSE There is strong interest from patients, researchers, the pharmaceutical industry, medical journal editors, funders of research, and regulators in sharing clinical trial data for secondary analysis. However, data access remains a challenge because of concerns about patient privacy. It has been argued that synthetic data generation (SDG) is an effective way to address these privacy concerns. There is a dearth of evidence supporting this on oncology clinical trial data sets, and on the utility of privacy-preserving synthetic data. The objective of the proposed study is to validate the utility and privacy risks of synthetic clinical trial data sets across multiple SDG techniques. METHODS We synthesized data sets from eight breast cancer clinical trial data sets using three types of generative models: sequential synthesis, conditional generative adversarial network, and variational autoencoder. Synthetic data utility was evaluated by replicating the published analyses on the synthetic data and assessing concordance of effect estimates and CIs between real and synthetic data. Privacy was evaluated by measuring attribution disclosure risk and membership disclosure risk. RESULTS Utility was highest using the sequential synthesis method where all results were replicable and the CI overlap most similar or higher for seven of eight data sets. Both types of privacy risks were low across all three types of generative models. DISCUSSION Synthetic data using sequential synthesis methods can act as a proxy for real clinical trial data sets, and simultaneously have low privacy risks. This type of generative model can be one way to enable broader sharing of clinical trial data.
Affiliation(s)
- Xi Fang
  - Replica Analytics Ltd, Ottawa, ON, Canada
- Ana-Alicia Beltran-Bless
  - Ottawa Hospital Research Institute, Ottawa, ON, Canada
  - Division of Medical Oncology, Department of Medicine, University of Ottawa, ON, Canada
- Greg Pond
  - McMaster University, Hamilton, ON, Canada
- Dhenuka Radhakrishnan
  - CHEO Research Institute, Ottawa, ON, Canada
  - Department of Paediatrics, University of Ottawa, Ottawa, ON, Canada
- Lucy Mosquera
  - CHEO Research Institute, Ottawa, ON, Canada
  - Replica Analytics Ltd, Ottawa, ON, Canada
- Marie-France Savard
  - Ottawa Hospital Research Institute, Ottawa, ON, Canada
  - Division of Medical Oncology, Department of Medicine, University of Ottawa, ON, Canada
- Mark Clemons
  - Ottawa Hospital Research Institute, Ottawa, ON, Canada
  - Division of Medical Oncology, Department of Medicine, University of Ottawa, ON, Canada
- Khaled El Emam
  - CHEO Research Institute, Ottawa, ON, Canada
  - Replica Analytics Ltd, Ottawa, ON, Canada
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada

4. Byrne D, Prendergast C, Fahey T, Moriarty F. Clinical study reports published by the European Medicines Agency 2016-2018: a cross-sectional analysis. BMJ Open 2023; 13:e068981. [PMID: 37188475 DOI: 10.1136/bmjopen-2022-068981] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/17/2023] Open
Abstract
OBJECTIVES To describe the characteristics of clinical study report (CSR) documents published by the European Medicines Agency (EMA), and for included pivotal trials, to quantify the timeliness of access to trial results from CSRs compared with conventional published sources. DESIGN Cross-sectional analysis of CSR documents published by the EMA from 2016 to 2018. METHODS CSR files and medication summary information were downloaded from the EMA. Individual trials in each submission were identified using document filenames. Number and length of documents and trials were determined. For pivotal trials, trial phase, dates of EMA document publication and matched journal and registry publications were obtained. RESULTS The EMA published documents on 142 medications that were submitted for regulatory drug approval. Submissions were for initial marketing authorisations in 64.1%. There was a median of 15 (IQR 5-46) documents, 5 (IQR 2-14) trials and 9629 (IQR 2711-26,673) pages per submission, and a median of 1 (IQR 1-4) document and 336 (IQR 21-1192) pages per trial. Of all identified pivotal trials, 60.9% were phase 3 and 18.5% were phase 1. Of 119 unique submissions to the EMA, 46.2% were supported by a single pivotal trial, with 13.4% based on a single pivotal phase 1 trial. No trial registry results were identified for 26.1% of trials, no journal publications for 16.7%, and 13.5% of trials had neither. EMA publication was the earliest information source for 5.8% of pivotal trials, available a median of 523 days (IQR 363-882 days) before the earliest publication. CONCLUSIONS The EMA Clinical Data website contains lengthy clinical trial documents. Almost half of submissions to the EMA were based on single pivotal trials, many of which were phase 1 trials. CSRs were the only source and a timelier source of information for many trials. Access to unpublished trial information should be open and timely to support decision-making for patients.
Affiliation(s)
- David Byrne
  - Department of General Practice, Royal College of Surgeons in Ireland, Dublin, Ireland
  - Royal College of Surgeons in Ireland, HRB Centre for Primary Care Research, Dublin, Ireland
- Ciaran Prendergast
  - School of Pharmacy and Biomolecular Sciences, Royal College of Surgeons in Ireland, Dublin, Ireland
- Tom Fahey
  - Department of General Practice, Royal College of Surgeons in Ireland, Dublin, Ireland
  - Royal College of Surgeons in Ireland, HRB Centre for Primary Care Research, Dublin, Ireland
- Frank Moriarty
  - School of Pharmacy and Biomolecular Sciences, Royal College of Surgeons in Ireland, Dublin, Ireland

5. Zemła-Pacud Ż, Lenarczyk G. Clinical Trial Data Transparency in the EU: Is the New Clinical Trials Regulation a Game-Changer? IIC; INTERNATIONAL REVIEW OF INDUSTRIAL PROPERTY AND COPYRIGHT LAW 2023; 54:732-763. [PMID: 37215361 PMCID: PMC10158712 DOI: 10.1007/s40319-023-01329-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/17/2023] [Indexed: 05/24/2023]
Abstract
The benefits of access to clinical trial data are related to their inestimable value from the perspective of clinical trial participants, society as a whole, public health systems and scientific progress. In light of the development of innovative data analysis technologies, access to raw clinical trial data opens up an ever-widening array of possibilities: it can profoundly facilitate machine data analysis for, inter alia, hypothesis generation, risk modelling, counterfactual simulation and - finally - drug repurposing and development. The enactment of the new Clinical Trials Regulation (EU) No. 536/2014 (CTR) and introduction of the Clinical Trials Information System (CTIS) were heralded as ensuring a level of transparency in clinical trials that is sufficient to contribute to protecting public health and fostering the innovation capacity of European medical research, while recognizing the legitimate economic interests of sponsors. This paper presents the hitherto binding rules for the disclosure of clinical trial data and, against this background, their new framework, introduced by the CTR. In addition to assessing whether the CTR's objectives are fulfilled, this paper examines whether the latest changes impact the hitherto existing rules on protection of regulatory data via regulatory exclusivities. Finally, it points out concerns regarding whether data gathered in the CTIS can be efficiently used by innovative data analysis technologies for further processing for both commercial and non-commercial purposes.
Affiliation(s)
- Żaneta Zemła-Pacud
  - Department of Polish and European Industrial Property Law, Polish Academy of Sciences, Warsaw, Poland
- Gabriela Lenarczyk
  - Department of Private Law, Institute of Law Studies, Polish Academy of Sciences, Warsaw, Poland

6. Paludan-Müller AS, Maclean-Nyegaard IR, Munkholm K. Substantial delays in clinical data published by the European Medicines Agency – a cross sectional study. J Clin Epidemiol 2022; 146:68-76. [DOI: 10.1016/j.jclinepi.2022.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Revised: 01/13/2022] [Accepted: 02/16/2022] [Indexed: 10/18/2022]

7. Practical Considerations and Challenges When Conducting an Individual Participant Data (IPD) Meta-Analysis. Methods Mol Biol 2021. [PMID: 34550596 DOI: 10.1007/978-1-0716-1566-9_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
This chapter provides a broad overview of the use of individual participant (sometimes referred to as patient) data (IPD) within meta-analyses, the associated advantages of using IPD in meta-analysis compared to aggregate data, and when IPD should be used in meta-analysis. This chapter also outlines the steps of conducting an IPD meta-analysis, with practical guidance relating to requesting and obtaining IPD for meta-analysis. Challenges that can be associated with conducting an IPD meta-analysis are also discussed, including consideration of availability bias, when a subset of the relevant IPD is not available for meta-analysis.

8. Hodkinson A, Heneghan C, Mahtani KR, Kontopantelis E, Panagioti M. Benefits and harms of Risperidone and Paliperidone for treatment of patients with schizophrenia or bipolar disorder: a meta-analysis involving individual participant data and clinical study reports. BMC Med 2021; 19:195. [PMID: 34429113 PMCID: PMC8386072 DOI: 10.1186/s12916-021-02062-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Accepted: 07/13/2021] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Schizophrenia and bipolar disorder are severe mental illnesses which are highly prevalent worldwide. Risperidone and Paliperidone are treatments for either illness, but their efficacy compared to other antipsychotics and growing reports of hormonal imbalances continue to raise concerns. As existing evidence on both antipsychotics is solely based on aggregate data, we aimed to assess the benefits and harms of Risperidone and Paliperidone in the treatment of patients with schizophrenia or bipolar disorder, using individual participant data (IPD), clinical study reports (CSRs) and publicly available sources (journal publications and trial registries). METHODS We searched MEDLINE, Central, EMBASE and PsycINFO until December 2020 for randomised placebo-controlled trials of Risperidone, Paliperidone or Paliperidone palmitate in patients with schizophrenia or bipolar disorder. We obtained IPD and CSRs from the Yale University Open Data Access project. The primary outcome, Positive and Negative Syndrome Scale (PANSS) score, was analysed using one-stage IPD meta-analysis. Random-effect meta-analysis of harm outcomes involved methods for coping with rare events. Effect-sizes were compared across all available data sources using the ratio of means or relative risk. We registered our review on PROSPERO, CRD42019140556. RESULTS Of the 35 studies, IPD meta-analysis involving 22 (63%) studies showed a significant clinical reduction in the PANSS in patients receiving Risperidone (mean difference -5.83, 95% CI -10.79 to -0.87, I² = 8.5%, n = 4 studies, 1131 participants), Paliperidone (-6.01, 95% CI -8.7 to -3.32, I² = 4.3%, n = 13, 3821) and Paliperidone palmitate (-7.89, 95% CI -12.1 to -3.69, I² = 2.9%, n = 5, 2209). CSRs reported nearly two times more adverse events (4434 vs. 2296 publication, relative difference (RD) = 1.93, 95% CI 1.86 to 2.00) and almost 8 times more serious adverse events (650 vs. 82; RD = 7.93, 95% CI 6.32 to 9.95) than the journal publications. Meta-analyses of individual harms from CSRs revealed a significantly increased risk among several outcomes including extrapyramidal disorder, tardive dyskinesia and increased weight. But the ratio of relative risk between the different data sources was not significant. Three treatment-related gynecomastia events occurred, and these were considered mild to moderate in severity. CONCLUSION IPD meta-analysis concluded that Risperidone and Paliperidone antipsychotics had a small beneficial effect on reducing PANSS score over 9 weeks, which is more conservative than estimates from reviews based on journal publications. CSRs also contained significantly more data on harms that were unavailable in journal publications or trial registries. Sharing of IPD and CSRs is necessary when performing meta-analysis on the efficacy and safety of antipsychotics.
Affiliation(s)
- Alexander Hodkinson
  - National Institute for Health Research School for Primary Care Research, Centre for Primary Care and Health Services Research, Division of Population Health, Health Services Research and Primary Care, School of Health Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Williamson Building, Oxford Road, Manchester, M13 9PL, UK
  - National Institute for Health Research Greater Manchester Patient Safety Translational Research Centre, School of Health Sciences, University of Manchester, Manchester, M13 9PL, UK
- Carl Heneghan
  - Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Kamal R Mahtani
  - Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Evangelos Kontopantelis
  - National Institute for Health Research School for Primary Care Research, Centre for Primary Care and Health Services Research, Division of Population Health, Health Services Research and Primary Care, School of Health Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Williamson Building, Oxford Road, Manchester, M13 9PL, UK
  - Division of Informatics, Imaging & Data Sciences, University of Manchester, Manchester, M13 9PL, UK
- Maria Panagioti
  - National Institute for Health Research School for Primary Care Research, Centre for Primary Care and Health Services Research, Division of Population Health, Health Services Research and Primary Care, School of Health Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Williamson Building, Oxford Road, Manchester, M13 9PL, UK
  - National Institute for Health Research Greater Manchester Patient Safety Translational Research Centre, School of Health Sciences, University of Manchester, Manchester, M13 9PL, UK

9. Azizi Z, Zheng C, Mosquera L, Pilote L, El Emam K. Can synthetic data be a proxy for real clinical trial data? A validation study. BMJ Open 2021; 11:e043497. [PMID: 33863713 PMCID: PMC8055130 DOI: 10.1136/bmjopen-2020-043497] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 08/06/2020] [Revised: 01/14/2021] [Accepted: 03/18/2021] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVES There are increasing requirements to make research data, especially clinical trial data, more broadly available for secondary analyses. However, data availability remains a challenge due to complex privacy requirements. This challenge can potentially be addressed using synthetic data. SETTING Replication of a published stage III colon cancer trial secondary analysis using synthetic data generated by a machine learning method. PARTICIPANTS There were 1543 patients in the control arm that were included in our analysis. PRIMARY AND SECONDARY OUTCOME MEASURES Analyses from a study published on the real dataset were replicated on synthetic data to investigate the relationship between bowel obstruction and event-free survival. Information theoretic metrics were used to compare the univariate distributions between real and synthetic data. Percentage CI overlap was used to assess the similarity in the size of the bivariate relationships, and similarly for the multivariate Cox models derived from the two datasets. RESULTS Analysis results were similar between the real and synthetic datasets. The univariate distributions were within 1% of difference on an information theoretic metric. All of the bivariate relationships had CI overlap on the tau statistic above 50%. The main conclusion from the published study, that lack of bowel obstruction has a strong impact on survival, was replicated directionally and the HR CI overlap between the real and synthetic data was 61% for overall survival (real data: HR 1.56, 95% CI 1.11 to 2.2; synthetic data: HR 2.03, 95% CI 1.44 to 2.87) and 86% for disease-free survival (real data: HR 1.51, 95% CI 1.18 to 1.95; synthetic data: HR 1.63, 95% CI 1.26 to 2.1). CONCLUSIONS The high concordance between the analytical results and conclusions from synthetic and real data suggests that synthetic data can be used as a reasonable proxy for real clinical trial datasets. TRIAL REGISTRATION NUMBER NCT00079274.
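The percentage CI overlap quoted in this abstract can be checked against the published hazard-ratio intervals. The sketch below assumes one common definition of interval overlap, since the abstract does not state the formula: the length of the intersection of the two intervals is divided by each interval's width in turn, and the two fractions are averaged. The function name `ci_overlap` is illustrative and does not come from the study.

```python
def ci_overlap(real, synth):
    """Average fraction of each interval covered by their intersection.

    Assumed definition; the abstract does not give the formula used.
    """
    (lr, ur), (ls, us) = real, synth
    # Length of the shared region (negative if the intervals are disjoint).
    inter = min(ur, us) - max(lr, ls)
    return 0.5 * (inter / (ur - lr) + inter / (us - ls))


# Hazard-ratio 95% CIs quoted in the abstract (real data, synthetic data).
overall = ci_overlap((1.11, 2.2), (1.44, 2.87))
disease_free = ci_overlap((1.18, 1.95), (1.26, 2.1))

print(f"overall survival: {overall:.0%}")
print(f"disease-free survival: {disease_free:.0%}")
```

Under this assumed definition the two values round to 61% and 86%, which agrees with the overlaps reported in the abstract.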
Affiliation(s)
- Zahra Azizi
  - Center for Outcomes Research and Evaluation, Faculty of Medicine, McGill University, Montreal, Québec, Canada
- Chaoyi Zheng
  - Data Science, Replica Analytics Ltd, Ottawa, Ontario, Canada
- Lucy Mosquera
  - Data Science, Replica Analytics Ltd, Ottawa, Ontario, Canada
- Louise Pilote
  - Medicine, McGill University, Montreal, Québec, Canada
  - Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, Québec, Canada
- Khaled El Emam
  - Electronic Health Information Laboratory, Children's Hospital of Eastern Ontario Research Institute, Ottawa, Ontario, Canada
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, Canada

10. El Emam K, Mosquera L, Zheng C. Optimizing the synthesis of clinical trial data using sequential trees. J Am Med Inform Assoc 2021; 28:3-13. [PMID: 33186440 PMCID: PMC7810457 DOI: 10.1093/jamia/ocaa249] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 06/02/2020] [Accepted: 09/22/2020] [Indexed: 12/13/2022] Open
Abstract
OBJECTIVE With the growing demand for sharing clinical trial data, scalable methods to enable privacy protective access to high-utility data are needed. Data synthesis is one such method. Sequential trees are commonly used to synthesize health data. It is hypothesized that the utility of the generated data is dependent on the variable order. No assessments of the impact of variable order on synthesized clinical trial data have been performed thus far. Through simulation, we aim to evaluate the variability in the utility of synthetic clinical trial data as variable order is randomly shuffled and implement an optimization algorithm to find a good order if variability is too high. MATERIALS AND METHODS Six oncology clinical trial datasets were evaluated in a simulation. Three utility metrics were computed comparing real and synthetic data: univariate similarity, similarity in multivariate prediction accuracy, and a distinguishability metric. Particle swarm was implemented to optimize variable order, and was compared with a curriculum learning approach to ordering variables. RESULTS As the number of variables in a clinical trial dataset increases, there is a pattern of a marked increase in variability of data utility with order. Particle swarm with a distinguishability hinge loss ensured adequate utility across all 6 datasets. The hinge threshold was selected to avoid overfitting which can create a privacy problem. This was superior to curriculum learning in terms of utility. CONCLUSIONS The optimization approach presented in this study gives a reliable way to synthesize high-utility clinical trial datasets.
Affiliation(s)
- Khaled El Emam
  - School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
  - Electronic Health Information Laboratory, Children's Hospital of Eastern Ontario Research Institute, Ottawa, Ontario, Canada
  - Replica Analytics Ltd, Ottawa, Ontario, Canada

11. Branson J, Good N, Chen JW, Monge W, Probst C, El Emam K. Evaluating the re-identification risk of a clinical study report anonymized under EMA Policy 0070 and Health Canada Regulations. Trials 2020; 21:200. [PMID: 32070405 PMCID: PMC7029478 DOI: 10.1186/s13063-020-4120-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/08/2019] [Accepted: 01/30/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Regulatory agencies, such as the European Medicines Agency and Health Canada, are requiring the public sharing of clinical trial reports that are used to make drug approval decisions. Both agencies have provided guidance for the quantitative anonymization of these clinical reports before they are shared. There is limited empirical information on the effectiveness of this approach in protecting patient privacy for clinical trial data. METHODS In this paper we empirically test the hypothesis that when these guidelines are implemented in practice, they provide adequate privacy protection to patients. An anonymized clinical study report for a trial on a non-steroidal anti-inflammatory drug that is sold as a prescription eye drop was subjected to re-identification. The target was 500 patients in the USA. Only suspected matches to real identities were reported. RESULTS Six suspected matches with low confidence scores were identified. Each suspected match took 24.2 h of effort. Social media and death records provided the most useful information for getting the suspected matches. CONCLUSIONS These results suggest that the anonymization guidance from these agencies can provide adequate privacy protection for patients, and the modes of attack can inform further refinements of the methodologies they recommend in their guidance for manufacturers.
Affiliation(s)
- Khaled El Emam
  - Privacy Analytics, Ottawa, Canada
  - Children's Hospital of Eastern Ontario Research Institute, Ottawa, Canada