1
Alloza C, Knox B, Raad H, Aguilà M, Coakley C, Mohrova Z, Boin É, Bénard M, Davies J, Jacquot E, Lecomte C, Fabre A, Batech M. A Case for Synthetic Data in Regulatory Decision-Making in Europe. Clin Pharmacol Ther 2023; 114:795-801. [PMID: 37441734 DOI: 10.1002/cpt.3001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Accepted: 07/05/2023] [Indexed: 07/15/2023]
Abstract
Regulators are faced with many challenges surrounding health data usage, including privacy, fragmentation, validity, and generalizability, especially in the European Union, and synthetic data may provide innovative solutions to them. Synthetic data, defined as data artificially generated rather than captured in the real world, are increasingly being used for healthcare research purposes as a proxy for real-world data (RWD). Current barriers are particularly challenging in Europe, where sharing patients' data is strictly regulated, costly, and time-consuming, causing delays in evidence generation and regulatory approvals. Recent initiatives are encouraging the use of synthetic data in regulatory decision making and health technology assessment to overcome these challenges, but synthetic data still have to overcome realistic obstacles before they are adopted by researchers and regulators in Europe. Thus, the emerging use of RWD and synthetic data by the pharmaceutical and medical device industries calls on regulatory bodies to provide a framework for proper evidence generation and informed regulatory decision making. As the provision of data becomes more ubiquitous in scientific research, so will innovations in artificial intelligence, machine learning, and synthetic data generation, making the exploration of this topic and its intricacies all the more important and timely. In this review, we discuss the potential merits and challenges of synthetic data in the context of decision making in the European regulatory environment. We explore the current uses of synthetic data and ongoing initiatives, the value of synthetic data for regulatory purposes, and realistic barriers to the adoption of synthetic data in healthcare.
2
McDonnell KJ. Leveraging the Academic Artificial Intelligence Silecosystem to Advance the Community Oncology Enterprise. J Clin Med 2023; 12:4830. [PMID: 37510945 PMCID: PMC10381436 DOI: 10.3390/jcm12144830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Revised: 07/05/2023] [Accepted: 07/07/2023] [Indexed: 07/30/2023]
Abstract
Over the last 75 years, artificial intelligence has evolved from a theoretical concept and novel paradigm describing the role that computers might play in our society to a tool with which we daily engage. In this review, we describe AI in terms of its constituent elements, the synthesis of which we refer to as the AI Silecosystem. Herein, we provide an historical perspective of the evolution of the AI Silecosystem, conceptualized and summarized as a Kuhnian paradigm. This manuscript focuses on the role that the AI Silecosystem plays in oncology and its emerging importance in the care of the community oncology patient. We observe that this important role arises out of a unique alliance between the academic oncology enterprise and community oncology practices. We provide evidence of this alliance by illustrating the practical establishment of the AI Silecosystem at the City of Hope Comprehensive Cancer Center and its team utilization by community oncology providers.
Affiliation(s)
- Kevin J McDonnell
- Center for Precision Medicine, Department of Medical Oncology & Therapeutics Research, City of Hope Comprehensive Cancer Center, Duarte, CA 91010, USA
3
A Semi-Supervised Machine Learning Approach in Predicting High-Risk Pregnancies in the Philippines. Diagnostics (Basel) 2022; 12:diagnostics12112782. [PMID: 36428842 PMCID: PMC9689356 DOI: 10.3390/diagnostics12112782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 11/02/2022] [Accepted: 11/11/2022] [Indexed: 11/16/2022]
Abstract
Early risk tagging is crucial in maternal health, especially because a high-risk pregnancy threatens both the mother and the long-term development of the baby. Tagging high-risk pregnancies allows mothers to be given extra care before, during, and after pregnancy, thus reducing the risk of complications. In the Philippines, where the fertility rate is high, especially among the youth, awareness of risks can significantly contribute to the overall outcome of the pregnancy and, to an extent, to the maternal mortality rate. Although supervised machine learning models are ubiquitous as predictors, there is a gap when data are weak or scarce. Using limited data collected from the municipality of Daraga in Albay, the study first compared multiple supervised machine learning algorithms to analyze and accurately predict high-risk pregnancies. Through hyperparameter tuning, supervised learning algorithms such as Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbors, Naïve Bayes, and Multilayer Perceptron were evaluated using 10-fold cross-validation to obtain the best parameters with the best scores. The results show that Decision Tree outperformed the other algorithms, attaining a test score of 93.70%. To address the gap, a semi-supervised approach using a Self-Training model was applied to the modified Decision Tree, which was used as the base estimator with a 30% unlabeled dataset and achieved a 97.01% accuracy rate, exceeding the accuracy reported in similar studies.
4
Kuo NIH, Polizzotto MN, Finfer S, Garcia F, Sönnerborg A, Zazzi M, Böhm M, Kaiser R, Jorm L, Barbieri S. The Health Gym: synthetic health-related datasets for the development of reinforcement learning algorithms. Sci Data 2022; 9:693. [PMID: 36369205 PMCID: PMC9652426 DOI: 10.1038/s41597-022-01784-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/13/2022] [Accepted: 10/17/2022] [Indexed: 11/13/2022]
Abstract
In recent years, the machine learning research community has benefited tremendously from the availability of openly accessible benchmark datasets. Clinical data, however, are usually not openly available because of their confidential nature. This has hampered the development of reproducible and generalisable machine learning applications in health care. Here we introduce the Health Gym, a growing collection of highly realistic synthetic medical datasets that can be freely accessed to prototype, evaluate, and compare machine learning algorithms, with a specific focus on reinforcement learning. The three synthetic datasets described in this paper present patient cohorts with acute hypotension and sepsis in the intensive care unit, and people with human immunodeficiency virus (HIV) receiving antiretroviral therapy. The datasets were created using a novel generative adversarial network (GAN). The distributions of variables, the correlations between variables, and the trends in variables over time in the synthetic datasets mirror those in the real datasets. Furthermore, the risk of sensitive information disclosure associated with the public distribution of the synthetic datasets is estimated to be very low.
Affiliation(s)
- Nicholas I-Hsien Kuo
- Centre for Big Data Research in Health, University of New South Wales, Sydney, Australia
- Simon Finfer
- The George Institute for Global Health, Sydney, Australia
- University of New South Wales, Sydney, Australia
- Imperial College London, London, United Kingdom
- Michael Böhm
- Uniklinik Köln, Universität zu Köln, Cologne, Germany
- Rolf Kaiser
- Uniklinik Köln, Universität zu Köln, Cologne, Germany
- Louisa Jorm
- Centre for Big Data Research in Health, University of New South Wales, Sydney, Australia
- Sebastiano Barbieri
- Centre for Big Data Research in Health, University of New South Wales, Sydney, Australia
5
Banerjee S, Bishop TRP. dsSynthetic: synthetic data generation for the DataSHIELD federated analysis system. BMC Res Notes 2022; 15:230. [PMID: 35761417 PMCID: PMC9235208 DOI: 10.1186/s13104-022-06111-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Accepted: 06/15/2022] [Indexed: 11/10/2022]
Abstract
OBJECTIVE: Platforms such as DataSHIELD allow users to analyse sensitive data remotely, without full access to the detailed data items (federated analysis). While this feature helps to overcome difficulties with data sharing, it can make writing code challenging, because the analyst lacks full visibility of the data. One solution is to generate realistic, non-disclosive synthetic data that can be transferred to the analyst so they can perfect their code without the access limitation. When this process is complete, they can run the code on the real data. RESULTS: We have created a package in DataSHIELD (dsSynthetic) that allows generation of realistic synthetic data, building on existing packages. In our paper and accompanying tutorial, we demonstrate how synthetic data generated with our package can help DataSHIELD users with tasks such as writing analysis scripts and harmonising data to common scales and measures.
Affiliation(s)
- Soumya Banerjee
- Medical Research Council Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge, UK
- Tom R P Bishop
- Medical Research Council Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge, UK
6
Dynamics Modeling of Industrial Robotic Manipulators: A Machine Learning Approach Based on Synthetic Data. Mathematics 2022. [DOI: 10.3390/math10071174] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/27/2023]
Abstract
Obtaining a dynamic model of a robotic manipulator is a complex task. With the growing application of machine learning (ML) approaches in modern robotics, the question arises of whether ML can be used for dynamic modeling. However, because this approach requires large amounts of data, data collection may be time- and resource-intensive. For this reason, this paper investigates creating a synthetic dataset from pre-existing dynamic models, in order to test both the usefulness of such synthetic datasets and the modeling of an industrial manipulator's dynamics using ML. The authors generate a dataset of 20,000 data points and train seven separate multilayer perceptron (MLP) artificial neural networks (ANNs), one for each joint of the manipulator and one for the total torque, using randomized search (RS) for hyperparameter tuning. An additional MLP is trained for the total torque of the entire manipulator using the same approach. Each model is evaluated using the coefficient of determination (R²) and the mean absolute percentage error (MAPE), with 10-fold cross-validation applied. With these settings, all individual joint torque models achieved R² scores higher than 0.9, with the models for the first four joints achieving scores above 0.95. Furthermore, all individual joint models achieve a MAPE lower than 2%. The model for the total torque of all joints of the robotic manipulator achieves weaker regression scores, with an R² score of 0.89 and a MAPE slightly higher than 2%. The results show that the torque models of each individual joint, and of the entire manipulator, can be regressed with satisfactory accuracy using the described method.
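The two evaluation metrics and the 10-fold cross-validation named in this abstract can be sketched as follows, using the standard definitions R² = 1 − SS_res / SS_tot and MAPE = (100 / n) Σ |(y − ŷ) / y|. The helper names `r2_score`, `mape`, and `kfold_indices` are hypothetical, not the authors' code. MAPE is returned as a percentage and assumes no true value is zero, and the folds here are contiguous and unshuffled, which is an assumption because the abstract does not say how the folds were formed.

```python
from typing import Iterator, List, Sequence, Tuple


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; assumes no y_true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def kfold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, near-equal folds (no shuffling)."""
    start = 0
    for i in range(k):
        # The first n % k folds get one extra sample so every index is used once.
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


if __name__ == "__main__":
    print(r2_score([2.0, 4.0], [1.0, 5.0]))  # 0.0
    print(mape([2.0, 4.0], [1.0, 5.0]))      # 37.5
```

In a cross-validated evaluation, each fold's model would be scored with both metrics on its test indices and the k scores averaged.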