1
Panavas L, Crnovrsanin T, Adams JL, Ullman J, Sargavad A, Tory M, Dunne C. Investigating the Visual Utility of Differentially Private Scatterplots. IEEE Trans Vis Comput Graph 2024; 30:5370-5385. [PMID: 37405888 DOI: 10.1109/tvcg.2023.3292391] [Indexed: 07/07/2023]
Abstract
Increasingly, visualization practitioners are working with, using, and studying private and sensitive data. Many stakeholders may be interested in the resulting analyses, but widespread sharing of the data can harm individuals, companies, and organizations. Practitioners are therefore turning to differential privacy to enable public data sharing with a guaranteed amount of privacy. Differential privacy algorithms do this by aggregating data statistics with noise, and the now-private data can be released visually as differentially private scatterplots. While the private visual output is affected by the algorithm choice, privacy level, bin number, data distribution, and user task, there is little guidance on how to choose these parameters and balance their effects. To address this gap, we had experts examine 1,200 differentially private scatterplots created with a variety of parameter choices and tested their ability to see aggregate patterns in the private output (i.e., the visual utility of the chart). We synthesized these results into easy-to-use guidance for visualization practitioners releasing private data through scatterplots. Our findings also provide a ground truth for visual utility, which we use to benchmark automated utility metrics from various fields. We demonstrate how multi-scale structural similarity (MS-SSIM), the metric most strongly correlated with our study's utility results, can be used to optimize parameter selection.
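The abstract says private scatterplots are produced by aggregating statistics with noise, with bin number and privacy level as key parameters. The sketch below shows one simple instance of that idea, assuming the Laplace mechanism applied to a 2D histogram where each person contributes exactly one point; it is not necessarily one of the algorithms the study compares, and the function names and parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws with rate 1/scale
    # is Laplace-distributed with mean 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_binned_scatterplot(
    points: Sequence[Tuple[float, float]],
    bins: int,
    epsilon: float,
    bounds: Tuple[float, float, float, float],
    seed: int = 0,
) -> List[List[float]]:
    """Return a bins x bins grid of noisy counts (hypothetical helper).

    Assumes each individual contributes exactly one point, so adding or
    removing one person changes a single bin count by 1 (L1 sensitivity 1);
    Laplace noise with scale 1/epsilon then gives epsilon-DP for the grid.
    The bounds must be public, i.e. chosen without looking at the data.
    """
    x_min, x_max, y_min, y_max = bounds
    if bins <= 0 or epsilon <= 0:
        raise ValueError("bins and epsilon must be positive")
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("bounds must be non-empty")

    counts = [[0.0] * bins for _ in range(bins)]
    for x, y in points:
        # Map each point to a bin; points outside the bounds are clamped
        # into the edge bins rather than dropped.
        ix = min(bins - 1, max(0, int((x - x_min) / (x_max - x_min) * bins)))
        iy = min(bins - 1, max(0, int((y - y_min) / (y_max - y_min) * bins)))
        counts[iy][ix] += 1.0

    # A seeded PRNG keeps this example reproducible; a real release would
    # need a secure noise source and a floating-point-safe sampler.
    rng = random.Random(seed)
    scale = 1.0 / epsilon
    # Noise is added to every bin, including empty ones, so the set of
    # occupied bins is not revealed.
    return [[c + laplace_noise(scale, rng) for c in row] for row in counts]


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.9)]
    grid = private_binned_scatterplot(pts, bins=2, epsilon=1.0, bounds=(0, 1, 0, 1))
    for row in grid:
        print([round(v, 2) for v in row])
```

Because the noise magnitude per bin depends only on epsilon, using fewer, larger bins raises the true count per bin relative to the noise; this is one way the bin number and privacy level interact. Post-processing the grid, for example clipping negative counts to zero before rendering, does not weaken the privacy guarantee.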
2
Tang AS, Woldemariam SR, Miramontes S, Norgeot B, Oskotsky TT, Sirota M. Harnessing EHR data for health research. Nat Med 2024; 30:1847-1855. [PMID: 38965433 DOI: 10.1038/s41591-024-03074-8] [Received: 01/03/2024] [Accepted: 05/17/2024] [Indexed: 07/06/2024]
Abstract
With the increasing availability of rich, longitudinal, real-world clinical data recorded in electronic health records (EHRs) for millions of patients, there is growing interest in leveraging these records to improve the understanding of human health and disease and to translate these insights into clinical applications. However, the limitations of these data due to various biases must also be considered, and the impact of missing information must be understood. Recognizing and addressing these limitations can inform the design and interpretation of EHR-based informatics studies so that they avoid confusing or incorrect conclusions, particularly when applied to population or precision medicine. Here we discuss key considerations in the design, implementation and interpretation of EHR-based informatics studies, drawing on examples from the literature across hypothesis generation, hypothesis testing and machine learning applications. We outline the growing opportunities for EHR-based informatics studies, including association studies and predictive modeling, enabled by evolving AI capabilities, while also addressing limitations and potential pitfalls to avoid.
Affiliation(s)
- Alice S Tang
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Sarah R Woldemariam
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Silvia Miramontes
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Tomiko T Oskotsky
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Marina Sirota
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA.
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA.
3
Vallevik VB, Babic A, Marshall SE, Elvatun S, Brøgger HMB, Alagaratnam S, Edwin B, Veeraragavan NR, Befring AK, Nygård JF. Can I trust my fake data - A comprehensive quality assessment framework for synthetic tabular data in healthcare. Int J Med Inform 2024; 185:105413. [PMID: 38493547 DOI: 10.1016/j.ijmedinf.2024.105413] [Received: 12/15/2023] [Revised: 02/17/2024] [Accepted: 03/11/2024] [Indexed: 03/19/2024]
Abstract
BACKGROUND Ensuring safe adoption of AI tools in healthcare hinges on access to sufficient data for training, testing and validation. Synthetic data has been suggested in response to privacy concerns and regulatory requirements; it can be created by training a generator on real data to produce a dataset with similar statistical properties. Competing metrics with differing taxonomies for quality evaluation have been proposed, resulting in a complex landscape. Optimising quality entails balancing the considerations that make the data fit for use, yet relevant dimensions are left out of existing frameworks. METHOD We performed a comprehensive literature review on the use of quality evaluation metrics for synthetic data, within the scope of synthetic tabular healthcare data produced by deep generative methods. Based on this review and the team's collective experience, we developed a conceptual framework for quality assurance. Its applicability was benchmarked against a practical case from the Dutch National Cancer Registry. CONCLUSION We present a conceptual framework for quality assurance of synthetic data for AI applications in healthcare that aligns diverging taxonomies, expands on common quality dimensions to include Fairness and Carbon footprint, and proposes the stages necessary to support real-life applications. Building trust in synthetic data by increasing transparency and reducing safety risk will accelerate the development and uptake of trustworthy AI tools for the benefit of patients. DISCUSSION Despite the growing emphasis on algorithmic fairness and carbon footprint, these metrics were scarce in the literature review. The overwhelming focus was on statistical similarity using distance metrics, while sequential logic detection was scarce. A consensus-backed framework that includes all relevant quality dimensions can provide assurance for safe and responsible real-life applications of synthetic data. As the choice of appropriate metrics is highly context dependent, further validation studies are needed to guide metric choices and support the development of technical standards.
Affiliation(s)
- Vibeke Binz Vallevik
- University of Oslo, Boks 1072 Blindern, NO-0316 Oslo, Norway; DNV AS, Veritasveien 1, 1322 Høvik, Norway.
- Severin Elvatun
- Cancer Registry of Norway, Ullernchausseen 64, 0379 Oslo, Norway
- Helga M B Brøgger
- DNV AS, Veritasveien 1, 1322 Høvik, Norway; Oslo University Hospital, Sognsvannsveien 20, 0372 Oslo, Norway
- Bjørn Edwin
- University of Oslo, Boks 1072 Blindern, NO-0316 Oslo, Norway; The Intervention Centre and Department of HPB Surgery, Oslo University Hospital and Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Jan F Nygård
- Cancer Registry of Norway, Ullernchausseen 64, 0379 Oslo, Norway; UiT - The Arctic University of Norway, Tromsø, Norway
4
Maleki Varnosfaderani S, Forouzanfar M. The Role of AI in Hospitals and Clinics: Transforming Healthcare in the 21st Century. Bioengineering (Basel) 2024; 11:337. [PMID: 38671759 PMCID: PMC11047988 DOI: 10.3390/bioengineering11040337] [Received: 02/28/2024] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/28/2024]
Abstract
As healthcare systems around the world face challenges such as escalating costs, limited access, and growing demand for personalized care, artificial intelligence (AI) is emerging as a key force for transformation. This review is motivated by the urgent need to harness AI's potential to mitigate these issues and aims to critically assess AI's integration in different healthcare domains. We explore how AI supports clinical decision-making, optimizes hospital operation and management, refines medical image analysis, and transforms patient care and monitoring through AI-powered wearables. Through several case studies, we review how AI has transformed specific healthcare domains and discuss the remaining challenges and possible solutions. Additionally, we discuss methodologies for assessing AI healthcare solutions, the ethical challenges of AI deployment, and the importance of data privacy and bias mitigation for responsible technology use. By presenting a critical assessment of AI's transformative potential, this review equips researchers with a deeper understanding of AI's current and future impact on healthcare. It encourages an interdisciplinary dialogue between researchers, clinicians, and technologists to navigate the complexities of AI implementation, fostering the development of AI-driven solutions that prioritize ethical standards, equity, and a patient-centered approach.
Affiliation(s)
- Mohamad Forouzanfar
- Département de Génie des Systèmes, École de Technologie Supérieure (ÉTS), Université du Québec, Montréal, QC H3C 1K3, Canada
- Centre de Recherche de L’institut Universitaire de Gériatrie de Montréal (CRIUGM), Montréal, QC H3W 1W5, Canada
5
Liu L, Liu R, Lv Z, Huang D, Liu X. Dual blockchain-based data sharing mechanism with privacy protection for medical internet of things. Heliyon 2024; 10:e23575. [PMID: 38169943 PMCID: PMC10758875 DOI: 10.1016/j.heliyon.2023.e23575] [Received: 08/18/2023] [Revised: 12/02/2023] [Accepted: 12/06/2023] [Indexed: 01/05/2024]
Abstract
In the era of big data, the Medical Internet of Things (MIoT) serves as a critical technology for modern medical data collection. Through medical devices and sensors, it enables real-time collection of large amounts of patients' physiological parameters and health data. However, these data are often generated at high speed, at large scale, and in diverse forms, and they must be integrated with traditional medical systems, which further aggravates the scattering and heterogeneity of medical data. Additionally, the privacy and security requirements for the devices and sensor data involved in the MIoT are more stringent. Therefore, when designing a medical data sharing mechanism, its data privacy protection capability must be fully considered. This paper proposes a consortium (alliance) chain medical data sharing mechanism based on a dual-chain structure to achieve secure sharing of medical data among entities such as medical institutions, research institutions, and cloud privacy centers, while providing privacy protection functions that balance privacy protection capability against data accessibility. First, ciphertext-policy attribute-based encryption combined with zero-knowledge succinct non-interactive arguments is used together with the data sharing structure of the consortium chain to ensure the integrity and privacy of medical data. Second, the approach employs certificate-based signing and proxy re-encryption, so that entities can decrypt and verify medical data at the cloud privacy center, addressing confidentiality concerns surrounding medical data. Third, an efficient and secure key identity-based encryption protocol is used to ensure the legitimacy of user identities and improve the security of medical data. Finally, theoretical and practical performance analyses show that the mechanism is feasible and efficient compared with other existing mechanisms.
Affiliation(s)
- Linchen Liu
- Department of Rheumatology, Zhongda Hospital, School of Medicine, Southeast University, 210009, Nanjing, Jiangsu, China
- Ruyan Liu
- Engineering Research Center of Digital Forensics of Ministry of Education, School of Computer Science, Nanjing University of Information Science and Technology, 210044, Nanjing, Jiangsu, China
- Zhiying Lv
- Engineering Research Center of Digital Forensics of Ministry of Education, School of Computer Science, Nanjing University of Information Science and Technology, 210044, Nanjing, Jiangsu, China
- Ding Huang
- Engineering Research Center of Digital Forensics of Ministry of Education, School of Computer Science, Nanjing University of Information Science and Technology, 210044, Nanjing, Jiangsu, China
- Xing Liu
- School of Medicine, Southeast University, 210009, Nanjing, Jiangsu, China
6
van Breugel M, Fehrmann RSN, Bügel M, Rezwan FI, Holloway JW, Nawijn MC, Fontanella S, Custovic A, Koppelman GH. Current state and prospects of artificial intelligence in allergy. Allergy 2023; 78:2623-2643. [PMID: 37584170 DOI: 10.1111/all.15849] [Received: 04/20/2023] [Revised: 07/08/2023] [Accepted: 07/31/2023] [Indexed: 08/17/2023]
Abstract
The field of medicine is witnessing exponential growth of interest in artificial intelligence (AI), which enables new research questions and the analysis of larger and new types of data. Nevertheless, applications that go beyond proofs of concept and deliver clinical value remain rare, especially in the field of allergy. This narrative review provides a fundamental understanding of the core concepts of AI and critically discusses its limitations and open challenges, such as data availability and bias, along with potential directions to overcome them. We provide a conceptual framework to structure AI applications within this field and discuss leading case examples. Most of these applications of AI and machine learning in allergy concern supervised learning and unsupervised clustering, with a strong emphasis on diagnosis and subtyping. We share a perspective on guidelines for good AI practice to help readers apply it effectively and safely, along with prospects for advancing the field and initiatives to increase clinical impact. We anticipate that AI can further deepen our knowledge of disease mechanisms and contribute to precision medicine in allergy.
Affiliation(s)
- Merlijn van Breugel
- Department of Pediatric Pulmonology and Pediatric Allergology, Beatrix Children's Hospital, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Groningen Research Institute for Asthma and COPD (GRIAC), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- MIcompany, Amsterdam, the Netherlands
- Rudolf S N Fehrmann
- Department of Medical Oncology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Faisal I Rezwan
- Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
- Department of Computer Science, Aberystwyth University, Aberystwyth, UK
- John W Holloway
- Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
- National Institute for Health and Care Research Southampton Biomedical Research Centre, University Hospitals Southampton NHS Foundation Trust, Southampton, UK
- Martijn C Nawijn
- Groningen Research Institute for Asthma and COPD (GRIAC), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Sara Fontanella
- National Heart and Lung Institute, Imperial College London, London, UK
- National Institute for Health and Care Research Imperial Biomedical Research Centre (BRC), London, UK
- Adnan Custovic
- National Heart and Lung Institute, Imperial College London, London, UK
- National Institute for Health and Care Research Imperial Biomedical Research Centre (BRC), London, UK
- Gerard H Koppelman
- Department of Pediatric Pulmonology and Pediatric Allergology, Beatrix Children's Hospital, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Groningen Research Institute for Asthma and COPD (GRIAC), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
7
Peppes N, Tsakanikas P, Daskalakis E, Alexakis T, Adamopoulou E, Demestichas K. FoGGAN: Generating Realistic Parkinson's Disease Freezing of Gait Data Using GANs. Sensors (Basel) 2023; 23:8158. [PMID: 37836988 PMCID: PMC10574838 DOI: 10.3390/s23198158] [Received: 09/11/2023] [Revised: 09/23/2023] [Accepted: 09/27/2023] [Indexed: 10/15/2023]
Abstract
Data scarcity in the healthcare domain is a major drawback for most state-of-the-art technologies that rely on artificial intelligence. The unavailability of quality data, due both to the difficulty of gathering and labeling them and to their sensitive nature, creates a breeding ground for data augmentation solutions. Parkinson's disease (PD), which can have a wide range of symptoms including motor impairments, is a very challenging case for quality data acquisition. Generative Adversarial Networks (GANs) can help alleviate such data availability issues. In this light, this study focuses on a data augmentation solution that uses GANs with a freezing of gait (FoG) symptom dataset as input. According to a variety of similarity metrics, the data generated by the FoGGAN architecture presented in this study are almost identical to the original data. This highlights the significance of such solutions, as they can provide credible synthetic data that can be used as training inputs to AI applications. Additionally, a DNN classifier's performance was evaluated on three different evaluation datasets, and the accuracy results were quite encouraging, suggesting that the FoGGAN solution could help alleviate data shortages.
Affiliation(s)
- Nikolaos Peppes
- Institute of Communication and Computer Systems, National Technical University of Athens, 15773 Athens, Greece
- Panagiotis Tsakanikas
- Institute of Communication and Computer Systems, National Technical University of Athens, 15773 Athens, Greece
- Emmanouil Daskalakis
- Institute of Communication and Computer Systems, National Technical University of Athens, 15773 Athens, Greece
- Theodoros Alexakis
- Institute of Communication and Computer Systems, National Technical University of Athens, 15773 Athens, Greece
- Evgenia Adamopoulou
- Institute of Communication and Computer Systems, National Technical University of Athens, 15773 Athens, Greece
- Konstantinos Demestichas
- Department of Agricultural Economics and Rural Development, Agricultural University of Athens, 11855 Athens, Greece
8
Zhan W, Chen B, Wu X, Yang Z, Lin C, Lin J, Guan X. Wood identification of Cyclobalanopsis (Endl.) Oerst based on microscopic features and CTGAN-enhanced explainable machine learning models. Front Plant Sci 2023; 14:1203836. [PMID: 37484454 PMCID: PMC10361066 DOI: 10.3389/fpls.2023.1203836] [Received: 04/11/2023] [Accepted: 06/06/2023] [Indexed: 07/25/2023]
Abstract
Introduction Accurate and fast identification of wood at the species level is critical for protecting and conserving tree species resources. Current identification methods are inefficient, costly, and complex. Methods A wood species identification model based on wood anatomy and using a geometric dataset of Cyclobalanopsis wood cells was proposed. The model was enhanced by the CTGAN deep learning algorithm and used a simulated cell geometric feature dataset. BPNN and SVM machine learning models were trained to recognize three Cyclobalanopsis species from simulated vessel cells and from simulated wood fiber cells, respectively. Results Using the CTGAN-generated vessel dataset, the SVM and BPNN models achieved recognition accuracies of 96.4% and 99.6%, respectively, on the real dataset. Using the CTGAN-generated wood fiber dataset, the BPNN and SVM models achieved recognition accuracies of 75.5% and 77.9%, respectively, on the real dataset. Discussion The machine learning models trained on cell geometric feature data enhanced by CTGAN achieved good recognition of Cyclobalanopsis, with the SVM model having a higher prediction accuracy than BPNN. The machine learning models were interpreted using LIME to explore how they identify tree species from wood cell geometric features. The proposed model can be used for efficient and cost-effective identification of wood species in industrial applications.
Affiliation(s)
- Weihui Zhan
- College of Materials Engineering, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Bowen Chen
- College of Materials Engineering, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Xiaolian Wu
- College of Materials Engineering, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Zhen Yang
- College of Transportation and Civil Engineering, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Che Lin
- College of Materials Engineering, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Jinguo Lin
- College of Materials Engineering, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- National Forestry and Grassland Administration Key Laboratory of Plant Fiber Functional Materials, Fuzhou, Fujian, China
- Xin Guan
- College of Materials Engineering, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- National Forestry and Grassland Administration Key Laboratory of Plant Fiber Functional Materials, Fuzhou, Fujian, China
9
Sun C, van Soest J, Dumontier M. Generating synthetic personal health data using conditional generative adversarial networks combining with differential privacy. J Biomed Inform 2023:104404. [PMID: 37268168 DOI: 10.1016/j.jbi.2023.104404] [Received: 01/27/2023] [Revised: 04/25/2023] [Accepted: 05/21/2023] [Indexed: 06/04/2023]
Abstract
A large amount of personal health data that is highly valuable to the scientific community is still inaccessible, or requires a lengthy request process, because of privacy concerns and legal restrictions. Synthetic data has been studied and proposed as a promising solution to this issue. However, generating realistic and privacy-preserving synthetic personal health data still poses challenges, such as simulating the characteristics of patients in minority classes, capturing the relations among variables in imbalanced data and transferring them to the synthetic data, and preserving individual patients' privacy. In this paper, we propose a differentially private conditional Generative Adversarial Network model (DP-CGANS) consisting of data transformation, sampling, conditioning, and network training to generate realistic and privacy-preserving personal data. Our model distinguishes categorical and continuous variables and transforms them into latent space separately for better training performance. We tackle challenges that arise from the special characteristics of personal health data: for example, patients with a certain disease are typically a minority in the dataset, and the relations among variables are crucial to capture. Our model takes a conditional vector as an additional input to represent the minority class in imbalanced data and to maximally capture the dependency between variables. Moreover, we inject statistical noise into the gradients during the network training process of DP-CGANS to provide a differential privacy guarantee. We extensively evaluate our model against state-of-the-art generative models on personal socio-economic datasets and real-world personal health datasets in terms of statistical similarity, machine learning performance, and privacy measurement. We demonstrate that our model outperforms other comparable models, especially in capturing the dependence between variables. Finally, we present the balance between data utility and privacy in synthetic data generation, considering the different data structures and characteristics of real-world personal health data such as imbalanced classes, abnormal distributions, and data sparsity.
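The abstract states that DP-CGANS injects statistical noise into the gradients during training but does not give the exact procedure. The sketch below shows one widely used way of doing this, assuming the DP-SGD recipe: clip each per-example gradient to a maximum L2 norm, sum, add Gaussian noise scaled to that norm, and average. It is not presented as the DP-CGANS implementation; the function names and parameters are hypothetical, and the privacy accountant that converts the noise level and number of steps into an overall epsilon is not shown.

```python
import math
import random
from typing import List, Sequence


def clip_gradient(grad: Sequence[float], clip_norm: float) -> List[float]:
    # Rescale the gradient so its L2 norm is at most clip_norm; this bounds
    # how much any single record can influence the update.
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def noisy_mean_gradient(
    per_example_grads: Sequence[Sequence[float]],
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip each per-example gradient, sum, add Gaussian noise, average.

    The noise standard deviation is noise_multiplier * clip_norm, i.e.
    proportional to the per-record sensitivity of the clipped sum.
    """
    if not per_example_grads:
        raise ValueError("need at least one per-example gradient")
    if clip_norm <= 0 or noise_multiplier < 0:
        raise ValueError("clip_norm must be > 0 and noise_multiplier >= 0")
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, clip_norm)):
            total[i] += g
    sigma = noise_multiplier * clip_norm
    # Noise is added once to the summed gradient, not to each example.
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.0, 1.0]]  # two hypothetical per-example gradients
    # With noise_multiplier=0.0 this returns approximately [0.3, 0.9]:
    # [3, 4] is clipped to [0.6, 0.8] and [0, 1] is left unchanged.
    print(noisy_mean_gradient(grads, 1.0, 0.0, random.Random(0)))
    print(noisy_mean_gradient(grads, 1.0, 1.1, random.Random(0)))
```

Clipping is what makes the noise meaningful: without a bound on each record's gradient, no finite noise level would limit a single patient's influence on the model. In a GAN, such noise would typically be applied to the updates of the network that sees real data (the discriminator), though the abstract does not specify this for DP-CGANS.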
Affiliation(s)
- Chang Sun
- Institute of Data Science, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands; Department of Advanced Computing Sciences, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands.
- Johan van Soest
- Brightlands Institute of Smart Society, Faculty of Science and Engineering, Maastricht University, Heerlen, The Netherlands; Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre, Maastricht, The Netherlands.
- Michel Dumontier
- Institute of Data Science, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands; Department of Advanced Computing Sciences, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands.
10
Prezja F, Paloneva J, Pölönen I, Niinimäki E, Äyrämö S. DeepFake knee osteoarthritis X-rays from generative adversarial neural networks deceive medical experts and offer augmentation potential to automatic classification. Sci Rep 2022; 12:18573. [PMID: 36329253 PMCID: PMC9633706 DOI: 10.1038/s41598-022-23081-4] [Received: 06/24/2022] [Accepted: 10/25/2022] [Indexed: 11/05/2022]
Abstract
Recent developments in deep learning have impacted medical science. However, new privacy issues and regulatory frameworks have hindered medical data sharing and collection. Deep learning is highly data-intensive, so such regulatory limitations restrict the potential for new breakthroughs and collaborations. Generating medically accurate synthetic data, however, can alleviate privacy issues and potentially augment deep learning pipelines. This study presents generative adversarial neural networks capable of generating realistic images of knee joint X-rays with varying osteoarthritis severity. We offer 320,000 synthetic (DeepFake) X-ray images generated by models trained on 5,556 real images. We validated our models for medical accuracy with 15 medical experts and for augmentation effects with an osteoarthritis severity classification task. We devised a survey of 30 real and 30 DeepFake images for the medical experts. On average, more DeepFakes were mistaken for real images than the reverse, indicating that the DeepFakes were realistic enough to deceive the medical experts. Finally, our DeepFakes improved classification accuracy in an osteoarthritis severity classification task with scarce real data and transfer learning. In addition, in the same classification task, we replaced all real training data with DeepFakes and suffered only a [Formula: see text] loss from baseline accuracy in classifying real osteoarthritis X-rays.
Affiliation(s)
- Fabi Prezja
- Faculty of Information Technology, University of Jyväskylä, 40014 Jyväskylä, Finland
- Juha Paloneva
- Department of Surgery, Central Finland Healthcare District, 40620 Jyväskylä, Finland; School of Medicine, University of Eastern Finland, 70211 Kuopio, Finland
- Ilkka Pölönen
- Faculty of Information Technology, University of Jyväskylä, 40014 Jyväskylä, Finland
- Esko Niinimäki
- Faculty of Information Technology, University of Jyväskylä, 40014 Jyväskylä, Finland
- Sami Äyrämö
- Faculty of Information Technology, University of Jyväskylä, 40014 Jyväskylä, Finland
11
Wu Y, Wang B, Yuan R, Watada J. A Gramian angular field-based data-driven approach for multiregion and multisource renewable scenario generation. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.027] [Indexed: 11/17/2022]
12
Wang Z, Cheng X, Su S, Wang G. Differentially Private Generative Decomposed Adversarial Network for Vertically Partitioned Data Sharing. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.006] [Indexed: 11/13/2022]
13
14
Yu R, Tian Y, Gao J, Liu Z, Wei X, Jiang H, Huang Y, Li X. Feature discretization-based deep clustering for thyroid ultrasound image feature extraction. Comput Biol Med 2022; 146:105600. [DOI: 10.1016/j.compbiomed.2022.105600] [Received: 04/10/2022] [Revised: 04/28/2022] [Accepted: 05/06/2022] [Indexed: 02/08/2023]