Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Beaulieu-Jones BK, Wu ZS, Williams C, Lee R, Bhavnani SP, Byrd JB, Greene CS. Privacy-Preserving Generative Deep Neural Networks Support Clinical Data Sharing. Circ Cardiovasc Qual Outcomes 2019;12:e005122. [PMID: 31284738 PMCID: PMC7041894 DOI: 10.1161/circoutcomes.118.005122] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Indexed: 11/16/2022]

Cited by Other Article(s)

Thangaraj PM, Benson SH, Oikonomou EK, Asselbergs FW, Khera R. Cardiovascular care with digital twin technology in the era of generative artificial intelligence. Eur Heart J 2024:ehae619. [PMID: 39322420 DOI: 10.1093/eurheartj/ehae619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/16/2024] [Accepted: 09/01/2024] [Indexed: 09/27/2024] Open

Cho H, Froelicher D, Dokmai N, Nandi A, Sadhuka S, Hong MM, Berger B. Privacy-Enhancing Technologies in Biomedical Data Science. Annu Rev Biomed Data Sci 2024;7:317-343. [PMID: 39178425 PMCID: PMC11346580 DOI: 10.1146/annurev-biodatasci-120423-120107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/25/2024]

Prediger L, Jälkö J, Honkela A, Kaski S. Collaborative learning from distributed data with differentially private synthetic data. BMC Med Inform Decis Mak 2024;24:167. [PMID: 38877563 PMCID: PMC11179391 DOI: 10.1186/s12911-024-02563-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Accepted: 06/03/2024] [Indexed: 06/16/2024] Open

Abstract

BACKGROUND

Consider a setting where multiple parties holding sensitive data aim to collaboratively learn population level statistics, but pooling the sensitive data sets is not possible due to privacy concerns and parties are unable to engage in centrally coordinated joint computation. We study the feasibility of combining privacy preserving synthetic data sets in place of the original data for collaborative learning on real-world health data from the UK Biobank.

METHODS

We perform an empirical evaluation based on an existing prospective cohort study from the literature. Multiple parties were simulated by splitting the UK Biobank cohort along assessment centers, for which we generate synthetic data using differentially private generative modelling techniques. We then apply the original study's Poisson regression analysis on the combined synthetic data sets and evaluate the effects of 1) the size of local data set, 2) the number of participating parties, and 3) local shifts in distributions, on the obtained likelihood scores.

RESULTS

We discover that parties engaging in the collaborative learning via shared synthetic data obtain more accurate estimates of the regression parameters compared to using only their local data. This finding extends to the difficult case of small heterogeneous data sets. Furthermore, the more parties participate, the larger and more consistent the improvements become up to a certain limit. Finally, we find that data sharing can especially help parties whose data contain underrepresented groups to perform better-adjusted analysis for said groups.

CONCLUSIONS

Based on our results we conclude that sharing of synthetic data is a viable method for enabling learning from sensitive data without violating privacy constraints even if individual data sets are small or do not represent the overall population well. Lack of access to distributed sensitive data is often a bottleneck in biomedical research, which our study shows can be alleviated with privacy-preserving collaborative learning methods.

Vallevik VB, Babic A, Marshall SE, Elvatun S, Brøgger HMB, Alagaratnam S, Edwin B, Veeraragavan NR, Befring AK, Nygård JF. Can I trust my fake data - A comprehensive quality assessment framework for synthetic tabular data in healthcare. Int J Med Inform 2024;185:105413. [PMID: 38493547 DOI: 10.1016/j.ijmedinf.2024.105413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 02/17/2024] [Accepted: 03/11/2024] [Indexed: 03/19/2024]

Abstract

BACKGROUND

Ensuring safe adoption of AI tools in healthcare hinges on access to sufficient data for training, testing and validation. Synthetic data has been suggested in response to privacy concerns and regulatory requirements and can be created by training a generator on real data to produce a dataset with similar statistical properties. Competing metrics with differing taxonomies for quality evaluation have been proposed, resulting in a complex landscape. Optimising quality entails balancing considerations that make the data fit for use, yet relevant dimensions are left out of existing frameworks.

METHOD

We performed a comprehensive literature review on the use of quality evaluation metrics on synthetic data within the scope of synthetic tabular healthcare data using deep generative methods. Based on this and the collective team experiences, we developed a conceptual framework for quality assurance. The applicability was benchmarked against a practical case from the Dutch National Cancer Registry.

CONCLUSION

We present a conceptual framework for quality assurance of synthetic data for AI applications in healthcare that aligns diverging taxonomies, expands on common quality dimensions to include the dimensions of Fairness and Carbon footprint, and proposes stages necessary to support real-life applications. Building trust in synthetic data by increasing transparency and reducing the safety risk will accelerate the development and uptake of trustworthy AI tools for the benefit of patients.

DISCUSSION

Despite the growing emphasis on algorithmic fairness and carbon footprint, these metrics were scarce in the literature review. The overwhelming focus was on statistical similarity using distance metrics, while sequential logic detection was scarce. A consensus-backed framework that includes all relevant quality dimensions can provide assurance for safe and responsible real-life applications of synthetic data. As the choice of appropriate metrics is highly context dependent, further research is needed on validation studies to guide metric choices and support the development of technical standards.

Carey EG, Adeyemi FO, Neelakantan L, Fernandes B, Fazel M, Ford T, Burn AM. Preferences on Governance Models for Mental Health Data: Qualitative Study With Young People. JMIR Form Res 2024;8:e50368. [PMID: 38652525 PMCID: PMC11077411 DOI: 10.2196/50368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 11/08/2023] [Accepted: 03/22/2024] [Indexed: 04/25/2024] Open

Abstract

BACKGROUND

Improving access to mental health data to accelerate research and improve mental health outcomes is a potentially achievable goal given the substantial data that can now be collected from mobile devices. Smartphones can provide a useful mechanism for collecting mental health data from young people, especially as their use is relatively ubiquitous in high-resource settings such as the United Kingdom and they have a high capacity to collect active and passive data. This raises the interesting opportunity to establish a large bank of mental health data from young people that could be accessed by researchers worldwide, but it is important to clarify how to ensure that this is done in an appropriate manner aligned with the values of young people.

OBJECTIVE

In this study, we discussed the preferences of young people in the United Kingdom regarding the governance, sharing, and use of their mental health data with the establishment of a global data bank in mind. We aimed to determine whether young people want and feel safe to share their mental health data; if so, with whom; and their preferences in doing so.

METHODS

Young people (N=46) were provided with 2 modules of educational material about data governance models and background in scientific research. We then conducted 2-hour web-based group sessions using a deliberative democracy methodology to reach a consensus where possible. Findings were analyzed using the framework method.

RESULTS

Young people were generally enthusiastic about contributing data to mental health research. They believed that broader availability of mental health data could be used to discover what improves or worsens mental health and develop new services to support young people. However, this enthusiasm came with many concerns and caveats, including distributed control of access to ensure appropriate use, distributed power, and data management that included diverse representation and sufficient ethical training for applicants and data managers.

CONCLUSIONS

Although it is feasible to use smartphones to collect mental health data from young people in the United Kingdom, it is essential to carefully consider the parameters of such a data bank. Addressing and embedding young people's preferences, including the need for robust procedures regarding how their data are managed, stored, and accessed, will set a solid foundation for establishing any global data bank.

El Emam K, Mosquera L, Fang X, El-Hussuna A. An evaluation of the replicability of analyses using synthetic health data. Sci Rep 2024;14:6978. [PMID: 38521806 PMCID: PMC10960851 DOI: 10.1038/s41598-024-57207-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 03/15/2024] [Indexed: 03/25/2024] Open

Abstract

Synthetic data generation is being increasingly used as a privacy preserving approach for sharing health data. In addition to protecting privacy, it is important to ensure that generated data has high utility. A common way to assess utility is the ability of synthetic data to replicate results from the real data. Replicability has been defined using two criteria: (a) replicate the results of the analyses on real data, and (b) ensure valid population inferences from the synthetic data. A simulation study using three heterogeneous real-world datasets evaluated the replicability of logistic regression workloads. Eight replicability metrics were evaluated: decision agreement, estimate agreement, standardized difference, confidence interval overlap, bias, confidence interval coverage, statistical power, and precision (empirical SE). The analysis of synthetic data used a multiple imputation approach whereby up to 20 datasets were generated and the fitted logistic regression models were combined using combining rules for fully synthetic datasets. The effects of synthetic data amplification were evaluated, and two types of generative models were used: sequential synthesis using boosted decision trees and a generative adversarial network (GAN). Privacy risk was evaluated using a membership disclosure metric. For sequential synthesis, adjusted model parameters after combining at least ten synthetic datasets gave high decision and estimate agreement, low standardized difference, high confidence interval overlap, low bias, nominal confidence interval coverage, and power close to the nominal level. Amplification had only a marginal benefit. Confidence interval coverage from a single synthetic dataset without applying combining rules was erroneous, and statistical power, as expected, was artificially inflated when amplification was used. Sequential synthesis performed considerably better than the GAN across multiple datasets. Membership disclosure risk was low for all datasets and models. For replicable results, the statistical analysis of fully synthetic data should be based on at least ten generated datasets of the same size as the original whose analysis results are combined. Analysis results from synthetic data without applying combining rules can be misleading. Replicability results are dependent on the type of generative model used, with our study suggesting that sequential synthesis has good replicability characteristics for common health research workloads.

Yuan J, Tang R, Jiang X, Hu X. Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2024;2023:1324-1333. [PMID: 38222339 PMCID: PMC10785941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]

Bordukova M, Makarov N, Rodriguez-Esteban R, Schmich F, Menden MP. Generative artificial intelligence empowers digital twins in drug discovery and clinical trials. Expert Opin Drug Discov 2024;19:33-42. [PMID: 37887266 DOI: 10.1080/17460441.2023.2273839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Accepted: 10/18/2023] [Indexed: 10/28/2023]

Gouda MA, Hong W, Jiang D, Feng N, Zhou B, Li Z. Synthesis of sEMG Signals for Hand Gestures Using a 1DDCGAN. Bioengineering (Basel) 2023;10:1353. [PMID: 38135944 PMCID: PMC10740493 DOI: 10.3390/bioengineering10121353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 11/18/2023] [Accepted: 11/20/2023] [Indexed: 12/24/2023] Open

Kang HYJ, Batbaatar E, Choi DW, Choi KS, Ko M, Ryu KS. Synthetic Tabular Data Based on Generative Adversarial Networks in Health Care: Generation and Validation Using the Divide-and-Conquer Strategy. JMIR Med Inform 2023;11:e47859. [PMID: 37999942 DOI: 10.2196/47859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 08/02/2023] [Accepted: 10/28/2023] [Indexed: 11/25/2023] Open

Abstract

BACKGROUND

Synthetic data generation (SDG) based on generative adversarial networks (GANs) is used in health care, but research on preserving data with logical relationships with synthetic tabular data (STD) remains challenging. Filtering methods for SDG can lead to the loss of important information.

OBJECTIVE

This study proposed a divide-and-conquer (DC) method to generate STD based on the GAN algorithm, while preserving data with logical relationships.

METHODS

The proposed method was evaluated on data from the Korea Association for Lung Cancer Registry (KALC-R) and 2 benchmark data sets (breast cancer and diabetes). The DC-based SDG strategy comprises 3 steps: (1) We used 2 different partitioning methods (the class-specific criterion distinguished between survival and death groups, while the Cramer V criterion identified the highest correlation between columns in the original data); (2) the entire data set was divided into a number of subsets, which were then used as input for the conditional tabular generative adversarial network and the copula generative adversarial network to generate synthetic data; and (3) the generated synthetic data were consolidated into a single entity. For validation, we compared DC-based SDG and conditional sampling (CS)-based SDG through the performances of machine learning models. In addition, we generated imbalanced and balanced synthetic data for each of the 3 data sets and compared their performance using 4 classifiers: decision tree (DT), random forest (RF), Extreme Gradient Boosting (XGBoost), and light gradient-boosting machine (LGBM) models.

RESULTS

The synthetic data of the 3 diseases (non-small cell lung cancer [NSCLC], breast cancer, and diabetes) generated by our proposed model outperformed the 4 classifiers (DT, RF, XGBoost, and LGBM). The CS- versus DC-based model performances were compared using the mean area under the curve (SD) values: 74.87 (SD 0.77) versus 63.87 (SD 2.02) for NSCLC, 73.31 (SD 1.11) versus 67.96 (SD 2.15) for breast cancer, and 61.57 (SD 0.09) versus 60.08 (SD 0.17) for diabetes (DT); 85.61 (SD 0.29) versus 79.01 (SD 1.20) for NSCLC, 78.05 (SD 1.59) versus 73.48 (SD 4.73) for breast cancer, and 59.98 (SD 0.24) versus 58.55 (SD 0.17) for diabetes (RF); 85.20 (SD 0.82) versus 76.42 (SD 0.93) for NSCLC, 77.86 (SD 2.27) versus 68.32 (SD 2.37) for breast cancer, and 60.18 (SD 0.20) versus 58.98 (SD 0.29) for diabetes (XGBoost); and 85.14 (SD 0.77) versus 77.62 (SD 1.85) for NSCLC, 78.16 (SD 1.52) versus 70.02 (SD 2.17) for breast cancer, and 61.75 (SD 0.13) versus 61.12 (SD 0.23) for diabetes (LGBM). In addition, we found that balanced synthetic data performed better.

CONCLUSIONS

This study is the first attempt to generate and validate STD based on a DC approach and shows improved performance using STD. The necessity for balanced SDG was also demonstrated.

Xing X, Ser JD, Wu Y, Li Y, Xia J, Xu L, Firmin D, Gatehouse P, Yang G. HDL: Hybrid Deep Learning for the Synthesis of Myocardial Velocity Maps in Digital Twins for Cardiac Analysis. IEEE J Biomed Health Inform 2023;27:5134-5142. [PMID: 35290192 DOI: 10.1109/jbhi.2022.3158897] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 11/10/2022]

Bonomi L, Gousheh S, Fan L. Enabling Health Data Sharing with Fine-Grained Privacy. PROCEEDINGS OF THE ... ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT. ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT 2023;2023:131-141. [PMID: 37906633 PMCID: PMC10601092 DOI: 10.1145/3583780.3614864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]

García-Domínguez A, Galván-Tejada CE, Magallanes-Quintanar R, Cruz M, Gonzalez-Curiel I, Delgado-Contreras JR, Soto-Murillo MA, Celaya-Padilla JM, Galván-Tejada JI. Optimizing Clinical Diabetes Diagnosis through Generative Adversarial Networks: Evaluation and Validation. Diseases 2023;11:134. [PMID: 37873778 PMCID: PMC10594466 DOI: 10.3390/diseases11040134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 09/24/2023] [Accepted: 09/28/2023] [Indexed: 10/25/2023] Open

Affiliation(s)

Antonio García-Domínguez: Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
Carlos E. Galván-Tejada: Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
Rafael Magallanes-Quintanar: Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
Miguel Cruz: Medical Research Unit in Biochemistry, National Medical Center Siglo XXI, IMSS, Mexico City 06720, Mexico
Irma Gonzalez-Curiel: Unidad Académica de Ciencias Químicas, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
J. Rubén Delgado-Contreras: Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
Manuel A. Soto-Murillo: Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
José M. Celaya-Padilla: Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
Jorge I. Galván-Tejada: Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico

Peppes N, Tsakanikas P, Daskalakis E, Alexakis T, Adamopoulou E, Demestichas K. FoGGAN: Generating Realistic Parkinson's Disease Freezing of Gait Data Using GANs. SENSORS (BASEL, SWITZERLAND) 2023;23:8158. [PMID: 37836988 PMCID: PMC10574838 DOI: 10.3390/s23198158] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/11/2023] [Revised: 09/23/2023] [Accepted: 09/27/2023] [Indexed: 10/15/2023]

Pun FW, Ozerov IV, Zhavoronkov A. AI-powered therapeutic target discovery. Trends Pharmacol Sci 2023;44:561-572. [PMID: 37479540 DOI: 10.1016/j.tips.2023.06.010] [Citation(s) in RCA: 39] [Impact Index Per Article: 39.0] [Received: 05/25/2023] [Revised: 06/20/2023] [Accepted: 06/23/2023] [Indexed: 07/23/2023]

Jacobs F, D'Amico S, Benvenuti C, Gaudio M, Saltalamacchia G, Miggiano C, De Sanctis R, Della Porta MG, Santoro A, Zambelli A. Opportunities and Challenges of Synthetic Data Generation in Oncology. JCO Clin Cancer Inform 2023;7:e2300045. [PMID: 37535875 DOI: 10.1200/cci.23.00045] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/23/2023] [Revised: 05/05/2023] [Accepted: 05/25/2023] [Indexed: 08/05/2023] Open

Zuber S, Bechtiger L, Bodelet JS, Golin M, Heumann J, Kim JH, Klee M, Mur J, Noll J, Voll S, O’Keefe P, Steinhoff A, Zölitz U, Muniz-Terrera G, Shanahan L, Shanahan MJ, Hofer SM. An integrative approach for the analysis of risk and health across the life course: challenges, innovations, and opportunities for life course research. DISCOVER SOCIAL SCIENCE AND HEALTH 2023;3:14. [PMID: 37469576 PMCID: PMC10352429 DOI: 10.1007/s44155-023-00044-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/26/2023] [Accepted: 06/26/2023] [Indexed: 07/21/2023]

Affiliation(s)

Sascha Zuber: Institute On Aging & Lifelong Health, University of Victoria, Victoria, BC, Canada; Center for the Interdisciplinary Study of Gerontology and Vulnerability, University of Geneva, Geneva, Switzerland
Laura Bechtiger: Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
Julien Stéphane Bodelet: Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
Marta Golin: Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
Jens Heumann: Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
Jung Hyun Kim: University of Luxembourg, Esch-sur-Alzette, Luxembourg
Matthias Klee: University of Luxembourg, Esch-sur-Alzette, Luxembourg
Jure Mur: University of Edinburgh, Edinburgh, Scotland
Jennie Noll: Pennsylvania State University, State College, PA, USA
Stacey Voll: Institute On Aging & Lifelong Health, University of Victoria, Victoria, BC, Canada
Patrick O’Keefe: Department of Neurology, Oregon Health & Science University, Portland, OR, USA
Annekatrin Steinhoff: Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland; University Hospital of Child and Adolescent Psychiatry and Psychotherapy, University of Bern, Bern, Switzerland
Ulf Zölitz: Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
Graciela Muniz-Terrera: Ohio University, Athens, OH, USA
Lilly Shanahan: Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland; Department of Psychology, University of Zürich, Zürich, Switzerland
Michael J. Shanahan: Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland; Department of Sociology, University of Zürich, Zürich, Switzerland
Scott M. Hofer: Institute On Aging & Lifelong Health, University of Victoria, Victoria, BC, Canada; Department of Neurology, Oregon Health & Science University, Portland, OR, USA

Azizi Z, Lindner S, Shiba Y, Raparelli V, Norris CM, Kublickiene K, Herrero MT, Kautzky-Willer A, Klimek P, Gisinger T, Pilote L, El Emam K. A comparison of synthetic data generation and federated analysis for enabling international evaluations of cardiovascular health. Sci Rep 2023;13:11540. [PMID: 37460705 DOI: 10.1038/s41598-023-38457-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2022] [Accepted: 07/08/2023] [Indexed: 07/20/2023] Open

Affiliation(s)

Zahra Azizi: Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, 5252 De Maisonneuve Blvd, Office 2B.39, Montréal, QC, H4A 3S5, Canada
Simon Lindner: Department of Internal Medicine III, Division of Endocrinology and Metabolism, Gender Medicine Unit, Medical University of Vienna, Vienna, Austria
Yumika Shiba: Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, 5252 De Maisonneuve Blvd, Office 2B.39, Montréal, QC, H4A 3S5, Canada; Faculty of Medicine, McGill University, Montreal, Canada
Valeria Raparelli: Department of Translational Medicine, University of Ferrara, Ferrara, Italy; Faculty of Nursing, University of Alberta, Edmonton, AB, Canada
Colleen M Norris: Faculty of Nursing, University of Alberta, Edmonton, AB, Canada; Heart and Stroke Strategic Clinical Networks, Alberta Health Services, Alberta, Canada
Karolina Kublickiene: Karolinska Institute, Stockholm, Sweden
Maria Trinidad Herrero: Clinical & Experimental Neuroscience (NiCE-IMIB-IUIE), School of Medicine, University of Murcia, Murcia, Spain
Alexandra Kautzky-Willer: Department of Internal Medicine III, Division of Endocrinology and Metabolism, Gender Medicine Unit, Medical University of Vienna, Vienna, Austria
Peter Klimek: Section for Science of Complex Systems, CeMSIIS, Medical University of Vienna, Vienna, Austria; Complexity Science Hub Vienna, Vienna, Austria
Teresa Gisinger: Division of Endocrinology and Metabolism, Medical University of Vienna, Vienna, Austria
Louise Pilote: Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, 5252 De Maisonneuve Blvd, Office 2B.39, Montréal, QC, H4A 3S5, Canada; Divisions of Clinical Epidemiology and General Internal Medicine, McGill University Health Centre Research Institute, Montreal, QC, Canada
Khaled El Emam: Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, ON, K1H 8L1, Canada; School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada; Replica Analytics Ltd, Ottawa, ON, Canada

Scendoni R, Tomassini L, Cingolani M, Perali A, Pilati S, Fedeli P. Artificial Intelligence in Evaluation of Permanent Impairment: New Operational Frontiers. Healthcare (Basel) 2023;11:1979. [PMID: 37510420 PMCID: PMC10378994 DOI: 10.3390/healthcare11141979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 07/01/2023] [Accepted: 07/07/2023] [Indexed: 07/30/2023] Open

Sun H, Plawinski J, Subramaniam S, Jamaludin A, Kadir T, Readie A, Ligozio G, Ohlssen D, Baillie M, Coroller T. A deep learning approach to private data sharing of medical images using conditional generative adversarial networks (GANs). PLoS One 2023;18:e0280316. [PMID: 37410795 PMCID: PMC10325103 DOI: 10.1371/journal.pone.0280316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Accepted: 12/27/2022] [Indexed: 07/08/2023] Open

Wang X, Dervishi L, Li W, Jiang X, Ayday E, Vaidya J. Efficient Federated Kinship Relationship Identification. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2023;2023:534-543. [PMID: 37351796 PMCID: PMC10283133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2023]

Fritzsche MC, Akyüz K, Cano Abadía M, McLennan S, Marttinen P, Mayrhofer MT, Buyx AM. Ethical layering in AI-driven polygenic risk scores-New complexities, new challenges. Front Genet 2023;14:1098439. [PMID: 36816027 PMCID: PMC9933509 DOI: 10.3389/fgene.2023.1098439] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/14/2022] [Accepted: 01/04/2023] [Indexed: 01/27/2023] Open

Abstract

Researchers aim to develop polygenic risk scores as a tool to prevent and more effectively treat serious diseases, disorders and conditions such as breast cancer, type 2 diabetes mellitus and coronary heart disease. Recently, machine learning techniques, in particular deep neural networks, have been increasingly developed to create polygenic risk scores using electronic health records as well as genomic and other health data. While the use of artificial intelligence for polygenic risk scores may enable greater accuracy, performance and prediction, it also presents a range of increasingly complex ethical challenges. The ethical and social issues of many polygenic risk score applications in medicine have been widely discussed. However, in the literature and in practice, the ethical implications of their confluence with the use of artificial intelligence have not yet been sufficiently considered. Based on a comprehensive review of the existing literature, we argue that this stands in need of urgent consideration for research and subsequent translation into the clinical setting. Considering the many ethical layers involved, we will first give a brief overview of the development of artificial intelligence-driven polygenic risk scores, associated ethical and social implications, challenges in artificial intelligence ethics, and finally, explore potential complexities of polygenic risk scores driven by artificial intelligence. We point out emerging complexity regarding fairness, challenges in building trust, explaining and understanding artificial intelligence and polygenic risk scores as well as regulatory uncertainties and further challenges. We strongly advocate taking a proactive approach to embedding ethics in research and implementation processes for polygenic risk scores driven by artificial intelligence.

Hernandez M, Epelde G, Alberdi A, Cilla R, Rankin D. Synthetic Tabular Data Evaluation in the Health Domain Covering Resemblance, Utility, and Privacy Dimensions. Methods Inf Med 2023. [PMID: 36623830 DOI: 10.1055/s-0042-1760247] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]

Abstract

BACKGROUND

Synthetic tabular data generation is a potentially valuable technology with great promise for data augmentation and privacy preservation. However, prior to adoption, an empirical assessment of generated synthetic tabular data is required across dimensions relevant to the target application to determine its efficacy. A lack of standardized and objective evaluation and benchmarking strategy for synthetic tabular data in the health domain has been found in the literature.

OBJECTIVE

The aim of this paper is to identify key dimensions, per dimension metrics, and methods for evaluating synthetic tabular data generated with different techniques and configurations for health domain application development and to provide a strategy to orchestrate them.

METHODS

Based on the literature, the resemblance, utility, and privacy dimensions have been prioritized, and a collection of metrics and methods for their evaluation are orchestrated into a complete evaluation pipeline. This way, a guided and comparative assessment of generated synthetic tabular data can be done, categorizing its quality into three categories ("Excellent," "Good," and "Poor"). Six health care-related datasets and four synthetic tabular data generation approaches have been chosen to conduct an analysis and evaluation to verify the utility of the proposed evaluation pipeline.

RESULTS

The synthetic tabular data generated with the four selected approaches has maintained resemblance, utility, and privacy for most combinations of dataset and synthetic tabular data generation approach. In several datasets, some approaches have outperformed others, while in other datasets, more than one approach has yielded the same performance.

CONCLUSION

The results have shown that the proposed pipeline can effectively be used to evaluate and benchmark the synthetic tabular data generated by various synthetic tabular data generation approaches. Therefore, this pipeline can support the scientific community in selecting the most suitable synthetic tabular data generation approaches for their data and application of interest.

Ge S, Liu B, Wang P, Li Y, Zeng D. Learning Privacy-Preserving Student Networks via Discriminative-Generative Distillation. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022;PP:116-127. [PMID: 37015525 DOI: 10.1109/tip.2022.3226416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]

Rajotte JF, Bergen R, Buckeridge DL, El Emam K, Ng R, Strome E. Synthetic data as an enabler for machine learning applications in medicine. iScience 2022;25:105331. [PMID: 36325058 PMCID: PMC9619172 DOI: 10.1016/j.isci.2022.105331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/31/2022] Open

Sakthivel RK, Nagasubramanian G, Sankayya M, Al-Turjman F. Multilingual News Feed Analysis Using Intelligent Linguistic Particle Filtering Techniques. ACM T ASIAN LOW-RESO 2022. [DOI: 10.1145/3569899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]

El Emam K, Mosquera L, Fang X. Validating a membership disclosure metric for synthetic health data. JAMIA Open 2022;5:ooac083. [PMID: 36238080 PMCID: PMC9553223 DOI: 10.1093/jamiaopen/ooac083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2022] [Revised: 09/13/2022] [Accepted: 09/22/2022] [Indexed: 11/24/2022] Open

Shi J, Wang D, Tesei G, Norgeot B. Generating high-fidelity privacy-conscious synthetic patient data for causal effect estimation with multiple treatments. Front Artif Intell 2022;5:918813. [PMID: 36187323 PMCID: PMC9515575 DOI: 10.3389/frai.2022.918813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2022] [Accepted: 08/15/2022] [Indexed: 12/03/2022] Open

Abstract

In the past decade, there has been exponentially growing interest in the use of observational data collected as a part of routine healthcare practice to determine the effect of a treatment with causal inference models. Validation of these models, however, has been a challenge because the ground truth is unknown: only one treatment-outcome pair for each person can be observed. There have been multiple efforts to fill this void using synthetic data where the ground truth can be generated. However, to date, these datasets have been severely limited in their utility either by being modeled after small non-representative patient populations, being dissimilar to real target populations, or only providing known effects for two cohorts (treated vs. control). In this work, we produced a large-scale and realistic synthetic dataset that provides ground truth effects for over 10 hypertension treatments on blood pressure outcomes. The synthetic dataset was created by modeling a nationwide cohort of more than 580,000 hypertension patient data including each person's multi-year history of diagnoses, medications, and laboratory values. We designed a data generation process by combining an adapted ADS-GAN model for fictitious patient information generation and a neural network for treatment outcome generation. Wasserstein distance of 0.35 demonstrates that our synthetic data follows a nearly identical joint distribution to the patient cohort used to generate the data. Patient privacy was a primary concern for this study; the ϵ-identifiability metric, which estimates the probability of actual patients being identified, is 0.008%, ensuring that our synthetic data cannot be used to identify any actual patients. To demonstrate its usage, we tested the bias in causal effect estimation of four well-established models using this dataset. The approach we used can be readily extended to other types of diseases in the clinical domain, and to datasets in other domains as well.

Couckuyt A, Seurinck R, Emmaneel A, Quintelier K, Novak D, Van Gassen S, Saeys Y. Challenges in translational machine learning. Hum Genet 2022;141:1451-1466. [PMID: 35246744 PMCID: PMC8896412 DOI: 10.1007/s00439-022-02439-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2021] [Accepted: 02/08/2022] [Indexed: 11/25/2022]

Generation of realistic synthetic data using Multimodal Neural Ordinary Differential Equations. NPJ Digit Med 2022;5:122. [PMID: 35986075 PMCID: PMC9391444 DOI: 10.1038/s41746-022-00666-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2021] [Accepted: 07/25/2022] [Indexed: 11/11/2022] Open

Hernandez M, Epelde G, Alberdi A, Cilla R, Rankin D. Synthetic data generation for tabular health records: A systematic review. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.053] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]

Generating Higher-Fidelity Synthetic Datasets with Privacy Guarantees. ALGORITHMS 2022. [DOI: 10.3390/a15070232] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]

ZenoPS: A Distributed Learning System Integrating Communication Efficiency and Security. ALGORITHMS 2022. [DOI: 10.3390/a15070233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Coyner AS, Chen JS, Chang K, Singh P, Ostmo S, Chan RVP, Chiang MF, Kalpathy-Cramer J, Campbell JP. Synthetic Medical Images for Robust, Privacy-Preserving Training of Artificial Intelligence: Application to Retinopathy of Prematurity Diagnosis. OPHTHALMOLOGY SCIENCE 2022;2:100126. [PMID: 36249693 PMCID: PMC9560638 DOI: 10.1016/j.xops.2022.100126] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/24/2021] [Revised: 02/01/2022] [Accepted: 02/07/2022] [Indexed: 02/06/2023]

Abstract

Purpose

Developing robust artificial intelligence (AI) models for medical image analysis requires large quantities of diverse, well-chosen data that can prove challenging to collect because of privacy concerns, disease rarity, or diagnostic label quality. Collecting image-based datasets for retinopathy of prematurity (ROP), a potentially blinding disease, suffers from these challenges. Progressively growing generative adversarial networks (PGANs) may help, because they can synthesize highly realistic images that may increase both the size and diversity of medical datasets.

Design

Diagnostic validation study of convolutional neural networks (CNNs) for plus disease detection, a component of severe ROP, using synthetic data.

Participants

Five thousand eight hundred forty-two retinal fundus images (RFIs) collected from 963 preterm infants.

Methods

Retinal vessel maps (RVMs) were segmented from RFIs. PGANs were trained to synthesize RVMs with normal, pre-plus, or plus disease vasculature. Convolutional neural networks were trained, using real or synthetic RVMs, to detect plus disease from 2 real RVM test datasets.

Main Outcome Measures

Features of real and synthetic RVMs were evaluated using uniform manifold approximation and projection (UMAP). Similarities were evaluated at the dataset and feature level using Fréchet inception distance and Euclidean distance, respectively. CNN performance was assessed via area under the receiver operating characteristic curve (AUC); AUCs were compared via bootstrapping and Delong's test for correlated receiver operating characteristic curves. Confusion matrices were compared using McNemar's chi-square test and Cohen's κ value.

Results

The CNN trained on synthetic RVMs showed a significantly higher AUC (0.971; P = 0.006 and P = 0.004) and classified plus disease more similarly to a set of 8 international experts (κ = 0.922) than the CNN trained on real RVMs (AUC = 0.934; κ = 0.701). Real and synthetic RVMs overlapped, by plus disease diagnosis, on the UMAP manifold, showing that synthetic images spanned the disease severity spectrum. Fréchet inception distance and Euclidean distances suggested that real and synthetic RVMs were more dissimilar to one another than real RVMs were to one another, further suggesting that synthetic RVMs were distinct from the training data with respect to privacy considerations.

Conclusions

Synthetic datasets may be useful for training robust medical AI models. Furthermore, PGANs may be able to synthesize realistic data for use without protected health information concerns.

Affiliation(s)

Aaron S. Coyner Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
Jimmy S. Chen Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon Department of Ophthalmology, Shiley Eye Institute, University of California, San Diego, San Diego, California
Ken Chang Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts Center for Clinical Data Science, Massachusetts General Hospital and Boston Women’s Hospital, Boston, Massachusetts
Praveer Singh Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts Center for Clinical Data Science, Massachusetts General Hospital and Boston Women’s Hospital, Boston, Massachusetts
Susan Ostmo Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
R. V. Paul Chan Department of Ophthalmology and Visual Sciences, Eye and Ear Infirmary, University of Illinois, Chicago, Illinois
Michael F. Chiang National Eye Institute, National Institutes of Health, Bethesda, Maryland
Jayashree Kalpathy-Cramer Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts Center for Clinical Data Science, Massachusetts General Hospital and Boston Women’s Hospital, Boston, Massachusetts
J. Peter Campbell Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
Imaging and Informatics in Retinopathy of Prematurity Consortium† Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon Department of Ophthalmology, Shiley Eye Institute, University of California, San Diego, San Diego, California Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts Center for Clinical Data Science, Massachusetts General Hospital and Boston Women’s Hospital, Boston, Massachusetts Department of Ophthalmology and Visual Sciences, Eye and Ear Infirmary, University of Illinois, Chicago, Illinois National Eye Institute, National Institutes of Health, Bethesda, Maryland

Bonomi L, Fan L. Sharing Time-to-Event Data with Privacy Protection. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS 2022;2022:10.1109/ichi54592.2022.00014. [PMID: 36120417 PMCID: PMC9473343 DOI: 10.1109/ichi54592.2022.00014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]

Torkzadehmahani R, Nasirigerdeh R, Blumenthal DB, Kacprowski T, List M, Matschinske J, Spaeth J, Wenke NK, Baumbach J. Privacy-Preserving Artificial Intelligence Techniques in Biomedicine. Methods Inf Med 2022;61:e12-e27. [PMID: 35062032 PMCID: PMC9246509 DOI: 10.1055/s-0041-1740630] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2021] [Accepted: 09/18/2021] [Indexed: 12/15/2022]

Kokosi T, De Stavola B, Mitra R, Frayling L, Doherty A, Dove I, Sonnenberg P, Harron K. An overview of synthetic administrative data for research. Int J Popul Data Sci 2022;7:1727. [PMID: 37650026 PMCID: PMC10464868 DOI: 10.23889/ijpds.v7i1.1727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022] Open

Thomas JA, Foraker RE, Zamstein N, Morrow JD, Payne PRO, Wilcox AB. Demonstrating an approach for evaluating synthetic geospatial and temporal epidemiologic data utility: results from analyzing >1.8 million SARS-CoV-2 tests in the United States National COVID Cohort Collaborative (N3C). J Am Med Inform Assoc 2022;29:1350-1365. [PMID: 35357487 PMCID: PMC8992357 DOI: 10.1093/jamia/ocac045] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2021] [Revised: 03/11/2022] [Accepted: 03/28/2022] [Indexed: 11/16/2022] Open

Hartebrodt A, Röttger R. Federated horizontally partitioned principal component analysis for biomedical applications. BIOINFORMATICS ADVANCES 2022;2:vbac026. [PMID: 36699354 PMCID: PMC9710634 DOI: 10.1093/bioadv/vbac026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/13/2021] [Revised: 04/07/2022] [Indexed: 01/28/2023]

El Emam K, Mosquera L, Fang X, El-Hussuna A. Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study. JMIR Med Inform 2022;10:e35734. [PMID: 35389366 PMCID: PMC9030990 DOI: 10.2196/35734] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2021] [Revised: 01/27/2022] [Accepted: 02/13/2022] [Indexed: 01/06/2023] Open

Bonomi L, Wu Z, Fan L. Sharing personal ECG time-series data privately. J Am Med Inform Assoc 2022;29:1152-1160. [PMID: 35380666 PMCID: PMC9196703 DOI: 10.1093/jamia/ocac047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2021] [Revised: 03/16/2022] [Accepted: 03/31/2022] [Indexed: 11/13/2022] Open

Abstract

Objective

Emerging technologies (eg, wearable devices) have made it possible to collect data directly from individuals (eg, time-series), providing new insights on the health and well-being of individual patients. Broadening the access to these data would facilitate the integration with existing data sources (eg, clinical and genomic data) and advance medical research. Compared to traditional health data, these data are collected directly from individuals, are highly unique and provide fine-grained information, posing new privacy challenges. In this work, we study the applicability of a novel privacy model to enable individual-level time-series data sharing while maintaining the usability for data analytics.

Methods and materials

We propose a privacy-protecting method for sharing individual-level electrocardiography (ECG) time-series data, which leverages dimensional reduction technique and random sampling to achieve provable privacy protection. We show that our solution provides strong privacy protection against an informed adversarial model while enabling useful aggregate-level analysis.

Results

We conduct our evaluations on 2 real-world ECG datasets. Our empirical results show that the privacy risk is significantly reduced after sanitization while the data usability is retained for a variety of clinical tasks (eg, predictive modeling and clustering).

Discussion

Our study investigates the privacy risk in sharing individual-level ECG time-series data. We demonstrate that individual-level data can be highly unique, requiring new privacy solutions to protect data contributors.

Conclusion

The results suggest our proposed privacy-protection method provides strong privacy protections while preserving the usefulness of the data.

Liu H, Peng C, Tian Y, Long S, Tian F, Wu Z. GDP vs. LDP: A Survey from the Perspective of Information-Theoretic Channel. ENTROPY (BASEL, SWITZERLAND) 2022;24:430. [PMID: 35327940 PMCID: PMC8953244 DOI: 10.3390/e24030430] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 02/06/2022] [Revised: 03/02/2022] [Accepted: 03/17/2022] [Indexed: 11/30/2022]

Incorporation of Synthetic Data Generation Techniques within a Controlled Data Processing Workflow in the Health and Wellbeing Domain. ELECTRONICS 2022. [DOI: 10.3390/electronics11050812] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]

Dong J, Roth A, Su WJ. Authors’ reply to the Discussion of ‘Gaussian Differential Privacy’ by Dong et al. J R Stat Soc Series B Stat Methodol 2022. [DOI: 10.1111/rssb.12463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]

Artificial Intelligence and Cardiovascular Genetics. Life (Basel) 2022;12:life12020279. [PMID: 35207566 PMCID: PMC8875522 DOI: 10.3390/life12020279] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2021] [Revised: 01/26/2022] [Accepted: 02/09/2022] [Indexed: 12/13/2022] Open

Artificial Intelligence and Hypertension Management. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]

Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022;23:40-55. [PMID: 34518686 DOI: 10.1038/s41580-021-00407-0] [Citation(s) in RCA: 556] [Impact Index Per Article: 278.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/23/2021] [Indexed: 02/08/2023]

Zhang Z, Yan C, Malin BA. Membership inference attacks against synthetic health data. J Biomed Inform 2022;125:103977. [PMID: 34920126 PMCID: PMC8766950 DOI: 10.1016/j.jbi.2021.103977] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2021] [Revised: 11/17/2021] [Accepted: 12/08/2021] [Indexed: 01/03/2023]

Abstract

Synthetic data generation has emerged as a promising method to protect patient privacy while sharing individual-level health data. Intuitively, sharing synthetic data should reduce disclosure risks because no explicit linkage is retained between the synthetic records and the real data upon which it is based. However, the risks associated with synthetic data are still evolving, and what seems protected today may not be tomorrow. In this paper, we show that membership inference attacks, whereby an adversary infers if the data from certain target individuals (known to the adversary a priori) were relied upon by the synthetic data generation process, can be substantially enhanced through state-of-the-art machine learning frameworks, which calls into question the protective nature of existing synthetic data generators. Specifically, we formulate the membership inference problem from the perspective of the data holder, who aims to perform a disclosure risk assessment prior to sharing any health data. To support such an assessment, we introduce a framework for effective membership inference against synthetic health data without specific assumptions about the generative model or a well-defined data structure, leveraging the principles of contrastive representation learning. To illustrate the potential for such an attack, we conducted experiments against synthesis approaches using two datasets derived from several health data resources (Vanderbilt University Medical Center, the All of Us Research Program) to determine the upper bound of risk brought by an adversary who invokes an optimal strategy. The results indicate that partially synthetic data are vulnerable to membership inference at a very high rate. By contrast, fully synthetic data are only marginally susceptible and, in most cases, could be deemed sufficiently protected from membership inference.

Wan Z, Vorobeychik Y, Xia W, Liu Y, Wooders M, Guo J, Yin Z, Clayton EW, Kantarcioglu M, Malin BA. Using game theory to thwart multistage privacy intrusions when sharing data. SCIENCE ADVANCES 2021;7:eabe9986. [PMID: 34890225 PMCID: PMC8664254 DOI: 10.1126/sciadv.abe9986] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/14/2020] [Accepted: 10/25/2021] [Indexed: 06/13/2023]

Affiliation(s)

Zhiyu Wan Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
Yevgeniy Vorobeychik Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
Weiyi Xia Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
Yongtai Liu Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
Myrna Wooders Department of Economics, Vanderbilt University, Nashville, TN 37235, USA
Jia Guo Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
Zhijun Yin Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
Ellen Wright Clayton Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, TN 37203, USA School of Law, Vanderbilt University, Nashville, TN 37203, USA Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
Murat Kantarcioglu Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA Institute for Quantitative Social Science, Harvard University, Cambridge, MA 02138, USA Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA
Bradley A. Malin Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA

Chen D, Cheung SCS, Chuah CN, Ozonoff S. Differentially Private Generative Adversarial Networks with Model Inversion. PROCEEDINGS OF THE ... IEEE INTERNATIONAL WORKSHOP ON INFORMATION FORENSICS AND SECURITY. IEEE INTERNATIONAL WORKSHOP ON INFORMATION FORENSICS AND SECURITY 2021;2021:10.1109/wifs53200.2021.9648378. [PMID: 35517057 PMCID: PMC9070036 DOI: 10.1109/wifs53200.2021.9648378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]