Reference Citation Analysis

For: Baowaly MK, Lin CC, Liu CL, Chen KT. Synthesizing electronic health records using improved generative adversarial networks. J Am Med Inform Assoc 2019;26:228-241. [PMID: 30535151 PMCID: PMC7647178 DOI: 10.1093/jamia/ocy142] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 03/22/2018] [Revised: 09/21/2018] [Accepted: 10/24/2018] [Indexed: 11/14/2022]

Cited by Other Article(s)

Sheng B, Pushpanathan K, Guan Z, Lim QH, Lim ZW, Yew SME, Goh JHL, Bee YM, Sabanayagam C, Sevdalis N, Lim CC, Lim CT, Shaw J, Jia W, Ekinci EI, Simó R, Lim LL, Li H, Tham YC. Artificial intelligence for diabetes care: current and future prospects. Lancet Diabetes Endocrinol 2024;12:569-595. [PMID: 39054035 DOI: 10.1016/s2213-8587(24)00154-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 03/28/2024] [Accepted: 05/16/2024] [Indexed: 07/27/2024]

Affiliation(s)

Bin Sheng Shanghai Belt and Road International Joint Laboratory for Intelligent Prevention and Treatment of Metabolic Disorders, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China; Key Laboratory of Artificial Intelligence, Ministry of Education, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
Krithi Pushpanathan Centre of Innovation and Precision Eye Health, Department of Ophthalmology, National University of Singapore, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore
Zhouyu Guan Shanghai Belt and Road International Joint Laboratory for Intelligent Prevention and Treatment of Metabolic Disorders, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
Quan Hziung Lim Department of Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
Zhi Wei Lim Yong Loo Lin School of Medicine, National University of Singapore, Singapore
Samantha Min Er Yew Centre of Innovation and Precision Eye Health, Department of Ophthalmology, National University of Singapore, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore
Jocelyn Hui Lin Goh Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
Yong Mong Bee Department of Endocrinology, Singapore General Hospital, Singapore; SingHealth Duke-National University of Singapore Diabetes Centre, Singapore Health Services, Singapore
Charumathi Sabanayagam Ophthalmology and Visual Sciences Academic Clinical Program, Duke-National University of Singapore Medical School, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
Nick Sevdalis Centre for Behavioural and Implementation Science Interventions, National University of Singapore, Singapore
Cynthia Ciwei Lim Department of Renal Medicine, Singapore General Hospital, Singapore
Chwee Teck Lim Department of Biomedical Engineering, National University of Singapore, Singapore; Institute for Health Innovation and Technology, National University of Singapore, Singapore; Mechanobiology Institute, National University of Singapore, Singapore
Jonathan Shaw Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
Weiping Jia Shanghai Belt and Road International Joint Laboratory for Intelligent Prevention and Treatment of Metabolic Disorders, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
Elif Ilhan Ekinci Australian Centre for Accelerating Diabetes Innovations, Melbourne Medical School and Department of Medicine, University of Melbourne, Melbourne, VIC, Australia; Department of Endocrinology, Austin Health, Melbourne, VIC, Australia
Rafael Simó Diabetes and Metabolism Research Unit, Vall d'Hebron University Hospital and Vall d'Hebron Research Institute, Barcelona, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas, Instituto de Salud Carlos III, Madrid, Spain
Lee-Ling Lim Department of Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia; Department of Medicine and Therapeutics, Chinese University of Hong Kong, Hong Kong Special Administrative Region, China; Asia Diabetes Foundation, Hong Kong Special Administrative Region, China
Huating Li Shanghai Belt and Road International Joint Laboratory for Intelligent Prevention and Treatment of Metabolic Disorders, Department of Computer Science and Engineering, School of Electronic, Information, and Electrical Engineering, Shanghai Jiao Tong University, Department of Endocrinology and Metabolism, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China
Yih-Chung Tham Centre of Innovation and Precision Eye Health, Department of Ophthalmology, National University of Singapore, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Ophthalmology and Visual Sciences Academic Clinical Program, Duke-National University of Singapore Medical School, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore

Vasdev N, Gupta T, Pawar B, Bain A, Tekade RK. Navigating the future of health care with AI-driven digital therapeutics. Drug Discov Today 2024:104110. [PMID: 39034025 DOI: 10.1016/j.drudis.2024.104110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 07/01/2024] [Accepted: 07/16/2024] [Indexed: 07/23/2024]

Borisov V, Leemann T, Seßler K, Haug J, Pawelczyk M, Kasneci G. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans Neural Netw Learn Syst 2024;35:7499-7519. [PMID: 37015381 DOI: 10.1109/tnnls.2022.3229161] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Indexed: 06/19/2023]

Abstract

Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous datasets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their adaptation to tabular data for inference or data generation tasks remains highly challenging. To facilitate further progress in the field, this work provides an overview of state-of-the-art deep learning methods for tabular data. We categorize these methods into three groups: data transformations, specialized architectures, and regularization models. For each of these groups, our work offers a comprehensive overview of the main approaches. Moreover, we discuss deep learning approaches for generating tabular data and also provide an overview of strategies for explaining deep models on tabular data. Thus, our first contribution is to address the main research streams and existing methodologies in the mentioned areas while highlighting relevant challenges and open research questions. Our second contribution is to provide an empirical comparison of traditional machine learning methods with 11 deep learning approaches across five popular real-world tabular datasets of different sizes and with different learning objectives. Our results, which we have made publicly available as competitive benchmarks, indicate that algorithms based on gradient-boosted tree ensembles still mostly outperform deep learning models on supervised learning tasks, suggesting that research progress on competitive deep learning models for tabular data is stagnating. To the best of our knowledge, this is the first in-depth overview of deep learning approaches for tabular data; as such, this work can serve as a valuable starting point to guide researchers and practitioners interested in deep learning with tabular data.
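The abstract names data transformations as one family of methods for feeding heterogeneous tabular data to deep networks. As a rough illustration only (a common baseline, not a method taken from the survey), categorical columns can be one-hot encoded and numeric columns z-scored so that each row becomes a fixed-length numeric vector. The function name `encode_mixed_rows` and the list-of-dicts input format are assumptions for this sketch.

```python
from statistics import mean, pstdev


def encode_mixed_rows(rows, categorical, numeric):
    """One-hot encode categorical columns and z-score numeric columns.

    Hypothetical helper for illustration. `rows` is a list of dicts mapping
    column name -> value. Returns (feature_names, encoded_rows), where each
    encoded row is a list of floats.
    """
    # Sorted vocabulary of each categorical column (one output feature per value).
    vocab = {c: sorted({str(r[c]) for r in rows}) for c in categorical}

    # Mean and population standard deviation per numeric column;
    # a zero spread is replaced by 1.0 to avoid division by zero.
    stats = {}
    for c in numeric:
        values = [float(r[c]) for r in rows]
        sd = pstdev(values)
        stats[c] = (mean(values), sd if sd > 0 else 1.0)

    names = [f"{c}={v}" for c in categorical for v in vocab[c]] + list(numeric)
    encoded = []
    for r in rows:
        vec = [1.0 if str(r[c]) == v else 0.0 for c in categorical for v in vocab[c]]
        vec += [(float(r[c]) - stats[c][0]) / stats[c][1] for c in numeric]
        encoded.append(vec)
    return names, encoded
```

For example, the rows `{"sex": "F", "age": 40}` and `{"sex": "M", "age": 60}` become the features `sex=F`, `sex=M`, and `age`, with `age` standardized to -1.0 and 1.0. The statistics here come from the rows passed in; in practice they would be fitted on training data only and reused for test data.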

Chandra S, Prakash PKS, Samanta S, Chilukuri S. ClinicalGAN: powering patient monitoring in clinical trials with patient digital twins. Sci Rep 2024;14:12236. [PMID: 38806536 PMCID: PMC11133486 DOI: 10.1038/s41598-024-62567-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 05/19/2024] [Indexed: 05/30/2024]

Abstract

Conducting clinical trials has become increasingly challenging due to spiraling costs, increased time to market, and high failure rates. Patient recruitment and retention are among the key challenges, directly affecting 90% of trials. While much attention has been given to optimizing patient recruitment, limited progress has been made towards comprehensive clinical trial monitoring systems that identify patients at risk and potentially improve patient retention through the right intervention at the right time. Earlier research on patient retention primarily used deterministic frameworks to model the inherently stochastic patient journey. Existing generative approaches for temporal data, such as TimeGAN or CRBM, fail to address key requirements for modeling a patient digital twin, such as personalized generation, variable patient journeys, and multivariate time series. In response to these challenges, this research proposes ClinicalGAN to enable patient-level generation, effectively creating a patient's digital twin. ClinicalGAN provides capabilities for: (a) patient-level personalized generation by utilizing patient metadata for conditional generation; (b) dynamic termination prediction to enable proactive patient monitoring for improved patient retention; and (c) multivariate time-series training to incorporate relationships and dependencies among the different test measures captured during the patient journey. The proposed solution is validated on two Alzheimer's clinical trial datasets, and the results are benchmarked across multiple dimensions of generation quality. Empirical results demonstrate that ClinicalGAN outperforms the SOTA approach by 3-4× on average across all generation quality metrics. Furthermore, the proposed architecture significantly outperforms predictive methods at drop-off prediction (5-10% MAPE scores).
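MAPE (mean absolute percentage error), which the abstract uses to report drop-off prediction, is conventionally the mean of |actual - predicted| / |actual|, expressed as a percentage. The sketch below implements that standard definition only; the abstract does not say how ClinicalGAN's evaluation computed or aggregated it, so the hypothetical `mape` function is illustrative.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (standard definition).

    MAPE is undefined when an actual value is zero, so such inputs are rejected.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

For instance, predictions of 90 and 220 against actual values of 100 and 200 are each off by 10%, giving a MAPE of about 10. Because each error is scaled by its own actual value, MAPE weights misses on small quantities more heavily than the same absolute miss on large ones.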

Vallevik VB, Babic A, Marshall SE, Elvatun S, Brøgger HMB, Alagaratnam S, Edwin B, Veeraragavan NR, Befring AK, Nygård JF. Can I trust my fake data - A comprehensive quality assessment framework for synthetic tabular data in healthcare. Int J Med Inform 2024;185:105413. [PMID: 38493547 DOI: 10.1016/j.ijmedinf.2024.105413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 02/17/2024] [Accepted: 03/11/2024] [Indexed: 03/19/2024]

Abstract

BACKGROUND

Ensuring safe adoption of AI tools in healthcare hinges on access to sufficient data for training, testing and validation. Synthetic data has been suggested in response to privacy concerns and regulatory requirements and can be created by training a generator on real data to produce a dataset with similar statistical properties. Competing metrics with differing taxonomies for quality evaluation have been proposed, resulting in a complex landscape. Optimising quality entails balancing considerations that make the data fit for use, yet relevant dimensions are left out of existing frameworks.

METHOD

We performed a comprehensive literature review on the use of quality evaluation metrics on synthetic data within the scope of synthetic tabular healthcare data using deep generative methods. Based on this and the collective team experiences, we developed a conceptual framework for quality assurance. The applicability was benchmarked against a practical case from the Dutch National Cancer Registry.

CONCLUSION

We present a conceptual framework for quality assurance of synthetic data for AI applications in healthcare that aligns diverging taxonomies, expands on common quality dimensions to include the dimensions of Fairness and Carbon footprint, and proposes stages necessary to support real-life applications. Building trust in synthetic data by increasing transparency and reducing the safety risk will accelerate the development and uptake of trustworthy AI tools for the benefit of patients.

DISCUSSION

Despite the growing emphasis on algorithmic fairness and carbon footprint, these metrics were scarce in the reviewed literature. The overwhelming focus was on statistical similarity using distance metrics, while sequential logic detection was scarce. A consensus-backed framework that includes all relevant quality dimensions can provide assurance for safe and responsible real-life applications of synthetic data. As the choice of appropriate metrics is highly context-dependent, further validation studies are needed to guide metric choices and support the development of technical standards.
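Statistical similarity via distance metrics, which the review found to be the dominant focus, is often measured column by column. One such metric is the Wasserstein-1 (earth mover's) distance; for two samples of equal size it reduces to the mean absolute difference between their sorted values. The sketch below assumes equal-size numeric columns held in plain lists and uses hypothetical function names; the framework described above does not prescribe this particular metric.

```python
def wasserstein_1d(real, synthetic):
    """Wasserstein-1 distance between two equal-size 1-D numeric samples.

    With equal sample sizes, the optimal transport plan pairs the i-th smallest
    real value with the i-th smallest synthetic value.
    """
    if len(real) != len(synthetic) or not real:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(real), sorted(synthetic))
    return sum(abs(r - s) for r, s in pairs) / len(real)


def column_distances(real_table, synthetic_table):
    """Per-column distances for tables given as {column_name: list_of_values}."""
    return {
        name: wasserstein_1d(values, synthetic_table[name])
        for name, values in real_table.items()
    }
```

A distance of 0 means the two marginal distributions are identical. Note that this per-column view says nothing about relationships between columns or sequential logic, which is one reason the review argues for additional quality dimensions.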

Carini C, Seyhan AA. Tribulations and future opportunities for artificial intelligence in precision medicine. J Transl Med 2024;22:411. [PMID: 38702711 PMCID: PMC11069149 DOI: 10.1186/s12967-024-05067-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 03/05/2024] [Indexed: 05/06/2024]

Abstract

Upon a diagnosis, the clinical team faces two main questions: what treatment, and at what dose? Clinical trial results provide the basis for the guidance and official protocols on which clinicians base their decisions. However, individuals do not consistently show the response reported in the relevant clinical trials. Decision complexity increases with combination treatments, in which drugs administered together can interact with each other, as is often the case. Additionally, an individual's response to treatment varies as their condition changes. In practice, drug and dose selection depend significantly on the medical protocol and the medical team's experience; as such, the results are inherently varied and often suboptimal. Big data and Artificial Intelligence (AI) approaches have emerged as excellent decision-making tools, but multiple challenges limit their application. AI is a rapidly evolving and dynamic field with the potential to revolutionize various aspects of human life, and it has become increasingly crucial in drug discovery and development. AI enhances decision-making across different disciplines, such as medicinal chemistry, molecular and cell biology, pharmacology, pathology, and clinical practice, and it also contributes to patient population selection and stratification. The need for AI in healthcare is evident, as it helps improve data accuracy and ensure the quality of care necessary for effective patient treatment. AI is pivotal in improving success rates in clinical practice, and its increasing significance in drug discovery, development, and clinical trials is underscored by many scientific publications. Despite the numerous advantages of AI, such as enhancing and advancing Precision Medicine (PM) and remote patient monitoring, unlocking its full potential in healthcare requires addressing fundamental concerns.
These concerns include data quality, the lack of well-annotated large datasets, data privacy and safety issues, biases in AI algorithms, legal and ethical challenges, and obstacles related to cost and implementation. Nevertheless, integrating AI in clinical medicine will improve diagnostic accuracy and treatment outcomes, contribute to more efficient healthcare delivery, reduce costs, and facilitate better patient experiences, making healthcare more sustainable. This article reviews AI applications in drug development and clinical practice and highlights concerns and limitations in applying AI.

Rahman MA, Victoros E, Ernest J, Davis R, Shanjana Y, Islam MR. Impact of Artificial Intelligence (AI) Technology in Healthcare Sector: A Critical Evaluation of Both Sides of the Coin. Clin Pathol 2024;17:2632010X241226887. [PMID: 38264676 PMCID: PMC10804900 DOI: 10.1177/2632010x241226887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 12/27/2023] [Indexed: 01/25/2024]

Gwon H, Ahn I, Kim Y, Kang HJ, Seo H, Choi H, Cho HN, Kim M, Han J, Kee G, Park S, Lee KH, Jun TJ, Kim YH. LDP-GAN: Generative adversarial networks with local differential privacy for patient medical records synthesis. Comput Biol Med 2024;168:107738. [PMID: 37995536 DOI: 10.1016/j.compbiomed.2023.107738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 10/31/2023] [Accepted: 11/16/2023] [Indexed: 11/25/2023]

Affiliation(s)

Hansle Gwon Department of Information Medicine, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
Imjin Ahn Department of Information Medicine, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
Yunha Kim Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
Hee Jun Kang Division of Cardiology, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
Hyeram Seo Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
Heejung Choi Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
Ha Na Cho Department of Information Medicine, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
Minkyoung Kim Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
JiYe Han Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
Gaeun Kee Department of Information Medicine, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
Seohyun Park Department of Information Medicine, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
Kye Hwa Lee Department of Information Medicine, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
Tae Joon Jun Big Data Research Center, Asan Institute for Life Sciences, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea
Young-Hak Kim Division of Cardiology, Department of Information Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea

Kang HYJ, Batbaatar E, Choi DW, Choi KS, Ko M, Ryu KS. Synthetic Tabular Data Based on Generative Adversarial Networks in Health Care: Generation and Validation Using the Divide-and-Conquer Strategy. JMIR Med Inform 2023;11:e47859. [PMID: 37999942 DOI: 10.2196/47859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 08/02/2023] [Accepted: 10/28/2023] [Indexed: 11/25/2023]

Abstract

BACKGROUND

Synthetic data generation (SDG) based on generative adversarial networks (GANs) is used in health care, but preserving logical relationships among the data in synthetic tabular data (STD) remains challenging. Filtering methods for SDG can lead to the loss of important information.

OBJECTIVE

This study proposed a divide-and-conquer (DC) method to generate STD based on the GAN algorithm, while preserving data with logical relationships.

METHODS

The proposed method was evaluated on data from the Korea Association for Lung Cancer Registry (KALC-R) and 2 benchmark data sets (breast cancer and diabetes). The DC-based SDG strategy comprises 3 steps: (1) we used 2 different partitioning methods (the class-specific criterion distinguished between survival and death groups, while the Cramér V criterion identified the highest correlation between columns in the original data); (2) the entire data set was divided into a number of subsets, which were then used as input for the conditional tabular generative adversarial network and the copula generative adversarial network to generate synthetic data; and (3) the generated synthetic data were consolidated into a single entity. For validation, we compared DC-based SDG and conditional sampling (CS)-based SDG through the performances of machine learning models. In addition, we generated imbalanced and balanced synthetic data for each of the 3 data sets and compared their performance using 4 classifiers: decision tree (DT), random forest (RF), Extreme Gradient Boosting (XGBoost), and light gradient-boosting machine (LGBM) models.

RESULTS

The synthetic data of the 3 diseases (non-small cell lung cancer [NSCLC], breast cancer, and diabetes) generated by our proposed model yielded better performance across all 4 classifiers (DT, RF, XGBoost, and LGBM). The CS- versus DC-based model performances were compared using the mean area under the curve (SD) values: 74.87 (SD 0.77) versus 63.87 (SD 2.02) for NSCLC, 73.31 (SD 1.11) versus 67.96 (SD 2.15) for breast cancer, and 61.57 (SD 0.09) versus 60.08 (SD 0.17) for diabetes (DT); 85.61 (SD 0.29) versus 79.01 (SD 1.20) for NSCLC, 78.05 (SD 1.59) versus 73.48 (SD 4.73) for breast cancer, and 59.98 (SD 0.24) versus 58.55 (SD 0.17) for diabetes (RF); 85.20 (SD 0.82) versus 76.42 (SD 0.93) for NSCLC, 77.86 (SD 2.27) versus 68.32 (SD 2.37) for breast cancer, and 60.18 (SD 0.20) versus 58.98 (SD 0.29) for diabetes (XGBoost); and 85.14 (SD 0.77) versus 77.62 (SD 1.85) for NSCLC, 78.16 (SD 1.52) versus 70.02 (SD 2.17) for breast cancer, and 61.75 (SD 0.13) versus 61.12 (SD 0.23) for diabetes (LGBM). In addition, we found that balanced synthetic data performed better.

CONCLUSIONS

This study is the first attempt to generate and validate STD based on a DC approach and shows improved performance using STD. The necessity for balanced SDG was also demonstrated.
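The three DC steps in the METHODS section (partition, synthesize each subset, consolidate) and the Cramér V criterion can be sketched as follows. This is a minimal sketch under stated assumptions: rows are dicts of categorical values, Cramér's V is computed from the chi-squared statistic without bias correction, and `bootstrap_generator` is a hypothetical placeholder that merely resamples rows where the study used CTGAN and CopulaGAN. The abstract does not say exactly how the most-correlated column pair was turned into a split, so here the caller simply picks the split column.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
import random


def cramers_v(x, y):
    """Cramér's V between two categorical sequences (0 = no association, 1 = perfect)."""
    n = len(x)
    row_totals, col_totals = Counter(x), Counter(y)
    joint = Counter(zip(x, y))
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


def most_associated_pair(rows, columns):
    """Return the column pair with the highest Cramér's V."""
    return max(
        combinations(columns, 2),
        key=lambda p: cramers_v([r[p[0]] for r in rows], [r[p[1]] for r in rows]),
    )


def divide_and_conquer(rows, split_column, generate):
    """(1) Partition rows by split_column, (2) synthesize each subset, (3) consolidate."""
    subsets = {}
    for r in rows:
        subsets.setdefault(r[split_column], []).append(r)
    synthetic = []
    for subset in subsets.values():
        # Each subset is synthesized at its original size (imbalanced variant).
        synthetic.extend(generate(subset, len(subset)))
    return synthetic


def bootstrap_generator(seed=0):
    """Hypothetical placeholder generator: resamples rows instead of training a GAN."""
    rng = random.Random(seed)
    return lambda subset, n: [dict(rng.choice(subset)) for _ in range(n)]
```

Splitting on an outcome column mirrors the class-specific criterion (survival versus death groups). Because every subset here is synthesized at its original size, the class proportions of the real data carry over; a balanced variant would instead request the same number of rows from each subset.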

Bonomi L, Gousheh S, Fan L. Enabling Health Data Sharing with Fine-Grained Privacy. Proc ACM Int Conf Inf Knowl Manag 2023;2023:131-141. [PMID: 37906633 PMCID: PMC10601092 DOI: 10.1145/3583780.3614864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]

Peppes N, Tsakanikas P, Daskalakis E, Alexakis T, Adamopoulou E, Demestichas K. FoGGAN: Generating Realistic Parkinson's Disease Freezing of Gait Data Using GANs. Sensors (Basel) 2023;23:8158. [PMID: 37836988 PMCID: PMC10574838 DOI: 10.3390/s23198158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 09/23/2023] [Accepted: 09/27/2023] [Indexed: 10/15/2023]

Al Hadithy ZA, Al Lawati A, Al-Zadjali R, Al Sinawi H. Knowledge, Attitudes, and Perceptions of Artificial Intelligence in Healthcare Among Medical Students at Sultan Qaboos University. Cureus 2023;15:e44887. [PMID: 37814766 PMCID: PMC10560391 DOI: 10.7759/cureus.44887] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Accepted: 09/08/2023] [Indexed: 10/11/2023]

Theodorou B, Xiao C, Sun J. Synthesize high-dimensional longitudinal electronic health records via hierarchical autoregressive language model. Nat Commun 2023;14:5305. [PMID: 37652934 PMCID: PMC10471716 DOI: 10.1038/s41467-023-41093-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 08/23/2023] [Indexed: 09/02/2023]

Wolfien M, Ahmadi N, Fitzer K, Grummt S, Heine KL, Jung IC, Krefting D, Kühn A, Peng Y, Reinecke I, Scheel J, Schmidt T, Schmücker P, Schüttler C, Waltemath D, Zoch M, Sedlmayr M. Ten Topics to Get Started in Medical Informatics Research. J Med Internet Res 2023;25:e45948. [PMID: 37486754 PMCID: PMC10407648 DOI: 10.2196/45948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 03/29/2023] [Accepted: 04/11/2023] [Indexed: 07/25/2023]

Abstract

The vast and heterogeneous data constantly generated in clinics can provide great value for patients and research alike. The quickly evolving field of medical informatics research has contributed numerous concepts, algorithms, and standards to facilitate this development. However, the difficult relationships, complex terminologies, and multiple implementations can present obstacles for people who want to become active in the field. With a particular focus on medical informatics research conducted in Germany, we present in our Viewpoint a set of 10 important topics to improve the overall interdisciplinary communication between different stakeholders (eg, physicians, computational experts, experimentalists, students, patient representatives). This may lower the barriers to entry and offer a starting point for collaborations at different levels. The suggested topics are briefly introduced, then general best practice guidance is given, and further resources for in-depth reading or hands-on tutorials are recommended. In addition, the topics are set to cover current aspects and open research gaps of the medical informatics domain, including data regulations and concepts; data harmonization and processing; and data evaluation, visualization, and dissemination. Furthermore, we give an example of how these topics can be integrated into a medical informatics curriculum for higher education. By recognizing these topics, readers will be able to (1) set clinical and research data into the context of medical informatics, understanding what is possible to achieve with data or how data should be handled in terms of data privacy and storage; (2) distinguish current interoperability standards and obtain first insights into the processes leading to effective data transfer and analysis; and (3) value the use of newly developed technical approaches to utilize the full potential of clinical data.

Affiliation(s)

Markus Wolfien Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany; Center for Scalable Data Analytics and Artificial Intelligence, Dresden, Germany
Najia Ahmadi Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
Kai Fitzer Core Unit Data Integration Center, University Medicine Greifswald, Greifswald, Germany
Sophia Grummt Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
Kilian-Ludwig Heine Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
Ian-C Jung Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
Dagmar Krefting Department of Medical Informatics, University Medical Center, Goettingen, Germany
Andreas Kühn Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
Yuan Peng Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
Ines Reinecke Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
Julia Scheel Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
Tobias Schmidt Institute for Medical Informatics, University of Applied Sciences Mannheim, Mannheim, Germany
Paul Schmücker Institute for Medical Informatics, University of Applied Sciences Mannheim, Mannheim, Germany
Christina Schüttler Central Biobank Erlangen, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Dagmar Waltemath Core Unit Data Integration Center, University Medicine Greifswald, Greifswald, Germany; Department of Medical Informatics, University Medicine Greifswald, Greifswald, Germany
Michele Zoch Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
Martin Sedlmayr Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany; Center for Scalable Data Analytics and Artificial Intelligence, Dresden, Germany

Ghosheh GO, Thwaites CL, Zhu T. Synthesizing Electronic Health Records for Predictive Models in Low-Middle-Income Countries (LMICs). Biomedicines 2023;11:1749. [PMID: 37371844 PMCID: PMC10295936 DOI: 10.3390/biomedicines11061749] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/16/2023] [Revised: 06/12/2023] [Accepted: 06/15/2023] [Indexed: 06/29/2023]

Al Kuwaiti A, Nazer K, Al-Reedy A, Al-Shehri S, Al-Muhanna A, Subbarayalu AV, Al Muhanna D, Al-Muhanna FA. A Review of the Role of Artificial Intelligence in Healthcare. J Pers Med 2023;13:951. [PMID: 37373940 PMCID: PMC10301994 DOI: 10.3390/jpm13060951] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Received: 03/17/2023] [Revised: 05/11/2023] [Accepted: 05/12/2023] [Indexed: 06/29/2023]

Sun C, van Soest J, Dumontier M. Generating synthetic personal health data using conditional generative adversarial networks combining with differential privacy. J Biomed Inform 2023:104404. [PMID: 37268168 DOI: 10.1016/j.jbi.2023.104404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Revised: 04/25/2023] [Accepted: 05/21/2023] [Indexed: 06/04/2023]

Abstract

A large amount of personal health data that is highly valuable to the scientific community remains inaccessible, or requires a lengthy request process, because of privacy concerns and legal restrictions. Synthetic data has been studied and proposed as a promising alternative. However, generating realistic, privacy-preserving synthetic personal health data still poses challenges: simulating the characteristics of patients in minority classes, capturing the relations among variables in imbalanced data and transferring them to the synthetic data, and preserving individual patients' privacy. In this paper, we propose a differentially private conditional Generative Adversarial Network model (DP-CGANS) consisting of data transformation, sampling, conditioning, and network training to generate realistic and privacy-preserving personal data. Our model distinguishes categorical and continuous variables and transforms them into latent space separately for better training performance. We tackle the unique challenges of generating synthetic patient data that arise from the special characteristics of personal health data. For example, patients with a certain disease are typically a minority in the dataset, and the relations among variables must be preserved. Our model takes a conditional vector as an additional input to represent the minority class in the imbalanced data and to capture the dependencies between variables as fully as possible. Moreover, we inject statistical noise into the gradients during the network training process of DP-CGANS to provide a differential privacy guarantee. We extensively evaluate our model against state-of-the-art generative models on personal socio-economic datasets and real-world personal health datasets in terms of statistical similarity, machine learning performance, and privacy measurement. We demonstrate that our model outperforms other comparable models, especially in capturing the dependence between variables. Finally, we present the balance between data utility and privacy in synthetic data generation, considering the different data structures and characteristics of real-world personal health data such as imbalanced classes, abnormal distributions, and data sparsity.
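
The abstract states that DP-CGANS injects statistical noise into the gradients during training to obtain a differential privacy guarantee. A widely used way to do this is DP-SGD-style per-example gradient clipping followed by Gaussian noise. The sketch below shows one such step in plain Python under that assumption; it is not the DP-CGANS implementation, the names `clip`, `private_gradient`, `max_norm` and `noise_multiplier` are hypothetical, and the privacy accounting that turns the noise level into an (ε, δ) budget is omitted.

```python
import math
import random

def clip(grad, max_norm):
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def private_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum them, add Gaussian noise, then average.

    The noise standard deviation is noise_multiplier * max_norm, i.e. it is
    calibrated to the clipping bound (the per-example sensitivity).
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [n / len(per_example_grads) for n in noisy]

# Example: two per-example gradients, clipping bound 1.0, moderate noise.
rng = random.Random(0)
print(private_gradient([[3.0, 4.0], [0.3, 0.4]], 1.0, 1.1, rng))
```

Clipping bounds how much any single patient record can move the update, which is what lets the added noise mask individual contributions.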

Li J, Cairns BJ, Li J, Zhu T. Generating synthetic mixed-type longitudinal electronic health records for artificial intelligent applications. NPJ Digit Med 2023;6:98. [PMID: 37244963 DOI: 10.1038/s41746-023-00834-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/16/2022] [Accepted: 05/05/2023] [Indexed: 05/29/2023]

Nikolentzos G, Vazirgiannis M, Xypolopoulos C, Lingman M, Brandt EG. Synthetic electronic health records generated with variational graph autoencoders. NPJ Digit Med 2023;6:83. [PMID: 37120594 PMCID: PMC10148837 DOI: 10.1038/s41746-023-00822-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Accepted: 04/05/2023] [Indexed: 05/01/2023]

Davis SE, Ssemaganda H, Koola JD, Mao J, Westerman D, Speroff T, Govindarajulu US, Ramsay CR, Sedrakyan A, Ohno-Machado L, Resnic FS, Matheny ME. Simulating complex patient populations with hierarchical learning effects to support methods development for post-market surveillance. BMC Med Res Methodol 2023;23:89. [PMID: 37041457 PMCID: PMC10088292 DOI: 10.1186/s12874-023-01913-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 04/04/2023] [Indexed: 04/13/2023]

Abstract

BACKGROUND

Validating new algorithms, such as methods to disentangle intrinsic treatment risk from risk associated with experiential learning of novel treatments, often requires knowing the ground truth for data characteristics under investigation. Since the ground truth is inaccessible in real world data, simulation studies using synthetic datasets that mimic complex clinical environments are essential. We describe and evaluate a generalizable framework for injecting hierarchical learning effects within a robust data generation process that incorporates the magnitude of intrinsic risk and accounts for known critical elements in clinical data relationships.

METHODS

We present a multi-step data generating process with customizable options and flexible modules to support a variety of simulation requirements. Synthetic patients with nonlinear and correlated features are assigned to provider and institution case series. The probabilities of treatment and outcome assignment are associated with patient features according to user definitions. Risk due to experiential learning by providers and/or institutions when novel treatments are introduced is injected at various speeds and magnitudes. To further reflect real-world complexity, users can request missing values and omitted variables. We illustrate an implementation of our method in a case study using MIMIC-III data for reference patient feature distributions.

RESULTS

Realized data characteristics in the simulated data reflected the specified values. Apparent deviations in treatment effects and feature distributions, though not statistically significant, were most common in small datasets (n < 3000) and were attributable to random noise and to variability in estimating realized values in small samples. When learning effects were specified, synthetic datasets exhibited changes in the probability of an adverse outcome as cases accrued for the treatment group affected by learning, and stable probabilities as cases accrued for the treatment group not affected by learning.

CONCLUSIONS

Our framework extends clinical data simulation techniques beyond generation of patient features to incorporate hierarchical learning effects. This enables the complex simulation studies required to develop and rigorously test algorithms designed to disentangle treatment safety signals from the effects of experiential learning. By supporting such efforts, this work can help identify training opportunities, avoid unwarranted restriction of access to medical advances, and hasten treatment improvements.
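
The abstract describes injecting a learning effect of user-specified speed and magnitude, so that the adverse-outcome probability changes as cases accrue for the affected treatment group and stays flat for the unaffected one. The sketch below assumes an exponentially decaying excess risk; that functional form and the parameters `baseline`, `excess` and `speed` are illustrative assumptions, not the authors' specification.

```python
import math
import random

def adverse_probability(case_index, baseline, excess, speed):
    """Hypothetical learning curve: the extra risk decays as cases accrue.

    baseline -- intrinsic risk of the treatment (never learned away)
    excess   -- additional risk on the very first case (magnitude of learning)
    speed    -- decay rate per case (speed of learning)
    """
    return baseline + excess * math.exp(-speed * case_index)

def simulate_series(n_cases, baseline, excess, speed, rng):
    """Draw a 0/1 adverse outcome for each successive case of one provider."""
    return [
        1 if rng.random() < adverse_probability(i, baseline, excess, speed) else 0
        for i in range(n_cases)
    ]

rng = random.Random(42)
learning = simulate_series(200, baseline=0.05, excess=0.10, speed=0.05, rng=rng)
no_learning = simulate_series(200, baseline=0.05, excess=0.0, speed=0.05, rng=rng)
print(sum(learning[:50]), sum(learning[-50:]), sum(no_learning))
```

Setting `excess=0` reproduces the stable probability reported for the group not affected by learning, while `baseline` plays the role of the intrinsic treatment risk that a disentangling algorithm would need to recover.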

Affiliation(s)

Sharon E Davis Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 1475, Nashville, TN, 37203, USA
Henry Ssemaganda Comparative Effectiveness Research Institute, Lahey Hospital and Medical Center, 41 Mall Road, Burlington, MA, 01803, USA
Jejo D Koola UC Health Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Dr. MC 0728, La Jolla, San Diego, CA, 92093-0728, USA
Jialin Mao Department of Population Health Sciences, Weill Cornell Medicine, 1300 York Avenue, New York, NY, 10065, USA
Dax Westerman Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 1475, Nashville, TN, 37203, USA
Theodore Speroff Departments of Medicine and Biostatistics, Vanderbilt University Medical Center, 1313 21st Avenue South, Oxford House, Room 209, Nashville, TN, 37232, USA
Usha S Govindarajulu Center for Biostatistics, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1077, New York, NY, 10029, USA
Craig R Ramsay Health Services Research Unit, University of Aberdeen, Health Sciences Building, Foresterhill, 3rd Floor, Aberdeen, AB25 2ZD, UK
Art Sedrakyan Department of Population Health Sciences, Weill Cornell Medicine, 1300 York Avenue, New York, NY, 10065, USA
Lucila Ohno-Machado Biomedical Informatics and Data Science, Yale School of Medicine, 100 College Street, New Haven, CT, 06510, USA
Frederic S Resnic Division of Cardiovascular Medicine and Comparative Effectiveness Research Institute, Lahey Hospital and Medical Center, Tufts University School of Medicine, 41 Burlington Mall Road, Burlington, MA, 01805, USA
Michael E Matheny Departments of Biomedical Informatics, Biostatistics, and Medicine, Vanderbilt University Medical Center, 2525 West End Ave, Suite 1475, Nashville, TN, 37203, USA; Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, 1310 24th Avenue South, Nashville, TN, 37212, USA

Mosquera L, El Emam K, Ding L, Sharma V, Zhang XH, Kababji SE, Carvalho C, Hamilton B, Palfrey D, Kong L, Jiang B, Eurich DT. A method for generating synthetic longitudinal health data. BMC Med Res Methodol 2023;23:67. [PMID: 36959532 PMCID: PMC10034254 DOI: 10.1186/s12874-023-01869-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Accepted: 02/19/2023] [Indexed: 03/25/2023]

Abstract

Getting access to administrative health data for research purposes is a difficult and time-consuming process because of increasingly demanding privacy regulations. An alternative would be to share synthetic datasets in which the records do not correspond to real individuals but the patterns and relationships seen in the data are reproduced. This paper assesses the feasibility of generating synthetic administrative health data using a recurrent deep learning model. Our data come from 120,000 individuals in Alberta Health's administrative health database. We assess how similar our synthetic data are to the real data using utility assessments of the structure and general patterns in the data, and by reproducing, on both datasets, a specific analysis commonly applied to this type of administrative health data. We also assess the privacy risks associated with the use of this synthetic dataset. Generic utility assessments used the Hellinger distance to quantify differences in distributions between the real and synthetic datasets for event types (0.027), attributes (mean 0.0417), and Markov transition matrices (order 1: mean absolute difference 0.0896, SD 0.159; order 2: mean Hellinger distance 0.2195, SD 0.2724). The Hellinger distance between the joint distributions was 0.352, and random cohorts generated from the real and synthetic data had a mean Hellinger distance of 0.3 and a mean Euclidean distance of 0.064, indicating small differences between the real and synthetic distributions. When a realistic analysis was applied to both the real and synthetic datasets, the adjusted Cox regression hazard ratios for 5 key outcomes of interest achieved a mean confidence interval overlap of 68%, indicating that the synthetic data produce analytic results similar to those from the real data. The privacy assessment concluded that the attribution disclosure risk associated with this synthetic dataset was substantially below the typical acceptable risk threshold of 0.09. Based on these metrics, our results show that the synthetic data are suitably similar to the real data and could be shared for research purposes, thereby alleviating concerns associated with sharing real data in some circumstances.
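
Most of the utility figures above are Hellinger distances. For two discrete distributions P and Q over the same categories, a common definition is H(P, Q) = (1/√2) · sqrt(Σᵢ (√pᵢ − √qᵢ)²), which lies in [0, 1]. The sketch below computes it from category counts; the 1/√2 scaling is the usual convention but is an assumption about how the study normalised its values, and the counts in the example are made up for illustration.

```python
import math

def normalise(counts):
    """Turn raw category counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support.

    Returns 0 for identical distributions and 1 for distributions with
    disjoint support (because of the 1/sqrt(2) scaling).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)

# Illustrative event-type counts (hypothetical, not from the study).
real = normalise([50, 30, 20])
synthetic = normalise([48, 33, 19])
print(round(hellinger(real, synthetic), 4))
```

The same function applies per attribute, per row of a Markov transition matrix, or to a flattened joint distribution, provided both datasets are binned onto the same categories first.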

Theodorou B, Xiao C, Sun J. Synthesize Extremely High-dimensional Longitudinal Electronic Health Records via Hierarchical Autoregressive Language Model. RESEARCH SQUARE 2023:rs.3.rs-2644725. [PMID: 36945542 PMCID: PMC10029081 DOI: 10.21203/rs.3.rs-2644725/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023]

Abstract

Synthetic electronic health records (EHRs) that are both realistic and privacy-preserving can serve as an alternative to real EHRs for machine learning (ML) modeling and statistical analysis. However, generating high-fidelity, granular EHR data in its original, high-dimensional form challenges existing methods because of the complexities inherent in high-dimensional data. In this paper, we propose the Hierarchical Autoregressive Language mOdel (HALO) for generating longitudinal, high-dimensional EHRs that preserve the statistical properties of real EHRs and can be used to train accurate ML models without privacy concerns. Designed as a hierarchical autoregressive model, HALO generates a probability density function over medical codes, clinical visits, and patient records, allowing realistic EHR data to be generated in its original, unaggregated form without the need for variable selection or aggregation. Our model also produces high-quality continuous variables in a longitudinal and probabilistic manner. Extensive experiments demonstrate that HALO can generate high-fidelity EHR data, achieving an R² correlation above 0.9 with real EHR data for high-dimensional disease code probabilities (d ≈ 10,000), disease code co-occurrence probabilities within a visit (d ≈ 1,000,000), and conditional probabilities across consecutive visits (d ≈ 5,000,000). Compared with the leading baseline, HALO improves predictive modeling by over 17% in predictive accuracy and perplexity on a held-out test set of real EHR data. This performance enables downstream ML models trained on its synthetic data to achieve accuracy comparable to models trained on real data (area under the ROC curve of 0.938 with HALO data vs. 0.943 with real data). Finally, combining real and synthetic data improves the accuracy of ML models beyond that achieved with real EHR data alone.
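
The abstract describes HALO as modelling patient records, visits and medical codes hierarchically and autoregressively. A generic factorization of that kind can be written with the chain rule, first across visits and then across codes within a visit. The notation below is an assumption for illustration (a record as visits $\mathbf{v}_1,\dots,\mathbf{v}_T$, each a binary vector over $C$ codes with entries $v_t^{(c)}$) and is not claimed to be HALO's exact parameterization:

$$
p(\mathbf{v}_{1:T}) \;=\; \prod_{t=1}^{T} p\!\left(\mathbf{v}_t \mid \mathbf{v}_{<t}\right)
\;=\; \prod_{t=1}^{T} \prod_{c=1}^{C} p\!\left(v_t^{(c)} \mid v_t^{(<c)},\, \mathbf{v}_{<t}\right)
$$

Because the inner product conditions each code on the codes already generated in the same visit, such a model can represent within-visit co-occurrence directly instead of assuming codes are independent given the history.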

Khan B, Fatima H, Qureshi A, Kumar S, Hanan A, Hussain J, Abdullah S. Drawbacks of Artificial Intelligence and Their Potential Solutions in the Healthcare Sector. BIOMEDICAL MATERIALS & DEVICES (NEW YORK, N.Y.) 2023;1:1-8. [PMID: 36785697 PMCID: PMC9908503 DOI: 10.1007/s44174-023-00063-2] [Citation(s) in RCA: 42] [Impact Index Per Article: 42.0] [Received: 11/18/2022] [Accepted: 01/19/2023] [Indexed: 02/10/2023]

Yang W, Zou H, Wang M, Zhang Q, Li S, Liang H. Mortality prediction among ICU inpatients based on MIMIC-III database results from the conditional medical generative adversarial network. Heliyon 2023;9:e13200. [PMID: 36798767 PMCID: PMC9925961 DOI: 10.1016/j.heliyon.2023.e13200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 01/18/2023] [Accepted: 01/19/2023] [Indexed: 01/26/2023]

Abstract

Background and aims

Improving mortality prediction for intensive care unit (ICU) inpatients is a valuable and challenging task. Limited clinical data, especially data with appropriate labels, are an important factor restricting accurate prediction. Generative adversarial networks (GANs) are powerful generative models and have shown great potential for data simulation. However, no relevant studies have used GANs to predict mortality among ICU inpatients. In this study, we aim to evaluate the predictive performance of a GAN variant called conditional medical GAN (c-med GAN) against several baseline models: the simplified acute physiology score II (SAPS II), a support vector machine (SVM), and a multilayer perceptron (MLP).

Methods

Data from a publicly available intensive care database, the Medical Information Mart for Intensive Care III (MIMIC-III, v1.4), were used in this study. Predictive performance was evaluated with the area under the precision-recall curve (PR-AUC), the area under the receiver operating characteristic curve (ROC-AUC), and the F1 score. In addition, the dataset was artificially reduced in size, and the performance of the c-med GAN was compared across datasets of different sizes.

Results

The c-med GAN achieved the best PR-AUC, ROC-AUC, and F1 score compared with SAPS II, SVM, and MLP when trained on the full MIMIC-III dataset. When the dataset was reduced in size, the predictive performance of both MLP and c-med GAN declined; however, the c-med GAN still outperformed MLP on the smaller datasets and degraded less.

Conclusion

For ICU patients, in-hospital mortality prediction based on the c-med GAN outperformed the baseline models. Despite some limitations, this model may have a promising future in clinical applications, which further research will explore.
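
The study scores models with PR-AUC, ROC-AUC and F1. The sketch below gives dependency-free versions of the last two using their standard definitions: ROC-AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count one half), and F1 as the harmonic mean of precision and recall. It is not the authors' evaluation code, and PR-AUC is left out because its estimators (trapezoidal area versus average precision) differ.

```python
def roc_auc(labels, scores):
    """ROC-AUC via its rank (Mann-Whitney) interpretation; O(n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def f1_score(labels, predictions):
    """F1 for binary 0/1 labels and predictions (1 = death, for example)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

ROC-AUC is threshold-free, whereas F1 depends on the decision threshold used to turn scores into predictions, which is one reason studies on imbalanced outcomes such as ICU mortality often report both alongside PR-AUC.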

Hernadez M, Epelde G, Alberdi A, Cilla R, Rankin D. Synthetic Tabular Data Evaluation in the Health Domain Covering Resemblance, Utility, and Privacy Dimensions. Methods Inf Med 2023. [PMID: 36623830 DOI: 10.1055/s-0042-1760247] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/11/2023]

Abstract

BACKGROUND

Synthetic tabular data generation is a potentially valuable technology with great promise for data augmentation and privacy preservation. However, before adoption, generated synthetic tabular data must be assessed empirically across the dimensions relevant to the target application to determine its efficacy. The literature lacks a standardized, objective evaluation and benchmarking strategy for synthetic tabular data in the health domain.

OBJECTIVE

The aim of this paper is to identify key dimensions, metrics per dimension, and methods for evaluating synthetic tabular data generated with different techniques and configurations for health domain application development, and to provide a strategy to orchestrate them.

METHODS

Based on the literature, the resemblance, utility, and privacy dimensions were prioritized, and a collection of metrics and methods for their evaluation was orchestrated into a complete evaluation pipeline. This enables a guided, comparative assessment of generated synthetic tabular data that grades its quality into three categories ("Excellent," "Good," and "Poor"). Six health care-related datasets and four synthetic tabular data generation approaches were chosen for an analysis and evaluation that verifies the utility of the proposed evaluation pipeline.

RESULTS

The synthetic tabular data generated with the four selected approaches maintained resemblance, utility, and privacy for most combinations of dataset and generation approach. For several datasets, some approaches outperformed others, while for other datasets more than one approach yielded the same performance.

CONCLUSION

The results have shown that the proposed pipeline can effectively be used to evaluate and benchmark the synthetic tabular data generated by various synthetic tabular data generation approaches. Therefore, this pipeline can support the scientific community in selecting the most suitable synthetic tabular data generation approaches for their data and application of interest.

Kroes SKS, van Leeuwen M, Groenwold RHH, Janssen MP. Generating synthetic mixed discrete-continuous health records with mixed sum-product networks. J Am Med Inform Assoc 2022;30:16-25. [PMID: 36228120 PMCID: PMC9748584 DOI: 10.1093/jamia/ocac184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Revised: 09/09/2022] [Accepted: 10/01/2022] [Indexed: 12/15/2022]

Yan C, Yan Y, Wan Z, Zhang Z, Omberg L, Guinney J, Mooney SD, Malin BA. A Multifaceted benchmarking of synthetic electronic health record generation models. Nat Commun 2022;13:7609. [PMID: 36494374 PMCID: PMC9734113 DOI: 10.1038/s41467-022-35295-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/05/2022] [Accepted: 11/28/2022] [Indexed: 12/13/2022]

Halfpenny W, Baxter SL. Towards effective data sharing in ophthalmology: data standardization and data privacy. Curr Opin Ophthalmol 2022;33:418-424. [PMID: 35819893 PMCID: PMC9357189 DOI: 10.1097/icu.0000000000000878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]

Zhang Z, Yan C, Malin BA. Keeping synthetic patients on track: feedback mechanisms to mitigate performance drift in longitudinal health data simulation. J Am Med Inform Assoc 2022;29:1890-1898. [PMID: 35927974 PMCID: PMC9552284 DOI: 10.1093/jamia/ocac131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 06/25/2022] [Accepted: 07/22/2022] [Indexed: 11/13/2022]

Hahn W, Schütte K, Schultz K, Wolkenhauer O, Sedlmayr M, Schuler U, Eichler M, Bej S, Wolfien M. Contribution of Synthetic Data Generation towards an Improved Patient Stratification in Palliative Care. J Pers Med 2022;12:1278. [PMID: 36013227 PMCID: PMC9409663 DOI: 10.3390/jpm12081278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Revised: 07/29/2022] [Accepted: 08/01/2022] [Indexed: 11/23/2022]

Affiliation(s)

Waldemar Hahn Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
Katharina Schütte University Palliative Center, University Hospital Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
Kristian Schultz Department of Systems Biology and Bioinformatics, University of Rostock, Universitätsplatz 1, 18051 Rostock, Germany
Olaf Wolkenhauer Department of Systems Biology and Bioinformatics, University of Rostock, Universitätsplatz 1, 18051 Rostock, Germany; Leibniz-Institute for Food Systems Biology, Technical University Munich, 85354 Freising, Germany; Stellenbosch Institute of Advanced Study, Wallenberg Research Centre, Stellenbosch University, Stellenbosch 7602, South Africa
Martin Sedlmayr Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
Ulrich Schuler University Palliative Center, University Hospital Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
Martin Eichler National Center for Tumor Diseases Dresden (NCT/UCC), Fetscherstraße 74, 01307 Dresden, Germany; German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany; Faculty of Medicine, University Hospital Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Bautzner Landstraße 400, 01328 Dresden, Germany
Saptarshi Bej Department of Systems Biology and Bioinformatics, University of Rostock, Universitätsplatz 1, 18051 Rostock, Germany; Leibniz-Institute for Food Systems Biology, Technical University Munich, 85354 Freising, Germany
Markus Wolfien Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany

Javidi H, Mariam A, Khademi G, Zabor EC, Zhao R, Radivoyevitch T, Rotroff DM. Identification of robust deep neural network models of longitudinal clinical measurements. NPJ Digit Med 2022;5:106. [PMID: 35896817 PMCID: PMC9329311 DOI: 10.1038/s41746-022-00651-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2022] [Accepted: 07/06/2022] [Indexed: 11/09/2022]

GAN-Based Approaches for Generating Structured Data in the Medical Domain. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12147075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Hernandez M, Epelde G, Alberdi A, Cilla R, Rankin D. Synthetic data generation for tabular health records: A systematic review. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.053] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 10/18/2022]

André A, Peyrou B, Carpentier A, Vignaux JJ. Feasibility and Assessment of a Machine Learning-Based Predictive Model of Outcome After Lumbar Decompression Surgery. Global Spine J 2022;12:894-908. [PMID: 33207969 PMCID: PMC9344503 DOI: 10.1177/2192568220969373] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022]

Incorporation of Synthetic Data Generation Techniques within a Controlled Data Processing Workflow in the Health and Wellbeing Domain. ELECTRONICS 2022. [DOI: 10.3390/electronics11050812] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 01/27/2023]

Torfi A, Fox EA, Reddy CK. Differentially private synthetic medical data generation using convolutional GANs. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.12.018] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 01/08/2023]

Gupta M, Poulain R, Phan TLT, Bunnell HT, Beheshti R. Flexible-Window Predictions on Electronic Health Records. PROCEEDINGS OF THE ... AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE. AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE 2022;36:12510-12516. [PMID: 36312212 PMCID: PMC9610888 DOI: 10.1609/aaai.v36i11.21520] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]

Dinh TQ, Xiong Y, Huang Z, Vo T, Mishra A, Kim WH, Ravi SN, Singh V. Performing Group Difference Testing on Graph Structured Data From GANs: Analysis and Applications in Neuroimaging. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022;44:877-889. [PMID: 32763848 PMCID: PMC7867665 DOI: 10.1109/tpami.2020.3013433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]

Postpartum pelvic organ prolapse assessment via adversarial feature complementation in heterogeneous data. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06869-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

Nie Y, Huang C, Liang H, Xu H. Adversarial and Implicit Modality Imputation with Applications to Depression Early Detection. ARTIF INTELL 2022. [DOI: 10.1007/978-3-031-20500-2_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/02/2023]

Foomani FH, Anisuzzaman DM, Niezgoda J, Niezgoda J, Guns W, Gopalakrishnan S, Yu Z. Synthesizing time-series wound prognosis factors from electronic medical records using generative adversarial networks. J Biomed Inform 2021;125:103972. [PMID: 34920125 DOI: 10.1016/j.jbi.2021.103972] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/02/2021] [Revised: 09/20/2021] [Accepted: 12/03/2021] [Indexed: 11/26/2022]

Abstract

Wound prognostic models not only provide an estimate of wound healing time that motivates patients to follow up on their treatments but can also help clinicians decide whether to use standard care or adjuvant therapies and assist them in designing clinical trials. However, collecting prognosis factors from patients' Electronic Medical Records (EMRs) is challenging because of privacy, sensitivity, and confidentiality. In this study, we developed time-series medical generative adversarial networks (GANs) to generate synthetic wound prognosis factors from very limited information collected during routine care in a specialized wound care facility. The generated prognosis variables are used to develop a predictive model for the chronic wound healing trajectory. Our novel medical GAN can produce both continuous and categorical features from EMRs. Moreover, we incorporated temporal information into our model by considering data collected at patients' weekly follow-ups. Conditional training strategies were used to enhance training and to generate data labelled as healing or non-healing. The ability of the proposed model to generate realistic EMR data was evaluated with TSTR (train on synthetic, test on real), discriminative accuracy, and visualization. We used samples generated by the proposed GAN to train a prognosis model to demonstrate its real-life application. Using the generated samples to train predictive models improved classification accuracy by 6.66-10.01% compared with the previous EMR-GAN. Additionally, the proposed prognosis classifier achieved an area under the curve (AUC) of 0.875, 0.810, and 0.647 when trained on data from the first three visits, the first two visits, and the first visit, respectively. These results indicate a significant improvement in wound healing prediction compared with previous prognosis models.
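
TSTR evaluation fits a model only on synthetic records and scores it on real records; a score close to that of a model trained on real data suggests the synthetic data preserve the signal a downstream classifier needs. The sketch below illustrates the protocol with a nearest-centroid classifier chosen purely as a simple stand-in (an assumption; the study trained its own prognosis models), and the function names are hypothetical.

```python
def fit_centroids(rows, labels):
    """Nearest-centroid classifier: one mean feature vector per class label."""
    sums, counts = {}, {}
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def tstr_accuracy(synthetic_rows, synthetic_labels, real_rows, real_labels):
    """Train on synthetic data, test on real data (TSTR) and return accuracy."""
    model = fit_centroids(synthetic_rows, synthetic_labels)
    correct = sum(1 for x, y in zip(real_rows, real_labels) if predict(model, x) == y)
    return correct / len(real_rows)

# Toy example: label 0 = healing, 1 = non-healing (features are made up).
synthetic_x = [[0, 0], [0, 1], [5, 5], [5, 6]]
synthetic_y = [0, 0, 1, 1]
real_x = [[0.2, 0.3], [4.8, 5.1], [0.1, 0.9]]
real_y = [0, 1, 0]
print(tstr_accuracy(synthetic_x, synthetic_y, real_x, real_y))
```

Swapping the two roles gives the complementary TRTS check, and comparing TSTR with a train-on-real, test-on-real baseline shows how much utility is lost by substituting synthetic data.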

Engr YS, Lalande A, Afilalo J, Jodoin PM. Generative Adversarial Networks in Cardiology. Can J Cardiol 2021;38:196-203. [PMID: 34780990 DOI: 10.1016/j.cjca.2021.11.003] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/17/2021] [Revised: 11/04/2021] [Accepted: 11/08/2021] [Indexed: 01/18/2023]

Zuo Z, Watson M, Budgen D, Hall R, Kennelly C, Al Moubayed N. Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study. JMIR Med Inform 2021;9:e29871. [PMID: 34652278 PMCID: PMC8556642 DOI: 10.2196/29871] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/23/2021] [Revised: 06/21/2021] [Accepted: 08/02/2021] [Indexed: 01/29/2023]

Abstract

BACKGROUND

With recent advances in health care, data science offers an unparalleled opportunity to gain new insights into many aspects of human life. Using data science in digital health raises significant challenges regarding data privacy, transparency, and trustworthiness. Recent regulations, for example, the European Union's General Data Protection Regulation (2016) and the United Kingdom's Data Protection Act (2018), enforce the need for a clear legal basis for collecting, processing, and sharing data. For health care providers, legal use of the electronic health record (EHR) is permitted only in clinical care cases. Any other use of the data requires careful consideration of the legal context and direct patient consent. Identifiable personal and sensitive information must be sufficiently anonymized. Raw data are commonly anonymized for research purposes, with risk assessment for reidentification and utility. Although health care organizations have internal information governance policies, there is a significant lack of practical tools and intuitive guidance on the use of data for research and modeling. Off-the-shelf data anonymization tools are developed frequently, but their privacy-related functionalities are often not comparable across problem domains. In addition, tools exist to measure the reidentification risk of anonymized data against its usefulness, but their efficacy remains questionable.

OBJECTIVE

In this systematic literature mapping study, we aim to alleviate the aforementioned issues by reviewing the landscape of data anonymization for digital health care.

METHODS

We used Google Scholar, Web of Science, Elsevier Scopus, and PubMed to retrieve academic studies published in English up to June 2020. Noteworthy gray literature was also used to initialize the search. We focused on review questions covering 5 bottom-up aspects: basic anonymization operations, privacy models, reidentification risk and usability metrics, off-the-shelf anonymization tools, and the lawful basis for EHR data anonymization.

RESULTS

We identified 239 eligible studies, of which 60 were chosen for general background information; 16 were selected for 7 basic anonymization operations; 104 covered 72 conventional and machine learning-based privacy models; 4 and 19 papers included 7 and 15 metrics, respectively, for measuring the reidentification risk and degree of usability; and 36 explored 20 data anonymization software tools. In addition, we evaluated the practical feasibility of performing anonymization on EHR data with reference to their usability in medical decision-making. Furthermore, we summarized the lawful basis for delivering guidance on practical EHR data anonymization.

CONCLUSIONS

This systematic literature mapping study indicates that anonymization of EHR data is theoretically achievable, but more research on practical implementations is needed to balance privacy preservation and usability and to ensure more reliable health care applications.

Improving GAN with inverse cumulative distribution function for tabular data synthesis. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.098] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]

Murdoch B. Privacy and artificial intelligence: challenges for protecting health information in a new era. BMC Med Ethics 2021;22:122. [PMID: 34525993 PMCID: PMC8442400 DOI: 10.1186/s12910-021-00687-3] [Citation(s) in RCA: 100] [Impact Index Per Article: 33.3] [Received: 03/16/2021] [Accepted: 08/25/2021] [Indexed: 12/15/2022]

Wolterink JM, Mukhopadhyay A, Leiner T, Vogl TJ, Bucher AM, Išgum I. Generative Adversarial Networks: A Primer for Radiologists. Radiographics 2021;41:840-857. [PMID: 33891522 DOI: 10.1148/rg.2021200151] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 12/22/2022]

Affiliation(s)

Jelmer M Wolterink, Anirban Mukhopadhyay, Tim Leiner, Thomas J Vogl, Andreas M Bucher, Ivana Išgum: Department of Applied Mathematics, Faculty of Electrical Engineering, Mathematics and Computer Science, Technical Medical Centre, University of Twente, Zilverling, PO Box 217, 7500 AE Enschede, the Netherlands (J.M.W.); Department of Biomedical Engineering and Physics (J.M.W., I.I.) and Department of Radiology and Nuclear Medicine (I.I.), Amsterdam University Medical Center, Amsterdam, the Netherlands; Department of Informatics, Technische Universität Darmstadt, Darmstadt, Germany (A.M.); Department of Radiology, Utrecht University Medical Center, Utrecht, the Netherlands (T.L.); and Institute of Diagnostic and Interventional Radiology, Universitätsklinikum Frankfurt, Frankfurt, Germany (T.J.V., A.M.B.)

Shen L, Kann BH, Taylor RA, Shung DL. The Clinician's Guide to the Machine Learning Galaxy. Front Physiol 2021;12:658583. [PMID: 33889088 PMCID: PMC8056037 DOI: 10.3389/fphys.2021.658583] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/26/2021] [Accepted: 03/10/2021] [Indexed: 11/13/2022]

Kaur D, Sobiesk M, Patil S, Liu J, Bhagat P, Gupta A, Markuzon N. Application of Bayesian networks to generate synthetic health data. J Am Med Inform Assoc 2021;28:801-811. [PMID: 33367620 DOI: 10.1093/jamia/ocaa303] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/17/2020] [Accepted: 11/16/2020] [Indexed: 01/08/2023]

Haendel MA, Chute CG, Bennett TD, Eichmann DA, Guinney J, Kibbe WA, Payne PRO, Pfaff ER, Robinson PN, Saltz JH, Spratt H, Suver C, Wilbanks J, Wilcox AB, Williams AE, Wu C, Blacketer C, Bradford RL, Cimino JJ, Clark M, Colmenares EW, Francis PA, Gabriel D, Graves A, Hemadri R, Hong SS, Hripcsak G, Jiao D, Klann JG, Kostka K, Lee AM, Lehmann HP, Lingrey L, Miller RT, Morris M, Murphy SN, Natarajan K, Palchuk MB, Sheikh U, Solbrig H, Visweswaran S, Walden A, Walters KM, Weber GM, Zhang XT, Zhu RL, Amor B, Girvin AT, Manna A, Qureshi N, Kurilla MG, Michael SG, Portilla LM, Rutter JL, Austin CP, Gersing KR. The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment. J Am Med Inform Assoc 2021;28:427-443. [PMID: 32805036 PMCID: PMC7454687 DOI: 10.1093/jamia/ocaa196] [Citation(s) in RCA: 304] [Impact Index Per Article: 101.3] [Received: 07/14/2020] [Accepted: 08/14/2020] [Indexed: 01/12/2023]

Abstract

Objective

Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers.

Materials and Methods

The Clinical and Translational Science Award Program and the scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics.

Results

Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access.

Conclusions

The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.

Affiliation(s)

Melissa A Haendel Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA; Translational and Integrative Sciences Center, Department of Molecular Toxicology, Oregon State University, Corvallis, Oregon, USA
Christopher G Chute Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland, USA
Tellen D Bennett Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, Colorado, USA
David A Eichmann School of Library and Information Science, The University of Iowa, Iowa City, Iowa, USA
Justin Guinney Sage Bionetworks, Seattle, Washington, USA
Warren A Kibbe Duke University, Durham, North Carolina, USA
Philip R O Payne Institute for Informatics, Washington University in St. Louis, Saint Louis, Missouri, USA
Emily R Pfaff North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Peter N Robinson Jackson Laboratory, Bar Harbor, Maine, USA
Joel H Saltz Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA
Heidi Spratt University of Texas Medical Branch, Galveston, Texas, USA
Christine Suver Sage Bionetworks, Seattle, Washington, USA
John Wilbanks Sage Bionetworks, Seattle, Washington, USA
Adam B Wilcox University of Washington, Seattle, Washington, USA
Andrew E Williams Tufts Medical Center Clinical and Translational Science Institute, Tufts Medical Center, Boston, Massachusetts, USA
Chunlei Wu Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, USA
Clair Blacketer Janssen Research and Development, LLC, Raritan, New Jersey, USA
Robert L Bradford North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
James J Cimino University of Alabama-Birmingham, Birmingham, Alabama, USA
Marshall Clark North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Evan W Colmenares Department of Pharmaceutical Outcomes and Policy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Patricia A Francis Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Davera Gabriel Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Alexis Graves University of Iowa Institute for Clinical and Translational Science, The University of Iowa, Iowa City, Iowa, USA
Raju Hemadri National Center for Advancing Translational Sciences, Bethesda, Maryland, USA
Stephanie S Hong Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
George Hripcsak Department of Biomedical Informatics, Columbia University, New York, New York, USA
Dazhi Jiao Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Jeffrey G Klann Harvard Medical School, Boston, Massachusetts, USA
Kristin Kostka IQVIA, Durham, North Carolina, USA
Adam M Lee University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Harold P Lehmann Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Lora Lingrey TriNetX, Cambridge, Massachusetts, USA
Robert T Miller Tufts Clinical and Translational Science Institute, Tufts University, Boston, Massachusetts, USA
Michele Morris Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Shawn N Murphy Mass General Brigham, Boston, Massachusetts, USA
Karthik Natarajan Irving Medical Center, Columbia University, New York, New York, USA
Matvey B Palchuk TriNetX, Cambridge, Massachusetts, USA
Usman Sheikh National Center for Advancing Translational Sciences, Bethesda, Maryland, USA
Harold Solbrig Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Shyam Visweswaran Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Anita Walden Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA; Sage Bionetworks, Seattle, Washington, USA
Kellie M Walters North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Griffin M Weber Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
Xiaohan Tanner Zhang Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Richard L Zhu Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Benjamin Amor Palantir Technologies, Palo Alto, California, USA
Andrew T Girvin Palantir Technologies, Palo Alto, California, USA
Amin Manna Palantir Technologies, Palo Alto, California, USA
Nabeel Qureshi Palantir Technologies, Palo Alto, California, USA
Michael G Kurilla Division of Clinical Innovation, National Center for Advancing Translational Sciences, Bethesda, Maryland, USA
Sam G Michael National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
Lili M Portilla Office of Strategic Alliances, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
Joni L Rutter Office of the Director, National Center for Advancing Translational Sciences, Bethesda, Maryland, USA
Christopher P Austin National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
Ken R Gersing National Center for Advancing Translational Sciences, Bethesda, Maryland, USA

Zhang Z, Yan C, Mesa DA, Sun J, Malin BA. Ensuring electronic medical record simulation through better training, modeling, and evaluation. J Am Med Inform Assoc 2021;27:99-108. [PMID: 31592533 DOI: 10.1093/jamia/ocz161] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/14/2019] [Revised: 07/29/2019] [Accepted: 08/15/2019] [Indexed: 12/15/2022]