1. Kaur D, Sobiesk M, Patil S, Liu J, Bhagat P, Gupta A, Markuzon N. Application of Bayesian networks to generate synthetic health data. J Am Med Inform Assoc 2021; 28:801-811. [PMID: 33367620 DOI: 10.1093/jamia/ocaa303] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/17/2020] [Accepted: 11/16/2020] [Indexed: 01/08/2023] Open
Abstract
OBJECTIVE: This study seeks to develop a fully automated method of generating synthetic data from a real dataset that could be employed by medical organizations to distribute health data to researchers, reducing the need for access to real data. We hypothesize the application of Bayesian networks will improve upon the predominant existing method, medBGAN, in handling the complexity and dimensionality of healthcare data.
MATERIALS AND METHODS: We employed Bayesian networks to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure. We used the University of California Irvine (UCI) heart disease and diabetes datasets as well as the MIMIC-III diagnoses database. We evaluated our method through statistical tests, machine learning tasks, preservation of rare events, disclosure risk, and the ability of a machine learning classifier to discriminate between the real and synthetic data.
RESULTS: Our Bayesian network model outperformed or equaled medBGAN in all key metrics. Notable improvement was achieved in capturing rare variables and preserving association rules.
DISCUSSION: Bayesian networks generated data sufficiently similar to the original data with minimal risk of disclosure, while offering additional transparency, computational efficiency, and capacity to handle more data types in comparison to existing methods. We hope this method will allow healthcare organizations to efficiently disseminate synthetic health data to researchers, enabling them to generate hypotheses and develop analytical tools.
CONCLUSION: We conclude the application of Bayesian networks is a promising option for generating realistic synthetic health data that preserves the features of the original data without compromising data privacy.
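As a rough illustration of the simulation step described in this abstract (drawing synthetic records from a Bayesian network), the sketch below performs ancestral sampling on a tiny hand-written network. It is not the authors' implementation: the variables, the edges, and every probability in `NETWORK` are hypothetical, and the structure-learning step that the paper applies to real data is omitted entirely.

```python
import random

# Hypothetical toy network: age_over_60 -> diabetes -> heart_disease,
# with age_over_60 also a parent of heart_disease.
# All probabilities are invented for illustration, not estimated from data.
NETWORK = [
    # (variable, parents, {parent values: P(variable = 1)})
    ("age_over_60", (), {(): 0.30}),
    ("diabetes", ("age_over_60",), {(0,): 0.08, (1,): 0.20}),
    ("heart_disease", ("age_over_60", "diabetes"),
     {(0, 0): 0.05, (0, 1): 0.15, (1, 0): 0.20, (1, 1): 0.40}),
]


def sample_record(rng):
    """Draw one synthetic record, sampling parents before their children."""
    record = {}
    for name, parents, cpt in NETWORK:
        p_one = cpt[tuple(record[p] for p in parents)]
        record[name] = 1 if rng.random() < p_one else 0
    return record


def sample_records(n, seed=0):
    """Draw n synthetic records with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [sample_record(rng) for _ in range(n)]


if __name__ == "__main__":
    synthetic = sample_records(10_000)
    rate = sum(r["diabetes"] for r in synthetic) / len(synthetic)
    print(f"synthetic diabetes prevalence: {rate:.3f}")
```

Because each variable is sampled only after its parents, the synthetic records follow the joint distribution encoded by the tables. In this toy network the diabetes prevalence should approach 0.7 × 0.08 + 0.3 × 0.20 = 0.116 as the sample grows.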
Affiliation(s)
- Dhamanpreet Kaur
  - Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Matthew Sobiesk
  - Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Shubham Patil
  - Rochester Institute of Technology, Rochester, New York, USA
- Jin Liu
  - Clinical Informatics, Philips Research North America, Cambridge, Massachusetts, USA
- Puran Bhagat
  - Clinical Informatics, Philips Research North America, Cambridge, Massachusetts, USA
- Amar Gupta
  - Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Natasha Markuzon
  - Clinical Informatics, Philips Research North America, Cambridge, Massachusetts, USA
2. Jacquemard T, Doherty CP, Fitzsimons MB. Examination and diagnosis of electronic patient records and their associated ethics: a scoping literature review. BMC Med Ethics 2020; 21:76. [PMID: 32831076 PMCID: PMC7446190 DOI: 10.1186/s12910-020-00514-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/18/2020] [Accepted: 08/03/2020] [Indexed: 02/22/2023] Open
Abstract
Background: Electronic patient record (EPR) technology is a key enabler for improvements to healthcare service and management. To ensure these improvements and the means to achieve them are socially and ethically desirable, careful consideration of the ethical implications of EPRs is indicated. The purpose of this scoping review was to map the literature related to the ethics of EPR technology. The literature review was conducted to catalogue the prevalent ethical terms, to describe the associated ethical challenges and opportunities, and to identify the actors involved. By doing so, it aimed to support the future development of ethics guidance in the EPR domain.
Methods: To identify journal articles debating the ethics of EPRs, Scopus, Web of Science, and PubMed academic databases were queried and yielded 123 eligible articles. The following inclusion criteria were applied: articles need to be in the English language; present normative arguments and not solely empirical research; include an abstract for software analysis; and discuss EPR technology.
Results: The medical specialty, type of information captured and stored in EPRs, their use and functionality varied widely across the included articles. Ethical terms extracted were categorised into clusters ‘privacy’, ‘autonomy’, ‘risk/benefit’, ‘human relationships’, and ‘responsibility’. The literature shows that EPR-related ethical concerns can have both positive and negative implications, and that a wide variety of actors with rights and/or responsibilities regarding the safe and ethical adoption of the technology are involved.
Conclusions: While there is considerable consensus in the literature regarding EPR-related ethical principles, some of the associated challenges and opportunities remain underdiscussed. For example, much of the debate is presented in a manner more in keeping with a traditional model of healthcare and fails to take account of the multidimensional ensemble of factors at play in the EPR era and the consequent need to redefine/modify ethical norms to align with a digitally-enabled health service. Similarly, the academic discussion focuses predominantly on bioethical values. However, approaches from digital ethics may also be helpful to identify and deliberate about current and emerging EPR-related ethical concerns.
Affiliation(s)
- Tim Jacquemard
  - FutureNeuro, the SFI Research Centre for Chronic and Rare Neurological Diseases, 123 Stephen's Green, Dublin 2, Ireland
- Colin P Doherty
  - FutureNeuro, the SFI Research Centre for Chronic and Rare Neurological Diseases, 123 Stephen's Green, Dublin 2, Ireland
  - Department of Neurology, St. James's Hospital, James's Street, Dublin 8, Ireland
  - Trinity College Dublin, College Green, Dublin 2, Ireland
- Mary B Fitzsimons
  - FutureNeuro, the SFI Research Centre for Chronic and Rare Neurological Diseases, 123 Stephen's Green, Dublin 2, Ireland
3. Zandesh Z, Ghazisaeedi M, Devarakonda MV, Haghighi MS. Legal framework for health cloud: A systematic review. Int J Med Inform 2019; 132:103953. [DOI: 10.1016/j.ijmedinf.2019.103953] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/09/2019] [Revised: 07/29/2019] [Accepted: 08/18/2019] [Indexed: 10/26/2022]
4. Karasneh RA, Al-Azzam SI, Alzoubi KH, Hawamdeh SS, Muflih SM. Patient Data Sharing and Confidentiality Practices of Researchers in Jordan. Risk Manag Healthc Policy 2019; 12:255-263. [PMID: 31819686 PMCID: PMC6890205 DOI: 10.2147/rmhp.s227759] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/19/2019] [Accepted: 11/14/2019] [Indexed: 11/23/2022] Open
Abstract
Purpose: The main focus of this study is to assess the knowledge and practices of healthcare practitioners regarding data sharing, security, and confidentiality, with a focus on the use of health data retrieved from electronic health records (EHRs) for research purposes.
Methods: A descriptive, cross-sectional, questionnaire-based survey study was conducted across all academic institutions including all researchers in the medical field in Jordan. Personal and administrative practices in data sharing were assessed through collecting data from respondents.
Results: The response rate was 22% with an average of 10.25 years of experience in publications. Almost 60% had published at least 1 to 3 studies using EHRs. The prevalence of researchers who "Always" used antivirus software and preserved patients' information was 75.5% and 92.2%, respectively. However, other personal security and confidentiality measures were not satisfactory. Less than half of health data used in the research was "Always" anonymised or encrypted and only around 44.0% had "Always" used sensitive data with more specificity than normal data.
Conclusion: Confidentiality and data sharing practices of healthcare practitioners and researchers were generally less than optimal. Efforts from healthcare providers, health institutions, and lawmakers should be put in place to protect the security and confidentiality of electronic patient data.
Affiliation(s)
- Reema A Karasneh
  - Department of Basic Medical Sciences, Faculty of Medicine, Yarmouk University, Irbid, Jordan
- Sayer I Al-Azzam
  - Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, Jordan
- Karem H Alzoubi
  - Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, Jordan
- Sahar S Hawamdeh
  - Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, Jordan
- Suhaib M Muflih
  - Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, Jordan
5. Weng C, Friedman C, Rommel CA, Hurdle JF. A two-site survey of medical center personnel's willingness to share clinical data for research: implications for reproducible health NLP research. BMC Med Inform Decis Mak 2019; 19:70. [PMID: 30943963 PMCID: PMC6448185 DOI: 10.1186/s12911-019-0778-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022] Open
Abstract
Background: A shareable repository of clinical notes is critical for advancing natural language processing (NLP) research, and therefore a goal of many NLP researchers is to create a shareable repository of clinical notes that has breadth (from multiple institutions) as well as depth (as much individual data as possible).
Methods: We aimed to assess the degree to which individuals would be willing to contribute their health data to such a repository. A compact e-survey probed willingness to share demographic and clinical data categories. Participants were faculty, staff, and students in two geographically diverse major medical centers (Utah and New York). Such a sample could be expected to respond like a typical potential participant from the general public who is given complete and fully informed consent about the pros and cons of participating in a research study.
Results: Two thousand one hundred forty respondents completed the surveys. 56% of respondents were “somewhat/definitely willing” to share clinical data with identifiers, while 89% of respondents were “somewhat (17%)/definitely willing (72%)” to share without identifiers. Results were consistent across gender, age, and education, but there were some differences by geographical region. Individuals were most reluctant (50–74%) to share mental health, substance abuse, and domestic violence data.
Conclusions: We conclude that a substantial fraction of potential patient participants, once educated about risks and benefits, would be willing to donate de-identified clinical data to a shared research repository. A slight majority even would be willing to share absent de-identification, suggesting that perceptions about data misuse are not a major concern. Such a repository of clinical notes should be invaluable for clinical NLP research and advancement.
Affiliation(s)
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York City, NY, 10025, USA
- Carol Friedman
  - Department of Biomedical Informatics, Columbia University, New York City, NY, 10025, USA
- Casey A Rommel
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, 84108, USA
- John F Hurdle
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, 84108, USA
6. Binswanger IA, Morenoff JD, Chilcote CA, Harding DJ. Ascertainment of Vital Status Among People With Criminal Justice Involvement Using Department of Corrections Records, the US National Death Index, and Social Security Master Death Files. Am J Epidemiol 2017; 185:982-985. [PMID: 28387782 DOI: 10.1093/aje/kww221] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/05/2016] [Accepted: 12/02/2016] [Indexed: 12/23/2022] Open
Affiliation(s)
- Ingrid A. Binswanger
  - Institute for Health Research, Kaiser Permanente Colorado, Denver, CO
  - Division of General Internal Medicine, Department of Medicine, University of Colorado School of Medicine, Aurora, CO
- Jeffrey D. Morenoff
  - Department of Sociology, University of Michigan, Ann Arbor, MI
  - Institute for Social Research-Populations Studies Center, University of Michigan, Ann Arbor, MI
- Charley A. Chilcote
  - Institute for Social Research-Populations Studies Center, University of Michigan, Ann Arbor, MI
  - Risk, Classification and Program Evaluation, Michigan Department of Corrections, Lansing, MI
- David J. Harding
  - Department of Sociology, University of California, Berkeley, Berkeley, CA
  - Berkeley Population Center, University of California, Berkeley, Berkeley, CA
7. Secondary use of health data. J Formos Med Assoc 2016; 115:137-8. [DOI: 10.1016/j.jfma.2015.03.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 05/28/2014] [Revised: 02/26/2015] [Accepted: 03/16/2015] [Indexed: 11/23/2022] Open
8. Kuehl DR, Berdahl CT, Jackson TD, Venkatesh AK, Mistry RD, Bhargavan-Chatfield M, Raukar NP, Carr BG, Schuur JD, Kocher KE. Advancing the Use of Administrative Data for Emergency Department Diagnostic Imaging Research. Acad Emerg Med 2015; 22:1417-26. [PMID: 26575944 DOI: 10.1111/acem.12827] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 07/07/2015] [Accepted: 07/09/2015] [Indexed: 01/18/2023]
Abstract
Administrative data are critical to describing patterns of use, cost, and appropriateness of imaging in emergency care. These data encompass a range of source materials that have been collected primarily for a nonresearch use: documenting clinical care (e.g., medical records), administering care (e.g., picture archiving and communication systems), or financial transactions (e.g., insurance claims). These data have served as the foundation for large, descriptive studies that have documented the rise and expanded role of diagnostic imaging in the emergency department (ED). This article summarizes the discussions of the breakout session on the use of administrative data for emergency imaging research at the May 2015 Academic Emergency Medicine consensus conference, "Diagnostic Imaging in the Emergency Department: A Research Agenda to Optimize Utilization." The authors describe the areas where administrative data have been applied to research evaluating the use of diagnostic imaging in the ED, the common sources for these data, and the strengths and limitations of administrative data. Next, the future role of administrative data is examined for answering key research questions in an evolving health system increasingly focused on measuring appropriateness, ensuring quality, and improving value for health spending. This article specifically focuses on four thematic areas: data quality, appropriateness and value, special populations, and policy interventions.
Affiliation(s)
- Damon R. Kuehl
  - Department of Emergency Medicine; Virginia Tech Carilion School of Medicine; Roanoke VA
- Carl T. Berdahl
  - Department of Emergency Medicine; Los Angeles County + University of Southern California Medical Center; Los Angeles CA
- Tiffany D. Jackson
  - Department of Emergency Medicine; University of Alabama Birmingham; Birmingham AL
- Rakesh D. Mistry
  - Department of Emergency Medicine; Section of Emergency Medicine; Children's Hospital Colorado; Aurora CO
- Neha P. Raukar
  - Department of Emergency Medicine; Warren Alpert Medical School of Brown University; Providence RI
- Brendan G. Carr
  - Department of Emergency Medicine; Sidney Kimmel Medical College; Thomas Jefferson University; Philadelphia PA
- Jeremiah D. Schuur
  - Department of Emergency Medicine; Brigham and Women's Hospital; Boston MA
- Keith E. Kocher
  - Department of Emergency Medicine; University of Michigan School of Medicine; Ann Arbor MI
9. Boland MR, Tatonetti NP, Hripcsak G. Development and validation of a classification approach for extracting severity automatically from electronic health records. J Biomed Semantics 2015; 6:14. [PMID: 25848530 PMCID: PMC4386082 DOI: 10.1186/s13326-015-0010-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 11/03/2014] [Accepted: 03/03/2015] [Indexed: 12/29/2022] Open
Abstract
Background: Electronic Health Records (EHRs) contain a wealth of information useful for studying clinical phenotype-genotype relationships. Severity is important for distinguishing among phenotypes; however, other severity indices classify patient-level severity (e.g., mild vs. acute dermatitis) rather than phenotype-level severity (e.g., acne vs. myocardial infarction). Phenotype-level severity is independent of the individual patient’s state and is relative to other phenotypes. Further, phenotype-level severity does not change based on the individual patient. For example, acne is mild at the phenotype-level and relative to other phenotypes. Therefore, a given patient may have a severe form of acne (this is the patient-level severity), but this does not affect its overall designation as a mild phenotype at the phenotype-level.
Methods: We present a method for classifying severity at the phenotype-level that uses the Systematized Nomenclature of Medicine – Clinical Terms. Our method is called the Classification Approach for Extracting Severity Automatically from Electronic Health Records (CAESAR). CAESAR combines multiple severity measures: number of comorbidities, medications, procedures, cost, treatment time, and a proportional index term. CAESAR employs a random forest algorithm and these severity measures to discriminate between severe and mild phenotypes.
Results: Using a random forest algorithm and these severity measures as input, CAESAR differentiates between severe and mild phenotypes (sensitivity = 91.67, specificity = 77.78) when compared to a manually evaluated reference standard (k = 0.716).
Conclusions: CAESAR enables researchers to measure phenotype severity from EHRs to identify phenotypes that are important for comparative effectiveness research.
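For readers unfamiliar with the evaluation measures quoted in this abstract, the sketch below shows how sensitivity, specificity, and Cohen's kappa are computed from a 2×2 table of classifier output against a reference standard. The counts are hypothetical: they were chosen only so that the two ratios round to the reported 91.67 and 77.78, and the abstract does not give the study's actual counts. The function name `confusion_metrics` is likewise an assumption made for illustration.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, Cohen's kappa) for a 2x2 table."""
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)  # share of severe phenotypes labelled severe
    specificity = tn / (tn + fp)  # share of mild phenotypes labelled mild
    observed = (tp + tn) / n      # raw agreement with the reference standard
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts (not taken from the paper).
sens, spec, kappa = confusion_metrics(tp=11, fn=1, tn=7, fp=2)
print(f"sensitivity = {sens * 100:.2f}")
print(f"specificity = {spec * 100:.2f}")
print(f"kappa = {kappa:.3f}")
```

The kappa computed here measures agreement between the hypothetical classifier output and the reference labels, so it should not be compared with the k = 0.716 in the abstract, which is reported for the manually evaluated reference standard.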
Affiliation(s)
- Mary Regina Boland
  - Department of Biomedical Informatics, Columbia University, New York, NY USA
  - Observational Health Data Sciences and Informatics (OHDSI), Columbia University, 622 West 168th Street, PH-20, New York, NY USA
- Nicholas P Tatonetti
  - Department of Biomedical Informatics, Columbia University, New York, NY USA
  - Observational Health Data Sciences and Informatics (OHDSI), Columbia University, 622 West 168th Street, PH-20, New York, NY USA
  - Department of Systems Biology, Columbia University, New York, NY USA
  - Department of Medicine, Columbia University, New York, NY USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University, New York, NY USA
  - Observational Health Data Sciences and Informatics (OHDSI), Columbia University, 622 West 168th Street, PH-20, New York, NY USA
10. Huser V, Kayaalp M, Dodd ZA, Cimino JJ. Piloting a deceased subject integrated data repository and protecting privacy of relatives. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2014; 2014:719-728. [PMID: 25954378 PMCID: PMC4420001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Use of deceased subject Electronic Health Records can be an important piloting platform for informatics or biomedical research. The existing legal framework allows such research under less strict de-identification criteria; however, the privacy of non-decedents must be protected. We report on the creation of the deceased subject Integrated Data Repository (dsIDR) at the National Institutes of Health Clinical Center and a pilot methodology to remove secondary protected health information or identifiable information (secondary PxI; information about persons other than the primary patient). We characterize available structured coded data in dsIDR and report the estimated frequencies of secondary PxI, ranging from 12.9% (sensitive token presence) to 1.1% (using stricter criteria). Federating decedent EHR data from multiple institutions can address sample size limitations, and our pilot study provides lessons learned and methodology that can be adopted by other institutions.
Affiliation(s)
- Vojtech Huser
  - Laboratory for Informatics Development; National Institutes of Health, Clinical Center
- James J Cimino
  - Laboratory for Informatics Development; National Institutes of Health, Clinical Center
  - National Library of Medicine, Bethesda, MD, USA