1
Li G, Togo R, Ogawa T, Haseyama M. Importance-aware adaptive dataset distillation. Neural Netw 2024; 172:106154. [PMID: 38309137 DOI: 10.1016/j.neunet.2024.106154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 01/04/2024] [Accepted: 01/28/2024] [Indexed: 02/05/2024]
Abstract
Herein, we propose a novel dataset distillation method for constructing small, informative datasets that preserve the information of large original datasets. The development of deep learning models is enabled by the availability of large-scale datasets. Despite their unprecedented success, large-scale datasets considerably increase storage and transmission costs, making model training cumbersome. Moreover, using raw data for training raises privacy and copyright concerns. To address these issues, a new task named dataset distillation has been introduced, which aims to synthesize a compact dataset that retains the essential information of the large original dataset. State-of-the-art (SOTA) dataset distillation methods match the gradients or network parameters obtained during training on real and synthetic datasets. However, different network parameters contribute differently to the distillation process, and treating them uniformly degrades distillation performance. Based on this observation, we propose an importance-aware adaptive dataset distillation (IADD) method that improves distillation performance by automatically assigning importance weights to different network parameters during distillation, thereby synthesizing more robust distilled datasets. On multiple benchmark datasets, IADD outperforms other SOTA parameter-matching dataset distillation methods, including in cross-architecture generalization. In addition, an analysis of the self-adaptive weights demonstrates the effectiveness of IADD. Furthermore, the effectiveness of IADD is validated in a real-world medical application, namely COVID-19 detection.
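The core idea, weighting network parameters unequally when matching the parameters reached by training on synthetic data to those reached on real data, can be illustrated with a normalized parameter-matching loss that carries one weight per parameter group. This is a minimal NumPy sketch under stated assumptions, not IADD itself: it assumes a trajectory-matching style objective, one weight per parameter group (e.g. per layer), and weights obtained by a softmax over learnable scores. The function name and weighting scheme are hypothetical.

```python
import numpy as np


def weighted_param_matching_loss(student, target, start, scores):
    """Importance-weighted, normalized parameter-matching loss (hypothetical sketch).

    student: parameters reached after training on the synthetic data
    target:  parameters reached after training on the real data
    start:   parameters both training runs started from
    Each is a list of NumPy arrays, one per parameter group (e.g. one per layer).
    scores:  one learnable score per group; a softmax turns them into weights.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()

    # Weighted squared distance between the synthetic-data and real-data endpoints.
    num = sum(w * np.sum((s - t) ** 2) for w, s, t in zip(weights, student, target))
    # Normalize by how far the real-data run moved away from the shared start.
    den = sum(w * np.sum((s0 - t) ** 2) for w, s0, t in zip(weights, start, target))
    return float(num / den)
```

With equal scores the weights cancel in the ratio and the loss reduces to an unweighted normalized distance. In an actual distillation loop the scores and the synthetic images would both be updated by gradient descent, which requires an autodiff framework rather than plain NumPy.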
Affiliation(s)
- Guang Li
- Education and Research Center for Mathematical and Data Science, Hokkaido University, N-12, W-7, Kita-Ku, Sapporo, 060-0812, Japan.
- Ren Togo
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-Ku, Sapporo, 060-0814, Japan.
- Takahiro Ogawa
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-Ku, Sapporo, 060-0814, Japan.
- Miki Haseyama
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-Ku, Sapporo, 060-0814, Japan.
2
Strech D, Haven T, Madai VI, Meurers T, Prasser F. Generating evidence on privacy outcomes to inform privacy risk management: A way forward? J Biomed Inform 2023; 137:104257. [PMID: 36462598 DOI: 10.1016/j.jbi.2022.104257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2022] [Revised: 10/24/2022] [Accepted: 11/27/2022] [Indexed: 12/03/2022]
Abstract
Effective and efficient privacy risk management (PRM) is a necessary condition for supporting digitalization in health care and the secondary use of patient data in research. To reduce privacy risks, current PRM frameworks are rooted in an approach that tries to reduce undesired technical/organizational outcomes such as broken encryption or unintentional data disclosure. Comparing this with risk management in preventive or therapeutic medicine reveals a key difference: in health-related risk management, medicine focuses on person-specific health outcomes, whereas PRM mostly targets more indirect, technical/organizational outcomes. In this paper, we illustrate and discuss what a PRM approach based on evidence of person-specific privacy outcomes might look like, using three consecutive steps: i) a specification of undesired person-specific privacy outcomes, ii) empirical assessments of their frequency and severity, and iii) empirical studies on how effectively the available PRM interventions reduce their frequency or severity. After introducing these three steps, we cover their status quo and outline open questions and PRM-specific challenges that need further conceptual clarification and feasibility studies. Specific challenges of an outcome-oriented approach to PRM include the potential delays between concrete threats manifesting and the resulting person/group-specific privacy outcomes. Moreover, new ways of exploiting privacy-sensitive information to harm individuals could be developed in the future. The challenges described are technical, legal, ethical, financial and resource-related in nature. In health research, however, there is explicit discussion about how to overcome such challenges to make important outcome-based assessments as feasible as possible. This paper concludes that it might be time to have this discussion in the PRM field as well.
Affiliation(s)
- Daniel Strech
- QUEST Center for Responsible Research, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Tamarinde Haven
- QUEST Center for Responsible Research, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Vince I Madai
- QUEST Center for Responsible Research, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany; School of Computing and Digital Technology, Birmingham City University, City Centre Campus, Millennium Point, Birmingham B4 7XG, United Kingdom
- Thierry Meurers
- Center of Health Data Science, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Fabian Prasser
- Center of Health Data Science, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany.
3
Li G, Togo R, Ogawa T, Haseyama M. Compressed gastric image generation based on soft-label dataset distillation for medical data sharing. Comput Methods Programs Biomed 2022; 227:107189. [PMID: 36323177 DOI: 10.1016/j.cmpb.2022.107189] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/27/2021] [Revised: 07/07/2022] [Accepted: 10/17/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND AND OBJECTIVE Sharing of medical data is required to enable the cross-agency flow of healthcare information and to construct high-accuracy computer-aided diagnosis systems. However, the large size of medical datasets, the large amount of memory needed to store trained deep convolutional neural network (DCNN) models, and the need to protect patients' privacy can make medical data sharing inefficient. Therefore, this study proposes a novel soft-label dataset distillation method for medical data sharing. METHODS The proposed method distills the useful information in medical image data and generates several compressed images with different data distributions for anonymous medical data sharing. Furthermore, our method can extract the essential weights of DCNN models to reduce the memory required to store trained models, enabling efficient medical data sharing. RESULTS The proposed method can compress tens of thousands of images into several soft-label images and reduce the size of a trained model to a few hundredths of its original size. The compressed images obtained after distillation are visually anonymized; therefore, they do not contain patients' private information. Furthermore, high detection performance can be achieved with a small number of compressed images. CONCLUSIONS The experimental results show that the proposed method can improve the efficiency and security of medical data sharing.
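A soft label lets each compressed image carry a probability distribution over classes rather than a single class. One common way such images are consumed when training a downstream classifier is a soft-target cross-entropy loss; the sketch below assumes that loss (the paper's exact training objective may differ), and the function name and example values are hypothetical.

```python
import numpy as np


def soft_label_cross_entropy(logits, soft_labels):
    """Mean cross-entropy between model predictions and soft (probabilistic) labels.

    logits:      (n, k) array of raw class scores for n distilled images
    soft_labels: (n, k) array whose rows are probability distributions over k classes
    """
    logits = np.asarray(logits, dtype=float)
    soft_labels = np.asarray(soft_labels, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(soft_labels * log_probs).sum(axis=1).mean())


# A distilled image labelled 70% "class 0" and 30% "class 1" (hypothetical values).
print(soft_label_cross_entropy([[2.0, 1.0]], [[0.7, 0.3]]))
```

With one-hot rows this reduces to the ordinary cross-entropy; soft rows additionally encode how strongly each synthetic image resembles each class, which is one reason a handful of soft-label images can stand in for many hard-labelled ones.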
Affiliation(s)
- Guang Li
- Graduate School of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-Ku, Sapporo, 060-0814, Japan.
- Ren Togo
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-Ku, Sapporo, 060-0814, Japan.
- Takahiro Ogawa
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-Ku, Sapporo, 060-0814, Japan.
- Miki Haseyama
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-Ku, Sapporo, 060-0814, Japan.
4
Oh SR, Seo YD, Lee E, Kim YG. A Comprehensive Survey on Security and Privacy for Electronic Health Data. Int J Environ Res Public Health 2021; 18:9668. [PMID: 34574593 PMCID: PMC8465695 DOI: 10.3390/ijerph18189668] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/28/2021] [Revised: 09/01/2021] [Accepted: 09/09/2021] [Indexed: 12/01/2022]
Abstract
Recently, the integration of state-of-the-art technologies, such as modern sensors, networks, and cloud computing, has revolutionized the conventional healthcare system. However, this integration has also raised growing security concerns. Therefore, the security and privacy issues associated with e-health data must be properly explored. In this paper, to investigate the security and privacy of e-health systems, we identified the major components of modern e-health systems (i.e., e-health data, medical devices, medical networks and edge/fog/cloud). Then, we reviewed recent security and privacy studies that focus on each of these components. Based on the review, we derived a research taxonomy, security concerns, requirements, solutions, research trends, and open challenges for each component, along with the strengths and weaknesses of the analyzed studies. In particular, we reviewed edge and fog computing studies on e-health security and privacy, because such studies had mostly not been analyzed in other survey papers.
Affiliation(s)
- Se-Ra Oh
- Miro Corporation, Incheon 21988, Korea
- Young-Duk Seo
- Department of Computer Engineering, Inha University, Incheon 22212, Korea
- Euijong Lee
- Department of Computer Science, Chungbuk National University, Cheongju 28644, Korea
- Young-Gab Kim
- Department of Computer and Information Security, and Convergence Engineering for Intelligent Drone, Sejong University, Seoul 05006, Korea
5
Fake It Till You Make It: Guidelines for Effective Synthetic Data Generation. Appl Sci (Basel) 2021. [DOI: 10.3390/app11052158] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/18/2022]
Abstract
Synthetic data provides a privacy-protecting mechanism for the broad use and sharing of healthcare data for secondary purposes. It is considered a safe approach to sharing sensitive data because it generates an artificial dataset that contains no identifiable information. Synthetic data is increasingly popular, with multiple synthetic data generators developed in the past decade, yet its utility is still a subject of research. This paper evaluates the effect of various synthetic data generation and usage settings on the utility of the generated synthetic data and its derived models. Specifically, we investigate (i) the effect of data pre-processing on the utility of the synthetic data generated, (ii) whether tuning should be applied to the synthetic datasets when generating supervised machine learning models, and (iii) whether sharing preliminary machine learning results can improve the synthetic data models. Lastly, (iv) we investigate whether one utility measure (the propensity score) can predict the accuracy of the machine learning models generated from the synthetic data when they are employed in real life. We use two popular measures of synthetic data utility, propensity score and classification accuracy, to compare the different settings. We adopt a recent mechanism for calculating propensity that pays careful attention to the choice of model used for the propensity score calculation. Accordingly, this paper takes a new direction in investigating the effect of various data generation and usage settings on the quality of the generated data and its ensuing models. The goal is to identify the best strategies to follow when generating and using synthetic data.
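A propensity-score utility measure is commonly summarized as the propensity mean squared error, pMSE = (1/N) Σ_i (p_i − c)^2: the real and synthetic records are stacked, a classifier estimates for each record i the probability p_i that it is synthetic, and the scores are compared with the synthetic share c of the stacked data. The sketch below assumes the probabilities have already been estimated (the abstract stresses that the choice of classifier matters) and only computes the final statistic; the function name is illustrative.

```python
import numpy as np


def pmse(propensity, is_synthetic):
    """Propensity-score mean squared error (pMSE).

    propensity:   estimated probability that each stacked record is synthetic
    is_synthetic: 1 for synthetic records, 0 for real ones (same order)
    Values near 0 mean the classifier cannot tell real from synthetic records.
    """
    propensity = np.asarray(propensity, dtype=float)
    c = np.mean(np.asarray(is_synthetic, dtype=float))  # share of synthetic records
    return float(np.mean((propensity - c) ** 2))
```

For equal numbers of real and synthetic records, c = 0.5, so pMSE ranges from 0 (every score is 0.5, i.e. the two sets are indistinguishable) to 0.25 (scores of exactly 0 and 1, i.e. perfectly separable).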
6
Abstract
Most current access control models are rigid, as they are designed using static policies that always give the same outcome regardless of the circumstances. In addition, they cannot adapt to environmental changes and unpredicted situations. For dynamic systems such as the Internet of Things (IoT), with billions of things distributed everywhere, these access control models are obsolete. Hence, dynamic access control models are required. These models use not only access policies but also contextual and real-time information to determine the access decision. One such dynamic model is the risk-based access control model, which dynamically estimates the security risk associated with an access request to determine the access decision. Recently, the risk-based access control model has attracted the attention of several organizations and researchers because it provides more flexibility in accessing system resources. Therefore, this paper provides a systematic review and examination of the state of the art of the risk-based access control model to provide a detailed understanding of the topic. Based on the selected search strategy, 44 articles (of 1044) were chosen for closer examination, and their contributions were summarized. In addition, the risk factors used to build the risk-based access control model were extracted and analyzed, and the risk estimation techniques used to evaluate the risks of access control operations were identified.
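To make the contrast with static policies concrete, a common textbook formulation (assumed here, not taken from the review) estimates risk as the likelihood of misuse multiplied by its impact and permits a request only when that estimate falls below a threshold. The four risk factors, the simple averaging, and the 0.3 threshold in the sketch below are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    # Hypothetical risk factors, each normalized to the range [0, 1].
    user_trust: float        # e.g. derived from the requester's history
    context_anomaly: float   # e.g. unusual time, location or device
    data_sensitivity: float  # how damaging disclosure of the resource would be
    action_severity: float   # e.g. read < write < delete


def risk_score(req: AccessRequest) -> float:
    """Risk = likelihood of misuse x impact of misuse (both in [0, 1])."""
    likelihood = ((1.0 - req.user_trust) + req.context_anomaly) / 2
    impact = (req.data_sensitivity + req.action_severity) / 2
    return likelihood * impact


def decide(req: AccessRequest, threshold: float = 0.3) -> str:
    """Permit the request only if its estimated risk is below the threshold."""
    return "permit" if risk_score(req) < threshold else "deny"


# The same sensitive record, requested in two different contexts.
usual = AccessRequest(user_trust=0.9, context_anomaly=0.1, data_sensitivity=0.9, action_severity=0.2)
odd = AccessRequest(user_trust=0.4, context_anomaly=0.9, data_sensitivity=0.9, action_severity=0.2)
print(decide(usual), decide(odd))  # prints: permit deny
```

Unlike a static policy, the same resource can yield different decisions as the contextual factors change; real systems differ mainly in which factors they use and how they estimate each one.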
7
Dankar FK, Gergely M, Malin B, Badji R, Dankar SK, Shuaib K. Dynamic-informed consent: A potential solution for ethical dilemmas in population sequencing initiatives. Comput Struct Biotechnol J 2020; 18:913-921. [PMID: 32346464 PMCID: PMC7182686 DOI: 10.1016/j.csbj.2020.03.027] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/01/2019] [Revised: 03/29/2020] [Accepted: 03/30/2020] [Indexed: 01/13/2023]
Abstract
While the majority of population-level genome sequencing initiatives claim to follow the principles of informed consent, the requirements for informed consent have not been well defined in this context. In fact, the implementation of informed consent differs greatly across these initiatives, spanning broad consent, blanket consent, and tiered consent, among others. This calls for an investigation into the requirements for consent to be "informed" in the context of population genomics. One strategy that claims to be fully informed and to engage participants continuously is called "dynamic consent". Dynamic consent is based on a personalised communication platform that aims to facilitate the consent process and to support continuous two-way communication between researchers and participants. In this paper, we analyze the requirements of informed consent in the context of population genomics, review various current implementations of dynamic consent, and assess whether they fulfill the requirements of informed consent and, in turn, enable participants to make autonomous and informed choices about whether to participate in research projects.
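One way to picture what distinguishes dynamic consent from one-off broad consent is as a data structure: a time-stamped history of each participant's choices per research purpose, where participants can revise a decision at any time and the most recent choice is the one applied. The class, the purpose names, and the default-deny rule below are hypothetical and do not describe any platform reviewed in the paper.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DynamicConsent:
    """Time-stamped, per-purpose consent choices for one participant (hypothetical)."""
    participant_id: str
    history: list = field(default_factory=list)  # entries: (timestamp, purpose, granted)

    def record(self, purpose: str, granted: bool, when: datetime) -> None:
        """Store a new decision; earlier decisions are kept for auditing."""
        self.history.append((when, purpose, granted))

    def allows(self, purpose: str) -> bool:
        """Apply the most recent decision for this purpose; no decision means no consent."""
        decisions = [(t, g) for t, p, g in self.history if p == purpose]
        if not decisions:
            return False
        return max(decisions, key=lambda d: d[0])[1]


consent = DynamicConsent("P-001")
consent.record("genomic_research", True, datetime(2020, 1, 10))
consent.record("data_linkage", True, datetime(2020, 1, 10))
consent.record("genomic_research", False, datetime(2020, 6, 2))  # participant withdraws
print(consent.allows("genomic_research"), consent.allows("data_linkage"), consent.allows("commercial_use"))
# prints: False True False
```

Keeping the full history rather than overwriting a single flag is what allows the two-way, continuous communication the abstract describes to be audited later.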
Affiliation(s)
- Fida K. Dankar
- College of Information Technology, UAEU, Al-Ain, United Arab Emirates
- Marton Gergely
- College of Information Technology, UAEU, Al-Ain, United Arab Emirates
- Bradley Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, United States
- Khaled Shuaib
- College of Information Technology, UAEU, Al-Ain, United Arab Emirates
8
Soni H, Grando A, Murcko A, Diaz S, Mukundan M, Idouraine N, Karway G, Todd M, Chern D, Dye C, Whitfield MJ. State of the art and a mixed-method personalized approach to assess patient perceptions on medical record sharing and sensitivity. J Biomed Inform 2020; 101:103338. [PMID: 31726102 PMCID: PMC6952579 DOI: 10.1016/j.jbi.2019.103338] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/07/2019] [Revised: 11/07/2019] [Accepted: 11/09/2019] [Indexed: 10/25/2022]
Abstract
OBJECTIVE Sensitive health information poses risks, such as stigma and discrimination, when disclosed. Few studies have used a patient's own electronic health records (EHRs) to explore what types of information are considered sensitive and how such perceptions affect data sharing preferences. After a systematic literature review, we designed and piloted a mixed-method approach that uses an individual's own records to assess content sensitivity and preferences for granular data sharing for care and research. METHODS A systematic literature review of methodologies used to assess willingness to share data and perceptions of data sensitivity was conducted. A methodology was designed to organize and categorize sensitive health information from EHRs. Patients were asked for permission to access their EHRs, including those available through the state's health information exchange. A semi-structured interview script with closed card sorting was designed and personalized to each participant's own EHRs using 30 items from each patient record. This mixed method combines the quantitative outcomes of the card sorting exercises with themes captured from analysis of the interview audio recordings. RESULTS Eight publications on patients' perspectives on data sharing and sensitivity were found. Based on our systematic review, the proposed method meets a need to use EHRs to systematize the study of data privacy issues. Twenty-five English- and Spanish-speaking patients with behavioral health conditions were recruited. On average, participants recognized 82.7% of the 30 items from their own EHRs. Participants considered mental health (76.0%), sexual and reproductive health (75.0%) and alcohol use and alcoholism (50.0%) to be sensitive information. Participants were willing to share information related to other addictions (100.0%), genetic data (95.8%) and general physical health information (90.5%).
CONCLUSION The findings indicate diversity in patient views on EHR sensitivity and data sharing preferences and the need for more granular and patient-centered electronic consent mechanisms to accommodate patient needs. More research is needed to validate the generalizability of the proposed methodology.
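Percentages like those reported for each information category can be derived from closed card-sort responses. A minimal sketch of one plausible aggregation follows, assuming each response records a participant, the category of a card drawn from that participant's own record, and whether the participant sorted it as sensitive. The responses are invented for illustration and are not the study's data, and the study's exact denominators may differ.

```python
from collections import defaultdict

# Hypothetical card-sort responses: (participant, category, sorted_as_sensitive).
responses = [
    ("P1", "mental health", True),
    ("P1", "general physical health", False),
    ("P2", "mental health", True),
    ("P2", "general physical health", False),
    ("P3", "mental health", False),
    ("P3", "general physical health", True),
    ("P4", "mental health", True),
    ("P4", "general physical health", False),
]


def percent_sensitive(rows):
    """Share of cards in each category that participants sorted as sensitive (in %)."""
    totals = defaultdict(int)
    sensitive = defaultdict(int)
    for _participant, category, is_sensitive in rows:
        totals[category] += 1
        sensitive[category] += int(is_sensitive)
    return {c: round(100.0 * sensitive[c] / totals[c], 1) for c in totals}


print(percent_sensitive(responses))
# prints: {'mental health': 75.0, 'general physical health': 25.0}
```

Because each card comes from the participant's own record, the number of cards per category varies between participants, which is why such percentages are computed per category rather than per participant.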
Affiliation(s)
- Hiral Soni
- Biomedical Informatics, College of Health Solutions, Arizona State University, Scottsdale, United States
- Adela Grando
- Biomedical Informatics, College of Health Solutions, Arizona State University, Scottsdale, United States.
- Anita Murcko
- Biomedical Informatics, College of Health Solutions, Arizona State University, Scottsdale, United States
- Sabrina Diaz
- Kinesiology, College of Health Solutions, Arizona State University, Phoenix, United States
- Madhumita Mukundan
- Biomedical Informatics, College of Health Solutions, Arizona State University, Scottsdale, United States
- Nassim Idouraine
- Biomedical Informatics, College of Health Solutions, Arizona State University, Scottsdale, United States
- George Karway
- Biomedical Informatics, College of Health Solutions, Arizona State University, Scottsdale, United States
- Michael Todd
- College of Nursing and Health Innovation, Arizona State University, Phoenix, United States
- Christy Dye
- Partners in Recovery, Phoenix, United States
9
Dankar FK, Gergely M, Dankar SK. Informed Consent in Biomedical Research. Comput Struct Biotechnol J 2019; 17:463-474. [PMID: 31007872 PMCID: PMC6458444 DOI: 10.1016/j.csbj.2019.03.010] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 12/19/2018] [Revised: 03/19/2019] [Accepted: 03/21/2019] [Indexed: 12/27/2022]
Abstract
Informed consent is the result of tumultuous events in both the clinical and research arenas over the last 100 years. Throughout this time, the notion of informed consent has shifted tremendously, due both to advances in medicine and to the types of data being gathered. As a result, informed consent has become misaligned with the goals of medical research. It is increasingly vital to address this chasm and to begin building new frameworks that bridge this disconnect. Thus, we address three goals in this paper. First, we discuss the history of informed consent and unify the varying definitions of the term. Second, we evaluate the current research on the topic, classify it into themes, and address the problems therein. Lastly, we use these themes of informed consent research to provide guidance and insight for future research in the arena.
Affiliation(s)
- Fida K. Dankar
- College of IT, UAEU, Al Ain, P.O.Box 15551, United Arab Emirates
- Marton Gergely
- College of IT, UAEU, Al Ain, P.O.Box 15551, United Arab Emirates
10
Wright CF, Ware JS, Lucassen AM, Hall A, Middleton A, Rahman N, Ellard S, Firth HV. Genomic variant sharing: a position statement. Wellcome Open Res 2019; 4:22. [PMID: 31886409 PMCID: PMC6913213 DOI: 10.12688/wellcomeopenres.15090.2] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Accepted: 02/01/2019] [Indexed: 12/12/2022]
Abstract
Sharing de-identified genetic variant data is essential for the practice of genomic medicine and is demonstrably beneficial to patients. Robust genetic diagnoses that inform medical management cannot be made accurately without reference to genetic test results from other patients, as well as population controls. Errors in this process can result in delayed, missed or erroneous diagnoses, leading to inappropriate or missed medical interventions for the patient and their family. The benefits of sharing individual genetic variants, and the harms of not sharing them, are numerous and well-established. Databases and mechanisms already exist to facilitate deposition and sharing of pseudonymised genetic variants, but clarity and transparency around best practice are needed to encourage widespread use, prevent inconsistencies between different communities, maximise individual privacy and ensure public trust. We therefore recommend that widespread sharing of a small number of individual genetic variants associated with limited clinical information should become standard practice in genomic medicine. Information robustly linking genetic variants with specific conditions is fundamental biological knowledge, not personal information, and therefore should not require consent to share. For additional case-level detail about individual patients or more extensive genomic information, which is often essential for clinical interpretation, it may be more appropriate to use a controlled-access model for data sharing, with the ultimate aim of making as much information as possible open and de-identified, with appropriate consent.
Affiliation(s)
- Caroline F. Wright
- Institute of Biomedical and Clinical Science, University of Exeter, Exeter, UK
- James S. Ware
- National Heart and Lung Institute, Imperial Centre for Translational and Experimental Medicine, London, UK
- Anneke M. Lucassen
- Department of Clinical Ethics and Law, Faculty of Medicine, University of Southampton, Southampton, UK
- Anna Middleton
- Faculty of Education, University of Cambridge, Cambridge, UK
- Connecting Science, Wellcome Genome Campus, Cambridge, UK
- Nazneen Rahman
- Division of Genetics and Epidemiology, Institute of Cancer Research, UK, London, UK
- Sian Ellard
- Institute of Biomedical and Clinical Science, University of Exeter, Exeter, UK
- Helen V. Firth
- Department of Clinical Genetics, University of Cambridge Addenbrooke's Hospital, Cambridge, UK
- Wellcome Trust Sanger Institute, Cambridge, UK
11
Wright CF, Ware JS, Lucassen AM, Hall A, Middleton A, Rahman N, Ellard S, Firth HV. Genomic variant sharing: a position statement. Wellcome Open Res 2019. [DOI: 10.12688/wellcomeopenres.15090.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022]
12
Dankar FK, Ptitsyn A, Dankar SK. The development of large-scale de-identified biomedical databases in the age of genomics: principles and challenges. Hum Genomics 2018; 12:19. [PMID: 29636096 PMCID: PMC5894154 DOI: 10.1186/s40246-018-0147-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 12/15/2017] [Accepted: 03/15/2018] [Indexed: 12/24/2022]
Abstract
Contemporary biomedical databases include a wide range of information types from various observational and instrumental sources. Among the most important features shared by biomedical databases across the field are a high volume of information and a high potential for damage through data corruption, loss of performance, and loss of patient privacy. Thus, data governance and privacy protection are essential to the construction of data repositories for biomedical research and healthcare. In this paper, we discuss various challenges of data governance in the context of population genome projects. These challenges, along with best practices and current research efforts, are discussed across the steps of data collection, storage, sharing, analysis, and knowledge dissemination.
Affiliation(s)
- Andrey Ptitsyn
- Gloucester Marine Genomics Institute, Gloucester, MA, USA
- Samar K Dankar
- Faculty of Sciences, University of Balamand, Souk El Ghareb, Lebanon